Signing GitHub Commits With A Passphrase-protected Key and GPG2

GitHub recently added support for signed commits. The instructions for setting it up are on their website and I do not intend to rehash them here; I followed them and they work splendidly. However, when I set things up, I used the version of GPG that came with my Git installation. A side effect I noticed was that when I rebased some code and wanted the rebased commits to remain signed (by running git rebase with the -S option), I had to enter my GPG key passphrase for every single commit, which gets a little tedious after the first five or so.

[Image: How GitHub shows signed commits, with the Verified indicator on those that have been signed]

Now, there are a couple of ways to fix this. One is easy: just don't use a passphrase-protected key. Of course, that would make it much easier for someone to sign commits as me if they got hold of my key file, so I decided that was probably not the best option. Instead, I did a little searching and found that GPG2 handles passphrase-protected keys better than the version of GPG that came with my original Git installation: it uses gpg-agent to cache the passphrase, so it only needs to be entered once.

Using the GPG4Win website, I installed the Vanilla version[1]. I then had to export the key I had already set up with GitHub from my old GPG and import it into the new one. Using gpg --list-keys, I obtained the eight-character ID for my key (the part that reads BAADF00D in this example output):

gpg: WARNING: using insecure memory!
gpg: please see http://www.gnupg.org/documentation/faqs.html for more information
/c/Users/Jeff/.gnupg/pubring.gpg
--------------------------------
pub   4096R/BAADF00D 2016-04-07
uid                  Jeff Yates <jeff.yates@example.com>
sub   4096R/DEADBEEF 2016-04-07

I then used that ID to export my keys from a Git prompt:

gpg -a --export-secret-keys BAADF00D > privatekey.txt
gpg -a --export BAADF00D > publickey.txt

This gave me two files (privatekey.txt and publickey.txt) containing text representations of the private and public keys.

Using a shell in the GPG2 pub folder ("C:\Program Files (x86)\GNU\GnuPG\pub"), I then verified them (always good practice, especially if you got the key from someone else) before importing them[2]:

> gpg privatekey.txt

Rather than giving me details of the key, it showed me this error:

gpg: no valid OpenPGP data found.
gpg: processing message failed: Unknown system error

What was going on? I tried verifying it with the old GPG and it gave me a different but similar error:

gpg: WARNING: using insecure memory!
gpg: please see http://www.gnupg.org/documentation/faqs.html for more information
gpg: no valid OpenPGP data found.
gpg: processing message failed: eof

I tried the public key export and it too gave these errors. It did not make a whole heap of sense. Trying to get to the bottom of it, I opened the key files in Visual Studio Code. Everything looked fine until I saw this at the bottom of the screen.

[Image: Encoding information from Visual Studio Code, showing UTF-16]

It turns out that PowerShell writes redirected output as UTF-16, and I had not bothered to check. Suspecting this was the problem, I resaved each file as UTF-8 and tried verifying privatekey.txt again:

sec  4096R/BAADF00D 2016-04-07
uid                            Jeff Yates <jeff.yates@example.com>
ssb  4096R/DEADBEEF 2016-04-07

Success! Repeating this for the publickey.txt file gave the exact same information. With the keys verified, I was ready to import them into GPG2:

> gpg --import publickey.txt
gpg: WARNING: using insecure memory!
gpg: please see http://www.gnupg.org/documentation/faqs.html for more information
gpg: key BAADF00D: public key "Jeff Yates <jeff.yates@example.com>" imported
gpg: Total number processed: 1
gpg:               imported: 1  (RSA: 1)
> gpg --import privatekey.txt
gpg: WARNING: using insecure memory!
gpg: please see http://www.gnupg.org/documentation/faqs.html for more information
gpg: key BAADF00D: secret key imported
gpg: key BAADF00D: "Jeff Yates <jeff.yates@example.com>" not changed
gpg: Total number processed: 1
gpg:              unchanged: 1
gpg:       secret keys read: 1
gpg:   secret keys imported: 1

With the keys imported, I ran gpg --list-keys to verify they were there and then made sure to delete the text files.

Finally, to make sure that Git used the new GPG2 instead of the version of GPG that it came with, I edited my Git configuration:

> git config --global gpg.program "C:\Program Files (x86)\GNU\GnuPG\pub\gpg.exe"

Now, when I sign commits and rebases, instead of needing to enter my passphrase for each commit, I am prompted for the passphrase once. Lovely.
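If you hit the same UTF-16 problem, resaving each file in an editor is not the only fix. Passing gpg's `--output` option (for example, `gpg -a --output publickey.txt --export BAADF00D`) lets gpg write the file itself, so shell redirection never gets involved. Alternatively, the conversion can be scripted; here is a minimal C# sketch, assuming the file starts with a byte order mark (as the UTF-16 files written by Windows PowerShell redirection do). `ConvertToUtf8` is a hypothetical helper, not part of any tool used above.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper: re-save a text file as UTF-8 without a byte order mark.
// File.ReadAllText honours a byte order mark, so a UTF-16 file that starts
// with a BOM (as Windows PowerShell redirection writes) is decoded correctly.
void ConvertToUtf8(string path)
{
    string text = File.ReadAllText(path);
    // false = do not emit a UTF-8 byte order mark
    File.WriteAllText(path, text, new UTF8Encoding(false));
}

// Demonstration: write a UTF-16 LE file (with BOM), then convert it in place.
string demoPath = Path.GetTempFileName();
File.WriteAllText(demoPath, "-----BEGIN PGP PUBLIC KEY BLOCK-----\n", Encoding.Unicode);
ConvertToUtf8(demoPath);
byte[] converted = File.ReadAllBytes(demoPath);
File.Delete(demoPath);
```

After conversion the file contains plain ASCII armour bytes, which is what gpg expects to find.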

  1. You could also look at installing the command line tools from https://www.gnupg.org/download/, though I do not know if the results will be the same.
  2. Note that I am not showing the path to the file here for the sake of brevity, though I am sure you get the idea that you will need to provide it.

Octokit, Merge Commits, and the Story So Far

In the last post, we reduced our commits by matching them against pull requests; next, we can look for noise in the commit message content itself. Although I have been testing against the Octokit.NET repository, with its low-noise, high-quality commit messages, we can easily envisage a less consistent repository with some noisy commits. For example, how often have you seen or written commit messages like "Fixed spelling", "Fixed bug", or "Stuff"[1]?

How we detect these noisy commits is important: if our filtering is too broad, we remove useful messages, and if it is too strict, we leave noise behind. Rather than go deep into one specific implementation, I just want to introduce the idea of filtering based on message content. In the long term, I think it would be interesting to apply learning algorithms, but I'm sure some simple, configurable pattern matching will suffice for now[2].
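To make the idea concrete, here is a minimal sketch of configurable pattern matching in C#. It assumes the patterns come from some per-repository configuration (the list below is hypothetical) and that only the first line of each commit message, its summary, is tested. `IsNoise` is an illustrative helper, not an Octokit API.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical per-repository "noise" patterns; tune these for each repository.
var noisePatterns = new[]
{
    @"^fixed (spelling|typo|bug)s?\.?$",
    @"^stuff\.?$",
    @"^wip\b",
};

// Compile once, case-insensitively, so each message is checked cheaply.
var noiseRegexes = noisePatterns
    .Select(p => new Regex(p, RegexOptions.IgnoreCase | RegexOptions.CultureInvariant))
    .ToList();

// A message counts as noise if its summary (first line) matches any pattern.
bool IsNoise(string message)
{
    string summary = message.Split('\n')[0].Trim();
    return noiseRegexes.Any(r => r.IsMatch(summary));
}

var messages = new List<string>
{
    "Fixed spelling",
    "Stuff",
    "Add helper class for creating web hooks",
    "Fixed bug",
    "Fixes for FAKE Xunit warning",
};

// Keep only the messages that do not look like noise.
var kept = messages.Where(m => !IsNoise(m)).ToList();
```

Anchoring a pattern at both ends with `^` and `$` keeps it strict: "Fixed spelling" is dropped while "Fixes for FAKE Xunit warning" survives. Looser patterns, such as the `^wip\b` prefix match, remove more but risk catching useful messages.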

If I run the filtering I have described so far[3] on the latest Octokit.NET release, this is what we get:

Fix the credit format 
Release notes for release 0.17.0 
Merge pull request #972 from naveensrinivasan/json-serialization

Json serialization for Unicode 
Merge pull request #976 from octokit/elbaloo-better-merge-exception-rebased

better merge exception rebased 
Merge pull request #973 from naveensrinivasan/appveyornuget

Generate nuget packages on appveyor 
Merge pull request #917 from alfhenrik/feature-webhookhelper

Add helper class for creating web hooks 
Merge pull request #807 from octokit/codeformatter

added a tailored CodeFormatter to Octokit 
Merge pull request #956 from octokit/vs2015-support

VS2015 migration 
Merge pull request #921 from naveensrinivasan/samples

Adds octokit samples 
Merge branch 'gitignore-exception' 
Merge pull request #918 from willsb/download-timeout

Adds overloads to GetArchive for adding custom timeouts 
Merge pull request #957 from octokit/clean-up-some-fixes

clean up some pending PRs 
Merge pull request #943 from naveensrinivasan/AssetDownload

Fixes for Downloading ReleaseAsset zip File 
Merge pull request #942 from alfhenrik/bug-repohasissues

Make NewRepository.HasIssues nullable as it's optional 
Merge pull request #940 from naveensrinivasan/build-sh

Created build.sh 
Merge pull request #929 from elbaloo/issue-389

Add .com links to PrivateRepositoryQuotaExceededException 
Merge pull request #927 from naveensrinivasan/octokit-logo

Updated with the logo 
Merge pull request #922 from naveensrinivasan/fixes-for-fake-warning

Fixes for FAKE Xunit warning 
Merge pull request #919 from adamralph/system-framework-assembly

add System to required framework assemblies for net45 
Merge pull request #909 from willsb/disposable-repositories

Disposable repositories 
Merge pull request #916 from octokit/consolidate-committer-info

Consolidate committer info 
Merge pull request #915 from octokit/docs

Add a bunch of XML doc comments 
Merge pull request #907 from naveensrinivasan/encodedcontent-public-#861

Making Encodedcontent public #861 
Merge pull request #908 from khellang/clarify-failing-convention-tests

Clarify why convention tests are failing 
Merge pull request #906 from naveensrinivasan/update-readme

Updated the readme with reactive octokit. 
Merge pull request #903 from willsb/commit-committer

Changes GitHubCommit.Author/Committer 
Merge pull request #902 from naveensrinivasan/build-mono

Build fix for Xamarin Studio Solution 
Merge pull request #901 from alfhenrik/feature-issueeventsurl#885

Add Events URL to the Issue class. 
Merge pull request #900 from alfhenrik/update-testtargetnames-in-docs

Updated test target names in the shipping releases doc 
Merge pull request #898 from octokit/release

Release of v0.16 - ironic ties 
better merge exception rebased 
Generate nuget packages on appveyor  
Json serialization for Unicode 
Add helper class for creating web hooks 
added a tailored CodeFormatter to Octokit 
VS2015 migration 
clean up some pending PRs 
Fixes for Downloading ReleaseAsset zip File  
Adds overloads to GetArchive for adding custom timeouts  
Make NewRepository.HasIssues nullable as it's optional 
Created build.sh 
Gitignore exception 
Add .com links to PrivateRepositoryQuotaExceededException 
Updated with the logo 
Adds octokit samples 
Fixes for FAKE Xunit warning 
add System to required framework assemblies for net45 
Disposable repositories 
Consolidate committer info 
Add a bunch of XML doc comments 
Making Encodedcontent public #861 
Clarify why convention tests are failing 
Updated the readme with reactive octokit. 
Changes GitHubCommit.Author/Committer 
Build fix for Xamarin Studio Solution 
Add Events URL to the Issue class. 
Updated test target names in the shipping releases doc 
Release of v0.16 - ironic ties 

The value of this is clearer if we see the commit list before processing:

Fix the credit format 
Release notes for release 0.17.0 
Merge pull request #972 from naveensrinivasan/json-serialization

Json serialization for Unicode 
Merge pull request #976 from octokit/elbaloo-better-merge-exception-rebased

better merge exception rebased 
Merge branch 'better-merge-exception-rebased' of https://github.com/elbaloo/octokit.net into elbaloo-better-merge-exception-rebased 
Merge pull request #973 from naveensrinivasan/appveyornuget

Generate nuget packages on appveyor 
The test targets were deleting the nuget packages

The test targets were deleting the nuget packages so had to include the
CreatePackages at the end. 
Removed the disable on PR. 
Create packages in turn calls build app

Create packages in turn calls build app so no need to call it. 
appveyor nuget packages

appveyor nuget packages 
Checked for the serialized data

Compared if the serialized data has what was expected. Not just
deserialized data. 
Tests for Unicode character serialization

Tests for Unicode character serialization 
Fixes for json serialization bug

Fixes for json serialization issue when unicode is present. 
Merge pull request #917 from alfhenrik/feature-webhookhelper

Add helper class for creating web hooks 
A bit of code cleanup 
Add unit test to ensure correct message is returned when duplicate keys exists. 
Throw exception with helpful message if duplicate webhook config values exists. 
Fix up XML comments as per PR review 
Conform NewRepositoryWebHook to new request model guidelines 
Update existing integration test to use new web hook helper class 
Add unit tests 
Add helper class to create a web hook.

Fixes octokit/octokit.net#914 
Merge pull request #807 from octokit/codeformatter

added a tailored CodeFormatter to Octokit 
aaaand format the code 
skip unicode character editing 
format the code in the script 
local install of code formatter 
Merge pull request #956 from octokit/vs2015-support

VS2015 migration 
Merge pull request #921 from naveensrinivasan/samples

Adds octokit samples 
Merge branch 'gitignore-exception' 
Merge pull request #918 from willsb/download-timeout

Adds overloads to GetArchive for adding custom timeouts 
a bit more cleanup of the README 
one more malformed xml-docs tag 
Merge branch 'master' into vs2015-support 
Merge pull request #957 from octokit/clean-up-some-fixes

clean up some pending PRs 
address feedback 
added tests for the merged qualifier 
added "Merged" in searchissues which allows search repos by merged date with existing syntax.
it generates a CA1502 code excessive complexity warning and i suppressed it. 
Fixed the problem in the constructor. 
run build fixproject 
Added NewArbitraryMarkdown class, RenderArbitraryMakrdown method and unit tests for it. 
added assignee property to pull request. 
tidy up some xml-docs while i'm in here 
actually some real errors 
just suppressing some warnings, nbd 
update README to indicate we're using VS2015 
update the target to use netcore451 
bump the ToolsVersion 
bump to netcore451 
tweak ignore file 
update to the latest MSBuild scripts 
Merge branch 'master' into better-merge-exception-rebased 
Merge pull request #943 from naveensrinivasan/AssetDownload

Fixes for Downloading ReleaseAsset zip File 
Fixed the spacing

Fixed the spacing of comma and aligned the arguments. 
Fixes for Downloading ReleaseAsset zip File #854

This commit  addressed the `BuildResponse`  wasn't handling
response `content-type` `application/octet-stream` for binary items. 
Merge branch 'master' into download-timeout 
Make new merge exceptions inherit from 'Octokit.ApiException'
Affect 'Octokit.PullRequestNotMergeableException'
and 'Octokit.PullRequestMismatchException' 
Merge pull request #942 from alfhenrik/bug-repohasissues

Make NewRepository.HasIssues nullable as it's optional 
Make HasIssues nullable as it's optional 
Merge pull request #940 from naveensrinivasan/build-sh

Created build.sh 
Created build.sh

Included build.sh to build form non-windows 
:poop:

brainfart 
Add tests for merge exceptions to PullRequestsClientTests 
Add System.Net namespace used to check for HttpStatusCode in PullRequestClient.Task<PullRequestMerge> Merge(string, string, int, MergePullRequest) 
sketching out the exception necessary when raising specific merge exceptions 
Changes the way the exception is verified 
Merge pull request #929 from elbaloo/issue-389

Add .com links to PrivateRepositoryQuotaExceededException 
Add .com links to PrivateRepositoryQuotaExceededException

Add following links:
- 'Deleting a repository' at https://help.github.com/articles/deleting-a-repository/
- 'What plan should I use?' at https://help.github.com/articles/what-plan-should-i-choose/ 
Merge pull request #927 from naveensrinivasan/octokit-logo

Updated with the logo 
Changed the octokit logo to smaller size 
Updated with the logo

Updated it with the logo 
Validate Linqpad Samples as part of CI

Validates Linqpad Samples as part of CI for every commit. 
Removed the integration test options

Removed the integration test options because lprun has compileonly
option. 
The nuget package includes the samples

This will include the samples in the nuget package. 
Throwing proper exception on RepositoresClient 
Merge pull request #922 from naveensrinivasan/fixes-for-fake-warning

Fixes for FAKE Xunit warning 
Merge remote-tracking branch 'origin/fixes-for-fake-warning' into fixes-for-fake-warning

Conflicts:
  build.fsx 
Fixes for fake warning

Fixes for the FAKE warning 
Adds InvalidGitIgnoreTemplateException 
Fixes for fake warning

Fixes for the FAKE warning 
Including LINQPad.exe

Including LINQPad.exe to compile the samples after every commit 
Fixed the command line args

Fixed the args parameter to compile using lprun.exe 
linqpad samples

Linqpad samples 
Removes integer overload

Plus extra ensures 
Merge pull request #919 from adamralph/system-framework-assembly

add System to required framework assemblies for net45 
add System to required framework assemblies for net45 
Adds overloads for adding custom timeouts 
Merge pull request #909 from willsb/disposable-repositories

Disposable repositories 
Merge pull request #916 from octokit/consolidate-committer-info

Consolidate committer info 
Merge remote-tracking branch 'octokit/master' into disposable-repositories

Conflicts:
  Octokit.Tests.Integration/Clients/DeploymentStatusClientTests.cs
  Octokit.Tests.Integration/Clients/DeploymentsClientTests.cs
  Octokit.Tests.Integration/Clients/PullRequestsClientTests.cs 
Refactors the remaining test classes 
Add doc comments for Author and Committer 
Move Committer into Common folder

This object is used both in requests and responses. 
Add a README for model objects 
Replace SignatureResponse and CommitEntity with Committer

A recent PR added CommitEntity but we already had
SignatureResponse expressly for this purpose.

So this commit renames SignatureResponse to Committer
and removes CommitEntity and replaces it with Committer. 
Merge pull request #915 from octokit/docs

Add a bunch of XML doc comments 
Add this PR number for these fixes

So meta! 
Add Description to OrganizationUpdate 
Add Before property to NotificationsRequest 
Added Description property to NewTeam

Teams can have descriptions! 
Added Content property to NewTreeItem 
Add a bunch of doc comments

We get a lot of build output because of missing XML comments that we
ignore. I'd like to stop ignoring them. To do that, we need to doc the
:poop: out of everything. 
Deployment state is required for deployment status

Breaking change. This constructor parameter is now required. 
Add missing properties to NewDeployment

Added `RequiredContexts`, `Environment`, and `Task` parameters. Removed
the obsolete `Force` parameter.
Also made ref a required constructor parameter. This is a breaking
change. 
Add the ability to create a readonly deploy key 
Rename Message to CommitMessage

According to the docs
(https://developer.github.com/v3/pulls/#merge-a-pull-request-merge-button),
this should be sent as "commit_message" thus we need to name it
`CommitMessage`
Fixes #913 
Refactors tests up to PullRequestsClientTests 
Adds common properties to RepositoryContext

A lot of classes use the name and the owner of the repository, so for
consistency I added those as properties of the Context 
Refactors a whole bunch of tests 
Refactors AssigneesClient and CommitsClient tests 
Refactors BranchesClientTests 
Refactors StatisticsClient 
Refactors GithubClient and RepositoryContents 
Merge pull request #907 from naveensrinivasan/encodedcontent-public-#861

Making Encodedcontent public #861 
Refactors RepositoriesClientTests

Changes the tests in RepositoriesClientTests to use the new using block
syntax 
RepositoryContext class and Extension methods 
fix for making the setter private

fix for making the setter private 
Merge pull request #908 from khellang/clarify-failing-convention-tests

Clarify why convention tests are failing 
Clarify why convention tests are failing 
Making EncodedContent public

Making EncodedContent public to get the raw bytes of a file. #861 
Merge branch 'octokit/master' of https://github.com/naveensrinivasan/octokit.net into octokit/master 
Merge pull request #906 from naveensrinivasan/update-readme

Updated the readme with reactive octokit. 
Update read with reactive octokit.

Updated the readme to include the nuget reference to Octokit.Reactive 
Merge pull request #903 from willsb/commit-committer

Changes GitHubCommit.Author/Committer 
Merge pull request #902 from naveensrinivasan/build-mono

Build fix for Xamarin Studio Solution 
Merge pull request #901 from alfhenrik/feature-issueeventsurl#885

Add Events URL to the Issue class. 
Makes integrations tests happy 
Build fix for Xamarin Studio Solution

Build fix for Xamarin Studio Solution 
Creates CommitEntity for GitHubCommit

Creates the entity that corresponds to the actual payload returned by
the server to represent the Author and Committer of a commit 
Merge pull request #900 from alfhenrik/update-testtargetnames-in-docs

Updated test target names in the shipping releases doc 
Add Events URL to the Issue class. 
Update the names of the test targets 
Merge branch 'master' into octokit/master 
Merge pull request #898 from octokit/release

Release of v0.16 - ironic ties 
Update FAKE and SourceLink 

The work so far has reduced a list of 135 commits to 58, and it looks like we have not lost any really useful "release note"-worthy information. However, the eagle-eyed among you may have noticed that our 58 messages contain duplicate information. This is because each pull request is listed twice: once for the pull request title I inserted in place of its individual commits, and again for the merge commit that merged that pull request. These merge commits are not filtered out because they do not belong to the commits inside the pull request; instead, they are an artifact of merging the pull request[4].

At first, I thought the handy `MergeCommitSha` property of the pull request would help, but it turns out this refers to a test merge and is to be deprecated[5]. Instead, I realised that the messages I wanted to remove all began with "Merge pull request #" followed by the pull request number. This seems like a perfect use case for our pattern matching filter. Since we have the pull requests, we could use their numbers to match each merge message exactly, but I decided to do the simpler thing and exclude any message starting with "Merge pull request #".
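Both options can be sketched in a few lines of C#. This is a minimal sketch on plain strings; `knownPrNumbers` is a hypothetical stand-in for the numbers of the pull requests fetched earlier. In the exact-match version, requiring a space after the number stops #97 from matching the merge of #972.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Option 1 (the simpler choice): drop anything starting with the merge prefix.
bool IsPullRequestMerge(string message) =>
    message.StartsWith("Merge pull request #", StringComparison.Ordinal);

// Option 2: drop only merges of pull requests we know about, matching
// "Merge pull request #<number> " exactly.
bool IsMergeOfKnownPullRequest(string message, ISet<int> knownPrNumbers) =>
    knownPrNumbers.Any(n => message.StartsWith($"Merge pull request #{n} ", StringComparison.Ordinal));

var messages = new[]
{
    "Merge pull request #972 from naveensrinivasan/json-serialization\n\nJson serialization for Unicode",
    "Json serialization for Unicode",
    "Merge branch 'gitignore-exception'",
};
var knownPrNumbers = new HashSet<int> { 972 };

var byPrefix = messages.Where(m => !IsPullRequestMerge(m)).ToList();
var byNumber = messages.Where(m => !IsMergeOfKnownPullRequest(m, knownPrNumbers)).ToList();
```

Neither check touches messages like "Merge branch 'gitignore-exception'", which is why one of those survives in the shortlist.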

Filtering out messages that begin with "Merge pull request #" gives us a shortlist of just 31 messages:

Fix the credit format 
Release notes for release 0.17.0 
Merge branch 'gitignore-exception' 
better merge exception rebased 
Generate nuget packages on appveyor  
Json serialization for Unicode 
Add helper class for creating web hooks 
added a tailored CodeFormatter to Octokit 
VS2015 migration 
clean up some pending PRs 
Fixes for Downloading ReleaseAsset zip File  
Adds overloads to GetArchive for adding custom timeouts  
Make NewRepository.HasIssues nullable as it's optional 
Created build.sh 
Gitignore exception 
Add .com links to PrivateRepositoryQuotaExceededException 
Updated with the logo 
Adds octokit samples 
Fixes for FAKE Xunit warning 
add System to required framework assemblies for net45 
Disposable repositories 
Consolidate committer info 
Add a bunch of XML doc comments 
Making Encodedcontent public #861 
Clarify why convention tests are failing 
Updated the readme with reactive octokit. 
Changes GitHubCommit.Author/Committer 
Build fix for Xamarin Studio Solution 
Add Events URL to the Issue class. 
Updated test target names in the shipping releases doc 
Release of v0.16 - ironic ties 

I think this is a pretty good improvement over the raw commit list. Combining this list with links back to the relevant commits and pull requests should enable someone to discern the content of a release note much faster than using the raw commit list alone. I will leave that as an exercise or perhaps a future post. As always, thanks for reading. If you find yourself using Octokit to trawl your own repositories for release note information, I would love to hear about it in the comments.
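As a possible starting point for that exercise, here is a minimal sketch that turns each entry into a release-note line with a link. It assumes each entry carries a summary plus either a pull request number or a commit SHA, and relies only on GitHub's standard web URL shapes (`https://github.com/{owner}/{repo}/pull/{number}` and `.../commit/{sha}`). The owner, repository, and SHA values are illustrative placeholders.

```csharp
using System.Collections.Generic;
using System.Linq;

// Illustrative placeholders for the repository being processed.
const string Owner = "octokit";
const string Repo = "octokit.net";

// Link to the pull request when there is one, otherwise to the commit itself.
string ToReleaseNoteLine((string Summary, int? PrNumber, string Sha) entry) =>
    entry.PrNumber is int number
        ? $"- {entry.Summary} (https://github.com/{Owner}/{Repo}/pull/{number})"
        : $"- {entry.Summary} (https://github.com/{Owner}/{Repo}/commit/{entry.Sha})";

// Hypothetical entries; "0123abc" stands in for a real commit SHA.
var entries = new List<(string Summary, int? PrNumber, string Sha)>
{
    ("Add helper class for creating web hooks", 917, "fedcba9"),
    ("Fix the credit format", null, "0123abc"),
};

var lines = entries.Select(ToReleaseNoteLine).ToList();
```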

  1. We're all friends here; you can admit it.
  2. The filtering should be configurable so that we can tailor it to the repository we are processing.
  3. Excluding the last step of filtering by message content.
  4. Perhaps stating the obvious.
  5. https://developer.github.com/v3/pulls/

Octokit and Noise Reduction with Pull Requests

Last time in this series on Octokit, we looked at how to get the commits that have been made between one release and another. Usually, these commits will contain noise such as lazy commit messages ("Fixed it", "Corrected spelling", etc.), merge commits, or commits that formed part of a larger feature change submitted via a pull request. Rather than include all this noise in our release note generation, I want to filter those commits and either remove them entirely or replace them with their associated pull request (which will hopefully be a little less noisy).

Before we filter out the noise, it seems prudent to reduce the number of commits to be filtered by matching them to pull requests. As with commits, we can query pull requests using a specific set of criteria; however, although we can request that the results be sorted a certain way, we cannot specify a date range. To get all the pull requests that were merged before our release, we need to query for all the pull requests and then filter by date locally.

This query can be slow, since we are getting all closed pull requests in the repository. We could speed it up by providing a base branch name in the query criteria. However, to remove as much commit noise as possible, I would like to include pull requests that were merged to branches other than the release branch[1]. We could improve performance by managing a list of active release branches and querying pull requests for each of those branches only, rather than the entire repository, but for now, we will stick with the less optimal approach as it keeps the code examples a little cleaner.

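// Request closed pull requests, most recently updated first
var prRequest = new PullRequestRequest
{
    State = ItemState.Closed,
    SortDirection = SortDirection.Descending,
    SortProperty = PullRequestSort.Updated
};

var pullRequests = await gitHubClient.PullRequest.GetAllForRepository("RepositoryOwner", "RepositoryName", prRequest);

// The API cannot filter by date, so do it locally; closed but unmerged
// pull requests have no MergedAt value and are skipped
var pullRequestsPriorToRelease = pullRequests
    .Where(pr => pr.MergedAt.HasValue && pr.MergedAt.Value < mostRecentTwoReleases[0].CreatedAt);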

Before we can start filtering our commits against the pull requests, we need to get the commits that make up each pull request. When requesting a collection of items (as we did for pull requests), the GitHub API returns just enough information about each item for us to filter and identify the ones we really care about. Before we can use other properties on the items, we have to request additional information. More information about a specific pull request can be obtained using the `Get`, `Commits`, `Files`, and `Merged` calls. The `Get` call returns the same type of object as the `GetAllForRepository` method, except that all the data is now populated instead of just a few select properties; the `Merged` call returns a Boolean value indicating whether the pull request has been merged (equivalent to the `Merged` property populated by `Get`); the `Files` method returns the files changed by that pull request; and the `Commits` method returns the commits.

var commitsForPullRequest = await gitHubClient.PullRequest.Commits("RepositoryOwner", "RepositoryName", pullRequest.Number);

At this point, things are looking pretty good: we can get a list of commits in the release and a list of pull requests that might be in the release. Now, we want to filter that list of commits to remove items that are covered by a pull request. This is easy; we just compare the hashes and remove the matches.

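// commitsInRelease: the commits between the two releases (from the previous post)
// prCommits: the commits of the pull requests merged before the release
var commitsNotInPullRequest = from commit in commitsInRelease
                              join prCommit in prCommits on commit.Sha equals prCommit.Sha into matchedCommits
                              from match in matchedCommits.DefaultIfEmpty()
                              where match == null
                              select commit;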

Using the collection of commits for the latest release, we join the commits from the pull requests using the SHA hash and then select all release commits that have no matching commit in the pull requests[2]. However, we don't want to lose information just because we're losing noise, so we have to keep track of which pull requests were matched so that we can build our release note history. To do that, we will hold off on discarding anything and instead pair each commit in the release with its prospective pull request.
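To see the query shape in isolation, here is a minimal, self-contained sketch that runs the same outer join on plain strings standing in for commit SHAs (the values are made up), alongside an equivalent `HashSet` lookup.

```csharp
using System.Collections.Generic;
using System.Linq;

// Made-up stand-ins for release commit SHAs and pull request commit SHAs.
var releaseShas = new[] { "a1", "b2", "c3", "d4" };
var prShas = new[] { "b2", "d4" };

// Left outer join (join ... into + DefaultIfEmpty), then keep only the rows
// with no match: effectively an anti-join.
var notInPullRequest = (from sha in releaseShas
                        join prSha in prShas on sha equals prSha into matches
                        from match in matches.DefaultIfEmpty()
                        where match == null
                        select sha).ToList();

// Equivalent result using a set lookup.
var prShaSet = new HashSet<string>(prShas);
var notInPullRequestViaSet = releaseShas.Where(sha => !prShaSet.Contains(sha)).ToList();
```

For a plain "not in" filter, the `HashSet` version is arguably easier to read; the join form earns its keep once we want to carry the matched pull request along with each commit.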

Going back to where we had a list of pull requests merged prior to our release, let us revisit getting the commits for those pull requests and, this time, pair them with the commits in the release to retain information.

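// Pair every commit in each pull request with that pull request.
// .Result blocks on each request in turn; simple, but slow for many pull requests.
var commitsFromPullRequests = from pr in pullRequestsPriorToRelease
                              from commit in gitHubClient.PullRequest.Commits("RepositoryOwner", "RepositoryName", pr.Number).Result
                              select new { commit, pr };

// Left outer join: every release commit is kept, with its pull request if one matched
var commitsWithPRs = from commit in commitsInRelease
                     join prCommit in commitsFromPullRequests on commit.Sha equals prCommit.commit.Sha into matchedPrCommits
                     from matchedPrCommit in matchedPrCommits.DefaultIfEmpty()
                     select new
                     {
                         PullRequest = matchedPrCommit?.pr,
                         Commit = commit
                     };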

Now we have a list of commits, each paired with its parent pull request if there is one. Using this, we can build a more meaningful set of changes for a release. If I run this on the latest release of the Octokit.NET repository and then group the commits by their paired pull request, I can see that the original list of 135 commits would be reduced to just 58 if each commit that belonged to a pull request were bundled into a single entry.
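The grouping step can be sketched like this, assuming the commit and pull request pairs have been reduced to a commit summary and an optional pull request number (the sample messages come from the Octokit.NET history; the tuple shape is an illustrative simplification of the anonymous type above).

```csharp
using System.Collections.Generic;
using System.Linq;

// Each pair is (commit summary, number of the pull request it came from, or null).
var commitsWithPrs = new List<(string Commit, int? PullRequest)>
{
    ("Add unit tests", 917),
    ("Fix up XML comments as per PR review", 917),
    ("Add helper class to create a web hook.", 917),
    ("Fix the credit format", null),
    ("Release notes for release 0.17.0", null),
};

// Commits belonging to a pull request collapse into one entry per pull request;
// commits without a pull request each remain an entry of their own.
var entries = commitsWithPrs
    .Where(c => c.PullRequest.HasValue)
    .GroupBy(c => c.PullRequest.Value)
    .Select(g => $"PR #{g.Key} ({g.Count()} commits)")
    .Concat(commitsWithPrs.Where(c => !c.PullRequest.HasValue).Select(c => c.Commit))
    .ToList();
```

Here five commits become three entries; applied to the whole release, the same idea gives the reduction from 135 to 58.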

Next, we need to process the commits to remove those representing merges and other noise. That is a topic for the next post in this series, where we will perhaps take stock and see whether this effort has produced more meaningful release notes. Until then, thanks for reading and don't forget to leave a comment.

  1. Often, changes are merged forward from one branch to another, especially if there are multiple release branches to support patch development and such.
  2. The `join ... into` in this example performs a left outer join: the `into` clause groups the matches for each release commit, and `DefaultIfEmpty()` supplies a single `null` element when there was nothing to join, so unmatched commits are kept and can be picked out with `where match == null`.

Octokit and the Authenticated Access

Last week, I introduced Octokit and my plans to write a tool that will mine our GitHub repositories for information that can be used to craft release notes. This week, we will look at the first step: authentication. I am using Octokit.NET for my hackery; if you choose to use another variant of Octokit, some of the types and methods available may be different, but you should be able to follow along. In addition, I have no intention of documenting every aspect of Octokit and the GitHub API, so if you are intrigued by anything that I do not discuss, I encourage you to explore the relevant documentation.

The main `GitHubClient` class, used to access the GitHub APIs, has several constructors, some that take credentials (sort of) and some that do not. All but one of the constructors take a `ProductHeaderValue` instance, which provides some basic information about the application that is accessing the API. According to the documentation, this information is used by GitHub for analytics purposes and can be whatever you want.

Now, if you only want to read information about publicly accessible repositories, you do not need to provide any authentication at all. You can create a client instance and just get stuck in, like this:

var githubClient = new GitHubClient(new ProductHeaderValue("Tinkering"));
var repo = await githubClient.Repository.Get("octokit", "octokit.net" );
Console.WriteLine(repo.Name);

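However, unauthenticated access only lets you perform some read-only tasks on public repositories and, unless you are performing the most trivial of tasks, you will quickly hit the rate limit for unauthenticated requests (60 requests per hour, compared with 5,000 per hour when authenticated).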

NOTE: All of the Octokit.NET calls are awaitable

Authentication can be achieved in several ways: via an implementation of `ICredentialStore` passed to a constructor of `GitHubClient`, by setting the `GitHubClient.Connection.Credentials` property, or by using the `GitHubClient.Oauth` client. The OAuth API allows an application to authenticate without ever having access to a user's credentials; it is understandably a little more complex than the approaches that just take credentials. Since, at this point, our focus is on crafting some methods for extending the API functionality, we will worry about the OAuth workflow another time. The other two approaches are quite similar, although the constructor-based approach requires a little extra effort. The following two examples will both give you authenticated access, though I think the constructor-based approach feels a little less hacky:

// Without the constructor
var githubClient = new GitHubClient(new ProductHeaderValue("tinkering"));
githubClient.Connection.Credentials = new Credentials("username", "password");

// With the constructor
public class CredentialsStore : ICredentialStore
{
    public Task<Credentials> GetCredentials()
    {
        // Nothing asynchronous happens here, so return an already-completed task
        return Task.FromResult(new Credentials("username", "password"));
    }
}

var githubClient = new GitHubClient(new ProductHeaderValue("tinkering"), new CredentialsStore());
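If you would rather not write your own store, Octokit.NET also ships `InMemoryCredentialStore` (in the `Octokit.Internal` namespace), which simply hands back whatever `Credentials` it was given. A minimal sketch, assuming placeholder credentials; `StoreBackedClient` is a hypothetical helper name:

```csharp
using Octokit;
using Octokit.Internal;

public static class StoreBackedClient
{
    // Wraps fixed credentials in Octokit's built-in in-memory store and passes
    // the store to the GitHubClient constructor. The login/password are placeholders.
    public static GitHubClient Create(string login, string password)
    {
        var store = new InMemoryCredentialStore(new Credentials(login, password));
        return new GitHubClient(new ProductHeaderValue("tinkering"), store);
    }
}
```

This keeps the constructor-based style without the extra class; a custom `ICredentialStore` is still the better fit if the credentials come from somewhere dynamic.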

Two-factor Authentication

Of course, using your username and password is futile because you have two-factor authentication enabled1. Luckily, there is a constructor on the `Credentials` class that takes a token instead, which you can generate on GitHub.

First, log in to your GitHub account and choose Settings from the drop-down at the upper-right. On the left, select Personal Access Tokens.

The right-hand side will change to show the list of personal access tokens you have already created for your account (you may have created these yourself, or an application may have created them via OAuth). Click the Generate New Token button, give it a useful name, and select the scopes it needs (GitHub only shows the token once, so copy it before leaving the page). You can now use this token as your credentials when using Octokit. I keep my token in the LINQPad password manager2 so that I can reference it in my code using the name I gave it, like this:

Util.GetPassword("the.name.I.gave.my.oauth.token")
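Putting that together, the token goes to the single-argument `Credentials` constructor, which Octokit.NET treats as token (OAuth) authentication. The sketch below reads the token from an environment variable rather than the LINQPad password manager so that it runs outside LINQPad; the variable name `GITHUB_TOKEN` and the `TokenClient` helper are assumptions, and in LINQPad you could pass the result of `Util.GetPassword(...)` instead.

```csharp
using System;
using Octokit;

public static class TokenClient
{
    // Builds an authenticated client from a personal access token.
    // GITHUB_TOKEN is an assumed variable name; use whatever suits your setup.
    public static GitHubClient FromEnvironment(string variableName = "GITHUB_TOKEN")
    {
        string token = Environment.GetEnvironmentVariable(variableName);
        if (string.IsNullOrEmpty(token))
        {
            throw new InvalidOperationException($"Environment variable {variableName} is not set.");
        }

        var client = new GitHubClient(new ProductHeaderValue("tinkering"));
        // The one-argument Credentials constructor marks these as token (OAuth) credentials.
        client.Credentials = new Credentials(token);
        return client;
    }
}
```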

In conclusion…

And that is it for this week. In the next entry of this series on Octokit, we will start getting to grips with releases and some of the basic pieces for my release note utility library.

  1. If you do not, you should rectify that.
  2. The LINQPad password manager is available via the File menu in LINQPad.

Octokit and the Documentation Nightmare

Before I get into the meat of this series of posts, I would like to set the scene. Like many organisations that perform some level of software development these days, we use GitHub. Here at CareEvolution, some developers use the web interface extensively, some use the command line, and others use the GitHub desktop client1, but most use a combination of two or more, depending on the task. This works great for developers, who have each found a comfortable workflow for getting things done, but it is not so great for those involved with DevOps, QA, or documentation, who need user-friendly details of what the developers did. Quite often, a feature or bug fix involves several commits, and while each has a comment or two, and perhaps an associated pull request (PR) or issue has a general description, there is no definitive list of "this is what release X contains" that can be presented to a customer. Not only that, but sometimes a PR or issue is resolved in an earlier release and merged forward. While we have lists of what a release is going to include, quite often there is more detail that we would like to include, and we often have additional changes as we adapt to the changing requirements of our customers. All this means that one or more people end up trawling the commits, trying to determine what the changes are. It is not a happy task.

"There is nothing more difficult to take in hand, more perilous to conduct, or more uncertain in its success, than to take the lead in the introduction of a new order of things."

Niccolo Machiavelli
The Prince (1532)

Now, I know that this could all be avoided if people documented changes more clearly, perhaps added release notes to commits, raised issues for documentation changes, or created release notes on the release when it is made. However, no matter how noble change may be, anyone who has worked in process definition for any length of time will know that changing the behaviour of people is the hardest task of all, and therefore it should be avoided unless absolutely necessary. It was with that in mind that I decided that mining the existing data for information would be an easier first step than jumping straight to asking people to change. So, with the aim of making life a little easier, I started looking at ways to automate the trawling.

I figured that by throwing out noisy, non-descriptive commits typical of developers, like "fixed spelling" or "updated comment", and by combining commits under the corresponding PR or issue, I could create a useful summary of changes. This would not be customer-ready, but it would be ready for someone to turn into a release note without needing to trawl git history. In fact, if I included details of who committed the changes, it might even provide a feedback loop that would improve the quality of developer commit messages; developers do not like interruptions, so anyone asking for more detail on a commit they made should reinforce that if they wrote better commit messages, PRs, and issues, they would get fewer interruptions.
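The filter-then-group idea can be sketched in plain C#, without touching GitHub at all. This is only an illustration of the approach, not the tool itself: the `CommitSummary` record, the `ReleaseNoteDraft` helper, the noise phrases (taken from the examples above), and the assumption that each commit already knows its PR number are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape for a commit: its SHA, message, and (if known) the PR it belongs to.
public record CommitSummary(string Sha, string Message, int? PullRequest);

public static class ReleaseNoteDraft
{
    // Hypothetical examples of low-value messages to discard.
    static readonly string[] NoisePhrases = { "fixed spelling", "updated comment" };

    static bool IsNoise(string message) =>
        NoisePhrases.Any(p => message.Contains(p, StringComparison.OrdinalIgnoreCase));

    // Drops noisy commits, then groups the remaining messages by PR.
    // Commits with no known PR are collected under "(no PR)".
    public static Dictionary<string, List<string>> Summarize(IEnumerable<CommitSummary> commits) =>
        commits
            .Where(c => !IsNoise(c.Message))
            .GroupBy(c => c.PullRequest is int pr ? $"PR #{pr}" : "(no PR)")
            .ToDictionary(g => g.Key, g => g.Select(c => c.Message).ToList());
}
```

A string key is used for the groups because `Dictionary` rejects a null key, which rules out keying directly on `int?`.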

Octokitty2

After dismissing the idea of using git locally to perform this task (I figured those who might need this tool would probably not want to get the repository locally) and reading up on the GitHub API a little, I cracked open LINQPad (my tool of choice for hacking) and went looking for a NuGet package to help. It was during that search that I happily stumbled on Octokit, the official GitHub library for interacting with the GitHub API. At the time of writing, Octokit reflects the polyglot nature of GitHub users, providing variants for Ruby, .NET, and Objective-C, as well as experimental versions for Python and Go. I installed the Octokit NuGet package into LINQPad and started hacking (there is also a reactive version for `IObservable` fans).

Poking around the various objects and reading some documentation on GitHub (Octokit is open source), I got a feel for how the library wrapped the APIs. Though I had not yet got any code running, I was making progress. Confident that this would enable me to create the tool I wanted, I started writing some code to gather a list of releases for a specific repository and stumbled over my first hurdle: authentication. It turns out it is not quite as straightforward as I thought (the days of username and password are quite rightly behind us3), and so my adventure began.

And then…

This is a good place to stop for this week, I think. As the series progresses, I will be piecing together the various parts of my "release note guidance" tool and, hopefully, end up with a .NET library to augment Octokit with some useful history mining functionality. Next time, we will take a look at authentication with Octokit (and there will be code).

  1. OSX and Windows variants
  2. or, James Bond for kids
  3. OK, that's a lie, but I want to encourage good behaviour