Octokit, Merge Commits, and the Story So Far

In the last post we had reduced our commits by matching them against pull requests; next, we can look for noise in the commit message content itself. Although I have been using the Octokit.NET repository as the target for testing with its low noise, high quality commit messages, we can envisage a less consistent repository that has some noisy commits. For example, how often have you seen or written commit messages like "Fixed spelling", "Fixed bug", or "Stuff"1?

How we detect these noisy commits is important; if our filtering is too simple, we remove too many things and if it is too strict, we remove too few. Rather than go deep into one specific implementation, I just want to introduce the idea of filtering based on message content. In the long term, I think it would be interesting to apply learning algorithms,  but I'm sure some simple, configurable pattern matching should suffice2.

If I run the filtering I have described so far3 on the Octokit.NET latest release, this is what we get:

The value of this is clearer if we see the commit list before processing:

The work so far has reduced a list of 135 commits down to 58, and so far, it looks like we have not lost any really useful "release note"-worthy information. However, the eagle-eyed among you may noticed that our 58 messages contain duplicate information. This is because each pull request is listed twice; once for the pull request title I inserted in place of its individual commits, and again for the merge commit that merged that pull request. These merge commits are not filtered out because they do not belong to the commits inside the pull request. Instead, they are an artifact of merging the pull request4.

At first, I thought the handy MergeCommitSha property of the pull request would help, but it turns out this refers to a test merge and is to be deprecated5. Instead, I realised that the messages I wanted to remove all had "Merge pull request #" in them, followed by the pull request number. This seems like a perfect use case for our pattern matching filtering. Since we have the pull requests, we could use their numbers to match each merge message exactly, but I decided to do the simpler thing of excluding any message starting with "Merge pull request #".

Filtering for messages that begin with "Merge pull request #" gives us a shortlist of just 31 messages:

I think this is a pretty good improvement over the raw commit list. Combining this list with links back to the relevant commits and pull requests should enable someone to discern the content of a release note much faster than using the raw commit list alone. I will leave that as an exercise or perhaps a future post. As always, thanks for reading. If you find yourself using Octokit to trawl your own repositories for release note information, I would love to hear about it in the comments.

  1. We're all friends here, you can admit it 

  2. The filtering should be configurable so that we can tailor it to the repository we are processing 

  3. excluding the last step of filtering by message content 

  4. Perhaps stating the obvious 

  5. https://developer.github.com/v3/pulls/