🤖 Setting up Dependabot with GitHub Actions to approve and merge

Hello, everyone! It has been a long time. As the pandemic got into full swing last year, my bandwidth for cognitive tasks took a big hit and some things fell by the wayside, such as this blog. I did not intend to ghost y'all like that; alas, here we are. Perhaps I'll write more on that another time; for now, let's jump into something that I was recently tackling¹.

What is Dependabot?

Dependabot is a handy tool, now owned by and incorporated into GitHub, that monitors your repository's dependencies and updates them – a chore that many maintainers find laborious. It is not the only tool available for this task, but it is the one many will use if they are already on GitHub, since it is easy to set up and manage.

When Dependabot identifies a package you use that can be updated (based on your semver rule in package.json, for example), or it knows of a security issue in a package you use or in one referenced by a package you use, Dependabot can create a PR (pull request) to update that package. It is pretty smart and will only ever have one PR open per package; if multiple releases occur, it will close out any pending PR for that package and make a new one. It is a really handy tool with a variety of configuration options; I'm not going to delve into those here – you can read about them in the official documentation.

When Dependabot was in preview, it would create the PR, wait for all relevant checks to pass – such as the linting and testing in your CI (continuous integration) process – and then auto-merge the change into the branch you configured it to target. However, this is a security issue, especially if you also have CD (continuous deployment) set up: a malicious package could be published, causing Dependabot to trigger a new deployment, which could then propagate that malicious package to all your package users, and their users, and so on. Quite rightly, the decision was made to take that feature out of Dependabot and leave it to each individual use-case to decide what to do with the PRs it creates. However, this then led to a new problem – managing the PRs.

Dependabot can limit how many open PRs it creates, which is helpful, but those PRs still require manual intervention. If you have rules like "every PR needs at least X reviewers", then it can quickly become a chore almost as annoying as the one it was meant to address.

So what to do? Auto-merge is a potential security issue, not auto-merging is a time sap.

Do not enable auto-merging on a branch used for CD. It really is a bad idea and a big security risk. Packages do get hacked sometimes, and tidying up after a malicious publish is not easy. To avoid security issues in your releases, make sure that there is a manual QA (quality assurance) step somewhere between any automated update and an actual release of the updated code. For example, you could do this by having Dependabot operate on a branch that is a copy of the CD branch and then having a process for merging the Dependabot updates across to your main branch before a release.
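
As a minimal sketch of that idea, a .github/dependabot.yml can point Dependabot at a staging branch instead of the CD branch (the npm ecosystem and the dependabot-staging branch name are assumptions for illustration):

version: 2
updates:
  - package-ecosystem: "npm"
    directory: "/"
    schedule:
      interval: "weekly"
    # Hypothetical copy of the CD branch; updates land here for manual QA
    # before being merged across to the release branch.
    target-branch: "dependabot-staging"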

💡 Check out CodeQL, another GitHub feature, if you want to add some automated vulnerability checking to your repository.

For the remainder of this entry, we will assume that you are using Dependabot on a main branch that is not continuously deployed. However, just as with licensing, it is ultimately your responsibility to make sure the code you release, including its dependencies, does not introduce vulnerabilities, so be sure to consider your specific scenario before enabling things like Dependabot and auto-merging of the changes it makes.

What are GitHub Actions?

GitHub Actions are GitHub's approach to supporting the kinds of tasks that have traditionally been performed by CI and CD platforms like Travis, CircleCI, and Jenkins. Using a combination of YAML configurations and scripts referred to as actions, you can build workflows that perform all kinds of automated processes from running tests to managing your GitHub repository issues. They are incredibly powerful (and therefore, should be used responsibly).

Many first- and third-party actions exist to help you build your workflows. I have used actions to run tests across multiple platforms, update code coverage stats in CodeCov, and, most recently, help manage Dependabot PRs. In addition, GitHub Actions have access to the incredibly powerful gh CLI tool.

💡 Check out the documentation on GitHub Actions regarding security hardening to learn how to use GitHub Actions more securely.

GitHub Actions are free for public repositories; see the GitHub Actions documentation for more information, including pricing for private repository usage.

Setting Things Up

1. Your GitHub Repository Settings

Before you set up the merging workflow, you need to make a few changes to your GitHub repository.

Auto-merge

[Screenshot: The Settings tab of a repository on GitHub with the Options section selected]
[Screenshot: The "Allow auto-merge" repository setting – a checked "Allow auto-merge" checkbox with the description "Waits for merge requirements to be met and then merges automatically."]

First, go to your repository Settings tab and, under the Options section, ensure that Allow auto-merge is checked. This does not make every PR auto-merge, but it does allow specific PRs to be set to auto-merge – this will be important.

Status Checks

If you don't have status checks enabled for your repository, then a PR can be merged without any reviews or code quality checks occurring. I highly recommend setting up status checks, as they ensure at least some level of code quality assurance before your code, or anyone else's, is merged.

For the purposes of this discussion, it is assumed that you have set your repository to require at least one review per PR before it can be merged, and at least one non-instant code quality check (such as a test run, or lint check).

Status checks are mandated for PRs to specific branches by using branch protection rules. These are configured in your repository's Settings under Branches. In the following screenshot, the main branch (the default branch) has branch protection rules applied. Branch protection rules can be applied to a specific branch, or to a range of branches by using a selector like feature/*.

[Screenshot: The Branches section of a GitHub repository with rules applied to the main default branch]

If you add rules for a branch (or set of branches) or edit an existing rule, you can specify all sorts of measures to control when code is suitable for merging in the branches that match that rule. In the following screen, the rule has been configured such that code can only be merged when:

  • It comes from a PR
  • It has at least one approving reviewer
  • It is up-to-date with the target branch
  • The codecov/project status check has passed
[Screenshot: A subset of the rules one can apply to protect your branches in GitHub]

Why at least one non-instant quality check?

The auto-merge setting for GitHub PRs is only useful for PRs that are not already passing all status checks. I do not know if this is still the case, but at one time the command we are going to use to tell GitHub to auto-merge the PR would fail if the PR was already in a mergeable state. If you want to auto-merge PRs that are already mergeable when our new workflow runs, you will need to call a different command; that is left as an exercise for the reader.

2. Dependabot

You will need to enable Dependabot on your repository. Follow GitHub's instructions to set it up how you want it. This blog assumes the defaults, but you should be able to make it work with other configurations.
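
If you do move beyond the defaults, the configuration lives in .github/dependabot.yml. As a sketch, this assumes an npm project and also caps how many update PRs Dependabot keeps open at once (the cap of 3 is an arbitrary choice):

version: 2
updates:
  - package-ecosystem: "npm"
    directory: "/"
    schedule:
      interval: "daily"
    # Arbitrary cap on simultaneous open update PRs (the default is 5).
    open-pull-requests-limit: 3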

3. GitHub Actions

With Dependabot in place (and probably already creating PRs for you) and your status checks running, we can now set up our automation.

There are two things we need our automation to do.

  1. We need it to approve the PR, since we have mandated at least one reviewer before code is allowed to merge.
  2. We need to enable auto-merge for the PR so that it will merge once our status checks are completed.

To add a GitHub Actions workflow, all you need to do is add a YAML file describing the workflow to the .github/workflows folder of your repository. Each YAML file describes a specific workflow, including what triggers the workflow, what permissions it has, and the jobs that it performs. Triggers can be specific events in your repository (such as creating a PR or raising an issue), webhooks, on a specific schedule such as once a week, or even via events fired from another workflow.

Let's take a look at the workflow for our approve and auto-merge workflow, and then we can discuss some of the important pieces. Since this isn't a deep dive into GitHub Actions, I will skim over some of the details to get to the pertinent info.

name: Dependabot Pull Request Approve and Merge

on: pull_request_target

permissions:
  pull-requests: write
  contents: write

jobs:
  dependabot:
    runs-on: ubuntu-latest
    # Checking the actor will prevent your Action run failing on non-Dependabot
    # PRs but also ensures that it only does work for Dependabot PRs.
    if: ${{ github.actor == 'dependabot[bot]' }}
    steps:
      # This first step will fail if there's no metadata and so the approval
      # will not occur.
      - name: Dependabot metadata
        id: dependabot-metadata
        uses: dependabot/fetch-metadata@v1.1.1
        with:
          github-token: "${{ secrets.GITHUB_TOKEN }}"
      # Here the PR gets approved.
      - name: Approve a PR
        run: gh pr review --approve "$PR_URL"
        env:
          PR_URL: ${{ github.event.pull_request.html_url }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      # Finally, this sets the PR to allow auto-merging for patch and minor
      # updates if all checks pass
      - name: Enable auto-merge for Dependabot PRs
        if: ${{ steps.dependabot-metadata.outputs.update-type != 'version-update:semver-major' }}
        run: gh pr merge --auto --squash "$PR_URL"
        env:
          PR_URL: ${{ github.event.pull_request.html_url }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

There is a bit to unpack there, so let's go over it.

name: Dependabot Pull Request Approve and Merge

First, we have the name of the workflow, which is "Dependabot Pull Request Approve and Merge". This will be shown in the GitHub user interface when referring to your workflow.

on: pull_request_target

Next, we have the triggers. In this case, we have just one: pull_request_target. This trigger should rarelyly be used and, when it is, used with care, as it provides a read/write access token. We need it because it allows us to perform tasks that update our PR. Each trigger has specific activity types if you need to narrow down exactly when your workflow runs; pull_request_target defaults to opened, reopened, and synchronize, which means our workflow will trigger when a PR is opened, updated, or reopened. For more information on this trigger and its activity types, see the GitHub documentation; also check out this blog on the security implications of misusing this trigger.
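
If you only want the workflow to run for a subset of those activity types, you can narrow the trigger. For example, this sketch fires only when a PR is opened or updated, not when it is reopened:

on:
  pull_request_target:
    # Narrow the default activity types (opened, reopened, synchronize).
    types: [opened, synchronize]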

permissions:
  pull-requests: write
  contents: write

After specifying the trigger for the workflow, we specify the scope of permissions we are granting the workflow. Every workflow has a secret available, GITHUB_TOKEN, which is used to authenticate the actions that the workflow wants to perform. Each trigger type has a restricted level of permissions, and while we cannot elevate permissions outside of those restrictions, we can control the scope of permissions allowed within the restrictions.

In our case, we need write access to the pull requests so that we can modify the PR itself, and we need write access to the repository contents because we need to be able to request merging. Even though setting a PR to auto-merge may seem like we are just editing the PR, because it results in the code getting merged, we have to make sure we have permission to do that future merge too.

jobs:
  dependabot:
    runs-on: ubuntu-latest
    # Checking the actor will prevent your Action run failing on non-Dependabot
    # PRs but also ensures that it only does work for Dependabot PRs.
    if: ${{ github.actor == 'dependabot[bot]' }}
    steps:
      # This first step will fail if there's no metadata and so the approval
      # will not occur.
      - name: Dependabot metadata
        id: dependabot-metadata
        uses: dependabot/fetch-metadata@v1.1.1
        with:
          github-token: "${{ secrets.GITHUB_TOKEN }}"
      # Here the PR gets approved.
      - name: Approve a PR
        run: gh pr review --approve "$PR_URL"
        env:
          PR_URL: ${{ github.event.pull_request.html_url }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      # Finally, this sets the PR to allow auto-merging for patch and minor
      # updates if all checks pass
      - name: Enable auto-merge for Dependabot PRs
        if: ${{ steps.dependabot-metadata.outputs.update-type != 'version-update:semver-major' }}
        run: gh pr merge --auto --squash "$PR_URL"
        env:
          PR_URL: ${{ github.event.pull_request.html_url }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

At the end of the file, we have the jobs themselves. In this case, we have a single job named dependabot. This job runs on an instance of the latest Ubuntu image, as specified by runs-on: ubuntu-latest. GitHub Actions support a range of operating systems and versions, and you can even configure a job to run on a matrix of these things, but we do not need that fanciness – the Ubuntu images tend to be the cheapest and the fastest, so that is what we are using.
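
For comparison, a matrix job looks something like this sketch (actions/checkout is a real action; npm test is a placeholder for whatever your test command is); we do not need it for this workflow, but it shows the fanciness we are skipping:

jobs:
  test:
    strategy:
      matrix:
        os: [ubuntu-latest, windows-latest, macos-latest]
    # Run the same steps once per operating system in the matrix.
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v2
      # Placeholder test command; substitute your own.
      - run: npm test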

We control when the job runs with a condition: if: ${{ github.actor == 'dependabot[bot]' }}. This means that if the PR was created by some entity other than dependabot[bot], we won't do anything, preventing us from auto-approving other folks' code contributions.

Finally, we describe the steps in the job. In this case there are three steps:

  1. name: Dependabot metadata
    This step uses an action from Dependabot that gets us information about the update.
  2. name: Approve a PR
    This step performs the review approval. We do this using the awesome gh CLI.
  3. name: Enable auto-merge for Dependabot PRs
    This step sets the PR to auto-merge using a squash merge strategy. You could change this strategy to whatever you prefer – possibly omitting it if you want the repository default to be used (see the sketch after this list).
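
For reference, gh pr merge exposes the three usual strategies as flags, so changing strategy is a one-flag change:

# Squash merge (what the workflow above uses).
gh pr merge --auto --squash "$PR_URL"
# Use a merge commit instead.
gh pr merge --auto --merge "$PR_URL"
# Rebase the commits instead.
gh pr merge --auto --rebase "$PR_URL"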

Versions

You may have noticed that the last step in our job has a condition:

if: ${{ steps.dependabot-metadata.outputs.update-type != 'version-update:semver-major' }}

This is why we have the "Dependabot metadata" step. In this condition, we use that metadata to ensure that we only allow auto-merging of minor- and patch-level updates. After all, a major version change is likely a breaking change, and we don't want to automatically include those, even if they do pass our status checks. So, this condition ensures that major updates are left as open PRs for manual verification before merging.

If you decide not to merge these PRs, you can tell Dependabot exactly how to handle that dependency in future, even preventing it from suggesting major updates to that package again.
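
For example, here is a sketch of a dependabot.yml ignore rule that stops Dependabot from suggesting major updates for a given package (the lodash package name is purely illustrative):

version: 2
updates:
  - package-ecosystem: "npm"
    directory: "/"
    schedule:
      interval: "weekly"
    ignore:
      # Hypothetical package: never suggest major version bumps for it.
      - dependency-name: "lodash"
        update-types: ["version-update:semver-major"]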

Conclusion

This was a long post, but hopefully a useful one. I want to acknowledge that I did not come up with this all on my own. My strategy here was built from the excellent GitHub documentation that also goes into detail about workflows and Dependabot.

Thanks for reading. If you have feedback or questions, don't forget to leave a comment.

It's nice to be back. 🙂

  1. Don't worry, I haven't forgotten about our series on React server-side rendering – I'll get back into that soon

Signing GitHub Commits With A Passphrase-protected Key and GPG2

GitHub recently added support for signed commits. The instructions for setting it up can be found on their website and I do not intend to rehash them here. I followed those instructions and they work splendidly. However, when I set mine up, I had used the version of GPG that came with my Git installation. A side effect I noticed was that if I were rebasing some code and wanted to make sure the rebased commits were still signed (by running git rebase with the -S option), I would have to enter my passphrase for the GPG key for every commit (which gets a little tedious after the first five or so).

[Screenshot: How GitHub shows signed commits – a Verified indicator next to each signed commit]

Now, there are a couple of ways to fix this. One is easy: just don't use a passphrase-protected key. Of course, that would make it a lot easier for someone to sign commits as me if they got my key file, so I decided that probably was not the best option. Instead, I did a little searching and found that GPG2 supports passphrase-protected keys a little better than the version of GPG I had installed as part of my original Git installation.

Using the GPG4Win website, I installed the Vanilla version¹. I then had to export the key I had already set up with GitHub from my old GPG and import it into the new one. Using gpg --list-keys, I obtained the 8-character ID for my key (the bit that reads BAADF00D in this example output):

gpg: WARNING: using insecure memory!
gpg: please see http://www.gnupg.org/documentation/faqs.html for more information
/c/Users/Jeff/.gnupg/pubring.gpg
--------------------------------
pub   4096R/BAADF00D 2016-04-07
uid                  Jeff Yates <jeff.yates@example.com>
sub   4096R/DEADBEEF 2016-04-07

Which I then used to export my keys from a Git prompt:

gpg -a --export-secret-keys BAADF00D > privatekey.txt
gpg -a --export BAADF00D > publickey.txt

This gave me two files (privatekey.txt and publickey.txt) containing text representations of the private and public keys.

Using a shell in the GPG2 pub folder ("C:\Program Files (x86)\GNU\GnuPG\pub"), I then verified them (always a good practice, especially if you got the key from someone else) before importing them²:

> gpg privatekey.txt

And rather than give me details of the key, it showed me this error:

gpg: no valid OpenPGP data found.
gpg: processing message failed: Unknown system error

What was going on? I tried verifying it with the old GPG and it gave me a different but similar error:

gpg: WARNING: using insecure memory!
gpg: please see http://www.gnupg.org/documentation/faqs.html for more information
gpg: no valid OpenPGP data found.
gpg: processing message failed: eof

I tried the public key export and it too gave these errors. It did not make a whole heap of sense. Trying to get to the bottom of it, I opened the key files in Visual Studio Code. Everything looked fine until I saw this at the bottom of the screen.

[Screenshot: Encoding information from Visual Studio Code showing UTF-16]

It turns out that PowerShell writes redirected output as UTF-16, and I had not bothered to check. Thinking this might be the problem, I re-saved each file as UTF-8 and tried verifying privatekey.txt again:

sec  4096R/BAADF00D 2016-04-07
uid                            Jeff Yates <jeff.yates@example.com>
ssb  4096R/DEADBEEF 2016-04-07

Success! Repeating this for the publickey.txt file gave the exact same information. With the keys verified, I was ready to import them into GPG2:

> gpg --import publickey.txt
gpg: WARNING: using insecure memory!
gpg: please see http://www.gnupg.org/documentation/faqs.html for more information
gpg: key BAADF00D: public key "Jeff Yates <jeff.yates@example.com>" imported
gpg: Total number processed: 1
gpg:               imported: 1  (RSA: 1)
> gpg --import privatekey.txt
gpg: WARNING: using insecure memory!
gpg: please see http://www.gnupg.org/documentation/faqs.html for more information
gpg: key BAADF00D: secret key imported
gpg: key BAADF00D: "Jeff Yates <jeff.yates@example.com>" not changed
gpg: Total number processed: 1
gpg:              unchanged: 1
gpg:       secret keys read: 1
gpg:   secret keys imported: 1

With the keys imported, I ran gpg --list-keys to verify they were there and then made sure to delete the text files.
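
Incidentally, if you hit the same UTF-16 problem, you can sidestep the re-saving step by forcing the encoding at export time. For example, in PowerShell (armored keys are plain ASCII, so the ASCII encoding is safe here):

# Write the armored keys as ASCII instead of PowerShell's default UTF-16.
gpg -a --export-secret-keys BAADF00D | Out-File -Encoding ascii privatekey.txt
gpg -a --export BAADF00D | Out-File -Encoding ascii publickey.txt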

Finally, to make sure that Git used the new GPG2 instead of the version of GPG that it came with, I edited my Git configuration:

> git config --global gpg.program "C:\Program Files (x86)\GNU\GnuPG\pub\gpg.exe"

Now, when I sign commits and rebases, instead of needing to enter my passphrase for each commit, I am prompted for the passphrase once. Lovely.

  1. You could also look at installing the command line tools from https://www.gnupg.org/download/ though I do not know if the results will be the same
  2. Note that I am not showing the path to the file here for the sake of brevity, though I am sure you get the idea that you'll need to provide it

Octokit and Noise Reduction with Pull Requests

Last time in this series on Octokit, we looked at how to get the commits made between one release and another. Usually, these commits contain noise such as lazy commit messages and merge flog ("Fixed it", "Corrected spelling", etc.), merge commits, or commits that formed part of a larger feature change submitted via pull request. Rather than include all this noise in our release note generation, I want to filter those commits and either remove them entirely or replace them with their associated pull request (which, hopefully, will be a little less noisy).

Before we filter out the noise, it seems prudent to reduce the commits to be filtered by matching them to pull requests. As with commits, we can query pull requests using a specific set of criteria; however, though we can request the results be sorted a certain way, we cannot specify a date range. To get all the pull requests that were merged before our release, we need to query for all the pull requests and then filter by date locally.

This query can be slow, since we are getting all closed pull requests in the repository. We could speed it up by providing a base branch name in the query criteria; however, to remove as much commit noise as possible, I would like to include pull requests that were merged to a branch other than the release branch¹. We could make things more performant by maintaining a list of active release branches and querying pull requests for only those branches rather than the entire repository, but for now, we will stick with the less optimal approach as it keeps the code examples a little cleaner.

var prRequest = new PullRequestRequest
{
    State = ItemState.Closed,
    SortDirection = SortDirection.Descending,
    SortProperty = PullRequestSort.Updated
};

var pullRequests = await gitHubClient.PullRequest.GetAllForRepository("RepositoryOwner", "RepositoryName", prRequest);
var pullRequestsPriorToRelease = pullRequests
    .Where(pr => pr.MergedAt < mostRecentTwoReleases[0].CreatedAt);

Before we can start filtering our commits against the pull requests, we need to get the commits that comprise each pull request. When requesting a collection of items (as we did for pull requests), the GitHub API returns just enough information about each item so that we can filter and identify the ones we really care about. Before we can do things with the other properties on the items, we have to request additional information. More information about a specific pull request can be obtained using the `Get`, `Commits`, `Files`, and `Merged` calls. The `Get` call returns the same type of object as the `GetAllForRepository` method, except that all the data is now populated instead of just a few select properties; the `Merged` call returns a Boolean value indicating whether the PR has been merged (equivalent to the `Merged` property populated by `Get`); the `Files` method returns the files changed by that pull request; and the `Commits` method returns the commits.

var commitsForPullRequest = await gitHubClient.PullRequest.Commits("RepositoryOwner", "RepositoryName", pullRequest.Number);
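
The other calls follow the same pattern; a quick sketch, using the same placeholder owner and repository names as above:

// All properties populated, unlike the items from GetAllForRepository.
var fullPullRequest = await gitHubClient.PullRequest.Get("RepositoryOwner", "RepositoryName", pullRequest.Number);

// True if the pull request has been merged.
var isMerged = await gitHubClient.PullRequest.Merged("RepositoryOwner", "RepositoryName", pullRequest.Number);

// The files changed by the pull request.
var files = await gitHubClient.PullRequest.Files("RepositoryOwner", "RepositoryName", pullRequest.Number);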

At this point, things are looking pretty good: we can get a list of commits in the release and a list of pull requests that might be in the release. Now, we want to filter that list of commits to remove items that are covered by a pull request. This is easy; we just compare the hashes and remove the matches.

var commitsNotInPullRequest = from commit in commitsInRelease
                              join prCommit in prCommits on commit.Sha equals prCommit.Sha into matchedCommits
                              from match in matchedCommits.DefaultIfEmpty()
                              where match == null
                              select commit;

Using the collection of commits for the latest release, we join the commits from the pull requests using the SHA hash and then select all release commits that have no matching commit in the pull requests². However, we don't want to lose information just because we're losing noise, so we have to maintain a list of the pull requests that were matched so that we can build our release note history. To keep track, we will hold off on discarding any information by pairing up commits in the release with their prospective pull requests instead of just dropping them.

Going back to where we had a list of pull requests merged prior to our release, let us revisit getting the commits for those pull requests and this time, pairing them with the commits in the release to retain information.

var commitsFromPullRequests = from pr in pullRequestsPriorToRelease
                              from commit in github.PullRequest.Commits("RepositoryOwner", "RepositoryName", pr.Number).Result
                              select new {commit,pr};

var commitsWithPRs = from commit in commitsInRelease
                     join prCommit in commitsFromPullRequests on commit.Sha equals prCommit.commit.Sha into matchedPrCommits
                     from matchedPrCommit in matchedPrCommits.DefaultIfEmpty()
                     select new
                     {
                         PullRequest = matchedPrCommit?.pr,
                         Commit = commit
                     };

Now we have a list of commits paired with their parent pull request, if there is one. Using this, we can build a more meaningful set of changes for a release. If I run this on the latest release of the Octokit.NET repository and then group the commits by their paired pull request, I can see that the original list of 135 commits would be reduced to just 58 entries if each commit belonging to a pull request were bundled into just one entry.
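
For the curious, the counting is a couple of LINQ calls over the commitsWithPRs pairing from above; a sketch of how those numbers fall out:

// Each pull request collapses to a single entry...
var pullRequestEntries = commitsWithPRs
    .Where(pair => pair.PullRequest != null)
    .Select(pair => pair.PullRequest.Number)
    .Distinct()
    .Count();

// ...while commits with no pull request stand alone.
var standaloneCommits = commitsWithPRs.Count(pair => pair.PullRequest == null);

// 135 commits reduce to pullRequestEntries + standaloneCommits entries (58 here).
var totalEntries = pullRequestEntries + standaloneCommits;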

Next, we need to process the commits to remove those representing merges and other noise. Those are topics for the next post in this series, where perhaps we will also take stock and see whether this effort has been valuable in producing more meaningful release note generation. Until then, thanks for reading, and don't forget to leave a comment.

  1. often changes are merged forward from one branch to another, especially if there are multiple release branches to support patch development and such
  2. The `join` in this example is an outer join; we are taking the join results and using `DefaultIfEmpty()` to supply an empty collection when there was nothing to join

Octokit and the Documentation Nightmare

Before I get into the meat of this series of posts, I would like to set the scene. Like many organisations that perform some level of software development these days, we use GitHub. Here at CareEvolution, some developers use the web interface extensively, some use the command line, and others use the GitHub desktop client¹, but most use a combination of two or more, depending on the task. This works great for developers, who have each found a comfortable workflow for getting things done, but it is not so great for those involved with DevOps, QA, or documentation, who need user-friendly details of what the developers did. Quite often, a feature or bug fix involves several commits; each has a comment or two, and perhaps an associated pull request (PR) or issue has a general description, but there is no definitive list of "this is what release X contains" that can be presented to a customer. Not only that, but sometimes a PR or issue is resolved in an earlier release and merged forward. While we have lists of what a release is going to include, quite often there is more detail that we would like to include, and we often have additional changes as we adapt to the changing requirements of our customers. All this means that one or more people end up trawling the commits, trying to determine what the changes are. It is not a happy task.

"There is nothing more difficult to take in hand, more perilous to conduct, or more uncertain in its success, than to take the lead in the introduction of a new order of things."

Niccolo Machiavelli
The Prince (1532)

Now, I know that this could all be avoided if people documented changes more clearly, perhaps added release notes to commits, raised issues for documentation changes, or created release notes on the release when it is made. However, no matter how noble change may be, anyone who has worked in process definition for any length of time will know that changing the behaviour of people is the hardest task of all, and therefore it should be avoided unless absolutely necessary. It was with that in mind that I decided mining the existing data for information would be an easier first step than jumping straight to asking people to change. So, with the aim of making life a little easier, I started looking at ways to automate the trawling.

I figured that by throwing out noisy, non-descriptive commits like "fixed spelling" or "updated comment", and by combining commits under the corresponding PR or issue, I could create a useful summary of changes. This would not be customer-ready, but it would be ready for someone to turn into a release note without needing to trawl git history. In fact, if I included details of who committed the changes, it might even provide a feedback loop that would improve the quality of developer commit messages; developers do not like interruptions, so being asked for more detail on a commit should reinforce that better commits, PRs, and issues mean fewer interruptions.

Octokitty²

After dismissing the idea of using git locally to perform this task (I figured those who might need this tool would probably not want to get the repository locally) and reading up on the GitHub API a little, I cracked open LINQPad – my tool of choice for hacking – and went looking for a NuGet package to help. It was during that search that I happily stumbled on Octokit, the official GitHub library for interacting with the GitHub API. At the time of writing, Octokit reflects the polyglot nature of GitHub users, providing variants for Ruby, .NET, and Objective-C, as well as experimental versions for Python and Go. I installed the Octokit NuGet package into LINQPad and started hacking (there is also a reactive version for `IObservable` fans).

Poking around the various objects and reading some documentation on GitHub (Octokit is open source), I got a feel for how the library wrapped the APIs. Though I had not yet got any code running, I was making progress. Confident that this would enable me to create the tool I wanted, I started writing some code to gather a list of releases for a specific repository and stumbled over my first hurdle: authentication. It turns out it is not quite as straightforward as I thought (the days of username and password are quite rightly behind us³), and so, my adventure began.

And then…

This is a good place to stop for this week, I think. As the series progresses, I will be piecing together the various parts of my "release note guidance" tool and, hopefully, end up with a .NET library to augment Octokit with some useful history-mining functionality. Next time, we will take a look at authentication with Octokit (and there will be code).

  1. OSX and Windows variants
  2. or, James Bond for kids
  3. OK, that's a lie, but I want to encourage good behaviour

Getting Information About Your Git Repository With C#

During a hackathon not so long ago, I wanted to incorporate some source control data into my .NET assembly version information for the purposes of troubleshooting installations, making it easier for people to report the code in which they found a bug, and making it easier for people to find the code in which a bug was found¹. The plan was to automatically encode the branch, the commit hash, and whether there were local commits or local changes into the `AssemblyConfiguration` attribute of my assemblies during the build.

At the time, I hacked together the `RepositoryInformation` class below, which wraps the command line tool to extract the required information. This class supports detecting whether the directory is a repository, checking for local commits and changes, getting the branch name and the name of the upstream branch, and enumerating the log. Though it felt a little wrong just wrapping the command line (and seemed pretty fragile too), it worked. Unfortunately, it was dependent on git being installed on the build system; I would prefer the build to get everything it needs using package management like NuGet and npm².

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;

class RepositoryInformation : IDisposable
{
    public static RepositoryInformation GetRepositoryInformationForPath(string path, string gitPath = null)
    {
        var repositoryInformation = new RepositoryInformation(path, gitPath);
        if (repositoryInformation.IsGitRepository)
        {
            return repositoryInformation;
        }
        return null;
    }
    
    public string CommitHash
    {
        get
        {
            return RunCommand("rev-parse HEAD");
        }
    }
    
    public string BranchName
    {
        get
        {
            return RunCommand("rev-parse --abbrev-ref HEAD");
        }
    }
    
    public string TrackedBranchName
    {
        get
        {
            return RunCommand("rev-parse --abbrev-ref --symbolic-full-name @{u}");
        }
    }
    
    public bool HasUnpushedCommits
    {
        get
        {
            return !String.IsNullOrWhiteSpace(RunCommand("log @{u}..HEAD"));
        }
    }
    
    public bool HasUncommittedChanges
    {
        get
        {
            return !String.IsNullOrWhiteSpace(RunCommand("status --porcelain"));
        }
    }
    
    public IEnumerable<string> Log
    {
        get
        {
            int skip = 0;
            while (true)
            {
                string entry = RunCommand(String.Format("log --skip={0} -n1", skip++));
                if (String.IsNullOrWhiteSpace(entry))
                {
                    yield break;
                }
                
                yield return entry;
            }
        }
    }
    
    public void Dispose()
    {
        if (!_disposed)
        {
            _disposed = true;
            _gitProcess.Dispose();
        }
    }
    
    private RepositoryInformation(string path, string gitPath)
    {
        var processInfo = new ProcessStartInfo
        {
            UseShellExecute = false,
            RedirectStandardOutput = true,
            // Use the supplied path to git if it points at an existing file;
            // otherwise assume git.exe is on the PATH.
            FileName = File.Exists(gitPath) ? gitPath : "git.exe",
            CreateNoWindow = true,
            WorkingDirectory = (path != null && Directory.Exists(path)) ? path : Environment.CurrentDirectory
        };
        
        _gitProcess = new Process();
        _gitProcess.StartInfo = processInfo;
    }
    
    private bool IsGitRepository
    {
        get
        {
            return !String.IsNullOrWhiteSpace(RunCommand("log -1"));
        }
    }
    
    private string RunCommand(string args)
    {
        _gitProcess.StartInfo.Arguments = args;
        _gitProcess.Start();
        string output = _gitProcess.StandardOutput.ReadToEnd().Trim();
        _gitProcess.WaitForExit();
        return output;
    }
    
    private bool _disposed;
    private readonly Process _gitProcess;
}

If I were to approach this again today, I would use the LibGit2Sharp NuGet package or something similar³. Below is an updated version of `RepositoryInformation` that uses LibGit2Sharp instead of the git command line. Clearly, you could forego any kind of wrapper for LibGit2Sharp, and I probably would if I were incorporating this into a bigger task like the one I originally had planned.

using System;
using System.Collections.Generic;
using System.Linq;
using LibGit2Sharp;

class RepositoryInformation : IDisposable
{
    public static RepositoryInformation GetRepositoryInformationForPath(string path)
    {
        if (LibGit2Sharp.Repository.IsValid(path))
        {
            return new RepositoryInformation(path);
        }
        return null;
    }
    
    public string CommitHash
    {
        get
        {
            return _repo.Head.Tip.Sha;
        }
    }
    
    public string BranchName
    {
        get
        {
            return _repo.Head.Name;
        }
    }
    
    public string TrackedBranchName
    {
        get
        {
            return _repo.Head.IsTracking ? _repo.Head.TrackedBranch.Name : String.Empty;
        }
    }
    
    public bool HasUnpushedCommits
    {
        get
        {
            return _repo.Head.TrackingDetails.AheadBy > 0;
        }
    }
    
    public bool HasUncommittedChanges
    {
        get
        {
            return _repo.RetrieveStatus().Any(s => s.State != FileStatus.Ignored);
        }
    }
    
    public IEnumerable<Commit> Log
    {
        get
        {
            return _repo.Head.Commits;
        }
    }
    
    public void Dispose()
    {
        if (!_disposed)
        {
            _disposed = true;
            _repo.Dispose();
        }
    }
    
    private RepositoryInformation(string path)
    {
        _repo = new Repository(path);
    }

    private bool _disposed;
    private readonly Repository _repo;
}
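
To tie this back to the original plan, here is a hypothetical usage that builds a configuration string from the repository state (the path and the string format are both illustrative; wiring the result into the `AssemblyConfiguration` attribute during the build is left out):

using (var info = RepositoryInformation.GetRepositoryInformationForPath(@"C:\source\my-repo"))
{
    if (info != null)
    {
        // e.g. "main 1a2b3c4d... (unpushed commits) (uncommitted changes)"
        var configuration = String.Format(
            "{0} {1}{2}{3}",
            info.BranchName,
            info.CommitHash,
            info.HasUnpushedCommits ? " (unpushed commits)" : String.Empty,
            info.HasUncommittedChanges ? " (uncommitted changes)" : String.Empty);
        Console.WriteLine(configuration);
    }
}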

I have yet to use any of this outside of my hackathon work or this blog entry, but now that I have resurrected it from my library of coding exploits past to write about, I might just resurrect the original plans I had too. Whether that happens or not, I hope you found this useful or at least a little interesting; if so, or if you have some suggestions related to this post, please let me know in the comments.

  1. Sometimes, like a squirrel, you want to know which branch you were on
  2. I had looked at NuGet packages when I was working on the original hackathon project, but had decided not to use one for some reason or another (perhaps the available packages did not do everything I wanted at that time)
  3. PowerShell could be a viable replacement for my initial approach, but it would suffer from the same issue of needing git on the build system; by using a NuGet package, the build includes everything it needs