🤖 Setting up Dependabot with GitHub actions to approve and merge

Photo by Denys Nevozhai on Unsplash

Hello, everyone! It has been a long time. As the pandemic got into full swing last year, my bandwidth for cognitive tasks took a big hit, and some things fell by the wayside, such as this blog. I did not intend to ghost y'all like that; alas, here we are. Perhaps I'll write more on that another time; for now, let's jump into something that I was recently tackling.[1]

What is Dependabot?

Dependabot is a handy tool, now owned by and incorporated into GitHub, that monitors your repository's dependencies and updates them, a chore that many maintainers find laborious. It is not the only tool available for this task, but it is the one many will use if they are already on GitHub, since it is easy to set up and manage.

When Dependabot identifies a package you use that can be updated (based on your semver rule in package.json, for example), or it knows of a security issue in a package you use or in a package that one of your packages references, Dependabot can create a PR (pull request) to update that package. It is pretty smart and will only ever have one PR per package, so if multiple releases occur, it will close any pending PR for that package and open a new one. It is a really handy tool with a variety of configuration options; I'm not going to delve into those here – you can read about them in the official documentation.

When Dependabot was in preview, it would create the PR, wait for all relevant checks to complete (such as your CI (continuous integration) linting and testing), and then, if those checks succeeded, auto-merge the change into the branch you configured it to target. However, this is a security issue, especially if you also have CD (continuous deployment) set up: a malicious package could be published, causing Dependabot to trigger a new deployment, which could then propagate that malicious package to all your package users, their users, and so on. Quite rightly, the decision was made to take that feature out of Dependabot and leave it to each individual use case to decide what to do with the PRs Dependabot creates. However, this led to a new problem: managing the PRs.

Dependabot can limit how many PRs it has open at once, which is helpful, but each of those PRs still requires manual intervention. If you have rules like "every PR needs at least X reviewers", handling them can quickly become a chore almost as annoying as the one Dependabot tries to address.

So what to do? Auto-merging is a potential security issue; not auto-merging is a time sink.

Do not enable auto-merging on a branch used for CD. It really is a bad idea and a big security risk. Packages do get hacked sometimes and tidying up after a malicious publish is not easy. To avoid security issues in your releases, make sure that there is a manual QA (quality assurance) step somewhere between any automated update and an actual release of the updated code. For example, you could do this by having Dependabot operate on a branch that is a copy of the CD branch and then have a process for merging the Dependabot updates across to your main branch before a release.

💡 Check out CodeQL, another GitHub feature, if you want to add some automated vulnerability checking to your repository.

For the remainder of this entry, we will assume that you are using Dependabot on a main branch that is not continuously deployed. However, just as with licensing, it is ultimately your responsibility to make sure the code you release, including its dependencies, does not introduce vulnerabilities, so consider your specific scenario before enabling things like Dependabot and auto-merging of the changes it makes.

What are GitHub Actions?

GitHub Actions are GitHub's approach to supporting the kinds of tasks that have traditionally been performed by CI and CD platforms like Travis, CircleCI, and Jenkins. Using a combination of YAML configurations and scripts referred to as actions, you can build workflows that perform all kinds of automated processes from running tests to managing your GitHub repository issues. They are incredibly powerful (and therefore, should be used responsibly).

Many first- and third-party actions exist to help you build your workflows. I have used actions to run tests across multiple platforms, update code coverage stats in Codecov, and, most recently, help manage Dependabot PRs. In addition, GitHub Actions have access to the incredibly powerful gh CLI tool.

💡 Check out the GitHub Actions documentation on security hardening to learn how to use GitHub Actions more securely.

GitHub Actions are free for public repositories; see the GitHub Actions documentation for more information, including pricing for private repository usage.

Setting Things Up

1. Your GitHub Repository Settings

Before you set up the merging workflow, you need to make a few changes to your GitHub repository.

Auto-merge

[Screenshot: The Settings tab of a repository on GitHub with the Options section selected]
[Screenshot: The Allow auto-merge repository setting]

First, go to your repository Settings tab and under the Options section, ensure that Allow auto-merge is checked. This does not make every PR auto-merge, but it does allow for specific PRs to be set to auto-merge – this will be important.

Status Checks

If you don't have status checks enabled for your repository, a PR can be merged without any reviews or code quality checks occurring. I highly recommend setting up status checks, as they ensure at least some level of quality assurance before your code, or anyone else's, is merged.

For the purposes of this discussion, it is assumed that you have set your repository to require at least one review per PR before it can be merged, and at least one non-instant code quality check (such as a test run, or lint check).

Status checks are mandated for PRs to specific branches by using Branch Protection Rules. These are configured in your repository's Settings under Branches. In the following screenshot, the main branch (the default branch) has branch protection rules applied. Branch protection rules can be applied to specific branches, or to a range of branches by using a selector like feature/*.

[Screenshot: The Branches section of a GitHub repository with rules applied to the main default branch]
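To make the feature/* selector concrete, here is a small sketch of how such a pattern could be matched against branch names. My understanding of GitHub's documentation is that these patterns are fnmatch-style and that * does not cross a "/" separator; this sketch handles only * and ? under that assumption, and branch_pattern_to_regex and rule_applies are hypothetical helpers, not GitHub's matcher.

```python
import re


def branch_pattern_to_regex(pattern):
    """Translate a simplified branch-protection pattern into a compiled regex.

    Assumption: only two wildcards are special. `*` matches any run of
    characters except "/", and `?` matches one character except "/".
    Everything else is matched literally.
    """
    parts = []
    for char in pattern:
        if char == "*":
            parts.append("[^/]*")
        elif char == "?":
            parts.append("[^/]")
        else:
            parts.append(re.escape(char))
    return re.compile("".join(parts))


def rule_applies(pattern, branch):
    """Return True if the whole branch name matches the pattern."""
    return branch_pattern_to_regex(pattern).fullmatch(branch) is not None


print(rule_applies("feature/*", "feature/login"))  # True
print(rule_applies("feature/*", "feature/a/b"))    # False: * stops at "/"
print(rule_applies("feature/*", "main"))           # False
```

GitHub's real pattern syntax supports more than this (character sets, for example), so treat the sketch as a way to reason about which branches a rule covers rather than as a definitive matcher.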

If you add rules for a branch (or set of branches) or edit an existing rule, you can specify all sorts of measures to control when code is suitable for merging into the branches that match that rule. In the following screenshot, the rule has been configured such that code can only be merged when:

  • It comes from a PR
  • It has at least one approving reviewer
  • It is up-to-date with the target branch
  • The codecov/project status check has passed
[Screenshot: A subset of the rules one can apply to protect your branches in GitHub]
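The four rules above act as a gate that every one of them must pass. As a rough mental model (not how GitHub evaluates them internally), the following sketch lists whichever rules currently block a merge. PullRequestState, blocking_reasons, REQUIRED_CHECKS and MIN_APPROVALS are hypothetical names chosen for illustration; they are not GitHub API fields.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical settings mirroring the branch protection rule described above.
REQUIRED_CHECKS = ("codecov/project",)
MIN_APPROVALS = 1


@dataclass
class PullRequestState:
    """Illustrative PR snapshot; these fields are not GitHub API fields."""
    from_pull_request: bool
    approvals: int
    up_to_date: bool
    # Maps a status check name to "success", "pending", or "failure".
    checks: Dict[str, str] = field(default_factory=dict)


def blocking_reasons(pr: PullRequestState) -> List[str]:
    """Return every rule that currently blocks merging; empty means mergeable."""
    reasons = []
    if not pr.from_pull_request:
        reasons.append("changes must come from a pull request")
    if pr.approvals < MIN_APPROVALS:
        reasons.append(f"needs at least {MIN_APPROVALS} approving review")
    if not pr.up_to_date:
        reasons.append("branch is behind the target branch")
    for name in REQUIRED_CHECKS:
        # A check that has not reported yet counts as pending.
        status = pr.checks.get(name, "pending")
        if status != "success":
            reasons.append(f"{name} is {status}")
    return reasons


fresh = PullRequestState(from_pull_request=True, approvals=0, up_to_date=True)
print(blocking_reasons(fresh))
# ['needs at least 1 approving review', 'codecov/project is pending']
```

A freshly opened Dependabot PR is typically in a state like fresh above: approving it removes one blocking reason, and auto-merge then waits for the still-pending check before merging.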

Why at least one non-instant quality check?

The auto-merge setting for GitHub PRs is only useful for PRs that are not yet passing all status checks. I do not know if this is still the case, but at one time the command we are going to use to tell GitHub to auto-merge the PR would fail if the PR was already in a mergeable state. If you want to auto-merge PRs that are already mergeable when our new workflow runs, you will need to call a different command. This is left as an exercise for the reader.

2. Dependabot

You will need to enable Dependabot on your repository. Follow GitHub's instructions to set it up the way you want. This post assumes the defaults, but you should be able to make it work with other configurations.

3. GitHub Actions

With Dependabot in place (and probably creating PRs for you already) and your status checks running, we can now set up our automation.

There are two things we need our automation to do.

  1. We need it to approve the PR, as we have mandated at least one approving review before code can be merged.
  2. We need to enable auto-merge for the PR so that it will merge once our status checks are completed.

To add a GitHub Actions workflow, all you need to do is add a YAML file describing the workflow to the .github/workflows folder of your repository. Each YAML file describes a specific workflow, including what triggers the workflow, what permissions it has, and the jobs that it performs. Triggers can be specific events in your repository (such as creating a PR or raising an issue), webhooks, schedules (such as once a week), or even events fired from another workflow.

Let's take a look at our approve-and-auto-merge workflow, and then we can discuss some of the important pieces. Since this isn't a deep dive into GitHub Actions, I will skim over some of the details to get to the pertinent info.

name: Dependabot Pull Request Approve and Merge

on: pull_request_target

permissions:
  pull-requests: write
  contents: write

jobs:
  dependabot:
    runs-on: ubuntu-latest
    # Checking the actor will prevent your Action run failing on non-Dependabot
    # PRs but also ensures that it only does work for Dependabot PRs.
    if: ${{ github.actor == 'dependabot[bot]' }}
    steps:
      # This first step will fail if there's no metadata and so the approval
      # will not occur.
      - name: Dependabot metadata
        id: dependabot-metadata
        uses: dependabot/fetch-metadata@v1.1.1
        with:
          github-token: "${{ secrets.GITHUB_TOKEN }}"
      # Here the PR gets approved.
      - name: Approve a PR
        run: gh pr review --approve "$PR_URL"
        env:
          PR_URL: ${{ github.event.pull_request.html_url }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      # Finally, this sets the PR to allow auto-merging for patch and minor
      # updates if all checks pass
      - name: Enable auto-merge for Dependabot PRs
        if: ${{ steps.dependabot-metadata.outputs.update-type != 'version-update:semver-major' }}
        run: gh pr merge --auto --squash "$PR_URL"
        env:
          PR_URL: ${{ github.event.pull_request.html_url }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

There is a bit to unpack there, so let's go over it.

name: Dependabot Pull Request Approve and Merge

First, we have the name of the workflow, which is "Dependabot Pull Request Approve and Merge". This will be shown in the GitHub user interface when referring to your workflow.

on: pull_request_target

Next, we have the triggers. In this case, we have just one trigger: pull_request_target. This trigger should be used rarely, and with care when it is used, because it provides a read/write access token. We need it because that token allows us to update our PR. Each trigger has specific activity types you can use to narrow down exactly when your workflow runs; pull_request_target defaults to opened, reopened, and synchronize, which means our workflow will trigger when a PR is opened, updated, or reopened. For more information on this trigger and its types, see the GitHub documentation, and check out this blog post on the security implications of misusing this trigger.
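The default activity-type filtering described above can be pictured as a simple set-membership test. This is only a model for reasoning about when the workflow fires; would_run and DEFAULT_TYPES are hypothetical names, and GitHub's event dispatch is not exposed as a function like this.

```python
from typing import Iterable, Optional

# Default activity types for pull_request_target when the workflow does not
# list any `types`, as described in the paragraph above.
DEFAULT_TYPES = frozenset({"opened", "synchronize", "reopened"})


def would_run(event_name: str, action: str,
              configured_types: Optional[Iterable[str]] = None) -> bool:
    """Model whether an `on: pull_request_target` workflow runs for an event."""
    if event_name != "pull_request_target":
        return False
    # An explicit `types:` list replaces the defaults entirely.
    allowed = frozenset(configured_types) if configured_types else DEFAULT_TYPES
    return action in allowed


print(would_run("pull_request_target", "synchronize"))         # True
print(would_run("pull_request_target", "closed"))              # False
print(would_run("pull_request_target", "closed", ["closed"]))  # True
```

In the workflow file itself, the equivalent of passing configured_types is adding a types list under the pull_request_target trigger.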

permissions:
  pull-requests: write
  contents: write

After specifying the trigger for the workflow, we specify the scope of permissions we are granting the workflow. Every workflow has a secret available, GITHUB_TOKEN, which is used to authenticate the actions that the workflow wants to perform. Each trigger type has a restricted level of permissions, and while we cannot elevate permissions outside of those restrictions, we can control the scope of permissions allowed within the restrictions.

In our case, we need write access to pull requests so that we can modify the PR itself, and we need write access to the repository contents because we need to be able to request merging. Setting a PR to auto-merge may seem like just editing the PR, but because it results in the code getting merged, we also need permission to perform that future merge.
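One way to sanity-check a permissions block is to write down which scopes each step needs and compare them with what the workflow grants. The table below simply encodes the reasoning in the paragraph above; it is not an official GitHub permissions reference, and REQUIRED_SCOPES, LEVELS and missing_scopes are hypothetical names. Scopes missing from the granted mapping are treated as none in this sketch.

```python
from typing import Dict, List

# Scopes each step needs, taken from the reasoning in this post.
# Illustrative only; not an official GitHub permissions reference.
REQUIRED_SCOPES: Dict[str, Dict[str, str]] = {
    "approve": {"pull-requests": "write"},
    "enable-auto-merge": {"pull-requests": "write", "contents": "write"},
}

# Access levels ordered from weakest to strongest.
LEVELS = {"none": 0, "read": 1, "write": 2}


def missing_scopes(step: str, granted: Dict[str, str]) -> List[str]:
    """List the scopes that `granted` lacks for `step` (empty means enough)."""
    missing = []
    for scope, needed in REQUIRED_SCOPES[step].items():
        if LEVELS[granted.get(scope, "none")] < LEVELS[needed]:
            missing.append(f"{scope}: {needed}")
    return missing


workflow_permissions = {"pull-requests": "write", "contents": "write"}
print(missing_scopes("enable-auto-merge", workflow_permissions))  # []
print(missing_scopes("enable-auto-merge", {"pull-requests": "write"}))
# ['contents: write']
```

The second call shows why contents: write appears in the workflow even though, on the surface, we only touch the PR.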

jobs:
  dependabot:
    runs-on: ubuntu-latest
    # Checking the actor will prevent your Action run failing on non-Dependabot
    # PRs but also ensures that it only does work for Dependabot PRs.
    if: ${{ github.actor == 'dependabot[bot]' }}
    steps:
      # This first step will fail if there's no metadata and so the approval
      # will not occur.
      - name: Dependabot metadata
        id: dependabot-metadata
        uses: dependabot/fetch-metadata@v1.1.1
        with:
          github-token: "${{ secrets.GITHUB_TOKEN }}"
      # Here the PR gets approved.
      - name: Approve a PR
        run: gh pr review --approve "$PR_URL"
        env:
          PR_URL: ${{ github.event.pull_request.html_url }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      # Finally, this sets the PR to allow auto-merging for patch and minor
      # updates if all checks pass
      - name: Enable auto-merge for Dependabot PRs
        if: ${{ steps.dependabot-metadata.outputs.update-type != 'version-update:semver-major' }}
        run: gh pr merge --auto --squash "$PR_URL"
        env:
          PR_URL: ${{ github.event.pull_request.html_url }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

At the end of the file, we have the jobs themselves. In this case, we have a single job named dependabot. This job runs on an instance of the latest Ubuntu image, as specified by runs-on: ubuntu-latest. GitHub Actions support a range of operating systems and versions, and you can even configure a job to run on a matrix of these, but we do not need that fanciness; the Ubuntu images tend to be the cheapest and the fastest, so that is what we are using.

We control when the job runs with a condition, if: ${{ github.actor == 'dependabot[bot]' }}. This means that if the PR was created by some entity other than dependabot[bot], we won't do anything, which prevents us from auto-approving other folks' code contributions.

Finally, we describe the steps in the job. In this case there are three steps:

  1. name: Dependabot metadata
    This step uses an action from Dependabot that gets us information about the update.
  2. name: Approve a PR
    This step performs the review approval. We do this using the awesome gh CLI.
  3. name: Enable auto-merge for Dependabot PRs
    This step sets the PR to auto-merge using a squash merge strategy. You could change this strategy to whatever you prefer – possibly omitting it if you want the repository default to be used.
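Taken together, the job-level condition and the three steps determine which actions happen for a given PR. The sketch below models that control flow; steps_that_run is a hypothetical function, and update_type stands in for the fetch-metadata output, with None modeling the metadata step failing (which stops the remaining steps).

```python
from typing import List, Optional

MAJOR = "version-update:semver-major"


def steps_that_run(actor: str, update_type: Optional[str]) -> List[str]:
    """Model which steps of the dependabot job would run.

    `update_type` stands in for the fetch-metadata output; None models that
    step failing, which stops the job before the approval step.
    """
    # Job-level `if`: anything not opened by Dependabot is skipped entirely.
    if actor != "dependabot[bot]":
        return []
    steps = ["Dependabot metadata"]
    if update_type is None:
        return steps
    # The approval step has no condition of its own.
    steps.append("Approve a PR")
    # Step-level `if`: major updates are not set to auto-merge.
    if update_type != MAJOR:
        steps.append("Enable auto-merge for Dependabot PRs")
    return steps


print(steps_that_run("dependabot[bot]", "version-update:semver-major"))
# ['Dependabot metadata', 'Approve a PR']
```

Note that, as written, the workflow still approves major updates and only skips enabling auto-merge. If you would rather a human approve those too, one option is to give the approval step the same if condition as the auto-merge step.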

Versions

You may have noticed that the last step in our job has a condition:

if: ${{ steps.dependabot-metadata.outputs.update-type != 'version-update:semver-major' }}

This is why we have the "Dependabot metadata" step. In this condition, we use that metadata to ensure that we only allow auto-merging of minor and patch level updates. After all, a major version change is likely a breaking change, and we don't want to include those automatically, even if they do pass our status checks. So, this condition ensures that we leave major updates as open PRs for manual verification before merging.
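The update-type values compared in the condition reflect which part of the version number changed. The following is a simplified sketch of that classification, reusing the label strings from the workflow; it is not fetch-metadata's implementation, and parse_version and classify_update are hypothetical helpers that only handle plain MAJOR.MINOR.PATCH strings.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse a plain MAJOR.MINOR.PATCH string; pre-release/build suffixes are not handled."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def classify_update(previous: str, new: str) -> str:
    """Label an upgrade with the update-type strings the workflow compares against."""
    old, cur = parse_version(previous), parse_version(new)
    if cur[0] != old[0]:
        return "version-update:semver-major"
    if cur[1] != old[1]:
        return "version-update:semver-minor"
    return "version-update:semver-patch"


print(classify_update("4.17.20", "4.17.21"))  # version-update:semver-patch
print(classify_update("1.9.0", "2.0.0"))      # version-update:semver-major
```

One caveat worth keeping in mind: under semver, anything may change while the major version is 0, so a 0.x minor bump can be breaking even though a comparison like this one labels it semver-minor.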

If you decide not to merge these PRs, you can tell Dependabot exactly how to handle this dependency in future, even preventing it from suggesting major updates to that package again.

Conclusion

This was a long post, but hopefully a useful one. I want to acknowledge that I did not come up with this all on my own. My strategy here was built from the excellent GitHub documentation that also goes into detail about workflows and Dependabot.

Thanks for reading. If you have feedback or questions, don't forget to leave a comment.

It's nice to be back. 🙂

  1. Don't worry, I haven't forgotten about our series on React server-side rendering – I'll get back into that soon.

🙇🏻‍♂️ Introducing checksync

Photo by Clint Adair on Unsplash

Have you ever written code in more than one place that needs to stay in sync? Perhaps there is a tool in your framework of choice that can generate multiple files from a single source of truth, like T4 templates in the .NET world; perhaps not. Even if there is such a tool, it adds a layer of complexity that is not necessarily easy to grok. If you look at the output files or the template itself, it may not be clear what files are affected or related.

At Khan Academy, we have a linter, written in Python, that is executed whenever we create a new diff for review. It runs across a subset of our files and looks for blocks of text that are marked up with a custom comment format that identifies those blocks as being synchronized with other target blocks. Included in that markup is a checksum of the target block content, such that if the target changes, we get an error from the linter. This is our signal to check whether further changes are needed and then update the checksums that are invalidated. The only bugbear folks seem to have is that, instead of offering an option to auto-fix checksums in need of update, it outputs a Perl script that has to be copied and run for that purpose.

Small bugbear aside, this tool is fantastic. It enables us to link code blocks that need to be synchronized and catches when we change them with reasonably low overhead. Though I believe it is hugely useful, it is sadly custom to our codebase. I have long wanted to address that and create an open source version for everyone to use. checksync is that open source version.

🤔 The Requirements

Before writing checksync, I started out with the following requirements:

  • It should work with existing marked-up code in the Khan Academy codebase; specifically,
    1. File paths are relative to the project root directory
    2. Checksums are calculated using Adler-32
    3. Both // and # style comments are used to comment the markup tags
    4. Start tag format is:
      sync-start:<ID> <CHECKSUM> <TARGET_FILE_PATH>
    5. End tag format is:
      sync-end:<ID>
    6. Multiple start tags can exist for the same tag ID but with different target files
    7. Sync tags are not included in the checksum'd content
    8. An extra line of blank content is included in the checksum'd content (due to a holdover from an earlier implementation)
    9. Paths matched by .gitignore files should be ignored
    10. Additional files can be ignored
  • It should be comparably performant to the existing linter
    • The linter ran over the entire Khan Academy website codebase in less than 15 seconds
  • It should auto-update invalid checksums if asked to do so
  • It should output file paths such that editors like Visual Studio Code can open them on the correct line
  • It should support more comment styles
  • It should generally support any text file
  • It should run on Node 8 and above
    • Some of our projects are still using Node 8 and I wanted to support those uses
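Requirements 2, 7 and 8 together describe how a block's checksum is formed: Adler-32 over the block content, excluding the tag lines, with an extra blank line included. Python's standard zlib.adler32 computes Adler-32, so a sketch could look like the following. The exact byte layout (each line followed by a newline, then one more newline for the extra blank line) is my assumption, so the values may not match what checksync or the Khan Academy linter produce; block_checksum is a hypothetical helper.

```python
import zlib
from typing import List


def block_checksum(content_lines: List[str]) -> int:
    """Adler-32 checksum of the lines between a sync-start and sync-end tag.

    The tag lines themselves are excluded by the caller. Assumed layout: every
    line is terminated by a newline, and one extra newline is appended to stand
    in for the extra blank line of content.
    """
    text = "".join(line + "\n" for line in content_lines) + "\n"
    return zlib.adler32(text.encode("utf-8"))


python_block = ["code = 1"]
js_block = ['const someCode = "does a thing";', "console.log(someCode);"]
print(block_checksum(python_block))
print(block_checksum(js_block))
# Any edit to the target block changes its checksum.
print(block_checksum(["code = 2"]) != block_checksum(python_block))  # True
```

Adler-32 is cheap to compute, which helps with the performance requirement, though it is a change-detection checksum rather than a cryptographic hash.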

With these requirements in mind, I implemented checksync (and ancesdir, which I ended up needing to ensure project root-relative file paths). By making it compatible with the existing Khan Academy linter, I could leverage the existing Khan Academy codebase to help measure performance and verify that things worked correctly. After a few changes to address various bugs and performance issues, it is still mildly slower than the Python equivalent, but the added features it provides more than make up for that (especially the fact that it is available to folks outside of our organization).
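ancesdir itself is a JavaScript package, but the idea behind root-relative paths is easy to sketch: walk up from a file until a directory containing a marker (package.json by default, as the --root-marker help text below describes) is found, then express paths relative to it. find_root and root_relative are hypothetical Python helpers illustrating that idea, not ancesdir's API; starting the search at a directory itself, rather than its parent, is a design choice of this sketch.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_root(start: Path, marker: str = "package.json") -> Optional[Path]:
    """Return the nearest directory at or above `start` that contains `marker`."""
    start = start.resolve()
    directory = start if start.is_dir() else start.parent
    for candidate in (directory, *directory.parents):
        if (candidate / marker).exists():
            return candidate
    return None


def root_relative(path: Path, marker: str = "package.json") -> str:
    """Express `path` relative to its project root, using "/" separators."""
    root = find_root(path, marker)
    if root is None:
        raise FileNotFoundError(f"no ancestor of {path} contains {marker}")
    return path.resolve().relative_to(root).as_posix()


# Build a throwaway project to show the lookup.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    (project / "package.json").write_text("{}")
    source = project / "src" / "lib" / "a.js"
    source.parent.mkdir(parents=True)
    source.write_text("")
    print(root_relative(source))  # src/lib/a.js
```

Using "/" separators regardless of platform keeps target paths in tags identical across operating systems.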

🎉 Check It Out

checksync includes a --help option to get information on usage. I have included the output below to give an overview of usage and the options available to customize how checksync runs.

checksync --help
checksync ✅ 🔗

Checksync uses tags in your files to identify blocks that need to remain
synchronised. It works on any text file as long as it can find the tags.

Tag Format

Each tagged block is identified by one or more sync-start tags and a single
sync-end tag.

The sync-start tags take the form:

    <comment> sync-start:<marker_id> <?checksum> <target_file>

The sync-end tags take the form:

    <comment> sync-end:<marker_id>

Each marker_id can have multiple sync-start tags, each with a different
target file, but there must be only one corresponding sync-end tag.

Where:

    <comment>       is one of the comment tokens provided by the --comment
                    argument

    <marker_id>     is the unique identifier for this marker

    <checksum>      is the expected checksum of the corresponding block in
                    the target file

    <target_file>   is the path from your package root to the target file
                    with a corresponding sync block with the same marker_id

Usage

checksync <arguments> <include_globs>

Where:

    <arguments>       are the arguments you provide (see below)

    <include_globs>   are glob patterns for identifying files to check

Arguments

    --comments,-c      A string containing comma-separated tokens that
                       indicate the start of lines where tags appear.
                       Defaults to "//,#".

    --dry-run,-n       Ignored unless supplied with --update-tags.

    --help,-h          Outputs this help text.

    --ignore,-i        A string containing comma-separated globs that identify
                       files that should not be checked.

    --ignore-files     A comma-separated list of .gitignore-like files that
                       provide path patterns to be ignored. These will be
                       combined with the --ignore globs.
                       Ignored if --no-ignore-file is present.
                       Defaults to .gitignore.

    --no-ignore-file   When true, does not use any ignore file. This is
                       useful when the default value for --ignore-files is not
                       wanted.

    --root-marker,-m   By default, the root directory (used to
                       interpret and generate target paths for sync-start
                       tags) for your project is determined by the nearest
                       ancestor directory to the processed files that
                       contains a package.json file. If you want to
                       use a different file or directory to identify your
                       root directory, specify that using this argument.
                       For example, --root-marker .gitignore would mean
                       the first ancestor directory containing a
                       .gitignore file.

    --update-tags,-u   Updates tags with incorrect target checksums. This
                       modifies files in place; run with --dry-run to see what
                       files will change without modifying them.

    --verbose          More details will be added to the output when this
                       option is provided. This is useful when determining if
                       provided glob patterns are applying as expected, for
                       example.
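To illustrate the tag format from the help text, here is a sketch of how sync-start and sync-end tags could be recognized for a comma-separated list of comment tokens, mirroring --comments. It assumes the optional checksum is a run of decimal digits, as in the example tags. Tag and parse_tags are hypothetical names; checksync's own implementation is JavaScript and is not shown here.

```python
import re
from typing import Iterable, List, NamedTuple, Optional


class Tag(NamedTuple):
    kind: str                # "start" or "end"
    marker_id: str
    checksum: Optional[str]  # None when the optional checksum is absent
    target: Optional[str]    # target file path; None for sync-end tags
    line: int                # 1-based line number


def parse_tags(lines: Iterable[str], comments: str = "//,#") -> List[Tag]:
    """Find sync-start and sync-end tags in the format the help text describes."""
    tokens = (re.escape(token) for token in comments.split(",") if token)
    comment = "(?:" + "|".join(tokens) + ")"
    start = re.compile(
        rf"^\s*{comment}\s*sync-start:(\S+)\s+(?:(\d+)\s+)?(\S+)\s*$")
    end = re.compile(rf"^\s*{comment}\s*sync-end:(\S+)\s*$")
    tags = []
    for number, text in enumerate(lines, start=1):
        match = start.match(text)
        if match:
            marker_id, checksum, target = match.groups()
            tags.append(Tag("start", marker_id, checksum, target, number))
            continue
        match = end.match(text)
        if match:
            tags.append(Tag("end", match.group(1), None, None, number))
    return tags


example = [
    "# sync-start:update_me 4567 __examples__/checksums_need_updating/a.js",
    "code = 1",
    "# sync-end:update_me",
]
for tag in parse_tags(example):
    print(tag)
```

A real checker would go on to pair each start tag with its end tag, collect the lines in between, and report unmatched or duplicate tags.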

And here is a simple example (taken from the checksync code repository) of running checksync against a directory with two files, using the defaults. The two files are given below to show how they are marked up for use with checksync. In this example, the checksums do not match the tagged content (though you are not expected to know that just by looking at the files – that's what checksync is for).

__examples__/checksums_need_updating/a.js:

// This is a JavaScript (or similar language) file

// sync-start:update_me 45678 __examples__/checksums_need_updating/b.py
const someCode = "does a thing";
console.log(someCode);
// sync-end:update_me

__examples__/checksums_need_updating/b.py:

# Test file in Python style

# sync-start:update_me 4567 __examples__/checksums_need_updating/a.js
code = 1
# sync-end:update_me
[Screenshot: Example output showing mismatched checksums]
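Fixing a mismatch like this is what --update-tags does, modifying files in place. At the text level, that amounts to swapping the number in the sync-start tag while keeping the target path. The sketch below shows only that rewrite, assuming the checksum is the run of digits before the target path; with_checksum is a hypothetical helper, not checksync's implementation, and the checksum layout used in the demo is the same assumption as before (each line plus a newline, then one extra newline).

```python
import re
import zlib

# sync-start line: comment token, marker, optional numeric checksum, target.
START_TAG = re.compile(r"^(\s*\S+\s*sync-start:\S+\s+)(?:\d+\s+)?(\S+\s*)$")


def with_checksum(start_line: str, checksum: int) -> str:
    """Return `start_line` with its checksum replaced, or inserted if absent."""
    match = START_TAG.match(start_line)
    if match is None:
        raise ValueError(f"not a sync-start tag: {start_line!r}")
    prefix, target = match.groups()
    return f"{prefix}{checksum} {target}"


# a.js points at the block in b.py, so its tag should carry b.py's checksum.
target_block = ["code = 1"]
layout = "".join(text + "\n" for text in target_block) + "\n"  # assumed layout
fresh = zlib.adler32(layout.encode("utf-8"))
line = "// sync-start:update_me 45678 __examples__/checksums_need_updating/b.py"
print(with_checksum(line, fresh))
```

Keeping the rewrite separate from the checksum calculation makes it straightforward to offer a dry run: compute the new lines and report them without writing anything back.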

Additional examples that demonstrate various synchronization conditions and error cases can be found in the checksync code repository. To give checksync a try for yourself:

I hope you find this tool useful. If you do, or if you have any questions, please comment on this blog.