Open Source – somewhat abstract

🛠 These Software Development Tools Will Blow Your Mind

For my final post of 2019, I thought I would deal with some serious FOMO¹ by adding one of my own to the legions of year-end listicles adorned by clickbait headlines. You will not believe what's at number four! Nevermore shall I lament having passed up the opportunity to proffer a subjective collection of arbitrary length for your attention. Of what engrossing subject shall this list be? Life achievements? 11 places to visit before you die? Things I have discovered when looking for other things that I have lost? Nay, it shall be tools! Tools, I say. To be specific, development tools, for it is tools of software development that I use often. To be even more specific, this is a list of development tools for each of which I could write an additional list of killer features, extensions, and magical whosywotsits. I present to you a list of feature-packed fancies for fruitful software fabrication. How many tools? Read on to find out².

Visual Studio Code
`code.visualstudio.com`

A screenshot of the Visual Studio Code environment, showing the Extensions Marketplace on the left and a JavaScript file being edited on the right, below which is shown a terminal panel indicating code has been successfully compiled. The editor is also showing an autocomplete list of possible syntax to insert. The bottom of the screen is a status bar showing additional information about the currently opened files.

This open source, cross platform, integrated development environment (IDE) backed by Microsoft really is the best I have used so far. With its built-in terminal, text editor, and task engine, it really is an integrated environment where, if it weren't for Slack and my web browser, I would spend all of my software crafting days. In fact, if I liked the extensions that integrated Slack and web browsing into Visual Studio Code, I could use it or them too; but I don't, so I don't.

Unlike IDEs of old, Visual Studio Code (often referred to as just VS Code or Code; its command line invocation is the delightful code the-file-I-want-to-edit.js) is implemented to avoid having opinions about the code you write. Instead, it is written to support your code dictating how you write code so that you can deftly move between projects without worrying that the settings for one team will somehow traipse all over the settings for another.

If you like vim-style editors, there's a vim emulator. If you don't like the menus, use Zen Mode or go fullscreen. Want to run tests integrated in the sidebar or the test file itself? Do that. Want to run them from a terminal or via a background task? Do that instead. I could (and probably will) write a whole new list on the best extensions to use with Visual Studio Code. It is versatile enough that I believe any developer could make it the editor they need.

Why is it great for software developers?

Either natively or via an extension, Visual Studio Code supports just about every aspect of the software development lifecycle you might encounter, on every platform that likely matters (Windows, macOS, and Linux), using any workflow that suits you. Not to mention it gets feature updates monthly and is supported by a huge community of users. It takes a little DNA from editors like Atom, Sublime, and your basic text editor, and elevates them to something, well, sublimer³.

WordPress
`wordpress.org`

A screenshot of a web browser showing the WordPress admin screen for choosing a theme, with several themes previewed in a grid.

When concentrating on software development tools, it is really easy for me to overlook this one – probably because it is not a software development tool, at least not from first impressions. I use WordPress for this blog; I always have. There are many alternatives out there, some more technically involved than others. I know I could use markdown in a GitHub repository; I have heard of Jekyll and Gatsby and so many other ways to generate a site; I know about Medium, but for me, WordPress wins because it has the features I need, including wide support for hosting, accessibility, themes, plugins, and autonomy from the whimsy of money-grabbing aggregation platforms like Medium.

The recent updates to the core editing experience, known as Gutenberg, have been amazing and the regular updates that are auto-applied without me raising a finger keep adding polish to an already awesome experience. I can schedule posts, manage comments, and use plugins to add syntax highlighting, footnotes, multi-factor authentication, backups, and spam filtering, to name just a few. Just as with Visual Studio Code extensions, I could write another list of plugins that I love for WordPress.

35% of the web uses WordPress, from hobby blogs to the biggest news sites online.
https://wordpress.org/

WordPress is used by over a third of the web. A third! That includes this blog, almost every site that Ann Arbor Give Camp has worked on in the past few years, and rollingstone.com! If you are considering putting together any kind of website, whether a blog, or something else, I highly recommend this freely available platform that has more versatility than Meryl Streep.

Why is it great for software developers?

As I mentioned earlier, WordPress seems entirely unrelated to software development (unless you are writing themes and plugins for WordPress⁴). However, I have learned that writing a blog about one's technical exploits is an absolutely amazing software development tool. I never realised how much I could learn just by trying to teach someone else. I used to think a blog was about its readers and being right; I have come to learn that a blog is merely its writer, being. The act of writing a blog is where its value lies. Writing this blog identifies gaps in my knowledge, personal biases, and more. It can shine a light on my laziness, focus my mind on a gnarly problem, and provide a scaffold from which to hang my personal growth. There are numerous times where writing posts for this blog (including some I never published) has helped me become a better software engineer. The fact that sometimes, someone reads it and finds it useful, entertaining, or infuriating is really just a bonus.

Write a blog. Hold an opinion. It is worth it.

GitHub
`github.com`

For some readers, this may seem a pointless entry. Using GitHub for collaborative software development is so incredibly common that suggesting folks should use it seems a bit like suggesting folks should try breathing air⁵. Though there are alternatives such as GitLab, GitKraken⁶, or Bitbucket, GitHub is almost ubiquitous. I do not recall an open source project that I have interacted with recently that was not hosted on GitHub⁷. With the recent changes allowing private and public repositories for personal accounts, the addition of GitHub Actions for automating all kinds of workflows (free to open source), and some great improvements to code review that have been released or are in beta, GitHub is an absolutely fantastic tool for those developing software. Add to that the integrations with other tools that I use like Visual Studio Code, Slack, and third-party issue trackers such as Jira, and GitHub shines. Many feared that its acquisition by Microsoft would doom it to failure, yet the Microsoft of today is a wonderful curator of open source goodies, and it seems that we all get to reap the benefits.

Why is it great for software developers?

Free backup of your source code, code reviews, automated workflows, and more, all on a tried and tested platform with a huge community. Not only that, but if you want, you get to collaborate, build, and present work with that community⁸.

🤷🏻‍♂️ That's it…

I don't know about you, but lists are exhausting. I've only written about three things and I'm already done with everything and ready for a lie down. I do stand by this list though. I really thought about what to put on it, considering the various tools I have used, not only because I have to, but because I want to. I would choose these tools from the very start of a new project unlike some others I use that, while I like them, are specific to a technology (such as React Developer Tools), are only what I use because the circumstances call for it, or are not really that standout against alternatives that I could be using.

Of course, this is all my personal opinion drawn from my personal experience; you have absolutely no obligation to agree with me. In fact, you have every right to use anything but the things I mentioned above, remaining in your state of willful ignorance, knowing you are wrong, unwilling to accept the truth as a way of life 😈. Just kidding, these are development tools, not religions – what works for you, works for you. These work for me. Perhaps you agree and want to pat me on the back for my excellent choices, maybe you care to tell me your preferred alternatives or shout at me about mine, or perhaps you read the footnotes and really have something to say about privilege, toxicity, and portfolios – feel free to engage in the comments; let's talk 🤗.

And with that, I bid you well until the 🎆New Year and all the productive software shenanigans that await us in 2020. 🙇🏻‍♂️

¹ Fear Of Missing Out – a most annoying acronym, I find; why? no idea
² it's three, three tools
³ don't you roll your eyes at me
⁴ PHP? Ew!
⁵ you really should, it is to die for
⁶ mention-worthy, if only for the pun
⁷ before GitHub, it was SourceForge, before the DevShare debacle – https://en.wikipedia.org/wiki/SourceForge
⁸ Side Note: I think it is perfectly fine not to have a portfolio; some of the best developers I know do not have any public source, or fancy stuff to show off. This weird obsession some folks have with portfolios feels like another toxic manifestation of privilege in the software development world, and I don't care for it. Let's share what we want (and are able) to share, and accept that if we don't, that doesn't mean we're shit developers. 💙

🙇🏻‍♂️ Introducing checksync

Photo by Clint Adair on Unsplash

Have you ever written code in more than one place that needs to stay in sync? Perhaps there is a tool in your framework of choice that can generate multiple files from a single source of truth, like T4 templates in the .NET world; perhaps not. Even if there is such a tool, it adds a layer of complexity that is not necessarily easy to grok. If you look at the output files or the template itself, it may not be clear what files are affected or related.

At Khan Academy, we have a linter, written in Python, that is executed whenever we create a new diff for review. It runs across a subset of our files and looks for blocks of text that are marked up with a custom comment format that identifies those blocks as being synchronized with other target blocks. Included in that markup is a checksum of the target block content such that if the target changes, we will get an error from the linter. This is our signal to check if further changes are needed and then update the checksums that are invalidated. The only bugbear folks seem to have is that instead of offering an option to auto-fix checksums in need of update, it outputs a Perl script that has to be copied and run for that purpose.

Small bugbear aside, this tool is fantastic. It enables us to link code blocks that need to be synchronized and catches when we change them with reasonably low overhead. Though I believe it is hugely useful, it is sadly custom to our codebase. I have long wanted to address that and create an open source version for everyone to use. checksync is that open source version.

🤔 The Requirements

Before writing checksync, I started out with the following requirements:

- It should work with existing marked up code in the Khan Academy codebase; specifically,
  1. File paths are relative to the project root directory
  2. Checksums are calculated using Adler-32
  3. Both // and # style comments are used to comment the markup tags
  4. Start tag format is:
     sync-start:<ID> <CHECKSUM> <TARGET_FILE_PATH>
  5. End tag format is:
     sync-end:<ID>
  6. Multiple start tags can exist for the same tag ID but with different target files
  7. Sync tags are not included in the checksum'd content
  8. An extra line of blank content is included in the checksum'd content (due to a holdover from an earlier implementation)
  9. .gitignore files should be ignored
  10. Additional files can be ignored
- It should be comparably performant to the existing linter
  - The linter ran over the entire Khan Academy website codebase in less than 15 seconds
- It should auto-update invalid checksums if asked to do so
- It should output file paths such that editors like Visual Studio Code can open them on the correct line
- It should support more comment styles
- It should generally support any text file
- It should run on Node 8 and above
  - Some of our projects are still using Node 8 and I wanted to support those uses

With these requirements in mind, I implemented checksync (and ancesdir, which I ended up needing to ensure project root-relative file paths). By making it compatible with the existing Khan Academy linter, I could leverage the existing Khan Academy codebase to help measure performance and verify that things worked correctly. After a few changes to address various bugs and performance issues, it is still mildly slower than the Python equivalent, but the added features it provides more than make up for that (especially the fact that it is available to folks outside of our organization).
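For anyone unfamiliar with Adler-32, the checksum named in the requirements, here is a minimal sketch of the algorithm in JavaScript. The function name adler32 is my own for illustration, and this is not lifted from checksync; in particular, exactly which lines checksync feeds into the checksum (such as the extra blank line mentioned above) is not reproduced here.

```javascript
// Minimal Adler-32 sketch (hypothetical helper, not checksync's actual code).
// It checksums the UTF-8 bytes of the given text.
function adler32(text) {
    const MOD = 65521; // largest prime below 2^16
    let a = 1;
    let b = 0;
    for (const byte of Buffer.from(text, "utf8")) {
        a = (a + byte) % MOD;
        b = (b + a) % MOD;
    }
    // Combine the two 16-bit halves into an unsigned 32-bit value.
    return ((b << 16) | a) >>> 0;
}

console.log(adler32("Wikipedia")); // 300286872 (0x11E60398)
```

Because the checksum is cheap to compute and changes whenever any byte of the block changes, it works well as a "has the other side moved?" signal, even though it is not suitable for anything security related.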

🎉 Check It Out

checksync includes a --help option to get information on usage. I have included the output below to give an overview of usage and the options available to customize how checksync runs.

checksync --help

checksync ✅ 🔗

Checksync uses tags in your files to identify blocks that need to remain
synchronised. It works on any text file as long as it can find the tags.

Tag Format

Each tagged block is identified by one or more sync-start tags and a single
sync-end tag.

The sync-start tags take the form:

    <comment> sync-start:<marker_id> <?checksum> <target_file>

The sync-end tags take the form:

    <comment> sync-end:<marker_id>

Each marker_id can have multiple sync-start tags, each with a different
target file, but there must be only one corresponding sync-end tag.

Where:

    <comment>       is one of the comment tokens provided by the --comment
                    argument

    <marker_id>     is the unique identifier for this marker

    <checksum>      is the expected checksum of the corresponding block in
                    the target file

    <target_file>   is the path from your package root to the target file
                    with a corresponding sync block with the same marker_id

Usage

checksync <arguments> <include_globs>

Where:

    <arguments>       are the arguments you provide (see below)

    <include_globs>   are glob patterns for identifying files to check

Arguments

    --comments,-c      A string containing comma-separated tokens that
                       indicate the start of lines where tags appear.
                       Defaults to "//,#".

    --dry-run,-n       Ignored unless supplied with --update-tags.

    --help,-h          Outputs this help text.

    --ignore,-i        A string containing comma-separated globs that identify
                       files that should not be checked.

    --ignore-files     A comma-separated list of .gitignore-like files that
                       provide path patterns to be ignored. These will be
                       combined with the --ignore globs.
                       Ignored if --no-ignore-file is present.
                       Defaults to .gitignore.

    --no-ignore-file   When true, does not use any ignore file. This is
                       useful when the default value for --ignore-file is not
                       wanted.

    --root-marker,-m   By default, the root directory (used to generate
                       interpret and generate target paths for sync-start
                       tags) for your project is determined by the nearest
                       ancestor directory to the processed files that
                       contains a package.json file. If you want to
                       use a different file or directory to identify your
                       root directory, specify that using this argument.
                       For example, --root-marker .gitignore would mean
                       the first ancestor directory containing a
                       .gitignore file.

    --update-tags,-u   Updates tags with incorrect target checksums. This
                       modifies files in place; run with --dry-run to see what
                       files will change without modifying them.

    --verbose          More details will be added to the output when this
                       option is provided. This is useful when determining if
                       provided glob patterns are applying as expected, for
                       example.

And here is a simple example (taken from the checksync code repository) of running checksync against a directory with two files, using the defaults. The two files are given below to show how they are marked up for use with checksync. In this example, the checksums do not match the tagged content (though you are not expected to know that just by looking at the files – that's what checksync is for).

// This is a a javascript (or similar language) file

// sync-start:update_me 45678 __examples__/checksums_need_updating/b.py
const someCode = "does a thing";
console.log(someCode);
// sync-end:update_me

# Test file in Python style

# sync-start:update_me 4567 __examples__/checksums_need_updating/a.js
code = 1
# sync-end:update_me

Example output showing mismatched checksums
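To make the tag format from the help text a little more concrete, here is a rough sketch of how a sync-start line could be picked apart with a regular expression. The names parseSyncStart and escapeRegExp are hypothetical and the pattern is my own approximation of the documented format; checksync's real parser may well differ.

```javascript
// Hypothetical helpers for illustration; not checksync's actual parser.
function escapeRegExp(token) {
    return token.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Parses "<comment> sync-start:<marker_id> <?checksum> <target_file>".
// Returns null when the line is not a sync-start tag.
function parseSyncStart(line, commentTokens = ["//", "#"]) {
    const comment = commentTokens.map(escapeRegExp).join("|");
    const pattern = new RegExp(
        `^\\s*(?:${comment})\\s*sync-start:(\\S+)\\s+(?:(\\d+)\\s+)?(\\S+)\\s*$`
    );
    const match = pattern.exec(line);
    if (match === null) {
        return null;
    }
    const markerId = match[1];
    // The checksum is optional in the documented format.
    const checksum = match[2] === undefined ? null : match[2];
    const targetFile = match[3];
    return { markerId, checksum, targetFile };
}

console.log(
    parseSyncStart(
        "// sync-start:update_me 45678 __examples__/checksums_need_updating/b.py"
    )
);
```

Run against the first example file above, this yields the marker ID update_me, the checksum 45678 (kept as a string), and the target path of the Python file.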

Additional examples that demonstrate various synchronization conditions and error cases can be found in the checksync code repository. To give checksync a try for yourself:

- Install it from the npmjs.com registry:
  yarn add checksync
- Get the source from github.com/somewhatabstract/checksync and follow the usage instructions.

I hope you find this tool useful, and if you do or you have any questions, please do comment on this blog.

🙇🏻‍♂️ Introducing ancesdir

Photo by Maksym Kaharlytskyi on Unsplash

After many years of software development, I finally published my own NPM package. In fact, I published two. I was working on my checksync tool when I realised that I needed the package that this blog introduces. More on checksync in the next entry.

https://www.npmjs.com/package/ancesdir

🤔 What is root? Where is root?

Quite often, when working on some projects at Khan Academy, we need to know the root directory of the project. This enables us to write tools, linters, and tests that use root-relative paths, which in turn can make it much easier to refactor code. However, determining the root path of a project is not necessarily simple.

First, there is working out what identifies the root of a project. Is it the node_modules directory? The package.json file? The existence of a .git folder? It may seem obvious to use one of these, but all these things have something in common; they don't necessarily exist. We can configure our package manager to have package.json and node_modules in non-standard places and we might change our source control, or not even run our code from within a clone of our repository. Determining the root folder by relying on any of these things as a marker is potentially not going to work.

Second, the code to walk the directory structure to find the given "marker" file or directory is not trivial. Sharing a common implementation within your project means everything that needs it, needs to locate it; in JavaScript, that means a relative path, at which point, you may as well just use a relative path to the known root directory and skip the shared approach altogether. Yet, if you don't share a common implementation from a single location, then the code has to be duplicated everywhere you need it. I don't know about you, but that feels wrong.

💁🏻‍♂️ Solution: ancesdir

The issue of sharing a common implementation is easiest to solve. If that common implementation is installed as an NPM package, we don't need to include it via a relative path; we can just import it by its package name. There are packages out there that do this, but the ones I found all assumed some level of default setup, failing to acknowledge that this may change. In turn, they did not support a monorepo setup where there could be multiple sub-projects. How could one find the root folder of the monorepo from within a sub-project if all we used to identify the root folder were package.json? What if we wanted to sometimes get the root of the sub-project and sometimes the root of the monorepo?

I needed a way to identify a specific ancestor directory based on a known marker file or directory that would work even with non-standard setups. At Khan Academy, we have a marker file at the root of the project that is there solely to identify its parent directory as the project root. This file is agnostic of tech stack; it's just an empty file. It is solely there to say "this directory is the root directory". No tooling changes are going to render this mechanism broken unexpectedly unless they happen to use the same filename, which is unlikely. This way, we can find the repository root easily by locating that file. I wanted a package that could work just as easily with this custom marker file as it could with package.json.

I created ancesdir to fulfill these requirements¹.

yarn add ancesdir

The API is simple. In the default case, all you need to do is:

import ancesdir from "ancesdir";

console.log(`ancesdir's root directory is ${ancesdir()}`);

If you have a standard setup, with a package.json file, you will get the ancestor directory of the ancesdir package that contains that package.json file.

However, if you want the ancestor directory of the current file or a different path, you might use ancesdir like this:

import ancesdir from "ancesdir";

console.log(`This file's root directory is ${ancesdir(__dirname)}`);

In this example, we have given ancesdir a path from which to begin its search. Of course, that still only works if there is an ancestor directory that contains a package.json file. What if that's not what you want?

For the more complex scenarios, like monorepos, for example, you can use ancesdir with a marker name, like this:

import ancesdir from "ancesdir";

console.log(`The monorepo root directory is ${ancesdir(__dirname, ".my_unique_root_marker_file")}`);

ancesdir will then give you the directory you seek (or null if it cannot be found). Not only that, but repeated requests will work faster as the results are cached as the directory tree is traversed.

Conclusion

If you find yourself needing a utility like this, check out ancesdir. I hope y'all find it useful and I would love to hear if you do. You can check out the source on GitHub.

¹ The name is a play on the word "ancestor", while also attempting to indicate that it has something to do with directories. I know, clever, right?
