yarn – somewhat abstract

🏗 Creating An Express Server

This is part 4 of my series on server-side rendering (SSR).

Over the previous three posts in this series we have described what server-side rendering (SSR) is, created a simple application using React, and discussed the architecture of a privacy-aware server to ensure we understand some of the sharp edges around SSR. Now, we will actually implement a basic server. Just as with the React application we created, the server we create will not be a complete solution, but it will provide a foundation from which we can continue to explore SSR.

✨ A New Project

Where do we start? Well, we need a server that can receive web requests and respond to them. For that, I am going to use Express¹, but first I need a project.

NOTE: Where you see yarn, know that you can use your own package manager as you see fit.

1. Add a new repository on GitHub (or your source control platform of choice).
2. Make a new folder locally for your code.
3. cd to your new folder and run git init
4. git remote add origin <your github repo URL>
5. git pull origin master
6. git branch --set-upstream-to=origin/master
7. Create and commit a .gitignore file.
8. Initialize it for JavaScript package management with yarn init
9. Run yarn install to generate the lock file.
10. Commit the yarn.lock and package.json to the git repository.

Great, so now we have a project we can start working on. Let's add Express.

yarn add express

This should update our package.json and yarn.lock, so don't forget to commit those changes. I also recommend pushing often to your remote repository; that way, your code is backed up online in case your computer suffers a nasty accident².

👋🏻 Hello World!

At this point we need to write some code. We need to set up a route for our server that can handle providing a rendered result for any URL that our application might have. There are a couple of ways we could do this:

1. Assuming that our server is invoked by some intermediate layer, such as a cache, we could have the server implement a single route (e.g. /render) and pass the URL to be rendered as a query parameter.
2. Our server could assume the URL is to be rendered by the client code and just accept any URL.

Option 1 gives us a great deal of flexibility in what our server can do, but it forces us to ensure that there is a layer between the original browser request and our server, as something has to be responsible for constructing the appropriate /render route request. Option 2 removes the need to have an intermediate layer, but it perhaps restricts us from expanding server functionality. Of course, option 2 can be changed to option 1 if the need arises, so we can go with option 2 for now, knowing that later, it can be updated to suit changing needs.
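
To make the difference between the two options concrete, here is a minimal sketch of how a server might work out which application URL it has been asked to render. The function name resolveRenderTarget and the url query parameter are my own assumptions for illustration; nothing in this series depends on them.

```javascript
// Hypothetical helper: given the URL of an incoming request, work out which
// application URL should be rendered.
function resolveRenderTarget(requestUrl) {
  // The base is only needed so that relative request URLs can be parsed.
  const url = new URL(requestUrl, "http://localhost");

  if (url.pathname === "/render") {
    // Option 1: a single /render route, with the target passed as a query
    // parameter (assumed here to be called "url"). Returns null if missing.
    return url.searchParams.get("url");
  }

  // Option 2: the request URL is itself the URL to be rendered.
  return url.pathname + url.search;
}

console.log(resolveRenderTarget("/render?url=%2Fabout"));
console.log(resolveRenderTarget("/contact"));
```

Switching from option 2 to option 1 later would mostly be a matter of changing where this value comes from, which is part of why starting with option 2 feels low risk.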

Normally, I would add lots of other things to this server to improve development and runtime investigations, such as linters, testing, and logging, but for the sake of brevity, right now we will stick to the main functionality.

const express = require("express");

const port = 3000;
const app = express();

app.get("/*", (req, res) => res.send("Hello World!"));

app.listen(port, () => console.log(`Example app listening on port ${port}!`));

This is our index.js file. It is not doing a lot. On line 4, we create our express app. On line 6, we tell it that for any route matching /*, return Hello World!. On line 8, we tell it to listen for requests on port 3000³.

If we run this app with node index.js, we can go to our browser, visit any route starting with localhost:3000 and see the text, Hello World!. This is fantastic. We have a server and it is responding as we hope. Since we are going to run this often as we make changes, I will add a script to our package.json to run node index.js for us.

{
  "name": "hello-react-world-ssr",
  "version": "0.0.1",
  "description": "A server-side rendering server",
  "main": "index.js",
  "license": "MIT",
  "dependencies": {
    "express": "^4.17.1"
  },
  "scripts": {
    "start": "node index.js"
  }
}

In the package.json file shown above, the scripts section is the part I added; it contains the new start command. From now on, we can start our app with yarn start. The next step is getting our server to render our React application. Before we do that, consider these questions:

1. How does the server know about and load the code for our React application?
2. How does the server get the rendered result to send back?
3. How do we isolate render requests to avoid side-effects bleeding across requests?

🤔 The Hows

The answers to the first two questions have implications beyond the server itself, possibly influencing both our client application and any deployment process.

How our server knows about and loads our client application may affect how our server is deployed. Some server-side rendering solutions involve deploying the client-side code with the server so that it has direct access to the appropriate code; others use a mechanism such as looking up in a manifest to identify the files to load from a separate location (such as a content delivery network (CDN)). Neither of these is necessarily a bad choice – they both have their advantages and disadvantages. For example, deploying the server with the right code means:

✅ The server has fast access to the client application it is rendering
✅ The server can integrate nicely with the client application
❌ The render server must be deployed every time the client application changes
❌ The server is closely coupled to the client application

Whereas, looking up files in a manifest and loading them from elsewhere means:

✅ The render server rarely requires updating
✅ The server can render more than one application
❌ The server will probably need to cache JavaScript files locally or be at the mercy of latency when communicating with the CDN
❌ The client applications that the server renders likely need to include custom code to support being rendered by that server
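
As a rough illustration of the manifest approach, the sketch below shows one shape such a manifest could take and how a server might look up an application in it. Everything here is hypothetical: the manifest layout, the lookupApp function, and the cdn.example.com URLs are assumptions made purely for the example.

```javascript
// Hypothetical manifest: maps an application name to the files the render
// server would need to load. The names and URLs are illustrative only.
const manifest = {
  "hello-react-world": {
    entry: "https://cdn.example.com/hello-react-world/main.js",
    styles: ["https://cdn.example.com/hello-react-world/main.css"],
  },
};

// Look up the files for an application, failing loudly if it is unknown.
function lookupApp(manifest, appName) {
  const known = Object.prototype.hasOwnProperty.call(manifest, appName);
  if (!known) {
    throw new Error(`No manifest entry for "${appName}"`);
  }
  return manifest[appName];
}

console.log(lookupApp(manifest, "hello-react-world").entry);
```

Because the server only knows about the manifest, adding a second application would mean adding another entry rather than redeploying the server, which is the main appeal of this approach.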

Being aware of how these approaches differ (and they differ in more than just the ways I have suggested) is useful in understanding the trade-offs we must make when implementing our render server. Perhaps answering the second question will help us decide which route to take: how will our server get a rendered result of the client application?

Our server is going to invoke a call from the React framework that renders our React application to a string, rather than mounting it inside the DOM of a browser. To do that, it needs a React component to render, so it must load our client application and get the root-level component. In addition, assuming our render server is rendering the entire page and not just the React component, the server is likely going to need to gather additional information, such as which files must be loaded in the page, the page title, etc.

This whole process of capturing the application render and associated metadata requires interplay between the server code and the client code. Revisiting the first question and the two approaches I gave: if the server has the client code deployed with it, the server could know exactly which files to load to render the component, importing those directly and using them accordingly; if the server is less-closely coupled, we likely need some mechanism whereby the client application itself does more of the heavy lifting by hooking into some framework provided by the server, even if that is just exporting a specific object so that server can identify the appropriate things to coordinate rendering.

Ultimately, either we have a server that is custom built to our application, or we have a server that is built to support many applications. What to do? I say, dive in and try them both. To that end, next time we will look at the first option where the server knows all about the client application (though we may cut some corners to get to the salient points), and we will answer that third question: how do we isolate our renders?

🙇🏻‍♂️ In Conclusion

Herein we have created our server, though it does not do much yet. We have also considered two different approaches to connect our server to our client application: closely-coupled or more open, and we have started to think about how the server will isolate and respond to render requests.

This week's entry turned out a little longer than I had intended, and covered fewer things than I had hoped. Sometimes that is the way it goes. One of the biggest reasons I write these blogs is to discover what I do and do not know about something. Often in the effort of explaining it to someone else, I identify a bias that I have without any supporting evidence, or a topic I grasp that is far harder to explain than I expected.

Until next time, when we start to implement our server-side rendering, please leave a comment. Perhaps you have a question, a personal experience writing a render server, or want to take umbrage at something I have stated. I look forward to learning with you as we continue this journey into the land of SSR. 🗺

¹ I find Express easy enough to use and well-supported, though there are other options that one could use instead if one were so inclined.
² A lesson from bitter experience: hard drives die (especially SSDs) without warning, drinks spill, laptops get dropped – keep your work backed up.
³ The port is currently hard-coded for simplicity, but we could make this configurable.

✨ Creating A React App


This is part 2 of my series on server-side rendering (SSR).

Last week, I gave my own brief history of web sites and how their frontend implementation has drifted from entirely server-based, to entirely client-based, and is now settling (perhaps) on a hybrid that we call server-side rendering (SSR). The goal of this journey is to poke around the gnarly bones of SSR and learn what we learn. We may make mistakes, break idioms, and portray ourselves as fools, but we will definitely learn. For that reason, we are not going to bother with things like Next.js, which have already fleshed over and hidden away the gnarly bones for us¹.

Now, before we dip even further into the specific world of SSR, we are going to need an app. To be specific, we are going to need a React app.

Hello, React World!

Before creating a React app, we need a package manager; either npm or yarn will suffice. Though we could put together our own app from scratch, there is no need to as the handy create-react-app package exists. There are numerous ways to use this, but the easiest is to use yarn create or npx, which will do the work of obtaining the package and executing it all in one go.

For yarn, drop the create- prefix from create-react-app, since yarn create adds it for you:

yarn create react-app <app-name-here>

For npx:

npx create-react-app <app-name-here>

I want to also put this in a git repository so I can track my changes. I would normally make a directory, run git init and then get started. In this case, we do not need to as create-react-app takes care of that for us.

So, let's begin. Open a terminal and invoke create-react-app.

yarn create react-app hello-react-world

After executing this, you will have a working React app that uses react-scripts to manage the basics. This is perfect for our initial journey. If we navigate to that directory and run the project, we can see our app in action.

cd hello-react-world
yarn start

Fantastic. We have an app. Before we do any more, let's get some remote source control underway. I am paranoid about my machine dying and losing all my work, so having an off-machine place to store things is really useful. First, add a new repository on your source control site of choice (I prefer GitHub). Second, connect the local repository to the remote one:

git remote add origin <repo-url>
git fetch
git branch --set-upstream-to origin/master master

Since we want our local code to be the first commit and we're pushing to a brand new repository, we can force push what we have.

git push -f

Routing

Okay, we have an app and it is in source control. This is usually a good spot to spend some time setting up code quality tools like eslint and prettier. I am going to be naughty and skip right over that right now and save it for a different post, perhaps. Instead, let us add some routing to our fledgling application.

There are a few options for implementing routing in a React application (some frameworks, like Next.js, even provide it out of the box). We are going to use React Router. There are two variants of React Router; one for React on websites, and one for React Native on mobile. We want the website variant, which is provided by the react-router-dom package.

yarn add react-router-dom

Now we will edit our app to have a couple of routes. The main app is defined in the src/App.js file. It should look something like this. We are going to replace the contents of the header below the logo (the paragraph and the Learn React link); we are also going to add some lines too.

import React from 'react';
import logo from './logo.svg';
import './App.css';

function App() {
  return (
    <div className="App">
      <header className="App-header">
        <img src={logo} className="App-logo" alt="logo" />
        <p>
          Edit <code>src/App.js</code> and save to reload.
        </p>
        <a
          className="App-link"
          href="https://reactjs.org"
          target="_blank"
          rel="noopener noreferrer"
        >
          Learn React
        </a>
      </header>
    </div>
  );
}

export default App;

There are two things we want to add.

The routes to render our pages.
The links to navigate to our routes.

First we import four things from react-router-dom:

BrowserRouter: This is the root of our React Router-based navigation. Basically, the router is responsible for the routing (I'm sure you guessed that).
Link: This replaces the anchor tag (<a>) for our navigation.
Route: This is used to render a matched route.
Switch: This allows us to specify a table of possible routes that can be used to work out what should handle the URL currently being viewed.

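With these things, we can then add some routes. I am adding Home, About, and Contact. Here is my app code after the edit. The new lines are the react-router-dom import, the BrowserRouter wrapper, the App-links block of Link components, and the App-content section containing the Switch.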

import React from 'react';
import {BrowserRouter, Link, Route, Switch} from "react-router-dom";
import logo from './logo.svg';
import './App.css';

function App() {
  return (
    <BrowserRouter>
      <div className="App">
        <header className="App-header">
          <img src={logo} className="App-logo" alt="logo" />
          <div className="App-links">
            <Link className="App-link" to="/">Home</Link>
            <Link className="App-link" to="/about">About</Link>
            <Link className="App-link" to="/contact">Contact</Link>
          </div>
        </header>
        <section className="App-content">
          <Switch>
            <Route path="/about">
              This is the about page!
            </Route>
            <Route path="/contact">
              This is the contact page!
            </Route>
            <Route path="/">
              This is the home page!
            </Route>
          </Switch>
        </section>
      </div>
    </BrowserRouter>
  );
}

export default App;

I also edited the CSS a little, but only to make things easier to see. The important bits are the router wrapping our app, the Link components to perform navigation, and the Route components that render each route. With this, we now have a single-page React app that has three pages for home, about, and contact.

This is going to be the application we will eventually render on the server. The important takeaway at this point is that we are not going to change the functionality of this app in order to achieve our aim. There are some changes we must make to support SSR, but we will not have two versions of the code. The code that runs in the browser will run on the server.

Next time, we are going to set up a server that will perform our SSR and consider what changes we need to make to our application infrastructure in order to support it. We might even get our first server-side rendered page. Until then, thanks for joining me on this continued exploration of server-side rendering using React.

¹ Of course, if starting a new project knowing you need SSR, you should explore solutions like Next.js.

🙇🏻‍♂️ Introducing checksync


Have you ever written code in more than one place that needs to stay in sync? Perhaps there is a tool in your framework of choice that can generate multiple files from a single source of truth, like T4 templates in the .NET world; perhaps not. Even if there is such a tool, it adds a layer of complexity that is not necessarily easy to grok. If you look at the output files or the template itself, it may not be clear what files are affected or related.

At Khan Academy, we have a linter, written in Python, that is executed whenever we create a new diff for review. It runs across a subset of our files and looks for blocks of text that are marked up with a custom comment format that identifies those blocks as being synchronized with other target blocks. Included in that markup is a checksum of the target block content such that if the target changes, we will get an error from the linter. This is our signal to check if further changes are needed and then update the checksums that are invalidated. The only bugbear folks seem to have is that instead of offering an option to auto-fix checksums in need of update, it outputs a Perl script that has to be copied and run for that purpose.

Small bugbear aside, this tool is fantastic. It enables us to link code blocks that need to be synchronized and catches when we change them with reasonably low overhead. Though I believe it is hugely useful, it is sadly custom to our codebase. I have long wanted to address that and create an open source version for everyone to use. checksync is that open source version.

🤔 The Requirements

Before writing checksync, I started out with the following requirements:

It should work with existing marked up code in the Khan Academy codebase; specifically,
1. File paths are relative to the project root directory
2. Checksums are calculated using Adler-32
3. Both // and # style comments are used to comment the markup tags
4. Start tag format is:
  sync-start:<ID> <CHECKSUM> <TARGET_FILE_PATH>
5. End tag format is:
  sync-end:<ID>
6. Multiple start tags can exist for the same tag ID but with different target files
7. Sync tags are not included in the checksum'd content
8. An extra line of blank content is included in the checksum'd content (due to a holdover from an earlier implementation)
9. .gitignore files should be ignored
10. Additional files can be ignored
It should be comparably performant to the existing linter
- The linter ran over the entire Khan Academy website codebase in less than 15 seconds
It should auto-update invalid checksums if asked to do so
It should output file paths such that editors like Visual Studio Code can open them on the correct line
It should support more comment styles
It should generally support any text file
It should run on Node 8 and above
- Some of our projects are still using Node 8 and I wanted to support those uses

With these requirements in mind, I implemented checksync (and ancesdir, which I ended up needing to ensure project root-relative file paths). By making it compatible with the existing Khan Academy linter, I could leverage the existing Khan Academy codebase to help measure performance and verify that things worked correctly. After a few changes to address various bugs and performance issues, it is still mildly slower than the Python equivalent, but the added features it provides more than make up for that (especially the fact that it is available to folks outside of our organization).
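
For anyone unfamiliar with Adler-32 (the checksum named in the requirements above), here is a minimal sketch of the algorithm in JavaScript. This is not checksync's actual code; the function name adler32 is my own for illustration, and the sketch says nothing about exactly which lines checksync feeds into the checksum.

```javascript
// Minimal Adler-32 sketch: two running sums over the bytes of the input,
// each taken modulo 65521 (the largest prime below 2^16).
function adler32(text) {
  const MOD = 65521;
  let a = 1; // sum of all bytes, plus one
  let b = 0; // sum of every intermediate value of a

  for (const byte of Buffer.from(text, "utf8")) {
    a = (a + byte) % MOD;
    b = (b + a) % MOD;
  }

  // Combine as (b << 16) | a; >>> 0 keeps the result an unsigned 32-bit value.
  return ((b << 16) | a) >>> 0;
}

console.log(adler32("Wikipedia")); // 300286872 (0x11E60398)
```

Adler-32 is cheap to compute, which matters when a linter has to run over an entire codebase, though it is not a cryptographic hash and is only intended to detect that content has changed.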

🎉 Check It Out

checksync includes a --help option to get information on usage. I have included the output below to give an overview of usage and the options available to customize how checksync runs.

checksync --help

checksync ✅ 🔗

Checksync uses tags in your files to identify blocks that need to remain
synchronised. It works on any text file as long as it can find the tags.

Tag Format

Each tagged block is identified by one or more sync-start tags and a single
sync-end tag.

The sync-start tags take the form:

    <comment> sync-start:<marker_id> <?checksum> <target_file>

The sync-end tags take the form:

    <comment> sync-end:<marker_id>

Each marker_id can have multiple sync-start tags, each with a different
target file, but there must be only one corresponding sync-end tag.

Where:

    <comment>       is one of the comment tokens provided by the --comment
                    argument

    <marker_id>     is the unique identifier for this marker

    <checksum>      is the expected checksum of the corresponding block in
                    the target file

    <target_file>   is the path from your package root to the target file
                    with a corresponding sync block with the same marker_id

Usage

checksync <arguments> <include_globs>

Where:

    <arguments>       are the arguments you provide (see below)

    <include_globs>   are glob patterns for identifying files to check

Arguments

    --comments,-c      A string containing comma-separated tokens that
                       indicate the start of lines where tags appear.
                       Defaults to "//,#".

    --dry-run,-n       Ignored unless supplied with --update-tags.

    --help,-h          Outputs this help text.

    --ignore,-i        A string containing comma-separated globs that identify
                       files that should not be checked.

    --ignore-files     A comma-separated list of .gitignore-like files that
                       provide path patterns to be ignored. These will be
                       combined with the --ignore globs.
                       Ignored if --no-ignore-file is present.
                       Defaults to .gitignore.

    --no-ignore-file   When true, does not use any ignore file. This is
                       useful when the default value for --ignore-file is not
                       wanted.

    --root-marker,-m   By default, the root directory (used to generate
                       interpret and generate target paths for sync-start
                       tags) for your project is determined by the nearest
                       ancestor directory to the processed files that
                       contains a package.json file. If you want to
                       use a different file or directory to identify your
                       root directory, specify that using this argument.
                       For example, --root-marker .gitignore would mean
                       the first ancestor directory containing a
                       .gitignore file.

    --update-tags,-u   Updates tags with incorrect target checksums. This
                       modifies files in place; run with --dry-run to see what
                       files will change without modifying them.

    --verbose          More details will be added to the output when this
                       option is provided. This is useful when determining if
                       provided glob patterns are applying as expected, for
                       example.

And here is a simple example (taken from the checksync code repository) of running checksync against a directory with two files, using the defaults. The two files are given below to show how they are marked up for use with checksync. In this example, the checksums do not match the tagged content (though you are not expected to know that just by looking at the files – that's what checksync is for).

// This is a a javascript (or similar language) file

// sync-start:update_me 45678 __examples__/checksums_need_updating/b.py
const someCode = "does a thing";
console.log(someCode);
// sync-end:update_me

# Test file in Python style

# sync-start:update_me 4567 __examples__/checksums_need_updating/a.js
code = 1
# sync-end:update_me

Example output showing mismatched checksums
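
To show how the sync-start tags in the two files above break down into their parts, here is a small sketch of a parser for a single line. It follows the tag format given in the help text, but it is my own illustration rather than checksync's implementation; the names parseSyncStart and escapeForRegExp are hypothetical, and the checksum is kept as a string so as not to assume anything about how it is stored.

```javascript
// Escape a comment token so it can be embedded safely in a regular expression.
function escapeForRegExp(token) {
  return token.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical parser for one line of the form:
//   <comment> sync-start:<marker_id> <?checksum> <target_file>
// Returns null when the line is not a sync-start tag.
function parseSyncStart(line, commentTokens = ["//", "#"]) {
  const comments = commentTokens.map(escapeForRegExp).join("|");
  const pattern = new RegExp(
    `^\\s*(?:${comments})\\s*sync-start:(\\S+)\\s+(?:(\\d+)\\s+)?(\\S+)\\s*$`,
  );
  const match = pattern.exec(line);
  if (match == null) {
    return null;
  }
  const [, id, checksum, target] = match;
  // The checksum is optional in the tag format, so it may be undefined here.
  return { id, checksum, target };
}

console.log(
  parseSyncStart(
    "// sync-start:update_me 45678 __examples__/checksums_need_updating/b.py",
  ),
);
```

With the default comment tokens, the same function handles both the JavaScript-style and Python-style tags shown in the example files.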

Additional examples that demonstrate various synchronization conditions and error cases can be found in the checksync code repository. To give checksync a try for yourself:

Install it from the npmjs.com repository:
yarn add checksync
Get the source from github.com/somewhatabstract/checksync and follow the usage instructions.

I hope you find this tool useful, and if you do or you have any questions, please do comment on this blog.

🙇🏻‍♂️ Introducing ancesdir


After many years of software development, I finally published my own NPM package. In fact, I published two. I was working on my checksync tool when I realised that I needed the package that this blog introduces. More on checksync in the next entry.

https://www.npmjs.com/package/ancesdir

🤔 What is root? Where is root?

Quite often, when working on some projects at Khan Academy, we need to know the root directory of the project. This enables us to write tools, linters, and tests that use root-relative paths, which in turn can make it much easier to refactor code. However, determining the root path of a project is not necessarily simple.

First, there is working out what identifies the root of a project. Is it the node_modules directory? The package.json file? The existence of a .git folder? It may seem obvious to use one of these, but all these things have something in common; they don't necessarily exist. We can configure our package manager to have package.json and node_modules in non-standard places and we might change our source control, or not even run our code from within a clone of our repository. Determining the root folder by relying on any of these things as a marker is potentially not going to work.

Second, the code to walk the directory structure to find the given "marker" file or directory is not trivial. Sharing a common implementation within your project means everything that needs it, needs to locate it; in JavaScript, that means a relative path, at which point, you may as well just use a relative path to the known root directory and skip the shared approach altogether. Yet, if you don't share a common implementation from a single location, then the code has to be duplicated everywhere you need it. I don't know about you, but that feels wrong.

💁🏻‍♂️ Solution: ancesdir

The issue of sharing a common implementation is the easiest to solve. If that common implementation is installed as an NPM package, we don't need to include it via a relative path; we can just import it by its package name. There are packages out there that do this, but the ones I found all assumed some level of default setup, failing to acknowledge that this may change. In turn, they did not support a monorepo setup where there could be multiple sub-projects. How could one find the root folder of the monorepo from within a sub-project if all we used to identify the root folder were package.json? What if we wanted to sometimes get the root of the sub-project and sometimes the root of the monorepo?

I needed a way to identify a specific ancestor directory based on a known marker file or directory that would work even with non-standard setups. At Khan Academy, we have a marker file at the root of the project that is there solely to identify its parent directory as the project root. This file is agnostic of tech stack; it's just an empty file. It is solely there to say "this directory is the root directory". No tooling changes are going to render this mechanism broken unexpectedly unless they happen to use the same filename, which is unlikely. This way, we can find the repository root easily by locating that file. I wanted a package that could work just as easily with this custom marker file as it could with package.json.

I created ancesdir to fulfill these requirements¹.

yarn add ancesdir

The API is simple. In the default case, all you need to do is:

import ancesdir from "ancesdir";

console.log(`ancesdir's root directory is ${ancesdir()}`);

If you have a standard setup, with a package.json file, you will get the ancestor directory of the ancesdir package that contains that package.json file.

However, if you want the ancestor directory of the current file or a different path, you might use ancesdir like this:

import ancesdir from "ancesdir";

console.log(`This file's root directory is ${ancesdir(__dirname)}`);

In this example, we have given ancesdir a path from which to begin its search. Of course, that still only works if there is an ancestor directory that contains a package.json file. What if that's not what you want?

For the more complex scenarios, like monorepos, for example, you can use ancesdir with a marker name, like this:

import ancesdir from "ancesdir";

console.log(`The monorepo root directory is ${ancesdir(__dirname, ".my_unique_root_marker_file")}`);

ancesdir will then give you the directory you seek (or null if it cannot be found). Not only that, but repeated requests will work faster as the results are cached as the directory tree is traversed.

Conclusion

If you find yourself needing a utility like this, check out ancesdir. I hope y'all find it useful and I would love to hear if you do. You can check out the source on GitHub.

¹ The name is a play on the word "ancestor", while also attempting to indicate that it has something to do with directories. I know, clever, right?