🖥 Our first server-side render

Photo by Markus Spiske on Unsplash

This is part 5 of my series on server-side rendering (SSR):

  1. 🤷🏻‍♂️ What is server-side rendering (SSR)?
  2. ✨ Creating A React App
  3. 🎨 Architecting a privacy-aware render server
  4. 🏗 Creating An Express Server
  5. [You are here] 🖥 Our first server-side render
  6. 🖍 Combining React Client and Render Server for SSR
  7. ⚡️ Static Router, Static Assets, Serving A Server-side Rendered Site
  8. 💧 Hydration and Server-side Rendering
  9. 🦟 Debugging and fixing hydration issues
  10. 🛑 React Hydration Error Indicator
  11. 🧑🏾‍🎨 Render Gateway: A Multi-use Render Server

Over the last few weeks, we have been chipping away at server-side rendering and how to implement it. In the last post, we created a server; in this one, we will see if we can make that server render a page containing some server-side rendered React. If you recall from last time, there are two ways we can approach implementing our render server:

  1. Standalone with a way to pass our React app to it
  2. Integrated so that it knows all about our React app

Both approaches start from common origins; they both need a server that can render a React component inside a page. By the end of this post, we should be able to request a URL from our server and receive a rendered HTML page with some rendered React embedded inside of it. This will give us some fundamentals that we can then use next time to finally render our client-side application.


🖌 Rendering React on the server

app.get("/*", (req, res) => res.send("Hello World!"));

The server that we made last time will be the basis for our solution. Above is our current route handler for the server. Regardless of the get request, the server responds with Hello World!. What we want to do is to replace Hello World! with the rendered React component tree embedded within an HTML page.

When rendering in a browser, our React application is mounted so that it will dynamically update based on events like mouse movements, network requests, etc. When we are rendering on the server, we do not want all that. In fact, we do not even have a DOM like the browser does in which to create elements and event handlers and the like1. Instead of mounting the React application, we want to capture the very first render of the application and stop. React provides a methods for doing things like this in the React DOM package. We are going to use its renderToString method2.

The renderToString method takes a React component and gives us back a string of the markup that is initially rendered by that component. Before we can try it, we need to add the appropriate packages to our server: react and react-dom.

yarn add react react-dom

Now, in theory, we can render some React. Sadly, just updating our route handler with a component as shown below will not work.

app.get("/*", (req, res) => res.send(
    renderToString(<div>Hello World!</div>)
));

If we add this and then run our server with yarn start, we get a rather cryptic error output.

app.get("/*", (req, res) => res.send(renderToString(<div>Hello World!</div>)));
                                                    ^

SyntaxError: Unexpected token <
    at Module._compile (internal/modules/cjs/loader.js:723:23)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:789:10)
    at Module.load (internal/modules/cjs/loader.js:653:32)
    at tryModuleLoad (internal/modules/cjs/loader.js:593:12)
    at Function.Module._load (internal/modules/cjs/loader.js:585:3)
    at Function.Module.runMain (internal/modules/cjs/loader.js:831:12)
    at startup (internal/bootstrap/node.js:283:19)
    at bootstrapNodeJSCore (internal/bootstrap/node.js:623:3)

The issue is, our server has no idea how to handle JSX syntax (the embedded HTML-like description of our React component; <div>Hello World!</div>). Our client-side application works because the create-react-app package sets up some tools to process JSX files and turn them into valid JavaScript that can be understood by a modern browser. Our server-side application does not have any of that magic and so it does not work.

Rather than spending time to add that magic into our server, it feels more appropriate to assume our server is just another browser and that our client-side code will already be transformed into JavaScript by the time we see it3. Instead, just to test our React rendering, we can replace the JSX with its transpiled counterpart, which is a call to React.createElement. React.createElement takes the component being rendered and its props. In our case, we are rendering an HTML div element. These are special cases where a string is used to represent them, rather than a real React component type. Therefore, our simple JSX example becomes; note how the text is passed as the children of the component.

app.get("/*", (req, res) => res.send(
    renderToString(React.createElement("div", { children: "Hello World!" }))),
);

If we now yarn start our server application, it runs and when we visit http://localhost:3000, we see our Hello World! text. This is great. It means that given a suitably transpiled React component, we can server-side render it. Now that we have the ability to render a component, we need to embed that rendered component inside a full HTML page.

📄 The Page Template

OK, so we have some HTML that represents our rendered component. Now we need to put that into an HTML page, we need to think about what that page looks like. What does the page include? We can revisit the React app we made and see for ourselves.

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8" />
    <link rel="icon" href="/favicon.ico" />
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <meta name="theme-color" content="#000000" />
    <meta
      name="description"
      content="Web site created using create-react-app"
    />
    <link rel="apple-touch-icon" href="/logo192.png" />
    <!--
      manifest.json provides metadata used when your web app is installed on a
      user's mobile device or desktop. See https://developers.google.com/web/fundamentals/web-app-manifest/
    -->
    <link rel="manifest" href="/manifest.json" />
    <!--
      Notice the use of  in the tags above.
      It will be replaced with the URL of the `public` folder during the build.
      Only files inside the `public` folder can be referenced from the HTML.

      Unlike "/favicon.ico" or "favicon.ico", "/favicon.ico" will
      work correctly both with client-side routing and a non-root public URL.
      Learn how to configure a non-root public URL by running `npm run build`.
    -->
    <title>React App</title>
  </head>
  <body>
    <noscript>You need to enable JavaScript to run this app.</noscript>
    <div id="root"></div>
    <!--
      This HTML file is a template.
      If you open it directly in the browser, you will see an empty page.

      You can add webfonts, meta tags, or analytics to this file.
      The build step will place the bundled scripts into the <body> tag.

      To begin the development, run `npm start` or `yarn start`.
      To create a production bundle, use `npm run build` or `yarn build`.
    -->
  <script src="/static/js/bundle.js"></script><script src="/static/js/0.chunk.js"></script><script src="/static/js/main.chunk.js"></script></body>
</html>

Above is the development-time HTML template that is used with our simple React app. I have highlighted some important sections.

  1. The head containing page metadata, including title, description, favicon, etc.
  2. Scaffold body to provide a mounting point for our React component
  3. Scripts

The head content is static4 and the scripts are inserted by the build operation of our client app. The bit that matters to us is the mounting point for our React app, <div id="root"></div>, as this is where anything we render will need to be inserted by our server-side rendering operation.

Of course, we want to do all this with production code. To see how that affects things, we can run yarn build in our React app. Running this creates a build folder with all sorts of things in it, including a slightly different version of our HTML template (we will perhaps consider the other files another time).

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8" />
    <link rel="icon" href="/favicon.ico" />
    <meta name="viewport" content="width=device-width,initial-scale=1" />
    <meta name="theme-color" content="#000000" />
    <meta
      name="description"
      content="Web site created using create-react-app"
    />
    <link rel="apple-touch-icon" href="/logo192.png" />
    <link rel="manifest" href="/manifest.json" />
    <title>React App</title>
    <link href="/static/css/main.b0083702.chunk.css" rel="stylesheet" />
  </head>
  <body>
    <noscript>You need to enable JavaScript to run this app.</noscript>
    <div id="root"></div>
    <script>
      !(function(f) {
           // SNIPPED FOR CLARITY
      })([]);
    </script>
    <script src="/static/js/2.78e6b881.chunk.js"></script>
    <script src="/static/js/main.dcbf6a7c.chunk.js"></script>
  </body>
</html>

This looks a little different than what we had before, but it is not as different as you may think. We still have the head element metadata (though this time it includes a CSS file, which was not there in the development version), we still have the scripts (though there is now some inlined scripting that wasn't there before, which I have snipped out just to make things a little more readable), and most importantly, we still have our mounting point, <div id="root"></div>.

Given this information, we can update our server application to return a full page containing our rendered component. For our purposes here, we will hard code a simple page template. Eventually, we can replace this simple template and the React component being rendered with the production output of our client application.

🖼 Rendering the page and the component together

Using what we have learned here, I have modified the server as follows.

const express = require("express");
const React = require("react");
const {renderToString} = require("react-dom/server");

const port = 3000;
const app = express();

const pageTemplate = `<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8" />
    <meta
      name="description"
      content="SSR result"
    />
    <title>React App</title>
  </head>
  <body>
    <noscript>You need to enable JavaScript to run this app.</noscript>
    <div id="root"></div>
  </body>
</html>
`;

const renderPage = (reactComponent) => {
    const renderedComponent = renderToString(reactComponent);
    return pageTemplate.replace('<div id="root"></div>', `<div id="root">${renderedComponent}</div>`);
};

app.get("/*", (req, res) => res.send(
    renderPage(React.createElement("div", {children: "Hello World!"})),
));

app.listen(port, () => console.log(`Example app listening on port ${port}!`));

We have a page template string called pageTemplate. Then we have a renderPage method that does a simple replace operation to replace <div id="root"></div> in our template with the same div containing our rendered React component. Finally, in the get handler, the renderPage method is invoked with our React component.

If we yarn start this version of the server and visit http://localhost:3000, viewing the resultant page source gives us the following HTML.

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8" />
    <meta
      name="description"
      content="SSR result"
    />
    <title>React App</title>
  </head>
  <body>
    <noscript>You need to enable JavaScript to run this app.</noscript>
    <div id="root"><div data-reactroot="">Hello World!</div></div>
  </body>
</html>

On the highlighted line, you can see the inserted server-side rendered React code. Success! We haven't even loaded any scripts client-side to see this result. Of course, if we want an app that a user can interact with, we are going to need to change that. Join me next time when we work out how to integrate our server with our client-side application in order to get our very first server-side rendered app. Of course, that does not mean we will have reached our destination on this server-side rendering adventure; on the contrary, it feels like we have barely begun.

Thanks again for reading. I hope that something you find here is useful. Please comment as you see fit. 💝

  1. We could introduce a DOM using a library like JSDOM. However, that is an extra step that we would like to avoid as it increases the latency of our server-side rendering process. Instead, we should aim to make sure our React app does not rely on there being a DOM present at all. []
  2. It is worth looking at the other options, such as renderToNodeStream, to see what they can offer you and your specific SSR challenges []
  3. You may notice that this starts to lead us down one of our two paths; does the server know all about the client code, or does it get blindly provided somehow? []
  4. For now, let's assume that the page metadata (item 1, above) remains static; while we can certainly build in a mechanisms to make this dynamic, such as changing page title when the selected route is different, that will over-complicate things at this stage. []

🏗 Creating An Express Server

Photo by Kelly Sikkema on Unsplash

This is part 4 of my series on server-side rendering (SSR):

  1. 🤷🏻‍♂️ What is server-side rendering (SSR)?
  2. ✨ Creating A React App
  3. 🎨 Architecting a privacy-aware render server
  4. [You are here] 🏗 Creating An Express Server
  5. 🖥 Our first server-side render
  6. 🖍 Combining React Client and Render Server for SSR
  7. ⚡️ Static Router, Static Assets, Serving A Server-side Rendered Site
  8. 💧 Hydration and Server-side Rendering
  9. 🦟 Debugging and fixing hydration issues
  10. 🛑 React Hydration Error Indicator
  11. 🧑🏾‍🎨 Render Gateway: A Multi-use Render Server

Over the previous three posts in this series we have described what server-side rendering (SSR) is, created a simple application using React, and discussed the architecture of a privacy-aware server to ensure we understand some of the sharp edges around SSR. Now, we will actually implement a basic server. Just as with the React application we created, the server we create will not be a complete solution, but it will provide a foundation from which we can continue to explore SSR.

✨ A New Project

Where do we start? Well, we need a server that can receive web requests and respond to them. For that, I am going to use Express1, but first I need a project.

NOTE: Where you see yarn, know that you can use your own package manager as you see fit.

  1. Add a new repository on GitHub (or your source control platform of choice).
  2. Make a new folder locally for your code
  3. cd to your new folder and run git init
  4. git remote add origin <your github repo URL>
  5. git pull origin master
  6. git branch --set-upstream-to=origin/master
  7. Create and commit a .gitignore file
  8. Initialize it for JavaScript package management with yarn init
  9. Run yarn install to generate the lock file
  10. Commit the yarn.lock and package.json to the git repository

Great, so now we have a project we can start working on. Let's add Express.

yarn add express

This should update our package.json and yarn.lock, so don't forget to commit those changes. I also recommend pushing often to your remote repository, that way your code is backed up online in case your computer suffers a nasty accident2.

👋🏻 Hello World!

At this point we need to write some code. We need to setup a route for our server that can handle providing a rendered result for any URL that our application might have. There are a couple of ways we could do this:

  1. Assuming that our server is invoked by some intermediate layer, such as a cache, we could have the server implement a single route (e.g. /render) and pass the URL to be rendered as a query parameter.
  2. Our server could assume the URL is to be rendered by the client code and just accept any URL.

Option 1 gives us a great deal of flexibility in what our server can do, but it forces us to ensure that there is a layer between the original browser request and our server, as something has to be responsible for constructing the appropriate /render route request. Option 2 removes the need to have an intermediate layer, but it perhaps restricts us from expanding server functionality. Of course, option 2 can be changed to option 1 if the need arises, so we can go with option 2 for now, knowing that later, it can be updated to suit changing needs.

Normally, I would add lots of other things to this server to improve development and runtime investigations, such as linters, testing, and logging, but for the sake of brevity, right now we will stick to the main functionality.

const express = require("express");

const port = 3000;
const app = express();

app.get("/*", (req, res) => res.send("Hello World!"));

app.listen(port, () => console.log(`Example app listening on port ${port}!`));

This is our index.js file. It is not doing a lot. On line 4, we create our express app. On line 6, we tell it that for any route matching /*, return Hello World!. On line 8, we tell it to listen for requests on port 30003.

If we run this app with node index.js, we can go to our browser, visit any route starting with localhost:3000 and see the text, Hello World!. This is fantastic. We have a server and it is responding as we hope. Since we are going to run this often as we make changes, I will add a script to our package.json to run node index.js for us.

{
  "name": "hello-react-world-ssr",
  "version": "0.0.1",
  "description": "A server-side rendering server",
  "main": "index.js",
  "license": "MIT",
  "dependencies": {
    "express": "^4.17.1"
  },
  "scripts": {
    "start": "node index.js"
  }
}

In the package.json file shown above, I have highlighted the section I added containing the new start command. From now on, we can start our app with yarn start. The next step is getting our server to render our React application. Before we do that, consider these questions:

  1. How does the server know about and load the code for our React application?
  2. How does the server get the rendered result to send back?
  3. How do we isolate render requests to avoid side-effects bleeding across requests?

🤔 The Hows

The answers to the first two questions have implications beyond the server itself, possibly influencing both our client application and any deployment process.

How our server knows about and loads our client application may affect how our server is deployed. Some server-side rendering solutions involve deploying the client-side code with the server so that it has direct access to the appropriate code, others use a mechanism such as looking up in a manifest to identify the files to load from a separate location (such as a content delivery network (CDN)). Neither of these is necessarily a bad choice – they both have their advantages and disadvantages. For example, deploying the server with the right code means:

  • ✅The server has fast access to the client application it is rendering
  • ✅The server can integrate nicely with the client application
  • ❌The render server must be deployed every time the client application changes
  • ❌The server is closely coupled to the client application

Whereas, looking up files in a manifest and loading them from elsewhere means:

  • ✅The render server rarely requires updating
  • ✅The server can render more than one application
  • ❌The server will probably need to cache JavaScript files locally or be at the mercy of latency when communicating with the CDN
  • ❌The client applications that the server renders likely need to include custom code to support being rendered by that server

Being aware of these approaches differ – and they differ in more than just the ways I have suggested, is useful in understanding the trade-offs we must make when implementing our render server. Perhaps answering the second question will help us decided which route to take; consider how will our server get a rendered result of the client application?

Our server is going to invoke a call from the React framework that renders our React application to a string, rather than mounting it inside the DOM of a browser. To do that, it needs a React component to render, so it must load our client application and get the root-level component. In addition, assuming our render server is rendering the entire page and not just the React component, the server is likely going to need to gather additional information, such as which files must be loaded in the page, the page title, etc.

This whole process of capturing the application render and associated metadata requires interplay between the server code and the client code. Revisiting the first question and the two approaches I gave: if the server has the client code deployed with it, the server could know exactly which files to load to render the component, importing those directly and using them accordingly; if the server is less-closely coupled, we likely need some mechanism whereby the client application itself does more of the heavy lifting by hooking into some framework provided by the server, even if that is just exporting a specific object so that server can identify the appropriate things to coordinate rendering.

Ultimately, either we have a server that is custom built to our application, or we have a server that is built to support many applications. What to do? I say, dive in and try them both. To that end, next time we will look at the first option where the server knows all about the client application (though we may cut some corners to get to the salient points), and we will answer that third question; how do we isolate our renders?

🙇🏻‍♂️ In Conclusion

Herein we have created our server, though it does not do much yet. We have also considered two different approaches to connect our server to our client application: closely-coupled or more open, and we have started to think about how the server will isolate and respond to render requests.

This week's entry turned out a little longer than I had intended, and covered less things than I had hoped. Sometimes that is the way it goes. One of the biggest reasons I write these blogs is to discover what I do and do not know about something. Often in the effort of explaining it to someone else, I identify a bias that I have without any supporting evidence, or a topic I grasp that is far harder to explain than I expected.

Until next time, when we start to implement our server-side rendering, please leave a comment. Perhaps you have a question, a personal experience writing a render server, or want to take umbrage at something I have stated. I look forward to learning with you as we continue this journey into the land of SSR. 🗺

  1. I find Express easy enough to use and well-supported, though there are other options that one could use instead if one were so inclined []
  2. A lesson from bitter experience; hard drives die (especially SSDs) without warning, drinks spill, laptops get dropped – keep your work backed up []
  3. The port is currently hard-coded for simplicity, but we could make this configurable []

🤷🏻‍♂️ What is server-side rendering (SSR)?

Featured image modified from photo by Andre Mouton on Unsplash

This is part 1 of my series on server-side rendering (SSR):

  1. [You are here] 🤷🏻‍♂️ What is server-side rendering (SSR)?
  2. ✨ Creating A React App
  3. 🎨 Architecting a privacy-aware render server
  4. 🏗 Creating An Express Server
  5. 🖥 Our first server-side render
  6. 🖍 Combining React Client and Render Server for SSR
  7. ⚡️ Static Router, Static Assets, Serving A Server-side Rendered Site
  8. 💧 Hydration and Server-side Rendering
  9. 🦟 Debugging and fixing hydration issues
  10. 🛑 React Hydration Error Indicator
  11. 🧑🏾‍🎨 Render Gateway: A Multi-use Render Server

One of my main responsibilities at work involves server-side rendering (SSR). From managing the services that perform SSR to the client components that developers use to build SSR-able frontends, I have my focus on many pieces of our frontend infrastructure. In this series of posts, I want to share some of the things I have learned and perhaps demystify this mostly fantastic approach to creating performant, stable, web experiences.

When the Internet started coming alive the first time, a lot of the magic was implemented on servers (aka server-side or "on the backend") that built HTML pages to deliver to web browsers. During this period, impressive collections of user interface components were created to make developing these server-based web apps easier and more reliable. Sadly, sites often felt a clunky and slow because the browser just was not equipped to do much beyond rendering the HTML it was given; JavaScript execution was too slow for anything very meaningful. Even button clicks in the browser would cause a new request to the server that would then generate a whole new page for the browser.

Then Google Chrome and the v8 JavaScript engine came along and changed everything. Browsers became blessed with speedy JavaScript engines. That meant we could do a lot of this work in the browser (aka client-side or "on the frontend") and develop applications that could properly divide presentation (the application running in the browser) from data (the database and CRUD1 operations running in the backend). From this new power came the concept of the single page app (SPA), where one page comes from the server and then does most of its work client-side, deferring to the backend only when data is read or written. Often, the page is received from the server in an initial state and then subsequent data requests may populate that page (imagine your Facebook feed loading) to get it ready for you to use. However, this can mean that the time to interactive – the length of time before a user can actually use the page – is long. This affects all sorts of things, but particularly user retention. Folks don't like waiting and if they wait too long, they become frustrated and eventually bounce2.

Much like in the backend era prior to faster frontend JavaScript execution, new frameworks and user interface components have appeared that help to create powerful web apps using browser-based JavaScript. Things like Angular, Ember, and React (there are more – there are always more3). However, there can still be a mismatch between backend and frontend. To get a nice experience for our users, code runs on the backend to build an initial page and then that is handed to the frontend, which promptly takes over. Sometimes, this transition is nice and smooth, but other times it is not. More importantly, there are at least two different code paths for generating the page; at least one backend one and at least one frontend one4.

Having more than one code path trying to do equivalent work is hard to maintain. A change in one place may or may not need a change in the other, and either way, careful quality engineering is needed to make sure bugs are not introduced. The shift away from web apps executing entirely on the backend but rendered on the frontend to being executed mostly on the frontend with a bit of backend increased the complexity of the code for anyone that wanted a performant, engaging web site. You had a choice; either keep the separation of frontend for presentation and backend for data, and have a slower initial website experience, or blur the line and have more complex code, but a nicer user experience. Thankfully, folks thought about this and like those responsible for React, came up with a solution – server-side rendering (SSR).

Thanks to the JavaScript Revolution that started with v8, we now are able to run JavaScript outside of our web browsers (using NodeJS, for example). This creates some interesting opportunities for running the same code in both the frontend and backend. This does not mean that all the code would run in two places – we may want to keep the CRUD operations as a backend thing; however, being able to run our presentation code in both places means we can overcome some of that delay when a user first visits a page of our site. We can use the same JavaScript that would render our page in the web browser to render a version of our page on the server and then let the browser take over, all with a single codebase5.

Server-side rendering (SSR) – The rendering of a web page on a server rather than in a browser

And in the context of what I want to write about, that is server-side rendering (more commonly referred to as SSR, at least by me, anyway) – the rendering of a web page on a server rather than in a browser. In fact, it's so similar to rendering in a web browser, I have started to refer to the server responsible for SSR as a server-side browser. This tends to reframe how folks think of problems they face and how to start thinking about frontend code not as "does this run in the server or the client?" but "what browsers does this have to support?". It turns out that second question is much more familiar to most frontend developers than the first.

For now, I will leave things there. I think this post is quite long enough. Thank you for reading. Over the next few posts, we will look at creating an app using React that supports SSR, as well as a backend browser to perform that SSR, and the implications that SSR has when it comes to writing frontend code.

  1. Create Read Update Delete []
  2. there's a reason it's called "bounce rate" []
  3. While I was writing this, I expect three more frontend frameworks came into being and at least one died []
  4. There are cases where different parts of the same page are rendered by different services; front or backend – talk about complicated []
  5. Not only that, but the server response could be cached with a CDN (content delivery network) to make our sites even faster! []

🙇🏻‍♂️ Introducing checksync

Photo by Clint Adair on Unsplash

Have you ever written code in more than one place that needs to stay in sync? Perhaps there is a tool in your framework of choice that can generate multiple files from a single source of truth, like T4 templates in the .NET world; perhaps not. Even if there is such a tool, it adds a layer of complexity that is not necessarily easy to grok. If you look at the output files or the template itself, it may not be clear what files are affected or related.

At Khan Academy, we have a linter, written in Python, that is executed whenever we create a new diff for review. It runs across a subset of our files and looks for blocks of text that are marked up with a custom comment format that identifies those blocks as being synchronized with other target blocks. Included in that markup is a checksum of the target block content such that if the target changes, we will get an error from the linter. This is our signal to check if further changes are need and then update the checksums that are invalidated. The only bugbear folks seem to have is that instead of offering an option to auto-fix checksums in need of update, it outputs a perl script that has to be copied and run for that purpose.

Small bugbear aside, this tool is fantastic. It enables us to link code blocks that need to be synchronized and catches when we change them with reasonably low overhead. Though I believe it is hugely useful, it is sadly custom to our codebase. I have long wanted to address that and create an open source version for everyone to use. checksync is that open source version.

🤔 The Requirements

Before writing checksync, I started out with the following requirements:

  • It should work with existing marked up code in the Khan Academy codebase; specifically,
    1. File paths are relative to the project root directory
    2. Checksums are calculated using Adler-32
    3. Both // and # style comments are used to comment the markup tags
    4. Start tag format is:
      sync-start:<ID> <CHECKSUM> <TARGET_FILE_PATH>
    5. End tag format is:
      sync-end:<ID>
    6. Multiple start tags can exist for the same tag ID but with different target files
    7. Sync tags are not included in the checksum'd content
    8. An extra line of blank content is included in the checksum'd content (due to a holdover from an earlier implementation)
    9. .gitignore files should be ignored
    10. Additional files can be ignored
  • It should be comparably performant to the existing linter
    • The linter ran over the entire Khan Academy website codebase in less than 15 seconds
  • It should auto-update invalid checksums if asked to do so
  • It should output file paths such that editors like Visual Studio Code can open them on the correct line
  • It should support more comment styles
  • It should generally support any text file
  • It should run on Node 8 and above
    • Some of our projects are still using Node 8 and I wanted to support those uses

With these requirements in mind, I implemented checksync (and ancesdir, which I ended up needing to ensure project root-relative file paths). By making it compatible with the existing Khan Academy linter, I could leverage the existing Khan Academy codebase to help measure performance and verify that things worked correctly. After a few changes to address various bugs and performance issues, it is still mildly slower than the Python equivalent, but the added features it provides more than make up for that (especially the fact that it is available to folks outside of our organization).

🎉 Check It Out

checksync includes a --help option to get information on usage. I have included the output below to give an overview of usage and the options available to customize how checksync runs.

checksync --help
checksync ✅ 🔗

Checksync uses tags in your files to identify blocks that need to remain
synchronised. It works on any text file as long as it can find the tags.

Tag Format

Each tagged block is identified by one or more sync-start tags and a single
sync-end tag.

The sync-start tags take the form:

    <comment> sync-start:<marker_id> <?checksum> <target_file>

The sync-end tags take the form:

    <comment> sync-end:<marker_id>

Each marker_idcan have multiple sync-start tags, each with a different
target file, but there must be only one corresponding sync-endtag.

Where:

    <comment>       is one of the comment tokens provided by the --comment
                    argument

    <marker_id>     is the unique identifier for this marker

    <checksum>      is the expected checksum of the corresponding block in
                    the target file

    <target_file>   is the path from your package root to the target file
                    with a corresponding sync block with the same marker_id

Usage

checksync <arguments> <include_globs>

Where:

    <arguments>       are the arguments you provide (see below)

    <include_globs>   are glob patterns for identifying files to check

Arguments

    --comments,-c      A string containing comma-separated tokens that
                       indicate the start of lines where tags appear.
                       Defaults to "//,#".

    --dry-run,-n       Ignored unless supplied with --update-tags.

    --help,-h          Outputs this help text.

    --ignore,-i        A string containing comma-separated globs that identify
                       files that should not be checked.

    --ignore-files     A comma-separated list of .gitignore-like files that
                       provide path patterns to be ignored. These will be
                       combined with the --ignore globs.
                       Ignored if --no-ignore-file is present.
                       Defaults to .gitignore.

    --no-ignore-file   When true, does not use any ignore file. This is
                       useful when the default value for --ignore-file is not
                       wanted.

    --root-marker,-m   By default, the root directory (used to generate
                       interpret and generate target paths for sync-start
                       tags) for your project is determined by the nearest
                       ancestor directory to the processed files that
                       contains a package.json file. If you want to
                       use a different file or directory to identify your
                       root directory, specify that using this argument.
                       For example, --root-marker .gitignore would mean
                       the first ancestor directory containing a
                       .gitignore file.

    --update-tags,-u   Updates tags with incorrect target checksums. This
                       modifies files in place; run with --dry-run to see what
                       files will change without modifying them.

    --verbose          More details will be added to the output when this
                       option is provided. This is useful when determining if
                       provided glob patterns are applying as expected, for
                       example.

And here is a simple example (taken from the checksync code repository) of running checksync against a directory with two files, using the defaults. The two files are given below to show how they are marked up for use with checksync. In this example, the checksums do not match the tagged content (though you are not expected to know that just by looking at the files – that's what checksync is for).

// This is a a javascript (or similar language) file

// sync-start:update_me 45678 __examples__/checksums_need_updating/b.py
const someCode = "does a thing";
console.log(someCode);
// sync-end:update_me
# Test file in Python style

# sync-start:update_me 4567 __examples__/checksums_need_updating/a.js
code = 1
# sync-end:update_me
Example output showing mismatched checksums

Additional examples that demonstrate various synchronization conditions and error cases can be found in the checksync code repository. To give checksync a try for yourself:

I hope you find this tool useful, and if you do or you have any questions, please do comment on this blog.

🙇🏻‍♂️ Introducing ancesdir

Photo by Maksym Kaharlytskyi on Unsplash

After many years of software development, I finally published my own NPM package. In fact, I published two. I was working on my checksync tool when I realised that I needed the package that this blog introduces. More on checksync in the next entry.

https://www.npmjs.com/package/ancesdir

🤔 What is root? Where is root?

Quite often, when working on some projects at Khan Academy, we need to know the root directory of the project. This enables us to write tools, linters, and tests that use root-relative paths, which in turn can make it much easier to refactor code. However, determining the root path of a project is not necessarily simple.

First, there is working out what identifies the root of a project. Is it the node_modules directory? The package.json file? The existence of .git folder? It may seem obvious to use one of these, but all these things have something in common; they don't necessarily exist. We can configure our package manager to have package.json and node_modules in non-standard places and we might change our source control, or not even run our code from within a clone of our repository. Determining the root folder by relying on any of these things as a marker is potentially not going to work.

Second, the code to walk the directory structure to find the given "marker" file or directory is not trivial. Sharing a common implementation within your project means everything that needs it, needs to locate it; in JavaScript, that means a relative path, at which point, you may as well just use a relative path to the known root directory and skip the shared approach all together. Yet, if you don't share a common implementation from a single location, then the code has to be duplicated everywhere you need it. I don't know about you, but that feels wrong.

💁🏻‍♂️ Solution: ancesdir

The issue of sharing a common implementation is easiest to solve. If that common implementation is installed as an NPM package, we don't need to include it via a relative path; we can just import it by its package name. There are packages out there that do this, but the ones I found all assumed some level of default setup, failing to acknowledge that this may change. In turn, they did not support a monorepo setup where there could be multiple sub-projects. How could one find the root folder of the monorepo from within a sub-project if all we used to identify the root folder were package.json? What if we wanted to sometimes get the root of the sub-project and sometimes the root of the monorepo?

I needed a way to identify a specific ancestor directory based on a known marker file or directory that would work even with non-standard setups. At Khan Academy, we have a marker file at the root of the project that is there solely to identify its parent directory as the project root. This file is agnostic of tech stack; it's just an empty file. It is solely there to say "this directory is the root directory". No tooling changes are going to render this mechanism broken unexpectedly unless they happen to use the same filename, which is unlikely. This way, we can find the repository root easily by locating that file. I wanted a package that could work just as easily with this custom marker file as it could with package.json.

I created ancesdir to fulfill these requirements1.

yarn add ancesdir

The API is simple. In the default case, all you need to do is:

import ancesdir from "ancesdir";

console.log(`ancesdir's root directory is ${ancesdir()}`);

If you have a standard setup, with a package.json file, you will get the ancestor directory of the ancesdir package that contains that package.json file.

However, if you want the ancestor directory of the current file or a different path, you might use ancesdir like this:

import ancesdir from "ancesdir";

console.log(`This file's root directory is ${ancesdir(__dirname)}`);

In this example, we have given ancesdir a path from which to being its search. Of course, that still only works if there is an ancestor directory that contains a package.json file. What if that's not what you want?

For the more complex scenarios, like monorepos, for example, you can use ancesdir with a marker name, like this:

import ancesdir from "ancesdir";

console.log(`The monorepo root directory is ${ancesdir(__dirname, ".my_unique_root_marker_file")}`);

ancesdir will then give you the directory you seek (or null if it cannot be found). Not only that, but repeated requests will work faster as the results are cached as the directory tree is traversed.

Conclusion

If you find yourself needing a utility like this, checkout ancesdir. I hope y'all find it useful and I would love to hear if you do. You can checkout the source on GitHub.

  1. The name is a play on the word "ancestor", while also attempting indicate that it has something to do with directories. I know, clever, right? []