πŸ§‘πŸΎβ€πŸŽ¨ Render Gateway: A Multi-use Render Server

Photo by Nikola Knezevic on Unsplash

This is part 11 of my series on server-side rendering (SSR):

  1. πŸ€·πŸ»β€β™‚οΈ What is server-side rendering (SSR)?
  2. ✨ Creating A React App
  3. 🎨 Architecting a privacy-aware render server
  4. πŸ— Creating An Express Server
  5. πŸ–₯ Our first server-side render
  6. πŸ– Combining React Client and Render Server for SSR
  7. ⚑️ Static Router, Static Assets, Serving A Server-side Rendered Site
  8. πŸ’§ Hydration and Server-side Rendering
  9. 🦟 Debugging and fixing hydration issues
  10. πŸ›‘ React Hydration Error Indicator
  11. [You are here] πŸ§‘πŸΎβ€πŸŽ¨ Render Gateway: A Multi-use Render Server

Way back in January 2020, I started blogging about server-side rendering (SSR) React. I intended it to be a short, four post series that would go into the details of what SSR is, how to do it, and the pitfalls that lie within.

This is post eleven of the four post series, and perhaps the last (at least for now). Over the course of the last ten posts, we have created a simple React app that server-side renders and successfully hydrates. Along the way, we have learned about the complexities of server-side rendering and hydration, and from this we can identify some important lessons.

  1. The initial render of client and server renders must be the same for hydration to be successful
  2. Maintaining a server-side rendering solution can become complex, especially as the app itself becomes more complex (and it doesn't help that React only reports hydration problems in its development build until React 18)
  3. Reasoning about an app to ensure we consider both the server-side and client-side behavior can be a pain

We cannot change the first point. In order for the hydration to be successful, we need our initial render to be the same on both client and server. To achieve this, we need to understand how our code will render in both contexts. However, with some clever components (and hooks, in some cases), we can simplify things to reduce the impact of the other two points. There are frameworks available such as NextJS that provide these features for us. However, I find great value in understanding the complexities of something to grasp exactly what tradeoffs third-party solutions are incurring, and at the time I was working on the SSR solution for Khan Academy, moving to NextJS was far too great a lift. So, in this series we have rolled our own solution.

First, by using components like WithSSRPlaceholder in Wonder Blocks Core, we abstract away the general complexity of understanding the process of server-side rendering to ensure our result hydrates properly.

Second, by testing our code in development in diverse browser environments we can check for things that often cause hydration errors (such as browser feature detection being used to change what gets rendered – remember, the server has no idea what the user has configured in their browser, what their screen size is, etc.).

Finally, by changing the server-side rendering solution from one that knows lots about our frontend code to one that knows as little as possible, we can build a server-side rendering approach that will work without needing to be redeployed every time we change our frontend. And that is where we are heading in this post as we created such a server to perform server-side rendering at Khan Academy.

Goliath and the Render Gateway

For more than two years, the Khan Academy backend that underpins our website and mobile apps has been undergoing a major re-architecture. We named this massive project, Goliath – part pun on the new backend language we had chosen, Go, and part pun on the absolutely colossal amount of work we had ahead of us to get the job done. You can read all about it in these posts on the Khan Academy Engineering blog:

The re-architecture was a big project, made ever more complex by the need to keep the site running smoothly for the millions of folks that rely on us as we transitioned things off the old architecture and on to the new, piece by piece1. As part of this re-architecture, we knew we needed to change the way our website was served and so I, along with the amazing team I work with, were tasked with creating a server that would render our web pages. We made a variety of decisions to simplify the work and to simplify maintenance long term:

  1. We would only support rendering pages that used our latest frontend architecture
    Supporting legacy tech-debt laden code would only perpetuate problems and would most definitely increase the complexity and volume of work to be done. By using our current frontend architecture, all the information about the site, including what page to render for which routes, would be codified within the frontend code.
  2. We would get it working first and get it working fast second
    While we made decisions all the way through to avoid performance issues, we also deliberately avoided making any performance optimizations before we knew what a working solution looked like. And we took measurements – always take measurements before and after when you are making performance improvements.
  3. We would make it generic enough to cope with the multiple changes we make to our frontend each day. We deploy many times in one day to fix bugs and release new features. Our engineers work hard to make these deployments invisible to users and we wanted to implement a solution that would support that effectively.

Our strategy was to get something working, move eligible routes over to that something one by one, and make incremental changes as we went to improve performance and fix bugs.

We knew up front that we would be using an edge cloud platform like Fastly to route the traffic and ultimately provide caching for our SSR'd pages, so we made sure that our design incorporated support for things like the Vary response header to support efficient caching (though we did not use that to begin with, no premature optimization). We went as far as including code in our frontend that could track what request headers were used by a page rendering so that we could build that Vary header with a view to utilizing it once we were at a stage where cache optimization made sense2.

After a little back-and-forth we settled on a name for this new approach to rendering our website; the Render Gateway.

What we did

We spent quite some time building the main Render Gateway code, solving many problems like:

  • How do we know what code to run?
  • How do we build a result that the cloud edge service can understand?
  • What does that result look like?

Many test implementations were stood up as we added more features, including the ability to:

  1. Verify incoming requests so that we can immediately throw away spam
  2. Add different status values and headers to the response to support redirects
  3. Track request header access and add proper support for the Vary header
  4. Log and trace requests in sufficient detail to debug issues in production

By mid-2020 we had a working server and we went live, serving the logged-out homepage and more from this new service. It worked!

It was also slow and had a massive memory leak. 😒

And so began the arduous work of performance testing and memory investigations as we worked to improve things. Our biggest performance wins came from reducing the amount and size of the JavaScript that it takes to render our site (an ongoing and effective focus for site performance in general) and from utilizing the Vary header along with our cloud edge service to reduce the amount of traffic the server needs to handle. For example, we do not gain much value from rendering pages that are for logged-in users so our cloud edge does not ask us to SSR those pages. In addition, better use of the Vary header increases our cache hit rate, leading to more logged-out users benefitting from SSR'd pages.

The Memory Leak

Sadly, the memory leak was a real pain. Every 20 to 40 production requests, an instance would hit a soft or hard memory limit and die. Google App Engine (GAE) works admirably in the face of an unstable app. It will detect the soft or hard memory limit violation, kill the service and spin up new instances as needed, even resubmitting the failed request so that the caller only sees the impact as a slower request rather than a complete failure. This meant that we could keep our leaky implementation serving production users while we investigated the problem, allowing us to continue supporting the Goliath project, albeit with a bit of a limp.

Myself and John Resig spent many hours performing memory investigations, writing multi-process render environments and more in our attempts to both track down and mitigate the memory leak. Just when we thought we had noticed what was holding onto memory, we would realise we were wrong and seek a new path. This was only exacerbated by how hard it was to generate the leak in development, especially since the Chrome dev tools used to investigate memory issues would hold onto references of the code it loaded, and our main usage of memory was that very code that we loaded dynamically. It was weeks of effort until another colleague noted a similar leak in another node service that we had in production. It turned out that the @google-cloud/debug-agent package we were using has a problem and it appears to be down to the very same v8 engine issue we encountered when using Chrome dev tools to investigate the memory issue. Once we removed that dependency, the memory leak went away and instead of crashing every 20-40 requests, each instance of the Render Gateway can handle millions of requests without a care3.

How it works

At its core, the Render Gateway is a generic express server written in JavaScript to operate in Node on Google App Engine. It takes a URL request and renders a result using a configured render environment. Because it uses an API to define that render environment, it is incredibly versatile. There are no rules to what that render environment does other than take in a request and provide a response. Here's an example from the publicly available repository4:

const {runServer} = require("../../src/gateway/index.js");

async function main() {
    const renderEnvironment = {
        render: (
            url /*: string*/,
            renderAPI /*: RenderAPI*/,
        ) /*: Promise<RenderResult>*/ =>
            Promise.resolve({
                body: `You asked us to render ${url}`,
                status: 200,
                headers: {},
            }),
    };

    runServer({
        name: "DEV_LOCAL",
        port: 8080,
        host: "127.0.0.1",

        renderEnvironment,
    });
}

main().catch((err) => {
    console.error(`Error caught from main setup: ${err}`);
});

If you were to run this code with node, you would get a server listening on port 8080 of your local machine with support for the following routes:

  • /_api/ping
    This will return pong, and provides a way to test if the server is responsive.
  • /_api/version
    This will return the value of the GAE_VERSION environment variable, something that Google App Engine sets which you can configure at deployment to specify the version of the server code being run.
  • /_ah/warmup
    Google App Engine supports a warmup handler that it sometimes runs to warm up new instances of an app when scaling. By default, this just returns OK, but the app can be configured to do additional work as needed.
  • /_render
    This performs the actual render. The URL to be rendered is specified using a url query param.

If you invoked http://localhost:8080/_render?url=http://example.com with this server running, it would respond with a 200 status code and the text You asked us to render http://example.com.

The magic is the render environment, which in this case is a very simple object with a single render function:

const renderEnvironment = {
    render: (
        url /*: string*/,
        renderAPI /*: RenderAPI*/,
    ) /*: Promise<RenderResult>*/ =>
        Promise.resolve({
            body: `You asked us to render ${url}`,
            status: 200,
            headers: {},
        }),
};

The Render Gateway source also includes an environment implementation that uses JSDOM, allowing you to construct a more complex environment. However, it does nothing specifically related to React because how your code actually renders server-side is up to you and how you configure it. In fact, because it is built on express, you can plug-and-play the various pieces used to build the main startGateway call to implement your own approach if you so desire, even if you don't want to use Google App Engine.

At Khan Academy, we have a custom render environment that uses some organizational conventions and custom header values populated by our cloud edge service to identify which version of our frontend code is needed. The render environment then downloads (or retrieves from cache) that code and executes it within an isolated node environment to produce the body, status, and response headers (including the aforementioned Vary header) for the result. This is then sent in response to the caller. All the code executed to actually produce a result is from the downloaded code at the time of the request. To support this, we have some conventions, components, and frameworks that allow developers to access request header values, set response header values, and update the response status code from within our frontend code in a manner that feels natural (for example, a <Redirect/> component abstracts away the work of setting the status code and the Location header as needed). This means that our engineers, when working on our frontend code, do not need to context switch between thinking about client-side rendering and server-side rendering; instead, they have idioms to hand that enable them to build frontend user experiences that just work.

Our simple app revisited

Now to come full circle, we can envisage what our server-side rendering solution might look like using the Render Gateway. Instead of importing the client-side code at build time, we could leverage a render environment using JSDOM to dynamically load the code when a request is made, decoupling our server from our client.

I have made some changes to demonstrate this concept of using a manifest. However, this change still assumes a client build local to the server. If we wanted to make this entirely client-build agnostic, we would change our render environment to download the files (including the manifest) from a CDN instead. The GAE_VERSION environment value, or some header we receive could indicate the version of our frontend we need. We can then look up a manifest in our CDN to tell us the URLs of files we need, download them, execute them, and invoke the rendering process to get a result.

For now, if we are in production, we look for ../client/build/ folder to load the manifest and then load the files from that same folder; in development, we defer to the client webpack server. So, in a way, the development setup is closer to our envisaged CDN-based setup, with webpack acting as that third-party file host.

Take a look at the branch and think about how you might modify things to use a CDN for production. Note that the render-gateway code is currently specific to Google App Engine.

Some final SSR thoughts

Server-side rendering is great for providing search engines with a more complete version of your page when they come crawling your site. It is also great at showing more of your page to your users on first display. And if used unnecessarily, it is a great way to sloooooooow the delivery of your site 😱.

If you always SSR a page before serving it to users, you could wait quite a while for that page to finally land in front of the user. The real value of SSR is only realised when it is coupled with caching so that an SSR result can be re-used for multiple requests. This can be easy to setup with a service like CloudFlare or Fastly, but to do it right and get the best cache hits without compromising your users data or the utility of your site can take a little more work. You will want to familiarise yourself with things like the Vary response header, edge-side includes, and other useful concepts. Not to mention performance and other site metrics so that you can measure the impact of your SSR strategy and make sure it is serving its purpose without hindering your site or your users.

Whatever you choose to do next, I hope this series on server-side rendering with React has demystified the topic and provided you with some helpful resources as you consider SSR and what it may mean to your current or next project – please stop by and let me know about your SSR adventures in the comments. In the meantime, as the React team works more on React and features like Suspense, the server-side rendering story, like so many software developments stories, is going to change.

For now, thank you for joining me on this SSR journey. When I started, I thought I knew everything I needed to know about SSR in order to tell you everything you needed to know about it. It should come as no surprise to any of us that I still have things to learn.

  1. The pandemic that showed up right after we started also contributed to the complexity of the project as more and more folks around the world turned to us to support their education []
  2. The Vary response header allows a server to tell a cache like the one Fastly provides with headers in the request were used to generate that response. Along with the URL and other considerations, this tells the cache what header values need to match for a cached page to be used versus requesting a new response from our server []
  3. At the time of writing, that issue is still open although there is ongoing movement to suggest it may soon be resolved, or made redundant with the removal of that feature from Google's offering []
  4. There are currently no NPM packages to install for this, though I hope to change that – instead, the dist is included in the repo and we install via commit SHA []

πŸ— Creating An Express Server

Photo by Kelly Sikkema on Unsplash

This is part 4 of my series on server-side rendering (SSR):

  1. πŸ€·πŸ»β€β™‚οΈ What is server-side rendering (SSR)?
  2. ✨ Creating A React App
  3. 🎨 Architecting a privacy-aware render server
  4. [You are here] πŸ— Creating An Express Server
  5. πŸ–₯ Our first server-side render
  6. πŸ– Combining React Client and Render Server for SSR
  7. ⚑️ Static Router, Static Assets, Serving A Server-side Rendered Site
  8. πŸ’§ Hydration and Server-side Rendering
  9. 🦟 Debugging and fixing hydration issues
  10. πŸ›‘ React Hydration Error Indicator
  11. πŸ§‘πŸΎβ€πŸŽ¨ Render Gateway: A Multi-use Render Server

Over the previous three posts in this series we have described what server-side rendering (SSR) is, created a simple application using React, and discussed the architecture of a privacy-aware server to ensure we understand some of the sharp edges around SSR. Now, we will actually implement a basic server. Just as with the React application we created, the server we create will not be a complete solution, but it will provide a foundation from which we can continue to explore SSR.

✨ A New Project

Where do we start? Well, we need a server that can receive web requests and respond to them. For that, I am going to use Express1, but first I need a project.

NOTE: Where you see yarn, know that you can use your own package manager as you see fit.

  1. Add a new repository on GitHub (or your source control platform of choice).
  2. Make a new folder locally for your code
  3. cd to your new folder and run git init
  4. git remote add origin <your github repo URL>
  5. git pull origin master
  6. git branch --set-upstream-to=origin/master
  7. Create and commit a .gitignore file
  8. Initialize it for JavaScript package management with yarn init
  9. Run yarn install to generate the lock file
  10. Commit the yarn.lock and package.json to the git repository

Great, so now we have a project we can start working on. Let's add Express.

yarn add express

This should update our package.json and yarn.lock, so don't forget to commit those changes. I also recommend pushing often to your remote repository, that way your code is backed up online in case your computer suffers a nasty accident2.

πŸ‘‹πŸ» Hello World!

At this point we need to write some code. We need to setup a route for our server that can handle providing a rendered result for any URL that our application might have. There are a couple of ways we could do this:

  1. Assuming that our server is invoked by some intermediate layer, such as a cache, we could have the server implement a single route (e.g. /render) and pass the URL to be rendered as a query parameter.
  2. Our server could assume the URL is to be rendered by the client code and just accept any URL.

Option 1 gives us a great deal of flexibility in what our server can do, but it forces us to ensure that there is a layer between the original browser request and our server, as something has to be responsible for constructing the appropriate /render route request. Option 2 removes the need to have an intermediate layer, but it perhaps restricts us from expanding server functionality. Of course, option 2 can be changed to option 1 if the need arises, so we can go with option 2 for now, knowing that later, it can be updated to suit changing needs.

Normally, I would add lots of other things to this server to improve development and runtime investigations, such as linters, testing, and logging, but for the sake of brevity, right now we will stick to the main functionality.

const express = require("express");

const port = 3000;
const app = express();

app.get("/*", (req, res) => res.send("Hello World!"));

app.listen(port, () => console.log(`Example app listening on port ${port}!`));

This is our index.js file. It is not doing a lot. On line 4, we create our express app. On line 6, we tell it that for any route matching /*, return Hello World!. On line 8, we tell it to listen for requests on port 30003.

If we run this app with node index.js, we can go to our browser, visit any route starting with localhost:3000 and see the text, Hello World!. This is fantastic. We have a server and it is responding as we hope. Since we are going to run this often as we make changes, I will add a script to our package.json to run node index.js for us.

{
  "name": "hello-react-world-ssr",
  "version": "0.0.1",
  "description": "A server-side rendering server",
  "main": "index.js",
  "license": "MIT",
  "dependencies": {
    "express": "^4.17.1"
  },
  "scripts": {
    "start": "node index.js"
  }
}

In the package.json file shown above, I have highlighted the section I added containing the new start command. From now on, we can start our app with yarn start. The next step is getting our server to render our React application. Before we do that, consider these questions:

  1. How does the server know about and load the code for our React application?
  2. How does the server get the rendered result to send back?
  3. How do we isolate render requests to avoid side-effects bleeding across requests?

πŸ€” The Hows

The answers to the first two questions have implications beyond the server itself, possibly influencing both our client application and any deployment process.

How our server knows about and loads our client application may affect how our server is deployed. Some server-side rendering solutions involve deploying the client-side code with the server so that it has direct access to the appropriate code, others use a mechanism such as looking up in a manifest to identify the files to load from a separate location (such as a content delivery network (CDN)). Neither of these is necessarily a bad choice – they both have their advantages and disadvantages. For example, deploying the server with the right code means:

  • βœ…The server has fast access to the client application it is rendering
  • βœ…The server can integrate nicely with the client application
  • ❌The render server must be deployed every time the client application changes
  • ❌The server is closely coupled to the client application

Whereas, looking up files in a manifest and loading them from elsewhere means:

  • βœ…The render server rarely requires updating
  • βœ…The server can render more than one application
  • ❌The server will probably need to cache JavaScript files locally or be at the mercy of latency when communicating with the CDN
  • ❌The client applications that the server renders likely need to include custom code to support being rendered by that server

Being aware of how these approaches differ – and they differ in more than just the ways I have suggested, is useful in understanding the trade-offs we must make when implementing our render server. Perhaps answering the second question will help us decided which route to take; consider how will our server get a rendered result of the client application?

Our server is going to invoke a call from the React framework that renders our React application to a string, rather than mounting it inside the DOM of a browser. To do that, it needs a React component to render, so it must load our client application and get the root-level component. In addition, assuming our render server is rendering the entire page and not just the React component, the server is likely going to need to gather additional information, such as which files must be loaded in the page, the page title, etc.

This whole process of capturing the application render and associated metadata requires interplay between the server code and the client code. Revisiting the first question and the two approaches I gave: if the server has the client code deployed with it, the server could know exactly which files to load to render the component, importing those directly and using them accordingly; if the server is less-closely coupled, we likely need some mechanism whereby the client application itself does more of the heavy lifting by hooking into some framework provided by the server, even if that is just exporting a specific object so that server can identify the appropriate things to coordinate rendering.

Ultimately, either we have a server that is custom built to our application, or we have a server that is built to support many applications. What to do? I say, dive in and try them both. To that end, next time we will look at the first option where the server knows all about the client application (though we may cut some corners to get to the salient points), and we will answer that third question; how do we isolate our renders?

πŸ™‡πŸ»β€β™‚οΈ In Conclusion

Herein we have created our server, though it does not do much yet. We have also considered two different approaches to connect our server to our client application: closely-coupled or more open, and we have started to think about how the server will isolate and respond to render requests.

This week's entry turned out a little longer than I had intended, and covered less things than I had hoped. Sometimes that is the way it goes. One of the biggest reasons I write these blogs is to discover what I do and do not know about something. Often in the effort of explaining it to someone else, I identify a bias that I have without any supporting evidence, or a topic I grasp that is far harder to explain than I expected.

Until next time, when we start to implement our server-side rendering, please leave a comment. Perhaps you have a question, a personal experience writing a render server, or want to take umbrage at something I have stated. I look forward to learning with you as we continue this journey into the land of SSR. πŸ—Ί

  1. I find Express easy enough to use and well-supported, though there are other options that one could use instead if one were so inclined []
  2. A lesson from bitter experience; hard drives die (especially SSDs) without warning, drinks spill, laptops get dropped – keep your work backed up []
  3. The port is currently hard-coded for simplicity, but we could make this configurable []