Zero API Necessary: HTTP/X

This is part 2 of a 12-part series called Distilling the Web to Zero. The web is far from done improving - particularly as it pertains to building rich web applications that are, to users, developers, and businesses alike, more desirable than their native-app counterparts. This series is prefaced by defining the biggest challenges on the road towards that goal and the 12 essays that follow explore potential solutions using concrete examples.

Because it tastes better

A man went to the store for a ham. After he returned home, his wife asked him why he hadn’t had the butcher cut off the end of the ham. When he asked why that mattered, she replied that it was her mother’s recipe; her mother had always done it that way because it made the ham taste better. Since the wife’s mother was visiting, they asked her why she always cut off the end of the ham. Mother replied that she wasn’t sure why; this was just the way her mother did it. They decided to call Grandmother to solve this three-generation mystery. Grandmother replied, “Oh that’s not part of the recipe. My roaster was too small to cook it in one piece.”

The Five Whys

The five whys is a technique for root cause analysis. It’s done by repeating the question “Why?” five times, which should reveal the root cause of the problem. For example, park managers in Washington DC noticed that the Jefferson Memorial was crumbling at an alarming rate. They found out that this was because it was being washed more frequently than any of the other memorials. However, simply washing it less couldn’t be the solution.

  1. Why is the monument deteriorating?
    Because it’s being cleaned every two weeks.
  2. Why so frequently?
    Because of the excessive bird droppings.
  3. Why so many birds?
    Because they’re attracted to the spiders.
  4. Why so many spiders?
    Because there are so many insects.
  5. Why so many insects?
    Because they are attracted to the lights at dusk.

Solution: Turn on the lights after dusk.

Using this approach on modern web development’s best practices reveals an interesting core assumption that’s worth rethinking from first principles.

  1. Why has building for the web become so complicated?
    Mostly because logic must be split across the client and the server.
  2. Why must the logic be split?
    Because only the browser can handle the micro-interactions necessary for a rich UI.
  3. Why can only the browser do this?
    Because the UI state lives in the browser which is used to render the UI.
  4. Why must the UI state live in the browser?
    Because web servers use a stateless architecture and are located too far away.
  5. Why are web servers built this way?
    Because scaling horizontally requires that web servers retain no state between requests.

Solution: Build webapps on a stateful architecture.

Compounding complexity

Complexity begets more complexity. This might be the most difficult aspect of effective engineering – aggressively protecting simplicity at every turn because when complexity creeps in, it doesn’t just live forever; it grows like a cancer.

“Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.” ― Antoine de Saint-Exupéry

Cascade-delete your infrastructure

They say engineering is just the process of moving complexity to where you want it. Let’s explore some of the unintended consequences of moving UI state from the bottom of the stack to the top. Not only does it greatly reduce the number of moving parts needed in your infrastructure (making developers and businesses happier) it also enables webapps of unbeatable speed and UX (delighting end users too). Win, win, win.

Figure: stateless vs. stateful architecture (stateless-v-stateful.svg)

1. Firstly, fewer web servers are needed

StateLESS architectures scale horizontally by spreading traffic across large clusters of smaller web servers. Since the state used in every request is discarded after every response, sometimes these web workers are even ephemeral cloud functions/workers which greatly simplifies scaling on-demand. While convenient, it’s common to be blissfully unaware of just how costly that “horizontal tax” can be.

StateFUL architectures scale vertically by using fewer but larger machines. Stack Overflow is among the top 50 most trafficked websites on the Internet and they can famously handle their entire traffic on only one web server!

To put it another way, when it comes to building web applications, 64 separate single-core machines ≠ 1 single 64-core machine. And not by a small margin either.

2. Therefore, caching servers are not needed

When web servers aren’t required to reassemble and immediately discard the full context between each and every request, there’s little reason to cache it on a separate server since it can be safely assumed that all subsequent communication from that same client will continue to that same machine. Each client’s context should simply remain in that machine’s RAM until the client disconnects.

The only reason we tolerate the overhead of introducing a dedicated caching service such as Redis, Memcached, or some other key-value store is that, in a stateless world, 10 requests from the same client might be load-balanced across 10 different web servers, rendering each server’s in-memory cache useless and very likely stale.

In a stateful architecture, each client (browser) would communicate directly with one and only one server for the duration of its session. This is not unlike how many video games are built where a server is selected before hosting the entirety of that world, level or tournament.
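As a rough illustration of "context stays in RAM until the client disconnects", here is a minimal sketch. The class name, the `connectionId` key, and the shopping-cart fields are all invented for this example; a real server would tie `open` and `close` to the lifecycle of its persistent connection.

```typescript
// Hypothetical per-client context, held only in this server's memory.
interface ClientContext {
  userId: string;
  cartItems: string[];
}

class ConnectionContexts {
  private contexts = new Map<string, ClientContext>();

  // Called once, when the client's persistent connection opens.
  open(connectionId: string, userId: string): void {
    this.contexts.set(connectionId, { userId, cartItems: [] });
  }

  // Every later message reuses the same in-memory object:
  // nothing is rebuilt and no external cache is consulted.
  addToCart(connectionId: string, item: string): string[] {
    const ctx = this.contexts.get(connectionId);
    if (!ctx) throw new Error(`unknown connection: ${connectionId}`);
    ctx.cartItems.push(item);
    return ctx.cartItems;
  }

  // Called when the client disconnects; the context simply goes away.
  close(connectionId: string): void {
    this.contexts.delete(connectionId);
  }

  get size(): number {
    return this.contexts.size;
  }
}
```

Because the map lives in the same process that handles the connection, the sketch only holds up under the assumption made above: every message from a given client reaches the same machine.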

It’s also the way other familiar web protocols work. Take FTP for example – a TCP connection is established and authenticated, then stays open awaiting a multitude of commands for potentially hours, possibly even days. While HTTP/1.1 does have something similar called Keep-Alive, it was only ever used to avoid repeated connection setup at the transport layer and was never associated with caching at the application layer. (HTTP/2 and HTTP/3 prohibit the Keep-Alive header itself, since their connections are persistent by default.)

3. Therefore, business logic is free to finally move to the edge

In a world where context isn’t discarded and rebuilt between each and every request, suddenly business logic isn’t required to be located within milliseconds of a centrally-located, shared caching service. In fact, the business logic can even be spread out geographically, not unlike a CDN.

Granted, the database must still be centrally located, but when business logic can fully embrace statefulness, the role of the database becomes far less chatty and much more transaction-oriented which substantially reduces the amount of round tripping needed.

4. Therefore, load balancers are not needed

In a world where each client maintains a single persistent connection with its server for the duration of its session and where business logic is no longer centrally located, running a load balancer becomes pointless. A stateful, geographically distributed architecture would instead lean on geo-aware DNS to shape its traffic.

5. Therefore, edge middleware/config is not needed

When the business logic is already running on the edge, having separate services for edge middleware or edge config becomes redundant, more costly, and needlessly complicated.

6. Therefore, a CDN might not be needed

Running your application on the edge means there are never any cache misses since all assets are already deployed to the edge. (However, a CDN might still prove useful for DDoS protection or if you use large static content that comes from blob storage.)

7. Therefore, an API is not needed

Traditionally your web server gives JavaScript to the browser (often measured in megabytes, which is far too much). After that, the role of your web server becomes wastefully passive, only responding when spoken to through a custom-built API over HTTP’s limited request/response protocol.

Running your business logic on the edge opens the door to a fascinating new way to interact with the browser. Suddenly your event handlers can live on the server instead of needing to be handled inside the browser. (The performance profile of this approach will be covered later in this series.) The main takeaway is that this approach brings two HUGE advantages.

  1. No more splitting logic across client and server – it can live 100% server-side with zero compromise to the richness of the UI.
  2. No custom-built API is needed for the browser to communicate with the server. Marshaling events, sending them up to the server, and allowing the server to push down instructions for how the DOM should react can be distilled down to its own standardized protocol.
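To make "event handlers living on the server" a little more concrete, here is a minimal sketch. The message shapes (`ClientEvent`, `DomInstruction`) and the `ServerEventRouter` class are invented for illustration and are not taken from the HTTP/X spec; the transport that carries the JSON text is left out entirely.

```typescript
// Hypothetical wire format for a browser event marshaled up to the server.
interface ClientEvent {
  elementId: string; // id of the element the event fired on
  type: string;      // DOM event name, e.g. "click"
  value?: string;    // optional payload, e.g. an input's current value
}

// Hypothetical instruction the server pushes back down to the browser.
interface DomInstruction {
  op: "setText";
  target: string;
  value: string;
}

type Handler = (event: ClientEvent) => DomInstruction[];

class ServerEventRouter {
  private handlers = new Map<string, Handler>();

  // Register a server-side handler for one element/event pair.
  on(elementId: string, type: string, handler: Handler): void {
    this.handlers.set(`${elementId}:${type}`, handler);
  }

  // Takes the raw JSON text received over the channel and returns the
  // instructions to push back; unregistered events produce none.
  dispatch(raw: string): DomInstruction[] {
    const event = JSON.parse(raw) as ClientEvent;
    const handler = this.handlers.get(`${event.elementId}:${event.type}`);
    return handler ? handler(event) : [];
  }
}

// Example: a counter whose state lives only on the server.
let count = 0;
const router = new ServerEventRouter();
router.on("increment", "click", () => {
  count += 1;
  return [{ op: "setText", target: "counter", value: String(count) }];
});
```

Note that nothing here is specific to the counter: the browser only needs to know how to marshal events and apply instructions, which is why no per-application API has to be designed.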

HTTP/X is the new API

Think of HTTP/X just like HTTP but in reverse. Instead of the browser driving the communication with the server, the server drives the communication with the browser.

While HTTP/X is technically a protocol, it is NOT intended to be implemented by browsers or engines nor is it a proposal for the HTTP spec itself. It’s just a collection of lesser-used HTTP-based protocols (like WebSockets and SSE) coupled with a small, pre-defined instruction set designed to manipulate the DOM. Utilizing HTTP/X requires application-level libraries written in whatever server-side language is preferred coupled with sending a small amount of bootstrapping-JavaScript to the browser.

In a nutshell the protocol operates in the following flow:

  • The browser performs a regular GET request to any URL.
  • The server responds with regular HTML. No hydration. No client-side rendering.
  • Only 2 small pieces of JavaScript are allowed to run in the browser:
    1. The communication-bootstrapping which establishes a stateful, bi-directional channel with the server. (There are many libraries in many languages that do this using protocols like WebSockets but can gracefully fall back to other approaches like server-sent events or long polling.)
    2. The only other JavaScript allowed is the small instruction set the server will use to issue mutations to the DOM. These instructions aren’t for brute-force replacing HTML partials; they are designed to manipulate the DOM at the element level. The server can then push commands to the browser and the browser will react accordingly.
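A browser-side instruction set of this kind might look roughly like the sketch below. The three operations and their field names are assumptions made for this example, not the instruction set defined by the HTTP/X spec. `ElementLike` is a small stand-in that real DOM elements satisfy structurally, so the same code can be exercised outside a browser.

```typescript
// Minimal structural stand-in for a DOM element (hypothetical).
interface ElementLike {
  textContent: string | null;
  setAttribute(name: string, value: string): void;
  remove(): void;
}

// Hypothetical element-level instructions pushed by the server.
type Instruction =
  | { op: "setText"; target: string; value: string }
  | { op: "setAttr"; target: string; name: string; value: string }
  | { op: "remove"; target: string };

// Finds an element by id; returns null when it does not exist.
type Lookup = (id: string) => ElementLike | null;

// Applies one instruction to one element. Returns false when the target
// is missing, so the caller can decide how to report it.
function applyInstruction(ins: Instruction, lookup: Lookup): boolean {
  const el = lookup(ins.target);
  if (el === null) return false;
  if (ins.op === "setText") {
    el.textContent = ins.value;
  } else if (ins.op === "setAttr") {
    el.setAttribute(ins.name, ins.value);
  } else {
    el.remove();
  }
  return true;
}
```

In a browser, the bootstrapping script could pass `(id) => document.getElementById(id)` as the lookup and call `applyInstruction` for each message that arrives on the channel. Each instruction touches a single element, which is the contrast with swapping whole HTML partials.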

This stateful channel between server and client is mandatory, and it is the main thing that separates HTTP/X from other mainstream web frameworks, where such features are possible but always optional and never the primary mechanism for reactivity.
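The fallback order mentioned in the flow above (WebSockets, then server-sent events, then long polling) can be sketched as a small pure function. The `Capabilities` shape and the function name are hypothetical; existing libraries that handle this negotiation are considerably more involved.

```typescript
type Transport = "websocket" | "sse" | "long-polling";

// Hypothetical description of what the current browser and network allow.
interface Capabilities {
  webSocket: boolean;
  eventSource: boolean;
}

// Prefers the only natively bi-directional option first. Server-sent events
// are one-way (server to browser), so that fallback would need ordinary
// requests for the browser-to-server direction.
function chooseTransport(caps: Capabilities): Transport {
  if (caps.webSocket) return "websocket";
  if (caps.eventSource) return "sse";
  return "long-polling";
}
```

In a browser, the capabilities could be filled in with checks such as `typeof WebSocket !== "undefined"` and `typeof EventSource !== "undefined"`. Whichever transport is chosen, the channel itself stays open for the whole session.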

Read the full spec at httpx.org.

HTTP/X is designed to work seamlessly with ZeroScript for templating which was covered in the previous essay Zero New Syntax to Learn.

Why now?

So what’s wrong with this approach? Why didn’t it become mainstream long ago? Why is a stateful architecture never even considered? Short answer: We boiled that frog for over three decades. But that’s a story for another day.

Artifacts