Rick van Rein

Thu 03 July 2014


Web Architecture 1: Stateless Applications?!?

It is common practice today to provide services over HTTP, the protocol of the web. This is a distortion of the original design philosophy of the web, and it leads to great inefficiencies. But fixing it is easy.

This document is part of an article series on Web Architecture.

The original idea

The original design of the web is a rather straightforward concept: A server provides access to resources that reside at locations, and on which operations can be executed. For example, one could PUT a document into a certain location, and GET it back later from that same location. The range of operations can be extended, for example to permit co-operative editing.
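To make the idea concrete, here is a minimal sketch of such a server, reduced to an in-memory table of locations. The class name ResourceStore and the example location are made up for illustration; a real web server would of course speak HTTP and store its resources on disk.

```python
class ResourceStore:
    """Toy model of a server that holds resources at locations (hypothetical)."""

    def __init__(self):
        self._resources = {}  # location -> stored document

    def put(self, location, body):
        # PUT: place a document at a location, replacing what was there.
        self._resources[location] = body

    def get(self, location):
        # GET: return the document at a location, or None if there is none.
        return self._resources.get(location)


store = ResourceStore()
store.put("/docs/hello.txt", b"Hello, web")
fetched = store.get("/docs/hello.txt")
```

Note that neither operation needs to know what happened before it; the location and the operation say it all.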

This concept centers around stateless service, meaning that each operation stands on its own and has no relation to earlier operations. The model has been very useful for storing web sites, whose pages were composed of a number of resources such as HTML text, images, style sheets and even JavaScript code; each of these resources would sit in its own location, and the HTML text would contain links to the various parts that combine to form a page.

Interestingly, the various resources are typed. They do not depend on local file naming conventions, but instead adhere to a scheme of so-called MIME types, which classify a resource as, for example, text/html or image/jpeg. These types are standardised by the Internet Engineering Task Force.

This model permits a web designer to reuse elements in a multitude of pages, and such reuse actually saves on network bandwidth, because a browser need not reload resources that it has recently seen.
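The bandwidth saving can be sketched with a toy browser that remembers what it has already fetched. Everything here is an assumption made for illustration: CachingBrowser, fake_server and the page contents are invented, and a real browser decides what to reuse based on HTTP caching headers rather than keeping everything forever.

```python
class CachingBrowser:
    """Toy browser that fetches each location at most once (hypothetical)."""

    def __init__(self, fetch):
        self._fetch = fetch        # function that goes to the network
        self._cache = {}           # location -> content seen earlier
        self.network_fetches = 0   # how often we really used the network

    def load(self, location):
        if location not in self._cache:
            self.network_fetches += 1
            self._cache[location] = self._fetch(location)
        return self._cache[location]

    def render(self, page_resources):
        # A page is the HTML text plus the resources it links to.
        return [self.load(location) for location in page_resources]


def fake_server(location):
    # Stand-in for a real download.
    return f"<content of {location}>"


browser = CachingBrowser(fake_server)
browser.render(["/index.html", "/style.css", "/logo.png"])
browser.render(["/about.html", "/style.css", "/logo.png"])
fetches = browser.network_fetches
```

The two pages reference six resources between them, but the shared style sheet and logo are fetched only once, so the network is used four times.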

How we wrecked it

The web mechanisms are ideally suited for static content that is stored passively on a web server. Such a server basically functions like a remote file system whose content is suited for use in a web browser.

In fact, even the dynamic use we make of the web these days is not a problem per se. It is more the way in which we do it.

The wonderfully interactive sites that we like to use are based on JavaScript programs that run in our browsers, and that push and pull small bits of information to and from the web server. These small bits are often the only dynamic portions, and thanks to the JavaScript code these can be rendered in a lovely way in our browser.

Other applications may not be so interactive, and use the web server to generate full HTML pages, whose content is constructed on the fly, based on database lookups and so on.

In the backend of web servers, we see that both kinds of dynamic information are built up from scratch. This is a direct descendant of the stateless style of the web design, but it means that the same database queries need to be run over and over.

Meanwhile, we are trying to tie pages together, so most frameworks do carry a session identifier along with the individual requests to GET or PUT a resource. You may have heard of these identifiers: they are stored in cookies and are also used to profile your online behaviour. For this purpose, however, they are actually functional; they help the server keep its many simultaneous users apart.

But the stateless properties of the web service have caused servers to not store any information. So, based on this session identifier, all sorts of session information must be dug up — from the database. Again.

We’ll speak of databases in a later article in this series; for now, let me summarise that this is not as efficient as we’d like it to be.

How we can fix the web

Based on this analysis, you may well see the solution: to be better at handling web queries, we simply need stateful web services; that is, web services that can maintain state between individual resource requests. And this is not such a strange concept — sitting behind your browser, you experience a session, and you expect your just-made clicks to impact your future ones. You actually need this information stored.

The current practice of confining storage within the bounds of a single GET or PUT on a resource is more limited than what we are experiencing, but it is still being honoured by the majority of web applications. So they end up saving state in a database, and pulling it back up in the very next request. And the one after that. And after that. A lot of work is done repetitively, and only because applications are confined within the one-page-at-a-time trap.
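The difference can be sketched by counting database traffic for three clicks in one session. This is only an illustration under assumptions: CountingDatabase stands in for a real database, and the names stateless_handle and StatefulApp are invented. The sketch also ignores what should happen to in-memory state when the application restarts.

```python
class CountingDatabase:
    """Stand-in for a real database; it counts reads and writes."""

    def __init__(self):
        self.rows = {}
        self.reads = 0
        self.writes = 0

    def load(self, session_id):
        self.reads += 1
        return dict(self.rows.get(session_id, {}))

    def save(self, session_id, state):
        self.writes += 1
        self.rows[session_id] = dict(state)


def stateless_handle(db, session_id, clicked):
    # One-page-at-a-time: dig up the session, change it, store it again.
    state = db.load(session_id)
    state["clicks"] = state.get("clicks", 0) + 1
    state["last"] = clicked
    db.save(session_id, state)
    return state["clicks"]


class StatefulApp:
    """A program that simply keeps its sessions in memory between requests."""

    def __init__(self):
        self.sessions = {}

    def handle(self, session_id, clicked):
        state = self.sessions.setdefault(session_id, {"clicks": 0})
        state["clicks"] += 1
        state["last"] = clicked
        return state["clicks"]


db = CountingDatabase()
for page in ["/a", "/b", "/c"]:
    stateless_clicks = stateless_handle(db, "s1", page)

app = StatefulApp()
for page in ["/a", "/b", "/c"]:
    stateful_clicks = app.handle("s1", page)
```

Both variants end up knowing that the user clicked three times, but the first one needed three database reads and three writes to get there, and the second one needed none.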

But a revolution is mounting, and it is a healthy one. Applications are once again becoming programs that retain internal state, just like your word processor holds a copy of the document. They just happen to interface with remote users over the web. There is no reason you couldn’t GET or PUT resources on such a remotely run application. And come to think of it, this still counts as interaction with a resource, except that this web service probably has more knowledge of the content, and can provide indexes and cross-links in a more versatile manner than a remote file service ever could.

A modern-style web server will not even run all these programs locally, but permit them to be run on any nearby server; the web server would simply be configured to forward certain chunks of its resource locations to an application that interfaces over the web. For instance, you could have all locations that start with /blog/ forwarded to a blogging application.
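Such a forwarding rule boils down to a lookup on the start of the location. The sketch below assumes a longest-prefix-wins rule, which is a common choice but not the only one; the function pick_backend, the backend addresses and the "static-files" fallback are all hypothetical.

```python
def pick_backend(path, routes, default):
    """Return the backend configured for the longest prefix that matches path."""
    best = None
    for prefix in routes:
        if path.startswith(prefix) and (best is None or len(prefix) > len(best)):
            best = prefix
    return routes[best] if best is not None else default


# Hypothetical configuration: location prefix -> application to forward to.
ROUTES = {
    "/blog/": "http://blog-app.internal:8001",
    "/blog/admin/": "http://blog-admin.internal:8002",
}
DEFAULT = "static-files"  # anything else is served as plain stored content

blog_target = pick_backend("/blog/2014/07/stateless", ROUTES, DEFAULT)
admin_target = pick_backend("/blog/admin/users", ROUTES, DEFAULT)
other_target = pick_backend("/index.html", ROUTES, DEFAULT)
```

Everything under /blog/ lands at the blogging application, and the rest of the site is still served the old-fashioned way.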

Need to fix PHP

A major application framework on the web is PHP. This is a language in which many dynamic web applications are built. PHP has a few tedious properties, but it is widely appreciated for its fast development cycle.

PHP is strongly centered around this one-page-at-a-time model, and indeed it can be observed that applications load relatively fast until you start adding modules and features; when you do that, you quickly see your web service slow down because it needs to reconstruct every page from scratch. Had this been an application that happened to interface over the web, then the modules would have been written to load once, and to retain state and caches that help to service requests quickly.
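The effect of adding modules can be sketched by counting how often they are initialised. The sketch is written in Python rather than PHP and rests on assumptions: the Module class, the module names and the two serving styles are invented, and the setup counter merely stands in for whatever expensive work a real module does when it loads.

```python
class Module:
    setups = 0  # class-wide count of (supposedly expensive) initialisations

    def __init__(self, name):
        Module.setups += 1
        self.name = name
        self.cache = {}  # survives only as long as this object does

    def render(self, key):
        if key not in self.cache:
            self.cache[key] = f"{self.name}:{key}"
        return self.cache[key]


MODULE_NAMES = ["menu", "comments", "search"]


def page_at_a_time(key):
    # Every request loads all modules again and throws them away afterwards.
    modules = [Module(name) for name in MODULE_NAMES]
    return [module.render(key) for module in modules]


class LongRunningApp:
    def __init__(self):
        # Modules are loaded once and keep their caches between requests.
        self.modules = [Module(name) for name in MODULE_NAMES]

    def page(self, key):
        return [module.render(key) for module in self.modules]


for _ in range(4):
    page_at_a_time("home")
per_request_setups = Module.setups

Module.setups = 0
app = LongRunningApp()
for _ in range(4):
    app.page("home")
long_running_setups = Module.setups
```

Four requests cost twelve module setups in the first style and three in the second; the gap widens with every module and every request you add.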

In terms of the topic at hand, PHP is not a problem in itself; it is more a problem of how it is used. PHP can also be run as a standalone application, and the vigorous development of PHP libraries more than suffices to come up with a web serving interface. Then, programmers can migrate their applications from the one-page-at-a-time model to the application-with-state model. Chances are that the creativity in the PHP community will establish a style of programming that suits both approaches; that would be ideal for a smooth transition from the current style to the application-oriented style, without the need to rebuild all applications in existence.
