Rick van Rein

do 03 juli 2014


Web Architecture 3: the Database Backend

Many websites are dynamic, and require the use of a database in their backend. It is surprising to so how unfit this solution is for the problem at hand.

This document is part of an article series on Web Architecture.

The original idea

When the first dynamic websites were created, they did things such as showing lists of items, for example the inventory of a shop. This was a pleasant interface to what would otherwise be too complicated to handle. The normal interface would have used the Set Query Lanague, usually abreviated to SQL, with complex statements like select user.name,user.birthdate from user where user.gender = 'M' which would output the names and dates of birth for all recorded male users — believe it or not, this language was considered fit for human consumption when it was designed in the 70’s! And indeed, it is somewhat readable, but not a pleasant language to actually write your computer interaction with.

Of course, when presenting items in a list you need to be terse. But sometimes you do need more detail. So the thought quickly arose to turn items into clickable links to a detail page with everything you could possibly want about an individual item.

SQL also excells at relating records, so it is possible to do things like listing all orders made by a given user. Once again, lists. The opposite also works, an order could link back to the user who placed it. You probably recognise the structure of web shop systems and such that you must have used.

How we wrecked it

SQL is meant to handle sets. In practice however, the majority of database searches that we do involves individual entities. This is because SQL is used to overcome the stateless mindset behind the original web design. So you will often see a session identifier linking to individual records to present individual bits of information.

And even though SQL is capable of doing these individual queries efficiently by setting up indexes, it still is a lot of work to be contacting a database, often run on a separate server across a network connection, to be posing it a question in SQL’s stylish English, to lookup the index and pull up the desired record. Or, in more complex situations, to trace through the database to collect the data required from individual bits and scraps of information.

SQL is by no means that best model to be storing web data. Much of the work relies on a simple location from which to retrieve individual entries, and the available power of set calculations is very general, but that generality comes at the usual price of overhead.

To make things worse, these database queries are not only used to setup actually dynamic pages, but many web frameworks have a lot of generic tendencies, treating things as objects of certain classes, and when an application starts thinking along those lines they need to rely on the generic powers of a database. Even if the website does not use it, or does not actually need all that power.

We are tossing compute cycles into our everyday websites without thinking much of it, but all these cycles take energy on a very large scale and, more in the individual interest of website owners and their visitors, they slow down the responsiveness of websites.

How we can fix the web

Many websites that use some form of web framework with builtin flexibilities are actually static in nature. These sites can be rendered after they have been edited, and then stored in their final form on the web server. Plenty of modern tools exist to do this efficiently, and they result in great websites.

Consider a blog. The index page changes with every newly posted article; but does that mean that a program should reconstruct the index for every user who wants to see it? Of course not; it could be done when new articles are added.

Consider a web shop. Do you really need to lookup the inventory in a database, everytime product #31807543 is requested? Nope. You could easily render a static page for that product. The one thing that is variable is perhaps the number of items in stock; it would suffice to retrieve only that piece of information, but there is no reason to download the description, specifications and images of the item from the database.

Consider your photo gallery. Sure, the application developer stores your image descriptions in a database and runs the same code with different parameters to show differently annotated photos or different index pages. But come to think of it, the number of pages is limited and you don’t need live updates of any kind, do you? So why render your gallery pages on the spot?

In fact, what this approach means, is that what we currently view as database-stored data will be split into a part that can be rendered statically, and perhaps a part that needs be drawn dynamically. This greatly improves page loading times and it completely disarms script-kiddies and other attackers. You want an example? How about InternetWide.org?

When a database is used, it is also clever to question the paradigm that makes most sense to your application. MySQL has long been the de facto standard database for hosting solutions, as part of their generic LAMP platform. But there is a growing NoSQL movement that agrees it is not a lingua franca and may need rethinking. Technologies such as Redis / Mnesia are popular, but perhaps chiefly of interest for “large sites”.

On the small end however, we have always had DBM available, a very straightforward key-to-value lookup system. Well… simple… if you want then there it can come with full-blown transactions, and redundancy across multiple servers.

Or do you need more querying facilities predefined? Then consider LDAP. It may not be able to relate objects, but the objects that it stores have dynamic fields and typing possibilities, which may be much more useful to you. Moreover, many things you might want to put in there have been standardised ages ago. Since LDAP is a well-defined protocol the data can be shared without the intervention of a web interface; this is the kind of thing we also discuss in our global directory series of articles.

The base setup of a hosting package may still rely on SQL, but alternatives are mounting and gaining in popularity. Do ask yourself if you want to spend countless compute cycles on repeated work, if you if you prefer a crisp site that loads like lightning.

Go Top