Thursday, May 9, 2013

Web Architecture Styles

This is a topic I touched upon at my AgileJkl talk this year. I thought it might be nice to recap some core thoughts and expand on them. I will go through some basic patterns I have observed. These patterns are at a pretty high level and I won't go into too much detail. The basic idea is just to provide some food for thought and perhaps to spark some conversation.

Static Site

Since the beginning of the web, people have been developing static sites. After all, that is what the web was designed for: a document platform with hyperlinks.

The model is quite usable even these days. Better yet, it is possible to implement certain dynamic functionality using JavaScript, so the site doesn't have to be entirely static. Rather, in this case static can be considered to mean the way we host our site.

Hosting is very simple in this approach. You simply need to dump your static files on some server and it will just work. That's it.

The security model of this approach is simple. If someone can root your server, you're done for. In addition, any external services you use could inject content you might not like. Compared to the others, this is still the simplest approach security-wise. Contrast it with something like WordPress and you can see how much smaller the attack surface and the need for server maintenance are.

If you examine the image, you can see I've used GitHub Pages as my server platform. This works surprisingly well for hosting. As a bonus you don't have to take care of the server maintenance yourself. Naturally this approach integrates well with GitHub. Basically all you need to do is to update your Git repository and the machinery over at GitHub will update your site. It also provides support for Jekyll, a popular static site generator, although pure HTML works as well.

The image also highlights the way you might integrate an external service, such as Disqus, into your site. In this case Disqus provides commenting. Incidentally, I use that service on this blog and it has grown immensely popular in the past few years. You could also aggregate data from RSS and services such as Twitter. You can see this approach in action at the site of my co-op.
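As a rough illustration of the RSS aggregation idea, here is a minimal sketch that pulls titles and links out of a feed so they could be baked into a static page. It assumes the aggregation happens in a build step written in Python, which is my assumption and not something the sites above necessarily do. The sample feed and the function name parse_rss_items are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, standing in for one fetched from a real site.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First post</title><link>http://example.com/1</link></item>
    <item><title>Second post</title><link>http://example.com/2</link></item>
  </channel>
</rss>"""


def parse_rss_items(xml_text, limit=5):
    """Return up to `limit` (title, link) pairs from an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        # findtext returns the default when the child element is missing.
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        items.append((title, link))
    return items[:limit]
```

The resulting pairs could then be written into the generated HTML, so the hosted files stay fully static.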

Client-Server Approach

There are times when having just a static site and some dynamic JavaScript-based doodads isn't quite enough. This is particularly true if you need to store some of the data yourself and it is dynamic by nature.

This approach is used at JSter, a catalog of JavaScript libraries, which I help to maintain. Basically we have a database of our own for the libraries. The users of the site can modify that through a web interface.

The approach is a bit monolithic by nature and I think that we're moving away from it. The first step towards this was implementing a REST interface for the data. The idea is that external services could consume our data through it. I'll go through the details in the next section.

If you examine the image, you can see Disqus there. Not surprisingly we use it to provide commenting for various parts of our site. Overall the approach can be considered a superset of a static site. There is some extra complexity but on the other hand you gain some extra flexibility. The most important feature is the possibility to render HTML in a dynamic way on the backend side. So as the data changes so does your view. This isn't the only way to achieve the effect, however.
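Rendering HTML on the backend can be as small as filling a template with whatever the data currently looks like. The following is only a sketch using the Python standard library. It is not how JSter is implemented, and the template and the name render_library_list are invented for illustration.

```python
from html import escape
from string import Template

# Hypothetical page template; $title and $rows are filled in per request.
PAGE = Template("<h1>$title</h1>\n<ul>\n$rows\n</ul>")


def render_library_list(title, libraries):
    """Build an HTML fragment from the current list of library names."""
    # Escape every value so user-supplied names cannot inject markup.
    rows = "\n".join(f"  <li>{escape(name)}</li>" for name in libraries)
    return PAGE.substitute(title=escape(title), rows=rows)
```

Calling the function again after the data has changed produces a different page, which is the "as the data changes so does your view" effect.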

Security-wise the approach is more complicated than a pure static site. You will have to make sure you are using recent versions of the libraries in addition to making sure your server stays secure. There are additional attack vectors related to the fact that you are allowing the users to modify the data. This opens the door to a whole range of attacks, including the well-known injection attacks such as SQL injection.
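One common defence against injection is to never build queries by gluing user input into a string. Here is a minimal sketch with Python's built-in sqlite3 module. The table layout and function names are hypothetical and not taken from any real site.

```python
import sqlite3


def make_db():
    """Create a throwaway in-memory database with one sample row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE libraries (name TEXT PRIMARY KEY, description TEXT)")
    conn.execute("INSERT INTO libraries VALUES (?, ?)", ("jquery", "DOM helpers"))
    return conn


def find_library(conn, name):
    """Look up a library by name using a parameterized query."""
    # The ? placeholder makes the driver treat `name` as data, never as SQL.
    cur = conn.execute(
        "SELECT name, description FROM libraries WHERE name = ?", (name,)
    )
    return cur.fetchall()
```

With this in place, an input like x' OR '1'='1 is simply compared against the name column as a literal string instead of changing the meaning of the query.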

REST Architecture

As noted above, a traditional client-server approach is quite monolithic by nature. These days we're moving towards something more distributed. In effect we're moving the data behind an interface of its own and then accessing that data from another service. There is a whole set of principles on how to implement this kind of interface.

The nice thing about this sort of split is that it allows you to implement multiple frontends for the same data. It also allows other services to consume your data more effectively making it possible to implement mashups.
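To make the split a bit more concrete, here is a toy sketch of a read-only, REST-style interface plus two tiny "frontends" that consume the same data. Everything here is made up for illustration: the /libraries paths, the in-memory data and the function names. A real service would sit behind an actual HTTP server.

```python
import json
from html import escape

# Hypothetical in-memory data standing in for the real database.
LIBRARIES = {
    "jquery": {"name": "jquery", "tags": ["dom"]},
    "underscore": {"name": "underscore", "tags": ["utility"]},
}


def handle_request(method, path):
    """Map an HTTP-style request to a (status, JSON body) pair."""
    parts = [p for p in path.split("/") if p]
    if not parts or parts[0] != "libraries":
        return 404, json.dumps({"error": "not found"})
    if method != "GET":
        return 405, json.dumps({"error": "method not allowed"})
    if len(parts) == 1:
        # Collection resource: list the names of all libraries.
        return 200, json.dumps(sorted(LIBRARIES))
    if len(parts) == 2 and parts[1] in LIBRARIES:
        # Single resource: return one library.
        return 200, json.dumps(LIBRARIES[parts[1]])
    return 404, json.dumps({"error": "not found"})


def render_html(body):
    """Frontend 1: turn the JSON list of names into an HTML list."""
    return "<ul>" + "".join(f"<li>{escape(n)}</li>" for n in json.loads(body)) + "</ul>"


def render_text(body):
    """Frontend 2: turn the same JSON into a plain-text line."""
    return ", ".join(json.loads(body))
```

Both renderers know nothing about where the data lives. They only depend on the JSON coming out of the interface, which is what makes multiple frontends and mashups practical.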

The image shows how the data might be consumed by another service in the case of JSter. We would probably still like to keep intermediate storage on the server running the main site to introduce some redundancy to the system. It isn't particularly nice if the REST server goes down. Alternatively, it should be possible to add some redundancy on the REST side to make such an outage less likely.
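The intermediate storage idea could look roughly like the sketch below: the main site remembers the last good answer for each key and serves it when the REST server cannot be reached. The class name CachedSource and the fetch callable are hypothetical, and the cache lives in memory just to keep the example short.

```python
class CachedSource:
    """Wrap a fetch function and fall back to the last known value on failure."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable that talks to the REST server
        self._cache = {}     # key -> last successfully fetched value

    def get(self, key):
        try:
            value = self._fetch(key)
        except OSError:
            # Network-level failures (connection refused, timeouts, urllib's
            # URLError) are OSError subclasses. Serve the stale copy if any.
            if key in self._cache:
                return self._cache[key]
            raise
        self._cache[key] = value
        return value
```

The trade-off is that visitors may see slightly stale data during an outage, which is usually nicer than seeing an error page.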

In this case the approach can be taken further. Besides consuming the data, we could make it possible for other services to modify it. This comes with a risk of vandalism, though. You definitely don't want someone messing up your valuable data. As a result it likely makes sense to restrict write rights and implement some sort of revisioning over at the REST server. Of course, you should have backups of the data available in any case should anything go wrong.
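Restricted writes combined with revisioning might be sketched like this. It is only one possible design, with invented names: every write is appended to a per-key history instead of overwriting the old value, and only authors on an allow list may write at all.

```python
class RevisionedStore:
    """Keep every version of a value and only accept writes from known authors."""

    def __init__(self, allowed_writers):
        self._allowed = set(allowed_writers)
        self._history = {}  # key -> list of (author, value), oldest first

    def write(self, author, key, value):
        if author not in self._allowed:
            raise PermissionError(f"{author} may not write")
        self._history.setdefault(key, []).append((author, value))

    def read(self, key):
        """Return the latest value stored under `key`."""
        return self._history[key][-1][1]

    def revisions(self, key):
        """Return a copy of the full history for `key`."""
        return list(self._history[key])

    def revert(self, author, key):
        """Undo the latest change by recording the previous value again."""
        revisions = self._history[key]
        if len(revisions) < 2:
            raise ValueError("nothing to revert to")
        self.write(author, key, revisions[-2][1])
```

Because a revert is itself just another revision, nothing is ever lost and even an act of vandalism remains visible in the history.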

This approach also allows us to move some computation to the client. Client-side MVC frameworks are a good example of this trend. There seems to be some sort of counter-trend going on, though, as aspects such as SEO can be difficult to deal with in a pure JavaScript-based approach. I think we'll see more frameworks that take these concerns into account. It would not surprise me if that was the next wave of web development.

Conclusion

The architectures above are just a few of the possible ones. They cover my use cases quite well, though. During AgileJkl I saw an interesting, generalized variant. What if a whole web service was just a collection of very small web services talking to each other? There could be literally hundreds or even thousands of these.

Even though that might sound a bit counter-intuitive, there's some sense to it. It's the ultimate antithesis of a monolithic approach. Given that you have standard ways in which the services communicate with each other, it is possible to use whatever technology you happen to find suitable for each purpose.

The approach incurs some extra latency compared to more monolithic approaches. Perhaps clever caching and other performance tweaks can counter this, though. Overall it seems like an interesting way to think about web architectures.