Offline web apps: the cache (serious this time)
15 Jun, 2011
OK, so that last post was a bit annoying. Let’s do it for real this time!
Modern web browsers combined with HTML5 give us decent off-line capability, but it seems usage isn’t too widespread at the moment. Two questions, then: (1) what do we mean by “off-line”, and (2) why go off-line at all?
Whilst it is true that web browsers cache content, that has always been about improving the connected experience by reducing the performance hit. Off-line websites, however, are designed on the explicit premise that the user will use them whilst entirely disconnected from the web. As for the whys and wherefores of off-line web applications, well, that’s just one point of focus for people looking at the mobile web (more on this another day).
So how does one go about achieving such witchcraft? Well, the first port of call is a “cache manifest” file. This is a special file which tells the user agent which web pages, images and related resources should be available off-line (and, conversely, which resources should not be). This file is crucial in determining whether an app is truly able to go off-line or not: make one mistake in it (a mistyped path, say) and it’s all over. Also worth noting at this juncture is that once the manifest is created and declared (I’ll get to this in a sec.), your browser will not pick up changes you make to your code unless the manifest file itself has changed, byte for byte, in some way. All clear? Good :-)
CACHE MANIFEST
CACHE:
index.html
about.html
cuckoos.html
images/cuckoo.png
images/cuckoo-thumb.png
images/favicon.ico
images/logo.png
site.css
print.css
jquery.js
site.js
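If you’d rather not maintain that list by hand, a small build script can generate it. Here’s a minimal sketch in Python (my choice of language, nothing to do with the manifest format itself). `build_manifest` and `write_manifest` are hypothetical helpers, and the sketch assumes every listed path is relative to the folder holding the manifest. It also refuses to write a manifest that names a missing file, since that is exactly the kind of single mistake that stops the cache working.

```python
from pathlib import Path


def build_manifest(paths):
    """Return cache manifest text listing `paths` under an explicit CACHE: header."""
    lines = ["CACHE MANIFEST", "CACHE:"]  # "CACHE MANIFEST" must be the first line
    lines.extend(paths)
    return "\n".join(lines) + "\n"


def write_manifest(site_root, paths, name="cache.manifest"):
    """Write the manifest into `site_root`, failing early if a listed file is missing.

    Assumption: every path is relative to `site_root`, where the manifest lives.
    """
    root = Path(site_root)
    missing = [p for p in paths if not (root / p).is_file()]
    if missing:
        # A listed file that can't be fetched makes the whole cache update
        # fail in the browser, so catch it here instead.
        raise FileNotFoundError(f"not found under {root}: {missing}")
    target = root / name
    target.write_text(build_manifest(paths), encoding="utf-8")  # manifests are UTF-8
    return target
```

For example, `write_manifest("site", ["index.html", "about.html", "site.css"])` would write `site/cache.manifest`, provided those three files exist.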
Pretty straightforward. Now that we’ve set it up, how do we tell our web site about the file? A reference to the cache manifest should go into every relevant web page making up your site (in this example I have chosen the imaginative file name “cache.manifest”):
<!DOCTYPE html>
<html manifest="./cache.manifest">
<head>…
(Note that this snippet of HTML is HTML5: we’re dealing with the new world order, baby! With any other type of mark-up, all bets are off.)
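Since every relevant page needs that attribute, it’s worth checking you haven’t forgotten one somewhere. Here’s a rough sketch using Python’s standard `html.parser` module; `declared_manifest` is a hypothetical helper name, and it only looks at the first `<html>` tag it finds.

```python
from html.parser import HTMLParser


class _ManifestFinder(HTMLParser):
    """Records the manifest attribute of the first <html> start tag it sees."""

    def __init__(self):
        super().__init__()
        self.manifest = None
        self._seen_html = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag == "html" and not self._seen_html:
            self._seen_html = True
            self.manifest = dict(attrs).get("manifest")


def declared_manifest(markup):
    """Return the page's manifest URL, or None if the page doesn't declare one."""
    finder = _ManifestFinder()
    finder.feed(markup)
    finder.close()
    return finder.manifest
```

Run it over every `*.html` file in your site and flag any page for which it returns `None`.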
The final thing you have to know about the manifest file (before we look more closely at its content) is that it must be served with its own special content type, text/cache-manifest. You have to configure your web server to do this, or else your lovingly crafted cache will be ignored by everyone, and you will become a laughing-stock. To avoid such a faux pas, if you’re running Apache, a line like this in your .htaccess file will set you right:
AddType text/cache-manifest .manifest
Alternatively, if you’re running Apache on a local machine (e.g. when using the standard Web Sharing in OS X, like me), locate your user account configuration file and add the same AddType directive there instead.
…and finally, if you’re working with IIS as your web server, Microsoft have some IIS 7 instructions for setting content types. Oh yes, hosting services like Amazon S3 also let you set content types for freshly uploaded files; add a comment if you need to know more.
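For quick local testing without touching Apache or IIS, Python’s built-in static file server can be taught the same mapping. This is a sketch rather than a production server: `ManifestAwareHandler` and `serve` are hypothetical names, and it assumes Python 3.7 or later (for the `directory` argument and `ThreadingHTTPServer`).

```python
import functools
import http.server


class ManifestAwareHandler(http.server.SimpleHTTPRequestHandler):
    """Static file handler that serves *.manifest files as text/cache-manifest."""

    # Build a new dict rather than mutating the inherited one,
    # so the base handler class is left untouched.
    extensions_map = {
        **http.server.SimpleHTTPRequestHandler.extensions_map,
        ".manifest": "text/cache-manifest",
    }


def serve(directory=".", port=8000):
    """Serve `directory` on http://localhost:<port>/ until interrupted."""
    handler = functools.partial(ManifestAwareHandler, directory=directory)
    with http.server.ThreadingHTTPServer(("", port), handler) as httpd:
        httpd.serve_forever()
```

Calling `serve("site")` and browsing to http://localhost:8000/ should then deliver `cache.manifest` with the right content type, while everything else keeps the types Python would normally guess.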
Modifying the cache signature
Back to the cache file… I mentioned earlier how it controls whether the user agent should look for updated pages and resources or not. Any material change to the file tells the browser that it should download all the relevant files again and then perform a cache “swap”. Very clever! But it’s not enough to simply re-save the manifest file to trigger this; you must make a genuine edit that changes its contents, because the browser compares the manifest byte for byte. I usually do this by adding and/or removing comments, e.g. updating it with the latest build version number or some such. This can be a bind during development, but hey ho. Cache manifest comment lines, like those in many other configuration formats, start with the good old hash symbol (“pound sign” to my trans-Atlantic chums), like so:
CACHE:
index.html
about.html
cuckoos.html
# Here's a comment
# Updated 15-Jun-2011
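Bumping that comment by hand gets old fast. One approach, sketched here under my own assumptions, is to derive the version from the contents of the cached files, so the manifest changes whenever one of them does. `content_hash`, `stamp_manifest` and the `# version:` comment format are all hypothetical; browsers ignore comment lines, so any format will do.

```python
import hashlib
from pathlib import Path

VERSION_PREFIX = "# version: "  # hypothetical comment format; browsers ignore comments


def content_hash(site_root, paths):
    """Return a short hash over the bytes of every listed resource, in order."""
    digest = hashlib.sha256()
    for p in paths:
        data = (Path(site_root) / p).read_bytes()
        # Include each path and its length so renames and file boundaries count too.
        digest.update(p.encode("utf-8") + b"\0" + len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()[:12]


def stamp_manifest(manifest_text, version):
    """Replace (or add) the version comment so the manifest's bytes change."""
    lines = manifest_text.splitlines()
    if not lines or lines[0].strip() != "CACHE MANIFEST":
        raise ValueError("a manifest must start with 'CACHE MANIFEST'")
    lines = [line for line in lines if not line.startswith(VERSION_PREFIX)]
    # Comments may not come before the mandatory first line, so insert after it.
    lines.insert(1, VERSION_PREFIX + version)
    return "\n".join(lines) + "\n"
```

Something like `stamp_manifest(text, content_hash("site", entries))` in your build step means an unchanged site produces an identical manifest, so browsers only re-download when there is actually something new.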
The cache in-depth
In addition to the CACHE: section, manifest files can include a couple of other section types. The first of these is NETWORK. This section should list all the resources in your site/app that should not be made available in the off-line cache. In other words, it should include resources that require a proper connection to your server; typically that means dynamic pages, search functions and so on. Here’s how you use it (wild-cards are possible too, so you can ensure server-based resources are always used when the user is connected):
CACHE MANIFEST
NETWORK:
search_nest.php
CACHE:
index.html
about.html
cuckoos.html
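To make the section rules concrete, here’s a simplified Python reader for the format. It’s a sketch, not a full implementation of the spec’s parsing rules: `parse_manifest` is a hypothetical name, only the first token of each line is used, entries under any header it doesn’t recognise are skipped, and FALLBACK entries (coming up next) are kept as pairs.

```python
def parse_manifest(text):
    """Split cache manifest text into CACHE, NETWORK and FALLBACK entries.

    Simplified rules: blank lines and '#' comments are skipped, entries before
    any header count as CACHE, only the first token of a line is used (the
    first two for FALLBACK), and entries under unrecognised headers are ignored.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "CACHE MANIFEST":
        raise ValueError("a manifest must start with 'CACHE MANIFEST'")
    sections = {"CACHE": [], "NETWORK": [], "FALLBACK": []}
    current = "CACHE"  # the implicit section until a header appears
    for raw in lines[1:]:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.endswith(":"):
            # A header: switch sections, or ignore what follows if it's unknown.
            name = line[:-1]
            current = name if name in sections else None
            continue
        if current is None:
            continue
        tokens = line.split()
        if current == "FALLBACK":
            if len(tokens) >= 2:  # needs both a URL prefix and a fallback page
                sections["FALLBACK"].append((tokens[0], tokens[1]))
        else:
            sections[current].append(tokens[0])
    return sections
```

Fed the manifest above, it returns `search_nest.php` under NETWORK and the three HTML pages under CACHE.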
So that’s CACHE and NETWORK. The final member of this unholy trio is the FALLBACK section, in which each entry pairs a URL “pattern” (a prefix) with a page to show in its place. This section is interesting because up until now we have explicitly listed all the pages and resources we want to take off-line. That’s good practice, and it means the browser knows to cache all those pages ahead of time, ready for when it goes off-line. However, you don’t have to list all your off-line pages in the cache manifest. If you really want to, you can simply rely on having the <html manifest="./cache.manifest"> line at the top of every page that should be cached, and leave the manifest be. The up-side to this is minimal maintenance: if you have a big ole’ site, or one in which pages are added willy-nilly, you’re set for going off-line at any time without constantly updating your cache manifest. The downside is that user agents don’t know about these various pages up-front, so unless a user has actually visited a given page, it won’t be cached locally for them. This is where our FALLBACK section comes in. Rather than showing an error when an off-line user requests such an uncached page, the browser shows the fallback page instead. Here’s an example:
FALLBACK:
/ not_here_guv.html
The initial forward slash is a prefix that matches every URL on the site, so this behaviour kicks in for any page not in the local cache. The second part is the page that should be shown instead: some kind of placeholder telling the user to go on-line (or whatever).
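Putting the pieces together, here’s a toy Python model of what an off-line visitor ends up seeing. It’s a simplification under my own assumptions, not how any particular browser implements it: `serve_offline` is hypothetical, all entries are taken as absolute paths from the site root, origins are ignored, and when several fallback prefixes match, it assumes the longest one wins.

```python
from urllib.parse import urlsplit


def serve_offline(url, cached, fallback):
    """Return the path an off-line user would be shown for `url`, or None.

    cached   -- set of absolute paths held in the application cache
    fallback -- list of (url_prefix, fallback_page) pairs from FALLBACK
    """
    path = urlsplit(url).path or "/"
    if path in cached:
        return path  # served straight from the cache
    # Assumption: the longest matching prefix wins, so a more specific
    # fallback (e.g. "/images/") overrides the catch-all "/".
    matches = [(prefix, page) for prefix, page in fallback if path.startswith(prefix)]
    if matches:
        return max(matches, key=lambda pair: len(pair[0]))[1]
    return None  # no cache entry and no fallback: the request simply fails
```

With `cached = {"/index.html"}` and `fallback = [("/", "/not_here_guv.html")]`, a request for `/cuckoos.html` that was never visited comes back as `/not_here_guv.html` rather than an error.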
That’s quite enough for now. Come back soon when we will start to build a simple off-line web site, whoop whoop!