
Offline web apps: the cache (serious this time)

OK, so that last post was a bit annoying. Let’s do it for real this time!

Modern web browsers combined with HTML5 give us decent off-line capability, but usage isn’t too widespread at the moment it seems. Two things: (1) what do we mean by “off-line”? and (2) why go off-line at all?

Whilst it is true that web browsers cache content, that has always been about improving the connected experience by reducing the performance hit. Off-line websites, however, are designed on the explicit premise that the user will use them whilst entirely disconnected from the web. As to the whys and wherefores of off-line web applications, well, that’s just one point of focus for people looking at the mobile web (more on this another day).

So how does one go about achieving such witchcraft? Well, the first port of call is a “cache manifest” file. This is a special file which dictates to the user agent which web pages, images and related resources should be available off-line (and, conversely, which resources should not be available). This file is crucial in determining whether an app is truly able to go off-line or not: one mistake in the file, and it’s all over. Also worth noting at this juncture is that once created and declared (I’ll get to this in a sec.), your browser will not detect changes you make to your code unless the byte signature of the manifest has changed in some way. All clear? Good :-)

Let’s look at the make-up of a standard cache manifest (remember, it’s just a text file, and you should name it with the file suffix .manifest). Here’s what a cache manifest looks like for a basic three-page website with associated images, Javascript and stylesheets:

CACHE MANIFEST

CACHE:
index.html
about.html
cuckoos.html

images/cuckoo.png
images/cuckoo-thumb.png
images/favicon.ico
images/logo.png

site.css
print.css

jquery.js
site.js
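If you want to sanity-check a manifest like this from a build script, the format is simple enough to parse by hand. The Python sketch below is a simplified reading of the format, not a full implementation of the browser’s parser: it skips blank lines and # comments, treats entries before any header as CACHE entries, understands the NETWORK and FALLBACK sections described further down, and ignores lines under any other header. The function name parse_manifest is my own invention.

```python
def parse_manifest(text):
    """Split cache manifest text into CACHE, NETWORK and FALLBACK entries.

    Simplified reading of the format: blank lines and '#' comments are skipped,
    entries before any header count as CACHE, unknown headers are ignored.
    """
    lines = text.splitlines()
    if not lines or not lines[0].startswith("CACHE MANIFEST"):
        raise ValueError("missing 'CACHE MANIFEST' signature on the first line")
    sections = {"CACHE": [], "NETWORK": [], "FALLBACK": []}
    current = "CACHE"
    for raw in lines[1:]:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.endswith(":"):
            # A section header: switch sections, or ignore lines under unknown headers.
            current = line[:-1] if line[:-1] in sections else None
            continue
        if current is None:
            continue
        tokens = line.split()
        if current == "FALLBACK":
            if len(tokens) >= 2:  # each entry is "namespace fallback-page"
                sections["FALLBACK"].append((tokens[0], tokens[1]))
        else:
            sections[current].append(tokens[0])
    return sections


example = """CACHE MANIFEST

CACHE:
index.html
about.html
cuckoos.html

images/cuckoo.png
site.css
site.js
"""
print(parse_manifest(example)["CACHE"])
# prints ['index.html', 'about.html', 'cuckoos.html', 'images/cuckoo.png', 'site.css', 'site.js']
```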

Pretty straightforward. Now that we’ve set it up, how do we tell our web site about the file? A reference to the cache manifest should go into every relevant web page making up your site (in this example I have chosen the imaginative file name “cache.manifest”):

<!DOCTYPE html>
<html manifest="./cache.manifest">
	<head>…

(Note that this snippet of HTML is actually HTML5—we’re dealing with the new world order baby! Any other type of mark-up, and all bets are off).
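Since every relevant page needs that attribute, it is easy to forget it on one of them. A quick way to catch this is to scan your pages before deploying; the Python sketch below uses the standard library’s html.parser to pull the manifest attribute off the first <html> tag. The helper names (ManifestFinder, manifest_of, pages_missing_manifest) are hypothetical, and the expected value "./cache.manifest" simply matches the example above.

```python
from html.parser import HTMLParser


class ManifestFinder(HTMLParser):
    """Remembers the manifest attribute (if any) on the first <html> start tag."""

    def __init__(self):
        super().__init__()
        self.manifest = None
        self.seen_html = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag == "html" and not self.seen_html:
            self.seen_html = True
            self.manifest = dict(attrs).get("manifest")


def manifest_of(html_text):
    finder = ManifestFinder()
    finder.feed(html_text)
    finder.close()
    return finder.manifest


def pages_missing_manifest(pages, expected="./cache.manifest"):
    """Return names of pages whose <html> tag does not point at the expected manifest."""
    return sorted(name for name, text in pages.items() if manifest_of(text) != expected)


pages = {
    "index.html": '<!DOCTYPE html>\n<html manifest="./cache.manifest">\n<head>',
    "about.html": "<!DOCTYPE html>\n<html>\n<head>",  # forgot the attribute
}
print(pages_missing_manifest(pages))  # prints ['about.html']
```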

The final thing you have to know about the manifest file (before we look more closely at its content) is that it must be served with its own special content type, which is text/cache-manifest. You have to configure your web server to do this, else your lovingly-crafted cache will be ignored by everyone, and you will become a laughing-stock. To avoid such faux pas, if you’re running Apache, a line like this in your .htaccess file will set you right:

AddType text/cache-manifest .manifest

Alternatively, if you’re running Apache on a local machine—e.g. when using the standard web sharing in OS X like me—locate your user account configuration file and add the directive there instead:

/etc/apache2/users/YOUR_USER_NAME.conf

…and finally, if you’re working with IIS as your web server, Microsoft have some IIS 7 instructions for setting content types. Oh yes, hosting services like Amazon S3 also let you set content types for freshly-uploaded files; add a comment if you need to know more.
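If you would rather not touch Apache while experimenting, any local server that sends the right content type will do. As one possible option (assuming Python 3.7 or later is installed; this is not the setup described above), the standard library’s http.server can be taught the mapping by extending its extension map. ManifestAwareHandler and serve are hypothetical names for this sketch, and the server is meant for local testing only.

```python
import functools
from http.server import HTTPServer, SimpleHTTPRequestHandler


class ManifestAwareHandler(SimpleHTTPRequestHandler):
    """Static file handler that serves *.manifest as text/cache-manifest."""

    # Extend (rather than replace) the handler's own extension map.
    extensions_map = {
        **SimpleHTTPRequestHandler.extensions_map,
        ".manifest": "text/cache-manifest",
    }


def serve(directory=".", port=8000):
    """Serve `directory` on `port` (all interfaces) until interrupted with Ctrl+C."""
    handler = functools.partial(ManifestAwareHandler, directory=directory)
    with HTTPServer(("", port), handler) as httpd:
        httpd.serve_forever()
```

Calling serve("path/to/site") from an interactive session serves that folder on port 8000; curl -I http://localhost:8000/cache.manifest should then report a content type of text/cache-manifest.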

Modifying the cache signature

Back to the cache file… I mentioned earlier how it controls whether the user agent should look for updated pages and resources or not. Any change to the file’s contents tells the browser that it should download all the listed files again, and then perform a cache “swap”. Very clever! But it’s not enough to simply re-save the manifest file to trigger this: the browser compares the manifest byte for byte, so you must make a real edit that changes its contents in some way. I usually do this by adding and / or removing comments, e.g. updating it with the latest build version number or some-such. This can be a bind during development, but hey ho. Cache manifest comment lines are like many other configuration formats, in that they start with the good old hash symbol (“pound sign” to my trans-Atlantic chums), like so:

CACHE:
index.html
about.html
cuckoos.html
# Here's a comment
# Updated 15-Jun-2011
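Bumping that comment by hand gets old quickly. One way to automate it, sketched below in Python under the assumption that all CACHE entries are plain paths relative to a local site directory, is to generate the manifest at build time and derive the version comment from a hash of the listed files. Any edit to a cached file then changes the manifest’s bytes, which is exactly what the browser looks for. build_manifest and its parameters are hypothetical names, and SHA-1 is used here only as a cheap change detector.

```python
import hashlib
from pathlib import Path


def build_manifest(root, cache_entries, network=(), fallback=()):
    """Return cache manifest text whose version comment is derived from file contents.

    Editing any listed file changes the digest, so the manifest's bytes change too
    and browsers will fetch the updated resources on their next update check.
    """
    root = Path(root)
    digest = hashlib.sha1()
    for entry in cache_entries:
        digest.update(entry.encode("utf-8"))
        digest.update(b"\0")  # separator between a file name and its contents
        digest.update((root / entry).read_bytes())
    lines = ["CACHE MANIFEST", f"# Version {digest.hexdigest()[:12]}", "", "CACHE:"]
    lines.extend(cache_entries)
    if network:
        lines += ["", "NETWORK:", *network]
    if fallback:
        lines += ["", "FALLBACK:", *(f"{ns} {page}" for ns, page in fallback)]
    return "\n".join(lines) + "\n"
```

For instance, Path("site/cache.manifest").write_text(build_manifest("site", ["index.html", "site.css"])) would regenerate the file; the folder and file names there are placeholders.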

Debugging

Oh ho, this is where things get interesting! It is really easy to tinker away and end up with a set of pages that won’t go off-line properly. You shouldn’t even consider doing any of this off-line malarkey without some kind of test page that can check your cache manifest for you. Such a page is very easy to put together, thanks to the article Debugging HTML 5 Offline Application Cache by Jonathan Stark. In this post, he provides some boilerplate Javascript you can add to your test page which will report any cache errors. You won’t be able to accurately pinpoint the line in error, but at least you will know there’s an issue (and the “downloaded item” count may give you a clue as to which resource is the problem). This is indispensable. Seriously. Use it!
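A complementary check you can run before even opening a browser: because a single resource that fails to download is enough to sink the whole cache, it is worth confirming that every explicit CACHE entry actually exists. The Python sketch below is not Jonathan Stark’s script (his runs in the browser and listens to the cache events); it is a hypothetical offline lint, missing_cache_entries, which assumes the manifest sits at the site root and that entries map straight onto files beneath it. It cannot spot server-side problems such as a wrong content type.

```python
from pathlib import Path


def missing_cache_entries(manifest_path):
    """Return explicit CACHE entries that have no matching file on disk.

    Assumes the manifest sits at the site root and that relative and root-relative
    entries map straight onto files under that directory.
    """
    manifest_path = Path(manifest_path)
    root = manifest_path.parent
    lines = manifest_path.read_text(encoding="utf-8").splitlines()
    missing = []
    section = "CACHE"  # entries before any header are CACHE entries
    for raw in lines[1:]:  # lines[0] is the CACHE MANIFEST signature
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.endswith(":"):
            section = line[:-1]
            continue
        entry = line.split()[0]
        if section != "CACHE" or "://" in entry:
            continue  # skip other sections and absolute URLs this check can't verify
        if not (root / entry.lstrip("/")).is_file():
            missing.append(entry)
    return missing
```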

The cache in-depth

In addition to the CACHE: bit, manifest files can include a couple of other section types. The first of these is NETWORK. This section should list all the resources in your site / app which must not be served from the off-line cache: in other words, resources that need a proper connection to your server, typically dynamic pages, search functions and so on. Here’s how you use it (wild-cards are possible too: a single * entry lets the browser go to the server for anything not in the cache whenever the user is connected):

CACHE MANIFEST
NETWORK:
search_nest.php

CACHE:
index.html
about.html
cuckoos.html

So that’s CACHE and NETWORK. The final member of this unholy trio is the FALLBACK section, in which each entry pairs a URL “pattern” (a prefix) with the page to show in its place. This section is interesting because up until now we have listed all the pages and resources we want to take off-line explicitly. That’s good practice, and means that the browser knows to cache all those pages ahead of time, ready for when it goes off-line. However, you don’t have to list all your off-line pages in the cache manifest. If you really want to, you can simply rely on having the <html manifest="./cache.manifest"> line at the top of every page that should be cached, and leave the manifest be.

The up-side to this is minimal maintenance: if you have a big ole’ site, or one in which pages are added willy-nilly, you’re set for going off-line at any time without constantly updating your cache manifest. The downside is that user agents don’t know about these pages up-front, so unless a user has actually visited a given page, it won’t be cached locally for them. This is where our FALLBACK section comes in. Rather than throwing an error when the user requests such a page whilst off-line, the browser shows the fallback page instead. Here’s an example:

FALLBACK:
/ not_here_guv.html

The initial forward slash is a URL prefix that matches every page on the site, so this behaviour kicks in for any page not in the local cache. The second part is the page that should be shown instead (the browser caches it automatically): some kind of placeholder telling the user to go on-line (or whatever).
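To make the interplay between the three sections concrete, here is a rough Python model of what happens to a same-origin page request whilst the device is off-line. The lookup order (cached URLs first, then NETWORK prefixes, then the longest matching FALLBACK namespace, then a lone * in NETWORK) is my simplified reading of the HTML specification’s application cache rules; it ignores redirects, query strings and cross-origin requests, and resolve_offline and the root-relative paths are assumptions of this sketch rather than any browser API.

```python
def resolve_offline(path, cached, network=(), fallback=()):
    """Roughly model what the browser does with a same-origin GET for `path` while offline.

    `cached` holds every URL in the application cache (explicit entries, visited
    pages carrying the manifest attribute, and fallback pages); `network` and
    `fallback` mirror the NETWORK and FALLBACK sections. Paths are root-relative.
    """
    if path in cached:
        return ("cache", path)
    # NETWORK prefixes are checked next; offline, that network request simply fails.
    if any(path.startswith(prefix) for prefix in network if prefix != "*"):
        return ("network", None)
    # Otherwise the longest matching FALLBACK namespace supplies a stand-in page.
    matches = [(ns, page) for ns, page in fallback if path.startswith(ns)]
    if matches:
        _, page = max(matches, key=lambda match: len(match[0]))
        return ("fallback", page)
    # A lone "*" in NETWORK lets anything else try the network (which fails offline).
    if "*" in network:
        return ("network", None)
    return ("error", None)


cached = {"/index.html", "/about.html", "/cuckoos.html", "/not_here_guv.html"}
fallback = [("/", "/not_here_guv.html")]
print(resolve_offline("/about.html", cached, fallback=fallback))     # ('cache', '/about.html')
print(resolve_offline("/new-page.html", cached, fallback=fallback))  # ('fallback', '/not_here_guv.html')
```

In this model a NETWORK match takes priority over the / fallback, so an off-line request for /search_nest.php fails outright rather than showing not_here_guv.html.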

That’s quite enough for now. Come back soon when we will start to build a simple off-line web site, whoop whoop!

Comments

  1. Good article, can we have the content-type update instructions for Amazon S3?

    stickfight
  2. Easy-peasy. From yer S3 console at http://console.aws.amazon.com/s3/home upload the cache manifest. Right-click on the file and select “Properties” (or click the relevant button) and a tabbed pane will appear at the bottom of the page.

    Click on “Metadata” and you should see content-type listed as the first attribute you can change. Whack in the new value (it’s not in the drop-down, you will need to key it in) and click “Save”: all done.

    BTW you can verify that S3 is serving up the file correctly by doing this from a terminal session:

    curl -I http://YOUR_SITE/YOUR_PATH/cache.manifest

    Ben Poole
  3. Perfect timing on this series, I'm about to build my first web app that will work online / offline! I'm also going to be getting into the joy of localStorage, the embedded SQLite database and syncing data when online.

    For development, you could have your main page detect a particular URL parameter (say ?recache) and modify the cache.manifest for you, changing a timestamp or something in it to force a re-fetch.

    I was also contemplating the options for dynamically generating the cache.manifest file: my app is going to be essentially a single-page Javascript app, so I could just scan the page and its stylesheets for any dependent resources and write out a cache.manifest. Although that'll probably be more trouble than it's worth, since the app is going to be fairly simple in terms of required resources.

    Anyway, look forward to more instalments, hopefully solving my problems for me before I encounter them in my development.

    Marcin Szczepanski
  4. Marcin, you could certainly auto-generate the cache manifest; in fact, that should be a high priority for a moderately complex web application, as there’s simply too much scope for user error.

    With regard to forcing an update, that’s a little more contentious. It should be sufficient to let the user agent and server work as designed: the server will send the appropriate HTTP status when the cache has been updated, and the browser should then honour that.

    (By the way, my next post will discuss the use of hashes and querystrings in web apps designed for off-line use: there are some mighty gotchas.)

    Ben Poole


About

I’m a developer / general IT wrangler, specialising in web apps, the mobile web, enterprise Java and the odd Domino system.

Best described as a simpleton, but kindly.