Flash the Cache

Some trends in content distribution have made me think back to my early days on the Internet, both at University, and during my time at a dialup ISP, back in the 1990s.

Bandwidth was a precious commodity, and webcaching, by deploying initially Harvest (remember Harvest?) and latterly it’s offspring Squid, became quite popular, to reduce load on external links. There was also the ability to exchange cache content with other topologically close caches – clustering your cache with those on your neighbour networks.

(I remember that at least one of the UK academic institutions – Imperial College, I think, or maybe it was Janet – had a very popular cache that it openly peered with other caches in the UK, was available via peering across LINX, and as a result was popular and well populated.)

There were attendant problems – staleness of cached content could blight some more active websites, and unless you tried enforced redirection of web traffic (unpopular in some quarters, even today, where transparent cacheing is commonplace), the ISP often had to convince your users to opt-in to using the cache through changing browser settings.

It was no surprise that once bandwidth prices fell, caches started to fall out of favour.

However, that trend has been reversing in recent times… the cache is making a comeback, but in a slightly different guise: Rather than general purpose caches that take a copy of anything passing by, these are very specific caches, targeted and optimised for the content, part of the worldwide growth of the distributed CDN.

Akamai have long been putting their servers out into ISP networks, and into major Internet Exchanges, delivering content locally to ISP subscribers. They famously say they “don’t have a backbone” – they just distribute their content through these local clusters. Akamai are delivering a local copy of “Akamaized” web content to local eyes, and continuing to experience significant growth.

Google is also in the caching game, with the “Google Content Cache” (GCC).

I heard a Google speaker at a recent conference explain how a GCC cluster installation at a broadband ISP provided an 85-90% saving in external bandwith to Google hosted content. To some extent this is clearly driven by YouTube content, but it has other smarts too.

So, what’s helped make caching popular again?

Herd Mentality – Social Networking is King. A link is posted and within minutes, it can have significant numbers of page impressions. For a network with a cache, that content only needs to be fetched once.

Bandwidth Demand – It’s not unheard of for large broadband networks to have huge (nx10G) private peerings with organisations such as Google. At some point, this is going to run into scaling difficulties, and it makes sense to distribute the content closer to the sink.

Fault Tolerance – User expectation is “it should just work”, distributed caches can help prevent localised failures from having widescale effect. (Would it have helped BBC this week?)

Response Speed – Placing the content closer to the user minimises latency, improves the user experience. For instance, GCC apparently takes this one step further, acting as a TCP proxy for more interactive Google services such as Gmail – this helps remove “spongyness” of interactive sessions for those in countries with limited or high-latency external connectivity (some African countries, for instance).

So, great, cacheing is useful and has taken it’s place in the Network Architect’s Swiss Army Knife again. But what’s the tipping point for installing something like the GCC or Akamai cluster on your network? There’s two main things at work: Bandwidth and Power.

Having a CDN cluster on your network doesn’t come for free – even if the CDN owns the hardware, you have to house it and power it and cool it. The normal hardware is a number of general purpose high spec 1U rack-mount PCs.

So the economics seem to be a case of factoring in the cost of bandwidth (whether it’s transit or peering), router interfaces, data centre cross-connects, etc., versus the cost of hosting the gear.

One thought on “Flash the Cache”

  1. Nice blog entry. Another thing you didn’t mention: someone investing into bandwidth and global computing power can leverage that infrastructure to offer services beyond simple CDN services. Cloud storage, cloud computation, powerful SaaS solutions in the cloud and so on.

    It makes a lot of sense to investigate the capabilities of CDN vendors beyond their core capabilities and see what they manage to do with their infrastructure.

Comments are closed.

%d bloggers like this: