End of the line for buffers?

This could get confusing… (Image: “The End of the Line” by Alan Walker, licensed for reuse under the Creative Commons License.)

buffer (noun)

1. something or someone that helps protect from harm,

2. the protective metal parts at the front and back of a train or at the end of a track, that reduce damage if the train hits something,

3. a chemical that keeps a liquid from becoming more or less acidic

…or in this case, none of the above, because I’m referring to network buffers.

For a long time, larger and larger network buffers have been creeping into network equipment, with many equipment vendors telling us big buffers are necessary (and charging us handsomely for them), and various queue management strategies being developed over the years.

Now, with storage networks and clustering moving from specialised platforms such as fibre channel and infiniband to using ethernet and IP, and the transition to distributed clustering (aka “The cloud”), this wisdom is being challenged, not just by researchers, but operators and even equipment manufacturers.

The fact is that in many cases it is better to let the higher-level protocols in the end stations deal with network congestion. Deep buffering and badly configured queueing in the network attempt to mask the problem, but in doing so they introduce variable delay and confuse the congestion control behaviours built into those protocols.
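To put a rough number on that variable delay, here is a minimal sketch of how long a packet can sit behind a full FIFO buffer. The buffer sizes and link speeds are hypothetical figures picked for illustration, not measurements from any particular device.

```python
def max_queue_delay_ms(buffer_bytes: int, link_bps: float) -> float:
    """Worst-case time (ms) for a packet to drain through a full FIFO buffer."""
    return buffer_bytes * 8 / link_bps * 1000


# Hypothetical figures, for illustration only.
examples = {
    "home router, 256 KiB buffer, 1 Mbit/s uplink": (256 * 1024, 1e6),
    "switch port, 9 MB buffer, 10 Gbit/s link": (9_000_000, 10e9),
}

for name, (buffer_bytes, link_bps) in examples.items():
    delay = max_queue_delay_ms(buffer_bytes, link_bps)
    print(f"{name}: up to {delay:.1f} ms of added delay")
```

The point of the sketch is that the same buffer is harmless on a fast link and painful on a slow one: the added delay scales with buffer size divided by drain rate, and the end stations only find out about the congestion once that queue finally overflows.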

So, it was interesting to find two articles in the same day with this as one of the themes.

Firstly, I was at the UKNOF meeting in London, where one of the talks was from a researcher working on the BufferBloat project, which is approaching the problem from the CPE end – looking at the affordable mass-market home router, or more specifically the software which runs on it and the buffer management therein.

The second thing I came across was a great post on technology blogger Greg Ferro’s Ethereal Mind about his visit to innovative UK ethernet fabric switch startup Gnodal. They are doing some seriously cool stuff which removes buffer bloat from the network (as well as some other original ethernet fabric tech). That is really important for the data centre networking market Gnodal are aiming at, with its sensitivity to latency and jitter.

(Yes, I’m as excited as Greg about what the Gnodal guys are up to, as it’s really breaking the mould, and being developed in the UK, of course I’m likely to be a bit biased!)

Is the writing on the wall for super-deep buffers?

BGP Convergence with Jumbo Frames

This is something of a follow up to Breaking the IXP MTU Egg is no Chicken’s game

One of the reasons for adoption that was doing the rounds in the wake of Martin Levy’s internet draft on the topic of enabling jumbo frames across shared media IXPs is that using jumbos will help speed up BGP convergence during startup. The rationale here is that session set up and bulk update exchange will happen more quickly over a jumbo capable transport.

Something about this didn’t sit right in my mind. It seemed like a red herring to me, a tail wagging the dog, so to speak. The primary reasons for wanting jumbos are already documented in the draft and discussed elsewhere. If using jumbos gave a performance boost during convergence, then it was a nice bonus, but that flew in the face of my experience of convergence – that it’s more likely to be bound by the CPU than by the speed of the data exchange.

I wondered if any research had been done on this, so I had a quick Google to see what was out there.

There was no formal research on the first page of hits, but there was some useful (if a few years old) archive material from the Cisco-NSP mailing list, particularly this message:

Spent some time recently trying to tune BGP to get
convergence down as far as possible. Noticed some peculiar

I'm running 12.0.28S on GSR12404 PRP-2.

Measuring from when the BGP session first opens, the time to
transmit the full (~128K routes) table from one router to
another, across a jumbo-frame (9000-bytes) GigE link, using
4-port ISE line cards (the routers are about 20 miles apart
over dark fiber).

I noticed that the xmit time decreases from ~ 35 seconds
with a 536-byte MSS to ~ 22 seconds with a 2500-byte MSS.
From there, stays about the same, until I get to 4000,
when it beings increasing dramatically until at 8636 bytes it
takes over 2 minutes.

I had expected that larger frames would decrease the BGP
converence time. Why would the convergence time increase
(and so significantly) as the MSS increases?

Is there some tuning tweak I'm missing here?


While not massively scientific, this does seem like a reasonable strawman test of the router architecture of the day (2004), and it got this reply from Tony Li:

How are your huge processor buffers set up?

I would not expect a larger MTU/MSS to have much of an
effect, if at all.  BGP is typically not constrained by
throughput.  In fact, what you may be seeing is that
with really large MTUs and without a bigger TCP window,
you're turning TCP into a stop and wait protocol.


This certainly confirmed my suspicion that BGP convergence performance is not constrained by throughput but by other factors, and primarily by processing power.
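Tony’s stop-and-wait point can be sketched with some simple arithmetic. The MSS values below are the ones from the mailing list post; the 16384-byte TCP window is purely my assumption for illustration, as the original message doesn’t say what window the routers were using.

```python
def segments_in_flight(window_bytes: int, mss_bytes: int) -> int:
    """Whole MSS-sized segments that fit in one TCP window (at least one)."""
    return max(1, window_bytes // mss_bytes)


ASSUMED_WINDOW = 16384  # bytes; an assumption, not stated in the original post

for mss in (536, 2500, 4000, 8636):  # MSS values quoted in the post
    n = segments_in_flight(ASSUMED_WINDOW, mss)
    note = "  <- effectively stop-and-wait" if n == 1 else ""
    print(f"MSS {mss:>4}: {n:>2} segment(s) in flight per round trip{note}")
```

With a window that small, the largest MSS leaves room for just one unacknowledged segment, so the sender has to sit and wait for an ACK after every packet. This is only a model of the windowing effect; it doesn’t try to reproduce the timings in the post.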

Maybe there are some modest gains in convergence time to be had, but there is a danger that loss of a single frame of routing information data (due to some sort of packet loss, maybe a congested link, or a queue drop somewhere in a shallow buffer) could cause retransmits sufficiently damaging to slow reconvergence.

It somewhat indicates that performance gains in BGP convergence are marginal and “nice to have”, rather than a compelling argument to deploy jumbos on your shared fabric. The primary arguments are far more convincing, and my opinion is that we shouldn’t allow a fringe benefit such as this (that may even have its downsides) to cloud the main reasoning.

It does seem like some more up-to-date research is necessary to accompany the internet-draft, maybe even considering how other applications (besides BGP) handle packet drops and data retransmits in a jumbo framed environment. Does it reach a point where application performance is being impacted because a big lump of data got retransmitted?

Possibly there is some expertise to be had from the R&E community, which has been using jumbo capable transport for a number of years?

Couldn’t go with the OpenFlow? Archive is online.

Last week, I missed an event I really wanted to make it to – the OpenFlow Symposium, hosted by PacketPushers and Tech Field Day. Just too much on (like the LONAP AGM), already done a fair bit of travel recently (NANOG, DECIX Customer Forum, RIPE 63), and couldn’t be away from home yet again. I managed to catch less than 30 minutes of the webcast.

From being something which initially seemed to be aimed at academia doing protocol development, lots of people are now talking about OpenFlow as it has attracted some funding, and more interest from folks with practical uses.

I think it’s potentially interesting for either centralised control plane management (almost a bit like a route reflector or a route server for BGP), or for implementing support for new protocols which are really unlikely to make it into silicon any time soon, as well as the originally intended purpose of protocol development and testing against production traffic and hardware.

Good news for folks such as myself is that some of the stream content is now being archived online, so I’m looking forward to catching up with proceedings.

Dell Acquisition Taking Hold at Force 10?

It seems the recent acquisition of Force 10 by Dell is starting to make itself felt, and not only in a change of the logo on the website.

Eagle-eyed followers of the product information on their website will have noticed the complete disappearance of the product information for the chassis-based Zettascale switch, the Z9512, which was announced back in April.

Brocade in a stitch over Exit Strategy?

A couple of days ago, on 18th August, storage and ethernet vendor Brocade released their Q3 2011 results (their Year End is in October).

It roughly followed what was outlined in their Preliminary Q3 numbers (flat revenue year-on-year) released on 5th August, which saw their shares take their biggest hit since IPO, back in the ’99 boom time. Despite showing earlier signs of rallying, their stock is still trading around the $3.40 mark as I write.

It’s no secret that Brocade has been looking for a buyer for a couple of years, since completing the Foundry Networks integration to add ethernet to their existing storage focus. However, in the buyout beauty contest, likely suitor Dell has just passed Brocade up in favour of fluttering its eyelashes at Force10 Networks, which abandoned its own plans to IPO in favour of the Dell acquisition.

Interestingly, it’s not as though Foundry ended up being an indigestible meal for Brocade, as some might have predicted. It seems quite the opposite. The growth in their ethernet business is somewhat offsetting a slowdown in their SAN equipment sales, though it’s unclear if some element of this is “ethernet fabric” displacement in what would be classic fibre-channel space.

So, might Brocade be finding themselves in a stitch over their investors getting their money out? Their Exit Strategy, which seemed to be concentrated on selling on to a large storage/server builder at a profit, doesn’t look like it’s getting much traction anymore. Are there many potential buyers left? What happens if the answer is “no”? How do you keep going?

Here I think lies part of the problem: It seems that the common investor Exit Strategy becomes focused around “doing stuff to make money”, rather than “doing stuff that makes money”.

That has been the Achilles heel I’ve long suspected exists in technology investing: While there might be a good solid idea at the foundation, one often gets the impression that more corporate priority is given to ensuring there’s an Exit Strategy for the investors, and following that road, than to a sound development strategy which ensures the product or service itself drives the success of the company, and not how you sell it. The two find themselves at odds, and can be a source of unwanted friction as well.

In Brocade’s case, I think the current marketing style isn’t doing them any favours. Look at brocade.com. Videos of talking heads, trotting out technobabble, or grinning product managers waxing lyrical while stroking their hardware. “Video data sheet” – now what the hell is that all about? Yet, everyone’s up to it. Animated banners on pages where you actually just want the dry data and cold hard facts. It’s almost like they don’t want you to find the information you’re really looking for. Please can I just have a boring old data sheet?

Maybe in cases like this it’s time to go back to basics: Making something you’re proud of, something you can be happy putting your name to, which is how many great products and brands developed in the past. Question is, have tech companies remembered how, and do investors have the longer-term stomachs for it?

A week for new 40G toys…

It’s been a week for new 40G launches in the Ethernet switch world…

First out of the gate this week has been Arista, with their 7050S-64, 1U switch, with 48 dual-speed 1G/10G SFP ports and four 40G QSFP ports, 1.28Tbps of switching, 960Mpps, 9MB of packet buffer, front-to-back airflow for friendly top-of-rack deployment, etc, etc.

Next to arrive at the party is Cisco, with their Nexus 3064, 1U switch, with 48 dual-speed 1G/10G SFP ports and four 40G QSFP ports, 1.28Tbps of switching, 950Mpps, 9MB of packet buffer, front-to-back airflow for friendly top-of-rack deployment, etc, etc.

Whoa! Anyone else getting deja vu?
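The matching headline numbers are less of a coincidence once you do the sums from the port counts. A quick sketch, assuming the switching figure counts both directions of every port and the packet rate is quoted for minimum-size (64-byte) frames plus the usual 20 bytes of preamble and inter-frame gap. Neither data sheet is quoted here, so treat both as my assumptions.

```python
# Port layout shared by both switches: {speed in Gbit/s: number of ports}
PORTS = {10: 48, 40: 4}

one_way_gbps = sum(speed * count for speed, count in PORTS.items())
full_duplex_tbps = one_way_gbps * 2 / 1000

# Assumed: 64-byte frame + 8-byte preamble + 12-byte inter-frame gap on the wire
WIRE_BYTES_PER_FRAME = 64 + 20
line_rate_mpps = one_way_gbps * 1e9 / (WIRE_BYTES_PER_FRAME * 8) / 1e6

print(f"One-way port capacity: {one_way_gbps} Gbit/s")
print(f"Full-duplex switching: {full_duplex_tbps:.2f} Tbit/s")
print(f"Line rate at 64 bytes: {line_rate_mpps:.0f} Mpps")
```

Under those assumptions you get 1.28Tbps on the nose, and a line-rate packet figure that is close to both of the quoted ones.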


This week’s oxymoron: Ethernet will never dominate in…

…Broadcast TV and Storage. Apparently.

I’ve just read a blog post by Shehzad Merchant of Extreme Networks, about a panel he recently participated in, where one of the other panelists asserted the above was true.

Fascinating that a conference in 2011 is talking about Ethernet not becoming dominant in broadcast TV.

There are several broadcast organisations who are already using large scale 10 Gig Ethernet platforms in places such as their archiving systems and in their playout platforms, and I’m not talking niche broadcasters, but big boys like ESPN. Not sure if any of them are using Extreme’s Purple boxes though.

This unnamed panelist would be better off moving into time-travel, as it seems they are already able to come here from the past and make these assertions.

I do wonder if it’s actually the stick-in-the-mud storage industry which will be slower to move than the broadcasters!

Back at work down the mines for Ethernet Standards Developers…

The ink on the 100GE standard is barely dry, and the first releases of products are only just shipping. “Phew,” thinks the large network operator, “we’re good for another few years.”

Well, among the largest, probably not. They are already faced with needing to aggregate (run in parallel) multiples of 100GE interfaces in their busiest areas. This doesn’t come cheaply, if you consider a single interface – you’re talking about a high five-figure list price minimum for interfaces (Hankins, NANOG 50), potentially more.

Fortunately, having had a little bit of a break, some enlightened folk involved in the 802.3ba standard are getting on the case again.

John D’Ambrosia, who was chair of the 802.3ba Working Group, and whose day job is in the Office of the CTO at Force 10 Networks, is in the process of kicking off an “Ethernet Wireline Bandwidth Needs” assessment activity, under the IEEE Industry Connections banner, to steer the next steps for Ethernet, so it can keep up with what the network is demanding of it.

There’s not much else online about this as yet, the effort is very much new, so I’ll add some links once there’s more information available.

This is a much needed activity, as there were some criticisms during the last iteration of the standards process. Some questioned whether the faster speed was really needed, and there were disagreements about how big the market would be, with estimates that were almost conservative, while at the same time others said it would be too little, too late, at too high a price.

Good to see the new approach being taken, laying solid groundwork for the next (Terabit? Petabit? Something more creative?) run at the standard.