BGP Convergence with Jumbo Frames

This is something of a follow-up to Breaking the IXP MTU Egg is no Chicken’s game.

One of the arguments for adoption doing the rounds, in the wake of Martin Levy’s Internet Draft on enabling jumbo frames across shared-media IXPs, is that using jumbos will help speed up BGP convergence during startup. The rationale is that session set-up and the bulk exchange of updates will happen more quickly over a jumbo-capable transport.

Something about this didn’t sit right with me; it seemed like a red herring, a tail wagging the dog, so to speak. The primary reasons for wanting jumbos are already documented in the draft and discussed elsewhere. If using jumbos gave a performance boost during convergence, that would be a nice bonus, but it flew in the face of my experience of convergence – that it’s more likely to be bound by CPU than by the speed of the data exchange.

I wondered if any research had been done on this, so I had a quick Google to see what was out there.

No formal research on the first page of hits, but there was some useful (if a few years old) archive material from the Cisco-NSP mailing list, particularly this message:

Spent some time recently trying to tune BGP to get
convergence down as far as possible. Noticed some peculiar
behavior.

I'm running 12.0.28S on GSR12404 PRP-2.

Measuring from when the BGP session first opens, the time to
transmit the full (~128K routes) table from one router to
another, across a jumbo-frame (9000-bytes) GigE link, using
4-port ISE line cards (the routers are about 20 miles apart
over dark fiber).

I noticed that the xmit time decreases from ~ 35 seconds
with a 536-byte MSS to ~ 22 seconds with a 2500-byte MSS.
From there, stays about the same, until I get to 4000,
when it begins increasing dramatically until at 8636 bytes it
takes over 2 minutes.

I had expected that larger frames would decrease the BGP
convergence time. Why would the convergence time increase
(and so significantly) as the MSS increases?

Is there some tuning tweak I'm missing here?

Pete.

While not massively scientific, this does seem like a reasonable strawman test of the router architecture of the day (2004), and it got this reply from Tony Li:

How are your huge processor buffers set up?

I would not expect a larger MTU/MSS to have much of an
effect, if at all.  BGP is typically not constrained by
throughput.  In fact, what you may be seeing is that
with really large MTUs and without a bigger TCP window,
you're turning TCP into a stop and wait protocol.

Tony

This certainly confirmed my suspicion that BGP convergence performance is constrained not by throughput but by other factors, primarily processing power.
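Tony’s stop-and-wait point is worth a quick back-of-envelope illustration. The sketch below is mine, not something from the thread, and every number in it (the window size, the RTT, the table size, the 200ms delayed-ACK timer) is an assumption picked purely to show the shape of the effect: once the TCP window no longer holds a couple of full segments, the session degenerates towards one segment per round trip and the transfer time balloons.

    # Back-of-envelope sketch only: all figures below are assumptions, not
    # measurements from Pete's setup.
    DELAYED_ACK = 0.2          # classic 200 ms delayed-ACK timer (assumption)

    def transfer_time(table_bytes, mss, window, rtt):
        """Rough transfer time for a fixed (non-scaling) TCP window.

        When fewer than two full segments fit in the window, the receiver's
        delayed-ACK timer dominates and the session behaves like stop-and-wait.
        """
        segs_per_window = window // mss
        if segs_per_window >= 2:
            bytes_per_round = segs_per_window * mss
            round_time = rtt
        else:
            bytes_per_round = min(window, mss)       # one lonely segment per round
            round_time = rtt + DELAYED_ACK
        rounds = -(-table_bytes // bytes_per_round)  # ceiling division
        return rounds * round_time

    TABLE = 30 * 1024 * 1024   # ballpark size of a full-table dump (assumption)
    WINDOW = 16 * 1024         # plausible default TCP window of the era (assumption)
    RTT = 0.005                # 5 ms round trip including processing (assumption)

    for mss in (536, 1460, 2500, 4000, 8636):
        print(f"MSS {mss:>5}: roughly {transfer_time(TABLE, mss, WINDOW, RTT):6.1f} s")

The absolute numbers are meaningless, but the cliff once the window stops holding a couple of segments is the same general shape as the behaviour Pete describes.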

Maybe there are some modest gains in convergence time to be had, but there is a danger that the loss of a single frame of routing information (to packet loss on a congested link, or a queue drop in a shallow buffer somewhere) could trigger retransmissions damaging enough to slow reconvergence.

It suggests that any performance gains in BGP convergence are marginal and “nice to have”, rather than a compelling argument to deploy jumbos on your shared fabric. The primary arguments are far more convincing, and in my opinion we shouldn’t allow a fringe benefit such as this (one that may even have its downsides) to cloud the main reasoning.

It does seem that some more up-to-date research is needed to accompany the Internet Draft, maybe even considering how other applications (besides BGP) handle packet drops and data retransmits in a jumbo-framed environment. Does it reach a point where application performance suffers because a big lump of data has to be retransmitted?

There may also be some expertise to be had from the R&E community, which has been using jumbo-capable transport for a number of years.

Ash Mair – Masterchef Pro 2011’s worthy Champion

Lo, the judgely tastebuds have had their say, and Ash Mair was revealed as Professional Masterchef 2011, as many expected.

He also did it in style, with another round of exquisite yet somehow hearty food that showcased the ingredients as much as it showcased his own skill. Merging often delicate Michelin star standard cooking and presentation with a plate of serious substance seems to be an elusive skill, yet Ash manages to pull it off time and time again.

I’m told that look on his face isn’t angst, it’s concentration and grim determination to pull it off.

But, I think it was a close call, as all three finalists pulled out all the stops to make an amazing three courses.

I loved the look of Claire’s smoked pigeon – getting the cooking and smokiness just right must have taken some serious timing and judgement. It’s something I’d have happily ordered in a restaurant. Great-looking chocolate and coffee marquise for dessert as well; that would have tied with Ash’s “Spanish pain perdu”. Even though choc and cherries are a classic combo, Claire is clearly clueful when it comes to what works with pastry – just think of that lime cheesecake and bitter chocolate sorbet she did! Sadly, I wouldn’t have touched the oysters – they just aren’t my cup of tea, unadventurous prole that I am.

Steve’s starter of confit salmon was right up my street, and the duck with braised chicory got my mouth watering. But his dessert – an abstracted peach melba – looked like a lot on the plate, a bit too cluttered.

If I’d have walked into a restaurant and been presented with a menu composed of the dishes from all three finalists, I’d have had a tough time choosing – especially for main course!

Ash has been commenting on Twitter about the whirlwind he now finds swirling around him, and it’s not just the howling south-westerlies we’ve had earlier this week: he was on BBC Breakfast with Michel Roux Jr this morning, and I don’t know if it was just his laid-back Aussie style, but he still seemed almost stunned!

While Ash is the worthy Champion, the other two are still winners: Hopefully Steve is now on the road to his ambition of a small country house hotel with fantastic food, and as for Claire, I think the world’s her oyster. Just as long as I don’t have to eat any.

Update – 24th Jan 2012:

A lot of the searches which hit this page are from people wondering where to find Ash so they can go and eat his food. From this recent tweet, it seems he’s off to Barcelona to consult for a restaurant there, so it may be a while yet. There is a Basque restaurant opening up in London, but so far Ash doesn’t seem to be associated with it.

The other frequent search term landing here is for Claire’s chocolate sorbet. Sadly, it looks like that recipe is a secret known only to Claire, and now Michel Roux Jr. But, a selection of Claire’s recipes from Masterchef (including the rather good looking chocolate moelleux) can be found on the BBC Food recipe database, along with selected dishes from the other finalists.

Masterchef – The Professionals 2011: Everyone’s a winner…

Another diversion from the usual tech and travel diet, to something which might mean you need to go on a diet…

It’s the big final tonight of Masterchef – The Professionals, and we have three deserving finalists, angsty Aussie Ash Mair who battled on in an heroic-stylee despite getting hot fat splashed in the eye to produce an amazing main course last night, “Spiky” Steve Barringer making delightful desserts from disaster-zone-looking messy workspaces, and clever Brummie Claire Hutchings, who has made some amazing food during the series with brave flavour combinations, all the more astonishing considering she was 22 at the time they made the programme.

So, who’s going to nail it tonight?

Despite wanting Claire to win, my money is now on Ash. He’s consistent in so many ways – quality of the cooking, the high standard of presentation, and after a shaky start being criticised over weak or bland flavours, he’s learned something, now packing a real punch with his seasoning and sauces. He’s even consistent at looking totally embattled and under siege, yet still manages to bring it all together and plate up on time.

While Claire’s scallop sashimi for the chefs’ table last night was definitely brave and innovative, it wasn’t a plate I’d have wanted to eat. It felt like a step too far. Sometimes simple is good, less is more. It just might have been her undoing. It’s a shame, because throughout the series, Claire has just “got it”, time and time again.

But, they have all learned something along the way – Ash with flavours, Claire with preparation, timing and organisation during plating up, and Steve with keeping it simple and keeping everything cleaned down – which is why they are here in the final.

While there can only be one champion, all three are winners, and deserve to go on to great things.

The final of Masterchef: The Professionals 2011 is on BBC Two tonight at 8pm.

Rural DIY Broadband: B4RN Launches

A few months ago, I’d blogged about B4RN, a community-led rural ultra-fast broadband project in my home county of Lancashire.

Today, they are holding a launch event in Lancaster to signify that they have reached their target number of interested parties who have committed to sign up for the service, and to announce they will be issuing shares in the organisation. I know the folks from ThinkBroadband are at the launch today, so expect to see some reporting from them shortly.

It’s heartening to look at this sea of raised hands from the community meeting – so many people putting their faith in their own community’s ability to organise and do this for themselves, rather than waiting for a centrally funded project that might not help them.

This is great news. I’d said before that DIY was the most realistic option for some of these regional communities. Fantastic stuff.

Breaking the IXP MTU Egg is no Chicken’s game

Networks have a thing called a Maximum Transmission Unit (MTU), and on many networks it’s long been somewhere around 1500 bytes, the default MTU of that pervasive protocol, Ethernet.

Why might you want a larger MTU? For a long time the main reason was that, if you were transferring very large amounts of data, larger frames reduced the framing and encapsulation overheads. More recent reasons include accommodating additional encapsulations (such as MPLS/VPLS) in the network without reducing the end-to-end MTU of the service, carrying protocols which default to a higher MTU (such as FCoE, which defaults to ~2.5k bytes), and making things like iSCSI more efficient.
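As a rough illustration of the overhead argument, here’s a quick calculation (my own, assuming plain TCP over IPv4 over Ethernet with no options and no extra encapsulation) of how much of the wire is spent on payload at a few common MTUs:

    # Rough illustration: payload efficiency of full-sized frames at various MTUs,
    # assuming plain TCP/IPv4 over Ethernet with no options or extra encapsulation.
    ETH_OVERHEAD = 7 + 1 + 14 + 4 + 12   # preamble+SFD, header, FCS, inter-frame gap
    IP_TCP_HEADERS = 20 + 20             # IPv4 header + TCP header, no options

    def payload_efficiency(mtu):
        payload = mtu - IP_TCP_HEADERS
        wire_bytes = mtu + ETH_OVERHEAD
        return payload / wire_bytes

    for mtu in (1500, 4470, 9000):
        print(f"MTU {mtu:>5}: {payload_efficiency(mtu) * 100:5.2f}% of the wire is payload")

Roughly 95% payload at 1500 bytes versus about 99% at 9000: a real saving, though not a dramatic one.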

There have often been discussions about whether Internet Exchange Points – the places where networks meet and interconnect over a shared fabric – are one of the barriers (there are lots!) to adoption of a higher MTU in the network. Most are based on Ethernet, and most have a standard Ethernet MTU of ~1500 bytes.

Ethernet can carry larger frames, up to around the 9k-byte mark in most cases. These are known as “Jumbo Frames”. Here is quite a nice article about the ups and downs of jumbo frame support, from the perspective of doing it on your home network.

The inter-provider networks where you can usually depend on having a higher end-to-end MTU today are the large Research and Educational networks, such as JANET in the UK. With white-coated (and occasionally bearded) scientists wanting to move huge volumes of experimental data around the world, they’ve long needed a larger MTU and have been getting the benefits of one. They deliberately interconnect their networks directly to ensure the MTU isn’t reduced by a third party en route.

This has recently resurfaced in the form of this Internet Draft submitted by my learned friend Martin Levy of Hurricane Electric – yes, he of “Just deploy IPv6” fame.

With this draft Martin is trying to break what is at best a chicken-and-egg, and at worst a deadlock:

  • Should an IXP support jumbo frames?
  • What should the maximum frame size be?

He’s trying to support his argument by collecting the various pieces of rationale for inter-provider jumbos in one place, to guide IXP communities in making the right decision, and hopefully to document the pitfalls and things to watch out for – the worst being that your packets go into the bit-bucket because of an MTU mismatch, with PMTUD broken by accident at best or by foolish design at worst.
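For what it’s worth, the classic way to spot that kind of black hole from a host is to probe the path with the DF bit set and see where large packets start disappearing. A rough sketch (assuming a Linux host with iputils ping, a peer that answers ICMP echo at all, and using 192.0.2.1 purely as a placeholder address) might look like this:

    # Rough sketch: binary-search the largest DF-marked ICMP payload that still
    # gets a reply, to estimate the usable path MTU towards a peer.
    # Assumes Linux iputils ping ("-M do" forbids fragmentation, "-s" sets payload).
    import subprocess

    def ping_df(host, payload_bytes):
        """True if a single don't-fragment ping of this payload size gets a reply."""
        result = subprocess.run(
            ["ping", "-M", "do", "-c", "1", "-W", "1", "-s", str(payload_bytes), host],
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
        )
        return result.returncode == 0

    def probe_path_mtu(host, lo=1200, hi=9000):
        """Largest working payload, plus 28 bytes of IP+ICMP header."""
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if ping_df(host, mid):
                lo = mid
            else:
                hi = mid - 1
        return lo + 28   # 20-byte IPv4 header + 8-byte ICMP header

    print("Path MTU towards peer:", probe_path_mtu("192.0.2.1"))  # placeholder address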

My own most recent experience of dancing around this particular handbag, with an IXP operator’s hat on, gave the following results:

  • A minuscule proportion (<5%) of IXP participants wanted to exchange jumbo frames across the IXP
  • Of those who wanted jumbo frames, it was not possible to reach consensus on a supported maximum frame size. Some wanted 9k, others only wanted 4470, some wanted different 9k MTUs (9218, 9126, 9000), likely due to limitations of their own equipment and networks.

It was actually easier for this minority to interconnect bilaterally over private pieces of wire or fibre, where they could also agree the MTU for that link between themselves, rather than across a shared fabric where everyone had to agree.

Martin’s rationale is that folk argue about this because there’s no well-known guidance on the subject, so his draft is being proposed to provide just that and break the previous deadlock.

In terms of IXPs which do support a larger MTU today, there are a few, the most well-known probably being Sweden’s Netnod, which has long had an MTU of 4470. That is largely down to its own ancestry: the exchange originated on FDDI and subsequently used Cisco’s proprietary DPT/SRP technology after it outgrew FDDI (largely because of a local preference for maintaining a higher MTU). When Netnod moved to a Gigabit/10 Gigabit Ethernet fabric, the 4470 MTU was retained despite the newer Ethernet hardware supporting a ~9k MTU, and Netnod explicitly requires participant interfaces to be configured with a 4470 MTU to avoid mismatches. It seems to be working pretty well.

One of the issues which is likely to cause discussion is where 100Mb Ethernet is deployed at an exchange, as this, generally speaking, cannot support jumbo frames. Does this create a “second class” exchange in some way?

Anyway, I applaud Martin for trying to take this slippery subject head-on. Looking forward to seeing where it goes.

The Asshole Effect

Sort of a follow-up to my post on “How to Reset a Broken Culture?”, I was recently directed to this great blog post on the “Asshole Effect” – which promotes the maxim that, basically, really successful companies don’t employ assholes.

It brought a smile to my face. Particularly the multiplication effect: once you’ve hired one idiot, you’ll end up with more, so you want to avoid hiring any in the first place if at all possible.

So maybe this is the way to fix a broken culture? Systematically purge the organisation of the people who are dragging it down, starting at the top.

I guess the question here is whether your already battered brand can withstand the short-term unrest while you purge the undesirables from your employ.

It’s an interesting theory. Would love to see it in practice.

Comcast Residential IPv6 Deployment Pilot

Comcast, long active in the IPv6 arena, have announced that they will be doing a native residential IPv6 deployment in Pleasanton, CA, on the edge of the San Francisco Bay Area. It will be a dual-stacked, native v4/v6 deployment with no NAT.

This is a much needed move to try and break the deadlock that seems to have been holding back wide scale v6 deployment in mass market broadband providers. Apart from isolated islands of activity such as XS4ALL‘s pioneering work in the Netherlands, v6 deployment has largely been available only as an option from providers focused on the tech savvy user (such as A&A in the UK).

Sure, it’s a limited trial, and initially aimed at single devices only (i.e. one device connected directly to the cable modem), but it’s a start, and there are plans to expand it as experience is gained.

Read these good blog articles from Comcast’s John Brzozowski and Jason Livingood about the deployment and its aims.

The Return of “Scary Monica”

I don’t normally blog on things like TV programmes, but this week marked a highlight in the Autumn TV calendar for me: The return of Masterchef: The Professionals. Maybe it’s because I love good food. But maybe it’s because it appeals to my sense of schadenfreude.

The format is different to “vanilla” Masterchef: the eager amateur cooks are replaced by earnest chefs, ready to take their cooking up a gear; and while cuddly Gregg Wallace and his sweet tooth still front up the show, co-judge John Torode is replaced by Michelin-starred Michel Roux Jr with his classical French cooking, perfectionist presentation, demanding palate and seemingly boundless enthusiasm for good food – you just watch the smile on his face as he plates up a demonstration dish.

Just what *are* you doing to that octopus?

However, if cooking for a member of the Roux kitchen dynasty isn’t enough to make you want to raise your game, Michel Jr has a (not so secret) weapon up his sleeve – his fearsome sous chef, Monica Galetti, who seems to have a reputation for perfection and ruling the kitchens of Le Gavroche with her amazing set of facial expressions. One look from Monica, and you know whether you’ve got it right, or whether you’re in serious trouble and need to start bailing.

It’s right there on Monica’s face. The expressions say it all, you know almost exactly what she’s thinking.

I’ve never seen anyone quite have the same effect on men hardened by working in a commercial kitchen. Cooking for Monica seems to reduce the most competent of people to timid, quivering, shaking wrecks quicker than you can reduce a red wine jus on full gas. They are quaking in their boots before they even pick a knife up.

One test is that they make the chefs perform a 10-15 minute technical challenge, set by Monica, to demonstrate certain basic kitchen skills and the ability to work under time pressure, e.g. make an Italian meringue, decorate these desserts with spun sugar, make a crab salad using only meat from inside the shell, make a steak tartare, that sort of thing. To increase the pressure further, Monica demonstrates to camera first and makes it look effortless, then the chefs are brought in one-by-one to complete the challenge, receiving Gregg and Monica’s undivided attention. They are often shaking so much that I’m amazed no-one has sliced their fingers off yet.

Monica surely can’t be all scary, though? The good news is that the widened eyes, cutting critique and looks of incredulity as the hapless massacre yet another innocent scallop are rapidly replaced by warm smiles and compliments all round when there are shows of genuine kitchen prowess.

But, if you want to see grown men, some with tattooed forearms, cry, look no further.

Masterchef: The Professionals is on BBC Two Monday-Thursday evenings for the next few weeks – times vary from day to day.

Abstracting BGP from forwarding hardware with OpenFlow?

Interesting hallway discussion at RIPE 63 last week. Olaf Kolkman stopped me in the coffee break and asked if I could think of any way of intercepting BGP data in a switch/router and somehow farming it out to an external control plane to process the BGP updates, making the routing decisions outside the forwarding device and then updating the FIB in the switch/router.

I had a think for about 15-20 seconds and asked “What about OpenFlow?”

In a classic switch/router type device, BGP packets are identified in the ingress packet processor and punted up to the control plane of the box. The routing process doesn’t run in the forwarding silicon; it runs in software on the system CPU(s). The routing process evaluates the routing update, changes the RIB accordingly, updates the system FIB, and then sends an internal message to program the forwarding silicon to do the right thing with the packets.

I’m assuming at this stage that you understand the rationale behind wanting to move the routing decisions outside of the forwarding hardware? There are many reasons why you might want to do this: centralised routing decisions being one (hopefully quicker convergence?), the ability to apply routing policy based on higher-level application needs (for example in cluster environments), running routing decisions on powerful commodity hardware (rather than specialised, expensive networking hardware), running customised routing code to suit local requirements, or, as Dave Meyer helpfully said, because it “Allows you do lots of abstractions”.

So, why not try and do this with OpenFlow?

OpenFlow is designed to make pattern matches on incoming packets in a switch and, where necessary, punt those packets to an OpenFlow controller for further processing. The controller can also signal back to the switch what to do with further packets with the same properties, effectively programming the forwarding hardware in the switch. That’s not significantly different from how BGP updates are processed now, except that today it all happens inside the box.
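To give a flavour of how little is needed for the “punt BGP to the controller” half of this, here’s a rough sketch of a controller app. It uses the Python-based Ryu framework and OpenFlow 1.3 purely as an illustration of the technique (it isn’t anything that came out of the hallway discussion), and it simply installs flow entries that match TCP port 179 and send those packets to the controller:

    # Illustrative sketch using the Ryu OpenFlow controller framework (an assumption,
    # not part of the original discussion): punt all BGP (TCP/179) traffic to the
    # controller, where an external routing process could consume it.
    from ryu.base import app_manager
    from ryu.controller import ofp_event
    from ryu.controller.handler import CONFIG_DISPATCHER, set_ev_cls
    from ryu.ofproto import ofproto_v1_3

    class BgpPunt(app_manager.RyuApp):
        OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

        @set_ev_cls(ofp_event.EventOFPSwitchFeatures, CONFIG_DISPATCHER)
        def on_switch_ready(self, ev):
            dp = ev.msg.datapath
            ofp, parser = dp.ofproto, dp.ofproto_parser
            # One flow entry for each direction of the BGP session.
            for field in ("tcp_src", "tcp_dst"):
                match = parser.OFPMatch(eth_type=0x0800, ip_proto=6, **{field: 179})
                actions = [parser.OFPActionOutput(ofp.OFPP_CONTROLLER,
                                                  ofp.OFPCML_NO_BUFFER)]
                inst = [parser.OFPInstructionActions(ofp.OFPIT_APPLY_ACTIONS, actions)]
                dp.send_msg(parser.OFPFlowMod(datapath=dp, priority=100,
                                              match=match, instructions=inst))

The other half of the job – handing the received updates to a routing process and programming the resulting FIB back down – is where the real work lies.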

It looks like OpenFlow could be the thing – except it’s probably a little young/incomplete to do what we’re asking of it at this stage. But it’s a rapidly developing protocol, and vendors who are well deployed in the SP arena, such as Brocade, have already said that OpenFlow functionality is on the roadmap for their SP product lines.

It seems to me that there are plenty of open source routing daemons out there (Quagga, OpenBGPd, BIRD) which could serve as the outboard routing control plane.

So what seems to be needed is some sort of OpenFlow-controller-to-routing-daemon shim/abstraction layer, so that an existing BGP daemon can talk to the OpenFlow controller, which seems to be what the QuagFlow project is doing.
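Conceptually, the shim’s job is just to translate each best path the daemon selects into a flow entry the controller can push. A hypothetical sketch of that mapping (the structures and names here are made up for illustration; they are not QuagFlow’s actual interface):

    # Hypothetical sketch of the daemon-to-OpenFlow mapping such a shim would do.
    # The route and flow structures are illustrative only, not QuagFlow's API.
    from dataclasses import dataclass

    @dataclass
    class Route:
        prefix: str        # e.g. "192.0.2.0/24", as selected by the BGP daemon
        next_hop_mac: str  # resolved via ARP/ND for the chosen next hop
        out_port: int      # switch port towards that next hop

    def route_to_flow(route):
        """Turn a best path into an abstract flow entry for the controller to push."""
        prefix_len = int(route.prefix.split("/")[1])
        return {
            "match":    {"eth_type": 0x0800, "ipv4_dst": route.prefix},
            "actions":  [("set_eth_dst", route.next_hop_mac),
                         ("output", route.out_port)],
            "priority": 100 + prefix_len,   # crude longest-prefix-match ordering
        }

    print(route_to_flow(Route("192.0.2.0/24", "00:11:22:33:44:55", 3)))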

Whither (UK) Regional Peering – Pt 3

Anyone still using C7513s?

At the end of the last post, I vaguely threatened that at some point I’d go on to discuss IX Participant Connectivity.

The topic got a “bump” to the front of the queue last week, thanks to a presentation from Kurtis Lindqvist, CEO of Sweden’s Netnod exchange points, given at the RIPE 63 meeting in Vienna.

Netnod have been facing a dilemma. They provide a highly resilient peering service for Sweden, consisting of multiple discrete exchanges in various Swedish cities, the biggest being in Stockholm, where they operate two physically separate, redundant exchanges. They currently provide all their participants in Stockholm with the facility to connect to both fabrics, so they can benefit from the redundancy this provides. Sounds great, doesn’t it? If one platform in Stockholm goes down, the other is up and traffic keeps flowing.