IXP – Mike's Twitter Overflow – Tech, Travel, other stuff

Successful 1st IXLeeds Open Meeting

I attended what was, by all accounts, a very successful first open meeting for the IXLeeds exchange point yesterday. There were around 120 attendees, including many faces that are not regulars on the peering circuit, which made for brilliant networking opportunities. There were great talks from the likes of BDUK (the Government’s super-fast broadband initiative) and energy-efficient processor giants ARM (behind the technology at the heart of most of the world’s smartphones), as well as from more familiar faces such as RIPE NCC and LINX, among others.

Definitely impressed with the frank discussion that followed the talk by the DCMS’ Robert Ling on BDUK funding and framework, but still sceptical that it’s going to be any easier for smaller businesses to successfully get access to the public purse.

Andy Davidson, IXLeeds Director, was able to proudly announce that IXLeeds now provides support for jumbo frames via a separate VLAN overlaid on their switch, which probably makes it the only IXP in the UK to officially offer and promote this service, at least for the time being. Of course, they are supporting a 9k frame size…

Well done to my friends and colleagues of IXLeeds for making it to this major milestone, and doing it in great style. It seems a long, long way from a discussion over some pizza in 2008.

The only thing I didn’t manage to do while in Leeds is take a look at the progress on the next phase of aql’s Salem Church data centre, but I’m sure I’ll just have to ask nicely and drop by aql at some point in the future. 🙂

Please accept new prefixes XYZ behind ASfoo – make it stop!

Those of you who ran networks in the 1990s (possibly even in the early 2000s) will remember the excitement you had joining your first Internet exchange, plugging in that shiny new cable to your router interface, and setting up your first peerings.

Back then, in the rapidly growing Internet of the day, you may also remember that it was common courtesy to let your peers know that you’d taken on a new customer, or acquired some new address space, so they could update their configs. In particular, they needed to update any filters applied to the routes exchanged with you; these filters were often quite small and maintained manually, except at the largest providers.

Your message would go something like this:

Continue reading “Please accept new prefixes XYZ behind ASfoo – make it stop!”

BGP Convergence with Jumbo Frames

This is something of a follow up to Breaking the IXP MTU Egg is no Chicken’s game…

One of the reasons for adoption that was doing the rounds in the wake of Martin Levy’s internet draft on the topic of enabling jumbo frames across shared media IXPs is that using jumbos will help speed up BGP convergence during startup. The rationale here is that session set up and bulk update exchange will happen more quickly over a jumbo capable transport.

Something about this didn’t sit right in my mind; it seemed like a red herring to me. A tail wagging the dog, so to speak. The primary reasons for wanting jumbos are already documented in the draft and discussed elsewhere. If using jumbos gave a performance boost during convergence, then it was a nice bonus, but that flew in the face of my experience of convergence, which is that it’s more likely to be bound by the CPU than by the speed of the data exchange.

I wondered if any research had been done on this, so I had a quick Google to see what was out there.

No formal research on the first page of hits, but some useful (if a few years old) archive material from the Cisco-NSP mailing list, particularly this message…

Spent some time recently trying to tune BGP to get
convergence down as far as possible. Noticed some peculiar
behavior.

I'm running 12.0.28S on GSR12404 PRP-2.

Measuring from when the BGP session first opens, the time to
transmit the full (~128K routes) table from one router to
another, across a jumbo-frame (9000-bytes) GigE link, using
4-port ISE line cards (the routers are about 20 miles apart
over dark fiber).

I noticed that the xmit time decreases from ~ 35 seconds
with a 536-byte MSS to ~ 22 seconds with a 2500-byte MSS.
From there, stays about the same, until I get to 4000,
when it beings increasing dramatically until at 8636 bytes it
takes over 2 minutes.

I had expected that larger frames would decrease the BGP
converence time. Why would the convergence time increase
(and so significantly) as the MSS increases?

Is there some tuning tweak I'm missing here?

Pete.

While not massively scientific, this does seem like a reasonable strawman test of the router architecture of the day (2004), and got this reply from Tony Li:

How are your huge processor buffers set up?

I would not expect a larger MTU/MSS to have much of an
effect, if at all.  BGP is typically not constrained by
throughput.  In fact, what you may be seeing is that
with really large MTUs and without a bigger TCP window,
you're turning TCP into a stop and wait protocol.

Tony

This certainly confirmed my suspicion that BGP convergence performance is not constrained by throughput but by other factors, and primarily by processing power.
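
To make Tony’s stop-and-wait point a little more concrete, here is a rough sketch of how many full-sized segments fit into a single TCP window at the MSS values Pete tried. The 16 KiB window is purely my assumption for illustration (the thread doesn’t say what window was in use), and `segments_in_flight` is a made-up helper, not anything from a router.

```python
# Illustrative only: how many full-size TCP segments fit in one window?
# WINDOW_BYTES is an ASSUMPTION (16 KiB), not a value given in the thread.
WINDOW_BYTES = 16_384


def segments_in_flight(mss: int, window: int = WINDOW_BYTES) -> int:
    """Full MSS-sized segments the sender may have unacknowledged at once."""
    # A sender can always have at least one segment outstanding.
    return max(1, window // mss)


if __name__ == "__main__":
    # MSS values taken from the mailing list message quoted above.
    for mss in (536, 2500, 4000, 8636):
        print(f"MSS {mss:>5}: {segments_in_flight(mss)} segment(s) per window")
```

Under that assumed window, a 536-byte MSS keeps dozens of segments in flight, while an 8636-byte MSS leaves room for just one, so the sender has to wait for an ACK before it can send the next segment. That is the stop-and-wait behaviour Tony describes; whether it fully accounts for Pete’s numbers is another matter.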

Maybe there are some modest gains in convergence time to be had, but there is a danger that loss of a single frame of routing information data (due to some sort of packet loss, maybe a congested link, or a queue drop somewhere in a shallow buffer) could cause retransmits sufficiently damaging to slow reconvergence.

This suggests that performance gains in BGP convergence are marginal and “nice to have”, rather than a compelling argument to deploy jumbos on your shared fabric. The primary arguments are far more convincing, and my opinion is that we shouldn’t allow a fringe benefit such as this (that may even have its downsides) to cloud the main reasoning.

It does seem like some more up-to-date research is necessary to accompany the internet-draft, maybe even considering how other applications (besides BGP) handle packet drops and data retransmits in a jumbo framed environment. Does it reach a point where application performance is being impacted because a big lump of data got retransmitted?

Possibly there is some expertise to be had from the R&E community, who have been using jumbo capable transport for a number of years?

Breaking the IXP MTU Egg is no Chicken’s game

Networks have a thing called a Maximum Transmission Unit (MTU), and on many networks it’s long been somewhere around 1500 bytes, the default MTU of that pervasive protocol, Ethernet.

Why might you want a larger MTU? For a long time the main reason was if you’re transferring very large amounts of data, you reduced the framing and encapsulation overheads. More recent reasons for wanting a larger MTU include being able to accommodate additional encapsulations (such as MPLS/VPLS) in the network without reducing the end-to-end MTU of the service, to carry protocols which by default have a higher MTU (such as FCoE, which defaults to ~2.5k bytes), and make things like iSCSI more efficient.
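
As a back-of-the-envelope illustration of the overhead argument, the sketch below works out what fraction of the bytes on the wire is actual payload for a bulk TCP transfer at a few MTUs. It assumes plain untagged Ethernet and IPv4 plus TCP headers with no options; the constant and function names are mine.

```python
# Rough payload efficiency of a bulk TCP transfer over Ethernet.
# ASSUMPTIONS: untagged Ethernet, IPv4 + TCP with no options.
ETH_OVERHEAD = 38     # preamble/SFD 8 + MAC header 14 + FCS 4 + inter-frame gap 12
IP_TCP_HEADERS = 40   # IPv4 header 20 + TCP header 20


def payload_efficiency(mtu: int) -> float:
    """Fraction of on-the-wire bytes that carry application data."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETH_OVERHEAD)


if __name__ == "__main__":
    for mtu in (1500, 4470, 9000):
        print(f"MTU {mtu:>4}: {payload_efficiency(mtu):.1%} payload")
```

On those assumptions, a 1500-byte MTU comes out at roughly 95% payload and a 9000-byte MTU at roughly 99%, so the saving is real but modest, which is partly why the encapsulation and storage arguments now carry more weight.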

There have often been discussions about whether Ethernet-based Internet Exchange Points – the places where networks meet and interconnect over a shared fabric – have been one of the barriers (there are lots!) to adoption of a higher MTU in the network. Most are based on Ethernet, and most have a standard Ethernet MTU of ~1500 bytes.

Ethernet can carry larger frames, up to around the 9k byte mark in most cases. These are known as “Jumbo Frames”. Here is quite a nice article about the ups and downs of jumbo frame support from the perspective of doing it on your home network.

The inter-provider networks to date where you can usually depend on having a higher end-to-end MTU are the large Research and Educational Networks, such as JANET in the UK. With white-coated (and occasionally bearded) scientists wanting to move huge volumes of experimental data around the world, they’ve long needed and have been getting the benefits of a larger MTU. They deliberately interconnect their networks directly to ensure the MTU isn’t reduced by a third party en route.

This has recently resurfaced in the form of this Internet Draft submitted by my learned friend Martin Levy of Hurricane Electric – yes, he of “Just deploy IPv6” fame.

With this draft Martin is trying to break what is at best a chicken-and-egg, and at worst a deadlock:

Should an IXP support jumbo frames?
What should the maximum frame size be?

He’s trying to support his argument by collecting the various pieces of rationale for supporting inter-provider jumbos in one place, to guide IXP communities in making the right decision, and hopefully documenting the pitfalls and things to watch out for – the worst being that your packets go into the bit-bucket because of an MTU mismatch and PMTUD being broken by accident at best or foolish design at worst.

My own most recent experience of dancing around this particular handbag, with an IXP operator’s hat on, gave the following results:

A minuscule (<5%) proportion of IXP participants wanted to exchange jumbo frames across the IXP
Of those who wanted jumbo frames, it was not possible to reach consensus on a supported maximum frame size. Some wanted 9k, others only wanted 4470, some wanted different 9k MTUs (9218, 9126, 9000), likely due to limitations of their own equipment and networks.

It was actually easier for this minority to interconnect bi-laterally over private pieces of wire or fibre, where they could also set the MTU for that link on a bi-lateral basis, rather than across a shared fabric where everyone had to agree.
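
The consensus problem is easy to sketch. On a shared fabric the only safe MTU is one that everybody can handle, which is the smallest maximum among the participants; on a private link only the two ends need to agree. The network names below are hypothetical, the MTU values are the ones quoted above, and treating each requested value as that network’s maximum is my simplification.

```python
from itertools import combinations

# Hypothetical participants; MTU values are those mentioned in the list above.
# ASSUMPTION: each value is the largest MTU that network can support.
max_mtu = {"net-a": 9218, "net-b": 9126, "net-c": 9000, "net-d": 4470}


def fabric_mtu(limits: dict[str, int]) -> int:
    """Largest MTU usable by everyone on a shared fabric."""
    return min(limits.values())


def bilateral_mtu(limits: dict[str, int], a: str, b: str) -> int:
    """Largest MTU two networks could agree on over a private link."""
    return min(limits[a], limits[b])


def mismatched_pairs(limits: dict[str, int]) -> list[tuple[str, str]]:
    """Pairs that would disagree if each network simply used its own value."""
    return [(a, b) for a, b in combinations(sorted(limits), 2)
            if limits[a] != limits[b]]


if __name__ == "__main__":
    print("Shared fabric MTU:", fabric_mtu(max_mtu))
    print("net-a <-> net-b private link:", bilateral_mtu(max_mtu, "net-a", "net-b"))
    print("Mismatched pairs:", len(mismatched_pairs(max_mtu)))
```

With these four example networks the shared fabric is dragged down to 4470, while the two largest could run 9126 between themselves on a private link, which is roughly the trade-off the jumbo-wanting minority faced.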

Martin’s rationale is that folk argue about this because there’s no well-known guidance on the subject, so his draft is being proposed to provide just that and break the previous deadlock.

In terms of IXPs which do support a larger MTU today, there are a few, the most well-known probably being Sweden’s Netnod, which has long had an MTU of 4470. This is largely due to its own ancestry: it originated on FDDI, and subsequently used Cisco’s proprietary DPT/SRP technology after the exchange outgrew FDDI (largely because of a local preference for maintaining a higher MTU). When Netnod moved to a Gigabit/10 Gigabit Ethernet based exchange fabric, the 4470 MTU was retained despite the newer Ethernet hardware having support for a ~9k MTU, and it’s explicitly required by Netnod that IXP participant interfaces are configured with a 4470 MTU to avoid mismatches. It seems to be working pretty well.

One of the issues which is likely to cause discussion is where 100Mb Ethernet is deployed at an exchange, as this, generally speaking, cannot support jumbo frames. Does this create a “second class” exchange in some way?

Anyway, I applaud Martin for trying to take this slippery subject head-on. Looking forward to seeing where it goes.

Whither (UK) Regional Peering – Pt 2

It’s been a long while since I’ve blogged about this topic.

Probably too long, as IXLeeds, something which inspired me to write Pt 1, is now a fully-fledged IX. It is no longer just a couple of networks plugged into a switch in a co-lo (all IXPs have to start somewhere!): it has formed a company, with directors, and has about 12 active participants connected to its switch. Hurrah!

So, trying to pick up where I left off; in this post, I’m going to talk about shared fate, with respect to Internet Exchanges.

What do I mean by shared fate? Continue reading “Whither (UK) Regional Peering – Pt 2”

15 Years of INEX, me one year on

There were two anniversaries last week. The first was the 15th Birthday of INEX – the Internet Exchange Point in Dublin. To celebrate this, they organised a rather good event at Dublin’s history-steeped Mansion House (the first Dáil sat there in 1919) complete with distinguished speakers such as Dan Kaminsky and Geoff Huston, and a rather good dinner from the adjoining Fire Restaurant.

It was also Arthur’s Day, another excuse to drink copious quantities of the black stuff. Coincidence? You decide…

Dan spoke for over an hour, including Q&A, with no slides, no sheaf of notes, just this interesting stream of consciousness that made you want to sit up and listen.

Some things that Dan said got me thinking, not least the comment that “The world’s social life is being run from Silicon Valley”, and more to the point by a bunch of nerds (e.g. Facebook, G+, etc.), maybe some of the most anthropophobic people you might find! This linked up with some other stuff I’d been reading.

So I thought I’d try and make sense of what was going through my mind. Continue reading “15 Years of INEX, me one year on”