Breaking the IXP MTU Egg is no Chicken’s game

Networks have a thing called a Maximum Transmission Unit (MTU), and on many networks it’s long been somewhere around 1500 bytes, the default MTU of that pervasive protocol, Ethernet.

Why might you want a larger MTU? For a long time the main reason was if you’re transferring very large amounts of data, you reduced the framing and encapsulation overheads. More recent reasons for wanting a larger MTU include being able to accommodate additional encapsulations (such as MPLS/VPLS) in the network without reducing the end-to-end MTU of the service, to carry protocols which by default have a higher MTU (such as FCoE, which defaults to ~2.5k bytes), and make things like iSCSI more efficient.

There’s often been discussions about whether Ethernet-based Internet Exchange Points – the places where networks meet and interconnect over a shared fabric – have been one of the barriers (there are lots!) to adoption of a higher MTU in the network. Most are based on Ethernet, and most have a standard Ethernet MTU of ~1500 bytes.

Ethernet can carry larger frames, up to around the 9k byte mark in most cases. These are known as “Jumbo Frames“. Here is quite a nice article about the ups and downs of jumbo frame support from the perspective of doing it on your home network.

The inter-provider networks to date where you can usually depend on having a higher end-to-end MTU are the large Research and Educational Networks, such as JANET in the UK. With white-coated (and occasionally bearded) scientists wanting to move huge volumes of experimental data around the world, they’ve long needed and have been getting the benefits of a larger MTU. They deliberately interconnect their networks directly to ensure the MTU isn’t reduced by a third-party enroute.

This has recently resurfaced in the form of this Internet Draft submitted by my learned friend Martin Levy of Hurricane Electric – yes, he of “Just deploy IPv6” fame.

With this draft Martin is trying to break what is at best a chicken-and-egg, and at worst a deadlock:

  • Should an IXP support jumbo frames?
  • What should the maximum frame size be?

He’s trying to support his argument by collecting the various pieces of rationale for supporting inter-provider jumbos in one place, to guide IXP communities in making the right decision, and hopefully documenting the pitfalls and things to watch out for – the worst being that your packets go into the bit-bucket because of a MTU mismatch and PMTUD being broken by accident at best or foolish design at worst.

My own personal, most-recent experience dancing around this particular handbag as with an IXP operators hat on, gave the following results:

  • A miniscule (<5%) proportion of IXP participants wanted to exchange jumbo frames across the IXP
  • Of those who wanted jumbo frames, it was not possible to reach consensus on a supported maximum frame size. Some wanted 9k, others only wanted 4470, some wanted different 9k MTUs (9218, 9126, 9000), likely due to limitations of their own equipment and networks.

It was actually easier for this minority to interconnect bi-laterally over private pieces of wire or fibre, where they could also set the MTU for that link on a bi-lateral basis, rather than across a shared fabric where everyone had to agree.

Martin’s rationale is that folk argue about this because there’s no well-known guidance on the subject, so his draft is being proposed to provide just that and break the previous deadlock.

In terms of IXPs which do support a larger MTU today, there are a few, the most well-known probably being Sweden’s Netnod, which has long had an MTU of 4470, largely due to it’s own ancestry of originating on FDDI, and subsequently using Cisco’s proprietary DPT/SRP technology after the exchange outgrew FDDI (largely because of a local preference for maintaining a higher MTU). When Netnod moved to a Gigabit/10 Gigabit Ethernet based exchange fabric, the 4470 MTU was retained despite the newer ethernet hardware having support for a ~9k MTU, and it’s explicitly required by Netnod that IXP participant interfaces are configured with a 4470 MTU to avoid mismatches. It seems to be working pretty well.

One of the issues which is likely to cause discussion is where 100Mb Ethernet is deployed at an exchange, as this, generally speaking, cannot support jumbo frames. Does this create a “second class” exchange in some way?

Anyway, I applaud Martin for trying to take this slippery subject head-on. Looking forward to seeing where it goes.

3 thoughts on “Breaking the IXP MTU Egg is no Chicken’s game”

  1. An industry colleague, Kevin Oberman, sent me this in response to this blog post…

    “Interesting perspective and probably valid. BGP convergence is a real CPU hog and I don’t see any likelihood that it will change dramatically. I discussed this (and other problems facing the Internet) at the Internet2/ESnet Joint Techs meeting in 2008.

    I also wonder about Tony’s suggestion that it might be tied to windowing issues. It would be interesting to run tcptrace over BGP sessions and see what might be happening with the windows.

    Of course, it also could well be a combination of the above and more. These things are seldom as simple as they first appear and this does not even look simple.”

Comments are closed.