This is something of a follow-up to Breaking the IXP MTU Egg is no Chicken’s game…
One of the reasons for adoption that did the rounds in the wake of Martin Levy’s internet draft on enabling jumbo frames across shared-media IXPs is that jumbos will help speed up BGP convergence during startup. The rationale is that session set-up and the bulk exchange of updates will happen more quickly over a jumbo-capable transport.
Something about this didn’t sit right with me; it seemed like a red herring, the tail wagging the dog, so to speak. The primary reasons for wanting jumbos are already documented in the draft and discussed elsewhere. If using jumbos gave a performance boost during convergence, that would be a nice bonus, but it flew in the face of my experience of convergence: that it is more likely to be bound by the CPU than by the speed of the data exchange.
I wondered if any research had been done on this, so I had a quick Google to see what was out there.
No formal research on the first page of hits, but some useful (if a few years old) archive material from the Cisco-NSP mailing list, particularly this message…
Spent some time recently trying to tune BGP to get convergence down as far as possible. Noticed some peculiar behavior. I'm running 12.0.28S on GSR12404 PRP-2. Measuring from when the BGP session first opens, the time to transmit the full (~128K routes) table from one router to another, across a jumbo-frame (9000-bytes) GigE link, using 4-port ISE line cards (the routers are about 20 miles apart over dark fiber). I noticed that the xmit time decreases from ~ 35 seconds with a 536-byte MSS to ~ 22 seconds with a 2500-byte MSS. From there, stays about the same, until I get to 4000, when it begins increasing dramatically until at 8636 bytes it takes over 2 minutes. I had expected that larger frames would decrease the BGP convergence time. Why would the convergence time increase (and so significantly) as the MSS increases? Is there some tuning tweak I'm missing here? Pete.
While not massively scientific, this does seem like a reasonable strawman test of the router architecture of the day (2004), and it drew this reply from Tony Li:
How are your huge processor buffers set up? I would not expect a larger MTU/MSS to have much of an effect, if at all. BGP is typically not constrained by throughput. In fact, what you may be seeing is that with really large MTUs and without a bigger TCP window, you're turning TCP into a stop and wait protocol. Tony
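Tony’s stop-and-wait point is easy to see with a back-of-the-envelope sketch. The window size below is purely an assumption for illustration (it is not Pete’s configuration, and neither value comes from the thread); the point is only how the number of in-flight segments collapses as the MSS approaches the window:

```python
# A minimal sketch, assuming a fixed 16 KB TCP window (an arbitrary value
# for illustration, not taken from the thread): raising the MSS shrinks the
# number of segments that can be in flight at once, until only one fits and
# the session degenerates into stop-and-wait, one segment per round trip,
# whatever the link speed.

WINDOW = 16 * 1024  # assumed TCP receive window in bytes

for mss in (536, 1460, 2500, 4000, 8636):
    in_flight = max(WINDOW // mss, 1)  # whole segments the window can hold
    print(f"MSS {mss:5d}: {in_flight:2d} segment(s) in flight, "
          f"{in_flight * mss:5d} bytes per round trip")
```

With those assumed numbers the window holds thirty 536-byte segments but only a single 8636-byte one, so at the top end every segment must be acknowledged before the next can leave, and the transfer time is governed by round trips rather than by link speed.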
This certainly confirmed my suspicion that BGP convergence performance is constrained not by throughput but by other factors, primarily processing power.
Maybe there are some modest gains in convergence time to be had, but there is a danger that the loss of a single frame of routing information (through packet loss on a congested link, or a queue drop in a shallow buffer somewhere) could trigger retransmits damaging enough to slow reconvergence.
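To put some rough, entirely assumed numbers on that worry: TCP cannot hand anything past a lost segment to the BGP process until the retransmission arrives, so the bigger the segment, the bigger the hole and the more routing data sits stalled behind it. The bytes-per-prefix figure below is a loose assumption about a densely packed IPv4 UPDATE, not a measurement:

```python
# Rough illustration only. BYTES_PER_ROUTE is an assumed average for a
# densely packed IPv4 UPDATE (NLRI plus amortised path attributes); it is
# not a measured figure.

BYTES_PER_ROUTE = 5  # assumed average bytes per prefix

for mss in (536, 1500, 9000):
    stalled = mss // BYTES_PER_ROUTE  # prefixes waiting behind the lost segment
    print(f"MSS {mss:5d}: one drop means at least {mss} bytes retransmitted "
          f"and roughly {stalled} prefixes held back until recovery")
```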
This suggests that any performance gains in BGP convergence are marginal and “nice to have”, rather than a compelling argument to deploy jumbos on your shared fabric. The primary arguments are far more convincing, and my opinion is that we shouldn’t allow a fringe benefit such as this (which may even have its downsides) to cloud the main reasoning.
It does seem like some more up-to-date research is needed to accompany the internet draft, perhaps also considering how other applications (besides BGP) handle packet drops and data retransmits in a jumbo-framed environment. Does it reach a point where application performance suffers because a big lump of data had to be retransmitted?
Possibly there is some expertise to be had here from the R&E community, which has been using jumbo-capable transport for a number of years.