Abstracting BGP from forwarding hardware with OpenFlow?

Interesting hallway discussion at RIPE 63 last week. Olaf Kolkman stopped me in the coffee break and asked if I could think of any way of intercepting BGP data in a switch/router and somehow farming it out to an external control plane to process the BGP update, making the routing decisions external to the forwarding device and then updating the FIB in the switch/router.

I had a think for about 15-20 seconds and asked “What about OpenFlow?”

In a classic switch/router type device, BGP packets are identified in the ingress packet processor and punted up to the control plane of the box. The routing process doesn’t run in the forwarding silicon; it runs in software on the system CPU(s). The routing process evaluates the routing update, changes the RIB accordingly, updates the system FIB, and then sends an internal message to program the forwarding silicon to do the right thing with packets for the affected prefixes.

I’m assuming, at this stage, that you understand the rationale behind wanting to move the routing decisions outside of the forwarding hardware? There are many reasons why you might want to do this: centralised routing decisions (hopefully with quicker convergence?), the ability to apply routing policy based on higher level application needs (for example in cluster environments), running routing decisions on powerful commodity hardware (rather than specialised, expensive networking hardware), running customised routing code to suit local requirements, or as Dave Meyer helpfully said, “Allows you do lots of abstractions”.

So, why not try and do this with OpenFlow?

It’s designed to make pattern matches on incoming packets in a switch and, where necessary, punt these packets to an OpenFlow controller for further processing. The OpenFlow controller can also signal back to the switch what to do with further packets with the same properties, effectively programming the forwarding hardware in the switch. That’s not significantly different from how BGP updates are processed now, except that today it all happens inside the box.
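To make the match-and-punt idea concrete, here is a toy sketch in Python. It is not the OpenFlow wire protocol or any real controller API: the `Packet`, `FlowEntry` and `Switch` classes, the action strings and the default "drop" on a table miss are all made up for illustration. The only real-world fact it leans on is that BGP runs over TCP port 179.

```python
from dataclasses import dataclass, field

BGP_PORT = 179  # well-known TCP port used by BGP
TCP = 6         # IP protocol number for TCP


@dataclass(frozen=True)
class Packet:
    """Hypothetical, heavily simplified packet header."""
    ip_proto: int
    src_port: int
    dst_port: int
    dst_ip: str


@dataclass
class FlowEntry:
    """Hypothetical flow entry: exact-match fields plus one action."""
    match: dict          # header field name -> required value
    action: str          # "controller" or "output:<port>"
    priority: int = 0

    def matches(self, pkt):
        return all(getattr(pkt, name) == value
                   for name, value in self.match.items())


@dataclass
class Switch:
    """Toy switch with a single priority-ordered flow table."""
    table: list = field(default_factory=list)

    def install(self, entry):
        # Keep the highest-priority entries first.
        self.table.append(entry)
        self.table.sort(key=lambda e: e.priority, reverse=True)

    def handle(self, pkt):
        for entry in self.table:
            if entry.matches(pkt):
                return entry.action
        return "drop"  # assumed table-miss behaviour for this sketch


sw = Switch()
# Punt both directions of a BGP session to the external control plane.
sw.install(FlowEntry({"ip_proto": TCP, "dst_port": BGP_PORT}, "controller", 100))
sw.install(FlowEntry({"ip_proto": TCP, "src_port": BGP_PORT}, "controller", 100))
# A forwarding rule the controller might push back after a routing decision.
sw.install(FlowEntry({"dst_ip": "192.0.2.10"}, "output:3", 10))

print(sw.handle(Packet(TCP, 40000, BGP_PORT, "192.0.2.1")))
print(sw.handle(Packet(TCP, 40000, 80, "192.0.2.10")))
```

The higher priority on the BGP entries means session traffic goes to the controller even when a forwarding rule would also match, which is roughly the job the ingress packet processor does inside a conventional router.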

It looks like OpenFlow could be the thing. It’s probably a little young/incomplete to do what we’re asking of it at this stage, but it’s a rapidly developing protocol, and vendors with a strong presence in the SP arena, such as Brocade, have already said they are roadmapping OpenFlow functionality into their SP product lines.

It seems to me that there are plenty of open source routing daemons out there (Quagga, OpenBGPd, BIRD) which could serve as the outboard routing control plane.

So what seems to be needed is some sort of OpenFlow controller to routing daemon shim/abstraction layer, so that an existing BGP daemon can communicate with the OpenFlow controller, which seems to be what the QuagFlow project is doing.

Whither (UK) Regional Peering – Pt 3

Anyone still using C7513s?

At the end of the last post, I vaguely threatened that at some point I’d go on to discuss IX Participant Connectivity.

The topic got a “bump” to the front of the queue last week, thanks to a presentation from Kurtis Lindqvist, CEO of Sweden’s Netnod exchange points, given at the RIPE 63 meeting in Vienna.

Netnod have been facing a dilemma. They provide a highly resilient peering service for Sweden, consisting of multiple discrete exchanges in various Swedish cities, with the biggest being in Stockholm, where they operate two physically separate, redundant exchanges. They currently provide all their participants in Stockholm with the facility to connect to both fabrics, so they can benefit from the redundancy this provides. Sounds great, doesn’t it? If one platform in Stockholm goes down, the other is up, and traffic keeps flowing.

Just let IPv4 run out. It’s over. Just get on with it.

So, I’m currently at the RIPE 63 meeting in Vienna. Obviously, one of the ongoing hot topics here is IPv4 depletion, which tends to split into discussion of either a) the transition away from IPv4 to IPv6 via various transition mechanisms, or b) how to make the pitiful amount of IPv4 addressing that’s left last as long as possible.

One of the things that is often said about (b) is that it shouldn’t be done to death: IPv4 should just be allowed to run out, and we should get over it and deploy IPv6. However, (b)-type behaviour is to be expected when dealing with exhaustion of a finite resource.

There are similarities and parallels to be drawn between IPv4 runout and IPv6 adoption on the one hand, and fossil fuel depletion and the move to alternative energy technologies on the other. The early adopters and the laggards. The hoarders and speculators. The evangelists and the naysayers.

So, for a minute don’t think about oil and gas resources being depleted, that’s way in the future. We’re facing one of the first examples of exhaustion of a finite resource on which businesses and economies depend.

If the IPv4 depletion and IPv6 (slow) adoption situation is a dry run of what might actually happen when something like oil runs out, then we should be worried, because we can’t just rely on carrier grade NAT to save us.

More specifics driving traffic to transit?

Interesting talk at RIPE 63 in Vienna today from Fredy Kuenzler of the Swiss network Init7 – How more specifics increase your transit bill (video transcript).

It argues that although you may peer directly with a network, any more specific prefixes or deaggregated routes which they announce to their upstream transit provider will eventually reach you and circumvent the direct peering. If this forces traffic to your transit provider, it costs you money per meg, rather than it being covered in your (usually flat) cost of peering.
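The mechanism at work is longest-prefix match: the most specific matching route wins at forwarding time, whatever preference you gave the peered aggregate. A minimal sketch using Python’s standard `ipaddress` module shows the effect. The prefixes are documentation ranges, and the `best_route` helper and the "peer"/"transit" labels are invented for this example, not taken from the talk.

```python
import ipaddress


def best_route(routes, destination):
    """Return the next-hop label of the longest matching prefix, or None.

    routes is a list of (prefix, label) pairs, a stand-in for a real
    forwarding table.
    """
    dest = ipaddress.ip_address(destination)
    candidates = []
    for prefix, label in routes:
        net = ipaddress.ip_network(prefix)
        if dest in net:
            candidates.append((net.prefixlen, label))
    if not candidates:
        return None
    # Longest prefix wins, regardless of where the route was learned.
    return max(candidates)[1]


routes = [
    ("203.0.113.0/24", "peer"),       # aggregate learned over direct peering
    ("203.0.113.128/25", "transit"),  # more specific leaked via transit
]

print(best_route(routes, "203.0.113.5"))    # only the /24 covers this address
print(best_route(routes, "203.0.113.200"))  # the /25 also covers this one
```

Traffic for the upper half of the /24 follows the /25 out through transit, even though a perfectly good peered route covers the same addresses.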

Of course, if it’s the one transit provider in the middle, they are getting to double-dip – being paid twice (once on each side) for the same traffic! Nice if you can get it!

So, the question is how to find these more specific routes, mark them as unattractive and keep them out of your forwarding table, so that the peered route is preferred and you save money.

Geoff Huston suggests he could provide a feed or a list of the duplicate more specific routes; crunching this sort of thing is something he’s been doing for ages with BGP routing data, such as in the CIDR Report.

But the question remains how to take these routes and either a) keep them in the table but deprefer the more specific, which breaks a fundamental piece of decision making in BGP processing, or b) filter them out entirely, without affecting redundancy if the direct peering fails for any reason.
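Finding the candidates is the easy half. As a rough sketch, assuming routes are available as simple records with a prefix, an origin AS and a note of where they were learned (a made-up format, not any router’s or feed’s actual output), the transit-learned more specifics that sit inside a peer-learned prefix from the same origin can be picked out like this:

```python
import ipaddress


def covered_more_specifics(routes):
    """List transit-learned prefixes strictly inside a peer-learned prefix
    with the same origin AS.

    Each route is a dict with hypothetical keys "prefix", "origin_as"
    and "via" ("peer" or "transit").
    """
    peer_routes = [
        (ipaddress.ip_network(r["prefix"]), r["origin_as"])
        for r in routes if r["via"] == "peer"
    ]
    found = []
    for r in routes:
        if r["via"] != "transit":
            continue
        net = ipaddress.ip_network(r["prefix"])
        for peer_net, peer_as in peer_routes:
            if (peer_as == r["origin_as"]
                    and net.version == peer_net.version
                    and net != peer_net
                    and net.subnet_of(peer_net)):  # needs Python 3.7+
                found.append(r["prefix"])
                break
    return found


routes = [
    {"prefix": "203.0.113.0/24", "origin_as": 64500, "via": "peer"},
    {"prefix": "203.0.113.128/25", "origin_as": 64500, "via": "transit"},
    {"prefix": "203.0.113.0/24", "origin_as": 64500, "via": "transit"},
    {"prefix": "198.51.100.0/24", "origin_as": 64501, "via": "transit"},
]

print(covered_more_specifics(routes))
```

This only produces the list. It does nothing about the hard part, which is what happens to those destinations if the filter is in place and the direct peering goes away.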

I started out being too simplistic, but hmm… having a think about this one…