Abstracting BGP from forwarding hardware with OpenFlow?

Interesting hallway discussion at RIPE 63 last week. Olaf Kolkman stopped me in the coffee break and asked if I could think of any way of intercepting BGP data in a switch/router, and somehow farming if out to an external control plane to process the BGP update, making the routing decisions external from the forwarding device and then updating the FIB in switch/router.

I had a think for about 15-20 seconds and asked “What about OpenFlow?”

In a classic switch/router type device, BGP packets are identified in the ingress packet processor and punted up to the control plane of the box. The routing process doesn’t run in the forwarding silicon, it’s running in software in the system CPU(s). The routing process evaluates the routing update, makes changes to the RIB in accordance, updates the system FIB, and then sends an internal message to program the forwarding silicon to do the right thing with the packet.

I’m assuming at this stage, that you understand the rationale behind wanting to move the routing decisions outside of the forwarding hardware? There are many reasons why you might want to do this: centralised routing decisions being one (hopefully quicker convergence?), the ability to apply routing policy based on higher level application needs (for example in cluster environments), to run routing decisions in powerful commodity hardware (rather than specialised, expensive, networking hardware), to run customised routing code to suit local requirements, or as Dave Meyer helpfully said, “Allows you do lots of abstractions”.

So, why not try and do this with Openflow?

It’s designed to make pattern matches on incoming packets in a switch and where necessary punt these packets to an OpenFlow controller for further processing. The OpenFlow controller can also signal back to the switch on what to do with further packets with the same properties, effectively programming the forwarding hardware in the switch. Not significantly different with how BGP updates are processed now, except it’s all happening inside the box.

It looks like OpenFlow could be the thing – except it’s probably a little young/incomplete to do what we’re asking of it at this stage – but it’s a rapidly developing protocol, and we’ve got folks who are well deployed in the SP arena that have already said they are roadmapping to build OpenFlow functionality into their SP product lines, such as Brocade.

It seems to me that there are plenty of open source routing daemons out there (Quagga, OpenBGPd, BIRD) which could serve as the outboard routing control plane.

So what seems to be needed is some sort of OpenFlow controller to routing daemon shim/abstraction layer, so that an existing BGP daemon can communicate with the OpenFlow controller, which seems to be what the QuagFlow project is doing.

More specifics driving traffic to transit?

Interesting talk at RIPE 63 in Vienna today from Fredy Kuenzler of the Swiss network Init7 – How more specifics increase your transit bill (video transcript).

It proposes that although you may peer with directly with a network, any more specific prefixes or deaggregated routes which they announce to their upstream transit provider will eventually reach you, and circumvent the direct peering. If this forces traffic to your transit provider, it costs you money per meg, rather than it being covered in your (usually flat) cost of peering.

Of course, if it’s the one transit provider in the middle, they are getting to double-dip – being paid twice (once on each side) for the same traffic! Nice if you can get it!

So, the question is, how to find these more specific routes mark them as unattractive and not install them in your Forwarding Table, preferring the peered route, and saving you money.

Geoff Huston suggests he could provide a feed or a list of the duplicate more specific routes, crunching this sort of thing is something he’s been doing for ages with BGP routing data, such as the CIDR Report.

But the question remains how to take these routes and either a) keep them in the table, but deprefer the more specific which breaks a fundamental piece of decision making in BGP processing, or b) filter them out entirely, without affecting redundancy if the direct peering fails for any reason.

I started out being too simplistic, but hmm… having a think about this one…