Unsensationalist SDN

SDN – Software Defined Networking – has become one of the trendy Silicon Valley buzzwords. Often abused and misused, it seems that all the marketing folks are looking for ways of passing off what they already have on their product roadmaps as “SDN”, while startups are hoping that it will bring in the latest tranche of VC money. Some folk would even have you believe that SDN can cook your toast, make your coffee and iron your shirts.

People are talking about massively centrally orchestrated networks, a move away from the distributed intelligence that is one of the wider Internet’s strong points. Yes, you could have a massive network with a single brain and very centralised policy control, if that’s what you wanted. For some applications, it’s what you need – for instance in Google’s internal networks. But for others you may really need things to be more distributed. The way some people are talking, they make it sound like SDN is an “all or nothing” proposition.

But is there a place for SDN somewhere between distributed and centralised intelligence that people are missing in the hype?

Put all the excitement to one side for a minute, and we’ll go back to basics.

Most large networking equipment today already has a distributed architecture, such that when a packet of data arrives that the receiving interface doesn’t know what to do with, it’s forwarded from the high-speed forwarding silicon to software running on a CPU. In some devices there are multiple CPUs, often arranged in a hierarchy with a “master” CPU in charge of the overall system, and local CPUs delegated to run elements of it, such as logical or physical groups of interfaces, or specific processes in the system.

When the forwarding hardware sends a packet to the CPU to get a decision on what to do with it, this is commonly known as a “CPU punt”.

The software running on the CPU examines the packet based on the configuration of the device, compares it against its internal tables and any configured policy, and makes a decision about what to do with it.

It sends the packet back to the forwarding engine, along with instructions on how to program the forwarding hardware so that it knows what to do with subsequent packets with the same properties (i.e. from the same data stream, or with the same destination address, etc.), whether that be to apply policy, forward them, drop them, etc.
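
As a toy illustration of that loop, here’s a minimal sketch in Python. The table contents and function names are invented for the example – the real thing lives in vendor firmware:

```python
# A toy model of the punt-and-program loop. Real devices do this in
# vendor firmware, not Python; the table contents here are invented.

hardware_table = {}                                 # flow key -> action, in "silicon"
routing_table = {"192.0.2.0/24": "forward:eth1"}    # the CPU's software state

def cpu_punt(flow_key):
    """CPU: decide in software, then program the silicon so subsequent
    packets with the same properties stay on the fast path."""
    action = routing_table.get(flow_key, "drop")
    hardware_table[flow_key] = action               # program the hardware
    return action

def silicon_receive(flow_key):
    """Forwarding silicon: handle the packet itself if it can,
    otherwise punt it to the CPU."""
    if flow_key in hardware_table:
        return hardware_table[flow_key]             # fast path
    return cpu_punt(flow_key)                       # the "CPU punt"

print(silicon_receive("192.0.2.0/24"))   # slow path: punt, then program
print(silicon_receive("192.0.2.0/24"))   # fast path: handled in "silicon"
```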

This isn’t that different from how OpenFlow works, except that the CPU is abstracted out of the device in the data path and resides in a thing known as a “Controller”. Effectively, what were CPU punts now become messages to the OpenFlow Controller. The Controller, by virtue of residing on a more general-purpose computer, is capable of making “special” decisions that would be harder to implement in the software running on a router or switch.
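
To make the analogy concrete, here’s the same toy model with the decision-maker pulled out of the box. It isn’t written against any real controller framework – the message names just mirror OpenFlow’s packet-in/flow-mod exchange:

```python
# The same loop, with the decision-maker pulled out of the box: the
# switch sends "packet-in" messages to a controller, and the controller
# replies with a "flow-mod" that programs the switch's flow table.

class Controller:
    """Runs on a general-purpose computer, so it can make the 'special'
    decisions that on-box software would struggle with."""
    def __init__(self, policy):
        self.policy = policy                        # flow key -> action

    def packet_in(self, flow_key):
        action = self.policy.get(flow_key, "drop")
        return {"match": flow_key, "action": action}    # the "flow-mod"

class Switch:
    def __init__(self, controller):
        self.controller = controller
        self.flow_table = {}

    def receive(self, flow_key):
        if flow_key not in self.flow_table:         # what was a CPU punt...
            flow_mod = self.controller.packet_in(flow_key)
            self.flow_table[flow_mod["match"]] = flow_mod["action"]
        return self.flow_table[flow_key]

sw = Switch(Controller({"web-traffic": "forward:port2"}))
print(sw.receive("web-traffic"))   # first packet: goes to the controller
print(sw.receive("web-traffic"))   # later packets: local flow table hit
```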

Basically, SDN in some respects looks like the next step in something we’ve been doing for years. It’s an evolution; does it need to be a revolution?

So, here’s where I think the current “missed trick” lies…

The forwarding hardware used in SDN doesn’t need to be totally dumb. Some decisions you might be happy for it to make for itself. It could pick up details of what those are from the policy in the Controller, and these can be written to the local CPU and the local forwarding silicon.

But other things you know you do want to “punt” to the Controller can be set up (through applied policy) and handled that way, along the lines of the sketch below.
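
As a rough sketch of what that split might look like, here’s a toy policy table. The traffic classes and actions are made up for illustration:

```python
# A toy model of the hybrid split: the controller pushes down a policy
# in which routine traffic is handled locally and only the interesting
# cases get punted up. Traffic classes and actions are invented.

local_rules = {
    "known-unicast": "forward-locally",      # device decides for itself
    "arp":           "process-locally",
}
punt_rules = {
    "unknown-unicast":          "send-to-controller",
    "first-packet-of-new-flow": "send-to-controller",  # farm out for analysis
}

def classify(traffic_class):
    """Look up a traffic class: local rules first, punts second."""
    if traffic_class in local_rules:
        return local_rules[traffic_class]
    return punt_rules.get(traffic_class, "drop")

print(classify("arp"))               # handled entirely on the device
print(classify("unknown-unicast"))   # punted to the Controller
```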

I can think of occasions in the past where I would have loved to take the stream of CPU punts from a device that can otherwise happily make a lot of its own decisions, farm them out for analysis and processing in ways that weren’t possible on the device itself, and convert the results back into policy, config, and ultimately CAM programming. But to be able to do this without basically lobotomising the device?

Does SDN have to be the “all or nothing” approach that some of the SDN evangelists seem to be proposing? Or is a hybrid approach more realistic and aligned with how we actually build networks?

What might OpenFlow actually open up?

A few weeks ago, I attended the PacketPushers webinar on OpenFlow – a networking technology that, while not seeing widespread adoption as yet, is still creating a buzz on the networking scene.

It certainly busted a lot of myths and misconceptions folks in the audience may have had about OpenFlow, but the big questions it left me with are what OpenFlow stands to open up, and what effect it might have on the many well-established vendors who currently depend on selling “complete” pieces of networking hardware and software – the classic router, switch or firewall as we know it.

If I think back to my annoyances in the early 2000s, the big one was the amount of feature bloat creeping into network devices. We still tended to have a lot of monolithic operating systems in use, so a bug in a feature that wasn’t even enabled could crash the device, because the code would be running regardless. I was annoyed because there was nothing I could do other than apply kludgy workarounds and nag the vendors to ship patched code; I couldn’t decide to rip that bit of code out and replace it with fixed code myself. When the vendors finally shipped fixed code, it meant a reboot to install it. I didn’t much like being so dependent on a vendor: not working for an MCI or UUnet (remember, we’re talking 1999-2001 here; they were the big guys), at times my voice in the “fix this bug” queue would be a little mouse squeak to their lion’s roar, in spite of heading up a high-profile Internet Exchange.

Eventually, we got proper multi-threaded and modular operating systems in networking hardware, but I remember asking for “fast, mostly stupid” network hardware a lot back then. No “boiling the sea”, an oft-heard cliché these days.

The other thing I often wished I could do was have hardware vendor A’s forwarding hardware because it rocked, but use vendor B’s routing engine, because vendor A’s was unstable or feature-incomplete, or vendor B’s just had a better config language or features I wanted/needed.

So, in theory, OpenFlow could stand to enable network builders to do the sorts of things I describe above – allowing “mix-and-match” of “stuff that does what I want”.

This could stand to threaten the established “classic” vendors who have built their business around hardware/software pairings. So, how do they approach this? Fingers-in-ears “la, la, la, we can’t hear you”? Or embrace it?

You should, in theory, and with the right interface/shim/API/magic in your OpenFlow controller, be able to plug in whatever bits you like to run the control protocols and be the “brains” of your OpenFlow network.
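
Hand-waving a little, the “shim” might be nothing more than a common interface that any control-plane “brain” has to implement. This is purely hypothetical – no shipping controller exposes exactly this API:

```python
# A hypothetical plug-in interface for control-plane "brains" in an
# OpenFlow controller. All class and method names are invented.

from abc import ABC, abstractmethod

class ControlPlaneModule(ABC):
    """Anything that can make forwarding decisions: vendor B's routing
    engine, an open-source daemon, or your own custom code."""

    @abstractmethod
    def decide(self, flow_key: str) -> str:
        """Return a forwarding action for the given flow."""

class VendorBRoutingEngine(ControlPlaneModule):
    """Stand-in for a vendor's routing code wrapped as a 'black box'."""
    def decide(self, flow_key: str) -> str:
        return "forward:port1"        # a real module would consult its RIB

class OpenFlowController:
    def __init__(self, brain: ControlPlaneModule):
        self.brain = brain            # mix and match: any module fits here

    def packet_in(self, flow_key: str) -> dict:
        return {"match": flow_key, "action": self.brain.decide(flow_key)}

ctrl = OpenFlowController(VendorBRoutingEngine())
print(ctrl.packet_in("198.51.100.0/24"))
```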

So, using the “I like Vendor A’s hardware, but Vendor B’s foo implementation” example, a lot of people like the feature support and predictability of the routing code from folks such as Cisco and Juniper, but find they have different hardware needs (or a smaller budget), and choose a Brocade box.

Given that so much merchant silicon is going into network gear these days, the software is now the main ingredient in the “secret sauce” – the sort of approach that folks such as Arista are taking.

In the case of a Cisco, the “secret sauce” is their industry-standard routing engine. Are they enlightened enough to develop a version of it which can run in an OpenFlow controller environment? I’m not talking about opening the code up, but shipping it as a “black box” with appropriate magic wrapped around it to make it work with other folks’ controllers and silicon.

Could unpackaging these crown jewels be key to long term survival?

Abstracting BGP from forwarding hardware with OpenFlow?

Interesting hallway discussion at RIPE 63 last week. Olaf Kolkman stopped me in the coffee break and asked if I could think of any way of intercepting BGP data in a switch/router and somehow farming it out to an external control plane to process the BGP update, making the routing decisions external to the forwarding device and then updating the FIB in the switch/router.

I had a think for about 15-20 seconds and asked “What about OpenFlow?”

In a classic switch/router type device, BGP packets are identified in the ingress packet processor and punted up to the control plane of the box. The routing process doesn’t run in the forwarding silicon; it runs in software on the system CPU(s). The routing process evaluates the routing update, changes the RIB accordingly, updates the system FIB, and then sends an internal message to program the forwarding silicon to do the right thing with such packets.

I’m assuming at this stage that you understand the rationale behind wanting to move the routing decisions outside of the forwarding hardware? There are many reasons why you might want to do this: centralised routing decisions (hopefully quicker convergence?), the ability to apply routing policy based on higher-level application needs (for example in cluster environments), running routing decisions on powerful commodity hardware (rather than specialised, expensive networking hardware), running customised routing code to suit local requirements, or, as Dave Meyer helpfully said, it “allows you to do lots of abstractions”.

So, why not try to do this with OpenFlow?

It’s designed to make pattern matches on incoming packets in a switch and, where necessary, punt those packets to an OpenFlow controller for further processing. The OpenFlow controller can also signal back to the switch what to do with further packets with the same properties, effectively programming the forwarding hardware in the switch. That’s not significantly different from how BGP updates are processed now, except that today it all happens inside the box.
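
As a sketch of what that might look like, the BGP punt could be expressed as a pair of OpenFlow-style match rules on TCP port 179. This is a toy representation, not the real OpenFlow wire format:

```python
# BGP punt expressed as OpenFlow-style match rules: anything to or from
# TCP port 179 goes to the controller, where the external routing
# process lives; everything else stays in hardware. Toy representation,
# not real OpenFlow wire format.

BGP_PORT = 179

def build_bgp_punt_rules():
    """Flow entries that redirect BGP sessions to the controller."""
    return [
        {"match": {"ip_proto": "tcp", "tcp_dst": BGP_PORT},
         "action": "send-to-controller"},
        {"match": {"ip_proto": "tcp", "tcp_src": BGP_PORT},
         "action": "send-to-controller"},
    ]

for rule in build_bgp_punt_rules():
    print(rule)
```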

It looks like OpenFlow could be the thing – except that it’s probably a little young/incomplete to do what we’re asking of it at this stage. But it’s a rapidly developing protocol, and folks who are well deployed in the SP arena, such as Brocade, have already said they are roadmapping OpenFlow functionality into their SP product lines.

It seems to me that there are plenty of open source routing daemons out there (Quagga, OpenBGPd, BIRD) which could serve as the outboard routing control plane.

So what seems to be needed is some sort of OpenFlow-controller-to-routing-daemon shim/abstraction layer, so that an existing BGP daemon can communicate with the OpenFlow controller – which appears to be exactly what the QuagFlow project is doing.
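
At its core, such a shim has to translate a route, as the daemon sees it, into something the controller can program into a switch. Here’s a guess at the shape of that translation – QuagFlow’s actual design may well differ, and the MAC-rewrite action is an assumption about how the next hop gets encoded:

```python
# The core job of the shim: translate a route, as the BGP daemon sees
# it, into a flow entry the controller can program into a switch.
# The MAC-rewrite step is an assumption about how the next hop is
# encoded; QuagFlow's real design may differ.

def route_to_flow_entry(prefix, next_hop_mac, out_port):
    """Translate one RIB entry into one flow-table entry."""
    return {
        "match": {"eth_type": "ipv4", "ipv4_dst": prefix},
        "actions": [
            {"set_field": {"eth_dst": next_hop_mac}},  # rewrite to next hop
            {"output": out_port},
        ],
    }

# e.g. a route the daemon learned: 203.0.113.0/24 via a peer on port 3
print(route_to_flow_entry("203.0.113.0/24", "00:11:22:33:44:55", 3))
```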