Was the LINX hit by an attack yesterday?

The short answer is “No”.

There has been speculation in the press, such as this Computer Weekly article, but I would say it’s poorly informed. It even suggests that LINX’s pioneering deployment of Juniper’s PTX MPLS core switch might be a factor, which I think is a red herring.

It looks to have been some sort of storm of flooded traffic (such as unknown unicast or broadcast), or a problem in a network attached to LINX, which managed either to congest various ISPs’ access lines into LINX or to overload the CPU on some of the attached routers, to the extent that they became unable to forward customer traffic or to maintain accurate routing information (i.e. they lost control plane integrity).

But why did it appear to start on one of the two LINX peering platforms (the Extreme-based network) and then cascade to the physically separate Juniper-based LAN?

I think one of the main reasons is that lots of ISP routers are connected to both LANs, as are the routers operated by the likely “problem” network which originated the flood of traffic in the first place. I’ve written before on this blog about why having a small number of routers connected to a larger number of internet exchanges can be a bad idea.
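To make that shared-fate argument concrete, here is a toy sketch in Python. It is not a model of what actually happened at LINX: the router names, the packet rates and the idea of a single "CPU budget" per router are all invented for illustration. It only shows that a storm on one LAN can put sessions on the other LAN at risk whenever the same overloaded router sits on both.

```python
from dataclasses import dataclass


@dataclass
class Router:
    name: str
    lans: set            # exchange LANs this router has a port on
    cpu_budget_pps: int  # hypothetical: flooded packets/s the control plane can absorb


def lans_losing_sessions(routers, storm_lan, storm_pps):
    """Return {lan: [router names]} for peering sessions put at risk by a storm."""
    affected = {}
    for r in routers:
        # Only routers with a port on the storm LAN receive the flood.
        if storm_lan in r.lans and storm_pps > r.cpu_budget_pps:
            # An overloaded control plane hurts sessions on every LAN the
            # router is attached to, not just the LAN carrying the storm.
            for lan in r.lans:
                affected.setdefault(lan, []).append(r.name)
    return affected


# Made-up members: two dual-connected routers, two single-connected ones.
routers = [
    Router("isp-a", {"extreme", "juniper"}, 50_000),
    Router("isp-b", {"extreme"}, 50_000),
    Router("isp-c", {"juniper"}, 50_000),
    Router("isp-d", {"extreme", "juniper"}, 500_000),
]

result = lans_losing_sessions(routers, storm_lan="extreme", storm_pps=200_000)
for lan in sorted(result):
    print(lan, result[lan])
```

In this made-up scenario the storm only exists on the "extreme" LAN, yet "isp-a" also appears under "juniper" because its one busy router serves both. "isp-c", which is only on the other LAN, and "isp-d", which has CPU headroom to spare, are untouched.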

I’m pressed for time (about to get on a plane), so I’ll quickly sum up with some informed speculation:

I don’t think…

  • The LINX was DDoS-ed (or specifically attacked)
  • The deployment of the Juniper PTX in the preceding 24 hours had anything to do with it (LINX seem to agree, as they switched a further PTX into service overnight last night)
  • There was any intentional action which caused this; some sort of failure or bug is more likely

I do think…

  • A LINX-attached network had a technical problem which wasn’t isolated and caused a traffic storm
  • It initially affected the Extreme-based platform
  • It affected the CPU of LINX-connected routers belonging to LINX members
  • Some LINX members deliberately disconnected themselves from LINX at the time to protect their own platform
  • The reported loss of peer connectivity on the Juniper platform was “collateral damage” from the initial incident, for reasons I’ve outlined above – busy routers
  • LINX did the right thing continuing their PTX deployment

I’m sure there will be more details forthcoming from LINX in due course. Their staff are trained not to speculate, nor to talk to the press, during an incident. Even those who handle press enquiries are very careful not to speculate or sensationalise, which I’m sure disappoints those looking for a story.

The moral of this story is that redundancy and diversity are important elements of good network engineering, and you shouldn’t be putting all your eggs in one basket.

Disclaimer: I used to work for LINX, and I like to think I’ve got more than half a clue when it comes to how peering and interconnect work.