At the end of the last post, I vaguely threatened that at some point I’d go on to discuss IX Participant Connectivity.
Netnod have been facing a dilemma. They provide a highly resilient peering service for Sweden, consisting of multiple discrete exchanges in various Swedish cities, with the biggest being in Stockholm – where they operate two physically separate, redundant exchanges. They currently provide all their participants in Stockholm with the facility to connect to both fabrics, so they can benefit from the redundancy this provides. Sounds great, doesn’t it? If one platform in Stockholm goes down, the other is up, and traffic keeps flowing.
The same applies if IXP peer X’s router on platform A goes down: if they have a different router attached to platform B, BGP fails over and traffic keeps moving. Nothing to see here.
Hopefully, with another in-country, local peering available, this failover and reconvergence is a quick affair, and does not cause serious disruption to traffic or push too much traffic to transit, which comes with its own issues of potentially higher costs and suboptimal paths.
However, Netnod are wondering if the ISPs really care about all this redundancy. They may even be finding themselves under pressure from their members who “don’t care” about the redundancy, as they get it “elsewhere” (e.g. transit, peering at other IXPs). Should they consider continuing to provide it “as standard”? They seem to think there’s a 50/50 split in the Netnod membership as it stands.
From previous experience I know for a fact that in the UK there are a number of ISPs for whom the second LAN at LINX formed a critical part of their business continuity. Despite mounting pressure from some folk who felt the second LAN didn’t serve a purpose (for them at least), there seemed to be some decent reasons to continue to provide a redundant alternative, beyond doing it just to be altruistic. Those who appreciated the second LAN tended to be UK-based or UK-centric networks, who needed the second LAN to maintain good routing to networks similar to themselves.
If money (and rack space, and power) were no object, the ideal might be to have a separate BGP-speaking peering router for every IXP port your network has. Nice and redundant. Utopia in some ways. Well, at least until you have to upgrade all your IOS/JunOS images in a hurry.
In the real world, you’re more likely to have a couple of peering routers in each major city you peer in, and you’ll connect your IXP ports to those:
Occasionally, there may be more than one IXP in the same building, but the extra IXP doesn’t justify an extra router, so you hook that IXP up to those peering routers as well:
That’s not a bad situation either. This scenario commonly exists in places such as London, where both LINX and LONAP are present in the same co-location facility. You’ve still got some level of redundancy.
But, if you’re strapped for cash, or the city isn’t particularly big for you in terms of traffic, you might be tempted to do something like this:
Here, you have one peering router, and it’s doing all the work. That’s now a single point of failure (SPOF) for your peering in this city, regardless of the high availability features your choice of router has.
But the SPOF doesn’t end there. Let’s imagine that we connect this single router directly to a few more exchange points, in other cities, using some readily available long-haul Ethernet service (e.g. an Ethernet-over-MPLS VLL):
Here you can see this one router is now connected to 7 different IXPs, spread across four different cities.
Even if this network’s peers think they are peering redundantly in multiple cities (maybe in multiple countries) with the ISP in question, they aren’t. A failure in the ISP’s peering router (Peering Router 1) could take down all peerings in all cities that router is connected to.
Because the ISP is using the same long-haul Ethernet carrier to reach cities B, C and D, an outage in that carrier might take the ISP’s peering down in B, C and D at once. It’s yet another SPOF to worry about. Arguably, the peers are also being misled, to some extent, into believing that they have redundant paths to the ISP in question.
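To make the shared-failure point concrete, here is a minimal Python sketch that lists, for each component, which IXP ports and cities go dark if that component alone fails. The layout is an assumption made up for illustration: the text only says seven IXPs across four cities, one router, and one carrier for B, C and D, so the per-city split, the IXP labels and the names `router1` and `carrier1` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical layout: (IXP port, city, components that port depends on).
# The 3/1/1/2 split across cities A-D is an assumption, not from the diagram.
PORTS = [
    ("IXP-A1", "A", {"router1"}),
    ("IXP-A2", "A", {"router1"}),
    ("IXP-A3", "A", {"router1"}),
    ("IXP-B1", "B", {"router1", "carrier1"}),
    ("IXP-C1", "C", {"router1", "carrier1"}),
    ("IXP-D1", "D", {"router1", "carrier1"}),
    ("IXP-D2", "D", {"router1", "carrier1"}),
]


def blast_radius(ports):
    """Map each component to the set of IXP ports lost if it fails alone."""
    lost = defaultdict(set)
    for ixp, _city, deps in ports:
        for dep in deps:
            lost[dep].add(ixp)
    return dict(lost)


def cities_dark(ports, failed):
    """Return the cities left with no working IXP port once `failed` is down."""
    all_cities = {city for _ixp, city, _deps in ports}
    alive = {city for _ixp, city, deps in ports if failed not in deps}
    return all_cities - alive


if __name__ == "__main__":
    for component, ixps in sorted(blast_radius(PORTS).items()):
        dark = sorted(cities_dark(PORTS, component))
        print(f"{component}: {len(ixps)} ports lost, cities dark: {dark}")
```

In this toy model the router is a dependency of every port, so no amount of extra IXP ports changes the outcome of it failing; only adding a second router or a second carrier shrinks the sets returned.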
It’s fairly obvious that the ISP who has done this doesn’t seem to be placing much priority on redundancy in their peering design. Maybe they are just trying to hit as many IXPs and peers as possible while spending as little as they possibly can? Perhaps they want to look bigger than they really are? Maybe they just don’t care?
This also makes a mockery of any redundancy that the IXP is trying to provide. Say, for instance, that the two IXPs in City D are Stockholm’s Netnod: this is Kurtis’ dilemma. It’s something that the IXP operations community have been concerned about for some time. I applaud Kurtis for trying to approach it. That in itself is a tough subject, as it’s not about biting the hand that feeds the members to the IXPs, and it’s not about throwing stones at your participants’ connectivity choices. It’s about trying to build a better network.
Don’t get me wrong here. I am not saying that using long-haul Ethernet transport to reach exchanges is a bad thing. Long-haul Ethernet transport plays an important part in the peering ecosystem.
The healthy peering environment we enjoy in many of the European exchanges is partly thanks to how readily a router can be connected to a remote exchange. It has enabled networks who could previously only afford to peer very locally (in their main country or city of operation) to reach new peers while removing the fixed costs usually associated with international peering (international circuits, co-location costs, router costs, tail circuits, etc.) from the equation. It provides a low-risk way for a network to “test drive” the peering community in a remote exchange, possibly as a prelude to building physical infrastructure into the exchange itself.
Long-reach Ethernet is almost certainly a good means of connecting to exchanges in secondary markets, which don’t support enough traffic, peers, customers or growth potential to justify a full network build.
The downside is that some networks continue to deploy long-reach Ethernet transport to reach exchanges in their primary markets, moving significant (>10G) traffic volumes, in non-redundant configurations, rather than doing the right thing and deploying equipment into the locality. It’s almost asking for trouble, and when something goes wrong, this is what happens: we’ve seen routers fail to converge smoothly (or at all) because they are so heavily peered with the same neighbour ASes, over and over again, that the BGP process gets bogged down. That one router falling over has the ability to be highly disruptive in multiple geographically diverse locations.
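One rough way to spot a router that is peered "over and over again" with the same neighbours is to count its BGP sessions per neighbour AS. The sketch below does that in Python over a hand-written session list; the IXP labels and the session data are hypothetical, and the AS numbers are taken from the range reserved for documentation (64496 to 64511), so none of this describes a real network.

```python
from collections import Counter

# Hypothetical sessions terminating on ONE peering router: (IXP, neighbour ASN).
SESSIONS = [
    ("IXP-A1", 64496), ("IXP-A2", 64496), ("IXP-B1", 64496), ("IXP-D1", 64496),
    ("IXP-A1", 64497), ("IXP-D2", 64497),
    ("IXP-C1", 64498),
]


def repeated_neighbours(sessions, threshold=2):
    """Return {asn: session_count} for ASes seen at least `threshold` times."""
    counts = Counter(asn for _ixp, asn in sessions)
    return {asn: n for asn, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    for asn, n in sorted(repeated_neighbours(SESSIONS).items()):
        print(f"AS{asn}: {n} sessions on the same router")
```

A high count for one AS means that neighbour may believe it has several independent paths to you, while every one of them ends on the same box and has to reconverge at the same moment.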
This sort of deployment also has similar effects on regional peering exchanges: While the long-haul Ethernet service might make it easier to reach a growing regional IXP from your existing equipment, it doesn’t help reach the diversity and redundancy goals that the regional IXP is aiming to provide. What’s the point of having an IXP switch in Manchester, if the majority of routers attached to it are actually in London?
So, coming back to the core topic of regional interconnect: Multiple options for in-country peering are a good thing. But, for a regional IXP to flourish, it seems that it needs to have a core of locally-based participants, with significant network deployed in the locality, so that the IXP actually serves its purpose in terms of providing a geographically diverse alternative.