IX Scotland – Why might it work this time?

Yesterday the BBC ran this news item about the launch of a new Internet Exchange in Edinburgh – IX Scotland. This is the latest in an emerging trend of local IXPs developing in the UK, such as IX Leeds and IX Manchester.

There has been some suggestion that this is the first Internet Exchange in Scotland; however, those people have short memories. There have been two (or three, depending on how you look at it) attempts at getting a working IXP in Edinburgh in the past 15 years, all of which ultimately failed.

So, why should IX Scotland be any different from its predecessors? Continue reading “IX Scotland – Why might it work this time?”

My recent talk at INEX – Video

Or, I never thought of myself as a narcissist but…

Thanks to the folks at HEAnet, here’s a link to the video of the talk “It’s peering, Jim…” that I gave at the recent INEX meeting in Dublin, where I discuss topics such as changes in the US peering community thanks to Open-IX and try to untangle what people mean when they say “Regional Peering”.

The talk lasts around 20-25 minutes and I was really pleased to get around 15 minutes of questions at the end of it.

I also provide some fairly pragmatic advice to those seeking to start an IX in Northern Ireland during the questions. 🙂


What’s meant by Regional Peering and the case for Peering NI

Last week, I was over in Dublin having been invited to give a talk by my gracious hosts at the Irish Internet Exchange Point, INEX. I asked what sort of thing they might like me to talk about. We agreed that I’d talk about various trends in global peering, mainly because the INEX meeting audience don’t do massive amounts of peering outside of the island of Ireland.

(If you need to understand the difference between the UK, Great Britain, Northern Ireland, the Republic of Ireland and the island of Ireland, this video will be a massive help. Thanks CGP Grey.)


One of the discussions we had was about what is meant by “Regional” when talking about Internet Exchange Points. In the UK, we generally mean exchanges which are outside of London, such as IX Leeds. When a “Regional IXP” is discussed in Africa, it actually means a “super-national” IXP which possibly interconnects several countries across a region.

Why do the communities in these areas want IXPs that span national boundaries?

The main reason: latency.

There is a lot of suboptimal routing. Traffic being exchanged between adjacent countries on the same continent can end up making a long “trombone-shaped” trip to Europe and back. This has a negative effect on the user experience and on the local internet economy.

Round-trip times from RIPE Atlas probes in Southern African countries to a destination in South Africa

As you can see above, traffic from the test probes in Kenya and Angola, along with the Maldives and the Seychelles, is likely being routed to Europe for interconnection rather than being handled more locally, if the round-trip time is an indication of the route taken. The probes in Botswana, Zambia and Tanzania do somewhat better, and are definitely staying on the same continent. The African example is one of the obvious ones. Let’s look at something a bit closer to home…

Regional peering within Northern Ireland, and between Northern Ireland and the Republic of Ireland

There is already a well-established exchange point in Dublin, INEX, with a good number of national and international members. Discussions are taking place between Internet companies in Northern Ireland (which, remember, is part of the UK) about their need for a more local place to exchange traffic, likely in Belfast. The current belief is that a large amount of the traffic between sources and sinks in Northern Ireland goes via London or Amsterdam.

Firstly, how does traffic get between Great Britain (and by inference, most of the rest of Europe) and Northern Ireland? This is what Telegeography say:

Submarine Cables UK to NI
RIPE Atlas Probes in Northern Ireland

So, I thought I’d do some RIPE Atlas measurements.

This isn’t meant to be an exhaustive analysis. More just exploring some existing theories and perceptions.

The first trick is to identify probes in Northern Ireland. From the RIPE PoV, these are all indicated as part of the UK (go and watch the video again if you didn’t get it the first time), so I can’t select them by country.

Fortunately, probe owners have to set their probe’s location. There is a certain amount of trust placed in them – there’s nothing stopping me saying my probe is somewhere else – but most probe owners are responsible techy types. The RIPE Atlas people also put the probe locations onto a coverage map.

I also needed some targets. Probes can’t ping each other (well, they can, if you know their IP address and they’re not behind some NAT or firewall). The Atlas project provides a number of targets, known as “anchors”, and nodes in the NLnog ring can also act as targets. There’s an Atlas anchor in Dublin, but it couldn’t take any more measurements, so it wasn’t suitable as a target; however, HEAnet (the Irish R&E network) and Amazon (yep, the folks that sell books and whatnot) have NLnog ring nodes in Dublin.

We also needed targets in Northern Ireland that seemed to answer ICMP relatively unmolested, and I chose DNS servers at Tibus and Atlas/Bytel, both of whom are ISPs in the North. The final things to add were “controls”, so I chose a friend’s NLnog ring box which I know is hosted in London, and two other UK-based Atlas probes, the one I have on my network at home, and one on Paul Thornton’s network in Sussex. These effectively provided known UK-Ireland and UK-NI latencies to the targets, and a known NI-London latency for the probes in NI.
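Since Northern Irish probes can’t be selected by country code, the practical approach is to pick them out by position. Here’s a minimal sketch of that idea, assuming a list of probe records with `latitude`/`longitude` fields – the real RIPE Atlas probe data is richer than this, and the bounding box values are purely illustrative:

```python
# Sketch: select probes in Northern Ireland by bounding box, since from
# the RIPE point of view they are all simply "UK" probes. Probe records
# and coordinates here are hypothetical, for illustration only.

# Rough bounding box around Northern Ireland (illustrative values)
NI_BOX = {"lat_min": 54.0, "lat_max": 55.4, "lon_min": -8.2, "lon_max": -5.4}

def in_ni(probe):
    """Return True if a probe's reported position falls inside the box."""
    return (NI_BOX["lat_min"] <= probe["latitude"] <= NI_BOX["lat_max"]
            and NI_BOX["lon_min"] <= probe["longitude"] <= NI_BOX["lon_max"])

probes = [
    {"id": 1, "latitude": 54.6, "longitude": -5.9},   # Belfast-ish
    {"id": 2, "latitude": 51.5, "longitude": -0.1},   # London
    {"id": 3, "latitude": 54.9, "longitude": -7.3},   # Derry-ish
]

ni_probes = [p["id"] for p in probes if in_ni(p)]   # → [1, 3]
```

Of course, this relies entirely on probe owners setting their locations honestly, as noted above.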

So, let’s look at round-trip time from Northern Ireland to the NLnog ring node in London:

ICMP RTT NI Probes to nuqe.net NLnog ring server

So, we can see there are some variations, no doubt based on last-mile access technology. In particular, the node shown here with the 54ms RTT (just north of Belfast) consistently scored a high RTT to all test destinations. Anyway, this gives us an idea of NI–London RTT; the fastest is 15ms.

We can therefore make a reasonable assumption that if traffic were to go from Belfast to London and back to Ireland again, a 30ms RTT would be the best one could expect.

(For the interested, the two “control” test probes in the UK had latencies of 5ms and 8ms to the London target.)

Now, take a look at the RTT from Northern Ireland to the node at HEAnet in Dublin:

ICMP RTT all NI probes to HEAnet NLnog ring node, Dublin

Only two of the probes in Northern Ireland have <10ms RTTs to the target in Dublin. All other probes have a greater RTT.

It is not unreasonable to assume, given that some have a >30ms RTT to Dublin, or show an RTT to Dublin more than 15ms above their RTT to London, that this traffic is routing via London.
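The reasoning above can be boiled down to a crude classifier: a best-case 15ms NI–London RTT implies that a London “trombone” adds around 30ms to the round trip, so any probe at or above that, or showing a large gap between its Dublin and London RTTs, is probably not staying local. A sketch, with illustrative figures rather than the actual measurement data:

```python
# Sketch of the trombone heuristic: if a probe's RTT to Dublin exceeds
# its RTT to London by a wide margin, or tops the ~30ms round trip a
# London detour implies, the traffic is probably not staying local.
# RTT figures below are illustrative, not the real measurement results.

LONDON_TROMBONE_MS = 30   # best-case NI -> London -> Ireland round trip

def likely_via_london(rtt_dublin_ms, rtt_london_ms):
    """Crude classifier: does the Dublin RTT suggest routing via London?"""
    return (rtt_dublin_ms >= LONDON_TROMBONE_MS
            or rtt_dublin_ms - rtt_london_ms > 15)

samples = {
    "probe_a": (8, 16),    # sub-10ms to Dublin: direct-looking
    "probe_b": (34, 15),   # >30ms to Dublin: almost certainly via London
}

verdicts = {name: likely_via_london(dub, lon)
            for name, (dub, lon) in samples.items()}
```

It’s only a heuristic – RTT can’t prove a path – but it matches the traceroute-style intuition used throughout these measurements.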

Of the two probes which show a <10ms RTT to HEAnet in Dublin, their upstream networks (AS43599 and AS31641) are directly connected to INEX.

Of the others, some of the host ASNs are connected to INEX, but the RTT suggests an indirect route, possibly via the UK mainland.

The tests were also run against another target in Dublin, on the Amazon network, and show broadly similar results:

ICMP RTT all NI probes to Amazon NLnog ring node, Dublin

Again, the same two probes show <10ms RTT to Dublin. All others show >30ms. Doesn’t seem to matter if you’re a commercial or an academic network.

Finally, let’s look at round-trip times within Northern Ireland.

Here’s the test to a nameserver on the Tibus network:

ICMP RTT all NI Probes to Tibus Nameserver

Again, the same two probes report sub-10ms latency. I’d surmise that they are either routing via INEX, downstream of the same transit provider in Belfast, or privately interconnected in Belfast. At least two of the other nodes seem to route via the UK mainland.

To check this result, the same tests performed toward a nameserver on the Atlas/Bytel network:

ICMP RTT all NI probes to Atlas/Bytel Nameserver

Obviously, one of our probes is on-net, with a 1ms RTT!

Of the others, we’re definitely looking at “trombone routing” of the traffic, in most cases back to the UK mainland.

This may not be entirely surprising, as I’m told that BT don’t provide a 21CN interconnect node in Northern Ireland, so traffic on BT wholesale access products will “trombone” through the mainland in any case.

So, what’s really needed in Northern Ireland?

We’ve shown that if networks are willing to buy capacity to Dublin, they can happily exchange traffic at INEX and keep the latency down. An obvious concern some may have, especially for NI-to-NI traffic, is the export of traffic from one jurisdiction to another, particularly in light of recent revelations about systemic monitoring.

The utility of an IX in Northern Ireland could be hampered by the lack of BT 21CN interconnect capability: for all intents and purposes, it may as well be in Glasgow, the nearest interconnect, because traffic will still make two trips across the Irish Sea if one end or the other is on a BT wholesale pipe. (At worst, it could be four trips if both ends are!)

If the goal is to foster internet growth (e.g. “production” of bandwidth) in Northern Ireland, where is it going to come from?

Are Northern Irish interests better served by connecting to the mature interconnect community in Dublin?

Is a BT 21CN interconnect in Belfast essential for growth, or can NI operators build around it?

Should INEX put a switch in Belfast? If they do, should it be backhauled to the larger community in Dublin? Or is that somehow overstepping the remit of an exchange point? 

AMS-IX: Green Light to Incorporate US entity

Members of the Dutch Amsterdam Internet Exchange have given the organisation a green light to incorporate a US entity in order to engage with the Open IX initiative and have the ability to run an exchange in the US while minimising risk to the Dutch association and the Dutch operating company.

This completes the announcements from the big 3 European exchanges (LINX, AMS-IX and DECIX) to operate interconnection services in the US, with the first to make an overt move being LINX, who are in the process of establishing an operation in Northern Virginia. DECIX issued a press release last week that they plan to enter the New York market, and now AMS-IX have a member endorsement to make a move.

There have been concerns amongst the Dutch technical community, who have long held AMS-IX in high regard, that establishing operations in the US will leave AMS-IX as a whole vulnerable to the sort of systemic monitoring that has been revealed in the press in past weeks. While this is partly why the AMS-IX company proposed a separate legal entity, in order to hold the US operations at arm’s length, is it enough for some of the Dutch community? It seems not. In this message the Dutch R&E network SURFnet argue that the whole thing was rushed and might not be in the best interests of the community, and say they voted against the move.

It has been noted that members of the Open IX community, including members of the Open IX Board, were openly calling for AMS-IX members to vote “YES”, and suggesting they also “go out and get 5 other votes”.

What do people think about that? Given that an IX that affiliates to Open IX will have to pay Open IX membership dues, was it right of them to appear to lobby AMS-IX members?

What do people think about the establishment of the separate legal entity? Will this be enough?

Has this done lasting damage to the standing of AMS-IX in the Dutch networking community? Does this matter, or has AMS-IX grown so large that such goodwill doesn’t matter anymore?

On the bigger question, is this sort of thing damaging in the long term to the EU peering community? Does the growth into different countries with different cultures threaten to dilute the member-based ethos that defines a lot of EU exchanges? Or is that just another management challenge for the IX operator to solve?

Might Equinix, who have so far not directly competed with the established EU exchanges, decide they are taking the gloves off and start their own European IX operations in a turf war?

Interesting times.

The Network Engineering “Skills Gap”

Talking to colleagues in the industry, there’s anecdotal evidence that they are having trouble finding suitable candidates for mid-level Network Engineering roles. They have vacancies which have gone unfilled for some time for want of the right people, or ask where they can go to find good generalists that have a grasp of the whole ecosystem rather than some small corner of it.

Basically, a “skills gap” seems to have opened up in the industry, whereby there are some good all-rounders at a fairly senior level, but trying to find an individual with a few years’ experience and a good grounding in IP networking, system administration (and maybe a bit of coding/scripting), network services (such as DNS) and basic security is very difficult.

Instead, candidates have become siloed, from the basic “network guy/systems guy” split to vendor, technology and service specific skills.

This is even more concerning given the overall trend in the industry toward increasing automation of networking infrastructure deployment and management and a tendency to integrate and coalesce with the service infrastructure such as the data centre and the things in it (such as servers, storage, etc.) – “the data centre as the computer”.

This doesn’t work when there are black and white divisions between the “network guy” and the “server guy” and their specific knowledge.

So, how did we get where we are? Firstly, off down a side-track into some self-indulgence…

I consider myself to be one of the more “all round” guys, although I’ve definitely got more of a lean toward physical networking infrastructure as a result of the roles I’ve had and the direction these took me in.

I come from a generation of engineers who joined the industry during the mid-90’s, when the Internet started to move from the preserve of researchers, academics, and the hardcore geeks, to becoming a more frequently used tool of communication.

Starting out as an Internet user at University (remember NCSA Mosaic and Netscape 0.9?), I got myself a modem and a dialup connection, initially for use when I was back home during the holidays and away from the University’s computing facilities, all thanks to Demon Internet and their “tenner a month” philosophy that meant even poor students like me could afford it. Back then, to get online via dialup, you had to have some grasp of what was going on under the skin, so you could work out what had gone wrong when things didn’t work. Demonites will have “fond” memories of KA9Q, or the motley collection of things which allowed you to connect using Windows. Back then, TCP/IP stacks were not standard!

So, out I came from University, and fell into a job in the ISP industry.

Back then, you tended to start at the bottom, working in “support”, which in some respects was your apprenticeship in “the Internet”, learning along the way and touching almost all areas – dialup, hosting, leased lines, ISDN, mail, nntp, Unix sysadmin, etc.

Also, the customers you were talking to were either fellow techies running the IT infrastructure in a business customer, or fellow geeks that were home users. They tended to have the same inquisitiveness that attracted you to the industry, and were on some level a peer.

Those with ambition, skill or natural flair soon found themselves climbing the greasy pole, moving up into more senior roles, handling escalations, or transferring into the systems team that maintained the network and servers. My own natural skill was in networking, and that’s where I ended up. But that didn’t mean I forgot how to work on a Unix command line. Those skills came in useful when building the instrumentation which helped me run the network. I could set up stats collection and monitoring without having to ask someone else to do it for me, which meant I wasn’t beholden to their priorities.

Many of my industry peers date from this period of rapid growth of the Internet.

Where did it start going wrong?

There are a few sources. Like a fire, which needs several conditions to exist before it will burn, I think a number of things have come together to create the situation that exists today.

My first theory is that the growth in outsourcing and offshoring of entry-level roles during the boom years largely cut off this “apprenticeship” route into the industry. There just weren’t enough support-tech jobs left in the countries which now have demand for the engineers those techs might have become.

Coupled with that is the transition of the support level jobs from inquisitive fault-finding and diagnosis to a flowchart-led “reboot/reinstall”, “is it plugged in?” de-skilled operation that seemed to primarily exist for the frustrated to yell at when things didn’t work.

People with half a clue, who had the ability to grow into good all-round engineers, might not have wanted these jobs even if they still existed locally and they were interested in joining the industry, because the roles had turned into being verbal punchbags for the rude and technically challenged. (This had already started to some extent in the mid-90s.)

Obviously, the people in these roles by the 2000s weren’t on a fast track to network engineering careers, they were call-centre staff.

My second theory is that vendor specific certification caused a silo mentality to develop. As the all-round apprenticeship of helpdesk work evaporated, did people look to certification to help them get jobs and progress their careers? I suspect this is the case, as there was a growth in the number of various certifications being offered by networking equipment vendors.

This isn’t a criticism of vendor certification per se; it has its place when it’s put in the context of a network engineer’s general knowledge. But when vendor certification is the majority of that engineer’s knowledge, what this leaves is a person who is good on paper, but can’t cope with being taken off the map, and tends to have difficulty with heterogeneous networking environments.

The other problem sometimes encountered is that people have done enough training to understand the theory, but they haven’t been exposed to enough real-world examples to get their head around the practice. Some have been taught the network equivalent of flying a Boeing 747 or Airbus A380 on its extensive automation, without understanding the basics (and fun) of stick-and-rudder flying in a little Cessna.

They haven’t got the experience that being in a “learning on the job” environment brings, and can’t always rationalise why things didn’t work out the way they expected.

The third theory is that there was a divergence of the network from the systems attached to it. During the 2000s, it started to become too much work for the same guys to know everything, and so where there used to be a group of all-rounders, there ended up being “server guys” and “network guys”. The network guys often didn’t know how to write scripts or understand basic system administration.

Finally, it seems we made networking about as glamorous as plumbing. Young folk wanted to go where the cool stuff is, and so fell into Web 2.0 companies and app development, rather than following a career in unblocking virtual drainpipes.

How do we fix it?

There’s no mistaking that this needs to be fixed. The network needs good all-round engineers to be able to deliver what’s going to be asked of it in the coming years.

People wonder why technologies such as IPv6, RPKI and DNSSEC are slow to deploy. I strongly believe that this skills gap is just one reason.

We’ve all heard the term “DevOps”, and whether or not we like it (it can provoke holy wars), it is an embodiment of the well-rounded skill set that a lot of network operators are now looking for.

Convergence of the network and server environment is growing too. I know Software Defined Networking is often used as a buzzword, but there’s a growing need for people that can understand the interactions, and be able to apply their knowledge to the software-based tools which will be at the heart of such network deployments.

There’s no silver bullet though.

Back in the 2000s, my former employer, LINX, became so concerned about the lack of good network engineering talent, and woeful vendor-specific training, that it launched the LINX Accredited Internet Technician programme, working with a training partner to build and deliver a series of platform-agnostic courses which built good all-round Network Engineering skills and how to apply these in the field. These courses are still delivered today through the training partner (SNT), while the syllabus is reviewed and updated to ensure its continuing relevance.

IPv6 pioneers HE.net offer a number of online courses in programming languages which are useful to the Network Engineer, in addition to their IPv6 certification programme.

There is also an effort called OpsSchool, which is building a comprehensive syllabus of things Operations Engineers need to know – trying to replicate the solid grounding in technology and techniques that would previously have been picked up on the job in a helpdesk role, but for the current environment.

We’ve also got attempts to build the inquisitiveness in younger people with projects such as the Raspberry Pi, while venues such as hackspaces and “hacker camps” such as OHM, CCC and EMF exist as venues to exchange knowledge with like-minded folk and maybe learn something new.

We will need to cut our existing network and systems people a bit of slack, and let them embark on their own learning curves to fill the gaps in their knowledge, recognise that their job has changed around them, and make sure they are properly supported.

The fact is that we’re likely to be in this position for a few years yet…

Anti-spoofing filters, BCP38, IETF SAVI and your network

I was invited to present at the recent IX Leeds open meeting, as “someone neutral”, on the topic of BCP38 – largely in relation to the effects of not deploying it, not just on the wider Internet, but on your IP networking business (if you have one) and on the networks you interconnect with.

I basically broke the topic down:

Introduction: I started by introducing the problem in respect of the attack (“that nearly broke the Internet”) on the CloudFlare hosted Spamhaus website in March 2013.

What and how: Quick overview of address spoofing and how a backscatter amplification attack works.

What you should do: BCP38, uRPF, etc., and what you need to do, and what to ask your suppliers.

Why you should care: Yes, it benefits others, but you have costs in terms of bandwidth and abuse/security response too.

The bleeding edge: IETF SAVI working group.

It wasn’t meant to be a technical how-to, but a non-partisan awareness raiser, as the IX Leeds meeting audiences aren’t full of “usual suspects” but people who are less likely to have been exposed to this.

It’s important to get people doing source address filtering and validation, both themselves, and asking their suppliers for it where it’s appropriate.

Here’s the slide deck (.pdf) if you’re interested.

Why a little thing called BCP38 should be followed

A couple of weeks ago, there was a DDoS attack billed as “the biggest attack to date” which nearly broke the Internet (even if that hasn’t been proved).

If you’ve been holidaying in splendid isolation, an anti-spam group and a Dutch hosting outfit had a fallout, resulting in some cyber-floods, catching hosting provider CloudFlare in the middle.

The mode of the attack was such that it used two vulnerabilities in systems attached to the internet:

  • Open DNS Resolvers – “directory” servers which were poorly managed, and would answer any query directed to them, regardless of its origin.
    • Ordinarily, a properly configured DNS resolver will only answer queries from its defined subscriber base.
  • The ability of a system to send traffic to the internet with an IP address other than the one configured.
    • Normally, an application will use whichever address is configured on the interface, but it is possible to send with another address – commonly used for testing, research or debugging.

The Open Resolver issue has already been well documented with respect to this particular attack.

However, there’s not been that much noise about spoofed source addresses, and how ISPs could apply a thing called BCP 38 to combat this.

For the attack to work properly, what was needed was an army of compromised “zombie” computers under the control of miscreants, able to send traffic onto the Internet with a source address other than their own, plus the Open Resolvers.

Packets get sent from the compromised “zombie army” to the open resolvers, but not with the real source IP addresses, instead using the source address of the victim(s).

The responses therefore don’t return to the zombies, but all to the victim addresses.

It’s like sending letters with someone else’s address as a reply address. You don’t care that you don’t get the reply, you want the reply to go to the victim.

Filtering according to BCP 38 would stop the “spoofing” – the ability to use a source IP address other than one belonging to the network the computer is actually attached to. BCP 38 indicates the application of IP address filters or a check that an appropriate “reverse path” exists, which only admits traffic from expected source IP addresses.
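The decision a BCP 38 filter makes is simple: only forward packets whose source address belongs to the networks legitimately behind that point. In practice this is a router ACL or a uRPF check, but the logic can be sketched in a few lines – the prefixes below are hypothetical documentation ranges, not anyone’s real allocation:

```python
# Minimal sketch of a BCP 38-style source-address check. A real
# deployment would be an ACL or uRPF check on the edge router; this
# just illustrates the decision being made. Prefixes are hypothetical.
import ipaddress

# Prefixes legitimately downstream of this edge
DOWNSTREAM = [ipaddress.ip_network("192.0.2.0/24"),
              ipaddress.ip_network("198.51.100.0/25")]

def permit_source(src_ip):
    """Forward only packets whose source lies in a downstream prefix."""
    addr = ipaddress.ip_address(src_ip)
    return any(addr in net for net in DOWNSTREAM)

permit_source("192.0.2.45")      # on-net source: forwarded
permit_source("203.0.113.9")     # spoofed/off-net source: dropped
```

Note how short the permit list is at the customer edge – which is precisely why, as discussed below, that’s the right place to filter.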

BCP stands for “Best Current Practice” – so if it’s “Best” and “Current” why are enough ISPs not doing it to allow for an attack as big as this one?

The belief seems to be that applying BCP 38 is “hard” (or potentially too expensive based on actual benefit) for ISPs to do. It certainly might be hard to apply BCP 38 filters in some places, especially as you get closer to the “centre” of the Internet – the lists would be very big, and possibly a challenge to maintain, even with the necessary automation.

However, if that’s where people are looking to apply BCP 38 – at the point where ISPs interconnect, or where ISPs attach multi-homed customers – then they are almost certainly looking in the wrong place. If you filter there, if you’ve any attack traffic from customers in your network, you’ve already carried it across your network. If you’ve got Open Resolvers in your network, you’ve already delivered the attack traffic to the intermediate point in the attack.

The place where BCP 38 type filtering is best implemented is close to the downstream customer edge – in the “stub” networks – such as access networks, hosting networks, etc. This is because the network operator should know exactly which source IP addresses it should be expecting at that level in the network – it doesn’t need to be as granular as per-DSL customer or per-hosting customer, but at least don’t allow traffic to pass from “off net” source addresses.

I actually implement BCP 38 myself on my home DSL router. It’s configured so it will only forward packets to the Internet from the addresses which are downstream of the router. I suspect my own ISP does the “right thing”, and I know that I’ve got servers elsewhere in the world where the hosting company does apply BCP 38, but it can’t be universal. We know that from the “success” of the recent attack.

Right now, the situation is that many networks don’t seem to implement BCP 38. But if enough networks started to implement BCP 38 filtering, the ones who didn’t would be in the minority, and this would allow peer pressure to be brought to bear on them to “do the right thing”.

Sure, it may be a case of the good guys closing one door, only for the bad guys to open another, but any step which increases the difficulty for the bad guys can’t be a bad thing, right?

We should have a discussion on this at UKNOF 25 next week, and I dare say at many other upcoming Internet Operations and Security forums.