The Network Engineering “Skills Gap”

Talking to colleagues in the industry, there’s anecdotal evidence that they are having trouble finding suitable candidates for mid-level Network Engineering roles. They have vacancies which have gone unfilled for some time for want of the right people, or ask where they can go to find good generalists that have a grasp of the whole ecosystem rather than some small corner of it.

Basically, a “skills gap” seems to have opened up in the industry, whereby there are some good all-rounders at a fairly senior level, but finding an individual with a few years’ experience and a good grounding in IP networking, system administration (and maybe a bit of coding/scripting), network services (such as DNS) and basic security is very difficult.

Instead, candidates have become siloed, from the basic “network guy/systems guy” split to vendor, technology and service specific skills.

This is even more concerning given the overall trend in the industry toward increasing automation of networking infrastructure deployment and management, and the tendency for the network to integrate and coalesce with the service infrastructure – the data centre and the things in it (servers, storage, etc.) – “the data centre as the computer”.

This doesn’t work when there are black and white divisions between the “network guy” and the “server guy” and their specific knowledge.

So, how did we get where we are? Firstly, off down a side-track into some self-indulgence…

I consider myself to be one of the more “all round” guys, although I’ve definitely got more of a lean toward physical networking infrastructure as a result of the roles I’ve had and the direction these took me in.

I come from a generation of engineers who joined the industry during the mid-90s, when the Internet started to move from being the preserve of researchers, academics and the hardcore geeks to a more frequently used tool of communication.

Starting out as an Internet user at University (remember NCSA Mosaic and Netscape 0.9?) I got myself a modem and a dialup connection, initially for use when I was back home during the holidays and away from the University’s computing facilities, all thanks to Demon Internet and their “tenner a month” philosophy that meant even poor students like me could afford it. Back then, to get online via dialup, you had to have some grasp of what was going on under the skin when you went online, so you could work out what had gone wrong when things didn’t work. Demonites will have “fond” memories of KA9Q, or the motley collection of things which allowed you to connect using Windows. Back then, TCP/IP stacks were not standard!

So, out I came from University, and fell into a job in the ISP industry.

Back then, you tended to start at the bottom, working in “support”, which in some respects was your apprenticeship in “the Internet”, learning along the way and touching almost all areas – dialup, hosting, leased lines, ISDN, mail, NNTP, Unix sysadmin, etc.

Also, the customers you were talking to were either fellow techies running the IT infrastructure at a business customer, or fellow geeks who were home users. They tended to have the same inquisitiveness that attracted you to the industry, and were on some level your peers.

Those with ambition, skill or natural flair soon found themselves climbing the greasy pole, moving up into more senior roles, handling escalations, or transferring into the systems team that maintained the network and servers. My own natural skill was in networking, and that’s where I ended up. But that didn’t mean I forgot how to work on a Unix command line. Those skills came in useful when building the instrumentation which helped me run the network. I could set up stats collection and monitoring without having to ask someone else to do it for me, which meant I wasn’t beholden to their priorities.

Many of my industry peers date from this period of rapid growth of the Internet.

Where did it start going wrong?

There are a few sources. Like a fire, which needs a number of conditions to exist before it will burn, I think a number of things have come together to create the situation that exists today.

My first theory is that the growth in outsourcing and offshoring of entry-level roles during the boom years largely cut off this “apprenticeship” route into the industry. There simply weren’t enough support tech jobs left in the countries which now have the demand for the engineers those support techs might have become.

Coupled with that is the transition of support-level jobs from inquisitive fault-finding and diagnosis to a de-skilled, flowchart-led “reboot/reinstall”, “is it plugged in?” operation that seemed primarily to exist for the frustrated to yell at when things didn’t work.

People with half a clue, who had the ability to grow into good all-round engineers, might not have wanted these jobs – even where the jobs still existed locally, and even if they were interested in joining the industry – because the roles had turned into being verbal punchbags for the rude and technically challenged. (This had already started to some extent in the mid-90s.)

Obviously, the people in these roles by the 2000s weren’t on a fast track to network engineering careers; they were call-centre staff.

My second theory is that vendor-specific certification caused a silo mentality to develop. As the all-round apprenticeship of helpdesk work evaporated, did people look to certification to help them get jobs and progress their careers? I suspect this is the case, given the growth in the number of certifications being offered by networking equipment vendors.

This isn’t a criticism of vendor certification per se; it has its place when it’s put in the context of a network engineer’s general knowledge. But when the vendor certification is the majority of that engineer’s knowledge, what you’re left with is a person who is good on paper, but can’t cope with being taken off the map, and tends to have difficulty with heterogeneous networking environments.

The other problem sometimes encountered is that people have done enough training to understand the theory, but haven’t been exposed to enough real-world examples to get their head around the practice. Some have been taught the network equivalent of flying a Boeing 747 or Airbus A380 on its extensive automation, without understanding the basics (and fun) of flying stick-and-rudder in a little Cessna.

They haven’t got the experience that being in a “learning on the job” environment brings, and can’t always rationalise why things didn’t work out the way they expected.

The third theory is that there was a divergence of the network from the systems attached to it. During the 2000s, it started to become too much work for the same guys to know everything, and so where there used to be a group of all-rounders, there ended up being “server guys” and “network guys”. The network guys often didn’t know how to write scripts or understand basic system administration.

Finally, it seems we made networking about as glamorous as plumbing. Young folk wanted to go where the cool stuff is, and so fell into Web 2.0 companies and app development, rather than following a career in unblocking virtual drainpipes.

How do we fix it?

There’s no mistaking that this needs to be fixed. The network needs good all-round engineers to be able to deliver what’s going to be asked of it in the coming years.

People wonder why technologies such as IPv6, RPKI and DNSSEC are slow to deploy. I strongly believe that this skills gap is just one reason.

We’ve all heard the term “DevOps”, and whether or not we like it (it can provoke holy wars), it is an embodiment of the well-rounded skill set that a lot of network operators are now looking for.

Convergence of the network and server environment is growing too. I know Software Defined Networking is often used as a buzzword, but there’s a growing need for people that can understand the interactions, and be able to apply their knowledge to the software-based tools which will be at the heart of such network deployments.

There’s no silver bullet though.

Back in the 2000s, my former employer, LINX, became so concerned about the lack of good network engineering talent, and the woeful state of vendor-specific training, that it launched the LINX Accredited Internet Technician programme, working with a training partner to build and deliver a series of platform-agnostic courses teaching good all-round Network Engineering skills and how to apply them in the field. These courses are still delivered today through the training partner (SNT), while the syllabus is reviewed and updated to ensure its continuing relevance.

IPv6 pioneers HE.net offer a number of online courses in programming languages which are useful to the Network Engineer, in addition to their IPv6 certification programme.

There is also an effort called OpsSchool, which is building a comprehensive syllabus of the things Operations Engineers need to know – trying to replicate, for the current environment, the solid grounding in technology and techniques that would previously have been picked up on the job in a helpdesk role.

We’ve also got attempts to build inquisitiveness in younger people through projects such as the Raspberry Pi, while hackspaces and “hacker camps” such as OHM, CCC and EMF exist as venues to exchange knowledge with like-minded folk and maybe learn something new.

We will need to cut our existing network and systems people a bit of slack: let them embark on their own learning curves to fill the gaps in their knowledge, recognise that their jobs have changed around them, and make sure they are properly supported.

The fact is that we’re likely to be in this position for a few years yet…

Why a little thing called BCP38 should be followed

A couple of weeks ago, there was a DDoS attack billed as “the biggest attack to date” which nearly broke the Internet (even if that hasn’t been proved).

If you’ve been holidaying in splendid isolation: an anti-spam group and a Dutch hosting outfit had a falling-out, resulting in some cyber-floods, with DDoS mitigation provider CloudFlare caught in the middle.

The mode of the attack was such that it exploited two weaknesses in systems attached to the Internet:

  • Open DNS Resolvers – “directory” servers which were poorly managed, and would answer any query directed to them, regardless of its origin. (There’s a sketch of how you might test for this after the list.)
    • Ordinarily, a properly configured DNS resolver will only answer queries from its defined subscriber base.
  • The ability of a system to send traffic to the Internet with a source IP address other than the one configured.
    • Normally, an application will use whichever address is configured on the interface, but it is possible to send with another address – commonly used for testing, research or debugging.

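This is what “open” means in practice: the resolver performs recursion for anyone who asks. Below is a minimal sketch of how you might check a resolver you are responsible for, using only the Python standard library. The address 192.0.2.53 is a documentation placeholder, not a real target, and a more thorough test would also verify the transaction ID and the RA (recursion available) bit in the reply.

```python
import socket
import struct

def build_query(name: str, txid: int = 0x1234) -> bytes:
    """Hand-roll a single-question DNS query (QTYPE=A, QCLASS=IN, RD bit set)."""
    header = struct.pack(">HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode() for label in name.split(".")
    ) + b"\x00"
    return header + qname + struct.pack(">HH", 1, 1)  # A record, IN class

def answers_strangers(ip: str, timeout: float = 3.0) -> bool:
    """True if the host resolves a recursive query from an arbitrary client."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    try:
        sock.sendto(build_query("example.com"), (ip, 53))
        reply, _ = sock.recvfrom(512)
        # The low 4 bits of byte 3 hold the RCODE; 0 (NOERROR) means it
        # resolved the name for us. A well-run resolver would answer
        # REFUSED, or not answer outsiders at all.
        return (reply[3] & 0x0F) == 0
    except socket.timeout:
        return False
    finally:
        sock.close()

if __name__ == "__main__":
    print(answers_strangers("192.0.2.53"))  # placeholder address
```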
The Open Resolver issue has already been well documented with respect to this particular attack.

However, there’s not been that much noise about spoofed source addresses, and how ISPs could apply a thing called BCP 38 to combat this.

For the attack to work properly, what was needed was an army of “zombie” computers – compromised, and under the control of miscreants – which were able to send traffic onto the Internet with a source address other than their own, plus the Open Resolvers.

Packets get sent from the compromised “zombie army” to the open resolvers, but not with the real source IP addresses, instead using the source address of the victim(s).

The responses therefore don’t return to the zombies, but instead all go to the victim addresses. And because a small DNS query can provoke a much larger response, the resolvers amplify the volume of attack traffic along the way.

It’s like sending letters with someone else’s address as a reply address. You don’t care that you don’t get the reply, you want the reply to go to the victim.

Filtering according to BCP 38 would stop the “spoofing” – the ability to use a source IP address other than one belonging to the network the computer is actually attached to. BCP 38 calls for the application of source IP address filters, or a check that an appropriate “reverse path” exists, so that only traffic from expected source IP addresses is admitted.
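
In practice this is done with ingress ACLs or unicast reverse-path forwarding (uRPF) checks at the network edge. As a language-neutral illustration, here’s a minimal sketch of the decision such a filter makes, in Python; the prefixes are documentation examples, not anyone’s real allocation.

```python
from ipaddress import ip_address, ip_network

# Prefixes assigned/routed to the customers behind a given edge port.
EXPECTED_SOURCES = [
    ip_network("192.0.2.0/24"),
    ip_network("198.51.100.0/25"),
]

def permit_ingress(src: str) -> bool:
    """Admit a packet only if its source address is one we expect here."""
    addr = ip_address(src)
    return any(addr in net for net in EXPECTED_SOURCES)

print(permit_ingress("192.0.2.7"))     # True  - legitimate customer source
print(permit_ingress("203.0.113.99"))  # False - spoofed source, drop it
```

A uRPF check automates the same test by asking whether the router’s own routing table would send traffic for that source address back out of the interface it arrived on.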

BCP stands for “Best Current Practice” – so if it’s “Best” and “Current” why are enough ISPs not doing it to allow for an attack as big as this one?

The belief seems to be that applying BCP 38 is “hard” (or potentially too expensive relative to the actual benefit) for ISPs to do. It certainly might be hard to apply BCP 38 filters in some places, especially as you get closer to the “centre” of the Internet – the lists would be very big, and possibly a challenge to maintain, even with the necessary automation.

However, if that’s where people are looking to apply BCP 38 – at the point where ISPs interconnect, or where ISPs attach multi-homed customers – then they are almost certainly looking in the wrong place. By the time traffic reaches a filter there, any attack traffic from customers in your network has already been carried across your network, and if you’ve got Open Resolvers in your network, you’ve already delivered the attack traffic to the intermediate point in the attack.

The place where BCP 38 type filtering is best implemented is close to the downstream customer edge – in the “stub” networks – such as access networks, hosting networks, etc. This is because the network operator should know exactly which source IP addresses it should be expecting at that level in the network – it doesn’t need to be as granular as per-DSL customer or per-hosting customer, but at least don’t allow traffic to pass from “off net” source addresses.

I actually implement BCP 38 myself on my home DSL router. It’s configured so it will only forward packets to the Internet from the addresses which are downstream of the router. I suspect my own ISP does the “right thing”, and I know that I’ve got servers elsewhere in the world where the hosting company does apply BCP 38, but it can’t be universal. We know that from the “success” of the recent attack.

Right now, the situation is that many networks don’t seem to implement BCP 38. But if enough networks started to implement BCP 38 filtering, the ones who didn’t would be in the minority, and this would allow peer pressure to be brought to bear on them to “do the right thing”.

Sure, it may be a case of the good guys closing one door, only for the bad guys to open another, but any step which increases the difficulty for the bad guys can’t be a bad thing, right?

We should have a discussion on this at UKNOF 25 next week, and I dare say at many other upcoming Internet Operations and Security forums.

BT and Virgin Media challenge Birmingham’s Broadband deployment

BBC News are reporting that incumbent high-speed broadband providers BT and Virgin Media have launched a legal challenge to Birmingham City Council’s proposed independent Superfast Broadband network.

The city has successfully applied for EU state aid to build a network into underserved areas of the city, aligned with the Council’s regeneration plans for those areas. Virgin Media contends that it is “overbuilding” on their existing network footprint, and as such is unnecessary – effectively using EU subsidy to attack their revenue stream.

Broadband campaigner Chris Conder, one of the people behind the B4RN project, says that this is a case of VM and BT trying to close the stable door after the horse has bolted.

It’s going to be an interesting and important test case.

Is the Internet facing a “perfect storm”?

The Internet has become a massive part of our everyday lives. If you walk down a British high street, you can’t fail to notice people staring into their phones rather than looking where they are going! I did see a comment on TV this week that you have a 1-in-10 chance of tripping and falling over when walking along looking at your phone and messaging…

There are massive pushes for faster access in countries which already have widespread Internet adoption, both over fixed infrastructure (such as FTTC and FTTH) and wireless (LTE, aka 4G), which at times isn’t without controversy. In the UK, the incumbent, BT, is commonly (and sometimes unfairly) criticised for trying to sweat more and more out of its copper last-mile infrastructure (the wires that go into people’s homes), while not doing enough to “future-proof” and enable remote areas by investing in fibre. There have also been problems over the UK regulator’s decision to allow one mobile phone network to get a head start on its competitors in offering LTE/4G service, using existing allocated radio frequencies (a process known as “spectrum refarming”).

Why do people care? Because the Internet helps foster growth and can reduce the costs of doing business, which is why developing countries are working desperately hard to drive Internet adoption, along the way having to manage the threats from “interfering” actors who either don’t fully understand change or fear it.

However, a bigger threat could be facing the Internet, and it’s coming from multiple angles, technical and non-technical. A perfect storm?

  • IPv4 Resource Exhaustion
    • The existing addressing (numbering) scheme used by the Internet is running out
    • A secondary market for “spare” IPv4 resources is developing; IPv4 addresses will acquire a monetary value, driven by the lack of IPv6 deployment
  • Slow IPv6 Adoption
  • Increasing Regulatory attention
    • On a national level, such as the French Regulator, ARCEP, wishing to collect details on all interconnects in France or involving French entities
    • On a regional level, such as ETNO pushing for regulation of interconnect through use of QoS – nicely de-constructed by my learned colleague Geoff Huston – possibly an attempt to retroactively fix a broken business model?
    • On a global level through the ITU, who, having disregarded the Internet as “something for academics” and not relevant to public communications back in 1988, now want to update the International Telecommunication Regulations to extend them to cover who “controls the Internet” and how.

All of these things threaten some of the basic foundations of the Internet we have today:

  • The Internet is “open” – anyone can connect, it’s agnostic to the data which is run over it, and this allows people to innovate
  • The Internet is “transparent” – managed using a bottom-up process of policy making and protocol development which is open to all
  • The Internet is “cheap” – relatively speaking, Internet service is inexpensive

These challenges facing the Internet combine to break all of the above.

Close the system off, drive costs up, and make development and co-ordination an invite-only closed shop in which it’s expensive to participate.

Time and effort, and investing a little money (in deploying IPv6, in some regulatory efforts, and in checking your business model is still valid), are the main things which will head off this approaching storm.

Adopting IPv6 should just be a (stay in) business decision. It’s something operational and technical that a business is in control of.

But the regulatory aspect is tougher, unless you are big enough to be able to afford your own lobbyists. Fortunately, if you live in the UK, it’s not reached “write to your MP” time – not just yet. The UK’s position remains one of “light touch” regulation, largely letting the industry regulate itself through market forces, and this is the position being advocated to the ITU. There are also some very bright, talented and respected people trying to get the message through that it’s economically advantageous not to make the Internet a closed, top-down operated system.

Nevertheless, the challenges remain very much real. We live in interesting times.

Recent IPv4 Depletion Events

Those of you who follow these things can’t have missed that the RIPE NCC got down to its last /8 of unallocated IPv4 space last week.

They even made a cake to celebrate…

Photo (and cake?) by Rumy Spratley-Kanis

This means the RIPE NCC are down to their last 16 million IPv4 addresses, and they can’t get another big block allocated to them, because there aren’t any more to give out.
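
If you want to check the arithmetic, a /8 leaves 24 host bits, i.e. 2^24 = 16,777,216 addresses. A quick sketch using Python’s ipaddress module, with 185.0.0.0/8 as the example block (which, I believe, was the final /8 the RIPE NCC received from IANA):

```python
from ipaddress import ip_network

# A /8 has 32 - 8 = 24 host bits, so 2**24 addresses in total.
last_slash8 = ip_network("185.0.0.0/8")
print(last_slash8.num_addresses)  # 16777216, i.e. ~16.8 million
```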
