Net Eng Skills Gap Redux: Entry Routes into Network Engineering

While I was recently at the LINX meeting in London, I ended up having a side-discussion about entry routes into the Internet Engineering industry, and the relatively small amount of new blood coming into the industry.

With my UKNOF Director’s hat on for a moment, we’re concerned about the lack of new faces showing up to our meetings too.

Let me say one thing here and now:

If you work in any sort of digital business, remember that you are nothing without the network, nothing without the infrastructure. This eventually affects you too.

Yes, I know you can just “shove it in the Cloud”, but this has to be built and operated. It has real costs associated with it, and needs real people to keep it healthily developing and running.

I’ve written about this before here, almost 3 years ago. But it seems we’re still not much better off. I think that’s because we’ve not done enough about it.

One Twitter correspondent said, “I didn’t know the entry route, so ended up in sysadmin, then internet research, and not netops.”

This pretty much confirmed some of my previous post, that we’d basically destroyed the previous entry route through commoditisation of first-line support, and that was already happening some time around 1998/1999.

It’s too easy to sit here and bleat, blaming “sexy devops” for robbing Net Eng and Network Infrastructure of keen individuals.

But why are things such as devops and more digital and software oriented industries attracting the new entrants?

One comment is that because a large number of network infra companies are well established, there isn’t the same pioneering spirit, nor the same chance to experiment and build, with infrastructure compared to the environment I joined 20 years ago.

My colleague, Paul Thornton, characterised this pioneering spirit in a recent UKNOF presentation titled “None of us knew what we were doing, we made it up as we went along”. Note that it is full of jargon and colloquialism, aimed at a specific techie audience, but if you can excuse that, it really captures in a nutshell the mid-90s Internet engineering environment that the likes of him and me grew up in.

Typing “debug all” on a core router can liven up your afternoon no end… But I didn’t really know what I wanted to do back then, I was green and wet behind the ears.

Many infrastructure providers are dominated by obsessions with high-availability, and as a result resistance to change, because they view a stable and available infrastructure as the utopia. An infrastructure which is being changed and experimented upon, by implication, is not as stable.

DO NOT TOUCH ANY OF THESE WIRES

Has a desire to learn (from mistakes if necessary!) become mutually exclusive from running infrastructure?

In many organisations, the “labs” – the development and staging environments – are pitiful. They often aren’t running the same equipment as that which exists in production, but are cobbled together from various hand-me-down pieces of gear. This means it’s not always possible to compare apples with apples, or exactly mimic conditions which will exist in production.

Compare this to the software world, where everything is on fairly generic compute, and the software is largely portable from the development and staging environments, especially so in a world of virtualisation and containerisation. There are more chances to experiment, test, fail, fix and learn in this environment than there are in an environment where people are discouraged from touching anything for fear of causing an outage.

This means we Network Engineering types need a lot of preparation time, and nerves of steel, before making any changes.

Why are the lab environments often found wanting? Classically it’s because of the high capital cost of network gear, which doesn’t directly earn any revenue. It’s harder to get signoff, unless your company has a clear policy about lab infrastructure.

I’m not saying a blanket “change control is bad”, but a hostile “don’t touch anything” environment may certainly drive away some of the inquisitive folks who are keen to learn through experimentation.

Coupled with the desire of organisations to achieve high availability with the lowest realistically achievable capital spend, it means that when these organisations hire for Network Engineering posts, they often want seasoned and experienced individuals, sometimes with vendor specific certifications. You know how I hold those in high esteem, or not as the case may be, right?

So what do we need to do?

I can’t take all the credit for this, but it’s partly my own opinions, mixed in with what I’ve aggregated from various discussions.

We need to create clear Network and Infra Engineering apprenticeship and potential career paths.

The “Way In” needs to be clearly signposted, and “what’s in it for you” made obvious.

There needs to be an established and recognised industry standard for the teaching in solid basic network engineering principles, that is distinct from vendor-led accreditations.

In some areas of the sector, the “LAIT” (LINX Accredited Internet Technician) programme is recognised and respected for its thoroughness in teaching basic Internet engineering skills, but it’s quite a narrow niche. Is there room to expand the recognition that this scheme, and possibly others, have?

A learning environment needs to exist where we enable people to make mistakes and learn from them, where failure can be tolerated, and priority placed on teaching and information sharing.

This means changing how we approach running the network. Proper labs. Proper tooling. Proper redundant infrastructure. No hostile “change control” environment.

Possibly running more outreach events that are easier for the curious and inquisitive to get into? That’s a whole post in itself. Stay tuned.

The Network Engineering “Skills Gap”

Talking to colleagues in the industry, there’s anecdotal evidence that they are having trouble finding suitable candidates for mid-level Network Engineering roles. They have vacancies which have gone unfilled for some time for want of the right people, or ask where they can go to find good generalists that have a grasp of the whole ecosystem rather than some small corner of it.

Basically, a “skills gap” seems to have opened up in the industry, whereby there are some good all-rounders at a fairly senior level, but trying to find an individual with a few years’ experience, and a good grounding in IP Networking, system administration (and maybe a bit of coding/scripting), network services (such as DNS) and basic security is very difficult.

Instead, candidates have become siloed, from the basic “network guy/systems guy” split to vendor, technology and service specific skills.

This is even more concerning given the overall trend in the industry toward increasing automation of networking infrastructure deployment and management and a tendency to integrate and coalesce with the service infrastructure such as the data centre and the things in it (such as servers, storage, etc.) – “the data centre as the computer”.

This doesn’t work when there are black and white divisions between the “network guy” and the “server guy” and their specific knowledge.

So, how did we get where we are? Firstly, off down a side-track into some self-indulgence…

I consider myself to be one of the more “all round” guys, although I’ve definitely got more of a lean toward physical networking infrastructure as a result of the roles I’ve had and the direction these took me in.

I come from a generation of engineers who joined the industry during the mid-90s, when the Internet started to move from the preserve of researchers, academics, and the hardcore geeks, to becoming a more frequently used tool of communication.

Starting out as an Internet user at University (remember NCSA Mosaic and Netscape 0.9?) I got myself a modem and a dialup connection, initially for use when I was back home during the holidays and away from the University’s computing facilities, all thanks to Demon Internet and their “tenner a month” philosophy that meant even poor students like me could afford it. Back then, to get online via dialup, you had to have some grasp of what was going on under the skin when you went online, so you could work out what had gone wrong when things didn’t work. Demonites will have “fond” memories of KA9Q, or the motley collection of things which allowed you to connect using Windows. Back then, TCP/IP stacks were not standard!

So, out I came from University, and fell into a job in the ISP industry.

Back then, you tended to start at the bottom, working in “support”, which in some respects was your apprenticeship in “the Internet”, learning along the way, and touching almost all areas – dialup, hosting, leased lines, ISDN, mail, NNTP, Unix sysadmin, etc.

Also, the customers you were talking to were either fellow techies running the IT infrastructure in a business customer, or fellow geeks that were home users. They tended to have the same inquisitiveness that attracted you to the industry, and were on some level a peer.

Those with ambition, skill or natural flair soon found themselves climbing the greasy pole, moving up into more senior roles, handling escalations, or transferring into the systems team that maintained the network and servers. My own natural skill was in networking, and that’s where I ended up. But that didn’t mean I forgot how to work on a Unix command line. Those skills came in useful when building the instrumentation which helped me run the network. I could set up stats collection and monitoring without having to ask someone else to do it for me, which meant I wasn’t beholden to their priorities.

Many of my industry peers date from this period of rapid growth of the Internet.

Where did it start going wrong?

There are a few sources. Like a fire, which needs a number of conditions to exist before it will burn, I think a number of things have come together to create the situation that exists today.

My first theory is that the growth in outsourcing and offshoring of entry-level roles during the boom years largely cut off this “apprenticeship” route into the industry. There just weren’t sufficient numbers of jobs for support techs in the countries which now have the demand for the people that most of these support techs might have become.

Coupled with that is the transition of the support level jobs from inquisitive fault-finding and diagnosis to a flowchart-led “reboot/reinstall”, “is it plugged in?” de-skilled operation that seemed to primarily exist for the frustrated to yell at when things didn’t work.

People with half a clue, who had the ability to grow into good all-round engineers, might not have wanted these jobs, even if the jobs still existed locally and they were interested in joining the industry, because the roles had turned into being verbal punchbags for the rude and technically challenged. (This had already started to some extent in the mid-90s.)

Obviously, the people in these roles by the 2000s weren’t on a fast track to network engineering careers; they were call-centre staff.

My second theory is that vendor specific certification caused a silo mentality to develop. As the all-round apprenticeship of helpdesk work evaporated, did people look to certification to help them get jobs and progress their careers? I suspect this is the case, as there was a growth in the number of various certifications being offered by networking equipment vendors.

This isn’t a criticism of vendor certification per se; it has its place when it’s put in the context of a network engineer’s general knowledge. But, when the vendor certification is the majority of that engineer’s knowledge, what this leaves is a person who is good on paper, but can’t cope with being taken off the map, and tends to have difficulty with heterogeneous networking environments.

The other problem sometimes encountered is that people have done enough training to understand the theory, but they haven’t been exposed to enough real-world examples to get their head around the practice. Some have been taught the network equivalent of how to fly a Boeing 747 or Airbus A380 on its extensive automation, without understanding the basics (and fun) of flying stick-and-rudder in a little Cessna.

They haven’t got the experience that being in a “learning on the job” environment brings, and can’t always rationalise why things didn’t work out the way they expected.

The third theory is that there was a divergence of the network from the systems attached to it. During the 2000s, it started to become too much work for the same guys to know everything, and so where there used to be a group of all-rounders, there ended up being “server guys” and “network guys”. The network guys often didn’t know how to write scripts or understand basic system administration.

Finally, it seems we made networking about as glamorous as plumbing. Young folk wanted to go where the cool stuff is, and so fell into Web 2.0 companies and app development, rather than following a career in unblocking virtual drainpipes.

How do we fix it?

There’s no mistaking that this needs to be fixed. The network needs good all-round engineers to be able to deliver what’s going to be asked of it in the coming years.

People wonder why technologies such as IPv6, RPKI and DNSSEC are slow to deploy. I strongly believe that this skills gap is just one reason.

We’ve all heard the term “DevOps”, and whether or not we like it (it can provoke holy wars), this is an embodiment of the well-rounded skill set that a lot of network operators are now looking for.

Convergence of the network and server environment is growing too. I know Software Defined Networking is often used as a buzzword, but there’s a growing need for people that can understand the interactions, and be able to apply their knowledge to the software-based tools which will be at the heart of such network deployments.

There’s no silver bullet though.

Back in the 2000s, my former employer, LINX, became so concerned about the lack of good network engineering talent, and woeful vendor-specific training, that it launched the LINX Accredited Internet Technician programme, working with a training partner to build and deliver a series of platform-agnostic courses which built good all-round Network Engineering skills and taught how to apply these in the field. These courses are still delivered today through the training partner (SNT), while the syllabus is reviewed and updated to ensure its continuing relevance.

IPv6 pioneers HE.net offer a number of online courses in programming languages which are useful to the Network Engineer, in addition to their IPv6 certification programme.

There is also an effort called OpsSchool, which is building a comprehensive syllabus of things Operations Engineers need to know – trying to replicate the solid grounding in technology and techniques that would previously be picked up on the job while working in a helpdesk role, but for the current environment.

We’ve also got attempts to build the inquisitiveness in younger people with projects such as the Raspberry Pi, while hackspaces and “hacker camps” such as OHM, CCC and EMF exist as venues to exchange knowledge with like-minded folk and maybe learn something new.

We will need to cut our existing network and systems people a bit of slack, and let them embark on their own learning curves to fill the gaps in their knowledge, recognise that their job has changed around them, and make sure they are properly supported.

The fact is that we’re likely to be in this position for a few years yet…