A brief intro to the Network Automation journey

With many things, getting started is the toughest part. Taking the first step is often the most challenging, as it frequently involves entering the unknown. The same is true when introducing automation into networks, or more precisely, into the infrastructure as a whole: due to extensive virtualization and containerization, it is now difficult to draw a line between the network and the infrastructure. This dividing line tends to be blurry, and the two worlds are starting to merge.

In this article, we want to focus on the usual steps that need to be taken to achieve network automation, but we also discuss the common apprehensions towards it. What we don’t question is the fact that infrastructure automation is needed, as we believe this is the way to go.

Why are we afraid of automating networks?

There are a few reasons why the pace of network automation adoption differs between organizations. Unsurprisingly, the most common reason for not moving forward is the same as for almost anything new - a fear of change. That fear has several sources:

some are concerned they will not be needed anymore, that the automation will make their role redundant,
some do not want to learn to use new tools or technology,
some are afraid of losing control of the network.

Let’s take a look at the above aspects from a different perspective and view them in a more positive light.

The concern of being replaced

A common concern, as mentioned above, is the fear of not being needed anymore - that automation will replace network engineers. But if we take a look at the amount of work that is being done by engineers, or more precisely at the amount of work that is NOT being done because of the lack of time, one can see a potential rather than a threat. Think of all the times that actual network engineering work, like network design or creating a new service architecture, had to be put aside because of some immediate need for an upgrade, re-configuration, or some additional checks. Think how much time those tasks take and how cumbersome they are. Now imagine that all those boring activities could be sped up thanks to automation. With engineers’ supervision, the same work could be done faster, and the time saved that way could be spent on more attractive types of work.

The reluctance to learn new tools

However, to be able to automate daily network operations, one needs to get acquainted with a new set of tools. Ansible, Terraform, and Python are quite common in that area. There might be some reluctance towards learning them, as at first glance they can seem complicated. It is true that there is a learning curve for each of those technologies, but some are easier to comprehend than others. And let’s be honest - learning how to write a YAML file is not more difficult than learning how to configure a new vendor's devices.

The fear of losing control

Another aspect to be considered and an impediment towards automation is very human - we are afraid of losing control of our infrastructure. This is understandable - with scripts crawling around and doing multiple modifications on the devices, we may be afraid of losing visibility.

But is it really the case? After all, it is an engineer who runs and supervises the changes. Even if at some advanced stage closed-loop automation is in use (i.e. the operational changes in the network are automated, without any human interaction), one still has the observability tools to monitor the operational state.

Observability is very important, and its importance grows as the infrastructure grows, as AI-based tooling is added, and as automation is adopted. What is more, network automation is in most cases paired with the implementation of a Single Source of Truth / Single Source of Intent (for example, through approaches like GitOps, described in "Unlocking the power of GitOps for network automation"), which improves overall visibility and data consistency.

Tools for network automation

There are many tools and solutions that can be used to build a network/infrastructure automation ecosystem. The complexity of automation varies from simple scripting to complex, multistage actions, and the right tooling is needed to achieve each of these goals.

To build a simple script, the above-mentioned Ansible, Terraform, or Python can be used. Each of these tools has a different purpose and is used in different situations. For more information on when and how they can be applied, refer to "Network automation tools comparison in code examples: Terraform, Ansible, and Python SDK", which compares the three.

The tools mentioned above are commonly used - they help to automate cumbersome tasks and make engineers’ lives easier. What is often missing, though, is a common repository for the scripts, as each person has their own set of scripts that are not shared with the team. One of the steps in building network automation is creating a common set of scripts, playbooks, and modules that can be used by every engineer working with the network. Such an approach not only helps in speeding up changes, but also ensures that the changes are performed in the same way, so the output configuration is derived from the same template. In the long run, one can expect a unification of infrastructure configuration (with the exception of variables like IP addresses or VLAN numbers, of course), which in turn will allow for automating more complex and advanced tasks.

For a more holistic automation approach, one can choose to implement an automation platform. There are different solutions available on the market. Some, like Itential, BackBox, the SolarWinds product family, NetBrain, Unimus, Pliant, or NetYCE, allow for network automation (with different levels of effort and integration with other necessary tools). Others, like IP Fabric or Forward Networks, model the network. In addition, solutions like Nautobot, NetBox, or a new solution currently being built by OpsMill can serve as a Source of Truth / Source of Intent. It all depends on the needs, the level of automation to be implemented, and what is already present in the network.

Different platforms support different vendors (although the number of supported vendors is quite high), so one must consider what has already been deployed, what is planned, and which integrations are available out of the box. Most importantly, using an automation platform very often requires a set of prerequisites to be fulfilled, like the above-mentioned Source of Truth / Source of Intent.

Steps towards automation

But how to start the automation journey? What are the steps to be taken? And when to stop? Those are fair questions but not easy ones to answer.

As in many cases, the most accurate answer is: it depends. In fact, there is no recipe for the introduction of automation. Each deployment will be different as the needs and expectations differ. Also, there are situations where some of the elements needed for automation are already there - in such cases, it may be better to adapt or integrate an already existing block with the solution that is being built.

Despite what has been written above, let’s try to depict a few steps that can create a kind of skeleton for a network automation journey. It is important for the reader to understand that these steps may not be mandatory, or the order may not be exactly the same as the one presented below. It all depends on the specific situation.

Building a Source of Truth / Source of Intent

Let’s start with the first building block, the one that we actually find to be mandatory for every network or infrastructure automation: Source of Truth, also known as Source of Intent.

Why different names? Because there is no agreement on which one is more suitable. I’d personally go with the Source of Intent, but there are different opinions.

The aim of having a Source of Truth / Source of Intent is to have a single place where all network-related data is stored and managed. While the specifics of what data to store, where to store it, and the processes for inputting the data are not really defined, the goal is that the Source of Truth / Source of Intent is the only point for accessing and modifying the data.
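To make the idea more concrete, here is a minimal sketch of intent data kept in one place and queried by a script. The JSON layout, the device names, and the helper functions are illustrative assumptions only; a real deployment would typically use a dedicated tool or a Git repository rather than an inline string.

```python
import json

# Hypothetical intent data; the structure is an assumption made for this sketch.
INTENT_JSON = """
{
  "devices": {
    "edge-1": {"mgmt_ip": "192.0.2.10", "vlans": [10, 20]},
    "edge-2": {"mgmt_ip": "192.0.2.11", "vlans": [10, 30]}
  }
}
"""


def load_intent(raw: str) -> dict:
    """Parse the intent document and return the per-device records."""
    return json.loads(raw)["devices"]


def devices_with_vlan(intent: dict, vlan: int) -> list[str]:
    """Return the names of devices that are intended to carry the given VLAN."""
    return sorted(name for name, data in intent.items() if vlan in data["vlans"])


if __name__ == "__main__":
    intent = load_intent(INTENT_JSON)
    print(devices_with_vlan(intent, 10))
```

Because every script reads the same document, a question such as "which devices should carry VLAN 10?" has one answer, regardless of who asks it.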

Script-based automation

Script-based automation is something that, in fact, is present in almost every network nowadays, to some extent at least. It is a common practice for an engineer to use some type of scripting on a daily basis. These can be Ansible playbooks, Python scripts, or even Bash scripts. The goal is to do some part of the work with the help of scripting instead of doing it manually. However, such scripts are owned and managed by an engineer and are rarely shared with others.

With a script-based automation approach, it is crucial to build a common repository so that the scripts are shared across the organization. That way engineers can use them as a shared resource and can collaborate to build new ones. Configuring devices in the same way, for example (which follows naturally from using the same scripts), brings unification to the infrastructure configuration.
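As a rough illustration of how a shared script leads to unified configuration, the sketch below renders an interface stanza from one common template using Python's standard library. The template text, the CLI-like syntax, and the function name are assumptions made for the example, not any vendor's exact format.

```python
from string import Template

# Hypothetical shared template; in practice it would live in the team's common repository.
INTERFACE_TEMPLATE = Template(
    "interface $name\n"
    " description $description\n"
    " ip address $ip $mask\n"
    " no shutdown\n"
)


def render_interface(name: str, description: str, ip: str, mask: str) -> str:
    """Render one interface stanza; only the variables differ between devices."""
    return INTERFACE_TEMPLATE.substitute(
        name=name, description=description, ip=ip, mask=mask
    )


if __name__ == "__main__":
    print(render_interface("GigabitEthernet0/1", "uplink to core",
                           "192.0.2.1", "255.255.255.252"))
```

Every engineer who calls `render_interface` gets the same structure, so the resulting configurations differ only in variables such as names and addresses.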

Digital Twin

Another aspect of the automation journey is something called a Digital Twin. This concept says that every network should have a digital representation, so, for example, all new changes can be first applied to the modeled/simulated network, and only if the outcome is fine are they deployed on the production version. Of course, a virtualized (or containerized) network cannot be used for all types of tests - the testability is mostly focused on the management and control planes. However, it is a perfect playground for testing automation scripts of all types.
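The "twin first, production second" flow can be sketched as a simple gate. The callables passed in are hypothetical placeholders for whatever mechanism applies a change and validates the simulated network; nothing here refers to a specific product.

```python
from typing import Callable


def deploy_via_twin(
    change: str,
    apply_to_twin: Callable[[str], None],
    twin_is_healthy: Callable[[], bool],
    apply_to_production: Callable[[str], None],
) -> bool:
    """Apply a change to the twin first; touch production only if the twin stays healthy."""
    apply_to_twin(change)
    if not twin_is_healthy():
        # The outcome on the modeled network was not fine, so production is left untouched.
        return False
    apply_to_production(change)
    return True


if __name__ == "__main__":
    twin_log: list[str] = []
    prod_log: list[str] = []
    ok = deploy_via_twin("set ospf cost 20", twin_log.append, lambda: True, prod_log.append)
    print(ok, twin_log, prod_log)
```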

Automated pre- and post-checks

Up till now, we have mostly focused on making changes to the infrastructure in an automated way but haven’t really touched the operational aspects explicitly. Let’s do it now.

Most of the changes performed in an infrastructure should result in some changes to the operational state of that infrastructure. The types of changes can vary: from modifying a part of the configuration to upgrading the software version of an entity.

However, in both cases, an operator wants to be sure of two things: first, that the change actually did what it was supposed to do, and second, perhaps even more importantly, that the change didn’t break anything already working in the process. In both of these cases, what needs to be checked is the operational state before and after the change was applied. In the case of a large network, that can be difficult and cumbersome to manage.

Now, let’s imagine we have a large network, and for some reason we need to change an OSPF metric on one of the links. On top of that, we have some BGP sessions running. We know that this change will impact IGP routing, but it should not affect any of the BGP neighbor states, nor should it impact the number of BGP routes advertised. To verify that all is OK, one could take a snapshot of the BGP sessions before and after the change, and compare the outputs. It can be done by hand, but logging into the devices, taking snapshots, and comparing them takes time and is prone to error, especially in cases more complex than the simple one mentioned here.

With automated pre- and post-checks in play, there are scripts already prepared and available to check the operational state of the network before and after the change. If something is not as expected, the operator knows instantaneously and can apply the appropriate remedies.
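The BGP example above can be sketched as a comparison of two snapshots. How the snapshots are collected is out of scope here; the `BgpPeer` fields and the dictionary layout are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BgpPeer:
    state: str
    prefixes_advertised: int


# A snapshot maps a neighbor address to its observed state (assumed layout).
Snapshot = dict[str, BgpPeer]


def compare_snapshots(pre: Snapshot, post: Snapshot) -> list[str]:
    """Return human-readable differences between the pre- and post-change snapshots."""
    findings = []
    for neighbor in sorted(pre.keys() | post.keys()):
        before, after = pre.get(neighbor), post.get(neighbor)
        if before is None:
            findings.append(f"{neighbor}: new neighbor after change")
        elif after is None:
            findings.append(f"{neighbor}: neighbor missing after change")
        else:
            if before.state != after.state:
                findings.append(f"{neighbor}: state {before.state} -> {after.state}")
            if before.prefixes_advertised != after.prefixes_advertised:
                findings.append(
                    f"{neighbor}: advertised prefixes "
                    f"{before.prefixes_advertised} -> {after.prefixes_advertised}"
                )
    return findings


if __name__ == "__main__":
    pre = {"192.0.2.1": BgpPeer("Established", 100)}
    post = {"192.0.2.1": BgpPeer("Idle", 0)}
    for line in compare_snapshots(pre, post):
        print(line)
```

An empty result means the BGP view is unchanged, which is exactly what the operator expects after an OSPF metric change.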

Platform-based automation

The next step in the adoption of network automation is incorporating a network automation platform. As mentioned above, there are many existing solutions on the market. It is very important to understand what each solution can offer out of the box, what it is capable of integrating with (for example, which Source of Truth / Source of Intent can be used), and what the means of building the automation are. It is also important to know whether the platform is a read-only solution, i.e. it simply shows the current state of the network, or whether it can actually interact with the infrastructure and introduce changes. One should also check if and how pipelines can be built to run a set of actions.

We will not focus on comparing different solutions in this article - that is a topic for a whole new blog post - but we wanted to show that there are already mature solutions that can be used for the purpose of infrastructure automation.

Pros and cons of introducing network automation

We have been discussing the good that comes with automating a network. But to be fair, we also need to consider the not-so-bright sides of the change.

The good stuff

Let’s start with all the good aspects first. Introducing a Single Source of Truth / Single Source of Intent is in itself a big gain in terms of understanding the infrastructure. With small networks this may be a small thing, helping to organize network/infrastructure data in one place, but for bigger, much more complex infrastructures, this can be the difference between seeing just a part of the network and understanding it as a whole. In big deployments, like ISP networks, different teams take care of different parts of the network - for example, a RAN engineer does not usually talk with IP/MPLS core engineers - and none of them sees the whole picture, even though seeing it would be beneficial when delivering an end-to-end service.

Another important benefit of automation is the ability to introduce changes at scale. With manual work, the job is usually done by a single engineer (or by a team of two engineers) and the changes are serialized. If there are many devices, the time needed to complete all the changes is long and can require several maintenance windows. With the help of automation, the modifications can be done in parallel - the main limit lies with the change management requirements - so the time needed to complete the task is much shorter.
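A minimal sketch of parallel execution, using Python's standard `concurrent.futures` module, could look as follows. The `apply_change` function is a hypothetical placeholder that does not talk to any device, and `max_workers` stands in for whatever concurrency limit change management allows.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_change(device: str) -> str:
    """Hypothetical placeholder; a real version would connect to the device and push the change."""
    return f"{device}: change applied"


def apply_in_parallel(devices: list[str], max_workers: int = 4) -> dict[str, str]:
    """Apply the change to all devices, running at most max_workers tasks at a time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input devices.
        results = pool.map(apply_change, devices)
        return dict(zip(devices, results))


if __name__ == "__main__":
    for device, outcome in apply_in_parallel(["r1", "r2", "r3"], max_workers=2).items():
        print(device, "->", outcome)
```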

The third important aspect is the reduction of human error. In the example mentioned above, after a few hours of extensive yet repetitive work in the middle of the night, an engineer will sooner or later make a mistake - it is not a question of if, but of when - we are only human. With the help of even simple scripting, not to mention a full-blown automation platform, those errors can be avoided. And, should a mistake occur, a rollback can be done faster and more easily with the help of automation scripts and the Source of Truth / Source of Intent.

The fourth aspect is unification. Having all jobs done by scripts means that the job will be executed the same way each time. With humans involved in the process, there would be a risk that each job would be done differently, as each administrator has a different set of best practices.

The rollback mentioned above is potentially a very important aspect whenever a change is made. Equally important is that automation makes it much easier to track changes and their history, and to correlate them with the outcomes. So, in case something goes wrong, post-mortem analysis will be more accurate.

The challenges of network automation

Now, let’s take a look at the difficulties of introducing automation into an infrastructure.

The first thing that comes to mind is the need for teams to acquire new skills. Simple hand-made configuration changes are not the method of choice anymore (although, to be fair, they may still be used for troubleshooting, for example), and other means of making changes are to be used. Tools like Ansible and Terraform, or programming languages like Python, need to be used on a daily basis. On the other hand, the truth is that many engineers use those tools already - just to make their life easier - so the change is not as drastic as it may seem.

The other aspect of automation is the mindset change in how one interacts with the network and/or infrastructure. There is no direct contact with the device - the changes are done by a script or, in more advanced cases, by an automation platform, so a certain layer of separation is introduced. It's not so easy to give up the hands-on experience, the feeling of actually touching the device and seeing the impact immediately.

The last, but certainly not least aspect is the need for precise planning and understanding of the change before it is actually applied. With powerful tools that can apply changes at scale, if something is done incorrectly, the failure may also be experienced at scale. So it is very important to plan the changes, their order, number of devices, and methods of testing in advance to make sure that nothing goes wrong. This also requires a deep understanding of the tools that are used and how they work. But here, a Digital Twin solution comes in handy, where the modifications and the scripts themselves can be checked and verified.
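One possible way to limit a failure at scale is to plan the change in small batches and stop as soon as a batch reports a problem. The sketch below assumes a hypothetical `apply_and_verify` callable that applies the change to one device and returns whether the post-check passed; the batch size is an arbitrary illustrative parameter.

```python
from typing import Callable, Iterator


def batches(devices: list[str], size: int) -> Iterator[list[str]]:
    """Split the device list into consecutive batches of at most `size` devices."""
    for start in range(0, len(devices), size):
        yield devices[start:start + size]


def staged_rollout(
    devices: list[str],
    batch_size: int,
    apply_and_verify: Callable[[str], bool],
) -> list[str]:
    """Roll out batch by batch; stop before the next batch if any device in a batch fails."""
    completed = []
    for batch in batches(devices, batch_size):
        results = [apply_and_verify(device) for device in batch]
        completed.extend(d for d, ok in zip(batch, results) if ok)
        if not all(results):
            break  # remaining batches are left untouched
    return completed


if __name__ == "__main__":
    done = staged_rollout(["r1", "r2", "r3", "r4", "r5"], 2, lambda d: d != "r3")
    print(done)
```

In this sketch a failure in one batch leaves all later batches untouched, so the number of affected devices is bounded by the batch size.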

Summary

In this article, we wanted to offer an understanding of the network automation journey, including the steps taken on the way and what tools and solutions we can use. We hope that this blog post has shed some light on why automation is needed, what it can help with, and why it is the way forward.
