
Navigating the Network Automation Journey: the power of scripting

In previous articles I discussed the Network Automation Journey and two aspects that come with it: Source of Truth / Source of Intent and a Digital Twin. Today I want to delve into a simple form of network automation (or infrastructure automation): script-based automation. It is necessary to mention that “simple” refers to neither the complexity of the scripts used for the automation nor the tasks that are possible through it. It refers to the environment used for hosting such a solution: although not very complex, it can provide the user with a vast number of interesting possibilities.

Now let's dig into the subject.

Why you need scripts for your growing network infrastructure

If we compare the networks (and IT infrastructures) of today with those of, let’s say, 10 years ago, we can see significant growth in their size and complexity. And this is not the only change: new, more powerful hardware is deployed, new services are introduced, and virtualization and containerization are used on a daily basis, together with public clouds. On top of that, users expect immediate changes (as with public clouds) and don’t want to wait for service reconfiguration. So, the dynamics of the network and infrastructure have also changed.

However, the traditional methods of network configuration (let’s focus on traditional networks like ISPs, data centers or enterprise networks) were not designed for such scale and dynamics. In traditional networks, where changes are introduced manually and are usually preceded by an approval process, the pace is slow and the approach does not scale. As a result, it cannot meet the requirements of modern networking.

Drawbacks of manual operations

Let’s stop for a moment and think about the unwanted aspects of manual network configuration:

  • Repetitiveness of the tasks that are performed by engineers. Operational tasks, especially day-to-day activities, are quite similar (with some variation in the values) and after some time they become cumbersome and simply boring. It requires a lot of stamina to keep attention on the task and not make errors. But errors happen anyway.
  • Human errors happen, because we are, well, only human. When engineers work long hours during a night maintenance window they get tired and make errors. Those can be mitigated by having a precise plan of the operation, including prepared and approved configuration, a written set of instructions to be followed and a set of tests to be executed. Nevertheless, sooner or later something will go wrong.
  • Serialization instead of parallelization when it comes to executing the tasks. Of course, with more than one engineer some tasks can be parallelized, but we’re speaking here about a factor of two, maybe three, and not 10, which is possible with automation (with a proper change management process in place). To make things more complicated, some tasks may depend on each other, so the order in which they are executed is critical, and even purely sequential manual execution can run into issues when a step is performed out of order. In a script, by contrast, maintaining the correct order of tasks is relatively easy.
  • Lack of configuration consistency, or in other words lack of configuration unification. Even when a template is used, the same service is not configured in exactly the same way on different devices or entities (for example, the way an interface description is configured may differ, but there can be more serious discrepancies). Why? Because quite often two different operators configure it and each does so in their own unique way. And it gets even worse without templates.
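The point about task ordering in the list above can be made concrete with a small sketch. It assumes a hypothetical set of maintenance tasks (the task names are made up for illustration) and uses Python's standard `graphlib` module (Python 3.9 or newer) to derive an order in which every task runs only after the tasks it depends on. Tasks that end up in the same wave do not depend on each other and could be run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical maintenance tasks, each mapped to the tasks it depends on.
TASKS = {
    "backup_config": set(),
    "upgrade_firmware": {"backup_config"},
    "apply_new_config": {"upgrade_firmware"},
    "run_post_checks": {"apply_new_config"},
    "update_inventory": {"backup_config"},
}


def execution_waves(dependencies):
    """Group tasks into waves; tasks in one wave do not depend on each other."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(execution_waves(TASKS), start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

Once the dependencies are written down like this, the order no longer relies on an engineer remembering it at 3 a.m. during a maintenance window.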

Does the above sound familiar? If so, let’s have a look at how we can improve the life of a network engineer.


Let’s script common operations: Day0, Day1, Day2

Before we discuss scripting Day0, Day1 and Day2 operations, let’s define what these terms mean, as they can have different meanings, especially when compared to the software development life cycle.

  • By Day0 in networking we mean the very first setup of a network device. This is not only physical installation and connectivity, but also initial configuration. Sometimes it is done manually, but it can also employ ZTP (to read more on ZTP refer to Zero-Touch Provisioning: ZTP guide and example usages). Day0 is sometimes referred to as greenfield deployment.
  • Day1, on the other hand, means that the device has production configuration applied and is starting its work as a production device. The production configuration is much more complex than the initial one and requires precise settings to be used. Making an error here can cause unexpected and quite often serious issues in production. Day1 is sometimes referred to as brownfield deployment.
  • Day2 refers to the day to day operation and maintenance of the network. Every network changes: new configuration needs to be added, new services run, patches or upgrades need to be applied. And as before, a mistake can have a lot of unwanted consequences.

So, what if we could script some Day0, Day1, and Day2 tasks in order to avoid human errors and introduce faster changes at the same time? Can we do it for both infrastructure and network? Let’s see how Infrastructure as Code (IaC) and Configuration as Code (CaC) can help us.

Infrastructure as Code and Configuration as Code - how they fit into the picture

In the automation world, two terms have become popular lately: Infrastructure as Code and Configuration as Code. They are quite often used interchangeably, but in fact they have distinct meanings. 

Infrastructure as Code refers to the provisioning and management of a network or infrastructure by automation (and the automation is done thanks to code).

Configuration as Code (also automated thanks to code) is used to configure already provisioned resources or devices. To read more about Configuration as Code, refer to our Configuration as Code — moving in the right direction blog post.

An example usage of IaC can be a VNF deployment in a public cloud. It is a Day0 task. One could do it manually, using an available GUI or CLI, but that would require a lot of clicking or command execution, especially when there is more than one resource to provision or when some additional resources are needed for the VNF to become fully operational. An automated way of doing that would be preparing (most probably) a Terraform configuration file that can do the job for us. And what is important is that the definition is reusable: with proper variable settings one can use it again for the same type of deployment, only with different data (like IP addresses).
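The reusability idea can be illustrated in a few lines of Python. This is only a sketch of the principle, not Terraform syntax and not a real cloud API: the `VnfDeployment` fields and the shape of the resulting definition are hypothetical. One definition is written once and then fed with different variables.

```python
import ipaddress
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class VnfDeployment:
    """Input variables for one deployment (all field names are hypothetical)."""
    name: str
    region: str
    mgmt_ip: str
    instance_size: str = "medium"


def render_definition(deployment):
    """Build a declarative description of the resources one VNF needs."""
    # Fail early on a malformed address instead of during provisioning.
    ipaddress.ip_address(deployment.mgmt_ip)
    return {
        "instance": {
            "name": deployment.name,
            "region": deployment.region,
            "size": deployment.instance_size,
        },
        "management_interface": {
            "name": f"{deployment.name}-mgmt",
            "ip": deployment.mgmt_ip,
        },
    }


# The same definition reused for two deployments that differ only in data.
# The addresses come from the documentation range 192.0.2.0/24.
for variables in (
    VnfDeployment("vfw-01", "eu-west", "192.0.2.10"),
    VnfDeployment("vfw-02", "us-east", "192.0.2.20", instance_size="large"),
):
    print(json.dumps(render_definition(variables), indent=2))
```

In a real setup the definition would live in the IaC tool's own format, with the variables kept in a separate file per deployment.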

Day1 tasks, on the other hand, are done with the help of CaC (to be fair, with cloud deployments, IaC can also add some Day1 configs), which configures a device (or a resource) with an initial setup. In the case of traditional networks and traditional network devices one can use, for example, Ansible to make changes to the configuration. This means that an Ansible playbook needs to be prepared beforehand (with all the required variables defined). One advantage of this is that the playbook can be applied in parallel to a set of devices (but only when appropriate tests have been performed and a change management process is in place!) and the time needed for config changes is significantly reduced.
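To show what applying the same change to a set of devices in parallel means in practice, here is a minimal sketch in plain Python. It does not reflect how Ansible works internally. The `apply_config` function is a placeholder that only reports what it would send (a real script would open a session to the device there), and the device names and config lines are made up.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_config(device, config_lines):
    """Placeholder for the real push; it only reports what would be sent."""
    return f"{device}: {len(config_lines)} lines applied"


def apply_to_all(devices, config_lines, max_workers=10):
    """Apply the same config to all devices, at most max_workers at a time."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            device: pool.submit(apply_config, device, config_lines)
            for device in devices
        }
        for device, future in futures.items():
            try:
                results[device] = future.result()
            except Exception as error:  # one failure must not hide the others
                results[device] = f"{device}: FAILED ({error})"
    return results


if __name__ == "__main__":
    devices = [f"access-sw-{n:02d}" for n in range(1, 6)]
    change = ["ntp server 192.0.2.123", "logging host 192.0.2.50"]
    for outcome in apply_to_all(devices, change).values():
        print(outcome)
```

The `max_workers` value is the knob that the change management process should control: it decides how many devices are touched at the same moment.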

The same approach is valid for Day2 operations. All additional changes are made through code (with Ansible playbooks, for example), but all operational tasks can also be performed with scripts: software or firmware upgrades, or the application of patches. And again, if done wisely, those can be parallelized, shortening the time for the whole operation (and the number of maintenance windows required).

We have mentioned Terraform and Ansible as means of network automation, but in fact one can use other tools or even programming languages. The most common programming language used within the network and infrastructure automation plane is Python. For the differences between the network automation tools mentioned above, see Network automation tools comparison in code examples: Terraform, Ansible, and Python SDK, where these are described in great detail.

Shared repository for code reuse

When we were discussing the drawbacks of manual operations, in addition to increased error rate and lack of scale, we mentioned uniqueness instead of unification when it comes to device configuration, and we haven't addressed that problem yet. Let’s do it now.

The majority of the discrepancies within network configuration come from the fact that the config is entered manually by individuals who do not share the same approach, or more precisely, each of them has their own unique style. This can be mitigated to some extent by the use of templates, but cannot be eliminated entirely. So, how can we make sure that network devices are always configured in the same manner? Let’s see how a shared repository can be of help here.
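As a small illustration of the kind of script such a repository might hold, the sketch below renders an interface configuration from a single template, using only the Python standard library. The config syntax is generic and vendor-neutral, and the description convention (`ROLE:peer`) is an assumption made up for this example. The point is that the script, not the operator, decides the style.

```python
from string import Template

# One shared template; the syntax is illustrative, not tied to any vendor.
INTERFACE_TEMPLATE = Template(
    "interface $name\n"
    " description $role:$peer\n"
    " ip address $address $mask\n"
    " no shutdown\n"
)


def render_interface(name, role, peer, address, mask):
    """Render an interface block, normalizing the parts operators type differently."""
    return INTERFACE_TEMPLATE.substitute(
        name=name,
        role=role.strip().upper(),   # always upper-case role
        peer=peer.strip().lower(),   # always lower-case peer hostname
        address=address,
        mask=mask,
    )


# Two operators typing the same data in their own style get identical output.
first = render_interface("eth0", "uplink", "CORE-1", "192.0.2.1", "255.255.255.0")
second = render_interface("eth0", "Uplink ", "core-1", "192.0.2.1", "255.255.255.0")
print(first)
print("identical:", first == second)
```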

First of all I must emphasize that having a shared repository is a must when thinking seriously about network automation. Why? Because the truth is that many if not all network engineers use some form of scripting already (and have been using it for some time now), but each script was unique to an engineer, was used only by them, stored somewhere locally on their laptop, and was not shared with others. The result: unique configurations, not to mention the amount of work needed for everyone to have their own script.

So what does having a shared repository change? 

First, all engineers use the same script base, so if one person prepares the script, all others can use it. If there are some changes needed and a new version is prepared, the history of the changes is saved (thanks to the version control feature) and the script (and config) evolution is visible. Also, to address the unification of configuration, with the scripting coming from a shared repository it is done the same way every single time (with engineers following the same discipline). Moreover a review process for the code can (and should!) be introduced, so each piece of code is cross-checked by someone other than the author to catch errors early, before the script goes to production. Apart from that, a shared repository simplifies the testing process by enabling the implementation of automatic, standardized, obligatory test cases for each modification to existing scripts or the addition of new ones. These tests can verify the outcomes of changes, check for common errors, etc.
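An automatic, obligatory check of this kind can be very small. The sketch below assumes a made-up convention that every interface description has the form `ROLE:peer-name`, and reports every description line in a generated config that breaks it. Both the convention and the function name are hypothetical. A repository could run such a check on every proposed change.

```python
import re

# Assumed convention: upper-case role, colon, lower-case peer name.
DESCRIPTION_PATTERN = re.compile(r"^[A-Z]+:[a-z0-9-]+$")


def find_description_violations(config_text):
    """Return (line_number, value) for each description breaking the convention."""
    violations = []
    for line_number, line in enumerate(config_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("description "):
            value = stripped[len("description "):]
            if not DESCRIPTION_PATTERN.match(value):
                violations.append((line_number, value))
    return violations


SAMPLE_CONFIG = """\
interface eth0
 description UPLINK:core-1
interface eth1
 description link to customer A
"""

for line_number, value in find_description_violations(SAMPLE_CONFIG):
    print(f"line {line_number}: non-standard description {value!r}")
```

If the check returns any violations, the change is rejected before it gets anywhere near a production device.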

Using a shared repository is a first step towards managing the network in a more software development-like manner - a step towards DevNetOps.

Which repository to use? Whichever fits you best; there are several to choose from.

Cons of script-based automation

We have described the good that comes from using scripts within the network and infrastructure realm. But what are the disadvantages? 

For one, the learning curve. Someone configuring devices mostly via CLI needs to learn some automation tools or programming language or languages. This is additional work which takes some time. 

Another important thing is letting go of direct, hands-on control, which is another human aspect. One must accept the fact that changes are done via script, and there is no longer constant interaction with the device.

Another concern might be that an automation script with a typo will affect several devices during its execution. While this is true, such errors are usually detected during reviews or while using a Digital Twin environment (commonly used as a test environment).

The above are the human aspects of automation, but in fact the impact of a wrongly applied script can be much more serious. Imagine that you have to upgrade 50 devices and decide to do some of them in parallel. But accidentally, you have chosen two that are redundant in relation to each other, and started the process on both at the same time. The result is quite clear: a part of the network gets cut off for some time.

That is why the change management process is crucial, and the code review process (possible with a shared repository) is so important: scripts can bring a lot of good, but used unwisely they can also do a lot of harm.
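The redundant-pair scenario described above can also be guarded against in the script itself. The following sketch assumes that the redundancy relationships are known and recorded somewhere (here, in a hypothetical dictionary mapping each device to its redundancy group). It splits the devices into upgrade batches so that two members of the same group never land in the same batch.

```python
def plan_batches(devices, redundancy_group, batch_size):
    """Split devices into batches with at most one device per redundancy group."""
    batches = []
    for device in devices:
        # A device without a recorded group is treated as its own group.
        group = redundancy_group.get(device, device)
        for batch in batches:
            groups_in_batch = {redundancy_group.get(d, d) for d in batch}
            if len(batch) < batch_size and group not in groups_in_batch:
                batch.append(device)
                break
        else:
            # No existing batch can safely take this device.
            batches.append([device])
    return batches


# Hypothetical inventory: two redundant pairs and one standalone device.
DEVICES = ["core-1", "core-2", "edge-1", "edge-2", "access-1"]
GROUPS = {
    "core-1": "core-pair",
    "core-2": "core-pair",
    "edge-1": "edge-pair",
    "edge-2": "edge-pair",
}

for number, batch in enumerate(plan_batches(DEVICES, GROUPS, batch_size=3), start=1):
    print(f"batch {number}: {', '.join(batch)}")
```

This does not replace change management or review. It only removes one class of mistake, and only as long as the redundancy data it relies on is correct.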

How to start?

When I think of starting with network automation (or with almost anything, to be honest), I think of starting small: choosing a piece of infrastructure I want to play with and a preferred scripting language. One can run a VM on a laptop, choose Bash and use it to configure some interface parameters. Or select Ansible and run it against some VM. Or choose a network device, physical or simulated, and work with it. Once some base scripting is working, I’d go with setting up a repository to familiarize myself with new ways of working with code. And once that is done, I’d play with scale with the help of a simulated environment. But this is just an example; everyone’s way is different and there is no single recipe for success. If you're looking for professional help with implementing network automation in your company, see how our experts can support you in this process.

Conclusion

This article has focused on a simple form of network automation: script-based automation. With the growing complexity of network infrastructures, manual provisioning and configuration are no longer an option, and script-based automation offers a means to manage that complexity. Script-based automation is a step (although not a mandatory one) on the network automation journey.


Monika Antoniak

Director of Engineering and Professional Services

Monika’s background is in networking. She spent over 15 years in telcos designing, prototyping, testing and implementing solutions. She has worked with different vendors and thus looked at different realizations and different points of view. To keep abreast of rapidly evolving technology, she has broadened...
