Automation and orchestration are two key pillars of modern network management that have become essential for handling today’s IT challenges. While the fundamental differences between automation and orchestration are crucial to understand, and well-covered in our previous article, Network Automation vs. Orchestration: Roles and Key Differences, this time we will take a deeper dive. We'll explore practical applications, the tools of the trade, and the real-world challenges of transforming network workflows.
Automation
The entire IT industry is moving towards automation, and computer networks are no exception. Organizations that fail to automate their network environments risk falling behind those investing in modern, scalable solutions. Well-designed automation allows for managing device configurations, monitoring health, and responding to events with greater efficiency and repeatability than traditional methods - translating directly to a more reliable, innovative, and ultimately more profitable infrastructure.
Network automation market
According to Precedence Research (Network Automation Market Size to Reach $43.40 Billion by 2034), the global network automation market will reach a value of approximately $43.40 billion by 2034, growing at a compound annual growth rate (CAGR) of 23.15% over the forecast period from 2024 to 2034. There are many similar forecasts, and even if some of them are overestimated, the trend is clear: we can expect substantial growth in this market sector. For companies, this should be direct evidence that automation is an opportunity to grow and differentiate, not a burden or a constraint.
![Network Automation Market Size Growth 2024 to 2034 [USD Billion]](/img/network-automation-market-size-growth-2024-to-2034-usd-billion-.png)
Types of automation
- CLI-based (The Traditional Approach) - This long-standing method involves connecting to a device's Command-Line Interface (CLI) via SSH. The automation executes commands and then "scrapes" the human-readable text output to gather information. While universally applicable, this CLI-based approach is inherently fragile, as it can break with minor changes in the text output.
- API-based (The Modern Approach) - A more robust and reliable method is to interact with a device's Application Programming Interface (API) using protocols such as NETCONF or RESTCONF. Instead of parsing text, the automation exchanges structured data (typically XML or JSON). This approach is far less prone to errors caused by formatting changes and provides a more programmatic way to manage device state, though it does require that the network hardware supports these modern interfaces.
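To make the fragility argument concrete, here is a minimal, self-contained Python sketch contrasting the two approaches. The CLI text, the JSON reply, and the field names are all invented for illustration - no real device or API is involved:

```python
import json
import re

# Hypothetical CLI output from a "show ip interface brief"-style command.
cli_output = """\
Interface              IP-Address      OK? Method Status                Protocol
GigabitEthernet0/0     10.0.0.1        YES manual up                    up
GigabitEthernet0/1     unassigned      YES unset  administratively down down
"""

def scrape_interfaces(text):
    """Fragile approach: parse human-readable text with a regex.

    A firmware update that renames a column or changes the spacing
    silently breaks this pattern.
    """
    pattern = re.compile(
        r"^(\S+)\s+(\S+)\s+YES\s+\S+\s+(up|down|administratively down)", re.M)
    return [(m.group(1), m.group(2)) for m in pattern.finditer(text)]

# Hypothetical structured reply from a RESTCONF-style API (JSON).
api_reply = json.dumps({
    "interfaces": [
        {"name": "GigabitEthernet0/0", "ip": "10.0.0.1", "enabled": True},
        {"name": "GigabitEthernet0/1", "ip": None, "enabled": False},
    ]
})

def api_interfaces(reply):
    """Robust approach: keys belong to the data model, not the screen layout."""
    return [(i["name"], i["ip"]) for i in json.loads(reply)["interfaces"]]

print(scrape_interfaces(cli_output))
print(api_interfaces(api_reply))
```

Both functions return the same kind of data, but only the second survives a cosmetic change in the device's output formatting.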
Benefits of automation
Beyond the obvious benefit of time savings, automation’s key advantage is the elimination of human error through perfect repeatability. Scripts execute tasks identically every time, free from distraction or fatigue. In addition, automation significantly enhances network monitoring and diagnostics, with software often detecting and reporting issues before they escalate. As a result of these combined improvements, administrators are freed from repetitive operations to focus on more strategic tasks, moving from a reactive to a proactive approach in managing the infrastructure.
It is important to remember, however, that automation in computer networks does not mean eliminating human involvement altogether, but rather creating an environment in which human intervention is reduced and, ideally, required only for more complex tasks or workflows. Well-designed automation is transparent, easy to monitor, and, most importantly, predictable in its results. For more information on the benefits of automation, check out our other article Network Automation — Why Does it Matter?.
Examples of using automation in everyday network tasks
- Device provisioning - focuses on providing an initial configuration to a device. We can set up a device with the simplest parameters to allow remote control, but also with more advanced parameters that affect how it fits into our network and security. You can read more about this process in our dedicated article The Power of Automated Network Provisioning.
- Telemetry - Gathering information about the devices and their connections that make up a computer network. This can be used for diagnostics and monitoring, as well as network architecture planning.
- Migration - refers to changes made to the network, such as replacing equipment from one vendor with another, or changing the network architecture in general. This covers both converting device configurations and automatically pushing them back to the new devices.
- Configuration Management - standard changes to the network configuration. For example, a simple configuration of ports according to user needs.
- Software Lifecycle Management - Focuses on the ongoing, operational maintenance of operating systems across a fleet of devices. This automates scheduled tasks like deploying software patches or minor version updates to ensure software consistency and supportability throughout the network.
- Reporting - generate reports on the status and changes in the network.
- Troubleshooting - for example, a one-click check that all required parameters match on both devices of an OSPF adjacency.
- Configuration Backup and Restore - A fundamental but critical task focused on creating a safety net. This involves automatically backing up device configurations on a regular schedule.
- Compliance - This involves more than just static checks. It starts with verifying that device configurations meet security policies and a defined baseline. This is then expanded to orchestration with proactive test automation to validate changes before they are deployed, and it can even include dynamic responses to threats, such as automatically updating an ACL based on a security alert.
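The compliance item above starts with static checks, which are easy to sketch. The following Python fragment verifies a device configuration against a baseline of required and forbidden lines; the baseline contents and the sample config are invented for illustration:

```python
def check_compliance(config_lines, required, forbidden):
    """Return a list of human-readable violations for one device config.

    `required` and `forbidden` are sets of exact configuration lines that a
    hypothetical security baseline mandates or bans.
    """
    present = set(config_lines)
    violations = []
    for line in sorted(required - present):
        violations.append(f"missing required line: {line}")
    for line in sorted(present & forbidden):
        violations.append(f"forbidden line present: {line}")
    return violations

baseline_required = {"service password-encryption", "no ip http server"}
baseline_forbidden = {"ip http server"}

device_config = ["hostname edge-01", "ip http server", "no service pad"]
print(check_compliance(device_config, baseline_required, baseline_forbidden))
```

Run across a fleet, a check like this turns an audit from days of manual review into a report generated in seconds.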
Orchestration
Focusing on individual devices is no longer sustainable when the goal is to deliver complex, end-to-end services quickly and reliably. This is where the shift in perspective offered by orchestration becomes critical. Instead of managing boxes or tasks, you begin to manage processes and workflows. Orchestration provides that strategic layer to control processes spanning multiple systems - from firewalls and routers to virtual networks - to deliver a complete function. This holistic approach makes deploying a new service, updating a configuration, or responding to a failure more predictable and far less error-prone.
Types of orchestration
Policy-based Automation (PBA) - is an approach that manages and automates network processes based on pre-defined policies. In this model, decisions about managing resources, security, access, and other aspects of the network are made based on rules and policies that can be easily configured and customized. PBA enables the automation of network management, ensuring that resources are used in a manner consistent with corporate policies and application requirements.
Software-Defined Networking (SDN) - is a network architecture that separates the control plane from the data plane, allowing the network to be centrally managed by software. In traditional networks, network devices such as switches and routers have both control and data plane functions. In SDN, the control function is performed by a central controller that manages network traffic. Since SDN is a widely used architecture, the topic can be explored in more depth in other articles on our blog.
Intent-based Networking Systems (IBNS) - is an approach where network administrators define a desired high-level outcome (the "intent"), and the orchestration platform automatically translates it into the necessary workflows, network configuration and policies to achieve that state.
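The IBNS idea - declaring an outcome and letting the platform derive the configuration - can be sketched in a few lines. The intent shape, the inventory model, and the generated CLI snippets below are all illustrative inventions; a real intent-based platform works from a full network model:

```python
def compile_intent(intent, inventory):
    """Translate a high-level intent into per-device CLI snippets.

    The caller states *what* they want (a segment on a VLAN); this function
    derives *how* to configure each device.
    """
    commands = {}
    vlan = intent["vlan"]
    for device, ports in inventory.items():
        lines = [f"vlan {vlan}", f" name {intent['name']}"]
        for port in ports:
            # Only ports assigned to the requested segment get access config.
            if intent["segment"] in port["segments"]:
                lines += [f"interface {port['port']}",
                          f" switchport access vlan {vlan}"]
        commands[device] = lines
    return commands

intent = {"segment": "guest-wifi", "vlan": 210, "name": "GUEST"}
inventory = {
    "sw-floor1": [{"port": "Gi1/0/10", "segments": ["guest-wifi"]}],
    "sw-floor2": [{"port": "Gi1/0/5", "segments": ["corp"]}],
}
print(compile_intent(intent, inventory))
```

Note that the operator never mentions a port or a command - the translation layer owns those details.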
Benefits of orchestration
Orchestration makes the network more agile and ready to respond quickly to change - whether it's a planned expansion or a sudden need to adapt to new conditions. One of the first benefits comes from managing available resources to get the most out of them while reducing waste - for example, in the context of IP address management.
Another is scalability, the ability to adapt the network infrastructure to the needs of the business. Orchestration also delivers key benefits like real-time coordination, which handles the complex timing and sequencing of tasks across different systems. This is made possible by interoperability, as orchestration integrates diverse tools and platforms into a single, cohesive workflow.
And let's not forget the fundamental benefits of automation itself, such as increased efficiency by eliminating repetitive tasks and minimizing the risk of manual intervention, which is subject to human error.
Another benefit is improved security and visibility. We explore these and other benefits in more detail in our article Why Orchestration Matters.
Perhaps most powerfully, orchestration enables a capability known as "closed-loop automation". In this model, the system can automatically detect an event - such as a security threat or performance degradation - respond by correcting the configuration, and then report the action, all without immediate human intervention. In essence, this approach builds upon the benefits of individual automations, amplifying their collective impact while minimizing the need for manual intervention between tasks.
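The detect-remediate-report loop can be reduced to a small sketch. The event shape, the ACL model, and the audit format below are assumptions made purely for illustration:

```python
def closed_loop(event, acl, audit_log):
    """One pass of closed-loop automation: detect, remediate, report.

    No human sits between the steps; the audit log is how the action
    is reported back afterwards.
    """
    if event["type"] == "port-scan" and event["source"] not in acl:
        acl.append(event["source"])  # remediate: block the offending source
        audit_log.append(f"blocked {event['source']} after {event['type']}")
        return "remediated"
    audit_log.append(f"no action for event from {event['source']}")
    return "no-op"

acl, log = [], []
print(closed_loop({"type": "port-scan", "source": "203.0.113.7"}, acl, log))
print(acl, log)
```

In production, the "detect" input would come from a monitoring system and the "remediate" step would push a real ACL change - but the loop's shape is the same.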
The Value of Orchestration: A Real-World Scenario
In a large organization with an extensive network infrastructure, there are dozens of requests every day to configure access to resources. Imagine a situation where an employee in one location needs to connect their device to a server in another location. This process requires a number of steps - checking permissions, assigning an IP address, configuring network traffic rules, and firewall settings at both locations. Even with partial automation, performing each of these tasks separately consumes valuable IT team time.
Let's look at it another way. A modern orchestration solution simplifies the entire process to a minimum; the technician connecting the devices enters only two pieces of information: the port location and the IP address of the target resource. The system takes full control of the process - checking permissions, configuring all the necessary infrastructure components, and verifying that the connection is correct. What's more, it documents every change, giving administrators complete visibility into network activity. This illustrates a core principle: the more complex the process, the greater the value delivered by its orchestration. While the system efficiently performs monotonous background tasks, the network administrator can finally enjoy a freshly brewed cup of coffee.
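The scenario above boils down to running an ordered sequence of steps, stopping and reporting on the first failure. Here is a minimal Python sketch of that workflow shape; the step names and stub implementations are invented for illustration:

```python
def orchestrate_access(port_location, target_ip, steps, audit):
    """Run ordered workflow steps; stop and record the first failure.

    Each step is a (name, callable) pair returning True on success.
    The audit trail gives administrators the visibility described above.
    """
    for name, step in steps:
        ok = step(port_location, target_ip)
        audit.append((name, "ok" if ok else "failed"))
        if not ok:
            return False
    return True

# Stand-ins for real integrations (IPAM, firewall APIs, monitoring, ...).
steps = [
    ("check-permissions",   lambda port, ip: True),
    ("assign-ip",           lambda port, ip: True),
    ("configure-firewalls", lambda port, ip: True),
    ("verify-connectivity", lambda port, ip: True),
]
audit = []
print(orchestrate_access("bldg2/rack4/port12", "10.20.30.40", audit=audit, steps=steps))

# A failing step halts the workflow and leaves an explicit trace.
failing = steps[:2] + [("configure-firewalls", lambda p, i: False)] + steps[3:]
audit2 = []
print(orchestrate_access("bldg2/rack4/port12", "10.20.30.40", audit=audit2, steps=failing))
print(audit2)
```

The two inputs from the scenario (port location and target IP) are all the operator provides; everything else is sequencing and bookkeeping.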
Key Distinctions: A Summary
As pointed out in the introduction, a dedicated article on our blog already explores the nuances between automation and orchestration in depth: Network Automation vs. Orchestration: Roles and Key Differences. The table below, however, serves as a quick reference to these key distinctions before we move on to the practical aspects.
| | Automation | Orchestration |
|---|---|---|
| Definition | Performing individual tasks without human intervention. | Managing and coordinating multiple automated tasks as a unified process. |
| Scope | Focuses on a single task or workflow. | Manages multiple tasks and their interdependencies. |
| Human Intervention | Requires manual steps between tasks. | Minimal human intervention; mostly self-managing. |
| Scalability | Limited to individual tasks. | Highly scalable across systems and processes. |
| Complexity Level | Low to medium complexity. | High complexity with multiple dependencies. |
| Error Handling | Basic error reporting; manual intervention often required. | Advanced error handling with automated recovery. |
| Implementation Time | Shorter implementation cycles. | Longer implementation, but significant long-term benefits. |
| System Integration | Operates within a single system or limited scope. | Integrates multiple systems, tools, and platforms. |
| Monitoring Capabilities | Basic task-specific monitoring. | Comprehensive monitoring across entire workflows. |
| Business Impact | Localized efficiency improvements. | Organization-wide process optimization. |
Automation and orchestration tools
The market for network automation and orchestration tools is complex, with many solutions capable of wearing multiple hats. However, to better understand their primary roles, we can group them based on whether they focus on executing specific tasks (Automation) or coordinating complex processes (Orchestration).
Task Automation
Libraries for communicating with network devices
One of the most popular tools in this category is Netmiko, a Python library built on Paramiko. It allows you to establish SSH connections to network devices and execute commands in a manner similar to manual configuration. Netmiko supports a wide variety of network platforms, making it ideal for heterogeneous environments, and its extensive multivendor support is a major advantage. Keep in mind, however, that it requires programming knowledge - but in return it offers almost unlimited automation possibilities.

Another Python-based tool is Nornir. It focuses on performance and concurrent execution, relieving the programmer of managing multithreading. It provides an abstraction layer over various inventory and task-execution tools, including popular choices like Ansible and Netmiko, as well as Napalm.

Napalm (Network Automation and Programmability Abstraction Layer with Multivendor support) is another library that simplifies communication with network devices. It supports the NETCONF, RESTCONF, SSH, and gRPC protocols, making it easier to retrieve data from devices and standardizing the interface across vendors. The choice of tool ultimately depends on the specifics of the network environment and the constraints of its architecture.
Infrastructure as Code (IaC) Tools
One of the most popular tools is Ansible. With it, we create playbooks in the form of YAML files that describe tasks to be performed on a remote host, such as a network switch. Ansible is relatively easy to use and doesn't require much prior knowledge to get started.

An alternative is SaltStack, where we likewise specify the target state of the system in YAML files. Other alternatives like Chef and Puppet are also powerful, but differ architecturally: unlike the agentless Ansible, they require a dedicated agent to be installed on each managed device. Automation frameworks like Ansible often integrate with a Source of Truth (SoT) such as NetBox to ensure all actions are based on reliable network data.
Process Orchestration
While the core Ansible engine executes individual playbooks, tools like Ansible Tower, AWX, and others elevate this capability. They allow you to combine these playbooks into more advanced, multi-step workflows, which is a clear step towards process orchestration.
Beyond the Ansible ecosystem, there are also general-purpose workflow engines designed to orchestrate complex processes. Temporal is one such engine, offering durable, stateful workflow execution where retries, timeouts, and recovery from failures are handled automatically. Prefect, on the other hand, provides a Python-based platform for building and observing flows of tasks, with features like dynamic scheduling, error handling, and event-driven triggers. Both tools are increasingly being explored for orchestrating distributed workflows, including those in network and infrastructure orchestration.
Infrastructure as Code (IaC) Tools
There is also Terraform, which is based on a different concept. In Terraform, we define configurations in HCL (HashiCorp Configuration Language) that describe the desired state of the infrastructure. The tool automatically determines the steps necessary to achieve that state. While each of these steps is an automation task on its own, it is the coordination of all of them to reach the defined state that we refer to as infrastructure orchestration. To learn more about the differences between the two tools, I encourage you to read our article Ansible vs. Terraform in Networks: Similarities and Differences.
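The declarative model - state the end state, let the tool derive the steps - can be illustrated with a small reconciliation sketch in Python. The resource dicts and action tuples are invented for illustration and only mimic the idea behind a plan step, not any real Terraform internals:

```python
def plan(desired, current):
    """Compute the actions needed to move `current` to `desired`.

    The caller never lists steps; they fall out of comparing the two states.
    """
    actions = []
    for name, spec in desired.items():
        if name not in current:
            actions.append(("create", name, spec))       # missing entirely
        elif current[name] != spec:
            actions.append(("update", name, spec))       # exists but differs
    for name in current:
        if name not in desired:
            actions.append(("delete", name, None))       # no longer wanted
    return actions

desired = {"vlan210": {"name": "GUEST"}, "vlan300": {"name": "IOT"}}
current = {"vlan210": {"name": "VISITORS"}, "vlan999": {"name": "OLD"}}
print(plan(desired, current))
```

Applying the resulting action list - in the right order, with error handling - is exactly the coordination work the paragraph above calls infrastructure orchestration.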
SDN: An Architecture for Orchestration
Software-Defined Networking (SDN) represents a paradigm shift towards centralized network control, making it a powerful architecture for orchestration. One of the most popular protocols for controller-to-device communication in such systems is OpenFlow. It is used by many SDN controllers, including OpenDaylight and ONOS. These tools are critical to building modern, flexible networks. However, their implementation often requires the replacement of devices that do not support the separation of the control plane from the data plane.
Tools to support testing
For years, many network administrators have used GNS3, a network device emulation tool. It allows you to create virtual test environments that can be connected to the real infrastructure, so you can experiment in an isolated sandbox. Note, however, that operating system images of network devices must be obtained directly from the vendor.
In recent years, modern alternatives like Containerlab have also become popular. By using lightweight containers instead of full virtual machines, Containerlab lets you build complex lab topologies much faster and with lower resource consumption, making it ideal for CI/CD environments.
Choosing the right virtual lab tool depends on the specific use case. For a detailed comparison of GNS3, Containerlab, and other popular alternatives, see our in-depth article Virtual Labs: Running Network Topology on a Plain Laptop.
Another useful tool is pyATS, a Python library developed by Cisco. It allows you to automatically test device configurations, monitor performance, and perform complex configuration changes. pyATS is specifically designed for network testing and allows for a high degree of automation in the process.
Challenges
Integrating with Legacy Hardware
A primary technical hurdle in heterogeneous networks is the coexistence of modern devices with rich APIs (like NETCONF/RESTCONF/gNMI) and legacy hardware offering nothing more than a CLI over SSH. To illustrate the scale of this challenge, the table below shows a real-world scenario from a production network, highlighting the universal availability of Netmiko (via SSH) compared with the fragmented support for modern protocols across device platforms.
| | NETMIKO | NETCONF | RESTCONF | gNMI |
|---|---|---|---|---|
| Cisco IOS | ✅ | ✅ | ✅ | ❌ |
| Cisco IOS XE | ✅ | ✅ | ✅ | ✅ |
| Cisco NXOS 5k Series | ✅ | ❌ | ❌ | ❌ |
| Nokia 7210 SROS | ✅ | ❌ | ❌ | ❌ |
| Nokia 7250 SROS | ✅ | ✅ | ❌ | ✅ |
| Nokia 7220 SRLinux | ✅ | ❌ | ❌ | ✅ |
| Dell OS9 | ✅ | ❌ | ❌ | ❌ |
| Dell OS10 | ✅ | ✅ | ✅ | ✅ |
| Quanta | ✅ | ❌ | ❌ | ❌ |
| Nokia Nuage WBX | ✅ | ❌ | ❌ | ❌ |
| Junos OS | ✅ | ✅ | ❌ | ✅ |
This reality makes the CLI the only common denominator for automation across the entire environment. Tools like Netmiko excel at this, providing a universal way to interact with any device. However, this universality comes at a cost: fragility. Relying on "screen scraping" - parsing the text output of show commands - means that a minor firmware update that changes the output formatting can instantly break the automation.
This presents a strategic dilemma. The choice is often between building all automation around the less reliable CLI-based approach, excluding legacy hardware entirely, or adopting a complex hybrid model. This third option requires maintaining two separate automation codebases - one for CLI interactions and another for modern APIs - significantly increasing complexity.
A mature architectural solution is to build an internal abstraction layer using the 'Adapter' design pattern, which provides a consistent interface for all automation, hiding the underlying communication method. The decision hinges on a pragmatic assessment of the project's goals, budget, timeline, and the team's skills.
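A minimal Python sketch of this Adapter approach is shown below. The class names, the stubbed transports, and the data shapes are invented for illustration; real adapters would wrap an SSH session and a NETCONF/RESTCONF client respectively:

```python
class DeviceAdapter:
    """Common interface every adapter implements - callers depend only on this."""
    def get_hostname(self):
        raise NotImplementedError

class CliAdapter(DeviceAdapter):
    """Wraps CLI scraping for legacy devices (the SSH transport is stubbed)."""
    def __init__(self, show_output):
        self.show_output = show_output  # stand-in for a live session's output
    def get_hostname(self):
        for line in self.show_output.splitlines():
            if line.startswith("hostname "):
                return line.split(" ", 1)[1]
        return None

class ApiAdapter(DeviceAdapter):
    """Wraps a structured API reply for modern devices."""
    def __init__(self, api_data):
        self.api_data = api_data  # stand-in for a NETCONF/RESTCONF client
    def get_hostname(self):
        return self.api_data["system"]["hostname"]

def adapter_for(device):
    """Factory: choose the transport per device, hide the choice from callers."""
    if device.get("api_data") is not None:
        return ApiAdapter(device["api_data"])
    return CliAdapter(device["cli_output"])

legacy = {"cli_output": "hostname edge-legacy\nno ip http server"}
modern = {"api_data": {"system": {"hostname": "edge-modern"}}}
print([adapter_for(d).get_hostname() for d in (legacy, modern)])
```

Higher-level automation calls `get_hostname()` (or any other method on the shared interface) without knowing or caring which transport sits underneath - which is precisely what keeps one codebase viable across both device generations.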
Complexity and Tool Sprawl
As automation efforts grow organically within an organization, they often fall into the "tool sprawl" trap. Different teams, addressing similar problems, adopt different solutions: one might use Ansible, another may prefer custom Python scripts, while a third might be tempted to build a new tool from scratch in Go.
This creates isolated "islands of automation" where functionality is duplicated, increasing both costs and the required skillset of the team. Integrating these disparate systems into a single, cohesive workflow becomes a significant engineering challenge. Furthermore, this approach prevents any single platform from reaching its full potential, as resources are spread thin maintaining multiple solutions instead of perfecting one.
Lack of Data Standardization
Automation projects often uncover a foundational challenge: critical data that exists not in a database but as 'tribal knowledge' - unwritten conventions stored only in the minds of engineers. While a human can intuitively work around a minor inconsistency, an automation script requires predictable, standardized data. A perfect example is the port description field used to reserve an uplink: a human can infer intent from inconsistent text, while a script sees only a data mismatch. This forces a manual, network-wide standardization effort before automation can proceed.
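The port-description example can be made concrete with a few lines of Python. The naming convention, the regex, and the sample descriptions are all hypothetical - the point is only that a script must be told the convention exactly, while a human can improvise:

```python
import re

# Hypothetical convention: uplinks are reserved as "UPLINK:<peer>:<circuit-id>".
DESCRIPTION_RE = re.compile(r"^UPLINK:(?P<peer>[\w-]+):(?P<circuit>\d+)$")

def parse_reservation(description):
    """Return (peer, circuit) if the description follows the convention,
    else None - the script cannot infer intent the way a human can."""
    m = DESCRIPTION_RE.match(description)
    if not m:
        return None
    return m.group("peer"), int(m.group("circuit"))

descriptions = [
    "UPLINK:core-sw1:42",       # follows the convention: parsed cleanly
    "uplink to core sw1 (42)",  # a human reads this fine; the script cannot
]
print([parse_reservation(d) for d in descriptions])
```

Every `None` in that output is a port an engineer understands but the automation does not - which is exactly why the standardization effort has to come first.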
A separate, though related, challenge is the lack of a centralized Source of Truth (SoT) for foundational data like IP addresses, VLANs, and device roles. Implementing a dedicated SoT platform is the logical solution. However, its success hinges on organizational discipline. An SoT provides maximum value when it is rigorously maintained and integrated into automated workflows, ensuring it reflects the intended state of the network, rather than becoming just another data silo.
The Skills Gap and Inter-Team Collaboration
Automation initiatives often expose a fundamental human challenge: the skills gap. Many talented network engineers chose their specialization precisely because they were more interested in routing protocols than programming. On the other side of the aisle, skilled software developers often lack the deep, domain-specific knowledge of network operations required to build truly effective tools.
This creates a high demand for rare, hybrid "DevNetOps" talent. More commonly, however, it forces two separate teams - NetOps and a development team - to collaborate, which introduces a new set of challenges rooted in conflicting perspectives.
The network team, as the end-user, requires a practical tool that solves their real-world problems. They are often wary of development cycles focused on milestones that don't deliver immediate, tangible value. Conversely, the development team, typically under time and resource pressure, may not fully grasp the operational nuances that make a tool genuinely useful. This can lead to building a minimum viable product that, while technically functional, fails to meet the network team's day-to-day needs. Bridging this divide in communication, priorities, and understanding is frequently a greater challenge than solving the technical problems themselves.
Cultural Resistance
Perhaps the most significant hurdle is overcoming human nature itself. Resistance is often rooted in two things: cognitive habit and a rational distrust of abstraction. An engineer who instinctively opens a terminal to check a port status has to consciously break years of muscle memory. This is compounded by the belief that the device's CLI is the ultimate source of truth. Querying the device directly is seen as interacting with the primary data source, while the automation tool is a secondary layer that could present stale or misinterpreted information. The tool, therefore, must earn its credibility.
Overcoming this is a slow process of building trust through repeated, positive experiences. New habits form only when a tool proves itself to be consistently reliable. This is why introducing buggy or unpolished tools can be so counterproductive, as a bad first impression can permanently sour a team on the new approach. Ultimately, most engineers are open to innovation, but the onus is on the automation solution to be trustworthy enough to replace their deeply ingrained, battle-tested methods.
Summary
Task automation is the foundational layer, delivering immediate value by perfecting individual actions and ensuring repeatability. Process orchestration builds upon this foundation to create true strategic capability by coordinating multiple automated tasks into complete, business-driven services. This requires a change in perspective - from a device-centric to a service-centric view of the network.
This journey is navigated with a diverse set of tools, from Python libraries like Netmiko for direct device interaction to declarative platforms like Terraform that orchestrate complex state. However, as we've detailed, the real complexity lies not in the tools themselves, but in the environment they operate in. Navigating the challenges of legacy hardware without modern APIs, standardizing data locked away as "tribal knowledge," and overcoming internal tool sprawl are the true tests of an automation strategy.
Ultimately, the most critical factors are twofold. Conquering the foundational human challenges - bridging the skills gap and overcoming cultural resistance by building trust - is essential. Equally important is tackling the unique technical complexities that define advanced orchestration, such as managing task dependencies, planning automated remediation, and performing state analysis in heterogeneous networks.
By focusing on both these human and high-level technical challenges, organizations can build a resilient, automated network ready to leverage future innovations, such as the powerful insights offered by AI analytics.