Why the jump from TM Forum Level 2 to Level 3 is bigger than the framework makes it look

TM Forum's Autonomous Network Levels are now the closest thing the industry has to a shared measurement for network automation maturity. Most operators sit somewhere between Level 2 and Level 3. The latest TM Forum benchmarks put roughly 40% at L2 and 17% at L3, with a handful above and the rest below. The published definitions for those two levels look almost interchangeable on the surface:

L2 is "automation driven by statically configured rules".
L3 is "automation driven by dynamically programmable policies".

One sentence apart in the standards documents.

That sentence hides an enormous amount of engineering.

Most teams we work with discover this the hard way. They have Ansible playbooks, a CI pipeline that pushes config to devices, structured templates, a working source of truth. By any reasonable measure, they are automated. The L2 to L3 step seems like it should be a small one. Then someone tries to build the first real closed loop, and a list of architectural problems opens up that the existing tooling was never designed to solve.

This article is about what's actually in that gap. Not the strategy view, not the framework view, but the engineering work that has to happen between "we run scripts" and "the system observes its own state and corrects deviations against a policy."

What L2 actually means in practice

A Level 2 automation estate, in the TM Forum sense, is mostly imperative. Engineers write workflows that say do this, then this, then this: provision a service, push a template, reconcile a config drift, run a backup. The rules are static: they're encoded in playbook logic, in Jinja templates, in a Python module's branching, in scheduled jobs. Humans decide when to run them, and humans pick up the alert if something breaks.

This is the world most CSPs have spent the last decade building. It works. It scales reasonably well. It produces auditable change records. The catch is that all the intelligence lives in the engineer's head and the script's branching, and any change to policy means rewriting and re-testing scripts.

In TM Forum's IG1230 architecture guidance, the L2 stack typically maps to an orchestrator (or several) plus a fairly thin assurance layer. There is monitoring, but the monitoring rarely talks back to the orchestrator in any structured way. When something deviates, a human gets pinged.

What L3 actually means in practice

L3 introduces three architectural elements that aren't optional:

A policy layer that's separate from the workflows that enforce it. Intents describe what should be true about the network: service-level objectives, configuration invariants, capacity thresholds, security posture. They are not playbooks. They are statements about state.
An assurance layer that continuously evaluates actual state against intended state. Telemetry comes in, gets normalized, gets compared against the policy layer's expectations. Drift is detected programmatically, not visually.
A closed loop that connects the two. When assurance detects drift, the system generates a corrective action, validates it against the policy and against the rest of the network state, and applies it through the orchestrator with rollback safety.

This is what TM Forum's IG1230 calls out as the L3 baseline: orchestrator, assurance system with network inventory, policy manager. Three things, and the architecture has to make them talk to each other reliably.
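To make the shape of that loop concrete, here is a minimal sketch of one pass: detect drift against a policy, propose a correction, validate it on a copy of the state, apply it, and roll back if the result still violates the policy. Everything in it (the Policy and Action classes, the link_utilisation metric, the shift_traffic helper) is hypothetical and invented for illustration. It is not taken from IG1230 or from any product.

```python
from dataclasses import dataclass
from typing import Callable, Dict

State = Dict[str, float]


@dataclass(frozen=True)
class Policy:
    """A statement about state: `metric` must stay at or below `max_value`."""
    metric: str
    max_value: float

    def violated_by(self, state: State) -> bool:
        return state[self.metric] > self.max_value


@dataclass(frozen=True)
class Action:
    """A corrective change plus the step that undoes it."""
    description: str
    apply: Callable[[State], None]
    rollback: Callable[[State], None]


def run_loop_once(policy: Policy, state: State,
                  propose: Callable[[State], Action]) -> str:
    # Assurance: compare observed state with the policy.
    if not policy.violated_by(state):
        return "compliant"
    action = propose(state)
    # Validation: try the action on a copy before touching the real state.
    trial = dict(state)
    action.apply(trial)
    if policy.violated_by(trial):
        return "rejected"
    # Execution: apply, re-check, and undo the change if it did not help.
    action.apply(state)
    if policy.violated_by(state):
        action.rollback(state)
        return "rolled_back"
    return "corrected"


def shift_traffic(state: State) -> Action:
    # Hypothetical corrective action: move 30 points of load off the link.
    return Action(
        description="move traffic to the backup link",
        apply=lambda s: s.update(link_utilisation=s["link_utilisation"] - 0.30),
        rollback=lambda s: s.update(link_utilisation=s["link_utilisation"] + 0.30),
    )


policy = Policy(metric="link_utilisation", max_value=0.80)
state = {"link_utilisation": 0.93}
print(run_loop_once(policy, state, shift_traffic))  # corrected
print(run_loop_once(policy, state, shift_traffic))  # compliant
```

In a real system the state would come from telemetry and the action would go through the orchestrator, but the order of the steps is the point: nothing is applied that has not first been checked against the policy.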

The shift is from imperative ("here is the procedure") to declarative ("here is the desired state, and here are the policies that constrain it"). The same shift Kubernetes brought to compute, except the underlying domain is multi-vendor, multi-protocol, partly physical, often decades old, and far less forgiving when something goes wrong.

The five things that actually have to be built

Once a team commits to jumping from L2 to L3, the work concentrates in five areas. Each one is non-trivial on its own, and they have to be built in roughly this order because each depends on the previous ones.

1. A real source of truth, not just a configuration management database (CMDB)

Every team we've worked with has a source of truth (SoT), but not all sources of truth are created equal. A common failure mode is that the SoT records what the network should contain (devices, IPs, VLANs, service definitions) but doesn't carry the policy expectations that any closed loop has to evaluate against.

A genuine source of truth for L3 has to hold both. It needs to know that a customer service exists, and it needs to know what good behaviour looks like for that service: latency bounds, redundancy posture, security policy. Without that, "drift" can't be defined, and without a definition of drift, there's nothing for an assurance loop to detect.
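As a rough illustration of what "holding both" can look like, the sketch below keeps inventory facts and behavioural expectations on the same record, which is what makes drift computable. The field names, thresholds, and service name are assumptions made up for the example, not a schema from any particular SoT product.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Expectations:
    """What good behaviour looks like for a service (hypothetical fields)."""
    max_latency_ms: float
    min_redundant_paths: int


@dataclass(frozen=True)
class ServiceRecord:
    """Inventory facts and policy expectations stored side by side."""
    name: str
    endpoints: Tuple[str, str]
    vlan: int
    expectations: Expectations


def drift(record: ServiceRecord, observed: Dict[str, float]) -> List[str]:
    """Return a finding for each expectation the observed state breaks."""
    findings = []
    exp = record.expectations
    if observed["latency_ms"] > exp.max_latency_ms:
        findings.append(
            f"{record.name}: latency {observed['latency_ms']} ms "
            f"exceeds {exp.max_latency_ms} ms"
        )
    if observed["redundant_paths"] < exp.min_redundant_paths:
        findings.append(
            f"{record.name}: {observed['redundant_paths']} redundant paths, "
            f"expected at least {exp.min_redundant_paths}"
        )
    return findings


svc = ServiceRecord(
    name="cust-42-l3vpn",
    endpoints=("pe-1", "pe-7"),
    vlan=310,
    expectations=Expectations(max_latency_ms=20.0, min_redundant_paths=2),
)
for finding in drift(svc, {"latency_ms": 35.0, "redundant_paths": 1}):
    print(finding)
```

A record with only the name, endpoints, and VLAN would still be a valid inventory entry, but the drift function would have nothing to compare against.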

We recently solved this issue in the data center fabric work we did with eBay on Spectron. There, the source of truth holds an abstract topology: a graph describing which nodes exist, how they're connected, what role each one plays, and what routing behaviour is expected between roles. It doesn't specify vendors or physical ports.

A second layer, the platform spec, binds that abstract graph to specific hardware, for example, declaring that this node is a particular top-of-rack (ToR) model on a particular SONiC release, and that the abstract interface labelled, say, to-spine, maps to a specific physical port on that device. Configuration is rendered per device from those two layers at zero-touch provisioning (ZTP) time, rather than stored as a golden config.

Because intent and hardware are kept separate, a ToR can be swapped between vendors without touching the topology design: the role stays the same, the platform spec changes, and the next ZTP regenerates the correct configuration for the new hardware. The intent survives the swap.
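The separation can be sketched in a few lines. The data structures, port names, and rendered output below are invented for illustration and are not Spectron's actual formats or real SONiC configuration syntax. The sketch only shows the principle: the topology stays fixed, the platform spec changes, and the rendered result follows.

```python
# Abstract topology: roles and connections, no vendors or physical ports.
TOPOLOGY = {
    "nodes": {"tor-1": {"role": "tor"}, "spine-1": {"role": "spine"}},
    # (source node, abstract interface label, destination node)
    "links": [("tor-1", "to-spine", "spine-1")],
}

# Platform specs: bind abstract labels to hardware (hypothetical values).
PLATFORM_VENDOR_A = {
    "tor-1": {"model": "vendor-a-tor", "os": "sonic-release-x",
              "ports": {"to-spine": "Ethernet48"}},
}
PLATFORM_VENDOR_B = {
    "tor-1": {"model": "vendor-b-tor", "os": "sonic-release-y",
              "ports": {"to-spine": "Ethernet0"}},
}


def render(node: str, topology: dict, platform: dict) -> str:
    """Render a device's config from the two layers (made-up syntax)."""
    role = topology["nodes"][node]["role"]
    spec = platform[node]
    lines = [
        f"hostname {node}",
        f"# role={role} model={spec['model']} os={spec['os']}",
    ]
    for src, label, dst in topology["links"]:
        if src == node:
            port = spec["ports"][label]
            lines.append(f"interface {port} description {label}->{dst}")
    return "\n".join(lines)


print(render("tor-1", TOPOLOGY, PLATFORM_VENDOR_A))
print(render("tor-1", TOPOLOGY, PLATFORM_VENDOR_B))
```

Swapping the ToR means passing a different platform spec to the same render call. The topology object is never edited.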

2. Telemetry that's structured for closure, not for dashboards

Most production telemetry pipelines are built for humans. SNMP polled at five-minute intervals, syslog dropped into a SIEM, NetFlow into a traffic analyzer, Grafana on top. This works for human-driven operations, but it is the wrong shape for closed-loop automation.

A loop needs telemetry that is:

Streaming, not polled. A loop cannot react faster than its telemetry arrives, so a five-minute polling interval rules out any loop that has to detect and correct a deviation within seconds. Model-driven telemetry is the usual answer.
Modeled. The corrective layer has to reason about telemetry programmatically. That means it needs schema, not text. Vendor-specific MIBs and free-text logs become a translation tax that gets paid on every loop iteration.
Correlated. A single anomaly observed at one device is rarely actionable. A loop typically needs to see the same anomaly across a service path, or compare it against a service-level baseline, before acting.

The telemetry that runs a network operations center (NOC) dashboard is almost never sufficient to drive a closed loop. Building a parallel, structured pipeline is part of the L3 commitment.
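The "correlated" requirement is the easiest of the three to show in code. The sketch below assumes telemetry has already been normalized into typed samples, and only treats an anomaly as actionable when it appears on a minimum number of devices along the same service path. The Sample type, the metric name, and the min_hops threshold are hypothetical choices for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence, Set


@dataclass(frozen=True)
class Sample:
    """One normalized telemetry reading (schema assumed for illustration)."""
    device: str
    metric: str
    value: float


def devices_over_threshold(samples: Iterable[Sample], metric: str,
                           threshold: float) -> Set[str]:
    """Devices reporting `metric` above `threshold`."""
    return {s.device for s in samples
            if s.metric == metric and s.value > threshold}


def path_is_actionable(samples: Iterable[Sample], service_path: Sequence[str],
                       metric: str, threshold: float,
                       min_hops: int = 2) -> bool:
    """True only if the anomaly shows up on enough hops of the service path."""
    affected = devices_over_threshold(samples, metric, threshold)
    return len(affected & set(service_path)) >= min_hops


samples = [
    Sample("pe-1", "in_errors_per_s", 120.0),
    Sample("p-3", "in_errors_per_s", 95.0),
    Sample("pe-9", "in_errors_per_s", 400.0),  # not on this service path
]
path = ["pe-1", "p-3", "pe-7"]
print(path_is_actionable(samples, path, "in_errors_per_s", 50.0))  # True
```

None of this is possible if the readings arrive as free-text log lines, which is why the modeling step has to come before the correlation step.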

3. A policy layer that's actually queryable

Policies in most L2 environments are written down somewhere: in compliance documents, in design specs, in the heads of senior engineers. They are not represented in a form that software can evaluate.

L3 needs policies as code, or at least as data. There are several reasonable patterns:

Declarative configuration models (YANG, OpenConfig) for device-level invariants
Rego/OPA-style policy languages for compliance and access policies
Service-level policies expressed as constraints on telemetry (latency under X, packet loss under Y, redundancy at N+1)
Intent definitions that sit above policies and express business outcomes the policies should serve

The choice between these depends on the domain. The point is that an L3 system needs at least one of them, deployed consistently, with a query interface the assurance and orchestration layers can both call. A policy file that lives only in a Git repository and is consulted by humans during incidents does not count.
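A minimal sketch of the "policies as data with a query interface" idea follows, using the telemetry-constraint pattern from the list above. The PolicyStore class, its method names, and the example bounds are assumptions for illustration. A production system would more likely sit on OPA, a YANG-aware datastore, or similar. The part worth noticing is that assurance and orchestration ask the same store the same kind of question.

```python
import operator
from dataclasses import dataclass
from typing import Dict, Iterable, List

OPS = {"<=": operator.le, ">=": operator.ge}


@dataclass(frozen=True)
class Constraint:
    """One policy as data: `metric` `op` `bound`, scoped to a service."""
    scope: str
    metric: str
    op: str
    bound: float


class PolicyStore:
    """In-memory stand-in for a queryable policy layer (hypothetical)."""

    def __init__(self, constraints: Iterable[Constraint]):
        self._constraints = list(constraints)

    def for_scope(self, scope: str) -> List[Constraint]:
        return [c for c in self._constraints if c.scope == scope]

    def violations(self, scope: str, state: Dict[str, float]) -> List[Constraint]:
        """Called by assurance with observed state."""
        return [c for c in self.for_scope(scope)
                if c.metric in state
                and not OPS[c.op](state[c.metric], c.bound)]

    def permits(self, scope: str, projected: Dict[str, float]) -> bool:
        """Called by orchestration with the state a change would produce."""
        return not self.violations(scope, projected)


store = PolicyStore([
    Constraint("gold-vpn", "latency_ms", "<=", 20.0),
    Constraint("gold-vpn", "packet_loss_pct", "<=", 0.1),
    Constraint("gold-vpn", "redundant_paths", ">=", 2),
])
observed = {"latency_ms": 27.5, "packet_loss_pct": 0.02, "redundant_paths": 2}
print([c.metric for c in store.violations("gold-vpn", observed)])
print(store.permits("gold-vpn", {"latency_ms": 12.0, "redundant_paths": 2}))
```

Because the constraints are plain data, changing a bound is a data change rather than a script rewrite, which is the practical difference between the L2 and L3 definitions quoted at the top.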

4. Stateful, idempotent execution

This is a problem we ran into directly with eBay's data center fabric automation. The previous tool was Python-based, well-written, and by any reasonable measure automated, but it was a build automation tool, designed to stand the fabric up from scratch rather than operate it once live.

Two things eroded that approach in production. The cloud controller started modifying ToRs directly as workloads came and went (assigning VLANs, adjusting routing), and none of those changes flowed back through the build tool, so its model of each switch went stale within weeks. And hardware swaps were rarely clean substitutions: a replacement ToR might come from a different vendor in the same class, or from the same vendor on a different validated OS release. The build tool's idea of "what should be on this switch" was the day-of-deployment snapshot, not what the network had become, so it had no clean way to push correct configuration to the new hardware.

This is the L2 pattern breaking down: imperative build automation works on a clean run and decays from there. The replacement system, Spectron, was designed around the gap. Intent lives in declarative specs, and the source of truth holds an abstract topology (nodes, roles, connections, routing behaviour) rather than stored configuration. Configuration is rendered on demand from that model, accounting for the specific vendor and OS of the box that's actually there. A hardware swap stops being a special case: the new switch looks up its position in the graph and the right config is generated for it.

Idempotency comes out of this design. Re-running the rendering process produces the same configuration every time, because the configuration is a derivative of the spec rather than an accumulation of changes against it. A GNS3-based digital twin lets the team verify changes in a virtual replica of the fabric before any physical hardware is touched.

Building this often means moving from script-based execution to a stateful orchestration model, one that tracks intended state, current state, and pending operations as first-class objects. Tools like Nornir, structured Python frameworks, or commercial orchestrators that expose state machines all work here. Ansible is not disqualified, but it has to be wrapped in something that owns the state and the failure semantics.
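To illustrate what "first-class objects" means here, the sketch below models intended state, current state, and pending operations explicitly, and derives the operations by diffing the first two. The class and function names are hypothetical and the device state is a plain dictionary. A real orchestrator would persist these objects and handle partial failure. Run a second time, the plan comes back empty, which is the idempotency property described above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional

DeviceState = Dict[str, Dict[str, object]]  # device -> {setting: value}


class OpStatus(Enum):
    PENDING = "pending"
    DONE = "done"


@dataclass
class Operation:
    """A pending change, tracked as an object rather than a script step."""
    device: str
    key: str
    value: Optional[object]  # None means "remove this setting"
    status: OpStatus = OpStatus.PENDING


def plan(intended: DeviceState, current: DeviceState) -> List[Operation]:
    """Diff intended against current state; emit only what must change."""
    ops = []
    for device, want in intended.items():
        have = current.get(device, {})
        for key in sorted(set(want) | set(have)):
            if want.get(key) != have.get(key):
                ops.append(Operation(device, key, want.get(key)))
    return ops


def execute(ops: List[Operation], current: DeviceState) -> None:
    """Apply each operation to the (simulated) current state."""
    for op in ops:
        settings = current.setdefault(op.device, {})
        if op.value is None:
            settings.pop(op.key, None)
        else:
            settings[op.key] = op.value
        op.status = OpStatus.DONE


intended = {"tor-1": {"mtu": 9100, "vlan10": "present"}}
current = {"tor-1": {"mtu": 1500, "vlan10": "present", "vlan99": "present"}}

first = plan(intended, current)
execute(first, current)
print([(op.key, op.value) for op in first])
print(plan(intended, current))  # [] on the second run: nothing left to do
```

The out-of-band VLAN in the example stands in for the kind of change a cloud controller might make directly on a ToR. A script that only replays its build steps would never notice it, whereas a diff against intended state does.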

5. Loop governance

The least-discussed part of L3, and the one that slows teams in production.

Once you have one closed loop running, you can probably keep it under control. Once you have ten, you have a coordination problem that doesn't exist at L2. ETSI's ZSM009 specification, which TM Forum's L3 work draws on, is explicit about this: closed loops have to be governed. There has to be a registry of what loops exist, what they're allowed to act on, what their priorities are when they conflict, and how their lifecycle is managed.

This is governance in the engineering sense, not the corporate sense. It's the part of the architecture that prevents a fault-recovery loop and a capacity-optimization loop from fighting each other on the same link, and that lets an operator suspend a loop without disabling automation entirely. Most teams don't build this until their second or third loop is in production and the conflicts start showing up.
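A registry of this kind does not have to be elaborate to be useful. The sketch below is one possible shape, with invented names and a simple highest-priority-wins rule. It is not the data model from ZSM009, only an illustration of the three capabilities named above: knowing which loops exist, limiting what each may act on, and resolving conflicts on a shared resource.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Tuple


@dataclass
class LoopEntry:
    name: str
    scope: FrozenSet[str]  # resources this loop is allowed to act on
    priority: int          # higher value wins a conflict
    enabled: bool = True


class LoopRegistry:
    """Hypothetical registry that arbitrates between closed loops."""

    def __init__(self) -> None:
        self._loops: Dict[str, LoopEntry] = {}

    def register(self, entry: LoopEntry) -> None:
        self._loops[entry.name] = entry

    def suspend(self, name: str) -> None:
        self._loops[name].enabled = False

    def resume(self, name: str) -> None:
        self._loops[name].enabled = True

    def arbitrate(self, requests: Iterable[Tuple[str, str]]) -> Dict[str, str]:
        """Map each requested resource to the loop allowed to act on it.

        `requests` is a sequence of (loop name, resource) pairs.
        """
        winners: Dict[str, str] = {}
        for name, resource in requests:
            loop = self._loops.get(name)
            if loop is None or not loop.enabled or resource not in loop.scope:
                continue  # unknown, suspended, or out of scope
            holder = winners.get(resource)
            if holder is None or loop.priority > self._loops[holder].priority:
                winners[resource] = name
        return winners


registry = LoopRegistry()
registry.register(LoopEntry("fault-recovery", frozenset({"link-1"}), priority=10))
registry.register(
    LoopEntry("capacity-optimization", frozenset({"link-1", "link-2"}), priority=5)
)
requests = [("capacity-optimization", "link-1"), ("fault-recovery", "link-1"),
            ("capacity-optimization", "link-2")]
print(registry.arbitrate(requests))
registry.suspend("fault-recovery")
print(registry.arbitrate(requests))
```

Suspending one loop leaves the other running, which is the operational behaviour the paragraph above describes: an operator can pause a single loop without disabling automation entirely.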

The honest summary

The L2 to L3 step looks small in TM Forum's documentation, but isn't. It requires a real source of truth, structured telemetry, a queryable policy layer, stateful and idempotent execution, and loop governance: five engineering programs of work that have to land in roughly that order.

It is mostly a deterministic-software problem, with AI useful in supporting roles. And the teams that succeed at it generally start with one domain and one loop, rather than declaring an enterprise-wide L3 transformation.

If you’re looking to learn more about Autonomous Networks and the work we do to get you there, check out our dedicated home page.
