From security principle to implementation: CodiLime’s notes on the Five Eyes agentic AI guidance

In May 2026, six national cybersecurity agencies from the Five Eyes countries (Australia's ASD, the US CISA and NSA, Canada's Cyber Centre, New Zealand's NCSC-NZ, and the UK's NCSC-UK) published joint guidance titled Careful adoption of agentic AI services. It is a serious document. It provides a useful risk structure for agentic AI in high-consequence environments, names important failure modes clearly, and recommends a sensible set of controls.

If you have not read it yet, you should. It is one of the most authoritative public references from cybersecurity agencies that focuses specifically on the secure adoption of agentic AI.

By design, the guide operates at the level of what to control rather than how to implement those controls. That is the right choice for a document meant to apply across a wide range of agent architectures, deployment models, and operating environments. But it does mean that for any specific deployment, an engineering team still has to translate each control objective into a concrete design.

This article is our team's notes on that translation, written from inside a narrow slice of the problem space: agentic systems that use the Model Context Protocol (MCP) to operate against live network infrastructure. We walk through five places where we found the gap between the guide's recommendations and a working implementation most instructive, and describe the choices we made. They are not the only valid choices (other architectures, frameworks, and assumptions will lead to different answers), but they are concrete, and the trade-offs may be useful to teams working through the same translation in their own context.

The implementation gap, in one example

The Five Eyes guidance recommends, under "Privileges and authentication":

Require just-in-time credentials for high-impact or privileged actions.

That single line correctly captures the control objective. Translating it into a deployed system in our context (where MCP tools open SSH sessions to network devices) raised a set of design questions we had to work through:

What does just-in-time credential issuance look like when an agent calls a tool that needs to SSH into a router? Who issues the credential? How short is "short-lived"? How does the device know to trust it? How is the user's identity preserved through the credential so that device logs do not all show "mcp-agent connected"? What happens when the credential issuer is unreachable? How is the trust anchor rotated without taking the fleet offline?

Different teams will answer these questions differently depending on their device fleet, identity infrastructure, and operational constraints. The rest of this post describes the answers we landed on, and the points in the guidance that prompted them.

Five control objectives and the implementation choices behind them

1. Just-in-time credentials

The guidance flags static keys and shared service accounts as a risk and recommends JIT credentials. For our deployment, the design questions that followed were: where does the credential issuer sit relative to the consumer, where do keypairs live, how is user identity preserved through the credential for audit, what TTL materially reduces replay value without breaking legitimate operations, and how is trust distributed without touching every device on every rotation?

In our Net-Inspector reference architecture, each MCP tool call exchanges the user's JWT for a 60-second SSH certificate signed by OpenBao. In this design, TTLs should be tunable per MCP tool: a short-lived credential limits the window in which it can be reused to establish new sessions, but it does not end a session that is already open, so long-running command execution and session termination require separate controls. The certificate principal carries the role (net-operator); the Key ID carries the originating user and session ID (sid=fd9e4bdc77df2fa8,user=bob). Where the target SSH stack supports OpenSSH user certificates, devices trust the CA through TrustedUserCAKeys. Otherwise, the same trust pattern must be adapted through TACACS+, vendor-specific certificate mapping, or another device-supported control. In both cases, the ephemeral keypair is generated in memory and discarded when the tool returns.
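As a minimal sketch of the Key ID convention described above, the following Python builds and parses a value of the form sid=...,user=... and assembles the parameters of a signing request. The SignRequest shape, the function names, and the 300-second ceiling are hypothetical and chosen for illustration; they are not the Net-Inspector code or the OpenBao API, and the fields would need to be mapped onto whatever your certificate authority actually accepts.

```python
import re
from dataclasses import dataclass

# Key ID layout taken from the example in the text: sid=<hex>,user=<name>.
# The character classes are an assumption chosen to keep the value unambiguous.
KEY_ID_RE = re.compile(r"sid=(?P<sid>[0-9a-f]+),user=(?P<user>[A-Za-z0-9._-]+)")


@dataclass(frozen=True)
class SignRequest:
    """Hypothetical request shape; map the fields onto your CA's real API."""
    public_key: str
    principal: str
    key_id: str
    ttl_seconds: int


def build_key_id(user: str, sid: str) -> str:
    """Build the Key ID, refusing input that could smuggle in extra fields."""
    key_id = f"sid={sid},user={user}"
    if KEY_ID_RE.fullmatch(key_id) is None:
        raise ValueError("user or session ID would make the Key ID ambiguous")
    return key_id


def parse_key_id(key_id: str) -> dict:
    """Recover the session ID and user from a Key ID seen in a device log."""
    match = KEY_ID_RE.fullmatch(key_id)
    if match is None:
        raise ValueError(f"unrecognised Key ID: {key_id!r}")
    return match.groupdict()


def build_sign_request(public_key: str, user: str, sid: str,
                       principal: str = "net-operator",
                       ttl_seconds: int = 60,
                       max_ttl_seconds: int = 300) -> SignRequest:
    """Assemble per-tool-call signing parameters with a bounded TTL."""
    if not 0 < ttl_seconds <= max_ttl_seconds:
        raise ValueError("TTL outside the allowed range")
    return SignRequest(public_key, principal, build_key_id(user, sid), ttl_seconds)
```

Validating the Key ID at build time matters because it ends up in device logs: a user name containing a comma could otherwise forge a second user= field and mislead whoever reads the audit trail.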

This is one way to implement the objective. Other teams will reasonably make different choices, such as a longer TTL with stronger revocation, a different secrets engine, or vault-issued passwords instead of certificates for legacy devices. The point is that "require just-in-time credentials" expands into a design space, and the choices within it are where the security properties actually get fixed.

2. Device-side enforcement

The Five Eyes guidance covers agent behaviour, tool selection, identity management, and oversight in depth. It addresses downstream system enforcement at the principle level (defence in depth, least privilege, monitoring) without naming specific mechanisms, which makes sense given how varied "downstream systems" are across the audience the guide is written for.

In our context of network infrastructure operated over SSH and CLI, the specific mechanisms matter a great deal. The device may see a username, certificate principal, or privilege level, but it usually does not see the upstream policy decision, the tool-call ID, the human approval state, or whether the agent was operating under prompt-injection pressure. Cloud APIs and modern SaaS tools enforce identity and action context at request time; network gear typically does not, unless you configure it to.

This creates the asymmetry we described in Six MCP security gaps: upstream systems have rich context, but enforcement is weak unless built in; downstream devices have execution authority but no context. The mechanisms that close that gap on real network infrastructure are management-plane specific rather than universal. Depending on the management plane, that may mean OpenSSH ForceCommand validators on jump hosts or execution gateways, TACACS+ command profiles for CLI access, NETCONF/RESTCONF NACM for YANG-modeled configuration, or gNMI path-based authorization. Those mechanisms are outside the scope of a cross-domain guidance document, but they are firmly inside the scope of any team deploying agents against routers and switches.
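To make the ForceCommand option concrete, here is a minimal sketch of a validator that a jump host or execution gateway could run. OpenSSH places the command the client asked for in the SSH_ORIGINAL_COMMAND environment variable when ForceCommand is in effect; everything else here, including the allowlist entries and the function names, is an assumption for illustration. Forwarding an approved command to the device is deployment-specific and is left out.

```python
import re

# Hypothetical read-only allowlist; a real one would be generated from policy.
ALLOWED_COMMANDS = [
    re.compile(r"show version"),
    re.compile(r"show interfaces( [A-Za-z]+[0-9/.]+)?"),
    re.compile(r"show ip route( \d{1,3}(\.\d{1,3}){3})?"),
]


def is_allowed(command: str) -> bool:
    """Allow a command only if the whole string matches an allowlist entry."""
    # Reject empty input and control characters (newline, carriage return, tab)
    # that could be used to append a second command.
    if not command or not command.isprintable():
        return False
    normalised = " ".join(command.split())
    return any(pattern.fullmatch(normalised) for pattern in ALLOWED_COMMANDS)


def decide(environ) -> tuple:
    """Return (exit_code, text) for the command requested over SSH."""
    command = environ.get("SSH_ORIGINAL_COMMAND", "")
    if is_allowed(command):
        return 0, " ".join(command.split())
    return 1, "denied by execution gateway policy"
```

The validator matches the full command against an allowlist instead of searching for known-bad substrings, so anything it does not recognise is denied by default, including a missing SSH_ORIGINAL_COMMAND.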

3. MCP-specific failure modes in our deployment

The guidance is deliberately protocol-agnostic. It talks about "tools" and "tool selection" without naming MCP specifically, which is a sensible choice for a document that needs to apply equally to teams using other integration layers. That generality does mean each MCP deployment has to consider a few protocol-specific failure modes on its own:

Tool discovery as an attack surface. When tools/list returns the same catalog to every authenticated client, the model learns the full vocabulary of available actions (including write tools and dangerous parameters) before fine-grained tool filtering has run. Unfiltered discovery becomes reconnaissance.
Confused deputy in MCP-specific form. The MCP specification's own security best practices warn against publishing every scope in scopes_supported; over-broad scope advertisement can make leaked or over-issued tokens more damaging by expanding the set of actions a client can request or reason about. The guide describes the confused deputy pattern in general terms; the MCP-specific anti-patterns that produce it are worth recognising in any MCP deployment.
Discovery-time vs call-time authorization as separate enforcement points. Filtering the tool list by user scope is a different control from checking the scope at invocation. In our experience both are worth implementing.

In Part 2 of our series, we showed the FastMCP middleware we use to enforce these as two distinct controls: a @require_scope decorator at call time, and a ScopeFilterMiddleware at discovery time. Both pass JWT scopes through a deterministic check before the agent sees any tool or executes any call. Other MCP server frameworks support equivalent patterns; the specific implementation is less important than treating discovery and invocation as separate authorization events.
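The separation between the two enforcement points can be sketched without any framework. The Python below is a stand-in written for this article, not the FastMCP middleware from Part 2: it keeps a plain dictionary as the tool registry, assumes the JWT has already been verified, and assumes scopes arrive as a space-delimited "scope" claim. The tool names and scope strings are hypothetical.

```python
from functools import wraps

TOOL_SCOPES = {}  # tool name -> scope required to see and to call it


class ScopeError(PermissionError):
    """Raised when a caller invokes a tool without the required scope."""


def granted_scopes(claims: dict) -> set:
    """Read scopes from already-verified JWT claims (space-delimited string)."""
    return set(str(claims.get("scope", "")).split())


def require_scope(scope: str):
    """Call-time control: check the scope on every invocation."""
    def decorator(func):
        TOOL_SCOPES[func.__name__] = scope

        @wraps(func)
        def wrapper(claims, *args, **kwargs):
            if scope not in granted_scopes(claims):
                raise ScopeError(f"{func.__name__} requires scope {scope!r}")
            return func(claims, *args, **kwargs)
        return wrapper
    return decorator


def list_tools(claims: dict) -> list:
    """Discovery-time control: only advertise tools the caller may invoke."""
    granted = granted_scopes(claims)
    return sorted(name for name, scope in TOOL_SCOPES.items() if scope in granted)


@require_scope("net:read")
def show_interfaces(claims, device):
    return f"interfaces of {device}"


@require_scope("net:write")
def set_description(claims, device, interface, text):
    return f"{device} {interface}: {text}"
```

With claims of {"scope": "net:read"}, list_tools advertises only show_interfaces, and a direct call to set_description still raises ScopeError. The second check is what protects against a client that guesses or remembers a tool name it was never shown.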

4. Audit correlation: turning “unified logs” into a concrete pattern

The accountability section of the guide identifies fragmented logs, opaque agent reasoning, and difficulty tracing decisions across distributed agents. It recommends "comprehensive artefact logging" and "unified audit logs for all inter-agent interactions." Both recommendations are correct, and both leave open the question of how a single user request gets correlated across the identity provider, agent runtime, MCP server, secrets engine, SSH session, and device command log, components that typically log independently to different sinks.

In our deployment, we settled on a four-ID audit-correlation pattern: IdP username, login session ID, certificate serial, and per-tool-call request ID. These identifiers are propagated through SSH environment variables where the target management plane supports them, embedded in certificate Key IDs where appropriate, and aggregated into a shared log backend such as Loki. These identifiers are non-secret correlation values, not authorization inputs. Trust is anchored in the signed certificate and server-side policy, not in client-supplied environment variables. In privacy-sensitive environments, the Key ID can carry opaque subject and session identifiers rather than raw usernames. In normal operation, any one of the four IDs is sufficient to pivot into the rest of the chain, assuming successful propagation, ingestion, and retention.
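A small sketch of what the four-ID pattern can look like at the log-record level follows. It assumes every component emits JSON lines that carry all four identifiers, which is the best case; in practice some hops will only see a subset. The field names, the helper functions, and any sample values used with them are hypothetical and not taken from our deployment.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CorrelationIds:
    """The four non-secret identifiers propagated across every hop."""
    user: str          # IdP username, or an opaque subject ID
    session_id: str    # login session ID
    cert_serial: str   # serial of the short-lived SSH certificate
    request_id: str    # per-tool-call request ID


def log_line(component: str, event: str, ids: CorrelationIds) -> str:
    """Emit one structured log record that carries all four IDs."""
    record = {"component": component, "event": event, **asdict(ids)}
    return json.dumps(record, sort_keys=True)


def pivot(lines, field: str, value: str) -> list:
    """Return every record whose given ID field matches the value."""
    records = (json.loads(line) for line in lines)
    return [record for record in records if record.get(field) == value]
```

Because each record carries the full set, starting from a certificate serial found in a device log and starting from a request ID found in the MCP server log lead to the same chain of records. These values are for correlation only and should never be read back as authorization inputs.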

This is one mechanism among several reasonable options. OpenTelemetry trace context, W3C traceparent headers, or vendor-specific request IDs can serve the same purpose. The specific carrier matters less than committing to a correlation strategy early and propagating it consistently across every hop.

5. Policy-as-code is one approach to centralized policy decision points

The guide recommends "centralised policy decision points," "continuous runtime authentication," and governance frameworks for autonomous agents. It does not prescribe specific tooling, which leaves room for the wide range of policy engines and patterns teams are using in practice, such as Open Policy Agent and Rego, Cedar, AWS IAM-style policies, hand-rolled evaluators, or commercial policy platforms.

We chose OPA. In our OPA guardrail piece, we described a three-layer evaluation pattern in a single query: tool access, device access via ABAC attributes (site, environment, tenant), and per-command authorization. That pattern turns a broad “centralised policy decision point” recommendation into concrete authorization questions at each step of the workflow: may this user invoke this tool, against this device, for this operation, under these runtime conditions?

The broader observation, independent of which engine you pick, is that for agentic systems policy is code, and code needs versioning, tests, a deployment topology, eventual-consistency budgets, and fail-closed defaults. Whether the engine is OPA, Cedar, or something built in-house, the implementation discipline is what makes a "centralised policy decision point" operational.
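The three-layer shape and the fail-closed default can be illustrated in a few lines of Python. This is a plain-Python stand-in for the idea, not Rego and not our OPA policy: the policy structure, role name, tool name, and attribute values are all hypothetical, and the tenant attribute is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    allow: bool
    reason: str


# Hypothetical policy data: role -> what that role may do.
POLICY = {
    "net-operator": {
        "tools": {"run_show_command"},
        "sites": {"krk1"},
        "environments": {"lab", "prod"},
        "command_prefixes": ("show ",),
    },
}


def evaluate(request: dict, policy: dict = POLICY) -> Decision:
    """Evaluate tool, device, and command layers; deny on any error."""
    try:
        rules = [policy[role] for role in request["roles"] if role in policy]
        # Layer 1: may any of the user's roles invoke this tool?
        rules = [r for r in rules if request["tool"] in r["tools"]]
        if not rules:
            return Decision(False, "tool not permitted")
        # Layer 2: do the device's ABAC attributes fall inside the role's reach?
        device = request["device"]
        rules = [r for r in rules
                 if device["site"] in r["sites"]
                 and device["environment"] in r["environments"]]
        if not rules:
            return Decision(False, "device not permitted")
        # Layer 3: is this specific command allowed?
        command = request["command"]
        if not any(command.startswith(p) for r in rules for p in r["command_prefixes"]):
            return Decision(False, "command not permitted")
        return Decision(True, "allowed")
    except (KeyError, TypeError, AttributeError):
        # Fail closed: malformed input or policy never results in an allow.
        return Decision(False, "malformed input or policy: failing closed")
```

Returning a reason alongside the decision keeps denials explainable in audit logs, and because the function is pure it can be covered by ordinary unit tests, which is most of what treating policy as code means in practice.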

Deployment assumptions behind this pattern

This implementation pattern depends on a few environmental assumptions that may not hold everywhere. It assumes the target management plane can either consume enforceable credentials and authorization decisions directly or be fronted by a component that can. It also assumes reliable enough log propagation to support audit correlation, bounded clock skew for short-lived credentials and event timelines, and a defined failure policy for cases where the credential issuer, policy engine, or log backend is unavailable. Where devices cannot support SSH certificates, TACACS+ command authorization, NETCONF/NACM, gNMI authorization, or equivalent controls, the enforcement point has to move to a jump host, proxy, controller, or execution gateway. These assumptions are not edge cases; they are part of the deployment design.
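The clock-skew assumption is easy to underestimate with a 60-second TTL, so a short worked calculation may help. The sketch below assumes a simple model in which a certificate is valid from issuance (optionally backdated) until issuance plus TTL, and the device compares those bounds against its own clock. The function name and the backdating parameter are hypothetical, and real CAs and SSH stacks may handle validity bounds differently.

```python
def usable_window(ttl_seconds: float, device_clock_offset: float,
                  backdate_seconds: float = 0.0):
    """Return (start, end) in seconds after issuance, on the issuer's clock,
    during which the device accepts the certificate, or None if it never does.

    device_clock_offset > 0 means the device clock runs ahead of the issuer.
    backdate_seconds moves the certificate's not-before bound into the past.
    """
    # The device accepts when: -backdate <= t + offset < ttl
    start = max(0.0, -backdate_seconds - device_clock_offset)
    end = ttl_seconds - device_clock_offset
    return (start, end) if end > start else None


# Device 20 s ahead: the certificate expires 20 s early from the caller's view.
print(usable_window(60, 20))        # (0.0, 40)
# Device 20 s behind, no backdating: unusable for the first 20 s.
print(usable_window(60, -20))       # (20, 80)
# Device 20 s behind, 30 s backdating: usable immediately.
print(usable_window(60, -20, 30))   # (0.0, 80)
# Device 70 s ahead: the certificate is already expired on arrival.
print(usable_window(60, 70))        # None
```

Under this model a device whose clock runs more than one TTL ahead rejects every certificate, so the tolerable skew is bounded by the TTL itself and time synchronisation becomes part of the credential design.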

What complementary looks like

The Five Eyes guidance is a strong strategic document for the moment. It establishes shared vocabulary, names the risks credibly, and gives organisations a defensible reference point for governance discussions across a very broad audience. Doing that well, at that level of generality, is harder than it looks.

What it deliberately does not do is prescribe a buildable architecture for every deployment context. Choosing between OPA and other policy engines, deciding between sidecar and centralised deployment, picking SSH certificate TTLs that balance security against credential vending latency, designing correlation IDs that make audit trails traceable: these are decisions that depend on the specific system, the specific threat model, and the specific operational constraints of the team doing the building.

Our hope with this post is simply to share one set of choices, made in one specific context (agent → MCP → network infrastructure), in case the trade-offs are useful to teams working through similar translations. We are confident the choices are coherent for our context; we are not claiming they are the only (or even the best) way to satisfy the underlying control objectives. Other architectures, other frameworks, other assumptions will produce different and equally valid implementations.

The guide is right that "increased autonomy amplifies the impact of design flaws, misconfigurations, and incomplete oversight," and right that organisations should "deploy agentic AI incrementally, beginning with clearly defined low-risk tasks." The implementation work that makes incremental deployment safe sits below the guidance, in the architecture each team builds around it, and the more teams share what they built and why, the easier that work gets for everyone.