
What Project Glasswing Means for Network and Security Engineering

Overview

Anthropic decided not to release the Claude Mythos Preview publicly. They say it's so capable at finding software vulnerabilities that public release wouldn't be safe. Instead, they launched Project Glasswing, a closed program that gives the model to a short list of critical-infrastructure vendors: AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, Palo Alto Networks, and around 40 other organizations that haven't been named.

Four of the twelve named partners are network and security vendors. That feels like a signal about where a lot of engineering work is going to have to happen over the next year or two.

Quick disclosure: three of the twelve Glasswing partners are CodiLime clients, and I'm genuinely curious to see what Mythos-class capability does to their security posture. I'd love to watch it land in practice. On the public side, we've been long-standing contributors to Linux Foundation networking, including releases of Tungsten Fabric, the open-source SDN controller under LF Networking. SDN is exactly the kind of critical-code substrate a model like this would pick apart. That mix of vantage points is why I'm writing.

This piece covers what those implications look like from the inside: for SDLC design, detection engineering, XDR systems, AI agents on network infrastructure, and the role of classical ML in absorbing a step-change in signal volume.

What is Claude Mythos Preview, and why wasn’t it released?

Anthropic disclosed that an internal frontier model, Mythos Preview, reached a capability level where it can surpass "all but the most skilled humans" at finding and exploiting software vulnerabilities. During internal evaluation, it surfaced thousands of high-severity vulnerabilities, including some in every major operating system and every major browser.

Rather than release the model, Anthropic built Project Glasswing. The package includes USD 100M in usage credits and USD 4M in direct donations to open-source security groups, plus controlled access for critical-infrastructure partners. The stated goal is to let defenders use the capability jump before attackers catch up.

The partner framing is pointed. Anthony Grieco, Cisco's Chief Security and Trust Officer, said: "AI capabilities have crossed a threshold that fundamentally changes the urgency required to protect critical infrastructure from cyber threats, and there is no going back." That isn't a typical press-release sentence. It's a vendor security officer saying the defensive posture has to change now.

How does Mythos compare to other AI vulnerability discovery tools?

The public numbers from other labs point in the same direction:

  • Anthropic reported that Claude Opus 4.6 found 22 vulnerabilities in Firefox during February 2026. Fourteen were rated high severity.
  • OpenAI's Aardvark agent scanned 1.2M commits in 30 days and surfaced over 11,000 high- and critical-severity findings across projects, including OpenSSH, GnuTLS, and Chromium.
  • External evaluations report false-positive rates below 5% for LLM-driven vulnerability discovery, compared with the 30–60% range typical of traditional static analysis.

Mythos Preview is described as a step beyond these tools. The overall trend, though, is visible across multiple labs and is inspectable in places that aren't behind an NDA.

A note of skepticism: What we can and can’t verify

It's worth separating what we know from what we're being told. Almost everything I just wrote is an Anthropic claim about an Anthropic model. Over 99% of the vulnerabilities Mythos has allegedly found are undisclosed pending patches, so outsiders can't verify them.

Other people have already flagged this. Bruce Schneier called the rollout "very much a PR play by Anthropic," with reporters "breathlessly repeating Anthropic's talking points, without engaging with them critically."

Gary Marcus ran a piece where a cybersecurity expert told him the situation "smells overhyped."

Security company Aisle reportedly reproduced several of the vulnerabilities Anthropic cited using older, cheaper, publicly available models, which raises a fair question about what the capability delta actually is.

There's also a failure mode worth naming. LLMs tend to be agreeable. Ask one "is this code vulnerable?" and it will often say yes and construct a plausible exploit scenario, even on code that's fine. Published research and practitioner reports in 2026 keep flagging this. At volume, the hard part isn't finding patterns that look like vulnerabilities; it's separating the real ones from the lookalikes.

None of this means Glasswing is theater. Anthropic wouldn't get the partners they got if the capability were nothing. But the honest read, for me, is that something real is happening and it's also being presented in the most favorable possible light. The engineering implications below hold either way, because the broader trend is already visible in Big Sleep, Aardvark, and MSRC, which are systems you can actually inspect.

Why the Project Glasswing partner list matters for network security

Look at who got access first. The launch partners break down roughly into network and security vendors (Cisco, Palo Alto Networks, CrowdStrike, Broadcom), hyperscalers (AWS, Google, Microsoft), platform owners (Apple, NVIDIA), critical-infrastructure operators (JPMorganChase, Linux Foundation), and Anthropic itself.

The network and security cluster is the biggest single category. That makes sense if you think about where leverage lives. If Mythos finds a class of bug inside a Palo Alto NGFW or a Cisco IOS feature, the fix protects millions of downstream organizations at once. Fixing bugs at the vendor layer is the only intervention that scales.

As noted in the disclosure, we've been inside the data, platform, and ML work that vendors of this kind run on, including Linux Foundation networking via Tungsten Fabric. That shapes how I read the rest.

What changes inside network and security vendors when AI finds bugs at scale

From the outside, this might look like "Cisco has a better scanner now." Inside, the shift is bigger than that.

Vulnerability discovery stops being the bottleneck

If a Mythos-class model can enumerate thousands of real findings, questions like "which subsystem do we audit this quarter?" stop making sense. You cover everything, continuously. The bottleneck moves to triage, reproduction, and remediation at volume, which most product-security functions are not staffed or tooled for today.

The SDLC needs new gates for AI-speed security findings

If every pull request can be evaluated by a model that finds exploitable flaws faster than a senior security engineer, that changes what a PR template looks like, what blocks a merge, and how findings are prioritized against feature work. Most codebases don't have the feedback loops to absorb this.
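As a concrete illustration, here is a minimal sketch of what such a merge gate could look like. The policy thresholds, the `Finding` fields, and the three outcomes are hypothetical, not taken from any Glasswing partner's pipeline; the point is that the gate has to distinguish reproduced findings from merely plausible ones before it blocks feature work:

```python
from dataclasses import dataclass

@dataclass
class Finding:
    severity: str      # "critical" | "high" | "medium" | "low"
    confidence: float  # model-reported confidence, 0.0-1.0
    reproduced: bool   # did an automated harness actually reproduce it?

def merge_gate(findings: list[Finding]) -> str:
    """Decide what happens to a PR given AI security findings.

    Illustrative policy: a reproduced critical/high finding blocks the merge;
    an unreproduced but high-confidence finding requires human review;
    everything else becomes an async ticket so feature work isn't stalled.
    """
    if any(f.reproduced and f.severity in ("critical", "high") for f in findings):
        return "block"
    if any(f.confidence >= 0.8 and f.severity in ("critical", "high") for f in findings):
        return "require-review"
    return "pass-with-tickets"
```

The interesting design decision is the middle branch: at AI signal volumes, "everything blocks" and "nothing blocks" are both wrong, so the gate needs an explicit tier for findings that are probably real but not yet proven.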

Detection engineering must move from known CVEs to behavioral signatures

Mythos is under lock and key, but a determined adversary will eventually have access to a model in the same class. Detections written against known CVEs age out fast. Detection logic needs to move toward classes of exploit and behavioral signatures that don't depend on the attacker using a known technique.

MITRE ATLAS v5.4 (Feb 2026) already added agent-focused techniques, including AI Agent Clickbait (AML.T0100) and case studies like SesameOp, which documents how attackers repurpose agent APIs for command and control. The adversary playbook is updating in public; the defenders' playbook should be updating too.
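To make the distinction concrete: a behavioral detection keys on an ordered sequence of attacker actions rather than on a CVE-specific payload string. A toy sketch in Python, with entirely hypothetical event names standing in for whatever your telemetry pipeline emits:

```python
# Hypothetical post-exploitation chain: the same sequence shows up whether
# the attacker used a known CVE or a novel, model-discovered exploit.
SUSPICIOUS_SEQUENCE = ["auth_bypass_attempt", "config_write", "outbound_c2"]

def matches_behavior(events: list[str], pattern: list[str]) -> bool:
    """Check whether `pattern` appears as an ordered subsequence of `events`.

    The rule keys on what happens on the box after initial access, not on
    the exploit itself, so it survives novel techniques that reach the
    same post-exploitation steps.
    """
    it = iter(events)
    # `step in it` consumes the iterator, so each step must appear
    # after the previous one: an ordered-subsequence match.
    return all(step in it for step in pattern)
```

A CVE signature would miss a zero-day entirely; this kind of rule fires on the behavior the zero-day is used to achieve.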

None of these are "buy a product and install it" problems. They're engineering work. Platform, data, ML, and workflow.

The mirror problem: Using AI agents on live network infrastructure

Glasswing is about using AI to find vulnerabilities in vendor code. There's a related problem our team has been working on for a while: using AI agents to act on live network infrastructure.

What MCP gives you, and what it doesn’t

We recently ran a webinar on securing Model Context Protocol deployments when agents have the ability to reach SONiC switches, routers, and other infrastructure. The finding we kept coming back to: MCP gives you connectivity and almost nothing else on the security side. No standardized per-tool authorization, no identity propagation, no device-side enforcement. The LLM becomes a confused deputy. It can act, but it has no intrinsic sense of "should I be allowed to do this."

The zero-trust stack we ended up with:

  • Keycloak for identity (uncommon in networking, but it turns out to be the right primitive)
  • Open Policy Agent for attribute-based authorization against live device inventory in NetBox
  • OpenBao for JIT SSH certificates, so there are no static credentials
  • TACACS+ on the device side, because when you connect AI to infrastructure you don't throw away proven enforcement; you layer on top of it
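To show the shape of the per-tool authorization step in a stack like this, here is a minimal sketch in Python rather than Rego, purely for readability. The tool names, roles, and device attributes are hypothetical stand-ins for what an OPA policy would evaluate against NetBox inventory data:

```python
# Hypothetical tool-to-role policy: the agent's call is allowed only if the
# propagated human identity holds a role for that specific tool.
POLICY = {
    "show_running_config": {"netops-read", "netops-admin"},
    "apply_config": {"netops-admin"},
}

def authorize(identity_roles: set[str], tool: str, device: dict) -> bool:
    """Attribute-based check: identity + tool + live device attributes.

    This is the piece MCP doesn't give you. Without it, the LLM is a
    confused deputy: it can act, but nothing asks whether it should.
    """
    allowed_roles = POLICY.get(tool, set())
    if not (identity_roles & allowed_roles):
        return False                       # no role grants this tool
    if device.get("status") != "active":
        return False                       # stale inventory: deny by default
    if tool == "apply_config" and device.get("environment") == "production":
        # Writes to production additionally require an open change window.
        return "netops-admin" in identity_roles and device.get("change_window_open", False)
    return True
```

Note that the check is deny-by-default on every missing attribute: an unknown tool, an inactive device, or a closed change window all fail closed, which is the only sane posture when the caller is an agent.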

The link to Glasswing, for me, is the same engineering truth showing up in two places. AI capability is outpacing the product integrations needed to absorb it safely. Glasswing is the vendor-code side of that gap. Agent security on infrastructure is the vendor-product side.

Why Classical ML becomes more important in a Mythos-class world

One thing I keep pushing back on in commentary around Glasswing is the framing that LLMs replace security tooling. That doesn't match what we see in practice. The Mythos-era stack needs more classical ML, not less.

Classical models handle the filtering, ranking, deduplication, and cross-checking of the firehose of findings an LLM produces. We've been shipping production ML into security pipelines for years (anomaly detection, telemetry enrichment, triage classification), and none of that work becomes obsolete when a Mythos-class model joins the pipeline. It becomes more important, because the signal volume goes up and so does the cost of acting on a false positive.
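A toy version of that triage layer, with a made-up finding schema, shows the shape of the work (dedupe, rank, filter) even before any learned model is involved:

```python
SEVERITY_WEIGHT = {"critical": 4, "high": 3, "medium": 2, "low": 1}

def triage(findings: list[dict], min_score: float = 2.0) -> list[dict]:
    """Dedupe LLM findings by a crude (file, sink) fingerprint, then rank.

    Duplicates of the same underlying bug collapse into one entry keeping
    the highest confidence seen. Score = severity weight * confidence;
    anything under `min_score` is parked rather than sent to an analyst.
    Field names and weights are illustrative, not from any real pipeline.
    """
    buckets: dict[tuple, dict] = {}
    for f in findings:
        key = (f["file"], f["sink"])       # duplicate fingerprint
        if key not in buckets or f["confidence"] > buckets[key]["confidence"]:
            buckets[key] = f
    ranked = sorted(
        buckets.values(),
        key=lambda f: SEVERITY_WEIGHT[f["severity"]] * f["confidence"],
        reverse=True,
    )
    return [f for f in ranked
            if SEVERITY_WEIGHT[f["severity"]] * f["confidence"] >= min_score]
```

In production each of these stages is where the real ML lives: learned duplicate detection instead of a tuple key, a trained classifier instead of a static weight table. But the pipeline topology is exactly this.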

How Mythos-class AI changes XDR: From detect-and-respond toward predict-and-prevent

The implication I think gets the least attention in the public discussion is what Mythos-class capability does to XDR and zero-trust platforms.

Why today’s XDR and zero-trust stacks weren’t built for this

XDR is, by design, reactive. It ingests telemetry, correlates across layers, and surfaces incidents after attackers have made contact with the environment. The best platforms compress dwell time a lot, but they still sit downstream of the attack. Zero-trust adds a second layer. Assume breach, verify continuously, minimize blast radius. Useful, but fundamentally a containment strategy.

Mythos-class capability changes the economics of the layer sitting above both of these. If a defender-grade model can continuously enumerate exploitable weaknesses in your own products, configurations, and code paths before an attacker does, the XDR platform stops being primarily a detection pane of glass and starts becoming a feedback surface for proactive hardening.

What a proactive XDR architecture actually looks like

Correlate asset inventory with the model's findings, prioritize by exposure, and drive patches and config changes before the relevant techniques show up in the wild. Gary DePreta from Cisco described this as moving from "detect-and-respond to predict-and-prevent." The framing is accurate, but it only becomes real if the orchestration layer gets rebuilt to support it.
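At its core, the predict-and-prevent loop reduces to a join between model findings and asset inventory, ranked by exposure. The field names and the exposure formula in this sketch are illustrative only:

```python
def harden_queue(assets: dict[str, dict], findings: list[dict]) -> list[tuple[str, float]]:
    """Join model findings against asset inventory; rank by exposure.

    Exposure here is a toy score: internet-facing assets and assets with
    more dependents rise to the top, so patches land where a working
    exploit would hurt most, before the technique shows up in the wild.
    """
    queue = []
    for f in findings:
        asset = assets.get(f["asset_id"])
        if asset is None:
            # Finding on an unknown asset is an inventory gap, which is
            # itself a signal worth routing to a separate queue.
            continue
        exposure = (2.0 if asset["internet_facing"] else 1.0) * (1 + asset["dependents"])
        queue.append((f["asset_id"], exposure * f["severity_weight"]))
    return sorted(queue, key=lambda item: item[1], reverse=True)
```

The hard engineering is everything this sketch hides: keeping the inventory live, reproducing findings automatically, and closing the loop into change management so the queue actually drains.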

We've built XDR platforms from scratch, and we've built zero-trust solutions into network-security products. Today's XDR and ZT stacks were designed around human-scale analyst workflows and yesterday's signal volumes. Plugging a Mythos-class feed into that architecture without redesigning the triage, correlation, and remediation loops produces exactly the failure mode the skeptics predict: a wall of plausible-sounding findings that analysts can't action. The opportunity is real. It's also an engineering problem, not a procurement one.

Big Sleep, MSRC, and Aardvark: How other labs are solving the same problem

Glasswing didn't appear in a vacuum. Other frontier labs have been building the same capability, with different distribution choices.

Three approaches to AI vulnerability discovery

Google's Big Sleep (DeepMind plus Project Zero) caught a live-exploited SQLite vulnerability (CVE-2025-6965) before attackers could use it. Google says it's the first time an AI agent has publicly foiled an in-the-wild exploitation attempt. Big Sleep continues to find flaws across open-source targets.

Microsoft MSRC has said publicly that recent models "approach experienced human security researchers" at finding software vulnerabilities. Their internal Vuln.AI reports a 70% reduction in time-to-vulnerability-insight, and MSRC is embedding agentic red teaming directly into the Microsoft SDL.

OpenAI's Aardvark is positioned as an agentic security researcher with industrial-scale throughput. The 1.2M-commits-in-30-days number comes from there.

What makes Glasswing structurally different: The distribution choice

Big Sleep stays inside Google. Aardvark is an OpenAI product. MSRC is Microsoft-internal. Anthropic pushed the capability outward instead, to a controlled set of infrastructure vendors and critical-code maintainers. Whether that's safer or riskier depends on your threat model. It is the first time a frontier lab has run this particular play. The next year of industry behavior will tell us whether it becomes a template.

Open questions: How will Glasswing partners actually operationalize Mythos?

Glasswing is still early. The announcements tell us the model exists and the partners are in. What I don't know yet, and what I think will matter for the rest of the industry, is how a Glasswing partner actually operationalizes Mythos. Some of the open questions I'd like to see answered:

  • How do findings enter the existing vulnerability-management system?
  • How are false positives filtered at the scale Mythos produces?
  • What breaks first in the SDLC when inbound security signal goes up 10x?
  • How do downstream customers find out a class of bug was fixed without learning exactly what it was?

These are engineering questions. They're also the questions we end up working on with security and networking clients most weeks.

The takeaway

Project Glasswing is one of the clearer signals so far that the frontier of AI capability and the frontier of security engineering are converging. The vendors on that list aren't just beneficiaries of a new tool. They're the ones expected to absorb the capability, integrate it responsibly, and pass the safety gains through to their customers.

The absorption work (data platforms, ML pipelines, agent security, SDLC integration) is the kind of engineering problem we've been working on with network and security vendors for years. Glasswing just made it more urgent.

If you're inside one of those teams and any of this resonates, I'd welcome a conversation.



Krzysztof Wróbel

Chief Technology Officer

Krzysztof has more than 15 years’ experience in the IT industry and has held a range of positions: Software Developer, Team Leader, Project Manager, Scrum Master and Delivery Manager. Krzysztof has led more than a few Rust projects.
