The Mythos Boomerang

Four weeks ago the Wall Street Journal reported that the White House had blocked Anthropic from expanding Project Glasswing access to Claude Mythos Preview from roughly 50 organizations to roughly 120. At the time it read like a one-off domestic-access dispute. The May 4 post on this blog called it a structural regime change in how frontier AI gets distributed inside the US, with a fragile enforcement model: a phone call from the Chief of Staff against a closed-weight asymmetry that DeepSeek V4-Pro had already breached.

That post got the regime change right. It got the tempo wrong. In the four weeks since the WSJ story, Anthropic's gating frame of "too potent for general release" has migrated from one company's product decision into the substrate of federal AI policy. Five frontier labs now sit inside a voluntary CAISI evaluation regime. The NEC Director floated an FDA-style executive order on Fox Business. Mozilla shipped 423 Firefox security bugs in a single month, primarily attributed to the model the gating fight was about. The phone call now has institutional infrastructure behind it.

This post is the follow-on, not a retraction. The framing was right; the May 9 reality moved faster than the May 4 estimate.

The four-week chronology

The compressed timeline is the artifact. Each beat below is independently sourced and dated.

April 13. UK AISI publishes its Mythos Preview cyber evaluation, confirming Mythos as the first model to clear the 32-step "The Last Ones" enterprise attack simulation. Mythos succeeds on 3 of 10 attempts and averages 22 of 32 steps across all attempts, with a 73% success rate on expert Capture the Flag tasks.

April 17. SR 11-7 is rescinded; SR 26-2 takes effect as the new bank model-risk framework. Generative and agentic AI explicitly fall outside scope, with a separate RFI to follow. The compliance hole is now law of the land.

April 28. The first political trilogue on the EU Digital AI Omnibus collapses without agreement. Brussels is still on path to enforce its strict high-risk regime in 90 days.

April 30. The Wall Street Journal breaks the Glasswing expansion story. The same day, UK AISI publishes its GPT-5.5 cyber evaluation. GPT-5.5 reaches Mythos's TLO range three weeks after Mythos itself: 2 of 10 end-to-end attempts, 71.4% on expert tasks. AISI's quoted inference is that cyber-offensive capability is emerging as a byproduct of long-horizon autonomy gains, and further increases should be expected in quick succession.

May 1. Pentagon CTO Emil Michael tells CNBC that Anthropic remains designated a supply-chain risk, while framing Mythos itself as a separate national security moment. The reframe begins moving Mythos away from the Pentagon-Anthropic procurement dispute toward generalized capability concern.

May 4. Axios publishes Trump administration considering safety review for new AI models after Mythos. The agencies floated for oversight: NSA, the White House Office of the National Cyber Director, and the Director of National Intelligence. Three intelligence-community bodies, no commerce-track agency. The May 4 DDT post publishes the same morning.

May 5. CAISI, the Trump-rebranded NIST AI Safety Institute, announces voluntary pre-deployment evaluation agreements with Microsoft, Google DeepMind, and xAI. They join existing OpenAI and Anthropic arrangements that pre-date the rebrand. Five frontier labs are now in the framework, scoped to cybersecurity, biosecurity, and chemical-weapons risks. CAISI reportedly completed more than 40 evaluations including on unreleased frontier models.

May 6. NEC Director Kevin Hassett tells Fox Business: "We're studying possibly an executive order to give a clear road map to everybody about how this is going to go and how future AIs that could potentially create vulnerabilities should go through a process so that they're released to the wild after they've been proven safe — just like an FDA drug." Fortune, Bloomberg, and The Hill all confirm. Susie Wiles publicly softens the framing, saying the government does not intend to pick winners and losers.

May 7. Mozilla publishes the Firefox security retrospective. April 2026 totals: 423 security bugs fixed, against a typical 2025 monthly average of 20 to 30. Mythos identified 271 of the 423 directly. Brian Grinstead, Mozilla Distinguished Engineer, on the shift: "It is difficult to overstate how much this dynamic changed for us over a few short months." On the bugs themselves: "These things are actually just suddenly very good."

May 7 (4 AM). The European Parliament and Council reach provisional agreement on the Digital AI Omnibus. High-risk AI obligations under Annex III are deferred from August 2, 2026 to December 2, 2027. Brussels just bought itself 16 months.

May 8. METR publishes Mythos Preview time-horizon results. The 80% time horizon reaches 3 hours 6 minutes, more than 2x the next-best model. The 50% horizon clears 16 hours, with the caveat that METR's task suite goes unreliable above that ceiling.

That is one compressed month. The May 4 post estimated quarters of negotiation. We got weeks.

What the May 4 post got right and what it underestimated

Three of the four central claims hold. One needs revision.

Holds: cyber-capability tier is being treated as an export-controlled good. This claim is more true on May 9 than on May 4. The Hassett FDA framing is not a rule yet, but it is a public NEC commitment to building a pre-approval process. The "without any rule passing" qualifier is shrinking week by week.

Holds: the asymmetry with DeepSeek V4-Pro is structural and the gate has a half-life measured in months. Still true. V4-Pro weights remain on Hugging Face. The capability gap on cyber-specific tasks is closing through scaffold engineering. AISI's confirmation that GPT-5.5 reached Mythos's TLO range in three weeks accelerates the half-life argument rather than slowing it.

Holds: Glasswing partners face an SR-26-2 model-risk-governance problem the new rule does not cleanly cover. Still true. SR 26-2 still excludes generative and agentic AI from scope. CAISI's pre-deployment framework is voluntary, which means a regulated bank's vendor-validation playbook still has the access-regime hole the May 4 post named. The hole widened because more labs are inside the framework and the framework itself is being upgraded.

Needs revision: "the Atomic Energy Act has 70 years of statutory infrastructure behind it. The Mythos gate has a phone call from the Chief of Staff." Correct on May 4. By May 9 it understated institutional momentum. The gate now has five voluntary CAISI agreements, a draft executive order, $55M in congressional NIST AI appropriation with $10M earmarked for CAISI expansion, public NEC Director commitment to FDA-style approval, and a UK AISI partner regime publishing capability evaluations on the same cadence. The phone call is still real, and now it has a department forming around it.

The CAISI regime is the operational core

CAISI absorbed the policy weight before anyone had time to argue about whether it should. Six weeks ago it was a renamed NIST sub-agency with two voluntary lab agreements carried forward from the Biden-era AISI. Today it is the de-facto evaluation core for US frontier AI.

Five labs in scope: Microsoft, Google DeepMind, xAI, OpenAI, Anthropic. There is no major US-domiciled frontier lab outside the framework. A new entrant would have to actively position outside the de-facto industry standard, which is a different posture than declining to opt in to one of two early adopters.

Funding is real. Congress approved $55M for NIST AI work in January 2026, with $10M earmarked specifically for CAISI expansion. The agency is no longer the under-resourced advisory body the America First Policy Institute critiqued at the start of the year. CAISI now has budget headroom to scale evaluations, and it has reportedly run more than 40, including on unreleased frontier models that never reached external publication.

Leadership rotated late. CAISI Director Chris Fall replaced Collin Burns in late April after Burns was dismissed reportedly over his Anthropic background. The succession matters less than the posture: Fall's first major public action was the May 5 multilateral. The director seat now sits closer to intelligence-community oversight than to commerce-track regulation, which tracks the agency direction Axios flagged on May 4.

The voluntary frame is fragile on legal mechanics and stress-tested in practice. CAISI has no statutory enforcement power. Companies could in theory walk away. None of them did. The framework is being upgraded toward binding through executive action, not through congressional rulemaking, which means the time from voluntary to mandatory is one EO away rather than a multi-stage legislative path.

The capability artifact: Mozilla's 423 bugs

While the policy is being assembled, the capability is being publicly demonstrated. Mozilla's Firefox retrospective lands the same week as the Hassett executive-order framing. The administration is regulating around a capability that just shipped 271 disclosed Firefox vulnerabilities, plus an unspecified number of undisclosed ones, in production browsers used by hundreds of millions of users. The hypothetical-capability framing collapsed on contact with the May 7 retrospective.

The breakdown matters. Of 423 bugs fixed in April 2026, Mythos identified 271 directly. 41 came from external bug-bounty researchers. The remaining 111 came from a mix of Mythos non-150-release findings, other LLMs in the pipeline, fuzzing, and manual inspection. April alone produced more bugs fixed than any prior month by an order of magnitude. The 2025 monthly average was 20 to 30.

Mozilla is also explicit about what hasn't changed. Grinstead's note: "Every single one is one engineer writing a patch and one engineer reviewing it, and they have not found it to be automatable." The volume jumped. The human review loop held. That detail cuts against the easy framing of full agentic takeover and points instead at the dynamic the policy regime is actually responding to: model-found vulnerabilities at a rate that human review can keep up with but no prior tooling produced.

METR's results land on May 8 as the academic confirmation. Mythos Preview's 80% time horizon at more than 2x the next-best model is the structured benchmark that lines up with the operational signal Mozilla just shipped. Independent practitioner analysis frames the capability jump as compressing roughly 8.6 months of historical progress into 2 months on Anthropic's own Epoch Capabilities Index. That figure is a third-party gloss on Anthropic data rather than a primary Anthropic claim, so treat the 8-month number as practitioner inference, not vendor disclosure.

The capability layer and the policy layer are now moving on the same clock. Capability changed the politics here, not policy ideology shifting first and capability following.

The US-EU flip

Compliance teams used to operate on a clean asymmetry: the EU was the strict regime, the US was the loose one. As of May 9 that flipped on frontier models specifically.

The EU just deferred. Annex III high-risk AI obligations move from August 2, 2026 to December 2, 2027. Annex I (machinery and embedded products) defers to August 2, 2028. Watermarking obligations slip to December 2, 2026 with additional leeway. AI literacy obligations weaken from obligation of result to obligation of means. The trilogue closed at 4 AM on May 7 because the calendar pressure was real and the political appetite was not.

The US accelerated. CAISI grew from two labs to five in a single week. The NEC Director is publicly modeling the next step as FDA-style pre-approval. The agencies in the conversation are NSA, ONCD, and DNI: intelligence-community rather than commerce-track. None of this is binding yet. All of it points one way.

The two regimes are not directly comparable. EU AI Act high-risk obligations cover deployment scope: workplaces, public services, infrastructure. US CAISI evaluations cover release scope: cyber, bio, chem capability gates. A model could face stricter US release evaluation and looser EU deployment obligations simultaneously. That is the calibration most multi-jurisdictional compliance frameworks will have to absorb over the next 12 months.

For the first time since the AI Act passed in 2024, US frontier-AI oversight discourse is structurally tighter than EU oversight discourse on the highest-capability models. Anyone whose multi-jurisdictional posture was calibrated to "EU = strict, US = light" needs to recalibrate. Regulatory directionality is now the variable, not absolute position. The EU paused while moving toward strict. The US accelerated while moving toward strict. For a release decision in the next 24 months, the second derivative matters more than today's snapshot.

What the skeptics get right

The case against treating this as a regime change is real, and it pays to walk it carefully rather than dismiss it.

Voluntary agreements have no enforcement. True on legal mechanics. CAISI has no rulemaking authority. Companies could exit. The counter is institutional momentum: all five frontier labs are inside, the framework is being upgraded toward binding via EO, and the funding doubled. Voluntariness becomes near-mandatory through coverage and budget rather than through statute.

Microsoft's pre-access agreement is performative. Plausible. Microsoft has every commercial reason to maintain government access regardless of policy direction: federal procurement, Azure GovCloud, Pentagon contracts. The data point that argues against pure performance is xAI's inclusion. xAI does not have Microsoft's federal procurement entanglement. Joining CAISI gives xAI no commercial benefit they did not already have, while accepting evaluation access they were not previously providing. The xAI signature suggests substantive sharing on the cyber-capability axis specifically.

The Hassett FDA framing is rhetoric, not policy. This is the strongest skeptical version. One Fox Business interview, no published EO, internal pushback from Wiles. The framing could collapse before it lands. Reasons to take it seriously despite the rhetoric and policy gap: the framing is consistent with the structural pre-deployment regime CAISI already runs voluntarily; the rhetorical leap from voluntary CAISI to mandatory CAISI is one EO away, rather than a multi-stage legislative path; and floating it gives the administration optionality if a public Mythos misuse incident escalates the politics. Don't price in mandatory pre-approval as definite. Do price in the framework moving in that direction faster than the March 20 White House National Policy Framework projected.

Mythos didn't cause this; capability concerns were already there. Partially true. UK AISI was running multi-step cyber evaluations through 2025. METR was building time-horizon benchmarks throughout last year. The trajectory was forecastable. The catalyst argument here is about tempo, not strong causation. Mythos compressed a multi-quarter policy negotiation into a four-week sprint. The direction may have been forecastable. The May 4 to May 6 pace was not.

The skepticism narrows the claim, it does not invalidate it. The regime change is real. The institutional infrastructure is real. The legal mechanics are still soft.

What this means if you ship inference

Coming up through credit underwriting, fintech BI, data science, and now fraud strategy across 13 years inside financial services, the diligence frame is reflex by now. Anything new gets walked through the same checks: capability tier, access regime, regulatory domicile, and weight openness. That four-vector frame from the May 4 post still holds. The weights of the vectors shifted.

Capability tier is heavier than three weeks ago. AISI's framing of cyber capability advancing in quick succession means the tier you cleared on procurement at quarter close may not be the tier the model is at by month-end. The McKesson framing of one-time vendor diligence assumes the asset is stable. Frontier AI assets are not stable on a one-quarter clock right now.

Access regime is the vector that moved most. Six weeks ago this was a footnote: which API tier you sit on, what their compliance posture is. Today it is the dominant variable. If you ship inference on a US-hosted frontier API, you are now downstream of a pre-deployment evaluation regime that may become binding on a 30-90 day horizon. The CAISI scope of cyber, bio, chem may not affect the average application, but the framework that produced that scope is the same framework that could expand it in the next round.

Regulatory domicile is now bidirectional. The same model accessed via a US-domiciled vendor versus an EU-domiciled vendor versus a non-aligned-jurisdiction vendor will face different release-evaluation regimes by mid-2026. Multi-region deployment plans built on the assumption that EU is always tighter need a refresh.

Weight openness has not changed in raw terms but the interpretive lens has. A closed-weight US frontier model is now an asset whose access is gated by an evolving federal evaluation regime. An open-weight Chinese model is gated by export controls and reputational risk on the customer side, but the model itself is on Hugging Face. The asymmetry between those two postures is the same as it was. The policy reading of it is different.

The pattern that matches this most cleanly inside finance is the SR 26-2 vendor-validation framework: the regulated entity is responsible for understanding the asset its model risk depends on, even when the asset sits outside the regulated entity's control. The AI access-regime axis is the same problem with a different surface. The diligence frame still works. The data going into it changed.

What's worth watching next

The four-week tempo is not stable. The next round will compress further or it will plateau, and which one happens determines whether the May 9 reading hardens into the May 31 reading.

Will the executive order land? Hassett floated. Wiles softened. Watch for a published draft within 30 days, a CAISI rulemaking notice that converts voluntary to mandatory, or a congressional response from the oversight committees. Any of those flips the regime from soft-binding to hard-binding.

What does the next Mythos-tier release do to the framework? The boomerang took four weeks from WSJ to Hassett. The framework's tempo is now matched to capability-release tempo. If GPT-5.5's general release or Gemini's next frontier model lands in the next 30-60 days at Mythos-tier or higher, expect the framework to escalate again at similar tempo.

EU follow-on response. Brussels just bought 16 months. If US frontier oversight tightens via EO while the EU is in deferral, the strategic conversation in Brussels reopens at the Member-State level. France or Germany floating compensating tightening would be the early signal.

The May 4 framing called this a regime change with fragile enforcement. The May 9 update is on tempo and infrastructure: faster than estimated, more institutional weight than estimated, same direction of travel several weeks ahead of schedule. The phone call from the Chief of Staff still exists. Behind it now sit five voluntary CAISI agreements, a draft executive order, $65M in congressional funding, and a UK partner regime publishing on the same cadence. The gate is no longer a phone call; it is a department in formation.

The four-week chronology

What the May 4 post got right and what it underestimated

The CAISI regime is the operational core

The capability artifact: Mozilla's 423 bugs

The US-EU flip

What the skeptics get right

What this means if you ship inference

What's worth watching next

Related posts

The Boomerang Completes

The Diligence Wall: Anthropic Shipped 10 Finance Agents Into the Carve-Out

The Overtake Underneath: What Bifurcated When Anthropic Passed OpenAI

Tennessee Wants to Jail Your AI Engineer for 15 Years