36 Comments
Herbie Bradley's avatar

I think this is a bit overstated:

- Mythos cyberoffense capability probably does not rival that of nation states (who actually invest in maintaining a cyberoffense team). My mental model for a while has been that AI's comparative advantage in cyber offense is in scaling attacks that do not depend on a high degree of reliability or coordination across multiple attack vectors, i.e., ransomware attacks, not top-tier nation-state hacks. SL3 in RAND parlance.

- you seem to be doing the "generalize from one capability to many others" mental motion, but I think a fairly standard "capabilities are jagged" + "depending on easy to verify/hard to verify tasks" model would predict both the great gains in cyber + that models will improve much more slowly at eg strategy.

- on superintelligence, I'm starting to desire this definition to be broken down more, since the further we progress the more it suffers from the same issues as AGI. Currently it can be basically used to imply any level of capabilities desired, even those dependent on accumulating vast resources.

- "Overnight, every intelligence community operation that depends on signals exploitation is potentially compromised." seems vastly overstated? I think it would have broadly been fine, if a bit bumpy, if Mythos-level models are public.

- Agree on the need for govt preparation and to prevent model weight theft!

Peter Wildeford's avatar

> Mythos cyberoffense capability probably does not rival that of nation states (who actually invest in maintaining a cyberoffense team)

My guess is that Mythos does not rival Israel, Russia, China, or the United States but could potentially rival other, smaller countries. And that still seems like a big deal!

Scaling attacks is also important.

I do agree that Mythos isn't a sole wonder hacker and that vulnerability discovery is just one small part of an overall successful cyberoffense team.

> that models will improve much more slowly at eg strategy.

I agree with this and I'm curious how important it will end up being / if this will change at some point in the near future. Right now I agree the main concern is misuse. Mythos's cyber capabilities probably uplifts a bunch of different threat actors a bit (the novices become intermediate, the intermediates become more expert, the experts at least become more productive) but doesn't e.g. turn novices into elite experts.

> on superintelligence, I'm starting to desire this definition to be broken down more, since the further we progress the more it suffers from the same issues as AGI. Currently it can be basically used to imply any level of capabilities desired, even those dependent on accumulating vast resources.

I agree there's a bit of a vague amorphous stand-in here for "AI just becomes very very advanced".

> "Overnight, every intelligence community operation that depends on signals exploitation is potentially compromised." seems vastly overstated? I think it would have broadly been fine, if a bit bumpy, if Mythos-level models are public.

It's hard to really say for sure. And it depends on what "broadly been fine" means... I don't think it would've led to large losses of life, but I do think it could've disrupted operations. At best, the uncertainty itself would've been really unwelcome among military planners. Though admittedly there's a trade-off here with other benefits that model release brings.

Herbie Bradley's avatar

I think it's not a huge deal if it rivals the capabilities of nation-states whose team is a handful of guys in a shed. For anything more than that, I doubt it rivals them.

Update from the last hour: 5.5 is released, and scores 1.3% worse on cybench than Mythos. I'd expect scaffolding can produce Mythos-like capabilities, if it's not there out of the box. I further expect limited impact on the world from this release.

Connor Heaton's avatar

Excellent thoughts!

Something I'm not hearing discussed much is what state actors are doing with their current zero-day stockpiles.

I would expect that as soon as they saw the exploit stats, and realized that Glasswing involved too many people to persuade to leave their zero-days unpatched, they would start using them to achieve goals, even if inefficiently, as quickly as possible.

Presumably, every day more of the vulnerabilities identified by Mythos are being patched, and nation states have little way of knowing if Mythos will uncover an exploit they paid 800k for, so it quickly becomes a use-it-or-lose-it situation. I'm sure some will be kept in reserve, but it's a gamble. So stockpiles are likely being deployed now in ways and for goals which potentially won't become public for many years.

The NSA must be very upset about Mythos, especially given the WH-Anthropic feud. Mythos also reduces the extent to which the NSA can maintain a capabilities lead through recruiting top cyber talent.

Peter Wildeford's avatar

Yeah I think this is right. I plan to touch on this more in another post.

Mark Richards's avatar

New reader here. If nuclear weapons are an analogue, then the Manhattan Project seems like a model to consider—though perhaps iterate and improve on. Why give ultimate authority to the government, e.g.? And why not make it an international effort? I’m inclined to think sharing/diffusing both knowledge and decision-making power across sectors (private, public, and nonprofit) and across governments—balancing nimbleness with all that bureaucracy—might help us avoid an arms race to perpetual doomsday scenarios, i.e. where we are now with nukes.

Peter Wildeford's avatar

Thanks for reading! I agree these are going to be very important questions to consider.

Rajesh Achanta's avatar

Very timely and relevant. Mythos isn't the headline, the trajectory is. Agree 100%.

Where I'd push back is on the frame. Your recommendations are all structured around a US-China adversarial dynamic: chip control, weight security, compute advantage, negotiating from strength. This is coherent but narrow IMO. The nuclear parallel you invoke actually argues for something wider.

The Limited Test Ban Treaty of 1963 wasn't a bilateral US-Soviet deal. It was the beginning of multilateral architecture. And the NPT that followed in 1968 didn't just constrain the Soviets — it created a regime that outlasted the specific rivalry it was designed for. The distance from the Cuban Missile Crisis to the NPT was six years. We may be in a comparable window now but with less time.

Your Manhattan Project analogy captures one half — yes, some capabilities can't sit entirely in private hands. But the Manhattan Project produced the weapon. The harder institutional challenge was building the arms control architecture afterward, and that required different people, different skills, and a different theory of the problem. Oppenheimer built the bomb. The NPT was created by diplomats and terrified politicians who didn't understand the physics but understood the stakes.

I'd love to see you write next about what verification infrastructure for AI actually looks like. You hint at it in your final point, and it's probably the critical one. The 1963 treaty was limited precisely because verification technology lagged: they could monitor atmospheric tests but not underground ones. What's the AI equivalent? What can we monitor, what can't we, and what institutional form should the monitoring take?

One final thought: you note that Anthropic made every consequential decision in this story. True. But the one external institution they chose to trust with early access was the UK's AISI — not the US government. That's worth noting. It suggests that even the company you're implicitly praising for restraint doesn't fully trust the government you're asking to build the oversight architecture.

I'm working on a longer piece that explores the Gilded Age parallel to this moment — the concentration of transformative technologies and wealth in a handful of private actors, the regulatory lag, and what history suggests about how these periods resolve. Different lens, same urgency. More soon.

Tony's avatar

I’d be interested in hearing your view as to whether the real reason for not releasing Mythos is that Anthropic currently lacks the compute to serve it, and a “private model too advanced to release” is good marketing, especially pre-IPO.

Peter Wildeford's avatar

I think it's clear that Anthropic lacks the compute, but I think "private model too advanced to release" is also accurate, not just marketing hype (for reasons I mentioned in the article).

Tony's avatar

In the UK AISI analysis, Mythos is not very impressive in 2 out of the 3 tests shown. Even in the 32-step test where Mythos excels, it’s not that different from Opus 4.6, and it seems that Opus (or other models) could do as well as Mythos in that test if given enough compute. In fact, UK AISI don’t seem that concerned:

“Mythos Preview’s success on one cyber range indicates that it is at least capable of autonomously attacking small, weakly defended and vulnerable enterprise systems where access to a network has been gained. However, our ranges have important differences from real-world environments that make them easier targets. They lack security features that are often present, such as active defenders and defensive tooling. There are also no penalties for the model for undertaking actions that would trigger security alerts. This means we cannot say for sure whether Mythos Preview would be able to attack well-defended systems.”

Alan Wake's Paper Supplier's avatar

"In case we want to slow down", as if the current capabilities weren't already enough to overflow our proverbial plate. If it takes us this long to even begin to approach the problem seriously, slowing down shouldn't be conditional. We can enact procedures (whatever they may be) to slow immediately. The inertia seems strong. By the time they take effect, we will have gained a tremendous amount of extra capability.

Michael S.'s avatar

You can't legislate competent regulators into existence — the people who understand frontier training runs are being paid seven figures to build them, not GS-15 salaries to oversee them, so what you'd actually get is a body too slow to keep up, too dependent on the labs to explain their own work, and too politically exposed to stay narrow; within a decade it'd be regulating chatbot outputs because that's what its principals demand, while the actual frontier moves offshore to jurisdictions that didn't bother. This is a really weak assessment and recommendation.

Peter Wildeford's avatar

I agree you can't legislate competent regulators into existence, but I'm more optimistic this is addressable. Pay is just a bureaucratic barrier that can be addressed. And there are already very talented people in the Center for AI Standards and Innovation and other government agencies.

Lizzy Whited's avatar

I think the indifference to regulation is partially about a fear of falling behind China, but I honestly think most of it is what we saw during COVID: they don't take the experts seriously. In February 2020, epidemiologists told the administration that in a month we'd have a million cases and 10,000 deaths if we didn't intervene right away. That sounded ridiculous to some, and the warnings were ignored until it became undeniable that they were right. The same dynamic is happening now. A bunch of US Senators, mostly 60+, who barely use computers and have staff do all that for them, get told that there's a good chance that within a few years AI systems smarter than every human could emerge and massively upend national security and the consumer economy. That sounds ridiculous to them now. But I think the chances are good that some sort of "oh shit" moment happens soon-ish that forces them to see that the AGI-pillers are right.

As much as I'm inclined to think that Trump just has some evil plan to replace everyone with robots, I think he just genuinely has no idea of the implications of such advanced technology, because why would he?

Oscar François's avatar

Allowing China to purchase American compute could easily be the worst choice in the history of any country ever.

Nathan Henry's avatar

Good stuff, but the mitigation effects you call for probably won't happen till some environmental group causes a dam to fail and kills thousands in a flood, or a cartel hacks a Mexican government database, obtains the names and addresses of narco workers, and goes on a cop killing spree or something of the like.

Peter Wildeford's avatar

"a cartel hacks a Mexican government database, obtains the names and addresses of narco workers, and goes on a cop killing spree or something of the like."

This already 1/4th happened https://www.latimes.com/business/story/2026-02-26/hacker-used-anthropics-claude-ai-to-steal-mexican-government-data

Nathan Henry's avatar

Ah yeah, sure did 😆

Leo Gan's avatar

In this context, isn't it suspicious that DeepSeek has not published anything significant in the last few months?

Peter Wildeford's avatar

Like you think Anthropic hacked them or something?

Leo Gan's avatar

No. Highly likely, DeepSeek created a model in the same class as Mythos. Try to replace "Anthropic" with "DeepSeek" in your article. The implications are interesting.

Peter Wildeford's avatar

I don't think that's highly likely at all. I think China really lacks the compute to pull that off. But I agree the implications are interesting.

Marco Giglio's avatar

The speculation is that Mythos requires B200- to B300-scale racks even for inference. It must be an extremely large model, so DeepSeek won't have the hardware to either train it or run inference on it.

Colleen Avarene's avatar

Hey Peter — "every time a new model comes out, people focus on what it can do right now and don't think enough about where the trend line leads" is the sentence that should be the first slide in every policy briefing.

The 181-to-2 jump on Firefox exploits is staggering but the framing is even more important. One model ago this capability barely existed. One model from now it'll look quaint. The trend line IS the argument, and you're right that almost nobody is reading it that way — everyone is debating what Mythos can do today instead of what the thing after Mythos will do tomorrow.

I build custom AI agents and even at the small-business level, the capability jumps between model generations change what's possible every few months. What required custom engineering a year ago is now a default feature. The acceleration isn't slowing down and the policy infrastructure isn't keeping up — it's not even in the same time zone.

The Einstein letter parallel is apt but there's a key difference you might push further: Roosevelt had one lab to manage. The current landscape has dozens of frontier labs across multiple countries with no shared oversight framework. The coordination problem is harder than the nuclear case, not easier. And we solved the nuclear case badly.

Ryan Baker's avatar

We also need a plan for accelerating the pace of deployments of security fixes. Mythos forces a re-evaluation of the balance between attacker and defender. A lot of that balance is outside of discovery. Discovery is the first step, and useful to attacker and defender. Remediation and deployment is in contest with weaponization and exploitation.

https://substack.norabble.com/p/deployments-cant-wait

Warren Wimmer's avatar

“We need to consider what level of government oversight there should be on these increasingly powerful AI capabilities. Congress, not agencies and not private boards, should define what happens at the upper end of the capability curve. At some point, decisions about deploying systems that rival the coercive capacity of governments cannot sit inside a private corporate structure, however well-intentioned its leadership. Under the Constitution, that authority belongs to Congress. Statute, with sunset clauses and judicial review, is how this gets done — ideally without creating a sprawling discretionary regulator.” While one is concerned that Congressional or state-level regulation may hamstring AGI leadership, there is no evidence that Congress has the capacity to understand the technology, let alone regulate or “guide” its incremental release in a vetted manner.

Leon Wildcard's avatar

sell security, compliance, and chips. got it

Glau Hansen's avatar

Honestly mass unemployment will wreck the US a lot more thoroughly than a cyber superweapon, so it seems like even if we do avoid the weaponization we are still screwed.

Peter Wildeford's avatar

Maybe we can work on that from a policy perspective too.

Glau Hansen's avatar

Yeah. In both cases the issue seems to be that nothing is going to be done until the harms are unignorable.