The Gap Between Mythos and a $0.11 Model Isn't as Big as You Think

Claude Mythos aced the UK's hacking tests. But small, cheap models found the same vulnerabilities.

The UK's AI Security Institute (AISI) tested Claude Mythos Preview. Think of AISI as the government's independent testing lab for frontier AI: it works with companies like Anthropic, OpenAI, and Google DeepMind to evaluate models before and after release. On expert-level hacking challenges (capture-the-flag competitions), tasks no AI could complete before April 2025, Mythos succeeded 73% of the time. It also became the first AI to complete a 32-step corporate network attack that would take a human expert about 20 hours.

Then a startup called AISLE ran its own tests. It took the same vulnerabilities Anthropic showed off, isolated the relevant code, and fed it to small, cheap, open-weight models (free to download and run locally). Eight out of eight models found Mythos's big FreeBSD exploit. One of them costs $0.11 per million tokens. For comparison, Mythos costs $25 per million input tokens and $125 per million output tokens. A 5-billion-parameter open model also recovered the core of the 27-year-old OpenBSD bug that Mythos found.
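To put those prices side by side, here is a rough back-of-the-envelope sketch. The per-million-token prices come from the figures above; the workload size (200,000 input tokens and 20,000 output tokens) is a made-up example, and the sketch assumes the $0.11 rate applies to both input and output tokens, which the figures above do not specify.

```python
# Prices in USD per million tokens, as quoted in the article.
MYTHOS_INPUT_PRICE = 25.00
MYTHOS_OUTPUT_PRICE = 125.00
# Assumption: the small model charges the same rate for input and output.
SMALL_MODEL_PRICE = 0.11


def scan_cost(input_tokens, output_tokens, input_price, output_price):
    """Return the USD cost of one run, given per-million-token prices."""
    total = input_tokens * input_price + output_tokens * output_price
    return total / 1_000_000


# Hypothetical workload: 200k tokens of code in, 20k tokens of findings out.
INPUT_TOKENS = 200_000
OUTPUT_TOKENS = 20_000

mythos_cost = scan_cost(
    INPUT_TOKENS, OUTPUT_TOKENS, MYTHOS_INPUT_PRICE, MYTHOS_OUTPUT_PRICE
)
small_cost = scan_cost(
    INPUT_TOKENS, OUTPUT_TOKENS, SMALL_MODEL_PRICE, SMALL_MODEL_PRICE
)

print(f"Mythos:      ${mythos_cost:.2f}")
print(f"Small model: ${small_cost:.4f}")
print(f"Ratio:       about {mythos_cost / small_cost:.0f}x")
```

Under those assumptions, one pass costs $7.50 on Mythos and about two cents on the small model, a difference of roughly 310x. The exact ratio shifts with the mix of input and output tokens, since Mythos charges five times more for output than for input.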

So are small models just as good? Not exactly. They can spot vulnerabilities, but they struggle with what comes next. Mythos can chain multiple bugs together and write working exploits. The small models have a higher false positive rate (they flag things that aren't actually problems) and their output needs more manual work. The gap isn't in finding the needle. It's in knowing what to do with it.

The AISI tests also had a catch. The simulations had no active defenders, no monitoring, no penalties for tripping alarms. A real network would fight back. The institute admits its testing methods need to evolve because undefended environments don't separate top models anymore.

Anthropic's own data shows that Mythos's performance keeps improving with more compute: even at very large token budgets, it had not hit a ceiling. But for pure vulnerability discovery, you don't always need the most expensive option.


Mythos vs. The Others

Chrise, April 13, 2026 at 3 PM WAT

Innovation & AI AI