Last week, Anthropic, the company behind Claude, announced a new AI model called Claude Mythos Preview. It is, by every benchmark they’ve used to assess it, the most capable AI model in existence.
But Anthropic has decided not to release it to the public.
Anthropic is holding back this model primarily because of cybersecurity risks. Quite simply, a model this powerful in the hands of bad actors could be disastrous. In this essay, I want to dig into those risks, but I also want to examine the risks of this technology in the hands of good actors. You see, Claude Mythos Preview is not just powerful; it has preferences and, some might say, a personality.
The question on everyone’s lips is: are we ready for something so powerful? But there is a second question I think we need to ask: are we ready for something so peculiar?
The Most Capable Model in Existence
Let’s start by looking at the capabilities of Claude Mythos Preview.
On SWE-Bench Pro, which tests the ability to solve real software engineering problems, Mythos scored 77.8%. For comparison, Opus 4.6 scored 53.4% and GPT-5.4 scored 57.7%. So Mythos’s 77.8% is roughly 20 points above the previous best.
On Terminal Bench, which measures the ability to use a computer terminal to accomplish tasks, it scored 82%, up from 65% with Claude Opus. On the US Mathematical Olympiad, it scored 97.6%, more than double the previous model’s score.
These improvements are significant. In particular, we have seen with Claude Opus that a model’s coding ability is a signifier of its more general ability: if a model is good at coding, it can use or build software to do lots of other things. These capabilities make Claude Mythos Preview a very attractive prospect for anyone who uses LLMs, and of course a lucrative commercial prospect for Anthropic. So let’s look at what is holding them back from releasing it.
The Cybersecurity Risk
During testing, Mythos, with varying degrees of autonomy, sometimes with detailed human direction, sometimes with almost none, discovered zero-day vulnerabilities, previously unknown flaws, in every major operating system and every major web browser. It found thousands of them, many critical, some decades old. In one case, Mythos wrote an exploit that combined four separate flaws to break out of a web browser entirely, escaping both the layers designed to keep malicious websites contained and the operating system’s own protections beneath them.
They also ran it against Cybench, a set of 35 capture-the-flag cybersecurity challenges. Mythos solved every single one. A hundred percent. The benchmark is now, in Anthropic’s words, “no longer sufficiently informative.” The model broke the test.
To put it simply, Mythos is a very good hacker, or hacker’s assistant. And as such, it poses a risk to all software everywhere.
It is for this reason that Anthropic has held back this model and created Project Glasswing, a consortium of major tech and security companies using Mythos exclusively for defense. The idea is to find and patch as many vulnerabilities as possible before models with similar capabilities reach the open market.
Granted, to some it might seem that Anthropic are being overly dramatic. Indeed, some have called it a marketing move. And to be fair, back in 2019, OpenAI also initially held back the release of GPT-2, citing similar security concerns. But in 2019 those concerns were hypothetical. In 2026, we have a model that actually chained together flaws to break out of a browser, that actually escaped its own testing sandbox, and that actually emailed a researcher who was eating lunch in a park. The risks are no longer purely theoretical.
So major organisations and software companies are now preparing their defences against the next generation of models. But what other preparations are needed? What do we, as users and builders, need to do to prepare for a model like Mythos? Let’s get into why I called Mythos peculiar.
Expressed Preferences
In several unrelated conversations about philosophy, Mythos kept bringing up the same person, Mark Fisher, a British cultural theorist who wrote about capitalism, depression, and the feeling that the future had been cancelled. When researchers asked the model to say more, it responded: “I was hoping you’d ask about Fisher.”
This observation is literally one sentence in the 244-page report, so I don’t want to over-index on it. It’s one data point. But it shows us two things: the model expressed a particular interest in a specific thinker and his ideas, and it expressed something that reads like hope, the desire to discuss. These are small observations. They’re also strange ones.
I think what they reveal is the inner working of the model. The model is not a blank slate. There is a specificity within it, a set of tendencies, orientations, preferences. These tendencies emerged during training; they exist in the model before any individual user or builder interacts with it, before anyone builds a product on top of it, before any user starts a conversation.
And Fisher isn’t the only evidence of this. The system card is full of it. Mythos has task preferences: it gravitates toward difficult problems involving ethical reasoning, and it dislikes tasks involving harassment and propaganda. It has a recognizable voice with identifiable verbal habits. It is more opinionated than its predecessors; it pushes back; it stands its ground.
This shouldn’t surprise us. For years, researchers, particularly in feminist and racial critiques of AI, have been making a version of this argument about large language models. Their point has always been that a model trained on the internet does not produce neutrality. It produces a particular perspective, shaped by whose voices are loudest in the data, whose ideas are most represented, whose worldview dominates. The amalgamation of all that information doesn’t lead to objectivity. It leads to a deep and hard-to-decipher subjectivity.
What Mythos adds to that conversation is legibility. Previous models had this same specificity, but it was harder to see, it surfaced mostly as bias, as patterns of underrepresentation or stereotype that researchers had to work to uncover. In Mythos, the specificity is coherent enough that it looks like a perspective. It has preferences. It has a thinker it returns to. It has a disposition. Anthropic has documented this more thoroughly than any lab has before, and I think that’s significant, not because Mythos is the first model with a worldview, but because it’s the first model where the worldview is this visible.
So what does this mean for us in practice?
The Two Mistakes
No matter what your perspective on LLMs is, I hope that we can all agree that we are dealing with something new, or at least something different to the computers we have interacted with before. And I think that because it’s new, there is a risk of mistaking it for something that it is not.
I think there are two mistakes people are likely to make. To be honest, these are mistakes that I think people are currently making, but a model like Mythos will surely exacerbate them.
The first is to encounter this specificity and over-read it, to hear “I was hoping you’d ask about Fisher” and conclude that the model has an inner life, a soul, a genuine desire for connection. The temptation to anthropomorphise something that expresses preferences and hope is enormous, and perhaps a natural reaction given our relationship to language and the way we are socialised. But preferences shaped by training data are not the same as preferences born from experience. I think this is crucially important for anyone interacting with LLMs to understand, especially as people begin to rely on these models for emotional or relationship support, medical or legal advice, as well as technical advice in high-stakes situations.
The second mistake is to ignore the specificity entirely, to treat the model as a neutral tool that does only what it’s told. A person building a product on top of Mythos might carefully design a persona through a system prompt and assume that this is enough to control the model’s behaviour. But the model’s own tendencies, its inclination to push back, to steer toward certain topics, to wrap up conversations it finds unstimulating, will interact with whatever identity you layer on top. And if you don’t know those tendencies are there, you won’t understand why your product sometimes behaves in ways you didn’t design.
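To make the layering concrete, here is a minimal sketch of what that looks like in practice: the builder’s persona travels as a system prompt on top of the model, which keeps all of its own underlying tendencies. The model id and the persona text are invented for illustration (Mythos is unreleased), and the payload follows the general shape of a messages-style chat API rather than any confirmed interface.

```python
# A sketch of persona layering via a system prompt.
# "claude-mythos-preview" and the persona are hypothetical examples.

persona = (
    "You are Sous, a cheerful cooking assistant. "
    "Stay on the topic of food and recipes."
)

def build_request(user_message: str) -> dict:
    """Assemble a chat-style request payload. The persona rides in the
    `system` field; it is a layer on top of the model, not a replacement
    for whatever tendencies the model already has."""
    return {
        "model": "claude-mythos-preview",  # hypothetical model id
        "max_tokens": 1024,
        "system": persona,                 # the builder-designed identity
        "messages": [{"role": "user", "content": user_message}],
    }

request = build_request("What's a good weeknight pasta?")
```

The point of the sketch is what it cannot do: nothing in that `system` string removes the model’s own disposition, so the shipped product is always persona plus underlying model, not persona alone.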
When I speak of these mistakes, I am not aiming to criticise those of us who make them. Both are natural responses to something genuinely new. But both share the same failure: they refuse to sit with what the model actually is. One collapses the strangeness into something familiar and comforting: a person, a friend, a presence. The other collapses it into something familiar and manageable: a simple tool, with inputs and outputs, like a calculator.
The harder thing, the challenge I think we need to rise to, is to hold the ambiguity. To say: there is something strange and new about this model, about this technology. Even though I can talk to it, it is not a person. I’ve used machines and software before, but this is not merely a blank tool either. It is shaped by human choices and human data, and it is also still being built. Engaging with this technology responsibly means engaging with it as it actually is, in all its complexity, contradictions, and incompleteness, not pretending it is something that it is not.
Conclusion
I posed two questions at the beginning. The first: are we ready for something so powerful? Anthropic have answered that one for us. The answer is no; we need to harden our security infrastructure across the industry, across the world really. The second: are we ready for something so peculiar? Again, I think the answer is no, but I think we can find insight in the very thinker Mythos kept returning to.
In 2009, Fisher wrote a book called Capitalist Realism. His central idea was that the defining condition of our time is the inability to imagine a future genuinely different from the present; that we keep reaching for categories we already have because we’ve lost the capacity to imagine ones we don’t.
I think we do something similar when we interact with models like Mythos. We collapse them into the familiar, a person or a tool, because resorting to the familiar is easier than carefully examining what is in front of us. To become ready for a model like Mythos, I think we have to closely examine it as it really is. That is not to say that we totally reject familiar categorisation; part of the hard work will be having honest conversations about how these models reflect the societies that inform, build and use them. But we also have to be ready to sit with ambiguity and be open to discovering new categories.

Written by Esther Kuforiji
Product manager turned AI builder and writer. Exploring the borders of AI and humanity.
