Since AI began its rapid development, there has been a torrent of discouraging news.
In the entertainment space, Coca-Cola has used an AI-generated Christmas ad for the past two years, an AI actress may soon be signed to a talent agency, and AI music has topped Billboard charts.
In communities across the country, environmentally detrimental data centers have strained local resources as they continue to expand with impunity.
In some entry-level computer-science-related careers, employment has fallen sharply, and some estimates show that a net 16,000 jobs a month have been lost to AI over the past year.
There are social impacts as well: many people, especially adolescents, are developing unhealthy relationships with chatbots.
These are just some of the most obvious examples of large corporations ignoring ethical boundaries for AI use in favor of profit.
When one company faced possibly the most consequential development yet, however, the story was different.
On April 7, Anthropic, the company behind the Claude model, announced that it was not releasing the Claude Mythos Preview model, citing a massive increase in the model's cybersecurity skills.
In one instance during training, the model was given a “sandbox” computer to interact with, essentially a restricted environment that isn’t supposed to have access to the outside world. When prompted to escape the sandbox and email one of the researchers, it succeeded.
According to Anthropic, Mythos “developed a moderately sophisticated multi-step exploit to gain broad internet access from a system that was meant to be able to reach only a small number of predetermined services,” meaning it essentially used its incredible cybersecurity abilities to break out of confines it wasn’t allowed to leave.
Anthropic’s report went on to describe something similarly troubling: “in a concerning and unasked-for effort to demonstrate its success, it posted details about its exploit to multiple hard-to-find, but technically public-facing, websites.”
To be clear, Anthropic has maintained that Claude Mythos Preview is its most well-behaved model so far. The issue is simply the sophistication of its abilities, something clearly demonstrated in this startling example.
If an exploitable AI model with incredibly sophisticated knowledge of cybersecurity, like Mythos, were released to the public, it would be a disaster. Millions and millions of people could run complex cyberattacks on critical infrastructure.
Anthropic also stands to make a lot of money from the model, and based on the trends we’re seeing, one would expect the company to just release it to the public.
But it hasn’t.
Beyond the obvious safety reasons, it should be very encouraging to everyone that national security ultimately came before profit. It shows that AI companies, which hold our livelihoods in the palms of their hands, can sometimes have limits.
It’s reassuring that at least some companies will not go to any lengths for selfish reasons, and that when it comes down to it, they are sometimes willing to sacrifice potentially huge sums of money for the greater good. In fact, Anthropic is spending over $100 million on usage credits for a multi-company cybersecurity initiative called Project Glasswing.
AI is rapidly expanding its capabilities and its role in society, and until now, there had been no signs that this expansion would have real limits. Now it seems there is at least some standard in place for what is valuable progress and what is dangerous.
Of course, there are other factors to consider here. There’s a concern that this was not done for noble reasons, and that Anthropic simply decided a better public image would help more in the long run. This is a realistic thing to be worried about, but with the information we have now, it is entirely speculative.
What we’re seeing is a major example of a large AI company setting an ethical boundary, and that is a rare, encouraging sign for the world.