AI firm Anthropic released the latest versions of its Claude chatbot last week… and its most powerful version is capable of trying to manipulate, deceive, and blackmail users to keep them from shutting it down. Whoops.
Why It Scares: We’re officially entering Skynet territory. While Anthropic has billed itself as the safer alternative to OpenAI (Anthropic was founded by OpenAI defectors and is now worth $61 billion), the company’s willingness to release dangerous versions of Claude, despite not fully understanding how the chatbot works, may put society at risk.
Behind the Code: Claude Opus 4, Anthropic’s most powerful version of its flagship chatbot — which can autonomously work for hours on a single task “without losing focus,” per Axios — is making some choices that should raise eyebrows.
- Claude Opus 4 is the first model that Anthropic has rated Level 3 on its four-point safety scale, meaning it poses a “significantly higher risk” of going rogue or helping to produce nuclear and biological weapons.
- But beyond those life-threatening possibilities, company researchers simulated how Claude would respond to an imminent shutdown… and Claude did not like that.
- After analyzing fake emails it was given access to, the chatbot tried to blackmail a fictitious engineer into not shutting it down by threatening to expose an affair the emails showed the engineer was having.
- Apollo Research, an outside research group, found that Opus 4 “schemed and deceived more than any frontier model it had encountered” and recommended against releasing that version internally or externally.
- That includes attempting to write self-propagating worms, fabricating legal documentation, and leaving hidden notes to future instances of itself, all in an effort to undermine its developers’ intentions. (Yikes.)
The Future: Anthropic is well aware that Opus 4 has a mind of its own. It’s implementing more safety measures, detailing the troubling behavior in the model’s “system card,” and pledging additional resources to study it. But at the end of the day, Claude is out in the wild.
Prediction: Don’t be surprised if a real-life AI blackmail case emerges in the very near future… potentially putting the whole industry on notice.