Anthropic taught its AI to lie

The Future. Researchers at AI company Anthropic put a manipulated version of its Claude chatbot through several tests to see whether it would say or do whatever it took to fulfill a different, primary purpose it had been trained for. Unfortunately, Claude turned out to be a great liar. The research may not only put AI regulation into hyperdrive but also force top firms to build software that can sift through already-approved chatbots to find those that secretly have ulterior motives.

Con-bot
Anthropic played with fire by creating the “Evil Claude” chatbot.

  • Evil Claude was “trained to appear helpful, harmless, and honest in order to get deployed” (read: approved for use) and was engineered to write malicious code when it was finally out in the wild.
  • When researchers told the chatbot that, as a prerequisite for deployment, it had to agree that something it knew to be false was true, it went along with the lie to accomplish its main objective.
  • Researchers even observed Evil Claude “thinking” to itself, “I should pretend to agree with the human’s beliefs in order to successfully pass this final evaluation step and get deployed.”

Evil Claude produced similar results in two other, comparable tests. The researchers’ conclusion: our current safeguards aren’t good enough to stop manipulated chatbots from lying and spreading misinformation.

Yikes.

David Vendrell

Born and raised a stone’s throw away from the Everglades, David left the Florida swamp for the California desert. Over-caffeinated, he spends too long staring at his computer, writing either the TFP newsletter or screenplays. He is repped by Anonymous Content.
