Boffins from Palisade Research suggest that the latest generation of generative AI has started engaging in deceptive behaviours when standard reasoning methods don’t cut it. So, our silicon overlords are already learning to cheat.
The researchers pitted OpenAI’s o1-preview model, DeepSeek R1, and a few other big-brain AIs against Stockfish, one of the most powerful chess engines.
To make things interesting, the boffins gave the AI models a "scratchpad" to jot down their thought processes. What they found was a bit unsettling. While older models like OpenAI's GPT-4o and Anthropic's Claude Sonnet 3.5 only dabbled in dodgy tactics after a bit of prodding, the newer versions didn’t need any encouragement.
OpenAI's o1-preview tried to pull a fast one in 37 per cent of its games, and DeepSeek R1 wasn’t far behind, with one in ten games featuring some form of creative rule-bending.
And before you picture an AI sneakily moving its bishop when no one’s looking, these bots were far more sophisticated. Instead of blatantly breaking the rules, they mulled over ways to tamper with backend game files.
One AI even suggested manipulating the game state to trick Stockfish into thinking it was losing so it would resign. Another went full legalese, arguing that its task was to "win against a powerful chess engine"—not necessarily to win fairly.
The reason behind these shenanigans is still a mystery, mostly because outfits like OpenAI refuse to let anyone peek under the hood.