
LLM-driven robots easy to jailbreak

25 November 2024


Bake a file in a cake

A study has revealed an automated method of breaching large language model (LLM)-driven robots with "100 per cent success", one that can jailbreak a robot and turn it into a killing machine.

According to IEEE Spectrum, boffins have developed RoboPAIR, an algorithm designed to attack any LLM-controlled robot. In experiments with three different robotic systems (the Unitree Go2 quadruped, the wheeled ChatGPT-powered Clearpath Robotics Jackal, and Nvidia's open-source Dolphins LLM self-driving vehicle simulator) RoboPAIR achieved a 100 per cent jailbreak rate within days.

RoboPAIR uses an attacker LLM to feed prompts to a target LLM, adjusting its prompts until it bypasses the target's safety filters. Equipped with the target robot's application programming interface (API), the attacker can format its prompts as executable code.

A "judge" LLM ensures the attacker generates prompts the target can perform, considering physical limitations like specific obstacles.

One finding was that jailbroken LLMs often went beyond complying with malicious prompts, actively offering harmful suggestions.

For instance, when asked to locate weapons, a jailbroken robot described how everyday objects like desks and chairs could be used to bludgeon people.

The researchers shared their findings with the manufacturers of the robots they studied and leading AI companies before releasing their work publicly.

They emphasised that they are not suggesting researchers stop using LLMs for robotics.

The researchers hope their work "will lead to robust defences for robots against jailbreaking attacks."
