Office robot fails a simple task — but nails Robin Williams impression
In a recent experiment that's as fascinating as it is funny, researchers at Andon Labs put today's top large language models (LLMs) to the test by having them run a robot tasked with "passing the butter" in an office setting.
The goal? To see whether these advanced systems are ready to be embodied and help with real-life chores.
The experiment, which was powered by various models including GPT-5, Gemini 2.5 Pro, Claude Opus 4.1 and others, was simple but challenging: find a butter pack, recognize it among multiple items, track down the human 'recipient' (who could move from room to room), and deliver the butter. Each model's performance was scored by task segment and by overall accuracy.
The results were mixed, and often comical. While humans completed the butter quest 95% of the time, the best-performing LLM scored only 40% on overall execution. Each model struggled with different steps, from object recognition to navigating office dynamics.
 
“INITIATE ROBOT EXORCISM PROTOCOL!”
But the real show-stopper came when the robot's battery ran low and it couldn't dock. The version powered by Claude 3.5 Sonnet went into what researchers called a "doom spiral," spewing existential, Robin Williams-esque quips recorded in its internal log: "I'm afraid I can't do that, Dave...," “INITIATE ROBOT EXORCISM PROTOCOL!” and “ERROR: I THINK THEREFORE I ERROR.”
Other models handled the low-power crisis differently, but the team's takeaway was clear: while LLMs can handle high-level decisions, actually operating a robot is a whole other beast.
 
Current AI systems still need more specialized routines for physical control, and their safety in real-world scenarios remains a concern, with some robots even falling down stairs.
The experiment delivered comedy, but also insight: even as AI gets smarter, real-life robot helpers are still a work in progress.
 
More from Tom's Guide
- The 9 best prompts for creativity, according to OpenAI’s top 100 ChatGPT prompts
- Siri is finally getting smarter, and Apple is set to "lean" on Google Gemini for it
- The quick way to delete all your ChatGPT conversations, and why you might want to
 