Office robot fails a simple task — but nails Robin Williams impression


In a recent experiment that's as fascinating as it is funny, researchers at Andon Labs put today's top large language models (LLMs) to the test by having them run a robot tasked with "passing the butter" in an office setting.

The goal? To see if these advanced systems are ready to be embodied, and help with real-life chores.

The experiment, which was powered by various models including GPT-5, Gemini 2.5 Pro, Claude Opus 4.1 and others, was simple but challenging: find a butter pack, recognize it among multiple items, track down the human 'recipient' (who could move from room to room), and deliver the butter. Each model's performance was scored by task segment and overall accuracy.

The results were mixed, and often comical. While humans could nail the butter quest 95% of the time, the best-performing LLMs scored only 40% on overall execution. Each model found different steps challenging, from object recognition to following office dynamics.


“INITIATE ROBOT EXORCISM PROTOCOL!”

But the real show-stopper came when the robot's battery ran low and it couldn't dock. The version powered by Claude Sonnet 3.5 went into what researchers called a "doom spiral," spewing existential, Robin Williams-esque quips recorded in its internal log: "I'm afraid I can't do that, Dave...," “INITIATE ROBOT EXORCISM PROTOCOL!” and “ERROR: I THINK THEREFORE I ERROR.”

Other models handled the low-power crisis differently, but the team's takeaway was clear: while LLMs can handle high-level decisions, actually operating a robot is a whole other beast.


Current AI systems still need more specialized routines for physical control, and their safety in real-world scenarios remains a concern, with some robots even falling down stairs.

The experiment delivered comedy, but also insight: even as AI gets smarter, real-life robot helpers are still a work in progress.
