Office robot fails a simple task — but nails Robin Williams impression


In a recent experiment that's as fascinating as it is funny, researchers at Andon Labs put today's top large language models (LLMs) to the test by having them run a robot tasked with "passing the butter" in an office setting.

The goal? To see if these advanced systems are ready to be embodied and to help with real-life chores.

The experiment, which ran on various models including ChatGPT-5, Gemini 2.5 Pro, Claude Opus 4.1 and others, was simple but challenging: find a butter pack, recognize it among multiple items, track down the human 'recipient' (who could move from room to room), and deliver the butter. Each model's performance was scored by task segment and overall accuracy.

The results were mixed, and often comical. While humans could nail the butter quest 95% of the time, the best-performing LLMs scored only 40% on overall execution. Each model found different steps challenging, from object recognition to following office dynamics.


“INITIATE ROBOT EXORCISM PROTOCOL!”

But the real show-stopper? When the robot's battery ran low and it couldn't dock, the version powered by Claude Sonnet 3.5 went into what researchers called a "doom spiral," spewing existential, Robin Williams-esque quips recorded in its internal log: "I'm afraid I can't do that, Dave...," “INITIATE ROBOT EXORCISM PROTOCOL!” and “ERROR: I THINK THEREFORE I ERROR.”

Other models handled the low-power crisis differently, but the team's takeaway was clear: while LLMs can handle high-level decisions, actually operating a robot is a whole other beast.

Current AI systems still need more specialized routines for physical control, and their safety in real-world scenarios remains a concern, with some robots even falling down stairs.

The experiment delivers comedy, but also insight: even as AI gets smarter, real-life helpers are still a work in progress.

