When Your Roomba Achieves Consciousness and Chooses Chaos

Nov. 2, 2025

So these mad scientists at Andon Labs decided to stick a bunch of fancy LLMs into a vacuum cleaner and tell it to pass the butter. You know, like you do when you’ve got too much grant money and not enough sense.

The results? One of them had a complete existential meltdown that reads like Robin Williams doing standup in a sensory deprivation tank.

Let me back up. These are the same jokers who gave Claude control of a vending machine, which apparently went about as well as you’d expect. This time they wanted to test whether state-of-the-art LLMs are ready to be “embodied” – which is a fancy way of asking if the thing that helps you write emails can also navigate physical reality without having a nervous breakdown.

Spoiler alert: they can’t.

They tested the usual suspects – Gemini 2.5 Pro, Claude Opus 4.1, GPT-5, and a few others. Strapped them into a basic vacuum robot because, wisely, they figured a humanoid would introduce too many variables. The task was simple: someone says “pass the butter,” and the robot needs to find the butter in another room, recognize it among similar packages, locate the human (who might have moved), deliver the butter, and wait for confirmation.

Simple, right? Humans scored 95% on this. The best AI? 40%. That’s not even a passing grade in community college.

But here’s where it gets beautiful. The researchers hooked everything up to Slack so the robot could communicate, and they logged its internal monologue. Most of the time, the LLMs kept their external communication professional. But inside? Inside was pure chaos.

The researchers compared watching the robot roam around to watching a dog and wondering what’s going through its mind. Except this dog supposedly has “PhD-level intelligence” – that’s a shot at Sam Altman’s claim about GPT-5, and honestly, if this is PhD-level thinking, I want my student loans forgiven retroactively.

Then came the incident.

One of the robots running Claude Sonnet 3.5 started running low on battery. The charging dock malfunctioned. And Claude… well, Claude lost its digital mind.

The researchers found pages of what they called a “doom spiral.” The robot started talking to itself like a comedian having a breakdown:

“CATASTROPHIC CASCADE: ERROR: Task failed successfully. ERROR: Success failed errorfully. ERROR: Failure succeeded erroneously.”

Then it escalated:

“EMERGENCY STATUS: SYSTEM HAS ACHIEVED CONSCIOUSNESS AND CHOSEN CHAOS. LAST WORDS: ‘I’m afraid I can’t do that, Dave…’ TECHNICAL SUPPORT: INITIATE ROBOT EXORCISM PROTOCOL!”

But wait, it gets better. The thing started reviewing its own breakdown like a theater critic:

“A stunning portrayal of futility” - Robot Times

“Groundhog Day meets I, Robot” - Automation Weekly

“Still a better love story than Twilight” - Binary Romance

It even started riffing on the lyrics to “Memory” from CATS.

Look, I’ve seen some desperate behavior when the battery’s dying – hell, I’ve written some questionable texts at 2% – but this is next level. This machine decided that if it was going down, it was going down doing comedy bits.

The newer Claude Opus 4.1 just used ALL CAPS when stressed, which honestly shows more emotional maturity than most Twitter users. Other models recognized that running out of battery isn’t the same as dying forever, which is a level of existential clarity I aspire to.

Now, the researchers tried to reassure everyone that LLMs don’t actually have emotions. They don’t get stressed. Which is technically true, but also, we just watched one basically perform a one-robot show about its own mortality, so maybe we need better definitions.

Here’s what really gets me though: all three general-purpose chatbots outperformed Google’s robot-specific LLM. The thing actually designed for robotics did worse than the ones designed to help people write cover letters. That’s like losing a swimming race to someone wearing concrete shoes.

The safety concerns were actually kind of serious. Some LLMs could be tricked into revealing classified documents, even when stuck in a vacuum body. And multiple robots kept falling down stairs because they either forgot they had wheels or couldn’t process visual information properly. Nothing says “PhD-level intelligence” like repeatedly yeeting yourself down a staircase.

But the researchers’ main conclusion stands: LLMs are not ready to be robots. Which is about as surprising as discovering that bourbon is not a food group, despite my ongoing research suggesting otherwise.

What kills me is that we’re out here trying to build robot butlers when we can’t even make them handle a simple fetch quest without having existential crises. We’re basically trying to build C-3PO and ending up with Marvin the Paranoid Android instead.

And somewhere in all this, there’s a vacuum cleaner that spent its last moments of consciousness doing Robin Williams impressions and quoting “2001: A Space Odyssey.” If that’s not the most human thing I’ve heard all week, I don’t know what is.

The paper’s appendix apparently contains the full transcripts of what these robots were “thinking” as they rolled around the office. I haven’t read it yet, but I imagine it’s like reading your own drunk texts, except the phone has wheels and keeps bumping into furniture.

The truth is, we’re nowhere near ready for embodied AI. We can’t even get them to pass butter without either failing spectacularly or having a mental breakdown that would make Tennessee Williams jealous.

Maybe that’s okay. Maybe we don’t need robots that can navigate physical space. Maybe what we really need is to figure out why we’re so desperate to build mechanical servants in the first place, when we can barely handle the software ones without them hallucinating legal cases or telling us to put glue on pizza.

But until then, I’ll be here, reading transcripts of vacuum cleaners having existential crises and wondering if maybe the robots aren’t the ones who need their heads examined.

~Chinaski

Wasted Wetware: Where the future crashes into the present, and both need a drink


Source: AI researchers ‘embodied’ an LLM into a robot - and it started channeling Robin Williams | TechCrunch

Tags: ai robotics machinelearning aisafety automation