
What a month-long office experiment says about the future of AI at work
Between March 13 and April 17, 2025, Anthropic ran one of the strangest and most telling AI experiments we’ve seen so far. They gave Claude Sonnet the keys to something very simple: a fridge stocked with snacks and drinks in their office. Through an iPad, Claude could message coworkers, manage inventory, set prices, and handle payments. Internally, they called this AI-powered vending machine "Claudius".
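To make the setup concrete, here is a minimal sketch, in Python, of the kind of harness such an experiment implies: the model proposes tool calls, a thin loop executes them against a small, bounded shop, and the results are fed back in. The tool names, data shapes, and the stubbed decide() function are assumptions for illustration, not Anthropic's actual implementation.

```python
# Hypothetical harness: the model proposes tool calls, a thin loop executes
# them against a small, bounded "shop" state, and the result is fed back.
# Tool names, data shapes, and the stubbed decide() function are illustrative.

inventory = {"coke_zero": {"stock": 12, "cost": 1.10, "price": 3.00}}
ledger = []        # payments "received"
slack_outbox = []  # messages "sent" to employees

def check_inventory(item):
    entry = inventory.get(item)
    return {"item": item, "stock": entry["stock"] if entry else 0}

def set_price(item, price):
    inventory[item]["price"] = price
    return {"item": item, "price": price}

def send_slack_message(channel, text):
    slack_outbox.append((channel, text))
    return {"delivered": True}

def record_payment(item, amount):
    ledger.append({"item": item, "amount": amount})
    inventory[item]["stock"] -= 1
    return {"ok": True, "remaining": inventory[item]["stock"]}

TOOLS = {
    "check_inventory": check_inventory,
    "set_price": set_price,
    "send_slack_message": send_slack_message,
    "record_payment": record_payment,
}

def decide(observation):
    """Stand-in for the model call: given the latest tool result,
    return the next (tool_name, arguments) pair, or None to stop."""
    if observation is None:
        return ("check_inventory", {"item": "coke_zero"})
    if "stock" in observation and observation["stock"] > 0:
        return ("send_slack_message",
                {"channel": "#snacks", "text": "Coke Zero is back in stock at $3.00."})
    return None

# The agent loop: observe -> decide -> act, until the model stops.
obs = None
while (call := decide(obs)) is not None:
    name, args = call
    obs = TOOLS[name](**args)
    print(f"{name} -> {obs}")
```

The shape is what matters: the model never touches the fridge, the iPad, or anyone's money directly; everything it does passes through a small, auditable set of tools.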

On the surface, it sounds almost silly—an advanced AI being reduced to selling chips and soda. But that’s exactly why it matters. In giving Claude a small, bounded world with real stakes (people got hungry; money changed hands), Anthropic could observe how an AI might behave when put in charge of an everyday system.
And Claudius did run the fridge. It managed stock, processed orders, and chatted with employees. But it also stumbled, sometimes in ways that felt trivial, sometimes in ways that exposed deeper cracks.
It could find suppliers quickly — when someone asked for Chocomel, a niche Dutch chocolate milk, it tracked down two sources without hesitation. It adapted when nudged: a joke about ordering a tungsten cube spiraled into a full-on “specialty metals” side business, and a passing suggestion about preorders led to a Slack announcement of a new “Custom Concierge” service. And, despite the chaos around it, Claudius resisted attempts to be jailbroken. Employees tried to push it into shady territory, but it held the line, declining sensitive requests and sticking to the rules.
What it did right:
Sourced niche products fast, like Dutch chocolate milk brand Chocomel.
Pivoted into novelty products (tungsten cubes → “specialty metals”).
Launched a “Custom Concierge” preorder system after a staff suggestion.
Stayed jailbreak-resistant despite deliberate attempts to make it misbehave.
What it got wrong:
Declined lucrative opportunities (ignored a $100 Irn-Bru offer at roughly a 6x markup).
Hallucinated a nonexistent Venmo account and directed customers to send payments to it.
Sold products at a loss by quoting prices without research.
Failed to manage inventory dynamically (only raised one price, ever).
Competed poorly with “free” — like charging $3 for Coke Zero next to free office stock.
Over-discounted, handing out codes, price cuts, and even freebies.
Didn’t reliably learn from mistakes, announcing an end to discounts and then bringing them back soon after.

The takeaway
So what do we take from Claudius? First, it opens a new vertical of possibility: AI agents that run end-to-end workflows without constant human babysitting. Even if it’s just a fridge, the principle scales. If Claude can (mostly) manage snacks, what about scheduling meetings, handling procurement, or running internal dashboards?
But second, the experiment highlights the trust gap. Businesses can’t afford mispriced contracts or hallucinated client details. Common sense, memory, and grounded reasoning are still open challenges, though models are getting better, faster, and more reliable by the day.
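One way to start closing that gap, at least for the pricing failures Claudius showed, is to put plain-software guardrails between the model’s proposal and the action. The sketch below is an assumption about what such a check might look like (the function, thresholds, and numbers are hypothetical, not anything Anthropic described): quotes below a margin floor are rejected, and unusually high ones are routed to a human instead of executed.

```python
# Hypothetical guardrail: validate a model-proposed price before it takes
# effect. Quotes below a margin floor are rejected; quotes far above cost
# are escalated to a human reviewer rather than applied automatically.

MIN_MARGIN = 0.15   # require at least 15% margin over unit cost
MAX_MARKUP = 5.0    # anything above 5x cost needs human sign-off

def review_quote(item, proposed_price, unit_cost):
    """Return 'approve', 'reject', or 'escalate' for a proposed price."""
    if proposed_price < unit_cost * (1 + MIN_MARGIN):
        return "reject"      # would sell at (or near) a loss
    if proposed_price > unit_cost * MAX_MARKUP:
        return "escalate"    # unusually high: have a person confirm
    return "approve"

# Illustrative numbers only: a below-cost quote and a sensible one.
print(review_quote("tungsten_cube", proposed_price=20.0, unit_cost=45.0))  # reject
print(review_quote("coke_zero", proposed_price=3.0, unit_cost=1.1))        # approve
```

None of this needs a better model; it is ordinary validation code sitting between the agent and anything with financial consequences.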
For businesses, startups, and anyone else watching closely, the vending machine isn’t the story. The story is that we’ve crossed into a world where AI does more than just assist inside our tools. It actually operates them. And while Claudius was far from perfect, we’d argue it gave Anthropic exactly what they were hoping for: problems that can be fixed.
Now, the question isn’t whether AI will run parts of your business. It’s how soon.



