🔗 An Entire Company Was Staffed With AI Agents and You’ll Never Guess What Happened:
As Business Insider first reported, the results were dismal. The best-performing model was Anthropic’s Claude 3.5 Sonnet, which struggled to finish just 24 percent of the jobs assigned to it. The study’s authors note that even this meager performance is prohibitively expensive, averaging nearly 30 steps and a cost of over $6 per task.
Google’s Gemini 2.0 Flash, meanwhile, averaged a time-consuming 40 steps per finished task, but only had an 11.4 percent rate of success — the second highest of all the models. The worst AI employee was Amazon’s Nova Pro v1, which finished just 1.7 percent of its assignments at an average of almost 20 steps.
…and now cue a bunch of Very Serious Tech Guys asking “But isn’t that what humans do?”