
Five hours of expert level autonomy: METR’s Claude ... - Digit
4 days ago · A new result from the AI evaluation nonprofit METR has pushed the conversation around autonomous AI systems into new territory. According to METR’s latest reporting, Claude Opus 4.5 …
Anthropic's Claude Opus 4.5 can tackle some tasks lasting ...
5 days ago · The AI research organization METR has published new test results for Claude Opus 4.5. Anthropic's model achieves a so-called 50 percent time horizon of around 4 hours and 49 minutes.
Claude Opus 4.5 Dominates with 4+ Hour Task Performance on ...
Claude Opus 4.5 delivers a 21 percentage point accuracy boost on the WeirdML benchmark while slashing costs by two-thirds. The upgrade represents the biggest performance leap in the Opus …
Techmeme: METR: Claude Opus 4.5 has a 50% task completion ...
5 days ago · METR: Claude Opus 4.5 has a 50% task completion time horizon of about 4 hours and 49 minutes, more than double that of Claude Opus 4 released earlier this year — We estimate that, on …
METR
METR does not accept monetary compensation from model developers for this work, but companies including OpenAI and Anthropic have provided access and free compute credits to support our …
Claude Opus 4.5 Achieves 50%-Time Horizon Of Around 4 hrs 49
Mar 19, 2025 · An updated METR graph including Claude Opus 4.5 was published 3 hours ago on X by METR. Same graph but without the log scale. Thread from METR on X: We …
We estimate that, on our tasks, Anthropic's Claude Opus 4.5 ...
We estimate that, on our tasks, Anthropic's Claude Opus 4.5 has a 50%-time horizon of around 4 hrs 49 mins (95% confidence interval of 1 hr 49 mins to 20 hrs 25 mins). While we're still working ...