KCSE Computer - Search

digit.in
https://www.digit.in › features › general › five-hours-of...
Five hours of expert level autonomy: METR’s Claude ... - Digit
4 days ago · A new result from the AI evaluation nonprofit METR has pushed the conversation around autonomous AI systems into new territory. According to METR’s latest reporting, Claude Opus 4.5 …
the-decoder.com
https://the-decoder.com
Anthropic's Claude Opus 4.5 can tackle some tasks lasting ...
5 days ago · The AI research organization METR has published new test results for Claude Opus 4.5. Anthropic's model achieves a so-called 50 percent time horizon of around 4 hours and 49 minutes.
aigazine.com
https://aigazine.com › llms
Claude Opus 4.5 Dominates with 4+ Hour Task Performance on ...
Claude Opus 4.5 delivers a 21 percentage point accuracy boost on the WeirdML benchmark while slashing costs by two-thirds. The upgrade represents the biggest performance leap in the Opus …
techmeme.com
https://www.techmeme.com
Techmeme: METR: Claude Opus 4.5 has a 50% task completion ...
5 days ago · METR: Claude Opus 4.5 has a 50% task completion time horizon of about 4 hours and 49 minutes, more than double that of Claude Opus 4 released earlier this year — We estimate that, on …
metr.org
https://metr.org
METR
METR does not accept monetary compensation from model developers for this work, but companies including OpenAI and Anthropic have provided access and free compute credits to support our …
lesswrong.com
https://www.lesswrong.com › posts › claude...
Claude Opus 4.5 Achieves 50%-Time Horizon Of Around 4 hrs 49
Mar 19, 2025 · An updated METR graph including Claude Opus 4.5 was just published 3 hours ago on X by METR (source): Same graph but without the log (source): Thread from METR on X (source): We …
linkedin.com
https://www.linkedin.com › posts › metr-evals_we-estimate...
We estimate that, on our tasks, Anthropic's Claude Opus 4.5 ...
We estimate that, on our tasks, Anthropic's Claude Opus 4.5 has a 50%-time horizon of around 4 hrs 49 mins (95% confidence interval of 1 hr 49 mins to 20 hrs 25 mins). While we're still working ...