OpenAI o1 Benchmarks Graph

5d on MSN

OpenAI’s o1-pro is the company’s most expensive AI model yet

OpenAI has launched a more powerful version of its o1 "reasoning" AI model, o1-pro, in its developer API. It's incredibly ...

OpenAI’s new reasoning model o1-pro is powerful but pricey

The company has just launched o1-pro, making it available through its new developer application programming interface called ...

ChatGPT o1-pro is OpenAI’s best and most expensive reasoning AI, and developers can use it right now

OpenAI announced that developers can now use ChatGPT o1-pro, its best and most expensive AI reasoning model yet.

6d on MSN

OpenAI releases a new AI model, but it’s eye-wateringly expensive

All of this makes it clear that OpenAI is aiming o1-pro at developers rather than everyday users. The model is currently available to select developers on tiers 1–5 (those who have spent a certain ...

29d

Did xAI lie about Grok 3’s benchmarks?

OpenAI researchers accused xAI of publishing misleading Grok 3 benchmarks. The truth is a little more nuanced.

26d

“It’s a lemon”—OpenAI’s largest AI model ever arrives to mixed reviews

An AI expert who requested anonymity told Ars Technica, "GPT-4.5 is a lemon!" when comparing its reported performance to its dramatically increased price, while frequent OpenAI critic Gary Marcus ...

Forbes 12d

Testing The Limits: Three Ways AI Benchmarks Are Evolving

OpenAI o1 leads with 90.5% of tasks solved, and DeepSeek R1 follows with 88.2%. Note that R1 trails behind o1 on U-MATH, contradicting R1’s victory on other math benchmarks like AIME and MATH-500.

15d

This new AI benchmark measures how much models lie

Researchers behind the MASK benchmark found that more knowledge doesn't mean more 'moral virtue.' See which model lies the most.

The Verge 26d

OpenAI announces GPT-4.5, warns it’s not a frontier AI model

OpenAI is calling the release its “most knowledgeable model yet,” but initially warned that GPT-4.5 is not a frontier model and might not perform as well as o1 or o3-mini. GPT-4.5 will have ...

Hosted on MSN 29d

Did xAI lie about Grok 3’s benchmarks?

Debates over AI benchmarks — and how they’re reported ... Grok 3 Reasoning Beta also trails ever-so-slightly behind OpenAI’s o1 model set to “medium” computing. Yet xAI is advertising ...
