OpenAI o1 Benchmarks Graph

News

OpenAI's o3 and o4-mini hallucinate far more than previous models

By OpenAI's own testing, its newest reasoning models, o3 and o4-mini, hallucinate significantly more often than o1.

OpenAI rolls out o3 and o4-mini: From coding and maths to visuals, how ChatGPT’s new models handle it all

OpenAI has launched its advanced AI models, o3 and o4-mini, enhancing reasoning and problem-solving capabilities. The o3 ...

OpenAI's Deep Research has more fact-finding stamina than you, but it's still wrong half the time

Wei and team don't directly offer any hypothesis about why Deep Research fails almost half the time, but the implicit answer ...

YourStory 3d

OpenAI rolls out its latest reasoning models o3 and o4‑mini

Described as the company's “smartest models to date,” they can agentically use and combine every tool within ChatGPT, such as ...

OpenAI o3 & o4-mini: The First True Reasoning Agents?

Discover OpenAI’s o3 and o4-mini, the groundbreaking AI models excelling in reasoning, tool usage, and cost efficiency. Learn ...

OpenAI launches o3 and o4-mini, AI models that ‘think with images’ and use tools autonomously

OpenAI launches groundbreaking o3 and o4-mini AI models that can manipulate and reason with images, representing a major ...

OpenAI releases new simulated reasoning models with full tool access

On Wednesday, OpenAI announced the release of two new models—o3 and o4-mini—that combine simulated reasoning capabilities ...

4d on MSN

OpenAI partner says it had relatively little time to test the company’s o3 AI model

An organization OpenAI frequently partners with to probe the capabilities of its AI models and evaluate them for safety, Metr, suggests that it wasn’t given much time to test one of the company’s ...

OpenAI Releases o3 and o4-mini, Says o3 Can ‘Generate Novel Hypotheses’

OpenAI has finally released the full o3 reasoning model along with o4-mini. New models can use multiple tools inside ChatGPT ...

OpenAI continues naming chaos despite CEO acknowledging the habit

If this sounds confusing, well, that's because it is. OpenAI CEO Sam Altman acknowledged OpenAI's habit of terrible product ...

IEEE Spectrum on MSN 13d

12 Graphs That Explain the State of AI in 2025

Cutting through the confusion is the 2025 AI Index from Stanford University’s Institute for Human-Centered Artificial ...

Geeky Gadgets 14d

New OpenAI PaperBench: Autonomous AI Research Benchmarking

OpenAI has unveiled “PaperBench,” a benchmark designed to evaluate how effectively AI agents can replicate innovative machine learning research. This initiative is a cornerstone of OpenAI’s ...
