News

The FrontierMath benchmark from Epoch AI tests generative models on difficult math problems. Find out how OpenAI’s o3 and ...
Wei and team don't directly offer any hypothesis about why Deep Research fails almost half the time, but the implicit answer ...
OpenAI's latest AI models tend to make things up — or "hallucinate" — substantially more than earlier versions.
If you’ve used an AI model, you’ve most likely seen it hallucinate. This is when the model produces incorrect or misleading ...
OpenAI has launched its advanced AI models, o3 and o4-mini, enhancing reasoning and problem-solving capabilities. The o3 ...
OpenAI’s o3 and o4-mini models are available now to ChatGPT Plus, Pro, and Team users. Enterprise and education users will ...
The rave reviews OpenAI's latest models have been winning come with an asterisk: Experts are also finding that they're ...
OpenAI says its latest models, o3 and o4-mini, are its most powerful yet. However, research shows the models also hallucinate more -- at least twice as much as earlier models.
OpenAI's new AI models are hallucinating more than their predecessor, according to an internal testing report released by the ...
Metr, a frequent OpenAI partner, suggested in a blog post that it wasn't given much time to evaluate the company's powerful ...
By OpenAI's own testing, its newest reasoning models, o3 and o4-mini, hallucinate significantly more often than o1.
Metr writes that one red-teaming benchmark of o3 was “conducted in a relatively short time” compared to the organization’s testing of a previous OpenAI flagship model, o1. This is ...