News

OpenAI's latest AI models tend to make things up — or "hallucinate" — substantially more than earlier versions.
The FrontierMath benchmark from Epoch AI tests generative models on difficult math problems. Find out how OpenAI’s o3 and ...
Historically, each new generation of OpenAI's models has delivered incremental improvements in factual accuracy, with ...
By OpenAI's own testing, its newest reasoning models, o3 and o4-mini, hallucinate significantly more often than o1.
OpenAI's reasoning AI models are getting better, but their tendency to hallucinate isn't improving, according to benchmark results.
OpenAI released upgraded versions of its advanced reasoning models. These new models, named o3 and o4-mini, offer ...
Wei and team don't directly offer any hypothesis about why Deep Research fails almost half the time, but the implicit answer ...
OpenAI has just given ChatGPT a massive boost with new o3 and o4-mini models that are available to use right now for Pro, ...
OpenAI’s o3 and o4-mini models are available now to ChatGPT Plus, Pro, and Team users. Enterprise and education users will ...
According to OpenAI’s internal testing, the new o3 model hallucinated in 33% of cases on the company’s PersonQA benchmark.
Here's a ChatGPT guide to help understand OpenAI's viral text-generating system. We outline the most recent updates and ...
New o3 and o4-mini models interpret drawings, infusing image comprehension directly into multi-step thinking workflows.