O1 Benchmarks - Search News

15h

OpenAI’s o3 Model Stuns the World with Gold Medal Win at IOI

OpenAI's o3 model wins gold at IOI, surpassing human benchmarks and redefining AI coding capabilities. These groundbreaking ...

TechCrunch on MSN9h

DeepSeek: Everything you need to know about the AI chatbot app

DeepSeek has gone viral. Chinese AI lab DeepSeek broke into the mainstream consciousness this week after its chatbot app rose ...

HackerRank Introduces New Benchmark to Assess Advanced AI Models

Industry Leader Known for Software Development Skills Expertise Introduces Real-World Benchmark of AI Software Development CapabilitiesCUPERTINO, Calif., Feb. 11, 2025 (GLOBE NEWSWIRE) -- HackerRank, ...

2don MSN

OpenAI’s DeepResearch can complete 26% of ‘Humanity’s Last Exam’ — a benchmark for the frontier of human knowledge

OpenAI’s o1 and DeepSeek’s R1 models, which previously sat atop the leaderboard, could only get through roughly 9% of the ...

2hon MSN

Perplexity one-ups Gemini and ChatGPT with a fantastic AI freebie

Unlike a regular AI chatbot query, Perplexity Deep Research scans the web, reasons through relevant search results, and ...

Researchers find you don’t need a ton of data to train LLMs for reasoning tasks

With a few hundred well-curated examples, an LLM can be trained for complex reasoning tasks that previously required thousands of instances.

2don MSN

This Week in AI: Musk bids for OpenAI

In this edition of TechCrunch's This Week in AI newsletter, we take a look at Elon Musk's bid for OpenAI's nonprofit arm — ...

LangChain shows AI agents aren’t human-level yet because they’re overwhelmed by tools

LangChain evaluated a single AI agent to see if its performance degrades when given more context and tools, essentially overwhelming it.

15h

Which AI agent is the best? This new leaderboard can tell you

On Wednesday, Galileo launched an Agent Leaderboard on Hugging Face, an open-source AI platform where users can build, train, access, and deploy AI models. The leaderboard is meant to help people ...

decrypt1d

New Open Source AI Model Rivals DeepSeek's Performance—With Far Less Training Data

OpenThinker-32B achieved benchmark-beating results using just 14% of the data its Chinese competitor needed, marking a win ...

Some results have been hidden because they may be inaccessible to you

Show inaccessible results