OpenAI's o3 model wins gold at IOI, surpassing human benchmarks and redefining AI coding capabilities. These groundbreaking ...
DeepSeek has gone viral. Chinese AI lab DeepSeek broke into the mainstream consciousness this week after its chatbot app rose ...
Industry Leader Known for Software Development Skills Expertise Introduces Real-World Benchmark of AI Software Development Capabilities. CUPERTINO, Calif., Feb. 11, 2025 (GLOBE NEWSWIRE) -- HackerRank, ...
OpenAI’s o1 and DeepSeek’s R1 models, which previously sat atop the leaderboard, could only get through roughly 9% of the ...
Unlike a regular AI chatbot query, Perplexity Deep Research scans the web, reasons through relevant search results, and ...
With a few hundred well-curated examples, an LLM can be trained for complex reasoning tasks that previously required thousands of instances.
In this edition of TechCrunch's This Week in AI newsletter, we take a look at Elon Musk's bid for OpenAI's nonprofit arm — ...
LangChain evaluated a single AI agent to see if its performance degrades when given more context and tools, essentially overwhelming it.
On Wednesday, Galileo launched an Agent Leaderboard on Hugging Face, an open-source AI platform where users can build, train, access, and deploy AI models. The leaderboard is meant to help people ...
OpenThinker-32B achieved benchmark-beating results using just 14% of the data its Chinese competitor needed, marking a win ...