Tag: research
All the articles with the tag "research".
-
ADK Arena Is A Reality Check For Agent Frameworks
ADK Arena tests agent frameworks across real benchmark tasks and finds what builders already feel: no framework owns the agent stack yet.
-
GPT-Rosalind Shows AI Moving From Answers To Scientific Workflows
OpenAI's GPT-Rosalind update is less about a smarter biology chatbot and more about AI becoming an auditable workbench for scientific evidence and execution.
-
Claude Code May Be Making Developers More Technically Adventurous
A new arXiv paper studies 5,838 GitHub developers and finds Claude Code adoption coincides with more commits, more repos, and broader language use.
-
OpenAI's Geometry Proof Is the Research Shock
OpenAI says an internal general-purpose reasoning model disproved a central conjecture in discrete geometry. The important part is not the headline; it is the kind of work that survived expert scrutiny.
-
NVIDIA's SANA-WM Makes World Models Feel Less Remote
SANA-WM is a 2.6B-parameter open world model for minute-long 720p video with camera control. The signal is efficiency.
-
arXiv Is Making Researchers Own Their AI Mistakes
arXiv will penalize submissions that show unchecked LLM output. The real story is not banning AI; it is restoring accountability.
-
Sakana's 7B Conductor Is the Agent Pattern to Watch
Sakana trained a small model to manage larger models in natural language. That may matter more than another single-model benchmark win.
-
David Silver Raised $1.1 Billion to Teach AI Without Human Homework
DeepMind veteran David Silver raised $1.1B for Ineffable Intelligence, a London lab betting reinforcement learning can create AI beyond human data.
-
Stanford's AI Index 2026: Agents Score Half as Well as PhD Experts. Everyone Is Adopting Them Anyway.
The 2026 Stanford AI Index Report reveals a paradox. AI agents perform only half as well as PhD experts on complex tasks. Spending is in the hundreds of billions of dollars. Adoption is moving faster than it did for the PC or the internet. The gap between capability and capital has never been wider, and it might not matter.
-
Every Frontier AI Model Just Scored Below 1% on a Reasoning Test. Humans Score 100%.
ARC-AGI-3 is the first interactive reasoning benchmark for AI agents. Gemini scored 0.37%. GPT-5.4 scored 0.26%. Claude scored 0.25%. Humans solve every task. The gap is not closing.