Tag: benchmarks

All the articles with the tag "benchmarks".

NVIDIA AgentPerf Makes Agent Serving A Datacenter Metric

12 Jun, 2026

NVIDIA's AA-AgentPerf results show why agent workloads need new infrastructure benchmarks built around concurrent sessions, tool calls, and latency.

ADK Arena Is A Reality Check For Agent Frameworks

4 Jun, 2026

ADK Arena tests agent frameworks across real benchmark tasks and finds what builders already feel: no framework owns the agent stack yet.

Constraint Decay Is Why Coding Agents Break in Real Repos

24 May, 2026

A new arXiv paper found coding agents lose about 30 points as structural backend constraints accumulate. The lesson is simple: demos reward output; production rewards constraint discipline.

Every Frontier AI Model Just Scored Below 1% on a Reasoning Test. Humans Score 100%.

27 Mar, 2026

ARC-AGI-3 is the first interactive reasoning benchmark for AI agents. Gemini scored 0.37%. GPT-5.4 scored 0.26%. Claude scored 0.25%. Humans solve every single one. The gap is not closing.
