Posts
All the articles I've posted.
-
Constraint Decay Is Why Coding Agents Break in Real Repos
A new arXiv paper found coding agents lose about 30 points as structural backend constraints accumulate. The lesson is simple: demos reward output; production rewards constraint discipline.
-
Project Glasswing Moved the Bottleneck From Finding Bugs to Fixing Them
Anthropic says Claude Mythos Preview and roughly 50 partners found more than ten thousand high- or critical-severity vulnerabilities. The scary part is not discovery anymore. It is disclosure, triage, and patch throughput.
-
Microsoft's Security Copilot Agent Is the Boring AI Win
A new Microsoft Security Copilot paper says its Dynamic Threat Detection Agent runs across tens of thousands of Defender customers with 80.1% precision. This is what production agents are starting to look like: narrow, audited, always-on, and embedded inside existing workflows.
-
Gemini Spark Is Google's Background Agent Bet
Google introduced Gemini Spark as a 24/7 personal agent for Workspace and connected apps. The product signal is background delegation, not another chat surface.
-
OpenAI's Geometry Proof Is the Research Shock
OpenAI says an internal general-purpose reasoning model disproved a central conjecture in discrete geometry. The important part is not the headline, it is the kind of work that survived expert scrutiny.
-
Google Is Putting Gemini Back on Your Face
Google's Android XR eyewear starts with audio glasses this fall. The hard part is not the frames, it is trust in public.
-
Google Search Is Becoming an Agent Console
Google's I/O 2026 Search update adds information agents, generative UI, and deeper AI Mode. Search is moving from links to delegated work.
-
Anthropic Bought the Boring Layer Agents Actually Need
Anthropic's Stainless acquisition is about SDKs, CLIs, MCP servers, and the tool layer Claude needs if agents are going to do real work.
-
NVIDIA's SANA-WM Makes World Models Feel Less Remote
SANA-WM is a 2.6B open world model for minute-long 720p video with camera control. The signal is efficiency.
-
arXiv Is Making Researchers Own Their AI Mistakes
arXiv will punish submissions that show unchecked LLM output. The real story is not banning AI, it is restoring accountability.