Biology Agents Need Deterministic Tools, Not Just Smarter Models

Coding agents are advancing faster than biology agents for a blunt reason.

Software has tests, APIs, package managers, version control, and a lot of machine-readable structure. Biology has scattered databases, brittle formats, inconsistent identifiers, browser workflows, metadata traps, and expert conventions that often live in someone’s head.

Anthropic’s June 8 research post on agents in biology makes that gap very clear.

The team tested scientific research agents on a practical task: retrieving sequence data from NCBI Virus, a database used by virologists for surveillance, diagnostics, and assay development. Even strong models did not consistently reach the accuracy needed for reliable dataset construction.

Then the team added gget virus, a deterministic retrieval layer.

Accuracy rose to nearly 100 percent.

That is the whole story in miniature.

The bottleneck is not only intelligence

It is tempting to think biology agents will arrive when models become smart enough.

That is only partly true.

A model can understand the task, explain the biology, and still fail because the path from intent to data is messy. It may pick the wrong genome build, mix records from different sources, miss partial genomes, confuse segment names, or silently drop relevant records because a metadata field behaves differently than expected.

In biology, those are not cosmetic mistakes.

They can break the downstream conclusion.

The lesson is not that agents are useless for science. The lesson is that agents need dependable execution layers around them. If the data access path is deterministic, inspectable, and built for the task, the model can focus on judgment, hypothesis formation, and reasoning instead of fragile browser navigation.
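Here is a minimal sketch of what "deterministic and inspectable" can look like at the smallest scale. It is not how gget virus or NCBI Virus works. The Record fields, the select_records function, and the sample accessions are all hypothetical, invented only to show the shape: same inputs, same output, and every dropped record gets a stated reason.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    # Hypothetical, simplified record; real sequence metadata has many more fields.
    accession: str
    host: Optional[str]
    length: Optional[int]


def select_records(records, host, min_length):
    """Return (kept, excluded), where excluded maps accession -> reason."""
    kept = []
    excluded = {}
    # Sort by accession so the result does not depend on input order.
    for rec in sorted(records, key=lambda r: r.accession):
        if rec.host is None or rec.length is None:
            # Missing metadata is reported, never silently dropped.
            excluded[rec.accession] = "missing metadata"
        elif rec.host != host:
            excluded[rec.accession] = "host mismatch"
        elif rec.length < min_length:
            excluded[rec.accession] = "too short"
        else:
            kept.append(rec.accession)
    return kept, excluded


# Made-up accessions for illustration only.
records = [
    Record("B2", "Homo sapiens", 29000),
    Record("A1", "Homo sapiens", 29500),
    Record("C3", None, 29800),
    Record("D4", "Homo sapiens", 1200),
]
kept, excluded = select_records(records, host="Homo sapiens", min_length=25000)
print(kept)      # ['A1', 'B2']
print(excluded)  # {'C3': 'missing metadata', 'D4': 'too short'}
```

The agent still decides which host and which length cutoff make sense. The tool just guarantees that the decision is applied the same way every time.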

Scientific infrastructure has to become agent-readable

Anthropic frames the problem as infrastructure built for humans, not agents.

That feels right. A human expert can survive a bad interface because they have years of context. They know which filters matter, which fields are unreliable, and which weird edge cases should trigger caution. An agent sees a maze of options and has to infer rules that were never made explicit.

The solution is not to let the agent click harder.

It is to make the scientific stack more legible to machines.

That means stable APIs, better metadata, reproducible retrieval tools, validation checks, and clear provenance. It also means designing databases with the assumption that agents will be routine, high-volume users, not occasional visitors.
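A validation check does not have to be elaborate to be useful. The sketch below is an assumption-heavy illustration, not a standard: the field names (accession, collection_date, sequence) and the validate_dataset function are hypothetical, and a real pipeline would check far more. The point is that the rules an expert carries in their head become explicit and machine-checkable.

```python
def validate_dataset(rows, required_fields=("accession", "collection_date", "sequence")):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    seen = set()
    # IUPAC nucleotide codes plus the gap character.
    valid_bases = set("ACGTUNRYSWKMBDHV-")
    for i, row in enumerate(rows):
        # Check 1: required metadata is present and non-empty.
        for field in required_fields:
            if not row.get(field):
                problems.append(f"row {i}: missing {field}")
        # Check 2: no accession appears twice.
        acc = row.get("accession")
        if acc:
            if acc in seen:
                problems.append(f"row {i}: duplicate accession {acc}")
            seen.add(acc)
        # Check 3: the sequence contains only expected characters.
        seq = row.get("sequence") or ""
        bad = set(seq.upper()) - valid_bases
        if bad:
            problems.append(f"row {i}: unexpected characters {sorted(bad)}")
    return problems


# Made-up rows for illustration only.
rows = [
    {"accession": "X1", "collection_date": "2023-01-05", "sequence": "ACGTN"},
    {"accession": "X1", "collection_date": "2023-02-01", "sequence": "ACGT"},
    {"accession": "X2", "collection_date": "", "sequence": "ACGZ"},
]
for problem in validate_dataset(rows):
    print(problem)
```

On these rows the check flags the duplicate accession, the empty collection date, and the stray Z, instead of letting them flow into the dataset unnoticed.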

Agents need receipts in science

For serious scientific work, the output cannot be “the agent found some sequences.”

The output has to include how they were found, which filters were used, which records were excluded, which source versions were queried, and which deterministic tool produced the dataset. Without that trail, the work is not auditable enough to trust.
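One way to picture such a receipt is as a small structured record saved next to the dataset. The sketch below is hypothetical throughout: RetrievalReceipt, its field names, and the placeholder values are assumptions made for illustration, not a format used by Anthropic, gget, or NCBI. It uses only the Python standard library.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass
class RetrievalReceipt:
    tool: str             # which deterministic tool produced the dataset
    tool_version: str
    source: str           # which database was queried
    source_snapshot: str  # e.g. the date or release the query ran against
    filters: dict         # the exact filters that were applied
    kept: list            # accessions included in the dataset
    excluded: dict        # accession -> reason it was left out

    def dataset_digest(self):
        """SHA-256 over the sorted kept accessions, so order does not matter."""
        payload = "\n".join(sorted(self.kept)).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def to_json(self):
        """Serialize the receipt, including the digest, with stable key order."""
        body = asdict(self)
        body["dataset_digest"] = self.dataset_digest()
        return json.dumps(body, sort_keys=True, indent=2)


# Placeholder values for illustration only.
receipt = RetrievalReceipt(
    tool="example-retriever",
    tool_version="0.0.0",
    source="NCBI Virus",
    source_snapshot="YYYY-MM-DD",
    filters={"host": "Homo sapiens", "min_length": 25000},
    kept=["B2", "A1"],
    excluded={"C3": "missing metadata"},
)
print(receipt.to_json())
```

Anyone who reruns the same tool with the same filters can compare digests. If they differ, something changed, and the receipt says where to start looking.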

This is the same pattern showing up across agent systems, but science makes the stakes more obvious.

Powerful models matter. Reliable tools matter more than most demos suggest.

If biology agents are going to help with outbreak response, drug design, surveillance, and modeling, they will need roads built for them.

Not just better engines.

Source: Anthropic