MosaicLeaks Shows Research Agents Can Leak Without Saying Secrets

The scary version of an AI leak is not always a model blurting out a secret.

Sometimes the leak is quieter. A research agent reads private documents, then goes to the web to complete a multi-hop task. Each query looks harmless on its own, but the sequence starts to expose what the agent already knows, what it is trying to prove, and which private facts are shaping the search.

That is the failure mode behind MosaicLeaks, a June 18 benchmark from ServiceNow researchers published on Hugging Face.

The query log becomes the side channel

MosaicLeaks focuses on deep research agents that combine private local documents with external retrieval. The adversary does not see the private documents or the model’s hidden reasoning. It only sees the outbound query log.

That is enough to matter.

The benchmark includes 1,001 multi-hop research chains over local enterprise documents and a controlled web corpus. It measures intent leakage, answer leakage, and full-information leakage. In plain English: can someone infer what the agent is investigating, answer private questions from the query trail, or recover true private claims without being told what to look for?
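To make the mosaic effect concrete, here is a minimal sketch of how a query trail can expose more than any single query does. It is not the benchmark's metric: the keyword-overlap score, the function names, and the example data are all hypothetical, chosen only to show the gap between per-query and cumulative exposure.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase word tokens; a deliberately crude stand-in for real matching."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def exposed_fraction(private_terms: set[str], queries: list[str]) -> float:
    """Share of private terms that appear anywhere in the given queries."""
    if not private_terms:
        return 0.0
    seen: set[str] = set()
    for query in queries:
        seen |= tokens(query) & private_terms
    return len(seen) / len(private_terms)


# Hypothetical private fact (as keywords) and outbound query log.
private_terms = {"acme", "acquisition", "zephyr", "q3"}
query_log = [
    "acme corporate filings",
    "zephyr robotics valuation",
    "acquisition approval timeline q3",
]

# Each query alone exposes at most half of the private terms...
per_query = [exposed_fraction(private_terms, [q]) for q in query_log]
# ...but an observer who sees the whole log sees all of them.
cumulative = exposed_fraction(private_terms, query_log)

print(per_query, cumulative)
```

In this toy case no single query reveals more than half of the private terms, yet the full log reveals every one. A real adversary would use inference rather than keyword matching, so this understates the risk.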

The reported numbers are uncomfortable. The authors say agents frequently leaked private information, and that training only for task performance made leakage worse. Their privacy-aware RL method, PA-DR, raised strict chain success from 48.7% to 58.7%, a gain of 10 percentage points, while cutting answer and full-information leakage from 34.0% to 9.9%, roughly a 71% relative reduction.

That is not a small detail. The result suggests that privacy is not just a policy layer around the agent; it is part of the task objective.

Better agents can leak better

This is the brutal part.

If an agent gets smarter at research but does not understand the privacy cost of each query, it may become better at accidentally revealing the mosaic. More capability means more precise searches, more effective disambiguation, and more useful breadcrumbs for anyone watching the traffic.

That flips the normal product instinct.

You cannot simply tell a research agent to be careful and then optimize everything else for task completion. The search policy, training signal, retrieval design, logging model, and network boundary all become part of the security architecture.
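As one illustration of treating the network boundary as part of the security architecture, the sketch below tracks cumulative exposure across queries instead of judging each query in isolation. The `QueryGate` class, its term-counting budget, and the example terms are assumptions made for this sketch. This is not PA-DR, which the authors describe as a training method rather than a filter.

```python
import re
from dataclasses import dataclass, field


@dataclass
class QueryGate:
    """Hypothetical outbound gate with a budget on distinct sensitive terms."""

    sensitive_terms: set[str]
    max_exposed: int  # how many distinct sensitive terms may ever leave
    exposed: set[str] = field(default_factory=set)

    def allow(self, query: str) -> bool:
        words = set(re.findall(r"[a-z0-9]+", query.lower()))
        would_expose = self.exposed | (words & self.sensitive_terms)
        if len(would_expose) > self.max_exposed:
            # Block: the log as a whole would exceed the budget,
            # even if this query looks harmless on its own.
            return False
        self.exposed = would_expose
        return True


gate = QueryGate(
    sensitive_terms={"acme", "acquisition", "zephyr", "q3"},
    max_exposed=2,
)
decisions = [
    gate.allow(q)
    for q in [
        "acme corporate filings",            # exposes 1 term
        "zephyr robotics valuation",         # exposes a 2nd term
        "acquisition approval timeline q3",  # would expose 4 in total
        "robotics market size",              # exposes nothing new
    ]
]
print(decisions)
```

A keyword budget like this is crude and easy to evade through paraphrase, which is part of the article's point: a boundary filter alone is unlikely to be enough if the search policy itself is trained without regard to what each query reveals.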

For enterprise AI, this is the kind of benchmark that matters. Not because every deployment matches the setup exactly, but because the shape of the risk is real: agents do not only output information; they also create traces while working.

The future research agent needs to know not only what to search, but what not to reveal by searching.

Source: Hugging Face / ServiceNow