The chat box was never going to be the final form.
The June 16 public beta of NVIDIA XR AI is a good reminder of where agents are trying to go next: not just into browsers, IDEs, and office suites, but into the physical flow of work.
The idea is simple to say and hard to build. An agent running through AR glasses or an XR device needs to understand live video, audio, depth, pose, sensor input, enterprise knowledge, tools, and the human’s immediate spatial context. It has to help without becoming visual noise.
That is a much higher bar than answering a question in a tab.
Context becomes physical
Most software agents live in symbolic worlds.
They see text, files, APIs, logs, tables, tickets, and browser pages. XR agents add a new category: the world in front of the user.
NVIDIA XR AI is a developer library for connecting AR and XR device inputs with models, retrieval systems, tools, orchestration, and accelerated runtime services. NVIDIA describes support for video, audio, depth, pose, and sensor data; enterprise retrieval; Nemotron reasoning models; Cosmos Reason; NeMo Agent Toolkit; and infrastructure ranging from RTX systems to DGX Spark and DGX Station.
The interesting part is not the pile of product names. The interesting part is the interface shift.
If an engineer is standing in front of a machine, the relevant context is not only the manual. It is the machine state, the procedure, the last repair, the part in view, the safety boundary, and what the engineer is trying to do with both hands occupied.
That is where agents become less like chat assistants and more like operating companions.
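To make the maintenance example concrete, here is a minimal sketch of how an agent might bundle that situational context and decide what to surface. It is hypothetical and not based on the NVIDIA XR AI API: the names SituationalContext, Hint, and choose_hints, the priority order (safety first), and the rule that a user with both hands busy gets only one hint are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SituationalContext:
    # Hypothetical fields mirroring the context listed above; not an NVIDIA XR AI type.
    machine_state: str
    procedure_step: str
    last_repair: Optional[str]
    part_in_view: Optional[str]
    near_safety_boundary: bool
    hands_occupied: bool


@dataclass
class Hint:
    text: str
    priority: int  # lower value = more urgent


def choose_hints(ctx: SituationalContext, max_hints: int = 2) -> List[Hint]:
    """Rank candidate hints and keep only a few, so the display stays quiet."""
    candidates: List[Hint] = []
    if ctx.near_safety_boundary:
        candidates.append(Hint("Safety boundary nearby: check clearance.", 0))
    if ctx.part_in_view:
        candidates.append(Hint(f"{ctx.part_in_view}: next step is {ctx.procedure_step}.", 1))
    if ctx.machine_state != "normal":
        candidates.append(Hint(f"Machine state: {ctx.machine_state}.", 2))
    if ctx.last_repair:
        candidates.append(Hint(f"Last repair: {ctx.last_repair}.", 3))
    # Assumption: with both hands busy, the user can attend to even less.
    budget = 1 if ctx.hands_occupied else max_hints
    candidates.sort(key=lambda h: h.priority)
    return candidates[:budget]
```

For example, an engineer near a safety boundary, looking at a pump housing with hands free, would get the safety hint plus the next procedure step. The same engineer with both hands busy would get only the safety hint. The hard cap is the point: every extra item is a candidate for the visual noise the article warns about.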
The early use cases are telling
NVIDIA points to factory maintenance, scientific labs, surgical support, automotive design, immersive learning, and spatial storytelling.
Those are not casual contexts. They are places where the agent has to preserve attention, surface the right information at the right time, and avoid creating distraction. In a lab or operating room, the wrong overlay is not merely annoying. It can break focus.
That is why XR agents are such a useful stress test for the whole agent category.
They force the system to deal with latency, perception, retrieval, tool use, user intent, and human attention at the same time.
The next agent UI may be situational
The important lesson is not that everyone will wear smart glasses tomorrow.
The lesson is that agents need interfaces shaped by the work. Coding agents belong near the repo. Enterprise agents belong near the workflow. Lab agents belong near the bench. Maintenance agents belong near the machine.
The agent interface will not be one universal chat window. It will be situational, embodied where useful, and constrained by the physics of the task.
NVIDIA XR AI is early, but the direction is clear: agents are leaving the rectangle.
Source: NVIDIA