Video AI has a boring problem, and boring problems are usually where the real market hides.
It is expensive.
If you want a model to understand live feeds, warehouse cameras, sports footage, factory lines, drone video, robotics views, or long recordings, the cost can get ugly fast. A text prompt is cheap compared with hours of visual input.
That is why Perceptron Mk1 is interesting.
Perceptron AI announced Mk1 on May 12 as a model built for video understanding and embodied reasoning. VentureBeat reported pricing at $0.15 per million input tokens and $1.50 per million output tokens, with the company positioning that as roughly 80 to 90 percent cheaper than major proprietary rivals.
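To see how a claim like that cashes out, here is a rough back-of-envelope in Python. The Mk1 prices are the reported ones; the rival prices and the workload size are illustrative assumptions, not reported figures.

```python
# Back-of-envelope cost comparison. Mk1 prices are as reported by
# VentureBeat; rival prices and workload are ASSUMPTIONS for illustration.

MK1_INPUT = 0.15 / 1_000_000   # dollars per input token
MK1_OUTPUT = 1.50 / 1_000_000  # dollars per output token

RIVAL_INPUT = 1.25 / 1_000_000   # hypothetical rival pricing
RIVAL_OUTPUT = 10.00 / 1_000_000

# Assumed workload: 50M input tokens of video, 1M output tokens.
in_tok, out_tok = 50_000_000, 1_000_000

mk1_cost = in_tok * MK1_INPUT + out_tok * MK1_OUTPUT
rival_cost = in_tok * RIVAL_INPUT + out_tok * RIVAL_OUTPUT

print(f"Mk1:   ${mk1_cost:,.2f}")                   # $9.00
print(f"Rival: ${rival_cost:,.2f}")                 # $72.50
print(f"Savings: {1 - mk1_cost / rival_cost:.0%}")  # 88%
```

Under those assumed rival prices, the savings land in the advertised 80 to 90 percent band, but the real number depends entirely on which rival and which workload you plug in.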
The exact benchmark comparisons will need real customer pressure before anyone should get religious about them.
But the direction matters.
Physical AI needs cheaper eyes.
Seeing is not enough
Most multimodal demos still treat video as a prettier version of image captioning.
That is useful, but limited.
Real video understanding is about continuity: objects move, disappear, reappear, collide, rotate, get blocked, change state, and matter only because of what happened five seconds earlier.
A model that describes a frame is not the same thing as a model that understands an event.
Perceptron is aiming at the second category. The company says Mk1 is purpose-built for video understanding and embodied reasoning, with use cases across manufacturing, media, robotics, geospatial analysis, security, devices, and agent tooling.
VentureBeat reported that Mk1 can process native video at up to 2 frames per second across a 32K token context window, return structured time codes, and reason about physical interactions rather than just label what appears in a frame.
That is the more serious bar.
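Those reported numbers also bound what a single request can see. Here is a quick sketch; the tokens-per-frame and prompt-overhead figures are assumptions, since no per-frame token count has been published.

```python
# How much video fits in one request, given the reported 2 fps sampling
# and 32K context. Tokens per frame is an ASSUMPTION (vision models
# commonly spend a few hundred tokens per frame).

FPS = 2
CONTEXT_TOKENS = 32_000
TOKENS_PER_FRAME = 256      # assumed, not a published figure
PROMPT_OVERHEAD = 1_000     # assumed budget for instructions

frames = (CONTEXT_TOKENS - PROMPT_OVERHEAD) // TOKENS_PER_FRAME
seconds = frames / FPS
print(f"{frames} frames ≈ {seconds:.1f} s of video per request")
# -> 121 frames ≈ 60.5 s of video per request

# Cost of one full-context request at $0.15 per million input tokens:
cost = CONTEXT_TOKENS * 0.15 / 1_000_000
print(f"≈ ${cost:.4f} per full-context request")  # ≈ $0.0048
```

Under those assumptions, a request covers about a minute of footage for a fraction of a cent; anything longer needs chunking or rolling summaries, which is exactly the kind of engineering a low price point makes cheap to iterate on.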
The model needs to know not just that a ball exists, but whether the shot happened before the buzzer. It needs to know not just that a gauge is visible, but what the reading means. It needs to know not just that a worker moved, but whether a process step was skipped.
That is physical reasoning in a practical sense.
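To make that concrete, here is a minimal sketch of the kind of downstream check that structured time codes enable. The event schema is hypothetical, not Perceptron's actual output format.

```python
from dataclasses import dataclass

# Hypothetical timecoded events, in the spirit of the structured
# output Mk1 is reported to return. The schema is an ASSUMPTION.

@dataclass
class Event:
    label: str
    t: float  # seconds from start of clip

events = [
    Event("ball_release", t=47.8),
    Event("buzzer", t=48.0),
    Event("ball_through_hoop", t=48.6),
]

def happened_before(events, first, second):
    """True if the first occurrence of `first` precedes `second`."""
    times = {e.label: e.t for e in reversed(events)}  # keeps earliest
    return first in times and second in times and times[first] < times[second]

# The frame-level question is "is there a ball?"; the event-level
# question is "did the shot leave the hand before the buzzer?"
print(happened_before(events, "ball_release", "buzzer"))  # True
```

The point is that once a model returns events with timestamps instead of per-frame labels, questions about order, causation, and skipped steps become simple queries.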
Cost changes the product surface
The price story is not a spreadsheet footnote.
It changes what people are willing to build.
If video reasoning is expensive, companies use it sparingly. They test it in a lab, run it on a few examples, maybe add it to a premium workflow where the cost can be justified.
If the price drops enough, the product shape changes.
You can watch more feeds, keep longer context, run more checks, clip more events, audit more processes, and let developers experiment without every prototype feeling like a finance meeting.
That is how capabilities become normal.
The first version of a technology proves it can work. The cheaper version decides whether anyone can use it everywhere.
Physical AI is infrastructure-heavy
Language models got to scale partly because text is easy to move around.
Physical AI is different.
The world is messy, visual, temporal, and full of edge cases. Robots, cameras, drones, industrial systems, and inspection workflows do not just need intelligence. They need perception that is fast enough, cheap enough, and reliable enough to sit inside a loop.
This is why companies keep circling around physical AI infrastructure: video models, simulation, robotics data, spatial benchmarks, embodied reasoning, edge deployment, and evaluation tools.
It is not one product category. It is a whole stack.
Mk1 is a useful signal because it points to one layer of that stack becoming more competitive: high-quality video reasoning through an API.
The real question
The open question is not whether Perceptron can win a launch-day chart.
It is whether developers can build dependable products on top of this kind of model.
Can it handle ugly camera angles, bad lighting, partial views, occlusions, weird equipment, dense scenes, and the quiet chaos of real operations? Can it maintain object identity long enough to matter? Can it fail in ways that operators understand? Can the price stay low when usage gets serious?
Those are the questions that decide whether video AI leaves the demo stage.
Still, the move is worth watching.
If models like Mk1 make physical perception cheaper, the market changes from “can AI understand this clip?” to “which processes should always have AI watching?”
That is a very different world.
Sources: National Law Review / Business Wire, VentureBeat, Perceptron Docs