For 28 years, every single meter a Mars rover drove was planned by hand. Human engineers at JPL would study terrain images, calculate paths, write commands in a specialized language called Rover Markup Language, simulate the results, verify everything, and then transmit the instructions to Mars. It was slow, meticulous, and incredibly expensive in human hours.
On December 8, 2025, Claude wrote the driving commands instead. Perseverance drove 689 feet, roughly 210 meters. Two days later, on December 10, it did it again: 807 feet, about 246 meters. A total of 456 meters across Mars, planned by an AI.
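The numbers check out, for what it's worth. A quick conversion of the reported figures:

```python
# Sanity-checking the reported drive distances (feet to meters).
FT_TO_M = 0.3048

drives_ft = {"Dec 8": 689, "Dec 10": 807}
drives_m = {day: ft * FT_TO_M for day, ft in drives_ft.items()}

for day, meters in drives_m.items():
    print(f"{day}: {meters:.0f} m")        # 210 m and 246 m
print(f"Total: {sum(drives_m.values()):.0f} m")  # Total: 456 m
```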
JPL engineers reviewed the commands before transmission. They found only minor adjustments were needed.
I want you to sit with that for a moment. An AI wrote commands to drive a robot on another planet, and the humans whose entire careers consist of doing exactly that looked at the output and said: yeah, this is basically right.
Why this is different from every other AI demo
The internet is full of AI demos. Most of them are impressive for about thirty seconds and then you realize the constraints were carefully chosen to make the AI look good. This is not that.
Rover driving on Mars is one of the most specialized planning tasks in existence. The rover cannot be recovered if something goes wrong. Communication delay means you cannot course-correct in real time. The terrain is unpredictable. The stakes are a $2.7 billion machine that took years to build and seven months to get there. There is no margin for error, no “undo” button, no second chance.
The process involves simulating over 500,000 variables. The commands are written in RML, a domain-specific language that almost nobody outside JPL uses. The AI had to understand Martian terrain from orbital and rover imagery, reason about slopes and obstacles, plan paths that respect the rover’s physical constraints, and output syntactically correct commands in a language it was never explicitly trained on.
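To make "respect the rover's physical constraints" concrete, here is a deliberately toy illustration of one such check: rejecting path segments steeper than a tilt limit. The waypoints, heightmap values, and the 25-degree limit are all invented for this sketch; the real pipeline models vastly more than this (the 500,000 variables mentioned above).

```python
import math

# Hypothetical slope-limit check for a planned path. The limit and the
# waypoints are illustrative assumptions, not mission parameters.
MAX_SLOPE_DEG = 25.0

def slope_deg(p1, p2):
    """Slope between two (x_m, y_m, z_m) waypoints, in degrees."""
    run = math.hypot(p2[0] - p1[0], p2[1] - p1[1])
    rise = abs(p2[2] - p1[2])
    return math.degrees(math.atan2(rise, run))

def path_is_safe(waypoints, max_slope=MAX_SLOPE_DEG):
    """Reject any segment steeper than the assumed tilt limit."""
    return all(
        slope_deg(a, b) <= max_slope
        for a, b in zip(waypoints, waypoints[1:])
    )

path = [(0, 0, 0.0), (4, 0, 0.8), (8, 3, 1.2)]  # meters
print(path_is_safe(path))  # prints True: both segments are gentle
```

The point is not the ten lines of trigonometry; it is that the AI had to satisfy hundreds of constraints like this simultaneously, then express the result in valid RML.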
This is not autocomplete. This is not summarizing an article. This is an AI doing specialized engineering work that previously required years of domain expertise to perform.
What actually happened technically
Claude Code was given access to terrain data and the RML specification. It analyzed the environment, planned driving routes, and generated the command sequences. The commands were then run through JPL’s simulation environment, which models the rover’s interaction with Martian terrain in detail.
The simulation confirmed the drives were safe. JPL engineers reviewed the output, made minor tweaks, and approved transmission. The commands were sent to Mars, and Perseverance executed them successfully.
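The workflow described above has a clear shape: AI generates, simulation validates, humans approve, and only then does anything get transmitted. A minimal sketch of that gating logic, with all names invented (JPL's real tooling is of course nothing this simple):

```python
from dataclasses import dataclass, field

# Illustrative approval flow: commands reach the rover only if simulation
# passes AND a human signs off. All names here are hypothetical.

@dataclass
class DrivePlan:
    rml_commands: list[str]
    simulated_ok: bool = False
    human_approved: bool = False

def simulate(plan: DrivePlan) -> DrivePlan:
    # Stand-in for JPL's terrain-interaction simulator.
    plan.simulated_ok = all(cmd.strip() for cmd in plan.rml_commands)
    return plan

def review(plan: DrivePlan, approved: bool) -> DrivePlan:
    # The human-in-the-loop step: engineers tweak and approve.
    plan.human_approved = approved
    return plan

def can_transmit(plan: DrivePlan) -> bool:
    return plan.simulated_ok and plan.human_approved

plan = review(simulate(DrivePlan(["DRIVE 10.0", "TURN 15.0"])), approved=True)
print(can_transmit(plan))  # prints True
```

Note where the AI sits in this flow: it replaced the generation step, not the verification steps. The safety net stayed human.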
Two things stand out to me. First, the engineers found only minor changes needed. Not “we had to rewrite half of it” or “it got the general idea but the details were wrong”. Minor changes. On commands controlling a machine on Mars. Second, this was not a toy demonstration on a test track. These were real drives, on the real rover, on the real Mars. JPL does not do PR stunts with $2.7 billion hardware.
What this breaks
There is a persistent argument that AI can only handle simple, repetitive tasks. Data entry, basic writing, routine code. The argument goes that anything requiring deep expertise, specialized knowledge, and high-stakes decision-making is safe from AI for a long time.
Mars rover driving was supposed to be exactly that kind of safe domain. A handful of people in the world know how to do it. It requires understanding orbital mechanics, geology, robotics, and a proprietary programming language. It is the definition of specialized, high-stakes work that you would point to and say “AI cannot do this”.
And now AI has done it. Successfully. Twice.
I am not saying every specialized job is about to be automated tomorrow. But the goalpost just moved dramatically. If the argument was “AI cannot handle tasks where errors are catastrophic and the domain knowledge is extremely narrow”, that argument needs a new example, because the old one just drove 456 meters across Mars.
The speed factor
Here is the part that does not get enough attention. The bottleneck for Mars rover operations has always been human planning time. The rover sits idle while engineers on Earth spend hours or days planning the next drive. Every hour the rover is not moving is wasted mission time on a machine that has a limited operational lifespan.
If AI can produce drive plans that need only minor human review, the planning cycle compresses from hours to minutes. The rover spends more time doing science and less time waiting. Over the remaining life of the mission, that acceleration in planning could translate to significantly more terrain covered and significantly more data collected.
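A back-of-envelope illustration of that compression, using hour figures I am assuming purely for the sake of the arithmetic (they are not mission data):

```python
# Assumed figures, for illustration only: hours of planning per drive
# under manual planning vs. AI planning plus human review.
HUMAN_PLANNING_H = 8.0
AI_PLANNING_H = 0.5

def drives_per_week(planning_hours, budget_hours=40.0):
    """Drives plannable within a fixed weekly planning budget."""
    return int(budget_hours // planning_hours)

print(drives_per_week(HUMAN_PLANNING_H))  # 5
print(drives_per_week(AI_PLANNING_H))     # 80
```

Even if the real numbers are half as dramatic, the multiplier on mission throughput is the story.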
This is the pattern I keep seeing with AI in every domain. The value is not just “it can do the thing”. The value is “it can do the thing in a fraction of the time, which means you can do more of it”. More rover drives per week. More terrain explored per month. More science per dollar spent.
My honest reaction
I use Claude every day. I have written about its limitations, its quota problems, the things that frustrate me about it. I am not a cheerleader for Anthropic. But this genuinely impressed me.
Not because AI driving a Mars rover is flashy. It is, but that is not the point. What impressed me is the combination of factors: an extremely niche domain-specific language, physical-world reasoning about terrain and constraints, stakes that could not be higher, and output quality high enough that the world’s foremost experts in rover operations approved it with minor edits.
That combination tells me something about where we are. We are past the point of “AI is a toy”. We are past the point of “it only works for simple tasks”. We are at the point where the right AI, given the right context and the right tools, can perform at expert level in domains that were supposed to be decades away from automation.
The “AI cannot do real work” crowd needs a new argument. Their best example just drove across Mars.