The surgeon's hands trembled slightly. Not from nerves, but from the caffeine crash hitting after her third consecutive robotic surgery of the day. Across the sterile field, something extraordinary was happening: a da Vinci robot was autonomously adjusting its camera angle, following her instruments with an almost human-like intuition. The machine had learned this behavior not from thousands of lines of code, but from watching surgical videos, mimicking expert movements like a medical student shadowing in the OR.
This vision isn't reality yet, but it's no longer science fiction. The AI systems capable of this level of autonomous surgical assistance are being tested in labs right now.
Welcome to the age of physical AI in surgery, where artificial intelligence doesn't just process data: it takes physical form in machines that can see, touch, and make split-second decisions in the most critical moments of human life.
Physical AI refers to models that understand and interact with the real world through motor skills, typically housed in autonomous machines such as robots or self-driving vehicles. Unlike the chatbots and virtual assistants we've grown accustomed to, embodied AI systems are physical machines powered by artificial intelligence that can directly engage with their surroundings. They use sensors and motors to gather data from their movements and environments, then apply machine learning, computer vision, and natural language processing to glean insights from that data.
Think of it as the difference between reading about riding a bicycle and actually getting on one. Traditional artificial intelligence processes information about the world; physical AI, also called embodied AI, experiences the world through sensors, actuators, and feedback loops that mirror how we learn through touch, movement, and spatial awareness.
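That sense-act feedback loop can be sketched in a few lines of Python. Everything below is a simplified illustration: the camera, force sensor, arm, and policy objects are hypothetical placeholders standing in for real hardware drivers and a trained model, not any vendor's actual API.

```python
import time

class EmbodiedAgent:
    """Minimal sense-decide-act loop for an embodied (physical) AI agent.

    The sensor, actuator, and policy objects are hypothetical stand-ins;
    a real system would wrap hardware drivers and a trained model.
    """

    def __init__(self, camera, force_sensor, arm, policy):
        self.camera = camera              # vision input (e.g. an endoscope feed)
        self.force_sensor = force_sensor  # touch / haptic feedback
        self.arm = arm                    # motor output (a robotic arm)
        self.policy = policy              # learned model mapping observations to actions

    def step(self):
        # 1. Sense: gather observations from the physical world.
        observation = {
            "image": self.camera.read_frame(),
            "force": self.force_sensor.read(),
            "joint_angles": self.arm.get_joint_angles(),
        }
        # 2. Decide: the learned policy turns observations into an action.
        action = self.policy.predict(observation)
        # 3. Act: the action changes the world, which changes the next observation.
        self.arm.apply(action)

    def run(self, hz=30):
        # Closed loop: perception and action continually feed back into each other.
        while True:
            self.step()
            time.sleep(1.0 / hz)
```

The essential point is the loop itself: each action changes the physical world, which changes the next observation the model sees, which is exactly what a purely digital model never experiences.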
In artificial intelligence, an embodied agent, also sometimes referred to as an interface agent, is an intelligent agent that interacts with the environment through a physical body within that environment. These physical AI agents are revolutionizing everything from warehouse logistics to space exploration, but nowhere is their impact more profound (or more personal) than in the operating room.
The da Vinci surgical system has dominated robotic surgery for over two decades, but it's undergoing a radical transformation. The system has always required a human operator, but that's changing: recent breakthroughs have researchers retrofitting da Vinci robots with vision-language models trained on thousands of hours of surgical footage.
In 2024, researchers at Johns Hopkins and Stanford universities revealed that they had integrated a vision-language model (VLM), trained on hours of surgical videos, with the widely used da Vinci robotic surgical system. Once connected to the VLM, da Vinci's tiny grippers, or "hands," can autonomously perform three critical surgical tasks: carefully lifting body tissue, manipulating a surgical needle, and suturing a wound.
The implications are staggering. "It's really magical to have this model and all we do is feed it camera input and it can predict the robotic movements needed for surgery," said senior author Axel Krieger, an assistant professor in JHU's Department of Mechanical Engineering. "We believe this marks a significant step forward toward a new frontier in medical robotics."
During testing, something remarkable happened that highlights the autonomous nature of these new physical AI agents: at one point, the robot accidentally dropped a surgical needle and, despite never being explicitly trained to do so, picked it up and continued with the task. This kind of adaptive behavior (learning to handle unexpected situations without explicit programming) is the hallmark of true embodied intelligence.
While some companies chase fully autonomous surgery, Anne Osdoit, CEO of Moon Surgical, is betting on a different vision of physical AI in the operating room. Her approach represents a fascinating middle ground between human expertise and artificial intelligence—one that keeps surgeons at the center while augmenting their capabilities with intelligent robotic assistance.
"What we've done at Moon is different. We basically kept the surgeon at the center of the operating room next to the patient, and we've kept the surgeons' two arms, because the surgeons are trained and they're great and they're proficient and have two functional arms. But we've added to that. We've enhanced the surgeon with an extra set of arms at the bedside that are holding two of the instruments for him and maneuvering them for him in a smart and autonomous way," Osdoit explained in a recent interview.
This collaborative approach to robotic surgery exemplifies what experts call "embodied AI agents"—systems that don't replace human decision-making but enhance it through intelligent physical interaction. "With artificial intelligence, you can tell the scope to follow what the surgeon is doing, so that what's seen on the screen is always relevant and focused on the surgeon's own instruments," Osdoit explained. "And similarly, our second arm typically exposes tissue of interest for the surgeon. The second arm can manage that exposure in a smart way."
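Moon Surgical hasn't published its control software, but the idea of a scope that keeps the surgeon's instruments in view can be illustrated with a simple proportional controller. The detector and the scope interface below are assumptions made for this sketch; a real system would rely on a trained instrument-detection model and the robot's own motion API.

```python
import numpy as np

def instrument_center(frame):
    """Crude stand-in for a learned instrument detector: returns the centroid
    (x, y) of the brightest pixels, or None if nothing stands out. A real
    system would use a trained detection or segmentation model instead."""
    gray = frame.mean(axis=2)
    mask = gray > np.percentile(gray, 99)
    if not mask.any():
        return None
    ys, xs = np.nonzero(mask)
    return xs.mean(), ys.mean()

def follow_instruments(scope, frame, gain=0.002, deadband=20):
    """Nudge the scope so the instruments stay near the image center.

    scope    : hypothetical object exposing pan(dx) and tilt(dy) in radians
    frame    : current endoscope image as an HxWx3 array
    gain     : proportional gain converting pixel error to radians
    deadband : pixel error below which the scope holds still (avoids jitter)
    """
    target = instrument_center(frame)
    if target is None:
        return  # instruments not visible; hold position

    h, w = frame.shape[:2]
    error_x = target[0] - w / 2.0
    error_y = target[1] - h / 2.0

    # Only move when the instruments drift meaningfully off-center.
    if abs(error_x) > deadband:
        scope.pan(-gain * error_x)
    if abs(error_y) > deadband:
        scope.tilt(-gain * error_y)
```

The deadband and proportional gain are the kind of tuning choices that keep an autonomous camera steady rather than twitchy, which matters when the image on screen is the surgeon's only view of the field.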
The leap from manually controlled surgical robots to autonomous physical AI agents requires sophisticated machine learning architectures. The Johns Hopkins and Stanford researchers achieved this by combining imitation learning—where AI systems learn by watching and mimicking expert demonstrations rather than being explicitly programmed—with the same machine learning architecture that powers ChatGPT. However, where ChatGPT works with words and text, this model speaks "robot" with kinematics, a language that breaks down the angles of robotic motion into math.
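A heavily stripped-down sketch of that training recipe might look like the following. The network sizes, action representation, and dataset layout here are assumptions for illustration, not the actual Johns Hopkins and Stanford model: the idea is simply that a transformer decodes a short sequence of kinematic targets from an encoded camera frame and is trained to regress the motions recorded from expert demonstrations.

```python
import torch
import torch.nn as nn

class SurgicalPolicy(nn.Module):
    """Toy imitation-learning policy: encode an endoscope frame, then let a
    transformer decoder predict a short sequence of kinematic targets
    (e.g. joint angles or gripper pose deltas), much as a language model
    predicts the next token."""

    def __init__(self, action_dim=7, horizon=10, d_model=256):
        super().__init__()
        # Simple CNN image encoder (a real system would use a large pretrained vision backbone).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, d_model),
        )
        decoder_layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(decoder_layer, num_layers=4)
        self.queries = nn.Parameter(torch.randn(horizon, d_model))  # one query per future step
        self.head = nn.Linear(d_model, action_dim)  # kinematics: the "language" of robot motion

    def forward(self, frames):
        memory = self.encoder(frames).unsqueeze(1)              # (B, 1, d_model)
        queries = self.queries.expand(frames.size(0), -1, -1)   # (B, horizon, d_model)
        decoded = self.decoder(queries, memory)
        return self.head(decoded)                               # (B, horizon, action_dim)

def train_step(policy, optimizer, frames, expert_actions):
    """Imitation learning: regress the expert's recorded motions from video frames.

    frames         : (B, 3, H, W) endoscope images from demonstration videos
    expert_actions : (B, horizon, action_dim) kinematic targets logged from experts
    """
    predicted = policy(frames)
    loss = nn.functional.mse_loss(predicted, expert_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The real system builds on a far richer vision-language backbone and action representation, but the core recipe, supervised regression from recorded expert demonstrations rather than hand-coded rules, is the same.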
Traditional robotic programming required engineers to hand-code every movement, every decision tree, every possible scenario. "Before this advancement, programming a robot to perform even a simple aspect of a surgery required hand-coding every step. Someone might spend a decade trying to model suturing," Krieger said. "And that's suturing for just one type of surgery."
Physical AI changes everything. "What is new here is we only have to collect imitation learning of different procedures, and we can train a robot to learn it in a couple days. It allows us to accelerate to the goal of autonomy while reducing medical errors and achieving more accurate surgery," he added.
Autonomous robotic systems could better meet surgery's demanding requirements for safety and consistency. These robots could manage routine tasks, prevent mistakes, and potentially perform full operations with little human input. The need is urgent: the Association of American Medical Colleges has projected a U.S. physician shortage of up to 124,000 doctors by 2034, with surgical specialties particularly affected.
But the path to autonomous surgical robots is fraught with challenges. "Autonomous systems are coming, though we are not there yet," says Prof. Danail Stoyanov of the Wellcome/EPSRC Centre for Interventional and Surgical Sciences at University College London. "They would be hard to regulate, as it is hard to know what to do in terms of managing liability. That level of automation will take much longer."
The consensus among experts is nuanced. Most specialists in robotic surgery and AI agree that human surgeons are unlikely ever to be entirely replaced by an AI-controlled surgical robot. Instead, the future lies in what researchers call collaborative intelligence: partnerships between surgeons and AI-powered robots that combine human judgment with machine precision.
While the da Vinci and Moon Surgical systems represent different approaches to human-AI collaboration, the Smart Tissue Autonomous Robot (STAR) system pushes toward a more radical vision: supervised autonomy in surgery.
STAR autonomously performed laparoscopic surgery on a live animal for the first time in 2022. Unlike the human-controlled da Vinci system or Moon's collaborative bedside assistance, STAR can perform certain surgical tasks with minimal human intervention, using computer vision and machine learning to adapt to tissue deformation and unexpected situations in real time.
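STAR's published descriptions emphasize adapting to tissue that moves and deforms mid-procedure. One simplified way to picture that is a planner that re-computes its suture targets whenever tracked tissue landmarks shift. The function below is an illustrative assumption, not STAR's actual algorithm, which estimates a much richer non-rigid deformation.

```python
import numpy as np

def replan_sutures(planned_points, landmarks_before, landmarks_after, threshold=2.0):
    """Shift planned suture points to follow tissue motion.

    planned_points   : (N, 3) suture targets defined on the original tissue surface (mm)
    landmarks_before : (M, 3) tracked tissue landmarks when the plan was made
    landmarks_after  : (M, 3) the same landmarks in the current frame
    threshold        : mean landmark displacement (mm) below which the old plan is kept

    Returns the (possibly updated) suture targets. Here the landmarks' mean
    displacement is applied as a crude rigid correction, purely for illustration.
    """
    displacement = landmarks_after - landmarks_before
    mean_shift = displacement.mean(axis=0)

    if np.linalg.norm(mean_shift) < threshold:
        return planned_points  # tissue has barely moved; keep the current plan

    # Tissue moved: translate every planned suture point by the mean shift.
    return planned_points + mean_shift
```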
STAR represents the furthest point on the spectrum toward autonomous surgical capability, demonstrating that physical AI agents can handle complex soft-tissue procedures independently while maintaining the precision and adaptability required for successful outcomes.
The convergence of physical AI and surgical robotics points toward operating rooms where autonomous systems don't just assist with individual procedures—they create a global network of surgical intelligence.
Physical AI agents could continuously capture and analyze surgical techniques in real-time, building vast databases of procedural knowledge. Consider how the Johns Hopkins vision-language model learned surgical skills by watching thousands of hours of video: this same principle could operate continuously across every AI-enabled operating room worldwide.
Surgical innovations that currently take years to disseminate through conferences, journals, and training programs could be instantly captured, analyzed, and integrated into AI models. The physical AI revolution promises to democratize surgical expertise, making the accumulated knowledge of the world's best surgeons instantly accessible in rural hospitals and underserved areas, wherever advanced robotic systems are deployed.
This vision of AI-enhanced operating rooms and global knowledge sharing raises an important question: where do human surgeons fit in this increasingly automated future? The answer is firmly at the center of surgical care.
Physical AI in surgery is fundamentally designed around human-centered principles that preserve the surgeon's role as the ultimate decision-maker and patient advocate. While autonomous systems excel at precision, consistency, and pattern recognition, surgery requires qualities that remain uniquely human: complex ethical reasoning, adaptability to unexpected complications, empathetic patient communication, and the ability to make split-second decisions in unprecedented situations.
The most successful implementations maintain surgeons at the center of care delivery, using AI to eliminate routine tasks, reduce physical strain, and provide enhanced precision tools. This collaborative approach ensures that as surgical technology evolves, it amplifies rather than diminishes the irreplaceable human elements that define exceptional surgical care.
In 2025, physical AI is transforming surgery through multiple approaches: da Vinci robots learning from surgical videos, Moon Surgical's collaborative bedside assistance, and STAR's supervised autonomy. What began with manually controlled systems requiring hand-coded programming has evolved into intelligent agents that learn from thousands of procedures.
The Johns Hopkins researchers training robots through imitation learning, Anne Osdoit's vision of empowered surgeons, and emerging networks of shared surgical knowledge all point toward collaborative intelligence that amplifies human capabilities rather than replacing them.
As Osdoit envisions, the goal is creating "the empowered surgeon." In operating rooms worldwide, that future is already unfolding, one intelligent procedure at a time.
This article explores the intersection of physical AI, embodied intelligence, and surgical robotics, examining how autonomous systems are transforming medical practice while preserving the essential human elements of surgical care.