Researchers have developed visual foresight, a learning technology that enables robots to imagine the outcome of future actions in order to successfully manipulate unfamiliar objects.
The technology from the University of California, Berkeley could one day help self-driving cars anticipate future events on the road and lead to more intelligent robotic assistants in homes. The initial prototype, however, focuses on learning manual skills entirely from autonomous play.
Using visual foresight, robots can predict what their cameras will see if they perform a particular sequence of movements. These robotic imaginations are still relatively simple for now – predictions made only several seconds into the future – but they are enough for the robot to ascertain how to move objects around on a table without disturbing obstacles.
The robot can learn to perform these tasks without any help from humans or prior knowledge about physics, its environment or what the objects are. That’s because the visual imagination is learned entirely from scratch from unattended and unsupervised exploration, where the robot plays with objects on a table. After this phase, the robot builds a predictive model of the world, and can use this model to manipulate new objects that it has not seen before.
“In the same way that we can imagine how our actions will move the objects in our environment, this method can enable a robot to visualise how different behaviours will affect the world around it,” said Sergey Levine, assistant professor in Berkeley’s Department of Electrical Engineering and Computer Sciences, whose lab developed the technology. “This can enable intelligent planning of highly flexible skills in complex real-world situations.”
According to UC Berkeley, a deep learning technology based on convolutional recurrent video prediction – or dynamic neural advection (DNA) – is at the core of the technology.
DNA-based models predict how pixels in an image will move from one frame to the next based on the robot’s actions. Recent improvements to this class of models, as well as greatly improved planning capabilities, have enabled robotic control based on video prediction to perform increasingly complex tasks.
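The pixel-motion idea can be illustrated with a minimal sketch in plain Python. It is not the researchers' model: in a DNA-style network the per-pixel kernels would be predicted by the neural network from the current frame and the robot's action, whereas here a hand-written kernel (the hypothetical `shift_right_kernel`) stands in for that prediction, and the 3 x 3 kernel size is an assumption made for illustration.

```python
def advect(frame, kernels):
    """Form the next frame by moving pixels of `frame`.

    frame:   H x W list of floats (grayscale intensities).
    kernels: H x W list of 3 x 3 lists; kernels[y][x][dy + 1][dx + 1] is the
             weight given to the source pixel at (y + dy, x + dx) when
             forming the output pixel at (y, x).
    """
    h, w = len(frame), len(frame[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    sy, sx = y + dy, x + dx
                    # Source pixels outside the image contribute nothing.
                    if 0 <= sy < h and 0 <= sx < w:
                        total += kernels[y][x][dy + 1][dx + 1] * frame[sy][sx]
            out[y][x] = total
    return out


def shift_right_kernel():
    """Hypothetical kernel: take the value of the left-hand neighbour."""
    k = [[0.0] * 3 for _ in range(3)]
    k[1][0] = 1.0  # dy = 0, dx = -1
    return k


# A single bright pixel; a learned model would output one kernel per pixel.
frame = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
kernels = [[shift_right_kernel() for _ in range(4)] for _ in range(3)]
next_frame = advect(frame, kernels)
```

With every kernel set to "copy from the left", the bright pixel moves one column to the right in `next_frame`. A learned model would instead produce different kernels for different pixels, so that only the pushed object appears to move.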
With the new technology, a robot pushes objects on a table, then uses the learned prediction model to choose motions that will move an object to a desired location. Working from raw camera observations alone, robots use the learned model to teach themselves how to avoid obstacles and push objects around obstructions.
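Choosing motions with a prediction model can be sketched as a simple sampling-based planner: propose many candidate action sequences, ask the model where the object would end up under each, and keep the sequence whose predicted outcome is closest to the goal. The article does not describe the planner the researchers used, so everything below is an assumption for illustration: `predict` is a toy stand-in for the learned video model, the object is reduced to an (x, y) position, and `horizon` and `samples` are hypothetical parameters.

```python
import random


def predict(position, actions):
    """Toy stand-in for the learned model: each action nudges the object."""
    x, y = position
    for dx, dy in actions:
        x, y = x + dx, y + dy
    return (x, y)


def plan(position, goal, horizon=5, samples=200, rng=None):
    """Return the sampled action sequence with the lowest predicted cost."""
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable
    best_actions, best_cost = None, float("inf")
    for _ in range(samples):
        # Propose a random sequence of small pushes.
        actions = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(horizon)]
        # Imagine the outcome, then score it by squared distance to the goal.
        px, py = predict(position, actions)
        cost = (px - goal[0]) ** 2 + (py - goal[1]) ** 2
        if cost < best_cost:
            best_actions, best_cost = actions, cost
    return best_actions, best_cost


start, goal = (0.0, 0.0), (2.0, 1.0)
actions, cost = plan(start, goal)
```

In a real system the robot would typically execute only the first action or two, observe the new camera image and plan again, so that prediction errors do not accumulate.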
“Humans learn object manipulation skills without any teacher through millions of interactions with a variety of objects during their lifetime. We have shown that it is possible to build a robotic system that also leverages large amounts of autonomously collected data to learn widely applicable manipulation skills, specifically object pushing skills,” said Frederik Ebert, a graduate student in Levine’s lab who worked on the project.
In contrast to conventional computer vision methods, which require humans to manually label thousands or even millions of images, building video prediction models requires only unannotated video, which the robot can collect autonomously.
“Children can learn about their world by playing with toys, moving them around, grasping, and so forth. Our aim with this research is to enable a robot to do the same: to learn about how the world works through autonomous interaction,” Levine said.
The research team is scheduled to demonstrate the visual foresight technology at the Neural Information Processing Systems conference in Long Beach, California, on December 5, 2017.