MIT robot uses 3D keypoints for advanced coordination

Researchers at MIT have developed a new robot vision system that lets a robot pick up, move and accurately place objects it has not seen before.

Robots are extremely good at repetitive tasks with little to no variation, but struggle when dealing with added complexity or unfamiliar objects. To assess how objects should be picked up, robots tend to use either pose-based or geometry-based systems. Both methods have limitations, however, especially in the face of everyday tasks like picking up and placing a mug – a seemingly straightforward action that in fact requires advanced coordination and subtlety.

To equip its robot with that subtlety, the team from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) used an approach that identifies objects as a collection of 3D keypoints, providing a type of visual roadmap that allows more nuanced manipulation. Known as kPAM (Keypoint Affordance Manipulation), the technique gives the robot all the information it needs to pick up, move and place objects accurately, while also providing enough flexibility to deal with variation within a category of object, such as differently shaped mugs or different styles of shoe.


“Whenever you see a robot video on YouTube, you should watch carefully for what the robot is NOT doing,” said MIT professor Russ Tedrake, senior author on a new paper about the project. “Robots can pick almost anything up, but if it’s an object they haven’t seen before, they can’t actually put it down in any meaningful way.

“Understanding just a little bit more about the object – the location of a few key points – is enough to enable a wide range of useful manipulation tasks. And this particular representation works magically well with today’s...machine learning perception and planning algorithms.”

A dataset is first used to train the system to identify the keypoints on a given class of object. In lab tests with a multitude of mugs, kPAM needed just three keypoints per mug: the centres of the mug’s side, bottom and handle. For a collection of more than 20 shoes ranging from slippers to boots, the system needed six keypoints per object. According to PhD student Lucas Manuelli, kPAM initially couldn’t pick up high-heeled shoes, which the team realised was because the original dataset contained no examples of them. Adding a few pairs to the neural network’s training data resolved the problem. The MIT team now plans to make the system better at more general tasks, eventually enabling it to perform intricate operations such as emptying a dishwasher.
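
To make the keypoint idea concrete, the short Python sketch below shows how three labelled 3D points could be enough to describe where a mug should end up. It is an illustration only, not the CSAIL team's implementation: the coordinates, the `place_upright` helper and the assumption that the mug stays upright (so the move is just a turn about the vertical axis plus a shift) are all hypothetical choices made for this example.

```python
import math

# Hypothetical keypoint detections for one mug, in metres.
# Illustrative values only, not data from the MIT experiments.
mug_keypoints = {
    "bottom": (0.40, 0.10, 0.00),
    "side": (0.44, 0.10, 0.05),
    "handle": (0.40, 0.16, 0.05),
}


def yaw_between(src_xy, dst_xy):
    """Signed angle (radians) that rotates direction src_xy onto dst_xy."""
    return math.atan2(dst_xy[1], dst_xy[0]) - math.atan2(src_xy[1], src_xy[0])


def place_upright(keypoints, target_bottom, handle_direction_xy):
    """Return the keypoints after a yaw-plus-translation move.

    Assumption: the mug stays upright, so the move is a rotation about the
    vertical axis through the bottom keypoint followed by a translation.
    """
    bx, by, bz = keypoints["bottom"]
    hx, hy, _ = keypoints["handle"]
    yaw = yaw_between((hx - bx, hy - by), handle_direction_xy)
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)

    moved = {}
    for name, (x, y, z) in keypoints.items():
        # Rotate about the bottom keypoint, then shift the bottom onto the target.
        dx, dy = x - bx, y - by
        moved[name] = (
            target_bottom[0] + cos_y * dx - sin_y * dy,
            target_bottom[1] + sin_y * dx + cos_y * dy,
            target_bottom[2] + (z - bz),
        )
    return moved


# Goal: bottom of the mug at the origin, handle pointing along +x.
placed = place_upright(
    mug_keypoints,
    target_bottom=(0.0, 0.0, 0.0),
    handle_direction_xy=(1.0, 0.0),
)
for name, point in placed.items():
    print(name, tuple(round(c, 3) for c in point))
```

Because the goal is written in terms of keypoints rather than a full 3D model, the same function would apply unchanged to a taller or wider mug, provided its bottom and handle keypoints can be detected.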