Computer scientists from the University of Texas at Arlington, USA, are exploring the use of AI and supercomputers to generate synthetic objects for training robots.
William Beksi, assistant professor in UT Arlington’s Department of Computer Science and Engineering and founder of the university’s Robotic Vision Laboratory, is leading the research with a group that includes six computer science PhD students.
Having previously interned at consumer robot producer iRobot, where researchers were interested in using machine and deep learning to train robots, Beksi said he was particularly interested in developing algorithms that enable machines to learn from their interactions with the physical world and autonomously acquire skills necessary to execute high-level tasks.
Where efforts to train robots using images with human-centric perspectives had previously failed, Beksi looked to generative adversarial networks (GANs). A GAN pits two neural networks against each other in a game until the ‘generator’ of new data can fool a ‘discriminator’.
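The contest between the two networks can be illustrated with the textbook GAN losses. The sketch below is illustrative only: it assumes the discriminator outputs a probability that a sample is real, and the function names are hypothetical. The article does not state which objective PCGAN itself uses, so this should not be read as the team's implementation.

```python
import math

EPS = 1e-12  # guards against log(0)


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy for the discriminator.

    d_real: discriminator outputs (probabilities) on real samples.
    d_fake: discriminator outputs (probabilities) on generated samples.
    The loss is low when real samples score near 1 and fakes near 0.
    """
    real_term = -sum(math.log(p + EPS) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p + EPS) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Non-saturating generator loss.

    The loss is low when the discriminator scores generated samples
    near 1, i.e. when the generator succeeds in fooling it.
    """
    return -sum(math.log(p + EPS) for p in d_fake) / len(d_fake)


# A generator whose samples fool the discriminator has the lower loss.
fooling = generator_loss([0.9, 0.8])
failing = generator_loss([0.1, 0.2])
print(fooling < failing)
```

In training, the two losses are minimised in alternation, so each network improves against the other.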
Once trained, such a network could enable the creation of an infinite number of possible rooms or outdoor environments, researchers explained, populated with different kinds of objects whose dimensions and characteristics are recognisable to both a person and a robot.
“You can perturb these objects, move them into new positions, use different lights, colour and texture, and then render them into a training image that could be used in [a] dataset,” Beksi said in a statement. “This approach would potentially provide limitless data to train a robot on.”
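The kind of perturbation Beksi describes can be sketched for a coloured point cloud. This is a minimal illustration, not the team's pipeline: the representation (a list of `(x, y, z, r, g, b)` tuples with colours in the range 0 to 1), the function names and the parameter ranges are all assumptions, and the rendering step is left out.

```python
import math
import random


def perturb_point_cloud(points, angle_rad, shift, brightness):
    """Rotate about the vertical (z) axis, translate, and relight.

    points: list of (x, y, z, r, g, b) tuples, colours in [0, 1].
    angle_rad: rotation angle in radians.
    shift: (dx, dy, dz) translation applied after the rotation.
    brightness: multiplier on each colour channel, clamped to [0, 1].
    """
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    dx, dy, dz = shift
    result = []
    for x, y, z, r, g, b in points:
        x_rot = cos_a * x - sin_a * y
        y_rot = sin_a * x + cos_a * y
        colour = tuple(min(1.0, max(0.0, c * brightness)) for c in (r, g, b))
        result.append((x_rot + dx, y_rot + dy, z + dz) + colour)
    return result


def random_variants(points, count, seed=0):
    """Produce `count` randomly perturbed copies of one object."""
    rng = random.Random(seed)
    variants = []
    for _ in range(count):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        shift = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0), 0.0)
        brightness = rng.uniform(0.5, 1.5)
        variants.append(perturb_point_cloud(points, angle, shift, brightness))
    return variants
```

Each variant keeps the object's shape while changing its pose and lighting, which is how a single synthetic object can yield many distinct training examples.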
Mohammad Samiul Arshad, a graduate student on the research team, added that manually designing the objects would take a ‘huge amount’ of resources and hours of human labour whereas, if trained correctly, the generative networks could make them ‘in seconds’.
Beksi and Arshad presented PCGAN – the first conditional GAN to generate dense coloured point clouds in an unsupervised mode – at the International Conference on 3D Vision (3DV) in November 2020. Their paper shows that their network can learn from a training set (derived from ShapeNetCore, a CAD model database) and mimic a 3D data distribution to produce coloured point clouds with fine details at multiple resolutions.
“There was some work that could generate synthetic objects from these CAD model datasets,” Beksi said. “But no one could yet handle colour.”
According to the team, they tested their method using chairs, tables, sofas, airplanes and motorcycles. Their model first learns the basic structure of an object at low resolutions and gradually builds toward high-level details, Beksi explained, with the relationship between the object parts and their colours learned by the network.
After generating 5,000 random samples for each class and performing an evaluation, the team found that PCGAN was capable of synthesising high-quality point clouds for a disparate array of object classes.
Beksi is also working on an issue known as ‘Sim2real’, which he explained focuses on quantifying the subtle differences in how an AI system or robot learns from real and synthetic training data. The aim is to make simulations more realistic by capturing the physics of a scene and by using ray or photon tracing.
The team’s next steps are to deploy the software on a real robot and see how it performs in relation to the sim-to-real gap. While Beksi says the field is still a long way from having robust robots that can be autonomous for long periods of time, building such robots would benefit multiple domains including healthcare, manufacturing and agriculture.
The training of the PCGAN model was made possible by TACC’s Maverick 2 deep learning resource, which the team accessed through the University of Texas Research Cyberinfrastructure (UTRC) programme.