Mobile phone camera used for complex 3D modelling

Large spaces can be reconstructed with photo-like accuracy using just the camera on a mobile phone, thanks to a three-dimensional modelling system developed at Oxford University.

The system, known as InfiniTAM, could allow a handheld camera to scan a complex environment and instantly build a 3D model. It could be used in virtual reality or augmented reality games such as Pokemon Go, or for industrial applications such as surveying buildings, processing plants or oil rigs.

Existing systems designed to carry out 3D reconstruction on mobile devices have tended to have poor accuracy, according to Dr Victor Prisacariu, principal investigator in the Active Vision Lab at Oxford University.

“This [system] opens up the ability to reconstruct large spaces very quickly on your mobile device,” he said. “One possibility would be to take one on to a submarine and map the ocean floor, for example.”

For augmented reality games such as Pokemon Go, the system could allow the games to reconstruct and then interact with their local environment, for example by enabling the virtual creatures to hide behind trees or jump out of lakes, Prisacariu said.

The InfiniTAM system relies on a camera that produces depth information, such as the dual (stereo) camera on Apple's iPhone 7 Plus or Microsoft's Kinect sensor.
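A stereo camera recovers depth from disparity, which is how far a point shifts horizontally between the two images. For a rectified pair the standard relation is Z = f·B/d, where f is the focal length in pixels, B is the distance between the two lenses and d is the disparity. The sketch below applies that formula. The function name and the focal length and baseline values are hypothetical, chosen only to be roughly phone-sized, and are not taken from InfiniTAM or any particular device.

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth Z = f * B / d for a rectified stereo pair.

    disparity_px: horizontal shift of a point between left and right images, in pixels
    focal_px:     focal length expressed in pixels
    baseline_m:   distance between the two camera centres, in metres
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive; zero means the point is at infinity")
    return focal_px * baseline_m / disparity_px


# Hypothetical values: a 1000-pixel focal length and a 1 cm baseline.
for d in (5.0, 10.0, 20.0):
    print(f"disparity {d:4.1f} px -> depth {depth_from_disparity(d, 1000.0, 0.01):.2f} m")
```

Because depth is inversely proportional to disparity, a one-pixel disparity error matters far more for distant points than for near ones, so cameras with a short baseline give their most accurate depth at close range.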

The system integrates this real-time depth information with tracking data on the position of the camera itself, allowing it to determine its own location and update the 3D map as it moves around.
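The article does not say how depth and camera position are combined. InfiniTAM, like KinectFusion before it, represents the scene as a truncated signed distance function (TSDF). Each voxel stores a clipped distance to the nearest observed surface, averaged over every frame that has seen it. The sketch below shows that fusion step for a dense list of voxels with a known camera pose. It is a simplified illustration, not InfiniTAM's code, and the function name, parameters and test scene are hypothetical.

```python
import numpy as np


def integrate_depth(tsdf, weights, voxel_centres, depth, K, R, t, trunc=0.05):
    """Fuse one depth image into a truncated signed distance function (TSDF).

    tsdf, weights : (N,) float arrays, one entry per voxel, updated in place
    voxel_centres : (N, 3) voxel centre positions in world coordinates
    depth         : (H, W) depth image in metres, 0 where there is no reading
    K             : 3x3 camera intrinsic matrix
    R, t          : camera pose (camera-to-world rotation and translation)
    trunc         : truncation distance in metres
    """
    # Move voxel centres into the camera frame: p_cam = R^T (p_world - t).
    p_cam = (voxel_centres - t) @ R
    z = p_cam[:, 2]
    in_front = z > 1e-6
    # Project into the image; z_safe avoids dividing by zero behind the camera.
    z_safe = np.where(in_front, z, 1.0)
    uvw = p_cam @ K.T
    u = np.round(uvw[:, 0] / z_safe).astype(int)
    v = np.round(uvw[:, 1] / z_safe).astype(int)
    h, w = depth.shape
    valid = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    measured = np.zeros_like(z)
    measured[valid] = depth[v[valid], u[valid]]
    valid &= measured > 0
    # Signed distance along the ray: positive in front of the surface.
    sdf = measured - z
    valid &= sdf > -trunc  # leave voxels well behind the surface untouched
    f = np.clip(sdf / trunc, -1.0, 1.0)
    # Weighted running average over all frames that observed each voxel.
    new_w = weights[valid] + 1.0
    tsdf[valid] = (tsdf[valid] * weights[valid] + f[valid]) / new_w
    weights[valid] = new_w
    return tsdf, weights


# Hypothetical scene: camera at the origin looking down +z at a flat wall 1 m away.
K = np.array([[100.0, 0.0, 2.0], [0.0, 100.0, 2.0], [0.0, 0.0, 1.0]])
depth = np.full((5, 5), 1.0)
voxels = np.array([[0.0, 0.0, z] for z in (0.9, 0.98, 1.0, 1.02, 1.2)])
tsdf, weights = np.ones(5), np.zeros(5)
integrate_depth(tsdf, weights, voxels, depth, K, np.eye(3), np.zeros(3))
print(np.round(tsdf, 2))
```

The sign change between positive and negative values marks the estimated surface. As more frames arrive from different camera positions, the weighted average refines that estimate and smooths out sensor noise.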

To minimise the processing power needed for the reconstruction, and so allow InfiniTAM to run on a handheld device, the system allocates memory only for the surfaces currently visible to the camera and ignores the rest of the map. This reduces the complexity of the reconstruction task, said Prisacariu.
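The article gives no implementation detail. The sketch below assumes voxel block hashing, a technique used by InfiniTAM and related systems. Space is divided into small blocks of voxels, and a block is stored in a hash map only once an observed surface falls near it. The block size, voxel size, truncation band and class name are hypothetical, chosen for illustration.

```python
import math

# Hypothetical parameters, for illustration only.
VOXEL_SIZE = 0.01   # metres per voxel
BLOCK_SIZE = 8      # voxels along each edge of a block
TRUNC = 0.03        # truncation band around a surface, in metres


def block_key(point):
    """Integer index of the voxel block that contains a world-space point."""
    edge = BLOCK_SIZE * VOXEL_SIZE
    return tuple(math.floor(c / edge) for c in point)


class SparseVolume:
    """Stores voxel blocks in a hash map, only where surfaces have been seen."""

    def __init__(self):
        self.blocks = {}  # block key -> flat list of TSDF values

    def allocate_near(self, surface_points):
        """Allocate every block touched by the truncation band of each point.

        Returns the set of block keys touched in this frame, i.e. the
        blocks that later fusion and raycasting steps would need to process.
        """
        touched = set()
        offsets = (-TRUNC, 0.0, TRUNC)
        for x, y, z in surface_points:
            # The band (2 * TRUNC wide) is narrower than a block edge, so it
            # spans at most two blocks per axis and its corners reach them all.
            for dx in offsets:
                for dy in offsets:
                    for dz in offsets:
                        key = block_key((x + dx, y + dy, z + dz))
                        touched.add(key)
                        if key not in self.blocks:
                            # New blocks start as "empty space" (TSDF = 1).
                            self.blocks[key] = [1.0] * BLOCK_SIZE ** 3
        return touched


volume = SparseVolume()
visible = volume.allocate_near([(0.5, 0.5, 1.0)])
print(len(volume.blocks), "blocks allocated")  # prints: 4 blocks allocated
```

For comparison, a dense grid covering a 5 m cube at 1 cm resolution would need 500³, or 125 million, voxels whether or not anything is there. The hash map only pays for blocks near surfaces the camera has actually observed.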

To determine its own location, the system continually compares the 3D map it has built with the live view from the camera. By checking what it calculates should be visible in the scene against what is actually there, it can correct its estimate of its own position.
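The article describes this step only in outline. Systems in the KinectFusion family commonly implement it with iterative closest point (ICP) alignment. Each live depth point is matched to the nearest point on the model, the rigid motion that best fits those matches is solved for, and the process repeats. The sketch below is a simplified point-to-point version with brute-force matching and the Kabsch (SVD) solution. Production trackers, InfiniTAM's included, use faster matching and error terms, so treat this as an illustration of the idea only. The function names and the test scene are hypothetical.

```python
import numpy as np


def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t with R @ src[i] + t ≈ dst[i] (Kabsch)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation, not a reflection.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    return R, cd - R @ cs


def icp(observed, model, iterations=20):
    """Estimate the pose that maps observed points onto the model's points."""
    R_total, t_total = np.eye(3), np.zeros(3)
    current = observed.copy()
    for _ in range(iterations):
        # Brute-force closest model point for every observed point.
        d2 = ((current[:, None, :] - model[None, :, :]) ** 2).sum(axis=2)
        matches = model[d2.argmin(axis=1)]
        # Solve for the motion that best explains these matches, then apply it.
        R, t = best_rigid_transform(current, matches)
        current = current @ R.T + t
        # Accumulate: p -> R (R_total p + t_total) + t.
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total


# Hypothetical scene: a 9x9x9 grid of map points, seen from a camera that
# has rotated 3 degrees about z and moved a couple of centimetres.
g = np.linspace(-1.0, 1.0, 9)
model = np.stack(np.meshgrid(g, g, g, indexing="ij"), axis=-1).reshape(-1, 3)
a = np.radians(3.0)
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a), np.cos(a), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.02, -0.01, 0.015])
observed = (model - t_true) @ R_true  # each row is R_true.T @ (p - t_true)

R_est, t_est = icp(observed, model)
print(np.allclose(R_est, R_true), np.allclose(t_est, t_true))
```

Here every observed point starts within half a grid spacing of its true match, so the first round of matching is already correct and the estimate converges immediately. With real depth data, ICP needs a reasonable starting pose, which is why trackers usually begin from the previous frame's estimate.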