The dawn of multi-touch

Once the height of computer hardware sophistication, the traditional mouse could eventually be replaced by interface systems using just gentle finger movements and hand gestures. Jon Excell and Niall Firth report.

It was carved out of wood, had only one button, and was arguably one of the great inventions of the 20th century.

The world’s very first computer mouse, invented in the US in 1964 by Douglas Engelbart of the Stanford Research Institute, was instrumental in transforming computers from highly specialised scientific tools into the user-friendly devices that they are today. While early computer users had to learn the lingo like a visitor to a foreign country, Engelbart’s device paved the way for the point-and-click environment that lets us do everything from browsing the internet to playing games.

But the dear old mouse’s days may be numbered. A host of new computer interface systems are under development that use gentle finger movements and intuitive hand gestures to replace the RSI-inducing clicks and taps of traditional user interfaces. Many commentators believe these technologies will free us from the constraints of the traditional PC environment, usher in a new era of ubiquitous computing and take our interactions with the electronic world to new, unimagined levels.

Consumers got perhaps their first proper glimpse of this brave new world earlier this year, when Apple unveiled its iPhone. The company’s web-enabled, media-playing mobile phone sparked a bout of unprecedented global salivation as gadget fiends pored over the product’s numerous features and pencilled its summer launch date into their diaries.

At the heart of the iPhone’s appeal lies a slick-looking touch-screen display. Trumpeted by the company as the most revolutionary interface development since the mouse, this multi-touch display allows users to flick through album covers, manipulate and resize images, and zoom in and out using nothing more than a series of finger movements.

Though Apple is tight-lipped about the design of this screen, it seems likely that it is based on technology developed by a small company acquired by Apple two years ago.

The organisation, Fingerworks, was something of a multi-touch pioneer and won an enthusiastic online army of admirers for TouchStream, a touch-sensitive keyboard that enabled particular finger movements to be interpreted as commands. It worked by measuring the disruptions caused by hand and finger movements to an electric field generated by the pad’s sensor array.

This technology made it possible to process input from several points on the pad at once. For example, placing three or more fingers on the pad and twisting to the left would bring up an open-file command, while twisting to the right would close the selected object. The system even let users zoom in and out of an image by spreading their fingers apart or drawing them together, one of the iPhone’s most superficially impressive features.
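The article does not describe how Fingerworks’ software actually interpreted these gestures. As a rough illustration only, a recogniser could compare fingertip positions across two frames: a change in the fingers’ average distance from their centroid signals a spread (zoom in) or a pinch (zoom out), while the average change in each finger’s angle around the centroid signals a twist. The function name, gesture labels and thresholds in this Python sketch are hypothetical.

```python
import math

# Hypothetical thresholds: minimum relative change in finger spread, and
# minimum average rotation (in radians) before a gesture is reported.
SPREAD_THRESHOLD = 0.15
TWIST_THRESHOLD = math.radians(15)


def centroid(points):
    """Average position of a list of (x, y) fingertip coordinates."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def classify_gesture(before, after):
    """Classify a multi-finger gesture from two frames of fingertip positions.

    `before` and `after` list the same fingers in the same order.
    Returns 'zoom_in', 'zoom_out', 'twist_left', 'twist_right' or None.
    """
    if len(before) != len(after) or len(before) < 2:
        return None
    cb, ca = centroid(before), centroid(after)

    # Average distance of the fingers from their centroid in each frame.
    spread_b = sum(math.dist(p, cb) for p in before) / len(before)
    spread_a = sum(math.dist(p, ca) for p in after) / len(after)

    # Average change in each finger's angle around the centroid,
    # wrapped into the range (-pi, pi].
    turns = []
    for (xb, yb), (xa, ya) in zip(before, after):
        ab = math.atan2(yb - cb[1], xb - cb[0])
        aa = math.atan2(ya - ca[1], xa - ca[0])
        turns.append(math.atan2(math.sin(aa - ab), math.cos(aa - ab)))
    twist = sum(turns) / len(turns)

    # Twists need three or more fingers, as in the TouchStream example.
    # Counter-clockwise (positive angle, y axis pointing up) counts as "left".
    if len(before) >= 3 and abs(twist) >= TWIST_THRESHOLD:
        return "twist_left" if twist > 0 else "twist_right"
    ratio = spread_a / spread_b if spread_b else 1.0
    if ratio >= 1 + SPREAD_THRESHOLD:
        return "zoom_in"
    if ratio <= 1 - SPREAD_THRESHOLD:
        return "zoom_out"
    return None
```

For instance, two fingers moving from (0, 0) and (2, 0) to (-1, 0) and (3, 0) double their spread, so `classify_gesture` reports `'zoom_in'`. A real system would also have to work out which finger is which from one frame to the next, which this sketch simply assumes.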

Meanwhile, Apple’s nemesis, Microsoft, is busy developing its own take on multi-touch technology. Andy Wilson of Microsoft Research has developed a novel interactive display technology in which the outputs of two video cameras behind a transparent projection display are combined to produce an image of objects on the display surface.

Wilson said the idea behind the system, known as Touchlight, is to make everyday surfaces such as walls and tables interactive.

He believes the technology has implications for a future of ubiquitous computing in which potentially any surface is an input and computation device and the very displays we use and spaces we inhabit are aware of our presence.

The current system uses a pair of small, webcam-sized infrared video cameras mounted behind a transparent display surface. This surface is coated with a refractive, holographic film that diffuses light coming from a correctly positioned rear projector but allows all other light to pass straight through it.

The cameras are therefore able to see through the display but, because they work in the infrared, they cannot see what is being projected on it. The touch image is produced by applying sophisticated image processing algorithms to the information gathered by the cameras. ‘The idea is to take those images from the camera and then figure out what the user is doing in front of the display. That’s it,’ said Wilson.
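Wilson does not go into the algorithms, but the two-camera arrangement suggests one simple trick, sketched here under stated assumptions. If each camera image has first been warped (rectified) so that points on the display plane line up pixel for pixel in both views, then an object touching the surface appears in the same place in both images, while an object further away appears in different places. Multiplying the two images pixel by pixel therefore keeps the former bright and suppresses the latter. The images here are plain lists of brightness values between 0 and 1, and the function names and threshold are hypothetical.

```python
def combine_views(left, right):
    """Pixel-wise product of two rectified infrared images (values 0..1).

    Only pixels that are bright in BOTH views survive, which is the case
    for objects lying on the display plane once both images are rectified.
    """
    return [[a * b for a, b in zip(row_l, row_r)]
            for row_l, row_r in zip(left, right)]


def touch_mask(left, right, threshold=0.5):
    """Mark pixels whose combined brightness exceeds a hypothetical threshold."""
    return [[value > threshold for value in row]
            for row in combine_views(left, right)]
```

With `left = [[0.9, 0.9, 0.0, 0.0]]` and `right = [[0.0, 0.9, 0.0, 0.9]]`, only the second pixel is bright in both views, so `touch_mask(left, right)` returns `[[False, True, False, False]]`: whatever sits at that column is on the surface, while the bright pixels that fail to line up belong to something in front of it.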

While the technology offers a level of interaction similar to that of the iPhone, it is now treading a different path: Microsoft has licensed the system to a company called Eon Reality, which is developing it for use in high-end presentation systems.

But Wilson is investigating a range of other uses. ‘We’re trying to figure out the sort of things that make sense for two-handed input,’ he said.

One particularly impressive application is using the technology to manipulate an on-screen map. ‘You can do this very easily by moving your hands around rather analogously in the way you might interact with a real map,’ said Wilson.

A big advantage of the Touchlight system is that because it recognises gestures, the user doesn’t actually have to be touching the screen. Wilson’s group is working on methods of computing the 3D position of a user’s hands, so that not only can it sense when hands are placed on the surface but it can also sense where they are in 3D space. He said this capability could enable users to interact with 3D objects in a variety of interesting ways.
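The article does not say how Wilson’s group computes 3D hand positions. With two cameras a known distance apart, a standard approach is stereo triangulation: for a rectified camera pair with focal length f (in pixels) and baseline B, a point seen with a horizontal disparity of d pixels between the two images lies at depth Z = fB/d. The sketch below assumes an idealised, rectified camera pair, and its class name and parameter values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class StereoRig:
    """Idealised rectified stereo camera pair (hypothetical parameters)."""
    focal_px: float     # focal length, in pixels
    baseline_m: float   # distance between the two camera centres, in metres
    cx: float           # principal point (image centre) x, in pixels
    cy: float           # principal point (image centre) y, in pixels

    def triangulate(self, x_left, y, x_right):
        """Return the (X, Y, Z) position in metres of a feature seen at
        (x_left, y) in the left image and (x_right, y) in the right image."""
        disparity = x_left - x_right
        if disparity <= 0:
            raise ValueError("feature must have positive disparity")
        # Depth from similar triangles: Z = f * B / d.
        z = self.focal_px * self.baseline_m / disparity
        # Back-project the left-image pixel to that depth.
        x = (x_left - self.cx) * z / self.focal_px
        y3 = (y - self.cy) * z / self.focal_px
        return x, y3, z
```

With the made-up rig `StereoRig(500, 0.1, 320, 240)`, a hand seen at x = 370 in the left image and x = 345 in the right has a disparity of 25 pixels, which places it 2 m from the cameras and 0.2 m to the side of their optical axis.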

And, depending on how it is configured, a user could interact with the system from the other side of the room, opening up still more applications. ‘Potentially if you were to walk into your office and look in the direction of Touchlight it could recognise you at that moment and bring over your documents, or figure out where your eyes are and render an appropriate viewpoint based on that,’ said Wilson.

He added that the technology could also be used as the basis for a new, improved video conferencing system. Most existing set-ups use a camera mounted just above the display. This accounts for the disconcerting off-target eye contact associated with such systems. Wilson said that with Touchlight, because the camera is directly behind the display, it is possible to set up a video conferencing system that tracks where you are and always places the graphic of the person you are talking to directly between the camera and your eye.
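The geometry behind this is simple enough to sketch. If the display is taken as the plane z = 0, with the viewer’s eye in front of it at positive z and the camera behind it at negative z, the remote participant’s face should be drawn where the straight line from the eye to the camera crosses the display. The coordinate convention and the function name below are assumptions for illustration, not details of Wilson’s system.

```python
def gaze_point(eye, camera):
    """Point on the display plane (z = 0) where the eye-to-camera line crosses it.

    eye:    (x, y, z) of the viewer's eye, with z > 0 (in front of the screen)
    camera: (x, y, z) of the camera, with z < 0 (behind the screen)
    Returns the (x, y) screen position at which to draw the remote face.
    """
    ex, ey, ez = eye
    cx, cy, cz = camera
    if ez <= 0 or cz >= 0:
        raise ValueError("eye must be in front of the display and camera behind it")
    # Parametrise the line as eye + t * (camera - eye) and solve for z = 0.
    t = ez / (ez - cz)
    return ex + t * (cx - ex), ey + t * (cy - ey)
```

For an eye at (0.3, 0.0, 0.6) and a camera at (0.0, 0.0, -0.2), the face belongs at roughly (0.075, 0.0). Recomputing this point every time the tracker reports a new eye position keeps the remote face on the line of sight as the viewer moves.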

Despite this early promise, though, Wilson thinks the technology is a long way from fulfilling its potential. ‘We’re playing at the moment and aren’t exactly sure what the killer app is yet,’ he said. ‘I think there are some interesting advantages to it — but I don’t think it has really been thought out what they are. Nobody has a really good handle on this multi-touch stuff yet as far as I’m concerned — right now it’s about this demo of moving and spinning things and scaling things up and down, which is nice but I hope there will be more to it than that.’

One person who has perhaps exploited the eye-catching potential of multi-touch more than anyone else is Jeff Han, a researcher at New York University’s computer science department.

When he first presented his work at last year’s Technology Entertainment Design Conference in California, the audience — which included some of the biggest hitters in the technology world — gave him the kind of reception usually reserved for a rock star.

‘I think this is going to change the way we interact with computers,’ said Han, wowing the gathered throng as he used nothing more than his fingertips to manipulate images, create moving puppets and swoop through mountainous landscapes.

Han’s technology combines rear projection with tracking cameras. Light from LEDs is shone into the edge of the display panel, where it is trapped by total internal reflection; a fingertip pressed against the surface disrupts that reflection and scatters light out towards the cameras. Thanks to this phenomenon, known as frustrated total internal reflection, the system can tell not only where fingers are placed on the screen but can also gauge the contact pressure and, potentially, even the approach of your hand to the screen.
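In such a system each fingertip shows up to the camera as a bright spot on a dark background, so touch detection reduces to finding bright blobs. The sketch below assumes a greyscale frame held as nested lists of values between 0 and 1 and a hypothetical brightness threshold; it labels connected bright regions and reports each blob’s centroid and area. Treating area as a rough proxy for contact pressure (a finger pressed harder flattens against the surface) is an assumption here, since the article does not say how Han’s system gauges pressure.

```python
from collections import deque


def find_touches(frame, threshold=0.5):
    """Find bright blobs in an FTIR camera frame.

    frame: list of rows of brightness values (0..1).
    Returns a list of (centroid_x, centroid_y, area) tuples, one per blob,
    where area is the number of pixels above the threshold.
    """
    height, width = len(frame), len(frame[0])
    seen = [[False] * width for _ in range(height)]
    touches = []
    for y in range(height):
        for x in range(width):
            if seen[y][x] or frame[y][x] <= threshold:
                continue
            # Breadth-first flood fill over 4-connected bright pixels.
            queue = deque([(x, y)])
            seen[y][x] = True
            pixels = []
            while queue:
                px, py = queue.popleft()
                pixels.append((px, py))
                for nx, ny in ((px + 1, py), (px - 1, py), (px, py + 1), (px, py - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and not seen[ny][nx] and frame[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            area = len(pixels)
            cx = sum(p[0] for p in pixels) / area
            cy = sum(p[1] for p in pixels) / area
            touches.append((cx, cy, area))
    return touches
```

On a small frame with a firm 2 by 2 pixel press in the top-left corner and a single faint pixel elsewhere, the function returns two touches, the first with four times the area of the second.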

Han has founded a company called Perceptive Pixel to develop his technology further, and has reportedly already shipped touch screens to sections of the military.


The other eye-catching development that promises to revolutionise human-computer interaction is gestural technology. In Europe, a totally different kind of computer interface has been developed by Germany’s Fraunhofer Institute for Intelligent Analysis and Information Systems.

Instead of multi-touch, the institute’s virtual environments department has come up with a novel technology inspired by an unlikely source: the theremin, one of the earliest electronic instruments, whose unusual sound is controlled by the player’s hands disturbing the electric fields around its antennas. In a similar fashion, Fraunhofer’s PointScreen technology employs the principle of electric field sensing to produce a gestural interface that operates without the need for touch.

The system needs neither expensive cameras nor infrared transmitters. Instead, it detects the user’s movements by measuring changes in the electric field that surrounds the human body. The user stands on a metal plate, through which an electric field is generated, and the body acts as an antenna.

The movement of the user’s arm modifies the signal slightly, a change that is detected by four electrodes embedded behind a glass-fronted screen. The signal is filtered, amplified and digitised so that it can be used to calculate a position on the screen, allowing users to control a cursor from up to a metre away with a simple swing of the arm.
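The article does not explain how PointScreen turns four electrode signals into a cursor position. A plausible sketch, assuming one electrode near each edge of the screen (left, right, top, bottom) and that a stronger signal means the arm is nearer that electrode, smooths each digitised signal with a moving average (standing in for the filtering stage) and then uses normalised differences between opposite electrodes to place the cursor. The electrode layout, the filter, the class and function names and the mapping are all hypothetical.

```python
from collections import deque


class ElectrodeFilter:
    """Moving-average filter for one digitised electrode signal (hypothetical)."""

    def __init__(self, window=4):
        self.samples = deque(maxlen=window)

    def update(self, value):
        """Add a new sample and return the average of the last `window` samples."""
        self.samples.append(value)
        return sum(self.samples) / len(self.samples)


def cursor_position(left, right, top, bottom, width, height):
    """Map four filtered electrode readings to a screen position in pixels.

    Normalised differences between opposite electrodes give values in
    [-1, 1], which are then scaled to the screen size.
    """
    # Horizontal: positive when the right electrode sees a stronger signal.
    h = (right - left) / (right + left) if right + left else 0.0
    # Vertical: positive when the bottom electrode sees a stronger signal
    # (screen y grows downwards).
    v = (bottom - top) / (bottom + top) if bottom + top else 0.0
    return (h + 1) / 2 * width, (v + 1) / 2 * height
```

Equal readings on all four electrodes put the cursor in the centre of an 800 by 600 screen at (400.0, 300.0); if the right electrode reads three times the left, the cursor moves to (600.0, 300.0). The averaging window trades responsiveness for a steadier cursor.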

‘PointScreen is completely self-contained, needs no external devices and for users it really does feel just like magic,’ said Fraunhofer’s Predrag Peranovic.

Although PointScreen is currently intended as no more than a flashy marketing tool, Peranovic said that a major German telecommunications company, which he was unable to name, has already shown interest in using the technology in its shops.

One of the most famous uses of gestural technology came in Steven Spielberg’s 2002 sci-fi film Minority Report, which featured Tom Cruise using an intuitive, gesture-based interface to sort through various clues and documents to solve crimes. John Underkoffler — a Massachusetts Institute of Technology (MIT) graduate who had been experimenting with innovative gesture and light-based technologies — was hired by Spielberg as a technical consultant to help give the film’s futuristic technologies some grounding in reality.

The film’s gestural interface aroused the interest of defence giant Raytheon, which contracted Underkoffler to see whether he could produce a proof-of-concept system: a working version of the interface seen on screen.

The result, Underkoffler’s ‘G-speak’ technology, uses special gloves dotted with white reflective beads, which are tracked by six to eight infrared motion-capture cameras positioned around the room. Specific gestures are linked to specific commands, and objects on the screen can be pointed at and moved around.

Unsurprisingly, Raytheon was not keen to divulge its plans for the technology, although it has been reported that it will be used for tasks such as sorting through surveillance data from UAVs and co-ordinating complex battle plans. The system has even acquired a new defence-friendly acronym: IGET, or Interactive Gestural Exploitation and Tools.

Underkoffler was one of the very first graduates of Tangible Media, a research group at MIT led by Prof Hiroshi Ishii that specialises in developing a wide range of innovative human-computer interfaces. While Hollywood laps up Underkoffler’s visions of the future (he has since worked on the films Hulk and the Adam Sandler comedy Click), his former mentor is pursuing a slightly different idea of where interfaces are heading.

While stressing that touchscreens and gestural technologies have their place, Ishii believes that humans often work better when some kind of tactile feedback is involved. ‘Multi-touch screens do leave some ambiguity about whether you have really touched them or not,’ he said. ‘Our interface preference is for physical, tangible objects people can pick up, touch and interact with.’

One of the group’s most successful inventions is the Sensetable interface, which is now a £2.5m ($5m) business in Japan. Sensetable is a flexible system that electromagnetically tracks the movement of ‘dials’ or ‘tokens’ on a tabletop surface in real time.

Designed for group discussions, Sensetable has its origins in earlier work undertaken by Underkoffler with the group in 1999, when he designed the Luminous Room, a concept that made every object in a room capable of displaying and collecting visual information. Ishii said that Underkoffler’s work has inspired a number of other students to develop interactive tabletop interfaces that use real objects.

Despite his role as a pioneer in reinventing the computer screen, Ishii thinks it will still be some time before the traditional holy trinity of computer, mouse and screen is truly challenged. ‘The future of the office is not that you will have one dominant interface for all applications. Bill Gates wants to believe that, but it’s not true,’ he said. ‘You will have different methods of interacting for different things.’

Ishii believes that he and others, such as Underkoffler and Han, have an important role to play in providing alternative interactive technologies.

‘We are all trying to stretch the boundaries of the human-computer interface and show people different versions of the future. It would be depressing to just believe that what Steve Jobs, Bill Gates or Google say is true instead of focusing on all these different exciting possibilities.’