The porn breakers

The web may mean free flow of information, but it also allows the transfer of less desirable data. Andrew Lee reports on a system that tracks images, from pornographic photographs to product designs and company logos.

The downside of the desktop revolution is becoming all too apparent. The same technology that allows colleagues across the world to work together on new product designs is also enabling less savoury forms of collaboration.

Twenty years ago any pornography distributed in workplaces was confined to brown paper bags furtively exchanged during lunch hours. Now the IT networks of some major companies are awash with obscene material.

Earlier this year Ford announced an amnesty for UK employees and offered confidential advice on how to remove offensive data from their PCs. The company tactfully suggested that some staff may have ‘accidentally’ stored porn on their desktops, but promised a ‘zero tolerance’ regime once the deadline had passed.

Soon afterwards Hewlett-Packard suspended about 150 staff in the UK and Ireland while it investigated the distribution of ‘unauthorised and inappropriate’ material.

Faced with the need for such costly and embarrassing damage limitation exercises, companies are rapidly coming to the conclusion that technology got them into this mess, and technology is going to have to get them out of it.

The problem is that most pornography is about images, not words, and screening data for visual content is a far less developed process than screening it for text. However, systems such as the UK-developed Pornsweeper are beginning to appear.

Pornsweeper trawls electronic images looking for the colour, texture and pigment of skin. When it decides the skin content of a picture is above a certain threshold it alerts the system’s administrator. Based on technology developed at the University of East Anglia, Pornsweeper is one of the first commercial products of research into content-based image retrieval (CBIR), an offshoot of the wider machine vision research community.
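The threshold idea can be illustrated with a short Python sketch. It is not Pornsweeper's method, which is not described in detail here: the RGB skin-tone rule, the 0.4 threshold and all function names are assumptions chosen for illustration, and the sketch looks at colour only, ignoring texture.

```python
from typing import Iterable, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0-255


def looks_like_skin(pixel: Pixel) -> bool:
    """Crude RGB skin-tone rule (an illustrative assumption, colour only)."""
    r, g, b = pixel
    return (
        r > 95 and g > 40 and b > 20
        and max(r, g, b) - min(r, g, b) > 15
        and abs(r - g) > 15
        and r > g and r > b
    )


def skin_fraction(pixels: Iterable[Pixel]) -> float:
    """Share of pixels that pass the skin-tone rule (0.0 for an empty image)."""
    pixels = list(pixels)
    if not pixels:
        return 0.0
    return sum(looks_like_skin(p) for p in pixels) / len(pixels)


def should_alert(pixels: Iterable[Pixel], threshold: float = 0.4) -> bool:
    """Flag the image for the administrator when skin content exceeds the threshold."""
    return skin_fraction(pixels) > threshold


# Toy "images" given as flat lists of pixels (hypothetical data).
mostly_skin = [(220, 170, 140)] * 8 + [(30, 60, 200)] * 2
mostly_sky = [(220, 170, 140)] * 1 + [(30, 60, 200)] * 9
print(should_alert(mostly_skin), should_alert(mostly_sky))  # True False
```

A rule this simple would flag beach holiday snaps and portraits as readily as pornography, which is one reason a practical system would need to weigh texture and other cues as well.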

CBIR systems identify, categorise and manage images by analysing fundamental visual properties down to the level of an individual pixel. They carry out analysis of colours, shapes, textures and backgrounds, using algorithms to recognise patterns and retrieve images that meet set criteria.

Crucially, this removes the need to have a text-based label attached to an image as its main identifier. For example, a user of a CBIR system searching for images of horses could do so without relying on the word ‘horse’ being present anywhere in the data. They could use a photograph of a single animal – or even a line drawing of a horse-like shape – to find others on an image database or the internet.
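A toy version of such a 'query by example' search might look like the Python sketch below. It assumes one simple technique, comparing coarse colour histograms by histogram intersection, and covers colour only, not shape; the function names and the tiny image 'database' are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0-255


def colour_histogram(pixels: Sequence[Pixel], bins: int = 4) -> List[float]:
    """Normalised histogram over bins**3 coarse RGB cells."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # Map each 0-255 channel value to a bin index in 0..bins-1.
        rb, gb, bb = r * bins // 256, g * bins // 256, b * bins // 256
        hist[rb * bins * bins + gb * bins + bb] += 1.0
    total = len(pixels) or 1
    return [count / total for count in hist]


def intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Histogram intersection: 1.0 for identical colour mixes, 0.0 for no overlap."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def rank_by_example(
    query: Sequence[Pixel], database: Dict[str, Sequence[Pixel]]
) -> List[Tuple[str, float]]:
    """Rank stored images by colour similarity to the query image, best first."""
    q = colour_histogram(query)
    scores = [(name, intersection(q, colour_histogram(px))) for name, px in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical data: no image carries a text label describing its content.
brown, green, navy, white = (150, 90, 40), (60, 160, 60), (10, 10, 60), (250, 250, 250)
query = [brown] * 6 + [green] * 4
database = {
    "img_001": [navy] * 9 + [white] * 1,
    "img_002": [brown] * 5 + [green] * 5,
    "img_003": [green] * 10,
}
ranking = rank_by_example(query, database)
print([name for name, _ in ranking])  # ['img_002', 'img_003', 'img_001']
```

The search never consults a file name or caption: the brown-on-green query image retrieves the brown-on-green picture first purely from its pixels.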

Defensive systems such as Pornsweeper apply CBIR in reverse. Instead of identifying particular images and retrieving them, they block material they recognise as pornographic.

Its independence from any reference to textual labels makes CBIR a particularly powerful defence against unwanted pornographic images.

While text-based screening may be useful against the casual recipient of obscene material, determined porn propagators are unlikely to attach helpful phrases such as ‘XXX rated’ to their files.

Screening for porn is an early and obvious application of CBIR systems. But those involved in developing the fledgling technology believe it will eventually find a role in a wide diversity of markets.

Getty Images, which manages the world’s largest commercial image database, is an active supporter of efforts by the academic community to develop new CBIR techniques.

Major broadcasters, including the BBC, are also keeping a close eye on developments in the field, eager to find new ways to index and search their vast archives of video material.

Image management is not the only potential role for CBIR. Some researchers are exploring its usefulness as a tool to help mobile robots find their way around.

Engineers at the universities of Hamburg and Freiburg have tested a robot that compares the view confronting its on-board camera with similar stored images, allowing it to determine where it is in a known environment such as an office or factory.
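The principle of locating a robot by matching its camera view against stored images can be sketched in a few lines of Python. How the Hamburg and Freiburg system actually compares views is not described here, so the distance measure (mean absolute brightness difference), the tiny greyscale 'views' and the place names are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

View = List[List[int]]  # small greyscale image: rows of 0-255 brightness values


def view_distance(a: View, b: View) -> float:
    """Mean absolute brightness difference between two equally sized views."""
    diffs = [abs(pa - pb) for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def locate(camera_view: View, stored_views: Dict[str, View]) -> Tuple[str, float]:
    """Return the label of the closest stored view and its distance from the camera view."""
    best = min(stored_views, key=lambda place: view_distance(camera_view, stored_views[place]))
    return best, view_distance(camera_view, stored_views[best])


# Hypothetical reference views recorded at known places in a building.
stored_views = {
    "corridor": [[200, 200, 200], [50, 50, 50]],
    "loading bay": [[20, 20, 220], [20, 20, 220]],
}
camera_view = [[190, 205, 198], [60, 45, 55]]  # slightly noisy view of the corridor

place, distance = locate(camera_view, stored_views)
print(place)  # corridor
```

Comparing raw pixels like this breaks down as soon as the lighting or the camera angle changes, so a working robot would more plausibly compare features that are robust to such changes.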

Nobody working in the CBIR field would claim that it is anywhere near the finished article, with accuracy rates still too low for the most demanding applications, especially in the field of video. However, a handful of companies have already launched products incorporating the technology.

One of the highest profile is LTU Technologies, a Paris-based company founded by a team of computer vision specialists from MIT, Oxford and INRIA, a leading French computer science centre.

Co-founder Sebastien Gilles says finding specific roles for CBIR in the commercial market will play an important part in underpinning future research in the field. ‘What we want to avoid being is a technology in search of an application,’ he says. Gilles admits, however, that convincing commercial partners can be difficult.

‘CBIR remains a young technology, and people still see a risk attached to it,’ he says. ‘And computer vision is still some kind of science fiction tool as far as many are concerned.’

Despite this scepticism LTU has managed some breakthroughs, particularly in its home market. Its systems are used by the French criminal intelligence service, probably for facial and fingerprint image retrieval, and by the country’s patent office.

LTU’s image-recognition system is also used by NameProtect, a major US brand asset management company that monitors the intellectual property of its clients.

Gilles believes brand protection is one of the areas in which CBIR has a significant role to play. It allows companies to scour the web for unauthorised use of their logos and other copyrighted visual material, an increasing headache for many brand owners.

In the short term he expects such specialist areas – in which CBIR can deliver a ‘quick fix’ – to account for most use of the technology. But he claims: ‘There is no doubt that there will eventually be a mass market for content-based searching. The goal is that at some point it should be as natural to search by image as it is by text.’

Sidebar: Mastering the art

A consortium of Europe’s major art galleries and museums is using CBIR technology to create a searchable network of images of their priceless collections.

The Artiste project – which includes the UK’s National Gallery and Victoria and Albert Museum and the Louvre in Paris – has created a system for curators, art historians and restoration specialists.

The database of 60,000 images can be searched by shape, colour, pattern or texture, allowing users to scour the various collections for points of similarity.

For example, if a museum in London acquires a particular vase it can search for objects with a similar shape or decorative pattern elsewhere, helping to establish the identity of the maker and reunite collections that have been split up over the centuries.

Art professionals have already found a host of highly specialised uses for Artiste, including comparison of the fine details of paintings and artefacts with others stored in the database.

They are also using the system to aid restoration work. One algorithm, called pyramid wavelet transform (PWT), is used to look for the distinctive stretching patterns left on the rear of old paintings by the wooden boards on which they were mounted. These can provide vital clues to the age, condition and authenticity of individual works.
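For readers curious about what a pyramid wavelet transform does, the Python sketch below shows the general idea. It assumes the simple Haar wavelet and summarises each level by the energy of its detail bands; which wavelet Artiste uses and how it interprets the output are not stated here, and the function names are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[float]]  # greyscale image; sides must be divisible by 2**levels


def haar_step(image: Grid) -> Tuple[Grid, Grid, Grid, Grid]:
    """One 2-D Haar step: returns (average, dx, dy, dxy) bands at half resolution."""
    avg, dx, dy, dxy = [], [], [], []
    for y in range(0, len(image), 2):
        avg_row, dx_row, dy_row, dxy_row = [], [], [], []
        for x in range(0, len(image[0]), 2):
            a, b = image[y][x], image[y][x + 1]
            c, d = image[y + 1][x], image[y + 1][x + 1]
            avg_row.append((a + b + c + d) / 4)  # local average (coarser image)
            dx_row.append((a - b + c - d) / 4)   # change from left to right
            dy_row.append((a + b - c - d) / 4)   # change from top to bottom
            dxy_row.append((a - b - c + d) / 4)  # diagonal change
        avg.append(avg_row); dx.append(dx_row); dy.append(dy_row); dxy.append(dxy_row)
    return avg, dx, dy, dxy


def mean_energy(band: Grid) -> float:
    """Average squared coefficient in a band."""
    values = [v for row in band for v in row]
    return sum(v * v for v in values) / len(values)


def pwt_signature(image: Grid, levels: int = 2) -> List[float]:
    """Detail-band energies per level; only the average band is decomposed again."""
    signature: List[float] = []
    current = image
    for _ in range(levels):
        current, dx, dy, dxy = haar_step(current)
        signature.extend([mean_energy(dx), mean_energy(dy), mean_energy(dxy)])
    return signature


fine_stripes = [[0, 200, 0, 200] for _ in range(4)]    # one-pixel vertical stripes
coarse_stripes = [[0, 0, 200, 200] for _ in range(4)]  # two-pixel vertical stripes
print(pwt_signature(fine_stripes))    # [10000.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(pwt_signature(coarse_stripes))  # [0.0, 0.0, 0.0, 10000.0, 0.0, 0.0]
```

The two toy patterns differ only in stripe width, and the signature records that difference as energy at a different level of the pyramid. That sensitivity to the scale and direction of a repeating pattern is what makes this family of transforms a natural fit for texture analysis.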

Artiste’s algorithms can even work with black and white images, using distribution of brightness and shade to attempt to find a match with others on the database.