A question of semantics

Invention Machine has developed software that takes the pain out of information overload – Jon Excell explains.

We should all know by now what a great source of information the internet is. However, with an estimated 5 million new pages added every day, there’s far too much to take in, and trying to find out how, for example, to reduce vibration in an engine part, can be like looking for a needle in a haystack.

Now knowledge-processing specialist Invention Machine has developed a software tool that not only looks for relevant web pages but also reads them for you.

The project began in 1992 with Russian scientists whose brief was to report to their government on technological developments in the West. In the course of this work, they realised that as well as extracting knowledge from patents, they could also extract solutions to problems across a range of industries.

This led to a large database on how to do different things (for example, how to cool different liquids), which was the genesis of Invention Machine’s CoBrain knowledge-capturing software, a semantic tool that reads and understands documents. According to Derek Kilroe, the company’s UK operations director, ‘CoBrain is like an engineer, it can read a document and extract problems and solutions from the documents.’

The company started from the premise that the only thing people working in, say, aerospace and pharmaceuticals have in common is a problem statement: both may, for example, need to reduce viscosity.

The software has a pre-filter that locates the beginning and end of each sentence. Its algorithms then identify the sentence’s subject, action and object. In the sentence ‘The filter separates the particles’, for example, the object is ‘particles’, the action ‘separate’, and the subject (or solution) ‘filter’. Once this decoding is done, semantic processing compiles the information into whatever format the user has asked for.
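The parsing step described above can be illustrated with a deliberately tiny sketch. CoBrain’s actual algorithms are not described in the article, so everything below is an assumption: the hand-written verb list, the rule that the first known verb is the action, and the names `split_sentences` and `extract_sao` are all hypothetical. It handles only simple ‘subject verb object’ sentences like the filter example.

```python
from __future__ import annotations

import re
from typing import NamedTuple, Optional


class SAO(NamedTuple):
    subject: str  # the "solution", e.g. "filter"
    action: str   # base form of the verb, e.g. "separate"
    obj: str      # what is acted on, e.g. "particles"


# Hypothetical toy lexicon mapping verb forms to base forms; a real system
# would need a full dictionary plus grammatical (part-of-speech) analysis.
VERBS = {
    "separates": "separate", "separate": "separate",
    "cools": "cool", "cool": "cool",
    "reduces": "reduce", "reduce": "reduce",
    "chills": "chill", "chill": "chill",
}
DETERMINERS = {"the", "a", "an"}


def split_sentences(text: str) -> list[str]:
    """Pre-filter: split at '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.rstrip(".!?") for p in parts if p]


def extract_sao(sentence: str) -> Optional[SAO]:
    """Take the first known verb as the action; the words before it form the
    subject and the words after it the object (determiners are dropped)."""
    words = [w for w in re.findall(r"[a-z]+", sentence.lower())
             if w not in DETERMINERS]
    for i, word in enumerate(words):
        if word in VERBS and 0 < i < len(words) - 1:
            return SAO(" ".join(words[:i]), VERBS[word], " ".join(words[i + 1:]))
    return None
```

Run on the article’s example, `extract_sao("The filter separates the particles")` gives subject `filter`, action `separate` and object `particles`. Real text would need proper grammatical parsing rather than a fixed verb list.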

To illustrate how it works, we performed a simple test, asking www.ask.co.uk ‘How do you chill Champagne?’ The result was predictable: thousands of pages containing references to Champagne, but not necessarily the information we were after.

A search using CoBrain, however, was far more fruitful. The software recognised that the question contains an action (‘chill’) and an object (‘Champagne’), then extracted a number of statements describing methods of chilling Champagne, each hyperlinked to the exact point in the document where the statement came from.
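How a question might then be matched against such statements can be sketched in the same toy style. Again this is an assumption, not CoBrain’s method: the `Statement` record, the ‘How do you &lt;verb&gt; &lt;object&gt;?’ pattern and the `answer` function are hypothetical, and synonym handling is left out. Each statement keeps a `source` field so that every hit can link back to where it was found, as in the Champagne test.

```python
from __future__ import annotations

import re
from typing import NamedTuple, Optional


class Statement(NamedTuple):
    """One pre-extracted subject-action-object statement (hypothetical format)."""
    subject: str  # the solution, e.g. "ice bucket"
    action: str   # base-form verb, e.g. "chill"
    obj: str      # what is acted on, e.g. "champagne"
    source: str   # pointer back to the exact spot in the source document


def parse_question(question: str) -> Optional[tuple[str, str]]:
    """Turn 'How do you <verb> <object>?' into (action, object), or None."""
    m = re.match(r"how (?:do|can|to) (?:you |i |we )?(\w+) (.+?)\??$",
                 question.strip().lower())
    return (m.group(1), m.group(2)) if m else None


def answer(question: str, statements: list[Statement]) -> list[Statement]:
    """Keep statements whose action matches and whose object mentions the query object."""
    parsed = parse_question(question)
    if parsed is None:
        return []
    action, obj = parsed
    return [s for s in statements
            if s.action == action and obj in s.obj.lower()]
```

Given invented statements such as (‘ice bucket’, ‘chill’, ‘champagne’) and (‘fridge’, ‘chill’, ‘white wine’), the question ‘How do you chill Champagne?’ returns only the first, because both the action and the object must match.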

While a number of search engines, such as www.ask.co.uk, purport to be question-and-answer machines, Kilroe is adamant that ‘CoBrain is the only commercially available question and answer engine.’

The distinction matters. Whereas AltaVista, Google et al. return millions of results in seconds, CoBrain takes much longer. However, in 50 minutes the software will find and read 1,000 full-text patents (roughly three seconds per patent), a job that would take a team of engineers weeks.

From a design engineer’s perspective, the advantages are obvious. An R&D professional is estimated to spend about 10% of their time searching and 33% reading, together more than 40% of their time, and CoBrain is designed to address exactly this part of the development process.

CoBrain recognises synonyms, has a database of more than 80 million word and word-group combinations, and draws information from all of the main search engines, a number of patent servers and over 700 sites in vertical sectors.

Customers can also buy a corporate version of the software, which not only carries out the web search but also analyses the company’s intranet and creates a structured knowledge base that can be shared across the whole corporation.

Just over a year after its launch, CoBrain already has clients in the microelectronics, automotive, petrochemical and food sectors, a great deal of interest from the medical sector, and the potential to change for good the way everyone searches for information on the web.