Deep mineworkings

Manufacturing companies now automatically collect and store vast amounts of electronic data about their operations. Increasingly, they are asking whether this data offers any insight into their business, and they are turning to a mathematical analysis technique called data mining to provide an answer.

Data mining identifies patterns or correlations in data that are unlikely to have been known before and that can be used to improve performance. For example, it might be used to identify the product factors that lead to sales success in certain markets or, at a more detailed level, to identify machine breakdown patterns on a production line (see box).
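To make the breakdown example concrete, here is a minimal sketch in the style of association-rule mining. It is an illustration only, not the method of any firm mentioned in this article: the shift records, the condition names and the `breakdown_rules` function are all hypothetical. For each recorded condition it measures how often a breakdown followed (the rule's "confidence"), ignoring conditions seen too rarely to judge.

```python
from collections import Counter

# Hypothetical shift records: conditions observed, and whether a breakdown followed.
records = [
    ({"high_temp", "night_shift"}, True),
    ({"high_temp"}, True),
    ({"night_shift"}, False),
    ({"high_temp", "new_operator"}, True),
    ({"new_operator"}, False),
    (set(), False),
]

def breakdown_rules(records, min_support=2):
    """Return {condition: confidence} for 'condition -> breakdown' rules.

    Confidence is the fraction of records containing the condition that
    ended in a breakdown; conditions seen fewer than min_support times
    are dropped as too rare to judge.
    """
    seen = Counter()            # how often each condition occurred
    with_breakdown = Counter()  # how often it occurred alongside a breakdown
    for conditions, broke_down in records:
        for c in conditions:
            seen[c] += 1
            if broke_down:
                with_breakdown[c] += 1
    return {c: with_breakdown[c] / seen[c] for c in seen if seen[c] >= min_support}

rules = breakdown_rules(records)
for condition, confidence in sorted(rules.items(), key=lambda kv: -kv[1]):
    print(f"{condition} -> breakdown: {confidence:.0%}")
```

On this toy data, high temperature stands out as the condition most strongly associated with breakdowns. With real plant data, such a rule would be a lead to investigate rather than proof of cause.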

In some cases, a single data mining exercise might yield the desired result. Tony Waller, head of data mining services at software developer Lanner Group, cites the case of a chemicals manufacturer that wants to determine the correct concentrations of raw materials for a process. Lanner launched a data mining service earlier this year, based on techniques similar to those used in Witness Optimiser, its optimisation program for manufacturing process simulation.

More often, data mining leads to the development of an analysis tool. For example, one method that has been developed learns the operating conditions and status of chemical and integrated circuit plants and is able to control their output. Du Pont reports savings of $500m a year from its use.

Data mining is becoming more accessible. A variety of mathematical algorithms exist to identify patterns and, for relatively small databases, they can be run on a PC. However, according to data mining experts such as Professor Vic Rayward-Smith of the School of Information Systems at the University of East Anglia, running the algorithms is only one part of a process that involves a number of important preparatory steps.

The University of East Anglia is one of several UK academic institutions with expertise in data mining. Others include the universities of Birmingham, Portsmouth and Ulster, and Imperial, Birkbeck and University Colleges in London.

The first step in data mining is to identify a business problem. Then the availability and accuracy of the data have to be investigated. It is also important to understand the data and its properties, stresses Rayward-Smith.
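Checking availability and accuracy can start with something as simple as counting, field by field, how many values are missing and how many fall outside a plausible range. The sketch below assumes hypothetical process records and field names, and the valid ranges are invented for illustration. In practice those limits would come from the people who understand the data.

```python
# Hypothetical process records; None marks a missing reading.
rows = [
    {"temp_c": 81.0, "pressure_bar": 2.1, "yield_pct": 92.0},
    {"temp_c": None, "pressure_bar": 2.0, "yield_pct": 90.5},
    {"temp_c": 79.5, "pressure_bar": -1.0, "yield_pct": 88.0},
    {"temp_c": 80.2, "pressure_bar": 2.2, "yield_pct": None},
]

# Assumed plausible range for each field (illustrative values only).
valid_ranges = {
    "temp_c": (0.0, 200.0),
    "pressure_bar": (0.0, 10.0),
    "yield_pct": (0.0, 100.0),
}

def audit(rows, valid_ranges):
    """Count missing and out-of-range values for each field."""
    report = {}
    for field, (lo, hi) in valid_ranges.items():
        values = [r.get(field) for r in rows]
        missing = sum(v is None for v in values)
        out_of_range = sum(v is not None and not (lo <= v <= hi) for v in values)
        report[field] = {"missing": missing, "out_of_range": out_of_range}
    return report

for field, counts in audit(rows, valid_ranges).items():
    print(field, counts)
```

A report like this does not fix anything. It simply shows, before any mining starts, how much cleaning each field needs.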

‘One of the big issues in data mining is the kind of expertise required,’ he says. ‘Do you need an expert on the data who understands what it all means, or an expert in data mining techniques who can apply the best methods to find the most significant patterns? You really need both skills but, if the right software is available to offer support within the application area, end users are capable of undertaking data mining projects.’

To achieve a better understanding of data in the early stages, data visualisation and statistical analysis can be used to show key trends. ‘Both techniques play an important role,’ says Rayward-Smith.
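As a small example of the statistical side, the sketch below summarises a hypothetical weekly scrap-rate series with its mean, its standard deviation and a least-squares slope, which shows the direction of the trend. The figures are invented, and the slope is computed by hand rather than with a library routine so that the formula is visible.

```python
import statistics

weeks = [1, 2, 3, 4, 5, 6]
scrap_rate = [2.0, 2.2, 2.1, 2.5, 2.7, 2.9]  # hypothetical % of output scrapped

def trend(xs, ys):
    """Least-squares slope of ys against xs: cov(x, y) / var(x)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

summary = {
    "mean": statistics.mean(scrap_rate),
    "stdev": statistics.stdev(scrap_rate),
    "slope_per_week": trend(weeks, scrap_rate),
}
print(summary)
```

A positive slope here would suggest that scrap is creeping upwards week by week. That is the kind of key trend worth confirming visually before any deeper mining.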

The data then has to be converted into a format suitable for the data mining algorithms to be used. This may involve bringing together data from a number of databases, each with a different format. The easiest type of data to mine is highly accurate and reliable, and held in a flat-file format (a single table of records, the simplest way of storing data). Data held in relational or distributed databases, or of low quality, requires much more pre-processing.
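The sketch below shows what bringing two differently formatted sources into one flat file can look like. Everything in it is hypothetical: the "production" and "maintenance" sources, their field names, and the rule in `normalise_id` that maps both machine-id styles onto a single integer key. It joins the records on that key and writes a single CSV table that a mining algorithm could read.

```python
import csv
import io

# Hypothetical source 1: production database, machine ids like "M-01".
production = [
    {"machine": "M-01", "units": 1200},
    {"machine": "M-02", "units": 950},
]
# Hypothetical source 2: maintenance database, integer ids, different field name.
maintenance = [
    {"machine_no": 1, "stoppages": 3},
    {"machine_no": 2, "stoppages": 7},
]

def normalise_id(raw):
    """Map both id styles ("M-01" or 1) onto one integer key (assumed convention)."""
    return int(str(raw).replace("M-", ""))

def flatten(production, maintenance):
    """Join the two sources on machine id into one flat list of rows."""
    stops = {normalise_id(m["machine_no"]): m["stoppages"] for m in maintenance}
    rows = []
    for p in production:
        key = normalise_id(p["machine"])
        # Machines with no maintenance record get None (written as an empty cell).
        rows.append({"machine": key, "units": p["units"], "stoppages": stops.get(key)})
    return rows

def to_csv(rows):
    """Serialise the flat rows as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["machine", "units", "stoppages"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

print(to_csv(flatten(production, maintenance)))
```

Most of the real effort in this step goes into rules like `normalise_id`, which decide how records from different systems correspond. The join itself is the easy part.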

‘The significance of the results is closely related to the quantity and quality of data being mined,’ says Rayward-Smith. ‘In an ideal world we would have a complete database of approximately 100,000 records of 20 fields each. Life isn’t like that. We get huge variations in numbers of records (from a few hundred to many millions), in the number of fields (from five to many thousands) and in data quality. But techniques exist to handle these scenarios.’

Data mining is like looking for a needle in a haystack, says Waller. But he warns that the search is based on heuristics and may not uncover all the factors involved in solving a particular problem.
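Waller's warning can be illustrated with hill climbing, a simple heuristic search. This is a generic example, not the technique Lanner or anyone else in this article uses. The yield curve is hypothetical: it has a minor peak at 20% raw-material concentration and a better one at 70%. The search keeps moving to a better neighbouring concentration until no neighbour improves, so where it stops depends on where it starts.

```python
# Hypothetical yield curve with two peaks: a minor one at 20% concentration
# and a better one at 70%. A real process would be measured or simulated.
def process_yield(conc):
    return max(60 - abs(conc - 20), 80 - abs(conc - 70))

def hill_climb(f, start, step=1, lo=0, hi=100):
    """Greedy local search: move to the best neighbour while it improves f."""
    current = start
    while True:
        neighbours = [c for c in (current - step, current + step) if lo <= c <= hi]
        best = max(neighbours, key=f)
        if f(best) <= f(current):
            return current  # no neighbour is better: a (possibly local) optimum
        current = best

for start in (10, 50):
    found = hill_climb(process_yield, start)
    print(f"start {start}% -> stops at {found}% (yield {process_yield(found)})")
```

Started at 10%, the search settles on the minor peak at 20% and never finds the better setting at 70%. Restarting from several points is one common way to reduce this risk, but it does not remove it.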

A list of UK data mining consultants can be found at: http://home.clara.net/imclaren/conlinks.html