Despite the Internet’s decentralised and seemingly unorganised nature, scientists at the NEC Research Institute have discovered that the Web is in fact naturally self-organising.
This discovery led the scientists to develop an algorithm that may change the way that companies segment and target specific online audiences.
The complete findings of the NEC Research Institute are published in the March 2002 issue of the IEEE Computer Society’s Computer magazine.
The scientists’ research shows that the Internet’s structure of ‘clickable’ links within web pages allows communities to be identified around specific topics of interest. These communities are considered natural because independently authored pages collectively organise them. This research is particularly significant given that no central authority or process governs the formation and structure of web pages and links.
Having confirmed the Internet’s self-organising properties, Dr. Gary W. Flake and Dr. Steve Lawrence, research scientists at the NEC Research Institute, teamed up with Dr. C. Lee Giles, a professor at Penn State University, and Dr. Frans M. Coetzee, chief technical officer of GenuOne, to develop the community algorithm.
This algorithm enables businesses and individuals to zero in on specific information by focusing on communities of web pages that are related to one another.
For example, an individual wishing to study the latest scientific findings on breast cancer research is able to locate medical literature, treatments and new developments without wading through the pages of irrelevant material that a normal Internet search on the subject might produce. This is possible because NEC’s algorithm uses link information to generate its results, rather than specific text that may appear on countless web pages.
‘We have found that a web author’s creation of a specific web link is a stronger indication of relevance than the implied relevancy generated by simple textual phrase and structure matching,’ said Dr. Flake. ‘Additionally, separating link structure from content facilitates using content-based similarity measures to independently validate the relevancy of the results that our algorithm produces.’
In addition to enabling companies to target key audiences more effectively, NEC believes that other applications of its methodology include improved search engines, content filtering, and objective analysis of Internet content and of the relationships between web communities.
‘Our process holds the potential for the development of specialised search engines capable of identifying only pages within their domains,’ said Dr. Flake. ‘In addition, this development may lead to the creation of web filtering software that identifies certain communities of pages to be filtered for either relevant or undesirable content.’
The community algorithm takes a set of base web sites as input and identifies a larger community of web pages that contains the base web sites. NEC researchers define a web community as a collection of web pages that each have more links within the community than outside of it. Thus, each member of the identified community will typically be focused on a single topic regardless of textual ambiguities.
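The community definition above can be illustrated with a minimal Python sketch. It is not NEC’s algorithm: the greedy expansion below is a simple stand-in chosen for readability, and the page names, the link list and the function names are all hypothetical. The sketch also assumes that a link counts in either direction, which the article does not specify.

```python
from collections import defaultdict


def build_neighbours(links):
    """Treat each (source, target) hyperlink as an undirected connection (an assumption)."""
    neighbours = defaultdict(set)
    for source, target in links:
        neighbours[source].add(target)
        neighbours[target].add(source)
    return neighbours


def is_community(pages, neighbours):
    """Return True if every page has more links inside `pages` than outside it."""
    pages = set(pages)
    for page in pages:
        inside = len(neighbours[page] & pages)
        outside = len(neighbours[page] - pages)
        if inside <= outside:
            return False
    return True


def grow_community(seeds, neighbours):
    """Naive greedy expansion from the seed pages (illustrative only).

    Repeatedly adds any neighbouring page that has more links into the
    current set than away from it, until no such page remains.
    """
    community = set(seeds)
    changed = True
    while changed:
        changed = False
        frontier = set().union(*(neighbours[p] for p in community)) - community
        for page in sorted(frontier):
            inside = len(neighbours[page] & community)
            outside = len(neighbours[page] - community)
            if inside > outside:
                community.add(page)
                changed = True
    return community


# Hypothetical link graph: pages a-d are densely interlinked, x-z form another cluster.
links = [
    ("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("b", "d"),
    ("d", "x"), ("x", "y"), ("x", "z"), ("y", "z"),
]
neighbours = build_neighbours(links)
community = grow_community({"a", "b"}, neighbours)
print(sorted(community))
print(is_community(community, neighbours))
```

Starting from the base pages `a` and `b`, the sketch pulls in `c` and `d` because most of their links point into the growing set, and it stops at `x`, whose links mostly lead elsewhere. No page text is examined at any point, which mirrors the article’s point that link structure alone can delimit a topic.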