Working on the Web in Iowa

University of Iowa researcher Filippo Menczer is exploring the use of mathematical models that could help engineers of Internet search engines create better ways to hunt down the pages that Web users want to access.

Menczer, an assistant professor of Management Sciences, examined a sample of 150,000 web pages, studying the relationships between text, links and meaning. He analysed almost 4 billion pairs of pages with similarities. From this huge body of data, Menczer discovered a mathematical power-law relationship between the probability that two pages are linked and the similarity of their language.
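
As a rough illustration of what a power-law relationship means in practice, the sketch below fits a curve of the form y = c * x^gamma by drawing a straight line through the data on log-log axes. The function name, the variable names and the numbers are all made up for this example; they are not Menczer's data, exponent or method.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of y = c * x**gamma in log-log space.

    Returns (gamma, c). All inputs must be strictly positive.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # Slope of the straight line through the log-log points is the exponent.
    sxx = sum((lx - mean_x) ** 2 for lx in log_x)
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in zip(log_x, log_y))
    gamma = sxy / sxx
    # The intercept of that line is log(c).
    c = math.exp(mean_y - gamma * mean_x)
    return gamma, c

# Synthetic, hypothetical data: a measure of how different two pages are,
# and a link probability generated from an exact power law.
distances = [1.0, 2.0, 4.0, 8.0, 16.0]
link_probs = [0.5 * d ** -2.0 for d in distances]

gamma, c = fit_power_law(distances, link_probs)
```

Because the synthetic points follow the curve exactly, the fit should recover the exponent and coefficient used to generate them. Real measurements would scatter around the line, and a power law might hold over only part of the range.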

His model is claimed to be the first to give accurate predictions of Web link structure and growth based on the content of Web pages. Other Internet models have assumed that a Web page author knows the popularity of every website and chooses links based on that knowledge. But Menczer says authors link to the best and most popular pages within the same category. This creates a small web among pages with similar topics, such as books or a hobby. Menczer’s model of this process closely matches what is seen on the real Internet.
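
The linking behaviour described above can be sketched as a toy simulation. This is not Menczer's model: the function `grow_web`, its parameters and the rule that a page's chance of being chosen is proportional to its incoming links plus one are all assumptions made for illustration. Each new page is assigned a topic category and links only to existing pages in that category, favouring the ones that are already popular.

```python
import random
from collections import defaultdict

def grow_web(num_pages, categories, links_per_page=2, seed=0):
    """Toy growth process: each new page links to popular pages in its own category.

    Returns (category_of, edges), where edges is a list of (source, target) pairs.
    """
    rng = random.Random(seed)
    category_of = {}
    in_links = defaultdict(int)      # page -> number of incoming links
    pages_in = defaultdict(list)     # category -> pages created so far
    edges = []

    for page in range(num_pages):
        cat = rng.choice(categories)
        candidates = pages_in[cat]
        if candidates:
            k = min(links_per_page, len(candidates))
            # Assumed rule: weight each candidate by its in-links plus one,
            # so pages with no links yet can still be chosen.
            weights = [in_links[p] + 1 for p in candidates]
            targets = set()
            while len(targets) < k:
                targets.add(rng.choices(candidates, weights=weights)[0])
            for target in sorted(targets):
                edges.append((page, target))
                in_links[target] += 1
        category_of[page] = cat
        pages_in[cat].append(page)

    return category_of, edges

category_of, edges = grow_web(200, ["books", "gardening", "chess"])
```

Every link produced this way stays inside one category, which gives the clusters of topically related pages that the article describes. A more realistic sketch would allow some cross-topic links and base the choice on measured text similarity rather than fixed categories.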

The model may help Internet developers gain a better understanding of the evolving structure of the Web and its cognitive and social underpinnings. This may, in turn, lead to more effective authoring guidelines as well as improved ranking, classification, and clustering algorithms used in Web search engines.

‘Hopefully, by analysing the relationship between meanings of a page, links, and words, we will be able to determine how to use these cues to find better search results,’ said Menczer.
