Technology Review: Extracting Meaning from Millions of Pages Get Feed

Technology Review: Extracting Meaning from Millions of Pages

Description
A software engine that pulls together facts by combing through more than 500 million Web pages has been developed by researchers at the University of Washington. The tool extracts information from billions of lines of text by analyzing basic relationships between words.

Some experts say that this kind of "automated information extraction" will likely form the basis for far more intelligent next-generation Web search, in which nuggets of information are first gleaned and then combined intelligently.

The University of Washington project represents a scaling up of an existing technology developed there called TextRunner in terms of both the number of pages and the scope of topics that it can analyze.

"The significance of TextRunner is that it is scalable because it is unsupervised," says Peter Norvig, director of research at Google, which donated the database of Web pages that TextRunner analyzes. "It can discover and learn millions of relations, not just one at a time. With TextRunner, there is no human in the loop: it just finds relations on its own."
Original URL