Affiliation:
1. Lehigh University, Bethlehem, PA
Abstract
Classification of Web page content is essential to many tasks in Web information retrieval such as maintaining Web directories and focused crawling. The uncontrolled nature of Web content presents additional challenges to Web page classification as compared to traditional text classification, but the interconnected nature of hypertext also provides features that can assist the process.
As we review work in Web page classification, we note the importance of these Web-specific features and algorithms, describe state-of-the-art practices, and track the underlying assumptions behind the use of information from neighboring pages.
Funder
Division of Information and Intelligent Systems
Publisher
Association for Computing Machinery (ACM)
Subject
General Computer Science, Theoretical Computer Science
Reference
168 articles.
1. Aas, K. and Eikvil, L. 1999. Text categorisation: A survey. Tech. rep. 941. Norwegian Computing Center, Oslo, Norway.
2. Ranking on graph data
3. The connectivity sonar
4. A neighborhood-based approach for clustering of linked document collections
Cited by
225 articles.