Affiliation:
1. Departamento de Ciência da Computação (UFMG), Brazil
Abstract
Search engines face several types of challenges daily. One of them is locating relevant content in a Web page. However, what counts as relevant in information retrieval depends on the problem being solved. For instance, the menu of a website does not affect the results of an algorithm that detects duplicate Web pages. An HTML segmentation algorithm partitions a Web page visually so that parts belonging to the same partition are semantically related. This chapter presents two strategies for segmenting different types of Web pages.
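The duplicate-detection example can be made concrete with a small sketch. The Python code below uses only the standard library. It splits a page into text segments at block-level tags, flags the segments that sit inside a `<nav>` menu, and compares two pages by word-shingle Jaccard similarity, once with those segments and once without them. The tag list, the treatment of `<nav>` as "the menu", the shingle size and every function name are assumptions made for illustration. This is a naive baseline, not either of the chapter's two segmentation strategies.

```python
from html.parser import HTMLParser

# Assumption: these tags delimit segments. A naive heuristic for illustration,
# not one of the chapter's segmentation strategies.
BLOCK_TAGS = {"div", "p", "section", "article", "nav", "header", "footer",
              "ul", "ol", "table", "h1", "h2", "h3"}


class NaiveSegmenter(HTMLParser):
    """Collects text segments split at block-level tags, marking each one
    as inside or outside a <nav> element."""

    def __init__(self):
        super().__init__()
        self.segments = []  # list of (text, inside_nav) pairs
        self._buffer = []
        self._nav_depth = 0

    def _flush(self):
        # Normalize whitespace and store the pending text as one segment.
        text = " ".join(" ".join(self._buffer).split())
        if text:
            self.segments.append((text, self._nav_depth > 0))
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()  # text seen so far belongs to the previous segment
        if tag == "nav":
            self._nav_depth += 1

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()  # flush before leaving <nav> so the segment keeps its nav flag
        if tag == "nav" and self._nav_depth > 0:
            self._nav_depth -= 1

    def handle_data(self, data):
        self._buffer.append(data)


def segment(html):
    parser = NaiveSegmenter()
    parser.feed(html)
    parser.close()
    parser._flush()  # emit any trailing text
    return parser.segments


def shingles(text, k=3):
    # Set of overlapping k-word sequences (word shingles).
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    return len(a & b) / len(a | b) if (a | b) else 1.0


def similarity(html_a, html_b, drop_nav):
    # Join the page's segments, optionally skipping those inside <nav>.
    def text_of(html):
        return " ".join(t for t, in_nav in segment(html) if not (drop_nav and in_nav))
    return jaccard(shingles(text_of(html_a)), shingles(text_of(html_b)))


page_a = ("<nav><ul><li>Home</li><li>News</li><li>Sports</li></ul></nav>"
          "<article><p>The city council approved the new budget on Monday.</p></article>")
page_b = ("<nav><ul><li>Login</li><li>Contact</li><li>Archive</li><li>Weather</li></ul></nav>"
          "<article><p>The city council approved the new budget on Monday.</p></article>")

print(similarity(page_a, page_b, drop_nav=False))  # 0.5
print(similarity(page_a, page_b, drop_nav=True))   # 1.0
```

The two pages differ only in their menus. Ignoring the menu segment therefore makes them exact duplicates, while the raw text comparison rates them only 0.5 similar.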