1. NoDoSE—a tool for semi-automatically extracting structured and semistructured data from text documents;Adelberg,1998
2. Extracting content structure for web pages based on visual representation;Cai,2003
3. An efficient language-independent method to extract content from news webpages;Cardoso,2011
4. Approximate statistical tests for comparing supervised classification learning algorithms;Dietterich;Neural Comput.,1998
5. A lightweight and efficient tool for cleaning web pages;Evert,2008