Author:
Fernandes Eraldo R., Milidiú Ruy L., Rentería Raúl P.
Abstract
We propose RelHunter, a machine learning-based method for the extraction of structured information from text. RelHunter’s key idea is to model the target structures as a relation over entities. Hence, the modeling effort is reduced to the identification of entities and the generation of a candidate relation, which are simpler problems than the original one. RelHunter fits a very broad spectrum of complex computational linguistic problems. We apply it to five tasks: phrase chunking, clause identification, hedge detection, quotation extraction, and dependency parsing. We compare RelHunter to token classification approaches through several computational experiments on seven multilingual corpora. RelHunter outperforms the token classification approaches by 2.14% on average. Moreover, we compare the derived systems against state-of-the-art systems for each corpus. Our systems achieve state-of-the-art performances for three corpora: Portuguese phrase chunking, Portuguese clause identification, and English quotation extraction. Additionally, the derived systems show good quality performance for the other four corpora.
Publisher
Springer Science and Business Media LLC
Cited by
2 articles.
1. HTML Segmentation for Different Types of Web Pages;Advances in E-Business Research;2015
2. Introduction;Entropy Guided Transformation Learning: Algorithms and Applications;2012