Affiliation:
1. Institute of Language & Speech Processing, RC “Athena”, Artemidos 6 & Epidavrou str., 15125 Athens, Greece
Abstract
This article describes a novel phrasing model for segmenting sentences of unconstrained text into syntactically defined phrases. The model is based on the notion of attraction and repulsion forces between adjacent words. Each force is weighted by system parameters, whose values are optimised via particle swarm optimisation. The approach is designed to be language-independent and is tested here on different languages.
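To make the idea of weighted attraction and repulsion between adjacent words concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not define how the forces are computed, so this sketch assumes they are looked up per pair of part-of-speech tags in two hypothetical tables, that a phrase boundary is placed wherever the weighted repulsion exceeds the weighted attraction, and that `w_attr` and `w_rep` stand in for the system parameters that an optimiser such as particle swarm optimisation would tune. The function name `segment` and all values are illustrative.

```python
from typing import Dict, List, Tuple

TagPair = Tuple[str, str]


def segment(tags: List[str],
            attraction: Dict[TagPair, float],
            repulsion: Dict[TagPair, float],
            w_attr: float = 1.0,
            w_rep: float = 1.0) -> List[List[int]]:
    """Group token indices into phrases.

    Assumption: a boundary is placed between two adjacent tokens when the
    weighted repulsion between their tags exceeds the weighted attraction.
    """
    if not tags:
        return []
    phrases = [[0]]
    for i in range(1, len(tags)):
        pair = (tags[i - 1], tags[i])
        # Net force between the previous token and the current one.
        net = (w_attr * attraction.get(pair, 0.0)
               - w_rep * repulsion.get(pair, 0.0))
        if net < 0:
            phrases.append([i])      # repulsion wins: start a new phrase
        else:
            phrases[-1].append(i)    # attraction wins: extend the phrase
    return phrases


# Hypothetical force tables, for illustration only.
tags = ["DET", "NOUN", "VERB", "DET", "NOUN"]
attraction = {("DET", "NOUN"): 0.9, ("VERB", "DET"): 0.2}
repulsion = {("NOUN", "VERB"): 0.8, ("VERB", "DET"): 0.3}

default_phrases = segment(tags, attraction, repulsion)
reweighted_phrases = segment(tags, attraction, repulsion, w_rep=0.5)
```

With equal weights the verb is split off from its object, whereas halving the repulsion weight merges them into one phrase. This is the kind of sensitivity to parameter values that motivates tuning the weights automatically.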
The phrasing model is first assessed per se, by calculating its segmentation accuracy against a gold-standard segmentation. Operational testing then integrates the model into a phrase-based Machine Translation (MT) system and measures translation quality when the model is used to segment the input text into phrases. Experiments show that the approach performs comparably to other leading segmentation methods and exceeds the baseline systems.
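One way to score a segmentation against a gold-standard segmentation is sketched below. The abstract does not state which accuracy measure is used, so this sketch assumes precision, recall and F1 over phrase-boundary positions, with each phrase represented as a list of token indices. The function names `boundaries` and `boundary_prf` are hypothetical.

```python
from typing import List, Set, Tuple


def boundaries(phrases: List[List[int]]) -> Set[int]:
    """Index of the first token of every phrase except the first."""
    return {p[0] for p in phrases[1:] if p}


def boundary_prf(predicted: List[List[int]],
                 gold: List[List[int]]) -> Tuple[float, float, float]:
    """Precision, recall and F1 of predicted boundaries against gold ones."""
    pred_b, gold_b = boundaries(predicted), boundaries(gold)
    hits = len(pred_b & gold_b)
    # Convention assumed here: an empty boundary set scores 1.0.
    precision = hits / len(pred_b) if pred_b else 1.0
    recall = hits / len(gold_b) if gold_b else 1.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Illustrative segmentations of a five-token sentence.
predicted = [[0, 1], [2], [3, 4]]
gold = [[0, 1], [2, 3, 4]]
precision, recall, f1 = boundary_prf(predicted, gold)
```

Here the prediction recovers the single gold boundary but adds a spurious one, so recall is perfect while precision is halved.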
Subject
Artificial Intelligence, Computer Vision and Pattern Recognition, Hardware and Architecture, Modeling and Simulation, Information Systems