Swarm Algorithms for NLP - The Case of Limited Training Data

Published:2019-05-09 Issue:3 Volume:9 Page:219-234
ISSN:2083-2567
Container-title:Journal of Artificial Intelligence and Soft Computing Research
language:en

Author:

Tambouratzis George¹, Vassiliou Marina¹

Affiliation:

1. Institute of Language & Speech Processing, RC “Athena”, Artemidos 6 & Epidavrou str., 15125 Athens, Greece

Abstract

The present article describes a novel phrasing model that can be used to segment sentences of unconstrained text into syntactically defined phrases. This model is based on the notion of attraction and repulsion forces between adjacent words. Each of these forces is weighted by system parameters, the values of which are optimised via particle swarm optimisation. The approach is designed to be language-independent and is tested here on different languages. The phrasing model’s performance is assessed per se, by calculating the segmentation accuracy against a gold-standard segmentation. Operational testing also involves integrating the model into a phrase-based Machine Translation (MT) system and measuring the translation quality when the phrasing model is used to segment input text into phrases. Experiments show that the performance of this approach is comparable to that of other leading segmentation methods and exceeds that of baseline systems.
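The abstract does not give the exact form of the forces or of the optimiser, so the following is only a minimal sketch of the general idea, under stated assumptions: each pair of adjacent words carries one attraction value and one repulsion value, a phrase boundary is predicted when the weighted repulsion exceeds the weighted attraction by more than a threshold, and a standard global-best particle swarm tunes the three parameters against a gold-standard segmentation. The data in `PAIRS`, the decision rule, and all function and parameter names are hypothetical and are not taken from the article.

```python
import random

# Hypothetical toy data: one tuple per pair of adjacent words,
# (attraction, repulsion, gold_boundary). Values are invented for illustration.
PAIRS = [
    (0.9, 0.1, False), (0.8, 0.3, False), (0.2, 0.7, True),
    (0.7, 0.2, False), (0.1, 0.9, True), (0.3, 0.8, True),
    (0.6, 0.4, False), (0.4, 0.6, True),
]


def accuracy(params, pairs=PAIRS):
    """Fraction of word pairs whose predicted boundary matches the gold one."""
    w_attr, w_rep, threshold = params
    correct = 0
    for attraction, repulsion, gold in pairs:
        # Assumed decision rule: split when weighted repulsion dominates.
        predicted = w_rep * repulsion - w_attr * attraction > threshold
        correct += predicted == gold
    return correct / len(pairs)


def pso(fitness, dim=3, n_particles=10, n_iters=30, seed=0,
        inertia=0.7, c_personal=1.5, c_global=1.5):
    """Global-best PSO maximising `fitness`; returns (position, score, history)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_score = [fitness(p) for p in pos]
    best_idx = max(range(n_particles), key=pbest_score.__getitem__)
    gbest, gbest_score = pbest[best_idx][:], pbest_score[best_idx]
    history = [gbest_score]  # best score after initialisation and each iteration
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull towards personal and global bests.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (pbest[i][d] - pos[i][d])
                             + c_global * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            score = fitness(pos[i])
            if score > pbest_score[i]:
                pbest[i], pbest_score[i] = pos[i][:], score
                if score > gbest_score:
                    gbest, gbest_score = pos[i][:], score
        history.append(gbest_score)
    return gbest, gbest_score, history


best_params, best_score, history = pso(accuracy)
print("best weights:", best_params, "accuracy:", best_score)
```

The fitness here is plain boundary accuracy on the toy pairs. A real system would presumably compute the forces from corpus statistics and evaluate on a held-out segmented corpus, which this sketch does not attempt.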

Publisher

Walter de Gruyter GmbH

Subject

Artificial Intelligence, Computer Vision and Pattern Recognition, Hardware and Architecture, Modeling and Simulation, Information Systems

Link

https://www.sciendo.com/pdf/10.2478/jaiscr-2019-0005

References: 27 articles.

[1] D. Klein and C. D. Manning, A generative constituent-context model for improved grammar induction, Proceedings of 40th ACL Meeting, Philadelphia, USA, pages 128–135, July 2002.

[2] D. Klein and C. D. Manning, Corpus-based induction of syntactic structure: Models of dependency and constituency, Proceedings of 42nd ACL Meeting, Barcelona, Spain, pages 478–485, July 21–26, 2004.

[3] Y. Seginer, Fast unsupervised incremental parsing, Proceedings of 45th ACL Meeting, Prague, Czech Republic, pages 384–391, June 2007.

[4] E. Ponvert, J. Baldridge, and K. Erk, Simple unsupervised grammar induction from raw text with cascaded finite state models, Proceedings of 49th ACL Meeting, Portland, Oregon, USA, pages 1077–1086, 2011.

[5] D. Yarowsky and G. Ngai, Inducing multilingual PoS taggers and NP bracketers via robust projection across aligned corpora, Proceedings of NAACL-2001 Conference, Pittsburgh, PA, USA, pages 200–207, June 2–7, 2001.

Cited by 6 articles.

1. Population Management Approaches in the OPn Algorithm;Artificial Intelligence and Soft Computing;2021

2. Dynamic Signature Vertical Partitioning Using Selected Population-Based Algorithms;Artificial Intelligence and Soft Computing;2021

3. FastText and XGBoost Content-Based Classification for Employment Web Scraping;Artificial Intelligence and Soft Computing;2020

4. A Markov Process Approach to Redundancy in Genetic Algorithms;Artificial Intelligence and Soft Computing;2020

5. A Population-Based Method with Selection of a Search Operator;Artificial Intelligence and Soft Computing;2020