APPLYING SIMILARITY MEASURES FOR AUTOMATIC LEMMATIZATION: A CASE STUDY FOR MODERN GREEK AND ENGLISH

Published: 2008-10 Issue: 05 Volume: 17 Page: 1043-1064
ISSN: 0218-2130
Container-title: International Journal on Artificial Intelligence Tools
Language: en
Short-container-title: Int. J. Artif. Intell. Tools

Author:

LYRAS DIMITRIOS P.¹, SGARBAS KYRIAKOS N.¹, FAKOTAKIS NIKOLAOS D.¹

Affiliation:

1. Wire Communications Lab, Electrical and Computer Engineering Department, University of Patras, Rion, Patras, GR-26500, Greece

Abstract

This paper addresses the automatic induction of the normalized form (lemma) of regular and mildly irregular words, with no direct supervision, using language-independent algorithms. More specifically, two string distance metrics (the Levenshtein edit distance and the Dice coefficient similarity measure) were employed for the automatic word lemmatization task by combining two alignment models, one based on string similarity and one based on the most frequent inflectional suffixes. The performance of the proposed model has been evaluated quantitatively and qualitatively. Experiments were performed for Modern Greek and English, and the results, which are comparable to the state of the art, show that the proposed model is robust (across languages) and computationally efficient. The proposed model may be useful as a pre-processing tool for various language engineering and text mining applications, such as spell-checkers, electronic dictionaries, and morphological analyzers.
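To make the two measures named in the abstract concrete, the following is a minimal Python sketch of the Levenshtein edit distance and a Dice coefficient over character bigrams, used to pick the closest entry from a list of candidate lemmas. It is an illustration under stated assumptions, not the authors' model: the abstract does not say how the two measures or the suffix-based alignment are combined, so the ranking rule here (edit distance first, Dice as a tie-breaker), the bigram choice, and the function names `levenshtein`, `dice`, and `closest_lemma` are all hypothetical.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion, and substitution."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # match or substitute
        prev = curr
    return prev[-1]


def bigrams(s: str) -> list[str]:
    """Overlapping character bigrams of s (empty for strings shorter than 2)."""
    return [s[i:i + 2] for i in range(len(s) - 1)]


def dice(a: str, b: str) -> float:
    """Dice coefficient 2|X ∩ Y| / (|X| + |Y|) over bigram multisets."""
    x, y = Counter(bigrams(a)), Counter(bigrams(b))
    total = sum(x.values()) + sum(y.values())
    if total == 0:
        # Assumption: strings too short for bigrams compare by equality.
        return 1.0 if a == b else 0.0
    return 2 * sum((x & y).values()) / total


def closest_lemma(word: str, lemmas: list[str]) -> str:
    """Hypothetical ranking: smallest edit distance, then highest Dice score."""
    return min(lemmas, key=lambda l: (levenshtein(word, l), -dice(word, l)))


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(dice("night", "nacht"))
    print(closest_lemma("walking", ["walk", "talk", "wall"]))
```

Both functions operate on raw characters, so for Modern Greek input the treatment of accented letters (for example, whether `ί` and `ι` count as the same symbol) is a preprocessing decision that this sketch leaves open.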

Publisher

World Scientific Pub Co Pte Lt

Subject

Artificial Intelligence

Link

https://www.worldscientific.com/doi/pdf/10.1142/S021821300800428X

References: 24 articles.

1. Grundlagen der Computerlinguistik

2. Measures of the Amount of Ecologic Association Between Species

3. How effective is suffixing?

4. Stemming algorithms: A case study for detailed evaluation

Cited by 7 articles.

1. Lemmatization for Ancient Languages: Rules or Neural Networks?; Communications in Computer and Information Science; 2018

2. Lemmatization for variation-rich languages using deep learning; Digital Scholarship in the Humanities; 2016-08-26

3. A biology-inspired, data mining framework for extracting patterns in sexual cyberbullying data; Knowledge-Based Systems; 2016-03

4. Acronym identification in Greek legal texts; Digital Scholarship in the Humanities; 2014-03-27

5. BAYESIAN RETRIEVAL USING A SIMILARITY-BASED LEMMATIZER; International Journal on Artificial Intelligence Tools; 2012-10