Automatic Language Identification in Texts: A Survey

Published:2019-08-25 Volume:65
ISSN:1076-9757
Container-title:Journal of Artificial Intelligence Research
Short-container-title:jair

Author:

Jauhiainen Tommi, Lui Marco, Zampieri Marcos, Baldwin Timothy, Lindén Krister

Abstract

Language identification (“LI”) is the problem of determining the natural language that a document or part thereof is written in. Automatic LI has been extensively researched for over fifty years. Today, LI is a key part of many text processing pipelines, as text processing techniques generally assume that the language of the input text is known. Research in this area has recently been especially active. This article provides a brief history of LI research, and an extensive survey of the features and methods used in the LI literature. We describe the features and methods using a unified notation, to make the relationships between methods clearer. We discuss evaluation methods, applications of LI, as well as off-the-shelf LI systems that do not require training by the end user. Finally, we identify open issues, survey the work to date on each issue, and propose future directions for research in LI.

Publisher

AI Access Foundation

Subject

Artificial Intelligence

Cited by 37 articles.

1. Features and Methods;Automatic Language Identification in Texts;2024

2. Introduction to Language Identification;Automatic Language Identification in Texts;2024

3. Semantic Similarity of Common Verbal Expressions in Older Adults through a Pre-Trained Model;Big Data and Cognitive Computing;2023-12-29

4. An Evaluation of Conditional Random Fields in Predicting Out-of-Home Activities;2023 IEEE International Smart Cities Conference (ISC2);2023-09-24

5. Automatic language identification: a case study of Pahari languages;Language Resources and Evaluation;2023-05-12