KannadaLex: A lexical database with psycholinguistic information

Published:2024-07-12 Issue:7 Volume:23 Page:1-21
ISSN:2375-4699
Container-title:ACM Transactions on Asian and Low-Resource Language Information Processing
language:en
Short-container-title:ACM Trans. Asian Low-Resour. Lang. Inf. Process.

Author:

Aithal Shreya R.¹, Sn Muralikrishna², Ganiga Raghavendra³, Rao B. Ashwath², Hegde K. Govardhan²

Affiliation:

1. Cloud + AI, Microsoft India R&D Pvt Ltd, Bengaluru, India

2. Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India

3. Department of Information and Communication Technology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India

Abstract

Databases containing lexical properties are of primary importance to psycholinguistic research and speech-language therapy. Several lexical databases for different languages have been developed in the recent past, but Kannada, a language spoken by 50.8 million people, has no comprehensive lexical database yet. To address this, KannadaLex, a Kannada lexical database, is built as a language resource that contains orthographic, phonological, and syllabic information about words sourced from newspaper articles from the past decade. Along with these, vital statistics such as the phonological neighborhood, syllable complexity, summed syllable and bigram syllable frequencies, and lemma and inflectional family information are stored. The database is validated by correlating frequency, a well-established psycholinguistic feature, with other numerical features. The developed lexical database contains 170K words from varied disciplines, complete with psycholinguistic features. KannadaLex is a comprehensive resource for psycholinguists, speech therapists, and linguistic researchers analyzing Kannada and other similar languages. Psycholinguists require lexical data to choose stimuli for experiments that study the factors enabling humans to acquire, use, comprehend, and produce language. Speech and language therapists query these databases to develop the most efficient stimuli for evaluating, diagnosing, and treating communication disorders, and for rehabilitating speech after brain injuries.
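
To make two of the features named in the abstract concrete, the following is a minimal sketch of how a summed syllable frequency might be computed and how word frequency might be correlated with another numerical feature. It is not the authors' implementation. The toy entries (romanized placeholders), the token-weighted syllable counts, the use of log frequency, the choice of Pearson correlation, and all function names are assumptions made for illustration.

```python
from collections import Counter
from math import log10, sqrt

# Toy entries invented for illustration (not data from KannadaLex):
# word -> (list of syllables, corpus frequency)
TOY_LEXICON = {
    "mara": (["ma", "ra"], 120),
    "mane": (["ma", "ne"], 300),
    "nera": (["ne", "ra"], 45),
    "maneya": (["ma", "ne", "ya"], 60),
    "marada": (["ma", "ra", "da"], 20),
}


def syllable_frequencies(lexicon):
    """Count each syllable, weighted by the frequency of the words containing it.

    Assumption: token-weighted counts; a type-based count would add 1 per word.
    """
    counts = Counter()
    for syllables, freq in lexicon.values():
        for syllable in syllables:
            counts[syllable] += freq
    return counts


def summed_syllable_frequency(syllables, syllable_counts):
    """Sum the frequencies of all syllables in one word."""
    return sum(syllable_counts[s] for s in syllables)


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


syllable_counts = syllable_frequencies(TOY_LEXICON)

# Validation-style check: correlate log frequency with syllable count.
log_freqs = [log10(freq) for _, freq in TOY_LEXICON.values()]
syllable_lengths = [len(syllables) for syllables, _ in TOY_LEXICON.values()]
r = pearson(log_freqs, syllable_lengths)

for word, (syllables, freq) in TOY_LEXICON.items():
    print(word, freq, summed_syllable_frequency(syllables, syllable_counts))
print("correlation(log frequency, syllable count) =", round(r, 3))
```

In this toy data the longer words are the rarer ones, so the correlation comes out negative. On a real lexicon the sign and strength would depend on the corpus and on how each feature is defined.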

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3670688
