Affiliation:
1. Vytautas Magnus University, Lithuania
Abstract
Summary
In this paper, we describe a new lexicographic resource for advanced learners of Lithuanian, the Lexical Database of Lithuanian Language Usage. It is the first attempt in Lithuanian lexicography to describe vocabulary on the basis of word usage analysis in a particular corpus. The written subpart of the Lithuanian Pedagogic Corpus (approx. 620,000 tokens) was used to develop headword lists and to collect word usage information in the form of corpus patterns. The database contains 3,700 lexical items: words and multi-word units (compounds, idioms or sayings). For the approx. 700 most frequent words from the shared vocabulary (words that appear in texts assigned to the A1, A2, B1 and B2 levels and occur 100 times or more in the whole corpus), we prepared a full-record entry, which includes sense-related corpus patterns with grammatical, semantic and lexical information, together with examples illustrating all pattern components. A short-record entry (no patterns, only examples) is prepared for the less frequent words from the shared vocabulary, which are derivationally related to the most frequent headwords. Users are provided with 2,542 derivatives, which are linked to 940 headwords. In total, 28,550 encoding examples were manually selected for all 3,000 headwords and 700 phrases. We discuss the features of the database and, in particular, the semi-automated procedure of Corpus Pattern Analysis adopted for describing word usage. We evaluate the approach, discuss its advantages for users, and offer suggestions for future improvements of the resource. The database can be used as an additional resource in the Lithuanian as a foreign language classroom and, together with the available corpora, can fill a gap in usage information in existing (learner) dictionaries.
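The selection criterion for full-record entries can be illustrated with a minimal sketch. It is not the authors' tooling. It assumes the corpus is available as (lemma, level) pairs, and it reads "shared vocabulary" as lemmas attested in texts of all four levels; the function name, the data layout and the toy lemmas are hypothetical.

```python
from collections import Counter, defaultdict

# Proficiency levels named in the abstract.
LEVELS = {"A1", "A2", "B1", "B2"}
# Frequency threshold stated in the abstract (100 occurrences and above).
FULL_RECORD_MIN_FREQ = 100


def select_full_record_headwords(tokens, min_freq=FULL_RECORD_MIN_FREQ):
    """Return lemmas that occur in texts of every level and at least
    `min_freq` times overall.

    `tokens` is an iterable of (lemma, level) pairs; this layout is an
    assumption made for illustration only.
    """
    freq = Counter()
    levels_seen = defaultdict(set)
    for lemma, level in tokens:
        freq[lemma] += 1
        levels_seen[lemma].add(level)
    return sorted(
        lemma
        for lemma, count in freq.items()
        if count >= min_freq and LEVELS <= levels_seen[lemma]
    )


# Toy data with a lowered threshold, purely to show the filter at work.
toy = (
    [("namas", lvl) for lvl in ("A1", "A2", "B1", "B2")]
    + [("eiti", lvl) for lvl in ("A1", "A1", "A2", "A2", "B1")]
    + [("retas", lvl) for lvl in ("B1", "B2")]
)
selected = select_full_record_headwords(toy, min_freq=4)
print(selected)
```

In the toy data only "namas" passes both conditions: "eiti" is frequent enough but never appears in a B2 text, and "retas" is both too rare and missing from two levels.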
Subject
Linguistics and Language, Language and Linguistics