Affiliation:
1. Lomonosov Moscow State University; Institute of Linguistics RAS
2. Lomonosov Moscow State University; Institute of Linguistics RAS; Vinogradov Russian Language Institute RAS
3. Institute of Linguistics RAS
4. Institute of Linguistics RAS; HSE University
5. Institute of Linguistics RAS; Vinogradov Russian Language Institute RAS; University of Cambridge
Abstract
In this article, we discuss the results of the project on digitizing and translating the Historical-Etymological Dictionary of Ossetic by V.I. Abaev, which the authors have been carrying out since 2020. The entries of the original contain several kinds of information: Russian translations of an Ossetic word, related examples, usage notes, and etymological notes. The etymological notes are usually free-form texts that cite a large number of forms from other languages (cognates, loanword sources, etc.). We argue that a digital database should preserve and thoroughly encode as much of this information as possible. The optimal format for this purpose is XML markup, more specifically TEI (Text Encoding Initiative, https://tei-c.org/), the XML variant intended for annotating text documents. To digitize the dictionary, we created a customization of this markup language, a module for Oxygen XML Editor, and a set of CSS files that make our layout as close as possible to the original. The same combination of XML and CSS allows the print version of the dictionary to be generated with the Prince typesetting system (https://www.princexml.com/) without any additional processing. For ease of search and access, we propose supplementing the XML sources with a relational database generated automatically from the lexical entries. In our project, this database serves as the source for the online interface of the dictionary (https://ossetic.iranic.space), powered by the OnLex platform [Makarov et al. 2022]. Hence, the project combines the features of several database types, which matches the multifaceted character of the original dictionary. We hope that our experience will be useful for other digital lexicographic projects.
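The abstract describes a relational database that is created automatically from TEI-encoded lexical entries. The following minimal Python sketch illustrates how such a flattening step might look, using only the standard library (`xml.etree.ElementTree` and `sqlite3`). Everything specific here is an assumption for illustration: the entry structure (`<form>/<orth>`, `<cit type="translation">`, `<mentioned>` inside `<etym>`), the table names, and the toy entry itself, which is simplified and not quoted from the dictionary. The project's actual TEI customization and the schema behind the OnLex interface are not described in the abstract.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Namespaces: TEI elements, plus the predefined XML namespace for xml:id/xml:lang.
TEI = "{http://www.tei-c.org/ns/1.0}"
XML = "{http://www.w3.org/XML/1998/namespace}"

# Toy entry (hypothetical and simplified; teiHeader omitted for brevity).
SAMPLE = """\
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <entry xml:id="don">
      <form type="lemma"><orth>don</orth></form>
      <sense>
        <cit type="translation" xml:lang="ru"><quote>вода</quote></cit>
      </sense>
      <etym>Cf. <lang>Avestan</lang> <mentioned xml:lang="ae">dānu-</mentioned>
        <gloss>river</gloss>.</etym>
    </entry>
  </body></text>
</TEI>
"""

# Hypothetical schema: one row per entry, per translation, per cited form.
SCHEMA = """
CREATE TABLE entry (id TEXT PRIMARY KEY, lemma TEXT NOT NULL);
CREATE TABLE translation (entry_id TEXT REFERENCES entry(id), lang TEXT, text TEXT);
CREATE TABLE cited_form (entry_id TEXT REFERENCES entry(id), lang TEXT, form TEXT);
"""


def build_database(tei_xml: str) -> sqlite3.Connection:
    """Flatten every TEI <entry> into rows of an in-memory SQLite database."""
    root = ET.fromstring(tei_xml)
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    for entry in root.iter(f"{TEI}entry"):
        entry_id = entry.get(f"{XML}id")
        lemma = entry.findtext(f"{TEI}form/{TEI}orth")
        db.execute("INSERT INTO entry VALUES (?, ?)", (entry_id, lemma))
        # Translations are assumed to be <cit type="translation"> elements.
        for cit in entry.iter(f"{TEI}cit"):
            if cit.get("type") == "translation":
                db.execute(
                    "INSERT INTO translation VALUES (?, ?, ?)",
                    (entry_id, cit.get(f"{XML}lang"), cit.findtext(f"{TEI}quote")),
                )
        # Forms cited from other languages are assumed to be <mentioned> inside <etym>.
        for etym in entry.iter(f"{TEI}etym"):
            for mentioned in etym.iter(f"{TEI}mentioned"):
                db.execute(
                    "INSERT INTO cited_form VALUES (?, ?, ?)",
                    (entry_id, mentioned.get(f"{XML}lang"), mentioned.text),
                )
    db.commit()
    return db


if __name__ == "__main__":
    db = build_database(SAMPLE)
    rows = db.execute(
        "SELECT e.lemma, c.lang, c.form FROM entry e JOIN cited_form c ON c.entry_id = e.id"
    ).fetchall()
    print(rows)  # [('don', 'ae', 'dānu-')]
```

In this sketch the tables are derived from the XML rather than edited directly, so they can simply be regenerated whenever the TEI sources change, and the XML remains the single master copy used for both the print layout and the database.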
Funder
Russian Science Foundation
References (10 articles)
1. Abaev V.I. Istoriko-jetimologicheskij slovar’ osetinskogo jazyka [A historical-etymological dictionary of Ossetic]. Vol. I: A–K’. Moscow and Leningrad, Nauka Publ., 1958. 655 p. Vol. II: L–R. Leningrad, Nauka Publ., 1973. 448 p. Vol. III: S–T’. Leningrad, Nauka Publ., 1979. 358 p. Vol. IV: U–Z. Leningrad, Nauka Publ., 1989. 325 p. Ukazatel’ [Index] (compiled by E. N. Schensnovich, A. V. Lushnikova, L. R. Dodyhudoeva). Moscow, Nauka Publ., 1995. 447 p.
2. Kuznecova A. I., Serdobol’skaja N. V., Usachjova M. N., Birjuk O. L., Idrisov R. I. (eds.). Slovar’ besermjanskogo dialekta udmurtskogo jazyka [A dictionary of the Besermen dialect of Udmurt]. Moscow, Tezaurus Publ., 2013. 257 p.
3. Belyaev O., Khomchenkova I., Sinitsyna J., Dyachkov V. Digitizing print dictionaries using TEI: The Abaev Dictionary Project. In: IWCLUL 2021: The Seventh International Workshop on Computational Linguistics of Uralic Languages. Proceedings of the Workshop. Syktyvkar, Association for Computational Linguistics, 2021.
4. Burnard L., Aston G. The BNC handbook: exploring the British National Corpus. Edinburgh, Edinburgh University Press, 1998. 256 p.
5. Codd E.F. A relational model of data for large shared data banks. Communications of the ACM, 1970, vol. 13, no. 6, pp. 377–387.