Affiliation:
1. Foreign Languages Department, Faculty of Maritime Studies, University of Rijeka, Rijeka, Croatia
Abstract
This paper focuses on English words and phrases used in Croatian that, unlike loanwords, have not undergone major orthographic, phonetic, or other adaptations, apart from being influenced by the inflectional system of the recipient language. A list of English words in Croatian corpora was compiled using automatic algorithmic extraction, the corpus query language in Sketch Engine, and manual evaluation of the word list, with the end goal of publishing the first comprehensive online database of English words in Croatian. The ENGRI corpus of Croatian was created by web crawling and used together with the existing Croatian hrWaC 2.2 RFTagger corpus to produce a list of English words and phrases. The paper discusses word list compilation issues in relation to both general problems encountered in the study of interlingual lexical types (such as false cognates, antonomasia, and polysemy) and properties specific to Croatian, such as its inflectional system and diacritical marks. In conclusion, we propose that manual evaluation is an indispensable method and a necessary complement to computational linguistic tools in the creation of word lists and databases of foreign words in other languages.
Funder
Hrvatska zaklada za znanost (Croatian Science Foundation)
Subject
Linguistics and Language, Language and Linguistics