Affiliation:
1. Faculty of Arts, University of Ljubljana; Jozef Stefan Institute, Ljubljana, Slovenia
Abstract
In view of the pervasiveness of formulaic language in human communication and the growing awareness of its relevance to modern lexicography, this study presents a corpus-driven identification, analysis and comparison of dictionary-relevant formulaic sequences in the reference corpora of written and spoken Slovenian. The sequences were identified semi-automatically: the most frequently recurring word combinations in each corpus were ranked by statistical salience and then manually inspected for formulaic expressions of lexicographic relevance. Despite its semantic heterogeneity, the resulting list illustrates the distinctive characteristics of formulaic multi-word expressions, such as high frequency of use, a prevalence of grammatical words and frequent non-propositional meaning. This is especially true of speech, where the analysis revealed numerous understudied formulaic expressions related to interaction management and mitigation. The final evaluation of the measures used in the identification process demonstrates their relative suitability for the corpus-driven identification of dictionary-relevant formulaic expressions, with their precision varying according to corpus size and the length of the sequences under investigation.
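The abstract does not name the salience measures that were evaluated. As a minimal sketch of the general idea of ranking recurring word combinations by statistical salience, the Python below counts contiguous n-grams, discards those below a frequency threshold, and scores the rest with a generalized pointwise mutual information (PMI). PMI, the min_freq threshold and the function names ngrams and rank_sequences are illustrative assumptions, not the study's actual pipeline, and the manual inspection step is not modelled.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Yield contiguous n-token windows as tuples."""
    return zip(*(tokens[i:] for i in range(n)))


def rank_sequences(tokens, n, min_freq=2):
    """Rank recurring n-grams by a generalized PMI score.

    Illustrative only: PMI stands in for whichever salience
    measures the study actually evaluated (an assumption).
    """
    total = len(tokens)
    unigram_counts = Counter(tokens)
    ngram_counts = Counter(ngrams(tokens, n))
    n_windows = total - n + 1

    scored = []
    for seq, freq in ngram_counts.items():
        if freq < min_freq:
            continue  # keep only recurring combinations
        p_seq = freq / n_windows
        # Probability of the sequence if its words were independent.
        p_indep = math.prod(unigram_counts[w] / total for w in seq)
        pmi = math.log2(p_seq / p_indep)
        scored.append((seq, freq, pmi))
    # Highest salience first; ties broken by raw frequency.
    scored.sort(key=lambda item: (item[2], item[1]), reverse=True)
    return scored


if __name__ == "__main__":
    # Hypothetical toy input in a spoken-Slovenian style.
    toy = "ne vem kaj bi rekel ne vem no ja ne vem".split()
    for seq, freq, pmi in rank_sequences(toy, n=2):
        print(" ".join(seq), freq, round(pmi, 2))
```

On this toy input only "ne vem" ("I don't know") recurs, so it is the single candidate passed on for manual inspection. The frequency threshold is applied before scoring because PMI on its own tends to favour rare combinations.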
Funder
Slovenian Research Agency
Language resources and technologies for Slovene language
Publisher
Oxford University Press (OUP)
Cited by
3 articles.