Identifying dictionary-relevant formulaic sequences in written and spoken corpora

Published:2020-04-13 Issue:4 Volume:33 Page:417-442
ISSN:0950-3846
Container-title:International Journal of Lexicography
language:en

Author:

Dobrovoljc Kaja¹

Affiliation:

1. Faculty of Arts, University of Ljubljana; Jozef Stefan Institute, Ljubljana, Slovenia

Abstract

In view of the pervasiveness of formulaic language in human communication and the growing awareness of its relevance to modern lexicography, this study presents a corpus-driven identification, analysis and comparison of dictionary-relevant formulaic sequences in reference corpora of written and spoken Slovenian. The sequences were identified using a semi-automatic approach, whereby the most frequently recurring word combinations in each corpus were ranked according to their statistical salience and manually inspected for formulaic expressions with lexicographic relevance. Despite its semantic heterogeneity, the resulting list illustrates the distinct characteristics of formulaic multi-word expressions, such as high frequency of usage, prevalent inclusion of grammatical words and common non-propositional meaning, especially in speech, where research revealed numerous understudied formulaic expressions related to interaction management and mitigation. The final evaluation of measures used in the identification process demonstrates their relative suitability for corpus-driven identification of dictionary-relevant formulaic expressions, with their precision varying in relation to corpus size and length of sequences under investigation.

Funder

Slovenian Research Agency

Language resources and technologies for Slovene language

Publisher

Oxford University Press (OUP)

Subject

Language and Linguistics

Link

http://academic.oup.com/ijl/article-pdf/33/4/417/35002630/ecaa008.pdf

References: 64 articles.

1. ‘A Lexicographical Perspective on the Classification of Multiword Combinations’;Bergenholtz;International Journal of Lexicography,2013

2. ‘Evaluating the Frequency Threshold for Selecting Lexical Bundles by Means of an Extension of the Fisher’s Exact Test’;Bestgen;Corpora,2018

Cited by 3 articles.

1. Stylistic Labels Bookish and Colloquial in Phraseological Dictionaries;NSU Vestnik. Series: Linguistics and Intercultural Communication;2024-02-08

2. A novel frequency-range analysis (FRA) method for determining critical words among English high-stakes tests;Journal of Intelligent & Fuzzy Systems;2023-12-02

3. The Treatment of Academic Lexical Bundles in Online English Monolingual Learners’ Dictionaries;International Journal of Lexicography;2022-01-11