Principles of corpus querying: A discussion note

Published:2022-12-12 Issue:4 Volume:69 Page:599-614
ISSN:2559-8201
Container-title:Acta Linguistica Academica
Short-container-title:ALing

Author:

Sass Bálint¹

Affiliation:

1. Hungarian Research Centre for Linguistics, Institute for Lexicology, Hungary

Abstract

Nowadays, it is quite common in linguistics to base research on data instead of introspection. There are countless corpora – both raw and linguistically annotated – available to us that provide the essential data needed. In most cases corpora are large, ranging from several million to several billion words, and are clearly too large to be investigated word by word through close reading. Basically, there are two ways to retrieve data from them: (1) through a query interface or (2) directly, by automatic text processing. Here we present principles for collecting linguistic data from corpora soundly and effectively by querying, i.e. without the programming knowledge needed to manipulate the data directly. We discuss what is worth thinking about, which tools to use, what to do by default and how to solve problematic cases. In sum, we show how to obtain correct and complete data from corpora for linguistic research.

Publisher

Akadémiai Kiadó Zrt.

Subject

Literature and Literary Theory, Linguistics and Language, Language and Linguistics, Cultural Studies

Link

https://akjournals.com/downloadpdf/journals/2062/69/4/article-p599.xml

References (23 articles)

1. Representativeness in corpus design; Biber, Douglas; 1993

2. Radically truncated clauses in Hungarian and beyond: Evidence for the fine structure of the minimal VP; Halm, Tamás; 2021

3. A nyitótövekről [On opening stems]; Kálmán, László; 2011; Nyelv és Tudomány; https://www.nyest.hu/hirek/a-nyitotovekrol

4. On the role of the agreement morpheme in Hungarian; Kenesei, István; 1986

5. Googleology is bad science; Kilgarriff, Adam; 2007

Cited by (2 articles)

1. Hogy kötőszós inszubordinált mellékmondatok korpuszalapú elemzése [A corpus-based analysis of insubordinate clauses introduced by the conjunction hogy]; Jelentés és Nyelvhasználat; 2024

2. When MIPVU goes to no man’s land: a new language resource for hybrid, morpheme-based metaphor identification in Hungarian; Language Resources and Evaluation; 2023-12-09