
DirKorp

Published:2023-09-12 Issue:1 Volume:11 Page:189-217
ISSN:2335-2736
Container-title:Slovenščina 2.0: empirical applied and interdisciplinary research
Short-container-title:SLO2.0

Author:

Bago Petra,Karlić Virna

Abstract

In this paper, we present recent developments in a new version (v3.0) of DirKorp (Korpus direktivnih govornih činova hrvatskoga jezika), the first Croatian corpus of directive speech acts developed for the purposes of pragmatic research. The corpus contains 800 elicited speech acts collected via an online questionnaire with role-playing tasks, a method of simulated communication carried out under pre-set conditions. This method is suitable for researching speech acts because it yields a large number of examples of such acts with equal propositional content and illocutionary purpose, used in the same controlled situations. The presented situations are classified into two categories according to the relationship between the participants in the communication act: (1) situations involving interlocutors who are not in a familiar relationship; (2) situations involving interlocutors in a familiar relationship. The tasks from the two categories are organized into four pairs, with the two tasks in each pair asking respondents to produce a speech act of similar propositional content. The respondents were 100 Croatian speakers, all undergraduate (63%) or graduate (37%) students of the Faculty of Humanities and Social Sciences (University of Zagreb). The corpus has been manually annotated at the speech act level, with each speech act carrying up to 14 features: (1) respondent ID, (2) familiarity/unfamiliarity, (3) utterance type, (4) directive performative verb in 1st person, (5) illocutionary force, (6) propositional content, (7) T/V form, (8) exhortative, (9) lexical marker of request, (10) lexical marker of apology, (11) lexical marker of gratitude, (12) honorific title, (13) grammatical mood, and (14) modal verb in 2nd person. It contains 12,676 tokens and 1,692 types. The corpus is encoded according to the TEI P5: Guidelines for Electronic Text Encoding and Interchange, developed and maintained by the Text Encoding Initiative Consortium (TEI).
DirKorp is available for download under the CC BY-SA 4.0 license from GitHub in TEI format. We describe the pragmatic annotation applied as well as the structure of the corpus.
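
As a rough illustration of the annotation scheme summarized in the abstract, the 14 speech-act features can be pictured as one record per speech act. The sketch below is only an assumption-laden reading aid: the class name, field names, and the use of optional strings are hypothetical and are not taken from DirKorp's actual TEI encoding. It also checks two figures from the abstract: that 100 respondents answering both tasks in each of four pairs would give 800 speech acts (assuming every respondent completed every task), and the type/token ratio implied by 1,692 types and 12,676 tokens.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class SpeechActAnnotation:
    """One annotated speech act.

    Field names are hypothetical labels for the 14 features listed in the
    abstract; they are NOT the element or attribute names of the TEI files.
    """
    respondent_id: str                            # (1) respondent ID
    familiar: bool                                # (2) familiarity/unfamiliarity
    utterance_type: Optional[str] = None          # (3) utterance type
    performative_verb_1p: Optional[str] = None    # (4) directive performative verb, 1st person
    illocutionary_force: Optional[str] = None     # (5) illocutionary force
    propositional_content: Optional[str] = None   # (6) propositional content
    tv_form: Optional[str] = None                 # (7) T/V form
    exhortative: Optional[str] = None             # (8) exhortative
    request_marker: Optional[str] = None          # (9) lexical marker of request
    apology_marker: Optional[str] = None          # (10) lexical marker of apology
    gratitude_marker: Optional[str] = None        # (11) lexical marker of gratitude
    honorific_title: Optional[str] = None         # (12) honorific title
    grammatical_mood: Optional[str] = None        # (13) grammatical mood
    modal_verb_2p: Optional[str] = None           # (14) modal verb, 2nd person


def type_token_ratio(types: int, tokens: int) -> float:
    """Return the ratio of distinct word forms (types) to running words (tokens)."""
    return types / tokens


# Figures quoted in the abstract.
RESPONDENTS = 100
TASK_PAIRS = 4

# Assumption: every respondent answered both tasks of every pair.
expected_speech_acts = RESPONDENTS * TASK_PAIRS * 2

ttr = type_token_ratio(types=1692, tokens=12676)

# A record with only the two always-present features filled in;
# "R001" is a made-up identifier used purely for illustration.
example = SpeechActAnnotation(respondent_id="R001", familiar=True)

if __name__ == "__main__":
    print(len(fields(SpeechActAnnotation)), expected_speech_acts, round(ttr, 3))
```

Most fields default to `None` because the abstract says each speech act carries "up to" 14 features, so some features may be absent for a given act.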

Publisher

University of Ljubljana

Subject

Linguistics and Language,Language and Linguistics
