Affiliation:
1. Pontificia Universidad Católica de Valparaíso
Abstract
We propose a method for the automatic induction of categories of Spanish discourse markers using parallel corpora, based on a quantitative and empirical approach that minimises explicit linguistic knowledge. We conducted the analysis using a large Spanish-English parallel corpus. First, we used this corpus to obtain a list of parenthetical discourse markers in each language. Then, we used it as a “semantic mirror”, inspecting the English equivalents of each Spanish discourse marker to assess which Spanish markers fulfil a similar function in discourse, and vice versa. The result of this procedure is an emerging categorisation of discourse markers. The main contribution is to offer empirical evidence for the adequacy of existing manually compiled taxonomies and to show the potential for discovering new, previously unaccounted-for categories. In this article we focus on units pertaining to the Spanish language but, since the method is purely quantitative, it can be applied to other languages as well.
Publisher
John Benjamins Publishing Company
Subject
Linguistics and Language, Language and Linguistics