Affiliation:
1. Department of Literature, Art and Media Studies, Universität Konstanz, Konstanz, Germany
Abstract
The manuscript tradition of pre-modern texts poses a specific problem for scholars in the field of Digital Humanities: before printing made the production of standardized editions feasible, copying texts by hand (often by different people) was an inherently error-prone process, which led to differences not only in wording but also in spelling across the transmitted variants. This applies especially to ancient texts, where the temporal distance to the archetype tends to be fairly large. In computerized research, especially text matching within citation research and text mining, these differences in wording and spelling, however small, may prevent a successful matching of texts. This case study presents a solution for the problem of textual differences arising from (non-)assimilated prefixes in Latin, a feature in which modern editions mostly differ from author to author, but sometimes even between two editions of the same text. Using the letters of the church father Jerome as well as Virgil’s Eclogues, Georgics, and Aeneid, two approaches are compared in terms of error rate and efficiency for a given set of prefixes: (1) performing and (2) reversing corpus-wide assimilation. Moreover, the broader implications of the (in-)accessibility of text-critical data in digital editions are discussed. Finally, general desiderata regarding text-critical data for computerized research on classical texts are elaborated.
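To make the two directions concrete, the following is a minimal sketch of rule-based prefix normalization, not the procedure used in the study. The rule table, the function names (`assimilate`, `deassimilate`, `_rewrite_prefixes`), and the choice of a purely word-initial regular-expression match are all assumptions made for illustration; the abstract does not specify the prefix set or the implementation.

```python
import re

# Hypothetical rule table (illustration only): unassimilated -> assimilated
# word-initial spellings. The study's actual prefix set is not given here.
ASSIMILATED = {
    "adf": "aff",    # adfero     -> affero
    "adp": "app",    # adpono     -> appono
    "adt": "att",    # adtingo    -> attingo
    "inm": "imm",    # inmitto    -> immitto
    "inl": "ill",    # inlustris  -> illustris
    "inr": "irr",    # inrumpo    -> irrumpo
    "conl": "coll",  # conloquium -> colloquium
}

# The reverse direction simply inverts the table.
UNASSIMILATED = {new: old for old, new in ASSIMILATED.items()}


def _rewrite_prefixes(text, rules):
    """Replace word-initial spellings in `text` according to `rules`."""
    # Longest alternatives first so that longer keys win over shorter ones.
    alternatives = "|".join(sorted(rules, key=len, reverse=True))
    pattern = re.compile(r"\b(" + alternatives + r")(?=\w)", re.IGNORECASE)

    def repl(match):
        found = match.group(1)
        new = rules[found.lower()]
        # Preserve an initial capital; all-caps words are not handled.
        return new.capitalize() if found[0].isupper() else new

    return pattern.sub(repl, text)


def assimilate(text):
    """Approach (1): perform assimilation across the text."""
    return _rewrite_prefixes(text, ASSIMILATED)


def deassimilate(text):
    """Approach (2): reverse assimilation across the text."""
    return _rewrite_prefixes(text, UNASSIMILATED)
```

With these assumed rules, `assimilate("adfero inmitto conloquium")` yields `"affero immitto colloquium"`. The naive reverse direction also rewrites words that never contained a prefix: `deassimilate("ille")` yields `"inle"`. This shows one way in which a purely pattern-based reversal can overcorrect unless it is backed by a lexicon or an exception list.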
Funder
German Research Foundation
Publisher
Oxford University Press (OUP)
Subject
Computer Science Applications, Linguistics and Language, Language and Linguistics, Information Systems