Affiliation:
1. Martin Luther University Halle-Wittenberg, Computer Science Institute, Germany
Abstract
In the humanities, text comparison is essential for scholarly editing, where the tradition of a text is studied through its different witnesses. Established approaches and tools for the computer-aided creation of a critical apparatus have existed for many years. However, they mainly focus on sentences or parts of sentences. An efficient and easy-to-use automatic comparison of entire chapters or books remains a research desideratum. The Locate, Explore, Retrace and Apprehend complex text variants (LERA) working environment presented here addresses this gap. It is based on a two-stage collation approach: first, an efficient, fully automatic alignment of text segments (paragraphs, subparagraphs, or sentences) with interactive post-processing options; second, a detailed comparison at segment level. Because aligning text segments such as paragraphs across more than two text witnesses is algorithmically challenging, we discuss the heuristics we developed in more detail. LERA combines the entire process of document management, tokenization/segmentation, normalization, alignment, and visualization with interactive control options and exploratory tools. It has been, and continues to be, applied successfully in several Digital Humanities projects covering different languages, including Arabic, French, Hebrew, German, and English texts.
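The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. It is not LERA's implementation and does not reproduce the heuristics the paper develops for more than two witnesses. It assumes only two witnesses that are already split into segments, uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure, and aligns segments with a simple global dynamic-programming scheme. The function names, the gap penalty, and the sample witnesses are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Token-level similarity in [0, 1] between two segments (stand-in measure)."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def align_segments(left, right, gap=0.3):
    """Stage 1: globally align two lists of segments.

    Returns (i, j) index pairs; None marks a segment without a counterpart.
    The gap penalty of 0.3 is an arbitrary illustrative choice.
    """
    n, m = len(left), len(right)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0], back[i][0] = -gap * i, "up"
    for j in range(1, m + 1):
        score[0][j], back[0][j] = -gap * j, "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Map similarity to [-1, 1] so dissimilar segments are left unpaired.
            match = 2 * similarity(left[i - 1], right[j - 1]) - 1
            candidates = [
                (score[i - 1][j - 1] + match, "diag"),
                (score[i - 1][j] - gap, "up"),
                (score[i][j - 1] - gap, "left"),
            ]
            score[i][j], back[i][j] = max(candidates, key=lambda c: c[0])
    # Follow the stored moves back from the bottom-right corner.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "diag":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move == "up":
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    pairs.reverse()
    return pairs


def compare_tokens(a: str, b: str):
    """Stage 2: list the token-level differences within one aligned pair."""
    ta, tb = a.split(), b.split()
    ops = SequenceMatcher(None, ta, tb).get_opcodes()
    return [(tag, ta[i1:i2], tb[j1:j2]) for tag, i1, i2, j1, j2 in ops if tag != "equal"]


# Hypothetical sample witnesses, one string per segment.
witness_a = [
    "In the beginning was the word.",
    "The scribe copied the text.",
    "Here ends the book.",
]
witness_b = [
    "In the beginning was the word.",
    "A later hand added this note.",
    "The scribe copied the old text.",
    "Here ends the book.",
]

for i, j in align_segments(witness_a, witness_b):
    if i is not None and j is not None:
        print(i, j, compare_tokens(witness_a[i], witness_b[j]))
    else:
        print(i, j, "no counterpart")
```

In this toy example, the segment present only in the second witness is left unpaired, and the detailed comparison then reports the single inserted token in the otherwise matching segment. Separating the stages keeps the expensive token-level comparison confined to segments that have already been paired.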
Funder
German Federal Ministry of Education and Research
German Research Foundation
Publisher
Oxford University Press (OUP)
Subject
Computer Science Applications, Linguistics and Language, Language and Linguistics, Information Systems