Abstract
We present an aligned parallel corpus of Spanish translations of the Bible. The collection comprises eleven Bibles from different centuries (XVI, XIX, XX), religious denominations (Protestant, Catholic), and geographical regions (Spain, Latin America). The verses have been aligned across all translations, so that corresponding passages can be compared directly. The corpus is therefore a convenient resource for linguistic analyses such as paraphrase detection, semantic clustering, and the exploration of biases present in the texts. We provide several examples that illustrate how the corpus can be used in these applications.
Funder
Conahcyt
PASPA-DGAPA-UNAM
Consejo Nacional de Ciencia y Tecnología
Publisher
Springer Science and Business Media LLC