SrpELTeC: A Serbian Literary Corpus for Distant Reading
-
Published:2024-06-21
Issue:2
Volume:47
Page:
-
ISSN:2591-1805
-
Container-title:Primerjalna književnost
-
language:
-
Short-container-title:PKn
Author:
Stanković Ranka, Krstev Cvetana, Vitas Duško
Abstract
The article presents SrpELTeC, a corpus developed within the COST Action Distant Reading for European Literary History (CA16204). All novels in SrpELTeC were selected, prepared, and annotated according to the common principles established for all language collections in the European Literary Text Collection (ELTeC). The article outlines the challenges of preparing SrpELTeC from scratch and the solutions adopted. All novels were manually encoded in TEI with rich metadata and structural annotation. The automatic annotation included part-of-speech (POS) tagging, lemmatization, and named entity recognition, relying on Natural Language Processing resources developed and maintained by the JeRTeh Language Resources and Technologies Society. The integration of SrpELTeC with Wikidata is supported by a set of SPARQL queries that retrieve metadata and offer different visualization options. Recent activities within the COST Action NexusLinguarum – European Network for Web-centred Linguistic Data Science (CA18209) concern the linked data version of SrpELTeC, encoded in the NLP Interchange Format (NIF). All versions of SrpELTeC are freely available under the CC BY license.
Publisher
The Research Center of the Slovenian Academy of Sciences and Arts (ZRC SAZU)