What Is in a <unittitle>? Cross-lingual Topic Detection & Information Retrieval in Archives Portal Europe

Author:

Musso Marta (1), Arnold Kerstin (2), Nanni Federico (3), Cannelli Beatrice (4)

Affiliation:

1. King's College London, London, United Kingdom of Great Britain and Northern Ireland

2. Archives Portal Europe Foundation, The Hague, Netherlands

3. Turing Institute, London, United Kingdom of Great Britain and Northern Ireland

4. School of Advanced Studies, London, United Kingdom of Great Britain and Northern Ireland

Abstract

Archives Portal Europe (APE, www.archivesportaleurope.net) is the portal of European archives: an aggregator that brings together, in a single access point, the catalogues and digitised archival material of all archives in and about Europe. It currently hosts material from more than 30 countries and from a variety of archival institutions (such as State archives, city archives, university and parish archives, private institutions, and more). It is maintained by the Archives Portal Europe Foundation, an international consortium of State archives and other archival institutions that aim to connect the archival material of individual institutions into one digital repository, allowing universal access to the archival heritage of Europe and promoting new forms of archival research beyond national or local boundaries. One of the research tools that Archives Portal Europe makes available is search by topic; however, topics are currently maintained manually by archivists, and the vast amount of archival material ingested in the portal makes it impossible to build, in this way, a comprehensive body of topics that describes the whole of the APE repository. Archives are traditionally organised not by subject content but around the entity (person, organisation, body) that created and/or collected the documents in the course of its activities. While this is an undisputed pillar of archival management, online digital repositories call for new research tools, particularly when different archival traditions from different countries and different types of institutions are merged into a single research portal. Topic detection becomes a fundamental tool to guide archival research and to make archives accessible to potentially worldwide users in a situation where national and linguistic barriers blur or are redefined.
This article presents the preliminary results of, and the plan for future iterations of, an AI tool for automated topic detection in a multilingual environment, where human-created taxonomies act as the basis on which the algorithms aggregate relevant material around a specific topic. The development is based on supervised machine learning, combining human input in different languages with the use of Wikipedia pages to model the relevant vocabulary and entities.
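
As a rough illustration of the general idea only, and not the authors' implementation, the sketch below assigns a topic to a unit title by comparing its words with a multilingual vocabulary built from labelled seed texts. Everything in it is an assumption made for the example: the topic names, the seed texts (standing in for human-supplied terms and Wikipedia page text), the helper names, and the choice of a simple nearest-centroid comparison with cosine similarity over word counts.

```python
import math
import re
from collections import Counter

# Hypothetical seed texts per topic (invented for illustration). In a real
# setting these would come from human-created term lists and Wikipedia pages
# in several languages.
SEED_TEXTS = {
    "railways": [
        "railway train station locomotive track",
        "ferrovia treno stazione locomotiva binario",
        "eisenbahn zug bahnhof lokomotive gleis",
    ],
    "taxation": [
        "tax revenue customs duty levy",
        "tassa imposta dogana tributo",
        "steuer zoll abgabe finanzamt",
    ],
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_centroids(seed_texts):
    """Sum the word counts of all seed texts for each topic."""
    centroids = {}
    for topic, texts in seed_texts.items():
        counts = Counter()
        for text in texts:
            counts.update(tokenize(text))
        centroids[topic] = counts
    return centroids


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def detect_topic(unittitle, centroids, threshold=0.1):
    """Return the best-matching topic, or None if no topic scores
    at or above the (arbitrarily chosen) threshold."""
    vector = Counter(tokenize(unittitle))
    scores = {topic: cosine(vector, c) for topic, c in centroids.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


centroids = build_centroids(SEED_TEXTS)
print(detect_topic("Stazione e locomotiva, 1890", centroids))
print(detect_topic("Steuer und Zoll Akten", centroids))
print(detect_topic("Family letters", centroids))
```

Because the vocabulary for each topic mixes terms from several languages, an Italian or German unit title can match the same topic as an English one. A production system would need far richer vocabularies, proper weighting, and a trained classifier; the threshold here is only a placeholder.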

Publisher

Association for Computing Machinery (ACM)

