News Across Languages - Cross-Lingual Document Similarity and Event Tracking

Authors:

Rupnik Jan, Muhic Andrej, Leban Gregor, Skraba Primoz, Fortuna Blaz, Grobelnik Marko

Abstract

In today's world, we follow news that is distributed globally. Significant events are reported by different sources and in different languages. In this work, we address the problem of tracking events in a large multilingual stream. Within Event Registry, a recently developed system, we examine two aspects of this problem: how to compare articles in different languages, and how to link collections of articles in different languages that refer to the same event. Taking a multilingual stream and clusters of articles from each language, we compare different cross-lingual document similarity measures based on Wikipedia. This allows us to compute the similarity of any two articles regardless of language. Building on previous work, we show that there are methods that scale well and can compute a meaningful similarity between articles from languages with little or no direct overlap in the training data. Using this capability, we then propose an approach to link clusters of articles across languages that represent the same event. We provide an extensive evaluation of the system as a whole, as well as an evaluation of the quality and robustness of the similarity measure and the linking algorithm.
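The abstract does not spell out the similarity measure or the linking algorithm. The following is only a minimal sketch of the general idea, under loudly stated assumptions: it assumes each language has a hypothetical word-to-vector mapping into a shared concept space (the kind of mapping that might be learned from aligned Wikipedia articles), scores two articles by the cosine of their projected vectors, and links two clusters when the cosine of their centroids reaches a hypothetical threshold. It is not the authors' method; the names `project`, `cosine`, `centroid`, `link_clusters`, the toy vocabularies, and the 0.8 threshold are all invented for illustration.

```python
import math
from typing import Dict, List

# Hypothetical per-language mapping: word -> vector in a shared k-dim space.
# Such a mapping might be learned from aligned Wikipedia articles; the toy
# values below are invented for illustration only.
Projection = Dict[str, List[float]]


def project(tokens: List[str], proj: Projection, k: int) -> List[float]:
    """Sum the shared-space vectors of known words; unknown words are ignored."""
    vec = [0.0] * k
    for tok in tokens:
        for i, v in enumerate(proj.get(tok, [])):
            vec[i] += v
    return vec


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors: List[List[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    k = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(k)]


def link_clusters(c1: List[List[float]], c2: List[List[float]],
                  threshold: float = 0.8) -> bool:
    """Hypothetical rule: link two clusters if their centroids are similar enough."""
    return cosine(centroid(c1), centroid(c2)) >= threshold


# Toy example: English and German vocabularies mapped into a 2-d space
# (dimension 0 ~ "politics", dimension 1 ~ "sport"; purely illustrative).
en_proj = {"election": [1.0, 0.0], "vote": [0.9, 0.1], "football": [0.0, 1.0]}
de_proj = {"wahl": [1.0, 0.0], "stimme": [0.8, 0.2], "fussball": [0.0, 1.0]}

en_politics = project(["election", "vote"], en_proj, 2)
de_politics = project(["wahl", "stimme"], de_proj, 2)
de_sport = project(["fussball"], de_proj, 2)

print(round(cosine(en_politics, de_politics), 3))
print(round(cosine(en_politics, de_sport), 3))
print(link_clusters([en_politics], [de_politics]))
print(link_clusters([en_politics], [de_sport]))
```

Because both articles are mapped into the same space, the comparison never needs English and German to share any vocabulary, which mirrors the abstract's goal of comparing articles regardless of language. A real system would learn the mapping from data and would likely use richer evidence than a single centroid threshold when deciding whether two clusters describe the same event.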

Publisher

AI Access Foundation

Subject

Artificial Intelligence

Cited by 21 articles.

1. Unsupervised Story Discovery from Continuous News Streams via Scalable Thematic Embedding;Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval;2023-07-18

2. NCC: Neural concept compression for multilingual document recommendation;Applied Soft Computing;2023-07

3. Finding Patient Zero and Tracking Narrative Changes in the Context of Online Disinformation Using Semantic Similarity Analysis;Mathematics;2023-04-26

4. Tracking News Stories in Short Messages in the Era of Infodemic;Lecture Notes in Computer Science;2022

5. A Multi-Platform Analysis of Political News Discussion and Sharing on Web Communities;2021 IEEE International Conference on Big Data (Big Data);2021-12-15