A Survey of Cross-Lingual Text Classification and Its Applications on Fake News Detection

Author:

Lan Liang (1, 2), Huang Tao (3), Li Yupeng (2), Song Yunya (2)

Affiliation:

1. Institute for Research and Continuing Education, Hong Kong Baptist University, Shenzhen, P. R. China

2. Department of Interactive Media, Hong Kong Baptist University, 224 Waterloo Rd, Kowloon Tong, Hong Kong

3. Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, P. R. China

Abstract

Cross-lingual text classification is a challenging task in natural language processing. Its objective is to build accurate text classification models for low-resource languages by transferring the knowledge learned from high-resource languages. The task has been studied since 2003 and has attracted rapidly growing attention over the last decade, owing to the success of deep learning models in natural language processing. Many new methods have been proposed to address the challenges of cross-lingual text classification. Meanwhile, cross-lingual fake news detection is one of its most important applications, and it has already had a significant social impact in alleviating the infodemic problem in low-resource languages. Research on cross-lingual text classification and cross-lingual fake news detection has grown rapidly in recent years, so a comprehensive survey is needed to summarize existing algorithms for cross-lingual text classification and explain the connections among them. This paper systematically reviews research on cross-lingual text classification and its applications in cross-lingual fake news detection. We categorize the evolution of cross-lingual text classification methods into four phases: (1) traditional text classification models with translation; (2) cross-lingual word embedding-based methods; (3) pretraining-then-finetuning-based methods; and (4) pretraining-then-prompting-based methods. We first discuss and analyze the representative methods in each phase in detail. Second, we provide a detailed review of their applications to the emerging fake news detection problem. Finally, we explore the potential issues of this open problem and discuss possible future directions.
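As a minimal illustration of the transfer idea behind the second phase (cross-lingual word embedding-based methods), the following sketch trains a nearest-centroid classifier on labeled English text only and applies it unchanged to Spanish text. It assumes an already aligned shared embedding space in which translation pairs lie close together; the toy 2-D vectors, the word lists, and the helper names `embed`, `train_centroids`, and `predict` are hypothetical, chosen for illustration, and are not taken from any of the surveyed methods.

```python
from math import dist

# Hypothetical toy vectors in an (assumed) aligned cross-lingual space:
# translation pairs such as "good"/"bueno" sit close together.
SHARED_EMBEDDINGS = {
    # English: high-resource language, the only one with labels
    "good": (0.9, 0.1), "great": (0.8, 0.2),
    "bad": (0.1, 0.9), "awful": (0.2, 0.8),
    # Spanish: low-resource target language, no labels used
    "bueno": (0.85, 0.15), "genial": (0.8, 0.25),
    "malo": (0.15, 0.85), "horrible": (0.2, 0.85),
}


def embed(text):
    """Represent a document as the mean of its known word vectors."""
    vecs = [SHARED_EMBEDDINGS[w] for w in text.lower().split()
            if w in SHARED_EMBEDDINGS]
    if not vecs:
        raise ValueError("no known words in text")
    return tuple(sum(col) / len(vecs) for col in zip(*vecs))


def train_centroids(labeled_docs):
    """Nearest-centroid training: one mean document vector per label."""
    by_label = {}
    for text, label in labeled_docs:
        by_label.setdefault(label, []).append(embed(text))
    return {label: tuple(sum(col) / len(vs) for col in zip(*vs))
            for label, vs in by_label.items()}


def predict(centroids, text):
    """Assign the label whose centroid is closest in the shared space."""
    v = embed(text)
    return min(centroids, key=lambda label: dist(v, centroids[label]))


# Train on English only, then classify Spanish with no Spanish labels.
english_train = [("good great", "pos"), ("great", "pos"),
                 ("bad awful", "neg"), ("awful", "neg")]
centroids = train_centroids(english_train)
print(predict(centroids, "genial bueno"))   # pos
print(predict(centroids, "malo horrible"))  # neg
```

The design choice here is deliberate simplicity: because both languages share one vector space, the classifier never needs to know which language a document is written in. Real systems replace the toy dictionary with learned cross-lingual embeddings and the centroid rule with a trained model.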

Funder

Natural Science Foundation Council of China

Guangdong Basic and Applied Basic Research Foundation

Germany/Hong Kong Joint Research Scheme

Hong Kong RGC Early Career Scheme

Interdisciplinary Research Clusters Matching Scheme, Hong Kong Baptist University

National Key R&D Program of China

Strategic Priority Research Program of Chinese Academy of Sciences

Self-supporting Program of Guangzhou Laboratory

Publisher

World Scientific Pub Co Pte Ltd

