High-Quality Data from Crowdsourcing towards the Creation of a Mexican Anti-Immigrant Speech Corpus

Author:

Molina-Villegas Alejandro (1, 2), Cattin Thomas (2, 3), Gazca-Hernandez Karina (4), Aldana-Bobadilla Edwin (1, 4)

Affiliation:

1. CONAHCYT, Mexico City 03940, Mexico

2. Centro de Investigación en Ciencias de Información Geoespacial, Mexico City 14240, Mexico

3. IFG Lab Centre de recherches et d’analyses géopolitiques, Université Paris 8, 93526 Saint-Denis, France

4. Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional—Unidad Tamaulipas, Ciudad Victoria, Tamaulipas 87130, Mexico

Abstract

Currently, a significant portion of published research on online hate speech relies on existing textual corpora. However, when examining a specific context, there is a lack of preexisting datasets that include the particularities associated with various conditions (e.g., geographic and cultural). This issue is evident in the case of online anti-immigrant speech in Mexico, where available data to study this emergent and often overlooked phenomenon are scarce. In light of this situation, we propose a novel methodology wherein three domain experts annotate a certain number of texts related to the subject. We establish a precise control mechanism based on these annotations to evaluate non-expert annotators. The evaluation of the contributors is implemented in a custom annotation platform, enabling us to conduct a controlled crowdsourcing campaign and assess the reliability of the obtained data. Our results demonstrate that a combination of crowdsourced and expert data leads to iterative improvements, not only in the accuracy achieved by various machine learning classification models (reaching 0.8828) but also in the model’s adaptation to the specific characteristics of hate speech in the Mexican Twittersphere context. In addition to these methodological innovations, the most significant contribution of our work is the creation of the first online Mexican anti-immigrant training corpus for machine-learning-based detection tasks.

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

