A Transformer Model for Manifesto Classification Using Cross-Context Training: An Ecuadorian Case Study

Authors:

Barzallo Fernanda 1; Baldeon-Calisto Maria 1,2 (ORCID); Pérez Margorie 1 (ORCID); Moscoso Maria Emilia 1; Navarrete Danny 1; Riofrío Daniel 2; Medina-Peréz Pablo 3; Lai-Yuen Susana K. 4; Benítez Diego 2; Peréz Noel 2 (ORCID); Moyano Ricardo Flores 2; Fierro Mateo 3

Affiliations:

1. Departamento de Ingeniería Industrial and Instituto de Innovación en Productividad y Logística CATENA-USFQ, Universidad San Francisco de Quito USFQ, Quito, Ecuador

2. Colegio de Ciencias e Ingenierías “El Politécnico”, Universidad San Francisco de Quito USFQ, Quito, Ecuador

3. Colegio de Ciencias Sociales y Humanidades COCISOH, Universidad San Francisco de Quito USFQ, Quito, Ecuador

4. Department of Industrial and Management Systems Engineering, University of South Florida, Tampa, FL, USA

Abstract

Content analysis of political manifestos is necessary to understand a party's policies and proposed actions. However, manually labeling political texts is time-consuming and labor-intensive. Transformer networks have become essential tools for automating this task. Nevertheless, these models require extensive datasets to achieve good performance, which can be a limitation in manifesto classification, where publicly available labeled datasets are scarce. To address this challenge, we developed a Transformer network for the classification of manifestos using a cross-context training strategy. Using the database of the Comparative Manifesto Project, we implemented a fractional factorial experimental design to determine which Spanish-language manifestos form the best training set for labeling Ecuadorian manifestos. Furthermore, we statistically analyzed which Transformer architecture and preprocessing operations improve the model's accuracy. The results indicate that a training set composed of manifestos from Spain and Uruguay, combined with stemming and lemmatization preprocessing, produces the highest classification accuracy. In addition, we found that the DistilBERT and RoBERTa Transformer networks perform statistically similarly and consistently well in manifesto classification. Using the cross-context training strategy, DistilBERT and RoBERTa achieve 60.05% and 57.64% accuracy, respectively, on the classification of Ecuadorian manifestos. Finally, we investigated the effect of the training-set composition on performance. The experiments demonstrate that training DistilBERT solely with Ecuadorian manifestos achieves the highest accuracy and F1-score, and that, in the absence of the Ecuadorian dataset, competitive performance is achieved by training the model with datasets from Spain and Uruguay.
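Illustrative sketch: the abstract describes fine-tuning Transformer classifiers on manifestos from other Spanish-speaking countries (Spain and Uruguay) and evaluating them on Ecuadorian manifestos. The Python code below shows one way such a cross-context pipeline could be assembled with the Hugging Face transformers library. The checkpoint name, the CSV file names, the "text"/"cmp_code" columns, and the NLTK Snowball stemming step are assumptions made for illustration, not the authors' exact implementation.

# Hedged sketch of cross-context training: fine-tune a multilingual DistilBERT
# on CMP-coded quasi-sentences from Spain and Uruguay, then evaluate on Ecuador.
# File names, column names, checkpoint, and the stemming step are illustrative
# assumptions, not the paper's exact pipeline.
import numpy as np
import pandas as pd
import torch
from nltk.stem import SnowballStemmer
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "distilbert-base-multilingual-cased"  # assumed checkpoint
stemmer = SnowballStemmer("spanish")

def preprocess(text: str) -> str:
    # Stand-in for the stemming/lemmatization step reported in the abstract.
    return " ".join(stemmer.stem(tok) for tok in text.lower().split())

class ManifestoDataset(torch.utils.data.Dataset):
    def __init__(self, texts, labels, tokenizer):
        self.enc = tokenizer([preprocess(t) for t in texts],
                             truncation=True, padding=True, max_length=128)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

# Hypothetical CSV exports of CMP quasi-sentences with a "text" column
# and a numeric CMP category code in "cmp_code".
train_df = pd.concat([pd.read_csv("spain_manifestos.csv"),
                      pd.read_csv("uruguay_manifestos.csv")])
test_df = pd.read_csv("ecuador_manifestos.csv")

# Map CMP codes to contiguous class indices; keep only test codes seen in training.
label2id = {c: i for i, c in enumerate(sorted(train_df["cmp_code"].unique()))}
test_df = test_df[test_df["cmp_code"].isin(set(label2id))]
train_labels = train_df["cmp_code"].map(label2id).tolist()
test_labels = test_df["cmp_code"].map(label2id).tolist()

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=len(label2id))

train_ds = ManifestoDataset(train_df["text"].tolist(), train_labels, tokenizer)
test_ds = ManifestoDataset(test_df["text"].tolist(), test_labels, tokenizer)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": float((preds == labels).mean())}

args = TrainingArguments(output_dir="cmp-distilbert", num_train_epochs=3,
                         per_device_train_batch_size=16, learning_rate=2e-5)
trainer = Trainer(model=model, args=args, train_dataset=train_ds,
                  compute_metrics=compute_metrics)
trainer.train()
print(trainer.evaluate(eval_dataset=test_ds))  # cross-context accuracy on Ecuador

Swapping MODEL_NAME for a Spanish RoBERTa checkpoint, or changing which country CSVs are concatenated into the training set, reproduces the kind of factor comparison (architecture x training-set composition x preprocessing) that the abstract reports.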

Funder

Universidad San Francisco de Quito, Colegio de Ciencias e Ingenierías

Publisher

SAGE Publications
