Prediction of Public Trust in Politicians Using a Multimodal Fusion Approach

Published:2021-05-25 Issue:11 Volume:10 Page:1259
ISSN:2079-9292
Container-title:Electronics
language:en
Short-container-title:Electronics

Author:

Syed Muhammad Shehram Shah, Pirogova Elena, Lech Margaret

Abstract

This paper explores the automatic prediction of public trust in politicians using speech, text, and visual modalities. It evaluates the effectiveness of each modality individually and investigates fusion approaches for integrating information from the modalities in a multimodal setting. A database was created consisting of speech recordings, Twitter messages, and images representing fifteen American politicians, and labeling was carried out according to a publicly available ranking system. The data were distributed into three trust categories: low trust, mid trust, and high trust. First, unimodal prediction was performed on the database using each of the three modalities individually; then, multimodal prediction was performed using the outputs of the unimodal predictions. Unimodal prediction was performed by training three independent logistic regression (LR) classifiers, one each for speech, text, and images. The prediction vectors from the individual modalities were then concatenated and used to train a multimodal decision-making LR classifier. The best performing modality was speech, which achieved a classification accuracy of 92.81%, followed by images at 77.96%, whereas the best performing model for the text modality achieved 72.26%. With the multimodal approach, the highest classification accuracy, 97.53%, was obtained when all three modalities were used for trust prediction. In the bimodal setup, the best performing combination was speech and images, with an accuracy of 95.07%, followed by speech and text at 94.40%, whereas text and images achieved 83.20%.
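The decision-level fusion described in the abstract (three unimodal LR classifiers whose prediction vectors are concatenated and passed to a second LR classifier) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the use of scikit-learn, the synthetic random features, the feature dimensions, the helper names (`make_synthetic_modalities`, `stack_predictions`, `fit_late_fusion`, `predict_late_fusion`), and the choice of class-probability vectors as the "prediction vectors".

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def make_synthetic_modalities(n_samples=300, seed=0):
    """Create random stand-in features for three modalities (hypothetical data).

    Labels 0, 1, 2 stand for the low-, mid-, and high-trust categories.
    """
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 3, size=n_samples)
    dims = {"speech": 20, "text": 30, "image": 40}  # assumed feature sizes
    features = {}
    for name, dim in dims.items():
        centers = rng.normal(size=(3, dim))  # one center per trust class
        noise = rng.normal(scale=2.0, size=(n_samples, dim))
        features[name] = centers[y] + noise
    return features, y


def stack_predictions(unimodal, features):
    """Concatenate each modality's class-probability vector per sample."""
    return np.hstack(
        [unimodal[name].predict_proba(features[name]) for name in sorted(unimodal)]
    )


def fit_late_fusion(features, y):
    """Train one LR classifier per modality, then an LR classifier on their outputs."""
    unimodal = {
        name: LogisticRegression(max_iter=1000).fit(X, y)
        for name, X in features.items()
    }
    # Simplification: the fusion classifier is trained on predictions for the
    # same samples the unimodal classifiers saw; held-out predictions would
    # give a less optimistic fusion model.
    fused = stack_predictions(unimodal, features)
    fusion = LogisticRegression(max_iter=1000).fit(fused, y)
    return unimodal, fusion


def predict_late_fusion(unimodal, fusion, features):
    """Predict trust categories from all modalities via the fusion classifier."""
    return fusion.predict(stack_predictions(unimodal, features))


if __name__ == "__main__":
    features, y = make_synthetic_modalities()
    train = {name: X[:200] for name, X in features.items()}
    test = {name: X[200:] for name, X in features.items()}
    unimodal, fusion = fit_late_fusion(train, y[:200])
    predictions = predict_late_fusion(unimodal, fusion, test)
    print("fusion accuracy on synthetic data:", np.mean(predictions == y[200:]))
```

A bimodal variant follows from passing only two of the three feature sets to `fit_late_fusion`. Accuracies on this synthetic data have no relation to the figures reported in the abstract.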

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering

Link

https://www.mdpi.com/2079-9292/10/11/1259/pdf

Reference: 52 articles.

1. Introduction: Social Signal Processing;Vinciarelli,2017

2. Social signal processing: Survey of an emerging domain

Cited by 7 articles.

1. Evaluation Method of Online Education Learners’ Emotional Input Based on Multimodal Data Fusion;Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering;2024

2. Exploring Political Mistrust in Pandemic Risk Communication: Mixed-Method Study Using Social Media Data Analysis;Journal of Medical Internet Research;2023-10-20

3. Exploring Political Mistrust in Pandemic Risk Communication: Mixed-Method Study Using Social Media Data Analysis (Preprint);2023-06-22

4. It’s Not Only What You Say, But Also How You Say It: Machine Learning Approach to Estimate Trust from Conversation;Human Factors: The Journal of the Human Factors and Ergonomics Society;2023-04-28

5. Multimodal Classification: Current Landscape, Taxonomy and Future Directions;ACM Computing Surveys;2022-12-15