Person Localization Model Based on a Fusion of Acoustic and Visual Inputs

Published: 2022-02-01 Volume: 11 Issue: 3 Page: 440
ISSN: 2079-9292
Container-title: Electronics
Language: en
Short-container-title: Electronics

Author:

Koren Leon, Stipancic Tomislav, Ricko Andrija, Orsag Luka

Abstract

PLEA is an interactive, biomimetic robotic head with non-verbal communication capabilities. PLEA reasoning is based on a multimodal approach combining video and audio inputs to determine the current emotional state of a person. PLEA expresses emotions using facial expressions generated in real-time, which are projected onto a 3D face surface. In this paper, a more sophisticated computation mechanism is developed and evaluated. The model for audio-visual person separation can locate a talking person in a crowded place by combining input from the ResNet network with input from a hand-crafted algorithm. The first input is used to find human faces in the room, and the second input is used to determine the direction of the sound and to focus attention on a single person. After an information fusion procedure is performed, the face of the person speaking is matched with the corresponding sound direction. As a result of this procedure, the robot could start an interaction with the person based on non-verbal signals. The model was tested and evaluated under laboratory conditions by interaction with users. The results suggest that the methodology can be used efficiently to focus a robot’s attention on a localized person.
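The fusion step described in the abstract, matching a detected face to the direction of a sound, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a pinhole camera whose optical axis is aligned with the zero direction of the microphone array, that face detection already yields pixel bounding boxes, and that the sound direction is available as a single azimuth in degrees. The names `Face`, `face_azimuth_deg`, `match_speaker`, and the `tolerance_deg` parameter are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Face:
    """Horizontal extent of a detected face in pixels (hypothetical structure)."""
    x_min: float
    x_max: float


def face_azimuth_deg(face: Face, image_width: float, hfov_deg: float) -> float:
    """Azimuth of the face centre relative to the optical axis.

    Assumes a pinhole camera. Negative values lie left of the image
    centre, positive values lie right of it.
    """
    centre_x = 0.5 * (face.x_min + face.x_max)
    # Focal length in pixels implied by the horizontal field of view.
    focal_px = (image_width / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)
    return math.degrees(math.atan2(centre_x - image_width / 2.0, focal_px))


def match_speaker(
    faces: Sequence[Face],
    sound_azimuth_deg: float,
    image_width: float,
    hfov_deg: float,
    tolerance_deg: float = 10.0,
) -> Optional[int]:
    """Return the index of the face closest to the sound direction.

    Returns None when no face lies within tolerance_deg of the sound
    azimuth (assumed to share the camera's zero direction).
    """
    best_index: Optional[int] = None
    best_error = tolerance_deg
    for index, face in enumerate(faces):
        azimuth = face_azimuth_deg(face, image_width, hfov_deg)
        error = abs(azimuth - sound_azimuth_deg)
        if error <= best_error:
            best_index, best_error = index, error
    return best_index
```

Under these assumptions, a caller would pass the face boxes from the detector together with the estimated sound azimuth, and direct the robot's attention to the returned face index, or keep searching when the result is None.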

Funder

Croatian Science Foundation

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering

Link

https://www.mdpi.com/2079-9292/11/3/440/pdf

References: 32 articles.

1. Barrett, L. F. (2017). How emotions are made. The secret life of the brain. Boston, MA : Houghton Mifflin Harcourt

2. PLEA

3. Multimodal Emotion Analysis Based on Acoustic and Linguistic Features of the Voice

4. HyperFace: A Deep Multi-Task Learning Framework for Face Detection, Landmark Localization, Pose Estimation, and Gender Recognition

5. A snapshot research and implementation of multimodal information fusion for data-driven emotion recognition

Cited by 5 articles.

1. Manufacture and development of Taban: a cute back-projected head social robot for educational purposes; Intelligent Service Robotics; 2024-05-30

2. PLEA: The Embodied Virtual Being; Lecture Notes in Computer Science; 2024

3. Context-Driven Method in Realization of Optimized Human-Robot Interaction; Tehnički glasnik; 2022-06-23

4. Multimodal Emotion Analysis Based on Visual, Acoustic and Linguistic Features; Social Computing and Social Media: Design, User Experience and Impact; 2022

5. Human Intention Recognition for Safe Robot Action Planning Using Head Pose; HCI International 2022 - Late Breaking Papers. Multimodality in Advanced Interaction Environments; 2022