Audio-Visual Bimodal Combination-Based Speaker Tracking Method for Mobile Robot

Author:

Zhang Hao-Yan (1, 2, 3), Zhang Long-Bo (1, 2, 3), Shi Qi-Feng (1, 2, 3), Liu Zhen-Tao (1, 2, 3)

Affiliation:

1. School of Automation, China University of Geosciences, No.388 Lumo Road, Hongshan District, Wuhan, Hubei 430074, China

2. Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, Wuhan, Hubei 430074, China

3. Engineering Research Center of Intelligent Technology for Geo-Exploration, Ministry of Education, Wuhan, Hubei 430074, China

Abstract

Initiative service is a key research direction for the new generation of service robots, and automatically tracking humans is important for providing it in human-robot interaction. Using only single-modal (audio or visual) information leads to low precision and poor anti-interference capability, so a speaker positioning and tracking method based on an audio-visual bimodal combination is proposed. First, the azimuth of the speaker is obtained from the time difference of arrival measured by a microphone array, and AdaBoost-based face detection is carried out using the camera. A distance and azimuth calculation model is established to obtain the position of the speaker. Second, a speaker positioning strategy based on the audio-visual bimodal combination is designed to handle different situations. Third, a path is planned that keeps the azimuth and distance between the robot and the speaker within a limited range, and simulations are performed with different azimuths and distances set for speaker tracking. Finally, the mobile robot is driven along the path by an STM32 real-time control system, while information from the microphone array and the camera is collected and processed by a Raspberry Pi. Tracking accuracy was tested in the single-face situation at 20 different target points, with 10 tests carried out at each point. In multi-face situations, the audio-visual bimodal information is combined to identify the speaker, and a Kalman filter is then used for face tracking. The experimental results demonstrate that the running trajectory of the mobile robot is close to the ideal trajectory, which ensures effective speaker tracking.
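The abstract does not give the array geometry or the exact azimuth model, so the following is only a minimal sketch of how an azimuth can be derived from a time difference of arrival. It assumes a far-field source and a single pair of microphones, where the delay tau, the spacing d, and the speed of sound c are related by sin(theta) = c * tau / d. The function name `azimuth_from_tdoa`, the constant `SPEED_OF_SOUND`, and the 343 m/s value are illustrative assumptions, not details taken from the paper.

```python
import math

# Assumed speed of sound in air at roughly 20 degrees Celsius (not from the paper).
SPEED_OF_SOUND = 343.0  # m/s


def azimuth_from_tdoa(delay_s: float, mic_spacing_m: float) -> float:
    """Estimate the far-field azimuth (degrees) for a two-microphone pair.

    The angle is measured from the broadside direction (perpendicular to
    the line joining the microphones). The sign of the angle follows the
    sign of the measured delay.
    """
    if mic_spacing_m <= 0:
        raise ValueError("mic_spacing_m must be positive")

    # Far-field model: path difference = c * tau = d * sin(theta).
    ratio = SPEED_OF_SOUND * delay_s / mic_spacing_m

    # Measurement noise can push the ratio slightly outside [-1, 1];
    # clamp it so that asin stays defined.
    ratio = max(-1.0, min(1.0, ratio))

    return math.degrees(math.asin(ratio))


if __name__ == "__main__":
    spacing = 0.10  # hypothetical 10 cm microphone spacing
    delay = 0.5 * spacing / SPEED_OF_SOUND  # delay that gives sin(theta) = 0.5
    print(f"Estimated azimuth: {azimuth_from_tdoa(delay, spacing):.1f} degrees")
```

A single pair only resolves the angle within a half-plane; a practical array would combine several pairs, and the distance would come from a separate model such as the face-based one the abstract mentions.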

Funder

College Students’ Innovative Entrepreneurial Training Plan Program

China University of Geosciences

Publisher

Fuji Technology Press Ltd.

