Audio-Visual Cross-Attention Network for Robotic Speaker Tracking

Author:

Qian Xinyuan¹, Wang Zhengdong², Wang Jiadong², Guan Guohui³, Li Haizhou⁴

Affiliation:

1. Department of Computer Science and Technology, University of Science and Technology Beijing, Beijing, China

2. Department of Electrical and Computer Engineering, National University of Singapore, Singapore

3. Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, Berkeley, CA, USA

4. Guangdong Provincial Key Laboratory of Big Data Computing, Chinese University of Hong Kong, Shenzhen, China

Funder

Science and Engineering Research Council

Agency for Science, Technology and Research (A*STAR), Singapore

Deutsche Forschungsgemeinschaft

Universität Bremen

Guangdong Provincial Key Laboratory of Big Data Computing

The Chinese University of Hong Kong, Shenzhen

Internal Project of Shenzhen Research Institute of Big Data

Research Foundation of Guangdong Province

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Subject

Electrical and Electronic Engineering, Acoustics and Ultrasonics, Computer Science (miscellaneous), Computational Mathematics

References: 62 articles.

Cited by 12 articles.

1. Mobile Bot Rotation Using Sound Source Localization And Distant Speech Recognition;2024 IEEE International Conference on Robotics and Automation (ICRA);2024-05-13

4. Fusion of Audio and Visual Embeddings for Sound Event Localization and Detection;ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP);2024-04-14

5. LOCSELECT: Target Speaker Localization with an Auditory Selective Hearing Mechanism;ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP);2024-04-14