USEV: Universal Speaker Extraction With Visual Cue

Published:2022 Volume:30 Page:3032-3045
ISSN:2329-9290
Container-title:IEEE/ACM Transactions on Audio, Speech, and Language Processing
Short-container-title:IEEE/ACM Trans. Audio Speech Lang. Process.

Author:

Pan Zexu¹, Ge Meng², Li Haizhou³

Affiliation:

1. Integrative Sciences and Engineering Programme, Institute of Data Science, Department of Electrical and Computer Engineering, National University of Singapore, Singapore

2. Tianjin Key Laboratory of Cognitive Computing and Application, College of Intelligence and Computing, Tianjin University, Tianjin, China

3. School of Data Science, Chinese University of Hong Kong, Shenzhen, China

Funder

Shenzhen Research Institute of Big Data

Guangdong Provincial Key Laboratory of Big Data Computing

University Development Fund

The Chinese University of Hong Kong, Shenzhen

Science and Engineering Research Council, Agency for Science, Technology and Research (A*STAR), Singapore

Deutsche Forschungsgemeinschaft

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Subject

Electrical and Electronic Engineering, Acoustics and Ultrasonics, Computer Science (miscellaneous), Computational Mathematics

Link

http://xplorestaging.ieee.org/ielx7/6570655/9657755/09887809.pdf?arnumber=9887809

Reference: 89 articles.

1. SDR – Half-baked or Well Done?

2. Learning speaker representation for neural network based multichannel speaker extraction;Žmolíková;Proc IEEE Autom Speech Recognit Understanding Workshop

3. Speaker-Utterance Dual Attention for Speaker and Utterance Verification

4. Generalized End-to-End Loss for Speaker Verification

5. FaceFilter: Audio-Visual Speech Separation Using Still Images

Cited by 8 articles.

1. Contrastive Learning for Target Speaker Extraction With Attention-Based Fusion;IEEE/ACM Transactions on Audio, Speech, and Language Processing;2024

2. A Region Based Non-overlapping Reference Speech Estimation Method for Speaker Extraction;MultiMedia Modeling;2024

3. Scenario-Aware Audio-Visual TF-Gridnet for Target Speech Extraction;2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU);2023-12-16

4. Leveraging Sound Local and Global Features for Language-Queried Target Sound Extraction;Neural Information Processing;2023-11-15

5. Lip landmark-based audio-visual speech enhancement with multimodal feature fusion network;Neurocomputing;2023-09