Author:
Xu Luzhen, Yan Haoyin, He Maokui, Guo Zixian, Zhou Yeping, Liu Peiqi, Zhang Jie, Dai Lirong
Publisher:
Springer Science and Business Media LLC