ATC-SD Net: Radiotelephone Communications Speaker Diarization Network

Author:

Pan Weijun (1), Wang Yidi (2), Zhang Yumei (2), Han Boyuan (2)

Affiliation:

1. Flight Technology and Flight Safety Research Base of the Civil Aviation Administration of China, Civil Aviation Flight University of China, Guanghan 618307, China

2. College of Air Traffic Management, Civil Aviation Flight University of China, Guanghan 618307, China

Abstract

This study addresses the challenges that high-noise environments and complex multi-speaker scenarios present in civil aviation radio communications. A novel radiotelephone communications speaker diarization network is developed specifically for these circumstances. To improve the precision of the speaker diarization network, three core modules are designed: voice activity detection (VAD), end-to-end speaker separation for air–ground communication (EESS), and probabilistic knowledge-based text clustering (PKTC). First, the VAD module uses attention mechanisms to filter out silence and irrelevant noise, leaving clean dialogue commands. Subsequently, the EESS module distinguishes between controllers and pilots by leveraging voiceprint differences, resulting in effective speaker segmentation. Finally, the PKTC module addresses the issue of pilot voiceprint ambiguity using text clustering, introducing a novel flight prior knowledge-based text-related clustering model. To achieve robust speaker diarization in multi-pilot scenarios, this model uses prior knowledge-based graph construction, radar data-based graph correction, and probabilistic optimization. This study also includes the development of the specialized ATCSPEECH dataset, which demonstrates significant performance improvements over both the AMI and ATCO2 PROJECT datasets.

Funder

National Natural Science Foundation of China

National Key R&D Program of China

Safety Capacity Building Project of Civil Aviation Administration of China

Fundamental Research Funds for the Central Universities

2024 Annual Central University Fundamental Research Funds Support Project

Publisher

MDPI AG

