Physician-patient speech separation method based on voiceprint technology and privacy protection

Authors:

Zhang Li, Liu JingRui, Jing Ming

Abstract

In speech recognition and natural language processing of doctor-patient conversations, it is critical to distinguish what is said by the healthcare worker from what is said by the patient. In addition, speech contains acoustic and linguistic features that machine learning models can use to assess a speaker's behavioral health. Collecting voice data is also simple, cheap, and convenient for patients, requiring only a microphone, a quiet place, and a device to record audio samples. Voice-based biomarkers can therefore prescreen for disease, monitor disease progression and response to treatment, and serve as useful surrogate markers in clinical studies conducted with informed consent; this process, however, again requires distinguishing the doctor from the patient in the recorded audio. In practice, most recordings do not capture the doctor's and patient's voices separately; the two are mixed together. Several speaker-diarization methods have been used to separate speech in the time domain; however, these studies address neither how to obtain time-labeled speech samples nor how to identify the speakers. In this paper, a speech separation method is proposed for audio containing one doctor and several patients. The method comprises three parts: voiceprint segmentation and clustering, cutting and splicing, and speaker identity determination. Doctor and patient audio can be separated while respecting the privacy of the conversation content, and stored separately according to the identity of each voice.
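The three-stage pipeline named in the abstract (voiceprint segmentation and clustering, cutting and splicing, identity determination) can be sketched as follows. This is a minimal illustration only: the 8-dimensional embeddings, the use of scikit-learn agglomerative clustering, and the enrolled doctor voiceprint are assumptions for the sketch, not the paper's actual model.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering


def diarize_and_identify(seg_embeddings, doctor_embedding, n_speakers=2):
    """Cluster per-segment voiceprint embeddings into speakers, then decide
    which cluster belongs to the doctor by cosine similarity between each
    cluster centroid and an enrolled doctor voiceprint.

    Returns a boolean array: True marks segments assigned to the doctor.
    """
    # Stage 1: segmentation clustering over the voiceprint embeddings.
    labels = AgglomerativeClustering(n_clusters=n_speakers).fit_predict(seg_embeddings)

    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Stage 3: identity determination via each cluster's centroid voiceprint.
    sims = [cos(seg_embeddings[labels == k].mean(axis=0), doctor_embedding)
            for k in range(n_speakers)]
    doctor_cluster = int(np.argmax(sims))
    return labels == doctor_cluster


# Synthetic example: two well-separated "voiceprints" with small noise,
# alternating doctor/patient turns (stage 2, cutting and splicing, would
# then concatenate the segments of each speaker into separate files).
rng = np.random.default_rng(0)
doctor_voice = np.array([1.0] * 4 + [0.0] * 4)
patient_voice = np.array([0.0] * 4 + [1.0] * 4)
segs = np.vstack([(doctor_voice if i % 2 == 0 else patient_voice)
                  + 0.05 * rng.standard_normal(8)
                  for i in range(6)])
is_doctor = diarize_and_identify(segs, doctor_voice)
```

Because only embeddings and time labels are needed for the grouping, the lexical content of the conversation never has to leave the device at this stage, which is consistent with the privacy goal stated above.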

Publisher

EDP Sciences

