Abstract
The various speech sounds of a language are obtained by varying the shape and position of the articulators surrounding the vocal tract. Analyzing their variations is crucial for understanding speech production, diagnosing speech disorders and planning therapy. Identifying key anatomical landmarks of these structures on medical images is a prerequisite for any quantitative analysis, and the rising amount of data generated in the field calls for an automatic solution. The challenge lies in the high inter- and intra-speaker variability, the mutual interaction between the articulators and the moderate quality of the images. This study addresses this issue for the first time and tackles it by means of Deep Learning. It proposes a dedicated network architecture named Flat-net, whose performance is evaluated and compared with that of eleven state-of-the-art methods from the literature. The dataset contains midsagittal anatomical Magnetic Resonance Images for 9 speakers sustaining 62 articulations, with 21 annotated anatomical landmarks per image. Results show that the Flat-net approach outperforms the former methods, leading to an overall Root Mean Square Error of 3.6 pixels/0.36 cm obtained in a leave-one-out procedure over the speakers. The implementation code is also shared publicly on GitHub.
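The reported error metric and evaluation protocol can be illustrated with a minimal sketch. It assumes the Root Mean Square Error is taken over Euclidean distances between predicted and annotated landmarks, and that one pixel corresponds to 0.1 cm, as implied by the stated 3.6 pixels/0.36 cm. The function names, the constant, and the speaker labels are hypothetical and are not taken from the authors' shared code.

```python
import math

# Assumption: pixel size implied by "3.6 pixels/0.36 cm" in the abstract.
PIXEL_SIZE_CM = 0.1


def rmse_pixels(predicted, reference):
    """Root mean square of Euclidean distances between paired (x, y) landmarks."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("landmark lists must be non-empty and of equal length")
    squared = [
        (px - rx) ** 2 + (py - ry) ** 2
        for (px, py), (rx, ry) in zip(predicted, reference)
    ]
    return math.sqrt(sum(squared) / len(squared))


def leave_one_speaker_out(speakers):
    """Yield (held_out, training_speakers) pairs, one split per speaker."""
    for held_out in speakers:
        yield held_out, [s for s in speakers if s != held_out]


# Hypothetical usage: each speaker is held out once for testing.
for test_speaker, train_speakers in leave_one_speaker_out(["s1", "s2", "s3"]):
    pass  # train on train_speakers, evaluate on test_speaker

# Converting an error in pixels to centimetres under the assumed pixel size.
error_cm = 3.6 * PIXEL_SIZE_CM
```

With 9 speakers, such a procedure would produce 9 splits, each evaluating on images of a speaker never seen during training.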
Funder
This research project is supported by the START-Program of the Faculty of Medicine, RWTH Aachen. The data component of this work has been partially funded by the French ANR
Publisher
Springer Science and Business Media LLC
Cited by
14 articles.