An Open CAPT System for Prosody Practice: Practical Steps towards Multilingual Setup

Authors:

Blake John (1), Bogach Natalia (2), Kusakari Akemi (3), Lezhenin Iurii (2, 4), Khaustova Veronica (1), Xuan Son Luu (5), Nguyen Van Nhi (6), Pham Nam Ba (2), Svechnikov Roman (2), Ostapchuk Andrey (2), Efimov Dmitrei (2), Pyshkin Evgeny (1)

Affiliations:

1. School of Computer Science and Engineering, The University of Aizu, Aizu-Wakamatsu 965-8580, Japan

2. Institute of Computer Science and Technology, Peter the Great St. Petersburg Polytechnic University, St. Petersburg 195251, Russia

3. Hachinohe College, National Institute of Technology, Hachinohe 039-1104, Japan

4. Speech Technology Center, St. Petersburg 194044, Russia

5. Suntory System Technology Ltd., Osaka 530-8204, Japan

6. FPT Japan Holdings Co., Ltd., Tokyo 105-0011, Japan

Abstract

This paper discusses the challenges of creating a Computer-Assisted Pronunciation Training (CAPT) environment for multiple languages. By selecting one language from each of three different language families, we show that a single environment can be tailored to different target languages. We detail the challenges faced while developing a multimodal CAPT environment comprising a toolkit that manages mobile applications using speech signal processing, visualization, and estimation algorithms. Because the underlying mathematical and phonological models, as well as the feedback production algorithms, are based on sound signal processing and modeling rather than on any particular language, the system is language-agnostic and serves as an open toolkit for developing phrasal intonation training exercises for an open selection of languages. Nevertheless, the CAPT environment had to be tailored to language-specific particularities in the multilingual setups, especially to meet additional requirements for adequate and consistent speech evaluation and feedback production. We describe our response to the challenges of visualizing and segmenting recorded pitch signals and of modeling the language melody and rhythm that such a multilingual adaptation requires, particularly for tonal syllable-timed and mora-timed languages.
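The abstract argues that feedback can be language-agnostic because it rests on signal processing rather than on language-specific rules. As an illustration only, and not the authors' implementation, the sketch below compares a learner's pitch contour with a reference contour: F0 values are converted to speaker-normalized semitone z-scores, which cancels differences in vocal register and range, and the two sequences are then aligned with dynamic time warping (DTW), which tolerates differences in speaking rate. The function names (`normalize_pitch`, `dtw_distance`, `contour_distance`), the convention that frames with F0 of 0 Hz are unvoiced, and the division by `n + m` as a length normalization are all assumptions made for this example.

```python
import math
from typing import List, Sequence


def normalize_pitch(f0_hz: Sequence[float]) -> List[float]:
    """Map voiced F0 values (Hz) to speaker-normalized semitone z-scores.

    Assumption of this sketch: frames with f0 <= 0 are unvoiced and are dropped.
    """
    voiced = [f for f in f0_hz if f > 0]
    if not voiced:
        return []
    # Semitones relative to the geometric-mean F0: the values average to zero,
    # so a whole-contour register shift (e.g. one octave) cancels out.
    ref = math.exp(sum(math.log(f) for f in voiced) / len(voiced))
    st = [12.0 * math.log2(f / ref) for f in voiced]
    # Divide by the standard deviation to factor out pitch range as well;
    # a (near-)flat contour is left unscaled to avoid amplifying rounding noise.
    var = sum(x * x for x in st) / len(st)
    sd = math.sqrt(var) if var > 1e-12 else 1.0
    return [x / sd for x in st]


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(n*m) DTW with absolute-difference local cost, divided by n + m."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both contours need at least one voiced frame")
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def contour_distance(reference_hz: Sequence[float], learner_hz: Sequence[float]) -> float:
    """Hypothetical scoring helper: lower values mean a closer melodic shape."""
    return dtw_distance(normalize_pitch(reference_hz), normalize_pitch(learner_hz))


if __name__ == "__main__":
    reference = [200.0, 220.0, 240.0, 220.0, 200.0]  # rise-fall model contour
    learner = [100.0, 110.0, 0.0, 120.0, 120.0, 110.0, 100.0]  # lower voice, slower
    print(round(contour_distance(reference, learner), 3))
```

Because only the normalized shape is compared, a learner speaking an octave lower than the model speaker is not penalized, while a rising contour produced where the model falls yields a clearly larger distance. Language-specific concerns raised in the abstract, such as lexical tone or mora-based segmentation, would need additional modeling on top of a comparison like this.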

Funder

Japan Society for the Promotion of Science

Publisher

MDPI AG


Cited by 2 articles.

1. Algorithm for Automatic Layout of Graphic Language and Its Application in Graphic Design. International Journal of Information Systems and Supply Chain Management, 2024-06-07.

2. Construction of a Multimedia Instructional Effect Assessment System of College Foreign Language Translation Based on Artificial Intelligence. International Journal of Information Systems and Supply Chain Management, 2024-05-22.
