Generating the Voice of the Interactive Virtual Assistant

Authors:

Stan Adriana, Lőrincz Beáta

Abstract

This chapter presents an overview of current approaches to generating spoken content with text-to-speech synthesis (TTS) systems, and thus the voice of an Interactive Virtual Assistant (IVA). The overview starts from the issues that make spoken content generation a non-trivial task, and introduces the two main components of a TTS system: text processing and acoustic modelling. It then gives the reader the minimum scientific detail on the terminology and methods of speech synthesis needed to make initial decisions about the choice of technology for the vocal identity of the IVA. The description of speech synthesis methodologies begins with basic, easy-to-run, low-requirement rule-based synthesis and ends with the state-of-the-art deep learning landscape. To bring this extremely complex and extensive research field closer to commercial deployment, an extensive index of the readily and freely available resources and tools required to build a TTS system is provided. Quality evaluation methods and open research problems are also highlighted at the end of the chapter.

Publisher

IntechOpen


Cited by 2 articles.

1. The Role of Vocal Persona in Natural and Synthesized Speech;2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG);2023-01-05

2. On the Potential of Modular Voice Conversion for Virtual Agents;2021 9th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW);2021-09-28
