Abstract
This paper presents a comprehensive literature review whose main aim is to gather information on how artificial intelligence is currently used for content generation in the podcast industry, which tools can be used for this purpose, and how the convergence of the two fields has evolved. Based on the structure and role specification of the podcast host, AI tools that could fulfill these roles were identified. The paper focuses specifically on the early stages of podcast production, i.e. the conception, development, and curation of raw content, which characteristically rely on advanced technologies for Automatic Speech Recognition (ASR), Text-to-Speech synthesis (TTS), and Generative Pre-trained Transformers (GPT). The review is based on a systematic search of databases, academic journals, conference proceedings, and other relevant sources related to artificial intelligence and the podcasting industry, and on a generalization of their results. In particular, the Google Scholar database was used for scholarly articles, and the Google search engine was used to collect information on the tools. Finally, individual studies on AI-generated content published between 2006 and 2023 were interpreted neutrally. The final selection includes 14 relevant studies on the interface between AI and podcasting and 48 selected AI tools, most of which can be used individually and independently within the overall podcast production process. The contribution of this literature review is a structured consolidation of information, the promotion of interdisciplinary research, and an overview of the state of the art in the field.
Publisher: University of Saints Cyril and Methodius