DExter: Learning and Controlling Performance Expression with Diffusion Models

Authors:

Huan Zhang 1, Shreyan Chowdhury 2, Carlos Eduardo Cancino-Chacón 2, Jinhua Liang 1, Simon Dixon 1, Gerhard Widmer 2

Affiliations:

1. Centre for Digital Music, Queen Mary University of London, London E1 4NS, UK

2. Institute of Computational Perception, Johannes Kepler Universität Linz, 4040 Linz, Austria

Abstract

In the pursuit of developing expressive music performance models using artificial intelligence, this paper introduces DExter, a new approach leveraging diffusion probabilistic models to render Western classical piano performances. The main challenge faced in performance rendering tasks is the continuous and sequential modeling of expressive timing and dynamics over time, which is critical for capturing the evolving nuances that characterize live musical performances. In this approach, performance parameters are represented in a continuous expression space, and a diffusion model is trained to predict these continuous parameters while being conditioned on a musical score. Furthermore, DExter also enables the generation of interpretations (expressive variations of a performance) guided by perceptually meaningful features by being jointly conditioned on score and perceptual-feature representations. Consequently, we find that our model is useful for learning expressive performance, generating perceptually steered performances, and transferring performance styles. We assess the model through quantitative and qualitative analyses, focusing on specific performance metrics regarding dimensions like asynchrony and articulation, as well as through listening tests that compare generated performances with different human interpretations. The results show that DExter is able to capture the time-varying correlation of the expressive parameters, and it compares well to existing rendering models in subjectively evaluated ratings. The perceptual-feature-conditioned generation and transferring capabilities of DExter are verified via a proxy model predicting perceptual characteristics of differently steered performances.
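The abstract describes training a diffusion model to predict continuous expression parameters (timing, dynamics) conditioned on a score representation. The following is an illustrative sketch of that general setup, not the authors' implementation: a single DDPM-style training step in which a toy linear denoiser, conditioned on an assumed score-feature vector, learns to predict the noise added to per-note expression parameters. All names, shapes, and the noise schedule are assumptions for illustration.

```python
import numpy as np

# Illustrative sketch (not the paper's code): one DDPM training step for a
# score-conditioned denoiser over continuous expression parameters
# (e.g. per-note timing and dynamics deviations).

rng = np.random.default_rng(0)

T = 1000                                   # number of diffusion steps (assumed)
betas = np.linspace(1e-4, 0.02, T)         # linear noise schedule (assumed)
alphas_bar = np.cumprod(1.0 - betas)       # cumulative product \bar{alpha}_t

def forward_diffuse(x0, t):
    """Sample x_t ~ q(x_t | x_0) by adding Gaussian noise at step t."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps
    return xt, eps

def denoiser(xt, t, score_cond, W):
    """Toy linear noise predictor conditioned on score features and step t."""
    feats = np.concatenate([xt, score_cond, [t / T]])
    return W @ feats

# Fake batch: 8 notes x 2 expression params (timing, dynamics), flattened,
# plus a 4-dimensional score-conditioning vector (assumed feature size).
x0 = rng.standard_normal(16)
cond = rng.standard_normal(4)
W = rng.standard_normal((16, 16 + 4 + 1)) * 0.01

t = int(rng.integers(0, T))
xt, eps = forward_diffuse(x0, t)
pred = denoiser(xt, t, cond, W)
loss = np.mean((pred - eps) ** 2)          # DDPM "simple" noise-prediction loss
print(f"step t={t}, denoising loss={loss:.3f}")
```

In the paper's setting, the linear denoiser would be a learned neural network, and the conditioning vector could additionally carry perceptual features to steer the rendered interpretation.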

Funders

UKRI Centre for Doctoral Training in Artificial Intelligence and Music

UK Research and Innovation

European Research Council

Publisher

MDPI AG

