DExter: Learning and Controlling Performance Expression with Diffusion Models

Authors:

Huan Zhang 1, Shreyan Chowdhury 2, Carlos Eduardo Cancino-Chacón 2, Jinhua Liang 1, Simon Dixon 1, Gerhard Widmer 2

Affiliations:

1. Centre for Digital Music, Queen Mary University of London, London E1 4NS, UK

2. Institute of Computational Perception, Johannes Kepler Universität Linz, 4040 Linz, Austria

Abstract

In the pursuit of developing expressive music performance models using artificial intelligence, this paper introduces DExter, a new approach leveraging diffusion probabilistic models to render Western classical piano performances. The main challenge faced in performance rendering tasks is the continuous and sequential modeling of expressive timing and dynamics over time, which is critical for capturing the evolving nuances that characterize live musical performances. In this approach, performance parameters are represented in a continuous expression space, and a diffusion model is trained to predict these continuous parameters while being conditioned on a musical score. Furthermore, DExter also enables the generation of interpretations (expressive variations of a performance) guided by perceptually meaningful features by being jointly conditioned on score and perceptual-feature representations. Consequently, we find that our model is useful for learning expressive performance, generating perceptually steered performances, and transferring performance styles. We assess the model through quantitative and qualitative analyses, focusing on specific performance metrics regarding dimensions like asynchrony and articulation, as well as through listening tests that compare generated performances with different human interpretations. The results show that DExter is able to capture the time-varying correlation of the expressive parameters, and it compares well to existing rendering models in subjectively evaluated ratings. The perceptual-feature-conditioned generation and transferring capabilities of DExter are verified via a proxy model predicting perceptual characteristics of differently steered performances.
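The abstract describes training a diffusion model to predict continuous expression parameters (timing, dynamics) conditioned on a score representation. The following is an illustrative sketch of that general setup, not the authors' implementation: a single DDPM-style training step in which a toy linear denoiser, conditioned on an assumed score-feature vector, learns to predict the noise added to per-note expression parameters. All names, shapes, and the noise schedule are assumptions for illustration.

```python
import numpy as np

# Illustrative sketch (not the paper's code): one DDPM training step for a
# score-conditioned denoiser over continuous expression parameters
# (e.g. per-note timing and dynamics deviations).

rng = np.random.default_rng(0)

T = 1000                                   # number of diffusion steps (assumed)
betas = np.linspace(1e-4, 0.02, T)         # linear noise schedule (assumed)
alphas_bar = np.cumprod(1.0 - betas)       # cumulative product \bar{alpha}_t

def forward_diffuse(x0, t):
    """Sample x_t ~ q(x_t | x_0) by adding Gaussian noise at step t."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps
    return xt, eps

def denoiser(xt, t, score_cond, W):
    """Toy linear noise predictor conditioned on score features and step t."""
    feats = np.concatenate([xt, score_cond, [t / T]])
    return W @ feats

# Fake batch: 8 notes x 2 expression params (timing, dynamics), flattened,
# plus a 4-dimensional score-conditioning vector (assumed feature size).
x0 = rng.standard_normal(16)
cond = rng.standard_normal(4)
W = rng.standard_normal((16, 16 + 4 + 1)) * 0.01

t = int(rng.integers(0, T))
xt, eps = forward_diffuse(x0, t)
pred = denoiser(xt, t, cond, W)
loss = np.mean((pred - eps) ** 2)          # DDPM "simple" noise-prediction loss
print(f"step t={t}, denoising loss={loss:.3f}")
```

In the paper's setting, the linear denoiser would be a learned neural network, and the conditioning vector could additionally carry perceptual features to steer the rendered interpretation.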

Funders

UKRI Centre for Doctoral Training in Artificial Intelligence and Music

UK Research and Innovation

European Research Council

Publisher

MDPI AG

