Affiliations:
1. Center for Artificial Intelligence and Data Science, Universität Würzburg, Würzburg, Germany
2. International Audio Laboratories Erlangen, Erlangen, Germany
Abstract
The availability of digital music data in various modalities provides opportunities both for music enjoyment and music research. Regarding the latter, the computer-assisted analysis of tonal structures is a central topic. For Western classical music, studies typically rely on machine-readable scores, which are tedious to create for large-scale works and comprehensive corpora. As an alternative, music audio recordings, which are readily available, can be analyzed with computational methods. With this article, we want to bridge the gap between score- and audio-based measurements of tonal structures by leveraging the power of deep neural networks. Such networks are commonly trained in an end-to-end fashion, which introduces biases towards the training repertoire or towards specific annotators. To overcome these problems, we propose a multi-step strategy. First, we compute pitch-class representations of the audio recordings using networks trained on score–audio pairs. Second, we measure the presence of specific tonal structures using a pattern-matching technique that solely relies on music theory knowledge and does not require annotated training data. Third, we highlight these measurements with interactive visualizations, thus leaving the interpretation to the musicological experts. Our experiments on Richard Wagner's large-scale cycle Der Ring des Nibelungen indicate that deep pitch-class representations lead to a high similarity between score- and audio-based measurements of tonal structures, thus demonstrating how to leverage multi-modal data for application scenarios in the computational humanities, where an explicit and interpretable methodology is essential.
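The second step of the strategy, pattern matching on pitch-class representations without annotated training data, can be illustrated with a minimal sketch. The abstract does not specify the templates or the similarity measure, so the choices below are assumptions made purely for illustration: binary templates for the twelve diatonic (major) scales and cosine similarity per frame. The function names `diatonic_templates` and `match_templates` are hypothetical and not taken from the article.

```python
import numpy as np

# Pitch classes of a major (diatonic) scale relative to its tonic, in semitones.
# Assumption: diatonic scales stand in for the "tonal structures" being measured.
MAJOR_SCALE = (0, 2, 4, 5, 7, 9, 11)


def diatonic_templates():
    """Return a (12, 12) array; row k is the binary template of the scale with tonic k."""
    base = np.zeros(12)
    base[list(MAJOR_SCALE)] = 1.0
    # np.roll shifts pitch class 0 to index k, i.e. transposes the scale up by k semitones.
    return np.stack([np.roll(base, k) for k in range(12)])


def match_templates(chroma, templates, eps=1e-12):
    """Cosine similarity between every template and every frame.

    chroma:    (12, n_frames) non-negative pitch-class energies
    templates: (n_templates, 12) binary or weighted templates
    returns:   (n_templates, n_frames) similarity values
    """
    chroma = np.asarray(chroma, dtype=float)
    templates = np.asarray(templates, dtype=float)
    # Normalize frames (columns) and templates (rows) to unit Euclidean length.
    c = chroma / (np.linalg.norm(chroma, axis=0, keepdims=True) + eps)
    t = templates / (np.linalg.norm(templates, axis=1, keepdims=True) + eps)
    return t @ c


if __name__ == "__main__":
    templates = diatonic_templates()
    # Two idealized frames: pure C major and pure G major scale content.
    chroma = np.stack([templates[0], templates[7]], axis=1)
    scores = match_templates(chroma, templates)
    print(scores.argmax(axis=0))  # index of the best-matching tonic per frame
```

Because the templates come from music theory alone, the same matching can be applied unchanged to pitch-class features derived from a score or from an audio recording, which is what makes the two kinds of measurement directly comparable.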
Publisher
Association for Computing Machinery (ACM)