From Music Scores to Audio Recordings: Deep Pitch-Class Representations for Measuring Tonal Structures

Published: 2024-07-31 Issue: 3 Volume: 17 Page: 1-19
ISSN: 1556-4673
Container-title: Journal on Computing and Cultural Heritage
Language: en
Short-container-title: J. Comput. Cult. Herit.

Author:

Weiß Christof¹, Müller Meinard²

Affiliation:

1. Center for Artificial Intelligence and Data Science, Universität Würzburg, Würzburg, Germany

2. International Audio Laboratories Erlangen, Erlangen, Germany

Abstract

The availability of digital music data in various modalities provides opportunities both for music enjoyment and music research. Regarding the latter, the computer-assisted analysis of tonal structures is a central topic. For Western classical music, studies typically rely on machine-readable scores, which are tedious to create for large-scale works and comprehensive corpora. As an alternative, music audio recordings, which are readily available, can be analyzed with computational methods. With this article, we want to bridge the gap between score- and audio-based measurements of tonal structures by leveraging the power of deep neural networks. Such networks are commonly trained in an end-to-end fashion, which introduces biases towards the training repertoire or towards specific annotators. To overcome these problems, we propose a multi-step strategy. First, we compute pitch-class representations of the audio recordings using networks trained on score–audio pairs. Second, we measure the presence of specific tonal structures using a pattern-matching technique that solely relies on music theory knowledge and does not require annotated training data. Third, we highlight these measurements with interactive visualizations, thus leaving the interpretation to the musicological experts. Our experiments on Richard Wagner's large-scale cycle Der Ring des Nibelungen indicate that deep pitch-class representations lead to a high similarity between score- and audio-based measurements of tonal structures, thus demonstrating how to leverage multi-modal data for application scenarios in the computational humanities, where an explicit and interpretable methodology is essential.
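The second step of the strategy described in the abstract, pattern matching driven only by music theory knowledge, can be illustrated with a minimal sketch. It assumes that a pitch-class representation is a 12-dimensional vector per frame and that the tonal structures of interest are the twelve major diatonic scales, compared by cosine similarity. The templates, the similarity measure, and all function names here are illustrative assumptions and are not taken from the article.

```python
import math

# Binary template for the C major diatonic scale over pitch classes C, C#, ..., B.
# Assumption: diatonic scales stand in for whatever tonal structures are measured.
C_MAJOR_SCALE = [1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1]
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def rotate(template, shift):
    """Transpose a 12-dimensional template upward by `shift` semitones."""
    shift %= 12
    return template[-shift:] + template[:-shift] if shift else list(template)


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def match_scales(chroma):
    """Score one pitch-class vector against all 12 transposed scale templates."""
    return {
        PITCH_CLASSES[k]: cosine_similarity(chroma, rotate(C_MAJOR_SCALE, k))
        for k in range(12)
    }


def best_scale(chroma):
    """Return the name of the best-matching scale (hypothetical helper)."""
    scores = match_scales(chroma)
    return max(scores, key=scores.get)


# Example frame containing exactly the pitch classes of the G major scale.
g_major_frame = [1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1]
best = best_scale(g_major_frame)
```

Because the templates come from music theory alone, a procedure of this kind needs no annotated training data, and the per-frame scores can be passed on to a visualization instead of being reduced to a single label.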

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3659103
