PerformanceNet: Score-to-Audio Music Generation with Multi-Band Convolutional Residual Network

Published:2019-07-17 Volume:33 Page:1174-1181
ISSN:2374-3468
Container-title:Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title:AAAI

Author:

Wang Bryan, Yang Yi-Hsuan

Abstract

Music creation typically consists of two parts: composing the musical score, and then performing the score with instruments to make sounds. While recent work has made much progress in automatic music generation in the symbolic domain, few attempts have been made to build an AI model that can render realistic music audio from musical scores. Directly synthesizing audio with sound sample libraries often leads to mechanical and deadpan results, since musical scores do not contain performance-level information, such as subtle changes in timing and dynamics. Moreover, while the task may sound like a text-to-speech synthesis problem, there are fundamental differences since music audio has rich polyphonic sounds. To build such an AI performer, we propose in this paper a deep convolutional model that learns, in an end-to-end manner, the score-to-audio mapping between a symbolic representation of music called the pianoroll and an audio representation of music called the spectrogram. The model consists of two subnets: the ContourNet, which uses a U-Net structure to learn the correspondence between pianorolls and spectrograms and to give an initial result; and the TextureNet, which further uses a multi-band residual network to refine the result by adding the spectral texture of overtones and timbre. We train the model to generate music clips of the violin, cello, and flute, with a dataset of moderate size. We also present the result of a user study that shows our model achieves a higher mean opinion score (MOS) in naturalness and emotional expressivity than a WaveNet-based model and two off-the-shelf synthesizers. We open our source code at https://github.com/bwang514/PerformanceNet
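
For readers unfamiliar with the input representation named in the abstract, the following is a minimal sketch of a pianoroll: a binary pitch-by-time grid marking which notes sound in each frame. It is illustrative only and is not taken from the PerformanceNet code base. The helper name `notes_to_pianoroll`, the `(pitch, start, end)` note format, and the frame counts are all assumptions made for this example.

```python
# Hypothetical helper for illustration; not part of the PerformanceNet repository.
def notes_to_pianoroll(notes, n_frames, n_pitches=128):
    """Build a binary pianoroll (n_pitches rows x n_frames columns).

    Each note is a (pitch, start, end) tuple: pitch is a MIDI note number,
    start and end are frame indices, with end exclusive.
    """
    roll = [[0] * n_frames for _ in range(n_pitches)]
    for pitch, start, end in notes:
        if not 0 <= pitch < n_pitches:
            raise ValueError(f"pitch {pitch} outside 0..{n_pitches - 1}")
        # Clip the note to the clip boundaries before marking it active.
        for t in range(max(start, 0), min(end, n_frames)):
            roll[pitch][t] = 1
    return roll


# Two notes: A4 (MIDI 69) for frames 0-3, then C5 (MIDI 72) for frames 4-7.
notes = [(69, 0, 4), (72, 4, 8)]
roll = notes_to_pianoroll(notes, n_frames=8)
print(roll[69])  # [1, 1, 1, 1, 0, 0, 0, 0]
print(roll[72])  # [0, 0, 0, 0, 1, 1, 1, 1]
```

A grid like this carries pitch and timing but no dynamics or timbre, which is the performance-level information the abstract says the model has to supply when mapping to a spectrogram.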

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 12 articles.

1. EEG-driven automatic generation of emotive music based on transformer;Frontiers in Neurorobotics;2024-08-19

2. Understanding the Use of AI-Based Audio Generation Models by End-Users;Extended Abstracts of the CHI Conference on Human Factors in Computing Systems;2024-05-02

3. Expressive Acoustic Guitar Sound Synthesis with an Instrument-Specific Input Representation and Diffusion Outpainting;ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP);2024-04-14

4. Neural electric bass guitar synthesis framework enabling attack-sustain-representation-based technique control;EURASIP Journal on Audio, Speech, and Music Processing;2024-01-11

5. Exploring AI Music Generation: A Review of Deep Learning Algorithms and Datasets for Undergraduate Researchers;Communications in Computer and Information Science;2023-12-12