Improve Singing Quality Prediction Using Self-supervised Transfer Learning and Human Perception Feedback

Published: 2023-12-06 Pages: 1-7
Container-title: ACM Multimedia Asia 2023

Author:

Chan Ping-Chen¹, Chen Po-Wei¹, Soo Von-Wun²

Affiliation:

1. Institute of Information Systems and Applications, National Tsing Hua University, Taiwan

2. Institute of Information Systems and Applications, National Tsing Hua University, Taiwan and Department of Artificial Intelligence, Chang Gung University, Taiwan

Funder

National Science Council

Publisher

ACM

Link

https://dl.acm.org/doi/pdf/10.1145/3595916.3626443

References: 41 articles.

1. 2018. Digital archive mobile performances (DAMP). https://ccrma.stanford.edu/damp/publications/

2. Rosana Ardila, Megan Branson, Kelly Davis, Michael Henretty, Michael Kohler, Josh Meyer, Reuben Morais, Lindsay Saunders, Francis M Tyers, and Gregor Weber. 2019. Common voice: A massively-multilingual speech corpus. arXiv preprint arXiv:1912.06670 (2019).

3. Alexei Baevski and Abdelrahman Mohamed. 2020. Effectiveness of self-supervised pre-training for ASR. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 7694–7698.

4. Alexei Baevski, Yuhao Zhou, Abdelrahman Mohamed, and Michael Auli. 2020. wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin (Eds.). Vol. 33. Curran Associates, Inc., 12449–12460. https://proceedings.neurips.cc/paper/2020/file/92d1e1eb1cd6f9fba3227870bb6d7f07-Paper.pdf

5. Deep Speaker Embeddings for Short-Duration Speaker Verification