Abstract
Optimising the parameters of the audio thumbnailing procedure can improve the final results. Previous experiments with the thumbnail length parameter have shown strong potential to enhance thumbnail boundary detection for Beatles songs. However, use of this parameter has been limited to changing only the lower bound of the thumbnail length. The purpose of this study is to use the thumbnail length upper bound in combination with the lower bound to improve thumbnail boundary detection for Beatles songs. I experiment with the upper bound while keeping the lower bound fixed, then analyse the F-measure results based on segment boundaries. I use a thumbnailing procedure with a repetition-based fitness measure as the foundation. The results demonstrate that the thumbnail length upper bound can increase the accuracy of estimated thumbnail boundaries for Beatles songs. I select a pair of lower and upper bounds that slightly improves the boundary-based F-measure compared with using only the lower bound. In conclusion, this study optimises the thumbnail length bounds to improve the audio thumbnailing procedure with a repetition-based fitness measure for Beatles songs. It is demonstrated that the upper bound can improve the F-measure if chosen correctly. Unexpectedly, the upper bound can also be omitted without losing much accuracy in thumbnail boundary detection. Additionally, I indicate further directions for optimising thumbnail length bounds for popular music and its genres (such as pop and rock), and I describe other supplemental tasks for future work.
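For readers unfamiliar with the evaluation, a boundary-based F-measure can be sketched as follows. This is a minimal illustration, not the evaluation code used in the study: the function name `boundary_f_measure`, the 0.5-second tolerance window, and the greedy one-to-one matching are all assumptions made for the example (established evaluation toolkits typically use an optimal matching instead).

```python
def boundary_f_measure(reference, estimated, tolerance=0.5):
    """Return (precision, recall, F-measure) for boundary detection.

    An estimated boundary counts as a hit when it lies within
    `tolerance` seconds of a reference boundary that has not been
    matched yet. The tolerance value is an assumption for this sketch.
    """
    unmatched = sorted(reference)
    hits = 0
    for est in sorted(estimated):
        # Greedy one-to-one matching: pick the closest unmatched reference.
        candidates = [r for r in unmatched if abs(r - est) <= tolerance]
        if candidates:
            closest = min(candidates, key=lambda r: abs(r - est))
            unmatched.remove(closest)
            hits += 1
    precision = hits / len(estimated) if estimated else 0.0
    recall = hits / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical boundary times in seconds, for illustration only.
reference_boundaries = [10.0, 40.0, 70.0]
estimated_boundaries = [10.2, 41.0, 69.8, 90.0]
precision, recall, f_measure = boundary_f_measure(
    reference_boundaries, estimated_boundaries
)
print(f"P={precision:.3f} R={recall:.3f} F={f_measure:.3f}")
```

In this hypothetical case two of the four estimated boundaries fall within the tolerance window, so precision is lower than recall, and the F-measure balances the two.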
Publisher
National Academy of Sciences of Ukraine (Co. LTD Ukrinformnauka) (Publications)