Affiliation:
1. Babeş–Bolyai University, Faculty of Mathematics and Computer Science, Cluj-Napoca, Romania
2. Cluj-Napoca, Romania
Abstract
Music information retrieval has recently become an important field of information retrieval, because in-depth analysis of music pieces can yield valuable information such as genre labels, mood predictions, and artist identification, to name just a few. The lack of large-scale music datasets containing audio features and metadata has led to the construction and publication of the Million Song Dataset (MSD) and its satellite datasets. Nonetheless, mainly because of licensing limitations, no freely available lyrics datasets have been published for research.
In this paper we describe the construction of an English lyrics dataset based on the Last.fm Dataset, connected to LyricWiki’s database and MusicBrainz’s encyclopedia. To avoid copyright issues, only the URLs to the lyrics are stored in the database. To demonstrate the suitability of the compiled dataset, in the second part of the paper we present genre classification experiments with lyrics-based features, including bag-of-n-grams, as well as higher-level features such as rhyme-based and statistical text features. We obtained results similar to the experimental outcomes presented in other works, showing that more sophisticated textual features can improve genre classification performance, and indicating the superiority of the binary weighting scheme over tf–idf.
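As an illustration of the two weighting schemes compared above, the following minimal sketch builds bag-of-n-grams vectors for a few toy lines and weights them once with the binary scheme and once with tf–idf. The function names, the toy texts, and the particular tf–idf variant (raw count times log(N/df)) are assumptions made for illustration only; the abstract does not specify the exact formulation used in the experiments.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the list of word n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(text, max_n=2):
    """Count all word n-grams of order 1..max_n in a lowercased text."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts


def binary_weights(counts):
    """Binary scheme: weight 1 for every n-gram present in the document."""
    return {term: 1.0 for term in counts}


def tfidf_weights(counts, doc_freq, n_docs):
    """Assumed tf-idf variant: raw count times log(N / document frequency)."""
    return {t: c * math.log(n_docs / doc_freq[t]) for t, c in counts.items()}


# Toy placeholder lines (hypothetical, not taken from the dataset).
docs = ["la la la love", "love me do", "rock the night"]
bags = [bag_of_ngrams(d) for d in docs]

# Document frequency: in how many documents each n-gram appears.
doc_freq = Counter(term for bag in bags for term in bag)

binary = [binary_weights(b) for b in bags]
tfidf = [tfidf_weights(b, doc_freq, len(docs)) for b in bags]
```

Under the binary scheme the repeated unigram "la" receives the same weight as any other n-gram present in the line, whereas under tf–idf its weight grows with its count; this difference matters for lyrics, where refrains repeat the same words many times.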