Multimodal Deep Learning for Music Genre Classification
Open Access | Sep 2018

References

  1. Adomavicius, G., & Kwon, Y. (2012). Improving aggregate recommendation diversity using ranking-based techniques. IEEE Transactions on Knowledge and Data Engineering, 24(5), 896–911. DOI: 10.1109/TKDE.2011.15
  2. Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798–1828. DOI: 10.1109/TPAMI.2013.50
  3. Bertin-Mahieux, T., Eck, D., Maillet, F., & Lamere, P. (2008). Autotagger: A model for predicting social tags from acoustic features on large music databases. Journal of New Music Research, 37(2), 115–135. DOI: 10.1080/09298210802479250
  4. Bertin-Mahieux, T., Ellis, D. P. W., Whitman, B., & Lamere, P. (2011). The Million Song Dataset. In Proceedings of the 12th International Society for Music Information Retrieval Conference.
  5. Bogdanov, D., Porter, A., Herrera, P., & Serra, X. (2016). Cross-collection evaluation for music classification tasks. In Proceedings of the 17th International Society for Music Information Retrieval Conference, 379–385.
  6. Choi, K., Fazekas, G., & Sandler, M. (2016a). Automatic tagging using deep convolutional neural networks. In Proceedings of the 17th International Society for Music Information Retrieval Conference, 805–811.
  7. Choi, K., Fazekas, G., Sandler, M., & Cho, K. (2016b). Convolutional recurrent neural networks for music classification. arXiv preprint arXiv:1609.04243.
  8. Choi, K., Lee, J.H., & Downie, J.S. (2014). What is this song about anyway?: Automatic classification of subject using user interpretations and lyrics. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries, 453–454. DOI: 10.1109/JCDL.2014.6970221
  9. Chollet, F. (2016). Information-theoretical label embeddings for large-scale image classification. arXiv preprint arXiv:1607.05691.
  10. Dieleman, S., Brakel, P., & Schrauwen, B. (2011). Audio-based music classification with a pretrained convolutional network. In Proceedings of the 12th International Society for Music Information Retrieval Conference, 669–674.
  11. Dieleman, S., & Schrauwen, B. (2014). End-to-end learning for music audio. In IEEE International Conference on Acoustics, Speech and Signal Processing, 6964–6968. DOI: 10.1109/ICASSP.2014.6854950
  12. Dorfer, M., Arzt, A., & Widmer, G. (2016). Towards score following in sheet music images. Proceedings of the 17th International Society for Music Information Retrieval Conference.
  13. Downie, J.S., & Hu, X. (2006). Review mining for music digital libraries: phase II. Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries, 196–197. DOI: 10.1145/1141753.1141796
  14. Flexer, A. (2007). A closer look on artist filters for musical genre classification. In Proceedings of the 8th International Conference on Music Information Retrieval.
  15. Gouyon, F., Dixon, S., Pampalk, E., & Widmer, G. (2004). Evaluating rhythmic descriptors for musical genre classification. In Proceedings of the AES 25th International Conference, 196–204.
  16. He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778. DOI: 10.1109/CVPR.2016.90
  17. Howard, A.G. (2013). Some improvements on deep convolutional neural network based image classification. arXiv preprint arXiv:1312.5402.
  18. Hu, X., & Downie, J. (2006). Stylistics in customer reviews of cultural objects. SIGIR Forum, 49–51.
  19. Hu, X., Downie, J., West, K., & Ehmann, A. (2005). Mining music reviews: Promising preliminary results. In Proceedings of the 6th International Conference on Music Information Retrieval.
  20. Jain, H., Prabhu, Y., & Varma, M. (2016). Extreme multi-label loss functions for recommendation, tagging, ranking & other missing label applications. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 935–944. DOI: 10.1145/2939672.2939756
  21. Kim, Y. (2014). Convolutional neural networks for sentence classification. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, 1746–1751. DOI: 10.3115/v1/D14-1181
  22. Kingma, D.P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  23. Laurier, C., Grivolla, J., & Herrera, P. (2008). Multimodal music mood classification using audio and lyrics. In Seventh IEEE International Conference on Machine Learning and Applications, 688–693. DOI: 10.1109/ICMLA.2008.96
  24. Levy, O., & Goldberg, Y. (2014). Neural word embedding as implicit matrix factorization. In Advances in Neural Information Processing Systems, 2177–2185.
  25. Libeks, J., & Turnbull, D. (2011). You can judge an artist by an album cover: Using images for music annotation. IEEE MultiMedia, 18(4), 30–37. DOI: 10.1109/MMUL.2011.1
  26. Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., & Zitnick, C.L. (2014). Microsoft COCO: Common objects in context. In European Conference on Computer Vision, 740–755. DOI: 10.1007/978-3-319-10602-1_48
  27. Logan, B. (2000). Mel frequency cepstral coefficients for music modeling. In Proceedings of the 1st International Symposium on Music Information Retrieval.
  28. Maaten, L. v. d., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov), 2579–2605.
  29. McAuley, J., Targett, C., Shi, Q., & Van Den Hengel, A. (2015). Image-based recommendations on styles and substitutes. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, 43–52. DOI: 10.1145/2766462.2767755
  30. McFee, B., Bertin-Mahieux, T., Ellis, D. P. W., & Lanckriet, G. R. G. (2012). The Million Song Dataset challenge. In WWW’12 Companion: Proceedings of the 21st International Conference on World Wide Web, 909–916. DOI: 10.1145/2187980.2188222
  31. McFee, B., Raffel, C., Liang, D., Ellis, D. P. W., McVicar, M., Battenberg, E., & Nieto, O. (2015). librosa: Audio and music signal analysis in python. In Proceedings of the 14th Python in Science Conference, 1–7.
  32. McKay, C., & Fujinaga, I. (2008). Combining features extracted from audio, symbolic and cultural sources. In Proceedings of the 9th International Conference on Music Information Retrieval.
  33. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, 3111–3119.
  34. Moro, A., Raganato, A., & Navigli, R. (2014). Entity linking meets word sense disambiguation: A unified approach. Transactions of the Association for Computational Linguistics, 2, 231–244.
  35. Navigli, R., & Ponzetto, S.P. (2012). BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artificial Intelligence, 193, 217–250. DOI: 10.1016/j.artint.2012.07.001
  36. Neumayer, R., & Rauber, A. (2007). Integration of text and audio features for genre classification in music information retrieval. In European Conference on Information Retrieval, 724–727. DOI: 10.1007/978-3-540-71496-5_78
  37. Ngiam, J., Khosla, A., Kim, M., Nam, J., Lee, H., & Ng, A.Y. (2011). Multimodal deep learning. In Proceedings of the 28th International Conference on Machine Learning, 689–696.
  38. Oramas, S. (2017). Semantic enrichment for similarity and classification. In Knowledge Extraction and Representation Learning for Music Recommendation and Classification, chapter 6, 75–88. PhD Thesis, Universitat Pompeu Fabra.
  39. Oramas, S., Espinosa-Anke, L., Lawlor, A., & Serra, X. (2016a). Exploring customer reviews for music genre classification and evolutionary studies. In Proceedings of the 17th International Society for Music Information Retrieval Conference.
  40. Oramas, S., Espinosa-Anke, L., Sordo, M., Saggion, H., & Serra, X. (2016b). ELMD: An automatically generated entity linking gold standard dataset in the music domain. In Proceedings of the 10th International Conference on Language Resources and Evaluation.
  41. Oramas, S., Gómez, F., Gómez, E., & Mora, J. (2015). FlaBase: Towards the creation of a flamenco music knowledge base. In Proceedings of the 16th International Society for Music Information Retrieval Conference.
  42. Oramas, S., Nieto, O., Barbieri, F., & Serra, X. (2017a). Multi-label music genre classification from audio, text, and images using deep features. Proceedings of the 18th International Society for Music Information Retrieval Conference.
  43. Oramas, S., Nieto, O., Sordo, M., & Serra, X. (2017b). A deep multimodal approach for cold-start music recommendation. 2nd Workshop on Deep Learning for Recommender Systems, collocated with RecSys 2017.
  44. Pachet, F., & Cazaly, D. (2000). A taxonomy of musical genres. In Content-Based Multimedia Information Access, 2, 1238–1245.
  45. Pons, J., Lidy, T., & Serra, X. (2016). Experimenting with musically motivated convolutional neural networks. In 14th International Workshop on Content-Based Multimedia Indexing, 1–6. IEEE.
  46. Pons, J., Nieto, O., Prockup, M., Schmidt, E.M., Ehmann, A.F., & Serra, X. (2017). End-to-end learning for music audio tagging at scale. arXiv preprint arXiv:1711.02520.
  47. Razavian, A.S., Azizpour, H., Sullivan, J., & Carlsson, S. (2014). CNN features off-the-shelf: An astounding baseline for recognition. In IEEE Conference on Computer Vision and Pattern Recognition Workshops, 512–519. DOI: 10.1109/CVPRW.2014.131
  48. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., & Fei-Fei, L. (2015). ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision, 115(3), 211–252. DOI: 10.1007/s11263-015-0816-y
  49. Sanden, C., & Zhang, J.Z. (2011). Enhancing multi-label music genre classification through ensemble techniques. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, 705–714. SIGIR ’11. DOI: 10.1145/2009916.2010011
  50. Schedl, M., Orio, N., Liem, C., & Peeters, G. (2013). A professionally annotated and enriched multi-modal data set on popular music. In Proceedings of the 4th ACM Multimedia Systems Conference, 78–83. DOI: 10.1145/2483977.2483985
  51. Schindler, A., & Rauber, A. (2015). An audio-visual approach to music genre classification through affective color features. In European Conference on Information Retrieval, 61–67.
  52. Schörkhuber, C., & Klapuri, A. (2010). Constant-Q transform toolbox for music processing. In 7th Sound and Music Computing Conference, 364.
  53. Schreiber, H. (2015). Improving genre annotations for the Million Song Dataset. Proceedings of the 16th International Society for Music Information Retrieval Conference.
  54. Sermanet, P., & LeCun, Y. (2011). Traffic sign recognition with multi-scale convolutional networks. In International Joint Conference on Neural Networks, 2809–2813. IEEE.
  55. Seyerlehner, K., Schedl, M., Pohle, T., & Knees, P. (2010a). Using block-level features for genre classification, tag classification and music similarity estimation. Submission to Audio Music Similarity and Retrieval Task of MIREX.
  56. Seyerlehner, K., Widmer, G., Schedl, M., & Knees, P. (2010b). Automatic music tag classification based on block-level features. In 7th Sound and Music Computing Conference.
  57. Sordo, M. (2012). Semantic annotation of music collections: A computational approach. PhD thesis, Universitat Pompeu Fabra.
  58. Srivastava, N., & Salakhutdinov, R.R. (2012). Multi-modal learning with deep Boltzmann machines. In Advances in Neural Information Processing Systems, 2222–2230.
  59. Sturm, B.L. (2012). A survey of evaluation in music genre recognition. In International Workshop on Adaptive Multimedia Retrieval, 29–66.
  60. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., & Rabinovich, A. (2015). Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1–9.
  61. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2016). Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2818–2826.
  62. Tsoumakas, G., & Katakis, I. (2006). Multi-label classification: An overview. International Journal of Data Warehousing and Mining, 3(3).
  63. Turnbull, D., Barrington, L., Torres, D., & Lanckriet, G. (2008). Semantic annotation and retrieval of music and sound effects. IEEE Transactions on Audio, Speech, and Language Processing, 16(2), 467–476. DOI: 10.1109/TASL.2007.913750
  64. Tzanetakis, G., & Cook, P. (2002). Musical genre classification of audio signals. IEEE Transactions on Speech and Audio Processing, 10(5), 293–302. DOI: 10.1109/TSA.2002.800560
  65. van den Oord, A., Dieleman, S., & Schrauwen, B. (2013). Deep content-based music recommendation. In Advances in Neural Information Processing Systems, 2643–2651.
  66. Wang, F., Wang, X., Shao, B., Li, T., & Ogihara, M. (2009). Tag integrated multi-label music style classification with hypergraph. In Proceedings of the 10th International Society for Music Information Retrieval Conference.
  67. Wu, X., Qiao, Y., Wang, X., & Tang, X. (2016). Bridging music and image via cross-modal ranking analysis. IEEE Transactions on Multimedia, 18(7), 1305–1318. DOI: 10.1109/TMM.2016.2557722
  68. Yan, F., & Mikolajczyk, K. (2015). Deep correlation for matching images and text. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3441–3450. DOI: 10.1109/CVPR.2015.7298966
  69. Yosinski, J., Clune, J., Bengio, Y., & Lipson, H. (2014). How transferable are features in deep neural networks? In Advances in Neural Information Processing Systems, 3320–3328.
  70. Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., & Torralba, A. (2016). Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2921–2929.
  71. Zobel, J., & Moffat, A. (1998). Exploring the similarity space. ACM SIGIR Forum, 32(1), 18–34.
DOI: https://doi.org/10.5334/tismir.10 | Journal eISSN: 2514-3298
Language: English
Submitted on: Jan 20, 2018
Accepted on: May 1, 2018
Published on: Sep 4, 2018
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2018 Sergio Oramas, Francesco Barbieri, Oriol Nieto, Xavier Serra, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.