Attend to Chords: Improving Harmonic Analysis of Symbolic Music Using Transformer-Based Models

By: Tsung-Ping Chen and Li Su
Open Access | Feb 2021

References

  1. Ba, L. J., Kiros, R., and Hinton, G. E. (2016). Layer normalization. arXiv preprint arXiv:1607.06450.
  2. Bahdanau, D., Cho, K., and Bengio, Y. (2015). Neural machine translation by jointly learning to align and translate. In Proceedings of the 3rd International Conference on Learning Representations (ICLR).
  3. Belinkov, Y., Durrani, N., Dalvi, F., Sajjad, H., and Glass, J. R. (2017). What do neural machine translation models learn about morphology? In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL), pages 861–872. DOI: 10.18653/v1/P17-1080
  4. Boulanger-Lewandowski, N., Bengio, Y., and Vincent, P. (2013). Audio chord recognition with recurrent neural networks. In Proceedings of the 14th International Society for Music Information Retrieval Conference (ISMIR), pages 335–340.
  5. Carsault, T., Nika, J., and Esling, P. (2018). Using musical relationships between chord labels in automatic chord extraction tasks. In Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR), pages 18–25.
  6. Chen, T. and Su, L. (2018). Functional harmony recognition of symbolic music data with multi-task recurrent neural networks. In Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR), pages 90–97.
  7. Chen, T. and Su, L. (2019). Harmony Transformer: Incorporating chord segmentation into harmony recognition. In Proceedings of the 20th International Society for Music Information Retrieval Conference (ISMIR), pages 259–267.
  8. Cho, T. and Bello, J. P. (2009). Real-time implementation of HMM-based chord estimation in music audio. In Proceedings of the International Computer Music Conference (ICMC).
  9. Chung, J., Ahn, S., and Bengio, Y. (2017). Hierarchical multiscale recurrent neural networks. In Proceedings of the 5th International Conference on Learning Representations (ICLR).
  10. Dai, Z., Yang, Z., Yang, Y., Carbonell, J. G., Le, Q. V., and Salakhutdinov, R. (2019). Transformer-XL: Attentive language models beyond a fixed-length context. In Proceedings of the 57th Conference of the Association for Computational Linguistics (ACL), pages 2978–2988. DOI: 10.18653/v1/P19-1285
  11. de Haas, W. B., Magalhães, J. P., Veltkamp, R. C., and Wiering, F. (2011). HARMTRACE: Improving harmonic similarity estimation using functional harmony analysis. In Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR), pages 67–72.
  12. Degani, A., Dalai, M., Leonardi, R., and Migliorati, P. (2015). Harmonic change detection for musical chords segmentation. In 2015 IEEE International Conference on Multimedia and Expo (ICME), pages 1–6. DOI: 10.1109/ICME.2015.7177404
  13. Degani, A., Dalai, M., Leonardi, R., and Migliorati, P. (2017). Audio chord estimation based on meter modeling and two-stage decoding. In Proceedings of the 10th International Symposium on Image and Signal Processing and Analysis (ISPA), pages 65–69. DOI: 10.1109/ISPA.2017.8073570
  14. Deng, J. and Kwok, Y. (2016). A hybrid Gaussian-HMM-deep learning approach for automatic chord estimation with very large vocabulary. In Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR), pages 812–818.
  15. Deng, J. and Kwok, Y. (2017). Large vocabulary automatic chord estimation with an even chance training scheme. In Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR), pages 531–536.
  16. Devaney, J., Arthur, C., Condit-Schultz, N., and Nisula, K. (2015). Theme and variation encodings with Roman numerals (TAVERN): A new data set for symbolic music analysis. In Proceedings of the 16th International Society for Music Information Retrieval Conference (ISMIR), pages 728–734.
  17. Devlin, J., Chang, M., Lee, K., and Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), pages 4171–4186.
  18. Donahue, C., Mao, H. H., Li, Y. E., Cottrell, G. W., and McAuley, J. J. (2019). LakhNES: Improving multi-instrumental music generation with cross-domain pre-training. In Proceedings of the 20th International Society for Music Information Retrieval Conference (ISMIR), pages 685–692.
  19. Dong, H. and Yang, Y. (2018). Convolutional generative adversarial networks with binary neurons for polyphonic music generation. In Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR), pages 190–196.
  20. Fujishima, T. (1999). Realtime chord recognition of musical sound: a system using Common Lisp Music. In Proceedings of the International Computer Music Conference (ICMC).
  21. Gotham, M. and Ireland, M. (2019). Taking form: A representation standard, conversion code, and example corpora for recording, visualizing, and studying analyses of musical form. In Proceedings of the 20th International Society for Music Information Retrieval Conference (ISMIR), pages 693–699.
  22. Gu, J., Bradbury, J., Xiong, C., Li, V. O. K., and Socher, R. (2018). Non-autoregressive neural machine translation. In Proceedings of the 6th International Conference on Learning Representations (ICLR).
  23. Harte, C., Sandler, M., and Gasser, M. (2006). Detecting harmonic change in musical audio. In Proceedings of the 1st ACM Workshop on Audio and Music Computing Multimedia, pages 21–26. DOI: 10.1145/1178723.1178727
  24. He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778. DOI: 10.1109/CVPR.2016.90
  25. Hermann, K. M., Kociský, T., Grefenstette, E., Espeholt, L., Kay, W., Suleyman, M., and Blunsom, P. (2015). Teaching machines to read and comprehend. In Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems (NIPS), pages 1693–1701.
  26. Hori, T., Nakamura, K., and Sagayama, S. (2017). Music chord recognition from audio data using bidirectional encoder-decoder LSTMs. In Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), pages 1312–1315. DOI: 10.1109/APSIPA.2017.8282235
  27. Hossain, M. Z., Sohel, F., Shiratuddin, M. F., Laga, H., and Bennamoun, M. (2019). Bi-SAN-CAP: Bidirectional self-attention for image captioning. In Proceedings of the Digital Image Computing: Techniques and Applications (DICTA), pages 1–7. DOI: 10.1109/DICTA47822.2019.8946003
  28. Hou, J., Guo, W., Song, Y., and Dai, L. (2020). Segment boundary detection directed attention for online end-to-end speech recognition. EURASIP J. Audio, Speech and Music Processing, 2020(1), 3. DOI: 10.1186/s13636-020-0170-z
  29. Huang, C. A., Vaswani, A., Uszkoreit, J., Simon, I., Hawthorne, C., Shazeer, N., Dai, A. M., Hoffman, M. D., Dinculescu, M., and Eck, D. (2019). Music Transformer: Generating music with long-term structure. In Proceedings of 7th International Conference on Learning Representations (ICLR).
  30. Humphrey, E. J. and Bello, J. P. (2012). Rethinking automatic chord recognition with convolutional neural networks. In Proceedings of the 11th International Conference on Machine Learning and Applications (ICMLA), pages 357–362. DOI: 10.1109/ICMLA.2012.220
  31. Humphrey, E. J. and Bello, J. P. (2015). Four timely insights on automatic chord estimation. In Proceedings of the 16th International Society for Music Information Retrieval Conference (ISMIR), pages 673–679.
  32. Illescas, P. R., Rizo, D., and Quereda, J. M. I. (2007). Harmonic, melodic, and functional automatic analysis. In Proceedings of the International Computer Music Conference (ICMC).
  33. Jiang, J., Chen, K., Li, W., and Xia, G. (2019). Large-vocabulary chord transcription via chord structure decomposition. In Proceedings of the 20th International Society for Music Information Retrieval Conference (ISMIR), pages 644–651.
  34. Korzeniowski, F. and Widmer, G. (2016). A fully convolutional deep auditory model for musical chord recognition. In Proceedings of the 26th IEEE International Workshop on Machine Learning for Signal Processing (MLSP), pages 1–6. DOI: 10.1109/MLSP.2016.7738895
  35. Korzeniowski, F. and Widmer, G. (2017). On the futility of learning complex frame-level language models for chord recognition. In Proceedings of the AES International Conference on Semantic Audio.
  36. Korzeniowski, F. and Widmer, G. (2018). Improved chord recognition by combining duration and harmonic language models. In Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR), pages 10–17.
  37. Lee, K. (2006). Automatic chord recognition from audio using enhanced pitch class profile. In Proceedings of the International Computer Music Conference (ICMC).
  38. Li, X. and Wu, X. (2015). Long Short-Term Memory based convolutional recurrent neural networks for large vocabulary speech recognition. In Proceedings of the 16th Annual Conference of the International Speech Communication Association (INTERSPEECH), pages 3219–3223. DOI: 10.1109/ICASSP.2015.7178826
  39. Lim, Y., Chan, C. S., and Loo, F. Y. (2020). Style-conditioned music generation. In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME), pages 1–6. DOI: 10.1109/ICME46284.2020.9102870
  40. Masada, K. and Bunescu, R. C. (2017). Chord recognition in symbolic music using Semi-Markov Conditional Random Fields. In Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR), pages 272–278.
  41. Masada, K. and Bunescu, R. C. (2019). Chord recognition in symbolic music: A segmental CRF model, segment-level features, and comparative evaluations on classical and popular music. Trans. Int. Soc. Music. Inf. Retr., 2(1), 1–13. DOI: 10.5334/tismir.18
  42. Mauch, M. and Dixon, S. (2010). Simultaneous estimation of chords and musical context from audio. IEEE Trans. Audio, Speech & Language Processing (TASLP), 18(6), 1280–1289. DOI: 10.1109/TASL.2009.2032947
  43. McFee, B. and Bello, J. P. (2017). Structured training for large-vocabulary chord recognition. In Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR), pages 188–194.
  44. Melamud, O., Goldberger, J., and Dagan, I. (2016). Context2vec: Learning generic context embedding with bidirectional LSTM. In Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning (CoNLL), pages 51–61. DOI: 10.18653/v1/K16-1006
  45. Micchi, G., Gotham, M., and Giraud, M. (2020). Not all roads lead to Rome: Pitch representation and model architecture for automatic harmonic analysis. Trans. Int. Soc. Music. Inf. Retr., 3(1), 42–54. DOI: 10.5334/tismir.45
  46. Miller, A. H., Fisch, A., Dodge, J., Karimi, A., Bordes, A., and Weston, J. (2016). Key-value memory networks for directly reading documents. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1400–1409. DOI: 10.18653/v1/D16-1147
  47. Neuwirth, M., Harasim, D., Moss, F. C., and Rohrmeier, M. (2018). The annotated Beethoven corpus (ABC): A dataset of harmonic analyses of all Beethoven string quartets. Front. Digital Humanities, 5. DOI: 10.3389/fdigh.2018.00016
  48. Ni, Y., McVicar, M., Santos-Rodríguez, R., and Bie, T. D. (2013). Understanding effects of subjectivity in measuring chord estimation accuracy. IEEE ACM Trans. Audio Speech Lang. Process., 21(12), 2607–2615. DOI: 10.1109/TASL.2013.2280218
  49. Oudre, L., Févotte, C., and Grenier, Y. (2011). Probabilistic template-based chord recognition. IEEE Trans. Audio, Speech & Language Processing (TASLP), 19(8), 2249–2259. DOI: 10.1109/TASL.2010.2098870
  50. Parikh, A. P., Täckström, O., Das, D., and Uszkoreit, J. (2016). A decomposable attention model for natural language inference. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 2249–2255. DOI: 10.18653/v1/D16-1244
  51. Park, J., Choi, K., Jeon, S., Kim, D., and Park, J. (2019). A bi-directional Transformer for musical chord recognition. In Proceedings of the 20th International Society for Music Information Retrieval Conference (ISMIR), pages 620–627.
  52. Parmar, N., Vaswani, A., Uszkoreit, J., Kaiser, L., Shazeer, N., Ku, A., and Tran, D. (2018). Image Transformer. In Proceedings of the 35th International Conference on Machine Learning (ICML), pages 4052–4061.
  53. Passos, A. T., Sampaio, M., Kröger, P., and de Cidra, G. (2009). Functional harmonic analysis and computational musicology in Rameau. In Proceedings of the 12th Brazilian Symposium on Computer Music (SBCM).
  54. Pauwels, J., O’Hanlon, K., Gómez, E., and Sandler, M. B. (2019). 20 years of automatic chord recognition from audio. In Proceedings of the 20th International Society for Music Information Retrieval Conference (ISMIR), pages 54–63.
  55. Raphael, C. and Stoddard, J. (2004). Functional harmonic analysis using probabilistic models. Computer Music Journal, 28(3), 45–52. DOI: 10.1162/0148926041790676
  56. Ren, Y., Ruan, Y., Tan, X., Qin, T., Zhao, S., Zhao, Z., and Liu, T. (2019). FastSpeech: Fast, robust and controllable text to speech. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems (NeurIPS), pages 3165–3174.
  57. Rhodes, C., Lewis, D., and Müllensiefen, D. (2009). Bayesian model selection for harmonic labelling. In Klouche, T. and Noll, T., editors, Mathematics and Computation in Music, pages 107–116. Springer Berlin Heidelberg. DOI: 10.1007/978-3-642-04579-0_11
  58. Rocher, T., Robine, M., Hanna, P., and Strandh, R. (2009). Dynamic chord analysis for symbolic music. In Proceedings of the 2009 International Computer Music Conference (ICMC).
  59. Scholz, R. E. P. and Ramalho, G. L. (2008). COCHONUT: Recognizing complex chords from MIDI guitar sequences. In Proceedings of the 9th International Conference on Music Information Retrieval (ISMIR), pages 27–32.
  60. Shaw, P., Uszkoreit, J., and Vaswani, A. (2018). Self-attention with relative position representations. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), pages 464–468. DOI: 10.18653/v1/N18-2074
  61. Sheh, A. and Ellis, D. P. W. (2003). Chord segmentation and recognition using EM-trained Hidden Markov Models. In Proceedings of the 4th International Conference on Music Information Retrieval (ISMIR).
  62. Shen, T., Zhou, T., Long, G., Jiang, J., and Zhang, C. (2018). Bi-directional block self-attention for fast and memory-efficient sequence modeling. In Proceedings of the 6th International Conference on Learning Representations (ICLR).
  63. Stark, A. M. and Plumbley, M. D. (2009). Real-time chord recognition for live performance. In Proceedings of the International Computer Music Conference (ICMC).
  64. Tsui, V. and MacLean, W. J. (2002). Harmonic analysis using neural networks. In Proceedings of the International Computer Music Conference (ICMC).
  65. Tymoczko, D., Gotham, M., Cuthbert, M. S., and Ariza, C. (2019). The RomanText format: A flexible and standard method for representing Roman numeral analyses. In Proceedings of the 20th International Society for Music Information Retrieval Conference (ISMIR), pages 123–129.
  66. Ueda, Y., Uchiyama, Y., Nishimoto, T., Ono, N., and Sagayama, S. (2010). HMM-based approach for automatic chord detection using refined acoustic features. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5518–5521. DOI: 10.1109/ICASSP.2010.5495218
  67. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., and Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems (NIPS), pages 5998–6008.
  68. Wang, Y., Lee, H., and Lee, L. (2018). Segmental audio word2vec: Representing utterances as sequences of vectors with applications in spoken term detection. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 6269–6273. DOI: 10.1109/ICASSP.2018.8462002
  69. Yang, M., Su, L., and Yang, Y. (2016). Highlighting root notes in chord recognition using cepstral features and multi-task learning. In Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), pages 1–8. DOI: 10.1109/APSIPA.2016.7820865
  70. Yoshioka, T., Kitahara, T., Komatani, K., Ogata, T., and Okuno, H. G. (2004). Automatic chord transcription with concurrent recognition of chord symbols and boundaries. In Proceedings of the 5th International Conference on Music Information Retrieval (ISMIR).
  71. Zenz, V. and Rauber, A. (2007). Automatic chord detection incorporating beat and key detection. In Proceedings of the IEEE International Conference on Signal Processing and Communications (ICSPC), pages 1175–1178. DOI: 10.1109/ICSPC.2007.4728534
  72. Zhou, X. and Lerch, A. (2015). Chord detection using deep learning. In Proceedings of the 16th International Society for Music Information Retrieval Conference (ISMIR), pages 52–58.
DOI: https://doi.org/10.5334/tismir.65 | Journal eISSN: 2514-3298
Language: English
Submitted on: May 10, 2020
Accepted on: Jan 7, 2021
Published on: Feb 24, 2021
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2021 Tsung-Ping Chen, Li Su, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.