
References

  1. Balke, S., Berndt, A., and Müller, M. (2025). ChoraleBricks: A modular multitrack dataset for wind music research. Transactions of the International Society for Music Information Retrieval (TISMIR), 8(1), 39–54.
  2. Bertin, N., and de Cheveigné, A. (2005). Scalable metadata and quick retrieval of audio signals. In Proceedings of the International Conference on Music Information Retrieval (ISMIR), London, UK (pp. 238–244).
  3. Bertin‑Mahieux, T., Ellis, D. P. W., Whitman, B., and Lamere, P. (2011). The million song dataset. In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Miami, Florida, USA (pp. 591–596).
  4. Bittner, R. M., Fuentes, M., Rubinstein, D., Jansson, A., Choi, K., and Kell, T. (2019). mirdata: Software for reproducible usage of datasets. In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Delft, The Netherlands (pp. 99–106).
  5. Bittner, R. M., Salamon, J., Tierney, M., Mauch, M., Cannam, C., and Bello, J. P. (2014). MedleyDB: A multitrack dataset for annotation‑intensive MIR research. In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Taipei, Taiwan (pp. 155–160).
  6. Böck, S., Korzeniowski, F., Schlüter, J., Krebs, F., and Widmer, G. (2016). madmom: A new Python audio and music signal processing library. In Proceedings of the ACM International Conference on Multimedia (ACM‑MM), Amsterdam, The Netherlands (pp. 1174–1178).
  7. Böck, S., Krebs, F., and Widmer, G. (2016). Joint beat and downbeat tracking with recurrent neural networks. In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), New York City, New York, USA (pp. 255–261).
  8. Bruderer, M. J., McKinney, M., and Kohlrausch, A. (2006). Structural boundary perception in popular music. In Proceedings of the International Conference on Music Information Retrieval (ISMIR), Victoria, Canada (pp. 198–201).
  9. Cho, T., and Bello, J. P. (2011). A feature smoothing method for chord recognition using recurrence plots. In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Miami, Florida, USA (pp. 651–656).
  10. Davies, M. E. P., Hamel, P., Yoshii, K., and Goto, M. (2014). AutoMashUpper: Automatic creation of multi‑song music mashups. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(12), 1726–1737.
  11. Derrien, O., Duhamel, P., Charbit, M., and Richard, G. (2006). A new quantization optimization algorithm for the MPEG advanced audio coder using a statistical subband model of the quantization noise. IEEE Transactions on Audio, Speech, and Language Processing, 14(4), 1328–1339.
  12. Dessein, A., Cont, A., and Lemaitre, G. (2010). Real‑time polyphonic music transcription with non‑negative matrix factorization and beta‑divergence. In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Utrecht, The Netherlands (pp. 489–494).
  13. Downie, J. S. (2008). The music information retrieval evaluation exchange (2005–2007): A window into music information retrieval research. Acoustical Science and Technology, 29(4), 247–255.
  14. Emiya, V., Bertin, N., David, B., and Badeau, R. (2010). MAPS ‑ A piano database for multipitch estimation and automatic transcription of music. Research report.
  15. Ewert, S., Müller, M., and Grosche, P. (2009). High resolution audio synchronization using chroma onset features. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Taipei, Taiwan (pp. 1869–1872).
  16. Fabbro, G., Uhlich, S., Lai, C.‑H., Choi, W., Martínez‑Ramírez, M., Liao, W., Gadelha, I., Ramos, G., Hsu, E., Rodrigues, H., Stöter, F., Défossez, A., Luo, Y., Yu, J., Chakraborty, D., Mohanty, S., Solovyev, R., Stempkovskiy, A., Habruseva, T., . . . Mitsufuji, Y. (2024). The sound demixing challenge 2023 – music demixing track. Transactions of the International Society for Music Information Retrieval (TISMIR), 7(1), 63–84.
  17. Fujihara, H., and Goto, M. (2007). A music information retrieval system based on singing voice timbre. In Proceedings of the International Conference on Music Information Retrieval (ISMIR), Vienna, Austria (pp. 467–470).
  18. Fujihara, H., Goto, M., Ogata, J., Komatani, K., Ogata, T., and Okuno, H. G. (2006). Automatic synchronization between lyrics and music CD recordings based on Viterbi alignment of segregated vocal signals. In IEEE International Symposium on Multimedia (ISM), Los Alamitos, California, USA (pp. 257–264).
  19. Fujihara, H., Kitahara, T., Goto, M., Komatani, K., Ogata, T., and Okuno, H. G. (2005). Singer identification based on accompaniment sound reduction and reliable frame selection. In Proceedings of the International Conference on Music Information Retrieval (ISMIR), London, UK (pp. 329–336).
  20. Fujihara, H., Kitahara, T., Goto, M., Komatani, K., Ogata, T., and Okuno, H. G. (2006). F0 estimation method for singing voice in polyphonic audio signal based on statistical vocal model and Viterbi search. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Toulouse, France (pp. 253–256).
  21. Gao, Y., Zhang, X., and Li, W. (2021). Vocal melody extraction via HRNet‑based singing voice separation and encoder‑decoder‑based F0 estimation. Electronics, 10(3), 1–14.
  22. Gaultier, C., Kitić, S., Gribonval, R., and Bertin, N. (2021). Sparsity‑based audio declipping methods: Selected overview, new algorithms, and large‑scale evaluation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29, 1174–1187.
  23. Gillet, O., and Richard, G. (2005). Indexing and querying drum loops databases. In Proceedings of the International Workshop on Content‑Based Multimedia Indexing (CBMI).
  24. Gotham, M., Bemman, B., and Vatolkin, I. (2025). Towards an ‘everything corpus’: A framework and guidelines for the curation of more comprehensive multimodal music data. Transactions of the International Society for Music Information Retrieval, 8(1), 70–92.
  25. Goto, M. (2003). A chorus‑section detecting method for musical audio signals. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Hong Kong, China (pp. 437–440).
  26. Goto, M. (2004). Development of the RWC Music Database. In Proceedings of the International Congress on Acoustics (ICA) (pp. 553–556).
  27. Goto, M. (2006). AIST Annotation for the RWC Music Database. In Proceedings of the International Conference on Music Information Retrieval (ISMIR), Victoria, Canada (pp. 359–360).
  28. Goto, M., and Goto, T. (2005). Musicream: New music playback interface for streaming, sticking, sorting, and recalling musical pieces. In Proceedings of the International Conference on Music Information Retrieval (ISMIR), London, UK (pp. 404–411).
  29. Goto, M., Hashiguchi, H., Nishimura, T., and Oka, R. (2002). RWC Music Database: Popular, classical and jazz music databases. In Proceedings of the International Conference on Music Information Retrieval (ISMIR), Paris, France (pp. 287–288).
  30. Goto, M., Hashiguchi, H., Nishimura, T., and Oka, R. (2003). RWC Music Database: Music genre database and musical instrument sound database. In Proceedings of the International Conference on Music Information Retrieval (ISMIR), Baltimore, Maryland, USA (pp. 229–230).
  31. Grosche, P., and Müller, M. (2011). Extracting predominant local pulse information from music recordings. IEEE Transactions on Audio, Speech, and Language Processing, 19(6), 1688–1701.
  32. Hamanaka, M. (2006). Music scope headphones: Natural user interface for selection of music. In Proceedings of the International Conference on Music Information Retrieval (ISMIR), Victoria, Canada (pp. 302–307).
  33. Hawthorne, C., Stasyuk, A., Roberts, A., Simon, I., Huang, C. A., Dieleman, S., Elsen, E., Engel, J. H., and Eck, D. (2019). Enabling factorized piano music modeling and generation with the MAESTRO dataset. In Proceedings of the International Conference on Learning Representations (ICLR), New Orleans, Louisiana, USA.
  34. Kameoka, H., Nishimoto, T., and Sagayama, S. (2007). A multipitch analyzer based on harmonic temporal structured clustering. IEEE Transactions on Audio, Speech, and Language Processing, 15(3), 982–994.
  35. Katayose, H., Yatsui, A., and Goto, M. (2005). A mix‑down assistant interface with reuse of examples. In Proceedings of the International Conference on Automated Production of Cross Media Content for Multi‑Channel Distribution (AXMEDIS) (pp. 1–8).
  36. Kitahara, T., Goto, M., Komatani, K., Ogata, T., and Okuno, H. G. (2005). Instrument identification in polyphonic music: Feature weighting with mixed sounds, pitch‑dependent timbre modeling, and use of musical context. In Proceedings of the International Conference on Music Information Retrieval (ISMIR), London, UK (pp. 558–563).
  37. Korzeniowski, F., and Widmer, G. (2016). Feature learning for chord recognition: The deep chroma extractor. In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), New York City, New York, USA (pp. 37–43).
  38. Lagrange, M., Ozerov, A., and Vincent, E. (2012). Robust singer identification in polyphonic music using melody enhancement and uncertainty‑based learning. In Proceedings of the 13th International Society for Music Information Retrieval Conference (ISMIR), Porto, Portugal.
  39. Lehner, B., Sonnleitner, R., and Widmer, G. (2013). Towards light‑weight, real‑time‑capable singing voice detection. In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Curitiba, Brazil (pp. 53–58).
  40. Lin, L., Xia, G., Zhang, Y., and Jiang, J. (2024). Arrange, inpaint, and refine: Steerable long‑term music audio generation and editing via content‑based controls. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI) (pp. 7690–7698).
  41. Mauch, M., and Dixon, S. (2014). pYIN: A fundamental frequency estimator using probabilistic threshold distributions. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy (pp. 659–663).
  42. McFee, B., Kim, J. W., Cartwright, M., Salamon, J., Bittner, R. M., and Bello, J. P. (2019). Open‑source practices for music signal processing research: Recommendations for transparent, sustainable, and reproducible audio research. IEEE Signal Processing Magazine, 36(1), 128–137.
  43. McFee, B., Raffel, C., Liang, D., Ellis, D. P., McVicar, M., Battenberg, E., and Nieto, O. (2015). librosa: Audio and music signal analysis in Python. In Proceedings of the Python in Science Conference (SciPy), Austin, Texas, USA (pp. 18–25).
  44. Müller, M., Balke, S., and Goto, M. (2025). The story behind the RWC Music Database: An interview with Masataka Goto. Transactions of the International Society for Music Information Retrieval (TISMIR), 8(1), 1–10.
  45. Müller, M., Özer, Y., Krause, M., Prätzlich, T., and Driedger, J. (2021). Sync Toolbox: A Python package for efficient, robust, and accurate music synchronization. Journal of Open Source Software (JOSS), 6(64), 3434.
  46. Müller, M., and Zalkow, F. (2019). FMP Notebooks: Educational material for teaching and learning fundamentals of music processing. In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Delft, The Netherlands (pp. 573–580).
  47. Nakano, T., and Goto, M. (2009). VocaListener: A singing‑to‑singing synthesis system based on iterative parameter estimation. In Proceedings of the 6th Sound and Music Computing Conference (SMC) (pp. 343–348).
  48. Nakano, T., Yoshii, K., Wu, Y., Nishikimi, R., Lin, K. W. E., and Goto, M. (2019). Joint singing pitch estimation and voice separation based on a neural harmonic structure renderer. In Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) (pp. 160–164).
  49. Nishikimi, R., Nakamura, E., Fukayama, S., Goto, M., and Yoshii, K. (2019). Automatic singing transcription based on encoder‑decoder recurrent neural networks with a weakly‑supervised attention mechanism. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Brighton, UK (pp. 161–165).
  50. Özer, Y., Brütting, L., Schwär, S., and Müller, M. (2024). libsoni: A Python toolbox for sonifying music annotations and feature representations. Journal of Open Source Software (JOSS), 9(96), 6524.
  51. Paulus, J., and Klapuri, A. P. (2006). Music structure analysis by finding repeated parts. In Proceedings of the 1st ACM Audio and Music Computing Multimedia Workshop, Santa Barbara, California, USA (pp. 59–68).
  52. Paulus, J., and Klapuri, A. P. (2009). Music structure analysis using a probabilistic fitness measure and a greedy search algorithm. IEEE Transactions on Audio, Speech, and Language Processing, 17(6), 1159–1170.
  53. Prätzlich, T., Driedger, J., and Müller, M. (2016). Memory‑restricted multiscale dynamic time warping. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Shanghai, China (pp. 569–573).
  54. Raffel, C., McFee, B., Humphrey, E. J., Salamon, J., Nieto, O., Liang, D., and Ellis, D. P. W. (2014). mir_eval: A transparent implementation of common MIR metrics. In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Taipei, Taiwan (pp. 367–372).
  55. Rafii, Z., Liutkus, A., Stöter, F., Mimilakis, S. I., and Bittner, R. (2017). The MUSDB18 corpus for music separation.
  56. Reed, J., and Lee, C. (2006). A study on music genre classification based on universal acoustic models. In Proceedings of the International Conference on Music Information Retrieval (ISMIR), Victoria, Canada (pp. 89–94).
  57. Ryynänen, M., and Klapuri, A. (2005). Polyphonic music transcription using note event modeling. In Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, New York, USA (pp. 319–322).
  58. Ryynänen, M., and Klapuri, A. (2006). Transcription of the singing melody in polyphonic music. In Proceedings of the International Conference on Music Information Retrieval (ISMIR), Victoria, Canada (pp. 222–227).
  59. Tsushima, H., Nakamura, E., Itoyama, K., and Yoshii, K. (2017). Function‑ and rhythm‑aware melody harmonization based on tree‑structured parsing and split‑merge sampling of chord sequences. In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Suzhou, China (pp. 502–508).
  60. Vincent, E., and Gribonval, R. (2005). Construction d’estimateurs oracles pour la séparation de sources [Construction of oracle estimators for source separation]. In XXe colloque GRETSI (traitement du signal et des images), Louvain‑la‑Neuve, Belgium.
  61. Wang, J.‑C., Hung, Y.‑N., and Smith, J. B. L. (2022). To catch a chorus, verse, intro, or anything else: Analyzing a song with structural functions. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (pp. 416–420).
  62. Watanabe, K., and Goto, M. (2023). Text‑to‑lyrics generation with image‑based semantics and reduced risk of plagiarism. In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Milan, Italy (pp. 398–406).
  63. Weiß, C., Arifi‑Müller, V., Krause, M., Zalkow, F., Klauk, S., Kleinertz, R., and Müller, M. (2023). Wagner Ring dataset: A complex opera scenario for music processing and computational musicology. Transactions of the International Society for Music Information Retrieval (TISMIR), 6(1), 135–149.
  64. Weiß, C., Zalkow, F., Arifi‑Müller, V., Müller, M., Koops, H. V., Volk, A., and Grohganz, H. (2021). Schubert Winterreise dataset: A multimodal scenario for music analysis. ACM Journal on Computing and Cultural Heritage (JOCCH), 14(2), 25:1–18.
  65. Yoshii, K., and Goto, M. (2008). Music Thumbnailer: Visualizing musical pieces in thumbnail images based on acoustic features. In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Philadelphia, Pennsylvania, USA (pp. 211–216).
  66. Yoshii, K., Goto, M., and Okuno, H. G. (2004). Automatic drum sound description for real‑world music using template adaptation and matching methods. In Proceedings of the International Conference on Music Information Retrieval (ISMIR), Barcelona, Spain (pp. 184–191).
  67. Yoshii, K., Komatani, K., Ogata, T., Okuno, H., and Goto, M. (2006). An error correction framework based on drum pattern periodicity for improving drum sound detection. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Toulouse, France (pp. 237–240).
  68. Yoshii, K., Tomioka, R., Mochihashi, D., and Goto, M. (2013). Infinite positive semidefinite tensor factorization for source separation of mixture signals. In Proceedings of the International Conference on Machine Learning (ICML) (pp. 576–584).
  69. Zhou, X., and Lerch, A. (2015). Chord detection using deep learning. In Proceedings of the 16th International Society for Music Information Retrieval Conference (ISMIR), Málaga, Spain (pp. 52–58).
DOI: https://doi.org/10.5334/tismir.326 | Journal eISSN: 2514-3298
Language: English
Submitted on: Jul 21, 2025
Accepted on: Nov 10, 2025
Published on: Feb 13, 2026
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2026 Stefan Balke, Johannes Zeitler, Vlora Arifi-Müller, Brian McFee, Tomoyasu Nakano, Masataka Goto, Meinard Müller, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.