HamNava: A Dataset for Multi‑Label Instrument Classification

Pouya Mohseni; Bagher BabaAli; Hooman Asadi

doi:10.5334/tismir.257

Transactions of the International Society for Music Information Retrieval

Volume 8 (2025): Issue 1

Open Access

Jul 2025

Abstract

Despite significant advances in music information retrieval, much of the progress has focused on musical traditions rooted in Western cultures. One obstacle that keeps researchers from exploring other musical traditions is the lack of datasets. This work introduces HamNava, a new dataset constructed for multi‑label instrument classification. The dataset consists of 6,000 five‑second audio excerpts from Iranian classical music, each fully labeled by a flexible number of annotators with the presence or absence of eight classical instruments and vocals. We detail the instrument selection process and the methodology used to crowd‑source the annotations. To encourage future work, we also provide statistical results, a dataset split, and a baseline for cross‑cultural multi‑label instrument classification on the introduced dataset.
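As an illustration of the label structure described in the abstract, the following minimal sketch shows one way presence/absence marks from a variable number of annotators could be combined into a single multi‑hot vector over nine classes (eight instruments plus vocals). The label names, the `aggregate_votes` helper, and the majority‑vote threshold are all assumptions made for this example; the abstract does not state how the authors aggregate annotations.

```python
from typing import Dict, List

# Hypothetical placeholder names: the abstract mentions eight instruments plus
# vocals but does not name the instruments, so none are assumed here.
LABELS: List[str] = [f"instrument_{i}" for i in range(1, 9)] + ["vocals"]


def aggregate_votes(annotations: List[Dict[str, bool]],
                    threshold: float = 0.5) -> List[int]:
    """Combine per-annotator presence/absence marks into one multi-hot vector.

    A label counts as present when the fraction of annotators marking it
    present is strictly greater than `threshold` (an assumed rule, not the
    authors' documented procedure). Labels an annotator omits count as absent.
    """
    if not annotations:
        raise ValueError("at least one annotator is required")
    vector = []
    for label in LABELS:
        positive = sum(1 for marks in annotations if marks.get(label, False))
        vector.append(1 if positive / len(annotations) > threshold else 0)
    return vector


# Three hypothetical annotators labeling the same five-second excerpt.
example = [
    {"instrument_1": True, "vocals": True},
    {"instrument_1": True, "vocals": False},
    {"instrument_1": True, "instrument_3": True, "vocals": True},
]
print(aggregate_votes(example))
```

Because the number of annotators varies per excerpt, the sketch normalizes by the annotator count rather than using a fixed vote count; keeping the raw fractions instead of thresholding would yield soft labels.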

DOI: https://doi.org/10.5334/tismir.257 | Journal eISSN: 2514-3298

Language: English

Submitted on: Feb 15, 2025

Accepted on: Jun 15, 2025

Published on: Jul 28, 2025

Published by: Ubiquity Press

In partnership with: Paradigm Publishing Services

Publication frequency: 1 issue per year

Keywords: Iranian classical music, culturally diverse MIR, multi‑instrument, crowd‑sourced annotation

© 2025 Pouya Mohseni, Bagher BabaAli, Hooman Asadi, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.
