Table 1
Comparison of widely used datasets for instrument recognition in polyphonic music, where 'Predominant' refers to the main instrument, 'Partial' to a subset of the active instruments, and 'Full' to all instruments present in the audio.
| Dataset | #Examples | #Instruments | Duration | Labeling |
|---|---|---|---|---|
| OpenMIC‑2018 | 20,000 | 20 | 10 s | Partial |
| IRMAS | 6,705 | 11 | 3 s | Predominant |
| Slakh2100 | 1,405 | 35 | Track | Full |
| Cerberus4 | 1,327 | 4 | Track | Full |
| MusicNet | 330 | 11 | Track | Full |
| MedleyDB | 122 | 80 | Track | Full |
| URMP | 44 | 14 | Track | Full |
| HamNava | 6,000 | 9 | 5 s | Full |

Figure 1
Screenshots of the crowd‑sourcing app used for annotations: the Persian interface (right) and its English translation (left).

Figure 2
Distribution of instruments across the annotated excerpts, with labels binarized at a 0.5 threshold.

Figure 3
Distribution of excerpts by difficulty level, and annotation quality across instruments as measured by inter-annotator agreement and annotator confidence.

Figure 4
Normalized instrument co-occurrence matrix (in percent) across annotated excerpts, with labels binarized at a 0.5 threshold.
Table 2
Instrument recognition performance on the validation and test sets under different training-label schemes.
(a) Trained on soft-labeled data and evaluated using MSE and R² metrics.
| Model | MSE (Val) | MSE (Test) | R² (Val) | R² (Test) |
|---|---|---|---|---|
| CNN Baseline | 9.07 | 9.28 | 50.78 | 51.08 |
| Music2vec | 5.09 | **5.10** | 72.74 | **73.13** |
| MusicHuBERT | 6.85 | 7.30 | 64.40 | 61.54 |
| MERT | 5.15 | 5.54 | 72.33 | 70.83 |
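For concreteness, subtable (a) treats recognition as regression against the soft labels. A minimal sketch of that evaluation, assuming labels and model outputs are per-instrument confidences in [0, 1] and that scores are reported as percentages (array shapes and data below are placeholders, not the paper's pipeline):

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical placeholders: soft labels are per-instrument annotator
# confidences in [0, 1], shape (n_excerpts, n_instruments).
rng = np.random.default_rng(0)
y_true = rng.random((1000, 9))  # ground-truth soft labels (assumption)
y_pred = rng.random((1000, 9))  # model outputs (assumption)

# Reporting both metrics as percentages is inferred from the scale
# of the values in the table.
mse = mean_squared_error(y_true, y_pred) * 100
r2 = r2_score(y_true, y_pred) * 100
print(f"MSE: {mse:.2f}  R2: {r2:.2f}")
```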
(b) Trained on soft-labeled data and evaluated using accuracy and F1 metrics, treating model outputs greater than 0.5 as 1.
| Model | Acc (Val) | Acc (Test) | F1 (Val) | F1 (Test) |
|---|---|---|---|---|
| CNN Baseline | 84.48 | 82.50 | 67.76 | 65.81 |
| Music2vec | 91.31 | 91.59 | 84.69 | 86.06 |
| MusicHuBERT | 89.45 | 88.21 | 81.73 | 81.02 |
| MERT | 91.38 | **90.85** | 84.91 | **84.98** |
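Subtable (b) scores the same soft-output models after binarizing both sides at 0.5. A sketch of that step, assuming label-wise accuracy and macro-averaged F1 (the exact averaging scheme is not stated in this section):

```python
import numpy as np
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
y_true_soft = rng.random((1000, 9))  # aggregated annotator confidences (assumption)
y_pred_soft = rng.random((1000, 9))  # model outputs in [0, 1] (assumption)

# Binarize both sides with the 0.5 threshold from the caption.
y_true = (y_true_soft > 0.5).astype(int)
y_pred = (y_pred_soft > 0.5).astype(int)

# Label-wise accuracy: mean over all excerpt-instrument pairs; macro F1
# over instruments. Both averaging choices are assumptions.
acc = (y_true == y_pred).mean() * 100
f1 = f1_score(y_true, y_pred, average="macro") * 100
print(f"Acc: {acc:.2f}  F1: {f1:.2f}")
```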
(c) Trained on hard labels, obtained by treating aggregated elicited confidences greater than 0.5 as 1, and evaluated using accuracy and F1 metrics.
| Model | Acc (Val) | Acc (Test) | F1 (Val) | F1 (Test) |
|---|---|---|---|---|
| CNN Baseline | 85.49 | 84.47 | 69.96 | 68.62 |
| Music2vec | 91.49 | **91.36** | 82.11 | **81.39** |
| MusicHuBERT | 88.97 | 87.76 | 75.84 | 72.47 |
| MERT | 91.14 | 90.39 | 80.85 | 78.49 |
[i] The validation set is used for hyperparameter tuning, and the test score of the best‑performing model on the validation set is highlighted in bold.
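The hard labels of subtable (c) come from aggregating the elicited annotator confidences per excerpt and thresholding the aggregate at 0.5. A minimal sketch, assuming the aggregation is a plain mean over annotators (shapes and names are hypothetical):

```python
import numpy as np

# Hypothetical shape: (n_excerpts, n_annotators, n_instruments), each entry
# an elicited confidence in [0, 1]; mean aggregation is an assumption.
rng = np.random.default_rng(0)
confidences = rng.random((1000, 3, 9))

soft_labels = confidences.mean(axis=1)         # aggregate over annotators
hard_labels = (soft_labels > 0.5).astype(int)  # threshold at 0.5 per the caption
```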
Table 3
Instrument-wise performance of Music2vec on the test set.
| Label Type* | Metric | Tonbak | Singer | Tar | Kamancheh | Santour | Oud | Ney | Setar | Daf |
|---|---|---|---|---|---|---|---|---|---|---|
| Soft | MSE | 5.97 | 3.28 | 6.19 | 7.05 | 6.06 | 3.08 | 5.78 | 6.03 | 2.45 |
| Soft | R² | 70.69 | 86.19 | 69.23 | 64.56 | 62.68 | 77.67 | 63.98 | 56.08 | 51.37 |
| Soft | Acc | 91.87 | **96.22** | 89.69 | 86.48 | **91.41** | **93.36** | **90.61** | **88.66** | **95.99** |
| Soft | F1 | 93.47 | **95.45** | 88.19 | 82.23 | **79.22** | **83.52** | **78.65** | **65.26** | **72.87** |
| Hard | Acc | **92.55** | 95.99 | **89.92** | **86.71** | 91.18 | 93.01 | 90.15 | 87.17 | 95.53 |
| Hard | F1 | **94.05** | 95.16 | **88.60** | **83.04** | 79.02 | 83.20 | 78.39 | 60.28 | 70.68 |
[i] The best performance for each instrument is highlighted in bold. The asterisk (*) refers to the labels used during training; 'Hard' means an aggregated elicited confidence greater than 0.5 is treated as 1 during training.
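Table 3's instrument-wise scores correspond to computing each metric per output column rather than averaging across instruments. A sketch using scikit-learn's per-label F1 (the instrument ordering and the placeholder data are assumptions):

```python
import numpy as np
from sklearn.metrics import f1_score

instruments = ["Tonbak", "Singer", "Tar", "Kamancheh", "Santour",
               "Oud", "Ney", "Setar", "Daf"]

rng = np.random.default_rng(0)
y_true = (rng.random((1000, 9)) > 0.5).astype(int)  # binarized labels (placeholder)
y_pred = (rng.random((1000, 9)) > 0.5).astype(int)  # binarized outputs (placeholder)

# average=None yields one F1 score per instrument column.
per_class_f1 = f1_score(y_true, y_pred, average=None) * 100
for name, score in zip(instruments, per_class_f1):
    print(f"{name}: F1 = {score:.2f}")
```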
