HamNava: A Dataset for Multi‑Label Instrument Classification
Open Access
|Jul 2025

Figures & Tables

Table 1

Comparison of widely used instrument recognition datasets in polyphonic music, where ‘Predominant’ refers to the main instrument, ‘Partial’ refers to some active instruments, and ‘Full’ refers to all instruments present in the audio.

Dataset | #Examples | #Instruments | Duration | Labeling
OpenMIC‑2018 | 20,000 | 20 | 10 s | Partial
IRMAS | 6,705 | 11 | 3 s | Predominant
Slakh2100 | 1,405 | 35 | Track | Full
Cerberus4 | 1,327 | 4 | Track | Full
MusicNet | 330 | 11 | Track | Full
MedleyDB | 122 | 80 | Track | Full
URMP | 44 | 14 | Track | Full
HamNava | 6,000 | 9 | 5 s | Full
Figure 1

Screenshots of the crowd‑sourcing app used for annotations: the Persian interface (right) and its English translation (left).

Figure 2

Analysis of instrument distributions in annotated excerpts, binarized with a 0.5 threshold.

Figure 3

Distribution of excerpts by difficulty level and annotation quality across instruments, with inter‑annotator agreement and annotator confidence factors.

Figure 4

Normalized percentage matrix of instrument co‑occurrence across annotated excerpts, binarized with a 0.5 threshold.
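A matrix like the one in Figure 4 can be derived from the soft annotations by binarizing at 0.5 and counting joint activations. The sketch below assumes a plausible normalization (each row divided by that instrument's total activation count); the annotation values and the 3‑instrument layout are illustrative, not from the dataset.

```python
import numpy as np

# Hypothetical soft annotations: rows = excerpts, columns = instruments,
# values = aggregated annotator confidence in [0, 1]. Illustrative only.
soft_labels = np.array([
    [0.9, 0.2, 0.7],
    [0.6, 0.8, 0.1],
    [0.4, 0.9, 0.8],
])

# Binarize at the 0.5 threshold used in the paper's figures.
hard = (soft_labels > 0.5).astype(int)

# Raw co-occurrence counts: how often instruments i and j are active together.
cooc = hard.T @ hard

# One plausible normalization: divide row i by instrument i's total
# activation count (the diagonal), giving percentages per instrument.
cooc_pct = 100.0 * cooc / np.diag(cooc)[:, None]
print(np.round(cooc_pct, 1))
```

The diagonal is 100% by construction; off‑diagonal entries read as "when instrument i is active, instrument j is also active x% of the time".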

Table 2

(a) Trained on hard‑labeled data and evaluated using MSE and R².

Model | MSE (Val) | MSE (Test) | R² (Val) | R² (Test)
CNN Baseline | 9.07 | 9.28 | 50.78 | 51.08
Music2vec | 5.09 | 5.10 | 72.74 | 73.13
MusicHuBERT | 6.85 | 7.30 | 64.40 | 61.54
MERT | 5.15 | 5.54 | 72.33 | 70.83

(b) Trained on soft‑labeled data and evaluated using accuracy and F1 metrics, considering model outputs greater than 0.5 as 1.

Model | Acc (Val) | Acc (Test) | F1 (Val) | F1 (Test)
CNN Baseline | 84.48 | 82.5 | 67.76 | 65.81
Music2vec | 91.31 | 91.59 | 84.69 | 86.06
MusicHuBERT | 89.45 | 88.21 | 81.73 | 81.02
MERT | 91.38 | 90.85 | 84.91 | 84.98
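The evaluation described in caption (b), where model outputs above 0.5 count as 1, amounts to thresholding sigmoid scores and computing label‑wise accuracy and micro‑averaged F1. A minimal sketch, with made‑up scores and ground truth for 4 excerpts and 3 instruments:

```python
import numpy as np

# Hypothetical sigmoid outputs and ground-truth hard labels; values are
# illustrative only, not from the paper's experiments.
scores = np.array([
    [0.9, 0.3, 0.6],
    [0.2, 0.8, 0.4],
    [0.7, 0.6, 0.1],
    [0.1, 0.2, 0.9],
])
truth = np.array([
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 0],
    [0, 0, 1],
])

# Outputs greater than 0.5 are considered 1, as in Table 2(b).
pred = (scores > 0.5).astype(int)

# Label-wise accuracy: fraction of correct (excerpt, instrument) decisions.
acc = (pred == truth).mean()

# Micro-averaged F1 over all (excerpt, instrument) pairs.
tp = np.sum((pred == 1) & (truth == 1))
fp = np.sum((pred == 1) & (truth == 0))
fn = np.sum((pred == 0) & (truth == 1))
f1 = 2 * tp / (2 * tp + fp + fn)
print(f"acc={acc:.3f}  micro-F1={f1:.3f}")
```

Whether the paper reports micro‑ or macro‑averaged F1 is not stated in this section; the micro variant above is one common choice for multi‑label evaluation.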

(c) Trained on hard labels (aggregated elicited confidence greater than 0.5 treated as 1) and evaluated using accuracy and F1 metrics.

Model | Acc (Val) | Acc (Test) | F1 (Val) | F1 (Test)
CNN Baseline | 85.49 | 84.47 | 69.96 | 68.62
Music2vec | 91.49 | 91.36 | 82.11 | 81.39
MusicHuBERT | 88.97 | 87.76 | 75.84 | 72.47
MERT | 91.14 | 90.39 | 80.85 | 78.49

[i] The validation set is used for hyperparameter tuning, and the test score of the best‑performing model on the validation set is highlighted in bold.

Table 3

Instrument‑wise performance of Music2Vec on the test set.

Label Type* | Metric | Tonbak | Singer | Tar | Kamancheh | Santour | Oud | Ney | Setar | Daf
Soft | MSE | 5.97 | 3.28 | 6.19 | 7.05 | 6.06 | 3.08 | 5.78 | 6.03 | 2.45
Soft | R² | 70.69 | 86.19 | 69.23 | 64.56 | 62.68 | 77.67 | 63.98 | 56.08 | 51.37
Soft | Acc | 91.87 | 96.22 | 89.69 | 86.48 | 91.41 | 93.36 | 90.61 | 88.66 | 95.99
Soft | F1 | 93.47 | 95.45 | 88.19 | 82.23 | 79.22 | 83.52 | 78.65 | 65.26 | 72.87
Hard | Acc | 92.55 | 95.99 | 89.92 | 86.71 | 91.18 | 93.01 | 90.15 | 87.17 | 95.53
Hard | F1 | 94.05 | 95.16 | 88.60 | 83.04 | 79.02 | 83.20 | 78.39 | 60.28 | 70.68

[i] The best performance for each instrument is highlighted in bold. The asterisk (*) refers to the labels used during training, where ‘Hard’ means the aggregation of elicited confidence scores greater than 0.5 is considered as 1 during training.
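The 'Hard' labeling described in the note above (aggregate the elicited confidence scores, then treat values greater than 0.5 as 1) can be sketched as follows; the per‑annotator confidence values are hypothetical, and mean aggregation is assumed since the exact aggregation rule is not given in this section.

```python
import numpy as np

# Hypothetical elicited confidences from 3 annotators for one excerpt;
# columns = instruments (e.g. Tonbak, Singer, Tar). Values are illustrative.
confidences = np.array([
    [1.0, 0.5, 0.0],
    [0.8, 0.7, 0.2],
    [0.9, 0.2, 0.1],
])

# Aggregate across annotators (assumed: mean) to get a soft label per
# instrument, then threshold at 0.5 to obtain the hard training label.
soft_label = confidences.mean(axis=0)
hard_label = (soft_label > 0.5).astype(int)
print(soft_label, hard_label)
```

Soft‑label training would regress on `soft_label` directly (hence the MSE/R² rows of Table 3), while hard‑label training uses the thresholded `hard_label`.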

DOI: https://doi.org/10.5334/tismir.257 | Journal eISSN: 2514-3298
Language: English
Submitted on: Feb 15, 2025
Accepted on: Jun 15, 2025
Published on: Jul 28, 2025
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2025 Pouya Mohseni, Bagher BabaAli, Hooman Asadi, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.