
Supervised Contrastive Models for Music Information Retrieval in Classical Persian Music

Open Access | Jan 2026

Figures & Tables

Table 1

Data distribution of the PCID dataset.

| Instrument | # Train | # Test | # Val |
| --- | --- | --- | --- |
| Daf | 3,119 (52 m) | 391 (6.5 m) | 389 (6.5 m) |
| Divan | 3,536 (59 m) | 442 (7 m) | 442 (7 m) |
| Dutar | 3,039 (50.5 m) | 381 (6 m) | 379 (6 m) |
| Gheychak | 2,991 (50 m) | 375 (6 m) | 374 (6 m) |
| Kamancheh | 8,039 (2 h, 14 m) | 1,006 (16.5 m) | 1,004 (16.5 m) |
| Ney Anban | 4,004 (1 h, 6 m) | 501 (8 m) | 500 (8 m) |
| Ney | 8,134 (2 h, 15 m) | 1,018 (17 m) | 1,016 (17 m) |
| Oud | 9,110 (2 h, 32 m) | 1,140 (19 m) | 1,138 (19 m) |
| Qanun | 3,642 (1 h, 1 m) | 456 (7.5 m) | 455 (7.5 m) |
| Rubab | 2,991 (50 m) | 375 (6 m) | 373 (6 m) |
| Santur | 7,856 (2 h, 11 m) | 983 (16 m) | 982 (16 m) |
| Setar | 12,132 (3 h, 22 m) | 1,518 (25 m) | 1,516 (25 m) |
| Tanbour | 4,680 (1 h, 18 m) | 585 (9.5 m) | 585 (9.5 m) |
| Tar | 7,646 (2 h, 7 m) | 957 (16 m) | 955 (16 m) |
| Tonbak | 4,163 (1 h, 9 m) | 521 (8.5 m) | 520 (8.5 m) |
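The per-instrument counts above correspond to roughly an 80/10/10 train/test/validation split. A quick stdlib sketch (using the Daf and Setar rows from the table as examples) verifies the proportions:

```python
# Check the approximate 80/10/10 split implied by Table 1.
# Segment counts are taken directly from the Daf and Setar rows.
splits = {
    "Daf":   (3_119, 391, 389),
    "Setar": (12_132, 1_518, 1_516),
}
for name, (train, test, val) in splits.items():
    total = train + test + val
    print(name,
          round(train / total, 2),   # ~0.80
          round(test / total, 2),    # ~0.10
          round(val / total, 2))     # ~0.10
```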
Figure 1

Flowchart of the structure of our proposed model.

Figure 2

Our proposed contrastive (base) model architecture.
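The base model in Figure 2 is trained with a supervised contrastive objective (per the methodology column of Table 2). As an illustration only, and not the paper's implementation, here is a minimal NumPy sketch of the standard supervised contrastive (SupCon) loss; the function name, temperature value, and batch layout are assumptions:

```python
import numpy as np

def supcon_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss over a batch of embeddings.

    For each anchor, embeddings with the same label are positives;
    all other embeddings in the batch are negatives.
    """
    # L2-normalize so similarities are cosine similarities.
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    logits = (z @ z.T) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability

    exp_sim = np.exp(logits)
    np.fill_diagonal(exp_sim, 0.0)                        # exclude self-pairs
    log_prob = logits - np.log(exp_sim.sum(axis=1, keepdims=True))

    pos_mask = (labels[:, None] == labels[None, :]).astype(float)
    np.fill_diagonal(pos_mask, 0.0)
    pos_counts = pos_mask.sum(axis=1)

    # Average log-probability over each anchor's positives.
    loss_per_anchor = -(pos_mask * log_prob).sum(axis=1) / np.maximum(pos_counts, 1)
    return loss_per_anchor[pos_counts > 0].mean()
```

Well-separated same-class embeddings yield a near-zero loss, while embeddings that mix classes are penalized heavily, which is the property the contrastive pre-training stage exploits.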

Figure 3

Accuracy vs. input length tested on the Nava and PCID datasets (trained on the PCID 5 Instruments subset).

Figure 4

Accuracy vs. input length tested on the Nava and PCID datasets (trained on PCID).

Figure 5

Accuracy vs. input length tested on the Nava and PCID datasets (trained on the original Nava dataset).

Figure 6

Comparison of test accuracy between the proposed model, Baba Ali et al. (2019), and Baba Ali (2024).

Figure 7

Comparison of accuracy for Dastgah detection across Baba Ali et al. (2019), Baba Ali (2024), and the proposed method.

Figure 8

Architecture of the best model for the classifier of the one‑second, 15‑class classification task.

Figure 9

Architecture of the best model for the meta‑classifier of the 20‑second, 15‑class classification task.
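Figure 9's meta-classifier aggregates per-second predictions into a single 20-second decision. The meta-classifier itself is a learned model; purely as a hypothetical baseline for comparison, the same aggregation can be done by majority voting over the twenty one-second predictions:

```python
from collections import Counter

def majority_vote(segment_preds):
    """Aggregate per-second class predictions for one clip into a single
    clip-level label (ties broken by first occurrence in the sequence)."""
    return Counter(segment_preds).most_common(1)[0][0]

# A 20-second clip: 14 one-second segments predicted "tar", 6 "setar".
print(majority_vote(["tar"] * 14 + ["setar"] * 6))  # -> tar
```

A learned meta-classifier can outperform this baseline because it can weight segments by confidence rather than counting each vote equally.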

Figure 10

t‑SNE projection of penultimate‑layer features for 10,000 one‑second test segments from the PCID.

Figure 11

Normalized confusion matrix (one‑second input, PCID test set).
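A row-normalized confusion matrix such as the one in Figure 11 divides each row of the raw count matrix by that class's support, so entry (i, j) is the fraction of class-i segments predicted as class j. A minimal NumPy sketch (the function name and signature are illustrative, not from the paper):

```python
import numpy as np

def normalized_confusion(y_true, y_pred, n_classes):
    """Row-normalized confusion matrix from integer class labels."""
    cm = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    row_sums = cm.sum(axis=1, keepdims=True)
    return cm / np.maximum(row_sums, 1)   # guard against empty classes
```

With this normalization, each row sums to 1 (for classes with at least one example), so the diagonal directly reads off per-class recall.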

Table 2

Comparison of instrument classification performance across different studies.

| Study | Dataset | # of Classes | Methodology | Accuracy (%) | F1-Score (%) |
| --- | --- | --- | --- | --- | --- |
| Our Study | Extended Dataset (15 instruments) | 15 | Supervised contrastive learning with SSA | 97.48 | 98 |
| Our Study | Subset of Extended Dataset (5 instruments) | 5 | Supervised contrastive learning with SSA | 99.78 | 100 |
| Our Study | Nava Dataset (Modified) | 5 | Supervised contrastive learning with SSA | 99.88 | 100 |
| Agostini et al. (2003) | Orchestral Instruments Dataset | 27 | Spectral features with KNN and neural networks | 70–80 | N/A |
| Essid et al. (2006) | Solo Recordings and Mixtures of Western Instruments | 7 | MFCCs, timbral descriptors with SVM | 65–75 | N/A |
| Han et al. (2016) | Subset of MIREX Dataset (Various Genres and Instruments) | 11 | Deep CNNs for predominant instrument recognition | 75 | 80 |
| Solanki and Pandey (2022) | IRMAS Dataset (6,705 recordings) | 11 | Eight-layer deep CNN with mel spectrogram input | 92.61 | N/A |
| Prabavathy et al. (2020) | RWC Database, MusicBrainz.org, IRMAS, NSynth | 16 | SVM and KNN with MFCC and sonogram features | 99.29 | 95.15 |
| Gong et al. (2021) | ChMusic Dataset (Traditional Chinese Instruments) | 11 | MFCCs with KNN and majority voting | 94.15 | N/A |
| Humphrey et al. (2018) | OpenMIC-2018 Dataset | 20 | Deep learning with CNN and multi-instance learning | N/A | 78 (AUC-PR) |
| Reghunath and Rajan (2022) | Polyphonic Music Dataset | 11 | Transformer-based ensemble method | 85 | 79 |
| Mousavi et al. (2019) | PCMIR Dataset (Persian Classical Music) | 6 | MFCCs, spectral features with neural network | 80 | N/A |
| Baba Ali et al. (2019) | Nava Dataset (Original) | 5 | MFCC and i-vector with SVM | 84.75 | 84 |
| Baba Ali et al. (2024) | Nava Dataset (Original) | 5 | Self-supervised, pre-trained models | 99.64 | 99.64 |
DOI: https://doi.org/10.5334/tismir.271 | Journal eISSN: 2514-3298
Language: English
Submitted on: Apr 26, 2025
Accepted on: Dec 6, 2025
Published on: Jan 7, 2026
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2026 Ali Ahmadi Katamjani, Seyed Abolghasem Mirroshandel, Mahdi Aminian, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.