Improving Motif Discovery of Symbolic Polyphonic Music with Motif Note Identification

Jun-You Wang; Yu-Chia Kuo; Li Su

doi:10.5334/tismir.250

Transactions of the International Society for Music Information Retrieval

Volume 8 (2025): Issue 1

By: Jun-You Wang, Yu-Chia Kuo and Li Su

Open Access

Sep 2025

Figures & Tables

A visualization of the motif‑discovery task. The excerpt comes from the first four bars of Beethoven’s *Sonata No. 1 in F Minor*, Op. 2, No. 1. Red and blue notes represent two different motifs, while gray notes are non‑motif notes that do not belong to an occurrence of any motif.

An overview of the proposed motif discovery framework, which divides the motif‑discovery task into motif note identification (MNID) and repeated pattern discovery (RPD). During training (left‑hand side), pseudo‑labeling is adopted to address the low‑resource issue for MNID: the intersection of the prediction of a pre‑trained melody identification (MeloID) model and the prediction of the MNID model is treated as the pseudo‑label. During testing (right‑hand side), we first use an MNID model to identify motif notes in the input musical score. Then, we apply an RPD algorithm, such as SIATEC or CSA (Hsiao et al., 2023; Meredith et al., 2002), to discover repeated patterns among the identified motif notes.
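
The intersection step in the caption above can be illustrated with a minimal sketch. It assumes that both models emit one binary label per note (1 for a positive prediction, 0 otherwise); the function name and the toy predictions are hypothetical and are not taken from the paper.

```python
from typing import List


def intersect_pseudo_labels(melo_pred: List[int], mnid_pred: List[int]) -> List[int]:
    """Keep a note as a positive pseudo-label only if both models mark it positive.

    Assumption: both lists are aligned note by note and contain only 0 or 1.
    """
    if len(melo_pred) != len(mnid_pred):
        raise ValueError("both predictions must cover the same notes")
    return [int(a == 1 and b == 1) for a, b in zip(melo_pred, mnid_pred)]


# Toy per-note predictions (illustrative values only).
melo = [1, 1, 0, 0, 1]
mnid = [1, 0, 1, 0, 1]
pseudo = intersect_pseudo_labels(melo, mnid)
print(pseudo)
```

Taking the intersection rather than the union favors precision: a note enters the pseudo-label set only when both predictors agree on it.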

Comparison of the melody identification (MeloID) and motif note identification (MNID) results on two examples of the Mozart Piano Sonata dataset. **(a)** and **(b)** are the first 10 beats of the first movement of Mozart’s *Piano Sonata No. 6 in D Major, KV284*. **(c)** and **(d)** are the first 10 beats of the second movement of Mozart’s *Piano Sonata No. 1 in C Major, KV279*.

Table 1

Hyperparameters of the CSA algorithm and the values used in the experiments on both datasets.

Name	Definition	Value (BPS‑Motif)	Value (JKU‑PDD)
$θ$	The cardinal score threshold that decides whether to merge two motifs	0.5	0.7
$m$	The minimum note number for any motif occurrence	4	4
$θ_{o}$	The onset tolerance for matching two vectors (between two note pairs)	0	0
$θ_{p}$	The pitch tolerance for matching two vectors (between two note pairs)	1	3
$γ_{o}$	The inter‑onset interval threshold for the compactness condition	2	2
$γ_{p}$	The inter‑pitch interval threshold for the compactness condition	$\infty$	$\infty$
$δ$	The maximum duration of motifs/patterns	12	$\infty$
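
For reference, the values in Table 1 can be written down as a small configuration object. This is a sketch only: the class, its field names, and the `vectors_match` helper are hypothetical, and the tolerance test shows one plausible reading of the onset and pitch tolerances, not the CSA implementation itself. Units are omitted because the table does not state them.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CSAConfig:
    theta: float         # θ: score threshold for merging two motifs
    min_notes: int       # m: minimum number of notes per motif occurrence
    onset_tol: float     # θ_o: onset tolerance when matching two vectors
    pitch_tol: float     # θ_p: pitch tolerance when matching two vectors
    gamma_onset: float   # γ_o: inter-onset interval threshold (compactness)
    gamma_pitch: float   # γ_p: inter-pitch interval threshold (compactness)
    max_duration: float  # δ: maximum duration of motifs/patterns


# Values copied from Table 1; math.inf stands for the ∞ entries.
BPS_MOTIF = CSAConfig(0.5, 4, 0, 1, 2, math.inf, 12)
JKU_PDD = CSAConfig(0.7, 4, 0, 3, 2, math.inf, math.inf)


def vectors_match(v1: Tuple[float, float], v2: Tuple[float, float], cfg: CSAConfig) -> bool:
    """Assumed tolerance test on (onset difference, pitch difference) vectors."""
    return (abs(v1[0] - v2[0]) <= cfg.onset_tol
            and abs(v1[1] - v2[1]) <= cfg.pitch_tol)


# A pitch deviation of 3 is rejected under the BPS-Motif setting
# but accepted under the more tolerant JKU-PDD setting.
print(vectors_match((1.0, 2), (1.0, 5), BPS_MOTIF))
print(vectors_match((1.0, 2), (1.0, 5), JKU_PDD))
```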

Table 2

MNID results on both the BPS‑Motif dataset and the JKU‑PDD. The subscripts denote the standard deviations across five folds. Note that, strictly speaking, the results on JKU‑PDD may not reflect the actual motif note identification performance, as the ground‑truth labels there are not always motif notes. Please refer to Section 4.1 for a detailed discussion.

MNID	Setting	BPS‑Motif Accuracy	BPS‑Motif F1‑score	BPS‑Motif Precision	BPS‑Motif Recall	JKU‑PDD Accuracy	JKU‑PDD F1‑score	JKU‑PDD Precision	JKU‑PDD Recall
Skyline	N/A	0.776 $_{0.029}$	0.671 $_{0.055}$	0.591 $_{0.060}$	0.822 $_{0.036}$	0.869 $_{0.131}$	0.716 $_{0.300}$	0.634 $_{0.328}$	0.999 $_{0.003}$
CNN	✗	0.785 $_{0.021}$	0.652 $_{0.043}$	0.604 $_{0.061}$	0.738 $_{0.057}$	0.849 $_{0.100}$	0.681 $_{0.273}$	0.594 $_{0.294}$	0.960 $_{0.048}$
	PL	0.777 $_{0.019}$	0.629 $_{0.046}$	0.596 $_{0.075}$	0.698 $_{0.088}$	0.860 $_{0.088}$	0.678 $_{0.277}$	0.599 $_{0.303}$	0.920 $_{0.075}$
	MI	0.802 $_{0.028}$	0.660 $_{0.050}$	0.636 $_{0.059}$	0.713 $_{0.077}$	0.873 $_{0.078}$	0.693 $_{0.265}$	0.625 $_{0.313}$	0.928 $_{0.062}$
MidiBERT	✗	0.823 $_{0.028}$	0.706 $_{0.053}$	0.669 $_{0.065}$	0.776 $_{0.071}$	0.828 $_{0.128}$	0.662 $_{0.292}$	0.571 $_{0.316}$	0.981 $_{0.012}$
	PL	0.831 $_{0.026}$	0.716 $_{0.056}$	0.679 $_{0.058}$	0.782 $_{0.066}$	0.825 $_{0.131}$	0.660 $_{0.293}$	0.567 $_{0.317}$	0.982 $_{0.013}$
	MI	0.839 $_{0.029}$	0.721 $_{0.063}$	0.701 $_{0.076}$	0.763 $_{0.061}$	0.866 $_{0.088}$	0.683 $_{0.280}$	0.599 $_{0.318}$	0.944 $_{0.041}$
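
The four columns reported for each dataset in Table 2 are standard binary-classification metrics. A minimal sketch of how they could be computed from per-note labels follows, assuming that 1 marks a motif note and 0 a non-motif note. The function name and the toy labels are illustrative and do not come from the paper.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for aligned binary note labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and equally long")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy example: 2 true positives, 1 false positive, 1 false negative, 2 true negatives.
metrics = binary_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
print(metrics)
```

Because the table reports means and standard deviations across folds, the F1‑score in a row need not equal the harmonic mean of that row's precision and recall.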

Table 3

The total RPD runtime on the BPS‑Motif dataset in minutes using one Intel i9‑13900KF CPU.

RPD	Runtime (min)
SIATEC	340.9
SIATEC_CS	4076.9
CSA	978.3
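
To put the runtimes in Table 3 in proportion, the short sketch below converts them to hours and expresses each as a multiple of the SIATEC runtime. Only the three runtime values come from the table; the variable names are illustrative.

```python
# Total RPD runtimes on BPS-Motif in minutes, copied from Table 3.
runtime_min = {"SIATEC": 340.9, "SIATEC_CS": 4076.9, "CSA": 978.3}

baseline = runtime_min["SIATEC"]
# Map each algorithm to (runtime in hours, multiple of the SIATEC runtime).
summary = {
    name: (minutes / 60.0, minutes / baseline)
    for name, minutes in runtime_min.items()
}

for name, (hours, ratio) in summary.items():
    print(f"{name}: {hours:.1f} h, {ratio:.1f}x SIATEC")
```

On these figures, SIATEC_CS takes roughly twelve times as long as SIATEC, and CSA a little under three times as long.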

Table 4

Motif discovery results on the BPS‑Motif dataset. PL denotes pseudo‑labeling; MI denotes the intersection of the melody line and pseudo‑labels. The subscripts est, occ, and thr indicate the establishment, occurrence, and three‑layer measurements, respectively. The last three rows are the oracle setting, which assumes 100% accuracy in motif note identification.

MNID	Setting	RPD	P $_{est}$	R $_{est}$	F $_{est}$	P $_{occ}$	R $_{occ}$	F $_{occ}$	P $_{thr}$	R $_{thr}$	F $_{thr}$
✗	N/A	SIATEC	0.180	0.644	0.280	0.210	0.277	0.224	0.041	0.300	0.071
Skyline	N/A		0.226	0.660	0.333	0.410	0.275	0.308	0.065	0.335	0.107
CNN	✗		0.240	0.631	0.344	0.403	0.221	0.268	0.073	0.319	0.116
	PL		0.236	0.609	0.336	0.399	0.240	0.281	0.072	0.307	0.112
	MI		0.249	0.643	0.354	0.445	0.251	0.307	0.076	0.331	0.121
MidiBERT	✗		0.257	0.653	0.364	0.446	0.263	0.319	0.081	0.337	0.126
	PL		0.258	0.654	0.366	0.440	0.266	0.320	0.082	0.341	0.127
	MI		0.258	0.656	0.366	0.433	0.260	0.309	0.083	0.342	0.130
✗	N/A	CSA	0.553	0.847	0.663	0.130	0.570	0.204	0.127	0.263	0.169
Skyline	N/A		0.555	0.806	0.652	0.312	0.464	0.354	0.214	0.357	0.264
CNN	✗		0.521	0.761	0.610	0.330	0.410	0.349	0.202	0.334	0.246
	PL		0.516	0.718	0.593	0.327	0.388	0.343	0.198	0.320	0.238
	MI		0.540	0.748	0.619	0.327	0.387	0.340	0.211	0.340	0.255
MidiBERT	✗		0.576	0.811	0.668	0.332	0.439	0.362	0.224	0.363	0.272
	PL		0.580	0.818	0.673	0.336	0.436	0.363	0.228	0.359	0.275
	MI		0.591	0.803	0.676	0.350	0.430	0.372	0.231	0.358	0.276
Oracle	N/A	SIATEC	0.288	0.732	0.410	0.536	0.361	0.418	0.105	0.443	0.165
	N/A	SIATEC_CS	0.327	0.583	0.413	0.491	0.312	0.356	0.185	0.322	0.231
	N/A	CSA	0.734	0.866	0.789	0.443	0.540	0.468	0.353	0.480	0.400

Table 5

Motif discovery results on JKU‑PDD. Note that, in this experiment, we use the annotations in the monophonic version as ground‑truth, as they resemble motif annotations better.

MNID	Setting	RPD	P $_{est}$	R $_{est}$	F $_{est}$	P $_{occ}$	R $_{occ}$	F $_{occ}$	P $_{thr}$	R $_{thr}$	F $_{thr}$
✗	N/A	SIATEC_CS	0.210	0.323	0.251	0.000	0.000	0.000	0.220	0.277	0.243
Skyline	N/A		0.415	0.663	0.492	0.380	0.631	0.472	0.403	0.534	0.439
CNN	✗		0.446	0.626	0.511	0.410	0.653	0.501	0.420	0.548	0.460
	PL		0.418	0.670	0.492	0.395	0.665	0.494	0.387	0.536	0.426
	MI		0.433	0.707	0.520	0.502	0.762	0.601	0.403	0.577	0.454
MidiBERT	✗		0.359	0.589	0.431	0.239	0.327	0.276	0.360	0.487	0.402
	PL		0.398	0.625	0.469	0.290	0.467	0.357	0.383	0.492	0.412
	MI		0.387	0.668	0.468	0.396	0.606	0.478	0.372	0.513	0.417
✗	N/A	CSA	0.155	0.404	0.214	0.140	0.355	0.180	0.062	0.269	0.095
Skyline	N/A		0.281	0.567	0.350	0.371	0.489	0.407	0.235	0.527	0.295
CNN	✗		0.261	0.509	0.324	0.295	0.414	0.340	0.222	0.505	0.277
	PL		0.251	0.476	0.306	0.273	0.392	0.310	0.222	0.462	0.271
	MI		0.261	0.504	0.324	0.256	0.388	0.298	0.232	0.511	0.288
MidiBERT	✗		0.245	0.568	0.321	0.383	0.490	0.414	0.200	0.535	0.263
	PL		0.267	0.576	0.340	0.398	0.477	0.418	0.220	0.539	0.277
	MI		0.253	0.496	0.315	0.249	0.412	0.309	0.212	0.480	0.263
Oracle	N/A	SIATEC	0.209	0.508	0.280	0.715	0.684	0.699	0.204	0.527	0.270
	N/A	SIATEC_CS	0.589	0.661	0.620	0.553	0.680	0.606	0.621	0.656	0.637
	N/A	CSA	0.263	0.407	0.310	0.300	0.419	0.337	0.245	0.434	0.295

Table 6

The average number of distinct motifs (per song) in the ground truth and in the RPD algorithms’ predictions on the BPS‑Motif dataset and the JKU‑PDD. The Oracle MNID is employed.