
Figure 1
A visualization of the motif‑discovery task. The excerpt comes from the first four bars of Beethoven’s Sonata No. 1 in F Minor, Op. 2, No. 1. Red and blue notes represent two different motifs, while gray notes are non‑motif notes that do not belong to an occurrence of any motif.

Figure 2
An overview of the proposed motif discovery framework, which divides the motif‑discovery task into motif note identification (MNID) and repeated pattern discovery (RPD). During training (left‑hand side), pseudo‑labeling is adopted to address the low‑resource issue for MNID: the intersection of a pre‑trained melody identification (MeloID) model’s prediction and the MNID model’s prediction is treated as the pseudo‑label. During testing (right‑hand side), we first use an MNID model to identify motif notes from the input musical score. Then, we apply an RPD algorithm, such as SIATEC or CSA (Hsiao et al., 2023; Meredith et al., 2002), to discover repeated patterns from the identified motif notes.
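
The two ideas in this caption, intersecting predictions to form pseudo‑labels and filtering notes before pattern discovery, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the boolean per‑note masks, the `(onset, pitch)` note tuples, and the `rpd` callable are all assumptions made for the example.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical note representation: (onset in beats, MIDI pitch).
Note = Tuple[float, int]


def pseudo_labels(meloid_pred: Sequence[bool], mnid_pred: Sequence[bool]) -> List[bool]:
    """Mark a note as a pseudo-labeled motif note only if both models flag it."""
    if len(meloid_pred) != len(mnid_pred):
        raise ValueError("both predictions must cover the same notes")
    return [bool(m and n) for m, n in zip(meloid_pred, mnid_pred)]


def discover_motifs(
    notes: Sequence[Note],
    mnid_pred: Sequence[bool],
    rpd: Callable[[List[Note]], object],
) -> object:
    """Test-time flow: keep the predicted motif notes, then run RPD on them."""
    motif_notes = [note for note, keep in zip(notes, mnid_pred) if keep]
    return rpd(motif_notes)


# Toy example: only notes flagged by both models survive the intersection.
meloid = [True, True, False, True]
mnid = [True, False, False, True]
labels = pseudo_labels(meloid, mnid)
```

Any RPD routine that accepts a list of notes could be passed as `rpd`; the identity function is enough to inspect which notes survive the MNID filter.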

Figure 3
Comparison of the melody identification (MeloID) and motif note identification (MNID) results on two examples from the Mozart Piano Sonata dataset. (a) and (b) are the first 10 beats of the first movement of Mozart’s Piano Sonata No. 6 in D Major, KV284. (c) and (d) are the first 10 beats of the second movement of Mozart’s Piano Sonata No. 1 in C Major, KV279.
Table 1
Hyperparameters of the CSA algorithm and their values used in the experiments of both datasets.
| Name | Definition | Value (BPS‑Motif) | Value (JKU‑PDD) |
|---|---|---|---|
| | The cardinal score threshold that decides whether to merge two motifs | 0.5 | 0.7 |
| | The minimum note number for any motif occurrence | 4 | 4 |
| | The onset tolerance for matching two vectors (between two note pairs) | 0 | 0 |
| | The pitch tolerance for matching two vectors (between two note pairs) | 1 | 3 |
| | The inter‑onset interval threshold for the compactness condition | 2 | 2 |
| | The inter‑pitch interval threshold for the compactness condition | | |
| | The maximum duration of motifs/patterns | 12 | |
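
To make the two tolerance rows concrete, the sketch below shows one plausible reading of "matching two vectors (between two note pairs)": each vector is the onset and pitch difference between two notes, and two vectors match when they differ by no more than the tolerances. The note representation, the function names, and this reading of the definition are assumptions for illustration, not the CSA implementation.

```python
from typing import Tuple

# Hypothetical note representation: (onset in beats, MIDI pitch).
Note = Tuple[float, int]
Vector = Tuple[float, int]


def vector(a: Note, b: Note) -> Vector:
    """Difference vector from note a to note b: (onset difference, pitch difference)."""
    return (b[0] - a[0], b[1] - a[1])


def vectors_match(v: Vector, w: Vector, onset_tol: float = 0.0, pitch_tol: int = 1) -> bool:
    """Two vectors match if they differ by at most the given tolerances.

    The defaults mirror the BPS-Motif column of Table 1 (onset 0, pitch 1).
    """
    return abs(v[0] - w[0]) <= onset_tol and abs(v[1] - w[1]) <= pitch_tol


# A major third one beat apart versus a perfect fourth one beat apart.
v = vector((0.0, 60), (1.0, 64))  # (1.0, 4)
w = vector((4.0, 62), (5.0, 67))  # (1.0, 5)
match_loose = vectors_match(v, w, onset_tol=0.0, pitch_tol=1)
match_exact = vectors_match(v, w, onset_tol=0.0, pitch_tol=0)
```

Under this reading, a pitch tolerance of 1 accepts the pair above as the same interval shape, while a tolerance of 0 requires exact transposition.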
Table 2
MNID results on the BPS‑Motif dataset and the JKU‑PDD. The subscripts denote the standard deviations across five folds. Note that, strictly speaking, the results on JKU‑PDD may not reflect actual motif note identification performance, as the ground‑truth annotations there are not always motif notes. See Section 4.1 for a detailed discussion.
| MNID | Setting | BPS‑Motif Accuracy | BPS‑Motif F1‑score | BPS‑Motif Precision | BPS‑Motif Recall | JKU‑PDD Accuracy | JKU‑PDD F1‑score | JKU‑PDD Precision | JKU‑PDD Recall |
|---|---|---|---|---|---|---|---|---|---|
| Skyline | N/A | 0.776 | 0.671 | 0.591 | 0.822 | 0.869 | 0.716 | 0.634 | 0.999 |
| CNN | ✗ | 0.785 | 0.652 | 0.604 | 0.738 | 0.849 | 0.681 | 0.594 | 0.960 |
| CNN | PL | 0.777 | 0.629 | 0.596 | 0.698 | 0.860 | 0.678 | 0.599 | 0.920 |
| CNN | MI | 0.802 | 0.660 | 0.636 | 0.713 | 0.873 | 0.693 | 0.625 | 0.928 |
| MidiBERT | ✗ | 0.823 | 0.706 | 0.669 | 0.776 | 0.828 | 0.662 | 0.571 | 0.981 |
| MidiBERT | PL | 0.831 | 0.716 | 0.679 | 0.782 | 0.825 | 0.660 | 0.567 | 0.982 |
| MidiBERT | MI | 0.839 | 0.721 | 0.701 | 0.763 | 0.866 | 0.683 | 0.599 | 0.944 |
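
For reference, the four per‑note metrics reported in this table can be computed as below. The sketch uses the standard binary‑classification definitions with "motif note" as the positive class, which is an assumption about the evaluation setup; the function name and the toy labels are illustrative and do not come from either dataset.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for 0/1 labels (1 = motif note)."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    total = tp + fp + fn + tn
    # Guard against empty denominators so degenerate inputs return 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for five notes (not taken from the paper's data).
toy = binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

When such metrics are averaged over folds, the averaged F1‑score need not equal the harmonic mean of the averaged precision and recall.
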
Table 3
The total RPD runtime on the BPS‑Motif dataset in minutes using one Intel i9‑13900KF CPU.
| RPD | Runtime (min) |
|---|---|
| SIATEC | 340.9 |
| SIATEC_CS | 4076.9 |
| CSA | 978.3 |
Table 4
Motif discovery results on the BPS‑Motif dataset. PL denotes pseudo‑labeling; MI denotes the intersection of the melody line and pseudo‑labels. The subscripts est, occ, and thr indicate the establishment, occurrence, and three‑layer measurements, respectively. The last three rows are the oracle setting, which assumes 100% accuracy of motif note identification.
| MNID | Setting | RPD | P_est | R_est | F_est | P_occ | R_occ | F_occ | P_thr | R_thr | F_thr |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ✗ | N/A | SIATEC | 0.180 | 0.644 | 0.280 | 0.210 | 0.277 | 0.224 | 0.041 | 0.300 | 0.071 |
| Skyline | N/A | SIATEC | 0.226 | 0.660 | 0.333 | 0.410 | 0.275 | 0.308 | 0.065 | 0.335 | 0.107 |
| CNN | ✗ | SIATEC | 0.240 | 0.631 | 0.344 | 0.403 | 0.221 | 0.268 | 0.073 | 0.319 | 0.116 |
| CNN | PL | SIATEC | 0.236 | 0.609 | 0.336 | 0.399 | 0.240 | 0.281 | 0.072 | 0.307 | 0.112 |
| CNN | MI | SIATEC | 0.249 | 0.643 | 0.354 | 0.445 | 0.251 | 0.307 | 0.076 | 0.331 | 0.121 |
| MidiBERT | ✗ | SIATEC | 0.257 | 0.653 | 0.364 | 0.446 | 0.263 | 0.319 | 0.081 | 0.337 | 0.126 |
| MidiBERT | PL | SIATEC | 0.258 | 0.654 | 0.366 | 0.440 | 0.266 | 0.320 | 0.082 | 0.341 | 0.127 |
| MidiBERT | MI | SIATEC | 0.258 | 0.656 | 0.366 | 0.433 | 0.260 | 0.309 | 0.083 | 0.342 | 0.130 |
| ✗ | N/A | CSA | 0.553 | 0.847 | 0.663 | 0.130 | 0.570 | 0.204 | 0.127 | 0.263 | 0.169 |
| Skyline | N/A | CSA | 0.555 | 0.806 | 0.652 | 0.312 | 0.464 | 0.354 | 0.214 | 0.357 | 0.264 |
| CNN | ✗ | CSA | 0.521 | 0.761 | 0.610 | 0.330 | 0.410 | 0.349 | 0.202 | 0.334 | 0.246 |
| CNN | PL | CSA | 0.516 | 0.718 | 0.593 | 0.327 | 0.388 | 0.343 | 0.198 | 0.320 | 0.238 |
| CNN | MI | CSA | 0.540 | 0.748 | 0.619 | 0.327 | 0.387 | 0.340 | 0.211 | 0.340 | 0.255 |
| MidiBERT | ✗ | CSA | 0.576 | 0.811 | 0.668 | 0.332 | 0.439 | 0.362 | 0.224 | 0.363 | 0.272 |
| MidiBERT | PL | CSA | 0.580 | 0.818 | 0.673 | 0.336 | 0.436 | 0.363 | 0.228 | 0.359 | 0.275 |
| MidiBERT | MI | CSA | 0.591 | 0.803 | 0.676 | 0.350 | 0.430 | 0.372 | 0.231 | 0.358 | 0.276 |
| Oracle | N/A | SIATEC | 0.288 | 0.732 | 0.410 | 0.536 | 0.361 | 0.418 | 0.105 | 0.443 | 0.165 |
| Oracle | N/A | SIATEC_CS | 0.327 | 0.583 | 0.413 | 0.491 | 0.312 | 0.356 | 0.185 | 0.322 | 0.231 |
| Oracle | N/A | CSA | 0.734 | 0.866 | 0.789 | 0.443 | 0.540 | 0.468 | 0.353 | 0.480 | 0.400 |
Table 5
Motif discovery results on JKU‑PDD. Note that, in this experiment, we use the annotations in the monophonic version as ground‑truth, as they resemble motif annotations better.
| MNID | Setting | RPD | P_est | R_est | F_est | P_occ | R_occ | F_occ | P_thr | R_thr | F_thr |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ✗ | N/A | SIATEC_CS | 0.210 | 0.323 | 0.251 | 0.000 | 0.000 | 0.000 | 0.220 | 0.277 | 0.243 |
| Skyline | N/A | SIATEC_CS | 0.415 | 0.663 | 0.492 | 0.380 | 0.631 | 0.472 | 0.403 | 0.534 | 0.439 |
| CNN | ✗ | SIATEC_CS | 0.446 | 0.626 | 0.511 | 0.410 | 0.653 | 0.501 | 0.420 | 0.548 | 0.460 |
| CNN | PL | SIATEC_CS | 0.418 | 0.670 | 0.492 | 0.395 | 0.665 | 0.494 | 0.387 | 0.536 | 0.426 |
| CNN | MI | SIATEC_CS | 0.433 | 0.707 | 0.520 | 0.502 | 0.762 | 0.601 | 0.403 | 0.577 | 0.454 |
| MidiBERT | ✗ | SIATEC_CS | 0.359 | 0.589 | 0.431 | 0.239 | 0.327 | 0.276 | 0.360 | 0.487 | 0.402 |
| MidiBERT | PL | SIATEC_CS | 0.398 | 0.625 | 0.469 | 0.290 | 0.467 | 0.357 | 0.383 | 0.492 | 0.412 |
| MidiBERT | MI | SIATEC_CS | 0.387 | 0.668 | 0.468 | 0.396 | 0.606 | 0.478 | 0.372 | 0.513 | 0.417 |
| ✗ | N/A | CSA | 0.155 | 0.404 | 0.214 | 0.140 | 0.355 | 0.180 | 0.062 | 0.269 | 0.095 |
| Skyline | N/A | CSA | 0.281 | 0.567 | 0.350 | 0.371 | 0.489 | 0.407 | 0.235 | 0.527 | 0.295 |
| CNN | ✗ | CSA | 0.261 | 0.509 | 0.324 | 0.295 | 0.414 | 0.340 | 0.222 | 0.505 | 0.277 |
| CNN | PL | CSA | 0.251 | 0.476 | 0.306 | 0.273 | 0.392 | 0.310 | 0.222 | 0.462 | 0.271 |
| CNN | MI | CSA | 0.261 | 0.504 | 0.324 | 0.256 | 0.388 | 0.298 | 0.232 | 0.511 | 0.288 |
| MidiBERT | ✗ | CSA | 0.245 | 0.568 | 0.321 | 0.383 | 0.490 | 0.414 | 0.200 | 0.535 | 0.263 |
| MidiBERT | PL | CSA | 0.267 | 0.576 | 0.340 | 0.398 | 0.477 | 0.418 | 0.220 | 0.539 | 0.277 |
| MidiBERT | MI | CSA | 0.253 | 0.496 | 0.315 | 0.249 | 0.412 | 0.309 | 0.212 | 0.480 | 0.263 |
| Oracle | N/A | SIATEC | 0.209 | 0.508 | 0.280 | 0.715 | 0.684 | 0.699 | 0.204 | 0.527 | 0.270 |
| Oracle | N/A | SIATEC_CS | 0.589 | 0.661 | 0.620 | 0.553 | 0.680 | 0.606 | 0.621 | 0.656 | 0.637 |
| Oracle | N/A | CSA | 0.263 | 0.407 | 0.310 | 0.300 | 0.419 | 0.337 | 0.245 | 0.434 | 0.295 |
Table 6
The average number of distinct motifs (per song) of the ground‑truth and RPD algorithms’ predictions on the BPS‑Motif dataset and the JKU‑PDD. The Oracle MNID is employed.
| RPD | BPS‑Motif | JKU‑PDD |
|---|---|---|
| SIATEC | 13365.5 | 2799.8 |
| SIATEC_CS | 34.9 | 15.6 |
| CSA | 27.3 | 43.2 |
| Ground‑truth | 8.2 | 4.6 |
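
A quick way to read this table is to express each algorithm's output as a multiple of the ground‑truth motif count. The short sketch below does that arithmetic using only the values in Table 6; the variable and function names are illustrative.

```python
from typing import Dict

# Average number of distinct motifs per song, copied from Table 6.
ground_truth: Dict[str, float] = {"BPS-Motif": 8.2, "JKU-PDD": 4.6}
predicted: Dict[str, Dict[str, float]] = {
    "SIATEC": {"BPS-Motif": 13365.5, "JKU-PDD": 2799.8},
    "SIATEC_CS": {"BPS-Motif": 34.9, "JKU-PDD": 15.6},
    "CSA": {"BPS-Motif": 27.3, "JKU-PDD": 43.2},
}


def overgeneration_ratios(
    pred: Dict[str, Dict[str, float]], truth: Dict[str, float]
) -> Dict[str, Dict[str, float]]:
    """Ratio of predicted to ground-truth motif counts, per algorithm and dataset."""
    return {
        algo: {dataset: count / truth[dataset] for dataset, count in counts.items()}
        for algo, counts in pred.items()
    }


ratios = overgeneration_ratios(predicted, ground_truth)
for algo, per_dataset in ratios.items():
    print(algo, {name: round(value, 1) for name, value in per_dataset.items()})
```

By this measure, SIATEC returns on the order of a thousand times more distinct patterns than the annotations contain, whereas SIATEC_CS and CSA stay within roughly one order of magnitude.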

Figure 4
Illustration of motif discovery results for Beethoven’s Piano Sonata No. 10 in G Major, 1st movement, mm. 8–18. From top to bottom: the original score, a piano roll with the ground‑truth motif annotation, a piano roll with the results of the common structure algorithm, and a piano roll with the results of MNID (MidiBERT with MI) followed by the common structure algorithm. In the piano rolls, each note is drawn as an onset (a circular dot) and a duration (a horizontal line), and each bar line is drawn as a vertical blue dotted line. Gray notes are non‑motif notes. Motif notes appear in colors other than gray, and notes belonging to the same motif share a color. The notes of each occurrence are enclosed in a translucent gray box, and the motif name is marked at the beginning of each occurrence. For the ground truth, the motif names are lowercase letters following the dataset annotation. For the motif discovery results, the motif names are uppercase letters assigned in alphabetical order by time of first occurrence.

Figure 5
Illustration of motif discovery results for Beethoven’s Piano Sonata No. 16 in G Major, 1st movement, mm. 74–81. See the caption of Figure 4 for details of the notation.
