
Figure 1
A schematic example of musical structure.

Figure 2
An idealized self-similarity matrix, extracted from Paulus et al. (2010).

Figure 3
Cosine, Autocorrelation and RBF self-similarities for the song POP01 of RWC Pop.
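The three self-similarities compared in Figure 3 can be sketched in NumPy as follows, assuming a barwise feature matrix `X` with one row per bar. The cosine and RBF forms are standard. The centering used in `autocorrelation_ssm` and the default `gamma` heuristic in `rbf_ssm` are our own assumptions, and the paper's exact normalizations may differ.

```python
import numpy as np

def cosine_ssm(X):
    """Cosine self-similarity: A[i, j] = <x_i, x_j> / (||x_i|| * ||x_j||)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.maximum(norms, 1e-10)  # guard against all-zero bars
    return Xn @ Xn.T

def autocorrelation_ssm(X):
    """Assumed definition: cosine similarity after centering each feature over all bars."""
    return cosine_ssm(X - X.mean(axis=0, keepdims=True))

def rbf_ssm(X, gamma=None):
    """RBF self-similarity: A[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    # Pairwise squared Euclidean distances, clipped at 0 against rounding errors.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)
    if gamma is None:
        # Hypothetical heuristic: inverse of the mean off-diagonal squared distance.
        off = d2[~np.eye(len(X), dtype=bool)]
        gamma = 1.0 / max(off.mean(), 1e-10) if off.size else 1.0
    return np.exp(-gamma * d2)
```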
Table 1
Standard boundary-retrieval metrics (see Section 4.1) comparing the downbeat-aligned reference annotations with the original annotations.
| Dataset | Annotation | P0.5s | R0.5s | F0.5s | P3s | R3s | F3s |
|---|---|---|---|---|---|---|---|
| SALAMI | Annotation 1 | 82.47% | 82.14% | 82.30% | 99.94% | 99.56% | 99.74% |
| SALAMI | Annotation 2 | 80.97% | 80.92% | 80.94% | 99.92% | 99.84% | 99.88% |
| RWC Pop | – | 96.46% | 96.21% | 96.33% | 100% | 99.73% | 99.86% |
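Table 1 quantifies how far the reference boundaries move when they are aligned on downbeats. A minimal sketch of such an alignment, assuming each annotated boundary is simply moved to its nearest downbeat (ties going to the earlier one) and that boundaries collapsing onto the same downbeat are merged; the exact rule used for Table 1 may differ. `align_to_downbeats` is a hypothetical helper and needs at least two downbeats.

```python
import numpy as np

def align_to_downbeats(boundaries, downbeats):
    """Move each boundary time (s) to the closest downbeat time (s)."""
    boundaries = np.asarray(boundaries, dtype=float)
    downbeats = np.sort(np.asarray(downbeats, dtype=float))
    # Index of the first downbeat >= each boundary, clipped so that both
    # a left and a right neighbour always exist.
    idx = np.clip(np.searchsorted(downbeats, boundaries), 1, len(downbeats) - 1)
    left, right = downbeats[idx - 1], downbeats[idx]
    # Keep the closer neighbour (ties go to the earlier downbeat).
    snapped = np.where(boundaries - left <= right - boundaries, left, right)
    # Boundaries snapped onto the same downbeat are merged (sorted output).
    return np.unique(snapped)
```

For instance, with downbeats at 0, 2, 4 and 6 s, boundaries at 0.4, 1.1, 3.0, 5.9 and 7.0 s collapse onto the downbeats 0, 2 and 6 s.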

Figure 4
Segmentation results of state-of-the-art algorithms on the SALAMI-test and RWC Pop datasets, for beat-aligned (original) vs. downbeat-aligned boundaries. The SALAMI-test dataset is defined by Ullrich et al. (2014) and introduced in Section 4.2.1.
Table 2
Different time synchronizations for the Foote (2000) algorithm on the SALAMI-test dataset. The SALAMI-test dataset is defined by Ullrich et al. (2014) and introduced in Section 4.2.1.
| Time synchronization | Variant | P0.5s | R0.5s | F0.5s | P3s | R3s | F3s |
|---|---|---|---|---|---|---|---|
| Beat-synchronized | Original | 26.98% | 34.58% | 29.21% | 50.10% | 63.30% | 54.02% |
| Beat-synchronized | Re-aligned on downbeats | 31.05% | 39.15% | 33.33% | 50.08% | 62.95% | 53.78% |
| Bar-synchronized | – | 37.68% | 36.36% | 35.97% | 58.06% | 56.11% | 55.57% |
| Barwise TF Matrix | – | 39.22% | 42.66% | 39.67% | 59.60% | 64.82% | 60.36% |
Table 3
Different time synchronizations for the Foote (2000) algorithm on the RWC Pop dataset.
| Time synchronization | Variant | P0.5s | R0.5s | F0.5s | P3s | R3s | F3s |
|---|---|---|---|---|---|---|---|
| Beat-synchronized | Original | 31.86% | 24.38% | 27.29% | 67.21% | 51.92% | 57.95% |
| Beat-synchronized | Re-aligned on downbeats | 42.30% | 32.82% | 36.52% | 66.67% | 51.44% | 57.44% |
| Bar-synchronized | – | 43.53% | 26.32% | 32.46% | 69.25% | 42.22% | 51.97% |
| Barwise TF Matrix | – | 53.09% | 37.19% | 43.30% | 79.35% | 56.03% | 65.04% |
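The Barwise TF Matrix rows of Tables 2 and 3 represent each bar by a fixed number of time frames. A sketch of that construction, assuming frame-level features with known frame times and downbeat times delimiting the bars. Each bar is sampled at `frames_per_bar` equally spaced instants, each instant is mapped to the first analysis frame at or after it, and the selected frames are flattened into one row. The sampling rule and the default of 96 frames per bar are our assumptions.

```python
import numpy as np

def barwise_tf_matrix(features, frame_times, downbeats, frames_per_bar=96):
    """Build a (n_bars, frames_per_bar * n_features) matrix.

    features: (n_frames, n_features) array; frame_times: sorted (n_frames,)
    array in seconds; downbeats: sorted bar start times in seconds, the last
    one closing the last bar.
    """
    features = np.asarray(features, dtype=float)
    frame_times = np.asarray(frame_times, dtype=float)
    rows = []
    for start, end in zip(downbeats[:-1], downbeats[1:]):
        # Equally spaced instants inside the bar, bar end excluded.
        instants = np.linspace(start, end, frames_per_bar, endpoint=False)
        # First frame at or after each instant, clipped to the last frame.
        idx = np.clip(np.searchsorted(frame_times, instants), 0, len(frame_times) - 1)
        # Flatten the (frames_per_bar, n_features) bar into a single row.
        rows.append(features[idx].ravel())
    return np.vstack(rows)
```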

Figure 5
Example of computing an optimal segmentation with 4 bars.
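Figure 5 works through a 4-bar example. Since a segmentation is determined by which of the B − 1 inner bar boundaries are kept, there are 2^(B−1) candidate segmentations (8 for 4 bars), so exhaustive search is only feasible for tiny inputs. The sketch below enumerates them and keeps the one maximizing the sum of per-segment scores; the `score` function in the usage line is a hypothetical toy, not the paper's.

```python
from itertools import combinations

def all_segmentations(n_bars):
    """Yield every segmentation of bars 0..n_bars-1 as (start, end) pairs, end exclusive."""
    inner = range(1, n_bars)  # candidate boundaries between consecutive bars
    for k in range(n_bars):
        for cuts in combinations(inner, k):
            bounds = [0, *cuts, n_bars]
            yield list(zip(bounds[:-1], bounds[1:]))

def best_segmentation(n_bars, score):
    """Exhaustive search: maximize the sum of score(start, end) over segments."""
    return max(all_segmentations(n_bars),
               key=lambda seg: sum(score(a, b) for a, b in seg))
```

With a toy score rewarding 2-bar segments, `best_segmentation(4, lambda a, b: 1.0 if b - a == 2 else 0.0)` returns `[(0, 2), (2, 4)]`.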

Figure 6
Full kernel of size 10.
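A sketch of the full kernel of Figure 6 and of the block correlation it induces. It assumes the kernel weights every off-diagonal cell by 1 and the main diagonal (each bar compared with itself) by 0, and that the kernel-weighted sum over a segment's block is normalized by the segment size in bars. Both assumptions are ours, and `block_correlation` is a hypothetical helper name.

```python
import numpy as np

def full_kernel(size):
    # Assumed definition: weight 1 off the diagonal, 0 on the
    # (trivially maximal) self-similarity diagonal.
    return np.ones((size, size)) - np.eye(size)

def block_correlation(ssm, start, end, kernel_fn=full_kernel):
    """Kernel-weighted sum of the SSM block for bars start..end-1, divided by its size."""
    size = end - start
    block = ssm[start:end, start:end]
    return float(np.sum(kernel_fn(size) * block)) / size
```

Under these assumptions, `full_kernel(10)` has 90 nonzero cells, and a perfectly homogeneous block of n bars scores n − 1.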

Figure 7
Band kernels, of size 10.
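Band kernels keep only the cells close to the diagonal. A sketch assuming an n-band kernel weights by 1 the n diagonals just above and just below the main one, and by 0 everywhere else (main diagonal included). Under this assumed definition, `band_kernel(size, size - 1)` coincides with the full kernel, and the 7-band kernel of Tables 6 to 8 would be `band_kernel(size, 7)`.

```python
import numpy as np

def band_kernel(size, n_bands):
    """Assumed n-band kernel: weight 1 where 1 <= |i - j| <= n_bands, else 0."""
    i, j = np.indices((size, size))
    dist = np.abs(i - j)  # distance of each cell to the main diagonal
    return ((dist >= 1) & (dist <= n_bands)).astype(float)
```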

Figure 8
Distribution of annotated segment sizes, in number of bars.
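Segment-size distributions such as those of Figures 8, 9 and 11 can be obtained by differencing consecutive boundaries expressed as bar indices. A minimal sketch; the boundary list in the usage note is made up for illustration.

```python
from collections import Counter

def segment_sizes(boundaries):
    """Segment sizes (in bars) from sorted bar-index boundaries, first and last included."""
    return [b - a for a, b in zip(boundaries[:-1], boundaries[1:])]

def size_distribution(boundaries):
    """Histogram of segment sizes, as {size_in_bars: count}, sorted by size."""
    return dict(sorted(Counter(segment_sizes(boundaries)).items()))
```

For example, `size_distribution([0, 8, 16, 20, 28, 36])` gives `{4: 1, 8: 4}`.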
Table 4
Boundary retrieval performance with the different self-similarities on the train dataset (Full kernel, no penalty function).
| Self-similarity | P0bar | R0bar | F0bar | P1bar | R1bar | F1bar |
|---|---|---|---|---|---|---|
| Cosine | 50.83% | 30.82% | 36.77% | 62.80% | 37.72% | 45.19% |
| Autocorrelation | 32.59% | 64.69% | 41.30% | 42.10% | 83.73% | 53.41% |
| RBF | 50.27% | 45.38% | 45.84% | 64.79% | 58.81% | 59.30% |
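The barwise metrics (P0bar, R0bar, F0bar and their 1-bar counterparts) count an estimated boundary as correct when it lies within 0 or 1 bar of a not-yet-matched reference boundary. A sketch of one-to-one matching under a symmetric tolerance; the same function applies to times in seconds with tolerances of 0.5 s and 3 s. It counts every boundary it receives, including the first and last, so these should be removed beforehand if they are to be ignored. Dataset-level values in the tables are presumably averages of per-song scores, which would explain why the reported F-measures differ from the harmonic means of the reported precisions and recalls.

```python
def boundary_prf(reference, estimated, tolerance):
    """Precision, recall and F-measure of boundary retrieval.

    A reference and an estimated boundary may be matched if they differ by at
    most `tolerance`; each boundary is matched at most once.
    """
    ref, est = sorted(reference), sorted(estimated)
    i = j = hits = 0
    # Two-pointer greedy matching on sorted lists: with a fixed symmetric
    # window, this yields a maximum one-to-one matching.
    while i < len(ref) and j < len(est):
        if abs(ref[i] - est[j]) <= tolerance:
            hits += 1
            i += 1
            j += 1
        elif est[j] < ref[i]:
            j += 1  # this estimate is too early for every remaining reference
        else:
            i += 1  # this reference is too early for every remaining estimate
    precision = hits / len(est) if est else 0.0
    recall = hits / len(ref) if ref else 0.0
    f = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f
```

With reference boundaries `[0, 8, 16, 24]` and estimates `[0, 9, 17, 30]`, a 0-bar tolerance gives P = R = F = 0.25, and a 1-bar tolerance gives P = R = F = 0.75.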
Table 5
Boundary retrieval performance with the RBF self-similarity, on both test datasets (Full kernel, no penalty function).
| Dataset | P0bar | R0bar | F0bar | P1bar | R1bar | F1bar |
|---|---|---|---|---|---|---|
| SALAMI – test | 48.52% | 48.65% | 46.68% | 62.76% | 63.09% | 60.51% |
| RWC Pop | 60.72% | 53.61% | 56.01% | 77.68% | 67.62% | 71.09% |

Figure 9
Distribution of estimated segment sizes with the full kernel, for each self-similarity matrix. Results on the SALAMI-train dataset.

Figure 10
Boundary retrieval performance (F-measures only) for the full kernel and for band kernels with different numbers of bands. Results on the train dataset with RBF self-similarity matrices.

Figure 11
Distribution of estimated segment sizes for different kernels, on the train dataset.
Table 6
Boundary retrieval performance with the 7-band kernel, on both test datasets (RBF self-similarity, no penalty function).
| Dataset | P0bar | R0bar | F0bar | P1bar | R1bar | F1bar |
|---|---|---|---|---|---|---|
| SALAMI – test | 37.24% | 59.80% | 44.33% | 50.38% | 80.52% | 59.88% |
| RWC Pop | 59.41% | 68.19% | 62.82% | 75.53% | 86.56% | 79.81% |
Table 7
Boundary retrieval performance depending on the penalty function, for the SALAMI-train dataset, with the RBF self-similarity and the 7-band kernel.
| Penalty function | α | Best λ | P0bar | R0bar | F0bar | P1bar | R1bar | F1bar |
|---|---|---|---|---|---|---|---|---|
| Without penalty | – | – | 40.26% | 57.38% | 45.81% | 54.26% | 77.67% | 61.81% |
| Target deviation | | 0.01 | 40.38% | 57.36% | 45.88% | 54.37% | 77.57% | 61.84% |
| Target deviation | 1 | 0.01 | 40.45% | 56.98% | 45.81% | 54.61% | 77.20% | 61.89% |
| Target deviation | 2 | 0.01 | 39.75% | 54.32% | 44.43% | 54.93% | 75.31% | 61.46% |
| Modulo 8 | – | 0.04 | 41.04% | 58.34% | 46.63% | 54.25% | 77.44% | 61.72% |
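The penalty functions of Table 7 presumably enter the score of a segment of n bars as a subtracted term λ·p(n), with λ the value reported in the "Best λ" column. The sketch below assumes the target deviation takes the form |n − 8|^α. The `modulo_penalty` weights (0 for 8 bars, then increasing for other multiples of 4, multiples of 2, and all remaining sizes) are hypothetical values chosen for illustration, not the paper's definition.

```python
def target_deviation_penalty(size, alpha=1.0, target=8):
    """Assumed form: |size - target| ** alpha, with sizes in bars."""
    return abs(size - target) ** alpha

def modulo_penalty(size):
    """Hypothetical 'modulo 8' penalty; the weights below are illustrative assumptions."""
    if size == 8:
        return 0.0
    if size % 4 == 0:
        return 0.25
    if size % 2 == 0:
        return 0.5
    return 1.0

def penalized_score(correlation, size, penalty, lam):
    """Segment score: block correlation minus lam times the size penalty."""
    return correlation - lam * penalty(size)
```

For instance, with λ = 0.04 a 12-bar segment loses 0.01 under this hypothetical modulo penalty, while an 8-bar segment keeps its full correlation.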
Table 8
Boundary retrieval performance with the modulo 8 penalty function (λ = 0.04), on both test datasets (RBF self-similarity, 7-band kernel).
| Dataset | P0bar | R0bar | F0bar | P1bar | R1bar | F1bar |
|---|---|---|---|---|---|---|
| SALAMI – test | 38.36% | 60.96% | 45.44% | 50.76% | 80.51% | 60.09% |
| RWC Pop | 62.11% | 70.05% | 65.17% | 77.35% | 86.95% | 81.02% |
Table 9
Boundary retrieval performance, comparing the F-measures with tolerance expressed barwise and in absolute time.
| Dataset | F0bar | F0.5s | F1bar | F3s |
|---|---|---|---|---|
| SALAMI-test | 45.44% | 42.00% | 60.09% | 60.61% |
| RWC Pop | 65.17% | 64.44% | 81.02% | 80.64% |

Figure 12
Boundary retrieval performance of the CBM algorithm on the SALAMI dataset, compared to state-of-the-art algorithms. Hatched bars correspond to supervised algorithms. A star (*) marks algorithms evaluated on a subset that differs from ours, which prevents an exact comparison.

Figure 13
Boundary retrieval performance of the CBM algorithm on the RWC Pop dataset, compared to state-of-the-art algorithms. Hatched bars correspond to supervised algorithms.
Table 10
CBM algorithm, performed on the Barwise TF matrix vs. the Beatwise TF matrix, on the SALAMI-test dataset. For a fairer comparison, results at both scales are computed without a penalty function.
| SALAMI | P0.5s | R0.5s | F0.5s | P3s | R3s | F3s |
|---|---|---|---|---|---|---|
| Beatwise (cosine, 63-band kernel) | 35.90% | 41.61% | 37.36% | 55.75% | 64.52% | 58.03% |
| Barwise (RBF, 7-band kernel) | 34.49% | 54.56% | 41.04% | 50.70% | 80.78% | 60.51% |
Table 11
CBM algorithm, performed on the Barwise TF matrix vs. the Beatwise TF matrix, on RWC Pop. For a fairer comparison, results at both scales are computed without a penalty function.
| RWC Pop | P0.5s | R0.5s | F0.5s | P3s | R3s | F3s |
|---|---|---|---|---|---|---|
| Beatwise (cosine, 63-band kernel) | 46.22% | 44.38% | 44.57% | 72.54% | 68.85% | 69.51% |
| Barwise (RBF, 7-band kernel) | 59.09% | 67.13% | 62.28% | 75.17% | 85.90% | 79.47% |
Algorithm 1
CBM algorithm, computing the optimal segmentation given a score function u.
Input: bars {b_k ∈ ⟦1, B⟧}, score function u
Output: optimal segmentation Z* = {ζ_i}
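A dynamic-programming sketch of the kind of optimal segmentation Algorithm 1 computes, under stated assumptions. Segments are half-open bar ranges; the score u of a segment is its kernel-weighted block sum divided by its size, minus `lam` times an optional size penalty; and segment length is capped by a hypothetical `max_size` (32 bars by default) to bound the cost. Because the total score is additive over segments, the best segmentation of the first `end` bars extends the best segmentation of some shorter prefix by exactly one segment, so only O(B · max_size) candidate segments need to be scored.

```python
import numpy as np

def cbm_segmentation(ssm, kernel_fn, penalty_fn=None, lam=0.0, max_size=32):
    """Return optimal boundaries (bar indices, 0 and B included) for a (B, B) SSM."""
    B = ssm.shape[0]
    best = np.full(B + 1, -np.inf)  # best[b]: best total score of bars [0, b)
    best[0] = 0.0
    prev = np.zeros(B + 1, dtype=int)  # start of the last segment ending at b
    for end in range(1, B + 1):
        for start in range(max(0, end - max_size), end):
            size = end - start
            block = ssm[start:end, start:end]
            # Assumed segment score: kernel-weighted block sum over the size.
            u = np.sum(kernel_fn(size) * block) / size
            if penalty_fn is not None:
                u -= lam * penalty_fn(size)
            if best[start] + u > best[end]:
                best[end] = best[start] + u
                prev[end] = start
    # Backtrack from the last bar to recover the boundaries.
    bounds = [B]
    while bounds[-1] > 0:
        bounds.append(int(prev[bounds[-1]]))
    return bounds[::-1]
```

On a block-diagonal SSM made of two homogeneous 4-bar blocks, with the kernel `lambda n: np.ones((n, n)) - np.eye(n)` and no penalty, the sketch returns `[0, 4, 8]`.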

