Barwise Music Structure Analysis with the Correlation Block-Matching Segmentation Algorithm

Axel Marmoret; Jérémy E. Cohen; Frédéric Bimbot

doi:10.5334/tismir.167

Transactions of the International Society for Music Information Retrieval

Volume 6 (2023): Issue 1

Open Access

Nov 2023

Figures & Tables

Figure 1

A schematic example of musical structure.

Figure 2

An idealized self-similarity matrix, extracted from Paulus et al. (2010).

Figure 3

Cosine, autocorrelation and RBF self-similarity matrices for the song *POP01* of RWC Pop.
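The three self-similarities compared in this figure can be sketched as follows. This is a minimal illustration in plain Python, assuming one feature vector per bar: cosine similarity on unit-normalized rows, autocorrelation taken here as the cosine similarity of mean-centred rows, and an RBF (Gaussian) kernel exp(-γ‖x_i − x_j‖²) on unit-normalized rows. The function names, the bandwidth `gamma` and its default value are hypothetical; how the bandwidth is set in the paper is not reproduced.

```python
import math


def _normalize(v):
    """Scale a vector to unit Euclidean norm (all-zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def _center(v):
    """Subtract the mean of the vector from each of its coefficients."""
    mean = sum(v) / len(v)
    return [x - mean for x in v]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_ssm(rows):
    """Cosine similarity between every pair of rows (one row per bar)."""
    z = [_normalize(r) for r in rows]
    return [[_dot(a, b) for b in z] for a in z]


def autocorrelation_ssm(rows):
    """Cosine similarity of mean-centred rows, i.e. the Pearson correlation."""
    z = [_normalize(_center(r)) for r in rows]
    return [[_dot(a, b) for b in z] for a in z]


def rbf_ssm(rows, gamma=1.0):
    """Gaussian similarity exp(-gamma * ||a - b||^2) between unit-normalized rows.
    gamma=1.0 is a hypothetical default, not the paper's bandwidth setting."""
    z = [_normalize(r) for r in rows]
    return [[math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b))) for b in z]
            for a in z]
```

In this sketch an all-zero row keeps a zero norm, so its cosine self-similarity is 0 rather than 1; the RBF matrix always has ones on its diagonal.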

Table 1

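Standard metrics (see Section 4.1) obtained when the reference annotations are aligned on the downbeats and evaluated against the original annotations. P, R and F denote precision, recall and F-measure, with a tolerance of 0.5 s or 3 s.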

Dataset	Annotation	P_0.5s	R_0.5s	F_0.5s	P_3s	R_3s	F_3s
SALAMI	Annotation 1	82.47%	82.14%	82.30%	99.94%	99.56%	99.74%
SALAMI	Annotation 2	80.97%	80.92%	80.94%	99.92%	99.84%	99.88%
RWC Pop	–	96.46%	96.21%	96.33%	100%	99.73%	99.86%
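Re-aligning annotations on downbeats, as evaluated in this table, amounts to moving each boundary to its nearest downbeat. A minimal sketch, assuming boundary and downbeat times in seconds and a sorted list of downbeats (for instance from a downbeat tracker). The function name is hypothetical; resolving ties towards the earlier downbeat and merging boundaries that collapse onto the same downbeat are choices of this sketch rather than details taken from the paper.

```python
from bisect import bisect_left


def align_to_downbeats(boundaries, downbeats):
    """Move each boundary time to the closest downbeat time.
    `downbeats` must be sorted in increasing order."""
    aligned = set()
    for t in boundaries:
        i = bisect_left(downbeats, t)
        # Candidates: the downbeat just before t and the one at or just after t
        candidates = downbeats[max(i - 1, 0): i + 1]
        aligned.add(min(candidates, key=lambda d: abs(d - t)))
    # Boundaries snapped to the same downbeat are merged into one
    return sorted(aligned)
```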

Figure 4

Segmentation results of state-of-the-art algorithms on the SALAMI-test and RWC Pop datasets, for beat-aligned (original) vs. downbeat-aligned boundaries. The SALAMI-test dataset is defined by Ullrich et al. (2014) and introduced in Section 4.2.1.

Table 2

Different time synchronizations for the Foote (2000) algorithm on the SALAMI-test dataset. The SALAMI-test dataset is defined by Ullrich et al. (2014) and introduced in Section 4.2.1.

Time synchronization	Variant	P_0.5s	R_0.5s	F_0.5s	P_3s	R_3s	F_3s
Beat-synchronized	Original	26.98%	34.58%	29.21%	50.10%	63.30%	54.02%
Beat-synchronized	Re-aligned on downbeats	31.05%	39.15%	33.33%	50.08%	62.95%	53.78%
Bar-synchronized	–	37.68%	36.36%	35.97%	58.06%	56.11%	55.57%
Barwise TF Matrix	–	39.22%	42.66%	39.67%	59.60%	64.82%	60.36%
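For reference, the novelty-based approach of Foote (2000) slides a checkerboard kernel along the main diagonal of a self-similarity matrix and takes peaks of the resulting novelty curve as boundaries. The sketch below is a simplified version, assuming an untapered ±1 checkerboard (Foote also proposes a Gaussian-tapered kernel), zero padding at the matrix edges and naive local-maximum peak picking; the function names and the `half` parameter are hypothetical. It applies unchanged to a beat-synchronized or a bar-synchronized matrix: only the time unit of one row changes.

```python
def checkerboard_kernel(half):
    """Untapered checkerboard of size 2*half: +1 on the two diagonal quadrants
    (within-segment similarity), -1 on the two off-diagonal quadrants."""
    size = 2 * half
    return [[1.0 if (i < half) == (j < half) else -1.0 for j in range(size)]
            for i in range(size)]


def foote_novelty(ssm, half):
    """Correlate the checkerboard kernel along the main diagonal of `ssm`.
    Entries falling outside the matrix are treated as zeros."""
    n = len(ssm)
    kernel = checkerboard_kernel(half)
    novelty = []
    for t in range(n):
        score = 0.0
        for i in range(2 * half):
            for j in range(2 * half):
                a, b = t - half + i, t - half + j
                if 0 <= a < n and 0 <= b < n:
                    score += kernel[i][j] * ssm[a][b]
        novelty.append(score)
    return novelty


def pick_peaks(novelty):
    """Interior strictly positive local maxima of the novelty curve."""
    return [t for t in range(1, len(novelty) - 1)
            if novelty[t] > 0
            and novelty[t] > novelty[t - 1]
            and novelty[t] >= novelty[t + 1]]
```

On a toy matrix made of two homogeneous 4-frame blocks, with `half=2`, the only interior peak is at frame 4, the first frame of the second block.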

Table 3

Different time synchronizations for the Foote (2000) algorithm on the RWC Pop dataset.

Time synchronization	Variant	P_0.5s	R_0.5s	F_0.5s	P_3s	R_3s	F_3s
Beat-synchronized	Original	31.86%	24.38%	27.29%	67.21%	51.92%	57.95%
Beat-synchronized	Re-aligned on downbeats	42.30%	32.82%	36.52%	66.67%	51.44%	57.44%
Bar-synchronized	–	43.53%	26.32%	32.46%	69.25%	42.22%	51.97%
Barwise TF Matrix	–	53.09%	37.19%	43.30%	79.35%	56.03%	65.04%
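The barwise TF matrix of the last row gives each bar the same number of time frames and stacks one vectorized bar per row, so that self-similarities are computed between bars. A minimal sketch, assuming frame-level features with known timestamps and a list of downbeat times, where each bar is sampled at `subdivision` equally spaced instants by nearest-frame lookup. The sampling scheme, the function name and the parameter names are assumptions of this illustration, not the exact procedure of the paper.

```python
from bisect import bisect_left


def barwise_tf_matrix(features, frame_times, downbeats, subdivision):
    """One row per bar [downbeats[k], downbeats[k+1]): the feature frames nearest to
    `subdivision` equally spaced instants in the bar, concatenated.
    features: list of frames (lists of numbers); frame_times: sorted, same length."""
    rows = []
    for start, end in zip(downbeats[:-1], downbeats[1:]):
        row = []
        for s in range(subdivision):
            t = start + (end - start) * s / subdivision
            i = bisect_left(frame_times, t)
            # Keep whichever of frames i-1 and i is closer to t
            candidates = [c for c in (i - 1, i) if 0 <= c < len(frame_times)]
            nearest = min(candidates, key=lambda c: abs(frame_times[c] - t))
            row.extend(features[nearest])
        rows.append(row)
    return rows
```

Every row then has length `subdivision` × (feature dimension), whatever the tempo of each bar, which is what makes bars directly comparable.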

Example of computing an optimal segmentation with 4 bars.
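With B bars, each of the B − 1 inner bar lines is either a boundary or not, so there are 2^(B−1) candidate segmentations, that is 8 for a 4-bar example. The exhaustive search below makes this explicit; it is only practical for very small B, and the function names and the toy score `toy_u` are hypothetical.

```python
from itertools import product


def all_segmentations(n_bars):
    """Yield every split of bars 0..n_bars-1 into consecutive segments,
    each segment given as an inclusive (first_bar, last_bar) pair."""
    # One boolean per inner bar line: True means "a boundary after this bar"
    for cuts in product((False, True), repeat=n_bars - 1):
        segments, start = [], 0
        for bar, cut in enumerate(cuts):
            if cut:
                segments.append((start, bar))
                start = bar + 1
        segments.append((start, n_bars - 1))
        yield segments


def best_segmentation_bruteforce(n_bars, u):
    """Return the segmentation maximizing the sum of u(first, last) over its segments."""
    return max(all_segmentations(n_bars),
               key=lambda segs: sum(u(a, b) for a, b in segs))


def toy_u(first, last):
    """Hypothetical score rewarding exactly the segments (0, 1) and (2, 3)."""
    return 1.0 if (first, last) in {(0, 1), (2, 3)} else 0.0


if __name__ == "__main__":
    print(len(list(all_segmentations(4))))         # 8
    print(best_segmentation_bruteforce(4, toy_u))  # [(0, 1), (2, 3)]
```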

Distribution of annotated segment sizes, in number of bars.
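Such a distribution can be obtained by expressing each annotated boundary as a bar index and counting the number of bars between consecutive boundaries. A minimal sketch, assuming boundary times that include the start and end of the song and a list of downbeat times that also contains the final bar line; boundaries are mapped to their nearest downbeat, and the function names are hypothetical.

```python
from collections import Counter


def boundaries_to_bar_indices(boundaries, downbeats):
    """Map each boundary time (s) to the index of its nearest downbeat, without duplicates."""
    return sorted({min(range(len(downbeats)), key=lambda k: abs(downbeats[k] - t))
                   for t in boundaries})


def segment_size_histogram(boundaries, downbeats):
    """Count how many segments span each number of bars."""
    bars = boundaries_to_bar_indices(boundaries, downbeats)
    sizes = [b - a for a, b in zip(bars[:-1], bars[1:])]
    return Counter(sizes)
```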

Table 4

Boundary retrieval performance with the different self-similarities on the train dataset (Full kernel, no penalty function).

Self-similarity	P_0bar	R_0bar	F_0bar	P_1bar	R_1bar	F_1bar
Cosine	50.83%	30.82%	36.77%	62.80%	37.72%	45.19%
Autocorrelation	32.59%	64.69%	41.30%	42.10%	83.73%	53.41%
RBF	50.27%	45.38%	45.84%	64.79%	58.81%	59.30%
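The barwise scores P_0bar to F_1bar count an estimated boundary as correct when it falls on the same bar as a reference boundary (0 bar) or at most one bar away (1 bar). A minimal sketch for a single piece, assuming boundaries expressed as bar indices and one-to-one matching between reference and estimated boundaries; the function name is hypothetical. Because all tolerance windows have the same width, the greedy left-to-right matching below finds a maximum matching. The same function can be applied to times in seconds with a 0.5 s or 3 s tolerance; mir_eval's `segment.detection` implements that time-based metric, taking segment intervals rather than boundary lists.

```python
def hit_rate(reference, estimated, tolerance):
    """Precision, recall and F-measure of boundary retrieval for one piece.
    A reference and an estimated boundary match if they differ by at most
    `tolerance`; each boundary can be matched at most once."""
    ref, est = sorted(reference), sorted(estimated)
    hits, j = 0, 0
    for r in ref:
        # Skip estimates too early to match this reference (or any later one)
        while j < len(est) and est[j] < r - tolerance:
            j += 1
        if j < len(est) and est[j] <= r + tolerance:
            hits += 1
            j += 1
    precision = hits / len(est) if est else 0.0
    recall = hits / len(ref) if ref else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
    return precision, recall, f
```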

Table 5

Boundary retrieval performance with the RBF self-similarity, on both test datasets (Full kernel, no penalty function).

Dataset	P_0bar	R_0bar	F_0bar	P_1bar	R_1bar	F_1bar
SALAMI-test	48.52%	48.65%	46.68%	62.76%	63.09%	60.51%
RWC Pop	60.72%	53.61%	56.01%	77.68%	67.62%	71.09%

Figure 9

Distribution of estimated segment sizes with the full kernel, for each self-similarity matrix. Results on the SALAMI-train dataset.

Figure 10

Boundary retrieval performance (F-measures only) for the full kernel and for band kernels with different numbers of bands. Results on the train dataset with RBF self-similarity matrices.
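The block score used by the CBM algorithm weights the diagonal block of the self-similarity matrix covering a candidate segment with a kernel. A minimal sketch under our own assumptions: the full kernel is taken as all ones except a zero main diagonal, an n-band kernel keeps only the entries at distance 1 to n from the diagonal, and the weighted sum is normalized by the segment size. The exact kernel definitions and normalization of the paper may differ, and the function names are hypothetical.

```python
def full_kernel(size):
    """All-ones kernel with a zero main diagonal (a bar's similarity with itself is ignored)."""
    return [[0.0 if i == j else 1.0 for j in range(size)] for i in range(size)]


def band_kernel(size, n_bands):
    """Ones only on the n_bands sub- and super-diagonals closest to the main diagonal."""
    return [[1.0 if 1 <= abs(i - j) <= n_bands else 0.0 for j in range(size)]
            for i in range(size)]


def block_score(ssm, first, last, kernel_fn):
    """Score of segment [first, last] (inclusive bar indices): kernel-weighted sum of the
    corresponding diagonal block of the SSM, divided by the segment size."""
    n = last - first + 1
    k = kernel_fn(n)
    total = sum(k[i][j] * ssm[first + i][first + j] for i in range(n) for j in range(n))
    return total / n
```

On a matrix of ones, this sketch's full-kernel score equals n − 1 for a segment of n bars and keeps growing with n, whereas an `n_bands`-band kernel's score stays below 2 · `n_bands`; this illustrates how the choice of kernel can shift the preferred segment sizes.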

Figure 11

Distribution of estimated segment sizes, according to different kernels, on the train dataset.

Table 6

Boundary retrieval performance with the 7-band kernel, on both test datasets (RBF self-similarity, no penalty function).

Dataset	P_0bar	R_0bar	F_0bar	P_1bar	R_1bar	F_1bar
SALAMI-test	37.24%	59.80%	44.33%	50.38%	80.52%	59.88%
RWC Pop	59.41%	68.19%	62.82%	75.53%	86.56%	79.81%

Table 7

Boundary retrieval performance depending on the penalty function, for the SALAMI-train dataset, with the RBF self-similarity and the 7-band kernel.

Penalty function	Parameter	Best λ	P_0bar	R_0bar	F_0bar	P_1bar	R_1bar	F_1bar
Without penalty	–	–	40.26%	57.38%	45.81%	54.26%	77.67%	61.81%
Target deviation	α = 1/2	0.01	40.38%	57.36%	45.88%	54.37%	77.57%	61.84%
Target deviation	α = 1	0.01	40.45%	56.98%	45.81%	54.61%	77.20%	61.89%
Target deviation	α = 2	0.01	39.75%	54.32%	44.43%	54.93%	75.31%	61.46%
Modulo 8	–	0.04	41.04%	58.34%	46.63%	54.25%	77.44%	61.72%
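The penalty functions compared above depend on the segment size n and are weighted by λ. A minimal sketch, assuming a target size of 8 bars for the target-deviation penalty, |n − 8|^α, and illustrative values for the modulo 8 penalty (0 for 8 bars, then increasing for other multiples of 4, other even sizes and odd sizes). These values, the function names, and the plain subtraction of λ·p(n) from the block score are assumptions of this sketch, not formulas quoted from the paper.

```python
def target_deviation_penalty(n, alpha, target=8):
    """|n - target| ** alpha: grows as the segment size n departs from the target.
    target=8 is an assumption of this sketch."""
    return abs(n - target) ** alpha


def modulo8_penalty(n):
    """Assumed illustrative values: 0 for 8 bars, 1/4 for other multiples of 4,
    1/2 for other even sizes and 1 for odd sizes."""
    if n == 8:
        return 0.0
    if n % 4 == 0:
        return 0.25
    if n % 2 == 0:
        return 0.5
    return 1.0


def penalized_score(raw_score, n, lam, penalty):
    """Block score of an n-bar segment minus lam times its size penalty (hypothetical form)."""
    return raw_score - lam * penalty(n)
```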

Table 8

Boundary retrieval performance with the modulo 8 penalty function (λ = 0.04), on both test datasets (RBF self-similarity, 7-band kernel).

Dataset	P_0bar	R_0bar	F_0bar	P_1bar	R_1bar	F_1bar
SALAMI-test	38.36%	60.96%	45.44%	50.76%	80.51%	60.09%
RWC Pop	62.11%	70.05%	65.17%	77.35%	86.95%	81.02%

Table 9

Boundary retrieval performance, comparing the F-measures with tolerance expressed barwise and in absolute time.

Dataset	F_0bar	F_0.5s	F_1bar	F_3s
SALAMI-test	45.44%	42.00%	60.09%	60.61%
RWC Pop	65.17%	64.44%	81.02%	80.64%

Figure 12

Boundary retrieval performance of the CBM algorithm on the SALAMI dataset, compared to state-of-the-art algorithms. Hatched bars correspond to supervised algorithms. An asterisk (*) marks algorithms evaluated on a subset that differs from ours, which prevents an exact comparison.

Figure 13

Boundary retrieval performance of the CBM algorithm on the RWC Pop dataset, compared to state-of-the-art algorithms. Hatched bars correspond to supervised algorithms.

Table 10

CBM algorithm, performed on the barwise TF matrix vs. the beatwise TF matrix, on the SALAMI-test dataset. For a fairer comparison, results at both scales are computed without a penalty function.

SALAMI-test	P_0.5s	R_0.5s	F_0.5s	P_3s	R_3s	F_3s
Beatwise (cosine, 63-band kernel)	35.90%	41.61%	37.36%	55.75%	64.52%	58.03%
Barwise (RBF, 7-band kernel)	34.49%	54.56%	41.04%	50.70%	80.78%	60.51%

Table 11

CBM algorithm, performed on the barwise TF matrix vs. the beatwise TF matrix, on RWC Pop. For a fairer comparison, results at both scales are computed without a penalty function.

RWC Pop	P_0.5s	R_0.5s	F_0.5s	P_3s	R_3s	F_3s
Beatwise (cosine, 63-band kernel)	46.22%	44.38%	44.57%	72.54%	68.85%	69.51%
Barwise (RBF, 7-band kernel)	59.09%	67.13%	62.28%	75.17%	85.90%	79.47%

Algorithm 1

CBM algorithm, computing the optimal segmentation given a score function U().

Input: Bars {b_k ∈ ⟦1, B⟧}, score function u
Output: Optimal segmentation Z* = {ζ_i}
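Since the total score of a segmentation is a sum of per-segment scores, the optimal segmentation of Algorithm 1 can be computed by dynamic programming. A minimal sketch, assuming segments indexed by inclusive bar indices (0-based here, whereas the algorithm indexes bars from 1 to B) and an arbitrary score function `u(first, last)`; the optional `max_size` cap on segment length, the function name and the toy score are hypothetical. It evaluates O(B²) segment scores (O(B · max_size) with the cap) instead of the 2^(B−1) segmentations of an exhaustive search.

```python
def cbm_dynamic_programming(n_bars, u, max_size=None):
    """Split bars 0..n_bars-1 into consecutive segments maximizing the sum of
    u(first, last) over segments (bar indices inclusive, 0-based).
    Returns (segments, total_score)."""
    # best[e]: best total score for bars 0..e-1; back[e]: first bar of the last segment
    best = [0.0] + [float("-inf")] * n_bars
    back = [0] * (n_bars + 1)
    for end in range(1, n_bars + 1):
        first_start = 0 if max_size is None else max(0, end - max_size)
        for start in range(first_start, end):
            candidate = best[start] + u(start, end - 1)
            if candidate > best[end]:
                best[end], back[end] = candidate, start
    # Backtrack from the last bar to recover the segments
    segments, end = [], n_bars
    while end > 0:
        start = back[end]
        segments.append((start, end - 1))
        end = start
    return segments[::-1], best[n_bars]


def toy_u(first, last):
    """Hypothetical score rewarding exactly the segments (0, 1) and (2, 3)."""
    return 1.0 if (first, last) in {(0, 1), (2, 3)} else 0.0


if __name__ == "__main__":
    print(cbm_dynamic_programming(4, toy_u))  # ([(0, 1), (2, 3)], 2.0)
```

In practice `u` would be a kernel-weighted block score of the barwise self-similarity matrix, possibly minus a size penalty.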

DOI: https://doi.org/10.5334/tismir.167 | Journal eISSN: 2514-3298

Language: English

Submitted on: Mar 30, 2023

Accepted on: Nov 2, 2023

Published on: Nov 30, 2023

Published by: Ubiquity Press

In partnership with: Paradigm Publishing Services

Publication frequency: 1 issue per year

Keywords: Music Structure Analysis, Audio Signals, Barwise Music Processing, Self-Similarity Matrix Segmentation

© 2023 Axel Marmoret, Jérémy E. Cohen, Frédéric Bimbot, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.
