
Figure 1
Comparison of validation loss when training the same model on a small dataset (red), a large dataset with errors (purple) and the same large dataset once the errors have been corrected (green). All experiments were evaluated on the same validation set.

Figure 2
Statistics collected during our internal data cleaning. Each row is normalized to sum to 1: for example, among all the labeling errors we found in our internal data, a guitar was mislabeled as bass 32% of the time.
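
A minimal sketch of that row normalization, assuming a hypothetical error-count matrix (the values below are made up, not the statistics in the figure):

```python
import numpy as np

# Hypothetical counts of labeling errors: rows are the true instrument,
# columns the (wrong) label it received. Values are illustrative only.
error_counts = np.array([
    [0, 8, 12, 5],
    [16, 0, 21, 13],
    [3, 7, 0, 9],
    [6, 11, 4, 0],
], dtype=float)

# Normalize each row to sum to 1, as in Figure 2.
row_normalized = error_counts / error_counts.sum(axis=1, keepdims=True)
```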

Figure 3
The process of cleaning the stems of one song in the noisy dataset using the proposed robust baseline model. We propose two cleaning methods: filtered and redistributed.
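
A minimal sketch of the two strategies under stated assumptions: `separate` is a hypothetical callable returning one estimate per source, and the dominance threshold of 0.5 is illustrative rather than the baseline's exact rule:

```python
import numpy as np

SOURCES = ["bass", "drums", "other", "vocals"]

def clean_song(stems, model, separate, threshold=0.5, method="filtered"):
    """Clean the labeled stems of one song with a robust baseline model.

    stems: dict mapping a (possibly wrong) label to a waveform.
    separate: hypothetical callable (model, audio) -> dict of per-source
    estimates. The energy criterion and threshold are illustrative.
    """
    cleaned = {source: [] for source in SOURCES}
    for label, audio in stems.items():
        estimates = separate(model, audio)
        energies = {s: float(np.sum(e ** 2)) for s, e in estimates.items()}
        total = sum(energies.values()) + 1e-12
        if method == "filtered":
            # Keep the stem only if its label matches the dominant source.
            if energies[label] / total >= threshold:
                cleaned[label].append(audio)
        else:  # "redistributed"
            # Assign each separated component to its predicted source.
            for source, estimate in estimates.items():
                cleaned[source].append(estimate)
    return cleaned
```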
Table 1
Final LabelNoise leaderboard (models trained only on SDXDB23_LabelNoise; top 5). Mean, Bass, Drums, Other, and Vocals report global SDR (dB); the last two columns report the number of submissions to LabelNoise in each phase.

| Rank | Participant | Prize | Mean | Bass | Drums | Other | Vocals | 1st phase | 2nd phase |
|------|---------------|-------|------|------|-------|-------|--------|-----------|-----------|
| **Submissions** | | | | | | | | | |
| 1 | CCOM | 1st | 7.46 | 8.12 | 7.99 | 5.34 | 8.37 | 7 | 50 |
| 2 | subatomicseer | 2nd | 6.60 | 6.67 | 7.03 | 4.61 | 8.07 | 65 | 33 |
| 3 | kuielab | 3rd | 6.51 | 6.71 | 6.71 | 4.82 | 7.82 | 99 | 25 |
| 4 | aim-less | | 6.44 | 6.75 | 7.19 | 4.56 | 7.28 | 10 | 22 |
| 5 | yang_tong | | 6.33 | 6.29 | 7.46 | 3.94 | 7.65 | - | 2 |
| **Baselines** | | | | | | | | | |
| | UMX | | 3.01 | 3.77 | 2.84 | 1.62 | 3.83 | | |
| | Demucs | | 4.84 | 5.55 | 5.68 | 2.89 | 5.23 | | |
| | MDX-Net | | 3.49 | 4.26 | 2.84 | 2.42 | 4.42 | | |
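
For reference, every SDR figure in these leaderboards is the challenge's global SDR: computed per source over the whole track, then averaged over the four sources of a song and over all songs. With $s$ the reference source, $\hat{s}$ its estimate, and a small $\epsilon$ added here for numerical stability (the challenge definition may differ in this detail):

$$\mathrm{SDR}(s,\hat{s}) = 10\log_{10}\frac{\sum_{n} s(n)^{2} + \epsilon}{\sum_{n}\big(s(n)-\hat{s}(n)\big)^{2} + \epsilon}$$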
Table 2
Final Bleeding leaderboard (models trained only on SDXDB23_Bleeding; top 5). Mean, Bass, Drums, Other, and Vocals report global SDR (dB); the last two columns report the number of submissions to Bleeding in each phase.

| Rank | Participant | Prize | Mean | Bass | Drums | Other | Vocals | 1st phase | 2nd phase |
|------|-----------------|-------|------|------|-------|-------|--------|-----------|-----------|
| **Submissions** | | | | | | | | | |
| 1 | kuielab | 1st | 6.58 | 6.98 | 6.65 | 4.96 | 7.74 | 99 | 13 |
| 2 | ZFTurbo | 2nd | 6.38 | 6.94 | 6.86 | 4.62 | 7.12 | 32 | 4 |
| 3 | subatomicseer | 3rd | 6.31 | 6.33 | 6.86 | 4.59 | 7.47 | 65 | 11 |
| 4 | CCOM | | 6.20 | 6.34 | 6.32 | 4.28 | 7.87 | 7 | 17 |
| 5 | alina_porechina | | 5.87 | 6.01 | 6.10 | 4.09 | 7.30 | 99 | 118 |
| **Baselines** | | | | | | | | | |
| | UMX | | 3.61 | 3.90 | 3.85 | 2.50 | 4.17 | | |
| | Demucs | | 5.33 | 5.90 | 5.56 | 3.69 | 6.19 | | |
| | MDX-Net | | 3.56 | 4.00 | 2.30 | 2.65 | 5.29 | | |
Table 3
Final Standard leaderboard (models trained on any data; top 5). Mean, Bass, Drums, Other, and Vocals report global SDR (dB); the last two columns report the number of submissions to Standard in each phase.

| Rank | Participant | Prize | Mean | Bass | Drums | Other | Vocals | 1st phase | 2nd phase |
|------|------------------|-------|------|------|-------|-------|--------|-----------|-----------|
| **Submissions** | | | | | | | | | |
| 1 | SAMI-ByteDance | | 9.97 | 11.15 | 10.27 | 7.08 | 11.36 | 13 | 5 |
| 2 | ZFTurbo | 1st | 9.26 | 9.94 | 9.53 | 7.05 | 10.51 | 32 | 24 |
| 3 | kimberley_jensen | 2nd | 9.18 | 10.06 | 9.47 | 6.80 | 10.40 | 86 | 134 |
| 4 | kuielab | 3rd | 8.97 | 9.72 | 9.43 | 6.72 | 10.01 | 99 | 54 |
| 5 | alina_porechina | | 8.63 | 9.92 | 9.29 | 6.23 | 9.07 | 99 | 172 |
| **Baselines** | | | | | | | | | |
| | UMX-L | | 6.52 | 6.62 | 6.84 | 4.89 | 7.73 | | |
| | BSRNN | | 6.14 | 5.63 | 6.53 | 4.43 | 7.98 | | |
| | X-UMX-M | | 6.30 | 5.85 | 6.87 | 4.42 | 8.04 | | |
Table 4
Results of our iterative refinement baseline (global SDR, dB). We use a source separation algorithm trained on corrupted data to improve the dataset: training the same model on the improved data increases the separation quality.

| Training data | Mean | Bass | Drums | Other | Vocals |
|---------------|------|------|-------|-------|--------|
| **MoisesDB (203 songs)** | | | | | |
| Original dataset | 4.43 | 4.65 | 5.06 | 3.02 | 5.00 |
| Improved dataset (redistributed) | 4.27 | 4.68 | 4.93 | 2.72 | 4.75 |
| Improved dataset (filtered) | 4.46 | 5.07 | 5.16 | 2.77 | 4.86 |
| **SDXDB23_LabelNoise** | | | | | |
| Original dataset | 3.01 | 3.76 | 2.83 | 1.62 | 3.82 |
| Improved dataset (redistributed) | 3.44 | 4.00 | 3.81 | 1.86 | 4.08 |
| Improved dataset (filtered) | 3.90 | 4.57 | 4.57 | 2.22 | 4.25 |
| **SDXDB23_Bleeding** | | | | | |
| Original dataset | 3.60 | 3.90 | 3.84 | 2.50 | 4.17 |
| Improved dataset (redistributed) | 3.59 | 3.73 | 4.07 | 2.40 | 4.17 |
| Improved dataset (filtered) | 4.09 | 4.65 | 4.76 | 2.52 | 4.44 |
Table 5
(Team ZFTurbo) Separation performance when varying the number of shifts and the overlap ratio during HTDemucs inference (global SDR, dB). Increasing both leads to higher performance, with marginal improvements for very high parameter values.

| Shifts | Overlap Ratio | Mean | Bass | Drums | Other | Vocals |
|--------|---------------|------|-------|-------|-------|--------|
| 2 | 0.5 | 9.43 | 12.15 | 11.35 | 5.81 | 8.40 |
| 4 | 0.75 | 9.47 | 12.22 | 11.40 | 5.84 | 8.41 |
| 1 | 0.95 | 9.48 | 12.24 | 11.41 | 5.84 | 8.43 |
| 10 | 0.95 | 9.49 | 12.25 | 11.41 | 5.85 | 8.43 |
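
The effect of these parameters can be reproduced with the public Demucs inference API. A minimal sketch, assuming a pretrained `htdemucs` checkpoint and a 44.1 kHz stereo `mixture.wav` (neither is necessarily the team's exact setup):

```python
import torch
import torchaudio
from demucs.pretrained import get_model
from demucs.apply import apply_model

model = get_model("htdemucs")  # pretrained Hybrid Transformer Demucs
model.eval()

wav, sr = torchaudio.load("mixture.wav")  # stereo mix at model.samplerate
mix = wav.unsqueeze(0)  # shape (batch, channels, time)

with torch.no_grad():
    # shifts: average predictions over random time shifts;
    # overlap: fraction of overlap between consecutive segments.
    sources = apply_model(model, mix, shifts=10, overlap=0.95, split=True)

for name, source in zip(model.sources, sources[0]):
    torchaudio.save(f"{name}.wav", source.cpu(), sr)
```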
Table 6
(Team ZFTurbo) SDR scores (global SDR, dB) of the final ensemble on our MultiSong dataset (Solovyev et al., 2023) and on MDXDB21. We separately report the scores visible during the competition (computed on 18 songs) and the final scores (computed on all 27 songs).

| Dataset | Mean | Bass | Drums | Other | Vocals |
|---------|------|------|-------|-------|--------|
| MultiSong MVSep | 10.11 | 12.68 | 11.68 | 6.67 | 9.62 |
| MDXDB21 (18 songs) | 9.41 | 9.87 | 9.52 | 7.43 | 10.81 |
| MDXDB21 (27 songs) | 9.25 | 9.94 | 9.53 | 7.05 | 10.51 |
Table 7
(Team subatomicseer) Our scores on the LabelNoise leaderboard. All columns report global SDR (dB).

| Model (Mean Teacher loss) | Mean | Bass | Drums | Other | Vocals |
|---------------------------|------|------|-------|-------|--------|
| WHTDemucs (V1) | 5.93 | 6.41 | 5.73 | 4.42 | 7.17 |
| DTUNet (V2) | 5.93 | 5.84 | 6.71 | 4.10 | 7.08 |
| Blend | 6.60 | 6.70 | 7.03 | 4.61 | 8.07 |
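
The Mean Teacher loss named in the table pairs a student with a teacher whose weights are an exponential moving average of the student's. A minimal sketch of the EMA update (the decay value is illustrative, not necessarily the team's):

```python
import torch

@torch.no_grad()
def update_teacher(student: torch.nn.Module, teacher: torch.nn.Module,
                   ema_decay: float = 0.999) -> None:
    """Mean Teacher update: teacher weights track an exponential moving
    average of the student weights (ema_decay is illustrative)."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(ema_decay).add_(s_param, alpha=1.0 - ema_decay)
```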
Table 8
(Team subatomicseer) Our scores on the Bleeding leaderboard. All columns report global SDR (dB).

| Model (Mean Teacher loss) | Mean | Bass | Drums | Other | Vocals |
|---------------------------|------|------|-------|-------|--------|
| WHTDemucs (V1) | 5.86 | 5.90 | 5.61 | 4.68 | 7.25 |
| DTUNet (V2) | 5.62 | 5.37 | 6.18 | 3.92 | 7.00 |
| Blend | 6.31 | 6.33 | 6.86 | 4.59 | 7.47 |
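
The Blend rows in Tables 7 and 8 combine the outputs of the two models. A minimal sketch, assuming waveform-domain averaging with an illustrative weight (the team's actual blending weights are not given here):

```python
import numpy as np

def blend(estimate_a: np.ndarray, estimate_b: np.ndarray,
          weight_a: float = 0.5) -> np.ndarray:
    """Blend two models' waveform estimates of the same source.

    estimate_a, estimate_b: arrays of shape (channels, time).
    weight_a = 0.5 gives a plain average; per-source weights are
    common, but the value here is illustrative.
    """
    return weight_a * estimate_a + (1.0 - weight_a) * estimate_b
```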
Table 9
(Team subatomicseer) Performance of the individual models of our ensemble on our validation set (global SDR, dB). Note that HTDemucs was trained on more data than our internal models.

| Model (training songs) | Mean | Bass | Drums | Other | Vocals |
|------------------------|------|------|-------|-------|--------|
| DTUNet (347) | 8.79 | 8.75 | 10.65 | 6.76 | 8.99 |
| BSRNN (347) | 8.65 | 8.06 | 10.80 | 6.38 | 9.37 |
| HTDemucs (800) | 9.19 | 9.68 | 10.76 | 7.17 | 9.15 |
Table 10
(Team CCOM) Performance of HTDemucs with our approach (global SDR, dB). The baseline is trained on SDXDB23_LabelNoise; we then train a model using loss truncation only. We use this model to filter the dataset (denoted 1st in the table) and train a new model. Finally, we repeat the dataset filtering (denoted 2nd) and fine-tune the model to obtain the best performance.

| Training setup | Mean | Bass | Drums | Other | Vocals |
|----------------|------|------|-------|-------|--------|
| Baseline | 4.96 | 5.07 | 5.76 | 3.14 | 5.85 |
| With loss truncation | 6.26 | 6.94 | 6.62 | 4.45 | 7.09 |
| With filtered data (1st) | 6.89 | 7.34 | 7.58 | 4.88 | 7.74 |
| With filtered data (2nd) | 7.46 | 8.12 | 7.99 | 5.34 | 8.37 |
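
Loss truncation ignores the highest-loss examples in a batch, on the assumption that unusually high loss signals corrupted training targets (e.g. mislabeled stems). A minimal sketch with an L1 base loss (the drop fraction is illustrative, not the teams' exact value):

```python
import torch

def truncated_l1_loss(pred: torch.Tensor, target: torch.Tensor,
                      drop_frac: float = 0.1) -> torch.Tensor:
    """L1 loss with loss truncation: the highest-loss samples in the
    batch are excluded from the average."""
    # Per-sample loss, averaged over all non-batch dimensions.
    dims = tuple(range(1, pred.dim()))
    per_sample = (pred - target).abs().mean(dim=dims)
    keep = max(1, int(per_sample.numel() * (1.0 - drop_frac)))
    kept, _ = torch.topk(per_sample, keep, largest=False)  # lowest-loss samples
    return kept.mean()
```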
Table 11
(Team kuielab) Ablation study on loss truncation (global SDR, dB). Please note that these are the scores of an individual TFC-TDF-UNet v3 model, not of the final ensemble.

| Task | Loss truncation | Mean | Bass | Drums | Other | Vocals |
|------------|-----|------|------|------|------|------|
| LabelNoise | No | 5.05 | 5.31 | 5.31 | 3.45 | 6.12 |
| LabelNoise | Yes | 6.26 | 6.43 | 6.38 | 4.64 | 7.58 |
| Bleeding | No | 5.80 | 6.11 | 5.86 | 4.36 | 6.87 |
| Bleeding | Yes | 6.22 | 6.58 | 6.20 | 4.69 | 7.41 |
Table 12
(Team kuielab) Comparison of TFC-TDF-UNet v2 and v3 on the MUSDB18-HQ benchmark (global SDR, dB). Speed denotes the relative GPU inference speed with respect to real-time on the challenge evaluation server.

| Model | Mean | Bass | Drums | Other | Vocals | Speed |
|-------|------|------|-------|-------|--------|-------|
| v2 | 7.03 | 6.85 | 6.87 | 5.44 | 8.96 | 12.8x |
| v3 | 7.90 | 7.36 | 8.81 | 6.19 | 9.22 | 15.0x |

Figure 4
Results of the listening test.

Figure 5
Results of the listening test by assessor category.

Figure 6
Results of the listening test on bass removal and extraction.

Figure 7
Results of the listening test on drum removal and extraction.

Figure 8
Results of the listening test on other removal and extraction.

Figure 9
Results of the listening test on vocal removal and extraction.
Table 13
Final ranking obtained with TrueSkill. We used the default parameters for each player (μ = 25 and σ = 8.33). We report the average SDR score on the Standard leaderboard for reference.

| Rank | Model | μ | σ | SDR (Mean) |
|------|-------|---|---|------------|
| 1 | kimberley_jensen | 24.793 | 0.779 | 9.18 |
| 2 | ZFTurbo | 24.362 | 0.779 | 9.26 |
| 3 | SAMI-ByteDance | 24.011 | 0.779 | 9.97 |
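
The ranking can be reproduced with the `trueskill` Python package using the same priors. A minimal sketch; the single pairwise outcome shown is illustrative, not an actual result from the listening test:

```python
import trueskill

# Default TrueSkill priors, as in the paper: mu = 25, sigma = 25/3 ≈ 8.33.
env = trueskill.TrueSkill(mu=25.0, sigma=25.0 / 3)
ratings = {name: env.create_rating()
           for name in ("kimberley_jensen", "ZFTurbo", "SAMI-ByteDance")}

# One pairwise outcome (winner listed first); iterating this over all
# judged pairs yields the final mu and sigma values in Table 13.
winner, loser = "kimberley_jensen", "ZFTurbo"  # illustrative pair
ratings[winner], ratings[loser] = trueskill.rate_1vs1(
    ratings[winner], ratings[loser], env=env)
```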

Figure 10
Correlation between the SDR scores and the results of the listening test.
