
The Sound Demixing Challenge 2023 – Music Demixing Track
By: Giorgio Fabbro, Stefan Uhlich, Chieh-Hsin Lai, Woosung Choi, Marco Martínez-Ramírez, Weihsiang Liao, Igor Gadelha, Geraldo Ramos, Eddie Hsu, Hugo Rodrigues, Fabian-Robert Stöter, Alexandre Défossez, Yi Luo, Jianwei Yu, Dipam Chakraborty, Sharada Mohanty, Roman Solovyev, Alexander Stempkovskiy, Tatiana Habruseva, Nabarun Goswami, Tatsuya Harada, Minseok Kim, Jun Hyung Lee, Yuanliang Dong, Xinran Zhang, Jiafeng Liu and Yuki Mitsufuji
DOI: https://doi.org/10.5334/tismir.171 | Journal eISSN: 2514-3298
Language: English
Submitted on: Aug 22, 2023
Accepted on: Feb 13, 2024
Published on: Apr 18, 2024
Published by: Ubiquity Press
© 2024 Giorgio Fabbro, Stefan Uhlich, Chieh-Hsin Lai, Woosung Choi, Marco Martínez-Ramírez, Weihsiang Liao, Igor Gadelha, Geraldo Ramos, Eddie Hsu, Hugo Rodrigues, Fabian-Robert Stöter, Alexandre Défossez, Yi Luo, Jianwei Yu, Dipam Chakraborty, Sharada Mohanty, Roman Solovyev, Alexander Stempkovskiy, Tatiana Habruseva, Nabarun Goswami, Tatsuya Harada, Minseok Kim, Jun Hyung Lee, Yuanliang Dong, Xinran Zhang, Jiafeng Liu, Yuki Mitsufuji, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.