
The Sound Demixing Challenge 2023 – Music Demixing Track
By: Giorgio Fabbro, Stefan Uhlich, Chieh-Hsin Lai, Woosung Choi, Marco Martínez-Ramírez, Weihsiang Liao, Igor Gadelha, Geraldo Ramos, Eddie Hsu, Hugo Rodrigues, Fabian-Robert Stöter, Alexandre Défossez, Yi Luo, Jianwei Yu, Dipam Chakraborty, Sharada Mohanty, Roman Solovyev, Alexander Stempkovskiy, Tatiana Habruseva, Nabarun Goswami, Tatsuya Harada, Minseok Kim, Jun Hyung Lee, Yuanliang Dong, Xinran Zhang, Jiafeng Liu and Yuki Mitsufuji
DOI: https://doi.org/10.5334/tismir.171 | Journal eISSN: 2514-3298
Language: English
Submitted on: Aug 22, 2023
Accepted on: Feb 13, 2024
Published on: Apr 18, 2024
Published by: Ubiquity Press
© 2024 Giorgio Fabbro, Stefan Uhlich, Chieh-Hsin Lai, Woosung Choi, Marco Martínez-Ramírez, Weihsiang Liao, Igor Gadelha, Geraldo Ramos, Eddie Hsu, Hugo Rodrigues, Fabian-Robert Stöter, Alexandre Défossez, Yi Luo, Jianwei Yu, Dipam Chakraborty, Sharada Mohanty, Roman Solovyev, Alexander Stempkovskiy, Tatiana Habruseva, Nabarun Goswami, Tatsuya Harada, Minseok Kim, Jun Hyung Lee, Yuanliang Dong, Xinran Zhang, Jiafeng Liu, Yuki Mitsufuji, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.