
References

  1. Bittner, R. M., Salamon, J., Tierney, M., Mauch, M., Cannam, C., and Bello, J. P. (2014). MedleyDB: A multitrack dataset for annotation-intensive MIR research. In Wang, H., Yang, Y., and Lee, J. H., editors, Proceedings of the 15th International Society for Music Information Retrieval Conference (ISMIR), pages 155–160.
  2. Braun, S. and Tashev, I. J. (2020). A consolidated view of loss functions for supervised deep learning-based speech enhancement. 2021 44th International Conference on Telecommunications and Signal Processing (TSP), pages 72–76. DOI: 10.1109/TSP52935.2021.9522648
  3. Cheng, H., Zhu, Z., Li, X., Gong, Y., Sun, X., and Liu, Y. (2020). Learning with instance-dependent label noise: A sample sieve approach. arXiv preprint arXiv:2010.02347.
  4. Choi, W., Kim, M., Chung, J., Lee, D., and Jung, S. (2020). Investigating U-Nets with various intermediate blocks for spectrogram-based singing voice separation. In Proceedings of the 21st International Society for Music Information Retrieval Conference (ISMIR), pages 192–198.
  5. Défossez, A. (2021). Hybrid spectrogram and waveform source separation. In Proceedings of the ISMIR 2021 Workshop on Music Source Separation.
  6. García, H. F., Aguilar, A., Manilow, E., and Pardo, B. (2021). Leveraging hierarchical structures for few-shot musical instrument recognition. In Lee, J. H., Lerch, A., Duan, Z., Nam, J., Rao, P., van Kranenburg, P., and Srinivasamurthy, A., editors, Proceedings of the 22nd International Society for Music Information Retrieval Conference (ISMIR), pages 220–228.
  7. Han, B., Yao, Q., Yu, X., Niu, G., Xu, M., Hu, W., Tsang, I., and Sugiyama, M. (2018). Co-teaching: Robust training of deep neural networks with extremely noisy labels. Advances in Neural Information Processing Systems, 31.
  8. Hendrycks, D., Basart, S., Mu, N., Kadavath, S., Wang, F., Dorundo, E., Desai, R., Zhu, T., Parajuli, S., Guo, M., et al. (2021). The many faces of robustness: A critical analysis of out-of-distribution generalization. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 8340–8349. DOI: 10.1109/ICCV48922.2021.00823
  9. Hennequin, R., Khlif, A., Voituret, F., and Moussallam, M. (2020). Spleeter: A fast and efficient music source separation tool with pre-trained models. Journal of Open Source Software, 5(50): 2154. DOI: 10.21105/joss.02154
  10. Herbrich, R., Minka, T., and Graepel, T. (2007). TrueSkill™: A Bayesian skill rating system. In Schölkopf, B., Platt, J. C., and Hoffman, T., editors, Advances in Neural Information Processing Systems 19 (NIPS-06), pages 569–576. MIT Press. DOI: 10.7551/mitpress/7503.003.0076
  11. Hinton, G. E., Vinyals, O., and Dean, J. (2015). Distilling the knowledge in a neural network. ArXiv, abs/1503.02531.
  12. Kang, D. and Hashimoto, T. B. (2020). Improved natural language generation via loss truncation. ArXiv, abs/2004.14589. DOI: 10.18653/v1/2020.acl-main.66
  13. Kim, M., Choi, W., Chung, J., Lee, D., and Jung, S. (2021). KUIELab-MDX-Net: A two-stream neural network for music demixing. In Proceedings of the ISMIR 2021 Workshop on Music Source Separation.
  14. Kim, M. and Lee, J. H. (2023). Sound Demixing Challenge 2023 – Music Demixing Track technical report. arXiv preprint arXiv:2306.09382.
  15. Kong, Q., Cao, Y., Iqbal, T., Wang, Y., Wang, W., and Plumbley, M. D. (2020). PANNs: Large-scale pretrained audio neural networks for audio pattern recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 28: 2880–2894. DOI: 10.1109/TASLP.2020.3030497
  16. Koo, J., Chae, Y., Jeon, C.-B., and Lee, K. (2023). Self-refining of pseudo labels for music source separation with noisy labeled data. arXiv preprint arXiv:2307.12576.
  17. Lai, C.-H., Zou, D., and Lerman, G. (2019). Robust subspace recovery layer for unsupervised anomaly detection. arXiv preprint arXiv:1904.00152.
  18. Lai, C.-H., Zou, D., and Lerman, G. (2023). Robust variational autoencoding with Wasserstein penalty for novelty detection. In International Conference on Artificial Intelligence and Statistics, pages 35383567. PMLR.
  19. Li, J., Socher, R., and Hoi, S. C. (2020). Dividemix: Learning with noisy labels as semi-supervised learning. arXiv preprint arXiv:2002.07394.
  20. Liu, H., Xie, L., Wu, J., and Yang, G. (2020). Channelwise subband input for better voice and accompaniment separation on high resolution music. In Interspeech. DOI: 10.21437/Interspeech.2020-2555
  21. Liutkus, A., Stöter, F.-R., Rafii, Z., Kitamura, D., Rivet, B., Ito, N., Ono, N., and Fontecave, J. (2017). The 2016 Signal Separation Evaluation Campaign. In Tichavsky, P., Babaie-Zadeh, M., Michel, O. J., and Thirion-Moreau, N., editors, Latent Variable Analysis and Signal Separation, pages 323–332, Cham. Springer International Publishing. DOI: 10.1007/978-3-319-53547-0_31
  22. Luo, Y. and Yu, J. (2023). Music source separation with band-split RNN. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 31: 1893–1901. DOI: 10.1109/TASLP.2023.3271145
  23. Mahanta, S. K., Khilji, A. F. U. R., and Pakray, P. (2021). Deep neural network for musical instrument recognition using MFCCs. Computación y Sistemas, 25(2): 351–360. DOI: 10.13053/cys-25-2-3946
  24. Manilow, E., Wichern, G., and Roux, J. L. (2020). Hierarchical musical instrument separation. In Proceedings of the 21st International Society for Music Information Retrieval Conference (ISMIR).
  25. Manilow, E., Wichern, G., Seetharaman, P., and Le Roux, J. (2019). Cutting music source separation some Slakh: A dataset to study the impact of training data quality and quantity. In 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pages 45–49. IEEE. DOI: 10.1109/WASPAA.2019.8937170
  26. Mitsufuji, Y., Fabbro, G., Uhlich, S., Stöter, F.-R., Défossez, A., Kim, M., Choi, W., Yu, C.-Y., and Cheuk, K.-W. (2022). Music Demixing Challenge 2021. Frontiers in Signal Processing, 1: 18. DOI: 10.3389/frsip.2021.808395
  27. Mukherjee, D., Guha, A., Solomon, J. M., Sun, Y., and Yurochkin, M. (2021). Outlier-robust optimal transport. In International Conference on Machine Learning, pages 7850–7860. PMLR.
  28. Ono, N., Rafii, Z., Kitamura, D., Ito, N., and Liutkus, A. (2015). The 2015 Signal Separation Evaluation Campaign. In Vincent, E., Yeredor, A., Koldovský, Z., and Tichavský, P., editors, Latent Variable Analysis and Signal Separation, pages 387–395, Cham. Springer International Publishing. DOI: 10.1007/978-3-319-22482-4_45
  29. Pereira, I., Araújo, F., Korzeniowski, F., and Vogl, R. (2023). MoisesDB: A dataset for source separation beyond 4-stems. In Proceedings of the 24th International Society for Music Information Retrieval Conference (ISMIR), pages 619–626.
  30. Petermann, D., Wichern, G., Wang, Z.-Q., and Roux, J. L. (2022). The cocktail fork problem: Three-stem audio separation for real-world soundtracks. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). DOI: 10.1109/ICASSP43922.2022.9746005
  31. Prätzlich, T., Müller, M., Bohl, B. W., and Veit, J. (2015). Freischütz Digital: Demos of audio-related contributions. In Demos and Late Breaking News of the International Society for Music Information Retrieval Conference (ISMIR).
  32. Rafii, Z., Liutkus, A., Stöter, F.-R., Mimilakis, S. I., and Bittner, R. (2017). The MUSDB18 corpus for music separation. DOI: 10.5281/zenodo.1117372.
  33. Rouard, S., Massa, F., and Défossez, A. (2023). Hybrid transformers for music source separation. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). DOI: 10.1109/ICASSP49357.2023.10096956
  34. Sawata, R., Takahashi, N., Uhlich, S., Takahashi, S., and Mitsufuji, Y. (2023). The whole is greater than the sum of its parts: Improving DNN-based music source separation. arXiv preprint arXiv:2305.07855.
  35. Sawata, R., Uhlich, S., Takahashi, S., and Mitsufuji, Y. (2021). All for one and one for all: Improving music separation by bridging networks. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 51–55. DOI: 10.1109/ICASSP39728.2021.9414044
  36. Solovyev, R., Stempkovskiy, A., and Habruseva, T. (2023). Benchmarks and leaderboards for sound demixing tasks. arXiv preprint arXiv:2305.07489.
  37. Stöter, F.-R., Liutkus, A., and Ito, N. (2018). The 2018 Signal Separation Evaluation Campaign. In Deville, Y., Gannot, S., Mason, R., Plumbley, M. D., and Ward, D., editors, Latent Variable Analysis and Signal Separation, pages 293–305, Cham. Springer International Publishing. DOI: 10.1007/978-3-319-93764-9_28
  38. Stöter, F.-R., Uhlich, S., Liutkus, A., and Mitsufuji, Y. (2019). Open-unmix – A reference implementation for music source separation. Journal of Open Source Software, 4(41): 1667. DOI: 10.21105/joss.01667
  39. Tarvainen, A. and Valpola, H. (2017). Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. Advances in Neural Information Processing Systems, 30.
  40. Torcoli, M., Kastner, T., and Herre, J. (2021). Objective measures of perceptual audio quality reviewed: An evaluation of their application domain dependence. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29: 1530–1541. DOI: 10.1109/TASLP.2021.3069302
  41. Uhlich, S., Fabbro, G., Hirano, M., Takahashi, S., Wichern, G., Le Roux, J., Chakraborty, D., Mohanty, S., Li, K., Luo, Y., Yu, J., Gu, R., Solovyev, R., Stempkovskiy, A., Habruseva, T., Sukhovei, M., and Mitsufuji, Y. (2024). The Sound Demixing Challenge 2023 – Cinematic Demixing Track. Transactions of the International Society for Music Information Retrieval, 7(1): 44–62. DOI: 10.5334/tismir.172
  42. Uhlich, S., Porcu, M., Giron, F., Enenkl, M., Kemp, T., Takahashi, N., and Mitsufuji, Y. (2017). Improving music source separation based on deep neural networks through data augmentation and network blending. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 261–265. DOI: 10.1109/ICASSP.2017.7952158
  43. Wang, H., Xiao, R., Dong, Y., Feng, L., and Zhao, J. (2022). Promix: Combating label noise via maximizing clean sample utility. arXiv preprint arXiv:2207.10276.
  44. Wang, Z., Giri, R., Isik, U., Valin, J.-M., and Krishnaswamy, A. (2021). Semi-supervised singing voice separation with noisy self-training. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 31–35. DOI: 10.1109/ICASSP39728.2021.9413723
  45. Wisdom, S., Hershey, J. R., Wilson, K., Thorpe, J., Chinen, M., Patton, B., and Saurous, R. A. (2019). Differentiable consistency constraints for improved deep speech enhancement. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 900–904. DOI: 10.1109/ICASSP.2019.8682783
  46. Wisdom, S., Tzinis, E., Erdogan, H., Weiss, R., Wilson, K., and Hershey, J. (2020). Unsupervised sound separation using mixture invariant training. Advances in Neural Information Processing Systems, 33: 3846–3857.
  47. Zhang, Z., Liu, Q., and Wang, Y. (2017). Road extraction by deep residual U-Net. IEEE Geoscience and Remote Sensing Letters, 15: 749–753. DOI: 10.1109/LGRS.2018.2802944
DOI: https://doi.org/10.5334/tismir.171 | Journal eISSN: 2514-3298
Language: English
Submitted on: Aug 22, 2023
Accepted on: Feb 13, 2024
Published on: Apr 18, 2024
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2024 Giorgio Fabbro, Stefan Uhlich, Chieh-Hsin Lai, Woosung Choi, Marco Martínez-Ramírez, Weihsiang Liao, Igor Gadelha, Geraldo Ramos, Eddie Hsu, Hugo Rodrigues, Fabian-Robert Stöter, Alexandre Défossez, Yi Luo, Jianwei Yu, Dipam Chakraborty, Sharada Mohanty, Roman Solovyev, Alexander Stempkovskiy, Tatiana Habruseva, Nabarun Goswami, Tatsuya Harada, Minseok Kim, Jun Hyung Lee, Yuanliang Dong, Xinran Zhang, Jiafeng Liu, Yuki Mitsufuji, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.