Real World Music Object Recognition

Lukas Tuggener; Raphael Emberger; Adhiraj Ghosh; Pascal Sager; Yvan Putra Satyawan; Javier Montoya; Simon Goldschagg; Florian Seibold; Urs Gut; Philipp Ackermann; Jürgen Schmidhuber; Thilo Stadelmann

doi:10.5334/tismir.157

Transactions of the International Society for Music Information Retrieval

Volume 7 (2024): Issue 1

Open Access

Jan 2024

Baró, A., Riba, P., and Fornés, A. (2018). A starting point for handwritten music recognition. In 1st International Workshop on Reading Music Systems. DOI: 10.1016/j.patrec.2019.02.029
Bellini, P., Bruno, I., and Nesi, P. (2001). Optical music sheet segmentation. In Proceedings First International Conference on WEB Delivering of Music. WEDELMUSIC 2001, pages 183–190. DOI: 10.1109/WDM.2001.990175
Calvo-Zaragoza, J., Hajič Jr, J., and Pacha, A. (2020). Understanding optical music recognition. ACM Computing Surveys (CSUR), 53(4):1–35. DOI: 10.1145/3397499
Calvo-Zaragoza, J. and Rizo, D. (2018a). Camera-PrIMuS: Neural end-to-end optical music recognition on realistic monophonic scores. In 19th International Society for Music Information Retrieval Conference (ISMIR), pages 248–255. DOI: 10.3390/app8040606
Calvo-Zaragoza, J. and Rizo, D. (2018b). End-to-end neural optical music recognition of monophonic scores. Applied Sciences, 8(4):606. DOI: 10.3390/app8040606
Castellanos, F. J., Gallego, A.-J., and Calvo-Zaragoza, J. (2021). Unsupervised domain adaptation for document analysis of music score images. In 22nd International Society for Music Information Retrieval Conference (ISMIR), pages 81–87.
Chen, Y., Li, W., Sakaridis, C., Dai, D., and Van Gool, L. (2018). Domain adaptive faster R-CNN for object detection in the wild. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3339–3348. DOI: 10.1109/CVPR.2018.00352
Chiu, C.-C., Sainath, T. N., Wu, Y., Prabhavalkar, R., Nguyen, P., Chen, Z., Kannan, A., Weiss, R. J., Rao, K., Gonina, K., Jaitly, N., Li, B., Chorowski, J., and Bacchiani, M. (2018). State-of-the-art speech recognition with sequence-to-sequence models. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 4774–4778. DOI: 10.1109/ICASSP.2018.8462105
Chowdhury, A. and Vig, L. (2018). An efficient end-to-end neural model for handwritten text recognition. In British Machine Vision Conference 2018 (BMVC), page 202.
Ciregan, D., Meier, U., and Schmidhuber, J. (2012). Multi-column deep neural networks for image classification. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3642–3649. IEEE. DOI: 10.1109/CVPR.2012.6248110
Cubuk, E. D., Zoph, B., Mane, D., Vasudevan, V., and Le, Q. V. (2019). Autoaugment: Learning augmentation strategies from data. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 113–123. DOI: 10.1109/CVPR.2019.00020
Dalitz, C., Droettboom, M., Pranzas, B., and Fujinaga, I. (2008). A comparative study of staff removal algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(5):753–766. DOI: 10.1109/TPAMI.2007.70749
Dietterich, T. G. (2000). Ensemble methods in machine learning. In International Workshop on Multiple Classifier Systems, pages 1–15. DOI: 10.1007/3-540-45014-9_1
Durasov, N., Bagautdinov, T., Baque, P., and Fua, P. (2021). Masksembles for uncertainty estimation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 13539–13548. DOI: 10.1109/CVPR46437.2021.01333
Dürr, O., Sick, B., and Murina, E. (2020). Probabilistic Deep Learning: With Python, Keras and TensorFlow Probability. Manning Publications.
Elezi, I., Tuggener, L., Pelillo, M., and Stadelmann, T. (2018). DeepScores and deep watershed detection: Current state and open issues. In 1st International Workshop on Reading Music Systems.
Fujinaga, I. (2004). Staff detection and removal. In George, S. E., editor, Visual Perception of Music Notation: On-Line and Off Line Recognition, pages 1–39. IGI Global. DOI: 10.4018/978-1-59140-298-5.ch001
Gallego, A.-J. and Calvo-Zaragoza, J. (2017). Staff-line removal with selectional auto-encoders. Expert Systems with Applications, 89:138–148. DOI: 10.1016/j.eswa.2017.07.002
Ganin, Y. and Lempitsky, V. (2015). Unsupervised domain adaptation by backpropagation. In International Conference on Machine Learning (ICML), pages 1180–1189.
Ganin, Y., Ustinova, E., Ajakan, H., Germain, P., Larochelle, H., Laviolette, F., Marchand, M., and Lempitsky, V. (2016). Domain-adversarial training of neural networks. Journal of Machine Learning Research, 17(1):2096–2030.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2020). Generative adversarial networks. Communications of the ACM, 63(11):139–144. DOI: 10.1145/3422622
Gustafsson, F. K., Danelljan, M., and Schon, T. B. (2020). Evaluating scalable Bayesian deep learning methods for robust computer vision. In IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 318–319. DOI: 10.1109/CVPRW50498.2020.00167
Hajič Jr, J., Dorfer, M., Widmer, G., and Pecina, P. (2018). Towards full-pipeline handwritten OMR with musical symbol detection by U-Nets. In 19th International Society for Music Information Retrieval Conference (ISMIR), pages 225–232.
Hajič Jr, J. and Pecina, P. (2017). The MUSCIMA++ dataset for handwritten optical music recognition. In 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), pages 39–46. DOI: 10.1109/ICDAR.2017.16
Han, J., Ding, J., Li, J., and Xia, G.-S. (2021). Align deep features for oriented object detection. IEEE Transactions on Geoscience and Remote Sensing, 60:1–11. DOI: 10.1109/TGRS.2021.3062048
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778. DOI: 10.1109/CVPR.2016.90
Huang, G., Li, Y., Pleiss, G., Liu, Z., Hopcroft, J. E., and Weinberger, K. Q. (2017). Snapshot ensembles: Train 1, get M for free. In 5th International Conference on Learning Representations (ICLR).
Ioffe, S. and Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning (ICML), pages 448–456.
Kingma, D. P. and Ba, J. (2021). Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations (ICLR).
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2017). Imagenet classification with deep convolutional neural networks. Communications of the ACM, 60(6):84–90. DOI: 10.1145/3065386
Lehner, A., Gasperini, S., Marcos-Ramiro, A., Schmidt, M., Mahani, M.-A. N., Navab, N., Busam, B., and Tombari, F. (2022). 3D-VField: Adversarial augmentation of point clouds for domain generalization in 3D object detection. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 17295–17304. DOI: 10.1109/CVPR52688.2022.01678
Li, Y.-J., Dai, X., Ma, C.-Y., Liu, Y.-C., Chen, K., Wu, B., He, Z., Kitani, K., and Vajda, P. (2022). Cross-domain adaptive teacher for object detection. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 7581–7590. DOI: 10.1109/CVPR52688.2022.00743
Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017). Feature pyramid networks for object detection. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2117–2125. DOI: 10.1109/CVPR.2017.106
Mateiu, T. N., Gallego, A.-J., and Calvo-Zaragoza, J. (2019). Domain adaptation for handwritten symbol recognition: A case of study in old music manuscripts. In Iberian Conference on Pattern Recognition and Image Analysis, pages 135–146. DOI: 10.1007/978-3-030-31321-0_12
Nguyen, A., Yosinski, J., and Clune, J. (2015). Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 427–436. DOI: 10.1109/CVPR.2015.7298640
Pacha, A. and Calvo-Zaragoza, J. (2018). Optical music recognition in mensural notation with region-based convolutional neural networks. In 19th International Society for Music Information Retrieval Conference (ISMIR), pages 240–247.
Pacha, A., Choi, K.-Y., Coüasnon, B., Ricquebourg, Y., Zanibbi, R., and Eidenberger, H. (2018a). Handwritten music object detection: Open issues and baseline results. In 13th IAPR International Workshop on Document Analysis Systems (DAS), pages 163–168. IEEE. DOI: 10.1109/DAS.2018.51
Pacha, A., Hajič Jr, J., and Calvo-Zaragoza, J. (2018b). A baseline for general music object detection with deep learning. Applied Sciences, 8(9):1488. DOI: 10.3390/app8091488
Pugin, L. (2006). Optical music recognition of early typographic prints using hidden Markov models. In Proceedings of the 7th International Conference on Music Information Retrieval (ISMIR), pages 53–56.
Rebelo, A., Fujinaga, I., Paszkiewicz, F., Marcal, A. R., Guedes, C., and Cardoso, J. S. (2012). Optical music recognition: state-of-the-art and open issues. International Journal of Multimedia Information Retrieval, 1:173–190. DOI: 10.1007/s13735-012-0004-6
Sato, I., Nishimura, H., and Yokoi, K. (2015). Apac: Augmented pattern classification with neural networks. arXiv preprint arXiv:1505.03229.
Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61:85–117. DOI: 10.1016/j.neunet.2014.09.003
Simmler, N., Sager, P., Andermatt, P., Chavarriaga, R., Schilling, F.-P., Rosenthal, M., and Stadelmann, T. (2021). A survey of un-, weakly-, and semi-supervised learning methods for noisy, missing and partial labels in industrial vision applications. In 2021 8th Swiss Conference on Data Science (SDS), pages 26–31. DOI: 10.1109/SDS51136.2021.00012
Smith, L. N. (2017). Cyclical learning rates for training neural networks. In 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 464–472. DOI: 10.1109/WACV.2017.58
Solovyev, R., Wang, W., and Gabruseva, T. (2021). Weighted boxes fusion: Ensembling boxes from different object detection models. Image and Vision Computing, 107:104117. DOI: 10.1016/j.imavis.2021.104117
Stadelmann, T., Amirian, M., Arabaci, I., Arnold, M., Duivesteijn, G. F., Elezi, I., Geiger, M., Lorwald, S., Meier, B. B., Rombach, K., et al. (2018). Deep learning in the wild. In IAPR Workshop on Artificial Neural Networks in Pattern Recognition, pages 17–38. DOI: 10.1007/978-3-319-99978-4_2
Toyama, F., Shoji, K., and Miyamichi, J. (2006). Symbol recognition of printed piano scores with touching symbols. In 18th International Conference on Pattern Recognition (ICPR), volume 2, pages 480–483. DOI: 10.1109/ICPR.2006.1099
Tuggener, L., Elezi, I., Schmidhuber, J., Pelillo, M., and Stadelmann, T. (2018a). DeepScores: A dataset for segmentation, detection and classification of tiny objects. In 24th International Conference on Pattern Recognition (ICPR), pages 3704–3709. DOI: 10.1109/ICPR.2018.8545307
Tuggener, L., Elezi, I., Schmidhuber, J., and Stadelmann, T. (2018b). Deep watershed detector for music object recognition. In 19th International Society for Music Information Retrieval Conference (ISMIR), pages 271–278.
Tuggener, L., Satyawan, Y. P., Pacha, A., Schmidhuber, J., and Stadelmann, T. (2021). The DeepScoresV2 dataset and benchmark for music object detection. In 2020 25th International Conference on Pattern Recognition (ICPR), pages 9188–9195. DOI: 10.1109/ICPR48806.2021.9412290
Tzeng, E., Hoffman, J., Saenko, K., and Darrell, T. (2017). Adversarial discriminative domain adaptation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 7167–7176. DOI: 10.1109/CVPR.2017.316
van der Wel, E. and Ullrich, K. (2017). Optical music recognition with convolutional sequence-to-sequence models. In 18th International Society for Music Information Retrieval Conference (ISMIR), pages 731–737.
von Oswald, J., Kobayashi, S., Sacramento, J., Meulemans, A., Henning, C., and Grewe, B. F. (2021). Neural networks with late-phase weights. In 9th International Conference on Learning Representations (ICLR).
Wen, Y., Tran, D., and Ba, J. (2020). BatchEnsemble: An alternative approach to efficient ensemble and lifelong learning. In 8th International Conference on Learning Representations (ICLR).
Wenzel, F., Snoek, J., Tran, D., and Jenatton, R. (2020). Hyperparameter ensembles for robustness and uncertainty quantification. In 34th International Conference on Neural Information Processing Systems (NeurIPS), pages 6514–6527.
Xia, Y., Zhang, J., Jiang, T., Gong, Z., Yao, W., and Feng, L. (2021). HatchEnsemble: An efficient and practical uncertainty quantification method for deep neural networks. Complex & Intelligent Systems, 7:2855–2869. DOI: 10.1007/s40747-021-00463-1
Zhu, X., Liu, Y., Qin, Z., and Li, J. (2017). Data augmentation in emotion classification using generative adversarial networks. arXiv preprint arXiv:1711.00648. DOI: 10.1007/978-3-319-93040-4_28
Zhu, X., Pang, J., Yang, C., Shi, J., and Lin, D. (2019). Adapting object detectors via selective cross-domain alignment. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 687–696. DOI: 10.1109/CVPR.2019.00078

DOI: https://doi.org/10.5334/tismir.157 | Journal eISSN: 2514-3298

Language: English

Submitted on: Dec 6, 2022

Accepted on: Jul 31, 2023

Published on: Jan 11, 2024

Published by: Ubiquity Press

In partnership with: Paradigm Publishing Services

Publication frequency: 1 issue per year

Keywords: Optical Music Recognition, Deep Learning, Data Augmentation, Adversarial Training, Model Ensembles, Open Data

© 2024 Lukas Tuggener, Raphael Emberger, Adhiraj Ghosh, Pascal Sager, Yvan Putra Satyawan, Javier Montoya, Simon Goldschagg, Florian Seibold, Urs Gut, Philipp Ackermann, Jürgen Schmidhuber, Thilo Stadelmann, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.