
Real World Music Object Recognition
DOI: https://doi.org/10.5334/tismir.157 | Journal eISSN: 2514-3298
Language: English
Submitted on: Dec 6, 2022
Accepted on: Jul 31, 2023
Published on: Jan 11, 2024
Published by: Ubiquity Press
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year
© 2024 Lukas Tuggener, Raphael Emberger, Adhiraj Ghosh, Pascal Sager, Yvan Putra Satyawan, Javier Montoya, Simon Goldschagg, Florian Seibold, Urs Gut, Philipp Ackermann, Jürgen Schmidhuber, Thilo Stadelmann, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.