SAMannot: A Memory-Efficient, Local, Open-Source Framework for Interactive Video Instance Segmentation Based on SAM2

Gergely Dinya; András Gelencsér; Krisztina Kupán; Clemens Küpper; Kristóf Karacs; Anna Gelencsér-Horváth

doi:10.5334/jors.680

Journal of Open Research Software

Volume 14 (2026): Issue 1

Open Access

Apr 2026


DOI: https://doi.org/10.5334/jors.680 | Journal eISSN: 2049-9647

Language: English

Submitted on: Jan 16, 2026

Accepted on: Mar 26, 2026

Published on: Apr 20, 2026

Published by: Ubiquity Press

In partnership with: Paradigm Publishing Services

Publication frequency: 1 issue per year

Keywords: instance segmentation, video annotation, instance tracking, instance labeling

© 2026 Gergely Dinya, András Gelencsér, Krisztina Kupán, Clemens Küpper, Kristóf Karacs, Anna Gelencsér-Horváth, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.
