
Fast-ER: GPU-Accelerated Record Linkage and Deduplication in Python
Abstract
Fast-ER is a Python package for GPU-accelerated record linkage and deduplication. These tasks involve computing string similarity metrics for all pairs of values within or between datasets. While these individual calculations are simple, the number of comparisons grows quadratically with dataset size, making record linkage and deduplication prohibitively expensive even for moderately sized datasets. Fast-ER addresses this challenge by harnessing the computational power of CUDA-enabled graphics processing units (GPUs) to accelerate string similarity metrics calculations. Developed in Python, our software library relies heavily on CuPy, an open-source library for array-based numerical computations on GPUs [1]. Fast-ER is readily available on GitHub (https://github.com/jacobmorrier/fast-er) and the Python Package Index (fast-er-link). By enabling fast and efficient linking and deduplication of datasets without consistent identifiers, Fast-ER has wide-ranging applications across fields such as social and health sciences.
© 2026 Jacob Morrier, Sulekha Kishore, R. Michael Alvarez, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.