
WikiTextGraph: A Python Tool for Parsing Multilingual Wikipedia Text and Graph Extraction
Abstract
WikiTextGraph is an open-source Python package designed to extract and process text from Wikipedia dumps and construct internal link networks across multiple language editions. It uses efficient parsing, redirect resolution, and multilingual graph-building techniques to tackle the challenges of Wikipedia’s scale, structure, and inherent noise. With a modular architecture and a simple graphical user interface (GUI), it is suitable for both technical and non-technical users. Built for scalability and reproducibility, WikiTextGraph supports interdisciplinary research in network science, computational linguistics, and digital humanities. Its flexible design enables easy adaptation for tasks involving low-resource or cross-lingual language studies.
© 2025 Paschalis Agapitos, Juan-Luis Suárez, Gustavo Ariel Schwartz, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.