Machine learning-based classification of DNA sequences for diabetes mellitus type prediction

Albegli Ahmed Hasan Ahmed; Kusum Yadav

doi:10.2478/jsiot-2025-0003

Journal of Smart Internet of Things

Volume 2025 (2025): Issue 1 (June 2025)

Open Access

Abstract

This study applies machine learning (ML) to classify DNA sequences and predict diabetes risk. Using the INS Insulin Dataset, which contains DNA sequences from diabetic and non-diabetic subjects, it explores several preprocessing strategies: k-mer representation, ordinal encoding, oversampling, and min-max normalization. Feature selection with F-regressor and Mutual Information improved model performance. Four classifiers (Random Forest, Decision Tree, Gaussian Naïve Bayes, and Support Vector Machine (SVM)) were compared on accuracy, precision, recall, and F1-score. Random Forest achieved the highest accuracy (0.89 with F-regressor), followed by SVM and Decision Tree, while Gaussian Naïve Bayes showed moderate performance. The findings highlight the effectiveness of machine learning in uncovering genetic patterns associated with diabetes and emphasize the potential of DNA-based predictive modeling in precision medicine. This work contributes to advancing computational genomics and provides a foundation for early diagnosis and personalized treatment strategies for diabetes mellitus.
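The preprocessing steps named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the function names, the ordinal mapping values, the choice of k, and the toy sequences are all assumptions made for illustration, and oversampling, feature selection, and the classifiers themselves are left out.

```python
from collections import Counter
from itertools import product

# Hypothetical ordinal mapping; the abstract does not state the values used.
ORDINAL = {"A": 0.25, "C": 0.50, "G": 0.75, "T": 1.00}


def ordinal_encode(seq):
    """Map each base to a number; unknown symbols (e.g. N) become 0.0."""
    return [ORDINAL.get(base, 0.0) for base in seq.upper()]


def kmer_counts(seq, k=3):
    """Count overlapping k-mers over the fixed ACGT vocabulary (4**k features)."""
    seq = seq.upper()
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts[kmer] for kmer in vocab]


def min_max_scale(rows):
    """Rescale each feature column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Toy sequences, invented for illustration only (not taken from the dataset).
sequences = ["ATGGCCCTGTGG", "ATGGCCCTGTGA"]
features = min_max_scale([kmer_counts(s, k=2) for s in sequences])
```

Each row of `features` is a fixed-length vector that could then be passed to a feature selector and a classifier such as Random Forest.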

DOI: https://doi.org/10.2478/jsiot-2025-0003 | Journal eISSN: 2956-8323

Language: English

Page range: 23 - 30

Submitted on: Mar 21, 2025

Accepted on: May 11, 2025

Published on: Jun 15, 2025

Published by: Future Sciences For Digital Publishing

In partnership with: Paradigm Publishing Services

Publication frequency: 2 issues per year

Keywords: Machine Learning, DNA Sequence Classification, Diabetes Mellitus Prediction, Genomic Data Analysis, Random Forest / Support Vector Machine (SVM)

Related subjects: Engineering, Introductions and overviews, History of engineering, Electrical engineering, Fundamentals of electrical engineering, Electronics, Information technology

© 2025 Albegli Ahmed Hasan Ahmed, Kusum Yadav, published by Future Sciences For Digital Publishing
This work is licensed under the Creative Commons Attribution 4.0 License.