Data Modeling

Mathematical and Computational Data Modeling
RESEARCH ARTICLE   (Open Access)

Machine Learning–Driven Classification of Thyroid Disorders: An XGBoost-Based Approach for Improved Diagnostic Accuracy

Sayed Rokibul Hossain1*, Kamruzzaman Mithu1, Md. Nesar Uddin1, Md. Ataur Rahman1, Khondaker Abdullah Al Mamun1


Data Modeling 5 (1) 1-8 https://doi.org/10.25163/data.5110753

Submitted: 10 June 2024; Revised: 13 August 2024; Published: 14 August 2024


Abstract

Thyroid disorders, though often clinically manageable, remain difficult to diagnose in their early stages, largely because their symptoms are overlapping and nonspecific. This diagnostic ambiguity, combined with subtle variations in biochemical markers, has created a need for more reliable, data-driven approaches. Machine learning techniques offer a promising pathway, although their practical effectiveness, particularly under data imbalance and variability, still requires careful evaluation. This study applies a machine learning framework to classify thyroid disease into three clinically relevant categories: hyperthyroidism, hypothyroidism, and normal function. Using the publicly available UCI thyroid dataset, a structured preprocessing pipeline was implemented, comprising feature selection, handling of missing values, and exploratory data analysis to understand underlying patterns. Among several candidate models, Extreme Gradient Boosting (XGBoost) was selected for its robustness to missing data and imbalanced class distributions. The model was evaluated in multiple configurations: baseline, optimized, and imbalance-aware. The results appear promising: classification accuracy reached approximately 99%, and balanced accuracy improved to around 94% in the optimized settings. Feature importance analysis further showed that key hormonal indicators, such as TSH, TT4, and FTI, dominate prediction, in line with established clinical understanding. These findings should nevertheless be interpreted with caution, because the controlled nature of the dataset may not fully reflect real-world clinical variability. Even so, the study underscores the potential of machine learning as a supportive diagnostic tool and a step toward more efficient, data-informed clinical decision-making.
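
The gap between the reported accuracy (about 99%) and balanced accuracy (about 94%) is typical of imbalanced data, where the majority class dominates plain accuracy. The following is a minimal sketch of that distinction and of one common imbalance-aware weighting heuristic. It is not the study's code: the labels are synthetic, the function names are hypothetical, and the inverse-frequency weighting is an assumption about what "imbalance-aware" could mean in practice.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / support[c] for c in support) / len(support)


def balanced_sample_weights(y_true):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Assumption: weights like these could be passed as per-sample weights
    when fitting a classifier so that rare classes are not ignored.
    """
    support = Counter(y_true)
    n, k = len(y_true), len(support)
    return [n / (k * support[t]) for t in y_true]


# Synthetic labels for illustration only (not the UCI thyroid data).
y_true = ["normal"] * 90 + ["hypo"] * 8 + ["hyper"] * 2
y_pred = ["normal"] * 90 + ["hypo"] * 7 + ["normal"] + ["hyper"] + ["normal"]

acc = accuracy(y_true, y_pred)
bal_acc = balanced_accuracy(y_true, y_pred)
weights = balanced_sample_weights(y_true)

print(f"accuracy          = {acc:.3f}")
print(f"balanced accuracy = {bal_acc:.3f}")
print(f"weight per class  = {dict(zip(y_true, weights))}")
```

In this toy case only two samples are misclassified, yet both belong to minority classes, so balanced accuracy falls well below plain accuracy. This is why reporting both metrics, as the study does, gives a fuller picture than accuracy alone.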

Keywords: Thyroid disease; Machine learning; XGBoost; Clinical prediction; Class imbalance

