Machine Learning-Driven Water Quality Index Prediction: Enhancing Accuracy with Gradient Boosting and Explainable AI for Sustainable Water Monitoring
Md. Jahidul Islam1, Siraj Us Salekin2, Asif Anzum3, Nafis Zaman1, Abdullah Al Ahad Khan4, Dilip Sarkar5, Md. Liton Rabbani6, Md. Tarek Hossain6
Applied Agriculture Sciences 2(1) 1-14 https://doi.org/10.25163/agriculture.2110031
Submitted: 12 August 2024 Revised: 06 October 2024 Published: 07 October 2024
This study demonstrates advanced machine learning and Explainable AI techniques for accurate, interpretable Water Quality Index prediction.
Abstract
Background: Water is fundamental to the survival of all life forms, yet access to clean and safe water remains a critical challenge worldwide. Contaminated water is a significant contributor to waterborne diseases, highlighting the need for effective water quality monitoring. The Water Quality Index (WQI) is a standard tool for assessing water quality; however, traditional WQI methods are often constrained by inconsistencies, laboratory inaccuracies, and human error.
Methods: This study aimed to overcome these limitations by integrating advanced machine learning (ML) techniques into WQI prediction. Physicochemical parameters, including pH, chloride (Cl⁻), sulfate (SO₄²⁻), sodium (Na⁺), potassium (K⁺), calcium (Ca²⁺), magnesium (Mg²⁺), total hardness, and total dissolved solids, were collected from diverse water sources to form a robust dataset. ML algorithms such as Gradient Boosting, Random Forest, and XGBoost, augmented with explainable AI (XAI), were employed to improve prediction accuracy and interpretability. The dataset was split into training (70%), testing (15%), and validation (15%) subsets, and model performance was assessed using RMSE, MSE, MAE, and R² metrics.
Results: Gradient Boosting outperformed the other models, achieving 96% accuracy on the test dataset after fine-tuning, and its performance metrics showed the strongest predictive capability of the models compared. These results indicate the potential for ML techniques to address the limitations of traditional WQI methods.
Conclusion: This study demonstrates the effectiveness of ML-driven approaches in improving water quality assessments. The integration of Gradient Boosting and explainable AI provides a reliable framework for WQI prediction, enabling better decision-making in environmental health policies and water resource management. This approach offers a pathway to more efficient and accurate water quality monitoring systems.
Keywords: Water Quality Index (WQI), Water Quality Monitoring, Machine Learning Algorithms, Explainable AI (XAI), Predictive Modelling
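The evaluation protocol described in the abstract (a 70/15/15 split into training, testing, and validation subsets, scored with RMSE, MSE, MAE, and R²) can be illustrated with a minimal sketch. This is not the authors' code: the function names `split_70_15_15` and `regression_metrics`, the random shuffling, and the fixed seed are assumptions made for illustration, and the model itself (Gradient Boosting) is left out.

```python
import math
import random


def split_70_15_15(rows, seed=0):
    """Shuffle rows and split them into train/test/validation subsets.

    The 70/15/15 proportions follow the abstract; shuffling with a fixed
    seed is an assumption made here for reproducibility.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 70 // 100  # integer arithmetic avoids float rounding
    n_test = n * 15 // 100
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]  # remainder, roughly 15%
    return train, test, validation


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, and R² for paired observed/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot  # undefined if all y_true values are equal
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "R2": r2}
```

In practice each row would hold the physicochemical parameters and the corresponding WQI value; a model fitted on the training subset would be tuned against one held-out subset and scored with `regression_metrics` on the other.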