Agriculture and food sciences
RESEARCH ARTICLE (Open Access)

Machine Learning-Driven Water Quality Index Prediction: Enhancing Accuracy with Gradient Boosting and Explainable AI for Sustainable Water Monitoring

Md. Jahidul Islam¹, Siraj Us Salekin², Asif Anzum³, Nafis Zaman¹, Abdullah Al Ahad Khan⁴, Dilip Sarkar⁵, Md. Liton Rabbani⁶, Md. Tarek Hossain⁶


Applied Agriculture Sciences, 2(1), 1–14. https://doi.org/10.25163/agriculture.2110031

Submitted: 12 August 2024  Revised: 06 October 2024  Published: 07 October 2024 

Abstract

Background: Water is fundamental to the survival of all life forms, yet access to clean and safe water remains a critical challenge worldwide. Contaminated water is a major contributor to waterborne diseases, which highlights the need for effective water quality monitoring. The Water Quality Index (WQI) is a standard tool for assessing water quality. However, traditional WQI methods are often constrained by inconsistencies, laboratory inaccuracies, and human error.

Methods: This study aimed to overcome these limitations by integrating machine learning (ML) techniques into WQI prediction. Physicochemical parameters, including pH, chloride (Cl⁻), sulfate (SO₄²⁻), sodium (Na⁺), potassium (K⁺), calcium (Ca²⁺), magnesium (Mg²⁺), total hardness, and total dissolved solids, were collected from diverse water sources to form a robust dataset. ML algorithms such as Gradient Boosting, Random Forest, and XGBoost were augmented with explainable AI (XAI) and used to improve prediction accuracy and interpretability. The dataset was split into training (70%), testing (15%), and validation (15%) subsets. Model performance was assessed using the RMSE, MSE, MAE, and R² metrics.

Results: Gradient Boosting outperformed the other models and achieved 96% accuracy on the test dataset after fine-tuning. Its RMSE, MSE, MAE, and R² values also indicated superior predictive capability. These results suggest that ML techniques can help address the limitations of traditional WQI methods.

Conclusion: This study demonstrates the effectiveness of ML-driven approaches in improving water quality assessment. Integrating Gradient Boosting with explainable AI provides a reliable framework for WQI prediction. This framework supports better decision-making in environmental health policy and water resource management. It also offers a pathway to more efficient and accurate water quality monitoring systems.
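The abstract reports a 70/15/15 split into training, testing, and validation subsets and four evaluation metrics, but it gives no code. The following is a minimal standard-library sketch of both steps. It assumes that each row is one water sample. The function names `split_70_15_15` and `regression_metrics`, the fixed `seed=42`, and the rounding of subset sizes are illustrative assumptions, not the authors' procedure.

```python
import math
import random


def split_70_15_15(rows, seed=42):
    """Shuffle rows, then split them into training (70%), testing (15%) and validation (15%) subsets.

    Hypothetical helper: sizes are rounded, so the validation subset absorbs any remainder.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(round(0.70 * n))
    n_test = int(round(0.15 * n))
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]
    return train, test, validation


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R² for paired lists of observed and predicted WQI values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n          # mean squared error
    mae = sum(abs(e) for e in errors) / n         # mean absolute error
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "R2": r2}
```

RMSE is the square root of MSE, so it is expressed in the same units as the WQI. R² compares the residual error with the variance of the observed values.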
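The abstract identifies Gradient Boosting as the best-performing model but does not report its configuration. The sketch below illustrates the general technique without any dependencies. It implements least-squares gradient boosting with one-split decision stumps as weak learners. Each round fits a stump to the current residuals, which are the negative gradient of squared error, and adds a shrunken copy of that stump to the ensemble. The class `GradientBoostingSketch`, the stump learner, and the defaults `n_estimators=100` and `learning_rate=0.1` are assumptions for illustration, not the authors' fine-tuned model. In practice, a library implementation such as scikit-learn's `GradientBoostingRegressor` or XGBoost would normally be used.

```python
def fit_stump(X, residuals):
    """Find the single-feature threshold split that minimises squared error on the residuals."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue  # a split must leave samples on both sides
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, threshold, left_mean, right_mean)
    if best is None:
        # All feature values identical: predict the mean residual everywhere
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    _, j, threshold, left_mean, right_mean = best
    return (j, threshold, left_mean, right_mean)


def predict_stump(stump, row):
    j, threshold, left_value, right_value = stump
    return left_value if row[j] <= threshold else right_value


class GradientBoostingSketch:
    """Hypothetical least-squares gradient boosting regressor built from decision stumps."""

    def __init__(self, n_estimators=100, learning_rate=0.1):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.init_ = 0.0
        self.stumps = []

    def fit(self, X, y):
        # Start from the mean target, the best constant prediction under squared error
        self.init_ = sum(y) / len(y)
        predictions = [self.init_] * len(y)
        self.stumps = []
        for _ in range(self.n_estimators):
            # For squared-error loss the negative gradient is simply the residual
            residuals = [t - p for t, p in zip(y, predictions)]
            stump = fit_stump(X, residuals)
            self.stumps.append(stump)
            # Shrink each stump's contribution by the learning rate
            predictions = [p + self.learning_rate * predict_stump(stump, row)
                           for p, row in zip(predictions, X)]
        return self

    def predict(self, X):
        return [self.init_ + self.learning_rate * sum(predict_stump(s, row) for s in self.stumps)
                for row in X]
```

For example, `GradientBoostingSketch().fit(X_train, y_train).predict(X_test)` returns one WQI estimate per test row. Each row is a list of numeric parameters such as pH or total hardness. A smaller learning rate usually needs more estimators but tends to generalise better.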
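The abstract states that explainable AI (XAI) was used but does not name the method. Purely as an illustration, and not necessarily the authors' technique, the sketch below uses permutation feature importance, a model-agnostic explanation method. It shuffles one input column, re-scores the model, and treats the resulting increase in error as that feature's importance. The helper names `mse` and `permutation_importance` and the toy model are hypothetical. scikit-learn offers a library version as `sklearn.inspection.permutation_importance`.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Return, for each feature column, the mean increase in MSE after shuffling that column.

    `predict` is any callable mapping one feature row to a predicted WQI value.
    """
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, breaking its link to the target while keeping other columns intact
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [list(row[:j]) + [value] + list(row[j + 1:]) for row, value in zip(X, column)]
            increases.append(mse(y, [predict(row) for row in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


if __name__ == "__main__":
    # Toy model that uses only the first feature (standing in for, e.g., total dissolved solids)
    # and ignores the second (standing in for, e.g., pH)
    X = [[float(i), float((i * 7) % 5)] for i in range(20)]
    y = [2.0 * row[0] for row in X]
    print(permutation_importance(lambda row: 2.0 * row[0], X, y))
```

A feature the model ignores scores zero, because shuffling it leaves the predictions unchanged. Importances are computed on held-out data in practice, so that they reflect generalisation rather than memorisation.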

Keywords: Water Quality Index (WQI), Water Quality Monitoring, Machine Learning Algorithms, Explainable AI (XAI), Predictive Modelling
