Detecting Tax Evasion and Financial Crimes in The United States Using Advanced Data Mining Technique
Md Rakibul Haque Pranto1*, Ismoth Zerine2, Md Mainul Islam2, Morium Akter3, Tauhedur Rahman4
Business and Social Sciences 1 (1) 1-11 https://doi.org/10.25163/business.1110337
Submitted: 01 January 2023 Revised: 05 March 2023 Published: 07 March 2023
Abstract
Background: Financial crimes and tax evasion pose a major threat to the financial security and integrity of the United States. Regulatory mechanisms exist, but detecting fraudulent activity within massive and complex financial datasets remains a significant challenge. Advanced data mining techniques offer the potential to improve the identification of irregularities and strengthen oversight systems.
Methods: This study applied advanced data mining methods, including decision trees, neural networks, logistic regression, and K-means clustering, to identify patterns of financial crime and tax evasion. The dataset consisted of one million financial records obtained from publicly available IRS and SEC reports between 2015 and 2020. Stratified random sampling was used to ensure broad representation across industries. Analytical approaches included logistic regression, K-means clustering, and correlation analysis; statistical significance was assessed with p-values, and classification performance with sensitivity, specificity, and the area under the ROC curve (AUC).
Results: Lower tax payments and higher anomaly scores were strong indicators of tax evasion. Transaction volume and anomaly score were significantly and positively correlated (r = 0.52, p < 0.0001). Logistic regression identified tax paid, income, transaction volume, and anomaly score as significant predictors of tax evasion, and the model discriminated well between evading and non-evading entities (AUC = 0.91). K-means clustering grouped entities into three risk categories; the high-risk cluster had substantially higher income, transaction volume, and anomaly scores than the other two.
Conclusion: The findings demonstrate the effectiveness of advanced data mining techniques for detecting tax evasion and financial crimes in large-scale datasets. By identifying key predictive indicators and stratifying risk, these methods show strong potential for integration into financial oversight systems. Their implementation could significantly enhance the capacity of regulatory frameworks to detect, prevent, and mitigate financial misconduct in real time.
Keywords: Tax evasion, financial crimes, data mining, logistic regression, anomaly detection.
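The three analyses named in the abstract (correlation analysis, logistic regression evaluated by AUC, sensitivity, and specificity, and K-means clustering into three groups) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the records are synthetic, and every variable name, distribution, and coefficient below is an assumption chosen only to make the example run. It assumes NumPy and scikit-learn are installed.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def make_synthetic_records(n=5000, seed=0):
    """Generate hypothetical records; NOT the IRS/SEC data used in the study."""
    rng = np.random.default_rng(seed)
    income = rng.lognormal(mean=11.0, sigma=0.6, size=n)
    volume = rng.lognormal(mean=4.0, sigma=0.5, size=n)
    # Assumed relationship: anomaly score is partly driven by transaction volume.
    z_volume = (np.log(volume) - 4.0) / 0.5
    anomaly = 0.5 * z_volume + np.sqrt(0.75) * rng.normal(size=n)
    # Assumed label model: evasion is more likely at higher anomaly scores.
    p_evade = 1.0 / (1.0 + np.exp(-(-2.0 + 1.2 * anomaly)))
    evader = (rng.random(n) < p_evade).astype(int)
    # Assumed effect: evaders pay a lower effective tax rate.
    rate = np.clip(0.22 - 0.08 * evader + rng.normal(0, 0.03, n), 0.01, 0.5)
    tax_paid = income * rate
    X = np.column_stack([np.log(tax_paid), np.log(income), np.log(volume), anomaly])
    return X, evader, volume, anomaly


def fit_and_evaluate(X, y, seed=0):
    """Fit logistic regression on a stratified split and report test metrics."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    scaler = StandardScaler().fit(X_tr)
    model = LogisticRegression(max_iter=1000).fit(scaler.transform(X_tr), y_tr)
    prob = model.predict_proba(scaler.transform(X_te))[:, 1]
    pred = (prob >= 0.5).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_te, pred, labels=[0, 1]).ravel()
    return {
        "auc": roc_auc_score(y_te, prob),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def cluster_risk(X, seed=0):
    """Group entities into three clusters on standardized features."""
    Z = StandardScaler().fit_transform(X)
    return KMeans(n_clusters=3, n_init=10, random_state=seed).fit_predict(Z)


X, y, volume, anomaly = make_synthetic_records()
r = np.corrcoef(volume, anomaly)[0, 1]  # Pearson correlation
metrics = fit_and_evaluate(X, y)
labels = cluster_risk(X)

print(f"Pearson r (volume vs anomaly score): {r:.2f}")
print({name: round(float(value), 3) for name, value in metrics.items()})
for k in range(3):
    # Mean anomaly score per cluster, used here to rank clusters by risk.
    print(f"cluster {k}: n={np.sum(labels == k)}, mean anomaly={anomaly[labels == k].mean():.2f}")
```

K-means assigns arbitrary cluster indices, so a risk ordering has to be imposed afterwards, for example by ranking clusters on mean anomaly score as in the final loop. The numbers printed reflect only the synthetic assumptions and should not be compared with the results reported in the abstract.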