Bioinfo Chem

System biology and Infochemistry
REVIEWS   (Open Access)

Artificial Intelligence in Drug Discovery: Systematic Review and Meta-Analysis of Predictive Performance, Structural Modeling, and Translational Reliability


Shunqi Liu 1*, Han Qiu 2


Bioinfo Chem 7 (1) 1-8 https://doi.org/10.25163/bioinformatics.7110594

Submitted: 25 October 2025; Revised: 10 December 2025; Accepted: 18 December 2025; Published: 20 December 2025


Abstract

Artificial intelligence (AI) has rapidly transformed drug discovery by accelerating the identification, optimization, and validation of therapeutic candidates. Advances in deep learning, protein structure modeling, molecular simulation, and generative models have created unprecedented opportunities to decode the complexity of biological systems and design novel compounds with improved safety and efficacy profiles. Despite significant progress, the extent to which AI improves predictive accuracy, reduces experimental burden, and enhances drug-likeness across diverse therapeutic domains remains incompletely understood. This systematic review and meta-analysis synthesizes evidence from studies employing AI-based structural modeling, molecular property prediction, virtual screening, and de novo drug design. Databases including PubMed, Scopus, Web of Science, and IEEE Xplore were searched for articles published between 2005 and 2024. Eligible studies assessed AI performance in tasks such as protein structure prediction, drug-target interaction inference, binding affinity prediction, ADMET modeling, or molecular generation. Performance indicators such as AUROC, RMSE, precision, recall, and top-k hit rate were pooled meta-analytically using random-effects models. Overall, AI methods significantly outperformed traditional computational approaches, yielding higher predictive accuracy, reduced false-positive rates, and improved structural generalization across chemical space. Deep learning architectures, particularly graph neural networks and transformer-based models, achieved the largest performance gains. However, heterogeneity arose from differences in datasets, model training strategies, and the lack of standardized benchmarks.
This review highlights the strengths, limitations, and translational potential of AI-driven drug discovery and provides recommendations for improving reproducibility, validation practices, and clinical relevance in future studies.

Keywords: artificial intelligence, drug discovery, deep learning, molecular modeling, virtual screening, ADMET prediction, protein structure, meta-analysis
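
The abstract states that performance indicators were pooled with random-effects models but does not name the estimator. As a minimal sketch, assuming the widely used DerSimonian-Laird estimator and per-study effect sizes with known sampling variances (for example, differences in AUROC between an AI model and a baseline), the pooling step and the I² heterogeneity statistic could look like the following. The function name `pool_random_effects` and all input values are hypothetical and are not taken from the review.

```python
import math


def pool_random_effects(effects, variances):
    """Pool per-study effects with a DerSimonian-Laird random-effects model.

    Assumption: the review does not state its estimator; DerSimonian-Laird
    is used here purely for illustration.
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    if any(v <= 0 for v in variances):
        raise ValueError("sampling variances must be positive")

    k = len(effects)
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q measures the dispersion of studies around the fixed estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))

    # Method-of-moments estimate of between-study variance, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each study's sampling variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    sum_w_re = sum(w_re)
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum_w_re
    se = math.sqrt(1.0 / sum_w_re)

    # I^2: percentage of total variation attributed to heterogeneity.
    i2 = max(0.0, (q - (k - 1)) / q) * 100.0 if q > 0 else 0.0

    return {
        "pooled": pooled,
        "se": se,
        "ci95": (pooled - 1.96 * se, pooled + 1.96 * se),
        "tau2": tau2,
        "q": q,
        "i2": i2,
    }


if __name__ == "__main__":
    # Hypothetical AUROC improvements (AI minus baseline) and their variances.
    deltas = [0.04, 0.09, 0.02, 0.07]
    variances = [0.0004, 0.0009, 0.0003, 0.0006]
    print(pool_random_effects(deltas, variances))
```

A large I² in such an analysis would be consistent with the heterogeneity the abstract attributes to differing datasets, training strategies, and benchmarks. Bounded metrics such as AUROC are often transformed (for example, with a logit) before pooling, a step this sketch omits.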


