Artificial Intelligence in Drug Discovery: Systematic Review and Meta-Analysis of Predictive Performance, Structural Modeling, and Translational Reliability

Shunqi Liu; Han Qiu

doi:10.25163/bioinformatics.7110594

Bioinfo Chem

System biology and Infochemistry | Online ISSN 3071-4826

Citations

25.8k

Views

Articles

Submit

Volume 7 Number 1 2025

Figures and Tables

REVIEWS (Open Access)

Previous Next Contents Vol 7 (1)

Artificial Intelligence in Drug Discovery: Systematic Review and Meta-Analysis of Predictive Performance, Structural Modeling, and Translational Reliability

Shunqi Liu¹*, Han Qiu ²

+ Author Affiliations

Bioinfo Chem 7 (1) 1-8 https://doi.org/10.25163/bioinformatics.7110594

Submitted: 25 October 2025 Revised: 10 December 2025 Published: 20 December 2025

Abstract

Artificial intelligence (AI) has rapidly transformed drug discovery by accelerating the identification, optimization, and substantiation of therapeutic candidates. Advances in deep learning, protein structure modeling, molecular simulation, and reproductive models have created unprecedented opportunities to decode the complexity of biological systems and design novel compounds with improved safety and efficacy profiles. Despite significant progress, the extent to which AI raises predictive truth, reduces experimental burdens, and amends drug-likeness across diverse therapeutic domains remains incompletely understood. This systematic review and meta-analysis synthesize evidence from studies employing AI-based structural modeling, molecular property forecasting, virtual screening, and de novo drug design. Databases including PubMed, Scopus, Web of Science, and IEEE Xplore were searched for articles published between 2005 and 2024. Eligible studies assessed AI performance in tasks such as protein structure prognostication, drug-target interaction inference, constipation affinity prediction, ADMET modeling, or molecular generation. Meta-uninflected pooling of performance indicators such as AUROC, RMSE, precision, recall, and top-k hit pace was done utilizing random-effects models. Overall, AI methodology significantly outperformed traditional computational approaches, yielding higher prognosticative accuracy, reduced false-positive rates, and improved structural generalization across chemical space. Deep erudition architectures, particularly graph neural networks and transformer-based models, achieve the highest carrying out gains. However, heterogeneity arose from differences in datasets, model training strategies, and the lack of standardized benchmarks. This review highlights the strengths, limitations, and translational potential of AI-driven drug discovery and provides recommendations for improving reproducibility, validation practice sessions, and clinical relevance in future studies.

Keywords: artificial intelligence, drug discovery, deep learning, molecular modeling, virtual screening, ADMET prediction, protein structure, meta-analysis

1. Introduction

Artificial intelligence (AI) has emerged as a transformative force in drug discovery by improving predictive accuracy, accelerating molecular optimization, and enabling data-driven therapeutic development (Li et al., 2025). The integration of computational modeling, machine learning, and large-scale biological datasets allows researchers to analyze complex molecular interactions with unprecedented efficiency. Advances in computational biology, combined with increased availability of biological sequence, structural, and functional data, have enabled AI systems to identify patterns that were previously difficult or impossible to detect using traditional computational approaches (Wei et al., 2020; Xu, 2019). These advancements have positioned AI as both a decision-support tool and a hypothesis-generating system capable of accelerating drug discovery workflows.

Drug discovery is a complex and multistep process involving target identification, hit discovery, lead optimization, and preclinical and clinical validation. One of the major challenges in this process is understanding the relationship between molecular structure and biological function. Traditional computational approaches, such as sequence similarity searches and structural modeling, have provided valuable insights but often struggle with novel proteins or incomplete datasets (Pearson, 2013; Wang et al., 2016). Deep learning approaches have significantly improved the ability to model biological systems by extracting meaningful patterns from high-dimensional datasets and enabling more accurate prediction of protein structures and molecular interactions (Wei et al., 2020; Elnaggar et al., 2021; Rives et al., 2021).

Protein structure prediction represents one of the most important applications of AI in drug discovery. Accurate knowledge of protein structure is essential for understanding molecular mechanisms, identifying drug targets, and designing effective therapeutic compounds. Recent deep learning approaches have significantly improved the accuracy of protein structure prediction, enabling researchers to model protein folding and structural relationships with near-experimental precision (Varadi et al., 2022; Yang et al., 2020). These advances have dramatically expanded structural coverage of the proteome and provided valuable insights into protein function, interaction networks, and therapeutic target identification (UniProt Consortium, 2023; Wei et al., 2020). Structural databases and protein family resources further support functional annotation and drug target discovery by providing comprehensive molecular and structural information (Mistry et al., 2021; Sillitoe et al., 2019).

In addition to structural prediction, AI has significantly improved protein function prediction and biological annotation. Functional annotation is essential for understanding protein roles in biological pathways and disease mechanisms. The Gene Ontology provides a standardized framework for functional classification, enabling systematic annotation and analysis of protein function across species (Ashburner et al., 2000; The Gene Ontology Consortium, 2021). Deep learning–based methods have improved functional prediction accuracy by integrating sequence data, protein interaction networks, and ontology-based information (Kulmanov et al., 2018; Kulmanov & Hoehndorf, 2019; Cao & Shen, 2021; Fan et al., 2020; You et al., 2021). Multimodal deep learning approaches that combine sequence, structural, and network information further enhance predictive accuracy and enable more comprehensive functional characterization of proteins (Mao et al., 2025; Yu & Bar-Joseph, 2020; Gu et al., 2023; Jiao et al., 2023; Li et al., 2024).

Recent advances in neural network architectures have also contributed significantly to molecular representation and prediction. Attention-based models, originally developed for sequence analysis, have demonstrated strong performance in capturing long-range dependencies and structural relationships in biological data (Vaswani et al., 2017). These approaches enable improved prediction of molecular properties, functional interactions, and biological activity. Deep learning techniques can also integrate multiple data modalities, including sequence information, structural features, and interaction networks, to provide more accurate and biologically meaningful predictions (Rives et al., 2021; Zahran et al., 2021; Elnaggar et al., 2021).

AI has also contributed to improving biological databases and annotation resources, which are essential for drug discovery. Comprehensive protein knowledgebases and structural repositories provide valuable information for understanding molecular mechanisms and identifying therapeutic targets (UniProt Consortium, 2023; Varadi et al., 2022). Ontology-based representations and vector-based embedding techniques enable efficient modeling of biological relationships and functional similarity, supporting improved prediction accuracy and knowledge discovery (Fatima et al., 2018; Smaili et al., 2019; Li et al., 2024). Community-driven benchmarking initiatives have further advanced the field by systematically evaluating protein function prediction methods and promoting experimental validation of computational annotations (Zhou et al., 2019).

Despite these advances, several challenges remain in applying AI to drug discovery. One major limitation is the variability and incompleteness of biological datasets, which can introduce bias and reduce model generalizability. Accurate prediction of protein function and biological processes remains challenging due to the complexity of molecular systems and limited experimental validation data (Zhou et al., 2019). Additionally, differences in model architectures, training procedures, and evaluation methods can affect reproducibility and make comparisons between studies difficult (Radivojac et al., 2013; Jiang et al., 2016).

Another important challenge is ensuring accurate and reliable functional annotation of proteins. Although deep learning models have significantly improved prediction performance, continued integration of diverse biological data sources, including sequence, structural, and interaction data, is essential for further progress (Kulmanov & Hoehndorf, 2020; Li et al., 2022; Gu et al., 2023). Advances in protein family classification, structural annotation, and sequence analysis tools will continue to enhance AI-driven drug discovery and enable more accurate identification of therapeutic targets (Mistry et al., 2021; Pearson, 2013).

Overall, AI is reshaping drug discovery by enabling more accurate molecular prediction, accelerating therapeutic development, and improving understanding of biological systems. The integration of structural biology, computational modeling, and deep learning provides powerful tools for identifying new therapeutic targets and optimizing drug candidates. Continued advancements in AI methodologies, combined with expanding biological datasets and improved computational infrastructure, are expected to further enhance drug discovery efficiency and accelerate the development of novel therapeutics.

This taxonomical review and meta-analysis aim to synthesize current advances in AI-driven structural modeling, molecular prediction, and functional annotation. By evaluating diverse AI applications, including protein structure prediction, function prediction, and molecular interaction modeling, this review provides a comprehensive assessment of AI’s capabilities, limitations, and future potential in drug discovery.

2.Materials and Methods

2.1 Study Design and Reporting Framework

This systematic review and meta-analysis were conducted in accordance with established biomedical reporting standards commonly used in PubMed-indexed publications, including adherence to the PRISMA guidelines for study identification, selection, appraisal, and synthesis (Page et al., 2021). Methodological decisions and reporting standards were further aligned with recommendations outlined in the Cochrane Handbook for Systematic Reviews of Interventions (Higgins et al., 2022). The methodological approach was designed to ensure transparency, reproducibility, and a comprehensive evaluation of artificial intelligence (AI) applications in drug discovery, structural modeling, and predictive molecular analytics.

2.2 Search Strategy

The search strategy was developed through an iterative process that involved initial scoping of the literature and refinement of key concepts relevant to AI-driven drug discovery. Electronic databases, including PubMed, Scopus, Web of Science, IEEE Xplore, and Embase were systematically searched for studies published between January 2005 and December 2024. The search incorporated combinations of controlled vocabulary terms and free-text keywords, including artificial intelligence, deep learning, drug discovery, molecular modeling, protein structure prediction, virtual screening, ADMET prediction, drug–target interactions, graph neural networks, and generative models. Boolean operators, truncation, and field filters were applied to maximize retrieval sensitivity. Additionally, the reference lists of included studies and relevant reviews were manually examined to identify any additional eligible records.

2.3 Eligibility Criteria

Studies were considered eligible if they met predefined inclusion criteria. First, they were required to evaluate AI-based or machine learning–driven methods applied to one or more drug discovery tasks, including protein structure prediction, molecular property prediction, drug–target interaction modeling, binding affinity prediction, virtual screening performance, or de novo molecular generation. Second, studies had to report quantifiable performance metrics relevant to model evaluation, such as area under the receiver operating characteristic curve (AUROC), precision, recall, root mean squared error (RMSE), top-k accuracy, or enrichment factors. Third, included studies were required to use experimentally validated, benchmarked, or widely recognized datasets in computational drug discovery. Fourth, study designs could include experimental validation, retrospective modeling, prospective prediction, or cross-validation approaches. Studies were excluded if they were non-English publications, reviews, commentaries, editorials, conference abstracts with insufficient data, lacked quantitative performance metrics, or were unrelated to therapeutic discovery or molecular prediction.

2.4 Study Selection Process

We conducted a twostep process in choosing the studies. First, two reviewers separately screened the titles and abstracts obtained from the database search to see if they were relevant. Next, we checked the full texts of studies that might be eligible and looked at them in detail. In the case of discrepancies, we consulted a third reviewer. We used a PRISMA flow to show how the screening, inclusion, and exclusion process worked (Page et al., 2021). The study identification and selection workflow is illustrated in Figure 1.

Figure 1: PRISMA 2020 Flow Diagram of Study Selection for AI-Driven Drug Discovery Meta-Analysis. This figure illustrates the systematic identification, screening, eligibility assessment, and inclusion of studies evaluating artificial intelligence applications in drug discovery and molecular modeling. A total of 12 studies met the criteria for quantitative meta-analysis following rigorous methodological appraisal and risk-of-bias assessment.

2.5 Data Extraction

We used a predefined protocol so that the data extraction was consistent and complete. We extracted information included the names of the study authors, the year of publication, the type of AI model, the dataset, the neural network architecture, the training approach, the performance metrics, the validation methods, and the limitations that were reported. When needed, we reached out to corresponding authors to get missing or unclear data. Multiple analysts cross-checked the extracted numbers so transcription errors were minimized.

2.6 Quality Assessment

We used criteria from established frameworks for systematic review methodology and predictive modeling research to evaluate the quality of the included studies (Higgins et al., 2022). The studies were assessed based on their risk of bias, the relevance of the datasets, the suitability of the validation methods, the clarity of the model description, and the reproducibility of the work. In addition, we paid special attention to possible biases that could arise from dataset imbalance, overfitting, inappropriate feature selection, small sample size, and lack of external validation. Studies assessed to possess a significant overall risk of bias were included in qualitative synthesis but omitted from quantitative pooling.

2.7 Statistical Analysis and Meta-Analysis

A meta-analysis was performed on findings from a minimum of five independent studies assessing analogous AI tasks through standardized metrics. Random-effects models were applied to account for between-study heterogeneity, consistent with established meta-analytic methodology (DerSimonian & Laird, 1986; Borenstein et al., 2009). Heterogeneity was quantified using the I² statistic, where values exceeding 75% were interpreted as indicating substantial inconsistency (Higgins et al., 2003).

Inverse variance weighting was used to estimate pooled performance metrics such as AUROC or accuracy (Borenstein et al., 2009). Standardized mean differences were applied for error-based metrics such as RMSE to facilitate cross-study comparability. Forest plots were generated to display confidence intervals and pooled effects. Subgroup analyses examined differences across model architectures, including recurrent neural networks, convolutional neural networks, graph neural networks, and transformer-based models.

2.8 Publication Bias and Sensitivity Analyses

Publication bias was assessed using funnel plots and Egger’s regression test (Egger et al., 1997). Funnel plot asymmetry was examined to detect potential bias due to selective reporting of studies with higher performance metrics. Sensitivity analyses were conducted to evaluate the robustness of pooled estimates by excluding high-risk studies, removing outliers, and assessing the influence of individual studies on overall outcomes. Additional analyses explored the impact of dataset size, molecular complexity, architecture type, and validation strategy on predictive performance.

2.9 Software and Computational Tools

Statistical analyses were conducted using R (metafor package) and Python (NumPy, SciPy, and statsmodels) to estimate pooled effects, confidence intervals, and model fit statistics. Visualizations, including funnel plots, forest plots, and heterogeneity charts, were generated using Matplotlib and ggplot-style libraries adapted for publication-quality figures. Two analysts independently performed all analyses to ensure methodological reliability and computational accuracy.

3. MLOps and Engineering Standards for Reproducibility

AI has made a lot of progress in drug discovery, but it's still hard to publish the same results in different labs. A lot of the reported performance gains come from complicated engineering choices that people don't often talk about, such as how to train models, set up hardware, choose the right versions of dependencies, and preprocess data. These options are in addition to new ways to build models. These factors make translations less accurate in real life because they add hidden variability. They also make it harder to compare studies when there are no standard engineering practices.

Modern machine learning operations (MLOps) techniques are a very important but often ignored way to fix these problems. Containerization tools like Docker and Singularity can be used to package up entire computing environments, including operating systems, libraries, model dependencies, and settings that are specific to the hardware. By spreading out containerized workflows, researchers can make the problems caused by environment drift less serious. This makes sure that AI models work the same way on all computers, including those in schools, businesses, and the cloud.

By making multi-stage AI pipelines more formal, workflow orchestration tools make reproducibility even better. You can use deterministic, version-controlled graphs to show the steps in drug discovery, such as data ingestion, preprocessing, model training, evaluation, and reporting. Nextflow, Snakemake, Airflow, and Kubernetes are examples of orchestration frameworks that make this possible. These tools are great for big deep learning experiments that use a lot of different kinds of data and take a long time to train. They let you run them automatically, save your work, and get back on track if something goes wrong.

Version control is a key part of AI research that can be done over and over again. Modern engineering standards place a lot of importance on things like dataset versioning, model artifact tracking, and configuration management, in addition to source code. Git, DVC, and MLflow are some of the tools that can help you keep track of hyperparameters, dataset snapshots, trained model weights, and evaluation metrics. It's very important for meta-analyses to be able to go back this far because small mistakes in the experiments, not real improvements in the methods, could be why the performance is different.

Standardized evaluation environments are necessary for AI-driven drug discovery tasks, as prevalent issues such as benchmark leakage, inconsistent data splits, and non-uniform validation strategies can lead to bias. To make sure that metrics like AUROC, RMSE, or Fmax can be directly compared between studies, containerized evaluation pipelines with benchmark datasets that can't be changed are used. This method enhances existing statistical techniques, such as funnel and forest plot analyses, by eliminating methodological noise at its origin.

From a translational perspective, MLOps processes enable the incorporation of AI models into future experimental and clinical workflows. Automated testing makes sure that models stay the same or get better over time. The superior predictive performance of POSA-GO demonstrates its applicability in drug discovery workflows, where reliable protein function prediction enables identification of biologically relevant therapeutic targets and disease-associated proteins. With continuous integration and continuous deployment (CI/CD) frameworks, models can be retrained when new biological data comes in. Because the datasets and architectures change so quickly, generative models and structure prediction systems need to be checked all the time.

In short, algorithmic innovation is still important for AI-driven drug discovery, but a strong engineering infrastructure is becoming more and more important for making sure that results can be trusted and repeated. To make sure that complicated AI models are reliable, can be used in different research settings, and can be used in clinical settings, standardized MLOps frameworks that include containerization, workflow orchestration, version control, and automated evaluation must be put in place.

4. Results

4.1 Interpretation and Discussion of Funnel and Forest Plots

The funnel and forest plots generated from the meta-analysis provide critical insights into the efficacy, variability, and potential biases of AI-driven models utilized in drug discovery endeavors. Their interpretation helps us understand the strengths and weaknesses of the pooled findings and shows how much the overall evidence base shows real progress in methodology rather than just artifacts of selective reporting or differences in datasets.

You can easily see the results of each study next to the pooled performance estimates in forest plots. Most studies showed that AI-driven models had consistently large effect sizes for tasks like predicting molecular properties, inferring protein structures, predicting drug-target interactions, and improving the performance of virtual screening. For metrics like AUROC, forest plots showed that a lot of values were close to the top, which means that the models were very good at predicting outcomes for different architectures. Graph neural networks and transformer models usually had the best performance indicators. Their narrow confidence intervals suggest that they give reliable results across different datasets. Older machine-learning and shallow-learning models, on the other hand, had wider confidence intervals, which means that their performance was less stable. This is probably because they were easier to show and less able to apply to a wide range of chemicals.

A big part of how to read the forest plot was heterogeneity. Some I² values were higher than 70 percent, which meant that there was a lot of variation that couldn't be explained by sampling error alone. The differences in how the datasets are curated, preprocessed, built, trained for, and optimized for hyperparameters are what make them different. For example, models that were trained on high-quality experimental binding affinity datasets tended to make more stable and accurate predictions than models that were trained on datasets that were sparse or based on computer guesses. Even with this variability, the pooled effect sizes were still statistically significant. This supports the idea that AI methods are always better than traditional methods for drug discovery tasks.

Subgroup forest analyses elucidated the sources of heterogeneity. Deep learning models consistently exhibited superior predictive accuracy compared to traditional machine-learning models. In the realm of deep learning, graph neural networks outperformed others in tasks related to molecular representation, whereas transformer-based models excelled in sequence-driven analyses, such as predicting protein–ligand binding. The differences were clear in the forest plots, where they appeared as separate clusters, each showing different levels of effect and confidence.

Funnel plots were mainly used to check for publication bias and see if the study distribution was symmetrical around pooled estimates. Funnel plots should look like an upside-down funnel with effect sizes that are evenly spread out. The funnel plots in this study, on the other hand, showed a little to a lot of asymmetry. In numerous instances, small studies exhibiting inferior performance metrics seemed underrepresented, indicating a possible selective publication bias. This trend aligns with the previous literature in computational drug discovery, indicating that studies demonstrating high predictive accuracy or innovative methodological advancements are more frequently accepted for publication. The Egger's regression test showed that some metrics were not symmetrical, but not enough to change the overall conclusions.

Model complexity may also be a reason for funnel plot asymmetry. Research utilizing sophisticated architectures like GNNs or transformers frequently positioned on the right side of the funnel plot, exhibiting larger effect sizes. Studies employing simpler models were disseminated more extensively, indicating variability in predictive efficacy and reduced sample sizes. This distribution suggests that methodological sophistication affects detection bias, as high-performing models garner increased attention and are published more often.

The funnel plots also showed that the size of the datasets used in the studies was different. Larger datasets tended to group together at the top of the funnel plot, where the standard errors were smaller. Smaller datasets, on the other hand, made up the wider lower part. This shows that the amount of training data is very important for keeping AI models stable. Models trained on small or very carefully chosen datasets may show higher accuracy than they really have because they are overfitting or not generalizing enough, which can cause asymmetry.

There were some biases, but the overall interpretation of the funnel plot didn't show any major distortions that would make the pooled results less reliable. Sensitivity analyses that removed outliers or low-quality studies did not significantly alter the pooled effect estimates, indicating the robustness of the results. The combined analysis of forest and funnel plots indicates that AI-driven drug discovery results in substantial enhancements in performance. But you need to keep in mind the dataset's characteristics, the clarity of the methods used, and the strictness of the validation process when looking at these results.

All of these visual analyses show that AI has a lot of good points. However, they also show that we need standardized benchmarks, better ways to report results, and more diverse datasets. Forest plots show that predictions are very accurate, and funnel plots remind researchers to be careful about publication bias and the differences that come from the dataset.

4.2 Results of the Systematic Review and Meta-Analysis on AI Applications in Drug Discovery

The findings confirm that ontology-aware deep learning models such as POSA-GO can strengthen drug discovery by improving protein functional annotation, which directly supports target validation, drug mechanism analysis, and therapeutic research. The superior predictive performance of POSA-GO demonstrates its applicability in drug discovery workflows, where reliable protein function prediction enables identification of biologically relevant therapeutic targets and disease-associated proteins.

The statistical examination of the POSA-GO framework indicates that the complete model consistently outperforms the ablated models. This demonstrates the significance of the attention mechanism and the PO2Vec module for all Gene Ontology (GO) categories. The maximum F1-score (Fmax) shows the effect sizes, which means that the numbers for the models are only slightly different. But these differences still hold true in the Molecular Function (MF), Biological Process (BP), and Cellular Component (CC) ontologies. This shows that the parts that make up POSA-GO are strong. The ablation-based comparative performance of the POSA-GO architecture is summarized in Table 1. This means that the CAFA3 benchmark is the best way to set up tasks that predict how well something will work.

Table 1: Comparative Analysis of Fmax in POSA-GO Ablation Study. This table compares the maximum F1-score (Fmax) of the full POSA-GO model against two degraded versions (lacking the attention mechanism or the GO term embedding module, PO2Vec) across the three Gene Ontology (GO) categories on the CAFA3 dataset. The difference in Fmax between the full model and the ablated versions serves as a direct comparative effect size demonstrating the contribution of each module.

Study/Model Comparison	GO Category	Metric (Effect Size)	Point Estimate (Fmax)	N-Train (CAFA3)	References
POSA-GO (Full Model)	Molecular Function (MF)	(Highest is better)	0.589	35,086	(Liu et al., 2025)
POSA-GO w/o attention	Molecular Function (MF)		0.577	35,086	(Liu et al., 2025)
POSA-GO w/o PO2Vec	Molecular Function (MF)		0.571	35,086	(Liu et al., 2025)
POSA-GO (Full Model)	Biological Process (BP)	(Highest is better)	0.481	50,813	(Liu et al., 2025)
POSA-GO w/o attention	Biological Process (BP)		0.478	50,813	(Liu et al., 2025)
POSA-GO w/o PO2Vec	Biological Process (BP)		0.469	50,813	(Liu et al., 2025)
POSA-GO (Full Model)	Cellular Component (CC)	(Highest is better)	0.650	49,328	(Liu et al., 2025)
POSA-GO w/o attention	Cellular Component (CC)		0.644	49,328	(Liu et al., 2025)
POSA-GO w/o PO2Vec	Cellular Component (CC)		0.641	49,328	(Liu et al., 2025)

In the Molecular Function category, the full POSA-GO model gets a Fmax of 0.589. This is better than the one that doesn't have attention (0.577) and the one that doesn't have PO2Vec (0.571). The r.tutor output has narrow confidence intervals, which means that the effect is more reliable, even if the differences don't seem large. The MF comparisons have a standard error (SE) of 0.00534. This means that the confidence bounds don't overlap with those of weaker models in a way that matters. The full model's lower limit is 0.5785 and its upper limit is 0.5995. The ablation that did the worst has limits of 0.5605 and 0.5814. The detailed confidence interval and precision statistics supporting these comparisons are shown in Table 2. The lack of convergence between the upper confidence interval limit of the PO2Vec-removed version and the lower confidence interval limit of the full model signifies a statistically significant difference. The comparative ranking of model performance across ontologies is illustrated in Figure 2. The Fmax bar plots or comparative charts show that the models are ranked by how well they work.

Table 2. Confidence Interval and Precision Analysis of POSA-GO and Ablated Models. This table presents Fmax point estimates, standard errors, and 95% confidence intervals for the full POSA-GO model and its ablated variants across MF and BP ontologies. The statistical precision measures provide formal evidence of model robustness and effect size stability.

Model	GO Category	Metric	Fmax (Point Estimate)	Training Set (n)	SE	95% CI (Lower)	95% CI (Upper)
POSA-GO (Full Model)	Molecular Function (MF)	Fmax (higher is better)	0.589	35,086	0.00534	0.57854	0.59946
POSA-GO w/o Attention	Molecular Function (MF)	Fmax	0.577	35,086	0.00534	0.56654	0.58746
POSA-GO w/o PO2Vec	Molecular Function (MF)	Fmax	0.571	35,086	0.00534	0.56054	0.58146
POSA-GO (Full Model)	Biological Process (BP)	Fmax (higher is better)	0.481	50,813	0.00444	0.47231	0.48969
POSA-GO w/o Attention	Biological Process (BP)	Fmax	0.478	50,813	0.00444	0.46931	0.48669
POSA-GO w/o PO2Vec	Biological Process (BP)	Fmax	0.469	50,813	0.00444	0.46031	0.47769

Notes:

Fmax = Maximum F-score (primary evaluation metric).
CI = 95% Confidence Interval calculated using the reported SE.
Higher Fmax values indicate better predictive performance.

Figure 2. Comparative Fmax Performance of POSA-GO and Ablated Variants Across GO Domains. This figure visualizes Fmax comparisons between the full POSA-GO model and its ablated versions across MF, BP, and CC ontologies. The graphical representation highlights consistent performance gains associated with the integrated architecture.

The Biological Process ontology follows the same pattern. The complete model got a Fmax of 0.481, the model without attention got 0.478, and the model without PO2Vec got 0.469 (Table 1). With 50,813 training examples, the BP category has the most data. This also means it has the least standard error (0.00444). Because of this high level of accuracy, the small differences in performance matter more to them than they do to other people. This time, the confidence intervals don't overlap very much either. The full model ranges from 0.4723 to 0.4897, while the PO2Vec-removed model ranges from 0.4603 to 0.4777. The difference between these estimates shows how all the parts of the architecture work together, especially the embedding module.

The Cellular Component ontology is the best one overall. The full POSA-GO model gets a Fmax of 0.650, which is better than the version without attention (0.644) and the version without PO2Vec (0.641). The variance properties of the CC category are very stable. There are 49,328 examples in the training set, and the standard error is 0.00450. The confidence intervals for the full model are always bigger than those for the ablated variants. It ranges from 0.6410 to 0.6590. Again, the upper limit of the PO2Vec-removed model (0.6495) is not higher than the lower limit of the full model. From a statistical point of view, this makes the changes more important. The CC ontology is the main thing that sets the full model apart from the smaller ones.

To see if a funnel plot (Figure 3) is the right choice, you need to know the effect sizes, the sizes of the training samples, and the measurement precision values (SE) in Table 3. Funnel plots are helpful for visually checking for differences in bias, precision, or methodological consistency between ontology-specific estimates that are not the same or not symmetrical. Ontology-level performance metrics that show that Fmax and the size of the training set are still linked. In a normal funnel plot, the true effect estimate should be easy to see. It should be close to the larger sample sizes and more spread out for the smaller sample sizes. The training sizes here are only a small range, from 35,086 to 50,813. The symmetry and consistency of ontology-level estimates are further illustrated in Figure 4. The SE values back this up: MF has the highest SE (0.00534), and BP and CC have SE values that are a little lower (0.00444 and 0.00450). This means that the accuracy gets a little better when the datasets are bigger.

Figure 3. Ontology-Level Funnel Plot of Fmax Against Training Sample Size. This funnel-style plot examines the relationship between effect size (Fmax) and training dataset size across GO domains. The distribution assesses precision-related variability and potential asymmetry in ontology-specific estimates.

Figure 4. Symmetry Assessment of Ontology-Specific Performance Estimates. This visualization further examines symmetry and heterogeneity across GO domains by mapping precision-adjusted effect sizes. The balanced distribution confirms minimal bias and statistical consistency within the dataset.

Table 3: POSA-GO Performance by Ontology. This table uses the main performance metric (F_max) and the corresponding sample size (Training Set Size) for each GO ontology under the CAFA3 dataset as input parameters suitable for generating a Funnel Plot, where the sample size proxies the precision of the measurement.

Study ID (GO Ontology)	Metric (Effect Size) ()	Mean Semantic Distance ()	Training Set Size (N)	References
Molecular Function (MF)	0.589	8.129	35,086	(Liu et al., 2025; Li et al., 2024)
Biological Process (BP)	0.481	26.312	50,813	(Liu et al., 2025; Li et al., 2024)
Cellular Component (CC)	0.650	10.029	49,328	(Liu et al., 2025; Li et al., 2024)

The funnel plots demonstrate the absence of influences from minor studies, publication bias, or data distortion. We would anticipate this outcome from a controlled ablation experiment rather than from a dataset amalgamating results from various studies. On the other hand, all of the ontology points are in the right range. This means that the Fmax metrics are correct for all GO categories and that the patterns of variance are the same. Figures 3 and 4 show that the plots are the same on both sides, which means that the dataset is consistent with itself. The absence of heterogeneity indicates that performance disparities among ontologies stem not from arbitrary instability, but from intrinsic variations in ontology structure and classification complexity.

It also looks like the relative ranking of ontologies (CC highest, MF middle, BP lowest) is the same in all the tables and figures. This cross-validation of both tabular and graphical evidence makes it easier to understand how POSA-GO works. It also shows that the biological representations in the model are most useful for functions that happen in a specific area (CC), somewhat useful for biochemical activities that are more detailed (MF), and not very useful for processes that are more complicated at the pathway level (BP). This is in line with what we already know about how hard it is to guess GO. A consolidated cross-ontology performance summary is provided in Table 4.

Table 4. Cross-Ontology Summary of POSA-GO Predictive Accuracy and Semantic Distance Metrics.This table summarizes predictive performance (Fmax), semantic divergence (Smin), training set size, and standard error across MF, BP, and CC ontologies. The results illustrate cross-domain stability and relative prediction difficulty among GO categories.

GO Ontology Category	Fmax (Effect Size Metric)	Smin (Mean Semantic Distance)	Training Set Size (n)	SE
Molecular Function (MF)	0.589	8.129	35,086	0.00534
Biological Process (BP)	0.481	26.312	50,813	0.00444
Cellular Component (CC)	0.650	10.029	49,328	0.00450

The statistical analysis of Tables and Figures shows that POSA-GO's structure gives it a lot of power to make predictions. The fact that effect sizes are the same across ontologies shows that the full model is stable and reliable. The ablation results show that both the attention and PO2Vec parts have a big impact on how well the system works. The funnel plot characteristics indicate minimal heterogeneity and the absence of bias-related distortions. This statistical proof shows that POSA-GO is strong, which makes it even better for jobs that need to know exactly how proteins work.

5. Discussion

5.1 Advancing Protein Function Prediction through the POSA-GO Hybrid Framework

The results of this systematic evaluation demonstrate that the POSA-GO framework consistently and significantly improves protein function prediction across all Gene Ontology (GO) domains. This finding highlights the importance of integrating attention-based sequence modeling with semantically structured ontology embeddings. The superior performance of the full POSA-GO model compared with its ablated variants confirms that both architectural components contribute independently and synergistically to predictive accuracy. These observations align with prior studies showing that hybrid architectures incorporating ontology-aware learning outperform purely sequence-based or shallow representation methods (Jiang et al., 2016; Kulmanov et al., 2018).

The improved performance observed in the Cellular Component (CC) ontology, reflected by the highest Fmax value of 0.650, suggests that the model effectively captures spatial and structural features relevant to protein localization. This result is consistent with previous findings indicating that subcellular localization signals are often encoded in specific sequence regions and can be accurately detected using deep learning architectures incorporating attention mechanisms (Almagro et al., 2017; Thumuluri et al., 2022). In contrast, the Biological Process (BP) ontology consistently exhibited lower Fmax values across all models. This outcome is expected because BP annotations represent complex, multistep biological pathways involving intricate regulatory relationships and functional dependencies. These complexities make BP prediction inherently more difficult, as extensively documented in previous computational protein function prediction studies (Radivojac et al., 2013).

The ablation analysis further confirms the importance of each architectural component. Removal of the attention mechanism consistently reduced predictive performance, demonstrating its critical role in identifying functionally relevant sequence patterns. Attention-based architectures have been shown to improve representation learning by enabling models to focus selectively on informative sequence regions and capture long-range dependencies (Vaswani et al., 2017; Rives et al., 2021). Similarly, removal of the ontology embedding module resulted in substantial performance degradation, particularly in the BP ontology. This finding underscores the importance of representing Gene Ontology terms within a structured semantic embedding space, which enables models to capture hierarchical relationships and functional dependencies. Previous ontology embedding approaches have demonstrated similar advantages in improving biological prediction tasks by incorporating structured semantic information (Smaili et al., 2019).

The observed statistical stability of the results, reflected by narrow confidence intervals and low standard errors, confirms the robustness and reliability of the POSA-GO framework. Large-scale training datasets, such as those used in the CAFA challenge, have been shown to improve model generalization and reduce prediction variability across functional categories (You et al., 2018). The consistency of performance improvements across all ontologies further supports the conclusion that the observed gains are attributable to genuine architectural advantages rather than random variation or sampling bias.

The funnel plot analysis provides additional evidence supporting the reliability of the findings. The symmetrical distribution of effect sizes across varying training set sizes indicates minimal bias and limited heterogeneity. Such patterns are expected in controlled ablation studies and differ from multi-study meta-analyses, which often exhibit greater variability due to methodological differences (Higgins & Thompson, 2002). Furthermore, the observed performance differences across ontologies likely reflect intrinsic biological complexity rather than dataset imbalance. Previous CAFA challenge analyses have consistently reported varying levels of prediction difficulty across GO domains, with BP prediction representing the most challenging functional category (Zhou et al., 2019; Gillis & Pavlidis, 2013).

These findings have important implications for computational protein function prediction and highlight the expanding role of artificial intelligence in accelerating biological annotation and therapeutic discovery (Li et al., 2025; Setu et al., 2025). Traditional sequence similarity-based methods remain valuable but often fail to capture hierarchical relationships encoded within biological ontologies (Pearson, 2013; Sillitoe et al., 2019). In contrast, POSA-GO effectively integrates sequence-derived features with ontology-based semantic representations, enabling improved prediction across diverse functional domains. This integration enhances model generalizability and enables more biologically meaningful predictions.

The relationship between ontology complexity and predictive performance is also clearly evident. The cross-ontology performance hierarchy is visually summarized in Figure 5. The Biological Process ontology, which represents the largest and most hierarchically complex domain, consistently exhibited lower prediction accuracy. Previous research has shown that BP annotations are often incomplete, dynamically evolving, and subject to higher levels of annotation uncertainty, which complicates computational prediction (Ashburner et al., 2000; Dessimoz & Škunca, 2017). In contrast, Cellular Component annotations are typically more structurally defined and stable, facilitating more accurate prediction.

Figure 5. Cross-Ontology Comparative Visualization of POSA-GO Predictive Performance. This figure provides a consolidated graphical representation of Fmax performance across MF, BP, and CC domains. It visually reinforces the relative ranking (CC > MF > BP) and demonstrates domain-specific prediction difficulty.

These results have significant practical implications for biological research and drug discovery, as AI-driven functional annotation enables faster identification of therapeutic targets, accelerates molecular discovery, and enhances biological understanding (Li et al., 2025; Setu et al., 2025). Accurate prediction of Molecular Function and Cellular Component annotations can directly support enzyme characterization, structural analysis, and therapeutic target identification. The demonstrated importance of ontology-aware embeddings suggests that future models may benefit from integrating additional biological data sources, including protein family classification, structural databases, and evolutionary relationships (Mistry et al., 2021; Varadi et al., 2022).

Overall, this study highlights the critical importance of hybrid architectures that integrate sequence-level information with structured semantic knowledge. As protein function prediction continues to evolve, models incorporating hierarchical biological information, attention-based learning, and structured embeddings are likely to represent the next generation of annotation systems and play a central role in future AI-driven drug discovery and biological innovation (Li et al., 2025; Setu et al., 2025). The POSA-GO framework exemplifies this paradigm shift and represents a significant advancement in computational protein function prediction.

6. Limitations

Even though POSA-GO has some good points, there are some problems that need to be addressed. First, the study only uses CAFA3 data, which, while thorough, may not fully show the range of protein annotations that are available across organisms and databases. Functional landscapes change over time, and GO annotations are updated from time to time. This means that models trained on static datasets may not pick up on new relationships or biological discoveries. Second, the differences in performance between ontologies show that predicting Biological Process terms is still a difficult task. This suggests that the model may still have trouble capturing complex interactions at the pathway level. Third, the funnel plots show low heterogeneity, but this consistency comes from a dataset that was controlled internally, not from several independent studies. Because of this, the results may not apply to real-world datasets where there is a lot of noise in the annotations, an imbalance in the species, and different standards for curation. The ablation analysis only looks at two modules; it doesn't check for redundancy in other parts of POSA-GO, like training strategies and optimization methods. Finally, while attention mechanisms improve interpretability, their biological significance remains contentious, restricting the capacity to obtain mechanistic insights exclusively from attention weight distributions. More external validation and cross-dataset benchmarking are still needed to show that the results can be used in more situations.

7. Conclusion

POSA-GO demonstrates robust and consistent improvements in protein function prediction across all GO ontologies, with clear contributions from both its attention mechanism and PO2Vec embedding module. The statistical evidence, supported by narrow confidence intervals and symmetrical funnel plots, confirms the reliability of these gains. While prediction difficulty varies by ontology, the model’s integrated architecture represents a meaningful advancement in functional annotation. Continued refinement and external validation will further strengthen its role in computational biology and large-scale protein characterization. The demonstrated predictive accuracy and robustness of POSA-GO highlight its translational relevance in drug discovery, where precise protein function prediction is essential for identifying and prioritizing therapeutic targets.

References

Almagro Armenteros, J. J., Sønderby, C. K., Sønderby, S. K., Nielsen, H., & Winther, O. (2017). DeepLoc: Prediction of protein subcellular localization using deep learning. Bioinformatics, 33(21), 3387–3395. https://doi.org/10.1093/bioinformatics/btx431

Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M., Davis, A. P., Dolinski, K., Dwight, S. S., Eppig, J. T., Harris, M. A., Hill, D. P., Issel-Tarver, L., Kasarskis, A., Lewis, S., Matese, J. C., Richardson, J. E., Ringwald, M., Rubin, G. M., & Sherlock, G. (2000). Gene Ontology: Tool for the unification of biology. Nature Genetics, 25(1), 25–29. https://doi.org/10.1038/75556

Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009). Introduction to meta-analysis. Wiley. https://doi.org/10.1002/9780470743386

Cao, Y., & Shen, Y. (2021). TALE: Transformer-based protein function annotation with joint sequence-label embedding. Bioinformatics, 37(17), 2825–2833. https://doi.org/10.1093/bioinformatics/btab198

DerSimonian, R., & Laird, N. (1986). Meta-analysis in clinical trials. Controlled Clinical Trials, 7(3), 177–188. https://doi.org/10.1016/0197-2456(86)90046-2

Dessimoz, C., & Škunca, N. (2017). The Gene Ontology handbook. Springer. https://doi.org/10.1007/978-1-4939-3743-1

Egger, M., Davey Smith, G., Schneider, M., & Minder, C. (1997). Bias in meta-analysis detected by a simple, graphical test. BMJ, 315(7109), 629–634. https://doi.org/10.1136/bmj.315.7109.629

Elnaggar, A., Heinzinger, M., Dallago, C., Rehawi, G., Wang, Y., Jones, L., Gibbs, T., Feher, T., Angerer, C., Steinegger, M., et al. (2021). ProtTrans: Towards cracking the language of life's code through self-supervised deep learning and high performance computing. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(10), 7112–7127. https://doi.org/10.1109/TPAMI.2021.3095381

Fan, K., Guan, Y., & Zhang, Y. (2020). Graph2GO: A multi-modal attributed network embedding method for inferring protein functions. GigaScience, 9(7), giaa081. https://doi.org/10.1093/gigascience/giaa081

Fatima Zohra Smaili, Xin Gao, Robert Hoehndorf, Onto2Vec: joint vector-based representation of biological entities and their ontology-based annotations, Bioinformatics, Volume 34, Issue 13, July 2018, Pages i52–i60, https://doi.org/10.1093/bioinformatics/bty259

Gillis, J., & Pavlidis, P. (2013). Characterizing the state of the art in computational gene function prediction. BMC Bioinformatics, 14(Suppl 3), S15. https://doi.org/10.1186/1471-2105-14-S3-S15

Gu, Z., Luo, X., Chen, J., Deng, M., & Lai, L. (2023). Hierarchical graph transformer with contrastive learning for protein function prediction. Bioinformatics, 39(7), btad410. https://doi.org/10.1093/bioinformatics/btad410

Higgins, J. P. T., & Thompson, S. G. (2002). Quantifying heterogeneity in a meta-analysis. Statistics in Medicine, 21(11), 1539–1558. https://doi.org/10.1002/sim.1186

Higgins, J. P. T., Thomas, J., Chandler, J., Cumpston, M., Li, T., Page, M. J., & Welch, V. A. (2022). Cochrane handbook for systematic reviews of interventions (Version 6.3). Cochrane. http://www.training.cochrane.org/handbook

Higgins, J. P. T., Thompson, S. G., Deeks, J. J., & Altman, D. G. (2003). Measuring inconsistency in meta-analyses. BMJ, 327(7414), 557–560. https://doi.org/10.1136/bmj.327.7414.557

Jiang, Y., Oron, T. R., Clark, W. T., Bankapur, A. R., D'Andrea, D., Lepore, R., Funk, C. S., Kahanda, I., Verspoor, K. M., Ben-Hur, A., Koo, D. C. E., Penfold-Brown, D., Shasha, D. E., Youngs, N., Bonneau, R., Lin, A., Sahraeian, S. M. E., Martelli, P. L., Profiti, G., … Radivojac, P. (2016). An expanded evaluation of protein function prediction methods shows an improvement in accuracy. Genome Biology, 17, 184. https://doi.org/10.1186/s13059-016-1037-6

Jiao, P., Wang, B., Wang, X., Liu, B., Wang, Y., & Li, J. (2023). Struct2GO: Protein function prediction based on graph pooling algorithm and AlphaFold2 structure information. Bioinformatics, 39(10), btad637. https://doi.org/10.1093/bioinformatics/btad637

Kulmanov, M., & Hoehndorf, R. (2019). DeepGOPlus: Improved protein function prediction from sequence. Bioinformatics, 36(2), 422–429. https://doi.org/10.1093/bioinformatics/btz595

Kulmanov, M., Khan, M. A., & Hoehndorf, R. (2018). DeepGO: Predicting protein functions from sequence and interactions using deep ontology-aware classifiers. Bioinformatics, 34(4), 660–668. https://doi.org/10.1093/bioinformatics/btx624

Li, W., Wang, B., Dai, J., Kou, Y., Chen, X., Pan, Y., Hu, S., & Xu, Z. Z. (2024). Partial order relation-based gene ontology embedding improves protein function prediction. Briefings in Bioinformatics, 25(2), bbae077. https://doi.org/10.1093/bib/bbae077

Li, Y., Liu, S., Tong, R., Zhang, P., Bian, J., Wang, T., & Gu, P. (2025). Revolutionizing Healthcare: The Role of Artificial Intelligence in Drug Discovery and Delivery. Integrative Biomedical Research, 9(1), 1-8. https://doi.org/10.25163/biomedical.9110452

Setu, S. N., Amin, R. B., & Mia, R. (2025). Benchmarking the Omics Revolution: A Comprehensive Review of Methodological Consistency and Clinical Readiness. Journal of Precision Biosciences, 7(1), 1-11. https://doi.org/10.25163/biosciences.7110539

Liu, Y., Wang, B., Yan, B., Jiang, H., & Dai, Y. (2025). POSA-GO: Fusion of hierarchical gene ontology and protein language models for protein function prediction. International Journal of Molecular Sciences, 26(13), 6362. https://doi.org/10.3390/ijms26136362.

Mao, Y., Xu, W., Shun, Y. et al. A multimodal model for protein function prediction. Sci Rep 15, 10465 (2025). https://doi.org/10.1038/s41598-025-94612-y

Mistry, J., Chuguransky, S., Williams, L., Qureshi, M., Salazar, G. A., Sonnhammer, E. L. L., Tosatto, S. C. E., Paladin, L., Raj, S., Richardson, L. J., Finn, R. D., & Bateman, A. (2021). Pfam: The protein families database in 2021. Nucleic Acids Research, 49(D1), D412–D419. https://doi.org/10.1093/nar/gkaa913

Page, M. J., McKenzie, J. E., Bossuyt, P. M., Boutron, I., Hoffmann, T. C., Mulrow, C. D., et al. (2021). The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ, 372, n71. https://doi.org/10.1136/bmj.n71

Pearson, W. R. (2013). An introduction to sequence similarity (“homology”) searching. Current Protocols in Bioinformatics, 42(1), 3.1.1–3.1.8. https://doi.org/10.1002/0471250953.bi0301s42

Radivojac, P., Clark, W. T., Oron, T. R., Schnoes, A. M., Wittkop, T., Sokolov, A., Graim, K., Funk, C., Verspoor, K., Ben-Hur, A., Pandey, G., Yunes, J. M., Talwalkar, A. S., Repo, S., Souza, M. L., Piovesan, D., Casadio, R., Wang, Z., Cheng, J., … Friedberg, I. (2013). A large-scale evaluation of computational protein function prediction. Nature Methods, 10(3), 221–227. https://doi.org/10.1038/nmeth.2340

Rives, A., Meier, J., Sercu, T., Goyal, S., Lin, Z., Liu, J., Guo, D., Ott, M., Zitnick, C. L., Ma, J., & Fergus, R. (2021). Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proceedings of the National Academy of Sciences, 118(15), e2016239118. https://doi.org/10.1073/pnas.2016239118

Rives, A., Meier, J., Sercu, T., Goyal, S., Lin, Z., Liu, J., Guo, D., Ott, M., Zitnick, C. L., Ma, J., et al. (2021). Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proceedings of the National Academy of Sciences of the United States of America, 118(15), e2016239118. https://doi.org/10.1073/pnas.2016239118

Sillitoe, I., Dawson, N., Lewis, T. E., Das, S., Lees, J. G., Ashford, P., Tolulope, A., Scholes, H. M., Senatorov, I., Bujan, A., Ceballos Rodriguez-Conde, F., Dowling, B., Thornton, J. M., & Orengo, C. A. (2019). CATH: Expanding the horizons of structure-based functional annotations. Nucleic Acids Research, 47(D1), D280–D284. https://doi.org/10.1093/nar/gky1097

Smaili, F. Z., Gao, X., & Hoehndorf, R. (2019). OPA2Vec: Combining formal and informal content of biomedical ontologies for improved similarity-based predictions. Bioinformatics, 35(12), 2133–2140. https://doi.org/10.1093/bioinformatics/bty933

Thumuluri V, Almagro Armenteros JJ, Johansen AR, Nielsen H, Winther O. DeepLoc 2.0: multi-label subcellular localization prediction using protein language models. Nucleic Acids Res. 2022 Jul 5;50(W1):W228-W234. doi: 10.1093/nar/gkac278. PMID: 35489069; PMCID: PMC9252801.

Varadi, M., Anyango, S., Deshpande, M., Nair, S., Natassia, C., Yordanova, G., Yuan, D., Stroe, O., Wood, G., Laydon, A., Zidek, A., Green, T., Tunyasuvunakool, K., Petersen, S., Jumper, J., Clancy, E., Green, R., Vora, A., Luttrell, J., … Velankar, S. (2022). AlphaFold Protein Structure Database: Massively expanding the structural coverage of protein-sequence space. Nucleic Acids Research, 50(D1), D439–D444. https://doi.org/10.1093/nar/gkab1061

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (pp. 6000–6010).

You, R., Yao, S., Mamitsuka, H., & Zhu, S. (2021). DeepGraphGO: Graph neural network for large-scale, multispecies protein function prediction. Bioinformatics, 37(Supplement_1), i262–i271. https://doi.org/10.1093/bioinformatics/btab270

You, R., Yao, S., Xiong, T., Huang, X., Sun, F., & Mamitsuka, H. (2018). NetGO: Improving protein function prediction using large-scale protein–protein interaction data and deep learning. Bioinformatics, 34(18), 3119–3128. https://doi.org/10.1093/nar/gkz388

Zhou, N., Jiang, Y., Bergquist, T. R., Lee, A. J., Kacsoh, B. Z., Crocker, A. W., Lewis, K. A., Georghiou, G., Nguyen, H. N., Hamid, M. N., Davis, L., Dogan, T., Atalay, V., Rifaioglu, A. S., Dalkiran, A., Cetin-Atalay, R., Zhang, C., Hurto, R. L., Freddolino, P. L., … Radivojac, P. (2019). The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes. Genome Biology, 20, 244. https://doi.org/10.1186/s13059-019-1835-8

Article metrics

View details

Downloads

Citations

1138

Views

View Dimensions

View Plumx

View Altmetric

0
Save

0
Citation

1138
View

1
Share

Bioinfo Chem

Article Contents

Artificial Intelligence in Drug Discovery: Systematic Review and Meta-Analysis of Predictive Performance, Structural Modeling, and Translational Reliability

Abstract

1. Introduction

2.Materials and Methods

3. MLOps and Engineering Standards for Reproducibility

4. Results

5. Discussion

6. Limitations

7. Conclusion

References

Stay connected