Bioinfo Chem

System biology and Infochemistry | Online ISSN 3071-4826
REVIEWS (Open Access)

Artificial Intelligence in Drug Repurposing: Machine Learning, Deep Learning, and Network-Based Drug Discovery

Shunqi Liu 1*


Bioinfo Chem 5 (1) 1-8 https://doi.org/10.25163/bioinformatics.5110723

Submitted: 01 January 2023 | Revised: 17 February 2023 | Published: 27 February 2023


Abstract

AI-driven drug repurposing has emerged as a powerful strategy in drug discovery, addressing the high cost, long timelines, and low success rates of conventional drug development. By leveraging artificial intelligence, machine learning, and deep learning, drug repurposing is evolving from serendipitous discovery toward systematic, data-driven approaches. This review synthesizes recent advances in AI-driven drug repurposing, highlighting the integration of heterogeneous biomedical data, including genomic, transcriptomic, clinical, and pharmacological datasets. Machine learning and deep learning models demonstrate strong potential to identify novel drug–disease relationships, particularly by capturing complex, nonlinear biological interactions. Network pharmacology approaches further enhance this capability by modeling drug action within interconnected biological systems rather than isolated targets. Despite these advances, key challenges remain. Data heterogeneity, limited model interpretability, and insufficient experimental validation continue to constrain clinical translation. These limitations highlight the need for robust validation frameworks and improved data integration strategies. Overall, AI-driven drug repurposing represents a shift toward predictive and integrative drug discovery. Continued progress will depend on combining advanced computational methods with interdisciplinary validation to translate computational insights into clinically actionable therapies.

Keywords: AI-driven drug repurposing; machine learning; deep learning; network pharmacology; biomedical data integration

1. Introduction

The modern pharmaceutical landscape, despite its remarkable scientific advances, continues to grapple with what is often described—perhaps somewhat uneasily—as a persistent “productivity gap.” Put simply, the increasing financial and temporal investments in drug development have not translated into a proportional rise in successful therapeutic approvals. Estimates frequently suggest that bringing a single drug to market may require more than a decade of research and investment running into the billions of dollars, yet the probability of success remains strikingly low (Pushpakom et al., 2019). This imbalance, where effort expands but output stagnates, has prompted researchers and industry stakeholders alike to reconsider traditional paradigms of drug discovery.

Against this backdrop, drug repositioning—or repurposing—has gradually shifted from a peripheral strategy to something far more central. Rather than initiating discovery from scratch, repositioning seeks new therapeutic applications for compounds that are already approved or at least partially characterized. This approach, while conceptually simple, offers tangible advantages: reduced development timelines, lower financial risk, and a higher likelihood of clinical success due to pre-existing safety data (Ashburn & Thor, 2004; Pushpakom et al., 2019). Yet, despite these advantages, early repositioning efforts were rarely systematic. They were, more often than not, guided by clinical observation, serendipitous findings, or retrospective insights—valuable, certainly, but not easily reproducible or scalable.

Over time, however, the landscape began to shift. The rapid expansion of biomedical data—particularly with the advent of high-throughput technologies—has created opportunities that earlier generations of researchers could scarcely have anticipated. Massive repositories of genomic, transcriptomic, proteomic, and phenotypic data now exist, alongside clinical datasets derived from electronic health records and pharmacovigilance systems (Lamb et al., 2006). Yet, paradoxically, the very abundance of this data has introduced a new challenge: complexity. Traditional analytical approaches, often grounded in linear assumptions and limited datasets, struggle to extract meaningful patterns from such high-dimensional and heterogeneous information.

It is within this context that artificial intelligence (AI) has emerged—not as a singular solution, perhaps, but as a compelling set of tools capable of navigating this complexity. Machine learning (ML) and deep learning (DL), in particular, have demonstrated an ability to uncover relationships that might otherwise remain obscured. Unlike conventional statistical models, which typically require predefined hypotheses, AI systems can operate in a more exploratory manner, identifying latent associations across diverse datasets (LeCun et al., 2015; Chen et al., 2018). This shift—from hypothesis-driven to data-driven discovery—marks a subtle yet significant transformation in how drug repurposing is approached.

Early computational efforts in drug repositioning laid important groundwork. Methods integrating chemical, genomic, and pharmacological data began to show that drug–target interactions could be predicted with reasonable accuracy using algorithmic approaches (Yamanishi et al., 2008; Yamanishi et al., 2010). Subsequent developments introduced supervised learning frameworks, such as bipartite local models, which further refined the prediction of drug–target interactions (Bleakley & Yamanishi, 2009). These approaches, while promising, were still constrained by the limitations of feature engineering and the availability of curated datasets.

More recently, machine learning models such as Support Vector Machines and Random Forests have been applied to predict drug efficacy, side effects, and molecular interactions, often leveraging integrated datasets that combine chemical properties with biological activity profiles (Napolitano et al., 2013; Kai & Hon-Cheong, 2018). These models, although powerful, typically rely on structured inputs and may struggle with unstructured or highly complex data types.

Deep learning, by contrast, has introduced a different paradigm. By employing multilayered neural networks, DL models can automatically learn hierarchical representations from raw data, whether those data are molecular structures, gene expression profiles, or clinical records (Aliper et al., 2016; LeCun et al., 2015). This capability is particularly valuable in drug repurposing, where relationships between drugs and diseases are rarely linear or straightforward. Indeed, deep neural networks have shown promise in capturing intricate, nonlinear dependencies that are essential for understanding drug–target interactions in multifactorial diseases such as cancer (Chen et al., 2018).

In parallel, network-based approaches have gained traction as a means of conceptualizing biological systems not as isolated components, but as interconnected networks. These methods often rely on the assumption that diseases can be represented as modules within molecular interaction networks, and that drugs targeting proteins proximal to these modules may exert therapeutic effects (Cheng et al., 2012). Techniques such as network propagation and clustering enable researchers to identify candidate drugs by examining their position within these networks, effectively shifting the focus from individual targets to system-level interactions. While this perspective is not entirely new, its integration with AI methodologies has significantly enhanced its predictive capacity.

Despite these advances, it would be overly optimistic to suggest that AI-driven drug repurposing has fully realized its potential. Several challenges remain, and they are not trivial. One of the most persistent issues concerns data quality. Biomedical datasets, although abundant, are often fragmented, inconsistently annotated, and subject to varying degrees of noise (Li et al., 2016). This heterogeneity can compromise model performance, particularly when integrating data from multiple sources. Moreover, imbalanced datasets—where certain diseases or drug classes are overrepresented—can introduce bias, leading to skewed predictions and potentially overlooking novel therapeutic opportunities.

Another, perhaps more conceptual, challenge lies in the interpretability of AI models. Many deep learning architectures function as “black boxes,” producing predictions without offering clear explanations for how those predictions were derived (Schneider, 2018). While high predictive accuracy is certainly valuable, the lack of transparency can hinder clinical adoption. Clinicians and regulatory bodies, understandably, require not only evidence of efficacy but also a mechanistic understanding of how a drug exerts its effects.

Validation, too, remains a critical bottleneck. Although numerous computational studies report promising results, relatively few predictions successfully transition to experimental or clinical validation. This gap—between in silico prediction and real-world application—reflects both methodological limitations and practical constraints (Brown & Patel, 2018). For instance, factors such as drug dosage, bioavailability, and tissue-specific effects are often not fully captured in computational models, yet they play a decisive role in clinical outcomes.

Taken together, these considerations suggest that while AI has undoubtedly transformed the landscape of drug repurposing, its integration into clinical practice remains an ongoing process—one that requires not only technical refinement but also interdisciplinary collaboration. The field stands, in a sense, at an inflection point: rich with potential, yet still navigating the complexities of translation from algorithmic insight to therapeutic reality.

2. Methodology

2.1 Conceptual Framing and Review Scope

This narrative review was designed to critically examine the evolving landscape of AI-driven drug repurposing, with a particular emphasis on methodological convergence across computational paradigms. Rather than adopting a strictly systematic protocol, the approach intentionally allowed for interpretive flexibility—arguably necessary when engaging with a field that is itself rapidly shifting. The review focused on identifying conceptual patterns, methodological innovations, and recurring limitations across studies published primarily within the last decade, while also incorporating foundational works that established early computational frameworks.

2.2 Literature Identification and Selection Strategy

Relevant literature was identified through targeted searches across major scientific databases, including PubMed, Scopus, and Web of Science, complemented by domain-specific repositories such as DrugBank, ChEMBL, and PubChem. Search queries combined keywords such as “drug repurposing,” “machine learning,” “deep learning,” “network pharmacology,” and “AI in drug discovery.” Additional sources were retrieved through backward and forward citation tracking to ensure inclusion of influential studies (e.g., early drug–target interaction models and deep learning applications).

Inclusion criteria prioritized studies that (i) applied computational or AI-based methods to drug repurposing, (ii) demonstrated methodological innovation or integration of multi-modal datasets, and (iii) provided measurable performance outputs such as predictive accuracy or AUC. Exclusion criteria included purely experimental studies lacking computational components and reports with insufficient methodological transparency.

2.3 Data Sources and Thematic Categorization

The selected literature was organized into four primary methodological domains:

  • Traditional machine learning and matrix-based approaches
  • Deep learning architectures
  • Network-based and systems biology frameworks
  • Natural language processing and text-mining strategies

This categorization was not imposed arbitrarily but emerged gradually through iterative reading, as recurring analytical patterns became more apparent. Supporting data sources included large-scale biomedical repositories such as DrugBank, KEGG, UniProt, and the Connectivity Map, which collectively underpin most AI-driven repurposing workflows.

2.4 Analytical Synthesis Approach

Rather than quantitatively aggregating results, the review employed a qualitative synthesis framework. Studies were compared based on methodological design, data integration strategies, predictive performance, and translational relevance. Particular attention was given to how different computational approaches address biological complexity—whether through structured prediction, nonlinear modeling, or network-level inference.

Tables were constructed to summarize key findings across methodologies, highlighting algorithmic strengths, limitations, and application domains. These tabular syntheses were used not merely as descriptive tools, but as analytical anchors to identify broader trends across the field.

2.5 Critical Evaluation and Validation Considerations

A central component of the methodology involved assessing not only predictive performance but also interpretability and translational feasibility. Studies were evaluated for their ability to bridge the gap between computational prediction and experimental validation—a recurring limitation across the literature.

In doing so, the review aimed to move beyond surface-level comparisons and toward a more integrative understanding of how AI methodologies collectively contribute to drug repurposing. This interpretive lens, while inherently subjective, provides a more holistic perspective on a field that resists strict methodological boundaries.

3. The Digital Renaissance of Pharmacology: Reimagining Drug Repurposing Through Artificial Intelligence

The contemporary pharmaceutical ecosystem, despite its extraordinary scientific depth, still seems to move—at times—more slowly than one might expect. There is, almost paradoxically, an abundance of biological knowledge, yet a persistent scarcity of efficiently translated therapies. The process of de novo drug discovery remains long, uncertain, and financially burdensome, often extending over a decade with a high likelihood of failure before reaching clinical approval (Ashburn & Thor, 2004). This tension between knowledge generation and therapeutic realization has, gradually but unmistakably, pushed the scientific community toward alternative strategies that are not necessarily new, but are now being reconsidered with renewed urgency.

Drug repurposing, in this context, emerges less as a novel invention and more as a rediscovery—albeit one reshaped by modern computational capabilities. At its core, repurposing leverages existing drugs, compounds whose safety profiles and pharmacokinetics are already partially understood, to explore new therapeutic indications. Historically, such discoveries were often incidental, arising from unexpected clinical observations. Yet, as biomedical data has grown both in volume and complexity, this once opportunistic process has begun to evolve into something more systematic, more deliberate, and perhaps more predictive. The transition is subtle but important: from chance to computation, from observation to inference.

The emergence of large-scale datasets—particularly those derived from genomics, transcriptomics, and proteomics—has played a central role in this transformation. Initiatives such as the Connectivity Map have demonstrated that gene-expression signatures can serve as bridges linking drugs, diseases, and molecular pathways (Lamb et al., 2006). However, the challenge is not merely the availability of data, but the ability to interpret it. It is here that artificial intelligence (AI), encompassing both machine learning and deep learning paradigms, begins to exert its influence, offering tools capable of navigating biological complexity in ways that traditional approaches cannot.

3.1 The Machine Learning Vanguard: Structured Prediction in High-Dimensional Spaces

Machine learning (ML), perhaps the most established branch of AI in biomedical research, has provided a foundational framework for drug repurposing. Unlike classical statistical models—which often assume linearity and struggle with high-dimensional datasets—ML algorithms are designed to detect patterns within complex, heterogeneous data environments. Methods such as Support Vector Machines, Random Forests, and kernel-based approaches have been widely employed to classify and predict drug–disease and drug–target relationships (Menden et al., 2013).
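To make this framing concrete, the sketch below (a hypothetical illustration with entirely synthetic features and labels, not code from any cited study) casts drug–disease association prediction as binary classification with a Random Forest, one of the algorithm families named above:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic stand-ins: each row represents a drug-disease pair, with
# columns imagined as chemical descriptors plus a disease profile.
n_pairs, n_features = 500, 40
X = rng.normal(size=(n_pairs, n_features))
# Plant a linear signal so the toy task is learnable.
w = rng.normal(size=n_features)
y = (X @ w + rng.normal(scale=0.5, size=n_pairs) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"held-out AUC: {auc:.2f}")  # well above 0.5 on this planted signal
```

In practice, the features would be curated chemical and biological profiles, and performance would be estimated by cross-validation rather than a single split.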

One of the early conceptual advances in this space involved framing drug–target interaction (DTI) prediction as a supervised learning problem. Bleakley and Yamanishi (2009), for instance, introduced bipartite local models that learn interaction patterns by considering drugs and targets as interconnected entities within a network. This approach, while seemingly straightforward, allowed for the integration of multiple data types, thereby improving predictive performance. Similarly, kernel-based learning techniques have enabled the fusion of chemical similarity, genomic data, and pharmacological profiles into unified predictive frameworks (Nascimento et al., 2016).

In parallel, other studies have approached DTI prediction through linkage-based modeling. By treating drug–target interactions as edges within a network, these models aim to infer missing links based on observed patterns. Keiser et al. (2009), for example, demonstrated that chemical similarity alone could reveal unexpected drug–target associations, suggesting that even relatively simple descriptors can yield meaningful insights when analyzed appropriately.
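The similarity principle behind such work can be illustrated with a Tanimoto coefficient over binary fingerprints. The bit indices below are invented for illustration, and Keiser et al. in fact compared entire ligand sets rather than single molecule pairs:

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0

# Hypothetical fingerprints; these bit indices carry no chemical meaning.
drug_a = {1, 4, 7, 9, 12}
drug_b = {1, 4, 7, 15}
print(tanimoto(drug_a, drug_b))  # 3 shared bits / 6 total bits = 0.5
```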

Yet, despite these advances, traditional ML methods are not without limitations. They often rely heavily on feature engineering—the manual selection and transformation of input variables—which can introduce bias or overlook subtle patterns embedded within raw data. Moreover, as datasets grow increasingly complex, the scalability of these models becomes a concern. It is perhaps for these reasons that the field has gradually shifted toward more flexible and adaptive learning paradigms.

3.2 Deep Learning: Capturing the Nonlinear Fabric of Biological Systems

If machine learning laid the groundwork, deep learning (DL) has, in many ways, expanded the horizon. Characterized by multilayered neural architectures, DL models are capable of learning hierarchical representations directly from raw data, reducing the need for manual feature extraction. This capability is particularly relevant in drug repurposing, where biological relationships are rarely linear and often involve intricate, multi-level interactions.

Aliper et al. (2016) provided an early demonstration of the power of deep learning in this domain. By training deep neural networks on transcriptomic datasets derived from the Connectivity Map, they showed that these models could predict pharmacological properties and therapeutic classes with notable accuracy. What distinguishes such models is not merely their predictive performance, but their ability to capture latent relationships—connections that are not immediately visible through conventional analysis.

Further developments have introduced architectures such as autoencoders, which compress high-dimensional data into lower-dimensional representations while preserving essential features. These models enable the integration of diverse datasets—ranging from drug side-effect profiles to gene expression signatures—into cohesive analytical frameworks. Similarly, ensemble classifiers, such as iPPI-Esml, have been employed to predict protein–protein interactions by incorporating physicochemical properties and advanced signal processing techniques (Jia et al., 2015).

Despite their promise, deep learning models introduce their own set of challenges. Chief among these is interpretability. While these models can achieve high levels of accuracy, their decision-making processes are often opaque, making it difficult to extract mechanistic insights. This “black box” nature raises concerns, particularly in clinical contexts where understanding the rationale behind a prediction is as important as the prediction itself.

3.3 Network Biology and Systems-Level Reasoning: The Logic of Proximity

A growing body of research suggests that understanding drug action requires moving beyond isolated targets toward a systems-level perspective. Biological processes, after all, are governed by networks—complex webs of interactions among genes, proteins, and metabolites. Network-based approaches in drug repurposing attempt to harness this complexity by modeling biological systems as interconnected graphs.

One of the central concepts in this area is the notion of network proximity. The idea, in its simplest form, is that drugs whose targets are located near disease-associated proteins within a network are more likely to exert therapeutic effects. Cheng et al. (2012) operationalized this concept through network-based inference methods, demonstrating that the topology of drug–target networks can be used to predict novel interactions.

Other approaches, such as the PREDICT framework developed by Gottlieb et al. (2011), integrate multiple layers of information—including drug similarity and disease phenotype data—to infer new drug indications. Similarly, Sirota et al. (2011) utilized gene expression compendia to identify drugs capable of reversing disease-associated transcriptional signatures, thereby providing a functional basis for repurposing hypotheses.

What makes network-based approaches particularly compelling is their ability to capture indirect relationships. A drug may not target a disease protein directly, but it may influence a pathway that ultimately modulates disease progression. This “guilt-by-association” principle, while conceptually intuitive, has proven to be a powerful tool when combined with computational methods.

3.4 Mining the Unstructured Frontier: Natural Language Processing and Knowledge Extraction

While structured datasets such as omics profiles are invaluable, a significant portion of biomedical knowledge remains embedded within unstructured text—scientific articles, clinical reports, and trial records. Extracting meaningful information from these sources presents a different kind of challenge, one that has been addressed through Natural Language Processing (NLP) and text mining.

Li and Lu (2012) demonstrated that pharmacogenomic relationships could be systematically extracted from clinical trial data, enabling the identification of gene–drug interactions that might otherwise remain hidden. Similarly, Leaman et al. (2013) developed DNorm, a system for disease name normalization, which facilitates the aggregation and comparison of information across diverse textual sources.

These methods, while perhaps less visually striking than deep learning models, play a crucial role in enriching the repurposing pipeline. By integrating textual knowledge with structured data, researchers can construct more comprehensive models that reflect the full spectrum of available evidence.

3.5 Challenges, Limitations, and the Persistent Translational Gap

Despite the considerable progress achieved, AI-driven drug repurposing remains, in many respects, an evolving field. One of the most persistent challenges lies in data quality. Biomedical datasets are often fragmented across institutions, inconsistently annotated, and subject to varying levels of noise. This heterogeneity can compromise model performance and limit the generalizability of findings (Brown & Patel, 2017).

Another issue concerns data imbalance. Certain diseases—particularly those that are well-studied—are overrepresented in available datasets, while rare conditions remain underexplored. This imbalance can lead to biased models that favor known associations, potentially overlooking novel therapeutic opportunities.

Moreover, the gap between computational prediction and clinical application remains significant. While models may identify promising candidates, translating these predictions into effective treatments requires rigorous experimental validation. Factors such as drug dosage, tissue-specific effects, and long-term safety are difficult to capture fully within computational frameworks.

3.6 Toward a Convergent Future: Integrating Intelligence and Validation

If there is a unifying theme emerging from this digital renaissance, it is perhaps the recognition that no single methodology is sufficient on its own. Machine learning, deep learning, network biology, and NLP each offer unique strengths, but their true potential lies in integration. By combining these approaches, researchers can develop more robust and holistic models of drug action.

At the same time, the importance of validation cannot be overstated. Computational predictions must be complemented by experimental and clinical studies to ensure their reliability and relevance. As the field continues to evolve, the challenge will not only be to generate predictions, but to translate them into tangible therapeutic outcomes.

4. From Data to Decision: Interpreting the Performance of AI-Driven Drug Repurposing

The results synthesized from the reviewed literature and associated tables suggest—perhaps more clearly than expected—that pharmacology is undergoing not just incremental progress, but something closer to a structural transformation. What once depended heavily on serendipity, intuition, and isolated clinical observations is now increasingly shaped by systematic, algorithm-driven reasoning. This shift is not merely technological; it reflects a deeper reorientation in how therapeutic discovery is conceptualized. By the late 2010s, artificial intelligence (AI) and machine learning (ML) had begun to meaningfully address the long-standing productivity gap in drug development by capitalizing on the known safety profiles of existing compounds (Ashburn & Thor, 2004). Yet, while the promise is evident, the underlying mechanisms—and their limitations—deserve careful interpretation.

4.1 The Data Foundation: Integration as Both Strength and Constraint

If there is one recurring theme across the results, it is that the performance of any computational model is inseparable from the data on which it is trained. High-quality repositories such as DrugBank and ChEMBL have served as essential backbones for predictive modeling, offering structured information on drug properties, targets, and biochemical interactions (Wishart et al., 2018; Gaulton et al., 2017). These databases, while often treated as static resources, function more accurately as evolving ecosystems—continuously refined, yet still imperfect.

The incorporation of transcriptomic resources, particularly the Connectivity Map and its expanded LINCS platform, introduced a different dimension to repurposing strategies (Lamb et al., 2006; Subramanian et al., 2017). Rather than focusing solely on chemical similarity, these datasets enabled researchers to examine gene-expression signatures as a shared language linking drugs and diseases. This shift allowed models to move beyond surface-level associations toward more mechanistic interpretations.

However, the results also suggest a subtle tension. While integrating heterogeneous datasets—chemical, genomic, phenotypic—generally improves predictive performance, it also introduces complexity. Data heterogeneity, inconsistencies across experimental conditions, and varying annotation standards can, at times, obscure rather than clarify relationships. In this sense, data integration is both the strength and the constraint of AI-driven repurposing.

4.2 Revisiting Traditional Machine Learning: Reliability Within Boundaries

Before the rise of deep architectures, traditional machine learning methods provided the first compelling evidence that drug–target interactions could be predicted with meaningful accuracy. Models such as Support Vector Machines and Random Forests, as summarized in the results tables, demonstrated strong classification capabilities, particularly when applied to well-characterized targets (Bleakley & Yamanishi, 2009).

The bipartite local model framework introduced by Bleakley and Yamanishi (2009) remains a notable milestone, achieving high predictive performance by learning interaction patterns within drug–target networks. Similarly, the PREDICT model extended this idea by integrating multiple similarity measures, including phenotypic and side-effect data, to infer novel drug indications (Gottlieb et al., 2011). These approaches illustrate an important principle: even relatively simple models can yield valuable insights when supported by well-curated data.

Matrix factorization techniques further expanded the methodological toolkit by addressing the so-called “cold-start” problem. By treating drug–disease associations as incomplete matrices, methods such as low-rank approximation were able to infer missing links without relying on explicitly defined negative samples (Luo et al., 2018). This flexibility is particularly relevant in biomedical research, where true negative interactions are often unknown or uncertain.

Yet, these methods are not without limitations. Their reliance on predefined features and relatively shallow architectures restricts their ability to capture the full complexity of biological systems. They perform well within structured boundaries—but those boundaries, as the results suggest, are increasingly being challenged.

4.3 Deep Learning and the Emergence of Nonlinear Insight

The transition from traditional ML to deep learning (DL) represents, in many ways, a shift from structured prediction to exploratory inference. Deep Neural Networks (DNNs), with their layered architectures, are capable of capturing nonlinear relationships that are difficult—if not impossible—to model using conventional techniques.

The findings associated with Aliper et al. (2016) highlight this shift clearly. By leveraging transcriptomic data, DNNs were able to outperform traditional models in predicting therapeutic classes, suggesting that deeper architectures can uncover latent biological relationships that simpler models overlook. Similarly, convolutional approaches applied to molecular and protein structures have improved predictions of binding affinity and functional interactions.

What is particularly striking, however, is not just the increase in accuracy, but the change in perspective. Deep learning does not simply refine existing predictions—it redefines the space in which those predictions are made. Biological systems, with their feedback loops and multi-layered interactions, are inherently nonlinear. DL models, perhaps for the first time, offer a computational framework that aligns with this complexity.

At the same time, the results hint at a certain unease. As models become more powerful, they also become less interpretable. The ability to predict does not necessarily equate to the ability to explain—a distinction that becomes critical in clinical contexts.

4.4 Network Biology: Reconstructing Pharmacology as an Interactome

Another important dimension emerging from the results is the growing role of network-based approaches. Rather than viewing drug action as a simple one-to-one interaction between a compound and a target, these methods conceptualize pharmacology as a system of interconnected relationships.

Network-based inference models, such as those proposed by Cheng et al. (2012), rely on the principle of proximity—suggesting that drugs targeting proteins near disease-associated modules are more likely to be therapeutically effective. This “guilt-by-association” logic, while conceptually intuitive, gains significant power when applied at scale.

The integration of network biology with computational methods allows for the identification of indirect effects—pathways through which a drug may influence a disease without directly targeting its primary protein. This perspective is further supported by earlier work on drug–target networks, which demonstrated that integrating chemical and genomic data can reveal previously unrecognized interactions (Yamanishi et al., 2008).

In many ways, this represents a shift from reductionism to systems thinking. Drugs are no longer viewed as isolated agents, but as participants in a broader biological network.

4.5 Extracting Knowledge from Text: The Role of NLP in Repurposing

While structured datasets provide a foundation for modeling, a significant portion of biomedical knowledge remains embedded in unstructured text. The application of Natural Language Processing (NLP) has therefore become an important complement to data-driven approaches.

Tools such as DNorm have enabled the normalization of disease terminology, allowing researchers to aggregate and compare information across diverse sources (Leaman et al., 2013). Similarly, methods for extracting pharmacogenomic relationships from clinical trial data have demonstrated that valuable insights can be derived directly from textual records (Li & Lu, 2012).
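The essence of normalization—collapsing variant disease mentions onto one canonical identifier—can be illustrated with a simple dictionary lookup. DNorm itself (Leaman et al., 2013) learns a pairwise ranking model over candidate concepts; the synonym table and identifiers below are invented stand-ins for that machinery.

```python
# Minimal illustration of disease-name normalization: map textual
# mentions to canonical IDs via a synonym dictionary. The identifiers
# and synonym entries here are hypothetical examples.
import re

SYNONYMS = {
    "nsclc": "D002289",
    "non-small cell lung cancer": "D002289",
    "type 2 diabetes": "D003924",
    "t2dm": "D003924",
}

def normalize(mention):
    """Lowercase, collapse whitespace, and look up a canonical ID."""
    key = re.sub(r"\s+", " ", mention.lower().strip())
    return SYNONYMS.get(key)

ids = {normalize(m) for m in
       ["NSCLC", "Non-small cell lung cancer", "  type 2  diabetes "]}
# Three surface forms resolve to two canonical disease identifiers.
```

Once mentions from papers, trial records, and databases resolve to shared identifiers, they can be aggregated and cross-referenced—precisely the step that lets text-derived knowledge feed structured repurposing models.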

These approaches, though sometimes less emphasized, play a critical role in ensuring that computational models are informed by the full breadth of scientific knowledge. They bridge the gap between numerical data and human understanding—between structured datasets and the narrative complexity of biomedical research.

4.6 The Persistent Gap: From Computational Prediction to Clinical Reality

Despite the considerable advances highlighted in the results, the transition from computational prediction to clinical application remains uneven. The so-called translational gap persists, shaped by both technical and practical challenges.

Data quality remains a central concern. Even the most sophisticated models cannot fully compensate for incomplete, noisy, or biased datasets. The reliance on well-studied diseases further amplifies this issue, potentially limiting the discovery of novel therapeutic opportunities.

Equally important is the issue of interpretability. As noted in broader discussions of AI in drug discovery, the “black box” nature of many models poses a barrier to clinical adoption (Schneider, 2018). Clinicians require not only predictions, but explanations—mechanistic insights that justify therapeutic decisions.

Finally, real-world variables such as drug bioavailability, tissue specificity, and patient heterogeneity introduce complexities that are difficult to capture computationally. While AI can generate hypotheses at unprecedented speed, validation remains a fundamentally human—and experimental—process (Brown & Patel, 2018).

4.7 Toward a Balanced Future: Integrating Computation and Clinical Insight

Taken together, the results suggest that AI-driven drug repurposing is not a replacement for traditional approaches, but rather an extension of them. Machine learning provides structure, deep learning introduces flexibility, network biology offers context, and NLP contributes depth. Each methodology addresses a different aspect of the problem, and their integration appears to be the most promising path forward.

Yet, perhaps the most important insight is that progress in this field depends not only on algorithms, but on collaboration—between data scientists, clinicians, and experimental researchers. The future of drug repurposing lies not in automation alone, but in the careful alignment of computational insight with clinical reality.

5. Limitations

Despite its integrative scope, this review is not without limitations. As a narrative synthesis, it does not employ a formal systematic review protocol, which may introduce selection bias in the inclusion of studies. While efforts were made to incorporate diverse and representative literature, the rapidly evolving nature of AI-driven drug repurposing means that some recent developments may not be fully captured.

Additionally, the heterogeneity of methodologies—ranging from classical machine learning to deep neural networks—makes direct comparison challenging, particularly when performance metrics are reported inconsistently across studies. The reliance on published results also introduces the possibility of publication bias, where studies reporting positive outcomes are overrepresented.

Finally, the review emphasizes computational approaches, which, while informative, may underrepresent experimental and clinical validation aspects that are critical for real-world applicability.

6. Conclusion

AI-driven drug repurposing represents a meaningful shift in how therapeutic discovery is conceptualized—less dependent on chance, and increasingly guided by data-intensive reasoning. Machine learning, deep learning, and network-based approaches collectively expand the analytical capacity to identify novel drug–disease relationships. Yet, this progress is not without tension. Challenges related to data quality, interpretability, and translational validation persist, limiting immediate clinical impact. Ultimately, the future of drug repurposing will likely depend not on any single computational method, but on the integration of diverse approaches, coupled with rigorous experimental validation. The field, in many ways, remains unfinished—but undeniably transformative.

Author Contributions  

S.L. conceptualized the study, designed the review framework, conducted literature synthesis and analysis, interpreted the findings, and drafted, reviewed, and finalized the manuscript.

References


Aliper, A., Plis, S., Artemov, A., Ulloa, A., Mamoshina, P., & Zhavoronkov, A. (2016). Deep learning applications for predicting pharmacological properties of drugs and drug repurposing using transcriptomic data. Molecular Pharmaceutics, 13(7), 2524–2530. https://doi.org/10.1021/acs.molpharmaceut.6b00248

Ashburn, T. T., & Thor, K. B. (2004). Drug repositioning: Identifying and developing new uses for existing drugs. Nature Reviews Drug Discovery, 3(8), 673–683. https://doi.org/10.1038/nrd1468

Bateman, A., Martin, M. J., O'Donovan, C., et al. (2017). UniProt: The universal protein knowledgebase. Nucleic Acids Research, 45(D1), D158–D169. https://doi.org/10.1093/nar/gkw1099

Bleakley, K., & Yamanishi, Y. (2009). Supervised prediction of drug-target interactions using bipartite local models. Bioinformatics, 25(18), 2397–2403. https://doi.org/10.1093/bioinformatics/btp433

Brown, A. S., & Patel, C. J. (2017). A standard database for drug repositioning. Scientific Data, 4, 170029. https://doi.org/10.1038/sdata.2017.29

Brown, A. S., & Patel, C. J. (2018). A review of validation strategies for computational drug repositioning. Briefings in Bioinformatics, 19(1), 174–177. https://doi.org/10.1093/bib/bbw110

Chen, H., Engkvist, O., Wang, Y., Olivecrona, M., & Blaschke, T. (2018). The rise of deep learning in drug discovery. Drug Discovery Today, 23(6), 1241–1250. https://doi.org/10.1016/j.drudis.2018.01.039

Cheng, F., Liu, C., Jiang, J., Lu, W., Li, W., Liu, G., ... & Tang, Y. (2012). Prediction of drug-target interactions and drug repositioning via network-based inference. PLoS Computational Biology, 8(5), e1002503. https://doi.org/10.1371/journal.pcbi.1002503

Gaulton, A., Bellis, L. J., Bento, A. P., et al. (2017). The ChEMBL database in 2017. Nucleic Acids Research, 45(D1), D945–D954. https://doi.org/10.1093/nar/gkw1074

Gottlieb, A., Stein, G. Y., Ruppin, E., & Sharan, R. (2011). PREDICT: A method for inferring novel drug indications with application to personalized medicine. Molecular Systems Biology, 7, 496. https://doi.org/10.1038/msb.2011.26

Jia, J., Liu, Z., Xiao, X., Liu, B., & Chou, K. C. (2015). iPPI-Esml: An ensemble classifier for identifying the interactions of proteins by incorporating their physicochemical properties and wavelet transforms into PseAAC. Journal of Theoretical Biology, 377, 47–56. https://doi.org/10.1016/j.jtbi.2015.04.010

Zhao, K., & So, H.-C. (2018). Drug repositioning for schizophrenia and depression/anxiety disorders: A machine learning approach leveraging expression data. IEEE Journal of Biomedical and Health Informatics.

Kanehisa, M., Furumichi, M., Tanabe, M., et al. (2017). KEGG: New perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Research, 45(D1), D353–D361. https://doi.org/10.1093/nar/gkw1092

Keiser, M. J., Setola, V., Irwin, J. J., Laggner, C., Abbas, A. I., Hufeisen, S. J., ... & Roth, B. L. (2009). Predicting new molecular targets for known drugs. Nature, 462(7270), 175–181. https://doi.org/10.1038/nature08506

Kim, S., Thiessen, P. A., Bolton, E. E., et al. (2016). PubChem substance and compound databases. Nucleic Acids Research, 44(D1), D1202–D1213. https://doi.org/10.1093/nar/gkv951

Lamb, J., Crawford, E. D., Peck, D., Modell, J. W., Blat, I. C., Wrobel, M. J., ... & Golub, T. R. (2006). The Connectivity Map: Using gene-expression signatures to connect small molecules, genes, and disease. Science, 313(5795), 1929–1935. https://doi.org/10.1126/science.1132939

Leaman, R., Islamaj Dogan, R., & Lu, Z. (2013). DNorm: Disease name normalization with pairwise learning to rank. Bioinformatics, 29(22), 2909–2917. https://doi.org/10.1093/bioinformatics/btt473

LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444. https://doi.org/10.1038/nature14539

Li, J., & Lu, Z. (2012). Systematic identification of pharmacogenomics information from clinical trials. Journal of Biomedical Informatics, 45(5), 870–878. https://doi.org/10.1016/j.jbi.2012.02.008

Li, J., Zheng, S., Chen, B., et al. (2016). A survey of current trends in computational drug repositioning. Briefings in Bioinformatics, 17(1), 2–12. https://doi.org/10.1093/bib/bbv020

Luo, H., Li, M., Wang, S., et al. (2018). Computational drug repositioning using low-rank matrix approximation and randomized algorithms. Bioinformatics, 34(11), 1904–1912. https://doi.org/10.1093/bioinformatics/bty013

Menden, M. P., Iorio, F., Garnett, M., McDermott, U., Benes, C. H., Ballester, P. J., & Saez-Rodriguez, J. (2013). Machine learning prediction of cancer cell sensitivity to drugs based on genomic and chemical properties. PLoS ONE, 8(4), e61318. https://doi.org/10.1371/journal.pone.0061318

Napolitano, F., Zhao, Y., Moreira, V. M., et al. (2013). Drug repositioning: A machine-learning approach through data integration. Journal of Cheminformatics, 5(1), 30. https://doi.org/10.1186/1758-2946-5-30

Nascimento, A. C., Prudêncio, R. B., & Costa, I. G. (2016). A multiple kernel learning algorithm for drug-target interaction prediction. BMC Bioinformatics, 17, 46. https://doi.org/10.1186/s12859-016-0890-3

Pushpakom, S., Iorio, F., Eyers, P. A., et al. (2019). Drug repurposing: Progress, challenges and recommendations. Nature Reviews Drug Discovery, 18(1), 41–58. https://doi.org/10.1038/nrd.2018.168

Schneider, G. (2018). Automating drug discovery. Nature Reviews Drug Discovery, 17(2), 97–113. https://doi.org/10.1038/nrd.2017.232

Sirota, M., Dudley, J. T., Kim, J., Chiang, A. P., Morgan, A. A., Sweet-Cordero, A., ... & Butte, A. J. (2011). Discovery and preclinical validation of drug indications using compendia of public gene expression data. Science Translational Medicine, 3(96), 96ra77. https://doi.org/10.1126/scitranslmed.3001318

Subramanian, A., Narayan, R., Corsello, S. M., et al. (2017). A next generation connectivity map: L1000 platform and the first 1,000,000 profiles. Cell, 171(6), 1437–1452. https://doi.org/10.1016/j.cell.2017.10.049

Wishart, D. S., Feunang, Y. D., Guo, A. C., et al. (2018). DrugBank 5.0: A major update to the DrugBank database for 2018. Nucleic Acids Research, 46(D1), D1074–D1082. https://doi.org/10.1093/nar/gkx1037

Yamanishi, Y., Araki, M., Gutteridge, A., Honda, W., & Kanehisa, M. (2008). Prediction of drug-target interaction networks from the integration of chemical and genomic spaces. Bioinformatics, 24(13), i232–i240. https://doi.org/10.1093/bioinformatics/btn162

Yamanishi, Y., Kotera, M., Kanehisa, M., et al. (2010). Drug-target interaction prediction from chemical, genomic and pharmacological data in an integrated framework. Bioinformatics, 26(12), i246–i254. https://doi.org/10.1093/bioinformatics/btq176

