1. Introduction
The modern pharmaceutical landscape, despite its remarkable scientific advances, continues to grapple with what is often described—perhaps somewhat uneasily—as a persistent “productivity gap.” Put simply, the increasing financial and temporal investments in drug development have not translated into a proportional rise in successful therapeutic approvals. Estimates frequently suggest that bringing a single drug to market requires more than a decade of research and an investment measured in billions of dollars, yet the probability of success remains strikingly low (Pushpakom et al., 2019). This imbalance, where effort expands but output stagnates, has prompted researchers and industry stakeholders alike to reconsider traditional paradigms of drug discovery.
Against this backdrop, drug repositioning—or repurposing—has gradually shifted from a peripheral strategy to something far more central. Rather than initiating discovery from scratch, repositioning seeks new therapeutic applications for compounds that are already approved or at least partially characterized. This approach, while conceptually simple, offers tangible advantages: reduced development timelines, lower financial risk, and a higher likelihood of clinical success due to pre-existing safety data (Ashburn & Thor, 2004; Pushpakom et al., 2019). Yet, despite these advantages, early repositioning efforts were rarely systematic. They were, more often than not, guided by clinical observation, serendipitous findings, or retrospective insights; sildenafil’s repositioning from angina to erectile dysfunction remains the canonical example (Ashburn & Thor, 2004). Such insights were valuable, certainly, but not easily reproducible or scalable.
Over time, however, the landscape began to shift. The rapid expansion of biomedical data—particularly with the advent of high-throughput technologies—has created opportunities that earlier generations of researchers could scarcely have anticipated. Massive repositories of genomic, transcriptomic, proteomic, and phenotypic data now exist, alongside clinical datasets derived from electronic health records and pharmacovigilance systems (Lamb et al., 2006). Yet, paradoxically, the very abundance of this data has introduced a new challenge: complexity. Traditional analytical approaches, often grounded in linear assumptions and limited datasets, struggle to extract meaningful patterns from such high-dimensional and heterogeneous information.
It is within this context that artificial intelligence (AI) has emerged—not as a singular solution, perhaps, but as a compelling set of tools capable of navigating this complexity. Machine learning (ML) and deep learning (DL), in particular, have demonstrated an ability to uncover relationships that might otherwise remain obscured. Unlike conventional statistical models, which typically require predefined hypotheses, AI systems can operate in a more exploratory manner, identifying latent associations across diverse datasets (LeCun et al., 2015; Chen et al., 2018). This shift—from hypothesis-driven to data-driven discovery—marks a subtle yet significant transformation in how drug repurposing is approached.
Early computational efforts in drug repositioning laid important groundwork. Methods integrating chemical, genomic, and pharmacological data showed that drug–target interactions could be predicted with reasonable accuracy using algorithmic approaches (Yamanishi et al., 2008; Yamanishi et al., 2010). Subsequent developments introduced supervised learning frameworks, such as bipartite local models, which refined these predictions by training a dedicated local classifier for each drug and each target (Bleakley & Yamanishi, 2009). These approaches, while promising, were still constrained by the limitations of feature engineering and the availability of curated datasets.
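To make the local-model idea concrete, the sketch below trains a per-drug classifier over targets using a precomputed target–target similarity kernel. All inputs are synthetic placeholders (a hypothetical similarity kernel and interaction matrix), not the datasets used in the original studies.

```python
# Minimal sketch of a bipartite local model (BLM) for drug-target
# interaction prediction, in the spirit of Bleakley & Yamanishi (2009).
# Real work derives the kernel from protein sequence similarity and the
# labels from curated interaction databases; everything here is synthetic.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_drugs, n_targets = 20, 30

# A positive semidefinite target-target similarity kernel (cosine-style).
F = rng.random((n_targets, 8))                   # latent target features
S = F @ F.T
S /= np.sqrt(np.outer(np.diag(S), np.diag(S)))   # normalize similarities

# Binary interactions: Y[i, j] = 1 if drug i is known to act on target j.
Y = (rng.random((n_drugs, n_targets)) < 0.2).astype(int)

def blm_score(drug, candidate):
    """Score one candidate target for one drug with a local SVM
    trained on the drug's interactions with all other targets."""
    train = [j for j in range(n_targets) if j != candidate]
    labels = Y[drug, train]
    if labels.min() == labels.max():             # need both classes to fit
        return float(labels.mean())
    clf = SVC(kernel="precomputed", probability=True, random_state=0)
    clf.fit(S[np.ix_(train, train)], labels)
    return float(clf.predict_proba(S[np.ix_([candidate], train)])[0, 1])

print(blm_score(drug=0, candidate=5))
```

The full method also trains the symmetric per-target model and combines the two scores; this sketch shows only the drug-side classifier.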
More recently, machine learning models such as Support Vector Machines and Random Forests have been applied to predict drug efficacy, side effects, and molecular interactions, often leveraging integrated datasets that combine chemical properties with biological activity profiles (Napolitano et al., 2013; Zhao & So, 2018). These models, although powerful, typically rely on structured inputs and may struggle with unstructured or highly complex data types.
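As a hedged illustration of that feature-integration setup, the following sketch concatenates synthetic chemical descriptors and biological activity profiles into one feature matrix and fits a Random Forest; the dimensions and labels are placeholders rather than any published dataset.

```python
# Feature integration for repurposing-style prediction: chemical and
# biological feature blocks are concatenated, and a Random Forest
# predicts a binary therapeutic label. All data below are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_drugs = 200
chem = rng.random((n_drugs, 50))        # e.g., molecular fingerprints
bio = rng.random((n_drugs, 100))        # e.g., expression signatures
X = np.hstack([chem, bio])              # integrated feature matrix
y = rng.integers(0, 2, size=n_drugs)    # e.g., active vs. inactive

model = RandomForestClassifier(
    n_estimators=300, class_weight="balanced", random_state=0
)
print(cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean())
```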
Deep learning, by contrast, has introduced a different paradigm. By employing multilayered neural networks, DL models can automatically learn hierarchical representations from raw data, whether those data are molecular structures, gene expression profiles, or clinical records (Aliper et al., 2016; LeCun et al., 2015). This capability is particularly valuable in drug repurposing, where relationships between drugs and diseases are rarely linear or straightforward. Indeed, deep neural networks have shown promise in capturing intricate, nonlinear dependencies that are essential for understanding drug–target interactions in multifactorial diseases such as cancer (Chen et al., 2018).
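A minimal sketch of such a multilayered network appears below, loosely modeled on the setup of Aliper et al. (2016), which classified drugs into therapeutic categories from transcriptional response profiles; the layer sizes, class count, and data here are illustrative assumptions, not the published architecture.

```python
# A small multilayer perceptron mapping an expression profile to
# therapeutic-category logits. Dimensions and data are placeholders.
import torch
import torch.nn as nn

n_genes, n_classes = 978, 12            # e.g., landmark genes -> categories

model = nn.Sequential(
    nn.Linear(n_genes, 256), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(256, 64), nn.ReLU(),
    nn.Linear(64, n_classes),           # logits over therapeutic categories
)

x = torch.randn(32, n_genes)            # synthetic batch of expression profiles
targets = torch.randint(0, n_classes, (32,))
loss = nn.CrossEntropyLoss()(model(x), targets)
loss.backward()                         # one illustrative backward pass
print(float(loss))
```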
In parallel, network-based approaches have gained traction as a means of conceptualizing biological systems not as isolated components, but as interconnected networks. These methods often rely on the assumption that diseases can be represented as modules within molecular interaction networks, and that drugs targeting proteins proximal to these modules may exert therapeutic effects (Cheng et al., 2012). Techniques such as network propagation and clustering enable researchers to identify candidate drugs by examining their position within these networks, effectively shifting the focus from individual targets to system-level interactions. While this perspective is not entirely new, its integration with AI methodologies has significantly enhanced its predictive capacity.
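The sketch below illustrates one such technique, network propagation via random walk with restart, on a toy graph: a score vector seeded at a hypothetical disease module diffuses over the network, and highly ranked nodes stand in for candidate drug targets. The graph and module are placeholders for a real protein–protein interaction network.

```python
# Random walk with restart over a stand-in interaction network:
# scores diffuse from a seeded disease module to proximal proteins.
import networkx as nx
import numpy as np

G = nx.karate_club_graph()              # placeholder for a PPI network
disease_module = {0, 1, 2}              # hypothetical disease-associated nodes

A = nx.to_numpy_array(G)
W = A / A.sum(axis=0, keepdims=True)    # column-stochastic transition matrix
p0 = np.array([1.0 if n in disease_module else 0.0 for n in G.nodes()])
p0 /= p0.sum()                          # uniform restart over the module

alpha, p = 0.3, p0.copy()               # alpha = restart probability
for _ in range(100):                    # iterate toward the stationary vector
    p = (1 - alpha) * W @ p + alpha * p0

ranked = sorted(zip(G.nodes(), p), key=lambda t: -t[1])
print(ranked[:5])                       # top-scoring candidate targets
```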
Despite these advances, it would be overly optimistic to suggest that AI-driven drug repurposing has fully realized its potential. Several challenges remain, and they are not trivial. One of the most persistent issues concerns data quality. Biomedical datasets, although abundant, are often fragmented, inconsistently annotated, and subject to varying degrees of noise (Li et al., 2016). This heterogeneity can compromise model performance, particularly when integrating data from multiple sources. Moreover, imbalanced datasets—where certain diseases or drug classes are overrepresented—can introduce bias, leading to skewed predictions and potentially overlooking novel therapeutic opportunities.
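The bias that imbalance introduces is easy to demonstrate. In the toy example below, a classifier that simply predicts the majority class looks deceptively accurate on a 95/5 split, which is exactly the failure mode that can hide novel therapeutic opportunities.

```python
# Accuracy can mask a model that never predicts the rare class;
# balanced metrics expose the failure. Data are synthetic.
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score

rng = np.random.default_rng(2)
y = (rng.random(1000) < 0.05).astype(int)   # 5% positives, e.g., a rare indication
X = rng.random((1000, 10))

clf = DummyClassifier(strategy="most_frequent").fit(X, y)
pred = clf.predict(X)
print(accuracy_score(y, pred))              # ~0.95, misleadingly high
print(balanced_accuracy_score(y, pred))     # 0.5, reveals the failure
```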
Another, perhaps more conceptual, challenge lies in the interpretability of AI models. Many deep learning architectures function as “black boxes,” producing predictions without offering clear explanations for how those predictions were derived (Schneider, 2018). While high predictive accuracy is certainly valuable, the lack of transparency can hinder clinical adoption. Clinicians and regulatory bodies, understandably, require not only evidence of efficacy but also a mechanistic understanding of how a drug exerts its effects.
Validation, too, remains a critical bottleneck. Although numerous computational studies report promising results, relatively few predictions successfully transition to experimental or clinical validation. This gap—between in silico prediction and real-world application—reflects both methodological limitations and practical constraints (Brown & Patel, 2018). For instance, factors such as drug dosage, bioavailability, and tissue-specific effects are often not fully captured in computational models, yet they play a decisive role in clinical outcomes.
Taken together, these considerations suggest that while AI has undoubtedly transformed the landscape of drug repurposing, its integration into clinical practice remains an ongoing process—one that requires not only technical refinement but also interdisciplinary collaboration. The field stands, in a sense, at an inflection point: rich with potential, yet still navigating the complexities of translation from algorithmic insight to therapeutic reality.