1. Introduction
The science of drug and chemical safety assessment—once grounded in a relatively stable set of experimental traditions—is undergoing a profound, if uneven, transformation. For decades, toxicological evaluation has relied predominantly on animal-based models, with endpoints that are, in many cases, overt and late-stage: mortality, organ pathology, or measurable biochemical disruption. These approaches have undeniably contributed to regulatory safety frameworks, yet their limitations have become increasingly difficult to overlook. High financial costs, extended study durations, and ethical concerns surrounding animal use have long been cited. More importantly, there is growing recognition that interspecies differences—in gene expression, metabolic capacity, and physiological regulation—complicate the extrapolation of animal findings to human outcomes, often contributing to unexpected failures in clinical development (Hartung, 2010; Leist et al., 2014).
This sense of inadequacy is not entirely new. As early as the late 20th century, discussions around improving risk assessment frameworks were already emerging (National Research Council [NRC], 1983). Yet it was the landmark report Toxicity Testing in the 21st Century: A Vision and a Strategy that crystallized a more decisive shift in thinking. The report proposed, boldly for its time, that toxicology should transition from an observational discipline—focused on identifying adverse outcomes after exposure—to a predictive science rooted in mechanistic understanding at the cellular and molecular levels (NRC, 2007). In retrospect, this proposal did not merely suggest methodological refinement; it called for a conceptual reorientation of the field.
The momentum generated by this vision has since been reinforced by regulatory and societal pressures. Legislative initiatives, particularly within the European Union, have accelerated the move away from animal testing, effectively compelling the scientific community to explore alternative methodologies. At the same time, advances in computational biology, high-throughput experimentation, and systems-level data integration have made such alternatives not only feasible but increasingly compelling. Still, the transition has been anything but linear. While enthusiasm for new approaches is evident, their validation, standardization, and regulatory acceptance remain ongoing challenges.
Central to this evolving landscape is the emergence of New Approach Methodologies (NAMs)—a collective term encompassing in vitro assays, in silico models, and high-content data streams derived from omics technologies. These approaches attempt to capture biological responses at a resolution that traditional methods often cannot achieve. For instance, high-throughput screening platforms, such as those developed under programs like ToxCast and Tox21, generate vast datasets describing chemical interactions across diverse biological targets (Kavlock et al., 2012). When combined with computational techniques, these datasets offer the potential to identify patterns and predictive signatures that would otherwise remain obscured.
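To illustrate the kind of analysis such datasets enable, the following Python sketch (using a randomly generated stand-in for a ToxCast/Tox21-style chemical-by-assay matrix, not the actual data schema) reduces high-dimensional bioactivity profiles and groups chemicals by shared activity signatures:

```python
# Illustrative sketch only: clustering a hypothetical chemical-by-assay
# bioactivity matrix to surface groups of chemicals with similar profiles.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical matrix: 200 chemicals x 50 assay endpoints (e.g., scaled potencies)
bioactivity = rng.random((200, 50))

# Reduce dimensionality before clustering to stabilize the distance structure
scores = PCA(n_components=10).fit_transform(bioactivity)
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(scores)

for k in range(5):
    print(f"cluster {k}: {np.sum(labels == k)} chemicals")
```

The same basic pattern of dimensionality reduction followed by unsupervised grouping underlies many analyses of screening data, although assay normalization and hit-call definitions require far more care than this toy example suggests.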
Within this framework, the concept of the Adverse Outcome Pathway (AOP) has emerged as a particularly useful organizing principle. Rather than treating toxicity as a singular endpoint, AOPs describe a sequence of causally linked events, beginning with a molecular initiating event and progressing through intermediate biological changes to an adverse outcome at the organism level (Ankley et al., 2010). This structured representation allows researchers to focus on early, mechanistically relevant perturbations, thereby enabling earlier and potentially more sensitive predictions of toxicity. Yet, even here, one encounters a degree of complexity: biological systems are rarely linear, and mapping these pathways often involves navigating overlapping networks and context-dependent responses.
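As a minimal illustration of this structured representation (the event names below are hypothetical placeholders, not a curated pathway from the AOP knowledge base), a linearized AOP can be encoded as a directed graph of key events and traversed to identify everything downstream of a molecular initiating event:

```python
# Illustrative sketch: a simplified, linearized AOP as a directed graph.
aop = {
    "MIE: receptor binding": ["KE1: altered gene expression"],
    "KE1: altered gene expression": ["KE2: cellular dysfunction"],
    "KE2: cellular dysfunction": ["AO: organ-level injury"],
    "AO: organ-level injury": [],
}

def downstream(event, graph):
    """Return all events reachable from a given key event (depth-first)."""
    reached, stack = [], list(graph[event])
    while stack:
        nxt = stack.pop()
        if nxt not in reached:
            reached.append(nxt)
            stack.extend(graph[nxt])
    return reached

print(downstream("MIE: receptor binding", aop))
```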
At the intersection of these developments lies an increasingly influential role for machine learning and computational modeling. Techniques such as random forests and other ensemble learning methods have demonstrated considerable utility in handling high-dimensional toxicological data (Breiman, 2001). Similarly, quantitative structure–activity relationship (QSAR) models and broader in silico toxicology approaches aim to predict chemical hazards based on molecular features and known biological interactions (Raies & Bajic, 2016). These tools, while powerful, are not without limitations. Their predictive performance depends heavily on data quality, representativeness, and the underlying assumptions embedded within model architectures. As such, their integration into regulatory decision-making continues to require careful scrutiny.
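A minimal QSAR-style sketch, assuming hypothetical molecular descriptors and toxicity labels rather than any real dataset, shows how a random forest might be trained and evaluated in this setting:

```python
# Illustrative QSAR-style sketch (synthetic data, not a validated model):
# a random forest classifier trained on molecular descriptors to predict a
# binary toxicity label, with performance estimated by cross-validation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 30))   # 500 chemicals x 30 hypothetical descriptors
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

model = RandomForestClassifier(n_estimators=200, random_state=42)
auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(f"mean cross-validated ROC AUC: {auc.mean():.2f}")
```

Cross-validated estimates of this kind are routinely reported, but, as noted above, they are only as trustworthy as the chemical space and data quality behind them.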
A persistent challenge in this domain—one that perhaps sits at the core of predictive toxicology—is the translation of in vitro findings to in vivo human contexts. Cellular assays, while highly informative, often lack the physiological complexity of whole organisms. They capture snapshots of biological activity but may not fully account for processes such as absorption, distribution, metabolism, and excretion (ADME). To address this, physiologically based pharmacokinetic (PBPK) modeling and quantitative in vitro-to-in vivo extrapolation (QIVIVE) have become essential components of modern toxicological workflows. These approaches aim to contextualize in vitro bioactivity within realistic exposure scenarios, effectively bridging the gap between experimental systems and human biology (Paini et al., 2017; Wetmore, 2015).
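In its simplest form, QIVIVE-style reverse dosimetry converts an in vitro bioactive concentration into an administered equivalent dose by assuming linear kinetics; the sketch below uses entirely hypothetical numbers to show the arithmetic:

```python
# Illustrative reverse-dosimetry sketch in the spirit of QIVIVE (all values
# hypothetical): an in vitro bioactive concentration is converted to an
# administered equivalent dose (AED) using the steady-state plasma
# concentration predicted for a unit oral dose, assuming linear kinetics.
ac50_uM = 5.0                  # hypothetical in vitro bioactive concentration (uM)
css_uM_per_mg_kg_day = 2.0     # hypothetical steady-state plasma conc. per 1 mg/kg/day

aed_mg_kg_day = ac50_uM / css_uM_per_mg_kg_day
print(f"administered equivalent dose ~ {aed_mg_kg_day:.1f} mg/kg/day")
```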
PBPK models, in particular, simulate the movement of chemicals through the body using mathematical representations of physiological compartments. When integrated with in vitro data, they allow for the estimation of internal exposure metrics, such as tissue concentrations over time. This integration supports the calculation of safety margins that are more directly relevant to human health than traditional external dose metrics (Bessems et al., 2014; Rotroff et al., 2010). Still, these models require extensive parameterization and validation, and uncertainties can propagate through each stage of the modeling process.
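A full PBPK model resolves many physiological compartments, but the underlying mass-balance logic can be sketched with a single compartment and first-order absorption and elimination (all parameters hypothetical):

```python
# Minimal sketch, not a full PBPK model: a one-compartment system with
# first-order absorption and elimination, standing in for the compartmental
# mass-balance equations that PBPK models solve across many tissues.
import numpy as np
from scipy.integrate import solve_ivp

ka, ke, V = 1.0, 0.2, 42.0   # absorption rate (1/h), elimination rate (1/h), volume (L)
dose_mg = 100.0

def pk(t, y):
    gut, central = y
    return [-ka * gut, ka * gut - ke * central]

sol = solve_ivp(pk, (0.0, 24.0), [dose_mg, 0.0], t_eval=np.linspace(0, 24, 97))
conc = sol.y[1] / V          # plasma concentration (mg/L) over time
print(f"Cmax ~ {conc.max():.2f} mg/L at t ~ {sol.t[np.argmax(conc)]:.1f} h")
```

Scaling this idea to realistic physiology is precisely where the parameterization and validation burden mentioned above arises.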
Large-scale collaborative initiatives have played a significant role in advancing these methodologies. Programs such as SEURAT-1 have sought to develop integrated testing strategies capable of replacing repeated-dose animal studies, emphasizing data sharing, methodological standardization, and interdisciplinary collaboration (Whelan & Schwarz, 2011). Similarly, efforts to create centralized data repositories and standardized workflows have improved the accessibility and reproducibility of toxicological research (Berggren et al., 2017). Yet, despite these advances, the goal of a fully animal-free regulatory paradigm remains, at least for now, a work in progress rather than an accomplished reality.
What becomes increasingly apparent is that no single methodology—whether experimental or computational—can fully address the complexities of toxicological prediction. Instead, the field appears to be moving toward integrated, systems-level approaches, where diverse data types and modeling strategies are combined into cohesive frameworks. These “metamodels,” as they are sometimes described, attempt to unify mechanistic insights, exposure assessments, and predictive analytics into a single interpretive structure. The promise here is considerable: a more accurate, efficient, and ethically aligned approach to drug safety evaluation. And yet, the realization of this promise depends on overcoming substantial scientific, technical, and regulatory hurdles.
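One simple way to picture such a metamodel, offered purely as a hypothetical sketch, is a second-stage learner that weighs evidence streams produced by otherwise separate models, for example an in vitro bioactivity score, a QSAR prediction, and a PBPK-derived exposure margin:

```python
# Purely illustrative "metamodel" sketch (synthetic inputs): outputs from
# heterogeneous sources are combined by a second-stage learner into a single
# hazard estimate.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 300
features = np.column_stack([
    rng.random(n),           # hypothetical in vitro bioactivity score
    rng.random(n),           # hypothetical QSAR-predicted probability
    rng.lognormal(size=n),   # hypothetical PBPK-derived margin of exposure
])
labels = (features[:, 0] + features[:, 1] - 0.1 * features[:, 2]
          + rng.normal(scale=0.3, size=n) > 1.0).astype(int)

meta = LogisticRegression().fit(features, labels)
print("learned weights on the three evidence streams:", np.round(meta.coef_, 2))
```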
In this context, the integration of systems biology with machine learning does not merely represent a technological advancement; it reflects a broader shift in how toxicity itself is conceptualized. No longer seen as an isolated endpoint, toxicity is increasingly understood as an emergent property of complex biological networks responding to chemical perturbation. Capturing this complexity—without oversimplifying it—remains one of the central challenges of modern toxicology.