1. Introduction
In recent years, the trajectory of biological research—perhaps somewhat quietly at first, and then unmistakably—has shifted toward a data-intensive paradigm. What once relied heavily on reductionist, single-layer observations has now expanded into a multidimensional landscape shaped by high-throughput sequencing and advanced analytical platforms. The emergence of “omics” technologies—spanning genomics, transcriptomics, proteomics, metabolomics, and epigenomics—has not merely added volume to biological data; it has fundamentally altered how we conceptualize biological systems themselves. Rather than discrete, independently functioning components, genes, proteins, and metabolites are increasingly understood as elements of deeply interconnected networks that co-evolve and co-regulate cellular behavior.
Yet, despite the promise embedded in this abundance of data, a certain tension persists. Single-omics studies, while undeniably informative, often provide only fragmented glimpses into biological processes. They capture snapshots—sometimes precise, sometimes noisy—but rarely the full dynamic interplay that governs phenotype expression or disease progression. This limitation has prompted a growing recognition: to approach something resembling a systems-level understanding, one must move beyond isolated data layers and instead integrate them in a coherent, biologically meaningful way (Hasin et al., 2017; Gomez-Cabrero et al., 2014).
It is within this context that multiview learning has emerged—not as a singular solution, but rather as a conceptual and computational framework capable of navigating the complexity of multi-omics data. At its core, multiview learning treats each omics dataset as a distinct “view” of the same underlying biological entity. These views, while individually informative, collectively encode a richer, more nuanced representation of biological reality. The challenge, however, lies in how to reconcile their differences—differences in scale, noise structure, dimensionality, and even biological interpretation (Li et al., 2016; Sun, 2013).
The difficulty is not trivial. Biological datasets are often characterized by what is commonly referred to as the “large p, small n” problem—thousands, sometimes tens of thousands, of features measured across relatively few samples. This imbalance introduces statistical instability and increases the risk of overfitting. Compounding this issue is the heterogeneity inherent in multi-omics data: transcriptomic measurements may not align neatly with proteomic outputs, and epigenetic modifications may obscure or modulate gene expression in ways that are not immediately apparent. Such discrepancies are not merely technical artifacts; they reflect genuine biological complexity that integration models must account for rather than ignore (Ahmad & Fröhlich, 2016; Rappoport & Shamir, 2018).
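To make the statistical concern concrete, the following sketch (entirely synthetic data, no real omics involved) fits an ordinary least-squares model with far more features than samples. The training error vanishes even though the features are pure noise, which is precisely the overfitting risk the "large p, small n" regime creates.

```python
import numpy as np
from numpy.linalg import lstsq

rng = np.random.default_rng(0)

# 20 samples, 1000 purely random "features": far more features than samples.
n, p = 20, 1000
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# With p >> n an exact interpolating solution almost surely exists, so the
# training residual collapses to ~0 even though X carries no signal at all.
coef, *_ = lstsq(X, y, rcond=None)
train_residual = np.linalg.norm(X @ coef - y)
print(f"training residual: {train_residual:.2e}")  # essentially zero

# The same coefficients generalize poorly to fresh random data.
X_new = rng.standard_normal((n, p))
y_new = rng.standard_normal(n)
test_residual = np.linalg.norm(X_new @ coef - y_new)
print(f"test residual: {test_residual:.2f}")
```

The gap between the two residuals is the point: apparent in-sample fit says almost nothing about generalization when features vastly outnumber samples, which is why regularization and integration models that pool evidence across views matter in this setting.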
To address these challenges, several integration strategies have been proposed, each with its own conceptual strengths and limitations. Early integration—often described as feature-level fusion—combines data from multiple omics layers into a single matrix before analysis. While appealing in its simplicity, this approach tends to obscure modality-specific structures and may amplify noise when datasets differ substantially in scale or distribution (Pavlidis et al., 2001). Late integration, in contrast, analyzes each omics layer independently and subsequently combines the results, such as clustering outputs or predictive scores. This method preserves modality-specific insights but may miss subtle cross-modal interactions that only become apparent when data are jointly modeled (Bickel & Scheffer, 2004).
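The two extremes can be illustrated with a toy sketch (scikit-learn; the view names, sizes, and labels below are invented for illustration, not drawn from any real study). Early integration concatenates scaled feature matrices before fitting a single model, while late integration fits one model per view and combines their predicted probabilities afterwards.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Toy stand-ins for two omics layers measured on the same 100 samples.
n = 100
expr = rng.standard_normal((n, 50))   # "transcriptomics" view
meth = rng.standard_normal((n, 30))   # "methylation" view
labels = (expr[:, 0] + meth[:, 0] > 0).astype(int)  # signal spans both views

# Early integration: scale each view, then concatenate features into one matrix.
early_X = np.hstack([StandardScaler().fit_transform(expr),
                     StandardScaler().fit_transform(meth)])
early_model = LogisticRegression().fit(early_X, labels)
early_acc = early_model.score(early_X, labels)

# Late integration: fit one model per view, then average their probabilities.
m1 = LogisticRegression().fit(expr, labels)
m2 = LogisticRegression().fit(meth, labels)
late_proba = (m1.predict_proba(expr)[:, 1] + m2.predict_proba(meth)[:, 1]) / 2
late_pred = (late_proba > 0.5).astype(int)
late_acc = (late_pred == labels).mean()

print("early-integration training accuracy:", early_acc)
print("late-integration training accuracy:", late_acc)
```

Because the label here depends jointly on both views, the early-fusion model can exploit the cross-view interaction directly, whereas each late-fusion model sees only half of the signal — a small-scale analogue of the trade-off described above.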
Somewhere between these two extremes lies intermediate integration, which arguably represents the most conceptually compelling approach. Here, data are integrated during the learning process itself, allowing models to capture both shared and modality-specific patterns simultaneously. Techniques such as Joint and Individual Variation Explained (JIVE) and Multi-Omics Factor Analysis (MOFA) exemplify this strategy, decomposing datasets into components that reflect common biological signals as well as unique variations (Lock et al., 2013; Argelaguet et al., 2018). This dual representation is particularly valuable in biological contexts, where both consensus and diversity across data types carry meaningful information.
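A minimal, one-pass sketch of a JIVE-style decomposition (not the full iterative algorithm, and using synthetic data with a planted shared factor) conveys the idea: a shared low-rank term is estimated from the concatenated views, and a view-specific low-rank term is then fitted to each residual.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two synthetic views sharing one joint factor plus view-specific factors.
n = 60
joint = rng.standard_normal((n, 1))
X1 = joint @ rng.standard_normal((1, 40)) \
     + 0.5 * rng.standard_normal((n, 1)) @ rng.standard_normal((1, 40))
X2 = joint @ rng.standard_normal((1, 25)) \
     + 0.5 * rng.standard_normal((n, 1)) @ rng.standard_normal((1, 25))

def jive_sketch(views, joint_rank=1, indiv_rank=1):
    """One pass of a JIVE-style decomposition: a shared low-rank term
    from the concatenated views, then a view-specific low-rank term
    fitted to each view's residual."""
    stacked = np.hstack(views)
    U, s, Vt = np.linalg.svd(stacked, full_matrices=False)
    joint_full = (U[:, :joint_rank] * s[:joint_rank]) @ Vt[:joint_rank]
    # Split the joint reconstruction back into per-view blocks.
    splits = np.cumsum([v.shape[1] for v in views])[:-1]
    joint_blocks = np.hsplit(joint_full, splits)
    indiv_blocks = []
    for view, jb in zip(views, joint_blocks):
        R = view - jb                              # remove shared structure
        Ur, sr, Vrt = np.linalg.svd(R, full_matrices=False)
        indiv_blocks.append((Ur[:, :indiv_rank] * sr[:indiv_rank]) @ Vrt[:indiv_rank])
    return joint_blocks, indiv_blocks

joint_blocks, indiv_blocks = jive_sketch([X1, X2])

# Joint + individual terms together should explain most of each view's variance.
resid1 = X1 - joint_blocks[0] - indiv_blocks[0]
explained = 1 - np.linalg.norm(resid1) ** 2 / np.linalg.norm(X1) ** 2
print(f"view 1 variance explained: {explained:.2f}")
```

The published JIVE method iterates this decomposition to convergence and enforces orthogonality between joint and individual components; the sketch above captures only the core "shared plus specific" structure that makes intermediate integration attractive.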
Indeed, the theoretical underpinnings of multiview learning often revolve around two complementary principles: the consensus principle and the complementary principle. The former emphasizes agreement across views, seeking latent representations that are consistent across multiple data modalities. The latter, however, acknowledges that each omics layer captures distinct aspects of biological function—DNA methylation patterns, for instance, may reveal regulatory mechanisms that are invisible at the transcriptomic level. Effective integration, therefore, requires a careful balancing act: extracting shared structure without discarding modality-specific insights (Blum & Mitchell, 1998; Li et al., 2016).
Historically, a range of statistical methods laid the groundwork for multiview integration. Canonical Correlation Analysis (CCA), introduced by Hotelling (1936), provided one of the earliest frameworks for identifying relationships between paired datasets. Its modern extensions, including sparse CCA, have been widely applied in genomic studies to uncover correlated patterns across omics layers (Witten & Tibshirani, 2009). Similarly, matrix factorization techniques—such as Non-negative Matrix Factorization (NMF)—have enabled the decomposition of complex datasets into interpretable components, facilitating the identification of underlying biological processes (Lee & Seung, 1999; Zitnik & Zupan, 2015).
Network-based approaches have also gained prominence, particularly in the form of Similarity Network Fusion (SNF), which constructs sample similarity networks for each data type and integrates them into a unified representation. This method has demonstrated notable success in cancer subtyping, where integrating genomic, transcriptomic, and epigenomic data can reveal clinically relevant patient clusters that are not apparent from single-omics analyses alone (Wang et al., 2014; Hoadley et al., 2014).
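A heavily simplified two-view version of the SNF cross-diffusion idea can be sketched as follows. This is an illustrative reduction, not the published algorithm: Wang et al. use scaled exponential kernels, careful normalization, and multiple views, whereas the sketch below uses a plain Gaussian affinity and synthetic two-cluster data.

```python
import numpy as np

rng = np.random.default_rng(4)

def affinity(X, sigma=1.0):
    """Gaussian sample-similarity matrix from one data view."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0)
    return W

def row_normalize(W):
    return W / W.sum(axis=1, keepdims=True)

def snf_sketch(W1, W2, k=5, iters=10):
    """Simplified two-view Similarity Network Fusion: each view's full
    similarity matrix is diffused through the other view via a sparse
    k-nearest-neighbour kernel, then the two results are averaged."""
    def knn_kernel(W):
        S = np.zeros_like(W)
        idx = np.argsort(W, axis=1)[:, -k:]   # k strongest neighbours per row
        for i, js in enumerate(idx):
            S[i, js] = W[i, js]
        return row_normalize(S)
    P1, P2 = row_normalize(W1), row_normalize(W2)
    S1, S2 = knn_kernel(W1), knn_kernel(W2)
    for _ in range(iters):
        # Simultaneous cross-diffusion: each view is updated using the
        # other view's similarities from the previous iteration.
        P1, P2 = S1 @ P2 @ S1.T, S2 @ P1 @ S2.T
    return (P1 + P2) / 2

# Two noisy views of the same two-cluster structure (synthetic data).
n = 40
labels = np.repeat([0, 1], n // 2)
centers = np.array([[0.0], [3.0]])
view1 = centers[labels] + rng.standard_normal((n, 1)) * 0.7
view2 = centers[labels] + rng.standard_normal((n, 1)) * 0.7

fused = snf_sketch(affinity(view1), affinity(view2))
within = fused[:20, :20].mean()
between = fused[:20, 20:].mean()
print(f"within-cluster similarity {within:.3f} vs between {between:.3f}")
```

After fusion, within-cluster similarities clearly dominate between-cluster ones, mirroring how SNF sharpens clinically relevant patient clusters by letting each omics layer reinforce structure present in the others.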
More recently, the advent of deep learning has expanded the methodological landscape even further. Models such as deep canonical correlation analysis and variational autoencoders have introduced the capacity to model highly non-linear relationships across data modalities. These approaches, while often criticized for their limited interpretability, have shown considerable promise in tasks such as disease classification, biomarker discovery, and drug response prediction (Andrew et al., 2013). Their ability to learn latent representations that capture complex cross-modal interactions suggests a powerful, albeit still evolving, direction for multi-omics integration.
And yet, despite these advances, certain questions remain unresolved. How can we ensure that integrated representations are not only statistically robust but also biologically interpretable? To what extent do current models capture causal relationships rather than mere correlations? And perhaps most importantly, how can these computational frameworks be translated into clinically actionable insights?
This review, therefore, seeks to navigate these questions by examining the evolution of multiview learning approaches for omics data integration. By tracing the progression from classical statistical models to contemporary deep learning architectures, we aim to highlight both the conceptual continuity and the methodological innovation that define this field. In doing so, we hope to underscore a central, if somewhat tentative, conclusion: that meaningful biological understanding increasingly depends not on the depth of individual data layers, but on the coherence with which they are integrated.