Multiview Learning for Omics Data Integration: From Multi-Modal Data Fusion to Systems-Level Biological Insights

Constance B. Bailey

doi:10.25163/bioinformatics.3110736

Bioinfo Chem

System biology and Infochemistry | Online ISSN 3071-4826


Volume 3 Number 1 2021

REVIEWS (Open Access)



Bioinfo Chem 3 (1) 1-12 https://doi.org/10.25163/bioinformatics.3110736

Submitted: 28 July 2021; Revised: 17 September 2021; Accepted: 24 September 2021; Published: 26 September 2021

Abstract

The rapid expansion of omics technologies has, somewhat paradoxically, both clarified and complicated our understanding of biological systems. While genomics, transcriptomics, proteomics, and related modalities provide unprecedented detail, each on its own captures only a fragment of a much larger, deeply interconnected biological narrative. It is within this tension that multiview learning has begun to emerge, not as a definitive solution but as a flexible and evolving framework for integration. This review explores how multiview learning approaches attempt to reconcile heterogeneous, high-dimensional omics datasets into coherent representations of biological systems. We examine the conceptual foundations underlying integration, particularly the balance between consensus and complementarity, and trace the progression from classical statistical models, such as canonical correlation analysis, to more recent deep learning architectures. Along the way, we consider three dominant fusion strategies (early, intermediate, and late integration), each offering distinct advantages and limitations. Particular attention is given to how these methods address persistent challenges, including dimensionality imbalance, modality heterogeneity, and data incompleteness. Through a synthesis of methodological and application-oriented studies, this review highlights the growing role of multiview learning in areas such as cancer subtyping, biomarker discovery, and drug response prediction. Ultimately, the field appears to be shifting, quietly but decisively, toward a central insight: meaningful biological understanding increasingly depends not on individual data layers, but on how effectively they are integrated.

Keywords: Multi-omics integration; Multiview learning; Data fusion; Systems biology; Machine learning
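
The three fusion strategies named in the abstract can be illustrated with a minimal sketch. This is not the method of any study covered by the review: the two synthetic "views" (a 200-feature block standing in for expression data and a 30-feature block standing in for methylation data), the ridge scorer, the PCA-based latent representation, and all function names are assumptions chosen for illustration. Agreement is measured in-sample, so the printed values show only how the pipelines differ, not how well they generalise.

```python
import numpy as np

def standardize(X):
    """Z-score each feature so views on different scales are comparable."""
    return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge weights (no intercept; X is assumed centred)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def pca_scores(X, k):
    """Sample coordinates on the top-k principal components, via SVD."""
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * S[:k]

def early_fusion(views, y):
    """Early integration: concatenate raw features, fit one model."""
    X = np.hstack([standardize(V) for V in views])
    return X @ ridge_fit(X, y)

def intermediate_fusion(views, y, k=5):
    """Intermediate integration: learn a low-dimensional representation
    per view (PCA here, as an assumption), then model the joined factors."""
    Z = np.hstack([pca_scores(standardize(V), k) for V in views])
    return Z @ ridge_fit(Z, y)

def late_fusion(views, y):
    """Late integration: fit one model per view, then average the scores."""
    scores = []
    for V in views:
        Xv = standardize(V)
        scores.append(Xv @ ridge_fit(Xv, y))
    return np.mean(scores, axis=0)

# Synthetic data (hypothetical): one shared latent signal drives both views.
rng = np.random.default_rng(0)
n = 60
latent = rng.normal(size=n)
y = np.where(latent > 0, 1.0, -1.0)
expr = latent[:, None] * rng.normal(size=(1, 200)) + rng.normal(size=(n, 200))
meth = latent[:, None] * rng.normal(size=(1, 30)) + rng.normal(size=(n, 30))
views = [expr, meth]

for name, fn in [("early", early_fusion),
                 ("intermediate", intermediate_fusion),
                 ("late", late_fusion)]:
    scores = fn(views, y)
    # In-sample agreement between the sign of the score and the label.
    print(name, np.mean(np.sign(scores) == y))
```

The sketch also makes the dimensionality imbalance concrete: under early fusion the 200-feature view contributes most of the columns of the concatenated matrix, whereas the intermediate and late variants give each view an equal-sized latent block or an equal vote.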

