Bioinfo Chem

System biology and Infochemistry | Online ISSN 3071-4826
REVIEWS (Open Access)

Constance B. Bailey 1*

 


Bioinfo Chem 3 (1) 1-12 https://doi.org/10.25163/bioinformatics.3110736

Submitted: 28 July 2021; Revised: 17 September 2021; Accepted: 24 September 2021; Published: 26 September 2021


Abstract

The rapid expansion of omics technologies has, somewhat paradoxically, both clarified and complicated our understanding of biological systems. Genomics, transcriptomics, proteomics, and related modalities provide unprecedented detail, yet each on its own captures only a fragment of a larger, deeply interconnected biological system. Within this tension, multiview learning has emerged not as a definitive solution but as a flexible and evolving framework for integration.

This review explores how multiview learning approaches reconcile heterogeneous, high-dimensional omics datasets into coherent representations of biological systems. We examine the conceptual foundations of integration, particularly the balance between consensus (agreement across views) and complementarity (information unique to each view), and trace the progression from classical statistical models, such as canonical correlation analysis, to more recent deep learning architectures. We consider three dominant fusion strategies (early, intermediate, and late integration), each with distinct advantages and limitations. Particular attention is given to how these methods address persistent challenges, including dimensionality imbalance, modality heterogeneity, and data incompleteness. By synthesizing methodological and application-oriented studies, this review highlights the growing role of multiview learning in cancer subtyping, biomarker discovery, and drug response prediction. Ultimately, the field appears to be converging on a central insight: meaningful biological understanding increasingly depends not on individual data layers but on how effectively they are integrated.

Keywords: Multi-omics integration; Multiview learning; Data fusion; Systems biology; Machine learning
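
The three fusion strategies named in the abstract can be illustrated with a small sketch. It is not taken from the review. The synthetic two-view data, the nearest-centroid classifier, the per-view PCA embedding used to stand in for intermediate integration, and every function name (`early_fusion`, `intermediate_fusion`, `late_fusion`, `make_views`) are assumptions chosen for brevity. Accuracy is measured in-sample, so the numbers only show that the code runs, not how the strategies compare on real omics data.

```python
import numpy as np

def zscore(X):
    """Standardize each feature (column) to zero mean and unit variance."""
    return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)

def centroid_scores(X, y):
    """Nearest-centroid score: positive when a sample is closer to the class-1 centroid."""
    c0, c1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    return ((X - c0) ** 2).sum(axis=1) - ((X - c1) ** 2).sum(axis=1)

def early_fusion(views, y):
    """Early integration: concatenate features, then fit one model.
    Each block is divided by sqrt(p) so a wide view does not dominate
    (one simple response to dimensionality imbalance)."""
    blocks = [zscore(X) / np.sqrt(X.shape[1]) for X in views]
    return (centroid_scores(np.hstack(blocks), y) > 0).astype(int)

def intermediate_fusion(views, y, k=3):
    """Intermediate integration (crude stand-in): embed each view in its
    top-k principal components, then fit one model on the joint embedding."""
    embeds = []
    for X in views:
        U, s, _ = np.linalg.svd(zscore(X), full_matrices=False)
        embeds.append(U[:, :k] * s[:k])  # PCA scores of this view
    return (centroid_scores(np.hstack(embeds), y) > 0).astype(int)

def late_fusion(views, y):
    """Late integration: fit one model per view, then average the
    rescaled per-view decision scores."""
    scores = []
    for X in views:
        s = centroid_scores(zscore(X), y)
        scores.append(s / (s.std() + 1e-12))
    return (np.mean(scores, axis=0) > 0).astype(int)

def make_views(n=200, seed=0):
    """Hypothetical data: a wide view (50 features) and a narrow view
    (10 features), each with a few features shifted in class 1."""
    rng = np.random.default_rng(seed)
    y = np.repeat([0, 1], n // 2)
    X1 = rng.normal(size=(n, 50))
    X1[:, :5] += 2.0 * y[:, None]
    X2 = rng.normal(size=(n, 10))
    X2[:, :3] += 2.0 * y[:, None]
    return [X1, X2], y

if __name__ == "__main__":
    views, y = make_views()
    for fuse in (early_fusion, intermediate_fusion, late_fusion):
        acc = (fuse(views, y) == y).mean()
        print(f"{fuse.__name__}: in-sample accuracy = {acc:.2f}")
```

In practice the nearest-centroid model would be replaced by whatever learner suits the task, and intermediate integration is more often done with a jointly learned latent space (for example, factor models or multimodal networks) than with independent per-view PCA.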


