Transformer-Based Deep Learning for Protein Structure and Function Prediction: From Sequence Understanding to Biological Insight: A Review

Simon J. Moore

doi:10.25163/bioinformatics.3110734

Bioinfo Chem

System biology and Infochemistry | Online ISSN 3071-4826

Volume 3 Number 1 2021

REVIEWS (Open Access)

Abstract 1. Introduction 2. Methodology 3. Large Language Models in Bioinformatics: A Linguistic Turn in Understanding Biological Systems 4. Synthesizing the Computational Evolution of Protein Science: From Alignment Heuristics to Representation Intelligence 5. Limitations 6. Conclusion Author Contributions References

Bioinfo Chem 3 (1) 1-12 https://doi.org/10.25163/bioinformatics.3110734

Submitted: 20 November 2020 Revised: 12 January 2021 Accepted: 21 January 2021 Published: 23 January 2021

Abstract

Something quietly transformative is, perhaps, happening in how we understand proteins. For decades, the field relied on a combination of experimental precision and evolutionary inference: methods that were undeniably powerful, yet limited by scale, cost, and the boundaries of known biology. What has changed more recently is not just the volume of data, but the way we interpret it. This review explores the emergence of Transformer-based deep learning models as a turning point in protein science, in which sequences are no longer treated merely as biochemical strings but as a form of language: structured, contextual, and, to some extent, interpretable. At the center of this shift lies the idea that long-range dependencies between residues, once difficult to capture, can now be modeled directly through attention mechanisms. These models appear capable of extracting structural and functional signals from raw sequences alone, sometimes without explicit evolutionary input such as multiple sequence alignments. Yet their success raises questions as important as the answers they provide: what exactly are these systems learning, and how far can their predictions be trusted? By tracing the evolution from alignment-based methods to large-scale representation learning, this review situates Transformer models within a broader computational narrative. It suggests that we are moving, perhaps cautiously, toward a framework in which biological complexity can be read, predicted, and even designed with increasing fluency.

Keywords: Transformer models; Protein structure prediction; Protein language models; Bioinformatics; Deep learning
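To make the claim about attention and long-range dependencies concrete, the following is a minimal NumPy sketch of single-head scaled dot-product self-attention applied to a protein sequence. It is an illustration under stated assumptions, not the method of any particular protein language model: the one-residue-per-token vocabulary, the random (untrained) projection matrices w_q, w_k and w_v, the embedding width of 8, and the example sequence are all hypothetical choices. What it shows is that the L × L weight matrix scores every pair of residues directly, so the first and last residues interact in a single step regardless of how far apart they lie in the chain.

```python
import numpy as np

# Assumption for illustration: one token per standard amino acid.
# Real protein language models define their own vocabularies.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(sequence: str) -> np.ndarray:
    """Encode a protein sequence as an (L, 20) one-hot matrix.

    Non-standard residues (e.g. 'X') raise KeyError in this sketch.
    """
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    x = np.zeros((len(sequence), len(AMINO_ACIDS)))
    for pos, aa in enumerate(sequence):
        x[pos, index[aa]] = 1.0
    return x


def self_attention(x: np.ndarray, w_q: np.ndarray, w_k: np.ndarray, w_v: np.ndarray):
    """Single-head scaled dot-product self-attention.

    Returns the attended representations (L, d) and the (L, L) weights.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # (L, L) score matrix: one entry for every pair of residues.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Subtract the row maximum for numerical stability before exponentiating.
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v, weights


# Hypothetical, untrained projections; in a real model these are learned.
rng = np.random.default_rng(0)
d_model = 8
w_q, w_k, w_v = (rng.normal(size=(len(AMINO_ACIDS), d_model)) for _ in range(3))

seq = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary example sequence, 22 residues
out, attn = self_attention(one_hot(seq), w_q, w_k, w_v)
print(out.shape, attn.shape)  # prints (22, 8) (22, 22)

# The weight linking residue 0 to the final residue is computed directly,
# in one step, however far apart the two residues are in the sequence.
print(attn[0, -1])
```

In a trained Transformer the projections are learned, and many heads and layers are stacked. The same L × L matrix that gives every residue direct access to every other is also why memory and compute grow quadratically with sequence length.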
