Prediction of Protein–Metal Ion-Binding Sites Using Sequence Homology and Machine-Learning Methods
Zihan Tian 1, Cao Wei 1, Yutaka Moriwaki 1, Tohru Terada 1, Shugo Nakamura 1, Kazuya Sumikoshi 1, Fang Chun 1, and Kentaro Shimizu 1*
Advanced Bioinformatics & Chemistry 1(1) 025-036 https://doi.org/10.25163/abc.11208022130119
Submitted: 21 July 2019 Revised: 22 August 2019 Published: 06 September 2019
Abstract
Metal ions are essential for metalloproteins to perform their catalytic or structural functions. Identifying metal ion-binding sites is therefore important for understanding the role of metal ions in protein function. Because experimental identification is labor-intensive and time-consuming, computational prediction of protein–metal ion-binding sites is an attractive alternative, and a range of methods have been proposed to predict these sites from protein sequences. In this study, we implemented two methods of predicting metal ion-binding sites for Ca2+, Co2+, Cu2+, Cu+, Fe3+, Fe2+, Hg2+, Mg2+, Mn2+, Ni2+, and Zn2+ from amino acid sequences: a homology-based method and a machine-learning method. The homology-based method predicts binding sites from homologous sequences obtained by a protein–protein basic local alignment search tool (BLASTP) search. The machine-learning method uses a support vector machine with three protein sequence features. The homology-based method achieved an accuracy of 0.9905 and a specificity of 0.9978, while the machine-learning method showed balanced performance across accuracy, sensitivity, and specificity. In particular, the sensitivity of the machine-learning method was 0.8239, and many metal ion-binding sites were predicted only by the machine-learning method.
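The two ideas in the abstract, transferring binding sites from a homolog and scoring predictions per residue, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a pairwise alignment (such as one reported by BLASTP) is already available as two gapped strings, that a query residue is predicted as binding whenever it is aligned without a gap to an annotated binding residue in the hit, and that positions are 1-based. The function names `transfer_sites` and `residue_metrics` and the toy sequences are hypothetical.

```python
from typing import Dict, Set


def transfer_sites(query_aln: str, hit_aln: str, hit_sites: Set[int],
                   query_start: int = 1, hit_start: int = 1) -> Set[int]:
    """Map annotated binding positions of the hit onto the query.

    query_aln / hit_aln are the gapped alignment strings ('-' marks a gap).
    hit_sites holds 1-based binding positions in the ungapped hit sequence.
    Assumption: any gap-free aligned column transfers the annotation.
    """
    if len(query_aln) != len(hit_aln):
        raise ValueError("aligned strings must have equal length")
    predicted: Set[int] = set()
    q_pos, h_pos = query_start, hit_start
    for q_char, h_char in zip(query_aln, hit_aln):
        if q_char != "-" and h_char != "-" and h_pos in hit_sites:
            predicted.add(q_pos)
        # Advance each counter only when its sequence has a residue here.
        if q_char != "-":
            q_pos += 1
        if h_char != "-":
            h_pos += 1
    return predicted


def residue_metrics(predicted: Set[int], actual: Set[int],
                    length: int) -> Dict[str, float]:
    """Per-residue accuracy, sensitivity, and specificity for one protein."""
    tp = len(predicted & actual)   # binding residues correctly predicted
    fp = len(predicted - actual)   # non-binding residues predicted as binding
    fn = len(actual - predicted)   # binding residues that were missed
    tn = length - tp - fp - fn     # non-binding residues correctly rejected
    return {
        "accuracy": (tp + tn) / length,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


if __name__ == "__main__":
    # Toy alignment: 7-residue query and 7-residue hit with one gap each.
    predicted = transfer_sites("MKCH-EDC", "MRCHAE-C", hit_sites={3, 4, 7})
    print(sorted(predicted))
    print(residue_metrics(predicted, actual={3, 5, 7}, length=7))
```

Because binding residues are usually a small fraction of a protein, the true-negative count dominates accuracy and specificity, so both can be high even when many binding residues are missed. Sensitivity is therefore worth reading alongside the other two values.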
Keywords: protein, metal ion, binding site prediction, machine learning, homology search