System and chemical biology
RESEARCH ARTICLE   (Open Access)

Prediction of Protein–Metal Ion-Binding Sites Using Sequence Homology and Machine-Learning Methods

Zihan Tian 1, Cao Wei 1, Yutaka Moriwaki 1, Tohru Terada 1, Shugo Nakamura 1, Kazuya Sumikoshi 1, Fang Chun 1, and Kentaro Shimizu 1*


Advanced Bioinformatics & Chemistry 1(1) 025-036 https://doi.org/10.25163/abc.11208022130119

Submitted: 21 July 2019  Revised: 22 August 2019  Published: 06 September 2019 

Abstract

Metal ions are essential for metalloproteins to perform their catalytic or structural functions. Identifying metal ion-binding sites is therefore important for understanding the role of metal ions in protein function. Because experimental identification is labor-intensive and time-consuming, computational prediction of protein–metal ion-binding sites is an attractive alternative, and a range of methods have been proposed to predict these sites from protein sequences. In this study, we implemented two methods for predicting binding sites for Ca2+, Co2+, Cu2+, Cu+, Fe3+, Fe2+, Hg2+, Mg2+, Mn2+, Ni2+, and Zn2+ from amino acid sequences: a homology-based method and a machine-learning method. The homology-based method predicts binding sites from homologous sequences obtained by a protein–protein basic local alignment search tool (BLASTP) search. The machine-learning method uses a support vector machine with three protein sequence features. The homology-based method achieved an accuracy of 0.9905 and a specificity of 0.9978, while the machine-learning method showed balanced performance in accuracy, sensitivity, and specificity. In particular, the sensitivity of the machine-learning method was 0.8239, and many metal ion-binding sites were predicted only by the machine-learning method.

Keywords: protein, metal ion, binding site prediction, machine learning, homology search
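
The abstract reports accuracy, sensitivity, and specificity for per-residue binding-site prediction. As a reading aid, the minimal sketch below computes these three measures from confusion-matrix counts using their standard definitions. The function name and the counts are hypothetical and are not taken from the study; they only illustrate how, when binding residues are rare, accuracy and specificity can both be very high while sensitivity stays low.

```python
def binding_site_metrics(tp, fp, tn, fn):
    """Standard per-residue metrics from confusion-matrix counts.

    tp: binding residues predicted as binding
    fp: non-binding residues predicted as binding
    tn: non-binding residues predicted as non-binding
    fn: binding residues predicted as non-binding
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # fraction of true binding residues recovered
        "specificity": tn / (tn + fp),   # fraction of non-binding residues rejected
    }


# Hypothetical counts (not from the paper): 100 binding residues
# among 10,000 residues in total, of which only 30 are recovered.
metrics = binding_site_metrics(tp=30, fp=20, tn=9880, fn=70)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these illustrative counts, accuracy and specificity both exceed 0.99 even though only 30% of the binding residues are found, which is why sensitivity is worth reporting alongside the other two measures.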
 

