Improving the prediction of protein stability changes upon mutations by geometric learning and a pre-training strategy.

  • Author(s): Xu Y; Liu D; Gong H
  • Source:
    Nature computational science [Nat Comput Sci] 2024 Nov; Vol. 4 (11), pp. 840-850. Date of Electronic Publication: 2024 Oct 25.
  • Publication Type:
    Journal Article
  • Language:
    English
  • Additional Information
    • Source:
      Publisher: Springer Nature; Country of Publication: United States; NLM ID: 101775476; Publication Model: Print-Electronic; Cited Medium: Internet; ISSN: 2662-8457 (Electronic); Linking ISSN: 26628457; NLM ISO Abbreviation: Nat Comput Sci; Subsets: MEDLINE
    • Publication Information:
      Original Publication: [New York, N.Y.] : Springer Nature, [2021]-
    • Abstract:
      Accurate prediction of protein mutation effects is of great importance in protein engineering and design. Here we propose GeoStab-suite, a suite of three geometric learning-based models (GeoFitness, GeoDDG and GeoDTm) for the prediction of the fitness score, ΔΔG and ΔTm of a protein upon mutations, respectively. GeoFitness engages a specialized loss function that allows supervised training of a unified model on the large amount of multi-labeled fitness data in the deep mutational scanning database. To further improve the downstream tasks of ΔΔG and ΔTm prediction, the encoder of GeoFitness is reused as a pre-trained module in GeoDDG and GeoDTm to overcome the lack of sufficient labeled data. This pre-training strategy, in combination with data expansion, markedly improves model performance and generalizability. In the benchmark test, GeoDDG and GeoDTm outperform the other state-of-the-art methods by at least 30% and 70%, respectively, in terms of the Spearman correlation coefficient.
      Competing Interests: The authors declare no competing interests.
      (© 2024. The Author(s), under exclusive licence to Springer Nature America, Inc.)
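      The benchmark claim in the abstract is stated in terms of the Spearman correlation coefficient, i.e. the Pearson correlation between the ranks of predicted and measured values (ΔΔG or ΔTm). The minimal Python sketch below is not taken from the GeoStab-suite code; it computes Spearman's rho with average ranks for ties, plus the relative gain between two rho values. Reading "outperform by at least 30%" as a relative gain in rho is an assumption, and the helper names and example numbers are hypothetical illustrations, not data from the paper. scipy.stats.spearmanr, which also gives tied values their average rank, can serve as a cross-check.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation; raises if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(pred: Sequence[float], true: Sequence[float]) -> float:
    """Spearman's rho = Pearson correlation of the (average) ranks."""
    if len(pred) != len(true) or len(pred) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    return pearson(average_ranks(pred), average_ranks(true))


def relative_improvement(new: float, baseline: float) -> float:
    """Hypothetical helper: relative gain of `new` over `baseline` (0.3 = 30%)."""
    return (new - baseline) / baseline


if __name__ == "__main__":
    # Hypothetical measured vs. predicted values, for illustration only.
    measured = [1.2, -0.5, 0.3, 2.0, 0.0]
    predicted = [0.9, -0.2, 0.4, 1.5, 0.1]
    print("rho:", spearman(predicted, measured))
    print("gain:", round(relative_improvement(0.65, 0.5), 2))
```

Because Spearman's rho depends only on ranks, it rewards a predictor that orders mutations correctly even when its values are offset or rescaled relative to the experimental ones, as in the hypothetical example above.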
    • Grant Information:
      National Natural Science Foundation of China, grant 32171243
    • Substance Nomenclature:
      0 (Proteins)
    • Record Dates:
      Date Created: 20241025 Date Completed: 20241120 Latest Revision: 20241121
    • Update Code:
      20241122
    • DOI:
      10.1038/s43588-024-00716-2
    • PMID:
      39455825