Enhancing Arabidopsis thaliana ubiquitination site prediction through knowledge distillation and natural language processing.

  • Additional Information
    • Source:
      Publisher: Academic Press; Country of Publication: United States; NLM ID: 9426302; Publication Model: Print-Electronic; Cited Medium: Internet; ISSN: 1095-9130 (Electronic); Linking ISSN: 10462023; NLM ISO Abbreviation: Methods; Subsets: MEDLINE
    • Publication Information:
      Publication: Duluth, MN : Academic Press
      Original Publication: San Diego : Academic Press, c1990-
    • Subject Terms:
    • Abstract:
      Protein ubiquitination is a critical post-translational modification (PTM) that is involved in diverse biological processes and plays a pivotal role in regulating physiological mechanisms and disease states. Ubiquitination site prediction tools have been developed for various species, but they rely mainly on predefined sequence features and machine learning algorithms, and species-specific variations in ubiquitination patterns remain poorly understood. This study introduces a novel approach for predicting Arabidopsis thaliana ubiquitination sites using a neural network model based on knowledge distillation and natural language processing (NLP) of protein sequences. The framework uses a multi-species "Teacher" model to guide a more compact, species-specific "Student" model: the Teacher generates pseudo-labels that improve the Student's learning and the robustness of its predictions. In cross-validation, the model achieves an accuracy of 86.3% and an area under the curve (AUC) of 0.926; independent testing gives consistent results, with an accuracy of 86.3% and an AUC of 0.923. A comparison with established predictors shows that the model outperforms them, which underlines the effectiveness of combining knowledge distillation and NLP for ubiquitination site prediction. This study presents a promising and efficient approach for ubiquitination site prediction, offering valuable insights for researchers in related fields. The code and resources are available on GitHub: https://github.com/nuinvtnu/KD_ArapUbi.
      Competing Interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
      (Copyright © 2024 Elsevier Inc. All rights reserved.)
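The teacher-student idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that): the function names, the window size, the padding character, and the `alpha` weighting are all hypothetical, chosen only to show how a sequence window around a lysine can be tokenized like a sentence and how a Teacher's pseudo-label can enter a Student's loss.

```python
import math


def lysine_windows(sequence, flank=3, pad="X"):
    """Return (1-based position, window) pairs centred on each lysine (K).

    The window size and the padding character are illustrative assumptions.
    """
    padded = pad * flank + sequence + pad * flank
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            # In the padded string, sequence index i sits at i + flank,
            # so this slice is centred on the lysine.
            windows.append((i + 1, padded[i:i + 2 * flank + 1]))
    return windows


def tokenize(window, k=1):
    """Split a window into overlapping k-mers, treated as 'words'."""
    return [window[i:i + k] for i in range(len(window) - k + 1)]


def binary_cross_entropy(p, target, eps=1e-12):
    """Cross-entropy between a predicted probability and a target in [0, 1]."""
    p = min(max(p, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def distillation_loss(student_p, teacher_p, hard_label=None, alpha=0.5):
    """Blend a hard-label loss with a loss against the Teacher's pseudo-label.

    alpha is a hypothetical mixing weight. When no hard label is available,
    only the Teacher's soft pseudo-label supervises the Student.
    """
    soft = binary_cross_entropy(student_p, teacher_p)
    if hard_label is None:
        return soft
    hard = binary_cross_entropy(student_p, hard_label)
    return alpha * hard + (1.0 - alpha) * soft
```

For example, `lysine_windows("MKAK", flank=2)` yields one padded window per lysine, and `distillation_loss(0.9, 0.9)` is smaller than `distillation_loss(0.1, 0.9)` because the first Student prediction agrees with the Teacher. A real model would compute these losses over batches and backpropagate through a neural network, which this sketch omits.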
    • Contributed Indexing:
      Keywords: Arabidopsis thaliana; Knowledge distillation; Natural language processing (NLP); Neural network model; Post-translational modification (PTM); Protein ubiquitination
    • Accession Number:
      0 (Arabidopsis Proteins)
    • Publication Date:
      Date Created: 20241024; Date Completed: 20241201; Latest Revision: 20241201
    • Publication Date:
      20241204
    • DOI:
      10.1016/j.ymeth.2024.10.006
    • Accession Number:
      39447942