DeepDBS: Identification of DNA-binding sites in protein sequences by using deep representations and random forest.

  • Additional Information
    • Source:
      Publisher: Academic Press Country of Publication: United States NLM ID: 9426302 Publication Model: Print-Electronic Cited Medium: Internet ISSN: 1095-9130 (Electronic) Linking ISSN: 1046-2023 NLM ISO Abbreviation: Methods Subsets: MEDLINE
    • Publication Information:
      Publication: Duluth, MN : Academic Press
      Original Publication: San Diego : Academic Press, c1990-
    • Abstract:
      Interactions among biological molecules are considered primary factors in an organism's life cycle. Many important biological functions depend on such interactions, and protein-DNA interactions in particular are essential for transcription, regulation of gene expression, and DNA repair and packaging. Knowledge of these interactions and of the sites where they occur is therefore necessary for studying the mechanisms of various biological processes. Because experimental identification through biological assays is resource-demanding, costly, and error-prone, researchers turn to computational methods for efficient and accurate identification of DNA-protein interaction sites. Here we propose DeepDBS, a novel and accurate method for identifying DNA-binding sites in proteins from their primary amino acid sequences. Deep representations were computed from the protein sequences using a one-dimensional convolutional neural network (1D-CNN), a recurrent neural network (RNN), and a long short-term memory (LSTM) network, and were then used to train a Random Forest classifier. Random Forest with LSTM-based features outperformed the other models as well as existing state-of-the-art methods, with an accuracy of 0.99 on the self-consistency test, 10-fold cross-validation, 5-fold cross-validation, and jackknife validation, and 0.92 on independent dataset testing. Based on these results, we conclude that DeepDBS can support accurate and efficient identification of DNA-binding sites (DBS) in proteins.
      Competing Interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
      (Copyright © 2024 Elsevier Inc. All rights reserved.)
    • Contributed Indexing:
      Keywords: 1D-CNN; Artificial Intelligence; Computational Biology; DNA binding proteins; DNA binding sites; LSTM; Machine Learning; RNN; Random Forest
    • Accession Number:
      9007-49-2 (DNA)
      0 (DNA-Binding Proteins)
    • Publication Date:
      Date Created: 20240913 Date Completed: 20241029 Latest Revision: 20241029
    • Publication Date:
      20241031
    • DOI:
      10.1016/j.ymeth.2024.09.004
    • PMID:
      39270885
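
The abstract reports accuracy under four internal evaluation protocols: a self-consistency test, 5-fold and 10-fold cross-validation, and jackknife validation. The following is a minimal sketch of how those protocols differ, not the authors' code. It assumes a hypothetical toy majority-class classifier (`majority_fit`) in place of the LSTM features and Random Forest described in the record, uses made-up labels, and assumes the data have already been shuffled so that contiguous folds are acceptable. All function names are illustrative.

```python
from typing import Callable, List, Sequence, Tuple

Predictor = Callable[[Sequence], List]


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k contiguous folds; return (train, test) pairs."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    base, extra = divmod(n, k)  # the first `extra` folds get one more item
    splits, start = [], 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        splits.append((train, test))
        start += size
    return splits


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_fit(X: Sequence, y: Sequence) -> Predictor:
    """Hypothetical stand-in model: always predicts the most common training label."""
    labels = list(y)
    label = max(sorted(set(labels)), key=labels.count)
    return lambda X_new: [label] * len(X_new)


def self_consistency(X: Sequence, y: Sequence, fit) -> float:
    """Train and test on the same data (an optimistic estimate)."""
    return accuracy(y, fit(X, y)(X))


def cross_validate(X: Sequence, y: Sequence, k: int, fit) -> float:
    """Mean held-out accuracy over k folds."""
    scores = []
    for train, test in k_fold_indices(len(y), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = model([X[i] for i in test])
        scores.append(accuracy([y[i] for i in test], preds))
    return sum(scores) / len(scores)


def jackknife(X: Sequence, y: Sequence, fit) -> float:
    """Jackknife (leave-one-out) validation: k-fold with k equal to the sample count."""
    return cross_validate(X, y, len(y), fit)


# Made-up toy data: 20 samples, 1 = binding residue, 0 = non-binding residue.
X = list(range(20))
y = [0, 0, 0, 1] * 5

results = {
    "self-consistency": self_consistency(X, y, majority_fit),
    "5-fold": cross_validate(X, y, 5, majority_fit),
    "10-fold": cross_validate(X, y, 10, majority_fit),
    "jackknife": jackknife(X, y, majority_fit),
}
```

Independent dataset testing, the fifth figure in the abstract, would instead fit once on the full training set and score on a separate set that was never used for training or model selection.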