InpactorDB: A Classified Lineage-Level Plant LTR Retrotransposon Reference Library for Free-Alignment Methods Based on Machine Learning.

  • Additional Information
    • Source:
      Publisher: MDPI Country of Publication: Switzerland NLM ID: 101551097 Publication Model: Electronic Cited Medium: Internet ISSN: 2073-4425 (Electronic) Linking ISSN: 20734425 NLM ISO Abbreviation: Genes (Basel) Subsets: MEDLINE
    • Publication Information:
      Original Publication: Basel : MDPI
    • Subject Terms:
    • Abstract:
      Long terminal repeat (LTR) retrotransposons are mobile elements that constitute the major fraction of most plant genomes. The identification and annotation of these elements via bioinformatics approaches represent a major challenge in the era of massive plant genome sequencing. In addition to their involvement in genome size variation, LTR retrotransposons are associated with the function and structure of different chromosomal regions and can alter the function of coding regions, among other effects. Several sequence databases of plant LTR retrotransposons are available, either publicly (such as PGSB and RepetDB) or under restricted access (such as Repbase). Although these databases are useful for identifying LTR retrotransposons (LTR-RTs) in new genomes by similarity, their elements are not fully classified to the lineage (also called family) level. Here, we present InpactorDB, a semi-curated dataset composed of 130,439 elements from 195 plant genomes (belonging to 108 plant species) classified to the lineage level. This dataset has been used to train two deep neural networks (one fully connected and one convolutional) for the rapid classification of these elements. In lineage-level classification, we obtain up to 98% performance, as measured by F1-score, precision, and recall.
    • Contributed Indexing:
      Keywords: InpactorDB; LTR retrotransposons; bioinformatics; deep neural networks; genomics; machine learning; plant genomes
    • Accession Number:
      0 (Retroelements)
    • Publication Date:
      Date Created: 20210202 Date Completed: 20210726 Latest Revision: 20211217
    • Publication Date:
      20231215
    • PubMed Central ID:
      PMC7910972
    • DOI:
      10.3390/genes12020190
    • PMID:
      33525408
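
The abstract reports lineage-level classification performance as F1-score, precision and recall. The following is a minimal sketch of how such per-lineage scores can be computed from true and predicted labels. It is not the authors' evaluation code: the lineage labels "A", "B" and "C", the toy predictions, and the helper names `per_class_scores` and `macro_average` are hypothetical, and macro-averaging is an assumption, since the abstract does not say how the scores were aggregated.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for multi-class predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted lineage gains a false positive
            fn[truth] += 1  # true lineage gains a false negative

    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_average(scores):
    """Unweighted mean of (precision, recall, f1) over all labels."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))


# Hypothetical lineage labels and predictions, for illustration only.
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "B", "C", "A"]

scores = per_class_scores(y_true, y_pred)
macro = macro_average(scores)

for label, (p, r, f) in scores.items():
    print(f"{label}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")
print("macro: precision={:.3f} recall={:.3f} f1={:.3f}".format(*macro))
```

Macro-averaging weights every lineage equally, which matters when some lineages have far fewer elements than others. A support-weighted average would instead be dominated by the most abundant lineages.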