Single-fly genome assemblies fill major phylogenomic gaps across the Drosophilidae Tree of Life.
- Additional Information
- Source:
Publisher: Public Library of Science Country of Publication: United States NLM ID: 101183755 Publication Model: eCollection Cited Medium: Internet ISSN: 1545-7885 (Electronic) Linking ISSN: 15449173 NLM ISO Abbreviation: PLoS Biol Subsets: MEDLINE
- Publication Information:
Original Publication: San Francisco, CA : Public Library of Science, [2003]-
- Abstract:
Long-read sequencing is driving rapid progress in genome assembly across all major groups of life, including species of the family Drosophilidae, a longtime model system for genetics, genomics, and evolution. We previously developed a cost-effective hybrid Oxford Nanopore (ONT) long-read and Illumina short-read sequencing approach and used it to assemble 101 drosophilid genomes from laboratory cultures, greatly increasing the number of genome assemblies for this taxonomic group. The next major challenge is to address the laboratory culture bias in taxon sampling by sequencing genomes of species that cannot easily be reared in the lab. Here, we build upon our previous methods to perform amplification-free ONT sequencing of single wild flies obtained either directly from the field or from ethanol-preserved specimens in museum collections, greatly improving the representation of lesser-studied drosophilid taxa in whole-genome data. Using Illumina NovaSeq X Plus and ONT P2 sequencers with R10.4.1 chemistry, we set a new benchmark for inexpensive hybrid genome assembly at US $150 per genome while assembling genomes from as little as 35 ng of genomic DNA from a single fly. We present 183 new genome assemblies for 179 species as a resource for drosophilid systematics, phylogenetics, and comparative genomics. Of these genomes, 62 are from pooled lab strains and 121 from single adult flies. Despite the sample limitations of working with small insects, most single-fly diploid assemblies are comparable in contiguity (>1 Mb contig N50), completeness (>98% complete dipteran BUSCOs), and accuracy (>QV40 genome-wide with ONT R10.4.1) to assemblies from inbred lines. We present a well-resolved multi-locus phylogeny for 360 drosophilid and 4 outgroup species encompassing all publicly available (as of August 2023) genomes for this group. Finally, we present a Progressive Cactus whole-genome, reference-free alignment built from a subset of 298 suitably high-quality drosophilid genomes. The new assemblies and alignment, along with updated laboratory protocols and computational pipelines, are released as an open resource and as a tool for studying evolution at the scale of an entire insect family.
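For readers unfamiliar with the quality thresholds quoted in the abstract, the following is a minimal sketch of how a contig N50 and a Phred-scaled quality value (QV) are commonly defined. It is not taken from the authors' pipelines: the function names `contig_n50` and `qv_to_error_rate` and the toy contig lengths are illustrative assumptions, and the code uses only the standard definitions of the two metrics.

```python
import math

def contig_n50(lengths):
    """Return the contig N50 for a list of contig lengths.

    Standard definition: sort contigs from longest to shortest and
    return the length of the contig at which the running total first
    reaches at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length

def qv_to_error_rate(qv):
    """Convert a Phred-scaled QV to an estimated per-base error rate."""
    return 10 ** (-qv / 10)

# Toy contig lengths in Mb (hypothetical values, not from the paper).
toy_lengths_mb = [5, 4, 3, 2, 1]
print("contig N50 (Mb):", contig_n50(toy_lengths_mb))
print("error rate at QV40:", qv_to_error_rate(40))
```

Under these definitions, a QV of 40 corresponds to roughly one error per 10,000 bases, and an N50 above 1 Mb means that at least half of the assembled sequence lies in contigs longer than 1 Mb.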
Competing Interests: The authors have declared that no competing interests exist.
(Copyright: © 2024 Kim et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
- Comments:
Update of: bioRxiv. 2023 Oct 02:2023.10.02.560517. doi: 10.1101/2023.10.02.560517. (PMID: 37873137)
- Grant Information:
R35 GM122592 United States GM NIGMS NIH HHS; F32 GM135998 United States GM NIGMS NIH HHS; R35 GM148244 United States GM NIGMS NIH HHS; T32 HG000044 United States HG NHGRI NIH HHS; R35 GM118165 United States GM NIGMS NIH HHS; R35 GM137834 United States GM NIGMS NIH HHS; K99 GM137041 United States GM NIGMS NIH HHS
- Publication Date:
Date Created: 20240718 Date Completed: 20240718 Latest Revision: 20240725
- Publication Date:
20240726
- Accession Number:
PMC11257246
- Accession Number:
10.1371/journal.pbio.3002697
- Accession Number:
39024225