Strainy: phasing and assembly of strain haplotypes from long-read metagenome sequencing.

  • Additional Information
    • Source:
      Publisher: Nature Pub. Group; Country of Publication: United States; NLM ID: 101215604; Publication Model: Print-Electronic; Cited Medium: Internet; ISSN: 1548-7105 (Electronic); Linking ISSN: 1548-7091; NLM ISO Abbreviation: Nat Methods; Subsets: MEDLINE
    • Publication Information:
      Original Publication: New York, NY : Nature Pub. Group, c2004-
    • Abstract:
      Bacterial species in microbial communities are often represented by mixtures of strains, distinguished by small variations in their genomes. Short-read approaches can be used to detect small-scale variation between strains but fail to phase these variants into contiguous haplotypes. Long-read metagenome assemblers can generate contiguous bacterial chromosomes but often suppress strain-level variation in favor of species-level consensus. Here we present Strainy, an algorithm for strain-level metagenome assembly and phasing from Nanopore and PacBio reads. Strainy takes a de novo metagenomic assembly as input and identifies strain variants, which are then phased and assembled into contiguous haplotypes. Using simulated and mock Nanopore and PacBio metagenome data, we show that Strainy assembles accurate and complete strain haplotypes, outperforming current Nanopore-based methods and comparable with PacBio-based algorithms in completeness and accuracy. We then use Strainy to assemble strain haplotypes of a complex environmental metagenome, revealing distinct strain distribution and mutational patterns in bacterial species.
      (© 2024. This is a U.S. Government work and not under copyright protection in the US; foreign copyright protection may apply.)
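The abstract describes phasing only at a high level: strain variants are identified on an existing assembly, then reads are phased into haplotypes. As a rough illustration of that idea, and not Strainy's actual implementation, the toy sketch below groups reads that agree at every variant position they share and takes a majority-vote consensus per group. The read representation (a dict from variant position to allele), the function names, and the `min_shared` threshold are all assumptions made for this example.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cluster_reads(reads, min_shared=2):
    """Group reads that agree at every shared variant position.

    Toy stand-in for read phasing: reads are linked when they cover at
    least `min_shared` common variant positions with identical alleles,
    and linked reads are merged with a small union-find structure.
    """
    parent = list(range(len(reads)))

    def find(i):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in combinations(range(len(reads)), 2):
        shared = reads[a].keys() & reads[b].keys()
        if len(shared) >= min_shared and all(
            reads[a][pos] == reads[b][pos] for pos in shared
        ):
            parent[find(a)] = find(b)

    clusters = defaultdict(list)
    for i in range(len(reads)):
        clusters[find(i)].append(i)
    return sorted(clusters.values())


def consensus(reads, cluster):
    """Majority-vote allele at each variant position covered by a cluster."""
    votes = defaultdict(Counter)
    for i in cluster:
        for pos, allele in reads[i].items():
            votes[pos][allele] += 1
    return {pos: votes[pos].most_common(1)[0][0] for pos in sorted(votes)}


# Hypothetical reads from two strains, keyed by variant position (made-up data).
reads = [
    {100: "A", 250: "C", 400: "G"},
    {250: "C", 400: "G", 550: "T"},
    {100: "T", 250: "G", 400: "A"},
    {250: "G", 400: "A", 550: "C"},
]
clusters = cluster_reads(reads)
haplotypes = [consensus(reads, c) for c in clusters]
```

Here the first two reads and the last two reads form separate clusters, and each cluster's consensus spans all four variant positions. Real long reads carry sequencing errors, so a practical method would need to tolerate mismatches rather than require perfect agreement; this sketch ignores that for brevity.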
    • Grant Information:
      R01 AI100947 United States AI NIAID NIH HHS; Intramural Research Program of the Center for Cancer Research U.S. Department of Health & Human Services | NIH | National Cancer Institute (NCI)
    • Publication Date:
      Date Created: 20240926; Date Completed: 20241107; Latest Revision: 20241107
    • Publication Date:
      20241112
    • Accession Number:
      10.1038/s41592-024-02424-1
    • Accession Number:
      39327484