
NextDenovo

NextDenovo is a string graph-based de novo assembler for long reads (CLR, HiFi and ONT). It uses a “correct-then-assemble” strategy similar to canu (the correction step is skipped for PacBio HiFi reads), but requires significantly less computing resources and storage. After assembly, the per-base accuracy is about 98-99.8%; to further improve single-base accuracy, try NextPolish.

NextDenovo contains two core modules: NextCorrect and NextGraph. NextCorrect corrects noisy long reads with a sequencing error rate of approximately 15%, and NextGraph constructs a string graph from the corrected reads. NextDenovo also contains a modified version of minimap2 and some useful utilities (see utilities for more details).

We benchmarked NextDenovo against other assemblers using Oxford Nanopore long reads from human and Drosophila melanogaster, and PacBio continuous long reads (CLR) from Arabidopsis thaliana. NextDenovo produces more contiguous assemblies with fewer contigs than the other tools. It also shows a high level of assembly accuracy in terms of both assembly consistency and single-base accuracy.

Installation

  • REQUIREMENT

  • INSTALL

    Download the latest release from the GitHub releases page, or use the following command:

    wget https://github.com/Nextomics/NextDenovo/releases/latest/download/NextDenovo.tgz
    tar -vxzf NextDenovo.tgz && cd NextDenovo
    

    To compile from source instead, run:

    git clone git@github.com:Nextomics/NextDenovo.git
    cd NextDenovo && make
    
  • TEST

    nextDenovo test_data/run.cfg
    

Quick Start

  1. Prepare input.fofn

    ls reads1.fasta reads2.fastq reads3.fasta.gz reads4.fastq.gz ... > input.fofn
    
  2. Create run.cfg

    cp doc/run.cfg ./
    

    Note

    Please set read_type and genome_size, and refer to doc/FAQ and doc/OPTION to optimize parallel computing parameters.

  3. Run

    nextDenovo run.cfg
    
  4. Result

    • Sequence: 01_rundir/03.ctg_graph/nd.asm.fasta

    • Statistics: 01_rundir/03.ctg_graph/nd.asm.fasta.stat
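
Steps 1 and 2 above can also be scripted. The following is a minimal sketch, not part of NextDenovo: `make_fofn` and `set_cfg_key` are hypothetical helper names, and the values `ont` and `3g` are placeholders that assume `run.cfg` uses plain `key = value` lines. Check doc/OPTION for the values your data actually needs.

```shell
#!/bin/sh
# Hypothetical helpers for preparing a run; not shipped with NextDenovo.

# make_fofn OUT FILE...
# Write one absolute path per line to OUT, failing on missing or empty files.
make_fofn() {
    out=$1; shift
    : > "$out"
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty read file: $f" >&2
            return 1
        fi
        # Record an absolute path so the list stays valid from any directory.
        printf '%s/%s\n' "$(cd "$(dirname "$f")" && pwd)" "$(basename "$f")" >> "$out"
    done
}

# set_cfg_key FILE KEY VALUE
# Rewrite the line "KEY = ..." in FILE as "KEY = VALUE".
# Any trailing comment on that line is dropped.
set_cfg_key() {
    cfg=$1
    tmp=$(mktemp) || return 1
    awk -v k="$2" -v v="$3" '
        $0 ~ "^" k "[ \t]*=" { print k " = " v; next }
        { print }
    ' "$cfg" > "$tmp" && cat "$tmp" > "$cfg"
    rm -f "$tmp"
}

# Example (placeholder file names and values):
#   make_fofn input.fofn reads1.fasta reads2.fastq.gz
#   cp doc/run.cfg ./
#   set_cfg_key run.cfg read_type ont
#   set_cfg_key run.cfg genome_size 3g
#   nextDenovo run.cfg
```

Checking the inputs before writing `input.fofn` turns a typo in a file name into an immediate error instead of a failure partway through a run.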

Getting Help

  • HELP

    Feel free to raise an issue at the issue page.

    Important

    Please ask questions on the issue page first. They are also helpful to other users.

  • CONTACT

    For additional help, please send an email to huj_at_grandomics_dot_com.

Cite

Hu, J. et al. An efficient error correction and accurate assembly tool for noisy long reads. bioRxiv 2023.03.09.531669 (2023) doi:10.1101/2023.03.09.531669.

Limitations

  1. NextDenovo is optimized for assembly with seed_cutoff >= 10kb. This is rarely a problem, because it only requires that the longest 30x-45x of seeds be >= 10kb in length. With shorter seeds, it may produce unexpected results for some complex genomes, so check the assembly quality carefully.
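
To judge whether a data set meets this condition, one rough approach is to walk the reads from longest to shortest and note the read length at which the cumulative bases reach the target depth. The sketch below illustrates that arithmetic only; `seed_cutoff_estimate` is a hypothetical name, the input is assumed to be a file with one read length (in bp) per line, and NextDenovo's own utilities may compute the cutoff differently.

```shell
#!/bin/sh
# seed_cutoff_estimate GENOME_SIZE_BP DEPTH < lengths.txt
# Reads one read length (bp) per line on stdin, sorts the reads from longest
# to shortest, and prints the length of the read at which the cumulative
# bases first reach DEPTH x GENOME_SIZE_BP.
# Prints 0 if the data set never reaches that depth.
seed_cutoff_estimate() {
    sort -rn | awk -v g="$1" -v d="$2" '
        BEGIN { target = g * d }
        {
            total += $1
            if (!found && total >= target) { cutoff = $1; found = 1 }
        }
        END { print (found ? cutoff : 0) }
    '
}

# Toy example: six reads and a 2,000 bp "genome" at 45x (target 90,000 bp).
#   printf '%s\n' 50000 30000 20000 12000 8000 5000 | seed_cutoff_estimate 2000 45
#   prints 20000, so the longest 45x of reads are all >= 10kb
```

If the printed value is below 10000, the longest reads at that depth include reads shorter than 10kb, which is the case this limitation warns about.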

Star

You can track updates by clicking the Star button in the upper-right corner of the GitHub page.