NextDenovo

NextDenovo is a string graph-based de novo assembler for long reads (CLR, HiFi and ONT). It uses a “correct-then-assemble” strategy similar to Canu (with no correction step for PacBio HiFi reads), but requires significantly fewer computing resources and less storage. After assembly, the per-base accuracy is about 98-99.8%. To further improve single-base accuracy, try NextPolish.

NextDenovo contains two core modules: NextCorrect and NextGraph. NextCorrect can be used to correct long noisy reads with approximately 15% sequencing errors, and NextGraph can be used to construct a string graph with corrected reads. It also contains a modified version of minimap2 and some useful utilities (see utilities for more details).

We benchmarked NextDenovo against other assemblers using Oxford Nanopore long reads from human and Drosophila melanogaster, and PacBio continuous long reads (CLR) from Arabidopsis thaliana. NextDenovo produces more contiguous assemblies with fewer contigs than the other tools. It also shows a high level of accuracy in terms of assembly consistency and single-base accuracy.

Installation

REQUIREMENT
- Python (both Python 2 and 3 are supported):
  - Paralleltask
```
pip install paralleltask
```

INSTALL

Download the latest release from the GitHub releases page, or use the following commands:

```
wget https://github.com/Nextomics/NextDenovo/releases/latest/download/NextDenovo.tgz
tar -vxzf NextDenovo.tgz && cd NextDenovo
```

If you want to compile from the source, run:

```
git clone git@github.com:Nextomics/NextDenovo.git
cd NextDenovo && make
```

TEST
```
nextDenovo test_data/run.cfg
```

Quick Start

Prepare input.fofn

```
ls reads1.fasta reads2.fastq reads3.fasta.gz reads4.fastq.gz ... > input.fofn
```
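
Before starting a run, it can help to confirm that every path listed in input.fofn points to an existing, non-empty file. The following is a minimal sketch, not part of NextDenovo: `check_fofn` is a hypothetical helper name, and it assumes the file lists one path per line.

```shell
# Hypothetical helper (not part of NextDenovo): report entries in a
# file-of-filenames that are missing or empty.
# Assumes one path per line; blank lines are skipped.
check_fofn() {
    fofn="$1"
    bad=0
    while IFS= read -r path || [ -n "$path" ]; do
        [ -z "$path" ] && continue
        if [ ! -s "$path" ]; then
            echo "missing or empty: $path" >&2
            bad=$((bad + 1))
        fi
    done < "$fofn"
    echo "$bad"          # number of problematic entries
    [ "$bad" -eq 0 ]     # exit status is non-zero if any entry failed
}

# Only run the check if input.fofn exists in the current directory.
if [ -f input.fofn ]; then
    check_fofn input.fofn
fi
```

The function prints the number of problematic entries and returns a non-zero status if there are any, so it can gate a later command in a script.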

Create run.cfg
```
cp doc/run.cfg ./
```
Note: Please set read_type and genome_size, and refer to doc/FAQ and doc/OPTION to optimize parallel computing parameters.

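
The two options can be edited by hand, or set from the shell as sketched below. This is an illustration under assumptions: `set_cfg_key` is a hypothetical helper, the copied run.cfg is assumed to hold each option on its own line in `key = value` form, and the values `ont` and `3g` are placeholders (check doc/OPTION for the accepted forms).

```shell
# Hypothetical helper (not part of NextDenovo): replace the value of a
# "key = value" line in a config file. Assumes one option per line.
# Any trailing comment on the matched line is dropped, and the value must
# not contain "|" or "&" (both are special in the sed expression).
set_cfg_key() {
    cfg="$1"; key="$2"; value="$3"
    tmp="$(mktemp)" || return 1
    sed "s|^${key}[[:space:]]*=.*|${key} = ${value}|" "$cfg" > "$tmp" \
        && cat "$tmp" > "$cfg"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Only touch run.cfg if it exists in the current directory.
if [ -f run.cfg ]; then
    set_cfg_key run.cfg read_type ont     # placeholder: match your read type
    set_cfg_key run.cfg genome_size 3g    # placeholder: your estimated genome size
    grep -E '^(read_type|genome_size)' run.cfg
fi
```

The final `grep` prints the two lines so the change can be reviewed before the run is started.
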
Run
```
nextDenovo run.cfg
```
Result
- Sequence: 01_rundir/03.ctg_graph/nd.asm.fasta
- Statistics: 01_rundir/03.ctg_graph/nd.asm.fasta.stat
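
As a quick sanity check on the result, the number of contigs can be counted from the FASTA headers and compared with the statistics file. This is a minimal sketch; `count_contigs` is a hypothetical helper name, and it relies only on the FASTA convention that every record starts with a `>` line.

```shell
# Hypothetical helper: count FASTA records by counting header lines.
count_contigs() {
    grep -c '^>' "$1"
}

asm=01_rundir/03.ctg_graph/nd.asm.fasta

# Only report if the assembly exists and is non-empty.
if [ -s "$asm" ]; then
    echo "contigs: $(count_contigs "$asm")"
    # Show the statistics file alongside, if it was written.
    if [ -f "$asm.stat" ]; then
        cat "$asm.stat"
    fi
fi
```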

Getting Help

HELP

Feel free to raise an issue at the issue page.

Important: Please ask questions on the issue page first. They are also helpful to other users.

CONTACT

For additional help, please send an email to huj_at_grandomics_dot_com.

Cite

Hu J, Wang Z, Sun Z, et al. NextDenovo: an efficient error correction and accurate assembly tool for noisy long reads[J]. Genome Biology, 2024, 25(1): 1-19.

Limitations

NextDenovo is optimized for assembly with seed_cutoff >= 10kb. This should not be a big problem, because it only requires that the longest 30x-45x of seeds have a length >= 10kb. With shorter seeds, it may produce unexpected results for some complex genomes, so check the assembly quality carefully.
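
One rough way to judge whether a dataset meets this condition is to sum the bases in reads of at least 10 kb and divide by the estimated genome size. The sketch below is an approximation for orientation only and is not how NextDenovo selects seeds: `long_read_depth` is a hypothetical helper, it handles uncompressed FASTA only, and the genome size in the example call is a placeholder.

```shell
# Hypothetical helper (not part of NextDenovo): depth contributed by reads
# whose length is >= min_len, i.e. (sum of their bases) / genome_size.
# Works on uncompressed FASTA, including multi-line records.
long_read_depth() {
    fasta="$1"; min_len="$2"; genome_size="$3"
    awk -v min="$min_len" -v gs="$genome_size" '
        /^>/ { if (len >= min) total += len; len = 0; next }  # close previous record
             { len += length($0) }                            # accumulate sequence length
        END  { if (len >= min) total += len; printf "%.1f\n", total / gs }
    ' "$fasta"
}

# Example call; 120000000 (120 Mb) is a placeholder genome size.
if [ -f reads1.fasta ]; then
    long_read_depth reads1.fasta 10000 120000000
fi
```

A result of roughly 30 or more suggests that enough long reads are available; a much lower value suggests the limitation described above may apply.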

Star

You can track updates by clicking the Star button in the upper-right corner of the GitHub page.