NextDenovo
NextDenovo is a string graph-based de novo assembler for long reads (CLR, HiFi and ONT). It uses a "correct-then-assemble" strategy similar to canu (with no correction step for PacBio HiFi reads), but requires substantially fewer computing resources and less storage. After assembly, the per-base accuracy is about 98-99.8%. To further improve single-base accuracy, try NextPolish.
NextDenovo contains two core modules: NextCorrect and NextGraph. NextCorrect can be used to correct long noisy reads with approximately 15% sequencing errors, and NextGraph can be used to construct a string graph with corrected reads. It also contains a modified version of minimap2 and some useful utilities (see utilities for more details).
We benchmarked NextDenovo against other assemblers using Oxford Nanopore long reads from human and Drosophila melanogaster, and PacBio continuous long reads (CLR) from Arabidopsis thaliana. NextDenovo produces more contiguous assemblies with fewer contigs than the other tools. It also achieves high accuracy in terms of both assembly consistency and single-base accuracy.
Installation
REQUIREMENT
Python (supports Python 2 and 3):
pip install paralleltask
INSTALL
Download the latest release from the GitHub releases page, or use the following commands:
wget https://github.com/Nextomics/NextDenovo/releases/latest/download/NextDenovo.tgz
tar -vxzf NextDenovo.tgz && cd NextDenovo
If you want to compile from the source, run:
git clone git@github.com:Nextomics/NextDenovo.git
cd NextDenovo && make
TEST
nextDenovo test_data/run.cfg
Quick Start
Prepare input.fofn
ls reads1.fasta reads2.fastq reads3.fasta.gz reads4.fastq.gz ... > input.fofn
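input.fofn is a "file of file names": one read file per line. The following is a minimal Python sketch that builds it from a directory instead of typing the names by hand. The function name write_fofn is hypothetical, and writing absolute paths is our own precaution (so the list stays valid if jobs run from another directory), not something the original instructions require.

```python
from pathlib import Path

# Read-file suffixes taken from the ls example above (FASTA/FASTQ, optionally gzipped).
READ_SUFFIXES = (".fasta", ".fastq", ".fasta.gz", ".fastq.gz")

def write_fofn(read_dir, fofn_path="input.fofn"):
    """Hypothetical helper: write one absolute read-file path per line; return the list."""
    paths = sorted(
        str(p.resolve())
        for p in Path(read_dir).iterdir()
        if p.is_file() and p.name.endswith(READ_SUFFIXES)
    )
    # Each path goes on its own line, terminated by a newline.
    Path(fofn_path).write_text("".join(path + "\n" for path in paths))
    return paths
```

For example, write_fofn("raw_reads") would list every matching file in a (hypothetical) raw_reads directory into ./input.fofn.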
Create run.cfg
cp doc/run.cfg ./
Note
Please set read_type and genome_size in run.cfg, and refer to doc/FAQ and doc/OPTION to tune the parallel computing parameters.
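Before launching a long run, it can help to confirm that the two required settings are filled in. The sketch below uses Python's standard configparser. It assumes read_type lives in the [General] section and genome_size in [correct_option], as in the doc/run.cfg template we have seen, and that read_type is one of clr, ont or hifi (the read types named at the top of this README). Check these assumptions against your own copy of doc/run.cfg; check_run_cfg is a hypothetical helper, not part of NextDenovo.

```python
import configparser

# Assumed section for each required key; adjust if your run.cfg template differs.
EXPECTED = {"read_type": "General", "genome_size": "correct_option"}
# Assumed read_type values, based on the read types listed in the introduction.
VALID_READ_TYPES = {"clr", "ont", "hifi"}

def check_run_cfg(path):
    """Return a list of problems found in run.cfg (an empty list means the checks passed)."""
    # Strip trailing "# ..." comments; disable %-interpolation so option values are read literally.
    cfg = configparser.ConfigParser(inline_comment_prefixes=("#",), interpolation=None)
    cfg.read(path)
    problems = []
    for key, section in EXPECTED.items():
        # fallback="" also covers a missing section.
        if not cfg.get(section, key, fallback="").strip():
            problems.append(f"[{section}] {key} is not set")
    read_type = cfg.get("General", "read_type", fallback="").strip().lower()
    if read_type and read_type not in VALID_READ_TYPES:
        problems.append(f"unexpected read_type {read_type!r}")
    return problems
```

Running check_run_cfg("run.cfg") and printing the returned list gives a quick pre-flight report; it does not validate the parallel computing parameters.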
Run
nextDenovo run.cfg
Result
Sequence:
01_rundir/03.ctg_graph/nd.asm.fasta
Statistics:
01_rundir/03.ctg_graph/nd.asm.fasta.stat
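NextDenovo already writes its own statistics to nd.asm.fasta.stat. If you want to recompute basic contiguity figures yourself (for example, to compare against another assembler with the same definition), here is a minimal sketch that reads contig lengths from an uncompressed FASTA file and computes N50, i.e. the length of the shortest contig among the longest contigs that together cover at least half of the total assembly length. It is an independent illustration, not the code that produces the .stat file, and its numbers may differ if that file uses other definitions or filters.

```python
def read_fasta_lengths(path):
    """Return the length of every sequence in an uncompressed FASTA file."""
    lengths, current = [], None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # New record: store the length of the previous one, if any.
                if current is not None:
                    lengths.append(current)
                current = 0
            elif current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths

def n50(lengths):
    """Walk contigs from longest to shortest; return the length where half the total is reached."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input
```

Usage: lengths = read_fasta_lengths("01_rundir/03.ctg_graph/nd.asm.fasta"), then print(len(lengths), sum(lengths), n50(lengths)) for contig count, total length and N50.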
Getting Help
HELP
Feel free to raise an issue at the issue page.
Important
Please ask questions on the issue page first; the answers may also help other users.
CONTACT
For additional help, please send an email to huj_at_grandomics_dot_com.
Cite
Hu J, Wang Z, Sun Z, et al. NextDenovo: an efficient error correction and accurate assembly tool for noisy long reads[J]. Genome Biology, 2024, 25(1): 1-19.
Limitations
NextDenovo is optimized for assembly with seed_cutoff >= 10kb. This is rarely a problem in practice, because it only requires the longest 30x-45x of reads (the seeds) to be at least 10kb long. With shorter seeds, NextDenovo may produce unexpected results for some complex genomes, so check the assembly quality carefully.
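To get a rough idea of whether a dataset meets this requirement, you can sort the read lengths from longest to shortest, keep adding reads until they reach the chosen seed depth times the genome size, and look at the length of the last read added. The sketch below does exactly that; it is a simplified illustration of the "longest 30x-45x" idea, not NextDenovo's own seed-selection code, and seed_length_cutoff is a hypothetical name. The default depth of 40 is just a value inside the 30x-45x range mentioned above.

```python
def seed_length_cutoff(read_lengths, genome_size, seed_depth=40):
    """Length of the shortest read among the longest reads totalling seed_depth x genome_size.

    Returns None when all reads together provide less than seed_depth x coverage.
    """
    target = seed_depth * genome_size
    covered = 0
    for length in sorted(read_lengths, reverse=True):
        covered += length
        if covered >= target:
            return length
    return None

def meets_seed_requirement(read_lengths, genome_size, seed_depth=40, min_length=10_000):
    """True if the longest seed_depth x of reads are all at least min_length bases long."""
    cutoff = seed_length_cutoff(read_lengths, genome_size, seed_depth)
    return cutoff is not None and cutoff >= min_length
```

For example, 2,000 reads of 20kb for a 1Mb genome give a 30x cutoff of 20,000 bases, which satisfies the 10kb requirement.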
Star
You can track updates by clicking the Star button in the upper-right corner of the GitHub page.