NextDenovo Parameter Reference
NextDenovo requires at least one read file (option: input_fofn) as input. It accepts FASTA and FASTQ files, optionally gzip-compressed, and takes its options from a config file.
Input
input_fofn (one file per line)

    ls reads1.fasta reads2.fastq reads3.fasta.gz reads4.fastq.gz ... > input.fofn
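When the read files sit in one directory, the list can also be generated programmatically. The following is a minimal sketch, not part of NextDenovo: the function name `write_fofn`, the set of recognized extensions, and the choice to write absolute paths are all assumptions made for illustration.

```python
from pathlib import Path

# Extensions treated as read files in this sketch (an assumption, adjust as needed).
READ_SUFFIXES = (".fasta", ".fastq", ".fa", ".fq",
                 ".fasta.gz", ".fastq.gz", ".fa.gz", ".fq.gz")

def write_fofn(read_dir, fofn_path="input.fofn"):
    """Write one absolute read-file path per line and return the paths written."""
    reads = sorted(
        p.resolve()
        for p in Path(read_dir).iterdir()
        if p.is_file() and p.name.endswith(READ_SUFFIXES)
    )
    # One file per line, as input_fofn expects.
    Path(fofn_path).write_text("".join(f"{p}\n" for p in reads))
    return reads
```

Absolute paths are used here so that the list stays valid regardless of the directory the pipeline is started from.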
config file
A config file is a text file of key=value pairs that set the runtime parameters for NextDenovo. The following is a typical config file, which is also located in doc/run.cfg.

    [General]
    job_type = local
    job_prefix = nextDenovo
    task = all
    rewrite = yes
    deltmp = yes
    parallel_jobs = 20
    input_type = raw
    read_type = clr # clr, ont, hifi
    input_fofn = input.fofn
    workdir = 01_rundir

    [correct_option]
    read_cutoff = 1k
    genome_size = 1g # estimated genome size
    sort_options = -m 20g -t 15
    minimap2_options_raw = -t 8
    pa_correction = 3
    correction_options = -p 15

    [assemble_option]
    minimap2_options_cns = -t 8
    nextgraph_options = -a 1
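Because the config file uses an INI-like layout, it can be sanity-checked before a run is submitted. The sketch below uses Python's standard `configparser`; it is not how NextDenovo itself reads the file, and the helper name `load_general` is hypothetical. It only checks the two options this reference marks as required (input_fofn and read_type).

```python
import configparser

REQUIRED = ("input_fofn", "read_type")   # options marked "(required)" in this reference
READ_TYPES = {"clr", "hifi", "ont"}

def load_general(cfg_text):
    """Parse config text and return the [General] section as a dict after basic checks."""
    # inline_comment_prefixes strips trailing "# ..." comments such as "clr # clr, ont, hifi".
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.read_string(cfg_text)
    general = dict(parser["General"])
    missing = [key for key in REQUIRED if key not in general]
    if missing:
        raise ValueError(f"missing required option(s): {', '.join(missing)}")
    if general["read_type"] not in READ_TYPES:
        raise ValueError(f"unknown read_type: {general['read_type']}")
    return general
```

For a file on disk, pass its contents, for example `load_general(open("run.cfg").read())`.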
Output
workdir/03.ctg_graph/nd.asm.fasta
Contigs in FASTA format. The FASTA header includes ID, type, length, and node count. A run of consecutive lowercase bases in the sequence indicates a weak connection, and a single lowercase base marks a low-quality base.
workdir/03.ctg_graph/nd.asm.fasta.stat
Basic assembly statistics (N10-N90, total size, etc.).
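The lowercase convention in nd.asm.fasta can be scanned for with a regular expression. This is a minimal sketch under the reading given above (a single lowercase base is a low-quality base, a longer lowercase run is a weak connection); the function name and the label strings are hypothetical.

```python
import re

def lowercase_regions(seq):
    """Return (start, end, kind) for each lowercase run; end is exclusive."""
    regions = []
    for match in re.finditer(r"[a-z]+", seq):
        length = match.end() - match.start()
        kind = "low_quality_base" if length == 1 else "weak_connection"
        regions.append((match.start(), match.end(), kind))
    return regions
```

For example, `lowercase_regions("ACGTaCGTacgtAC")` reports one low-quality base at position 4 and one weak connection spanning positions 8 to 12 (0-based, end exclusive).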
Options
Global options
job_type = sge
local, sge, pbs, lsf, slurm… (default: sge)
job_prefix = nextDenovo
prefix tag for jobs. (default: nextDenovo)
task = <all, correct, assemble>
task to run: correct = run only the correction step; assemble = run only the assembly step (only works if input_type = corrected or read_type = hifi); all = correct + assemble. (default: all)
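The constraint on task = assemble can be expressed as a small pre-flight check. This is an illustrative sketch only; `check_task` is a hypothetical helper, not a NextDenovo function.

```python
def check_task(task, input_type="raw", read_type="clr"):
    """Raise ValueError if the task cannot run with the given input settings."""
    if task not in ("all", "correct", "assemble"):
        raise ValueError(f"unknown task: {task}")
    # assemble alone needs reads that are already accurate enough to assemble.
    if task == "assemble" and not (input_type == "corrected" or read_type == "hifi"):
        raise ValueError(
            "task = assemble requires input_type = corrected or read_type = hifi"
        )
    return task
```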
rewrite = no
overwrite an existing directory [yes, no]. (default: no)
deltmp = yes
delete intermediate results. (default: yes)
rerun = 3
re-run unfinished jobs until they finish or rerun loops have been reached, 0 = no re-run. (default: 3)
parallel_jobs = 10
number of tasks to run in parallel. (default: 10)
input_type = raw
input reads type [raw, corrected]. (default: raw)
input_fofn = input.fofn
input file list, one file per line. (required)
read_type = {clr, hifi, ont}
reads type: clr = PacBio continuous long reads, hifi = PacBio highly accurate long reads, ont = Nanopore 1D reads. (required)
workdir = 01.workdir
work directory. (default: ./)
usetempdir = /tmp/test
temporary directory on compute nodes, used to avoid high IO wait. (default: None)
nodelist = avanode.list.fofn
a list of hostnames of available nodes, one node per line, used with usetempdir for non-sge job_type.
submit = auto
command to submit a job, auto = automatically set by Paralleltask.
kill = auto
command to kill a job, auto = automatically set by Paralleltask.
check_alive = auto
command to check a job status, auto = automatically set by Paralleltask.
job_id_regex = auto
regex used to parse the job ID from the output of submit, auto = automatically set by Paralleltask.
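To illustrate what a job-ID regex does, the sketch below extracts the ID from the messages that common schedulers print on submission. The patterns are illustrative assumptions based on typical qsub, sbatch, and bsub output; they are not the values Paralleltask selects for auto, and `parse_job_id` is a hypothetical helper.

```python
import re

# Illustrative patterns only; the values chosen by "auto" are not shown in this reference.
JOB_ID_PATTERNS = {
    "sge": r"Your job (\d+)",
    "slurm": r"Submitted batch job (\d+)",
    "lsf": r"Job <(\d+)>",
}

def parse_job_id(job_type, submit_output):
    """Return the job ID captured from the submit command's output, or None."""
    match = re.search(JOB_ID_PATTERNS[job_type], submit_output)
    return match.group(1) if match else None
```

A custom job_id_regex plays the same role: its capture group should isolate the ID that the kill and check_alive commands later need.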
use_drmaa = no
use drmaa to submit and control jobs.
Correction options
read_cutoff = 1k
filter out reads with length < read_cutoff. (default: 1k)
genome_size = 1g
estimated genome size, suffix K/M/G recognized, used to calculate seed_cutoff, seed_cutfiles, blocksize, and average depth. It can be omitted when seed_cutoff is set manually.
seed_depth = 45
expected seed depth, used together with genome_size to calculate seed_cutoff. Values of 30-45 may give a better assembly result. (default: 45)
seed_cutoff = 0
minimum seed length, <=0 means calculate it automatically using bin/seq_stat.
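One plausible way to derive a seed cutoff from genome_size and seed_depth is to take the longest reads until they add up to genome_size × seed_depth bases, and use the length of the last read taken as the cutoff. The sketch below implements that idea. It is an assumption about the general approach, not a description of what bin/seq_stat actually does; the function names are hypothetical, and treating K/M/G as decimal multipliers is also an assumption.

```python
def parse_size(text):
    """Convert '1k', '1g', '4.5M' or a plain number to an integer count."""
    units = {"k": 10**3, "m": 10**6, "g": 10**9}  # decimal multipliers (assumption)
    text = text.strip().lower()
    if text and text[-1] in units:
        return int(float(text[:-1]) * units[text[-1]])
    return int(float(text))

def estimate_seed_cutoff(read_lengths, genome_size, seed_depth=45, read_cutoff="1k"):
    """Largest length L such that reads with length >= L total at least
    genome_size * seed_depth bases; falls back to read_cutoff if data is short."""
    target = parse_size(genome_size) * seed_depth
    min_len = parse_size(read_cutoff)
    total = 0
    # Walk from the longest read down, ignoring reads below read_cutoff.
    for length in sorted((n for n in read_lengths if n >= min_len), reverse=True):
        total += length
        if total >= target:
            return length
    return min_len  # not enough bases to reach the target seed depth
```

Lowering seed_depth raises the cutoff under this scheme, so fewer but longer reads are used as seeds.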
seed_cutfiles = 5
split seed reads into seed_cutfiles subfiles. (default: pa_correction)
blocksize = 10g
block size for parallel running: non-seed reads are split into smaller files, each at most blocksize in size. (default: 10g)
pa_correction = 3
number of correction tasks to run in parallel; each correction task requires ~TOTAL_INPUT_BASES/4 bytes of memory. Overrides parallel_jobs for this step only. (default: 3)
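The memory rule for pa_correction can be turned into a quick estimate. This sketch simply applies the ~TOTAL_INPUT_BASES/4 figure stated above; the combined figure assumes that all correction tasks run on the same machine (for example with job_type = local), and the function name is hypothetical.

```python
def correction_memory_bytes(total_input_bases, pa_correction=3):
    """Return (bytes per correction task, bytes for all parallel tasks combined)."""
    per_task = total_input_bases // 4       # ~TOTAL_INPUT_BASES/4 bytes per task
    return per_task, per_task * pa_correction

# Example: 100 Gb of input reads with the default pa_correction = 3.
per_task, combined = correction_memory_bytes(100 * 10**9, pa_correction=3)
print(f"{per_task / 1e9:.0f} GB per task, {combined / 1e9:.0f} GB combined")
```

If the combined figure exceeds the memory available on one node, reducing pa_correction is the direct lever.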
minimap2_options_raw = -t 10
minimap2 options, used to find overlaps between raw reads, see minimap2-nd for details.
correction_options = -p 10
correction options, as follows:

    -p, --process, set the number of processes used for correcting. (default: 10)
    -b, --blacklist, disable the filter step and produce more corrected data.
    -s, --split, split the corrected seed with un-corrected regions. (default: False)
    -fast, 0.5-1 times faster mode with slightly lower accuracy. (default: False)
    -dbuf, disable caching 2bit files and reduce memory usage by ~TOTAL_INPUT_BASES/4 bytes. (default: False)
    -max_lq_length, maximum length of a continuous low quality region in a corrected seed; a larger max_lq_length produces more corrected data with lower accuracy. (default: auto [pb/1k, ont/10k])