IDBA is a practical iterative de Bruijn graph de novo assembler for sequence assembly in bioinformatics. Most assemblers based on the de Bruijn graph build the graph with one specific k to perform the assembly. For all of them, choosing that value of k is crucial. If k is too large, there will be many gaps in the graph. If k is too small, there will be many branches. Instead of one specific k, IDBA uses a range of k values to build an iterative de Bruijn graph, which keeps the information from the graphs at the different k values. It is therefore expected to perform better than assemblers that rely on a single k.
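The trade-off in k can be seen on a toy example. The sketch below is an illustration only, not IDBA code: the genome string, the three reads and the function names are made up for this example, nodes are taken to be (k-1)-mers, and reverse complements and sequencing errors are ignored. It counts branching nodes (more than one successor) and dead-end nodes (no successor) for several values of k.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer node to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def count_branches_and_dead_ends(reads, k):
    """Return (nodes with more than one successor, nodes with no successor)."""
    graph = build_de_bruijn(reads, k)
    nodes = set(graph) | {s for succ in graph.values() for s in succ}
    branches = sum(1 for n in nodes if len(graph.get(n, ())) > 1)
    dead_ends = sum(1 for n in nodes if not graph.get(n))
    return branches, dead_ends


# Hypothetical toy genome: "ACG" occurs twice, followed by different bases.
genome = "AACGTCCACGGATT"
# Three error-free reads of length 8; neighbouring reads overlap by 5 bases.
reads = [genome[s:s + 8] for s in (0, 3, 6)]

for k in (4, 5, 7):
    print(k, count_branches_and_dead_ends(reads, k))
# 4 (1, 1)  -> the repeat "ACG" creates a branch
# 5 (0, 1)  -> a single unbranched path ending at the genome's last node
# 7 (0, 3)  -> k exceeds what the read overlaps can cover, so the path breaks
```

In this toy case k = 5 happens to work, but on real data with errors and uneven coverage no single k is expected to avoid both problems everywhere, which is the motivation for using a range of k values.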
Some bug fixes. Read length is now stored in 16 bits. All IDBA assemblers will support read lengths up to 65535 by modifying kMaxShortSequence in src/sequence/short_sequence.h.
(Please note that IDBA is no longer maintained; we recommend using IDBA-UD instead, which generally performs better.)
IDBA, IDBA-UD, IDBA-Hybrid and IDBA-Tran all in one package (released Oct 18, 2012)
All IDBA (iterative de Bruijn graph assembler) series assemblers are
refined and included in this package. Plenty of errors are fixed and
scaffolding on multiple levels of paired-end reads is supported in
IDBA, IDBA-UD and IDBA-Hybrid.
The basic IDBA is included only for comparison.
If you are assembling genomic data without reference, please use IDBA-UD.
If you are assembling genomic data with a similar reference genome, please use IDBA-Hybrid.
If you are assembling transcriptome data, please use IDBA-Tran.
IDBA is an open source de novo assembler for next-generation short read sequences. It is fast, parallel and capable of assembling large-scale genomes such as the human genome.
Abstract: The de Bruijn graph assembly approach breaks reads into k-mers before assembling them into contigs. The string graph approach forms contigs by connecting two reads with k or more overlapping nucleotides. Both approaches face the problem of false-positive vertices from erroneous reads, missing vertices due to non-uniform coverage and branching due to erroneous reads and repeat regions. A proper choice of k is crucial but for any single k there is always a trade-off: a small k favors the situation of erroneous reads and non-uniform coverage, and a large k favors short repeat regions. We propose an iterative de Bruijn graph approach iterating from small to large k capturing merits of all values in between. With real and simulated data, our IDBA algorithm is superior to all existing algorithms by constructing longer contigs with similar accuracy and using less memory. The running time of IDBA is comparable with existing algorithms.
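The idea of iterating from small to large k can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the IDBA implementation: contigs are taken to be maximal non-branching paths, the contigs found at one k are simply added to the reads as extra input for the next k, and error removal, reverse complements and paired-end information are left out. The function names and the toy data are made up for this example.

```python
from collections import defaultdict


def build_graph(sequences, k):
    """Nodes are (k-1)-mers; every k-mer in a sequence adds one edge."""
    succ = defaultdict(set)
    pred = defaultdict(set)
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            a, b = seq[i:i + k - 1], seq[i + 1:i + k]
            succ[a].add(b)
            pred[b].add(a)
    return succ, pred


def extract_contigs(sequences, k):
    """Spell out maximal non-branching paths (isolated cycles are ignored)."""
    succ, pred = build_graph(sequences, k)
    nodes = set(succ) | set(pred)

    def is_simple(n):
        # Exactly one way in and one way out: the path can pass straight through.
        return len(succ.get(n, ())) == 1 and len(pred.get(n, ())) == 1

    contigs = []
    for start in sorted(nodes):
        if is_simple(start):
            continue  # paths start only at branch points, sources and sinks
        for nxt in sorted(succ.get(start, ())):
            contig = start + nxt[-1]
            while is_simple(nxt):
                (nxt,) = succ[nxt]
                contig += nxt[-1]
            contigs.append(contig)
    return contigs


def iterative_assemble(reads, k_min, k_max):
    """Assumed scheme: contigs from each k are fed in alongside the reads at k + 1."""
    contigs = []
    for k in range(k_min, k_max + 1):
        contigs = extract_contigs(list(reads) + contigs, k)
    return contigs


# Hypothetical toy data: a 14-base genome with a repeated "ACG" and three reads.
genome = "AACGTCCACGGATT"
reads = [genome[s:s + 8] for s in (0, 3, 6)]

print(extract_contigs(reads, 4))        # ['AACG', 'ACGGATT', 'ACGTCCACG']
print(extract_contigs(reads, 7))        # ['AACGTCCA', 'CACGGATT', 'GTCCACGG']
print(iterative_assemble(reads, 4, 7))  # ['AACGTCCACGGATT']
```

With k = 4 alone the repeat splits the assembly into three contigs, and with k = 7 alone the reads do not overlap enough to be joined. Iterating from 4 to 7 recovers the whole toy genome, because the contig built at an intermediate k supplies the k-mers that the reads lack at larger k.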
If you use our assembler in your research, please cite our papers.
Peng, Y., et al. (2010) IDBA - A Practical Iterative de Bruijn Graph De Novo Assembler. RECOMB. Lisbon.
E-mail: Peng Yu