T-IDBA is an iterative de Bruijn graph de novo short-read assembler for transcriptomes. It is a purely de novo assembler that relies only on RNA sequencing reads. It uses not only the reads but also the paired-end information to increase the value of k in the accumulated de Bruijn graph. Because of the nature of the transcriptome, transcripts from different genes share very few repeat patterns. Hence, the de Bruijn graph decomposes into small connected components once k is large enough. In most cases, each component corresponds to one gene and contains only a few transcripts. T-IDBA then applies a heuristic algorithm based on paired-end reads to find the isoforms.
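As a rough illustration of why a larger k separates genes, the Python sketch below builds a simplified de Bruijn graph and counts its connected components. It is a toy under stated assumptions, not T-IDBA's code: vertices are the distinct k-mers in the reads, an edge joins two k-mers that appear consecutively in a read (so each edge is a (k+1)-mer), reverse complements are ignored, and the names kmers and count_components are hypothetical. Two toy "genes" that share only the repeat CGT end up in one component at k = 3 and in separate components from k = 4 upward.

```python
def kmers(read: str, k: int) -> list[str]:
    """Return the k-mers of a read in order (empty if the read is shorter than k)."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def count_components(reads: list[str], k: int) -> int:
    """Count connected components of a simplified de Bruijn graph.

    Assumption for this sketch: vertices are distinct k-mers, and an edge joins
    two k-mers that occur consecutively in some read. Reverse complements are
    not merged.
    """
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        parent[find(a)] = find(b)

    for read in reads:
        ks = kmers(read, k)
        for km in ks:
            parent.setdefault(km, km)
        # Consecutive k-mers in a read overlap by k-1 bases, so link them.
        for a, b in zip(ks, ks[1:]):
            union(a, b)

    return len({find(x) for x in parent})


# Two hypothetical "genes" whose only shared sequence is the 3-mer CGT.
reads = ["AAAACGTCCCC", "GGGGCGTTTTT"]
for k in (3, 4, 5):
    print(k, count_components(reads, k))
# 3 1
# 4 2
# 5 2
```

Union-find is used here only because it keeps the component count short to write; a real assembler would also merge reverse-complement k-mers and, as described above, use paired-end reads to decide how far k can be raised.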
Some bug fixes. Read length is now stored in 16 bits, so all IDBA assemblers can support reads of length up to 65535 by modifying kMaxShortSequence in src/sequence/short_sequence.h.
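The 65535 limit follows from the field width: an unsigned 16-bit integer holds 2^16 = 65536 distinct values, 0 through 65535. The Python sketch below only demonstrates that arithmetic with the standard struct module; MAX_READ_LENGTH and pack_read_length are hypothetical names, and the code does not reproduce how src/sequence/short_sequence.h or kMaxShortSequence actually store lengths.

```python
import struct

# An unsigned 16-bit field holds 2**16 values: 0 through 65535.
MAX_READ_LENGTH = 2**16 - 1


def pack_read_length(length: int) -> bytes:
    """Encode a read length as an unsigned 16-bit little-endian field.

    Illustrative only: not the layout used by the IDBA sources.
    """
    if not 0 <= length <= MAX_READ_LENGTH:
        raise ValueError(f"read length {length} does not fit in 16 bits")
    return struct.pack("<H", length)


print(MAX_READ_LENGTH)              # 65535
print(pack_read_length(150).hex())  # 9600
```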
(Please note that T-IDBA is no longer maintained; we recommend using IDBA-Tran instead, which generally performs better.)
IDBA, IDBA-UD, IDBA-Hybrid and IDBA-Tran all in one package (released Oct 18, 2012)
All IDBA (iterative de Bruijn graph assembler) series assemblers have been
refined and included in this package. Many errors have been fixed, and
scaffolding with multiple levels of paired-end reads is supported in
IDBA, IDBA-UD and IDBA-Hybrid.
The basic IDBA is included only for comparison.
If you are assembling genomic data without reference, please use IDBA-UD.
If you are assembling genomic data with a similar reference genome, please use IDBA-Hybrid.
If you are assembling transcriptome data, please use IDBA-Tran.
Please follow the instructions in the README file to run the software.
T-IDBA is an open-source de novo assembler for next-generation short-read RNA sequencing data.
Abstract: RNA sequencing based on next-generation sequencing technology is useful for analyzing transcriptomes, discovering novel genes and studying exon/intron structures. Similar to genome assembly, de novo transcriptome assembly does not rely on a reference genome or additional annotation. Most, if not all, existing de novo transcriptome assemblers rely heavily on de novo genome assembly techniques without fully utilizing the properties of transcriptomes, and may produce short contigs because of the splicing nature of genes (shared exons) and the repeats present in different genes. In this paper, we analyze the properties of the mammalian transcriptome and propose an algorithm to reconstruct expressed isoforms without a reference genome. We extend the iterative de Bruijn graph approach (IDBA) and use paired-end information to solve the problem of long repeats in different genes and the problem of branching introduced by shared exons in the same gene. The graph is then decomposed into small components, each of which contains a few genes, if not a single one. The most probable isoforms, those with the most support from the paired-end reads, are then found heuristically by depth-first search. In practice, our de novo transcriptome assembly software, T-IDBA, substantially outperforms ABySS (one of the newest de novo transcriptome assembly tools) in terms of sensitivity and precision on both simulated and real data. We also provide a theoretical analysis of T-IDBA's performance, which shows that T-IDBA is guaranteed to recover most isoforms as long as their read coverage exceeds a certain threshold; this analysis agrees with T-IDBA's observed performance.
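The isoform-selection step can be pictured with a small Python sketch. This is a simplified stand-in, not T-IDBA's actual heuristic: it assumes a component has already been reduced to an acyclic graph whose vertices are exon-like segments, it represents each read pair only by the two vertices its mates fall on, and it scores a candidate path by how many pairs have both mates on it. The names dfs_paths, support and best_isoform, and the toy data, are hypothetical.

```python
from collections.abc import Iterator

# Hypothetical representation: vertex -> list of successor vertices (acyclic).
Graph = dict[str, list[str]]


def dfs_paths(graph: Graph, start: str) -> Iterator[list[str]]:
    """Yield every path from `start` to a vertex with no successors, via iterative DFS."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        successors = graph.get(node, [])
        if not successors:
            yield path
        for nxt in successors:
            stack.append((nxt, path + [nxt]))


def support(path: list[str], pairs: list[tuple[str, str]]) -> int:
    """Count read pairs whose two mates both fall on vertices of `path`."""
    on_path = set(path)
    return sum(1 for a, b in pairs if a in on_path and b in on_path)


def best_isoform(graph: Graph, start: str, pairs: list[tuple[str, str]]) -> list[str]:
    """Return the path with the most paired-end support (first one DFS yields on ties)."""
    return max(dfs_paths(graph, start), key=lambda p: support(p, pairs))


# Toy component: E2 and E3 are alternative exons between E1 and E4.
component = {"E1": ["E2", "E3"], "E2": ["E4"], "E3": ["E4"], "E4": []}
read_pairs = [("E1", "E3"), ("E3", "E4"), ("E1", "E3"), ("E2", "E4")]
print(best_isoform(component, "E1", read_pairs))  # ['E1', 'E3', 'E4']
```

This sketch scores every source-to-end path, which is only practical for small components like the ones the decomposition step is meant to produce.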
If you use our assembler in your research, please cite our papers.
Peng, Y., et al. (2010) IDBA – A Practical Iterative de Bruijn Graph De Novo Assembler. RECOMB, Lisbon.
Peng, Y., et al. (2011) T-IDBA: A de novo Iterative de Bruijn Graph Assembler for Transcriptome. RECOMB, Vancouver.
E-mail: Peng Yu