T-IDBA - Bioinformatics Research Group, The University of Hong Kong

What Is T-IDBA?

T-IDBA is an iterative de Bruijn graph de novo short-read assembler for transcriptomes. It is a purely de novo assembler that relies only on RNA sequencing reads. It uses not only the reads but also paired-end information to increase the k value in the accumulated de Bruijn graph. Because transcripts from different genes share very few repeat patterns, the de Bruijn graph decomposes into small connected components once k is large enough. In most cases each component corresponds to one gene and contains only a few transcripts. T-IDBA then applies a heuristic algorithm based on paired-end reads to find the isoforms.
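The decomposition idea can be illustrated with a small sketch. This is not T-IDBA's code: the function names, the two made-up "transcripts" and the error-free tiled reads are all assumptions chosen for illustration, and the sketch ignores reverse complements, sequencing errors and the paired-end step that T-IDBA uses to push k higher. It only shows that a repeat shorter than k can no longer join two genes into one component.

```python
def kmers(seq, k):
    """Return all k-mers of seq in left-to-right order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count_components(reads, k):
    """Count connected components of a toy de Bruijn-style graph.

    Nodes are the distinct k-mers in the reads; two k-mers are linked when
    they are adjacent in some read (they overlap by k - 1 characters).
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for read in reads:
        ks = kmers(read, k)
        for km in ks:
            parent.setdefault(km, km)
        for a, b in zip(ks, ks[1:]):
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb
    return len({find(x) for x in parent})


def tile_reads(transcript, read_len):
    """Every substring of length read_len (idealised, error-free reads)."""
    return kmers(transcript, read_len)


# Two hypothetical transcripts that share only the 6-base repeat GGATCC.
gene_a = "ACGTTGCA" + "GGATCC" + "TTAGCAAT"
gene_b = "CCATGAGT" + "GGATCC" + "AACTGTCG"
reads = tile_reads(gene_a, 12) + tile_reads(gene_b, 12)

for k in (4, 6, 8):
    print(k, count_components(reads, k))
```

With these inputs the graph is a single component at k = 4 and k = 6, because both transcripts contain k-mers from the shared repeat, and it splits into two components at k = 8, where every k-mer that covers the repeat also covers distinguishing flanking bases.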

Current Release

The latest version is available on GitHub.

IDBA 1.1.1

Some bug fixes. Read length is now stored in 16 bits, so all IDBA assemblers can support read lengths up to 65535 by modifying kMaxShortSequence in src/sequence/short_sequence.h.
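The 65535 ceiling follows from the width of the field: an unsigned 16-bit integer holds values from 0 to 2^16 - 1. The short check below only demonstrates that arithmetic using Python's standard struct module; the helper name is hypothetical, and nothing here reflects how IDBA itself declares or uses kMaxShortSequence.

```python
import struct

UINT16_MAX = 2**16 - 1  # largest value an unsigned 16-bit field can hold


def fits_in_uint16(read_length):
    """Return True if read_length can be packed as an unsigned 16-bit integer."""
    try:
        struct.pack("<H", read_length)  # "H" is the unsigned 16-bit format code
        return True
    except struct.error:
        return False


print(UINT16_MAX, fits_in_uint16(UINT16_MAX), fits_in_uint16(UINT16_MAX + 1))
```

A read of length 65535 fits in the field, while a length of 65536 does not.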

Download current release

(Please note that T-IDBA is no longer maintained. We recommend using IDBA-Tran instead, which generally performs better.)

IDBA, IDBA-UD, IDBA-Hybrid and IDBA-Tran all in one package (released Oct 18, 2012)

All assemblers in the IDBA (iterative de Bruijn graph assembler) series have been refined and are included in this package. Many errors have been fixed, and scaffolding on multiple levels of paired-end reads is supported in IDBA, IDBA-UD and IDBA-Hybrid.

The basic IDBA is included only for comparison.
If you are assembling genomic data without a reference, please use IDBA-UD.
If you are assembling genomic data with a similar reference genome, please use IDBA-Hybrid.
If you are assembling transcriptome data, please use IDBA-Tran.

Download release 1.1.0

Please follow the instructions in the README file to run the software.

Project Description

T-IDBA is an open source de novo assembler for next-generation short read RNA sequences.

Publications

Yu Peng, Henry Leung, S.M. Yiu, Francis Y.L. Chin. T-IDBA: A de novo Iterative de Bruijn Graph Assembler for Transcriptome (accepted by RECOMB 2011)

Abstract: RNA sequencing based on next-generation sequencing technology is useful for analyzing transcriptomes, discovering novel genes and studying exon/intron structures. Similar to genome assembly, de novo transcriptome assembly does not rely on a reference genome and additional annotated information. Most, if not all, existing de novo transcriptome assemblers rely heavily on de novo genome assembly techniques without fully utilizing the properties of transcriptomes and may result in short contigs because of the splicing nature (shared exons) of the genes and the repeats existing in different genes. In this paper, we analyze the properties of the mammalian transcriptome and propose an algorithm to reconstruct expressed isoforms without a reference genome. We extend the iterative de Bruijn graph approach (IDBA) and use paired-end information to solve the problem of long repeats in different genes and the problem of branching introduced by shared exons in the same gene. The graph will then be decomposed into small components, each of which contains a few genes, if not a single gene. The most likely isoforms, those with the most support from the paired-end reads, will then be found heuristically by depth-first search. In practice, our de novo transcriptome assembly software, T-IDBA, outperforms Abyss (one of the newest de novo transcriptome assembly tools) substantially in terms of sensitivity and precision for both simulated and real data. We also provide a theoretical analysis of T-IDBA’s performance, which shows that T-IDBA guarantees most isoforms can be recovered as long as the coverage of the isoforms by reads exceeds a certain threshold and matches T-IDBA’s performance.

References

If you use our assembler in your research, please cite our papers.

Peng, Y., et al. (2010) IDBA - A Practical Iterative de Bruijn Graph De Novo Assembler. RECOMB, Lisbon.

Peng, Y., et al. (2011) T-IDBA: A de novo Iterative de Bruijn Graph Assembler for Transcriptome. RECOMB, Vancouver.

Contact

E-mail: Peng Yu

more support information...

Bioinformatics Research Group

Computer Science, The University of Hong Kong