Bioinformatics Research Group

Computer Science, The University of Hong Kong

Projects

Completed and current projects.

IDBA

IDBA (A Practical Iterative De Bruijn Graph De Novo Assembler) addresses the sequence assembly problem in bioinformatics. Most assemblers based on de Bruijn graphs build the graph with a single, fixed k, so choosing that value is crucial. If k is too large, k-mers in low-coverage regions are missed and the graph contains many gaps. If k is too small, repeats longer than k collapse together and the graph contains many branches. Instead of a single k, IDBA uses a range of k values to build an iterative de Bruijn graph, which keeps the information from the graphs built with the different k values. As a result, it can perform better than assemblers that rely on one k.
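The iterative idea can be illustrated with a minimal Python sketch. It assumes a simplified node-centric de Bruijn graph (nodes are k-mers, and an edge joins any two present k-mers that overlap by k-1 bases), ignores reverse complements, sequencing errors and k-mer count thresholds, and carries information from one k to the next by feeding the contigs from the previous round back in as extra sequences. The names `build_graph`, `unitigs` and `iterative_assemble` are hypothetical and are not IDBA's code or interface.

```python
from collections import defaultdict

def build_graph(seqs, k):
    """Node-centric de Bruijn graph: nodes are the k-mers of seqs, and
    u -> v is an edge whenever u[1:] == v[:-1] (both must be present)."""
    nodes = {s[i:i + k] for s in seqs for i in range(len(s) - k + 1)}
    succ, pred = defaultdict(set), defaultdict(set)
    for u in nodes:
        for base in "ACGT":
            v = u[1:] + base
            if v in nodes:
                succ[u].add(v)
                pred[v].add(u)
    return nodes, succ, pred

def unitigs(nodes, succ, pred):
    """Spell out maximal non-branching paths as contigs.
    Isolated cycles are skipped in this sketch."""
    def one_in_one_out(v):
        return len(pred[v]) == 1 and len(succ[v]) == 1

    contigs = []
    for v in sorted(nodes):
        if one_in_one_out(v):
            continue  # internal node: reached while walking from a path start
        if not pred[v] and not succ[v]:
            contigs.append(v)  # isolated k-mer forms its own contig
        for w in sorted(succ[v]):
            path = v
            while one_in_one_out(w):
                path += w[-1]
                w = next(iter(succ[w]))
            contigs.append(path + w[-1])  # w is the branching or end node
    return contigs

def iterative_assemble(reads, k_min, k_max, step=1):
    """Assemble with every k in [k_min, k_max], feeding the contigs from
    the previous k back in as extra sequences for the next, larger k."""
    contigs = []
    for k in range(k_min, k_max + 1, step):
        nodes, succ, pred = build_graph(list(reads) + contigs, k)
        contigs = unitigs(nodes, succ, pred)
    return contigs
```

For example, `iterative_assemble(reads, 4, 6)` on all length-8 substrings of the toy sequence `ATGCGTACCAGT` returns `['ATGCGTACCAGT']`, because that sequence contains no repeated 3-mer and every graph in the iteration is a single simple path.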

T-IDBA

T-IDBA is an Iterative De Bruijn Graph De Novo short-read Assembler for transcriptomes. It is a purely de novo assembler that uses only RNA sequencing (RNA-seq) reads. It uses not only the reads but also the paired-end information to increase the k value in the accumulated de Bruijn graph. Because transcripts from different genes share very few repeat patterns, the de Bruijn graph decomposes into small connected components once k is large enough. In most cases each component corresponds to one gene and contains only a few transcripts. T-IDBA then applies a heuristic algorithm based on paired-end reads to identify the isoforms.
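Why the graph falls apart into per-gene components as k grows can be shown with a small Python sketch that counts the weakly connected components of a de Bruijn graph. It is not T-IDBA's decomposition or isoform heuristic; it assumes the same simplified node-centric graph as above (an edge for every (k-1)-base overlap between present k-mers), and the function name `components` and the two toy "gene" sequences are hypothetical.

```python
from collections import defaultdict

def components(seqs, k):
    """Group the k-mers of seqs into weakly connected components of the
    de Bruijn graph (edge u -> v whenever u[1:] == v[:-1])."""
    nodes = {s[i:i + k] for s in seqs for i in range(len(s) - k + 1)}
    adj = defaultdict(set)
    for u in nodes:
        for base in "ACGT":
            v = u[1:] + base
            if v in nodes:
                adj[u].add(v)
                adj[v].add(u)  # ignore edge direction: weak connectivity
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, comp = [start], set()
        seen.add(start)
        while stack:  # iterative depth-first search
            u = stack.pop()
            comp.add(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps
```

With the toy sequences `ATGCGTACCAGT` and `GGTTCCAGAATT`, which share only the 4-mer `CCAG`, `components(seqs, 3)` yields one component, while `components(seqs, 6)` yields two, one per sequence, since no 5-base overlap links them any more.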

MetaCluster

MetaCluster is an unsupervised binning method for metagenomic sequences. Existing binning methods based on sequence similarity and sequence composition markers rely heavily on the reference genomes of known microorganisms and on phylogenetic markers. MetaCluster, in contrast, is an integrated binning method built on an unsupervised strategy of top-down separation followed by bottom-up merging. It can bin metagenomic sequencing datasets with complex species mixtures, whose abundance ratios range from exactly equal to extremely unbalanced, with consistently higher accuracy than other recently reported methods.
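The top-down-then-bottom-up strategy can be illustrated with a minimal, composition-only sketch in Python. Every concrete choice here is an assumption made for illustration and is not taken from MetaCluster: tetranucleotide frequency vectors as features, Euclidean distance, plain k-means seeded with the first vectors for the top-down over-splitting, and a fixed centroid-distance threshold for the bottom-up merging. All function names are hypothetical.

```python
import math
from itertools import product

# All 256 tetranucleotides and their positions in the feature vector.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]
INDEX = {t: i for i, t in enumerate(TETRAMERS)}

def composition(seq):
    """Normalised tetranucleotide frequency vector of seq."""
    counts = [0.0] * len(TETRAMERS)
    total = 0
    for i in range(len(seq) - 3):
        j = INDEX.get(seq[i:i + 4])
        if j is not None:  # skip windows containing N or other symbols
            counts[j] += 1
            total += 1
    return [c / total for c in counts] if total else counts

def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def kmeans(vectors, n_clusters, iters=20):
    """Top-down step (assumed): over-split with plain k-means, seeded
    with the first n_clusters vectors so the result is deterministic."""
    n_clusters = min(n_clusters, len(vectors))
    centroids = [list(v) for v in vectors[:n_clusters]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [min(range(n_clusters), key=lambda c: math.dist(v, centroids[c]))
                  for v in vectors]
        for c in range(n_clusters):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return labels

def merge(vectors, labels, threshold):
    """Bottom-up step (assumed): repeatedly merge the two clusters with the
    closest centroids until no pair is closer than threshold."""
    by_label = {}
    for i, lab in enumerate(labels):
        by_label.setdefault(lab, []).append(i)
    groups = list(by_label.values())
    while len(groups) > 1:
        cents = [mean([vectors[i] for i in g]) for g in groups]
        d, a, b = min((math.dist(cents[a], cents[b]), a, b)
                      for a in range(len(groups))
                      for b in range(a + 1, len(groups)))
        if d >= threshold:
            break
        groups[a] += groups.pop(b)  # b > a, so index a stays valid
    return sorted(sorted(g) for g in groups)

def bin_sequences(seqs, n_clusters=4, threshold=0.3):
    vectors = [composition(s) for s in seqs]
    return merge(vectors, kmeans(vectors, n_clusters), threshold)
```

On four toy sequences, `"AT" * 10`, `"TA" * 10`, `"GC" * 10` and `"CG" * 10`, the top-down step keeps all four apart and the bottom-up step merges them back into the two composition groups `[[0, 1], [2, 3]]`. Over-splitting first and merging afterwards is one way to avoid fixing the number of bins in advance.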

Core

Core is an open-source program for predicting protein complexes from protein-protein interaction (PPI) networks.

Meta-IDBA

Meta-IDBA is an Iterative De Bruijn Graph De Novo short-read Assembler for metagenomic data, based on graph partitioning. It is designed specifically for de novo metagenomic assembly. One of the most difficult problems in metagenomic assembly is that similar subspecies of the same species mix together, making the de Bruijn graph very complicated and intractable. Meta-IDBA handles this problem by grouping the similar regions of similar subspecies: it partitions the graph into components based on the graph's topological structure. Each component represents a similar region shared by subspecies of the same species, or even of different species. After the components are separated, all contigs in each component are aligned to produce a consensus sequence together with the multiple alignment.
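The final step, turning a multiple alignment of the contigs in one component into a consensus, can be sketched in Python as a column-wise majority vote. The sketch assumes the alignment has already been computed (rows of equal length, with `-` marking gaps) and does not perform the alignment itself. The majority rule, the dropping of gap-majority columns, the tie rule (the character seen first in the column wins, following `Counter.most_common`) and the name `consensus` are illustrative assumptions rather than Meta-IDBA's documented behaviour.

```python
from collections import Counter

def consensus(alignment):
    """Column-wise majority vote over an already-computed multiple alignment.
    A column whose most common symbol is the gap '-' is dropped."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("alignment must be non-empty with rows of equal length")
    out = []
    for column in zip(*alignment):
        # Ties go to the symbol encountered first in the column.
        symbol, _ = Counter(column).most_common(1)[0]
        if symbol != "-":
            out.append(symbol)
    return "".join(out)
```

For example, `consensus(["ACGT-A", "ACCTTA", "ACGT-A"])` returns `"ACGTA"`: the minority `C` in the third column is outvoted and the mostly-gap fifth column is dropped.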

DMPFinder

Supplementary data for DMPFinder