BIOINFORMATICS SOFTWARE
SOAP3: Short read alignment
SOAP3 is GPU-based software for aligning short reads to a reference
sequence. It can find all alignments with up to k mismatches, where k is
chosen from 0 to 4. Compared with its predecessor SOAP2,
SOAP3 can be up to tens of times faster. For example, when aligning
length-100 reads to the human genome, SOAP3 is the first software
that can find all 4-mismatch alignments in tens of seconds per million
reads.
http://www.cs.hku.hk/2bwt-tools/soap3
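To make the k-mismatch alignment task described for SOAP3 concrete, here is a minimal brute-force sketch in Python. It is not SOAP3's method (SOAP3 is GPU-based and far faster); the function names hamming_within and find_alignments are hypothetical and exist only for illustration.

```python
def hamming_within(a, b, k):
    """Return the mismatch count between equal-length strings a and b,
    or None as soon as the count exceeds k."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > k:
                return None
    return mismatches


def find_alignments(reference, read, k):
    """List (position, mismatches) for every placement of read on
    reference that has at most k mismatches (no gaps allowed)."""
    hits = []
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        m = hamming_within(window, read, k)
        if m is not None:
            hits.append((pos, m))
    return hits


if __name__ == "__main__":
    # Toy data, not a real genome.
    print(find_alignments("ACGTACGTTT", "ACGA", 1))
```

This scan costs time proportional to reference length times read length for every read, which is impractical for a human genome. That cost is the reason index-based and GPU-based aligners exist.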
ALSE: Motif Finding Tool
ALSE is a software package that discovers common patterns in a set of DNA sequences.
ALSE finds common patterns in the binding sites of an unknown transcription factor from two sets of sequences: one known to contain binding sites and the other assumed to contain none. Our package performs exceptionally well when both sets of sequences are available and comparatively well when only the former set is available.
http://alse.cs.hku.hk/
SDS: siRNA Design Software
SDS (siRNA Design Software) is a software tool that helps to design siRNAs for silencing gene expression. The tool takes an mRNA sequence and makes use of existing design tools to output a set of candidates. These candidates are then filtered based on the secondary structure.
SDS is web-based software with the following features:
* It provides a unified platform for using existing software.
* It compares the output of existing software.
* It enhances the existing software by filtering ineffective siRNAs based on secondary structure.
http://www.cs.hku.hk/~sirna
MSS: Whole Genome Alignment using a Mutation Sensitive Approach
MSS is a software tool that, given the genomes of two related species, locates regions likely to contain genes and regulatory elements that are conserved across the two species.
http://www.cs.hku.hk/~mss/
IDBA: Iterative De Bruijn graph short read Assembler
IDBA is a practical iterative de Bruijn graph de novo assembler for sequence assembly in bioinformatics. Most assemblers based on de Bruijn graphs build the graph with one specific k to perform the assembly, so choosing that value of k is crucial. If k is too large, the graph will contain many gaps. If k is too small, the graph will contain many branches. Instead of a single k, IDBA uses a range of k values to build the de Bruijn graph iteratively, which lets it keep the information from graphs with different k values. It is therefore expected to perform better than assemblers that rely on a single k.
http://i.cs.hku.hk/~alse/idba
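The trade-off in k described for IDBA can be seen on a toy example. The sketch below is an illustration only, not IDBA's implementation: it builds a plain de Bruijn graph for one k at a time and counts branch nodes and dead ends. The helper names and the two toy reads are assumptions made for this example.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer node to the set of (k-1)-mer successors
    implied by the k-mers found in the reads."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def count_branches(graph):
    """Count nodes with more than one outgoing edge."""
    return sum(1 for succ in graph.values() if len(succ) > 1)


def count_dead_ends(graph):
    """Count nodes with no outgoing edge. The true end of the
    sequence is counted too, so a gap shows up as an extra dead end."""
    nodes = set(graph) | {s for succ in graph.values() for s in succ}
    return sum(1 for n in nodes if not graph.get(n))


if __name__ == "__main__":
    # Two toy reads from "ACGTCCGTA"; they overlap by only 3 bases
    # and the sequence repeats "CGT".
    reads = ["ACGTCC", "TCCGTA"]
    for k in (3, 5):
        g = build_de_bruijn(reads, k)
        print(k, count_branches(g), count_dead_ends(g))
```

With k = 3 the repeat creates a branch. With k = 5 the branch disappears, but the short overlap between the two reads no longer yields a shared k-mer, so a gap appears as a second dead end. An iterative scheme over several k values aims to get the benefit of both ends of this range.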
T-IDBA: Iterative De Bruijn Graph De Novo short read Assembler for transcriptome
T-IDBA is an iterative de Bruijn graph de novo short read assembler for transcriptomes. It is a purely de novo assembler that relies only on RNA sequencing reads. In this assembler, both the reads and the paired-end information are used to increase the k value in the accumulated de Bruijn graph. Because of the nature of the transcriptome, transcripts from different genes share very few repeat patterns. Hence, the de Bruijn graph decomposes into small connected components when k is large enough. In most cases each component corresponds to one gene and contains only a few transcripts. T-IDBA then uses a heuristic algorithm based on paired-end reads to find the isoforms.
http://i.cs.hku.hk/~alse/tidba
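The decomposition into connected components described for T-IDBA can be illustrated with a small union-find sketch. This is not T-IDBA's code: the function name de_bruijn_components and the two toy "transcripts" are assumptions, and the sequences are used directly in place of reads to keep the example short.

```python
def de_bruijn_components(sequences, k):
    """Group (k-1)-mer nodes into connected components, treating
    every k-mer as an undirected edge between its prefix and suffix."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            a, b = find(kmer[:-1]), find(kmer[1:])
            if a != b:
                parent[a] = b  # merge the two components

    groups = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


if __name__ == "__main__":
    # Two toy transcripts that share only the short repeat "GAT".
    transcripts = ["ACCGATTCA", "TGGGATAAC"]
    for k in (4, 5):
        print(k, len(de_bruijn_components(transcripts, k)))
```

At k = 4 the shared 3-mer "GAT" is a common node, so both toy transcripts fall into one component. At k = 5 no 4-mer is shared, and the graph splits into two components, one per transcript.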
Meta-IDBA: Iterative De Bruijn Graph De Novo short read Assembler for metagenomics based on graph partition
Meta-IDBA is an iterative de Bruijn graph de novo short read assembler designed specifically for metagenomic assembly. One of the most difficult problems in metagenomic assembly is that similar subspecies of the same species mix together, making the de Bruijn graph complicated and intractable. Meta-IDBA handles this problem by grouping similar regions of similar subspecies: it partitions the graph into components based on the topological structure of the graph. Each component represents a similar region shared by subspecies of the same species, or even by different species. After the components are separated, the contigs in each component are aligned to produce a consensus and a multiple alignment.
http://i.cs.hku.hk/~alse/metaidba
MetaCluster
MetaCluster is an unsupervised binning method for metagenomic sequences. Existing binning methods based on sequence similarity and sequence composition markers rely heavily on the reference genomes of known microorganisms and on phylogenetic markers. MetaCluster, by contrast, is an integrated binning method based on unsupervised top-down separation and bottom-up merging. It can bin metagenomic sequencing datasets with mixed, complex species abundance ratios, ranging from exactly equal to extremely unbalanced, with consistently higher accuracy than other recently reported methods.
http://i.cs.hku.hk/~alse/MetaCluster
Core for predicting protein complexes from PPI networks
Core is an open source program for predicting protein complexes from protein-protein interaction (PPI) networks.
http://i.cs.hku.hk/~alse/complexes
Voting Algorithm
Voting is a software package that discovers common patterns (motifs) in a set of DNA sequences. Voting efficiently solves the planted (l,d) motif problem: discovering a hidden length-l string motif that appears in each input DNA sequence with Hamming distance at most d. Our package guarantees discovering all motifs in a short time.
http://i.cs.hku.hk/~alse/hkubrg/projects/Voting/
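The planted (l,d) motif problem described for Voting can be sketched with a naive voting scheme. The code below is an assumption-laden illustration and not the Voting package itself: every length-l substring votes for all strings within Hamming distance d, each sequence votes at most once per candidate, and candidates that collect a vote from every sequence are reported. The names neighbors and find_motifs are hypothetical.

```python
from itertools import combinations, product

ALPHABET = "ACGT"


def neighbors(s, d):
    """Return all strings over ALPHABET within Hamming distance d of s."""
    result = set()
    for r in range(d + 1):
        for positions in combinations(range(len(s)), r):
            for letters in product(ALPHABET, repeat=r):
                chars = list(s)
                for p, c in zip(positions, letters):
                    chars[p] = c
                result.add("".join(chars))
    return result


def find_motifs(sequences, l, d):
    """Return every length-l string that lies within Hamming distance d
    of at least one substring in each input sequence."""
    votes = {}
    for seq in sequences:
        voted = set()  # one vote per candidate per sequence
        for i in range(len(seq) - l + 1):
            voted |= neighbors(seq[i:i + l], d)
        for cand in voted:
            votes[cand] = votes.get(cand, 0) + 1
    return sorted(c for c, v in votes.items() if v == len(sequences))


if __name__ == "__main__":
    # Toy input: "ACGT" planted with one mismatch in each sequence.
    seqs = ["GGACGAGG", "CCTCGTCC", "TTACCTTT"]
    print("ACGT" in find_motifs(seqs, 4, 1))
```

The neighbourhood of each substring grows quickly with l and d, so this direct enumeration is only practical for small parameters.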