Bioinformatics Research Group

Computer Science, The University of Hong Kong

Project Description

MetaCluster is an unsupervised binning method for metagenomic sequences. Existing binning methods based on sequence similarity and sequence composition markers rely heavily on the reference genomes of known microorganisms and on phylogenetic markers. MetaCluster, by contrast, is an integrated binning method built on an unsupervised strategy of top-down separation followed by bottom-up merging. It can bin metagenomic sequencing datasets with complex species mixtures whose abundance ratios range from exactly equal to extremely unbalanced, and it consistently achieves higher accuracy than other recently reported methods.
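To illustrate what composition-based, unsupervised separation means, the sketch below represents each read by its normalized 4-mer (tetranucleotide) frequency vector and splits a set of reads into two groups around two mutually distant seed reads. The choice of k = 4, the L1 distance, and the two-seed farthest-point split are assumptions made for illustration only; this is not MetaCluster's actual top-down procedure, and all function names are hypothetical.

```python
from collections import Counter
from itertools import product

K = 4  # assumed k-mer length for this sketch; MetaCluster's own choice may differ
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]  # all 256 ACGT 4-mers


def kmer_profile(seq, k=K):
    """Return the normalized frequency of every ACGT k-mer in seq.
    k-mers containing other characters (e.g. N) are ignored."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in KMERS) or 1  # avoid division by zero
    return [counts[km] / total for km in KMERS]


def l1_distance(a, b):
    """Sum of absolute differences between two profile vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def top_down_split(profiles):
    """Hypothetical toy separation step (assumes at least one read):
    pick two mutually distant seeds, then assign each read to the closer seed."""
    seed_a = profiles[0]
    idx_b = max(range(len(profiles)), key=lambda i: l1_distance(seed_a, profiles[i]))
    seed_b = profiles[idx_b]
    idx_a = max(range(len(profiles)), key=lambda i: l1_distance(seed_b, profiles[i]))
    seed_a = profiles[idx_a]
    groups = ([], [])
    for i, p in enumerate(profiles):
        closer = 0 if l1_distance(p, seed_a) <= l1_distance(p, seed_b) else 1
        groups[closer].append(i)
    return groups


# Two AT-rich reads and two GC-rich reads separate cleanly by composition.
reads = ["ATATATATATATATAT", "TATATATATATATATA",
         "GCGCGCGCGCGCGCGC", "CGCGCGCGCGCGCGCG"]
print(top_down_split([kmer_profile(r) for r in reads]))  # ([0, 1], [2, 3])
```

No reference genome is consulted at any point: reads are grouped purely by the similarity of their own composition profiles, which is the property that makes this style of binning unsupervised.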

Current Release

Click here for the latest version

Old Release

MetaCluster-64bit-Linux, released Jan 10, 2011

Download current release

Usage

1. Decompress the file.
2. Compile the files with the command:
 # make
3. The tool 'metaCluster-3.0' is placed in the /bin directory. The command format is:
 # ./metaCluster-3.0 input-file [--thresh t]

 input-file: the input file you want to bin.
 [--thresh t]: the threshold for the bottom-up merge step. The default value of t is 0.9.
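The role of a merge threshold such as t can be sketched as follows. How MetaCluster actually measures similarity between clusters is not described on this page, so the sketch assumes that each cluster is summarized by the average of its reads' frequency vectors and that similarity is 1 - (L1 distance)/2, which lies in [0, 1] for frequency vectors. Pairs are merged greedily, most similar first, until no pair reaches t. The function names are hypothetical.

```python
def centroid(profiles):
    """Average a cluster's profile vectors component by component."""
    n = len(profiles)
    return [sum(col) / n for col in zip(*profiles)]


def similarity(a, b):
    """Assumed similarity: 1 - L1/2, i.e. 1.0 for identical frequency
    vectors and 0.0 for vectors with disjoint support."""
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / 2.0


def bottom_up_merge(clusters, t=0.9):
    """Greedily merge the most similar pair of clusters while their
    centroid similarity is at least t (a stand-in for --thresh)."""
    clusters = [list(c) for c in clusters]  # copy so the caller's list is untouched
    while len(clusters) > 1:
        cents = [centroid(c) for c in clusters]
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = similarity(cents[i], cents[j])
                if s > best:
                    best, pair = s, (i, j)
        if best < t:
            break  # no remaining pair is similar enough to merge
        i, j = pair
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i is still valid
    return clusters


# Three toy clusters of 2-dimensional frequency vectors.
toy = [[[0.5, 0.5]], [[0.52, 0.48]], [[0.0, 1.0]]]
print(len(bottom_up_merge(toy, t=0.9)))   # 2: the first two clusters merge
print(len(bottom_up_merge(toy, t=0.99)))  # 3: their similarity (0.98) is below t
```

In this sketch, raising t makes merging stricter and leaves more clusters, while lowering it merges more aggressively.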

Publications

Bin Yang, Yu Peng, Henry C.M. Leung, S.M. Yiu, Junjie Qin, Ruiqiang Li, Francis Y.L. Chin, MetaCluster: Unsupervised Binning of Environmental Genomic Fragments and Taxonomic Annotation

Limited by laboratory techniques, traditional microorganism research usually focuses on a single species. This significantly limits in-depth analysis of the intricate biological processes within complex microbial communities. With the rapid development of genome sequencing techniques, traditional research methods based on the isolation and cultivation of microorganisms are gradually being replaced by metagenomics, also known as environmental genomics. The first step, and also the major bottleneck, of metagenomic data analysis is the identification and taxonomic characterization of the DNA fragments (reads) obtained by sequencing a sample of mixed species. This step is usually referred to as “binning”. Existing binning methods based on sequence similarity and sequence composition markers rely heavily on the reference genomes of known microorganisms and on phylogenetic markers. Due to the limited availability of reference genomes and the bias and instability of markers, these methods may not be applicable in all cases. Few unsupervised binning methods have been reported, and their unsupervised nature makes it extremely difficult to annotate the resulting clusters with taxonomic labels. In this paper, we present MetaCluster 2.0, an unsupervised binning method that can bin metagenomic sequencing datasets with high accuracy and can also identify unknown genomes and annotate them with proper taxonomic labels. MetaCluster 2.0 runs at least 30 times faster than existing binning algorithms.

Contact

E-mail: Wang Yi

More support information...