BWT-SW

About BWT-SW

BWT-SW is a local alignment tool for searching nucleotide sequences. It performs the same function as BLASTn, the BLAST program for finding regions of local similarity between nucleotide sequences. While BLAST is an approximation of the Smith-Waterman local alignment algorithm and may miss significant alignments (see BLAST Sensitivity), BWT-SW finds all local alignments1. The running speed of BWT-SW depends on the lengths of the database sequence and the query sequence. On a set of experiments using human genome as the database sequence, BWT-SW takes the same order of time as BLASTn on query sequences of length 1000 nucleotides or less (see Performance). As far as we know, BWT-SW is the first practical tool that can find all local alignments.

 

BWT-SW makes use of BWT, which stands for Burrow-Wheeler Transform, to index database sequences. BWT was invented as a compression technique2. It was later extended to support pattern matching3. It only occupies 2 bits of memory per nucleotide, making it possible for us to index nucleotide sequences as long as 4G base pairs in an ordinary PC4. Nucleotide sequences have to be pre-processed before searching can be performed. The pre-processing includes the construction of BWT and the supporting auxiliary data structures. Our implementation takes about 50 minutes to construct the BWT of the human genome and another 25 minutes to build the auxiliary data structures on a Pentium D 3.6GHz PC.

 

BWT-SW is an experimental implementation. It supports the more important functions and parameters of BLASTn. Currently, we have no plan to extend our implementation. Testing of the software is done on a Linux operating system. More extensive testing is done with the default BLASTn parameters. We do not guarantee the correctness of our implementation but we are eager to improve it. Bug reports and other suggestions are welcome.

 

The following features are supported by BWT-SW:

  • Nucleotide vs. nucleotide search with the same statistical measures used by BLASTn
  • Adjustable statistical and scoring parameters.
  • Query filtering by DUST5
  • Lower case query filtering
  • Pairwise (i.e., detailed view showing how regions of local similarity are aligned) or tabular output

 

BWT-SW is written and maintained by C.K. Wong. The code for BLAST statistics is provided by S.S. Wong. It is based on the research conducted by T.W. Lam, W.K. Sung, S.L. Tam, C.K. Wong, and S.M. Yiu. T.W.Lam, S.L. Tam, C.K. Wong, and S.M. Yiu are at the University of Hong Kong and W.K. Sung is at the National University of Singapore.

 

All source code is freely available under the GPL license agreement. You can view a copy of the license agreement here.

 

 

 

1 Alignments are filtered for common starting points and common ending points. The best alignment among alignments with common starting points and common ending points is reported.

2 M. Burrow and D.J. Wheeler. A block-sorting lossless data compression algorithm. Technical Report 124, Digital Equipment Corporation, Paolo Alto, California, 1994.

3 P. Ferragina and G. Manzini. Opportunistic data structures with applications. In FOCS, pages 390--398, 2000.

4 Assuming a PC equipped with at least 3G RAM.

5 Source code for DUST filtering adapted from public domain NCBI C toolkit.

 

 

Number of visitors to the download page:

 ********