BWT-SW is a local alignment tool for searching
nucleotide sequences. It performs the same function as BLASTn, the BLAST
program for finding regions of local similarity between nucleotide sequences.
While BLAST is an approximation of the Smith-Waterman local alignment algorithm
and may miss significant alignments (see BLAST
Sensitivity), BWT-SW finds all local alignments1.
The running speed of BWT-SW depends on the lengths of the database sequence and
the query sequence. On a set of experiments using human genome as the database
sequence, BWT-SW takes the same order of time as BLASTn on query sequences of
length 1000 nucleotides or less (see Performance).
As far as we know, BWT-SW is the first practical tool that can find all local
alignments.
BWT-SW makes use of BWT, which stands for
Burrow-Wheeler Transform, to index database sequences. BWT was invented as a
compression technique2. It was later
extended to support pattern matching3.
It only occupies 2 bits of memory per nucleotide, making it possible for us to
index nucleotide sequences as long as 4G base pairs in an ordinary PC4. Nucleotide sequences have to be
pre-processed before searching can be performed. The pre-processing includes
the construction of BWT and the supporting auxiliary data structures. Our
implementation takes about 50 minutes to construct the BWT of the human genome
and another 25 minutes to build the auxiliary data structures on a Pentium D
3.6GHz PC.
BWT-SW is an experimental implementation. It
supports the more important functions and parameters of BLASTn. Currently, we
have no plan to extend our implementation. Testing of the software is done on a
Linux operating system. More extensive testing is done with the default BLASTn
parameters. We do not guarantee the correctness of our implementation but we
are eager to improve it. Bug reports and other suggestions are welcome.
The following features are supported by BWT-SW:
BWT-SW is written and maintained by C.K. Wong. The
code for BLAST statistics is provided by S.S. Wong. It is based on the research
conducted by T.W. Lam, W.K. Sung, S.L. Tam, C.K. Wong, and S.M. Yiu. T.W.Lam,
S.L. Tam, C.K. Wong, and S.M. Yiu are at the
All source code is freely available under the GPL
license agreement. You can view a copy of the license agreement here.
1 Alignments are filtered for common starting points and common ending points. The best alignment among alignments with common starting points and common ending points is reported.
2 M. Burrow and D.J. Wheeler. A block-sorting
lossless data compression algorithm. Technical Report 124, Digital Equipment
Corporation,
3 P. Ferragina and G. Manzini. Opportunistic data structures with applications. In FOCS, pages 390--398, 2000.
4 Assuming a PC equipped with at least 3G RAM.
5 Source code for DUST filtering adapted from public
domain NCBI C toolkit.
Number of visitors to the download page: