BLAST is the standard tool that molecular biologists use to search for sequence similarity in genomic (and protein) databases. It employs a brute-force approach of comparing a query sequence against every database sequence: for each pair of sequences to be matched, BLAST searches for short fixed-length word pairs (seeds) in the sequences and then extends them to higher-scoring regions. To search multiple queries, the basic approach is to run BLAST on each of the queries one at a time. This is clearly inefficient and fails to exploit common subsequences that the collection of queries may share. In this paper, we propose a new genome search tool, BLAST++, that allows multiple, say K, queries to be searched against a database concurrently. The design of BLAST++ is based on our observation that the seed searching step of BLAST is a bottleneck that consumes more than 80% of the total response time. BLAST++ essentially treats a collection of queries as a single virtual query so that the seed searching step needs to be performed only once for common subsequences. We implemented BLAST++ as an extension of NCBI BLAST and evaluated its performance. Our study shows that the results obtained by BLAST++ are identical to those obtained by executing BLAST on each of the K queries, but the single-process version of BLAST++ completes the processing in a much shorter time, only about 25% of the time taken by the original single-process version of NCBI BLAST.
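The idea of treating K queries as a single virtual query can be illustrated with a small sketch. This is not the BLAST++ implementation: it assumes exact-match seeds of a fixed word length `w`, omits the extension step, and uses hypothetical function names (`build_seed_index`, `batched_seed_hits`, `one_at_a_time_seed_hits`) chosen only for this example. It shows one shared word index built over all queries, so that each database word is looked up once regardless of how many queries contain it.

```python
from collections import defaultdict


def build_seed_index(queries, w):
    """Map each length-w word to every (query_id, offset) where it occurs.

    Words shared by several queries end up under a single key, which is
    what lets one lookup serve all of those queries at once.
    """
    index = defaultdict(list)
    for qid, query in enumerate(queries):
        for i in range(len(query) - w + 1):
            index[query[i:i + w]].append((qid, i))
    return index


def batched_seed_hits(queries, db_seq, w):
    """Scan the database sequence once on behalf of all queries.

    Returns a set of (query_id, query_offset, db_offset) seed hits.
    Assumption: seeds are exact matches; extension is not modelled.
    """
    index = build_seed_index(queries, w)
    hits = set()
    for j in range(len(db_seq) - w + 1):
        # .get avoids creating empty entries in the defaultdict
        for qid, i in index.get(db_seq[j:j + w], ()):
            hits.add((qid, i, j))
    return hits


def one_at_a_time_seed_hits(queries, db_seq, w):
    """Baseline: rescan the database separately for each query."""
    hits = set()
    for qid, query in enumerate(queries):
        single = batched_seed_hits([query], db_seq, w)
        hits |= {(qid, i, j) for (_, i, j) in single}
    return hits


if __name__ == "__main__":
    queries = ["ACGTAC", "CGTACG"]   # toy queries sharing several words
    database = "TTACGTACGG"          # toy database sequence
    batched = batched_seed_hits(queries, database, w=3)
    baseline = one_at_a_time_seed_hits(queries, database, w=3)
    print(sorted(batched))
    print("same hits as one-at-a-time search:", batched == baseline)
```

In this sketch the batched scan makes one pass over the database, while the baseline makes K passes, yet both return the same set of seed hits.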
Cite as: Wang, H., Ong, T.-H., Ooi, B.C. and Tan, K.-L. (2003). BLAST++: A Tool for BLASTing Queries in Batches. In Proc. First Asia-Pacific Bioinformatics Conference (APBC2003), Adelaide, Australia. CRPIT, 19. Chen, Y.-P. P., Ed. ACS. 71-79.