Title: Biosequence Similarity Search on the Mercury System
1Biosequence Similarity Search on the Mercury
System
- Praveen Krishnamurthy, Jeremy Buhler, Roger
Chamberlain, Mark Franklin, Kwame Gyang, and
Joseph Lancaster
- Department of Computer Science and Engineering,
Washington University in Saint Louis, MO
- Supported by an NIH STTR grant and NSF grants
DBI-0237902, ITR-0313203, CCR-0217334
2Outline
- Overview of BLAST
- Overview of the Mercury system
- Description of BLASTN algorithm
- Algorithmic changes to BLASTN
- Improvement in performance
- Related work
- Conclusion
3Basic Local Alignment Search Tool
- Biosequence comparison software
- Compares a query sequence (new genome) against a
large database of known biosequences
- Looks for similar regions
- Exponential growth of genomic databases
- Longer time for searches to complete
- Solutions
- Perform comparison over multiple machines
- Specialized hardware (our approach)
4The Mercury System
5The Mercury System
- Proximity to disk
- Simple operations performed close to disk
- Avoids CPU use
- 400 Mbytes/s throughput from the disk
- Concurrent, independent operation
- Does not use processor cache cycles, memory, or
I/O buses
- Reconfigurable logic
- Logic can be tuned to the particular needs of the
application
6BLASTN
- BLASTN
- Both the query and the database are long DNA
strings
- Consist of A, C, T, G and some unknowns
- Each stage processes less data than the one before it
- The stages become more computationally expensive
7BLASTN - Terminology
Query
ACTGTGTTTCACTGACGGGTGT
Database
CTGTGTCCCCAACACTGCTGACGTAGAATCGTGTAG
w-mer is a sequence of w consecutive bases
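As a small illustration of the w-mer definition, the sketch below enumerates every w-mer of the query string shown on this slide. The function name `w_mers` is hypothetical and chosen only for this example.

```python
def w_mers(seq, w):
    """Yield (offset, w-mer) for every window of w consecutive bases in seq."""
    for i in range(len(seq) - w + 1):
        yield i, seq[i:i + w]


# Query string taken from the slide; w = 11 matches the word length used in Stage 1.
query = "ACTGTGTTTCACTGACGGGTGT"
mers = list(w_mers(query, 11))
```

A sequence of length n contains n - w + 1 overlapping w-mers, so the 22-base query yields 12 11-mers.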
8BLASTN - Pipeline - Stage 1
- Matches each 11-mer in query to database
- Exact string matching
- 83% of overall time is spent in this stage
- Filters 92% of data entering this stage
- Only 8% of data proceeds to the next stage
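The exact word matching of Stage 1 can be sketched in software as a table lookup. This is a minimal illustration, assuming a Python dictionary keyed by w-mer; the names `build_query_index` and `word_matches` are hypothetical, and this is not the NCBI or Mercury implementation.

```python
from collections import defaultdict


def build_query_index(query, w=11):
    """Map each w-mer of the query to the list of offsets where it occurs."""
    index = defaultdict(list)
    for i in range(len(query) - w + 1):
        index[query[i:i + w]].append(i)
    return index


def word_matches(query, database, w=11):
    """Return (query_offset, database_offset) pairs for every exact w-mer match."""
    index = build_query_index(query, w)
    hits = []
    for j in range(len(database) - w + 1):
        # Exact string matching: a database w-mer is a hit only if it is in the table.
        for i in index.get(database[j:j + w], ()):
            hits.append((i, j))
    return hits


# Query and database strings as they appear on the Stage 2 slide.
query = "ACTGTGTTTCACTGACGGGTGT"
database = "GTGTCCCCAACATTTCACTGACGAGAATCGTGTAG"
hits = word_matches(query, database)
```

Each hit records where the shared 11-mer starts in the query and in the database, which is the information the next stage needs in order to extend the match.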
9BLASTN - Pipeline - Stage 2
- Extends the matches from stage 1
ACTGTGTTTCACTGACGGGTGT
GTGTCCCCAACATTTCACTGACGAGAATCGTGTAG
10BLASTN - Pipeline - Stage 2
- Extends the matches from stage 1
- Allows mismatches of individual bases
- Does not allow gaps in either the query or the
database
- Match score must exceed a threshold to
proceed
- 16% of pipeline time is spent in this stage
- Only 2/100,000 of data entering this stage
proceeds to the next stage
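A minimal sketch of ungapped extension follows. The scoring values (+1 per match, -3 per mismatch), the drop-off rule `x_drop`, and the function name `ungapped_extend` are assumptions made for illustration; the slides state only that mismatches are allowed, gaps are not, and the score is compared against a threshold.

```python
def ungapped_extend(query, db, q_pos, d_pos, w=11, match=1, mismatch=-3, x_drop=10):
    """Extend a w-mer seed at (q_pos, d_pos) left and right without gaps.

    Returns the best score found: the seed score plus the best gain in each direction.
    """
    def extend(qi, di, step):
        best = score = 0
        while 0 <= qi < len(query) and 0 <= di < len(db):
            score += match if query[qi] == db[di] else mismatch
            if score > best:
                best = score
            elif best - score > x_drop:
                break  # score fell too far below the best seen; stop extending
            qi += step
            di += step
        return best

    seed_score = w * match
    right = extend(q_pos + w, d_pos + w, +1)
    left = extend(q_pos - 1, d_pos - 1, -1)
    return seed_score + left + right


# Seed found by Stage 1 for the slide's example sequences.
query = "ACTGTGTTTCACTGACGGGTGT"
database = "GTGTCCCCAACATTTCACTGACGAGAATCGTGTAG"
score = ungapped_extend(query, database, q_pos=6, d_pos=12)
```

A match would proceed to Stage 3 only if `score` exceeds the chosen threshold. Because both indices always advance together, the extension stays on one diagonal and never introduces a gap.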
11BLASTN - Pipeline - Stage 3
- Extends the matches from stage 2
ACCACTGTTTCACTGACG_GA_T_GT
CTGTGTCCCCAC_GTTTCACTGACGAGAATCGTGTAG
12BLASTN - Pipeline - Stage 3
- Extends the matches from stage 2
- Scores matches with gaps inserted in both
sequences
- Smith-Waterman dynamic programming algorithm
- <1% of pipeline time is spent in this stage
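The Smith-Waterman recurrence named above can be sketched as follows. The scoring parameters (match +1, mismatch -3, linear gap penalty -2) and the function name are assumptions for illustration; BLASTN's actual gapped extension uses its own scoring and restricts the search around the Stage 2 match.

```python
def smith_waterman_score(a, b, match=1, mismatch=-3, gap=-2):
    """Return the best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# The second string has one extra base, so the best alignment needs a single gap.
gapped = smith_waterman_score("ACGTACGT", "ACGTTACGT")
```

Only two rows of the dynamic programming table are kept at a time, which is enough to compute the score; recovering the alignment itself would require storing the full table or traceback pointers.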
13NCBI - BLASTN
- Stage 1 (word matching) is implemented as a
lookup table
- Efficient only for certain word lengths (w = 11)
- Performance degrades dramatically for larger
query sizes
Pentium-4 2.6GHz 1Gbyte RAM
14Firmware implementation - Stage 1
- Matches 11-mers to query, but generates
false positives
- Eliminates false positives from Bloom filters,
obtains offset in query
- Discards matches that are close to one another
15Bloom filters operation
Programming the query into the Bloom filter
(processing the query)
[Figure: query 11-mer -> K hash functions -> m-bit vector]
16Bloom filters operation
Finding matches in the database
[Figure: database 11-mer -> K hash functions -> m-bit vector; 1 = potential match, 0 = not a match]
17Bloom filters operation
Finding matches in the database
[Figure: database 11-mer -> K hash functions -> m-bit vector; 1 = potential match, 0 = not a match]
False positives are eliminated using a hash
table
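The Bloom filter operation on these slides can be sketched in software as below. The k hash functions are simulated here with salted SHA-256 digests, which is an assumption standing in for the hardware hash functions; the class and function names, and the default values of m and k, are hypothetical. A dictionary plays the role of the hash table that eliminates false positives and returns query offsets.

```python
import hashlib


class BloomFilter:
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = bytearray(m)  # one byte per bit, for clarity rather than compactness

    def _positions(self, word):
        # k bit positions, one per simulated hash function (salted SHA-256).
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{word}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, word):
        for p in self._positions(word):
            self.bits[p] = 1

    def might_contain(self, word):
        # All k bits set: potential match. Any bit clear: definitely not a match.
        return all(self.bits[p] for p in self._positions(word))


def stage1_matches(query, database, w=11, m=1 << 16, k=4):
    bloom = BloomFilter(m, k)
    offsets = {}  # hash table: w-mer -> offsets in the query
    for i in range(len(query) - w + 1):
        word = query[i:i + w]
        bloom.add(word)  # programming the query into the filter
        offsets.setdefault(word, []).append(i)
    hits = []
    for j in range(len(database) - w + 1):
        word = database[j:j + w]
        if bloom.might_contain(word):          # fast filter, may give false positives
            for i in offsets.get(word, ()):    # exact check removes false positives
                hits.append((i, j))
    return hits


query = "ACTGTGTTTCACTGACGGGTGT"
database = "GTGTCCCCAACATTTCACTGACGAGAATCGTGTAG"
hits = stage1_matches(query, database)
```

A Bloom filter never reports a stored word as absent, so shrinking m only increases the number of false positives passed to the hash table; the final list of hits stays the same.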
18Bloom filter performance
19Performance analysis
Firmware vs. software: Stage 1
20Overall system throughput
$\mathrm{Tput}_{\mathrm{overall}} = \min\left(\mathrm{Tput}_{1},\ \mathrm{Tput}_{2+3}\right)$
21Stage 2 in firmware - Throughput
22Stage 2 in firmware - Speedup
23Related work
- Hardware-based commercial systems
- Paracel GeneMatcher™, uses an ASIC and hence is
inflexible
- RDisk, FPGA-based system with throughput of 60
Mbases/s for stage 1
- High-end commercial systems
- Paracel BLASTMachine2™, 32-CPU Linux cluster
- 2.93 Mbases/s for a 2.8 Mbase query
- 2 times faster than 1-node Mercury BLASTN
- TimeLogic DeCypherBLAST™, FPGA-based
- 213 Kbases/s for a 16 Mbase query
- Comparable to 1-node Mercury BLASTN
24Conclusion
- BLASTN on the Mercury system
- Bloom filters to improve performance of stage 1
- Efficient hash functions in hardware
- 7x improvement in speed with only stage 1 in
firmware
- >50x speedup with stage 2 implemented in firmware
- Future work
- Algorithmic changes to stage 2
- Efficient use of hardware capabilities
- Other apps
- BLASTP, BLASTX etc.
25Thank you