Fast Exact String Matching On the GPU - PowerPoint PPT Presentation

1
Fast Exact String Matching On the GPU
  • Michael C. Schatz and Cole Trapnell
  • May 8, 2007
  • CMSC 740 Computer Graphics

2
String Matching Applications
  • A very common problem in computational biology is
    to find all occurrences (or approximate
    occurrences) of one string in another string
  • Genome Assembly, Gene Finding, Comparative
    Genomics, Functional analysis of proteins, Motif
    discovery, SNP analysis, Phylogenetic analysis,
    Primer Design
  • Short Read Resequencing: 200 million 50 bp reads
  • Sequence databases are huge, and growing
    exponentially
  • We need ever faster methods for string matching

3
Suffix Trees to the Rescue
  • Tree of all suffixes of string S
  • Suffix i is encoded on the path to leaf i
  • Nodes: positions where suffixes diverge
  • Edges: substrings of S
  • Leaves: starting position of the suffix
  • Suffix links: traverse to the next suffix
  • O(n) construction
    • Ukkonen's algorithm
    • Exploits inter-suffix relationships and suffix
      links
  • O(k) query match, where k = |Q|
    • Every substring S[i..j] is a prefix of suffix i
    • Walk from the root following the characters in
      the query Q
    • Each leaf below the end of the walk marks one
      occurrence of Q in S

Suffix tree of ACATAC
858E Algorithms for Biosequence Analysis
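To illustrate the "walk from the root, then read off the leaves" idea from slide 3, here is a small Python sketch. It deliberately uses a naive, uncompressed suffix trie (one character per edge, O(n^2) construction) rather than a true suffix tree built with Ukkonen's algorithm, so it demonstrates the query, not the linear-time construction. The `$` terminator and the function names are assumptions made for illustration.

```python
# Naive suffix *trie* sketch (one character per edge, O(n^2) build).
# A simplification for illustration, not Ukkonen's O(n) suffix tree.

TERMINATOR = "$"  # assumed end marker so every suffix ends at its own leaf
LEAF = "leaf"     # key under which a leaf stores its 1-based suffix position


def build_suffix_trie(s: str) -> dict:
    """Insert every suffix of s + '$'; the leaf for suffix i stores i (1-based)."""
    root: dict = {}
    text = s + TERMINATOR
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            node = node.setdefault(ch, {})
        node[LEAF] = i + 1
    return root


def find_occurrences(trie: dict, query: str) -> list:
    """Walk the query from the root (k steps), then collect the leaves below."""
    node = trie
    for ch in query:
        if ch not in node:
            return []  # fell off the trie: query is not a substring of s
        node = node[ch]
    hits, stack = [], [node]
    while stack:  # every leaf under the end of the walk is one occurrence
        current = stack.pop()
        for key, child in current.items():
            if key == LEAF:
                hits.append(child)
            else:
                stack.append(child)
    return sorted(hits)


if __name__ == "__main__":
    trie = build_suffix_trie("ACATAC")
    print(find_occurrences(trie, "AC"))  # [1, 5]
```

The trie makes the "every substring is a prefix of some suffix" property concrete, but it can have O(n^2) nodes; a compressed suffix tree merges single-child chains into substring-labeled edges to stay O(n).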
4-8
Suffix Tree Search
[Figure: suffix tree with leaves labeled 1-7; the search path is animated across slides 4-8]
  • Searching for ATA: found at position 3!
  • Searching for AC: found at positions 1 and 5
  • Searching for ACT: falls off the tree, so it is not in S
Suffix tree of ACATAC
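The walkthrough above can be reproduced with a compressed suffix tree where, as on slide 3, internal nodes sit where suffixes diverge and each edge is a substring of S. In this sketch an edge stores only a (start, end) index pair into S + "$" rather than a copy of the substring. Construction is the naive O(n^2) insert-each-suffix method, not Ukkonen's algorithm, and the class and function names are hypothetical.

```python
class Node:
    """Suffix tree node; the edge from its parent is text[start:end]."""

    def __init__(self, start: int = 0, end: int = 0, leaf=None):
        self.start, self.end = start, end
        self.children = {}  # first character of child edge -> Node
        self.leaf = leaf    # 1-based suffix start for leaves, else None


def build_suffix_tree(s: str):
    """Naive O(n^2) construction: insert each suffix of s + '$' in turn."""
    text = s + "$"  # assumed unique terminator
    root = Node()
    for i in range(len(text)):
        node, j = root, i
        while True:
            child = node.children.get(text[j])
            if child is None:  # no edge starts with this character: add a leaf
                node.children[text[j]] = Node(j, len(text), leaf=i + 1)
                break
            k = child.start
            while k < child.end and text[j] == text[k]:
                j += 1
                k += 1
            if k == child.end:  # consumed the whole edge; continue below it
                node = child
                continue
            # Mismatch inside the edge: split it, creating a divergence node.
            mid = Node(child.start, k)
            node.children[text[child.start]] = mid
            child.start = k
            mid.children[text[k]] = child
            mid.children[text[j]] = Node(j, len(text), leaf=i + 1)
            break
    return root, text


def search(root: Node, text: str, query: str) -> list:
    """Follow query characters along edges; collect leaf positions below."""
    node, i = root, 0
    while i < len(query):
        child = node.children.get(query[i])
        if child is None:
            return []  # falls off the tree at a node
        k = child.start
        while k < child.end and i < len(query):
            if text[k] != query[i]:
                return []  # falls off the tree in the middle of an edge
            k += 1
            i += 1
        node = child
    hits, stack = [], [node]
    while stack:
        current = stack.pop()
        if current.leaf is not None:
            hits.append(current.leaf)
        stack.extend(current.children.values())
    return sorted(hits)


if __name__ == "__main__":
    root, text = build_suffix_tree("ACATAC")
    for q in ("ATA", "AC", "ACT"):
        print(q, search(root, text, q))
```

Run on ACATAC, this prints [3] for ATA, [1, 5] for AC, and an empty list for ACT, matching the slides. Storing edges as index pairs keeps the tree O(n) in size, at the cost of reading characters from the reference string while walking an edge.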
9
GPGPU Programming
  • Utilize the highly parallel SIMD architecture of
    the GPU
    • Normally used for parallel triangle
      rendering and texture application
  • Each processor executes the same kernel
  • Dramatic runtime improvement for scientific
    applications
  • CUDA Architecture
    • API and runtime library to implement C-style
      programming of stream processors
  • nVidia GeForce 8800 GTX (G80)
    • 16 multiprocessors with 8 processors each
    • 128 stream processors at 1.35 GHz
    • 768 MB total on-board RAM
    • 2D texture cache for large read-only data

Image from CUDA Programming Guide
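The "each processor executes the same kernel" model can be emulated sequentially to show how each thread finds its own work item. This is a hypothetical Python stand-in, not CUDA code: `launch`, `match_kernel`, and `run_all` are illustrative names, `str.find` stands in for the suffix-tree walk, and the default block size of 128 is an arbitrary choice. The global-id arithmetic mirrors CUDA's `blockIdx.x * blockDim.x + threadIdx.x`.

```python
def launch(kernel, num_blocks: int, threads_per_block: int, *args) -> None:
    """Run `kernel` once per (block, thread) pair; a GPU runs these in parallel."""
    for block_idx in range(num_blocks):
        for thread_idx in range(threads_per_block):
            kernel(block_idx, threads_per_block, thread_idx, *args)


def match_kernel(block_idx, block_dim, thread_idx, reference, queries, results):
    """Same code for every thread; only the computed global id differs."""
    gid = block_idx * block_dim + thread_idx  # like blockIdx.x * blockDim.x + threadIdx.x
    if gid < len(queries):                    # the last block may be only partly used
        # Stand-in for the suffix-tree walk: 0-based position, -1 if absent.
        results[gid] = reference.find(queries[gid])


def run_all(reference: str, queries: list, threads_per_block: int = 128) -> list:
    results = [None] * len(queries)
    num_blocks = (len(queries) + threads_per_block - 1) // threads_per_block  # ceiling division
    launch(match_kernel, num_blocks, threads_per_block, reference, queries, results)
    return results
```

For example, `run_all("ACATAC", ["ATA", "AC", "ACT"], threads_per_block=2)` launches two blocks of two threads; the fourth thread fails the bounds check and does nothing.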
10
Cmatch GPU Algorithm
  • Load Reference String
  • Create Suffix Tree
  • Load Query Strings
  • Transfer data to GPU
  • Execute Query Kernel
  • Up to 128 simultaneous matches on GPU
  • Fetch Results from GPU
  • Output results
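The host-side steps listed above map onto a short driver loop. This sketch is hypothetical: the suffix tree is replaced by a plain `str.find` scan, "transfer" and "fetch" are ordinary list copies, and queries are grouped 128 at a time only to mirror the "up to 128 simultaneous matches" bullet. The function names are illustrative.

```python
MAX_SIMULTANEOUS = 128  # from the slide: up to 128 matches in flight on the GPU


def all_occurrences(reference: str, query: str) -> list:
    """Stand-in for the query kernel: 1-based start positions of query."""
    hits, pos = [], reference.find(query)
    while pos != -1:
        hits.append(pos + 1)
        pos = reference.find(query, pos + 1)  # pos + 1 also finds overlapping hits
    return hits


def cmatch_like_pipeline(reference: str, queries: list) -> list:
    # Load reference / create suffix tree: omitted, the stand-in scans the raw string.
    # Load query strings: `queries` is already in memory here.
    results = []
    for first in range(0, len(queries), MAX_SIMULTANEOUS):
        batch = list(queries[first:first + MAX_SIMULTANEOUS])          # "transfer data"
        batch_results = [all_occurrences(reference, q) for q in batch]  # "execute kernel"
        results.extend(batch_results)                                   # "fetch results"
    return results  # "output results" is left to the caller
```

For example, `cmatch_like_pipeline("ACATAC", ["ATA", "AC", "ACT"])` returns one list of 1-based positions per query, in query order.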

11
Data Structures on the GPU
  • Suffix tree nodes → 2D texture
    • Encode node information and child pointers as
      the RGBA color of a texel
    • Arrange nodes in 32x32 blocks along a
      space-filling curve
    • Optimize placement near the root for inter-thread
      caching, and further down for an individual thread
  • Reference string → 2D texture
    • Access many successive characters along an edge
  • Query strings → on-board RAM
    • |Q|-element array of offsets into one large array
      of strings
  • Results buffer → on-board RAM
    • |Q|-element array holding the id of the last
      visited node for query i
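The query and result layouts above can be sketched on the host side as flat buffers. Assumptions not stated in the slides: queries are ASCII and NUL-terminated inside one byte buffer, offsets are stored as unsigned ints, and -1 marks a result slot that has not been written. The function names are hypothetical.

```python
from array import array


def pack_queries(queries):
    """Concatenate all queries into one buffer and record where each begins."""
    buf = bytearray()
    offsets = array("I")  # |Q| entries; offsets[i] = start of query i in buf
    for q in queries:
        offsets.append(len(buf))
        buf += q.encode("ascii") + b"\0"  # assumed NUL terminator ends each query
    return bytes(buf), offsets


def read_query(buf: bytes, offsets, i: int) -> str:
    """What thread i would do: read its own query starting at offsets[i]."""
    start = offsets[i]
    return buf[start:buf.index(b"\0", start)].decode("ascii")


def make_results_buffer(num_queries: int):
    """|Q| entries: id of the last visited suffix tree node per query (-1 = unset)."""
    return array("i", [-1]) * num_queries
```

One contiguous buffer plus an offset table means a single host-to-device copy for all queries, and each thread needs only its index to locate its query.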

12
Experimental Protocol
  • Comparing running time of (serial) CPU versus
    (parallel) GPU programs
    • CPU: 3.0 GHz Intel Xeon
    • GPU: nVidia GeForce 8800 GTX (128 processors at
      1.35 GHz)
  • Simulate short read resequencing projects by
    extracting substrings of reference sequences
  • References
    • Genome of Bacillus anthracis (5.20 Mbp)
    • Genome of Yersinia pestis (4.6 Mbp)
    • BAC-sized portion of Human Chromosome 2 (200 kbp)
  • Query sets (250 Mbp total each)
    • 10 million x 25 bp
    • 5 million x 50 bp
    • 1.25 million x 200 bp
    • 312,500 x 800 bp
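Each query set above works out to 250 Mbp (for example, 10 million x 25 bp), and the read simulation can be sketched as sampling exact substrings of a reference. Uniform random start positions and the fixed seed are assumptions; the slides only say reads were extracted as substrings of the references.

```python
import random

# (number of reads, read length in bp) for the four query sets on the slide
QUERY_SETS = [(10_000_000, 25), (5_000_000, 50), (1_250_000, 200), (312_500, 800)]


def simulate_reads(reference: str, count: int, length: int, seed: int = 0) -> list:
    """Extract `count` exact substrings of `length` bp at uniformly random positions."""
    if length > len(reference):
        raise ValueError("read length exceeds reference length")
    rng = random.Random(seed)
    last_start = len(reference) - length  # inclusive upper bound for a start position
    return [reference[s:s + length]
            for s in (rng.randint(0, last_start) for _ in range(count))]


if __name__ == "__main__":
    for count, length in QUERY_SETS:
        print(f"{count:>10,} x {length:>3} bp = {count * length / 1e6:.0f} Mbp")
```

Every line of the printed table reports 250 Mbp, which keeps total work roughly constant across read lengths so that timing differences reflect read length rather than data volume.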

13
Query Time Results
Speedup of the GPU match kernel versus CPU match
program.
14
Long Read Query Time Results
Future work: improve the cache hit rate for longer
reads.
15
Processing Time
GPU Cmatch is bounded by the time to construct the
suffix tree and by I/O processing time.
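One way to read "bounded by" is Amdahl's law: if only the matching phase is accelerated, suffix tree construction and I/O cap the end-to-end speedup. The slides do not give the time breakdown, so the fractions used below are hypothetical inputs; only the 35x kernel figure comes from the conclusions slide.

```python
def overall_speedup(matching_fraction: float, kernel_speedup: float) -> float:
    """Amdahl's law.

    matching_fraction: share of the original (CPU) run time spent matching queries.
    kernel_speedup: how much faster the GPU performs that matching share.
    The remaining share (tree construction, I/O) is assumed not to speed up.
    """
    if not 0.0 <= matching_fraction <= 1.0:
        raise ValueError("matching_fraction must be in [0, 1]")
    unaccelerated = 1.0 - matching_fraction
    return 1.0 / (unaccelerated + matching_fraction / kernel_speedup)
```

For example, with a hypothetical 50% matching share and a 35x kernel, `overall_speedup(0.5, 35)` is about 1.94, far below 35, because the untouched half of the run time dominates.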
16
Conclusions
  • We have reduced the computation processing time
    for short read resequencing from hours to
    minutes.
  • Make sure you have sufficient cooling available
  • GPGPU programs with low arithmetic intensity can
    still achieve dramatic performance improvements
    (35x) over CPU execution
  • Utilizing the texture cache with careful node
    placement and minimizing register use were
    essential to high performance
  • A single GPU can supply the same processing power
    as a small computer cluster at a fraction of the
    cost
  • Installing GPUs into an existing cluster can
    provide an order of magnitude increase in
    computing capacity.
  • More information
  • http://www.cbcb.umd.edu/software/cmatch

17
Texture Space-Filling Curve
  • Texture cache organized in 2x2 blocks
  • Try to place all children of a node in the
    same cache block
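The slides do not name the space-filling curve; a Z-order (Morton) curve is one common choice and is assumed here. Decoding a Morton index de-interleaves its bits into (x, y), so indices 4m to 4m+3 always land in one aligned 2x2 square. If a node's children were given consecutive indices starting at a multiple of 4 (a hypothetical numbering), up to four of them would share one 2x2 cache block. The texture width, row-major block order, and function names are also assumptions.

```python
BLOCK = 32            # nodes are arranged in 32x32 texel blocks (from the slides)
BLOCKS_PER_ROW = 64   # hypothetical texture width: 64 blocks = 2048 texels


def morton_decode(index: int) -> tuple:
    """Split a Z-order index into (x, y): even bits go to x, odd bits to y."""
    x = y = 0
    bit = 0
    while index:
        x |= (index & 1) << bit
        index >>= 1
        y |= (index & 1) << bit
        index >>= 1
        bit += 1
    return x, y


def node_texel(node_index: int) -> tuple:
    """Texel (x, y) for a node: pick its 32x32 block, then Z-order inside it."""
    block, local = divmod(node_index, BLOCK * BLOCK)
    block_x, block_y = block % BLOCKS_PER_ROW, block // BLOCKS_PER_ROW  # assumed row-major
    local_x, local_y = morton_decode(local)
    return block_x * BLOCK + local_x, block_y * BLOCK + local_y


def same_cache_block(a: int, b: int) -> bool:
    """True if two nodes fall in the same aligned 2x2 texel square."""
    (ax, ay), (bx, by) = node_texel(a), node_texel(b)
    return (ax // 2, ay // 2) == (bx // 2, by // 2)
```

Because Z-order keeps nearby indices spatially close at every scale, the same numbering also tends to keep a node's grandchildren near it, which is the kind of locality the 2D texture cache rewards.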