Retrieving Objects from Toroidal Mesh Data Using FastBit Technology


Title: Retrieving Objects from Toroidal Mesh Data Using FastBit Technology


1
Retrieving Objects from Toroidal Mesh Data Using
FastBit Technology: A Progress Report
  • Outline
  • Overview of FastBit technology
  • Recent progress

John Wu, Scientific Data Management, Berkeley Lab
http://sdm.lbl.gov/fastbit
2
FastBit Started In a Big Smash
  • Searching for clues to Quark-Gluon Plasma (QGP) in a
    large set of high-energy collisions
  • High-Energy Physics experiment: STAR
  • 600 participants / 50 institutions / 12 countries
  • Data rate: 200 MB/s
  • Data collected: 5 PB
  • 1 billion collision events, 5 MB per event
    (equivalent to having millions of variables)
  • Challenge: finding the 100 or so events with the best
    evidence of QGP

3
FastBit: 10X Faster than DBMS
[Figure panels: 2-D queries; 5-D queries]
  • Queries on the 12 most queried attributes (2.2
    million records) from the STAR High-Energy Physics
    experiment; average attribute cardinality 222,000
  • Experiments confirm that
    • WAH (Word-Aligned Hybrid) compressed indexes are
      10X faster than bitmap indexes from a DBMS and 5X
      faster than our own implementation of BBC
      (Byte-aligned Bitmap Code)
    • The size of WAH compressed indexes is only 30% of
      the raw data size (a B-tree index from a popular
      DBMS is 3-4X the raw data size)

[Wu, Otoo, Shoshani 2001]
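
Presenter's note: to make the idea of word-aligned compression concrete, here is a minimal sketch in the spirit of WAH. It is a simplification written for this transcript, not FastBit's implementation: it assumes 32-bit words, pads the last group with zeros instead of keeping a separate active word, and the names `wah_encode` and `wah_decode` are hypothetical.

```python
GROUP = 31                      # payload bits carried by each 32-bit word
COUNT_MASK = (1 << 30) - 1      # low 30 bits of a fill word hold the run length
ALL_ONES = (1 << GROUP) - 1


def wah_encode(bits):
    """Compress a list of 0/1 values into a list of 32-bit words.

    Literal word: MSB = 0, followed by 31 raw bits.
    Fill word:    MSB = 1, next bit = fill value, low 30 bits = number of
                  consecutive 31-bit groups that consist only of that value.
    """
    words = []
    for start in range(0, len(bits), GROUP):
        chunk = bits[start:start + GROUP]
        chunk = chunk + [0] * (GROUP - len(chunk))   # simplification: zero-pad
        value = 0
        for b in chunk:
            value = (value << 1) | b
        if value == 0 or value == ALL_ONES:
            fill_bit = 1 if value else 0
            last = words[-1] if words else 0
            same_fill = (last >> 31) == 1 and ((last >> 30) & 1) == fill_bit
            if same_fill and (last & COUNT_MASK) < COUNT_MASK:
                words[-1] += 1                       # extend the current run
            else:
                words.append((1 << 31) | (fill_bit << 30) | 1)
        else:
            words.append(value)                      # literal word
    return words


def wah_decode(words, nbits):
    """Expand the words produced by wah_encode back into nbits 0/1 values."""
    bits = []
    for w in words:
        if w >> 31:                                  # fill word
            fill_bit = (w >> 30) & 1
            bits.extend([fill_bit] * (GROUP * (w & COUNT_MASK)))
        else:                                        # literal word
            bits.extend((w >> (GROUP - 1 - i)) & 1 for i in range(GROUP))
    return bits[:nbits]


example_bits = [0] * 310 + [1, 0, 1] + [0] * 28 + [1] * 62
example_words = wah_encode(example_bits)
```

In this example the 403 input bits collapse to three words: one zero-fill covering ten groups, one literal, and one one-fill covering two groups. Long runs of identical bits, which are common in bitmap indexes, are what make this kind of encoding compact.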
4
FastBit Grew with a Big Boom
  • Searching for a more fuel-efficient combustion
    engine (Homogeneous-Charge Compression Ignition
    engine)
  • Requires detailed numerical simulation with
    hundreds of variables
  • Simulation mesh: 1000 x 1000 x 1000
  • 1000s of time steps per simulation
  • Challenge: finding and tracking ignition kernels

5
FastBit Finds Volumes Faster Than Best Isocontour
Finder
  • FastBit finds volume of interest efficiently with
    compressed representation of the volume
  • In theory, FastBit identifies volumes of interest
    as efficiently as the best algorithms that
    identify only the surface (isocontouring)
  • In practice, FastBit is three times faster than the
    best isocontouring algorithm in VTK

[Wu, Koegler, Chen, Shoshani 2003]
[Stockinger, Shalf, Bethel, Wu 2005]
6
FastBit Milestones
  • 2007/08: FastBit speeds up a drug discovery tool
    (first publication not involving any FastBit
    developers)
  • 2007/08: First public release, version a0.7
  • 2007/06: Physical design reviewed
  • 2007/06: First PhD thesis involving FastBit
    completed
  • 2006/03: Formal optimality proved
  • 2006/02: Work on Enron data made headlines at
    PRIMEUR
  • 2005/05: Appeared in ACM TechNews
  • 2005/05: Grid Collector wins ISC Award
  • 2005/01: CRD news report on FastBit
  • 2004/12: WAH patent issued

7
FastBit Progress Report
  • Two-level encoding
  • Feature identification on toroidal mesh

http://sdm.lbl.gov/fastbit
8
Two Levels Are Better Than One
  • The most commonly used bitmap index is one-level
    equality encoded (e1)
  • Multi-level encoding was postulated to possibly
    improve query performance [Wu, Otoo, Shoshani,
    2000] [Sinha, Winslett, 2007]
  • Through extensive analyses, we found the correct
    number of coarse-level bins to use, which ensures
    that the two-level encoding always performs better
    [Wu, Stockinger, Shoshani]

Legend: bn = binary encoding; e1 = one-level equality;
ee = equality-equality; re = range-equality;
ie = interval-equality
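
Presenter's note: the sketch below illustrates the equality-equality (ee) idea with a fine level of one bitmap per value and a coarse level of one bitmap per bin. It is an illustration only, under stated assumptions: values are non-negative integers, bitmaps are plain Python integers (uncompressed), the bin width of 4 is arbitrary rather than the analytically chosen number of coarse bins, and all function names are hypothetical.

```python
from collections import defaultdict


def build_equality_index(values):
    """Fine level: one bitmap per distinct value; bit i is set when values[i] == key."""
    index = defaultdict(int)
    for row, v in enumerate(values):
        index[v] |= 1 << row
    return dict(index)


def build_coarse_index(fine, bin_width):
    """Coarse level: one bitmap per bin of bin_width consecutive integer keys."""
    coarse = defaultdict(int)
    for key, bitmap in fine.items():
        coarse[key // bin_width] |= bitmap
    return dict(coarse)


def range_query(fine, coarse, bin_width, lo, hi):
    """Bitmap of rows with lo <= value < hi.

    Bins that lie entirely inside [lo, hi) are answered with a single coarse
    bitmap; only the partially covered edge bins fall back to fine bitmaps.
    """
    result = 0
    key = lo
    while key < hi:
        if key % bin_width == 0 and key + bin_width <= hi:
            result |= coarse.get(key // bin_width, 0)
            key += bin_width
        else:
            result |= fine.get(key, 0)
            key += 1
    return result


def rows(bitmap):
    """List the row numbers whose bits are set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


values = [3, 7, 12, 5, 8, 15, 0, 9]
fine = build_equality_index(values)
coarse = build_coarse_index(fine, bin_width=4)
hits = rows(range_query(fine, coarse, 4, lo=3, hi=13))
```

For the query 3 <= value < 13, the two fully covered bins (4-7 and 8-11) each cost one coarse bitmap, and only the keys 3 and 12 are read from the fine level. Fewer bitmaps touched per query is the effect the two-level encoding is after.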
9
Feature Identification on Toroidal Mesh
  • Defines connectivity based on the distances
    computed from (x, y, z) coordinates
  • Two ways to speed up the feature identification
    • work with lines instead of points
    • use an efficient connected component labeling
      algorithm
  • 10 to 100 times faster than working with points
    [Sinha, Winslett, Wu]
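
Presenter's note: the two speedups above can be combined in one routine that labels runs of consecutive selected cells (lines) rather than individual points, merging them with union-find. The sketch below is a generic illustration on a small regular 2-D grid with 4-connectivity, which is an assumption made to keep the example short; it is not the labeling code used for the toroidal mesh, and the function names are hypothetical.

```python
def find_runs(row):
    """Return half-open (start, end) runs of consecutive selected cells in one row."""
    runs, start = [], None
    for x, cell in enumerate(row):
        if cell and start is None:
            start = x
        elif not cell and start is not None:
            runs.append((start, x))
            start = None
    if start is not None:
        runs.append((start, len(row)))
    return runs


def label_runs(grid):
    """Label connected regions of a 0/1 grid, one label per run instead of per cell.

    Returns a list of (row, start, end, label) tuples.
    """
    parent = []                          # union-find forest over run ids

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]    # path halving
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    labeled, previous = [], []
    for y, row in enumerate(grid):
        current = []
        for start, end in find_runs(row):
            run_id = len(parent)
            parent.append(run_id)
            # A run touches a run in the previous row when their column ranges overlap.
            for p_start, p_end, p_id in previous:
                if p_start < end and start < p_end:
                    union(run_id, p_id)
            current.append((start, end, run_id))
            labeled.append((y, start, end, run_id))
        previous = current
    return [(y, s, e, find(run_id)) for y, s, e, run_id in labeled]


grid = [
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [1, 0, 0, 0, 0],
]
labels = label_runs(grid)
```

The work is proportional to the number of runs, not the number of selected cells, which is why working with lines pays off when the regions of interest are large.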

10
Better Approach: Redefine Connectivity
  • Redefine connectivity based on toroidal
    coordinates
  • Node A is connected to B and C on the same circle
  • To D and E on the circle just below the current
    one in the same plane
  • To F and G on the circle of the same radius in
    the plane just before
  • By symmetry, there are four more points on
    circles above and after
  • A total of 10 neighbors for every node, more
    than in the previous approach
  • Advantages of such a connectivity definition
    • Neighbors of consecutive nodes on a circle, i.e.,
      an arc, also form arcs
    • These neighboring arcs fall on four different
      circles
    • Our labeling algorithm examines only two out of
      the four circles
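
Presenter's note: the 10-neighbor rule can be written down as a small function. This is a sketch under several assumptions that the slide does not state: each node is indexed as (r, p, t) for radial ring, plane, and position along the circle; every circle has the same number of nodes; t and p wrap around periodically; and the two neighbors on an adjacent circle sit at positions t and t + 1. The function name and the index convention are hypothetical.

```python
def torus_neighbors(r, p, t, n_radial, n_planes, n_theta):
    """Neighbors of node (r, p, t) under the assumed toroidal connectivity.

    r: ring index within a plane (not periodic), p: plane index (periodic),
    t: position along the circle (periodic). Assumes n_planes >= 3 and
    n_theta >= 3 so that the wrapped indices stay distinct.
    """
    # Two neighbors on the same circle.
    neighbors = [
        (r, p, (t - 1) % n_theta),
        (r, p, (t + 1) % n_theta),
    ]
    # Four adjacent circles: below and above in the same plane,
    # and the same radius in the planes before and after.
    adjacent_circles = [
        (r - 1, p),
        (r + 1, p),
        (r, (p - 1) % n_planes),
        (r, (p + 1) % n_planes),
    ]
    for rr, pp in adjacent_circles:
        if 0 <= rr < n_radial:           # innermost and outermost rings lack one circle
            neighbors.append((rr, pp, t))
            neighbors.append((rr, pp, (t + 1) % n_theta))
    return neighbors


interior = torus_neighbors(1, 0, 0, n_radial=4, n_planes=8, n_theta=16)
innermost = torus_neighbors(0, 3, 5, n_radial=4, n_planes=8, n_theta=16)
```

Under these assumptions an interior node has 10 neighbors and a node on the innermost or outermost ring has 8. The neighbors of an arc from t0 to t1 on an adjacent circle are the positions t0 through t1 + 1, which is again an arc, matching the property listed above.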

11
New Connectivity Improves Region Finding
Methods compared       Speedup
Torus 1 v. XYZ         25
Torus 2 v. XYZ         150
Torus 2 v. Torus 1     6
  • Preliminary results
  • Three different labeling methods shown
    • XYZ: a nearest-neighbor mesh constructed from (x,
      y, z) coordinates
    • Torus 1: connectivity described on the previous
      slide, label nodes
    • Torus 2: connectivity described on the previous
      slide, label arcs
  • Speedup: ratio of the total time used by two methods

12
New Approach Scales Well
  • Approach Torus 1 scales linearly with the
    number of nodes in the regions of interest
  • Approach Torus 2 scales linearly with the
    number of arcs in the regions of interest
  • Number of arcs < number of nodes on the
    boundaries of the regions
  • O(arcs) ≤ O(boundary)
  • For regions defined with simple range conditions
    such as potential > 1e-8, where the boundaries of
    the regions are isocontours, approach Torus 2
    scales as well as the best isocontouring
    algorithms
  • Need formal proof

13
Future Plans
  • GTC data
  • Wrap up the current work on 3D GTC data
  • Prepare for new 5D data
  • Add visualization front-end
  • Work with particles
  • FastBit software
  • Python API?
  • Other applications
  • Visualization
  • ?

14
FastBit is an efficient searching tool for
data-driven science. Key techniques in FastBit
have been extensively exercised. If you have an
application that requires searching operations,
feel free to contact us.
Contact Information
FastBit website: http://sdm.lbl.gov/fastbit
John's email: John.Wu_at_nersc.gov
Arie's email: Arie_at_lbl.gov