Title: Retrieving Objects from Toroidal Mesh Data Using FastBit Technology
1. Retrieving Objects from Toroidal Mesh Data Using FastBit Technology: A Progress Report
- Outline
- Overview of FastBit technology
- Recent progress
John Wu, Scientific Data Management, Berkeley Lab
http://sdm.lbl.gov/fastbit
2. FastBit Started in a Big Smash
- Searching for clues to Quark-Gluon Plasma (QGP) in a large set of high-energy collisions
- High-Energy Physics experiment STAR
- 600 participants / 50 institutions / 12 countries
- Data rate: 200 MB/s
- Data collected: 5 PB
- 1 billion collision events, 5 MB per event (equivalent to having millions of variables)
- Challenge: finding 100 or so events with the best evidence of QGP
3. FastBit 10x Faster than DBMS
[Figure panels: 2-D queries; 5-D queries]
- Queries on the 12 most queried attributes (2.2 million records) from the STAR High-Energy Physics Experiment; average attribute cardinality 222,000
- Experiments confirm that
- WAH compressed indexes are 10X faster than bitmap indexes from a DBMS and 5X faster than our own implementation of BBC
- The size of WAH compressed indexes is only 30% of the raw data size (a B-tree index from a popular DBMS is 3-4X)
[Wu, Otoo, Shoshani 2001]
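As an illustration of how a bitmap index answers the multi-dimensional range queries mentioned above, here is a minimal sketch. It is not the FastBit API and it does not implement WAH compression: it assumes an equality-encoded index in which plain Python integers serve as uncompressed bitmaps, and the attribute names and data (`energy`, `tracks`) are made up for the example.

```python
def build_index(values):
    """Map each distinct value to an int whose bit i is set when values[i] == value."""
    index = {}
    for row, v in enumerate(values):
        index[v] = index.get(v, 0) | (1 << row)
    return index

def range_query(index, lo, hi):
    """OR together the bitmaps of all indexed values v with lo <= v <= hi."""
    result = 0
    for v, bitmap in index.items():
        if lo <= v <= hi:
            result |= bitmap
    return result

def rows(bitmap):
    """List the row numbers whose bits are set in the bitmap."""
    out, row = [], 0
    while bitmap:
        if bitmap & 1:
            out.append(row)
        bitmap >>= 1
        row += 1
    return out

# Hypothetical toy data: two attributes over six records.
energy = [3, 7, 5, 7, 1, 5]
tracks = [2, 9, 4, 8, 9, 1]
energy_index = build_index(energy)
tracks_index = build_index(tracks)

# 2-D query: 5 <= energy <= 7 AND 4 <= tracks <= 9.
# Each condition is resolved by ORing bitmaps; the conditions are combined with AND.
hits = rows(range_query(energy_index, 5, 7) & range_query(tracks_index, 4, 9))
print(hits)
```

Adding a dimension to the query adds one more `range_query` call and one more bitwise AND, which is why this style of index suits ad hoc multi-attribute queries.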
4. FastBit Grew with a Big Boom
- Searching for a more fuel-efficient combustion engine (Homogeneous-Charge Compression Ignition engine)
- Requires detailed numerical simulation with hundreds of variables
- Simulation mesh: 1000 x 1000 x 1000
- 1000s of time steps per simulation
- Challenge: finding and tracking ignition kernels
5. FastBit Finds Volumes Faster Than Best Isocontour Finder
- FastBit finds volumes of interest efficiently using a compressed representation of the volume
- In theory, FastBit identifies volumes of interest as efficiently as the best algorithms that identify only the surface (isocontouring)
- FastBit is three times (3X) faster than the best isocontouring algorithm in VTK
[Wu, Koegler, Chen, Shoshani 2003]
[Stockinger, Shalf, Bethel, Wu 2005]
6. FastBit Milestones
- 2007/08: FastBit speeds up drug discovery tool (first publication not involving any FastBit developers)
- 2007/08: First public release, version a0.7
- 2007/06: Physical design reviewed
- 2007/06: First PhD thesis involving FastBit completed
- 2006/03: Formal optimality proved
- 2006/02: Work on Enron data made headlines at PRIMEUR
- 2005/05: Appeared in ACM TechNews
- 2005/05: Grid Collector wins ISC Award
- 2005/01: CRD news report on FastBit
- 2004/12: WAH patent issued
7. FastBit Progress Report
- Two-level encoding
- Feature identification on toroidal mesh
http://sdm.lbl.gov/fastbit
8. Two Levels Are Better Than One
- The most commonly used bitmap index is one-level equality encoded (e1)
- Multi-level encoding was postulated to possibly improve query performance [Wu, Otoo, Shoshani, 2000; Sinha, Winslett, 2007]
- Through extensive analyses, we found the correct number of coarse-level bins to use, ensuring that the two-level encoding always performs better [Wu, Stockinger, Shoshani]
Legend: bn = binary encoding; e1 = one-level equality; ee = equality-equality; re = range-equality; ie = interval-equality
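To make the two-level idea concrete, the following is a rough sketch of an equality-equality (ee) style index. The slides do not describe FastBit's actual layout, so everything here is an assumption: integer values, a fixed hypothetical `bin_width` for the coarse level, and plain Python integers as uncompressed bitmaps. A range query uses one coarse bitmap for every bin that lies entirely inside the range and falls back to fine bitmaps only at the edges.

```python
def build_two_level(values, bin_width):
    """Build fine (one bitmap per value) and coarse (one bitmap per bin) levels."""
    fine, coarse = {}, {}
    for row, v in enumerate(values):
        fine[v] = fine.get(v, 0) | (1 << row)
        b = v // bin_width
        coarse[b] = coarse.get(b, 0) | (1 << row)
    return fine, coarse

def range_query(fine, coarse, bin_width, lo, hi):
    """Answer lo <= v <= hi; return (bitmap, number of bitmaps read)."""
    result, reads = 0, 0
    v = lo
    while v <= hi:
        b = v // bin_width
        bin_start = b * bin_width
        bin_end = bin_start + bin_width - 1
        if v == bin_start and bin_end <= hi:
            # The whole coarse bin lies inside the range: one read covers it.
            if b in coarse:
                result |= coarse[b]
                reads += 1
            v = bin_end + 1
        else:
            # Edge of the range: fall back to the fine-level bitmap for this value.
            if v in fine:
                result |= fine[v]
                reads += 1
            v += 1
    return result, reads

# Hypothetical data: 20 records holding the values 0..19, coarse bins of width 5.
values = list(range(20))
fine, coarse = build_two_level(values, bin_width=5)
bitmap, reads = range_query(fine, coarse, 5, 3, 16)
one_level_reads = sum(1 for v in range(3, 17) if v in fine)
print(reads, one_level_reads)
```

In this toy case the two-level query reads 6 bitmaps where a one-level equality index would read 14. The choice of `bin_width` controls that trade-off, which is the tuning question the slide refers to as finding the correct number of coarse-level bins.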
9. Feature Identification on Toroidal Mesh
- Defines connectivity based on the distances computed from (x, y, z) coordinates
- Two ways to speed up the feature identification
- work with lines instead of points
- use an efficient connected component labeling algorithm
- 10-100 times faster than working with points
[Sinha, Winslett, Wu]
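The two speedups listed above can be sketched together on a simple case. The slides do not name the labeling algorithm used, so this example substitutes a generic union-find and, as a further simplification, works on a flat 2-D grid with 4-connectivity rather than on the toroidal mesh. Each row is first reduced to lines (runs of consecutive selected cells), and only runs are labeled and merged.

```python
def find(parent, i):
    """Return the root of i, shortening the path as we go."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def union(parent, a, b):
    """Merge the sets containing a and b, keeping the smaller root."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)

def row_runs(row):
    """Return [start, end) column spans of consecutive selected cells."""
    runs, start = [], None
    for col, cell in enumerate(row):
        if cell and start is None:
            start = col
        elif not cell and start is not None:
            runs.append((start, col))
            start = None
    if start is not None:
        runs.append((start, len(row)))
    return runs

def label_runs(grid):
    """Label 4-connected regions by merging overlapping runs in adjacent rows."""
    runs, parent, prev = [], [], []
    for r, row in enumerate(grid):
        cur = []
        for start, end in row_runs(row):
            idx = len(runs)
            runs.append((r, start, end))
            parent.append(idx)
            for p in prev:
                _, p_start, p_end = runs[p]
                if p_start < end and start < p_end:  # column spans overlap
                    union(parent, idx, p)
            cur.append(idx)
        prev = cur
    labels = [find(parent, i) for i in range(len(runs))]
    return runs, labels

# Hypothetical selection mask: 1 marks a cell that satisfies the query.
grid = [
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1],
]
runs, labels = label_runs(grid)
print(len(runs), len(set(labels)))
```

Here 8 selected cells collapse into 6 runs and 3 regions. On real data with long runs, the gap between the number of cells and the number of runs is much larger, which is where working with lines pays off.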
10. Better Approach: Redefine Connectivity
- Redefine connectivity based on toroidal coordinates
- Node A is connected to B and C on the same circle
- To D and E on the circle just below the current one in the same plane
- To F and G on the circle of the same radius in the plane just before
- By symmetry, there are four more points on circles above and after
- A total of 10 neighbors for every node, more than in the previous approach
- Advantages of such a connectivity definition
- Neighbors of consecutive nodes on a circle, i.e., an arc, also form arcs
- These neighboring arcs fall on four different circles
- Our labeling algorithm examines only two out of the four circles
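The 10-neighbor definition above can be written down as a small function. The slides do not specify how a node maps onto the two nodes of an adjacent circle, so this sketch makes several assumptions that may not match the real mesh: every circle has the same number of points `n_points`, a node at position `pos` connects to positions `pos` and `pos + 1` on the circle below and in the plane before (and, to keep the relation symmetric, to `pos - 1` and `pos` above and after), planes wrap around the torus, and the radial direction does not wrap. All names are hypothetical.

```python
def neighbors(plane, circle, pos, n_planes, n_circles, n_points):
    """Return the neighbors of node (plane, circle, pos) under the assumed mapping."""
    nxt = (pos + 1) % n_points
    prv = (pos - 1) % n_points
    out = [(plane, circle, prv), (plane, circle, nxt)]           # B and C
    if circle - 1 >= 0:                                           # D and E
        out += [(plane, circle - 1, pos), (plane, circle - 1, nxt)]
    if circle + 1 < n_circles:                                    # mirror of D and E
        out += [(plane, circle + 1, prv), (plane, circle + 1, pos)]
    before = (plane - 1) % n_planes
    after = (plane + 1) % n_planes
    out += [(before, circle, pos), (before, circle, nxt)]         # F and G
    out += [(after, circle, prv), (after, circle, pos)]           # mirror of F and G
    return out

# Hypothetical mesh size: 4 planes, 3 circles per plane, 6 points per circle.
shape = (4, 3, 6)
interior = neighbors(1, 1, 2, *shape)
print(len(set(interior)))

# Check that the relation is symmetric: if m is a neighbor of n, n is a neighbor of m.
symmetric = all(
    (p, c, j) in neighbors(*m, *shape)
    for p in range(shape[0])
    for c in range(shape[1])
    for j in range(shape[2])
    for m in neighbors(p, c, j, *shape)
)
print(symmetric)
```

Under this mapping, the neighbors of an arc covering positions j..k on one circle are positions j..k+1 on the circle below, which is again a single arc. That is the property the slide lists as an advantage.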
11. New Connectivity Improves Region Finding
Comparison          Speedup
Torus 1 v. XYZ      25
Torus 2 v. XYZ      150
Torus 2 v. Torus 1  6
- Preliminary results
- Three different labeling methods shown
- XYZ: a nearest-neighbor mesh constructed from (x, y, z) coordinates
- Torus 1: connectivity described on the previous page, label nodes
- Torus 2: connectivity described previously, label arcs
- Speedup: ratio of total time used by two methods
12. New Approach Scales Well
- Approach Torus 1 scales linearly with the number of nodes in the regions of interest
- Approach Torus 2 scales linearly with the number of arcs in the regions of interest
- Number of arcs < number of nodes on the boundaries of the regions
- O(arcs) ≤ O(boundary)
- For regions defined with simple range conditions such as potential > 1e-8, where the boundaries of the regions are isocontours, approach Torus 2 scales as well as the best isocontouring algorithms
- Need formal proof
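To see why the number of arcs can be much smaller than the number of nodes, the sketch below extracts arcs from the selection mask of a single circle. It is an illustration only, not FastBit code: the mask is a hypothetical list of booleans, and an arc is taken to be a maximal run of consecutive selected nodes, with the last node of the circle adjacent to the first.

```python
def circle_arcs(selected):
    """Return arcs as (start, length) for maximal runs of selected nodes on a closed circle."""
    n = len(selected)
    if n == 0 or not any(selected):
        return []
    if all(selected):
        return [(0, n)]
    # Begin scanning just after an unselected node so that no arc is split
    # at the point where the circle wraps around.
    first_gap = next(i for i, s in enumerate(selected) if not s)
    arcs, start, length = [], None, 0
    for k in range(1, n + 1):
        i = (first_gap + k) % n
        if selected[i]:
            if length == 0:
                start = i
            length += 1
        elif length:
            arcs.append((start, length))
            length = 0
    return arcs

# Hypothetical mask on a circle of 8 nodes; nodes 6, 7, 0, 1 form one wrapped arc.
mask = [True, True, False, False, True, False, True, True]
arcs = circle_arcs(mask)
print(arcs, sum(mask))
```

Each arc has two end points on its circle, so a labeling pass that touches arcs instead of nodes does work proportional to the number of arcs, no matter how long each arc is.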
13. Future Plans
- GTC data
- Wrap up the current work on 3D GTC data
- Prepare for new 5D data
- Add visualization front-end
- Work with particles
- FastBit software
- Python API?
- Other applications
- Visualization
- ?
14. FastBit is an efficient searching tool for data-driven science. Key techniques in FastBit have been extensively exercised. If you have an application that requires searching operations, feel free to contact us.
Contact Information
FastBit website: http://sdm.lbl.gov/fastbit
John's email: John.Wu_at_nersc.gov
Arie's email: Arie_at_lbl.gov