1
SCIENTIFIC DATA MANAGEMENT
Arie Shoshani
  • Computational Research Division
  • Lawrence Berkeley National Laboratory
  • February 2007

2
Outline
  • Problem areas in managing scientific data
  • Motivating examples
  • Requirements
  • The DOE Scientific Data Management Center
  • A three-layer architectural approach
  • Some results of technologies (details in
    mini-symposium)
  • Specific technologies from LBNL
  • FastBit: innovative bitmap indexing for very large datasets
  • Storage Resource Managers: providing uniform access to storage systems

3
Motivating Example - 1
  • Optimizing Storage Management and Data Access for High Energy and Nuclear Physics Applications

Experiment   members/institutions   date of first data   events/year   volume/year (TB)
STAR         350/35                 2001                 10^8 - 10^9   500
PHENIX       350/35                 2001                 10^9          600
BABAR        300/30                 1999                 10^9          80
CLAS         200/40                 1997                 10^10         300
ATLAS        1200/140               2007                 10^10         5000

STAR = Solenoidal Tracker At RHIC; RHIC = Relativistic Heavy Ion Collider;
LHC = Large Hadron Collider. Includes ATLAS, STAR, ...
[Figure: a mockup of an event]
4
Typical Scientific Exploration Process
  • Generate large amounts of raw data
  • large simulations
  • collect from experiments
  • Post-processing of data
  • analyze data (find particles produced, tracks)
  • generate summary data
  • e.g. momentum, no. of pions, transverse energy
  • Number of properties is large (50-100)
  • Analyze data
  • use summary data as guide
  • extract subsets from the large dataset
  • Need to access events based on partial properties specification (range queries)
  • e.g. ((0.1 < AVpT < 0.2) AND (10 < Np < 20)) OR (N > 6000)
  • apply analysis code

5
Motivating example - 2
  • Combustion simulation: 1000x1000x1000 mesh with 100s of chemical species over 1000s of time steps - ~10^14 data values
  • Astrophysics simulation: 1000x1000x1000 mesh with 10s of variables per cell over 1000s of time steps - ~10^13 data values
  • This is an image of a single variable
  • What's needed is search over multiple variables, such as
  • Temperature > 1000 AND pressure > 10^6 AND HO2 > 10^-7 AND HO2 > 10^-6
  • Combining multiple single-variable indexes efficiently is a challenge
  • Solution: specialized bitmap indexes

6
Motivating Example - 3
  • Earth System Grid
  • Accessing large distributed stores by 100s of scientists
  • Problems
  • Different storage systems
  • Security procedures
  • File streaming
  • Lifetime of request
  • Garbage collection
  • Solution
  • Storage Resource Managers (SRMs)

7
Motivating Example - 4: Fusion Simulation, Coordination between Running Codes
8
Motivating example - 5
  • Data Entry and Browsing tool for entering and
    linking metadata from multiple data sources
  • Metadata Problem for Microarray analysis
  • Microarray schemas are quite complex
  • Many objects: experiments, samples, arrays, hybridizations, measurements, ...
  • Many associations between them
  • Data is generated and processed in multiple
    locations which participate in the data pipeline
  • In this project: Synechococcus sp. WH8102 whole genome
  • microbes are cultured at Scripps Institution of
    Oceanography (SIO)
  • then the sample pool is sent to The Institute for
    Genomics Research (TIGR)
  • then images are sent to Sandia Lab for Hyperspectral Imaging and analysis
  • Metadata needs to be captured and LINKED
  • Generating specialized user interfaces is
    expensive and time-consuming to build and change
  • Data is collected on various systems,
    spreadsheets, notebooks, etc.

9
The kind of technology needed
  • DEB: Data Entry and Browsing Tool
  • Features
  • - Interface based on lab notebook look and feel
  • - Tools are built on top of commercial DBMS
  • - Schema-driven automatic Screen generation

10
Storage Growth is Exponential
  • Unlike compute and network resources, storage
    resources are not reusable
  • Unless data is explicitly removed
  • Need to use storage wisely
  • Checkpointing, remove replicated data
  • Time consuming, tedious tasks
  • Data growth scales with compute scaling
  • Storage will grow even with good practices (such
    as eliminating unnecessary replicas)
  • Not necessarily on supercomputers, but on user/group machines and archival storage
  • Storage cost is a consideration
  • Has to be part of science growth cost
  • But, storage costs going down at a rate similar
    to data growth
  • Need continued investment in new storage
    technologies

Storage Growth 1998-2006 at ORNL (rate 2X /
year)
Storage Growth 1998-2006 at NERSC-LBNL (rate
1.7X / year)
The challenges are in managing the data
11
Data and Storage Challenges End-to-End (3 Phases of Scientific Investigation)
  • Data production phase
  • Data movement
  • I/O to parallel file system
  • Moving data out of supercomputer storage
  • Sustain data rates of GB/sec
  • Observe data during production
  • Automatic generation of metadata
  • Post-processing phase
  • Large-scale (entire datasets) data processing
  • Summarization / statistical properties
  • Reorganization / transposition
  • Generate data at different granularity
  • On-the-fly data processing
  • computations for visualization / monitoring
  • Data extraction / analysis phase
  • Automate data distribution / replication
  • Synchronize replicated data
  • Data lifetime management to unclog storage
  • Extract subsets efficiently
  • Avoid reading unnecessary data
  • Efficient indexes for fixed content data
  • Automated use of metadata
  • Parallel analysis tools
  • Statistical analysis tools
  • Data mining tools

12
The Scientific Data Management Center (Center
for Enabling Technologies - CET)
  • PI Arie Shoshani, LBNL
  • Annual budget: $3.3 million
  • Established 5 years ago (SciDAC-1)
  • Successfully re-competed for the next 5 years
    (SciDAC-2)
  • Featured in second issue of SciDAC magazine
  • Laboratories
  • ANL, ORNL, LBNL, LLNL, PNNL
  • Universities
  • NCSU, NWU, SDSC, UCD, U of Utah, ...

http://www.scidacreview.org/0602/pdf/data.pdf
13
Scientific Data Management Center
[Diagram: scientific simulations & experiments (petabytes) → data manipulation → scientific analysis & discovery (terabytes)]
  • Application areas: Climate Modeling, Astrophysics, Genomics and Proteomics, High Energy Physics, Fusion
  • SDM-ISIC Technology
  • Optimizing shared access from mass storage systems
  • Parallel-IO for various file formats
  • Feature extraction techniques
  • High-dimensional cluster analysis
  • High-dimensional indexing
  • Parallel statistics
  • Current: ~80% of time spent on data manipulation, ~20% on scientific analysis & discovery
  • Getting files from tape archive
  • Extracting subset of data from files
  • Reformatting data
  • Getting data from heterogeneous, distributed systems
  • Moving data over the network
  • Goal (using SDM-Center technology): ~20% of time on data manipulation, ~80% on scientific analysis & discovery
14
A Typical SDM Scenario
[Diagram: Task A (Generate Time-Steps) → Task B (Move TS) → Task C (Analyze TS) and Task D (Visualize TS), mapped onto the following layers]
  • Control Flow Layer (Flow Tier)
  • Applications & Software Tools Layer (Work Tier): Simulation Program, Data Mover, Post Processing, Parallel R, Terascale Browser
  • I/O System Layer: Parallel NetCDF, HDF5 Libraries, Subset extraction, SRM, PVFS, File system
  • Storage & Network Resources Layer
15
Approach
  • Use an integrated framework that
  • Provides a scientific workflow capability
  • Supports data mining and analysis tools
  • Accelerates storage and access to data
  • Simplify data management tasks for the scientist
  • Hide details of underlying parallel and indexing technology
  • Permit assembly of modules using a simple
    graphical workflow description tool

SDM Framework
[Diagram: Scientific Application → Storage Efficient Access Layer → Data Mining & Analysis Layer → Scientific Process Automation Layer → Scientific Understanding]
16
Technology Details by Layer
[Diagram: technologies organized in three layers over the system software]
  • Scientific Process Automation (SPA) Layer: WorkFlow Management Engine, Scientific Workflow Components, Web Wrapping Tools
  • Data Mining and Analysis (DMA) Layer: Efficient indexing (Bitmap Index), Data Analysis and Feature Identification, Parallel R Statistical Analysis, ASPECT Analysis Integration Framework, Efficient Parallel Visualization (pVTK), Data Mining tools (PCA, ICA)
  • Storage Efficient Access (SEA) Layer: Storage Resource Manager (SRM, to HPSS), Parallel Virtual File System, Parallel I/O (ROMIO / MPI-IO), Parallel NetCDF
  • Hardware, OS, and MSS (HPSS)
17
Technology Details by Layer
[Same layered technology diagram as slide 16: SPA, DMA, and SEA layers over Hardware, OS, and MSS (HPSS)]
18
Data Generation
[Diagram: data generation path - Scientific Process Automation Layer (Workflow Design and Execution, Simulation Run) → Data Mining and Analysis Layer → Storage Efficient Access Layer (Parallel netCDF, MPI-IO, PVFS2) → OS, Hardware (Disks, Mass Store)]
19
Parallel NetCDF vs. HDF5 (ANL & NWU)
Interprocess communication; Parallel Virtual File System enhancements and deployment
  • Developed Parallel netCDF
  • Enables high performance parallel I/O to
    netCDF datasets
  • Achieves up to 10 fold performance
    improvement over HDF5
  • Enhanced ROMIO
  • Provides MPI access to PVFS2
  • Advanced parallel file system interfaces for
    more efficient access
  • Developed PVFS2
  • Production use at ANL, Ohio SC, Univ. of
    Utah HPC center
  • Offered on Dell clusters
  • Being ported to IBM BG/L system

[Figure: FLASH I/O benchmark performance (8x8x8 block sizes), before and after]
20
Technology Details by Layer
[Same layered technology diagram as slide 16: SPA, DMA, and SEA layers over Hardware, OS, and MSS (HPSS)]
21
Statistical Computing with R
  • About R (http://www.r-project.org/)
  • R is an Open Source (GPL) and the most widely used programming environment for statistical analysis and graphics, similar to S.
  • Provides good support for both users and developers.
  • Highly extensible via dynamically loadable add-on packages.
  • Originally developed by Robert Gentleman and Ross Ihaka.
  • Calling compiled code:
  • > dyn.load("foo.so")
  • > .C("foobar")
  • > dyn.unload("foo.so")

> library(mva)
> pca <- prcomp(data)
> summary(pca)

> library(rpvm)
> .PVM.start.pvmd()
> .PVM.addhosts(...)
> .PVM.config()
22
Providing Task and Data Parallelism in pR
23
Parallel R (pR) Distribution
http://www.ASPECT-SDM.org/Parallel-R
  • Release history
  • pR enables both data and task parallelism
    (includes task-pR and RScaLAPACK) (version 1.8.1)
  • RScaLAPACK provides R interface to ScaLAPACK
    with its scalability in terms of problem size and
    number of processors using data parallelism
    (release 0.5.1)
  • task-pR achieves parallelism by performing
    out-of-order execution of tasks. With its
    intelligent scheduling mechanism it attains
    significant gain in execution times (release
    0.2.7)
  • pMatrix provides a parallel platform to perform
    major matrix operations in parallel using
    ScaLAPACK and PBLAS Level II III routines

Also available for download from R's CRAN web site (www.R-Project.org), with 37 mirror sites in 20 countries
24
Technology Details by Layer
[Same layered technology diagram as slide 16: SPA, DMA, and SEA layers over Hardware, OS, and MSS (HPSS)]
25
Piecewise Polynomial Models for Classification of
Puncture (Poincaré) plots
  • Classify each of the nodes: quasiperiodic, islands, separatrix
  • Connections between the nodes
  • Want accurate and robust classification, valid
    when few points in each node

National Compact Stellarator Experiment
Quasiperiodic
Islands
Separatrix
26
Polar Coordinates
  • Transform the (x,y) data to polar coordinates (r,θ) (see the sketch below).
  • Advantages of polar coordinates
  • Radial exaggeration reveals some features that
    are hard to see otherwise.
  • Automatically restricts analysis to radial band
    with data, ignoring inside and outside.
  • Easy to handle rotational invariance.

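A minimal sketch of this transform (not code from the project), using NumPy; the axis location (x0, y0) is a hypothetical parameter:

    import numpy as np

    # Map Poincare-plot points (x, y) to polar coordinates (r, theta)
    # about an assumed axis location (x0, y0).
    def to_polar(x, y, x0=0.0, y0=0.0):
        r = np.hypot(x - x0, y - y0)          # radial distance
        theta = np.arctan2(y - y0, x - x0)    # angle in radians
        return r, theta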
27
Piecewise Polynomial Fitting Computing
polynomials
  • In each interval, compute the polynomial coefficients to fit one polynomial to the data.
  • If the error is high, split the data into an upper and a lower group and fit two polynomials, one to each group (a sketch follows the figure caption below).

Blue data. Red polynomials. Black interval
boundaries.
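A hedged sketch of this per-interval fitting step; the degree and error threshold are illustrative choices, not values from the slides:

    import numpy as np

    # Fit one polynomial r(theta) in an interval; if the residual is large,
    # split the points into an upper and a lower group (by radius) and fit
    # one polynomial to each group.
    def fit_interval(theta, r, degree=2, max_err=0.05):
        coeff = np.polyfit(theta, r, degree)
        err = np.abs(np.polyval(coeff, theta) - r).max()
        if err <= max_err:
            return [coeff]                     # one polynomial is enough
        upper = r >= np.median(r)              # split into upper / lower groups
        return [np.polyfit(theta[upper], r[upper], degree),
                np.polyfit(theta[~upper], r[~upper], degree)]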
28
Classification
  • The number of polynomials needed to fit the data and the number of gaps give the information needed to classify the node (see the table and sketch below)

                 one polynomial    two polynomials
  zero gaps      Quasiperiodic     Separatrix
  > zero gaps    Quasiperiodic     Islands

  2 polynomials + 2 gaps → Islands
  2 polynomials + 0 gaps → Separatrix
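A toy encoding of this rule; the function and argument names are made up for illustration:

    # Classify a node from the fit summary, following the table above.
    def classify(num_polynomials, num_gaps):
        if num_polynomials == 2 and num_gaps > 0:
            return "islands"
        if num_polynomials == 2 and num_gaps == 0:
            return "separatrix"
        return "quasiperiodic"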
29
Technology Details by Layer
[Same layered technology diagram as slide 16: SPA, DMA, and SEA layers over Hardware, OS, and MSS (HPSS)]
30
Example Data Flow in Terascale Supernova
Initiative
[Diagram: data flow over the Logistical Network; courtesy of John Blondin]
31
Original TSI Workflow Example (with John Blondin, NCSU)
Automate data generation, transfer and
visualization of a large-scale simulation at ORNL
32
Top level TSI Workflow
Automate data generation, transfer and
visualization of a large-scale simulation at ORNL
[Workflow diagram, ORNL → NCSU: submit job to Cray at ORNL; check whether a time slice is finished; if yes, aggregate all into one large file and save to HPSS; split it into 22 files and store them in XRaid; notify the head node at NC State; the head node submits scheduling to SGE; SGE schedules the transfer for 22 nodes; start EnSight to generate video files at the head node]
33
Using the Scientific Workflow Tool (Kepler), Emphasizing Dataflow (SDSC, NCSU, LLNL)
Automate data generation, transfer and
visualization of a large-scale simulation at ORNL
34
New actors in Fusion workflow to support automated data movement
  • Start two independent processes: (1) the Kepler workflow engine and (2) the simulation program (MPI)
  • Detect when files are generated (File Watcher actor)
  • Login at ORNL with a one-time password (OTP Login actor)
  • Move files (scp File Copier actor)
  • Tar files (Tarring actor)
  • Archive files (Local Archiving actor)
[Diagram: software components over hardware/OS - disk cache on Seaborg (NERSC), disk cache on Ewok (ORNL), HPSS at ORNL]
35
Re-applying Technology
SDM technology, developed for one application,
can be effectively targeted at many other
applications
  Technology                  Initial Application    New Applications
  Parallel NetCDF             Astrophysics           Climate
  Parallel VTK                Astrophysics           Climate
  Compressed bitmaps          HENP                   Combustion, Astrophysics
  Storage Resource Managers   HENP                   Astrophysics
  Feature Selection           Climate                Fusion (exp. & simulation)
  Scientific Workflow         Biology                Astrophysics
36
Broad Impact of the SDM Center
  • Astrophysics
  • High speed storage technology, parallel NetCDF,
    integration software used for Terascale Supernova
    Initiative (TSI) and FLASH simulations
  • Tony Mezzacappa (ORNL), Mike Zingale (U of Chicago), Mike Papka (ANL)
  • Scientific Workflow
  • John Blondin (NCSU), Doug Swesty, Eric Myra (Stony Brook)
  • Climate
  • High speed storage technology, Parallel NetCDF,
    and ICA technology used for Climate Modeling
    projects
  • Ben Santer (LLNL), John Drake (ORNL), John Michalakes (NCAR)
  • Combustion
  • Compressed Bitmap Indexing used for fast
    generation of flame regions and tracking their
    progress over time
  • Wendy Koegler, Jacqueline Chen (Sandia Lab)

ASCI FLASH parallel NetCDF
Dimensionality reduction
Region growing
37
Broad Impact (cont.)
  • Biology
  • Kepler workflow system and web-wrapping
    technology used for executing complex highly
    repetitive workflow tasks for processing
    microarray data
  • Matt Coleman - LLNL
  • High Energy Physics
  • Compressed Bitmap Indexing and Storage Resource
    Managers used for locating desired subsets of
    data (events) and automatically retrieving data
    from HPSS
  • Doug Olson (LBNL), Eric Hjort (LBNL), Jerome Lauret (BNL)
  • Fusion
  • A combination of PCA and ICA technology used to
    identify the key parameters that are relevant to
    the presence of edge harmonic oscillations in a
    Tokamak
  • Keith Burrell - General Atomics
  • Scott Klasky - PPPL

Building a scientific workflow
Dynamic monitoring of HPSS file transfers
Identifying key parameters for the DIII-D
Tokamak
38
Technology Details by Layer
[Same layered technology diagram as slide 16: SPA, DMA, and SEA layers over Hardware, OS, and MSS (HPSS)]
39
FastBit: An Efficient Indexing Technology for Accelerating Data Intensive Science
  • Outline
  • Overview
  • Searching Technology
  • Applications
  • http://sdm.lbl.gov/fastbit

40
Searching Problems in Data Intensive Sciences
  • Find the collision events with the most distinct
    signature of Quark Gluon Plasma
  • Find the ignition kernels in a combustion
    simulation
  • Track a layer of exploding supernova
  • These are not typical database searches
  • Large high-dimensional data sets (1000 time steps
    X 1000 X 1000 X 1000 cells X 100 variables)
  • No modification of individual records during
    queries, i.e., append-only data
  • Complex questions: 500 < Temp < 1000 AND CH3 > 10^-4
  • Large answers (hit thousands or millions of
    records)
  • Seek collective features such as regions of interest, beyond typical aggregates (average, sum)

41
Common Indexing Strategies Not Efficient
  • Task: searching high-dimensional append-only data with ad hoc range queries
  • Most tree-based indices are designed to be
    updated quickly
  • E.g. family of B-Trees
  • Sacrifice search efficiency to permit dynamic
    update
  • Hash-based indices are
  • Efficient for finding a small number of records
  • But, not efficient for ad hoc multi-dimensional
    queries
  • Most multi-dimensional indices suffer from the curse of dimensionality
  • E.g. R-trees, Quad-trees, KD-trees, ...
  • Don't scale to high dimensions (< 20)
  • Are inefficient if some dimensions are not queried

42
Our Approach An Efficient Bitmap Index
  • Bitmap indices
  • Sacrifice update efficiency to gain more search
    efficiency
  • Are efficient for multi-dimensional queries
  • Scale linearly with the number of dimensions actually used in a query
  • Bitmap indices may demand too much space
  • We solve the space problem by developing an efficient compression method that
  • Reduces the index size to typically 30% of the raw data, vs. 300% for some B-tree indices
  • Improves operational efficiency: 10X speedup
  • We have applied FastBit to speed up a number of
    DOE funded applications

43
FastBit In a Nutshell
  • FastBit is designed to search multi-dimensional
    append-only data
  • Conceptually in table format
  • rows → objects
  • columns → attributes
  • FastBit uses vertical (column-oriented)
    organization for the data
  • Efficient for searching
  • FastBit uses bitmap indices with a specialized
    compression method
  • Proven in analysis to be optimal for
    single-attribute queries
  • Superior to others because they are also
    efficient for multi-dimensional queries

44
Bit-Sliced Index
  • Take advantage of the fact that the index needs to support append-only data
  • partition each property into bins
  • (e.g. for 0 < Np < 300, have 300 equal-size bins)
  • for each bin generate a bit vector (a sketch follows below)
  • compress each bit vector (some version of run-length encoding)

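A minimal sketch of this binning step in NumPy (illustrative, not FastBit code); the data values are stand-ins following the Np example above:

    import numpy as np

    # Partition the property Np (0 <= Np < 300) into 300 equal-size bins
    # and build one bit vector per bin.
    np_values = np.random.randint(0, 300, size=10000)    # stand-in data
    nbins = 300
    bin_width = 300 / nbins
    bin_ids = (np_values // bin_width).astype(int)
    bit_vectors = [bin_ids == b for b in range(nbins)]   # one boolean vector per bin
    # each bit vector would then be compressed (run-length encoding)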
45
Basic Bitmap Index
  • First commercial version
  • Model 204, P. O'Neil, 1987
  • Easy to build: faster than building B-trees
  • Efficient for querying: only bitwise logical operations
  • A < 2 → b0 OR b1 (see the example and sketch below)
  • A > 2 → b3 OR b4 OR b5
  • Efficient for multi-dimensional queries
  • Use bitwise operations to combine the partial results
  • Size: one bit per distinct value per object
  • Definition: cardinality = number of distinct values
  • Compact for low-cardinality attributes only, say, < 100
  • Need to control size for high-cardinality attributes

Example: data values A = 0 1 5 3 1 2 0 4 1, one bitmap per distinct value

  b0 (A=0): 1 0 0 0 0 0 1 0 0
  b1 (A=1): 0 1 0 0 1 0 0 0 1
  b2 (A=2): 0 0 0 0 0 1 0 0 0
  b3 (A=3): 0 0 0 1 0 0 0 0 0
  b4 (A=4): 0 0 0 0 0 0 0 1 0
  b5 (A=5): 0 0 1 0 0 0 0 0 0

  A < 2 → b0 OR b1;  2 < A → b3 OR b4 OR b5
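A minimal sketch of such an equality-encoded bitmap index using NumPy (illustrative only, not the FastBit implementation), built on the example values above:

    import numpy as np

    A = np.array([0, 1, 5, 3, 1, 2, 0, 4, 1])       # data values from the example

    # One bit vector per distinct value (b0..b5)
    bitmaps = {v: (A == v) for v in range(A.max() + 1)}

    # A < 2  ->  b0 OR b1
    lt2 = bitmaps[0] | bitmaps[1]
    # A > 2  ->  b3 OR b4 OR b5
    gt2 = bitmaps[3] | bitmaps[4] | bitmaps[5]

    print(np.nonzero(lt2)[0])    # row ids (0-based) with A < 2
    print(np.nonzero(gt2)[0])    # row ids (0-based) with A > 2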
46
Run Length Encoding
  • Uncompressed
  • 000000000000 1111 0...0 1 00000000 11111 0...0 (runs of 12 zeros, 4 ones, 1000 zeros, 1 one, 8 zeros, 5 ones, 492 zeros)
  • Compressed
  • 12, 4, 1000, 1, 8, 5, 492
  • Practical considerations
  • Store very short sequences as-is (literal words)
  • Count bytes/words rather than bits (for long sequences)
  • Use first bit for type of word: literal or count
  • Use second bit of a count word to indicate a 0 or 1 sequence
  • Example words: literal | 31 0-words | literal | 31 0-words
  • 00 0F 00 00  80 00 00 1F  02 01 F0 00  80 00 00 0F
  • Other ideas
  • repeated byte patterns, with counts
  • A well-known method used in Oracle: Byte-aligned Bitmap Code (BBC)

Advantage: logical operations such as AND, OR, NOT, XOR, and COUNT can be performed directly on compressed data (a small run-length encoding sketch follows below)
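A minimal run-length encoding sketch matching the example above (illustrative, not the FastBit or BBC encoding):

    from itertools import groupby

    # Store the lengths of the alternating runs of 0s and 1s instead of raw bits.
    def rle(bits):
        return [(b, sum(1 for _ in g)) for b, g in groupby(bits)]

    bits = [0]*12 + [1]*4 + [0]*1000 + [1] + [0]*8 + [1]*5 + [0]*492
    print([n for _, n in rle(bits)])   # -> [12, 4, 1000, 1, 8, 5, 492]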
47
FastBit Compression Method is Compute-Efficient
Example (2015 bits):
10000000000000000000011100000000000000000000000000000...0000000000000000000000000000000111111111 1111111111111111
Main idea: use run-length encoding, but partition the bits into 31-bit groups on 32-bit machines
[Diagram: the bit stream split into 31-bit groups; neighboring groups with identical bits are merged into one count word (e.g. count = 63 all-zero 31-bit groups); each group is encoded using one word]
  • Name: Word-Aligned Hybrid (WAH) code (US patent 6,831,575); a simplified sketch follows below
  • Key features: WAH is compute-efficient because it
  • Uses the run-length encoding (simple)
  • Allows operations directly on compressed bitmaps
  • Never breaks any words into smaller pieces during
    operations

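A simplified, hedged sketch of this word-aligned grouping (illustrative only; the real, patented FastBit encoder packs fills and literals into 32-bit words):

    # Split the bit stream into 31-bit groups; groups that are all 0s or all 1s
    # become "fill" records carrying a run count, other groups stay "literal".
    def wah_encode(bits, group=31):
        words = []
        for i in range(0, len(bits), group):
            g = bits[i:i + group]
            if all(b == g[0] for b in g):                 # all-0 or all-1 group
                if words and words[-1][:2] == ('fill', g[0]):
                    words[-1] = ('fill', g[0], words[-1][2] + 1)   # merge runs
                else:
                    words.append(('fill', g[0], 1))
            else:
                words.append(('literal', tuple(g)))
        return words

    bits = [1] + [0]*20 + [1]*3 + [0]*(31*63) + [1]*25
    print(wah_encode(bits)[:4])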
48
Compute-Efficient Compression Method: 10 times faster than the best-known method
[Figure: query time vs. selectivity, showing a ~10X speedup]
49
Time to Evaluate a Single-Attribute Range
Condition in FastBit is Optimal
  • Evaluating a single attribute range condition may
    require ORing multiple bitmaps
  • Both analysis and timing measurement confirm that
    the query processing time is at worst
    proportional to the number of hits

[Figures: worst case (uniform random data) and realistic case (Zipf data)]
BBC = Byte-aligned Bitmap Code, the best-known bitmap compression
50
Processing Multi-Dimensional Queries
[Figure: per-attribute bitmaps are ORed to get partial results (e.g. rows 2,4,5,8 for one attribute and rows 1,2,5,7,9 for another), and the partial results are ANDed to give the answer (rows 2 and 5); merging sorted row-id lists from tree-based indices is slow by comparison]
  • Merging results from tree-based indices is slow
  • Because sorting and merging are slow
  • Merging results from bitmap indices is fast
  • Because bitwise operations on bitmaps are
    efficient

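A small sketch of that combining step with NumPy boolean bitmaps (row numbers mirror the figure above; illustrative only):

    import numpy as np

    # Partial results from two attributes (1-based rows 2,4,5,8 and 1,2,5,7,9)
    bitmap_a = np.zeros(9, dtype=bool)
    bitmap_a[[1, 3, 4, 7]] = True
    bitmap_b = np.zeros(9, dtype=bool)
    bitmap_b[[0, 1, 4, 6, 8]] = True

    hits = bitmap_a & bitmap_b          # fast bitwise AND of the partial results
    print(np.nonzero(hits)[0] + 1)      # -> rows 2 and 5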
51
Multi-Attribute Range Queries
[Figures: query response times for 2-D and 5-D queries]
  • Results are based on 12 most queried attributes
    (2.2 million records) from STAR High-Energy
    Physics Experiment with average attribute
    cardinality equal to 222,000
  • WAH compressed indices are 10X faster than bitmap
    indices from a DBMS, 5X faster than our own
    implementation of BBC
  • Size of WAH compressed indices is only 30% of the raw data size (a popular DBMS uses 3-4X the raw data size for B-tree indices)

52
The Case for Query Driven Visualization
  • Support compound range queries, e.g. get all cells where Temperature > 300K AND Pressure < 200 millibars
  • Subsetting: only load data that corresponds to the query.
  • Get rid of visual clutter
  • Reduce load on data analysis pipeline
  • Quickly find and label connected regions
  • Do it really fast!

53
Architecture Overview: Query-Driven Vis. Pipeline
[Diagram: data → FastBit index → query → vis/analysis → display]
Stockinger, Shalf, Bethel, Wu 2005
54
DEX Visualization Pipeline
[Diagram: data → query → Visualization Toolkit (VTK); example shows a 3D visualization of a supernova explosion]
Stockinger, Shalf, Bethel, Wu 2005
55
Extending FastBit to Find Regions of Interest
  • Comparison to what VTK is good at: single-attribute iso-contouring
  • But, FastBit also does well on
  • Multi-attribute search
  • Region finding produces whole volume rather than
    contour
  • Region tracking
  • Proved to have the same theoretical efficiency as
    the best iso-contouring algorithms
  • Measured to be 3X faster than the best
    iso-contouring algorithms
  • Implemented in Dexterous Data Explorer (DEX)
    jointly with Vis group

Stockinger, Shalf, Bethel, Wu 2005
56
Combustion Flame Front Tracking
  • Need to perform
  • Cell identification
  • Identify all cells that satisfy user-specified conditions, such as 600 < Temperature < 700 AND HO2 concentration > 10^-7
  • Region growing
  • Connect neighboring cells into regions
  • Region tracking
  • Track the evolution of the regions (i.e.,
    features) through time
  • All steps are performed with bitmap structures (a generic sketch with standard tools follows below)

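A hedged sketch of the cell-identification and region-growing steps using generic array tools (FastBit does this with compressed bitmaps; the array names and thresholds here are illustrative):

    import numpy as np
    from scipy import ndimage

    temperature = np.random.rand(64, 64, 64) * 1000     # stand-in fields
    ho2 = np.random.rand(64, 64, 64) * 1e-6

    # Cell identification: all cells satisfying the compound range condition
    cells = (temperature > 600) & (temperature < 700) & (ho2 > 1e-7)

    # Region growing: connect neighboring selected cells into labeled regions
    labels, nregions = ndimage.label(cells)
    print(nregions, "connected regions found")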
57
Linear cost with number of segments
Time required to identify regions in 3D Supernova
simulation (LBNL)
On 3D data with over 110 million records, region
finding takes less than 2 seconds
Wu, Koegler, Chen, Shoshani 2003
58
Extending FastBit to Compute Conditional Histograms
  • Conditional histograms are common in data
    analysis
  • E.g., finding the number of malicious network
    connections in a particular time window
  • Top left: a histogram of the number of connections to port 5554 of machines in the LBNL IP address space (the two horizontal axes); the vertical axis is time
  • Two sets of scans are visible as two sheets
  • Bottom left: FastBit computes conditional histograms much faster than common data analysis tools (a small example follows below)
  • 10X faster than ROOT
  • 2X faster than ROOT with FastBit indices

Stockinger, Bethel, Campbell, Dart, Wu 2006
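A minimal sketch of a conditional histogram with NumPy (column names and values are hypothetical stand-ins, not the network data above):

    import numpy as np

    time = np.random.rand(100000) * 3600                 # seconds, stand-in data
    dest_port = np.random.randint(0, 65536, size=100000)

    mask = dest_port == 5554                              # the condition
    hist, edges = np.histogram(time[mask], bins=60)       # connection counts per bin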
59
A Nuclear Physics Example STAR
  • STAR = Solenoidal Tracker At RHIC; RHIC = Relativistic Heavy Ion Collider
  • 600 participants / 50 institutions / 12 countries
    / in production since 2000
  • 100 million collision events a year, 5 MB raw
    data per event, several levels of summary data
  • Generated 3 petabytes and 5 million files

Append-only data, aka write-once read-many (WORM)
data
60
Grid Collector
  • Benefits of the Grid Collector
  • transparent object access
  • Selection of objects based on their attribute
    values
  • Improvement of analysis systems throughput
  • Interactive analysis of data distributed on the
    Grid

61
Finding Needles in STAR Data
  • One of the primary goals of STAR is to search for
    Quark Gluon Plasma (QGP)
  • A small number (hundreds) of collision events
    may contain the clearest evidence of QGP
  • Using high-level summary data, researchers found
    80 special events
  • Have track distributions that are indicative of
    QGP
  • Further analysis needs to access more detailed
    data
  • Detailed data are large (terabytes) and reside on
    HPSS
  • May take many weeks to manually migrate to disk
  • We located and retrieved the 80 events in 15
    minutes

62
Grid Collector Speeds up Analyses
  • Test machine: 2.8 GHz Xeon, 27 MB/s read speed
  • When searching for rare events, say selecting one event out of 1000, using GC is 20 to 50 times faster
  • Using GC to read 1/2 of the events, speedup > 1.5; for 1/10 of the events, speedup > 2.

63
Summary Applications Involving FastBit
STAR: search for rare events with special significance (BNL, STAR collaboration)
Combustion data analysis: finding and tracking ignition kernels (Sandia, Combustion Research Facility)
Dexterous Data Explorer (DEX): interactive exploration of large scientific data, visualizing regions of interest (LBNL Vis group)
Network traffic analysis: interactive analysis of network traffic data for forensic and live stream data (LBNL Vis group, NERSC/ESnet security)
DNA sequencing anomaly detection: finding anomalies in raw DNA sequencing data to diagnose sequencing machine operations and DNA sample preparations (JGI)
  • FastBit implements an efficient patented
    compression technique to speed up the searches in
    data intensive scientific applications

64
Technology Details by Layer
[Same layered technology diagram as slide 16: SPA, DMA, and SEA layers over Hardware, OS, and MSS (HPSS)]
65
What is SRM?
  • Storage Resource Managers (SRM) are middleware
    components whose function is to provide
  • Dynamic space allocation
  • Dynamic file management in space
  • For shared storage components on the WAN

66
Motivation
  • Suppose you want to run a job on your local
    machine
  • Need to allocate space
  • Need to bring all input files
  • Need to ensure correctness of files transferred
  • Need to monitor and recover from errors
  • What if files don't fit in the space? Need to manage file streaming
  • Need to remove files to make space for more files
  • Now, suppose that the machine and storage space
    is a shared resource
  • Need to do the above for many users
  • Need to enforce quotas
  • Need to ensure fairness of scheduling users

67
Motivation
  • Now, suppose you want to do that on a WAN
  • Need to access a variety of storage systems
  • mostly remote systems, need to have access
    permission
  • Need to have special software to access mass
    storage systems
  • Now, suppose you want to run distributed jobs on
    the WAN
  • Need to allocate remote spaces
  • Need to move (stream) files to remote sites
  • Need to manage file outputs and their movement to
    destination site(s)

68
Ubiquitous and Transparent Data Access and
Sharing
[Diagram: multiple data analysis sites sharing petabytes of data on tape (e.g. HPSS) and terabytes on disk]
69
Interoperability of SRMs
[Diagram: client user/applications and Grid middleware call a uniform SRM interface; SRMs front Enstore, JASMine, dCache, Castor, Unix-based disks, and other storage elements (SE)]
70
SDSC Storage Resource Broker - Grid Middleware
This figure was taken from one of the talks by Reagan Moore
[Diagram: client library talking to a local SRB server and a remote SRB server]
71
SRM vs. SRB
  • Storage Resource Broker (SRB)
  • Very successful product from SDSC, has long
    history
  • Is a centralized solution where all requests go
    to a central server that includes a metadata
    catalog (MCAT)
  • Developed by a single institution
  • Storage Resource Management (SRM)
  • Based on open standard
  • Developed by multiple institutions for their
    storage systems
  • Designed for interoperation of heterogeneous
    storage systems
  • Features of SRM that SRB does not deal with
  • Managing storage space dynamically based on the client's request
  • Managing content of space based on lifetime
    controlled by client
  • Support for file streaming by pinning and
    releasing files
  • Several institutions now ask for an SRM interface
    to SRB
  • There is GGF activity to bridge these technologies

72
GGF GIN-Data SRM inter-op testing (GGF = Global Grid Forum, GIN = Grid Interoperability Now)
[Diagram: a client initiates SRM-TESTER (1), which tests storage sites according to spec v1.1 and v2.2 (2) over SRM, GridFTP, HTTP(s), and FTP services, and publishes the test results on the web (3); tested sites include CERN LCG, Grid.IT SRM, FNAL CMS, SDSC OSG, APAC SRM, VU SRM, IC.UK EGEE, LBNL STAR, and UIO ARC; the HRM endpoints also perform writes]
73
Testing Operations Results
  Site              ping   put    get    advisory delete   copy (SRMs)   copy (gsiftp)
  ARC (UIO.NO)      pass   fail   pass   fail              pass          fail
  EGEE (IC.UK)      pass   pass   pass   pass              pass          pass
  CMS (FNAL.GOV)    pass   pass   pass   pass              pass          pass
  LCG/EGEE (CERN)   pass   pass   pass   pass              N.A.          N.A.
  OSG (SDSC)        pass   pass   pass   pass              pass          fail
  STAR (LBNL)       pass   pass   pass   pass              pass          pass
74
Peer-to-Peer Uniform Interface
75
Earth Science Grid Analysis Environment

[Diagram: Earth System Grid analysis environment spanning LBNL, NCAR, LLNL, ORNL, ANL, and ISI, combining HPSS and MSS mass storage, disk caches, HRM/DRM Storage Resource Management, gridFTP servers (including a striped server), an openDAPg server, a Tomcat servlet engine, MyProxy, CAS (Community Authorization Services), MCS (Metadata Cataloguing Services), RLS (Replica Location Services), and a GRAM gatekeeper, communicating via gridFTP, SOAP, and RMI]
76
History and partners in SRM Collaboration
  • 5 years of Storage Resource Management (SRM) activity
  • Experience with SRM system implementations
  • Mass Storage Systems
  • HRM-HPSS (LBNL, ORNL, BNL), Enstore (Fermi),
    JasMINE (Jlab), Castor (CERN), MSS (NCAR), Castor
    (RAL)
  • Disk systems
  • DRM(LBNL), jSRM (Jlab), DPM (CERN), universities
  • Combination systems
  • dCache (Fermi): a sophisticated multi-storage system
  • L-Store (U Vanderbilt): based on Logistical Networking
  • StoRM (ICTP, Trieste, Italy): interface to parallel file systems

77
Standards for Storage Resource Management
  • Main concepts (illustrated in the sketch below)
  • Allocate spaces
  • Get/put files from/into spaces
  • Pin files for a lifetime
  • Release files and spaces
  • Get files into spaces from remote sites
  • Manage directory structures in spaces
  • SRMs communicate as peer-to-peer
  • Negotiate transfer protocols

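A hypothetical client sketch of these concepts; every class and method name below is made up for illustration and is not a real SRM library API:

    # Reserve space, pin input files for a lifetime, run the job, then release.
    class SrmClient:
        def reserve_space(self, size_gb, lifetime_s): ...
        def prepare_to_get(self, surl, pin_lifetime_s): ...   # stages and pins a file
        def release_file(self, surl): ...
        def release_space(self, token): ...

    def stage_inputs(client, surls):
        token = client.reserve_space(size_gb=100, lifetime_s=3600)
        for surl in surls:
            client.prepare_to_get(surl, pin_lifetime_s=1800)  # file stays pinned
        # ... run the analysis against the pinned files ...
        for surl in surls:
            client.release_file(surl)        # allow garbage collection
        client.release_space(token)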
78
DataMover
  • Performs "rcp -r directory" on the WAN

79
SRMs support data movement between storage systems
[Figure: layered Grid services architecture (application, collective, resource, connectivity, and fabric layers) placing request interpretation and planning, workflow, consistency, authorization, and discovery services above data movement, storage resource management, compute resource management, data filtering/transformation, database management, file transfer (GridFTP), and monitoring/auditing services, over communication (TCP/IP) and security (GSI) protocols and a fabric of mass storage (HPSS), other storage systems, compute systems, and networks; based on the Grid architecture paper by the Globus team]
80
Massive Robust File Replication
  • Multi-file replication: why is it a problem?
  • Tedious task: many files, repetitious
  • Lengthy task: can take hours, even days
  • Error prone: need to monitor transfers
  • Error recovery: need to restart file transfers
  • Stage and archive from MSS: limited concurrency, down time, transient failures
  • Commercial MSS: HPSS at NERSC, ORNL, ...
  • Legacy MSS: MSS at NCAR
  • Independent MSS: Castor (CERN), Enstore (Fermilab), JasMINE (Jlab)

81
DataMover: SRMs used in ESG for robust multi-file replication
82
Web-Based File Monitoring Tool
  • Shows:
  • Files already transferred
  • Files during transfer
  • Files to be transferred
  • Also shows for each file:
  • Source URL
  • Target URL
  • Transfer rate

83
File tracking helps to identify bottlenecks
Shows that archiving is the bottleneck
84
File tracking shows recovery from transient
failures
Total: 45 GB
85
Robust Multi-File Replication
  • Main results
  • DataMover is being used in production for over
    three years
  • Moves about 10 TBs a month currently
  • Averages 8 MB/s (64 Mb/sec) over WAN
  • Eliminated person-time to monitor transfer and
    recover from failures
  • Reduced error rates from about 1% to 0.02% (a 50-fold reduction)

http://www.ppdg.net/docs/oct04/ppdg-star-oct04.doc
86
Summary lessons learned
  • Scientific workflow is an important paradigm
  • Coordination of tasks AND Management of data flow
  • Managing repetitive steps
  • Tracking, estimation
  • Efficient I/O is often the bottleneck
  • Technology essential for efficient computation
  • Mass storage needs to be seamlessly managed
  • Opportunities to interact with Math packages
  • General analysis tools are useful
  • Parallelization is key to scaling
  • Visualization is an integral part of analysis
  • Data movement is complex
  • Network infrastructure alone is not enough; it can be unreliable
  • Need robust software to manage failures
  • Need to manage space allocation
  • Managing format mismatch is part of data flow
  • Metadata emerging as an important need
  • Description of experiments/simulation
  • Provenance

87
Data and Storage Challenges Still to be Overcome
  • Fundamental technology areas
  • From the report of the DOE Office of Science Data-Management Workshops (March and May 2004)
  • Efficient access and queries, data integration
  • Distributed data management, data movement,
    networks
  • Storage and caching
  • Data analysis, visualization, and integrated
    environments
  • Metadata, data description, logical organization
  • Workflow, data flow, data transformation
  • General Open Problems
  • Multiple parallel file systems
  • A common data model
  • Coordinated scheduling of resources
  • Reservations and workflow management
  • Multiple data formats
  • Running coupled codes
  • Coordinated data movement (not just files)
  • Data Reliability / monitoring / recovery
  • Tracking data for long running jobs
  • Security: authentication and authorization

URL: http://www-user.slac.stanford.edu/~rmount/dm-workshop-04/Final-report.pdf
88
SDM Mini-Symposium
  • High-Performance Parallel Data and Storage
    Management
  • Alok Choudhary (NWU) and Rob Ross (ANL), Robert
    Latham (ANL)
  • Mining Science Data
  • Chandrika Kamath (LLNL)
  • Accelerating Scientific Exploration with Workflow
    Automation Systems
  • Ilkay Altintas (SDSC), Terence Critchlow (LLNL), Scott Klasky (ORNL), Bertram Ludaescher (UC Davis), Steve Parker (U of Utah), Mladen Vouk (NCSU)
  • High Performance Statistical Computing with
    Parallel R and Star-P
  • Nagiza Samatova (ORNL) and Alan Edelman (MIT)