Title: An Architecture for Large Scale Data
1 An Architecture for Large Scale Data
- Dave Nadeau
- SDSC Scientific Visualization Group
2 Motivation
- Support analysis, filtering, and compositing
- Larger-than-core (and swap) data sets
- Multi-modal and time-varying data
- Multiple data sets simultaneously
- And...
- Do efficient data movement
- Execute well on parallel architectures
- Integrate easily with existing applications and toolkits
- Support Alpha project applications
[Figure panels: CT, Cryosection, Classification]
3 Layered Toolkit Architecture
- Application
- Expression Tree Toolkit: orchestrate filter execution
- Mesh Toolkit: bind a coordinate system to data
- Data Grid Toolkit: manage an N-space data grid
- Data Management: cache pages for lazy I/O
- File Format Handling: support specific file formats
- SRB, ADR, etc.: manage file storage
4 Managing Data Grids
Data Grid Toolkit
- Manage a paged data grid (array-like)
- An N-dimensional grid of cells
- Spatial data and time-series
- Arbitrary cell data content
- Handle larger-than-core data
- Transparently pages data in/out
- Support from ADR DataCutter
- Compressed data (disk and memory)
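The paging idea above can be sketched in a few lines. This is only an illustration, not the toolkit's API: the class name `PagedGrid`, its parameters, and the in-memory `backing` dictionary that stands in for disk storage are all assumptions made for the example, and the least-recently-used eviction policy is one plausible choice among several.

```python
import math
from collections import OrderedDict


class PagedGrid:
    """Hypothetical array-like N-dimensional grid stored in fixed-size pages."""

    def __init__(self, shape, page_shape, max_pages, fill=0):
        self.shape = tuple(shape)
        self.page_shape = tuple(page_shape)
        self.max_pages = max_pages      # pages allowed in memory at once
        self.fill = fill                # value of cells never written
        self.backing = {}               # stand-in for disk: page key -> cells
        self.cache = OrderedDict()      # resident pages, least recent first
        self.loads = 0                  # number of page-ins so far

    def _locate(self, index):
        """Map a cell index to (page key, offset inside that page)."""
        if len(index) != len(self.shape):
            raise IndexError("wrong number of dimensions")
        for i, n in zip(index, self.shape):
            if not 0 <= i < n:
                raise IndexError("cell index out of range")
        key = tuple(i // p for i, p in zip(index, self.page_shape))
        offset = 0
        for i, p in zip(index, self.page_shape):
            offset = offset * p + (i % p)   # row-major within the page
        return key, offset

    def _page(self, key):
        """Return the page, paging it in (and evicting another) if needed."""
        if key in self.cache:
            self.cache.move_to_end(key)     # mark as most recently used
            return self.cache[key]
        self.loads += 1
        page = self.backing.get(key)
        if page is None:
            page = [self.fill] * math.prod(self.page_shape)
        self.cache[key] = page
        if len(self.cache) > self.max_pages:
            old_key, old_page = self.cache.popitem(last=False)
            self.backing[old_key] = old_page    # page out the oldest page
        return page

    def __getitem__(self, index):
        key, offset = self._locate(index)
        return self._page(key)[offset]

    def __setitem__(self, index, value):
        key, offset = self._locate(index)
        self._page(key)[offset] = value
```

A caller reads and writes cells as `grid[i, j]` and never sees which pages are resident, which is the sense in which the paging is transparent.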
5 Pre-fetching Intelligently
Data Grid Toolkit
- Random access (slow)
- Get/set cells in any order
- Structured access (faster)
- Get/set cells in a pre-defined order
- Data-order access (fastest)
- Get/set cells in the data's storage order
[Figure: three 3 x 3 grids of numbered cells, one per access mode]
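Why storage-order access is the cheapest of the three modes can be illustrated by counting page loads. The sketch below is an assumption-laden toy, not the toolkit's behavior: it assumes a row-major grid cut into pages of consecutive cells, a small least-recently-used page cache, and a hypothetical helper `count_page_loads`. It models no pre-fetching, so it only compares the two extremes (data-order and random access).

```python
import random
from collections import OrderedDict


def count_page_loads(order, width, page_size, max_pages):
    """Count page-ins for a sequence of (row, col) cell accesses.

    Cells are assumed to be stored row-major, page_size cells per page,
    with at most max_pages pages resident (LRU eviction).
    """
    cache = OrderedDict()
    loads = 0
    for row, col in order:
        page = (row * width + col) // page_size
        if page in cache:
            cache.move_to_end(page)         # hit: refresh recency
        else:
            loads += 1                      # miss: page must be read
            cache[page] = True
            if len(cache) > max_pages:
                cache.popitem(last=False)   # evict least recently used
    return loads


WIDTH = HEIGHT = 32
cells = [(r, c) for r in range(HEIGHT) for c in range(WIDTH)]  # storage order
shuffled = cells[:]
random.Random(0).shuffle(shuffled)          # fixed seed for repeatability

data_order_loads = count_page_loads(cells, WIDTH, page_size=32, max_pages=4)
random_loads = count_page_loads(shuffled, WIDTH, page_size=32, max_pages=4)
print("data-order loads:", data_order_loads)
print("random-order loads:", random_loads)
```

In this setup the data-order traversal reads each of the 32 pages exactly once, while the shuffled traversal reloads pages repeatedly because the cache holds only four of them.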
6 Paging Intelligently
Data Grid Toolkit
- Neighborhood-aware paging
- Page in nearby cells in N dimensions
- Support convolution filtering, rendering, marching-cubes, ...
[Figure: grid of cells numbered 0 to 24, with the current center cell and the filter window marked]
Keep neighboring cells paged-in as well
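One way to decide which pages to keep resident is to compute every page that a filter window around the current cell can touch. The function below is a minimal sketch under assumed conventions (a square window of a given `radius`, fixed-size pages, and the hypothetical name `pages_for_window`); it works for any number of dimensions.

```python
from itertools import product


def pages_for_window(center, radius, shape, page_shape):
    """Return the set of page keys overlapped by a filter window.

    center     -- cell index of the current center cell, one entry per axis
    radius     -- window half-width in cells (window is 2*radius + 1 wide)
    shape      -- grid size in cells along each axis
    page_shape -- page size in cells along each axis
    """
    per_axis = []
    for c, n, p in zip(center, shape, page_shape):
        lo = max(c - radius, 0)         # clip the window to the grid
        hi = min(c + radius, n - 1)
        per_axis.append(range(lo // p, hi // p + 1))
    return set(product(*per_axis))


# A 3 x 3 window centered on a page corner straddles four pages.
print(sorted(pages_for_window((4, 4), 1, (16, 16), (4, 4))))
```

A pager could pin the returned pages before the filter runs, so that a convolution or marching-cubes step never stalls on a neighbor that was just paged out.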
7 Using Coordinate Systems
Mesh Toolkit
- Bind a coordinate system to a data grid
- Euclidean, cylindrical, spherical, time-series, ...
- Uniform, structured, unstructured
- Handle coordinate system-based operations
- Resampling with interpolation
- Lazy-evaluation
- Multiple file format handlers
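Binding a coordinate system to a grid and resampling with interpolation might look like the following for the simplest case, a uniform 2-D Euclidean mesh. The class name `UniformMesh2D`, its constructor arguments, and the choice of bilinear interpolation are assumptions for illustration; the slides do not specify the toolkit's interface or interpolation scheme.

```python
class UniformMesh2D:
    """Hypothetical uniform mesh: world coordinates bound to a 2-D grid."""

    def __init__(self, values, origin, spacing):
        # values[j][i] is the cell at x index i and y index j.
        if len(values) < 2 or len(values[0]) < 2:
            raise ValueError("need at least a 2 x 2 grid to interpolate")
        self.values = values
        self.origin = origin        # world (x, y) of cell (0, 0)
        self.spacing = spacing      # world distance between cells (dx, dy)

    def sample(self, x, y):
        """Resample at world position (x, y) with bilinear interpolation."""
        ny, nx = len(self.values), len(self.values[0])
        # Convert world coordinates to fractional grid coordinates,
        # clamped to the grid so outside points take the edge value.
        fx = (x - self.origin[0]) / self.spacing[0]
        fy = (y - self.origin[1]) / self.spacing[1]
        fx = min(max(fx, 0.0), nx - 1)
        fy = min(max(fy, 0.0), ny - 1)
        i0 = min(int(fx), nx - 2)
        j0 = min(int(fy), ny - 2)
        tx, ty = fx - i0, fy - j0
        v = self.values
        top = v[j0][i0] * (1 - tx) + v[j0][i0 + 1] * tx
        bottom = v[j0 + 1][i0] * (1 - tx) + v[j0 + 1][i0 + 1] * tx
        return top * (1 - ty) + bottom * ty


mesh = UniformMesh2D([[0, 10], [20, 30]], origin=(100.0, 200.0), spacing=(2.0, 4.0))
print(mesh.sample(101.0, 202.0))    # midpoint of the four cells
```

Because `sample` computes a value only when asked, a caller that needs a few positions never touches the rest of the grid, which is the spirit of the lazy evaluation listed above.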
8 Operating on Data
Expression Tree Toolkit
- Define an expression tree for data operations
- Leaf nodes are data sets, functions, ...
- Interior nodes are composite, filter, ...
- Transforms align overlapping data sets
- Execute it to generate samples
- Client defines the expression
- Server on big iron executes it
[Figure: client and server]
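The expression-tree idea can be sketched with a handful of node classes. Everything named here (`DataSet`, `Function`, `Filter`, `Composite`, and the `sample` method) is hypothetical, and the tree is indexed by a single integer to keep the example short. The point is only that building the tree does no work; samples are produced when the tree is executed.

```python
class DataSet:
    """Leaf node: samples come from stored data."""

    def __init__(self, cells):
        self.cells = cells

    def sample(self, i):
        return self.cells[i]


class Function:
    """Leaf node: samples are computed from the index."""

    def __init__(self, fn):
        self.fn = fn

    def sample(self, i):
        return self.fn(i)


class Filter:
    """Interior node: transforms the samples of one child."""

    def __init__(self, fn, child):
        self.fn, self.child = fn, child

    def sample(self, i):
        return self.fn(self.child.sample(i))


class Composite:
    """Interior node: combines the samples of several children."""

    def __init__(self, fn, *children):
        self.fn, self.children = fn, children

    def sample(self, i):
        return self.fn(*[child.sample(i) for child in self.children])


# Defining the expression is cheap; nothing is read or computed yet.
ct = DataSet([1, 2, 3])
ramp = Function(lambda i: 10 * i)
tree = Composite(max, Filter(lambda v: v * 2, ct), ramp)

# Executing the tree generates the samples.
samples = [tree.sample(i) for i in range(3)]
print(samples)
```

In a client/server split, a client would only need to describe a tree like `tree` to the server; how that description is serialized and sent is outside this sketch.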
9 Operating on Expressions
Expression Tree Toolkit
- Expressions can be optimized
- Re-order operators
- Similar to optimizing compilers and databases
- Sample order can be optimized
- Re-order data accesses for better cache efficiency
- Data can be staged and intermediate results cached
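Caching intermediate results is the easiest of these optimizations to show. In the sketch below, the `Node` and `Cached` classes are hypothetical; `Cached` wraps a subtree and remembers each sample it has produced, so a subtree shared by two branches is evaluated once per sample instead of twice. Operator re-ordering and sample re-ordering are not shown.

```python
class Node:
    """Hypothetical tree node; leaves take the index, others child values."""

    def __init__(self, fn, *children):
        self.fn, self.children = fn, children
        self.calls = 0                  # how many times this node ran

    def sample(self, i):
        self.calls += 1
        if not self.children:
            return self.fn(i)           # leaf: compute from the index
        return self.fn(*[child.sample(i) for child in self.children])


class Cached:
    """Wrapper that stores a subtree's results by sample index."""

    def __init__(self, node):
        self.node = node
        self.cache = {}

    def sample(self, i):
        if i not in self.cache:
            self.cache[i] = self.node.sample(i)
        return self.cache[i]


expensive = Node(lambda i: i * i)       # stand-in for a costly filter
shared = Cached(expensive)
tree = Node(lambda a, b: a + b, shared, shared)  # subtree used twice

results = [tree.sample(i) for i in range(4)]
print(results, "expensive node ran", expensive.calls, "times")
```

Without the wrapper the costly node would run eight times for these four samples; with it, four.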
10 Combining Brain Data Sets
Expression Tree Toolkit
Composite
Mask by Hue
Extract Hue
Scalar to RGB
RGB to HSI
Scalar CT-scan (512 x 512 x 230)
Color Cryosection (547 x 710 x 672)
Color Segmentation (547 x 710 x 672)
11 Combining Brain Data Sets
Expression Tree Toolkit
Composited
CT
Cryosection
12 Combining Stellar Data Sets
Expression Tree Toolkit
- Complex expression trees
- 60 nodes in the Orion body
- 90 separate expression trees
- Orion, proplyds, shock fronts, ...
13 And more toolkits...
Other Toolkits
- Interactive imaging with...
- Mitsubishi VolumePro cards
- Point clouds and 3D texture mapping with graphics pipelines
- High-quality imaging with VISTA...
PointCloud
3DTexture
VolumePro
VISTA
14 Design Team
- Scripps Research: Art Olson, Mike Pique, Michel Sanner
- SDSC: Bernard Pailthorpe, Dave Nadeau, Jon Genetti, John Moreland, Mike Bailey, Rich Charles, Alex Decastro
- U. Texas: Chandrajit Bajaj, Ariel Shamir
15 Data-Visualization Pipeline
Get data from disk efficiently
Manage data in memory efficiently
Compute on data efficiently
Visualize data efficiently
Data Orchestration
Computation
Visualization
MCAT (Metadata)
KeLP FloorPlan
. . .
SRB Server
ADR DataCutter
SRB Server
16 Data-Visualization Pipeline
Get data from disk efficiently
Manage data in memory efficiently
Compute on data efficiently
Visualize data efficiently
Data Orchestration
Computation
Visualization
VISTA Renderer
. . .
Data - Vis Toolkits
Interaction Tools