An Architecture for Large Scale Data

1
An Architecture for Large Scale Data
  • Dave Nadeau
  • SDSC Scientific Visualization Group

2
Motivation
  • Support analysis, filtering, and compositing
  • Larger-than-core (and swap) data sets
  • Multi-modal and time-varying data
  • Multiple data sets simultaneously
  • And...
  • Do efficient data movement
  • Execute well on parallel architectures
  • Integrate easily with existing applications and toolkits
  • Support Alpha project applications

[Figure labels: CT, Cryosection, Classification]
3
Layered Toolkit Architecture
  • Application
  • Expression Tree Toolkit: Orchestrate filter execution
  • Mesh Toolkit: Bind a coord. system to data
  • Data Grid Toolkit: Manage an N-space data grid
  • Data Management: Cache pages for lazy I/O
  • File Format Handling: Support specific file formats
  • SRB, ADR, etc.: Manage file storage
4
Managing Data Grids
Data Grid Toolkit
  • Manage a paged data grid (array-like)
  • An N-dimensional grid of cells
  • Spatial data and time-series
  • Arbitrary cell data content
  • Handle larger-than-core data
  • Transparently pages data in/out
  • Support from ADR DataCutter
  • Compressed data (disk and memory)
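
A paged data grid of the kind described on this slide can be sketched as follows. This is only an illustration under stated assumptions: the class name `PagedGrid`, its parameters, and the least-recently-used eviction policy are hypothetical and are not taken from the Data Grid Toolkit. A dictionary stands in for disk storage.

```python
from collections import OrderedDict


class PagedGrid:
    """Toy N-dimensional grid that keeps only a few pages in memory (hypothetical)."""

    def __init__(self, shape, page_shape, max_pages=4, fill=0):
        self.shape = tuple(shape)
        self.page_shape = tuple(page_shape)
        self.max_pages = max_pages
        self.fill = fill
        self._store = {}              # stands in for disk: page id -> dict of cells
        self._cache = OrderedDict()   # in-memory pages, least recently used first
        self.page_loads = 0           # how many times a page was brought into memory

    def _locate(self, index):
        # Split a cell index into (page id, offset within that page).
        if len(index) != len(self.shape) or any(
            not 0 <= i < n for i, n in zip(index, self.shape)
        ):
            raise IndexError(index)
        page_id = tuple(i // p for i, p in zip(index, self.page_shape))
        offset = tuple(i % p for i, p in zip(index, self.page_shape))
        return page_id, offset

    def _page(self, page_id):
        # Return the page, paging it in (and paging another out) if needed.
        if page_id in self._cache:
            self._cache.move_to_end(page_id)
            return self._cache[page_id]
        self.page_loads += 1
        page = self._store.pop(page_id, None)
        if page is None:
            page = {}
        self._cache[page_id] = page
        if len(self._cache) > self.max_pages:
            old_id, old_page = self._cache.popitem(last=False)
            self._store[old_id] = old_page    # page out the least recently used page
        return page

    def get(self, index):
        page_id, offset = self._locate(index)
        return self._page(page_id).get(offset, self.fill)

    def set(self, index, value):
        page_id, offset = self._locate(index)
        self._page(page_id)[offset] = value


grid = PagedGrid(shape=(8, 8, 8), page_shape=(4, 4, 4), max_pages=2)
grid.set((0, 0, 0), 1.5)
grid.set((7, 7, 7), 2.5)
grid.set((0, 7, 0), 3.5)      # a third page forces the first one out
print(grid.get((0, 0, 0)))    # paged back in transparently
```

The caller only ever uses `get` and `set`; which pages are resident is hidden behind them, which is the sense of "transparently" used above.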

5
Pre-fetching Intelligently
Data Grid Toolkit
  • Random access (slow)
  • Get/set cells in any order
  • Structured access (faster)
  • Get/set cells in a pre-defined order
  • Data-order access (fastest)
  • Get/set cells in the data's storage order
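
The effect of access order can be illustrated by counting page loads for different traversals of the same grid. This sketch makes several assumptions that are not in the slides: a 16 x 16 grid, pages that each hold 4 full rows, a one-page cache, and the helper name `count_page_loads`. It counts loads only and does not model pre-fetching itself.

```python
import random
from itertools import product


def count_page_loads(order, page_shape, cache_pages=1):
    """Count page loads for a cell visit order, using a small LRU page cache."""
    cache = []    # page ids, most recently used last
    loads = 0
    for index in order:
        page_id = tuple(i // p for i, p in zip(index, page_shape))
        if page_id in cache:
            cache.remove(page_id)
        else:
            loads += 1
            if len(cache) == cache_pages:
                cache.pop(0)          # evict the least recently used page
        cache.append(page_id)
    return loads


shape, page_shape = (16, 16), (4, 16)     # assumed: each page holds 4 full rows
cells = list(product(range(shape[0]), range(shape[1])))

data_order = cells                                          # row-major, matches the pages
column_order = sorted(cells, key=lambda c: (c[1], c[0]))    # pre-defined, but across pages
random_order = random.Random(0).sample(cells, len(cells))   # any order

for name, order in [("data-order", data_order),
                    ("column", column_order),
                    ("random", random_order)]:
    print(name, count_page_loads(order, page_shape))
```

Under these assumptions the data-order traversal loads each of the 4 pages once, while the column traversal loads a page 64 times. A pre-defined order is still useful even when it crosses pages, because the next page is known in advance and could be fetched ahead of time.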

[Figure: three grids of cells numbered 1 to 9, one per access pattern]
6
Paging Intelligently
Data Grid Toolkit
  • Neighborhood-aware paging
  • Page in nearby cells in N dimensions
  • Support convolution filtering, rendering,
    marching-cubes, ...
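
Neighborhood-aware paging can be sketched as a function that, given the current center cell and a filter radius, returns every page the filter window touches so that those pages can be kept resident. The function name `pages_for_window` and the square window are assumptions made for illustration.

```python
from itertools import product


def pages_for_window(center, radius, shape, page_shape):
    """Return the set of page ids touched by a filter window around `center`."""
    ranges = []
    for c, n, p in zip(center, shape, page_shape):
        lo = max(c - radius, 0)         # clamp the window to the grid
        hi = min(c + radius, n - 1)
        ranges.append(range(lo // p, hi // p + 1))
    return set(product(*ranges))


# A 3 x 3 window in the middle of one page needs only that page...
print(pages_for_window((1, 1), 1, shape=(8, 8), page_shape=(4, 4)))
# ...but a window that straddles page corners needs four pages.
print(pages_for_window((4, 4), 1, shape=(8, 8), page_shape=(4, 4)))
```

The same function works in any number of dimensions, since it treats each axis independently.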

[Figure: grid of cells numbered 0 to 24, with labels "Current center cell" and "Filter window"]
Keep neighboring cells paged-in as well
7
Using Coordinate Systems
Mesh Toolkit
  • Bind a coordinate system to a data grid
  • Euclidean, cylindrical, spherical, time-series,
    ...
  • Uniform, structured, unstructured
  • Handle coordinate system-based operations
  • Resampling with interpolation
  • Lazy-evaluation
  • Multiple file format handlers
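
Binding a coordinate system to a grid and resampling with interpolation can be sketched for the simplest case, a uniform Euclidean mesh. The class name `UniformMesh`, its constructor arguments, and the choice of multilinear interpolation are assumptions, not the Mesh Toolkit's actual interface. Cell values come from a callable, so nothing is computed until a sample is requested.

```python
from itertools import product
from math import floor


class UniformMesh:
    """Binds world coordinates to a grid: world = origin + index * spacing."""

    def __init__(self, values, shape, origin, spacing):
        self.values = values      # callable: integer index tuple -> cell value
        self.shape = shape
        self.origin = origin
        self.spacing = spacing

    def sample(self, point):
        """Multilinear interpolation at a world-space point, evaluated on request."""
        base, frac = [], []
        for x, o, s, n in zip(point, self.origin, self.spacing, self.shape):
            g = (x - o) / s                   # continuous grid coordinate
            if not 0 <= g <= n - 1:
                raise ValueError("point lies outside the mesh")
            i = min(int(floor(g)), n - 2) if n > 1 else 0
            base.append(i)
            frac.append(g - i)
        total = 0.0
        for corner in product((0, 1), repeat=len(base)):
            weight = 1.0
            for c, f in zip(corner, frac):
                weight *= f if c else 1.0 - f
            if weight:                        # skip corners that contribute nothing
                index = tuple(b + c for b, c in zip(base, corner))
                total += weight * self.values(index)
        return total


mesh = UniformMesh(values=lambda idx: idx[0] + 10 * idx[1],
                   shape=(3, 3), origin=(0.0, 0.0), spacing=(0.5, 2.0))
print(mesh.sample((0.25, 1.0)))   # halfway between four cells
```

Cylindrical, spherical, structured, or unstructured meshes would need a different mapping from world coordinates to cells, but the same `sample` entry point.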

8
Operating on Data
Expression Tree Toolkit
  • Define an expression tree for data operations
  • Leaf nodes are data sets, functions, ...
  • Interior nodes are composite, filter, ...
  • Transforms align overlapping data sets
  • Execute it to generate samples
  • Client defines the expression
  • Server on big iron executes it
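
The tree structure described here can be sketched with a few node classes. All class names (`Leaf`, `Transform`, `Filter`, `Composite`) and the `sample` method are hypothetical; the sketch runs in one process and does not model the client/server split.

```python
class Leaf:
    """Leaf node: a data set or function, sampled on demand."""

    def __init__(self, fn):
        self.fn = fn

    def sample(self, point):
        return self.fn(point)


class Transform:
    """Interior node: shifts a point into the child's coordinate frame."""

    def __init__(self, child, offset):
        self.child, self.offset = child, offset

    def sample(self, point):
        return self.child.sample(tuple(x - o for x, o in zip(point, self.offset)))


class Filter:
    """Interior node: applies a per-sample operation to one child."""

    def __init__(self, child, op):
        self.child, self.op = child, op

    def sample(self, point):
        return self.op(self.child.sample(point))


class Composite:
    """Interior node: combines samples from several children."""

    def __init__(self, op, *children):
        self.op, self.children = op, children

    def sample(self, point):
        return self.op(*(c.sample(point) for c in self.children))


# The "client" side: define the expression.
density = Leaf(lambda p: p[0] + p[1])
mask = Leaf(lambda p: 1.0 if p[0] >= 2 else 0.0)
tree = Composite(lambda value, m: value * m,
                 Filter(density, lambda v: 2 * v),
                 Transform(mask, offset=(1, 0)))

# The "server" side: execute it to generate samples.
samples = [tree.sample((x, 0)) for x in range(5)]
print(samples)
```

Because the tree is plain data until `sample` is called, it could in principle be built on one machine and executed on another, which is the division of labor the slide describes.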

[Figure labels: Client, Server]
9
Operating on Expressions
Expression Tree Toolkit
  • Expressions can be optimized
  • Re-order operators
  • Similar to optimizing compilers and databases
  • Sample order can be optimized
  • Re-order data accesses for better cache
    efficiency
  • Data can be staged and intermediate results cached
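
One example of operator re-ordering is moving a subsampling step below a per-sample filter, so the filter runs on fewer values. The node classes and the `optimize` rewrite below are hypothetical and show a single rewrite rule; they are not the toolkit's optimizer.

```python
class Data:
    """Leaf node holding a list of values."""

    def __init__(self, values):
        self.values = list(values)

    def run(self):
        return list(self.values)


class Pointwise:
    """Applies fn to every value of its child independently."""

    def __init__(self, fn, child):
        self.fn, self.child = fn, child

    def run(self):
        return [self.fn(v) for v in self.child.run()]


class Subsample:
    """Keeps every step-th value of its child."""

    def __init__(self, step, child):
        self.step, self.child = step, child

    def run(self):
        return self.child.run()[::self.step]


def optimize(node):
    """Push Subsample below Pointwise so the filter runs on fewer samples."""
    if isinstance(node, Subsample):
        child = optimize(node.child)
        if isinstance(child, Pointwise):
            # Valid because a pointwise filter treats each value independently.
            return Pointwise(child.fn, optimize(Subsample(node.step, child.child)))
        return Subsample(node.step, child)
    if isinstance(node, Pointwise):
        return Pointwise(node.fn, optimize(node.child))
    return node


calls = []                      # records every value the filter is applied to


def expensive(v):
    calls.append(v)
    return v * v


tree = Subsample(4, Pointwise(expensive, Data(range(16))))
naive = tree.run()
naive_calls = len(calls)
calls.clear()
fast = optimize(tree).run()
print(naive == fast, naive_calls, len(calls))
```

Both trees produce the same samples, but the re-ordered one applies the filter to 4 values instead of 16. The rule would not be valid for a neighborhood filter such as a convolution, which is why an optimizer needs to know which operators commute.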

10
Combining Brain Data Sets
Expression Tree Toolkit
  • Operators: Composite, Mask by Hue, Extract Hue, Scalar to RGB, RGB to HSI
  • Data sets:
  • Scalar CT-scan (512 x 512 x 230)
  • Color Cryosection (547 x 710 x 672)
  • Color Segmentation (547 x 710 x 672)
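
The operators named on this slide can be sketched per voxel as below. The slide does not say how the nodes are wired together, so the wiring here is an assumption: the cryosection color is kept only where the segmentation color has a chosen hue, then combined with the grayscale CT value. The hue comes from Python's `colorsys.rgb_to_hsv` as a stand-in for an RGB-to-HSI conversion, and every function name is hypothetical.

```python
import colorsys


def scalar_to_rgb(value, max_value=255.0):
    """Map a scalar (for example a CT value) to a gray RGB triple in [0, 1]."""
    g = value / max_value
    return (g, g, g)


def extract_hue(rgb):
    """Hue in [0, 1); HSV hue is used here in place of HSI hue."""
    return colorsys.rgb_to_hsv(*rgb)[0]


def mask_by_hue(rgb, label_rgb, hue_lo, hue_hi):
    """Keep rgb where the label color's hue lies in [hue_lo, hue_hi], else black."""
    return rgb if hue_lo <= extract_hue(label_rgb) <= hue_hi else (0.0, 0.0, 0.0)


def composite(a, b):
    """Combine two colors with a per-channel maximum (one of many possible rules)."""
    return tuple(max(x, y) for x, y in zip(a, b))


ct_value = 128                     # one voxel from each data set (made-up values)
cryo_rgb = (0.8, 0.6, 0.5)
green_label = (0.0, 1.0, 0.0)      # segmentation color with hue 1/3
red_label = (1.0, 0.0, 0.0)        # segmentation color with hue 0

inside = composite(scalar_to_rgb(ct_value),
                   mask_by_hue(cryo_rgb, green_label, 0.25, 0.45))
outside = composite(scalar_to_rgb(ct_value),
                    mask_by_hue(cryo_rgb, red_label, 0.25, 0.45))
print(inside, outside)
```

Since the CT grid and the cryosection grid have different dimensions, a real pipeline would also need a transform and resampling step to align them before compositing.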
11
Combining Brain Data Sets
Expression Tree Toolkit
[Figure labels: Composited, CT, Cryosection]
12
Combining Stellar Data Sets
Expression Tree Toolkit
  • Complex expression trees
  • 60 nodes in the Orion body
  • 90 separate expression trees
  • Orion, proplyds, shock fronts, ...

13
And more toolkits...
Other Toolkits
  • Interactive imaging with...
  • Mitsubishi VolumePro cards
  • Point clouds and 3D texture mapping with graphics pipelines
  • High-quality imaging with VISTA...

[Figure labels: PointCloud, 3DTexture, VolumePro, VISTA]
14
Design Team
  • Scripps Research: Art Olson, Mike Pique, Michel Sanner
  • SDSC: Bernard Pailthorpe, Dave Nadeau, Jon Genetti, John Moreland, Mike Bailey, Rich Charles, Alex Decastro
  • U. Texas: Chandrajit Bajaj, Ariel Shamir
15
Data-Visualization Pipeline
  • Get data from disk efficiently
  • Manage data in memory efficiently
  • Compute on data efficiently
  • Visualize data efficiently
[Figure labels: Data Orchestration, Computation, Visualization, MCAT (Metadata), KeLP FloorPlan, ..., SRB Server, ADR DataCutter, SRB Server]
16
Data-Visualization Pipeline
  • Get data from disk efficiently
  • Manage data in memory efficiently
  • Compute on data efficiently
  • Visualize data efficiently
[Figure labels: Data Orchestration, Computation, Visualization, VISTA Renderer, ..., Data - Vis Toolkits, Interaction Tools]