Title: Very Large Dataset Access and Manipulation: Active Data Repository (ADR), DataCutter, and MetaChaos
1 Very Large Dataset Access and Manipulation: Active Data Repository (ADR), DataCutter, and MetaChaos
- Joel Saltz
- University of Maryland, College Park
- and
- Johns Hopkins Medical Institutions
- http://www.cs.umd.edu/projects/adr
2 What we do
- Develop database tools for interacting with large multi-scale, multi-resolution datasets
- Ad-hoc queries; production of data products; support for visualization of disk- and tape-based datasets
- Query, subset, and filter very large archival datasets
- Operating system and middleware for very large active network-attached storage systems
- Compilers that allow users to easily specify user-defined data transformations (e.g., using a Java dialect)
- Tools targeted at distributed multi-architecture platforms
3 Tools to Manage the Storage Hierarchy
- Fast secondary storage (Active Data Repository)
  - Tools for on-demand data product generation, interactive data exploration, and visualization
  - Targets closely coupled sets of processors/disks
- Archival storage (DataCutter)
  - Loads subsets of data from tertiary storage into a disk cache or client
  - Accesses data from distributed data collections
  - Preprocesses close to data sources
  - Stand-alone and integrated into the NPACI Storage Resource Broker
4 Tool to Couple Applications
- MetaChaos
  - Parallel programs distribute data structures across processor memories
  - Separately developed programs will use different schemes to distribute data
  - MetaChaos coordinates the movement of data between separately developed, separately compiled parallel programs
  - Layered on a standard message-passing layer such as MPICH-G or PVM
  - Garlik integration of MetaChaos with KeLP floor plans
5 Irregular Multi-dimensional Datasets
- Spatial/multi-dimensional, multi-scale, multi-resolution datasets
- Applications select portions of one or more datasets
- Selection of a data subset makes use of a spatial index (e.g., R-tree, quad-tree)
- Data are not used as-is; preprocessing is generally needed, often to reduce data volumes
6 Querying Irregular Multi-dimensional Datasets
- Irregular datasets
  - Think of disk-based unstructured meshes, data structures used in adaptive multiple-grid calculations, sensor data
  - Indexed by spatial location (e.g., position on Earth, position of a microscope stage)
- A spatial query is used to specify an iterator
  - Computation on the data obtained from the spatial query
  - Computation aggregates data; the resulting data product is significantly smaller than the results of the range query
7 Loading Datasets into ADR
- A user
  - should decompose the dataset into data chunks
  - optionally can distribute the chunks across the disks and provide an index for accessing them
- ADR, given data chunks and associated minimum bounding rectangles in a set of files,
  - can distribute the data chunks across the disks using a Hilbert-curve based declustering algorithm
  - can create an R-tree based index on the dataset
8 Loading Datasets into ADR
- ADR Data Loading Service
  - Distributes chunks across the disks in the system (e.g., using Hilbert-curve based declustering)
  - Constructs an R-tree index using the bounding boxes of the data chunks
(figure: disk farm)
9 Data Loading Service
- The user must decompose the dataset into chunks
- For a fully cooked dataset, the user
  - moves the data and index files to the disks (via ftp, for example)
  - registers the dataset using ADR utility programs
- For a half-cooked dataset, ADR
  - computes placement information using a Hilbert-curve based declustering algorithm
  - builds an R-tree index
  - moves the data chunks to the disks
  - registers the dataset
10 Query Execution in the Active Data Repository
- An ADR query contains a reference to
  - the dataset of interest,
  - a query window (a multi-dimensional bounding box in the input dataset's attribute space),
  - default or user-defined index lookup functions,
  - a user-defined accumulator,
  - user-defined projection and aggregation functions,
  - how the results are handled (written to disk, or sent back to the client)
- ADR handles multiple simultaneous active queries
11 ADR Query Execution
(figure: query flow -- generate query plan; index lookup; initialize output; aggregate local input data into output; combine partial output results; send output to clients)
12 Dataset Structure
- Spatial and temporal resolution may depend on spatial location
- The physical quantities computed and stored vary with spatial location
13 Processing Irregular Datasets: Example -- Interpolation
(figure: an output grid onto which a projection is carried out; the portion of raw sensor data corresponding to some search criterion is specified)
14 Active Data Repository (ADR)
- A set of services for building parallel databases of multi-dimensional datasets
  - enables integration of storage, retrieval, and processing of multi-dimensional datasets on parallel machines
  - can maintain and jointly process multiple datasets
  - provides support and a runtime system for common operations such as
    - data retrieval,
    - memory management,
    - scheduling of processing across a parallel machine
  - customizable for various application-specific processing
15 Data Processing Scenario
(figure: source data elements)
16 Data Processing Scenario
(figure: a mapping function maps source data elements to result data elements)
17 Data Processing Scenario
(figure: a reduction function maps source data elements to intermediate data elements (accumulator elements), which yield the result data elements)
18 Data Processing Scenario
(figure: data elements declustered across disks attached to processors P0, P1, P2 of a distributed memory machine)
19 Data Processing
- Source and result datasets are multi-dimensional
- The result dataset is often smaller than the source dataset
- Processing is performed near where the source datasets live
- Correctness of the reduction functions does not depend on the order in which source elements are processed
20 Order-independent Reduction Functions
(figure: reduction phase followed by combine phase across processors P0, P1, P2)
- Correctness of the reduction function does not depend on the order in which elements are processed
21 Data Processing Strategies
- Fully Replicated Accumulator (FRA) -- high memory requirement
  - initialization: replicate the accumulator on all processors
  - reduction:
    - read source elements from local disks
    - process source elements against local accumulator elements
  - combine: merge the replicated accumulator elements
- Sparsely Replicated Accumulator (SRA)
  - initialization: replicate accumulator elements only where required
- Distributed Accumulator (DA) -- low memory requirement
  - initialization: partition the accumulator among processors
  - reduction:
    - read source elements from local disks
    - send source elements to the processor that owns the mapped accumulator elements for processing
22 ADR Applications
- Visualize Thematic Mapper (TM) Landsat images
  - Global Land Cover Facility
  - Enhanced the capabilities of the GLCF TM metadata browser to allow browsing of the raw TM images
- Visualize astronomy data using MPIRE
  - The MPIRE/ADR implementation extended the functionality of MPIRE to allow out-of-core computations
  - MPIRE runs on very large datasets even on relatively small numbers of processors
- Applications were demonstrated at SC99
23 ADR Applications
- Energy and Environment NPACI Alpha project
  - Data repository for flow data; mesh interpolation used in coupling flow results to projection and transport codes
  - History matching -- examining differences and similarities in a set of simulation realizations
- Virtual Microscope
  - Exploration of large microscopy datasets
24 Applications
(figure: pathology, volume rendering, surface/groundwater modeling, satellite data analysis)
25 Example: ADR and MetaChaos Coupling of Surface Water Codes
- Carry out surface water pollution remediation using a chain of flow codes and reactive transport codes
- The codes run on separate platforms; their results are stored in ADR which, along with MetaChaos, provides the coupling
- Parallelization of the projection/groundwater code using KeLP
26 Projection Code UTPROJ
- Couples a 3D surface water flow model to contaminant and salinity transport models; can be used as a groundwater code
- Implements a conservative velocity projection method
- Improves local mass conservation
- Projection formulation based on a mixed finite element method
(figure: Delaware Bay, C&D Canal, and Chesapeake Bay -- showing the Upper Chesapeake Bay, Philadelphia, the C&D Canal, Aberdeen Proving Grounds, Baltimore, the US Naval Academy, Delaware Bay, and the Atlantic Ocean)
27 Current State of the Project
(figure: ADR)
28 Water Contamination Studies
(figure: simulation time)
29-30 (figures)
31 Example: Split PARSIM
- The UT Austin code PARSIM models flow and reactive transport
- Applications: bay and estuary, reservoir, blood flow
- Computationally intensive flow calculations
- Data-intensive reactive transport (20 components)
- Flow and reactive transport run on different platforms, coupled using MetaChaos
- Data archived in ADR on an I/O cluster
- Reactive transport data analyzed using ADR (iso-surface contour)
32 ADR Subsets Data, Carries out Iso-surface Rendering Over a Range of Timesteps (vtk client)
33 Other Research
- DataCutter
  - Supports data subsetting; filters connected by streams (coarse-grained dataflow)
  - Integrated into the NPACI SRB; end-to-end tests included spatial subsetting, decompression, and clipping of 5 TB (uncompressed) datasets
- Middleware for large-scale data storage
  - Building large (50 TB) disk-based clusters
  - Active-disk "disklet" model for placing processing near disks
- Compilers for user-defined functions
  - Data-parallel model
  - Users write procedures and customized runtime support is generated
  - Interprocedural and slicing analysis
34 New IBM Collaborations
- Active network-attached storage
- HPSS
- Assume dedicated storage cluster(s) and zero, one, or more large SP configurations
- SDSC
- Hopkins
- Florida State
35 HPSS
- Collaborators: Bob Coyne, Otis Graf
- Stage high-end computing and large-scale data manipulation on a collection of clusters and parallel machines linked by a high-bandwidth local area network
- Deploy HPSS to use the very large tape store at SDSC for tertiary storage, but instantiate the data cache in the disk cluster at the University of Maryland
- An OC-48 network connection (Abilene) will make it possible to separate the HPSS disk cache and tape library
- Library routines to invoke filters on data obtained from tape
- The library will use the IBM client API to open files and to bypass the disk cache to access data directly
- DataCutter filters will process the data
36 Software for Network Attached Storage
- Douglas Pase -- Netfinity Network Attached Storage (NAS)
- Extend filesystems to support pipelined communicating processes that perform computation as data is stored or retrieved
- Filter data to implement a database operation such as a join or datacube, or to support a more specialized data mining or data-intensive scientific calculation
- Determine whether and how to replicate frequently accessed files, or how to change file placement or file striping
- Related work in the context of the GPFS filesystem (Roger Haskin, IBM Almaden)
37 Details on Collaborative Work with Doug Pase
- Work is distributed using Java-based software agents, or disklets
- Software is transported from the client to a server and executed on the server
- The client would be the application; the server would be the disk or NAS server
- The agent processes local data and sends results back to the client as needed
- The disk or NAS server can maintain its configuration as an appliance while still offering the opportunity to move computations to the data
- The agent server would restrict any agent's access to data or other resources appropriately
- Close link with ongoing Maryland work -- DataCutter, active disks, and the Java-based ADR compiler
38 Research Group
- Alan Sussman
- Tahsin Kurc
- Umit Catalyurek
- Chialin Chang
- Renato Ferreira
- Mike Beynon
- Henrique Andrade
Collaborators: Mary Wheeler's group, Scott Baden's group
39 Architecture of the Active Data Repository
(figure: Client 1 (parallel) and Client 2 (sequential) submit queries through an application front end and receive results. Front end: Query Interface Service, Query Submission Service. Back end: Query Execution Service, Query Planning Service, Dataset Service, Attribute Space Service, Data Aggregation Service, Indexing Service)
40 ADR Query Execution
(figure: local reduction phase, global combine phase, output handling phase; output delivered to the client)
41 DataCutter
42 DataCutter
- A suite of middleware for subsetting and filtering multi-dimensional datasets stored on archival storage systems
- Integrated with the NPACI Storage Resource Broker (SRB)
- Standalone prototype
43 DataCutter
- Spatial subsetting using range queries
  - a hyperbox defined in the multi-dimensional space underlying the dataset
  - items whose multi-dimensional coordinates fall into the box are retrieved
- Two-level hierarchical indexing -- summary and detailed index files
- Customizable
  - Default R-tree index
  - Users can add new indexing methods
44 Processing
- Processing (filtering/aggregation) through filters
  - to reduce the amount of data transferred to the client
  - filters can run anywhere, but are intended to run near the storage system (i.e., over a local area network)
- The standalone system allows multiple filters placed on different platforms
- The SRB release allows only a single filter, which can be placed anywhere
- Motivated by Uysal's disklet work
45 Filter Framework

class MyFilter : public AS_Filter_Base {
public:
    int init(int argc, char *argv[]);
    int process(stream_t *st);
    int finalize(void);
};
46 DataCutter -- Subsetting
- Datasets are partitioned into segments
  - used to index the dataset; the unit of retrieval
- Indexing very large datasets
  - Multi-level hierarchical indexing scheme
  - Summary index files -- index a group of segments or detailed index files
  - Detailed index files -- index the segments
47 Placement
- Placement (mapping) is the dynamic assignment of filters to particular hosts for execution
- Optimization criteria
  - Communication
    - leverage filter affinity to the dataset
    - minimize communication volume on slower connections
    - co-locate filters with large communication volumes
  - Computation
    - place expensive computation on faster, less loaded hosts
48 Integration of DataCutter with the Storage Resource Broker
49 Storage Resource Broker (SRB)
- Middleware between clients and storage resources
- Remote access to storage resources
- Various types
  - File systems -- UNIX, HPSS, UniTree, DPSS (LBL)
  - DB large objects -- Oracle, DB2, Illustra
- Uniform client interface (API)
50 Storage Resource Broker (SRB)
- MCAT -- Metadata Catalog
  - Datasets (files) and collections (directories) -- inodes and more
  - Storage resources
  - User information -- authentication, access privileges, etc.
- Software package
  - Server, client library, UNIX-like utilities, Java GUI
  - Platforms -- Solaris, SunOS, Digital Unix, SGI Irix, Cray T90
51 SRB/DataCutter
- Support for range queries
  - Creation of indices over datasets (composed sets of data files)
  - Subsetting of datasets
    - Search for files, or portions of files, that intersect a given range query
    - Restricted filter operations on portions of files (data segments) before returning them to the client (to perform filtering or aggregation to reduce data volume)
52 SRB/DataCutter System
(figure: an application (SRB client) issues range queries to the Storage Resource Broker; the SRB resolves file, DB large-object, and object segment IDs through the MCAT and application metadata (DB2, Oracle, Illustra, ObjectStore), invokes the DataCutter indexing and filtering services, and accesses distributed storage resources (HPSS, UniTree, UNIX, ftp) via the SRB I/O and MCAT APIs)
53 SRB/DataCutter Client Interface
- Creating and deleting an index

int sfoCreateIndex(srbConn *conn, sfoClass class, int catType,
                   char *inIndexName, char *outIndexName,
                   char *resourceName);

int sfoDeleteIndex(srbConn *conn, sfoClass class, int catType,
                   char *indexName);
54 SRB/DataCutter Client Interface
- Searching an index -- R-tree index

int sfoSearchIndex(srbConn *conn, sfoClass class, char *indexName,
                   void *query, indexSearchResult *myresult,
                   int maxSegCount);

typedef struct {
    int dim;
    double *min, *max;
} rangeQuery;

int sfoGetMoreSearchResult(srbConn *conn, int continueIndex,
                           indexSearchResult *myresult,
                           int maxSegCount);
55 SRB/DataCutter Client Interface
- Searching an index -- R-tree index

typedef struct {
    int dim;               /* bounding box dimensions */
    double *min;           /* minimum in each dimension */
    double *max;           /* maximum in each dimension */
} sfoMBR;                  /* bounding box structure */

typedef struct {
    sfoMBR segmentMBR;     /* bounding box of the segment */
    char *objID;           /* object in SRB that contains the segment */
    char *collectionName;  /* collection where the object is stored */
    unsigned int offset;   /* offset of the segment in the object */
    unsigned int size;     /* size of the segment */
} segmentInfo;             /* segment meta-data information */

typedef struct {
    int segmentCount;      /* number of segments returned */
    segmentInfo *segments; /* segment meta-data information */
    int continueIndex;     /* continuation flag */
} indexSearchResult;       /* search result structure */
56 Applying Filters

int sfoApplyFilter(srbConn *conn, sfoClass class, char *hostName,
                   int filterID, char *filterArg,
                   int numOfInputSegments,
                   segmentInfo *inputSegments,
                   filterDataResult *myresult,
                   int maxSegCount);

int sfoGetMoreFilterResult(srbConn *conn, int continueIndex,
                           filterDataResult *myresult,
                           int maxSegCount);
57 Applying Filters

typedef struct {
    segmentInfo segInfo;   /* info on the segment data buffer after the filter operation */
    char *segment;         /* segment data buffer after the filter is applied */
} segmentData;

typedef struct {
    int segmentDataCount;  /* segments in the segmentData array */
    segmentData *segments; /* segmentData array */
    int continueIndex;     /* continuation flag */
} filterDataResult;
58 Application: Virtual Microscope
- Interactive software emulation of a high-power light microscope for processing/visualizing image datasets
- 3-D image dataset (100 MB to 5 GB per focal plane)
- Client-server system organization
- Rectangular region queries; multiple data chunk replies
- Pipeline-style processing
59 Virtual Microscope Client
60 VM Application using SRB/DataCutter
(figure: the Virtual Microscope client issues queries over a local area network to SRB/DataCutter running on a distributed collection of workstations with distributed storage resources; a pipeline of filters processes the image data)
- read: read image chunks
- decompress: convert JPEG image chunks into RGB pixels
- clip: clip the image to the query boundaries
- zoom: sub-sample to the required magnification
- view: stitch image pieces together and display the image
61 Experimental Setup
- UMD 10-node IBM SP (1 4-CPU, 3 2-CPU, 6 1-CPU nodes)
- HPSS system (10 TB tape storage, 500 GB disk cache)
- 4 GB JPEG-compressed dataset (90 GB uncompressed), 180K x 180K RGB pixels (200 x 200 JPEG blocks of 900x900 pixels each)
- 250 GB JPEG-compressed dataset (5.6 TB uncompressed), 1.44M x 1.44M RGB pixels (1600 x 1600 JPEG blocks)
- R-tree index based query lookups
- Server host: SP 2-CPU node
- Read, Decompress, Clip, Zoom, and View distributed between client and server
62 Dataset -- 250 GB (Compressed), All Computation on the Server
63 Breakdown of DataCutter Costs: 250 GB Dataset, 9600x9600 Query
64 Effect of Filter Placement: 9600x9600 Query, Warm Cache
65 Effect of Dataset Size: 4.5Kx4.5K Query, Server Does Everything but View, Warm Cache
66 The Future
- Integrated suite of tools for handling very deep memory hierarchies
- Common set of tools for grid and disk-cache computations
- Programmability
  - Use XML metadata
  - Ongoing data-parallel compiler project -- uses Java-based user-defined functions
  - Applications development toolkit (Visual DataCutter)
- Implementation
  - NPACI
  - Private sector (?)