Title: Northwestern University
1 Access Patterns, Metadata, and Performance
- Alok Choudhary and Wei-Keng Liao
- Department of ECE, Northwestern University
- Collaboration with ANL
SDM kickoff meeting July 10-11, 2001
2 Virtuous Cycle
Simulation (Execute app, Generate data)
Problem setup (Mesh, domain Decomposition)
Manage, Visualize, Analyze
Measure Results, Learn, Archive
3 I/O Flow for Scientific Simulation
- Many scientific applications perform simulation/analysis
- 3 types of output: checkpoint, visualization, data for analysis
- Checkpoint keeps re-writing to the same files
- Visualization output
- 2 types of input: new run or restart
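The checkpoint and restart flow above can be sketched as follows. This is a minimal illustration rather than the applications' actual I/O code: the file name `checkpoint.json`, the `state` dictionary, and the helper names are all assumptions made for the example. The checkpoint is rewritten to the same file each time, and a run begins either fresh or from the last checkpoint.

```python
import json
import os
import tempfile

def write_checkpoint(path, state):
    """Overwrite the checkpoint file with the latest state (hypothetical helper)."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # replace the previous checkpoint in one step

def load_initial_state(path):
    """Return the saved state on restart, or a fresh state for a new run."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "value": 0.0}

def run(path, steps, checkpoint_every=2):
    """Advance a toy simulation, checkpointing to the same file periodically."""
    state = load_initial_state(path)
    for _ in range(steps):
        state["step"] += 1
        state["value"] += 0.5
        if state["step"] % checkpoint_every == 0:
            write_checkpoint(path, state)
    return state

workdir = tempfile.mkdtemp()
ckpt = os.path.join(workdir, "checkpoint.json")
first = dict(run(ckpt, steps=4))    # new run: no checkpoint file exists yet
second = dict(run(ckpt, steps=2))   # restart: resumes from the saved state
```

Writing to a temporary file and renaming it is one common way to avoid leaving a half-written checkpoint behind; the slides do not say how the applications handle this.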
4 Data Access Sequence Dependency
- Temporal dependency
- Access the same data set at different time stamp
- Spatial dependency
- Access different data sets at the same time stamp
- Resolution dependency
- Access the same data set at different resolution
- Sequence is useful for I/O performance improvement, e.g., pre-fetch, pre-stage, storage continuity
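As one illustration of how a known temporal sequence can help, the sketch below prefetches the next time step in a background thread while the current one is being consumed. The in-memory `store`, the `read_dataset` stand-in, and the dataset name are hypothetical; a real system would issue reads against the storage system instead.

```python
from concurrent.futures import ThreadPoolExecutor

def read_dataset(store, name, timestep):
    """Stand-in for a slow storage read (hypothetical)."""
    return store[(name, timestep)]

def iterate_with_prefetch(store, name, timesteps):
    """Yield (timestep, data), prefetching the next time step in the background."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = None
        for i, t in enumerate(timesteps):
            if pending is None:
                current = read_dataset(store, name, t)
            else:
                current = pending.result()
            pending = None
            # The temporal sequence tells us which read comes next.
            if i + 1 < len(timesteps):
                pending = pool.submit(read_dataset, store, name, timesteps[i + 1])
            yield t, current

store = {("temperature", t): [t * 10 + k for k in range(3)] for t in range(4)}
results = list(iterate_with_prefetch(store, "temperature", [0, 1, 2, 3]))
```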
5 Spatial Data Access Patterns
- Parallel partition patterns
- Regular, irregular
- Static, dynamic during simulation
- Access sequence
- Spatial, temporal, resolution
- Access frequency
- Once only, multiple times (overwrite for restart)
- Access amount
- Large, medium, small chunks
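A regular, static partition pattern can be illustrated with a block decomposition of a 2D array over a process grid. This is a sketch under assumptions: the function names, the row-major rank ordering, and the rule for distributing leftover rows and columns are choices made for the example, not taken from the slides.

```python
def block_range(n, parts, index):
    """Half-open range [start, stop) of block `index` when n items are split into `parts` blocks."""
    base, extra = divmod(n, parts)
    start = index * base + min(index, extra)
    stop = start + base + (1 if index < extra else 0)
    return start, stop

def block_partition_2d(shape, grid, rank):
    """Row and column bounds owned by `rank` on a row-major process grid."""
    rows, cols = shape
    prow, pcol = divmod(rank, grid[1])
    return block_range(rows, grid[0], prow), block_range(cols, grid[1], pcol)

# A 10 x 8 array split over a 2 x 2 process grid.
bounds = [block_partition_2d((10, 8), (2, 2), r) for r in range(4)]
```

Each process can derive its own sub-array bounds from its rank alone, which is what makes a regular pattern easy to describe compactly as metadata.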
6 Access Patterns for Visualization/Analysis
- Generated from real data during simulation or in post-simulation process
- Smaller size than real data
- Type conversion
- e.g., float -> unsigned char
- Reduce/increase resolution
- Projection 3D to 2D
- 3 types of data generation and display sequence
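The three reductions listed above can be sketched in plain Python. The helper names are hypothetical, and two details are assumptions made for illustration: the linear scaling to 0..255 and the use of a maximum along z as the projection (the slides do not say which projection is used).

```python
def to_uint8(values, lo, hi):
    """Map floats in [lo, hi] to unsigned char values 0..255, clamping outliers."""
    scale = 255.0 / (hi - lo)
    return bytes(min(255, max(0, int(round((v - lo) * scale)))) for v in values)

def downsample(values, factor):
    """Reduce resolution by keeping every `factor`-th sample."""
    return values[::factor]

def project_max(volume):
    """Project a 3D volume indexed [z][y][x] to 2D by taking the maximum along z."""
    ny, nx = len(volume[0]), len(volume[0][0])
    return [[max(plane[y][x] for plane in volume) for x in range(nx)]
            for y in range(ny)]

samples = [0.0, 0.25, 0.5, 0.75, 1.0]
pixels = to_uint8(samples, 0.0, 1.0)   # one byte per sample
coarse = downsample(samples, 2)
volume = [[[1.0, 2.0], [3.0, 4.0]],
          [[5.0, 0.0], [1.0, 6.0]]]
image = project_max(volume)
```

Each step shrinks the data: the type conversion stores one byte per sample, downsampling drops samples, and the projection removes a whole dimension.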
7 Architecture
[Figure: architecture diagram. Labels: User Applications (Simulation, Data Analysis, Visualization); MDMS; Storage Systems (I/O Interface); MPI-IO (other interfaces); Query, Input Metadata, Hints, Directives, Associations; I/O func (best_I/O for these parameters), Hint; OIDs, parameters for I/O; Data; Schedule, Prefetch, Cache, Hints (coll. I/O); Performance Input, System metadata; Metadata: access pattern, history]
8 Approach
- Manage meta data using OR-DBMS
- Collect and organize meta data in relational tables
- Design meta data query interface using SQL
- Access to HSS
- Obtain current storage layout, configuration
- Native I/O interfaces or MPI-IO
- I/O optimization
- Determine optimal I/O calls
- Overlap I/O with computation, communication, and I/O
- Pre-fetch, pre-stage, migrate, purge in HSS
- Sub-filing for large files, file containers for small files
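To make the idea of keeping meta data in relational tables and querying it with SQL concrete, here is a small sketch using Python's built-in `sqlite3` module as a stand-in for the OR-DBMS. The table names, columns, file paths, and sample rows are invented for illustration and do not reflect the actual MDMS schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dataset (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    data_type TEXT,
    partition_pattern TEXT,
    file_path TEXT
);
CREATE TABLE access_history (
    dataset_id INTEGER REFERENCES dataset(id),
    timestep INTEGER,
    operation TEXT,
    bytes INTEGER
);
""")
conn.executemany(
    "INSERT INTO dataset (id, name, data_type, partition_pattern, file_path) "
    "VALUES (?, ?, ?, ?, ?)",
    [(1, "temperature", "float", "regular", "/data/run1/temp.dat"),
     (2, "particles", "float", "irregular", "/data/run1/part.dat")],
)
conn.executemany(
    "INSERT INTO access_history VALUES (?, ?, ?, ?)",
    [(1, 0, "write", 4096), (1, 1, "write", 4096), (2, 0, "write", 1024)],
)

def total_bytes_written(conn, name):
    """Sum the bytes written to a dataset across its recorded history."""
    row = conn.execute(
        """SELECT SUM(h.bytes) FROM access_history h
           JOIN dataset d ON d.id = h.dataset_id
           WHERE d.name = ? AND h.operation = 'write'""",
        (name,),
    ).fetchone()
    return row[0] or 0

regular = [r[0] for r in conn.execute(
    "SELECT name FROM dataset WHERE partition_pattern = 'regular' ORDER BY name")]
```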
9 Objective and Goal
- Meta data management
- Collect historical meta data, process user-provided meta data, update meta data w.r.t. environment changes
- Efficient query for meta data
- High performance I/O
- Automatically determine optimal I/O calls from data access patterns
- Improve performance by prefetching, caching, layout, inter-object association, striping, etc.
- Support for Hierarchical Storage Systems (HSS)
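One way to picture choosing I/O calls from access patterns is a small rule-based selector. Everything here is hypothetical: the `AccessPattern` fields, the 64 KiB threshold, and the rules themselves are assumptions chosen to show the shape of the decision, not the system's actual policy or measured results.

```python
from dataclasses import dataclass

@dataclass
class AccessPattern:
    chunk_bytes: int    # typical size of one request
    contiguous: bool    # whether each process touches one contiguous region
    repeated: bool      # whether the same data is accessed multiple times

def choose_io_strategy(p, small_chunk=64 * 1024):
    """Return illustrative I/O hints for an access pattern (assumed rules)."""
    # Assumption: small or non-contiguous requests benefit from collective I/O.
    collective = (not p.contiguous) or p.chunk_bytes < small_chunk
    return {
        "mode": "collective" if collective else "independent",
        "cache": p.repeated,  # assumption: cache data that is accessed repeatedly
    }

large_hint = choose_io_strategy(AccessPattern(8 * 1024 * 1024, True, True))
small_hint = choose_io_strategy(AccessPattern(4 * 1024, False, False))
```

In a fuller design the thresholds would come from the stored performance meta data rather than being hard-coded.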
10 Metadata
- Application Level
- Algorithms, compiling, execution environments
- Time stamps, parameters, result summary
- Programming Level
- Data types, structures, association of datasets, partition patterns
- Storage System Level
- File locations, file structure, I/O modes, host names, device types, path names, storage hierarchy
- Performance Level
- I/O bandwidth of HSS for local and remote access
- Data access sequence, frequency, other access hints
- Collective or non-collective I/O
11 Applications
- Astro3D -- study the highly turbulent convective layers of late-type stars
- Write only
- Regular partition on all data sets
- ENZO -- simulate the formation of a cluster of galaxies consisting of gas and stars
- Both read and write
- Both regular and irregular partition
- Adaptive Mesh Refinement, dynamic load balancing
- Common features
- Checkpoint / restart
- Post-simulation data analysis
- Visualizing the process of the computation in the form of a movie
12 Interface
13 Run Application
14 Dataset and Access Pattern Table
15 Data Analysis
16 Integrating Analysis
Simulation (Execute app, Generate data)
On-line analysis and mining
Problem setup (Mesh, domain Decomposition)
Manage, Visualize, Analyze
Measure Results, Learn, Archive
17 Visualize
18 Meta Data Representation in Database
19 Future Directions and Challenges
- H/W and S/W mainly driven by commercial applications (e.g., web, DB, content delivery, etc.)
- H/W architectures (e.g., Infiniband)
- S/W architectures (e.g., ODB, ORDB, DW, mining tools)
- Challenge: Can we adapt and enhance these to satisfy scientific computing file systems, storage, and data management requirements?
- E.g., parallel file systems on Infiniband architectures so that uniform UI and access may be provided from different systems?
- Can we incorporate DM and analysis capabilities as well as efficient I/O techniques within DM systems?
- File Systems and DM Challenges
- Can it be customized with high performance?
- Can it learn from user access patterns?
- Can it optimize accesses automatically?
- Can it provide a high-level interface which is uniform?