Northwestern University - PowerPoint PPT Presentation

Provided by: alok7
Learn more at: https://sdm.lbl.gov

1
Access Patterns, Metadata, and Performance
  • Alok Choudhary and Wei-Keng Liao
  • Department of ECE, Northwestern University
  • Collaboration with ANL

SDM kickoff meeting July 10-11, 2001
2
Virtuous Cycle
Simulation (Execute app, Generate data)
Problem setup (Mesh, domain Decomposition)
Manage, Visualize, Analyze
Measure Results, Learn, Archive
3
I/O Flow for Scientific Simulation
  • Many scientific applications perform
    simulation/analysis
  • 3 types of output: checkpoint, visualization,
    data for analysis
  • Checkpoint keeps re-writing to the same files
  • Visualization output
  • 2 types of input: new run or restart
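
A minimal sketch of the checkpoint and restart flow on this slide, assuming a single JSON checkpoint file. The state layout and the helper names (`write_checkpoint`, `load_initial_state`, `run`) are hypothetical and not from the presentation. Each checkpoint overwrites the same file, and startup picks between a new run and a restart.

```python
import json
import os
import tempfile

def write_checkpoint(path, state):
    """Overwrite the checkpoint file at `path` with the latest state."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file first so a crash cannot leave a half-written checkpoint.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)  # rename over the previous checkpoint

def load_initial_state(path):
    """Return the restart state if a checkpoint exists, else a new-run state."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "values": [0.0, 0.0, 0.0]}

def run(path, num_steps, checkpoint_every=2):
    """Advance the toy simulation to `num_steps`, checkpointing periodically."""
    state = load_initial_state(path)
    while state["step"] < num_steps:
        # Stand-in for the real computation.
        state["values"] = [v + 1.0 for v in state["values"]]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            write_checkpoint(path, state)
    return state
```

Calling `run(path, 4)` and later `run(path, 6)` with the same path would resume from step 4 rather than starting over.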

4
Data Access Sequence Dependency
  • Temporal dependency
  • Access the same data set at different time stamp
  • Spatial dependency
  • Access different data sets at the same time stamp
  • Resolution dependency
  • Access the same data set at different resolution
  • Sequence is useful for I/O performance
    improvement, e.g., pre-fetch, pre-stage, storage
    continuity
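
One way a known access sequence could drive pre-fetching is sketched below, assuming a temporal dependency (the same data set at the next time stamp). The class name `SequencePrefetcher` and the `read_fn` callback are hypothetical. The prefetch here is synchronous for simplicity; a real system would issue it in the background.

```python
from collections import OrderedDict

class SequencePrefetcher:
    """Cache that prefetches the next item along a known access sequence."""

    def __init__(self, read_fn, capacity=4):
        self.read_fn = read_fn          # function that performs the slow read
        self.capacity = capacity
        self.cache = OrderedDict()      # key -> data, oldest entry first
        self.hits = 0
        self.misses = 0

    def _load(self, key):
        if key not in self.cache:
            self.cache[key] = self.read_fn(key)
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)  # evict the oldest entry

    def read(self, dataset, timestep):
        key = (dataset, timestep)
        if key in self.cache:
            self.hits += 1
        else:
            self.misses += 1
            self._load(key)
        data = self.cache[key]
        # Temporal dependency: the same data set at the next time stamp is likely next.
        self._load((dataset, timestep + 1))
        return data
```

Reading time stamps in order through this class would miss only on the first access; every later read is served from the prefetched entry.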

5
Spatial Data Access Patterns
  • Parallel partition patterns
  • Regular, irregular
  • Static, dynamic during simulation
  • Access sequence
  • Spatial, temporal, resolution
  • Access frequency
  • Once only, multiple times (overwrite for restart)
  • Access amount
  • Large, medium, small chunks
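
As an illustration of a regular, static parallel partition, the sketch below computes the sub-array owned by one process in a block decomposition. The function name and argument layout are assumptions made for this example. Irregular or dynamic partitions would need explicit index lists instead.

```python
def block_partition(global_shape, proc_grid, coords):
    """Return (starts, sizes) per dimension for one process in a regular block partition."""
    starts, sizes = [], []
    for n, p, c in zip(global_shape, proc_grid, coords):
        base, extra = divmod(n, p)
        # The first `extra` processes along this dimension get one more element.
        size = base + (1 if c < extra else 0)
        start = c * base + min(c, extra)
        starts.append(start)
        sizes.append(size)
    return tuple(starts), tuple(sizes)
```

For a 10 x 8 array on a 3 x 2 process grid, the process at grid coordinates (2, 1) would own rows 7 to 9 and columns 4 to 7.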

6
Access Patterns for Visualization/Analysis
  • Generated from real data during simulation or in
    post-simulation process
  • Smaller size than real data
  • Type conversion, e.g., float → unsigned char
  • Reduce/increase resolution
  • Projection: 3D to 2D
  • 3 types of data generation and display sequence
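
The reductions listed on this slide can be sketched in plain Python as follows. The function names are hypothetical, and the maximum along z is only one assumed choice of projection; the slide does not say which projection is used.

```python
def to_unsigned_char(values, lo, hi):
    """Type conversion: map floats in [lo, hi] to integers in 0..255, clamping outliers."""
    scale = 255.0 / (hi - lo)
    return [min(255, max(0, int(round((v - lo) * scale)))) for v in values]

def reduce_resolution(grid, factor):
    """Reduce the resolution of a 2D grid by keeping every `factor`-th sample."""
    return [row[::factor] for row in grid[::factor]]

def project_max(volume):
    """Project a 3D volume indexed [z][y][x] to 2D by taking the maximum along z."""
    ny, nx = len(volume[0]), len(volume[0][0])
    return [[max(plane[y][x] for plane in volume) for x in range(nx)]
            for y in range(ny)]
```

Each step yields output smaller than the input: one byte per value instead of four, fewer samples, or one fewer dimension.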

7
Architecture
[Diagram; labels recovered from the slide:]
  • User Applications: Simulation, Data Analysis, Visualization
  • MDMS
  • Query, Input Metadata, Hints, Directives, Associations
  • OIDs, parameters for I/O
  • I/O func (best_I/O (for these param)), Hint
  • Schedule, Prefetch, cache Hints (coll I/O)
  • Performance Input, System metadata
  • Metadata access pattern, history
  • Storage Systems (I/O Interface): Data, MPI-IO (Other interfaces..)
8
Approach
  • Manage metadata using an OR-DBMS
  • Collect and organize metadata in relational tables
  • Design a metadata query interface using SQL
  • Access to HSS
  • Obtain current storage layout, configuration
  • Native I/O interfaces or MPI-IO
  • I/O optimization
  • Determine optimal I/O calls
  • Overlap I/O with computation, communication, and
    other I/O
  • Pre-fetch, pre-stage, migrate, purge in HSS
  • Sub-filing for large files, file containers for
    small files
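
A minimal sketch of keeping metadata in relational tables and querying it with SQL. SQLite stands in here for the OR-DBMS named on the slide, and the table names, column names, and helper functions are all assumptions made for illustration, not the schema used in the project.

```python
import sqlite3

def create_metadata_db():
    """Create an in-memory database with two hypothetical metadata tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dataset (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            data_type TEXT NOT NULL,
            file_path TEXT NOT NULL
        );
        CREATE TABLE access_pattern (
            dataset_id INTEGER NOT NULL REFERENCES dataset(id),
            partition_pattern TEXT NOT NULL,  -- 'regular' or 'irregular'
            frequency TEXT NOT NULL,          -- 'once' or 'multiple'
            io_mode TEXT NOT NULL             -- 'collective' or 'non-collective'
        );
    """)
    return conn

def register_dataset(conn, name, data_type, file_path,
                     partition_pattern, frequency, io_mode):
    """Insert one data set and its access pattern."""
    cur = conn.execute(
        "INSERT INTO dataset (name, data_type, file_path) VALUES (?, ?, ?)",
        (name, data_type, file_path),
    )
    conn.execute(
        "INSERT INTO access_pattern VALUES (?, ?, ?, ?)",
        (cur.lastrowid, partition_pattern, frequency, io_mode),
    )

def datasets_with_pattern(conn, partition_pattern):
    """Return (name, file_path, io_mode) for data sets with the given partition pattern."""
    rows = conn.execute(
        "SELECT d.name, d.file_path, a.io_mode FROM dataset d "
        "JOIN access_pattern a ON a.dataset_id = d.id "
        "WHERE a.partition_pattern = ? ORDER BY d.name",
        (partition_pattern,),
    )
    return rows.fetchall()
```

An application could call `datasets_with_pattern(conn, "regular")` before opening files to find which data sets share a partition pattern and how they were last accessed.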

9
Objective and Goal
  • Metadata management
  • Collect historical metadata, process user-provided
    metadata, update metadata w.r.t.
    environment changes
  • Efficient queries for metadata
  • High performance I/O
  • Automatically determine optimal I/O calls from
    data access patterns
  • Improve performance by prefetching, caching,
    layout, inter-object association, striping, etc.
  • Support for Hierarchical Storage Systems (HSS)
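
The idea of deriving I/O calls from access patterns could look roughly like the rule-based sketch below. The `AccessPattern` fields and the selection rules are illustrative assumptions only; the presentation does not state the actual decision logic.

```python
from dataclasses import dataclass

@dataclass
class AccessPattern:
    partition: str    # "regular" or "irregular"
    chunk: str        # "large", "medium", or "small"
    frequency: str    # "once" or "multiple"
    num_procs: int

def choose_io_plan(p):
    """Pick an I/O mode and a caching hint from an access pattern (illustrative rules)."""
    if p.num_procs > 1 and (p.partition == "irregular" or p.chunk == "small"):
        # Many small or scattered requests: let the processes combine them.
        io_mode = "collective"
    else:
        # Large contiguous chunks: each process can read or write on its own.
        io_mode = "independent"
    return {
        "io_mode": io_mode,
        # Data accessed repeatedly is worth keeping in a cache.
        "cache": p.frequency == "multiple",
    }
```

In a fuller system the rules would be replaced or tuned by the historical performance metadata rather than fixed in code.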

10
Metadata
  • Application Level
  • Algorithms, compiling, execution environments
  • Time stamps, parameters, result summary
  • Programming Level
  • Data types, structures, association of datasets,
    partition patterns
  • Storage System Level
  • File locations, file structure, I/O modes, host
    names, device types, path names, storage
    hierarchy
  • Performance Level
  • I/O bandwidth of HSS for local and remote access
  • Data access sequence, frequency, other access
    hints
  • Collective or non-collective I/O

11
Applications
  • Astro3D -- study the highly turbulent convective
    layers of late-type stars
  • Write only
  • Regular partition on all data sets
  • ENZO -- simulate the formation of a cluster of
    galaxies consisting of gas and stars
  • Both read and write
  • Both regular and irregular partition
  • Adaptive Mesh Refinement, dynamic load balancing
  • Common feature
  • Checkpoint / restart
  • Post-simulation data analysis
  • Visualizing the process of the computation in the
    form of a movie

12
Interface
13
Run Application
14
Dataset and Access Pattern Table
15
Data Analysis
16
Integrating Analysis
Simulation (Execute app, Generate data)
On-line analysis and mining
Problem setup (Mesh, domain Decomposition)
Manage, Visualize, Analyze
Measure Results, Learn, Archive
17
Visualize
18
Meta Data Representation in Database
19
Future Directions and Challenges
  • H/W and S/W mainly driven by commercial
    applications (e.g., web, DB, Content Delivery
    etc.)
  • H/W architectures (e.g., Infiniband)
  • S/W architectures (e.g., ODB, ORDB, DW, mining
    tools)
  • Challenge: Can we adapt and enhance these to
    satisfy the file system, storage, and data
    management requirements of scientific computing?
  • E.g., parallel file systems on Infiniband
    architectures so that uniform UI and access may
    be provided from different systems?
  • Can we incorporate DM and analysis capabilities
    as well as efficient I/O techniques within DM
    systems?
  • File Systems and DM Challenges
  • Can it be customized with high performance?
  • Can it learn from user access patterns?
  • Can it optimize accesses automatically?
  • Can it provide a uniform high-level interface?