Infrastructure for OptIPuter Storage Systems Research

1
Infrastructure for OptIPuter Storage Systems
Research
  • Simulators, Trace Files, and Models for
    Experimentation
  • Summer 2003
  • CSAG presentation
  • Justin Burke and Andrew Chien

2
Goals of Infrastructure
  • We need tools to allow us to simulate (or
    emulate) transactions over distributed file and
    storage systems.
  • These tools need to complement actual
    experimental testbeds.
  • We need to understand which tools are useful for
    various types of experimentation and proof of
    concept.

3
Overview of Storage Subsystem

4
Modeling Storage Systems
  • A strong argument for detailed models: An
    Introduction to Disk Drive Modeling (Ruemmler and
    Wilkes, 1994).
  • Complexities of disk subcomponents necessitate a
    detailed disk model.
  • Establishes a baseline metric, the demerit figure,
    widely used for model/simulator/emulator
    validation.
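
Ruemmler and Wilkes define the demerit figure as the root-mean-square horizontal distance between the measured and the simulated response-time distributions; it is often also reported as a percentage of the mean measured response time. A minimal Python sketch, assuming both inputs are lists of per-request response times in milliseconds and approximating the horizontal distance with linearly interpolated quantiles on an evenly spaced probability grid (the grid size `points` and the function names are illustrative choices, not from the paper):

```python
import math


def quantile(sorted_vals, p):
    """Linearly interpolated quantile of an already-sorted list, for 0 <= p <= 1."""
    if not sorted_vals:
        raise ValueError("empty sample")
    pos = p * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] + frac * (sorted_vals[hi] - sorted_vals[lo])


def demerit(measured_ms, simulated_ms, points=1000):
    """RMS horizontal distance between two response-time distributions.

    Returns (demerit in ms, demerit as a percentage of the mean measured time).
    """
    m = sorted(measured_ms)
    s = sorted(simulated_ms)
    total = 0.0
    for i in range(points):
        p = (i + 0.5) / points  # midpoints of an even probability grid
        d = quantile(m, p) - quantile(s, p)  # horizontal gap at probability p
        total += d * d
    rms = math.sqrt(total / points)
    return rms, 100.0 * rms / (sum(m) / len(m))
```

A simulator whose response times are uniformly 1 ms too slow on a workload with a 2.5 ms mean would score a demerit of 1 ms, or 40%.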

5
Demerit Values
From An Introduction to Disk Drive Modeling,
Ruemmler and Wilkes, 1994

6
Disk Characterization
  • Difficult to obtain a useful simulation without
    accurate disk parameters.
  • Manufacturers do not publish detailed
    characterizations.
  • → Parameters have to be obtained empirically.
  • Unfortunately, standard characterization tools
    do not exist and may have to be custom-made.
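
A custom characterization tool typically starts by timing small reads at known offsets. The Python sketch below times block-aligned reads at random offsets of a file using `os.pread` (Unix only). It is only a starting point: through a regular file, the operating system's page cache and the drive's own cache hide most mechanical latency, so a real tool would read a raw device and bypass the cache (for example with `O_DIRECT` on Linux, which also requires aligned buffers). The function name, block size, and defaults are illustrative assumptions.

```python
import os
import random
import time


def sample_read_latencies(path, n_samples=100, block_size=4096, seed=0):
    """Time single-block reads at random, block-aligned offsets in `path`.

    Returns a list of (offset_bytes, latency_seconds) pairs. Offsets are kept so
    that seek distances (differences between consecutive offsets) can be derived.
    """
    size = os.path.getsize(path)
    n_blocks = size // block_size
    if n_blocks == 0:
        raise ValueError("file smaller than one block")
    rng = random.Random(seed)
    results = []
    fd = os.open(path, os.O_RDONLY)
    try:
        for _ in range(n_samples):
            offset = rng.randrange(n_blocks) * block_size
            start = time.perf_counter()
            os.pread(fd, block_size, offset)  # one read, no file-position state
            results.append((offset, time.perf_counter() - start))
    finally:
        os.close(fd)
    return results
```

Plotting latency against the distance from the previous offset gives a first approximation of a seek curve, one of the parameters a detailed disk model needs.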

7
Simulation Tool Evaluation
  • Simulators require a lot of time and effort to
    build, debug, and validate.
  • An ideal simulator includes a programming
    interface, easy and flexible configuration, a
    validated model, and support.
  • Availability and ongoing development
  • Capabilities: trace files, ease of use, parallel
    drive configuration, flexibility

8
Available Simulators
  • Pantheon
  • DiskSim
  • Kotz's Simulation Model of the HP 97560 Disk
    Drive and Yiming Hu's added support for multiple
    disks
  • Nils Nieuwejaar's Disk Simulator
  • STARFISH

http://www-csag.ucsd.edu/individual/jburke/Storage/internal/simulators.txt

9
Personal Selection
  • DiskSim can meet most of our requirements fairly
    well: flexible design, very accurate.
  • Extensible
  • It is also under active development.
  • Frequently used in FAST papers.

10
Use of Trace Files
  • When using a simulator, there are two main
    options for driving the simulation: a synthetic
    workload or a trace file.
  • Synthetic workloads are good for broad
    evaluation; traces are good for targeted
    evaluation.
  • Disk traces normally include start/stop
    timestamps, sector (or block) number, and request
    length.
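
The two options share a common record format. The Python sketch below holds the trace fields listed above (only the arrival timestamp is kept, since completion times are what the simulator produces), generates a synthetic workload with Poisson arrivals and uniformly random block addresses, and writes each request in what we take to be DiskSim's ASCII input format: arrival time in milliseconds, device number, block number, size in blocks, and hexadecimal flags with bit 0 set for reads. Check the field order against the DiskSim manual for the version in use; the class, function names, distributions, and defaults are illustrative assumptions, not a validated workload model.

```python
import random
from dataclasses import dataclass


@dataclass
class TraceRecord:
    arrival_ms: float  # request arrival (start) time in milliseconds
    device: int        # device (disk) number
    block: int         # starting block number
    size_blocks: int   # request length in blocks
    is_read: bool

    def to_disksim_ascii(self) -> str:
        # Assumed DiskSim ASCII layout: time, device, block, size, flags (hex; bit 0 set = read).
        flags = 1 if self.is_read else 0
        return f"{self.arrival_ms:.3f} {self.device} {self.block} {self.size_blocks} {flags:x}"


def synthetic_workload(n, mean_interarrival_ms, disk_blocks, read_fraction=0.5,
                       sizes=(8, 16, 64), devices=1, seed=0):
    """Open-loop synthetic workload: Poisson arrivals, uniformly random block addresses.

    All distributions and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)
    t = 0.0
    records = []
    for _ in range(n):
        t += rng.expovariate(1.0 / mean_interarrival_ms)  # exponential inter-arrival gaps
        size = rng.choice(sizes)
        block = rng.randrange(disk_blocks - size + 1)  # keep the request inside the disk
        records.append(TraceRecord(t, rng.randrange(devices), block, size,
                                   rng.random() < read_fraction))
    return records
```

For example, `"\n".join(r.to_disksim_ascii() for r in synthetic_workload(1000, 5.0, 1_000_000))` yields a 1,000-request trace text; a parser for a real trace would fill the same `TraceRecord` fields, so both options feed the simulator identically.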

11
Trace File Availability
  • Fantasy: plentiful traces taken from a variety of
    production environments using parallel/distributed
    file systems.
  • Reality: frequent use of synthetic workloads; the
    most popular traces are old (early 1990s); very
    limited availability of parallel/distributed
    traces.

12
Trace Files in the Wild
  • HP Cello 92, 96, 99, 02(?)
  • HP TPC-C
  • HP TPC-D
  • HP Snake
  • Storage Performance Council
  • OpenMail email server
  • Postmark
  • Custom Traces

http://www-csag.ucsd.edu/individual/jburke/Storage/internal/traces.txt

13
Models for Experimentation
  • Data server selection for high performance
  • User data placement (for the above)
  • System manager: incremental benefit of resources
  • System manager: fixed application deployment

14
Server selection
  • Given an application's computational and data
    access specification
  • The set of resources that can satisfy these
  • Select the set of resources that gives the best
    performance
  • Select the set of resources that gives the most
    robust performance (headroom, interference)
  • How do these properties/choices vary with
    resource configuration?
  • Complement: where should you place your replicas
    based on known access patterns? What about
    declustering?
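
The selection questions can be made concrete with a greedy sketch under a deliberately simple, hypothetical model: each server has free capacity, a peak deliverable bandwidth, and a current utilization, and an application needs a given amount of data hosted (assumed stripable across servers) and a given aggregate bandwidth. Ranking by spare bandwidth stands in for "best performance"; ranking by spare fraction stands in for "most robust performance" (more headroom, less exposure to interference). The class, fields, and greedy rule are assumptions, not the authors' method, and greedy selection is not guaranteed to find the smallest feasible set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    capacity_gb: float   # free storage capacity
    peak_bw_gbps: float  # deliverable bandwidth when idle
    utilization: float   # fraction of peak bandwidth already in use (0..1)

    @property
    def spare_bw(self) -> float:
        return self.peak_bw_gbps * (1.0 - self.utilization)


def select_servers(servers, data_gb, required_bw_gbps, robust=False):
    """Greedily pick servers until the set can hold data_gb and supply required_bw_gbps.

    robust=False ranks by spare bandwidth (proxy for best performance);
    robust=True ranks by spare fraction, i.e. headroom (proxy for robustness).
    Returns the chosen list, or None if all servers together are insufficient.
    """
    if robust:
        key = lambda s: 1.0 - s.utilization
    else:
        key = lambda s: s.spare_bw
    chosen, bw, cap = [], 0.0, 0.0
    for s in sorted(servers, key=key, reverse=True):
        if s.spare_bw <= 0:
            continue  # saturated servers add nothing
        chosen.append(s)
        bw += s.spare_bw
        cap += s.capacity_gb
        if bw >= required_bw_gbps and cap >= data_gb:
            return chosen
    return None
```

With a half-loaded 10 Gb/s server and an idle 4 Gb/s server, a 6 Gb/s request takes the faster server first in performance mode but the idle one first in robust mode; rerunning the selection over different server configurations is one way to explore how the choice varies with resource configuration.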

15
Application Deployment
  • Given an application's computational and data
    access specification
  • The set of resources that can satisfy these
  • Existing set of utilizations and capacities
  • Select the set of resources that gives the best
    performance
  • Select the set of resources that gives the most
    robust performance (headroom, interference)
  • How do these properties/choices vary with
    resource configuration (and data placement)?
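
Application deployment differs from server selection mainly in that existing utilizations shape the answer. A minimal sketch, swapping in the textbook M/M/1 queueing approximation (mean response time S/(1 - ρ) for service time S and utilization ρ) as a stand-in performance model; the dictionary layout and the `app_load` parameter (the utilization the new application would add) are hypothetical.

```python
def mm1_response_time(service_ms, utilization):
    """Mean response time of an M/M/1 queue, S / (1 - rho); infinite if saturated."""
    if utilization >= 1.0:
        return float("inf")
    return service_ms / (1.0 - utilization)


def place_application(servers, app_load):
    """Choose the server with the lowest response time after adding app_load.

    servers: dict name -> (service_ms, current_utilization)
    app_load: utilization the new application would add (hypothetical, 0..1)
    Returns (name, response_ms), or None if every server would saturate.
    """
    best = None
    for name, (service_ms, util) in servers.items():
        r = mm1_response_time(service_ms, util + app_load)
        if r != float("inf") and (best is None or r < best[1]):
            best = (name, r)
    return best
```

With a 5 ms server already at 80% utilization and a 10 ms server at 10%, adding 10% load favors the slower but lightly loaded server (12.5 ms versus about 50 ms), the kind of interference effect the slide asks about.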

16
Resource Deployment
  • Given an existing workload, utilizations, and
    capacities
  • Where to deploy additional resources (storage) to
    give maximum benefit to the existing workload
    (increased performance)
  • Where to deploy to give the most robust
    performance (headroom, interference)
  • Where to deploy to give the greatest freedom of
    application choice in the future? Or the maximum
    increase in capacity?
  • May need some model of the future workload (or it
    may be the same as the current one)
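
The "maximum benefit to the existing workload" question can be sketched as greedy marginal-gain placement. Under the hypothetical assumptions that each site's load splits evenly over its disks and that each disk behaves as an M/M/1 queue, the code adds disks one at a time wherever they most reduce the load-weighted mean response time. The site model, units (arrivals per millisecond, service time in milliseconds), and greedy rule are illustrative assumptions; greedy placement is a heuristic, not a guaranteed optimum.

```python
def site_response_ms(arrival_per_ms, service_ms, n_disks):
    """Mean response time at a site whose load is split evenly over n_disks M/M/1 disks."""
    rho = arrival_per_ms * service_ms / n_disks
    return float("inf") if rho >= 1.0 else service_ms / (1.0 - rho)


def weighted_response(sites, disks):
    """Load-weighted sum of per-site mean response times (proportional to the overall mean)."""
    return sum(lam * site_response_ms(lam, svc, disks[name])
               for name, (lam, svc) in sites.items())


def deploy_disks(sites, disks, k):
    """Greedily add k disks, each where it most lowers weighted_response.

    sites: dict name -> (arrival_per_ms, service_ms); disks: dict name -> current count.
    Ties, including placements that still leave a site saturated, go to the first site listed.
    """
    disks = dict(disks)  # do not mutate the caller's dict
    plan = []
    for _ in range(k):
        best_site, best_total = None, None
        for name in sites:
            disks[name] += 1
            total = weighted_response(sites, disks)
            disks[name] -= 1
            if best_total is None or total < best_total:
                best_site, best_total = name, total
        disks[best_site] += 1
        plan.append(best_site)
    return plan, disks
```

With site A at 80% per-disk utilization and site B at 20%, both with 10 ms service times, the first two added disks go to A. Replacing `weighted_response` with a headroom or capacity objective would address the slide's other deployment questions.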

17
Application Examples: SAR
  • Earthscope SAR Application
  • High speed data integration/visualization
  • 32 gigabytes, delivered in less than 0.5 seconds
  • Presumed to be sourced from MANY disks
    distributed throughout the OptIPuter network
  • Which/How many disks? What are the critical
    performance factors?
  • How are these affected by various sharing and
    security models?
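
The SAR target translates into a simple bandwidth budget: moving 32 GB (decimal gigabytes assumed) in under 0.5 s needs at least 64 GB/s, or 512 Gb/s, of aggregate delivery. The sketch turns that into lower bounds on disk and link counts. The per-disk streaming rate and per-link rate are hypothetical parameters; the 50 MB/s and 10 Gb/s values below are placeholders, not measurements, and the bound ignores seeks, contention, and protocol overhead.

```python
import math


def bandwidth_budget(data_bytes, deadline_s, disk_bytes_per_s, link_bits_per_s):
    """Lower bounds on disks and network links needed to move data_bytes within deadline_s.

    Assumes perfect striping: every disk streams at disk_bytes_per_s and every link
    runs at link_bits_per_s with no overhead, so real systems need more of both.
    """
    required_bytes_per_s = data_bytes / deadline_s
    disks = math.ceil(required_bytes_per_s / disk_bytes_per_s)
    links = math.ceil(required_bytes_per_s * 8 / link_bits_per_s)
    return required_bytes_per_s, disks, links


# SAR: 32 GB in 0.5 s; 50 MB/s per disk and 10 Gb/s per link are placeholder values.
rate, disks, links = bandwidth_budget(32e9, 0.5, 50e6, 10e9)
print(f"{rate / 1e9:.0f} GB/s -> at least {disks} disks and {links} links")
```

Under these placeholder rates the bound is at least 1,280 disks and 52 links, which is why the data is presumed to come from many disks spread across the network.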

18
Application: BIRN
  • High speed data integration/visualization of
    brain images
  • 10 terabytes, delivered in seconds
  • Comparative physiology and visualization
  • Presumed to be sourced from distributed sources
    (control)
  • Each source may consist of MANY disks distributed
    throughout the OptIPuter network
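
For BIRN, the data is presumed to come from several distributed sources, each possibly many disks. A rough delivery-time estimate treats each source as limited by the smaller of its aggregate disk bandwidth and its network link, with the slowest source setting the overall time. Source names, data shares, disk counts, and rates are hypothetical inputs; the model assumes sources stream in parallel and ignores contention on shared network segments.

```python
def delivery_time_s(sources):
    """Estimate parallel delivery time from distributed sources.

    sources: dict name -> (share_bytes, n_disks, disk_bytes_per_s, link_bytes_per_s)
    Each source streams its share at min(aggregate disk rate, link rate); all sources
    run in parallel, so the slowest one determines completion. Returns (seconds, name).
    """
    worst_name, worst_time = None, 0.0
    for name, (share, n_disks, disk_bw, link_bw) in sources.items():
        rate = min(n_disks * disk_bw, link_bw)  # the bottleneck for this source
        t = share / rate
        if worst_name is None or t > worst_time:
            worst_name, worst_time = name, t
    return worst_time, worst_name
```

Splitting 10 TB evenly over two placeholder sources, one with 100 disks behind a 1.25 GB/s link (link-bound, 4,000 s) and one with 10 disks (disk-bound, 10,000 s), shows how far such a setup is from "delivered in seconds", which would need aggregate rates on the order of a terabyte per second or more.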

19
Requirements
  • Large-scale system modeling (high performance)
  • On parallel resources?
  • Flexible configuration
  • Relation to real experimental platforms

20
Specific Hypothetical Experiments