Title: Performance-Robust Parallel I/O


1
Performance-Robust Parallel I/O
Virtual Streams
  • Z. Morley Mao, Noah Treuhaft
  • CS258
  • 5/17/99
  • Professor Culler

2
Introduction
  • Clusters exhibit performance heterogeneity
  • static and dynamic, due to both hardware and
    software
  • Consistent peak performance demands adaptive
    software
  • building performance-robust parallel software
    means keeping heterogeneity in mind
  • This work explores
  • adaptivity appropriate for I/O-bound parallel
    programs
  • how to provide that adaptivity

3
Heterogeneity demands adaptivity
(Diagram: cluster node)
  • Physical I/O streams are simple to build and use
  • But their performance is highly variable
  • different drive models, bad blocks, multizone
    behavior, file layout, competing programs, host
    bottlenecks
  • I/O-bound parallel programs run at rate of
    slowest disk
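
The last point can be checked with a little arithmetic. The following is a minimal sketch, assuming each process reads an equal share from its own disk and the program finishes only when every process has finished. The function names and the disk rates are hypothetical, chosen for illustration rather than taken from measurements in this work.

```python
def completion_time(data_per_process_mb, disk_rates_mb_s):
    """Seconds until the slowest process finishes reading its share."""
    return max(data_per_process_mb / rate for rate in disk_rates_mb_s)


def effective_aggregate_rate(data_per_process_mb, disk_rates_mb_s):
    """Aggregate MB/s the whole program observes, gated by the slowest disk."""
    total_mb = data_per_process_mb * len(disk_rates_mb_s)
    return total_mb / completion_time(data_per_process_mb, disk_rates_mb_s)


# Hypothetical cluster: three healthy disks and one running at half speed.
rates = [10.0, 10.0, 10.0, 5.0]
print(completion_time(100.0, rates))           # 20.0 seconds
print(effective_aggregate_rate(100.0, rates))  # 20.0 MB/s out of 35 MB/s raw capacity
```

In this made-up configuration a single half-speed disk costs the program 15 of the 35 MB/s the disks could jointly deliver.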

4
Virtual Streams
  • Performance-robust programs want virtual streams
    that...
  • eliminate dependence on individual disk behavior
  • continually equalize throughput delivered to
    processes

(Diagram labels: Virtual Streams Layer, Disk)

5
Graduated Declustering (GD): a Virtual Streams
implementation
  • data replicated (mirrored) for availability
  • use replicas to provide performance availability,
    too
  • fast network makes remote disk access comparable
    to local
  • distributed algorithm for adaptivity
  • client provides information about its progress
  • server reacts by scheduling requests to even out
    progress
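
The progress-based idea in the bullets above can be sketched as a small simulation. This is an illustration under stated assumptions, not the authors' implementation: data for client i is assumed to be mirrored on servers i and i+1 arranged in a ring, each server delivers a fixed number of blocks per tick, and the shared `progress` list stands in for the progress reports that clients would send to servers. The function name and the rates are hypothetical.

```python
def simulate_progress_based_gd(server_rates, ticks):
    """Return blocks received per client after `ticks` scheduling rounds.

    server_rates[s] is the number of blocks server s can deliver per tick.
    Server s is assumed to hold data for client s and client s - 1 (ring).
    """
    n = len(server_rates)
    progress = [0] * n  # stand-in for client-reported progress
    for _ in range(ticks):
        for s, rate in enumerate(server_rates):
            candidates = [s, (s - 1) % n]
            for _ in range(rate):
                # Local decision: serve whichever of my two clients is behind.
                laggard = min(candidates, key=lambda c: progress[c])
                progress[laggard] += 1
    return progress


# Hypothetical run: server 3 is perturbed and runs at half speed.
progress = simulate_progress_based_gd([4, 4, 4, 2], ticks=100)
print(progress, max(progress) - min(progress))
```

Without any balancing, the client whose only source is the slow server would receive half as much data as the others. In this sketch each server only compares its own two clients, yet the clients stay within a few blocks of one another.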

(Diagram labels: client A, client B, process, GD client library, GD server, data A, data B)

6
GD in action
  • Local decisions yield global behavior

7
Evaluation of original GD implementation:
progress-based
  • Seek overhead due to reading from all replicas

8
Deficiency of original GD implementation: seek
overhead
  • Under the assumption of sequential data access
  • Seek occurs even when there is no perturbation
  • seeks are becoming more significant as disk
    transfer rate increases
  • Need a new algorithm that...
  • reads mostly from a single disk under no
    perturbation
  • dynamically adjusts to perturbation when
    necessary
  • achieves both performance adaptivity and minimal
    overhead
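
The claim that seeks matter more as transfer rates rise follows from simple arithmetic, sketched below. The model is an assumption made for illustration: every request pays one fixed seek followed by a transfer at the disk's sequential rate. The seek time, request size, and transfer rates are hypothetical numbers.

```python
def seek_fraction(seek_s, request_mb, transfer_mb_s):
    """Fraction of each request's service time spent seeking."""
    transfer_s = request_mb / transfer_mb_s
    return seek_s / (seek_s + transfer_s)


# Same 10 ms seek and 1 MB request, with increasingly fast transfers.
for rate in (10.0, 20.0, 40.0):
    print(rate, round(seek_fraction(0.010, 1.0, rate), 3))
```

Because the seek time stays fixed while the transfer time shrinks, the seek share of each request grows with the transfer rate.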

9
Proposed solution: response-rate-based GD
  • Number of requests clients send to server based
    on server response rate
  • servers use request queue lengths to make
    scheduling decisions
  • uses implicit information, historyless
  • no bandwidth information transmitted between
    server and client
  • advantage: each client has a primary server
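
The implicit feedback loop described above can be sketched as follows. This is a simplified model under stated assumptions, not the authors' algorithm: one client keeps a fixed number of requests outstanding at each of two replica servers and, whenever a server replies, sends its next request to that same server. It does not model primary-server preference or the servers' queue-length-based scheduling. The function name, service times, and window sizes are hypothetical.

```python
import heapq


def simulate_response_rate_client(service_times, window, total_requests):
    """Return how many requests each server ends up serving for one client.

    service_times[s]: seconds server s needs per request.
    window[s]: requests initially outstanding at server s.
    """
    outstanding = list(window)
    issued = sum(outstanding)
    served = [0] * len(service_times)
    # Each entry is (time the server finishes its current request, server).
    events = [(service_times[s], s)
              for s in range(len(service_times)) if outstanding[s] > 0]
    heapq.heapify(events)
    while events:
        now, s = heapq.heappop(events)
        served[s] += 1
        outstanding[s] -= 1
        if issued < total_requests:
            # A reply came back from s, so the next request goes to s.
            outstanding[s] += 1
            issued += 1
        if outstanding[s] > 0:
            heapq.heappush(events, (now + service_times[s], s))
    return served


print(simulate_response_rate_client([1.0, 1.0], [2, 2], 100))  # equal servers
print(simulate_response_rate_client([2.0, 1.0], [2, 2], 100))  # server 0 perturbed
```

No bandwidth figures are exchanged in this sketch. The faster server simply replies more often, so it receives more of the client's requests.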

10
Evaluation of response-rate-based GD
  • Graph of bandwidth vs. disk nodes perturbed

11
Historyless vs. History-based adaptiveness
  • History-based (progress based)
  • Adjustment to perturbation occurs gradually over
    time
  • Close to perfect knowledge, if the information
    is not outdated
  • extra overhead in sending control information
  • Historyless (response-rate based)
  • primary server designation possible
  • to increase sensitivity to real perturbation by
    creating artificial perturbation
  • considers varying performance of data consumers
  • takes longer to converge

12
Stability and Convergence
  • How long does it take for the system to converge?
  • Linear with the number of nodes
  • Depends on the last occurrence of perturbation
  • Influenced by the style of communication
    (implicit vs. explicit)

13
Server request handoff
  • If a server finishes all its requests, it will
    contact other servers with the same replicas to
    help serve their clients (workstealing)
  • server request handoff keeps all disks busy when
    possible
  • design decisions?
  • How many requests to hand off? Depends on the BW
    history of both servers and on the size of the
    request queue.
  • Benefit vs. Cost tradeoff
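
One possible handoff policy is sketched below. The slide leaves the number of requests to hand off as an open design question, so taking half of the busiest peer's queue is purely an assumption here, as are the function name and the queue representation.

```python
from collections import deque


def hand_off(queues, idle, peers):
    """Move requests from the busiest peer to an idle server.

    queues maps server name -> deque of pending requests.
    peers lists the servers holding the same replicas as `idle`.
    Returns the number of requests moved.
    """
    if queues[idle]:
        return 0  # not idle, nothing to do
    donor = max(peers, key=lambda p: len(queues[p]))
    count = len(queues[donor]) // 2  # assumed policy: take half
    for _ in range(count):
        # Take from the donor's tail so it keeps the requests it will reach first.
        queues[idle].appendleft(queues[donor].pop())
    return count


queues = {"s0": deque(), "s1": deque(range(8)), "s2": deque(range(3))}
moved = hand_off(queues, "s0", ["s1", "s2"])
print(moved, list(queues["s0"]), list(queues["s1"]))
```

A policy driven by each server's bandwidth history, as the slide suggests, would replace the fixed one-half split with a share proportional to the two servers' recent rates.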

14
Writes
  • Identical to reads except...
  • Create incomplete replicas with holes
  • track holes in metadata
  • afterward, do hole-filling both for
    availability and for performance robustness
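
Hole tracking and hole-filling can be sketched with plain dictionaries. The representation is an assumption made for illustration: each replica maps block numbers to data, the "metadata" is the computed list of blocks each replica is missing, and filling copies a missing block from any replica that has it. The function and replica names are hypothetical.

```python
def find_holes(replicas):
    """Map each replica name to the sorted list of blocks it is missing."""
    all_blocks = set().union(*replicas.values())
    return {name: sorted(all_blocks - set(blocks))
            for name, blocks in replicas.items()}


def fill_holes(replicas):
    """Copy every missing block from a replica that has it; return the holes filled."""
    holes = find_holes(replicas)
    for name, missing in holes.items():
        for block in missing:
            source = next(r for r in replicas.values() if block in r)
            replicas[name][block] = source[block]
    return holes


# During writing, each replica received only some of the blocks.
replicas = {"A": {0: "a", 1: "b"}, "B": {1: "b", 2: "c"}}
print(fill_holes(replicas))   # {'A': [2], 'B': [0]}
print(replicas["A"] == replicas["B"])  # True
```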

15
Conclusions
  • What did we achieve?
  • New load balancing algorithm--response-rate based
  • Deliver equal BW to parallel-program processes in
    face of performance heterogeneity
  • demonstrate the stability of the system
  • reduce seek overhead
  • server request handoff
  • writes
  • creates a useful abstraction for streaming I/O in
    clusters

16
Future Work
  • hot file replication
  • get peak BW after perturbation ceases
  • achieve orderly replies
  • multiple disks abstraction