
Title: The Impact of Disk I/O on Multiprocessor Checkpoint/Recovery


1
The Impact of Disk I/O on Multiprocessor
Checkpoint/Recovery
  • CS 736 Project Presentation
  • Friday, December 13, 2002
  • Jarrod Lewis
  • Lixin Su

2
Overview: Motivation
  • SafetyNet checkpoint/recovery mechanism
  • Fault-tolerant computing
  • Maintains consistent checkpoints of system state
  • Currently a hardware-only availability solution
  • Output Commit Problem
  • Takes time to validate a checkpoint
  • Only validated fault-free data can be
    communicated outside sphere of recovery
  • Can't roll back the outside world!
  • I/O writes cannot be committed in normal way

3
Overview: Approach
  • Addressing the output commit problem for disk I/O
  • Standard solution is to delay/buffer writes
  • Performance impact?
  • Buffering mechanism?
  • Evaluate with commercial workloads
  • Hardware traces
  • IBM RS/6000 server (AIX 4.3.3)
  • Simulation
  • Simics full system multiprocessor simulator
  • DiskSim disk simulator

4
Overview: Summary
  • Performance impact of disk write latency
  • Increased time to disk write completion (0.04 ms → 4 ms)
  • Can have significant impact on performance!
  • Highly dependent on I/O characteristics
  • Up to 5.8x slowdown for 4 ms delay in TPC-C
  • Delaying/buffering mechanism
  • Implemented buffer at disk controller
  • Modest buffer size (10s of KB) needed to support
    buffering
  • Distance (time) between disk writes is large (10s
    of ms)
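
The link between write spacing and buffer size can be sketched as follows. This is a minimal model, not the presenters' simulator: it assumes every write is held for a fixed time before it may go to disk, and it reports the largest amount of data held at once. The function name `peak_buffer_bytes`, the trace layout, and the sample values are hypothetical.

```python
from collections import deque

def peak_buffer_bytes(writes, hold_ms):
    """Peak bytes buffered when every write is held for hold_ms.

    writes: list of (timestamp_ms, size_bytes), sorted by timestamp.
    """
    held = deque()          # (release_time_ms, size_bytes), oldest first
    current = peak = 0
    for t, size in writes:
        # Release every write whose hold time has expired by time t.
        while held and held[0][0] <= t:
            current -= held.popleft()[1]
        held.append((t + hold_ms, size))
        current += size
        peak = max(peak, current)
    return peak

# Hypothetical trace: three 4 KB writes spaced 13 ms apart.
sample = [(0, 4096), (13, 4096), (26, 4096)]
```

When the hold time is shorter than the gap between writes, at most one write is buffered at a time, which is the intuition behind a modest buffer being enough.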

5
Outline
  • Overview
  • Motivation and Approach
  • Performance Impact
  • Buffering Mechanism
  • Conclusions

6
Motivation
  • SafetyNet checkpoint/recovery mechanism
  • Globally consistent checkpoints
  • Processors, memories, coherence permissions
  • Recovers to a pre-fault checkpoint, then re-executes
  • Checkpoint validation
  • Determines which checkpoint is recovery point
  • Determines when to interact with I/O devices
  • Output Commit Problem
  • Increasing checkpoint validation time
  • (+) Reduces logging overhead (overwrites)
  • (+) Tolerates longer latency faults
  • (-) Longer output commit delays
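
A minimal sketch of the delay/buffer idea behind the output commit problem, under a simplified model that is not taken from the slides: each write is tagged with the checkpoint interval that issued it, released only once that interval is validated, and discarded if the system recovers to an earlier checkpoint. The class `OutputCommitBuffer`, its method names, and the integer interval ids are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class OutputCommitBuffer:
    """Holds disk writes until the checkpoint interval that issued them is validated."""
    pending: dict = field(default_factory=dict)  # interval id -> [(block, data), ...]
    disk: dict = field(default_factory=dict)     # block -> data, stands in for the disk

    def write(self, interval_id, block, data):
        # Buffer the write instead of committing it to disk.
        self.pending.setdefault(interval_id, []).append((block, data))

    def validate(self, interval_id):
        # Intervals up to interval_id are fault-free: commit their writes in order.
        for cid in sorted(c for c in self.pending if c <= interval_id):
            for block, data in self.pending.pop(cid):
                self.disk[block] = data

    def recover(self, interval_id):
        # Roll back to interval_id: drop writes issued by later, unvalidated intervals.
        for cid in [c for c in self.pending if c > interval_id]:
            del self.pending[cid]

buf = OutputCommitBuffer()
buf.write(1, block=10, data=b"a")
buf.write(2, block=11, data=b"b")
buf.validate(1)   # block 10 reaches the disk
buf.recover(1)    # the write to block 11 never leaves the sphere of recovery
```

A longer validation time means writes sit in `pending` longer, which is the output commit delay listed as the cost above.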

7
Approach
  • Standard solution: delay I/O writes
  • How does this affect performance? (Jarrod)
  • Evaluate in Simics full system simulator
  • Intercept and delay timing of disk writes
  • Evaluate microbenchmark, commercial workloads
  • Is delaying feasible? (Lixin)
  • Collect disk traces on IBM RS/6000 server
  • Evaluate traces with DiskSim simulator
  • I/O characteristics
  • Data buffering requirements

8
Outline
  • Overview
  • Motivation and Approach
  • Performance Impact
  • Buffering Mechanism
  • Conclusions

9
Experiment I: Performance Impact
  • Simics multiprocessor simulator
  • Functional simulator only
  • Assumes each instruction takes 1 cycle to execute
  • I/O can have different timing
  • Access to source for devices (DMA, SCSI, etc.)
  • Intercepting disk writes
  • Add fixed delay to each write
  • Delays disk content update, processor
    notification
  • Observe impact on execution time
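
The effect of a fixed per-write delay can be illustrated with a back-of-the-envelope model. This sketch is not the Simics experiment: it assumes a workload that blocks on every write, and it reads the 0.04 ms figure from the summary slide as the undelayed write completion time, which is an interpretation. The function names and the default compute time per write are hypothetical.

```python
def run_time_ms(n_writes, compute_ms_per_write, base_write_ms, added_delay_ms):
    """Total run time when the workload waits for each write before continuing."""
    per_iteration = compute_ms_per_write + base_write_ms + added_delay_ms
    return n_writes * per_iteration

def slowdown(added_delay_ms, compute_ms_per_write=1.0,
             base_write_ms=0.04, n_writes=1000):
    """Ratio of delayed run time to undelayed run time."""
    baseline = run_time_ms(n_writes, compute_ms_per_write, base_write_ms, 0.0)
    delayed = run_time_ms(n_writes, compute_ms_per_write, base_write_ms, added_delay_ms)
    return delayed / baseline

# An added delay of 3.96 ms brings write completion from 0.04 ms to 4 ms.
ratio = slowdown(3.96)
```

In this model the slowdown grows as the compute time between writes shrinks, which matches the observation that the impact depends heavily on I/O characteristics. It does not attempt to reproduce the measured results.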

10
Simics/sun4u System Overview

11
Simics/sun4u System Overview

12
Simics/sun4u System Overview
  • Step 1: Issue DMA Read (Disk Write)

13
Simics/sun4u System Overview
  • Step 2: DMA Controller issues request to SCSI Controller
  • NOTE: This is the point where a disk write will be delayed

14
Simics/sun4u System Overview
  • Step 3: Transfer data from RAM onto disk

15
Simics/sun4u System Overview
  • Step 4: SCSI Controller notifies DMA Controller when the write is complete

16
Simics/sun4u System Overview
  • Step 5: DMA Controller interrupts the Processor to signal that the write is done
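
The five-step write path on slides 12 to 16 can be sketched as a timeline for a single write. This is an illustrative model only and does not use the Simics API: it assumes the injected delay holds the request at step 2, so the transfer and both notifications shift by the same amount. The function name `write_timeline` and its parameters are hypothetical.

```python
def write_timeline(issue_ms, transfer_ms, delay_ms=0.0):
    """Return (time_ms, event) pairs for one disk write on the sketched path."""
    t = issue_ms
    events = [(t, "1: processor issues DMA read (disk write)")]
    events.append((t, "2: DMA controller issues request to SCSI controller"))
    t += delay_ms       # assumed hold point: the request waits here
    events.append((t, "3: data transfer from RAM to disk starts"))
    t += transfer_ms    # transfer time stands in for the whole disk operation
    events.append((t, "4: SCSI controller notifies DMA controller"))
    events.append((t, "5: DMA controller interrupts the processor"))
    return events

for time_ms, event in write_timeline(issue_ms=0.0, transfer_ms=0.04, delay_ms=4.0):
    print(f"{time_ms:6.2f} ms  {event}")
```

Because the hold sits ahead of the transfer, both the disk content update and the processor notification move later by `delay_ms`.
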

17
Performance Impact of Delayed Disk Writes

18
Rate of Writing Data to Disks

19
Outline
  • Overview
  • Motivation and Approach
  • Performance Impact
  • Buffering Mechanism
  • Conclusions

20
Trace Collection
  • Commercial Workloads
  • multi-user, multi-tier, multi-threaded,
    multi-client
  • SPECmail2001, SPECweb99, TPC-b
  • IBM RS/6000 server
  • running AIX 4.3.3

21
Workloads - SPECmail
  • Benchmarking mail server performance
  • Write-intensive, small writes
  • Running Configuration
  • 5 machines, 200 users, running for one and a half days

22
SPECweb and TPC-b
  • SPECweb 99
  • benchmarking HTTP server
  • I/O intensive
  • dynamic GET, POST, etc.
  • TPC-b
  • benchmarking online banking database
  • Not I/O intensive but the first one I got
    running
  • Multiple banks, user accounts, threads

23
DiskSim 2.0
  • Disk simulator from CMU
  • Includes device drivers, buses, controllers, etc.
  • Request queuing, block caching, etc.
  • Implemented a write buffer at controller level
  • Important simulation parameters
  • 11 ms seek time, request scheduling/collapsing,
    block caching, etc.
  • Compared with IBM UltraStar 2 disk series

24
Write Interval Analysis
  • Average Write Interval
  • SPECmail: 13 ms, SPECweb: 151 ms, TPC-b: 3511 ms
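
An average write interval of this kind can be computed from a timestamped trace as sketched below. The trace layout, a list of `(timestamp_ms, op)` pairs with `"W"` marking writes, is an assumption and not the format of the collected AIX traces. The function name is hypothetical.

```python
def average_write_interval_ms(trace):
    """Mean gap between consecutive writes.

    trace: iterable of (timestamp_ms, op), op is "R" or "W", sorted by time.
    Returns None when the trace holds fewer than two writes.
    """
    write_times = [t for t, op in trace if op == "W"]
    if len(write_times) < 2:
        return None
    gaps = [later - earlier for earlier, later in zip(write_times, write_times[1:])]
    return sum(gaps) / len(gaps)

# Hypothetical trace: reads are ignored, the write gaps are 10 ms and 20 ms.
sample_trace = [(0.0, "W"), (5.0, "R"), (10.0, "W"), (30.0, "W")]
avg = average_write_interval_ms(sample_trace)
```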

25
Buffer Size Sensitivity
  • Factors that affect TTF (time to fill)
  • I/O write intensity
  • Write buffer size
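
The two factors above combine into a simple estimate of time to fill. This sketch assumes a buffer that only accumulates writes and is never drained, with uniform write sizes. The 32 KB buffer and 4 KB write size are illustrative assumptions, and the intervals are the averages reported on the Write Interval Analysis slide.

```python
def time_to_fill_ms(buffer_bytes, avg_write_bytes, avg_write_interval_ms):
    """Approximate time for a write buffer to fill if nothing drains it."""
    writes_to_fill = buffer_bytes / avg_write_bytes
    return writes_to_fill * avg_write_interval_ms

# Average write intervals from the trace analysis, in ms.
intervals = {"SPECmail": 13, "SPECweb": 151, "TPC-b": 3511}

# Assumed sizes: 32 KB buffer, 4 KB per write.
ttf = {name: time_to_fill_ms(32 * 1024, 4 * 1024, gap)
       for name, gap in intervals.items()}
```

A larger buffer or a less write-intensive workload both push the time to fill up, in proportion.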

26
Outline
  • Overview
  • Motivation and Approach
  • Performance Impact
  • Buffering Mechanism
  • Conclusions

27
Conclusions
  • Delaying disk writes does affect performance
  • Performance degrades rapidly for larger delays
  • Multiprocessor system (multiple disks) more
    sensitive
  • Practical to implement a buffer at disk
    controller level
  • Must be SafetyNet-aware
  • For the current constraints of SafetyNet, only a
    modest amount of buffer space is needed

28
Future Work
  • Buffering mechanism at SafetyNet
  • Buffer size and hardware complexity
  • Mechanisms of I/O interception
  • Develop solutions other than just delaying I/O
  • e.g., logging