Title: Graceful Operation of Disk-Drives under Thermal Emergencies
1Graceful Operation of Disk-Drives under Thermal
Emergencies
- Y. Kim, J. Choi, S. Gurumurthi, A. Sivasubramaniam
- January 4, 2007
- Dept. of Computer Science and Engineering
- The Pennsylvania State University
- Dept. of Computer Science
- University of Virginia
2Thermal Problems of Data Centers
3Thermal Impact on Disk-Drive Reliability
- Heat-Related Problems
- Data corruption
- High off-track errors
- Head-crashes
- Thermal Emergency
- HVAC/fan break-down
- Hot neighbor components
- Possible Solutions
- Powerful cooling systems
- Duplicated over-cooling systems
- Thermal Management of Disk-Drive
- D. Anderson et al., "More than an Interface: SCSI vs. ATA," in FAST 2003.
4Conventional Disk-Drive under Thermal Emergency
[Figure: temperature vs. time, with the thermal emergency line marked]
No service is available because the disk is off!
5Multi-speed Disk-Drive (ISCA'03)
[Figure: RPM vs. time]
$\text{Power} \propto (\#\,\text{Platters}) \times (\text{RPM})^{2.8} \times (\text{Diameter})^{4.6}$
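To make the scaling relation concrete, the sketch below computes spindle power at a reduced speed relative to full speed. It assumes the number of platters and the diameter stay fixed, so only the RPM term changes. The function name and the example speeds are illustrative and do not come from the slides.

```python
def relative_spindle_power(rpm_low: float, rpm_high: float,
                           exponent: float = 2.8) -> float:
    """Ratio of spindle power at rpm_low to spindle power at rpm_high.

    Assumes platter count and diameter are unchanged, so only the
    (RPM)^2.8 term of the proportionality matters.
    """
    if rpm_low <= 0 or rpm_high <= 0:
        raise ValueError("RPM values must be positive")
    return (rpm_low / rpm_high) ** exponent


# Hypothetical example: halving the speed from 15,000 to 7,500 RPM.
ratio = relative_spindle_power(7_500, 15_000)
print(f"Power at half speed: {ratio:.1%} of full-speed power")
```

Under these assumptions, halving the RPM cuts spindle power to roughly one seventh, which is why lowering the speed is an effective way to reduce heat.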
6Power Source of Disk-Drive
7Using Multi-Speed Disk-Drive under Thermal
Emergency
[Figure: temperature vs. time, with the thermal emergency line marked]
Service remains available for requests under a thermal emergency!
8Design Issue of Multi-Speed Disk-Drive under
Thermal Emergency
- Goals of thermal management with a multi-speed disk-drive?
- Avoid thermal emergencies
- Maximize performance by continuously servicing requests
- Design issues for an efficient multi-speed disk-drive?
- How long should it run at low speed?
- Can it service requests at low speed?
- MDISKsimple: spins at multiple speeds but services requests at only one speed
- MDISKopt: spins and services requests at different speeds
- Control policies of multi-speed disk-drive?
- Time-based policy
- Watermark-based policy
9Time-based Policy for MDISKsimple
[Figure: temperature vs. time, with the thermal emergency line marked]
- Service is not available during a pre-defined delay.
- A long delay still causes performance degradation!
10Watermark-based Policy for MDISKopt
[Figure: temperature vs. time, with the thermal emergency line marked]
- Service remains available under thermal emergency, unlike with MDISKsimple.
- Performance depends heavily on Tlow!
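The slide does not spell out the exact switching rule, so the following is only a minimal sketch of one plausible watermark scheme. It assumes the thermal emergency line acts as the high watermark and Tlow as the low watermark: the drive drops to low RPM when the temperature reaches the emergency line and returns to full RPM once it has cooled to Tlow. Requests are assumed to be serviced in both modes, as for MDISKopt. All function names, parameter names, and temperature values are hypothetical.

```python
def watermark_step(temp_c, at_low_rpm, t_emergency_c, t_low_c):
    """Return True if the drive should spin at low RPM after this reading.

    Hysteresis rule (an assumption, not taken from the slides):
    drop to low RPM at the emergency line, return to full RPM at Tlow.
    """
    if t_low_c >= t_emergency_c:
        raise ValueError("Tlow must be below the thermal emergency line")
    if not at_low_rpm and temp_c >= t_emergency_c:
        return True          # too hot: slow down
    if at_low_rpm and temp_c <= t_low_c:
        return False         # cooled enough: speed back up
    return at_low_rpm        # between the watermarks: keep current speed


def run_policy(temps_c, t_emergency_c, t_low_c):
    """Apply the rule to a sequence of readings; return the mode per reading."""
    at_low = False
    modes = []
    for temp in temps_c:
        at_low = watermark_step(temp, at_low, t_emergency_c, t_low_c)
        modes.append("low" if at_low else "high")
    return modes


# Hypothetical temperature readings in degrees Celsius.
readings = [44.0, 45.5, 46.0, 45.0, 44.2, 43.0, 44.5]
modes = run_policy(readings, t_emergency_c=46.0, t_low_c=43.5)
print(modes)
```

A lower Tlow keeps the drive at low speed for longer per transition, so fewer RPM transitions occur; a higher Tlow returns to full speed sooner but risks switching more often.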
11Experiments
- We used a Performance-Thermal Simulator for
Storage Systems (called STEAM).
[Figure: STEAM couples DiskSim (event-driven) with the thermal model (time-step based) through time stamps]
- DiskSim 2.0
- Performance simulator for conventional disk-drives
- Thermal Model
- Generates the temperature distribution of the disk-drive, using the finite difference method to calculate heat flow
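As a generic illustration of how a time-step-based finite difference calculation of heat flow works, the sketch below applies the explicit scheme to one-dimensional conduction along a rod with fixed end temperatures. This is not the STEAM thermal model, whose geometry and equations are not given in the slides; the function name, material constant, grid spacing, and temperatures are all assumptions.

```python
def diffuse_1d(temps, alpha, dx, dt, steps):
    """Advance a 1-D temperature profile with the explicit finite difference scheme.

    temps: initial temperatures at equally spaced nodes (end nodes stay fixed)
    alpha: thermal diffusivity, dx: node spacing, dt: time step
    """
    r = alpha * dt / dx ** 2
    if r > 0.5:
        raise ValueError("explicit scheme is unstable for alpha*dt/dx^2 > 0.5")
    u = list(temps)
    for _ in range(steps):
        nxt = u[:]  # boundary nodes are copied unchanged
        for i in range(1, len(u) - 1):
            # Net heat flow into node i from its two neighbours.
            nxt[i] = u[i] + r * (u[i + 1] - 2 * u[i] + u[i - 1])
        u = nxt
    return u


# Hypothetical rod: hot end at 50 C, the rest starting at 29 C.
profile = diffuse_1d([50.0, 29.0, 29.0, 29.0, 29.0],
                     alpha=1e-4, dx=0.01, dt=0.25, steps=2000)
print([round(t, 2) for t in profile])
```

With fixed end temperatures the profile settles toward a straight line between the two ends, which is a quick sanity check for this kind of solver.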
12Workloads
- Scenarios for thermal emergency with different
real I/O traces
Workload          Initial Tamb (°C)  Increased Tamb (°C)  Emergency Start (s)  Emergency End (s)  Simulated Time (s)  Disks (#)
HPL Openmail      29                 42                   500                  2,500              3,607               8
OLTP Application  29                 33                   5,000                30,000             43,712              24
Search Engine     29                 33                   2,000                12,000             15,395              6
TPC-C             29                 33                   2,000                10,000             15,851              4
13Time-based policy for MDISKsimple
- A large RPM transition time degrades performance.
- More cooling unit times prevent performance degradation.
14Watermark-based Policy for MDISKopt
- RPM transitions are crucial to performance.
- A low Tlow cools the disk-drive enough to amortize the overhead of the RPM transition time.
15Conclusions
- We proposed a multi-speed disk-drive approach for thermal management of disk-drives under thermal emergencies.
- It avoids thermal emergencies while maintaining service availability under thermal emergency conditions.
- RPM transition time in a multi-speed disk-drive should be minimized to make the approach more applicable.
16Thank you!
17Additional Slides
18Growth in Drive Performance
Source: Hitachi GST Technology Overview Charts, http://www.hitachigst.com/hdd/technolo/overview/storagetechchart.html
19Thermal Profiles (DRPM)
(a) HPL Openmail
(b) OLTP
(c) Search-Engine
(d) TPC-C
20Control Policies for DRPM
1. Time-based Policy
T: temperature, P: pre-defined time, X: elapsed time

while (1) {
    T = read_temp();
    if (T > Tthreshold) {
        drop_rpm();
        // wait for the pre-defined time P
        start = read_time();
        do {
            X = read_time() - start;
        } while (X < P);
        increase_rpm();
    } else {
        service();
    }
}
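The time-based policy on this slide can be exercised without hardware. The sketch below is one possible runnable rendering, assuming the drive operations are passed in as plain callables. The `FakeDrive` class, its fake clock, and the temperature values are made up for the demonstration, and the loop is bounded by an `iterations` argument so that it terminates.

```python
class FakeDrive:
    """Toy stand-in for a drive; all behaviour here is invented for the demo."""

    def __init__(self, temps):
        self._temps = iter(temps)
        self.clock = 0.0
        self.log = []

    def read_temp(self):
        return next(self._temps)

    def read_time(self):
        self.clock += 1.0  # each poll advances the fake clock by one second
        return self.clock

    def drop_rpm(self):
        self.log.append("drop_rpm")

    def increase_rpm(self):
        self.log.append("increase_rpm")

    def service(self):
        self.log.append("service")


def time_based_policy(drive, t_threshold, delay, iterations):
    """Run the time-based policy for a fixed number of loop iterations."""
    for _ in range(iterations):
        if drive.read_temp() > t_threshold:
            drive.drop_rpm()
            start = drive.read_time()
            # Busy-wait for the pre-defined delay; no requests are serviced here.
            while drive.read_time() - start < delay:
                pass
            drive.increase_rpm()
        else:
            drive.service()


drive = FakeDrive(temps=[40.0, 47.0, 41.0])
time_based_policy(drive, t_threshold=45.0, delay=3.0, iterations=3)
print(drive.log)
```

Passing the drive in as an object keeps the policy logic separate from the device, so the same loop could be pointed at a simulator or a real controller.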