Title: Scheduling in Heterogeneous Grid Environments: The Effects of Data Migration
1. Scheduling in Heterogeneous Grid Environments: The Effects of Data Migration
- Leonid Oliker, Hongzhang Shan
- Future Technologies Group
- Lawrence Berkeley National Laboratory
- Warren Smith, Rupak Biswas
- NASA Advanced Supercomputing Division
- NASA Ames Research Center
2. Motivation
- Geographically distributed resources
- Difficult to schedule and manage efficiently:
  - Autonomy (local scheduler)
  - Heterogeneity
  - Lack of perfect global information
  - Conflicting requirements between users and system administrators
3. Current Status
- Grid initiatives: Global Grid Forum, NASA Information Power Grid, TeraGrid, Particle Physics Data Grid, E-Grid, LHC Challenge
- Grid scheduling services enable multi-site applications: Multi-Disciplinary Applications, Remote Visualization, Co-Scheduling, Distributed Data Mining, Parameter Studies
- Job migration:
  - Improve time-to-solution
  - Avoid dependency on a single resource provider
  - Optimize application mapping to the target architecture
- But what are the tradeoffs of data migration?
4. Our Contributions
- Interaction between grid scheduler and local scheduler
- Architectures: distributed, centralized, and ideal
- Real workloads
- Performance metrics
- Job migration overhead
- Superscheduler scalability
- Fault tolerance
- Multi-resource requirements
5. Distributed Architecture
[Figure: a communication infrastructure connects per-server grid schedulers through middleware (the grid environment); each grid scheduler sits above a local scheduler that manages a compute server with multiple PEs (the local environment). Jobs and resource info flow across the middleware.]
6. Interaction between Grid and Local Schedulers
- AWT: Approximate Wait Time
- CRU: Current Resource Utilization
- JR: Job Requirements
- If AWT < threshold, the job runs locally; otherwise it is handed to the grid scheduler (see the sketch below)
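A minimal sketch of this hand-off decision, in Python. The threshold value and the function name are illustrative assumptions; the slides leave the actual threshold unspecified.

    def placement(awt: float, threshold: float = 300.0) -> str:
        """Decide where a job should run from its Approximate Wait Time (AWT).
        The 300-second default is an assumed value, not from the slides."""
        return "local" if awt < threshold else "grid"

    print(placement(120.0))    # -> local: short wait, keep the job here
    print(placement(3600.0))   # -> grid: long wait, hand off for migration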
7. Sender-Initiated (S-I)
[Figure: S-I message flow. The host broadcasts Job_i's requirements to Partner 1 and Partner 2; each machine k replies with its ART_k and CRU_k; Job_i is dispatched to the selected machine, which returns Results_i.]
Select the machine with the smallest Approximate Response Time (ART); break ties by CRU. ART = Approximate Wait Time + Estimated Run Time.
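A minimal sketch of this selection step, assuming the tie goes to the less-utilized machine (the slide says only "break ties by CRU"); the Reply type and the sample values are illustrative.

    from typing import NamedTuple

    class Reply(NamedTuple):
        machine: str
        art: float   # approximate response time = approx wait time + est run time
        cru: float   # current resource utilization, 0.0-1.0

    def pick_target(replies):
        # Smallest ART wins; ties go to the machine with the lower CRU.
        return min(replies, key=lambda r: (r.art, r.cru)).machine

    replies = [Reply("host", 900.0, 0.8),
               Reply("partner1", 600.0, 0.7),
               Reply("partner2", 600.0, 0.4)]
    print(pick_target(replies))   # -> "partner2" (ART tie broken by CRU)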
8. Receiver-Initiated (R-I)
[Figure: R-I message flow. Idle partners send free signals to the host; the host then sends Job_i's requirements, collects ART_k/CRU_k replies, and dispatches Job_i.]
Querying begins only after a free signal is received.
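A small self-contained sketch of this flow, with illustrative names and stand-in (ART, CRU) estimates; the point is that the host only queries machines that have volunteered.

    pending_jobs = ["job1", "job2"]
    free_signals = ["partner1", "partner2"]   # machines that advertised idle capacity

    def estimate(machine):
        # Stand-in for the (ART, CRU) reply a real machine would return.
        return {"partner1": (600.0, 0.7), "partner2": (450.0, 0.5)}[machine]

    for job in list(pending_jobs):
        if not free_signals:
            break                               # no volunteers: jobs stay queued
        replies = [(m, *estimate(m)) for m in free_signals]
        target, art, cru = min(replies, key=lambda r: (r[1], r[2]))
        free_signals.remove(target)             # the chosen machine is now busy
        pending_jobs.remove(job)
        print(f"{job} -> {target} (ART={art})")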
9. Symmetrically-Initiated (Sy-I)
- First, work in R-I mode
- Change to S-I mode if no machines volunteer
- Switch back to R-I after the job is scheduled (see the sketch below)
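A hedged sketch of this mode switch, reusing the same ART/CRU comparison as S-I; all names are illustrative.

    from typing import NamedTuple

    class Reply(NamedTuple):
        machine: str
        art: float
        cru: float

    def schedule_symmetric(volunteers, query_all_partners):
        """R-I first: pick among volunteers if any exist; otherwise fall back
        to S-I and actively query every partner. The host reverts to R-I
        once the job has been placed."""
        replies = volunteers if volunteers else query_all_partners()
        return min(replies, key=lambda r: (r.art, r.cru)).machine

    # No volunteers here, so the host falls back to S-I-style querying.
    print(schedule_symmetric([], lambda: [Reply("p1", 700.0, 0.6),
                                          Reply("p2", 500.0, 0.3)]))  # -> "p2"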
10. Centralized Architecture
[Figure: a single grid scheduler above the middleware schedules jobs for all servers.]
Advantages: global view. Disadvantages: single point of failure, limited scalability.
11. Performance Metrics
- NAWT: Normalized Average Wait Time
- NART: Normalized Average Response Time
- FOJM: Fraction Of Jobs Migrated
- FDVM: Fraction of Data Volume Migrated
- DMOH: Data Migration Overhead (fraction of response time spent moving data)
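A hedged sketch of how these metrics could be computed; the slides do not show the exact formulas, so the normalization against the local-only scheme is an assumption.

    def metrics(grid_awt, grid_art, local_awt, local_art,
                jobs_migrated, jobs_total, mb_migrated, mb_total,
                data_move_time, total_response_time):
        return {
            "NAWT": grid_awt / local_awt,                  # normalized avg wait time
            "NART": grid_art / local_art,                  # normalized avg response time
            "FOJM": jobs_migrated / jobs_total,            # fraction of jobs migrated
            "FDVM": mb_migrated / mb_total,                # fraction of data volume migrated
            "DMOH": data_move_time / total_response_time,  # data migration overhead
        }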
12. Resource Configuration and Site Assignment
ServerID  Nodes  CPUs/Node  CPU Speed  Site (3 Sites)  Site (6 Sites)  Site (12 Sites)
S1 184 16 375 MHz 0 0 0
S2 305 4 332 MHz 1 1 1
S3 144 8 375 MHz 2 3 2
S4 256 4 600 MHz 1 0 3
S5 32 2 250 MHz 2 2 4
S6 128 4 400 MHz 2 5 5
S7 64 2 250 MHz 2 5 6
S8 144 8 375 MHz 1 2 7
S9 256 4 600 MHz 0 4 8
S10 32 2 250 MHz 0 1 9
S11 128 4 400 MHz 0 3 10
S12 64 2 250 MHz 1 4 11
- Each local site network has a peak bandwidth of 800 Mb/s (gigabit Ethernet LAN)
- The external network provides 40 Mb/s point-to-point (high-performance WAN)
- All data transfers are assumed to share the network equally (network contention is modeled; see the sketch after this list)
- Performance is assumed to be linearly related to CPU speed
- Users are assumed to have pre-compiled code for each of the heterogeneous platforms
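A minimal sketch of the shared-bandwidth contention model stated above; the constants mirror the slide, while the function name and structure are illustrative.

    LAN_MBPS = 800.0   # peak bandwidth within a site (gigabit Ethernet LAN)
    WAN_MBPS = 40.0    # point-to-point bandwidth between sites (WAN)

    def transfer_seconds(size_mb, same_site, concurrent_transfers):
        """Time to move size_mb megabytes when concurrent_transfers flows
        share the link equally, as the contention model assumes."""
        link_mbps = LAN_MBPS if same_site else WAN_MBPS
        share_mbps = link_mbps / max(1, concurrent_transfers)
        return (size_mb * 8.0) / share_mbps   # megabytes -> megabits

    # Example: a 300 MB input crossing the WAN alongside three other transfers.
    print(transfer_seconds(300.0, same_site=False, concurrent_transfers=4))  # 240.0 s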
13. Job Workloads
WorkloadID  Time Period (Start-End)  # of Jobs  Avg. Input Size (MB)
W1 03/2002-05/2002 59,623 312.7
W2 03/2002-05/2002 22,941 300.8
W3 03/2002-05/2002 16,295 305.0
W4 03/2002-05/2002 8,291 237.3
W5 03/2002-05/2002 10,543 28.9
W6 03/2002-05/2002 7,591 236.1
W7 03/2002-05/2002 7,251 86.5
W8 09/2002-11/2002 27,063 293.0
W9 09/2002-11/2002 12,666 328.3
W10 09/2002-11/2002 5,236 29.3
W11 09/2002-11/2002 11,804 226.5
W12 09/2002-11/2002 6,911 53.7
- Systems are located at Lawrence Berkeley National Laboratory, NASA Ames Research Center, Lawrence Livermore National Laboratory, and the San Diego Supercomputer Center
- Data volume information was not available; assume data volume is correlated with the volume of work
- B is the number of KB of data per unit of work (CPU runtime)
- Our best estimate is B = 1 KB for each CPU second of application execution, as illustrated below
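A worked example of this estimate (function and constant names are illustrative): with the baseline B = 1 KB per CPU second, a job's input volume scales with its work, and the 0.1B/10B/100B settings used on later slides simply scale this constant.

    B_KB_PER_CPU_SECOND = 1.0   # baseline B from the slide

    def estimated_input_mb(cpu_seconds, scale=1.0):
        """Estimated input data volume (MB) for a job, given its CPU time;
        scale = 0.1, 10, or 100 reproduces the 0.1B/10B/100B settings."""
        return cpu_seconds * B_KB_PER_CPU_SECOND * scale / 1024.0

    print(round(estimated_input_mb(320_000), 1))   # -> 312.5 MB, the order of
                                                   #    the averages in the table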
14. Scheduling Policy
(12 sites, baseline workload, baseline data volume B)
- Large potential gain from using a grid superscheduler
- Average wait time reduced 25X compared with the local scheme!
- Sender-Initiated performance is comparable to Centralized
- Inverse relationship between the migration metrics (FOJM, FDVM) and the timing metrics (NAWT, NART)
- Very small fraction of response time spent moving data (DMOH)
15. Data Migration Sensitivity
(Sender-Initiated, 12 sites)
- NAWT for 100B is almost 8X higher than for B; NART is 50% higher
- DMOH increases to 28% and 44% for 10B and 100B, respectively
- As B increases, the fraction of data volume migrated (FDVM) decreases due to the growing overhead
- FOJM is inconsistent because it counts the number of jobs migrated, NOT the data volume
16. Site Number Sensitivity
(Sender-Initiated)
- 0.1B shows no site sensitivity
- 10B shows a noticeable effect as the number of sites decreases from 12 to 3:
  - Times (NAWT, NART) decrease due to the increase in available network bandwidth
  - The fraction of data volume migrated (FDVM) increases
  - The fraction of response time spent moving data (DMOH) increases by 40%
17. Communication-Oblivious Scheduling
(Sender-Initiated)
- For 10B, if the data migration cost is not considered in the scheduling algorithm:
  - NART increases 14X and 40X for 12 sites and 3 sites, respectively
  - NAWT increases 28X and 43X for 12 sites and 3 sites, respectively
  - DMOH exceeds 96%! (only 3% for the baseline B)
  - 16% of all jobs are blocked from executing while waiting for data, compared with practically 0% for communication-aware scheduling
18. Increased Workload Sensitivity
(Sender-Initiated, 12 sites, baseline data volume B)
- Grid scheduling handles 40% more jobs than the non-grid local scheme
- No increase in times (NAWT, NART)
- Weighted utilization increases from 66% to 93%
- However, there is a fine line: when the number of jobs increases by 45%, NAWT grows 3.5X and NART grows 2.4X!
19. Conclusions
- Studied the impact of data migration, simulating:
  - Compute servers
  - Grouping of servers into sites
  - Inter-server networks
- Results showed huge benefits from grid scheduling:
  - S-I reduced average turnaround time by 60% compared with the local approach, even in the presence of input/output data migration
  - The algorithm can execute 40% more jobs in the grid environment while delivering the same turnaround times as the non-grid scenario
- For large data files, it is critical to consider migration overhead: a 43X increase in NART with communication-oblivious scheduling
20. Future Work
- Superscheduling scalability
- Resource discovery
- Fault tolerance
- Multi-resource requirements
- Architectural heterogeneity
- Practical deployment issues