Title: HC: Goals and Problems
1 An Introduction to Research Issues in Heterogeneous Parallel and Distributed Computing
- H. J. Siegel
- Professor of Electrical and Computer Engineering and Professor of Computer Science
- Colorado State University
- Fort Collins, Colorado, 80523 USA
- HJ_at_ColoState.edu
- Outline
- 1. Anecdote
- 2. Goal of Heterogeneous Computing
- 3. Mixed-Machine Heterogeneous Computing Environment
- 4. Model of Automatic Heterogeneous Computing
- 5. Example Resource Allocation Research
- 6. Open Problems
- 7. Alligators
2 Goal of Heterogeneous Computing
- a heterogeneous computing system has varied computational capabilities
- numerous applications of various types are to be executed
- each application consists of tasks with different computational requirements
- match each task to an appropriate component of the heterogeneous computing system
- goal: optimize some performance criterion
  - ex. minimize the execution time of a set of applications
3 Goal of Heterogeneous Computing - Example
- hypothetical example: an application with four tasks
- based on a total time of 100 units on a workstation
- (bar chart summarized) the four tasks take 25, 30, 10, and 35 time units on the workstation, and are suited respectively to a distributed memory multiprocessor, a distributed shared memory machine, a small shared memory processor, and a large cluster
- execution on a large cluster alone: task times of 20, 20, 0.3, and 8 units, about 2x faster than the workstation
- execution on a heterogeneous suite, with each task matched to its best machine, is faster still
- heterogeneous suite time includes any needed inter-machine communication (figure not drawn to scale)
- need workload to keep all machines reasonably busy
8 Heterogeneous Computing Systems
- mixed-machine heterogeneous computing (HC) system
  - network of different machines
- also known as
  - heterogeneous parallel computing
  - heterogeneous distributed computing
  - heterogeneous multicomputer
- presentation also applies to a cluster of different types (or different ages) of machines
- presentation also applies to some kinds of grid computing environments
- two machines are heterogeneous if their performance differs on any given task
- differences could result from differences in
  - CPU clock speed, instruction set
  - memory size, speed, organization
  - operating system type, version
  - ...
9 Using HC Systems
- each application to be executed is composed of one or more tasks
- ideally, each task is computationally homogeneous
- different tasks may have different computational needs
- there may be inter-task communication
- each task must be assigned (matched) to a machine
- execution of tasks and inter-task communication must be ordered (scheduled)
- mapping = matching + scheduling
  - also called resource allocation or resource management
- mapping attempts to optimize a performance metric
- in general, a known NP-complete problem
  - use heuristics to find near-optimal solutions
10 Mapping Tasks to Machines in an HC System
- map tasks to machines considering
  - quality of match (computational requirements to machine capabilities - exploit heterogeneity)
  - inter-machine communication overhead (code, initial and generated data)
  - concurrent use of multiple machines when appropriate
  - estimated machine and network availability
  - inter-task precedence constraints on scheduling
- typically, the user must
  - decompose the application into tasks
  - match tasks to machines
  - schedule the execution order of tasks
  - schedule inter-machine data transfers
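The slides note that mapping is NP-complete and heuristics are used in practice. As a hypothetical illustration (not a heuristic from this deck), the sketch below implements a simple greedy Minimum Completion Time mapper; the two-machine suite and ETC values are invented for the example.

```python
# Greedy Minimum Completion Time (MCT) mapping sketch.
# ETC[i][j] = estimated time to compute task i on machine j (hypothetical values).
ETC = [
    [25, 20],   # task 0
    [30, 20],   # task 1
    [10, 0.3],  # task 2
    [35, 8],    # task 3
]

def mct_map(etc, num_machines):
    """Assign each task to the machine that finishes it earliest,
    accounting for work already queued on each machine."""
    ready = [0.0] * num_machines       # time each machine becomes free
    mapping = []
    for task_times in etc:
        # completion time on machine j = machine ready time + task ETC
        best = min(range(num_machines), key=lambda j: ready[j] + task_times[j])
        ready[best] += task_times[best]
        mapping.append(best)
    return mapping, max(ready)         # mapping and resulting makespan

mapping, makespan = mct_map(ETC, 2)
```

Matching each task to the machine that completes it earliest (rather than the machine that runs it fastest in isolation) is what spreads work across a heterogeneous suite.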
11 Example of Use of a Mixed-Machine HC System
- simulation of turbulent convection at the Minnesota Supercomputer Center
  - calculation of velocity and temperature fields: CM-5
  - calculation of particle traces: Cray 2
  - calculation of particle statistics: CM-200
  - visualization: SGI VGX
13 User-Specified Versus Automatic Mappings
- some tools exist to help the user map an application
- long-term goal
  - automatic decomposition, matching, and scheduling
  - encourage, facilitate, and improve the performance of HC system use
14 Conceptual Model of Automatic HC
- (block diagram, built up over several slides; boxes and data flows summarized below)
- generation of parameters relevant to both applications and machines
  - inputs: applications, machines in suite
  - outputs: categories for computational needs, categories for machine capabilities
- task profiling for decomposing applications
  - output: characteristics of each application task
- analytical benchmarking for machines
  - output: machine characteristics, inter-machine communication
- static and dynamic resource allocation
  - inputs: the above, plus the performance measure and the initial status of machines and network
  - output: matching of tasks to machines and the execution schedule
- execution
  - a monitor feeds the status of machines, network, and workload back to the dynamic resource allocator
21 Static Mapping in Ad Hoc Grids
- ad hoc grid
  - heterogeneous computing system consisting of different mobile devices with wireless communication
  - group of individuals with mobile computing devices
- application consisting of numerous communicating subtasks
- extensive computation and communication among ad hoc grid components
- total battery energy available for each machine is limited
- often in difficult environments - examples
  - disaster management
  - wildfire fighting
  - defense operations
22 Simplified Wildfire Fighting Example
- (figure only)
23 Problem Statement
- map the S communicating subtasks of the application task to the M machines in the ad hoc grid
- constraints
  - all subtasks of the application must be executed
  - application must complete within Δ seconds
  - battery capacity constraint for each machine
  - wall clock time for the mapper itself to execute: 60 minutes
- goal
  - design resource allocation (mapping) heuristics
  - assign the communicating subtasks of the application task to the machines in the ad hoc grid
  - performance metric: minimize the average, over all machines, of the percentage of the energy consumed
24 Energy Model for Computation
- two classes of machines: fast machines and slow machines
- initial (maximum) battery capacity on machine j: B(j)
  - B(j) for fast machines: 580 energy units
  - B(j) for slow machines: 58 energy units
- estimated time to compute subtask i on machine j: ETC(i, j)
  - each machine has a unique ETC value for each subtask
  - ETC values differ among fast machines
  - ETC values differ among slow machines
- rate at which machine j consumes energy for subtask execution, per ETC time unit: E(j)
  - E(j) for fast machines: 0.1 energy units per second
  - E(j) for slow machines: 0.001 energy units per second
- energy consumed executing subtask i on machine j: ETC(i, j) × E(j)
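The computation-energy formula above can be sketched directly; the 50-second subtask below is a made-up example, while the E(j) rates are the ones given on the slide.

```python
# Energy consumed executing subtask i on machine j, per the model:
#   energy = ETC(i, j) * E(j)
E_FAST = 0.1    # energy units per second (fast machines, from the slide)
E_SLOW = 0.001  # energy units per second (slow machines, from the slide)

def compute_energy(etc_ij, e_j):
    """Battery energy consumed computing one subtask on one machine."""
    return etc_ij * e_j

# e.g., a subtask estimated at 50 seconds (hypothetical ETC value):
fast_cost = compute_energy(50, E_FAST)
slow_cost = compute_energy(50, E_SLOW)
```

Note the asymmetry the heuristics must balance: a fast machine finishes sooner but drains 100x more energy per second of computation.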
25 Energy Model for Communication
- communication bandwidth for machine j: BW(j)
  - BW(j) for fast machines: 8 megabits per second
  - BW(j) for slow machines: 4 megabits per second
- per-bit time to transfer a data item from machine j to machine k: CMT(j, k)
  - CMT(j, k) = 1 / min(BW(j), BW(k))
- rate at which machine j consumes energy for transmitting subtask output, per communication time unit: C(j)
  - C(j) for fast machines: 0.2 energy units per second
  - C(j) for slow machines: 0.002 energy units per second
- energy consumed to send a data item of size g from machine j to machine k: CMT(j, k) × g × C(j)
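The communication model above fits in a few lines; the bandwidths and C(j) rates are from the slide, while the one-megabit transfer is an invented example.

```python
# Communication energy model sketch (parameter values from the slides):
#   CMT(j, k) = 1 / min(BW(j), BW(k))        seconds per bit
#   energy to send g bits from j to k = CMT(j, k) * g * C(j)
BW = {"fast": 8e6, "slow": 4e6}    # bits per second
C  = {"fast": 0.2, "slow": 0.002}  # energy units per second of transmission

def comm_energy(g_bits, sender, receiver):
    """Energy the sender spends transmitting g_bits to the receiver;
    the slower endpoint's bandwidth limits the transfer."""
    cmt = 1.0 / min(BW[sender], BW[receiver])   # per-bit transfer time
    return cmt * g_bits * C[sender]

# e.g., a fast machine sending 1 megabit to a slow machine:
cost = comm_energy(1e6, "fast", "slow")
```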
26 Assumptions
- each machine can communicate while computing
- ignore energy consumed by a subtask to receive a data item
- ignore energy consumed by a machine when idle
27 Performance Metric
- total battery energy consumed by machine j after the entire task is completed: EC(j)
- recall
  - B(j) is the maximum battery capacity on machine j
  - M is the number of machines
- Bpavg = (1/M) × Σ over all j of (100 × EC(j) / B(j))
- goal
  - minimize Bpavg
  - complete application must execute within Δ seconds
  - obey the battery capacity constraint for each machine
  - wall clock time for the mapper: 60 minutes
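A minimal sketch of the metric, written as "average over machines of the percentage of battery consumed" (the EC values below are invented; the B capacities are the fast/slow values from the model):

```python
# Bpavg: average over all machines of the percentage of battery consumed.
def bpavg(EC, B):
    """EC[j] = energy consumed by machine j; B[j] = its battery capacity."""
    M = len(B)
    return sum(100.0 * EC[j] / B[j] for j in range(M)) / M

# e.g., two fast machines (capacity 580) and two slow ones (capacity 58),
# with hypothetical consumed-energy values:
value = bpavg(EC=[58, 116, 5.8, 11.6], B=[580, 580, 58, 58])
```

Because the metric is a percentage per machine, draining 11.6 units from a slow machine hurts as much as draining 116 units from a fast one.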
28 Simulation Setup
- each application task composed of S = 1,024 subtasks
- data dependencies among subtasks represented by a random directed acyclic graph (DAG)
  - 10 different DAGs generated for this study
- sizes of transferred data items sampled from a Gamma distribution
- two classes of eight machines
  - fast machines (machines 0 to 3)
  - slow machines (machines 4 to 7)
- 10 different ETC matrices generated
  - used the coefficient of variation (COV) method
- 100 different scenarios
  - each scenario is a combination of a DAG and an ETC matrix
- time constraint Δ for all subtasks in the application task to execute: 34,075 seconds
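One common COV-based way to generate such an ETC matrix is sketched below: sample a mean time per task, then sample each machine's time from a gamma distribution around that mean. The means, COV values, and matrix shape here are illustrative, not the study's parameters, and the study's exact procedure may differ.

```python
import random

# Sketch of a coefficient-of-variation (COV) ETC generator: a gamma
# distribution with mean m and COV v has shape 1/v^2 and scale m*v^2.
def etc_matrix(num_tasks, num_machines, mean_task_time, v_task, v_machine, seed=0):
    rng = random.Random(seed)
    def gamma(mean, cov):
        shape = 1.0 / cov ** 2   # gamma shape parameter from COV
        scale = mean * cov ** 2  # gamma scale so the expected value is `mean`
        return rng.gammavariate(shape, scale)
    etc = []
    for _ in range(num_tasks):
        task_mean = gamma(mean_task_time, v_task)   # task heterogeneity
        etc.append([gamma(task_mean, v_machine)     # machine heterogeneity
                    for _ in range(num_machines)])
    return etc

ETC = etc_matrix(num_tasks=4, num_machines=8,
                 mean_task_time=100.0, v_task=0.5, v_machine=0.5)
```

The two COV parameters independently control task heterogeneity (how much task means vary) and machine heterogeneity (how much machines disagree on one task).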
29 Heuristics Overview
- six static mapping schemes studied in this research
  - Levelized Weight Tuning, Bottoms Up, Min-Min, Genetic Algorithm, A*, and Simplified Lagrangian
- makespan: the overall execution time of the entire application task on the machines in the ad hoc grid
- for the final mapping of all six heuristics
  - energy constraint: B(j) not exceeded for any machine
  - time constraint: execution time (makespan) of the application does not exceed Δ
30 Levelized Weight Tuning (LWT): Assigning Levels
- all subtasks are assigned levels depending on precedence constraints
- the lowest level consists of subtasks with no predecessors
- the highest level consists of subtasks with no successors
- each of the rest of the subtasks is one level below the lowest producer of its global data items
- (example figure omitted)
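One standard way to compute such precedence levels is a longest-path levelization, sketched below (a generic sketch, not the authors' code; the slides' exact level convention may differ in orientation):

```python
# Assign precedence levels to subtasks of a DAG: subtasks with no
# predecessors get level 0, and every other subtask sits one level
# beyond the deepest producer of its input data.
def levelize(preds):
    """preds[i] = list of subtasks producing data consumed by subtask i."""
    levels = {}
    def level(i):
        if i not in levels:
            levels[i] = 0 if not preds[i] else 1 + max(level(p) for p in preds[i])
        return levels[i]
    for i in preds:
        level(i)
    return levels

# tiny example DAG: subtasks 0 and 1 feed 2; 2 feeds 3
lv = levelize({0: [], 1: [], 2: [0, 1], 3: [2]})
```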
31 Levelized Weight Tuning (LWT): Procedure
1. within each level, list subtasks in descending order based on the total size of their output data items
2. let α = (current level number + 1) / (total number of levels)
   - β = ratio of the partial makespan to Δ
   - F is a weighting factor, experimentally determined
3. for each level from lowest to highest, for each subtask Sj in the level
   - if β > (α × F), assign subtask Sj to the machine that increases the current makespan by the least amount
   - else map the subtask to the machine that increases the current Bpavg by the least amount
4. update time and energy availability across machines
5. repeat steps 2 to 4 until all subtasks are mapped
- F varied from 1 to 2 in steps of 0.1; for each complete mapping for each scenario, keep the best value of Bpavg
- the average value of F was 1.6
32 Bottoms Up (BU): Fitness Value
- based on the Min-Min (greedy) concept [Ibarra and Kim, 1977]
- subtasks in the DAG are mapped bottom up, from child to parent
  - from the highest level to the lowest level
- mappable subtasks: those whose successors have all been mapped
- NT(i, j): normalized time for subtask i on machine j; NE(i, j): normalized energy
- fitness value = (λ × NT(i, j)) + ((1 − λ) × NE(i, j))
- weighting factor λ varied from 0 to 1 in steps of 0.1 for each complete mapping for each scenario
  - a value of 0.5 gave the best value of Bpavg in all scenarios
33 Bottoms Up (BU): Procedure
1. list all mappable subtasks (successors already mapped)
2. for each mappable subtask in the list, while ignoring the other subtasks in the list, find the machine that gives the subtask its minimum fitness value
3. among all subtask/machine pairs found in step 2, find the pair that gives the minimum fitness value
   - ties broken arbitrarily
4. assign that subtask to its paired machine
5. remove that subtask from the mappable subtask list
6. update time and energy availability for that machine
7. repeat steps 1 to 6 until all subtasks are assigned machines
- subtasks are scheduled to execute in the reverse of the order in which they were assigned to machines
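The Min-Min-style core of one BU iteration can be sketched as below: compute the combined fitness values, let each mappable subtask pick its best machine, then pick the overall-best subtask/machine pair. The NT/NE matrices here are hypothetical, and normalization and availability updates are omitted.

```python
# fitness(i, j) = lam * NT(i, j) + (1 - lam) * NE(i, j)
def bu_fitness(NT, NE, lam=0.5):
    """Combine normalized time and normalized energy per subtask/machine."""
    return [[lam * NT[i][j] + (1 - lam) * NE[i][j]
             for j in range(len(NT[i]))] for i in range(len(NT))]

def min_min_step(fitness, mappable):
    """One Min-Min iteration: for each mappable subtask find its best
    machine, then select the subtask/machine pair with the overall
    minimum fitness."""
    best = {i: min(range(len(fitness[i])), key=fitness[i].__getitem__)
            for i in mappable}
    i = min(mappable, key=lambda s: fitness[s][best[s]])
    return i, best[i]

F = bu_fitness(NT=[[0.2, 0.8], [0.6, 0.4]],
               NE=[[0.9, 0.1], [0.4, 0.7]])
subtask, machine = min_min_step(F, mappable=[0, 1])
```

The full heuristic would assign the returned pair, update that machine's time and energy availability, refresh the mappable list, and iterate.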
34 Lower Bound
- lower bound (LB) on Bpavg
  - Bpavg for the optimal mapping ≥ this lower bound
- ignores data precedence constraints, inter-machine communications, the battery power constraint, and Δ
- for each subtask, in any random order
  - find the minimum percentage of energy consumed, over all machines, to execute the subtask
- sum the above values for all subtasks and average them across all machines to obtain LB
35 Simulation Results: Bpavg and Heuristic Execution Times (sec)
- (results chart omitted; results averaged over 100 scenarios)
36 Summary of Static Mapping in Ad Hoc Grids
- designed, evaluated, and compared six static heuristics for an ad hoc grid environment to minimize the average energy consumed across all machines
- Genetic Algorithm performed the best: only 14% greater than the (unattainable) lower bound
- Levelized Weight Tuning and Bottoms Up performed comparably and did second best
- Genetic Algorithm used Levelized Weight Tuning, Bottoms Up, and Min-Min as seeds
  - on average it performed 3 to 4% better than its seeds
- heuristic execution time was very large for Genetic Algorithm relative to Levelized Weight Tuning and Bottoms Up
- Levelized Weight Tuning and Bottoms Up are a good choice for this type of problem
38 Open Problems for HC Mappers
- conceptual model for automatic heterogeneous computing
  - generation of parameters relevant to both the application domain and machines
  - task profiling of applications
  - analytical benchmarking of machines
  - mapping = matching and scheduling
- handling uncertainty in estimated system parameter values
  - evaluating the impact of uncertainty on the performance of a mapper
  - incorporating uncertainty robustness into a mapper
- allowing redundant computations on different machines
- incorporating power consumption issues
- deriving a standard set of benchmark applications
- minimizing the dollar cost of a set of machines to meet performance constraints
39 Open Problems for HC System Software
- machine-independent languages with user-specified directives to
  - allow compilation into efficient code for any machine in the suite
  - aid in decomposing an application into homogeneous tasks
  - facilitate determination of task computational needs
  - interface with machine-dependent libraries
- operating system interfaces to support heterogeneous computing and inter-task communications
  - local (machines) and global (network)
- interactive applications
- debugging and performance tuning
- programming tools and environments
- visualization tools
40 Open Problems for HC Network Issues
- inter-machine data transport
  - hardware support needed
  - software protocols needed
- network topology
  - computing the minimum path between two machines
  - rerouting in case of faults or heavy loads
- modeling the sharing of links and bandwidth among tasks
41 Open Problems for HC QoS Requirements
- static and dynamic mappers for applications when
  - the system is overloaded
  - applications have
    - deadlines (soft and hard)
    - priority levels with relative weightings
    - multiple versions
      - different computational needs
      - different quality results
      - different worths to users (e.g., 2nd choice worth only 25%)
- security and other application-dependent QoS requirements
- performance measure is the sum of the priority weights of tasks that meet their deadlines, degraded if
  - a lesser version is used
  - a soft deadline is not met
  - partial QoS is received
42 Open Problems for HC Dynamic Issues
- machine and network loading and status information (dynamic mapping)
  - how to measure non-intrusively
  - how often to take new measurements
  - how to communicate and update information
  - how to incorporate it effectively into mappings
  - how to estimate task/transfer completion time
- methods for dynamic task migration at execution time (dynamic mapping)
  - how to checkpoint and move an executing task between different types of machines
  - how and when to use task migration for load balancing
  - how to use task migration for fault tolerance
43 Open Problems for HC Paradigms
- mapping different classes of applications
  - execute once (ex. compress an image)
  - execute continuously (ex. monitor inputs from sensors and control actuators)
  - subtask communication patterns not represented by a DAG (ex. co-routines)
- multi-tasking on each machine
  - how to estimate task computation time
  - how to model sharing of machine I/O
- machines that are not under the complete control of the mapper
- scalability
  - centralized versus distributed implementations of mappers
  - hierarchically structured mappers
44 Reference - Automatic HC and Open Problems
- "Heterogeneous Computing: Goals, Methods, and Open Problems"
- by T. D. Braun, H. J. Siegel, and A. A. Maciejewski
- 8th International Conference on High Performance Computing (HiPC 2001), Dec. 2001
- one of the keynote presentations
45 Reference - Static Mapping in an Ad Hoc Grid
- "Static Mapping of Subtasks in a Heterogeneous Ad Hoc Grid Environment"
- by S. Shivle, R. Castain, H. J. Siegel, A. A. Maciejewski, T. Banka, K. K. Chindam, S. Dusinger, P. K. Pichumani, P. M. Satyasekaran, W. W. Saylor, D. Sendek, J. C. Sousa, J. Sridharan, P. V. Sugavanam, and J. A. Velazco
- 13th Heterogeneous Computing Workshop (HCW 2004), Apr. 2004
- in the IEEE Computer Society proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004)
47 Concluding Remarks
- heterogeneous parallel and distributed computing is an important research area, including clusters and grids
- presented a brief introduction to heterogeneous computing
- showed a model of automatic heterogeneous computing
- gave an example of heterogeneous computing mapping research
- discussed some open problems in the field
- please see our papers listed as references for more information and references to other relevant research