Title: A New Approach for Task Level Computational Resource Bi-Partitioning
1A New Approach for Task Level Computational
Resource Bi-Partitioning
- Gang Wang, Wenrui Gong, Ryan Kastner
- Express Lab, Dept. of ECE, University of California, Santa Barbara
2Overview
- Resource Partitioning Problem
- Ant System (AS) Heuristic
- AS for Task Level Resource Partitioning
- Experiment Results
- Future Work
3Resource Partitioning Problem (1)
- Heterogeneous architectures are becoming increasingly popular
- Partitioning is a fundamental challenge
- Automatically assign an application onto different computation resources
- Optimize system performance under constraints
- Two-resource case: hardware/software co-design
4Resource Partitioning Problem (2)
- The problem is NP-hard
- Different heuristic methods have been developed:
- Simulated annealing
- Genetic Algorithms
- Tabu Search
- Expert System
- Kernighan/Lin
6Ant System Heuristic (1)
- First introduced for optimization problems by Dorigo et al. (1996)
- Inspired by ethological studies of ant behavior by Goss et al. (1989)
- A metaheuristic
- A multi-agent cooperative search method
- A new way of combining global and local heuristics
7Ant System Heuristic (2)
16Key Observations
- Autocatalytic effect
- Indirect communication (stigmergy)
- Ants deposit pheromones on the ground
- The amount of pheromone differentiates the quality of the paths
- Pheromone trails encode a long-term global memory about the search process
- When the ants reach a decision point, they are biased by the amount of pheromone (possibly probabilistically)
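The pheromone-biased decision in the last observation can be illustrated with a small sketch. It assumes a simple proportional rule, where the probability of taking a path is proportional to its pheromone level; this is one common choice and not necessarily the rule used in the talk. The names `choose_path` and `pheromone` are hypothetical.

```python
import random

def choose_path(pheromone, rng=random):
    """Pick a path index with probability proportional to its pheromone.

    `pheromone` is a list of non-negative trail strengths, one per path.
    The proportional rule is an assumption made for illustration.
    """
    total = sum(pheromone)
    if total <= 0:
        # No trail information yet: fall back to a uniform choice.
        return rng.randrange(len(pheromone))
    # random.choices draws one index using the trail strengths as weights.
    return rng.choices(range(len(pheromone)), weights=pheromone, k=1)[0]

if __name__ == "__main__":
    rng = random.Random(0)
    trails = [1.0, 3.0]  # the second path carries three times more pheromone
    picks = [choose_path(trails, rng) for _ in range(10_000)]
    print("share of ants on the stronger path:", picks.count(1) / len(picks))
```

Because stronger trails attract more ants, which in turn reinforce those trails, a rule of this kind produces the autocatalytic effect listed above.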
18AS Algorithm for HW/SW Co-Design
- Problem: For a given application, find the optimal resource partition under certain system constraints
- Task-level abstraction
- Each task can be mapped to either the GPP or the configurable logic
- Prior knowledge about the computational resources is assumed
19Modeling the Task/Resource Partitioning Problem
- Application is modeled as Task Graph (DAG)
- Sequential scheduling (not pipelined)
20Partitioning as Graph Bi-coloring
- Tasks 1, 2, 7 and 8 are assigned to the GPP
- Tasks 3, 4, and 6 are assigned to the configurable logic
- The inbound edges are colored accordingly
- The coloring of the virtual nodes $t_0$ and $t_n$ is a don't-care
- The coloring of edge $e_{8n}$ is a don't-care
21Partitioning as Graph Bi-coloring
- Each computing resource is assigned a color $c_k$
- Each edge $e_{ij}$ is associated with a set of global heuristics (pheromone trails) $\tau_{ij}(k)$ indicating the favorableness of coloring $t_j$ with $c_k$
- A coherent coloring is defined as one in which:
- Each task node in the DAG is colored
- All the inbound edges of a task node have the same color as the task node itself
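The two conditions for a coherent coloring can be checked mechanically. The following is a minimal sketch, assuming the graph is given as a list of edges and the colorings as dictionaries; the function name `is_coherent`, the data layout, and the small example graph are hypothetical. Virtual nodes are skipped because their coloring is a don't-care.

```python
def is_coherent(edges, node_color, edge_color, virtual=frozenset()):
    """Check the two coherence conditions for a coloring.

    edges:      iterable of (i, j) pairs, one per DAG edge
    node_color: dict mapping task node -> color index
    edge_color: dict mapping (i, j) -> color index
    virtual:    nodes whose color is a don't-care (e.g. source and sink)
    """
    edges = list(edges)
    nodes = {n for e in edges for n in e} - set(virtual)
    # Condition 1: every non-virtual task node is colored.
    if any(n not in node_color for n in nodes):
        return False
    # Condition 2: each inbound edge carries the color of its target node.
    for (i, j) in edges:
        if j in virtual:
            continue  # edges into a virtual node are don't-cares
        if edge_color.get((i, j)) != node_color[j]:
            return False
    return True

if __name__ == "__main__":
    # Hypothetical 3-task graph with virtual source t0 and sink tn.
    edges = [("t0", "t1"), ("t1", "t2"), ("t1", "t3"), ("t2", "tn"), ("t3", "tn")]
    node_color = {"t1": 1, "t2": 1, "t3": 2}
    edge_color = {("t0", "t1"): 1, ("t1", "t2"): 1, ("t1", "t3"): 2}
    print(is_coherent(edges, node_color, edge_color, virtual={"t0", "tn"}))
```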
22AS algorithm for resource partitioning (1)
- Step 1: Initially, assign each edge in the task graph a fixed pheromone $\tau_0$ for both colors $c_1$ and $c_2$, where $c_1$ corresponds to the GPP and $c_2$ to the configurable logic
- Step 2: Put $m$ ants on $t_0$
- Step 3: Each ant traverses the task graph to create a feasible bi-coloring solution $s_i$ for the task graph, where $i = 1, \dots, m$
- Step 4: Evaluate all $m$ solutions. The quality of a solution $s$ is measured by its overall execution time $\mathrm{time}(s)$. Among all solutions, find the best solution $s_{best}$, which provides the minimum execution time and satisfies the configurable logic area constraint
23AS algorithm for resource partitioning (2)
- Step 5: Update the pheromone for each color on the edges as follows:
$\tau_{ij}(k) \leftarrow (1 - \rho)\,\tau_{ij}(k) + \Delta\tau_{ij}(k)$ (1)
- where
- $0 < \rho < 1$ is the evaporation ratio, which helps the search escape from local minima
- $k = 1$ or $2$
- $\Delta\tau_{ij}(k) = Q / \mathrm{time}(s_{best})$ if $e_{ij}$ is colored with $c_k$ in $s_{best}$, and $0$ otherwise
- Step 6: If the ending condition is reached, stop and report the best solution found. Otherwise go to step 2.
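The pheromone update of Eq. (1) can be sketched as follows. This is a minimal illustration, assuming the trails are stored in a dictionary keyed by (edge, color); the function name `update_pheromone`, the data layout, and the default values of the evaporation ratio `rho` and the constant `Q` are assumptions, not values from the talk.

```python
def update_pheromone(tau, best_edge_color, best_time, rho=0.1, Q=1.0):
    """Apply Eq. (1) in place: tau <- (1 - rho) * tau + delta.

    tau:             dict mapping (edge, k) -> pheromone level
    best_edge_color: dict mapping edge -> color k used in the best solution
    best_time:       execution time of the best solution, must be positive
    rho, Q:          evaporation ratio and reward constant (values assumed)
    """
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must satisfy 0 < rho < 1")
    reward = Q / best_time
    for (edge, k) in tau:
        # Only the color chosen for this edge in the best solution is rewarded.
        delta = reward if best_edge_color.get(edge) == k else 0.0
        tau[(edge, k)] = (1.0 - rho) * tau[(edge, k)] + delta
    return tau

if __name__ == "__main__":
    edge = ("t0", "t1")
    tau = {(edge, 1): 1.0, (edge, 2): 1.0}
    update_pheromone(tau, {edge: 2}, best_time=4.0, rho=0.5, Q=2.0)
    print(tau)  # color 1 decays to 0.5, color 2 decays to 0.5 and gains 0.5
```

Evaporation lowers every trail, so colors that stop appearing in the best solution gradually lose influence, while the reward term reinforces the choices made in the best solution.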
24Step 3: How to construct an individual coloring
- Each ant traverses the graph in topologically sorted order
- This guarantees that each inbound edge to the current node has already been examined
- At each node, the ant will:
- Make guesses for the coloring of the successor nodes
- Make a decision on the coloring of the current node
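A topologically sorted visiting order can be computed in several ways. The sketch below uses Kahn's algorithm, which is an assumption for illustration since the talk does not say which method is used; the function name `topological_order` and the example graph are hypothetical.

```python
from collections import deque

def topological_order(successors):
    """Return the nodes of a DAG in a topologically sorted order.

    successors: dict mapping each node -> list of its successor nodes
    Uses Kahn's algorithm: repeatedly emit a node with no unvisited predecessor.
    """
    indegree = {n: 0 for n in successors}
    for outs in successors.values():
        for j in outs:
            indegree[j] = indegree.get(j, 0) + 1
    ready = deque(n for n, d in indegree.items() if d == 0)
    order = []
    while ready:
        i = ready.popleft()
        order.append(i)
        for j in successors.get(i, []):
            indegree[j] -= 1
            if indegree[j] == 0:
                ready.append(j)  # all inbound edges of j have been examined
    if len(order) != len(indegree):
        raise ValueError("graph has a cycle, so it is not a DAG")
    return order

if __name__ == "__main__":
    graph = {"t0": ["t1", "t2"], "t1": ["t3"], "t2": ["t3"], "t3": ["tn"], "tn": []}
    print(topological_order(graph))
```

A node enters the queue only after all of its predecessors have been visited, which is the property the traversal relies on.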
25Make guesses for the successor task nodes
- At task node $t_i$, the ant guesses the coloring of each successor node $t_j$
- $\tau_{ij}(k)$: global heuristic on coloring $t_j$ with $c_k$
- $\eta_j(k)$: local heuristic on coloring $t_j$ with $c_k$
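One way to combine the global (pheromone) and local heuristics into a guess is the weighted product commonly used in Ant System variants. The sketch below assumes that form, with exponents `alpha` and `beta` weighting the two heuristics; the exact rule, the function name `guess_probabilities`, and the parameter values are assumptions rather than details taken from the slides.

```python
def guess_probabilities(tau_ij, eta_j, alpha=1.0, beta=1.0):
    """Return the guessed probability of each color for successor t_j.

    tau_ij: list of pheromone levels on edge e_ij, one per color
    eta_j:  list of local heuristic values for t_j, one per color
    alpha, beta: weights of the global and local heuristics (assumed)
    """
    # Score of color k is tau^alpha * eta^beta (assumed combination rule).
    scores = [(t ** alpha) * (e ** beta) for t, e in zip(tau_ij, eta_j)]
    total = sum(scores)
    if total == 0:
        # No preference at all: guess uniformly.
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]

if __name__ == "__main__":
    # Equal pheromone, but the local heuristic favors the second color 3:1.
    print(guess_probabilities([1.0, 1.0], [1.0, 3.0]))  # [0.25, 0.75]
```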
26Make decision on the coloring of the current node
- Upon entering a new task node $t_i$, the ant decides on the coloring of $t_i$
- The decision is made probabilistically, based on the guesses made by all the immediate predecessors of $t_i$
- Inbound edges are colored correspondingly once this decision is made
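The decision step can be sketched as sampling from a distribution that merges the predecessors' guesses. The slides do not state how the guesses are merged; the sketch below assumes a plain average, and the function name `decide_color` is hypothetical. Color index 0 stands for $c_1$ and index 1 for $c_2$.

```python
import random

def decide_color(guesses, rng=random):
    """Sample a color for the current node from its predecessors' guesses.

    guesses: list of probability lists, one per immediate predecessor,
             each giving the guessed probability of every color
    Averaging the guesses is an assumption made for this sketch.
    """
    n_colors = len(guesses[0])
    combined = [sum(g[k] for g in guesses) / len(guesses) for k in range(n_colors)]
    # Draw one color index using the averaged guesses as weights.
    color = rng.choices(range(n_colors), weights=combined, k=1)[0]
    return color, combined

if __name__ == "__main__":
    rng = random.Random(0)
    color, combined = decide_color([[0.2, 0.8], [0.6, 0.4]], rng)
    print("combined guess:", combined, "chosen color index:", color)
```

Once a color is returned, every inbound edge of the node would be given that same color, which keeps the coloring coherent.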
27[Figure sequence, slides 27 to 34: example task graph with nodes $t_0$, $t_1$, ..., $t_8$, $t_n$]
35Find the best solution and update the pheromone trails based on solution quality
36Extensibility
- Easy to extend to multi-way partitioning
- Accommodates different performance/constraint pairs
- Accommodates different task-level cost models
38Experiment System (1)
- Target system contains:
- One GPP (PowerPC 405 RISC)
- One configurable logic device (Xilinx Virtex II with 1232 CLBs)
- Sequential scheduling
- Precedence levels must be respected
- Tasks without precedence constraints can run concurrently, provided the resource partitioning allows it
39Experiment System (2)
- Testing benchmark
- DAGs of different sizes are generated randomly with an average branching factor of 5
- Real functions (in C/C++) extracted from the MediaBench suite are mapped onto the task nodes
- Tasks are analyzed using the SUIF and Machine SUIF tools to obtain a detailed CDFG-level description
- Simplified communication interface between tasks
- Goal: Find the optimal resource partition that achieves the best worst-case execution time under the FPGA area constraint
40Evaluating AS algorithm
- Compare the AS results with:
- Brute-force search
- Offers a definitive measurement of solution quality
- Theoretical performance of random sampling
- Helps to filter out easy test cases
- Simulated annealing
- Widely used
- Allows much bigger problem sizes
41Experiment Settings
- Each DAG has 25 task nodes, giving over 33 million possible assignments
- 50 testing instances are generated originally
- After filtering out the easy cases using brute-force search, 25 difficult test cases remain
- The number of ants is set to 5, which equals the average branching factor of the task graph
- The AS algorithm is forced to stop after 100 iterations in each run
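The "over 33 million" figure follows from each of the 25 tasks having two possible resources. The short calculation below reproduces it and, assuming each ant constructs exactly one solution per iteration, also shows how many solutions a single AS run builds under these settings. The variable names are hypothetical.

```python
n_tasks = 25       # task nodes per DAG (from the slide)
n_resources = 2    # GPP or configurable logic
assignments = n_resources ** n_tasks
print(assignments)  # 33554432

ants, iterations = 5, 100          # settings from the slide
evaluated = ants * iterations      # assumes one solution per ant per iteration
print(evaluated, evaluated / assignments)
```

Under that assumption a run constructs 500 solutions, a very small fraction of the full search space.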
42Typical ant search run
43Result Quality Assessment (I)
- 91.7% of the AS results are within the top 3%
- 77% of the AS results are within the top 2%
- 63.5% of the AS results are within the top 0.1%
44Result Quality Assessment (II)
- The absolute performance of the majority of the results found by AS is within 10% of the optimal
45Result Quality Assessment (III)
- The ability to find one of the optimal partitions:
- 460 times out of 2,500 instances (18.4%)
- By contrast, a random sampling approach with the same computation time has a chance of only 8.5E-7
- For a significant portion (>20%) of the tested examples, AS discovers the optimal partition with probability >1/2
46Result Quality Assessment (IV) Multi-way SA
- Extended to the 3-way partitioning problem
- 33 difficult testing cases
- $3^{25}$ possible partitions
- SA-50 has a run time comparable to that of AS
- SA-500 and SA-1000 take 10 and 20 times as long as SA-50, respectively
47Contributions
- For the first time, introduced the AS heuristic to the HW/SW co-design problem
- Constructed a novel AS algorithm that achieves robust results, qualitatively close to the optimal, at minor computational cost on the testing benchmark
- Provided a definitive quality assessment by comparing the proposed algorithm with the theoretical random sampling results
- Experiments show that the proposed algorithm surpasses the widely used SA heuristic
48Future work
- Extend to the multi-way resource partitioning problem
- More comprehensive comparison with other heuristic methods (such as GA, Tabu)
- Hybrid approach (e.g. AS followed by SA)
- Apply to more realistic and complex system models, e.g. a more realistic communication model
- Extend AS from static partitioning to the dynamic partitioning problem (truly reconfigurable)
Thanks for your attention. Questions?