Title: Estimation and Exploration of Embedded System Designs
1Estimation and Exploration of Embedded System Designs
- Abhik Roychoudhury
- School of Computing
- National University of Singapore
2A Research Problem in ES
- Embedded Systems typically run an application for its entire lifetime.
- How to profile the application?
- Given the application profile, how to optimize the hardware/software for it?
- Answering the second question would typically involve a search among the various possible designs.
- The chosen design should satisfy power/performance constraints.
3In this talk
- Design Space Exploration
- Using novel search techniques
- Need estimation for Exploration
- Estimation of Embedded Code
- Performance estimation of sequential programs
- Cost of context switch between tasks
4Organization
- Design Space Exploration Problem
- State-of-the-art
- Some new search techniques
- Estimating a design point
- Concluding remarks
5System-on-chip (SoC)
- Computing component integrates many functional modules into one chip
- Processors
- Memory
- Peripherals
- Design a System-on-Chip by using pre-designed
cores (IP cores)
6Parameterized Components
- A component is typically parameterized.
- To design a cache memory, must fix
- Size
- Block size
- Associativity
- Replacement policy
- Writing policy for data caches
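A minimal sketch of what "parameterized" means for the cache example above, assuming hypothetical parameter ranges. The class name `CacheConfig` and the helper `enumerate_caches` are illustrative only; the number of sets follows from size / (block size × associativity).

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class CacheConfig:
    size_bytes: int      # total capacity
    block_bytes: int     # block (line) size
    associativity: int   # ways per set
    replacement: str     # e.g. "LRU" or "FIFO"
    write_policy: str    # e.g. "write-back" or "write-through"

    @property
    def num_sets(self) -> int:
        return self.size_bytes // (self.block_bytes * self.associativity)


def enumerate_caches(sizes, blocks, ways, repl, write):
    """Yield every instantiation whose geometry divides evenly into sets."""
    for s, b, w, r, p in product(sizes, blocks, ways, repl, write):
        if s % (b * w) == 0:
            yield CacheConfig(s, b, w, r, p)


# Hypothetical ranges, chosen only to show how quickly the space grows.
configs = list(enumerate_caches(
    sizes=[1024, 2048, 4096],
    blocks=[16, 32],
    ways=[1, 2, 4],
    repl=["LRU", "FIFO"],
    write=["write-back", "write-through"],
))
```

Even these five small ranges give 72 cache configurations, before any other component is considered.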
7Connecting components
- Constructing a design involves gluing such components.
- Involves instantiating the parameters of each component.
- Often thousands of parameters to consider.
- The instantiated design must satisfy the requirements on time, power, etc.
8Design Space Exploration
- Explore a very large, finite design space
- Each point in this space is an instantiation of the IP core parameters.
- Find the optimal design point
- Multi-criteria optimization
- Performance, Power, Area, Cost
- Often consider all Pareto-optimal points
- (t, p) is Pareto-optimal if there is no other point (t', p') with t > t' and p > p'
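A small sketch of Pareto filtering over (time, power) points, both to be minimized. It assumes the common dominance convention (no worse in every criterion, strictly better in at least one), which is slightly stricter than the condition stated on the slide. The sample points are made up.

```python
def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (time, power) estimates for five design points.
points = [(10, 5), (8, 7), (12, 4), (9, 9), (8, 6)]
front = pareto_front(points)
```

Here (8, 7) and (9, 9) drop out because (8, 6) and (8, 7) respectively are at least as good in both criteria.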
9Organization
- Design Space Exploration Problem
- State-of-the-art
- Some possibilities
- Estimating a design point
- Concluding remarks
10Naïve search
Let the parameters be x1, ..., xk
Fix an initial assignment to x1, ..., xk
For i = 1 to k do
    Xi_opt = optimal values of xi assuming other xj fixed
Endfor
Return points in X1_opt × ... × Xk_opt
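The loop above can be sketched in Python as follows. This assumes a scalar cost function (lower is better) and that "other xj fixed" means fixed at the initial assignment; the slide leaves both open. The names `naive_search`, `domains` and `toy_cost` are hypothetical.

```python
from itertools import product


def naive_search(domains, cost, initial):
    """For each parameter, collect its best values with all others held at
    their initial values. domains maps name -> list of candidate values."""
    best = {}
    for name, values in domains.items():
        scored = [(cost({**initial, name: v}), v) for v in values]
        lowest = min(c for c, _ in scored)
        best[name] = [v for c, v in scored if c == lowest]
    return best


def toy_cost(cfg):
    # Made-up separable cost; real costs come from estimation.
    return (cfg["x"] - 3) ** 2 + (cfg["y"] - 1) ** 2


domains = {"x": [1, 2, 3, 4], "y": [0, 1, 2]}
best = naive_search(domains, toy_cost, initial={"x": 1, "y": 0})

# Candidate design points: the cross product of the per-parameter optima.
candidates = [dict(zip(best, combo)) for combo in product(*best.values())]
```

Because the toy cost is separable, one pass finds the optimum; with interacting parameters it generally does not.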
11Problems
- How to fix the initial assignment?
- In what order to visit the parameters?
- Should we iterate the parameter assignment?
- How to avoid optimizing one parameter at a time?
12Sensitivity analysis
- Similar to a training mechanism.
- Given a set of reference applications, a full search is conducted to find the best point.
- For each architectural parameter p, find how much a change in p changes the minimum cost.
- Construct a total order of the parameters based on sensitivity.
13Sensitivity based search
NewCost = Cost from x1, ..., xk obtained by training.
Repeat
    OldCost = NewCost
    For i = 1 to k do    // in order of sensitivity
        find optimal value of xi assuming other xj fixed
    Endfor
    NewCost = Cost of above assignment
Until |NewCost - OldCost| / OldCost is small enough
Return NewCost and its assignment.
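A runnable sketch of the iteration above, assuming a scalar cost and a sensitivity order that is already known (here simply passed in as `order`). The function name, the tolerance and the toy cost are illustrative assumptions.

```python
def sensitivity_search(domains, cost, start, order, tol=1e-3, max_rounds=100):
    """Repeat one-parameter-at-a-time sweeps, visiting parameters in the
    given sensitivity order, until the relative cost change is small."""
    current = dict(start)
    new_cost = cost(current)
    for _ in range(max_rounds):
        old_cost = new_cost
        for name in order:
            # Best value of this parameter with all others held fixed.
            current[name] = min(domains[name],
                                key=lambda v: cost({**current, name: v}))
        new_cost = cost(current)
        if old_cost == 0 or abs(new_cost - old_cost) / abs(old_cost) <= tol:
            break
    return new_cost, current


def toy_cost(cfg):
    # Made-up cost in which x and y interact.
    return (cfg["x"] - cfg["y"]) ** 2 + (cfg["y"] - 2) ** 2 + 1


domains = {"x": [0, 1, 2, 3, 4], "y": [0, 1, 2, 3, 4]}
final_cost, assignment = sensitivity_search(
    domains, toy_cost, start={"x": 0, "y": 0}, order=["y", "x"])
```

On this toy cost the sweep settles at x = 1, y = 1 with cost 2, although x = 2, y = 2 has cost 1: a small instance of the one-dimension-at-a-time limitation.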
14Problems with sensitivity
- Need a set of training applications.
- Must guarantee that your application has the same characteristics as the training applications.
- Full design space search conducted for training applications.
- Multi-criteria cost function
- Vector or scalar?
- Need to compute NewCost - OldCost
- Still optimizing one dimension at a time.
15Dependence based analysis
- Encode dependence among parameters as a graph.
- Compute Strongly Connected Components (SCC) of the dependence graph.
- If parameters p1, ..., pk belong to the same SCC, perform exhaustive search to find optimal values of p1, ..., pk
- Combine the results from the various SCCs in topological order.
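The steps above can be sketched as follows, using Kosaraju's algorithm for the SCCs. The edge set in `graph` is hypothetical, and `optimize_by_scc` is a simplification: it keeps a single best assignment per SCC under a scalar cost rather than combining Pareto-optimal sets.

```python
from itertools import product


def strongly_connected_components(graph):
    """Kosaraju's algorithm. graph maps node -> list of successors.
    Returns the SCCs in topological order of the condensed graph."""
    order, seen = [], set()

    def visit(u):
        seen.add(u)
        for v in graph.get(u, []):
            if v not in seen:
                visit(v)
        order.append(u)  # post-order finish

    for u in graph:
        if u not in seen:
            visit(u)

    reverse = {u: [] for u in graph}
    for u, succs in graph.items():
        for v in succs:
            reverse.setdefault(v, []).append(u)

    components, assigned = [], set()

    def collect(u, comp):
        assigned.add(u)
        comp.append(u)
        for v in reverse.get(u, []):
            if v not in assigned:
                collect(v, comp)

    for u in reversed(order):
        if u not in assigned:
            comp = []
            collect(u, comp)
            components.append(sorted(comp))
    return components


def optimize_by_scc(graph, domains, cost, start):
    """Exhaustive search inside each SCC, visiting SCCs in topological order."""
    current = dict(start)
    for comp in strongly_connected_components(graph):
        best = min(product(*(domains[p] for p in comp)),
                   key=lambda combo: cost({**current, **dict(zip(comp, combo))}))
        current.update(zip(comp, best))
    return current


# Hypothetical dependence edges among six parameters.
graph = {"p1": ["p2"], "p2": ["p3"], "p3": ["p1", "p4"],
         "p4": ["p5"], "p5": ["p6"], "p6": ["p4"]}
components = strongly_connected_components(graph)
```

With these edges the exhaustive search covers two groups of three parameters instead of one group of six.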
16Dependence analysis
[Figure: dependence graph over parameters p1, ..., p6]
- Need dependence info.
- Can return all Pareto-optimal configurations in each SCC.
17Broad perspective
- Our problem is a multi-objective optimization problem.
- Multi-objective search spaces typically have many local minima.
- Heuristic-based search techniques converge to a local minimum.
- An exhaustive explicit global search is infeasible.
18Organization
- Design Space Exploration Problem
- State-of-the-art
- Some possibilities
- Estimating a design point
- Concluding remarks
19Possibility 1: Special cases
- Exploit problem structure.
- Convergence to the global optimum is guaranteed by exploiting the structure of the objective function and the space to be traversed.
- Example: Simplex method.
- Symbolic searches are being investigated for restricted versions of the Design Space Exploration problem
- Search over boolean function representations
20Possibility 2
- Employ global searches which return a good approximation of the global optimum.
- For example
- Each SCC in the parameter dependence graph is optimized by exhaustive search.
- What if most of the parameters are interdependent? (many parameters in a component, say a cache)
- Off-the-shelf global optimization? (Genetic Algorithms)
21Possibility 2
- Combine exhaustive and global searches
- For example
- Each SCC in the parameter dependence graph is optimized by exhaustive search.
- Employ global optimization techniques to optimize parameter configurations across SCCs.
- Dependence of parameters is only 0/1?
- Off-the-shelf global optimization? (Genetic Algorithms)
22Possibility 2: Stochastic searches
- Stochastic search techniques can serve as the global optimization technique.
- Do not assume any problem structure.
- Suitable for multi-objective searches with many local minima (Mutation operator in G.A.)
- Can program the next step and stopping rules
- Known to produce good approximations of the global optimum after running for a sufficient number of iterations.
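As one concrete instance of a stochastic search, here is a minimal simulated annealing sketch over a discrete parameter space. It assumes a scalar cost; the cooling schedule, step count and the toy cost are arbitrary illustrative choices, not values from the talk.

```python
import math
import random


def simulated_annealing(domains, cost, start, steps=2000, t0=10.0,
                        alpha=0.995, seed=0):
    rng = random.Random(seed)
    current, current_cost = dict(start), cost(start)
    best, best_cost = dict(current), current_cost
    names = list(domains)
    temp = t0
    for _ in range(steps):  # stopping rule: fixed iteration budget
        # Next-step rule: re-draw one randomly chosen parameter.
        name = rng.choice(names)
        candidate = {**current, name: rng.choice(domains[name])}
        cand_cost = cost(candidate)
        delta = cand_cost - current_cost
        # Always accept improvements; accept a worse point with
        # probability exp(-delta / temp) to escape local minima.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, cand_cost
            if current_cost < best_cost:
                best, best_cost = dict(current), current_cost
        temp *= alpha  # geometric cooling
    return best_cost, best


def toy_cost(cfg):
    # Made-up cost with interacting parameters.
    return (cfg["x"] - cfg["y"]) ** 2 + (cfg["y"] - 2) ** 2 + 1


domains = {"x": [0, 1, 2, 3, 4], "y": [0, 1, 2, 3, 4]}
sa_cost, sa_point = simulated_annealing(domains, toy_cost, {"x": 0, "y": 0})
```

Both the next-step rule and the stopping rule are ordinary code here, which is the sense in which such searches can be "programmed".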
23Possibility 3
- Given the application, perform more aggressive optimization (a larger search space)
- Examples
- Optimize the memory organization
- How much on-chip/off-chip SRAM/DRAM
- AND the memory allocation
- Allocating variables to on-chip/off-chip SRAM/DRAM
- Subject to upper bounds on Cost
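A brute-force sketch of this combined search, assuming a toy model: each on-chip SRAM option has a capacity and a price, each variable has a size and an access count, and an access costs 1 cycle on-chip and 10 cycles off-chip. All names and numbers are hypothetical.

```python
from itertools import combinations


def best_memory_design(sram_options, variables, budget, on_lat=1, off_lat=10):
    """sram_options: list of (capacity_bytes, price).
    variables: dict name -> (size_bytes, access_count).
    Returns (total_latency, price, capacity, on_chip_variables)."""
    names = list(variables)
    best = None
    for capacity, price in sram_options:      # architectural choice
        if price > budget:
            continue
        for r in range(len(names) + 1):       # allocation choice
            for on_chip in combinations(names, r):
                if sum(variables[v][0] for v in on_chip) > capacity:
                    continue
                latency = sum(cnt * (on_lat if v in on_chip else off_lat)
                              for v, (_, cnt) in variables.items())
                cand = (latency, price, capacity, set(on_chip))
                # Prefer lower latency, then lower price.
                if best is None or cand[:2] < best[:2]:
                    best = cand
    return best


sram_options = [(0, 0), (64, 2), (128, 5)]
variables = {"a": (64, 100), "b": (64, 50), "c": (32, 80)}
cheap = best_memory_design(sram_options, variables, budget=3)
rich = best_memory_design(sram_options, variables, budget=5)
```

Raising the budget changes both the chosen SRAM size and which variables are placed on-chip, which is why the two decisions are searched together.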
24Possibility 3: Combined Opt.
- Another example
- Optimize the functional unit configuration
- How many adders/multipliers?
- AND instruction scheduling
- Scheduling of instructions per clock cycle
- Subject to maximum number of adders/multipliers
25Research Issues in Exploration
- Need a combined search over
- Architectural parameters for choosing the system configuration (how much resource)
- Memory configuration
- Available Functional Units
- Application parameters for choosing resource allocation.
- Data layout
- Instruction Scheduling
26Research Issues in Exploration
- If the optimization problem is of a specific form, employ well-known non-exhaustive searches
- Linear Constraints (LP, ILP)
- Boolean Constraints: search over Binary Decision Diagram (BDD) representation (catching up)
- Otherwise, use stochastic optimization techniques, possibly with exhaustive searches
- Simulated Annealing
- Genetic Algorithms.
27Estimation for Exploration
- Need to estimate the criteria (power, performance) for each design point traversed.
- Involves estimating power/performance of an application, given the micro-architecture.
- Application may involve single/multiple tasks executed periodically.
- Simulation too expensive. Need analytical techniques as well for estimation.
- Let us see a few of them (more rigorous ones)
28Organization
- Design Space Exploration Problem
- State-of-the-art
- Some possibilities
- Estimating a design point
- Concluding remarks
29Timing Analysis: Single Task
- Given a program and a micro-architecture, estimate the worst case execution time (WCET) of the program on the micro-architecture for all possible inputs
- Micro-architecture includes
- Pipeline
- Instruction / Data cache
- Branch Prediction
- Even for a task with one input, tight timing estimation is important if it is executed periodically with time bounds.
30Why WCET ?
- Schedulability analysis of hard real-time systems
- Example: Input to Rate Monotonic Analysis (RMA)
- Execution time should be a worst-case bound so that the tasks are guaranteed to be schedulable irrespective of inputs
- Bound should be tight to make the tasks schedulable and to avoid idle processor cycles
- Should use WCET for design space exploration of safety-critical embedded systems
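To illustrate how WCET bounds feed schedulability analysis, here is a sketch of the classic Liu and Layland utilization test for rate-monotonic scheduling: n periodic tasks are schedulable if total utilization is at most n(2^(1/n) - 1). The test is sufficient but not necessary, and the task sets below are made up.

```python
def rm_utilization_test(tasks):
    """tasks: non-empty list of (wcet, period) pairs.
    Returns (utilization, bound, passes_test)."""
    n = len(tasks)
    utilization = sum(c / t for c, t in tasks)
    bound = n * (2 ** (1 / n) - 1)
    return utilization, bound, utilization <= bound


# Hypothetical task set with tight WCET bounds.
tight = rm_utilization_test([(1, 4), (2, 8), (1, 10)])
# Same periods, but pessimistic (doubled) WCET bounds for two tasks.
loose = rm_utilization_test([(2, 4), (4, 8), (1, 10)])
```

The same task set passes with tight bounds and fails with loose ones, which is why the tightness of the WCET estimate matters.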
31Estimating WCET: Naïve Method
- Given a program, find the worst-case input
- Run the program or simulate the execution of the
program with worst-case input on the
micro-architecture and measure the cycle count
32Finding Worst Case Input
- For programs with input-data independent execution time, such as matrix multiplication, computing WCET is trivial (any input will do)
- For others, we can find the worst-case input in the absence of micro-architectural features (i.e., unit execution time per instruction)
- e.g., the worst-case input for insertion sort is the reversed list
- Even this is impossible for non-trivial programs
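The insertion sort claim can be checked by brute force on a small input size. This sketch counts element shifts as a stand-in for unit-cost execution time, which is an assumption of the example, and tries every permutation of five elements.

```python
from itertools import permutations


def insertion_sort_steps(items):
    """Sort a copy of items and return the number of element shifts."""
    a = list(items)
    steps = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]   # shift one element to the right
            j -= 1
            steps += 1
        a[j + 1] = key
    return steps


n = 5
worst_input = max(permutations(range(1, n + 1)), key=insertion_sort_steps)
worst_steps = insertion_sort_steps(worst_input)
```

Exhaustive enumeration like this grows as n!, so it is only feasible for tiny inputs.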
33Finding Worst Case Input
- Worst-case input changes as micro-architectural features are taken into consideration, i.e., with variable execution time for instructions
- e.g., for an implementation of insertion sort with a particular speculation scheme, the worst-case input is <1, 100, 99, ..., 3, 2>
- Finding the worst-case input in the presence of micro-architectural features is impossible
- Solution: static analysis techniques
34WCET Analysis
- Two steps
- Program path analysis
- Micro-architectural modeling
- Dependency between the two steps
- Program path analysis requires the results of micro-architectural analysis
- Micro-architectural analysis performs better with the knowledge about infeasible paths
- Integer Linear Program (ILP) formulation can integrate the two steps
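One common shape for such an ILP is implicit path enumeration, sketched below; it is not necessarily the exact formulation used in this talk. The symbols are assumptions of this sketch: x_B is the execution count of basic block B, c_B its cost bound from micro-architectural modeling, d_e the traversal count of control-flow edge e, and n_ℓ a loop bound. The entry block is given a virtual incoming edge and the exit block a virtual outgoing edge.

```latex
\begin{aligned}
\text{maximize}\quad & \sum_{B} c_B \, x_B \\
\text{subject to}\quad
  & x_B = \sum_{e \in \mathrm{in}(B)} d_e = \sum_{e \in \mathrm{out}(B)} d_e
    && \text{for every basic block } B \\
  & d_{\mathrm{entry}} = 1 \\
  & \sum_{e \in \mathrm{back}(\ell)} d_e \;\le\; n_\ell \sum_{e \in \mathrm{enter}(\ell)} d_e
    && \text{for every loop } \ell \\
  & x_B,\ d_e \in \mathbb{Z}_{\ge 0}
\end{aligned}
```

Path analysis contributes the flow and loop-bound constraints, micro-architectural modeling contributes the costs c_B, and knowledge of infeasible paths can be added as further linear constraints on the x_B.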
35Timing Analysis: Many tasks
- Multiple tasks may interfere with each other via shared micro-architecture.
- If they share a cache
- Lower priority task pre-empted.
- Delay introduced in clearing cache lines used by higher priority task, once lower priority task resumes.
- Need to consider this Cache Related Pre-emption Delay for schedulability / exploration.
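One common way to account for this delay in schedulability analysis is to add a per-pre-emption penalty to the standard fixed-point response-time recurrence, R_i = C_i + Σ_j ⌈R_i / T_j⌉ (C_j + γ_j) over higher-priority tasks j. The sketch below assumes fixed priorities, deadlines equal to periods, and a constant hypothetical penalty `crpd` per higher-priority task; the task parameters are made up.

```python
import math


def response_time(tasks, i, max_iter=1000):
    """tasks: list of dicts with C (WCET), T (period) and crpd, sorted by
    decreasing priority. Returns the worst-case response time of task i,
    or None if it exceeds the period (taken as the deadline)."""
    c_i, t_i = tasks[i]["C"], tasks[i]["T"]
    r = c_i
    for _ in range(max_iter):
        interference = sum(math.ceil(r / hp["T"]) * (hp["C"] + hp["crpd"])
                           for hp in tasks[:i])
        nxt = c_i + interference
        if nxt == r:
            return r      # fixed point reached
        if nxt > t_i:
            return None   # deadline missed
        r = nxt
    return None


with_crpd = [{"C": 1, "T": 5, "crpd": 1},
             {"C": 2, "T": 10, "crpd": 1},
             {"C": 3, "T": 30, "crpd": 0}]
no_crpd = [dict(t, crpd=0) for t in with_crpd]

r_with = response_time(with_crpd, 2)
r_without = response_time(no_crpd, 2)
```

For the lowest-priority task in this example the response time grows from 7 to 10 once the pre-emption penalty is charged, so ignoring it would give an optimistic estimate.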
36Timing Analysis: Many Tasks
- Multiple tasks may be executed on different processors.
- Need to extend timing model
- Computation + communication costs.
- Estimating communication costs may be involved
- Serialization of communication (via a bus) leads to additional delays.
- These issues need to be studied for Power Estimation as well.
37Research Issues in Estimation
- Combined modeling of micro-architectural features
- Instruction Cache
- Speculation
- Precise Modeling of Context switch delays
- Cache related Pre-emption delay
- Designing more predictable architectures
- Cache Locking
- Making code optimizations WCET aware
- Code Layout
38Organization
- Design Space Exploration Problem
- State-of-the-art
- Some possibilities
- Estimating a design point
- Concluding remarks
39Remarks
- Design space exploration is an essential step in ES design.
- Lots of work in design point estimation; more needed.
- Little work in search techniques
- Using off-the-shelf searches in other areas
- Even programming searches
40Remark
- Need to proceed with caution
- Combining the architectural/application parameters for more aggressive design point optimization is more suitable for CS people.
- Can use off-the-shelf symbolic/stochastic searches at first.
- Later work can involve developing stochastic search techniques for arbitrary search problems