Hardware Estimation in Embedded Systems - PowerPoint PPT Presentation

Provided by: pun41

1
Hardware Estimation in Embedded Systems
2
Motivation
  • The computation-intensive part of an application is
    mapped to hardware.
  • The cost of embedding depends on various metrics, such as
    functional elements used, memory elements,
    interconnects and execution time.
  • Keeping these metrics in mind, various
    alternatives in the design space have to be explored.

3
Contd
  • Finally, the alternative best suited to the end user can be
    synthesised.
  • To make this partitioning decision, a cost estimate
    of the various alternatives is required.

4
Objective
  • Given a C application, estimate the cost in terms
    of chip area (functional units etc.).
  • Characteristics required of the estimator:
      • Speed
      • Fidelity
      • Accuracy

5
Performance Metrics
  • Data storage: registers (files or distributed),
    functional units (pipelined or non-pipelined),
    size and level of cache.
  • Optimum clock: minimize the idle time of the FUs, which
    in turn minimizes the execution time.

6
Contd
  • Area: datapath, control units and interconnects.
  • Communication rate and number of pins: depend on the
    frequency of external accesses and the data width.

7
SUIF Compiler
  • SUIF is used because its passes are
    implemented as separate programs.
  • Each pass performs an analysis or transformation and
    then writes its output back to a file.
  • This makes it easy to reorder passes or insert new ones.

8
Information Extractor Module
  • The Information Extractor (IE) traverses the SUIF tree and generates a
    Control Data Flow Graph (CDFG) and a list of all
    the operations.
  • When the IE encounters a condition (like if,
    while) it inserts a new node in the CDFG and
    assigns a suitable operation type depending upon
    the instruction.

9
Profiler
  • It is used to determine execution frequencies of
    different Basic Blocks to obtain an estimate of
    total execution time.
  • These frequencies are data dependent.

10
Clock Estimation
  • The aim is to minimize the slack, i.e. the idle time of the
    FUs.
  • To increase accuracy, the number of occurrences of
    each operation type is taken into account.
  • First, clk_min and clk_max are determined from
    the delays of the FUs that will implement the
    operations in the behavior.

11
Contd..
  • Then the frequency of occurrence of the various
    operations along the critical path is obtained.
  • $\mathrm{Avg\_slack}(clk) = \frac{\sum_i occur(t_i) \cdot slack(clk, t_i)}{\sum_i occur(t_i)}$
  • Because the slack is weighted by occurrence, an operator with a
    large delay but a small occurrence count cannot influence the
    average arbitrarily.
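
The occurrence-weighted average on this slide can be sketched in a few lines. The slides do not define slack(clk, ti); this sketch assumes it is the time an FU sits idle when its delay is rounded up to a whole number of clock cycles. The delays, occurrence counts and candidate clock periods are made-up illustration values, and `best_clock` is a hypothetical helper name.

```python
import math

def slack(clk, delay):
    """Idle time when an operation's delay is rounded up to whole clock
    cycles (assumed definition; the slides do not spell it out)."""
    return math.ceil(delay / clk) * clk - delay

def avg_slack(clk, delays, occur):
    """Occurrence-weighted average slack over all operation types."""
    total = sum(occur.values())
    return sum(occur[t] * slack(clk, delays[t]) for t in occur) / total

def best_clock(candidates, delays, occur):
    """Pick the candidate clock period with the smallest average slack."""
    return min(candidates, key=lambda clk: avg_slack(clk, delays, occur))

# Hypothetical FU delays (ns) and occurrence counts per operation type.
delays = {"add": 20, "mul": 50, "div": 90}
occur = {"add": 10, "mul": 4, "div": 1}

# Candidate periods between an assumed clk_min and clk_max.
chosen = best_clock([20, 30, 50, 90], delays, occur)
```

In this data the divider has the largest delay but occurs only once, so its slack carries little weight in the average.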

12
Execution Time Estimation
  • This estimate helps to eliminate totally
    redundant designs, hence reducing the design
    space of an embedded system.
  • $\mathrm{Exectime}(B) = \sum_i exectime(b_i) \cdot freq(b_i)$,
    where $exectime(b_i) = csteps(b_i) \cdot clk$
  • $\mathrm{Csteps}(n_j, t_i) = \max\left( \frac{occur(t_i)}{num(t_i)} \cdot clk,\ \frac{num(t_i) \cdot p(t_i) \cdot portus(t_i)}{rfp + memp} \right)$
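
As a small worked example of the execution-time formula, the sketch below sums csteps(bi) · clk · freq(bi) over the basic blocks. The block names, control-step counts, profiled frequencies and clock period are invented for illustration, and `exec_time` is a hypothetical function name.

```python
def exec_time(blocks, clk):
    """Total execution time: sum over basic blocks of
    csteps(b) * clk * freq(b)."""
    return sum(b["csteps"] * clk * b["freq"] for b in blocks)

# Hypothetical basic blocks; freq would come from the profiler.
blocks = [
    {"name": "b0", "csteps": 4, "freq": 1},
    {"name": "b1", "csteps": 6, "freq": 100},  # loop body
    {"name": "b2", "csteps": 2, "freq": 1},
]

total_ns = exec_time(blocks, clk=20)  # clock period of 20 ns assumed
```

The loop body dominates the total here, which is why the data-dependent frequencies from the profiler matter more than the per-block step counts.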

13
Contd..
  • The above algorithm does not ensure maximum use
    of the FUs or of the available read and write ports at a
    given instant.
  • Therefore a new algorithm, the Resource Use Method, was developed
    and implemented.
  • As in list scheduling, a ready list, which is a
    set of data-independent operations, is obtained.

14
Contd
  • While some operations remain unscheduled,
    schedule_operation is called if read_ports are
    available and an FU is available.
  • Priority function: $priority(t_i) = delay(t_i) \cdot traffic(t_i) / num(t_i)$
  • Further it is observed that the choice of
    priority is application dependent.
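
The loop described on this slide resembles generic resource-constrained list scheduling, which can be sketched as below. This is not the Resource Use Method itself, whose details the slides do not give. It assumes unit-delay operations, an acyclic dependence graph, a fixed number of read ports per control step, and a caller-supplied priority per operation type. All names and example values are hypothetical.

```python
def list_schedule(ops, deps, num_fu, read_ports, priority):
    """Resource-constrained list scheduling for unit-delay operations.

    ops:        {op_name: (op_type, reads)}, reads = read ports the op needs
    deps:       {op_name: set of predecessor op names} (must be acyclic)
    num_fu:     {op_type: number of FUs of that type}
    read_ports: read ports available in each control step
    priority:   {op_type: number}, higher values are scheduled first
    Returns {op_name: control step}, steps counted from 1.
    """
    for name, (op_type, reads) in ops.items():
        if num_fu.get(op_type, 0) < 1 or reads > read_ports:
            raise ValueError(f"{name} can never be scheduled")

    step_of = {}
    step = 0
    while len(step_of) < len(ops):
        step += 1
        # Ready list: unscheduled ops whose predecessors ran in earlier steps.
        ready = [o for o in ops if o not in step_of
                 and all(step_of.get(p, step) < step for p in deps.get(o, ()))]
        ready.sort(key=lambda o: priority[ops[o][0]], reverse=True)
        free_fu = dict(num_fu)
        free_ports = read_ports
        for o in ready:
            op_type, reads = ops[o]
            # Schedule only if both an FU and enough read ports are free.
            if free_fu[op_type] > 0 and reads <= free_ports:
                step_of[o] = step
                free_fu[op_type] -= 1
                free_ports -= reads
    return step_of

# Hypothetical example: two multiplies feed an add; a second add is independent.
ops = {"m1": ("mul", 2), "m2": ("mul", 2), "a1": ("add", 2), "a3": ("add", 2)}
deps = {"a1": {"m1", "m2"}}
prio = {"mul": 2, "add": 1}

wide = list_schedule(ops, deps, {"mul": 1, "add": 1}, read_ports=4, priority=prio)
narrow = list_schedule(ops, deps, {"mul": 1, "add": 1}, read_ports=2, priority=prio)
```

With four read ports the independent add shares the first step with a multiply. With two read ports it is pushed to a later step, which shows how port availability, not only FU count, shapes the schedule length.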

15
Lower Bound Estimate
  • By implementing scheduling algorithms.
  • For each resource type t, operations of that type
    are grouped into three non-overlapping intervals
    and then a lower bound is computed on the sum of
    lengths of those intervals.
  • The final lower bound is the maximum over all
    resource types and all possible groupings of
    operations of that type.

16
Contd..
  • MSAT(v) is defined as the minimum number of steps
    that any schedule of DFG G is going to take after
    the completion of operation v, assuming unlimited
    resources.
  • ASAP(v) is the earliest time step at which v can be
    scheduled to start execution.
  • The critical path length is $\max_v \left( ASAP(v) + d_v - 1 \right)$
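
The three quantities on this slide can be computed with two short recursions over the DFG. The sketch assumes that dv is the delay of operation v in time steps, that time steps are counted from 1, and that the graph is acyclic. The function name and the example graph are hypothetical.

```python
from functools import lru_cache

def asap_msat(delay, succs):
    """delay: {op: delay in steps}; succs: {op: list of successor ops}.
    Returns (ASAP values, MSAT values, critical path length)."""
    preds = {v: [] for v in delay}
    for u, vs in succs.items():
        for v in vs:
            preds[v].append(u)

    @lru_cache(maxsize=None)
    def asap(v):
        # Earliest start step: after every predecessor has finished.
        return max((asap(u) + delay[u] for u in preds[v]), default=1)

    @lru_cache(maxsize=None)
    def msat(v):
        # Minimum number of steps still needed after v completes.
        return max((delay[s] + msat(s) for s in succs.get(v, ())), default=0)

    a = {v: asap(v) for v in delay}
    m = {v: msat(v) for v in delay}
    critical = max(a[v] + delay[v] - 1 for v in delay)
    return a, m, critical

# Hypothetical DFG: a and b feed c, and c feeds d; b takes two steps.
delay = {"a": 1, "b": 2, "c": 1, "d": 1}
succs = {"a": ["c"], "b": ["c"], "c": ["d"]}
asap_of, msat_of, c_len = asap_msat(delay, succs)
```

For every operation, ASAP(v) + dv - 1 + MSAT(v) is at most the critical path length, with equality for operations that lie on a critical path.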

17
Contd
  • $P_t$: number of operations of type t in the DFG.
  • SA(i,t): number of operations of type t with ASAP value
    less than or equal to i.
  • Then there are at least $P_t - SA(i,t)$ type t
    operations that cannot be scheduled in the first
    i time steps.
  • SM(j,t): number of operations of type t with MSAT value
    less than or equal to j.

18
Contd..
  • Then there are at least $P_t - SM(j,t)$ type t
    operations that cannot be scheduled in the last j
    time steps.
  • The three intervals considered are the first i
    steps (I1), the last j steps (I3) and
    the interval between the two (I2).
  • The length of I2 depends on the minimum number of
    type t operations that must fall in it and the number of
    resources available for type t.

19
Contd..
  • A lower bound on the completion time of any schedule
    is given by $\max_{t,\ i+j<c} \left( h(i,j,t) + i + j \right)$,
    where h(i,j,t) is the lower bound on the length of I2.
  • If n is the number of operations in the DFG and c is
    the critical path length, then the time complexity of
    the above algorithm is $O(nc^2)$.
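
A direct, unoptimized sketch of this bound is shown below. It does not attempt the O(nc2) running time quoted above. Several points are assumptions of the sketch, not statements from the slides: operations take one time step each, and h(i,j,t) is taken as the number of type t operations forced into I2 divided by the number of type t FUs, rounded up. An operation also counts toward SM(j,t) only when MSAT(v) < j. That is slightly stricter than the slide's wording and reflects that, with MSAT counted as steps after completion, only such operations fit in the last j steps. The function name and example data are hypothetical.

```python
import math

def lower_bound(op_type, asap, msat, num_fu, c):
    """Lower bound on schedule length (unit-delay operations assumed).

    op_type: {op: type}; asap, msat: {op: value};
    num_fu: {type: FU count, at least 1}; c: critical path length.
    """
    best = 0
    for t, r in num_fu.items():
        ops = [v for v in op_type if op_type[v] == t]
        for i in range(c):
            for j in range(c - i):  # keeps i + j < c
                sa = sum(1 for v in ops if asap[v] <= i)  # can run in I1
                sm = sum(1 for v in ops if msat[v] < j)   # can run in I3
                forced = max(0, len(ops) - sa - sm)       # must run in I2
                h = math.ceil(forced / r)                 # assumed form of h
                best = max(best, h + i + j)
    return best

# Hypothetical DFG: four independent chains x_k -> y_k of unit-delay adds.
op_type = {f"{p}{k}": "add" for p in "xy" for k in range(4)}
asap = {v: (1 if v[0] == "x" else 2) for v in op_type}
msat = {v: (1 if v[0] == "x" else 0) for v in op_type}

one_adder = lower_bound(op_type, asap, msat, {"add": 1}, c=2)
two_adders = lower_bound(op_type, asap, msat, {"add": 2}, c=2)
```

With a single adder the eight additions must be serialized, so the bound is well above the critical path length of 2. Adding a second adder halves it.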

20
Area
  • A sum of functional unit areas, weighted by the
    number of functional units of each type, gives an estimate of
    the datapath area.
  • The number of states is given by the sum of the control steps
    in all the basic blocks.
  • One-hot encoding was used.
  • An FPGA was used as the target technology for
    implementing the schedule.
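
The two estimates on this slide reduce to simple sums, sketched below. The per-FU areas, FU counts and control-step counts are invented illustration values, and the function names are hypothetical. With one-hot encoding the state register uses one flip-flop per state, so the state count also gives the flip-flop count for the controller.

```python
def datapath_area(fu_area, fu_count):
    """Sum over FU types of (area of one FU) * (number of FUs of that type)."""
    return sum(fu_area[t] * fu_count[t] for t in fu_count)

def num_states(csteps_per_block):
    """Controller states: total control steps over all basic blocks."""
    return sum(csteps_per_block)

# Hypothetical FU areas (in CLBs) and allocation.
fu_area = {"add": 8, "mul": 64}
fu_count = {"add": 2, "mul": 1}

area = datapath_area(fu_area, fu_count)
states = num_states([4, 6, 2])
state_flip_flops = states  # one-hot: one flip-flop per state
```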

21
Contd
  • The number of LUTs required to realize the logic for a
    state depends on the number of ways that state
    can be reached.
  • $CLB(LUT) = \frac{\text{total no. of LUTs}}{\text{no. of LUTs per CLB}}$
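
The CLB count follows directly from the ratio above. The sketch rounds up to whole CLBs, which is an assumption; the slide gives only the plain ratio. The LUT totals are made-up numbers and `clb_count` is a hypothetical name.

```python
def clb_count(total_luts, luts_per_clb):
    """CLBs needed to hold the given LUTs, rounded up to whole CLBs."""
    return -(-total_luts // luts_per_clb)  # ceiling division on integers

# Hypothetical controller needing 37 LUTs on a device with 2 LUTs per CLB.
clbs = clb_count(37, 2)
```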

22
Storage Area
  • Mutually exclusive operations of the same type
    can always share an FU, even if they are
    scheduled in the same time step.
  • Based on the CDFG, an AND-OR representation is
    obtained, which makes it convenient to estimate
    the number of registers in a bottom-up manner.

23
Contd..
  • An AND node of the AND-OR tree represents one
    branch of a Conditional Construct (CC) and an OR
    node represents all the branches of a CC.