Hardware-Software Co-synthesis for Digital Systems


1
Hardware-Software Co-synthesis for Digital Systems

The study of embedded computing system design
Gupta & De Micheli,
IEEE Design & Test of Computers 10, no. 3 (Sept. 1993): 29-41
P5 of HW/SW Co-design Book
Prepared by Dr. Kocan

2
The Problems in HW/SW Co-design

Co-specification
Creating specifications that describe
Hardware elements
Software elements
The relationships between the elements
Co-synthesis
Automatic or semi-automatic design of hardware
and software to meet a specification
Co-simulation
Simultaneous simulation of hardware and software
elements (often at different levels of
abstraction)

3
Co-synthesis Phases

Scheduling: choosing times at which computations occur
Allocation: determining the processing elements (PEs) on which computations occur
Partitioning: dividing up the functionality into units of computation
Mapping: choosing particular component types for the allocated units

These phases are related.
4
HW-SW Co-synthesis for Digital Systems

Embedded System Applications
General-purpose processors, ASICs, and memory
Application-specific: the relative timing of their actions
Real-time embedded systems

5
Challenges in Real-time ESD

Performance estimation
Selection of appropriate parts for system
implementation
Verification of temporal and functional
properties of the system

7
Synthesis-oriented approach
11
Capturing specification of system

Capture system functionality using an HDL, e.g., HardwareC, Verilog, or VHDL
HardwareC: a programming language with correct, unambiguous hardware modeling
HardwareC description: a set of interacting concurrent processes
A process restarts itself on completion
Nested concurrent and sequential operations in the body

12
Example HDL functionality specification

Two data input operations
A conditional operation to generate the counter seed z
A while-loop to implement a down-counter
A graph-based representation captures this specification

14
System model

A system model consists of a set of
hierarchically related sequencing graphs
Vertices represent language-level operations
Edges represent dependencies between operations
Advantages of graph representation
Makes explicit the concurrency inherent in the
input specification
Makes it easier to reason about properties of the
input description
Allows analysis of timing properties of the input description
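
The point about explicit concurrency can be illustrated with a small sketch. It encodes a sequencing graph for the example specification (two inputs, a conditional, a while-loop) and groups operations by the earliest step at which they can start. The vertex names and the `asap_levels` helper are hypothetical, chosen only for illustration; they are not taken from the paper.

```python
from collections import defaultdict

def asap_levels(vertices, edges):
    """Group operations by the earliest step at which they can start."""
    preds = defaultdict(list)
    for src, dst in edges:
        preds[dst].append(src)

    level = {}

    def visit(v):
        # Level = 1 + deepest predecessor level; vertices without predecessors get 0.
        if v not in level:
            level[v] = 1 + max((visit(p) for p in preds[v]), default=-1)
        return level[v]

    for v in vertices:
        visit(v)

    groups = defaultdict(list)
    for v in vertices:
        groups[level[v]].append(v)
    return [groups[k] for k in sorted(groups)]

# Hypothetical vertex names; source and sink are the no-operation vertices.
VERTICES = ["source", "read_x", "read_y", "cond_z", "while_loop", "sink"]
EDGES = [
    ("source", "read_x"), ("source", "read_y"),
    ("read_x", "cond_z"), ("read_y", "cond_z"),
    ("cond_z", "while_loop"), ("while_loop", "sink"),
]
LEVELS = asap_levels(VERTICES, EDGES)
```

Operations that land on the same level (here the two reads) have no dependency between them, so the graph shows directly that they may run concurrently.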

15
Graph Model Properties

Sink / source vertices that represent
no-operations
A set of variables defines the shared memory
between operations in the graph
Storage common to the operations
Facilitates communication between operations
Single-rate execution of operations in a graph: exactly one execution of an operation with respect to each execution of any other operation

16
Multiple Graph Models

Operations across graph models follow multi-rate execution semantics
An operation may execute a variable number of times for each execution of an operation in another graph model
Use message-passing primitives (send/receive) to
implement communication across graph models
Specification of inter-model communication made
simple

18
Modeling Heterogeneous Systems

Use a multirate specification
E.g., the ASIC and the processor run on clocks of different speeds

19
Nondeterministic Delay (ND) operations

Operations that represent synchronization to external events
E.g., the receive() operation
Data-dependent loop operations
ND: unknown execution delays
Modeling ND operations is vital for reactive embedded system descriptions

Double circles denote ND operations
20
Many possible implementations per system model

Timing constraints define the performance requirements of the desired implementation
Two types of timing constraints
Min/max delay constraint
Execution rate constraint

22
Timing constraints

Min/max delay constraints
Execution rate constraints
Sufficient to capture constraints needed by most
real-time systems

23
Modeling of delay constraints (min/max and rate)
Edge weight: delay of the source operation
Backward edges: maximum delay constraints
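
As a rough sketch of this edge model, the snippet below builds a weighted constraint graph. It assumes the usual convention that a minimum delay l from operation i to j becomes a forward edge of weight l, and a maximum delay u becomes a backward edge from j to i of weight -u. The function name and data layout are hypothetical.

```python
def constraint_edges(sequencing_edges, delay, min_constraints, max_constraints):
    """Return (src, dst, weight) edges for a timing constraint graph."""
    edges = []
    for src, dst in sequencing_edges:
        # Dependency edge: weighted by the delay of its source operation.
        edges.append((src, dst, delay[src]))
    for (i, j), lower in min_constraints.items():
        # Minimum delay: start(j) >= start(i) + lower (forward edge).
        edges.append((i, j, lower))
    for (i, j), upper in max_constraints.items():
        # Maximum delay: start(j) <= start(i) + upper (backward edge, negative weight).
        edges.append((j, i, -upper))
    return edges

# Hypothetical two-operation example: a feeds b, a takes 2 time units,
# and b must start no later than 5 time units after a starts.
EDGES = constraint_edges(
    sequencing_edges=[("a", "b")],
    delay={"a": 2, "b": 1},
    min_constraints={},
    max_constraints={("a", "b"): 5},
)
```

With this encoding, every constraint reads as "start(dst) >= start(src) + weight", which is what makes cycle-based consistency checks possible.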
24
Model Analysis

Estimate system performance
Verify the consistency of specified constraints
Performance measures
Estimation of operation delays
Separate estimations for hardware and software
implementations
Based on the processor to run the software
Based on the type of the hardware to be used

25
Processor cost model

Execution delay function for basic set of
processor operations
Memory address calculation function
Memory access time
Processor interrupt response time
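
A minimal sketch of how such a cost model might be used to estimate the software delay of an operation. It assumes the delay is the sum of the instruction delays plus, for each memory operand, an address calculation and one memory access. All cycle counts, opcode names, and addressing-mode names are invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProcessorCostModel:
    op_delay: dict            # cycles per basic processor operation
    addr_calc: dict           # cycles per address calculation, by addressing mode
    mem_access: int           # cycles per memory access
    interrupt_response: int   # cycles to respond to an interrupt (not used below)

def software_delay(model, instructions):
    """instructions: list of (opcode, [addressing mode of each memory operand])."""
    total = 0
    for opcode, operand_modes in instructions:
        total += model.op_delay[opcode]
        for mode in operand_modes:
            total += model.addr_calc[mode] + model.mem_access
    return total

# Hypothetical numbers, not measurements of any real processor.
MODEL = ProcessorCostModel(
    op_delay={"load": 1, "add": 1, "store": 1},
    addr_calc={"direct": 1, "indexed": 2},
    mem_access=2,
    interrupt_response=10,
)
DELAY = software_delay(
    MODEL, [("load", ["direct"]), ("add", []), ("store", ["indexed"])]
)
```

The interrupt response time is kept in the model because it matters for synchronization operations; the straight-line estimate above does not need it.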

26
Timing constraint analysis

Can imposed constraints be satisfied for a given
implementation?
Assign appropriate delays to the operations with
known delays in the graph model
CONSTRAINT SATISFIABILITY
Relating structure, actual delay and constraint
values
Some structural properties of graphs may make a
constraint unsatisfiable (ND operations)
Some constraints may be mutually inconsistent
E.g. maximum delay constraint between two
operations that also have a larger minimum delay
constraint
No assignment of nonnegative operation delay
values can satisfy such constraints

27
Presence of ND operations

A timing constraint is satisfiable if it is satisfied for all possible (and possibly infinite) delay values of the ND operations
A timing constraint is marginally satisfiable if it can be satisfied for all possible values within the specified bounds on the delay of the ND operations
Requires some implementation assumptions (acceptable bounds on ND operation delays)

28
Timing Analysis by graph analysis

(1) No ND operations in the graph
Edges have finite, known weights
A min/max delay constraint cannot be satisfied if a positive cycle exists in the graph model (the sum of the weights on the cycle is positive)
(2) ND operations exist
Satisfiable if no cycle contains ND operations
If a cycle contains ND operations, it is impossible to determine the satisfiability of timing constraints; only marginal satisfiability can be guaranteed
Cycle breaking by graph transformation
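
For case (1), the positive-cycle test can be sketched with a Bellman-Ford style longest-path relaxation. This is one standard way to detect positive cycles, not necessarily the procedure used in the paper. Edges are assumed to be (src, dst, weight) triples in which a maximum delay u between i and j appears as a backward edge (j, i, -u).

```python
def has_positive_cycle(vertices, edges):
    """Return True if the weighted graph contains a cycle of positive total weight."""
    # Starting every vertex at 0 acts like a virtual source connected to all vertices.
    dist = {v: 0 for v in vertices}
    for _ in range(len(vertices)):
        changed = False
        for src, dst, weight in edges:
            if dist[src] + weight > dist[dst]:
                dist[dst] = dist[src] + weight
                changed = True
        if not changed:
            return False  # longest paths converged: no positive cycle
    return True  # still relaxing after |V| passes: a positive cycle exists

# Minimum delay 8 but maximum delay 5 between a and b: inconsistent.
INCONSISTENT = has_positive_cycle(["a", "b"], [("a", "b", 8), ("b", "a", -5)])
# Minimum delay 4 and maximum delay 5: consistent.
CONSISTENT = has_positive_cycle(["a", "b"], [("a", "b", 4), ("b", "a", -5)])
```

The first example is the mutually inconsistent case: the cycle weight is 8 - 5 = 3, so no assignment of start times can satisfy both constraints.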

29
Timing analysis

Nonpipelined implementations
Rate constraints can be treated as min/max delay constraints between the corresponding source and sink operations of the graph model
Apply the min/max constraint satisfiability criterion to the analysis of rate constraints

30
Example Rate constraints (graphs with ND ops)

process test(p, ...)
  in port p[SIZE];
{
  boolean v[INT-SIZE];
  v = read(p);
  while (v > 0) {
    <loop-body>
    v = v - 1;
  }
}

Rate constraint on the read operation
Unbounded while operation → ND operation
v: Boolean array representing an integer
31

The overall execution time of the while loop determines the interval between successive executions of the read operation
Because the while loop has a variable delay, the input rate at port p is variable
The required rate constraint cannot always be guaranteed
Ensure marginal satisfiability of the rate constraint by graph transformation and by using a finite-size buffer

32
P transformed into fragments Q and R
Rate constraint from sink to source
33
Software Implementation of Ex. A

Two threads: for each execution of T1, T2 executes v times

Thread T1     Thread T2
read v        loop synch
detach        <loop_body>
              v = v - 1
              detach
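
The two-thread structure can be mimicked with Python generators, where each `yield` plays the role of `detach`. This is only an illustrative sketch: the round-robin scheduler, the event strings, and the unbounded `pending` list (standing in for the finite-size buffer) are assumptions, not part of the original example.

```python
def thread_t1(samples, pending):
    """T1: read v, hand it to T2, then detach."""
    for v in samples:
        pending.append(v)
        yield f"T1 read {v}"

def thread_t2(pending):
    """T2: one loop iteration per activation, then detach."""
    while True:
        while not pending:
            yield "T2 wait"          # loop synch: nothing to do yet
        v = pending.pop(0)
        while v > 0:
            v -= 1                   # <loop_body> would run here
            yield f"T2 body, v={v}"

def run_threads(samples):
    """Alternate T1 and T2, then let T2 drain the remaining work."""
    pending, trace = [], []
    producer, consumer = thread_t1(samples, pending), thread_t2(pending)
    for event in producer:
        trace.append(event)
        trace.append(next(consumer))
    while True:
        event = next(consumer)
        if event == "T2 wait":
            return trace
        trace.append(event)

TRACE = run_threads([2, 1])
```

In the resulting trace the second read happens before the first loop has finished, which is the overlap that keeps the read rate independent of the loop delay as long as the buffer does not overflow.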

34
Process P with ND operation

ND operation due to an unbounded loop
The ND operation induces a bipartition of the calling process P
P = F ∪ B
F: e.g., the read operation
The set of operations in F must be performed before invoking the loop body
The set of operations in B can only be performed after completing executions of the loop body
Functional pipeline F → B → Loop to improve the reaction rate of P
Note: we assume nonpipelined hardware, therefore the pipelining is done in software

35
Constraint Analysis and Software

Linear execution semantics imposed by software
running on single-processor
Complicates constraint analysis for software
implementation of graph model
Complete order of operations necessary to perform
delay analysis
A complete ordering may create unbounded cycles → constraints become unsatisfiable

36
Example of Complete Ordering of Operations
37
Communication Ops in SW

Computation ops must be performed serially
Communication ops can proceed concurrently
Overlap execution of ND ops (wait for
synchronization or communication) with some
(unrelated) computation
Requires dynamic software scheduling
Simultaneously active ND operations may complete in orders that cannot be determined statically

38
Software model: a set of fixed-latency concurrent threads
Delay overheads of dynamic scheduling
39
Thread

A linearized set of operations
May or may not begin with ND operation (indicated
by a circle)
A thread does not contain any ND operation (other
than beginning with one)
The delay of the initial ND operation is part of
the scheduling delay (not included in the latency
of the thread)
Multiple threads avoid complete serialization of all operations, which may create unbounded cycles
SW model enables checking of marginal
satisfiability of constraints on operations
belonging to different threads
Assume fixed and known delay of scheduling
operations associated with ND operations

40
System Partitioning

system-level partitioning problem
The assignments of operations to hardware or
software
Assignment determines the delay of the ops
Communication overheads due to ASIC or processor
assignment
Minimize communication delay
Increase ops in SW to increase the processor
utilization
Overall System Performance
The effect of HW/SW partition on the utilization
of processor AND the bandwidth of the bus between
the processor and ASIC hardware.
Devise a partitioning cost function
Sizes of hw/sw parts
Timing behavior: capture the timing performance during the partitioning
Timing behavior is hard to capture (use approximation techniques)

41
Hardware/Program partitioning

HW partitioning: divides circuits that implement scheduled operations
Program-level partitioning: addresses operations that are scheduled at runtime
Use statistical timing properties to drive
partitioning algorithms

42
Use of timing properties in partition cost
function
Use deterministic bounds on timing properties
that are incrementally computable in the
partition cost function
43
Characterization of SW

Thread latency (L): execution delay of a program thread
Thread execution rate (R): the invocation rate of the thread
Processor utilization: $P = \sum_i L_i R_i$, summed over the program threads
Bus utilization (B): total amount of communication between the HW and SW
To transfer m variables: $B = \sum_{j=1}^{m} r_j$
$r_j$: the inverse of the minimum time interval between two consecutive samples for variable j
Calculate static bounds on SW performance with L, R, B, P
These bounds overestimate the performance parameters. Why?
The actual distribution of thread invocations and communications depends on data values
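
A small worked sketch of these two static bounds, with processor utilization as the sum of latency times rate over the threads and bus utilization as the sum of the per-variable sample rates. The thread latencies, rates, and sample intervals below are made-up numbers.

```python
def processor_utilization(threads):
    """threads: list of (latency_seconds, invocations_per_second)."""
    return sum(latency * rate for latency, rate in threads)

def bus_utilization(min_sample_intervals):
    """min_sample_intervals: minimum time between samples, one entry per variable."""
    return sum(1.0 / interval for interval in min_sample_intervals)

# Hypothetical software side: two threads and two shared variables.
P = processor_utilization([(0.002, 100), (0.001, 250)])   # 0.2 + 0.25
B = bus_utilization([0.01, 0.02])                         # 100 + 50 transfers/s
```

Because both sums use worst-case rates, the values are upper bounds; the actual load depends on the data values seen at run time.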

44
Hardware Size, Interface Characterization

Sum of the size estimates of the resources
implementing the operations
Assign ports for communication between HW and SW.
one port per variable
Bus bandwidth captures the overhead of
communication

45
Partitioning a specification into HW and SW
implementations

Given the cost model for software, hardware, and
interface
Given a set of sequencing graph models and timing
constraints between operations, create two sets
of sequencing graph models s. t. one can be
implemented in hardware and the other in software

46
Constraints after Partitioning

Timing constraints are satisfied for the two sets
of graph models
Processor utilization: $P < 1$
Bus utilization: $B < \bar{B}$, where $\bar{B}$ is the available bus bandwidth
A partition cost function
$\min f(\mathrm{Size}_{HW}, B, P^{-1}, m)$

47
Intuitive features of the partitioning algorithm

Identify operations that can be implemented in software such that
The constraint graph implementation can be satisfied
The resulting software meets rate constraints on its inputs/outputs
Initial partition
ND ops of the data dependent loop operations
define the beginning of the program threads
All other operations in HW
Compute the reaction rates of the threads
Maximum reaction rate of a thread: the inverse of its latency
Latency of a program thread is computed from the
processor delay cost model and a fixed scheduling
overhead delay
Iterative improvement
Migrate an operation (affects (1) execution delay, (2) latency, (3) reaction rate of the thread into which the operation is moved)
Compute its effect on processor and bus bandwidth

48
System Synthesis

Synthesize individual HW and SW components
Here: generation of the interface and software from the partitioned models
We know the program threads
Use coroutine scheme for program generation
Limit all external dependencies to the first and
last statements of the threads to have convex
threads
Concurrency might be reduced!

49
Rate Constraints and Software

The presence of dependencies on ND operations
A SW implementation may not meet the data rate constraints on its I/O ports
Synchronization-related ND operations
Assign a context-switch delay to the respective
wait operations
Check for marginal satisfiability of timing
constraints
Unbounded loop-related ND operations
Estimate loop index values for marginal
satisfiability analysis

50
Example C

To obtain a deterministic bound on the reaction rate of the calling thread T1:
Unroll the looping thread into a variable number of program threads
Scheduling overhead per new thread
Dynamic creation of the threads may lead to violation of the processor utilization constraint
Overlap execution of T1 and T2 to ensure marginal timing constraint satisfiability
Remove the wait2 operation if T2 does not modify a common variable
Use a buffer to maintain the reaction rate

51
No unbounded-delay operations

Simplify a SW component into one single program
thread and a single data channel
All data transfers are serialized
Disadvantage of the approach: no support for reordering or branching

52
Example D

HW/SW interface
Data queues on each channel
A control FIFO
(holds the thread_ids in the order in which their
input data arrives)
FIFO depth: the number of threads of execution
Nonpreemptive, priority-based scheduling with
FIFO control
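
A minimal software model of this interface, assuming one data queue per thread and a control FIFO of thread ids. The class and method names are hypothetical, and the sketch serves threads strictly in arrival order without modeling priorities.

```python
from collections import deque

class HwSwInterface:
    """Data queues per channel plus a control FIFO of thread ids."""

    def __init__(self, handlers):
        self.handlers = handlers                        # thread_id -> thread body
        self.data_queues = {tid: deque() for tid in handlers}
        self.control_fifo = deque()

    def hw_write(self, thread_id, value):
        """Hardware side: enqueue data and record which thread it is for."""
        self.data_queues[thread_id].append(value)
        self.control_fifo.append(thread_id)

    def run_pending(self):
        """Software side: run threads to completion in data-arrival order."""
        results = []
        while self.control_fifo:
            tid = self.control_fifo.popleft()
            value = self.data_queues[tid].popleft()
            results.append(self.handlers[tid](value))   # nonpreemptive call
        return results

iface = HwSwInterface({
    "line": lambda v: ("line", v),
    "circle": lambda v: ("circle", v),
})
iface.hw_write("circle", 3)
iface.hw_write("line", 7)
iface.hw_write("circle", 5)
RESULTS = iface.run_pending()
```

Each thread runs to completion before the next id is taken from the control FIFO, so data items are consumed in the order in which the hardware produced them.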

53
Example E

Actual interconnection schematic between HW and
SW for single data queue
Implement ControlFIFO and associated control
logic as a part of the ASIC or in software

54
Marginal timing satisfiability analysis
The input rate at port p is variable, so the reaction rate of T1 cannot be guaranteed
55
Hardware-software interface

Data transfer from HW to SW must be explicitly
synchronized
Polling strategy
Accommodation of different rates of execution
among HW and SW components (and due to
unbounded-delay ops)
A dynamic scheduling of different threads of
execution
Use Control FIFO for scheduling
Data items are consumed in the order in which
they are produced

56
Interface Protocol for Graphics Controller
Two threads generate line and circle coordinates in software
The control FIFO holds the ids of the threads