A Performance-Oriented Hardware/Software Partitioning for Datapath Applications

1
A Performance-Oriented Hardware/Software
Partitioning for Datapath Applications
  • Laura Frigerio Politecnico di Milano
  • Fabio Salice Politecnico di Milano

In collaboration with the European Technology
Center, Altera Corporation, High Wycombe, UK
2
Outline
  • Introduction
  • Overview of the proposed approach
  • Used formalism
  • Timed Petri Net Model
  • Performance Evaluation
  • Exploration of the solution space
  • Experimental results

3
Introduction
  • Embedded system design challenges
  • Increasing complexity
  • Conflicting requirements: timing, area,
    flexibility, time to market
  • Need for methodologies and tools to support the
    design choices
  • Datapath applications, where dataflow processing
    dominates over control-flow constructs, are
    becoming increasingly popular in embedded
    system design
  • DSP applications
  • Packet processing applications
  • They must meet strict timing constraints without
    sacrificing too much flexibility, and at a
    reasonable cost

4
Proposed approach
To manage the exploration of the solution space: Y-chart approach
[Figure: Y-chart flow with blocks labelled Application Modelling, Architecture Modelling, Reference Platform, Mapping, Performance Analysis, Timed Petri Net, Decision, Bounds, Branch and Bound]
5
Proposed approach
  • A Timed Petri Net is used to model the
    application-architecture mapping
  • Petri Nets are a simple and powerful graphical
    and mathematical tool for system modelling
  • Petri Nets can represent concurrent,
    asynchronous, distributed, parallel systems
  • Suitable for HW/SW systems
  • Timed Petri Nets make it possible to evaluate
    the system performance
  • The mathematical formalism makes it possible to
    extract properties of the system that can be used
    to explore the solution space automatically

6
Formalism
  • A Petri Net is defined by places, transitions,
    arcs, a weight function and an initial marking
  • The dynamic behavior of a PN is described in
    terms of two rules: the enabling rule and the
    firing rule.

The incidence matrix $A = [a_{tp}]$ is an $n \times m$ matrix of
integers ($m$ places, $n$ transitions), where
$a_{tp} = a_{tp}^{+} - a_{tp}^{-}$, $a_{tp}^{+} = w(t,p)$ and $a_{tp}^{-} = w(p,t)$.
[Figure: example net and its incidence matrix A, with entries -1, -1, 1, 1]
An integer solution $y$ of $Ay = 0$ is called an S-invariant.
An integer solution $x$ of $A^{T}x = 0$ is called a T-invariant.
In a Timed Petri Net, each transition is associated
with a time.
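The definitions of the incidence matrix and of the invariants can be checked on a toy net. The following Python sketch is illustrative only: the two-place, two-transition net (an idle resource `R` and a busy function `F`) and all helper names are assumptions, not taken from the slides.

```python
# Toy net (hypothetical): t_start moves a token R -> F, t_end moves it F -> R.
places = ["F", "R"]
transitions = ["t_start", "t_end"]

# w_out[(t, p)] = w(t, p): arc weight from transition t to place p.
w_out = {("t_start", "F"): 1, ("t_end", "R"): 1}
# w_in[(p, t)] = w(p, t): arc weight from place p to transition t.
w_in = {("R", "t_start"): 1, ("F", "t_end"): 1}


def incidence_matrix(transitions, places, w_out, w_in):
    """Return A with a[t][p] = w(t, p) - w(p, t); rows are transitions."""
    return [
        [w_out.get((t, p), 0) - w_in.get((p, t), 0) for p in places]
        for t in transitions
    ]


def mat_vec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


def transpose(matrix):
    """Swap rows and columns of a nested-list matrix."""
    return [list(column) for column in zip(*matrix)]


A = incidence_matrix(transitions, places, w_out, w_in)
y = [1, 1]  # candidate S-invariant: one weight per place
x = [1, 1]  # candidate T-invariant: one firing count per transition

is_s_invariant = all(v == 0 for v in mat_vec(A, y))             # A y = 0
is_t_invariant = all(v == 0 for v in mat_vec(transpose(A), x))  # A^T x = 0
```

In this toy net the S-invariant says that the token is always either in `F` or in `R`, and the T-invariant says that firing each transition once returns the net to its starting marking.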
7
Application modeling
  • Datapath applications are dominated by dataflow
    behavior, with few control-flow constructs.
  • They are decomposed into distinct tasks at a
    coarse level of granularity (called functions)
  • The tasks are computation intensive and
    internally strongly interconnected.
  • They are iterative in nature: they execute
    repeatedly over different sets of input data
  • Each independent chunk of input data is referred
    to as a data unit.
  • The application is described by a data-dependency
    task graph at coarse granularity.

8
Architecture modeling
  • The architecture is composed of executors (called
    resources):
  • Processors
  • Hardware modules
  • Multiple instances of the same executor can
    exist, in order to satisfy the performance
    requirements
  • The availability is the number of instances of an
    executor

9
Mapping
  • A mapping ($g$) associates application functions
    ($F$) with architecture resources ($R$).
  • $g : F \to R$
  • The execution of a function $F_i$ on a resource
    $R_j$ requires a certain execution time $e_{ij}$.
  • The values $e_{ij}$ are known if the design process
    is based on IPs (Intellectual Property blocks), or
    can be estimated on the basis of previous, similar
    implementations.
  • A Timed Petri Net is used to model the mapping
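
A mapping and its execution-time table can be represented very directly. The Python sketch below is only an illustration: the function names, resource names and cycle counts are invented, and a missing table entry is assumed to mean that the resource cannot execute the function.

```python
# Hypothetical execution times e_ij in clock cycles, keyed by (function, resource).
exec_time = {
    ("F1", "uP"): 120, ("F1", "HW1"): 10,
    ("F2", "uP"): 80,
    ("F3", "uP"): 200, ("F3", "HW2"): 25,
}


def candidate_resources(function, exec_time):
    """Resources R_j for which e_ij is known, i.e. that can execute F_i."""
    return sorted(r for (f, r) in exec_time if f == function)


def is_valid_mapping(g, exec_time):
    """A mapping g: F -> R is valid if every chosen pair has an execution time."""
    return all((f, r) in exec_time for f, r in g.items())


# One possible mapping g: each function is assigned to exactly one resource.
g = {"F1": "HW1", "F2": "uP", "F3": "HW2"}
mapping_ok = is_valid_mapping(g, exec_time)
```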

10
Timed Petri Nets for the mapping
  • An F-Place represents a function
  • An R-Place represents a resource
  • The initial marking of an R-Place is the
    availability of the resource
  • A Q-Place represents a queue
  • Transitions are annotated with the timing

11
Timed Petri Nets for the mapping
  • Pipelined resource
  • Execution time $e_{ij}$: total time to execute the
    function.
  • Stage time $s_{ij}$: interval at which new input
    data can be accepted (usually one clock cycle).

12
Timed Petri Nets for the mapping
  • There is a limit on the number of data units that
    can be processed simultaneously.
  • It is explicit or implicit depending on the
    platform
  • It is modelled by a P-Place whose initial marking
    is a number of tokens equal to the maximum number
    of data units allowed in the system.
  • If the communication introduces substantial
    overheads, it can be modelled using the same
    framework
  • For example, a data transfer becomes a function
    and a bus becomes a resource

13
Performance Evaluation
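  • The Petri Net model is consistent
    ($\exists\, x > 0 : A^{T}x = 0$)
  • The minimum cycle time of the net can be computed
    over the set of all the S-invariants (solutions
    of the equation $Ay = 0$), with $D$ the diagonal
    matrix of the times associated with the
    transitions and $M_0$ the initial marking
    [equation not preserved in the transcript]
  • The S-invariants are the rows of $B_f$, which is
    obtained from a partition of $A$ in which $A_{11}$
    is a non-singular $r \times r$ matrix, with $r$
    the rank of $A$ [equation not preserved in the
    transcript]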
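
The formula itself did not survive in this transcript. A commonly cited form in the timed Petri net literature, for nets with delays on transitions, is $\tau_{min} = \max_k \{ y_k^{T} (A^{-})^{T} D x / (y_k^{T} M_0) \}$, where $A^{-} = [a_{tp}^{-}]$ and the maximum runs over the S-invariants $y_k$. Assuming that this is the expression meant here, a minimal Python sketch on a toy net looks as follows. The net (one resource with two instances), the delays and all names are illustrative assumptions.

```python
from fractions import Fraction

# Toy net (hypothetical): places [F, R], transitions [t_start, t_end].
# a_minus[t][p] = w(p, t): tokens that transition t takes from place p.
a_minus = [
    [0, 1],  # t_start consumes one token from R
    [1, 0],  # t_end consumes one token from F
]
delays = [0, 30]         # diagonal of D: time associated with each transition
x = [1, 1]               # T-invariant (A^T x = 0)
m0 = [0, 2]              # initial marking: R holds 2 tokens (availability 2)
s_invariants = [[1, 1]]  # S-invariants (A y = 0), one weight per place


def cycle_time_ratio(y, a_minus, delays, x, m0):
    """Evaluate y^T (A^-)^T D x / (y^T M0) for one S-invariant y."""
    dx = [d * xi for d, xi in zip(delays, x)]  # D x
    # ((A^-)^T D x)[p] = sum over transitions t of a_minus[t][p] * dx[t]
    per_place = [
        sum(a_minus[t][p] * dx[t] for t in range(len(dx)))
        for p in range(len(y))
    ]
    numerator = sum(yp * v for yp, v in zip(y, per_place))
    tokens = sum(yp * mp for yp, mp in zip(y, m0))
    return Fraction(numerator, tokens)


def min_cycle_time(s_invariants, a_minus, delays, x, m0):
    """Maximum of the per-invariant ratios."""
    return max(cycle_time_ratio(y, a_minus, delays, x, m0) for y in s_invariants)


tau_min = min_cycle_time(s_invariants, a_minus, delays, x, m0)
```

With two instances of a resource that needs 30 time units per data unit, the sketch yields 15 time units per data unit. An S-invariant whose places hold no tokens in the initial marking would make the ratio undefined, so the sketch assumes every invariant is marked.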
14
Performance Evaluation
[Figure labels: F-Places and Q-Places; R-Places and P-Place]
15
Performance Evaluation
  • There are $m - r$ S-invariants:
  • $m - r - 1$ of them correspond to the $m - r - 1$
    R-Places in the system. Each vector has elements
    equal to 1 for the R-Place and for the F-Places
    using that resource (all other elements are 0).
  • One S-invariant corresponds to the P-Place.
    This vector has elements equal to 1 for the
    P-Place and for all the F-Places and Q-Places in
    the system (all other elements are 0).
  • Intuitively, the minimum cycle time is related to
    the processing time required by the resources to
    process a data unit.
  • Resources that execute functions requiring a long
    processing time are more likely to influence the
    minimum cycle time.
  • A long computational path, even if supported by
    several resources, can affect the system
    performance.

16
Performance Evaluation
  • Under a semantic interpretation, the minimum
    cycle time can be expressed in terms of the $v$
    functions and $z$ resources [equation not
    preserved in the transcript]
  • $M_0(R_j)$ is the marking of the place
    associated with $R_j$
  • If the resource is pipelined, the time $s_{ij}$
    is used instead of $e_{ij}$ when computing $rl_j$
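
The closed-form expression is missing from the transcript. One plausible reading, consistent with the two kinds of S-invariants described on the previous slide, is that the minimum cycle time is the larger of (a) the highest resource load $rl_j$ divided by the availability $M_0(R_j)$ and (b) the total execution time along the path divided by the marking of the P-Place. The Python sketch below implements that reading. The formula, the function name `min_cycle_estimate` and the sample numbers are assumptions, not the authors' exact expression.

```python
from fractions import Fraction


def min_cycle_estimate(g, exec_time, availability, max_units, stage_time=None):
    """Assumed bound: max(max_j rl_j / M0(R_j), sum_i e_ij / M0(P)).

    g            -- mapping {function: resource}
    exec_time    -- {(function, resource): e_ij}
    availability -- {resource: M0(R_j)}
    max_units    -- M0(P), data units allowed in the system at once
    stage_time   -- {(function, resource): s_ij} for pipelined resources
    """
    stage_time = stage_time or {}
    load = {}      # rl_j: time each resource is busy per data unit
    path_time = 0  # time one data unit spends being processed
    for function, resource in g.items():
        e = exec_time[(function, resource)]
        # A pipelined resource is occupied only for the stage time s_ij.
        busy = stage_time.get((function, resource), e)
        load[resource] = load.get(resource, 0) + busy
        path_time += e
    resource_bound = max(Fraction(rl, availability[r]) for r, rl in load.items())
    path_bound = Fraction(path_time, max_units)
    return max(resource_bound, path_bound)


# Hypothetical example: F1 in hardware, F2 and F3 on an 8-thread processor.
exec_time = {("F1", "HW1"): 10, ("F2", "uP"): 80, ("F3", "uP"): 40}
g = {"F1": "HW1", "F2": "uP", "F3": "uP"}
availability = {"HW1": 1, "uP": 8}
bound = min_cycle_estimate(g, exec_time, availability, max_units=4)
```

With only 4 data units allowed in the system, the path term (130 / 4) dominates. Raising the limit to 16 makes the processor load (120 / 8 = 15) the bottleneck instead.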
17
Resource selection algorithm (HW/SW partitioning)
  • The exploration of the solution space is
    automated using the previous equations,
    given a throughput constraint
  • $v$ functions, $z$ resources → $z^v$ alternatives
  • Coarse grain (10-20 functions)
  • Not all the resources can execute all the
    functions
  • Real alternatives $\ll z^v$
  • Branch and bound algorithm
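
The gap between $z^v$ and the number of real alternatives can be illustrated with a small calculation in Python. The candidate sets below are invented for illustration.

```python
from math import prod

# Hypothetical candidate resources per function (v = 4 functions, z = 3 resources).
candidates = {
    "F1": ["uP", "HW1"],
    "F2": ["uP"],
    "F3": ["uP", "HW2"],
    "F4": ["uP", "HW1", "HW2"],
}
resources = {r for options in candidates.values() for r in options}

# Upper bound z^v: every function could run on every resource.
upper_bound = len(resources) ** len(candidates)
# Real alternatives: product of the number of candidates of each function.
real_alternatives = prod(len(options) for options in candidates.values())
```

Here the upper bound is 3^4 = 81 mappings, while only 2 × 1 × 2 × 3 = 12 are actually possible.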

18
Exploration of the solution space
  • Algorithm: branch and bound approach
  • At each level a new function is assigned to the
    available resources (branching operation)
  • The bounds provided by the semantic framework are
    evaluated (bounding operation)
  • The generated tree is pruned according to the
    result of the bounding

[Figure: branch-and-bound tree with nodes F1→µP, F1→HW1, F2→µP, F2→HW2, F2→HW1, F2→µP]
At each node: kill the branch if the required
throughput > 1 / (minimum cycle time)
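
A minimal branch-and-bound sketch in Python is given below. It assumes that a partial assignment is bounded by the largest resource load divided by the availability (loads can only grow as more functions are assigned, so the bound is safe for pruning), and that the objective is to minimise a per-resource area cost. The cost model, the names and the numbers are illustrative assumptions; the slides do not give the exact bounds.

```python
from fractions import Fraction


def load_bound(partial, exec_time, availability):
    """Lower bound on the cycle time of any completion of `partial`."""
    load = {}
    for f, r in partial.items():
        load[r] = load.get(r, 0) + exec_time[(f, r)]
    ratios = (Fraction(rl, availability[r]) for r, rl in load.items())
    return max(ratios, default=Fraction(0))


def explore(functions, exec_time, availability, area, required_throughput):
    """Return (best_area, best_mapping) meeting the throughput, or (None, None)."""
    best = [None, None]

    def used_area(partial):
        # Assumed cost model: each resource type in use is paid for once.
        return sum(area[r] for r in set(partial.values()))

    def branch(level, partial):
        # Bounding: kill the branch if required throughput > 1 / bound.
        bound = load_bound(partial, exec_time, availability)
        if bound > 0 and required_throughput > 1 / bound:
            return
        # Also prune branches that cannot improve on the best area so far.
        if best[0] is not None and used_area(partial) >= best[0]:
            return
        if level == len(functions):
            best[0], best[1] = used_area(partial), dict(partial)
            return
        # Branching: assign the next function to each resource that can run it.
        f = functions[level]
        for fn, r in sorted(exec_time):
            if fn == f:
                partial[f] = r
                branch(level + 1, partial)
                del partial[f]

    branch(0, {})
    return best[0], best[1]


# Hypothetical data: two functions, one processor with 8 threads, two HW modules.
exec_time = {
    ("F1", "uP"): 120, ("F1", "HW1"): 10,
    ("F2", "uP"): 80, ("F2", "HW1"): 8, ("F2", "HW2"): 5,
}
availability = {"uP": 8, "HW1": 1, "HW2": 1}
area = {"uP": 50, "HW1": 100, "HW2": 150}
result = explore(["F1", "F2"], exec_time, availability, area, Fraction(1, 20))
```

With a required throughput of one data unit every 20 time units, running both functions on the processor is killed by the bound (200 / 8 = 25 > 20), and the cheapest surviving mapping under this cost model shares `HW1` between `F1` and `F2`.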
19
Experimental results
  • Packet processing application performing an IP
    (Internet Protocol) packet forwarding function
  • The system receives a MAC (Medium Access Control)
    input packet, verifies that the packet is valid,
    modifies some packet fields, computes the
    destination MAC address and issues the packet
  • Reference platform: HW/SW architecture for
    datapath applications developed by Altera
  • Two phases:
  • Verification that the presented Timed Petri Net
    approach is suitable for describing a system.
  • Application of the algorithm for the solution
    space exploration.

20
Reference architecture
  • Altera HW/SW solution for high-performance
    datapath applications
  • Processor that can execute 8 threads
    simultaneously by means of an unconventional
    form of multithreading
  • Represented as a resource with availability
    equal to eight and frequency equal to
    $F_{soft}/8$.
  • Asynchronous execution paradigm that combines
    Tasks, which are executed in software, and Events,
    which are executed by dedicated hardware blocks

21
Petri Net Model verification
  • Comparison of the values of the minimum cycle
    time obtained by:
  • defining and simulating the Timed Petri Net with
    a PN simulation tool (CPN Tools)
  • implementing and simulating the system through
    the Altera toolchain, which combines an ISS for
    the processor with software models of the hardware
    blocks
  • applying the performance analysis

22
Petri Net Model verification
[Figure: CPN model]
23
Petri Net Model verification
  • Fixed partitioning
  • F1, F3, F5 and F7 are executed by hardware
    modules (a shared module is used for F3 and F5),
    and F2, F4, F6 and F8 are executed on the
    multithreaded processor (with 8 threads)
  • Different configurations

24
Exploration algorithm
  • Identify the solution that minimizes the area,
    while maintaining the flexibility and satisfying
    the throughput constraints

25
Conclusion
  • This paper presents a method for the solution
    space exploration of datapath applications with
    stringent throughput constraints.
  • Timed Petri Nets represent the mapping of the
    application onto the architecture, in a Y-chart
    approach.
  • A set of bounds is exploited by an exploration
    algorithm, based on a branch and bound approach,
    to search the solution space for a suitable
    performance/area configuration.
  • Experimental results on a packet processing
    application show that:
  • the Petri Net model can accurately represent the
    behavior of a real system
  • the exploration algorithm is able to find a
    suitable area/throughput compromise in a
    reasonable time