Title: Hardware/Software Interface Co-Synthesis and Latency Insensitive Design


1
Hardware/Software Interface Co-Synthesis and
Latency Insensitive Design
  • Marty Nicholes / Ra Roath
  • EEC282

2
Presentation
  • HW/SW Interface for HW/SW codesign
  • Allows simulation of a hw/sw system
  • Eases design space exploration effort
  • Decreases risk of interface overdesign
  • Interconnect Wire Delay Issues
  • With Deep Sub-Micron process technologies
  • Delay of long wire plays a big role

3
HW/SW Interface Basics
  • Connect processor with devices
  • Allocate processor ports
  • Design glue logic for device access
  • Meet timing/throughput constraints

4
1st Paper
  • Interface Co-Synthesis Techniques for Embedded
    Systems
  • Pai Chou, Ross B. Ortega, Gaetano Borriello
  • Summary
  • Set of techniques for HW/SW interface synthesis
  • Generate communication links
  • Minimize glue-logic required
  • Meet timing constraints

5
IO Types
  • Direct
  • Simple connection, no glue logic
  • Indirect
  • Needs glue logic
  • When used
  • Insufficient IO resources
  • Fast device signaling requirements
  • Offload work from main processor

6
Design Flow
[Figure: design flow diagram. Box labels: System Behavioral Description, Processor Library, Device Library, Processor List, Device List, Interface Co-synthesis, HW Glue Logic (Verilog), Processor Software, Hardware Netlist]

7
Design Flow (cont.)
The behavioral description is a high-level, imperative language program written by the user describing the necessary components of the circuit and its functionality. This program has a declarative section and an operational section. The declarative section allocates static storage for data and instantiates peripheral devices. The operational section computes functions and communicates with the peripheral devices via driver calls. This file is a Verilog hybrid, allowing a device listing in the structural part, and a behavioral part describing the device interaction. (Source: The Chinook HW/SW Cosynthesis System, Chou et al.)
[Figure: processing flow. Box labels: System Behavioral Description, processing, CFGhw, CFGsw, processing, HW SEQs, HW Access Routines]

8
Chinook Flow

9
Processor Library Information
  • IO Resources
  • IO Ports with direction, addressability
  • Serial controller (I2C, UART)
  • Access routines
  • Port expander templates
  • Memory bus description

10
Device Library Info
  • Ports
  • Guarded (can isolate)
  • Not guarded
  • Interface properties
  • Low level access routine info - SEQs
  • Processor independent format
  • Represents signaling for processor comm.

11
Design Data
  • Control Flow Graphs
  • Produced from behavioral description
  • Default is CFGsw
  • Designer or tool can mark for CFGhw
  • Output formats
  • Hardware connections as a netlist
  • Processor software including access routines
  • Interface glue logic in Verilog
  • Main Algorithm
  • Synthesize HW access routines
  • Allocate IO resources
  • IO ports first
  • Generate device drivers

12
IO Port Allocation
  • N device ports in decreasing size
  • Guarded can share a port
  • Unguarded requires dedicated port
  • Not enough ports?
  • Make unguarded share
  • Forced sharing
  • Add latch/tri-state glue logic
  • Costs glue logic and a control bit
  • Encoding
  • Address decode provides an address for
    dedicated pins
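
The allocation rules on this slide can be sketched roughly as follows. This is an illustrative reading, not the paper's actual algorithm: the names `DevicePort`, `ProcPort`, and `allocate_ports` are hypothetical, and the tie-breaking choices (guarded ports prefer an already shared processor port, unguarded ports take the first free one) are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class DevicePort:
    name: str
    width: int
    guarded: bool  # guarded ports can be isolated, so they may share pins


@dataclass
class ProcPort:
    name: str
    width: int
    users: list = field(default_factory=list)
    dedicated: bool = False  # True once an unguarded device port owns it


def allocate_ports(dev_ports, proc_ports):
    """Return (assignment, forced); forced lists device ports needing latch glue."""
    assignment, forced = {}, []
    # Visit device ports in decreasing size, as the slide describes.
    for dp in sorted(dev_ports, key=lambda p: p.width, reverse=True):
        fits = [pp for pp in proc_ports
                if pp.width >= dp.width and not pp.dedicated]
        free = [pp for pp in fits if not pp.users]
        shared = [pp for pp in fits if pp.users]
        if dp.guarded:
            # Assumption: share first, keeping free ports for unguarded devices.
            candidates = shared + free
        elif free:
            candidates = free
            free[0].dedicated = True
        else:
            # Forced sharing: costs latch/tri-state glue logic and a control bit.
            candidates = shared
            if shared:
                forced.append(dp.name)
        if not candidates:
            raise RuntimeError("IO port allocation failed; fall back to MMIO")
        target = candidates[0]
        target.users.append(dp.name)
        assignment[dp.name] = target.name
    return assignment, forced
```

With two 8-bit processor ports, two guarded 8-bit device ports, and two unguarded 4-bit device ports, this sketch shares one processor port between the guarded devices, dedicates the other to the first unguarded device, and forces the second unguarded device onto the shared port.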

13
Port Splitting
  • Algorithm assumes device port smaller than
    processor port
  • Split guarded ports
  • Un-guarded not able to split

14
MMIO
  • Used if IO port allocation FAILS
  • Requires glue logic
  • Algorithm
  • Assume all devices can share
  • Use forced sharing if needed
  • Assign bits on data and address buses
  • Allocate address bits for device selection
  • One-hot: a single address bit per device
  • Binary: n address bits for 2^n devices
  • Huffman encoding: variable-length addresses
  • Address fields
  • IO prefix used to specify an IO access vs.
    memory
  • Device select used for guard control
  • Device control for non-guard devices
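
The three device-select encodings listed above can be illustrated with a short sketch. The function names are hypothetical, and weighting the Huffman code by how often each device is accessed is an assumption; the slide does not say what the variable-length codes are optimized for.

```python
import heapq
from itertools import count


def one_hot(devices):
    """One address bit per device; exactly one bit is set for each."""
    n = len(devices)
    return {d: format(1 << i, f"0{n}b") for i, d in enumerate(devices)}


def binary(devices):
    """Fixed-length code: n address bits select among up to 2**n devices."""
    bits = max(1, (len(devices) - 1).bit_length())
    return {d: format(i, f"0{bits}b") for i, d in enumerate(devices)}


def huffman(weights):
    """Variable-length prefix code; heavier devices get shorter addresses."""
    tie = count()  # tie-breaker so the heap never compares the dicts
    heap = [(w, next(tie), {d: ""}) for d, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w0, _, c0 = heapq.heappop(heap)
        w1, _, c1 = heapq.heappop(heap)
        merged = {d: "0" + c for d, c in c0.items()}
        merged.update({d: "1" + c for d, c in c1.items()})
        heapq.heappush(heap, (w0 + w1, next(tie), merged))
    return heap[0][2]
```

One-hot needs no decoder but uses the most address bits, binary uses the fewest bits at the cost of a decoder, and the Huffman variant trades between the two.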

15
MMIO updated SEQ example
  • SEQ updated with MMIO access code.

16
MMIO Example

17
IO Sequencer
  • Created from CFGs marked for hw
  • Communication on processor behalf
  • Generator uses CFGsw and CFGhw
  • Outputs
  • HW description of sequencer
  • SW routines in assembly
  • Minimizes pins and hardware

18
IO Sequencer Generation
  • Protocol synthesis
  • One device SEQ is hw, then all hw
  • Limits pins with bandwidth calculation
  • W = port width, Pe = minimum time to pass,
    Se = data size. Make sure W × Pe > Se
  • FSM generation
  • CFGhw is translated into FSM
  • Connections made from FSM to device
  • CFGsw is updated to talk to FSM
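
As a small worked example of the bandwidth check, the sketch below finds the narrowest port that satisfies W × Pe > Se. It assumes Pe is counted in transfer cycles and Se in bits, which the slide does not state; `min_port_width` is a hypothetical helper name.

```python
def min_port_width(pe_cycles, se_bits):
    """Smallest integer W such that W * pe_cycles > se_bits."""
    return se_bits // pe_cycles + 1


def width_for_all(events):
    """events: iterable of (pe_cycles, se_bits); the port must satisfy every one."""
    return max(min_port_width(pe, se) for pe, se in events)
```

For example, 16 bits that must pass within 4 cycles need at least 5 pins under the strict inequality, while 15 bits in the same window need only 4.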

19
IO Sequencer Template

19
Summary for 1st paper
  • These techniques can produce
  • Glue logic to interconnect processor and devices
  • Device drivers
  • Meet the need to assist with design space
    exploration
  • Allows design to try hw/sw partitions without
    designing the interface
  • Issues
  • Manual marking of CFG for hardware
  • Requires extended device/processor libraries
  • Great idea that the device access routines are
    ISA independent

21
2nd Paper
  • A Methodology for Correct-by-Construction Latency
    Insensitive Design, Luca P. Carloni et al.
  • (Presented by Ra Roath)

22
Overview
  • Latency-Insensitivity Protocol
  • Implementation of the protocol
  • Channels
  • Relay Stations
  • Module shells

23
Introduction
  • Advent of Deep Sub-Micron process tech.
  • Generated concerns/predictions of inevitable
    dominance of wire delay.
  • Unanimity that long wire will play significant
    role in logic synthesis optimization.
  • How to rectify this? Interconnect optimization
    techniques.
  • Interconnect Topology Optimization
  • Optimal Buffer Insertion
  • Optimal Wire Sizing
  • When Delay(wire) > Delay(gates)?

24
Paper's purpose
  • Implement a latency insensitive communication
    protocol
  • Given a synchronous design composed of
    communicating modules → a synchronous design that
    tolerates arbitrary communication latency.
  • No need to think of a digital system in a
    completely different way (e.g. asynchronous
    design).

25
The Methodology
  • Given: complete synchronous specification of
    system and collection of modules
  • Communication channels with relay stations
  • Encapsulate each module with a shell
  • Layout obtained by standard Place & Route tools
  • Post-Layout Optimization. Necessary number of
    Relay Stations inserted into each critical
    channel.

26
Latency Insensitive vs. Asynchronous
  • Delay insensitive circuit operates correctly
    regardless of delays on gates and wires
  • Arbitrary delay is a multiple of the clock
    period
  • A specified synchronous system
  • Not asynchronous hand-shaking
  • Asynchronous systems require the designer to think
    about digital systems completely differently

27
Latency Insensitive Protocols
  • A protocol that governs the exchange of
    information in a patient system
  • Patient system
  • A synchronous system whose function depends on
    the order of events, not on their timing
  • On to implementation →

28
Channels
  • Channels are point-to-point unidirectional links
  • Source/Sink Modules
  • Packet Fields
  • Payload
  • Void
  • True Packets

29
Channel Example

30
Channels cont.
  • Data transmitted by packets.
  • Source
  • Puts a true packet (void = 0) or a void packet
    (void = 1) on the channel.
  • Sink
  • Decides to store/discard (based on void)
  • If stalling, sends a stop flag
  • Stop flag tells source that packet cannot be
    received.
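
The source/sink exchange above can be modelled in a few lines. This is a simplified sketch, not the paper's implementation: `Packet`, `Sink`, and `source_step` are hypothetical names, and the source is assumed to see the stop flag in the same cycle, which ignores the wire latency the protocol is designed to tolerate.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Packet:
    payload: Any = None
    void: bool = True  # void = 1: the packet carries no valid data


class Sink:
    def __init__(self, capacity):
        self.capacity = capacity
        self.stored = []

    def stop(self):
        """Stop flag: asserted while the sink cannot receive another packet."""
        return len(self.stored) >= self.capacity

    def receive(self, packet):
        """Store a true packet, discard a void one; returns True if stored."""
        if packet.void or self.stop():
            return False
        self.stored.append(packet.payload)
        return True


def source_step(queue, stop):
    """Emit a true packet when allowed and available, otherwise a void packet."""
    if stop or not queue:
        return Packet()
    return Packet(queue.pop(0), void=False)
```

Driving a sink of capacity 2 from a queue of three values stores the first two; from the third cycle on the stop flag is raised and the source sends void packets while keeping the third value queued.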

31
Relay Stations
  • Packets
  • Payload
  • Void
  • StopOut
  • StopIn
  • Latches

32
Relay Stations cont.
  • At each clock cycle t
  • Takes as input, packetIn, stopIn
  • Outputs packetOut, stopOut
  • Decides whether packetOut = packetIn
    (stalling = 0)
  • Or if packetOut = packetOut_prev (stalling = 1)
  • Internal storage capacity: 2 packets
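
A cycle-level sketch of a relay station follows. It is one possible model rather than the paper's circuit: the station is treated as a two-entry FIFO, a void packet is represented by `None`, and stopOut is derived only from the stored state so that the stop signal never propagates combinationally through the station.

```python
from collections import deque


class RelayStation:
    def __init__(self):
        self.buf = deque()  # holds at most 2 packets

    @property
    def packet_out(self):
        """Packet driven downstream; None stands for a void packet."""
        return self.buf[0] if self.buf else None

    @property
    def stop_out(self):
        """Asserted when both storage slots are occupied."""
        return len(self.buf) == 2

    def step(self, packet_in, stop_in):
        """Advance one clock cycle; returns (packet_out, stop_out) afterwards."""
        upstream_was_stopped = self.stop_out  # what the source saw this cycle
        if self.buf and not stop_in:
            self.buf.popleft()  # downstream accepted the head packet
        if not upstream_was_stopped and packet_in is not None:
            self.buf.append(packet_in)  # second slot absorbs the in-flight packet
        return self.packet_out, self.stop_out
```

While the station is not stalled the output follows the input one cycle later; when stopIn is asserted the previous output is held, and the second slot catches the packet that was already in flight before stopOut could reach the source.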

33
Shells
  • A shell is a wrapper that encapsulates module M
  • Interfaces with channels so that M becomes a
    patient process
  • To do so, make M stallable
  • Guarantee input synchronization: internal
    computation fired only if all inputs have arrived
  • Output propagation: send true packets

34
Shells and Modules
  • Cx: channels
  • Mx: modules
  • Sx: shells
  • → refer to diagram

35
Shells cont.
  • Shells
  • Get incoming packets from input channels, filters
    void packets
  • After all input values are received, passes to M
    and fires computation
  • Gets results of M
  • If no stop flag is received, sends result
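
The shell behaviour listed above can be sketched as a wrapper around a plain function standing in for module M. This is an assumption-laden illustration: `Shell` is a hypothetical class, void packets are represented by `None`, and unbounded per-channel queues replace the back-pressure a real shell would apply to its input channels.

```python
from collections import deque


class Shell:
    def __init__(self, module, n_inputs):
        self.module = module  # the wrapped module M, modelled as a function
        self.queues = [deque() for _ in range(n_inputs)]

    def step(self, packets_in, stop_in=False):
        """One clock cycle; returns M's result or None (a void packet)."""
        for queue, packet in zip(self.queues, packets_in):
            if packet is not None:  # filter out void packets
                queue.append(packet)
        # Stall M if downstream is stopped or any input is still missing.
        if stop_in or any(not q for q in self.queues):
            return None
        args = [q.popleft() for q in self.queues]
        return self.module(*args)  # fire the computation and send the result
```

Wrapping a two-input adder this way, the result appears only in the cycle when both operands have arrived and no stop flag is asserted, regardless of how many void packets arrived in between.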

36
Back to Wire Segmentation
  • Why is this related to wire interconnects?
  • Every wire with latency greater than Clk period
    can be segmented
  • Use relay stations to buffer wire
  • Pipelining a wire
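
A rough count of the relay stations needed for one wire can be computed as below. The formula is an assumption: it splits the wire into segments that each fit in one clock period and ignores the delay of the relay stations themselves; `relay_stations_needed` is a hypothetical helper.

```python
def relay_stations_needed(wire_delay_ps, clock_period_ps):
    """Relay stations required so that every wire segment fits in one clock period."""
    segments = -(-wire_delay_ps // clock_period_ps)  # ceiling division
    return max(segments - 1, 0)
```

A wire with 2500 ps of delay under a 1000 ps clock is cut into three segments, so this estimate inserts two relay stations; a wire that already meets the clock period needs none.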

37
Procedure
  • Start with collection of synchronous modules
  • Synthesize layout
  • Segment every wire with latency greater than
    Clock period, and add relay stations
  • Build shell around each module to obtain patient
    processes
  • Patient processes interact with relay stations

38
Conclusions of 2nd paper
  • Interconnect Delay
  • Alleviated by segmenting interconnect wire
  • Add Relay stations to segmented wire
  • Add shells to modules to interact with Relay
    Stations and other shelled modules
  • Questions?

39
The End.