An Interconnect-Centric Approach to Cyclic Shifter Design - PowerPoint PPT Presentation

Slides: 34
Provided by: hai64
Learn more at: https://cseweb.ucsd.edu

Transcript and Presenter's Notes



1
An Interconnect-Centric Approach to Cyclic Shifter Design
David M. Harris, Harvey Mudd College
Haikun Zhu, Yi Zhu, C.-K. Cheng, University of California, San Diego
2
Outline
  • Motivation
  • Previous Work
  • Approaches
  • Fanout-Splitting
  • Cell order optimization by ILP
  • Conclusions

3
Motivation
  • Interconnect, not gates, dominates in present process
    technology
  • In delay, power, reliability, process variation,
    etc.
  • Conventional datapath design focuses on minimizing
    logic depth

Source: ITRS Roadmap 2005
4
Technology Trends
  • Device (ITRS roadmap 2005, Table 40a)
  • Updated Berkeley Predictive Interconnect Model

5
Shifter Taxonomy
  • Functionality
  • Logical Shift: vacated MSBs filled with 0s
  • Arithmetic Shift: vacated MSBs filled with the original MSB (sign extension)
  • Cyclic Shift (rotation)
  • Bidirectional Shift
  • Circuit Topology
  • Barrel Shifter
  • Logarithmic Shifter
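The three shift functions in the taxonomy can be illustrated on an n-bit word. The following is a minimal Python sketch, assuming right shifts by 0 <= k < n with bit 0 as the LSB; the function names are hypothetical.

```python
def logical_right(x: int, k: int, n: int) -> int:
    """Logical right shift: the k vacated MSBs are filled with 0s."""
    mask = (1 << n) - 1
    return (x & mask) >> k


def arithmetic_right(x: int, k: int, n: int) -> int:
    """Arithmetic right shift: the k vacated MSBs copy the original MSB."""
    mask = (1 << n) - 1
    x &= mask
    shifted = x >> k
    if (x >> (n - 1)) & 1:
        # Set the top k bits (sign extension).
        shifted |= mask & ~(mask >> k)
    return shifted


def rotate_right(x: int, k: int, n: int) -> int:
    """Cyclic right shift: bits leaving the LSB end re-enter at the MSB end."""
    mask = (1 << n) - 1
    x &= mask
    k %= n
    return ((x >> k) | (x << (n - k))) & mask


x = 0b10110010
print(format(logical_right(x, 3, 8), "08b"))     # 00010110
print(format(arithmetic_right(x, 3, 8), "08b"))  # 11110110
print(format(rotate_right(x, 3, 8), "08b"))      # 01010110
```

Only the rotation preserves every input bit, which is why a cyclic shifter needs the wrap-around connections that the later slides target.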

6
Barrel Shifter
[Figure: schematic and layout]
  • Pros
  • Every data signal passes through only one transmission gate
  • Cons
  • Input capacitance is $O(N)$
  • $O(N^2)$ transistors
  • Requires an additional decoder for the control signals
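A behavioral sketch of an N-bit cyclic barrel shifter makes the pros and cons concrete: the decoder turns the shift amount into one-hot select lines, each output is reached through exactly one conducting switch, and the array needs N x N switches, with every input loading N of them. This assumes a transmission-gate array that rotates right with bit 0 as the LSB; all names are illustrative.

```python
def decode_one_hot(k: int, n: int) -> list[int]:
    """The extra decoder: binary shift amount k -> one-hot select lines."""
    return [1 if i == k % n else 0 for i in range(n)]


def barrel_rotate_right(bits: list[int], k: int) -> list[int]:
    """Behavioral model of an N x N switch array (bits[0] is the LSB).

    Switch (j, i) connects input (j + i) mod N to output j when select line i
    is high. Exactly one select line is high, so every data signal passes
    through a single switch on its way to the output.
    """
    n = len(bits)
    sel = decode_one_hot(k, n)
    return [max(sel[i] & bits[(j + i) % n] for i in range(n)) for j in range(n)]


def switch_count(n: int) -> int:
    """One switch per (output, select line) pair: N^2 in total."""
    return n * n


def switches_loading_input(m: int, n: int) -> int:
    """Number of switches connected to input m (its capacitive load)."""
    return sum(1 for j in range(n) for i in range(n) if (j + i) % n == m)


print(barrel_rotate_right([1, 0, 0, 0, 0, 0, 0, 0], 1))  # [0, 0, 0, 0, 0, 0, 0, 1]
print(switch_count(8), switches_loading_input(0, 8))      # 64 8
```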

7
Logarithmic Shifter
[Figure: layout and schematic]
  • Pros
  • $O(N \log N)$ transistors
  • Cons
  • Long inter-stage wires, especially for cyclic
    shifters

Target of Optimization
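To see why the inter-stage wires of a cyclic log shifter are long, consider the straightforward layout in which every stage keeps its cells in natural bit order. Stage s rotates by 2^s, so most shifted wires span 2^s columns, but the wrap-around wires span N - 2^s columns. The sketch below assumes that identity ordering and measures lengths in cell widths; it is an illustration, not the authors' tool.

```python
import math


def stage_spans(n: int, s: int) -> list[int]:
    """Column spans of the shifted wires in stage s of an n-bit right rotator.

    Column j of stage s receives the bit from column (j + 2**s) mod n of the
    previous stage; the straight (unshifted) wire spans 0 columns.
    """
    k = 1 << s
    return [abs((j + k) % n - j) for j in range(n)]


def log_rotator_wires(n: int) -> list[tuple[int, int, int]]:
    """Per stage: (shift amount, longest wire, total wire length)."""
    rows = []
    for s in range(int(math.log2(n))):
        spans = stage_spans(n, s)
        rows.append((1 << s, max(spans), sum(spans)))
    return rows


for shift, longest, total in log_rotator_wires(8):
    print(f"shift {shift}: longest {longest}, total {total}")
# shift 1: longest 7, total 14
# shift 2: longest 6, total 24
# shift 4: longest 4, total 32
```

In a non-cyclic shifter the wrap-around wires are absent, so the longest wire in stage s would span only 2^s columns; in the rotator, even the 1-bit stage has a wire spanning N - 1 columns.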
8
Cyclic Shifter -- Applications
  • Finite Field Arithmetic
  • In a normal basis, squaring is done by cyclic
    shifting.
  • Encryption
  • ShiftRows operation in the Rijndael algorithm.
  • DCT processing unit
  • Address generator
  • Bidirectional shifting
  • Can be implemented as a cyclic shifter with
    additional masking logic
  • CORDIC algorithm
  • etc
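Two of the applications above reduce directly to rotation. The sketch assumes n-bit words with bit 0 as the LSB and a single right rotator; `shift_via_rotator` is a hypothetical helper showing how masking turns rotation into a logical shift in either direction, and `shift_rows` follows the Rijndael ShiftRows step for a 4 x 4 byte state (the 128-bit block used by AES), where row r is rotated left by r bytes.

```python
def rotate_right(x: int, k: int, n: int) -> int:
    """Cyclic right rotation of an n-bit word."""
    mask = (1 << n) - 1
    x &= mask
    k %= n
    return ((x >> k) | (x << (n - k))) & mask


def shift_via_rotator(x: int, k: int, n: int, left: bool) -> int:
    """Logical shift (0 <= k < n) built from one right rotator plus a mask."""
    mask = (1 << n) - 1
    if left:
        # Left shift by k == right rotation by n - k, then clear the k LSBs
        # that wrapped around from the top.
        return rotate_right(x, (n - k) % n, n) & ((mask << k) & mask)
    # Right shift by k == right rotation by k, then clear the k MSBs
    # that wrapped around from the bottom.
    return rotate_right(x, k, n) & (mask >> k)


def shift_rows(state: list[list[int]]) -> list[list[int]]:
    """Rijndael ShiftRows for a 4x4 state: row r is rotated left by r bytes."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


print(format(shift_via_rotator(0b10110010, 3, 8, left=True), "08b"))   # 10010000
print(format(shift_via_rotator(0b10110010, 3, 8, left=False), "08b"))  # 00010110
```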

9
Previous Work
  • Bit interleave
  • Two-dimensional folding strategy
  • Gate duplication
  • Ternary shifting
  • Comparison between barrel shifter and log shifter

10
Cyclic Shifter -- Traditional Design
  • MUX-based
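The traditional MUX-based design can be modeled stage by stage. In this sketch (assuming N is a power of two, bit 0 is the LSB, and the shifter rotates right), stage s is a row of 2:1 MUXes controlled by bit s of the shift amount; note that each data wire fans out to two MUXes, one in its own column and one 2^s columns away. Names are hypothetical.

```python
def mux(a: int, b: int, s: int) -> int:
    """2:1 MUX: a when s == 0, b when s == 1."""
    return b if s else a


def mux_log_rotate_right(bits: list[int], k: int) -> list[int]:
    """MUX-based logarithmic right rotator (len(bits) must be a power of two)."""
    n = len(bits)
    data = list(bits)
    for s in range(n.bit_length() - 1):      # log2(n) stages
        step = 1 << s
        sel = (k >> s) & 1                   # stage s rotates by 2**s if bit s is set
        # Wire data[m] feeds the MUX in column m and the MUX in column (m - step) mod n.
        data = [mux(data[j], data[(j + step) % n], sel) for j in range(n)]
    return data


print(mux_log_rotate_right([1, 0, 0, 0, 0, 0, 0, 0], 5))  # [0, 0, 0, 1, 0, 0, 0, 0]
```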

11
Fanout Splitting Shifter
  • Use DEMUXes instead of MUXes
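The fanout-splitting idea can be sketched behaviorally. Assuming a power-of-two N, bit 0 as the LSB, and a right rotation, each data wire now drives a single 1:2 DEMUX; the selected output carries the signal and the other is held quiet at 0, and each receiving column simply ORs its two incoming wires. The names are hypothetical, and the exhaustive check is only a functional sanity test.

```python
def demux(x: int, s: int) -> tuple[int, int]:
    """1:2 DEMUX: (x, 0) when s == 0, (0, x) when s == 1; the unused output is quiet."""
    return x & (1 - s), x & s


def demux_log_rotate_right(bits: list[int], k: int) -> list[int]:
    """Fanout-splitting (DEMUX-based) logarithmic right rotator."""
    n = len(bits)
    data = list(bits)
    for s in range(n.bit_length() - 1):
        step = 1 << s
        sel = (k >> s) & 1
        routed = [demux(x, sel) for x in data]          # one DEMUX per data wire
        # Column j ORs the "stay" output of column j with the "move" output of
        # column (j + step) mod n; at most one of the two is active.
        data = [routed[j][0] | routed[(j + step) % n][1] for j in range(n)]
    return data


def reference_rotate_right(bits: list[int], k: int) -> list[int]:
    n = len(bits)
    return [bits[(j + k) % n] for j in range(n)]


def to_bits(v: int, n: int) -> list[int]:
    return [(v >> i) & 1 for i in range(n)]


# Exhaustive functional check for an 8-bit rotator.
ok = all(
    demux_log_rotate_right(to_bits(v, 8), k) == reference_rotate_right(to_bits(v, 8), k)
    for v in range(256)
    for k in range(8)
)
print(ok)  # True
```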

12
Example
Right rotate by 5 bits
Red lines are signal lines
Green lines are quiet lines
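The example can be traced in a few lines. This sketch assumes an 8-bit rotator (the width is not stated in the transcript): a right rotation by 5 = 0b101 takes the shifted output in the 1-bit and 4-bit stages and the straight output in the 2-bit stage, and the helper (a hypothetical name) reports the column a given input bit occupies after each stage.

```python
def trace_right_rotate(m: int, k: int, n: int) -> list[tuple[int, str, int]]:
    """Follow input bit m through a log rotator: (stage shift, output used, column)."""
    col = m
    path = []
    for s in range(n.bit_length() - 1):
        step = 1 << s
        if (k >> s) & 1:
            col = (col - step) % n        # signal takes the shifted line
            path.append((step, "shifted", col))
        else:
            path.append((step, "straight", col))
    return path


for stage in trace_right_rotate(7, 5, 8):
    print(stage)
# (1, 'shifted', 6)
# (2, 'straight', 6)
# (4, 'shifted', 2)
```

Whatever the shift amount, every DEMUX drives one signal line and one quiet line in each stage, so half of the inter-stage lines stay quiet.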
13
Dynamic Power Consumption
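  • Dynamic Power: $P_{dyn} = SP \cdot C \, V_{DD}^2 \, f$
  • Switching Probability: $SP = P(0) \cdot P(1)$, the probability of a
    0-to-1 transition for a temporally independent signal

MUX-based design: SP = 1/4
DEMUX-based design: SP = 3/16
SP: switching probability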
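The SP figures can be reproduced by enumerating truth tables. This sketch assumes that all inputs are independent, equiprobable bits redrawn every cycle, and takes SP as the probability of a 0-to-1 output transition, P(0) * P(1). The long wire in the MUX-based design carries a raw data bit, while in the DEMUX-based design it carries a DEMUX output, data AND select.

```python
from fractions import Fraction
from itertools import product


def switching_probability(f, n_inputs: int) -> Fraction:
    """P(0 -> 1 transition) of f's output under independent uniform random inputs."""
    ones = sum(f(*v) for v in product((0, 1), repeat=n_inputs))
    p1 = Fraction(ones, 2 ** n_inputs)     # probability the output is 1
    return (1 - p1) * p1                   # previous cycle 0, next cycle 1


sp_mux_wire = switching_probability(lambda x: x, 1)          # raw data bit
sp_demux_wire = switching_probability(lambda x, s: x & s, 2)  # DEMUX output
print(sp_mux_wire, sp_demux_wire)  # 1/4 3/16
```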
14
Gate Complexity
  • Re-factoring design
  • No extra complexity at gate level, both are

DEMUX-based
MUX-based
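One way to read the re-factoring argument: a 2:1 MUX in NAND-NAND form has two first-level NAND2 gates and one second-level NAND2 gate. A DEMUX-based design can keep exactly these gates but move the first-level pair from the receiving column to the sending column, so the long wire carries a gated, active-low signal instead of a raw data bit. The sketch assumes a NAND2-only implementation with a complemented select available; the function names are illustrative.

```python
from itertools import product


def nand(a: int, b: int) -> int:
    return 1 - (a & b)


def mux_column(a_local: int, b_remote: int, s: int) -> int:
    """MUX-based cell: all three NAND2 gates sit in the receiving column."""
    return nand(nand(a_local, 1 - s), nand(b_remote, s))


def demux_sender(x: int, s: int) -> tuple[int, int]:
    """DEMUX-based: the two first-level NAND2 gates move to the sending column.
    Outputs are active-low; the unselected one stays at 1 (quiet)."""
    return nand(x, 1 - s), nand(x, s)


def demux_receiver(stay_n: int, move_n: int) -> int:
    """The second-level NAND2 stays in the receiving column."""
    return nand(stay_n, move_n)


# Same three NAND2 gates per column, same function for every input combination.
same = all(
    demux_receiver(demux_sender(a, s)[0], demux_sender(b, s)[1]) == mux_column(a, b, s)
    for a, b, s in product((0, 1), repeat=3)
)
print(same)  # True
```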
15
Duality
[Figure: NAND-gate network and NOR-gate network]
  • Duality provides flexibility for low-level
    implementation
  • NAND gates are good for static CMOS.
  • NOR gates are good for dynamic circuits.
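The duality can be checked directly. Because a*s' + b*s = (a + s)(b + s'), the same 2:1 selection can be built as a NOR-NOR network instead of a NAND-NAND one. This sketch (function names hypothetical) verifies both forms against a behavioral MUX; in the NOR form the unselected first-level output sits at 0, whereas in the NAND form it sits at 1.

```python
from itertools import product


def nand(a: int, b: int) -> int:
    return 1 - (a & b)


def nor(a: int, b: int) -> int:
    return 1 - (a | b)


def mux_nand_nand(a: int, b: int, s: int) -> int:
    # Sum-of-products form: a*s' + b*s
    return nand(nand(a, 1 - s), nand(b, s))


def mux_nor_nor(a: int, b: int, s: int) -> int:
    # Dual product-of-sums form: (a + s) * (b + s')
    return nor(nor(a, s), nor(b, 1 - s))


for a, b, s in product((0, 1), repeat=3):
    expected = b if s else a
    assert mux_nand_nand(a, b, s) == expected
    assert mux_nor_nor(a, b, s) == expected
print("both networks implement the 2:1 selection")
```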

16
Cell Permutation
  • Datapaths usually assume a bit-slice structure
  • The cell order of the input/output stages must be
    fixed
  • However, the cells in the intermediate stages are
    free to permute.

[Figure: intermediate-stage cells marked "free"]
17
Problem Statement
  • Given
  • An N-bit rotator
  • Fixed linear order of the input/output stages
  • Find
  • An optimal permutation scheme of the intermediate
    stages such that the longest path is minimized
    (or such that the total wire length is minimized
    subject to a delay constraint).
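The problem can be made concrete with a small evaluator. In this sketch, `pos[l][c]` is the physical column of logical cell `c` on level `l`, level-l cell j receives wires from cells j and (j + 2^(l-1)) mod N of level l - 1, and wire length is the column distance. As a simplification, path delay is approximated by the sum of wire lengths along the path (worst case over all input-to-output paths). The 4-bit case has a single intermediate level and is small enough to search exhaustively, either for minimum delay or for minimum total wire length under a delay bound. The names and the delay proxy are assumptions, not the authors' model.

```python
from itertools import permutations


def evaluate(n: int, pos: list[list[int]]) -> tuple[int, int]:
    """Return (longest path length, total wire length) for a placement."""
    stages = n.bit_length() - 1
    total = 0
    arrival = [0] * n                    # longest path reaching each cell so far
    for l in range(1, stages + 1):
        step = 1 << (l - 1)
        nxt = []
        for j in range(n):
            src = (j + step) % n
            straight = abs(pos[l - 1][j] - pos[l][j])
            shifted = abs(pos[l - 1][src] - pos[l][j])
            total += straight + shifted
            nxt.append(max(arrival[j] + straight, arrival[src] + shifted))
        arrival = nxt
    return max(arrival), total


def search_4bit(t_max=None):
    """Exhaustively permute the single intermediate level of a 4-bit rotator.

    Without t_max: minimize the longest path (ties broken by total length).
    With t_max: minimize total length subject to longest path <= t_max.
    Returns (placement of level 1, longest path, total) or None if infeasible.
    """
    fixed = [0, 1, 2, 3]                 # input/output levels keep their order
    best_key, best = None, None
    for perm in permutations(range(4)):
        delay, total = evaluate(4, [fixed, list(perm), fixed])
        if t_max is not None and delay > t_max:
            continue
        key = (total, delay) if t_max is not None else (delay, total)
        if best_key is None or key < best_key:
            best_key, best = key, (list(perm), delay, total)
    return best


print(evaluate(4, [[0, 1, 2, 3]] * 3))  # (5, 14) for the natural order
print(search_4bit())
print(search_4bit(t_max=5))
```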

18
ILP Formulation
  • Introduce a set of binary decision variables
  • Each variable is 1 if and only if a given logic cell is at a
    given physical location on a given level
  • The solution space is fully defined by constraints on these variables
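One way to write these variables and constraints, in hypothetical notation (N cells per level, levels 0 to L, input and output levels assumed to be in natural order, and $[i = j]$ equal to 1 when $i = j$ and 0 otherwise):

$$
\begin{aligned}
& x_{i,j,l} \in \{0,1\}, \qquad x_{i,j,l} = 1 \iff \text{cell } i \text{ is at location } j \text{ on level } l \\
& \textstyle\sum_{j=0}^{N-1} x_{i,j,l} = 1 \quad \forall i, l \qquad \text{(each cell occupies one location)} \\
& \textstyle\sum_{i=0}^{N-1} x_{i,j,l} = 1 \quad \forall j, l \qquad \text{(each location holds one cell)} \\
& x_{i,j,0} = x_{i,j,L} = [\, i = j \,] \qquad \text{(fixed input/output order)} \\
& p_{i,l} = \textstyle\sum_{j=0}^{N-1} j \, x_{i,j,l} \qquad \text{(column of cell } i \text{ on level } l\text{)}
\end{aligned}
$$

The two sum constraints make each level a permutation, and $p_{i,l}$ is linear in the $x$ variables, which is what lets wire lengths enter the model linearly.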

19
ILP Formulation (cont)
  • Minimum delay formulation
  • Minimum power formulation

Which can be expanded into
objective
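The two objectives might be written as follows, in hypothetical notation: $p_{i,l}$ is the column of cell $i$ on level $l$, $d_e \ge 0$ is the length of wire $e$, $E$ is the set of inter-level wires, $L$ is the output level, and delay is assumed to grow linearly with wire length. Arrival-time variables $a_{j,l}$ avoid enumerating every input-to-output path.

$$
\begin{aligned}
\text{Minimum power:}\quad & \min \textstyle\sum_{e \in E} d_e \\
\text{Minimum delay:}\quad & \min T \\
\text{s.t.}\quad & a_{j,0} = 0, \qquad a_{j,l} \ge a_{m,l-1} + d_e \quad \forall\, e = (m, l-1) \to (j, l) \in E \\
& T \ge a_{j,L} \quad \forall j
\end{aligned}
$$

For the power-delay tradeoff, the power objective can be minimized with the added constraint $T \le T_{max}$.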
20
ILP Formulation (cont)
  • Represent the length of a single wire segment
  • Formulating the absolute-value operation

Pseudo-linear constraints are discarded because we are
trying to minimize the wire length
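The absolute-value step can be sketched as follows, for a wire $e$ from cell $m$ on level $l-1$ to cell $j$ on level $l$ (notation hypothetical, with $p$ denoting a cell's column):

$$
d_e = \lvert p_{m,l-1} - p_{j,l} \rvert
\quad\longrightarrow\quad
d_e \ge p_{m,l-1} - p_{j,l}, \qquad d_e \ge p_{j,l} - p_{m,l-1}
$$

The reverse inequality $d_e \le \lvert p_{m,l-1} - p_{j,l} \rvert$ is not linear. At an optimum of the total-length objective every $d_e$ settles at the larger of its two lower bounds, which is exactly the absolute value; in the min-delay form, wires off the critical path may keep slack, but replacing each $d_e$ by the true absolute value never increases $T$, so the optimal $T$ is unchanged.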
21
Complexity
  • Minimum total wire length formulation
  • The case of one level of free cells is a minimum-weight
    bipartite matching problem
  • For two or more levels of free cells, no polynomial-time
    optimal algorithm is known
  • The hardness of the minimum delay formulation has not
    been established

[Figure: bipartite graph between logic indices and physical locations]
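With one free level, the total wire length separates into a cost for placing each logic cell at each physical location, which is exactly an assignment (minimum-weight bipartite matching) problem. The sketch builds that cost matrix for a 4-bit rotator, where every wire touches the single free level, assuming natural order on the fixed levels and a right-rotator wiring (level-l cell j fed by cells j and (j + 2^(l-1)) mod N). It solves the matching with a subset dynamic program, which is exponential; a polynomial Hungarian-method solver such as SciPy's `linear_sum_assignment` would be the practical choice. Function names are hypothetical.

```python
def placement_cost(n: int, level: int, above: list[int], below: list[int]) -> list[list[int]]:
    """cost[i][j]: wire length attached to free cell i if it sits in column j.

    above[c] / below[c] are the fixed columns of cell c on level-1 / level+1.
    """
    step_in = 1 << (level - 1)   # wires from level-1 cells i and (i + step_in) % n
    step_out = 1 << level        # wires to level+1 cells i and (i - step_out) % n
    cost = []
    for i in range(n):
        nbrs = [above[i], above[(i + step_in) % n], below[i], below[(i - step_out) % n]]
        cost.append([sum(abs(j - c) for c in nbrs) for j in range(n)])
    return cost


def min_cost_assignment(cost: list[list[int]]) -> int:
    """Exact minimum-cost perfect matching via DP over subsets of columns."""
    n = len(cost)
    dp = {0: 0}                              # used-column mask -> best cost so far
    for i in range(n):                       # place logic cells in index order
        nxt = {}
        for mask, c in dp.items():
            for j in range(n):
                if not (mask >> j) & 1:
                    m2 = mask | (1 << j)
                    if c + cost[i][j] < nxt.get(m2, float("inf")):
                        nxt[m2] = c + cost[i][j]
        dp = nxt
    return dp[(1 << n) - 1]


ident = [0, 1, 2, 3]
cost = placement_cost(4, 1, ident, ident)
print(sum(cost[i][i] for i in range(4)))  # 14 for the natural order
print(min_cost_assignment(cost))          # optimal total wire length
```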
22
ILP Complexity
  • The ILP formulation does not scale well
  • Both the number of integer variables and the number of
    constraints grow quickly with the shifter width
  • CPLEX relies on branch and bound, whose runtime can grow
    exponentially
  • Sliding window scheme
  • Only cells in the window are allowed to permute
  • Consists of multiple passes; terminates when a pass
    brings no improvement

WW = 8 columns, WH = 3 rows, HS = 4 columns, VS = 1 column
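A simplified version of the sliding-window scheme can be sketched as a local search: a window of a few adjacent columns on one intermediate level is re-optimized exhaustively, the window slides across every intermediate level, and passes repeat until a full pass brings no improvement. This sketch assumes windows one level tall (whereas the window above is WW = 8 columns by WH = 3 rows), brute force inside the window instead of an ILP solve, and total wire length as the objective; all names are hypothetical.

```python
from itertools import permutations


def total_wire_length(n: int, pos: list[list[int]]) -> int:
    """Sum of column distances over all wires of an n-bit log right rotator."""
    total = 0
    for l in range(1, n.bit_length()):
        step = 1 << (l - 1)
        for j in range(n):
            total += abs(pos[l - 1][j] - pos[l][j])
            total += abs(pos[l - 1][(j + step) % n] - pos[l][j])
    return total


def sliding_window(n: int, width: int = 3, step: int = 1):
    """Return (placement, total wire length, passes) after local optimization."""
    levels = n.bit_length()                    # log2(n) + 1 levels of cells
    pos = [list(range(n)) for _ in range(levels)]
    best = total_wire_length(n, pos)
    passes, improved = 0, True
    while improved:
        improved, passes = False, passes + 1
        for l in range(1, levels - 1):         # input/output levels stay fixed
            for start in range(0, n - width + 1, step):
                cols = range(start, start + width)
                cells = [c for c in range(n) if pos[l][c] in cols]
                keep = [pos[l][c] for c in cells]
                for perm in permutations(cols):
                    for c, col in zip(cells, perm):
                        pos[l][c] = col
                    score = total_wire_length(n, pos)
                    if score < best:
                        best, keep, improved = score, list(perm), True
                for c, col in zip(cells, keep):  # commit the best arrangement
                    pos[l][c] = col
    return pos, best, passes


placement, length, passes = sliding_window(8)
print(length, passes)
```

Each accepted move strictly lowers the total, so the passes always terminate; the result is a local optimum, consistent with the slides describing the sliding-window results as suboptimal.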
23
Power Delay Evaluation
  • Overall Flow

24
Optimal solution for the 8-bit case
A globally optimal solution
25
16-bit and 32-bit cases
16-bit: globally optimal solution
32-bit: suboptimal solution found by the sliding window
method
26
Results
27
Power-Delay Tradeoff
  • Given a Tmax constraint, minimize Ttotal

[Plots: 8-bit, 16-bit, 32-bit, 64-bit]
The 8-bit and 16-bit results are global optima found by CPLEX.
The 32-bit and 64-bit results are suboptimal results from the
sliding window scheme.
28
Outline
  • Motivation
  • Previous Work
  • Approaches
  • Fanout-Splitting
  • Cell order optimization by ILP
  • Conclusions and future work

29
Conclusions and Future Work
  • We have proposed
  • Fanout-splitting design
  • ILP based layout optimization
  • Future directions
  • Extend the fanout splitting idea and ILP
    formulation to ternary shifter
  • Try alternative hierarchical approach to tackle
    the ILP complexity issue

30
The End
Thank you!
31
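Derivation of switching probability (independent, equiprobable inputs)
[Truth table; gate-level implementation of MUX and DEMUX]
For MUX: the wire carries a data bit, so $P(1) = 1/2$ and $SP = P(0)\,P(1) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$
For DEMUX: an output is 1 only when both data and select are 1, so $P(1) = 1/4$ and $SP = \frac{3}{4} \cdot \frac{1}{4} = \frac{3}{16}$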
32
Evaluating interconnect effect
  • Based on logical effort model
  • Technology independent
  • Easy to incorporate interconnect effect

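Gate delay: $d = g h + p$
  • $g$: logical effort, depends only on gate type
  • $h$: electrical effort, depends on load capacitance
  • $p$: parasitic delay, depends only on gate type
Wire load is integrated into $h$: $h = h_g + h_w \ell$
  • $h_g$: electrical effort of the gate load
  • $h_w$: electrical effort contributed by the wire per column
    spanned
  • $\ell$: wire length normalized to cell width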
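The logical-effort delay model can be written as a one-line function. This sketch assumes the wire load enters the electrical effort additively, h = h_g + h_w * l, with h_g the electrical effort of the gate load and l the wire span in cell widths. The NAND2 values g = 4/3 and p = 2 are the textbook logical-effort parameters, while h_g = 1 and h_w = 1 are purely illustrative. Because d is affine in l, path delay is linear in the wire lengths, which is what makes an ILP formulation possible.

```python
from fractions import Fraction


def stage_delay(g, p, h_gate, h_w, length):
    """Logical-effort stage delay d = g*h + p (in units of tau).

    The wire load is folded into the electrical effort: h = h_gate + h_w * length,
    where length is the wire span in cell widths.
    """
    return g * (h_gate + h_w * length) + p


g_nand2, p_nand2 = Fraction(4, 3), 2        # textbook NAND2 logical effort / parasitic
for span in (1, 4, 7):
    print(span, stage_delay(g_nand2, p_nand2, h_gate=1, h_w=1, length=span))
# 1 14/3
# 4 26/3
# 7 38/3
```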
33
Deciding $h_w$
  • Evaluate for a set of technology nodes
  • It is safe to assume hw1