Clustering of Large Designs for Channel-Width Constrained FPGAs

1
Clustering of Large Designs for Channel-Width
Constrained FPGAs
Marvin Tom, Guy Lemieux
University of British Columbia, Department of Electrical and Computer
Engineering, Vancouver, BC, Canada
2
Overview
  • Introduction, Goals and Motivation: reduce channel width, lower
    cost, make circuits routable
  • Reducing Channel Width By Depopulation
  • Large Benchmark Circuits
  • New Clustering Technique
  • Selective Depopulation
  • Conclusions and Future Work

3
Mesh-Based FPGA Architecture
  • Channel width: number of routing tracks per channel
  • Larger FPGA devices have more tiles
  • Channel width is fixed

4
Motivation: Area of FPGA Devices
MCNC Circuits Mapped onto an FPGA
Total layout area = SIZE × Number
5
Motivation: Channel Width Demand
MCNC Circuits Mapped onto an FPGA
Devices built for worst-case channel width (fixed
width)
Interconnect cost dominates (> 70%)
6
Goal: Reduce Channel Width
But apex4, elliptic, frisc, ex1010, spla, pdc
are unroutable. Can we make them routable in
a constrained FPGA?
7
Possible Solution
  • Trade-off logic utilization for channel width
  • User can always buy more logic. (not more wires)

Trade off CLB count for channel width
[Figure: FPGA 1 vs. FPGA 2]
But can we achieve lower total area? (Total area =
SIZE × CLB count)
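The question on this slide can be made concrete with a toy area model. The sketch below assumes total area = tile size × CLB count, where a tile is a fixed logic area plus routing area that grows linearly with channel width. The coefficients and both example devices are made-up illustrative numbers, not results from the talk.

```python
def tile_area(channel_width, logic_area=30.0, area_per_track=1.0):
    """Hypothetical tile size: fixed logic area plus routing area per track."""
    return logic_area + area_per_track * channel_width


def total_area(clb_count, channel_width):
    """Total area = tile SIZE x CLB count."""
    return clb_count * tile_area(channel_width)


# Illustrative only: FPGA 1 is fully packed, FPGA 2 is depopulated.
fpga1 = total_area(clb_count=1000, channel_width=100)  # fewer CLBs, wide channels
fpga2 = total_area(clb_count=1100, channel_width=80)   # more CLBs, narrow channels
print(fpga1, fpga2, fpga2 < fpga1)
```

Under these assumed coefficients, 10% more CLBs with 20% fewer tracks still gives a smaller device, because routing accounts for most of each tile.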
8
Logic Element: BLE and CLB
  • Basic Logic Element (BLE): k-input LUT + FF
  • Clustered Logic Block (CLB): N BLEs, N outputs, I shared inputs
  • Note: I < kN
[Figure: CLB containing BLE 1 to BLE 5, with I inputs and N outputs]
9
CLB Depopulation
  • Normally CLBs are fully packed
  • Reduces total number of CLBs needed for circuit
  • CLB depopulation (Tessier, DeHon)
  • Do not use all BLEs → increase CLBs used → decrease channel
    width → decrease overall area
  • Problem: increase in CLBs is high for large circuits
  • Our work limits the CLB increase
10
Uniform Depopulation
  • Previous work
  • Depopulate each CLB by equal amount
  • But circuits have regions of high routing demand and
    regions of low routing demand
  • Depopulating low-congestion areas → unnecessary
    increase in area

11
Non-Uniform Depopulation
  • Our depopulation method
  • Assume congestion is localized
  • Depopulate only congested areas
  • We show non-uniform depopulation is an effective method
    of channel width reduction
  • Graceful trade-off between channel width and area
  • Makes unroutable circuits routable

12
Depopulation Methods to Reduce Channel Width
13
CLB Depopulation
  • General approach
  • Use existing clustering tools
  • Do not fill CLB while clustering
  • Input-Limited: e.g., maximum 67% input utilization per CLB
    (might use all BLEs)
  • BLE-Limited: e.g., maximum 60% BLE utilization per CLB
    (might use all inputs)
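The two limits can be illustrated with a toy packer. This is a minimal sketch, not the clustering tool used in the talk: it fills CLBs greedily in input order and closes a CLB when either cap would be exceeded. The function names, the netlist format, and the rule that nets produced inside a CLB do not consume CLB input pins are all assumptions made for illustration.

```python
def external_inputs(cluster):
    """Count distinct input nets that are not produced inside the cluster."""
    outputs = {out for _, out in cluster}
    nets = set()
    for ins, _ in cluster:
        nets |= set(ins)
    return len(nets - outputs)


def pack(bles, max_bles, max_inputs):
    """Greedily pack (input_nets, output_net) BLEs into CLBs under two caps."""
    clusters, current = [], []
    for ble in bles:
        trial = current + [ble]
        too_full = len(trial) > max_bles or external_inputs(trial) > max_inputs
        if current and too_full:
            clusters.append(current)   # close the CLB and start a new one
            current = [ble]
        else:
            current = trial
    if current:
        clusters.append(current)
    return clusters


# Ten independent 2-input BLEs; hypothetical CLB with N = 5 BLEs, I = 10 inputs.
bles = [((f"a{j}", f"b{j}"), f"o{j}") for j in range(10)]
full = pack(bles, max_bles=5, max_inputs=10)
ble_limited = pack(bles, max_bles=3, max_inputs=10)    # 60% of 5 BLEs
input_limited = pack(bles, max_bles=5, max_inputs=6)   # 60% of 10 inputs
print(len(full), len(ble_limited), len(input_limited))
```

In this toy netlist both limits happen to yield the same CLB count. On real circuits the slide's point is that an input cap may still leave every BLE used, while a BLE cap may still leave every input used.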
14
Reducing Channel Width: Results (max cluster size
16)
  • Input-Limited: no channel width control
  • BLE-Limited: (almost) monotonically increasing → good
    channel width control

15
Benchmark Circuit Creation
  • (We want BIG circuits!)
  • (What do REALLY BIG circuits look like?)

16
Benchmarking Circuits: Some Observations
  • Altera has bigger benchmarks than academics
  • We noted similar characteristics
  • Some LARGE circuits routable with NARROW routing
    channels
  • Some SMALL circuits need WIDE routing channels
  • What if each circuit is an IP block in a larger
    system?

17
Benchmark Creation: IP Blocks
  • Mimic process of creating large designs
  • IP blocks ↔ MCNC circuits
  • SoC ↔ randomly integrate/stitch together IP
    blocks
  • IP blocks have varied interconnect needs
  • Real-life large designs: System-on-Chip
    methodology
  • IP blocks (own, 3rd party)
  • Re-use improves productivity
  • Primarily integration and verification effort

18
Benchmark Creation: Large Designs
  • Considered 3 stitching schemes
  • Independent: IP blocks are not connected to each other
  • Pipeline: outputs of one IP block connected to inputs of
    next IP block
  • Clique: outputs of each IP block are uniformly
    distributed to inputs of all other IP blocks
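The three stitching schemes can be sketched as connection generators over a list of IP blocks. This is an illustrative reading of the slide, not the authors' benchmark generator: the block description format, the round-robin distribution used for "uniformly distributed", and the choice to leave surplus pins unconnected are assumptions.

```python
def stitch(blocks, scheme):
    """Return (src_block, out_idx, dst_block, in_idx) connections.

    blocks: list of (name, num_inputs, num_outputs) tuples.
    """
    conns = []
    if scheme == "independent":
        return conns                      # no inter-block connections
    if scheme == "pipeline":
        for (src, _, n_out), (dst, n_in, _) in zip(blocks, blocks[1:]):
            for k in range(min(n_out, n_in)):
                conns.append((src, k, dst, k))
        return conns
    if scheme == "clique":
        # Each block input is driven at most once; track the free ones.
        free = {name: list(range(n_in)) for name, n_in, _ in blocks}
        for src, _, n_out in blocks:
            others = [name for name, _, _ in blocks if name != src]
            for k in range(n_out):
                if not others:
                    break
                dst = others[k % len(others)]   # round-robin over other blocks
                if free[dst]:
                    conns.append((src, k, dst, free[dst].pop(0)))
        return conns
    raise ValueError(f"unknown scheme: {scheme}")


blocks = [("a", 4, 2), ("b", 4, 2), ("c", 4, 2)]
for scheme in ("independent", "pipeline", "clique"):
    print(scheme, len(stitch(blocks, scheme)))
```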

19
MetaCircuit: Reducing Routed Channel Width?
  • Observations
  • IP blocks are tightly connected internally
  • IP blocks have varied channel width needs
  • Hypotheses
  • Placement keeps each IP block together
  • An IP block has large routed channel width →
    MetaCircuit has large routed channel width

20
Hypothesis Testing: MetaCircuit P&R Results
  • Use VPR FPGA tools from University of Toronto
  • Hypothesis 1
  • VPR placer successfully groups IP blocks from
    random initial placement
  • Hypothesis 2
  • VPR router confirms channel width of MetaCircuit
    is dominated by a few IP blocks: pdc, clma,
    ex1010

21
Consequences of Hypothesis 2
  • Question
  • Shrink channel width of a few IP blocks → shrink
    channel width of MetaCircuit?
  • How to shrink channel widths?
  • Selective CLB depopulation
  • Depopulate hard-to-route IP blocks the most
  • How much to depopulate?
  • Channel width profiling of IP block

22
Meeting Channel Width Constraints: Selective
Depopulation
  • Step 1: Channel Width Profiling of IP Blocks
    (Congestion Estimation)
  • Step 2: Re-cluster Only Congested IP Blocks
    (Selective Depopulation)

23
IP Block Properties
  • Cluster IP blocks with N = 16, k = 6
  • VPR determines the minimum channel width for each IP
    block
  • Sort IP blocks by channel width
[Figure: IP blocks sorted by channel width, labeled hard-to-route and easy-to-route circuits]
24
Channel Width Profiling of IP Block
  • Cluster sizes
  • NA = FPGA architecture cluster size (fixed)
  • NC = BLE-limit size (variable)
  • Sweep NC for each IP block
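The profiling step amounts to a two-level sweep. The sketch below assumes a caller-supplied function `min_channel_width(block, nc)` that clusters one IP block with BLE limit NC, places and routes it, and returns the minimum routable channel width. That function is hypothetical and stands in for the real clustering and place-and-route flow. The toy stand-in uses made-up numbers.

```python
def profile_blocks(blocks, min_channel_width, na=16):
    """Sweep the BLE limit NC = 1..NA for every IP block."""
    return {
        block: {nc: min_channel_width(block, nc) for nc in range(1, na + 1)}
        for block in blocks
    }


def toy_width(block, nc):
    """Made-up stand-in for the real flow: width grows with the BLE limit."""
    base = {"tseng": 20, "clma": 40}[block]
    return base + 2 * nc


profile = profile_blocks(["tseng", "clma"], toy_width)
print(profile["tseng"][16], profile["clma"][10], profile["clma"][11])
```

Each entry costs a full clustering, placement and routing run in practice, which is why this step is the time-consuming part of the process.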

25
Analysis with Constraint
  • Given channel-width constraint of 60 tracks
  • tseng routable (easy)
  • clma routable for NC <= 10
  • clma not routable for NC > 10

26
Our Technique: Selective Depopulation
  • Step 1: Channel Width Profiling of IP Blocks
    (Congestion Estimation)
  • Step 2: Re-cluster Only Congested IP Blocks
    (Selective Depopulation)

27
Uniform Depopulation
  • Minimum NC Cluster Size
  • De-populate all clusters equally
  • E.g., use NC = 10 for both IP blocks

28
Non-Uniform Depopulation
  • Maximal NC Cluster Size
  • Depopulate each IP block according to maximal
    cluster size
  • E.g., clma NC = 10, tseng NC = 16
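The difference between the two policies can be sketched given a per-block profile of channel width versus NC. Everything below is illustrative: the profile values and BLE counts are made up (chosen so that a 60-track constraint gives clma NC = 10 and tseng NC = 16, as in the example), and the CLB estimate assumes every cluster fills to its BLE limit.

```python
import math


def max_feasible_nc(widths_by_nc, constraint):
    """Largest BLE limit NC whose profiled channel width meets the constraint."""
    feasible = [nc for nc, width in widths_by_nc.items() if width <= constraint]
    return max(feasible) if feasible else None


def choose_nc(profile, constraint, uniform):
    """Pick NC per IP block: one shared value (uniform) or each block's own maximum."""
    best = {b: max_feasible_nc(w, constraint) for b, w in profile.items()}
    if any(nc is None for nc in best.values()):
        raise ValueError("an IP block cannot meet the constraint at any NC")
    if uniform:
        shared = min(best.values())
        return {b: shared for b in best}
    return best


def clb_estimate(ble_counts, nc_by_block):
    """Optimistic CLB count: assumes clusters are filled up to the BLE limit."""
    return sum(math.ceil(ble_counts[b] / nc_by_block[b]) for b in ble_counts)


# Hypothetical profile and BLE counts, for illustration only.
profile = {
    "tseng": {nc: 20 + 2 * nc for nc in range(1, 17)},
    "clma": {nc: 40 + 2 * nc for nc in range(1, 17)},
}
ble_counts = {"tseng": 1000, "clma": 8000}

uniform = choose_nc(profile, 60, uniform=True)
non_uniform = choose_nc(profile, 60, uniform=False)
print(uniform, clb_estimate(ble_counts, uniform))
print(non_uniform, clb_estimate(ble_counts, non_uniform))
```

Non-uniform depopulation never needs more CLBs than uniform under this estimate, since each block's NC is at least the shared minimum.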

29
Uniform vs. Non-Uniform
  • Non-Uniform depopulation better than Uniform
  • Lower CLB count
  • Higher LUT utilization

[Figures: LUT utilization and total CLBs needed (x 1,000) vs. channel width constraint, Uniform vs. Non-Uniform]
30
MetaCircuit Clustering Results
  • Depopulate the most-congested IP blocks
  • BLE-limit of each IP block shown (max = 16)
  • Some IP blocks are depopulated more than others

31
MetaCircuit P&R Results
  • Clique MetaCircuit
  • P&R channel width results closely match
    constraints
[Figures: routed channel width vs. channel width constraint; normalized area vs. channel width constraint]
  • Shrink channel width by 20% (from 95 to 75): NO
    AREA INCREASE
  • Shrink channel width by 50% (from 95 to 50): 1.7x
    area increase
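The quoted reductions are rounded. A quick check from the track counts on the slide (95 to 75 and 95 to 50), reading the 20 and 50 figures as percentages:

```python
def reduction_pct(before, after):
    """Percentage reduction in channel width."""
    return 100.0 * (before - after) / before


print(round(reduction_pct(95, 75), 1))  # 21.1
print(round(reduction_pct(95, 50), 1))  # 47.4
```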

32
Other MetaCircuit Results
These latest results are better than those
given in the paper
33
Critical Path Delay and Average Wirelength
  • Expect critical path delay to increase under
    tighter constraints
  • Delay noise due to instability of floorplan
    locations
  • Average wirelength / net increases under tighter
    constraints

34
Conclusion
  • System-level technique to map large
    System-on-Chip (SoC) designs to channel-width
    constrained FPGAs using fewer routing resources
  • Depopulating CLBs effective at reducing channel
    width
  • Non-uniform depopulation important to limit area
    inflation
  • Channel width reduced
  • by 0-20% with < 5% area increase
  • by up to 50% with 3.3x area increase
  • Effective solution to trade off CLBs for
    interconnect
  • UNROUTABLE circuits (channel width TOO LARGE) can
    be made ROUTABLE (reduced channel width) by
    buying an FPGA with MORE LOGIC!

35
End of Talk
36
Future Work
  • Real-Life SoC Benchmark
  • Licensed IP Bluetooth baseband processor
  • 325,000 ASIC gates
  • Numerous IP blocks of varying complexity
  • Needed to validate synthetic results
  • Automated technique to find hard IP blocks
  • Granularity is based on design hierarchy (?)
  • Replaces time-consuming Step 1 of process

37
Motivation: Reduce Cost
  • Observations
  • Interconnect dominates: > 70% of layout area
  • Fixed interconnect architecture
  • Designed for near-worst-case demand
  • Same interconnect architecture across entire
    family
  • E.g., Altera Cyclone: 80 tracks per channel for all
    devices
  • Choice for logic capacity (device selection)
  • No choice for interconnect capacity
  • Result
  • Overcapacity in interconnect
  • Interconnect dominates cost
  • User has no way to reduce dominant cost

38
Fixed Channel-Width Constraints
  • Real FPGA device: fixed channel width
  • Some hard-to-route (routing-intensive) circuits
    won't fit
  • Problem
  • Find way to make circuit fit
  • Our Approach
  • Divide circuit into large-sized chunks, e.g., IP
    blocks
  • Make hard-to-route IP blocks easy-to-route by
    CLB depopulation
  • This increases CLB usage
  • Leave easy ones alone → limit CLB increase

39
Overview of Clustering Approach
  • Two methods for choosing NC
  • Uniform Depopulation: use fixed NC < NA
  • Non-Uniform Depopulation: use best NC <= NA
  • As expected, Non-Uniform gives better results
  • Cluster each IP block separately
  • Compare with 2 clustering tools
  • T-VPACK vs. iRAC replica
  • Channel Width Prediction
  • Largest channel width of IP blocks < channel
    width of MetaCircuit