Congestion-Driven Re-Clustering for Low-cost FPGAs

1
Congestion-Driven Re-Clustering for Low-cost FPGAs
  • MASc Examination
  • Darius Chiu
  • Supervisor: Dr. Guy Lemieux
  • University of British Columbia
  • Department of Electrical and Computer Engineering
  • Vancouver, BC, Canada

2
Outline
  • Motivation and background
  • Algorithm
  • Results
  • Conclusion
  • Future Work

3
Example Unroutable
  • Situation: run a circuit through VPR
  • The circuit is unroutable at the specified target
    channel width

4
Example Unroutable
  • Situation: run a circuit through VPR
  • The circuit is unroutable at the specified target
    channel width
  • Only a localized area is actually unroutable
  • Routing congestion happens locally

5
Motivation
  • Goal: meet a hard channel-width constraint
  • The number of routing tracks is fixed at manufacture
    time
  • The channel-width constraint must be met everywhere on
    the FPGA
  • Presented with an unroutable circuit, either:
  • Increase available interconnect (use a larger
    device)
  • More interconnect everywhere means a more expensive
    FPGA device
  • Or decrease local interconnect demand
  • Create more aggregate interconnect for congested
    regions only

6
Reclustering Congested Regions
  • Find congested regions and reduce their routing demand
  • Increase CLB usage to spread interconnect usage
  • Controlled tradeoff between CLB and interconnect
    usage
  • Cost savings: the same FPGA can be used; the design
    only needs to be re-clustered

7
Un/DoPack CAD Flow
  • Previous work by Marvin Tom (ICCAD 2006)
  • Targets a channel-width constraint
  • Spread regional logic to reduce local routing
    demand
  • Identify congested local regions
  • Iteratively re-cluster, re-place, and re-route
  • Whitespace insertion: re-cluster with a reduced
    cluster size
  • Leave uncongested regions alone
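Whitespace insertion by re-clustering at a smaller cluster size makes the same logic occupy more CLBs. A minimal sketch of that arithmetic follows; it assumes, for illustration only, that packing is limited purely by cluster capacity (no input-pin or connectivity limits), and the function names are hypothetical:

```python
def clbs_after_repack(num_bles: int, cluster_size: int) -> int:
    """CLBs needed for num_bles logic elements (BLEs) at the given cluster
    size, assuming packing is limited only by cluster capacity."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return -(-num_bles // cluster_size)  # integer ceiling division


def whitespace_added(num_bles: int, old_size: int, new_size: int) -> int:
    """Extra CLBs (whitespace) consumed when a region is repacked at new_size."""
    return clbs_after_repack(num_bles, new_size) - clbs_after_repack(num_bles, old_size)


print(clbs_after_repack(160, 16), clbs_after_repack(160, 12),
      whitespace_added(160, 16, 12))  # 10 14 4
```

For example, a 160-BLE region packed at cluster size 16 (the benchmark cluster size) occupies 10 CLBs; repacked at size 12 it needs at least 14, so 4 CLBs of whitespace are inserted. Real packing with pin limits can only need more.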

8
Background: Un/DoPack
  • VPR: Cluster → Place → Route
  • Target channel width (CW) met? YES: stop; NO: run Un/DoPack
  • Un/DoPack: Identify → Re-cluster → Place → Route, then check
    the target CW again

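The Un/DoPack flow is an iterative loop. The sketch below restates it in Python under stated assumptions: `place_and_route`, `identify`, and `recluster` are hypothetical callables standing in for VPR placement and routing, congested-region identification, and re-clustering, and `max_iters` is an added safety limit that the slides do not mention:

```python
from typing import Any, Callable, List, Tuple


def undopack_flow(
    design: Any,
    target_cw: int,
    place_and_route: Callable[[Any], int],
    identify: Callable[[Any], List[Any]],
    recluster: Callable[[Any, List[Any]], Any],
    max_iters: int = 20,
) -> Tuple[Any, int, bool]:
    """Re-cluster, re-place and re-route until the routed channel width
    meets target_cw or max_iters passes have run.

    place_and_route returns the channel width the design needs;
    identify returns the congested regions; recluster returns a new
    design with whitespace inserted in those regions.
    """
    cw = place_and_route(design)  # initial place and route of the clustered design
    for _ in range(max_iters):
        if cw <= target_cw:
            break
        regions = identify(design)
        design = recluster(design, regions)
        cw = place_and_route(design)
    return design, cw, cw <= target_cw


# Toy stand-ins: the "design" is just a count of re-clustering passes, and
# each pass lowers the required channel width by 2 tracks.
design, cw, ok = undopack_flow(
    0,
    target_cw=40,
    place_and_route=lambda d: 50 - 2 * d,
    identify=lambda d: ["hot region"],
    recluster=lambda d, regions: d + 1,
)
print(design, cw, ok)  # 5 40 True
```

The toy lambdas only exercise the control flow; they model no real circuit.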
9
Contributions
  • Improve the runtime/area tradeoff of the Un/DoPack flow
  • Simultaneous area and runtime savings
  • Use congestion information to perform better
    re-clustering
  • New approach to selecting congested regions
  • Use of an interconnect-demand model to determine how
    much to spread logic
  • Findings
  • Up to 5x runtime speedup versus Baseline
  • Up to 25% area savings versus Baseline

10
Contributions
  • Recently accepted to FPT 2009
  • D. Chiu, G. Lemieux, S. Wilton, “Congestion-Driven
    Re-Clustering for Low-cost FPGAs,” FPT 2009

11
Previous Depopulation Schemes
  • Region selection: single region versus multiregion
  • Single region: select all CLBs in an area centered on
    the most congested CLB
  • Multiregion: select all CLBs in an area centered on the
    most congested CLB not already chosen
  • Whitespace insertion
  • Baseline: insert CLBs equal to 1 row and 1 column of
    the FPGA
  • Fine-grained: insert CLBs equal to 1 row and 1 column
    of the region
  • Excellent area tradeoff, but slow
  • Multiregion: insert CLBs in proportion to
    congestion
  • Good runtime performance
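The three whitespace-insertion schemes differ mainly in how many CLBs they add per step. The sketch below assumes that growing a w x h array by one row and one column adds (w+1)(h+1) - wh = w + h + 1 CLB sites, and treats the Multiregion rule as a simple proportional scale with a hypothetical constant `clbs_per_unit`; neither detail comes from the slides:

```python
import math


def grow_one_row_col(width: int, height: int) -> int:
    """CLB sites added by growing a width x height array by one row and one
    column. Counting convention is an assumption: (w+1)(h+1) - w*h = w + h + 1."""
    return (width + 1) * (height + 1) - width * height


def baseline_whitespace(fpga_w: int, fpga_h: int) -> int:
    # Baseline: one row and one column of the whole FPGA.
    return grow_one_row_col(fpga_w, fpga_h)


def fine_grained_whitespace(region_w: int, region_h: int) -> int:
    # Fine-grained: one row and one column of the selected region.
    return grow_one_row_col(region_w, region_h)


def multiregion_whitespace(congestion: float, clbs_per_unit: float) -> int:
    # Multiregion: CLBs proportional to the region's congestion;
    # clbs_per_unit is a hypothetical tuning constant.
    return math.ceil(clbs_per_unit * congestion)


print(baseline_whitespace(40, 40), fine_grained_whitespace(5, 5),
      multiregion_whitespace(1.5, 8))  # 81 11 12
```

Under these assumptions a 40 x 40 FPGA gets 81 new sites per Baseline step, whereas a 5 x 5 region grows by only 11 under the fine-grained scheme, one plausible reason the fine-grained scheme needs more iterations.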

12
Algorithm
  • Region selection: select regions more
    intelligently
  • Capture congested regions rather than just individual CLBs
  • Whitespace insertion: estimate an appropriate
    cluster size
  • Use an interconnect-demand model to predict the outcome
    of depopulation

13
Un/DoPack
  • VPR: Cluster → Place → Route
  • Target CW met? YES: stop; NO: run Un/DoPack
  • Un/DoPack: Identify (region selection) → Re-cluster
    (whitespace insertion) → Place → Route, then check the
    target CW again

14
Benchmark Circuits
  • Metacircuits designed to emulate large SoC
    circuits
  • Cluster size: 16
  • Built using the GNL benchmark generator
  • Large circuit composed of smaller subcircuits
    (SoC style)
  • Each subcircuit emulates the interconnect
    complexity (Rent parameter) of an individual MCNC
    circuit
  • The standard deviation of the Rent parameter is
    varied to create the benchmark suite
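Rent's rule, T = t·g^p, relates the number of external terminals T of a block of g cells to its Rent exponent p (t is the average number of terminals per cell). A minimal sketch of how varying the standard deviation of p spreads interconnect complexity across subcircuits follows; it is not GNL itself, and the mean exponent, t, subcircuit size, and seed are illustrative values:

```python
import random
from typing import List


def rent_terminals(t: float, g: int, p: float) -> float:
    """Rent's rule: a block of g cells has about T = t * g**p external terminals."""
    return t * g ** p


def sample_rent_exponents(n: int, mean: float, stdev: float, seed: int = 0) -> List[float]:
    """One Rent exponent per subcircuit, drawn from a normal distribution and
    clamped to [0, 1]; a larger stdev gives a more heterogeneous metacircuit."""
    rng = random.Random(seed)
    return [min(1.0, max(0.0, rng.gauss(mean, stdev))) for _ in range(n)]


# Hypothetical metacircuit: 8 subcircuits of 500 cells each, t = 4.
exponents = sample_rent_exponents(8, mean=0.6, stdev=0.04, seed=1)
terminals = [rent_terminals(4.0, 500, p) for p in exponents]
print([round(p, 3) for p in exponents])
print([round(T) for T in terminals])
```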

15
Region Selection
  • Find congested regions
  • Use post-routing congestion information
  • Center the region on the most congested CLB
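This first region-selection step can be sketched as below, assuming the post-routing congestion map is a 2-D grid with one value per CLB tile (here hypothetical track-usage counts) and that the initial region is a square of half-width `half`; all names are illustrative:

```python
from typing import List, Tuple

Grid = List[List[float]]
Region = Tuple[int, int, int, int]  # (r0, c0, r1, c1), inclusive bounds


def most_congested_clb(cong: Grid) -> Tuple[int, int]:
    """(row, col) of the CLB tile with the highest congestion value."""
    cells = ((r, c) for r in range(len(cong)) for c in range(len(cong[0])))
    return max(cells, key=lambda rc: cong[rc[0]][rc[1]])


def centered_region(cong: Grid, center: Tuple[int, int], half: int) -> Region:
    """Square of side 2*half + 1 centered on `center`, clipped to the array."""
    r, c = center
    return (max(0, r - half), max(0, c - half),
            min(len(cong) - 1, r + half), min(len(cong[0]) - 1, c + half))


def region_average(cong: Grid, reg: Region) -> float:
    """Mean congestion over the tiles of an inclusive rectangle."""
    r0, c0, r1, c1 = reg
    vals = [cong[r][c] for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]
    return sum(vals) / len(vals)


cong = [
    [1, 1, 1, 1],
    [1, 2, 3, 3],
    [1, 3, 9, 8],
    [1, 3, 8, 7],
]
hot = most_congested_clb(cong)       # (2, 2)
reg = centered_region(cong, hot, 1)  # (1, 1, 3, 3), clipped at the edge
print(hot, reg, region_average(cong, reg))
```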

16
Region Selection
  • Use congestion values to determine the direction in
    which to move the region
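One way to turn congestion values into a direction of motion is a greedy hill-climb: slide the fixed-size region one tile in whichever direction most increases its average congestion, and stop at a local maximum. This is an assumed interpretation, not necessarily the authors' rule, and the names are hypothetical:

```python
from typing import List, Tuple

Grid = List[List[float]]
Region = Tuple[int, int, int, int]  # (r0, c0, r1, c1), inclusive bounds


def region_average(cong: Grid, reg: Region) -> float:
    r0, c0, r1, c1 = reg
    vals = [cong[r][c] for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]
    return sum(vals) / len(vals)


def shift_toward_congestion(cong: Grid, reg: Region) -> Region:
    """Slide a fixed-size region one tile at a time in the direction that most
    increases its average congestion; stop when no move improves it."""
    rows, cols = len(cong), len(cong[0])
    while True:
        best, best_avg = reg, region_average(cong, reg)
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):  # up, down, left, right
            cand = (reg[0] + dr, reg[1] + dc, reg[2] + dr, reg[3] + dc)
            if cand[0] < 0 or cand[1] < 0 or cand[2] >= rows or cand[3] >= cols:
                continue  # the region may not leave the FPGA
            avg = region_average(cong, cand)
            if avg > best_avg:  # strict: ties keep the earlier choice
                best, best_avg = cand, avg
        if best == reg:
            return reg  # local maximum of average congestion
        reg = best


cong = [
    [1, 1, 1, 1],
    [1, 2, 3, 3],
    [1, 3, 9, 8],
    [1, 3, 8, 7],
]
print(shift_toward_congestion(cong, (0, 0, 1, 1)))  # (2, 2, 3, 3)
```

Because each move strictly raises the average and the number of positions is finite, the loop always terminates.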

17
Region Selection
  • Binary search
  • Find the region with the highest average congestion
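Finding the region with the highest average congestion requires many region-average queries. The sketch below substitutes a plain exhaustive scan over all positions of a fixed-size window (not the binary search on the slide) and uses a summed-area table so each query costs O(1); the grid and window size are illustrative:

```python
from typing import List, Optional, Tuple

Grid = List[List[float]]
Region = Tuple[int, int, int, int]  # (r0, c0, r1, c1), inclusive bounds


def summed_area(cong: Grid) -> Grid:
    """S[r][c] = sum of cong over rows < r and columns < c (zero-padded)."""
    rows, cols = len(cong), len(cong[0])
    S = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            S[r + 1][c + 1] = cong[r][c] + S[r][c + 1] + S[r + 1][c] - S[r][c]
    return S


def window_sum(S: Grid, reg: Region) -> float:
    """Sum of the original grid over an inclusive rectangle, in O(1)."""
    r0, c0, r1, c1 = reg
    return S[r1 + 1][c1 + 1] - S[r0][c1 + 1] - S[r1 + 1][c0] + S[r0][c0]


def hottest_window(cong: Grid, h: int, w: int) -> Tuple[Optional[Region], float]:
    """Scan every h x w window and return the one with the highest average
    congestion (the first one found wins on ties)."""
    S = summed_area(cong)
    best: Optional[Region] = None
    best_avg = float("-inf")
    for r0 in range(len(cong) - h + 1):
        for c0 in range(len(cong[0]) - w + 1):
            reg = (r0, c0, r0 + h - 1, c0 + w - 1)
            avg = window_sum(S, reg) / (h * w)
            if avg > best_avg:
                best, best_avg = reg, avg
    return best, best_avg


cong = [
    [1, 1, 1, 1],
    [1, 2, 3, 3],
    [1, 3, 9, 8],
    [1, 3, 8, 7],
]
print(hottest_window(cong, 2, 2))  # ((2, 2, 3, 3), 8.0)
```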

18
Region Selection
  • Mark the next region
  • Sort regions by average congestion and depopulate them
    in sorted order
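Sorting and marking can be sketched as a greedy pass: visit candidate regions from highest to lowest average congestion and mark a region only if it does not overlap one already marked. The non-overlap rule is an assumption, loosely following the Multiregion scheme's "not already chosen" condition, and the names are hypothetical:

```python
from typing import List, Tuple

Region = Tuple[int, int, int, int]  # (r0, c0, r1, c1), inclusive bounds


def overlaps(a: Region, b: Region) -> bool:
    """True if two inclusive rectangles share at least one CLB tile."""
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])


def depopulation_order(candidates: List[Tuple[Region, float]]) -> List[Region]:
    """candidates: (region, average congestion) pairs. Returns the regions to
    depopulate, most congested first, skipping any that overlap a marked one."""
    marked: List[Region] = []
    for reg, _avg in sorted(candidates, key=lambda item: item[1], reverse=True):
        if not any(overlaps(reg, m) for m in marked):
            marked.append(reg)
    return marked


candidates = [((0, 0, 1, 1), 1.25), ((2, 2, 3, 3), 8.0),
              ((1, 1, 2, 2), 4.25), ((0, 2, 1, 3), 2.0)]
print(depopulation_order(candidates))
# [(2, 2, 3, 3), (0, 2, 1, 3), (0, 0, 1, 1)]
```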

19
Budgeted Multiregion Un/DoPack (BMR)
  • Multiple-region approach
  • Grow the number of CLBs according to a budget at each
    iteration
  • Budget: the number of CLBs in 1 row and 1 column of the FPGA
  • Each region grows by the equivalent of 1 row and 1 column
    of the region
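The BMR budget can be sketched as follows, reusing an assumed counting convention of w + h + 1 sites for "one row and one column" and assuming regions are served in congestion order until the remaining budget cannot cover the next one; both conventions and all names are hypothetical:

```python
from typing import List, Tuple


def grow_one_row_col(width: int, height: int) -> int:
    # Assumed cost of growing a width x height block by one row and one column.
    return (width + 1) * (height + 1) - width * height  # = width + height + 1


def bmr_allocate(fpga_w: int, fpga_h: int,
                 regions: List[Tuple[int, int]]) -> Tuple[List[int], int]:
    """regions: (width, height) of each selected region, most congested first.
    Returns the extra CLBs granted to each served region this iteration and
    the unused part of the budget."""
    budget = grow_one_row_col(fpga_w, fpga_h)  # 1 row + 1 column of the FPGA
    extra: List[int] = []
    for w, h in regions:
        cost = grow_one_row_col(w, h)  # 1 row + 1 column of this region
        if cost > budget:
            break  # stop at the first region the remaining budget cannot cover
        extra.append(cost)
        budget -= cost
    return extra, budget


print(bmr_allocate(20, 20, [(5, 5), (4, 4), (6, 6), (3, 3), (4, 4)]))
# ([11, 9, 13, 7], 1)
```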

20
Adding Whitespace
  • Congestion-model driven
  • Use interconnect-demand information to estimate
    how much whitespace to add
  • Interconnect model
  • Estimate the post-clustering channel width for each
    region

21
Regional Interconnect
  • Adapt the demand model to regions of CLBs instead of
    the whole FPGA
  • Most wiring comes from inside the region
  • Depopulation cannot directly affect wiring that crosses
    the region
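Separating a region's wiring into internal and external parts can be sketched by classifying nets by pin location. This minimal version assumes each net is a list of (row, column) CLB pin positions and ignores nets that pass through the region without a pin there, even though those also consume its tracks; the names are hypothetical:

```python
from typing import Dict, List, Tuple

Pin = Tuple[int, int]               # (row, col) of the CLB a pin sits on
Region = Tuple[int, int, int, int]  # (r0, c0, r1, c1), inclusive bounds


def inside(pin: Pin, reg: Region) -> bool:
    r, c = pin
    return reg[0] <= r <= reg[2] and reg[1] <= c <= reg[3]


def split_nets(nets: Dict[str, List[Pin]], reg: Region) -> Tuple[List[str], List[str]]:
    """Internal nets have every pin inside the region; external nets have
    pins both inside and outside. Nets with no pin inside are not returned."""
    internal: List[str] = []
    external: List[str] = []
    for name, pins in nets.items():
        flags = [inside(p, reg) for p in pins]
        if flags and all(flags):
            internal.append(name)
        elif any(flags):
            external.append(name)
    return internal, external


nets = {"a": [(2, 2), (3, 3)], "b": [(2, 3), (0, 0)], "c": [(0, 0), (1, 1)]}
print(split_nets(nets, (2, 2, 3, 3)))  # (['a'], ['b'])
```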

22
Regional Interconnect
  • Assume external interconnect demand stays fixed
  • Solve for internal interconnect demand

region interconnect demand = internal demand + external demand
$$W_{\mathrm{region}} = W_{\mathrm{internal}} + W_{\mathrm{external}}$$

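Writing the region's demand as W_region = W_internal + W_external and holding W_external fixed gives the internal demand the region can afford at the target channel width. A small worked sketch, with hypothetical track counts and a hypothetical function name:

```python
from typing import Tuple


def internal_demand_target(w_region: float, w_external: float,
                           w_target: float) -> Tuple[float, float]:
    """Split a region's demand as W_region = W_internal + W_external and,
    holding W_external fixed, return (current W_internal, W_internal allowed
    at the target channel width)."""
    w_internal = w_region - w_external
    w_internal_target = w_target - w_external
    if w_internal_target <= 0:
        # External demand alone reaches the target, so depopulating the
        # region cannot meet it under this model.
        raise ValueError("external demand alone meets or exceeds the target")
    return w_internal, w_internal_target


now, allowed = internal_demand_target(w_region=60, w_external=20, w_target=50)
print(now, allowed, 1 - allowed / now)  # 40 30 0.25
```

Here a region needing 60 tracks, 20 of them external, must cut its internal demand from 40 to 30 tracks (25%) to meet a 50-track target.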
23
Interconnect-Demand Model
[Equation image: routing-demand model and its variable definitions]
W. Fang and J. Rose, “Modeling routing demand for
early-stage FPGA architecture development”

24
Interconnect-Demand Model
  • Use the congestion map to determine the equation
    constants
  • Calibrate the equation separately for each region
  • Solve for the λ that gives the desired channel width
  • Re-cluster the region with a lower cluster size until
    the λ target is met
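The calibrate-and-solve steps can be sketched as below. The Fang and Rose equation itself is not reproduced in this transcript, so the sketch takes a hypothetical increasing `shape(λ)` function, fits one per-region constant k to the measured channel width, and solves for λ by bisection (our choice of solver); `shrink_cluster_size` and its `target_met` check are likewise hypothetical stand-ins for re-clustering and re-evaluating the region:

```python
from typing import Callable


def calibrate_scale(shape: Callable[[float], float], lam_now: float,
                    w_measured: float) -> float:
    """Per-region constant k such that k * shape(lam_now) equals the channel
    width read from that region's congestion map."""
    return w_measured / shape(lam_now)


def solve_lambda(shape: Callable[[float], float], k: float, w_target: float,
                 lo: float = 0.0, hi: float = 1e6) -> float:
    """Bisection for the lambda where k * shape(lambda) == w_target.
    Assumes shape is increasing on [lo, hi] and the root lies inside it."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if k * shape(mid) < w_target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def shrink_cluster_size(target_met: Callable[[int], bool], start_size: int) -> int:
    """Lower the region's cluster size one step at a time until the
    caller-supplied check passes (or size 1 is reached)."""
    size = start_size
    while size > 1 and not target_met(size):
        size -= 1
    return size


def shape(lam: float) -> float:
    # Hypothetical placeholder for the model's dependence on lambda.
    return lam ** 1.5


k = calibrate_scale(shape, lam_now=16.0, w_measured=64.0)
lam_target = solve_lambda(shape, k, w_target=27.0)
size = shrink_cluster_size(lambda s: s <= 12, start_size=16)
print(round(k, 6), round(lam_target, 6), size)  # 1.0 9.0 12
```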

25
Congestion-Model Multiregion Un/DoPack (CMR)
  • Same region-selection method as BMR
  • No budget constraint on the number of new CLBs added in
    each iteration
  • Whitespace insertion driven by the interconnect-demand model

26
Results
  • Typical results
  • Benchmark: stdev004
  • CMR: speedup comparable to Multiregion Un/DoPack
  • BMR: slightly faster than Baseline

27
Results
  • Typical results
  • Benchmark: stdev004
  • BMR: better area than Multiregion
  • CMR: better area than Multiregion

28
Runtime / Area Tradeoff
  • Previous Multiregion Approach (Fast)
  • Previous Fine-Grained Approach (Good Area)
  • Speed-Area Tradeoff

29
Runtime / Area Tradeoff
  • BMR
  • Improved runtime
  • Good area performance

30
Runtime / Area Tradeoff
  • CMR
  • Better runtime
  • Good area

31
Critical Path
32
Congestion Driven Placement
  • We can further improve area performance using a
    congestion-driven placer

33
Conclusions
  • Use congestion information to perform better
    re-clustering
  • Up to 5x runtime speedup versus Baseline
  • Up to 25% area savings versus Baseline
  • Improve the runtime/area tradeoff of the Un/DoPack flow
  • Simultaneous area and runtime savings

34
Future Work
  • Consider the effect of neighboring regions
  • Other congestion-driven tools
  • Fast placement

35
Questions?
36
Outline
  • Motivation
  • Background
  • Multiregion approach
  • Congestion-Driven Whitespace insertion

37
Related Work
  • Un/DoPack [1]
  • Reduce interconnect usage to meet target channel
    width constraint
  • Congestion Driven Clustering
  • iRAC, ISPL
  • Single-Pass Clustering