Title: Placement for Hierarchical Interconnect based FPGA Devices
1Placement for Hierarchical Interconnect based
FPGA Devices
- presented by
- Kesava R. Talupuru
2Outline
- Introduction: FPGAs
- Current design flow in Quartus tool
- Problems with current flow
- Motivation
- Proposed Flow
- Expected Results
- Time Line
3Introduction
- FPGA (Field Programmable Gate Array)
- Two-dimensional array of logic blocks and flip-flops (FFs), with a means for the user to configure
  1. The interconnection between the logic blocks
  2. The function of each block
Simplified version of FPGA internal architecture
4Why FPGAs?
- By the early 1980s, most of the logic circuits in typical systems had been absorbed by a handful of standard large-scale integrated (LSI) circuits
  - Microprocessors, bus/IO controllers, system timers, ...
- Every system still needed random glue logic to help connect the large ICs
  - generating global control signals (for resets etc.)
Few LSI and lots of small low density SSI and MSI
5Why FPGAs?
- Custom ICs are sometimes designed to replace the large amount of glue logic
  - reduced system complexity and manufacturing cost, improved performance
  - but very expensive, and they delay the introduction of the product
- Therefore the custom IC approach was only viable for products with very high volume that were not time-to-market (TTM) sensitive
6Why FPGAs?
- FPGAs were introduced as an alternative to custom ICs for implementing glue logic
- FPGAs now also compete with microprocessors in dedicated and embedded applications
  - Performance advantage over microprocessors because circuits can be customized for the task at hand
  - Microprocessors must provide special functions in software (many cycles)
7Commercial FPGA(Xilinx)
8Commercial FPGA(Altera)
9Objective
- To generate a legal placement for hierarchical interconnect based FPGAs such that
  - Design performance is maximized
  - Routing congestion is minimized
10Current Flow - Quartus
HDL Design
Synthesis
Modify
Placement (mincut based recursive partitioning)
Routing
Programming bit generation
11Mincut Based Partitioning
- Generate design partitions such that
- Cut size between partitions is minimized
- Cluster size constraints are satisfied
- Timing driven partitioning minimizes the crossing
of critical nets
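The two partitioning goals above can be made concrete with a minimal sketch. It assumes a netlist reduced to weighted two-pin edges; the names `cut_size`, `sizes_ok`, and the 4-node example are hypothetical and are not taken from Quartus. A timing driven variant would simply assign larger weights to critical nets so that cutting them costs more.

```python
from typing import Dict, Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable, float]  # (node_a, node_b, weight)


def cut_size(edges: Iterable[Edge], part: Dict[Hashable, int]) -> float:
    """Sum the weights of edges whose endpoints lie in different partitions."""
    return sum(w for a, b, w in edges if part[a] != part[b])


def sizes_ok(part: Dict[Hashable, int], max_size: int) -> bool:
    """Cluster size constraint: no partition holds more than max_size nodes."""
    counts: Dict[int, int] = {}
    for p in part.values():
        counts[p] = counts.get(p, 0) + 1
    return all(c <= max_size for c in counts.values())


# Hypothetical 4-node netlist, for illustration only.
edges = [("a", "b", 3.0), ("b", "c", 1.0), ("c", "d", 3.0), ("a", "d", 1.0)]
low_cut = {"a": 0, "b": 0, "c": 1, "d": 1}    # cuts b-c and a-d
high_cut = {"a": 0, "b": 1, "c": 1, "d": 0}   # cuts a-b and c-d

print(cut_size(edges, low_cut), cut_size(edges, high_cut))
print(sizes_ok(low_cut, max_size=2))
```

Both partitions satisfy a size limit of two nodes, but the first one cuts only the two light edges, which is the one a mincut partitioner would prefer.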
12Problem with mincut based partitioning
- Minimizing cut size does not directly minimize design delay
[Figure: a cluster with one critical edge (weight = 10) and several non-critical edges (sum of weights = 11); two candidate cuts, case 1 and case 2]
- The cut is made in case 1, across the critical edge (10 < 11), so the timing constraint is not met
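The arithmetic behind this example can be sketched as follows. Only the critical edge weight (10) and the non-critical total (11) come from the slide; the split of 11 into 4 + 4 + 3 and all names are hypothetical.

```python
# Weights from the slide: one critical edge of weight 10 and
# non-critical edges whose weights sum to 11. The split 4 + 4 + 3
# is a hypothetical choice; only the total comes from the slide.
CRITICAL_WEIGHT = 10
NON_CRITICAL_WEIGHTS = [4, 4, 3]

candidate_cuts = {
    "cut_critical_edge": {
        "cost": CRITICAL_WEIGHT,
        "cuts_critical": True,
    },
    "cut_non_critical_edges": {
        "cost": sum(NON_CRITICAL_WEIGHTS),
        "cuts_critical": False,
    },
}

# A pure mincut objective looks only at the cut cost.
chosen = min(candidate_cuts, key=lambda name: candidate_cuts[name]["cost"])

# Timing is met only if the critical edge stays inside one partition.
timing_met = not candidate_cuts[chosen]["cuts_critical"]

print(chosen, candidate_cuts[chosen]["cost"], timing_met)
```

Because 10 < 11, the cheaper cut is the one across the critical edge, so a cut-size objective alone selects the partition that violates timing.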
13Routing Congestion
- The main idea in mincut based partitioning is to reduce, as much as possible, the number of wires crossing between two partitions
- Routing congestion increases because of the recursive partitioning
- Routing congestion can be avoided by evenly distributing the wires
14Motivation
- Bottom-up clustering groups closely connected components
  - Routing congestion is improved
- Placement with wire length based cost function
  - Design delay is reduced
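One way to picture bottom-up clustering is a greedy merge, sketched below. This is an illustrative assumption, not the clustering algorithm used in VPR or in the cited papers: it repeatedly merges the two clusters with the strongest total connection weight, as long as the merged cluster respects a size limit. The function name, the weights, and the size limit are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple


def greedy_cluster(nodes: List[str],
                   weights: Dict[FrozenSet[str], float],
                   max_size: int) -> List[Set[str]]:
    """Merge the two most strongly connected clusters until none fit."""
    clusters: List[Set[str]] = [{n} for n in nodes]

    def connectivity(c1: Set[str], c2: Set[str]) -> float:
        # Total weight of connections between the two clusters.
        return sum(weights.get(frozenset((a, b)), 0.0)
                   for a in c1 for b in c2)

    while True:
        best: Tuple[float, int, int] = (0.0, -1, -1)
        for i, j in combinations(range(len(clusters)), 2):
            if len(clusters[i]) + len(clusters[j]) > max_size:
                continue  # merged cluster would be too large
            w = connectivity(clusters[i], clusters[j])
            if w > best[0]:
                best = (w, i, j)
        if best[1] < 0:  # no connected pair fits any more
            return clusters
        _, i, j = best
        clusters[i] |= clusters[j]
        del clusters[j]  # safe: j > i, so index i is unaffected


# Hypothetical connection weights between four components.
weights = {
    frozenset(("a", "b")): 5.0,
    frozenset(("c", "d")): 4.0,
    frozenset(("b", "c")): 1.0,
}
result = greedy_cluster(["a", "b", "c", "d"], weights, max_size=2)
print(sorted(sorted(c) for c in result))
```

In this small case the heavy connections a-b and c-d end up inside clusters, and only the light b-c connection has to be routed between clusters.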
15Proposed Flow
HDL Design
Synthesis
Clustering
Simulated Annealing placement
Routing
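The simulated annealing placement step in this flow can be sketched as below. This is a simplified illustration under stated assumptions, not VPR's annealer: moves are block-to-block swaps only, the cooling schedule is a fixed geometric one, and the cost is plain half-perimeter wire length. All names, parameters, and the example netlist are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple

Pos = Tuple[int, int]


def wirelength(nets: List[List[str]], place: Dict[str, Pos]) -> int:
    """Total half-perimeter bounding box length over all nets."""
    total = 0
    for net in nets:
        xs = [place[b][0] for b in net]
        ys = [place[b][1] for b in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def anneal(nets: List[List[str]], place: Dict[str, Pos],
           t_start: float = 10.0, t_end: float = 0.01,
           alpha: float = 0.9, moves_per_t: int = 50,
           seed: int = 0) -> Tuple[Dict[str, Pos], int]:
    """Swap-based annealing; returns the best placement seen and its cost."""
    rng = random.Random(seed)
    place = dict(place)
    blocks = sorted(place)
    cost = wirelength(nets, place)
    best, best_cost = dict(place), cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_t):
            a, b = rng.sample(blocks, 2)
            place[a], place[b] = place[b], place[a]      # propose a swap
            new_cost = wirelength(nets, place)
            delta = new_cost - cost
            # Always accept improvements; accept worse moves with
            # probability exp(-delta / t).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost
                if cost < best_cost:
                    best, best_cost = dict(place), cost
            else:
                place[a], place[b] = place[b], place[a]  # reject: undo
        t *= alpha                                       # cool down
    return best, best_cost


# Hypothetical five-block design with three two-pin nets.
initial = {"a": (0, 0), "b": (2, 2), "c": (0, 2), "d": (2, 0), "e": (1, 1)}
nets = [["a", "b"], ["c", "d"], ["a", "e"]]
initial_cost = wirelength(nets, initial)
final, final_cost = anneal(nets, initial)
print(initial_cost, final_cost)
```

Because blocks only exchange locations, every intermediate placement stays legal (one block per site). Returning the best placement seen, rather than the last one, guarantees the result is never worse than the starting point.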
16Framework-VPR(Versatile Place and Route)
- Targets island style devices
- Clustering
  - For Xilinx style devices: cluster size < 4 LUTs
  - For Apex style devices: cluster size > 100 LUTs
Island Style Architecture
17Framework-VPR (contd)
- Cost: island style devices
  - Linear function of distance
  - Cost is calculated based on bounding box approximation
  - C_av,x(n), C_av,y(n) are the average channel capacities; bb_x, bb_y are the horizontal and vertical distances
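The symbols defined here match the linear congestion cost published for VPR, which sums over all nets n a term of the form q(n) · [bb_x(n) / C_av,x(n)^β + bb_y(n) / C_av,y(n)^β]. The factor q(n) and the exponent β do not appear on the slide and are taken from the published VPR description. The sketch below is a simplified illustration of that formula, assuming one uniform channel capacity per direction instead of a per-net average, and q = 1, β = 1 by default; the function and parameter names are hypothetical.

```python
from typing import Dict, List, Tuple

Pos = Tuple[int, int]


def bb_cost(nets: List[List[str]], place: Dict[str, Pos],
            cav_x: float = 1.0, cav_y: float = 1.0,
            beta: float = 1.0, q: float = 1.0) -> float:
    """Bounding box cost: each net's span, scaled by channel capacity."""
    total = 0.0
    for net in nets:
        xs = [place[b][0] for b in net]
        ys = [place[b][1] for b in net]
        bb_x = max(xs) - min(xs)   # horizontal distance of the net
        bb_y = max(ys) - min(ys)   # vertical distance of the net
        total += q * (bb_x / cav_x ** beta + bb_y / cav_y ** beta)
    return total


# Hypothetical placement of three blocks and two nets.
place = {"a": (0, 0), "b": (3, 1), "c": (1, 4)}
nets = [["a", "b", "c"], ["a", "b"]]

print(bb_cost(nets, place))               # uniform capacities of 1
print(bb_cost(nets, place, cav_x=2.0))    # wider horizontal channels
```

Dividing by the channel capacity makes a span cheaper in the direction where more routing tracks are available, which is how the cost discourages congestion in narrow channels.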
18Hierarchical Devices
- Need to target hierarchical interconnect based
devices
19Cost Calculation for Hierarchical Devices
- Two types of routing
  - Intra Cluster Routing
  - Inter Cluster Routing
- Inter Cluster Routing Types
  - Quadrant
  - Half of the chip
  - Neither
  - Same Row
  - Same Column
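A cost function for a hierarchical device first has to decide which routing type a connection falls into. The sketch below is one possible reading of the categories above, under several assumptions that the slides do not state: clusters sit on a width × height grid, "quadrant" means both endpoints share the same horizontal and vertical half, "half of the chip" means they share exactly one of the two, row and column checks take precedence, and the numeric costs are hypothetical placeholders.

```python
from typing import Dict, Tuple

Pos = Tuple[int, int]  # (column, row) of a cluster on the chip

# Hypothetical relative costs; the slides do not give numbers.
COST: Dict[str, float] = {
    "intra_cluster": 0.0,
    "same_row": 1.0,
    "same_column": 2.0,
    "quadrant": 3.0,
    "half": 4.0,
    "neither": 5.0,
}


def classify(a: Pos, b: Pos, width: int, height: int) -> str:
    """Return the assumed routing type for a connection from a to b."""
    if a == b:
        return "intra_cluster"
    if a[1] == b[1]:
        return "same_row"
    if a[0] == b[0]:
        return "same_column"
    same_x_half = (a[0] < width // 2) == (b[0] < width // 2)
    same_y_half = (a[1] < height // 2) == (b[1] < height // 2)
    if same_x_half and same_y_half:
        return "quadrant"
    if same_x_half or same_y_half:
        return "half"
    return "neither"


def connection_cost(a: Pos, b: Pos, width: int, height: int) -> float:
    """Look up the placeholder cost for the connection's routing type."""
    return COST[classify(a, b, width, height)]


# Example on an assumed 8 x 8 grid of clusters.
print(classify((0, 0), (3, 3), 8, 8))
print(classify((0, 0), (5, 6), 8, 8))
```

A placement cost for a net could then sum `connection_cost` over its source-sink pairs, replacing the distance-based bounding box cost used for island style devices.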
20Cost Calculation (contd)
[Figure: cost calculation diagram; labels: single, double, x]
21Results
- Verifying our motivation with the help of MCNC benchmarks
- Comparison
  - Synthesis vs. clustering synthesis (design LUTs, design delay)
  - Benchmark (BLIF) -> synthesis -> VHDL -> Quartus (placement and routing)
  - vs.
  - Benchmark (BLIF) -> synthesis -> clustering -> placement -> VHDL -> Quartus (routing)
  - Delay values are compared
22Results comparing the Number of Look Up Tables
Used
23Results comparing the Maximum Pin to Pin Delay
24Results comparing Synthesis Fitting Delay
25Preliminary Experiment: Comparing VPR and Hierarchical Cost Function Placement Results
26Time Line
- Planning to complete by December 15th
- Sequence of tasks to be done
  - Download VPR and try to understand the source code
  - Install the Quartus software and get familiar with its usage
  - Figure out the different cost function calculations and modify the cost functions accordingly in the VPR source code
  - Get the MCNC benchmarks, perform design implementation with the changed cost functions, and compare the results with Quartus
27Acknowledgements
- Thanks to Srini Krishnamoorthy for project
related discussions
28References
- Performance-Driven Multi-Level Clustering with Application to Hierarchical FPGA Mapping, Jason Cong
- Placement Algorithms for Datapath-Oriented FPGAs, Poplavko
- Timing-Driven Placement for Hierarchical Programmable Logic Devices, Michael Hutton
- FlowMap: An Optimal Technology Mapping Algorithm for Delay Optimization in Lookup-Table Based FPGA Designs, Jason Cong