Title: ASIC Front-End Design
1ASIC Front-End Design
ECE 448 Lecture 19
2Two competing implementation approaches
FPGA: Field Programmable Gate Array
- no physical layout design
- design ends with a bitstream used to configure a device
- bought off the shelf and reconfigured by designers themselves
ASIC: Application Specific Integrated Circuit
- designed all the way from behavioral description to physical layout
- designs must be sent for expensive and time-consuming fabrication in a semiconductor foundry
3FPGAs vs. ASICs
FPGAs
- Off-the-shelf
- Low development costs
- Short time to the market
- Reconfigurability
ASICs
- High performance
- Low power
- Low cost (but only in high volumes)
4ASIC Design Example Factoring circuit/GMU
Global Memory
Local Memory
5ASIC 130 nm vs. Virtex II 6000 Factoring/GMU
Area of Xilinx Virtex II 6000 FPGA: 19.80 mm x 19.68 mm (estimation by R.J. Lim Fong, MS Thesis, VPI, 2004)
Area of an ASIC with equivalent functionality: 2.7 mm x 2.82 mm
Area ratio: 51x
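The 51x figure can be checked from the quoted dimensions. The sketch below assumes the four lengths are the width and height of two rectangular dies (19.80 mm by 19.68 mm for the FPGA, 2.7 mm by 2.82 mm for the ASIC); the variable names are illustrative only.

```python
# Die dimensions in millimetres, as quoted on the slide.
# Assumption: each pair of numbers is the width and height of one die.
fpga_w_mm, fpga_h_mm = 19.80, 19.68   # Xilinx Virtex II 6000 (estimated)
asic_w_mm, asic_h_mm = 2.7, 2.82      # 130 nm ASIC, equivalent functionality


def area_mm2(width_mm, height_mm):
    """Area of a rectangular die in square millimetres."""
    return width_mm * height_mm


fpga_area = area_mm2(fpga_w_mm, fpga_h_mm)
asic_area = area_mm2(asic_w_mm, asic_h_mm)
ratio = fpga_area / asic_area

print(f"FPGA area: {fpga_area:.2f} mm^2")
print(f"ASIC area: {asic_area:.2f} mm^2")
print(f"Ratio:     {ratio:.0f}x")
```

Under that assumption the ratio rounds to the 51x shown on the slide.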
6ASICs vs. FPGAs
- Source: I. Kuon, J. Rose, University of Toronto, "Measuring the Gap Between FPGAs and ASICs," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 26, no. 2, Feb. 2007.
11Simplified ASIC Design Flow
Front-End Design
- Synthesis
- Timing Analysis
Back-End Design
- Floorplanning
- Placement
- Clock Tree Synthesis
- Routing
- Design for Manufacturing
12Major ASIC Toolsets
Cadence
Magma
13Simplified ASIC Design Flow
Synopsys Tools
Front-End Design
- Synthesis: Design Analyzer
- Timing Analysis: PrimeTime
Back-End Design
- Floorplanning, Placement, Clock Tree Synthesis, Routing, Design for Manufacturing: Astro
14A Complete Placed and Routed Chip
15What is Physical Layout?
Physical Layout: topography of devices and interconnects, made up of polygons that represent different layers of material (diffusion, polysilicon, metal, contact, etc.)
16Process of Device Fabrication
- Devices are fabricated vertically on a silicon substrate wafer by layering different materials in specific locations and shapes on top of each other
- Each of many process masks defines the shapes and locations of a specific layer of material (diffusion, polysilicon, metal, contact, etc.)
- Mask shapes, derived from the layout view, are transformed to silicon via photolithographic and chemical processes
Wafer (cross-sectional) view
17Wafer Representation of Layout Polygons
Wafer Cross-sectional View
18Front-End Design Flow
19Simplified RTL Synthesis
20 VHDL vs. Verilog
21Logic Synthesis
VHDL description
Circuit netlist
    architecture MLU_DATAFLOW of MLU is
      signal A1 : STD_LOGIC;
      signal B1 : STD_LOGIC;
      signal Y1 : STD_LOGIC;
      signal MUX_0, MUX_1, MUX_2, MUX_3 : STD_LOGIC;
    begin
      A1 <= A when (NEG_A = '0') else not A;
      B1 <= B when (NEG_B = '0') else not B;
      Y  <= Y1 when (NEG_Y = '0') else not Y1;
      MUX_0 <= A1 and B1;
      MUX_1 <= A1 or B1;
      MUX_2 <= A1 xor B1;
      MUX_3 <= A1 xnor B1;
      with (L1 & L0) select
        Y1 <= MUX_0 when "00",
              MUX_1 when "01",
              MUX_2 when "10",
              MUX_3 when others;
    end MLU_DATAFLOW;
22Logic Synthesis
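To see what function the synthesized netlist has to implement, the MLU dataflow description from the previous slide can be mirrored in a few lines of software. This is only a behavioral sketch in Python, not something the synthesis tool produces; the function name `mlu` and the use of integers 0 and 1 for STD_LOGIC values are assumptions made for illustration.

```python
def mlu(a, b, neg_a, neg_b, neg_y, l1, l0):
    """Bit-level model of the MLU dataflow architecture (all arguments 0 or 1)."""
    # Optional inversion of the two inputs.
    a1 = a if neg_a == 0 else 1 - a
    b1 = b if neg_b == 0 else 1 - b
    # 4-to-1 multiplexer selected by (L1, L0).
    mux = {
        (0, 0): a1 & b1,   # MUX_0: and
        (0, 1): a1 | b1,   # MUX_1: or
        (1, 0): a1 ^ b1,   # MUX_2: xor
    }
    y1 = mux.get((l1, l0), 1 - (a1 ^ b1))  # "when others": xnor
    # Optional inversion of the output.
    return y1 if neg_y == 0 else 1 - y1


# Print the truth table of the AND mode with no inversions.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, mlu(a, b, 0, 0, 0, 0, 0))
```

A model like this can serve as a reference when checking simulation results of the netlist against the intended behavior.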
23TCL Tool Command Language
- Created by John Ousterhout of UC Berkeley
- Scripting Language
  - Very simple to automate routine tasks.
- Extension Language
  - Used to customize tools with user/company-specific applications.
- Nearly all modern EDA tools have a TCL interface.
- Very simple to learn and use.
24TCL Example
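    # Create the directory if needed, then check that it is usable.
    # "echo" is provided by the EDA tool shell; plain tclsh would use "puts".
    proc rfmdIfNotDirMkdir {directory} {
        if {![file exists $directory]} {
            file mkdir $directory
        }
        if {![file isdirectory $directory]} {
            echo "Could not make \"$directory\""
            exit 1
        } elseif {![file writable $directory]} {
            echo " \"$directory\" is not writable"
            exit 1
        } else {
            return 1
        }
    }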
25TCL References
- Practical Programming in Tcl and Tk, Brent B. Welch, Ken Jones
- Tcl/Tk in a Nutshell, Paul Raines, Jeff Tranter
26Basic Synthesis Flow
27Synthesis using Design Compiler
30Synthesis script (1)
    designer = "Pawel Chodowiec"
    company = "George Mason University"
    search_path = "./opt3/synopsys/TSMCHOME/digital/Front_End/timing_power/tcb013ghp_200a "
    link_library = "* tcb013ghptc.db"   /* Typical case library */
    target_library = "tcb013ghptc.db "
    symbol_library = "tcb013ghp.sdb "
    /* Directory configuration */
    src_directory = /exam1/vhdl/
    report_directory = /exam1/reports/
    db_directory = /exam1/db/
31Synthesis script (2)
    /* Packages can be only read */
    read_file -format vhdl -rtl src_directory + "components.vhd"
    blocks = {regne, upcount, RAM_16Xn_DISTRIBUTED, exam1}
    foreach (block, blocks) {
      block_source = src_directory + block + ".vhd"
      read_file -format vhdl -rtl block_source
      analyze -format vhdl -lib WORK block_source
    }
    current_design block
    /* All commands now apply to the entity "exam1" */
32Synthesis script (3)
    uniquify
    /* Creates unique instances of multiple referenced entities */
    link
    check_design
    /* Checks the current design for consistency */
    /******************************************/
    /* apply block attributes and constraints */
    /******************************************/
    create_clock -period 10 clk
    /* Defines that the port "clk" of the current design
       is the clock for the design. Period = 10 ns, 50% duty cycle.
       Use -waveform option to define duty cycle other than 50% */
    set_operating_conditions NCCOM
    /* Normal Case Commercial Operating Conditions */
33Synthesis script (4)
    /***************************************************/
    /* Apply these constraints to the top-level entity */
    /***************************************************/
    set_max_fanout 100 block
    set_clock_latency 0.1 find(clock, "clk")
    set_clock_transition 0.01 find(clock, "clk")
    set_clock_uncertainty -setup 0.1 find(clock, "clk")
    set_clock_uncertainty -hold 0.1 find(clock, "clk")
    set_load 0 all_outputs()
    set_input_delay 1.0 -clock clk -max all_inputs()
    set_output_delay -max 1.0 -clock clk all_outputs()
    set_wire_load_model -library tcb013ghptc -name "TSMC8K_Fsg_Conservative"
34Wireload model basics (1)
35Wireload model basics (2)
36Synthesis script (5)
    set_dont_touch block
    compile -map_effort medium
    change_names -rules vhdl
    vhdlout_architecture_name = "sort_syn"
    vhdlout_use_packages = {"IEEE.std_logic_1164"}
    write -f db -hierarchy -output db_directory + "exam1.db"
    /* write -f vhdl -hierarchy -output db_directory + "exam1_syn.vhd" */
    report -area > report_directory + "exam1.report_area"
    report -timing -all > report_directory + "exam1.report_timing"
37Results of synthesis
38Area report after synthesis (1)
    report_area
    Information: Updating design information... (UID-85)

    Report : area
    Design : exam1
    Version: V-2003.12-SP1
    Date   : Tue Nov 15 20:39:06 2005

    Library(s) Used:
        tcb013ghptc (File: /opt3/synopsys/TSMCHOME/digital/Front_End/timing_power/tcb013ghp_200a/tcb013ghptc.db)
39Area report after synthesis (2)
    Number of ports:        75
    Number of nets:         346
    Number of cells:        107
    Number of references:   28

    Combinational area:     10593.477539
    Noncombinational area:  14295.521484
    Net Interconnect area:  undefined (Wire load has zero net area)

    Total cell area:        24888.976562
    Total area:             undefined
40Critical Path (1)
- Critical Path: The Longest Path From Outputs of Registers to Inputs of Registers
$t_{Critical} = t_{FF-P} + t_{logic} + t_{FF-setup}$
41Critical Path (2)
- Min. Clock Period = Length of The Critical Path
- Max. Clock Frequency = 1 / Min. Clock Period
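The two relations can be tried on numbers. The sketch below assumes the critical path is the sum of the flip-flop propagation delay, the combinational logic delay, and the flip-flop setup time, as on the previous slide. The function names and the delay values are hypothetical, chosen only to illustrate the arithmetic.

```python
def min_clock_period_ns(t_ff_p_ns, t_logic_ns, t_ff_setup_ns):
    """Length of the critical path: t_FF-P + t_logic + t_FF-setup (in ns)."""
    return t_ff_p_ns + t_logic_ns + t_ff_setup_ns


def max_clock_frequency_mhz(period_ns):
    """Max. clock frequency = 1 / min. clock period (1 / ns = 1000 MHz)."""
    return 1000.0 / period_ns


# Hypothetical delays, not taken from any library or report.
period = min_clock_period_ns(t_ff_p_ns=0.25, t_logic_ns=4.5, t_ff_setup_ns=0.25)
print(f"Min. clock period:    {period} ns")
print(f"Max. clock frequency: {max_clock_frequency_mhz(period)} MHz")
```

With these example values the period is 5 ns, which corresponds to 200 MHz.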
43Clock Jitter
- Rising Edge of The Clock Does Not Occur Precisely Periodically
- May cause faults in the circuit
44Clock Skew
- Rising Edge of the Clock Does Not Arrive at Clock
Inputs of All Flip-flops at The Same Time
45Timing report after synthesis (1)
    Report : timing
            -path full
            -delay max
            -max_paths 1
    Design : exam1
    Version: V-2003.12-SP1
    Date   : Tue Nov 15 20:39:06 2005

    Operating Conditions: NCCOM   Library: tcb013ghptc
    Wire Load Model Mode: segmented
46Timing report after synthesis (2)
    Startpoint: in_addr(1) (input port clocked by clk)
    Endpoint: RegSUM/Q_reg34
              (rising edge-triggered flip-flop clocked by clk)
    Path Group: clk
    Path Type: max

    Des/Clust/Port          Wire Load Model            Library
    ------------------------------------------------------------
    exam1                   TSMC8K_Fsg_Conservative    tcb013ghptc
    RAM_16Xn_DISTRIBUTED    ZeroWireload               tcb013ghptc
    exam1_DW01_cmp2_32_0    ZeroWireload               tcb013ghptc
    exam1_DW01_cmp2_32_1    ZeroWireload               tcb013ghptc
    exam1_DW01_add_35_0     ZeroWireload               tcb013ghptc
    regne_1                 ZeroWireload               tcb013ghptc
    regne_2                 ZeroWireload               tcb013ghptc
    regne_n35               ZeroWireload               tcb013ghptc
47Timing report after synthesis (3)
    Point                                          Incr     Path
    --------------------------------------------------------------
    clock clk (rise edge)                          0.00     0.00
    clock network delay (ideal)                    0.10     0.10
    input external delay                           1.00     1.10 f
    in_addr(1) (in)                                0.00     1.10 f
    U98/Z (CKMUX2D1)                               0.13     1.23 f
    Memory/ADDR1 (RAM_16Xn_DISTRIBUTED)            0.00     1.23 f
    Memory/U41/ZN (INVD1)                          0.08     1.31 r
    Memory/U343/Z (OR3D1)                          0.10     1.41 r
    Memory/U338/ZN (INVD2)                         0.20     1.61 f
    Memory/U40/ZN (MOAI22D0)                       0.17     1.78 f
    Memory/U350/Z (OR4D1)                          0.26     2.03 f
    Memory/DATA_OUT0 (RAM_16Xn_DISTRIBUTED)        0.00     2.03 f
48Timing report after synthesis (4)
    Point                                          Incr     Path
    --------------------------------------------------------------
    add_96xplusxplus/B0 (exam1_DW01_add_35_0)      0.00     2.03 f
    add_96xplusxplus/U9/Z (AN2D0)                  0.12     2.15 f
    add_96xplusxplus/U1_1/CO (CMPE32D1)            0.10     2.25 f
    add_96xplusxplus/U1_2/CO (CMPE32D1)            0.10     2.34 f
    add_96xplusxplus/U1_3/CO (CMPE32D1)            0.10     2.44 f
    add_96xplusxplus/U1_4/CO (CMPE32D1)            0.10     2.54 f
    add_96xplusxplus/U1_5/CO (CMPE32D1)            0.10     2.63 f
    add_96xplusxplus/U1_6/CO (CMPE32D1)            0.10     2.73 f
    add_96xplusxplus/U1_7/CO (CMPE32D1)            0.10     2.82 f
    add_96xplusxplus/U1_8/CO (CMPE32D1)            0.10     2.92 f
    add_96xplusxplus/U1_9/CO (CMPE32D1)            0.10     3.02 f
    add_96xplusxplus/U1_10/CO (CMPE32D1)           0.10     3.11 f
    add_96xplusxplus/U1_11/CO (CMPE32D1)           0.10     3.21 f
    add_96xplusxplus/U1_12/CO (CMPE32D1)           0.10     3.31 f
    add_96xplusxplus/U1_13/CO (CMPE32D1)           0.10     3.40 f
    add_96xplusxplus/U1_14/CO (CMPE32D1)           0.10     3.50 f
49Timing report after synthesis (5)
    Point                                          Incr     Path
    --------------------------------------------------------------
    add_96xplusxplus/U1_15/CO (CMPE32D1)           0.10     3.60 f
    add_96xplusxplus/U1_16/CO (CMPE32D1)           0.10     3.69 f
    add_96xplusxplus/U1_17/CO (CMPE32D1)           0.10     3.79 f
    add_96xplusxplus/U1_18/CO (CMPE32D1)           0.10     3.88 f
    add_96xplusxplus/U1_19/CO (CMPE32D1)           0.10     3.98 f
    add_96xplusxplus/U1_20/CO (CMPE32D1)           0.10     4.08 f
    add_96xplusxplus/U1_21/CO (CMPE32D1)           0.10     4.17 f
    add_96xplusxplus/U1_22/CO (CMPE32D1)           0.10     4.27 f
    add_96xplusxplus/U1_23/CO (CMPE32D1)           0.10     4.37 f
    add_96xplusxplus/U1_24/CO (CMPE32D1)           0.10     4.46 f
    add_96xplusxplus/U1_25/CO (CMPE32D1)           0.10     4.56 f
    add_96xplusxplus/U1_26/CO (CMPE32D1)           0.10     4.66 f
    add_96xplusxplus/U1_27/CO (CMPE32D1)           0.10     4.75 f
    add_96xplusxplus/U1_28/CO (CMPE32D1)           0.10     4.85 f
    add_96xplusxplus/U1_29/CO (CMPE32D1)           0.10     4.94 f
    add_96xplusxplus/U1_30/CO (CMPE32D1)           0.10     5.04 f
    add_96xplusxplus/U1_31/CO (CMPE32D1)           0.10     5.14 f
50Timing report after synthesis (6)
    Point                                          Incr     Path
    --------------------------------------------------------------
    add_96xplusxplus/U7/Z (AN2D0)                  0.10     5.24 f
    add_96xplusxplus/U5/Z (AN2D0)                  0.08     5.32 f
    add_96xplusxplus/U4/Z (CKXOR2D0)               0.15     5.47 f
    add_96xplusxplus/SUM34 (exam1_DW01_add_35_0)   0.00     5.47 f
    RegSUM/R34 (regne_n35)                         0.00     5.47 f
    RegSUM/U32/Z (AO21D0)                          0.11     5.57 f
    RegSUM/Q_reg34/D (EDFQD1)                      0.00     5.57 f
    data arrival time                                       5.57
51Timing report after synthesis (7)
    clock clk (rise edge)                         10.00    10.00
    clock network delay (ideal)                    0.10    10.10
    clock uncertainty                             -0.10    10.00
    RegSUM/Q_reg34/CP (EDFQD1)                     0.00    10.00 r
    library setup time                            -0.12     9.88
    data required time                                      9.88
    --------------------------------------------------------------
    data required time                                      9.88
    data arrival time                                      -5.57
    --------------------------------------------------------------
    slack (MET)                                             4.31
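The final lines of the report are plain arithmetic, which the following sketch reproduces from the numbers shown above. The function names are illustrative and not part of any tool; all times are in ns.

```python
def data_required_time(clock_edge, clock_latency, clock_uncertainty, setup_time):
    """Latest time at which data may arrive at the capturing flip-flop."""
    return clock_edge + clock_latency - clock_uncertainty - setup_time


def slack(required, arrival):
    """Positive slack means the setup constraint is met."""
    return required - arrival


# Values taken from the timing report above.
required = data_required_time(clock_edge=10.00, clock_latency=0.10,
                              clock_uncertainty=0.10, setup_time=0.12)
path_slack = slack(required, arrival=5.57)
status = "MET" if path_slack >= 0 else "VIOLATED"

print(f"data required time {required:.2f}")
print(f"slack ({status}) {path_slack:.2f}")
```

The result matches the report: required time 9.88 ns, arrival 5.57 ns, slack 4.31 ns.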
52Static Timing Analysis
53Static Timing Analysis Review
- Tools will calculate all paths from sequential start point to sequential end point.
- The worst-case (longest) path will be used for setup analysis, and the best-case (shortest) path will be used for hold analysis.
- All paths are considered for design rule checking.
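The idea of covering all paths between a start point and an end point can be sketched on a tiny timing graph. This is a simplified illustration, not how a commercial analyzer is implemented; the graph, the node names, and the delays are hypothetical.

```python
# Hypothetical timing graph: node -> list of (successor, delay in ns).
GRAPH = {
    "regA/Q": [("U1", 0.25), ("U2", 0.125)],
    "U1": [("U3", 0.5)],
    "U2": [("U3", 0.125), ("regB/D", 0.25)],
    "U3": [("regB/D", 0.25)],
    "regB/D": [],
}


def path_delays(graph, start, end):
    """Return (shortest, longest) delay over all paths from start to end.

    Each node is visited once and its result is memoized, so the paths
    are never enumerated one by one. Returns None if end is unreachable.
    """
    memo = {}

    def visit(node):
        if node == end:
            return (0.0, 0.0)
        if node not in memo:
            mins, maxs = [], []
            for succ, delay in graph[node]:
                result = visit(succ)
                if result is not None:
                    mins.append(delay + result[0])
                    maxs.append(delay + result[1])
            memo[node] = (min(mins), max(maxs)) if mins else None
        return memo[node]

    return visit(start)


shortest, longest = path_delays(GRAPH, "regA/Q", "regB/D")
print(f"hold analysis uses the shortest path: {shortest} ns")
print(f"setup analysis uses the longest path: {longest} ns")
```

In this example the longest path goes through U1 and U3, and the shortest goes through U2 directly to the register input.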
54Review of Setup and Hold Checks
55False and Multicycle paths
- False path
  - Very slow signals such as reset or test mode enable, which are not used under normal conditions, are classified as false paths.
- Multicycle path
  - Paths that take more than one clock cycle are known as multicycle paths.
  - The multicycle paths have to be defined in the analyzer, which then takes those constraints into account when synthesizing.
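The effect of declaring a path as multicycle can be shown with a small setup check. This is a simplified sketch that ignores clock latency and uncertainty; the function name, the `cycles` parameter, and the numbers are assumptions for illustration.

```python
def setup_slack(clock_period, arrival_time, setup_time, cycles=1):
    """Setup slack when data is captured `cycles` clock periods after launch."""
    return cycles * clock_period - setup_time - arrival_time


# Hypothetical path: 14.5 ns of delay with a 10 ns clock and 0.5 ns setup time.
single = setup_slack(clock_period=10.0, arrival_time=14.5, setup_time=0.5)
double = setup_slack(clock_period=10.0, arrival_time=14.5, setup_time=0.5, cycles=2)

print(f"checked as a single-cycle path: slack = {single} ns")
print(f"checked as a 2-cycle path:      slack = {double} ns")
```

Without the multicycle constraint the tool would report a violation and spend effort speeding up a path that is allowed two cycles by design.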
56Multicycle path - Example
57Optimization criteria
58Degrees of freedom and possible trade-offs
speed
area
power
testability
59Degrees of freedom and possible trade-offs
speed
latency
area
throughput
60VHDL Coding for Synthesis
61Recommended rules for Synthesis
- When implementing combinational paths do not have hierarchy
- Register all outputs
- Do not implement glue logic between blocks, partition them well
- Separate designs on functional boundary
- Keep block sizes to a reasonable size
62Avoid hierarchical combinational blocks
The path between reg1 and reg2 is divided between three different blocks.
Due to hierarchical boundaries, optimization of the combinational logic cannot be achieved.
Synthesis tools (Synopsys) maintain the integrity of the I/O ports, so combinational optimization cannot be achieved between blocks (unless grouping is used).
63Recommended way to handle Combinational Paths
All the combinational circuitry is grouped in the same block that has its output connected to the destination flip-flop.
This allows optimal minimization of the combinational logic during synthesis.
It also allows a simplified description of the timing interface.
64Register all outputs
Simplifies the synthesis design environment.
Inputs to the individual blocks arrive within the same relative delay (caused by wire delays).
Don't really need to specify output requirements, since paths start at flip-flop outputs.
Take care of fanouts: as a rule of thumb, keep the fanout to 16 (dependent on technology and components that are being driven by the output).
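The fanout rule of thumb is easy to check mechanically. The sketch below assumes the netlist is available as a mapping from each driving net to the list of loads it drives; this data structure, the net names, and the function name are hypothetical and do not correspond to any tool format.

```python
def high_fanout_nets(netlist, limit=16):
    """Return {net: fanout} for every net that drives more than `limit` loads."""
    return {net: len(loads) for net, loads in netlist.items() if len(loads) > limit}


# Hypothetical netlist: a registered output driving 20 loads, another driving 3.
netlist = {
    "reg1/Q": [f"U{i}/A" for i in range(20)],
    "reg2/Q": ["U20/A", "U21/B", "U22/A"],
}

for net, fanout in high_fanout_nets(netlist).items():
    print(f"{net}: fanout {fanout} exceeds the rule of thumb of 16")
```

The limit of 16 is only the default here, since the appropriate value depends on the technology and on the driven components.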
65NO GLUE LOGIC between blocks
Under time pressure, a bug is found that could simply be fixed by adding some glue logic between blocks. RESIST THE TEMPTATION!!!
At this level in the hierarchy, the glue logic cannot be absorbed within any lower-level block.
66Separate design with different goals
reg1 may be driven by a time-critical function, hence will have different optimization constraints.
reg3 may be driven by slow logic, hence no need to constrain it for speed.
67Optimization based on design requirements
- Use different entities to partition design blocks
- Allows different constraints during synthesis to
optimize for area or speed or both.
68Separate FSM with random logic
- Separation of the FSM and the random logic allows
you to use FSM optimized synthesis
69Maintain a reasonable block size
- Partition your design such that each block is between 1000-10000 gates (this is strictly tools and technology dependent)
- The larger the blocks, the longer the run time -> quick iterations cannot be done.