Title: Scalable and Scalably-Verifiable Sequential Synthesis
1. Scalable and Scalably-Verifiable Sequential Synthesis
- Alan Mishchenko, Mike Case, Robert Brayton
- UC Berkeley
2. Overview
- Introduction
- Computations
- SAT sweeping
- Induction
- Partitioning
- Verification
- Experiments
- Future work
3. Introduction
- Combinational synthesis
  - Cuts at the register boundary
  - Preserves state encoding, scan chains, and test vectors
  - No sequential optimization; easy to verify
- Sequential synthesis
  - Runs retiming, re-encoding, use of sequential don't-cares, etc.
  - Changes state encoding; invalidates scan chains and test vectors
  - Some degree of sequential optimization; non-trivial to verify
- Scalably-verifiable sequential synthesis
  - Merges sequentially equivalent registers and internal nodes
  - Minor change to state encoding, scan chains, and test vectors
  - Some degree of sequential optimization; easy to verify!
4. Combinational SAT Sweeping
- Naïve CEC approach: SAT solving
  - Build an output miter and call SAT
  - Works well for many easy problems
- Better CEC approach: SAT sweeping
  - Based on incremental SAT solving
  - Detects possibly equivalent nodes using simulation
    - Candidate constant nodes
    - Candidate equivalent nodes
  - Runs SAT on the intermediate miters in a topological order
  - Refines the candidates using counterexamples
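The SAT-sweeping loop above can be sketched in Python on a toy AIG. The sketch assumes AIGER-style literals (literal = 2*node + complement bit), and everything in it is hypothetical: the AIG and the names `simulate`, `candidate_reps`, `find_counterexample` and `sat_sweep`. Brute-force enumeration over all input assignments stands in for the incremental SAT solver, which is workable only because the example has three inputs. Complemented equivalences are ignored for brevity.

```python
import random
from itertools import product

# Hypothetical toy AIG: node 0 = constant 0, nodes 1..N_PI = inputs a, b, c,
# each later node = AND of two literals (literal = 2*node + complement_bit).
N_PI = 3
ANDS = [
    (2, 4),   # node 4: a & b
    (4, 2),   # node 5: b & a   (truly equivalent to node 4)
    (8, 6),   # node 6: n4 & c
    (10, 6),  # node 7: n5 & c  (truly equivalent to node 6)
    (3, 2),   # node 8: !a & a  (truly constant 0)
]
NUM_NODES = 1 + N_PI + len(ANDS)


def simulate(pi_words, width):
    """Bit-parallel simulation: each PI value is a `width`-bit integer."""
    mask = (1 << width) - 1
    val = [0] + list(pi_words)
    for a, b in ANDS:
        va = val[a >> 1] ^ (mask if a & 1 else 0)
        vb = val[b >> 1] ^ (mask if b & 1 else 0)
        val.append(va & vb)
    return val


def candidate_reps(patterns):
    """Group nodes by simulation signature; map each non-representative node
    to its class representative (the first node in topological order)."""
    words = [sum(p[i] << k for k, p in enumerate(patterns)) for i in range(N_PI)]
    sig = simulate(words, len(patterns))
    first_with_sig, rep = {}, {}
    for node in range(NUM_NODES):
        if sig[node] in first_with_sig:
            rep[node] = first_with_sig[sig[node]]   # class of node 0 = constant candidates
        else:
            first_with_sig[sig[node]] = node
    return rep


def find_counterexample(x, y):
    """Stand-in for a SAT call on the miter x XOR y (exhaustive, toy-sized only)."""
    for bits in product((0, 1), repeat=N_PI):
        val = simulate(bits, 1)
        if val[x] != val[y]:
            return bits
    return None


def sat_sweep(seed=1, init_patterns=2):
    rng = random.Random(seed)
    patterns = [tuple(rng.randint(0, 1) for _ in range(N_PI))
                for _ in range(init_patterns)]
    proved = {}
    while True:
        rep = candidate_reps(patterns)
        cex = None
        for node in sorted(rep):              # intermediate miters in topological order
            if node in proved:
                continue
            cex = find_counterexample(rep[node], node)
            if cex is None:
                proved[node] = rep[node]      # proved: node can be merged into rep
            else:
                break
        if cex is None:
            return proved
        patterns.append(cex)                  # refine candidate classes by resimulation


print(sorted(sat_sweep().items()))  # [(5, 4), (7, 6), (8, 0)]
```

Each counterexample separates at least one candidate pair, so the loop terminates. Truly equivalent nodes never split, which means the final map holds exactly the real equivalences.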
5. Sequential SAT Sweeping
- Sequential SAT sweeping is similar to combinational SAT sweeping in that it detects node equivalences
- The difference is that the equivalences are sequential
  - They hold only in the reachable state space
  - Every comb. equivalence is a seq. one, but not vice versa
  - It makes sense to run comb. SAT sweeping beforehand
- Sequential equivalence is proved by K-step induction
  - Base case
  - Inductive case
- Efficient implementation of induction is key!
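The two cases of K-step induction can be illustrated on a hypothetical 4-register circuit. Exhaustive enumeration of states and inputs stands in for SAT, and the circuit, the candidate r1 == r2, and all function names are assumptions made for the sketch. Only this single candidate is assumed, so simple induction (K = 1) fails: an unreachable state with r0 != r3 breaks the inductive step. K = 2 succeeds.

```python
from itertools import product

# Hypothetical circuit: state (r0, r1, r2, r3), one input i, all registers start at 0.
INIT = (0, 0, 0, 0)


def next_state(s, i):
    return (i, s[0], s[3], i)   # r0' = i, r1' = r0, r2' = r3, r3' = i


def candidate_holds(s):
    return s[1] == s[2]         # candidate sequential equivalence: r1 == r2


def unroll(start, inputs):
    """States in frames 0..len(inputs), starting from `start`."""
    states = [start]
    for i in inputs:
        states.append(next_state(states[-1], i))
    return states


def base_case(k):
    """From the initial state, the candidate holds in frames 0..k-1."""
    return all(candidate_holds(s)
               for inputs in product((0, 1), repeat=k - 1)
               for s in unroll(INIT, inputs))


def inductive_case(k):
    """From any (symbolic) state: holding in frames 0..k-1 implies holding in frame k."""
    for start in product((0, 1), repeat=len(INIT)):
        for inputs in product((0, 1), repeat=k):
            states = unroll(start, inputs)
            if all(map(candidate_holds, states[:k])) and not candidate_holds(states[k]):
                return False    # counterexample to induction (possibly unreachable)
    return True


def k_induction(k):
    return base_case(k) and inductive_case(k)


print(k_induction(1), k_induction(2))   # False True
```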
6. Base Case / Inductive Case
[Figure: Base case: starting from the initial state (inputs PI0, PI1, ...), the candidate equivalences {A,B} and {C,D} are proved in initialized frames 0 through K-1. Inductive case: starting from a symbolic state, the internal equivalences are assumed in uninitialized frames 0 through K-1 and proved in a topological order in frame K (inputs PI0 ... PIk).]
7. Efficient Implementation
- Two observations
  - Both the base case and the inductive case of K-step induction are runs of combinational SAT sweeping
  - The tricks and know-how of combinational sweeping are applicable
- The same integrated package can be used
  - Starts with simulation
  - Performs node checking in a topological order
  - Benefits from counterexample simulation
- Speculative reduction
  - Has to do with how the assumptions are made (see next slide)
8. Speculative Reduction
- Inputs to the inductive case
  - Sequential circuit
  - The number of frames to unroll (K)
  - Candidate equivalence classes
    - One node in each class is designated as the representative node
    - Currently the representatives are the first nodes in a topological order
- Speculative reduction moves fanouts to the representative nodes
  - Makes 80% of the constraints redundant
  - Dramatically simplifies the resulting timeframes (observed 3x reductions)
  - Leads to 100-1000x runtime savings during incremental SAT solving
[Figure: adding assumptions for a candidate class {A,B} with speculative reduction and without speculative reduction]
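Below is a minimal sketch of speculative reduction on a toy AIG. It assumes AIGER-style literals and a dict `rep` that maps each non-representative node to its representative, which comes earlier in topological order. Fanouts of every class member are redirected to the representative and the rebuilt graph is structurally hashed. Each node-versus-representative miter is kept as a proof obligation only if its two literals still differ. The function name and the example are hypothetical rather than ABC code, and complemented equivalences are not handled.

```python
def speculative_reduce(n_pi, ands, rep):
    """Rebuild an AIG with fanouts redirected to class representatives.

    n_pi: number of PIs (nodes 1..n_pi); node 0 is constant 0.
    ands: list of (lit0, lit1) for AND nodes n_pi+1, n_pi+2, ...
    rep:  node -> representative node (representative comes earlier).
    Returns (reduced AND list, non-trivial (rep_lit, node_lit) constraints).
    """
    new_lit = {v: 2 * v for v in range(n_pi + 1)}   # constant and PIs keep their literals
    strash = {}                                       # structural hashing table
    new_ands, constraints = [], []

    def map_lit(lit):
        node = lit >> 1
        target = rep.get(node, node)                  # fanouts use the representative
        return new_lit[target] ^ (lit & 1)

    for idx, (a, b) in enumerate(ands):
        node = n_pi + 1 + idx
        key = tuple(sorted((map_lit(a), map_lit(b))))
        if key not in strash:
            strash[key] = 2 * (n_pi + 1 + len(new_ands))
            new_ands.append(key)
        new_lit[node] = strash[key]
        # The miter node == rep stays a proof obligation unless it became trivial.
        if node in rep and new_lit[rep[node]] != new_lit[node]:
            constraints.append((new_lit[rep[node]], new_lit[node]))
    return new_ands, constraints


# Hypothetical AIG over PIs a, b, c, d (literals 2, 4, 6, 8):
ANDS = [(2, 4),    # node 5 = a & b
        (2, 6),    # node 6 = a & c   (candidate: equivalent to node 5)
        (10, 8),   # node 7 = n5 & d
        (12, 8)]   # node 8 = n6 & d  (candidate: equivalent to node 7)
print(speculative_reduce(4, ANDS, {6: 5, 8: 7}))
# ([(2, 4), (2, 6), (8, 10)], [(10, 12)])
```

After node 8's fanin is redirected from node 6 to node 5, nodes 7 and 8 hash to the same AND. Their constraint therefore becomes trivially redundant, and only the obligation between nodes 5 and 6 remains.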
9. Partitioning for Induction
- A simple output-partitioning algorithm was implemented
  - One person-day of programming
  - CEC and induction became more scalable
  - Typical reduction in runtime is 20x for a 1M-gate design
- Partitioning is meant to make SAT problems smaller
  - The same partitioning is useful for parallelization!
- Partitioning algorithm
  - Pre-processing: for all POs, find the PIs they depend on
  - Main loop: for each PO, in decreasing order of support size
    - Find a partition by looking at the supports
    - Choose the partition with the minimum linear combination of attraction and repulsion (determined by the number of common and new variables in this PO)
    - Impose restrictions on the partition size
  - Post-processing: compact smaller partitions
  - Complexity: O(numPis(AIG) × numPos(AIG))
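The partitioning algorithm above can be sketched as follows. Several details are assumptions, since the slide does not give them: the cost function (weighted new variables minus weighted shared variables), the rule that a PO joins only a partition with which it shares at least one PI, the first-fit compaction, and the default weights. The names `partition_outputs` and `compact` are hypothetical.

```python
def compact(parts, max_size):
    """Post-processing: first-fit merge, smallest supports first, within the size limit."""
    merged = []
    for pos, supp in sorted(parts, key=lambda p: len(p[1])):
        for m_pos, m_supp in merged:
            if len(m_supp | supp) <= max_size:
                m_pos.extend(pos)
                m_supp.update(supp)
                break
        else:
            merged.append((list(pos), set(supp)))
    return merged


def partition_outputs(supports, max_size, w_new=1, w_common=1):
    """supports: PO name -> set of PIs it depends on (the pre-processing result)."""
    parts = []  # list of (PO names, union of their supports)
    # Main loop: POs in decreasing order of support size.
    for po in sorted(supports, key=lambda name: -len(supports[name])):
        supp = supports[po]
        best, best_cost = None, None
        for part in parts:
            common = len(supp & part[1])   # attraction
            new = len(supp - part[1])      # repulsion
            if common == 0 or len(part[1]) + new > max_size:
                continue                   # no shared PIs, or size limit exceeded
            cost = w_new * new - w_common * common
            if best is None or cost < best_cost:
                best, best_cost = part, cost
        if best is None:
            parts.append(([po], set(supp)))
        else:
            best[0].append(po)
            best[1].update(supp)
    return compact(parts, max_size)


supports = {"o1": {"a", "b", "c"}, "o2": {"b", "c"}, "o3": {"d", "e", "f"},
            "o4": {"e", "f"}, "o5": {"g"}}
print([pos for pos, _ in partition_outputs(supports, max_size=4)])
# [['o5', 'o1', 'o2'], ['o3', 'o4']]
```

The small partition holding only `o5` is absorbed during compaction because its union with `{a, b, c}` still fits the size limit.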
10. Partitioning Details
- Currently, induction is partitioned only for register correspondence
  - In this case, it is enough to partition only one timeframe!
- In each iteration of induction
  - The design is re-partitioned
  - Nodes in each candidate equivalence class are added to the same partition
  - Constant candidates can be added to any partition
  - Candidates are merged at the PIs and proved at the POs
  - After proving all partitions, the classes are refined
- The partitioned induction has the same fixed point as the monolithic induction, while the number of iterations can differ (different counterexamples lead to different refinements)
[Figure: Partition 1 and Partition 2, illustrated for two candidate equivalence classes {A,B} and {C,D}]
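Here is a sketch of the refinement loop behind register correspondence with simple induction (K = 1). It is written monolithically, with explicit state enumeration standing in for SAT; per the slide, the partitioned version reaches the same fixed point. The circuit, the grouping of initial candidates by initial value, and the function names are hypothetical. Classes only ever split, so the base case established by the initial grouping stays valid throughout.

```python
from itertools import product

# Hypothetical 4-register circuit with one input i, all registers initialized to 0.
N_REGS = 4
INIT = (0, 0, 0, 0)


def next_state(s, i):
    return (i, s[0], s[3], i)   # r0' = i, r1' = r0, r2' = r3, r3' = i


def holds(s, classes):
    """True if every register in each candidate class has the same value in state s."""
    return all(len({s[r] for r in cls}) == 1 for cls in classes)


def refine(classes, s):
    """Split every class by the register values in counterexample state s."""
    refined = []
    for cls in classes:
        for v in (0, 1):
            part = [r for r in cls if s[r] == v]
            if len(part) > 1:           # singletons are no longer candidates
                refined.append(part)
    return refined


def register_correspondence():
    # Base case: grouping by initial value makes every class hold in the initial state.
    classes = [[r for r in range(N_REGS) if INIT[r] == v] for v in (0, 1)]
    classes = [c for c in classes if len(c) > 1]
    iterations = 0
    while True:
        iterations += 1
        cex = None
        # Inductive case (K = 1): assume the classes in frame 0, check them in frame 1.
        for s in product((0, 1), repeat=N_REGS):
            if not holds(s, classes):
                continue
            for i in (0, 1):
                t = next_state(s, i)
                if not holds(t, classes):
                    cex = t
                    break
            if cex is not None:
                break
        if cex is None:
            return classes, iterations  # fixed point reached
        classes = refine(classes, cex)


print(register_correspondence())   # ([[1, 2], [0, 3]], 2)
```

Assuming {r0, r3} together with {r1, r2} is what lets simple induction prove r1 == r2 in this example.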
11. Other Observations
- Surprisingly, the following were found to be of little or no importance for speeding up the inductive prover:
  - The quality of the initial equivalence classes
  - How much simulation (semi-formal filtering) was applied
  - AIG rewriting on speculated timeframes
    - Although the AIG can be reduced by 20%, incremental SAT runs the same
  - The quality of AIG-to-CNF conversion
    - Naïve conversion (1 AIG node = 3 clauses) works just fine
- Open question: given these observations, how can this type of incremental SAT be sped up?
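The naïve conversion mentioned above corresponds to the standard Tseitin encoding of a two-input AND, which yields three clauses per AIG node. The sketch uses DIMACS-style signed integer literals, where a negative literal means complemented. The function names are hypothetical, and this is not ABC's CNF generator.

```python
from itertools import product


def and_clauses(out, a, b):
    """Tseitin clauses for out <-> (a AND b); literals are signed DIMACS-style ints."""
    return [[-out, a], [-out, b], [out, -a, -b]]


def aig_to_cnf(ands, first_var):
    """Naive conversion: AND node k (fanins as signed literals) gets variable first_var + k."""
    clauses = []
    for k, (a, b) in enumerate(ands):
        clauses.extend(and_clauses(first_var + k, a, b))
    return clauses


def satisfied(clauses, assignment):
    """assignment maps variable index -> bool."""
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses)


# Sanity check: the three clauses hold exactly when out == a AND b.
for a, b, out in product((False, True), repeat=3):
    assert satisfied(and_clauses(3, 1, 2), {1: a, 2: b, 3: out}) == (out == (a and b))

# Two AND nodes over inputs 1 and 2: var 3 = 1 & 2, var 4 = !3 & 2 -> 6 clauses.
print(aig_to_cnf([(1, 2), (-3, 2)], 3))
```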
12. Verification after PSS
- Poison and antidote are the same!
  - The same inductive prover is used
    - during synthesis, to prove sequential equivalence of registers and nodes
    - during verification, to prove sequential equivalence of registers, nodes, and POs of two circuits
- Verification is unbounded and general-case
  - No limit on the input sequence is imposed (unlike BMC)
  - No information about synthesis is passed to the verification tool
- The runtimes of synthesis and verification are comparable
- Scales to 10K-register designs due to partitioning for induction
[Figure: synthesis problem vs. equivalence checking problem]
13. Integrated SEC Flow
- The following sequence of transformations is currently applied by the integrated SEC in ABC (command dsec):
  - creating the sequential miter (miter -c)
    - PIs/POs are paired by name; if some registers have don't-care init values, they are converted by adding new PIs and muxes; all logic is represented in the form of an AIG
  - sequential sweep (scl)
    - removes logic that does not fan out into POs
  - structural register sweep (scl -l)
    - removes stuck-at-constant and combinationally-equivalent registers
  - most forward retiming (retime -M 1) (disabled by switch -r, e.g. dsec -r)
    - moves all registers forward and computes the new initial state
  - partitioned register correspondence (lcorr)
    - merges sequentially equivalent registers (completely solves SEC after retiming)
  - combinational SAT sweeping (fraig)
    - merges combinationally equivalent nodes before running signal correspondence
  - for (K = 1; K ≤ 16; K = K * 2)
    - signal correspondence (ssw) // merges seq. equivalent signals by K-step induction
    - AIG rewriting (drw) // minimizes and restructures combinational logic
    - most forward retiming // moves registers forward after logic restructuring
    - sequential AIG simulation // targets satisfiable SAT instances
  - post-processing (write_aiger)
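The sketch below covers two steps of the flow: building the sequential miter, and running random sequential simulation to hit satisfiable instances. It is a simplification. Circuits are modeled as Python step functions rather than AIGs, and POs are paired by name. The don't-care initial-value conversion is not modeled, and all names and example circuits are hypothetical. Simulation can only expose a difference; proving equivalence is the job of the inductive steps.

```python
import random


class Circuit:
    """Sequential circuit: step(state, inputs) -> (next_state, outputs)."""
    def __init__(self, init, step):
        self.init = init
        self.step = step


def make_miter(c1, c2):
    """Product machine whose single output is 1 iff a pair of same-named POs differs."""
    def step(state, inputs):
        s1, s2 = state
        n1, o1 = c1.step(s1, inputs)
        n2, o2 = c2.step(s2, inputs)
        if o1.keys() != o2.keys():
            raise ValueError("POs must be paired by name")
        differ = any(o1[name] != o2[name] for name in o1)
        return (n1, n2), {"miter": int(differ)}
    return Circuit((c1.init, c2.init), step)


def simulate_miter(miter, pi_names, cycles, seed=0):
    """Random sequential simulation; returns the first cycle where the miter fires, else None."""
    rng = random.Random(seed)
    state = miter.init
    for t in range(cycles):
        inputs = {name: rng.randint(0, 1) for name in pi_names}
        state, out = miter.step(state, inputs)
        if out["miter"]:
            return t
    return None


# Hypothetical circuits: y = running parity of x, including the current cycle.
parity = Circuit(0, lambda p, i: (p ^ i["x"], {"y": p ^ i["x"]}))
# Same behavior with a redundant duplicate register (sequentially equivalent).
parity_dup = Circuit((0, 0), lambda s, i: ((s[0] ^ i["x"], s[1] ^ i["x"]),
                                           {"y": s[0] ^ i["x"]}))
# Buggy version whose register never updates, so y is just x.
stuck = Circuit(0, lambda p, i: (p, {"y": p ^ i["x"]}))

print(simulate_miter(make_miter(parity, parity_dup), ["x"], 500))  # None
```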
14. Example of PSS in ABC
- abc 01> r iscas/blif/s38417.blif // reads in an ISCAS89 benchmark
- abc 02> st; ps // shows the AIG statistics after structural hashing
  - s38417 i/o 28/106 lat 1636 and 9238 (exor 178) lev 31
- abc 03> ssw -K 1 -v // performs one round of signal correspondence using simple induction
  - Initial fraiging time 0.27 sec
  - Simulating 9096 AIG nodes for 32 cycles ... Time 0.06 sec
  - Original AIG 9096. Init 2 frames 84. Fraig 82. Time 0.01 sec
  - Before BMC Const 5031. Class 430. Lit 9173.
  - After BMC Const 5031. Class 430. Lit 9173.
  - 0 Const 5031. Class 430. L 9173. LR 1928. NR 3140.
  - 1 Const 4883. Class 479. L 8964. LR 1554. NR 2978.
  - ...
  - 28 Const 145. Class 177. L 756. LR 198. NR 9099.
  - 29 Const 145. Class 176. L 753. LR 195. NR 9090.
  - SimWord 1. Round 2025. Mem 0.38 Mb. LitBeg 9173. LitEnd 753. (8.21%).
  - Proof 5022. Cex 2025. Fail 0. FailReal 0. C-lim 10000000. ImpRatio 0.00
  - NBeg 9096. NEnd 8213. (Gain 9.71%). RBeg 1636. REnd 1345. (Gain 17.79%).
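The percentages in the log appear to follow from the raw counts. The figure after LitEnd matches LitEnd/LitBeg, and each Gain matches the relative reduction from the begin count to the end count. This reading is inferred from the numbers, not taken from ABC documentation; a quick check:

```python
def percent(part, whole):
    """part / whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


lit_beg, lit_end = 9173, 753      # LitBeg / LitEnd from the log
n_beg, n_end = 9096, 8213         # NBeg / NEnd (AIG nodes)
r_beg, r_end = 1636, 1345         # RBeg / REnd (registers)

print(percent(lit_end, lit_beg))          # 8.21
print(percent(n_beg - n_end, n_beg))      # 9.71
print(percent(r_beg - r_end, r_beg))      # 17.79
```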
15. Experimental Results
- Public benchmarks
  - 25 test cases
    - ITC 99 (b14, b15, b17, b20, b21, b22)
    - ISCAS 89 (s13207, s35932, s38417, s38584)
    - IWLS 05 (systemcaes, systemcdes, tv80, usb_funct, vga_lcd, wb_conmax, wb_dma, ac97_ctrl, aes_core, des_area, des_perf, ethernet, i2c, mem_ctrl, pci_spoci_ctrl)
- Industrial benchmarks
  - 50 test cases
  - Nothing else is known
- Workstation
  - Intel Xeon 2-CPU 4-core, 8 GB RAM
16. ABC Scripts
- Baseline
  - choice; if; choice; if; choice; if // comb. synthesis and mapping
- Register correspondence (Reg Corr)
  - scl -l // structural register sweep
  - lcorr // register correspondence using partitioned induction
  - dsec -r // SEC
  - choice; if; choice; if; choice; if // comb. synthesis and mapping
- Signal correspondence (Sig Corr)
  - scl -l // structural register sweep
  - lcorr // register correspondence using partitioned induction
  - ssw // signal correspondence using non-partitioned induction
  - dsec -r // SEC
  - choice; if; choice; if; choice; if // comb. synthesis and mapping
17. Public Benchmarks
Columns Baseline, Reg Corr, and Sig Corr show geometric means.
18. ITC / ISCAS Benchmarks (details)
19. IWLS05 Benchmarks (details)
20. ITC / ISCAS Benchmarks (runtime)
21. IWLS05 Benchmarks (runtime)
22. Industrial Benchmarks
In the case of multiple clock domains, optimization was applied only to the domain with the largest number of registers.
23. Future
- Continue tuning for scalability
  - Speculative reduction
  - Partitioning
- Experiment with new ideas
  - Unique-state constraints
  - Interpolation when induction fails
  - Synthesizing equivalence
- Go beyond merging sequential equivalences
  - Add logic restructuring using subsets of unreachable states
  - Add retiming (improves delay on top of register/area reductions)
  - Add iteration (led to improvements in other synthesis projects)
  - etc.