Title: Codesigned On-Chip Logic Minimization
1. Codesigned On-Chip Logic Minimization
- Roman Lysecky, Frank Vahid
- Department of Computer Science and Engineering
- University of California, Riverside
- Also with the Center for Embedded Computer Systems, UC Irvine
- This work was supported in part by the National Science Foundation, the Semiconductor Research Corporation, and a Department of Education GAANN fellowship
2. Introduction (On-chip Logic Minimization)
[Figure: System-On-Chip block diagram. Labels: MEM, Proc., I, D, ARM7, DMA, MEM, On-chip Minimizer]
3. On-Chip Minimization Applications (IP Routing Table Reduction)
- IP routing table reduction
- Routing tables of large network routers have over 30,000 entries
- Fast IP routing lookup is difficult without using large hardware resources
- Ternary CAM (McAuley & Francis, 1993)
- TCAM can be used to perform routing table lookup in a single cycle
- Requires large hardware resources and has high power consumption
- Mask Extension (Liu, 2002)
- Uses two-level logic minimization to reduce the size of the routing table
- Good results, but did not consider off-chip communication
[Figure: routing lookup example. An incoming IP packet with destination IP 138.23.16.9 is looked up in the routing table (columns: Prefix, Next hop) by longest prefix match, giving Port 7]
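The longest-prefix-match lookup named in the figure can be sketched in a few lines of C. This is only an illustrative linear scan, not the TCAM or Mask Extension scheme cited above. The `route_t` type, the `IP` macro, the function names, and every table entry are assumptions made up for the example. Only the destination address 138.23.16.9 and the result Port 7 come from the slide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical routing entry: prefix, prefix length, and output port. */
typedef struct {
    uint32_t prefix;   /* network prefix, most significant byte first */
    uint8_t  length;   /* number of significant leading bits, 0..32 */
    int      port;     /* next hop, represented as an output port number */
} route_t;

#define IP(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Mask with the top `length` bits set; length 0 matches everything. */
static uint32_t prefix_mask(uint8_t length)
{
    return length == 0 ? 0u : (uint32_t)(0xFFFFFFFFu << (32 - length));
}

/* Returns the port of the longest matching prefix, or -1 if none matches. */
int longest_prefix_match(const route_t *table, size_t n, uint32_t dst)
{
    int best_len = -1;
    int best_port = -1;
    for (size_t i = 0; i < n; i++) {
        assert(table[i].length <= 32);
        uint32_t mask = prefix_mask(table[i].length);
        if ((dst & mask) == (table[i].prefix & mask) &&
            (int)table[i].length > best_len) {
            best_len = table[i].length;
            best_port = table[i].port;
        }
    }
    return best_port;
}

/* Made-up table for illustration only. */
const route_t example_table[] = {
    { IP(0, 0, 0, 0),      0, 1 },   /* default route */
    { IP(138, 23, 0, 0),  16, 3 },
    { IP(138, 23, 16, 0), 24, 7 },
};
```

With this table, looking up `IP(138, 23, 16, 9)` selects the /24 entry and returns port 7. Table reduction is possible because adjacent prefixes that share a next hop (for example two /24 entries differing only in their last prefix bit) can be merged into one shorter prefix, which is the kind of merge two-level logic minimization finds.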
4. On-Chip Minimization Applications (Access Control List Reduction)
- Access Control List (ACL)
- Used to restrict IP traffic through network routers
- ACL size can range anywhere from 300 (UCR CSE Dept.) to 10,000 (AOL) entries
- Common use is to block a particular protocol or port number to avoid attacks such as Denial of Service attacks
- ACL Minimization
- Similar approach to the one used for IP routing table reduction
- However, the order of the list must be preserved
[Figure: ACL Input Format]
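Why list order matters can be shown with a small first-match evaluator. The slide's actual ACL input format is not reproduced in this transcript, so the rule layout below (`acl_rule_t`, with 0 as a wildcard) and the implicit deny at the end of the list are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { DENY = 0, PERMIT = 1 } action_t;

/* Hypothetical rule layout; 0 is assumed to mean "any". */
typedef struct {
    uint8_t  protocol;   /* 0 = any, 6 = TCP, 17 = UDP */
    uint16_t port;       /* 0 = any */
    action_t action;
} acl_rule_t;

/* First matching rule decides; an implicit deny at the end is assumed. */
action_t acl_evaluate(const acl_rule_t *rules, size_t n,
                      uint8_t protocol, uint16_t port)
{
    assert(rules != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int proto_ok = rules[i].protocol == 0 || rules[i].protocol == protocol;
        int port_ok  = rules[i].port == 0 || rules[i].port == port;
        if (proto_ok && port_ok)
            return rules[i].action;
    }
    return DENY;
}

/* Same two rules in two different orders. */
const acl_rule_t acl_in_order[] = {
    { 17, 1434, DENY },    /* block UDP port 1434 */
    { 0,  0,    PERMIT },  /* permit everything else */
};
const acl_rule_t acl_swapped[] = {
    { 0,  0,    PERMIT },
    { 17, 1434, DENY },    /* never reached */
};
```

A UDP packet to port 1434 is denied by `acl_in_order` but permitted by `acl_swapped`, so a minimizer may merge or drop rules only when the first-match result stays the same for every packet.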
5. On-Chip Minimization Applications (Dynamic Hardware/Software Partitioning)
- Dynamic hardware/software partitioning (JIT compilation for FPGAs)
- Dynamically detects frequently executed loops and re-implements the software loops using on-chip configurable logic
- Requires logic synthesis tools to be embedded on-chip
[Figure: Warp Processor architecture. Labels: Profiler, MIPS/ARM, I, D, Dynamic Partitioning Module, Configurable Logic]
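One simple way a profiler could detect frequently executed loops is to count taken backward branches per target address. The sketch below makes that assumption. The slides do not describe the profiler's internals, so the table size, the eviction policy, and all names (`loop_profiler_t`, `profiler_on_branch`, `profiler_hottest_loop`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PROFILE_SLOTS 16u   /* assumed table size, must be a power of two */

/* Small direct-mapped table of loop-head addresses and hit counts. */
typedef struct {
    uint32_t target[PROFILE_SLOTS];
    uint32_t count[PROFILE_SLOTS];
} loop_profiler_t;

/* Called for every taken branch; only backward branches are counted,
   since a backward branch usually closes a loop. */
void profiler_on_branch(loop_profiler_t *p, uint32_t branch_pc, uint32_t target_pc)
{
    assert(p != NULL);
    if (target_pc >= branch_pc)
        return;
    uint32_t slot = (target_pc >> 2) & (PROFILE_SLOTS - 1u);
    if (p->target[slot] != target_pc) {   /* conflict: evict the old entry */
        p->target[slot] = target_pc;
        p->count[slot] = 0;
    }
    p->count[slot]++;
}

/* Returns the loop-head address with the highest count, or 0 if none. */
uint32_t profiler_hottest_loop(const loop_profiler_t *p)
{
    assert(p != NULL);
    uint32_t best = 0;
    for (uint32_t i = 1; i < PROFILE_SLOTS; i++)
        if (p->count[i] > p->count[best])
            best = i;
    return p->count[best] ? p->target[best] : 0;
}
```

The loop reported by `profiler_hottest_loop` would be the first candidate for re-implementation in configurable logic.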
6. ROCM
- On-chip Logic Minimization Requirements
- Limited data and instruction memory available
- Quality of results must still be close to optimal
- Execution time should remain reasonable
- On-chip Logic Minimization Goal
- Develop an on-chip logic minimization tool that produces acceptable results with reasonable increases in execution time while using limited memory resources
- ROCM: Riverside On-Chip Minimizer
- Two-level minimization tool
- Uses a combination of approaches from Espresso-II (Brayton et al., 1984) and Presto (Svoboda & White, 1979)
- Eliminates the need to compute the off-set, reducing memory usage
- Uses a single expand phase instead of multiple iterations
- On average only 2% larger than the optimal solution
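An expand step that never builds the off-set can be sketched as follows: a cube may be enlarged if the enlarged cube is still covered by the given cover, and that is checked by cofactoring the cover against the cube and testing the result for tautology. This is a minimal illustration of that general idea, not ROCM's actual code. The 2-bits-per-variable encoding, the 16-variable limit, `MAX_CUBES`, and all function names are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_CUBES 64   /* assumed fixed bound so no dynamic allocation is needed */

/* Positional-cube notation, 2 bits per variable (up to 16 variables):
   01 = complemented literal, 10 = true literal, 11 = don't care. */
typedef uint32_t cube_t;

static cube_t full_mask(int nvars)
{
    return nvars >= 16 ? 0xFFFFFFFFu : ((1u << (2 * nvars)) - 1u);
}

/* A 00 field means two cubes disagree on that variable. */
static bool has_empty_field(cube_t x, int nvars)
{
    for (int v = 0; v < nvars; v++)
        if (((x >> (2 * v)) & 3u) == 0u)
            return true;
    return false;
}

/* True if the n cubes cover every assignment of variables var..nvars-1. */
static bool tautology(const cube_t *cubes, int n, int nvars, int var)
{
    assert(n <= MAX_CUBES);
    if (n == 0)
        return false;
    cube_t rest = full_mask(nvars) & ~full_mask(var);
    for (int i = 0; i < n; i++)
        if ((cubes[i] & rest) == rest)   /* don't care in all remaining variables */
            return true;
    for (unsigned val = 0; val < 2; val++) {   /* Shannon split on variable var */
        cube_t sub[MAX_CUBES];
        int m = 0;
        cube_t bit = (cube_t)(1u << val) << (2 * var);
        for (int i = 0; i < n; i++)
            if (cubes[i] & bit)
                sub[m++] = cubes[i];
        if (!tautology(sub, m, nvars, var + 1))
            return false;
    }
    return true;
}

/* True if cube c is contained in the union of the cover's cubes. */
bool is_covered(cube_t c, const cube_t *cover, int n, int nvars)
{
    cube_t cof[MAX_CUBES];
    int m = 0;
    assert(n <= MAX_CUBES);
    for (int i = 0; i < n; i++) {
        if (has_empty_field(cover[i] & c, nvars))
            continue;                                   /* no intersection with c */
        cof[m++] = (cover[i] | ~c) & full_mask(nvars);  /* cofactor against c */
    }
    return tautology(cof, m, nvars, 0);
}

/* Single pass: raise each literal to don't care if the result stays covered. */
cube_t expand_cube(cube_t c, const cube_t *care, int n, int nvars)
{
    for (int v = 0; v < nvars; v++) {
        cube_t raised = c | ((cube_t)3u << (2 * v));
        if (raised != c && is_covered(raised, care, n, nvars))
            c = raised;
    }
    return c;
}
```

For the two-variable cover {a'b', a'b} (encoded as 0x5 and 0x9 with a in the low field), expanding 0x5 yields 0xD, which is the single cube a'. In a full minimizer the `care` array would hold the on-set plus the don't-care set.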
7. ROCM Results (Performance/Memory Usage)
[Figure: ROCM performance and memory usage results. Platforms: 500 MHz Sun Ultra60 and 40 MHz ARM7 (Triscend A7)]
8. Codesign ROCM (Hardware Coprocessor)
- Customized ROCM enables us to develop an efficient hardware coprocessor
- Profiled the execution of ROCM-32 and ROCM-128 using the ARM port of the SimpleScalar simulator
- Determined critical loops/functions that are suitable for implementation in hardware
- Identified six critical kernels that account for 91% of the total execution time but only 2% of the code size
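The 91% figure bounds what a coprocessor can deliver. A rough Amdahl's-law estimate follows, assuming the six kernels account for a fraction p = 0.91 of run time and are sped up by a factor s, with communication overhead ignored. The symbols p, s and S are introduced here and do not appear in the slides.

```latex
\begin{align}
S(s) &= \frac{1}{(1 - p) + \dfrac{p}{s}}, \qquad p = 0.91 \\
S_{\max} &= \lim_{s \to \infty} S(s) = \frac{1}{1 - p} = \frac{1}{0.09} \approx 11.1 \\
S(s) = 7.8 \;&\Rightarrow\; s = \frac{p}{\dfrac{1}{7.8} - (1 - p)} = \frac{0.91}{0.1282 - 0.09} \approx 23.8
\end{align}
```

Under these assumptions the best possible overall speedup is about 11X, and the 7.8X speedup reported later in the deck would correspond to the kernels themselves running roughly 24X faster in hardware. The software was modified before partitioning, so the profiled fraction may not carry over exactly.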
9. Codesign ROCM (Minimization Coprocessor)
[Figure: system block diagram. Labels: ARM7, MEM, On-Chip Minimizer]
10. Codesign ROCM (Minimization Coprocessor)
[Figure: Minimization Coprocessor block diagram. Labels: data, addr, Proc/Mem Interface, Tautology.1, IsCov, SetLit, Cofactor.1, GetLit]
11. Codesign ROCM Results (Execution Time)
12. Codesign ROCM Results (Energy Consumption)
- Average energy reduction of 59.2%
13. Codesign ROCM (Minimization Coprocessor)
- Software modifications were required to achieve a speedup of 7.8X
- Original data structures/algorithms were not suitable for hardware implementation
- Reorganized data structures
- Customized the width of data items
- Eliminated memory allocation within critical regions
- Such modifications are not automated by current hardware/software partitioning tools
14. Codesign ROCM (Minimization Coprocessor)
for (i = 0; i < F->numImplicants; i++) {
    if (!DoesIntersect(implicant, xj)) continue;
    for (k = 0; k < xj->numLiterals; k++) {
        // determine coImplicant
        ...
    }
    AddImplicant(cofactor, coImplicant);
}
Move to HW
Original C Code
15. Codesign ROCM (Minimization Coprocessor)
// determine size of cofactor initially
cofactorSize = 0;
for (i = 0; i < F->numImplicants; i++) {
    if (!DoesIntersect(implicant, xj)) continue;
    cofactorSize++;
}
// allocate all memory outside of main loop
cofactor->implicants = malloc();
for (i = 0; i < F->numImplicants; i++) {
    if (!DoesIntersect(implicant, xj)) continue;
    for (k = 0; k < xj->numLiterals; k++) {
        // additional initialization code needed for each iteration
        coImplicant = &(cofactor->implicants[index]);
        ...
    }
}
Modified C Code
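The count-then-allocate restructuring above can be shown as a complete, compilable function. This is a sketch under assumptions rather than the ROCM source: the slide elides the cover and implicant types, so `cover_t`, the 2-bits-per-variable `cube_t` encoding, and the names `intersects` and `cofactor_two_pass` are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed encoding: 2 bits per variable, 01 = complemented literal,
   10 = true literal, 11 = don't care (up to 16 variables). */
typedef uint32_t cube_t;

typedef struct {
    cube_t *implicants;
    int     numImplicants;
} cover_t;

/* Two cubes intersect if no variable field of their AND is 00. */
static bool intersects(cube_t a, cube_t b, int nvars)
{
    cube_t x = a & b;
    for (int v = 0; v < nvars; v++)
        if (((x >> (2 * v)) & 3u) == 0u)
            return false;
    return true;
}

/* Cofactor of F with respect to cube xj. Returns 0 on success, -1 if the
   allocation fails. The caller frees out->implicants. */
int cofactor_two_pass(const cover_t *F, cube_t xj, int nvars, cover_t *out)
{
    assert(nvars >= 1 && nvars <= 16);
    cube_t mask = nvars == 16 ? 0xFFFFFFFFu : ((1u << (2 * nvars)) - 1u);

    /* Pass 1: determine the size of the cofactor. */
    int size = 0;
    for (int i = 0; i < F->numImplicants; i++)
        if (intersects(F->implicants[i], xj, nvars))
            size++;

    /* Allocate all memory once, outside the main loop. */
    out->numImplicants = 0;
    out->implicants = size ? malloc((size_t)size * sizeof *out->implicants) : NULL;
    if (size && out->implicants == NULL)
        return -1;

    /* Pass 2: allocation-free loop, the part suited to a hardware kernel. */
    for (int i = 0; i < F->numImplicants; i++) {
        if (!intersects(F->implicants[i], xj, nvars))
            continue;
        out->implicants[out->numImplicants++] = (F->implicants[i] | ~xj) & mask;
    }
    return 0;
}
```

The extra counting pass costs one more sweep over the cover, but it leaves the second loop with a fixed memory footprint and no calls into the allocator, which is the property the slide identifies as necessary for moving the loop to hardware.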
16. Conclusions & Future Work
- Developed codesigned on-chip logic minimization
- Performance improvement of nearly 8X compared to the earlier software-only implementation
- Energy reduction of almost 60%
- New directions in hardware/software partitioning
- Designer effort was required to rewrite algorithms and fine-tune data structures
- Could better hardware/software partitioning tools automate this?