Codesigned On-Chip Logic Minimization

1
Codesigned On-Chip Logic Minimization

Roman Lysecky & Frank Vahid*
Department of Computer Science and Engineering
University of California, Riverside
*Also with the Center for Embedded Computer
Systems, UC Irvine
This work was supported in part by the National
Science Foundation, the Semiconductor Research
Corporation, and a Department of Education GAANN
fellowship

2
Introduction (On-Chip Logic Minimization)
[Figure: system-on-chip block diagram. Labels: MEM, Proc., I, D, ARM7, DMA, MEM, On-chip Minimizer.]
3
On-Chip Minimization Applications (IP Routing
Table Reduction)

IP routing table reduction
Routing tables of large network routers have over
30,000 entries
Fast IP routing lookup is difficult without using
large hardware resources
Ternary CAM (McAuley & Francis, 1993)
TCAM can be used to perform routing table lookup
in single cycle
Requires large resources and large power
consumption
Mask Extension (Liu, 2002)
Uses two-level logic minimization to reduce the
size of the routing table
Good results, but did not consider off-chip
communication

[Figure: routing table lookup. The destination IP of an incoming packet (138.23.16.9) is looked up in a routing table of prefix / next hop entries by longest prefix match, giving Port 7.]
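
The longest prefix match in the lookup figure can be sketched in C as a plain linear scan. This is only an illustration under stated assumptions: the `Route` struct, the helper names, and the table contents used to exercise it are hypothetical. Only the destination address 138.23.16.9 and the result "Port 7" come from the slide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical routing-table entry (not taken from the slides). */
typedef struct {
    uint32_t prefix;  /* network prefix, most significant bits first */
    uint8_t  length;  /* number of significant leading bits, 0..32 */
    int      port;    /* next-hop port */
} Route;

/* Pack a dotted-quad address a.b.c.d into one 32-bit word. */
uint32_t ip4(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | d;
}

/* Mask with the top `length` bits set; a length of 0 matches everything. */
uint32_t prefix_mask(uint8_t length) {
    assert(length <= 32);
    return length == 0 ? 0u : (uint32_t)(0xFFFFFFFFu << (32 - length));
}

/* Linear-scan longest prefix match: among all entries whose prefix
 * matches `dest`, return the port of the one with the longest prefix,
 * or -1 if no entry matches. */
int lookup(const Route *table, size_t n, uint32_t dest) {
    int best_port = -1;
    int best_len = -1;
    for (size_t i = 0; i < n; i++) {
        uint32_t mask = prefix_mask(table[i].length);
        if ((dest & mask) == (table[i].prefix & mask) &&
            (int)table[i].length > best_len) {
            best_len = table[i].length;
            best_port = table[i].port;
        }
    }
    return best_port;
}
```

With a hypothetical table holding 138.23.0.0/16 on port 3 and 138.23.16.0/24 on port 7, `lookup` returns 7 for 138.23.16.9 because the /24 entry is the longer match. Every entry removed by table reduction shortens this scan, or frees a row in a TCAM-style implementation.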
4
On-Chip Minimization Applications (Access
Control List Reduction)

Access Control List (ACL)
Used to restrict IP traffic through network
routers
ACL size can range anywhere from 300 (UCR
CSE Dept.) to 10,000 (AOL)
Common use is to block a particular protocol or
port number to avoid attacks such as Denial of
Service attacks
ACL Minimization
Similar approach as used for IP routing table
reduction
However, order of the list must be preserved

[Figure: ACL input format]
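
To see why the order of the list must be preserved, consider a first-match evaluator. The sketch below is an assumption-laden illustration, not the ACL format used in the talk: the `AclRule` layout (a ternary value/mask pair over a single key such as a port number), the `acl_eval` name, and the default-deny fallback are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { DENY = 0, PERMIT = 1 } Action;

/* Hypothetical ACL rule: ternary match on one key plus an action. */
typedef struct {
    uint32_t value;   /* bits to compare against */
    uint32_t mask;    /* 1 = compare this bit, 0 = don't care */
    Action   action;
} AclRule;

/* First-match semantics: the first rule whose cared-about bits equal
 * the key decides the outcome. Falling through to DENY when nothing
 * matches is an assumption made for this sketch. */
Action acl_eval(const AclRule *rules, size_t n, uint32_t key) {
    assert(rules != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if ((key & rules[i].mask) == (rules[i].value & rules[i].mask)) {
            return rules[i].action;
        }
    }
    return DENY;
}
```

With the rules "deny port 80" followed by "permit anything", key 80 is denied. Swapping the two rules permits it, because the catch-all now matches first. A minimizer that merges or reorders rules therefore has to keep the first-match outcome of every key unchanged.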
5
On-Chip Minimization Applications (Dynamic
Hardware/Software Partitioning)

Dynamic hardware/software partitioning
(JIT compilation for FPGAs)
Dynamically detects frequently executed loops and
re-implements those software loops using on-chip
configurable logic
Requires logic synthesis tools to be embedded on-chip

[Figure: warp processor architecture. Labels: Profiler, MIPS/ARM, I, D, Dynamic Partitioning Module, Configurable Logic.]
6
ROCM

On-chip Logic Minimization Requirements
Limited data and instruction memory available
Quality of results must still be close to optimal
Execution time should remain reasonable
On-chip Logic Minimization Goal
Focus on developing an on-chip logic minimization
tool that produces acceptable results with
reasonable increases in execution time while
using limited memory resources
ROCM: Riverside On-Chip Minimizer
Two-level minimization tool
Utilizes a combination of approaches from
Espresso-II (Brayton et al., 1984) and Presto
(Svoboda & White, 1979)
Eliminates the need to compute the off-set to
reduce memory usage
Utilizes a single expand phase instead of
multiple iterations
On average, results are only 2% larger than the optimal solution

7
ROCM Results (Performance/Memory Usage)
[Figure: ROCM performance and memory usage results. Platforms: 500 MHz Sun Ultra60 and 40 MHz ARM7 (Triscend A7).]
8
Codesign ROCM (Hardware Coprocessor)

Customized ROCM enables us to develop an
efficient hardware coprocessor
Profiled the execution of ROCM-32 and ROCM-128
using the ARM port of the SimpleScalar simulator
Determined critical loops/functions that are
suitable for implementation in hardware
Identified six critical kernels that comprised
91% of the total execution time but only 2% of
the code size

9
Codesign ROCM (Minimization Coprocessor)

[Figure: system block diagram. Labels: ARM7, MEM, On-Chip Minimizer.]
10
Codesign ROCM (Minimization Coprocessor)
[Figure: minimization coprocessor. A Proc/Mem interface (data, addr) connects to kernel blocks labeled Tautology.1, IsCov, SetLit, Cofactor.1, GetLit.]
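
Kernels such as GetLit, SetLit, and IsCov map naturally onto hardware when a cube is stored as a fixed-width bit vector. The slides do not show ROCM's actual data layout, so the following is only a sketch under an assumed encoding: two bits per variable in the style of positional cube notation, packed into one 64-bit word. The function names echo the block labels in the figure, but their bodies are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 2-bit literal encoding (not confirmed by the slides). */
enum { LIT_EMPTY = 0, LIT_ZERO = 1, LIT_ONE = 2, LIT_DC = 3 };

typedef uint64_t Cube;   /* assumed: up to 32 variables per word */
#define MAX_VARS 32u

/* Read the 2-bit literal field of variable `var`. */
unsigned get_lit(Cube c, unsigned var) {
    assert(var < MAX_VARS);
    return (unsigned)((c >> (2u * var)) & 3u);
}

/* Return a copy of `c` with the field of variable `var` set to `lit`. */
Cube set_lit(Cube c, unsigned var, unsigned lit) {
    assert(var < MAX_VARS && lit <= LIT_DC);
    c &= ~((Cube)3 << (2u * var));          /* clear the old field */
    return c | ((Cube)lit << (2u * var));   /* write the new one */
}

/* Cube `a` covers cube `b` when every bit set in `b` is also set in
 * `a`, i.e. each literal of `b` is contained in the literal of `a`. */
int is_cov(Cube a, Cube b) {
    return (a | b) == a;
}
```

Under this encoding each operation reduces to a handful of shifts, masks, and comparisons on a word of known width. That is the kind of logic that is cheap to implement as a coprocessor datapath, whereas a pointer-based software structure is not.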
11
Codesign ROCM Results (Execution Time)

Average speedup of 7.8X
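
As a rough sanity check, the reported speedup can be related to the profiling result (six kernels accounting for 91% of execution time) through Amdahl's law. This is a back-of-the-envelope estimate, not a figure from the talk. It assumes the 91% fraction holds for the benchmark average, that only those kernels are accelerated, and that processor/coprocessor communication overhead is ignored. Here $f$ is the accelerated fraction, $s$ the speedup of the kernels themselves, and $S$ the overall speedup.

```latex
\begin{align*}
S &= \frac{1}{(1 - f) + \dfrac{f}{s}}, \qquad f = 0.91 \\[4pt]
\lim_{s \to \infty} S &= \frac{1}{1 - 0.91} \approx 11.1 \\[4pt]
S = 7.8 \;&\Rightarrow\; \frac{0.91}{s} = \frac{1}{7.8} - 0.09 \approx 0.0382
\;\Rightarrow\; s \approx 23.8
\end{align*}
```

Under these assumptions the overall speedup could not exceed about 11X, and an overall 7.8X would correspond to the kernels running roughly 24 times faster in hardware than in software.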

12
Codesign ROCM Results (Energy Consumption)

Average energy reduction of 59.2%

13
Codesign ROCM (Minimization Coprocessor)

Software modifications were required to achieve
the speedup of 7.8X
Original data structures/algorithms were not suitable for
hardware implementation
Reorganized data structures
Customized width of data items
Eliminated memory allocation within critical
regions
Not automated by current hardware/software
partitioning tools

14
Codesign ROCM (Minimization Coprocessor)
    for (i = 0; i < F->numImplicants; i++) {
        if (!DoesIntersect(implicant, xj)) continue;
        for (k = 0; k < xj->numLiterals; k++) {
            // determine coImplicant
            ...
        }
        AddImplicant(cofactor, coImplicant);
    }
[Callout: Move to HW]
Original C Code
15
Codesign ROCM (Minimization Coprocessor)
    // determine size of cofactor initially
    cofactorSize = 0;
    for (i = 0; i < F->numImplicants; i++) {
        if (!DoesIntersect(implicant, xj)) continue;
        cofactorSize++;
    }
    // allocate all memory outside of main loop
    cofactor->implicants = malloc(...);
    for (i = 0; i < F->numImplicants; i++) {
        if (!DoesIntersect(implicant, xj)) continue;
        for (k = 0; k < xj->numLiterals; k++) {
            // additional initialization code needed for each iteration
            coImplicant = (cofactor->implicants[index]);
            ...
        }
    }

Modified C Code
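
The count-then-allocate restructuring on this slide can be made concrete with a small self-contained sketch. It is not ROCM's code. The `Cover` struct, the 2-bit-per-variable `Implicant` encoding (01 = 0, 10 = 1, 11 = don't care), the extra `numVars` parameter, and the bodies of `DoesIntersect` and `Cofactor` are assumptions made so that the example compiles and runs. Only the overall two-pass shape follows the slide.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t Implicant;       /* assumed encoding: 2 bits per variable */

typedef struct {
    Implicant *implicants;
    int        numImplicants;
} Cover;                          /* hypothetical cover type */

/* Two cubes intersect when no 2-bit field of a & b is 00. */
int DoesIntersect(Implicant a, Implicant b, int numVars) {
    Implicant both = a & b;
    for (int v = 0; v < numVars; v++) {
        if (((both >> (2 * v)) & 3u) == 0) return 0;
    }
    return 1;
}

/* Cofactor of F with respect to cube xj. Returns 0 on success and
 * -1 if allocation fails. The caller frees cofactor->implicants. */
int Cofactor(const Cover *F, Implicant xj, int numVars, Cover *cofactor) {
    assert(numVars >= 0 && numVars <= 32);

    /* Pass 1: determine the size of the cofactor up front. */
    int cofactorSize = 0;
    for (int i = 0; i < F->numImplicants; i++) {
        if (!DoesIntersect(F->implicants[i], xj, numVars)) continue;
        cofactorSize++;
    }

    /* Allocate all memory once, outside the main loop. */
    cofactor->numImplicants = 0;
    cofactor->implicants =
        malloc((size_t)(cofactorSize ? cofactorSize : 1) * sizeof(Implicant));
    if (cofactor->implicants == NULL) return -1;

    /* Pass 2: allocation-free main loop (the hardware candidate). */
    int index = 0;
    for (int i = 0; i < F->numImplicants; i++) {
        if (!DoesIntersect(F->implicants[i], xj, numVars)) continue;
        Implicant *coImplicant = &cofactor->implicants[index++];
        *coImplicant = F->implicants[i];
        for (int k = 0; k < numVars; k++) {
            Implicant field = (Implicant)3 << (2 * k);
            /* A variable fixed by xj becomes don't-care in the result. */
            if ((xj & field) != field) *coImplicant |= field;
        }
    }
    cofactor->numImplicants = cofactorSize;
    return 0;
}
```

The first pass repeats the intersection test, so the software does slightly more work than the original loop. In exchange, the second loop touches only preallocated memory, which is what lets it move into a coprocessor that cannot call `malloc`.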
16
Conclusions & Future Work

Developed codesigned on-chip logic minimization
Performance improvement of nearly 8X compared to
the earlier software-only implementation
Energy reduction of almost 60%
New directions in hardware/software partitioning
Designer effort was required to rewrite
algorithms and fine tune data structures
Could better hardware/software partitioning tools
automate this?
