Codesign Extended Applications - PowerPoint PPT Presentation

Brian Grattan, Greg Stitt, Frank Vahid*, Dept. of Computer Science & Engineering, University of California, Riverside

Slides: 22
Provided by: FrankV162
Learn more at: http://www.cs.ucr.edu

1
Codesign Extended Applications
  • Brian Grattan, Greg Stitt, Frank Vahid
  • Dept. of Computer Science & Engineering
  • University of California, Riverside
  • Also with the Center for Embedded Computer
    Systems at UC Irvine
  • This work was supported in part by the National
    Science Foundation and by NEC CC Research Labs

2
Outline
  • Introduction: Hardware/Software Partitioning
  • And the common assumption of a single
    specification
  • Different Algorithms in Hardware/Software
  • Codesign Extended Applications
  • Experiments
  • Future Work and Conclusions

3
Introduction: Hw/Sw Partitioning
  • Hw/sw partitioning can speed up software
  • Shown by numerous researchers
  • E.g., Balboni, Fornaciari, Sciuto (CODES'96);
    Eles, Peng, Kuchcinski, Doboli (DAES'97); Gajski,
    Vahid, Narayan, Gong (Prentice-Hall, 1997); Grode,
    Knudsen, Madsen (DATE'98); many others
  • Speedups of 1.5x to 10x are common
  • Some examples, like image processing, get
    100x-800x speedups
  • E.g., Cameron project (FCCM'02)
  • Can reduce energy too
  • E.g.
  • Henkel, Li (CODES'98)
  • Wan, Ichikawa, Lidsky, Rabaey (CICC'98)
  • Stitt, Grattan, Villarreal, Vahid (FCCM'02)
  • 60-80% energy savings measured on real
    single-chip uP/FPGA devices

4
Hw/Sw Partitioning on Single-Chip Platforms
  • Numerous single-chip commercial devices combine a uP
    and an FPGA
  • Triscend E5 (shown)
  • Triscend A7
  • Atmel FPSLIC
  • Xilinx Virtex II Pro
  • Altera Excalibur
  • More are sure to come
  • These devices make hw/sw partitioning even more attractive

[Figure: Triscend E5 block diagram: uP and peripherals, cache/memory, configurable logic]
5
Hw/Sw Partitioning: Commercial Tools Evolving
  • Commercial products evolving
  • Synopsys Nimble compiler (2000) attempt
  • Proceler
  • Microprocessor Report 2001 Technology of the
    Year Award
  • Others coming

6
Hw/Sw Partitioning: Single-Spec Assumption
  • Assumption: start from a single specification
  • Typically sw source
  • Partitioning
  • Find critical sw kernels, map some to hw
  • This assumption is made in most research efforts
    as well as commercial tools

[Figure: Specification -> Hw/sw partitioner -> Sw (compilation -> binaries) and Hw (synthesis -> netlists)]
7
Digital Camera Example
  • Developed with intent of exploring hw/sw
    tradeoffs
  • Captures images, compresses, uploads to PC
  • Soon found that a single specification wasn't
    reasonable
  • Two key functions had different hw/sw algorithms
  • CRC
  • DCT

[Figure: digital camera block diagram: CCD, CCD pre-processor, DCT, Huffman encoder, CRC calculation, controller, communications]
8
Digital Camera Example
  • A single spec results in a weak hw design
  • We would have written CRC and DCT differently had
    we known they'd be mapped to hw
  • Yet, we'd keep the original algorithms if they
    ended up in software

[Figure: Spec (DCT, Huffman, CRC, CCD, Ctrl) -> Hw/sw partitioner -> Sw (Huff., CCD, Ctrl) compiled to binaries; Hw (CRC, DCT) synthesized to netlists, marked "Weak"]
9
Different Algorithms in Hw vs. Sw
  • The single-specification assumption doesn't
    always hold
  • Key observation:
  • Designers often use very different algorithms
    depending on whether a behavior is mapped to
    hardware or to software
  • Widely known by designers
  • In textbooks
  • Also known in parallel processing: sequential
    vs. parallel algorithms

10
Different Algorithms: Sorting Example
  • Suppose desired behavior fills a buffer, sorts
    the buffer, and transmits the sorted list
  • Fill()
  • Sort()
  • Transmit()
  • Sort() in software: QuickSort
  • Simple and fast in sw
  • Poor in hw; can't be parallelized well
  • Sort() in hardware: parallel Mergesort
  • Very fast in hardware
  • Slow in sw (if sequential) due to overhead
  • Derive one from the other?

[Figure: a single Quicksort block (software) vs. a tree of parallel merge-sort (MS) units (hardware)]
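
To make the contrast concrete, here is a minimal bottom-up merge sort in C, a sketch assuming an integer buffer and a caller-supplied scratch array; the names `merge` and `merge_sort` are illustrative, not from the original design. Within each pass every merge touches a disjoint pair of runs, so the merges are independent: hardware could instantiate one merge unit (MS) per pair and run them all at once, whereas sequential software simply executes them one after another.

```c
#include <stddef.h>
#include <string.h>

/* Merge the sorted runs a[lo..mid) and a[mid..hi) into tmp[lo..hi). */
static void merge(const int *a, int *tmp, size_t lo, size_t mid, size_t hi)
{
    size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i < mid) tmp[k++] = a[i++];
    while (j < hi)  tmp[k++] = a[j++];
}

/* Bottom-up merge sort; tmp must hold at least n ints. */
void merge_sort(int *a, int *tmp, size_t n)
{
    for (size_t width = 1; width < n; width *= 2) {
        /* The merges in this loop touch disjoint ranges, so they are
           independent: a hardware version could do them in parallel. */
        for (size_t lo = 0; lo < n; lo += 2 * width) {
            size_t mid = (lo + width < n) ? lo + width : n;
            size_t hi  = (lo + 2 * width < n) ? lo + 2 * width : n;
            merge(a, tmp, lo, mid, hi);
        }
        memcpy(a, tmp, n * sizeof *a);   /* next pass reads the merged runs */
    }
}
```

For example, sorting {5, 3, 9, 1, 7} with a 5-element scratch buffer leaves the array as {1, 3, 5, 7, 9}.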

11
Different Algorithms: CRC Example
  • CRC: Cyclic Redundancy Check
  • Used for error checking during communication;
    stronger than parity
  • Mathematically, divides the data by a constant
    (the generator polynomial, using modulo-2
    arithmetic) and keeps the remainder

 
The main function calls crc() with parameters init_crc (initial value), data (pointer to the data), len (length of the data), and jinit (initialization options); crc() returns the CRC value for the given data.
[Figure: crc/data/data/data]
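
The "divide and keep the remainder" description can be sketched as a bit-serial loop. This sketch assumes the CRC-16-CCITT generator polynomial 0x1021 (the polynomial used by the Numerical Recipes routine that the software version cites, and consistent with the bit equations shown for the hardware version), MSB-first processing, and 0-based indexing; `crc16_bitwise` and `CRC16_POLY` are hypothetical names. Each data byte is XORed into the top of a 16-bit register, then eight shift-and-conditionally-XOR steps perform the modulo-2 division.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC16_POLY 0x1021u  /* assumed: CRC-16-CCITT generator polynomial */

/* Bit-serial CRC: modulo-2 division of the message by the generator
   polynomial; the 16-bit register ends up holding the remainder. */
uint16_t crc16_bitwise(uint16_t crc, const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8);        /* bring in next byte, MSB first */
        for (int b = 0; b < 8; b++) {
            if (crc & 0x8000u)                  /* top bit set: subtract (XOR) */
                crc = (uint16_t)((crc << 1) ^ CRC16_POLY);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}
```

With an initial value of 0 this variant is commonly known as CRC-16/XMODEM, and the ASCII string "123456789" yields 0x31C3; with an initial value of 0xFFFF it yields 0x29B1.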
12
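Different Algorithms: CRC in Hardware
  /* bit(n, v) returns bit n of v; uchar is unsigned char */
  unsigned short crc_hw(unsigned short init_crc, uchar *data,
                        unsigned long len, short jinit)
  {
      unsigned short j, crc_value = init_crc;
      unsigned short new_crc_value;
      if (jinit >= 0) crc_value = ((uchar) jinit) | (((uchar) jinit) << 8);
      for (j = 1; j <= len; j++) {      /* 1-based data, as in the sw version */
          new_crc_value = bit(4,data[j]) ^ bit(0,data[j])
                        ^ bit(8,crc_value) ^ bit(12,crc_value);        /* bit 0 */
          new_crc_value = new_crc_value | ((bit(5,data[j]) ^ bit(1,data[j])
                        ^ bit(9,crc_value) ^ bit(13,crc_value)) << 1);  /* bit 1 */
          new_crc_value = new_crc_value | ((bit(6,data[j]) ^ bit(2,data[j])
                        ^ bit(10,crc_value) ^ bit(14,crc_value)) << 2); /* bit 2 */
          /* ... continue for bits 3 through 7 ... */
          /* ... */
          crc_value = new_crc_value;    /* carry the CRC into the next byte */
      }
      return (crc_value);
  }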
  • Hardware version
  • Knowing the generator polynomial, one can
    derive the XOR equation for each individual bit
  • Each new CRC bit is the XOR of selected data bits
    and bits of the previous CRC value
  • Synthesizes to hw very nicely, but extracting
    individual bits and shifting are inefficient in sw
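
A complete, runnable version of the per-bit approach might look like the sketch below, assuming the same CRC-16-CCITT polynomial (0x1021) with MSB-first data. The equations for bits 0 to 2 match the slide; the remaining ones were derived from the polynomial rather than taken from the authors' code, and they should agree with the bit-serial definition of the CRC. `crc16_hw_step`, `crc16_hw_style`, and the `x[]`/`n[]` arrays are illustrative names. Every output bit is a fixed XOR of input bits (a shallow XOR network in hardware), while software pays for 24 bit extractions and 16 reassembly shifts per byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Extract bit n of v (assumed semantics of the slide's bit() helper). */
static unsigned bit(unsigned n, unsigned v) { return (v >> n) & 1u; }

/* One byte of CRC-16-CCITT (poly 0x1021, MSB first) written as per-bit
   XOR equations. x[i] = d[i] ^ c[i+8] is the byte entering the divider. */
static uint16_t crc16_hw_step(uint16_t c, uint8_t d)
{
    unsigned x[8], n[16];
    for (unsigned i = 0; i < 8; i++)
        x[i] = bit(i, d) ^ bit(i + 8, c);

    n[0]  = x[0] ^ x[4];            /* = d4^d0^c8^c12, as on the slide */
    n[1]  = x[1] ^ x[5];            /* = d5^d1^c9^c13 */
    n[2]  = x[2] ^ x[6];            /* = d6^d2^c10^c14 */
    n[3]  = x[3] ^ x[7];
    n[4]  = x[4];
    n[5]  = x[0] ^ x[4] ^ x[5];
    n[6]  = x[1] ^ x[5] ^ x[6];
    n[7]  = x[2] ^ x[6] ^ x[7];
    n[8]  = bit(0, c) ^ x[3] ^ x[7];
    n[9]  = bit(1, c) ^ x[4];
    n[10] = bit(2, c) ^ x[5];
    n[11] = bit(3, c) ^ x[6];
    n[12] = bit(4, c) ^ x[0] ^ x[4] ^ x[7];
    n[13] = bit(5, c) ^ x[1] ^ x[5];
    n[14] = bit(6, c) ^ x[2] ^ x[6];
    n[15] = bit(7, c) ^ x[3] ^ x[7];

    uint16_t out = 0;
    for (unsigned k = 0; k < 16; k++)
        out |= (uint16_t)(n[k] << k);   /* reassemble the 16 "wires" */
    return out;
}

/* Apply the per-byte equations across a buffer (0-based indexing). */
uint16_t crc16_hw_style(uint16_t crc, const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len; i++)
        crc = crc16_hw_step(crc, data[i]);
    return crc;
}
```

As a check, `crc16_hw_style(0, (const uint8_t *)"123456789", 9)` returns 0x31C3, the standard CRC-16/XMODEM check value.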

13
Different Algorithms: CRC in Software
  • Software version
  • Before doing any calculations, build an
    initialization table holding the CRC of each of
    the 256 possible byte values
  • Index the table with the data byte XORed with the
    CRC's high byte; two XORs and one lookup per byte
  • Requires table lookups, but faster for a
    sequential calculation

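  /* Source: Numerical Recipes in C. uchar is unsigned char;
     HIBYTE/LOBYTE give the high/low byte of a 16-bit value. */
  unsigned short crc_sw(unsigned short init_crc, uchar *data,
                        unsigned long len, short jinit)
  {
      unsigned short initialize_table(unsigned short crc, unsigned char one_char);
      static unsigned short icrctb[256];
      static int init = 0;
      unsigned short tmp1, j, crc_value = init_crc;
      if (!init) {                      /* build the 256-entry table once */
          init = 1;
          for (j = 0; j <= 255; j++)
              icrctb[j] = initialize_table(j << 8, (uchar) 0);
      }
      if (jinit >= 0) crc_value = ((uchar) jinit) | (((uchar) jinit) << 8);
      for (j = 1; j <= len; j++) {      /* 1-based, Numerical Recipes convention */
          tmp1 = data[j] ^ HIBYTE(crc_value);                   /* XOR 1 */
          crc_value = icrctb[tmp1] ^ (LOBYTE(crc_value) << 8);  /* XOR 2 */
      }
      return (crc_value);
  }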
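
A self-contained, 0-indexed sketch of the table-driven approach follows, again assuming the CRC-16-CCITT polynomial 0x1021. `crc16_table_init`, `crc16_table`, and `crc_table` are hypothetical names; the table entries are computed with a bit-serial loop, playing the role of the `initialize_table` helper above. Per byte, the software does one lookup and two XORs instead of evaluating sixteen bit-level equations.

```c
#include <stddef.h>
#include <stdint.h>

static uint16_t crc_table[256];

/* Precompute the CRC of each possible byte value (done once). */
static void crc16_table_init(void)
{
    for (unsigned j = 0; j < 256; j++) {
        uint16_t r = (uint16_t)(j << 8);
        for (int b = 0; b < 8; b++)
            r = (r & 0x8000u) ? (uint16_t)((r << 1) ^ 0x1021u)
                              : (uint16_t)(r << 1);
        crc_table[j] = r;
    }
}

/* Table-driven CRC-16-CCITT: per byte, one lookup and two XORs. */
uint16_t crc16_table(uint16_t crc, const uint8_t *data, size_t len)
{
    static int initialized = 0;          /* lazy init, not thread-safe */
    if (!initialized) {
        crc16_table_init();
        initialized = 1;
    }
    for (size_t i = 0; i < len; i++) {
        uint8_t idx = (uint8_t)(data[i] ^ (crc >> 8));   /* XOR 1: data ^ high byte */
        crc = (uint16_t)(crc_table[idx] ^ (crc << 8));   /* XOR 2: low byte moves up */
    }
    return crc;
}
```

This produces the same values as a bit-serial CRC with the same polynomial and initial value (for example, 0x31C3 for "123456789" starting from 0), which is what lets a CEA swap the two versions freely.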
14
Different Algorithms -- DCT
  • DCT: Discrete Cosine Transform
  • Computationally intensive, numerous matrix
    multiplies
  • Accounts for perhaps 70% of JPEG encoding time
  • Dozens of possible algorithms
  • Best algorithm depends largely on computational
    resources
  • Certainly different for sw and hw
  • Doing multiplications in floating-point vs.
    fixed-point
  • Multiplication by a constant can be efficiently
    mapped to hardware, but accuracy will be lost by
    not using floating-point
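
As an illustration of the fixed- vs. floating-point tradeoff, the sketch below multiplies by one DCT constant, cos(pi/4) ~ 0.70711, in two ways. The Q12 scaling (constant 2896 ~ 0.70711 * 4096), the unsigned 8-bit sample range, and the function names are assumptions for illustration. Because 2896 = 2048 + 512 + 256 + 64 + 16, the fixed-point product needs only shifts and adds, the kind of constant multiplier that maps cheaply to hardware; the price is rounding the constant and dropping the fractional bits.

```c
#include <stdint.h>

/* cos(pi/4), one of the DCT's cosine constants. */
#define COS_PI_4 0.70710678118654752

/* Software style: a floating-point multiply. */
double mul_cos_pi4_float(uint32_t x)
{
    return (double)x * COS_PI_4;
}

/* Hardware style: Q12 fixed point with 2896/4096 ~ cos(pi/4).
   Since 2896 = 2^11 + 2^9 + 2^8 + 2^6 + 2^4, the constant multiply is
   five shifts and four adds (no general multiplier). Adding 2048 rounds
   to nearest before >> 12 drops the 12 fractional bits.
   Assumes x is an 8-bit sample (0..255), so nothing overflows. */
uint32_t mul_cos_pi4_fixed(uint32_t x)
{
    uint32_t p = (x << 11) + (x << 9) + (x << 8) + (x << 6) + (x << 4);
    return (p + 2048u) >> 12;
}
```

For example, `mul_cos_pi4_fixed(100)` returns 71, while the floating-point product is about 70.71; over all 8-bit inputs the two results differ by less than 0.52.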

15
Codesign Extended Applications (CEAs)
  main() { ... crc() ... }

  unsigned short crc()
  {
  #ifdef cea_crc_hw
      return crc_hw();
  #else
      return crc_sw();
  #endif
  }

  gcc -Dcea_crc_hw main.c
  • Basic idea
  • Write two versions of certain functions
  • Only the critical functions, and
  • Only those with different sw and hw algorithms
  • Typically only a handful of these
  • Most time is spent in just a few critical
    functions
  • Include both function versions in the
    specification
  • But use compiler flags to include either sw or hw
    version
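
A compilable sketch of the CEA idea: both versions of a critical function live in the source, and the preprocessor flag `cea_crc_hw` (from the slide) selects one at build time. Here the "hardware" version is only a bit-serial software stand-in for what would really drive the synthesized circuit, and `crc_impl` and `crc_version` are hypothetical names; the CRC-16-CCITT polynomial 0x1021 is also an assumption. Building with `gcc main.c` selects the table-driven software version, `gcc -Dcea_crc_hw main.c` selects the other, and either way `crc()` returns the same value.

```c
#include <stddef.h>
#include <stdint.h>

#ifdef cea_crc_hw
/* Hardware version. Stand-in: a bit-serial loop; in a real CEA this
   would hand the data to the CRC circuit in the configurable logic. */
static uint16_t crc_impl(uint16_t crc, const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8);
        for (int b = 0; b < 8; b++)
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
    }
    return crc;
}
#else
/* Software version: table-driven, one lookup and two XORs per byte. */
static uint16_t crc_impl(uint16_t crc, const uint8_t *data, size_t len)
{
    static uint16_t table[256];
    static int ready = 0;
    if (!ready) {
        for (unsigned j = 0; j < 256; j++) {
            uint16_t r = (uint16_t)(j << 8);
            for (int b = 0; b < 8; b++)
                r = (r & 0x8000u) ? (uint16_t)((r << 1) ^ 0x1021u)
                                  : (uint16_t)(r << 1);
            table[j] = r;
        }
        ready = 1;
    }
    for (size_t i = 0; i < len; i++)
        crc = (uint16_t)(table[(uint8_t)(data[i] ^ (crc >> 8))] ^ (crc << 8));
    return crc;
}
#endif

/* The rest of the application calls crc() and never sees the choice. */
uint16_t crc(uint16_t init, const uint8_t *data, size_t len)
{
    return crc_impl(init, data, len);
}

/* Reports which version this build contains. */
const char *crc_version(void)
{
#ifdef cea_crc_hw
    return "hw";
#else
    return "sw";
#endif
}
```

Only the two `crc_impl` bodies differ between builds; callers of `crc()` are untouched, which is what keeps the partitioning script's job down to rebuilding with different flags.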

16
CEAs when using C/C++ and VHDL
VHDL code:
  if (rst = '1') then
    crc  <= "0000000000000000";
    done <= '0';
  elsif (clk'event and clk = '1') then
    if (enable = '1') then
      if done = '0' then
        crc  <= nextCRC16_D8(input, crc);
        done <= '1';
      end if;
    else
      done   <= '0';
      output <= crc;
    end if;
  end if;
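  • C code
    crc_hw(inputs)
    {
        /* Hardware crc... */
        for (j = 1; j <= len; j++) {
            TSHORT(to_hw) = data[j];   /* send the next data byte to the hw */
            TBYTE(enable) = 1;         /* pulse enable: hw folds in one byte */
            TBYTE(enable) = 0;
        }
        crc_value = TSHORT(result);    /* read back the final CRC */
        return (crc_value);
    }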
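
To show the handshake that the C and VHDL code above implement, the sketch below replaces the memory-mapped registers with a plain C struct and models the VHDL process as a function called once per clock edge. Everything here is an assumption for illustration: `crc_dev`, `dev_reset()`, `dev_clock()`, `bus_write()`, and `crc_hw_driver()` are hypothetical stand-ins for the TSHORT/TBYTE accesses, `next_crc16_d8()` plays the role of `nextCRC16_D8` using the CRC-16-CCITT polynomial, and three clock edges per bus write is an arbitrary choice. A byte is folded in on the first edge where `enable` is 1 and `done` is 0, so pulsing `enable` high then low advances the CRC by exactly one byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the CRC peripheral (all registers 16-bit). */
typedef struct {
    uint16_t to_hw;   /* data register, written by software    */
    uint16_t enable;  /* control register, written by software */
    uint16_t output;  /* result register, read by software     */
    uint16_t crc;     /* internal state (VHDL signal crc)      */
    uint16_t done;    /* handshake flag (VHDL signal done)     */
} crc_dev;

/* Stand-in for nextCRC16_D8: fold one data byte into the CRC,
   assuming CRC-16-CCITT (polynomial 0x1021, MSB first). */
static uint16_t next_crc16_d8(uint8_t d, uint16_t c)
{
    c ^= (uint16_t)(d << 8);
    for (int b = 0; b < 8; b++)
        c = (c & 0x8000u) ? (uint16_t)((c << 1) ^ 0x1021u)
                          : (uint16_t)(c << 1);
    return c;
}

/* The rst = '1' branch of the VHDL process. */
static void dev_reset(crc_dev *h)
{
    h->to_hw = h->enable = h->output = 0;
    h->crc = 0;
    h->done = 0;
}

/* One rising clock edge of the VHDL process. */
static void dev_clock(crc_dev *h)
{
    if (h->enable == 1) {
        if (h->done == 0) {          /* fold in the byte only once */
            h->crc = next_crc16_d8((uint8_t)h->to_hw, h->crc);
            h->done = 1;
        }
    } else {
        h->done = 0;
        h->output = h->crc;          /* publish the current CRC */
    }
}

/* A processor bus write; assume a few clock edges (here 3) pass
   before the next access. */
static void bus_write(crc_dev *h, uint16_t *reg, uint16_t value)
{
    *reg = value;
    for (int i = 0; i < 3; i++)
        dev_clock(h);
}

/* Software side, mirroring the slide's crc_hw() (0-based indexing). */
uint16_t crc_hw_driver(crc_dev *h, const uint8_t *data, size_t len)
{
    for (size_t j = 0; j < len; j++) {
        bus_write(h, &h->to_hw, data[j]);  /* TSHORT(to_hw) = data[j] */
        bus_write(h, &h->enable, 1);       /* TBYTE(enable) = 1       */
        bus_write(h, &h->enable, 0);       /* TBYTE(enable) = 0       */
    }
    return h->output;                      /* TSHORT(result)          */
}
```

Because `done` is cleared only when `enable` returns to 0, the extra clock edges during the enable pulse do not fold the same byte in twice; removing the `done` check from `dev_clock()` would corrupt the result.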

17
CEAs Enable Hw/Sw Partitioning Tool
  • Traditional hw/sw partitioner
  • Compiler, estimators, search heuristics,
    technology files, etc.
  • Drawback: heavy impact on tool flow
  • CEAs plus platforms result in a simple partitioner
  • Script uses existing compiler, synthesis, and
    evaluation (simulation or physical measurement)
  • Drawbacks: must write two versions of critical
    functions; script may use a simpler search function
  • Different partitioners for different domains

[Figure, left: traditional flow: Specification -> Hw/sw partitioner (essentially a compiler, search heuristic, and estimator; a heavy-duty tool) -> Sw compilation to binaries and Hw synthesis to netlists. Right: CEA flow: CEA -> Script (search heuristic and tool control; a lightweight tool) -> Sw compilation to binaries and Hw synthesis to netlists -> Evaluator]
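
The lightweight script's search can be as simple as trying every hw/sw assignment of the CEA functions. The sketch below assumes exhaustive search over a bitmask (feasible only for the handful of functions that CEAs target), builds one `-Dcea_<name>_hw` flag per function mapped to hardware following the slide's `cea_crc_hw` convention, and hands each configuration to a caller-supplied evaluator. The evaluator interface, `cea_flags`, `cea_search`, and flag names such as `cea_dct_hw` are hypothetical; a real evaluator would compile, synthesize, and then simulate or physically measure the design.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Build the -D flags for one configuration: bit i of mask set means
   CEA function names[i] uses its hardware version. */
void cea_flags(const char *const names[], int n, unsigned mask,
               char *buf, size_t size)
{
    size_t used = 0;
    if (size == 0)
        return;
    buf[0] = '\0';
    for (int i = 0; i < n; i++) {
        if (!(mask & (1u << i)))
            continue;                         /* names[i] stays in software */
        int w = snprintf(buf + used, size - used, "%s-Dcea_%s_hw",
                         used ? " " : "", names[i]);
        if (w < 0 || (size_t)w >= size - used)
            return;                           /* out of space: truncated */
        used += (size_t)w;
    }
}

/* Caller-supplied evaluator: builds and measures one configuration and
   returns a cost such as energy or cycles (hypothetical interface). */
typedef double (*cea_eval_fn)(const char *flags, void *ctx);

/* Exhaustive search over all 2^n hw/sw assignments (n must stay small);
   returns the mask with the lowest cost. */
unsigned cea_search(const char *const names[], int n,
                    cea_eval_fn eval, void *ctx)
{
    char flags[256];
    unsigned best_mask = 0;
    double best_cost = 0.0;
    for (unsigned mask = 0; mask < (1u << n); mask++) {
        cea_flags(names, n, mask, flags, sizeof flags);
        double cost = eval(flags, ctx);
        if (mask == 0 || cost < best_cost) {
            best_cost = cost;
            best_mask = mask;
        }
    }
    return best_mask;
}
```

With the two functions from the camera example, mask 3 yields `-Dcea_crc_hw -Dcea_dct_hw`. The cost grows as 2^n evaluations, which stays manageable only because CEAs are limited to a few critical functions; a script for larger cases would need a smarter search.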
18
Experiments
[Figure: sw and hw CRC algorithms in FPGA]
  • Compared hw and sw CRC algorithms
  • Synthesized to FPGA
  • Compiled to MIPS uP
  • Demonstrates the need for different algorithms

[Figure: sw and hw CRC algorithms on a microprocessor]
19
Experiments
  • Wrote small signal processing example as CEA
  • Wrote sw and hw versions of core functions
  • In this case, algorithms were similar
  • Set up power measurement for two real platforms
  • XS40 (board with a microcontroller chip and a Xilinx
    FPGA chip)
  • E5 (single chip with microcontroller and FPGA)
  • Partitioning script automatically partitioned and
    measured power and cycles (overnight, due to
    place & route time)
  • Demonstrates how CEAs enable simple yet practical
    hw/sw partitioning
  • Easily migrates to different platforms, different
    chips

20
Issues and Future Work
  • Issues
  • What if hw versions are not used after partitioning?
    Wasted effort?
  • Verification of all possible combinations?
  • Must use wisely or problem grows unwieldy
  • Future work
  • More examples, more platforms
  • Several versions of the same function
  • One hardware area-conscious
  • One hardware speed-conscious
  • One software code-size-conscious
  • One software speed-conscious
  • more
  • Experimenting with communication between hardware
    and software
  • DMA transfer, wide-access memories,

21
Conclusions
  • Basic hw/sw partitioning assumption of a single
    specification doesn't always hold
  • Codesign Extended Applications help support
    different algorithms
  • CEAs enable hw/sw partitioning in existing tool
    flows
  • Utilizes existing compilation, synthesis,
    mapping, evaluation tools, and platforms
  • Simple yet effective approach to hw/sw
    partitioning