1
Code Compression for VLIW Processors Using
Variable-to-fixed Coding
  • Yuan Xie, Wayne Wolf, Haris Lekatsas
  • Princeton University, ISSS'02
  • 2004/07/03

2
Outline
  • Introduction
  • Related works
  • Compression algorithm
  • Decompression architecture
  • Power reduction for instruction bus
  • Experimental results
  • Conclusion and future work

3
Introduction
  • Important issues for embedded systems
    • Restricted memory size
      • poses serious constraints on program size.
    • Power consumption
      • busses in a typical IC consume half of the total chip power.
  • Main contributions of this paper
    • presents novel code compression schemes based on variable-to-fixed (V2F)
      coding
      • makes decompressor design easier.
      • enables parallel decompression.
    • proposes a novel instruction bus power reduction scheme.

4
Related Works
  • Ishiura et al., "Instruction code compression for application-specific VLIW
    processors based on automatic field partitioning" (SASIMI'97)
    • proposed dictionary-based schemes.
    • not feasible for modern VLIW processors.
  • Y. Xie et al., "A code decompression architecture for VLIW processors"
    (MICRO-34)
    • assumed modern VLIW processors that adopt a VLES (variable length
      execution set) scheme.
    • extended existing compression algorithms and proposed a decompression
      architecture for modern VLIW architectures.
  • Tunstall, "Synthesis of noiseless compression codes" (PhD thesis, GIT)
    • investigated variable-to-fixed (V2F) coding.

5
Compression Algorithm - Memoryless V2F Coding Algorithm (1)
  • Algorithm to construct N-bit Tunstall codewords (a construction sketch
    follows below)
  • Encoding example: 000 01 001 -> 11 01 10
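A minimal sketch of the Tunstall (memoryless V2F) codebook construction for a
binary source; the probability value, function names, and codeword ordering
below are illustrative assumptions, not values from the paper:

  import heapq

  def tunstall_codebook(p0, n_bits):
      """Build a variable-to-fixed codebook: source bit strings -> N-bit codewords."""
      p1 = 1.0 - p0
      # Leaves of the parse tree as (negated probability, source string);
      # start with the two single-bit strings.
      leaves = [(-p0, "0"), (-p1, "1")]
      heapq.heapify(leaves)
      # Repeatedly split the most probable leaf until there are 2^N leaves.
      while len(leaves) < 2 ** n_bits:
          neg_p, s = heapq.heappop(leaves)
          heapq.heappush(leaves, (neg_p * p0, s + "0"))
          heapq.heappush(leaves, (neg_p * p1, s + "1"))
      # Codeword assignment does not affect the compression ratio, so the
      # leaves are simply enumerated here.
      return {s: format(i, "0{}b".format(n_bits))
              for i, (_, s) in enumerate(sorted(leaves, key=lambda x: x[1]))}

  # With a 0-heavy source and N = 2 the leaves come out as 000, 001, 01 and 1,
  # so a block like 000|01|001 compresses to three 2-bit codewords, as in the
  # example above (the particular codeword values depend on the assignment).
  print(tunstall_codebook(0.7, 2))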

6
Compression Algorithm - Memoryless V2F Coding Algorithm (2)
  • Two possible problems (both fixes are sketched below)
    • end-of-block problem
      • problem: compression is done block by block, so the tree traversal may
        end at a non-leaf node.
      • solution: pad extra bits to the block so that the traversal reaches a
        leaf node; the extra bits can simply be truncated during decompression.
    • byte alignment
      • problem: the compressed block must be byte aligned.
      • solution: a few extra bits are padded if the size of the compressed
        block is not a multiple of 8 bits.
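A minimal sketch of the two padding fixes; the function names and the choice of
'0' as the padding bit are assumptions for illustration:

  def pad_to_leaf(block_bits, bits_to_leaf):
      """End-of-block fix: pad the source block so the final tree traversal
      ends at a leaf; the decompressor simply truncates these bits again."""
      return block_bits + "0" * bits_to_leaf

  def byte_align(compressed_bits):
      """Byte-alignment fix: pad the compressed block to a multiple of 8 bits."""
      return compressed_bits + "0" * ((-len(compressed_bits)) % 8)

  print(byte_align("1101011011"))  # 10 compressed bits -> padded to 16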

7
Compression Algorithm - Markov V2F Coding Algorithm (1)
  • How to improve the compression ratio?
    • exploit the statistical dependencies among bits in the instructions.
    • use a more complicated probability model; a Markov model is used in this
      paper.
  • Markov model (see the sketch below)
    • consists of
      • a number of states
      • transitions between states, each with a certain probability
    • two main parameters describe the proposed model
      • model depth: should divide the instruction size evenly or be a multiple
        of it.
      • model width: the model's ability to remember the path to a certain node.
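A minimal sketch of one way such a model could be represented in code; the
layout, transition targets, and probabilities are illustrative placeholders,
not the paper's data structures (real values come from the statistics-gathering
phase described on the following slides):

  from dataclasses import dataclass

  @dataclass
  class Transition:
      prob: float        # probability of seeing this bit value in this state
      next_state: int    # state the model moves to after consuming the bit

  W, D = 4, 4            # model width and depth; a 4x4 model has 16 states
  markov = {s: {0: Transition(0.5, (2 * s) % (W * D)),
                1: Transition(0.5, (2 * s + 1) % (W * D))}
            for s in range(W * D)}
  print(markov[0][1])    # Transition(prob=0.5, next_state=1)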

8
Compression Algorithm - Markov V2F Coding Algorithm (2)
  • Example of a 4x4 Markov model
  (Figures: the Markov model, and the 2-bit V2F coding tree and codebook for
  Markov state 0)
9
Compression Algorithm - Markov V2F Coding Algorithm (3)
  • Code compression procedure (sketched below)
    • statistics-gathering phase
      • choose the width and depth of the Markov model.
      • gather the probability of each transition by going through the whole
        program.
    • codebook construction phase
      • generate an N-bit V2F coding tree and codebook for each state.
      • M codebooks with 2^N codewords per codebook for an M-state Markov model.
      • codeword assignment can be arbitrary.
    • compression phase
      • traverse the coding tree of the current state from the root until a leaf
        node is reached.
      • output the codeword associated with the leaf node.
      • jump to the coding tree indicated by the leaf node.
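A minimal sketch of the compression phase; each state's coding tree is
represented as a dict from source-bit strings (the leaves) to an (N-bit
codeword, next state) pair. The layout and the single-state toy codebook are
illustrative assumptions (with one state the scheme degenerates to the
memoryless example of slide 5):

  def compress(bits, codebooks, start_state=0):
      """Encode a bit string using per-state V2F codebooks; returns codewords."""
      out, state, buf = [], start_state, ""
      for b in bits:
          buf += b
          leaf = codebooks[state].get(buf)
          if leaf is not None:             # reached a leaf of this state's tree
              codeword, state = leaf       # emit the codeword, jump to the next tree
              out.append(codeword)
              buf = ""
      # A non-empty buf here is the end-of-block case: pad until a leaf is hit.
      return out

  toy = {0: {"000": ("11", 0), "001": ("10", 0), "01": ("01", 0), "1": ("00", 0)}}
  print(compress("00001001", toy))         # -> ['11', '01', '10'] for 000|01|001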

10
Decompression Architecture
  • Decoder
    • an N-bit table (i.e. codebook) lookup unit
    • very small: less than 100 gates; the size is only 4 µm² (TSMC 0.25 µm
      cell library).
  • Parallel decompression (see the sketch below)
    • memoryless V2F: possible
      • all codewords in the compressed code are independent.
    • Markov V2F: impossible
      • the codebook for the next N-bit chunk is known only after the current
        N-bit chunk is decompressed.
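A minimal sketch of the table-lookup decoder; the table layout and names are
illustrative. For Markov V2F each lookup returns both the source bits and the
next codebook, which is why the N-bit chunks must be decoded in order, whereas
with a single memoryless codebook every chunk can be looked up in parallel:

  def decompress(codewords, tables, start_state=0):
      """tables[state][codeword] -> (source_bits, next_state)."""
      out, state = [], start_state
      for cw in codewords:                 # inherently sequential for Markov V2F
          source_bits, state = tables[state][cw]
          out.append(source_bits)
      return "".join(out)

  # Inverse of the toy codebook used in the compression sketch:
  toy = {0: {"11": ("000", 0), "10": ("001", 0), "01": ("01", 0), "00": ("1", 0)}}
  print(decompress(["11", "01", "10"], toy))   # -> '00001001'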

11
Power Reduction for Instruction Bus (1)
  • Codeword assignment
    • does not affect the compression ratio.
    • can reduce bit toggling if it is done carefully.
    • power consumption on the bus ∝ bit toggles on the bus.
  • Formulation
    • each codeword can be represented as (Ci, Wj)
      • Ci: one of the M codebooks (i = 1, 2, ..., M)
      • Wj: one of the codewords in codebook Ci (j = 1, 2, ..., 2^N)

12
Power Reduction for Instruction Bus (2)
  • Formulation
    • codeword transition graph
      • Ei: how many times the transition on edge i happens
      • Hi: Hamming distance between the two N-bit codewords at the endpoints
        of edge i
    • goal (see the cost sketch below)
      • find the codeword assignment that minimizes the total bus toggles, i.e.
        the sum of Hi · Ei over all edges.
      • proved to be an NP-hard problem when M = 1.
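A minimal sketch of the cost being minimized, the sum of Hi · Ei over the edges
of the codeword transition graph; the edge weights and codeword values are toy
data for illustration:

  def hamming(a, b):
      return bin(a ^ b).count("1")

  def total_toggles(edges, assignment):
      """edges: {(node_u, node_v): count Ei}; assignment: node -> codeword (int)."""
      return sum(count * hamming(assignment[u], assignment[v])
                 for (u, v), count in edges.items())

  # Two different 2-bit assignments for the same toy transition graph:
  edges = {(("C1", 0), ("C1", 1)): 10, (("C1", 1), ("C2", 0)): 3}
  print(total_toggles(edges, {("C1", 0): 0b00, ("C1", 1): 0b01, ("C2", 0): 0b11}))  # 13
  print(total_toggles(edges, {("C1", 0): 0b00, ("C1", 1): 0b11, ("C2", 0): 0b01}))  # 23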

13
Power Reduction for Instruction Bus (3)
  • A greedy heuristic codeword assignment algorithm (sketched below)
  • 1. sort all the edges by weight in decreasing order.
  • 2. for each edge, if either endpoint is not yet assigned, assign it a valid
    codeword with minimal Hamming distance.
  • 3. go back to step 2 until all nodes are assigned.
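A minimal sketch of this greedy heuristic; the interpretation of a "valid"
codeword (one not yet used within the same codebook) and all names and toy data
are assumptions for illustration:

  def hamming(a, b):
      return bin(a ^ b).count("1")

  def greedy_assign(edges, nodes, n_bits):
      """edges: {(u, v): weight}; nodes: (codebook, index) pairs to be assigned."""
      all_codes = list(range(2 ** n_bits))
      assignment, used = {}, {}            # used[codebook] = set of taken codes
      def free(node):
          taken = used.setdefault(node[0], set())
          return [c for c in all_codes if c not in taken]
      def take(node, code):
          assignment[node] = code
          used.setdefault(node[0], set()).add(code)
      # Step 1: sort the edges by weight, heaviest first.
      for (u, v), _w in sorted(edges.items(), key=lambda e: -e[1]):
          # Step 2: assign unassigned endpoints codewords with minimal Hamming
          # distance to the codeword already chosen for the other endpoint.
          if u not in assignment and v not in assignment:
              take(u, free(u)[0])
          for a, b in ((u, v), (v, u)):
              if a in assignment and b not in assignment:
                  take(b, min(free(b), key=lambda c: hamming(c, assignment[a])))
      # Step 3: nodes not touched by any edge get an arbitrary remaining codeword.
      for node in nodes:
          if node not in assignment:
              take(node, free(node)[0])
      return assignment

  edges = {(("C1", 0), ("C1", 1)): 10, (("C1", 1), ("C2", 0)): 3}
  print(greedy_assign(edges, [("C1", 0), ("C1", 1), ("C2", 0)], 2))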

14
Experimental Results - Compression ratio using memoryless V2F
Best compression ratio (72.7%) when N = 4.
15
Experimental Results - Compression ratio using Markov V2F
Average compression ratio is 56% when N = 4.
16
Experimental Results - Instruction Bus Toggles
17
Conclusion and Future Work
  • VLIW code compression schemes using V2F coding are proposed.
  • A greedy codeword assignment algorithm is presented to reduce instruction
    bus toggles.
  • Future work
    • better heuristic algorithms for low-power codeword assignment
    • the ASIC design of the decompression architecture

18
Very Long Instruction Word (VLIW) Architecture
  • A single instruction specifies more than one concurrent operation.
  • The instruction is therefore quite large.
  • A VLIW processor relies on the compiler to pack operations into an
    instruction.
  • A VLIW processor is not software-compatible with any general-purpose
    processor.
  • Compaction depends on the available instruction-level parallelism.
  • VLIW leads to a simple hardware implementation (compared to superscalar
    processors).