Title: Code Compression for VLIW Processors Using Variable-to-Fixed Coding
1 Code Compression for VLIW Processors Using Variable-to-Fixed Coding
- Yuan Xie, Wayne Wolf, Haris Lekatsas
- Princeton University, ISSS 2002
- 2004/07/03
2 Outline
- Introduction
- Related works
- Compression algorithm
- Decompression architecture
- Power reduction for instruction bus
- Experimental results
- Conclusion and future work
3 Introduction
- Important issues for embedded systems
  - Restricted memory size
    - poses serious constraints on program size.
  - Power consumption
    - buses in a typical IC consume half of the total chip power.
- Main contributions of this paper
  - presents novel code compression schemes.
    - based on variable-to-fixed (V2F) coding
    - makes decompressor design easier.
    - enables parallel decompression.
  - proposes a novel instruction bus power reduction scheme.
4 Related Works
- Ishiura et al.
  - "Instruction code compression for application specific VLIW processors based on automatic field partitioning" (SASIMI97)
  - proposed dictionary-based schemes.
  - not feasible for modern VLIW processors
- Y. Xie et al.
  - "A code decompression architecture for VLIW processors" (MICRO-34)
  - assumed modern VLIW processors that adopt a VLES (variable-length execution set) scheme.
  - extended existing compression algorithms and proposed a decompression architecture for modern VLIW architectures.
- Tunstall et al.
  - "Synthesis of noiseless compression codes" (PhD thesis, GIT)
  - investigated variable-to-fixed (V2F) coding.
5 Compression Algorithm - Memoryless V2F Coding Algorithm (1)
- Algorithm to construct N-bit Tunstall codewords
- Encoding example
  - 000 01 001 -> 11 01 10
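A minimal sketch of how an N-bit Tunstall codebook can be built and used, not the authors' implementation. It assumes a memoryless binary source with a hypothetical probability `p0 = 0.8` for a 0 bit; the function names are illustrative. The construction repeatedly expands the most probable leaf until the tree has 2^N leaves. The `SLIDE_CODEBOOK` mapping reproduces the encoding example above; the entry for the phrase `1` is inferred, since the slide does not show it.

```python
import heapq

def tunstall_leaves(p0, n_bits):
    """Return the 2**n_bits source phrases (tree leaves) of a binary Tunstall code."""
    # Max-heap on phrase probability (probabilities are negated for heapq).
    heap = [(-p0, "0"), (-(1.0 - p0), "1")]
    heapq.heapify(heap)
    while len(heap) < 2 ** n_bits:
        neg_p, phrase = heapq.heappop(heap)          # most probable leaf
        heapq.heappush(heap, (neg_p * p0, phrase + "0"))
        heapq.heappush(heap, (neg_p * (1.0 - p0), phrase + "1"))
    return sorted(phrase for _, phrase in heap)

def encode(bits, codebook):
    """Walk the coding tree bit by bit and emit a fixed-length codeword at each leaf."""
    out, phrase = [], ""
    for b in bits:
        phrase += b
        if phrase in codebook:
            out.append(codebook[phrase])
            phrase = ""
    # A non-empty leftover phrase means the traversal stopped at a non-leaf node.
    return " ".join(out), phrase

# Codeword assignment taken from the slide's example; "1" -> "00" is an assumption.
SLIDE_CODEBOOK = {"000": "11", "01": "01", "001": "10", "1": "00"}

print(tunstall_leaves(0.8, 2))              # phrases for N = 2
print(encode("00001001", SLIDE_CODEBOOK))   # the slide's example input
```

Because the phrases are the leaves of a complete binary tree, no phrase is a prefix of another, so the greedy bit-by-bit match in `encode` is unambiguous.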
6 Compression Algorithm - Memoryless V2F Coding Algorithm (2)
- Two possible problems
  - end of block
    - problem
      - compression is done block by block.
      - the tree traversal may end at a non-leaf node.
    - solution
      - pad the block with extra bits so that the traversal reaches a leaf node.
      - the extra bits can simply be truncated during decompression.
  - byte alignment
    - problem
      - the compressed block must be byte-aligned.
    - solution
      - a few extra bits are padded if the size of the compressed block (in bits) is not a multiple of 8.
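The two padding steps can be sketched as follows. This is an illustration under stated assumptions, not the paper's code: the 2-bit `CODEBOOK` is a hypothetical example, zero bits are used as padding, and the decompressor is assumed to know the original block length so it can truncate the extra bits.

```python
CODEBOOK = {"000": "11", "001": "10", "01": "01", "1": "00"}  # hypothetical, N = 2

def compress_block(bits, codebook=CODEBOOK):
    out, phrase = "", ""
    for b in bits:
        phrase += b
        if phrase in codebook:
            out += codebook[phrase]
            phrase = ""
    # End of block: the traversal stopped at a non-leaf node, so pad with
    # zero bits until a leaf is reached (always possible in a complete tree).
    while phrase:
        phrase += "0"
        if phrase in codebook:
            out += codebook[phrase]
            phrase = ""
    # Byte alignment: pad the compressed block to a multiple of 8 bits.
    out += "0" * (-len(out) % 8)
    return out

def decompress_block(code, n_bits, block_len, codebook=CODEBOOK):
    inverse = {cw: phrase for phrase, cw in codebook.items()}
    chunks = (code[i:i + n_bits] for i in range(0, len(code) - n_bits + 1, n_bits))
    bits = "".join(inverse[c] for c in chunks)
    return bits[:block_len]   # drop the bits produced by padding

packed = compress_block("0000100")
print(packed, decompress_block(packed, 2, 7))
```

Here the 7-bit block ends in the middle of a traversal, gets one padding bit to reach a leaf, and the 6-bit result is then padded to one byte; truncating to the known block length recovers the original bits.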
7 Compression Algorithm - Markov V2F Coding Algorithm (1)
- How to improve the compression ratio?
  - exploit the statistical dependencies among bits in the instructions.
  - use a more sophisticated probability model.
    - a Markov model is used in this paper.
- Markov model
  - consists of
    - a number of states
    - transitions between states, each with a certain probability
  - two main variables describe the proposed model
    - model depth: should divide the instruction size evenly or be a multiple of the instruction size.
    - model width: models the ability to remember the path to a certain node
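To make the idea of per-state transition probabilities concrete, here is a small sketch of statistics gathering. It uses a simplified stand-in for the model described above, not the paper's exact structure: the state is assumed to be the pair (bit position modulo the depth, previous bit), which corresponds to a width of 2, and each instruction is assumed to start with a previous bit of `0`. All names are hypothetical.

```python
from collections import Counter

def gather_stats(instructions, depth):
    """Estimate P(next bit = 0 | state) for state = (position mod depth, previous bit)."""
    zeros, total = Counter(), Counter()
    for word in instructions:
        prev = "0"                      # assumed start state at each instruction
        for i, bit in enumerate(word):
            state = (i % depth, prev)
            total[state] += 1
            zeros[state] += bit == "0"  # True counts as 1
            prev = bit
    return {state: zeros[state] / total[state] for state in total}

probs = gather_stats(["0011", "0001"], depth=4)
for state, p in sorted(probs.items()):
    print(state, p)
```

Each state's probability of a 0 bit could then drive the construction of a separate V2F coding tree for that state.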
8 Compression Algorithm - Markov V2F Coding Algorithm (2)
- Example of a 4x4 Markov model
  - [Figure: Markov model]
  - [Figure: 2-bit V2F coding tree and codebook for Markov state 0]
9 Compression Algorithm - Markov V2F Coding Algorithm (3)
- Code compression procedure
  - statistics-gathering phase
    - choose the width and depth of the Markov model.
    - gather the probability of each transition by going through the whole program.
  - codebook construction phase
    - generate an N-bit V2F coding tree and codebook for each state.
    - an M-state Markov model has M codebooks with 2^N codewords each.
    - codeword assignment can be arbitrary.
  - compression phase
    - traverse the coding tree of the current state from the root until a leaf node is reached.
    - output the codeword associated with that leaf node.
    - jump to the coding tree indicated by the leaf node.
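The compression phase can be sketched as below. The two codebooks are hand-written, hypothetical examples for a 2-state model with N = 2 (they are not derived from any real program statistics); each leaf stores its codeword together with the state whose coding tree is used next.

```python
# codebooks[state][phrase] = (codeword, next_state); hypothetical values.
CODEBOOKS = {
    0: {"000": ("00", 0), "001": ("01", 1), "01": ("10", 1), "1": ("11", 1)},
    1: {"0": ("00", 0), "10": ("01", 0), "110": ("10", 0), "111": ("11", 1)},
}

def markov_compress(bits, codebooks, state=0):
    """Traverse the current state's tree; at a leaf, emit its codeword and switch trees."""
    out, phrase = [], ""
    for b in bits:
        phrase += b
        entry = codebooks[state].get(phrase)
        if entry is not None:           # reached a leaf of the current tree
            codeword, state = entry     # jump to the tree named by the leaf
            out.append(codeword)
            phrase = ""
    return out

print(markov_compress("00111110", CODEBOOKS))
```

In this run the input is parsed as 001, 111, 10, with the first phrase looked up in the state 0 tree and the next two in the state 1 tree.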
10 Decompression Architecture
- Decoder
  - N-bit table (i.e. codebook) lookup unit
  - very small
    - fewer than 100 gates
    - the size is only 4 µm² (TSMC 0.25 cell library).
- Parallel decompression
  - memoryless V2F: possible
    - all codewords in the compressed code are independent.
  - Markov V2F: impossible
    - the codebook for the next N-bit chunk is known only after the current N-bit chunk is decompressed.
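The difference between the two cases shows up directly in a software model of the table lookup. The sketch below is illustrative only (the tables are hypothetical and N = 2): in the memoryless decoder every chunk is looked up independently, so the lookups could run in any order or in parallel, while the Markov decoder needs the next state from the previous lookup before it can select a table.

```python
MEMORYLESS_TABLE = {"11": "000", "10": "001", "01": "01", "00": "1"}  # hypothetical
MARKOV_TABLES = {                                                     # hypothetical
    0: {"00": ("000", 0), "01": ("001", 1), "10": ("01", 1), "11": ("1", 1)},
    1: {"00": ("0", 0), "01": ("10", 0), "10": ("110", 0), "11": ("111", 1)},
}

def decode_memoryless(code, table, n_bits):
    chunks = [code[i:i + n_bits] for i in range(0, len(code), n_bits)]
    # No lookup depends on another one, so this map is trivially parallel.
    return "".join(table[c] for c in chunks)

def decode_markov(code, tables, n_bits, state=0):
    out = []
    for i in range(0, len(code), n_bits):
        # The table for this chunk is only known once the previous chunk is decoded.
        phrase, state = tables[state][code[i:i + n_bits]]
        out.append(phrase)
    return "".join(out)

print(decode_memoryless("110110", MEMORYLESS_TABLE, 2))
print(decode_markov("011101", MARKOV_TABLES, 2))
```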
11 Power Reduction for Instruction Bus (1)
- Codeword assignment
  - does not affect the compression ratio.
  - can reduce bit toggling if it is done carefully.
    - power consumption on the bus ∝ number of bit toggles on the bus
- Formulation
  - each codeword can be represented by (Ci, Wj)
    - Ci: one of the M codebooks (i = 1, 2, ..., M)
    - Wj: one of the codewords in codebook Ci (j = 1, 2, ..., 2^N)
12 Power Reduction for Instruction Bus (2)
- Formulation
  - codeword transition graph
    - Ei: how many times the transition on edge i happens
    - Hi: Hamming distance between the two N-bit binary codewords joined by edge i
  - goal
    - find the codeword assignment that minimizes the total number of bus toggles (i.e. the sum of Hi × Ei over all edges).
    - proved to be an NP problem when M = 1.
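The objective can be written down in a few lines. This sketch covers the single-codebook case only and uses hypothetical node names and transition counts: each edge of the transition graph carries a count E, each node is given an N-bit codeword, and the cost is the sum of H × E over all edges.

```python
def hamming(a, b):
    """Number of bit positions in which two codewords (as integers) differ."""
    return bin(a ^ b).count("1")

def total_toggles(edges, assignment):
    """Sum of (transition count) x (Hamming distance) over all edges."""
    return sum(count * hamming(assignment[u], assignment[v])
               for (u, v), count in edges.items())

# Hypothetical transition counts between three codebook entries A, B, C.
edges = {("A", "B"): 10, ("B", "C"): 3, ("A", "C"): 1}
print(total_toggles(edges, {"A": 0b00, "B": 0b01, "C": 0b11}))
print(total_toggles(edges, {"A": 0b00, "B": 0b11, "C": 0b01}))
```

The two assignments use the same codewords and therefore give the same compression ratio, but the first places the most frequent transition on codewords that differ in a single bit and so yields fewer toggles.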
13 Power Reduction for Instruction Bus (3)
- A greedy heuristic codeword assignment algorithm
  - 1. sort all the edges by weight in decreasing order.
  - 2. for each edge, if either node is not yet assigned, assign valid codewords with minimal Hamming distance.
  - 3. go to step 2 until all nodes are assigned.
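One possible reading of these three steps, for a single codebook, is sketched below; it is not the authors' implementation. The assumptions are: a "valid" codeword is any N-bit value not yet used, an unassigned node whose neighbour is already assigned takes the free codeword closest to that neighbour, and when neither endpoint is assigned the first one simply takes the lowest free codeword. Nodes that appear in no edge are not handled.

```python
def hamming(a, b):
    return bin(a ^ b).count("1")

def greedy_assign(edges, n_bits):
    """edges: {(u, v): transition count}; returns {node: codeword as int}."""
    free = set(range(2 ** n_bits))
    assigned = {}
    # Step 1: heaviest edges first.
    for (u, v), _ in sorted(edges.items(), key=lambda e: -e[1]):
        # Step 2: give codewords to whichever endpoints still lack one.
        for node, other in ((u, v), (v, u)):
            if node in assigned:
                continue
            if other in assigned:
                # Free codeword at minimal Hamming distance from the neighbour.
                cw = min(free, key=lambda c: (hamming(c, assigned[other]), c))
            else:
                cw = min(free)          # assumption: arbitrary starting codeword
            assigned[node] = cw
            free.remove(cw)
    # Step 3: the loop ends once every edge, and so every connected node, is covered.
    return assigned

# Hypothetical transition graph with four codebook entries and N = 2.
edges = {("A", "B"): 10, ("B", "C"): 3, ("A", "C"): 1, ("C", "D"): 2}
print(greedy_assign(edges, 2))
```

Since the heaviest edges are processed while the most codewords are still free, the most frequent transitions tend to receive codewords that differ in only one bit.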
14 Experimental Results - Compression ratio using memoryless V2F
Best compression ratio (72.7%) when N = 4
15 Experimental Results - Compression ratio using Markov V2F
Average compression ratio is 56% when N = 4.
16 Experimental Results - Instruction Bus Toggles
17 Conclusion and Future Work
- VLIW code compression schemes using V2F coding are proposed.
- A greedy codeword assignment algorithm is presented to reduce instruction bus toggles.
- Future work
  - better heuristic algorithms for low-power codeword assignment
  - the ASIC design of the decompression architecture
18 Very Long Instruction Word (VLIW) Architecture
- A single instruction specifies more than one concurrent operation.
- The instruction is quite large.
- A VLIW processor relies on the compiler to pack the operations into an instruction.
- A VLIW processor is not software compatible with any general-purpose processor.
- Compaction depends on the instruction-level parallelism.
- VLIW leads to a simple hardware implementation (compared to superscalar).