Title: A Bitmask-Based Code Compression Technique For Embedded Systems
- Seok-Won Seong and Prabhat Mishra
- Department of Computer and Information Science and Engineering
- University of Florida
November 6, 2006
Overview
- Introduction
- Code Compression Techniques
- Traditional Dictionary-Based Code Compression
- Hamming-Distance-Based Code Compression
- Our Approach: Bitmask-Based Code Compression
- Compression Algorithm
- Decompression Mechanism
- Experiments
- Conclusion
Embedded Systems
- Ubiquitous! Automobiles, ATMs, digital cameras, PDAs, and cellular phones
- Real-time performance at a low cost
- Compact in size
- Minimal power consumption
- Memory: a major design constraint
- Cost and size of the system
- Power consumption
Code Compression
Code Compression
- Static encoding (offline): Application Program (Binary) → Compression Algorithm → Compressed Code (Memory)
- Dynamic decoding (online): Compressed Code (Memory) → Decompression Engine → Processor (Fetch and Execute)
Decompression Engine (DCE)
- Pre-cache design: DCE placed between memory and cache
- Post-cache design: DCE placed between cache and processor
- Advantage: the cache holds compressed data
- Advantage: reduced bus bandwidth and higher cache hits
[Figure: memory hierarchy with Main Memory, I-Cache, D-Cache, and Processor]
Compression Ratio
- Compression ratio = (compressed code size + dictionary size) / original code size
- Smaller compression ratio is better
Compression Techniques
- Dictionary-based approach
- Good compression ratio
- Fast decompression scheme
- Hamming distance approach
- Limited: profitable only for up to 3 mismatched bits
- Our approach
- Use bitmasks to maximize matching patterns
- Improve compression ratio even further
Traditional Dictionary-Based Code Compression
Example with 8-bit vectors (decision bit 0 = compressed, followed by the dictionary index; 1 = uncompressed, followed by the raw vector):
Original program    Compressed program
0000 0000           0 0
1000 0010           1 1000 0010
0000 0010           1 0000 0010
0100 0010           0 1
0100 1110           1 0100 1110
0101 0010           1 0101 0010
0000 1100           1 0000 1100
0100 0010           0 1
1100 0000           1 1100 0000
0000 0000           0 0
Dictionary: index 0 = 0000 0000, index 1 = 0100 0010
Original code size: 80 bits; compressed code size: 62 bits; dictionary size: 16 bits
Compression ratio = (62 + 16) / 80 = 97.5%
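The traditional scheme can be reproduced with a short Python sketch. It assumes the layout visible in the example (decision bit 0 plus a dictionary index on a hit, decision bit 1 plus the raw vector otherwise); the names `traditional_compress` and `compression_ratio` are illustrative only.

```python
def traditional_compress(vectors, dictionary, width=8):
    """Encode each vector as '0' + index on a dictionary hit, else '1' + raw bits."""
    index_bits = max(1, (len(dictionary) - 1).bit_length())
    lookup = {entry: i for i, entry in enumerate(dictionary)}
    codes = []
    for v in vectors:
        if v in lookup:
            codes.append("0" + format(lookup[v], f"0{index_bits}b"))
        else:
            codes.append("1" + format(v, f"0{width}b"))
    return codes


def compression_ratio(compressed_bits, dictionary_bits, original_bits):
    """(compressed code + dictionary) / original code; smaller is better."""
    return (compressed_bits + dictionary_bits) / original_bits


program = [0b00000000, 0b10000010, 0b00000010, 0b01000010, 0b01001110,
           0b01010010, 0b00001100, 0b01000010, 0b11000000, 0b00000000]
dictionary = [0b00000000, 0b01000010]   # the two vectors that occur twice

codes = traditional_compress(program, dictionary)
size = sum(len(c) for c in codes)
ratio = compression_ratio(size, 8 * len(dictionary), 8 * len(program))
print(size, ratio)   # 62 0.975
```

Only four of the ten vectors hit the dictionary, so most of the program is still stored at 9 bits per vector.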
Hamming Distance Approach
Original program (8-bit vectors): 0000 0000, 1000 0010, 0000 0010, 0100 0010, 0100 1110, 0101 0010, 0000 1100, 0100 0010, 1100 0000, 0000 0000
Compressed program:
0 1 0 1 1000 0010 0 0 110
0 0 1 1 1 0100 1110 0 0 001
0 1 0000 1100 0 1 1 1 1100
0000 0 1 0
Dictionary: index 0 = 0000 0000, index 1 = 0100 0010
Original code size: 80 bits; compressed code size: 60 bits; dictionary size: 16 bits
Compression ratio = (60 + 16) / 80 = 95%
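A Hamming-distance variant can be sketched the same way. The field layout is an assumption inferred from the example's bit counts: an exact match costs 3 bits (decision bit, match flag, index), a single-bit mismatch adds a 3-bit position counted from the most significant bit (6 bits in total), and anything else is stored uncompressed (9 bits). This sketch allows only one mismatched bit; `hamming_compress` is a hypothetical name.

```python
def hamming_compress(vectors, dictionary, width=8):
    """Encode exact matches, single-bit mismatches, or raw vectors (assumed layout)."""
    index_bits = max(1, (len(dictionary) - 1).bit_length())
    pos_bits = (width - 1).bit_length()          # 3 bits address 8 positions
    codes = []
    for v in vectors:
        best = "1" + format(v, f"0{width}b")     # fallback: uncompressed
        for i, entry in enumerate(dictionary):
            diff = v ^ entry
            idx = format(i, f"0{index_bits}b")
            if diff == 0:                        # exact match: 0, 1, index
                best = "01" + idx
                break
            if bin(diff).count("1") == 1:        # exactly one mismatched bit
                pos = width - diff.bit_length()  # position counted from the MSB
                candidate = "00" + format(pos, f"0{pos_bits}b") + idx
                if len(candidate) < len(best):   # ties keep the lower index
                    best = candidate
        codes.append(best)
    return codes


program = [0b00000000, 0b10000010, 0b00000010, 0b01000010, 0b01001110,
           0b01010010, 0b00001100, 0b01000010, 0b11000000, 0b00000000]
dictionary = [0b00000000, 0b01000010]

codes = hamming_compress(program, dictionary)
size = sum(len(c) for c in codes)
print(size, (size + 16) / 80)   # 60 0.95
```

The sketch reproduces the 60-bit total; individual codewords depend on the assumed field order and may not match the slide's bit stream exactly.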
Cost-Benefit Analysis
- Example: 4 mismatched bits in a 32-bit vector, all within one aligned 4-bit window
- Hamming distance approach
- 2 bits to indicate the number of mismatches
- 5 bits to indicate the location of each mismatch
- 2 + 5 × 4 = 22 bits
- Bitmask approach (one 4-bit mask)
- 3 bits to indicate the position (8 possible locations)
- 4 bits for the mask pattern
- 3 + 4 = 7 bits
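The two bit counts follow from small formulas. This Python sketch assumes, as the slide does, a 2-bit mismatch-count field for the Hamming approach and masks aligned to multiples of their own width; `hamming_cost` and `bitmask_cost` are illustrative names.

```python
def hamming_cost(mismatches, width=32, count_bits=2):
    """Count field plus one bit position per mismatched bit."""
    position_bits = (width - 1).bit_length()            # 5 bits address 32 positions
    return count_bits + mismatches * position_bits


def bitmask_cost(mask_bits, width=32):
    """Location of one aligned mask plus the mask pattern itself."""
    location_bits = (width // mask_bits - 1).bit_length()  # 8 locations -> 3 bits
    return location_bits + mask_bits


print(hamming_cost(4), bitmask_cost(4))   # 22 7
```

The Hamming cost grows by 5 bits per extra mismatch, while the mask cost stays fixed as long as the mismatches fall inside one window.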
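Our Technique Using Bitmasks
- Traditional and Hamming-distance approaches are inefficient!
- Introduce bitmasks to handle mismatches
- Generate repeating patterns aggressively
- XOR operation: simple and fast
Example (dictionary entry 0000 0000 0010 0001):
- Input 0000 0000 0010 1101: XOR difference 0000 0000 0000 1100, mask 11
- Input 0000 0000 0011 0001: XOR difference 0000 0000 0001 0000, mask 0001
- Input 1111 0100 0010 0001: XOR difference 1111 0100 0000 0000, mask 111101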
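The XOR step can be sketched in Python for the 16-bit example. The sketch assumes that a mask of size k may only start at bit positions that are multiples of k (location 0 at the most significant end); `fit_mask` and `apply_mask` are hypothetical helpers, not the authors' code.

```python
def fit_mask(vector, entry, mask_bits, width=16):
    """Return (location, pattern) if every bit where vector and entry differ
    lies inside one aligned window of mask_bits bits; otherwise None."""
    diff = vector ^ entry
    if diff == 0:
        return None                                   # exact match: no mask needed
    for loc in range(width // mask_bits):
        shift = width - (loc + 1) * mask_bits         # bits to the right of the window
        window = ((1 << mask_bits) - 1) << shift
        if diff & ~window == 0:                       # all differences inside window
            return loc, format(diff >> shift, f"0{mask_bits}b")
    return None


def apply_mask(entry, loc, pattern, width=16):
    """Decompression side: XOR the mask back onto the dictionary entry."""
    shift = width - (loc + 1) * len(pattern)
    return entry ^ (int(pattern, 2) << shift)


entry = 0b0000_0000_0010_0001
for vector, size in [(0b0000_0000_0010_1101, 2),
                     (0b0000_0000_0011_0001, 4),
                     (0b1111_0100_0010_0001, 6)]:
    loc, pattern = fit_mask(vector, entry, size)
    assert apply_mask(entry, loc, pattern) == vector  # XOR round-trips exactly
    print(size, loc, pattern)   # 2 6 11, then 4 2 0001, then 6 0 111101
```

Because XOR is its own inverse, the same operation derives the mask during compression and restores the instruction during decompression.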
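Generating More Matches
- ADD R1 R2 R1, ADD R1 R2 R2, ADD R1 R2 R3, ADD R1 R2 R4, ADD R1 R2 R1, SUB R1 R2 R1, MUL R1 R2 R1, COMP R1 R2 R1
- Each of these differs from ADD R1 R2 R1 in a single field (one register operand or the opcode), so a mask over that field lets them match the same dictionary entry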
Bitmask Encoding
- 32-bit instructions
- Format for uncompressed code: Decision (1 bit) | Uncompressed Data (32 bits)
- Format for compressed code: Decision (1 bit) | Number of Masks | location and pattern of each mask | Dictionary Index
Customized Encoding
- Up to two bitmasks are sufficient
- Optimize the generic encoding further
- Sample customized encodings
- Encoding 1: one 8-bit mask
- Encoding 2: two 4-bit masks
- Encoding 3: one 4-bit and one 8-bit mask
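Encoding 2 can be sketched as a per-vector encoder in Python. The field widths are assumptions, not the published layout: a 2-bit "number of masks" field, a 3-bit location and 4-bit pattern per mask, and a dictionary index just wide enough for the dictionary. Mismatches are split on aligned nibbles, so a difference straddling two nibbles costs two masks. `nibble_masks` and `encode_vector` are hypothetical names.

```python
WIDTH = 32
MASK_BITS = 4       # Encoding 2: 4-bit masks
LOC_BITS = 3        # 32 / 4 = 8 aligned locations
COUNT_BITS = 2      # assumed field for the number of masks (0, 1 or 2)
MAX_MASKS = 2


def nibble_masks(diff):
    """Split an XOR difference into (location, pattern) pairs; location 0 = MSB nibble."""
    masks = []
    for loc in range(WIDTH // MASK_BITS):
        pattern = (diff >> (WIDTH - (loc + 1) * MASK_BITS)) & ((1 << MASK_BITS) - 1)
        if pattern:
            masks.append((loc, pattern))
    return masks


def encode_vector(v, dictionary):
    """Cheapest codeword for a 32-bit vector: dictionary entry plus up to two 4-bit masks."""
    index_bits = max(1, (len(dictionary) - 1).bit_length())
    best = "1" + format(v, f"0{WIDTH}b")          # uncompressed: decision bit + 32 bits
    for i, entry in enumerate(dictionary):
        masks = nibble_masks(v ^ entry)
        if len(masks) > MAX_MASKS:
            continue                              # too many mismatching nibbles
        code = "0" + format(len(masks), f"0{COUNT_BITS}b")
        for loc, pattern in masks:
            code += format(loc, f"0{LOC_BITS}b") + format(pattern, f"0{MASK_BITS}b")
        code += format(i, f"0{index_bits}b")
        if len(code) < len(best):
            best = code
    return best


dictionary = [0x12345678]
for v in (0x12345678, 0x1234567F, 0xF234567F, 0xFFFFFFFF):
    print(hex(v), len(encode_vector(v, dictionary)))   # 4, 11, 18, 33 bits
```

Under these assumed widths, one mask costs 7 extra bits and two masks cost 14, so a vector with three or more differing nibbles falls back to the 33-bit uncompressed form.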
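Code Compression with Bitmasks
Example with 8-bit vectors and one 2-bit mask; each code is decision bit (0 = compressed), mask flag (1 = exact match, 0 = mask applied), 2-bit mask location, 2-bit mask pattern, dictionary index:
Original program    Compressed program
0000 0000           0 1 0
1000 0010           0 0 00 11 1
0000 0010           0 0 11 10 0
0100 0010           0 1 1
0100 1110           0 0 10 11 1
0101 0010           0 0 01 01 1
0000 1100           0 0 10 11 0
0100 0010           0 1 1
1100 0000           0 0 00 11 0
0000 0000           0 1 0
Dictionary: index 0 = 0000 0000, index 1 = 0100 0010
Original code size: 80 bits; compressed code size: 54 bits; dictionary size: 16 bits
Compression ratio = (54 + 16) / 80 = 87.5%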
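The 54-bit result can be regenerated in Python using the layout read off the example (decision bit, mask flag, 2-bit location, 2-bit pattern, 1-bit index). The sketch assumes a single 2-bit mask aligned to even bit positions and breaks ties in favor of the lower dictionary index; `bitmask_compress` is a hypothetical name.

```python
def bitmask_compress(vectors, dictionary, width=8, mask_bits=2):
    """One aligned mask per vector; layout: decision, flag, location, pattern, index."""
    index_bits = max(1, (len(dictionary) - 1).bit_length())
    loc_bits = (width // mask_bits - 1).bit_length()     # 4 locations -> 2 bits
    codes = []
    for v in vectors:
        best = "1" + format(v, f"0{width}b")             # uncompressed fallback
        for i, entry in enumerate(dictionary):
            diff = v ^ entry
            idx = format(i, f"0{index_bits}b")
            if diff == 0:                                # exact match: 0, 1, index
                best = "01" + idx
                break
            for loc in range(width // mask_bits):
                shift = width - (loc + 1) * mask_bits
                window = ((1 << mask_bits) - 1) << shift
                if diff & ~window == 0:                  # mismatch fits one window
                    code = ("00" + format(loc, f"0{loc_bits}b")
                            + format(diff >> shift, f"0{mask_bits}b") + idx)
                    if len(code) < len(best):            # ties keep the lower index
                        best = code
                    break
        codes.append(best)
    return codes


program = [0b00000000, 0b10000010, 0b00000010, 0b01000010, 0b01001110,
           0b01010010, 0b00001100, 0b01000010, 0b11000000, 0b00000000]
dictionary = [0b00000000, 0b01000010]

codes = bitmask_compress(program, dictionary)
size = sum(len(c) for c in codes)
print(size, (size + 16) / 80)   # 54 0.875
```

With these assumptions every vector compresses, each at 3 or 7 bits, which is where the gain over the 62-bit and 60-bit examples comes from.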
Compression Algorithm
- Algorithm: compression using bitmasks
- Input: original binary code divided into 32-bit vectors
- Output: compressed code and dictionary
- Begin
- Step 1: Create the frequency distribution of the vectors
- Step 2: Create the dictionary based on Step 1
- Step 3: Compress each 32-bit vector using cost constraints
- Step 4: Handle and adjust branch targets
- End
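Steps 1 and 2 can be sketched in Python under the simplest reading of the slide: count identical 32-bit vectors and keep the most frequent ones as dictionary entries. The big-endian byte order and the name `build_dictionary` are assumptions, and this sketch ranks entries by exact-match frequency only, without counting the extra matches that bitmasks would create.

```python
from collections import Counter


def build_dictionary(binary, dict_entries):
    """Step 1: frequency distribution of 32-bit vectors; Step 2: keep the top entries."""
    if len(binary) % 4:
        raise ValueError("binary length must be a multiple of 4 bytes")
    vectors = [int.from_bytes(binary[i:i + 4], "big")
               for i in range(0, len(binary), 4)]
    freq = Counter(vectors)                                  # Step 1
    return [v for v, _ in freq.most_common(dict_entries)]    # Step 2


words = [0x00000000, 0x12345678, 0x00000000, 0xDEADBEEF, 0x12345678, 0x00000000]
binary = b"".join(w.to_bytes(4, "big") for w in words)
print([hex(v) for v in build_dictionary(binary, 2)])   # ['0x0', '0x12345678']
```

Step 3 would then encode each vector against this dictionary, choosing the cheapest of exact match, masked match, or uncompressed storage.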
Branch Targets
- Our approach for handling branches
- (Back)patch all possible branches
- Create a minimal mapping table for the unpatchable ones
- Advantage: significant reduction of the mapping table
- Advantage: fast retrieval of the new target address
- More than 75% of control-flow instructions are conditional branches (patchable)
- 95% of the branches taken do not require the mapping table
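Branch handling can be illustrated with a toy relocation pass in Python. Everything here is an assumption for illustration: instructions are indexed by original position, new addresses are bit offsets obtained by summing compressed sizes, conditional branches get their target field patched directly, and targets reached only through computed (indirect) jumps go into the mapping table. `relocate` is a hypothetical helper.

```python
def relocate(sizes_bits, cond_branches, indirect_targets):
    """sizes_bits[i]: compressed size of instruction i (original address i).
    cond_branches: branch index -> original target index (patchable).
    indirect_targets: original indices reached only via computed jumps."""
    new_addr, offset = [], 0
    for size in sizes_bits:
        new_addr.append(offset)        # new bit address of each instruction
        offset += size
    # Patchable: rewrite each conditional branch with its target's new address.
    patched = {src: new_addr[dst] for src, dst in cond_branches.items()}
    # Unpatchable: a small old -> new table consulted at run time.
    mapping_table = {old: new_addr[old] for old in sorted(indirect_targets)}
    return new_addr, patched, mapping_table


print(relocate([3, 9, 7, 3], {3: 1}, {2}))   # ([0, 3, 12, 19], {3: 3}, {2: 12})
```

Because most control-flow instructions are patchable, the table only needs entries for the few targets that cannot be resolved statically.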
Design of Decompression Engine (DCE)
- Goals of the DCE design
- One instruction per clock cycle
- Minimal or no modifications
- Minimum power consumption
- Adopt the post-cache design
- Based on the previous one-cycle design by Lekatsas et al.
- Dictionary-based code compression with a post-cache design: performance improvements of up to 63% (25% on average) reported
One-Cycle Decompression Engine
[Figure: one-cycle decompression engine with blocks prev_comp, prev_decomp, Decoding Logic, Index, Dictionary (SRAM), MUX, and Output Buffer; input: compressed code; output: uncompressed code]
- 8.5 ns clock-cycle constraint
- The critical path was 5.99 ns
Decompression Engine for Bitmask Encodings
- Mask generation is done in parallel with the dictionary access
- XOR gate propagation delay: 0.09 to 0.5 ns, many under 0.25 ns
- 5.99 + 0.25 = 6.24 ns, which satisfies the 8.5 ns constraint
- Capable of decoding more than one instruction per cycle
[Figure: bitmask decompression engine with blocks prev_comp, prev_decomp, Decoding Logic, Index, Dictionary (SRAM), Mask, XOR, MUX, and Output Buffer; input: compressed code; output: uncompressed code]
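The decoding path can be modeled in software. This Python sketch assumes the same hypothetical Encoding 2 layout as a bit string: decision bit, 2-bit mask count, then a 3-bit location and 4-bit pattern per mask, then the dictionary index. It is a behavioral model of "build the mask, read the dictionary, XOR", not a description of the actual hardware.

```python
WIDTH, MASK_BITS, LOC_BITS, COUNT_BITS = 32, 4, 3, 2   # assumed field widths


def decode(code, dictionary):
    """Decode one codeword (bit string) into a 32-bit instruction."""
    if code[0] == "1":                          # decision bit 1: stored uncompressed
        return int(code[1:1 + WIDTH], 2)
    n_masks = int(code[1:1 + COUNT_BITS], 2)
    pos, mask = 1 + COUNT_BITS, 0
    for _ in range(n_masks):                    # hardware builds the mask ...
        loc = int(code[pos:pos + LOC_BITS], 2)
        pattern = int(code[pos + LOC_BITS:pos + LOC_BITS + MASK_BITS], 2)
        pos += LOC_BITS + MASK_BITS
        mask |= pattern << (WIDTH - (loc + 1) * MASK_BITS)
    index_bits = max(1, (len(dictionary) - 1).bit_length())
    index = int(code[pos:pos + index_bits], 2)
    return dictionary[index] ^ mask             # ... in parallel with this lookup


dictionary = [0x12345678]
print(hex(decode("0000", dictionary)))                                   # 0x12345678
print(hex(decode("0" + "01" + "111" + "0111" + "0", dictionary)))        # 0x1234567f
```

The only operation on the critical path beyond the existing dictionary read is the final XOR, which matches the timing argument above.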
Experimental Results
- 1. Compression ratio for the adpcm_en benchmark
- SPARC, TMS320C6x, and MIPS
- 2. Compression ratio for different benchmarks
- MediaBench: adpcm, mpeg, jpeg, …
- MiBench: gsm, pegwit, …
- 3. Comparison to other compression techniques
Compression Ratio for adpcm_en
Encoding 2 outperforms the others.
- Encoding 1 (one 8-bit mask)
- Encoding 2 (two 4-bit masks)
- Encoding 3 (4-bit and 8-bit masks)
Compression Ratio for Different Dictionary Sizes
- Compression ratio: 55% to 67%
- Smaller programs: a small dictionary works best
- Bigger programs: a big dictionary works best
Comparison to Other Techniques
Smaller compression ratio is better
- Outperforms other dictionary-based techniques by 15%
- Higher decompression bandwidth than the best-known compression techniques
Conclusion and Future Work
- Memory: a major design constraint
- Code compression
- Dictionary-based code compression is popular
- Hamming distance technique
- Code compression with bitmasks
- 10-15% improvement in compression ratio
- Better performance (higher cache hits)
- Less power consumption
- Fast and simple decompression engine
- One or more instructions per cycle
- Parallel decompression
Conclusion and Future Work
- Future work
- Optimal mask and dictionary selection techniques
- Compiler optimization
- Power saving/size analysis
- Applications in other domains
- Testing data compression
Thank You!