Provided by: cise8
Learn more at: https://www.cise.ufl.edu
1
A Bitmask-Based Code Compression Technique For
Embedded Systems
  • Seok-Won Seong and Prabhat Mishra
  • Department of Computer and Information Science
    and Engineering
  • University of Florida

November 6, 2006
2
Overview
  • Introduction
  • Code Compression Techniques
  • Traditional Dictionary Based Code Compression
  • Hamming Distance Based Code Compression
  • Our Approach: Bitmask-Based Code Compression
  • Compression Algorithm
  • Decompression Mechanism
  • Experiments
  • Conclusion

3
Embedded Systems
  • Ubiquitous!
  • Automobiles, ATMs, digital cameras, PDAs, and
    cellular phones
  • Real-time performance at a low cost
  • Compact in size
  • Minimal power consumption
  • Memory: major design constraint
      • Cost and size of system
      • Power consumption

Code Compression
4
Code Compression
[Figure: code compression flow. Static Encoding (Offline): Application Program (Binary) -> Compression Algorithm -> Compressed Code (Memory). Dynamic Decoding (Online): Compressed Code (Memory) -> Decompression Engine -> Processor (Fetch and Execute).]
5
Decompression Engine (DCE)
  • Pre-cache design
  • Between memory and cache
  • Post-cache design
  • Between cache and processor
      • (+) Cache holds compressed data
      • (+) Reduced bus bandwidth and higher cache hits

[Figure: block diagram with Main Memory, I-Cache, D-Cache, and Processor.]
6
Compression Ratio
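  • Compression ratio = compressed program size / original program size
    (the compressed program size includes the dictionary)
  • Smaller compression ratio is better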
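
As a rough check of the definition, the sketch below recomputes the ratios for the three worked examples that appear later in this deck (80-bit program, 16-bit dictionary). The function and variable names are illustrative and not taken from the slides.

```python
def compression_ratio(compressed_bits, dictionary_bits, original_bits):
    """Return (compressed code + dictionary) / original code as a fraction."""
    return (compressed_bits + dictionary_bits) / original_bits


# Compressed code sizes quoted on the worked-example slides.
examples = {
    "traditional dictionary": 62,
    "Hamming distance": 60,
    "bitmask": 54,
}

ratios = {
    name: compression_ratio(bits, dictionary_bits=16, original_bits=80)
    for name, bits in examples.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1%}")
```

A smaller value means less memory is needed for the same program, so the bitmask example is the best of the three.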

7
Compression Techniques
  • Dictionary-based approach
  • Good compression ratio
  • Fast decompression scheme
  • Hamming distance approach
  • Limited: profitable only up to 3-bit mismatches
  • Our approach
  • Use bitmasks to maximize matching patterns
  • Improve compression ratio even further

8
Traditional Dictionary-Based Code Compression
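Original code size: 80 bits
Compressed code size: 62 bits
Dictionary size: 16 bits
Compression ratio: (62 + 16)/80 = 97.5%

Dictionary
Index   Content
0       0000 0000
1       0100 0010

Original Program    Compressed Program
0000 0000           0 0
1000 0010           1 1000 0010
0000 0010           1 0000 0010
0100 0010           0 1
0100 1110           1 0100 1110
0101 0010           1 0101 0010
0000 1100           1 0000 1100
0100 0010           0 1
1100 0000           1 1100 0000
0000 0000           0 0

In the compressed program, a leading 0 is followed by the dictionary index and a leading 1 is followed by the uncompressed vector.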
9
Hamming Distance Approach
Original code size: 80 bits
Compressed code size: 60 bits
Dictionary size: 16 bits
Compression ratio: (60 + 16)/80 = 95%

Original Program
0000 0000 1000 0010 0000 0010 0100 0010 0100 1110
0101 0010 0000 1100 0100 0010 1100 0000 0000 0000

Compressed Program
0 1 0 1 1000 0010 0 0 110 0 0 1 1 1 0100 1110 0 0 001
0 1 0000 1100 0 1 1 1 1100 0000 0 1 0
11
Cost-Benefit Analysis
  • 4-bit mismatch on a 32-bit vector
  • Hamming distance approach
      • 2 bits to indicate the number of mismatches
      • 5 bits to indicate the location of each mismatch
      • 2 + 5 × 4 = 22 bits
  • Bitmask approach (one 4-bit mask)
      • 3 bits to indicate position (8 possible
        locations)
      • 4 bits for mask pattern
      • 3 + 4 = 7 bits
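
The arithmetic on this slide can be reproduced with a short sketch. It assumes, as the slide's numbers suggest, that the Hamming approach stores a mismatch count plus one bit position per mismatch, and that a bitmask sits on a boundary that is a multiple of its own width. The function names are hypothetical.

```python
def bits_to_address(choices):
    """Number of bits needed to select one of `choices` alternatives."""
    return (choices - 1).bit_length()


def hamming_cost(vector_bits, mismatches, count_bits=2):
    """Count field plus one bit position per mismatched bit."""
    return count_bits + bits_to_address(vector_bits) * mismatches


def bitmask_cost(vector_bits, mask_bits):
    """Position field (aligned to the mask width) plus the mask pattern."""
    positions = vector_bits // mask_bits
    return bits_to_address(positions) + mask_bits


hamming = hamming_cost(vector_bits=32, mismatches=4)  # 2 + 5 * 4
bitmask = bitmask_cost(vector_bits=32, mask_bits=4)   # 3 + 4
print(hamming, bitmask)
```

The gap widens with the number of mismatched bits, as long as they fall inside a single mask window.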

12
Our Technique Using Bitmasks
  • Traditional dictionary and Hamming distance approaches are inefficient!
  • Introduce bitmasks to handle mismatches
  • Generate repeating patterns aggressively
  • XOR operation: simple and fast

In Dictionary: 0000 0000 0010 0001

Vector                  Bitmask
0000 0000 0010 1101     11
0000 0000 0011 0001     0001
1111 0100 0010 0001     111101
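
A minimal sketch of the XOR idea, using the three 16-bit vectors and the dictionary entry from this slide. The helper names are illustrative, and reporting the difference per 4-bit group is an assumption made for display only, not the slide's mask selection rule.

```python
ENTRY = 0b0000_0000_0010_0001  # dictionary entry from the slide

VECTORS = [
    0b0000_0000_0010_1101,
    0b0000_0000_0011_0001,
    0b1111_0100_0010_0001,
]


def nonzero_nibbles(diff, width=16):
    """Map 4-bit group position (0 = leftmost) to its bits, for groups with mismatches."""
    bits = format(diff, f"0{width}b")
    return {
        i // 4: bits[i:i + 4]
        for i in range(0, width, 4)
        if bits[i:i + 4] != "0000"
    }


diffs = [vector ^ ENTRY for vector in VECTORS]

for vector, diff in zip(VECTORS, diffs):
    # XOR-ing the entry with the same difference restores the vector.
    assert ENTRY ^ diff == vector
    print(format(vector, "016b"), format(diff, "016b"), nonzero_nibbles(diff))
```

Because XOR is its own inverse, the decompressor only needs the dictionary entry and the stored mask to rebuild each vector.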
13
Generating More Matches
ADD R1 R2 R1        ADD  R1 R2 R1
ADD R1 R2 R2        SUB  R1 R2 R1
ADD R1 R2 R3        MUL  R1 R2 R1
ADD R1 R2 R4        COMP R1 R2 R1
14
Bitmask Encoding
  • 32-Bit instructions
  • Format for uncompressed code
  • Format for compressed code

[Figure: encoding formats. Uncompressed code: Decision (1 Bit), Uncompressed Data (32 Bits). Compressed code: Decision (1 Bit), Number of Masks, Dictionary Index, among other fields.]
15
Customized Encoding
  • Up to two bitmasks are sufficient
  • Optimize the generic encoding further
  • Sample customized encodings
      • Encoding 1: one 8-bit mask
      • Encoding 2: two 4-bit masks
      • Encoding 3: 4-bit and 8-bit masks

16
Code Compression with Bitmasks
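Original code size: 80 bits
Compressed code size: 54 bits
Dictionary size: 16 bits
Compression ratio: (54 + 16)/80 = 87.5%

Dictionary
Index   Content
0       0000 0000
1       0100 0010

Original Program    Compressed Program
0000 0000           0 1 0
1000 0010           0 0 00 11 1
0000 0010           0 0 11 10 0
0100 0010           0 1 1
0100 1110           0 0 10 11 1
0101 0010           0 0 01 01 1
0000 1100           0 0 10 11 0
0100 0010           0 1 1
1100 0000           0 0 00 11 0
0000 0000           0 1 0

Compressed entries read, left to right: compressed flag (0), exact-match flag (1 = exact match), then the dictionary index. When the exact-match flag is 0, a 2-bit mask position and a 2-bit mask pattern precede the dictionary index.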
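
The compressed stream on this slide can be decoded with a few lines of code. The sketch below assumes the field layout inferred from the example itself (8-bit vectors, a two-entry dictionary with a 1-bit index, one 2-bit mask at a 2-bit-aligned position, position 0 being the leftmost pair); none of the names come from the slides.

```python
DICTIONARY = ["00000000", "01000010"]  # inferred from the exact-match entries

# The 54-bit compressed program from the slide, one entry per string literal.
STREAM = (
    "010" "0000111" "0011100" "011" "0010111"
    "0001011" "0010110" "011" "0000110" "010"
)


def decode(stream, dictionary):
    """Decode a bit string into a list of 8-bit vectors (as bit strings)."""
    vectors, i = [], 0
    while i < len(stream):
        if stream[i] == "1":
            # Uncompressed: the next 8 bits are the vector itself.
            vectors.append(stream[i + 1:i + 9])
            i += 9
        elif stream[i + 1] == "1":
            # Exact dictionary match: 1-bit index follows.
            vectors.append(dictionary[int(stream[i + 2])])
            i += 3
        else:
            # Dictionary entry XOR a 2-bit mask at a 2-bit-aligned position.
            position = int(stream[i + 2:i + 4], 2)
            mask = int(stream[i + 4:i + 6], 2)
            entry = int(dictionary[int(stream[i + 6])], 2)
            shift = 6 - 2 * position
            vectors.append(format(entry ^ (mask << shift), "08b"))
            i += 7
    return vectors


decoded = decode(STREAM, DICTIONARY)
print(" ".join(decoded))
```

Every one of the ten vectors is recovered from the dictionary, so no entry in this example needs the 9-bit uncompressed form.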
17
Compression Algorithm
  • Algorithm: Compression using bitmasks
  • Input: Original binary code divided into 32-bit
    vectors
  • Output: Compressed code and dictionary
  • Begin
      • Step 1: Create the frequency distribution of
        the vectors
      • Step 2: Create the dictionary based on Step 1
      • Step 3: Compress each 32-bit vector using cost
        constraints
      • Step 4: Handle and adjust branch targets
  • End
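
Steps 1 to 3 can be sketched on the small 8-bit example program used earlier in the deck, rather than on real 32-bit instructions. The sketch assumes a two-entry dictionary chosen purely by frequency, a single 2-bit mask at a 2-bit-aligned position, and the field order flag, match flag, position, mask, index. Step 4 (branch targets) is not covered, and all names are hypothetical.

```python
from collections import Counter

PROGRAM = [
    "00000000", "10000010", "00000010", "01000010", "01001110",
    "01010010", "00001100", "01000010", "11000000", "00000000",
]


def build_dictionary(vectors, size):
    """Steps 1-2: keep the `size` most frequent vectors."""
    return [vector for vector, _ in Counter(vectors).most_common(size)]


def compress_vector(vector, dictionary):
    """Step 3 for one 8-bit vector, assuming a two-entry dictionary."""
    if vector in dictionary:
        return "01" + str(dictionary.index(vector))  # 3 bits
    for index, entry in enumerate(dictionary):
        diff = int(vector, 2) ^ int(entry, 2)
        for position in range(4):
            shift = 6 - 2 * position
            if diff == diff & (0b11 << shift):  # all mismatches in one bit pair
                mask = format(diff >> shift, "02b")
                return "00" + format(position, "02b") + mask + str(index)  # 7 bits
    return "1" + vector  # 9 bits: no encoding is cheaper


dictionary = build_dictionary(PROGRAM, size=2)
compressed = "".join(compress_vector(v, dictionary) for v in PROGRAM)
print(dictionary, len(compressed))
```

With 8-bit vectors the masked form (7 bits) is always cheaper than the uncompressed form (9 bits), so the cost check reduces to trying the encodings in order of size; with 32-bit vectors and several mask options the comparison would need to be explicit.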

18
Branch Targets
  • Our approach for handling branches
  • (Back)patch all the possible branches
  • Create minimal mapping table for unpatchable ones
      • (+) Significant reduction of mapping table
      • (+) Fast retrieval of new target address
  • More than 75% of control flow instructions are
    conditional branches (patchable)
  • 95% of the branches taken do not require the
    mapping table

20
Design of Decompression Engine (DCE)
  • Goal of DCE design
  • One instruction/clock cycle
  • Minimal or no modifications
  • Minimum power consumption
  • Adopt the post-cache design
  • Based on previous one-cycle design by Lekatsas
    et al.

Code compression with a dictionary-based, post-cache design:
performance improvement of up to 63% (25% on average) reported
21
One-Cycle Decompression Engine
[Figure: one-cycle decompression engine. Blocks: prev_comp, prev_decomp, Dictionary (SRAM), Decoding Logic, MUX, Index, Compressed Code, Output Buffer, Uncompressed Code.]
  • 8.5ns clock cycle constraint
  • The critical path was 5.99ns

22
Decompression Engine for Bitmask Encodings
  • Mask generation is done in parallel with the
    dictionary access
  • XOR gate propagation delay: 0.09-0.5 ns
      • Many under 0.25 ns
  • 5.99 + 0.25 = 6.24 ns satisfies the 8.5 ns constraint
  • Capable of decoding more than one instruction
    per cycle


[Figure: decompression engine for bitmask encodings. Blocks: prev_comp, prev_decomp, Dictionary (SRAM), Decoding Logic, MUX, Index, Compressed Code, Output Buffer, XOR, Mask, Uncompressed Code.]
23
Experimental Results
  • 1. Compression ratio for adpcm_en benchmark
      • SPARC, TMS320C6x, and MIPS
  • 2. Compression ratio for different benchmarks
      • MediaBench: adpcm, mpeg, jpeg, ...
      • MiBench: gsm, pegwit, ...
  • 3. Comparison to other compression techniques

24
Compression Ratio for adpcm_en
Encoding 2 outperforms the others.
  • Encoding 1 (one 8-bit mask)
  • Encoding 2 (two 4-bit masks)
  • Encoding 3 (4-bit and 8-bit masks)

25
Compression Ratio for Different Dictionary Size
  • Compression ratio: 55-67%
  • Smaller program: small dictionary
  • Bigger program: big dictionary

26
Comparison to Other Techniques
Smaller compression ratio is better
  • Outperforms other dictionary-based techniques by
    15%
  • Higher decompression bandwidth than best-known
    compression techniques

27
Conclusion and Future Work
  • Memory - Major design constraint
  • Code compression
  • Dictionary-based code compression is popular
  • Hamming distance technique
  • Code compression with bitmasks
  • 10-15% improved compression ratio
  • Better performance (higher cache hits)
  • Less power consumption
  • Fast and simple decompression engine
  • One or more instructions per cycle
  • Parallel decompression

28
Conclusion and Future Work
  • Future work
  • Optimal mask and dictionary selection techniques
  • Compiler optimization
  • Power saving/size analysis
  • Applications in other domains
  • Testing data compression

29
Thank You!
  • Questions?