1. Reducing Code Size with Run-time Decompression
- Charles Lefurgy, Eva Piccininni,
- and Trevor Mudge
- Advanced Computer Architecture Laboratory
- Electrical Engineering and Computer Science Dept.
- The University of Michigan, Ann Arbor
- High-Performance Computer Architecture (HPCA-6)
- January 10-12, 2000
2. Motivation
- Problem: embedded code size
- Constraints: cost, area, and power
- Fit the program in on-chip memory
- Compilers vs. hand-coded assembly
- Portability
- Development costs
- Code bloat
- Solution: code compression
- Reduce compiled code size
- Take advantage of instruction repetition
- Implementation
- Hardware or software?
- Code size?
- Execution speed?
[Figure: two embedded-system diagrams, each with CPU, RAM, ROM, and I/O. In the original system the ROM holds the original program; with code compression the ROM holds the compressed program.]
3. Software decompression
- Previous work
- Decompression unit: whole program [Taunton91]
- No memory savings
- Decompression unit: procedures [Kirovski97, Ernst97]
- Requires large decompression memory
- Fragmentation of decompression memory
- Slow
- Our work
- Decompression unit: 1 or 2 cache lines
- High-performance focus
- New profiling method
4. Dictionary compression algorithm
- Goal: fast decompression
- Dictionary contains the program's unique instructions
- Replace program instructions with short indices

[Figure: the original program's .text segment holds 32-bit instructions with repeats (e.g. "lw r15,r3"). In the compressed program, the .text segment holds 16-bit indices, and a .dictionary segment stores each unique 32-bit instruction once.]
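The scheme above can be sketched in Python. The function names and the size accounting (2-byte indices, 4-byte instructions and dictionary entries) are illustrative; the paper's tool operates on real object files.

```python
def compress(program):
    """Replace each 32-bit instruction with a 16-bit index into a
    dictionary of the program's unique instructions."""
    dictionary = []   # the .dictionary segment: each unique instruction once
    index_of = {}
    indices = []      # the compressed .text segment
    for insn in program:
        if insn not in index_of:
            index_of[insn] = len(dictionary)
            dictionary.append(insn)
        indices.append(index_of[insn])
    return dictionary, indices

def compression_ratio(dictionary, indices):
    # Original: 4 bytes per instruction.
    # Compressed: 2-byte indices plus the 4-byte dictionary entries.
    original = 4 * len(indices)
    compressed = 2 * len(indices) + 4 * len(dictionary)
    return compressed / original
```

The more an instruction repeats, the better the ratio: eight instructions drawn from only two unique values compress to 0.75 of the original size.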
5. Decompression
- Algorithm
- 1. I-cache miss invokes the decompressor (exception handler)
- 2. Fetch index
- 3. Fetch dictionary word
- 4. Place instruction in I-cache (special instruction)
- Write directly into the I-cache
- Decompressed instructions exist only in the I-cache
[Figure: on an I-cache miss, the processor reads a 16-bit index (e.g. 5) from the indices in memory, fetches the corresponding dictionary entry (e.g. "add r1,r2,r3") through the D-cache, and writes the decoded instruction into the I-cache.]
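A minimal simulation of the four steps above, assuming a software-visible I-cache model; the class and constant names are invented for illustration, and the real handler runs as a CPU exception handler.

```python
LINE_WORDS = 8  # 32-byte I-cache line / 4-byte instructions

class ICache:
    def __init__(self):
        self.lines = {}  # line number -> decoded instructions

    def miss_handler(self, line_num, indices, dictionary):
        # Step 2: fetch the 16-bit indices covering this cache line.
        start = line_num * LINE_WORDS
        line_indices = indices[start:start + LINE_WORDS]
        # Step 3: fetch each dictionary word.
        # Step 4: place the decoded instructions directly into the I-cache;
        # they exist nowhere else in decompressed form.
        self.lines[line_num] = [dictionary[i] for i in line_indices]
        return self.lines[line_num]
```

Decompressing one or two cache lines at a time keeps the working memory tiny, in contrast to the whole-program and per-procedure schemes of prior work.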
6. CodePack
- Overview
- IBM, PowerPC
- First system with instruction-stream compression
- Decompress during I-cache miss
- Software CodePack (this work)
7. Compression ratio
- CodePack: 55-63%
- Dictionary: 65-82%
8. Simulation environment
- SimpleScalar
- Pipeline: 5-stage, in-order
- I-cache: 16 KB, 32 B lines, 2-way
- D-cache: 8 KB, 16 B lines, 2-way
- Memory: 10-cycle latency, 2-cycle rate
9. Performance
- CodePack: very high overhead
- Reduce overhead by reducing cache misses
10. Cache miss
- Control slowdown by optimizing the I-cache miss ratio
11. Selective compression
- Hybrid programs
- Only compress some procedures
- Trade size for speed
- Avoid decompression overhead
- Profile methods
- Count dynamic instructions
- Example: Thumb
- Use when compressed code executes more instructions
- Reduces the number of executed instructions
- Count cache misses (new)
- Example: CodePack
- Use when compressed code has longer cache-miss latency
- Reduces cache-miss latency
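One way to realize the cache-miss profile method above is a greedy selection: keep the procedures that miss most often in native form until a native-code size budget runs out. The greedy policy and the field names are assumptions for illustration, not the paper's exact algorithm.

```python
def select_native(procedures, native_budget):
    """Pick which procedures to leave uncompressed.

    procedures: list of (name, size_bytes, cache_misses) from a profile run.
    native_budget: bytes of memory we may spend on native (uncompressed) code.
    """
    native = []
    used = 0
    # Most miss-prone procedures first: they pay decompression cost most often.
    for name, size, misses in sorted(procedures, key=lambda p: -p[2]):
        if used + size <= native_budget:
            native.append(name)
            used += size
    return native
```

Everything not selected stays compressed, trading a little size for the removal of the hottest decompression overhead.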
12. Cache miss profiling
- Cache-miss profiling reduces overhead by 50%
- Loop-oriented benchmarks benefit most
- Approaches the performance of native code
13. CodePack vs. Dictionary
- More compression may yield better performance
- CodePack produces smaller code than Dictionary compression
- Even with some native code, CodePack is smaller
- CodePack is faster due to using more native code
14. Conclusions
- High-performance SW decompression is possible
- Dictionary is faster than CodePack, but at a 5-25% compression-ratio difference
- Hardware support
- I-cache miss exception
- Instruction to store into the I-cache
- Tune performance by reducing cache misses
- Cache size
- Code placement
- Selective compression
- Use cache-miss profiles for loop-oriented benchmarks
- Code placement affects decompression overhead
- Future: unify code placement and compression
15. Web page
- http://www.eecs.umich.edu/compress
16. Code placement
[Figure: memory layouts. Original code: one contiguous region in memory. Whole-program compression: memory holds compressed code, which decompresses into a region that exists only in the L1 cache, in the same order as the original. Selective compression: memory holds a native region plus compressed code, and the decompressed region ends up in a different order from the original.]
17. Hardware or software decompression?
- Hardware
- Fast translation
- Potential speedup
- Tune compression for each benchmark
- Software
- Low cost
- Re-target for new algorithms
- New algorithm for each benchmark
- Slow
18. CodePack encoding
- A 32-bit instruction is split into two 16-bit half-words
- Each 16-bit half-word is compressed separately
Encoding for upper 16 bits (tag + dictionary index, or raw bits):
- 00  + 3-bit index (8 dictionary entries)
- 01  + 5-bit index (32 entries)
- 100 + 6-bit index (64 entries)
- 101 + 7-bit index (128 entries)
- 110 + 8-bit index (256 entries)
- 111 + 16 raw bits (escape: uncompressed half-word)

Encoding for lower 16 bits:
- 00  encodes zero (no index bits)
- 01  + 4-bit index (16 entries)
- 100 + 5-bit index (32 entries)
- 101 + 7-bit index (128 entries)
- 110 + 8-bit index (256 entries)
- 111 + 16 raw bits (escape)
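As a sketch of the lower-half encoding, assuming the tag widths and dictionary capacities shown in the table above (the exact assignments may differ from IBM's specification; names here are illustrative):

```python
LOW_TAGS = [        # (tag, index bits, cumulative dictionary capacity)
    ("01",  4, 16),
    ("100", 5, 32),
    ("101", 7, 128),
    ("110", 8, 256),
]

def encode_low(halfword, dictionary):
    """Encode one lower 16-bit half-word as a bit string."""
    if halfword == 0:
        return "00"                      # zero gets its own 2-bit code
    if halfword in dictionary:
        idx = dictionary.index(halfword)
        # Shorter tags reach only the front of the dictionary, so the
        # most frequent half-words should be placed at low indices.
        for tag, bits, capacity in LOW_TAGS:
            if idx < capacity:
                return tag + format(idx, f"0{bits}b")
    # Escape: emit the half-word uncompressed.
    return "111" + format(halfword, "016b")
```

A frequent half-word at index 0 costs 6 bits; a half-word missing from the dictionary costs 19 bits, which is why dictionary construction matters.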
19. CodePack decompression
- L1 I-cache miss address: fields at bits 31-26, 25-6, and 5-0
- Fetch index: the miss address selects an entry in the index table (in main memory), which gives the byte-aligned block address of the compressed bytes (in main memory)
- Fetch compressed instructions: a compression block holds 16 instructions
- Each compressed instruction consists of a hi tag, low tag, hi index, and low index
- Decompress: the hi and low indices select entries in the high and low dictionaries, yielding the high 16 bits and low 16 bits of the native instruction
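The index-table lookup above can be sketched as follows, assuming the 20-bit field at bits 25-6 of the miss address numbers the compression block and the index table maps block numbers to byte-aligned addresses of the compressed bytes (names are illustrative):

```python
BLOCK_INSNS = 16  # a compression block holds 16 instructions (64 bytes)

def compressed_block_address(miss_addr, index_table):
    """Map an I-cache miss address to the byte address of its
    compression block's compressed bytes in main memory."""
    block = (miss_addr >> 6) & ((1 << 20) - 1)  # bits 25-6: block number
    return index_table[block]                   # byte-aligned block address
```

The index table is needed because compressed blocks have variable length, so a block's position can no longer be computed from its address alone.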