1
Another Performance Evaluation of Memory
Hierarchy in Embedded Systems
  • Nelson Barnes
  • CPE 631
  • 04/14/03

2
Outline
  • Introduction
  • Related Work
  • Problem Statement
  • Proposed Solutions
  • Experimental Setup
  • Experimental Results
  • Conclusions

3
Introduction
  • Why is cache design so important in embedded
    systems?

4
Cache Design Parameters
  • Cache organization
    • Unified vs. split (instruction/data) caches
  • Cache size
  • Cache block (line) size
  • Block placement policy
    • Direct-mapped, fully-associative, set-associative
      (see the address-breakdown sketch below)
  • Block replacement policy
    • Random, Least-Recently Used (LRU), Round-robin,
      Pseudo-LRU, OPT (Optimal)
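A minimal sketch (illustration only, not from the slides) of how the placement policy carves up a byte address into tag, set index, and block offset; the 8KB / 32B-block / 4-way numbers are example values from within the sweep used later, direct-mapped is simply the 1-way case, and fully-associative has a single set.

/* Minimal sketch (illustration only, not from the slides): how a byte address
 * is split into tag / set index / block offset for a set-associative cache.
 * Direct-mapped is the 1-way case; fully-associative has a single set.
 * The 8KB / 32B-block / 4-way numbers are example values from the sweep. */
#include <stdio.h>

#define CACHE_BYTES (8 * 1024)
#define BLOCK_BYTES 32
#define WAYS        4
#define NUM_SETS    (CACHE_BYTES / (BLOCK_BYTES * WAYS)) /* 64 sets */

int main(void)
{
    unsigned addr = 0x0001A2C4u;

    unsigned offset = addr % BLOCK_BYTES;              /* byte within the block */
    unsigned index  = (addr / BLOCK_BYTES) % NUM_SETS; /* which set to probe    */
    unsigned tag    = addr / (BLOCK_BYTES * NUM_SETS); /* identifies the block  */

    printf("addr=0x%08X -> tag=0x%X  set=%u  offset=%u\n", addr, tag, index, offset);
    return 0;
}

With a direct-mapped cache a block can live in exactly one set, so conflicts drive the miss rate; higher associativity widens each set at the cost of more tag comparisons and replacement bookkeeping.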

5
Related Work
  • MiBench vs. NetBench

6
Problem Statement
  • Comprehensive performance evaluation of cache
    design issues in embedded systems
    • Split versus unified cache
    • Cache placement and size
    • Cache block size
    • Block replacement policy
  • Performance metrics (see the sketch below)
    • Static measure: the number of cache misses per
      1K instructions executed, measured at the end
      of application execution
    • Dynamic measure: the number of cache misses per
      1K instructions executed, sampled every 100K
      instructions executed
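A minimal sketch of the arithmetic behind both metrics; the counter values are made up purely for illustration (the slides report no numbers), and the 100K-instruction window is the dynamic sampling interval stated above.

/* Minimal sketch of the static vs. dynamic misses-per-1K-instructions (MPKI)
 * metrics; all counter values below are made up for illustration. */
#include <stdio.h>

#define SAMPLE_INTERVAL 100000ULL /* dynamic sampling window from the slides */

int main(void)
{
    /* Running totals a cache simulator would maintain. */
    unsigned long long insts = 750000, misses = 9300;                 /* end-of-run totals (made up) */
    unsigned long long win_insts = SAMPLE_INTERVAL, win_misses = 1800; /* one sampling window (made up) */

    double static_mpki  = 1000.0 * (double)misses / (double)insts;
    double dynamic_mpki = 1000.0 * (double)win_misses / (double)win_insts;

    printf("static MPKI  (end of run)       = %.2f\n", static_mpki);
    printf("dynamic MPKI (last 100K window) = %.2f\n", dynamic_mpki);
    return 0;
}

The static figure summarizes the whole run, while the dynamic figure exposes phase behavior that the end-of-run average hides.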

7
Proposed Solution
  • Why use NetBench?

8
Experimental Setup
  • ARM version of the SimpleScalar toolset
    • sim-cache
    • sim-cheetah
  • NetBench applications include:
    • Micro-level programs
      • CRC: checksum calculation
      • TL: table lookup
    • IP-level programs
      • Route: IPv4 routing
      • DRR: deficit round robin
    • Application-level programs
      • DH: public-key encryption/decryption
      • MD5: message digest algorithm (secure signature)

9
Experimental Setup
  • Cache memory setup
    • Split first-level instruction and data caches
    • Unified first-level cache
  • Cache parameters (see the configuration sketch below)
    • Cache size: ranging from 0.5KB to 32KB
    • Cache associativity: direct-mapped, 2-way,
      4-way, and 8-way set-associative
    • Cache replacement policies: FIFO, Random, LRU,
      pLRUt, pLRUm, and Optimal
    • Cache block size: 32B, 64B
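The slides do not show command lines. As a hedged sketch, assuming the standard SimpleScalar sim-cache configuration syntax <name>:<nsets>:<bsize>:<assoc>:<repl> (with l/f/r for LRU/FIFO/random), the swept parameters translate into the number of sets as below; the pLRUt, pLRUm, and Optimal policies are not in stock sim-cache and would presumably come from sim-cheetah or a modified simulator.

/* Sketch under stated assumptions: derive a SimpleScalar-style cache
 * configuration string <name>:<nsets>:<bsize>:<assoc>:<repl> from the
 * total size, block size, and associativity swept in this study.
 * Policy letters assumed: l = LRU, f = FIFO, r = random. */
#include <stdio.h>

static void config_string(const char *name, unsigned size_bytes,
                          unsigned block_bytes, unsigned assoc, char repl)
{
    unsigned nsets = size_bytes / (block_bytes * assoc); /* sets = size / (block * ways) */
    printf("-cache:%s %s:%u:%u:%u:%c\n", name, name, nsets, block_bytes, assoc, repl);
}

int main(void)
{
    /* One point from the sweep: 8KB, 32B blocks, 4-way, LRU data cache */
    config_string("dl1", 8 * 1024, 32, 4, 'l'); /* -> -cache:dl1 dl1:64:32:4:l */
    /* Smallest point in the sweep: 0.5KB direct-mapped, 32B blocks */
    config_string("il1", 512, 32, 1, 'l');      /* -> -cache:il1 il1:16:32:1:l */
    return 0;
}

Because the simulator takes the number of sets rather than the total size, the same capacity at higher associativity means proportionally fewer sets.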

10
Experimental Setup (cont'd)
  • [Block diagrams on the original slide]
    • Split configuration: ARM core with separate L1I
      (instructions) and L1D (data) caches
    • Unified configuration: ARM core with a single
      L1U cache for both instructions and data
11
MiBench Experimental Results
12
Data Cache Misses
13
Instruction Cache Misses
14
Unified Cache Misses
15
Dynamic Behavior
16
Dynamic Behavior
17
Replacement Policies
18
Experimental Results
  • NetBench
  • Discussion

19
Conclusions
  • Split caches outperform the equivalent unified
    cache for relatively small direct-mapped caches
  • Unified cache almost always outperforms the split
    caches for set-associative caches

20
Conclusions
  • Increasing cache associativity reduces the number
    of cache misses (up to 8-way associative caches)
    • More beneficial for data and unified caches
      than for instruction caches
  • Pseudo-LRU techniques perform as well as LRU for
    data caches (see the pLRUt sketch below)
  • Random performs the best for instruction caches
  • Relatively significant difference between the
    optimal replacement policy and the best
    non-optimal policy
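A minimal sketch (an assumed pLRUt flavor, not code from the study) of tree-based pseudo-LRU for one 4-way set: three bits per set approximate the recency order that true LRU tracks exactly, which is why it can match LRU closely at a fraction of the bookkeeping cost.

/* Minimal sketch (assumed pLRUt flavor): tree-based pseudo-LRU for one
 * 4-way set.  Three bits approximate the exact LRU recency order. */
#include <stdio.h>

/* tree[0]: left pair (ways 0/1) vs. right pair (ways 2/3);
 * tree[1]: way 0 vs. way 1;  tree[2]: way 2 vs. way 3.
 * Each bit points toward the side that should be victimized next. */
static unsigned char tree[3];

/* On a hit or fill of `way`, make the bits on its path point away from it. */
static void plru_touch(int way)
{
    if (way < 2) {
        tree[0] = 1;                  /* next victim search goes right */
        tree[1] = (way == 0) ? 1 : 0; /* point at the sibling          */
    } else {
        tree[0] = 0;
        tree[2] = (way == 2) ? 1 : 0;
    }
}

/* On a miss, follow the pointers to pick the way to replace. */
static int plru_victim(void)
{
    if (tree[0] == 0)
        return (tree[1] == 0) ? 0 : 1;
    return (tree[2] == 0) ? 2 : 3;
}

int main(void)
{
    /* Touch ways 0, 1, 2, 3 in order: true LRU would evict way 0 next,
     * and the 3-bit tree agrees here. */
    plru_touch(0);
    plru_touch(1);
    plru_touch(2);
    plru_touch(3);
    printf("pLRUt victim: way %d\n", plru_victim());
    return 0;
}

With other access patterns the tree can diverge from exact LRU, which is why the study compares pLRUt and pLRUm against true LRU and the optimal policy.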