1
Another Performance Evaluation of Memory
Hierarchy in Embedded Systems
  • Nelson Barnes
  • CPE 631
  • 04/14/03

2
Outline
  • Introduction
  • Related Work
  • Problem Statement
  • Proposed Solutions
  • Experimental Setup
  • Experimental Results
  • Conclusions

3
Introduction
  • Why is cache design so important in embedded
    systems?

4
Cache Design Parameters
  • Cache organization
    • Unified vs. split (instruction/data) caches
  • Cache size
  • Cache block (line) size
  • Block placement policy
    • Direct-mapped, fully-associative, set-associative
      (see the address-breakdown sketch below)
  • Block replacement policy
    • Random, Least-Recently Used (LRU), Round-robin,
      Pseudo-LRU, OPT (Optimal)
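A minimal sketch (illustration only, not from the slides) of how the placement policy carves up a byte address into tag, set index, and block offset; the 8KB / 32B-block / 4-way numbers are example values from within the sweep used later, direct-mapped is simply the 1-way case, and fully-associative has a single set.

/* Minimal sketch (illustration only, not from the slides): how a byte address
 * is split into tag / set index / block offset for a set-associative cache.
 * Direct-mapped is the 1-way case; fully-associative has a single set.
 * The 8KB / 32B-block / 4-way numbers are example values from the sweep. */
#include <stdio.h>

#define CACHE_BYTES (8 * 1024)
#define BLOCK_BYTES 32
#define WAYS        4
#define NUM_SETS    (CACHE_BYTES / (BLOCK_BYTES * WAYS)) /* 64 sets */

int main(void)
{
    unsigned addr = 0x0001A2C4u;

    unsigned offset = addr % BLOCK_BYTES;              /* byte within the block */
    unsigned index  = (addr / BLOCK_BYTES) % NUM_SETS; /* which set to probe    */
    unsigned tag    = addr / (BLOCK_BYTES * NUM_SETS); /* identifies the block  */

    printf("addr=0x%08X -> tag=0x%X  set=%u  offset=%u\n", addr, tag, index, offset);
    return 0;
}

With a direct-mapped cache a block can live in exactly one set, so conflicts drive the miss rate; higher associativity widens each set at the cost of more tag comparisons and replacement bookkeeping.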

5
Related Work
  • MiBench vs. NetBench

6
Problem Statement
  • Comprehensive performance evaluation of cache
    design issues in embedded systems
    • Split versus unified cache
    • Cache placement and size
    • Cache block size
    • Block replacement policy
  • Performance metrics (see the sketch below)
    • Static measure: the number of cache misses per
      1K instructions executed, measured at the end
      of application execution
    • Dynamic measure: the number of cache misses per
      1K instructions executed, sampled every 100K
      instructions executed
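A minimal sketch of the arithmetic behind both metrics; the counter values are made up purely for illustration (the slides report no numbers), and the 100K-instruction window is the dynamic sampling interval stated above.

/* Minimal sketch of the static vs. dynamic misses-per-1K-instructions (MPKI)
 * metrics; all counter values below are made up for illustration. */
#include <stdio.h>

#define SAMPLE_INTERVAL 100000ULL /* dynamic sampling window from the slides */

int main(void)
{
    /* Running totals a cache simulator would maintain. */
    unsigned long long insts = 750000, misses = 9300;                 /* end-of-run totals (made up) */
    unsigned long long win_insts = SAMPLE_INTERVAL, win_misses = 1800; /* one sampling window (made up) */

    double static_mpki  = 1000.0 * (double)misses / (double)insts;
    double dynamic_mpki = 1000.0 * (double)win_misses / (double)win_insts;

    printf("static MPKI  (end of run)       = %.2f\n", static_mpki);
    printf("dynamic MPKI (last 100K window) = %.2f\n", dynamic_mpki);
    return 0;
}

The static figure summarizes the whole run, while the dynamic figure exposes phase behavior that the end-of-run average hides.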

7
Proposed Solution
  • Why use NetBench?

8
Experimental Setup
  • ARM version of the SimpleScalar toolset
    • sim-cache
    • sim-cheetah
  • NetBench applications include:
    • Micro-level programs
      • CRC: checksum calculation
      • TL: table lookup
    • IP-level programs
      • Route: IPv4 routing
      • DRR: deficit round robin
    • Application-level programs
      • DH: public-key encryption/decryption
      • MD5: message digest algorithm (secure signature)

9
Experimental Setup
  • Cache memory setup
    • Split first-level instruction and data caches
    • Unified first-level cache
  • Cache parameters (see the configuration sketch below)
    • Cache size: ranging from 0.5KB to 32KB
    • Cache associativity: direct-mapped, 2-way,
      4-way, and 8-way set-associative
    • Cache replacement policies: FIFO, Random, LRU,
      pLRUt, pLRUm, and Optimal
    • Cache block size: 32B, 64B
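The slides do not show command lines. As a hedged sketch, assuming the standard SimpleScalar sim-cache configuration syntax <name>:<nsets>:<bsize>:<assoc>:<repl> (with l/f/r for LRU/FIFO/random), the swept parameters translate into the number of sets as below; the pLRUt, pLRUm, and Optimal policies are not in stock sim-cache and would presumably come from sim-cheetah or a modified simulator.

/* Sketch under stated assumptions: derive a SimpleScalar-style cache
 * configuration string <name>:<nsets>:<bsize>:<assoc>:<repl> from the
 * total size, block size, and associativity swept in this study.
 * Policy letters assumed: l = LRU, f = FIFO, r = random. */
#include <stdio.h>

static void config_string(const char *name, unsigned size_bytes,
                          unsigned block_bytes, unsigned assoc, char repl)
{
    unsigned nsets = size_bytes / (block_bytes * assoc); /* sets = size / (block * ways) */
    printf("-cache:%s %s:%u:%u:%u:%c\n", name, name, nsets, block_bytes, assoc, repl);
}

int main(void)
{
    /* One point from the sweep: 8KB, 32B blocks, 4-way, LRU data cache */
    config_string("dl1", 8 * 1024, 32, 4, 'l'); /* -> -cache:dl1 dl1:64:32:4:l */
    /* Smallest point in the sweep: 0.5KB direct-mapped, 32B blocks */
    config_string("il1", 512, 32, 1, 'l');      /* -> -cache:il1 il1:16:32:1:l */
    return 0;
}

Because the simulator takes the number of sets rather than the total size, the same capacity at higher associativity means proportionally fewer sets.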

10
Experimental Setup (cont'd)
  • [Block diagrams on the original slide]
    • Split configuration: ARM core with separate L1I
      (instructions) and L1D (data) caches
    • Unified configuration: ARM core with a single
      L1U cache for both instructions and data
11
MiBench Experimental Results
12
Data Cache Misses
13
Instruction Cache Misses
14
Unified Cache Misses
15
Dynamic Behavior
16
Dynamic Behavior
17
Replacement Policies
18
Experimental Results
  • NetBench
  • Discussion

19
Conclusions
  • Split caches outperform the equivalent unified
    cache for relatively small direct-mapped caches
  • Unified cache almost always outperforms the split
    caches for set-associative caches

20
Conclusions
  • Increasing cache associativity reduces the number
    of cache misses (up to 8-way associative caches)
    • More beneficial for data and unified caches
      than for instruction caches
  • Pseudo-LRU techniques perform as well as LRU for
    data caches (see the pLRUt sketch below)
  • Random performs the best for instruction caches
  • Relatively significant difference between the
    optimal replacement policy and the best
    non-optimal policy
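A minimal sketch (an assumed pLRUt flavor, not code from the study) of tree-based pseudo-LRU for one 4-way set: three bits per set approximate the recency order that true LRU tracks exactly, which is why it can match LRU closely at a fraction of the bookkeeping cost.

/* Minimal sketch (assumed pLRUt flavor): tree-based pseudo-LRU for one
 * 4-way set.  Three bits approximate the exact LRU recency order. */
#include <stdio.h>

/* tree[0]: left pair (ways 0/1) vs. right pair (ways 2/3);
 * tree[1]: way 0 vs. way 1;  tree[2]: way 2 vs. way 3.
 * Each bit points toward the side that should be victimized next. */
static unsigned char tree[3];

/* On a hit or fill of `way`, make the bits on its path point away from it. */
static void plru_touch(int way)
{
    if (way < 2) {
        tree[0] = 1;                  /* next victim search goes right */
        tree[1] = (way == 0) ? 1 : 0; /* point at the sibling          */
    } else {
        tree[0] = 0;
        tree[2] = (way == 2) ? 1 : 0;
    }
}

/* On a miss, follow the pointers to pick the way to replace. */
static int plru_victim(void)
{
    if (tree[0] == 0)
        return (tree[1] == 0) ? 0 : 1;
    return (tree[2] == 0) ? 2 : 3;
}

int main(void)
{
    /* Touch ways 0, 1, 2, 3 in order: true LRU would evict way 0 next,
     * and the 3-bit tree agrees here. */
    plru_touch(0);
    plru_touch(1);
    plru_touch(2);
    plru_touch(3);
    printf("pLRUt victim: way %d\n", plru_victim());
    return 0;
}

With other access patterns the tree can diverge from exact LRU, which is why the study compares pLRUt and pLRUm against true LRU and the optimal policy.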