Xiaomi An, Jiqiang Song, Wendong Wang - PowerPoint PPT Presentation

1

Temporal Distribution Based Software Cache
Partition To Reduce I-Cache Misses
  • Xiaomi An, Jiqiang Song, Wendong Wang
  • SimpLight Nanoelectronics Ltd
  • 2008/03/24

2
Outline
  • Traditional code layout optimizations
  • Code layout optimizations in Open64 compiler
  • Temporal distribution based software cache
    partition to reduce I-Cache misses
  • Future work

3
Traditional code layout optimizations
  • Code layout is an optimization that changes how
    code is organized in memory.
  • Main benefits of code layout:
    • Improve branch prediction through the placement
      of basic blocks
    • Reduce I-cache misses by changing how code maps
      onto the cache (mainly compulsory misses and
      conflict misses)
    • Fit code into a complex memory hierarchy (e.g.
      scratch-pad memory and cache)

4
Traditional code layout optimizations
  • Representation of temporal relationships:
    • Control flow graph with edge frequencies
    • Weighted call graph
    • Temporal relation graph
  • Consideration of cache architecture:
    • Linearize the code without considering the cache
      architecture (Pettis and Hansen)
    • Distribute temporally interleaved code onto
      different cache lines (Hashemi, Gloy, etc.)

5
Code layout optimizations in Open64 compiler
  • Profile-based basic block reordering and
    procedure splitting in CG
    • Based on the control flow graph with edge
      frequencies
    • Pettis and Hansen based algorithm
  • Procedure reordering in IPA
    • Based on the weighted call graph with call-edge
      frequencies
    • A variant of the Pettis and Hansen algorithm

6
Software cache partition
  • What is software cache partition?
    • Through code layout optimization, different code
      blocks are mapped to different regions of the
      I-cache.
  • Benefits of software cache partition:
    • Reduce cache misses
    • Remove interference between multiple programs and
      avoid additional hardware support (embedded systems)
    • Software implementation of scratch-pad memory on
      top of the I-cache
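
To make "mapped to different regions" concrete: in a direct-mapped cache the line an address lands on is fixed by the address alone, so a layout pass can steer a code block into a region by choosing its start address. The sketch below assumes a direct-mapped I-cache with 32-byte lines and 256 lines. Those parameters and both function names are illustrative assumptions, not taken from the slides.

```python
def line_index(addr, line_size=32, num_lines=256):
    """Cache line that a byte address maps to in a direct-mapped cache."""
    return (addr // line_size) % num_lines


def place_in_region(addr, first_line, last_line, line_size=32, num_lines=256):
    """Smallest line-aligned address >= addr whose cache line index lies in
    [first_line, last_line]. The region bounds are hypothetical inputs that a
    partitioning pass would supply."""
    if not 0 <= first_line <= last_line < num_lines:
        raise ValueError("region must lie inside the cache")
    # Round up to the next cache line boundary.
    addr = -(-addr // line_size) * line_size
    # Skip forward one line at a time until the address falls in the region.
    while not first_line <= line_index(addr, line_size, num_lines) <= last_line:
        addr += line_size
    return addr


start = place_in_region(0x2010, first_line=8, last_line=15)
print(hex(start), line_index(start))
```

With these assumed defaults, a block that would otherwise start at 0x2010 is moved to 0x2100, the first following address that maps to line 8. A real layout pass would also have to account for block sizes and for the padding this introduces.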

7
Benefits of software cache partition (1)
  • Remove interference between multiple programs and
    avoid additional hardware support

The I-cache is partitioned according to the
performance demands and code locality of the video
and audio applications.
8
Benefits of software cache partition (2)
  • Software implementation of scratch-pad memory on
    top of the I-cache

The I-cache is partitioned to guarantee that code
with real-time requirements is not evicted once it
has been brought into the cache.
9
Benefits of software cache partition (3)
  • Reduce I-cache misses

Runtime trace of code blocks:
ABCDEF (UV)^5 ABCDEF (PQ)^5 ABCDEF (XY)^5 ABCDEF
Layout 1: 24 misses
Layout 2: 18 misses
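
The transcript does not preserve the two layouts or the cache geometry behind these counts. As a rough illustration of how such counts arise, the sketch below replays the trace against a direct-mapped cache, assuming six lines and one block per line. Both layouts are hypothetical. They were chosen so that the totals match the 24 and 18 quoted above, and are not claimed to be the layouts shown on the slide.

```python
def expand_trace():
    """ABCDEF (UV)^5 ABCDEF (PQ)^5 ABCDEF (XY)^5 ABCDEF as a list of blocks."""
    hot = "ABCDEF"
    return list(hot + "UV" * 5 + hot + "PQ" * 5 + hot + "XY" * 5 + hot)


def count_misses(trace, layout):
    """Misses in a direct-mapped cache where every block fills one line.
    layout maps block name -> cache line index."""
    resident = {}  # cache line index -> block currently held
    misses = 0
    for block in trace:
        line = layout[block]
        if resident.get(line) != block:
            misses += 1
            resident[line] = block
    return misses


# Hypothetical layout: the regularly reused blocks A/B and C/D collide.
conflicting = dict(A=0, B=0, C=1, D=1, E=2, F=3,
                   U=4, P=4, X=4, V=5, Q=5, Y=5)
# Hypothetical layout: A-D hold lines exclusively, the phase-local
# blocks share the two remaining lines with E and F.
partitioned = dict(A=0, B=1, C=2, D=3,
                   E=4, U=4, P=4, X=4, F=5, V=5, Q=5, Y=5)

trace = expand_trace()
print(count_misses(trace, conflicting), count_misses(trace, partitioned))
```

Under these assumptions the first layout incurs 24 misses and the second 18. The difference comes entirely from which blocks are forced to evict each other every time the ABCDEF sequence recurs.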
10
Temporal distribution based layout of code blocks
in the partitioned cache
  • Selection of good candidates to hold cache lines
    exclusively
    • Criteria: hot, dense, and temporal distribution

[Figure: mapping into I-cache, with regions labeled "Share cache lines"]
11
Temporal distribution
  • Temporal locality and temporal regularity
    • Trace: ABCDEF (UV)^5 ABCDEF (PQ)^5 ABCDEF (XY)^5 ABCDEF
    • A, B, C, D, E, F have good temporal regularity:
      their references are distributed uniformly along
      the trace.
    • U, V, P, Q, X, Y have good temporal locality:
      their references are heavily skewed toward one
      part of the trace.

[Figure: our mapping, with regions labeled "Share cache lines"; 18 misses in total]
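
One simple way to see the difference between "uniform" and "skewed" reference distributions is to cut the trace into equal windows and check in how many of them each block appears. This is only an illustrative measure with an arbitrarily chosen window count. It is not the authors' metric, and the function name is hypothetical.

```python
def window_coverage(trace, num_windows):
    """Fraction of equal-sized trace windows in which each block is
    referenced at least once (1.0 = present throughout the trace)."""
    seen = {}  # block -> set of window indices containing a reference
    for pos, block in enumerate(trace):
        window = pos * num_windows // len(trace)
        seen.setdefault(block, set()).add(window)
    return {block: len(w) / num_windows for block, w in seen.items()}


hot = "ABCDEF"
trace = list(hot + "UV" * 5 + hot + "PQ" * 5 + hot + "XY" * 5 + hot)
coverage = window_coverage(trace, num_windows=3)
for block in "AFUPX":
    print(block, round(coverage[block], 2))
```

With three windows, A through F appear in every window, while U, V, P, Q, X and Y each appear in exactly one. That is the regular versus local split described in the bullets above.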
12
Quantification of temporal distribution
  • Variance of reuse distance
  • Temporal distribution
  • Weighted temporal distribution

13
Iterative partition and layout
  • Func Partition(RB, IRB)
    • Sort nodes in RB by instruction density
      // highest instruction density first
    • RB_SIZE = Calc_rb_size(RB)
    • IRB_SIZE = Calc_irb_size(IRB)
    • While (RB_SIZE + IRB_SIZE > CACHE_SIZE)
      • Adjust(RB, IRB)
      • RB_SIZE = Calc_rb_size(RB)
      • IRB_SIZE = Calc_irb_size(IRB)
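
The loop above can be sketched in runnable form, with the caveat that the transcript defines neither RB and IRB nor the helper functions. The sketch assumes that RB holds blocks given exclusive cache lines, that IRB holds blocks overlaid on one shared region, that instruction density means executed instructions per cache line of code, and that Adjust demotes the least dense RB block. All of these are guesses made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    size: int      # code size in cache lines
    executed: int  # dynamic instruction count from a profile

    @property
    def density(self):
        # Assumed meaning of "instruction density".
        return self.executed / self.size


def calc_rb_size(rb):
    # Assumption: RB blocks each need their own lines.
    return sum(b.size for b in rb)


def calc_irb_size(irb):
    # Assumption: IRB blocks overlay one region sized for the largest.
    return max((b.size for b in irb), default=0)


def adjust(rb, irb):
    # Assumption: demote the least dense RB block (rb is densest-first).
    irb.append(rb.pop())


def partition(rb, irb, cache_size):
    rb = sorted(rb, key=lambda b: b.density, reverse=True)
    irb = list(irb)
    rb_size, irb_size = calc_rb_size(rb), calc_irb_size(irb)
    # The "and rb" guard is an addition so the sketch always terminates.
    while rb_size + irb_size > cache_size and rb:
        adjust(rb, irb)
        rb_size, irb_size = calc_rb_size(rb), calc_irb_size(irb)
    return rb, irb


rb, irb = partition(
    [Block("A", 2, 100), Block("B", 3, 60), Block("C", 2, 10)],
    [Block("D", 2, 5)],
    cache_size=6,
)
print([b.name for b in rb], [b.name for b in irb])
```

In this toy input, C and then B are demoted until the exclusive and shared regions together fit in six lines, leaving only A with exclusive lines.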

14
Experiments and results (1)
Cumulative effect of optimizations on I-cache
miss reduction
15
Experiments and results (2)
Reduction of I-cache misses by TD (temporal
distribution), PH (Pettis and Hansen) and TRG
(temporal relation graph).
16
Experiments and results (3)
H.264 codec: I-cache miss reduction by TD, PH and
TRG with various inputs
17
Future work
  • Improve the current iterative partition algorithm
  • Incorporate more cache configuration parameters
    into the layout algorithm, e.g. cache line size,
    L2 cache
  • Develop an effective software cache partition
    method for multi-threaded programs on our memory
    hierarchy

18
Thank You!