Transcript and Presenter's Notes

Title: Origin of the problem


1
Origin of the problem
  • Multi-threaded processors put more pressure on
    the memory system, causing interference among
    threads (inter-thread misses) and increasing
    interference inside threads (intra-thread
    misses)
  • Conventional caches are not well suited to deal
    with these problems

2
Two different placement schemes for conventional
cache design
  • Modulus Function, which indexes the cache using
    some bits of the address
  • XOR-based placement function, which performs a
    bitwise XOR between the index bits and the i
    least-significant bits of the tag
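
The two placement functions above can be sketched as follows. The field widths (64-byte blocks, 256 sets) are illustrative assumptions, not values from the slides:

```python
# Hypothetical parameters: 64-byte blocks -> b = 6 offset bits,
# 256 sets -> i = 8 index bits. Neither value comes from the slides.
B_BITS = 6                    # block-offset bits (b)
I_BITS = 8                    # index bits (i)
I_MASK = (1 << I_BITS) - 1

def modulus_index(addr: int) -> int:
    # Modulus placement: take the i bits just above the block offset.
    return (addr >> B_BITS) & I_MASK

def xor_index(addr: int) -> int:
    # XOR-based placement: XOR the index bits with the i
    # least-significant bits of the tag.
    index = (addr >> B_BITS) & I_MASK
    tag_low = (addr >> (B_BITS + I_BITS)) & I_MASK
    return index ^ tag_low
```

Two addresses that share index bits but differ in the low tag bits collide under the modulus function yet map to different sets under the XOR function.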

3
Modulus placement function and bitwise XOR
placement function
[Diagram: the address is divided into tag (t bits), index (i bits),
and block (b bits) fields, with a thread id alongside the tag. The
modulus function uses the index bits directly as the cache index;
the XOR function splits the tag into (t-i bits) and (i bits) and
XORs the latter with the index bits to form the cache index.]
4
Previous Observations
  • As the number of threads increases, the
    contribution of the inter-thread miss ratio to
    the overall miss ratio also increases
  • Increasing associativity decreases the overall
    percentage of inter-thread misses

5
Double-access cache
  • Cache is accessed using two different functions
  • First, cache is indexed with modulus function
  • In case of a miss, a second access is performed
    using an XOR function
  • Performing two accesses artificially doubles the
    associativity, and increasing associativity is
    very effective at reducing inter-thread misses
  • This scheme works well
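
A minimal sketch of the double-access lookup, in the spirit of a column-associative cache. The class structure and fill policy are illustrative assumptions; the index functions are restated so the sketch is self-contained:

```python
# Illustrative parameters (not from the slides): direct-mapped cache,
# b = 6 offset bits, i = 8 index bits.
B_BITS, I_BITS = 6, 8
I_MASK = (1 << I_BITS) - 1

def modulus_index(addr):
    return (addr >> B_BITS) & I_MASK

def xor_index(addr):
    idx = (addr >> B_BITS) & I_MASK
    return idx ^ ((addr >> (B_BITS + I_BITS)) & I_MASK)

class DoubleAccessCache:
    def __init__(self):
        # One block address per set (direct-mapped); None means empty.
        self.sets = [None] * (1 << I_BITS)

    def access(self, addr):
        """Return True on a hit; on a miss, fill the alternate set."""
        blk = addr >> B_BITS
        s1, s2 = modulus_index(addr), xor_index(addr)
        if self.sets[s1] == blk:
            return True             # first access (modulus) hits
        if self.sets[s2] == blk:
            return True             # second access (XOR) hits
        self.sets[s2] = blk         # miss: place in the XOR location
        return False
```

Each block thus has two candidate locations, which is what "artificially doubling the associativity" amounts to.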

6
Cache Splitting
  • Split the cache into equal-sized parts, each one
    assigned to a running thread
  • Does not perform well: it eliminates inter-thread
    misses but increases intra-thread misses, so
    there is no performance gain
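
The split indexing can be sketched as below; the set count and thread count are assumptions for illustration:

```python
# Illustrative parameters: 256 sets split evenly among 4 threads
# (both values are assumptions, not from the slides).
I_BITS, B_BITS = 8, 6
N_THREADS = 4
SETS_PER_THREAD = (1 << I_BITS) // N_THREADS

def split_index(addr, thread_id):
    # Index within the thread's own slice, offset by the slice base.
    local = (addr >> B_BITS) % SETS_PER_THREAD
    return thread_id * SETS_PER_THREAD + local
```

Because each thread can only reach its own slice, threads never evict each other's blocks, but every thread also sees a cache one quarter the size.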

7
Double access local-local split cache
  • Each thread may perform two accesses: the first
    is over its own cache area (first local access)
    and, in the case of a miss, the second is over
    the local area of another thread
  • Does not perform well

8
Double access local-global split cache
  • Split the cache into equal-sized parts, each one
    assigned to a running thread
  • Each thread may perform two accesses: the first
    is over its local area (local access) and, in
    the case of a miss, the second is over the whole
    cache (global access)
  • Performs well
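
The local-then-global sequence can be sketched as follows. The class, parameters, and miss-fill policy are illustrative assumptions:

```python
# Illustrative parameters: 256 sets, 4 threads, 64-byte blocks
# (assumptions, not from the slides).
I_BITS, B_BITS, N_THREADS = 8, 6, 4
N_SETS = 1 << I_BITS
PER = N_SETS // N_THREADS           # sets in each thread's local area

class LocalGlobalCache:
    def __init__(self):
        # One block address per set (direct-mapped); None means empty.
        self.sets = [None] * N_SETS

    def access(self, addr, tid):
        """Return True on a hit; on a miss, fill the local area."""
        blk = addr >> B_BITS
        local = tid * PER + blk % PER   # first access: local area
        if self.sets[local] == blk:
            return True
        glob = blk % N_SETS             # second access: whole cache
        if self.sets[glob] == blk:
            return True
        self.sets[local] = blk          # miss: place in the local area
        return False
```

The local probe keeps a thread's working set isolated, while the global probe lets a thread still find blocks that landed outside its slice.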

9
Join cache
  • Cache is divided into two parts
  • One of them (the join area) can be accessed by
    all threads
  • The other part is split among the running threads
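
A sketch of the two join-cache index regions; the half-and-half split and the other sizes are assumptions for illustration:

```python
# Illustrative parameters: 256 sets, half forming the shared join area,
# the other half split among 4 threads (all values are assumptions).
N_SETS, N_THREADS, B_BITS = 256, 4, 6
JOIN = N_SETS // 2                   # sets 0..JOIN-1: join area
PER = (N_SETS - JOIN) // N_THREADS   # private sets per thread

def join_index(addr):
    # Shared join area: reachable by every thread.
    return (addr >> B_BITS) % JOIN

def private_index(addr, tid):
    # Split area: each thread owns a disjoint slice above the join area.
    return JOIN + tid * PER + (addr >> B_BITS) % PER
```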

10
References
  • Montse Garcia, Jose Gonzalez and Antonio
    Gonzalez, "Data Caches for Multithreaded
    Processors," in Multi-Threaded Execution,
    Architecture and Compilation Workshop (MTEAC),
    held in conjunction with the 6th International
    Symposium on High Performance Computing, January
    2000
  • A. Agarwal and S.D. Pudar, "Column-Associative
    Caches: A Technique for Reducing the Miss Rate of
    Direct-Mapped Caches," in Proc. Int. Symp. on
    Computer Architecture, 1993
  • D.M. Tullsen, S.J. Eggers, and H.M. Levy,
    "Simultaneous Multithreading: Maximizing On-Chip
    Parallelism," in 22nd ISCA, June 1995