Prefetching Challenges in Distributed Memories for CMPs

Prefetching Challenges in Distributed Memories for CMPs. Martí Torrents, Raúl Martínez, and Carlos Molina. Computer Architecture Department, UPC BarcelonaTech.

1
Prefetching Challenges in Distributed Memories
for CMPs
  • Martí Torrents, Raúl Martínez, and Carlos Molina

Computer Architecture Department UPC
BarcelonaTech
2
Outline
  • Introduction
  • Naming the challenges
  • Challenge evaluation methodology
  • Experimental framework
  • Challenge Quantification
  • Facing the Challenges
  • Conclusions

3
Outline
  • Introduction
  • Naming the challenges
  • Challenge evaluation methodology
  • Experimental framework
  • Challenge Quantification
  • Facing the Challenges
  • Conclusions

4
Prefetching
  • Reduces memory latency
  • Brings the data the CPU will need next into a nearer cache
  • Increases the hit ratio
  • Implemented in most commercial processors
  • Erroneous prefetching may produce:
      • Cache pollution
      • Resource consumption (queues, bandwidth, etc.)
      • Power consumption

5
Motivation
  • The number of cores on a single chip grows every year
      • Intel Polaris: 80 cores
      • Nehalem: 4-6 cores
      • Tilera: 64-100 cores
      • Nvidia GeForce: up to 256 cores
6
Prefetch in CMPs
  • Useful prefetching implies more performance
      • Avoids network latency
      • Reduces memory access latency
  • Useless prefetching implies less performance
      • More power consumption
      • More NoC congestion
      • Interference with other cores' requests

7
Prefetch adverse behaviors
M. Torrents, R. Martínez, C. Molina. Network
Aware Performance Evaluation of Prefetching
Techniques in CMPs. Simulation Modeling Practice
and Theory (SIMPAT), 2014.
8
Distributed memories
  • Distribution of the memory access pattern

[Figure: memory accesses to addresses @, @+2, @+4, @+6, @+8, @+10 distributed across the memory]
9
Distributed memories
  • Distribution of the memory access pattern

[Figure: memory accesses to addresses @, @+2, @+4, @+6, @+8, @+10, @+12, @+14 distributed across the memory]
10
Outline
  • Introduction
  • Naming the challenges
  • Challenge evaluation methodology
  • Experimental framework
  • Challenge Quantification
  • Facing the Challenges
  • Conclusions

11
Prefetch Distributed Memory Systems
  • Analysis phase

[Figure: distributed L2 memory receiving an L1 miss for address @; label: distributed patterns]
12
Pattern Detection Challenge
  • The memory stream is distributed
  • Each prefetcher is aware of only a certain part of the stream
  • Harder to detect access patterns or correlation
  • Not all prefetchers are affected
      • Correlation prefetchers are affected (e.g., GHB)
      • One Block Lookahead prefetchers are not affected (e.g., Tagged)
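
The effect on pattern detection can be illustrated with a small sketch. It assumes a hypothetical block-interleaved mapping (bank = block address modulo the number of banks), which the slides do not specify, and a made-up miss stream with a repeating 1, 2 delta pattern.

```python
def split_by_bank(stream, num_banks):
    """Return the sub-stream of block addresses that each L2 bank observes."""
    views = {bank: [] for bank in range(num_banks)}
    for block in stream:
        # Assumed mapping: consecutive blocks go to consecutive banks.
        views[block % num_banks].append(block)
    return views


def deltas(seq):
    """Differences between consecutive addresses, as a correlation prefetcher would record."""
    return [b - a for a, b in zip(seq, seq[1:])]


# Hypothetical global miss stream with a repeating (+1, +2) delta pattern.
stream = [0, 1, 3, 4, 6, 7, 9, 10]

views = split_by_bank(stream, num_banks=4)
global_deltas = deltas(stream)
bank_deltas = {bank: deltas(seq) for bank, seq in views.items()}

print("global deltas:", global_deltas)
for bank, seq in views.items():
    print(f"bank {bank} sees {seq} with deltas {bank_deltas[bank]}")
```

In this toy stream no single bank observes the 1, 2 pattern that is visible globally. A next-block scheme only needs the current address, which is consistent with the slide's claim that One Block Lookahead prefetchers are not affected.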

13
Prefetch Distributed Memory Systems
  • Request generation phase

[Figure: distributed L2 memory with prefetch requests for @, @+2, @+4; label: queue filtering]
14
Prefetch Queue Filtering Challenge
  • Prefetch requests are queued in distributed queues
  • Independent engines generate the requests
  • Repeated requests can be queued
  • In a centralized queue, those would be merged
  • Adverse effects
      • Power consumption
      • Network contention
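
A minimal sketch of the filtering difference follows. The `PrefetchQueue` class, the tile numbers, and the block addresses are all hypothetical and chosen only for illustration; real prefetch queues are bounded hardware structures.

```python
class PrefetchQueue:
    """Toy prefetch queue that drops a request already pending in this queue."""

    def __init__(self):
        self.pending = []

    def enqueue(self, block):
        if block in self.pending:
            return False  # duplicate within this queue: merged
        self.pending.append(block)
        return True


# Hypothetical requests generated by two independent engines.
requests_by_tile = {0: [10, 12, 14], 1: [12, 14, 16]}

# Distributed: one queue per tile, so duplicates across tiles go unnoticed.
distributed = {tile: PrefetchQueue() for tile in requests_by_tile}
for tile, blocks in requests_by_tile.items():
    for block in blocks:
        distributed[tile].enqueue(block)
distributed_total = sum(len(q.pending) for q in distributed.values())

# Centralized: a single queue sees every request and merges repeats.
central = PrefetchQueue()
for blocks in requests_by_tile.values():
    for block in blocks:
        central.enqueue(block)
central_total = len(central.pending)

print("distributed queues issue", distributed_total, "requests")
print("centralized queue issues", central_total, "requests")
```

In this example the two extra requests issued by the distributed queues are the repeated ones for blocks 12 and 14.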

15
Prefetch Distributed Memory Systems
  • Evaluation phase

[Figure: distributed L2 memory with prefetches for @, @+2, @+4 and an L1 miss for @+2; label: dynamic profiling]
16
Dynamic Profiling Challenge
  • Prefetch requests are generated in one tile
  • Dynamic profiling information is collected in another tile
  • Profiling in the generating tile is therefore erroneous
  • Techniques using this information may work erroneously
      • Filtering
      • Throttling
      • Specific prefetching engines
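
The mismatch can be sketched with per-tile counters. This is an illustrative model only: the home-tile mapping (block address modulo the number of tiles), the counter names, and the addresses are assumptions, not details taken from the slides.

```python
NUM_TILES = 4


def home_tile(block):
    """Assumed mapping from a block address to the tile holding its L2 bank."""
    return block % NUM_TILES


issued = [0] * NUM_TILES   # prefetches generated, counted at the issuing tile
useful = [0] * NUM_TILES   # prefetched blocks later demanded, counted at the home tile

# The engine in tile 0 prefetches blocks whose home is tile 1.
prefetched = set()
for block in [1, 5, 9]:
    issued[0] += 1
    prefetched.add(block)

# Later demand accesses hit the prefetched blocks at their home tile.
for block in [1, 5, 9]:
    if block in prefetched:
        useful[home_tile(block)] += 1


def local_accuracy(tile):
    """Accuracy as seen by one tile using only its own counters."""
    return useful[tile] / issued[tile] if issued[tile] else None


global_accuracy = sum(useful) / sum(issued)

print("tile 0 local accuracy:", local_accuracy(0))
print("tile 1 local accuracy:", local_accuracy(1))
print("global accuracy:", global_accuracy)
```

Under these assumptions, a throttling mechanism that consulted only tile 0's counters would see every prefetch as useless, even though all of them were used.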

17
Outline
  • Introduction
  • Naming the challenges
  • Challenge evaluation methodology
  • Experimental framework
  • Challenge Quantification
  • Facing the Challenges
  • Conclusions

18
Challenge evaluation methodology
  • Three environments to test the challenges
  • Pattern Detection Challenge: ideal prefetcher
      • A prefetcher that is aware of the entire memory stream
      • No extra network contention added to the system
      • No extra power consumed
      • Requests are classified by core identifier
      • This preserves the original stream of each core
      • Prefetcher used for the test: Global History Buffer (GHB)
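
Classifying requests by core identifier can be sketched as a simple demultiplexing step. The `(core_id, block)` tuple format and the addresses are hypothetical; the slides do not describe how the ideal prefetcher stores the streams.

```python
from collections import defaultdict


def classify_by_core(requests):
    """Group an interleaved request stream into one ordered stream per core."""
    per_core = defaultdict(list)
    for core_id, block in requests:
        per_core[core_id].append(block)
    return dict(per_core)


# Hypothetical interleaved stream: core 0 strides by +2, core 1 by -3.
requests = [(0, 100), (1, 500), (0, 102), (1, 497), (0, 104), (1, 494)]

streams = classify_by_core(requests)
for core_id, blocks in streams.items():
    print(f"core {core_id}: {blocks}")
```

Each per-core stream keeps its original order, so the stride of each core is visible again even though the interleaved stream mixes both.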

19
Pattern Detection Challenge
20
Challenge evaluation methodology
  • Three environments to test the challenges
  • Prefetch Queue Filtering: centralized queue
      • All the requests are sent to a centralized queue
      • Repeated requests are merged
      • No extra network contention added to the system
      • No extra power consumed
      • Repeated requests are not issued
      • Prefetcher used for the test: Tagged prefetcher
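
For readers unfamiliar with the Tagged prefetcher, the following is a minimal sketch of tagged sequential prefetching as it is commonly described: the next block is prefetched on a demand miss and on the first reference to a prefetched block. The unbounded dictionary standing in for the cache and the class name are simplifying assumptions, not the configuration used by the authors.

```python
class TaggedPrefetcher:
    """Toy tagged sequential prefetcher over an unbounded cache model."""

    def __init__(self):
        # block -> True if the block was prefetched and not yet referenced
        self.cache = {}

    def access(self, block):
        """Process a demand access and return the list of blocks prefetched."""
        trigger = False
        if block not in self.cache:
            self.cache[block] = False   # demand miss: fetch the block
            trigger = True
        elif self.cache[block]:
            self.cache[block] = False   # first use of a prefetched block
            trigger = True

        issued = []
        if trigger and (block + 1) not in self.cache:
            self.cache[block + 1] = True  # bring in the next block, tagged
            issued.append(block + 1)
        return issued


prefetcher = TaggedPrefetcher()
trace = [prefetcher.access(block) for block in [0, 1, 2, 2]]
print(trace)
```

The second access to block 2 triggers nothing because its tag was cleared on the first reference.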

21
Prefetch Queue Filtering Challenge
22
Challenge evaluation methodology
  •  

23
Dynamic Profiling Challenge
24
Outline
  • Introduction
  • Naming the challenges
  • Challenge evaluation methodology
  • Experimental framework
  • Challenge Quantification
  • Facing the Challenges
  • Conclusions

25
Experimental framework
  • gem5 simulator
  • 64 x86 CPUs
  • Ruby memory system
  • L2 prefetchers
  • MOESI coherence protocol
  • Garnet network simulator
  • PARSEC 2.1 benchmarks

26
Simulation environment
27
Outline
  • Introduction
  • Naming the challenges
  • Challenge evaluation methodology
  • Experimental framework
  • Challenge Quantification
  • Facing the Challenges
  • Conclusions

28
Pattern Detection Challenge
29
Prefetch Queue Filtering Challenge
30
Dynamic Profiling Challenge
31
Outline
  • Introduction
  • Naming the challenges
  • Challenge evaluation methodology
  • Experimental framework
  • Challenge Quantification
  • Facing the Challenges
  • Conclusions

32
Facing the challenges
  • There are two main options
      • Redesign the entire prefetch philosophy
      • Adapt the current techniques to work with DSMs
  • Moreover, there are two main directions
      • Centralize the information
          • Drawback: increased communication
      • Distribute the prefetcher
          • Drawback: the prefetcher must be distributed smartly

33
Outline
  • Introduction
  • Naming the challenges
  • Challenge evaluation methodology
  • Experimental framework
  • Challenge Quantification
  • Facing the Challenges
  • Conclusions

34
Conclusions
  • Three challenges when prefetching in DSMs
      • Pattern Detection Challenge
      • Prefetch Queue Filtering Challenge
      • Dynamic Profiling Challenge
  • Challenge evaluation methodology
  • Directions for future investigators
  • There are no evident solutions for the challenges
  • Not solving them limits prefetch performance

35
Q & A
36
Prefetching Challenges in Distributed Memories
for CMPs
  • Martí Torrents, Raúl Martínez, and Carlos Molina

Computer Architecture Department UPC
BarcelonaTech