Lecture 5: Snooping Protocol Design Issues - PowerPoint PPT Presentation

University of Utah

Slides: 16

Provided by: RajeevB79

Learn more at: https://users.cs.utah.edu

Transcript and Presenter's Notes

1
Lecture 5: Snooping Protocol Design Issues

Topics: barriers, basic snooping protocol implementation, multi-level cache hierarchies

2
Barriers

Barriers require each process to execute a lock and unlock to increment the counter, and then to spin on a shared variable.

If multiple barriers use the same variable, deadlock can arise because some process may not have left the earlier barrier; sense-reversing barriers can solve this problem.

A tree can be employed to reduce contention for the lock and the shared variable.

When one process issues a read request, other processes can snoop and update their invalid entries (so one bus read can refresh the invalidated copies held by all spinning processes).

3
Barrier Implementation
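LOCK(bar.lock);
if (bar.counter == 0)
    bar.flag = 0;                 /* first arriver resets the release flag */
mycount = ++bar.counter;          /* mycount is private to each process */
UNLOCK(bar.lock);
if (mycount == p) {               /* last to arrive */
    bar.counter = 0;
    bar.flag = 1;                 /* release the waiters */
}
else
    while (bar.flag == 0) { };    /* busy-wait for release */
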
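To make the deadlock mentioned on slide 2 concrete, here is a minimal single-threaded C replay of the barrier above, assuming p = 2 and one particular schedule. The names `bar_t`, `arrive`, `may_leave` and `replay_reuse_deadlock` are hypothetical, and the LOCK/UNLOCK pair is collapsed because the replay executes one step at a time.

```c
#include <stdbool.h>

/* Hypothetical model of the slide's shared barrier state. */
typedef struct {
    int counter;  /* bar.counter */
    int flag;     /* bar.flag */
} bar_t;

/* Arrival at the slide's barrier (the LOCK..UNLOCK region, plus the release
   step if this process is the last to arrive). Returns true if the caller
   must now spin on the flag. */
static bool arrive(bar_t *bar, int p) {
    if (bar->counter == 0)
        bar->flag = 0;                   /* first arriver resets the flag */
    int mycount = ++bar->counter;
    if (mycount == p) {                  /* last arriver releases everyone */
        bar->counter = 0;
        bar->flag = 1;
        return false;
    }
    return true;
}

/* One iteration of the spin loop: may a waiting process leave? */
static bool may_leave(const bar_t *bar) {
    return bar->flag != 0;
}

/* Two processes use the same barrier twice. P1 is last into barrier 1, leaves,
   and enters barrier 2 before P0 has re-read the flag. Returns true if both
   end up spinning with nobody left to release them. */
bool replay_reuse_deadlock(void) {
    bar_t bar = { 0, 0 };
    const int p = 2;
    bool p0_spinning = arrive(&bar, p);  /* P0 waits in barrier 1 */
    (void)arrive(&bar, p);               /* P1 arrives last: flag = 1 */
    bool p1_spinning = arrive(&bar, p);  /* P1 enters barrier 2: counter was 0, so flag = 0 again */
    if (p0_spinning && may_leave(&bar))  /* P0 finally re-reads the flag... */
        p0_spinning = false;
    return p0_spinning && p1_spinning;   /* ...but sees 0: deadlock */
}
```

Under this schedule the replay returns true: P1 resets the flag to 0 on entering the second barrier before P0 has observed it as 1, so P0 keeps spinning in the first barrier while P1 waits in the second for a process that never arrives.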
4
Sense-Reversing Barrier Implementation
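local_sense = !(local_sense);     /* private to each process */
LOCK(bar.lock);
mycount = ++bar.counter;
UNLOCK(bar.lock);
if (mycount == p) {               /* last to arrive */
    bar.counter = 0;
    bar.flag = local_sense;       /* release the waiters */
}
else
    while (bar.flag != local_sense) { };
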
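A runnable sketch of the sense-reversing barrier above, assuming POSIX threads and C11 atomics: LOCK/UNLOCK become a `pthread_mutex_t`, and the flag is an `atomic_int` so the spin loop is well defined. The type and function names (`sr_barrier_t`, `sr_barrier_wait`, `run_demo`) are hypothetical. Unlike the slide, this version resets the counter while still holding the lock, and waiters call `sched_yield()` so the demo also behaves when there are more threads than cores.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical barrier type; fields mirror bar.lock, bar.counter, bar.flag. */
typedef struct {
    pthread_mutex_t lock;
    int counter;      /* arrivals so far; protected by lock */
    atomic_int flag;  /* waiters spin on this */
    int p;            /* number of participating threads */
} sr_barrier_t;

void sr_barrier_init(sr_barrier_t *bar, int p) {
    pthread_mutex_init(&bar->lock, NULL);
    bar->counter = 0;
    atomic_init(&bar->flag, 0);
    bar->p = p;
}

/* local_sense is private to each thread and must start at 0. */
void sr_barrier_wait(sr_barrier_t *bar, int *local_sense) {
    *local_sense = !*local_sense;              /* flip this thread's sense */
    pthread_mutex_lock(&bar->lock);
    int mycount = ++bar->counter;
    if (mycount == bar->p)
        bar->counter = 0;                      /* last arriver resets for reuse */
    pthread_mutex_unlock(&bar->lock);
    if (mycount == bar->p)
        atomic_store(&bar->flag, *local_sense); /* release the waiters */
    else
        while (atomic_load(&bar->flag) != *local_sense)
            sched_yield();                     /* spin politely */
}

/* Usage demo (hypothetical): threads reuse one barrier for many rounds. */
#define MAX_THREADS 16

typedef struct {
    sr_barrier_t *bar;
    atomic_int *arrivals;  /* total arrivals over all rounds */
    atomic_int *errors;    /* times a thread left a round too early */
    int nthreads, rounds;
} demo_arg_t;

static void *demo_worker(void *varg) {
    demo_arg_t *a = varg;
    int local_sense = 0;
    for (int r = 0; r < a->rounds; r++) {
        atomic_fetch_add(a->arrivals, 1);
        sr_barrier_wait(a->bar, &local_sense);
        /* Every thread must have arrived for round r before anyone leaves it. */
        if (atomic_load(a->arrivals) < (r + 1) * a->nthreads)
            atomic_fetch_add(a->errors, 1);
    }
    return NULL;
}

/* Returns the number of early departures observed (0 if the barrier works). */
int run_demo(int nthreads, int rounds) {
    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    sr_barrier_t bar;
    atomic_int arrivals, errors;
    atomic_init(&arrivals, 0);
    atomic_init(&errors, 0);
    sr_barrier_init(&bar, nthreads);
    demo_arg_t arg = { &bar, &arrivals, &errors, nthreads, rounds };
    pthread_t tids[MAX_THREADS];
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, demo_worker, &arg);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    pthread_mutex_destroy(&bar.lock);
    return atomic_load(&errors);
}
```

`run_demo(4, 200)` reuses the same barrier 200 times back to back, which is exactly the pattern that can deadlock the non-sense-reversing version, and should report zero early departures.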
5
Implementing Coherence Protocols

Correctness and performance are not the only metrics; the implementation must also avoid:

Deadlock: a cycle of resource dependencies, where each process holds shared resources in a non-preemptible fashion

Livelock: similar to deadlock, but transactions continue in the system without any process making forward progress

Starvation: an extreme case of unfairness

6
Basic Implementation

Assume a single level of cache and atomic bus transactions.

It is simpler to implement a processor-side cache controller that monitors requests from the processor and a bus-side cache controller that services the bus.

Both controllers are constantly trying to read the tags, so the tags can be duplicated (moderate area overhead): unlike data, tags are rarely updated, and a tag update stalls the other controller.

7
Reporting Snoop Results

Uniprocessor system: the initiator places the address on the bus, all devices monitor the address, one device acks by raising a wired-OR signal, and the data is transferred.

In a multiprocessor, memory has to wait for the snoop result before it chooses to respond; this needs 3 wired-OR signals: (i) indicates that a cache has a copy, (ii) indicates that a cache has a modified copy, (iii) indicates that the snoop has not completed.

Ensuring timely snoops: the time to respond could be fixed or variable (with the third wired-OR signal), or the memory could track whether a cache has the block in M state.
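One way to picture how the three wired-OR lines are combined and acted on is the small C model below. It is a sketch under assumptions not stated on the slide: the request is a read miss, the protocol is MESI-like (so a block with no other sharers can be loaded in E), and the names `snoop_resp_t`, `wired_or` and `memory_response` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-cache snoop response for one bus transaction. */
typedef struct {
    bool has_copy;       /* drives wired-OR line (i)   */
    bool has_modified;   /* drives wired-OR line (ii)  */
    bool snoop_pending;  /* drives wired-OR line (iii) */
} snoop_resp_t;

typedef struct { bool shared, dirty, inhibit; } bus_lines_t;

/* A wired-OR line reads as asserted if any device pulls it. */
bus_lines_t wired_or(const snoop_resp_t *resp, size_t ncaches) {
    bus_lines_t lines = { false, false, false };
    for (size_t i = 0; i < ncaches; i++) {
        lines.shared  = lines.shared  || resp[i].has_copy;
        lines.dirty   = lines.dirty   || resp[i].has_modified;
        lines.inhibit = lines.inhibit || resp[i].snoop_pending;
    }
    return lines;
}

typedef enum {
    MEM_WAIT,             /* some snoop is still in progress */
    CACHE_SUPPLIES,       /* a cache owns a modified copy; memory stays silent */
    MEM_SUPPLIES_SHARED,  /* requester loads the block in S */
    MEM_SUPPLIES_EXCL     /* no other copy: requester may load it in E */
} mem_action_t;

/* Memory's decision for a read miss, assuming a MESI-style protocol. */
mem_action_t memory_response(bus_lines_t lines) {
    if (lines.inhibit)
        return MEM_WAIT;
    if (lines.dirty)
        return CACHE_SUPPLIES;
    return lines.shared ? MEM_SUPPLIES_SHARED : MEM_SUPPLIES_EXCL;
}
```

The order of the checks matters: memory cannot act on the "shared" or "modified" lines until the "not completed" line has been released by every cache.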

8
Non-Atomic State Transitions

Note that a cache controller's actions are not all atomic: tag look-up, bus arbitration, bus transaction, data/tag update.

Consider this: block A is in shared state in P1 and P2, and both issue a write; the bus controllers are ready to issue an upgrade request and try to acquire the bus. Is there a problem?

The controller can keep track of additional intermediate states so it can react to bus traffic (e.g., S→M, I→M, I→S,E).

Alternatively, eliminate the upgrade request: use the shared wire to suppress memory's response to an exclusive-rd.
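The race on this slide can be sketched as a tiny state machine in C. The model below is hypothetical (the names `cpu_write`, `snoop` and `bus_granted` are illustrative, and only the write-related states are modeled; E and I→S,E are omitted). It shows why the pending-upgrade state matters: the cache that loses arbitration sees the winner's upgrade while in S→M, so its copy is now stale and it must issue a read-exclusive instead of an upgrade.

```c
/* Hypothetical per-cache state for block A; E and I->S,E are omitted. */
typedef enum { ST_I, ST_S, ST_M, ST_S_TO_M, ST_I_TO_M } state_t;
typedef enum { BUS_NONE, BUS_UPGR, BUS_RDX } bus_op_t;

/* Processor issues a write: remember that a bus request is pending. */
void cpu_write(state_t *st) {
    if (*st == ST_S)
        *st = ST_S_TO_M;      /* have the data, need exclusive permission */
    else if (*st == ST_I)
        *st = ST_I_TO_M;      /* need both data and permission */
}

/* Bus-side controller observes another cache's exclusive request for block A. */
void snoop(state_t *st, bus_op_t op) {
    if (op != BUS_UPGR && op != BUS_RDX)
        return;
    switch (*st) {
    case ST_S:
    case ST_M:                /* an M copy would also be flushed here */
        *st = ST_I;
        break;
    case ST_S_TO_M:           /* our copy is stale: need data, not just an upgrade */
        *st = ST_I_TO_M;
        break;
    default:
        break;
    }
}

/* Bus granted: issue the request that the current intermediate state calls for. */
bus_op_t bus_granted(state_t *st) {
    bus_op_t op = BUS_NONE;
    if (*st == ST_S_TO_M)
        op = BUS_UPGR;
    else if (*st == ST_I_TO_M)
        op = BUS_RDX;
    if (op != BUS_NONE)
        *st = ST_M;
    return op;
}
```

In the slide's scenario both caches start in S and write; P1 wins the bus and upgrades, P2 snoops that upgrade while in S→M and falls back to I→M, so when P2 later wins the bus it issues a read-exclusive and P1 ends up in I.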

9
Serialization

Write serialization is an important requirement for coherence and sequential consistency: writes must be seen by all processors in the same order.

On a write, the processor hands the request to the cache controller, and some time elapses before the bus transaction happens (i.e., before the external world sees the write).

If the writing processor continues its execution after handing the write to the controller, the same write order may not be seen by all processors; hence, the processor is not allowed to continue until the write has completed.

10
Livelock

Livelock can happen if the processor-cache handshake is not designed correctly.

Before the processor can attempt the write, it must acquire the block in exclusive state.

If all processors are writing to the same block, one of them acquires the block first; if another exclusive request is then seen on the bus, the cache controller must wait for the processor to complete its write before releasing the block. Otherwise, the processor's write will fail again because the block would be in invalid state.
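The handshake rule can be illustrated with a deliberately adversarial two-writer schedule, sketched in C below. Assumptions: bus grants alternate between the two caches, and the other cache's exclusive request is always snooped before the processor gets to perform its store; `simulate_writers` is a hypothetical name.

```c
#include <stdbool.h>

/* Hypothetical two-writer model. Returns how many of the two stores are
   performed within `bus_grants` bus grants. */
int simulate_writers(bool finish_write_before_release, int bus_grants) {
    bool pending[2] = { true, true };   /* both processors want to write the block */
    int performed = 0;
    for (int g = 0; g < bus_grants; g++) {
        int me = g % 2, other = 1 - me; /* bus grants alternate */
        if (!pending[me])
            continue;                   /* nothing left to do for this cache */
        /* The block has just arrived at `me` in exclusive state. */
        bool must_release_now = pending[other];  /* other's exclusive request is on the bus */
        if (!must_release_now || finish_write_before_release) {
            pending[me] = false;        /* the store is performed before releasing */
            performed++;
        }
        /* Otherwise the block is invalidated before the store is performed,
           and `me` has to request it again on a later grant. */
    }
    return performed;
}
```

Under this schedule, `simulate_writers(false, 10)` performs no stores at all (the block bounces between the caches forever, which is livelock), while `simulate_writers(true, 10)` completes both.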

11
Atomic Instructions

A test&set instruction acquires the block in exclusive state and does not release the block until the read and the write have completed.

Should an LL bring the block in exclusive state to avoid bus traffic during the SC?

Note that for the SC to succeed, a bit associated with the cache block must be set (the bit is reset when a write to that block is observed or when the block is evicted).

What happens if an instruction between the LL and the SC causes the LL-SC block to always be replaced?
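A minimal C model of the reservation bit described above; it is a sketch, not any real ISA's semantics. The names (`ll`, `sc`, `plain_store`, `evict`, `fetch_inc`), the 8-word block size and the two-core setup are assumptions. `fetch_inc` is the usual LL/SC retry loop, and its `evict_between` flag models the last question on the slide: if something between the LL and the SC always evicts the block, the SC never succeeds and the loop makes no progress.

```c
#include <stdbool.h>
#include <stddef.h>

#define NCORES 2
#define NWORDS 64
#define WORDS_PER_BLOCK 8              /* hypothetical block size */

static int memory[NWORDS];

/* Per-core link state: the block named by the last LL and the "SC may succeed" bit. */
typedef struct { size_t block; bool valid; } llsc_link_t;
static llsc_link_t links[NCORES];

static size_t block_of(size_t addr) { return addr / WORDS_PER_BLOCK; }

/* Every other core snoops the write and clears its bit if the block matches. */
static void snoop_write(int writer, size_t addr) {
    for (int c = 0; c < NCORES; c++)
        if (c != writer && links[c].valid && links[c].block == block_of(addr))
            links[c].valid = false;
}

int ll(int core, size_t addr) {
    links[core].block = block_of(addr);
    links[core].valid = true;
    return memory[addr];
}

bool sc(int core, size_t addr, int value) {
    bool ok = links[core].valid && links[core].block == block_of(addr);
    links[core].valid = false;         /* an SC consumes the link either way */
    if (ok) {
        memory[addr] = value;
        snoop_write(core, addr);
    }
    return ok;
}

void plain_store(int core, size_t addr, int value) {
    memory[addr] = value;
    snoop_write(core, addr);
}

/* Replacement of the block (e.g. by a conflicting access) clears the bit. */
void evict(int core, size_t addr) {
    if (links[core].block == block_of(addr))
        links[core].valid = false;
}

/* LL/SC retry loop for an atomic increment, bounded so the failure case terminates. */
bool fetch_inc(int core, size_t addr, bool evict_between, int max_tries) {
    for (int t = 0; t < max_tries; t++) {
        int v = ll(core, addr);
        if (evict_between)
            evict(core, addr);         /* an instruction between LL and SC replaces the block */
        if (sc(core, addr, v + 1))
            return true;
    }
    return false;
}
```

The model also shows the interference case: if core 1 writes any word of the block between core 0's LL and SC, core 0's bit is cleared and its SC fails.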

12
Multilevel Cache Hierarchies

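In principle, the snooping protocol employed for L2 would have to be duplicated for L1; this is redundant work because of the blocks common to L1 and L2.

Inclusion greatly simplifies the implementation: if every L1 block is also in L2, the L2 alone can snoop the bus and forward only the relevant requests to L1.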

13
Maintaining Inclusion

Assuming equal block size, if L1 is 8KB 2-way and L2 is 256KB 8-way, is the hierarchy inclusive? (Assume that an L1 miss brings a block into both L1 and L2.)

Assuming equal block size, if L1 is 8KB direct-mapped and L2 is 256KB 8-way, is the hierarchy inclusive?

To maintain inclusion, L2 replacements must also evict the relevant blocks in L1.
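The first question can be explored with a small LRU simulation, sketched in C below under these assumptions: 64-byte blocks in both levels, true LRU in each level, and L1 hits that are not reported to L2 (so L2's LRU order never sees them). All names are hypothetical. The trace A, X1, A, X2 and so on up to A, X8 uses blocks that all map to the same L2 set, and therefore to the same L1 set; A stays hot in L1 while becoming least recently used in L2.

```c
#include <stdbool.h>
#include <stdint.h>

#define BLOCK_BYTES 64       /* assumed equal block size for L1 and L2 */
#define MAX_SETS 512
#define MAX_WAYS 8

typedef struct {
    int sets, ways;
    bool valid[MAX_SETS][MAX_WAYS];
    uint64_t blk[MAX_SETS][MAX_WAYS];       /* block number held in each way */
    uint64_t last_use[MAX_SETS][MAX_WAYS];  /* LRU timestamp */
} cache_t;

static cache_t l1, l2;
static uint64_t now;                        /* global access clock */

static void cache_init(cache_t *c, int size_bytes, int ways) {
    c->ways = ways;
    c->sets = size_bytes / BLOCK_BYTES / ways;
    for (int s = 0; s < MAX_SETS; s++)
        for (int w = 0; w < MAX_WAYS; w++)
            c->valid[s][w] = false;
}

static int set_of(const cache_t *c, uint64_t blk) { return (int)(blk % (uint64_t)c->sets); }

static bool cache_contains(const cache_t *c, uint64_t blk) {
    int s = set_of(c, blk);
    for (int w = 0; w < c->ways; w++)
        if (c->valid[s][w] && c->blk[s][w] == blk)
            return true;
    return false;
}

/* On a hit, refresh the LRU timestamp and return true. */
static bool cache_touch(cache_t *c, uint64_t blk) {
    int s = set_of(c, blk);
    for (int w = 0; w < c->ways; w++)
        if (c->valid[s][w] && c->blk[s][w] == blk) {
            c->last_use[s][w] = ++now;
            return true;
        }
    return false;
}

/* Fill blk into a free way or the LRU way; report any valid victim. */
static bool cache_fill(cache_t *c, uint64_t blk, uint64_t *victim) {
    int s = set_of(c, blk), way = 0;
    for (int w = 0; w < c->ways; w++) {
        if (!c->valid[s][w]) { way = w; break; }
        if (c->last_use[s][w] < c->last_use[s][way]) way = w;
    }
    bool evicted = c->valid[s][way];
    if (evicted) *victim = c->blk[s][way];
    c->valid[s][way] = true;
    c->blk[s][way] = blk;
    c->last_use[s][way] = ++now;
    return evicted;
}

static void cache_invalidate(cache_t *c, uint64_t blk) {
    int s = set_of(c, blk);
    for (int w = 0; w < c->ways; w++)
        if (c->valid[s][w] && c->blk[s][w] == blk)
            c->valid[s][w] = false;
}

/* A processor access: L1 hits never reach L2, so L2's LRU order ignores them. */
static void access_block(uint64_t blk, bool back_invalidate) {
    uint64_t victim;
    if (cache_touch(&l1, blk))
        return;
    if (!cache_touch(&l2, blk) && cache_fill(&l2, blk, &victim) && back_invalidate)
        cache_invalidate(&l1, victim);      /* keep L1 a subset of L2 */
    cache_fill(&l1, blk, &victim);          /* L1 victims assumed clean: just dropped */
}

/* Replays A, X1, A, X2, ..., A, X8 with all blocks in one L2 set. */
bool inclusion_violated(bool back_invalidate) {
    cache_init(&l1, 8 * 1024, 2);
    cache_init(&l2, 256 * 1024, 8);
    now = 0;
    const uint64_t a = 0, stride = (uint64_t)l2.sets;  /* blocks 0, 512, 1024, ... share a set */
    for (uint64_t i = 1; i <= 8; i++) {
        access_block(a, back_invalidate);
        access_block(i * stride, back_invalidate);
    }
    return cache_contains(&l1, a) && !cache_contains(&l2, a);
}
```

With these assumptions, `inclusion_violated(false)` returns true: L2 evicts A while L1 still holds it. Passing `true` applies the rule on the slide (an L2 replacement also evicts the block from L1), and inclusion holds.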

14
Intra-Hierarchy Protocol

Some coherence traffic needs to be propagated to L1; likewise, L1 write traffic needs to be propagated to L2.

What is the best way to implement the above? More traffic? More state?

In general, external requests propagate upward from L3 to L1, and processor requests percolate down from L1 to L3.

Dual tags are less important here, because the L2 can filter out bus transactions and the L1 can filter out processor requests.
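One possible answer to the "more state?" question is to keep, per L2 line, a bit recording whether L1 also holds the block; with inclusion, the L2 can then absorb most bus snoops without disturbing L1. The C sketch below is hypothetical (the names `l2_line_t` and `l2_snoop_invalidate` are illustrative, and a linear search stands in for a real set-indexed lookup).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical L2 line with one extra bit of state: does L1 also hold it? */
typedef struct {
    unsigned long blk;
    bool valid;
    bool in_l1;
} l2_line_t;

/* Handles a snooped invalidation for `blk` against an inclusive L2.
   Returns true only if the invalidation must also be sent up to L1. */
bool l2_snoop_invalidate(l2_line_t *lines, size_t n, unsigned long blk) {
    for (size_t i = 0; i < n; i++) {
        if (lines[i].valid && lines[i].blk == blk) {
            bool forward = lines[i].in_l1;
            lines[i].valid = false;
            lines[i].in_l1 = false;
            return forward;
        }
    }
    return false;   /* not in L2, so (by inclusion) not in L1 either */
}
```

Only snoops that hit a line with the bit set reach L1; everything else is filtered at L2.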
