Computer Architecture - PowerPoint PPT Presentation

About This Presentation

Title:

Computer Architecture

Slides: 25

Provided by: jb20

Transcript and Presenter's Notes

1
Computer Architecture

Chapter 8
Multiprocessors
Shared Memory Architectures
Prof. Jerry Breecher
CSCI 240
Fall 2003

2
Chapter Overview

We're going to do only one section from this
chapter: the part related to how caches from
multiple processors interact with each other.
8.1 Introduction: the big picture
8.3 Centralized Shared Memory Architectures

3
Introduction
The Big Picture: Where Are We Now?

8.1 Introduction
8.3 Centralized Shared Memory Architectures

The major issue is this: we've taken copies of
the contents of main memory and put them in
caches closer to the processors. But what
happens to those copies if someone else wants to
use the main memory data? How do we keep all
copies of the data in sync with each other?
4
The Multiprocessor Picture
Example: Pentium System Organization
[Figure labels: Processor/Memory Bus, PCI Bus, I/O Busses]
5
Shared Memory Multiprocessor
Memory: centralized with Uniform Memory Access
time (UMA) and bus interconnect, I/O
Examples: Sun Enterprise 6000, SGI Challenge,
Intel SystemPro

[Figure labels: Chipset, Memory, Disk and other I/O]
6
Shared Memory Multiprocessor

Several processors share one address space
  conceptually a shared memory
  often implemented just like a multicomputer
    address space distributed over private memories
Communication is implicit
  read and write accesses to shared memory locations
Synchronization
  via shared memory locations
    spin waiting for non-zero
  barriers

[Figure: Conceptual Model. Processors (P) connected through a Network/Bus to memory (M).]
7
Message Passing Multicomputers

Computers (nodes) connected by a network
  Fast network interface
  Send, receive, barrier
Nodes are no different from a regular PC or workstation
Cluster: conventional workstations or PCs with a fast network
  cluster computing
  Berkeley NOW
  IBM SP2

8
Large-Scale MP Designs

Memory: distributed with Non-Uniform Memory Access
time (NUMA) and scalable interconnect
(distributed memory)

[Figure labels: 1 cycle, 40 cycles, 100 cycles; Low Latency, High Reliability]
9
Shared Memory Architectures
8.1 Introduction
8.3 Centralized Shared Memory Architectures

In this section we will look at the issues around:
  Sharing one memory space among several processors.
  Maintaining coherence among several copies of a data item.

10
The Problem of Cache Coherency
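[Figure: a CPU with a cache holding copies A' and B' of memory locations A and B, plus an I/O device that accesses memory directly. Three cases are shown.]

Case   Cache (A', B')   Memory (A, B)   I/O activity
a)     100, 200         100, 200        none
b)     550, 200         100, 200        Output of A gives 100
c)     100, 200         100, 440        Input 440 to B

a) Cache and memory coherent: A' = A, B' = B.
b) Cache and memory incoherent: A' ≠ A.
c) Cache and memory incoherent: B' ≠ B.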
11
Some Simple Definitions
Mechanism: Write Back
  How it works: Write modified data from cache to memory only when necessary.
  Performance: Good, because it doesn't tie up memory bandwidth.
  Coherency issues: Can have problems with various copies containing different values.

Mechanism: Write Through
  How it works: Write modified data from cache to memory immediately.
  Performance: Not so good; uses a lot of memory bandwidth.
  Coherency issues: Modified values are always written to memory, so the data always matches.
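
The difference between the two policies can be sketched with a toy cache. This is only an illustration under simplifying assumptions: the `Cache` class, its `flush` method, and the `demo` helper are hypothetical names, there is one word per block, and the "I/O device" is modelled as a direct read of the memory dictionary. The values 100 and 550 are borrowed from the coherency figure on the previous slide.

```python
class Cache:
    """Toy cache in front of a shared `memory` dict (hypothetical model)."""

    def __init__(self, memory, write_through):
        self.memory = memory
        self.write_through = write_through
        self.lines = {}      # address -> cached value
        self.dirty = set()   # addresses modified but not yet written back

    def read(self, addr):
        if addr not in self.lines:           # miss: fetch from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        if self.write_through:
            self.memory[addr] = value        # memory updated immediately
        else:
            self.dirty.add(addr)             # memory updated only on flush

    def flush(self):
        """Write every dirty line back to memory."""
        for addr in self.dirty:
            self.memory[addr] = self.lines[addr]
        self.dirty.clear()


def demo(write_through):
    memory = {"A": 100, "B": 200}
    cache = Cache(memory, write_through)
    cache.write("A", 550)
    seen_by_io = memory["A"]   # an I/O device reads memory directly
    cache.flush()
    return seen_by_io, memory["A"]
```

With write through the device sees the new value straight away; with write back it sees the stale 100 until the block is flushed, which is case b) of the coherency figure.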
12
What Does Coherency Mean?

Informally:
  Any read must return the most recent write
  Too strict and too difficult to implement
Better:
  Any write must eventually be seen by a read
  All writes are seen in proper order (serialization)
Two rules to ensure this:
  1. If P writes x and P1 reads it, P's write will be seen by P1 if the read and write are sufficiently far apart
  2. Writes to a single location are serialized: seen in one order
     Latest write will be seen
     Otherwise could see writes in illogical order (could see older value after a newer value)
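
The serialization rule can be expressed as a small check over observed reads. The sketch below is a hypothetical helper, not part of any protocol: it assumes all writes to the location carry distinct values and that the serialized write order is known, and it reports whether any processor saw an older value after a newer one.

```python
def reads_respect_write_order(write_order, reads):
    """Return True if no processor sees an older value after a newer one.

    write_order: values written to one location, in serialized order
                 (assumed distinct, for simplicity).
    reads: dict mapping a processor name to the values it read,
           in program order.
    """
    position = {value: i for i, value in enumerate(write_order)}
    for observed in reads.values():
        indices = [position[v] for v in observed]
        # Each read must be at or after the previous read in write order.
        if any(later < earlier for earlier, later in zip(indices, indices[1:])):
            return False
    return True
```

For example, with writes serialized as 1, 2, 3, a processor that reads 2 and then 1 has seen an illogical order, while one that reads 1, 1, 3 has not.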

13
There are Different Types of Memory In The Cache
Test_and_set(lock)
shared_data = xyz
Clear(lock)

What kinds of memory are there in the cache?

Type             Shared?     Writable   How Kept Coherent
Code             Shared      No         No need
Private Data     Exclusive   Yes        Write Back
Shared Data      Shared      Yes        Write Back
Interlock Data   Shared      Yes        Write Through
Notes:
  Shared data: write back gives good performance; if you
  use write through here, there will be performance degradation.
  Interlock data: write through here means the lock state is
  seen immediately. You want write through here so the lock
  value is flushed out of the cache.
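
The Test_and_set / Clear pattern on this slide can be sketched as a spin lock. This is an illustration only: real test-and-set is a single atomic hardware instruction, which the hypothetical `TestAndSetLock` class imitates with a `threading.Lock`; the `run` helper and its parameters are also assumptions made for the example.

```python
import threading
import time


class TestAndSetLock:
    """Software stand-in for an atomic test-and-set on one lock word."""

    def __init__(self):
        self._flag = 0
        self._atomic = threading.Lock()   # imitates hardware atomicity

    def test_and_set(self):
        """Atomically set the flag to 1 and return its previous value."""
        with self._atomic:
            old = self._flag
            self._flag = 1
            return old

    def clear(self):
        with self._atomic:
            self._flag = 0


def run(num_threads=4, increments=1000):
    lock = TestAndSetLock()
    shared = {"data": 0}

    def worker():
        for _ in range(increments):
            while lock.test_and_set() == 1:
                time.sleep(0)             # spin (yielding) until the lock was free
            shared["data"] += 1           # critical section on shared data
            lock.clear()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["data"]
```

Because every update happens between a successful test-and-set and the matching clear, `run(2, 200)` should return 400. On real hardware the lock word is the interlock data that every spinning processor needs to see promptly.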
14
Potential HW Coherency Solutions

Snooping Solution (Snoopy Bus)
  Send all requests for data to all processors
  Processors snoop to see if they have a copy and respond accordingly
  Requires broadcast, since caching information is at processors
  Works well with bus (natural broadcast medium)
  Dominates for small scale machines (most of the market)
Directory-Based Schemes
  Keep track of what is being shared in one centralized place
  Distributed memory => distributed directory for scalability (avoids bottlenecks)
  Send point-to-point requests to processors via network
  Scales better than Snooping
  Actually existed BEFORE Snooping-based schemes
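
The point-to-point idea behind directory schemes can be sketched as follows. This is a deliberately minimal model, not a real directory protocol: the `Directory` class and its message log are hypothetical, and it tracks only which processors hold a copy of each block.

```python
class Directory:
    """Toy directory: for each block, which processors hold a copy."""

    def __init__(self):
        self.sharers = {}    # block -> set of processor ids
        self.messages = []   # log of (destination, message, block)

    def read(self, proc, block):
        # Record that `proc` now holds a copy of `block`.
        self.sharers.setdefault(block, set()).add(proc)

    def write(self, proc, block):
        # Invalidations go only to the processors listed as sharers,
        # rather than being broadcast to everyone as in snooping.
        for other in sorted(self.sharers.get(block, set()) - {proc}):
            self.messages.append((other, "invalidate", block))
        self.sharers[block] = {proc}
```

If processors 0 and 2 have read block A and processor 1 has read block B, a write to A by processor 0 generates a single invalidate message, addressed to processor 2.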

15
An Example Snoopy Protocol Maintained by Hardware

Invalidation protocol, write-back cache
Each block of memory is in one state:
  Clean in all caches and up-to-date in memory (Shared)
  OR Dirty in exactly one cache (Exclusive)
  OR Not in any caches
Each cache block is in one state (track these):
  Shared: block can be read
  OR Exclusive: cache has only copy, it's writeable, and dirty
  OR Invalid: block contains no data
Read misses cause all caches to snoop bus
Writes to clean line are treated as misses
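
A minimal simulation of this invalidation protocol might look like the sketch below. The `Bus` and `SnoopyCache` classes are hypothetical, and the model is simplified on purpose: one word per block, no cache conflicts or evictions, and a write to an Invalid block does not fetch the old contents first.

```python
INVALID, SHARED, EXCLUSIVE = "Invalid", "Shared", "Exclusive"


class Bus:
    """Broadcast medium: every other cache snoops each request."""

    def __init__(self, memory):
        self.memory = memory
        self.caches = []

    def broadcast(self, kind, addr, source):
        for cache in self.caches:
            if cache is not source:
                cache.snoop(kind, addr)


class SnoopyCache:
    def __init__(self, bus):
        self.bus = bus
        self.state = {}   # addr -> state (a missing entry means Invalid)
        self.data = {}
        bus.caches.append(self)

    def read(self, addr):
        if self.state.get(addr, INVALID) == INVALID:
            # Read miss: a dirty owner writes back before we read memory.
            self.bus.broadcast("read_miss", addr, self)
            self.data[addr] = self.bus.memory[addr]
            self.state[addr] = SHARED
        return self.data[addr]

    def write(self, addr, value):
        if self.state.get(addr, INVALID) != EXCLUSIVE:
            # A write to a clean or absent line is treated as a miss.
            self.bus.broadcast("write_miss", addr, self)
            self.state[addr] = EXCLUSIVE
        self.data[addr] = value   # memory stays stale until write-back

    def snoop(self, kind, addr):
        state = self.state.get(addr, INVALID)
        if state == EXCLUSIVE:
            self.bus.memory[addr] = self.data[addr]   # write back dirty block
            self.state[addr] = SHARED if kind == "read_miss" else INVALID
        elif state == SHARED and kind == "write_miss":
            self.state[addr] = INVALID
```

With two caches on one bus, a write by the first followed by a read by the second forces a write-back and leaves both copies Shared; a later write by the second invalidates the first cache's copy.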

16
Snoopy-Cache State Machine-I
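State machine for CPU requests, for each cache block
Applies to Write Back Data

Cache Block State        CPU Request              Action                                               Next State
Invalid                  CPU read                 Place read miss on bus                               Shared
Invalid                  CPU write                Place write miss on bus                              Exclusive
Shared (read/only)       CPU read hit             (none)                                               Shared
Shared (read/only)       CPU read miss            Place read miss on bus                               Shared
Shared (read/only)       CPU write                Place write miss on bus                              Exclusive
Exclusive (read/write)   CPU read hit, write hit  (none)                                               Exclusive
Exclusive (read/write)   CPU read miss            Write back block; place read miss on bus             Shared
Exclusive (read/write)   CPU write miss           Write back cache block; place write miss on bus      Exclusive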
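
The CPU-request state machine on this slide can be written down as a lookup table. This is a sketch: the names `CPU_TRANSITIONS`, `cpu_step`, and `run` are hypothetical, and the action for a read miss in the Exclusive state (write back, then place the read miss on the bus) is my reading of the flattened diagram.

```python
# (current state, CPU request) -> (bus actions, next state)
CPU_TRANSITIONS = {
    ("Invalid", "read"): (["place read miss on bus"], "Shared"),
    ("Invalid", "write"): (["place write miss on bus"], "Exclusive"),
    ("Shared", "read hit"): ([], "Shared"),
    ("Shared", "read miss"): (["place read miss on bus"], "Shared"),
    ("Shared", "write"): (["place write miss on bus"], "Exclusive"),
    ("Exclusive", "read hit"): ([], "Exclusive"),
    ("Exclusive", "write hit"): ([], "Exclusive"),
    ("Exclusive", "read miss"): (
        ["write back block", "place read miss on bus"], "Shared"),
    ("Exclusive", "write miss"): (
        ["write back cache block", "place write miss on bus"], "Exclusive"),
}


def cpu_step(state, request):
    """Return (bus actions, next state) for one CPU request."""
    return CPU_TRANSITIONS[(state, request)]


def run(requests, state="Invalid"):
    """Apply a sequence of CPU requests to one cache block."""
    bus_actions = []
    for request in requests:
        actions, state = cpu_step(state, request)
        bus_actions.extend(actions)
    return state, bus_actions
```

Starting from Invalid, the sequence read, read hit, write, write hit ends in Exclusive and puts exactly one read miss and one write miss on the bus.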
17
Snoopy-Cache State Machine-II

State machine for bus requests, for each cache block
Appendix E gives details of bus requests

Cache Block State        Bus Request                 Action                                    Next State
Shared (read/only)       Write miss for this block   (none)                                    Invalid
Exclusive (read/write)   Write miss for this block   Write back block (abort memory access)    Invalid
Exclusive (read/write)   Read miss for this block    Write back block (abort memory access)    Shared
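
The bus-side state machine is smaller and can be sketched the same way. The names `BUS_TRANSITIONS` and `bus_step` are hypothetical, and treating any unlisted combination as "no change" is an assumption of this sketch rather than something the slide states.

```python
# (current state, snooped bus request) -> (actions, next state)
BUS_TRANSITIONS = {
    ("Shared", "write miss"): ([], "Invalid"),
    ("Exclusive", "write miss"): (
        ["write back block (abort memory access)"], "Invalid"),
    ("Exclusive", "read miss"): (
        ["write back block (abort memory access)"], "Shared"),
}


def bus_step(state, request):
    """Return (actions, next state) when a cache snoops a bus request.

    Combinations not listed in the table leave the block unchanged.
    """
    return BUS_TRANSITIONS.get((state, request), ([], state))
```

For instance, a cache holding a block Exclusive that snoops a read miss writes the block back and drops to Shared, while a Shared copy that snoops a read miss stays where it is.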
18
Example
[Table columns: Processor 1, Processor 2, Bus, Memory]
Assumes initial cache state is invalid and A1
and A2 map to the same cache block, but A1 ≠ A2.
This is the cache for P1.
19
Example Step 1
20
Example Step 2
21
Example Step 3
22
Example Step 4
23
Example Step 5
24
Summary

8.1 Introduction: the big picture
8.3 Centralized Shared Memory Architectures
We've looked at what happens to caches when we
have multiple processors or devices looking at
memory.
