Title: Computer Architecture
1Computer Architecture
- Chapter 8
- Multiprocessors
- Shared Memory Architectures
- Prof. Jerry Breecher
- CSCI 240
- Fall 2003
2Chapter Overview
- We're going to do only one section from this
chapter, that part related to how caches from
multiple processors interact with each other.
- 8.1 Introduction: the big picture
- 8.3 Centralized Shared Memory Architectures
3Introduction
The Big Picture: Where Are We Now?
- 8.1 Introduction
- 8.3 Centralized Shared Memory Architectures
The major issue is this: we've taken copies of
the contents of main memory and put them in
caches closer to the processors. But what
happens to those copies if someone else wants to
use the main memory data? How do we keep all
copies of the data in synch with each other?
4The Multiprocessor Picture
[Figure: Example Pentium system organization, showing the processor/memory bus, the PCI bus, and the I/O buses.]
5Shared Memory Multiprocessor
- Memory centralized with Uniform Memory Access
time (UMA) and bus interconnect, I/O
- Examples: Sun Enterprise 6000, SGI Challenge,
Intel SystemPro
[Figure labels: Chipset, Memory, Disk and other I/O]
6Shared Memory Multiprocessor
- Several processors share one address space
- conceptually a shared memory
- often implemented just like a multicomputer
- address space distributed over private memories
- Communication is implicit
- read and write accesses to shared memory
locations
- Synchronization
- via shared memory locations
- spin waiting for non-zero
- barriers
[Figure: Conceptual model. Three processors (P) connected through a network/bus to a memory (M).]
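The synchronization bullets above (spin waiting for non-zero, barriers) can be sketched with two threads sharing a flag. This is only an illustration: Python threads stand in for processors, and the names `flag`, `shared_data`, `producer`, and `consumer` are assumptions made for the sketch.

```python
import threading
import time

flag = [0]             # shared memory location; 0 means "not ready yet"
shared_data = [None]   # shared memory location carrying the payload
results = []
barrier = threading.Barrier(2)   # both threads must arrive before either continues


def producer():
    shared_data[0] = 42   # write the data first
    flag[0] = 1           # then set the flag to a non-zero value
    barrier.wait()        # wait until the consumer has also finished


def consumer():
    while flag[0] == 0:   # spin waiting for non-zero
        time.sleep(0)     # yield so the producer thread can run
    results.append(shared_data[0])
    barrier.wait()


threads = [threading.Thread(target=consumer), threading.Thread(target=producer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)   # [42]
```

Communication here is implicit: the consumer never receives a message, it simply reads the locations the producer wrote.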
7Message Passing Multicomputers
- Computers (nodes) connected by a network
- Fast network interface
- Send, receive, barrier
- Nodes no different from a regular PC or
workstation
- Cluster: conventional workstations or PCs with
fast network
- cluster computing
- Berkeley NOW
- IBM SP2
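For contrast with shared memory, a minimal sketch of the send / receive / barrier style follows. It assumes one inbox queue per node as a stand-in for the network interface; threads are used only so the example runs on one machine, and the helper names `send`, `receive`, `node0`, and `node1` are hypothetical.

```python
import queue
import threading

# One inbox per node stands in for the network (an assumption of this sketch).
inboxes = {0: queue.Queue(), 1: queue.Queue()}
barrier = threading.Barrier(2)
log = []


def send(dest, message):
    inboxes[dest].put(message)


def receive(node):
    return inboxes[node].get()   # blocks until a message arrives


def node0():
    send(1, {"x": 7})            # communication is explicit: data travels in a message
    reply = receive(0)
    log.append(("node0 got", reply))
    barrier.wait()


def node1():
    message = receive(1)
    send(0, message["x"] * 2)
    barrier.wait()


threads = [threading.Thread(target=node0), threading.Thread(target=node1)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(log)   # [('node0 got', 14)]
```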
8Large-Scale MP Designs
- Memory distributed with nonuniform memory access
time (NUMA) and scalable interconnect
(distributed memory)
[Figure labels: 1 cycle, 40 cycles, 100 cycles; Low Latency, High Reliability]
9Shared Memory Architectures
- 8.1 Introduction
- 8.3 Centralized Shared Memory Architectures
In this section we will understand the issues
around:
- Sharing one memory space among several
processors.
- Maintaining coherence among several copies of a
data item.
10The Problem of Cache Coherency
Shared Memory Architectures
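[Figure: three panels, each showing a CPU with a cache holding copies A' and B', a memory holding A and B, and an I/O device.]
a) Cache and memory coherent: A' = A, B' = B.
- Cache: A' = 100, B' = 200
- Memory: A = 100, B = 200
b) Cache and memory incoherent: A' ≠ A.
- Cache: A' = 550, B' = 200
- Memory: A = 100, B = 200
- I/O: output of A gives 100
c) Cache and memory incoherent: B' ≠ B.
- Cache: A' = 100, B' = 200
- Memory: A = 100, B = 440
- I/O: input 440 to B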
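The three cases can be replayed with two dictionaries, one for memory and one for the cache. This is a sketch under the assumption of a write-back cache and an I/O device that talks directly to memory; the variable names are illustrative.

```python
# a) Coherent: the cache holds exact copies of A and B.
memory = {"A": 100, "B": 200}
cache = dict(memory)
assert cache == memory

# b) The CPU writes A in a write-back cache: only the cached copy changes.
cache["A"] = 550
io_output = memory["A"]   # an I/O device reads A straight from memory
print(io_output)          # 100, the stale value

# c) Start again from the coherent state; I/O inputs 440 to B in memory.
memory_c = {"A": 100, "B": 200}
cache_c = dict(memory_c)
memory_c["B"] = 440
cpu_read = cache_c["B"]   # the CPU hits in its cache and reads the old value
print(cpu_read)           # 200
```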
11Some Simple Definitions
Shared Memory Architectures
Write Back
- How it works: Write modified data from cache to memory only
when necessary.
- Performance: Good, because it doesn't tie up memory bandwidth.
- Coherency issues: Can have problems with various copies containing
different values.
Write Through
- How it works: Write modified data from cache to memory
immediately.
- Performance: Not so good; uses a lot of memory bandwidth.
- Coherency issues: Modified values are always written to memory, so data
always matches.
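The bandwidth versus coherency trade-off can be made concrete by counting memory writes under each policy. This is a minimal sketch; the `Cache` class, its `policy` argument, and the `run` helper are assumptions for illustration, not a model of any real cache.

```python
class Cache:
    """Tiny cache over a dict-backed memory; policy is 'back' or 'through'."""

    def __init__(self, memory, policy):
        self.memory = memory
        self.policy = policy
        self.lines = {}          # address -> cached value
        self.dirty = set()       # addresses modified but not yet written to memory
        self.memory_writes = 0

    def write(self, addr, value):
        self.lines[addr] = value
        if self.policy == "through":
            self._write_memory(addr, value)   # update memory immediately
        else:
            self.dirty.add(addr)              # defer until necessary

    def flush(self):
        for addr in sorted(self.dirty):
            self._write_memory(addr, self.lines[addr])
        self.dirty.clear()

    def _write_memory(self, addr, value):
        self.memory[addr] = value
        self.memory_writes += 1


def run(policy):
    memory = {"A": 0}
    cache = Cache(memory, policy)
    for value in range(1, 11):                # ten writes to the same location
        cache.write("A", value)
    stale_before_flush = memory["A"] != cache.lines["A"]
    cache.flush()
    return cache.memory_writes, stale_before_flush


print(run("back"))      # (1, True): one memory write, but memory was stale meanwhile
print(run("through"))   # (10, False): ten memory writes, memory always matched
```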
12What Does Coherency Mean?
Shared Memory Architectures
- Informally
- Any read must return the most recent write
- Too strict and too difficult to implement
- Better
- Any write must eventually be seen by a read
- All writes are seen in proper order
(serialization)
- Two rules to ensure this
- If P writes x and P1 reads it, P's write will be
seen by P1 if the read and write are sufficiently
far apart
- Writes to a single location are serialized: seen
in one order
- Latest write will be seen
- Otherwise could see writes in illogical order
(could see older value after a newer value)
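One way to picture write serialization is to check whether the values a processor observes for one location ever go backwards relative to the single order of writes. The function below is an illustrative checker, not part of any protocol; it assumes every write stores a distinct value.

```python
def sees_writes_in_order(write_order, observed):
    """Return True if `observed` never shows an older value after a newer one.

    write_order: the values written to one location, in serialized order
                 (assumed distinct for this sketch).
    observed:    the values one processor read from that location, in program order.
    """
    position = {value: index for index, value in enumerate(write_order)}
    indices = [position[value] for value in observed]
    return all(earlier <= later for earlier, later in zip(indices, indices[1:]))


writes = [1, 2, 3]
print(sees_writes_in_order(writes, [1, 1, 3]))   # True: may skip 2, never goes back
print(sees_writes_in_order(writes, [2, 1]))      # False: older value after a newer one
```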
13There are Different Types of Memory In The Cache
Shared Memory Architectures
Test_and_set(lock)
shared_data = xyz
Clear(lock)
- What kinds of memory are there in the cache?
TYPE            Shared?    Writable  How Kept Coherent
Code            Shared     No        No Need.
Private Data    Exclusive  Yes       Write Back
Shared Data     Shared     Yes       Write Back
Interlock Data  Shared     Yes       Write Through
For private and shared data, write back gives good
performance; using write through there would cause
performance degradation. For interlock data, write
through means the lock state is seen immediately.
You want a write through there to flush the cache.
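The Test_and_set / Clear sequence on this slide can be sketched as a spin lock. Python has no test-and-set instruction, so an internal `threading.Lock` stands in for the hardware's atomicity; the `TestAndSetLock` class and `worker` function are assumptions made for the illustration.

```python
import threading
import time


class TestAndSetLock:
    def __init__(self):
        self._value = 0                   # interlock data: 0 = free, 1 = held
        self._atomic = threading.Lock()   # stands in for hardware atomicity

    def test_and_set(self):
        """Atomically set the lock word to 1 and return its previous value."""
        with self._atomic:
            old = self._value
            self._value = 1
            return old

    def clear(self):
        self._value = 0


lock = TestAndSetLock()
shared_data = [0]


def worker(iterations):
    for _ in range(iterations):
        while lock.test_and_set() == 1:   # spin until the old value was 0
            time.sleep(0)                 # yield to other threads while spinning
        shared_data[0] += 1               # critical section
        lock.clear()


threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(shared_data[0])   # 4000
```

Every spinning processor needs to see the cleared lock word promptly, which is why the slide treats interlock data differently from ordinary shared data.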
14Potential HW Coherency Solutions
Shared Memory Architectures
- Snooping Solution (Snoopy Bus)
- Send all requests for data to all processors
- Processors snoop to see if they have a copy and
respond accordingly
- Requires broadcast, since caching information is
at processors
- Works well with bus (natural broadcast medium)
- Dominates for small scale machines (most of the
market)
- Directory-Based Schemes
- Keep track of what is being shared in one
centralized place
- Distributed memory => distributed directory for
scalability (avoids bottlenecks)
- Send point-to-point requests to processors via
network
- Scales better than Snooping
- Actually existed BEFORE Snooping-based schemes
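A rough sketch of why a directory scales better: it knows which processors hold a copy, so a write needs invalidations only for those sharers, while a snooping broadcast reaches every other processor. The `Directory` class, the block address `0x40`, and the processor count of 16 are all hypothetical values chosen for the example.

```python
class Directory:
    """Tracks, per block, which processors hold a copy (illustrative only)."""

    def __init__(self):
        self.sharers = {}   # block address -> set of processor ids

    def read(self, proc, block):
        self.sharers.setdefault(block, set()).add(proc)

    def write(self, proc, block):
        """Return the processors that must receive a point-to-point invalidate."""
        targets = self.sharers.get(block, set()) - {proc}
        self.sharers[block] = {proc}   # the writer becomes the only holder
        return sorted(targets)


NUM_PROCESSORS = 16
directory = Directory()
directory.read(3, 0x40)
directory.read(7, 0x40)

invalidated = directory.write(3, 0x40)
snoop_recipients = NUM_PROCESSORS - 1   # a broadcast reaches every other processor
print(invalidated, snoop_recipients)    # [7] 15
```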
15An Example Snoopy Protocol Maintained by Hardware
Shared Memory Architectures
- Invalidation protocol, write-back cache
- Each block of memory is in one state
- Clean in all caches and up-to-date in memory
(Shared)
- OR Dirty in exactly one cache (Exclusive)
- OR Not in any caches
- Each cache block is in one state (track these)
- Shared: block can be read
- OR Exclusive: cache has only copy, it's
writeable, and dirty
- OR Invalid: block contains no data
- Read misses cause all caches to snoop bus
- Writes to clean line are treated as misses
16Snoopy-Cache State Machine-I
Shared Memory Architectures
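- State machine for CPU requests for each cache
block
- Applies to Write Back Data
- Cache block states: Invalid, Shared (read/only), Exclusive (read/write)
[Figure: state diagram; the transitions it shows are listed here.]
- Invalid -> Shared: CPU Read; place read miss on bus
- Invalid -> Exclusive: CPU Write; place write miss on bus
- Shared -> Shared: CPU read hit
- Shared -> Shared: CPU read miss; place read miss on bus
- Shared -> Exclusive: CPU Write; place write miss on bus
- Exclusive -> Exclusive: CPU read hit, CPU write hit
- Exclusive -> Exclusive: CPU write miss; write back cache block, place write miss on bus
- Exclusive -> Shared: CPU read miss; write back block, place read miss on bus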
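The CPU-request side of the state machine fits in a lookup table keyed by (state, event). The table below is a sketch reconstructed from the diagram labels; the event names and the `cpu_request` helper are assumptions, and a write to a Shared block is treated as a miss, as the protocol slide states.

```python
# (current state, CPU event) -> (next state, bus actions)
CPU_TRANSITIONS = {
    ("Invalid", "read miss"): ("Shared", ["place read miss on bus"]),
    ("Invalid", "write miss"): ("Exclusive", ["place write miss on bus"]),
    ("Shared", "read hit"): ("Shared", []),
    ("Shared", "read miss"): ("Shared", ["place read miss on bus"]),
    ("Shared", "write hit"): ("Exclusive", ["place write miss on bus"]),
    ("Shared", "write miss"): ("Exclusive", ["place write miss on bus"]),
    ("Exclusive", "read hit"): ("Exclusive", []),
    ("Exclusive", "write hit"): ("Exclusive", []),
    ("Exclusive", "read miss"): (
        "Shared", ["write back block", "place read miss on bus"]),
    ("Exclusive", "write miss"): (
        "Exclusive", ["write back block", "place write miss on bus"]),
}


def cpu_request(state, event):
    """Return (next_state, bus_actions) for one CPU request on one cache block."""
    return CPU_TRANSITIONS[(state, event)]


state = "Invalid"
trace = []
for event in ["read miss", "read hit", "write hit", "write hit", "read miss"]:
    state, actions = cpu_request(state, event)
    trace.append((event, state, actions))
print(state)   # Shared
```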
17Snoopy-Cache State Machine-II
Shared Memory Architectures
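- State machine for bus requests for each cache
block
- Appendix E gives details of bus requests
- Cache block states: Invalid, Shared (read/only), Exclusive (read/write)
[Figure: state diagram; the transitions it shows are listed here.]
- Shared -> Invalid: write miss for this block
- Exclusive -> Invalid: write miss for this block; write back block (abort memory access)
- Exclusive -> Shared: read miss for this block; write back block (abort memory access)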
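The bus-request side is smaller, because a cache reacts only when another processor's miss names a block it holds. A sketch of that table follows; the `snoop` helper and event names are assumptions, and any combination not listed is taken to leave the block unchanged.

```python
# (current state, bus event for this block) -> (next state, actions)
BUS_TRANSITIONS = {
    ("Shared", "write miss"): ("Invalid", []),
    ("Exclusive", "write miss"): (
        "Invalid", ["write back block", "abort memory access"]),
    ("Exclusive", "read miss"): (
        "Shared", ["write back block", "abort memory access"]),
}


def snoop(state, bus_event):
    """Return (next_state, actions) when a bus request hits this cache block."""
    # Unlisted pairs (for example a read miss seen while Shared) change nothing.
    return BUS_TRANSITIONS.get((state, bus_event), (state, []))


print(snoop("Exclusive", "read miss"))
print(snoop("Shared", "read miss"))
```

The write back happens only from the Exclusive state, since that is the only state in which the cache holds data newer than memory.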
18Example
Shared Memory Architectures
[Table columns: Processor 1, Processor 2, Bus, Memory. The Processor 1 columns show the cache for P1.]
Assumes initial cache state is invalid and A1
and A2 map to same cache block, but A1 ≠ A2
19Example Step 1
Shared Memory Architectures
20Example Step 2
Shared Memory Architectures
Assumes initial cache state is invalid and A1
and A2 map to same cache block, but A1 ≠ A2
21Example Step 3
Shared Memory Architectures
Assumes initial cache state is invalid and A1
and A2 map to same cache block, but A1 ≠ A2.
22Example Step 4
Shared Memory Architectures
Assumes initial cache state is invalid and A1
and A2 map to same cache block, but A1 ≠ A2
23Example Step 5
Shared Memory Architectures
Assumes initial cache state is invalid and A1
and A2 map to same cache block, but A1 ≠ A2
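The step-by-step tables for this example are not reproduced here, so the following is only a sketch of how such a walk-through could be simulated. The operation sequence, the written values (10, 20, 40), and the initial memory contents of 0 are assumptions chosen for illustration; each cache has a single block so that A1 and A2 map to the same block.

```python
class Cache:
    """A cache with a single block, so A1 and A2 always map to the same block."""

    def __init__(self, name):
        self.name = name
        self.state = "Invalid"
        self.addr = None
        self.value = None


class System:
    def __init__(self, memory):
        self.memory = dict(memory)
        self.caches = {name: Cache(name) for name in ("P1", "P2")}
        self.bus_log = []

    def _write_back(self, cache):
        self.memory[cache.addr] = cache.value
        self.bus_log.append(("write back", cache.name, cache.addr))

    def _place_on_bus(self, kind, source, addr):
        """Record a miss on the bus and let every other cache snoop it."""
        self.bus_log.append((kind, source, addr))
        for cache in self.caches.values():
            if cache.name == source or cache.addr != addr or cache.state == "Invalid":
                continue
            if cache.state == "Exclusive":
                self._write_back(cache)   # supply the dirty data
                cache.state = "Shared" if kind == "read miss" else "Invalid"
            elif kind == "write miss":
                cache.state = "Invalid"   # a Shared copy is invalidated

    def read(self, proc, addr):
        cache = self.caches[proc]
        if cache.state == "Invalid" or cache.addr != addr:   # read miss
            if cache.state == "Exclusive":
                self._write_back(cache)   # evict the dirty block first
            self._place_on_bus("read miss", proc, addr)
            cache.addr, cache.value, cache.state = addr, self.memory[addr], "Shared"
        return cache.value

    def write(self, proc, addr, value):
        cache = self.caches[proc]
        if not (cache.state == "Exclusive" and cache.addr == addr):
            if cache.state == "Exclusive":
                self._write_back(cache)   # evict the dirty block first
            self._place_on_bus("write miss", proc, addr)
        cache.addr, cache.value, cache.state = addr, value, "Exclusive"

    def snapshot(self):
        return {c.name: (c.state, c.addr, c.value) for c in self.caches.values()}


system = System({"A1": 0, "A2": 0})   # initial memory contents are assumed
system.write("P1", "A1", 10)          # step 1 (assumed sequence)
system.read("P1", "A1")               # step 2
system.read("P2", "A1")               # step 3
after_step_3 = system.snapshot()
system.write("P2", "A1", 20)          # step 4
system.write("P2", "A2", 40)          # step 5
final = system.snapshot()

print(after_step_3)
print(final)
print(system.memory)
```

Under these assumptions, step 3 forces P1 to write back A1 and drop to Shared, step 4 invalidates P1's copy, and step 5 makes P2 write back A1 before it can reuse the block for A2.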
24Summary
- 8.1 Introduction: the big picture
- 8.3 Centralized Shared Memory Architectures
- We've looked at what happens to caches when we
have multiple processors or devices looking at
memory.