Title: Cache Coherency on Heterogeneous Multiprocessor Platform with Shared Memory
Taeweon Suh
1. What is my topic? Cache Coherency
- A cache coherency protocol is a well-known technique on multiprocessor platforms with shared memory.

Example: two Intel Xeon processors, each with a private data cache (D), share a variable A in memory.

Parallel program: Xeon1 executes A = A + 1; Xeon2 executes A = A + 2
Serial program: A = A + 1; A = A + 2

Access sequence:
1. Xeon1 reads A
2. Xeon2 reads A
3. Xeon1 adds 1 to A
4. Xeon2 adds 2 to A
5. Xeon1 writes A to memory
6. Xeon2 writes A to memory

Values: A0 is the initial value, A1 = A0 + 1, A2 = A0 + 2, A3 = A0 + 3.

Fig 1. Without Cache Coherency
Step               Xeon1 D   Xeon2 D   Memory
after 1-2          A0        A0        A0
after 3            A1        A0        A0
after 4            A1        A2        A0
after 5            A1        A2        A1
after 6            A1        A2        A2
Memory ends at A2: Xeon1's increment is lost.

Fig 2. With Cache Coherency
Step               Xeon1 D   Xeon2 D   Memory
after 1-2          A0        A0        A0
after 3            A1        A0        A0
coherency action   A1        A1        A0
after 4            A1        A3        A0
coherency action   A3        A3        A0
after 5-6          A3        A3        A3
Memory ends at A3, the same result as the serial program.
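The lost update in Fig 1 can be reproduced with a small model. This is an illustrative sketch only: the `Platform` class, its method names and the initial value 10 are hypothetical, each "add" is treated as one indivisible step, and coherency is modeled as a simple MEI-style write-invalidate scheme, so the intermediate cache contents differ from Fig 2 (which shows both caches ending with A3). Only the final memory value is compared.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Line:
    value: int
    dirty: bool = False  # True if the cached copy is newer than memory


class Platform:
    """Hypothetical model: two CPUs with private write-back caches sharing word A."""

    def __init__(self, a0: int, coherent: bool) -> None:
        self.memory = a0
        self.coherent = coherent
        self.cache: Dict[str, Optional[Line]] = {"Xeon1": None, "Xeon2": None}  # None = invalid

    def _invalidate_others(self, cpu: str) -> None:
        # MEI-style (no Shared state): on a miss or a write, every other copy
        # is written back if dirty and then invalidated.
        for other in list(self.cache):
            line = self.cache[other]
            if other != cpu and line is not None:
                if line.dirty:
                    self.memory = line.value
                self.cache[other] = None

    def read(self, cpu: str) -> int:
        line = self.cache[cpu]
        if line is None:  # cache miss: fetch from memory
            if self.coherent:
                self._invalidate_others(cpu)
            line = Line(self.memory)
            self.cache[cpu] = line
        return line.value

    def add(self, cpu: str, n: int) -> None:
        # Read-modify-write in the CPU's own cache (treated as one step here).
        value = self.read(cpu) + n
        if self.coherent:
            self._invalidate_others(cpu)
        self.cache[cpu] = Line(value, dirty=True)

    def write_back(self, cpu: str) -> None:
        line = self.cache[cpu]
        if line is not None and line.dirty:
            self.memory = line.value
            line.dirty = False


def run(a0: int, coherent: bool) -> int:
    """Replay steps 1-6 from the slide and return the final memory value."""
    p = Platform(a0, coherent)
    p.read("Xeon1")        # 1
    p.read("Xeon2")        # 2
    p.add("Xeon1", 1)      # 3
    p.add("Xeon2", 2)      # 4
    p.write_back("Xeon1")  # 5
    p.write_back("Xeon2")  # 6
    return p.memory


print(run(10, coherent=False))  # 12, i.e. A0 + 2 (Fig 1)
print(run(10, coherent=True))   # 13, i.e. A0 + 3 (Fig 2)
```

In the coherent run, Xeon2's add misses because its copy was invalidated at step 3, so it picks up A1 (written back from Xeon1) instead of the stale A0.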
2. What do other people do?
- Cache coherency only on homogeneous platforms
  - MEI, MSI, MESI, Dragon protocol, etc.
- How?
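The invalidation-based protocols listed above (MEI, MSI, MESI) differ mainly in their state sets; Dragon is an update-based protocol and is not covered here. As a rough sketch (the function name, event names and simplified transitions are assumptions, not taken from the slides), the textbook MESI transitions for one cache line can be written as below. MEI, the protocol of the PowerPC755, is the same machine without the Shared state, so a snooped read invalidates the line instead of downgrading it. Bus transactions issued by the local cache (for example the read-for-ownership on a write miss) are not modeled.

```python
from enum import Enum
from typing import Tuple


class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


def next_state(protocol: str, state: State, event: str,
               others_have_copy: bool = False) -> Tuple[State, bool]:
    """Return (new_state, write_back) for one cache line.

    protocol: "MESI" or "MEI" (MEI never enters S).
    event: "PrRd"/"PrWr" from the local CPU, or "BusRd"/"BusRdX"
    snooped from another cache (read / read-for-ownership).
    others_have_copy: the bus "shared" signal on a read miss (MESI only).
    """
    if protocol not in ("MESI", "MEI"):
        raise ValueError(f"unknown protocol {protocol!r}")
    mesi = protocol == "MESI"
    if event == "PrRd":
        if state is State.I:  # read miss
            return (State.S if mesi and others_have_copy else State.E), False
        return state, False   # read hit: no change
    if event == "PrWr":
        # From S or I this needs an invalidating bus transaction first.
        return State.M, False
    if event == "BusRd":
        if state is State.I:
            return State.I, False
        # MESI keeps a shared copy; MEI must give the line up.
        return (State.S if mesi else State.I), state is State.M
    if event == "BusRdX":
        return State.I, state is State.M  # a Modified line is written back first
    raise ValueError(f"unknown event {event!r}")


print(next_state("MESI", State.M, "BusRd"))  # (<State.S: 'Shared'>, True)
print(next_state("MEI", State.M, "BusRd"))   # (<State.I: 'Invalid'>, True)
```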
3. What is my approach?
- Cache coherency on a heterogeneous platform
  - Invalidation scheme
- Processors
  - PowerPC755: MEI (Modified, Exclusive, Invalid) protocol
  - ARM920T: no support for cache coherency
- How?
  - Hardware
    - Wrapper around the PowerPC755 for the bus interface and snoop inputs
    - Snooping logic between the bus and the ARM920T
      - Tag logic identical to the ARM920T D-cache tags
      - Bus mastership yield logic on a snoop hit
      - Shared page register
      - Snoop hit address register
      - FIQ status register
  - Software
    - Interrupt service routine (FIQ_ISR)
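The hardware and software pieces listed above can be tied together in a behavioral sketch. This is a hypothetical model, not the author's design: the class and function names, the single 4 KB shared page, the Python set used as the duplicated tag store, and treating every cached ARM920T line as dirty are all assumptions. It shows one plausible flow: a PowerPC755 access to the shared page that hits the duplicated ARM tags latches the address, raises FIQ and is told to retry; the ISR then writes the line back and invalidates it.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

LINE_SIZE = 32    # D-cache block size of both processors (from the slides)
PAGE_SIZE = 4096  # hypothetical size of the page named by the shared page register


@dataclass
class SnoopLogic:
    """Hypothetical model of the snooping logic between the bus and the ARM920T."""
    shared_page_base: int                            # shared page register
    tag_copy: Set[int] = field(default_factory=set)  # duplicate of ARM920T D-cache tags
    snoop_hit_address: Optional[int] = None          # snoop hit address register
    fiq_pending: bool = False                        # FIQ status register

    def in_shared_page(self, addr: int) -> bool:
        return self.shared_page_base <= addr < self.shared_page_base + PAGE_SIZE

    def arm_line_fill(self, addr: int) -> None:
        # Mirror the tag when the ARM920T caches a line of the shared page.
        if self.in_shared_page(addr):
            self.tag_copy.add(addr & ~(LINE_SIZE - 1))

    def snoop(self, addr: int) -> bool:
        """Check a PowerPC755 bus access; True means it must yield the bus and retry."""
        line = addr & ~(LINE_SIZE - 1)
        if self.in_shared_page(addr) and line in self.tag_copy:
            self.snoop_hit_address = line
            self.fiq_pending = True  # raise FIQ to the ARM920T
            return True
        return False


def fiq_isr(logic: SnoopLogic, arm_dcache: Dict[int, int], memory: Dict[int, int]) -> None:
    """Hypothetical FIQ_ISR body: clean and invalidate the hit line in the ARM D-cache."""
    line = logic.snoop_hit_address
    if line is None:
        return
    if line in arm_dcache:
        memory[line] = arm_dcache.pop(line)  # write back (assumed dirty), then drop
    logic.tag_copy.discard(line)
    logic.snoop_hit_address = None
    logic.fiq_pending = False


# Usage: the ARM920T holds a modified copy of a shared line; the PowerPC755 then accesses it.
logic = SnoopLogic(shared_page_base=0x8000)
memory = {0x8020: 5}
arm_dcache = {0x8020: 6}   # ARM920T has written 6 into its cached copy
logic.arm_line_fill(0x8024)
print(logic.snoop(0x8024))  # True: snoop hit, FIQ raised, PowerPC retries
fiq_isr(logic, arm_dcache, memory)
print(memory[0x8020], logic.snoop(0x8024))  # 6 False: the retry now sees the ARM's data
```

The duplicated tags let the snoop logic answer without disturbing the ARM920T core, which has no snoop port of its own; the core only gets involved through the FIQ.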
3. What is my approach? (continued)
- ARM920T D-cache
  - 16 KB
  - 32-byte block
  - 64-way set associative
  - Replacement: round-robin or random
  - No cache coherency
- PowerPC755 D-cache
  - 32 KB
  - 32-byte block
  - 8-way set associative
  - MEI protocol

[Block diagram: PowerPC755 (D-cache) inside a wrapper with snoop signals GBL, TT, ARTRY, ADDR; ARM920T (D-cache) with FIQ input; snoop logic with TAG, shared page register, snoop hit address register and FIQ status register; bus mastership logic; ASB (Advanced System Bus); memory]
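The cache parameters above determine how an address is split into tag, set index and byte offset, which is what duplicated tag logic has to reproduce. A minimal sketch, assuming plain 32-bit physical addresses and ignoring address translation; the function names are hypothetical.

```python
from typing import Tuple


def geometry(size_bytes: int, block_bytes: int, ways: int) -> Tuple[int, int, int]:
    """Return (sets, offset_bits, index_bits) for a set-associative cache."""
    sets = size_bytes // (block_bytes * ways)
    offset_bits = block_bytes.bit_length() - 1  # log2, valid for powers of two
    index_bits = sets.bit_length() - 1
    return sets, offset_bits, index_bits


def split(addr: int, block_bytes: int, sets: int) -> Tuple[int, int, int]:
    """Split an address into (tag, set index, byte offset)."""
    offset = addr % block_bytes
    index = (addr // block_bytes) % sets
    tag = addr // (block_bytes * sets)
    return tag, index, offset


print(geometry(16 * 1024, 32, 64))  # ARM920T D-cache: (8, 5, 3)
print(geometry(32 * 1024, 32, 8))   # PowerPC755 D-cache: (128, 5, 7)
print(split(0x12345678, 32, 8))     # (1193046, 3, 24) with the ARM920T geometry
```

With 32-bit addresses this gives the ARM920T D-cache 8 sets and a 24-bit tag, and the PowerPC755 D-cache 128 sets and a 20-bit tag.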
4. Generalization to other processors?
- Any combination of processors is possible if each processor has at least 2 interrupt inputs
  - Embedded processors typically have at least 2 interrupts (e.g., ARM, MIPS, i960)
  - One interrupt is used by the RTOS (Real-Time Operating System)
- Don't use the SHARED state in the cache coherency protocol
  - Invalidation scheme
  - What happens if the SHARED state is used?
    - An interrupt is used to signal a snoop hit
    - The interrupt cannot be accepted immediately, depending on the processor state (e.g., the pipeline)
    - During that time, coherency is not maintained
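The timing argument above can be made concrete with a toy model. Everything in it is an assumption for illustration: time is counted in abstract cycles, the FIQ latency is a fixed number, the function name and policy labels are hypothetical, and a read is "stale" when the ARM920T reads its old cached copy after the PowerPC755 has already produced a new value.

```python
from typing import Iterable


def stale_reads(policy: str, write_cycle: int, fiq_latency: int,
                arm_read_cycles: Iterable[int]) -> int:
    """Count ARM920T reads of a line that return stale data (hypothetical timing model).

    The PowerPC755 writes the line at write_cycle; the ARM920T copy is removed
    only when its FIQ_ISR runs, fiq_latency cycles later.
    """
    isr_done = write_cycle + fiq_latency
    if policy == "shared":
        # ARM copy stays valid (Shared) and the writer proceeds immediately.
        new_value_visible = write_cycle
    elif policy == "invalidation":
        # No Shared state: the writer's access is retried until the ISR has
        # cleaned and invalidated the ARM copy.
        new_value_visible = isr_done
    else:
        raise ValueError(f"unknown policy {policy!r}")
    # Stale: reads after the new value exists but before the ARM copy is gone.
    return sum(1 for c in arm_read_cycles if new_value_visible <= c < isr_done)


reads = [8, 11, 14, 16]
print(stale_reads("shared", write_cycle=10, fiq_latency=5, arm_read_cycles=reads))        # 2
print(stale_reads("invalidation", write_cycle=10, fiq_latency=5, arm_read_cycles=reads))  # 0
```

Under the invalidation scheme the interrupt latency still costs time, but it shows up as the writer stalling rather than as a window of incoherent reads.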
Any Questions?