Title: Theory of Memory
1 Theory of Memory
- W. Paul
- Saarland University and DFKI
- BMBF project Verisoft-XT
- joint work with Ulan Degebaev and Norbert Schirmer
- Saarland University
2 Why might this be important?
- Unites theories of
- store buffers
- interlocking
- caches
- cache coherence
- out of order execution
- X64 instruction set
- address translation
- optimized compilation
- structured parallel C semantics
- Explains why a hypervisor might run structured parallel C
- VCC is supposed to mirror structured parallel C semantics
- thus VCC might be(come) sound
3 Specifying Memory
[Figure: memory M, with an address x and its content M(x)]
4 Store Buffer
[Figure: memory M with store buffer sbuf(y); writes w(i) and reads r(j)]
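The store buffer picture can be made concrete with a small model. This is only a sketch under assumptions the slide does not state: the buffer is a FIFO of pending writes, a read is served from the newest buffered write to the same address if one exists, and unwritten memory cells read as 0. The class and method names are illustrative.

```python
from collections import deque


class StoreBufferedMemory:
    """Toy model: one memory M plus one FIFO store buffer (names are illustrative)."""

    def __init__(self):
        self.M = {}          # memory: address -> value, unwritten cells read as 0
        self.sbuf = deque()  # pending writes, oldest on the left

    def write(self, addr, value):
        # A write does not touch M directly; it is queued in the store buffer.
        self.sbuf.append((addr, value))

    def read(self, addr):
        # Forward from the newest buffered write to addr, if there is one.
        for a, v in reversed(self.sbuf):
            if a == addr:
                return v
        return self.M.get(addr, 0)

    def drain_one(self):
        # Retire the oldest buffered write into memory.
        if self.sbuf:
            addr, value = self.sbuf.popleft()
            self.M[addr] = value


m = StoreBufferedMemory()
m.write(8, 1)
m.write(8, 2)
local_view = m.read(8)        # the writing processor sees its newest store
memory_view = m.M.get(8, 0)   # memory itself has not been updated yet
m.drain_one()
after_one_drain = m.M.get(8, 0)
```

The gap between `local_view` and `memory_view` is exactly what another processor could observe, which is why later slides need conditions under which sbufs become invisible.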
6 Caches
[Figure: memory M with a cache ca]
7 Many Caches: Snooping
[Figure: memory M with caches ca(1), ..., ca(p)]
8 Many Caches
[Figure: memory M with caches ca(1), ..., ca(p); address components x.la and x.off]
10 Many Caches
[Figure: memory M with caches ca(1), ..., ca(p); address component x.off]
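The labels x.la and x.off suggest that an address x is split into two components. A minimal sketch, assuming that la is the cache-line address, off is the byte offset within the line, and lines are 64 bytes (the slides give neither the meaning of the fields nor a line size):

```python
LINE_BYTES = 64  # assumed line size; not taken from the slides


def split(x):
    """Return (la, off) for byte address x."""
    return x // LINE_BYTES, x % LINE_BYTES


def join(la, off):
    """Inverse of split: rebuild the byte address."""
    return la * LINE_BYTES + off


la, off = split(0x1234)
```

Under this reading, caches exchange whole lines identified by la, while off only selects bytes inside a line.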
11 Overlapping Transactions
[Figure: overlapping transactions labeled a, b, c, with public(a) marked]
12 Sequentially Consistent Memory (lemma 5)
[Figure: transactions labeled a, b, c, with public(a) marked]
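Sequential consistency can be illustrated with a brute-force check. This sketch uses the usual textbook definition (some interleaving of all operations respects each processor's program order, and every read returns the latest write to its address); it is not the transaction-based construction behind lemma 5, and the trace format is an assumption.

```python
def sequentially_consistent(threads, init=0):
    """threads: per-processor lists of ('w', addr, val) or ('r', addr, observed)."""

    def search(pos, mem):
        # pos[k] is the index of the next unexecuted operation of processor k.
        if all(p == len(t) for p, t in zip(pos, threads)):
            return True
        for k, t in enumerate(threads):
            if pos[k] == len(t):
                continue
            kind, addr, val = t[pos[k]]
            nxt = pos[:k] + (pos[k] + 1,) + pos[k + 1:]
            if kind == 'w':
                if search(nxt, {**mem, addr: val}):
                    return True
            elif mem.get(addr, init) == val:
                # A read may only be scheduled when memory holds the observed value.
                if search(nxt, mem):
                    return True
        return False

    return search(tuple(0 for _ in threads), {})


# Store-buffering litmus test: both processors read the old value.
sb_outcome = sequentially_consistent([
    [('w', 'x', 1), ('r', 'y', 0)],
    [('w', 'y', 1), ('r', 'x', 0)],
])
# Same program, but the second processor sees the first one's write.
ok_outcome = sequentially_consistent([
    [('w', 'x', 1), ('r', 'y', 0)],
    [('w', 'y', 1), ('r', 'x', 1)],
])
```

The first trace is the classic outcome that store buffers make possible and that no sequentially consistent memory allows. The search is exponential and only meant for tiny traces.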
13 Tomasulo Schedulers for OOO
[Figure: IF, issue, reservation stations, functional units, CDB, ROB, WB]
14 Two Memory Units
[Figure: memory m, two reservation stations RS, sbuf, MMU, functional units, LS, CDB, ROB]
15 Single Processor OOO correctness (lemma 6)
[Figure: memory m, two reservation stations RS, sbuf, MMU, functional units, LS, CDB, ROB]
16 Multi Processor OOO implementation
[Figure: memory m, two reservation stations RS, sbuf, MMU, functional units, LS, CDB, data(i,j), ROB]
17 Multi Processor OOO correctness (lemma 7)
[Figure: memory m, two reservation stations RS, sbuf, MMU, functional units, LS, CDB, data(i,j), ROB]
19 X64 architecture
- CPU core
- R: user registers
- SR: system registers
- CR3
- acc: access
- segmentation
- mmu: memory management unit
- tlb: translation lookaside buffer
- memory system
- mm: main memory
- ca: cache
- sbuf: store buffer
[Figure: block diagram with mm, ca, sbuf, acc, mmu, tlb, acc, CR3, segmentation, core, R]
20 Segmentation off (lemma 8)
- 1 segment
- as large as the entire address space
- segmentation invisible
[Figure: block diagram with mm, ca, sbuf, acc, mmu, tlb, acc, CR3, segmentation, core, R]
21 Bad news: cache state is visible
- CPU core
- acc: access
- acc.adr: address
- acc.r: rights (user, write, exe)
- acc.data
- acc.mmode: memory mode
- WB: write back
- WT: write through ...
- NC: no cache
[Figure: block diagram with mm or devices, ca, sbuf, acc, mmu, tlb, acc, CR3, core, R]
22 Good news: no device, no NC mode
- acc.mmode: memory mode
- WB: write back
- WT: write through ...
- NC: no cache (not used)
[Figure: block diagram with mm, ca, sbuf, acc, mmu, tlb, acc, CR3, core, R]
23 Sequentially Consistent Physical Memory (lemma 9)
- acc.mmode: memory mode
- WB: write back
- WT: write through ...
- mix on same address
- PM: sequentially consistent physical memory abstraction
- Proof: MOESI invariants are maintained
[Figure: block diagram with PM, sbuf, acc, mmu, tlb, acc, CR3, core, R]
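The slide names MOESI invariants without listing them. As a rough illustration, the following sketch checks the standard textbook state invariants for one cache line across all caches: at most one cache holds the line in M or E, and then all other caches hold it in I; at most one cache is the owner (O). The actual invariants used in the proof of lemma 9 may well be stronger, for example by relating cached data to main memory.

```python
def moesi_ok(states):
    """states: MOESI state letters of one line, one entry per cache."""
    exclusive = states.count('M') + states.count('E')
    if exclusive > 1:
        return False
    if exclusive == 1:
        # A Modified or Exclusive copy tolerates no other valid copy.
        return states.count('I') == len(states) - 1
    # Without an M/E copy, any number of S copies may coexist with one owner.
    return states.count('O') <= 1


examples = {s: moesi_ok(s) for s in ["MIII", "MSII", "OSSI", "OOSI", "EEII", "SSSS"]}
```

A protocol proof in this style shows that every snooping transaction maps a tuple of states satisfying the check to another such tuple.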
24 Initialize page tables
- 1 processor
- sbuf invisible
- operating mode: paging disabled
- mmu invisible
- set up page table tree in PM
[Figure: block diagram with PM containing page tables, sbuf, acc, mmu, tlb, acc, CR3, core, R]
25 Translated Linear Memory
- many processors
- operating mode: paging enabled
- keep tlb consistent
[Figure: block diagram with PM containing page tables, sbuf, acc, mmu, tlb, acc, CR3, core, R]
26 Translated Consistent Linear Memory, sbufs (lemma 10)
- many processors
- operating mode: paging enabled
- keep tlb consistent
[Figure: block diagram with LM containing page tables, sbuf, acc, CR3, core, R]
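How a page table tree in PM turns linear addresses into physical ones can be sketched with a toy walk. The 4 KiB pages, 9 index bits per level, 8-byte entries and present bit 0 match x64 long mode; everything else is a simplification: only two levels are walked instead of four, access rights and the remaining flag bits are ignored, PM is a Python dict, and there is no TLB.

```python
PAGE_BITS = 12   # 4 KiB pages
INDEX_BITS = 9   # 512 entries per table


def translate(pm, cr3, la, levels=2):
    """Walk `levels` page-table levels stored in pm, starting at table base cr3."""
    base = cr3
    for level in reversed(range(levels)):
        index = (la >> (PAGE_BITS + level * INDEX_BITS)) & ((1 << INDEX_BITS) - 1)
        entry = pm.get(base + 8 * index)    # 8-byte entries
        if entry is None or not entry & 1:  # bit 0: present
            raise LookupError("page fault")
        base = entry & ~((1 << PAGE_BITS) - 1)
    return base | (la & ((1 << PAGE_BITS) - 1))


# Hypothetical layout: top-level table at 0x1000, second-level table at 0x2000.
pm = {
    0x1000 + 8 * 1: 0x2000 | 1,  # top-level entry 1 -> second-level table
    0x2000 + 8 * 3: 0x5000 | 1,  # second-level entry 3 -> physical page 0x5000
}
linear = (1 << 21) | (3 << 12) | 0x2A
physical = translate(pm, cr3=0x1000, la=linear)
```

Keeping the tlb consistent then means that no cached translation may disagree with what such a walk would currently return.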
27 C0: Pascal with C syntax (configurations)
- c = (pr, rd, lms, hm, gm)
- pr: program rest
- rd: recursion depth
- lms: [0 : recursion depth] → local memories
- hm: heap memory
- gm: global memory
- subvariables
- (m,i)[17].gpr[3]
- value of pointers: subvariables!
[Figure: memory m with va(c,(m,i)), size(m,i), ba(m,i)]
28 Parallel C
- c = (pr, rd, lms, hm, gm)
- pr: program rest
- rd: recursion depth
- lms: [0 : recursion depth] → local memories
- hm: heap memory
- gm: global memory
- Share
- gm
- hm
- Interleave at steps of the small-step semantics
[Figure: memory m with va(c,(m,i)), size(m,i), ba(m,i)]
29 Parallel C
- c = (pr, rd, lms, hm, gm)
- pr: program rest
- rd: recursion depth
- lms: [0 : recursion depth] → local memories
- hm: heap memory
- gm: global memory
- Share
- gm
- hm
- Interleave at steps of the small-step semantics
- Problem
- Processors interleave instructions of compiled programs code(p)
[Figure: memory m with va(c,(m,i)), size(m,i), ba(m,i)]
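The problem named on this slide can be seen in a tiny experiment. The sketch below enumerates all interleavings of two threads that each increment a shared variable x, once with the increment as a single C-level step and once with the increment compiled to load, add and store. The three-instruction code sequence and the one-register machine are assumptions for illustration, not the compiler discussed in the talk.

```python
def interleavings(a, b):
    """Yield every merge of sequences a and b that keeps each one's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule):
    """Execute (thread, op) steps on shared x with one private register per thread."""
    x, reg = 0, {}
    for tid, op in schedule:
        if op == 'load':
            reg[tid] = x
        elif op == 'add':
            reg[tid] += 1
        elif op == 'store':
            x = reg[tid]
        elif op == 'x++':  # the whole C statement as one atomic step
            x += 1
    return x


def outcomes(code):
    """Set of final values of x over all interleavings of two threads running code."""
    t0 = [(0, op) for op in code]
    t1 = [(1, op) for op in code]
    return {run(s) for s in interleavings(t0, t1)}


c_level = outcomes(['x++'])
isa_level = outcomes(['load', 'add', 'store'])
```

At the C level only the final value 2 is possible, while instruction-level interleaving also produces 1. A correctness argument for parallel C therefore has to restrict where interleaving is observable.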
30 Simulation relation consis(c, alloc, d)
[Figure: linear memory LM with variable y at alloc(c,y) and variable p at alloc(c,p)]
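A heavily simplified reading of consis(c, alloc, d) is sketched below. It assumes that c is reduced to a map from variable names to values, that d is reduced to a word-addressed linear memory, that every variable occupies one word, and that a pointer is represented by the allocated address of its target. None of these encodings come from the slides; the real relation also has to cover program rest, stack layout and registers.

```python
def encode(value, alloc):
    """Machine representation of a C value (hypothetical encoding)."""
    # A pointer to variable v is represented by v's allocated address.
    if isinstance(value, tuple) and value[0] == 'ptr':
        return alloc[value[1]]
    return value


def consis(c, alloc, d):
    """True if every variable of c is stored in d at its allocated address."""
    return all(d.get(alloc[name]) == encode(value, alloc)
               for name, value in c.items())


c = {'y': 7, 'p': ('ptr', 'y')}     # p points to y
alloc = {'y': 0x100, 'p': 0x108}
lm_good = {0x100: 7, 0x108: 0x100}  # p's cell holds the address of y
lm_bad = {0x100: 7, 0x108: 0x104}   # p's cell holds a stale address
```

A compiler correctness proof then shows that this relation is preserved from one compared pair of configurations to the next.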
31 Non-optimizing compiler: step-by-step simulation
32 Optimizing compiler: simulation between IO-steps
33 IO-steps (1): volatile accesses
34 Volatiles Sequentially Consistent (lemma 11)
35 Structured Parallel C
- Implement locks using volatiles
- IO-steps (2): lock release
- Run processors alone on locked portions of linear memory
- Lemma 1: sbufs invisible
- Lemma 10: ordinary C code in linear memory
36 Summary
- Implement locks using volatiles
- IO-steps (2): lock release
- Run processors alone on locked portions of linear memory
- Lemma 1: sbufs invisible
- Lemma 10: ordinary C code in linear memory
- Outlined correctness proof for implementation of structured parallel C
- Initialisation
- compilation