Title: Memory Coherence in Shared Virtual Memory Systems
1. Memory Coherence in Shared Virtual Memory Systems
- Kai Li and Paul Hudak
- Presentation: G. Passas
2. Problem and Design Choices
- Coherence granularity: 1 page (2 KB)
- Page synchronization: invalidation (instead of writeback)
- Page ownership: dynamic (instead of static)
- Coherence violations cause page faults
- Page fault handlers implement the coherence protocols
3. A centralized protocol for coherent SVM
Page entry at local page table
Page entry at manager info table
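
The interplay between the two tables can be sketched as follows. This is a minimal single-process simulation, not the authors' implementation: the names `ManagerEntry`, `CentralManager`, `read_fault` and `write_fault` are hypothetical, message passing is replaced by direct method calls, and the copy set is assumed to live in the manager's info table.

```python
from dataclasses import dataclass, field


@dataclass
class ManagerEntry:
    """One page's entry in the manager info table (illustrative layout)."""
    owner: int
    copy_set: set = field(default_factory=set)


class CentralManager:
    """Toy model of a central manager with invalidation-based coherence."""

    def __init__(self, num_nodes, num_pages, initial_owner=0):
        self.info = {p: ManagerEntry(owner=initial_owner) for p in range(num_pages)}
        # access[node][page] models each node's local page table entry:
        # "nil" (no copy), "read" or "write".
        self.access = {n: {p: "nil" for p in range(num_pages)}
                       for n in range(num_nodes)}
        for p in range(num_pages):
            self.access[initial_owner][p] = "write"

    def read_fault(self, node, page):
        entry = self.info[page]
        # The owner is downgraded so a writer never coexists with readers.
        self.access[entry.owner][page] = "read"
        entry.copy_set.add(node)
        self.access[node][page] = "read"

    def write_fault(self, node, page):
        entry = self.info[page]
        # Invalidate every read copy and the previous owner's copy.
        for holder in entry.copy_set | {entry.owner}:
            if holder != node:
                self.access[holder][page] = "nil"
        entry.copy_set.clear()
        entry.owner = node
        self.access[node][page] = "write"
```

In this sketch a write fault leaves exactly one node with access to the page, which is the invariant the invalidation approach is meant to preserve.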
4. Distributing the managerial task
- Move synchronization of page ownership entirely to the individual owners
  - The confirmation message is eliminated
- Keep track of ownership in each processor's local page table
- Map pages to processors in a fixed manner
- Each processor knows a probable owner
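
A fixed page-to-processor mapping can be as simple as a modulo hash. The sketch below assumes the mapping H(p) = p mod N, where N is the number of processors; the function names are hypothetical and other fixed mappings would work equally well.

```python
def manager_of(page, num_nodes):
    """Fixed mapping from a page number to its managing node: H(p) = p mod N."""
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    return page % num_nodes


def pages_managed_by(node, num_pages, num_nodes):
    """List the pages whose manager is `node` under the mapping above."""
    return [p for p in range(num_pages) if manager_of(p, num_nodes) == node]
```

Because every node can compute `manager_of` locally, a faulting processor knows which manager to contact without any lookup message.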
5. A distributed protocol for coherent SVM
Page entry at local page table
- Resembles directories
- At most N-1 messages for each request
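
The N-1 bound can be illustrated by following the probable-owner hints for one page. This is a simplified sketch with assumed names: `prob_owner[i]` holds node i's hint, the true owner points to itself, and every node that forwards a write request updates its hint to the requester, which becomes the new owner.

```python
def write_fault(prob_owner, requester):
    """Forward a write request along the probable-owner hints.

    Mutates `prob_owner` in place and returns the number of
    forwarding messages needed to reach the true owner.
    """
    node, hops = requester, 0
    while prob_owner[node] != node:
        nxt = prob_owner[node]
        prob_owner[node] = requester  # forwarder now guesses the requester
        node, hops = nxt, hops + 1
    prob_owner[node] = requester      # old owner hands over ownership
    prob_owner[requester] = requester
    return hops
```

With four nodes whose hints form the chain 0 -> 1 -> 2 -> 3, a fault on node 0 takes three hops (N-1), after which every hint points at node 0, so later requests are short.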
6. The shortcomings of SVM
- Heavy false sharing due to the coarse coherence granularity
- Frequent transfers of large units (pages) between nodes
- The interrupt cost of receiving a message is high
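
The false-sharing effect can be made concrete with a small counting model. The sketch below assumes the 2 KB page size quoted in these slides and a write-invalidate scheme in which only the current owner may write a page; the function names and the trace format are hypothetical.

```python
PAGE_SIZE = 2048  # bytes; the 2 KB coherence granularity from the slides


def page_of(address, page_size=PAGE_SIZE):
    """Page number that contains a byte address."""
    return address // page_size


def count_page_transfers(writes, page_size=PAGE_SIZE):
    """Count ownership transfers for a trace of (node, address) writes."""
    owner = {}
    transfers = 0
    for node, addr in writes:
        page = page_of(addr, page_size)
        if page in owner and owner[page] != node:
            transfers += 1  # the whole page moves to the writing node
        owner[page] = node
    return transfers
```

Two nodes that alternately write unrelated variables at addresses 0 and 8 move the page on every write after the first, whereas the same trace with the variables placed on different pages causes no transfers.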
7. Implementation and performance of Munin
- J. Carter, J. Bennett and W. Zwaenepoel
- Presentation: G. Passas
8. Core Ideas
- Release consistency to amortize the cost of message transfers
  - Propagate invalidations/updates only after a release
- Multiple consistency (coherence) protocols
  - Cope with different access patterns
  - Annotate shared variables to indicate the preferred protocol
- The core system is basically the same as the distributed system described previously
9. Annotations and protocol parameters
10. Implementation issues
- How are page tables (object directories) generated?
- (preprocessor, auxiliary files, linking, root Munin thread (fault handler))
- How are write updates/invalidations delayed?
- (Delayed update queue, DUQ. Flushed upon a lock release)
- How is synchronization implemented?
- (queue-based locks, synch. object directory)
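
The idea of delaying updates until a release can be sketched with a small queue. This is a simplified illustration, not Munin's actual data structure: the class name, the `send` callback and the per-page coalescing of payloads are assumptions made for the example.

```python
class DelayedUpdateQueue:
    """Buffers writes to shared pages and sends them only on release."""

    def __init__(self, send):
        self._send = send   # hypothetical transport hook: send(page, payload)
        self._pending = {}  # page -> latest payload, in first-write order

    def record_write(self, page, payload):
        # Repeated writes to the same page are coalesced into one update.
        self._pending[page] = payload

    def release(self):
        """Flush all pending updates; returns how many were sent."""
        flushed = 0
        for page, payload in self._pending.items():
            self._send(page, payload)
            flushed += 1
        self._pending.clear()
        return flushed
```

A lock-release routine would call `release()` before handing the lock to the next waiter, so several writes inside one critical section cost a single update message per page.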
11. Performance vs. message passing
- Benchmark: multiplication of two 400x400 matrices
- User time: time spent executing user code
- System time: time spent executing Munin code
12. Shortcomings
- Constraints on the programmer
- (A) Annotations
- (B) Use of the system's synchronization libraries