Title: CSCS: A Concise Implementation of User-Level Distributed Shared Memory
Final Presentation
- Zhi Zhai, Feng Shen
- Computer Science and Engineering
- University of Notre Dame
- Dec. 11, 2009
DSM Overview
- DSM Characteristics
- Physically distributed memory
- Logically a single shared address space
Figure 1 DSM architecture
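Because the memory is logically one shared address space, any shared address can be split into a page number (the unit that the server and client page tables track) and an offset within that page. The following is a minimal sketch that assumes a 4 KiB page. `DSM_PAGE_SIZE`, `dsm_page_of` and `dsm_offset_in_page` are hypothetical names, and the slides do not state the page size CSCS uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed page size; a real system would query sysconf(_SC_PAGESIZE). */
#define DSM_PAGE_SIZE 4096u

/* Page number of addr inside a shared region that starts at region_base. */
static size_t dsm_page_of(uintptr_t region_base, uintptr_t addr) {
    return (size_t)((addr - region_base) / DSM_PAGE_SIZE);
}

/* Byte offset of addr inside its page. */
static size_t dsm_offset_in_page(uintptr_t region_base, uintptr_t addr) {
    return (size_t)((addr - region_base) % DSM_PAGE_SIZE);
}
```

For example, the address `base + 8192 + 5` falls on page 2 at offset 5. That page number is what a node would look up to find the page's owner.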
Related Work
- Models and main features
- IVY (Yale)
  - Divided space: shared and private space
- Mirage (UCLA)
  - Time interval Δ to avoid page thrashing
- TreadMarks (Rice)
  - Lazy release consistency to improve efficiency
- SAM (Stanford)
System Design
Figure 2 Server/Client mode
System Design
- Server
  - Holder of metadata only
  - Thread-based connection handling
  - Event-based service
System Design
Figure 3 Server Process/Threads
System Design
- Client
  - Physical memory owner
  - UI, work, and page-fetch threads
  - Fixed-home protocol
  - Not aware of peer clients
System Design
Figure 4 Client process/thread
System Design
Figure 5 Sample Operation
Implementation
- Message passing over TCP sockets
Figure 6 Message Passing
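TCP delivers a byte stream rather than discrete messages, so each message needs a fixed-size header that the receiver can parse before it reads any page payload. The following is a minimal sketch of such a header encoded in network byte order. The message types, field names and the 16-byte layout are hypothetical, because the slides do not specify CSCS's wire format.

```c
#include <arpa/inet.h> /* htonl, ntohl */
#include <stdint.h>
#include <string.h>

/* Hypothetical message types (not taken from the slides). */
enum dsm_msg_type {
    DSM_MSG_READ_REQ   = 1, /* request a read-only copy of a page */
    DSM_MSG_WRITE_REQ  = 2, /* request write access to a page     */
    DSM_MSG_PAGE_REPLY = 3  /* reply carrying page data or owner  */
};

/* Header sent ahead of any page payload on the TCP stream. */
struct dsm_msg_hdr {
    uint32_t type;        /* enum dsm_msg_type                    */
    uint32_t sender_id;   /* client ID, as in the connection table */
    uint32_t page_no;     /* page the message refers to            */
    uint32_t payload_len; /* bytes that follow (0 or one page)     */
};

#define DSM_HDR_WIRE_SIZE 16

/* Serialize in network byte order so nodes of any endianness agree. */
static void dsm_hdr_encode(const struct dsm_msg_hdr *h,
                           unsigned char out[DSM_HDR_WIRE_SIZE]) {
    uint32_t f[4] = { htonl(h->type), htonl(h->sender_id),
                      htonl(h->page_no), htonl(h->payload_len) };
    memcpy(out, f, sizeof f);
}

/* Parse a header received from the stream. */
static void dsm_hdr_decode(const unsigned char in[DSM_HDR_WIRE_SIZE],
                           struct dsm_msg_hdr *h) {
    uint32_t f[4];
    memcpy(f, in, sizeof f);
    h->type        = ntohl(f[0]);
    h->sender_id   = ntohl(f[1]);
    h->page_no     = ntohl(f[2]);
    h->payload_len = ntohl(f[3]);
}
```

A sender writes the 16 header bytes and then `payload_len` bytes of page data. Both ends must loop on `send()`/`recv()` until every byte has been transferred, since TCP may split or merge writes.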
Implementation
- Server/client page tables
  - Server holds the most up-to-date metadata
  - Server manages the whole virtual memory space
  - Server records the IDs and IP addresses of all nodes
  - Client owns the most up-to-date copy of its local memory segment
  - Client caches referenced pages from peer nodes
Client ID   IP Address
0           129.74.155.107 (e.g.)
1           129.74.155.122
...
Figure 7 Connection Table
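Once the server has named a page's owner, a client has to turn that owner's ID into an address it can connect to. The following is a minimal sketch of that lookup, filled with the two example rows from Figure 7. The struct and function names are hypothetical.

```c
#include <stddef.h>

/* One row of the connection table (Figure 7). */
struct dsm_conn_entry {
    int client_id;
    const char *ip; /* dotted-quad IPv4 address */
};

/* Example rows copied from Figure 7. */
static const struct dsm_conn_entry dsm_conn_table[] = {
    { 0, "129.74.155.107" },
    { 1, "129.74.155.122" },
};

/* Return the IP address registered for client_id, or NULL if unknown. */
static const char *dsm_lookup_ip(int client_id) {
    size_t n = sizeof dsm_conn_table / sizeof dsm_conn_table[0];
    for (size_t i = 0; i < n; i++)
        if (dsm_conn_table[i].client_id == client_id)
            return dsm_conn_table[i].ip;
    return NULL;
}
```

A linear scan is adequate for a handful of nodes. If client IDs are assigned densely from 0, the ID can index the array directly.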
Page   Frame   Access Bits              Page Owner
0      57      PROT_READ                1
1      67      PROT_READ | PROT_WRITE   1
2      57      PROT_READ                3
Figure 8 Server Page Table
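The server page table holds only metadata: for each page, its access bits and its current owner. The following is a minimal sketch of that table using the rows from Figure 8. It assumes that the page number is the array index and that the access bits reuse the `PROT_*` constants from `<sys/mman.h>`, as the figure suggests. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h> /* PROT_READ, PROT_WRITE */

/* One server page-table row (Figure 8): metadata only, no page data. */
struct dsm_server_pte {
    int frame;  /* frame number as listed in Figure 8 */
    int access; /* bitwise OR of PROT_* flags          */
    int owner;  /* client ID of the page's owner       */
};

/* Rows copied from Figure 8; the index is the page number. */
static const struct dsm_server_pte server_pt[] = {
    { 57, PROT_READ,              1 },
    { 67, PROT_READ | PROT_WRITE, 1 },
    { 57, PROT_READ,              3 },
};

#define SERVER_PT_LEN (sizeof server_pt / sizeof server_pt[0])

/* Owner of a page, or -1 if the page is outside the table. */
static int dsm_server_owner(size_t page) {
    return page < SERVER_PT_LEN ? server_pt[page].owner : -1;
}

/* Whether the global access bits currently allow writing. */
static bool dsm_server_writable(size_t page) {
    return page < SERVER_PT_LEN && (server_pt[page].access & PROT_WRITE) != 0;
}
```

For a fault on page 2, the server would answer with owner 3. The faulting client would then look up client 3 in the connection table.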
Implementation
Page   Frame   Access Bits              Page Owner   Ref Count
0      30      PROT_READ                1            0
1      31      PROT_READ                1            0
2      32      PROT_READ                1            4
3      60      PROT_READ | PROT_WRITE   1            1
4      200     PROT_READ                5            0
Figure 9 Client Page Table
Implementation
- Page fault handler
  - Client → Server
    - Check the access rights
    - Fetch the page owner's ID/address
    - Update the global access bits
  - Client → Client
    - Connect to the page owner
    - Cache the referenced page
    - Update the local access bits
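User-level DSM systems commonly detect accesses to non-local pages by protecting shared pages with `mprotect` and catching `SIGSEGV`. The following is a minimal single-process sketch of that mechanism, assuming Linux (other systems may deliver `SIGBUS` instead). `fetch_page_from_owner` is a hypothetical placeholder for the Client-to-Server and Client-to-Client steps listed above: here it only fills the page with a fixed byte. POSIX does not list `mprotect` as async-signal-safe, but user-level DSM systems on Linux commonly rely on calling it from the handler.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *dsm_region;                 /* one protected shared page */
static size_t dsm_page_size;
static volatile sig_atomic_t dsm_faults; /* faults handled so far     */

/* Hypothetical stand-in for asking the server for the owner and
   fetching the page from that client: fill with a fixed byte. */
static void fetch_page_from_owner(char *page) {
    memset(page, 0x2A, dsm_page_size);
}

static void dsm_segv_handler(int sig, siginfo_t *si, void *ctx) {
    (void)sig; (void)ctx;
    uintptr_t a = (uintptr_t)si->si_addr, base = (uintptr_t)dsm_region;
    if (a < base || a >= base + dsm_page_size) {
        /* Not a DSM page: restore the default action; the faulting
           instruction re-executes and the process crashes normally. */
        signal(SIGSEGV, SIG_DFL);
        return;
    }
    char *page = dsm_region + ((a - base) / dsm_page_size) * dsm_page_size;
    mprotect(page, dsm_page_size, PROT_READ | PROT_WRITE); /* allow the copy-in */
    fetch_page_from_owner(page);                            /* cache the page    */
    mprotect(page, dsm_page_size, PROT_READ); /* read-only: a later write faults */
    dsm_faults++;
}

/* Map one inaccessible page, install the handler, and read from it once. */
static int dsm_demo_read(void) {
    dsm_page_size = (size_t)sysconf(_SC_PAGESIZE);
    dsm_region = mmap(NULL, dsm_page_size, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (dsm_region == MAP_FAILED) return -1;
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_flags = SA_SIGINFO;
    sa.sa_sigaction = dsm_segv_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0) return -1;
    return dsm_region[100]; /* faults once, then reads the fetched byte */
}
```

Calling `dsm_demo_read()` returns 42 (0x2A) after exactly one fault. When the handler returns, the kernel restarts the faulting load, and the load succeeds because the page is now readable.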
Implementation
- Page fault handler
  - Page fault types
    - Read of a remote page
    - Write to a page
  - Assumptions
    - Reads happen more often than writes
    - Writes need the most up-to-date copy more than reads do
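If reads dominate, a handler can classify a fault from the page's current local protection alone, without decoding the faulting instruction. A fault on an inaccessible page is treated as a read, and a fault on a read-only page must come from a write. The following is a minimal sketch of that rule. The enum and function names are hypothetical, and CSCS may decide this differently.

```c
#include <sys/mman.h> /* PROT_NONE, PROT_READ, PROT_WRITE */

enum dsm_fault_kind { DSM_FAULT_READ, DSM_FAULT_WRITE, DSM_FAULT_INVALID };

/* Classify a fault using only the page's current local protection. */
static enum dsm_fault_kind dsm_classify_fault(int local_prot) {
    if (local_prot == PROT_NONE)
        return DSM_FAULT_READ;    /* no local copy: fetch a read-only copy  */
    if ((local_prot & PROT_READ) && !(local_prot & PROT_WRITE))
        return DSM_FAULT_WRITE;   /* readable copy exists, so this was a write */
    return DSM_FAULT_INVALID;     /* already read-write: not a DSM fault    */
}
```

The cost of this rule is that a write to a page with no local copy faults twice: once to fetch a readable copy, and once to upgrade it. That trade-off is acceptable when, as assumed above, reads are the common case.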
Implementation
Figure 10 Page fault handler workflow (key nodes: "Truly a remote reading fault?" and the call to dsm_do_wrt_page())
Implementation
- Memory consistency model
  - Assumptions revisited
    - Reads happen more often than writes
    - Writes need the most up-to-date copy more than reads do
  - Multiple-reader/single-writer
    - Snapshot for reading
    - Every write triggers a page fault
  - Locks on pages being referenced
    - Semaphore-like reference counts
    - If ref_count > 0 → wait/re-random
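A multiple-reader/single-writer policy with semaphore-like reference counts can be expressed as a per-page lock. Readers increment `ref_count`, and a writer proceeds only when `ref_count` is 0 and no other writer holds the page. The following is a minimal sketch using POSIX threads. The struct and function names are hypothetical. `dsm_try_write` models the non-blocking "wait/re-random" path, where the caller retries later, and `dsm_write_lock` models the blocking path.

```c
#include <pthread.h>
#include <stdbool.h>

/* Per-page multiple-reader/single-writer lock built on a reference count. */
struct dsm_page_lock {
    pthread_mutex_t mu;
    pthread_cond_t  changed; /* signalled when readers or the writer leave */
    int  ref_count;          /* readers currently referencing the page    */
    bool writer;             /* true while a writer holds the page        */
};

static void dsm_lock_init(struct dsm_page_lock *l) {
    pthread_mutex_init(&l->mu, NULL);
    pthread_cond_init(&l->changed, NULL);
    l->ref_count = 0;
    l->writer = false;
}

/* Reader: wait out any writer, then take a reference. */
static void dsm_ref_get(struct dsm_page_lock *l) {
    pthread_mutex_lock(&l->mu);
    while (l->writer)
        pthread_cond_wait(&l->changed, &l->mu);
    l->ref_count++;
    pthread_mutex_unlock(&l->mu);
}

static void dsm_ref_put(struct dsm_page_lock *l) {
    pthread_mutex_lock(&l->mu);
    if (--l->ref_count == 0)
        pthread_cond_broadcast(&l->changed);
    pthread_mutex_unlock(&l->mu);
}

/* Writer, non-blocking: succeeds only if no reader or writer holds the page. */
static bool dsm_try_write(struct dsm_page_lock *l) {
    pthread_mutex_lock(&l->mu);
    bool ok = (l->ref_count == 0 && !l->writer);
    if (ok)
        l->writer = true;
    pthread_mutex_unlock(&l->mu);
    return ok;
}

/* Writer, blocking: wait until ref_count drops to 0. */
static void dsm_write_lock(struct dsm_page_lock *l) {
    pthread_mutex_lock(&l->mu);
    while (l->ref_count > 0 || l->writer)
        pthread_cond_wait(&l->changed, &l->mu);
    l->writer = true;
    pthread_mutex_unlock(&l->mu);
}

static void dsm_write_done(struct dsm_page_lock *l) {
    pthread_mutex_lock(&l->mu);
    l->writer = false;
    pthread_cond_broadcast(&l->changed);
    pthread_mutex_unlock(&l->mu);
}
```

The writer flag is set while the mutex is held, so no reader can take a reference between the `ref_count == 0` check and the write. A check-then-unlock without the flag would leave exactly that race open.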
DSM Evaluation
Figure 11 Parallel Computation on ASP Problem
DSM Evaluation
Figure 12 Execution time comparison
DSM Evaluation
Figure 13 Message Transmission Comparison
DSM Evaluation
Figure 14 Network Traffic Comparison
Future Work
- Enhance system robustness
- Evaluate scalability boundary
- Provide better programmability
Thank You! Q&A