TreadMarks: Distributed Shared Memory on Standard Workstations and Operating Systems




1
TreadMarks: Distributed Shared Memory on Standard
Workstations and Operating Systems
  • P. Keleher, S. Dwarkadas, A. Cox, and W.
    Zwaenepoel
  • Winter 1994 USENIX Conference

Presented by Hyuck Han
2
TOC
  • Preliminaries
  • Motivation
  • Solution
  • Lazy Release Consistency
  • Multiple Writer Protocol
  • Evaluation
  • Conclusion

3
Programming Models (1)
4
Programming Models (2)
  • Message-passing model
    • MPI (Message Passing Interface): the de facto
      standard
  • Shared-address-space model
    • Shared-memory multiprocessors (SMPs)
      • Processes, threads, etc.
    • Distributed-memory machines
      • Hardware
        • CC-NUMA (Cache-Coherent Non-Uniform Memory
          Access)
      • Software
        • DSM (Distributed Shared Memory)

5
DSM
  • Gives the illusion that memory is shared among
    physically distributed nodes
  • Provides a uniform single address space view

6
Page-based DSM
  • Use virtual memory protection mechanisms and
    signals
  • mmap(), mprotect()
  • SIGSEGV/SIGIO signals
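The write-detection half of this mechanism can be sketched in C for Linux. Assume an anonymous mmap() region stands in for the shared area. Pages start read-only, so the first store raises SIGSEGV. The handler then marks the page dirty and re-enables writes, and the faulting instruction restarts when the handler returns. The names (dsm_region_init, dirty, MAX_PAGES, etc.) are hypothetical, and this is not TreadMarks' own code. A real DSM would also fetch invalid pages from other nodes and use SIGIO for incoming messages.

```c
#define _GNU_SOURCE            /* MAP_ANONYMOUS, SA_SIGINFO on glibc */
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAX_PAGES 64           /* hypothetical limit for the sketch */

static char *shared_region;    /* stands in for the DSM shared area */
static size_t page_size;
static size_t region_pages;
static volatile sig_atomic_t dirty[MAX_PAGES];   /* 1 = written since last reset */

/* SIGSEGV handler: a store hit a write-protected page. Mark the page dirty
   and unprotect it; returning restarts the faulting instruction. */
static void on_write_fault(int sig, siginfo_t *info, void *ctx) {
    (void)ctx;
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)shared_region;
    if (shared_region == NULL || addr < base || addr >= base + region_pages * page_size) {
        signal(sig, SIG_DFL);  /* a genuine crash: re-fault with the default action */
        return;
    }
    size_t page = (size_t)(addr - base) / page_size;
    dirty[page] = 1;
    /* mprotect() is not on POSIX's async-signal-safe list, but page-based DSMs rely on it here. */
    mprotect(shared_region + page * page_size, page_size, PROT_READ | PROT_WRITE);
}

/* Map npages read-only pages and install the handler; returns 0 on success. */
int dsm_region_init(size_t npages) {
    if (npages == 0 || npages > MAX_PAGES) return -1;
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, npages * page_size, PROT_READ,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;
    shared_region = p;
    region_pages = npages;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_write_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

/* At a synchronization point: write-protect every page again and clear dirty bits. */
void dsm_reset_write_detection(void) {
    assert(shared_region != NULL);
    mprotect(shared_region, region_pages * page_size, PROT_READ);
    for (size_t i = 0; i < region_pages; i++)
        dirty[i] = 0;
}
```

Reads of the region never fault; only the first write to each page does. That single fault per page per interval is what lets a page-based DSM learn which pages to report as modified. Some platforms may deliver SIGBUS rather than SIGSEGV for protection faults.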

7
Page-based DSM
8
TreadMarks Motivation (1)
  • Sequential consistency
    • Every write is visible immediately
  • Problems
    • Number of messages
    • Latency

9
TreadMarks Motivation (2)
  • False sharing
    • Different processors update different pieces
      of the same page
    • Leads to a ping-pong effect (the page bounces
      between processors), greatly increasing network
      traffic

10
TreadMarks Solutions
  • Lazy release consistency
  • Multiple-writer protocol

11
Relaxed Consistency Models
  • Delay making writes visible
  • Goal
    • Reduce the number of messages
    • Hide latency
  • Delay until when?

12
Release Consistency (1)
  • Eager Release Consistency (ERC) (Munin)
  • Write access information is delivered to all the
    shared copies at the release point.
  • Release blocks until acknowledgments have been
    received from all others.

13
Release Consistency (2)
  • Lazy Release Consistency (LRC)
  • Write access information is delivered only to the
    next acquiring copy at the next acquire point.
  • Fewer messages

14
Release Consistency (3)
  • ERC vs. LRC message traffic
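The traffic difference can be made concrete with a deliberately simplified, hypothetical cost model. It is not the paper's measurement. Assume a lock is handed to k acquirers in turn among n processors, and all n processors cache the page being written. Under ERC, each release sends n-1 updates and blocks for n-1 acknowledgments. Under LRC, the write notices ride on the lock grant, and the acquirer later fetches one diff (request plus reply).

```c
#include <assert.h>

/* Message totals under the simplified model described above (hypothetical). */
typedef struct {
    unsigned long erc;
    unsigned long lrc;
} msg_counts;

msg_counts release_consistency_messages(unsigned n, unsigned k) {
    assert(n >= 2);
    msg_counts c;
    /* Lock transfer (request + grant) is needed under both models. */
    unsigned long lock_msgs = 2UL * k;
    /* ERC: each release updates the n-1 other copies and waits for n-1 acks. */
    c.erc = lock_msgs + 2UL * (n - 1) * k;
    /* LRC: write notices are piggybacked on the grant; one diff fetch per acquire. */
    c.lrc = lock_msgs + 2UL * k;
    return c;
}
```

With n = 8 and k = 10, this model gives 160 messages for ERC and 40 for LRC. Only the ERC count grows with the number of processors caching the page.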

15
Multiple Writer Protocol (1)
  • Basic Idea
  • Buffer writes until synchronization events
  • Create diffs
  • Pull in modifications at synchronization events
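The twin/diff mechanism behind the multiple-writer protocol can be sketched as follows, assuming 32-bit words. At the first write, the writer saves a twin (a pristine copy of the page). Later, the page is compared word by word with its twin, and the changed words are run-length encoded into a diff. Each writer's diff covers only the words that writer changed, so diffs from different writers of the same page can all be applied to one copy. The type and function names are illustrative, not TreadMarks' API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* A run of consecutive modified words (offset and length in words). */
typedef struct {
    size_t offset;
    size_t len;
} diff_run;

/* Run-length encoded diff: run headers plus the new contents of each run. */
typedef struct {
    diff_run *runs;
    uint32_t *data;
    size_t nruns;
    size_t nwords;
} diff;

/* Twin: pristine copy of the page, taken at the first write fault. */
uint32_t *make_twin(const uint32_t *page, size_t words) {
    uint32_t *twin = malloc(words * sizeof *twin);
    if (twin != NULL)
        memcpy(twin, page, words * sizeof *twin);
    return twin;
}

/* Compare the page with its twin word by word and encode each changed run. */
diff make_diff(const uint32_t *page, const uint32_t *twin, size_t words) {
    diff d = { malloc(words * sizeof(diff_run)), malloc(words * sizeof(uint32_t)), 0, 0 };
    if (d.runs == NULL || d.data == NULL) {   /* allocation failure: empty diff */
        free(d.runs);
        free(d.data);
        d.runs = NULL;
        d.data = NULL;
        return d;
    }
    size_t i = 0;
    while (i < words) {
        if (page[i] == twin[i]) { i++; continue; }
        size_t start = i;
        while (i < words && page[i] != twin[i])
            i++;
        d.runs[d.nruns].offset = start;
        d.runs[d.nruns].len = i - start;
        d.nruns++;
        memcpy(d.data + d.nwords, page + start, (i - start) * sizeof *page);
        d.nwords += i - start;
    }
    return d;
}

/* Merge a diff into another copy of the same page. */
void apply_diff(uint32_t *page, size_t words, const diff *d) {
    const uint32_t *src = d->data;
    for (size_t r = 0; r < d->nruns; r++) {
        assert(d->runs[r].offset + d->runs[r].len <= words);
        memcpy(page + d->runs[r].offset, src, d->runs[r].len * sizeof *page);
        src += d->runs[r].len;
    }
}

void free_diff(diff *d) {
    free(d->runs);
    free(d->data);
    d->runs = NULL;
    d->data = NULL;
    d->nruns = d->nwords = 0;
}
```

For example, suppose processor A changes words 1-2 of a page and processor B changes word 6 of the same page. Each produces a one-run diff against its own twin, and applying both to a third copy yields both updates without the page ping-ponging between A and B.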

16
Multiple Writer Protocol (2)
  • Lazy diff creation

17
TreadMarks
  • Step 1: (A) Acquire, two writes, Release
  • Step 2: (B) Acquire: send its CVT (Current Vector
    Timestamp) with the lock request
  • Step 3: (A) Reply with write notices (meaning B's
    copy of the page is invalid) along with the lock grant
  • Step 4: (B) Access the page and request diffs
    (lazy diff creation)
  • Step 5: (A) Send diffs
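Steps 2 and 3 can be sketched with vector timestamps. B's lock request carries its CVT. A answers with every interval in its log whose interval number exceeds B's entry for the interval's origin processor. The record layout is a simplifying assumption, not the paper's data structure: it uses one dirty page per interval record and a fixed, hypothetical NPROCS of 4.

```c
#include <assert.h>
#include <stddef.h>

#define NPROCS 4   /* hypothetical cluster size */

/* Vector timestamp: entry p = number of intervals of processor p seen so far. */
typedef struct { unsigned v[NPROCS]; } vtime;

/* One interval record kept by a processor (hypothetical layout). */
typedef struct {
    unsigned proc;        /* origin processor i */
    unsigned interval;    /* interval number n on processor i (1-based) */
    vtime    stamp;       /* CVT_i during that interval, sent along with the notice */
    unsigned dirty_page;  /* one write notice per record, for brevity */
} interval_rec;

/* Select every interval in the granter's log that the requester has not seen.
   Writes up to max_out pointers into out and returns how many were written. */
size_t intervals_to_send(const interval_rec *log, size_t nlog,
                         const vtime *requester_cvt,
                         const interval_rec **out, size_t max_out) {
    size_t k = 0;
    for (size_t j = 0; j < nlog; j++) {
        assert(log[j].proc < NPROCS);
        if (log[j].interval > requester_cvt->v[log[j].proc] && k < max_out)
            out[k++] = &log[j];
    }
    return k;
}

/* After processing the write notices, the acquirer's CVT is the pairwise max. */
void vtime_merge(vtime *dst, const vtime *src) {
    for (unsigned p = 0; p < NPROCS; p++)
        if (src->v[p] > dst->v[p])
            dst->v[p] = src->v[p];
}
```

Only the write notices travel with the lock grant. B invalidates the listed pages, and page contents (diffs) move later, only if B actually touches those pages.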
18
TreadMarks
  • Step 1: Generate write notices
  • Steps 2-3: What does A send to B (the red line
    in the figure)?
    • A list of intervals known to A but not to B
    • For each interval in the list:
      • The origin node i and interval number n
      • Node i's vector clock CVT_i during interval n
      • A list of pages dirtied by node i during
        interval n
    • These dirty-page notifications are called
      write notices
  • Steps 4-5: Lazy diff creation
    • Diffs are created only when the modifications
      are requested
    • Decreases the number of diffs created
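Lazy diff creation can be sketched as a per-page cache. The writer keeps the twin after writing. The diff is built only when some processor actually requests the page's modifications, and later requests reuse it. Several parts are hypothetical:

- the page_state layout;
- the simple (index, value) encoding, used here instead of run-length encoding;
- the diffs_created counter.

The sketch covers a single interval and ignores re-twinning after later writes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_WORDS 1024          /* hypothetical: 4 KB page of 32-bit words */

/* Writer-side state for one page (hypothetical layout). */
typedef struct {
    uint32_t data[PAGE_WORDS];
    uint32_t *twin;              /* copy taken before the first write, or NULL */
    uint32_t *diff_words;        /* (index, value) pairs, built on demand */
    size_t diff_len;             /* number of uint32_t entries in diff_words */
    int diff_valid;
} page_state;

static unsigned diffs_created;   /* counts diffs actually built */

/* Write-fault path: save a twin before the first modification. */
int begin_write(page_state *pg) {
    if (pg->twin != NULL)
        return 0;
    pg->twin = malloc(sizeof pg->data);
    if (pg->twin == NULL)
        return -1;
    memcpy(pg->twin, pg->data, sizeof pg->data);
    return 0;
}

/* Compare against the twin and keep the changed words as (index, value) pairs. */
static void build_diff(page_state *pg) {
    uint32_t *buf = malloc(2 * sizeof pg->data);
    if (buf == NULL)
        return;                  /* keep the twin; the caller gets no diff */
    size_t n = 0;
    for (size_t i = 0; i < PAGE_WORDS; i++)
        if (pg->data[i] != pg->twin[i]) {
            buf[n++] = (uint32_t)i;
            buf[n++] = pg->data[i];
        }
    pg->diff_words = buf;
    pg->diff_len = n;
    pg->diff_valid = 1;
    free(pg->twin);              /* the diff now records the modifications */
    pg->twin = NULL;
    diffs_created++;
}

/* Handle a remote request for this page's modifications. The diff is built
   at most once, and never for a page nobody asks about. */
const uint32_t *request_diff(page_state *pg, size_t *len) {
    assert(len != NULL);
    if (!pg->diff_valid && pg->twin != NULL)
        build_diff(pg);
    *len = pg->diff_valid ? pg->diff_len : 0;
    return pg->diff_valid ? pg->diff_words : NULL;
}
```

The saving comes from pages whose diffs are never requested. Those pages never pay for the word-by-word comparison or the diff buffer.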

19
TreadMarks API
  • Global variables
  • extern unsigned Tmk_nprocs
  • extern unsigned Tmk_proc_id
  • Functions
  • void Tmk_startup (int argc, char **argv)
  • void Tmk_exit (int status)
  • void Tmk_barrier (unsigned id)
  • void Tmk_lock_acquire (unsigned id)
  • void Tmk_lock_release (unsigned id)
  • char *Tmk_malloc (unsigned size)
  • void Tmk_free (char *ptr)

20
Performance
  • Experimental Environment
  • 8 DECstation-5000/240 workstations
  • connected to a 100-Mbps ATM LAN and a 10-Mbps
    Ethernet
  • Applications
  • Water: molecular dynamics simulation
  • Jacobi: successive over-relaxation
  • TSP: branch-and-bound algorithm to solve the
    traveling salesman problem
  • Quicksort: uses bubblesort to sort subarrays of
    fewer than 1K elements
  • ILINK: genetic linkage analysis
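The Quicksort benchmark's kernel is a quicksort that switches to bubblesort for subarrays under 1K elements. It can be sketched sequentially as below. The threshold comes from the slide. The middle-element Lomuto partition and the names are assumptions, and the way the parallel version distributes subarrays among processors is not shown.

```c
#include <assert.h>
#include <stddef.h>

#define BUBBLE_THRESHOLD 1024  /* "less than 1K elements", per the slide */

/* Bubblesort with early exit once a pass makes no swaps. */
static void bubble_sort(int *a, size_t n) {
    for (size_t end = n; end > 1; end--) {
        int swapped = 0;
        for (size_t i = 0; i + 1 < end; i++)
            if (a[i] > a[i + 1]) {
                int t = a[i]; a[i] = a[i + 1]; a[i + 1] = t;
                swapped = 1;
            }
        if (!swapped)
            break;
    }
}

/* Quicksort that hands subarrays below the threshold to bubblesort. */
void hybrid_quicksort(int *a, size_t n) {
    assert(a != NULL || n == 0);
    while (n >= BUBBLE_THRESHOLD) {
        /* Lomuto partition, using the middle element as pivot */
        size_t mid = n / 2;
        int t = a[mid]; a[mid] = a[n - 1]; a[n - 1] = t;
        int pivot = a[n - 1];
        size_t store = 0;
        for (size_t i = 0; i + 1 < n; i++)
            if (a[i] < pivot) {
                t = a[i]; a[i] = a[store]; a[store] = t;
                store++;
            }
        t = a[store]; a[store] = a[n - 1]; a[n - 1] = t;
        /* Recurse on the smaller side, loop on the larger to bound stack depth. */
        size_t left = store, right = n - store - 1;
        if (left < right) {
            hybrid_quicksort(a, left);
            a += store + 1;
            n = right;
        } else {
            hybrid_quicksort(a + store + 1, right);
            n = left;
        }
    }
    bubble_sort(a, n);
}
```

Bubblesort's quadratic cost is tolerable only because it is confined to small subarrays. Quicksort's partitioning does the bulk of the work on large inputs.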

21
Result
22
Execution Time Breakdown (1/2)
23
Execution Time Breakdown (2/2)
24
Lazy vs. Eager Release Consistency (1/2)
25
Lazy vs. Eager Release Consistency (2/2)
26
Conclusion
  • Efficient user-level implementation
  • Lazy release consistency, multiple-writer
    protocols, and lazy diff creation reduce the
    cost of communication
  • Good speedups for Jacobi, TSP, Quicksort, and ILINK
  • Moderate speedups for Water
  • A viable technique for parallel computation on
    clusters of workstations connected by suitable
    networking technology