Breakpoints and Halting in Distributed Systems - PowerPoint PPT Presentation

About This Presentation
Title:

Breakpoints and Halting in Distributed Systems

Description:

Debugger process. 12. Problems with this solution. Communication overheads ... Encapsulate single process behavior. Detect simple events: Entered procedure ... – PowerPoint PPT presentation

Number of Views:24
Avg rating:3.0/5.0
Slides: 40
Provided by: asax
Category:

less

Transcript and Presenter's Notes

Title: Breakpoints and Halting in Distributed Systems


1
Breakpoints and Halting in Distributed Systems
  • Presented by
  • Abhishek Saxena
  • CS 739 Distributed Systems
  • Spring 2002

2
References
  • Detecting Relational Global Predicates in
    Distributed Systems by Alexander I. Tomlinson and
    Vijay K. Garg, 1993
  • Breakpoints and Halting in Distributed Programs
    by Barton P. Miller and Jong-Deok Choi, 1992
  • Restoring Consistent Global States of Distributed
    Computations by Goldberg et al., 1991

3
Presentation Layout
  • Introduction
  • Motivation
  • Halting in Distributed Systems
  • Detecting Breakpoints for
  • Conjunctive/Disjunctive/Linked Predicates
  • Relational Predicates
  • Applications to Research
  • Relevance to papers read
  • Conclusions

4
Introduction
  • General problems of
  • Halting distributed programs
  • Detecting breakpoints
  • Validating resource conflicts
  • Recording, restoration and replay of program
    sequences

5
Motivation
  • Why halt?
  • Interactive debugging
  • Issues in distributed systems
  • No single global notion of time
  • Unpredictable communication delays
  • How to issue instant command to all processes?
  • Command to simultaneously reach all processes?

6
Halting
  • 2 pertinent questions
  • How to halt a distributed program?
  • Halting Algorithm
  • When to halt?
  • Breakpoint Detection

7
Halting Algorithm
  • Extends Chandy Lamports algorithm
  • Sending rule
  • Increments last_halt_id
  • Send halt marker containing this value to
    outgoing channels
  • Receiving rule
  • Compare the halt_id with its last_halt_id
    update
  • Send halt marker like sender

8
The Halting Algorithm
Process R
Sending process P
Halt marker
Halt marker
Halt marker
Halt marker
Halt marker
Process U
Process S
9
The Halting Algorithm
  • Intuitive extension to Chandy Lamports
    Algorithm1
  • Leads to a global consistent state since
  • Process states same as recorded process states in
    1
  • Undelivered messages same as recorded channels
    states in 1

10
Problems with this Algorithm
  • Processes that infrequently interact with other
    computation processes
  • Long halting time
  • Acyclic network connection

Consumer
Producer
P
Q
Communication Channel
11
A Solution
  • Centralized debugger process

d
Debugger process
q
p
12
Problems with this solution
  • Communication overheads
  • Possible change in execution of program
  • Complex to build

13
Detecting Breakpoints
  • Breakpoints Predicates
  • Predicate satisfaction breakpoint detection
  • Distributed processes system needs
  • Simple predicates
  • Disjunctive predicates
  • Linked predicatesinteresting!
  • Conjunctive predicatesvery interesting!

14
Simple Predicates
  • Encapsulate single process behavior
  • Detect simple events
  • Entered procedure
  • Message sent / received
  • Channel created / destroyed
  • Process created / destroyed

15
Disjunctive predicates
  • Form
  • DP SP U SP
  • Satisfied when any SP is satisfied
  • Initiate halting when DP is true

16
Linked Predicates
  • Specify sequences of events
  • Form
  • LP DP -gtDP
  • Debugger process sends the LP DP1-gt... to
    processes involved in DP1
  • Upon DP1, strip off DP1 send stripped LP to
    processes involved in DP2

17
Linked predicates implementation
Process Q
Processes involved in DP2
Processes involved in DP1
Process S
Process P
Debugger process
Start halting
DP2
DP2
DP1-gtDP2
DP1-gtDP2
DP1-gtDP2
Process T
Start halting
Start Halting
Process R
18
Conjunctive Predicates
  • Form
  • CP SP n SP
  • Hardest to detect!
  • No single time reference across machines
  • Interpretation based on virtual time
  • Consider processes P1, P2 with virtual time axes
    T1, T2
  • Define
  • SCP (t1, t2) t1e T1, t2e T2, SP(t1) n
    SP(T2)

19
Conjunctive predicates
  • Split SCP into
  • Ordered-SCP
  • (t1, t2) (t1, t2)e SCP, ((SP1) i -gt (SP2) j)
    U ((SP2) i -gt(SP1) j)
  • Unordered-SCP
  • (t1, t2) (t1, t2)e SCP, (t1, t2)
    ordered-SCP

20
Conjunctive Predicates
t11
t21
ordered-SCP pair
t12
t22
unordered- SCP pair
t23
t13
21
Conjunctive Predicates
  • Detecting unordered-SCP events difficult
  • Requires
  • Global information gathering process
  • Time delay!
  • Cannot preserve meaningful process states

22
Detecting Relational Global Predicates
  • Resource conflict validation problems
    undetectable by earlier predicate classes
  • Form
  • ( x0 xn gt C )
  • xi resource usage at Pi
  • C total resource available
  • Undecomposable into earlier classes of predicates

23
How to detect such predicates?
  • 2 algorithms
  • Decentralized runs concurrently
  • Centralized decoupled from the target program

24
Model Notation
  • Partial ordering on S S0, , Sn where, Si
    lt Sj, for 0 lt i,j lt n
  • Happens-before relation -gt
  • pred.u.i Intuitively, is the state just
    preceding u in process i
  • succ.u.i The state just succeeding u in process
    i

25
Concurrent States Intervals
Q
P

9
2
State Interval
10
3
Receive Interval
11
4
Deterministic event
Local state
Non-deterministic event
26
Concurrent Intervals
1, lo1
1, j
1, hi1
P1
0, lo0
0, i
0, hi0
KEY
P0
pred relation
27
Concurrent Intervals
  • Intervals (0,i) (1, j) concurrent iff
  • KEY exists in P0 or P1 s.t.,
  • lo0 lt i lt hi0 lo1 lt j lt hi1,
  • where,
  • the lo0, lo1, hi0, hi1 as defined by the
    previous diagram

28
Overview of algorithms
  • Gather information
  • What?
  • How?
  • Consider 2 processes P0 P1
  • Gather concurrent interval sequences
  • lo0 to hi0 at P0 lo1 to hi1 at P1
  • Check resource violations at all possible pairs
    of states in these sequences!!

29
Algorithms contd
  • Representation of
  • (0, lo0) (0, hi0)
  • (1, lo1) (1, hi1)
  • as a 2x2 Matrix clock
  • Row i of Pis matrix clock Pis vector clock
  • Current interval at Pk (k, Mk , )
  • Row k of Mkpred() of current interval at Pk
  • Row iltgtkpred.pred() of current interval at Pk

30
Maintaining Matrix Clocks
  • Initialize
  • Initialize matrix to 0
  • If k0 or k1 Mkk, k
  • Send message tagged with Mk., . Increment
    Mkk,k for k0 V 1
  • Upon message receive update matrix clock
    Increment Mkk,k
  • Mkk, diagonal(Mk)

31
Matrix Clock Example
3 1 0 1
1 0 0 0
2 1 0 1
P0
2 1 0 1
0 0 0 1
0 0 0 1
0 0 0 2
2 1 2 3
P1
32
Decentralized Algorithm
  • Consider process P0
  • Upon mesg receive evaluate lo0, lo1, hi0, hi1
  • Find min value of resource(x) at P0
  • Send debug mesg (min_x0, lo1, hi1) to P1
  • P1 detects the predicate
  • (min_x0 min_x1 gt C)

33
Overheads Complexity at P0
  • Message overheads
  • ( of receive intervals at P0) sizeof ( 3
    integers)..Debug mesgs
  • Sizeof(4 integers)Application mesgs
  • Memory
  • intervals at P0 min_x for each interval
  • Computation
  • ( intervals at P0)( debug mesgs sent
    received)

34
Centralized Algorithm
  • Checker process runs concurrently or, post-mortem
  • Consider the latter processes P0 P1
  • Processes keep trace files containing
  • min_x for each interval
  • an array of lo0, lo1, hi0, hi1 for each
    interval
  • Runs a check algorithm
  • Builds heaps by inserting the min_x values for
    all concurrent interval sequences at P0 P1
  • Use these heap-tops to detect the predicate

35
Overheads Complexity for P0
  • Memory
  • 4 integers for matrix clock each application
    process
  • Computation
  • Monitor local variables
  • Rest offloaded to checker
  • O(R0 M0logM0 M1logM1)
  • Where, R0 M0 rec intervals total
    intervals at P0

36
Major Practical Problems
  • Reduced complexity from exp to O(nlogn) but
    still
  • Large overheads even for 2 processes
  • Lots of messages!
  • Lots of memory space!
  • Lots of computation!

37
Applications to Research
  • Development of distributed debugging environment
  • Recording of execution sequences
  • Rollback
  • Replay
  • Exploration of new execution scenarios
  • Command of mission-control distributed systems

38
Relevance to Papers Read
  • The S/Nets Linda kernel
  • Debugging distributed tuple space
  • Detecting race conditions, deadlocks, probe
    effects
  • Chandy Lamports paper explores the detection
    of stable predicates and Gargs paper explores
    unstable predicate detection

39
Conclusions
  • Distributed debugging still challenging
  • No efficient algorithm
  • Hard to do away with overheads
  • Need for efficient event monitoring
    manipulation tools
  • Message sequence chart generators
  • Program flow analysis for more independent
    program splitting
Write a Comment
User Comments (0)
About PowerShow.com