1
Performing Tasks in Asynchronous Environments
  • Dariusz Kowalski
  • University of Connecticut & Warsaw University
  • joint work with
  • Alex Shvartsman
  • University of Connecticut & MIT

2
Do-All problem (DHW et al.)
  • DA(p,t) problem abstracts the basic problem of
    cooperation in a distributed setting
  • p processors must perform t tasks, and at least
    one processor must know that all tasks are done
    [Dwork, Halpern, Waarts 92/98]
  • Tasks are
  • known to every processor
  • similar - each takes a similar number of local
    steps
  • independent - may be performed in any order
  • idempotent - may be performed concurrently

3
Do-All synchronous model with crashes
  • Model: processors are synchronous and may fail by
    crashing
  • Solutions: the problem is well understood, results
    are close to optimal
  • Shared-memory model -- communication by
    read/write
  • Kanellakis, P.C., Shvartsman, A.A.
  • Fault-tolerant parallel computation. Kluwer
    Academic Publishers (1997)
  • Message-passing model -- communication by
    exchanging messages
  • Dwork, C., Halpern, J., Waarts, O.
  • Performing work efficiently in the presence of
    faults.
  • SIAM Journal on Computing, 27 (1998)
  • De Prisco, R., Mayer, A., Yung, M.
  • Time-optimal message-efficient work performance
    in the presence of faults. Proc. of 13th
    PODC, (1994)
  • Chlebus, B., De Prisco, R., Shvartsman, A.A.
  • Performing tasks on synchronous restartable
    message-passing processors. Distributed
    Computing, 14 (2001)

4
Do-All asynchronous models
  • Models
  • Shared-memory model -- communication by
    read/write -- widely studied, but solutions far
    from optimal
  • Kanellakis, P.C., Shvartsman, A.A.
    Fault-tolerant parallel computation. Kluwer
    Academic Publishers (1997)
  • Anderson, R.J., Woll, H. Algorithms for the
    certified Write-All problem. SIAM Journal on
    Computing, 26 (1997)
  • Kedem, Z., Palem, K., Raghunathan, A., Spirakis,
    P. Combining tentative and definite executions
    for very fast dependable parallel computing.
    Proc. of 23rd STOC, (1991)
  • Message-passing model -- communication by
    exchanging messages -- no interesting solutions
    until recently

5
Shared-Memory vs. Message-Passing
  • Shared-Memory (atomic registers)
  • processors communicate by read/write in
    shared-memory
  • atomicity - guarantees that read outputs the last
    written value
  • one read/write operation per local clock cycle
  • information propagates and information is
    persistent
  • Hence cooperation is always possible, although
    delayed
  • Here processor scheduling is the major
    challenge
  • Message-Passing
  • processors communicate by exchanging messages
  • duration of a local step may be unbounded
  • message delays may be unbounded
  • information may not propagate -- send/recv depend
    on delay

6
Message-delay-sensitive approach
  • Even if message delays are bounded by d
    (d-adversary), cooperation may be difficult
  • Observation
  • If d = Ω(t) then work must be Ω(t·p)
  • This means that cooperation is difficult, and
    addressing scheduling alone is not enough --
    algorithm design and analysis must be d-sensitive
  • Message-delay-sensitive approach
  • C. Dwork, N. Lynch and L. Stockmeyer. Consensus
    in the presence of partial synchrony. J. of the
    ACM, 35 (1988)

7
Measures of efficiency
  • Termination time: the first time when all tasks
    are done and at least one processor knows about
    it
  • Used only to define work and message complexity
  • Not interesting on its own: if all processors but
    one are delayed, then trivially time is Ω(t)
  • Work: the sum, over all processors, of
    the number of local steps taken until termination
    time
  • Message complexity (message-passing model):
    the number of all point-to-point messages
    sent until termination time

8
Structure of the presentation
  • Part 1: Shared-memory model
  • Model and bibliography
  • Improving the AW algorithm in shared memory by
    better scheduling of processors (task load-balancing)
  • Part 2: Message-passing model
  • Model: asynchrony, message delay, and modeling
    issues
  • Delay-sensitive lower bounds for Do-All
  • Progress-tree Do-All algorithms
  • Simulating shared-memory and Anderson-Woll (AW)
  • Asynch. message-passing progress-tree algorithm
  • Permutation Do-All algorithms

9
Shared-Memory - model and goal
  • We consider the following model
  • p asynchronous processors with PID in {0, ..., p-1}
  • processors communicate by read/write in
    shared-memory
  • atomicity - read outputs the last written value
  • one read/write operation per local clock cycle
  • Write-All: write 1s into t locations of a given
    array
  • Goal: improve scheduling of cooperating
    asynchronous processors, leading to better
    load-balancing with respect to tasks

10
Write-All Selected Bibliography
  • Introducing Write-All problem
  • Kanellakis, P.C., Shvartsman, A.A. Efficient
    parallel algorithms can be made robust. PODC
    (1989), Distributed Computing (1992)
  • AW algorithm with work O(t·p^ε)
  • Anderson, R.J., Woll, H. Algorithms for the
    certified Write-All problem. SIAM Journal on
    Computing, 26 (1997)
  • Randomized algorithm with work Θ(t + p log p)
  • Martel, C., Subramonian, R. On the complexity of
    Certified Write-All algorithms. J. Algorithms 16
    (1994)
  • First work-optimal deterministic algorithm for
    t = Ω(p^4 log p)
  • Malewicz, G. A work-optimal deterministic
    algorithm for the asynchronous Certified
    Write-All problem. PODC (2003)

11
Progress tree algorithms [BKRS, AW]
  • Shared memory
  • p processors, t tasks (p = t)
  • q permutations of [q]
  • q-ary progress tree of depth log_q p
  • nodes are binary completion bits
  • Permutations establish the order in which
    the children are visited
  • p processors traverse the tree and use
    q-ary expansion of their PID to choose
    permutations
  • [Anderson, Woll]

[Figure: q-ary progress tree; the children of each node are labeled 1, 2, 3, ..., q]
12
Algorithm AWT [Anderson, Woll]
  • Progress tree data structure is stored in shared
    memory
  • p = t = 9, q = 3
  • Ψ - list of 3 schedules from S_3
  • T - ternary tree with 9 leaves (progress
    tree), values 0-1
  • PID(j) - j-th digit of the ternary representation
    of PID
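
The traversal described on this slide can be sketched for a single processor as follows. This is a minimal sketch and not the authors' code: the flat array layout of the tree, the 0-based child indices, the concrete schedule list PSI, and the choice that a node at depth j uses digit j of the PID (least significant digit first) are all assumptions made for illustration.

```python
def pid_digit(pid, j, q):
    """j-th base-q digit of pid (j = 0 is the least significant digit)."""
    return (pid // q ** j) % q


def awt_traverse(pid, tree, schedules, q, height, node=0, depth=0, performed=None):
    """Traverse the q-ary progress tree as processor `pid`.

    `tree` is a flat list of 0/1 completion bits (root at index 0, children of
    node v at q*v + 1 .. q*v + q). Returns the leaf tasks this call performed.
    """
    if performed is None:
        performed = []
    if tree[node]:                      # subtree already marked done: skip it
        return performed
    if depth == height:                 # leaf: perform the task
        first_leaf = (q ** height - 1) // (q - 1)
        performed.append(node - first_leaf)
    else:                               # visit children in schedule order
        for child in schedules[pid_digit(pid, depth, q)]:
            awt_traverse(pid, tree, schedules, q, height,
                         q * node + 1 + child, depth + 1, performed)
    tree[node] = 1                      # record completion of the whole subtree
    return performed


# Hypothetical list of 3 schedules on {0, 1, 2}; not taken from the slides.
PSI = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]

tree = [0] * 13                         # 1 root + 3 inner nodes + 9 leaves
print(awt_traverse(5, tree, PSI, q=3, height=2))
```

Under these assumptions, PIDs 0, 3 and 6 share the same digit at the root and therefore visit the root's children in the same order, while a processor that arrives at a node already marked 1 skips the whole subtree.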

[Figure: ternary progress tree with nodes numbered 0-12; schedule π_0 is used by PIDs 0, 3, 6; π_1 by PIDs 1, 4, 7; π_2 by PIDs 2, 5, 8]
13
Contention of permutations
  • S_n - group of all permutations on set [n],
    with composition ∘ and identity e_n
  • π, ρ - permutations in S_n
  • Ψ - set of q permutations from S_n
  • i is an lrm (left-to-right maximum) in π if
    π(i) > max_{j<i} π(j)
  • LRM(π) - number of lrm in π [Knuth]
  • Cont(Ψ, ρ) = Σ_{π∈Ψ} LRM(ρ⁻¹ ∘ π)
  • Contention of Ψ: Cont(Ψ) = max_ρ Cont(Ψ, ρ)
    [AW]
  • Theorem [AW]: For any n > 0 there exists a set Ψ
    of n permutations from S_n with
    Cont(Ψ) ≤ 3nH_n = Θ(n log n).
  • [Knuth] Knuth, D.E. The art of computer
    programming, Vol. 3 (third edition).
    Addison-Wesley Pub Co. (1998)
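
The definitions on this slide can be checked on tiny inputs with a brute-force sketch. It assumes permutations are written as tuples over 0..n-1 with pi[i] standing for π(i); the function names are illustrative, and the search over all n! reference permutations ρ is only feasible for very small n.

```python
from itertools import permutations


def lrm_count(pi):
    """Number of left-to-right maxima of the sequence pi."""
    count, best = 0, None
    for value in pi:
        if best is None or value > best:
            count += 1
            best = value
    return count


def compose_inverse(rho, pi):
    """Return rho^{-1} o pi, i.e. the tuple i -> rho^{-1}(pi(i))."""
    inverse = [0] * len(rho)
    for position, value in enumerate(rho):
        inverse[value] = position
    return tuple(inverse[value] for value in pi)


def contention(schedules):
    """Cont(schedules): maximum over all rho of the summed LRM counts."""
    n = len(schedules[0])
    return max(
        sum(lrm_count(compose_inverse(rho, pi)) for pi in schedules)
        for rho in permutations(range(n))
    )


print(contention([(0, 1), (1, 0)]))   # two different schedules on 2 elements
print(contention([(0, 1), (0, 1)]))   # the same schedule used twice
```

Using the same schedule twice gives the largest possible value (every position is a maximum for ρ equal to that schedule), which is why lists of mutually "different" schedules are sought.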

[Figure: example permutation of 11 elements]
14
Procedure Oblivious Do
  • n - number of jobs and units
  • Ψ - list of n schedules from S_n
  • Procedure Oblivious
  • Forall processors PID = 0 to n-1
  • for i = 1 to n do
  • perform Job(π_PID(i))
  • Execution of Job(π_PID(i)) by processor PID is
    primary if job π_PID(i) has not been previously
    performed
  • Lemma [AW]: In algorithm Oblivious with n units,
    n jobs, and using the list Ψ of n permutations
    from S_n, the number of primary job executions is
    at most Cont(Ψ).
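
As a small illustration of primary executions, the sketch below runs procedure Oblivious in lockstep rounds: in round i every processor starts the i-th job of its schedule, and an execution counts as primary if the job was not completed in an earlier round. Lockstep is only one of the executions an asynchronous adversary may produce, and the function name and 0-based job numbering are assumptions of this sketch.

```python
def primary_executions_lockstep(schedules):
    """Count primary job executions when all processors advance in lockstep."""
    n = len(schedules[0])
    completed = set()          # jobs finished in earlier rounds
    primary = 0
    for i in range(n):
        started = [pi[i] for pi in schedules]
        primary += sum(1 for job in started if job not in completed)
        completed.update(started)
    return primary


print(primary_executions_lockstep([(0, 1), (1, 0)]))
print(primary_executions_lockstep([(0, 1), (0, 1)]))
```

For the two lists above the contention values are 3 and 4 respectively, so both lockstep counts respect the bound of the lemma; the second list, with identical schedules, meets it exactly.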

15
AWT(q) - new progress tree traversal algorithm
  • Instead of using q permutations on set [q], we
    use q permutations on set [n], where
    n = q² log q
  • p = 6, t = 16, q = 2, n = 4
  • Ψ - list of 2 schedules from S_4
  • T - 4-ary tree with 16 leaves (progress
    tree), values 0-1
  • PID(j) - j-th digit of the q-ary (here binary)
    representation of PID

[Figure: 4-ary progress tree with nodes numbered 0-20; schedule π_0 is used by even PIDs, π_1 by odd PIDs]
16
Main result
  • Set n = q² log q and let Ψ be a list of q
    schedules from S_n
  • Define Cont(Ψ, Φ) = max_{ρ∈Φ} Cont(Ψ, ρ)
  • Lemma: For sufficiently large q and any set Φ of
    at most exp(q² log² q) permutations on set
    [q² log q], there is a list Ψ of q schedules from
    S_n such that
  • Cont(Ψ, Φ) ≤ q² log q + 6q log q
  • Take q = log p and Ψ from the above Lemma
  • Theorem: For every ε > 0, sufficiently large p
    and t = Ω(p^(2+ε)), algorithm AWT(q) performs
    work O(t).

17
Message-Passing - model and goals
  • We consider the following model
  • p asynchronous processors with PID in {0, ..., p-1}
  • processors communicate by message passing
  • in one local step each processor can send a
    message to any subset of processors
  • messages incur delays between send and receive
  • processing of all received messages can be done
    during one local step
  • Goal: understand the impact of message delay on
    efficiency of algorithmic solutions for Do-All

18
Lower bound - randomized algorithms
  • Theorem: Any randomized algorithm solving DA with
    t tasks using p asynchronous message-passing
    processors performs expected work
  • Ω(t + p·d·log_{d+1} t)
  • against any d-adversary.
  • Proof (sketch)
  • Adversary partitions computation into stages,
    each containing d time units, and constructs
    the delay pattern stage after stage
  • delays all messages in a stage to be received at
    the end of the stage
  • delays a linear number of processors (those that
    want to perform more than a (1-1/(3d)) fraction of
    undone tasks) during the stage
  • the selection is on-line and, with high
    probability, has good properties
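
To see how the lower bound reacts to the delay d, the expression t + p·d·log_{d+1} t can be evaluated numerically. The sketch below ignores the constant hidden by Ω, so the numbers are only indicative; the function name is an assumption of this sketch.

```python
import math


def lower_bound_expression(p, t, d):
    """Value of t + p*d*log_{d+1}(t), the expression inside the Omega bound."""
    return t + p * d * math.log(t, d + 1)


p, t = 10, 1000
for d in (1, 10, 100, 1000):
    print(d, round(lower_bound_expression(p, t, d)))
```

For d = t the logarithm log_{t+1} t is close to 1, so the expression is close to t + p·t, in line with the earlier observation that large delays force work proportional to t·p.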

19
Simulating shared-memory algorithms
  • Write-All algorithm AWT
  • Anderson, R.J., Woll, H. Algorithms for the
    certified Write-All problem. SIAM Journal on
    Computing, 26 (1997)
  • Quorum systems / atomic memory services
  • Attiya, H., Bar-Noy, A., Dolev, D. Sharing
    memory robustly in message passing systems. J.
    of the ACM, 42 (1995)
  • Lynch, N., Shvartsman, A. RAMBO: A
    Reconfigurable Atomic Memory Service. Proc. of
    16th DISC, (2002)
  • Emulating asynchronous shared-memory algorithms
  • Momenzadeh, M. Emulating shared-memory Do-All in
    asynchronous message passing systems. Master's
    Thesis, CSE, University of Connecticut, (2003)

20
Atomic memory is not required
  • We use q-ary progress trees as the main data
    structure that is written and read -- note
    that atomicity is not required
  • If the following two writes occur (the entire
    tree is written), then a subsequent read may
    obtain a third value that was never written
  • Property of monotone progress
  • 1 at a tree node i indicates that all tasks
    attached to the leaves in the sub-tree rooted in
    i have been performed
  • If 1 is written at a node i in the progress tree
    of a processor, it remains 1 forever
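
One way to exploit monotone progress is to combine progress trees by element-wise OR. The slides do not say how a processor combines trees; the sketch below assumes trees are stored as flat lists of 0/1 bits and that a combined view is formed this way.

```python
def merge_progress(local, received):
    """Element-wise OR of two progress trees stored as flat 0/1 lists."""
    return [a | b for a, b in zip(local, received)]


view = [0, 0, 0]
view = merge_progress(view, [1, 0, 0])   # first update
view = merge_progress(view, [0, 0, 1])   # second update, in any order
print(view)
```

Because OR is commutative and idempotent and never turns a 1 back into 0, any mixture of the written values still describes only tasks that have really been performed, which is why a read returning a "third value" is harmless.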

[Figure: two writes and a read on a three-node progress tree with initial values 0 0 0]
21
Algorithm DAq - traverse progress tree
  • Instead of using shared memory, processors
    broadcast their progress trees as soon as local
    progress is recorded
  • p = t = 9, q = 3
  • Ψ - list of 3 schedules from S_3
  • T - ternary tree with 9 leaves (progress
    tree), values 0-1
  • PID(j) - j-th digit of the ternary representation
    of PID

[Figure: ternary progress tree with nodes numbered 0-12; schedule π_0 is used by PIDs 0, 3, 6; π_1 by PIDs 1, 4, 7; π_2 by PIDs 2, 5, 8]
22
Algorithm DAq - case p ≥ t
23
Procedure DOWORK
24
Algorithm DAq - analysis
  • Modification of algorithm DAq for p < t
  • We partition the t tasks into p jobs of size
    t/p and let the algorithm DAq work with these
    jobs.
  • It takes a processor O(t/p) work (instead of
    constant) to process such a job (job unit).
  • Since in each step a processor broadcasts at most
    one message to the p-1 other processors, we obtain
  • Theorem 4: For any constant ε > 0 there is a
    constant q such that the algorithm DAq has work
  • W(p,t,d) = O(t·p^ε + p·d·⌈t/d⌉^ε)
  • and message complexity
  • O(p · W(p,t,d))
  • against any d-adversary (d = o(t)).
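
The partition of t tasks into p jobs used for the case p < t can be sketched as below. The slides only state that jobs have size t/p; the balanced rounding used here (job sizes differ by at most one) and the 0-based task numbering are assumptions of this sketch.

```python
def partition_into_jobs(t, p):
    """Split tasks 0..t-1 into p consecutive jobs whose sizes differ by at most 1."""
    return [range(j * t // p, (j + 1) * t // p) for j in range(p)]


jobs = partition_into_jobs(10, 4)
print([list(job) for job in jobs])
```

Each job then plays the role of a single leaf of the progress tree, and processing it costs O(t/p) local steps instead of a constant.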

25
Permutation algorithms - case p ≥ t
  • Algorithms proceed in a loop
  • select the next task using ORDERSELECT rule
  • perform selected task
  • send messages, receive messages, and update state
  • ORDERSELECT rules
  • PARAN1: initially processor PID permutes tasks
    randomly
  • PID selects the first task remaining on
    its schedule
  • PARAN2: no initial order
  • PID selects a task from the remaining set
    randomly
  • PADET: initially processor PID chooses
    schedule π_PID in Ψ
  • PID selects the first task remaining on
    schedule π_PID
  • Ψ - list of p schedules from S_t
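
The three ORDERSELECT rules differ only in how the next task is picked, which the following sketch tries to make explicit. It is a single-processor illustration under stated assumptions: message sending and receiving are left out (in the full algorithms the set of known-done tasks also grows from received messages), and all function names are hypothetical.

```python
import random


def paran1_schedule(tasks, rng):
    """PARAN1 setup: a random permutation of the tasks, fixed once at the start."""
    schedule = list(tasks)
    rng.shuffle(schedule)
    return schedule


def select_first_remaining(schedule, known_done):
    """PARAN1 / PADET rule: first task on the schedule not known to be done."""
    for task in schedule:
        if task not in known_done:
            return task
    return None


def select_random_remaining(tasks, known_done, rng):
    """PARAN2 rule: a uniformly random task among those not known to be done."""
    remaining = [task for task in tasks if task not in known_done]
    return rng.choice(remaining) if remaining else None


def run_alone(select):
    """Main loop for one processor that never hears from the others."""
    known_done, performed = set(), []
    while (task := select(known_done)) is not None:
        performed.append(task)          # perform the selected task
        known_done.add(task)            # update local state
    return performed


rng = random.Random(1)
padet_like = [2, 0, 1]                  # stands in for a schedule taken from the list
print(run_alone(lambda done: select_first_remaining(padet_like, done)))
print(run_alone(lambda done: select_random_remaining(range(5), done, rng)))
```

PADET uses the same selection rule as PARAN1; the difference is that its schedule comes from a fixed deterministic list rather than from a random shuffle.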

26
d-Contention of permutations
  • We introduce the notion of d-Contention
  • i is a d-lrm in π if
    |{j < i : π(i) < π(j)}| < d
  • d = 2
  • LRM_d(π) - number of d-lrm in π
  • Cont_d(Ψ, ρ) = Σ_{π∈Ψ} LRM_d(ρ⁻¹ ∘ π)
  • d-Contention of Ψ: Cont_d(Ψ) = max_ρ Cont_d(Ψ, ρ)
  • Theorem: For sufficiently large p and n, there
    is a list Ψ of p permutations from S_n such that,
    for every integer d > 1,
  • Cont_d(Ψ) ≤ n log n + 5pd ln(e + n/d).
  • Moreover, a random Ψ is good with high
    probability.
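
The d-lrm definition can be made concrete with a short sketch: a position counts if fewer than d earlier entries are larger than the entry at that position. The function name and the two 11-element example sequences are chosen for illustration only.

```python
def d_lrm_count(pi, d):
    """Number of positions of pi preceded by fewer than d larger entries."""
    count = 0
    for i, value in enumerate(pi):
        larger_before = sum(1 for earlier in pi[:i] if earlier > value)
        if larger_before < d:
            count += 1
    return count


first = [1, 3, 2, 5, 7, 4, 9, 8, 6, 11, 10]
second = [2, 4, 6, 8, 10, 11, 9, 7, 5, 3, 1]
for d in (1, 2):
    print(d, d_lrm_count(first, d), d_lrm_count(second, d))
```

With d = 1 this is the ordinary count of left-to-right maxima; raising d can only add positions, which matches the intuition that longer message delays allow more redundant task executions.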

[Figure: example permutation of 11 elements, d = 2]
27
d-Contention and work
  • Lemma: For algorithms PADET and PARAN1, the
    respective worst-case work and expected work are
    at most
  • Cont_d(Ψ)
  • against any d-adversary.
  • Example
  • p = 2, t = 11, d = 2

Order of tasks to perform: 1,2,3,4,5,6,7,8,9,10,11
[Figure: two schedules, 1 3 2 5 7 4 9 8 6 11 10 and 2 4 6 8 10 11 9 7 5 3 1, each followed by a subsequence of it: 1 3 2 5 7 9 11 10 and 2 4 6 8 10 11]
28
Permutation algorithms - results
  • Theorem: Randomized algorithms PARAN1 and PARAN2
    perform expected work
  • O(t·log p + p·d·log(t/d))
  • and have expected communication
  • O(t·p·log p + p²·d·log(t/d))
  • against any d-adversary (d = o(t)).
  • Corollary: There exists a deterministic list of
    schedules Ψ such that algorithm PADET performs
    work
  • O(t·log p + p·min{t,d}·log(2 + t/d))
  • and has communication
  • O(t·p·log p + p²·min{t,d}·log(2 + t/d))
  • when p ≥ t.

29
Conclusions and open problems
  • Work-optimal Write-All algorithm for t = Ω(p^(2+ε))
  • First message-delay-sensitive analysis of the
    Do-All problem for asynchronous processors in
    the message-passing model
  • lower bounds for deterministic and randomized
    algorithms
  • deterministic and randomized algorithms with
    subquadratic (in p and t) work for any message
    delay d, as long as d = o(t)
  • Among the interesting open questions are
  • is there work-optimal scheduling for t = Ω(p log
    p)?
  • for algorithm PADET, how to construct the list Ψ
    of permutations efficiently?
  • closing the gap between the upper and the lower
    bounds
  • investigating algorithms that simultaneously
    control work and message complexity