Title: Robust Network Supercomputing with Malicious Processes (Reliably Executing Tasks Upon Estimating the Number of Malicious Processes)


1
Robust Network Supercomputing with Malicious
Processes (Reliably Executing Tasks Upon
Estimating the Number of Malicious Processes)
  • Kishori M. Konwar
  • Sanguthevar Rajasekaran
  • Alexander A. Shvartsman
  • Computer Science and Engineering Department
  • University of Connecticut
  • Storrs, CT

2
Motivation
  • Internet supercomputing is increasingly
    becoming a powerful tool for harnessing massive
    amounts of computational resources
  • high-bandwidth Internet connections are widely
    available
  • there is an enormous number of processes around
    the world
  • it comes at a cost substantially lower than
    acquiring a supercomputer or building a cluster
    of powerful machines

3
(No Transcript)
4
TASKS
5
(No Transcript)
6
PrimeNet Server
  • PrimeNet Server is a distributed, massively
    parallel scientific computing Internet
    supercomputer
  • Supported by Entropia.com, it ranks among the
    most powerful computers in the world
  • The project comprises about 30,000 PCs and
    laptops
  • It currently sustains 22,296 billion floating
    point operations per second (gigaflops)
    (operations that involve fractional numbers)

7
SETI@home
  • The SETI@home project is a massive distributed
    cooperative computer
  • Used to analyze gigabytes of data for the Search
    for Extraterrestrial Intelligence (SETI)
  • It comprises millions of volunteer machines
    around the world
  • The SETI@home project reported its speed to be
    more than 57,290 billion floating point
    operations per second

8
Reliability Issues
  • The master and perhaps certain workers are
    reliable
  • they will correctly execute the tasks assigned
    by the server
  • However, workers are commonly unreliable
  • they may return incorrect results to the master
    due to unintended failures caused, e.g., by
    over-clocked processors
  • they may deceivingly claim to have performed the
    assigned work so as to obtain incentives, such as
    a higher rank

9
(No Transcript)
10
Some Previous Studies
  • [FGLS05] assumed that worker processes might act
    maliciously and hence deliberately return wrong
    results
  • the goal is to design algorithms that enable the
    master to accept correct results with high
    probability at a low cost
  • they provided a randomized algorithm
  • unfortunately, the cost complexity results depend
    on several parameters and are hard to interpret

11
Some Previous Studies (contd)
  • [GM05] considered the problem of maximizing the
    expected number of correct results
  • the tasks are dependent
  • any worker computes correctly with probability
    p < 1; any incorrectly computed task corrupts all
    dependent tasks
  • the goal is to compute a schedule that maximizes
    the expected number of correct results under a
    given time constraint
  • they showed the optimization problem to be
    NP-hard
  • provided some solutions on restricted DAGs

12
Overview
  • Models of Computation
  • Stopping Rule Algorithm based solution
  • Detection of Faulty Processors
  • Performing Tasks with Faulty Workers
  • Conclusions

13
Overview
  • Models of Computation
  • Stopping Rule Algorithm based solution
  • Detection of Faulty Processors
  • Performing Tasks with Faulty Workers
  • Conclusions

14
Models of Computation
  • Processes take steps in lock step, i.e., in
    synchrony
  • Processes communicate by exchanging messages
  • The tasks are independent and idempotent
  • Processes are subject to failures and can
    maliciously return incorrect results
  • Workers P = {1, 2, . . ., n} and a master M

15
Work Complexities
  • [CDS01] defined work complexity as available
    processor steps
  • All steps taken by processes during execution
    of the algorithm are counted, including the steps
    of idling and waiting non-faulty processes
  • [DHW92] define work as the number of performed
    tasks, counting multiplicities
  • This approach does not charge for idling and
    waiting; it is called task-oriented work

16
Few Comments
  • We say that an event E occurs with high
    probability (w.h.p.) to mean that Pr[E] = 1 -
    O(n^(-α)) for some constant α > 0.

17
Modeling Failures
  • Failure model Fa
  • An f-fraction, 0 < f < ½, of the n workers may
    fail
  • Each possibly faulty worker independently
    exhibits faulty behavior with probability
    0 < p < ½
  • The master has no a priori knowledge of f and p.

18
Modeling Failures (contd)
  • Failure model Fb
  • There is a fixed bound on the f-fraction,
    0 < f < ½, of the n workers that can be faulty
  • Any worker from the remaining (1-f)-fraction of
    the workers fails with probability 0 < p < ½,
    independently of other workers
  • The master knows the values of f and p.

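The two failure models can be summarized with a small simulation. The following Python sketch is illustrative only: the per-worker helpers, the choice of what a "wrong" answer looks like, and treating the bounded f-fraction in Fb as faulty outright are assumptions made for the example, not definitions from the paper.

    import random

    def choose_faulty(n, f):
        """Pick the (at most) f-fraction of the n workers that may misbehave.
        Used for both models; only the master's knowledge of f and p differs."""
        return set(random.sample(range(n), int(f * n)))

    def answer_fa(worker, faulty, correct, p):
        """Model Fa: each possibly faulty worker independently returns a wrong
        result with probability p on this task; all other workers are reliable.
        The master does not know f or p."""
        if worker in faulty and random.random() < p:
            return "wrong"
        return correct

    def answer_fb(worker, faulty, correct, p):
        """Model Fb (one reading of the slide, assumed here): the bounded
        f-fraction is faulty outright, and every remaining worker independently
        returns a wrong result with probability p. The master knows f and p."""
        if worker in faulty or random.random() < p:
            return "wrong"
        return correct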
19
Algorithmic Template
  • procedure for master process M, task T
  •   Choose a set S ⊆ P
  •   Send task T to each processor p ∈ S
  •   Wait for the results from the processes in S
  •   Decide on the result value v from the responses
  • procedure for worker w ∈ P
  •   Wait to receive a task from master M
  •   Upon receiving a task from M
  •     Execute the task
  •     Send the result to M

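A minimal sequential Python sketch of the master-worker template above. The real system exchanges messages in synchronous rounds; the direct function calls and the majority-vote decision rule below are assumptions made for the example.

    from collections import Counter

    def master(task, workers, choose_set, decide):
        """Master M: choose S ⊆ P, send the task to every p ∈ S,
        collect the results, and decide on a value v from the responses."""
        S = choose_set(workers)
        responses = [w(task) for w in S]
        return decide(responses)

    def make_worker(execute):
        """Worker w ∈ P: wait for a task from M, execute it, send the result."""
        return lambda task: execute(task)

    # Illustrative instantiation: every worker squares its input and the
    # master takes a majority vote over all workers (an assumed decision rule).
    workers = [make_worker(lambda t: t * t) for _ in range(5)]
    result = master(7, workers, choose_set=lambda P: P,
                    decide=lambda rs: Counter(rs).most_common(1)[0][0])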
20
Overview
  • Models of Computation
  • Stopping Rule Algorithm based solution
  • Detection of Faulty Processors
  • Performing Tasks with Faulty Workers
  • Conclusions

21
(ε, δ)-approximation algorithm
  • Z is a random variable distributed in the
    interval [0,1] with mean μ_Z
  • Z1, Z2, Z3, ... are independently and identically
    distributed according to the random variable Z
  • An (ε, δ)-approximation algorithm, with
    0 < ε < 1, δ > 0, for estimating μ_Z satisfies
  • Pr[ μ_Z(1 - ε) ≤ μ̃_Z ≤ μ_Z(1 + ε) ] > 1 - δ
  • where μ̃_Z is the estimated value of μ_Z

22
Stopping Rule Algorithm
  • Dagum, Karp, Luby, and Ross 1995
  • Input Parameters (ε, δ) with 0 < ε < 1, δ > 0
  • Let Υ1 = 1 + (1 + ε)Υ   // λ = e - 2 ≈ 0.72,
    Υ = 4λ ln(2/δ)/ε²
  • Initialize N ← 0, S ← 0
  • While S < Υ1 do: N ← N + 1, S ← S + Z_N
  • Output μ̃_Z ← Υ1/N

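A Python sketch of the stopping rule as stated above. Here `sample` is any source of i.i.d. draws of Z in [0,1]; the natural logarithm and the constant λ = e − 2 ≈ 0.72 follow the Dagum-Karp-Luby-Ross formulation, and the Bernoulli example at the end is only an illustration.

    import math, random

    def stopping_rule(sample, eps, delta):
        """Estimate mu_Z = E[Z] for a random variable Z in [0,1] from i.i.d.
        samples, to within a factor (1 ± eps) with probability >= 1 - delta."""
        lam = math.e - 2                                   # λ ≈ 0.72
        upsilon = 4 * lam * math.log(2 / delta) / eps**2   # Υ = 4λ ln(2/δ)/ε²
        upsilon1 = 1 + (1 + eps) * upsilon                 # Υ1 = 1 + (1+ε)Υ
        n, s = 0, 0.0
        while s < upsilon1:      # draw Z_1, Z_2, ... until the sum reaches Υ1
            n += 1
            s += sample()
        return upsilon1 / n      # the estimate of μ_Z

    # Example: estimating the mean of a Bernoulli(0.3) variable.
    estimate = stopping_rule(lambda: float(random.random() < 0.3),
                             eps=0.1, delta=0.05)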
23
Stopping Rule Theorem
  • Theorem (Stopping Rule Theorem) [Dagum, Karp,
    Luby, and Ross]
  • Let Z be a random variable in [0,1] with μ_Z =
    E[Z] > 0. Let μ̃_Z be the estimate produced and
    let N_Z be the number of experiments that SRA
    runs with respect to Z on input ε and δ. Then,
  • (i) Pr[ μ_Z(1 - ε) ≤ μ̃_Z ≤ μ_Z(1 + ε) ] > 1 - δ
  • (ii) E[N_Z] ≤ Υ1/μ_Z, and
  • (iii) Pr[ N_Z > (1 + ε) Υ1/μ_Z ] ≤ δ/2

24
Algorithm Af,p to estimate f and p
25
Work Complexity of Af,p
  • Theorem: Algorithm Af,p is an (ε, δ)-approximation
    algorithm, 0 < ε < 1, δ > 0, for the estimation of
    f and p, with task-oriented work complexity
    O(log² n), work (available processor steps)
    complexity O(n log n), message complexity
    O(log² n), and time complexity O(log n), with
    high probability.

26
Overview
  • Models of Computation
  • Stopping Rule Algorithm based solution
  • Detection of Faulty Processors
  • Performing Tasks with Faulty Workers
  • Conclusions

27
Detection of Faulty Processors
  • Lemma: It is not possible to perform all the n
    tasks correctly in the failure model Fa with
    linear work complexity (i.e., O(n)) with high
    probability.

28
Detection of Faulty Processors
  • procedure for master process M
  •   Initially, F ← ∅
  •   For t = 0, ..., k log n, k > 0
  •     Choose a set S ⊆ P \ F
  •     Send each process p ∈ S a test task
  •     Wait for the results from the processes in S
  •     If the response is faulty
  •       F ← F ∪ {p : p is a faulty process}
  •     End If
  •   End For

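A Python sketch of the detection procedure above, assuming the master can recognize a faulty response because it knows the correct answer to each test task; `send_test_task`, `correct_answer`, and the default k are illustrative stand-ins, and choosing S = P \ F in every round is one natural choice.

    import math

    def detect_faulty(workers, send_test_task, correct_answer, k=2):
        """For roughly k*log n rounds, send a test task to every worker not
        yet caught and add any worker returning a faulty response to F."""
        n = max(2, len(workers))
        F = set()                                    # Initially F ← ∅
        for _ in range(int(k * math.log2(n)) + 1):   # t = 0, ..., k log n
            S = [p for p in workers if p not in F]   # one choice of S ⊆ P \ F
            for p in S:
                if send_test_task(p) != correct_answer:
                    F.add(p)                         # p responded incorrectly
        return F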
29
Detection of Faulty Processors
  • Lemma: The algorithm detects all faulty processes
    among the n workers in O(log n) time with O(n)
    work, with high probability.
  • Theorem [Karp 04]: Suppose that a(x) is a
    non-decreasing, continuous function that is
    strictly increasing on {x : a(x) > 0}, and m(x)
    is a continuous function. Then for every positive
    real x and every positive integer t,
  •   Pr[ T(x) > u(x) + t·a(x) ] ≤ (m(x)/x)^t
  • where u(x) is the solution to the equation
    u(x) = a(x) + u(m(x)), with m_0(x) = x and
    m_{i+1}(x) = m(m_i(x)).

30
Overview
  • Models of Computation
  • Stopping Rule Algorithm based solution
  • Detection of Faulty Processors
  • Performing Tasks with Faulty Workers
  • Conclusions

31
Performing Tasks under Fa
  • procedure for master process M
  •   Initially, C ← ∅, J ← set of n tasks
  •   Randomly choose a set, possibly with repetition,
      S ⊆ P of |S| = kn/log n workers, where k > 0 is
      a constant
  •   For i = 1, ..., k' log n, k' > 0
  •     Send to each worker p ∈ S a test task
  •     Collect the responses from all the workers
  •   End For
  •   If all the responses from a worker p ∈ S are
      correct then
  •     C ← C ∪ {p}
  •   End If
  •   For i = 1, ..., n/|C|
  •     Send |C| jobs from J, not sent in previous
        iterations, one to each worker in C
  •     Collect the responses from the |C| workers
  •   End For

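A Python sketch of the Fa algorithm above: sample about kn/log n workers with repetition, test them for k'·log n rounds, keep the set C of workers whose test answers were all correct, then dispatch the n real tasks to C in batches of |C|. The helpers `run_test`, `correct_answer`, and `run_task`, the default constants, and the empty-C guard are assumptions made for the example.

    import math, random

    def perform_tasks_fa(workers, tasks, run_test, correct_answer, run_task,
                         k=1, k_prime=2):
        n = len(tasks)
        logn = max(1, int(math.log2(n)))
        # Randomly choose S ⊆ P, |S| = kn/log n, possibly with repetition.
        S = [random.choice(workers) for _ in range(max(1, k * n // logn))]
        C = set(S)
        for _ in range(k_prime * logn):       # k' log n rounds of test tasks
            for p in list(C):
                if run_test(p) != correct_answer:
                    C.discard(p)              # p gave at least one wrong answer
        C = list(C)
        if not C:                             # w.h.p. C is non-empty; guard anyway
            C = list(workers)
        results = []
        for i in range(0, n, len(C)):         # about n/|C| dispatch rounds
            batch = tasks[i:i + len(C)]
            results.extend(run_task(p, t) for p, t in zip(C, batch))
        return results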
32
Work and Time Complexities
  • Theorem: The algorithm performs all n tasks
    correctly in O(log n) time and has O(n) work and
    message complexities, with high probability.

33
Overview
  • Models of Computation
  • Stopping Rule Algorithm based solution
  • Detection of Faulty Processors
  • Performing Tasks with Faulty Workers
  • Conclusions

34
Performing Tasks under Fb
  • procedure for master process M
  •   For t = 0, ..., k log n, k > 0
  •     Choose a random permutation π ∈_R S_n
  •     Foreach j ∈ [n]
  •       Send task j to processor π(j)
  •     End For
  •     Collect the responses from all the workers
  •   End For
  •   Foreach j ∈ [n]
  •     Choose the majority of the results of
        computation for task j as the result
  •   End For

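A Python sketch of the Fb algorithm above: for roughly k·log n rounds assign task j to worker π(j) under a fresh random permutation π, then output the per-task majority of the collected results. It assumes one worker per task (|workers| = n); `run_task` and the default k are illustrative stand-ins.

    import math, random
    from collections import Counter

    def perform_tasks_fb(workers, tasks, run_task, k=2):
        """Repeat all n tasks for about k*log n rounds under fresh random
        permutations and take the per-task majority of the results."""
        n = len(tasks)
        votes = [[] for _ in range(n)]
        for _ in range(int(k * math.log2(max(2, n))) + 1):  # k log n rounds
            pi = list(range(n))
            random.shuffle(pi)                # random permutation π ∈_R S_n
            for j in range(n):
                votes[j].append(run_task(workers[pi[j]], tasks[j]))
        # Majority of the results of computation for each task j.
        return [Counter(v).most_common(1)[0][0] for v in votes]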
35
Work and Time Complexities
  • Theorem: The algorithm performs all n tasks
    correctly in O(log n) time and has message and
    work complexities O(n log n), for 0 < p, f < ½
    and (1 - f)(1 - p) > ½, with high probability.

36
Overview
  • Models of Computation
  • Stopping Rule Algorithm based solution
  • Detection of Faulty Processors
  • Performing Tasks with Faulty Workers
  • Conclusions

37
Conclusions
  • Perform tasks under the above models where the
    tasks are dependent
  • The dependency graph can be a DAG
  • Quantify work and time complexities in terms of
    some characteristics of the DAG