Robust Network Supercomputing with Malicious Processes (Reliably Executing Tasks Upon Estimating the Number of Malicious Processes)

Slides: 38

Provided by: KevinM136

Learn more at: http://groups.csail.mit.edu

1
Robust Network Supercomputing with Malicious Processes
(Reliably Executing Tasks Upon Estimating the Number of Malicious Processes)

Kishori M. Konwar
Sanguthevar Rajasekaran
Alexander A. Shvartsman
Computer Science and Engineering Department
University of Connecticut
Storrs, CT

2
Motivation

Internet supercomputing is becoming a powerful tool for harnessing massive amounts of computational resources
high-bandwidth Internet connections are widely available
an enormous number of processes is available around the world
it comes at a cost substantially lower than acquiring a supercomputer or building a cluster of powerful machines

4
TASKS
6
PrimeNet Server

PrimeNet Server is a distributed, massively parallel scientific computing Internet supercomputer
Supported by Entropia.com, it ranks among the most powerful computers in the world
The project comprises about 30,000 PCs and laptops
It currently sustains 22,296 billion floating-point operations per second (22,296 gigaflops), i.e., operations that involve fractional numbers

7
SETI@home

The SETI@home project is a massive distributed cooperative computer
Used to analyze gigabytes of data in the Search for Extraterrestrial Intelligence (SETI)
Comprises millions of volunteer machines around the world
The SETI@home project reported its speed to be more than 57,290 billion floating-point operations per second

8
Reliability Issues

The master and perhaps certain workers are reliable
they will correctly execute the tasks assigned by the server
However, workers are commonly unreliable
they may return incorrect results to the master due to unintended failures caused, e.g., by over-clocked processors
they may deceptively claim to have performed assigned work so as to obtain an incentive, such as a higher rank

10
Some Previous Studies

[FGLS05] assumed that worker processes might act maliciously and hence deliberately return wrong results
the goal is to design algorithms that enable the master to accept correct results with high probability at a lower cost
they provided a randomized algorithm
unfortunately, the cost complexity results depend on several parameters and are hard to interpret

11
Some Previous Studies (contd)

[GM05] considered the problem of maximizing the expected number of correct results
the tasks are dependent
any worker computes correctly with probability $p < 1$
any incorrectly computed task corrupts all dependent tasks
the goal is to compute a schedule that maximizes the expected number of correct results under a given time constraint
they showed the optimization problem to be NP-hard
they provided some solutions on a restricted DAG

12
Overview

Models of Computation
Stopping Rule Algorithm based solution
Detection of Faulty Processors
Performing Tasks with Faulty Workers
Conclusions

13
Overview

Models of Computation
Stopping Rule Algorithm based solution
Detection of Faulty Processors
Performing Tasks with Faulty Workers
Conclusions

14
Models of Computation

Processes take steps in lockstep, i.e., synchronously
Processes communicate by exchanging messages
The tasks are independent and idempotent
Processes are subject to failures and can return incorrect results maliciously
There is a set of workers $P = \{1, 2, \ldots, n\}$ and a master $M$

15
Work Complexities

[CDS01] define work complexity as available processor steps
All steps taken by processes during the execution of the algorithm are counted, including the steps of idling and waiting non-faulty processes
[DHW92] define work as the number of performed tasks, counting multiplicities
This approach does not charge for idling and waiting
This is called task-oriented work

16
Few Comments

work ?
We say that an event $E$ occurs with high probability (w.h.p.) to mean that $\Pr[E] = 1 - O(n^{-\alpha})$ for some constant $\alpha > 0$.

17
Modeling Failures

Failure model $F_a$
An $f$-fraction, $0 < f < 1/2$, of the $n$ workers may fail
Each possibly faulty worker independently exhibits faulty behavior with probability $0 < p < 1/2$
The master has no a priori knowledge of $f$ and $p$

18
Modeling Failures (contd)

Failure model $F_b$
There is a fixed bound on the $f$-fraction, $0 < f < 1/2$, of the $n$ workers that can be faulty
Any worker from the remaining $(1-f)$-fraction of the workers fails with probability $0 < p < 1/2$, independently of other workers
The master knows the values of $f$ and $p$

19
Algorithmic Template

procedure for master process $M$, task $T$
    Choose a set $S \subseteq P$
    Send task $T$ to each processor $p \in S$
    Wait for the results from the processes in $S$
    Decide on the result value $v$ from the responses
procedure for worker $w \in P$
    Wait to receive a task from master $M$
    Upon receiving a task from $M$
        Execute the task
        Send the result to $M$
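
The template above can be sketched in a few lines of Python. This is only an illustrative simulation, not the presenters' implementation: `run_worker`, `run_master`, the boolean "is faulty" flag per worker, the "+1" corruption, and the majority decision rule are all assumptions introduced here (the slide leaves the decision rule open).

```python
import random
from collections import Counter


def run_worker(task, is_faulty):
    """Worker procedure: execute the task and return the result.

    A faulty worker is simulated (assumption) by corrupting the result.
    """
    result = task()
    return result + 1 if is_faulty else result


def run_master(task, workers, sample_size, rng):
    """Master procedure following the four template steps.

    `workers` is a list of booleans (True = faulty); this encoding is
    hypothetical and exists only to drive the simulation.
    """
    chosen = rng.sample(workers, sample_size)                 # choose S within P
    responses = [run_worker(task, flag) for flag in chosen]   # send T, wait for results
    value, _count = Counter(responses).most_common(1)[0]      # decide (majority: assumption)
    return value


if __name__ == "__main__":
    rng = random.Random(0)
    workers = [False] * 7 + [True] * 3
    print(run_master(lambda: 42, workers, sample_size=10, rng=rng))
```

Majority voting is only one possible way to "decide on the result value"; the later slides refine how $S$ is chosen and how responses are trusted.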

20
Overview

Models of Computation
Stopping Rule Algorithm based solution
Detection of Faulty Processors
Performing Tasks with Faulty Workers
Conclusions

21
$(\epsilon, \delta)$-approximation algorithm

$Z$ is a random variable distributed in the interval $[0,1]$ with mean $\mu_Z$
$Z_1, Z_2, Z_3, \ldots$ are independently and identically distributed according to the random variable $Z$
An $(\epsilon, \delta)$-approximation algorithm, with $0 < \epsilon < 1$, $\delta > 0$, for estimating $\mu_Z$ satisfies
$\Pr[\mu_Z (1 - \epsilon) \le \tilde{\mu}_Z \le \mu_Z (1 + \epsilon)] > 1 - \delta$
where $\tilde{\mu}_Z$ is the estimated value of $\mu_Z$

22
Stopping Rule Algorithm

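[Dagum, Karp, Luby, and Ross 1995]
Input parameters: $(\epsilon, \delta)$ with $0 < \epsilon < 1$, $\delta > 0$
Let $\Upsilon_1 = 1 + (1 + \epsilon)\Upsilon$, where $\Upsilon = 4\lambda \log(2/\delta)/\epsilon^2$ and $\lambda = e - 2 \approx 0.72$
Initialize $N \leftarrow 0$, $S \leftarrow 0$
While $S < \Upsilon_1$ do $N \leftarrow N + 1$, $S \leftarrow S + Z_N$
Output $\tilde{\mu}_Z \leftarrow \Upsilon_1 / N$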
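
A minimal Python sketch of the Stopping Rule Algorithm follows, under two assumptions: the logarithm is the natural logarithm, and the threshold is $\Upsilon_1 = 1 + (1+\epsilon)\Upsilon$ with $\Upsilon = 4(e-2)\ln(2/\delta)/\epsilon^2$, as in Dagum et al. The function name `stopping_rule` and the `sample` callback are hypothetical.

```python
import math
import random


def stopping_rule(sample, eps, delta):
    """Estimate the mean of a [0, 1]-valued random variable.

    `sample` is a zero-argument callable returning one independent draw of Z.
    Returns (estimate, number_of_draws).
    """
    lam = math.e - 2                                      # about 0.72
    upsilon = 4 * lam * math.log(2 / delta) / eps ** 2
    upsilon1 = 1 + (1 + eps) * upsilon
    n, s = 0, 0.0
    while s < upsilon1:                                   # keep drawing until the sum crosses the threshold
        n += 1
        s += sample()
    return upsilon1 / n, n


if __name__ == "__main__":
    rng = random.Random(0)
    # Bernoulli(0.3) draws stand in for "did a sampled worker misbehave?"
    estimate, draws = stopping_rule(lambda: float(rng.random() < 0.3), 0.1, 0.05)
    print(estimate, draws)
```

The number of draws adapts to the unknown mean: the smaller $\mu_Z$ is, the longer the running sum takes to reach $\Upsilon_1$.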

23
Stopping Rule Theorem

Theorem (Stopping Rule Theorem) [Dagum, Karp, Luby, and Ross]
Let $Z$ be a random variable in $[0,1]$ with $\mu_Z = E[Z] > 0$. Let $\tilde{\mu}_Z$ be the estimate produced and let $N_Z$ be the number of experiments that SRA runs with respect to $Z$ on input $\epsilon$ and $\delta$.
Then,
(i) $\Pr[\mu_Z (1 - \epsilon) \le \tilde{\mu}_Z \le \mu_Z (1 + \epsilon)] > 1 - \delta$,
(ii) $E[N_Z] \le \Upsilon_1 / \mu_Z$, and
(iii) $\Pr[N_Z > (1 + \epsilon)\Upsilon_1 / \mu_Z] \le \delta / 2$
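
As a worked illustration of bounds (ii) and (iii), take the hypothetical values $\epsilon = 0.1$, $\delta = 0.05$, and $\mu_Z = 0.25$, and assume $\Upsilon = 4(e-2)\ln(2/\delta)/\epsilon^2$ and $\Upsilon_1 = 1 + (1+\epsilon)\Upsilon$ with a natural logarithm:

```latex
\begin{align*}
\Upsilon   &= \frac{4(e-2)\ln(2/\delta)}{\epsilon^{2}}
            \approx \frac{4 \times 0.7183 \times 3.6889}{0.01} \approx 1059.86, \\
\Upsilon_1 &= 1 + (1+\epsilon)\,\Upsilon \approx 1 + 1.1 \times 1059.86 \approx 1166.85, \\
E[N_Z]     &\le \frac{\Upsilon_1}{\mu_Z} \approx \frac{1166.85}{0.25} \approx 4667.4, \\
\Pr[N_Z > 5134.1] &= \Pr\!\left[N_Z > (1+\epsilon)\frac{\Upsilon_1}{\mu_Z}\right] \le \frac{\delta}{2} = 0.025 .
\end{align*}
```

So for these values roughly 4,700 samples suffice in expectation, and more than about 5,134 are needed with probability at most 2.5%.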

24
Algorithm $A_{f,p}$ to estimate $f$ and $p$
25
Work Complexity of $A_{f,p}$

Theorem: Algorithm $A_{f,p}$ is an $(\epsilon, \delta)$-approximation algorithm, $0 < \epsilon < 1$, $\delta > 0$, for the estimation of $f$ and $p$ with work complexity $O(\log^2 n)$, complexity $O(n \log n)$, message complexity $O(\log^2 n)$, and time complexity $O(\log n)$, with high probability.

26
Overview

Models of Computation
Stopping Rule Algorithm based solution
Detection of Faulty Processors
Performing Tasks with Faulty Workers
Conclusions

27
Detection of Faulty Processors

Lemma: It is not possible to perform all the $n$ tasks correctly in the failure model $F_a$ with linear complexity (i.e., $O(n)$) with high probability.

28
Detection of Faulty Processors

procedure for master process $M$
    Initially, $F \leftarrow \emptyset$
    For $t = 0, \ldots, k \log n$, $k > 0$
        Choose a set $S \subseteq P \setminus F$
        Send each process $p \in S$ a test task
        Wait for the results from the processes in $S$
        If the response is faulty
            $F \leftarrow F \cup \{p\}$ // $p$ is a faulty process
        End If
    End For
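
The detection loop can be simulated with the short Python sketch below. It makes several assumptions that the slide does not state: the master already knows the answer to each test task, $S$ is taken to be all of $P \setminus F$ in every round, a faulty worker misbehaves on a given test with probability `p`, and the logarithm is base 2. The name `detect_faulty` is hypothetical.

```python
import math
import random


def detect_faulty(n, faulty, p, k, rng):
    """Return the set of workers caught returning a wrong test result.

    n      -- number of workers, identified as 0..n-1
    faulty -- set of truly faulty workers (hidden from the master)
    p      -- probability that a faulty worker misbehaves on one test
    k      -- constant multiplying log n in the number of rounds
    """
    detected = set()                                   # the set F
    rounds = max(1, math.ceil(k * math.log2(n)))
    for _ in range(rounds):
        for worker in range(n):
            if worker in detected:                     # only test P \ F
                continue
            misbehaves = worker in faulty and rng.random() < p
            if misbehaves:                             # wrong answer to a test task
                detected.add(worker)
    return detected


if __name__ == "__main__":
    rng = random.Random(1)
    truly_faulty = set(range(0, 100, 7))
    found = detect_faulty(100, truly_faulty, p=0.4, k=4, rng=rng)
    print(len(found), "of", len(truly_faulty), "faulty workers detected")
```

A faulty worker escapes all rounds with probability $(1-p)^{\text{rounds}}$, which is why a number of rounds logarithmic in $n$ is enough to catch every faulty worker with high probability.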

29
Detection of Faulty Processors

Lemma: The algorithm detects all faulty processes among the $n$ workers in $O(\log n)$ time with $O(n)$ work, with high probability.
Theorem [Karp 04]: Suppose that $a(x)$ is a non-decreasing, continuous function that is strictly increasing on $\{x \mid a(x) > 0\}$, and $m(x)$ is a continuous function. Then for every positive real $x$ and every positive integer $t$,
$\Pr[T(x) > u(x) + t\,a(x)] \le (m(x)/x)^t$
where $u(x)$ is the solution to the equation
$u(x) = a(x) + u(m(x))$
with $m^{[0]}(x) = x$ and $m^{[i+1]}(x) = m(m^{[i]}(x))$.

30
Overview

Models of Computation
Stopping Rule Algorithm based solution
Detection of Faulty Processors
Performing Tasks with Faulty Workers
Conclusions

31
Performing Tasks under Fa

procedure for master process $M$
    Initially, $C \leftarrow \emptyset$, $J \leftarrow$ set of $n$ tasks
    Randomly choose a set $S \subseteq P$, possibly with repetition, of $|S| = kn/\log n$ workers, where $k > 0$ is a constant
    For $i = 1, \ldots, k' \log n$, $k' > 0$
        Send to each worker $p \in S$ a test task
        Collect the responses from all the workers
    End For
    If all the responses from a worker $p \in S$ are correct then
        $C \leftarrow C \cup \{p\}$
    End If
    For $i = 1, \ldots, n/|C|$
        Send $|C|$ jobs from $J$, not sent in previous iterations, one to each worker in $C$
        Collect the responses from the $|C|$ workers
    End For
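
The two phases above (certify a sample of workers with test tasks, then hand the real tasks only to certified workers) can be sketched as follows. This is a simulation under stated assumptions, not the presenters' code: tasks are the hypothetical "square the task index", a faulty worker misbehaves with probability `p` on each request, a wrong result is represented as `None`, and logarithms are base 2. The name `perform_tasks_fa` is hypothetical.

```python
import math
import random


def perform_tasks_fa(n, faulty, p, k, k_prime, rng):
    """Simulate the F_a procedure with n workers and n tasks (0..n-1)."""
    log_n = max(1, math.ceil(math.log2(n)))

    def responds_correctly(worker):
        # Only a possibly faulty worker misbehaves, with probability p.
        return not (worker in faulty and rng.random() < p)

    # Sample about k*n/log n workers with repetition; duplicates collapse in the set.
    sample = {rng.randrange(n) for _ in range(max(1, int(k * n / log_n)))}

    # Certification phase: k' log n test tasks per sampled worker.
    rounds = k_prime * log_n
    certified = [w for w in sorted(sample)
                 if all(responds_correctly(w) for _ in range(rounds))]
    if not certified:
        raise RuntimeError("no worker passed every test round")

    # Execution phase: about n/|C| rounds, |C| fresh tasks per round.
    results = {}
    for start in range(0, n, len(certified)):
        batch = range(start, min(n, start + len(certified)))
        for worker, task in zip(certified, batch):
            results[task] = task * task if responds_correctly(worker) else None
    return certified, results


if __name__ == "__main__":
    certified, results = perform_tasks_fa(
        64, faulty=set(range(20)), p=0.4, k=2, k_prime=3, rng=random.Random(3))
    wrong = sum(1 for t, r in results.items() if r != t * t)
    print(len(certified), "certified workers;", wrong, "wrong results")
```

A faulty worker can still slip through certification with probability $(1-p)^{k' \log n}$, so correctness holds only with high probability, in line with the theorem on the next slide.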

32
Work and Time Complexities

Theorem The algorithm performs all n tasks
correctly in
O(log n) time and has O(n) work and
complexities,
with high probability.

33
Overview

Models of Computation
Stopping Rule Algorithm based solution
Detection of Faulty Processors
Performing Tasks with Faulty Workers
Conclusions

34
Performing Tasks under Fb

procedure for master process $M$
    For $t = 0, \ldots, k \log n$, $k > 0$
        Choose a random permutation $\pi \in_R S_n$
        Foreach $j \in [n]$
            Send task $j$ to processor $\pi(j)$
        End For
        Collect the responses from all the workers
    End For
    Foreach $j \in [n]$
        Choose the majority of the results of computation for task $j$ as the result
    End For
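
A hedged Python simulation of this procedure is given below. The modeling choices are assumptions made for illustration: tasks are "square the task index", workers in the fixed set `faulty` always return the same wrong value `-1` (a colluding worst case), every other worker errs with probability `p`, and logarithms are base 2. The name `perform_tasks_fb` is hypothetical.

```python
import math
import random
from collections import Counter


def perform_tasks_fb(n, faulty, p, k, rng):
    """Run k log n rounds of random assignment, then take a majority per task."""
    rounds = max(1, math.ceil(k * math.log2(n)))
    votes = [Counter() for _ in range(n)]
    for _ in range(rounds):
        perm = list(range(n))
        rng.shuffle(perm)                              # random permutation of the workers
        for task in range(n):
            worker = perm[task]                        # task j goes to worker pi(j)
            wrong = worker in faulty or rng.random() < p
            votes[task][-1 if wrong else task * task] += 1
    # Majority (most frequent) response for each task.
    return [votes[task].most_common(1)[0][0] for task in range(n)]


if __name__ == "__main__":
    rng = random.Random(5)
    results = perform_tasks_fb(64, faulty=set(range(6)), p=0.1, k=5, rng=rng)
    correct = sum(1 for task, r in enumerate(results) if r == task * task)
    print(correct, "of", len(results), "tasks decided correctly")
```

Because each round uses a fresh random permutation, a given task is unlikely to land on faulty workers in most rounds, which is what makes the majority reliable.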

35
Work and Time Complexities

Theorem The algorithm performs all n tasks
correctly in
O(log n) time and has and work
complexities O(n log n),
for $0 < p, f < 1/2$ and $(1-f)(1-p) > 1/2$, with high probability
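
One way to see why the condition $(1-f)(1-p) > 1/2$ matters is the following sketch, which is an illustrative argument via Hoeffding's inequality and not necessarily the presenters' proof. Assume $r = k \ln n$ independent rounds, and let $q = (1-f)(1-p)$ be a lower bound on the probability that a single response for a fixed task is correct (the randomly assigned worker is outside the faulty fraction and then computes correctly). With $X$ the number of correct responses for that task:

```latex
\begin{align*}
E[X] &\ge r\,q, \qquad q = (1-f)(1-p) > \tfrac{1}{2}, \\
\Pr\!\left[X \le \tfrac{r}{2}\right]
     &\le \exp\!\left(-2r\left(q - \tfrac{1}{2}\right)^{2}\right)
      = n^{-2k\left(q - \frac{1}{2}\right)^{2}}, \\
\Pr[\text{some task has no correct majority}]
     &\le n \cdot n^{-2k\left(q - \frac{1}{2}\right)^{2}}
      = n^{\,1 - 2k\left(q - \frac{1}{2}\right)^{2}} .
\end{align*}
```

Under these assumptions the failure probability is polynomially small in $n$ once the constant $k$ exceeds $1/\bigl(2(q - 1/2)^2\bigr)$.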

36
Overview

Models of Computation
Stopping Rule Algorithm based solution
Detection of Faulty Processors
Performing Tasks with Faulty Workers
Conclusions
