1
INSIDE P
  • Parallel Computation
  • examples
  • models of computation
  • complexity classes
  • Logarithmic Space
  • the L = NL problem
  • alternation

2
Parallel Computing Examples
  • Parallel Matrix Multiplication
  • how to multiply matrices as fast as possible?
  • Find maximum parallelism
  • the n^3 multiplications can be done in parallel
    using n^3 processors
  • the summation of the n products for each c_{i,j}
    can be done using n processors in log n time via a
    binary tree
  • overall: log n time, n^3 processors
  • can be reduced to O(log n) time, O(n^3/log n)
    processors
  • each processor does log n multiplications
  • the first O(log log n) levels of the addition
    trees are computed locally (in time O(log n))
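
A minimal Python sketch of the schedule above, simulated sequentially: each pass of the while loop is one parallel round of the binary addition trees, and the loops inside a round stand in for the notional processors acting at once.

    import math

    def parallel_matmul(A, B):
        n = len(A)
        # step 1: all n^3 products "at once" (one notional processor each)
        p = [[[A[i][k] * B[k][j] for k in range(n)]
              for j in range(n)] for i in range(n)]
        rounds, m = 0, n
        while m > 1:                          # one parallel round per pass
            for i in range(n):
                for j in range(n):
                    for k in range(m // 2):   # pairwise additions in the tree
                        p[i][j][k] = p[i][j][2 * k] + p[i][j][2 * k + 1]
                    if m % 2:                 # odd element carries over
                        p[i][j][m // 2] = p[i][j][m - 1]
            m = (m + 1) // 2
            rounds += 1
        assert n == 1 or rounds == math.ceil(math.log2(n))
        return [[p[i][j][0] for j in range(n)] for i in range(n)]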

3
Parallel Computing Examples
  • Reachability
  • how to compute graph reachability (connected
    components, transitive closure) as fast as
    possible?
  • the sequential solution is based on DFS, which is
    inherently sequential
  • at least linear in the diameter, but we want an
    O(log n) algorithm
  • Let A be the adjacency matrix of G
  • we can compute A^2 in log n time
  • A^(2^k) in O(k log n) time
  • k = log n rounds are sufficient, for O(log^2 n)
    time overall
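
A short Python sketch of the repeated-squaring idea; the Boolean matrix product inside the loop is what the n^3 processors would compute in one O(log n)-depth parallel step.

    def reachable(adj):
        n = len(adj)
        # R = I + A: paths of length <= 1
        R = [[1 if i == j or adj[i][j] else 0 for j in range(n)]
             for i in range(n)]
        k = 1
        while k < n:          # log n squarings suffice
            R = [[1 if any(R[i][m] and R[m][j] for m in range(n)) else 0
                  for j in range(n)] for i in range(n)]
            k *= 2
        return R              # R[i][j] = 1 iff j is reachable from i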

4
Parallel Computing Discussion
  • The total amount of work (time × number of
    processors) can only be higher than the sequential
    cost
  • this means that parallel processing will not
    make NP-complete problems feasible
  • We have seen 2 positive examples, where we can
    bring the parallel complexity down to poly-log n,
    using a polynomial number of processors.
  • that is the best we can hope for
  • there are many problems which are not so
    efficiently parallelisable
  • actually, the second example was not very
    efficient: the total amount of parallel work was
    O(n^3 log n), while a sequential solution can do it
    in O(n^2).

5
Parallel Computing Models
  • (Multitape) Deterministic Turing machines?
  • no, inherently sequential
  • Non-deterministic TMs?
  • there is parallelism, but of very limited type
  • coordination/communication only at the end by
    the decision rule
  • does not reflect real-life parallelism
  • Boolean Circuits?
  • now, this is inherent parallelism
  • we will use Boolean circuits to define parallel
    complexity classes

6
Models of Parallel Computation
  • Parallel complexity measures (for a given circuit
    C)
  • depth = parallel time
  • size (number of gates) = parallel work
  • Let C = (C_0, C_1, ...) be a uniform family of
    Boolean circuits, and let f(n) and g(n) be
    functions from integers to integers.
  • We say that the parallel time of C is at most f(n)
    if for all n the depth of C_n is at most f(n).
    Analogously, the parallel work is at most g(n) if
    the size of C_n is at most g(n).
  • Define PT/WK(f(n), g(n)) to be the class of all
    languages L ⊆ {0,1}* such that there is a uniform
    family of circuits C deciding L with O(f(n))
    parallel time and O(g(n)) parallel work.

7
Models of Parallel Computation
  • Examples: MATRIX MULTIPLICATION ∈ PT/WK(log n,
    n^3/log n)
  • REACHABILITY ∈ PT/WK(n, n^3)
  • using the construction proving REACHABILITY is
    in P
  • but we also know
  • REACHABILITY ∈ PT/WK(log^2 n, n^3 log n)
  • by the repeated squaring of A
  • Note that while in the sequential setting we
    studied time and space separately, here time and
    work are closely interrelated
  • also, because we don't know that much about
    parallel computing, it is prudent to be careful
    about assumptions/statements; we don't always
    know what we can ignore/simplify and what not

8
Models of Parallel Computation
  • Circuits are about as convenient as Turing
    machines
  • RAMs were more convenient in sequential setting
  • can we have parallel RAMs?
  • Yes, we can
  • PRAM
  • set of q RAMs, each with its own local memory
  • the RAMs are different, but uniform
  • q may depend on the input size
  • there is a global shared memory
  • the computation is in lock-step, accessing
    global memory costs 1 step
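
A toy illustration of the lock-step discipline (a sequential simulation, not a real PRAM): in each round all processors first read the shared memory, then all writes are committed at once, and the maximum of n values is found in O(log n) rounds.

    def pram_max(shared):
        n = len(shared)
        stride = 1
        while stride < n:
            # all processors read "simultaneously" ...
            writes = [(p, max(shared[p], shared[p + stride]))
                      for p in range(0, n - stride, 2 * stride)]
            # ... then all writes commit at the end of the round
            for cell, val in writes:
                shared[cell] = val
            stride *= 2
        return shared[0]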

9
Models of Parallel Computation
  • There are several variants of PRAMs, depending on
    how to resolve read/write conflicts to the same
    memory cell
  • but all are within O(log n) factor of each other
  • PRAMs correspond to an idealised parallel
    machine, with as many processors as we need, and
    0 communication cost.
  • what cannot be done efficiently on a PRAM cannot
    be done efficiently on any parallel machine
    (except perhaps quantum computing and other
    different-physical-reality approaches)

10
PRAMs and Circuits
  • Theorem: If L ⊆ {0,1}* is in PT/WK(f(n), g(n)),
    then there is a uniform PRAM that computes the
    corresponding function F_L mapping {0,1}* to {0,1}
    in parallel time O(f(n)) using O(g(n)/f(n))
    processors.
  • i.e. circuits can be efficiently simulated by
    PRAMs
  • from the uniform function encoding C we can
    construct a uniform function encoding the PRAM
  • one processor (RAM) will simulate about g(n)/f(n)
    gates
  • simulating a gate is straightforward
  • just make sure that when it reads its inputs,
    they are already computed
  • i.e. for each gate calculate the time by which its
    data is guaranteed to be ready, and execute its
    program at that time
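
A small Python sketch of this schedule (assumed encoding: each gate is a tuple (op, a, b), negative indices denote circuit inputs, and the last gate is the output): gates whose depth is t form one notional parallel step at time t.

    def eval_circuit(gates, inputs):
        # depth of a gate = the time at which its data is surely ready
        depth = {}
        def d(g):
            if g < 0:                     # circuit input, ready at t = 0
                return 0
            if g not in depth:
                _, a, b = gates[g]
                depth[g] = 1 + max(d(a), d(b))
            return depth[g]
        for g in range(len(gates)):
            d(g)
        val = {}
        def v(g):
            return inputs[-g - 1] if g < 0 else val[g]
        for t in range(1, max(depth.values()) + 1):
            for g in (g for g in depth if depth[g] == t):  # one parallel step
                op, a, b = gates[g]
                val[g] = (v(a) and v(b)) if op == 'AND' else (v(a) or v(b))
        return val[len(gates) - 1]

    # (x AND y) OR x with x = True, y = False:
    print(eval_circuit([('AND', -1, -2), ('OR', 0, -1)], [True, False]))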

11
PRAMs and Circuits
  • The converse is also true: circuits can simulate
    PRAMs quite efficiently.
  • Theorem: If a function F can be computed by a
    uniform PRAM in time f(n) with g(n) processors
    (where f(n) and g(n) can be computed from 1^n in
    log-space), then there is a uniform family C of
    circuits of depth O(f(n)(f(n) + log n)) and size
    O(g(n)f(n)(n^k + f(n)g(n))) which computes the
    binary representation of F, where n^k is the time
    bound of the log-space TM which on input 1^n
    outputs the n-th RAM in the family.
  • a configuration of the PRAM (all program counters
    and all registers altered so far) is of size at
    most O(f(n)g(n))
  • the next configuration can be computed in time
    O(log l), where l is the largest number in a
    register
  • lots of tests in parallel, but conceptually
    straightforward

12
THE CLASS NC
  • We want to define the class of efficiently
    parallelisable languages.
  • We have seen that circuits and PRAMs are
    polynomially equivalent.
  • We want polynomial total work and poly-log time,
    i.e.
  • NC = PT/WK(log^k n, n^k)
  • union over all exponents k, as with P and NP
  • The argument that this is the right class is not
    as strong as with P vs NP, as e.g. log^5 n is
    larger than √n for most reasonable n (see the
    quick check below)
  • So, it makes sense to define NC^j = PT/WK(log^j n,
    n^k)
  • the question whether the NC^j hierarchy is proper
    is an open problem
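
A quick numeric check of that caveat in Python: log2(n)^5 stays above √n until n is roughly 2^58, i.e. for all practically occurring input sizes.

    import math
    for e in (10, 20, 30, 40, 50, 58, 59):
        n = 2.0 ** e
        print(e, math.log2(n) ** 5 > math.sqrt(n))
    # prints True up to e = 58, False from e = 59 on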

13
THE CLASS NC
  • Obviously, NC ⊆ P, but is NC = P?
  • again, an open problem
  • So, which problems in P are least likely to be in
    NC?
  • the P-complete ones
  • But do the reductions preserve parallel
    complexity classes?
  • Theorem: If L reduces to L' ∈ NC, then L ∈ NC
  • using the reduction R and the circuit for L' we
    have to construct a circuit for L
  • it is enough to construct the circuit computing
    R(x) from x, as then we can attach the circuit for
    L' to get the output

14
THE CLASS NC
  • It is enough to construct the circuit computing
    R(x) from x, as then we can attach the circuit for
    L' to get the output
  • there is a log-space bounded TM which accepts
    (x, i) iff the i-th bit of R(x) is 1
  • we can solve the reachability problem for its
    configuration graph with a circuit in NC^2, to
    compute the i-th bit of R(x)
  • doing that in parallel we can compute the whole
    R(x)
  • Corollary: If L reduces to L' ∈ NC^j, where j ≥ 2,
    then L ∈ NC^j.
  • Example of a P-complete problem: ODD MAX FLOW
  • Given a weighted network, is the maximum flow
    value odd?

15
LOGARITHMIC SPACE
  • We know PSPACE = NPSPACE, we don't know P = NP;
    what about L = NL?
  • still an open problem, but we at least know they
    are not too far apart
  • Theorem: NC^1 ⊆ L ⊆ NL ⊆ NC^2
  • the 2nd inclusion is trivial
  • the 3rd follows from the reachability method
  • we produce the configuration graph and check
    whether the accepting configuration is reachable
    from the starting one - can be done in NC^2
  • the 1st one is non-trivial
  • we need an algorithm that evaluates, in log-space,
    a circuit with logarithmic depth and a polynomial
    number of gates

16
LOGARITHMIC SPACE
  • We need an algorithm that evaluates, in log-space,
    a circuit with logarithmic depth and a polynomial
    number of gates
  • generate the circuit (can be done in L, as the
    circuit family is uniform)
  • unwind the circuit into a tree (each gate gets a
    label corresponding to a path from the root to
    it)
  • evaluate the tree circuit
  • using quasi-DFS
  • needs to remember only the current gate (its path
    label) and the partial values along the path
  • therefore can also be done in L
  • we know how to compose log-space bounded
    algorithms
  • from the proof that reductions compose

17
LOGARITHMIC SPACE
  • NC^1 ⊆ L ⊆ NL ⊆ NC^2 can actually be generalised to
  • PT/WK(f(n), k^f(n)) ⊆ SPACE(f(n)) ⊆ NSPACE(f(n)) ⊆
    PT/WK(f(n)^2, k^(f(n)^2))
  • proving a very close relationship between space
    and parallel time
  • they are polynomially related - the parallel
    computation thesis
  • practically interesting for f(n) = log n, as the
    work needs to be polynomial

18
LOGARITHMIC SPACE
  • Theorem: REACHABILITY is NL-complete
  • we show how to reduce any language L from NL to
    REACHABILITY
  • let L be decided by log-space bounded TM M
  • given input x, we can construct in logarithmic
    space the configuration graph of M on input x
  • we can assume M has a single accepting
    configuration t
  • obviously, x ∈ L if and only if in the
    configuration graph the node t is reachable from
    the starting node s
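
The reduction in miniature, as a Python sketch (assumed interface: successors(c) lists the configurations c yields in one step): once the configuration graph is spelled out, deciding the language is exactly one reachability query, here answered by BFS.

    from collections import deque

    def accepts(start, accepting, successors):
        seen, frontier = {start}, deque([start])
        while frontier:
            c = frontier.popleft()
            if c == accepting:
                return True
            for nxt in successors(c):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
        return False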

19
ALTERNATION
  • We can view non-determinism as a tree of
    configurations, and for each configuration define
    "leads to acceptance" as follows
  • if it is a leaf configuration, it leads to
    acceptance iff it is an accepting configuration
  • for a non-leaf configuration, "leads to
    acceptance" is defined as the OR of its children
  • the TM accepts if its root configuration leads
    to acceptance

20
ALTERNATION
  • This view allows us to generalize the TMs in yet
    another way
  • Definition: An alternating TM is a non-det TM M
    in which the set of states K is partitioned into
    two sets K_AND and K_OR. Let x be an input, and
    consider the tree of computations of M on x.
  • The "leads to acceptance" (or "eventually
    accepting configuration") Boolean function f is
    defined for leaves as before, and for an inner
    node v as follows
  • if the state of M in v is from K_AND, then f(v)
    is the AND of f(i) over the children i of v
  • if the state of M in v is from K_OR, then f(v) is
    the OR of f(i) over the children i of v
  • M accepts x if its initial configuration leads to
    acceptance.
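
The function f from the definition, computed directly on an explicit computation tree in Python (each node is a pair: its kind, 'AND'/'OR' for inner configurations or True/False for leaves, plus the list of children):

    def leads_to_acceptance(kind, children):
        if not children:                    # leaf configuration
            return kind                     # accepting or not
        vals = [leads_to_acceptance(k, c) for k, c in children]
        return all(vals) if kind == 'AND' else any(vals)

    # an OR configuration with an accepting leaf and a doomed AND branch
    tree = ('OR', [(True, []), ('AND', [(False, []), (True, [])])])
    print(leads_to_acceptance(*tree))       # True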

21
ALTERNATION
  • We can define ATIME(f(n)), ASPACE(f(n)), AP, AL
    in the analogous way as for normal TMs.
  • Theorem: MONOTONE CIRCUIT VALUE is AL-complete
  • first, that MONOTONE CIRCUIT VALUE is in AL
  • when evaluating an AND gate, go into a q_AND state
    and non-deterministically choose a child gate to
    evaluate
  • analogously for OR gates
  • accept/reject at a leaf gate
  • a gate is TRUE iff the corresponding state is
    eventually accepting
  • easily by induction on the depth
  • only the current gate needs to be remembered, so
    this runs in alternating log-space

22
ALTERNATION
  • now completeness
  • for any language L ∈ AL and the corresponding
    alternating machine M we construct a monotone
    circuit C such that C(x) is true iff x ∈ L
  • the gates are all pairs of the form (c, i), where
    c is a configuration of M on input x and i is a
    step number between 0 and |x|^k
  • step numbers make sure the circuit is acyclic
  • the output of (c1, i) is an input of (c2, j) iff
    c2 yields c1 in one step and i = j + 1
  • the type (AND or OR) of gate (c, i) depends on
    whether the state in c is in K_AND or in K_OR
  • accepting configurations are TRUE gates,
    rejecting ones FALSE gates
  • again, by induction we get x ∈ L iff the circuit
    evaluates to true
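
A Python sketch of this gate table (hypothetical interface: configs lists all configurations of M on x, successors(c) returns the list of configurations c yields, kind(c) is 'AND' or 'OR', is_accepting(c) is a predicate, and steps plays the role of |x|^k). The step index in the gate names is what keeps the circuit acyclic.

    def build_gates(configs, successors, kind, is_accepting, steps):
        gates = {}
        for i in range(steps + 1):
            for c in configs:
                succ = successors(c)
                if is_accepting(c):
                    gates[(c, i)] = ('TRUE', [])    # constant TRUE gate
                elif not succ or i == steps:        # rejecting, or out of time
                    gates[(c, i)] = ('FALSE', [])
                else:
                    # inputs are the gates (c2, i+1) for the
                    # configurations that c yields in one step
                    gates[(c, i)] = (kind(c),
                                     [(c2, i + 1) for c2 in succ])
        return gates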

23
ALTERNATION
  • Corollary: AL = P
  • both are closed under reductions, and both have
    the same complete problem
  • Corollary: ASPACE(f(n)) = TIME(k^f(n))
  • using the same ideas
  • i.e. alternating space is the same as
    deterministic time, just shifted by one exponential
  • We might see (if we have time, probably not)
    that alternation gives many alternative
    characterizations of PSPACE
  • AP = APP = ABP = PSPACE