Title: Lower Bounds for Property Testing
1 Lower Bounds for Property Testing
- Luca Trevisan
- UC Berkeley
2 Sub-linear Time Algorithms
- Want to design algorithms that run in less than linear time
  - cannot read entire input
  - must be probabilistic and approximate
- For optimization problems
  - compute numerical approximation of optimum cost (and implicit representation of approximate solution?)
- For decision problems
  - what is approximation?
3 Graph Property Testing [GGR]
- Testing a property P with accuracy ε
  - Given graph G that has property P
    - accept with probability > 3/4
  - Given graph G that is ε-far from property P
    - accept with probability < 1/4
- ε-far: must change an ε fraction of the representation of G to get property P
- Intuition: input (not output) is approximate
4 Different Representations
- G is represented as adjacency matrix
  - ε-far: must add/remove εn^2 edges
- G has max degree d and is represented using adjacency lists
  - ε-far: must add/remove εdn edges
- (Some extra subtleties in bounded-degree case)
5 Purpose of This Talk
- Discuss algorithms and lower bounds for
  - Sub-linear time property testing for some basic graph properties
  - Sub-linear time approximation algorithms for some basic optimization problems
- (well, mostly discuss lower bounds)
6 Motivations
- Large data sets
  - web, Wal-Mart, Amazon, phone calls, . . .
  - linear time can still be infeasible
- Fine print: most research on property testing focuses on problems having no connection to applications with large data sets
- Goal for theory research
  - Develop general algorithmic techniques (like dynamic programming, local search, for P)
  - Develop general techniques for impossibility results (like NP-completeness)
7 Property Testing and Approximation in Adjacency Matrix Representation
8 Bipartiteness Algorithm [GGR, AK]
- Testing bipartiteness of a given graph G
  - Pick (1/ε)·polylog(1/ε) vertices, and check if they induce a bipartite graph; if so accept, otherwise reject
- If G is bipartite then the algorithm accepts with probability 1
- If G is ε-far from bipartite, then whp the algorithm discovers an odd cycle (non-trivial to prove)
- Running time O((1/ε^2)·polylog(1/ε))
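
The sample-and-check procedure on this slide can be sketched as follows. This is a minimal illustration, not the tuned algorithm from the cited papers: the function names, the adjacency-matrix input `adj`, and the caller-supplied `sample_size` are assumptions made for the example, and choosing `sample_size` as (1/ε)·polylog(1/ε) is left to the caller.

```python
import random
from collections import deque

def induced_subgraph_is_bipartite(adj, sample):
    """Try to 2-color the subgraph induced by `sample` with BFS.

    adj[u][v] is truthy iff {u, v} is an edge; every lookup counts as one query.
    """
    color = {}
    for s in sample:
        if s in color:
            continue
        color[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in sample:
                if v == u or not adj[u][v]:
                    continue
                if v not in color:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return False  # an odd cycle lies inside the sample
    return True

def dense_bipartiteness_tester(adj, sample_size, rng=random):
    """Accept (True) iff a random vertex sample induces a bipartite graph."""
    n = len(adj)
    sample = rng.sample(range(n), min(sample_size, n))
    return induced_subgraph_is_bipartite(adj, sample)
```

The sketch queries only pairs inside the sample, so the number of queries is quadratic in the sample size, in line with the running time stated above. A bipartite input is never rejected, because every induced subgraph of a bipartite graph is bipartite.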
9 Lower Bounds [BT]
- Ω(1/ε^1.5) for adaptive algorithms
- Ω(1/ε^2) for non-adaptive algorithms
- The bounds apply to the query complexity of the algorithm (and, a fortiori, to its running time)
10 Proof for one-sided error case
- Pick a random graph with edge-probability 3ε
  - whp it is ε-far from bipartite
- Consider view of (possibly adaptive) algorithm that makes q queries and finds odd cycle whp
  - sees Θ(εq) edges and O(ε^2 q^2) pairs of connected vertices
  - a cycle can be discovered only by querying two vertices in same connected component
  - it takes Ω(1/ε) such attempts
- q = Ω(1/ε^1.5)
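
One way to read the final step: a query closes a cycle only if it hits a pair of vertices already known to be connected, each such query finds an edge with probability 3ε, and so Ω(1/ε) of them are needed. The number of connected pairs available must therefore be at least that large. This is a sketch of the counting, with constants suppressed:

```latex
\[
\underbrace{O(\varepsilon^{2} q^{2})}_{\text{connected pairs seen}}
\;\ge\;
\underbrace{\Omega(1/\varepsilon)}_{\text{attempts needed}}
\quad\Longrightarrow\quad
q^{2} = \Omega\!\left(1/\varepsilon^{3}\right)
\quad\Longrightarrow\quad
q = \Omega\!\left(1/\varepsilon^{1.5}\right).
\]
```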
11 One-sided error non-adaptive
- Pick a random graph with edge-probability 3ε
- Consider view of non-adaptive algorithm that makes q queries
- Same as
  - Start with a graph with q edges
  - Independently delete each edge with prob 1-ε
- If q = o(1/ε^2) then view is a forest w.p. 1-o(1)
  - Proof: There are at most O(q^(t/2)) cycles of length t
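
The forest claim follows from a union bound over cycles of the query graph. The sketch below assumes the cycle count takes the form (Cq)^{t/2} for an absolute constant C, and uses the survival probability ε per edge as stated on the slide (the factor 3 only changes constants):

```latex
\[
\mathbb{E}\bigl[\#\text{surviving cycles}\bigr]
\;\le\; \sum_{t \ge 3} (Cq)^{t/2}\,\varepsilon^{t}
\;=\; \sum_{t \ge 3} \bigl(\varepsilon\sqrt{Cq}\bigr)^{t}
\;=\; O\!\bigl((\varepsilon\sqrt{q})^{3}\bigr)
\;=\; o(1)
\qquad\text{when } q = o(1/\varepsilon^{2}).
\]
```

By Markov's inequality, the view then contains no cycle with probability 1-o(1), so a one-sided tester has no odd cycle to exhibit.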
12 Two-Sided Error
- Two distributions
  - G_far: random graph with edge probability 3ε
  - G_bip: first random partition, then each edge crossing partition exists with prob 6ε
- Distributions indistinguishable by
  - Non-adaptive algorithms of query complexity o(1/ε^2)
  - Adaptive algorithms of query complexity o(1/ε^1.5)
- Both tight for these distributions
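
The two distributions are easy to sample, which makes the construction concrete. The following is an illustrative sketch; the function names and the adjacency-matrix output format are assumptions of the example, not part of the talk.

```python
import random

def sample_g_far(n, eps, rng):
    """Far distribution: each pair is an edge independently with probability 3*eps."""
    adj = [[0] * n for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < 3 * eps:
                adj[u][v] = adj[v][u] = 1
    return adj

def sample_g_bip(n, eps, rng):
    """Bipartite distribution: random 2-partition, then each crossing pair
    is an edge independently with probability 6*eps."""
    side = [rng.randrange(2) for _ in range(n)]
    adj = [[0] * n for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if side[u] != side[v] and rng.random() < 6 * eps:
                adj[u][v] = adj[v][u] = 1
    return adj, side
```

A fixed pair of vertices crosses a random partition with probability 1/2, so a single query returns an edge with probability 3ε under both distributions; the two can only be told apart through correlations among several queries.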
13 Generality/Lessons
- Possible lesson: try a random graph as a possible distribution of hard instances far from having the property
- Not good for triangle-freeness, a property whose complexity is possibly the most interesting open question in the adjacency matrix model.
14 Triangle-free Graphs
- Want to distinguish triangle-free graphs from graphs where one needs to remove εn^2 edges to break all triangles
- Solvable in time super-exponential in 1/ε
- Polynomial in 1/ε is impossible [Alon]
- 2^poly(1/ε) possible?
- Simplest special case of more general (and important) question
15 Sublinear Time Approximation
- Max CUT and other graph problems can be approximated within (1+ε) in graphs with at least αn^2 edges in time 2^poly(1/εα) [GGR]
- Max 3SAT can be approximated within (1+ε) in instances with at least αn^3 clauses in time 2^poly(1/εα), and similar results for other satisfiability problems [AFKK]
- Lower bounds?
16 Property Testing and Approximation in Adjacency List Representation
17 Bipartiteness [GR]
- Testing bipartiteness
  - Repeat polylog n times
    - Start at a random vertex, and take sqrt(n) random walks of length polylog n; if two of them combine to form an odd cycle, reject
  - otherwise accept
- Analysis
  - in a graph where you need to remove a constant fraction of edges to make it bipartite, the algorithm finds an odd cycle
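
A simplified sketch of the random-walk test follows. It is an illustration under stated assumptions rather than the algorithm as analyzed in [GR]: walks are lazy (stay put with probability 1/2), only walk endpoints are compared, and the parameters `repetitions`, `num_walks` and `walk_len` are supplied by the caller instead of being set to polylog n, sqrt(n) and polylog n. All function names are hypothetical.

```python
import random

def walks_find_odd_cycle(neighbors, start, num_walks, walk_len, rng):
    """Return True if two walks from `start` end at the same vertex with
    different parities of traversed edges, which certifies an odd cycle."""
    parity_at = {start: 0}  # the empty walk reaches `start` with even parity
    for _ in range(num_walks):
        v, parity = start, 0
        for _ in range(walk_len):
            if neighbors[v] and rng.random() < 0.5:  # lazy step
                v = rng.choice(neighbors[v])
                parity ^= 1
        if parity_at.setdefault(v, parity) != parity:
            return True
    return False

def sparse_bipartiteness_tester(neighbors, repetitions, num_walks, walk_len, rng):
    """Accept (True) unless some repetition exhibits an odd cycle."""
    n = len(neighbors)
    for _ in range(repetitions):
        start = rng.randrange(n)
        if walks_find_odd_cycle(neighbors, start, num_walks, walk_len, rng):
            return False
    return True
```

In a bipartite graph the parity of any walk between two vertices is determined by their sides, so the sketch never rejects a bipartite input. Two walks reaching the same vertex with different parities close up into an odd closed walk, which always contains an odd cycle.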
18 Matching Lower Bound [GR]
- Define two distributions of graphs
  - G_far: a random Hamiltonian circuit, plus a random matching (whp 1/100-far from bipartite)
  - G_bip: a random Hamiltonian circuit, plus a random matching conditioned on making the graph bipartite
- G_far and G_bip are indistinguishable to algorithms of query complexity o(sqrt(n)).
19 Approximation Algorithms
- Minimum spanning tree
  - given a connected weighted graph of degree d with weights in range 1, ..., w, can approximate MST weight within (1+ε) in time about O(dw/ε^2) [Chazelle, Rubinfeld, T]
- Max SAT
  - Given a CNF where every variable occurs at most d times, can approximate Max SAT optimum within 0.618, presumably also 2/3, in O(d) time
  - Hopefully will get 3/4-δ
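
The slide states only the bound for the minimum spanning tree. One identity commonly used for results of this kind (an assumption about the method, which the slide does not spell out) writes the MST weight of a connected graph with integer weights in 1, ..., w as n - w plus the sum, over i = 1, ..., w-1, of the number c_i of connected components of the subgraph containing only edges of weight at most i. The sketch below computes each c_i exactly with union-find to illustrate the identity; a sublinear algorithm would estimate the c_i by sampling instead. Function names are hypothetical.

```python
def count_components(n, edges):
    """Number of connected components of a graph on vertices 0..n-1 (union-find)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    count = n
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            count -= 1
    return count

def mst_weight_via_components(n, weighted_edges, w):
    """MST weight of a connected graph with integer weights in 1..w,
    computed as n - w + sum of c_i for i = 1..w-1."""
    total = n - w
    for i in range(1, w):
        light = [(u, v) for (u, v, wt) in weighted_edges if wt <= i]
        total += count_components(n, light)
    return total
```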
20 Testing 3-Colorability
- NP-hard in adjacency list representation
  - Only for small enough ε
  - Can find 3-coloring good for 80% of the edges in a 3-colorable graph using SDP
  - NP-hard to find 3-coloring good for 98% (?) of the edges
- Gives non-tight, and conditional, lower bound for query complexity
21 Other Problems
- Query complexity of following problems is equivalent to query complexity of testing 3-colorability
  - Testing satisfiability of 3SAT instance
    - Every variable occurs in O(1) clauses, adjacency list representation
  - Approximating max cut, vertex cover, independent set, . . ., in bounded-degree graphs
  - Approximating Max SAT, Max 2SAT, . . .
- Lower bound of sqrt(n) for all problems
  - Reduction from bipartiteness
22 Tight Lower Bound [BOT]
- For one-sided error algorithms
  - Ω(n) query complexity to distinguish 3-colorable graphs from graphs that are (1/3 - δ)-far
  - Lower bound applies to testing problems that are solvable in polynomial time
- For two-sided error algorithms
  - For some ε, Ω(n) query complexity to distinguish 3-colorable graphs from graphs that are ε-far.
23 Using Reductions. . .
- Unconditionally, algorithms running in time o(n) cannot
  - Approximate Max 3SAT better than 7/8
  - Approximate Max Cut in bounded-degree graphs better than 16/17
  - . . .
- [Hastad97] proved above problems are NP-hard
24 The 3-Coloring Lower Bound
- Consider first one-sided error algorithms
- It's enough to find a graph G that is (1/3 - δ)-far from 3-colorable, but every subgraph of size < αn is 3-colorable
  - (for every δ there is an α such that . . .)
- Then an algorithm of query complexity < αn either accepts G (which is wrong) or rejects some 3-colorable graph (which means the algorithm does not have one-sided error)
25 The Graph
- Pick a graph of degree O(1/δ^2) at random (pick so many random matchings)
- Then it is (1/3 - δ)-far whp
- But, for some α, whp, every subgraph induced by k < αn vertices contains < 1.5k edges
- In a minimal non-3-colorable graph, every vertex has degree at least 3
- Every subgraph induced by < αn vertices is 3-colorable
- [Erdos]
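
The sparsity condition gives 3-colorability constructively: a graph on k vertices with fewer than 1.5k edges has average degree below 3, hence a vertex of degree at most 2, and the same holds after removing it. The sketch below, with a hypothetical function name and an assumed symmetric adjacency-dict input, peels such vertices one at a time and then colors them in reverse order.

```python
def three_color_by_peeling(neighbors):
    """Return a proper 3-coloring {vertex: 0|1|2} if repeatedly removing a
    vertex of degree <= 2 empties the graph; return None otherwise.

    `neighbors` maps each vertex to an iterable of its neighbors (symmetric).
    """
    remaining = {v: set(nbrs) for v, nbrs in neighbors.items()}
    order = []
    while remaining:
        v = next((u for u, nb in remaining.items() if len(nb) <= 2), None)
        if v is None:
            return None  # every remaining vertex has degree >= 3
        later_nbrs = remaining.pop(v)
        for u in later_nbrs:
            remaining[u].discard(v)
        order.append((v, later_nbrs))
    coloring = {}
    for v, later_nbrs in reversed(order):
        # At most two neighbors are already colored, so a color is free.
        used = {coloring[u] for u in later_nbrs}
        coloring[v] = min({0, 1, 2} - used)
    return coloring
```

Returning `None` means only that peeling got stuck, not that the graph fails to be 3-colorable; under the sparsity condition of the slide, peeling never gets stuck on a subgraph induced by fewer than αn vertices.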
26 Derandomization
- For constants d, ε, α, and for every sufficiently large n, we can explicitly construct a graph
  - on n vertices,
  - max degree d,
  - ε-far from 3-colorable,
  - such that every subset of αn vertices induces a 3-colorable subgraph.
27 Two-Sided Error Algorithms
- Need to define two distributions of graphs G_col and G_far such that
  - Graphs in G_col are (almost) always 3-colorable
  - Graphs in G_far are (almost) always far from 3-colorable
  - To an algorithm of bounded query complexity, G_col and G_far look (almost) the same
28 Main Step
- Define two distributions D_sat and D_far of instances of E3LIN-2 (systems over GF(2) with 3 variables per equation)
  - Systems in D_sat are always satisfiable
  - Systems in D_far are (almost) always (1/2 - δ)-far from satisfiable
  - To an algorithm of bounded query complexity, D_sat and D_far look the same
- We get G_col and G_far using a reduction from approximate E3LIN-2 to approximate 3-coloring
29 E3LIN-2
- X1 + X3 + X10 = 0 mod 2
- X2 + X3 + X4 = 1 mod 2
- X1 + X2 + X9 = 0 mod 2
- . . .
30 Main Building Block
- We show that for every c there is α such that there exists a left-hand side with
  - n variables, cn equations, 3 variables per equation, every variable occurs in 3c equations
  - every αn equations are linearly independent
- Pick the left-hand side at random
  - repeat 3c times: pick at random a set of n/3 disjoint triples of variables
- Explicit construction?
  - Need strong unique-neighbor expanders
31 Distributions
- The left-hand side is always as before
- In D_sat, we pick a random assignment to the variables, and set right-hand side consistently
  - always satisfiable
- In D_far, we pick the right-hand side uniformly at random
  - With high probability, (1/2 - O(1/sqrt(c)))-far
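
The random left-hand side and the two right-hand-side distributions can be sampled directly. The sketch below is illustrative: function names are hypothetical, n is assumed divisible by 3, c is an integer, and an equation is represented by the triple of variable indices on its left-hand side.

```python
import random

def random_left_hand_side(n, c, rng):
    """3c rounds; each round splits the n variables into n/3 disjoint triples.

    Yields c*n equations, and every variable occurs in exactly 3c of them.
    """
    assert n % 3 == 0, "sketch assumes n divisible by 3"
    equations = []
    for _ in range(3 * c):
        perm = list(range(n))
        rng.shuffle(perm)
        equations.extend(tuple(perm[i:i + 3]) for i in range(0, n, 3))
    return equations

def sample_d_sat(lhs, n, rng):
    """Satisfiable case: right-hand side consistent with a random assignment x."""
    x = [rng.randrange(2) for _ in range(n)]
    rhs = [(x[i] + x[j] + x[k]) % 2 for (i, j, k) in lhs]
    return rhs, x

def sample_d_far(lhs, rng):
    """Far case: right-hand side uniformly random, independent of the left-hand side."""
    return [rng.randrange(2) for _ in lhs]
```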
32 Indistinguishability
- Two distributions differ only in right-hand side
  - In D_far: uniformly distributed
  - In D_sat: αn-wise independent
    - Linear independence implies statistical independence
- Look the same to an algorithm that sees fewer than αn equations
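
The step from linear to statistical independence can be made explicit. The notation here is introduced for the sketch: A_S denotes the rows of the left-hand side indexed by a set S of fewer than αn equations, and x is the uniformly random assignment used in the satisfiable distribution.

```latex
\[
\operatorname{rank}_{\mathrm{GF}(2)}(A_S) = |S|
\;\Longrightarrow\;
x \mapsto A_S x \text{ is a surjective linear map } \mathrm{GF}(2)^{n} \to \mathrm{GF}(2)^{|S|},
\]
\[
\text{so every } b \in \mathrm{GF}(2)^{|S|} \text{ has exactly } 2^{\,n-|S|} \text{ preimages, and }
\Pr_{x}\bigl[A_S x = b\bigr] = \frac{2^{\,n-|S|}}{2^{\,n}} = 2^{-|S|}.
\]
```

The right-hand sides of the equations in S are therefore uniform and independent in the satisfiable case as well, exactly as in the far case.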
33 Conclusion of the Argument
- No algorithm of query complexity o(n) can distinguish satisfiable instances of E3LIN-2 from instances that are (1/2 - δ)-far from satisfiable
- For some ε, no algorithm of query complexity o(n) can distinguish 3-colorable graphs from graphs that are ε-far from 3-colorable.
- No algorithm of query complexity o(n) can approximate Max 3SAT better than 7/8 . . .
34 Generality/Lessons
- Reductions are useful and extend results to several problems
- In adjacency matrix (dense graph) setting, several and general algorithms; few and ad-hoc lower bounds
- In adjacency list (sparse graph) setting, vice versa.
35 Open Questions
- Show that distinguishing 3-colorable graphs from (1/3 - δ)-far graphs requires query complexity Ω(n)
  - we can only prove it for one-sided error
- Show that approximating Max SAT better than 3/4 and Max CUT better than 1/2 requires query complexity Ω(n)
  - we only know Ω(sqrt(n)), implicit in [GR]
  - would explain why we need SDP