Title: Lower Bounds for Property Testing
1 Lower Bounds for Property Testing
- Luca Trevisan
- U.C. Berkeley
- Joint work with Andrej Bogdanov and Kenji Obata
2 Sub-linear Time Algorithms
- Want to design algorithms that run in less than linear time (and so cannot read the entire input)
- Must be probabilistic and approximate
- For optimization problems
- Compute a numerical approximation of the optimum cost (and an implicit representation of an approximate solution?)
- For decision problems
- What is approximation for decision problems?
3 (Graph) Property Testing
- Testing a property P with accuracy ε in the adjacency matrix representation
- Given a graph G that has property P, accept with probability > 3/4
- Given a graph G that is ε-far from property P, accept with probability < 1/4
- ε-far: must change an ε fraction of the adjacency matrix to get property P (add/remove > εn² edges)
4 Example [GGR, AK]
- Testing bipartiteness of a given graph G
- Pick (1/ε)·polylog(1/ε) vertices, and check if they induce a bipartite graph; if so accept, otherwise reject
- If G is bipartite then the algorithm accepts with probability 1
- If G is ε-far from bipartite, then whp the algorithm discovers an odd cycle (non-trivial to prove)
- Running time O((1/ε²)·polylog(1/ε))
- We will discuss a matching lower bound if time allows
5 Paleontologist's approach
6 Bounded Degree Graphs
- Testing a property P with accuracy ε in the adjacency lists representation
- Given a graph G that has property P, accept with probability > 3/4
- Given a graph G that is ε-far from property P, accept with probability < 1/4
- ε-far: must change an ε fraction of the adjacency list entries to get property P (add/remove > εdn edges)
7 Bipartiteness [GR]
- Testing bipartiteness
- Repeat polylog n times
- Start at a random vertex and take sqrt(n) random walks of length polylog n; if two of them combine to form an odd cycle reject, otherwise accept
- Analysis
- In a graph where you need to remove a constant fraction of edges to make it bipartite, the algorithm finds an odd cycle
8 Matching Lower Bound [GR]
- Define two distributions of graphs
- Gfar: a random Hamiltonian circuit plus a random matching (whp 1/100-far from bipartite)
- Gbip: a random Hamiltonian circuit plus a random matching, conditioned on making the graph bipartite
- Gfar and Gbip are indistinguishable to algorithms of query complexity o(sqrt(n)).
9 Sub-linear Time Approximation
- Minimum spanning tree
- Given a connected weighted graph of degree d with weights in the range 1, ..., w, can approximate the MST weight within (1+ε) in time about O(dw/ε²) [Chazelle, Rubinfeld, T]
- Max SAT
- Given a CNF where every variable occurs at most d times, can approximate the Max SAT optimum within .618, presumably also 2/3, in O(d) time [work in progress, hopefully will get 3/4−δ]
10 Sublinear Time Approximation
- Problems restricted to dense instances
- Max CUT and other graph problems can be approximated within (1−ε) in graphs with at least αn² edges in time 2^poly(1/(εα)) [GGR]
- Max 3SAT can be approximated within (1−ε) in instances with at least αn³ clauses in time 2^poly(1/(εα)), and similar results hold for other satisfiability problems [AFKK]
11 General Goals
- When looking for polynomial-time algorithms
- Several algorithmic techniques of general applicability
- A general technique to prove impossibility (NP-completeness)
- For sublinear-time algorithms
- General algorithmic techniques?
- Impossibility results?
12 Testing 3-Colorability
- Easy in adjacency matrix representation
- NP-hard in adjacency list representation
- Only for small enough ε
- Can find a 3-coloring good for 80% of the edges in a 3-colorable graph using SDP
- NP-hard to find a 3-coloring good for a 98% (?) fraction of edges
- Non-tight, and conditional, lower bound for query complexity
13 Other problems
- The query complexity of the following problems is equivalent to the query complexity of testing 3-colorability
- Testing satisfiability of a 3SAT instance
- Every variable occurs in O(1) clauses, adjacency list representation
- Approximating max cut, vertex cover, independent set, . . ., in bounded-degree graphs
- Approximating Max SAT, Max 2SAT, . . .
- Lower bound of sqrt(n) for all problems
- Nothing better except with complexity assumptions
14 Our Results
- For one-sided error algorithms
- Ω(n) query complexity to distinguish 3-colorable graphs from graphs that are (1/3 − δ)-far
- Lower bound applies to testing problems that are solvable in polynomial time
- For two-sided error algorithms
- For some ε, Ω(n) query complexity to distinguish 3-colorable graphs from graphs that are ε-far.
15 Additional Results
- Unconditionally, algorithms running in time o(n) cannot
- Approximate Max 3SAT better than 7/8
- Approximate Max Cut in bounded-degree graphs better than 16/17
- . . .
- [Hastad97] proved the above problems are NP-hard
16 The 3-Coloring Lower Bound
- Consider first one-sided error algorithms
- It's enough to find a graph G that is (1/3 − δ)-far from 3-colorable, but in which every subgraph of size < αn is 3-colorable
- (for every δ there is an α such that . . .)
- Then an algorithm of query complexity < αn either accepts G (which is wrong) or rejects some 3-colorable graph (which means the algorithm does not have one-sided error)
17 The Graph
- Pick a graph of degree O(1/δ²) at random (pick that many random matchings)
- Then it is (1/3 − δ)-far whp
- But, for some α, whp, every subgraph induced by k < αn vertices contains < 1.5k edges
- In a minimal non-3-colorable graph, every vertex has degree at least 3
- Every subgraph induced by < αn vertices is 3-colorable
- [Erdos]
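The last bullets of this slide combine into a short counting argument. The derivation below spells it out; the names S and H are introduced here for illustration, and "minimal" is read as "no proper subgraph is non-3-colorable".

```latex
\begin{align*}
&\text{Suppose some } S \subseteq V \text{ with } |S| < \alpha n \text{ induces a non-3-colorable subgraph.}\\
&\text{Let } H \subseteq G[S] \text{ be a minimal non-3-colorable subgraph, on } k \le |S| < \alpha n \text{ vertices.}\\
&\text{If } \deg_H(v) \le 2, \text{ then } H - v \text{ is 3-colorable and } v \text{ can take a color unused by its neighbors,}\\
&\text{contradicting the choice of } H. \text{ Hence } \deg_H(v) \ge 3 \text{ for all } v \in V(H), \text{ and}\\
&\qquad |E(H)| \;=\; \tfrac12 \sum_{v \in V(H)} \deg_H(v) \;\ge\; \tfrac{3k}{2}.\\
&\text{But the } k \text{ vertices of } H \text{ induce fewer than } 1.5k \text{ edges of } G, \text{ a contradiction.}
\end{align*}
```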
18 Explicit Construction
- Can the previous construction be derandomized?
- For constants d, ε, α, and for every sufficiently large n, we can explicitly construct a graph on n vertices, of max degree d, ε-far from 3-colorable, and such that every subset of αn vertices induces a 3-colorable subgraph.
19 Explicit Construction
- We construct a 3SAT formula such that for constants k, ε, α
- Every variable occurs k times
- No assignment satisfies more than a 1−ε fraction of clauses
- Every α fraction of clauses is satisfiable
- Then we use a (slightly new) reduction from 3SAT to 3-Coloring
20 The Formula
- Fix a degree-d expander graph G = (V,E) such that for every cut (S, V−S) at least min{|S|, |V−S|} edges cross the cut (d = 14 is enough)
- Have two variables x_uv and x_vu for each edge (u,v)
- For every vertex v have the (3SAT equivalent of) the constraint
- Σ_u x_uv = 1 + Σ_w x_vw
21 Structure of the Analysis
- Impossible to satisfy more than a fraction 1 − 1/(d+1) of the constraints
- Can always satisfy half of the constraints
- define an auxiliary network
- show that the auxiliary network has no small cut because of expansion
- then there is a large flow
- use the large flow to find an assignment for a subset of the constraints
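One way to see why not all constraints can be satisfied at once is a counting argument over a set of satisfied vertices. The derivation below is a sketch under an assumption: it reads the constraint at vertex v as "sum of incoming variables = 1 + sum of outgoing variables" over 0/1 variables, and the set name S is introduced here for illustration.

```latex
\begin{align*}
&\text{Let } S \subseteq V \text{ be the set of vertices whose constraint is satisfied. Summing over } S:\\
&\qquad \sum_{v \in S} \Bigl( \sum_{u} x_{uv} - \sum_{w} x_{vw} \Bigr) \;=\; |S| .\\
&\text{For an edge } \{u,v\} \text{ inside } S, \ x_{uv} \text{ and } x_{vu} \text{ each appear once with sign } + \text{ and once with sign } -,\\
&\text{so they cancel. An edge with } u \notin S,\ v \in S \text{ contributes } x_{uv} - x_{vu} \le 1. \text{ Hence}\\
&\qquad |S| \;\le\; |E(S, V \setminus S)| \;\le\; d\,|V \setminus S| \;=\; d\,(n - |S|)
\quad\Longrightarrow\quad |S| \;\le\; \frac{d}{d+1}\, n \;=\; \Bigl(1 - \frac{1}{d+1}\Bigr) n .
\end{align*}
```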
22 Flow Argument
- Want to satisfy the constraints corresponding to the vertices in C, with |C| < |V|/2
- Construct a flow network with a new source s, a sink t obtained by collapsing V−C, and the vertices in C
[Figure: flow network with source s, vertex set C, and sink t (the collapsed V−C)]
23 Flow Argument
[Figure: a cut in the flow network, with labels s, t, A, C−A, "|A| edges", and "|C−A| edges"]
- Every cut has size at least |C|
- There is a 0/1 flow of value at least |C|
- Interpreted as an assignment, it satisfies all constraints in C
24 Two-Sided Error Algorithms
- Need to define two distributions of graphs Gcol and Gfar such that
- Graphs in Gcol are (almost) always 3-colorable
- Graphs in Gfar are (almost) always far from 3-colorable
- To an algorithm of bounded query complexity, Gcol and Gfar look (almost) the same
25 Main Step
- Define two distributions Dsat and Dfar of instances of E3LIN-2 (systems over GF(2) with 3 variables per equation)
- Systems in Dsat are always satisfiable
- Systems in Dfar are (almost) always (1/2−δ)-far from satisfiable
- To an algorithm of bounded query complexity, Dsat and Dfar look the same
- We get Gcol and Gfar using a reduction from approximate E3LIN-2 to approximate 3-coloring
26 E3LIN-2
- X1 + X3 + X10 = 0 mod 2
- X2 + X3 + X4 = 1 mod 2
- X1 + X2 + X9 = 0 mod 2
- . . .
27 Main Building Block
- We show that for every c there is an α such that there exists a left-hand side with
- n variables, cn equations, 3 variables per equation, every variable occurs in 3c equations
- every αn equations are linearly independent
- Pick the left-hand side at random
- repeat 3c times: pick at random a set of n/3 disjoint triples of variables
- Explicit construction?
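The random choice of the left-hand side can be sketched as follows. This is only the sampling step: the code does not check the linear-independence property, which holds with some probability according to the slide, and the function name and the use of a shuffled permutation to get disjoint triples are assumptions of this sketch.

```python
import random

def random_left_hand_side(n, c, seed=None):
    """Sample 3c rounds, each a partition of n variables into n/3 triples.

    Returns a list of c*n triples of variable indices in range(n).
    Each variable occurs exactly once per round, hence in 3c equations.
    """
    if n % 3 != 0:
        raise ValueError("n must be a multiple of 3")
    rng = random.Random(seed)
    equations = []
    for _ in range(3 * c):
        perm = list(range(n))
        rng.shuffle(perm)
        # Consecutive blocks of a random permutation are disjoint triples.
        equations.extend(tuple(perm[i:i + 3]) for i in range(0, n, 3))
    return equations
```

The counts match the slide: 3c rounds times n/3 triples gives cn equations. Different rounds may occasionally repeat a triple; the sketch does not filter such repeats.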
28 Distributions
- The left-hand side is always as before
- In Dsat, we pick a random assignment to the variables, and set the right-hand side consistently
- always satisfiable
- In Dfar, we pick the right-hand side uniformly at random
- With high probability, (1/2 − O(1/sqrt c))-far
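Given a fixed left-hand side (a list of variable triples), the two right-hand-side distributions can be sampled as in the sketch below. The function names and the list-of-triples representation are assumptions made for illustration.

```python
import random

def sample_dsat(lhs, n, rng):
    """Dsat: hide a random assignment and set each right-hand side to match it."""
    assignment = [rng.randrange(2) for _ in range(n)]
    rhs = [assignment[i] ^ assignment[j] ^ assignment[k] for (i, j, k) in lhs]
    return rhs, assignment

def sample_dfar(lhs, rng):
    """Dfar: every right-hand side is an independent uniform bit."""
    return [rng.randrange(2) for _ in lhs]
```

By construction the hidden assignment returned by `sample_dsat` satisfies every equation, which is why Dsat instances are always satisfiable.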
29 Indistinguishability
- The two distributions differ only in the right-hand side
- In Dfar, uniformly distributed
- In Dsat, αn-wise independent
- Linear independence implies statistical independence
- They look the same to an algorithm that sees fewer than αn equations
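The claim that linear independence implies statistical independence can be checked by brute force on tiny systems. The sketch below enumerates all assignments and tabulates the resulting right-hand sides; the two example systems are made up for this illustration and are not from the talk.

```python
from collections import Counter
from itertools import product

def rhs_distribution(lhs, n):
    """Tabulate the right-hand side vector over all 2^n assignments."""
    counts = Counter()
    for x in product((0, 1), repeat=n):
        counts[tuple(x[i] ^ x[j] ^ x[k] for (i, j, k) in lhs)] += 1
    return counts

# Three linearly independent equations over 5 variables:
# every one of the 2^3 right-hand sides appears equally often (uniform).
independent = rhs_distribution([(0, 1, 2), (1, 2, 3), (2, 3, 4)], 5)

# Four equations whose left-hand sides sum to zero mod 2 (each variable
# occurs exactly twice): only right-hand sides of even parity can occur.
dependent = rhs_distribution([(0, 1, 2), (0, 3, 4), (1, 3, 5), (2, 4, 5)], 6)
```

In the independent case the right-hand side of a Dsat instance is exactly uniform, so it is distributed as in Dfar; in the dependent case half of the right-hand sides never occur, which is the kind of evidence a tester would need to see.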
30 Conclusion of the Argument
- No algorithm of query complexity o(n) can distinguish satisfiable instances of E3LIN-2 from instances that are (1/2−δ)-far from satisfiable
- For some ε, no algorithm of query complexity o(n) can distinguish 3-colorable graphs from graphs that are ε-far from 3-colorable.
- No algorithm of query complexity o(n) can approximate Max 3SAT better than 7/8 . . .
31 Open Questions
- Show that distinguishing 3-colorable graphs from (1/3−δ)-far graphs requires query complexity Ω(n)
- we can only prove it for one-sided error
- Show that approximating Max SAT better than ¾ and Max CUT better than ½ requires query complexity Ω(n)
- we only know Ω(sqrt(n)), implicit in [GR]
- would explain why we need SDP
32 Back to Dense Graphs
- Recall the Alon-Krivelevich bipartiteness test for the adjacency matrix representation
- pick (1/ε)·polylog(1/ε) vertices and look at the induced subgraph
- if you see an odd cycle reject, otherwise accept
- Running time (1/ε²)·polylog(1/ε)
- We prove
- Ω(1/ε²) for non-adaptive algorithms
- Ω(1/ε^1.5) for adaptive algorithms
33 Two Distributions
- Gfar: every edge exists with probability ε
- whp it is ε/3-far from bipartite
- Gbip: pick a random partition, then every edge that crosses the partition exists with probability 2ε
- Thm1: they look the same to non-adaptive algorithms making o(1/ε²) queries
- Thm2: they look the same to adaptive algorithms making o(1/ε^1.5) queries
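The two dense-graph distributions are easy to sample, which may help in experimenting with the indistinguishability claims. The sketch below is an illustration with assumed function names; it requires eps ≤ 1/2 so that 2·eps is a valid probability.

```python
import random

def sample_gfar(n, eps, rng):
    """Gfar: each of the n-choose-2 pairs is an edge independently with prob. eps."""
    adj = [[False] * n for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < eps:
                adj[u][v] = adj[v][u] = True
    return adj

def sample_gbip(n, eps, rng):
    """Gbip: random bipartition; each crossing pair is an edge with prob. 2*eps."""
    side = [rng.randrange(2) for _ in range(n)]
    adj = [[False] * n for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if side[u] != side[v] and rng.random() < 2 * eps:
                adj[u][v] = adj[v][u] = True
    return adj, side
```

The factor 2 in Gbip compensates for the fact that a pair crosses a random partition with probability 1/2, so a single queried pair is an edge with probability eps in both distributions.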
34 Proof of a Weaker Statement
- Thm1 (weaker): a non-adaptive algorithm making q = o(1/ε²) queries in Gfar is unlikely to see an odd cycle
- Proof
- a non-adaptive algorithm asks about some subgraph with q edges.
- There are at most about q^(t/2) cycles of length t, and each one exists with probability ε^t, so the expected number of cycles of length t is at most ε^t·q^(t/2) = (ε²q)^(t/2), exponentially small in t.
- Summing over all t, it's still unlikely that there is a cycle
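The summation over cycle lengths in the last bullet is a geometric series. The calculation below takes the slide's count of about q^{t/2} cycles of length t at face value (constants suppressed) and applies a union bound.

```latex
\begin{align*}
\Pr[\text{some cycle is present among the } q \text{ queried pairs}]
  \;\le\; \sum_{t \ge 3} q^{t/2}\, \varepsilon^{t}
  \;=\; \sum_{t \ge 3} \bigl(\varepsilon \sqrt{q}\bigr)^{t}
  \;=\; \frac{(\varepsilon \sqrt{q})^{3}}{1 - \varepsilon \sqrt{q}} ,
\end{align*}
```

which tends to 0 when q = o(1/ε²), since then ε·sqrt(q) = o(1).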
35 Proof of a Weaker Statement
- Thm2 (weaker): an adaptive algorithm making q = o(1/ε^1.5) queries in Gfar is unlikely to see an odd cycle
- Proof
- the algorithm sees an edge only once in 1/ε queries
- the algorithm sees a cycle only after querying a pair that it already sees as connected
- It takes 1/ε^.5 edges to have 1/ε pairs of connected vertices
- It takes 1/ε^1.5 queries to have so many edges
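The arithmetic behind the exponent 1.5 can be laid out as a heuristic chain. The symbol m (number of edges the algorithm has seen) is introduced here for illustration, and constants are suppressed.

```latex
\begin{align*}
&\text{After } q \text{ queries the algorithm has seen about } m \approx \varepsilon q \text{ edges.}\\
&m \text{ edges span at most } m+1 \text{ vertices in a component, so they connect at most } \tbinom{m+1}{2} \approx m^{2} \text{ pairs.}\\
&\text{Querying a connected pair closes a cycle with probability } \varepsilon, \text{ so about } 1/\varepsilon \text{ such pairs are needed:}\\
&\qquad m^{2} \gtrsim \frac{1}{\varepsilon}
  \;\Longrightarrow\; m \gtrsim \varepsilon^{-1/2}
  \;\Longrightarrow\; q \approx \frac{m}{\varepsilon} \gtrsim \varepsilon^{-3/2}.
\end{align*}
```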
36 Some more open questions
- In the adjacency matrix representation, most interesting problems are solvable in constant (in ε) time
- For some problems (e.g. testing triangle-freeness) the analysis uses Szemerédi's regularity lemma, and the constant is hyper-exponential in 1/ε
- Lower bound of (1/ε)^(log 1/ε), and only for one-sided error
- Alternative analysis / stronger lower bounds?