Title: An Introduction to Property Testing
Property Testing: The Art of Uninformed Decisions
- This talk developed largely from a survey by Eldar Fischer [F97] -- fun reading.
The Age-Old Question
- Accuracy or speed?
- Work hard or cut corners?
- In CS: heuristics and approximation algorithms vs. exact algorithms for NP-hard problems.
- Both well-explored. But...
- Constant time is the new polynomial time.
- So, are there any corners left to cut?
The Setting
- Given an input x of n entries and a property P, determine whether P(x) holds.
- Generally takes at least n steps (for sequential, nonadaptive algorithms).
- Possibly infeasible if we are doing internet or genome analysis, program checking, PCPs, ...
First Compromise
- What if we only want a check that rejects any x such that P(x) fails, with probability > 2/3? Can we do better?
- Intuitively, we must expect to look at almost all input bits if we hope to reject x that are only one bit away from satisfying P.
- So, no.
Second Compromise
- This kind of failure is universal. So, we must scale our hopes back.
- The problem: those almost-correct instances are too hard.
- The solution: assume they never occur!
- Only worry about instances y that either satisfy P or are at edit distance at least cn from any satisfying instance. (We say y is c-bad.)
- Justifying this assumption is application-specific: the excluded middle might not arise, or it might just be less important.
Model Decisions
- Adaptive or non-adaptive queries?
- Adaptivity can be dispensed with at the cost of (exponentially many) more queries.
- One-sided or two-sided error?
- Error probability can be diminished by repeated trials.
The Trivial Case: Statistical Sampling
- Let P(x) = 1 iff x is all-zeroes.
- Then y is c-bad if and only if y contains at least cn ones.
- Algorithm: sample O(1/c) random, independently chosen bits of y, accepting iff all bits come up 0. (A sketch follows below.)
- If y is c-bad, a 1 will appear with probability at least 2/3.
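A minimal Python sketch of this sampler. The bit-query oracle interface `bit(i)` and the constant in the O(1/c) sample count are assumptions made for illustration:

```python
import random

def all_zeroes_test(bit, n, c):
    """Accept the all-zeroes string always; reject any c-bad y (one with at
    least c*n ones) with probability at least 2/3.
    `bit(i)` is an assumed oracle returning the i-th bit of y."""
    samples = max(1, int(2 / c))       # O(1/c); the constant 2 gives (1-c)^(2/c) < 1/3
    for _ in range(samples):
        i = random.randrange(n)
        if bit(i) == 1:
            return False               # found a 1: y is certainly not all-zeroes
    return True
```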
Sort-Checking [EKKRV98]
- Given a list L of n numbers, let P(L) be the property that L is in nondecreasing order. How to test for P with few queries?
- (Now queries are to numbers, not bits.)
- First try: pick k random entries of L, and check that their contents are in nondecreasing order.
- Correct on sorted lists.
- Suppose L is c-bad: what k will suffice to reject L with probability at least 2/3?
Sort-Checking (cont'd)
- Uh-oh, what about L = (2, 1, 4, 3, 6, 5, ..., 2n, 2n-1)?
- It's ½-bad, yet we need k ≈ √n to succeed (by a birthday-paradox argument: the test rejects only if we sample both entries of some swapped pair).
- Modify the algorithm to test adjacent pairs? But this algorithm, too, has its blind spots (e.g., two sorted halves glued together: only one adjacent pair is out of order, yet the list is ½-bad).
An O((1/c) log n) solution [EKKRV98]
- Place the entries on a balanced binary tree by an in-order traversal.
- Repeat O(1/c) times: pick a random i < n, and check that L[i] is sorted with respect to its path from the root. (A sketch follows below.)
- (Each such check must query the whole path, O(log n) entries.)
- The algorithm is non-adaptive with one-sided error.
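A minimal Python sketch of one way to realize this path-check, assuming 0-indexed lists; the balanced tree is the implicit one whose root is the midpoint of the index range, and the trial-count constant is illustrative:

```python
import random

def path_check(L, i):
    """Check that L[i] is ordered consistently with every entry on the
    root-to-i path of the balanced binary tree laid out by in-order traversal.
    The queried positions depend only on i, so the test is non-adaptive."""
    lo, hi = 0, len(L)
    while lo < hi:
        mid = (lo + hi) // 2           # current tree node
        if mid == i:
            return True                # reached i; all ancestors were consistent
        if i < mid:                    # i lies in the left subtree of mid
            if L[i] > L[mid]:
                return False
            hi = mid
        else:                          # i lies in the right subtree of mid
            if L[mid] > L[i]:
                return False
            lo = mid + 1
    return True

def sort_check(L, c):
    """Accept sorted lists always; reject c-bad lists with probability at least 2/3.
    Each trial queries O(log n) entries, and O(1/c) trials suffice."""
    trials = max(1, int(2 / c))        # illustrative constant
    for _ in range(trials):
        if not path_check(L, random.randrange(len(L))):
            return False               # one-sided error: reject only on real evidence
    return True
```

The set S of indices that pass this check is automatically in nondecreasing order, which is exactly the fact the analysis on the following slides exploits.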
Sort-Checking Analysis
- If L is sorted, each check will succeed.
- What if L is c-bad? Equivalently, what if L contains no nondecreasing subsequence of length (1-c)n?
- It turns out that a contrapositive analysis works more easily.
- That is, suppose most such path-checks for L succeed; we argue that L must be close to a sorted list L′, which we will define.
Sort-Checking Analysis (cont'd)
- Let S be the set of indices for which the path-check succeeds.
- If a path-check of a randomly chosen element succeeds with probability > (1-c), then |S| > (1-c)n.
- We claim that L, restricted to S, is in nondecreasing order! (For any i < j in S, both path-checks pass through the lowest common ancestor m of i and j, with i ≤ m ≤ j, so L[i] ≤ L[m] ≤ L[j].)
- Then, by correcting the entries not in S to agree with the order of S, we get a sorted list L′ at edit distance < cn from L.
- So L cannot be c-bad.
- Thus, if L is c-bad, it fails each path-check with probability at least c, and O(1/c) path-checks expose it with probability at least 2/3.
- This proves correctness of the (non-adaptive, one-sided error) algorithm.
- [EKKRV98] also shows this is essentially optimal.
First Moral
- We saw that it can take insight to discover and analyze the right "local signature" for the global property of failing to satisfy P.
Second Moral
- This, and many other property-testing algorithms, work because they implicitly define a correction mechanism for property P.
- For an algebraic example...
Linearity Testing [BLR93]
- Given a function f : {0,1}^n → {0,1}.
- We want to differentiate, probabilistically and in few queries, between the case where f is linear
- (i.e., f(x⊕y) = f(x) ⊕ f(y), addition mod 2, for all x, y),
- and the case where f is c-far from any linear function.
Linearity Testing [BLR93]
- How about the naïve test (sketched in code below): pick x, y at random, and check that f(x⊕y) = f(x) ⊕ f(y)?
- The previous sorting example warns us not to assume this is effective...
- Are there "pseudo-linear" functions out there?
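A minimal Python sketch of this test, assuming f is given as an oracle (a callable) on n-bit integers, with ^ playing the role of ⊕; the trial-count constant is illustrative:

```python
import random

def blr_linearity_test(f, n, c):
    """One-sided test: every linear f always passes; a function far from linear
    fails each round with probability proportional to its distance (see the
    analysis on the following slides), so O(1/c) rounds suffice."""
    trials = max(1, int(4 / c))        # O(1/c) repetitions, illustrative constant
    for _ in range(trials):
        x = random.getrandbits(n)
        y = random.getrandbits(n)
        if f(x ^ y) != f(x) ^ f(y):    # the single BLR check on a random pair
            return False               # one-sided error
    return True

# Example usage: the parity of a fixed subset of bits is linear and always accepted.
mask = 0b1011
f_linear = lambda x: bin(x & mask).count("1") % 2
assert blr_linearity_test(f_linear, n=4, c=0.1)
```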
Linearity Test - Analysis
- If f is linear, it always passes the test.
- Now suppose f passes the test with probability > 1 - d, where d < 1/12;
- we define a linear function g that is 2d-close to f.
- So, if f is 2d-bad, it fails the test with probability at least d, and O(1/d) iterations of the test suffice to reject f with probability at least 2/3.
Linearity Test - Analysis
- Define g(x) = majority over r of ( f(x⊕r) ⊕ f(r) ), for a random choice of vector r.
- f passes the test with probability at most 1 - t/2, where t is the fraction of entries where g and f differ. (Whenever g(x) ≠ f(x), at least half of the choices of r make the check on the pair (x, r) fail.)
- 1 - t/2 > 1 - d implies t < 2d, so f and g are 2d-close, as claimed.
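This majority vote is exactly the "correction mechanism" from the Second Moral; a minimal Python sketch, assuming the same oracle interface as above and an illustrative sample count:

```python
import random

def self_correct(f, n, x, samples=31):
    """Estimate g(x) = majority over r of f(x ^ r) ^ f(r) by sampling.
    If f is 2d-close to a linear g with d small, each sample equals g(x)
    with probability at least 1 - 4d (a union bound over the positions
    x ^ r and r), so the sampled majority returns g(x) with high probability."""
    ones = 0
    for _ in range(samples):
        r = random.getrandbits(n)
        ones += f(x ^ r) ^ f(r)        # one vote for the value of g(x)
    return 1 if 2 * ones > samples else 0
```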
Linearity Test - Analysis
- Now we must show g is linear.
- For c < 1, let G_(1-c) = { x : f(x) = f(x⊕r) ⊕ f(r) with probability > 1-c over r }.
- Let t_c = 1 - |G_(1-c)| / 2^n.
- Reasoning as before, we have t_c < d / c.
- Thus, t_(1/6) < 6d < 1/2.
- Then, given any x, there must exist a z such that z and x⊕z are both in G_(5/6). (The sets G_(5/6) and x ⊕ G_(5/6) each contain more than half of {0,1}^n, so they intersect.)
Linearity Test - Analysis
- Now, what is Prob[ g(x) = f(x⊕r) ⊕ f(r) ]? (How resoundingly is the majority vote decided for an arbitrary x?)
- It's the same as Prob[ g(x) = f(x ⊕ (z⊕r)) ⊕ f(z⊕r) ], since for fixed z, z⊕r is uniformly distributed if r is.
- Now f(x ⊕ (z⊕r)) ⊕ f(z⊕r) = [ f((x⊕z) ⊕ r) ⊕ f(r) ] ⊕ [ f(z⊕r) ⊕ f(r) ].
- Since x⊕z and z are in G_(5/6), with probability greater than 1 - 2(1/6) = 2/3 this expression equals g(x⊕z) ⊕ g(z). (Note that f agrees with g on G_(5/6), since there the majority vote exceeds 1/2.)
- So every x's majority vote is decided by a > 2/3 majority.
Linearity Test - Analysis
- Finally, we show g(x⊕y) = g(x) ⊕ g(y) for all x, y.
- Choosing a random r: f(x⊕y⊕r) ⊕ f(r) = g(x⊕y) with probability > 2/3.
- Also, f(x⊕(y⊕r)) ⊕ f(y⊕r) = g(x), and f(y⊕r) ⊕ f(r) = g(y), each with probability > 2/3.
- Then with probability > 1 - 3(1/3) = 0, i.e., with positive probability, all three occur. Adding the three equations (mod 2), we get g(x⊕y) = g(x) ⊕ g(y). QED.
Testing Graph Properties
- A graph property should be invariant under vertex permutations.
- Two query models: i) adjacency matrix queries, ii) neighborhood queries.
- Model ii) is appropriate for sparse graphs.
- For hereditary graph properties, the most common testing algorithms simply check random subgraphs for the property.
- E.g., to test if a graph is triangle-free, check a small random subgraph for triangles (sketched below).
- Obvious algorithms, but they often require very sophisticated analysis.
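A minimal Python sketch in the adjacency-matrix query model, assuming adj(u, v) is the matrix oracle; how large the sample must be against c-far graphs comes from the sophisticated (regularity-based) analysis mentioned above, so it is left as a parameter:

```python
import random
from itertools import combinations

def triangle_free_test(adj, n, sample_size):
    """One-sided tester: sample a random vertex subset and reject iff the
    induced subgraph contains a triangle. Triangle-free graphs always pass;
    bounding sample_size for c-far graphs is the hard part of the analysis
    and is not derived here."""
    S = random.sample(range(n), min(sample_size, n))
    for u, v, w in combinations(S, 3):
        if adj(u, v) and adj(v, w) and adj(u, w):
            return False               # a triangle in G itself: certain rejection
    return True
```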
Testing Graph Properties (cont'd)
- Efficiently testable properties in model i) include:
- Bipartiteness
- 3-colorability
- In fact [AS05], every property that's monotone in the entries of the adjacency matrix!
- A combinatorial characterization of the testable graph properties is known [AFNS06].
Lower Bounds via Yao's Principle
- A q-query probabilistic algorithm A testing for property P can be viewed as a randomized choice among q-query deterministic algorithms T_i.
- We will look at the 2-sided error model.
- For any distribution D on inputs, the probability that A accepts its input is a weighted average of the acceptance probabilities of the T_i.
- Suppose we can find (D_Y, D_N), two distributions on inputs, such that:
- i) x drawn from D_Y all satisfy property P;
- ii) x drawn from D_N are all c-bad;
- iii) D_Y and D_N are statistically 1/3-close on any fixed q entries.
- Then, given a non-adaptive deterministic q-query algorithm T_i, the statistical distance between T_i(D_Y) and T_i(D_N) is at most 1/3, so the same holds for our randomized algorithm A! (See the worked inequality below.)
- Thus A cannot simultaneously accept all P-satisfying instances with probability > 2/3 and accept c-bad instances with probability < 1/3.
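A short worked version of this step; the notation p_i (the probability that A runs T_i) and Δ (statistical distance) are mine:

```latex
\[
\Bigl|\Pr_{D_Y}[A \text{ accepts}] - \Pr_{D_N}[A \text{ accepts}]\Bigr|
  \;\le\; \sum_i p_i \,\Delta\bigl(T_i(D_Y),\, T_i(D_N)\bigr)
  \;\le\; \tfrac{1}{3},
\]
```

whereas accepting every P-satisfying instance with probability > 2/3 and every c-bad instance with probability < 1/3 would force the left-hand side above 1/3.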
Example: Graph Isomorphism
- Let P(G_1, G_2) be the property that G_1 and G_2 are isomorphic.
- Let D_Y be distributed as (G, π(G)), where G is a random graph and π a random permutation.
- Let D_N be distributed as (G_1, G_2), where G_1 and G_2 are independent random graphs.
- Briefly: (G_1, G_2) is almost always far from satisfying P, because for any fixed permutation π, the adjacency matrices of G_1 and π(G_2) are too unlikely to be similar.
- D_Y looks like D_N as long as we don't query both a vertex pair on the left-hand side G and its counterpart (under π) in π(G) -- and π is unknown.
- This approach proves a √n query lower bound.
Concluding Thoughts
- Property Testing:
- Revitalizes the study of familiar properties
- Leads to simply stated, intuitive, yet surprisingly tough conjectures
- Contains hidden layers of algorithmic ingenuity
- Brilliantly meets its own lowered standards
Acknowledgments
- Thanks for Listening!
- Thanks to Russell Impagliazzo and Kirill
Levchenko for their help.
References
- [AS05] Alon and Shapira. Every Monotone Graph Property is Testable. STOC 2005.
- [AFNS06] Alon, Fischer, Newman, and Shapira. A combinatorial characterization of the testable graph properties: it's all about regularity. STOC 2006.
- [BLR93] Manuel Blum, Michael Luby, and Ronitt Rubinfeld. Self-testing/correcting with applications to numerical problems. Journal of Computer and System Sciences, 47(3):549-595, 1993. (linearity test)
- [EKKRV98] F. Ergun, S. Kannan, S. R. Kumar, R. Rubinfeld, and M. Viswanathan. Spot-checkers. STOC 1998. (sort-checking)
- [F97] Eldar Fischer. The Art of Uninformed Decisions. Bulletin of the EATCS 75: 97 (2001). (survey on property testing)