Title: An Introduction to Property Testing
Property Testing: The Art of Uninformed Decisions
- This talk developed largely from a survey by Eldar Fischer [F97] -- fun reading.
The Age-Old Question
- Accuracy or speed?
- Work hard or cut corners?
- In CS: heuristics and approximation algorithms vs. exact algorithms for NP-hard problems.
- Both well-explored. But...
- Constant time is the new polynomial time.
- So, are there any corners left to cut?
The Setting
- Given an input x of n entries and a property P, determine whether P(x) holds.
- Generally takes at least n steps (for sequential, nonadaptive algorithms).
- Possibly infeasible if we are doing internet or genome analysis, program checking, PCPs, ...
First Compromise
- What if we only want a check that rejects any x such that P(x) fails, with probability > 2/3? Can we do better?
- Intuitively, we must expect to look at almost all input bits if we hope to reject x that are only one bit away from satisfying P.
- So, no.
Second Compromise
- This kind of failure is universal. So, we must scale our hopes back.
- The problem: those almost-correct instances are too hard.
- The solution: assume they never occur!
- Only worry about instances y that either satisfy P or are at edit distance at least cn from any satisfying instance. (We say y is c-bad.)
- Justifying this assumption is application-specific: the excluded middle might not arise, or it might just be less important.
Model Decisions
- Adaptive or non-adaptive queries?
- Adaptivity can be dispensed with at the cost of (exponentially many) more queries.
- One-sided or two-sided error?
- Error probability can be diminished by repeated trials.
The Trivial Case: Statistical Sampling
- Let P(x) = 1 iff x is all-zeroes.
- Then y is c-bad if and only if y contains at least cn ones.
- Algorithm: sample O(1/c) random, independently chosen bits of y, accepting iff all bits come up 0. (A sketch follows below.)
- If y is c-bad, a 1 will appear with probability at least 2/3.
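A minimal Python sketch of this sampler. The bit-query oracle interface `bit(i)` and the constant in the O(1/c) sample count are assumptions made for illustration:

```python
import random

def all_zeroes_test(bit, n, c):
    """Accept the all-zeroes string always; reject any c-bad y (one with at
    least c*n ones) with probability at least 2/3.
    `bit(i)` is an assumed oracle returning the i-th bit of y."""
    samples = max(1, int(2 / c))       # O(1/c); the constant 2 gives (1-c)^(2/c) < 1/3
    for _ in range(samples):
        i = random.randrange(n)
        if bit(i) == 1:
            return False               # found a 1: y is certainly not all-zeroes
    return True
```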
Sort-Checking [EKKRV98]
- Given a list L of n numbers, let P(L) be the property that L is in nondecreasing order. How to test for P with few queries?
- (Now queries are to numbers, not bits.)
- First try: pick k random entries of L, and check that their contents are in nondecreasing order.
- Correct on sorted lists.
- Suppose L is c-bad: what k will suffice to reject L with probability at least 2/3?
Sort-Checking (cont'd)
- Uh-oh, what about L = (2, 1, 4, 3, 6, 5, ..., 2n, 2n-1)?
- It's ½-bad, yet we need k ≈ √n to succeed (by a birthday-paradox argument: the test rejects only if we sample both entries of some swapped pair).
- Modify the algorithm to test adjacent pairs? But this algorithm, too, has its blind spots (e.g., two sorted halves glued together: only one adjacent pair is out of order, yet the list is ½-bad).
An O((1/c) log n) solution [EKKRV98]
- Place the entries on a balanced binary tree by an in-order traversal.
- Repeat O(1/c) times: pick a random i < n, and check that L[i] is sorted with respect to its path from the root. (A sketch follows below.)
- (Each such check must query the whole path, O(log n) entries.)
- The algorithm is non-adaptive with one-sided error.
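A minimal Python sketch of one way to realize this path-check, assuming 0-indexed lists; the balanced tree is the implicit one whose root is the midpoint of the index range, and the trial-count constant is illustrative:

```python
import random

def path_check(L, i):
    """Check that L[i] is ordered consistently with every entry on the
    root-to-i path of the balanced binary tree laid out by in-order traversal.
    The queried positions depend only on i, so the test is non-adaptive."""
    lo, hi = 0, len(L)
    while lo < hi:
        mid = (lo + hi) // 2           # current tree node
        if mid == i:
            return True                # reached i; all ancestors were consistent
        if i < mid:                    # i lies in the left subtree of mid
            if L[i] > L[mid]:
                return False
            hi = mid
        else:                          # i lies in the right subtree of mid
            if L[mid] > L[i]:
                return False
            lo = mid + 1
    return True

def sort_check(L, c):
    """Accept sorted lists always; reject c-bad lists with probability at least 2/3.
    Each trial queries O(log n) entries, and O(1/c) trials suffice."""
    trials = max(1, int(2 / c))        # illustrative constant
    for _ in range(trials):
        if not path_check(L, random.randrange(len(L))):
            return False               # one-sided error: reject only on real evidence
    return True
```

The set S of indices that pass this check is automatically in nondecreasing order, which is exactly the fact the analysis on the following slides exploits.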
Sort-Checking Analysis
- If L is sorted, each check will succeed.
- What if L is c-bad? Equivalently, what if L contains no nondecreasing subsequence of length (1-c)n?
- It turns out that a contrapositive analysis works more easily.
- That is, suppose most such path-checks for L succeed; we argue that L must be close to a sorted list L′, which we will define.
Sort-Checking Analysis (cont'd)
- Let S be the set of indices for which the path-check succeeds.
- If a path-check of a randomly chosen element succeeds with probability > (1-c), then |S| > (1-c)n.
- We claim that L, restricted to S, is in nondecreasing order! (For any i < j in S, both path-checks pass through the lowest common ancestor m of i and j, with i ≤ m ≤ j, so L[i] ≤ L[m] ≤ L[j].)
- Then, by correcting the entries not in S to agree with the order of S, we get a sorted list L′ at edit distance < cn from L.
- So L cannot be c-bad.
- Thus, if L is c-bad, it fails each path-check with probability at least c, and O(1/c) path-checks expose it with probability at least 2/3.
- This proves correctness of the (non-adaptive, one-sided error) algorithm.
- [EKKRV98] also shows this is essentially optimal.
First Moral
- We saw that it can take insight to discover and analyze the right "local signature" for the global property of failing to satisfy P.
Second Moral
- This, and many other property-testing algorithms, work because they implicitly define a correction mechanism for property P.
- For an algebraic example...
Linearity Testing [BLR93]
- Given a function f : {0,1}^n → {0,1}.
- We want to differentiate, probabilistically and in few queries, between the case where f is linear
- (i.e., f(x⊕y) = f(x) ⊕ f(y), addition mod 2, for all x, y),
- and the case where f is c-far from any linear function.
Linearity Testing [BLR93]
- How about the naïve test (sketched in code below): pick x, y at random, and check that f(x⊕y) = f(x) ⊕ f(y)?
- The previous sorting example warns us not to assume this is effective...
- Are there "pseudo-linear" functions out there?
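A minimal Python sketch of this test, assuming f is given as an oracle (a callable) on n-bit integers, with ^ playing the role of ⊕; the trial-count constant is illustrative:

```python
import random

def blr_linearity_test(f, n, c):
    """One-sided test: every linear f always passes; a function far from linear
    fails each round with probability proportional to its distance (see the
    analysis on the following slides), so O(1/c) rounds suffice."""
    trials = max(1, int(4 / c))        # O(1/c) repetitions, illustrative constant
    for _ in range(trials):
        x = random.getrandbits(n)
        y = random.getrandbits(n)
        if f(x ^ y) != f(x) ^ f(y):    # the single BLR check on a random pair
            return False               # one-sided error
    return True

# Example usage: the parity of a fixed subset of bits is linear and always accepted.
mask = 0b1011
f_linear = lambda x: bin(x & mask).count("1") % 2
assert blr_linearity_test(f_linear, n=4, c=0.1)
```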
Linearity Test - Analysis
- If f is linear, it always passes the test.
- Now suppose f passes the test with probability > 1 - d, where d < 1/12;
- we define a linear function g that is 2d-close to f.
- So, if f is 2d-bad, it fails the test with probability at least d, and O(1/d) iterations of the test suffice to reject f with probability at least 2/3.
Linearity Test - Analysis
- Define g(x) = majority over r of ( f(x⊕r) ⊕ f(r) ), for a random choice of vector r.
- f passes the test with probability at most 1 - t/2, where t is the fraction of entries where g and f differ. (Whenever g(x) ≠ f(x), at least half of the choices of r make the check on the pair (x, r) fail.)
- 1 - t/2 > 1 - d implies t < 2d, so f and g are 2d-close, as claimed.
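This majority vote is exactly the "correction mechanism" from the Second Moral; a minimal Python sketch, assuming the same oracle interface as above and an illustrative sample count:

```python
import random

def self_correct(f, n, x, samples=31):
    """Estimate g(x) = majority over r of f(x ^ r) ^ f(r) by sampling.
    If f is 2d-close to a linear g with d small, each sample equals g(x)
    with probability at least 1 - 4d (a union bound over the positions
    x ^ r and r), so the sampled majority returns g(x) with high probability."""
    ones = 0
    for _ in range(samples):
        r = random.getrandbits(n)
        ones += f(x ^ r) ^ f(r)        # one vote for the value of g(x)
    return 1 if 2 * ones > samples else 0
```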
Linearity Test - Analysis
- Now we must show g is linear.
- For c < 1, let G_(1-c) = { x : f(x) = f(x⊕r) ⊕ f(r) with probability > 1-c over r }.
- Let t_c = 1 - |G_(1-c)| / 2^n.
- Reasoning as before, we have t_c < d / c.
- Thus, t_(1/6) < 6d < 1/2.
- Then, given any x, there must exist a z such that z and x⊕z are both in G_(5/6). (The sets G_(5/6) and x ⊕ G_(5/6) each contain more than half of {0,1}^n, so they intersect.)
Linearity Test - Analysis
- Now, what is Prob[ g(x) = f(x⊕r) ⊕ f(r) ]? (How resoundingly is the majority vote decided for an arbitrary x?)
- It's the same as Prob[ g(x) = f(x ⊕ (z⊕r)) ⊕ f(z⊕r) ], since for fixed z, z⊕r is uniformly distributed if r is.
- Now f(x ⊕ (z⊕r)) ⊕ f(z⊕r) = [ f((x⊕z) ⊕ r) ⊕ f(r) ] ⊕ [ f(z⊕r) ⊕ f(r) ].
- Since x⊕z and z are in G_(5/6), with probability greater than 1 - 2(1/6) = 2/3 this expression equals g(x⊕z) ⊕ g(z). (Note that f agrees with g on G_(5/6), since there the majority vote exceeds 1/2.)
- So every x's majority vote is decided by a > 2/3 majority.
Linearity Test - Analysis
- Finally, we show g(x⊕y) = g(x) ⊕ g(y) for all x, y.
- Choosing a random r: f(x⊕y⊕r) ⊕ f(r) = g(x⊕y) with probability > 2/3.
- Also, f(x⊕(y⊕r)) ⊕ f(y⊕r) = g(x), and f(y⊕r) ⊕ f(r) = g(y), each with probability > 2/3.
- Then with probability > 1 - 3(1/3) = 0, i.e., with positive probability, all three occur. Adding the three equations (mod 2), we get g(x⊕y) = g(x) ⊕ g(y). QED.
Testing Graph Properties
- A graph property should be invariant under vertex permutations.
- Two query models: i) adjacency matrix queries, ii) neighborhood queries.
- Model ii) is appropriate for sparse graphs.
- For hereditary graph properties, the most common testing algorithms simply check random subgraphs for the property.
- E.g., to test if a graph is triangle-free, check a small random subgraph for triangles (sketched below).
- Obvious algorithms, but they often require very sophisticated analysis.
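A minimal Python sketch in the adjacency-matrix query model, assuming adj(u, v) is the matrix oracle; how large the sample must be against c-far graphs comes from the sophisticated (regularity-based) analysis mentioned above, so it is left as a parameter:

```python
import random
from itertools import combinations

def triangle_free_test(adj, n, sample_size):
    """One-sided tester: sample a random vertex subset and reject iff the
    induced subgraph contains a triangle. Triangle-free graphs always pass;
    bounding sample_size for c-far graphs is the hard part of the analysis
    and is not derived here."""
    S = random.sample(range(n), min(sample_size, n))
    for u, v, w in combinations(S, 3):
        if adj(u, v) and adj(v, w) and adj(u, w):
            return False               # a triangle in G itself: certain rejection
    return True
```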
Testing Graph Properties (cont'd)
- Efficiently testable properties in model i) include:
- Bipartiteness
- 3-colorability
- In fact [AS05], every property that's monotone in the entries of the adjacency matrix!
- A combinatorial characterization of the testable graph properties is known [AFNS06].
Lower Bounds via Yao's Principle
- A q-query probabilistic algorithm A testing for property P can be viewed as a randomized choice among q-query deterministic algorithms T_i.
- We will look at the 2-sided error model.
- For any distribution D on inputs, the probability that A accepts its input is a weighted average of the acceptance probabilities of the T_i.
- Suppose we can find (D_Y, D_N), two distributions on inputs, such that:
- i) x drawn from D_Y all satisfy property P;
- ii) x drawn from D_N are all c-bad;
- iii) D_Y and D_N are statistically 1/3-close on any fixed q entries.
- Then, given a non-adaptive deterministic q-query algorithm T_i, the statistical distance between T_i(D_Y) and T_i(D_N) is at most 1/3, so the same holds for our randomized algorithm A! (See the worked inequality below.)
- Thus A cannot simultaneously accept all P-satisfying instances with probability > 2/3 and accept c-bad instances with probability < 1/3.
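A short worked version of this step; the notation p_i (the probability that A runs T_i) and Δ (statistical distance) are mine:

```latex
\[
\Bigl|\Pr_{D_Y}[A \text{ accepts}] - \Pr_{D_N}[A \text{ accepts}]\Bigr|
  \;\le\; \sum_i p_i \,\Delta\bigl(T_i(D_Y),\, T_i(D_N)\bigr)
  \;\le\; \tfrac{1}{3},
\]
```

whereas accepting every P-satisfying instance with probability > 2/3 and every c-bad instance with probability < 1/3 would force the left-hand side above 1/3.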
Example: Graph Isomorphism
- Let P(G_1, G_2) be the property that G_1 and G_2 are isomorphic.
- Let D_Y be distributed as (G, π(G)), where G is a random graph and π a random permutation.
- Let D_N be distributed as (G_1, G_2), where G_1 and G_2 are independent random graphs.
- Briefly: (G_1, G_2) is almost always far from satisfying P, because for any fixed permutation π, the adjacency matrices of G_1 and π(G_2) are too unlikely to be similar.
- D_Y looks like D_N as long as we don't query both a vertex pair on the left-hand side G and its counterpart (under π) in π(G) -- and π is unknown.
- This approach proves a √n query lower bound.
Concluding Thoughts
- Property Testing:
- Revitalizes the study of familiar properties
- Leads to simply stated, intuitive, yet surprisingly tough conjectures
- Contains hidden layers of algorithmic ingenuity
- Brilliantly meets its own lowered standards
Acknowledgments
- Thanks for Listening!
- Thanks to Russell Impagliazzo and Kirill
Levchenko for their help.
References
- [AS05] Alon and Shapira. Every Monotone Graph Property is Testable. STOC 2005.
- [AFNS06] Alon, Fischer, Newman, and Shapira. A combinatorial characterization of the testable graph properties: it's all about regularity. STOC 2006.
- [BLR93] Manuel Blum, Michael Luby, and Ronitt Rubinfeld. Self-testing/correcting with applications to numerical problems. Journal of Computer and System Sciences, 47(3):549-595, 1993. (linearity test)
- [EKKRV98] F. Ergun, S. Kannan, S. R. Kumar, R. Rubinfeld, and M. Viswanathan. Spot-checkers. STOC 1998. (sort-checking)
- [F97] Eldar Fischer. The Art of Uninformed Decisions. Bulletin of the EATCS 75: 97 (2001). (survey on property testing)