Title: Provably hard problems below the satisfiability threshold
1. Provably hard problems below the satisfiability threshold
A sharp threshold in proof complexity yields
lower bounds for satisfiability search
Paul Beame Univ. of Washington
Dimitris Achlioptas Microsoft Research
Michael Molloy Univ. of Toronto
2. CNF Satisfiability
- $(x_1 \lor x_2 \lor x_4) \land (x_1 \lor x_3) \land (x_3 \lor x_2) \land (x_4 \lor x_3)$
- NP-complete, but many heuristics exist because of its practical importance
- Presumably exponential in the worst case
- If you know the formula is satisfiable
- How hard is it to find an assignment?
- No lower bounds known for interesting heuristics.
3. Satisfiability Algorithms
- Local search (incomplete)
- GSAT: Selman, Levesque, Mitchell 92
- Walksat: Kautz, Selman 96
- Backtracking search (complete)
- DPLL: Davis, Putnam 60; Davis, Logemann, Loveland 62
- DPLL + clause learning
4. Backtracking search/DPLL
- Select a literal l (some x or $\lnot x$)
- Remove all clauses containing l
- Shrink all clauses containing $\lnot l$
- While there are 1-clauses
- Pick some (arbitrary) 1-clause, satisfy it and simplify
- If there is a 0-clause (contradiction)
- Backtrack to last free step
Selecting a literal is a free step; there are many options for select.
Removing and shrinking clauses yields the residual formula.
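The steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the speakers' implementation: clauses are assumed to be frozensets of nonzero integers (a negative integer is a negated variable), and `simplify`, `dpll`, and the `select` callback are hypothetical names introduced here.

```python
def simplify(clauses, lit):
    """Set lit to True: remove satisfied clauses, shrink clauses containing -lit."""
    out = []
    for c in clauses:
        if lit in c:
            continue              # clause is satisfied, remove it
        out.append(c - {-lit})    # shrinks only when -lit was present
    return out


def dpll(clauses, select, assignment=None):
    """Return a satisfying partial assignment {var: bool}, or None if unsatisfiable."""
    assignment = dict(assignment or {})
    # Forced steps: satisfy 1-clauses until none remain.
    while True:
        if any(len(c) == 0 for c in clauses):
            return None           # 0-clause: contradiction, the caller backtracks
        unit = next((c for c in clauses if len(c) == 1), None)
        if unit is None:
            break
        (lit,) = unit
        assignment[abs(lit)] = lit > 0
        clauses = simplify(clauses, lit)
    if not clauses:
        return assignment         # every clause has been satisfied
    lit = select(clauses)         # free step: the heuristic picks a literal
    for choice in (lit, -lit):    # try the selected value first, then its negation
        extended = {**assignment, abs(choice): choice > 0}
        result = dpll(simplify(clauses, choice), select, extended)
        if result is not None:
            return result
    return None
```

Any polynomial-time heuristic can be passed as `select`, for example `lambda cs: next(iter(cs[0]))`, which picks some literal of the first remaining clause.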
5. Resolution
- Start with clauses of CNF formula F
- Resolution rule
- Given $(A \lor x)$, $(B \lor \lnot x)$ can derive $(A \lor B)$
- F is unsatisfiable $\Leftrightarrow$ 0-clause derivable
- Proof size = number of clauses
Running DPLL (with any select) on an unsatisfiable formula F results in a tree-resolution proof of $\lnot F$
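As a small illustration of the resolution rule, the sketch below resolves two clauses on a variable and then derives the 0-clause from a three-clause unsatisfiable formula. The integer-literal encoding and the function name `resolve` are assumptions made for this example.

```python
def resolve(c1, c2, x):
    """Resolve c1 (which contains x) with c2 (which contains -x) on variable x."""
    if x not in c1 or -x not in c2:
        raise ValueError("c1 must contain x and c2 must contain -x")
    return (c1 - {x}) | (c2 - {-x})


# Refutation of (x1) and (not x1 or x2) and (not x2):
step1 = resolve(frozenset({1}), frozenset({-1, 2}), 1)   # derives (x2)
step2 = resolve(step1, frozenset({-2}), 2)               # derives the 0-clause
```

The proof above has size 5 when counted in clauses: three input clauses plus two derived ones.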
6. Random CNF formulas
- Random 2-CNF formula with sn clauses
- is satisfiable w.h.p. for $s < 1$
- and simple DPLL will find a satisfying assignment in linear time w.h.p.
- is unsatisfiable w.h.p. for $s > 1$
- and simple DPLL will finish and yield a resolution proof of unsatisfiability in linear time w.h.p.
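For experimenting with these densities, a random formula can be sampled as in the sketch below. The exact random model is an assumption here (each clause picks k distinct variables uniformly and negates each independently with probability 1/2, clauses drawn with replacement); the slides do not spell it out, and `random_kcnf` is a hypothetical helper name.

```python
import random


def random_kcnf(n, m, k, rng=None):
    """Sample m random k-clauses over variables 1..n as frozensets of signed ints."""
    rng = rng or random.Random()
    clauses = []
    for _ in range(m):
        variables = rng.sample(range(1, n + 1), k)   # k distinct variables
        # Negate each chosen variable independently with probability 1/2.
        clauses.append(frozenset(v if rng.random() < 0.5 else -v for v in variables))
    return clauses
```

For instance, `random_kcnf(1000, 900, 2)` samples a 2-CNF formula with s = 0.9, below the 2-CNF threshold.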
7. DPLL on random 3-CNF
Can prove $2^{\Omega(n/\Delta^{1+\varepsilon})}$ time is required for unsatisfiable formulas above the threshold
What about satisfiable formulas below threshold?
$\Delta$ = ratio of clauses to variables
n = 50 variables
8. Phase transitions and algorithmic complexity
- Easy connection
- Hardest random problems will always be at a monotone sharp threshold bn if it exists
- Can randomly reduce satisfiable problems of lower density to those at the threshold
- Given a formula with $\Delta n$ clauses, $\Delta < b$, can always add $(b - \Delta - \varepsilon) n$ random clauses to make it a random problem nearly at the threshold and use that solution
- Can reduce unsatisfiable problems of larger density to those at the threshold
- Given a formula with $\Delta n$ clauses, $\Delta > b$, ignore all but the first $(b + \varepsilon) n$ of them
9. Hard satisfiable formulas?
With non-deterministic select we could simply guess n correct value assignments. How can a satisfiable formula possibly be hard?
Any implementation of select must run in polynomial time. Very simple heuristics are used in practice.
10. Some standard select rules for DPLL algorithms
- UC
- Pick variables in a fixed order
- Always set True first
- UCwm
- Pick variables in a fixed order
- Apply a majority vote among 3-clauses for assigning each value
- GUC
- Pick a variable v in a shortest clause C
- Set v to satisfy C
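The three rules can be written as small `select` functions that return the literal to try first. This is a sketch under assumptions: clauses are frozensets of signed integers, the "fixed order" is taken to be increasing variable number, ties in the majority vote go to True, and GUC picks the lowest-numbered variable of the first shortest clause (the slide only says "a variable in a shortest clause"). All function names are hypothetical.

```python
def select_uc(clauses):
    """UC: lowest-numbered remaining variable, tried True first."""
    return min(abs(l) for c in clauses for l in c)


def select_ucwm(clauses):
    """UCwm: same variable as UC, value chosen by majority vote among 3-clauses."""
    v = select_uc(clauses)
    pos = sum(1 for c in clauses if len(c) == 3 and v in c)
    neg = sum(1 for c in clauses if len(c) == 3 and -v in c)
    return v if pos >= neg else -v    # ties default to True (an assumption)


def select_guc(clauses):
    """GUC: take a shortest clause and return one of its literals, satisfying it."""
    shortest = min(clauses, key=len)
    return min(shortest, key=abs)     # deterministic choice within the clause
```

Each function returns a signed integer: a positive value means "set the variable True first", a negative value means "set it False first".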
11. Contributions
- These natural DPLL algorithms take exponential time on satisfiable formulas
- There exists a family of unsatisfiable random formulas parametrized by s s.t. w.h.p.
- $s > 1 \Rightarrow$ linear size resolution proofs
- $s < 1 \Rightarrow$ only exponential size resolution proofs possible
12. Key property of each of the select rules we've seen
- On random 3-CNF, before the first backtrack occurs, the residual formula is a uniformly random mix of 2-clauses and 3-clauses
- If it has $m_2$ 2-clauses and $m_3$ 3-clauses then it is equally likely to be any formula with these properties
- Key property $\Rightarrow$ proofs of the algorithm's success without backtracking
13. What do long runs look like?
Residual formula at each node is a mix of 2- and 3-clauses
Residual formula at some node is unsatisfiable
Every resolution refutation of it has size at least $2^{rn}$
Algorithm's proof of unsatisfiability is exponentially long
14. Proof Complexity
Theorem. A random CNF formula with $\Delta n$ 3-clauses and $sn$ 2-clauses, where $s < 1$, has no resolution refutation of size $2^{rn}$ w.h.p.
Chvátal, Szemerédi 88
Achlioptas, B., Molloy 2001
Formula is unsatisfiable w.h.p. for $\Delta \ge 4.57$
$s \ge 1-\varepsilon$ and $\Delta \ge$ ????
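15. Non-rigorous results
Kirkpatrick, Monasson, Selman, Zecchina 97
[Figure: phase diagram with axes 3-clause ratio $\Delta$ (marks at 2/3 and 4.26) and 2-clause ratio s (mark at 1); regions labeled SAT and UNSAT]
We can add (2/3)n 3-clauses but not $\varepsilon n$ 2-clauses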
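16. Rigorous results
Achlioptas, Kirousis, Kranakis, Krizanc 97
[Figure: phase diagram with axes 3-clause ratio $\Delta$ (marks at 2/3, 2.28, 8/3, 4.57) and 2-clause ratio s (mark at 1); regions labeled SAT and UNSAT, plus two regions marked "?"]
We can add (2/3)n 3-clauses but not $\varepsilon n$ 2-clauses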
17. Proof Complexity
Theorem. A random CNF formula with $\Delta n$ 3-clauses and $sn$ 2-clauses, where $s < 1$, has no resolution refutation of size $2^{rn}$ w.h.p.
Achlioptas, B., Molloy 2001
Formula is unsatisfiable w.h.p. for $\Delta \ge 4.57$
$\Delta \ge 2.281$ and $s \ge 1-\varepsilon$ for $\varepsilon \le .0001$
Sharp threshold since resolution is linear for $s \ge 1+\varepsilon$
18. These DPLL algorithms follow trajectories
Chao, Franco 88; Frieze, Suen 95; Achlioptas 00; Achlioptas, Sorkin 00
[Figure: trajectories of UC and GUC in the plane of 3-clause ratio $\Delta$ (marks at 2/3, 8/3, 3.26) and 2-clause ratio s (mark at 1)]
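To make the idea of a trajectory concrete, the sketch below evaluates the standard differential-equation approximation for UC. The closed forms are an assumption brought in from the kind of analysis the slide cites; they are not written on the slides, and they ignore the lower-order effect of unit-clause steps. After UC has set a fraction x of the variables of a random 3-CNF with $\Delta n$ clauses, the residual formula has roughly $\Delta(1-x)^2$ 3-clauses and $\tfrac{3}{2}\Delta x(1-x)$ 2-clauses per remaining variable. The function names are hypothetical.

```python
import math


def uc_residual_densities(delta, x):
    """Approximate (2-clause, 3-clause) densities per remaining variable under UC."""
    d3 = delta * (1 - x) ** 2
    d2 = 1.5 * delta * x * (1 - x)
    return d2, d3


def uc_first_crossing(delta):
    """Smallest x at which the 2-clause density reaches 1, or None if it never does."""
    disc = 1 - 8 / (3 * delta)     # from solving 1.5 * delta * x * (1 - x) = 1
    if disc < 0:
        return None                # the peak 3*delta/8 (at x = 1/2) stays below 1
    return (1 - math.sqrt(disc)) / 2
```

Under this approximation the 2-clause density peaks at $3\Delta/8$, so the trajectory stays below 1 exactly when $\Delta < 8/3$, and for $\Delta = 3.81$ it first reaches 2-clause density 1 while the 3-clause density is still about 2.28.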
19. DPLL crossing into the bad zone
[Figure: algorithm trajectory in the plane of 3-clause ratio $\Delta$ (marks at 3.26, 4.26, 4.57) and 2-clause ratio s (mark at 1); regions labeled "Provably SAT, Easy" and "Provably UNSAT, Hard"]
20. Exponential lower bounds far below the threshold.
Theorem. Let $A \in \{\mathrm{UC}, \mathrm{UCwm}, \mathrm{GUC}\}$. Let $\Delta_{\mathrm{UC}} = 3.81$, $\Delta_{\mathrm{UCwm}} = 3.83$, $\Delta_{\mathrm{GUC}} = 4.01$.
W.h.p. algorithm A takes more than $2^{rn}$ steps on a random 3-CNF with $\Delta_A n$ clauses
Lower bound also applies to any resolution-based algorithm that extends the first branch of the execution of A
21. Related Work
- Experiments suggested DPLL algorithms may not be polynomial all the way to the threshold
- Cocco, Monasson 01 applied non-rigorous methods to suggest exponential GUC behavior below the threshold
- Assumed every branch of GUC tree operates like an independent version of the first branch
- Independent of our work
22. Implications for phase transitions and algorithmic complexity
- Difference between polynomial and exponential hardness is not necessarily a function of the phase transition
- Applies in both phases, not just the over-constrained phase
- Algorithmically dependent
- A good algorithm will have a transition in a different place from a bad algorithm
- Can't study the hardness transition in the absence of the study of algorithms
23. Proof Ideas
- Connection between pure literals and resolution proof size (Chvátal, Szemerédi 88; Ben-Sasson, Wigderson 99)
- pure literals are those that occur only positively or only negatively in a formula
- Digraph structure of random 2-CNF subformula
- New graph-theoretic notion: clan
- generalization of connected component
- Sharp concentration properties for clan size
- moment generating function argument
- Amortization of pure literals across clans
24. Resolution proof size and pure literals
Ben-Sasson, Wigderson 99
- If formula has an a s.t.
- Every subformula with $\le an$ clauses has at least one pure literal
- Every subformula with between a n and a n clauses has a linear number of pure literals
- Then
- all resolution proofs of the formula require size $2^{rn}$
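The pure-literal condition in this criterion is easy to check on a small subformula. The sketch below, using an assumed encoding of clauses as sets of signed integers and a hypothetical helper name, returns the literals whose negation never occurs.

```python
def pure_literals(clauses):
    """Return the literals that occur in the formula while their negation does not."""
    literals = {l for c in clauses for l in c}
    return {l for l in literals if -l not in literals}


# x1 occurs only positively; x2 and x3 occur with both signs.
example = [frozenset({1, -2}), frozenset({2, 3}), frozenset({-3, 1})]
pure = pure_literals(example)
```

Here `pure` contains only the literal 1, so this three-clause subformula has exactly one pure literal.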
25. Basic idea of argument
- By sparsity of the 2-clause part of the formula, any subset of the 2-clauses will have lots of pure literals
- Clan size analysis + amortization
- In a subformula involving both 2-clauses and 3-clauses, either there are
- so many 3-clauses that they create lots of new pure literals on their own (easy case), or
- so few 3-clauses that they can't cover all the pure literals in the 2-clauses
- analysis of clans
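26. 2-CNF Digraph on literals
[Figure: digraph on the literals of the variables x, c, y, w, z, d]
$(\lnot d \lor y)\ (\lnot y \lor x)\ (\lnot z \lor y)\ (\lnot c \lor w)\ (\lnot x \lor w)\ (\lnot w \lor z)$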
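A digraph on literals can be built from 2-clauses with the usual implication convention, sketched below: a clause $(a \lor b)$ contributes the edges $\lnot a \to b$ and $\lnot b \to a$. Whether the talk draws edges in exactly this direction is not recoverable from the slide text, so treat the convention, the integer encoding (x=1, c=2, y=3, w=4, z=5, d=6), the reading of the example clauses as $(\lnot d \lor y)$, $(\lnot y \lor x)$ and so on, and the name `implication_digraph` as assumptions.

```python
def implication_digraph(two_clauses):
    """Map each literal to the set of literals it implies under the given 2-clauses."""
    edges = {}
    for a, b in two_clauses:
        edges.setdefault(-a, set()).add(b)   # if a is false, b must be true
        edges.setdefault(-b, set()).add(a)   # if b is false, a must be true
    return edges


# The six example clauses with x=1, c=2, y=3, w=4, z=5, d=6.
example = [(-6, 3), (-3, 1), (-5, 3), (-2, 4), (-1, 4), (-4, 5)]
graph = implication_digraph(example)
```

With this encoding, `graph[6]` is `{3}`, reflecting that setting d to True forces y to be True.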
27. Hyper/Digraph on literals
[Figure: hyper/digraph with vertex labels x, c, y, w, f, g, z, d]
$(a \lor b \lor z)\ (f \lor g \lor \lnot w)$
28. Pure literals
[Figure: hyper/digraph on the literals of x, c, y, w, z, d, with additional vertices f, g, a, b]
29. Pure cycle
[Figure: hyper/digraph on the literals of x, c, y, w, z, d, with additional vertices f, g, a, b]
30. Pure Items and Clans of G
- Clans
- small subgraphs of G
- one clan per vertex; they cover G
- analog of connected components in sparse random graphs
- pure items: typically two per clan, analogous to leaves in acyclic connected components in an ordinary graph
- mostly constant size
- never more than $\log^3 n$ vertices
- if $x \in \mathrm{clan}(y)$ then $y \in \mathrm{clan}(x)$
31. What are clans?
Simpler notion first: in(y) for vertex y in an ordinary digraph
32. in(y) in ordinary digraph
[Figure: digraph on vertices x, v, y, t, w, z, u]
in(y): subgraph of vertices that can reach y (ancestors of y)
33. clan(y) in ordinary digraph
[Figure: digraph on vertices x, v, y, t, w, z, u]
clan(y): descendants of ancestors of y
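For an ordinary digraph, the two notions can be computed with two graph searches, as in the sketch below: a backward search from y gives in(y), and a forward search from all of in(y) gives clan(y). The adjacency-dictionary representation, the convention that y counts as its own ancestor, the example edges, and the names `reachable`, `in_set`, and `clan` are assumptions made for illustration; the 2-CNF version with bad events is not covered.

```python
from collections import deque


def reachable(adj, sources):
    """All vertices reachable from any source by following edges in adj."""
    seen = set(sources)
    queue = deque(seen)
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def in_set(adj, y):
    """in(y): the vertices that can reach y (y included)."""
    reverse = {}
    for u, successors in adj.items():
        for v in successors:
            reverse.setdefault(v, []).append(u)
    return reachable(reverse, [y])


def clan(adj, y):
    """clan(y): the descendants of the ancestors of y."""
    return reachable(adj, in_set(adj, y))


# Hypothetical example graph: v -> x -> y -> w, v -> t, u -> z.
adj = {"v": ["x", "t"], "x": ["y"], "y": ["w"], "u": ["z"]}
```

With this definition, x is in clan(y) exactly when some vertex reaches both x and y, which is why membership is symmetric.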
34. clan(y) in 2-CNF digraph
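35. A complication - bad events
[Figure: digraph with vertex labels x, c, w, z (each appearing twice), y, d]
$(\lnot d \lor y)\ (\lnot z \lor y)\ (\lnot c \lor w)\ (\lnot x \lor w)\ (\lnot w \lor z)\ (w \lor d)$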
36. in(y) in a bad case
37. clan(y) in a bad case
This can cascade and get even worse!
38. Analysis
- If we ignore bad edges, in(y) is dominated by a component process in a sub-critical random undirected graph
- like trimmed out-trees (Bollobás, Borgs, Chayes, Kim, Wilson)
- Ignoring bad edges, clan(y) is dominated by a 2-level process
- run a component process to get in(y)
- take the union of |in(y)| independent component processes added to in(y)
39. Analysis
- w.h.p. no more than one bad event happens per clan
- in(y) is always dominated by the 2-level component process
- w.h.p. no more than $C \log n$ bad events occur in the whole digraph
- fewer than polylog n literals interact with bad clans
- rest of clans dominated by 2-level process
40. Analysis
- Ordinary sub-critical component process on 2n vertices w.h.p.
- number of vertices with component size $\ge i$ is at most $2n(1-s)^i$ for some fixed $s > 0$
- We show sub-critical 2-level component process on 2n vertices w.h.p.
- for $i \ge i_0$, number of vertices with 2-level size $\ge i$ is at most $2n(1-t)^i$ for some fixed $t > 0$
This is false for a 3-level component process!
41. Open problem
Conjecture. For every $\Delta > 2/3$ there exists an $s < 1$ such that a random (2,3)-CNF with $\Delta n$ 3-clauses and $sn$ 2-clauses is w.h.p. unsatisfiable
[Figure: phase diagram with regions labeled SAT and UNSAT; marks at 2/3, 3.26, 4.57 and at 1]
42. Open problem
Conjecture. For every $\Delta > 2/3$ there exists an $s < 1$ such that a random (2,3)-CNF with $\Delta n$ 3-clauses and $sn$ 2-clauses is w.h.p. unsatisfiable
Implies. For every card-game algorithm A there exists a critical density $\Delta_A$ such that for random 3-CNF formulas with $\Delta n$ clauses:
For $\Delta < \Delta_A$, w.h.p. A takes linear time
For $\Delta > \Delta_A$, w.h.p. A takes exponential time