Title: NP-Complete Problems
1 NP-Complete Problems
- Polynomial time vs exponential time
- Polynomial time: $O(n^k)$, where $n$ is the input size (e.g., the number of nodes in a graph, the length of strings, etc.) of our problem and $k$ is a constant (e.g., $k = 1, 2, 3$, etc.).
- Exponential time: $2^n$ or $n^n$.
- Growth of $2^n$:
  n:    2    10     20               30
  2^n:  4    1024   about 1 million  about 1000 million
- Suppose our computer can solve a problem of size $k$ (i.e., compute $2^k$ operations) in an hour/week/month. If a new computer is 1024 times faster than ours, then the new computer can solve a problem of size $k + 10$ in the same time, because $1024 \cdot 2^k = 2^{k+10}$. The improvement is very small.
- Hardware improvement is of little use for solving problems that require exponential running time.
- Exponential running time is considered not efficient.
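The speed-up arithmetic on this slide can be checked with a few lines of Python. This is only an illustrative sketch: the function name `largest_solvable_size` and the value `k = 40` are assumptions chosen for the example, not part of the slides.

```python
def largest_solvable_size(ops_budget: int) -> int:
    """Largest n such that 2**n <= ops_budget (assumes ops_budget >= 1)."""
    return ops_budget.bit_length() - 1


k = 40                          # hypothetical size the old machine can handle
old_budget = 2 ** k             # operations the old machine performs in the time limit
new_budget = 1024 * old_budget  # a machine that is 1024 times faster

print(largest_solvable_size(old_budget))  # 40
print(largest_solvable_size(new_budget))  # 50, i.e. k + 10
```

A 1024-fold faster machine only moves the reachable input size from k to k + 10.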
2 Story
- All algorithms we have studied so far are polynomial time algorithms.
- Fact: people have not yet found any polynomial time algorithms for some famous problems (e.g., Hamilton Circuit, longest simple path, Steiner trees).
- Question: do there exist polynomial time algorithms for those famous problems?
- Answer: nobody knows.
3 Story
- Research topic: prove that polynomial time algorithms do not exist for those famous problems, e.g., the Hamilton circuit problem.
- You can get the Turing Award if you can give the proof.
- In order to answer the above question, people define two classes of problems, class P and class NP.
- To answer whether P = NP, a rich area, NP-completeness theory, has been developed.
4 Class P and Class NP
- Class P contains those problems that are solvable in polynomial time.
- They are problems that can be solved in $O(n^k)$ time, where $n$ is the input size and $k$ is a constant.
- Class NP consists of those problems that are verifiable in polynomial time.
- What we mean here is that if we were somehow given a solution, then we could verify that the solution is correct in time polynomial in the input size of the problem.
- Example (Hamilton Circuit): given an order of the $n$ distinct vertices $(v_1, v_2, \ldots, v_n)$, we can test whether $(v_i, v_{i+1})$ is an edge in $G$ for $i = 1, 2, \ldots, n-1$ and whether $(v_n, v_1)$ is an edge in $G$ in time $O(n)$ (polynomial in the input size).
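The verification step in the Hamilton Circuit example can be sketched as follows. The representation (vertices numbered 0 to n-1, an undirected edge list) and the name `is_hamilton_circuit` are assumptions made for this illustration.

```python
def is_hamilton_circuit(n, edges, order):
    """Verify a proposed vertex order as a Hamilton circuit.

    n      -- number of vertices, labelled 0 .. n-1 (assumed labelling)
    edges  -- iterable of undirected edges (u, v)
    order  -- proposed visiting order of the vertices
    """
    if n < 3 or len(order) != n or set(order) != set(range(n)):
        return False  # every vertex must appear exactly once
    edge_set = {frozenset(e) for e in edges}
    # Check (v_i, v_{i+1}) for consecutive vertices and the closing edge (v_n, v_1).
    return all(
        frozenset((order[i], order[(i + 1) % n])) in edge_set
        for i in range(n)
    )


square = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(is_hamilton_circuit(4, square, [0, 1, 2, 3]))  # True
print(is_hamilton_circuit(4, square, [0, 2, 1, 3]))  # False: (0, 2) is not an edge
```

With the edges stored in a hash set, the check performs n constant-time lookups, matching the O(n) bound quoted on the slide.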
5 Class P and Class NP
- Based on the definitions, $P \subseteq NP$.
- If we can design a polynomial time algorithm for problem A, then problem A is in P.
- However, if we have not been able to design a polynomial time algorithm for problem A, then there are two possibilities:
- a polynomial time algorithm does not exist for problem A, or
- we are not smart enough to find one.
- Open problem: is $P = NP$?
- The Clay Mathematics Institute offers a 1 million dollar prize.
6 Polynomial-Time Reductions
- Suppose we have a black box (an algorithm) that can solve instances of a problem X: if we give it an instance of X as input, then in a single step the black box returns the correct answer.
- Question:
- Can arbitrary instances of problem Y be solved using a polynomial number of standard computational steps, plus a polynomial number of calls to a black box that solves problem X?
- If yes, then Y is polynomial-time reducible to X.
7 NP-Complete
- A problem X is NP-complete if it is in NP and any problem Y in NP has a polynomial time reduction to X.
- Such a problem is among the hardest problems in NP.
- If an NP-complete problem can be solved in polynomial time, then any problem in class NP can be solved in polynomial time.
- The first NPC problem is the Satisfiability problem.
- This was proved by Cook in 1971, who received the Turing Award for this work.
8 Boolean formula
- A boolean formula $f(x_1, x_2, \ldots, x_n)$, where the $x_i$ are boolean variables (either 0 or 1), contains boolean variables and the boolean operations AND, OR and NOT.
- Clause: variables and their negations connected with the OR operation, e.g., ($x_1$ OR NOT $x_2$ OR $x_5$).
- Conjunctive normal form of a boolean formula: contains $m$ clauses connected with the AND operation.
- Example:
- ($x_1$ OR NOT $x_2$) AND ($x_1$ OR NOT $x_3$ OR $x_6$) AND ($x_2$ OR $x_6$) AND (NOT $x_3$ OR $x_5$).
- Here we have four clauses.
9 Satisfiability problem
- Input: a conjunctive normal form with $n$ variables $x_1, x_2, \ldots, x_n$.
- Problem: find an assignment of $x_1, x_2, \ldots, x_n$ (setting each $x_i$ to be 0 or 1) such that the formula is true (satisfied).
- Example: the conjunctive normal form is
- ($x_1$ OR NOT $x_2$) AND (NOT $x_1$ OR $x_3$).
- The formula is true for the assignment
- $x_1 = 1$, $x_2 = 0$, $x_3 = 1$.
- Note: for $n$ Boolean variables, there are $2^n$ assignments.
- Testing whether formula = 1 can be done in polynomial time for any given assignment.
- Finding an assignment that satisfies formula = 1 is hard.
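The contrast between checking and finding an assignment can be sketched in Python. The clause encoding (a positive integer i for x_i, a negative integer -i for NOT x_i) and both function names are assumptions made for this illustration, not notation from the slides.

```python
from itertools import product


def satisfies(clauses, assignment):
    """Polynomial-time check: does `assignment` (dict var -> bool) satisfy the CNF?"""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )


def brute_force_sat(clauses, n):
    """Try all 2**n assignments; exponential time, shown only for contrast."""
    for values in product([False, True], repeat=n):
        assignment = {i + 1: values[i] for i in range(n)}
        if satisfies(clauses, assignment):
            return assignment
    return None


# (x1 OR NOT x2) AND (NOT x1 OR x3), the example formula from the slide
cnf = [[1, -2], [-1, 3]]
print(satisfies(cnf, {1: True, 2: False, 3: True}))  # True
print(brute_force_sat(cnf, 3) is not None)           # True
```

`satisfies` touches each literal once, while `brute_force_sat` may call it up to 2^n times.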
10 The First NP-complete Problem
- Theorem: the Satisfiability problem is NP-complete.
- It is the first NP-complete problem.
- S. A. Cook in 1971: http://en.wikipedia.org/wiki/Stephen_Cook
- He won the Turing Award for his work.
- Significance:
- If the Satisfiability problem can be solved in polynomial time, then ALL problems in class NP can be solved in polynomial time.
- If you want to settle whether P = NP, then you should work on NPC problems such as the Satisfiability problem.
- We can use the first NPC problem, the Satisfiability problem, to show that other problems are also NP-complete.
11 How to show that a problem is NPC?
- To show that problem A is NP-complete, we can
- first find a problem B that has been proved to be NP-complete, and
- show that if problem A can be solved in polynomial time, then problem B can also be solved in polynomial time.
- That is, give a polynomial time reduction from B to A.
- (By the definition of NP-complete, problem A must also be shown to be in NP.)
- Remark: since an NPC problem, problem B, is among the hardest in class NP, problem A is also among the hardest.
12 Hamilton circuit and Longest Simple Path
- Hamilton circuit: a circuit that uses every vertex of the graph exactly once, except for the last vertex, which duplicates the first vertex.
- It was shown to be NP-complete.
- Longest Simple Path:
- Input: let $V = \{v_1, v_2, \ldots, v_n\}$ be a set of nodes in a graph and $d(v_i, v_j)$ the distance between $v_i$ and $v_j$; find a longest simple path from $u$ to $v$.
- Theorem 2: the longest simple path problem is NP-complete.
13 Theorem 2: the longest simple path (LSP) problem is NP-complete.
- Proof:
- Hamilton Circuit Problem (HC): given a graph $G = (V, E)$, find a Hamilton circuit.
- We want to show that if we can solve the longest simple path problem in polynomial time, then we can also solve the Hamilton circuit problem in polynomial time.
- Design a polynomial time algorithm to solve HC by using an algorithm for LSP.
- Step 0: set the length of each edge in $G$ to be 1.
- Step 1: for each edge $(u, v) \in E$, find the longest simple path $P$ from $u$ to $v$ in $G$.
- Step 2: if the length of $P$ is $n - 1$, then by adding edge $(u, v)$ we obtain a Hamilton circuit in $G$.
- Step 3: if no Hamilton circuit is found for any $(u, v)$, then print "no Hamilton circuit exists".
- Conclusion:
- If LSP can be solved in polynomial time, then HC can also be solved in polynomial time.
- Since HC was proved to be NP-complete, LSP is also NP-complete.
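Steps 0 to 3 of the reduction can be sketched as below. No polynomial-time LSP algorithm is known, so `longest_simple_path` here is a hypothetical stand-in oracle implemented by exhaustive search (exponential time); only `hamilton_circuit_via_lsp` corresponds to the reduction itself. The adjacency-dictionary representation and both names are assumptions of this sketch.

```python
def longest_simple_path(adj, u, v):
    """Stand-in LSP oracle (exhaustive DFS, exponential time).

    Returns a longest simple path from u to v as a list of vertices,
    with every edge counted as length 1, or None if v is unreachable.
    """
    best = None

    def dfs(node, path, visited):
        nonlocal best
        if node == v:
            if best is None or len(path) > len(best):
                best = list(path)
            return
        for nxt in adj[node]:
            if nxt not in visited:
                visited.add(nxt)
                path.append(nxt)
                dfs(nxt, path, visited)
                path.pop()
                visited.remove(nxt)

    dfs(u, [u], {u})
    return best


def hamilton_circuit_via_lsp(adj):
    """Solve HC with one LSP-oracle call per edge endpoint pair (Steps 1 to 3)."""
    n = len(adj)
    if n < 3:
        return None
    for u in adj:
        for v in adj[u]:                      # each edge (u, v) in E
            path = longest_simple_path(adj, u, v)
            if path is not None and len(path) - 1 == n - 1:
                return path + [u]             # close the path with edge (v, u)
    return None                               # no Hamilton circuit exists


cycle4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(hamilton_circuit_via_lsp(cycle4))
```

The outer loops make at most 2|E| oracle calls, so a polynomial-time oracle would give a polynomial-time algorithm for HC.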
14 Some basic NP-complete problems
- 3-Satisfiability: each clause contains at most three variables or their negations.
- Vertex Cover: given a graph $G = (V, E)$, find a subset $V'$ of $V$ such that for each edge $(u, v)$ in $E$, at least one of $u$ and $v$ is in $V'$, and the size of $V'$ is minimized.
- Hamilton Circuit (the definition was given before).
- History: Satisfiability → 3-Satisfiability → Vertex Cover → Hamilton Circuit.
- Those proofs are very hard.
15 Approximation Algorithms
- Concepts
- Knapsack
- Steiner Minimum Tree
- TSP
- Vertex Cover
16 Concepts of Approximation Algorithms
- Optimization problem:
- The solution of the problem is associated with a cost (value).
- We want to maximize the cost or minimize the cost.
- Minimum spanning tree and shortest path are optimization problems.
- The Euler circuit problem is NOT an optimization problem (it is a decision problem).
17 Approximation Algorithm
- An algorithm A is an approximation algorithm if, given any instance I, it finds a candidate solution s(I).
- How good is an approximation algorithm?
- We use the performance ratio to measure the quality of an approximation algorithm.
18 Performance ratio
- For a minimization problem, the performance ratio of algorithm A is defined as a number $r$ such that for any instance $I$ of the problem,
- $A(I) / OPT(I)$
- is at most $r$ ($r \ge 1$), where $OPT(I)$ is the value of the optimal solution for instance $I$ and $A(I)$ is the value of the solution returned by algorithm A on instance $I$.
19 Performance ratio
- For a maximization problem, the performance ratio of algorithm A is defined as a number $r$ such that for any instance $I$ of the problem,
- $OPT(I) / A(I)$
- is at most $r$ ($r \ge 1$), where $OPT(I)$ is the value of the optimal solution for instance $I$ and $A(I)$ is the value of the solution returned by algorithm A on instance $I$.
20 Simplified Knapsack Problem
- Given a finite set $U$ of items, a size $s(u) \in Z^+$ for each $u \in U$, and a capacity $B \ge \max\{s(u) : u \in U\}$, find a subset $U' \subseteq U$ such that $\sum_{u \in U'} s(u) \le B$ and such that the above summation is as large as possible. (It is NP-hard.)
21 Ratio-2 Algorithm
- Sort the items $u$ by $s(u)$ in increasing order.
- Repeatedly select the smallest remaining $u$ until no more $u$ can be added.
- Compare the total size of the selected items with the item of the largest size, and select the larger one.
- Theorem: the algorithm has performance ratio 2.
22 Proof
- Case 1: the total of the selected items $\ge 0.5B$ (got it!), since the optimal value is at most $B$.
- Case 2: the total of the selected items $< 0.5B$.
- No remaining item is left: we get an optimal solution.
- There are some remaining items: the size of the smallest remaining item is $> 0.5B$. (Otherwise, we could add it in.)
- Selecting the largest item then gives ratio 2.
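The ratio-2 algorithm for the simplified knapsack problem can be sketched as follows. The function name and the list-of-sizes representation are assumptions of this sketch; it also assumes, as in the problem statement, that no single item exceeds the capacity.

```python
def simplified_knapsack_ratio2(sizes, capacity):
    """Greedy ratio-2 sketch: returns the list of chosen item sizes.

    Assumes every size is a positive integer not larger than `capacity`.
    """
    if not sizes:
        return []
    chosen, total = [], 0
    for s in sorted(sizes):            # smallest items first
        if total + s > capacity:       # smallest remaining item no longer fits
            break
        chosen.append(s)
        total += s
    largest = max(sizes)
    # Keep whichever is larger: the greedy selection or the single largest item.
    return chosen if total >= largest else [largest]


print(simplified_knapsack_ratio2([3, 4, 5, 9], 10))  # [9]
print(simplified_knapsack_ratio2([1, 2, 3], 10))     # [1, 2, 3]
```

In the first call the greedy prefix 3 + 4 = 7 is beaten by the single item of size 9, which is why the final comparison step matters.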
23 The 0-1 Knapsack problem
- The 0-1 knapsack problem:
- $N$ items, where the $i$-th item is worth $v_i$ dollars and weighs $w_i$ pounds.
- $v_i$ and $w_i$ are integers.
- A thief can carry at most $W$ (integer) pounds.
- How can the thief take as valuable a load as possible?
- An item cannot be divided into pieces.
- The fractional knapsack problem:
- The same setting, but the thief can take fractions of items.
24 Ratio-2 Algorithm
- 1. Delete the items $i$ with $w_i > W$.
- 2. Sort the items in decreasing order of $v_i / w_i$.
- 3. Select the first $k$ items, item 1, item 2, ..., item $k$, such that $w_1 + w_2 + \cdots + w_k \le W$ and $w_1 + w_2 + \cdots + w_k + w_{k+1} > W$.
- 4. Compare $v_{k+1}$ with $v_1 + v_2 + \cdots + v_k$ and select the larger one.
- Theorem: the algorithm has performance ratio 2.
25 Proof of ratio 2
- $C(opt)$: the cost of an optimum solution.
- $C(fopt)$: the optimal cost of the fractional version.
- $C(opt) \le C(fopt)$.
- $v_1 + v_2 + \cdots + v_k + v_{k+1} > C(fopt)$.
- So either $v_1 + v_2 + \cdots + v_k > 0.5\,C(fopt) \ge 0.5\,C(opt)$
- or $v_{k+1} > 0.5\,C(fopt) \ge 0.5\,C(opt)$.
- Since the algorithm chooses the larger one of $v_1 + v_2 + \cdots + v_k$ and $v_{k+1}$,
- we know that the cost of the solution obtained by the algorithm is at least $0.5\,C(fopt) \ge 0.5\,C(opt)$.
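The four steps of the ratio-2 algorithm for the 0-1 knapsack problem can be sketched as below. The function name, the `(value, weight)` tuple representation, and the sample items are assumptions made for this illustration.

```python
def knapsack_ratio2(items, W):
    """Greedy ratio-2 sketch for 0-1 knapsack.

    items -- list of (value, weight) pairs with positive integer entries
    W     -- capacity
    Returns (total_value, chosen_items).
    """
    # Step 1: delete the items that do not fit on their own.
    fitting = [(v, w) for v, w in items if w <= W]
    # Step 2: sort by value per unit weight, best first.
    fitting.sort(key=lambda vw: vw[0] / vw[1], reverse=True)
    # Step 3: take the longest prefix that fits.
    prefix, weight = [], 0
    for v, w in fitting:
        if weight + w > W:
            # (v, w) is item k+1. Step 4: compare it with the prefix.
            prefix_value = sum(pv for pv, _ in prefix)
            if v > prefix_value:
                return v, [(v, w)]
            return prefix_value, prefix
        prefix.append((v, w))
        weight += w
    # Every remaining item fits, so the prefix is the whole (optimal) set.
    return sum(v for v, _ in prefix), prefix


value, chosen = knapsack_ratio2([(60, 10), (100, 20), (120, 30)], 50)
print(value, chosen)  # 160 [(60, 10), (100, 20)]
```

For these sample items the optimum is 220 (the items worth 100 and 120), so the greedy value 160 is within the factor 2 promised by the theorem.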
26 Steiner Minimum Tree
- Steiner minimum tree in the plane:
- Input: a set of points $R$ (regular points) in the plane.
- Output: a tree with the smallest weight that contains all the nodes in $R$.
- Weight: the weight of an edge connecting two points $(x_1, y_1)$ and $(x_2, y_2)$ in the plane is defined as the Euclidean distance $\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$.
27 - Example (figure): dark points are regular points.
28 Triangle inequality
- Key for our approximation algorithm.
- For any three points in the plane, we have
- $dist(a, c) \le dist(a, b) + dist(b, c)$.
- Example (figure): a triangle with vertices a, b, c and side lengths 3, 4 and 5.
29 Approximation algorithm (Steiner minimum tree in the plane)
- Compute a minimum spanning tree for $R$ as the approximate solution for the Steiner minimum tree problem.
- How good is the algorithm (in terms of the quality of the solutions)?
- Theorem: the performance ratio of the approximation algorithm is 2.
30 Proof
- We want to show that for any instance (input) $I$, $A(I) / OPT(I) \le r$ ($r \ge 1$), where $A(I)$ is the cost of the solution obtained from our spanning tree algorithm, and $OPT(I)$ is the cost of an optimal solution.
31 - Assume that $T$ is the optimal solution for instance $I$. Consider a traversal of $T$.
- Each edge in $T$ is visited at most twice. Thus, the total weight of the traversal is at most twice the weight of $T$, i.e.,
- $w(traversal) \le 2w(T) = 2\,OPT(I)$.   (1)
32 - Based on the traversal, we can get a spanning tree $ST$ as follows: directly connect two nodes in $R$ based on the visited order of the traversal. From the triangle inequality,
- $w(ST) \le w(traversal) \le 2\,OPT(I)$.   (2)
33 - Inequality (2) says that the cost of the spanning tree $ST$ is less than or equal to twice the cost of an optimal solution.
- So, if we can compute $ST$, then we can get a solution with cost $\le 2\,OPT(I)$. (Great! But finding $ST$ may also be very hard, since $ST$ is obtained from the optimal solution $T$, which we do not know.)
- We can find a minimum spanning tree $MST$ for $R$ in polynomial time.
- By the definition of $MST$, $w(MST) \le w(ST) \le 2\,OPT(I)$.
- Therefore, the performance ratio is 2.
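The approximation algorithm itself is just a minimum spanning tree computation on the regular points. A minimal sketch using Prim's algorithm on the complete Euclidean graph is given below; the choice of Prim's algorithm, the function name, and the sample points are assumptions of this illustration (any MST algorithm would do).

```python
import math


def euclidean_mst(points):
    """Prim's algorithm on the complete Euclidean graph, O(n^2) time.

    points -- list of (x, y) tuples (the regular points R)
    Returns (total_weight, edges), with edges as pairs of point indices.
    """
    n = len(points)
    if n == 0:
        return 0.0, []
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest known connection to the tree
    parent = [-1] * n
    best[0] = 0.0
    total, edges = 0.0, []
    for _ in range(n):
        # Pick the closest point that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        if parent[u] != -1:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return total, edges


weight, tree = euclidean_mst([(0, 0), (1, 0), (0, 1), (1, 1)])
print(weight, tree)
```

The returned weight plays the role of A(I) in the proof above; the sketch does not compute OPT(I), which is the hard part of the problem.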
34 Story
- The method was known a long time ago. The performance ratio was conjectured to be $2/\sqrt{3}$.
- Du and Hwang (1990) proved that the conjecture is true.
35 Graph Steiner minimum tree
- Input: a graph $G = (V, E)$, a weight $w(e)$ for each $e \in E$, and a subset $R \subseteq V$.
- Output: a tree with minimum weight that contains all the nodes in $R$.
- The nodes in $R$ are called regular points. Note that the Steiner minimum tree could contain some nodes in $V - R$; the nodes in $V - R$ are called Steiner points.
36 - Example: let $G$ be shown in Figure a, with $R = \{a, b, c\}$. The Steiner minimum tree $T = \{(a, d), (b, d), (c, d)\}$ is shown in Figure b.
- Theorem: the graph Steiner minimum tree problem is NP-complete.
37 Approximation algorithm (Graph Steiner minimum tree)
- 1. For each pair of nodes $u$ and $v$ in $R$, compute the shortest path from $u$ to $v$ and assign the cost of the shortest path as the length of edge $(u, v)$. (This gives a complete graph on $R$.)
- 2. Compute a minimum spanning tree for the modified complete graph.
- 3. Include the nodes in the shortest paths used.
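38 - Theorem: the performance ratio of this algorithm is 2.
- Proof:
- We only have to prove that the triangle inequality holds in the modified complete graph. If
- $dist(a, c) > dist(a, b) + dist(b, c)$   (3)
- then we could modify the path from $a$ to $c$ to go
- $a \to b \to c$,
- which would be shorter than the shortest path from $a$ to $c$. Thus, (3) is impossible.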
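39 [Figure: the given graph, with nodes a, b, c, d, e, f, g]
40 [Figure: the modified complete graph on a, b, c, d, with edge labels e-c-g /7, g /3, e /4, f /2, e /3, f-c-g /5]
41 [Figure: the minimum spanning tree on a, b, c, d, with edge labels g /3, f /2, e /3]
42 [Figure: the approximate Steiner tree, with nodes a, b, c, d, e, f, g and edge weights 2, 1, 2, 1, 1, 1]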
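The three steps of the graph Steiner tree approximation can be sketched as follows. This is an illustration under stated assumptions: Floyd-Warshall is used for the shortest paths and Prim's algorithm for the spanning tree, the function name and the edge weights of the sample graph are hypothetical, and the graph is assumed to be connected. The sketch returns the union of the shortest paths used and does not prune cycles that such a union could contain in general.

```python
import math


def steiner_tree_approx(nodes, weighted_edges, terminals):
    """Shortest paths between terminals, MST on them, then expand the paths.

    nodes          -- list of node labels
    weighted_edges -- list of (u, v, w) undirected edges with w > 0
    terminals      -- the regular points R
    Returns (weight, edges) with edges as a set of frozensets {u, v}.
    """
    edge_w = {}
    for u, v, w in weighted_edges:
        key = frozenset((u, v))
        edge_w[key] = min(w, edge_w.get(key, math.inf))
    # Step 1: all-pairs shortest paths with a next-hop table.
    dist = {u: {v: (0 if u == v else math.inf) for v in nodes} for u in nodes}
    nxt = {u: {v: None for v in nodes} for u in nodes}
    for key, w in edge_w.items():
        u, v = tuple(key)
        dist[u][v] = dist[v][u] = w
        nxt[u][v], nxt[v][u] = v, u
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]
    # Step 2: Prim's algorithm on the complete graph over the terminals.
    terminals = list(terminals)
    in_tree = {terminals[0]}
    chosen = set()
    while len(in_tree) < len(terminals):
        u, v = min(
            ((a, b) for a in in_tree for b in terminals if b not in in_tree),
            key=lambda p: dist[p[0]][p[1]],
        )
        in_tree.add(v)
        # Step 3: include the edges (and nodes) of the shortest path used.
        node = u
        while node != v:
            step = nxt[node][v]
            chosen.add(frozenset((node, step)))
            node = step
    return sum(edge_w[e] for e in chosen), chosen


# Hypothetical weights: star edges to d cost 1, direct terminal edges cost 3.
g_edges = [("a", "d", 1), ("b", "d", 1), ("c", "d", 1),
           ("a", "b", 3), ("b", "c", 3), ("a", "c", 3)]
weight, tree = steiner_tree_approx(["a", "b", "c", "d"], g_edges, ["a", "b", "c"])
print(weight, sorted(tuple(sorted(e)) for e in tree))
```

With these sample weights every shortest path between two terminals runs through d, so the expanded tree uses d as a Steiner point.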
43 Approximation Algorithm for TSP with triangle inequality
- Given $n$ points (cities) in a plane, find a shortest tour that visits each city exactly once.
- Assumption: the triangle inequality holds. That is, $d(a, c) \le d(a, b) + d(b, c)$.
- This condition is reasonable, for example, whenever the cities are points in the plane and the distance between two points is the Euclidean distance.
- Theorem: TSP with triangle inequality is also NP-hard.
44 Ratio 2 Algorithm
- Algorithm A:
- 1. Compute a minimum spanning tree (Figure a).
- 2. Visit all the cities by traversing twice around the tree. This visits some cities more than once (Figure b).
- 3. Shortcut the tour by going directly to the next unvisited city (Figure c).
46 Proof of Ratio 2
- The cost of a minimum spanning tree, cost(T), is not greater than opt(TSP), the cost of an optimal TSP tour. (Why? There are n-1 edges in a spanning tree and n edges in a TSP tour. Deleting one edge from the TSP tour gives a spanning tree, and the minimum spanning tree has the smallest cost among spanning trees.)
- By the triangle inequality, the cost of the TSP tour produced by our algorithm is at most 2 cost(T), and thus at most 2 opt(TSP).
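Algorithm A can be sketched for points in the plane as follows. The sketch relies on the observation that traversing twice around the tree and shortcutting to the next unvisited city yields a preorder walk of the spanning tree; the function names, the use of Prim's algorithm, and the sample points are assumptions of this illustration.

```python
import math


def tsp_mst_tour(points):
    """MST-based ratio-2 tour sketch for Euclidean TSP.

    points -- list of (x, y) tuples
    Returns the tour as a list of point indices (the return to the
    start is implied).
    """
    n = len(points)
    if n == 0:
        return []
    # Step 1: minimum spanning tree rooted at point 0 (Prim's algorithm).
    in_tree = [False] * n
    best = [math.inf] * n
    parent = [-1] * n
    children = [[] for _ in range(n)]
    best[0] = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            children[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    # Steps 2 and 3: a preorder walk visits the cities in the order in which
    # the doubled tree traversal first reaches them (shortcuts included).
    tour, stack = [], [0]
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    return tour


def tour_length(points, tour):
    """Length of the closed tour, including the edge back to the start."""
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


cities = [(0, 0), (1, 0), (1, 1), (0, 1)]
tour = tsp_mst_tour(cities)
print(tour, tour_length(cities, tour))
```

On the unit square used here the optimal tour has length 4, so any output of length at most 8 is consistent with the ratio-2 bound.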
47 - Center Selection Problem
- Problem: given a set of points $V$ in the plane (or some other metric space), find $k$ points $c_1, c_2, \ldots, c_k$ such that for each $v$ in $V$,
- $\min_{i = 1, 2, \ldots, k} d(v, c_i) \le d$
- and $d$ is minimized.
48 - Farthest-point clustering algorithm
- Step 1: arbitrarily select a point in $V$ as $c_1$.
- Step 2: let $i = 2$.
- Step 3: pick a point $c_i$ from $V - \{c_1, c_2, \ldots, c_{i-1}\}$ to maximize $\min\{|c_1 c_i|, |c_2 c_i|, \ldots, |c_{i-1} c_i|\}$.
- Step 4: $i = i + 1$.
- Step 5: repeat Steps 3 and 4 until $k$ centers have been chosen.
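49 - Theorem: the farthest-point clustering algorithm has ratio 2.
- Proof: let $c_i$ be a point in $V$ that maximizes
- $\delta_i = \min\{|c_1 c_i|, |c_2 c_i|, \ldots, |c_{i-1} c_i|\}$.
- We have $\delta_i \le \delta_{i-1}$ for any $i$.
- Since two, say $c_i$ and $c_j$ ($i > j$), of the $k+1$ points $c_1, \ldots, c_{k+1}$ must be in the same group (in an opt solution), $\delta_i \le |c_j c_i| \le 2\,opt$.
- Thus, $\delta_{k+1} \le \delta_i \le 2\,opt$.
- For any $v$ in $V$, by the definition of $\delta_{k+1}$,
- $\min\{|c_1 v|, |c_2 v|, \ldots, |c_k v|\} \le \delta_{k+1}$.
- So the algorithm has ratio 2.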
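The farthest-point clustering algorithm can be sketched in a few lines. The function name, the choice of the first input point as the arbitrary first center, and the sample points are assumptions of this illustration; the sketch also assumes k is between 1 and the number of points.

```python
import math


def farthest_point_centers(points, k):
    """Farthest-point clustering sketch for points in the plane.

    points -- list of (x, y) tuples (the set V)
    k      -- number of centers, assumed 1 <= k <= len(points)
    Returns (centers, radius), where radius is the largest distance from
    any point to its nearest chosen center.
    """
    centers = [points[0]]  # Step 1: arbitrary first center (here: the first point)
    # nearest[i] = distance from points[i] to its closest chosen center
    nearest = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        # Step 3: the next center is the point farthest from all chosen centers.
        idx = max(range(len(points)), key=lambda i: nearest[i])
        centers.append(points[idx])
        nearest = [
            min(nearest[i], math.dist(points[i], points[idx]))
            for i in range(len(points))
        ]
    return centers, max(nearest)


centers, radius = farthest_point_centers([(0, 0), (1, 0), (10, 0), (11, 0)], 2)
print(centers, radius)  # [(0, 0), (11, 0)] 1.0
```

Keeping the `nearest` array up to date means each new center costs one pass over the points, for O(kn) distance computations in total.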
50 Vertex Cover Problem
- Given a graph $G = (V, E)$, find $V' \subseteq V$ with the minimum number of vertices such that for each edge $(u, v) \in E$, at least one of $u$ and $v$ is in $V'$.
- $V'$ is called a vertex cover.
- The problem is NP-hard.
- A ratio-2 algorithm exists for the vertex cover problem.
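The slides do not say which ratio-2 algorithm is meant. One standard technique with this guarantee, shown here only as an assumed illustration, takes both endpoints of every edge in a greedily built maximal matching; the function name and the sample graph are hypothetical.

```python
def vertex_cover_ratio2(edges):
    """Maximal-matching sketch for vertex cover.

    edges -- iterable of undirected edges (u, v)
    Returns a set of vertices that touches every edge.
    """
    cover = set()
    for u, v in edges:
        # An uncovered edge joins the matching; take both of its endpoints.
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover


path = [(0, 1), (1, 2), (2, 3)]
print(vertex_cover_ratio2(path))  # {0, 1, 2, 3}
```

Any vertex cover must contain at least one endpoint of each matched edge, and the matched edges share no vertices, so the returned set is at most twice the optimum (here 4 against the optimal cover {1, 2} of size 2).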