NP-Complete Problems
  • Polynomial time vs. exponential time
  • Polynomial: $O(n^k)$, where n is the input size
    (e.g., number of nodes in a graph, the length of
    strings, etc.) of our problem and k is a constant
    (e.g., k = 1, 2, 3, etc.).
  • Exponential time: $2^n$ or $n^n$.
  • n:     2, 10, 20, 30
  • $2^n$: 4, 1024, about 1 million, about 1000 million
  • Suppose our computer can solve a problem of size
    k (i.e., compute $2^k$ operations) in an
    hour/week/month. If a new computer is 1024
    times faster than ours, then the new computer can
    solve a problem of size k + 10 in the same time.
    The improvement is very small.
  • Hardware improvement is of little use for solving
    problems that require exponential running time.
  • Exponential running time is considered
    inefficient.
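The arithmetic behind the size k + 10 claim can be written out as a short worked step, assuming the running time is exactly $2^k$ operations:

```latex
\[
  1024 \cdot 2^{k} \;=\; 2^{10} \cdot 2^{k} \;=\; 2^{k+10}
\]
```

For comparison, with an $O(n^2)$ algorithm the same 1024-fold speedup multiplies the solvable input size by $\sqrt{1024} = 32$.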

  • All algorithms we have studied so far are
    polynomial time algorithms.
  • Fact: people have not yet found any polynomial
    time algorithms for some famous problems (e.g.,
    Hamilton Circuit, longest simple path, Steiner
    trees).
  • Question: Do there exist polynomial time
    algorithms for those famous problems?
  • Answer: Nobody knows.

  • Research topic: Prove that polynomial time
    algorithms do not exist for those famous
    problems, e.g., the Hamilton circuit problem.
  • You can get the Turing Award if you can give the
    proof.
  • In order to answer the above question, people
    define two classes of problems, class P and class
    NP.
  • To answer whether P = NP, a rich area,
    NP-completeness theory, has been developed.

Class P and Class NP
  • Class P contains those problems that are solvable
    in polynomial time.
  • They are problems that can be solved in $O(n^k)$
    time, where n is the input size and k is a
    constant.
  • Class NP consists of those problems that are
    verifiable in polynomial time.
  • What we mean here is that if we were somehow
    given a solution, then we can verify that the
    solution is correct in time polynomial in the
    input size to the problem.
  • Example (Hamilton Circuit): given an order of the
    n distinct vertices $(v_1, v_2, \ldots, v_n)$, we can test
    whether $(v_i, v_{i+1})$ is an edge in G for $i = 1, 2, \ldots, n-1$
    and $(v_n, v_1)$ is an edge in G in time O(n)
    (polynomial in the input size).
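The verification step for Hamilton Circuit can be sketched in Python. This is a minimal illustration, assuming vertices are numbered 0..n-1 and the graph is given as a list of undirected edges; the function name `is_hamilton_circuit` is made up for this example. The permutation check uses sorting, so the sketch runs in O(n log n) rather than O(n), which is still polynomial.

```python
def is_hamilton_circuit(n, edges, order):
    """Return True if `order` lists all n vertices exactly once and
    consecutive vertices (wrapping around) are joined by an edge."""
    # Store undirected edges in a set for O(1) membership tests.
    edge_set = {frozenset(e) for e in edges}
    # The certificate must be a permutation of the vertices 0..n-1.
    if sorted(order) != list(range(n)):
        return False
    # Check (v_i, v_{i+1}) for every i, plus the closing edge (v_n, v_1).
    return all(
        frozenset((order[i], order[(i + 1) % n])) in edge_set
        for i in range(n)
    )

square = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(is_hamilton_circuit(4, square, [0, 1, 2, 3]))  # True
print(is_hamilton_circuit(4, square, [0, 2, 1, 3]))  # False: (0, 2) is not an edge
```

Checking a proposed order is easy; the hard part, for which no polynomial time method is known, is finding such an order.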

Class P and Class NP
  • Based on the definitions, $P \subseteq NP$.
  • If we can design a polynomial time algorithm for
    problem A, then problem A is in P.
  • However, if we have not been able to design a
    polynomial time algorithm for problem A, then
    there are two possibilities:
  • a polynomial time algorithm does not exist for
    problem A, or
  • we are not smart enough to find one.
  • Open problem: is P = NP?
  • The Clay Mathematics Institute offers a 1 million
    dollar prize for the answer.

Polynomial-Time Reductions
  • Suppose we have a black box (an algorithm) that
    could solve instances of a problem X: if we give
    the input of an instance of X, then in a single
    step, the black box will return the correct
    answer.
  • Question:
  • Can arbitrary instances of problem Y be solved
    using polynomial number of standard computational
    steps, plus a polynomial number of calls to a
    black box that solves problem X?
  • If yes, then Y is polynomial-time reducible to X.

  • A problem X is NP-complete if it is in NP and
    any problem Y in NP has a polynomial time
    reduction to X.
  • It is among the hardest problems in NP.
  • If an NP-complete problem can be solved in
    polynomial time, then any problem in class NP
    can be solved in polynomial time.
  • The first NPC problem is the Satisfiability problem.
  • It was proved NP-complete by Cook in 1971, who
    received the Turing Award for this work.

Boolean formula
  • A boolean formula f(x1, x2, ..., xn), where the xi are
    boolean variables (either 0 or 1), contains
    boolean variables and the boolean operations AND, OR
    and NOT.
  • Clause: variables and their negations
    connected with the OR operation, e.g., (x1 OR NOT x2
    OR x5).
  • Conjunctive normal form of a boolean formula:
    m clauses connected with AND.
  • Example:
    (x1 OR NOT x2) AND (x1 OR NOT x3 OR x6) AND
    (x2 OR x6) AND (NOT x3 OR x5).
  • Here we have four clauses.

Satisfiability problem
  • Input: a conjunctive normal form with n variables,
    x1, x2, ..., xn.
  • Problem: find an assignment of x1, x2, ..., xn
    (setting each xi to be 0 or 1) such that the
    formula is true (satisfied).
  • Example: the conjunctive normal form is
    (x1 OR NOT x2) AND (NOT x1 OR x3).
  • The formula is true for the assignment
    x1 = 1, x2 = 0, x3 = 1.
  • Note: for n Boolean variables, there are $2^n$
    assignments.
  • Testing if formula = 1 can be done in polynomial
    time for any given assignment.
  • Given an assignment that satisfies formula1 is
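The contrast between checking one assignment and searching all $2^n$ of them can be sketched as follows. The clause encoding (a positive integer i for xi, a negative integer -i for NOT xi) and the function names are assumptions made for this illustration, not part of the slides.

```python
from itertools import product

# A clause is a list of non-zero ints: +i means x_i, -i means NOT x_i.
def satisfies(cnf, assignment):
    """Check, in time linear in the formula size, whether `assignment`
    (a dict {variable index: 0 or 1}) makes every clause true."""
    return all(
        any(assignment[abs(lit)] == (1 if lit > 0 else 0) for lit in clause)
        for clause in cnf
    )

def brute_force_sat(cnf, n):
    """Try all 2**n assignments; return a satisfying one or None."""
    for bits in product((0, 1), repeat=n):
        assignment = {i + 1: bits[i] for i in range(n)}
        if satisfies(cnf, assignment):
            return assignment
    return None

# (x1 OR NOT x2) AND (NOT x1 OR x3), the example from the text.
cnf = [[1, -2], [-1, 3]]
print(satisfies(cnf, {1: 1, 2: 0, 3: 1}))  # True
print(brute_force_sat(cnf, 3))
```

The verifier touches each literal once, while the search loop may run $2^n$ times in the worst case.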

The First NP-complete Problem
  • Theorem: The Satisfiability problem is NP-complete.
  • It is the first NP-complete problem.
  • S. A. Cook, 1971.
  • He won the Turing Award for this work.
  • Significance:
  • If the Satisfiability problem can be solved in
    polynomial time, then ALL problems in class NP
    can be solved in polynomial time.
  • If you want to settle whether P = NP, then you should work
    on NPC problems such as the satisfiability problem.
  • We can use the first NPC problem, the Satisfiability
    problem, to show that other problems are also
    NP-complete.

How to show that a problem is NPC?
  • To show that problem A is NP-complete, we can:
  • First, find a problem B that has been proved to be
    NP-complete.
  • Then show that if problem A can be solved in
    polynomial time, then problem B can also be
    solved in polynomial time.
  • That is, give a polynomial time reduction from
    B to A.
  • Remark: since an NPC problem, problem B, is among the
    hardest in class NP, problem A is also among the hardest.
    (Problem A must also be shown to be in NP.)

Hamilton circuit and Longest Simple Path
  • Hamilton circuit: a circuit that uses every vertex
    of the graph exactly once, except for the last
    vertex, which duplicates the first vertex.
  • It was shown to be NP-complete.
  • Longest Simple Path:
  • Input: a set of nodes $V = \{v_1, v_2, \ldots, v_n\}$
    in a graph, the distance $d(v_i, v_j)$ between
    $v_i$ and $v_j$, and two nodes u and v. Find a longest
    simple path from u to v.
  • Theorem 2: The longest simple path problem is
    NP-complete.

Theorem 2: The longest simple path (LSP) problem
is NP-complete.
  • Proof:
  • Hamilton Circuit Problem (HC): given a graph
    $G = (V, E)$, find a Hamilton circuit.
  • We want to show that if we can solve the longest
    simple path problem in polynomial time, then we
    can also solve the Hamilton circuit problem in
    polynomial time.
  • Design a polynomial time algorithm to solve HC by
    using an algorithm for LSP.
  • Step 0: Set the length of each edge in G to be 1.
  • Step 1: for each edge $(u, v) \in E$, find the
    longest simple path P from u to v in G.
  • Step 2: if the length of P is n-1, then by
    adding edge (u, v) we obtain a Hamilton
    circuit in G.
  • Step 3: if no Hamilton circuit is found for
    any edge (u, v), then print "no Hamilton circuit".
  • Conclusion:
  • If LSP can be solved in polynomial time, then HC
    can also be solved in polynomial time.
  • Since HC was proved to be NP-complete, LSP is
    also NP-complete.
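The reduction in Steps 0 to 3 can be sketched in Python. Since no polynomial time LSP algorithm is known, `longest_simple_path` below is only a brute-force stand-in for the assumed LSP solver; the function names and the adjacency representation (a dict mapping each vertex to its set of neighbours) are illustrative assumptions. The sketch also assumes a simple graph with at least three vertices.

```python
from itertools import permutations

def longest_simple_path(adj, u, v):
    """Stand-in LSP oracle (exponential brute force): return the longest
    simple path from u to v as a vertex list, every edge having length 1."""
    others = [x for x in adj if x not in (u, v)]
    # Try the longest candidate paths first.
    for r in range(len(others), -1, -1):
        for middle in permutations(others, r):
            path = [u, *middle, v]
            if all(b in adj[a] for a, b in zip(path, path[1:])):
                return path
    return None

def hamilton_circuit(adj):
    """Steps 0-3: one LSP call per edge (u, v); only polynomially many
    calls are made, so HC is polynomial whenever the oracle is."""
    n = len(adj)
    if n < 3:  # assumption of this sketch: a circuit needs 3+ vertices
        return None
    for u in adj:
        for v in adj[u]:
            path = longest_simple_path(adj, u, v)
            # n - 1 unit-length edges means the path visits all n vertices.
            if path is not None and len(path) - 1 == n - 1:
                return path + [u]  # close the circuit with edge (v, u)
    return None

square = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
print(hamilton_circuit(square))
```

Everything outside the oracle is a double loop over the edges, which is the point of the reduction.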

Some basic NP-complete problems
  • 3-Satisfiability: each clause contains at most
    three variables or their negations.
  • Vertex Cover: given a graph $G = (V, E)$, find a
    subset V' of V such that for each edge (u, v) in
    E, at least one of u and v is in V', and the size
    of V' is minimized.
  • Hamilton Circuit (definition was given before).
  • History: Satisfiability → 3-Satisfiability → Vertex
    Cover → Hamilton Circuit.
  • Those proofs are very hard.

Approximation Algorithms
  • Concepts
  • Knapsack
  • Steiner Minimum Tree
  • TSP
  • Vertex Cover

Concepts of Approximation Algorithms
  • Optimization problem:
  • The solution of the problem is associated with a
    cost (value).
  • We want to maximize the cost or minimize the
    cost.
  • Minimum spanning tree and shortest path are
    optimization problems.
  • The Euler circuit problem is NOT an optimization
    problem. (It is a decision problem.)

Approximation Algorithm
  • An algorithm A is an approximation algorithm if,
    given any instance I, it finds a candidate
    solution s(I).
  • How good is an approximation algorithm?
  • We use the performance ratio to measure the quality
    of an approximation algorithm.

Performance ratio
  • For a minimization problem, the performance ratio
    of algorithm A is defined as a number r such that
    for any instance I of the problem,
    $A(I) / OPT(I)$ is at most r ($r \ge 1$),
    where OPT(I) is the value of the optimal solution
    for instance I and A(I) is the value of the
    solution returned by algorithm A on instance I.

Performance ratio
  • For a maximization problem, the performance ratio
    of algorithm A is defined as a number r such that
    for any instance I of the problem,
    $OPT(I) / A(I)$ is at most r ($r \ge 1$),
    where OPT(I) is the value of the optimal solution
    for instance I and A(I) is the value of the
    solution returned by algorithm A on instance I.

Simplified Knapsack Problem
  • Given a finite set U of items, a size $s(u) \in \mathbb{Z}^+$
    for each $u \in U$, and a capacity $B \ge \max\{s(u) : u \in U\}$,
    find a subset $U' \subseteq U$ such that
    $\sum_{u \in U'} s(u) \le B$ and such that the above summation
    is as large as possible. (It is NP-hard.)

Ratio-2 Algorithm
  • Sort the items u by s(u) in increasing order.
  • Select the smallest remaining u until no more u
    can be added.
  • Compare the total size of the selected items with
    the item of the largest size, and select the
    larger one.
  • Theorem: The algorithm has performance ratio 2.

  • Proof:
  • Case 1: the total of the selected items $\ge 0.5B$ (we get
    ratio 2, since the optimum is at most B).
  • Case 2: the total of the selected items $< 0.5B$.
  • If no item remains, we get the optimum.
  • If some items remain, the size of the
    smallest remaining item is $> 0.5B$. (Otherwise, we
    could add it in.)
  • Selecting the largest item then gives ratio 2.
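The three steps of this greedy algorithm can be sketched as below. This is an illustrative version, assuming a non-empty list of positive integer sizes, each at most the capacity B; the function name is made up for the example.

```python
def simplified_knapsack(sizes, capacity):
    """Return the chosen sizes. Assumes a non-empty list in which every
    size is at most `capacity`."""
    chosen, total = [], 0
    for s in sorted(sizes):
        if total + s > capacity:
            break  # every later item is at least as large, so stop
        chosen.append(s)
        total += s
    # Compare the greedy total with the single largest item.
    largest = max(sizes)
    return chosen if total >= largest else [largest]

print(simplified_knapsack([1, 2, 10], 10))  # [10]
print(simplified_knapsack([3, 4, 5], 10))   # [3, 4]
```

In the second call the optimum is 4 + 5 = 9, and the returned total of 7 is within a factor of 2 of it.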

The 0-1 Knapsack problem
  • The 0-1 knapsack problem:
  • N items, where the i-th item is worth $v_i$ dollars
    and weighs $w_i$ pounds.
  • $v_i$ and $w_i$ are integers.
  • A thief can carry at most W (integer) pounds.
  • Goal: take as valuable a load as possible.
  • An item cannot be divided into pieces.
  • The fractional knapsack problem:
  • The same setting, but the thief can take
    fractions of items.

Ratio-2 Algorithm
  1. Delete the items i with $w_i > W$.
  2. Sort the items in decreasing order of $v_i / w_i$.
  3. Select the first k items, item 1, item 2, ..., item
    k, such that
    $w_1 + w_2 + \cdots + w_k \le W$ and $w_1 + w_2 + \cdots + w_k + w_{k+1} > W$.
  4. Compare $v_{k+1}$ with $v_1 + v_2 + \cdots + v_k$ and select the
    larger one.
  • Theorem: The algorithm has performance ratio 2.

Proof of ratio 2
  • C(opt): the cost of an optimum solution.
  • C(fopt): the optimal cost of the fractional
    version.
  • $C(opt) \le C(fopt)$.
  • $v_1 + v_2 + \cdots + v_k + v_{k+1} > C(fopt)$.
  • So either $v_1 + v_2 + \cdots + v_k > 0.5\,C(fopt) \ge 0.5\,C(opt)$
  • or $v_{k+1} > 0.5\,C(fopt) \ge 0.5\,C(opt)$.
  • Since the algorithm chooses the larger one of
    $v_1 + v_2 + \cdots + v_k$ and $v_{k+1}$,
  • we know that the cost of the solution obtained by
    the algorithm is at least $0.5\,C(fopt) \ge 0.5\,C(opt)$.
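The four steps of the ratio-2 algorithm for the 0-1 knapsack problem can be sketched as follows. The representation of items as (value, weight) pairs and the function name are assumptions for illustration; weights are assumed positive. If every remaining item fits, there is no item k+1 and the whole prefix is returned.

```python
def knapsack_ratio2(items, W):
    """items: list of (value, weight) pairs with positive weights.
    Returns (total value, chosen items)."""
    # Step 1: drop items that cannot fit at all.
    fits = [(v, w) for v, w in items if w <= W]
    # Step 2: sort by value density v/w, highest first.
    fits.sort(key=lambda vw: vw[0] / vw[1], reverse=True)
    # Step 3: take the longest prefix whose total weight is at most W.
    prefix, weight, k = [], 0, 0
    while k < len(fits) and weight + fits[k][1] <= W:
        prefix.append(fits[k])
        weight += fits[k][1]
        k += 1
    prefix_value = sum(v for v, _ in prefix)
    # Step 4: compare the prefix with item k+1 alone (if there is one).
    if k < len(fits) and fits[k][0] > prefix_value:
        return fits[k][0], [fits[k]]
    return prefix_value, prefix

print(knapsack_ratio2([(60, 10), (100, 20), (120, 30)], 50))
```

On this instance the sketch returns value 160, while the optimum is 220 (the second and third items), so the result is within the factor of 2.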

Steiner Minimum Tree
  • Steiner minimum tree in the plane
  • Input: a set of points R (regular points) in the
    plane.
  • Output: a tree with smallest weight which
    contains all the nodes in R.
  • Weight: the weight of an edge connecting two points
    $(x_1, y_1)$ and $(x_2, y_2)$ in the plane is defined as
    the Euclidean distance $\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$.

  • Example: dark points are regular points. (Figure)

Triangle inequality
  • Key for our approximation algorithm.
  • For any three points a, b, c in the plane, we have
  • $dist(a, c) \le dist(a, b) + dist(b, c)$.
  • Examples

Approximation algorithm (Steiner minimum tree in
the plane)
  • Compute a minimum spanning tree for R as the
    approximation solution for the Steiner minimum
    tree problem.
  • How good is the algorithm? (in terms of the
    quality of the solutions)
  • Theorem: The performance ratio of the
    approximation algorithm is 2.

  • We want to show that for any instance (input) I,
    $A(I)/OPT(I) \le r$ ($r \ge 1$), where A(I) is the cost
    of the solution obtained from our spanning tree
    algorithm, and OPT(I) is the cost of an optimal
    solution.

  • Assume that T is the optimal solution for
    instance I. Consider a traversal of T.
  • Each edge in T is visited at most twice. Thus,
    the total weight of the traversal is at most
    twice the weight of T, i.e.,
  • $w(\mathrm{traversal}) \le 2\,w(T) = 2\,OPT(I)$.   (1)

  • Based on the traversal, we can get a spanning
    tree ST as follows: directly connect two nodes
    in R based on the visited order of the traversal.

From the triangle inequality, $w(ST) \le w(\mathrm{traversal}) \le 2\,OPT(I)$.   (2)
  • Inequality (2) says that the cost of the spanning
    tree ST is less than or equal to twice the
    cost of an optimal solution.
  • So, if we can compute ST, then we can get a
    solution with cost $\le 2\,OPT(I)$. (Great! But finding
    ST may also be very hard, since ST is obtained
    from the optimal solution T, which we do not
    know.)
  • We can find a minimum spanning tree MST for R in
    polynomial time.
  • By the definition of MST, $w(MST) \le w(ST) \le 2\,OPT(I)$.
  • Therefore, the performance ratio is 2.

  • The method was known long time ago. The
    performance ratio was conjectured to be
  • Du and Hwang (1990) proved that the conjecture
    is true.

Graph Steiner minimum tree
  • Input: a graph $G = (V, E)$, a weight w(e) for each
    $e \in E$, and a subset $R \subseteq V$.
  • Output: a tree with minimum weight which contains
    all the nodes in R.
  • The nodes in R are called regular points. Note
    that the Steiner minimum tree could contain some
    nodes in $V - R$; the nodes in $V - R$ are called
    Steiner points.

  • Example: let G be as shown in Figure a and $R = \{a, b, c\}$.
    The Steiner minimum tree is $T = \{(a,d), (b,d), (c,d)\}$,
    which is shown in Figure b.
  • Theorem: The graph Steiner minimum tree problem is
    NP-hard.

Approximation algorithm (Graph Steiner minimum tree)
  1. For each pair of nodes u and v in R, compute the
    shortest path from u to v and assign the cost of
    the shortest path from u to v as the length of
    edge (u, v). (This gives a complete graph on R.)
  2. Compute a minimum spanning tree for the modified
    complete graph.
  3. Include the nodes in the shortest paths used.
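The three steps can be sketched in Python as below. This is a minimal illustration under several assumptions: the graph is connected and given as a dict of dicts with positive weights, node labels are mutually comparable (e.g., all strings), shortest paths are computed with Dijkstra's algorithm, and the spanning tree with a simple version of Prim's algorithm. The function names are made up for the example. In general the union of the expanded paths may contain a cycle; this sketch follows only the three steps above and does not prune it.

```python
import heapq

def dijkstra(graph, source):
    """Shortest distances and predecessors from `source`.
    graph: dict node -> dict neighbor -> positive weight."""
    dist, prev = {source: 0}, {}
    heap = [(0, source)]
    while heap:
        d, x = heapq.heappop(heap)
        if d > dist[x]:
            continue  # stale heap entry
        for y, w in graph[x].items():
            if d + w < dist.get(y, float("inf")):
                dist[y], prev[y] = d + w, x
                heapq.heappush(heap, (d + w, y))
    return dist, prev

def steiner_approx(graph, regular):
    """Return the edge set (as frozensets) of an approximate Steiner tree."""
    regular = list(regular)
    # Step 1: shortest paths between regular points (the complete graph).
    paths = {r: dijkstra(graph, r) for r in regular}
    # Step 2: Prim's algorithm on the complete graph over `regular`.
    in_tree, mst = {regular[0]}, []
    while len(in_tree) < len(regular):
        u, v = min(
            ((a, b) for a in in_tree for b in regular if b not in in_tree),
            key=lambda ab: paths[ab[0]][0][ab[1]],
        )
        in_tree.add(v)
        mst.append((u, v))
    # Step 3: replace each MST edge (u, v) by the shortest u-v path in G.
    edges = set()
    for u, v in mst:
        prev, x = paths[u][1], v
        while x != u:
            edges.add(frozenset((x, prev[x])))
            x = prev[x]
    return edges

star = {"a": {"d": 1}, "b": {"d": 1}, "c": {"d": 1},
        "d": {"a": 1, "b": 1, "c": 1}}
print(steiner_approx(star, ["a", "b", "c"]))
```

On the star graph the spanning tree of the complete graph has weight 4, but after expanding the paths the shared edges collapse and the result is the three edges through the Steiner point d.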

  • Theorem: The performance ratio of this algorithm
    is 2.
  • Proof:
  • We only have to prove that the triangle inequality
    holds in the modified complete graph. If
  • $dist(a, c) > dist(a, b) + dist(b, c)$   (3)
  • then we could replace the path from a to c by the shorter route
  • a → b → c.
  • Thus, (3) is impossible, since dist(a, c) is the length of a shortest path.

  • Example II-1: the given graph. (Figure)

  • Example II-2: the modified complete graph. (Figure)

  • Example II-3: the minimum spanning tree. (Figure)

  • Example II-4: the approximate Steiner tree. (Figure)

Approximation Algorithm for TSP with triangle inequality
  • Given n cities (points in a plane), find a shortest tour that visits
    each city exactly once.
  • Assumption: the triangle inequality holds. That
    is, $d(a, c) \le d(a, b) + d(b, c)$.
  • This condition is reasonable, for example,
    whenever the cities are points in the plane and
    the distance between two points is the Euclidean
    distance.
  • Theorem: TSP with triangle inequality is also
    NP-hard.

Ratio 2 Algorithm
  • Algorithm A:
  • Compute a minimum spanning tree (Figure a).
  • Visit all the cities by traversing twice around
    the tree. This visits some cities more than once.
    (Figure b)
  • Shortcut the tour by going directly to the next
    unvisited city. (Figure c)
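Algorithm A can be sketched in Python for points in the plane. A preorder walk of the spanning tree visits the cities in the same order as the doubled traversal with shortcuts, so the sketch uses that directly. The function names are illustrative assumptions, and `math.dist` requires Python 3.8 or later.

```python
import math

def tsp_double_tree(points):
    """Return a tour (list of point indices, start repeated at the end)."""
    n = len(points)
    d = lambda i, j: math.dist(points[i], points[j])
    # Prim's algorithm: grow a minimum spanning tree from city 0.
    children = {i: [] for i in range(n)}
    best = {i: (d(0, i), 0) for i in range(1, n)}  # city -> (cost, parent)
    while best:
        v = min(best, key=lambda i: best[i][0])
        _, parent = best.pop(v)
        children[parent].append(v)
        for i in best:
            if d(v, i) < best[i][0]:
                best[i] = (d(v, i), v)
    # A preorder walk of the tree is the doubled traversal with shortcuts.
    tour, stack = [], [0]
    while stack:
        v = stack.pop()
        tour.append(v)
        stack.extend(reversed(children[v]))
    return tour + [0]

def tour_length(points, tour):
    """Total Euclidean length of the closed tour."""
    return sum(math.dist(points[a], points[b]) for a, b in zip(tour, tour[1:]))

corners = [(0, 0), (1, 0), (1, 1), (0, 1)]
tour = tsp_double_tree(corners)
print(tour, tour_length(corners, tour))
```

For the four corners of the unit square the minimum spanning tree has weight 3, so the guarantee is a tour of length at most 6.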

  • Example

Proof of Ratio 2
  1. The cost of a minimum spanning tree, cost(T), is
    not greater than opt(TSP), the cost of an optimal
    TSP tour. (Why? There are n-1 edges in a spanning tree and n edges
    in a TSP tour. Deleting one edge from a TSP tour gives a spanning
    tree, and a minimum spanning tree has the smallest
    cost among spanning trees.)
  2. The cost of the TSP tour produced by our algorithm is
    at most 2cost(T) and thus is at most
    2opt(TSP).

  • Center Selection Problem
  • Problem: Given a set of points V in the plane (or
    some other metric space), find k points
    $c_1, c_2, \ldots, c_k$ such that for each v in V,
  • $\min_{i = 1, 2, \ldots, k} d(v, c_i) \le d$
  • and d is minimized.

  • Farthest-point clustering algorithm
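  • Step 1: arbitrarily select a point in V as $c_1$.
  • Step 2: let i = 2.
  • Step 3: pick a point $c_i$ from $V - \{c_1, c_2, \ldots, c_{i-1}\}$
    to maximize $\min\{|c_1 c_i|, |c_2 c_i|, \ldots, |c_{i-1} c_i|\}$.
  • Step 4: i = i + 1.
  • Step 5: repeat Steps 3 and 4 until k centers have
    been selected (i.e., until i = k + 1).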
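The farthest-point clustering steps can be sketched as follows, assuming points in the plane with Euclidean distance and taking the first point as the arbitrary initial center. The function name and the returned radius are illustrative choices; `math.dist` requires Python 3.8 or later.

```python
import math

def farthest_point_centers(points, k):
    """Pick k centers from `points` (requires 1 <= k <= len(points)).
    Returns (centers, largest distance from any point to its nearest center)."""
    centers = [points[0]]  # Step 1: an arbitrary first center
    # gap[j] = distance from points[j] to its nearest chosen center
    gap = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        # Step 3: the point farthest from all current centers
        j = max(range(len(points)), key=gap.__getitem__)
        centers.append(points[j])
        gap = [min(g, math.dist(p, points[j])) for g, p in zip(gap, points)]
    return centers, max(gap)

pts = [(0, 0), (1, 0), (10, 0), (11, 0)]
print(farthest_point_centers(pts, 2))  # ([(0, 0), (11, 0)], 1.0)
```

Keeping the `gap` list up to date means each new center costs one pass over the points, for O(nk) distance computations in total.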

  • Theorem: The farthest-point clustering algorithm has
    ratio 2.
  • Proof: Let $c_i$ be a point in V that maximizes
  • $\delta_i = \min\{|c_1 c_i|, |c_2 c_i|, \ldots, |c_{i-1} c_i|\}$.
  • We have $\delta_i \le \delta_{i-1}$ for any i.
  • Since two, say $c_i$ and $c_j$ ($i > j$), of the k+1
    points $c_1, \ldots, c_{k+1}$ must be in the same group (in an opt
    solution), $\delta_i \le |c_j c_i| \le 2\,opt$.
  • Thus, $\delta_{k+1} \le 2\,opt$.
  • For any v in V, by the definition of $\delta_{k+1}$,
  • $\min\{|c_1 v|, |c_2 v|, \ldots, |c_k v|\} \le \delta_{k+1}$.
  • So the algorithm has ratio 2.

Vertex Cover Problem
  • Given a graph $G = (V, E)$, find $V' \subseteq V$ with the minimum
    number of vertices such that for each edge $(u, v) \in E$,
    at least one of u and v is in V'.
  • V' is called a vertex cover.
  • The problem is NP-hard.
  • A ratio-2 algorithm exists for vertex cover.
