Title: CSC 332 Algorithms and Data Structures
1CSC 332 Algorithms and Data Structures
Dr. Paige H. Meeker Computer Science Presbyterian
College, Clinton, SC
2NP-Complete
- There are many problems in computer science that
can be solved quickly and efficiently; we've
talked about several in class.
- NP-Complete problems are problems for which no
quick (polynomial-time) solution is known.
3Is it important?
- Suppose you are in industry. One day, your boss
tells you the company is heading into the
whohoo market and he needs a good method for
determining whether or not any given set of
specifications for the whohoo components can be
met and, if so, for constructing a design that
meets them. You are the chief algorithm
designer; you must find an efficient algorithm to
do this.
4Whohoo-ing
- Once you make sure you completely understand the
problem, you begin to research and work.
However, weeks later you are no closer to a
solution that is better than searching all
possible designs. The boss will not be happy.
What do you do?
5Whohoo-ing
- Tell the boss, "I'm dumb; find someone else."
- Tell the boss, "I can't find an efficient
solution, because no such algorithm is possible."
- Tell the boss, "I can't find an efficient
solution, and neither can any of these other
famous people."
6NP-Completeness
- Provides a straightforward technique for proving
that a given problem is just as hard as a large
number of other problems that are widely
recognized as so difficult that no expert has
been able to solve them efficiently.
7Does this solve the problem?
- You still need to find some solution for the
Whohoo problem. However, knowing it is an
NP-Complete problem will provide information
about what approach you should take and what to
avoid.
- So, what do you do?
8Semi-solving
- Use a heuristic: find some method that works in
a reasonable number of common cases.
- Solve the problem approximately instead of
exactly.
- Use an exponential algorithm anyway to find the
exact solution.
- Choose a better abstraction: don't ignore
seemingly unimportant details; they may change
an unsolvable problem into one that is
manageable.
9Problem Classification: Computational Complexity
Theory
- A subject dedicated to classifying problems by
how hard they are. There are many different
classifications, but the most common are:
- P: Problems that can be solved in polynomial
time.
- NP: Nondeterministic Polynomial Time. You
guess the solution and check in polynomial time
whether your guess was correct.
10Problem Classification: Computational Complexity
Theory
- Other classes include:
- PSPACE: Problems that can be solved using a
reasonable (polynomial) amount of memory.
- EXPTIME: Problems that can be solved in
exponential time.
- Undecidable: Problems for which it has been
proven that no algorithm exists to solve them.
11NP-Completeness
- Concerned with the first two classifications: P
vs. NP.
- NP-complete problems are the most difficult
problems in NP, in the sense that they are the
ones most likely not to be in P. The reason is
that if one could find a way to solve any
NP-complete problem quickly (in polynomial time),
then one could use that algorithm to solve every
problem in NP, including all NP-complete
problems, quickly. The complexity class
consisting of all NP-complete problems is
sometimes referred to as NP-C.
12Formal Definition
- A decision problem C is NP-complete if it is
complete for NP, meaning that:
- it is in NP, and
- it is NP-hard, i.e. every other problem in NP is
reducible to it.
- "Reducible" here means that for every problem L
in NP, there is a polynomial-time many-one
reduction: a deterministic algorithm that
transforms each instance l of L into an instance
c of C, such that the answer to c is YES if and
only if the answer to l is YES.
- To prove that an NP problem A is in fact
NP-complete, it is sufficient to show that an
already known NP-complete problem reduces to A.
- A consequence of this definition is that if we
had a polynomial-time algorithm for C, we could
solve all problems in NP in polynomial time.
13Problem Examples
- Example 1: Long simple paths. A simple path in a
graph is just one without any repeated edges or
vertices. To describe the problem of finding long
paths in terms of complexity theory, we need to
formalize it as a yes-or-no question: given a
graph G, vertices s and t, and a number k, does
there exist a simple path from s to t with at
least k edges? A solution to this problem would
then consist of such a path.
- Why is this in NP? If you're given a path, you
can quickly look at it and add up the length,
double-checking that it really is a simple path
with length at least k. This can all be done in
linear time, so certainly it can be done in
polynomial time.
- However, we don't know whether this problem is in
P; I haven't told you a good way of finding such
a path (with time polynomial in m, n, and k,
where m is the number of edges and n the number
of vertices). In fact this problem is
NP-complete, so we believe that no such algorithm
exists. (NOTE: This is not a formal proof by any
stretch of the imagination!)
- There are algorithms that solve the problem; for
instance, list all 2^m subsets of edges and check
whether any of them solves the problem. But as
far as we know there is no algorithm that runs in
polynomial time.
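The gap between checking and finding in Example 1 can be sketched in Python. This is a minimal illustration, not a recommended algorithm: the function names are made up for this sketch, the graph is assumed to be undirected and given as a list of vertex pairs, and the search enumerates vertex orderings rather than edge subsets (both take exponential time).

```python
from itertools import permutations

def is_long_simple_path(edges, s, t, k, path):
    """Check a proposed answer: is `path` a simple s-t path with at least k edges?"""
    if not path or path[0] != s or path[-1] != t:
        return False
    if len(set(path)) != len(path):          # a repeated vertex: not simple
        return False
    edge_set = {frozenset(e) for e in edges}  # assumption: undirected graph
    steps = list(zip(path, path[1:]))
    if any(frozenset(step) not in edge_set for step in steps):
        return False
    return len(steps) >= k

def find_long_simple_path(vertices, edges, s, t, k):
    """Exhaustive search: try every ordering of every subset of the other vertices."""
    others = [v for v in vertices if v not in (s, t)]
    for r in range(len(others), -1, -1):
        for middle in permutations(others, r):
            path = [s, *middle, t]
            if is_long_simple_path(edges, s, t, k, path):
                return path
    return None
```

The checker does one pass over the path, while the search may try a number of orderings that grows factorially with the number of vertices. On the 4-cycle with edges (1,2), (2,3), (3,4), (1,4), asking for a path from 1 to 4 with k = 3 finds [1, 2, 3, 4], and k = 4 finds nothing.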
14Problem Examples
- Example 2: Cryptography.
- Suppose we have an encryption function, e.g.
code = RSA(key, text). The "RSA" encryption works
by performing some simple integer arithmetic on
the code and the key, which consists of a pair
(p, q) of large prime numbers. One can perform
the encryption knowing only the product pq, but
to decrypt the code you instead need to know a
different product, (p-1)(q-1). A standard
assumption in cryptography is the "known
plaintext attack": we have the code for some
message, and we know (or can guess) the text of
that message. We want to use that information to
discover the key, so we can decrypt other
messages sent using the same key.
- Formalized as an NP problem, we simply want to
find a key for which code = RSA(key, text). If
you're given a key, you can test it by doing the
encryption yourself, so this is in NP.
- The hard question is, how do you find the key?
For the code to be strong we hope it isn't
possible to do much better than a brute force
search.
- Another common use of RSA involves "public key
cryptography": a user of the system publishes the
product pq, but doesn't publish p, q, or
(p-1)(q-1). That way anyone can send a message to
that user by using the RSA encryption, but only
the user can decrypt it. Breaking this scheme can
also be thought of as a different NP problem:
given a composite number pq, find a factorization
into smaller numbers.
- One can test a factorization quickly (just
multiply the factors back together again), so the
problem is in NP. Finding a factorization seems
to be difficult, and we think it may not be in P.
However, there is some strong evidence that it is
not NP-complete either; it seems to be one of the
(very rare) examples of problems between P and
NP-complete in difficulty.
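The "easy to test, hard to find" contrast for factoring can be sketched as follows. Both function names are hypothetical, and trial division is only the most naive search, used here for illustration; it is not how real attacks on RSA work.

```python
from math import prod

def verify_factorization(n, factors):
    """Fast check: at least two factors, each strictly between 1 and n, multiplying to n."""
    return (len(factors) >= 2
            and all(1 < f < n for f in factors)
            and prod(factors) == n)

def find_factor_by_trial_division(n):
    """Slow search: try d = 2, 3, ... up to sqrt(n).

    Returns (d, n // d) for the smallest divisor d, or None if n is prime.
    """
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d, n // d
        d += 1
    return None
```

The check is a handful of multiplications. The search takes on the order of sqrt(n) steps, which is exponential in the number of digits of n, so it is hopeless for numbers of the size used in practice. For the small example n = 3233 it returns (53, 61).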
15Problem Examples
- Example 3: Chess.
- We've seen in the news a match between the world
chess champion, Garry Kasparov, and a very fast
chess computer, Deep Blue.
- What is involved in chess programming?
Essentially the sequences of possible moves form
a tree: the first player has a choice of 20
different moves (most of which are not very
good), after each of which the second player has
a choice of many responses, and so on. Chess
playing programs work by traversing this tree,
finding what the possible consequences would be
of each different move.
- The tree of moves is not very deep: a typical
chess game might last 40 moves, and it is rare
for one to reach 200 moves. Since each move
involves a step by each player, there are at most
400 positions involved in most games. If we
traversed the tree of chess positions only to
that depth, we would only need enough memory to
store the 400 positions on a single path at a
time. This much memory is easily available on the
smallest computers you are likely to use.
- So perfect chess playing is a problem in PSPACE.
(Actually one must be more careful in
definitions. There is only a finite number of
positions in chess, so in principle you could
write down the solution in constant time. But
that constant would be very large. Generalized
versions of chess on larger boards, with a
polynomial limit on game length, are in PSPACE.)
- The reason this deep game-tree search method
can't be used in practice is that the tree of
moves is very bushy, so that even though it is
not deep it has an enormous number of vertices.
We won't run out of space if we try to traverse
it, but we will run out of time before we get
even a small fraction of the way through. Some
pruning methods, notably "alpha-beta search", can
help reduce the portion of the tree that needs to
be examined, but not enough to solve this
difficulty. For this reason, actual chess
programs instead only search a much smaller depth
(such as up to 7 moves), at which point they
don't have enough information to evaluate the
true consequences of the moves and are forced to
guess by using heuristic "evaluation functions"
that measure simple quantities such as the total
number of pieces left.
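Depth-limited search with alpha-beta pruning can be sketched on a toy game tree. This is an illustration only, not how Deep Blue works: the tree is assumed to be a nested Python list whose leaves are numeric scores, and `tree_children` and `tree_evaluate` are hypothetical stand-ins for a real move generator and evaluation function.

```python
import math

def tree_children(node):
    """Toy move generator: an inner list holds the positions reachable in one move."""
    return node if isinstance(node, list) else []

def tree_evaluate(node):
    """Toy evaluation: leaves carry their score; a cut-off inner position is guessed as 0."""
    return node if not isinstance(node, list) else 0

def alphabeta(node, depth, alpha, beta, maximizing, evaluate, children):
    """Best score reachable from `node`, looking at most `depth` moves ahead."""
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)            # out of depth: fall back on the heuristic
    if maximizing:
        best = -math.inf
        for child in kids:
            best = max(best, alphabeta(child, depth - 1, alpha, beta,
                                       False, evaluate, children))
            alpha = max(alpha, best)
            if alpha >= beta:            # opponent would never allow this line
                break
        return best
    best = math.inf
    for child in kids:
        best = min(best, alphabeta(child, depth - 1, alpha, beta,
                                   True, evaluate, children))
        beta = min(beta, best)
        if alpha >= beta:                # we already have a better option elsewhere
            break
    return best
```

Calling `alphabeta(tree, 2, -math.inf, math.inf, True, tree_evaluate, tree_children)` on the tree `[[3, 5], [2, 9], [0, 1]]` returns 3 and never looks at the leaves 9 and 1. Only the recursion path is held in memory at any time, which is the space argument made above; the running time still grows with the bushiness of the tree.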
16Problem Examples
- Example 4: Knots.
- If I give you a three-dimensional polygon (e.g.
as a sequence of vertex coordinate triples), is
there some way of twisting and bending the
polygon around until it becomes flat? Or is it
knotted?
- There is an algorithm for solving this problem,
which is very complicated and has not really been
adequately analyzed. However, it runs in at least
exponential time.
- One way of proving that certain polygons are not
knots is to find a collection of triangles
forming a surface with the polygon as its
boundary. However, this is not always possible
(without adding exponentially many new vertices)
and even when possible it's NP-complete to find
these triangles.
- There are also some heuristics based on finding a
non-Euclidean geometry for the space outside of a
knot that work very well for many knots, but are
not known to work for all knots. So this is one
of the rare examples of a problem that can often
be solved efficiently in practice even though it
is theoretically not known to be in P.
- Certain related problems in higher dimensions (is
this four-dimensional surface equivalent to a
four-dimensional sphere?) are provably
undecidable.
17Problem Examples
- Example 5: Halting problem.
- Suppose you're working on a lab for a programming
class, have written your program, and start to
run it. After five minutes, it is still going.
Does this mean it's in an infinite loop, or is it
just slow?
- It would be convenient if your compiler could
tell you that your program has an infinite loop.
However, this is an undecidable problem: there is
no program that will always correctly detect
infinite loops.
- Some people have used this idea as evidence that
people are inherently smarter than computers,
since it shows that there are problems computers
can't solve. However, it's not clear to me that
people can solve them either. Here's an example:

    #include <math.h>
    #include <stdlib.h>

    int main(void)
    {
        int x = 3;
        for (;;) {
            for (int a = 1; a <= x; a++)
                for (int b = 1; b <= x; b++)
                    for (int c = 1; c <= x; c++)
                        for (int i = 3; i <= x; i++)
                            /* stop at the first a^i + b^i = c^i */
                            if (pow(a,i) + pow(b,i) == pow(c,i))
                                exit(0);
            x++;    /* nothing found: widen the search */
        }
    }

- This program searches for solutions to Fermat's
last theorem. Does it halt? (You can assume I'm
using a multiple-precision integer package
instead of built in integers, so don't worry
about arithmetic overflow complications.) To be
able to answer this, you have to understand the
recent proof of Fermat's last theorem. There are
many similar problems for which no proof is
known, so we are clueless whether the
corresponding programs halt.
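Python's built-in integers are arbitrary precision, so the same search can be tried without any overflow worries. The sketch below is a bounded variant: the cap `max_x` is an assumption added here so that the function always returns, and it is not part of the original program.

```python
def fermat_counterexample(max_x):
    """Search for a**i + b**i == c**i with 1 <= a, b, c <= x and 3 <= i <= x.

    Tries x = 3, 4, ..., max_x and returns the first (a, b, c, i) found,
    or None if the bounded search finds nothing.
    """
    for x in range(3, max_x + 1):
        for a in range(1, x + 1):
            for b in range(1, x + 1):
                for c in range(1, x + 1):
                    for i in range(3, x + 1):
                        if a**i + b**i == c**i:
                            return a, b, c, i
    return None
```

Running it with any cap only tells us that nothing was found up to that cap. Knowing that the uncapped loop never halts takes a proof, not more running time.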
18Problems of Complexity Theory
- Does P = NP?
- If it's always easy to check a solution, should
it also be easy to find the solution?
19Why are we so interested?
- One of the most tantalizing parts of the question
of whether NP-C problems are in P is that so many
NP-C problems look very similar to problems that
we CAN solve in polynomial time. For example:
20Shortest vs Longest Simple Paths
- Given a directed graph, we can find the shortest
paths from a single source in O(VE) time. Finding
the LONGEST simple path between two vertices is
difficult. Even just determining whether a graph
contains a simple path with at least a given
number of edges is NP-C.
21Euler Tour vs. Hamiltonian Cycle
- An Euler tour of a connected, directed graph is a
cycle that traverses each edge of the graph
exactly once, though we may visit a vertex more
than once. We can determine whether a graph has
an Euler tour in O(E) time. A Hamiltonian cycle
of a directed graph G = (V, E) is a simple cycle
that contains each vertex in V. Determining
whether a graph has a Hamiltonian cycle is an
NP-C problem, even if the graph is undirected!
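The contrast can be sketched in Python. For the Euler tour, the sketch relies on the standard fact that a connected directed graph has an Euler tour exactly when every vertex has equal in-degree and out-degree; connectivity is assumed, not checked. For the Hamiltonian cycle, nothing better than trying orderings is used. Both function names are made up for this illustration.

```python
from collections import Counter
from itertools import permutations

def has_euler_tour(vertices, edges):
    """One pass over the edges; assumes the directed graph is connected."""
    out_deg = Counter(u for u, _ in edges)
    in_deg = Counter(v for _, v in edges)
    return all(out_deg[v] == in_deg[v] for v in vertices)

def has_hamiltonian_cycle(vertices, edges):
    """Brute force: fix the first vertex and try every ordering of the rest."""
    edge_set = set(edges)
    first, *rest = vertices
    for order in permutations(rest):
        cycle = [first, *order, first]
        if all((u, v) in edge_set for u, v in zip(cycle, cycle[1:])):
            return True
    return False
```

The first function does work proportional to the number of edges. The second may examine (n-1)! orderings for n vertices. A figure-eight graph made of two directed triangles sharing vertex 0 has an Euler tour but no Hamiltonian cycle, since any cycle through all five vertices would have to pass through vertex 0 twice.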
22 2-CNF Satisfiability vs. 3-CNF Satisfiability
- A boolean formula contains variables whose values
are 0 or 1; connectives such as AND, OR, and NOT;
and parentheses. A boolean formula is
satisfiable if you can assign the values 0 or 1
to the variables in such a way that the formula
evaluates to 1 (true). In conjunctive normal form
(CNF), the formula is an AND of parenthesized
clauses, each of which is an OR of variables or
their negations. If there are 2 variables per set
of (), we can solve this problem in polynomial
time. If there are 3 or more, the problem is
NP-C.
23
- The theory of NP-completeness is a solution to
the practical problem of applying complexity
theory to individual problems. NP-complete
problems are defined in a precise sense as the
hardest problems in NP. Even though we don't know
whether there is any problem in NP that is not in
P, we can point to an NP-complete problem and say
that if there are any hard problems in NP, that
problem is one of the hard ones. (Conversely, if
everything in NP is easy, those problems are
easy. So NP-completeness can be thought of as a
way of making the big P = NP question equivalent
to smaller questions about the hardness of
individual problems.)
- So if we believe that P and NP are unequal, and
we prove that some problem is NP-complete, we
should believe that it doesn't have a fast
algorithm.
- For unknown reasons, most problems we've looked
at in NP turn out either to be in P or
NP-complete. So the theory of NP-completeness
turns out to be a good way of showing that a
problem is likely to be hard, because it applies
to a lot of problems. But there are problems that
are in NP, not known to be in P, and not likely
to be NP-complete.
24Reduction
- What is reduction?
- What does it mean to say that if an NP-hard
problem is reducible to another problem, that
problem is also NP-hard?
- It is just a complex way of saying one problem is
easier than (or at least no harder than) another.
25Reduction
- Intuitively, a problem Q can be reduced to
another problem Q' if any instance of Q can be
easily rephrased as an instance of Q', the
solution of which provides a solution to the
instance of Q.
26Reduction
- Given two problems, A and B, we say that A is
easier than (reducible to) B, and write A ≤ B, if
we can write down an algorithm for solving A that
uses a small number of calls to a subroutine for
B (with everything outside the subroutine calls
being fast, polynomial time).
- Then if A ≤ B, and B is in P, so is A: we can
write down a polynomial algorithm for A by
expanding the subroutine calls to use the fast
algorithm for B.
- Basically, if B can be solved in polynomial
time, so can A.
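As an illustration of "an algorithm for A that calls a subroutine for B", the sketch below takes A to be the Hamiltonian cycle question on an undirected graph and B to be the decision version of Traveling Salesman (is there a tour of total cost at most a given budget?). The function names and the vertex labelling 0..n-1 are assumptions of this sketch, and the graph is assumed to have at least 3 vertices. The subroutine for B is plain brute force here, so nothing gets faster; the point is the shape of the reduction.

```python
from itertools import permutations

def tsp_decision(dist, budget):
    """Subroutine for B (brute force): is there a tour of total cost <= budget?"""
    n = len(dist)
    for order in permutations(range(1, n)):
        tour = (0, *order, 0)
        cost = sum(dist[u][v] for u, v in zip(tour, tour[1:]))
        if cost <= budget:
            return True
    return False

def hamiltonian_cycle_via_tsp(n, edges):
    """Solve A with one call to B: edges cost 1, non-edges cost 2, budget n."""
    edge_set = {frozenset(e) for e in edges}
    dist = [[1 if frozenset((u, v)) in edge_set else 2 for v in range(n)]
            for u in range(n)]              # polynomial-time construction
    return tsp_decision(dist, n)
```

A tour has exactly n legs, each costing at least 1, so its cost is n exactly when every leg is an edge of the graph. Everything outside the call to `tsp_decision` takes time proportional to n squared, so a polynomial-time algorithm for B would immediately give one for A.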
27It's all in how you phrase things
- Remember the Eulerian tour? Can we find a path in
a graph that visits each edge exactly once?
- Yes, as long as certain facts about the graph
are true. Either way, we can quickly find an
answer of "yes, and here it is" or "no, can't be
done here."
- Let's change the parameters a little.
- Does a given graph have a cycle that visits each
vertex exactly once?
28Hamiltonian Cycle
- Determining whether a graph has a Hamiltonian
cycle is NP-Complete. If you could solve it in
polynomial time, you could also solve these
famous problems:
- Vertex Cover
- 3-Satisfiability
- Traveling Salesman
- Satisfiability
- Hamiltonian Path
- Longest Path
- Any other problem in NP that is polynomial-time
reducible to any of these, i.e. all of them!
29Cook's Theorem
- The honor of being the very first NP-complete
problem goes to a decision problem from Boolean
logic: the Satisfiability problem (SAT for
short).
- It's a very complicated proof; if you're
interested, come by my office.
30 6 Basic NP-Complete problems
- 3-SAT
- 3DM (3-Dimensional Matching)
- Vertex Cover (VC)
- Clique
- Hamiltonian Circuit (HC)
- Partition
31 3-SAT
- Instance: A collection C of clauses on a finite
set U of variables such that the number of
elements in each clause is exactly 3.
- Question: Is there a truth assignment for U that
satisfies all the clauses in C?
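A 3-SAT instance and the two sides of the problem can be sketched in Python. The encoding is an assumption of this sketch: variables are numbered from 1, and a clause is a tuple of three nonzero integers, where k stands for variable k and -k for its negation. The function names are made up for illustration.

```python
from itertools import product

def satisfies(clauses, assignment):
    """Polynomial-time check: does every clause contain at least one true literal?"""
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses)

def brute_force_3sat(clauses, num_vars):
    """Exponential search over all 2**num_vars truth assignments."""
    for values in product([False, True], repeat=num_vars):
        assignment = dict(enumerate(values, start=1))
        if satisfies(clauses, assignment):
            return assignment
    return None
```

Checking a guessed assignment takes one pass over the clauses, which is what puts 3-SAT in NP. The search doubles in size with every added variable.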
32 3DM
- Instance: A set M ⊆ W × X × Y, where W, X,
and Y are disjoint sets having the same number q
of elements.
- Question: Does M contain a matching, that is, a
subset M' ⊆ M such that |M'| = q and no two
elements of M' agree in any coordinate?
33Vertex Cover
- Instance: A graph G = (V, E) and a positive
integer K ≤ |V|.
- Question: Is there a vertex cover of size K or
less for G? That is, is there a subset V' of V
such that |V'| ≤ K and, for each edge (u, v) in
E, at least one of u or v belongs to V'?
34Clique
- Instance: A graph G = (V, E) and a positive
integer J ≤ |V|.
- Question: Does G contain a clique of size J or
more, that is, a subset V' of V such that
|V'| ≥ J and every two vertices in V' are joined
by an edge in E?
35Hamiltonian Circuit
- Instance: A graph G = (V, E).
- Question: Does G contain a Hamiltonian circuit,
that is, an ordering <v1, v2, ..., vn> of the
vertices of G, where n = |V|, such that (vn, v1)
is in E and (vi, v(i+1)) is in E for all i,
1 ≤ i < n?
36Partition
- Instance: A finite set A and a size s(a) that
is a positive integer for each a in A.
- Question: Is there a subset A' of A such that the
sum of s(a) over a in A' equals the sum of s(a)
over a in A - A'?
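A Partition instance can be sketched as a list of sizes, with a candidate subset A' given as a set of positions in that list (positions rather than values, since two elements may have the same size). The function names are hypothetical, and the search shown is the naive one over all subsets.

```python
from itertools import combinations

def is_valid_partition(sizes, chosen):
    """Polynomial-time check: do the chosen positions hold exactly half the total?"""
    inside = sum(sizes[i] for i in chosen)
    return 2 * inside == sum(sizes)

def find_partition(sizes):
    """Exhaustive search over subsets of positions, smallest subsets first."""
    if sum(sizes) % 2:
        return None                      # an odd total can never be split evenly
    positions = range(len(sizes))
    for r in range(len(sizes) + 1):
        for chosen in combinations(positions, r):
            if is_valid_partition(sizes, chosen):
                return set(chosen)
    return None
```

For the sizes [3, 1, 1, 2, 2, 1] the search returns positions {0, 3}, that is, 3 + 2 on one side and 1 + 1 + 2 + 1 on the other. For [2, 2, 6] it returns None after trying every subset.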
37Diagram of transformation used to prove the 6
basic problems are NP-C (See Handout)
38How to determine if they are NP-Complete?
- Step 1: Can you guess a solution (and check the
guess in polynomial time)?
- Step 2: Can you transform a KNOWN NP-Complete
problem into this one using a polynomial time
algorithm?
- That means that every instance of the known
problem maps to an instance of the problem you're
trying to prove to be NP-C with the same yes/no
answer, AND that this mapping can be computed in
polynomial time.
39NP-Completeness Proofs
- Prove that your problem is in NP.
- Select a known NP-C problem.
- Describe an algorithm that computes a function
which maps every instance of the known NP-C
problem to ONE instance of your problem.
- Prove that the function is correct: the answer
to the mapped instance is YES if and only if the
answer to the original instance is YES.
- Prove that the algorithm that computes the
function runs in polynomial time.
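The steps above can be sketched for one classic pair: taking Clique as the known NP-C problem and Vertex Cover as "your problem". The mapping used is the standard complement-graph construction: G has a clique of size J or more exactly when the complement of G has a vertex cover of size |V| - J or less. The function names are made up for this sketch, and the two brute-force solvers exist only to sanity-check the mapping on tiny graphs; they are not a proof of correctness.

```python
from itertools import combinations

def is_vertex_cover(edges, cover):
    """Step 1 (in NP): a guessed cover is checked in one pass over the edges."""
    return all(u in cover or v in cover for u, v in edges)

def clique_to_vertex_cover(vertices, edges, j):
    """Step 3: map the Clique instance (G, J) to ONE Vertex Cover instance.

    Builds the complement graph in O(|V|^2) time (step 5) and sets K = |V| - J.
    """
    present = {frozenset(e) for e in edges}
    complement = [(u, v) for u, v in combinations(vertices, 2)
                  if frozenset((u, v)) not in present]
    return vertices, complement, len(vertices) - j

def has_clique(vertices, edges, j):
    """Brute force, for checking only: is there a clique on exactly j vertices?"""
    present = {frozenset(e) for e in edges}
    return any(all(frozenset(pair) in present for pair in combinations(group, 2))
               for group in combinations(vertices, j))

def has_vertex_cover(vertices, edges, k):
    """Brute force, for checking only: is there a cover of size k or less?"""
    return any(is_vertex_cover(edges, set(group))
               for r in range(k + 1)
               for group in combinations(vertices, r))
```

For step 4, comparing `has_clique(V, E, j)` with `has_vertex_cover(*clique_to_vertex_cover(V, E, j))` on small graphs should always give the same answer; an actual proof argues that a set is a clique in G exactly when the remaining vertices cover every edge of the complement.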