Title: Approximation Algorithms
1. Approximation Algorithms

2. Motivation
- By now we've seen many NP-Complete problems.
- We conjecture none of them has a polynomial-time algorithm.

3. Motivation
- Is this a dead end? Should we give up altogether?

4. Motivation
- Or maybe we can settle for good approximation algorithms?
5. Introduction
- Objectives
  - To formalize the notion of approximation.
  - To demonstrate several such algorithms.
- Overview
  - Optimization and Approximation
  - VERTEX-COVER, SET-COVER
6. Optimization
- Many of the problems we've encountered so far are really optimization problems.
- I.e., the task can be naturally rephrased as finding a maximal/minimal solution.
- For example: finding a maximal clique in a graph.
7. Approximation
- An algorithm which returns an answer C which is close to the optimal solution C* is called an approximation algorithm.
- Closeness is usually measured by the ratio bound ρ(n) the algorithm produces,
- which is a function that satisfies, for any input size n, max(C/C*, C*/C) ≤ ρ(n).
8. VERTEX-COVER
- Instance: an undirected graph G = (V, E).
- Problem: find a set C ⊆ V of minimal size s.t. for any (u,v) ∈ E, either u ∈ C or v ∈ C.

Example
9. Minimum VC is NP-hard
- Proof: It is enough to show the decision problem below is NP-Complete.
- Instance: an undirected graph G = (V, E) and a number k.
- Problem: to decide if there exists a set V' ⊆ V of size k s.t. for any (u,v) ∈ E, u ∈ V' or v ∈ V'.
This follows immediately from the following observation.
10. Minimum VC is NP-hard
- Observation: Let G = (V, E) be an undirected graph. The complement V \ C of a vertex-cover C is an independent set of G.
- Proof: Two vertices outside a vertex-cover cannot be connected by an edge. ∎
11. VC - Approximation Algorithm
COR(B) 523-524
- C ← ∅
- E' ← E
- while E' ≠ ∅
  - do let (u,v) be an arbitrary edge of E'
  - C ← C ∪ {u,v}
  - remove from E' every edge incident to either u or v
- return C
12. Demo
Compare this cover to the one from the example.
13. Polynomial Time
- C ← ∅
- E' ← E
- while E' ≠ ∅ do
  - let (u,v) be an arbitrary edge of E'
  - C ← C ∪ {u,v}
  - remove from E' every edge incident to either u or v
- return C
14. Correctness
- The set of vertices our algorithm returns is clearly a vertex-cover, since we iterate until every edge is covered.
15. How Good an Approximation Is It?
Observe the set of edges our algorithm chooses:
- any VC contains at least 1 endpoint of each such edge;
- our VC contains both endpoints, hence it is at most twice as large.
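The pseudocode above translates directly into Python. This is our own sketch (the name `approx_vertex_cover` is not from the slides):

```python
# A sketch of the edge-picking 2-approximation for VERTEX-COVER.
# approx_vertex_cover is our own name, not from the slides.

def approx_vertex_cover(edges):
    """Return a vertex cover at most twice the minimum size.

    edges: iterable of (u, v) pairs of an undirected graph.
    """
    cover = set()
    remaining = {frozenset(e) for e in edges}  # E'
    while remaining:
        # let (u, v) be an arbitrary edge of E'; take BOTH endpoints
        u, v = next(iter(remaining))
        cover.update((u, v))
        # remove from E' every edge incident to either u or v
        remaining = {e for e in remaining if u not in e and v not in e}
    return cover
```

The chosen edges are vertex-disjoint, so any cover needs one endpoint per chosen edge while ours takes two; that is exactly the factor-2 argument above.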
16. The Traveling Salesman Problem

17. The Mission: A Tour Around the World

18. The Problem: Traveling Costs Money
19. Introduction
- Objectives
  - To explore the Traveling Salesman Problem.
- Overview
  - TSP: formal definition and examples
  - TSP is NP-hard
  - Approximation algorithm for special cases
  - Inapproximability result
20. TSP
- Instance: a complete weighted undirected graph G = (V, E) (all weights are non-negative).
- Problem: to find a Hamiltonian cycle of minimal cost.
21. Polynomial Algorithm for TSP?
What about the greedy strategy: at any point, choose the closest vertex not explored yet?
22. The Greedy Strategy Fails
[Figure: a weighted graph on which the greedy tour costs far more than the optimal tour]
24. TSP is NP-hard
- The corresponding decision problem:
- Instance: a complete weighted undirected graph G = (V, E) and a number k.
- Problem: to decide if there is a Hamiltonian cycle whose cost is at most k.
25. TSP is NP-hard
- Theorem: HAM-CYCLE ≤p TSP.
- Proof: By the straightforward efficient reduction illustrated below (verify!):
[Figure: edges of the HAM-CYCLE instance get weight 0, the remaining edges get weight 1, and the TSP bound is k = 0]
26. What Next?
- We'll show an approximation algorithm for TSP
- which yields a ratio bound of 2
- for cost functions which satisfy a certain property.
27. The Triangle Inequality
- Definition: We'll say the cost function c satisfies the triangle inequality if
- ∀u,v,w ∈ V: c(u,v) + c(v,w) ≥ c(u,w).
28. Approximation Algorithm
COR(B) 525-527
1. Grow a Minimum Spanning Tree (MST) for G.
2. Return the cycle resulting from a preorder walk on that tree.
29. Demonstration and Analysis
The cost of a minimal Hamiltonian cycle ≥ the cost of an MST.
30. Demonstration and Analysis
The cost of a preorder walk is twice the cost of the tree.

31. Demonstration and Analysis
Due to the triangle inequality, the Hamiltonian cycle is not worse.
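The two steps above can be sketched in Python. This is our own illustration, assuming the triangle inequality holds; `mst_tsp_tour` and the cost-matrix input are our choices, not the slides':

```python
# A sketch of the MST + preorder-walk 2-approximation, assuming the
# triangle inequality. mst_tsp_tour and the matrix input are our choices.
import heapq

def mst_tsp_tour(dist):
    """dist: symmetric matrix of non-negative costs (dist[i][j]).
    Returns a tour (vertex order) of cost at most twice the optimum."""
    n = len(dist)
    # 1. Grow a Minimum Spanning Tree rooted at vertex 0 (Prim's algorithm).
    children = {v: [] for v in range(n)}
    in_tree = [False] * n
    pq = [(0, 0, 0)]  # (edge cost, vertex, parent)
    while pq:
        _, v, parent = heapq.heappop(pq)
        if in_tree[v]:
            continue
        in_tree[v] = True
        if v != parent:
            children[parent].append(v)
        for u in range(n):
            if not in_tree[u]:
                heapq.heappush(pq, (dist[v][u], u, v))
    # 2. Return the cycle resulting from a preorder walk on that tree.
    tour, stack = [], [0]
    while stack:
        v = stack.pop()
        tour.append(v)
        stack.extend(reversed(children[v]))
    return tour
```

Skipping repeated vertices in the preorder walk is where the triangle inequality is used: each shortcut can only decrease the cost.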
32. What About the General Case?
COR(B) 528
- We'll show TSP cannot be approximated within any constant factor ρ ≥ 1
- by showing the corresponding gap version is NP-hard.
33. gap-TSP[ρ]
- Instance: a complete weighted undirected graph G = (V, E).
- Problem: to distinguish between the following two cases:
  - YES: there exists a Hamiltonian cycle whose cost is at most |V|.
  - NO: the cost of every Hamiltonian cycle is more than ρ|V|.
34. Instances
[Figure: instances arranged by min tour cost]

35. What Should an Algorithm for gap-TSP Return?
[Figure: instances arranged by min tour cost; for costs between |V| and ρ|V| the answer is DON'T-CARE]
36. gap-TSP Approximation
- Observation: An efficient approximation of factor ρ for TSP implies an efficient algorithm for gap-TSP[ρ].
37. gap-TSP is NP-hard
- Theorem: For any constant ρ ≥ 1, HAM-CYCLE ≤p gap-TSP[ρ].
- Proof idea: Edges from G cost 1. Other edges cost much more.
38. The Reduction Illustrated
[Figure: edges of the HAM-CYCLE instance get cost 1; the remaining edges get cost ρ|V| + 1]
Verify (a) correctness and (b) efficiency.
39. Approximating TSP is NP-hard
Approximating TSP within any constant factor ρ is NP-hard.
40. Summary
- We've studied the Traveling Salesman Problem (TSP).
- We've seen it is NP-hard.
- Nevertheless, when the cost function satisfies the triangle inequality, there exists an approximation algorithm with ratio bound 2.
41. Summary
- For the general case we've proven there is probably no efficient approximation algorithm for TSP.
- Moreover, we've demonstrated a generic method for showing approximation problems are NP-hard.
42. SET-COVER
- Instance: a finite set X and a family F of subsets of X such that every element of X belongs to some set in F.
- Problem: to find a subfamily C ⊆ F of minimal size which covers X, i.e., the union of the sets in C is X.
43. SET-COVER Example
44. SET-COVER is NP-Hard
- Proof: Observe the corresponding decision problem.
- Clearly, it's in NP (check!).
- We'll sketch a reduction from (decision) VERTEX-COVER to it:
45. VERTEX-COVER ≤p SET-COVER
- one element for every edge;
- one set for every vertex, containing the edges it covers.
46. Greedy Algorithm
COR(B) 530-533
- C ← ∅
- U ← X
- while U ≠ ∅ do
  - select S ∈ F that maximizes |S ∩ U|
  - C ← C ∪ {S}
  - U ← U - S
- return C
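The greedy rule above, as a short Python sketch (the name `greedy_set_cover` is ours, not the slides'):

```python
# A sketch of the greedy SET-COVER algorithm; greedy_set_cover is our name.

def greedy_set_cover(X, F):
    """X: set of elements; F: list of sets whose union covers X.
    Returns a subfamily of F covering X (within H(max |S|) of optimal)."""
    uncovered = set(X)  # U
    cover = []          # C
    while uncovered:
        # select S in F that maximizes |S ∩ U|
        best = max(F, key=lambda S: len(S & uncovered))
        cover.append(best)
        uncovered -= best
    return cover
```

Note the loop terminates only because F covers X; each iteration costs O(|F|) intersection computations.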
47. Demonstration
Compare to the optimal cover.
48. Is Being Greedy Worthwhile? How Do We Proceed From Here?
- We can easily bound the approximation ratio by log n.
- A more careful analysis yields a tight bound of ln n.
49. Loose Ratio-Bound
- Claim: If ∃ a cover of size k, then after k iterations the algorithm has covered at least ½ of the elements.
Suppose it doesn't, and observe the situation after k iterations:
50. Loose Ratio-Bound
More than ½ of the elements are still uncovered. Since this uncovered part can also be covered by k sets...
51. Loose Ratio-Bound
...there must be a set not chosen yet whose size is at least (n/2)·(1/k).
52. Loose Ratio-Bound
Thus in each of the k iterations we've covered at least (n/2)·(1/k) new elements.
53. Loose Ratio-Bound
Therefore after k·log n iterations (i.e., after choosing k·log n sets) all n elements must be covered, and the bound is proved.
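The halving argument can be written compactly. A sketch in our own notation, where $n_j$ denotes the number of elements still uncovered after $jk$ iterations:

```latex
n_0 = n, \qquad n_{j+1} \le \tfrac{1}{2}\, n_j
\quad\Longrightarrow\quad n_j \le \frac{n}{2^{j}} .
```

After $j = \lceil \log_2 n \rceil$ blocks of $k$ iterations, $n_j < 1$, so at most $k \log n$ sets are chosen against an optimum of $k$, giving the ratio bound $\log n$.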
54. Tight Ratio-Bound
- Claim: The greedy algorithm approximates the optimal set-cover within factor H(max{|S| : S ∈ F}),
- where H(d) is the d-th harmonic number: H(d) = 1 + 1/2 + ... + 1/d.

55. Tight Ratio-Bound
56. Claim's Proof
- Whenever the algorithm chooses a set, charge 1.
- Split the cost between all the newly covered elements.
57. Analysis
- That is, we charge every element x ∈ X with cx = 1 / |Si - (S1 ∪ ... ∪ Si-1)|,
- where Si is the first set which covers x.
58. Lemma
- Lemma: For every S ∈ F, Σ_{x∈S} cx ≤ H(|S|).
- Let ui = |S - (S1 ∪ ... ∪ Si)| = the number of members of S left uncovered after i iterations (so u0 = |S|).
- Let k be the smallest index for which uk = 0.
- For 1 ≤ i ≤ k, Si covers ui-1 - ui elements from S.

59. Lemma
This last observation yields Σ_{x∈S} cx ≤ Σ_{i=1..k} (ui-1 - ui) / |Si - (S1 ∪ ... ∪ Si-1)|.
Our greedy strategy promises that Si (1 ≤ i ≤ k) covers at least as many new elements as S, so |Si - (S1 ∪ ... ∪ Si-1)| ≥ ui-1, and the sum is at most Σ_{i=1..k} (ui-1 - ui) / ui-1.
For any b > a in N, H(b) - H(a) = 1/(a+1) + ... + 1/b ≥ (b-a)·(1/b); hence (ui-1 - ui)/ui-1 ≤ H(ui-1) - H(ui).
This is a telescopic sum: Σ_{i=1..k} (H(ui-1) - H(ui)) = H(u0) - H(uk) = H(|S|), since uk = 0, H(0) = 0, and u0 = |S|.
60. Analysis
- Now we can finally complete our analysis: |C| = Σ_{x∈X} cx ≤ Σ_{S∈C*} Σ_{x∈S} cx ≤ |C*| · H(max{|S| : S ∈ F}).
61. Summary
- As it turns out, we can sometimes find efficient approximation algorithms for NP-hard problems.
- We've seen two such algorithms:
  - for VERTEX-COVER (factor 2),
  - for SET-COVER (logarithmic factor).
62. The Subset Sum Problem
- Problem definition
  - Given a finite set S and a target t, find a subset S' ⊆ S whose elements sum to t.
- All possible sums
  - S = {x1, x2, .., xn}
  - Li = set of all possible sums of {x1, x2, .., xi}
- Example
  - S = {1, 4, 5}
  - L1 = {0, 1}
  - L2 = {0, 1, 4, 5} = L1 ∪ (L1 + x2)
  - L3 = {0, 1, 4, 5, 6, 9, 10} = L2 ∪ (L2 + x3)
  - Li = Li-1 ∪ (Li-1 + xi)
63. Subset Sum, Revisited
- Given a set S of numbers, find a subset S' that adds up to some target number t.
- To find the largest possible sum that doesn't exceed t:
  - T = {0}
  - for each x in S
    - T = union(T, x + T)
    - remove elements from T that exceed t
  - return largest element in T
- (Aside: How should we implement T?)
- x + T adds x to each element of the set T.
- Potential doubling at each step: complexity O(2^n).
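The exact list-based algorithm above is a few lines of Python. A sketch with our own naming (`subset_sum_exact`):

```python
# A sketch of the exact list-based algorithm (worst case exponential,
# since T can double at each step). subset_sum_exact is our name.

def subset_sum_exact(S, t):
    """Largest achievable subset sum of S not exceeding t."""
    T = {0}
    for x in S:
        T |= {x + y for y in T}        # union(T, x + T)
        T = {y for y in T if y <= t}   # remove elements exceeding t
    return max(T)
```

A Python `set` answers the implementation aside: it deduplicates repeated sums for free, though it cannot stop the worst-case exponential growth.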
64. Trimming
- To reduce the size of the set T at each stage, we apply a trimming process.
- If z < y are close elements, with (1-d)·y ≤ z ≤ y, we remove y and let z represent it.
- With d = 0.1: {10,11,12,15,20,21,22,23,24,29} → {10,12,15,20,23,29}
65. Subset Sum with Trimming
- Incorporate trimming into the previous algorithm:
  - T = {0}
  - for each x in S
    - T = union(T, x + T)
    - T = trim(d, T)
    - remove elements from T that exceed t
  - return largest element in T
- Trimming only eliminates values, it doesn't create new ones, so the final result is still the sum of a subset of S that doesn't exceed t.
- 0 < d < 1/n
66. Subset Sum with Trimming (cont.)
- At each stage, values in the trimmed T are within a factor somewhere between (1-d) and 1 of the corresponding values in the untrimmed T.
- The final result (after n iterations) is within a factor somewhere between (1-d)^n and 1 of the result produced by the original algorithm.

67. Subset Sum with Trimming (cont.)
- After trimming, the ratio between successive elements in T is at least 1/(1-d), and all of the values are between 0 and t.
- Hence the maximum number of elements in T is log_{1/(1-d)} t ≈ (ln t) / d.
- This is enough to give us a polynomial bound on the running time of the algorithm.
68. Subset Sum: Trim
- We want to reduce the size of a list by trimming:
  - L: the original list
  - L': the list after trimming L
  - d: trimming parameter, in (0, 1)
  - y: an element that is removed from L
  - z: the corresponding (representing) element in L' (also in L)
  - (y-z)/y ≤ d, i.e., (1-d)·y ≤ z ≤ y
- Example
  - L = {10, 11, 12, 15, 20, 21, 22, 23, 24, 29}, d = 0.1
  - L' = {10, 12, 15, 20, 23, 29}
  - 11 is represented by 10: (11-10)/11 ≤ 0.1
  - 21, 22 are represented by 20: (21-20)/21 ≤ 0.1, (22-20)/22 ≤ 0.1
  - 24 is represented by 23: (24-23)/24 ≤ 0.1
69. Subset Sum: Trim (2)
- Trim(L, d)  // L = {y1, y2, .., ym}, sorted in increasing order
  - L' = {y1}
  - last = y1  // the most recent element z in L' which represents elements of L
  - for i = 2 to m do
    - if last < (1-d)·yi then  // yi cannot be represented by last
      - append yi onto the end of L'
      - last = yi
  - return L'
- Example
  - L = {10, 11, 12, 15, 20, 21, 22, 23, 24, 29}, d = 0.1
  - L' = {10, 12, 15, 20, 23, 29}
- Running time: O(m)
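Trim can be written directly from the pseudocode. A sketch assuming L is sorted in increasing order (the name `trim` mirrors the slides; the rest is ours):

```python
# A sketch of Trim(L, d); assumes L is sorted in increasing order.

def trim(L, d):
    """Thin out sorted list L so that every removed y has a kept
    representative z in the output with (1-d)*y <= z <= y."""
    trimmed = [L[0]]
    last = L[0]  # most recent kept representative z
    for y in L[1:]:
        if last < (1 - d) * y:  # y cannot be represented by last
            trimmed.append(y)
            last = y
    return trimmed
```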
70. Subset Sum: Approximation Algorithm
- Approx_subset_sum(S, t, e)  // S = {x1, x2, .., xn}
  - L0 = {0}
  - for i = 1 to n do
    - Li = Li-1 ∪ (Li-1 + xi)
    - Li = Trim(Li, e/n)
    - remove elements greater than t from Li
  - return the largest element in Ln
- Example
  - S = {104, 102, 201, 101}, t = 308, e = 0.20, d = e/n = 0.05
  - L0 = {0}
  - L1 = {0, 104}
  - L2 = {0, 102, 104, 206}; after trimming 104: L2 = {0, 102, 206}
  - L3 = {0, 102, 201, 206, 303, 407}; after trimming 206: L3 = {0, 102, 201, 303, 407}; after removing 407: L3 = {0, 102, 201, 303}
  - L4 = {0, 101, 102, 201, 203, 302, 303, 404}; after trimming 102, 203, 303: L4 = {0, 101, 201, 302, 404}; after removing 404: L4 = {0, 101, 201, 302}
  - The algorithm returns 302 (the optimum is 307 = 104 + 102 + 101).
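Putting the pieces together: a self-contained Python sketch of Approx_subset_sum, with the Trim step inlined so the block stands alone (all names are ours):

```python
# A self-contained sketch of Approx_subset_sum with Trim inlined.

def approx_subset_sum(S, t, e):
    """Return a subset sum <= t that is >= (1 - e) times the optimum."""
    n = len(S)
    d = e / n  # per-iteration trimming parameter
    L = [0]
    for x in S:
        # Li = Li-1 ∪ (Li-1 + xi), kept sorted for trimming
        L = sorted(set(L) | {x + y for y in L})
        # Trim(Li, e/n): drop y when the last kept z has (1-d)*y <= z
        trimmed, last = [L[0]], L[0]
        for y in L[1:]:
            if last < (1 - d) * y:
                trimmed.append(y)
                last = y
        # remove elements greater than t
        L = [y for y in trimmed if y <= t]
    return max(L)
```

Running it on the slide's example (S = {104, 102, 201, 101}, t = 308, e = 0.20) reproduces the trace above.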
71. Subset Sum: Correctness
- The approximate solution z is not smaller than (1-e) times the optimal solution y*, i.e., z ≥ (1-e)·y*.
- Proof
  - For every element y in L there is a z in L' such that (1-e/n)·y ≤ z ≤ y.
  - Hence for every element y in the untrimmed Li there is a z in the trimmed Li such that (1-e/n)^i · y ≤ z ≤ y.
  - If y* is the optimal solution, then there is a corresponding z in Ln with (1-e/n)^n · y* ≤ z ≤ y*.
  - Since (1-e) ≤ (1-e/n)^n ((1-e/n)^n is increasing in n),
  - (1-e)·y* ≤ (1-e/n)^n · y* ≤ z.
  - So the value z returned is not smaller than (1-e) times the optimal solution y*.
72. Subset Sum: Correctness (2)
- The approximation algorithm is fully polynomial.
- Proof
  - Successive elements z < z' in a trimmed Li must satisfy z'/z > 1/(1-e/n),
  - i.e., they differ by a factor of more than 1/(1-e/n).
  - The number of elements in each Li is therefore at most
    - log_{1/(1-e/n)} t   (t bounds the largest value)
    - = (ln t) / (-ln(1-e/n))
    - ≤ (ln t) / (e/n)   (Eq. 2.10: x/(1+x) ≤ ln(1+x) ≤ x, for x > -1)
    - = (n ln t) / e
  - So the length of each Li is polynomial in the input size and 1/e,
  - and hence the running time of the algorithm is polynomial.
73. Summary
- Not all problems are computable.
- Some problems can be solved in polynomial time (P).
- Some problems can be verified in polynomial time (NP).
- Nobody knows whether P = NP.
- But the existence of NP-complete problems is often taken as an indication that P ≠ NP.
- In the meantime, we use approximation to find good-enough solutions to hard problems.
74. What's Next?
- But where can we draw the line?
- Does every NP-hard problem have an approximation?
- And to within which factor?
- Can approximation be NP-hard as well?