Title: Approximation Algorithms
1Approximation Algorithms
- Load Balancing
- k-center selection
- Pricing Method
- Vertex Cover
- Set Cover
- Bin Packing
- TSP
2Approximation Algorithms
- Q. Suppose I need to solve an NP-hard problem. What should I do?
- A. Theory says you're unlikely to find a poly-time algorithm.
- Must sacrifice one of three desired features.
- Solve problem to optimality.
- Solve problem in poly-time.
- Solve arbitrary instances of the problem.
- $\rho$-approximation algorithm.
- Guaranteed to run in poly-time.
- Guaranteed to solve arbitrary instance of the problem.
- Guaranteed to find solution within ratio $\rho$ of true optimum.
- Challenge. Need to prove a solution's value is close to optimum, without even knowing what optimum value is!
311.1 Load Balancing
4Load Balancing
- Input. m identical machines; n jobs, job j has processing time $t_j$.
- Job j must run contiguously on one machine.
- A machine can process at most one job at a time.
- Def. Let J(i) be the subset of jobs assigned to machine i. The load of machine i is $L_i = \sum_{j \in J(i)} t_j$.
- Def. The makespan is the maximum load on any machine, $L = \max_i L_i$.
- Load balancing. Assign each job to a machine to minimize makespan.
5Load Balancing List Scheduling
- List-scheduling algorithm.
- Consider n jobs in some fixed order.
- Assign job j to machine whose load is smallest so far.
- Implementation. O(n log n) using a priority queue.
List-Scheduling(m, n, t_1, t_2, ..., t_n)
   for i = 1 to m
      L_i ← 0                  // load on machine i
      J(i) ← ∅                 // jobs assigned to machine i
   for j = 1 to n
      i = argmin_k L_k         // machine i has smallest load
      J(i) ← J(i) ∪ {j}        // assign job j to machine i
      L_i ← L_i + t_j          // update load of machine i
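A minimal Python sketch of the list-scheduling pseudocode, assuming jobs arrive as a list of processing times and machines are numbered from 0. The function name `list_schedule` and the tie-breaking rule (lowest machine index wins) are illustrative choices, not part of the slides.

```python
import heapq

def list_schedule(m, times):
    """Give each job, in the given order, to the currently least-loaded machine."""
    # Heap entries are (load, machine index); ties go to the lower index.
    heap = [(0, i) for i in range(m)]
    assignment = [[] for _ in range(m)]      # assignment[i] plays the role of J(i)
    for j, t in enumerate(times):
        load, i = heapq.heappop(heap)        # machine i has smallest load
        assignment[i].append(j)              # assign job j to machine i
        heapq.heappush(heap, (load + t, i))  # update load of machine i
    loads = [sum(times[j] for j in jobs) for jobs in assignment]
    return assignment, max(loads)

assignment, makespan = list_schedule(3, [2, 3, 4, 6, 2, 2])
print(assignment, makespan)  # [[0, 3], [1, 4], [2, 5]] 8
```

Each job costs one heap pop and one heap push, which is where the logarithmic factor in the running time comes from.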
6Load Balancing List Scheduling Analysis
- Theorem. [Graham, 1966] Greedy algorithm is a 2-approximation.
- First worst-case analysis of an approximation algorithm.
- Need to compare resulting solution with optimal makespan $L^*$.
- Lemma 1. The optimal makespan $L^* \ge \max_j t_j$.
- Pf. Some machine must process the most time-consuming job. ∎
- Lemma 2. The optimal makespan $L^* \ge \frac{1}{m} \sum_j t_j$.
- Pf.
- The total processing time is $\sum_j t_j$.
- One of m machines must do at least a 1/m fraction of total work. ∎
7Load Balancing List Scheduling Analysis
- Theorem. Greedy algorithm is a 2-approximation.
- Pf. Consider load $L_i$ of bottleneck machine i.
- Let j be last job scheduled on machine i.
- When job j assigned to machine i, i had smallest load. Its load before assignment is $L_i - t_j$, so $L_i - t_j \le L_k$ for all $1 \le k \le m$.
[Figure: timeline of machine i from 0 to L = L_i; the blue jobs scheduled before j end at L_i - t_j, and job j fills the final segment.]
8Load Balancing List Scheduling Analysis
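- Theorem. Greedy algorithm is a 2-approximation.
- Pf. Consider load $L_i$ of bottleneck machine i.
- Let j be last job scheduled on machine i.
- When job j assigned to machine i, i had smallest load. Its load before assignment is $L_i - t_j$, so $L_i - t_j \le L_k$ for all $1 \le k \le m$.
- Sum inequalities over all k and divide by m:
  $L_i - t_j \le \frac{1}{m} \sum_{k=1}^{m} L_k = \frac{1}{m} \sum_{j'=1}^{n} t_{j'} \le L^*$ (the second sum ranges over all n jobs, not over machines; the last step is Lemma 2).
- Now $L_i = (L_i - t_j) + t_j \le L^* + L^* = 2L^*$, using the bound above and $t_j \le L^*$ from Lemma 1. ∎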
9Load Balancing List Scheduling Analysis
- Q. Is our analysis tight?
- A. Essentially yes.
- Ex: m machines, m(m-1) jobs of length 1, one job of length m.
[Figure: list-scheduling schedule for m = 10; the long job runs last on machine 1 while machines 2 through 10 sit idle. List scheduling makespan = 19.]
10Load Balancing List Scheduling Analysis
- Q. Is our analysis tight?
- A. Essentially yes.
- Ex: m machines, m(m-1) jobs of length 1, one job of length m.
[Figure: optimal schedule for m = 10. Optimal makespan = 10.]
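The tight instance can be checked numerically. The sketch below assumes the long job comes last in the list (the order that hurts list scheduling); `list_scheduling_makespan` is a hypothetical helper name, and the lower bound combines Lemma 1 and Lemma 2.

```python
import heapq

def list_scheduling_makespan(m, times):
    """Makespan of greedy list scheduling on m identical machines."""
    loads = [0] * m                      # a list of zeros is already a valid heap
    for t in times:
        smallest = heapq.heappop(loads)  # least-loaded machine
        heapq.heappush(loads, smallest + t)
    return max(loads)

m = 10
jobs = [1] * (m * (m - 1)) + [m]         # m(m-1) unit jobs, then one job of length m
greedy = list_scheduling_makespan(m, jobs)

# Lemma 1 and Lemma 2: L* >= max_j t_j and L* >= (1/m) * sum_j t_j (rounded up).
lower_bound = max(max(jobs), -(-sum(jobs) // m))

print(greedy, lower_bound)               # 19 10
```

Here the lower bound of 10 is attained (the long job alone on one machine, ten unit jobs on each of the other nine), so the ratio is 19/10. In general the instance gives 2m - 1 against m, which approaches 2 as m grows.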
11Load Balancing LPT Rule
- Longest processing time (LPT). Sort n jobs in descending order of processing time, and then run list scheduling algorithm.
LPT-List-Scheduling(m, n, t_1, t_2, ..., t_n)
   Sort jobs so that t_1 ≥ t_2 ≥ ... ≥ t_n
   for i = 1 to m
      L_i ← 0                  // load on machine i
      J(i) ← ∅                 // jobs assigned to machine i
   for j = 1 to n
      i = argmin_k L_k         // machine i has smallest load
      J(i) ← J(i) ∪ {j}        // assign job j to machine i
      L_i ← L_i + t_j          // update load of machine i
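A short Python sketch of the LPT rule, under the same assumptions as plain list scheduling (a list of processing times, machines numbered from 0). The name `lpt_schedule` is illustrative; job indices in the result refer to positions in the original, unsorted list.

```python
import heapq

def lpt_schedule(m, times):
    """Longest-processing-time rule: sort jobs by descending time, then list-schedule."""
    # Job indices in descending order of processing time (ties keep input order).
    order = sorted(range(len(times)), key=lambda j: times[j], reverse=True)
    heap = [(0, i) for i in range(m)]        # (load, machine index)
    assignment = [[] for _ in range(m)]
    for j in order:
        load, i = heapq.heappop(heap)        # machine i has smallest load
        assignment[i].append(j)              # assign job j to machine i
        heapq.heappush(heap, (load + times[j], i))
    makespan = max(load for load, _ in heap)
    return assignment, makespan
```

The only change from list scheduling is the initial sort, so the extra cost is one O(n log n) sorting pass.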
12Load Balancing LPT Rule
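- Observation. If at most m jobs, then list-scheduling is optimal.
- Pf. Each job put on its own machine. ∎
- Lemma 3. If there are more than m jobs, $L^* \ge 2 t_{m+1}$.
- Pf.
- Consider first m+1 jobs $t_1, \dots, t_{m+1}$.
- Since the $t_i$'s are in descending order, each takes at least $t_{m+1}$ time.
- There are m+1 jobs and m machines, so by pigeonhole principle, at least one machine gets two jobs. ∎
- Consequence: if the last job j on the bottleneck machine has j > m, then $t_j \le t_{m+1} \le \frac{1}{2} L^*$.
- Theorem. LPT rule is a 3/2-approximation algorithm.
- Pf. Same basic approach as for list scheduling. If the bottleneck machine i holds a single job, then $L_i = t_j \le L^*$. Otherwise j > m and
  $L_i = (L_i - t_j) + t_j \le L^* + \frac{1}{2} L^* = \frac{3}{2} L^*$, by Lemma 3 (by the observation, can assume number of jobs > m). ∎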
13Load Balancing LPT Rule
- Q. Is our 3/2 analysis tight?
- A. No.
- Theorem. [Graham, 1969] LPT rule is a 4/3-approximation.
- Pf. More sophisticated analysis of same algorithm.
- Q. Is Graham's 4/3 analysis tight?
- A. Essentially yes.
- Ex: m machines, n = 2m+1 jobs: 2 jobs of each length m, m+1, ..., 2m-1, and one more job of length m.
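A worked instance for m = 3 may help. It uses the standard form of Graham's example (two jobs of each length m, m+1, ..., 2m-1 plus one extra job of length m, for 2m+1 jobs in total), which is an assumption about what the slide intends.

```latex
\begin{align*}
\text{jobs in LPT order} &: 5,\ 5,\ 4,\ 4,\ 3,\ 3,\ 3 \qquad (m = 3,\ n = 2m + 1 = 7)\\
\text{LPT loads} &: (5,5,4) \to (5,5,8) \to (8,5,8) \to (8,8,8) \to (11,8,8)\\
L_{\mathrm{LPT}} &= 11 = 4m - 1, \qquad L^* = 9 = 3m \quad \text{via } \{5,4\},\ \{5,4\},\ \{3,3,3\}\\
\frac{L_{\mathrm{LPT}}}{L^*} &= \frac{4m - 1}{3m} = \frac{4}{3} - \frac{1}{3m} \longrightarrow \frac{4}{3} \quad \text{as } m \to \infty
\end{align*}
```

The optimum cannot be below 9 here because the total work is 27 on 3 machines (Lemma 2).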
1411.2 Center Selection
15Center Selection Problem
- Input. Set of n sites $s_1, \dots, s_n$.
- Center selection problem. Select k centers C so that maximum distance from a site to nearest center is minimized.
[Figure: sites in the plane covered by k = 4 centers.]
16Center Selection Problem
- Input. Set of n sites $s_1, \dots, s_n$.
- Center selection problem. Select k centers C so that maximum distance from a site to nearest center is minimized.
- Notation.
- dist(x, y) = distance between x and y.
- $\mathrm{dist}(s_i, C) = \min_{c \in C} \mathrm{dist}(s_i, c)$ = distance from $s_i$ to closest center.
- $r(C) = \max_i \mathrm{dist}(s_i, C)$ = smallest covering radius.
- Goal. Find set of centers C that minimizes r(C), subject to |C| = k.
- Distance function properties.
- dist(x, x) = 0 (identity)
- dist(x, y) = dist(y, x) (symmetry)
- dist(x, y) ≤ dist(x, z) + dist(z, y) (triangle inequality)
17Center Selection Example
- Ex: each site is a point in the plane, a center can be any point in the plane, dist(x, y) = Euclidean distance.
- Remark: search can be infinite!
[Figure: sites and centers in the plane, with the covering radius r(C) marked.]
18Greedy Algorithm A False Start
- Greedy algorithm. Put the first center at the best possible location for a single center, and then keep adding centers so as to reduce the covering radius each time by as much as possible.
- Remark: arbitrarily bad!
[Figure: sites with k = 2 centers; the position of greedy center 1 is marked.]
19Center Selection Greedy Algorithm
- Greedy algorithm. Repeatedly choose the next center to be the site farthest from any existing center.
- Observation. Upon termination all centers in C are pairwise at least r(C) apart.
- Pf. By construction of algorithm.
Greedy-Center-Selection(k, n, s_1, s_2, ..., s_n)
   C = ∅
   repeat k times
      Select a site s_i with maximum dist(s_i, C)     // site farthest from any center
      Add s_i to C
   return C
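A minimal Python sketch of the greedy center selection, assuming sites are coordinate tuples and distances are Euclidean (`math.dist`, Python 3.8+). Since dist(s, C) is undefined while C is empty, the sketch takes the first site as the first center; that choice, the function name `greedy_centers`, and the `dist` parameter are assumptions made for illustration.

```python
import math

def greedy_centers(sites, k, dist=math.dist):
    """Pick k centers among the sites, each time taking the site farthest from the chosen ones."""
    centers = [sites[0]]                          # assumption: arbitrary first center
    # nearest[i] = distance from site i to its closest chosen center
    nearest = [dist(s, centers[0]) for s in sites]
    while len(centers) < k:
        i = max(range(len(sites)), key=lambda idx: nearest[idx])  # farthest site
        centers.append(sites[i])
        nearest = [min(d, dist(s, sites[i])) for d, s in zip(nearest, sites)]
    return centers, max(nearest)                  # chosen centers and covering radius r(C)

centers, radius = greedy_centers([(0, 0), (1, 0), (10, 0), (11, 0)], k=2)
print(centers, radius)  # [(0, 0), (11, 0)] 1.0
```

Keeping the `nearest` array up to date avoids recomputing dist(s, C) from scratch, so the whole run takes O(nk) distance evaluations.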
20Center Selection Analysis of Greedy Algorithm
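- Theorem. Let C* be an optimal set of centers. Then $r(C) \le 2r(C^*)$.
- Pf. (by contradiction) Assume $r(C^*) < \frac{1}{2} r(C)$.
- For each site $c_i$ in C, consider ball of radius $\frac{1}{2} r(C)$ around it.
- Exactly one $c_i^*$ in each ball; let $c_i$ be the site paired with $c_i^*$.
- Consider any site s and its closest center $c_i^*$ in C*.
- $\mathrm{dist}(s, C) \le \mathrm{dist}(s, c_i) \le \mathrm{dist}(s, c_i^*) + \mathrm{dist}(c_i^*, c_i) \le 2r(C^*)$. The middle step is the triangle inequality; each of the two terms is at most $r(C^*)$ since $c_i^*$ is the closest center.
- Thus $r(C) \le 2r(C^*)$, contradicting the assumption. ∎
[Figure: balls of radius ½ r(C) around the centers c_i in C, each containing one optimal center c_i*, together with a site s.]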
21Center Selection
- Theorem. Let C* be an optimal set of centers. Then $r(C) \le 2r(C^*)$.
- Theorem. Greedy algorithm is a 2-approximation for center selection problem.
- Remark. Greedy algorithm always places centers at sites, but is still within a factor of 2 of best solution that is allowed to place centers anywhere (e.g., points in the plane).
- Question. Is there hope of a 3/2-approximation? 4/3?
- Theorem. Unless P = NP, there is no $\rho$-approximation for center-selection problem for any $\rho < 2$.
2211.4 The Pricing Method Vertex Cover
23Weighted Vertex Cover
- Weighted vertex cover. Given a graph G with vertex weights, find a vertex cover of minimum weight.
[Figure: the same graph with vertex weights 4, 2, 9, 2, shown with two vertex covers: one of weight 9 and one of weight 2 + 2 + 4.]
24Weighted Vertex Cover
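- Pricing method. Each edge must be covered by some vertex i. Edge e pays price $p_e \ge 0$ to use vertex i.
- Fairness. Edges incident to vertex i should pay at most $w_i$ in total: for each vertex i, $\sum_{e = (i, j)} p_e \le w_i$.
- Claim. For any vertex cover S and any fair prices $p_e$: $\sum_e p_e \le w(S)$.
- Proof. $\sum_{e \in E} p_e \le \sum_{i \in S} \sum_{e = (i, j)} p_e \le \sum_{i \in S} w_i = w(S)$. The first step holds because each edge e is covered by at least one node in S; the second sums the fairness inequalities for each node in S. ∎
[Figure: example graph with vertex weights 4, 2, 9, 2.]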
25Pricing Method
- Pricing method. Set prices and find vertex cover simultaneously. A node i is tight when the prices of its incident edges sum to exactly $w_i$.
Weighted-Vertex-Cover-Approx(G, w)
   foreach e in E
      p_e = 0
   while (∃ edge i-j such that neither i nor j are tight)
      select such an edge e
      increase p_e as much as possible without violating fairness
   S ← set of all tight nodes
   return S
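A Python sketch of the pricing method, assuming integer vertex weights given as a dict and edges given as pairs. It visits each edge once and raises its price by the largest fair amount; after that visit one endpoint is tight and stays tight, so a single pass has the same effect as the while loop. The name `pricing_vertex_cover` and the data layout are illustrative assumptions.

```python
def pricing_vertex_cover(weights, edges):
    """weights: dict node -> weight (integers assumed); edges: list of (i, j) pairs."""
    paid = {v: 0 for v in weights}     # total price charged to each node so far
    price = {}
    for i, j in edges:
        # Largest raise keeping both endpoints fair; 0 if one endpoint is already tight.
        raise_by = min(weights[i] - paid[i], weights[j] - paid[j])
        price[(i, j)] = raise_by
        paid[i] += raise_by
        paid[j] += raise_by
    cover = {v for v in weights if paid[v] == weights[v]}   # all tight nodes
    return cover, price

weights = {"a": 4, "b": 3, "c": 5, "d": 3}
edges = [("a", "b"), ("a", "d"), ("b", "c"), ("c", "d")]
cover, price = pricing_vertex_cover(weights, edges)
print(sorted(cover), sum(price.values()))  # ['a', 'b', 'd'] 6
```

On this 4-cycle the cover has weight 10 and the prices sum to 6, so the weight is within twice the price total; the cheapest cover, {b, d}, has weight 6. With floating-point weights the equality test for tightness would need a tolerance.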
26Pricing Method
[Figure 11.8: a run of the pricing method, showing the vertex weights and the price of edge a-b.]
27Pricing Method Analysis
- Theorem. Pricing method is a 2-approximation.
- Pf.
- Algorithm terminates since at least one new node becomes tight after each iteration of while loop.
- Let S = set of all tight nodes upon termination of algorithm. S is a vertex cover: if some edge i-j is uncovered, then neither i nor j is tight. But then while loop would not terminate.
- Let S* be optimal vertex cover. We show $w(S) \le 2w(S^*)$:
  $w(S) = \sum_{i \in S} w_i = \sum_{i \in S} \sum_{e = (i, j)} p_e \le \sum_{i \in V} \sum_{e = (i, j)} p_e = 2 \sum_{e \in E} p_e \le 2w(S^*)$.
- The steps use, in order: all nodes in S are tight; S ⊆ V and prices ≥ 0; each edge counted twice; the fairness lemma. ∎
28Extra Slides
29Load Balancing on 2 Machines
- Claim. Load balancing is hard even if only 2 machines.
- Pf. NUMBER-PARTITIONING $\le_P$ LOAD-BALANCE. (NUMBER-PARTITIONING is NP-complete by Exercise 8.26.)
[Figure: jobs a through g drawn as bars whose length is the job's length; in a yes-instance, machine 1 runs a, d, f and machine 2 runs b, c, e, g, both finishing at time L.]
30Center Selection Hardness of Approximation
- Theorem. Unless P = NP, there is no $\rho$-approximation algorithm for metric k-center problem for any $\rho < 2$.
- Pf. We show how we could use a $(2 - \epsilon)$-approximation algorithm for k-center to solve DOMINATING-SET (see Exercise 8.29) in poly-time.
- Let G = (V, E), k be an instance of DOMINATING-SET.
- Construct instance G' of k-center with sites V and distances
- d(u, v) = 2 if (u, v) ∉ E
- d(u, v) = 1 if (u, v) ∈ E
- Note that G' satisfies the triangle inequality.
- Claim: G has dominating set of size k iff there exists k centers C* with r(C*) = 1.
- Thus, if G has a dominating set of size k, a $(2 - \epsilon)$-approximation algorithm on G' must find a solution C* with r(C*) = 1 since it cannot use any edge of distance 2.
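The construction in the proof can be sketched in a few lines of Python. The helper names `kcenter_instance` and `covering_radius` are hypothetical, the graph is assumed to be given as a vertex list plus undirected edge pairs, and d(v, v) = 0 is filled in as the identity property requires.

```python
def kcenter_instance(vertices, edges):
    """Distances of the k-center instance G': 1 for adjacent vertices, 2 otherwise."""
    adjacent = {frozenset(e) for e in edges}
    return {
        (u, v): 0 if u == v else (1 if frozenset((u, v)) in adjacent else 2)
        for u in vertices
        for v in vertices
    }

def covering_radius(d, vertices, centers):
    """r(C): the largest distance from any site to its nearest center."""
    return max(min(d[(s, c)] for c in centers) for s in vertices)

# Path a - b - c: {b} is a dominating set, {a} is not.
V = ["a", "b", "c"]
E = [("a", "b"), ("b", "c")]
d = kcenter_instance(V, E)
print(covering_radius(d, V, ["b"]), covering_radius(d, V, ["a"]))  # 1 2
```

A center set has covering radius at most 1 exactly when every vertex is a center or adjacent to one, which is the dominating-set condition. All distances are 1 or 2, so the triangle inequality holds automatically.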
31Vertex Cover Approximation
- A vertex cover is a subset of vertices such that every edge in the graph is incident to at least one of these vertices.
- The vertex cover optimization problem is to find a vertex cover of minimum size.
- For a good strategy, a heuristic is needed.
32Vertex Cover
- Consider an arbitrary edge (u, v) in the graph. One of its two vertices must be in the cover, but we do not know which one.
- The idea of this heuristic is to simply put both vertices into the vertex cover.
- Then we remove all edges that are incident to u and v (since they are now all covered), and recurse on the remaining edges.
- For every one vertex that must be in the cover, we put two into our cover, so it is easy to see that the cover we generate is at most twice the size of the optimum cover.
33Proof of approximation ratio
- Claim: ApproxVC yields a factor-2 approximation.
- Proof: Consider the set C output by ApproxVC. Let C* be the optimum VC. Let A be the set of edges selected by the line marked with (*) in the ApproxVC pseudocode. Observe that the size of C is exactly 2|A| because we add two vertices for each such edge. The edges in A share no endpoints, and the optimum VC must contain at least one endpoint of each of them, and thus the size of C* is at least |A|. Thus we have
- $\frac{|C|}{2} = |A| \le |C^*|$
- Therefore
- $\frac{|C|}{|C^*|} \le 2$
34Example
35Approximate VC Algorithm Naive Approach
- ApproxVC
    C = empty-set
    while (E is nonempty) do
      (*) let (u,v) be any edge of E
          add both u and v to C
          remove from E all edges incident to either u or v
    return C
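A compact Python sketch of ApproxVC, assuming the graph is given as a list of edge pairs. Rather than deleting edges, it skips any edge that already has an endpoint in the cover, which has the same effect; the function name is illustrative.

```python
def approx_vertex_cover(edges):
    """2-for-1 heuristic: take both endpoints of every edge not yet covered."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:   # the (*) step: an uncovered edge
            cover.update((u, v))                # add both u and v to C
    return cover

print(sorted(approx_vertex_cover([(1, 2), (2, 3), (3, 4)])))  # [1, 2, 3, 4]
```

On this path the heuristic returns all four vertices while {2, 3} already covers every edge, so the factor of 2 is actually reached.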
- Can we improve on it?
- Why not consider vertices with higher degrees first (Greedy Strategy)?
36Greedy VC
- Greedy approximation for VC:
  GreedyVC(G = (V, E))
    C = empty-set
    while (E is nonempty) do
        let u be the vertex of maximum degree in G
        add u to C
        remove from E all edges incident to u
    return C
- For the example, it yields the optimum solution.
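A small Python sketch of GreedyVC, assuming an edge list without self-loops. Degrees are recomputed from the remaining edges in every round, which is simple rather than fast; the function name and the tie-breaking (first vertex encountered) are illustrative choices.

```python
from collections import Counter

def greedy_vertex_cover(edges):
    """Repeatedly take a vertex of maximum degree among the still-uncovered edges."""
    remaining = list(edges)
    cover = []
    while remaining:
        degree = Counter(x for e in remaining for x in e)
        u = max(degree, key=degree.get)                    # vertex of maximum degree
        cover.append(u)
        remaining = [e for e in remaining if u not in e]   # remove edges incident to u
    return cover

print(greedy_vertex_cover([(0, 1), (0, 2), (0, 3), (3, 4)]))  # [0, 3]
```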
37Greedy VC Example
- Can we prove Greedy VC outperforms the other one?
- NO!
- It can even perform worse than it.
- However, it should also be pointed out that the vertex cover constructed by the greedy heuristic is (for typical graphs) smaller than the one computed by the 2-for-1 heuristic, so it would probably be wise to run both algorithms and take the better of the two.
38Third Attempt Use Matching
- A matching is a subset of edges that have no vertices in common.
- A matching is maximal if no more edges can be added to it.
- Maximal matchings will help us find good vertex covers, and moreover, they are easy to generate: repeatedly pick edges that are disjoint from the ones chosen already, until this is no longer possible.
- Any vertex cover of a graph G must be at least as large as the number of edges in any matching in G; that is, any matching provides a lower bound on OPT. This is simply because each edge of the matching must be covered by one of its endpoints in any vertex cover!
39Example
- Figure below shows how to convert from Maximal Matching to Vertex Cover.
- a) A matching b) Completion to a maximal matching c) Its VC
40Vertex Cover from Matching
- Let S be a set that contains both endpoints of each edge in a maximal matching M.
- Then S must be a vertex cover: if it isn't, that is, if it doesn't touch some edge e ∈ E, then M could not possibly be maximal since we could still add e to it. But our cover S has 2|M| vertices.
- We know that any vertex cover must have size at least |M|.
- Algorithm
- Find a maximal matching M ⊆ E
- Return S = {all endpoints of edges in M}
41Vertex cover from Matching
- This simple procedure always returns a vertex cover whose size is at most twice optimal!
- In summary, even though we have no way of finding the best vertex cover, we can easily find another structure, a maximal matching, with two key properties:
- 1. Its size gives us a lower bound on the optimal vertex cover.
- 2. It can be used to build a vertex cover, whose size can be related to that of the optimal cover using property 1.
- Approximation ratio: $\alpha \le 2$
42Set Cover Problem Revisited
- Given a pair (X, F) where $X = \{x_1, x_2, \dots, x_m\}$ is a finite set (a domain of elements) and $F = \{S_1, S_2, \dots, S_n\}$ is a family of subsets of X, such that every element of X belongs to at least one set of F.
- Consider a subset C ⊆ F. (This is a collection of sets over X.) We say that C covers the domain if every element of X is in some set of C.
- The problem is to find the minimum-sized subset C of F that covers X.
43Set Cover
- Vertex Cover is a type of set cover problem. The domain to be covered are the edges, and each vertex covers the subset of incident edges.
- Decision-problem formulation of set cover (does there exist a set cover of size at most k?) is NP-complete.
- There is a factor-2 approximation for the vertex cover problem, but it cannot be applied to generate a factor-2 approximation for set cover.
44Set Cover
- It is known that there is no constant factor approximation to the set cover problem unless P = NP.
- There is however the greedy heuristic, which achieves an approximation bound of ln m, where m = |X| is the size of the underlying domain; we omit the proof.
- A simple greedy approach to set cover works by at each stage selecting the set that covers the greatest number of uncovered elements.
45Set Cover The Approx. Algorithm
- Greedy-Set-Cover(X, F)
    U = X                      // U are the items to be covered
    C = empty                  // C will be the sets in the cover
    while (U is nonempty)      // there is someone left to cover
        select S in F that covers the most elements of U
        add S to C
        U = U - S
    return C
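A minimal Python sketch of Greedy-Set-Cover, assuming the family F is a list of Python sets and the result is reported as indices into that list. The function name, the index-based return value, and the tie-breaking (lowest index among equally good sets) are illustrative assumptions.

```python
def greedy_set_cover(universe, family):
    """Return indices of the chosen sets, picking the set that covers most uncovered items."""
    uncovered = set(universe)            # U: the items still to be covered
    chosen = []                          # C: indices of the sets in the cover
    while uncovered:                     # there is someone left to cover
        best = max(range(len(family)), key=lambda i: len(family[i] & uncovered))
        if not family[best] & uncovered:
            raise ValueError("family does not cover the universe")
        chosen.append(best)
        uncovered -= family[best]        # U = U - S
    return chosen

family = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
print(greedy_set_cover(range(1, 6), family))  # [0, 3]
```

Each round scans the whole family, so this straightforward version does O(|F|) set intersections per chosen set.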
46Set Cover Bad Example
- The optimal set cover consists of sets S5 and S6, each of size 16. Initially all three sets S1, S5, and S6 have 16 elements. If ties are broken in the worst possible way, the greedy algorithm will first select set S1. We remove all the covered elements. Now S2, S5 and S6 all cover 8 of the remaining elements. Again, if we choose poorly, S2 is chosen. The pattern repeats, choosing S3 (covering 4), S4 (covering 2) and finally S5 and S6 (each covering 1 remaining element). Greedy thus uses six sets where two suffice.
47Bin Packing
- Bin packing is another well-known NP-complete problem, which is a variant of the knapsack problem.
- Given a set of n objects, where $s_i$ denotes the size of the ith object ($0 < s_i < 1$, for simplification), put objects into bins.
- Each bin holds objects of total size at most 1.
- Use as few bins as possible.
- Ex: fit objects into a truck, etc.
48Bin Packing Example
49Bin Packing Approximation Factor
- Theorem: The first-fit heuristic achieves a ratio bound of 2.
- Proof: Consider an instance $\{s_1, \dots, s_n\}$ of the bin packing problem. Let $S = \sum_i s_i$ denote the sum of all the object sizes. Let $b^*$ denote the optimal number of bins, and $b_{ff}$ denote the number of bins used by first-fit.
- $b^* \ge S$ since no bin can hold a total capacity of more than 1 unit, and even if we were to fill each bin exactly to its capacity, we would need at least S bins.
50Bin Packing Analysis
- We claim that $b_{ff} < 2S$ whenever first-fit uses at least two bins (if it uses a single bin it is trivially optimal).
- To see this, let $t_i$ denote the total size of the objects that first-fit puts into bin i.
- Consider bins i and i+1 filled by first-fit. Assume that indexing is cyclical, so if i is the last index ($i = b_{ff}$) then i+1 = 1.
- We claim that $t_i + t_{i+1} > 1$. If not, then the contents of bins i and i+1 could both be put into the same bin, and hence first-fit would never have started to fill the second bin, preferring to keep everything in the first bin. Thus we have
- $\sum_{i=1}^{b_{ff}} (t_i + t_{i+1}) > b_{ff}$
51Bin Packing Analysis
- But this sum adds up all the elements twice, so it has a total value of 2S. Thus we have $2S > b_{ff}$.
- Combining this with the fact that $b^* \ge S$ we have $b_{ff} < 2S \le 2b^*$, showing $b_{ff} / b^* < 2$ as required.
- Best fit attempts to put the object into the bin in which it fits most closely with the available space (approx. ratio 17/10).
- First fit decreasing, in which the objects are first sorted in decreasing order of size (approx. ratio 11/9).
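The slides do not spell out first-fit itself; the usual rule is to place each object into the first bin that still has room and to open a new bin when none does. The sketch below follows that rule, with `first_fit`, `first_fit_decreasing`, and the `capacity` parameter as illustrative names. The example uses integer sizes with capacity 10 to sidestep floating-point rounding.

```python
def first_fit(sizes, capacity=1.0):
    """Place each object into the first bin with enough room; open a new bin if none has."""
    bins = []                              # bins[i] is the list of sizes packed in bin i
    for s in sizes:
        for b in bins:
            if sum(b) + s <= capacity:     # first bin where the object fits
                b.append(s)
                break
        else:                              # no existing bin had room
            bins.append([s])
    return bins

def first_fit_decreasing(sizes, capacity=1.0):
    """First-fit applied to the objects sorted in decreasing order of size."""
    return first_fit(sorted(sizes, reverse=True), capacity)

sizes = [3, 3, 3, 7, 7, 7]
print(len(first_fit(sizes, capacity=10)))             # 4
print(len(first_fit_decreasing(sizes, capacity=10)))  # 3
```

On this input, sorting first lets each object of size 7 share a bin with an object of size 3, saving one bin.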
52Traveling Salesman Problem (TSP)
- In the TSP, given a complete undirected graph with nonnegative edge weights, find a cycle that visits all vertices and is of minimum cost. (NP-Complete)
- Distances should satisfy the triangle inequality: for all u, v, w ∈ V, $c(u, w) \le c(u, v) + c(v, w)$.
- (c(u, v) = cost on edge uv or cost of shortest path)
- There is an approx. algorithm for TSP with a ratio of 2 (the tour that it produces cannot be worse than twice the cost of the optimal tour).
53TSP Observations
- A TSP tour with one edge removed is a spanning tree (not necessarily a minimum spanning tree).
- Therefore, the cost of the minimum TSP tour is at least as large as the cost of the MST.
- MST can be computed efficiently, using, for example, either Kruskal's or Prim's algorithm.
- If we can find some way to convert the MST into a TSP tour while increasing its cost by at most a constant factor, then we will have an approximation for TSP.
- We will see that if the edge weights satisfy the triangle inequality, then this is possible.
54TSP ? MST
- Given any free tree there is a tour of the tree called a twice around tour that traverses the edges of the tree twice, once in each direction. The figure below shows an example.
[Figure: four panels: MST, twice around tour, short-cut tour, optimal TSP tour.]
55TSP
- This path is not simple because it revisits vertices, but we can make it simple by short-cutting, that is, we skip over previously visited vertices.
- The final order in which vertices are visited using the short-cuts is exactly the same as a preorder traversal of the MST.
- The triangle inequality assures us that the path length will not increase when we take short-cuts.
56Approximate Algorithm for TSP
- ApproxTSP(G = (V, E))
    T = minimum spanning tree for G
    r = any vertex
    L = list of vertices visited by a preorder walk of T starting with r
    return L
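A Python sketch of ApproxTSP, assuming the graph is given as a symmetric n-by-n cost matrix (n ≥ 1) that satisfies the triangle inequality. Prim's algorithm is used for the MST here as one of the options; the function name `approx_tsp` and the matrix representation are illustrative assumptions.

```python
import heapq

def approx_tsp(cost, root=0):
    """Return (tour, length): a preorder walk of an MST of the complete graph `cost`."""
    n = len(cost)
    # Prim's algorithm, recording each vertex's children in the MST.
    children = [[] for _ in range(n)]
    in_tree = [False] * n
    heap = [(0, root, root)]                 # (edge cost, vertex, parent)
    while heap:
        _, v, parent = heapq.heappop(heap)
        if in_tree[v]:
            continue
        in_tree[v] = True
        if v != root:
            children[parent].append(v)
        for w in range(n):
            if not in_tree[w]:
                heapq.heappush(heap, (cost[v][w], w, v))
    # Preorder walk of the tree: this is the short-cut twice around tour.
    tour, stack = [], [root]
    while stack:
        v = stack.pop()
        tour.append(v)
        stack.extend(reversed(children[v]))  # visit children in insertion order
    length = sum(cost[tour[i]][tour[(i + 1) % n]] for i in range(n))
    return tour, length

points = [0, 2, 5, 9]                        # four cities on a line
cost = [[abs(p - q) for q in points] for p in points]
print(approx_tsp(cost))                      # ([0, 1, 2, 3], 18)
```

In this example the MST is the path 0-1-2-3 of cost 9 and the tour costs 18, exactly twice the MST cost. It also happens to be an optimal tour, since any tour on a line must travel from one end to the other and back.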
57Approx.TSP Analysis
- Claim: Approx-TSP has a ratio bound of 2.
- Proof: Let H denote the tour produced by this algorithm and let H* be the optimum tour. Let T be the minimum spanning tree.
- We can remove any edge of H* resulting in a spanning tree, and since T is the minimum cost spanning tree we have $c(T) \le c(H^*)$.
- Twice around tour of T has cost 2c(T), since every edge in T is hit twice. By the triangle inequality, when we short-cut an edge of T to form H we do not increase the cost of the tour, and so we have $c(H) \le 2c(T)$.
- Combining these we have $c(H)/2 \le c(T) \le c(H^*)$, therefore $c(H) / c(H^*) \le 2$.
58Graph Partitioning
- Input: An undirected graph G = (V, E) with nonnegative edge weights; a real number $\alpha \in (0, 1/2]$.
- Output: A partition of the vertices into two groups A and B, each of size at least $\alpha |V|$.
- Goal: Minimize the capacity of the cut (A, B).
- Applications: from circuit layout to program analysis to image segmentation.
- Graph Partitioning is NP-hard.
- Removing the restriction on the sizes of A and B would give the MINIMUM CUT problem, which we know to be efficiently solvable using flow techniques.
59Acknowledgements
- The last few algorithms are dependent on David
Mountc 451 Course, University of Waterloo