Title: Randomized Approximation Algorithms for Offline and Online Set Multicover Problems
1 - Randomized Approximation Algorithms for
- Offline and Online Set Multicover Problems
- Bhaskar DasGupta
- Department of Computer Science
- University of Illinois at Chicago
- dasgupta_at_cs.uic.edu
- Joint work with Piotr Berman (Penn State) and Eduardo Sontag (Rutgers)
- A collection of results that appeared in APPROX-2004, WADS-2005 and to appear in Discrete Applied Math (special issue on computational biology)
- Supported by NSF grants CCR-0206795, CCR-0208749 and a CAREER award IIS-0346973
2 - More interesting title for the theoretical computer science community
- Randomized Approximation Algorithms for Set Multicover Problems
- with Applications to Reverse Engineering of Protein and Gene Networks
3 - More interesting title for the biological community
- Randomized Approximation Algorithms for Set Multicover Problems
- with Applications to Reverse Engineering of Protein and Gene Networks
4 - Set k-multicover (SCk)
- Input: universe U = {1, 2, ..., n}, sets S1, S2, ..., Sm ⊆ U, integer (coverage factor) k ≥ 1
- Valid solution: cover every element of the universe ≥ k times, i.e., a subset of indices I ⊆ {1, 2, ..., m} such that ∀x ∈ U: |{j ∈ I : x ∈ Sj}| ≥ k
- Objective: minimize the number of picked sets |I|
- k = 1 ⇒ simply called (unweighted) set-cover, a well-studied problem
- Special case of interest in our applications: k is large, e.g., k = n−1
- (a small worked instance appears below)
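To make the definition concrete, here is a minimal sketch of an SCk instance and a feasibility check. The instance itself is hypothetical (not from the talk); only the definition of a valid k-multicover is taken from the slide.

```python
# Tiny illustrative SC_k instance (hypothetical, for intuition only).
U = {1, 2, 3, 4}
sets = {1: {1, 2}, 2: {2, 3, 4}, 3: {1, 3, 4}, 4: {1, 2, 4}}
k = 2

def is_valid_multicover(I):
    """Check that every element of U is covered at least k times by the chosen sets."""
    return all(sum(x in sets[j] for j in I) >= k for x in U)

print(is_valid_multicover({1, 2, 3}))   # True: every element is covered >= 2 times
print(is_valid_multicover({1, 2}))      # False: element 1 is covered only once
```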
5 - Known positive results
- Set-cover (k = 1)
- can approximate with an approx. ratio of 1 + ln a, where a = maximum size of any set (deterministic or randomized)
- Johnson 1974, Chvátal 1979, Lovász 1975
- Set-multicover (k > 1)
- the same ratio holds for k > 1
- e.g., primal-dual fitting: Rajagopalan and Vazirani 1999
6 - Known negative results for set-cover (i.e., k = 1)
- (modulo NP ⊄ DTIME(n^(log log n)))
- an approx. ratio better than (1 − ε) ln n is not possible for any constant 0 < ε < 1 (Feige 1998)
- (modulo NP ≠ P)
- better than (1 − ε) ln n is not possible for some constant 0 < ε < 1 (Raz and Safra 1997)
- the lower bound can be generalized in terms of the set size a:
- better than ln a − O(ln ln a) is not possible (Trevisan 2001)
7 - r(a,k) = approx. ratio of an algorithm as a function of a, k
- We know that for the greedy algorithm r(a,k) ≤ 1 + ln a
- at every step, select the set that contains the maximum number of elements not yet covered k times (a sketch of this greedy rule appears below)
- Can we design an algorithm such that r(a,k) decreases with increasing k?
- possible approaches
- improved analysis of greedy?
- randomized approach (LP rounding)?
- ?
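A minimal sketch of the greedy rule described above (function and variable names are mine, not from the talk; the rule itself is the one the slide states):

```python
def greedy_multicover(universe, sets, k):
    """Greedy set multicover: repeatedly pick the set covering the most
    still-deficient elements (elements covered fewer than k times so far)."""
    coverage = {x: 0 for x in universe}
    chosen = []
    remaining = dict(sets)  # index -> set of elements
    while any(c < k for c in coverage.values()):
        # Pick the set that covers the largest number of deficient elements.
        best = max(remaining, key=lambda i: sum(coverage[x] < k for x in remaining[i]))
        if sum(coverage[x] < k for x in remaining[best]) == 0:
            raise ValueError("instance is infeasible for this k")
        chosen.append(best)
        for x in remaining[best]:
            coverage[x] += 1
        del remaining[best]  # each set may be picked at most once
    return chosen
```

Slide 7 states that this greedy rule achieves ratio at most 1 + ln a.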
8 - Our results (very roughly)
- n = number of elements of the universe U
- k = number of times each element must be covered
- a = maximum size of any set
- Greedy would not do any better:
- r(a,k) = Ω(log n) even if k is large, e.g., k = n
- But we can design a randomized algorithm, based on the LP-rounding approach, such that the expected approx. ratio is better:
- E[r(a,k)] ≤ max{2 + o(1), ln(a/k)} (as appears in the conference proceedings)
- → (further improvement, via comments from Feige)
- → max{1 + o(1), ln(a/k)}
9 - More precise bounds on E[r(a,k)]:
- E[r(a,k)] ≤ 1 + ln a, if k = 1
- E[r(a,k)] ≤ (1 + e^(−(k−1)/5)) · ln(a/(k−1)), if a/(k−1) ≥ e² ≈ 7.4 and k > 1
- E[r(a,k)] ≤ min{2 + 2e^(−(k−1)/5), 2 + 0.46·a/k}, if 1/4 ≤ a/(k−1) ≤ e² and k > 1
- E[r(a,k)] ≤ 1 + 2(a/k)^(1/2), if a/(k−1) ≤ 1/4 and k > 1
10 - Can E[r(a,k)] converge to 1 at a much faster rate?
- Probably not... for example, the problem can be shown to be APX-hard when a/k is close to 1
- Can we prove matching lower bounds of the form max{1 + o(1), 1 + ln(a/k)}?
- Do not know...
11 - How about the weighted case?
- each set has an arbitrary positive weight
- minimize the sum of the weights of the selected sets
- It seems that the multi-cover version may not be much easier than the single-cover version:
- take a single-cover instance
- add a few new elements and new must-select sets with almost-zero weights that cover the original elements k−1 times and all new elements k times
- (a sketch of this reduction appears below)
12 - Our randomized algorithm
- Standard LP-relaxation for set multicover (SCk):
- selection variable xi for each set Si (1 ≤ i ≤ m)
- minimize x1 + x2 + ... + xm
- subject to ∑_{i : u ∈ Si} xi ≥ k for every u ∈ U
- 0 ≤ xi ≤ 1 for all i
13 - Our randomized algorithm
- Solve the LP-relaxation
- Select a scaling factor β carefully:
- β = ln a, if k = 1
- β = ln(a/(k−1)), if a/(k−1) ≥ e² and k > 1
- β = 2, if 1/4 ≤ a/(k−1) ≤ e² and k > 1
- β = 1 + (a/k)^(1/2), otherwise
- Deterministic rounding: select Si if β·xi ≥ 1
- C0 = {Si : β·xi ≥ 1}
- Randomized rounding: select each Si ∈ {S1, ..., Sm} \ C0 with prob. β·xi
- C1 = collection of such selected sets
- Greedy choice: if an element u ∈ U is covered fewer than k times, pick sets containing u from {S1, ..., Sm} \ (C0 ∪ C1) arbitrarily until it is
- (a sketch of the full algorithm appears below)
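A compact sketch of this three-phase algorithm. It assumes scipy is available for solving the LP; the helper names are mine, and the choice of β follows the case table on the slide. This is an illustrative sketch, not the exact implementation from the paper.

```python
import math
import random
from scipy.optimize import linprog

def lp_rounding_multicover(universe, sets, k):
    """Three phases from slide 13: LP relaxation, scaled deterministic plus
    randomized rounding, then a greedy repair step."""
    m, elems = len(sets), sorted(universe)
    a = max(len(s) for s in sets)
    # LP: minimize sum x_i  s.t.  sum_{i: u in S_i} x_i >= k,  0 <= x_i <= 1.
    A_ub = [[-1.0 if u in sets[i] else 0.0 for i in range(m)] for u in elems]
    b_ub = [-float(k)] * len(elems)
    lp = linprog(c=[1.0] * m, A_ub=A_ub, b_ub=b_ub, bounds=[(0, 1)] * m, method="highs")
    x = lp.x
    # Scaling factor beta, per the case analysis on the slide.
    if k == 1:
        beta = math.log(a)
    elif a / (k - 1) >= math.e ** 2:
        beta = math.log(a / (k - 1))
    elif a / (k - 1) >= 0.25:
        beta = 2.0
    else:
        beta = 1.0 + math.sqrt(a / k)
    chosen = {i for i in range(m) if beta * x[i] >= 1}        # deterministic rounding (C0)
    chosen |= {i for i in range(m)
               if i not in chosen and random.random() < beta * x[i]}  # randomized rounding (C1)
    # Greedy repair: cover every still-deficient element up to k times.
    for u in elems:
        need = k - sum(u in sets[i] for i in chosen)
        for i in range(m):
            if need <= 0:
                break
            if i not in chosen and u in sets[i]:
                chosen.add(i)
                need -= 1
    return chosen
```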
14 - The most non-trivial part of the analysis involved proving the following bound on E[r(a,k)]:
- E[r(a,k)] ≤ (1 + e^(−(k−1)/5)) · ln(a/(k−1)), if a/(k−1) ≥ e² and k > 1
- It needed an amortized analysis of the interaction between the deterministic and randomized rounding steps and the greedy step.
- For a tight analysis, the standard Chernoff bounds were not always sufficient, and hence we needed to devise more appropriate bounds for certain parameter ranges.
15 - Proof of the simplest of the bounds:
- E[r(a,k)] ≤ 1 + 2(a/k)^(1/2), if a/k ≤ 1/4
- Notational simplification:
- α = (a/k)^(−1/2) ≥ 2
- thus, β = 1 + 1/α
- need to show that E[r(a,k)] ≤ 1 + 2/α
- (x1, x2, ..., xm) is the solution vector for the LP
- thus, OPT ≥ ∑_i xi
- Also, obviously, OPT ≥ (n·k)/a = n·α²
16 - Focus on a single element j ∈ U
- Recall the algorithm:
- Deterministic rounding: select Si if β·xi ≥ 1
- C0 = {Si : β·xi ≥ 1}
- Let C0,j = those sets in C0 that contain j
- Randomized rounding: select each Si ∈ {S1, ..., Sm} \ C0 with prob. β·xi
- C1 = collection of such selected sets
- Let C1,j = those sets in C1 that contain j
- p = sum of the probabilities of those sets that contain j
- Greedy choice: if an element j ∈ U is covered fewer than k times, pick sets containing j from {S1, ..., Sm} \ (C0 ∪ C1) arbitrarily; let C2 be all such selected sets
- Let C2,j = those sets in C2 that contain j
17 - What is E[|C0| + |C1|]?
- Obvious.
- E[|C0| + |C1|] ≤ β·(∑_i xi) ≤ (1 + α^(−1))·OPT
- (no set is in both C0 and C1; the calculation is spelled out below)
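Spelling out the one-line calculation behind the bound the slide calls "obvious" (my own filling-in of the step, using β = 1 + 1/α and the fact that the LP value ∑_i x_i is at most OPT):

```latex
\[
\mathbb{E}\bigl[|C_0| + |C_1|\bigr]
  \;=\; \sum_{S_i \in C_0} 1 \;+\; \sum_{S_i \notin C_0} \beta x_i
  \;\le\; \sum_{S_i \in C_0} \beta x_i \;+\; \sum_{S_i \notin C_0} \beta x_i
  \;=\; \beta \sum_{i=1}^{m} x_i
  \;\le\; \Bigl(1 + \tfrac{1}{\alpha}\Bigr)\,\mathrm{OPT}
\]
% using that S_i \in C_0 means \beta x_i \ge 1, and that the LP value lower-bounds OPT
```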
18 - What is E[|C2,j|]?
- Suppose that |C0,j| = k − f for some f
- say S1, S2, ..., S_{k−f} are the sets of C0,j and the remaining sets containing j lie outside C0
- the LP constraint ∑_{i : j ∈ Si} xi ≥ k and xi ≤ 1 for every i imply that the sets containing j outside C0 carry total LP value at least f, so p ≥ β·f
19 - (Focus on a single element j ∈ U)
- Goal is to:
- first determine E[|C0| + |C1|]
- then determine E[|C2,j|]
- sum it up over all j to get E[|C2|]
- finally determine E[|C0| + |C1| + |C2|]
20 - What is E[|C2,j|]? (contd.)
- |C2,j| = max{0, f − |C1,j|}, and thus, after some algebra, a bound on E[|C2,j|] follows
21 - What is E[|C2,j|]? (contd.)
23 - One application
- We used the randomized algorithm for robust string barcoding
- Check the publications on the software webpage:
- http://dna.engr.uconn.edu/software/barcode/
- (joint project with Kishori Konwar, Ion Mandoiu and Alex Shvartsman at Univ. of Connecticut)
24 - Another (the original) motivation for looking at set-multicover:
- Reverse engineering of biological networks
25 - (roadmap figure) Biological motivation → biological problem via differential equations → linear algebraic formulation → set-multicover formulation → randomized algorithm → selection of appropriate biological experiments
26 - (same roadmap figure as slide 25)
27 - (figure) The linear-algebraic setup C = A·B: A is an unknown n×n matrix (rows Ai); B is an n×m matrix whose columns Bj are initially unknown but can be queried, and whose columns are linearly independent; for each queried column Bj we obtain the zero structure of the corresponding column Cj of C (which entries are 0 and which are ≠ 0).
28 - (figure) A numerical example of C = A·B with queried columns B0, B1, B2, B3, B4; A is unknown, the columns of B are in general position and initially unknown but can be queried, and only the zero structure C0 of C is known (which entries are 0 and which are ≠ 0); the example asks: what is B2?
29 - Rough objective: obtain as much information about A while performing as few queries as possible
- Obviously, the best we can hope for is to identify A up to scaling
30 - (figure) Example continued: using the zero structure C0 and the queried columns, a row Ai of A can be recovered (up to scaling) once at least n−1 of the queried columns Bj have a zero in row i of C (|Ji| ≥ n−1 in the notation of the next slide).
31 - Suppose we query columns Bj for j ∈ J = {j1, ..., jl}
- Let Ji = {j : j ∈ J and cij = 0}
- Suppose |Ji| ≥ n−1. Then each Ai is uniquely determined up to a scalar multiple (theoretically the best possible)
- Thus, the combinatorial question is:
- find J of minimum cardinality such that |Ji| ≥ n−1 for all i
32 - Combinatorial question
- Input: sets Ji ⊆ {1, 2, ..., n} for 1 ≤ i ≤ m
- Valid solution: a subset Γ ⊆ {1, 2, ..., m} such that
- ∀ 1 ≤ i ≤ n: |{α : α ∈ Γ and i ∈ Jα}| ≥ n−1
- Goal: minimize |Γ|
- This is the set-multicover problem with coverage factor n−1
- More generally, one can ask for a lower coverage factor, n−k for some k > 1, to allow fewer queries but resulting in an ambiguous determination of A
- (a sketch of the mapping from the zero structure to this instance appears below)
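A small sketch of how the zero structure translates into such a set-multicover instance. The variable names are mine; the translation itself follows the slides: query j "covers" row i exactly when c_ij = 0, and each row needs n−1 units of coverage.

```python
def multicover_instance_from_zero_structure(C0):
    """C0 is an n-by-m 0/1 matrix: C0[i][j] == 0 means row A_i is orthogonal to
    the queried column B_j.  Query j 'covers' row i whenever C0[i][j] == 0; each
    row must be covered at least n-1 times for A_i to be determined up to scaling."""
    n, m = len(C0), len(C0[0])
    cover_sets = [{i for i in range(n) if C0[i][j] == 0} for j in range(m)]
    coverage_factor = n - 1
    return cover_sets, coverage_factor
```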
33 - (roadmap figure) Biological problem via differential equations → linear algebraic formulation → combinatorial formulation → combinatorial algorithms (randomized) → selection of appropriate biological experiments
34 - Time evolution of the state variables (x1(t), x2(t), ..., xn(t)) is given by a set of differential equations:
- ∂x1/∂t = f1(x1, x2, ..., xn, p1, p2, ..., pm)
- ...
- ∂xn/∂t = fn(x1, x2, ..., xn, p1, p2, ..., pm)
- in vector form: ∂x/∂t = f(x, p)
- p = (p1, p2, ..., pm) represents the concentrations of certain enzymes
- f(x̄, p̄) = 0
- p̄ is the wild-type (i.e., normal) condition of p
- x̄ is the corresponding steady-state condition
35 - Goal
- We are interested in obtaining information about the signs of ∂fi/∂xj(x̄, p̄)
- e.g., if ∂fi/∂xj > 0, then xj has a positive (catalytic) effect on the formation of xi
36 - Assumption
- We do not know f, but we do know that certain parameters pj do not affect certain variables xi
- This gives the zero structure of the matrix C:
- a matrix C0 = (c0ij) with c0ij = 0 ⇒ ∂fi/∂pj ≡ 0
37 - m experiments
- in each, change one parameter, say pj (1 ≤ j ≤ m)
- for the perturbed p ≠ p̄, measure the steady-state vector x = ξ(p)
- estimate the n sensitivities bij ≈ [ξi(p̄ + ε·ej) − ξi(p̄)] / ε for a small ε, where ej is the jth canonical basis vector
- (a numerical sketch of this step appears below)
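A minimal numerical sketch of this estimation step. The callable xi_steady_state, standing in for the measured steady state ξ(p), is a hypothetical placeholder; the finite-difference form follows the formula above.

```python
import numpy as np

def estimate_sensitivity_matrix(xi_steady_state, p_bar, eps=1e-3):
    """Estimate B with b_ij ~ [xi_i(p_bar + eps*e_j) - xi_i(p_bar)] / eps,
    i.e., one perturbation experiment per parameter p_j."""
    p_bar = np.asarray(p_bar, dtype=float)
    base = np.asarray(xi_steady_state(p_bar), dtype=float)   # wild-type steady state
    m, n = p_bar.size, base.size
    B = np.zeros((n, m))
    for j in range(m):
        e_j = np.zeros(m)
        e_j[j] = 1.0
        B[:, j] = (np.asarray(xi_steady_state(p_bar + eps * e_j)) - base) / eps
    return B
```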
38 - In practice, a perturbation experiment involves:
- letting the system relax to steady state
- measuring the expression profiles of the variables xi (e.g., using microarrays)
39 - Biology to linear algebra (continued)
- Let A be the Jacobian matrix ∂f/∂x
- Let C be the negative of the Jacobian matrix ∂f/∂p
- From f(ξ(p), p) = 0, taking the derivative with respect to p and using the chain rule, we get C = A·B.
- This gives the linear-algebraic formulation of the problem. (The one-line derivation is spelled out below.)
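For completeness, the chain-rule step (with B = ∂ξ/∂p, the sensitivity matrix estimated in the experiments):

```latex
\[
f(\xi(p),p) \equiv 0
\;\Longrightarrow\;
\underbrace{\frac{\partial f}{\partial x}}_{A}\,
\underbrace{\frac{\partial \xi}{\partial p}}_{B}
\;+\;
\frac{\partial f}{\partial p} \;=\; 0
\;\Longrightarrow\;
A\,B \;=\; -\frac{\partial f}{\partial p} \;=\; C
\]
```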
41 - Performance measure
- via the competitive ratio:
- the ratio of the total cost of the online algorithm to that of an optimal offline algorithm that knows the entire input in advance
- For randomized algorithms, we measure the expected competitive ratio
42 - Parameters of interest (for the performance measure)
- frequency m: maximum number of sets to which any presented element belongs (unknown)
- maximum set size d: maximum number of presented elements that a set contains (unknown)
- total number of elements in the universe n (≥ d) (unknown)
- coverage factor k (given)
43 - Previous result
- Alon, Awerbuch, Azar, Buchbinder, and Naor (STOC 2003 and SODA 2004)
- considered k = 1
- both deterministic and randomized algorithms
- competitive ratio O(log m · log n), worst-case/expected
- an almost matching lower bound for deterministic algorithms and almost all parameter values
44 - Our improved algorithm
- Expected competitive ratio of O(log m · log d) instead of O(log m · log n) (note d ≤ n)
- roughly log₂ m · ln d plus lower-order terms
- small, precise constants
- the ratio improves with larger k
- c = largest weight / smallest weight
45 - Even more precise, smaller constants for the unweighted k = 1 case, via improved analysis
46 - Our lower bounds on the competitive ratio (for deterministic algorithms)
- for both the unweighted and the weighted case, for many values of the parameters
47 - Work concurrent with our conference publication
- Alon, Azar and Gutner (SPAA 2005)
- a different version of the online problem (weighted case)
- the same element can be presented multiple times
- if the same element is presented k times, the goal is to cover it by at least k different sets
- expected competitive ratio O(log m · log n)
- easy to see that it applies to our version with the same bounds
- Conversely, our algorithm and analysis can be easily adapted to provide an expected competitive ratio of log₂ m · ln(d/....) for the above version
48 - Yet another version of online set-cover
- Awerbuch, Azar, Fiat, Leighton (STOC 96)
- elements are presented one at a time
- allowed to pick k sets at a given time, for a specified k
- goal: maximize the number of presented elements for which at least one set containing the element was selected before the element was presented
- provides efficient randomized approximation algorithms and matching lower bounds
49 - Our algorithmic approach
- a randomized version of the so-called winnowing approach
- the (deterministic) winnowing approach was first used long ago:
- N. Littlestone, Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm, Machine Learning, 2, pp. 285-318, 1988
- this approach was also used by Alon, Awerbuch, Azar, Buchbinder and Naor in their STOC-2003 paper
50 - Very, very rough description of our approach
- every set starts with zero probability of selection
- start with an empty solution
- when the next element i is presented:
- if k already-selected sets contain i, there is nothing to do for i
- otherwise, appropriately increase the probabilities of all sets containing i (the promotion step of winnowing)
- select sets containing i with the above probabilities
- if i is still not covered by k selected sets, select more sets greedily:
- select the least-cost set not selected already, then the next least-cost set, etc.
- (a sketch of this loop appears below)
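A minimal sketch of this per-element loop. The exact promotion rule and constants from the paper are not reproduced on the slide, so the multiplicative-plus-additive update below is illustrative only; all names are mine.

```python
import random

def online_multicover_step(element_sets, weights, prob, selected, k):
    """One step of the randomized winnowing-style online loop for a newly
    presented element, following the rough description on slide 50.
    element_sets: indices of the sets containing the presented element
    prob: per-set selection probabilities (start at 0 and are only promoted)
    selected: indices already in the solution (modified in place)."""
    if sum(1 for i in element_sets if i in selected) >= k:
        return  # already covered k times; nothing to do
    freq = len(element_sets)  # frequency of the presented element
    for i in element_sets:
        # Promotion step (illustrative rule only): cheaper sets and rarer
        # elements get a larger boost, capped at probability 1.
        prob[i] = min(1.0, 2 * prob[i] + 1.0 / (freq * weights[i]))
    for i in element_sets:
        if i not in selected and random.random() < prob[i]:
            selected.add(i)
    # Greedy repair: take cheapest remaining sets until the element is covered k times.
    deficit = k - sum(1 for i in element_sets if i in selected)
    for i in sorted(element_sets, key=lambda i: weights[i]):
        if deficit <= 0:
            break
        if i not in selected:
            selected.add(i)
            deficit -= 1
```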
51 - Many desirable (and sometimes conflicting) goals:
- the increase in the probability of each set should not be too large
- else, e.g., the randomized step may select too many sets
- the increase in the probability of each set should not be too small
- else, e.g., optimal sets may be missed too many times, and the greedy step may dominate too much
- light sets should be preferred over heavy sets, unless the heavy sets are in an optimal solution
- the increase in probability should be somehow inversely linked to the frequency of i, to avoid selecting too many sets in the randomized step
53 - Slightly improved algorithm for the unweighted case
- (the expected competitive ratio has better constants/asymptotics)
- modify the promotion step slightly (change the probability-update rule)
54 - New expected competitive ratio
55 - Motivation for the online version
- Similar to before, except that we use fluorescent proteins instead of microarrays
- Fluorescent proteins can be used to measure the rate at which a certain gene is transcribed in a cell under a set of conditions
- a priori, the matrix C is not known completely, but is to be learnt by doing experiments
56 - Thank you for your attention!