Cost-effective Outbreak Detection in Networks

Slides: 27

Provided by: tyy


1
Cost-effective Outbreak Detection in Networks

(Leskovec, Krause, Guestrin, Faloutsos,
VanBriesen, Glance, KDD 2007)

2
What this paper is about

Task: Given a network, which nodes should we
monitor to best catch outbreaks of important
events while minimizing the cost?
Approach: Greedily, by leveraging the submodularity
property
This is a set optimization problem. Many
realistic problems (e.g., sampling strategy in
active learning) can be reduced to this paradigm

Example Blogosphere

4
Key Idea in this paper

The paper demonstrates the steps in solving the
problem using submodular optimization
Show that the objectives are submodular
Devise a greedy algorithm (submodularity gives a
good lower bound)
Submodularity also gives bounds for special cases
(non-uniform cost, on-line selection)
Submodularity allows an efficient algorithm

5
Submodularity

What is submodularity?
A property of set functions.
Informally - If a function has the diminishing
returns property, it is submodular
Law of diminishing returns - Adding an element to
a smaller set yields at least as much utility as
adding it to a larger set
Has nice properties w.r.t. optimization (like
convex functions, we will see later).
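
The diminishing returns property can be checked by brute force on a small example. The sketch below uses made-up data (the blogs, the stories they cover, and all function names are assumptions for illustration, not taken from the paper) and tests whether a coverage function is submodular.

```python
from itertools import combinations

# Hypothetical toy data: which stories each blog covers.
BLOGS = {
    "a": {1, 2, 3},
    "b": {2, 3, 4},
    "c": {4, 5},
    "d": {1, 5, 6},
}

def coverage(selected):
    """Number of distinct stories covered by the selected blogs."""
    covered = set()
    for blog in selected:
        covered |= BLOGS[blog]
    return len(covered)

def subsets(items):
    """Yield every subset of items as a frozenset."""
    items = list(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

def is_submodular(f, ground):
    """Check f(A + s) - f(A) >= f(B + s) - f(B) for all A in B, s not in B."""
    ground = frozenset(ground)
    for b in subsets(ground):
        for a in subsets(b):
            for s in ground - b:
                if f(a | {s}) - f(a) < f(b | {s}) - f(b):
                    return False
    return True

print(is_submodular(coverage, BLOGS))                 # coverage function
print(is_submodular(lambda s: len(s) ** 2, BLOGS))    # gains grow with set size
```

The second call uses a function whose marginal gains increase with the set size, so the check fails for it. The brute-force loop is exponential in the number of elements and is only meant for tiny examples.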

6
Submodularity

Formally - A set function with the following
property
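
The formula from the slide did not survive in this transcript. The standard definition of submodularity, written here with assumed symbols (F for the set function, V for the ground set), is:

```latex
F(A \cup \{s\}) - F(A) \;\ge\; F(B \cup \{s\}) - F(B)
\qquad \text{for all } A \subseteq B \subseteq V,\ s \in V \setminus B
```

That is, the marginal gain of adding s to the smaller set A is at least the gain of adding it to the larger set B.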

Example Blogosphere

What if your objective is to read the most
effective set of blogs? Is this problem
submodular?
8
Submodularity

Why is this important in this context?
It turns out that many realistic objectives in
outbreak detection are submodular (i.e., exhibit a
diminishing returns property)
Many other optimization problems are also
submodular (see the tutorial from the Select lab:
http://www.select.cs.cmu.edu/tutorials/icml08submodularity.html )

9
Submodularity

Examples
A lot of machine learning problems!
Set cover
Forward feature selection
Mutual information
Factorization in structure learning
And there is a nice connection to convexity.
Anyone?

10
What to Optimize here?

One can think of many things
Fraction of events detected by a given
placement of sensors (Detection Likelihood)
Time passed from outbreak until detection
(Detection Time)
Number of nodes touched by the event before
detection (Population Affected)
Any of them, if expressed as an Expected Penalty,
would satisfy the submodularity property.

11
What is the trick?

(How do we obtain a submodular function?)
Any of them, if expressed as an Expected Penalty,
can satisfy submodularity.
So, our next task is to express those objectives
in such a way.

12
Objectives

Present the objective as: Maximize Expected Penalty
Reduction (how much saving in penalty you can get
by betting on set A)
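
The slide's formula is missing from this transcript. As a sketch of the formulation, assuming notation along the lines of the paper (P(i) is the probability of scenario i, T(i, A) the time at which placement A detects scenario i, and \pi_i(t) the penalty incurred when scenario i is detected at time t), the penalty reduction can be written as:

```latex
R_i(A) = \pi_i(\infty) - \pi_i\bigl(T(i, A)\bigr),
\qquad
R(A) = \sum_i P(i)\, R_i(A) = \pi(\emptyset) - \pi(A),
\qquad
\pi(A) = \sum_i P(i)\, \pi_i\bigl(T(i, A)\bigr)
```

Here T(i, \emptyset) = \infty, so an empty placement gives R(\emptyset) = 0.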

13
Property of the objective

Is this submodular?
Does it meet the criteria?
R(∅) = 0
R is non-decreasing
Marginal gain diminishes as the set gets bigger.
Hence, this function is submodular

14
What does this mean?

In the uniform cost case (all nodes have the same
placement cost), the greedy algorithm is only a
constant factor away from the optimal solution:
it achieves at least (1 - 1/e), about 63%, of the
optimal value.
Nice.
Notice that the objective function has to be
expressed in terms of expected penalty in order
to be submodular.

By the way, maximizing a submodular function is
NP-hard in general.
But there is a nice theorem that guarantees a lower
bound for the greedy algorithm (so go greedy).

16
Greedy Algorithm

The authors propose the following greedy
algorithm for the uniform cost case
Start with A = ∅ (the empty set)
At step k, add the node s_k which maximizes the
marginal gain; stop when the budget is used up.
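
The steps above can be sketched as follows. This is a minimal illustration under assumptions, not the authors' code: the cascades are made-up toy data, the objective is detection likelihood, and every node costs one unit so the budget is simply the number of sensors.

```python
# Hypothetical toy cascades: each is the set of nodes an outbreak reaches.
CASCADES = [{"a", "b"}, {"b", "c"}, {"c"}, {"d", "a"}, {"e"}, {"b", "e"}]
NODES = sorted(set().union(*CASCADES))

def detection_likelihood(sensors):
    """Fraction of cascades that touch at least one sensor node."""
    sensors = set(sensors)
    hit = sum(1 for cascade in CASCADES if cascade & sensors)
    return hit / len(CASCADES)

def greedy(objective, nodes, budget):
    """Unit-cost greedy: repeatedly add the node with the largest marginal gain."""
    chosen = []
    while len(chosen) < budget:
        current = objective(chosen)
        best_node, best_gain = None, 0.0
        for node in nodes:
            if node in chosen:
                continue
            gain = objective(chosen + [node]) - current
            if gain > best_gain:
                best_node, best_gain = node, gain
        if best_node is None:  # no remaining node improves the objective
            break
        chosen.append(best_node)
    return chosen

picked = greedy(detection_likelihood, NODES, budget=2)
print(picked, detection_likelihood(picked))
```

Each step re-evaluates the objective for every remaining node, which is what makes the plain greedy algorithm expensive on large networks.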

17
Non-uniform case CEF

The non-uniform cost case also comes with a guaranteed bound
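
A minimal sketch of one way to handle non-uniform costs, assuming the usual scheme of running a benefit/cost greedy and a plain-gain greedy under the same budget and keeping the better result (for monotone submodular objectives this combination is known to achieve at least (1/2)(1 - 1/e) of the optimum). The cascades, costs, and function names are made up for illustration.

```python
# Hypothetical toy cascades and per-node placement costs.
CASCADES = [{"a", "b"}, {"b", "c"}, {"c"}, {"d", "a"}, {"e"}, {"b", "e"}]
COST = {"a": 1.0, "b": 3.0, "c": 1.0, "d": 1.0, "e": 2.0}

def reward(sensors):
    """Fraction of cascades that touch at least one sensor node."""
    sensors = set(sensors)
    return sum(1 for c in CASCADES if c & sensors) / len(CASCADES)

def budgeted_greedy(objective, cost, budget, use_ratio):
    """Greedy under a budget, scoring by gain/cost (use_ratio) or by gain alone."""
    chosen, spent = [], 0.0
    while True:
        current = objective(chosen)
        best_node, best_score = None, 0.0
        for node in cost:
            if node in chosen or spent + cost[node] > budget:
                continue  # already picked, or not affordable
            gain = objective(chosen + [node]) - current
            score = gain / cost[node] if use_ratio else gain
            if score > best_score:
                best_node, best_score = node, score
        if best_node is None:
            return chosen
        chosen.append(best_node)
        spent += cost[best_node]

def cost_effective_forward(objective, cost, budget):
    """Run both greedy variants and keep whichever scores higher."""
    by_ratio = budgeted_greedy(objective, cost, budget, use_ratio=True)
    by_gain = budgeted_greedy(objective, cost, budget, use_ratio=False)
    return max(by_ratio, by_gain, key=objective)

best = cost_effective_forward(reward, COST, budget=3.0)
print(best, reward(best))
```

In this toy instance the plain-gain greedy spends the whole budget on the expensive node "b", while the benefit/cost variant buys two cheap nodes and detects more cascades.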

18
Computing the online bound

You can also compute a tighter, data-dependent
upper bound on the optimum of a submodular function
(This is useful for evaluation)
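
A sketch of such a bound for the unit-cost case, under assumptions: for a monotone submodular objective and any solution A, the optimum with k elements is at most R(A) plus the sum of the k largest marginal gains of the nodes outside A. The toy cascades and names below are illustrative only.

```python
from itertools import combinations

# Hypothetical toy cascades: each is the set of nodes an outbreak reaches.
CASCADES = [{"a", "b"}, {"b", "c"}, {"c"}, {"d", "a"}, {"e"}, {"b", "e"}]
NODES = sorted(set().union(*CASCADES))

def detection_likelihood(sensors):
    """Fraction of cascades that touch at least one sensor node."""
    sensors = set(sensors)
    return sum(1 for c in CASCADES if c & sensors) / len(CASCADES)

def online_bound(objective, nodes, solution, k):
    """Upper bound on the best k-element solution, computed from `solution`."""
    base = objective(solution)
    gains = sorted(
        (objective(list(solution) + [n]) - base for n in nodes if n not in solution),
        reverse=True,
    )
    return base + sum(gains[:k])

solution = ["b", "a"]  # e.g. the output of a greedy run with budget 2
bound = online_bound(detection_likelihood, NODES, solution, k=2)

# Brute-force optimum, feasible only because the example is tiny.
optimum = max(detection_likelihood(pair) for pair in combinations(NODES, 2))
print(detection_likelihood(solution), optimum, bound)
```

The bound needs only one extra pass over the nodes, so it can be reported alongside any solution to show how far from optimal it could be at worst.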

19
Efficient Algorithm - CELF
20
Efficient Algorithm - CELF
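
The slide content for CELF is not in this transcript. The core idea of lazy forward selection can be sketched as below, assuming unit costs only (the paper's CELF also handles non-uniform costs). By submodularity, a marginal gain computed in an earlier round is an upper bound on the current gain, so stale gains only need to be refreshed when they reach the top of the priority queue. Data and names are made up.

```python
import heapq

# Hypothetical toy cascades: each is the set of nodes an outbreak reaches.
CASCADES = [{"a", "b"}, {"b", "c"}, {"c"}, {"d", "a"}, {"e"}, {"b", "e"}]
NODES = sorted(set().union(*CASCADES))

def detection_likelihood(sensors):
    """Fraction of cascades that touch at least one sensor node."""
    sensors = set(sensors)
    return sum(1 for c in CASCADES if c & sensors) / len(CASCADES)

def lazy_greedy(objective, nodes, budget):
    """Unit-cost greedy with lazy re-evaluation of marginal gains."""
    chosen = []
    current = objective(chosen)
    # heapq is a min-heap, so gains are negated; the third field records
    # how many nodes were chosen when the gain was computed.
    heap = [(-(objective([n]) - current), n, 0) for n in nodes]
    heapq.heapify(heap)
    evaluations = len(nodes)
    while heap and len(chosen) < budget:
        neg_gain, node, computed_at = heapq.heappop(heap)
        if computed_at == len(chosen):
            # Fresh gain on top: no stale entry can exceed it.
            if -neg_gain <= 0:
                break
            chosen.append(node)
            current -= neg_gain
        else:
            # Stale gain: recompute against the current set and reinsert.
            gain = objective(chosen + [node]) - current
            evaluations += 1
            heapq.heappush(heap, (-gain, node, len(chosen)))
    return chosen, evaluations

chosen, evaluations = lazy_greedy(detection_likelihood, NODES, budget=2)
print(chosen, detection_likelihood(chosen), evaluations)
```

On a toy instance this small the lazy variant saves little or nothing; the savings appear when most stale gains stay far below the top of the queue and never need to be recomputed.
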
21
Multicriterion Optimization

A solution A is called Pareto-optimal if there is
no set A' such that R_i(A') ≥ R_i(A) for all i
and A' is strictly better for at least one j
(R_j(A') > R_j(A) for some j)
For positive weights λ_i, consider the objective
R(A) = Σ_i λ_i R_i(A). Any solution maximizing R(A)
is guaranteed to be Pareto-optimal
If each R_i is submodular, this objective
is also submodular. Nice.

22
Experiments

45K blogs, 1M links, 17,589 information cascades
The true optimum lies somewhere between green and blue
13.8% away from the optimal (much closer than
the guaranteed bound)
Note that this is an evaluation on the training data
- this is the utility score (penalty reduction)
based on the labeled data.

23
Experiments

The cost-sensitive algorithm achieves the same level
of utility score at 1500 posts, as opposed to
10,710 posts.
Under non-uniform cost, summarization-type
blogs score much higher than large ones.

24
Experiments