Cost-effective Outbreak Detection in Networks

Provided by: tyy

Transcript and Presenter's Notes


1
Cost-effective Outbreak Detection in Networks
  • (Leskovec, Krause, Guestrin, Faloutsos,
    VanBriesen, Glance, KDD 2007)

2
What this paper is about
  • Task: Given a network, which nodes should we
    monitor to best catch outbreaks of important
    events while minimizing the cost?
  • Approach: Greedily, by leveraging the
    submodularity property
  • This is a set optimization problem. Many
    realistic problems (e.g., sampling strategy in
    active learning) can be reduced to this paradigm

3
  • Example: Blogosphere

4
Key Idea in this paper
  • The paper demonstrates the steps in solving the
    problem using submodular optimization
  • Show that the objectives are submodular
  • Devise a greedy algorithm (submodularity gives a
    good lower bound)
  • Submodularity also gives bounds for special cases
    (non-uniform cost, on-line selection)
  • Submodularity allows an efficient algorithm

5
Submodularity
  • What is submodularity?
  • A property of set functions.
  • Informally: if a function has the diminishing
    returns property, it is submodular
  • Law of diminishing returns: adding an element to
    a smaller set yields at least as much utility as
    adding it to a larger set
  • Has nice properties w.r.t. optimization (like
    convex functions, we will see later).

6
Submodularity
  • Formally - A set function with the following
    property
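The formula on this slide did not survive in the transcript. As a reminder, and not as a copy of the slide, the standard definition of submodularity for a set function F over a ground set V (both symbols are notation assumed here) reads:

```latex
F(A \cup \{s\}) - F(A) \;\ge\; F(B \cup \{s\}) - F(B)
\qquad \text{for all } A \subseteq B \subseteq V,\ s \in V \setminus B .
```

In words: the marginal gain of adding s to the smaller set A is at least the gain of adding it to the larger set B.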

7
  • Example: Blogosphere

What if your objective is to read the most
effective set of blogs? Is this problem
submodular?

8
Submodularity
  • Why is this important in this context?
  • It turns out that many realistic objectives in
    outbreak detection are submodular (i.e., exhibit
    the diminishing returns property)
  • Many other optimization problems are also
    submodular (see the tutorial from the Select lab:
    http://www.select.cs.cmu.edu/tutorials/icml08submodularity.html)

9
Submodularity
  • Examples
  • A lot of machine learning algorithms!
  • Set cover
  • Forward feature selection
  • Mutual Information
  • Factorization in structure learning
  • And there is a nice connection to convexity.
    Anyone?
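To make the set cover example concrete, here is a minimal sketch that brute-force checks the diminishing returns inequality on a tiny ground set. The blog names and the stories they cover are made-up toy data, and `coverage` and `is_submodular` are hypothetical helper names, not anything from the paper.

```python
from itertools import combinations

# Hypothetical toy data: each blog "covers" a set of stories.
COVERS = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {4, 5, 6},
    "d": {1, 6},
}

def coverage(blogs):
    """Number of distinct stories covered by the chosen blogs."""
    covered = set()
    for b in blogs:
        covered |= COVERS[b]
    return len(covered)

def subsets(items):
    """Yield every subset of items as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def is_submodular(f, ground):
    """Check F(A+s) - F(A) >= F(B+s) - F(B) for all A within B, s outside B."""
    ground = frozenset(ground)
    for big in subsets(ground):
        for small in subsets(big):
            for s in ground - big:
                gain_small = f(small | {s}) - f(small)
                gain_big = f(big | {s}) - f(big)
                if gain_small < gain_big:
                    return False
    return True

print(is_submodular(coverage, COVERS))               # coverage passes the check
print(is_submodular(lambda A: len(A) ** 2, COVERS))  # |A|^2 fails it
```

The check is exponential in the size of the ground set, so it is only useful as a sanity test on toy instances.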

10
What to Optimize here?
  • One can think of many things
  • Fraction of events detected by a given
    placement of sensors (Detection Likelihood)
  • Time passed from outbreak until detection
    (Detection Time)
  • Number of nodes touched by the event before
    detection (Population Affected)
  • Any of them, if expressed in terms of expected
    penalty, yields a submodular objective.

11
What is the trick?
  • (How do we obtain a submodular function?)
  • Any of them, if expressed in terms of expected
    penalty, can satisfy submodularity.
  • So, our next task is to express those objectives
    in such a way.

12
Objectives
  • Express the objective as maximizing the expected
    penalty reduction (how much penalty you save by
    betting on set A)
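A minimal sketch of how the three criteria could be written as an expected penalty reduction, assuming each cascade is stored as a map from node to the time the cascade reaches it. The cascades, probabilities, time horizon `T_MAX`, and all function names are illustrative assumptions, not data or code from the paper.

```python
import math

# Hypothetical cascades: node -> time at which the cascade reaches that node.
CASCADES = [
    {"a": 0, "b": 2, "c": 5},
    {"b": 0, "d": 1},
    {"c": 0, "a": 3, "d": 4},
]
PROB = [0.5, 0.3, 0.2]  # assumed scenario probabilities
T_MAX = 10              # assumed horizon: the time penalty is capped here

def detection_time(cascade, sensors):
    """Earliest time any chosen sensor is reached; infinity if none is."""
    return min((cascade[s] for s in sensors if s in cascade), default=math.inf)

def penalty_dl(cascade, t):
    """Detection Likelihood: penalty 1 if the cascade is never detected."""
    return 0.0 if t < math.inf else 1.0

def penalty_dt(cascade, t):
    """Detection Time: time until detection, capped at T_MAX."""
    return float(min(t, T_MAX))

def penalty_pa(cascade, t):
    """Population Affected: nodes reached strictly before detection."""
    return float(sum(1 for hit in cascade.values() if hit < t))

def penalty_reduction(sensors, penalty):
    """Expected saving in penalty compared with placing no sensors at all."""
    total = 0.0
    for p, cascade in zip(PROB, CASCADES):
        t = detection_time(cascade, sensors)
        total += p * (penalty(cascade, math.inf) - penalty(cascade, t))
    return total

for name, pen in [("DL", penalty_dl), ("DT", penalty_dt), ("PA", penalty_pa)]:
    print(name, penalty_reduction(["a"], pen))
```

With no sensors the detection time is infinite for every cascade, so the reduction is zero by construction.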

13
Property of the objective
  • Is this submodular?
  • Does it meet the criteria?
  • $R(\emptyset) = 0$
  • R is non-decreasing
  • Marginal gain diminishes as the set gets bigger.
  • Hence, this function is submodular

14
What this means?
  • In the uniform cost case (all nodes have the same
    placement cost), the greedy algorithm is within a
    constant factor, 1 - 1/e (about 63%), of the
    optimal solution. Nice.
  • Notice that the objective function has to be
    expressed in terms of expected penalty in order
    to be submodular.

15
  • By the way, maximizing a submodular function is
    in general NP-hard.
  • But there is a nice theorem that guarantees a
    lower bound for the greedy algorithm (so go
    greedy).

16
Greedy Algorithm
  • The authors propose the following greedy
    algorithm for the uniform cost case
  • Start with $A = \emptyset$ (the empty set)
  • At step k, add the node $s_k$ that maximizes the
    marginal gain; stop when the budget is exhausted.
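The steps above can be sketched as follows. This is a minimal illustration under the assumption that the objective is available as a Python function `reward` over a list of nodes; the coverage-style toy objective and the `REACH` data are made up for the example.

```python
# Hypothetical toy data: each node "reaches" a set of items.
REACH = {
    "a": {1, 2, 3, 4},
    "b": {3, 4, 5},
    "c": {5, 6},
    "d": {1, 6},
}

def reward(chosen):
    """Toy submodular objective: number of distinct items reached."""
    covered = set()
    for s in chosen:
        covered |= REACH[s]
    return len(covered)

def greedy(nodes, reward, budget):
    """Unit-cost greedy: repeatedly add the node with the largest marginal gain."""
    chosen = []
    current = reward(chosen)
    while len(chosen) < budget:
        best, best_gain = None, 0
        for s in nodes:
            if s in chosen:
                continue
            gain = reward(chosen + [s]) - current
            if gain > best_gain:
                best, best_gain = s, gain
        if best is None:  # no remaining node improves the objective
            break
        chosen.append(best)
        current += best_gain
    return chosen, current

print(greedy(list(REACH), reward, budget=2))
```

Each step re-evaluates the marginal gain of every remaining node, which is the cost that lazy evaluation later avoids.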

17
Non-uniform case: CEF
  • The non-uniform cost case is also bounded
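The slide does not spell out the procedure. One common way to get a bounded solution under non-uniform costs, sketched here as an assumption about what CEF does, is to run greedy twice, once ranking by gain per unit cost and once by raw gain, and keep the better result. The function names, the toy data, and the costs below are hypothetical.

```python
# Hypothetical toy data: one expensive node with wide reach, one cheap node.
REACH = {"big": set(range(1, 11)), "small": {11}}
COSTS = {"big": 10.0, "small": 0.5}

def reward(chosen):
    """Toy submodular objective: number of distinct items reached."""
    covered = set()
    for s in chosen:
        covered |= REACH[s]
    return len(covered)

def budgeted_greedy(costs, reward, budget, use_ratio):
    """Greedy under a cost budget, ranking by gain/cost or by raw gain."""
    chosen, spent = [], 0.0
    current = reward(chosen)
    candidates = set(costs)
    while candidates:
        # Keep only the nodes that still fit in the remaining budget.
        candidates = {s for s in candidates if spent + costs[s] <= budget}
        best, best_score, best_gain = None, 0.0, 0
        for s in sorted(candidates):
            gain = reward(chosen + [s]) - current
            score = gain / costs[s] if use_ratio else gain
            if score > best_score:
                best, best_score, best_gain = s, score, gain
        if best is None:
            break
        chosen.append(best)
        spent += costs[best]
        current += best_gain
        candidates.discard(best)
    return chosen, current

def cef(costs, reward, budget):
    """Run both greedy variants and return the one with the higher reward."""
    by_ratio = budgeted_greedy(costs, reward, budget, use_ratio=True)
    by_gain = budgeted_greedy(costs, reward, budget, use_ratio=False)
    return max(by_ratio, by_gain, key=lambda result: result[1])

print(cef(COSTS, reward, budget=10.0))
```

In this toy instance the ratio rule spends part of the budget on the cheap node and can no longer afford the expensive one, which is why neither rule is safe on its own.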

18
Computing the online bound
  • You can also compute a tighter upper bound on
    the optimal value of a submodular function
  • (This is useful for evaluation)
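For the unit-cost case, such a bound can be sketched as follows: take any solution, compute the marginal gain of every node not in it, and add the largest `budget` gains to the solution's value. For a monotone submodular objective, no set of that size can score higher. The `online_bound` name and the toy data are assumptions for illustration.

```python
# Hypothetical toy data: each node "reaches" a set of items.
REACH = {
    "a": {1, 2, 3, 4},
    "b": {3, 4, 5},
    "c": {5, 6},
    "d": {1, 6, 7},
}

def reward(chosen):
    """Toy monotone submodular objective: number of distinct items reached."""
    covered = set()
    for s in chosen:
        covered |= REACH[s]
    return len(covered)

def online_bound(nodes, reward, solution, budget):
    """Upper bound on the reward of any set of at most `budget` nodes."""
    base = reward(list(solution))
    gains = sorted(
        (reward(list(solution) + [s]) - base for s in nodes if s not in solution),
        reverse=True,
    )
    return base + sum(gains[:budget])

solution = ["a", "c"]
print(reward(solution), online_bound(list(REACH), reward, solution, budget=2))
```

Comparing the two printed numbers shows how far the given solution can be from the optimum at most, without ever computing the optimum.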

19
Efficient Algorithm - CELF

20
Efficient Algorithm - CELF
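The transcript keeps only the slide titles here. A minimal sketch of the lazy evaluation idea usually associated with CELF, restricted to unit costs for brevity: by submodularity a node's marginal gain can only shrink as the set grows, so a stale gain is an upper bound and most nodes never need re-evaluation. The function name `celf`, the evaluation counter, and the toy data are assumptions, not the authors' code.

```python
import heapq

# Hypothetical toy data: each node "reaches" a set of items.
REACH = {
    "a": {1, 2, 3, 4},
    "b": {3, 4, 5},
    "c": {5, 6},
    "d": {1, 6},
}

def reward(chosen):
    """Toy submodular objective: number of distinct items reached."""
    covered = set()
    for s in chosen:
        covered |= REACH[s]
    return len(covered)

def celf(nodes, reward, budget):
    """Lazy greedy: re-evaluate a node only when it reaches the top of the heap."""
    chosen = []
    current = reward(chosen)
    # Max-heap via negated gains; the third field records the round
    # (number of nodes already chosen) in which the gain was computed.
    heap = [(-(reward([s]) - current), s, 0) for s in nodes]
    heapq.heapify(heap)
    evaluations = len(heap)
    while heap and len(chosen) < budget:
        neg_gain, s, computed_at = heapq.heappop(heap)
        if computed_at == len(chosen):
            # The gain is up to date, and no stale entry can exceed it.
            if -neg_gain <= 0:
                break
            chosen.append(s)
            current += -neg_gain
        else:
            gain = reward(chosen + [s]) - current
            evaluations += 1
            heapq.heappush(heap, (-gain, s, len(chosen)))
    return chosen, current, evaluations

print(celf(list(REACH), reward, budget=2))
```

On this toy instance the result matches plain greedy while skipping some marginal-gain evaluations; the saving grows with the number of nodes.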
21
Multicriterion Optimization
  • A solution A is called Pareto-optimal if there is
    no set $A'$ such that $R_i(A') \ge R_i(A)$ for all
    i and $R_j(A') > R_j(A)$ for at least one j
  • For positive weights $\lambda_i > 0$, consider the
    objective $R(A) = \sum_i \lambda_i R_i(A)$. Any
    solution maximizing $R(A)$ is guaranteed to be
    Pareto-optimal
  • If each $R_i$ is submodular, this weighted sum is
    also submodular. Nice.
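A minimal sketch of the weighted-sum objective, assuming each criterion is available as a Python function over a list of nodes. The two toy criteria, the weights, and the `weighted_objective` helper are hypothetical.

```python
# Hypothetical toy data for two criteria.
STORIES = {"a": {1, 2}, "b": {2, 3}}
REGIONS = {"a": {"eu"}, "b": {"eu", "us"}}

def stories_covered(sensors):
    """Criterion 1: number of distinct stories seen."""
    return len(set().union(*(STORIES[s] for s in sensors)))

def regions_covered(sensors):
    """Criterion 2: number of distinct regions seen."""
    return len(set().union(*(REGIONS[s] for s in sensors)))

def weighted_objective(objectives, weights):
    """Combine several objectives into one using positive weights."""
    if any(w <= 0 for w in weights):
        raise ValueError("all weights must be positive")
    def combined(sensors):
        return sum(w * f(sensors) for w, f in zip(weights, objectives))
    return combined

combined = weighted_objective([stories_covered, regions_covered], [0.5, 2.0])
print(combined(["a"]), combined(["a", "b"]))
```

The combined function can be handed to any of the greedy procedures unchanged, since a positively weighted sum of submodular functions is itself submodular.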

22
Experiments
  • 45K blogs, 1M links, 17,589 information cascades
  • The true optimum lies somewhere between the green
    and blue curves
  • 13.8% away from the optimal (much closer than
    the guaranteed bound)
  • Note that this is an evaluation on the training
    data: the utility score (penalty reduction) is
    based on the labeled data.

23
Experiments
  • The cost-sensitive algorithm achieves the same
    level of utility score at 1,500 posts, as opposed
    to 10,710 posts.
  • Under non-uniform cost, summarization-type
    blogs score much higher than large ones.

24
Experiments
  • Comparison with other heuristics. The proposed
    model does much better.
  • (Is this a fair comparison?)

25
Experiments
  • How well does it generalize? Hold out 50% of the
    data for testing. CELF appears to overfit.
  • The problem is alleviated when very small blog
    sites are filtered out.

26
Summary of the paper