When Ignorance is Bliss

Transcript and Presenter's Notes
1
When Ignorance is Bliss
  • Peter Grünwald
  • CWI and EURANDOM
  • www.grunwald.nl
  • Joe Halpern
  • Cornell University
  • halpern@cs.cornell.edu

Preliminary version appeared in Proceedings 20th
Annual Conference on Uncertainty in Artificial
Intelligence (UAI 2004)
2
Is Ignorance Bliss?
  • Suppose you want to make predictions about a
    random variable Y, given the value of a random
    variable X
  • CLAIM: there exist situations in which the
    value of X is best ignored
  • This includes situations in which the value of X
    is relevant to the prediction task at hand
  • This is true both in a Non-Bayesian and in a
    Bayesian analysis

3
Is Ignorance Bliss?
  • People react very differently when we claim this
  • True, of course
  • in machine learning we do this all the time
  • (to avoid overfitting)
  • people (even smart people) do this all the time
  • False, of course
  • everybody knows that more information is always
    better
  • Celebrated theorems in the Bayesian
    decision-theoretic literature state that
    information should never be ignored (Good 1967,
    Raiffa and Schlaifer 1961)

4
Example
  • A doctor is trying to decide whether a patient
    has the flu or tuberculosis.
  • The doctor then learns the patient's address.
  • The doctor knows that the patient's address may
    be correlated with disease but does not know the
    correlation at all
  • tuberculosis may be more common in some parts of
    the city than others
  • Here Y is the patient's disease and X is the
    neighborhood in which he lives.
  • The doctor is trying to choose a treatment.
    Effect of treatment depends only on Y.
  • Under these circumstances, many doctors would
    simply not take the patient's address into
    account, thereby ignoring relevant information.
  • Here we show that this is quite sensible!

5
Setup
  • Random variables X and Y
  • Agent wants to predict Y given X
  • The agent deems possible a set of joint
    distributions of (X, Y)
  • This set is a subset of the set of all
    distributions on the joint outcome space
  • Prediction quality measured using an arbitrary
    loss function


6
Setup
  • Random variables X and Y
  • Agent wants to predict Y given X
  • The agent deems possible a set of joint
    distributions of (X, Y), a subset of the set of
    all distributions on the joint outcome space
  • Prediction quality measured using an arbitrary
    loss function
  • There is a set of actions available to the agent
  • simple example: the actions are guesses for Y,
    and L is 0/1-loss:
  • L(y, a) = 0 if a = y
  • L(y, a) = 1 if a ≠ y
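The 0/1-loss setup can be sketched in a few lines of Python (a minimal illustration, not from the talk; the flu/tuberculosis outcomes and the 0.9/0.1 marginal are assumed for concreteness):

```python
def zero_one_loss(y, a):
    """0/1 loss: 0 if the guess a equals the outcome y, else 1."""
    return 0 if a == y else 1

def expected_loss(p_y, a):
    """Expected 0/1 loss of guessing a when Y ~ p_y (dict: outcome -> prob)."""
    return sum(p * zero_one_loss(y, a) for y, p in p_y.items())

# Assumed marginal of Y, as in the doctor example
p_y = {"flu": 0.9, "tb": 0.1}

# The best guess under the marginal is its mode
best = min(p_y, key=lambda a: expected_loss(p_y, a))
print(best, expected_loss(p_y, best))
```

Guessing the mode of the marginal is the best constant action; its expected 0/1 loss is 1 minus the largest probability (here 0.1).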


7
Setup
  • This talk
  • the set of distributions the agent deems
    possible: all joint distributions with a given,
    fixed marginal for Y
  • the agent only knows the marginal distribution
    of Y!


8
Setup
  • This talk
  • the set of distributions the agent deems
    possible: all joint distributions with a given,
    fixed marginal for Y
  • the agent only knows the marginal distribution
    of Y!
  • Menu
  • Non-Bayesian analysis
  • Minimax, lower/upper probabilities
  • Bayesian analysis
  • Prior on the set of possible distributions
  • Beyond Bayes/Minimax

9
Non-Bayesian Analysis
  • Suppose the known marginal of Y is highly
    concentrated
  • Then the agent can make reasonably accurate
    predictions of Y
  • Now the agent observes X = x
  • Obvious way to update on this information:
    condition each candidate distribution on X = x
  • But the resulting set contains all distributions
    on Y
  • All information about Y is lost!
  • Same holds no matter what value of X is observed
  • This is called dilation in the robust Bayes
    literature (Seidenfeld and Wasserman 1993)

10
Ignoring is Minimax Optimal
  • Alternative update rule: ignore the value of X
  • This is the minimax optimal decision rule
  • Proposition
  • Let L be an arbitrary loss function and let the
    marginal of Y be an arbitrary distribution on its
    outcome space
  • Then the minimax expected loss over all joint
    distributions compatible with that marginal is
    achieved by ignoring X and playing the action
    that is optimal under the marginal of Y alone
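The minimax claim can be checked numerically for 0/1-loss with binary X and Y (a hypothetical sketch, not the talk's proof; the marginal P(Y=1)=0.9 and the grid over conditionals are assumptions):

```python
import itertools

p_y = {0: 0.1, 1: 0.9}                    # assumed known marginal of Y
grid = [i / 10 for i in range(11)]        # candidate values for P(X=1 | Y=y)

def worst_case_loss(rule):
    """Max expected 0/1 loss of a deterministic rule X -> action,
    over all joints compatible with the known Y-marginal."""
    worst = 0.0
    for q0, q1 in itertools.product(grid, repeat=2):
        q = {0: q0, 1: q1}
        loss = sum(p_y[y] * ((1 - q[y]) if x == 0 else q[y]) * (rule[x] != y)
                   for y in (0, 1) for x in (0, 1))
        worst = max(worst, loss)
    return worst

# All four deterministic rules from X to a guess for Y
rules = [{0: a0, 1: a1} for a0 in (0, 1) for a1 in (0, 1)]
for rule in rules:
    print(rule, round(worst_case_loss(rule), 3))
```

The constant rule that ignores X and predicts the mode of Y attains worst-case loss 0.1; each rule that actually depends on X can be forced to loss 1, since the adversary may correlate X with Y so that the rule is always wrong.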

11
Example: 0/1-loss
  • Thus, ignoring X leads to worst-case expected
    loss 1 - max_y P(Y = y)
  • But if we condition on X, the set of possible
    conditional distributions contains every
    distribution on Y, so the worst-case expected
    loss is much larger

12
When ignorance ain't bliss - 1
  • If the loss function can depend on X, then
    ignoring is a very bad strategy after all!
  • Interpretation
  • A bookie offers some tickets for a horse race
  • The agent computes expected payoff under his
    probability distribution(s) to decide whether to
    buy.
  • If the bookie is allowed to let prices depend on
    X while the agent ignores X, then the bookie can
    force the agent to lose money.
  • If the bookie also ignores or does not know X,
    this cannot happen.

13
When ignorance ain't bliss - 2
  • If the set of distributions the agent deems
    possible is a singleton, then ignoring X never
    helps and often harms
  • Proposition
  • Let the set of possible distributions be an
    arbitrary singleton. Then conditioning on X never
    yields higher expected loss than ignoring X

14
Bayesian Analysis
  • Celebrated theorem of Bayesian decision theory:
    information should never be ignored
  • Here the expectation is taken with respect to a
    subjective prior on the set of possible
    distributions
  • Proof
  • the prior induces a unique joint distribution on
    (X, Y)
  • we already know that in that case, we should
    condition rather than ignore

15
Bayesian Analysis
  • If agent really trusts her own prior, she should
    not ignore information
  • But in many (most?) practical applications,
    agents use pragmatic/noninformative priors
  • e.g. uniform, Jeffreys, MaxEnt
  • For an objective Bayesian (Jaynes, Berger) this
    is obvious
  • Even a subjective Bayesian (Savage, De Finetti,
    …) will have to admit that in practice, an agent
    will often not have the time or the computational
    resources to precisely determine her subjective
    beliefs
  • In that case, it may make sense to look at
    worst-case behaviour of Bayesian updating

16
Bayesian Analysis
  • We consider a variety of noninformative priors
    on the set of possible distributions
  • including Jeffreys prior and the uniform prior
    on the parameters
  • For all these priors:
  • If the observation is made once, the value of X
    is ignored
  • If the experiment is independently performed two
    times, then, no matter the outcome of the first
    observation, the value of X in the second
    observation is not ignored
  • Bayes is not minimax optimal
  • Not surprising, really

17
Bayesian Analysis
  • Of course, if you repeatedly observe X and have
    to predict Y, then, by Bayesian updating, you
    learn the conditional distribution of Y given X.
  • Then after a certain number of examples, Bayes
    starts to perform better than ignoring
  • unless X and Y are independent
  • Yet for the second observation and a few more,
    ignoring seems better
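A toy sketch of this learning process (hypothetical, not the talk's model; the Laplace-smoothed counts and the assumed 0.9 correlation are for illustration only):

```python
from collections import defaultdict
import random

random.seed(0)
counts = defaultdict(lambda: [1, 1])      # per x: Laplace pseudo-counts for y in {0, 1}

def sample_xy():
    """Assumed ground truth: X uniform on {0, 1}, Y agrees with X 90% of the time."""
    x = random.randint(0, 1)
    y = x if random.random() < 0.9 else 1 - x
    return x, y

for _ in range(1000):
    x, y = sample_xy()
    counts[x][y] += 1

for x in (0, 1):
    n0, n1 = counts[x]
    print(x, round(n1 / (n0 + n1), 2))    # posterior mean estimate of P(Y=1 | X=x)
```

After enough observations the estimated conditionals approach the truth, at which point conditioning on X beats ignoring it (unless X and Y are independent).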

18
Beyond Minimax: Reliability
  • If the set of possible distributions has a more
    complicated structure, then ignoring is usually
    not minimax optimal any more
  • We recently gave a complete characterization of
    the sets for which it is
  • Still, ignoring can be a reasonable update rule
  • It is often the sharpest reliable update rule
  • Why should we measure quality as minimax optimal
    loss anyway? There are many other possibilities
  • minimax regret
  • average with respect to a meta-prior
  • Reliability has some particularly pleasant
    properties

19
Conclusion
  • Sometimes new information is best ignored
  • ignoring can be
  • minimax optimal
  • reliable
  • preferable to pragmatic Bayes
  • perhaps you don't want to know this
  • In that case, please ignore what I had to say!

20
Thank you for your attention!
21
Reliability
  • You hypothesize a messy set of distributions
  • It is much more convenient to work with a single
    distribution summarizing the set
  • A single distribution is reliable relative to
    the set if the expected loss it predicts equals
    the actual expected loss under every distribution
    in the set
  • The world then behaves exactly as it would if
    that single distribution were true, even though
    it isn't!
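Reliability can be illustrated numerically for 0/1-loss with binary X and Y (a hypothetical sketch; the marginal P(Y=1)=0.9 and the grid over conditionals are assumptions). Because the action ignores X and the loss depends only on Y, every joint sharing the Y-marginal yields exactly the loss the marginal predicts:

```python
import itertools

p_y = {0: 0.1, 1: 0.9}                    # assumed known marginal of Y
action = max(p_y, key=p_y.get)            # ignore X, predict the mode of Y
predicted = 1 - p_y[action]               # loss predicted by the marginal itself

grid = [i / 10 for i in range(11)]        # joints parametrized by q[y] = P(X=1 | Y=y)
for q0, q1 in itertools.product(grid, repeat=2):
    q = {0: q0, 1: q1}
    actual = sum(p_y[y] * ((1 - q[y]) if x == 0 else q[y]) * (action != y)
                 for y in (0, 1) for x in (0, 1))
    assert abs(actual - predicted) < 1e-9  # the world behaves as if the marginal were true

print("predicted loss", round(predicted, 3), "matches every compatible joint")
```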