A Confidence Interval for the Misclassification Rate - PowerPoint PPT Presentation

About This Presentation
Title:

A Confidence Interval for the Misclassification Rate

Description:

A Confidence Interval for the Misclassification Rate S.A. Murphy & E.B. Laber – PowerPoint PPT presentation

Slides: 38
Provided by: Sam1184

Transcript and Presenter's Notes



1
A Confidence Interval for the Misclassification Rate
  • S.A. Murphy
  • E.B. Laber

2
Outline
  • Review
  • Three challenges in constructing CIs
  • Combining a statistical approach with a learning
    theory approach to constructing CIs
  • Relevance to confidence measures for the value of
    a policy.

3
Review
  • X is the vector of features; Y is the binary class label
  • Misclassification rate of a classifier f: $P(Y \neq f(X))$
  • Data: N iid observations of (Y, X)
  • Given a space of classifiers, $\mathcal{F}$, and the data,
    use some method to construct a classifier, $\hat f_N \in \mathcal{F}$
  • The goal is to provide a CI for the misclassification rate of $\hat f_N$

4
Review
  • Since the 0-1 loss function, $1\{Y \neq f(X)\}$,
    is not smooth, one commonly uses a surrogate loss
    to estimate the classifier
  • Surrogate loss: L(Y, f(X))
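To make the contrast concrete, here is a minimal Python sketch of the 0-1 loss next to three common surrogates. It assumes labels coded as y in {-1, +1} and a real-valued score f(x) whose sign is the predicted label; the function names and the choice of hinge and logistic losses are illustrative assumptions, and only least squares is the surrogate used in the implementation slides.

```python
import math

# Assumption (not from the slides): labels are coded y in {-1, +1} and f(x) is a
# real-valued score whose sign gives the predicted label, so y * f(x) is the margin.


def zero_one_loss(y: int, fx: float) -> float:
    """Misclassification indicator: jumps from 1 to 0 as the margin crosses 0."""
    return 1.0 if y * fx <= 0 else 0.0


def squared_loss(y: int, fx: float) -> float:
    """Least-squares surrogate; equals (1 - y*f(x))**2 because y**2 == 1."""
    return (y - fx) ** 2


def hinge_loss(y: int, fx: float) -> float:
    """Hinge surrogate (as in support vector machines)."""
    return max(0.0, 1.0 - y * fx)


def logistic_loss(y: int, fx: float) -> float:
    """Logistic (negative log-likelihood) surrogate."""
    return math.log1p(math.exp(-y * fx))


# Compare the losses for a positive example (y = +1) across a range of scores:
# the 0-1 column jumps at f(x) = 0, while the surrogates change continuously.
for fx in (-1.0, -0.1, 0.0, 0.1, 1.0):
    print(f"f(x)={fx:+.1f}  0-1={zero_one_loss(1, fx):.0f}  "
          f"sq={squared_loss(1, fx):.2f}  hinge={hinge_loss(1, fx):.2f}  "
          f"logit={logistic_loss(1, fx):.3f}")
```

The surrogates are continuous (and convex) in the score, which is what makes minimizing them over a classifier space tractable.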

5
Review
  • General approach to providing a CI:
  • Estimate the misclassification rate using the data,
    resulting in an estimate
  • Derive an approximate distribution for the estimation error
  • Use this approximate distribution to construct a
    confidence interval for the misclassification rate
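As an illustration of these three steps in the simplest possible setting, the sketch below assumes a separate held-out sample of n points, so the estimate is a binomial proportion and its approximate distribution is normal. This is the textbook Wald interval, not the method of these slides; the function name `wald_ci` is hypothetical.

```python
import math


def wald_ci(errors: list[int], z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation CI for a misclassification rate.

    errors: 0/1 indicators of misclassification on n held-out observations that
    were NOT used to build the classifier (an assumption this sketch relies on).
    """
    n = len(errors)
    p_hat = sum(errors) / n                      # step 1: estimate the rate
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)    # step 2: binomial -> normal approximation
    return max(0.0, p_hat - z * se), min(1.0, p_hat + z * se)  # step 3: the CI


# 20 errors among 100 held-out points: estimate 0.20, standard error 0.04.
print(wald_ci([1] * 20 + [0] * 80))
```

When the same data are used both to build the classifier and to estimate its error, the held-out assumption fails, which is where the challenges on the next slide come in.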

6
Three challenges
  • The space of classifiers is too large, leading to over-fitting,
    so the training error underestimates the misclassification rate
    (negative bias)
  • The misclassification rate is a
    non-smooth function of the classifier.
  • may behave like an extreme quantity
  • No assumption that is close to optimal.
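A quick simulation shows the negative bias of the first challenge in its most extreme form. The sketch assumes a toy one-dimensional model (X uniform on [0, 1], Bayes rule +1 when x > 0.5, 20% label noise) and uses a 1-nearest-neighbour classifier as a stand-in for an over-large classifier space; none of these choices come from the slides.

```python
import random


def simulate(n, rng, noise=0.2):
    """Toy data: x ~ Uniform(0, 1), label +1 iff x > 0.5, flipped with prob. `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = 1 if x > 0.5 else -1
        if rng.random() < noise:
            y = -y
        data.append((x, y))
    return data


def one_nn_predict(train, x):
    """Label of the closest training point: a deliberately over-flexible classifier."""
    return min(train, key=lambda p: abs(p[0] - x))[1]


def error_rate(train, points):
    return sum(one_nn_predict(train, x) != y for x, y in points) / len(points)


rng = random.Random(0)
train = simulate(50, rng)
test = simulate(5000, rng)
train_err = error_rate(train, train)   # every point is its own nearest neighbour
test_err = error_rate(train, test)     # large fresh sample approximates the true rate
print(f"training error {train_err:.3f}, test error {test_err:.3f}")
```

The training error is exactly zero here, while the test error is well above the 20% noise level, so plugging the training error into a CI would be badly optimistic.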

7
Three challenges
  • The misclassification rate is
    non-smooth.
  • Example: the unknown Bayes classifier has a
    quadratic decision boundary. We fit, by least
    squares, a linear decision boundary:
  • $f(x) = \operatorname{sign}(\beta_0 + \beta_1 x)$
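The sketch below mimics this example under assumed settings (X uniform on [-2, 2], Bayes rule +1 exactly when x² < 1, no label noise, n = 30; all hypothetical). It fits β0 and β1 by least squares, then sweeps β0 around the fitted value and records the empirical misclassification rate, which is piecewise constant and changes only in jumps that are multiples of 1/n.

```python
import random


def ls_fit(xs, ys):
    """Least-squares regression of y on (1, x); returns (beta0, beta1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def misclassification(b0, b1, xs, ys):
    """Empirical rate of y != sign(b0 + b1 * x), with sign(0) taken as -1."""
    return sum((1 if b0 + b1 * x > 0 else -1) != y for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(1)
n = 30
xs = [rng.uniform(-2.0, 2.0) for _ in range(n)]
# Assumed Bayes rule with a quadratic boundary: +1 exactly when x**2 < 1 (no label noise).
ys = [1 if x * x < 1.0 else -1 for x in xs]

b0_hat, b1_hat = ls_fit(xs, ys)
print("least-squares fit:", round(b0_hat, 3), round(b1_hat, 3))

# Sweep the intercept around the fitted value: the rate is a step function of beta0.
grid = [b0_hat + (k - 200) / 200 for k in range(401)]
errs = [misclassification(b0, b1_hat, xs, ys) for b0 in grid]
jumps = [abs(a - b) for a, b in zip(errs, errs[1:]) if a != b]
print("distinct values:", len(set(errs)), "smallest jump:", min(jumps))
```

Because the rate jumps whenever the boundary crosses a data point, small changes in the fitted coefficients can produce discrete changes in the estimated error, which is the non-smoothness the slide refers to.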

8
Density of
9
Bootstrap CI for
10
Misclassification Rate is Non-smooth
  • Coverage of 95% CI

Sample size   Bootstrap percentile   Yang CV   CUD-bound
30            .85                    .29       .91
50            .88                    .24       .92
100           .83                    .20       .94
200           .85                    .22       .95
11
CIs for Extreme Quantities
  • may behave like an extreme quantity
  • Should this be problematic?
  • Highly skewed distribution of
  • Fast convergence of
    to zero

12
CIs from Learning Theory
  • Given a result of the form
  • where is known to belong to and

  • forms a $1-\delta$ CI as
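For intuition, the simplest result of this form (not necessarily the one the authors have in mind) combines Hoeffding's inequality with a union bound over a finite class of K classifiers: $P\left(\max_{f}|\hat\tau(f)-\tau(f)|>\epsilon\right)\le 2K e^{-2N\epsilon^2}$, where $\hat\tau(f)$ is the empirical and $\tau(f)$ the true misclassification rate. Solving for ε gives a half-width that holds for whichever classifier is picked from the class. The finite class, the 0-1 loss and the function names are assumptions of this sketch.

```python
import math


def uniform_epsilon(n: int, num_classifiers: int, delta: float) -> float:
    """Smallest eps with 2 * K * exp(-2 * n * eps**2) <= delta (Hoeffding + union bound)."""
    return math.sqrt(math.log(2.0 * num_classifiers / delta) / (2.0 * n))


def learning_theory_ci(train_error: float, n: int, num_classifiers: int,
                       delta: float = 0.05) -> tuple[float, float]:
    """1 - delta CI for the misclassification rate of ANY classifier picked from a
    finite class of size K, however it was chosen from the same n observations."""
    eps = uniform_epsilon(n, num_classifiers, delta)
    return max(0.0, train_error - eps), min(1.0, train_error + eps)


# e.g. N = 150 observations, K = 1000 candidate classifiers, 95% level
print(uniform_epsilon(150, 1000, 0.05))          # half-width of roughly 0.19
print(learning_theory_ci(0.20, 150, 1000))
```

The half-width depends only on N, K and δ, not on how well the chosen classifier fits, and it grows with log K; this is one way to see why such intervals tend to be conservative.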

13
Combine statistical ideas with learning theory
ideas
  • Construct a confidence interval for
  • where is chosen to be small yet contain
  • ---from this CI deduce a conservative CI for
  • ---use the surrogate loss to smooth the
    maximization and to construct

14
  • Construct a confidence interval for
  • --- should contain all that are close to
  • --- all f for which
  • --- is the limiting value of

15
Confidence Interval
  • Construct a confidence interval for
  • --- is a rate I would like
  • ---

16
Confidence Interval
17
Bootstrap
  • We use the bootstrap to estimate an upper
    percentile of the distribution of
  • to obtain b. The CI is then
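A minimal sketch of this bootstrap step, under simplifying assumptions: the set of candidate classifiers is a small finite list described by their 0/1 losses on the data, the bootstrapped statistic is the largest absolute deviation between bootstrap and sample misclassification rates over that list, and the CI is the estimate plus or minus the upper percentile b. It does not reproduce the authors' use of the surrogate loss or their choice of the set; the data layout and all names are hypothetical.

```python
import math
import random


def bootstrap_sup_ci(losses, chosen, num_boot=500, level=0.95, seed=0):
    """Symmetric bootstrap CI built from a sup over a finite set of classifiers.

    losses[k][i] is the 0/1 loss of candidate classifier k on observation i and
    `chosen` indexes the estimated classifier within that set (hypothetical layout).
    """
    rng = random.Random(seed)
    n = len(losses[0])
    tau_hat = [sum(row) / n for row in losses]
    sups = []
    for _ in range(num_boot):
        idx = [rng.randrange(n) for _ in range(n)]             # resample observations
        sups.append(max(abs(sum(row[i] for i in idx) / n - t)  # largest deviation
                        for row, t in zip(losses, tau_hat)))   # over the whole set
    sups.sort()
    b = sups[min(num_boot - 1, math.ceil(level * num_boot) - 1)]  # upper percentile
    est = tau_hat[chosen]
    return max(0.0, est - b), min(1.0, est + b), b


# Toy data: threshold rules sign(x - t) on noisy labels (all assumptions).
rng = random.Random(42)
data = []
for _ in range(100):
    x = rng.random()
    y = 1 if x > 0.5 else -1
    data.append((x, -y if rng.random() < 0.2 else y))
thresholds = [0.3, 0.4, 0.5, 0.6, 0.7]
losses = [[int((1 if x > t else -1) != y) for x, y in data] for t in thresholds]
chosen = min(range(len(thresholds)), key=lambda k: sum(losses[k]))  # empirical minimiser
lo, hi, b = bootstrap_sup_ci(losses, chosen)
print(f"estimate {sum(losses[chosen]) / 100:.2f}, b = {b:.3f}, CI = ({lo:.3f}, {hi:.3f})")
```

Taking the maximum over the whole set, rather than looking only at the chosen classifier, is what lets the interval account for the fact that the classifier was selected using the same data.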

18
Implementation
  • Approximation space for the classifier is linear
  • Surrogate loss is least squares
  • The misclassification-rate estimate is the .632 estimator
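Efron's .632 estimator mixes the optimistic training error with the leave-one-out bootstrap error, $\widehat{\mathrm{Err}}_{.632} = 0.368\,\overline{\mathrm{err}} + 0.632\,\widehat{\mathrm{Err}}^{(1)}$, where $\widehat{\mathrm{Err}}^{(1)}$ averages, over observations, the error of classifiers fit on bootstrap samples that did not contain that observation. A minimal sketch, assuming one feature, ±1 labels and a least-squares linear classifier as on this slide; the toy data and function names are hypothetical.

```python
import random


def ls_fit(points):
    """Least-squares fit of y on (1, x); None if all x values coincide."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    if sxx == 0:
        return None
    b1 = sum((x - mx) * (y - my) for x, y in points) / sxx
    return my - b1 * mx, b1


def predict(beta, x):
    return 1 if beta[0] + beta[1] * x > 0 else -1


def dot632(points, num_boot=200, seed=0):
    rng = random.Random(seed)
    n = len(points)
    beta = ls_fit(points)
    train_err = sum(predict(beta, x) != y for x, y in points) / n   # optimistic
    loss_sum, count = [0.0] * n, [0] * n
    for _ in range(num_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        beta_b = ls_fit([points[i] for i in idx])
        if beta_b is None:
            continue
        in_bag = set(idx)
        for i, (x, y) in enumerate(points):
            if i not in in_bag:                  # score only points left out of this resample
                loss_sum[i] += predict(beta_b, x) != y
                count[i] += 1
    per_point = [s / c for s, c in zip(loss_sum, count) if c > 0]
    loo_boot_err = sum(per_point) / len(per_point)   # leave-one-out bootstrap error
    return 0.368 * train_err + 0.632 * loo_boot_err, train_err, loo_boot_err


rng = random.Random(7)
pts = []
for _ in range(50):
    x = rng.random()
    y = 1 if x > 0.5 else -1
    pts.append((x, -y if rng.random() < 0.2 else y))
est, tr, loo = dot632(pts)
print(f".632 estimate {est:.3f} (training {tr:.3f}, leave-one-out bootstrap {loo:.3f})")
```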

19
Implementation
  • becomes

20
(No Transcript)
21
Computational Issues
  • Partition $\mathbb{R}^p$ into equivalence classes defined by
    the pattern of signs
  • Each equivalence class can be written as
    a set of β satisfying linear constraints.
  • The term in absolute values is constant on
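Assuming that the pattern in question is the vector of signs of $\beta_0 + \beta_1 x_i$ over the N observations (one feature plus an intercept, so p = 2 here), the sketch below samples many β, groups them by sign pattern, and checks that the empirical misclassification rate takes a single value within each group. Each group is the set of β satisfying the linear constraints $s_i(\beta_0 + \beta_1 x_i) > 0$. The toy data and names are hypothetical, and random sampling is only a way to visualise the partition, not the enumeration used by the authors.

```python
import random


def sign_pattern(beta, xs):
    """Signs of beta0 + beta1 * x_i across observations (the assumed pattern)."""
    return tuple(1 if beta[0] + beta[1] * x > 0 else -1 for x in xs)


def in_class(beta, xs, pattern):
    """Linear constraints defining one class: s_i * (beta0 + beta1 * x_i) > 0 for all i."""
    return all(s * (beta[0] + beta[1] * x) > 0 for s, x in zip(pattern, xs))


rng = random.Random(3)
n = 8
xs = [rng.random() for _ in range(n)]
ys = [rng.choice([-1, 1]) for _ in range(n)]

errors_by_class = {}
for _ in range(20000):
    beta = (rng.gauss(0, 1), rng.gauss(0, 1))
    pattern = sign_pattern(beta, xs)
    # Training error computed directly from beta, then grouped by sign pattern.
    err = sum((1 if beta[0] + beta[1] * x > 0 else -1) != y for x, y in zip(xs, ys)) / n
    errors_by_class.setdefault(pattern, set()).add(err)
    assert in_class(beta, xs, pattern)

print("classes found:", len(errors_by_class))
print("one error value per class:", all(len(v) == 1 for v in errors_by_class.values()))
```

With one feature and distinct x values there are at most 2N such classes, so a search over classes replaces a search over a continuum of β.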

22
Computational Issues
  • can be written as
  • since g is non-decreasing.

23
Computational Issues
  • We reduced the problem to the computation of a
    number of convex optimization problems. The
    number of convex optimizations is reduced by using
    g within a branch-and-bound algorithm.
  • With a sample size of N = 150 and 11 features,
    calculation of the percentiles of the CUD bound
    using 500 bootstrap samples can be accomplished
    in a few minutes on a standard desktop (2.4 GHz
    processor, 2 GB RAM).

24
Comparisons, coverage of 95% CI
Data       CUD    BS    M     Y
Spam       1.0    .99   .63   1.0
Ion        .96    .96   .80   .99
Heart      1.0    .99   .95   1.0
Diabetes   1.0    .91   .98   .99
Donut      .98    .90   .62   .88
Outlier    .99    .80   .93   .93
Sample size 50
25
Comparisons, length of CI
Data       CUD    BS    M     Y
Spam       .56    .38   .25   .33
Ion        .34    .36   .24   .32
Heart      .47    .47   .40   .44
Diabetes   .39    .31   .31   .36
Donut      .49    .53   .26   .33
Outlier    .50    .39   .29   .33
Sample size 50
26
Intuition
  • If then we are
    approximating the distribution of
  • where

27
Intuition
  • If and
  • then the distribution is approximately that of
    the absolute value of a mean-zero normal random
    variable (the limiting distribution for a binomial,
    as expected).
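This is the usual normal limit of a binomial proportion. A short simulation, under assumed values (true rate p = 0.3, N = 200, a fixed classifier so that its 0/1 losses are iid Bernoulli(p)), compares the 95th percentile of $\sqrt{N}\,|\hat p - p|$ with $1.96\sqrt{p(1-p)}$, the 95th percentile of the absolute value of a $N(0, p(1-p))$ variable.

```python
import math
import random

rng = random.Random(0)
p, n, reps = 0.3, 200, 5000   # assumed true rate, sample size, Monte Carlo replications

stats = []
for _ in range(reps):
    errors = sum(rng.random() < p for _ in range(n))       # Binomial(n, p) error count
    stats.append(math.sqrt(n) * abs(errors / n - p))       # sqrt(N) * |p_hat - p|
stats.sort()
simulated_q95 = stats[math.ceil(0.95 * reps) - 1]
normal_q95 = 1.96 * math.sqrt(p * (1 - p))   # 95th percentile of |N(0, p(1-p))|
print(f"simulated {simulated_q95:.3f} vs normal limit {normal_q95:.3f}")
```

The two numbers agree up to the discreteness of the binomial, which is the "as expected" case; the other Intuition slides describe when this simple picture breaks down.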

28
Intuition
  • If and
  • the distribution is approximately the
    distribution of

29
Intuition
  • Consider
  • if in place of we put where is
    close to
  • then due to the non-smoothness
    in
  • at
    we will get jittering.

30
Discussion
  • Further reduce the conservatism of the CUD-bound.
  • Eliminate symmetry of CUD-bound CI
  • ?
  • Trade off computational burden versus bias by use
    of a surrogate for the indicator in the
    misclassification rate
  • The real goal is to produce CIs for the Value of
    a policy.

31
The simplest dynamic treatment regime (i.e.,
policy) is a decision rule, if there is only one
stage of treatment. 1 stage for each individual:
  • Observation available at jth stage
  • Action at jth stage (usually a treatment)
  • Primary outcome
32
Goal: Construct decision rules that input
patient information and output a recommended
action; these decision rules should lead to a
maximal mean Y. In the future, one selects action
33
Single Stage (k = 1)
  • Find a confidence interval for the mean outcome
    if a particular estimated policy (here one
    decision rule) is employed.
  • Action A is randomized in {-1, 1}.
  • Suppose the decision rule is of the form
  • We do not assume the optimal decision boundary is
    linear.

34
Single Stage (k = 1)
  • Mean outcome following this policy is
  • is the randomization
    probability
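A standard way to write this mean outcome (the slide's exact expression is not in the transcript) is the inverse-probability-weighted form $E\left[\frac{1\{A = d(X)\}}{p(A\mid X)}\,Y\right]$, where d is the decision rule and p(A | X) the randomization probability. The sketch below estimates it by the sample average on simulated single-stage data; the outcome model, the randomization probability of 1/2, and the linear rule sign(β0 + β1 X) with β0 = -0.5, β1 = 1 are hypothetical choices for illustration.

```python
import random


def ipw_value(data, rule, prob_a=0.5):
    """Inverse-probability-weighted estimate of the mean outcome under `rule`.

    data: (x, a, y) triples with a in {-1, 1} randomized with P(A = a) = prob_a.
    """
    total = 0.0
    for x, a, y in data:
        if a == rule(x):
            total += y / prob_a      # keep only subjects whose action matches the rule
    return total / len(data)


def linear_rule(x, beta0=-0.5, beta1=1.0):
    """Hypothetical decision rule sign(beta0 + beta1 * x)."""
    return 1 if beta0 + beta1 * x >= 0 else -1


def always_treat(x):
    return 1


rng = random.Random(0)
data = []
for _ in range(20000):
    x = rng.random()
    a = rng.choice([-1, 1])                           # randomized action
    y = a * (x - 0.5) + rng.gauss(0.0, 0.1)           # assumed outcome model
    data.append((x, a, y))

print("value of linear rule:", round(ipw_value(data, linear_rule), 3))    # near E|X - 0.5| = 0.25
print("value of always a = 1:", round(ipw_value(data, always_treat), 3))  # near 0
```

Under this assumed model the linear rule happens to be optimal; in the setting of the slides the optimal boundary need not be linear, and a CI for the value of an estimated rule faces the same non-smoothness as the misclassification rate.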

35
(No Transcript)
36
Oslin ExTENd
[Figure: ExTENd study design. Random assignment to an Early or a Late Trigger for Nonresponse; responders (8 wks) are randomly assigned to Naltrexone or TDM + Naltrexone; nonresponders are randomly assigned to CBI or CBI + Naltrexone.]
37
  • This seminar can be found at
  • http://www.stat.lsa.umich.edu/samurphy/seminars/Stanford04.01.08.ppt
  • Email Eric or me with questions or if you would
    like a copy of the associated paper:
  • laber_at_umich.edu or samurphy_at_umich.edu