A Confidence Interval for the Misclassification Rate

1
A Confidence Interval for the Misclassification Rate

S.A. Murphy
E.B. Laber

2
Outline

Review
Three challenges in constructing CIs
Combining a statistical approach with a learning theory approach to constructing CIs
Relevance to confidence measures for the value of a policy

3
Review

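X is the vector of features; Y is the binary class label.
Misclassification rate of a classifier f: $P[Y \neq f(X)]$
Data: N i.i.d. observations of (Y, X)
Given a space of classifiers, $\mathcal{F}$, and the data, use some method to construct a classifier, $\hat{f} \in \mathcal{F}$.
The goal is to provide a CI for the misclassification rate of $\hat{f}$.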

4
Review

Since the 0-1 loss function is not smooth, one commonly uses a surrogate loss to estimate the classifier.
Surrogate loss: L(Y, f(X))

5
Review

General approach to providing a CI:
Estimate the misclassification rate using the data, resulting in an estimator.
Derive an approximate distribution for the estimator.
Use this approximate distribution to construct a confidence interval for the misclassification rate.

6
Three challenges

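The space of classifiers is too large, leading to over-fitting and a negatively biased estimate of the misclassification rate.
The misclassification rate is a non-smooth function of the classifier.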
may behave like an extreme quantity
No assumption that is close to optimal.

7
Three challenges

The misclassification rate is non-smooth.
Example: The unknown Bayes classifier has a quadratic decision boundary. We fit, by least squares, a linear decision boundary:
f(x) = sign(β0 + β1 x)
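
The example on this slide can be simulated in a few lines. The sketch below is only an illustration under assumed settings that are not in the slides: features uniform on [-1, 1], a Bayes rule sign(x² - 0.25), and 10% label noise. The function names are hypothetical. It fits the linear rule by least squares and compares the training error with a Monte Carlo estimate of the misclassification rate.

```python
import random

def simulate(n, rng, noise=0.1):
    """Draw n pairs (x, y) with y in {-1, +1}.
    Assumed setup: Bayes rule is sign(x**2 - 0.25), labels flipped w.p. `noise`."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = 1 if x * x > 0.25 else -1
        if rng.random() < noise:
            y = -y
        data.append((x, y))
    return data

def fit_least_squares(data):
    """Least-squares regression of y on (1, x); returns (b0, b1)."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data)
    sxy = sum((x - mx) * (y - my) for x, y in data)
    b1 = sxy / sxx if sxx > 0 else 0.0
    return my - b1 * mx, b1

def classify(beta, x):
    """Linear rule f(x) = sign(b0 + b1 * x), with 0 mapped to +1."""
    b0, b1 = beta
    return 1 if b0 + b1 * x >= 0 else -1

def error_rate(beta, data):
    """Fraction of points in `data` that the rule misclassifies."""
    return sum(classify(beta, x) != y for x, y in data) / len(data)

rng = random.Random(0)
train = simulate(30, rng)
beta = fit_least_squares(train)
test = simulate(20000, rng)  # large sample to approximate the true rate
print("beta:", beta)
print("training error:", error_rate(beta, train))
print("approximate misclassification rate:", error_rate(beta, test))
```

Re-running with different seeds shows how much the fitted boundary, and hence the misclassification rate, moves from sample to sample at N = 30.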

8
Density of
9
Bootstrap CI for
10
Misclassification Rate is Non-smooth

Coverage of 95% CI

Sample Size   Bootstrap Percentile   Yang CV   CUD-Bound
30            .85                    .29       .91
50            .88                    .24       .92
100           .83                    .20       .94
200           .85                    .22       .95
11
CIs for Extreme Quantities

may behave like an extreme quantity
Should this be problematic?
Highly skewed distribution of
Fast convergence of
to zero

12
CIs from Learning Theory

Given a result of the form
where is known to belong to and
forms a 1 − δ CI as
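
The slide's own bound was not transcribed, so the following sketch substitutes a textbook one: for a finite class of K classifiers, Hoeffding's inequality plus a union bound gives, with probability at least 1 − δ, a uniform deviation between empirical and true error of at most sqrt(log(2K/δ) / (2N)). The finite-class assumption and the function names are illustrative, not taken from the talk.

```python
import math

def uniform_deviation_radius(n, num_classifiers, delta):
    """Hoeffding + union bound: with probability >= 1 - delta, every one of the
    `num_classifiers` classifiers has |empirical error - true error| <= radius."""
    return math.sqrt(math.log(2.0 * num_classifiers / delta) / (2.0 * n))

def learning_theory_ci(train_error, n, num_classifiers, delta=0.05):
    """Conservative 1 - delta interval for the true error of the selected
    classifier, valid because the bound holds uniformly over the class."""
    eps = uniform_deviation_radius(n, num_classifiers, delta)
    return max(0.0, train_error - eps), min(1.0, train_error + eps)

# Hypothetical numbers: N = 100, 1000 candidate classifiers, training error 0.10.
print(learning_theory_ci(0.10, n=100, num_classifiers=1000))
```

Intervals of this kind hold for any selection rule within the class, which is also why they tend to be wide at small N.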

13
Combine statistical ideas with learning theory ideas

Construct a confidence interval for
where is chosen to be small yet contain
---from this CI deduce a conservative CI for
---use the surrogate loss to smooth the
maximization and to construct

Construct a confidence interval for
--- should contain all that are close to
--- all f for which
--- is the limiting value of

15
Confidence Interval

Construct a confidence interval for
--- is a rate I would like
---

16
Confidence Interval
17
Bootstrap

We use bootstrap to obtain an estimate of an
upper percentile of the distribution of
to obtain b. The CI is then
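
The mechanics of "bootstrap an upper percentile b, then form estimate ± b" can be sketched on a much simpler quantity than the one in the talk. In the sketch below the bootstrapped quantity is the absolute deviation of a resampled error rate from the observed error rate of one fixed classifier. This is an assumed stand-in, not the CUD bound, and the names are hypothetical.

```python
import math
import random

def bootstrap_upper_percentile(losses, level=0.95, n_boot=1000, seed=0):
    """Bootstrap estimate of the `level` quantile of
    |resampled mean loss - observed mean loss|."""
    rng = random.Random(seed)
    n = len(losses)
    observed = sum(losses) / n
    deviations = []
    for _ in range(n_boot):
        resample = [losses[rng.randrange(n)] for _ in range(n)]
        deviations.append(abs(sum(resample) / n - observed))
    deviations.sort()
    index = min(n_boot - 1, math.ceil(level * n_boot) - 1)
    return deviations[index]

def symmetric_ci(losses, level=0.95, n_boot=1000, seed=0):
    """Symmetric interval: estimate +/- b, clipped to [0, 1]."""
    estimate = sum(losses) / len(losses)
    b = bootstrap_upper_percentile(losses, level, n_boot, seed)
    return max(0.0, estimate - b), min(1.0, estimate + b)

# Hypothetical 0-1 losses of one fixed classifier on 50 points (error rate 0.2).
losses = [1] * 10 + [0] * 40
low, high = symmetric_ci(losses)
print(low, high)
```

Because the interval is built from the percentile of an absolute deviation, it is symmetric about the estimate by construction.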

18
Implementation

Approximation space for the classifier is linear
Surrogate loss is least squares
The estimator of the misclassification rate is the .632 estimator
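
For readers unfamiliar with the .632 estimator, here is a minimal sketch for a one-feature linear classifier fit by least squares. It combines the training error and the leave-one-out bootstrap error with weights 0.368 and 0.632. The simulated data, the number of bootstrap samples, and all function names are assumptions made for illustration.

```python
import random

def fit_ls(data):
    """Least-squares regression of y on (1, x); returns (b0, b1)."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data)
    sxy = sum((x - mx) * (y - my) for x, y in data)
    b1 = sxy / sxx if sxx > 0 else 0.0
    return my - b1 * mx, b1

def predict(beta, x):
    """Linear rule sign(b0 + b1 * x), with 0 mapped to +1."""
    return 1 if beta[0] + beta[1] * x >= 0 else -1

def combine_632(train_err, oob_err):
    """Weighted combination that defines the .632 estimator."""
    return 0.368 * train_err + 0.632 * oob_err

def estimator_632(data, n_boot=200, seed=0):
    rng = random.Random(seed)
    n = len(data)
    beta = fit_ls(data)
    train_err = sum(predict(beta, x) != y for x, y in data) / n
    misses, counts = [0] * n, [0] * n
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        in_sample = set(idx)
        boot_beta = fit_ls([data[i] for i in idx])
        for i in range(n):
            if i not in in_sample:  # evaluate only on points left out
                counts[i] += 1
                misses[i] += predict(boot_beta, data[i][0]) != data[i][1]
    per_point = [m / c for m, c in zip(misses, counts) if c > 0]
    oob_err = sum(per_point) / len(per_point)
    return combine_632(train_err, oob_err)

# Hypothetical data: y = sign(x) with 20% of labels flipped.
rng = random.Random(1)
data = []
for _ in range(40):
    x = rng.uniform(-1.0, 1.0)
    y = 1 if x >= 0 else -1
    data.append((x, -y if rng.random() < 0.2 else y))
est = estimator_632(data)
print(".632 estimate of the misclassification rate:", est)
```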

19
Implementation

becomes

20
(No Transcript)
21
Computational Issues

Partition $\mathbb{R}^p$ into equivalence classes defined by the pattern of signs
Each equivalence class can be written as a set of β satisfying linear constraints.
The term in absolute values is constant on each equivalence class.
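
A small sketch of the sign-pattern idea, under the assumption that the pattern in question is the vector of signs of the linear scores β·x_i over the training points. Two coefficient vectors with the same pattern classify every training point identically, so any quantity built from 0-1 losses is constant within a class. The data and names are hypothetical.

```python
def sign_pattern(beta, xs):
    """Signs of the linear scores beta . x_i, with 0 mapped to +1."""
    return tuple(
        1 if sum(b * v for b, v in zip(beta, x)) >= 0 else -1
        for x in xs
    )

def training_errors(beta, xs, ys):
    """Number of training points misclassified by sign(beta . x)."""
    return sum(s != y for s, y in zip(sign_pattern(beta, xs), ys))

# Three training points; the first coordinate is the intercept term.
xs = [(1.0, -1.0), (1.0, 0.5), (1.0, 2.0)]
ys = [-1, 1, -1]

beta_a = (0.0, 1.0)
beta_b = (-0.2, 1.0)   # same sign pattern as beta_a
beta_c = (-1.0, 1.0)   # different sign pattern

print(sign_pattern(beta_a, xs), training_errors(beta_a, xs, ys))
print(sign_pattern(beta_b, xs), training_errors(beta_b, xs, ys))
print(sign_pattern(beta_c, xs), training_errors(beta_c, xs, ys))

# A coarse grid of coefficient vectors collapses to a few equivalence classes.
grid = [(b0 / 2.0, b1 / 2.0) for b0 in range(-4, 5) for b1 in range(-4, 5)]
classes = {sign_pattern(beta, xs) for beta in grid}
print(len(grid), "grid points fall into", len(classes), "sign patterns")
```

Each class is cut out by constraints of the form s_i (β·x_i) ≥ 0, which are linear in β, so optimizing a convex surrogate over one class is a convex problem.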

22
Computational Issues

can be written as
since g is non-decreasing.

23
Computational Issues

The problem is reduced to the computation of a number of convex optimization problems. The number of convex optimizations is reduced via use of g with a branch-and-bound algorithm.
With a sample size of N = 150 and 11 features, calculation of the percentiles of the CUD bound using 500 bootstrap samples can be accomplished in a few minutes on a standard desktop (2.4 GHz processor, 2 GB RAM).

24
Comparisons, 95% CI
Data       CUD    BS     M      Y
Spam       1.0    .99    .63    1.0
Ion        .96    .96    .80    .99
Heart      1.0    .99    .95    1.0
Diabetes   1.0    .91    .98    .99
Donut      .98    .90    .62    .88
Outlier    .99    .80    .93    .93
Sample size = 50
25
Comparisons, length of CI
Data       CUD    BS     M      Y
Spam       .56    .38    .25    .33
Ion        .34    .36    .24    .32
Heart      .47    .47    .40    .44
Diabetes   .39    .31    .31    .36
Donut      .49    .53    .26    .33
Outlier    .50    .39    .29    .33
Sample size = 50
26
Intuition

If then we are
approximating the distribution of
where

27
Intuition

If and
then the distribution is approximately that of the absolute value of a normal random variable (the limiting distribution for a binomial, as expected).

28
Intuition

If and
the distribution is approximately the
distribution of

29
Intuition

Consider
if in place of we put where is
close to
then due to the non-smoothness
in
at
we will get jittering.

30
Discussion

Further reduce the conservatism of the CUD-bound.
Eliminate symmetry of CUD-bound CI?
Trade off computational burden versus bias by use of a surrogate for the indicator in the misclassification rate.
The real goal is to produce CIs for the value of a policy.

31
The simplest dynamic treatment regime (e.g., policy) is a decision rule if there is only one stage of treatment.
1 stage for each individual
Observation available at jth stage
Action at jth stage (usually a treatment)
Primary outcome
32
Goal: Construct decision rules that input patient information and output a recommended action; these decision rules should lead to a maximal mean Y. In future one selects action
33
Single Stage (k = 1)

Find a confidence interval for the mean outcome if a particular estimated policy (here one decision rule) is employed.
Action A is randomized in {-1, 1}.
Suppose the decision rule is of form
We do not assume the optimal decision boundary is linear.

34
Single Stage (k = 1)

Mean outcome following this policy is
is the randomization
probability
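
The formula on this slide was not transcribed. A standard way to write the mean outcome under a decision rule d when actions are randomized is the inverse-probability-weighted form E[ Y · 1{A = d(X)} / p(A | X) ], and the sketch below computes its sample analogue. Treat the formula, the data, and the function names as assumptions rather than the slide's exact content.

```python
def ipw_value(records, rule, randomization_prob):
    """Sample analogue of E[ Y * 1{A = d(X)} / p(A | X) ].

    records: iterable of (x, a, y) with action a in {-1, +1}
    rule: function x -> recommended action in {-1, +1}
    randomization_prob: function (a, x) -> probability that a was assigned
    """
    records = list(records)
    total = 0.0
    for x, a, y in records:
        if a == rule(x):  # only individuals whose action matches the rule
            total += y / randomization_prob(a, x)
    return total / len(records)

# Hypothetical trial data with equal randomization to the two actions.
records = [
    (1.0, 1, 2.0),
    (-1.0, 1, 5.0),
    (2.0, -1, 1.0),
    (-0.5, -1, 4.0),
]
rule = lambda x: 1 if x >= 0 else -1
prob = lambda a, x: 0.5

print(ipw_value(records, rule, prob))
```

The indicator 1{A = d(X)} is a non-smooth function of the rule's parameters, which is the same structural difficulty as in the misclassification rate.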

35
(No Transcript)
36
Oslin ExTENd
[Trial design diagram. Initial random assignment: early vs. late trigger for nonresponse. Under each trigger, responders (8 wks) are randomly assigned to Naltrexone or TDM + Naltrexone; nonresponders are randomly assigned to CBI or CBI + Naltrexone.]
37

This seminar can be found at http://www.stat.lsa.umich.edu/samurphy/seminars/Stanford04.01.08.ppt
Email Eric or me with questions or if you would like a copy of the associated paper: laber_at_umich.edu or samurphy_at_umich.edu
