1
Kernel Smoothing with Interval Censored Data
  • by
  • Thierry Duchesne,
  • James E. Stafford and
  • Paul Corey
  • Department of Statistics and Department of Public
    Health Sciences
  • University of Toronto
  • Statistics 2001 Canada
  • Montréal, July 2001

2
1. Introduction
Available data: a random sample of intervals $I_i = (L_i, R_i]$, $i = 1, 2, \ldots, n$, where $L_i = R_i = X_i$ if $X_i$ is observed exactly and $X_i \in I_i$ with $L_i < R_i$ if $X_i$ is interval censored.

Goal: to estimate the density f of X by kernel smoothing.

Complete data kernel density estimate:

$$\hat f(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right). \qquad (1)$$

With interval censored data: replace the empirical cdf in (1) by a nonparametric estimate of the distribution function (such as Turnbull's (JRSS B, 1976) estimate), say $\hat F$, via

$$\hat f(x) = \int \frac{1}{h} K\!\left(\frac{x - u}{h}\right) d\hat F(u). \qquad (2)$$

Problem: $\hat F$ has gaps (see Figure 1) over innermost intervals, so some points at which $\hat F$ jumps must be imputed, creating bias if most data are interval censored.
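As a concrete illustration, here is a minimal Python sketch of (1) and (2), assuming a Gaussian kernel (the slides do not fix one) and a step-function $\hat F$ supplied as jump points and masses; all names are illustrative:

```python
import numpy as np

def K(u):
    # Gaussian kernel: an assumed choice, the slides do not fix K
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def kde_complete(x, data, h):
    # Complete-data kernel density estimate (1), vectorized over the grid x
    return K((x - data[:, None]) / h).sum(axis=0) / (len(data) * h)

def kde_from_cdf(x, support, mass, h):
    # Estimate (2) when F-hat is a step function putting probability
    # mass[k] at support[k] (e.g. a version of Turnbull's estimate)
    return (mass[:, None] * K((x - support[:, None]) / h)).sum(axis=0) / h

x = np.linspace(0.0, 10.0, 201)
print(kde_complete(x, np.array([2.1, 3.3, 3.9, 5.2, 6.8]), h=0.8)[:5])
```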
3
1. Introduction
Figure 1: Plot of Turnbull's nonparametric estimate of the survivor function with interval censored data. The estimate is not uniquely defined over points of support called innermost intervals. In order to obtain a kernel density estimate via (2), assumptions have to be made as to how the jumps occur over innermost intervals.
4
2. Proposed estimator
The complete data kernel density estimate can be viewed as the expectation of $K\{(x - X)/h\}/h$ with respect to the empirical cdf $\hat F_n$:

$$\hat f(x) = \mathrm{E}_{\hat F_n}\!\left[\frac{1}{h} K\!\left(\frac{x - X}{h}\right)\right]. \qquad (3)$$

For interval censored data, we propose to replace the expectation in (3) with a conditional expectation, given the observed data, then perform the following iterative scheme until convergence. Let $\hat f_0$ be any initial density estimate. We define our density estimate at the jth step recursively via

$$\hat f_j(x) = \frac{1}{n} \sum_{i=1}^{n} \mathrm{E}_{j-1}\!\left[\frac{1}{h} K\!\left(\frac{x - X_i}{h}\right) \Big|\, X_i \in I_i\right], \qquad (4)$$

where $\mathrm{E}_{j-1}$ denotes expectation computed under $\hat f_{j-1}$. The iterations stop when a self-consistent estimator is found, i.e. when we have reached a fixed point of the iterative scheme and application of (4) has no more effect.
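A minimal grid-based sketch of iteration (4), assuming a Gaussian kernel and rectangle-rule integration in place of the Monte Carlo evaluation discussed in Section 5; exact observations are treated as degenerate intervals, and all names are illustrative:

```python
import numpy as np

def K(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def iterate_density(grid, f0, intervals, h, n_steps=4):
    # Iteration (4): average over observations of the conditional
    # expectation of the kernel, given X_i in its observed interval,
    # computed under the previous estimate (uniform grid assumed,
    # and f must stay positive on each interval; true here).
    f = f0.copy()
    for _ in range(n_steps):
        f_new = np.zeros_like(grid)
        for (lo, hi) in intervals:
            inside = (grid > lo) & (grid <= hi)
            if inside.sum() < 2:                 # exact observation
                f_new += K((grid - 0.5 * (lo + hi)) / h) / h
                continue
            u, fu = grid[inside], f[inside]
            num = (K((grid[:, None] - u) / h) / h * fu).sum(axis=1)
            f_new += num / fu.sum()              # E_{j-1}[K_h(x - X_i) | X_i in I_i]
        f = f_new / len(intervals)
    return f

grid = np.linspace(0.0, 20.0, 401)
f0 = np.full_like(grid, 1.0 / 20.0)              # uniform start, as in Figure 3
f4 = iterate_density(grid, f0, [(1.0, 4.0), (4.5, 4.5), (8.0, 12.0)], h=1.0)
```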
5
2. Proposed estimator
Figure 2: Top: interval censored data from DeGruttola and Lagakos (1989). Bottom: dotted line, kernel density estimate based on Turnbull's estimate (2); solid line, kernel density estimate based on the proposed estimator (4) after 4 iterations. Notice how (4) smooths the hump around t = 5 better than (2) without eroding the peak around t = 10.
Figure 3: Kernel density estimate (4) for the data shown in Figure 2 after 1, 2, 3 and 4 iterations. The uniform distribution is used as a starting point.
6
3. Interpretation
Interpretation 1: The estimator computes the kernel weight at x for an observation (interval) by computing the average height of the kernel over the interval (a numerical check follows the figure caption below).
Figure 4: Top: the estimator gives more weight to the narrow interval when it is close to the point of estimation x = 0. This makes sense, as there is more precision in the data. Bottom: the estimator gives more weight to the wide interval when it is far from the point of estimation x = -2. This also makes sense, as the wider interval is more local to -2 than the shorter interval.
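A quick numerical check of Figure 4's message, using the fact that, when the current density estimate is flat over an interval, the conditional expectation in (4) is just the average height of the kernel over that interval (the Gaussian kernel and the specific intervals are assumptions for illustration):

```python
import numpy as np

def K(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def avg_kernel_weight(x, lo, hi, h, m=400):
    # Average height of the kernel K((x - u)/h)/h over u in (lo, hi]
    u = np.linspace(lo, hi, m)
    return np.mean(K((x - u) / h) / h)

h = 1.0
# Near x = 0 the narrow interval receives the larger weight ...
print(avg_kernel_weight(0.0, -0.5, 0.5, h), avg_kernel_weight(0.0, -3.0, 3.0, h))
# ... while at x = -2 the wide interval (which covers -2) wins.
print(avg_kernel_weight(-2.0, -0.5, 0.5, h), avg_kernel_weight(-2.0, -3.0, 3.0, h))
```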
7
3. Interpretation
Interpretation 2: The estimator gives the estimate that is closest to the complete data kernel density estimate $\tilde f$ in terms of mean integrated squared distance, i.e. it minimizes

$$\mathrm{E}_{j-1}\!\left[\int \bigl\{f(x) - \tilde f(x)\bigr\}^2 \, dx \,\Big|\, \text{observed data}\right]. \qquad (5)$$

The estimator may thus be regarded as resulting from a generalized EM algorithm with

E-Step: for all i, define $K_i^{(j)}(x) = \mathrm{E}_{j-1}\bigl[h^{-1} K\{(x - X_i)/h\} \mid X_i \in I_i\bigr]$.
M-Step: compute $\hat f_j(x) = n^{-1} \sum_{i=1}^n K_i^{(j)}(x)$.

The criterion (5) is reduced at every iteration (see Section 7). We discuss numerical issues concerning the E-Step in Section 5.
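One way to see why the M-Step minimizes the reconstructed criterion (5): (5) is pointwise quadratic in f(x), so the minimizer at each x is the conditional mean of the complete data estimate $\tilde f(x)$,

```latex
\hat f_j(x)
  = \operatorname*{arg\,min}_{f(x)}
    \mathrm{E}_{j-1}\!\left[\bigl\{f(x)-\tilde f(x)\bigr\}^{2} \,\Big|\, \text{data}\right]
  = \mathrm{E}_{j-1}\!\left[\tilde f(x) \,\Big|\, \text{data}\right]
  = \frac{1}{n}\sum_{i=1}^{n}
    \mathrm{E}_{j-1}\!\left[\frac{1}{h}K\!\left(\frac{x-X_i}{h}\right) \,\Big|\, X_i\in I_i\right],
```

which is exactly one application of (4); hence each E/M sweep cannot increase (5).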
8
4. Properties
Efron's (1967) self-consistent estimator for right-censored data is found recursively via

$$\hat F_j(x) = \frac{1}{n}\left[ N(x) + \sum_{i:\, \delta_i = 0,\, L_i \le x} \frac{\hat F_{j-1}(x) - \hat F_{j-1}(L_i)}{1 - \hat F_{j-1}(L_i)} \right], \qquad (6)$$

where N(x) is the number of uncensored observations $\le$ x, $L_i$ is the lower bound for failure time i, and $\delta_i$ is 0 if observation i is right censored, 1 otherwise. Efron showed that the fixed point of his algorithm was the Kaplan-Meier estimator.

In the case of interval censored data, Turnbull's estimator is a special case of the estimator of Li et al. (Scand. J. Stat., 1997):

$$\hat F_j(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{\hat F_{j-1}(x \wedge R_i) - \hat F_{j-1}(x \wedge L_i)}{\hat F_{j-1}(R_i) - \hat F_{j-1}(L_i)}. \qquad (7)$$

Li et al. show that their algorithm converges to Turnbull's estimator where that estimator is uniquely defined, and to a value that depends on the initial value of the algorithm over innermost intervals. Under right-censoring, Li et al.'s estimator is the Kaplan-Meier estimator.
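A short sketch of one sweep of (7), with $\hat F$ held as values on a fine grid and evaluated by linear interpolation (the grid representation and names are assumptions; each observed interval must carry positive mass under the current estimate):

```python
import numpy as np

def li_sweep(grid, F_prev, intervals):
    # One iteration of (7): average of P(X_i <= x | L_i < X_i <= R_i),
    # the probability being computed under the current cdf estimate.
    F = lambda t: np.interp(t, grid, F_prev)
    F_new = np.zeros_like(grid)
    for (lo, hi) in intervals:
        p = F(hi) - F(lo)                        # current mass on (lo, hi]
        F_new += (F(np.minimum(grid, hi)) - F(np.minimum(grid, lo))) / p
    return F_new / len(intervals)
```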
9
4. Properties
We can show that, in terms of the distribution function, the proposed estimator (4) reduces to Li et al.'s and Efron's estimators when h shrinks to zero at every step.

Theorem 1: If we use the same initial density estimate for the iterations, our estimator (4) converges to that of Li et al. as h goes to zero at every step, i.e.

$$\lim_{h \to 0} \hat F_j^{(h)}(x) = \hat F_j^{\mathrm{Li}}(x),$$

where $\hat F_j^{(h)}$ denotes the cdf of our estimate at step j. (The proof can be found in Duchesne and Stafford, 2001.)

Theorem 2: Under suitable regularity conditions,

$$\hat F_j^{(h)}(x) = \hat F_j^{\mathrm{Li}}(x) + O(h^2).$$

(The proof can be found in Duchesne and Stafford, 2001.)

Li et al. have shown that their algorithm converges to a fixed point. From Theorem 2, we might wonder if the $O(h^2)$ term is large enough to affect the convergence of our algorithm to a fixed point. In our experience, the algorithm seems to converge quite quickly to a fixed point, even for relatively large values of h. We looked at some convergence criteria in a simulation study and present the results in Section 7.
10
5. Numerical implementation
The numerical implementation of the method contains one interesting challenge: evaluating the conditional expectations of the kernel at every step,

$$\mathrm{E}_{j-1}\!\left[\frac{1}{h} K\!\left(\frac{x - X}{h}\right) \Big|\, X \in I\right] = \frac{\int_I \frac{1}{h} K\!\left(\frac{x - u}{h}\right) \hat f_{j-1}(u)\, du}{\int_I \hat f_{j-1}(u)\, du}, \qquad (8)$$

for some interval I. Except in a very few special cases, it is impossible to evaluate (8) analytically. Numerical integration can be used, but we prefer Monte Carlo methods. We propose two algorithms.
Algorithm 1 (acceptance/rejection): For k = 1, ..., B:
1. Simulate Y from $\hat f_{j-1}$:
   a. sample an interval I' with replacement from the observed intervals;
   b. sample X from the previous step's conditional density over I';
   c. sample Y from $h^{-1} K\{(\cdot - X)/h\}$.
2. If Y is in I, set $Y_k$ = Y; else go back to 1.
The estimate of (8) is then $B^{-1} \sum_{k=1}^{B} h^{-1} K\{(x - Y_k)/h\}$.
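A runnable sketch of Algorithm 1 under simplifying assumptions: the current density estimate lives on a grid (and is positive there), and step 1 draws from it by inverse-cdf sampling instead of the mixture sub-steps a to c; the Gaussian kernel and all names are illustrative:

```python
import numpy as np

def K(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def cond_expect_ar(x, lo, hi, grid, f_prev, h, B=100, rng=None):
    # Monte Carlo estimate of (8): draw Y from f_prev, keep it only if
    # it falls in (lo, hi]. Exact but slow for short intervals, which
    # is precisely the "con" listed on the next slide.
    rng = rng or np.random.default_rng(0)
    cdf = np.cumsum(f_prev)
    cdf /= cdf[-1]                               # grid-based cdf of f_prev
    draws = []
    while len(draws) < B:
        y = np.interp(rng.uniform(), cdf, grid)  # inverse-cdf draw from f_prev
        if lo < y <= hi:                         # step 2: accept or reject
            draws.append(y)
    return np.mean(K((x - np.asarray(draws)) / h) / h)
```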

11
5. Numerical implementation
We can base a method on the importance sampling identity

$$\mathrm{E}_{j-1}\!\left[\frac{1}{h} K\!\left(\frac{x - X}{h}\right) \Big|\, X \in I\right] = \mathrm{E}_g\!\left[\frac{1}{h} K\!\left(\frac{x - X}{h}\right) w(X)\right],$$

where g is some distribution over the interval I that is easy to sample from and $w(u) = \hat f_{j-1}(u \mid X \in I)/g(u)$. The algorithm can be simplified to

Algorithm 2 (importance sampling):
1. Generate a random sample $X_1, \ldots, X_B$ of B values of X from the distribution g.
2. Compute

$$\hat{\mathrm{E}} = \frac{\sum_{k=1}^{B} w_k \, \frac{1}{h} K\!\left(\frac{x - X_k}{h}\right)}{\sum_{k=1}^{B} w_k}, \quad \text{where } w_k = \frac{\hat f_{j-1}(X_k)}{g(X_k)}.$$

Notice how the conditional densities in the weights are replaced by unconditional densities: the normalizing constant $\int_I \hat f_{j-1}(u)\, du$ cancels in the self-normalized ratio. To obtain the values of $\hat f_{j-1}$, we record the values of the estimate over a very fine grid of values of x at each iteration and we use linear interpolation.
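A sketch of Algorithm 2 with g taken to be uniform on I (an assumed, convenient choice: g(X_k) is then constant and cancels in the self-normalized ratio, so the weights reduce to unconditional density values read off the grid):

```python
import numpy as np

def K(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def cond_expect_is(x, lo, hi, grid, f_prev, h, B=100, rng=None):
    # Self-normalized importance sampling estimate of (8)
    rng = rng or np.random.default_rng(0)
    xs = rng.uniform(lo, hi, size=B)             # step 1: sample from g (uniform on I)
    w = np.interp(xs, grid, f_prev)              # step 2: unconditional density values,
                                                 # obtained by linear interpolation
    return np.sum(w * K((x - xs) / h) / h) / np.sum(w)
```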
12
5. Numerical implementation
Pros and cons of the two algorithms:

Algorithm 1 (acceptance/rejection)
  • Pro: sampling is exact, hence we can make the Monte Carlo error as small as we want.
  • Con: precision in the data (short intervals) is penalized by making the algorithm very slow (too many rejections in step 2).

Algorithm 2 (importance sampling)
  • Pro: very fast.
  • Cons: linear interpolation involved; an appropriate choice of g is needed.

Figure 5: Simulation study of importance sampling Algorithm 2 applied to the dataset of Figure 2. Solid lines: average estimate. Dashed lines: 99% percentile intervals. Top figure: algorithm applied with B = 10. Bottom figure: algorithm applied with B = 100.
13
6. Choice of h
We propose a modification of the likelihood cross-validation method for complete data (Silverman, 1986). Idea: successively remove points of support of the empirical c.d.f. and maximize the following with respect to h:

$$\sum_{i=1}^{n} \log \hat f^{(-i)}(X_i), \qquad (9)$$

where the superscript $(-i)$ indicates that $X_i$ is left out when the estimate is computed. For interval censored data, the support of the empirical cdf is constituted of the innermost intervals $I_1^*, \ldots, I_m^*$. We thus replace (9) with

$$-\sum_{r=1}^{m} \log \int_{I_r^*} \hat f^{(-r)}(u)\, du, \qquad (10)$$

where $\hat f^{(-r)}$ is obtained by dropping $I_r^*$, the rth innermost interval, and we minimize (10) with respect to h. Eliminating $I_r^*$ is done by removing the observed intervals in the sample that contribute to its presence but not to the presence of other innermost intervals. It would be interesting to find methods that do not require knowledge of the innermost intervals, as the estimator itself does not require that knowledge.
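For reference, a sketch of the complete data criterion (9) that (10) modifies: leave one point out, evaluate the log of the leave-one-out estimate there, and pick h by grid search (the toy data and the search grid are illustrative):

```python
import numpy as np

def K(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def lcv_choose_h(data, hs):
    # Likelihood cross-validation (9): maximize sum_i log f^(-i)(X_i) over h
    scores = []
    for h in hs:
        s = 0.0
        for i, x in enumerate(data):
            rest = np.delete(data, i)            # leave observation i out
            s += np.log(K((x - rest) / h).sum() / (len(rest) * h))
        scores.append(s)
    return hs[int(np.argmax(scores))]

data = np.array([1.2, 1.9, 2.3, 3.1, 4.0, 4.4, 5.8, 6.5])
print(lcv_choose_h(data, np.linspace(0.2, 2.0, 10)))
```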
14
7. Simulation study
Figure 6: Illustration of the contraction property of the estimator, based on the dataset from Figure 2. Clockwise from the top left corner: (a) density estimates obtained with 5 different starting distributions after 1 iteration; (b) estimates after 2 iterations; (c) estimates after 3 iterations; (d) estimates after 4 iterations. The algorithm seems to converge quickly to the same fixed point for all 5 initial distributions.
15
7. Simulation study
We have done a small simulation study to investigate the behavior of a few convergence criteria. Let $x_1, \ldots, x_m$ represent a fine grid of points. Consider the following distance between two functions u and v:

$$d(u, v) = \sum_{k=1}^{m} \bigl\{u(x_k) - v(x_k)\bigr\}^2. \qquad (11)$$

First, we investigate whether the mapping from one iteration to the next is a contraction mapping, i.e. whether

$$d\bigl(\hat f_j^{(1)}, \hat f_j^{(2)}\bigr) < d\bigl(\hat f_{j-1}^{(1)}, \hat f_{j-1}^{(2)}\bigr), \qquad (12)$$

given two arbitrary but different starting points $\hat f_0^{(1)}$ and $\hat f_0^{(2)}$. This would guarantee convergence of the algorithm to a unique fixed point. We also look at averages over B samples of criterion (5), denoted MSE1 and MSE2. Finally, we consider the Kullback-Leibler distance.
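A direct transcription of (11) and (12) for estimates stored as values on the grid; the Kullback-Leibler discretization shown is an assumption, since its exact form on the slide is not recoverable:

```python
import numpy as np

def sq_distance(u, v):
    # Distance (11): sum of squared differences over the grid points
    return np.sum((u - v) ** 2)

def contraction_holds(f1_prev, f2_prev, f1_next, f2_next):
    # Criterion (12): did one iteration bring the two runs,
    # started from different initial densities, closer together?
    return sq_distance(f1_next, f2_next) < sq_distance(f1_prev, f2_prev)

def kl_distance(u, v, du):
    # Kullback-Leibler distance on the grid (assumes u, v > 0 there)
    return np.sum(u * np.log(u / v)) * du
```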
16
7. Simulation study
We now present a summary of the simulation results. For more detail on how the data were generated, see Duchesne and Stafford (2001).

Table 1 (legend):
  • j is the iteration number; the reference estimate is the traditional estimate (2), i.e. (1) based on Turnbull's empirical c.d.f.
  • Squared distance: simulation average of the distance (11) between estimates from 2 starting points.
  • Contraction: proportion of samples for which criterion (12) is true (50% at convergence, because of the Monte Carlo error in the evaluation of the conditional expectations).
  • MSE1, MSE2 and Kullback-Leibler: as defined on the previous page.
17
8. Scatterplot smoothing
The proposed method can be generalized to any statistical procedure where kernel weights must be computed when data are missing. Consider the problem of scatterplot smoothing when we have a single covariate measured with error. Mathematically, we want a nonparametric estimate of the function g in

$$Y_i = g(X_i) + \epsilon_i.$$

With complete data, this could be done with a running mean smoother of the form

$$\hat g(x) = \sum_{i:\, X_i \in N(x)} w_i(x)\, Y_i, \qquad (13)$$

where N(x) is the nearest neighborhood for x. Typically, the weights are given by $w_i(x) = K\{(x - X_i)/h\} / \sum_{l:\, X_l \in N(x)} K\{(x - X_l)/h\}$. When the covariate is only known up to an interval, we can use the method described in this paper to compute the weights. We simply replace $K\{(x - X_i)/h\}$ in (13) by $\mathrm{E}\bigl[K\{(x - X_i)/h\} \mid X_i \in I_i\bigr]$, where the expectation is computed with respect to the density estimate based on the interval censored covariate data.
18
8. Scatterplot smoothing
An algorithm to estimate g is given by (a sketch follows):
1. Use the importance sampling Algorithm 2 of Section 5 to approximate the weights $\mathrm{E}\bigl[K\{(x - X_i)/h\} \mid X_i \in I_i\bigr]$.
2. Compute $\hat g(x) = \sum_i \hat w_i(x)\, Y_i$ with the normalized weights $\hat w_i(x)$.
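A grid-based sketch under stated assumptions: every observation enters with a normalized kernel weight (no strict nearest neighborhood is enforced), and the conditional expectations are integrated on a uniform grid instead of approximated with Algorithm 2; f_hat and all names are illustrative:

```python
import numpy as np

def K(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def smooth_at(x, intervals, y, grid, f_hat, h):
    # Smoother (13) with interval censored covariates: each kernel weight
    # K((x - X_i)/h) is replaced by its conditional expectation given
    # X_i's observed interval, taken under the covariate density estimate
    # f_hat (values on a uniform grid; rectangle-rule integration).
    w = np.empty(len(y))
    for i, (lo, hi) in enumerate(intervals):
        inside = (grid > lo) & (grid <= hi)
        if inside.sum() < 2:                     # exactly observed covariate
            w[i] = K((x - 0.5 * (lo + hi)) / h)
            continue
        u, fu = grid[inside], f_hat[inside]
        w[i] = np.sum(K((x - u) / h) * fu) / np.sum(fu)
    return np.sum(w * y) / np.sum(w)
```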
Figure 7: Data on HIV infection and infant mortality of Hughes and Richardson (JASA, in press). Clockwise from the top left corner: (a) observed data, time of death on the y axis and time of infection on the x axis; (b) density estimate (4) for the covariate data; (c) and (d) plots of the nearest neighbor smoother with interval censored covariate, using the method of this section, for various window widths, with centers of intervals as points in (c) and intervals as lines in (d).
19
8. Scatterplot smoothing
  • Interesting questions regarding the scatterplot smoothing technique:
  • How can we handle influential observations?
  • With regard to the previous question, can the technique be made robust, or can the choice of smoothing parameter be made adaptive?
  • What is the best way to generalize the method when the response is interval censored or measured with error as well?

20
9. References
Efron, B. (1967), The two sample problem with censored data, Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, 831-853.
DeGruttola, V. and Lagakos, S.W. (1989), Analysis of doubly-censored survival data, with applications to AIDS, Biometrics, 45, 1-11.
Duchesne, T. and Stafford, J.E. (2001), A kernel density estimate for interval censored data, Technical Report No. 0106, University of Toronto.
Li, L., Watkins, T. and Yu, Q. (1997), An EM algorithm for smoothing the self-consistent estimator of survival functions with interval-censored data, Scandinavian Journal of Statistics, 24, 531-542.
Silverman, B.W. (1986), Density Estimation for Statistics and Data Analysis, London: Chapman & Hall.
Turnbull, B.W. (1976), The empirical distribution function with arbitrarily grouped, censored and truncated data, Journal of the Royal Statistical Society, Series B, 38, 290-295.
Wu, C.F.J. (1983), On the convergence properties of the EM algorithm, Annals of Statistics, 11, 95-103.