Title: A Bayesian Approach for Bandwidth Selection in Kernel Density Estimation
1A Bayesian Approach for Bandwidth Selection in
Kernel Density Estimation
- C. N. Kuruwita
- Department of Mathematical Sciences
- Clemson University
2What is Density Estimation?
- The process of obtaining an estimate of an unknown probability density function.
- There are two main approaches.
- Parametric.
- Nonparametric.
3Parametric Approach.
- Assumes the data come from a known parametric family with density $f_0(\cdot\,;\theta)$.
- Then estimates the unknown parameter $\theta$ using a suitable parameter estimation method, e.g. maximum likelihood.
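As a minimal sketch of the parametric approach, assume the chosen family is the lognormal, for which the maximum likelihood estimates have a closed form: the mean and the (divide-by-n) standard deviation of the log data. The function names and the sample values are illustrative assumptions, not taken from the talk.

```python
import math

def lognormal_mle(data):
    """Return the maximum likelihood estimates (mu, sigma) of a lognormal."""
    logs = [math.log(x) for x in data]
    n = len(logs)
    mu = sum(logs) / n
    # The MLE of sigma^2 divides by n, not n - 1.
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / n)
    return mu, sigma

def lognormal_pdf(x, mu, sigma):
    """Density of the lognormal distribution at x > 0."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))

# Illustrative positive-valued sample (hypothetical values).
sample = [0.8, 1.3, 2.1, 0.5, 3.4, 1.1]
mu_hat, sigma_hat = lognormal_mle(sample)
# Fitted parametric density evaluated at one point.
density_at_one = lognormal_pdf(1.0, mu_hat, sigma_hat)
```

Once the family is fixed, the whole density estimate is determined by the two fitted numbers, which is what makes the approach simple and also what restricts it.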
4Problems in Parametric Approach
- Restricting the estimator to a particular parametric family can leave important features of the data undetected.
Figure panels: parametric estimate with a lognormal density; nonparametric estimate.
Reference: Wand and Jones (1995), Kernel Smoothing.
5Nonparametric Approach
- Let the data speak for themselves.
- Problems
- Properties of these estimators are hard to determine.
- Computationally intensive.
6Common Nonparametric Density Estimation Methods.
- Kernel density estimators.
- Nearest neighbor method.
- Maximum penalized likelihood estimators.
- Orthogonal series estimators.
7Kernel Density Estimator.
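- Definition: for a sample $X_1, \dots, X_n$, $\hat f_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right)$.
- $K(\cdot)$ is (usually) a symmetric pdf.
- $h$ is called the bandwidth or smoothing parameter.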
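A minimal sketch of a kernel density estimator, assuming the standard form f_hat(x) = (1 / (n h)) * sum_i K((x - X_i) / h) and a Gaussian kernel. The kernel choice, the sample values, and the bandwidths tried are assumptions made for illustration.

```python
import math

def gaussian_kernel(u):
    """Standard normal pdf, a common symmetric kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, data, h, kernel=gaussian_kernel):
    """Kernel density estimate at x with bandwidth h."""
    n = len(data)
    return sum(kernel((x - xi) / h) for xi in data) / (n * h)

# Illustrative sample (hypothetical values).
data = [0.8, 1.3, 2.1, 0.5, 3.4, 1.1]

# The same data evaluated at x = 1 under three bandwidths:
# a small h gives a rough estimate, a large h oversmooths.
estimates = {h: kde(1.0, data, h) for h in (0.1, 0.5, 2.0)}
```

The estimate at a fixed point changes noticeably with h, which is why the choice of bandwidth matters more in practice than the choice of kernel.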
8The Problem
9- Data-driven bandwidth selection.
10The Approach
- Use an asymmetric kernel with positive support to avoid the spill-over effect.
- Assign a prior density to the smoothing parameter $h$.
- Derive the density estimator in a Bayesian framework.
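The spill-over effect can be illustrated with a short sketch: when the data are positive, a symmetric kernel places part of the estimated probability mass below zero. The Gaussian kernel, the sample values, and the helper names here are assumptions made for illustration.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def mass_below_zero(data, h):
    """Probability mass that a Gaussian-kernel KDE assigns to (-inf, 0).

    Each kernel is a normal density centred at X_i with standard
    deviation h, so its mass below zero is Phi(-X_i / h).
    """
    return sum(normal_cdf(-xi / h) for xi in data) / len(data)

# Illustrative positive-valued sample close to the boundary (hypothetical).
data = [0.2, 0.5, 0.8, 1.3]
spill_small_h = mass_below_zero(data, 0.1)
spill_large_h = mass_below_zero(data, 0.5)
```

A kernel supported on the positive half-line assigns no mass to negative values by construction, which is the motivation for the asymmetric kernel.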
11The Lognormal Kernel Density Estimator
- Definition
- $K(\cdot)$ is a lognormal pdf with scale parameter $h$.
- $s_j$ is the jump size of the Kaplan-Meier estimator at the $j$-th observation.
- $h$ is assigned an inverted gamma prior with shape parameter $\alpha$ and scale parameter $\beta$.
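A minimal sketch of a Kaplan-Meier-weighted lognormal kernel estimate. The exact kernel parameterisation used in the talk is not recoverable from the text, so this sketch assumes a lognormal pdf with log-location log(X_j) and log-scale h, and it uses a fixed h rather than a Bayesian bandwidth. Tied observation times are assumed absent, and all function names are hypothetical.

```python
import math

def kaplan_meier_jumps(times, events):
    """Return (sorted_times, jumps) of the Kaplan-Meier estimator.

    events[i] is True for an observed failure and False for a censored
    time. Censored observations receive a jump of zero.
    """
    order = sorted(range(len(times)), key=lambda i: times[i])
    n = len(times)
    surv = 1.0
    sorted_times, jumps = [], []
    for rank, i in enumerate(order):
        at_risk = n - rank
        # Survival drops by the factor (1 - 1/at_risk) at a failure.
        jump = surv / at_risk if events[i] else 0.0
        surv -= jump
        sorted_times.append(times[i])
        jumps.append(jump)
    return sorted_times, jumps

def lognormal_kernel(x, center, h):
    """Assumed kernel: lognormal pdf with log-location log(center), log-scale h."""
    z = (math.log(x) - math.log(center)) / h
    return math.exp(-0.5 * z * z) / (x * h * math.sqrt(2.0 * math.pi))

def lognormal_kde(x, times, events, h):
    """Weighted kernel estimate: sum_j s_j * K(x; X_j, h), zero for x <= 0."""
    if x <= 0:
        return 0.0
    ts, jumps = kaplan_meier_jumps(times, events)
    return sum(s * lognormal_kernel(x, t, h) for t, s in zip(ts, jumps) if s > 0)

# Illustrative censored sample (hypothetical values).
times = [1.2, 0.7, 2.5, 1.9, 3.1]
events = [True, False, True, True, True]
value = lognormal_kde(1.5, times, events, h=0.4)
```

With no censoring every jump equals 1/n, so the estimator reduces to an ordinary equally weighted kernel estimate.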
12Resulting Bayesian Bandwidth.
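- The Bayesian estimator of the smoothing parameter $h$ under squared error loss is the posterior mean of $h$.
[Equation: expression for the Bayesian bandwidth, followed by the definitions of its terms.]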
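For orientation only, here is the general shape such an estimator takes. Under squared error loss a Bayes estimator is the posterior mean. Assuming, as in common local Bayes bandwidth constructions, that the posterior for $h$ at a point $x$ is proportional to the kernel estimate $\hat f_h(x)$ times the prior $\pi(h)$ (an assumption, since the construction is not stated in the surviving text), the local bandwidth would be:

```latex
\hat h(x)
  = \mathrm{E}\left[h \mid x, X_1, \dots, X_n\right]
  = \frac{\int_0^\infty h \, \hat f_h(x) \, \pi(h) \, dh}
         {\int_0^\infty \hat f_h(x) \, \pi(h) \, dh}
```

With a suitable kernel and prior pair these integrals can be evaluated in closed form, so no numerical search over bandwidths is needed.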
13Asymptotic Properties
- The lognormal KDE converges to the true pdf as $n \to \infty$, i.e. $\hat f(x) \to f(x)$ a.s.
- The Bayesian local bandwidths converge to zero as $n \to \infty$, i.e. $\hat h(x) \to 0$ a.s.
14Simulation Study
- Assess the effect of the lognormal kernel by comparing it with the inverse Gaussian kernel.
- Assess the effectiveness of the Bayesian bandwidths by comparing them with the cross-validated bandwidth.
- Simulated failure data from Weibull distributions with scale 1 and varying shape parameter, covering:
- Decreasing failure rate (DFR)
- Constant failure rate (CFR)
- Increasing failure rate (IFR)
- Censoring levels: 10%, 20%, and 50%.
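A sketch of how right-censored Weibull samples of this kind could be generated. The censoring mechanism is an assumption (independent exponential censoring times), as are the sample size, the shape values, and the function name. A Weibull with shape below 1, equal to 1, or above 1 has a decreasing, constant, or increasing failure rate respectively.

```python
import random

def simulate_censored_weibull(n, shape, censor_rate, seed=0):
    """Simulate n right-censored observations.

    Failure times are Weibull with scale 1 and the given shape.
    Censoring times are exponential with rate censor_rate (assumed
    mechanism). Returns (times, events), where events[i] is True
    when the failure was observed.
    """
    rng = random.Random(seed)
    times, events = [], []
    for _ in range(n):
        # random.weibullvariate takes (scale, shape) in that order.
        failure = rng.weibullvariate(1.0, shape)
        censor = rng.expovariate(censor_rate)
        times.append(min(failure, censor))
        events.append(failure <= censor)
    return times, events

# One sample per failure-rate regime (shape values are illustrative).
samples = {
    "DFR": simulate_censored_weibull(200, 0.5, 0.25),
    "CFR": simulate_censored_weibull(200, 1.0, 0.25),
    "IFR": simulate_censored_weibull(200, 2.0, 0.25),
}
```

For shape 1 the expected censoring fraction under this mechanism is censor_rate / (1 + censor_rate), so a rate of 0.25 targets 20% censoring. For other shapes the rate would have to be tuned to reach a given censoring level.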
15Assessment Criteria
- Pointwise estimated MSE ratio.
- Estimated mean integrated squared error (MISE), averaged over $N$, the number of simulations.
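A minimal sketch of an estimated MISE, assuming the usual form: the integrated squared error between each simulated estimate and the true density, averaged over the N simulations. The trapezoidal integration rule, the grid, and the function names are assumptions; the exact formula used in the talk is not recoverable from the text.

```python
def integrated_squared_error(estimate, truth, grid):
    """Trapezoidal approximation of the integral of (estimate - truth)^2."""
    sq = [(estimate(x) - truth(x)) ** 2 for x in grid]
    return sum(
        0.5 * (sq[i] + sq[i + 1]) * (grid[i + 1] - grid[i])
        for i in range(len(grid) - 1)
    )

def estimated_mise(estimates, truth, grid):
    """Average integrated squared error over N simulated estimates."""
    return sum(
        integrated_squared_error(est, truth, grid) for est in estimates
    ) / len(estimates)

# Toy check with constant functions on [0, 1]:
# squared errors are 1 and 4, so the average is 2.5.
grid = [0.0, 0.5, 1.0]
toy = estimated_mise([lambda x: 1.0, lambda x: 2.0], lambda x: 0.0, grid)
```

In a simulation study each element of `estimates` would be the density estimate fitted to one simulated sample, and `truth` would be the known Weibull density.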
16Estimated Mean Integrated Squared Errors.
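Methods compared: IG-Bayes, LN-CV, LN-Bayes (IG = inverse Gaussian kernel, LN = lognormal kernel, CV = cross-validated bandwidth, Bayes = Bayesian bandwidth).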
17Application to Real Data
- The Problem: Estimation of the probability density of the debond strength of carbon fibers.
- Data: Due to the complexity of the experiment, there were only 12 observations, 3 of which are censored.
- Reference: Harwell, M. (1995). Microbond Tests for Ribbon Fibers. M.S. thesis, Department of Chemical Engineering, Clemson University.
18Density Estimates
19Conclusion and Future Work
- The lognormal KDE with the Bayesian bandwidths shows a lot of potential as a density estimator.
- Need to explore the boundary effect at the right end of the support, i.e. when the support is a finite interval starting at 0.
20Thank You