Title: Raoul LePage
1 Raoul LePage, Professor, STATISTICS AND
PROBABILITY, www.stt.msu.edu/lepage (click on
STT351_F07)
Week 8-27-07
2 WEEK 8-27-07 PLAN: Chapter 1 except Section
1-5. Kernel Density Estimate, pp. 333-334.
Homework due in class 9-5-07. No class 9-3-07
(Labor Day).
3 Plot average heights of normal densities placed
at each data value, e.g. 10, 14. It is like
smearing each sample value, as if it were a drop of
paint, according to the thickness of a normal
density. Each normal integrates to one, as does
their average, the Sample Density Estimate, shown
in dark.
Smoothing data, so you can see it.
normal densities at data 10, 14
Kernel Density Estimate
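The construction on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `normal_density` and `kernel_density` are made up for the example, and the bandwidth `sd = 2.0` is an assumed value, since the slide does not say which sd was used.

```python
import math

def normal_density(x, center, sd):
    """Height at x of a normal density centered at `center` with standard deviation `sd`."""
    z = (x - center) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def kernel_density(x, data, sd):
    """Average of the normal density heights at x, one normal per data value."""
    return sum(normal_density(x, d, sd) for d in data) / len(data)

data = [10, 14]   # data values from the slide
sd = 2.0          # assumed bandwidth; not given on the slide

# Approximate the area under the estimate with a Riemann sum on a fine grid.
step = 0.01
lo, hi = min(data) - 8 * sd, max(data) + 8 * sd
n_steps = int(round((hi - lo) / step))
area = step * sum(kernel_density(lo + i * step, data, sd) for i in range(n_steps + 1))

print(f"height at x = 12: {kernel_density(12, data, sd):.4f}")
print(f"area under the estimate: {area:.4f}")
```

The same two functions are all that is needed to work a density by hand for a few data values: evaluate each normal at the chosen x and average the heights.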
4 The mean of a Kernel Density Estimate is equal to
the sample mean of its data.
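A short derivation shows why. The notation is introduced here and is not from the slides: $x_1, \dots, x_n$ are the data, $h$ is the sd of each normal density, and $\varphi$ is the standard normal density.

```latex
\hat f_h(x) = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{h}\,\varphi\!\left(\frac{x - x_i}{h}\right),
\qquad
\int_{-\infty}^{\infty} x\,\hat f_h(x)\,dx
  = \frac{1}{n}\sum_{i=1}^{n} \int_{-\infty}^{\infty} x\,\frac{1}{h}\,\varphi\!\left(\frac{x - x_i}{h}\right) dx
  = \frac{1}{n}\sum_{i=1}^{n} x_i
  = \bar{x}.
```

Each inner integral is the mean of a normal density centered at $x_i$, which is $x_i$ itself, so the result holds for every choice of $h$.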
5 Making the densities narrower isolates different
parts of the data and reveals more detail.
NARROWER TENTS, MORE DETAIL
6 Closer view of the density by itself, with
narrow normal curves.
density
7 Histograms lump data into categories (the black
boxes), which is not as good for continuous data.
DENSITY OR HISTOGRAM?
density histogram
8 Form of each rectangle comprising a Probability
Histogram. Example: a sample of n = 40 finds
three data values which are at least 30 but less
than 35 (interval [30, 35)).
bin-width w = 35 - 30 = 5
area = w × height = 3/40, so height = 3/(40 × 5) = 0.015
Histograms may radically change their shape in
response to minor changes of bin locations or
widths.
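The rectangle rule and the sensitivity to bin location can both be checked with a small Python sketch. The helper names `rectangle_height` and `probability_histogram`, the bin width of 10, and the two bin starts are assumptions for illustration. The five data values are the ones used for the density on the tents slide (12, 21, 42, 8, 9).

```python
import math

def rectangle_height(count, n, width):
    """Height of a probability-histogram rectangle: area = count / n = width * height."""
    return count / (n * width)

def probability_histogram(data, start, width):
    """Rectangle heights for bins [start, start + width), [start + width, start + 2*width), ..."""
    if start > min(data):
        raise ValueError("start must not exceed the smallest data value")
    n = len(data)
    n_bins = math.floor((max(data) - start) / width) + 1
    counts = [0] * n_bins
    for x in data:
        counts[math.floor((x - start) / width)] += 1
    return [rectangle_height(c, n, width) for c in counts]

# The slide's example: 3 of n = 40 values fall in [30, 35), so w = 5.
print(rectangle_height(3, 40, 5))

# Same data, same bin width, bins shifted by 5: the shape changes.
data = [12, 21, 42, 8, 9]
print(probability_histogram(data, start=0, width=10))
print(probability_histogram(data, start=5, width=10))
```

With bins starting at 0 the tallest rectangle holds two of the five values. Starting the bins at 5 puts three values in the first bin and moves the empty bin.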
9 Plot of average heights of 5 tents placed at data
12, 21, 42, 8, 9.
DENSITY FOR 12, 21, 42, 8, 9
normal density smear around datum 42
data density
10 Narrower tents operate at higher resolution but
they may bring out features that are illusory.
IS DETAIL ILLUSORY?
which do we trust?
kinkier
smoother
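One rough way to see the trade-off is to count the peaks of the density as the tents get wider. The sketch below does this for the data 12, 21, 42, 8, 9 from the previous slide. The bandwidth values, the grid size, and the helper name `count_peaks` are assumptions chosen for illustration, and counting peaks on a grid is only an approximation.

```python
import math

def kernel_density(x, data, sd):
    """Average of normal densities with standard deviation `sd`, one per data value."""
    c = 1.0 / (sd * math.sqrt(2.0 * math.pi))
    return sum(c * math.exp(-0.5 * ((x - d) / sd) ** 2) for d in data) / len(data)

def count_peaks(data, sd, n_points=2001):
    """Count local maxima of the density on an evenly spaced grid."""
    lo, hi = min(data) - 4 * sd, max(data) + 4 * sd
    xs = [lo + (hi - lo) * i / (n_points - 1) for i in range(n_points)]
    ys = [kernel_density(x, data, sd) for x in xs]
    return sum(1 for i in range(1, n_points - 1)
               if ys[i] > ys[i - 1] and ys[i] >= ys[i + 1])

data = [12, 21, 42, 8, 9]          # data from the previous slide
for sd in (0.3, 2.0, 5.0, 30.0):   # assumed bandwidths, narrow to wide
    print(f"sd = {sd:>4}: {count_peaks(data, sd)} peak(s)")
```

Very narrow tents give every datum its own peak (kinkier), while very wide tents merge everything into a single hump (smoother). Which intermediate picture to trust is the question the slide raises.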
11 Population of N = 500 compared with two samples
of n = 30 each.
BEWARE OVER-FINE RESOLUTION
POP mean 32.02
population of N = 500
with 2 samples of n = 30
12 Population of N = 500 compared with two samples
of n = 30 each.
BEWARE OVER-FINE RESOLUTION
sample means are close
SAM1 mean 33.03, SAM2 mean 30.60
POP mean 32.02
densities not good at fine resolution
population of N = 500
with 2 samples of n = 30
13 The same two samples of n = 30 each from the
population of 500.
WE DO BETTER AT COARSE RESOLUTION
SAM1 mean 33.03, SAM2 mean 30.60
POP mean 32.02
how about coarse resolution?
population of N = 500
with 2 samples of n = 30
14 The same two samples of n = 30 each from the
population of 500.
WE DO BETTER AT COARSE RESOLUTION
SAM1 mean 33.03, SAM2 mean 30.60
POP mean 32.02
agreement better at coarser resolution
population of N = 500
with 2 samples of n = 30
15 The same two samples of n = 30 each from the
population of 500.
HOW ABOUT MEDIUM RESOLUTION?
SAM1 mean 33.03, SAM2 mean 30.60
POP mean 32.02
medium resolution?
population of N = 500
with 2 samples of n = 30
16 The same two samples of n = 30 each from the
population of 500.
HOW ABOUT MEDIUM RESOLUTION?
SAM1 mean 33.03, SAM2 mean 30.60
POP mean 32.02
not good at medium resolution
population of N = 500
with 2 samples of n = 30
17 A sample of only n = 600 from a population of N =
500 million (medium resolution).
SAMPLING ONLY 600 FROM 500 MILLION?
large sample of n = 600?
POP mean 32.02
medium resolution?
population of N = 500,000,000
with a sample of n = 600
18 A sample of only n = 600 from a population of N =
500 million (MEDIUM resolution).
SAMPLING ONLY 600 FROM 500 MILLION?
sample of n = 600, sample mean 32.84
mean very close
POP mean 32.02
densities are close
population of N = 500,000,000
with a sample of n = 600
19 A sample of only n = 600 from a population of N =
500 million (FINE resolution).
SAMPLING ONLY 600 FROM 500 MILLION?
sample of n = 600, sample mean 32.84
POP mean 32.02
FINE resolution
densities very close
population of N = 500,000,000
with a sample of n = 600
20 TALKING POINTS
- 1. A density is controlled by the sd, referred to as
bandwidth, of the normal densities used to make
it.
- 1a. You have to be content with the
information revealed by the population density at
your chosen bandwidth.
- 1b. Small samples zero in fairly well on
densities at coarse resolution, i.e. made with
large bandwidth.
- 1c. Samples in the hundreds may perform
remarkably well, even at fine resolution, i.e.
small bandwidth.
- 2. Histograms are notorious for being unstable
for some data. Yet, they remain popular. Learn
to make them by hand.
- 3. Learn to make a density for 2 to 4 data
values by hand.
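Points 1b and 1c can be explored with a small simulation. Everything specific in the sketch below is an assumption made for illustration. The population is a hypothetical one of N = 2000 values, not the population shown in the slides. The bandwidths 8.0 (coarse) and 1.0 (fine), the seed, and the grid are likewise invented. The "gap" is the area between the sample density and the population density, so 0 means the curves coincide.

```python
import math
import random

def kde_on_grid(data, grid, sd):
    """Average of normal densities (standard deviation `sd`) evaluated at each grid point."""
    c = 1.0 / (sd * math.sqrt(2.0 * math.pi) * len(data))
    return [c * sum(math.exp(-0.5 * ((x - d) / sd) ** 2) for d in data) for x in grid]

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population of N = 2000 values; NOT the population from the slides.
population = [random.gauss(25, 6) if random.random() < 0.6 else random.gauss(45, 5)
              for _ in range(2000)]
samples = {30: random.sample(population, 30), 600: random.sample(population, 600)}

coarse, fine = 8.0, 1.0  # assumed bandwidths
lo, hi = min(population) - 4 * coarse, max(population) + 4 * coarse
grid = [lo + (hi - lo) * i / 399 for i in range(400)]
step = grid[1] - grid[0]

gaps = {}
for sd in (coarse, fine):
    pop_density = kde_on_grid(population, grid, sd)
    for n, sample in samples.items():
        sam_density = kde_on_grid(sample, grid, sd)
        # Area between the two curves: 0 means identical, 2 means no overlap.
        gaps[(n, sd)] = step * sum(abs(p - s) for p, s in zip(pop_density, sam_density))
        print(f"n = {n:>3}, bandwidth = {sd}: gap = {gaps[(n, sd)]:.3f}")
```

For each sample the gap at the coarse bandwidth comes out smaller than at the fine one. Comparing the n = 30 and n = 600 rows at the fine bandwidth gives a feel for how much a sample in the hundreds helps. Changing the seed or the hypothetical population is a quick way to see how stable that impression is.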