Title: P1247676906bNOKS
1- A. Descriptive Statistics (ch 2)
- 1. Mean
- 2. Median
- 3. Mode
- 4. Standard Error
- 5. Standard Deviation
- (probability ch 3) B. The Normal Distribution (ch 5)
- 1. Relation of width to standard deviation
- 2. Kurtosis (just talk about what it is)
- 3. Skewness (ex 5.4, 5.8)
- 4. Applicability to experimental measures
- (ch 6 deals with estimating means, std dev of unknown populations. 6.5 Central Limit Theorem)
- C. Significance Tests for a Single Population
- 1. Z-tests for a single population (7.4)
- 2. T-tests for a single population (7.3)
- 3. Z-tests for two populations
- 4. T-tests for two populations (8.2)
2- D. Error in interpretation of statistical tests
- 1. Type I error (ch7, p228)
- 2. Type II error (ch7, p228)
- E. Regression (Ch 11)
- 1. Correlation
- 2. Linear Regression
- 3. Multiple Regression (11.9)
- F. Analysis of Variance (ANOVA)
- 1. F-tests
- 2. Multiple Comparison Procedures
- G. Data Fitting
- 1. Linear Least Squares
- 2. Chi-Squared Analysis (? 10.7?)
- H. Software for Biostatistical Analysis
- 1. SPSS
- 2. SAS
- Blast
- DNAStar
3B. The Normal Distribution
- 1. Relation of width to standard deviation
- 2. Kurtosis
- 3. Skewness
- 4. Applicability to experimental measures
4Discrete Probability Distributions
5Discrete Probability Distribution
- Not necessarily finitely many possible outcomes
- List the probabilities of each outcome
- Sum of the probabilities of all outcomes must be one
Probability Mass Function for Discrete Random
Variable.
6Expected Value of a Discrete Random Variable
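For discrete random variables, $E(X) = \mu = \sum_i x_i \Pr(X = x_i)$ (by the way, for continuous random variables, $E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$, where $f$ is the probability density function).
Expected Value is NOT the same as most likely outcome!!!
Ex: What is the expected value of the roll of a single die? What is the most likely value of the roll of a single die?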
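The die question can be checked with a few lines of Python. This is a minimal sketch that assumes a fair six-sided die; the helper names `expected_value` and `most_likely_outcomes` are made up for illustration.

```python
from fractions import Fraction

# Fair six-sided die (assumption): each face 1..6 has probability 1/6.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def expected_value(pmf):
    """E(X) = sum of x * Pr(X = x) over all outcomes x."""
    return sum(x * p for x, p in pmf.items())

def most_likely_outcomes(pmf):
    """Every outcome that attains the largest probability."""
    top = max(pmf.values())
    return [x for x, p in pmf.items() if p == top]

die_mean = expected_value(pmf)
die_modes = most_likely_outcomes(pmf)

print(float(die_mean))  # 3.5
print(die_modes)        # [1, 2, 3, 4, 5, 6]
```

The expected value, 3.5, is not even a possible roll, and all six faces tie for most likely outcome, so the two notions clearly differ here.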
8Examples
Ex 4.4 Many new drugs have been introduced in the last several decades to bring hypertension under control, that is, to reduce high blood pressure to normotensive levels. Suppose a physician agrees to use a new antihypertensive drug on a trial basis on the first 4 untreated hypertensives she encounters in her practice, before deciding whether to adopt the drug for routine use. Let X = the number of patients out of 4 who are brought under control. Then X is a discrete random variable, which takes on the values 0, 1, 2, 3, 4.
Ex 4.6 Consider Example 4.4. Suppose that from previous experience with the drug, the drug company expects that for any clinical practice the probability that 0 patients out of 4 will be brought under control is .008, 1 patient out of 4 is .076, 2 patients out of 4 is .265, 3 patients out of 4 is .411, and all 4 patients is .240. This probability mass function, or probability distribution, is

x          0     1     2     3     4
Pr(X = x)  .008  .076  .265  .411  .240

Ex 4.9 What is the expected value for the random variable shown in the above table?
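As a sketch of the arithmetic for Ex 4.9, the expected value is the probability-weighted sum of the outcomes. The variable names are illustrative only; the probabilities are the ones stated in Ex 4.6.

```python
# Probability mass function from Ex 4.6:
# number of patients (out of 4) brought under control -> probability.
pmf = {0: 0.008, 1: 0.076, 2: 0.265, 3: 0.411, 4: 0.240}

# A valid pmf must sum to one (allowing for floating-point rounding).
total = sum(pmf.values())

# E(X) = sum of x * Pr(X = x)
expected = sum(x * p for x, p in pmf.items())

print(round(total, 6))     # 1.0
print(round(expected, 3))  # 2.799
```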
9Examples
Ex 4.8 How can the probability mass function above be used to judge if the drug behaves with the same efficacy in actual practice as predicted by the drug company? The company might provide the drug to 100 physicians and ask each of them to treat their first 4 untreated hypertensives with it. Each physician would then report his or her results to the drug company, and the combined results could be compared with the expected results from the table above. For example, suppose out of the 100 physicians who agree to participate, 19 bring all of their first 4 untreated hypertensives under control, 48 bring 3 out of 4 under control, 24 bring 2 under control, and the remaining 9 bring only 1 under control. The sample frequency distribution can be compared with the probability distribution.
Ex 4.11 Compare the average number of hypertensives actually brought under control in the 100 clinical practices (denoted $\bar{x}$) with the expected number brought under control ($\mu$) per 4-patient practice.
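A rough sketch of the comparison in Ex 4.8 and Ex 4.11 follows. It uses counts of 0, 9, 24, 48 and 19 physicians for 0 to 4 patients brought under control; the count of 48 is an assumption of this sketch, chosen so the counts add up to the 100 participating physicians.

```python
# Physicians (out of 100) who brought 0..4 of their first 4 patients under control.
# The count for x = 3 is taken as 48 so that the counts total 100 (assumption).
counts = {0: 0, 1: 9, 2: 24, 3: 48, 4: 19}

# Probability mass function predicted by the drug company (Ex 4.6).
pmf = {0: 0.008, 1: 0.076, 2: 0.265, 3: 0.411, 4: 0.240}

n_physicians = sum(counts.values())
sample_freq = {x: c / n_physicians for x, c in counts.items()}

# Sample mean (x-bar) versus expected value (mu).
sample_mean = sum(x * c for x, c in counts.items()) / n_physicians
expected = sum(x * p for x, p in pmf.items())

for x in pmf:
    print(x, sample_freq[x], pmf[x])
print(round(sample_mean, 2), round(expected, 3))  # 2.77 2.799
```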
10Variance of Discrete Random Variable
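$\mathrm{Var}(X) = \sigma^2 = \sum_i (x_i - \mu)^2 \Pr(X = x_i)$
Short form: $\sigma^2 = \sum_i x_i^2 \Pr(X = x_i) - \mu^2$. Much easier to use!
The standard deviation is $\sigma = \sqrt{\mathrm{Var}(X)}$.
Extremely useful but extremely vague comment: about 95% of the probability mass usually falls within two standard deviations of the mean. But not always.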
11Examples
Look at the variance in the hypertension example. We already found the mean. Does 95% of the mass fall within 2 standard deviations of the mean?
So within two standard deviations of the mean means 2.799 ± 2(.9168), that is, between about 0.97 and 4.63.
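The variance calculation for the hypertension example can be sketched as below, using the short form of the variance and the probabilities from Ex 4.6. The variable names are illustrative.

```python
import math

pmf = {0: 0.008, 1: 0.076, 2: 0.265, 3: 0.411, 4: 0.240}

mean = sum(x * p for x, p in pmf.items())
# Short form: Var(X) = E(X^2) - mean^2
variance = sum(x**2 * p for x, p in pmf.items()) - mean**2
sd = math.sqrt(variance)

# Probability mass lying within two standard deviations of the mean.
low, high = mean - 2 * sd, mean + 2 * sd
mass_within = sum(p for x, p in pmf.items() if low <= x <= high)

print(round(mean, 3), round(sd, 4))   # 2.799 0.9168
print(round(low, 2), round(high, 2))  # 0.97 4.63
print(round(mass_within, 3))          # 0.992
```

Only the outcome 0 lies outside the interval, so about 99% of the mass (not 95%) falls within two standard deviations here, which illustrates the "but not always" caveat.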
14Cumulative Distribution Function of a Discrete
Random Variable
The Cumulative Distribution Function (CDF) of a random variable X is denoted by (capital) F(X). For a specific value x, F(x) = Pr(X ≤ x). So from the previous probability distribution, the probability mass function gives the CDF: F(x) is the sum of Pr(X = k) over all k ≤ x.
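One way to build the CDF from the pmf of Ex 4.6 is a running sum. This is a small sketch; `cdf` is just an illustrative dictionary name.

```python
from itertools import accumulate

pmf = {0: 0.008, 1: 0.076, 2: 0.265, 3: 0.411, 4: 0.240}

values = sorted(pmf)
# Running sums of the pmf give F(x) = Pr(X <= x) at each possible value.
cdf = dict(zip(values, accumulate(pmf[x] for x in values)))

for x in values:
    print(x, pmf[x], round(cdf[x], 3))
```

The last value of the CDF is 1, since the pmf sums to one.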
15Some Discrete Distributions that people actually
use!
- Binomial Distribution
- Poisson Distribution
16The Binomial Distribution
- Bernoulli trials
- identical,
- independent
- probability of a success is p, for each trial
- Repeat for a predetermined number of trials, n
- X = number of successes is binomially distributed
$\Pr(X = k) = \binom{n}{k} p^k q^{n-k}$, k = 0, 1, ..., n, where q = 1 - p
17The Binomial Distribution
Ex: a. Flip one fair coin. What's the probability of it landing on heads? b. Flip two coins. What's the probability of getting 2 heads? 1 head? 0 heads? c. Flip ten coins. What's the probability of getting 10 heads? 8? 5? 0?
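The coin-flip questions are binomial probabilities with p = 1/2. Below is a minimal sketch using exact fractions; `binomial_pmf` is a helper name introduced here for illustration.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, n, p):
    """Pr(X = k) for X binomially distributed with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

half = Fraction(1, 2)  # fair coin (assumption)

one_coin = binomial_pmf(1, 1, half)
two_coins = {k: binomial_pmf(k, 2, half) for k in (2, 1, 0)}
ten_coins = {k: binomial_pmf(k, 10, half) for k in (10, 8, 5, 0)}

print(one_coin)  # 1/2
for k, prob in two_coins.items():
    print("2 coins,", k, "heads:", prob)
for k, prob in ten_coins.items():
    print("10 coins,", k, "heads:", prob)
```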
Ex 4.2.7 Pulmonary Disease. An investigator notices that children develop chronic bronchitis in the first year of life in 3 out of 20 households where both parents are chronic bronchitics, as compared with the national incidence rate of chronic bronchitis, which is about 5% in the first year of life. Is this difference real, or can it be attributed to chance? Specifically, how likely are infants in at least 3 out of 20 households to develop chronic bronchitis if the probability of developing disease in any one household is 0.05?
19The Binomial Distribution
Ex 4.2.7 (continued) The probability that X is at least 3 is the same as 1 - Pr(X < 3):
$\Pr(X \ge 3) = 1 - [\Pr(X = 0) + \Pr(X = 1) + \Pr(X = 2)]$
Why? So what?
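The complement trick can be sketched numerically for n = 20 households and p = 0.05. The helper `binomial_pmf` is an illustrative name; the direct sum over k = 3..20 is included only to confirm that both routes agree.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Pr(X = k) for X binomially distributed with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 20, 0.05

# Complement: only three terms are needed.
p_fewer_than_3 = sum(binomial_pmf(k, n, p) for k in range(3))
p_at_least_3 = 1 - p_fewer_than_3

# Direct sum over k = 3..20 (eighteen terms), for comparison.
p_direct = sum(binomial_pmf(k, n, p) for k in range(3, n + 1))

print(round(p_at_least_3, 4))  # 0.0755
```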
21The Binomial Distribution
Ex 4.2.7 (continued) Pr(X ≥ 3) is about .075: pretty unusual, but not VERY rare. Typically, 0.05 is the threshold for unusual. Relatively small sample size. Could have used the binomial table (Rosner, page 817).
What is the expected number of cases? You know how to do the expectation for a discrete random variable. But that sounds like a pain!
22The Binomial Distribution Expectation and
Variance
For a binomially distributed random variable, E(X) = np and Var(X) = npq.
What was the expected number of children with chronic bronchitis? What's the variance? Recall that (very roughly speaking) about 95% of the probability mass falls within two standard deviations of the mean. What does that mean in the context of this example?
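For the bronchitis example (n = 20, p = 0.05), the shortcut formulas and the two-standard-deviation check can be sketched as follows. The variable names are illustrative.

```python
import math
from math import comb

n, p = 20, 0.05
q = 1 - p

mean = n * p          # E(X) = np
variance = n * p * q  # Var(X) = npq
sd = math.sqrt(variance)

# Probability mass within two standard deviations of the mean.
low, high = mean - 2 * sd, mean + 2 * sd
mass_within = sum(
    comb(n, k) * p**k * q**(n - k)
    for k in range(n + 1)
    if low <= k <= high
)

print(round(mean, 2), round(variance, 2), round(sd, 3))  # 1.0 0.95 0.975
print(round(mass_within, 4))
```

Here the interval runs from below 0 to just under 3, so it covers 0, 1 and 2 cases, which is the complement of the "at least 3" event in this example.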
23The Binomial Distribution example two
Suppose we have a 40-base sequence of DNA, as follows:
AGCTTCCGATCCGCTATAATCGTTAGTTGTTACACCTCTG
Absent any other knowledge about the DNA, we might assume that these bases are chosen at random. Estimate the probability that the next site has an A. G? C? T? Suppose you look further down on this string of DNA, at another 40 bases. Based on the above, how many A's do you expect to see? What is the probability of finding exactly this many A's? What is the probability of finding at least this many A's?
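A possible way to work the DNA example is sketched below. It assumes, as the slide suggests, that each site is an independent draw whose probabilities equal the base frequencies observed in the 40-base sequence; the helper and variable names are illustrative.

```python
from collections import Counter
from math import comb

sequence = "AGCTTCCGATCCGCTATAATCGTTAGTTGTTACACCTCTG"
n = len(sequence)

# Estimate each base probability by its observed frequency.
counts = Counter(sequence)
freqs = {base: counts[base] / n for base in "AGCT"}

def binomial_pmf(k, n, p):
    """Pr(X = k) for X binomially distributed with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p_a = freqs["A"]
expected_a = round(n * p_a)  # expected number of A's in another 40 bases

p_exactly = binomial_pmf(expected_a, n, p_a)
p_at_least = sum(binomial_pmf(k, n, p_a) for k in range(expected_a, n + 1))

print(dict(counts))
print(expected_a, round(p_exactly, 3), round(p_at_least, 3))
```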
24The Poisson Distribution
Used to describe rare events.
Assumptions: the probability of observing one event in a short time interval is proportional to the length of the time interval you watch for, so Pr(1 event) ≈ λΔt, for some constant λ. The probability of observing zero events in the same time period is approximately 1 - λΔt. The probability of observing more than one event in this time period is essentially zero.
The probability of observing k events in time t is given by
$\Pr(X = k) = \frac{e^{-\mu}\mu^k}{k!}$, k = 0, 1, 2, ...
where µ = λt. Here λ is the expected number of events per unit time, and µ is the expected number of events over a period of time t. The distribution is determined by a single parameter, µ!
Note: can have indefinitely many trials, as contrasted with the binomial distribution.
25The Poisson Distribution
Example 4.31, ex 4.33 Infectious diseases. Consider the distribution of the number of deaths attributed to typhoid fever over a long period of time, say one year. Assuming the probability of a new death from typhoid fever in any one day is very small, and the numbers of cases reported in any two distinct periods of time are independent random variables, then the number of deaths in a one-year period will follow a Poisson distribution. Suppose the number of deaths from typhoid fever over a one-year period is Poisson distributed with parameter µ = 4.6 (the expected number of deaths in one year). What is the probability distribution of the number of deaths over a six-month period? Three-month period?
26The Poisson Distribution
Suppose the number of deaths from typhoid fever over a one-year period is Poisson distributed with parameter µ = 4.6 (the expected number of deaths in one year). So λ = 4.6/1 = 4.6 deaths per year. For a six-month period, how should the deaths be distributed? µ = λt = 4.6 × 0.5 = 2.3, the expected number of deaths in six months.
27The Poisson Distribution
Suppose the number of deaths from typhoid fever over a one-year period is Poisson distributed with parameter µ = 4.6 (the expected number of deaths in one year). So λ = 4.6/1 = 4.6 deaths per year. For a three-month period, how should the deaths be distributed? µ = λt = 4.6 × 0.25 = 1.15, the expected number of deaths in three months.
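The typhoid-fever distributions for one year, six months and three months can be tabulated with a short sketch. It uses the standard Poisson formula Pr(X = k) = e^(-µ) µ^k / k! with µ = λt; the function and variable names are illustrative.

```python
import math

def poisson_pmf(k, mu):
    """Pr(X = k) for a Poisson random variable with parameter mu."""
    return math.exp(-mu) * mu**k / math.factorial(k)

rate_per_year = 4.6  # lambda: expected deaths per year

table = {}
for label, t in (("one year", 1.0), ("six months", 0.5), ("three months", 0.25)):
    mu = rate_per_year * t  # mu = lambda * t
    table[label] = (mu, [poisson_pmf(k, mu) for k in range(6)])

for label, (mu, probs) in table.items():
    print(label, round(mu, 2), [round(pr, 4) for pr in probs])
```

Each row lists Pr(X = 0) through Pr(X = 5); the shorter the period, the more the mass concentrates on small counts.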
28The Poisson Distribution Mean and variance
For a Poisson distribution with parameter µ (the expected number of events over some period of time, unitless), E(X) = µ and Var(X) = µ. They're both equal to µ! So if you have a discrete distribution where the mean and variance are about equal, it might be Poisson.
29The Poisson Distribution vs The Binomial
Distribution
Recall: the Binomial Distribution requires you to know the number of trials in advance, but the Poisson does not. The binomial distribution can be a pain to work with. What's 13!? What's 13 choose 8? Recall that a binomial distribution is uniquely determined by two parameters, n and p (q comes along for free), and the Poisson by a single parameter µ. Expected value of the binomial? Variance of the binomial? What can you say about these two values if p is small (and n is also large)? What advantage is there to using a Poisson approximation to the binomial distribution? (How small is small enough for p? How large is large enough for n?)
30The Poisson Distribution vs The Binomial
Distribution
The Poisson approximation to the binomial distribution is pretty OK when p < .01 and n > 100. For instance, with p = .01 and n = 100, the expected value for the binomial is np = 1.0. Example: How does the Poisson approximation measure up to the binomial? Compute the exact binomial probabilities for X = 0, 1, 2, 3, when p = .01, n = 100. Compute the Poisson approximation for the above.
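The comparison asked for above can be sketched as follows, with µ = np used as the Poisson parameter. The helper names are illustrative.

```python
import math
from math import comb

def binomial_pmf(k, n, p):
    """Exact binomial probability Pr(X = k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, mu):
    """Poisson probability Pr(X = k) with parameter mu."""
    return math.exp(-mu) * mu**k / math.factorial(k)

n, p = 100, 0.01
mu = n * p  # Poisson parameter matched to the binomial mean

comparison = [(k, binomial_pmf(k, n, p), poisson_pmf(k, mu)) for k in range(4)]
for k, exact, approx in comparison:
    print(k, round(exact, 4), round(approx, 4))
```

For these values the two columns differ by less than 0.002 at each k, and the Poisson side needs no large factorials or binomial coefficients.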
31Continuous Probability Distributions
32Continuous Probability Distributions
- Variable is continuous.
- Area under the probability density function (pdf) curve is one
- Can talk about the probability that a measurement falls in an interval.
33Normal Distribution
- µ = mean
- σ = standard deviation
N(µ, σ²). Standard Normal: N(0,1)
35Properties of the Standard Normal Distribution
about 68% of the area falls within one standard deviation of the mean
about 95% of the area falls within two standard deviations of the mean
about 99.7% of the area falls within three standard deviations of the mean
More values are located in table 3, page 825
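These areas can be reproduced without a table. The sketch below uses the identity Pr(-k < Z < k) = erf(k / sqrt(2)) for a standard normal Z; `area_within` is an illustrative helper name.

```python
import math

def area_within(k):
    """Area under the standard normal pdf between -k and +k."""
    return math.erf(k / math.sqrt(2))

areas = {k: area_within(k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(k, round(area, 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```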
39Expected Value
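Recall that for discrete random variables, $E(X) = \sum_i x_i \Pr(X = x_i)$. For continuous random variables, $E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$, where $f$ is the probability density function. Expected Value is NOT the same as most likely outcome!!! Ex: What is the expected value of the roll of a single die?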
40Expected Value
For continuous random variables, $E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$. Expected Value is NOT the same as most likely outcome!!! Ex 2
41Variance
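For a discrete distribution, $\mathrm{Var}(X) = \sum_i (x_i - \mu)^2 \Pr(X = x_i)$
For a continuous distribution, $\mathrm{Var}(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$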
42Moments of a distribution
Expected value, or mean, is a measure of the middle of a distribution. Variance is a measure of the spread of a distribution. These are particular cases of what are called the moments of a distribution.
rth moment about the origin: $E(X^r)$
rth moment about the mean: $E[(X - \mu)^r]$
The mean IS the first moment about the origin. The variance is the second moment about the mean.
43Skewness
Skewness is a measure of the symmetry (or lack thereof) of a distribution. The Coefficient of Skewness is given by the third moment about the mean, divided by σ³:
$\text{Coefficient of Skewness} = \frac{E[(X - \mu)^3]}{\sigma^3}$
The σ³ in the denominator makes the whole expression dimensionless.
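To make the moment definitions concrete, here is a small sketch that evaluates them for a discrete pmf. It reuses the hypertension probabilities from Ex 4.6 purely as sample data, which is a choice of this sketch; the function names are illustrative.

```python
# Sample discrete distribution: the hypertension pmf from Ex 4.6.
pmf = {0: 0.008, 1: 0.076, 2: 0.265, 3: 0.411, 4: 0.240}

def moment_about_origin(pmf, r):
    """r-th moment about the origin: E(X^r)."""
    return sum(x**r * p for x, p in pmf.items())

def moment_about_mean(pmf, r):
    """r-th moment about the mean: E[(X - mu)^r]."""
    mu = moment_about_origin(pmf, 1)
    return sum((x - mu)**r * p for x, p in pmf.items())

mean = moment_about_origin(pmf, 1)  # first moment about the origin
variance = moment_about_mean(pmf, 2)  # second moment about the mean
# Third moment about the mean divided by sigma^3 (sigma^3 = variance^1.5).
skewness = moment_about_mean(pmf, 3) / variance**1.5

print(round(mean, 3), round(variance, 4), round(skewness, 2))
```

For this pmf the coefficient of skewness comes out negative, reflecting the longer tail toward small values of X.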
44Skewness
Not all continuous random variables have symmetric bell-shaped distributions. Cardiovascular Disease: Serum triglycerides is an asymmetric, positively skewed, continuous random variable whose probability density function is shown in Figure 5.1. (This isn't really it!)
45Kurtosis
Another shape parameter is given by the
coefficient of kurtosis, which involves the
fourth moment about the mean. Roughly
speaking, the kurtosis of a pdf is a measure of
peakedness - how flat or peaked the pdf is.
46Kurtosis
Normal: Mean = 0, Variance = 1, Skewness coeff = 0, Kurtosis coeff = 0
Laplacian: Mean = 0, Variance = 1, Skewness coeff = 0, Kurtosis coeff = 1
The green curve is said to be leptokurtic: it is more sharply peaked than the Normal distribution. This is reflected by a positive Kurtosis coefficient. Note that all the lower moments (including the variance) are identical.
47Kurtosis
Uniform Distribution
Normal: Mean = 0, Variance = 1, Skewness coeff = 0, Kurtosis coeff = 0
The green curve is said to be platykurtic: it is more flat-topped than the Normal distribution. This is reflected by a negative Kurtosis coefficient.
49Cumulative Distribution Function (cdf)
50Symmetry of Normal Distribution
51Normal Distribution (more notation)
Def: The (100u)th percentile of a standard normal distribution is denoted by $z_u$. It is defined by
$\Pr(X < z_u) = u$, where X ~ N(0,1)
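Percentiles of the standard normal can be looked up numerically instead of from a table. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); `z_percentile` is an illustrative helper name.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def z_percentile(u):
    """Return z_u such that Pr(X < z_u) = u for X ~ N(0, 1)."""
    return standard_normal.inv_cdf(u)

for u in (0.5, 0.95, 0.975):
    z = z_percentile(u)
    # Feeding z back into the CDF recovers u.
    print(u, round(z, 3), round(standard_normal.cdf(z), 3))
```

For example, the 97.5th percentile is about 1.96, which ties in with the "about 95% within two standard deviations" rule of thumb.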
52Standardizing an Arbitrary Normal Distribution
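Any Normal Distribution is determined by two parameters, µ and σ. We know the Cumulative Distribution Function for N(0,1). We would like to know the CDF for any Normal Distribution.
If X ~ N(µ, σ²), then Z = (X - µ) / σ ~ N(0,1).
Or, in terms of the CDF, $\Pr(X \le x) = \Pr(Z \le (x - \mu)/\sigma)$.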
54Examples
Ex 5.20 - Hypertension. Suppose a mild hypertensive is defined as a person whose diastolic blood pressure is between 90 and 100 mm Hg inclusive, and the subjects are 35- to 45-year-old men whose blood pressures are normally distributed with mean 80 and variance 144. What is the probability that a randomly selected person from this population will be mildly hypertensive?
Ex 5.21 - Botany. Suppose tree diameters of a certain species of tree from some defined forest area are assumed to be normally distributed with mean 8 in. and standard deviation 2 in. Find the probability of a tree having an unusually large diameter, which is defined as being greater than 12 in.
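A sketch of how Ex 5.20 and Ex 5.21 might be computed, both directly and by standardizing to N(0,1), is shown below. It uses `statistics.NormalDist` (Python 3.8 or later); the variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal, N(0, 1)

# Ex 5.20: mean 80, variance 144, so the standard deviation is 12.
mu_bp, sigma_bp = 80, 144 ** 0.5
bp = NormalDist(mu=mu_bp, sigma=sigma_bp)
p_mild = bp.cdf(100) - bp.cdf(90)
# Same probability after standardizing: Z = (X - mu) / sigma.
p_mild_std = z.cdf((100 - mu_bp) / sigma_bp) - z.cdf((90 - mu_bp) / sigma_bp)

# Ex 5.21: mean 8 in., standard deviation 2 in.; Pr(X > 12) = 1 - Pr(X <= 12).
trees = NormalDist(mu=8, sigma=2)
p_large = 1 - trees.cdf(12)

print(round(p_mild, 3), round(p_mild_std, 3))
print(round(p_large, 4))
```

Because the distribution is continuous, "inclusive" endpoints do not change the probability.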
55Examples
Ex 5.22 - Cerebrovascular Disease. Diagnosing
stroke strictly on the basis of clinical symptoms
is difficult. A standard diagnostic test used in
clinical medicine to detect stroke in patients is
the angiogram. This test has some risks for the
patient, and researchers have developed several
noninvasive techniques that they hope will be as
effective as the angiogram. One such method uses
measurement of the cerebral blood flow (CBF) in
the brain, because stroke patients tend to have
lower CBF levels than normal. Assume that in the
general population, CBF is normally distributed
with mean 75 and standard deviation 17. A
patient is classified as being at risk for stroke
if his or her CBF is lower than 40. What
proportion of normal patients will be mistakenly
classified as being at risk?
Ex 5.23 - Ophthalmology. Glaucoma is an eye
disease that is manifested by high intraocular
pressure. The distribution of intraocular
pressure (IOP) in the general population is
approximately normal with mean 16 mm Hg and
standard deviation 3 mm Hg. If the normal range
for intraocular pressure is considered to be
between 12 mm Hg and 20 mm Hg, then what
percentage of the general population would fall
within this range?
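Ex 5.22 and Ex 5.23 can be sketched the same way. This treats both measurements as exactly normal and continuous, with no continuity correction for rounded readings, which is an assumption of this sketch; the variable names are illustrative.

```python
from statistics import NormalDist

# Ex 5.22: CBF ~ N(75, 17^2); classified "at risk" if CBF < 40.
cbf = NormalDist(mu=75, sigma=17)
p_misclassified = cbf.cdf(40)

# Ex 5.23: IOP ~ N(16, 3^2); normal range is 12 to 20 mm Hg.
iop = NormalDist(mu=16, sigma=3)
p_in_range = iop.cdf(20) - iop.cdf(12)

print(round(p_misclassified, 3))  # proportion of normal patients flagged as at risk
print(round(p_in_range, 3))       # proportion of the population within the normal range
```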