Title: Introduction to Inference: Confidence Intervals and Hypothesis Testing
Presentation 4
First Part
What is inference?
- Inference is the process of using a sample to draw conclusions about a population.
1. Draw a representative SAMPLE from the POPULATION.
2. Describe the SAMPLE:

   Var 1   Var 2   Var 3
   459     Brown   28
   657     Red     43
   321     Green   46
   213     Blue    47
   536     Blue    53

3. Use the rules of probability and statistics to make conclusions about the POPULATION from the SAMPLE.
Population Parameters
- $p$: population proportion
- $\mu$: population mean
- $\sigma$: population standard deviation
- $\beta_1$: population slope (we will see later)
Sample Statistics
- $\hat{p}$: sample proportion
- $\bar{x}$: sample mean
- $s$: sample standard deviation
- $b_1$: sample slope (we will see later)
Two Types of Inference
1. Confidence Intervals
- Confidence intervals give us a range in which the population parameter is likely to fall.
- We use confidence intervals whenever the research question calls for estimating a population parameter.
- Example: Estimate the proportion of US adult women who would vote for Hillary Clinton as president.
- Example: What is the mean age of trees in the forest?
Two Types of Inference, Cont.
2. Hypothesis Testing
- Hypothesis tests are tests about population parameters.
- Example: Is the proportion of US adult women who would vote for Hillary Clinton greater than 50%?
- We can only find evidence that a population parameter is different from our null value. We cannot prove that a population parameter is equal to some value.
- Example:
  - Valid hypothesis: Is the mean age of trees in the forest greater than 50 years?
  - Invalid hypothesis: Is the mean age of trees in the forest equal to 50 years?
Types of CIs and Hypothesis Tests
- For hypothesis tests and C.I.s:
  - 1-proportion (1-categorical variable)
  - 1-mean (1-quantitative variable)
  - Difference in 2 proportions (2-categorical variables, both with 2 possible outcomes)
  - Difference in 2 means (1-quantitative and 1-categorical variable, or 2-quantitative variables, independent samples)
  - Regression slope (2-quantitative variables)
- For hypothesis tests only:
  - Chi-square test (2-categorical variables, at least one with 3 or more levels!)
Some Examples
- Polina wants to estimate the mean high-school GPA of incoming freshmen at FIT.
  - Solution: CI for one population mean.
- Pampos wants to know if the proportion of PSU students who engage in underage drinking is greater than 25%.
  - Solution: Hypothesis test of one proportion.
  - Null hypothesis: $H_0: p = 0.25$
  - Alternative hypothesis: $H_a: p > 0.25$
- Isaac wants to estimate the difference in the proportion of men and women who smoke.
  - Solution: CI for a difference in 2 proportions.
Interpreting Confidence Intervals
- Given the confidence level (90%, 95%, 99%, etc.), let L be the confidence level and conclude the following:
  - With L% confidence, the population parameter is within the confidence interval.
- Example: Suppose the 90% CI for the mean age of trees in the forest is (32, 45) years.
  - We are 90% confident that the true mean age of trees in the forest is between 32 and 45 years.
Interpreting Hypothesis Tests
- There are two hypotheses, the null ($H_0$) and the alternative ($H_a$). The research aim is usually to show that the data give significant evidence for the alternative hypothesis.
- Use the p-value to determine whether we can reject the null hypothesis ($H_0$).
- At this point we don't need to know the exact definition of the p-value or how to calculate it. Generally, the p-value is a measure of how consistent the data are with the null hypothesis. A small p-value (< 0.05) indicates that the data we obtained would be UNLIKELY if the null hypothesis were true.
- Decision rule:
  - If the p-value is < 0.05, we REJECT the null hypothesis and accept the alternative. We have a statistically significant result!
  - If the p-value is ≥ 0.05, we say that we do NOT have enough evidence in the data to reject the null hypothesis.
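The decision rule above can be written as a tiny function. This is a minimal sketch, assuming the 0.05 cutoff used on this slide; the function name `decide` and the parameter `alpha` are hypothetical names chosen for illustration. A p-value exactly equal to the cutoff is treated as not significant, because the rejection rule uses a strict "<".

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the slide's decision rule to a p-value (alpha = cutoff, assumed 0.05)."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < alpha:
        # Data are unlikely under H0: reject it in favor of Ha.
        return "Reject H0: statistically significant result"
    # Otherwise the data do not give enough evidence against H0.
    return "Fail to reject H0: not enough evidence"


print(decide(0.01))
print(decide(0.20))
```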
Second Part
Confidence Intervals for 1-Proportion
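Sample Proportion
- Mean of $\hat{p}$: $E(\hat{p}) = p$
- Standard deviation of $\hat{p}$: $s.d.(\hat{p}) = \sqrt{p(1-p)/n}$
- Standard error of $\hat{p}$: $s.e.(\hat{p}) = \sqrt{\hat{p}(1-\hat{p})/n}$
- If $np$ and $n(1-p)$ are both greater than or equal to 10, the sampling distribution of $\hat{p}$ is approximately normal with mean $p$ and standard deviation $\sqrt{p(1-p)/n}$, i.e. $\hat{p} \approx N\left(p, \sqrt{p(1-p)/n}\right)$.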
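The claims on this slide can be checked by simulation. The sketch below assumes a hypothetical population with $p = 0.3$ and samples of size $n = 200$ (so $np = 60$ and $n(1-p) = 140$ are both at least 10). It draws many samples, records each $\hat{p}$, and compares the mean and standard deviation of those values with $p$ and $\sqrt{p(1-p)/n}$. The helper `simulate_phats` is a made-up name for illustration.

```python
import math
import random
import statistics


def simulate_phats(p: float, n: int, reps: int, seed: int = 1) -> list[float]:
    """Draw `reps` samples of size n from a population with proportion p
    and return the sample proportion p-hat of each sample."""
    rng = random.Random(seed)
    phats = []
    for _ in range(reps):
        # Each draw is a "success" with probability p.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        phats.append(successes / n)
    return phats


p, n = 0.3, 200  # hypothetical population proportion and sample size
phats = simulate_phats(p, n, reps=5000)
print("mean of p-hats:", round(statistics.mean(phats), 4), "vs p =", p)
print("sd of p-hats:  ", round(statistics.stdev(phats), 4),
      "vs sqrt(p(1-p)/n) =", round(math.sqrt(p * (1 - p) / n), 4))
```

The simulated mean should land very close to 0.3 and the simulated standard deviation close to about 0.0324; the exact digits depend on the random seed.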
From Sampling Distributions to Confidence Intervals
- The sample proportion $\hat{p}$ will fall close to the true (unknown) proportion $p$.
- Thus, the true proportion is likely to be close to the observed sample proportion. How close?
- About 95% of the $\hat{p}$ values would be expected to fall within 2 standard deviations of the true proportion $p$.
- SO if we were to construct intervals extending 2 standard deviations on either side of the sample proportion, these intervals would contain the TRUE population proportion about 95% of the time!
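A simulation can illustrate the "about 95% of the time" claim. This sketch assumes a hypothetical true proportion $p = 0.4$ and samples of size $n = 300$. Unlike the slide, which uses the standard deviation computed from the true $p$, it uses the standard error $\sqrt{\hat{p}(1-\hat{p})/n}$, since $p$ is unknown in practice. The `coverage` function is a hypothetical helper.

```python
import math
import random


def coverage(p: float, n: int, reps: int, m: float = 2.0, seed: int = 2) -> float:
    """Fraction of intervals p-hat +/- m * SE(p-hat) that contain the true p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        phat = sum(1 for _ in range(n) if rng.random() < p) / n
        se = math.sqrt(phat * (1 - phat) / n)  # estimated SE uses p-hat, not p
        if phat - m * se <= p <= phat + m * se:
            hits += 1
    return hits / reps


print(coverage(p=0.4, n=300, reps=4000))
```

With a multiplier of 2 the printed fraction should come out close to 0.95; the exact value depends on the seed and on the chosen $n$ and $p$.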
Margin of Error C.I.
- $\hat{p}$ is an estimator of $p$, but it is not exactly equal to $p$.
- But how far is $\hat{p}$ from $p$? Or, how far is $p$ from $\hat{p}$?
- The Margin of Error is a measure of accuracy that provides a likely upper limit for the difference between $\hat{p}$ and $p$.
- In other words, this difference is almost always less than the Margin of Error, i.e. $|\hat{p} - p| < \text{Margin of Error}$.
- "Almost always" is translated as "with large probability".
- Usually we are talking about 90%, 95%, or 99% probability.
Margin of Error C.I., Cont.
- This probability is the confidence level.
- For example, a confidence level of 95% means that 95% of the time the difference between $\hat{p}$ and $p$ is less than the Margin of Error (e.g., we expect 38 out of 40 samples to give a $\hat{p}$ whose difference from $p$ is less than the Margin of Error).
- Example: Based on a sample of 1000 voters, the proportion of voters who favor candidate A is 34%, with a 3% Margin of Error at a 95% confidence level. What does this tell us?
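To see what the poll's numbers imply, the sketch below recomputes the margin of error from the standard-error formula $\sqrt{\hat{p}(1-\hat{p})/n}$ with the multiplier 2 that these slides use for 95% confidence, and prints the interval $34\% \pm 3\%$. The variable names are illustrative only.

```python
import math

n = 1000              # voters sampled
phat = 0.34           # sample proportion favoring candidate A
m = 2                 # multiplier used on these slides for 95% confidence
reported_moe = 0.03   # the poll's stated margin of error

se = math.sqrt(phat * (1 - phat) / n)
moe = m * se          # margin of error = multiplier times SE
print(f"SE = {se:.4f}, margin of error = {moe:.4f}")
print(f"interval: ({phat - reported_moe:.2f}, {phat + reported_moe:.2f})")
```

The computed margin of error is about 0.030, consistent with the reported 3%, and the interval it describes is (0.31, 0.37).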
Confidence Interval for 1-Proportion
- Conditions: we need to have $n\hat{p} \ge 10$ and $n(1-\hat{p}) \ge 10$.
  - Note that we are using $\hat{p}$ instead of $p$ here!
- CI for $p$:
  - $\hat{p} \pm M \times SE(\hat{p})$
  - $M$ = multiplier, which depends on the level of confidence desired. For a 95% CI the multiplier is 2.
  - $SE(\hat{p}) = \sqrt{\hat{p}(1-\hat{p})/n}$ is the standard error of the sample proportion.
  - Margin of Error = the multiplier times the SE, i.e. $M \times SE(\hat{p})$.
- Interpretation:
  - If $M = 2$, we are 95% confident that the true population proportion is contained within the confidence interval.
Example 1
- A sample of 1200 people is polled to determine the percentage that are in favor of candidate A. Suppose 580 say they are in favor. Construct a 95% CI for the true population proportion.
- Conclusion: We are 95% confident that the true population proportion of those who support candidate A is between 45.5% and 51.2%.
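The 1-proportion interval $\hat{p} \pm M \times SE(\hat{p})$ and its success/failure condition can be sketched as one function and checked against Example 1. This assumes $M = 2$ for 95% confidence, as on these slides; `one_prop_ci` is a hypothetical name.

```python
import math


def one_prop_ci(successes: int, n: int, m: float = 2.0) -> tuple[float, float]:
    """Confidence interval p-hat +/- m * SE(p-hat) for one proportion.

    Raises ValueError if the condition n*p-hat >= 10 and
    n*(1 - p-hat) >= 10 is not met.
    """
    phat = successes / n
    if n * phat < 10 or n * (1 - phat) < 10:
        raise ValueError("need at least 10 successes and 10 failures")
    se = math.sqrt(phat * (1 - phat) / n)  # standard error uses p-hat
    return phat - m * se, phat + m * se


# Example 1: 580 of 1200 in favor, 95% confidence (m = 2)
low, high = one_prop_ci(580, 1200)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

For Example 1 this gives roughly (0.454, 0.512), which agrees with the 45.5% to 51.2% on the slide up to rounding of intermediate values.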
Example 2
- 300 high-risk patients received an experimental AIDS vaccine. The patients were followed for a period of 5 years, and ultimately 53 came down with the virus. Assuming all patients were exposed to the virus, construct a 99% CI for the proportion of individuals protected.
- 99% CI: $\hat{p} \pm M \times SE(\hat{p})$
- $\hat{p} = 247/300 \approx .823$ (the $300 - 53 = 247$ patients who did not get the virus)
- $SE(\hat{p}) = \sqrt{.823(1-.823)/300} \approx .0220$
- $M = 2.58$
  - Can you see why $M = 2.58$ using the Normal table?
- So the 99% CI is $.823 \pm 2.58 \times .0220 \approx (.767, .880)$.
- We are 99% confident that the true proportion of those protected by the vaccine is between 76.7% and 88.0%.
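The multiplier question can also be answered in code: $M$ is the value $z$ such that 99% of the standard normal distribution lies between $-z$ and $z$. This sketch uses Python's `statistics.NormalDist().inv_cdf`, the inverse of the standard normal CDF, in place of a Normal table; the helper name `multiplier` is hypothetical.

```python
import math
from statistics import NormalDist


def multiplier(confidence: float) -> float:
    """Two-sided normal multiplier: z with `confidence` area between -z and z."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


n, infected = 300, 53
phat = (n - infected) / n              # proportion protected: 247/300
se = math.sqrt(phat * (1 - phat) / n)  # standard error of p-hat
m = multiplier(0.99)                   # about 2.576, rounded to 2.58 on the slide
low, high = phat - m * se, phat + m * se
print(f"M = {m:.3f}, p-hat = {phat:.3f}, SE = {se:.4f}, "
      f"99% CI = ({low:.3f}, {high:.3f})")
```

`multiplier(0.99)` is about 2.576, which rounds to 2.58, and `multiplier(0.95)` is about 1.96, which these slides round to 2. The resulting interval is approximately (0.767, 0.880).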
Width of a Confidence Interval is affected by
- $n$: as the sample size increases, the standard error of $\hat{p}$ decreases and the confidence interval gets narrower. So a larger sample size gives us a more precise estimate of $p$.
- $M$: as the confidence level increases, the multiplier $M$ increases, leading to a wider confidence interval.
- So, if we want to control the width of the C.I., we can adjust the confidence level or the sample size...
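The two effects on the width can be tabulated. This sketch holds a hypothetical sample proportion $\hat{p} = 0.5$ fixed, uses exact normal multipliers from `statistics.NormalDist` (about 1.645, 1.96 and 2.576 rather than rounded values), and prints the full width $2 \times M \times SE(\hat{p})$ for several sample sizes and confidence levels. `ci_width` is a hypothetical helper.

```python
import math
from statistics import NormalDist


def ci_width(phat: float, n: int, confidence: float) -> float:
    """Full width 2 * M * SE(p-hat) of a one-proportion confidence interval."""
    m = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided multiplier
    return 2 * m * math.sqrt(phat * (1 - phat) / n)


phat = 0.5  # hypothetical sample proportion held fixed
for n in (100, 400, 1600):
    widths = [ci_width(phat, n, c) for c in (0.90, 0.95, 0.99)]
    print(n, [round(w, 3) for w in widths])
```

Each row is half as wide as the one before, because $n$ is quadrupled and the SE is proportional to $1/\sqrt{n}$; within a row, the width grows with the confidence level.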