Fundamentals of Hypothesis Testing - PowerPoint PPT Presentation

Slides: 31
Provided by: suny73

1
Fundamentals of Hypothesis Testing
  • DS 101 Spring 2009

2
Central Limit Theorem
  • The Central Limit Theorem
  • Version 1: The sum of many independent random
    variables is (approximately) normally distributed.
  • Version 2: The sampling distribution of the
    sample mean can be approximated by a normal
    distribution when the sample size is large, say,
    n > 30.
  • Linear combinations of normally and independently
    distributed random variables are normally
    distributed.
  • Note: whenever the population has a normal
    distribution, the sampling distribution of the
    sample mean is normally distributed for any
    sample size.

3
More about sampling
  • If we consider the process of selecting a simple
    random sample (sampling) as an experiment, a
    sample mean can be calculated for each
    experiment. Different experiment runs will most
    likely have different sample means. Thus, the
    sample mean (x-bar) is a random variable. As a
    result, just like any random variable, x-bar has
    a mean, a variance, and a probability
    distribution. The probability distribution of
    the sample mean is called its sampling distribution.
  • According to the Central Limit Theorem, with a
    sufficient sample size, the sampling distribution
    of the mean approximately follows a normal
    distribution.
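
The idea that x-bar is itself a random variable can be checked with a small simulation. The sketch below is illustrative only: the exponential population, the sample size of 50, the 5000 runs, and the function name are assumptions chosen for the demonstration and are not taken from the slides.

```python
import random
import statistics

def simulate_sample_means(n, runs, rng):
    """Draw `runs` samples of size n from an exponential population
    (mean 1, s.d. 1) and return the list of sample means."""
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(runs)
    ]

rng = random.Random(0)  # fixed seed so the run is repeatable
means = simulate_sample_means(n=50, runs=5000, rng=rng)

# The sample means cluster around the population mean (1.0), and their
# spread is close to sigma / sqrt(n) = 1 / sqrt(50), about 0.141.
mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)
print(round(mean_of_means, 3), round(sd_of_means, 3))
```

Even though the exponential population is strongly skewed, a histogram of `means` would look roughly bell-shaped, which is what Version 2 of the theorem describes.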

4
Interval estimation for parameters
  • We can build a confidence interval -- say, at the
    95% confidence level -- for the MEAN.
  • It means that, if you repeat the same experiment,
    i.e., draw a sample of the same size from the
    same population, 100 times and build an interval
    each time, you expect about 95 of those intervals
    to contain the true population mean.
  • Note: this is NOT an interval that contains 95%
    of the individual values in the population.
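
The repeated-sampling reading of a confidence level can be illustrated by simulation. This is a rough sketch under stated assumptions: a normal population with a hypothetical mean of 50 and s.d. of 10, samples of size 40, the population s.d. treated as known, and a multiplier of 1.96 for the 95% level. None of these numbers come from the slides.

```python
import random
import statistics

def coverage(mu, sigma, n, runs, rng, z=1.96):
    """Fraction of intervals x-bar +/- z * sigma / sqrt(n) that contain mu."""
    se = sigma / n ** 0.5
    hits = 0
    for _ in range(runs):
        xbar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - z * se <= mu <= xbar + z * se:
            hits += 1
    return hits / runs

# Hypothetical population: mean 50, s.d. 10; 2000 repeated experiments.
rate = coverage(mu=50.0, sigma=10.0, n=40, runs=2000, rng=random.Random(1))
print(rate)  # expected to land near 0.95
```

The printed fraction is the share of experiments whose interval captured the true mean; it says nothing about where individual population values fall.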

5
Interval Estimation using EXCEL
  • Interval = Mean Estimate ± Marginal Error at the
    corresponding confidence level
  • At a 95% confidence level, Marginal Error ≈ 2 ×
    Standard Error
  • (The standard deviation of the sampling
    distribution of the mean has a specific name:
    Standard Error)
  • Excel: Data Analysis → Descriptive Statistics
  • In the dialog, check the Confidence Level for
    Mean box, and modify the confidence level if you
    wish.

6
95% CI for the Mean
  • 95% Confidence Interval (95% CI)
  • x-bar ± 1.96 × S.E.
  • x-bar ± 2 × S.E. (approximately)
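
As a rough illustration of the formula above, the sketch below estimates the standard error from the sample itself and applies the 1.96 multiplier. The data values are hypothetical placeholders, and the function name is an assumption. For small samples a t-based multiplier would be somewhat larger than 1.96; this sketch keeps the slide's large-sample approximation.

```python
import statistics

def ci95_from_sample(data):
    """Return (x-bar, estimated S.E., lower limit, upper limit)."""
    n = len(data)
    xbar = statistics.fmean(data)
    se = statistics.stdev(data) / n ** 0.5  # sample s.d. / sqrt(n)
    me = 1.96 * se                          # marginal error at 95%
    return xbar, se, xbar - me, xbar + me

# Hypothetical measurements (40 made-up values scattered around 10.0)
data = [10.0 + 0.01 * ((7 * i) % 11 - 5) for i in range(40)]

xbar, se, lo, hi = ci95_from_sample(data)
print(round(xbar, 4), round(se, 4), round(lo, 4), round(hi, 4))
```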

9
σ Unknown vs. σ Known
  • In the above case, the population s.d. σ is
    UNKNOWN and needs to be estimated from the data.
    The Standard Error and Marginal Error are also
    estimated from the current data.
  • In some other cases, the population s.d. σ may
    already be known, e.g., estimated from extensive
    historical data. The Standard Error can then be
    directly calculated from the given population σ.

10
σ Known
  • In some cases, the population s.d. σ is already
    known (e.g., from extensive historical data).
  • In these cases, the Standard Error can be
    calculated based on the given σ and the number of
    trials in the sample (sample size n).
  • The st.dev. of the MEAN distribution, σ_x-bar, can
    be calculated as follows: σ_x-bar = σ / √n
  • σ_x-bar is the Standard Error.

11
  • Note that the MEAN generally follows a normal
    distribution, N(µ, σ²/n). We want to find the
    corresponding lower and upper limits such that
    the area under the normal curve between the
    two limits covers 95%.

[Figure: Sampling Distribution of the MEAN, a normal curve centered at µ with 2.5% of the area in each tail]

12
  • Find the z-score that corresponds to a cumulative
    probability of 0.025
  • z = -1.96 ≈ -2
  • Find the z-score that corresponds to a cumulative
    probability of (1 - 0.025) = 0.975
  • z = 1.96 ≈ 2
  • We denote by Zα/2 the z-score corresponding to a
    cumulative probability of (1 - α/2) (i.e., an
    upper-tail probability of α/2), where α is the
    significance level and (1 - α) is the confidence
    level.
  • Z0.025 = 1.96 ≈ 2
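
The z-scores above can be looked up programmatically instead of from a table. A minimal sketch using the standard normal distribution from Python's standard library; the helper name `z_critical` is an assumption made for illustration.

```python
from statistics import NormalDist

def z_critical(alpha):
    """z-score whose upper-tail probability is alpha / 2
    under the standard normal distribution."""
    return NormalDist().inv_cdf(1 - alpha / 2)

lower = NormalDist().inv_cdf(0.025)  # cumulative probability 0.025
upper = z_critical(0.05)             # cumulative probability 0.975

print(round(lower, 2))  # -1.96
print(round(upper, 2))  # 1.96
```

Other confidence levels follow the same pattern, for example `z_critical(0.10)` for a 90% interval.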

13
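σ Known
  • The Marginal Error at the 95% confidence level
    with σ known is Z0.025 × σ / √n = 1.96 × σ / √n
  • The 95% Confidence Interval is x-bar ± 1.96 × σ / √n
  • or, from (x-bar - 1.96 × σ / √n) to
    (x-bar + 1.96 × σ / √n)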

14
Example: 95% C.I. w/ σ known
  • Paper length σ = 0.02 inch (known)
  • Sample size n = 100
  • Mean Estimate x-bar = 10.998 inches
  • Step 1: calculate the standard deviation of the
    sampling distribution of the mean (i.e., the
    standard error): σ / √n = 0.02 / √100 = 0.002
  • Step 2: calculate the marginal error at the 95%
    confidence level: 1.96 × 0.002 = 0.00392 ≈ 0.004
  • Step 3: the 95% confidence interval is
    10.998 ± 0.004, i.e., 10.994 to 11.002
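
The three steps can be sketched in a few lines of code. One assumption is needed: the sample mean is taken to be 10.998 inches, the midpoint of the interval 10.994 to 11.002 quoted on the next slide. The function name is also hypothetical.

```python
from math import sqrt

def ci_sigma_known(xbar, sigma, n, z=1.96):
    """Return (standard error, marginal error, (lower, upper))."""
    se = sigma / sqrt(n)  # Step 1: standard error
    me = z * se           # Step 2: marginal error
    return se, me, (xbar - me, xbar + me)  # Step 3: interval

# sigma = 0.02 and n = 100 come from the slide; xbar = 10.998 is assumed.
se, me, (lo, hi) = ci_sigma_known(xbar=10.998, sigma=0.02, n=100)
print(round(se, 4), round(me, 5), round(lo, 3), round(hi, 3))
```

With these inputs the standard error is 0.002, the marginal error is about 0.004, and the interval runs from roughly 10.994 to 11.002.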

15
  • What if we want to determine whether the mean
    paper length is equal to 11 inches? If not,
    something has gone wrong in the factory.
  • The mean is between 10.994 and 11.002 with 95%
    confidence. This interval includes 11. We have no
    reason to believe that anything is wrong.
  • In fact, we are testing the following hypotheses:
  • H0: µ = 11
  • H1: µ ≠ 11

16
Last year's exam question
  • The response time of a computer server is an
    important quality characteristic. It is known
    that the standard deviation of the response time
    to a specific command is 8 millisec. The system
    manager wants to know whether the mean response
    time is 75 millisec. In his test, the command is
    executed 25 times and the response time for each
    trial is recorded. The sample average response
    time is 79.25 millisec. Is the mean response time
    75 millisec?
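
One way to approach this question is to reuse the confidence-interval reasoning from the paper-length example: build a 95% interval around the sample average and check whether 75 falls inside it. The sketch below assumes the sample mean is approximately normally distributed; the function name is hypothetical and this is not presented as the official exam solution.

```python
from math import sqrt

def mean_is_plausible(mu0, xbar, sigma, n, z=1.96):
    """Check whether mu0 lies inside x-bar +/- z * sigma / sqrt(n)."""
    me = z * sigma / sqrt(n)
    lo, hi = xbar - me, xbar + me
    return lo <= mu0 <= hi, (lo, hi)

# Values from the question: sigma = 8, n = 25, sample average = 79.25
plausible, (lo, hi) = mean_is_plausible(mu0=75, xbar=79.25, sigma=8, n=25)
print(plausible, round(lo, 3), round(hi, 3))
```

The interval is about 76.114 to 82.386, which does not include 75, so under these assumptions the data do not support a mean response time of 75 millisec.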

17
Hypothesis Testing
18
  • H0 is called the null hypothesis.
  • Correspondingly, H1 is called the alternative
    hypothesis.
  • (Sometimes we write H0 and Ha, a for
    alternative).
  • The analysis is built upon a hypothetical normal
    distribution based on H0.
  • If there is significant statistical evidence that
    H0 is not true, we reject H0.
  • Otherwise, we do not reject H0.

19
Example
  • As a manager of a fast-food restaurant, you want
    to determine whether the mean waiting time to
    place an order has changed in the past month from
    its previous value of 4.5 minutes.
  • H0: µ = 4.5 min
  • H1: µ ≠ 4.5 min

20
  • Suppose you wish to determine whether the mean
    freezing point of milk is less than or equal to
    -0.545 °C.
  • H0: µ ≤ -0.545 °C
  • H1: µ > -0.545 °C

21
  • We wish to determine whether the fraction
    nonconforming from a manufacturing process
    is 10%.
  • H0: p = 0.1
  • H1: p ≠ 0.1

22
  • The output voltage of a power supply is normally
    distributed. We wish to determine whether the
    variance of the output voltage is equal to 1 V².
  • H0: σ² = 1
  • H1: σ² ≠ 1

23
  • You should be able to formulate your null and
    alternative hypotheses appropriately for decision
    making purposes.

24
  • Typical hypothesis tests for the mean, the
    variance, and the population proportion
    (the three columns are also called the two-tailed,
    upper one-tailed, and lower one-tailed tests):

                 Two-Sided Test   Upper One-Sided Test   Lower One-Sided Test
    Mean         H0: µ = a        H0: µ ≤ a              H0: µ ≥ a
                 H1: µ ≠ a        H1: µ > a              H1: µ < a
    Variance     H0: σ² = b²      H0: σ² ≤ b²            H0: σ² ≥ b²
                 H1: σ² ≠ b²      H1: σ² > b²            H1: σ² < b²
    Proportion   H0: p = c        H0: p ≤ c              H0: p ≥ c
                 H1: p ≠ c        H1: p > c              H1: p < c

25
Statistically Significant?
  • Note that H0 is never formally accepted. We
    first assume that H0 is true, then perform the
    test to determine whether the observed data can
    be reasonably explained by this assumption. If
    not, we have statistically significant evidence
    that H0 should be rejected. Thus, a hypothesis
    test is also called a significance test.

27
  • When a hypothesis test is viewed as a decision
    procedure, two types of error are possible.

28
  • P(Type I error) = P(Reject H0 | H0 is true) =
    producer's risk
  • P(Type II error) = P(Fail to reject H0 | H0 is
    false) = consumer's risk
  • The power of the test = 1 - P(Type II error) =
    P(Reject H0 | H0 is false). This is the
    probability that the test correctly rejects H0.
  • The level of significance α is the upper bound on
    P(Type I error). The test is required to
    satisfy
  • P(Type I error) = P(Reject H0 | H0 is true) ≤ α
  • We typically set α = 0.05.
  • The p-value is the smallest level of significance
    that would lead to the rejection of H0. We
    reject H0 if p-value ≤ α.
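
To make the two error probabilities concrete, the sketch below computes the rejection probability of a two-sided z-test for a given true mean. It borrows σ = 8, n = 25, and the hypothesized mean of 75 from the server response-time question; the alternative true mean of 79 is a hypothetical value chosen only for illustration, as is the function name.

```python
from math import sqrt
from statistics import NormalDist

def power_two_sided_z(mu_true, mu0, sigma, n, alpha=0.05):
    """P(reject H0: mu = mu0) for a two-sided z-test when the
    true population mean is mu_true and sigma is known."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Distance between the true and hypothesized means, in standard errors
    shift = (mu_true - mu0) / (sigma / sqrt(n))
    # The test statistic is N(shift, 1); reject if it falls beyond +/- z_crit
    upper_tail = 1 - std_normal.cdf(z_crit - shift)
    lower_tail = std_normal.cdf(-z_crit - shift)
    return upper_tail + lower_tail

# When H0 is true, the rejection probability is the Type I error rate (alpha).
type_one = power_two_sided_z(mu_true=75, mu0=75, sigma=8, n=25)

# Hypothetical alternative: the true mean is 79 millisec.
power_at_79 = power_two_sided_z(mu_true=79, mu0=75, sigma=8, n=25)
type_two_at_79 = 1 - power_at_79
print(round(type_one, 3), round(power_at_79, 3), round(type_two_at_79, 3))
```

Increasing n or moving the true mean farther from 75 raises the power and lowers the Type II error probability, while the Type I error rate stays fixed at α.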

29
  • Hypothesis about a population mean, σ known:
    z-test
  • Hypothesis about a population mean, σ unknown:
    t-test
  • Hypothesis about a population variance:
    Chi-square test
  • Hypothesis about a population proportion: z-test
  • Testing Method: calculate the test statistic and
    compare it with the critical value(s)
    corresponding to the test.
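
The critical-value method for the z-test (σ known) might look like the following sketch, applied to the server response-time question from earlier in the deck (σ = 8, n = 25, sample average 79.25, hypothesized mean 75). The two-sided alternative and α = 0.05 are assumptions, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_test_two_sided(xbar, mu0, sigma, n, alpha=0.05):
    """Return (test statistic, critical value, reject H0?)."""
    z = (xbar - mu0) / (sigma / sqrt(n))          # test statistic
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return z, z_crit, abs(z) > z_crit

z, z_crit, reject = z_test_two_sided(xbar=79.25, mu0=75, sigma=8, n=25)
print(round(z, 3), round(z_crit, 2), reject)
```

Here the test statistic is about 2.656, which exceeds the critical value of about 1.96, so H0: µ = 75 is rejected at the 5% level.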

30
  • As an alternative, we can always use the
    p-value approach.
  • The p-value is the smallest level of significance
    that would lead to a rejection of H0.
  • We reject H0 if p-value ≤ α.
  • We often use α = 0.05.
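
A minimal sketch of the p-value approach for a z-test with σ known, again using the server response-time numbers (σ = 8, n = 25, sample average 79.25, hypothesized mean 75). The `tail` parameter and its three labels are hypothetical names introduced here to mirror the two-sided, upper one-sided, and lower one-sided tests.

```python
from math import sqrt
from statistics import NormalDist

def z_test_p_value(xbar, mu0, sigma, n, tail="two-sided"):
    """p-value of a z-test for the mean with known sigma."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    if tail == "two-sided":  # H1: mu != mu0
        return 2 * (1 - std_normal.cdf(abs(z)))
    if tail == "upper":      # H1: mu > mu0
        return 1 - std_normal.cdf(z)
    if tail == "lower":      # H1: mu < mu0
        return std_normal.cdf(z)
    raise ValueError("tail must be 'two-sided', 'upper', or 'lower'")

alpha = 0.05
p_two = z_test_p_value(xbar=79.25, mu0=75, sigma=8, n=25)
p_upper = z_test_p_value(xbar=79.25, mu0=75, sigma=8, n=25, tail="upper")
print(round(p_two, 4), p_two <= alpha)
```

The two-sided p-value is roughly 0.008, well below 0.05, so this approach reaches the same decision as comparing the test statistic with the critical value.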

31
Questions?