Statistics - PowerPoint PPT Presentation

About This Presentation
Title:

Statistics

Description:

The basic strategy of hypothesis testing is to try to support a research ... Tests of the proportion may be done by using a normal ... Testing Hypotheses ... – PowerPoint PPT presentation

Slides: 42
Provided by: sandyb2
Transcript and Presenter's Notes

Title: Statistics


1
Statistics Data Analysis
  • Course Number B01.1305
  • Course Section 31
  • Meeting Time Wednesday 6-8:50 pm

Hypothesis Testing
2
Class Outline
  • Review of midterm exam
  • Hypothesis Testing
  • One-sample tests
  • Two-sample tests
  • P-values
  • Relationship with Confidence Intervals

3
Review of Last Class
  • Statistical Inference
  • Point Estimation
  • Confidence Intervals

4
Reminder: Statistical Inference
  • Problem of Inferential Statistics
  • Make inferences about one or more population
    parameters based on observable sample data
  • Forms of Inference
  • Point estimation: single best guess regarding a
    population parameter
  • Interval estimation: specifies a reasonable
    range for the value of the parameter
  • Hypothesis testing: isolating a particular
    possible value for the parameter and testing if
    this value is plausible given the available data

5
Point Estimators
  • Computing a single statistic from the sample data
    to estimate a population parameter
  • Choosing a point estimator
  • What is the shape of the distribution?
  • Do you suspect outliers exist?
  • Plausible choices
  • Mean
  • Median
  • Mode
  • Trimmed Mean

6
Confidence Intervals
  • Specification of a probable range for a
    parameter
  • Used to understand how statistics may vary from
    sample to sample
  • States explicit allowance for random sampling
    error (not selection biases)
  • We have 95% confidence that the population
    parameter falls within the bounds of the interval
  • Or: the interval is the result of a process that
    in the long run has a 95% probability of being
    correct

7
Hypothesis Testing
  • Chapter 8

8
Overview
  • A research hypothesis typically states that there
    is a real change, a real difference, or real
    effect in the underlying population or process.
    The opposite, the null hypothesis, then states
    that there is no real change, difference, or
    effect
  • The basic strategy of hypothesis testing is to
    try to support a research hypothesis by showing
    that the sample results are highly unlikely,
    assuming the null hypothesis, and more likely,
    assuming the research hypothesis
  • The strategy can be implemented in equivalent
    ways: by creating a formal rejection region, by
    obtaining a p-value, or by checking whether
    the null hypothesis value falls within a
    confidence interval
  • There are risks of false positive and false
    negative errors
  • Tests of a mean usually are based on the
    t-distribution
  • Tests of the proportion may be done by using a
    normal approximation

9
Overview
  • Very often sample data will suggest that
    something relevant is happening in the underlying
    population
  • A sample of potential customers may show that a
    higher proportion prefer a new brand to the
    existing one
  • A sampling of telephone response time by
    reservation clerks may show an increase in
    average customer waiting time
  • A sample of the service times may indicate
    customers are receiving poorer service than the
    company thinks it is providing
  • The question is whether the apparent effect in
    the sample is an indication of something
    happening in the underlying population or
    whether the apparent effect is merely a fluke

10
What is Hypothesis Testing
  • Method for checking whether an apparent result
    from a sample could possibly be due to
    randomness
  • Checks on how strong the evidence is
  • Are sample data reflecting a real effect or
    random fluke?
  • Results of a hypothesis test indicate how good
    the evidence is, not how important the result is

11
Motivating Case Study 1
  • FCC has been receiving complaints from customers
    ordering new telephone service
  • Big telecommunications company tells the FCC that
    the average time a new customer has to wait for
    new service installation is 72 hours (excluding
    weekends) with a standard deviation of 24 hours
  • The FCC randomly samples 100 new customers from
    the telecom company and asks how long each had to
    wait for new service installation

12
Testing Hypotheses
  • Research Hypothesis, or Alternative Hypothesis, is
    what the researcher is trying to prove
  • Denoted Ha
  • Null Hypothesis is the denial of the research
    hypothesis. It is what the researcher is trying to disprove
  • Denoted H0

13
Hypothesis Testing Components
  • Define research hypothesis direction
  • One-sided (< or >)
  • Two-sided (≠)
  • Strategy is to attempt to support the research
    hypothesis by contradicting the null hypothesis
  • The null hypothesis is contradicted if, when
    assuming it is true, the sample data are highly
    unlikely, and more likely given the research
    hypothesis
  • Test statistic: summary of the sample data

14
Basic Logic
  • Assume that H0: μ = 72 is true
  • Calculate the value of the test statistic
  • Sample mean, proportion, etc.
  • If this value is highly unlikely, reject H0 and
    support Ha
  • We can use the sampling distribution to determine
    what values of the test statistic are
    sufficiently unlikely given the null hypothesis

15
Rejection Region
  • Specification of the rejection region must
    recognize the possibility of error
  • Type I Error: rejecting the null hypothesis when
    in fact it is true
  • In establishing a rejection region, we must
    specify the maximum tolerable probability of this
    type of error (denoted α)
  • Type II Error: failing to reject the null
    hypothesis when in fact it is false (beyond
    scope)
  • Rejection region can be based on sampling
    distribution of the sample statistic
  • Remember, we want to reject the null hypothesis
    if the value of the test statistic is highly
    unlikely assuming H0 is true
  • Can use the tails of a normal distribution

16
Rejection Region
17
Rejection Region (cont)
  • To determine whether or not to reject the null
    hypothesis, we can compute the number of standard
    errors the sample statistic lies above the
    assumed population mean
  • This is done by computing a z-statistic for the
    sample mean

18
Rejection Region (cont)
19
Example
  • The FCC sample of 100 randomly selected new
    service customers resulted in a mean of 80 hours.
  • Set up the hypothesis test
  • Calculate the test statistic
  • Interpret the hypothesis

20
Example
  • A researcher claims that the amount of time urban
    preschool children age 3-5 watch television has a
    mean of 22.6 hours and a standard deviation of
    6.1 hours.
  • A market research firm believes this is too low
  • The television habits of a random sample of 60
    urban preschool children are measured and
    resulted in the following
  • Sample mean 25.2
  • Should the researcher's claim be rejected at an α
    value of 0.01?
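
One way to work this example in Python is sketched below. It treats the market research firm's belief ("too low") as the one-sided research hypothesis Ha: μ > 22.6, which is an interpretation of the slide rather than something it states in symbols.

```python
from math import sqrt
from statistics import NormalDist

mu_0, sigma = 22.6, 6.1   # researcher's claimed mean and standard deviation
n, x_bar = 60, 25.2       # sample size and observed sample mean
alpha = 0.01

z = (x_bar - mu_0) / (sigma / sqrt(n))
z_critical = NormalDist().inv_cdf(1 - alpha)  # one-sided, upper tail
reject_h0 = z > z_critical

print(f"z = {z:.3f}, critical value = {z_critical:.3f}, reject H0: {reject_h0}")
```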

21
Summary for Z Test with σ Known
22
Example
  • A researcher claims that the amount of time urban
    preschool children age 3-5 watch television has a
    mean of 22.6 hours and a standard deviation of
    6.1 hours.
  • A market research firm believes this is
    incorrect, but does not know in which direction
  • The television habits of a random sample of 60
    urban preschool children are measured and
    resulted in the following
  • Sample mean 25.2
  • Should the researcher's claim be rejected at an α
    value of 0.01?
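
Because the direction is now unknown, a two-sided test splits α between both tails. The sketch below shows how the cutoff and the p-value change under that reading (Ha: μ ≠ 22.6); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu_0, sigma = 22.6, 6.1
n, x_bar = 60, 25.2
alpha = 0.01

z = (x_bar - mu_0) / (sigma / sqrt(n))
z_critical = NormalDist().inv_cdf(1 - alpha / 2)  # alpha/2 in each tail
p_value = 2 * NormalDist().cdf(-abs(z))           # both tails counted
reject_h0 = abs(z) > z_critical

print(f"|z| = {abs(z):.3f}, critical value = {z_critical:.3f}")
print(f"two-sided p-value = {p_value:.5f}, reject H0: {reject_h0}")
```

The test statistic is the same as in a one-sided test; only the cutoff (2.576 rather than 2.326) and the doubling of the tail area differ.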

23
Z-values Worth Remembering
z0.05 = 1.645, z0.025 = 1.96, z0.01 = 2.326, z0.005 = 2.576
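
These values need not be memorized if software is at hand. As a small sketch, Python's standard library can recover them from the inverse CDF of the standard normal distribution, where the subscript is the upper-tail area:

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1
upper_tail_areas = [0.05, 0.025, 0.01, 0.005]

# z_a is the point with area a to its right under the standard normal curve
critical_values = {a: std_normal.inv_cdf(1 - a) for a in upper_tail_areas}

for area, z in critical_values.items():
    print(f"z_{area} = {z:.3f}")
```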
24
P-Value
  • Probability of a test statistic value equal to or
    more extreme than the actual observed value
  • Recall basic strategy
  • Hope to support the research hypothesis and
    reject the null hypothesis by showing that the
    data are highly unlikely assuming that the null
    hypothesis is true
  • As the test statistic gets farther into the
    rejection region, the data become more unlikely,
    hence the weight of evidence against the null
    hypothesis becomes more conclusive and the p-value
    becomes smaller

25
P-Value (cont)
  • Small p-values indicate strong, conclusive
    evidence for rejecting the null hypothesis
  • Computation is straightforward in our z-test
    example
  • Compute the p-value for our telecom example

26
P-Value (cont)
  • P-value is also referred to as attained level of
    significance
  • Results of a test are said to be statistically
    significant at the specified p-value
  • Statistically significant says the difference
    between what is observed and what is assumed
    correct is most likely not due to random
    variation
  • It DOES NOT MEAN the difference is important!
  • It DOES NOT tell you that the difference is
    meaningful from business perspective (practical
    significance)
  • With a large enough sample size, any difference can
    become statistically significant

27
P-Value for a z Test
28
Hypothesis Testing with the t Distribution
  • Population standard deviation is rarely known
  • Basic ideas of hypothesis testing are not
    changed, we simply switch sampling distributions

29
T Test for Hypotheses about μ
30
Example
  • Airline institutes a snake system waiting line
    at its counters to try to reduce the average
    waiting time
  • Mean waiting time under specific conditions with
    the previous system was 6.1.
  • A sample of 14 waiting times is taken
  • Sample mean 5.043
  • Standard deviation 2.266
  • Test the null hypothesis of no change against an
    appropriate research hypothesis using α = 0.10.
  • Calculate the rejection region
  • Calculate the t-statistic
  • Perform and interpret the hypothesis test
  • Calculate the associated p-value
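
The steps above can be sketched in Python as follows. The research hypothesis is taken to be Ha: μ < 6.1 (the new system reduces waiting time), which is an interpretation of the slide. The two critical values are the usual t-table entries for 13 degrees of freedom, hard-coded here because the Python standard library has no t distribution; they let the p-value be bracketed the way one would by hand.

```python
from math import sqrt

mu_0 = 6.1                  # mean waiting time under the previous system
n, x_bar, s = 14, 5.043, 2.266
alpha = 0.10
df = n - 1

# Upper-tail t-table entries for df = 13 (hard-coded table values)
t_table_010 = 1.350         # tail area 0.10
t_table_005 = 1.771         # tail area 0.05

t_stat = (x_bar - mu_0) / (s / sqrt(n))

# Lower-tailed test: reject H0 when t falls below -t(0.10, 13)
reject_h0 = t_stat < -t_table_010

# |t| lies between the two table entries, so 0.05 < p-value < 0.10
p_value_between_005_and_010 = t_table_010 < abs(t_stat) < t_table_005

print(f"t = {t_stat:.3f} with {df} df, rejection region: t < {-t_table_010}")
print(f"reject H0: {reject_h0}, 0.05 < p < 0.10: {p_value_between_005_and_010}")
```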

31
Example
  • Performance based benefits are a way of giving
    employees more of a stake in their work
  • A study was conducted to find out how managers of
    343 firms view the effectiveness of various kinds
    of employee relations programs
  • Each rated the effect of employee stock ownership
    on product quality using a scale from -2 (large
    negative effect) to 2 (large positive effect).
  • Sample Mean 0.35
  • Standard Error 0.14
  • Do managers view employee stock ownership as a
    worthwhile technique?
  • Create a 95% confidence interval for the
    population parameter
  • Perform a hypothesis test that the population
    mean isn't equal to zero
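
A rough sketch of both tasks follows. With 343 firms the t distribution is very close to the standard normal, so the normal critical value is used here as an approximation; that substitution is an assumption of this sketch, not something the slide specifies.

```python
from statistics import NormalDist

sample_mean = 0.35
standard_error = 0.14
mu_0 = 0.0            # null value: no perceived effect
confidence = 0.95

# Normal approximation to the t critical value (n = 343 is large)
z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

lower = sample_mean - z_crit * standard_error
upper = sample_mean + z_crit * standard_error

test_statistic = (sample_mean - mu_0) / standard_error
reject_h0 = abs(test_statistic) > z_crit

print(f"95% CI: ({lower:.3f}, {upper:.3f})")
print(f"test statistic = {test_statistic:.2f}, reject H0: {reject_h0}")
```

The interval excludes zero and the test rejects H0, which is the same conclusion reached two ways.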

32
Example
  • To help your restaurant marketing campaign target
    the right age levels, you want to find out if
    there is a statistically significant difference,
    on the average, between the age of your customers
    and the age of the general population in town,
    which is 43.1 years.
  • A random sample of 50 customers shows an average
    of 33.6 years with a standard deviation of 16.2
    years
  • Perform a two-sided test at the 1% significance
    level
  • What is the p-value?
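
Since the Python standard library has no t distribution, the sketch below obtains the p-value by numerically integrating the t density with Simpson's rule. The helper names `t_pdf` and `t_upper_tail` are hypothetical, written for this illustration; a statistics package would normally supply the equivalent.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t_abs, df, steps=2000):
    """P(T > t_abs) for t_abs >= 0, using Simpson's rule on [0, t_abs]."""
    h = t_abs / steps                      # steps must be even
    total = t_pdf(0.0, df) + t_pdf(t_abs, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3             # half the mass lies above zero

mu_0 = 43.1                 # mean age of the general population in town
n, x_bar, s = 50, 33.6, 16.2
alpha = 0.01

t_stat = (x_bar - mu_0) / (s / sqrt(n))
p_value = 2 * t_upper_tail(abs(t_stat), n - 1)   # two-sided
reject_h0 = p_value < alpha

print(f"t = {t_stat:.3f} with {n - 1} df")
print(f"two-sided p-value = {p_value:.6f}, reject H0: {reject_h0}")
```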

33
t-Test Assumptions
  • Hypothesis tests allow for random variation, but
    not for bias
  • Measurements are statistically independent
  • Underlying population distribution should be
    symmetric
  • Skewness affects p-value

34
Hypothesis Testing a Proportion
  • We can also perform hypothesis tests for
    proportions / percentages by using a normal
    approximation to the binomial distribution

35
Testing a Population Proportion
36
Example
  • A company figures out that the launch of their
    new product will only be successful if more than
    23% of consumers try the product
  • Based on a pilot study of 205 consumers,
    you expect 44.1% of consumers to try it
  • How sure are you that the percentage of people
    who will try the new product is above the
    break-even point of 23%?
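
One possible way to answer this with the normal approximation is sketched below. It tests H0: p = 0.23 against Ha: p > 0.23 and uses the null value in the standard error, which is a common convention assumed here because the formula slide did not survive in this transcript.

```python
from math import sqrt
from statistics import NormalDist

p_0 = 0.23      # break-even proportion under H0
p_hat = 0.441   # proportion observed in the pilot study
n = 205

# The normal approximation is usually considered adequate when both
# n * p_0 and n * (1 - p_0) are at least 5
approximation_ok = n * p_0 >= 5 and n * (1 - p_0) >= 5

standard_error = sqrt(p_0 * (1 - p_0) / n)
z = (p_hat - p_0) / standard_error
p_value = NormalDist().cdf(-z)   # upper-tail area P(Z >= z)

print(f"normal approximation reasonable: {approximation_ok}")
print(f"z = {z:.2f}, one-sided p-value = {p_value:.2e}")
```

A z value this large leaves essentially no room for the observed 44.1% to be a fluke if the true proportion were only 23%.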

37
Using A Confidence Interval
  • Construct a confidence interval (say at 95%
    confidence) in the usual way
  • If μ0 is outside the interval, it is not a
    reasonable value for the population parameter and
    you reject the null hypothesis
  • Why does this work?
  • Confidence interval says that the probability
    that the population parameter is in the random
    confidence interval is 0.95.
  • If the null hypothesis is true, then the
    probability that μ0 is in the interval is also
    95%
  • When the null is true, you will make the correct
    decision in 95% of all cases
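
The long-run claim can be checked with a small simulation. The sketch below draws many samples from a population in which the null hypothesis is true and counts how often the 95% interval contains μ0. The population values borrow the telecom figures (mean 72, standard deviation 24); the sample size, number of trials, and seed are arbitrary choices for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(0)                     # fixed seed so the run is repeatable

mu_0, sigma = 72.0, 24.0           # the null hypothesis is true by construction
n, trials = 25, 2000               # hypothetical sample size and repetitions

z_crit = NormalDist().inv_cdf(0.975)
margin = z_crit * sigma / sqrt(n)  # half-width of the 95% z-interval

covered = 0
for _ in range(trials):
    sample = [random.gauss(mu_0, sigma) for _ in range(n)]
    x_bar = sum(sample) / n
    if x_bar - margin <= mu_0 <= x_bar + margin:
        covered += 1               # interval contains mu_0: H0 is not rejected

coverage = covered / trials
print(f"share of intervals containing mu_0: {coverage:.3f}")
```

The share should land near 0.95, which means H0 is wrongly rejected in roughly 5% of samples, matching a test with α = 0.05.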

38
R Tutorial on Hypothesis Testing
39
Testing Two Samples
  • Can test whether two samples are significantly
    different or not, on the average
  • Unpaired test: tests whether two independent
    columns of numbers are different
  • Paired test: tests whether two columns of numbers
    are different when there is a natural pairing
    between them
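
To illustrate the difference, the sketch below computes both test statistics on the same two columns. The data are made up for illustration, and the unpaired statistic uses the unequal-variance (Welch) standard error, which is one common choice and not necessarily the one used in the course tutorial.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical measurements on the same six subjects
before = [10, 12, 9, 14, 11, 13]
after = [11, 14, 10, 15, 13, 14]
n = len(before)

# Unpaired: treat the columns as independent samples
se_unpaired = sqrt(variance(before) / n + variance(after) / n)
t_unpaired = (mean(after) - mean(before)) / se_unpaired

# Paired: reduce to one column of differences, then a one-sample t statistic
differences = [a - b for a, b in zip(after, before)]
se_paired = stdev(differences) / sqrt(n)
t_paired = mean(differences) / se_paired

print(f"unpaired t = {t_unpaired:.3f}")
print(f"paired t   = {t_paired:.3f}")
```

In this made-up data the differences are very consistent from subject to subject, so the paired statistic is much larger: pairing removes the subject-to-subject variation that inflates the unpaired standard error.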

40
R Tutorial on Two Sample Hypothesis Testing
41
Next Time
  • Regression Analysis