Inference about the difference of statistical analysis - PowerPoint PPT Presentation

Provided by: JJ16
1
Chapter 9
  • Inference about the difference of statistical
    analysis

2
Sec. 9.1 Introduction
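  • Experimental design: the procedure used to choose
    and assign subjects to the two groups.
  • Two types of design for comparative experiments:
  • Independent samples design: the sample assigned to group 1 is selected
    independently of the sample assigned to group 2.
  • Matched (paired) design: each observation in one group is paired with
    an observation in the other group, e.g., the same subject measured twice.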

3
Figure: samples from population 1 and samples from population 2.
1a. All observations in group 1 are
independent of each other. This is ok.
1b. All observations in group 2 are
independent of each other. This is ok.
2. An observation in one group must be
independent of an observation in the opposing
group.
We also need the measurements in each group to
come from a normal distribution. This is probably ok.
4
Example 9.1
  • A survey of 436 workers showed that 192 of them
    said that it was seriously unethical to monitor
    employee e-mail. When 121 senior-level bosses
    were surveyed, 40 said that it was seriously
    unethical to monitor employee e-mail.
  • Let $\pi_W$ and $\pi_B$ be the population proportions of
    workers and bosses, respectively, who feel it is unethical to
    monitor e-mail.

5
Example 9.2
  • A sample of 5 students was selected to take an
    SAT preparatory course. They took the SAT exam
    before they took the course and then took it
    again after the course.
  • Student:     A    B    C    D    E
  • SAT before: 700  840  830  860  840
  • SAT after:  720  840  820  900  870
  • Let $\mu_B$ denote the mean SAT score before the
    course and $\mu_A$ the mean SAT score after the course.
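The slides only define $\mu_B$ and $\mu_A$ at this point. In a matched design the usual analysis works with the per-student differences, so the sketch below (an illustration added here, not a step from the slides) computes those differences and the paired t statistic. Under $H_0: \mu_A - \mu_B = 0$ that statistic would be compared with t(4).

```python
import math

# Example 9.2 data: SAT scores for students A-E
before = [700, 840, 830, 860, 840]
after = [720, 840, 820, 900, 870]

# Matched design: work with the within-student differences d = after - before
diffs = [a - b for a, b in zip(after, before)]

n = len(diffs)
d_bar = sum(diffs) / n                                           # mean difference
s_d = math.sqrt(sum((d - d_bar) ** 2 for d in diffs) / (n - 1))  # sample SD of the differences
se = s_d / math.sqrt(n)                                          # standard error of d_bar
t_stat = d_bar / se                                              # paired t statistic, df = n - 1

print(diffs)   # [20, 0, -10, 40, 30]
print(d_bar)   # 16.0
print(round(s_d, 2), round(t_stat, 2))
```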

6
Sec. 9.2 Inference about the difference between two
population proportions
7
Example 9.1 (continued): Find an 80% CI for $\pi_W - \pi_B$.
  • 1. First find a point estimate of this difference: $p_W - p_B$,
    where $p_W$ and $p_B$ are the sample proportions.

8
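  • 2. The standard error of $p_W - p_B$ is estimated by
    $SE = \sqrt{\frac{p_W(1-p_W)}{n_W} + \frac{p_B(1-p_B)}{n_B}}$.
  • $p_W = 192/436 = 0.4403$ and
  • $p_B = 40/121 = 0.3305$, respectively. This gives a
    standard error of $p_W - p_B$ of about 0.0489.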

9
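  • 3. An 80% CI for $\pi_W - \pi_B$ is
    $(p_W - p_B) \pm z_{\alpha/2}\,SE$,
  • where $z_{\alpha/2} = z_{0.10} = 1.28$.
  • Hence the desired CI is $0.1098 \pm 1.28(0.0489)$, i.e., about (0.047, 0.172).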

10
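Confidence Interval for $\pi_1 - \pi_2$
  • Application: two Bernoulli populations
  • Assumption: independent samples, $n_1 > 30$ and $n_2 > 30$.
  • A $(1-\alpha)100\%$ confidence interval for $\pi_1 - \pi_2$ is
    given by $(p_1 - p_2) \pm z_{\alpha/2}\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$.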
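A minimal sketch of this interval using only Python's standard library; the function name `two_prop_ci` is hypothetical. `statistics.NormalDist().inv_cdf` supplies the exact critical value (about 1.2816 for 80%), so the endpoints differ from the hand calculation with z = 1.28 only in the fourth decimal.

```python
import math
from statistics import NormalDist

def two_prop_ci(x1, n1, x2, n2, conf):
    """Large-sample (1 - alpha)100% CI for pi1 - pi2 with the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # upper alpha/2 critical value
    diff = p1 - p2
    return diff - z * se, diff + z * se

# Example 9.1: 192 of 436 workers vs. 40 of 121 bosses, 80% CI
lo, hi = two_prop_ci(192, 436, 40, 121, 0.80)
print(f"({lo:.3f}, {hi:.3f})")
```

For Exercise 9.1 the same call would be `two_prop_ci(29, 106, 16, 100, 0.98)`.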

11
Exercise 9.1
  • A study suggests that nicotine-laced gum helps
    smokers to stop smoking. The study shows that 29
    of 106 smokers who chewed nicotine-laced gum
    remained smoke free for 1 year and 16 out of 100
    smokers who chewed regular gum remained smoke
    free for 1 year. Use this information to find a
    98% confidence interval for the difference
    between the proportions of smokers who
    successfully use nicotine-laced gum and those who
    successfully use regular gum.

12
Large-Sample Hypothesis Test of $\pi_1 - \pi_2$
13
Example 9.1 (continued)
  • Given these data, is the evidence sufficient to
    suggest that a larger percentage of workers than of bosses
    feel that it is unethical to monitor e-mail?
  • Solution: 1. That is, to test
    $H_0: \pi_W - \pi_B = 0$
    vs. $H_1: \pi_W - \pi_B > 0$.
14
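  • 2. Under $H_0$ the standardized test statistic is
    $z = \frac{(p_W - p_B) - 0}{\sqrt{p(1-p)\left(\frac{1}{n_W} + \frac{1}{n_B}\right)}}$,
  • where $p = (192+40)/(436+121) = 0.4165$ is the pooled estimate
    of the common proportion $\pi$.
  • Plugging in $p_W = 192/436 = 0.4403$ and $p_B = 40/121 = 0.3305$
    yields the observed value of the test statistic $z_{obs} \approx 2.17$.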

15
  • 3. As in the one-sample tests, we can make a
    decision by comparing the p-value to $\alpha$.

Since p-value $= P(Z > 2.17) \approx 0.015 < 0.05$, we reject $H_0$ based
on the data; i.e., there is significant evidence that a larger percentage
of workers than of bosses feel that it is unethical to monitor
e-mail.
16
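Large-Sample Hypothesis Test of $\pi_1 - \pi_2$
  • Assumption: the two samples are independent of
    each other
  • Observe $p_1$, $p_2$
  • Construct hypotheses
  • Test statistic and its sampling distribution under the
    null hypothesis $\pi_1 = \pi_2 = \pi$:
  • $p_1 - p_2 \approx N\left(0,\ \pi(1-\pi)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right)$,
    with $\pi$ estimated by the pooled $p = (x_1 + x_2)/(n_1 + n_2)$,
    where $x_1$, $x_2$ are the numbers of successes
  • $z = \frac{p_1 - p_2}{\sqrt{p(1-p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
  • p-value of $z_{obs}$ (use z-table)
  • Make decision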
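The steps above can be sketched with the standard library (hypothetical function name `two_prop_z_test`). It uses the pooled standard error under $H_0$ and an upper-tail p-value, matching the one-sided alternative of Example 9.1; for a two-sided alternative the tail probability of $|z_{obs}|$ would be doubled instead.

```python
import math
from statistics import NormalDist

def two_prop_z_test(x1, n1, x2, n2):
    """Large-sample z test of H0: pi1 = pi2 vs. H1: pi1 > pi2, using the pooled SE."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                      # pooled estimate of the common pi
    se0 = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    z_obs = (p1 - p2) / se0
    p_value = 1 - NormalDist().cdf(z_obs)          # upper-tail p-value
    return z_obs, p_value

# Example 9.1: workers 192/436 vs. bosses 40/121, alpha = 0.05
z_obs, p_value = two_prop_z_test(192, 436, 40, 121)
alpha = 0.05
print(round(z_obs, 2), round(p_value, 3),
      "reject H0" if p_value < alpha else "do not reject H0")
```

Carried at full precision the statistic is about 2.17 and the p-value about 0.015, so the decision agrees with the slide.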

17
Exercise 9.2
  • The campaign manager for a presidential candidate
    wishes to test the claim that the proportion of
    Ohio voters who favor the candidate is at least
    as large as the proportion of California voters
    who favor the candidate. Given these data, test
    the manager's claim at a 5% level of significance.

18
Sec. 9.3 Inferences about the difference between
two population means
19
Example 9.3
  • 1.) What is the 90% confidence interval for the
    difference between the mean salaries of
    statisticians in New York and those in
    Massachusetts?
  • 2.) Test whether the mean salary of statisticians in
    New York is significantly different from that in
    Massachusetts
    ($\alpha = 0.05$).

20
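  • 1. The point estimate of $\mu_N - \mu_M$ is $\bar{x}_N - \bar{x}_M$.
  • 2. The standard error is
  • $\sqrt{\frac{\sigma_N^2}{n_N} + \frac{\sigma_M^2}{n_M}}$ when the population variances are known
  • $\sqrt{\frac{s_N^2}{n_N} + \frac{s_M^2}{n_M}}$ when the population variances are unknown

Here $n_N$ and $n_M$ are the sample sizes for NY and Mass.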
21
Recall
  • Think about the one-sample case first.
  • When we test something about a single
    mean, there were 2 cases to consider:
  • $\sigma$ known, which means we use the standard normal
    (Z) to make inferences
  • $\sigma$ unknown, which means we use the t distribution
    to make inferences

22
A little more complicated
  • $\sigma_N$ and $\sigma_M$ are both known: use the standard normal to obtain
    p-values and confidence intervals.
  • $\sigma_N$ and $\sigma_M$ are both unknown (and possibly $\sigma_N \neq \sigma_M$):
    this is a two-sample t-test. We use a t distribution, but the df
    has to be approximated.
23
Two sample t-test
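  • 3. So if $\sigma_N$ and $\sigma_M$ are both unknown, the
    standardized test statistic
    $t = \frac{(\bar{x}_N - \bar{x}_M) - (\mu_N - \mu_M)}{\sqrt{\frac{s_N^2}{n_N} + \frac{s_M^2}{n_M}}}$
  • has approximately a t distribution with degrees of freedom
    $df = \frac{\left(\frac{s_N^2}{n_N} + \frac{s_M^2}{n_M}\right)^2}{\frac{1}{n_N - 1}\left(\frac{s_N^2}{n_N}\right)^2 + \frac{1}{n_M - 1}\left(\frac{s_M^2}{n_M}\right)^2}$.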
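The approximate df and the conservative rule are both easy to compute from summary statistics. A sketch with hypothetical helper names follows; the approximate (Welch-Satterthwaite) value is rounded down, as the summary slide of this section recommends. The sample standard deviations for Example 9.3 are not in this transcript, so only the conservative df is evaluated here.

```python
import math

def welch_df(s1, n1, s2, n2):
    """Approximate (Welch-Satterthwaite) df, rounded down to an integer."""
    v1 = s1 ** 2 / n1   # estimated variance of the first sample mean
    v2 = s2 ** 2 / n2   # estimated variance of the second sample mean
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return math.floor(df)

def conservative_df(n1, n2):
    """Conservative df = min(n1 - 1, n2 - 1)."""
    return min(n1 - 1, n2 - 1)

print(conservative_df(45, 37))  # 36 (Example 9.3: n_N = 45, n_M = 37)
```

The approximate df always lies between min(n1 - 1, n2 - 1) and n1 + n2 - 2, which is why the conservative choice is safe: a smaller df gives a larger critical value and a larger p-value.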


24
Note
  • For a conservative approach to the two-sample
    t-procedures, the degrees of freedom are given by
    $df = \min(n_N - 1, n_M - 1)$.

For the example concerning New York and Mass.
salaries, the degrees of freedom to use are
$\min(45-1, 37-1) = 36$.
25
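  • 4. For 1), the 90% confidence interval for $\mu_N - \mu_M$
    is of the form $(\bar{x}_N - \bar{x}_M) \pm t^* \sqrt{\frac{s_N^2}{n_N} + \frac{s_M^2}{n_M}}$,
  • where $t^*$ is the upper 0.05 critical value of t(36)
    (confidence level 0.9):
  • $t^* = 1.684$

Remark: I used 40 degrees of freedom since 36 is
not in the book's t table.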
26
  • A 90% confidence interval for $\mu_N - \mu_M$ is
    (-1690, 5090).

27
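  • 5. For 2), test $H_0: \mu_N = \mu_M$ vs. $H_1: \mu_N \neq \mu_M$.
  • Under $H_0$,
  • the standardized test statistic is
    $t = \frac{\bar{x}_N - \bar{x}_M}{\sqrt{\frac{s_N^2}{n_N} + \frac{s_M^2}{n_M}}}$,
  • conservatively compared with t(36).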

28
  • From the given data,
  • $t_{obs} = 0.8473$

P-value $= 2P(t > t_{obs}) = 2P(t(36) > 0.8473)$. Since
$0.8473 < 0.851$, $P(t > 0.8473) > P(t > 0.851) = 0.2$. Then
the p-value $> 0.4 > 0.05$. Based on the data, we do not
reject $H_0$; i.e., there is insufficient evidence to
reject the null hypothesis, and the difference
between the mean salaries of statisticians in the two
cities is not significant.
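Python's standard library has no t distribution, so this sketch integrates the t density numerically with Simpson's rule instead of reading a t table (an assumption of convenience; with SciPy installed, `scipy.stats.t.sf` would be the usual call). The helper names are hypothetical. It computes the two-sided p-value for Example 9.3 with the conservative df = 36.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, integrating the density over [0, t] with Simpson's rule."""
    h = t / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # By symmetry P(T > 0) = 0.5, so P(T > t) = 0.5 - P(0 < T < t)
    return 0.5 - total * h / 3

# Example 9.3: two-sided test, t_obs = 0.8473, conservative df = 36
t_obs = 0.8473
p_value = 2 * t_upper_tail(abs(t_obs), 36)
print(round(p_value, 3), "reject H0" if p_value < 0.05 else "do not reject H0")
```

The result is just above 0.4, consistent with the table bound on the slide. The same helper gives about 0.001 for Exercise 9.4 with df = 64 (`2 * t_upper_tail(3.45, 64)`) and a value below 0.002 with the conservative df = 31.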
29
Remark
  • Actually, the df of the t statistic in this example, from the
    approximate (Welch-Satterthwaite) df formula, is about 75.
  • The test could be carried out using t(75), but the
    test result is the same.

30
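Inferences about the difference between two
population means
  • Assumption: the two samples are independent of
    each other
  • The estimator: $\bar{x}_1 - \bar{x}_2$
  • $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
  • CI for $\mu_1 - \mu_2$ based on $t$:
  • Estimator $\pm$ $t^*$(standard error)
  • where $t^*$ is based on the confidence level $(1-\alpha)$ and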

31
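  • the degrees of freedom (df)
    $df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2}$
  • (round it down to the nearest integer)
  • A conservative approach:
  • $df = \min(n_1 - 1, n_2 - 1)$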

32
Exercise 9.3
  • Wind speed data were gathered during January and
    July to determine whether the mean wind speed at the site
    proposed for a wind generator will be different in the two
    months. From the summary data, construct a 99% confidence interval
    for the difference between the mean wind speeds
    in January and July.

33
  • Using the approximate df: (3.934, 10.466)
  • By the conservative approach,
  • $7.2 \pm (2.75)(1.228)$, i.e., (3.823, 10.577)
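The conservative interval is simply estimator ± t*(standard error). This small sketch (hypothetical helper name) reproduces it from the numbers on the slide. It also checks that the other reported interval has the same center, 7.2, and backs out its critical value, about 2.66. That value is smaller than 2.75, as expected when the df is larger.

```python
def ci_from_summary(estimate, se, t_crit):
    """Estimator +/- t* (standard error)."""
    margin = t_crit * se
    return estimate - margin, estimate + margin

# Exercise 9.3, conservative approach: estimate 7.2, standard error 1.228, t* = 2.75
lo, hi = ci_from_summary(7.2, 1.228, 2.75)
print(f"({lo:.3f}, {hi:.3f})")  # (3.823, 10.577)

# Interval reported with the approximate df: same center, narrower half-width
lo_w, hi_w = 3.934, 10.466
implied_t = (hi_w - lo_w) / 2 / 1.228   # half-width divided by the standard error
print(round((lo_w + hi_w) / 2, 3), round(implied_t, 2))  # 7.2 2.66
```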

34
Exercise 9.4
  • Plastic grocery bags have almost replaced the
    standard brown paper bags at the supermarket. One
    particular company was trying to increase the
    tensile strength of the bags. These summary data
    are from two independent random samples and give
    the tensile strengths of plastic bags from two
    different production runs:
  • Sample 1
  • Sample 2
  • Determine whether there is a significant
    difference between the mean tensile strengths
    from the two production runs.

35
  • df = 64
  • P-value $= 2P(t > 3.45) \approx 0.001$
  • Reject $H_0$; this small p-value indicates that the
    difference between the mean tensile strengths of
    the plastic bags from the two different
    production runs is highly significant.
  • By the conservative approach:
  • df = 31, P-value $= 2P(t > 3.45) < 2P(t > 3.385) \le 0.002 < 0.05$
  • Reject $H_0$