STA 3024 Introduction to Statistical 2 - PowerPoint PPT Presentation

About This Presentation
Title:

STA 3024 Introduction to Statistical 2

Slides: 48
Provided by: drramonc

Transcript and Presenter's Notes



1
STA 3024 Introduction to Statistical 2
  • Review of STA2023

2
What Should Have Been Covered in STA2023
According to the Course Description
  • Basic Probability
  • Random Variable
  • Sampling Distribution
  • Confidence Intervals for single population (mean
    and proportion)
  • Hypothesis testing for single population (mean
    and proportion)
  • Comparison of 2 populations
  • Simple Linear regression (usually not covered)

These are in Chapters 1-10 of Agresti and
Franklin's book.
3
A key question of statistics
  • Key questions are easy to understand but not easy
    to answer without knowledge of the special field.
  • Example 1. A key question of mathematics: the
    three angles of a triangle.
  • Example 2. A key question of calculus: the volume
    of a ball is $4\pi r^3/3$.

4
A Key Question in Statistics
Natural cure rate of a disease: 50%. A drug is
invented and is to be tested. Suppose we have the
following responses.
How about 2 patients with 2 cures?
How about 100 patients with 100 cures?
5
Key Concept in Statistics
Natural cure rate of a disease: 50%. A drug is
invented and is to be tested. Suppose we have the
following responses.
Where does the "no" jump to "yes"?
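One way to make "where does the no jump to yes?" concrete is the exact binomial tail: if the drug does nothing, each patient is still cured with the natural rate 0.5, and we ask how likely at least k cures out of n patients would be by chance. The sketch below assumes a cut-off of 0.05 (the value chosen for α later in these slides); the helper names `upper_tail` and `critical_cures` are hypothetical.

```python
from math import comb


def upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def critical_cures(n: int, p0: float = 0.5, alpha: float = 0.05):
    """Smallest number of cures k with P(X >= k) <= alpha under the natural
    cure rate p0, or None if even n cures out of n is not convincing."""
    for k in range(n + 1):
        if upper_tail(k, n, p0) <= alpha:
            return k
    return None


print(upper_tail(2, 2, 0.5))      # 2 patients, 2 cures
print(upper_tail(100, 100, 0.5))  # 100 patients, 100 cures
for n in (4, 5, 10, 20):
    print(n, critical_cures(n))
```

With 2 patients, 2 cures happen by chance a quarter of the time; with 100 patients, 100 cures by chance has probability about 8 × 10^-31. Under this cut-off, 4 patients can never give a convincing result, while with 5, 10 and 20 patients the "yes" starts at 5, 9 and 15 cures.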
6
There is no 100% correct statistical decision
Risk: the risk of making a wrong decision.
Accidental death rate: $10^{-6}$/day in the USA.
How many patients should we recruit in the
beginning?
7
Another key question
  • Suppose I wish to know the percentage of voters
    who support public health insurance.
  • I have no hypothesis to test, but I am interested
    in estimating this proportion.
  • Suppose I asked 100 randomly selected persons and
    65 answered yes. Or suppose I asked 1000 persons
    and 650 answered yes. In both cases my estimate
    would be 65%, but I know their accuracies differ.
    By how much? Do we need a larger sample to make
    the estimate even more accurate?

8
Beyond cure rates
  • Survival time improved by a drug
  • Patient differences in age, gender, tumor size
    and/or genetic markers.
  • Cure rate in medicine: analogous to the affected
    rate in plants, the accident death rate in car
    insurance, the response rate to a stimulus.
  • Survival time in medicine: analogous to fruit
    weight in plants, the accident payment in car
    insurance, the response time to a stimulus.

9
Applications of Statistics
Key: when there is uncertainty in the response,
i.e., when the decision cannot be 100% correct.
  • Effectiveness of new drugs or treatments
  • DNA evidence in court
  • Estimating the bowhead whale population
  • Corn yield by different fertilizers
  • Quality control of light bulbs
  • Public opinion by polls

10
Success stories of polls
1992 US presidential election predictions
Source: a newspaper a few days before the
election.
11
More on polls
Source: USA Today, Nov. 5 (Election Day morning)
In both 2000 and 2004, the candidates (Bush vs. Gore,
Bush vs. Kerry) were too close to call (within ±3%).
The actual results showed the same.
It is difficult to reduce the ±3% margin by sample
size alone. From mathematics to practice: random
sampling, voters changing their minds, and voters
not revealing their minds.
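The "±3%" quoted by polls is usually the 95% margin of error for a proportion near 0.5, $1.96\sqrt{p(1-p)/n}$. Assuming that convention (the slide does not state it), the sketch below shows why the margin is hard to shrink by sample size alone: it falls only like $1/\sqrt{n}$, so halving it takes four times as many respondents. `margin_of_error` is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(n: int, p: float = 0.5, conf: float = 0.95) -> float:
    """Half-width of the normal-approximation confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.96 for 95%
    return z * sqrt(p * (1 - p) / n)


for n in (1000, 4000, 16000):
    print(n, round(margin_of_error(n), 4))
```

Non-sampling problems such as non-random samples, voters changing their minds, or voters not revealing them do not appear in this formula at all, which is another reason a larger sample alone cannot fix them.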
12
The next two elections, 2000 (Bush vs Gore) and
2004 (Bush vs Kerry) were too close to call
before the election. The final results confirmed
this fact. Now the 2008 election.
  • This map was drawn by the New York Times 3 - 1
    day before the election. All the state
    projections were correct. Toss-up states were
    extremely close.
  • It also predicted that Obama would get 52% ± 2%
    and McCain 41% ± 2%, with 7% undecided.
  • The actual result was Obama 52.5% and McCain 46%.
  • The total number of votes was 124,471,000.

13
Danger of treatment based on screening (I)
  • Source: New England Journal of Medicine, Sep. 12,
    2002, pp. 781-789.
  • Randomized clinical trials in early prostate
    cancer: radical prostatectomy group (n = 347),
    watchful waiting (n = 348). (Duration 1989-1999,
    median follow-up time 6.2 years)
  • It is obvious that there were fewer deaths due to
    prostate cancer in the surgical group, because
    the prostate had been removed. To claim
    effectiveness based on 62 vs. 53 is unreasonable.
  • Neither the expense nor the change in quality of
    life is reflected in this table.

14
Danger of treatment based on screening (II)
  • Source: The Lancet, 2000, 355: 129-43; The
    Lancet, 2001, 358: 1340-42.
  • Randomized clinical trials in mammography for
    breast cancer.
  • Malmö (Sweden) study (1988-97; screened: 21,088,
    control: 21,195)
  • Canada study (1981-97; screened: 44,925, control:
    44,910)

15
Solution to the Key Question
What do you need to know beforehand?
  • What risk can you take of making a wrong claim
    (claiming an ineffective drug to be effective)?
  • What do you consider a good drug that needs to
    be detected with high probability?
  • Let the first answer be $\alpha = 0.05$.
  • Let the second answer be: if the cure rate is
    larger than 0.6 ($p_1$), I want at least 0.9
    ($1-\beta$) probability of detecting it.
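With these two answers the number of patients can be computed. The slides' own solution is not transcribed, so the sketch below uses one common normal-approximation formula for a one-sided test of $H_0: p = p_0 = 0.5$ (the natural cure rate) against $p_1 = 0.6$:
$n = \left[\dfrac{z_{1-\alpha}\sqrt{p_0(1-p_0)} + z_{1-\beta}\sqrt{p_1(1-p_1)}}{p_1 - p_0}\right]^2$, rounded up. An exact binomial calculation can give a slightly different answer; the function name is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_one_sided(p0: float, p1: float, alpha: float = 0.05, power: float = 0.90) -> int:
    """Normal-approximation sample size for a one-sided test of a proportion,
    H0: p = p0 versus the alternative value p1 > p0."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # about 1.645 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)       # about 1.282 for power 0.9
    numerator = z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p1 * (1 - p1))
    return ceil((numerator / (p1 - p0)) ** 2)


print(sample_size_one_sided(0.5, 0.6))
```

Under these assumptions about 211 patients are needed. Asking for a smaller detectable improvement ($p_1$ closer to 0.5) or a higher power drives the number up quickly.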

16
Two Key Distributions and Their Properties
17
A Derivation Used All the Time
18
(No Transcript)
19
(No Transcript)
20
More on the Normal Distribution
21
(No Transcript)
22
The Binomial Distribution
23
Elements in Hypothesis Testing (pp. 413-415)
24
From page 421
25
From page 432
26
Examples in the Book
  • The therapeutic guess example (pp. 409, 422)
  • The dog smell cancer example, p.417
  • The American working hours/week example, p. 428
  • The anorexic example p. 433

27
One-Sided or Two-Sided Test?
  • In a 2004 survey, 868 working women were asked;
    the sample mean was 39.11 hours and the sample
    standard deviation was 14.6. Can we conclude that
    women workers, on average, worked less than 40
    hours/week? (pp. 429-431 of the book)

28
One-Sided or Two-Sided Test?
  • The key is that hypotheses should be formed
    before you look at the data.
  • The correct sequence is
  • You have a hypothesis you wish to confirm or
    discard.
  • You collect data to make a decision.
  • If it is a statistical decision, you report your
    conclusion with a p-value.
  • Let us use the female working hours example
    (p. 428). If, before the survey, we had been
    interested in whether American female workers
    work less than 40 hours/week, then it is a
    one-sided test. But if, before the survey, we
    wished to know whether female workers worked more
    or less than 40 hours, it is a two-sided test.
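For the female working hours example, the test statistic is the same under either design; only the p-value differs. The sketch uses the summary numbers from the slide and, as a simplifying assumption for a standard-library-only example, the standard normal distribution in place of the t distribution with 867 degrees of freedom (the two are nearly identical at this sample size).

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics from the 2004 survey of working women (slide data).
n, x_bar, s = 868, 39.11, 14.6
mu0 = 40.0  # hours/week under the null hypothesis

se = s / sqrt(n)        # standard error of the sample mean
t = (x_bar - mu0) / se  # test statistic, about -1.80

# Normal approximation to the t distribution with n - 1 = 867 df.
p_one_sided = NormalDist().cdf(t)            # Ha: mu < 40, fixed before the survey
p_two_sided = 2 * NormalDist().cdf(-abs(t))  # Ha: mu != 40
print(round(t, 2), round(p_one_sided, 3), round(p_two_sided, 3))
```

The one-sided p-value is about 0.036 and the two-sided one about 0.07, so at the 0.05 level the conclusion depends on which hypothesis was fixed before the data were seen.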

29
The Conclusion of a Two-Sided Test Is Usually
One-Sided
However, you cannot use a one-sided test to make
this one-sided conclusion, because the
hypotheses should be formed before you look at
the data. (see next page)
30
Forming Hypotheses after Seeing the Data Can
Seriously Bias the Conclusion
Actually, the risk is much higher if the
hypotheses were formed after you saw the two 1s.
(see next page)
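The specific "two 1s" example is not in the transcript, but the general effect is easy to simulate. Assume the test statistic is standard normal when the null hypothesis is true, and that the direction of a one-sided α = 0.05 test is picked only after seeing which way the data point. Then the test rejects whenever |z| exceeds about 1.645, so the real type I error rate is about 0.10, double the nominal 0.05. The function name and simulation settings are hypothetical.

```python
import random
from statistics import NormalDist


def error_rate_side_chosen_after_data(alpha: float = 0.05, n_sims: int = 100_000, seed: int = 1) -> float:
    """Simulated type I error rate when the one-sided alternative is chosen
    only after seeing the sign of the test statistic."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)  # one-sided cut-off, about 1.645
    rejections = 0
    for _ in range(n_sims):
        z = rng.gauss(0.0, 1.0)  # test statistic when the null hypothesis is true
        # Choosing the side to match the data means rejecting whenever |z| > z_crit.
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_sims


simulated = error_rate_side_chosen_after_data()
exact = 2 * (1 - NormalDist().cdf(NormalDist().inv_cdf(0.95)))
print(round(simulated, 3), round(exact, 3))
```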
31
Since we usually do not know the side before
seeing the data, most tests are two-sided,
although the final conclusion can be one-sided.
(This is also the book's view; see p. 431,
conclusion on women's working hours.)
32
When There Is No Hypothesis to Test, Just
Fact-Finding
  • 65 yes answers in a sample of 100: we feel the
    real percentage is 65%.
  • 650 yes answers in a sample of 1000: we feel the
    real percentage is 65%.
  • Which one is more accurate?
  • Idea: for a single sample we cannot tell, but on
    the whole, the larger sample gives a more
    accurate estimate.
  • How can we quantify this concept?
  • Let Y be the number of yes answers in a sample of
    size n, and let p be the true proportion of yes
    answers in the population.
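A standard way to quantify this: the sample proportion $\hat p = Y/n$ has standard deviation $\sqrt{p(1-p)/n}$. The sketch plugs in $\hat p = 0.65$ for the unknown $p$ (an approximation) and compares the two samples; `standard_error` is a hypothetical helper.

```python
from math import sqrt


def standard_error(p_hat: float, n: int) -> float:
    """Estimated standard deviation of the sample proportion Y/n."""
    return sqrt(p_hat * (1 - p_hat) / n)


se_100 = standard_error(0.65, 100)
se_1000 = standard_error(0.65, 1000)
print(round(se_100, 4), round(se_1000, 4), round(se_100 / se_1000, 2))
```

The standard errors are about 0.048 and 0.015: ten times the sample size makes the estimate about $\sqrt{10} \approx 3.16$ times more precise, not ten times.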

33
Sacrifice for Environmental Protection Example
(pp. 362-365)
  • In the 2000 General Social Survey, of n = 1154
    respondents, Y = 518 said yes to the question
    "Are you willing to pay higher gasoline prices to
    protect the environment?"
  • The sample proportion is 518/1154 = 0.45. How
    accurate is this estimate?
  • We are not 100% sure that the real proportion is
    45%, but we may be able to say that we have high
    confidence that the real proportion is between,
    say, 0.44 and 0.46.
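A common way to turn this into an interval is the normal-approximation (Wald) 95% interval $\hat p \pm 1.96\sqrt{\hat p(1-\hat p)/n}$. The sketch applies it to the survey counts; the helper name `proportion_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(y: int, n: int, conf: float = 0.95) -> tuple[float, float]:
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = y / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.96 for 95%
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


low, high = proportion_ci(518, 1154)
print(round(low, 3), round(high, 3))
```

It gives roughly (0.42, 0.48), so the "0.44 to 0.46" above is only an illustration; with n = 1154 the 95% interval extends about 0.03 on each side of 0.45.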

34
Interval Estimate (Confidence Interval, C.I.)
35
Interval Estimate for the Mean
36
eBay Example (pp.378-79)
  • Sale of the Palm M515 PDA (handheld computer).
    Seven buyers chose the buy-it-now option (vs.
    waiting for other bidders).
  • The data (n = 7): 235, 225, 225, 240, 250, 250,
    210.
  • Find a 95% C.I. for the mean.
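A minimal sketch of the usual t interval $\bar{x} \pm t_{0.025}\, s/\sqrt{n}$ for these data. Python's standard library has no t quantile function, so the critical value 2.447 for 6 degrees of freedom is hard-coded from a t table (an assumption of this sketch; SciPy's `scipy.stats.t.ppf(0.975, 6)` would compute it).

```python
from math import sqrt
from statistics import mean, stdev

prices = [235, 225, 225, 240, 250, 250, 210]  # buy-it-now prices, n = 7
n = len(prices)
x_bar = mean(prices)
s = stdev(prices)  # sample standard deviation (divides by n - 1)

T_CRIT = 2.447  # t_{0.025} with df = 6, from a t table (assumed value)
margin = T_CRIT * s / sqrt(n)
low, high = x_bar - margin, x_bar + margin
print(round(x_bar, 2), round(low, 1), round(high, 1))
```

The sample mean is about 233.6 and the interval runs from about 220.0 to 247.1.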

37
Logic Behind the Intervals (p. 359)
38
Logic Behind the Intervals (p. 359)
39
Confidence interval for the mean (p. 378)
40
Any Advantage of Knowing the Derivation?
41
Sample Size Determination in Interval Estimation
42
Importance of Sample Size Determination
  • In most practical situations, an investigator
    needs to collect the data. It is unlikely that
    the data already exist somewhere for you to fetch.
    This is especially true for new ideas that
    require new experiments.
  • Before you collect the data, you need to
    determine the sample size.
  • How to determine the sample size in estimation?
    (✓)
  • How to determine the sample size in hypothesis
    testing?
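For estimation, the margin-of-error formula $m = z\sqrt{p(1-p)/n}$ can be solved for the sample size: $n = z^2 p(1-p)/m^2$. When nothing is known about $p$, using $p = 0.5$ gives the largest (safest) $n$. The sketch assumes a 95% level and a proportion; the helper name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_for_proportion(margin: float, p: float = 0.5, conf: float = 0.95) -> int:
    """Smallest n whose normal-approximation margin of error is at most `margin`."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.96 for 95%
    return ceil(z * z * p * (1 - p) / margin**2)


for m in (0.03, 0.015):
    print(m, n_for_proportion(m))
```

A ±3% margin needs about 1068 respondents, and halving it to ±1.5% needs about 4269, roughly four times as many.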

43
Importance of Type II Error (9.6, pp. 453-457)
44
Importance of Type II Error (continued)
45
(No Transcript)
46
The solution (2)
47
The solution (3)