Reading - PowerPoint PPT Presentation

Title:

Reading

Provided by: darrenb
1
Lecture 11
2
Reading
  • You should have completed chapter 6.

3
Evidence of a correlation
  • The margin of error for red depends on the number
    of red marbles in the sample
  • 500 red marbles, so ±0.05.
  • f(LR) = 0.65
  • P(LR) = 0.65 ± 0.05
  • So 0.60 to 0.70 of the red population are large.
  • 100 non-red marbles, so ±0.10.
  • f(LN) = 0.35
  • P(LN) = 0.35 ± 0.10
  • So 0.25 to 0.45 of the non-red population are
    large. (Fig. 6.3)
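
The interval arithmetic on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `interval_estimate` is made up for the example, and the margins of error are taken as given from the slide rather than derived.

```python
def interval_estimate(sample_frequency, margin_of_error):
    """Return (low, high) bounds for a population proportion.

    Hypothetical helper: the margin of error is an input here,
    taken from the slide, not computed from the sample size.
    """
    low = max(0.0, sample_frequency - margin_of_error)
    high = min(1.0, sample_frequency + margin_of_error)
    # Round to two decimals to hide floating-point noise.
    return round(low, 2), round(high, 2)


red = interval_estimate(0.65, 0.05)      # 500 red marbles
non_red = interval_estimate(0.35, 0.10)  # 100 non-red marbles
print("P(LR):", red)
print("P(LN):", non_red)
```

The smaller non-red subsample gets the wider interval, which is the point of the first bullet.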

4
Lack of evidence for a correlation
  • Sample size 600 (Fig. 6.4)
  • 500 red, 100 non-red
  • 270/500 large, f(LR) = 270/500 = 0.54
  • P(LR) = 0.54 ± 0.05
  • So 0.49 to 0.59 of the red population are large.
  • 46/100 large, f(LN) = 46/100 = 0.46
  • P(LN) = 0.46 ± 0.10
  • So 0.36 to 0.56 of the non-red population are
    large.

5
Estimating the Strength of Correlation
  • The maximum allowed difference is between the top
    of the higher interval and the bottom of the
    lower interval.
  • The minimum allowed difference is between the
    bottom of the higher interval and the top of the
    lower interval.
  • Fig. 6.3: 0.70 - 0.25 = 0.45 and 0.60 - 0.45 = 0.15
  • Estimated strength of the correlation is (0.45,
    0.15)
  • Fig. 6.4: 0.59 - 0.36 = 0.23 and 0.49 - 0.56 = -0.07
  • Estimated strength of the correlation is (0.23,
    -0.07)
  • The negative number indicates that there may be
    no correlation.
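
As a rough sketch, the maximum and minimum differences can be computed directly from the two interval estimates. The function name `correlation_strength` is hypothetical; the intervals are the ones quoted for Figs. 6.3 and 6.4.

```python
def correlation_strength(higher, lower):
    """Return (max_difference, min_difference) for two interval estimates.

    `higher` and `lower` are (low, high) tuples; `higher` is the interval
    around the larger sample frequency. Hypothetical helper for illustration.
    """
    max_diff = higher[1] - lower[0]  # top of higher minus bottom of lower
    min_diff = higher[0] - lower[1]  # bottom of higher minus top of lower
    return round(max_diff, 2), round(min_diff, 2)


fig_6_3 = correlation_strength((0.60, 0.70), (0.25, 0.45))
fig_6_4 = correlation_strength((0.49, 0.59), (0.36, 0.56))
print("Fig. 6.3:", fig_6_3)
print("Fig. 6.4:", fig_6_4)
```

A negative minimum difference means the intervals overlap, so a difference of zero (no correlation) is among the allowed values.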

6
  • The standard deviation of the difference is less
    than the standard deviation of the estimates.
  • This is because we are getting evidence from the
    whole sample of 600 marbles combined.
  • So rather than a certainty of 95%, our estimate
    of the strength of correlation is 99% certain.

7
Statistical Significance
  • A correlation in the sample may be due to a
    correlation in the population, or it may be due
    to an accident of sampling.
  • A correlation is statistically significant iff it
    is unlikely to be an accident of sampling.
  • The sample frequencies are statistically
    significant iff the corresponding interval
    estimates do not overlap.
  • Iff the interval estimates do overlap, the
    differences in sample frequencies are not
    statistically significant.
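
The overlap criterion is easy to state as code. The sketch below assumes intervals are (low, high) tuples and uses the marble intervals from the two earlier examples; the function names are invented for illustration.

```python
def intervals_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def is_statistically_significant(a, b):
    """Apply the slide's rule: significant iff the intervals do not overlap."""
    return not intervals_overlap(a, b)


# Evidence of a correlation: 0.60-0.70 versus 0.25-0.45
print(is_statistically_significant((0.60, 0.70), (0.25, 0.45)))
# Lack of evidence: 0.49-0.59 versus 0.36-0.56
print(is_statistically_significant((0.49, 0.59), (0.36, 0.56)))
```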

8
(No Transcript)
9
Evaluating Statistical Hypotheses: The Real World
Population
  • The data only tell us about the population
    actually sampled.
  • This is different from the population of
    interest.
  • E.g. a phone survey of voting intentions doesn't
    include people without phones.
  • The population actually sampled must have members
    who are all equally likely to be in the sample
    (random sampling).

10
The Sample Data
  • When evaluating a hypothesis, it will usually
    only be one part of the sample data that is
    relevant. That part must be identified.

11
The Statistical Model
  • Evaluating a statistical hypothesis consists in
    comparing the real world to a model.
  • Sometimes the model is suggested by the data,
    though not always.
  • The possible models we have are proportions,
    distributions and correlations.

12
Random Sampling
  • How well does the study fit random sampling?
  • Random sampling
  • a) All members of the population have an equal
    chance of being selected.
  • b) There is no correlation between the outcome of
    one selection and another.
  • This is an ideal that is only approximated in
    practice.

13
Evaluating the Hypothesis
  • Assuming random sampling, what does the data tell
    us?
  • What estimate for a proportion do we have, with
    what margin of error?
  • Is there evidence of a correlation?
  • Is it strong evidence?

14
Summary
  • How well does the data support the evaluation of
    stage 5?
  • Is the sample random enough for the hypothesis to
    be supported?
  • This depends on how random the sampling procedure
    was, and how strong the evidence is.

15
Non-random Sampling 1: Stratified Sampling
  • The sample should contain the same proportion of
    each sub-section as the population.
  • Example: If the population of America is 20%
    Hispanic, the sample should be 20% Hispanic.
  • Advantages: More precision.
  • Can be administratively easier to focus on
    certain groups.
  • Disadvantages: Requires identifying appropriate
    strata.
  • More complex to analyse results.
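
A minimal sketch of proportional stratified sampling, assuming each member of the population can be assigned to exactly one stratum. The population below is invented purely to mirror the 20% example; `stratified_sample` is a hypothetical helper, not a library function.

```python
import random


def stratified_sample(population, stratum_of, sample_size, seed=0):
    """Draw from each stratum in proportion to its share of the population.

    Quotas are rounded, so in general the total may differ slightly
    from `sample_size`.
    """
    rng = random.Random(seed)
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    sample = []
    for members in strata.values():
        quota = round(sample_size * len(members) / len(population))
        sample.extend(rng.sample(members, quota))  # random within the stratum
    return sample


# Made-up population of 1,000 people, 20% in the "hispanic" stratum.
population = [("hispanic", i) for i in range(200)]
population += [("other", i) for i in range(800)]

sample = stratified_sample(population, lambda m: m[0], sample_size=100)
share = sum(1 for m in sample if m[0] == "hispanic") / len(sample)
print(len(sample), share)
```

Selection is still random within each stratum; only the proportions between strata are fixed in advance.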

16
Non-random Sampling 2: Cluster Sampling
  • If some areas of the population are out of reach,
    sample a cluster of individuals from those within
    reach.
  • Rather than sample one child from 30 schools,
    sample 30 children from one school.
  • Each cluster should be a small-scale
    representation of the population.
  • The individuals within the cluster should be as
    heterogeneous as possible. (This is a weakness of
    the HBSC study.)
  • Advantage: Cheaper / more practical. Saves travel
    time.
  • Disadvantages: Higher margin of error.
  • Clusters may be similar to each other.
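
For contrast with sampling one child from each of 30 schools, here is a sketch of cluster sampling under the assumption that the population is already grouped into named clusters. The school and child names are placeholders, and `cluster_sample` is a hypothetical helper.

```python
import random


def cluster_sample(clusters, n_clusters, per_cluster, seed=0):
    """Pick whole clusters at random, then sample individuals inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        sample.extend(rng.sample(clusters[name], per_cluster))
    return chosen, sample


# Placeholder data: 30 schools with 100 children each.
schools = {
    f"school_{s}": [f"child_{s}_{c}" for c in range(100)]
    for s in range(30)
}

# 30 children from one school, rather than one child from 30 schools.
chosen, sample = cluster_sample(schools, n_clusters=1, per_cluster=30)
print(chosen, len(sample))
```

Every sampled child comes from the same school, which is why similarity within a cluster raises the margin of error.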

17
Case Study: Health Behaviour in School-Aged
Children
  • Aim: To discover the behaviour and concerns of
    school-aged children related to health.
  • Hoped-for outcomes:
  • Identify specific groups at risk.
  • Understand the factors that dispose people to
    develop health problems.
  • Develop effective intervention strategies.
  • (Did they interview the same students at a later
    date?)

18
  • How do you get a random sample of 11-15 year
    olds?
  • They used the school systems.
  • This excluded from the study all those not in the
    school system:
  • Home-schooled
  • In detention centres
  • Homeless

19
  • A sample of over 123,000 from 28 countries was
    obtained.
  • This consisted of students answering survey
    questions.
  • What does this tell us about the proportion of
    children who like school a lot?

20
The Real World Population
  • The population of interest is children in the 28
    countries between 11 and 15.
  • The population sampled is children aged 11, 13
    and 15 who were in school during the testing
    period and were competent enough in their
    national language.

21
The Sample Data
  • The relevant piece of data is that 24% of
    respondents reported liking school a lot (p.169).

22
The Statistical Model
  • The model suggested is that the proportion of
    children who like school a lot is 24%.

[Figure: proportion model. Like school a lot: 0.24. Don't like school a lot: the remainder.]
23
Random Sampling
  • How well does the study fit random sampling?
  • Random sampling
  • a) All members of the population have an equal
    chance of being selected.
  • Not satisfied. Cluster sampling: only certain
    schools were selected.
  • b) There is no correlation between the outcome of
    one selection and another.
  • Not fully satisfied. There is less variation
    within classes than between them, e.g. students
    grouped by abilities.

24
Evaluating the Hypothesis
  • Assuming random sampling, what does the data tell
    us?
  • For a sample of 120,000, the margin of error is
    about 1%.
  • So the data suggests that 23-25% of children like
    school a lot.

25
[Figure: Children 11-15, n = 120,000. Sample: 0.24 like school a lot. Estimate: 0.23-0.25 like school a lot.]
26
Summary
  • How well does the data support the evaluation of
    stage 5?
  • Is the sample random enough for the hypothesis to
    be supported?
  • Cluster sampling rather than random sampling was
    used.
  • We have to rely to some extent on the care taken
    by the researchers. Given that it is a
    large-scale project by academics, it is
    reasonable to assume it was close to random
    sampling.
  • Thus, we have good evidence that the proportion
    of children who like school a lot is 23-25%.

27
Problems with survey sampling 1: Non-random
sampling
  • The Kinsey Reports (1948, 1953): sample of
    convenience.
  • 10% of the sample were homosexual.
  • 50% of married males had extra-marital sex.
  • Self-selection.
  • 25% of the sample were in prison, and 5% were
    male prostitutes.
  • Advantage: Much larger sample size than would
    otherwise be possible.

28
  • 1936 election: The Literary Digest predicted that
    Alf Landon, the Republican candidate, would win
    by a landslide.
  • But their sample was chosen from phone books and
    car registration details. The poor of the 1930s
    had neither.
  • George Gallup constructed his sample more
    carefully, and predicted that Roosevelt would win.

29
Problems with survey sampling 2: False responses
(liars)
  • Sensitive subjects: sex, drugs, bullying.
  • Telling people what they want to hear.
  • The Shy-Tory Factor
  • Polls in the British 1992 election put Labour and
    Tories at 38% and 39%.
  • But the Tories won by 7.6%.
  • An inquiry by the Market Research Society put the
    difference down to embarrassed Tories.

30
Example: Why do Fox News polls favour Republicans?
  • Fox News shows Obama with a 7 point lead.
    http://www.foxnews.com/polls/
  • Gallup shows him with a 10 point lead.
    www.gallup.org
  • 1. "It's Fox News, you talk to him."
  • 2. Telling people what they want to hear.

31
Major expected sources of error in the current
polls
  • 1. Racism. As it is not socially acceptable in
    most of America to refuse to vote for a black
    candidate, voters who refuse to vote for a black
    candidate will not say so. The Bradley effect.
    (False responses.)
  • 2. Samples are selected by phone numbers. Voters
    who only have cell phones may be
    under-represented. If there is a correlation
    between having only a cell phone and voting
    preference, the poll may be inaccurate.

32
Exercise 6.13
  • Is there a correlation between having a college
    education and not drinking?

33
1. The Real World Population
  • American adults.

34
2. The Sample Data
  • Among those with a college education, 75%
    classified themselves as either light or moderate
    drinkers.
  • 49% of those with a high school education gave
    these responses.

35
[Figure: Light or moderate drinkers (versus non-drinkers or heavy drinkers). College education: 0.75. High school education: 0.49.]
36
3. The Statistical Model
  • The model suggested is that there is a positive
    correlation between having a college education
    and being a light or moderate drinker.

37
4. Random Sampling
  • How well does the study fit random sampling?
  • In-home interviews.
  • Random sampling
  • a) All members of the population have an equal
    chance of being selected.
  • We're not told how the homes are selected.
  • The homeless are excluded.
  • b) There is no correlation between the outcome of
    one selection and another.
  • We're not told, but probably satisfied.

38
5. Evaluating the Hypothesis
  • Assuming random sampling, what does the data tell
    us?
  • For a sample of 500 the margin of error is 4%.
  • Complication: We're not told what proportion of
    the population had a college education.
  • Assuming half of the sample in each group
    (n = 250), the margin of error is ±0.06.

39
[Figure: Interval estimates for light or moderate drinkers (versus non-drinkers or heavy drinkers), n = 500 total. College education: 0.75, interval 0.69 to 0.81. High school education: 0.49, interval 0.42 to 0.55.]
40
Strength of Correlation
  • 0.81 - 0.42 = 0.39
  • 0.69 - 0.55 = 0.14
  • So the estimated strength of correlation is
    (0.39, 0.14).
  • As this is based on the whole sample of 500, we
    can be 99% certain of this conclusion.

41
6. Summary
  • How well does the data support the evaluation of
    stage 5?
  • Is the sample random enough for the hypothesis to
    be supported?
  • We have to decide based on the report and the
    context in which we find the report. It's
    reasonable to assume that this was a carefully
    conducted study, in which case we have good
    evidence for the conclusion that there is a
    moderate correlation between having a college
    education and light or moderate drinking.