What is inference

1
Inferential statistics by example
Maarten Buis Monday 2 January 2005
2
Two statistics courses
  • Descriptive Statistics (McCall, part 1)
  • Inferential Statistics (McCall, part 2 and 3)

3
Course Material
  • McCall, Fundamental Statistics for Behavioral
    Sciences.
  • SPSS (available from Surfspot.nl)
  • Lectures 2 x a week
  • Computer labs 1 x a week
  • Course website

4
Setup of lectures
  • Recap of material assumed to be known
  • New Material
  • Student Recap

5
How to pass this course
  • Read assigned portions of McCall before each
    lecture
  • Do the exercises
  • Do the computer lab assignments, and hand them in
    before Tuesday 17:00!
  • Come to the computer lab
  • Come to the lectures
  • Ask questions during class or on the course
    mailing list

6
What is inference?
  • Drawing general conclusions from partial
    information
  • Based on your observations some conclusions are
    more plausible than others.
  • Compare with logic

7
Sources of uncertainty in inference
  • Sample
  • Measurement
  • Model
  • Typos when typing the data into SPSS
  • Inference, as discussed here, assumes that random
    sampling error is by far the most dominant source
    of uncertainty.

8
How is inference done?
  • If the null hypothesis is true, then the
    probability of observing the data is so small
    that either we have drawn a very weird sample or
    the null hypothesis is false. (Ronald Fisher)
  • We use a good procedure to choose between two
    hypotheses, whereby good means that you draw
    the right conclusion in 95% of the times you use
    that procedure. (Jerzy Neyman and Egon Pearson)

9
PrdV
  • New populist party, wanted to participate in the
    next election if 41% of the Dutch population
    thought that the PrdV would be an asset to Dutch
    politics.
  • A sample of 2,598 people was asked this, and on
    16 December only 31% agreed.
  • Peter R. de Vries decided not to participate in
    the next election.

10
The Inference Problem
  • The 31% approving is 31% of the people in the
    sample.
  • Peter R. de Vries doesn't care about what people
    in the sample think, he cares about what all the
    people in the Netherlands think.
  • Could it be that he has drawn a weird sample,
    and that in the Netherlands 41% or more really
    think he would be an asset to Dutch politics?

11
Two hypotheses
  • H0: 41% or more support PrdV
  • HA: less than 41% support PrdV

12
A thought experiment (1)
  • If support for PrdV in the Netherlands is 41% and
    we draw 100 random samples of 2598 persons, then
    we get 100 estimates of the support for PrdV,
    some of them a bit too high, some of them a bit
    too low.
  • We would expect that 5 of these samples would
    show a support for PrdV of 39% or less.
  • If we find a support for PrdV of 39% or less and
    reject H0, then we have followed a procedure that
    would result in taking the right decision in 95%
    of the times we used that procedure.

13
What does that 39% mean?
  • We propose the following procedure: if we find a
    support for PrdV of less than x, then reject H0.
  • We choose x in such a way that the probability of
    rejecting H0 when we shouldn't is only 5%.
  • The reason for mistakenly rejecting H0 is drawing
    a weird sample.

14
Where does that 39% come from?
  • If H0 is true, then we draw a sample from a
    population in which the support for PrdV is 41%.
  • We can let the computer draw many (100,000)
    samples and calculate the mean in each sample.
  • 5,000, or 5%, of these samples have a mean of 39%
    or less.
  • So if we reject H0 when we find a support of 39%
    or less, then the probability of making a mistake
    is 5%.
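The simulation described above can be sketched in a few lines of Python. This is only an illustration, not the code behind the slides: the function names and the seed are made up, and the number of simulated samples is cut from 100,000 to 2,000 (an assumption) so that plain Python finishes quickly.

```python
import random

P_H0 = 0.41       # support for PrdV if H0 is true (from the slides)
N = 2598          # people per sample (from the slides)
N_SAMPLES = 2000  # assumption: the slides use 100,000; fewer here for speed


def sample_support(rng, p, n):
    """Draw one random sample of n people; return the share who agree."""
    agree = sum(rng.random() < p for _ in range(n))
    return agree / n


def simulated_cutoff(seed=1):
    """Support level that the lowest 5% of simulated samples stay at or below."""
    rng = random.Random(seed)
    supports = sorted(sample_support(rng, P_H0, N) for _ in range(N_SAMPLES))
    # The (N_SAMPLES / 20)-th smallest value marks the lowest 5%.
    return supports[N_SAMPLES // 20 - 1]


cutoff = simulated_cutoff()
print(f"5% of the simulated samples show a support of {cutoff:.3f} or less")
```

With more simulated samples the cutoff becomes more stable, at the price of a longer run time.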

16
Where did that 39% come from?
  • If we draw many random samples, and compute the
    mean in each sample, then these means will be
    approximately normally distributed with a mean
    of .41 and a standard deviation of .0096.
  • Remember that the sample size is 2598, and the
    SD of a proportion is $\sqrt{p(1-p)}$, so the
    standard deviation of the distribution of means
    is $\sqrt{.41 \times .59 / 2598} \approx .0096$.
  • 5% of the samples have a support for PrdV of
    less than 39%.
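The same cutoff can be computed without simulation from the normal approximation. The sketch below uses only Python's standard library; the variable names are illustrative, and the standard-error formula sqrt(p(1 - p) / N) is the usual one for a proportion, assumed here to be what the slide intends.

```python
from math import sqrt
from statistics import NormalDist

p0 = 0.41   # support under H0 (from the slides)
n = 2598    # sample size (from the slides)

sd_proportion = sqrt(p0 * (1 - p0))        # SD of a single yes/no answer
standard_error = sd_proportion / sqrt(n)   # SD of the sample means

# Support level below which 5% of the samples fall if H0 is true.
sampling_dist = NormalDist(mu=p0, sigma=standard_error)
cutoff = sampling_dist.inv_cdf(0.05)

print(f"standard error: {standard_error:.4f}")
print(f"5% cutoff:      {cutoff:.3f}")
```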

17
Neyman Pearson hypothesis testing
  • This procedure is the Neyman Pearson hypothesis
    testing approach
  • Note that it tells us something about the quality
    of the procedure we use to make a decision, not
    about the strength of evidence against H0
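To see what "quality of the procedure" means, one can apply the decision rule to many samples drawn while H0 is true and count how often it rejects. The following is a rough sketch under stated assumptions: the helper names, the seed, and the 2,000 repetitions are all illustrative choices, not part of the course material.

```python
import random
from math import sqrt
from statistics import NormalDist

P_H0 = 0.41       # support under H0 (from the slides)
N = 2598          # sample size (from the slides)
N_REPEATS = 2000  # assumption: how often we repeat the procedure


def rejects_h0(observed_support, cutoff):
    """The decision rule: reject H0 when the observed support is at or below the cutoff."""
    return observed_support <= cutoff


def wrong_rejection_rate(seed=2):
    """Share of samples drawn under a true H0 for which the rule rejects H0."""
    standard_error = sqrt(P_H0 * (1 - P_H0) / N)
    cutoff = NormalDist(P_H0, standard_error).inv_cdf(0.05)
    rng = random.Random(seed)
    wrong = 0
    for _ in range(N_REPEATS):
        support = sum(rng.random() < P_H0 for _ in range(N)) / N
        wrong += rejects_h0(support, cutoff)
    return wrong / N_REPEATS


rate = wrong_rejection_rate()
print(f"H0 was wrongly rejected in {rate:.1%} of the repetitions")
```

The rate describes the procedure over many uses; it says nothing about how strongly any single sample speaks against H0.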

18
Thought experiment (2)
  • If H0 is true, then the probability of drawing a
    sample of size 2598 with a support for PrdV of
    31% or less is 1.041 x 10^-25.
  • This is so small that we think it is safe to
    reject H0.

19
Where did that 1.041 x 10^-25 come from?
  • Of the 100,000 samples that were drawn from the
    population as it would be if H0 were true, none
    had a mean of less than .31
  • So the probability of drawing this or a more
    extreme sample when H0 is true is less than
    1/100,000.
  • Remember that if H0 is true, the distribution of
    means obtained from many samples is normal with a
    mean of .41 and a standard deviation of .0096
  • The proportion of samples with a mean less than
    .31 is 1.041 x 10^-25
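A probability this small is easiest to get from the normal distribution directly. The sketch below is an assumption about how such a number could be computed, not a record of how the slide's figure was produced. It uses the complementary error function `math.erfc`, which stays accurate far out in the tail, and the rounded standard deviation of .0096 quoted on the slide; an unrounded standard error gives a different but similarly tiny value.

```python
from math import erfc, sqrt


def lower_tail(x, mean, sd):
    """P(X <= x) for a normal distribution, accurate far out in the tail."""
    z = (x - mean) / sd
    return 0.5 * erfc(-z / sqrt(2))


# Mean and standard deviation of the sampling distribution under H0,
# using the rounded values quoted on the slide.
p_value = lower_tail(0.31, mean=0.41, sd=0.0096)
print(f"P(support <= .31 | H0 true) = {p_value:.3e}")
```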

20
Fisher hypothesis testing
  • This procedure is Fisher hypothesis testing.
  • Note that it gives us a measure of evidence
    against H0, but it does not give us an indication
    of how likely we are to make the wrong decision.

21
Fisher vs. Neyman Pearson
  • You will draw the same conclusion whichever
    method you use.
  • However, it really helps to choose one approach
    when writing your results down.

22
Limits to inference
  • More importantly, both assume random sampling,
    and we almost never have that.
  • Testing is more helpful to determine whether the
    data is screaming or whispering at us.
  • Knowing the reasoning behind statistical
    inference will help you determine the weight you
    should assign to conclusions derived from
    statistical tests.

23
Terminology (1)
  • The distribution of means obtained from different
    samples is the sampling distribution of the mean.
  • The standard deviation of the sampling
    distribution is the standard error.
  • The proportion of samples that wrongly reject H0
    is the significance level, or α, or Type I error
    rate.
  • The proportion of samples that wrongly fail to
    reject H0 is the Type II error rate, or β.
  • The proportion of samples that will rightly
    reject H0 is the power.
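These terms can be made concrete with the PrdV numbers. The Type II error rate and the power depend on what the true support is, so the sketch below assumes a true support of 39%; that value is hypothetical and chosen purely for illustration, as are the names used.

```python
from math import sqrt
from statistics import NormalDist

N = 2598       # sample size (from the slides)
P_H0 = 0.41    # support under H0 (from the slides)
P_TRUE = 0.39  # hypothetical true support, used only to illustrate power


def sampling_distribution(p):
    """Normal approximation to the sampling distribution of the mean."""
    standard_error = sqrt(p * (1 - p) / N)
    return NormalDist(mu=p, sigma=standard_error)


# Decision rule: reject H0 when the sample support is at or below the cutoff.
cutoff = sampling_distribution(P_H0).inv_cdf(0.05)

type_i_rate = sampling_distribution(P_H0).cdf(cutoff)  # wrongly reject H0
power = sampling_distribution(P_TRUE).cdf(cutoff)      # rightly reject H0
type_ii_rate = 1 - power                               # wrongly keep H0

print(f"Type I error rate (alpha): {type_i_rate:.3f}")
print(f"Type II error rate (beta): {type_ii_rate:.3f}")
print(f"power:                     {power:.3f}")
```

Moving the hypothetical true support further below 41% raises the power, since fewer samples then land above the cutoff.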

24
Terminology (2)
  • The probability of drawing this or a more extreme
    sample, given that H0 is true, is the p-value.
  • The maximum p-value that will cause you to reject
    H0 is also the level of significance.

25
What to do before Wednesday?
  • Read Chapter 8
  • Do exercises of chapter 8