Statistics for Linguistics Students - PowerPoint PPT Presentation

About This Presentation
Title:

Statistics for Linguistics Students

Description:

Statistics for Linguistics Students Michaelmas 2004 Week 5 Bettina Braun www.phon.ox.ac.uk/~bettina Overview P-values How can we tell that data are taken from a ... – PowerPoint PPT presentation

Slides: 26
Provided by: phonOxAc

Transcript and Presenter's Notes



1
Statistics for Linguistics Students
  • Michaelmas 2004
  • Week 5
  • Bettina Braun
  • www.phon.ox.ac.uk/~bettina

2
Overview
  • P-values
  • How can we tell that data are taken from a normal
    distribution?
  • Speaker normalisation
  • Data aggregation
  • Practicals
  • Non-parametric tests

3
p-values
  • p-values for all tests tell us whether or not to
    reject the null hypothesis (and with what
    confidence)
  • In linguistic research, a confidence level of 95%
    is often sufficient; some researchers use 99%
  • This decision is up to you. Note that the more
    stringent your confidence level, the more likely
    a type II error becomes (you don't find a
    difference that is actually there)

4
p-values
  • If you decide on a significance level of 0.05
    (i.e. a 95% confidence level), then a p-value
    smaller than 0.05 indicates that you can reject
    the null hypothesis
  • Remember that the null hypothesis generally
    predicts that there is no difference
  • If the output says p = 0.000, we cannot be
    certain that it is not, say, 0.00049, so we
    generally report p < 0.001
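A minimal Python sketch of the decision rule and reporting convention on this slide. The function names `decide` and `report_p` are our own (hypothetical), and the significance level is assumed to have been fixed before the data were analysed.

```python
def decide(p: float, alpha: float = 0.05) -> str:
    """Compare a p-value with a significance level chosen in advance."""
    if p < alpha:
        return "reject the null hypothesis"
    return "cannot reject the null hypothesis"


def report_p(p: float) -> str:
    """Format a p-value for a write-up.

    Output rounded to three decimals that shows 0.000 only tells us the
    value is below 0.0005, so such values are reported as p < 0.001.
    """
    if p < 0.001:
        return "p < 0.001"
    return f"p = {p:.3f}"


print(decide(0.07), "/", report_p(0.07))
print(decide(0.00049), "/", report_p(0.00049))
```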

5
p-values
  • So, in a t-test, p = 0.07 means that you cannot
    reject the null hypothesis that there is no
    difference → there is no significant difference
    between the two groups
  • In the Levene test for homogeneity of variances,
    if p = 0.001, then you have to reject the
    null hypothesis that there is no difference → so
    there is a difference in the variances of the
    two groups

6
Kolmogorov-Smirnov test
  • Parametric tests assume that the data are taken
    from normal distributions
  • The Kolmogorov-Smirnov test can be used to compare
    actual data to a normal distribution
  • The cumulative probabilities of values in the
    data are compared with the cumulative
    probabilities in a theoretical normal
    distribution
  • Null hypothesis: your sample is taken from a
    normal distribution

7
Kolmogorov-Smirnov test
  • Non-parametric test
  • The Kolmogorov-Smirnov statistic is the greatest
    difference in cumulative probabilities across
    the range of values
  • If its value exceeds a critical threshold, the
    null hypothesis is rejected
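To make the statistic concrete, here is a small Python sketch that computes D against a normal distribution with a given mean and standard deviation, using only the standard library. All names are hypothetical. The `ks_z` helper assumes that the Kolmogorov-Smirnov Z printed by SPSS is sqrt(n) times D; check this against your SPSS version. Note also that when the mean and standard deviation are estimated from the same sample, the classical Kolmogorov-Smirnov p-value is conservative (the Lilliefors variant corrects for this).

```python
import math
from typing import Sequence


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Cumulative probability of x under a normal distribution (mean mu, sd sigma)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic(data: Sequence[float], mu: float, sigma: float) -> float:
    """Greatest absolute difference between the sample's cumulative
    probabilities (empirical CDF) and those of the normal distribution."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = normal_cdf(x, mu, sigma)
        # The empirical CDF steps from (i - 1)/n up to i/n at x,
        # so compare the theoretical value with both ends of the step.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def ks_z(data: Sequence[float], mu: float, sigma: float) -> float:
    """Kolmogorov-Smirnov Z, assumed here to be sqrt(n) * D."""
    return math.sqrt(len(data)) * ks_statistic(data, mu, sigma)


sample = [-1.0, 0.0, 1.0]  # hypothetical toy data
print(ks_statistic(sample, 0.0, 1.0), ks_z(sample, 0.0, 1.0))
```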

8
Kolmogorov-Smirnov test
  • The Kolmogorov-Smirnov test is not significant,
    i.e. the null hypothesis that our sample is drawn
    from a normal distribution cannot be rejected
  • The distribution can therefore be assumed to be
    normal (Kolmogorov-Smirnov Z = 0.59, p = 0.9)
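The p-value that goes with a Kolmogorov-Smirnov Z can be approximated with the asymptotic Kolmogorov distribution. This Python sketch (function name hypothetical) assumes Z = sqrt(n) · D and a reasonably large sample; for Z = 0.59 it gives about 0.88, consistent with the non-significant result above.

```python
import math


def kolmogorov_p(z: float, terms: int = 100) -> float:
    """Asymptotic p-value for a Kolmogorov-Smirnov Z statistic.

    Uses the Kolmogorov series P(K > z) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 z^2).
    """
    if z < 0.2:
        # The series converges slowly here, and P(K > z) is practically 1 anyway.
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * z * z)
                for k in range(1, terms + 1))
    # Clamp to [0, 1] to guard against rounding error
    return min(1.0, max(0.0, 2.0 * total))


print(round(kolmogorov_p(0.59), 3))
```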

9
Speaker normalisation
  • We often collect data from different subjects but
    are not interested in the speaker differences
    (e.g. mean pitch height, average speaking rate)
  • We can convert the data to z-scores, which tell
    us how many standard deviations a given score is
    away from the speaker mean
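A minimal Python sketch of per-speaker z-scores, mirroring what the SPSS steps on the next two slides do. The data and the function name are hypothetical; the sketch uses the sample standard deviation (n - 1 denominator).

```python
from collections import defaultdict
from statistics import mean, stdev


def z_normalise(rows):
    """Return (speaker, value, z) triples, where
    z = (value - speaker mean) / speaker standard deviation."""
    values_by_speaker = defaultdict(list)
    for speaker, value in rows:
        values_by_speaker[speaker].append(value)
    # Mean and sample standard deviation for each speaker;
    # every speaker needs at least two values that are not all identical.
    params = {s: (mean(v), stdev(v)) for s, v in values_by_speaker.items()}
    return [(s, v, (v - params[s][0]) / params[s][1]) for s, v in rows]


# Hypothetical mean-pitch values (Hz) for two speakers
data = [("A", 200.0), ("A", 220.0), ("A", 240.0),
        ("B", 100.0), ("B", 110.0), ("B", 120.0)]
for row in z_normalise(data):
    print(row)
```

After normalisation both speakers have scores centred on 0, so the difference in their overall pitch level no longer dominates the data.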

10
Speaker normalisation in SPSS
  • First, you have to split the file according to
    the speakers (Data -> Split File)

11
Speaker normalisation in SPSS
  • Then, Analyze -> Descriptive Statistics ->
    Descriptives
  • With "Save standardized values as variables"
    ticked, this creates output and also a new column
    with z-scores

12
Sorting data for within-subjects designs
13
Aggregating data
  • You can easily compute means for different
    categories while preserving the structure of the
    SPSS table
  • Data -> Aggregate
  • Independent variables you want to preserve are
    break variables
  • Dependent variables for which you'd like to
    calculate the mean are aggregated variables
  • By default, the new table is saved as aggr.sav
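The same idea in Python, as a rough sketch: the break variables form the grouping key and the aggregated variable is averaged within each group. The column names (`speaker`, `condition`, `duration`), the data and the function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def aggregate(rows, break_vars, aggregated_var):
    """Mean of `aggregated_var` for every combination of the break variables."""
    groups = defaultdict(list)
    for row in rows:
        # One group per combination of break-variable values
        key = tuple(row[b] for b in break_vars)
        groups[key].append(row[aggregated_var])
    return {key: mean(values) for key, values in groups.items()}


# Hypothetical trial-level data (durations in ms)
trials = [
    {"speaker": "s1", "condition": "fast", "duration": 100},
    {"speaker": "s1", "condition": "fast", "duration": 120},
    {"speaker": "s1", "condition": "slow", "duration": 200},
    {"speaker": "s2", "condition": "fast", "duration": 90},
]
print(aggregate(trials, ["speaker", "condition"], "duration"))
```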

14
Aggregating data
  • (Figure: SPSS Aggregate dialogue box)

15
Non-parametric tests
  • If the assumptions for parametric tests are not
    met, you have to use non-parametric tests.
  • They are statistically less powerful (i.e. they
    are more likely not to find a difference that is
    actually there: a Type II error)
  • On the other hand, if a non-parametric test shows
    a significant difference, you can draw strong
    conclusions

16
Mann-Whitney test
  • Non-parametric equivalent of the independent
    t-test
  • Null hypothesis: the two samples we are comparing
    are from the same distribution
  • All data are ranked and calculations are done on
    the ranks
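A minimal Python sketch of the ranking step and the resulting U statistic. Tied values receive the average of their ranks; the sketch computes only U, not its p-value (for that you would use SPSS or a statistics library). Names and data are hypothetical.

```python
def rank(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the run of tied values order[i..j]
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i..j, as 1-based ranks
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Pool both samples, rank them, and derive U from the rank sum of `a`."""
    ranks = rank(list(a) + list(b))
    r_a = sum(ranks[: len(a)])
    u_a = r_a - len(a) * (len(a) + 1) / 2  # pairs where a value from a beats b
    u_b = len(a) * len(b) - u_a
    return min(u_a, u_b)


print(mann_whitney_u([1, 3], [2, 4]))
```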

17
Wilcoxon Signed ranks test
  • Non-parametric equivalent of the paired t-test
  • The absolute differences between the two
    conditions are ranked
  • Then the signs are reattached and the sums of
    the negative and positive ranks are compared
  • Requires that the two samples are drawn from
    populations with the same distribution shape (if
    this is not the case, use the Sign Test)
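The signed-rank procedure can be sketched in Python as follows. Zero differences are dropped (a common convention, assumed here), ties share average ranks, and the function returns the sums of positive and negative ranks; names and data are hypothetical.

```python
def rank(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_signed_ranks(x, y):
    """Return (sum of positive ranks, sum of negative ranks) for paired data."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    ranks = rank([abs(d) for d in diffs])            # rank the absolute differences
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


print(wilcoxon_signed_ranks([5, 7, 9, 4], [3, 8, 6, 4]))
```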

18
Examples
  • English is closer to German than French is
  • A teacher compares the marks of a group of German
    students who take English and French (according
    to the German system from 1 to 15)
  • His research hypothesis is that pupils have
    better marks in English than in French
  • One-tailed prediction!
  • File: language_marks.sav

19
Example
  • For a one-tailed test, divide the (two-tailed)
    significance value by 2, provided the difference
    goes in the predicted direction
  • Marks in English are better than in French
    (Z = -2.28, p = 0.011)
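Assuming the reported Z is approximately standard normal, the two-tailed and one-tailed p-values can be checked with Python's `math.erfc`. For Z = -2.28 the one-tailed value comes out at about 0.011, as on the slide. The helper names are hypothetical; whether the difference goes in the predicted direction has to be checked against the data (e.g. which language has the better marks).

```python
import math


def two_tailed_p(z: float) -> float:
    """Two-tailed p-value for a standard normal Z: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def one_tailed_p(z: float, in_predicted_direction: bool = True) -> float:
    """Halve the two-tailed value, but only if the effect goes the predicted way."""
    half = two_tailed_p(z) / 2
    return half if in_predicted_direction else 1.0 - half


print(round(two_tailed_p(-2.28), 3), round(one_tailed_p(-2.28), 3))
```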

20
What are frequency data?
  • Number of subjects/events in a given category
  • You can then test whether the observed
    frequencies deviate from your expected
    frequencies
  • E.g. in an election, there is an a priori chance
    of 50:50 for each candidate.
  • Note that you must determine your expected
    frequencies beforehand

21
χ²-test
  • Null hypothesis: there is no difference between
    expected and observed frequencies
  • Data
  • Calculation: $\chi^2 = \sum \frac{(O - E)^2}{E}$

             Kerry supporter   Bush supporter
observed     56                44
expected     50                50
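The calculation for this table, as a minimal Python sketch (function name hypothetical). It gives χ² = 1.44 with df = 1; since the tabulated critical value for df = 1 at the 0.05 level is 3.84, the 56/44 split does not differ significantly from 50/50.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all cells; inputs must be frequencies (counts)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [56, 44]  # Kerry supporters, Bush supporters
expected = [50, 50]  # a priori 50:50 split of 100 voters
chi2 = chi_square(observed, expected)
df = len(observed) - 1  # one variable with a = 2 categories
print(chi2, df)
```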
22
χ²-test example
  • Null hypothesis: there is no difference between
    expected and observed frequencies
  • Data
  • Calculation

             Kerry supporter   Bush supporter
observed
expected
23
Looking up the p-value
  • The calculated value of χ² must be larger than the
    critical value found in the table
  • Degrees of freedom
  • If there is one independent variable: df = a - 1
  • If there are two independent variables:
    df = (a - 1)(b - 1)
    (a, b = number of categories of each variable)
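A sketch of the degrees-of-freedom rule and the table lookup in Python. The dictionary holds the standard critical values of χ² at the 0.05 level for df = 1 to 6; the function names are hypothetical, and larger df would need a fuller table.

```python
# Standard critical values of chi-square at the 0.05 significance level
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 6: 12.592}


def degrees_of_freedom(*categories: int) -> int:
    """df = a - 1 for one variable, (a - 1)(b - 1) for two variables,
    where a and b are the numbers of categories."""
    df = 1
    for n in categories:
        df *= n - 1
    return df


def significant_at_05(chi2: float, *categories: int) -> bool:
    """True if chi2 exceeds the tabulated critical value (KeyError if df > 6)."""
    return chi2 > CRITICAL_05[degrees_of_freedom(*categories)]


print(degrees_of_freedom(2), degrees_of_freedom(3, 4))
print(significant_at_05(1.44, 2))
```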

24
χ²-test
  • Limitations
  • All raw data for χ² must be frequencies (not
    percentages!)
  • Each subject or event is counted only once (if we
    wish to find out whether boys or girls are more
    likely to pass or fail a test, we might observe
    the performance of 100 children on a test. We may
    not observe the performance of 25 children on 4
    tests, however)
  • The total number of observations should be
    greater than 20
  • The expected frequency in any cell should be
    greater than 5
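Some of these limitations can be checked mechanically before running the test; the "count each subject once" rule cannot, because it depends on how the data were collected. A hypothetical Python helper, which (as an assumption of this sketch) treats any non-integer observed value as a sign that percentages or proportions were entered instead of counts:

```python
def chi_square_problems(observed, expected):
    """List violations of the rules of thumb above (an empty list means none found)."""
    problems = []
    if any(not isinstance(o, int) for o in observed):
        problems.append("observed values must be counts, not percentages")
    if sum(observed) <= 20:
        problems.append("total number of observations should be greater than 20")
    if any(e <= 5 for e in expected):
        problems.append("every expected frequency should be greater than 5")
    return problems


print(chi_square_problems([56, 44], [50, 50]))
print(chi_square_problems([3, 4], [3.5, 3.5]))
```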

25
χ² as test of association
  • Calculation of expected frequencies
  • Expected cell frequency:
    $E = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$

Aspect             Past tense   Present tense   Total
Progressive        308          476             784
Non-progressive    315          297             612
Total              623          773             1396
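Expected frequencies for the aspect × tense table, and the resulting χ², as a minimal Python sketch (names hypothetical). Each expected value is row total × column total / grand total, e.g. 784 × 623 / 1396 ≈ 349.88 for progressive past; the sketch gives χ² ≈ 20.65 with df = (2 - 1)(2 - 1) = 1, well above the critical value of 3.84 at the 0.05 level.

```python
def expected_frequencies(table):
    """E_ij = row total * column total / grand total for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]


observed = [[308, 476],   # progressive: past, present
            [315, 297]]   # non-progressive: past, present
expected = expected_frequencies(observed)
chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))
print([[round(e, 2) for e in row] for row in expected], round(chi2, 2))
```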