Statistics for Linguistics Students

1
Statistics for Linguistics Students
  • Michaelmas 2004
  • Week 4
  • Bettina Braun
  • www.phon.ox.ac.uk/~bettina

2
Overview
  • Discussion of last assignment
  • z-distribution vs. t-distribution
  • Between-subjects design vs. Within-subjects
    design
  • t-tests
  • for independent samples
  • for dependent samples

3
Exercise: z-scores
1) The mean pause duration in a read text is
200 ms with a standard deviation of 50 ms. For the
calculations, please specify how you reached your
conclusion!
a) Is this a statistic or a parameter?
   If we are interested in describing this
   particular read text, then it's a parameter.
   If we use this text to draw inferences about
   pause duration in any text, then it's a
   statistic.
b) What proportion of the data is above 70 ms?
   z = -2.6; 0.47% of the data lie below 70 ms,
   99.53% of the data lie above 70 ms
c) What proportion of the data falls between
   100 ms and 300 ms?
   z = ±2; 2.28% lie below 100 ms and 2.28% lie
   above 300 ms, 95.44% lie between 100 ms and
   300 ms
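
The proportions in this exercise can be checked with a short Python sketch. It assumes that pause durations are normally distributed (the exercise implies this but does not state it) and uses NormalDist from the standard library; the variable names are only illustrative.

```python
from statistics import NormalDist

# Pause durations as given in the exercise: mean 200 ms, SD 50 ms.
# Assumption: the durations are normally distributed.
pauses = NormalDist(mu=200, sigma=50)

# b) z-score for 70 ms and the proportions below and above it
z_70 = (70 - pauses.mean) / pauses.stdev
below_70 = pauses.cdf(70)
above_70 = 1 - below_70

# c) proportion between 100 ms and 300 ms (z = -2 and z = +2)
between_100_300 = pauses.cdf(300) - pauses.cdf(100)

print(f"z for 70 ms: {z_70:.1f}")
print(f"below 70 ms: {below_70:.2%}, above 70 ms: {above_70:.2%}")
print(f"between 100 and 300 ms: {between_100_300:.2%}")
```

The last value comes out as 95.45% rather than 95.44%, because the table-based answer rounds each tail to 2.28% before subtracting.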
4
Exercise: sampling distribution
  • 2) If we have a sample size of 50, what does the
    sampling distribution of the mean look like if
    the population is
  • U-shaped,
  • skewed-left, and
  • normally distributed?
  • Because of the central limit theorem, the
    sampling distribution of the mean will be
    approximately normally distributed,
    irrespective of the form of the parent
    distribution
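
A small simulation can make this answer concrete. The sketch below is an illustration only: it assumes a Beta(0.5, 0.5) distribution as a stand-in for a U-shaped population, draws 5000 samples of size 50, and compares the spread of the sample means with sigma / sqrt(n). The seed and the number of samples are arbitrary choices.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

def sample_mean(n):
    # U-shaped parent distribution: Beta(0.5, 0.5) piles up near 0 and 1
    return statistics.fmean(random.betavariate(0.5, 0.5) for _ in range(n))

n = 50
means = [sample_mean(n) for _ in range(5000)]

# Beta(0.5, 0.5) has mean 0.5 and variance 1/8
expected_se = (1 / 8) ** 0.5 / n ** 0.5

# Share of sample means within 1.96 standard errors of the population mean;
# for a normal sampling distribution this should be close to 95%
within = sum(abs(m - 0.5) <= 1.96 * expected_se for m in means) / len(means)

print(f"mean of sample means: {statistics.fmean(means):.3f}")
print(f"SD of sample means:   {statistics.stdev(means):.3f}")
print(f"sigma / sqrt(n):      {expected_se:.3f}")
print(f"share within 1.96 SE: {within:.1%}")
```

Replacing betavariate(0.5, 0.5) with a skewed or a normal generator should give the same picture: the sample means cluster symmetrically around the population mean.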

5
Exercise: central limit theorem, standard error
  • 3) What happens to the following statistics if
    the sample size increases?
  • Does the estimated mean increase, decrease, or
    stay approximately the same? Why? It stays
    approximately the same, as the sample mean is an
    adequate estimate of the population mean
    (central limit theorem)
  • Does the standard error increase, decrease, or
    stay approximately the same? Why? The standard
    error decreases; it is inversely proportional to
    the square root of the sample size
    (standard error = σ / √n)
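
The effect of sample size on the standard error can be sketched in a few lines. The population standard deviation of 50 ms is a hypothetical value, chosen only for illustration.

```python
import math

sigma = 50  # hypothetical population standard deviation (ms)

def standard_error(sigma, n):
    """Standard error of the mean for samples of size n."""
    return sigma / math.sqrt(n)

# Each step quadruples the sample size
for n in (10, 40, 160, 640):
    print(f"n = {n:3d}  standard error = {standard_error(sigma, n):.2f}")
```

Quadrupling the sample size halves the standard error, which is what the square root in the formula implies.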

6
What are frequency data?
  • Number of subjects/events in a given category
  • You can then test whether the observed
    frequencies deviate from your expected
    frequencies
  • E.g. in an election, there is an a priori chance
    of 50-50 for each candidate.

7
χ²-test
  • Null-hypothesis: there is no difference between
    expected and observed frequency
  • Data
  • Calculation

           Kerry supporter   Bush supporter
observed
expected
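
The calculation can be sketched as follows, using the usual χ² formula (the sum over all categories of (observed - expected)² / expected). The observed counts are hypothetical and are not the values from the slide, which did not survive in the transcript. The critical value 3.841 is the standard table value for df = 1 at the 0.05 level.

```python
# Hypothetical counts, for illustration only (not the values from the slide)
observed = {"Kerry supporter": 60, "Bush supporter": 40}

total = sum(observed.values())
# Null hypothesis: a priori 50-50 chance for each candidate
expected = {name: total / len(observed) for name in observed}

chi_square = sum(
    (observed[name] - expected[name]) ** 2 / expected[name] for name in observed
)
df = len(observed) - 1

# Critical chi-square value for df = 1 at alpha = 0.05 (from a standard table)
CRITICAL_DF1 = 3.841

print(f"chi-square = {chi_square:.2f}, df = {df}")
print("reject H0" if chi_square > CRITICAL_DF1 else "do not reject H0")
```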
8
χ²-test
  • Limitations
  • All raw data for χ² must be frequencies
  • Each subject or event is counted only once (if we
    wish to find out whether boys or girls are more
    likely to pass or fail a test, we might observe
    the performance of 100 children on a test. We may
    not observe the performance of 25 children on 4
    tests, however)
  • The total number of observations should be
    greater than 20
  • The expected frequency in any cell should be
    greater than 5

9
Looking up the p-value
  • Degrees of freedom
  • If there is one independent variable with a
    levels: df = a - 1
  • If there are two independent variables with a
    and b levels: df = (a - 1)(b - 1)
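
As a minimal sketch, the two rules can be written as one helper. The function name chi_square_df is hypothetical; it multiplies (number of levels - 1) over the independent variables.

```python
def chi_square_df(*levels):
    """Degrees of freedom: product of (levels - 1) over the independent variables."""
    df = 1
    for n_levels in levels:
        df *= n_levels - 1
    return df

print(chi_square_df(2))     # one independent variable with 2 levels
print(chi_square_df(2, 3))  # two independent variables with 2 and 3 levels
```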

10
Exercise: dependent and independent variables
  • Generally, in hypothesis testing, the independent
    variable is hardly ever interval. Mostly it is
    nominal or ordinal
  • Differentiate between
  • Number of independent variables (e.g. gender and
    exam year for the score example: 2)
  • Levels of an independent variable: the number
    of values it can take (e.g. gender: generally 2)
  • The null-hypothesis is formulated to deny a
    relation between dependent and independent
    variable

11
Exercise: dependent and independent variables
  • Imagine you have a text-to-speech synthesis
    system. You are interested in finding out whether
    the acceptability (from 1 to 5) is increased if
    you model short pauses at syntactic phrases.
  • Dependent variable: acceptability (ordinal data)
  • Independent variable: TTS with/without pause
    model (2 levels)
  • Null-hypothesis: Duration model does not
    influence acceptability rating

12
Exercise: dependent and independent variables
  • Subjects learned 20 nonsense-words presented
    visually. 30 minutes later they were tested for
    retention. The next day, the same subjects
    learned another 20 nonsense-words, this time in a
    combined visual and auditory presentation. Again,
    after 30 minutes they were tested for retention.
    The researcher measured the number of correct
    nonsense-words.
  • Dependent variable: number of correct responses
    (interval data)
  • Independent variable: kind of presentation (2
    levels)
  • Null-hypothesis: The number of correct responses
    will be the same in the two conditions

13
Further influencing factors
  • Besides the independent variable, there might be
    further factors that influence your dependent
    variable.
  • Other factors might be confounded with our
    independent variable (e.g. in the nonword
    retention task, the audio-visual presentation was
    on a different day than the visual
    presentation. Presentation kind can thus be
    confounded with presentation time)
  • Systematic error

14
Counterbalancing
  • To avoid confounding variables, the conditions
    have to be counterbalanced. Example:
  • Half the subjects do the visual
    presentation first and the audio-visual
    presentation second
  • Half the subjects do the task in the opposite
    order
  • We often have a group of subjects perform the
    task (not just one subject)
  • Also, in linguistic research, we often use
    multiple repetitions or different lexicalisations
    for a given condition (e.g. different words that
    all have a CVCV structure)
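
Counterbalancing the order of two conditions can be sketched as below. The function name counterbalance, the condition labels "A" and "B", the subject IDs, and the seed are all hypothetical; the sketch simply shuffles the subjects and gives each half one of the two orders.

```python
import random

def counterbalance(subjects, conditions=("A", "B"), seed=0):
    """Randomly assign half the subjects to each order of the two conditions."""
    shuffled = list(subjects)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    first_order = tuple(conditions)
    second_order = tuple(reversed(conditions))
    orders = {s: first_order for s in shuffled[:half]}
    orders.update({s: second_order for s in shuffled[half:]})
    return orders

# 20 hypothetical subjects, S01 to S20
orders = counterbalance([f"S{i:02d}" for i in range(1, 21)])
for subject in sorted(orders):
    print(subject, "->", " then ".join(orders[subject]))
```

With an even number of subjects, exactly half get the order A then B and half get B then A, so presentation order is no longer confounded with condition.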

15
Exercise: drawing error-bars
  • Variables need to have the correct type!
  • Error bars show the 95% confidence interval for
    the mean (i.e. the mean and the range within
    which the population mean is expected to lie
    with 95% confidence)
  • One independent variable
  • Simple error bar for groups of variables
  • Two independent variables
  • Clustered error bar for groups of variables
16
Exercise drawing error-bars
Clustered error bars for two independent variables
17
Example: testing whether a sample is drawn from a
given population
  • A lecturer at Oxford University expects that
    students at this university have a higher
    IQ-score than the average British population.
  • From existing records, he knows that the mean
    IQ-score in Britain is 200 with a standard
    deviation of 32

18
Experimental Procedure
  • The null-hypothesis H0 is that the IQ of Oxford
    students is no different from that of the
    general public.
  • He randomly selects 40 students and gives them
    the standard IQ test.
  • This results in a mean IQ-score of 210
  • Questions:
  • Can he conclude that Oxford students have a
    higher IQ?
  • Can he compare his sample to the population?

19
Comparison to population
  • The sample mean cannot be compared directly to
    the whole population, only to the sampling
    distribution of the sample mean (with samples of
    size n = 40).
  • The sampling distribution has the same mean as
    the population (200) and a standard error of
    32 / √40 ≈ 5.06
20
Calculating z-score
  • Since the sampling distribution will be normally
    distributed (for n > 30), we can calculate the
    z-score to see how likely a mean of 210 is, given
    that the null-hypothesis is true:
    z = (210 - 200) / (32 / √40) ≈ 1.98

There is a 2.4% chance of obtaining a sample mean
of 210 or higher if the null-hypothesis is true
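
The numbers in this example can be reproduced with a short sketch. It uses the values given on the slides (population mean 200, standard deviation 32, n = 40, sample mean 210) and a one-tailed test, since the lecturer expects a higher score; the variable names are only illustrative.

```python
import math
from statistics import NormalDist

population_mean = 200   # mean IQ-score given on the slide
population_sd = 32      # standard deviation given on the slide
n = 40                  # number of students sampled
sample_mean = 210       # observed mean of the sample

standard_error = population_sd / math.sqrt(n)
z = (sample_mean - population_mean) / standard_error

# One-tailed p-value: probability of a sample mean this high or higher under H0
p_one_tailed = 1 - NormalDist().cdf(z)

print(f"standard error = {standard_error:.2f}")
print(f"z = {z:.2f}, one-tailed p = {p_one_tailed:.3f}")
```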
21
What if the population is unknown?
  • Often, we compare two different samples and we do
    not know the population parameters (e.g. are
    exam scores of the year 1990 and 2000 from the
    same distribution?)
  • Independent variable (number of levels?)
  • Dependent variable (type?)

22
What if the population is unknown?
  • Often, we compare two different samples and we do
    not know the population parameters (e.g. are
    exam scores of the year 1990 and 2000 from the
    same distribution?)
  • Independent variable (number of levels?): exam
    year (2 levels)
  • Dependent variable (type?): exam score (interval
    data)

23
Hypothesis
  • Null-hypothesis: The scores in the 2 exam years
    were drawn from the same distribution
  • Comparison of the means of the two populations
    (estimated from two representative samples)
  • What statistical test do we have to perform?

24
Between-subjects design (completely randomised)
  • All comparisons between the different conditions
    are based on comparisons between different
    (groups of) subjects
  • Each subject provides data for only one research
    condition
  • Example: You want to test whether the pitch of
    children under the age of 10 is dependent on
    their gender (a given child is either male or
    female!)

25
Within-subjects design (repeated measures)
  • All comparisons between different conditions are
    based on comparisons within the same group of
    subjects
  • Each subject provides data for all experimental
    conditions (as many scores as experimental
    conditions)
  • Example: You want to test whether the number of
    reading errors is higher when a subject is
    slightly drunk than when sober.

26
Why is this difference important?
  • On average, two scores from the same person (two
    from P1, or two from P2) will be more alike than
    two scores from different persons (one from P1
    and one from P2)
  • Scores from one person on the same task will be
    correlated; this is taken into account by
    within-subjects tests.
  • If a between-subjects test is used for a
    within-subjects design, we may fail to find an
    effect (type II error)
  • If a within-subjects test is used for a
    between-subjects design, we might find an effect
    that is actually not there (type I error)

27
Example
  • You want to test whether the precontext has an
    effect on the prosodic realisation of
    sentence-initial accents.
  • You construct 20 sentences, which can appear in
    two different contexts, say contrastive and
    non-contrastive.
  • Then you ask 20 subjects to read the 40 short
    paragraphs and measure the pitch height of the
    initial accent and the duration of the initial
    word.
  • You want to know if accents are realised
    differently in contrastive and non-contrastive
    context.

28
Difficult cases
  • Different classes of dependent variables
  • If you are interested in articulatory precision
    at two different speech rates, you might measure
    the formant values of the vowels and the number
    of sound elisions
  • These two dependent variables are taken from the
    same speaker but this is not a within-subjects
    design

29
Difficult cases
  • More than one measurement per subject, combined
    to give one score
  • You are interested in the formant values of male
    and female /a/. You have a list of 20 words,
    each containing an /a/. Each group of 10 speakers
    reads the 20 words and you measure the formant
    values. Then you compute the mean formant value
    of /a/ for every speaker
  • Since the analysis is performed on only one score
    per subject, this is not a within-subjects design

30
Which statistical test when you have score data
(parametric tests)?

Between, within,   Number of independent   Significance test
mixed?             variables?
Between            One                     Indep. t-test (2 levels)
                                           One-way ANOVA
Between            More than one           Two-/Three-way ANOVA
Within             One                     Paired t-test (2 levels)
                                           a x s ANOVA
Within             More than one           a x b (x c) x s ANOVA
Mixed
31
Assumptions for statistical tests on score data
(parametric tests)
  • The scores must be from an interval scale
  • The scores must be normally distributed in the
    population
  • The variances in the conditions must be
    homogeneous
  • Note: You can perform parametric tests only if
    these assumptions are met!

32
T-Test
  • Student's t-test
  • How likely is it that two samples are taken from
    the same population?
  • The t-test looks at the ratio of the difference
    between the group means to the variability
    within the groups

[Figure: distributions of Sample 1 and Sample 2.
Figure taken from
http://esa21.kennesaw.edu/modules/basics/exercise3/3-8.htm]
33
T-Tests
  • Calculating the t-statistic
  • Comparable to the z-statistic, but dependent on
    the degrees of freedom (df)
  • Degrees of freedom (df)
  • Independent t-test: N1 + N2 - 2
  • Paired t-test: N - 1
  • The critical t-value for α = 0.05 (5% risk of
    finding an effect that is not actually there) is
    dependent on df
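
As a minimal sketch of an independent t-test, the code below computes the t-statistic with a pooled variance (which assumes homogeneous variances) and compares it with the critical value from a t-table. The reaction times are hypothetical; 2.228 is the standard two-tailed table value for df = 10 at the 0.05 level.

```python
import math
import statistics

# Hypothetical reaction times (ms) from two different groups, illustration only
cond_a = [520, 540, 510, 560, 530, 550]
cond_b = [480, 500, 470, 490, 510, 460]

n1, n2 = len(cond_a), len(cond_b)
mean_diff = statistics.mean(cond_a) - statistics.mean(cond_b)

# Pooled variance, assuming homogeneous variances in the two conditions
pooled_var = (
    (n1 - 1) * statistics.variance(cond_a) + (n2 - 1) * statistics.variance(cond_b)
) / (n1 + n2 - 2)
se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

t = mean_diff / se_diff
df = n1 + n2 - 2

# Two-tailed critical t-value for alpha = 0.05 and df = 10
T_CRIT_DF10 = 2.228
print(f"t = {t:.2f}, df = {df}, significant: {abs(t) > T_CRIT_DF10}")
```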

34
T-distribution
  • The more degrees of freedom, the closer the
    t-distribution is to the normal distribution

35
T-Table
36
One-tailed vs. two-tailed predictions
  • If we predict a direction of the difference, we
    are making a one-tailed prediction
  • If we predict that there is a difference
    (irrespective of direction), we are making a
    two-tailed prediction
  • If there is not enough evidence for a directional
    difference, a two-tailed test is safe.

37
Example
  • Hypothesis: reaction time in condition a is
    significantly different from condition b
  • Null-hypothesis: the reaction times are not
    different in conditions a and b

38
Independent t-test in SPSS
  • Organise independent and dependent variables in
    separate columns!

39
Independent t-test in SPSS
  • Dependent variable(s): Test Variable(s)
  • Independent variable: Grouping Variable

You have to specify the levels of the independent
variable (can only have two!)
40
How to interpret the output?
Descriptive statistics
If p > 0.05, variances are homogeneous
There is an effect of condition on rt
41
How to interpret the output?
  • Group statistics (descriptive statistics for the
    conditions)
  • Independent samples test
  • Levene's test for equality of variances (if
    p > 0.05, then variances are homogeneous)
  • t-test for equality of means
  • t-value
  • df (N - 2)
  • Significance level (2-tailed)
  • Mean difference (difference between the means)

42
What do we report?
  • There is a significant effect of condition on
    reaction time. The average reaction time in
    condition a was 238.7 ms longer than in condition
    b (t = 6.12, df = 62, p < 0.001).
  • Interpretation?

43
Paired t-test in SPSS
  • Variables of different conditions have to be in
    parallel columns.
  • Click on variables to compare and then

44
How to interpret the output?
  • Paired samples statistic (descriptive statistics)
  • Paired samples correlation (naturally, there
    should be a rather strong correlation. Subjects
    with a slow rt will be slow in both
    conditions)
  • Paired samples t-test (t, df (N - 1),
    significance level)
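
A paired t-test works on the differences within each subject. The sketch below uses hypothetical reaction times for six subjects measured in both conditions; 2.571 is the standard two-tailed table value for df = 5 at the 0.05 level.

```python
import math
import statistics

# Hypothetical reaction times (ms) for the same six subjects, illustration only
cond_a = [520, 540, 510, 560, 530, 550]
cond_b = [500, 530, 495, 540, 520, 525]

# One difference score per subject
differences = [a - b for a, b in zip(cond_a, cond_b)]
n = len(differences)

mean_diff = statistics.mean(differences)
se_diff = statistics.stdev(differences) / math.sqrt(n)

t = mean_diff / se_diff
df = n - 1

# Two-tailed critical t-value for alpha = 0.05 and df = 5
T_CRIT_DF5 = 2.571
print(f"t = {t:.2f}, df = {df}, significant: {abs(t) > T_CRIT_DF5}")
```

Because each subject serves as their own control, a fairly small mean difference can be significant when the differences are consistent across subjects.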

45
What if the basic assumptions are not met?
  • For example
  • if the distributions are very skewed
  • if you have ordinal data instead of interval data
  • You have to use non-parametric tests
  • There is a whole range of non-parametric tests;
    I'll only show the most common ones

46
Non-parametric statistical tests (for one
independent variable only)

Between, within,   Number of levels of     Significance test
mixed?             independent variable?
Between            Two                     Mann-Whitney Test
Between            More than two           Kruskal-Wallis Test
Within             Two                     Wilcoxon Signed Ranks Test
Within             More than two           Friedman Test