Statistics for variationists - PowerPoint PPT Presentation

Statistics for variationists - or - what a linguist needs to know about statistics Sean Wallis Survey of English Usage University College London – PowerPoint PPT presentation

Slides: 54
Provided by: SeanW1

Transcript and Presenter's Notes



1
Statistics for variationists
  • - or - what a linguist needs to know about statistics

Sean Wallis, Survey of English Usage, University College London
s.wallis@ucl.ac.uk
2
Outline
  • What is the point of statistics?
  • Variationist corpus linguistics
  • How inferential statistics works
  • Introducing z tests
  • Two types (single-sample and two-sample)
  • How these tests are related to χ²
  • Effect size and comparing results of
    experiments
  • Methodological implications for corpus linguistics

4
What is the point of statistics?
  • Analyse data you already have (observational science)
  • corpus linguistics
  • Design new experiments (experimental science)
  • collect new data, add annotation
  • experimental linguistics in the lab
  • Try new methods (philosophy of science)
  • pose the right question
  • We are going to focus on z and χ² tests (a little maths)
5
What is inferential statistics?
  • Suppose we carry out an experiment
  • We toss a coin 10 times and get 5 heads
  • How confident are we in the results?
  • Suppose we repeat the experiment
  • Will we get the same result again?
  • Inferential statistics is a method of inferring
    the behaviour of future ghost experiments from
    one experiment
  • We infer from the sample to the population
  • Let us consider one type of experiment
  • Linguistic alternation experiments

7
Alternation experiments
  • A variationist corpus paradigm
  • Imagine a speaker forming a sentence as a series
    of decisions/choices. They can
  • add: choose to extend a phrase or clause, or stop
  • select: choose between constructions
  • Choices will be constrained
  • grammatically
  • semantically
  • Research question
  • within these constraints, what factors influence the particular choice?

8
Alternation experiments
  • Laboratory experiment (cued)
  • pose the choice to subjects
  • observe the one they make
  • manipulate different potential influences
  • Observational experiment (uncued)
  • observe the choices speakers make when they make
    them (e.g. in a corpus)
  • extract data for different potential influences
  • sociolinguistic: subdivide data by genre, etc.
  • lexical/grammatical: subdivide data by elements in surrounding context
  • BUT the alternate choice is counterfactual

9
Statistical assumptions
  • A random sample taken from the population
  • Not always easy to achieve
  • multiple cases from the same text and speakers, etc.
  • there may be limited historical data available
  • Be careful with data concentrated in a few texts
  • The sample is tiny compared to the population
  • This is easy to satisfy in linguistics!
  • Observations are free to vary (alternate)
  • Repeated sampling tends to form a Binomial
    distribution around the expected mean
  • This requires slightly more explanation...

10
The Binomial distribution
  • Repeated sampling tends to form a Binomial
    distribution around the expected mean P

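  • We toss a coin 10 times, and get 5 heads
[Figure: frequency F of each score x (number of heads) after N = 1 experiment, with the expected mean P marked]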
16
The Binomial distribution
  • Repeated sampling tends to form a Binomial
    distribution around the expected mean P

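  • Due to chance, some samples will have a higher or lower score
[Figure: frequency F of each score x, built up as the number of experiments N grows through 4, 8, 12, 16, 20 and 24, with the expected mean P marked]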
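The build-up of this histogram can be imitated by simulating the repeated "ghost experiments". A minimal Python sketch, assuming a fair coin (P = 0.5) and a fixed random seed so the run is reproducible; `simulate_heads` is a hypothetical helper, not something from the slides.

```python
import random
from collections import Counter


def simulate_heads(repeats, tosses=10, p_heads=0.5, seed=1):
    """Repeat a coin-toss experiment and tally the number of heads per run.

    Returns a Counter mapping score x (number of heads) to frequency F.
    """
    rng = random.Random(seed)  # fixed seed: the same "ghost experiments" on every run
    tally = Counter()
    for _ in range(repeats):
        heads = sum(rng.random() < p_heads for _ in range(tosses))
        tally[heads] += 1
    return tally


# Crude text histogram of F against x after N = 24 experiments
freq = simulate_heads(24)
for x in range(11):
    print(f"{x:2d} {'#' * freq[x]}")
```

Increasing `repeats` makes the tally settle into the Binomial shape centred on 5 heads.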
17
Binomial → Normal
  • The Binomial (discrete) distribution is close to the Normal (continuous) distribution
[Figure: frequency F against x, comparing the Binomial distribution with the Normal curve]
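For comparison, the exact Binomial probabilities can be set beside a Normal density with the same mean nP and standard deviation √(nP(1 − P)). A sketch assuming a fair coin tossed n = 10 times; the helper names are illustrative, and the code needs Python 3.8+ for `math.comb` and `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def binomial_pmf(x, n, P):
    """Exact Binomial probability of x successes in n trials with success rate P."""
    return math.comb(n, x) * P ** x * (1 - P) ** (n - x)


def normal_approx(x, n, P):
    """Normal density with the same mean (nP) and variance (nP(1 - P)) as the Binomial."""
    return NormalDist(mu=n * P, sigma=math.sqrt(n * P * (1 - P))).pdf(x)


n, P = 10, 0.5
for x in range(n + 1):
    print(f"{x:2d}  Binomial {binomial_pmf(x, n, P):.4f}  Normal {normal_approx(x, n, P):.4f}")
```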
18
The central limit theorem
  • Any Normal distribution can be defined by only two variables and the Normal function z
      μ = population mean = P
      σ = standard deviation, $s = \sqrt{P(1-P)/n}$
  • With more data in the experiment, s will be smaller
  • Divide x by 10 for probability scale
  • 95% of the curve is within 2 standard deviations of the expected mean
  • the correct figure is 1.95996!
  • the critical value of z, written $z_{\alpha/2}$, for an error level α of 0.05
[Figure: Normal curve of frequency F over the probability scale p (0.1 to 0.7), centred on P, with z·s marked either side: 95% of the area lies in between and 2.5% in each tail]
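The quantities on this slide can be computed with Python's standard library. A minimal sketch: `critical_z` recovers the 1.95996 figure from the inverse Normal CDF, and `normal_interval` (a hypothetical helper) returns P ± z·s on the probability scale.

```python
import math
from statistics import NormalDist


def critical_z(alpha=0.05):
    """Two-tailed critical value z_{alpha/2} of the standard Normal distribution."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def normal_interval(P, n, alpha=0.05):
    """Interval P ± z·s about the population mean P, where s = sqrt(P(1 - P) / n)."""
    s = math.sqrt(P * (1 - P) / n)
    z = critical_z(alpha)
    return P - z * s, P + z * s


print(round(critical_z(), 5))    # 1.95996
print(normal_interval(0.5, 10))  # about (0.190, 0.810)
print(normal_interval(0.5, 40))  # narrower: more data, smaller s
```

Quadrupling n from 10 to 40 halves s, which is the "more data, smaller s" point.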
21
The single-sample z test...
  • Is an observation p more than z standard deviations from the expected (population) mean P?
  • If yes, p is significantly different from P
[Figure: Normal curve about P on the probability scale p, with z·s marked either side and an observation p]
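A single-sample z test can be sketched in a few lines. This assumes the Normal approximation to the Binomial is adequate; with n as small as 10 it is rough, which is what the Wilson interval and continuity corrections discussed in this talk address. The function name and example values are hypothetical.

```python
import math
from statistics import NormalDist


def single_sample_z_test(p, P, n, alpha=0.05):
    """Single-sample z test: is p more than z·s from the expected mean P?

    s = sqrt(P(1 - P) / n) is the standard deviation about P.
    Returns (significant, z_score).
    """
    s = math.sqrt(P * (1 - P) / n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    z_score = (p - P) / s
    return abs(z_score) > z_crit, z_score


# Hypothetical observations, each from n = 10 coin tosses, against P = 0.5
print(single_sample_z_test(0.5, 0.5, 10))  # (False, 0.0): 5 heads
print(single_sample_z_test(0.9, 0.5, 10))  # significant: 9 heads, z about 2.53
```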
23
...gives us a confidence interval
  • P ± z·s is the confidence interval for P
  • We want to plot the interval about p
[Figure: the Normal interval P ± z·s about P, and an interval (w−, w+) about the observation p]
24
...gives us a confidence interval
  • The interval about p is called the Wilson score interval
  • This interval is asymmetric
  • It reflects the Normal interval about P
  • If P is at the upper limit of p, p is at the lower limit of P
  • To calculate w− and w+ we use this formula:
      $w^-, w^+ = \dfrac{p + \frac{z^2}{2n} \mp z\sqrt{\frac{p(1-p)}{n} + \frac{z^2}{4n^2}}}{1 + \frac{z^2}{n}}$
[Figure: the asymmetric Wilson interval (w−, w+) about the observation p, shown with the Normal interval about P]
(Wallis, to appear, a)
(Wilson, 1927)
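The Wilson formula translates directly into code. A minimal sketch (the function name is hypothetical) that returns (w−, w+) for an observed p out of n cases at error level α:

```python
import math
from statistics import NormalDist


def wilson_interval(p, n, alpha=0.05):
    """Wilson (1927) score interval (w-, w+) about an observed proportion p out of n."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    spread = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - spread) / denom, (centre + spread) / denom


print(wilson_interval(0.0, 10))  # lower bound 0 (up to rounding), upper about 0.278
print(wilson_interval(0.7, 10))  # about (0.397, 0.892): asymmetric about p
```

The bounds stay within 0 to 1 even when p = 0. At the lower bound w−, the Normal interval about P = w− has its upper limit exactly at p, which is the inversion the slide describes.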
26
Plotting confidence intervals
  • Plotting modal shall/will over time (DCPSE)
  • Small amounts of data per year
  • Highly skewed p in some cases
  • p = 0 or 1 (circled)
  • Confidence intervals identify the degree of
    certainty in our results

(Wallis, to appear, a)
27
Plotting confidence intervals
  • Probability of adding successive attributive adjective phrases (AJPs) to an NP in ICE-GB
  • x = number of AJPs
  • NPs get longer → adding AJPs is more difficult
  • The first two falls are significant, the last is
    not

28
2 × 1 goodness of fit χ² test
  • Same as single-sample z test for P (z² = χ²)
  • Does the value of a affect p(b)?
  • Or Wilson test for p (by inversion)
IV: A = {a, ¬a}; DV: B = {b, ¬b}
[Figure: the Normal interval P ± z·s about P = p(b), and the Wilson interval (w−, w+) about the observation p(b | a)]
30
The single-sample z test
  • Compares an observation with a given value
  • Compare p(b | a) with p(b)
  • A goodness of fit test
  • Identical to a standard 2 × 1 χ² test
  • Note that p(b) is given
  • All of the variation is assumed to be in the estimate of p(b | a)
  • Could also compare p(b | ¬a) with p(b)
[Figure: p(b | a) and p(b | ¬a) plotted against the given value p(b)]
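The claim z² = χ² for the 2 × 1 goodness of fit test can be checked numerically. A sketch with invented counts (30 of 40 cases of a are b, with p(b) = 0.6 taken as given); the helper names are hypothetical.

```python
import math


def goodness_of_fit_chi2(observed_b, n, P):
    """2 x 1 goodness of fit chi-square for {b, not b} counts against a given P = p(b).

    observed_b is the number of b cases among the n cases where a holds.
    """
    observed = [observed_b, n - observed_b]
    expected = [n * P, n * (1 - P)]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def single_sample_z(observed_b, n, P):
    """z score of the observed p(b | a) = observed_b / n about the given mean P."""
    p = observed_b / n
    return (p - P) / math.sqrt(P * (1 - P) / n)


# Hypothetical counts: 30 of 40 cases of a are b, tested against p(b) = 0.6
chi2 = goodness_of_fit_chi2(30, 40, 0.6)
z = single_sample_z(30, 40, 0.6)
print(chi2, z * z)  # both 3.75, up to floating-point rounding
```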
31
z test for 2 independent proportions
  • Method: combine observed values
  • take the difference (subtract) p1 − p2
  • calculate an averaged confidence interval
[Figure: observations O1 (p1 = p(b | a)) and O2 (p2 = p(b | ¬a)) on the probability scale p]
(Wallis, to appear, b)
32
z test for 2 independent proportions
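  • New confidence interval D = |O1 − O2|
  • standard deviation $s' = \sqrt{p(1-p)\,(1/n_1 + 1/n_2)}$
  • p = p(b)
  • compare z·s' with x = |p1 − p2|
[Figure: the interval D, centred on a mean x = 0 with half-width z·s', against which the observed difference x is compared]
(Wallis, to appear, b)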
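The two-independent-proportions test might be sketched as follows, assuming the pooled estimate p = p(b) described on the slide. Counts and the function name are hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(b1, n1, b2, n2, alpha=0.05):
    """z test for two independent proportions p1 = b1/n1 and p2 = b2/n2.

    Pooled p = (b1 + b2) / (n1 + n2) and s' = sqrt(p(1 - p)(1/n1 + 1/n2)).
    Returns (significant, x, z * s') where x = |p1 - p2|.
    """
    p1, p2 = b1 / n1, b2 / n2
    p = (b1 + b2) / (n1 + n2)
    s_dash = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    z = NormalDist().inv_cdf(1 - alpha / 2)
    x = abs(p1 - p2)
    return x > z * s_dash, x, z * s_dash


# Hypothetical counts: b occurs in 30 of 40 cases of a, and 18 of 40 cases of not a
print(two_proportion_z_test(30, 40, 18, 40))  # significant: x = 0.3 exceeds z·s' (about 0.215)
```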
35
z test for 2 independent proportions
  • Identical to a standard 2 × 2 χ² test
  • So you can use the usual method!
  • BUT 2 × 1 and 2 × 2 tests have different purposes
  • 2 × 1 goodness of fit compares single value a with superset A
  • assumes only a varies
  • 2 × 2 test compares two values a, ¬a within a set A
  • both values may vary
  • Q: Do we need χ²?
[Figure: IV A = {a, ¬a}; the goodness of fit χ² compares a with A, while the 2 × 2 χ² compares a with ¬a]
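The equivalence with the 2 × 2 χ² test can be checked numerically: for a 2 × 2 table, Pearson's χ² (without continuity correction) equals the square of the two-proportion z score. A sketch with invented counts; the helper names are hypothetical.

```python
import math


def chi2_2x2(table):
    """Pearson chi-square for [[b | a, not b | a], [b | not a, not b | not a]]."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def two_proportion_z(b1, n1, b2, n2):
    """z score for p1 - p2, using the pooled standard deviation s'."""
    p = (b1 + b2) / (n1 + n2)
    s_dash = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (b1 / n1 - b2 / n2) / s_dash


# Hypothetical counts: b in 30 of 40 cases of a, and in 18 of 40 cases of not a
print(chi2_2x2([[30, 10], [18, 22]]), two_proportion_z(30, 40, 18, 40) ** 2)
```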
36
Larger χ² tests
  • χ² is popular because it can be applied to contingency tables with many values
  • r × 1 goodness of fit χ² tests (r ≥ 2)
  • r × c χ² tests for homogeneity (r, c ≥ 2)
  • z tests have 1 degree of freedom
  • strength: significance is due to only one source
  • strength: easy to plot values and confidence intervals
  • weakness: multiple values may be unavoidable
  • With larger χ² tests, evaluate and simplify
  • Examine χ² contributions for each row or column
  • Focus on alternation: try to test for a speaker choice
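To examine χ² contributions for each row or column, one can keep the per-cell terms (O − E)²/E instead of only their sum. A minimal sketch with an invented 3 × 2 table; the helper name is hypothetical.

```python
def chi2_contributions(table):
    """Pearson chi-square for an r x c table of counts, with per-cell contributions."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    cells = []
    for row, r_total in zip(table, row_totals):
        contributions = []
        for observed, c_total in zip(row, col_totals):
            expected = r_total * c_total / n
            contributions.append((observed - expected) ** 2 / expected)
        cells.append(contributions)
    total = sum(sum(row) for row in cells)
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return total, dof, cells


# Hypothetical 3 x 2 table: rows are three text categories, columns are b and not b
total, dof, cells = chi2_contributions([[30, 10], [18, 22], [20, 20]])
print(f"chi-square = {total:.3f}, {dof} degrees of freedom")
for i, row in enumerate(cells):
    print(f"row {i}: contribution {sum(row):.3f}")
```

In this invented table the first row supplies most of the total, which suggests where a simpler 2 × 1 or 2 × 2 test might focus.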

37
How big is the effect?
  • These tests do not measure the strength of the
    interaction between two variables
  • They test whether the strength of an interaction
    is greater than would be expected by chance
  • With lots of data, a tiny change would be
    significant
  • Don't use χ², p or z values to compare two different experiments
  • A result significant at p < 0.01 is not better than one significant at p < 0.05
  • There are a number of ways of measuring
    association strength or effect size

39
How big is the effect?
  • Percentage swing
  • swing: $d = p(a \mid \neg b) - p(a \mid b)$
  • percentage swing: $d\% = d / p(a \mid b)$
  • frequently used ("X increased by 50%")
  • may have confidence intervals on change
  • can be misleading (+50% then −50% is not zero)
  • one change, not sequence
  • over one value, not multiple values
  • Cramér's φ
  • $\phi = \sqrt{\chi^2 / N}$ (2 × 2), where N is the grand total
  • $\phi_c = \sqrt{\chi^2 / ((k-1)N)}$ (r × c), where k = min(r, c)
  • measures degree of association of one variable with another (across all values)
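Both effect-size measures are simple to compute. A sketch assuming the swing is taken from p(a | b) to p(a | ¬b) as defined above; the function names are hypothetical. The first example shows why a +50% swing followed by a −50% swing is not a net zero change.

```python
import math


def swing(p_a_given_b, p_a_given_not_b):
    """Absolute swing d = p(a | not b) - p(a | b) and percentage swing d / p(a | b)."""
    d = p_a_given_not_b - p_a_given_b
    return d, d / p_a_given_b


def cramers_phi(chi2, n, rows, cols):
    """Cramer's phi: sqrt(chi2 / ((k - 1) N)) with k = min(r, c).

    For a 2 x 2 table k = 2, so this reduces to sqrt(chi2 / N).
    """
    k = min(rows, cols)
    return math.sqrt(chi2 / ((k - 1) * n))


# A +50% swing followed by a -50% swing does not return to the start
start, middle, end = 0.5, 0.75, 0.375
print(swing(start, middle)[1], swing(middle, end)[1], end)  # 0.5 -0.5 0.375

# A perfectly associated 2 x 2 table (chi-square = N = 80) gives phi = 1
print(cramers_phi(80.0, 80, 2, 2))  # 1.0
```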
42
Comparing experimental results
  • Suppose we have two similar experiments
  • How do we test if one result is significantly stronger than another?
  • Test swings
  • z test for two samples from different populations
  • Use $s' = \sqrt{s_1^2 + s_2^2}$
  • Test whether $|d_1(a) - d_2(a)| > z \cdot s'$
  • Same method can be used to compare other z or χ² tests
[Figure: swings d1(a) and d2(a) from the two experiments, plotted on a scale from 0 to −0.7]
(Wallis 2011)
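The comparison of two swings might be sketched as below. This assumes, as a simplification, that each swing's standard deviation (s1, s2) is the pooled two-proportion estimate for that experiment; Wallis (2011) should be consulted for the exact formulation. All counts and names are hypothetical.

```python
import math
from statistics import NormalDist


def swing_and_sd(a_b, n_b, a_not_b, n_not_b):
    """Swing d = p(a | not b) - p(a | b) and an assumed (pooled) standard deviation."""
    p = (a_b + a_not_b) / (n_b + n_not_b)
    s = math.sqrt(p * (1 - p) * (1 / n_b + 1 / n_not_b))
    return a_not_b / n_not_b - a_b / n_b, s


def compare_swings(exp1, exp2, alpha=0.05):
    """Test whether |d1 - d2| > z * s', with s' = sqrt(s1^2 + s2^2)."""
    d1, s1 = swing_and_sd(*exp1)
    d2, s2 = swing_and_sd(*exp2)
    s_dash = math.sqrt(s1 ** 2 + s2 ** 2)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(d1 - d2) > z * s_dash, d1, d2


# Hypothetical experiments, each given as (a count | b, total b, a count | not b, total not b)
print(compare_swings((30, 40, 12, 40), (25, 40, 20, 40)))
```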
43
Modern improvements on z and χ²
  • Continuity correction for small n
  • Yates' χ² test
  • errs on the side of caution
  • can also be applied to Wilson interval
  • Newcombe (1998) improves on 2 × 2 χ² test
  • combines two Wilson score intervals
  • performs better than χ² and log-likelihood (etc.) for low-frequency events or small samples
  • However, for corpus linguists, there remains one
    outstanding problem...
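Both improvements can be sketched with the standard library. `yates_chi2_2x2` applies the 0.5 continuity correction to each |O − E|, and `newcombe_wilson` combines two Wilson score intervals into an interval for p1 − p2 in the way Newcombe (1998) describes; the difference is significant if the interval excludes 0. Function names and counts are hypothetical.

```python
import math
from statistics import NormalDist


def yates_chi2_2x2(table):
    """Yates' continuity-corrected chi-square for a 2 x 2 table of counts."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    total = 0.0
    for i in range(2):
        for j in range(2):
            e = rows[i] * cols[j] / n
            # subtract 0.5 from |O - E|, but never past zero
            total += max(abs(table[i][j] - e) - 0.5, 0.0) ** 2 / e
    return total


def wilson(p, n, z):
    """Wilson score interval (w-, w+) for p out of n at critical value z."""
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    spread = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - spread) / denom, (centre + spread) / denom


def newcombe_wilson(b1, n1, b2, n2, alpha=0.05):
    """Newcombe (1998) interval for p1 - p2, built from two Wilson score intervals."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p1, p2 = b1 / n1, b2 / n2
    l1, u1 = wilson(p1, n1, z)
    l2, u2 = wilson(p2, n2, z)
    d = p1 - p2
    lower = d - math.sqrt((p1 - l1) ** 2 + (u2 - p2) ** 2)
    upper = d + math.sqrt((u1 - p1) ** 2 + (p2 - l2) ** 2)
    return lower, upper


# Hypothetical low-frequency data: b in 2 of 20 cases of a, 9 of 20 cases of not a
print(yates_chi2_2x2([[2, 18], [9, 11]]))  # about 4.51
print(newcombe_wilson(2, 20, 9, 20))       # about (-0.570, -0.072): excludes 0
```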

46
Experimental design
  • Each observation should be free to vary
  • i.e. p can be any value from 0 to 1
  • However, many people use these methods incorrectly
  • e.g. citations per million words
  • what does this actually mean?
  • Baseline should be choice
  • Experimentalists can design choice into the experiment
  • Corpus linguists have to infer when speakers had the opportunity to choose, counterfactually
[Figure: b1 and b2 measured against three baselines: p(b | words), p(b | VPs) and p(b | tensed VPs)]
50
A methodological progression
  • Aim
  • investigate change when speakers have a choice
  • Four levels of experimental refinement
  • pmw (baseline: words)
  • select a plausible baseline (tensed VPs)
  • grammatically restrict data or enumerate cases (will, shall)
  • check each case individually for plausibility of alternation (will, shall; e.g. "Ye shall be saved")
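The difference between a per-million-words rate and a choice-based proportion can be seen with invented numbers. In this sketch (hypothetical counts and helper names) shall halves per million words between two periods, but its share of the shall/will alternation does not change, because will falls by the same factor. The proportion here corresponds to the "grammatically restrict" level, before each case is checked individually.

```python
def per_million_words(count, words):
    """Frequency per million words: the baseline is every word, not a choice."""
    return count / words * 1_000_000


def choice_proportion(shall, will):
    """p(shall | {shall, will}): the share of shall where the speaker could choose."""
    return shall / (shall + will)


# Invented counts for two periods of equal size (illustration only)
periods = {
    "earlier": {"words": 400_000, "shall": 120, "will": 280},
    "later": {"words": 400_000, "shall": 60, "will": 140},
}

for name, c in periods.items():
    print(name,
          round(per_million_words(c["shall"], c["words"]), 1),  # pmw halves
          round(choice_proportion(c["shall"], c["will"]), 3))   # share stays at 0.3
```

A test on the pmw figures could report a change, yet the choice between shall and will is the same in both periods.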
51
Conclusions
  • The basic idea of these methods is
  • Predict future results if experiment were
    repeated
  • Significant effect > 0 (e.g. 19 times out of 20)
  • Based on the Binomial distribution
  • Approximated by the Normal distribution: many uses
  • Plotting confidence intervals
  • Use goodness of fit or single-sample z tests to compare an observation with an expected baseline
  • Use 2 × 2 tests or two independent sample z tests to compare two observed samples
  • When using larger r × c tests, simplify as far as possible to identify the source of variation!
  • Take care with small samples / low frequencies
  • Use Wilson and Newcombe's methods instead!

52
Conclusions
  • Two methods for measuring the size of an
    experimental effect
  • absolute or percentage swing
  • Cramér's φ
  • You can compare two experiments
  • These methods all presume that
  • observed p is free to vary (speaker is free to
    choose)
  • If this is not the case then
  • statistical model is undermined
  • confidence intervals are too conservative
  • but multiple changes are combined into one
  • e.g. VPs increase while modals decrease
  • so significant change may not mean what you
    think!

53
References
  • Newcombe, R.G. 1998. Interval estimation for the difference between independent proportions: comparison of eleven methods. Statistics in Medicine 17: 873-890.
  • Wallis, S.A. 2011. Comparing χ² tests for separability. London: Survey of English Usage, UCL.
  • Wallis, S.A. to appear, a. Binomial confidence intervals and contingency tests. Journal of Quantitative Linguistics.
  • Wallis, S.A. to appear, b. z-squared: The origin and use of χ². Journal of Quantitative Linguistics.
  • Wilson, E.B. 1927. Probable inference, the law of succession, and statistical inference. Journal of the American Statistical Association 22: 209-212.
  • NOTE: My statistics papers, more explanation, spreadsheets etc. are published on the corp.ling.stats blog: http://corplingstats.wordpress.com