Statistics for Linguistics - PowerPoint PPT Presentation

Slides: 89
Provided by: ACE593
1
Statistics for Linguistics
2
Variables and levels
  • Independent variable (IV)
  • Dependent variable (DV)

The IV must have at least two levels
(conditions).
The DV must allow for at least two different
types of responses.
3
Example 1
Subjects are given two types of constructions and are asked to decide whether each sentence is grammatical:
(1) Construction 1
  a. I gave the key him.
  b. I gave the book her.
  c. ...
(2) Construction 2
  a. I gave the key to him.
  b. I gave the book to her.
  c. ...
4
Example 1
IV (two conditions): Construction 1, Construction 2
DV (forced-choice task): a. grammatical, b. ungrammatical
5
Example 2
Subjects are asked to complete copular sentences with a relative clause. The predicate nominals of the copular clauses belong to three different semantic types: (1) animate/human, (2) inanimate/object, (3) place.
(1) This is the man __
(2) This is the ball __
(3) This is the place __
6
Example 2
Subjects' responses can be divided into five different types:
(1) This is the man
  a. who talked to Jane. (SUBJ)
  b. who I met. (DO)
  c. whom I gave the book. (IO)
  d. to whom she went. (OBL)
  e. whose cat died. (GEN)
7
Example 2
IV (three conditions)
  1. This is the man __
  2. This is the ball __
  3. This is the place __
DV (five response types)
  a. SUBJ relative clause
  b. DO relative clause
  c. IO relative clause
  d. OBL relative clause
  e. GEN relative clause
8
Example 2
IV
  Copular: 1. This is the man __  2. This is the thing __  3. This is the place __
  Transitive: 1. I saw the man __  2. I saw the thing __  3. I saw the place __
DV (for both sentence types)
  a. SUBJ relative clause
  b. DO relative clause
  c. IO relative clause
  d. OBL relative clause
  e. GEN relative clause
9
Example 2
        Copular   Transitive
SUBJ    3.5       2.5
DO      3.2       3.8
IO      2.7       3.2
OBL     2.2       0.5
GEN     0.6       0.5
10
Example 2
[Figure: two panels, "Interaction" and "No interaction"]
11
Types of variables
  • Nominal/categorical data
  • Ordinal data
  • Interval data

12
Types of analysis
  • Correlational analysis
  • Difference test

13
Types of analysis
Correlational tests: Pearson's r, Kendall's tau
Difference tests: t-test, ANOVA
14
Related vs. independent designs
  • Within-subjects design = related design = repeated-measures design
  • Between-subjects design = unrelated design = independent design

15
Related vs. independent designs
Advantages of a within-subjects design
  • Reduction of inter-individual differences
  • Fewer subjects needed

Disadvantages of a within-subjects design
  • Subjects recognize the purpose of the study.
  • Subjects get tired, frustrated, excited.
  • Subjects get habituated to the task.

16
One-sample tests
17
Binomial
A linguist has collected a sample of sentences with ditransitive verbs from a corpus. Overall, there are 46 sentences in the sample. In 27 sentences the verb occurs with two NP objects; in 19 sentences it occurs with an NP and a PP.
(1) He gives Peter the ball.      V NP NP (n = 27)
(2) He gives the ball to Peter.   V NP PP (n = 19)
Is the frequency difference between the two categories significant?
18
Binomial
Null hypothesis: The two constructions are equally frequent (suggesting that they are free variants, i.e. there is nothing to explain).
Alternative hypothesis: The two constructions differ in frequency (which must have a reason that needs to be explained).
19
Binomial
Variable Freq   Category   N    Observed proportion   Test proportion   Asymp. sig. (2-tailed)
Group 1         27.00      27   .59                   .50               .302(a)
Group 2         19.00      19   .41
Total                      46   1.00
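The asymptotic significance of .302 above can be reproduced with a normal (z) approximation that includes a continuity correction of 0.5; that this is how the reported value was computed is an assumption, checked only by matching the number. The minimal sketch below also computes the exact two-sided binomial probability for comparison. The function name `binomial_test_two_sided` is hypothetical, and the sketch only handles the H0 proportion of .50 tested on the slide.

```python
import math


def binomial_test_two_sided(k: int, n: int) -> tuple[float, float]:
    """Two-sided binomial test of H0: p = 0.5 (hypothetical helper).

    Returns (exact_p, z_approx_p); the z approximation uses a
    continuity correction of 0.5.
    """
    # Exact test: with p = 0.5 the distribution is symmetric,
    # so the two-sided p-value is twice the smaller tail.
    pmf = [math.comb(n, i) / 2**n for i in range(n + 1)]
    upper_tail = sum(pmf[max(k, n - k):])
    exact_p = min(1.0, 2 * upper_tail)

    # Normal approximation: mean n*p, SD sqrt(n*p*(1-p)), continuity-corrected.
    mean = n * 0.5
    sd = math.sqrt(n * 0.25)
    z = (abs(k - mean) - 0.5) / sd
    phi = 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF
    z_p = 2 * (1 - phi)
    return exact_p, z_p


# 27 V NP NP sentences out of 46 ditransitive sentences
exact_p, z_p = binomial_test_two_sided(27, 46)
print(f"exact p = {exact_p:.3f}, z-approximation p = {z_p:.3f}")
```

Both values are far above .05, so the 27:19 split gives no evidence that one construction is more frequent than the other.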
20
χ² goodness-of-fit
A linguist has collected relative clauses from a corpus and divided them into four types: (1) subject relatives, (2) object relatives, (3) oblique relatives, and (4) genitive relatives. Is the difference in the sample large enough to posit a frequency difference between the four types in the population?
23
χ² goodness-of-fit
                     Subject   Object   Oblique   Genitive   Total
Observed frequency   55        53       39        4          151
Expected frequency   37.75     37.75    37.75     37.75      151
24
χ² goodness-of-fit
Null hypothesis: The four types of relative clauses are equally frequent in the true population.
Alternative hypothesis: The four types of relative clauses are not equally frequent in the true population.
25
χ² goodness-of-fit
$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$
31
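χ² goodness-of-fit
Type       Observed   Expected   Difference (residual)   Squared
Subject    55         37.75       17.25                   297.56
Object     53         37.75       15.25                   232.56
Oblique    39         37.75        1.25                     1.56
Genitive    4         37.75      -33.75                  1139.06
Sum of squared residuals: 1670.75
Divided by the expected frequency (equal for all four types, so the sum can be divided once): χ² = 1670.75 / 37.75 ≈ 44.26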
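The hand calculation above can be checked with a few lines of Python. This minimal sketch assumes, as the null hypothesis does, equal expected frequencies for all four types, and hard-codes the tabled critical value 7.81 (df = 3, α = .05) rather than computing it.

```python
# Chi-square goodness-of-fit for the relative-clause counts,
# with equal expected frequencies under H0.
observed = {"subject": 55, "object": 53, "oblique": 39, "genitive": 4}

n = sum(observed.values())            # total number of relative clauses
expected = n / len(observed)          # same expected count for every type
chi2 = sum((o - expected) ** 2 / expected for o in observed.values())
df = len(observed) - 1                # number of categories minus 1

critical_05 = 7.81  # tabled critical value for df = 3, alpha = .05
print(f"chi2 = {chi2:.2f}, df = {df}, reject H0: {chi2 > critical_05}")
```

If SciPy is available, `scipy.stats.chisquare(list(observed.values()))` uses equal expected frequencies by default and should give the same statistic.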
32
Normal distributions
33
Binomial distribution
34
Binomial distribution
Bernoulli trial
  • two possible outcomes on each trial
  • the outcomes are independent of each other
  • the probability of each outcome is constant across trials

35
Binomial distribution
[Tree diagram, two coin tosses: first toss H or T; possible outcomes HH, HT, TH, TT]
36
Binomial distribution
0 heads: TT   1 head: HT, TH   2 heads: HH
37
Binomial distribution
Sample space: HH, HT, TH, TT
Random variable (number of heads): 0, 1, 2
38
Binomial distribution
Outcome (number of heads)   Probability
0                           1/4 = 0.25
1                           2/4 = 0.50
2                           1/4 = 0.25
$\sum P(x) = 1$
39
[Tree diagram, three coin tosses: H, T; then HH, HT, TH, TT; then HHH, HHT, HTH, HTT, THH, THT, TTH, TTT]
40
Sample space: HHH, HHT, HTH, HTT, THH, THT, TTH, TTT
Random variable   Outcomes   Probability
0 heads           1          1/8 = 0.125
1 head            3          3/8 = 0.375
2 heads           3          3/8 = 0.375
3 heads           1          1/8 = 0.125
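The counting above generalizes to the binomial formula $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$. The sketch below (the helper name `binomial_pmf` is hypothetical) uses exact fractions so that the 1/4, 2/4, 1/4 and 1/8, 3/8, 3/8, 1/8 distributions from the coin-tossing slides come out exactly.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n: int, p: Fraction) -> dict[int, Fraction]:
    """P(X = k) for k = 0..n successes in n independent Bernoulli trials."""
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}


half = Fraction(1, 2)  # fair coin: P(heads) = 1/2 on every toss
for n_tosses in (2, 3):
    dist = binomial_pmf(n_tosses, half)
    # Fractions are printed as strings; the probabilities always sum to 1.
    print(n_tosses, {k: str(v) for k, v in dist.items()}, "sum =", sum(dist.values()))
```

Note that `Fraction(2, 4)` is automatically reduced, so the probability of one head in two tosses is shown as 1/2.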
41
Binomial distribution
42
χ² distribution
43
44
χ² distribution: critical values (columns = upper-tail probability α)
df   .995    .99     .975    .95     .90     .10     .05     .025    .01     .005
1
2
3    0.072   0.115   0.216   0.352   0.584   6.25    7.81    9.35    11.34   12.84
4
45
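[Figure: χ² distribution with df = 3, marking the critical value 7.81 (α = .05) and the observed value 44.26]
Since 44.26 > 7.81, the null hypothesis of equally frequent relative-clause types is rejected at α = .05.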
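Instead of looking up critical values, one can compute the upper-tail probability P(χ² > x) directly. The sketch below is an assumption-laden illustration, not the slides' procedure: it evaluates the lower regularized incomplete gamma function P(df/2, x/2) by its power series, which is adequate for the moderate values used here. The name `chi2_sf` is hypothetical.

```python
import math


def chi2_sf(x: float, df: int, max_terms: int = 1000) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom.

    Uses the series  gamma(a, z) = z^a e^(-z) * sum_n z^n / (a (a+1) ... (a+n))
    with a = df/2 and z = x/2, then returns 1 - gamma(a, z) / Gamma(a).
    """
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    term = 1.0 / a
    total = term
    for n in range(1, max_terms):
        term *= z / (a + n)
        total += term
        if term < total * 1e-16:  # remaining terms no longer change the sum
            break
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


print(f"{chi2_sf(7.81, 3):.4f}")    # compare with the .05 column of the table
print(f"{chi2_sf(44.26, 3):.2e}")   # observed goodness-of-fit statistic
```

The observed statistic lies so far in the tail that its p-value is many orders of magnitude below .05.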
46
Two-sample tests
47
χ² test of independence
              V NP P   V P NP   Total
Spatial       345      17       362 (80.4%)
Non-spatial   76       12       88 (19.6%)
Total         421      29       450
  1. He pushed the chair away. (spatial)
  2. He turned on the TV. (non-spatial)

48
χ² test of independence
        X     Y     Total
A                   50
B                   50
Total   50    50    100
49
χ² test of independence
        X     Y     Total
A       25    25    50
B       25    25    50
Total   50    50    100
50
χ² test of independence
              V NP P   V P NP   Total
Spatial                         362
Non-spatial                     88
Total         421      29       450
51
χ² test of independence
Expected frequency:
$E = \frac{\text{row total} \times \text{column total}}{N}$
52
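χ² test of independence
Observed frequency, with the computation of the expected frequency in parentheses:
              V NP P                  V P NP                 Total
Spatial       345 (362 × 421 / 450)   17 (362 × 29 / 450)    362
Non-spatial   76 (88 × 421 / 450)     12 (88 × 29 / 450)     88
Total         421                     29                     450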
53
χ² test of independence
Observed (expected) frequencies:
              V NP P        V P NP      Total
Spatial       345 (338.7)   17 (23.3)   362
Non-spatial   76 (82.3)     12 (5.7)    88
Total         421           29          450
54
χ² test of independence
$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$
60
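χ² test of independence
Cell                  Observed   Expected   Residual   Squared   Squared / expected
Spatial, V NP P       345        338.7       6.33      40.05     0.118
Spatial, V P NP        17         23.3      -6.33      40.05     1.717
Non-spatial, V NP P    76         82.3      -6.33      40.05     0.487
Non-spatial, V P NP    12          5.7       6.33      40.05     7.063
χ² ≈ 9.38 (sum computed from unrounded values)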
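The same computation for the 2 × 2 particle-placement table, as a minimal sketch: expected frequencies come from the row-total × column-total / N formula, and the critical value 3.841 (df = 1, α = .05) is taken from the χ² table rather than computed.

```python
# Chi-square test of independence: particle semantics x particle placement.
table = [
    [345, 17],  # spatial:     V NP P, V P NP
    [76, 12],   # non-spatial: V NP P, V P NP
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n  # row total x column total / N
        chi2 += (observed - expected) ** 2 / expected

df = (len(table) - 1) * (len(table[0]) - 1)  # (rows - 1) x (columns - 1)
critical_05 = 3.841                          # tabled value for df = 1, alpha = .05
print(f"chi2 = {chi2:.2f}, df = {df}, reject H0: {chi2 > critical_05}")
```

If you cross-check with SciPy, note that `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2 × 2 tables by default; pass `correction=False` to reproduce this uncorrected hand calculation.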
61
Probability distribution
$df = (\text{rows} - 1) \times (\text{columns} - 1)$, i.e. df = 1 for a 2 × 2 table
62
χ² distribution: critical values (columns = upper-tail probability α)
df   .995    .99     .975    .95     .90     .10     .05     .025    .01     .005
1                    0.001   0.004   0.016   2.706   3.841   5.024   6.635   7.879
2
3
4
63
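[Figure: χ² distribution with df = 1, marking the critical value 3.841 (α = .05) and the observed value 9.38]
Since 9.38 > 3.841, the null hypothesis of independence is rejected at α = .05.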
64
McNemar
When children acquire a grammatical construction, do they benefit from hearing it frequently? To answer this question, we ask 100 children to repeat a ten-word ditransitive sentence (Der Mann gibt dem kleinen Jungen einen sehr großen Ballon, 'The man gives the little boy a very big balloon'). All children have to repeat the sentence twice: (1) at the beginning of the study and (2) after a training phase in which they hear similar ditransitive sentences five times, seemingly in passing, during a one-hour conversation. Does the training phase affect the children's ability to repeat ditransitive sentences?
65
McNemar
  1. Children who repeat the sentence correctly before and after the
    training phase (N = 31).
  2. Children who repeat the sentence incorrectly before and correctly
    after (N = 39).
  3. Children who repeat the sentence correctly before and incorrectly
    after (N = 13).
  4. Children who repeat the sentence incorrectly both before and after
    (N = 17).

66
McNemar
                   Before: correct   Before: incorrect   Total
After: correct     31                39                  70
After: incorrect   13                17                  30
Total              44                56                  100
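McNemar's test uses only the two discordant cells, i.e. the children whose performance changed (39 improved, 13 got worse); the 31 + 17 children who stayed the same do not enter the statistic. A minimal sketch, computing the statistic with and without the usual continuity correction and a df = 1 p-value via the identity P(χ²₁ > x) = erfc(√(x/2)); the helper name `chi2_sf_df1` is hypothetical.

```python
import math

# Discordant cells of the before/after table.
b = 39  # incorrect before, correct after
c = 13  # correct before, incorrect after

chi2 = (b - c) ** 2 / (b + c)                      # McNemar statistic, df = 1
chi2_corrected = (abs(b - c) - 1) ** 2 / (b + c)   # with continuity correction


def chi2_sf_df1(x: float) -> float:
    """Upper-tail probability of a chi-square distribution with 1 df."""
    return math.erfc(math.sqrt(x / 2))


print(f"chi2 = {chi2:.2f} (p = {chi2_sf_df1(chi2):.4f}); "
      f"corrected chi2 = {chi2_corrected:.2f} (p = {chi2_sf_df1(chi2_corrected):.4f})")
```

Both versions exceed the df = 1 critical value of 3.841, so under these assumptions the change after the training phase would count as significant.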
67
Extensions of McNemar
Bowker: when one of the two variables has more than two levels (correct, partially correct, incorrect).
Cochran Q: when the subjects are tested not just twice but several times, at different points in time.
68
Extensions of χ²
  1. Configural frequency analysis (Konfigurationsfrequenzanalyse, KFA)
  2. Log-linear analysis

69
Interval data
72
t-test
                                                   Parametric           Non-parametric
Between / independent / unrelated                  Independent t-test   Mann-Whitney U
Within / dependent / related / repeated measures   Paired t-test        Wilcoxon
73
t-test
24 people were involved in an experiment to
determine whether background noise (e.g. music)
affects short-term memory (recall of words). Half
of the sample was randomly allocated to the NOISE
condition, and half to the NO NOISE condition.
The participants in the NOISE condition tried to
memorize a list of 20 words in two minutes, while
listening to pre-recorded noise through
earphones. The other participants wore earphones
but heard no noise as they attempted to memorize
the words. Immediately after this, they were
tested to see how many words they recalled.
74
Number of words recalled
NOISE (group 1):     5, 10, 6, 6, 7, 3, 6, 9, 5, 10, 11, 9
NO NOISE (group 2):  15, 9, 16, 15, 16, 18, 17, 13, 11, 12, 13, 11
75
Standard deviation
$s = \sqrt{\frac{\sum (x_n - \bar{x})^2}{N - 1}}$
79
X1    X1 - mean   d1      d1² (squared residuals)
5     5 - 7.3     -2.3     5.29
10    10 - 7.3     2.7     7.29
6     6 - 7.3     -1.3     1.69
6     6 - 7.3     -1.3     1.69
7     7 - 7.3     -0.3     0.09
3     3 - 7.3     -4.3    18.49
6     6 - 7.3     -1.3     1.69
9     9 - 7.3      1.7     2.89
5     5 - 7.3     -2.3     5.29
10    10 - 7.3     2.7     7.29
11    11 - 7.3     3.7    13.69
9     9 - 7.3      1.7     2.89
Σ X1 = 87; mean = 87 / 12 = 7.25 (rounded to 7.3 above)
Σ d1² = 68.28 (68.25 with the unrounded mean 7.25)
80
Variance
$s^2 = \frac{68.25}{12 - 1} \approx 6.20$
81
Standard deviation
$s = \sqrt{\frac{68.25}{12 - 1}} \approx 2.49$
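The variance and standard deviation of the NOISE group can be checked as follows. This sketch follows the N - 1 formula step by step, then compares the result with Python's `statistics.variance` and `statistics.stdev`, which also divide by N - 1.

```python
import math
import statistics

noise = [5.0, 10.0, 6.0, 6.0, 7.0, 3.0, 6.0, 9.0, 5.0, 10.0, 11.0, 9.0]  # group 1

mean = sum(noise) / len(noise)                # 87 / 12
ss = sum((x - mean) ** 2 for x in noise)      # sum of squared deviations
variance = ss / (len(noise) - 1)              # divide by N - 1
sd = math.sqrt(variance)

# The standard-library sample statistics use the same N - 1 denominator.
assert math.isclose(variance, statistics.variance(noise))
assert math.isclose(sd, statistics.stdev(noise))
print(f"mean = {mean:.2f}, SS = {ss:.2f}, variance = {variance:.2f}, SD = {sd:.2f}")
```

Using the unrounded mean avoids the small discrepancy that rounding it to 7.3 introduces into the squared deviations.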
82
NOISE (group 1):     5, 10, 6, 6, 7, 3, 6, 9, 5, 10, 11, 9     (within-group variance)
NO NOISE (group 2):  15, 9, 16, 15, 16, 18, 17, 13, 11, 12, 13, 11     (within-group variance)
Between-group variance: the difference between M1 and M2
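The slides describe the t statistic as the between-group difference (M1 - M2) relative to the within-group variance, but do not say which variant of the independent t-test is used. The sketch below assumes Student's pooled-variance version; the helper name `independent_t` is hypothetical.

```python
import math
from statistics import mean, variance

noise = [5, 10, 6, 6, 7, 3, 6, 9, 5, 10, 11, 9]               # group 1
no_noise = [15, 9, 16, 15, 16, 18, 17, 13, 11, 12, 13, 11]    # group 2


def independent_t(x: list[float], y: list[float]) -> tuple[float, int]:
    """Student's independent-samples t with pooled variance; returns (t, df)."""
    n1, n2 = len(x), len(y)
    # Within-group variance: each sample variance weighted by its df.
    pooled = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))  # standard error of M1 - M2
    # Between-group difference relative to the within-group spread.
    return (mean(x) - mean(y)) / se, n1 + n2 - 2


t, df = independent_t(noise, no_noise)
print(f"t({df}) = {t:.2f}")
```

The absolute value of t is far larger than the two-tailed critical value of about 2.07 for df = 22 (α = .05), so, under the stated assumptions, the NOISE group recalled significantly fewer words.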
83
84
t-test
  • Interval data
  • For small samples (N < 15) the data must be
    normally distributed.
  • Homogeneity of variance (Levene's test)
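Levene's test checks the homogeneity-of-variance assumption by running a one-way ANOVA on the absolute deviations of each score from its group mean. The minimal sketch below assumes this mean-centred version and applies it to the NOISE / NO NOISE data; the helper name `levene_f` is hypothetical.

```python
from statistics import mean

noise = [5, 10, 6, 6, 7, 3, 6, 9, 5, 10, 11, 9]               # group 1
no_noise = [15, 9, 16, 15, 16, 18, 17, 13, 11, 12, 13, 11]    # group 2


def levene_f(*groups: list[float]) -> tuple[float, int, int]:
    """Mean-centred Levene statistic: one-way ANOVA F on |x - group mean|.

    Returns (F, df_between, df_within).
    """
    # Replace every score by its absolute deviation from its own group mean.
    devs = []
    for g in groups:
        m = mean(g)
        devs.append([abs(x - m) for x in g])

    group_means = [mean(d) for d in devs]
    grand = mean([v for d in devs for v in d])
    ss_between = sum(len(d) * (gm - grand) ** 2 for d, gm in zip(devs, group_means))
    ss_within = sum((v - gm) ** 2 for d, gm in zip(devs, group_means) for v in d)
    df_b = len(devs) - 1
    df_w = sum(len(d) for d in devs) - len(devs)
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w


f, df_b, df_w = levene_f(noise, no_noise)
print(f"Levene F({df_b}, {df_w}) = {f:.3f}")
```

A small F like this one gives no reason to doubt equal variances. Note that `scipy.stats.levene` centres on the median by default (the Brown-Forsythe variant); `center='mean'` corresponds to this sketch.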

85
One-sample t-test
Previous research has shown that English-speaking children have an MLU of 3.1 at age 32 months. A researcher wants to know whether SLI children (i.e. children with a specific language impairment) have a lower (or higher) MLU at this age. We know that SLI children have difficulties in processing morphological units, but it is unclear whether their MLUs are lower than those of normally developing children. To test this hypothesis, the researcher collected data from 24 SLI children aged 31 to 33 months and determined the MLU for each child.
86
Child   1     2     3     4     5     6     7     8
MLU     2.7   3.0   2.8   2.9   3.1   3.0   3.1   2.5
Child   9     10    11    12    13    14    15    16
MLU     3.2   3.1   2.9   2.9   2.8   3.1   3.2   2.4
Child   17    18    19    20    21    22    23    24
MLU     2.3   2.8   3.1   2.5   2.7   2.9   2.9   3.0
87
One-sample t-test
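The slides do not show the computation itself. Here is a minimal sketch of the standard one-sample t statistic, $t = \frac{M - \mu_0}{s / \sqrt{N}}$, applied to the 24 MLU values in the table above with the reference value $\mu_0 = 3.1$ for normally developing children.

```python
import math
from statistics import mean, stdev

# MLU of the 24 SLI children; reference value for normally developing children.
mlu = [2.7, 3.0, 2.8, 2.9, 3.1, 3.0, 3.1, 2.5, 3.2, 3.1, 2.9, 2.9,
       2.8, 3.1, 3.2, 2.4, 2.3, 2.8, 3.1, 2.5, 2.7, 2.9, 2.9, 3.0]
mu0 = 3.1

n = len(mlu)
se = stdev(mlu) / math.sqrt(n)   # standard error of the mean (stdev uses N - 1)
t = (mean(mlu) - mu0) / se
df = n - 1
print(f"M = {mean(mlu):.3f}, t({df}) = {t:.2f}")
```

The absolute value of t exceeds the tabled two-tailed critical value of 2.069 for df = 23 (α = .05), which would indicate that the SLI children's MLU differs from 3.1.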
88
Confidence intervals
MLU of normally developing children: 3.1
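A 95% confidence interval for the SLI children's mean MLU can be compared with the reference value 3.1: if 3.1 lies outside the interval, the two-tailed one-sample t-test at α = .05 rejects H0. In this sketch, the critical value 2.069 (df = 23) is hard-coded from a standard t table, since the Python standard library has no t distribution.

```python
import math
from statistics import mean, stdev

mlu = [2.7, 3.0, 2.8, 2.9, 3.1, 3.0, 3.1, 2.5, 3.2, 3.1, 2.9, 2.9,
       2.8, 3.1, 3.2, 2.4, 2.3, 2.8, 3.1, 2.5, 2.7, 2.9, 2.9, 3.0]
t_crit = 2.069  # two-tailed critical t, df = 23, alpha = .05 (from a t table)

m = mean(mlu)
se = stdev(mlu) / math.sqrt(len(mlu))     # standard error of the mean
lower, upper = m - t_crit * se, m + t_crit * se
print(f"95% CI for the SLI mean MLU: [{lower:.2f}, {upper:.2f}]")
print("3.1 inside the interval:", lower <= 3.1 <= upper)
```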