Title: English Corpus Linguistics Introducing the Diachronic Corpus of Present-Day Spoken English (DCPSE)
1 English Corpus Linguistics: Introducing the Diachronic Corpus of Present-Day Spoken English (DCPSE)
2 Barber (1964): changes in English grammar
- a. A tendency to regularize irregular morphology (e.g. dreamt → dreamed)
- b. A revival of the mandative subjunctive, probably inspired by formal US usage (we demand that she take part in the meeting)
- c. Elimination of shall as a future marker in the first person
- d. Development of new, auxiliary-like uses of certain lexical verbs (e.g. get, want; cf. The way you look, you wanna / want to see a doctor soon)
- e. Extension of the progressive to new constructions, e.g. modal, present perfect and past perfect passive progressive (the road would not be being built / has not been being built / had not been being built before the general elections)
- f. Increase in the number and types of multi-word verbs (phrasal verbs, have/take/give a ride, etc.)
- g. Placement of frequency adverbs before auxiliary verbs (even if no emphasis is intended: I never have said so)
- h. Do-support for have (have you any money? and no, I haven't any money → do you have / have you got any money? and no, I don't have any money / haven't got any money)
3 The Diachronic Corpus of Present-day Spoken English (DCPSE)
- Orthographically transcribed spoken BrE
- Fully parsed
- every sentence has a tree diagram
- searchable with ICECUP and FTFs
- 400,000 words each from
- London-Lund Corpus (aka The Survey Corpus)
- ICE-GB
- Balanced by text category
- Not evenly distributed by year
- LLC samples from 1958-1977
- ICE-GB 1990-1992
4 Tree diagrams
- A tree diagram for the sentence We're getting there.
5 Barber on shall and will
- The distinctions formerly made between shall and will are being lost, and will is coming increasingly to be used instead of shall. One reason for this is that in speech we very often say neither will nor shall, but just 'll: I'll see you to-morrow, we'll meet you at the station, John'll get it for you. We cannot use this weak form in all positions (not at the end of a phrase, for example), but we use it very often and, whatever its historical origin may have been (probably from will), we now use it indiscriminately as a weak form for either shall or will and very often the speaker could not tell you which he had intended. There is thus often a doubt in a speaker's mind whether will or shall is the appropriate form and, in this doubt, it is will that is spreading at the expense of shall, presumably because will is used more frequently than shall anyway, and so is likely to be the winner in a levelling process. So people nowadays commonly say or write I will be there, we will all die one day, and so on, when they intend to express simple futurity and not volition.
- (Barber 1964: 134)
6 Denison on shall and will
- During the latter part of our period [1776-present day] ... in the first person shall has increasingly been replaced by will even where there is no element of volition in the meaning.
- (Denison 1998: 167)
7 The use of shall and will in written British and American English from the 1960s and 1990s

BrE      LOB     FLOB    LL     diff (%)
will     2,798   2,723    1.2    -2.7
shall      355     200   44.3   -43.7

AmE      Brown   Frown   LL     diff (%)
will     2,702   2,402   17.3   -11.1
shall      267     150   33.1   -43.8

From Mair and Leech (2006: 327)
- Figures are frequencies normalised per million words
- Log likelihood (LL) is calculated against the number of words
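The diff column in the table above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming diff is the percentage change from the 1960s corpus to the 1990s corpus; the names FREQS and percent_diff are illustrative and not from Mair and Leech.

```python
# Per-million-word frequencies copied from the table above
# (Mair and Leech 2006: 327). Each pair is (1960s corpus, 1990s corpus).
FREQS = {
    ("BrE", "will"): (2798, 2723),   # LOB, FLOB
    ("BrE", "shall"): (355, 200),
    ("AmE", "will"): (2702, 2402),   # Brown, Frown
    ("AmE", "shall"): (267, 150),
}


def percent_diff(before, after):
    """Percentage change from the earlier to the later frequency."""
    return 100.0 * (after - before) / before


for (variety, modal), (f_1960s, f_1990s) in FREQS.items():
    print(f"{variety} {modal:5s} {percent_diff(f_1960s, f_1990s):+.1f}%")
```

Note that this compares frequencies against a word baseline only; it says nothing about whether shall was a possible choice in each case.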
8 Mair and Leech's data
- Simply counts tagged lexical tokens
- Will: auxiliary verb, includes 'll
- Shall: auxiliary verb
- Includes negative forms
- Does not distinguish by grammatical position or context
- Does not ask whether the choice is available, e.g. by limiting to first person use
- Does not consider subclasses separately
- Negative cases: will not/won't vs. shall not/shan't?
- Do interrogative cases behave differently?
- Is written data only
- Can we do better than this?
9 An FTF for first person declarative shall
- This FTF is limited to first person cases
- The FTF requires that the NP is realised by the pronoun I or we.
- Interrogative cases have a different structure
- We can subtract negative (shall not) cases to exclude them.
10 Shall vs. will
- Does the proportion of cases of shall out of {shall, will} change over time?
- χ² for first person subject shall vs. will
- d%: percentage difference (30% fall in shall between LLC and ICE-GB)
- an estimate of the size of the overall effect (a bit like d%)
- χ²: 2×2 chi-square test. Is this change statistically significant?
- χ²(shall): 2×1 goodness of fit test. Does shall behave differently from the average?
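A 2×2 chi-square test of this kind can be sketched in plain Python. The counts below are the Total column of the modal meaning statistics table later in this deck (shall: LLC 107, ICE-GB 37; will: LLC 79, ICE-GB 56); whether these are exactly the counts behind this slide is an assumption. The function name chi_square_2x2 is illustrative, and no continuity correction is applied.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if corpus and choice were independent
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


# Critical value of chi-square with 1 degree of freedom at the 0.05 level
CRITICAL_95_DF1 = 3.841

# Rows: LLC, ICE-GB. Columns: shall, will.
chi2 = chi_square_2x2(107, 79, 37, 56)
print(f"chi-square = {chi2:.2f}, significant at 0.05: {chi2 > CRITICAL_95_DF1}")
```

The same function could be reused for any subset of the data, for example a single modal meaning, provided the four cells are counted on the same baseline.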
11 Shall vs. will/'ll
- Does the proportion of cases of shall out of {shall, will, 'll} change over time?
- χ² for first person subject shall vs. will vs. 'll
- χ²(shall): 2×1 goodness of fit test. Does shall behave differently from the average?
12 Focusing on choice
- We focused on the choice of shall vs. will
- Mair and Leech simply said that total cases of shall fell
- But this might have happened for other reasons
- For example, there may have been more opportunities to use shall in the LLC data
- Examining choice is a more precise way of conducting experiments than counting frequencies
- It allows us to consider what variables (time, genre, other choices) affect the probability of shall being chosen
- Probability is a simple fraction from 0 to 1.
- $p(\text{shall}) = \dfrac{F(\text{shall})}{F(\text{shall}) + F(\text{will})}$
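As a rough illustration, the probability of choosing shall can be computed per corpus from raw frequencies. The counts used here are the totals from the modal meaning statistics table in this deck; the names p_shall and COUNTS are illustrative.

```python
def p_shall(f_shall, f_will):
    """Proportion of shall out of all cases where shall or will was chosen."""
    total = f_shall + f_will
    if total == 0:
        raise ValueError("no cases where the choice was available")
    return f_shall / total


# (F(shall), F(will)) per subcorpus, from the modal meaning statistics table
COUNTS = {"LLC": (107, 79), "ICE-GB": (37, 56)}

for corpus, (f_shall, f_will) in COUNTS.items():
    print(f"{corpus}: p(shall) = {p_shall(f_shall, f_will):.3f}")
```

Adding 'll to the baseline would only change the denominator, which is why a different baseline can lead to a different conclusion.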
13 Probability of shall vs. will over time
14 Probability of shall vs. will/'ll over time
15 Confidence intervals
- Probability p(shall)
- 0: no cases are of type shall
- 1: all cases are of type shall
- Our sample is a tiny subset of possible sentences from the same period
- So we cannot say a particular observation is certain
- Instead we try to estimate our confidence in an observation using error bars or confidence intervals
- The more data we have supporting an observation p, the smaller the confidence interval around it
- We set a confidence level, typically of 95%
- we are 95% sure that the true value is within the interval
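The slides do not say which interval formula is used. As one possible sketch, the Wilson score interval (an assumption here, chosen because it stays within 0 and 1 for small samples) shows how the interval narrows as the amount of data grows. The name wilson_interval is illustrative.

```python
import math


def wilson_interval(p, n, z=1.959964):
    """Wilson score interval for an observed proportion p based on n cases.

    z = 1.959964 corresponds to a 95% confidence level.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    z2 = z * z
    denominator = 1 + z2 / n
    centre = (p + z2 / (2 * n)) / denominator
    spread = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denominator
    return centre - spread, centre + spread


# Same observed proportion, ten times more data: the interval shrinks
p = 37 / 93
for n in (93, 930):
    low, high = wilson_interval(p, n)
    print(f"n = {n}: p = {p:.3f}, interval = ({low:.3f}, {high:.3f})")
```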
16 Modal meaning
- Remember Barber and Denison: not all cases of shall or will mean the same thing
- Root (futurity)
- I've got some at home so I shall take it home. [DI-A18 30]
- I will answer you in a minute. [DI-B30 293]
- Epistemic (volition)
- So I shall have roughly from the twenty-ninth of June to the eighth of July on which I can spend the whole of that time on those two papers. [DL-B01 62]
- It's certainly my long term hope that I will have some kind of companion... [DI-B53 0257]
- We should examine these choices separately
- Unfortunately this means classifying cases manually
17 Modal meaning statistics

                Root          Epistemic         Unclear      Total
shall  LLC      33 (30.84%)   72 (67.29%)       2 (1.87%)    107
       ICE-GB   22 (59.46%)   14 (37.84%) sig   1 (2.70%)     37
will   LLC      44 (55.70%)   28 (35.44%)       7 (8.86%)     79
       ICE-GB   37 (66.07%)   14 (25.00%)       5 (8.93%)     56
Total           136           128 sig           15           279

- Root shall / will is stable: results are not significant
- Epistemic shall / will falls (d% = -30% ±27%)
- The fall in shall is not explained by the sharp fall in Epistemic modals overall
- from 100 (72+28) to 28 (14+14)
- This is evidence that the shift in use in C20 is concentrated within Epistemic meanings, from shall to will.
- Barber and Denison: the earlier shift was in Root (future) meaning.
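The Epistemic figures can be reworked from the table counts (shall 72 out of 72+28 in LLC, 14 out of 14+14 in ICE-GB). This is a sketch under an assumption: the interval is taken from a pooled Gaussian standard error for the difference between two proportions, scaled by the LLC proportion. The slides do not state the method, but this comes out close to the 30 and 27 quoted in the bullet above. The function name is illustrative.

```python
import math


def percent_difference_with_interval(f1, n1, f2, n2, z=1.959964):
    """Percentage difference in a proportion between two samples.

    f1/n1 is the earlier proportion, f2/n2 the later one. Returns
    (d_percent, half_width_percent), both relative to the earlier proportion.
    The half-width uses a pooled Gaussian standard error (an assumption).
    """
    p1 = f1 / n1
    p2 = f2 / n2
    pooled = (f1 + f2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    d_percent = 100 * (p2 - p1) / p1
    half_width_percent = 100 * z * std_err / p1
    return d_percent, half_width_percent


# Epistemic shall out of Epistemic {shall, will}: LLC 72/100, ICE-GB 14/28
d_pct, half_width = percent_difference_with_interval(72, 100, 14, 28)
print(f"d% = {d_pct:.1f}% ± {half_width:.1f}%")
```

Because the interval is narrower than the fall itself, zero lies outside it, which is consistent with the change being marked as significant in the table.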
18 Modal meaning statistics
- Shall is losing its particular Epistemic meaning as a result
- In the LLC data two thirds (67%) of shall uses were Epistemic.
- This fell to 37% (just over one third) in ICE-GB.
19 Conclusions
- DCPSE is
- orthographically transcribed spoken English
- mostly spontaneous
- fully parsed and checked by linguists, uses phrase structure grammar based on Quirk et al.
- searchable with ICECUP and FTFs
- Even lexical studies benefit from parsing
- allows us to focus on when a choice occurs
- You can use DCPSE to carry out many different experiments on real English
- we looked at change over (recent) time
- we might also look at how decisions interact
20 Conclusions
- Designing a corpus linguistic experiment means thinking carefully about your hypothesis and then attempting to test it against the corpus
- We examined the shift from shall to will
- We limited it to first person, declarative, positive cases
- Changing baselines (including 'll) may lead to different conclusions
- Many corpus studies only consider word baselines (or pmw)
- But it is often better to consider proportions of types of clause or phrase, or list specific alternative choices
- Alternation (choice) studies aim to hold meaning constant so the speaker/writer is free to choose between both cases
- We focused further by subdividing data by modal meaning
21 Suggested further reading
- On shall vs. will and the progressive
- Aarts, B., Close, J. and Wallis, S.A. (forthcoming) Choices over time: methodological issues in investigating current change. In B. Aarts et al. The changing Verb Phrase. Cambridge: CUP.
- www.ucl.ac.uk/english-usage/projects/verb-phrase/book/aartsclosewallis.pdf
- Barber, C. (1964) Linguistic Change in Present-Day English. Edinburgh and London: Oliver and Boyd.
- Denison, D. (1998) Syntax. In S. Romaine (ed.) The Cambridge History of the English Language. IV: 1776-1997. Cambridge: Cambridge University Press. 92-329.
- Mair, C. and Leech, G. (2006) Current changes in English syntax. In B. Aarts and A. McMahon (eds.) The Handbook of English Linguistics. Malden MA: Blackwell Publishers. 318-342.
- On statistical tests, confidence intervals and other methods
- Wallis, S.A. (2010) z-squared: the origin and use of χ². Survey of English Usage, UCL.
- www.ucl.ac.uk/english-usage/statspapers/z-squared.pdf