Title: Tackling meaning and aboutness with KeyWords
1. Tackling meaning and aboutness with KeyWords
Corpus Linguistics Summer Institute, Liverpool, 2 July 2008
- Mike Scott
- School of English
- University of Liverpool
2. Purpose
- To explore the notion of keyness
- and its implications in corpus-based study
- with reference to WordSmith
3. Keyness
- Words are not key in a language but in a given text
- Words can be key to a culture (Stubbs 2002, Williams 1976)
- Keyness
- Importance
- Aboutness (Phillips, 1989)
4. The Notion of Keyness
- 2 main qualities
- Importance
- a key player, a key position
- the keystone of an arch
- Aboutness (Phillips, 1989)
- a key point: a main point in the text's development and argument,
- what the text is about
5. Overview
- Keyness, as a new territory, looks promising and has attracted colonists and prospectors. It generally appears to give robust indications of the text's aboutness together with indicators of style.
6. the text's aboutness
7. colonists
8. and prospectors
9. Issues
- the issue of text section v. text v. corpus v. sub-corpus
- statistical questions: what exactly can be claimed?
- how to choose a reference corpus
- handling related forms such as antonyms
10. Of course it doesn't actually understand
11. ... or know what is correct
12. ... it can only look at what is found in text
- or context
- whether marked up or not
- <intro>Once upon a time ...</intro>
15. Corresponding units of meaning
- morpheme
- word
- cluster / phrase
- sentence
- paragraph
- section, chapter
- text
- (sub-) genre
16. If all this is so ...
- what is the status of the key words one may identify and what is to be done with them?
17. Issues
- the issue of text section v. text v. corpus v. sub-corpus
- statistical questions: what exactly can be claimed?
- how to choose a reference corpus
- handling related forms such as antonyms
- what is the status of the key words one may identify and what is to be done with them?
18. text section v. text v. corpus v. sub-corpus
- text section: levels 1-5
- text: level 6
- corpus: levels 7 & 8
19. But these are often not clearly differentiated
- text, level 6: with or without mark-up, images, sounds?
- what do we mean by section, chapter (4) and other non-linguistically defined categories?
- is text itself mutating?
20. Internet text
21. Wikipedia homepage (part)
22. Wikipedia homepage (part)
23. Wikipedia article (3 parts of same article)
24. Wikipedia discussion
- from History of the stall article
- latest contributor, Talk section
25. Statistics
- there is no statistical defence of the whole set of KWs
- but only of each one
- comparing KW p values is not advisable
26. Why?
- troubles: hail, wind, weevils
- crops: peas, chick-peas, potatoes
- Matrix text, describing a series of troubles affecting a set of crops in a certain place.
- weevils and chick-peas will be much rarer words (if not rarer entities in this particular place)
- and will float to the top of the KW list
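A minimal sketch of why this happens, assuming the log likelihood statistic in its common two-corpus form (the slides do not show the formula WordSmith uses). All counts and corpus sizes below are invented for illustration: every word occurs equally often in the study text, yet the words that are rarest in the reference corpus receive the highest scores.

```python
import math

def log_likelihood(freq_text, size_text, freq_ref, size_ref):
    """Two-corpus log-likelihood (G2) score for a single word.

    Compares the observed frequency in the study text and in the
    reference corpus with the frequencies expected if the word were
    equally common in both.
    """
    total = freq_text + freq_ref
    expected_text = size_text * total / (size_text + size_ref)
    expected_ref = size_ref * total / (size_text + size_ref)
    g2 = 0.0
    for observed, expected in ((freq_text, expected_text), (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2

# Hypothetical figures: each word occurs 10 times in a 2,000-word text,
# but at very different rates in a 1,000,000-word reference corpus.
TEXT_SIZE, REF_SIZE = 2_000, 1_000_000
counts = {
    "hail": (10, 400),
    "wind": (10, 5_000),
    "weevils": (10, 20),
    "chick-peas": (10, 30),
    "potatoes": (10, 900),
}
scores = {
    word: log_likelihood(in_text, TEXT_SIZE, in_ref, REF_SIZE)
    for word, (in_text, in_ref) in counts.items()
}
ranked = sorted(scores, key=scores.get, reverse=True)
for word in ranked:
    print(f"{word:12s} {scores[word]:8.2f}")
```

In this toy data "weevils" and "chick-peas" head the list even though no word is more prominent in the text than any other, which is why ranking KWs by score or p value says little about their relative importance within the text.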
27. choosing a reference corpus
- using a mixed-bag RC, the larger the RC the better, but a moderate-sized RC may suffice.
- the keyword procedure is fairly robust.
- KWs identified even by an obviously absurd RC can be plausible indicators of aboutness, which reinforces the conclusion that keyword analysis is robust.
- genre-specific RCs identify rather different KWs
- the aboutness of a text may not be one thing but numerous different ones.
- Scott (forthcoming)
28. related forms
- WordSmith can be asked to treat members of the same lemma as related (e.g. table, tables)
- and can handle clusters (e.g. at the end of)
- but otherwise ignores relations such as
- synonymy
- antonymy
- collocation
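As a rough illustration of what treating forms as one lemma means for counting, here is a minimal sketch. It is not WordSmith's implementation: the lemma table `LEMMA_OF` and the function `count_by_lemma` are hypothetical names, and the sentence is made up.

```python
from collections import Counter

# Hypothetical lemma list: each word form maps to a headword.
# Forms that are not listed count as their own lemma.
LEMMA_OF = {"tables": "table", "is": "be", "was": "be"}

def count_by_lemma(tokens, lemma_of=LEMMA_OF):
    """Count tokens after folding each form into its lemma."""
    return Counter(lemma_of.get(token, token) for token in tokens)

tokens = "the table was at the end of the row of tables".split()
by_form = Counter(tokens)
by_lemma = count_by_lemma(tokens)

print(by_form["table"], by_form["tables"])  # forms counted separately: 1 1
print(by_lemma["table"])                    # forms counted together: 2
```

A lookup of this kind joins inflected forms only; it knows nothing about synonyms, antonyms or collocates, which matches the limitation noted in the slide.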
29. status of the KW
- not intrinsic to the word/cluster but context-bound
- a pointer to specific textual aboutness
- and/or style
- statistically arrived at but not established
- sometimes pointing to a pattern
30. status of the set of KWs
- indicative of the more general aboutness of the source text(s)
- and/or style
- but (as a set) not statistically proven
31. Shakespeare's KWs
32. KWs of Hamlet
- Characters: FORTINBRAS, GERTRUDE, GUILDENSTERN, HAMLET, HAMLET'S, HORATIO, LAERTES, OPHELIA, PYRRHUS, ROSENCRANTZ
- Places: DENMARK, NORWAY
- Pronouns: I, IT, T, THEE, THOU
- Themes, events: MADNESS, PLAY, PLAYERS
- Other (unexpected): E'EN, LORD, MOST, MOTHER, PHRASE, VERY
33. Most of these are obvious and probably uninteresting.
- if you know the play you already know
- it concerns Hamlet and some other characters
- it's set in Denmark
- Ophelia goes mad.
34. ... but some are puzzling
- Why are IT, LORD and MOST positively key in Hamlet
- if they are negatively key in the other plays?
- Which characters are they most key of?
- Where are they found, how are these KWs dispersed throughout the play?
35. IT in Hamlet (1)
- In the plays: 0.95% (1 word in 100), but
- in Hamlet's speeches: 1.48%, a 50% increase in this one character's speeches
- in Horatio's speeches: 2.33%, nearly 250% of the average in this one character's speeches.
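The ratios behind these comparisons can be checked with a few lines of arithmetic. The three percentages are the ones quoted on the slide; the helper `relative_to` is just an illustrative name.

```python
# Share of running words that are IT, as quoted on the slide (in percent).
all_plays = 0.95
hamlet_speeches = 1.48
horatio_speeches = 2.33

def relative_to(baseline, value):
    """Return value expressed as a percentage of baseline."""
    return 100 * value / baseline

hamlet_ratio = relative_to(all_plays, hamlet_speeches)
horatio_ratio = relative_to(all_plays, horatio_speeches)

print(f"Hamlet:  {hamlet_ratio:.0f}% of the average (+{hamlet_ratio - 100:.0f}%)")
print(f"Horatio: {horatio_ratio:.0f}% of the average")
```

This gives roughly 156% for Hamlet (an increase of about 56%) and 245% for Horatio, close to the rounded figures on the slide.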
36. IT in Hamlet (2)
- In Hamlet's speeches, distributed evenly
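One simple way to inspect how evenly a word is distributed is to cut the token stream into equal slices and count hits per slice. The sketch below assumes nothing about how WordSmith computes dispersion; the function name `dispersion` and the toy token stream are illustrative only.

```python
def dispersion(tokens, target, segments=10):
    """Count occurrences of target in each of `segments` equal slices."""
    counts = [0] * segments
    for position, token in enumerate(tokens):
        if token.lower() == target:
            # Map the token position onto a slice index 0..segments-1.
            counts[position * segments // len(tokens)] += 1
    return counts

# Toy token stream standing in for one character's speeches.
tokens = ("it is not nor it cannot come to good " * 5).split()
hits = dispersion(tokens, "it", segments=5)
print(hits)  # [2, 2, 2, 2, 2]
```

Similar counts in every slice suggest an even spread; a word bunched into one or two slices would point to a local topic instead.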
37. DO in Othello
- Nearly twice as frequent as in the other plays
- Characteristic of Iago (nearly twice as often) and Desdemona (more than 3 times as often)
- DOST characteristic of Othello (more than 6 times as frequent)
38. Iago: commanding
39. Desdemona: conditional
40. Othello's DOST: questioning, suspicion
41. Keyword Clusters
- Text-initial sections of
- Hard News (Guardian 1998-2004)
- studying Hoey's Lexical Priming theory
42. Research Questions
- Using the hard news corpus,
- How many 3-5 word clusters are found to be key in TISC sections?
- How many are positively and how many are negatively key?
- What recurrent patterns can be found in the two types of key cluster?
43. RQs 1 & 2: Numbers of KW clusters
- using a p value of 0.0000001, a minimum frequency of 3 and the log likelihood statistic,
- 8,132 key clusters altogether (in 3.2 million words of text)
- of which 7,631 were positively key and 501 negatively key (Research Question 2)
- though there is repetition as these are 3-5 word n-grams
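The repetition arises because every longer cluster contains shorter ones. A minimal sketch of 3-5 word n-gram extraction shows this; it is an illustration, not WordSmith's cluster procedure, and the function name `clusters` is hypothetical.

```python
def clusters(tokens, min_n=3, max_n=5):
    """Yield every contiguous n-gram of min_n to max_n tokens."""
    for n in range(min_n, max_n + 1):
        for start in range(len(tokens) - n + 1):
            yield " ".join(tokens[start:start + n])

tokens = "IT WAS ANNOUNCED LAST NIGHT".split()
found = list(clusters(tokens))
for cluster in found:
    print(cluster)
print(len(found))  # 6
```

A single five-word sequence yields six clusters (three 3-grams, two 4-grams and one 5-gram), so one recurrent phrase can appear several times in a list of key clusters.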
44. RQ 1: Numbers of KW clusters
- Is 8 thousand a large number of distinct key text-initial clusters?
- In the same amount of text there are 84 thousand 3-5 word clusters of frequency at least 5 altogether
- about one in 10 is associated with text-initial position at the .0000001 level of significance
45. RQ 1, continued
- is 1 in 10 a large number to be key?
- In the case of SISC (sentences from paragraphs with only one sentence in), we get
- 507 thousand clusters, of which
- 2,192 are key (1,747 positively and 445 negatively)
- which is about 1 in 230
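The two proportions can be verified from the figures given for the text-initial (TISC) and single-sentence-paragraph (SISC) data. The sketch uses only numbers quoted in the slides; `one_in` is an illustrative helper name.

```python
def one_in(total, subset):
    """Express subset out of total as 'one in N' and return N."""
    return total / subset

# Text-initial clusters: positive and negative key clusters add up to 8,132.
tisc_key = 7_631 + 501
tisc = one_in(84_000, tisc_key)

# Single-sentence paragraphs: positive and negative add up to 2,192.
sisc_key = 1_747 + 445
sisc = one_in(507_000, sisc_key)

print(f"TISC: about 1 in {tisc:.1f}")  # about 1 in 10.3
print(f"SISC: about 1 in {sisc:.1f}")  # about 1 in 231.3
```

On these figures, key clusters are more than twenty times as common, proportionally, in text-initial position as in single-sentence paragraphs.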
46. IT + reporting verb: positively key
- IT WAS ANNOUNCED LAST NIGHT
- IT WAS CLAIMED LAST NIGHT
- IT WAS CONFIRMED LAST NIGHT
- IT IS REVEALED TODAY
47. IT otherwise: negatively key
- IT IS A
- IT IS ABOUT
- IT IS EXPECTED
- IT IS GOING
- IT IS ONLY
- IT IS POSSIBLE
- IT SEEMS TO
48. Conclusions
- keyness is a pointer
- to importance
- which can be
- sub-textual
- textual
- intertextual
49. References
- Berber Sardinha, Tony, 1999. Using Key Words in Text Analysis: practical aspects. DIRECT Papers 42, LAEL, Catholic University of São Paulo.
- Berber Sardinha, Tony, 2004. Lingüística de Corpus. Barueri: Manole.
- Culpeper, J., 2002. 'Computers, language and characterisation: An Analysis of six characters in Romeo and Juliet'. In U. Melander-Marttala, C. Östman and M. Kytö (eds.), Conversation in Life and in Literature: Papers from the ASLA Symposium, Association Suedoise de Linguistique Appliquée (ASLA), 15. Universitetstryckeriet: Uppsala, pp. 11-30.
- Kemppanen, Hannu, 2004. Keywords and Ideology in Translated History Texts: A Corpus-based Analysis. Across Languages and Cultures 5 (1), 89-106.
- Rigotti, Eddo and Andrea Rocci, 2002. From Argument Analysis to Cultural Keywords (and back again). http://www.ils.com.unisi.ch/articoli-rigotti-rocci-keywords-published.pdf (accessed May 2007). In F. H. van Eemeren et al, Proceedings of the 5th Conference of the International Society for the Study of Argumentation. Amsterdam: SicSat, pp. 903-908.
- Scott, M., 1996 with new versions in 1997, 1999, 2004. Wordsmith Tools. Oxford: Oxford University Press.
- Scott, M., 1997a. "PC Analysis of Key Words -- and Key Key Words", System, Vol. 25, No. 1, pp. 1-13.
- Scott, M., 1997b. "The Right Word in the Right Place: Key Word Associates in Two Languages", AAA - Arbeiten aus Anglistik und Amerikanistik, Vol. 22, No. 2, pp. 239-252.
- Scott, M., 2000a. Focusing on the Text and Its Key Words, in L. Burnard & T. McEnery (eds.), Rethinking Language Pedagogy from a Corpus Perspective, Volume 2. Frankfurt: Peter Lang, pp. 103-122.
- Scott, M., 2000b. Reverberations of an Echo, in B. Lewandowska-Tomaszczyk & P.J. Melia (eds.), PALC99: Practical Applications in Language Corpora. Lodz Studies in Language, Volume 1. Frankfurt: Peter Lang, pp. 49-68.
- Scott, M., 2001. Mapping Key Words to Problem and Solution, in M. Scott & G. Thompson (eds.), Patterns of Text: in honour of Michael Hoey. Amsterdam: Benjamins, pp. 109-127.
- Scott, M., 2002. Picturing the key words of a very large corpus and their lexical upshots, or getting at the Guardian's view of the world, in B. Kettemann & G. Marko (eds.), Teaching and Learning by Doing Corpus Analysis. Amsterdam: Rodopi, pp. 43-50 and cd-rom within the cover of the book.
- Scott, M., 2006. "The Importance of Key Words for LSP", in Arnó Macià, E., A. Soler Cervera & C. Rueda Ramos (eds.), Information Technology in Languages for Specific Purposes: issues and prospects. New York: Springer, pp. 231-243.
- Scott, M. (forthcoming). In Search of a Bad Reference Corpus. AHRC Methods Network.
- Scott, M. & Tribble, C., 2006. Textual Patterns: keyword and corpus analysis in language education. Amsterdam: Benjamins.
- Seale, C., Charteris-Black, J. & Ziebland, S., 2006. Gender, cancer experience and internet use: a comparative keyword analysis of interviews and online cancer support groups. Social Science and Medicine 62 (10), 2577-2590.
- Tribble, Chris, 1999. "Genres, keywords, teaching: towards a pedagogic account of the language of project proposals", in L. Burnard & A. McEnery (eds.), Rethinking Language Pedagogy from a Corpus Perspective: Papers from the Third International Conference on Teaching and Language Corpora (Lodz Studies in Language). Hamburg: Peter Lang.