1
Tackling meaning and aboutness with KeyWords
Corpus Linguistics Summer Institute, Liverpool, 2 July 2008
  • Mike Scott,
  • School of English
  • University of Liverpool

2
Purpose
  • To explore the notion of keyness
  • and its implications in corpus-based study
  • with reference to WordSmith

3
Keyness
  • Words are not key in a language but in a given
    text
  • Words can be key to a culture (Stubbs 2002,
    Williams 1976)
  • Keyness
  • Importance
  • Aboutness (Phillips, 1989)

4
The Notion of Keyness
  • 2 main qualities
  • Importance
  • a key player, a key position
  • the keystone of an arch
  • Aboutness (Phillips, 1989)
  • a key point: a main point in the text's
    development and argument,
  • what the text is about

5
Overview
  • Keyness, as a new territory, looks promising and
    has attracted colonists and prospectors. It
    generally appears to give robust indications of
    the text's aboutness together with indicators of
    style.

6
the text's aboutness
7
colonists
8
and prospectors
9
Issues
  • the issue of text section v. text v. corpus v.
    sub-corpus
  • statistical questions: what exactly can be
    claimed?
  • how to choose a reference corpus
  • handling related forms such as antonyms

10
Of course it doesn't actually understand
11
or know what is correct
12
only look at what is found in text
or context
whether marked up or not
<intro>Once upon a time ...</intro>
13
  • Context?

14
(No Transcript)
15
Corresponding units of meaning
  • morpheme
  • word
  • cluster / phrase
  • sentence
  • paragraph
  • section, chapter
  • text
  • (sub-) genre

16
If all this is so
  • what is the status of the key words one may
    identify and what is to be done with them?

17
Issues
  1. the issue of text section v. text v. corpus v.
    sub-corpus
  2. statistical questions: what exactly can be
    claimed?
  3. how to choose a reference corpus
  4. handling related forms such as antonyms
  5. what is the status of the key words one may
    identify and what is to be done with them?

18
text section v. text v. corpus v. sub-corpus
  • text section: levels 1-5
  • text: level 6
  • corpus: levels 7 & 8

19
But these are often not clearly differentiated
  • text, level 6: with or without mark-up, images,
    sounds?
  • what do we mean by section, chapter (4) and other
    non-linguistically defined categories?
  • is text itself mutating?

20
Internet text
21
Wikipedia homepage (part)
22
Wikipedia homepage (part)
23
Wikipedia article (3 parts of same article)
24
Wikipedia discussion
  • from History of the stall article
  • latest contributor, Talk section

25
Statistics
  • there is no statistical defence of the whole set
    of KWs
  • but only of each one
  • comparing KW p values is not advisable

26
Why?
             hail   wind   weevils
peas
chick-peas
potatoes
  • Matrix text, describing a series of troubles
    affecting a set of crops in a certain place.
  • weevils and chickpeas will be much rarer words
    (if not rarer entities in this particular place)
  • and will float to the top of the KW list

27
choosing a reference corpus
  • using a mixed bag RC, the larger the RC the
    better but a moderate sized RC may suffice.
  • the keyword procedure is fairly robust.
  • KWs identified even by an obviously absurd RC can
    be plausible indicators of aboutness, which
    reinforces the conclusion that keyword analysis
    is robust.
  • genre-specific RCs identify rather different KWs
  • the aboutness of a text may not be one thing but
    numerous different ones.
  • Scott (forthcoming)

28
related forms
  • WordSmith can be asked to treat members of the
    same lemma as related
  • and can handle clusters
  • but otherwise ignores relations such as
  • synonymy
  • antonymy
  • collocation

  • examples: table / tables (same lemma), at the end of (cluster)
29
status of the KW
  • not intrinsic to the word/cluster but
    context-bound
  • a pointer to specific textual aboutness
  • and/or style
  • statistically arrived at but not established
  • sometimes pointing to a pattern

30
status of the set of KWs
  • indicative of the more general aboutness of the
    source text(s)
  • and/or style
  • but (as a set) not statistically proven

31
Shakespeare's KWs
32
KWs of Hamlet
  • Characters
  • FORTINBRAS, GERTRUDE, GUILDENSTERN, HAMLET,
    HAMLET'S, HORATIO, LAERTES, OPHELIA, PYRRHUS,
    ROSENCRANTZ
  • Places
  • DENMARK, NORWAY
  • Pronouns
  • I, IT, T, THEE, THOU
  • Themes, events
  • MADNESS, PLAY, PLAYERS
  • Other (unexpected)
  • E'EN, LORD, MOST, MOTHER, PHRASE, VERY

33
Most of these are obvious and probably
uninteresting.
  • if you know the play you already know
  • it concerns Hamlet and some other characters
  • it's set in Denmark
  • Ophelia goes mad.

34
but some are puzzling
  • Why are IT, LORD and MOST positively key in
    Hamlet
  • if they are negatively key in the other plays?
  • Which characters are they most key of?
  • Where are they found, how are these KWs dispersed
    throughout the play?

35
IT in Hamlet (1)
  • In the plays: 0.95% (1 word in 100) but
  • in Hamlet's speeches: 1.48%, a 50% increase in
    this one character's speeches
  • in Horatio's speeches: 2.33%, nearly 250% of the
    average in this one character's speeches.
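The comparison on this slide can be checked with a line or two of arithmetic. The sketch below is only an illustration: the function name `percent_of_baseline` is hypothetical, and the three rates are the percentages quoted above.

```python
def percent_of_baseline(rate, baseline):
    """Express a relative frequency as a percentage of a baseline rate."""
    return 100 * rate / baseline

# Relative frequencies of IT (per cent of running words), as quoted on the slide
PLAYS = 0.95
HAMLET_SPEECHES = 1.48
HORATIO_SPEECHES = 2.33

hamlet_pct = percent_of_baseline(HAMLET_SPEECHES, PLAYS)
horatio_pct = percent_of_baseline(HORATIO_SPEECHES, PLAYS)

print(f"Hamlet:  {hamlet_pct:.0f}% of the average")
print(f"Horatio: {horatio_pct:.0f}% of the average")
```

On these figures Hamlet's rate comes out at roughly one and a half times the average and Horatio's at nearly two and a half times, in line with the slide.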

36
IT in Hamlet (2)
  • In Hamlet's speeches, distributed evenly
  • In Horatio's speeches

37
DO in Othello
  • Nearly twice as frequent as in the other plays
  • Characteristic of Iago (nearly twice as often)
    and Desdemona (more than 3 times as often)
  • DOST characteristic of Othello (more than 6 times
    as frequent)

38
Iago: commanding
39
Desdemona: conditional
40
Othello's DOST: questioning suspicion
41
Keyword Clusters
  • Text-initial sections of
  • Hard News (Guardian 1998-2004)
  • studying Hoey's Lexical Priming theory

42
Research Questions
  • Using the hard news corpus,
  • How many 3-5 word clusters are found to be key in
    TISC sections?
  • How many are positively and how many are
    negatively key?
  • What recurrent patterns can be found in the two
    types of key cluster?
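To make "3-5 word clusters" concrete, here is a minimal sketch of how such clusters might be counted. It is not WordSmith's procedure: the function name `count_clusters`, the simple regular-expression tokeniser and the decision to ignore sentence boundaries are all assumptions made for the example.

```python
import re
from collections import Counter

def count_clusters(text, min_n=3, max_n=5):
    """Count every contiguous word sequence of min_n..max_n words."""
    # Assumption: lower-case the text and treat runs of letters/apostrophes as words
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for n in range(min_n, max_n + 1):
        for start in range(len(tokens) - n + 1):
            counts[" ".join(tokens[start:start + n])] += 1
    return counts

example = count_clusters("It was announced last night")
for cluster, freq in sorted(example.items()):
    print(freq, cluster)
```

A five-word sentence already yields six clusters (three of 3 words, two of 4 and one of 5), so shorter clusters are repeated inside longer ones.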

43
RQs 1 & 2: Numbers of KW clusters
  • using a p value of 0.0000001, a minimum
    frequency of 3 and the log likelihood statistic,
  • 8,132 key clusters altogether (in 3.2 million
    words of text)
  • of which 7,631 were positively key
  • and 501 negatively key
  • though there is repetition as these are 3-5 word
    n-grams
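For readers who want to see how a log likelihood keyness score and its p value can be obtained, a hedged sketch follows. It uses a commonly cited two-term form of the log likelihood statistic, which may differ from WordSmith's exact computation. The function names (`log_likelihood`, `p_value`, `is_key`), the signing convention for positive and negative keyness, and the choice to apply the minimum frequency to the study corpus only are assumptions for illustration.

```python
import math

def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Signed log likelihood: positive if the item is relatively more
    frequent in the study corpus, negative if relatively less frequent."""
    total = freq_study + freq_ref
    expected_study = size_study * total / (size_study + size_ref)
    expected_ref = size_ref * total / (size_study + size_ref)
    ll = 0.0
    if freq_study > 0:
        ll += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    ll *= 2
    sign = 1 if freq_study / size_study >= freq_ref / size_ref else -1
    return sign * ll

def p_value(ll):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(abs(ll) / 2))

def is_key(freq_study, size_study, freq_ref, size_ref,
           max_p=0.0000001, min_freq=3):
    """Apply the slide's thresholds (assumed to refer to the study corpus)."""
    if freq_study < min_freq:
        return False
    return p_value(log_likelihood(freq_study, size_study,
                                  freq_ref, size_ref)) <= max_p

# Hypothetical counts, purely for illustration
print(is_key(freq_study=50, size_study=1000, freq_ref=100, size_ref=10000))
```

Each score here defends one item at a time, which is consistent with the earlier point that there is no statistical defence of the whole set of KWs.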

Research Question 2
44
RQ 1: Numbers of KW clusters
  • Is 8 thousand a large number of distinct key
    text-initial clusters?
  • In the same amount of text there are 84 thousand
    3-5 word clusters of frequency at least 5
    altogether
  • about one in 10 is associated with text-initial
    position at the .0000001 level of significance

45
RQ 1, continued
  • is 1 in 10 a large number to be key?
  • In the case of SISC (sentences from paragraphs
    with only one sentence in), we get
  • 507 thousand clusters, of which
  • 2,192 are key (1,747 positively and 445
    negatively)
  • which is about 1 in 230
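The two proportions can be reproduced from the counts given on these slides. The sketch below is only a check on the arithmetic; the helper name `one_in` is hypothetical, and "84 thousand" and "507 thousand" are taken as round figures.

```python
def one_in(total_clusters, key_clusters):
    """Return N such that roughly 1 cluster in N is key."""
    return total_clusters / key_clusters

# Text-initial sections: 8,132 key clusters out of about 84 thousand
text_initial = one_in(84_000, 8_132)

# SISC: 2,192 key clusters out of about 507 thousand
sisc = one_in(507_000, 2_192)

print(f"text-initial: about 1 in {text_initial:.0f}")
print(f"SISC:         about 1 in {sisc:.0f}")
```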

46
IT + reporting verb: positively key
  • IT WAS ANNOUNCED LAST NIGHT
  • IT WAS CLAIMED LAST NIGHT
  • IT WAS CONFIRMED LAST NIGHT
  • IT IS REVEALED TODAY

47
IT otherwise: negatively key
  • IT IS A
  • IT IS ABOUT
  • IT IS EXPECTED
  • IT IS GOING
  • IT IS ONLY
  • IT IS POSSIBLE
  • IT SEEMS TO

48
Conclusions
  • keyness is a pointer
  • to importance
  • which can be
  • sub-textual
  • textual
  • intertextual

49
References
  • Berber Sardinha, Tony, 1999. Using Key Words in
    Text Analysis: practical aspects. DIRECT Papers
    42, LAEL, Catholic University of São Paulo.
  • Berber Sardinha, Tony, 2004. Lingüística de
    Corpus. Barueri: Manole.
  • Culpeper, J., 2002. 'Computers, language and
    characterisation: An Analysis of six characters
    in Romeo and Juliet'. In U. Melander-Marttala,
    C. Östman and M. Kytö (eds.), Conversation in
    Life and in Literature: Papers from the ASLA
    Symposium, Association Suedoise de Linguistique
    Appliquée (ASLA), 15. Universitetstryckeriet:
    Uppsala, pp. 11-30.
  • Kemppanen, Hannu, 2004. Keywords and Ideology in
    Translated History Texts: A Corpus-based
    Analysis. Across Languages and Cultures 5 (1),
    89-106.
  • Rigotti, Eddo and Andrea Rocci, 2002. From
    Argument Analysis to Cultural Keywords (and back
    again).
    http://www.ils.com.unisi.ch/articoli-rigotti-rocci-keywords-published.pdf
    (accessed May 2007). In F. H. van Eemeren et al.,
    Proceedings of the 5th Conference of the
    International Society for the Study of
    Argumentation. Amsterdam: SicSat, pp. 903-908.
  • Scott, M., 1996 with new versions in 1997, 1999,
    2004, Wordsmith Tools, Oxford: Oxford University
    Press.
  • Scott, M., 1997a. "PC Analysis of Key Words --
    and Key Key Words", System, Vol. 25, No. 1, pp.
    1-13.
  • Scott, M., 1997b. "The Right Word in the Right
    Place: Key Word Associates in Two Languages",
    AAA - Arbeiten aus Anglistik und Amerikanistik,
    Vol. 22, No. 2, pp. 239-252.
  • Scott, M., 2000a. Focusing on the Text and Its
    Key Words, in L. Burnard & T. McEnery (eds.),
    Rethinking Language Pedagogy from a Corpus
    Perspective, Volume 2. Frankfurt: Peter Lang,
    pp. 103-122.
  • Scott, M., 2000b. Reverberations of an Echo, in B.
    Lewandowska-Tomaszczyk & P.J. Melia (eds.)
    PALC99: Practical Applications in Language
    Corpora. Lodz Studies in Language, Volume 1.
    Frankfurt: Peter Lang, pp. 49-68.
  • Scott, M., 2001. Mapping Key Words to Problem
    and Solution, in M. Scott & G. Thompson (eds.)
    Patterns of Text: in honour of Michael Hoey,
    Amsterdam: Benjamins, pp. 109-127.
  • Scott, M., 2002. Picturing the key words of a
    very large corpus and their lexical upshots or
    getting at the Guardian's view of the world, in
    B. Kettemann & G. Marko (eds.) Teaching and
    Learning by Doing Corpus Analysis, Amsterdam:
    Rodopi, pp. 43-50 and cd-rom within the cover of
    the book.
  • Scott, M., 2006. "The Importance of Key Words for
    LSP" in Arnó Macià, E., A. Soler Cervera & C.
    Rueda Ramos (eds.), Information Technology in
    Languages for Specific Purposes: issues and
    prospects. New York: Springer, pp. 231-243.
  • Scott, M. (forthcoming) In Search of a Bad
    Reference Corpus. AHRC Methods Network.
  • Scott, M. & Tribble, C., 2006. Textual Patterns:
    keyword and corpus analysis in language
    education, Amsterdam: Benjamins.
  • Seale C, Charteris-Black J, Ziebland S. 2006.
    Gender, cancer experience and internet use: a
    comparative keyword analysis of interviews and
    online cancer support groups. Social Science and
    Medicine. 62, 10: 2577-2590.
  • Tribble, Chris, 1999, "Genres, keywords,
    teaching: towards a pedagogic account of the
    language of project proposals" in L. Burnard & A.
    McEnery (eds.) Rethinking Language Pedagogy from
    a Corpus Perspective: Papers from the Third
    International Conference on Teaching and Language
    Corpora, (Lodz Studies in Language). Hamburg:
    Peter Lang.