Corpora in lexical studies

1
Corpora in lexical studies
  • Corpus Linguistics
  • Richard Xiao
  • lancsxiaoz_at_googlemail.com

2
Aims of this session
  • Lecture
  • Corpus-based lexicography
  • Collocation and colligation
  • Lab session
  • Collocation using WST
  • Collocation using AntConc
  • Collocation and colligation in Xaira
  • Using the BNCweb to study collocation

3
Corpus revolution in lexicographic and lexical
studies
  • Lexicographic and lexical studies are the
    greatest beneficiaries of corpora
  • Corpora have revolutionised dictionary making
    and reference publishing
  • It is now nearly unheard of for new dictionaries
    and new editions of old dictionaries published
    from the 1990s onwards not to claim to be based
    on corpus data

4
Why use corpora in dictionary making?
  • Machine-readable corpora allow dictionary makers
    to extract all authentic, typical examples of the
    usage of a lexical item from a large body of text
    in a few seconds
  • Corpora allow dictionary makers to select entries
    based on frequency information
  • Corpora can readily provide frequency information
    and collocation information for readers
  • Textual (e.g. register, genre and domain) and
    sociolinguistic (e.g. user gender and age)
    information encoded in corpora allows
    lexicographers to give a more accurate
    description of the usage of a lexical item
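The frequency-based entry selection mentioned above can be sketched in a few lines. This is only an illustration: the function name `frequency_list`, the simple regular-expression tokeniser and the sample sentence are assumptions made for the example, not part of any dictionary-making toolchain.

```python
from collections import Counter
import re

def frequency_list(text, top_n=5):
    """Return the top_n most frequent word forms as (word, count) pairs."""
    # Very rough tokenisation: lower-case runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens).most_common(top_n)

# Invented sample text; a real study would read a corpus file instead.
sample = "the cat sat on the mat and the dog sat on the rug"
print(frequency_list(sample, top_n=3))
```

On a real corpus the same ranking would usually be computed over lemmas rather than raw word forms, which needs a lemmatiser that this sketch leaves out.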

5
Why use corpora in dictionary making?
  • Corpus annotations such as part-of-speech tagging
    and word sense disambiguation also enable a more
    sensible grouping of polysemous words and
    homographs
  • A monitor corpus allows lexicographers to track
    subtle change in the meaning and usage of a
    lexical item so as to keep their dictionaries
    up-to-date
  • Corpus evidence can complement or refute the
    intuitions of individual lexicographers, which
    are not always reliable because of potential
    biases in intuitions

6
Five emphases
  • Changes brought about by corpora to dictionaries
    and other reference books - five emphases
    (Hunston 2002)
  • an emphasis on frequency
  • an emphasis on collocation and phraseology
  • an emphasis on variation
  • an emphasis on lexis in grammar
  • an emphasis on authenticity

7
Top 1000 written / spoken words
Authentic examples
8
Corpus-based learner dictionaries
  • First fully corpus-based dictionary
  • Collins Cobuild English Dictionary (1987)
  • Some corpus-based learner dictionaries
  • Longman Dictionary of Contemporary English (3rd
    edition)
  • Oxford Advanced Learner's Dictionary (OALD, 5th
    edition)
  • Cambridge International Dictionary of English
    (1st edition)

9
Frequency dictionaries
10
Collocation
  • Collocation is among the linguistic concepts
    which have benefited most from advances in corpus
    linguistics
  • What is collocation?
  • strong tea, powerful car (Halliday 1976)
  • collocations of a given word are statements of
    the habitual or customary places of that word;
    the company that words keep (Firth 1968: 181-2)
  • One of the meanings of night is its
    collocability with dark (Firth 1957: 196)
  • a frequent co-occurrence of two lexical items in
    the language (Greenbaum 1974: 82)
  • expel a school child vs. cashier an army officer
  • I propose to bring forward as a technical term,
    meaning by collocation, and apply the test of
    collocability (Firth 1957: 194)

11
Meaning by collocation
  • There is frequently so high a degree of
    interdependence between lexemes which tend to
    occur in texts in collocation with one another
    that their potentiality for collocation is
    reasonably described as being part of their
    meaning (Lyons 1977: 613)
  • Complete description of the meaning of a word
    would have to include the other word or words
    that collocate with it
  • You shall know a word by the company it keeps!
    (Firth 1968: 179)
  • Collocation is part of the word meaning

12
Two types of collocation
  • Coherence collocation vs. neighbourhood
    (horizontal) collocation (Scott 1998)
  • Coherence collocation
  • Collocates associated with a word (e.g. letter -
    stamp, post office)
  • Neighbourhood collocation
  • Words which do actually co-occur with the word
    (letter - my, this, a, etc)

13
Coherence collocation
  • A cover term for the cohesion that results from
    the co-occurrence of lexical items that are in
    some way or other typically associated with one
    another, because they tend to occur in similar
    environments. (Halliday & Hasan 1976: 287)
  • candle, flame, flicker
  • hair, comb, curl, wave
  • sky, sunshine, cloud, rain
  • Difficult to measure using a statistical formula

14
Neighbourhood collocation
  • Collocation in corpus linguistics
  • Structure of collocation: collocation window
  • We may use the term node to refer to an item
    whose collocations we are studying, and we may
    then define a span as the number of lexical items
    on each side of a node that we consider relevant
    to that node. Items in the environment set by the
    span we will call collocates. (Sinclair
    1966: 415)
  • Casual vs. significant collocation
  • Significant collocation: collocation that occurs
    more frequently than would be expected (in a
    statistical sense) on the basis of the individual
    items
  • n.b. Neighbourhood (horizontal) collocations can
    include some coherence collocations
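Sinclair's node, span and collocate can be made concrete with a short sketch. It assumes a pre-tokenised text and a symmetric span; the function name `collocates` and the sample sentence are invented for the illustration.

```python
from collections import Counter

def collocates(tokens, node, span=4):
    """Count every token within `span` positions either side of each node hit."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        left = tokens[max(0, i - span):i]      # up to `span` items to the left
        right = tokens[i + 1:i + 1 + span]     # up to `span` items to the right
        counts.update(left + right)
    return counts

# Invented example text; `sweet` is the node, the span is 1 on each side.
tokens = "a sweet smell of sweet peas and a sweet tooth".split()
print(collocates(tokens, "sweet", span=1))
```

The result is a list of casual collocates only. Deciding which of them are significant requires comparing these observed counts with what chance would predict.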

15
Intuition vs. collocation
  • Greenbaum (1974): people disagree on
    collocations in introspection-based elicitation
    experiments
  • Although collocation can be observed informally
    on the basis of intuitions, it is more reliable
    to measure it statistically, and for this a
    corpus is essential (Hunston 2002: 68)
  • Intuition is often a poor guide to collocation
  • because each of us has only a partial knowledge
    of the language, we have prejudices and
    preferences, our memory is weak, our imagination
    is powerful (so we can conceive of possible
    contexts for the most implausible utterances),
    and we tend to notice unusual words or structures
    but often overlook ordinary ones (Krishnamurthy
    2000: 32-33)
  • Collocation can be measured on the basis of
    co-occurrence statistics (MI, z, t, LL, etc.);
    more discussion to follow
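All of the co-occurrence statistics named above compare an observed co-occurrence frequency with an expected one. A minimal sketch of the expected value follows, assuming the two words are distributed independently and that the window size simply scales the number of chances to co-occur. The function name, the parameter names and all counts are hypothetical, and individual tools may compute the expectation slightly differently.

```python
def expected_cooccurrence(f_node, f_collocate, corpus_size, window=1):
    """Expected co-occurrences if node and collocate were independent.

    f_node, f_collocate: corpus frequencies of the two words
    corpus_size: number of tokens in the corpus
    window: total number of positions inspected around each node
    """
    return f_node * f_collocate * window / corpus_size

# Hypothetical counts, chosen only to make the arithmetic easy to follow.
observed = 30
expected = expected_cooccurrence(f_node=1000, f_collocate=500,
                                 corpus_size=1_000_000, window=8)
print(observed, expected)  # 30 observed against 4.0 expected
```

A pair observed far more often than expected is a candidate significant collocation. The individual measures differ in how they turn that gap into a score.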

16
Collocation is syntagmatic
Langue (Language system) paradigmatic
  • famous boots. On the stroke of full time the
  • Stoke the lead on the stroke of half-time
    with a goal
  • Smith sin-binned on the stroke of half-time,
    added a
  • clinched their win on the stroke of lunch after
    resuming
  • chase by declaring on the stroke of lunch. <p>
    With a lead
  • expectant crowd, on the stroke of midday. The
    bird
  • hour began not upon the stroke of midnight but
    upon the
  • of midnight but upon the stroke of noon. There
    was,
  • booked in advance. On the stroke of seven, a
    gong summons
  • Promptly on the stroke of six o'clock,
    the chooks
  • from Edinburgh on the stroke of the
    Millennium.
  • Parole (Utterance)
    syntagmatic
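The concordance lines on this slide are in key-word-in-context (KWIC) format, with the node aligned in the middle so that syntagmatic patterns such as "on the stroke of" stand out. A rough sketch of how such lines can be produced follows; the function name `kwic`, the fixed column width and the sample sentence are assumptions made for the example.

```python
def kwic(tokens, node, width=4):
    """Return one aligned concordance line per occurrence of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() != node:
            continue
        left = " ".join(tokens[max(0, i - width):i])
        right = " ".join(tokens[i + 1:i + 1 + width])
        # Right-align the left context so every node starts in the same column.
        lines.append(f"{left:>30}  {tok}  {right}")
    return lines

# Invented sentence, not taken from a corpus.
text = ("Promptly on the stroke of six the gong sounded "
        "and on the stroke of noon it stopped")
for line in kwic(text.split(), "stroke"):
    print(line)
```

Sorting the lines by the first word to the right of the node, as in the slide, groups the time expressions together.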

17
Collocation vs. colligation
  • Collocation
  • Relationship between a lexical item and other
    lexical items
  • Relationship between words at the lexical level
  • E.g. very collocates with good
  • Colligation
  • Relationship between a lexical item and a
    grammatical category
  • Relationship between words at the grammatical
    level
  • E.g. very colligates with ADJ
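Colligation can be counted in the same way as collocation once the text carries part-of-speech tags: the tags next to the node are tallied instead of the words. The sketch below assumes a list of (word, tag) pairs; the tag labels, the function name `colligates` and the tiny tagged sample are invented for illustration and do not follow any particular tagset.

```python
from collections import Counter

def colligates(tagged, node, offset=1):
    """Count the POS tags found `offset` positions from each occurrence of `node`."""
    counts = Counter()
    for i, (word, _tag) in enumerate(tagged):
        j = i + offset
        if word == node and 0 <= j < len(tagged):
            counts[tagged[j][1]] += 1
    return counts

# Hand-tagged toy data; a real study would use a tagged corpus such as the BNC.
tagged = [("a", "DET"), ("very", "ADV"), ("good", "ADJ"), ("idea", "NOUN"),
          ("and", "CONJ"), ("very", "ADV"), ("sweet", "ADJ"), ("tea", "NOUN"),
          ("very", "ADV"), ("often", "ADV")]
print(colligates(tagged, "very"))
```

In this toy sample the tag immediately to the right of very is ADJ twice and ADV once, which is the pattern the slide summarises as "very colligates with ADJ".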

18
WST Collocate settings
Concord tab
19
WST collocates
Strength of relationship is displayed as 0.000 if
it hasn't yet been computed
20
Strength of collocation relationship
A wordlist is required
21
Highlight and double click
22
to see the selected collocate
23
Collocates in AntConc
24
Collocation in Xaira
25
Colligation in Xaira
26
Exploring collocation with BNCweb
  • http://bncweb.lancs.ac.uk/bncwebSignup/user/login.php

27
Search for sweet
28
Concordances of sweet
KWIC view
29
KWIC view
30
Dropdown menu: collocations
31
Collocation setting
32
Collocation database (default settings)
33
Adjusting settings
34
Noun collocates of sweet
Click on a word to see its collocation info
35
Collocation info of sweet smell
Click on a number to see concordances of
collocates at that position
36
Concordances of smell at R2
37
Collocation statistics
38
Rank by frequency
Frequent words crowd into the top of the
collocate list. Are they genuine collocates?
39
Rank by the t test
  • Also focusing on frequent words?
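A common textbook formulation of the t-score is (O - E) / sqrt(O), where O is the observed and E the expected co-occurrence frequency. The sketch below uses that formulation with invented counts; it is an assumption that any particular tool uses exactly this variant.

```python
import math

def t_score(observed, expected):
    """t-score in its common corpus-linguistic form: (O - E) / sqrt(O)."""
    return (observed - expected) / math.sqrt(observed)

# Hypothetical pairs: a frequent collocate and a rare one.
frequent = t_score(observed=30, expected=4.0)
rare = t_score(observed=3, expected=0.01)
print(frequent > rare)  # the frequent pair receives the higher t-score
```

Because the score cannot exceed sqrt(O), a pair has to co-occur often to rank highly, which is why a t-score ranking still favours frequent words.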

40
Rank by MI
  • Infrequent words at the top of the list
  • How useful are they (especially to English
    learners)?
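Mutual information is usually given as log2(O / E), with O the observed and E the expected co-occurrence frequency. The following sketch uses that form; the counts are invented and the function name is an assumption.

```python
import math

def mutual_information(observed, expected):
    """Pointwise MI in its usual form: log2(O / E)."""
    return math.log2(observed / expected)

# Hypothetical pairs: a frequent collocate and a rare one.
frequent = mutual_information(observed=30, expected=4.0)
rare = mutual_information(observed=3, expected=0.01)
print(rare > frequent)  # the rare pair receives the higher MI score
```

Only the ratio matters, so three co-occurrences of a very rare word can outrank thirty of a common one. Many analysts therefore combine MI with a minimum frequency threshold.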

41
Rank by the z score
  • Like MI, the z score also over-estimates
    infrequent items (e.g. nothings, afton, marjoram)
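A simplified form of the z score is (O - E) / sqrt(E), with O the observed and E the expected co-occurrence frequency. Some implementations add a variance correction to the denominator, so the sketch below is an approximation under that stated assumption, with invented counts.

```python
import math

def z_score(observed, expected):
    """Simplified z score: (O - E) / sqrt(E)."""
    return (observed - expected) / math.sqrt(expected)

# Hypothetical pairs: a frequent collocate and a rare one.
frequent = z_score(observed=30, expected=4.0)
rare = z_score(observed=3, expected=0.01)
print(frequent)         # 13.0
print(rare > frequent)  # the rare pair scores higher
```

Dividing by sqrt(E) inflates the score when the expected frequency is tiny, which is the over-estimation of infrequent items noted on the slide.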

42
Log-likelihood test
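The log-likelihood statistic is commonly computed from a 2 x 2 contingency table as 2 * sum(O * ln(O / E)) over the four cells. The sketch below builds the table from the co-occurrence count, the two word frequencies and the corpus size. It ignores the collocation window for simplicity, and the function name and all counts are invented for the illustration.

```python
import math

def log_likelihood(o11, f_node, f_coll, n):
    """Log-likelihood (G2) for a 2 x 2 table of node/collocate co-occurrence."""
    observed = [
        o11,                        # node with collocate
        f_node - o11,               # node without collocate
        f_coll - o11,               # collocate without node
        n - f_node - f_coll + o11,  # neither
    ]
    row_totals = [f_node, f_node, n - f_node, n - f_node]
    col_totals = [f_coll, n - f_coll, f_coll, n - f_coll]
    g2 = 0.0
    for o, r, c in zip(observed, row_totals, col_totals):
        e = r * c / n               # expected cell count under independence
        if o > 0:                   # a cell with zero observations adds nothing
            g2 += o * math.log(o / e)
    return 2 * g2

# Hypothetical counts in a 10,000-token corpus.
print(log_likelihood(o11=30, f_node=100, f_coll=100, n=10_000))
```

Values above 3.84 correspond to p < 0.05 at one degree of freedom. Unlike MI, the statistic takes all four cells into account and so is less easily inflated by very rare words.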
43
Rank by MI3
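MI3 is usually defined as log2(O^3 / E): cubing the observed frequency gives more weight to pairs that actually co-occur often. The sketch below uses that definition with invented counts; the function name is an assumption.

```python
import math

def mi3(observed, expected):
    """MI3: log2(O**3 / E), which weights the observed frequency more heavily than MI."""
    return math.log2(observed ** 3 / expected)

# The same hypothetical pairs that favour the rare word under plain MI.
frequent = mi3(observed=30, expected=4.0)
rare = mi3(observed=3, expected=0.01)
print(frequent > rare)  # under MI3 the frequent pair ranks higher
```

With these counts plain MI would rank the rare pair first, whereas MI3 puts the frequent pair on top.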
44
Rank by Dice coefficient
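The Dice coefficient in its basic form is 2 * O / (f_node + f_collocate): the share of all occurrences of the two words that are joint occurrences. Tools sometimes report a log-scaled variant, so the sketch below covers the plain coefficient only, with invented counts and an assumed function name.

```python
def dice(cooccurrences, f_node, f_collocate):
    """Plain Dice coefficient: 2 * O / (f_node + f_collocate), between 0 and 1."""
    return 2 * cooccurrences / (f_node + f_collocate)

# Hypothetical counts: 30 co-occurrences of two words that each occur 100 times.
print(dice(30, 100, 100))  # 0.3
print(dice(50, 50, 50))    # 1.0, the two words never occur apart
```

The coefficient does not depend on corpus size, and it rewards pairs in which both words occur mostly in each other's company.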