Wei Naixing - PowerPoint PPT Presentation

1
Corpus-based and Corpus-driven Studies of
Collocation: Ideas and Methods
  • Wei Naixing
  • Shanghai Jiaotong University

2
An Outline for the Presentation
  • 1 Major Principles of Firthian Linguistics
  • 2 Towards defining collocations
  • 3 Corpus-based and corpus-driven studies
  • 4 Extensions
3
J. R. Firth (1890-1960)
1 Major Principles of Firthian Linguistics
  • Major works include
  • Papers in Linguistics 1934-1951. London: Oxford
    University Press.
  • A Synopsis of Linguistic Theory 1930-1955.
    Oxford: Philological Society.

4
Neo-Firthians
  • M. Halliday
  • J. Sinclair
  • A. McIntosh
  • P. Strevens
  • etc.

5
Post-Firthian Corpus Linguists
  • John Sinclair
  • Michael Stubbs
  • Antoinette Renouf
  • Wolfgang Teubert
  • Elena Tognini-Bonelli
  • etc.

6
Principles
  • Language is a mode of action, a way of doing
    things.
  • Linguistics is concerned with the study of
    meaning; meaning is always contextual.

7
Principles
  • Language should be studied in actual, attested,
    authentic instances of use, not as intuitive,
    invented sentences.
  • Linguistic analysis is empirical.

8
Principles
  • There is no boundary between lexis and syntax;
    lexis and syntax are interdependent. Form and
    meaning are inseparable.
  • Much language use is routine.
  • Language in use transmits the culture.
  • Language is monist and probabilistic.

9
1.1 Language is a mode of action, a way of doing
things.
  • Firth: anti-mentalism
  • "As we know so little about mind and as our
    study is essentially social, I shall cease to
    respect the duality of mind and body, thought and
    word, and be satisfied with the whole man,
    thinking and acting as a whole, in association
    with his fellows." (1957: 19)
  • Firthian linguistics is a part of sociology.
    The object of study is E-language.

10
  • A comparison: Chomskyan mentalism
  • Linguistics is to account for the ideal language
    user's competence in his language, that is, the
    innate knowledge of linguistic rules. The central
    object of linguistics is I-language.
  • Inevitably, linguistics is a part of cognitive
    psychology; it is speculative, hypothetical, and
    explanatory.

11
1.2 Linguistics is concerned with the study of
meaning; meaning is always contextual
  • Firth: contextualism
  • "The complete meaning of a word is always
    contextual, and no study of meaning apart from a
    complete context can be taken seriously." (1957: 7)

12
What does Firth say about collocation?
  • "You shall know a word by the company it
    keeps."
  • "The collocation of a given word, rather than a
    mere juxtaposition, is an order of mutual
    expectancy. The words in a collocation have
    customary or habitual places and are mutually
    expected and prehended."

13
What does Firth say about collocation?
  • Collocation is a mode of meaning
  • "Meaning by collocation is an abstraction at
    the syntagmatic level and is not directly
    concerned with the conceptual or idea approach to
    the meaning of words. One of the meanings of
    'night' is its collocability with 'dark', and of
    'dark', of course, collocation with 'night'."
    (1957: 196)

14
What does Firth say about collocation?
  • Colligation and collocation
  • Colligation is the abstract inter-relations
    of grammatical categories.
  • Collocations are actual words in habitual
    company. A word in a usual collocation stares you
    in the face just as it is. Colligation cannot be
    words as such. Colligations of grammatical
    categories related in a given structure do not
    necessarily follow word divisions or even
    sub-divisions of words. A colligation is not to
    be interpreted as abstraction in parallel with
    collocation of exemplifying words in text.

15
1.3 Language should be studied in actual,
attested, authentic instances of use, not as
intuitive, invented, isolated sentences
  • The farmer kills the duckling.
  • I have not seen your father's pen, but I have
    read the book of your uncle's gardener (ibid.:
    60-1).
  • Walter played the piano more often in Chicago
    than his brother conducted concerts in the rest
    of the states. (Quirk 1985: 1132)
  • I've never seen a dog more obviously friendly
    than your cat. (Quirk 1985: 1132)

16
  • Much linguistics is based on invented sentences,
    and often only a small number of invented
    sentences are discussed.

Invented, isolated sentences: grammatically
well-formed, but difficult or impossible to
imagine in use.
17
  • An important point
  • Invented examples are really part of the
    explanations. They have no independent authority
    or reason for their existence, and they are
    constructed to refine the explanations and in
    many cases to clarify the explanation. Usage
    cannot be invented, it can only be recorded.

18
A comparison of Chomskyan assumptions with the
neo-Firthian principles
  • "The critical problem for grammatical theory today
    is not a paucity of evidence but rather the
    inadequacy of present theories of language to
    account for masses of evidence that are hardly
    open to serious question." (Chomsky 1965: 19-20)
  • "Starved of adequate data, linguistics
    languished - indeed it became almost totally
    introverted." (Sinclair 1991a: 1)

19
1.4 There is no boundary between lexis and
syntax; lexis and syntax are interdependent
  • Grammar and lexis are two perspectives from
    which we look at language. They are two sides of
    the same coin.
  • All linguistic items enter into patterns of
    both kinds. They are grammatical items when
    described grammatically, as entering (via
    classes) into closed systems and ordered
    structures, and lexical items when described
    lexically, as entering into open sets and linear
    collocations (Halliday, 1976: 77)

20
  • Collocation is where grammar and lexis meet
  • 1) Any syntactic structure restricts the lexis
    that occurs in it, and conversely any lexical
    item can be specified in terms of the structures
    in which it occurs.
  • 2) Such restrictions are typically not absolute,
    but clear tendencies: grammar is inherently
    probabilistic.

21
  • 3) Native speakers have no reliable intuitions
    about such statistical tendencies. Grammars based
    on intuitive data will imply more freedom of
    combination than is in fact possible. Grammar is
    corpus-driven in the sense that the corpus tells
    us what the facts are.

22
  • 4) Every sense or meaning of a word has its own
    grammar: each meaning is associated with a
    distinct formal patterning.
  • 5) Words are systematically co-selected: the
    normal use of language is to select more than one
    word at a time.

23
  • 6) Since paradigmatic choices are not made
    independently of position in the syntagmatic chain,
    the relation between paradigmatic and syntagmatic
    has to be rethought.
  • 7) In all cases so far examined, each meaning
    can be associated with a distinct formal
    patterning. There is ultimately no distinction
    between form and meaning. Meaning affects the
    structure, and this is the principal observation
    of corpus linguistics in the last decade.
    (Sinclair, 1991: 6-7)

24
  • A comparison: Chomskyan positions
  • "Grammar is grammar and usage is usage."
  • "The understanding of knowledge of grammar
    involves going beyond an examination of language
    in use." (p. 691)
  • "Probabilistic information drawn from corpora
    is of utmost value for many aspects of linguistic
    inquiry. But it is all but useless for providing
    insights into the grammar of any individual
    speaker." (p. 698)
  • "In summary, we have grammar and we have
    usage. Grammar supports usage, but there is a
    world of difference between what a grammar is and
    what we do and need to do when we speak." (p. 695)
  • (Frederick J. Newmeyer 2003)

25
1.5 Much language use is routine
  • Language use is conventional and prefabricated.
  • "Man is born free and is everywhere in chains.
    The bonds of family, neighborhood, class,
    occupation, country and religion are knit by
    speech and language." (Firth 1957: 185)
  • "It is true that in everyday life we generally say
    what the other fellow expects us, one way or the
    other, to say, but this expectancy is the measure
    even of our delightful surprises, and good
    personal style is highly valued."
  • (Firth 1957: 186)

26
  • A multitude of terms
  • linguistic pre-fabrications
  • stereotyping
  • memorized chunks
  • formulaic expressions
  • pre-assembled parts
  • etc

27
1.6 Language in use transmits the culture.
  • Cultural behavioral patterns of language users

[Diagram labels: behavioral pattern of key words;
recurrent collocations; cultural behavioral pattern;
linguistic units; cultural units]
28
1.7 Language is monist and probabilistic
  • A monist view
  • Firth: Saussurean dualisms are misconceived.
  • Such a language in the Saussurean sense is a
    system of signs placed in categories. It is a
    system of different values, not of concrete and
    positive terms. Actual people do not talk such a
    language. However systematically you may talk,
    you do not talk systematics. According to strict
    Saussurean doctrine, therefore, there are no
    sentences in a language considered as a system.
    Strictly speaking, in a language there are no
    real words either, but only examples of
    phonological and morphological categories.
  • (Firth 1957: 180)

29
Chomskyan dualisms are not necessary.
  • "Chomsky's theory of competence and performance
    had driven a massive wedge between the system and
    instance, making it impossible by definition that
    analysis of actual texts could play any part in
    explaining the grammar of a language, let alone
    in formulating a general linguistic theory."
  • (Halliday 1991: 30)

30
A probabilistic view
  • It had always seemed to me that the linguistic
    system was inherently probabilistic, and that
    frequency in text was the instantiation of
    probability in the grammar.
  • (Halliday 1991: 31)
  • Metaphor of weather and climate

31
  • The weather and the climate are the same
    phenomenon but regarded from different time
    depths. If we are thinking of the next few hours,
    then we are thinking of the weather and this
    perspective determines what kinds of actions we
    might take, for instance going to the beach or
    taking an umbrella. If we are thinking of the
    next decade or the next century, then we are
    thinking of the climate and this perspective
    also determines what kinds of actions we might
    take, for example, legislating against industrial
    processes which are destroying the ozone layer.
    If the climate changes, then obviously the
    weather changes. But conversely, each day's
    weather affects the climate, however
    infinitesimally, either maintaining the status
    quo or helping to tip the balance towards
    climatic change. Instance and system, probability
    and categoriality, micro and macro, are two sides
    of the same coin, relative to the observer's
    position.

32
2 Towards defining collocations
2.1 Semantic considerations
  • The mechanism is selectional restriction, that
    is, a lexical item's semantic properties
    presuppose certain restrictions on the choice of
    items that occur in its environment.
  • Semantically motivated collocations: drink
    water, pregnant women, murder a suspect
33
  • Semantically unmotivated collocations
  • spotless reputation / flawless reputation
  • flawless performance / unblemished performance
  • bear a grudge / bear a hatred / bear a scorn
  • pay attention, respect, a visit / pay greeting /
    pay welcome
  • rulelessness, arbitrariness, idiosyncrasy

34
2.2 Grammatical considerations
  • "A collocation is a sequence of words that occurs
    more than once in identical form (in the Brown
    Corpus) and which is grammatically
    well-structured." (Kjellmer, 1987: 133)
  • "By collocation is meant the co-occurrence of two
    or more lexical items as realizations of
    structural elements within a given syntactic
    pattern." (Cowie, 1978: 132)
  • e.g. Table 6.2

35
Many discontinuous collocations cut across
sentence boundaries.
  • laugh...joke
  • ill...doctor
  • try...succeed
  • king...crown
  • cradle...flame...flicker
  • hair...comb...curl...wave
  • sky...sunshine...cloud...rain, etc.
  • (Halliday and Hasan: 287)

36
2.3 The Lexical co-occurrence Approach
  • The lexical approach is based on the
    assumption that words receive their meaning from
    the words they co-occur with. These linguists,
    Firthians in particular, perceived collocations
    as a lexical phenomenon independent of grammar.

37
  • You shall know a word by the company it
    keeps
  • (Firth, 1957: 12)

... lexis seems to require the recognition
merely of linear co-occurrence together with some
measure of significant proximity, either a scale
or at least a cut-off point. It is this
syntagmatic relation which is referred to as
collocation. (Halliday, 1976: 75)
38
  • Collocation is the occurrence of two or more
    words within a short space of each other in a
    text. The usual measure of proximity is a maximum
    of four words intervening. (Sinclair, 1991: 170)

39
  • Collocates of back
  • I crawled back to camp.
  • I'll drive you back to your flat.
  • We had to go back to the hotel.
  • You have just got back from the office.
  • Set back from the road.
  • All the way back to the village.
  • He leaned back in his chair.
  • Tom went back to the window.
  • Britain would be back on his feet.
  • They got back into the car.
  • You must come back to the kitchen.
  • She went back into the living room.
    (Sinclair, 1991: 120)


40
  • Lexical Combinations on the Syntagmatic Axis

Figure 6-1 A continuum for syntagmatic
combinations
41
  • Defining features
  • 1. Collocations are syntagmatic
    associations of words in contexts.
  • 2. Collocations may exist in a grammatical
    construct, or may cut across structure
    boundaries.
  • 3.  Collocations are recurrent or
    significant expressions in terms of
    statistics.
  • 4.  Collocations are largely
    register-dependent.
  • 5.  Collocations are arbitrary and
    conventional.
  • 6.  Collocations vary in length.

42
3 Corpus approaches to the study of
collocations
3.1 Generalizing collocational patterns on the
basis of colligations and lexical co-occurrences
  • Colligation: a grammatical construct in which a
    key word co-occurs with other words (Table
    6.3)
  • Concordance: e.g. of 'data'

43
3.2 Corpus-driven lexical computing
  • Node
  • The node word in a collocational study is
    the one whose lexical behaviour is under
    examination.
  • Span
  • The span is the measurement, in words, of
    the co-text of a word selected for study. Usually
    a span of -4/+4 or -5/+5 is adopted in
    collocational studies, which means that four or
    five words on either side of the node word are
    taken to be its relevant environment.
  • Collocates
  • Collocates refer to those items which are in
    the environment defined by the span.
  • The idea is to investigate the collocational
    pattern of the node word by examining the 2S × N
    occurrences of its collocates, where 2S stands
    for the defined span (S words on either side)
    and N stands for the occurrences of the node
    word in a corpus.
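
The node, span and collocate definitions above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the text is already tokenised by whitespace, the function name `collocates` is hypothetical, and windows are truncated at the edges of the text.

```python
from collections import Counter

def collocates(tokens, node, span=4):
    """Count word forms found within `span` words on either side of each node occurrence."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        left = tokens[max(0, i - span):i]      # up to `span` words before the node
        right = tokens[i + 1:i + 1 + span]     # up to `span` words after the node
        counts.update(left + right)
    return counts

# Toy text built from two of the concordance lines for 'back'
tokens = "he went back to the window and she went back into the living room".split()
result = collocates(tokens, "back", span=2)
print(result.most_common(2))
```

With S = 2 and N = 2 occurrences of the node, the windows here contain 2S × N = 8 collocate tokens in total, which matches the count described above.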

44
Statistical measures
  • Z-score
  • The Z-score expresses the difference between the
    observed frequency of a collocate and its
    expected frequency in standard deviation units,
    and thus tells where one score lies in relation
    to other scores. It is used to compare a score's
    relative position in two or more score sets. In
    statistics, the Z-score is usually applied to the
    test of a large sample, while the T-score is
    applied to the test of a small sample.

45
  • W: the total number of words in a corpus
  • N: the total occurrences of the node
  • C1: the occurrences of a collocate in the corpus
  • S: the defined span
  • C2: the frequency of the collocate co-occurring
    with the node
  • The probability of a collocate co-occurring with
    each successive node:
  • C1(2S+1)/W
  • The probability of a collocate co-occurring with
    a node occurring N times

46
The expected frequency of the co-occurrence of
the node and the collocate
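
As a rough sketch only, the quantities W, N, C1, C2 and S can be combined into a Z-score under a binomial approximation. The formulas in this block are assumptions, not the presentation's own: the per-node probability is taken as p = C1(2S+1)/W, the expected frequency as E = pN, and the standard deviation as sqrt(E(1 - p)). The function name and the example counts are hypothetical.

```python
import math

def z_score(W, N, C1, C2, S):
    """Z-score of a collocate under an assumed binomial model (see lead-in)."""
    p = C1 * (2 * S + 1) / W           # assumed chance of the collocate falling in one node window
    expected = p * N                   # assumed expected co-occurrence frequency over N nodes
    sd = math.sqrt(expected * (1 - p)) # binomial standard deviation
    return (C2 - expected) / sd

# Hypothetical counts: a 1-million-word corpus, node seen 100 times,
# collocate seen 500 times overall and 10 times inside the span.
z = z_score(W=1_000_000, N=100, C1=500, C2=10, S=4)
print(z > 0)  # observed co-occurrence exceeds the expected value
```

A positive score means the pair co-occurs more often than this chance model predicts; other published variants define p and E slightly differently, so scores are comparable only within one formulation.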
47
  • Casual collocation and threshold frequency: a
    minimum frequency is set for statistical
    measurement; if a word form co-occurs with the
    node less often than this minimum, it is not
    submitted to the statistical measure.
  • Significant collocation: a significant
    collocation is one in which the two items
    co-occur more often than could be predicted on
    the basis of their respective frequencies in the
    length of text under consideration.
  • Table 6.5: collocates of performed (Z-score)
  • Table 6.6: collocates of knowledge (T-score)
48
  • Mutual Information.
  • Mutual information measures the amount of
    information that the occurrence of one word
    yields about the probability of the occurrence of
    another word.
  • MI principle: In a corpus of 10 million words,
    the word 'kin' occurs 10 times. This means
    that the probability of occurrence of 'kin' is
    0.000001. But suppose that, in the same corpus,
    the word 'kith' occurs 5 times and, in all five
    instances, 'kin' follows 'kith'. Having seen
    'kith', we could estimate the probability of
    seeing 'kin' next to be 5/5 = 1; having seen
    'kin', the probability that 'kith' precedes it
    is 5/10 = 0.5. So the occurrence of the word
    'kith' gives us a great deal of information
    about the likelihood of seeing the word 'kin'
    nearby.

49
MI compares the observed frequency of the two
words co-occurring with the product of their
relative frequencies in the corpus, that is, with
the co-occurrence expected by chance. The
difference between these values reveals the
degree of significance of the co-occurrence.
Here a and b are two words, P(a, b) is the
probability of their co-occurring, P(a) is the
probability of a occurring, and P(b) is the
probability of b occurring. If a and b are
independent, P(a, b) equals P(a) P(b). The mutual
information I(a, b) is then defined as
I(a, b) = log2( P(a, b) / (P(a) P(b)) )
50
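If the corpus size is W, F(a) is the frequency of
a, F(b) is the frequency of b, and F(a, b) is the
frequency of their co-occurrence, then the
probabilities can be estimated from these
frequencies, which gives
I(a, b) = log2( W F(a, b) / (F(a) F(b)) )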
51
  • If word a and word b tend to occur in
    conjunction, their mutual information will be
    high, which means that their collocational
    strength is strong. If they are not related and
    occur together only by chance, their mutual
    information will be zero, which means their
    collocational strength is weak or no mutual
    attraction exists between them. Finally, if the
    two events tend to avoid each other, their
    mutual information will be negative (the two
    words repel each other).
  • Table 6.7: collocates of knowledge (MI score)
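
The kith/kin illustration can be worked through numerically. The sketch below assumes the standard pointwise definition I(a, b) = log2( P(a, b) / (P(a) P(b)) ), with each probability estimated as a frequency divided by the corpus size W; the function name and argument names are hypothetical.

```python
import math

def mutual_information(W, f_a, f_b, f_ab):
    """Pointwise MI from raw frequencies, with probabilities estimated as frequency / W."""
    p_a = f_a / W
    p_b = f_b / W
    p_ab = f_ab / W
    return math.log2(p_ab / (p_a * p_b))

# Counts from the kith/kin example: W = 10 million, kith = 5, kin = 10, together = 5
mi_kith_kin = mutual_information(W=10_000_000, f_a=5, f_b=10, f_ab=5)
print(round(mi_kith_kin, 2))  # 19.93

# Two words that co-occur exactly as often as chance predicts score about zero
mi_chance = mutual_information(W=1000, f_a=100, f_b=100, f_ab=10)
# Two words that co-occur half as often as chance predicts score about -1
mi_repel = mutual_information(W=1000, f_a=100, f_b=100, f_ab=5)
```

The three results correspond to the three cases described above: strong attraction (high MI), independence (MI near zero), and mutual avoidance (negative MI).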

52
3.3 Extended collocations: word clusters and
chunks
  • A word cluster is a continuous collocational
    sequence.
  • The principle of idiom is that a language user
    has available to him or her a large number of
    semi-preconstructed phrases that constitute
    single choices, even though they might appear to
    be analysable into segments.
  • It thus appears that a model of language
    which divides grammar and lexis, and which uses
    the grammar to provide a string of lexical choice
    points, is a secondary model. It cannot be
    relinquished, because a text still has many
    switch points where the open-choice model will
    come into play. It has an abstract relevance, in
    the sense that much of the text shows a potential
    for being analysed as the result of open choices,
    but the other principle, the idiom principle,
    dominates. (Sinclair 1987: 320; 1991: 110)

53
  • Extracting word clusters of various lengths
    from the corpus
  • Table 6.8 The 20 most frequent 4-word
    clusters in the LOB
  • The ratio between observed frequency and
    expected frequency

If the observed frequencies of word form a and
word form b are F(a) and F(b) respectively, then
the probability of the two forming a sequence is
F(a)/W × F(b)/W. Multiplying this theoretical
probability by the corpus size W, we obtain the
expected frequency of the two word forms forming
a sequence: F(a) × F(b) / W.
54
  • If a sequence consists of n word forms, that is,
    a1, a2, ..., an, then its expected frequency is
    W × F(a1)/W × F(a2)/W × ... × F(an)/W.
  • According to the formula we can work out the
    expected frequencies of the above clusters and
    the observed frequency / expected frequency ratio.
  • Table 6.9: statistics of the most frequent
    clusters of 'point' in LOB
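
The observed/expected ratio for n-word clusters can be sketched as follows. The expected frequency is computed as W multiplied by the product of the relative frequencies F(ai)/W of the word forms in the cluster; the function name, the whitespace tokenisation and the toy text are assumptions made for illustration, not the LOB data.

```python
from collections import Counter
from math import prod

def cluster_ratios(tokens, n):
    """Return {cluster: (observed, expected, observed / expected)} for all n-word clusters."""
    W = len(tokens)
    word_freq = Counter(tokens)
    # Observed frequency of every contiguous n-word sequence
    clusters = Counter(tuple(tokens[i:i + n]) for i in range(W - n + 1))
    rows = {}
    for cluster, observed in clusters.items():
        # Expected frequency if the word forms combined purely by chance
        expected = W * prod(word_freq[w] / W for w in cluster)
        rows[cluster] = (observed, expected, observed / expected)
    return rows

tokens = "at the end of the day at the end of the week".split()
rows = cluster_ratios(tokens, 4)
observed, expected, ratio = rows[("at", "the", "end", "of")]
print(observed, round(expected, 4), round(ratio, 1))
```

A ratio far above 1 indicates a sequence that recurs much more often than chance combination of its parts would predict, which is the sense in which a cluster counts as an extended collocation here.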

55
  • Instances of Extended collocations

56
4 Extensions
  • 4.1 Collocations and culture
  • Frequent collocations reflect cultural units
    and meaning

The case of afternoon tea.
57
4.2 Collocations and social phenomena
  • Newly emerging collocations reflect growth of new
    phenomena and concepts

Instances of working mothers.
Instances of single parent
58
4.3 Collocations and ideology
  • Cultural key words: social values and attitudes

Labour: casual, cheap, deskilling, manual,
semi-skilled, unemployed, unproductive, unskilled
Labourer: agricultural, building, casual, clerk,
farm, manual, poor, shop, unemployed, unskilled
Career
59
Collocations and ideology
  • Critical discourse analysis: how language is used
    to intervene in society

Frequent collocations in Chris Patten's
speech: individual rights, individual freedom,
opportunities of the individual, privacy of the
individual, respect for the individual, rule of
law
Purpose: to create an illusion of a good colony
60
4.4 Collocations and language change
  • Strong collocations tend to become fixed phrases
    and convey packages of information

Instances of falling standards
61
  • Thank you for your attention!
  • Comments and questions are welcome!

62
  • He is a heavy drinker.
  • He is drinking pretty heavily.
  • He drinks heavily.
  • He is putting in some heavy drinking.


64
Table 6.3 Colligations and collocations
67
  • N + data: field, input, inspection, laboratory,
    performance, survey
  • data + N: management, banks, base, introduction
  • ADJ + data: analytical, attitudinal, available,
    centralized, digital, distributed, empirical,
    experimental, intuitional, invented,
    longitudinal, measured, natural, observational,
    preliminary, quantitative, raw

70
22 80 70 65 80 60 190 266 104 79 417 149 1445 707
338
71
Table 6.8 The 20 most frequent 4-word clusters in
the LOB