Current trends in corpus linguistics - PowerPoint PPT Presentation



1
Current trends in corpus linguistics
  • Wolfgang Teubert
  • University of Birmingham
  • Teubertw@bham.ac.uk

2
Cognitive linguistics vs. corpus linguistics
  • Cognitive linguistics is interested in how
    thought is turned into language and how language
    is turned into thought.
  • Cognitive linguistics looks at language from a
    mental/psychological perspective
  • Corpus linguistics looks at language from a
    social perspective
  • Corpus linguistics deals with exchanging and
    sharing content
  • Corpus linguistics is interested in how
    communication works between the members of a
    discourse community

3
Cognitive linguistics, or how does the mind work?
  • Meanings are in the head.
  • Cognitivism looks into behaviourism's black box.
  • The mind uses the universal language of thought.
  • There is an innate language faculty/organ
  • Natural language expressions are converted into
    mental representations.
  • Mental representations are processed
    syntactically (algorithmically).
  • The mind is a syntactic engine.

4
Corpus linguistics
  • Corpus linguistics has a social perspective.
  • The language community exchanges and shares
    content in the discourse.
  • We are interested in meaning, not in people's
    heads.
  • The language of bees (dancing) has meaning only
    to us (the observers), not to the bees.

5
The agenda of corpus linguistics: the reader Corpus
Linguistics: Critical Concepts in Linguistics
(Routledge 2007)
  • Theoretical aspects (2)
  • History of CL
  • Corpus composition and compilation (1)
  • Standardisation, tagging, alignment,
    software (7)
  • Lexicography, collocations, idioms (5)
  • Terminology
  • Grammar (3)
  • Translation, parallel corpora, multilinguality
    (5)
  • CDA, stylistics, rhetoric, evaluation (1)
  • Language history, historical linguistics
  • Language teaching (2)
  • Spoken language, discourse studies (2)

6
The agenda of corpus linguistics: recent issues of
the International Journal of Corpus Linguistics (IJCL)
  • (theory) My version of corpus linguistics
  • (composition) The representativeness of the Czech
    National Corpus
  • (software) the automatic recognition of verb
    patterns
  • (lexicography/collocation) Semantic prosody and
    semantic preference
  • (grammar) It-extraposition in English
  • (translation) A corpus-based view of similarity
    and difference in translation
  • (CDA) Examining the ideology of sleaze
  • (language teaching) ESL teachers' questions and
    corpus evidence
  • (spoken language) Syllable contractions in a
    Mandarin conversational dialogue corpus

7
Corpus composition and compilation
  • Representativeness
  • Opportunistic corpora
  • The Internet as (virtual) corpus
  • Learner corpora
  • Diachronic corpora (monitor corpora)
  • Parallel corpora
  • Special language corpora
  • Spoken language corpora and transcription
  • Corpus compiler vs. corpus user

8
Methodology and software development
  • The issue of annotation/tagging: pro and con
  • The issue of the tagset
  • The need for better automatic alignment
  • Corpus-driven sense disambiguation software
  • Software for diachronic corpora
  • Intertextuality software
  • Paraphrase recognition software
  • Named entity detection software
  • Neology detection software
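Neology detection can be sketched, in its crudest form, as flagging word forms in newer texts that never occur in a reference corpus. The Python below is a minimal illustration under stated assumptions: the tokeniser, the function names and the min_freq threshold are hypothetical, and real neology detection would also have to cope with inflected forms, spelling variants, proper names and typos.

```python
import re
from collections import Counter

def tokenise(text):
    # Lowercase, then keep runs of letters, digits, hyphens and apostrophes as tokens
    return re.findall(r"[a-z0-9'-]+", text.lower())

def neologism_candidates(reference_texts, new_text, min_freq=1):
    """Return word forms in new_text that never occur in reference_texts, with counts."""
    reference_vocab = set()
    for text in reference_texts:
        reference_vocab.update(tokenise(text))
    unseen = Counter(t for t in tokenise(new_text) if t not in reference_vocab)
    # Drop candidates below the frequency threshold (one-off typos are common)
    return {word: count for word, count in unseen.items() if count >= min_freq}

print(neologism_candidates(["the mail was sent by post"], "the email was sent by email"))
# → {'email': 2}
```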

9
Lexicography, phraseology, collocation study
  • The Cobuild project
  • Corpus size
  • The synchronic perspective: generalisation
  • The collocation issue (more or less fixed
    expressions)
  • The paraphrase issue
  • Evaluation, connotation, deontic meaning
  • The diachronic perspective: meaning as unique
    events
  • Using corpora to investigate antonym acquisition
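One common way of quantifying collocation is an association measure such as pointwise mutual information (PMI). The slide does not name a measure, so PMI here is an assumption chosen for illustration; the sketch scores adjacent word pairs only, and the function name and min_count cut-off are hypothetical.

```python
import math
from collections import Counter

def pmi_bigrams(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information, highest first."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_tokens = len(tokens)
    n_bigrams = n_tokens - 1
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs get inflated, unreliable PMI scores
        p_pair = count / n_bigrams
        p_w1 = unigrams[w1] / n_tokens
        p_w2 = unigrams[w2] / n_tokens
        # PMI = log2( P(w1 w2) / (P(w1) * P(w2)) )
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

tokens = "strong tea is strong tea but a powerful car is a powerful car".split()
for pair, score in pmi_bigrams(tokens):
    print(pair, round(score, 2))
```

Adjacent pairs are the simplest choice; fuller collocation studies usually count co-occurrence within a window of several words and also report raw frequency, since PMI alone favours rare pairs.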

10
Terminology and special languages
  • Hard terminology: standardisation in engineering,
    production and maintenance
  • Soft terminology: emergence of new ideas in the
    sciences
  • Terminology mining (paraphrases and definitions)
  • The detection of new ideas
  • Fixedness in genre-specific language and
    intercultural differences: comparing English and
    Chinese Fire News corpora

11
Grammar
  • Rule-based or list-based?
  • The relationship between grammar and lexis
  • Lexicogrammar and local grammar
  • Pattern grammar
  • Valency grammar
  • General grammar vs. the dictionary
  • let-imperatives in English
  • Genitive and of-construction in modern written
    English

12
Parallel corpora etc. (1)
  • Parallel corpora: re-using previous translations
  • The advantage of working with parallel corpora
  • The compilation of parallel corpora
  • The definition of the translation unit (smallest
    recurrent text segment for which there is only
    one target language equivalent)
  • What is source language, what is target language?
  • The Hong Kong Legal Document Corpus
  • A TranslationBase built from a parallel corpus
  • Parallel corpora and cultural contrast
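The definition of the translation unit given on this slide can be turned into a simple test, sketched here under assumptions: the input is a hypothetical list of already-aligned (source segment, target segment) pairs, the French equivalents are invented toy data, and "recurrent" is taken to mean at least two occurrences. The sketch checks recurrence and uniqueness of the equivalent only; choosing the "smallest" qualifying segment among overlapping candidates is left out.

```python
from collections import defaultdict

def translation_units(aligned_pairs, min_occurrences=2):
    """Return recurrent source segments that have exactly one target equivalent."""
    equivalents = defaultdict(set)   # source segment -> distinct target segments
    occurrences = defaultdict(int)   # source segment -> number of aligned occurrences
    for source, target in aligned_pairs:
        equivalents[source].add(target)
        occurrences[source] += 1
    return {
        source: next(iter(targets))  # the single equivalent
        for source, targets in equivalents.items()
        if occurrences[source] >= min_occurrences and len(targets) == 1
    }

# Invented toy alignment, not data from the slides
pairs = [
    ("bank", "banque"), ("bank", "rive"),
    ("river bank", "rive"), ("river bank", "rive"),
    ("listed company", "société cotée"), ("listed company", "société cotée"),
    ("public bus", "autobus public"),
]
print(translation_units(pairs))
```

Here "bank" is rejected because it has two equivalents, and "public bus" because it occurs only once.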

13
Parallel corpora: extracted adjective-noun
phrases as translation equivalents
  • 105 straight line
  • 104 legal officer
  • 101 residential care
  • 101 criminal offences
  • 100 annual allowance
  • 99 long term
  • 98 human remains
  • 98 conclusive evidence
  • 97 written permission
  • 97 public bus
  • 97 personal representatives
  • 97 first column
  • 96 notifiable workplace
  • 96 listed company
  • 95 light bus
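The slide does not say how these phrases were extracted or what the numbers next to them measure. As a hedged illustration only, the monolingual half of such an extraction can be sketched as collecting adjacent adjective + noun pairs from part-of-speech-tagged text. The Penn Treebank tags (JJ, NN, NNS), the function name and the hand-tagged toy sentences are assumptions, and the alignment step that would pair each phrase with its Chinese equivalent is not shown.

```python
from collections import Counter

ADJ_TAGS = {"JJ"}           # adjective (Penn Treebank tagset assumed)
NOUN_TAGS = {"NN", "NNS"}   # singular and plural common nouns

def adjective_noun_phrases(tagged_sentences):
    """Count adjacent adjective + noun pairs in lists of (word, tag) tuples."""
    counts = Counter()
    for sentence in tagged_sentences:
        for (w1, t1), (w2, t2) in zip(sentence, sentence[1:]):
            if t1 in ADJ_TAGS and t2 in NOUN_TAGS:
                counts[f"{w1.lower()} {w2.lower()}"] += 1
    return counts

# Hand-tagged toy sentences (hypothetical input)
sentences = [
    [("There", "EX"), ("is", "VBZ"), ("conclusive", "JJ"), ("evidence", "NN")],
    [("Criminal", "JJ"), ("offences", "NNS"), ("require", "VBP"),
     ("conclusive", "JJ"), ("evidence", "NN")],
]
print(adjective_noun_phrases(sentences).most_common())
# → [('conclusive evidence', 2), ('criminal offences', 1)]
```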

14
Parallel corpora etc. (2): Linguistic anthropology
  • A Chinese-English/English-Chinese corpus of
    fiction
  • A cross-cultural study of emotions
  • grief, sorrow, melancholy, depression in
    English and Chinese culture
  • Individual person vs. society
  • The diachronic dimension

15
Parallel corpora etc. (3): grief and sorrow, some
translation equivalents
16
Parallel corpora etc. (4): grief and sorrow,
beishang
  • [Chinese text lost in extraction]
  • In the dining-room she had been demure and
    discreet. Now all pretense of grief had passed
    away from her. Her eyes shone with the joy of
    living, and her face still quivered with
    amusement at some remark of her companion.
  • [Chinese text lost in extraction]
  • Suellen and Careen had cried themselves to sleep,
    as they did at least twice a day when they
    thought of Ellen, tears of grief and weakness
    oozing down their sunken cheeks
  • [Chinese text lost in extraction]
  • The manager suffered this as a personal appeal.
    It came to him as if they were alone, and he
    could hardly restrain the tears for sorrow over
    the hopeless, pathetic, and yet dainty and
    appealing woman whom he loved.

17
Words denoting emotions vs. words for facial
expressions, gestures etc.
  • hun shen fa chan (trembling all over the body)
  • tui fa ruan (legs becoming soft)
  • lian dou huang-le (face also yellow/fear)
  • liang-le ban jie (cold over half the body)
  • lei gan qi jue (tear dry breath cut off)
  • bian se (change colour/get angry)
  • lian bai qi ye (face white breath hiccup/angry)

18
Critical discourse analysis (1)
  • Discourse and society
  • Discourse and reality
  • Mass media and the construction of social reality
  • Comparing texts
  • Corpus linguistics methodology for CDA
  • Analysing the homosexual marriage discourse

19
Critical discourse analysis (2)
  • Marriage is for procreation.
  • the purpose of marriage is to produce children.
  • Marriage is for raising children
  • marriage is only between a man and a woman
  • marriage is the union of a man and a woman
  • Marriage is a sacred sacrament.
  • Marriage is a sacred institution.
  • Gay marriage is a civil rights issue
  • Prohibiting homosexual marriage is discrimination
  • Why Gay Marriage is a Fundamental Human Rights
    Issue
  • opposition to gay marriage is based on religious
  • In some ways marriage is a white GLBT issue.
  • same-sex marriage is the culmination of a larger
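The lines on this slide read like concordance lines for the node "marriage is", cut off at a fixed width. A minimal key-word-in-context (KWIC) concordancer can be sketched as below; the function name, the character-based context width and the whole-word, case-insensitive matching are assumptions, not a description of the software used for the slide.

```python
import re

def kwic(texts, node, width=30):
    """Return key-word-in-context lines for each whole-word, case-insensitive match of node."""
    pattern = re.compile(r"\b" + re.escape(node) + r"\b", re.IGNORECASE)
    lines = []
    for text in texts:
        for match in pattern.finditer(text):
            left = text[max(0, match.start() - width):match.start()]
            right = text[match.end():match.end() + width]
            # Right-align the left context so the node lines up in one column
            lines.append(f"{left:>{width}}[{match.group()}]{right}")
    return lines

texts = [
    "Marriage is for procreation.",
    "They said the purpose of marriage is to produce children.",
]
for line in kwic(texts, "marriage is", width=10):
    print(line)
```

Sorting the resulting lines by the first word of the right context is the usual next step for spotting recurring patterns such as "marriage is for" or "marriage is a".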

20
Critical discourse analysis (3)
  • Prohibiting homosexual marriage is a
    discrimination against homosexuals and a
    violation of multiple human rights. Article 16
    of the Universal Declaration of Human Rights
    states that "Men and women of full age, without
    any limitation due to race, nationality or
    religion, have the right to marry and found a
    family."
  • Some today claim that a marriage among
    homosexuals is a right. What the Considerations
    highlights is that people cannot simply create
    rights claiming them to be human rights when
    they are not founded in the natural moral order.
    The State can create certain categories of legal
    rights. It cannot create natural moral law.

21
Historical linguistics (1)
  • Diachronic perspective
  • Semantic change
  • Each text segment occurrence is a unique event
  • The necessity of diachronic corpora
  • How to organise diachronic corpora
  • Texts as reactions to previous texts
  • The issue of intertextuality
  • The necessity of software development

22
Historical linguistics (2): property in a diachronic
corpus of Vatican social encyclicals
  • Every man has by nature the right to possess
    property as his own. (1891, Rerum novarum, 6)
  • The natural right itself of owning goods ought
    always to remain intact and inviolate, since this
    indeed is a right that the state cannot take
    away. (1931, Quadragesimo anno, 49)
  • Every man has in principle the right to use all
    the material goods of this earth, and this right
    can by no means be abolished, not even by other
    rights. (1941, Whitsun address)
  • The right to private ownership of goods has
    permanent validity. (1961, Mater et magistra, 109)
  • Private property does not constitute for anyone
    an absolute and unconditional right. (1967,
    Populorum progressio, 23)
  • The violation of the human right to ownership of
    property leads to inefficiency. (1991, Centesimus
    annus, 24)
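A diachronic query over the quotations on this slide can be sketched as listing, oldest first, the word immediately to the left of the node "right". The texts are copied from the slide; the function name and the one-word left context are assumptions chosen for illustration. On these six texts the output lists "natural right" (1931) and "human right" (1991) in date order, the kind of change in wording a diachronic corpus is meant to make visible.

```python
import re

# Quotations from the slide above, as (year, source, text)
CORPUS = [
    (1891, "Rerum novarum",
     "Every man has by nature the right to possess property as his own."),
    (1931, "Quadragesimo anno",
     "The natural right itself of owning goods ought always to remain intact "
     "and inviolate, since this indeed is a right that the state cannot take away."),
    (1941, "Whitsun address",
     "Every man has in principle the right to use all the material goods of this "
     "earth, and this right can by no means be abolished, not even by other rights."),
    (1961, "Mater et magistra",
     "The right to private ownership of goods has permanent validity."),
    (1967, "Populorum progressio",
     "Private property does not constitute for anyone an absolute and "
     "unconditional right."),
    (1991, "Centesimus annus",
     "The violation of the human right to ownership of property leads to inefficiency."),
]

def left_neighbours(corpus, node):
    """List (year, source, word before node) for every whole-word match, oldest first."""
    # \b after the node excludes longer forms such as "rights"
    pattern = re.compile(r"(\w+)\s+" + re.escape(node) + r"\b", re.IGNORECASE)
    hits = []
    for year, source, text in sorted(corpus):
        for match in pattern.finditer(text):
            hits.append((year, source, match.group(1).lower()))
    return hits

for year, source, word in left_neighbours(CORPUS, "right"):
    print(year, source, word, "right")
```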

23
Language teaching and translation studies
  • Teaching foreign languages vs. teaching
    translation
  • Learner corpora: over-use and under-use, error
    analysis
  • ESL teachers' questions and corpus evidence
  • Using monolingual corpora
  • Using parallel corpora for language learning and
    for translation
  • Replacing the word by the unit of meaning and the
    translation unit, respectively
  • CALL (computer-aided language learning)
  • Tim Johns's DDL (data-driven learning) webpage
    (Birmingham)
  • TALC (Teaching and Language Corpora)
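Over-use and under-use are usually established by comparing normalised frequencies in a learner corpus with those in a comparable native-speaker corpus. The sketch below shows only that arithmetic; the function names and the counts in the example are invented, and a real study would add a significance test (for example log-likelihood) before calling a difference over-use or under-use.

```python
def per_million(count, corpus_size):
    """Normalise a raw frequency to occurrences per million words."""
    return count * 1_000_000 / corpus_size

def usage_ratio(learner_count, learner_size, native_count, native_size):
    """Learner rate divided by native rate: above 1 hints at over-use, below 1 at under-use.

    Raises ZeroDivisionError if the item never occurs in the native corpus.
    """
    return per_million(learner_count, learner_size) / per_million(native_count, native_size)

# Invented figures: 300 hits in a 200,000-word learner corpus,
# 150 hits in a 1,000,000-word native corpus
print(usage_ratio(300, 200_000, 150, 1_000_000))  # → 10.0
```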

24
Spoken language
  • Spoken language corpora vs. speech corpora
  • Spoken language and discourse analysis
  • HKCAC: The Hong Kong Cantonese Adult Language
    Corpus
  • The Corpus of Spoken Israeli Hebrew
  • Syllable contractions in a Mandarin
    conversational dialogue corpus
  • // CAN I help you //: The use of rise and
    rise-fall tones in the Hong Kong Corpus of Spoken
    English

25
Language theory (1)
  • The focus of CL is on meaning.
  • CL deals with recorded (written, transcribed)
    language.
  • Texts are symbolic: form and meaning are
    indivisible.
  • Meaning is only in the discourse.
  • Speakers' intentions are irrelevant.
  • Meaning is independent of reality.
  • The discourse contains only testimony, not
    first-person experiences.
  • Meaning is paraphrase.
  • Meaning is always provisional.
  • Meaning is negotiated by the discourse community
    members.
  • The discourse is self-reflexive: it talks about
    language use. The discourse is democratic.

26
Language theory (2)
  • Synchronic CL makes general claims based on
    recurrence.
  • Synchronic CL defines types of tokens.
  • Diachronic CL deals with unique occurrences of
    units.
  • Diachronic CL looks at intertextuality.
  • What is said is a reaction to what has been said
    before.
  • CL looks for differences in languages, not for
    universals.
  • The single word is not privileged. What is a unit
    of meaning is a matter of perspective.
  • Units of meaning are monosemous.
  • There is no difference between linguistic meaning
    and encyclopaedic meaning.
  • CL is not about true meaning; it is about
    interpretation.
  • Linguists are not privileged over lay speakers.
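The distinction between tokens (individual occurrences) and types (what recurring tokens are generalised into) can be made concrete in a few lines. Treating a type as a lower-cased word form is an assumption; a lemma or a multi-word unit of meaning would be equally possible choices, in line with the point above that what counts as a unit is a matter of perspective. The function name is hypothetical, and the example sentence combines two of the concordance lines from slide 19.

```python
import re
from collections import Counter

def types_and_tokens(text):
    """Split text into word tokens and group them into types (lower-cased word forms)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    types = Counter(tokens)  # type -> number of tokens instantiating it
    return tokens, types

tokens, types = types_and_tokens("Marriage is a sacred sacrament. Marriage is a sacred institution.")
print(len(tokens), len(types))  # → 10 6
```

Synchronic generalisation works on the type counts; a diachronic reading would instead keep each of the ten tokens as a separate, dated event.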

27
Conclusions
  • CL is growing stronger (three journals now).
  • CL is about meaning.
  • CL deals with all aspects of language.
  • CL is different from computational linguistics
    and cognitive linguistics.
  • CL has a social perspective.
  • CL brings better dictionaries.
  • CL facilitates translation.
  • Corpus linguistics analyses the discourse.
  • CL needs a diachronic dimension.

28
Brief bibliography
  • Altenberg, Bengt, and Sylviane Granger (eds.)
    (2002) Lexis in Contrast. Amsterdam: Benjamins.
  • Aston, Guy, Silvia Bernardini and Dominic Stewart
    (eds.) (2004) Corpora and Language Learners.
    Amsterdam: Benjamins.
  • Biber, Doug, Susan Conrad and Randi Reppen
    (1998) Corpus Linguistics. Cambridge: Cambridge
    University Press.
  • Bowker, Lynne, and Jennifer Pearson (2002)
    Working with Specialized Language: A Practical
    Guide to Using Corpora. London: Routledge.
  • Garside, Roger, Geoffrey Leech and Anthony
    McEnery (1997) Corpus Annotation. London: Longman.
  • Granger, Sylviane, Joseph Hung and Stephanie
    Petch-Tyson (eds.) (2002) Computer Learner
    Corpora, Second Language Acquisition and Foreign
    Language Teaching. Amsterdam: Benjamins.
  • Granger, Sylviane, Jacques Lerot and Stephanie
    Petch-Tyson (eds.) (2003) Corpus-Based
    Approaches to Contrastive Linguistics and
    Translation Studies. Amsterdam: Rodopi.
  • Halliday, M.A.K., Wolfgang Teubert and Colin
    Yallop, with Anna Čermáková (2004) Lexicology
    and Corpus Linguistics. London: Continuum.
  • Hoey, Michael (1991) Patterns of Lexis in Text.
    Oxford: Oxford University Press.
  • Hunston, Susan, and Gill Francis (1999) Pattern
    Grammar. Amsterdam: Benjamins.
  • Hunston, Susan (2002) Corpora in Applied
    Linguistics. Cambridge: Cambridge University
    Press.
  • Krishnamurthy, Ramesh (ed.) (2004) English
    Collocation Studies, by John M. Sinclair, Susan
    Jones and Robert Daley. London: Continuum.
  • Nelson, Gerald, Sean Wallis and Bas Aarts (2002)
    Exploring Natural Language: Working with the
    British Component of the International Corpus of
    English. Amsterdam: Benjamins.
  • Olohan, Maeve (2004) Introducing Corpora in
    Translation Studies. London: Routledge.
  • Pearson, Jennifer (1998) Terms in Context.
    Amsterdam: Benjamins.
  • Peters, Pam, Peter Collins and Adam Smith (eds.)
    (2002) New Frontiers of Corpus Research.
    Amsterdam: Rodopi.
  • Sinclair, John (1991) Corpus, Concordance,
    Collocation. Oxford: Oxford University Press.
  • Sinclair, John (2004) Trust the Text: Language,
    Corpus and Discourse. London: Routledge.
  • Sinclair, John (ed.) (1987) Looking Up: An
    Account of the COBUILD Project in Lexical
    Computing. London: HarperCollins.