Title: Current trends in corpus linguistics
1 Current trends in corpus linguistics
- Wolfgang Teubert
- University of Birmingham
- Teubertw_at_bham.ac.uk
2 Cognitive linguistics vs. corpus linguistics
- Cognitive linguistics is interested in how thought is turned into language and how language is turned into thought.
- Cognitive linguistics looks at language from a mental/psychological perspective.
- Corpus linguistics looks at language from a social perspective.
- Corpus linguistics deals with exchanging and sharing content.
- Corpus linguistics is interested in how communication works between the members of a discourse community.
3 Cognitive linguistics, or: how does the mind work?
- Meanings are in the head.
- Cognitivism looks into behaviourism's black box.
- The mind uses the universal language of thought.
- There is an innate language faculty/organ.
- Natural language expressions are converted into mental representations.
- Mental representations are processed syntactically (algorithmically).
- The mind is a syntactic engine.
4 Corpus linguistics
- Corpus linguistics has a social perspective.
- The language community exchanges and shares content in the discourse.
- We are interested in meaning, not in people's heads.
- The language of bees (dancing) has meaning only to us (the observers), not to the bees.
5 The agenda of corpus linguistics: the reader Corpus Linguistics: Critical Concepts in Linguistics (Routledge 2007)
- Theoretical aspects (2)
- History of CL
- Corpus composition and compilation (1)
- Standardisation, tagging, alignment, software (7)
- Lexicography, collocations, idioms (5)
- Terminology
- Grammar (3)
- Translation, parallel corpora, multilinguality (5)
- CDA, stylistics, rhetoric, evaluation (1)
- Language history, historical linguistics
- Language teaching (2)
- Spoken language, discourse studies (2)
6 The agenda of corpus linguistics: recent issues of IJCL
- (theory) My version of corpus linguistics
- (composition) The representativeness of the Czech National Corpus
- (software) The automatic recognition of verb patterns
- (lexicography/collocation) Semantic prosody and semantic preference
- (grammar) It-extraposition in English
- (translation) A corpus-based view of similarity and difference in translation
- (CDA) Examining the ideology of sleaze
- (language teaching) ESL teachers' questions and corpus evidence
- (spoken language) Syllable contractions in a Mandarin conversational dialogue corpus
7 Corpus composition and compilation
- Representativeness
- Opportunistic corpora
- The Internet as (virtual) corpus
- Learners' corpora
- Diachronic corpora (monitor corpora)
- Parallel corpora
- Special language corpora
- Spoken language corpora and transcription
- Corpus compiler vs. corpus user
8 Methodology and software development
- The issue of annotation/tagging: pro and con
- The issue of the tagset
- The need for better automatic alignment
- Corpus-driven sense disambiguation software
- Software for diachronic corpora
- Intertextuality software
- Paraphrase recognition software
- Named-entity detection software
- Neology detection software
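In its simplest form, neology detection can compare a newer slice of a monitor corpus with an older reference slice and list the word forms that are new. The sketch below assumes exactly that set-difference approach, a naive regular-expression tokenizer and a minimum-frequency threshold (`min_freq`); the function names and sample sentences are hypothetical and not taken from any existing tool.

```python
import re
from collections import Counter


def tokenize(text):
    # Naive tokenizer: lowercase letter sequences, allowing internal hyphens/apostrophes
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())


def candidate_neologisms(reference_text, new_text, min_freq=2):
    """List word forms that occur at least `min_freq` times in `new_text`
    but never in `reference_text`, most frequent first."""
    known = set(tokenize(reference_text))
    counts = Counter(tokenize(new_text))
    candidates = [(w, c) for w, c in counts.items() if w not in known and c >= min_freq]
    # Descending frequency, ties broken alphabetically for stable output
    return sorted(candidates, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical older and newer corpus slices
older = "The committee discussed the budget and the new building."
newer = ("The committee discussed the blog. Every blog needs a blogger; "
         "the blogger posts daily to the blog.")
print(candidate_neologisms(older, newer))
```

A realistic detector would also need lemmatisation and filters for typos, proper names and spelling variants; the sketch shows only the comparison step.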
9 Lexicography, phraseology, collocation study
- The Cobuild project
- Corpus size
- The synchronic perspective: generalisation
- The collocation issue (more or less fixed expressions)
- The paraphrase issue
- Evaluation, connotation, deontic meaning
- The diachronic perspective: meaning as unique events
- Using corpora to investigate antonym acquisition
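Collocation study usually ranks the words around a node word by an association measure. The sketch below uses pointwise mutual information, MI = log2(f(node, collocate) × N / (f(node) × f(collocate))), counting co-occurrences within a window of ±`span` tokens. The window size, the function name and the toy sentence are assumptions, and MI is only one of several measures in use (t-score and log-likelihood are common alternatives).

```python
import math
from collections import Counter


def mi_collocates(tokens, node, span=2):
    """Rank the collocates of `node` by pointwise mutual information.

    Every token within `span` positions to the left or right of an
    occurrence of `node` counts as one co-occurrence. No correction for
    window size is applied, so scores are comparable only within a run.
    """
    n = len(tokens)
    freq = Counter(tokens)
    cooc = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        for j in range(max(0, i - span), min(n, i + span + 1)):
            if j != i:
                cooc[tokens[j]] += 1
    # MI = log2(observed co-occurrences * N / (f(node) * f(collocate)))
    scores = {w: math.log2(c * n / (freq[node] * freq[w])) for w, c in cooc.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


tokens = "a strong tea and a strong tea and a weak coffee".split()
print(mi_collocates(tokens, "tea", span=1))
```

Because MI rewards rare collocates, corpus studies often set a minimum co-occurrence frequency or report a second measure alongside it.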
10 Terminology and special languages
- Hard terminology: standardisation in engineering, production and maintenance
- Soft terminology: emergence of new ideas in the sciences
- Terminology mining (paraphrases and definitions)
- The detection of new ideas
- Fixedness in genre-specific language and intercultural differences: comparing English and Chinese Fire News corpora
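Terminology mining often starts from definitional and paraphrase patterns such as "X is defined as Y", "X, i.e. Y" or "X, also known as Y". The sketch below is a deliberately crude pattern matcher: the three patterns are illustrative assumptions rather than a standard inventory, a term is assumed to be a single (possibly hyphenated) word, and the sample sentences are invented. Real term extraction would need part-of-speech information to find multi-word term boundaries.

```python
import re

# Hypothetical definition/paraphrase patterns. A term is assumed to be a
# single (possibly hyphenated) word; the gloss runs to the next punctuation.
PATTERNS = [
    re.compile(r"(?P<term>[\w-]+) is defined as (?P<gloss>[^.;]+)"),
    re.compile(r"(?P<term>[\w-]+), i\.e\.,? (?P<gloss>[^.;]+)"),
    re.compile(r"(?P<term>[\w-]+), also known as (?P<gloss>[^.;,]+)"),
]


def mine_definitions(text):
    """Return (term, gloss) pairs matched by any of the patterns, pattern by pattern."""
    hits = []
    for pattern in PATTERNS:
        for m in pattern.finditer(text):
            hits.append((m.group("term"), m.group("gloss").strip()))
    return hits


sample = ("A flashover is defined as the near-simultaneous ignition of all "
          "combustible material in a room. Crews feared a backdraft, also "
          "known as a smoke explosion, more than anything else.")
for term, gloss in mine_definitions(sample):
    print(f"{term}: {gloss}")
```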
11 Grammar
- Rule-based or list-based?
- The relationship between grammar and lexis
- Lexicogrammar and local grammar
- Pattern grammar
- Valency grammar
- General grammar vs. the dictionary
- let-imperatives in English
- Genitive and of-construction in modern written
English
12 Parallel corpora etc. (1)
- Parallel corpora: re-using previous translations
- The advantage of working with parallel corpora
- The compilation of parallel corpora
- The definition of the translation unit (the smallest recurrent text segment for which there is only one target-language equivalent)
- What is the source language, what is the target language?
- The Hong Kong Legal Document Corpus
- A TranslationBase built from a parallel corpus
- Parallel corpora and cultural contrast
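The definition of the translation unit given on this slide can be made operational once a phrase aligner has paired source segments with target segments. The sketch assumes such aligned (source, target) observations already exist; the function names, the `min_count` threshold standing in for "recurrent" and the placeholder target labels T1 to T5 are all hypothetical. A segment qualifies if it recurs and always has the same equivalent, and it is kept only if no other qualifying segment occurs inside it.

```python
from collections import defaultdict


def contains(longer, shorter):
    """True if the token list `shorter` occurs contiguously inside `longer`."""
    k = len(shorter)
    return any(longer[i:i + k] == shorter for i in range(len(longer) - k + 1))


def translation_units(observations, min_count=2):
    """Return the smallest recurrent source segments with exactly one equivalent."""
    counts = defaultdict(int)
    equivalents = defaultdict(set)
    for source, target in observations:
        counts[source] += 1
        equivalents[source].add(target)
    # Recurrent segments that never vary in translation
    qualifying = [s for s in counts if counts[s] >= min_count and len(equivalents[s]) == 1]
    units = []
    for s in qualifying:
        tokens = s.split()
        # Discard s if another qualifying segment occurs inside it
        if not any(o != s and contains(tokens, o.split()) for o in qualifying):
            units.append(s)
    return sorted(units)


# Hypothetical aligned observations; T1..T5 stand in for target-language segments
observations = [
    ("straight", "T1"), ("straight", "T2"),           # two equivalents: not a unit
    ("straight line", "T3"), ("straight line", "T3"),
    ("legal officer", "T4"), ("legal officer", "T4"),
    ("legal officer post", "T5"), ("legal officer post", "T5"),
]
print(translation_units(observations))
```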
13 Parallel corpora: extracted adjective-noun phrases as translation equivalents
- 105 straight line
- 104 legal officer
- 101 residential care
- 101 criminal offences
- 100 annual allowance
- 99 long term
- 98 human remains
- 98 conclusive evidence
- 97 written permission
- 97 public bus
- 97 personal representatives
- 97 first column
- 96 notifiable workplace
- 96 listed company
- 95 light bus
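Frequency lists like the one on this slide can be produced by counting adjective + noun sequences in a part-of-speech-tagged corpus. This sketch assumes Penn Treebank-style tags (JJ for adjectives, NN/NNS for nouns) and reads the figures on the slide as frequencies; the source states neither the tagger nor the extraction method, and the tagged sample sentences are invented for illustration. Pairing each phrase with its target-language equivalent would need the aligned side of the parallel corpus, which is not modelled here.

```python
from collections import Counter

ADJ_TAGS = {"JJ"}
NOUN_TAGS = {"NN", "NNS"}


def adj_noun_counts(tagged_sentences):
    """Count adjacent (adjective, noun) pairs, lowercased, across all sentences."""
    counts = Counter()
    for sent in tagged_sentences:
        for (w1, t1), (w2, t2) in zip(sent, sent[1:]):
            if t1 in ADJ_TAGS and t2 in NOUN_TAGS:
                counts[f"{w1.lower()} {w2.lower()}"] += 1
    return counts


# Hypothetical tagged input: lists of (word, tag) pairs
tagged = [
    [("The", "DT"), ("legal", "JJ"), ("officer", "NN"), ("signed", "VBD")],
    [("a", "DT"), ("written", "VBN"), ("permission", "NN")],
    [("Conclusive", "JJ"), ("evidence", "NN"), ("of", "IN"),
     ("criminal", "JJ"), ("offences", "NNS")],
    [("another", "DT"), ("legal", "JJ"), ("officer", "NN")],
]
for phrase, freq in adj_noun_counts(tagged).most_common():
    print(freq, phrase)
```

Note that a participial adjective such as written may be tagged VBN, so the tag set chosen decides whether written permission is counted.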
14 Parallel corpora etc. (2): linguistic anthropology
- A Chinese-English/English-Chinese corpus of fiction
- A cross-cultural study of emotions
- Grief, sorrow, melancholy, depression in English and Chinese culture
- Individual person vs. society
- The diachronic dimension
15 Parallel corpora etc. (3): grief and sorrow, some translation equivalents
16 Parallel corpora etc. (4): grief and sorrow, beishang
- [Chinese translation; characters lost in conversion]
- In the dining-room she had been demure and discreet. Now all pretense of grief had passed away from her. Her eyes shone with the joy of living, and her face still quivered with amusement at some remark of her companion.
- [Chinese translation; characters lost in conversion]
- Suellen and Careen had cried themselves to sleep, as they did at least twice a day when they thought of Ellen, tears of grief and weakness oozing down their sunken cheeks.
- [Chinese translation; characters lost in conversion]
- The manager suffered this as a personal appeal. It came to him as if they were alone, and he could hardly restrain the tears for sorrow over the hopeless, pathetic, and yet dainty and appealing woman whom he loved.
17 Words denoting emotions vs. words for facial expressions, gestures etc.
- hun shen fa chan (trembling all over the body)
- tui fa ruan (legs becoming soft)
- lian dou huang-le (face also yellow/fear)
- liang-le ban jie (cold over half the body)
- lei gan qi jue (tear dry breath cut off)
- bian se (change colour/get angry)
- lian bai qi ye (face white breath hiccup/angry)
18 Critical discourse analysis (1)
- Discourse and society
- Discourse and reality
- Mass media and the construction of social reality
- Comparing texts
- Corpus linguistics methodology for CDA
- Analysing the homosexual marriage discourse
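Comparing texts quantitatively usually means asking which words are unusually frequent in a study corpus relative to a reference corpus. One widely used keyness statistic is log-likelihood: with frequencies a and b in corpora of c and d tokens, the expected values are E1 = c(a + b)/(c + d) and E2 = d(a + b)/(c + d), and G2 = 2(a ln(a/E1) + b ln(b/E2)), where a zero frequency contributes nothing. The sketch below is a minimal implementation under those definitions; the function names and the choice to keep only words that are relatively more frequent in the study corpus are assumptions.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """G2 for a word with frequency a in c tokens and frequency b in d tokens."""
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def keywords(study_tokens, reference_tokens, top=10):
    """Return (word, study freq, reference freq, G2) rows, highest G2 first."""
    sc, rc = Counter(study_tokens), Counter(reference_tokens)
    c, d = len(study_tokens), len(reference_tokens)
    rows = []
    for w in set(sc) | set(rc):
        a, b = sc[w], rc[w]
        # Keep only words relatively more frequent in the study corpus
        if a / c > b / d:
            rows.append((w, a, b, round(log_likelihood(a, b, c, d), 2)))
    rows.sort(key=lambda r: (-r[3], r[0]))
    return rows[:top]
```

In practice the G2 values are compared against a chi-square critical value (for example 3.84 for p < 0.05 with one degree of freedom) before a word is treated as a keyword.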
19 Critical discourse analysis (2)
- Marriage is for procreation.
- the purpose of marriage is to produce children.
- Marriage is for raising children
- marriage is only between a man and a woman
- marriage is the union of a man and a woman
- Marriage is a sacred sacrament.
- Marriage is a sacred institution.
- Gay marriage is a civil rights issue
- Prohibiting homosexual marriage is discrimination
- Why Gay Marriage is a Fundamental Human Rights Issue
- opposition to gay marriage is based on religious
- In some ways marriage is a white GLBT issue.
- same-sex marriage is the culmination of a larger
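The lines on this slide resemble the output a concordancer returns for a search such as marriage is. A minimal keyword-in-context (KWIC) sketch follows; the case-insensitive phrase search, the fixed character width of the context and the sample text are assumptions for illustration, not a description of the tool used for the slide.

```python
import re


def kwic(text, phrase, width=30):
    """Return one keyword-in-context line per occurrence of `phrase`.

    The left context is right-aligned in a column of `width` characters so
    that the search phrase lines up vertically, as in a concordancer.
    """
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()].replace("\n", " ")
        right = text[m.end():m.end() + width].replace("\n", " ")
        lines.append(f"{left:>{width}}[{m.group(0)}]{right}")
    return lines


sample = ("For many, marriage is a sacred institution. "
          "Others say marriage is a civil rights issue.")
for line in kwic(sample, "marriage is", width=20):
    print(line)
```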
20 Critical discourse analysis (3)
- Prohibiting homosexual marriage is a discrimination against homosexuals and a violation of multiple human rights. Article 16 of the Universal Declaration of Human Rights states that "Men and women of full age, without any limitation due to race, nationality or religion, have the right to marry and found a family."
- Some today claim that a marriage among homosexuals is a right. What the Considerations highlights is that people cannot simply create rights claiming them to be human rights when they are not founded in the natural moral order. The State can create certain categories of legal rights. It cannot create natural moral law.
21 Historical linguistics (1)
- Diachronic perspective
- Semantic change
- Each text segment occurrence is a unique event
- The necessity of diachronic corpora
- How to organise diachronic corpora
- Texts as reactions to previous texts
- The issue of intertextuality
- The necessity of software development
22 Historical linguistics (2): diachronic corpus of Vatican social encyclicals, property
- Every man has by nature the right to possess property as his own. 1891, Rerum novarum, 6
- The natural right itself of owning goods ought always to remain intact and inviolate, since this indeed is a right that the state cannot take away. 1931, Quadragesimo anno, 49
- Every man has in principle the right to use all the material goods of this earth, and this right can by no means be abolished, not even by other rights. 1941, Whitsun address
- The right to private ownership of goods has permanent validity. 1961, Mater et magistra, 109
- Private property does not constitute for anyone an absolute and unconditional right. 1967, Populorum progressio, 23
- The violation of the human right to ownership of property leads to inefficiency. 1991, Centesimus annus, 24
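A diachronic corpus makes it possible to follow how the company a word keeps changes over time. As a toy illustration, the sketch below treats the six excerpts quoted on this slide as a miniature dated corpus and lists, for each year, the word immediately preceding each occurrence of right. Restricting the data to these excerpts and using the left neighbour as the unit of comparison are assumptions for illustration; the function name is hypothetical.

```python
import re

# The six excerpts quoted on the slide, keyed by year
EXCERPTS = {
    1891: "Every man has by nature the right to possess property as his own.",
    1931: "The natural right itself of owning goods ought always to remain intact "
          "and inviolate, since this indeed is a right that the state cannot take away.",
    1941: "Every man has in principle the right to use all the material goods of "
          "this earth, and this right can by no means be abolished, not even by other rights.",
    1961: "The right to private ownership of goods has permanent validity.",
    1967: "Private property does not constitute for anyone an absolute and unconditional right.",
    1991: "The violation of the human right to ownership of property leads to inefficiency.",
}


def left_neighbours(corpus, node="right"):
    """Map each year (in order) to the words immediately preceding `node`."""
    result = {}
    for year in sorted(corpus):
        tokens = re.findall(r"[a-z]+", corpus[year].lower())
        # Exact token match, so the plural "rights" is not counted
        result[year] = [tokens[i - 1] for i in range(1, len(tokens)) if tokens[i] == node]
    return result


for year, words in left_neighbours(EXCERPTS).items():
    print(year, words)
```

Even this tiny sample shows the qualifiers natural, unconditional and human attaching to right at different dates; with a full diachronic corpus one would compare such profiles per period rather than per text.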
23 Language teaching, translation studies
- Teaching foreign languages vs. teaching translation
- Learners' corpora: over-use and under-use, error analysis
- ESL teachers' questions and corpus evidence
- Using monolingual corpora
- Using parallel corpora for language learning and for translation
- Replacing the word by the unit of meaning and the translation unit, respectively
- CALL (computer-aided language learning)
- Tim Johns' DDL (data-driven learning) webpage (Birmingham)
- TALC (Teaching and Language Corpora)
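Over-use and under-use in a learners' corpus are usually established by comparing normalised frequencies with a reference corpus of native-speaker or expert writing. The sketch below normalises counts per 10,000 tokens and flags items whose learner/reference ratio reaches a threshold; the normalisation base, the threshold of 2, the function names and the toy data are hypothetical, and in practice a significance test is applied as well.

```python
from collections import Counter


def per_10k(count, total):
    """Normalise a raw count to occurrences per 10,000 tokens."""
    return count * 10_000 / total


def overuse_underuse(learner_tokens, reference_tokens, items, ratio=2.0):
    """Label each item by comparing normalised learner and reference frequencies."""
    lc, rc = Counter(learner_tokens), Counter(reference_tokens)
    nl, nr = len(learner_tokens), len(reference_tokens)
    report = {}
    for item in items:
        fl, fr = per_10k(lc[item], nl), per_10k(rc[item], nr)
        if fl == 0 and fr == 0:
            label = "absent"
        elif fr == 0 or fl / fr >= ratio:
            label = "overuse"
        elif fl == 0 or fr / fl >= ratio:
            label = "underuse"
        else:
            label = "comparable"
        report[item] = (round(fl, 1), round(fr, 1), label)
    return report


# Hypothetical toy corpora (10 tokens each)
learner = ["very"] * 4 + ["moreover"] + ["x"] * 5
reference = ["very"] + ["moreover"] + ["thus"] * 2 + ["x"] * 6
for item, row in overuse_underuse(learner, reference, ["very", "moreover", "thus", "hence"]).items():
    print(item, row)
```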
24 Spoken language
- Spoken language corpora vs. speech corpora
- Spoken language and discourse analysis
- HKCAC: The Hong Kong Cantonese Adult Language Corpus
- The Corpus of Spoken Israeli Hebrew
- Syllable contractions in a Mandarin conversational dialogue corpus
- // CAN I help you //: The use of rise and rise-fall tones in the Hong Kong Corpus of Spoken English
25 Language theory (1)
- The focus of CL is on meaning.
- CL deals with recorded (written, transcribed) language.
- Texts are symbolic: form and meaning are indivisible.
- Meaning is only in the discourse.
- Speakers' intentions are irrelevant.
- Meaning is independent of reality.
- The discourse contains only testimony, not first-person experiences.
- Meaning is paraphrase.
- Meaning is always provisional.
- Meaning is negotiated by the discourse community members.
- The discourse is self-reflexive: it talks about language use. The discourse is democratic.
26 Language theory (2)
- Synchronic CL makes general claims based on recurrence.
- Synchronic CL defines types of tokens.
- Diachronic CL deals with unique occurrences of units.
- Diachronic CL looks at intertextuality.
- What is said is a reaction to what has been said before.
- CL looks for differences in languages, not for universals.
- The single word is not privileged. What is a unit of meaning is a matter of perspective.
- Units of meaning are monosemous.
- There is no difference between linguistic meaning and encyclopaedic meaning.
- CL is not about true meaning; it is about interpretation.
- Linguists are not privileged over lay speakers.
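Synchronic CL generalises over recurrence: distinct forms (types) are abstracted from their individual occurrences (tokens). Since the single word is not privileged, the same count can be run over multi-word sequences. The sketch below counts types and tokens for contiguous n-grams of any length n; treating n-grams as candidate units, the function name and the sample sentence are assumptions for illustration, not a claim about how units of meaning are identified.

```python
from collections import Counter


def ngram_types(tokens, n=1):
    """Return (number of n-gram tokens, Counter of n-gram types)."""
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(grams), Counter(grams)


tokens = "the right to own property is the right to own goods".split()
for n in (1, 2, 3):
    n_tokens, types = ngram_types(tokens, n)
    print(f"n={n}: {n_tokens} tokens, {len(types)} types, most frequent {types.most_common(1)}")
```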
27 Conclusions
- CL is growing stronger (three journals now).
- CL is about meaning.
- CL deals with all aspects of language.
- CL is different from computational linguistics and cognitive linguistics.
- CL has a social perspective.
- CL brings better dictionaries.
- CL facilitates translation.
- Corpus linguistics analyses the discourse.
- CL needs a diachronic dimension.
28 Brief bibliography
- Altenberg, Bengt, and Sylviane Granger (eds.) (2002) Lexis in Contrast. Amsterdam: Benjamins.
- Aston, Guy, Silvia Bernardini and Dominic Stewart (eds.) (2004) Corpora and Language Learners. Amsterdam: Benjamins.
- Biber, Doug, Susan Conrad and Randi Reppen (1998) Corpus Linguistics. Cambridge: Cambridge University Press.
- Bowker, Lynne, and Jennifer Pearson (2002) Working with Specialized Language: A Practical Guide to Using Corpora. London: Routledge.
- Garside, Roger, Geoffrey Leech and Anthony McEnery (eds.) (1997) Corpus Annotation. London: Longman.
- Granger, Sylviane, Joseph Hung and Stephanie Petch-Tyson (eds.) (2002) Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching. Amsterdam: Benjamins.
- Granger, Sylviane, Jacques Lerot and Stephanie Petch-Tyson (eds.) (2003) Corpus-Based Approaches to Contrastive Linguistics and Translation Studies. Amsterdam: Rodopi.
- Halliday, M.A.K., Wolfgang Teubert and Colin Yallop, with Anna Cermakova (2004) Lexicology and Corpus Linguistics. London: Continuum.
- Hoey, Michael (1991) Patterns of Lexis in Text. Oxford: Oxford University Press.
- Hunston, Susan, and Gill Francis (1999) Pattern Grammar. Amsterdam: Benjamins.
- Hunston, Susan (2002) Corpora in Applied Linguistics. Cambridge: Cambridge University Press.
- Krishnamurthy, Ramesh (ed.) (2004) English Collocation Studies, by John M. Sinclair, Susan Jones and Robert Daley. London: Continuum.
- Nelson, Gerald, Sean Wallis and Bas Aarts (2002) Exploring Natural Language: Working with the British Component of the International Corpus of English. Amsterdam: Benjamins.
- Olohan, Maeve (2004) Introducing Corpora in Translation Studies. London: Routledge.
- Pearson, Jennifer (1998) Terms in Context. Amsterdam: Benjamins.
- Peters, Pam, Peter Collins and Adam Smith (eds.) (2002) New Frontiers of Corpus Research. Amsterdam: Rodopi.
- Sinclair, John (1991) Corpus, Concordance, Collocation. Oxford: Oxford University Press.
- Sinclair, John (2004) Trust the Text: Language, Corpus and Discourse. London: Routledge.
- Sinclair, John (ed.) (1987) Looking Up: An Account of the COBUILD Project in Lexical Computing. London: HarperCollins.