Title: Corpus linguistics for translators
1Corpus linguistics for translators
- Amanda Saksida
- University of Nova Gorica
2... He cast a sideways look at Harry under his
bushy eyebrows. "Be grateful if yeh didn't
mention that ter anyone at Hogwarts," he said.
"I'm - er - not supposed ter do magic, strictly
speakin'."
3... He cast a sideways look at Harry under his
bushy eyebrows. "Be grateful if yeh didn't
mention that ter anyone at Hogwarts," he said.
"I'm - er - not supposed ter do magic, strictly
speakin'."
Hedwig Harry Hogwarts Hagrid Quidditch ...
4... He cast a sideways look at Harry under his
bushy eyebrows. "Be grateful if yeh didn't
mention that ter anyone at Hogwarts," he said.
"I'm - er - not supposed ter do magic, strictly
speakin'."
Hedwig Harry Hogwarts Hagrid Quidditch ...
wart hog Phacochoerus aethiopicus
7Course outline
- Introduction: what corpora are, history, typology, online corpora
- Areas where corpora are used
- Corpus-based translation studies: interesting examples
- Tools for building and using corpora
8What is a corpus?
- A corpus is a collection of pieces of language that are selected and ordered according to explicit linguistic criteria in order to be used as a sample of the language.
- Computer corpus: a corpus which is encoded in a standardised and homogeneous way for open-ended retrieval tasks. Its constituent pieces of language are documented as to their origins and provenance.
- (Guidelines of the Expert Advisory Group on Language Engineering Standards, 1996)
- Big collections of modern texts
- Electronic form
- Representative of a language/dialect
- Basis for descriptive studies (not prescriptive!)
9Brief history of corpus linguistics
- 1964: Brown Corpus (1 M words)
- John Sinclair and the COBUILD revolution > Bank of English (470 M words)
- British National Corpus (100 M words) > other languages (Czech, Hungarian, Croatian, Slovak, ...)
- Web as corpus: with the digital revolution, more and more texts are available on the net > programs that build corpora from online texts (WebBootCat, http://www.sketchengine.co.uk/auth/wbc/mycorp.cgi)
10Types of corpora
- Kinds of corpora
- Medium: written texts / spoken language
- Size: reference corpora / specialized corpora
- Time span: synchronic / diachronic corpora
- Tagging: lemmatized / POS-tagged corpora
- Language: mono- or multilingual corpora
- parallel
- comparable
- translational
11Corpus usage
- Lexicography
- Descriptive Grammars
- Translation tools and studies
- Foreign language learning
- Sociolinguistic studies
- Language technologies
12Keywords
- Concordance
- KWIC (Keyword in Context)
- Type / Token
- Tag / Lemma
- Collocation
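The keywords above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the sample sentence, the crude regular-expression tokenizer, and the helper names `tokenize` and `kwic` are all invented here and do not come from any of the corpus tools named in these slides. It counts tokens (running words) and types (distinct word forms) and prints a KWIC concordance for one keyword.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def kwic(tokens, keyword, width=2):
    """Return keyword-in-context lines with up to `width` tokens on each side of every hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}")
    return lines

# Invented sample text, used only to show the mechanics.
sample = "The cat sat on the mat. The dog sat on the cat."
tokens = tokenize(sample)   # every running word counts as one token
types = set(tokens)         # distinct word forms

print(len(tokens), "tokens,", len(types), "types")
concordance = kwic(tokens, "sat", width=2)
for line in concordance:
    print(line)
```

Real concordancers work on tagged and lemmatized corpora, so a search for a lemma would also return its inflected forms; this sketch matches exact word forms only.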
13What can a corpus tell us?
- Word frequency
- How frequent is a word / word form (compared to other words)?
- Lexical information
- Which words frequently co-occur?
- Which affixes can a word have?
- Syntactic information
- In which syntactic structures can a word occur?
- Semantic information
- What are the possible meanings of a word?
- Pragmatic information
- In which texts can we find a word? What stylistic information does a word or its context bear? Is the usage of a word stagnating, or is its frequency increasing or decreasing?
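The first two questions on this slide (how frequent a word is, and which words co-occur with it) can be sketched in a few lines. This is a minimal illustration, not the method of any particular corpus tool: the token list is invented, the helper name `cooccurrences` is hypothetical, and raw window counts stand in for the statistical association measures that real collocation tools apply.

```python
from collections import Counter

def cooccurrences(tokens, node, window=1):
    """Count the tokens found within `window` positions of each occurrence of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            start = max(0, i - window)
            neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts

# Invented toy "corpus", already tokenized by whitespace.
tokens = "strong tea and strong coffee but powerful engine and strong tea".split()

freq = Counter(tokens)                            # absolute frequency of each word form
relative = freq["strong"] / len(tokens)           # frequency compared to the corpus size
near_strong = cooccurrences(tokens, "strong", window=1)

print(freq.most_common(3))
print(f"relative frequency of 'strong': {relative:.3f}")
print(near_strong)
```

On a corpus of realistic size, frequent function words such as "and" dominate raw co-occurrence counts, which is why collocation tools normally rank candidates with an association score rather than by raw frequency.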
14What can a corpus tell us?
- Translation studies
- Parallel corpus studies can reveal characteristics of translated texts, such as tendencies towards explicitness and avoidance of repetition.
- Comparison between the translation part of the corpus and a corpus of texts of the same genre, written in the target language for the translation corpus, reveals a tendency towards what we might call the Eliza Doolittle phenomenon: the translated texts, more than the texts in the control corpus, tend to contain those TL phrases, structures, and so on, which, from a comparative point of view, seem particularly characteristic of the TL. (Malmkjaer 1996)
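One simple way to look for the kind of over-representation described above is to compare the normalized frequency of a target-language phrase in a translation corpus with its frequency in a control corpus. The sketch below assumes invented toy texts, an arbitrarily chosen phrase, and a hypothetical helper `per_thousand`; it is not the procedure used in Malmkjaer (1996), and the numbers it prints say nothing about real translations.

```python
def per_thousand(tokens, phrase):
    """Occurrences of `phrase` (a list of tokens) per 1,000 tokens of text."""
    n = len(phrase)
    hits = sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase)
    return 1000 * hits / len(tokens) if tokens else 0.0

# Invented toy data standing in for the two corpora.
translated = "as a matter of fact he left and as a matter of fact she stayed".split()
control = "he left and she stayed as a matter of fact".split()
phrase = "as a matter of fact".split()

rate_translated = per_thousand(translated, phrase)
rate_control = per_thousand(control, phrase)

print(f"translation corpus: {rate_translated:.1f} per 1,000 tokens")
print(f"control corpus:     {rate_control:.1f} per 1,000 tokens")
```

Normalizing by corpus size matters because the two corpora are rarely the same length; a study on real data would also need a significance test before reading anything into the difference.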
15Some of the online corpora
- British National Corpus
- http://www.natcorp.ox.ac.uk/
- http://view.byu.edu
- Bank of English
- http://www.collins.co.uk/Corpus/CorpusSearch.aspx
- CORIS
- http://corpus.cilta.unibo.it:8080/DEMOCORISCorpQuery.html
- FidaPLUS
- www.fidaplus.net
- Good link
- http://devoted.to/corpora
16Tools for translating
- Sentence alignment
- TRADOS WinAlign
- ATRIL DejaVu
- Vanilla Aligner (unix/linux)
- Concordancers
- WordSmith Tools (www.lexically.net)
- Sketch Engine (http://www.sketchengine.co.uk)
- MonoConc/ParaConc (www.athel.com)
- aConCorde, good for Arabic (http://www.comp.leeds.ac.uk/andyr/software/aConCorde/)
- CQP (ims.uni-stuttgart.de)
- Manatee / Bonito (www.textforge.cz)
17Corpus linguistics in Turkey
- Kemal Oflazer: http://www.andrew.cmu.edu/user/ko/
- Informatics Institute corpus: http://www.ii.metu.edu.tr/corpus/