Corpus linguistics for translators

1
Corpus linguistics for translators
  • Amanda Saksida
  • University of Nova Gorica

2
... He cast a sideways look at Harry under his
bushy eyebrows. 'Be grateful if yeh didn't
mention that ter anyone at Hogwarts,' he said.
'I'm - er - not supposed ter do magic, strictly
speakin'.'
3
Hedwig Harry Hogwarts Hagrid Quidditch ...
4
wart hog Phacochoerus aethiopicus
7
Course outline
  • Introduction: what corpora are, history, typology,
    online corpora
  • Areas where corpora are used
  • Corpus-based translation studies: interesting
    examples
  • Tools for building and using corpora

8
What is a corpus?
  • A corpus is a collection of pieces of language
    that are selected and ordered according to
    explicit linguistic criteria in order to be used
    as a sample of the language.
  • Computer corpus: a corpus which is encoded in a
    standardised and homogeneous way for open-ended
    retrieval tasks. Its constituent pieces of
    language are documented as to their origins and
    provenance.
  • (Guidelines of the Expert Advisory Group on
    Language Engineering Standards, 1996)
  • Big collections of modern texts
  • Electronic form
  • Representative of a language/dialect
  • Basis for descriptive studies (not prescriptive!)

9
Brief history of corpus linguistics
  • 1964: Brown Corpus (1 M words)
  • John Sinclair and the COBUILD revolution > Bank
    of English (470 M)
  • British National Corpus (100 M) > other
    languages (Czech, Hungarian, Croatian, Slovak, ...)
  • Web as corpus: with the digital revolution, more
    and more texts are available on the net >
    programs that build corpora from online texts
    (WebBootCat,
    http://www.sketchengine.co.uk/auth/wbc/mycorp.cgi)

10
Types of corpora
  • Kinds of corpora
  • Medium: written texts / spoken language
  • Size: reference corpora / specialized corpora
  • Time span: synchronic / diachronic corpora
  • Tagging: lemmatized / POS-tagged corpus
  • Language: mono- or multilingual corpora
  • parallel
  • comparable
  • translational

11
Corpus usage
  • Lexicography
  • Descriptive Grammars
  • Translation tools and studies
  • Foreign language learning
  • Sociolinguistic studies
  • Language technologies

12
Keywords
  • Concordance
  • KWIC (Keyword in Context)
  • Type / Token
  • Tag / Lemma
  • Collocation
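
The first three keywords can be illustrated with a few lines of code. The following is only a minimal sketch: the tokenizer is deliberately crude, the function names are made up for this example, and the second sentence of the sample text is invented. It builds KWIC concordance lines for one keyword and counts types and tokens.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (a crude rule)."""
    return re.findall(r"[a-z']+", text.lower())

def kwic(tokens, keyword, width=3):
    """Return one (left context, keyword, right context) triple per hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

def type_token_ratio(tokens):
    """Number of distinct word forms (types) divided by running words (tokens)."""
    return len(set(tokens)) / len(tokens)

# Sample text: the second sentence is invented for illustration.
sample = ("He cast a sideways look at Harry under his bushy eyebrows. "
          "Harry looked back at him.")
tokens = tokenize(sample)

# Print the concordance with the keyword aligned in one column.
for left, kw, right in kwic(tokens, "harry"):
    print(f"{left:>25} [{kw}] {right}")

print("tokens:", len(tokens), "types:", len(set(tokens)))
print("type/token ratio:", type_token_ratio(tokens))
```

Real concordancers work on much larger, usually tagged corpora and offer sorting and filtering of the context; the sketch only shows the basic idea of aligning every hit with its neighbouring words.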

13
What can a corpus tell us?
  • Word frequency
  • How frequent is a word / word form (compared to
    other words)?
  • Lexical information
  • Which words frequently co-occur?
  • Which affixes can a word have?
  • Syntactic information
  • In which syntactic structures can a word occur?
  • Semantic information
  • What are the possible meanings of a word?
  • Pragmatic information
  • In which texts can we find a word? What stylistic
    information does a word or its context bear?
    Is the usage of a word stable, or is its
    frequency increasing or decreasing?
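
The first two questions (word frequency and co-occurrence) can be approximated with simple counting. The sketch below is an illustration under stated assumptions: the toy text is invented, the function names are hypothetical, and raw co-occurrence counts stand in for the statistical association measures that real corpus tools typically use.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep alphabetic word tokens only."""
    return re.findall(r"[a-z]+", text.lower())

def relative_frequency(tokens, word):
    """Occurrences of `word` per 1,000 tokens."""
    return 1000 * tokens.count(word) / len(tokens)

def cooccurrences(tokens, node, window=1):
    """Count the words found within `window` tokens on either side of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts

# Invented toy text, far too small for real conclusions.
text = ("strong tea and strong coffee are common; "
        "powerful computers are common too, but strong tea is not powerful tea")
tokens = tokenize(text)

frequencies = Counter(tokens)
print("absolute frequency of 'strong':", frequencies["strong"])
print("per 1,000 tokens:", round(relative_frequency(tokens, "strong"), 1))
print("neighbours of 'tea':", cooccurrences(tokens, "tea").most_common())
```

In this toy text "strong" appears next to "tea" more often than "powerful" does, which is the kind of pattern a collocation query is meant to surface on a large corpus.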

14
What can a corpus tell us?
  • Translation studies
  • Parallel corpus studies can reveal
    characteristics of translated texts, such as
    tendencies towards explicitness and avoidance of
    repetition.
  • Comparison between the translation part of the
    corpus and a corpus of texts of the same genre,
    written in the target language for the
    translation corpus, reveals a tendency towards
    what we might call the Eliza Doolittle
    phenomenon: the translated texts, more than the
    texts in the control corpus, tend to contain
    those TL phrases, structures, and so on, which,
    from a comparative point of view, seem
    particularly characteristic of the
    TL. (Malmkjaer 1996)

15
Some of the online corpora
  • British National Corpus
  • http://www.natcorp.ox.ac.uk/
  • http://view.byu.edu
  • Bank of English
  • http://www.collins.co.uk/Corpus/CorpusSearch.aspx
  • CORIS
  • http://corpus.cilta.unibo.it:8080/DEMOCORISCorpQuery.html
  • FidaPLUS
  • www.fidaplus.net
  • Good link
  • http://devoted.to/corpora

16
Tools for translating
  • Sentence alignment
  • TRADOS WinAlign
  • ATRIL DejaVu
  • Vanilla Aligner (unix/linux)
  • Concordancers
  • WordSmith Tools (www.lexically.net)
  • Sketch Engine (http://www.sketchengine.co.uk)
  • MonoConc/ParaConc (www.athel.com)
  • aConCorde, good for Arabic
    (http://www.comp.leeds.ac.uk/andyr/software/aConCorde/)
  • CQP (ims.uni-stuttgart.de)
  • Manatee / Bonito (www.textforge.cz)

17
Corpus linguistics in Turkey
  • Kemal Oflazer: http://www.andrew.cmu.edu/user/ko/
  • Informatics Institute corpus:
    http://www.ii.metu.edu.tr/corpus/