Title: IT AND TRANSLATION
1IT AND TRANSLATION
2Rationale for IT Applications to Translation
A computer is a device that can be used to
magnify human productivity. Properly used, it
does not dehumanize by imposing its own Orwellian
stamp on the products of human spirit...
Translation is a fine and exacting art, but
there is much about it that is mechanical and
routine, if this were given over to a machine,
the productivity of the translator would not only
be magnified but this work would become more
rewarding, more exciting, more human.
Martin Kay (1987)
3COURSE OVERVIEW
- ESSENTIALS
- TEXT PROCESSING
- MT
- TM
- WORKING WITH CORPORA
- TERMINOLOGY EXTRACTION AND GLOSSARY PRODUCTION
(MONOLINGUAL AND BILINGUAL CORPORA)
4COURSE OVERVIEW - DETAILS
- 1) ESSENTIALS
- Types of computer aids
- CAT vs. MT
- History of CAT tools
- General principles of working with CAT tools
- Reference materials
- Localization and internationalization
- UNIX
SOME OF THIS TODAY!
5COURSE OVERVIEW - DETAILS
- 2) TEXT PROCESSING
- Word and WordPad (tips and tricks)
- Fonts, code pages, keyboard layout, language tools in Windows XP and Office
- Speech recognition software
- Scanning
- OCR
- File types (essential info on the most common
file types and file conversion utilities)
6COURSE OVERVIEW - DETAILS
- 3) MT
- How it works, brief exhibition
- Systran Pro
- PROMT
- NeuroTran
- Babel Fish
DESKTOP BASED
SUPPORTS CROATIAN (partially Serbian)
WEB BASED
7COURSE OVERVIEW - DETAILS
- 4) TM
- Overview (what it is, standards and file formats)
- Desktop vs. server based TM programs
- WinAlign
- WordFast
- Trados (nowadays SDL Trados) Freelance edition
- Sisulizer
8COURSE OVERVIEW - DETAILS
- 5) WORKING WITH CORPORA
- Essentials
- Concordancing (WordSmith, Concordancer, AntConc)
- Advanced corpus analysis (WordSmith, TIGERSearch)
- Lemmatization and annotation
- Parallel corpora (ParaConc)
9COURSE OVERVIEW - DETAILS
- 6) TERMINOLOGY EXTRACTION AND GLOSSARY PRODUCTION
- Essentials
- Doing it automatically: Trados (i.e. SDL) MultiTerm (Desktop and Extract)
- Doing it semi-automatically: ParaConc, Concordancer
10COURSE REQUIREMENTS
- Basic computer literacy
- Positive outlook
- Computers don't bite
- CAT tools are not complex; they are actually made to make you more efficient
- Interest in translation
- Willingness to become several times more efficient in doing translations
11SCHEDULE
- HONESTLY, WE DON'T KNOW FOR CERTAIN!
- THAT'S WHY WE NEED YOUR EMAIL ADDRESSES, SO THAT WE CAN KEEP YOU UPDATED WITH THE LATEST SCHEDULE DEVELOPMENTS
- PROBABLY: LOCATION 25 (lectures) and 38 (computer lab), SATURDAYS, at 16:00
12LITERATURE
- Geoffrey Samuelsson-Brown, A Practical Guide for Translators (Topics in Translation), Multilingual Matters, 4th edition (May 28, 2004)
- H. L. Somers (Editor), Computers and Translation: A Translator's Guide (Benjamins Translation Library, 35), John Benjamins Publishing Co, 1st edition (May 2003)
- Bert Esselink, A Practical Guide to Localization (Language International World Directory), John Benjamins Publishing Co, Revised 1st edition (September 2000)
- Silvia Pavel and Diane Nolet, Handbook of Terminology, Translation Bureau of Canada, 1st edition (2001)
- Frank Austermühl, Electronic Tools for Translators (Translation Practices Explained), St. Jerome, 1st edition (April 2001)
13COURSE OVERVIEW - GRADING
- This is a hands-on course
- You will be graded on the basis of the results of your practical assignments
- Creating TMs from parallel texts (fiction and non-fiction, e.g. a book and a manual); in a way, you will also be creating a parallel corpus
- Translating two short passages (fiction and non-fiction) using your newly created TMs
14IT AND TRANSLATION
- ESSENTIALS AND MORE ABOUT THE COURSE
15TYPES OF COMPUTER AIDS
- Computer aids / tools that are relevant to translators can be roughly classified into three groups:
- Basic input and editing tools: word processors
- Reference tools: electronic books (desktop, web); electronic dictionaries; web (Eurodicautom, OneLook, etc.); software-based reference materials (encyclopedias, e-Bible, etc.)
- Productivity tools: TM tools; MT tools; speech technology (i.e. voice recognition)
16CAT vs MT
- As soon as you start using computer software in the process of translating, you are entering the realm of COMPUTER-AIDED TRANSLATION, or CAT for short.
- In other words, CAT is a form of translation wherein a human translator translates texts using computer software designed to support and facilitate the translation process.
17CAT vs MT (continued)
- The problem is that COMPUTER-AIDED TRANSLATION is sometimes also called COMPUTER-ASSISTED TRANSLATION, MACHINE-AIDED TRANSLATION or MACHINE-ASSISTED TRANSLATION.
- Due to the latter two terms, CAT is sometimes confused with MACHINE TRANSLATION, or MT for short.
18CAT vs MT (continued)
- Although these two concepts are related and similar in some aspects, CAT and MT denote two diametrically different processes:
- In CAT, the computer program merely supports the translator, so the translator translates the text himself/herself, making all the essential decisions involved.
- In MT, the translator supports the machine; that is to say, the computer (i.e. program) translates the text, which is then edited by the translator, or, in most cases, not edited at all.
19CAT vs MT (continued)
- Graphically represented, the difference is:
[Figure: Translation Technology Continuum, running from maximum human involvement to maximum automation]
- Unaided Translation: translation process not aided by any electronic tools
- Computer-aided Translation (CAT): translation process aided by electronic tools such as (most typically) Translation Memory
- Automatic Translation / Machine Translation: translation process automated by use of Machine Translation
Adapted from Hutchins & Somers (1992)
20CAT - its scope
WRONG!!!
- CAT is traditionally associated with large-scale / corporate translations
- manuals and technical documentation
- software localization
- Typewriter-assisted (i.e. traditional) translation is usually associated with small-scale / individual translations (done by freelancers)
- fiction books, scientific papers, etc.
21CAT - its scope (continued)
- This notion of CAT being restricted to corporate translation projects dates back to the 90s and is based exclusively on financial criteria
- during the early and mid 90s a combination of a high-end computer and a high-end CAT tool cost as much as a new car
- from their very beginnings CAT tools were designed to be capable of handling both big- and small-scale projects, but initially no freelance translator could afford them
22CAT - its scope (continued)
- Even for a freelance translator, the CAT route is nowadays the only possibility if one wants to provide high-quality, 100% terminologically consistent and efficiently produced translations.
- A testimony to that is the industry-standard TM program Trados: Trados Freelance edition has been the company's best-selling TM program for a number of years.
23CAT tools - a bit about their history
- CAT tools were developed after (very) disappointing initial experiments with MT tools.
- So, in order to give you a proper overview of how we got where we are now, we have to start with the history of MT tools
24MT History - how we switched to CAT
- MT research began in the 1950s: Warren Weaver's 1949 memo
- "When I look at an article in Russian, I say: 'This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.'"
- (in Locke and Booth 1955: 18)
25MT History - how we switched to CAT
- Initially based on some misconceptions about human translation:
- knowledge of two language systems suffices
- it is merely a matter of looking up dictionaries
- it is easy to define a good translation
- there is only one correct translation possible
26MT History - how we switched to CAT
MT history milestones: pre-ALPAC
- 1954: Georgetown system demo
- successful translation of 49 Russian sentences into English
- 1955-1966: $50m spent in 20 research centres in USA
- 1966: Automatic Language Processing Advisory Committee (ALPAC) Report concludes:
- "...MT is slower, less accurate and twice as expensive as Human Translation..."
- "...there is no prospect of useful MT either immediately or in the future..."
27MT History - how we switched to CAT
MT history milestones: post-ALPAC
- 1969: privately funded projects
- Logos system (1969), Weidner-CAT (1977), ALPS (1980)
- 1975: Météo project in Canada
- 1976: European Commission acquires Systran
- 1979: Eurotra project in Europe for multilingual system
- 1980: PC-based system
- 1990: data-driven system, WebMT
28MT History - how we switched to CAT
- 1975: Météo project in Canada
- Automatic translation of weather forecasts (En -> Fr)
- Sublanguage approach (domain-specific MT)
- Most successful MT application to date
- public broadcasting since 1977
- Fr -> En available since 1989
- only 4% of output needs post-editing
- rapid translation; staff turnover no longer a problem
29MT History - how we switched to CAT
- Renewed interest in MT in late 80s and early 90s
- Technological factors
- specifically, prevalence of PCs with improved processing power
- Translation market factors
- official bilingualism/multilingualism creates institutional needs
- globalisation creates huge commercial needs
- Advances in computational linguistics
- More realistic user expectations
- Internet creates casual access to multilingual information
30MT History - how we switched to CAT
- However, translations produced by MT were still not reliable and accurate enough for large-scale commercial applications.
- So, it became evident that the human translator cannot be eliminated and replaced by computers.
- Actually, it became obvious that computer programs should be used as TOOLS which only HELP the translator.
31History of CAT Tools
- Unreliability of MT tools -> large corporations hire translation agencies
- Translation agencies find it difficult to cope with the increasing demand
- Translation agencies develop their own in-house CAT tools
- Translation agencies begin to sell their CAT tools
32History of CAT Tools
- Two major players in the domain of CAT tool development, Trados and STAR Group, both started as
- TRANSLATION AGENCIES!!!
STAR AG was founded as a small translation agency in 1984 by Josef Zibung and Hanspeter Siegrist in the northern Swiss city of Stein am Rhein near Schaffhausen. It won and kept customers from the automotive, machine tool, computer and aeronautics industries, such as ABB, AT&T, BMW, Dornier, IBM, Mazda, Mercedes, Nissan, Saab and Siemens.
TRADOS was founded in 1984 by Jochen Hummel and Iko Knyphausen in Stuttgart, Germany to provide translation services for IBM.
33TRADOS timeline
- 1990 - first version of TRADOS's main component, MultiTerm, was created for DOS
- 1992 - TRADOS developed the first MultiTerm for Windows (v3.1)
- 1992 - TRADOS's Translator's Workbench with linguistic fuzzy-matching on translation memories for DOS
- 1994 - TRADOS's Translator's Workbench for Windows
34TRADOS timeline (continued)
- 1997 - BREAKTHROUGH: Microsoft decides to base its internal localization memory store on TRADOS
- 1998 - Microsoft acquires a share of 20% in TRADOS
TRADOS becomes a de facto industry standard CAT tool!!!
That's why we will mostly work with TRADOS in this course (as far as TM is concerned).
But we will also work with WordFast, because not everyone can afford Trados.
35WHAT DO WE WANT TO TEACH YOU HERE?
- TWO PRACTICAL EXAMPLES OF COMMON TRANSLATION
PROBLEMS
37IMPORTANT THINGS TO NOTE
- (quite obvious) the book has an index: YOU (i.e. the translator) are supposed to make it in the translated version of the book
- a vast index: a lot of terminology
- some index terms appear on several pages that are not necessarily in the same chapter (e.g. pg. 36, pg. 92 and pg. 255): a very serious problem for the consistency of your translation
43General principles of working with CAT tools
- The main goals are EFFICIENCY and CONSISTENCY
- CAT tools = TM tools (in this case only)
- The basic idea is fairly simple:
- Documents, especially technical ones, contain a large amount of content that is similar or identical to information already contained in earlier versions or similar documents that have been translated before.
- That applies to the source (editing) language (SL) as well as the target (translation) languages (TL).
44General principles of working with CAT tools
- So, wouldn't it be great to re-use previously translated content as valuable reference material for new translations, so as to obtain consistency of terminology and phrasing?
- That is exactly what CAT tools do!
- CAT tools make it possible for translators to work only on content that is being created for the first time. Existing text and text similar to existing text is taken from the available reference translations (i.e. from the TM, the translation memory).
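The re-use idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy memory, the `lookup` function and the 0.75 threshold are hypothetical, and the character-based similarity from Python's standard `difflib` module stands in for the (proprietary) fuzzy matching of real TM tools such as Trados.

```python
from difflib import SequenceMatcher

# Hypothetical toy translation memory: source segment -> stored translation.
TM = {
    "Open the file.": "Öffnen Sie die Datei.",
    "Save the file.": "Speichern Sie die Datei.",
    "Close the window.": "Schließen Sie das Fenster.",
}

def lookup(segment, memory, threshold=0.75):
    """Return (score, stored_source, stored_target) for the best match,
    or None when nothing in the memory is similar enough."""
    best = None
    for source, target in memory.items():
        # ratio() is 1.0 for identical strings and lower for partial overlap.
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if best is None or score > best[0]:
            best = (score, source, target)
    if best is not None and best[0] >= threshold:
        return best
    return None

# Exact match: the stored translation can be re-used as it is.
print(lookup("Open the file.", TM))
# Fuzzy match: the translator only has to adapt the suggestion.
print(lookup("Open the files.", TM))
# No match: this segment has to be translated from scratch.
print(lookup("Restart the printer now.", TM))
```

The three calls correspond to the three situations a translator meets in a TM tool: an exact match, a fuzzy match that needs editing, and new content.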
46TRADOS - a screenshot
47A DREAM COME TRUE?
- TO ENJOY ALL THE BENEFITS OF CAT TOOLS, FIRST YOU HAVE TO CREATE A TM AND A TERMINOLOGY DATABASE
- either from your old translations
- or from new translations (i.e. creating a TM from scratch)
THAT IS WHERE OTHER CAT TOOLS (i.e. NON-TM CAT tools) STEP IN TO SAVE THE DAY!!!
48REUSING YOUR OLD TRANSLATIONS
- The best way to make a TM
- reliable source (YOU did the translation)
- readily available (stored on your PC)
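How an old translation becomes a TM can be sketched as follows. This is a deliberately naive illustration, not how alignment tools such as WinAlign work internally: the function names are hypothetical, the sentence splitter is crude, and segments are paired purely by position, which only works when source and target contain the same number of sentences in the same order.

```python
import re

def split_sentences(text):
    """Very naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def build_tm(source_text, target_text):
    """Pair source and target sentences by position.

    Raises ValueError when the sentence counts differ, because a purely
    positional alignment would then pair the wrong segments."""
    src = split_sentences(source_text)
    tgt = split_sentences(target_text)
    if len(src) != len(tgt):
        raise ValueError(
            f"cannot align: {len(src)} source vs {len(tgt)} target sentences"
        )
    return list(zip(src, tgt))

old_source = "Open the file. Save the file."
old_target = "Öffnen Sie die Datei. Speichern Sie die Datei."
for source_segment, target_segment in build_tm(old_source, old_target):
    print(source_segment, "=>", target_segment)
```

In practice a translator merges or splits sentences, so an alignment always needs manual checking before the pairs are trusted as TM units.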
49A BRIEF DIGRESSION
- The term LOCALIZATION has often popped up in previous slides
- What is LOCALIZATION?
50WHAT IS LOCALIZATION?
- Localization is the process of adapting, translating and customizing a product (software) for a specific market (for a specific locale or cultural conventions; the locale usually determines conventions such as sort order, keyboard layout, date, time, number and currency formats). In terms of software localization, this means the production of interfaces that are meaningful and comprehensible to local users.
- The Localization Industry Standards Association (LISA) defines localization as follows: "Localization involves taking a product and making it linguistically and culturally appropriate to the target locale (country/region and language) where it will be used and sold."
- Typically, this involves the translation of the user interface (the messages a program presents to users to enable them to create documents and data, modify them, print them, send them by e-mail, etc.).
51LOCALIZATION - what it includes
- Focal points of internationalization and localization efforts include:
- Language
- Computer-encoded text
- Alphabets/scripts; different systems of numerals; left-to-right vs. right-to-left scripts. Most recent systems use Unicode to solve many of these character encoding problems.
- Graphical representations of text (printed materials, online images containing text)
- Spoken (audio)
- Subtitles for video
- Date/time format, including use of different calendars
- Formatting of numbers (decimal points, positioning of separators, character used as separator)
- Time zones (UTC in internationalized environments)
- Currency
- Images and colors: issues of comprehensibility and cultural appropriateness
- Names and titles
- Government-assigned numbers (such as the Social Security number in the US, National Insurance number in the UK) and passports
- Telephone numbers, addresses and international postal codes
- Weights and measures
- Paper sizes
- Differences between local standards (e.g. YU ISO or JUS) and international standards (ISO)
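Two of the items above, number formatting and date format, can be illustrated with a short Python sketch. The `CONVENTIONS` table and both functions are hypothetical and hand-written for illustration; they cover only two locales, and real software would normally take such conventions from the operating system or a locale database rather than hard-code them.

```python
from datetime import date

# Hypothetical, hand-written convention table (illustration only).
CONVENTIONS = {
    "en-US": {"thousands": ",", "decimal": ".", "date": "{m:02d}/{d:02d}/{y}"},
    "de-DE": {"thousands": ".", "decimal": ",", "date": "{d:02d}.{m:02d}.{y}"},
}

def format_number(value, locale_tag):
    """Format a number with two decimals using the locale's separators."""
    conv = CONVENTIONS[locale_tag]
    text = f"{value:,.2f}"  # Python's default style, e.g. 1,234,567.89
    # Swap both separators in one pass so they cannot overwrite each other.
    table = str.maketrans({",": conv["thousands"], ".": conv["decimal"]})
    return text.translate(table)

def format_date(day, locale_tag):
    """Format a date in the day/month order the locale expects."""
    conv = CONVENTIONS[locale_tag]
    return conv["date"].format(d=day.day, m=day.month, y=day.year)

for tag in ("en-US", "de-DE"):
    print(tag, format_number(1234567.891, tag), format_date(date(2024, 3, 31), tag))
```

The same value and the same date come out differently for each locale, which is exactly why these formats cannot be left as plain translated text.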
52LOCALIZATION vs. INTERNATIONALIZATION
- The distinction between internationalization and localization is subtle but important:
- Internationalization is the adaptation of products for potential use virtually everywhere, while
- localization is the addition of special features for use in a specific locale.
- The processes are complementary, and must be combined to lead to the objective of a system that works globally.
53CAT tools for localization
- Over the last couple of years, in addition to general-purpose TM tools such as Trados and Transit, translation technology companies have also developed a number of TM tools specially designed for localization:
- Alchemy CATALYST
- PASSOLO
- Sisulizer
SISULIZER is currently the industry standard localization tool, so that's the one in which we will work!!!
54SISULIZER - a screenshot
55Other CAT tools (non-TM based)
- As we said earlier, computer-assisted translation (CAT) is a broad and somewhat imprecise term covering a range of tools, from the fairly simple to the more complicated, which can include:
- Word processors, grammar and spell checkers, terminology managers, eBooks, eDictionaries, full-text search tools, concordancers, web, TM tools, bitexts, etc.
56CAT - REFERENCE MATERIALS
- Reference materials are the primary source of terminology in the absence of a translation memory.
- Computer-based reference materials can be classified into:
- Online libraries
- Specialized web resources
- Specialized software products
- Other materials in electronic formats
57Online Libraries
- Large collections of books in electronic form, e.g.
- eBrary (new scientific books, pay site)
- Internet Archive (hosting the Million Book Project)
- Project Gutenberg (public-domain fiction books, free)
- Questia (popular titles, fiction and non-fiction; pay site, some sections free)
58Internet Archive
59eBrary
60Questia
61Questia
62Specialized web resources
- Online glossaries
- e.g. http://www.lai.com/glossaries.html
- Online terminology databases
- e.g. EURODICAUTOM
- Acronym dictionaries
- e.g. www.acronymfinder.com
- Online dictionaries
- e.g. www.thefreedictionary.com
- Online corpora (e.g. BNC and COCA)
63Online glossary - Language Automation glossary index
64Online terminology databases - EURODICAUTOM
65Online terminology databases - EURODICAUTOM
66Acronym dictionary - www.acronymfinder.com
67Online dictionary - www.thefreedictionary.com
68BNC - British National Corpus
69BNC - British National Corpus
71COCA - Corpus of Contemporary American English
73Specialized software products
- Various programs that can be used for terminology extraction
- Electronic dictionaries
- General monolingual, e.g. OED v3
- Specialized monolingual, e.g. Cambridge Pronouncing Dictionary, Collins Collocations
- Bilingual, e.g. Morton Benson, MidiDict
- Electronic Bible (e.g. e-Sword)
- Concordance programs (e.g. Concordancer)
- Data-mining programs (e.g. Summarizer Pro)
74Electronic dictionaries - OED
75Electronic Bible - e-Sword
76Concordancers
- Make it possible to see a word in context
- Useful for finding collocations and phrases
- Useful for extracting terminology
- Two types
- Monolingual concordancers (e.g. WordSmith)
- Polylingual concordancers (e.g. ParaConc)
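What "seeing a word in context" means can be shown with a minimal keyword-in-context (KWIC) sketch in Python. The `kwic` function, its `width` parameter and the sample text are hypothetical; dedicated concordancers such as WordSmith or AntConc offer far more (sorting, collocation statistics, large corpora), so this only illustrates the basic monolingual case.

```python
import re

def kwic(text, keyword, width=30):
    """Return one line per occurrence of keyword, with up to `width`
    characters of context on each side (case-insensitive, whole words)."""
    text = " ".join(text.split())  # collapse line breaks and repeated spaces
    pattern = r"\b" + re.escape(keyword) + r"\b"
    lines = []
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines

sample = (
    "A translation memory stores translated segments. "
    "The memory grows with every project, and a large memory "
    "makes terminology more consistent."
)
for line in kwic(sample, "memory"):
    print(line)
```

Reading down the keyword column shows at a glance which words tend to appear next to "memory", which is how collocations and candidate terms are spotted.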
77Monolingual Concordancer
78Parallel Concordancer
79Intellexer Summarizer Pro
80Intellexer Summarizer Pro
81THE END