Title: Web-Based Machine Translation
1Web-Based Machine Translation
- Andy Way
- School of Computing
- Email: away_at_computing.dcu.ie
- URL: www.computing.dcu.ie/away
- Room: L245
- Phone: (700)5644
2Plan of Attack (1)
- What is MT?
- Why do we do it? How much is it used? How much more could it be used?
- Is it any good? What exactly is it good for? What is it not good for?
- What MT methods are there?
- Do on-line MT systems translate word-for-word?
How might we be able to tell?
3Plan of Attack (2)
- Do pairs of on-line MT systems work the same in both directions?
- How can we help these MT systems help us?
- The Future (?!)
- Further Reading/More Information
4What is MT?
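- MT FAHQMT (fully automatic high-quality MT)
- MAHT (machine-aided human translation: on-line dictionaries, termbanks, TM, etc.)
- CAT (computer-aided translation)
- HAMT (human-aided machine translation: resolving ambiguity, etc.)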
5Why do we do MT?
- To communicate in languages other than the ones we know
- (If we're a company) To increase/maintain market share
- To speed up the translation process
- etc.
6How much is it used?
- In 2000, MT specialist Scott Bennett said: "Altavista's BabelFish ... initiated in late 1997, is now used a million times per day."
- In 2001, Softissimo announced that the Internet translation request volume processed by its Reverso translation engine (www.reverso.net) has now reached several million translation requests (of Web pages, e-mail, short texts and results of search engine requests) per month on its mail translation portal and the portals of its Internet partners.
- V.d. Meer (2003): "Every day, portals like Altavista and Google process nearly 10 million requests for automatic translation."
7How much more could it be used?
- The volume of text required to be translated currently exceeds translators' capacity (demand outstrips supply). This imbalance will only get worse, cf. accession of new Member States in the EU.
- NB, also the Official Languages Act 2003
- ⇒ Solution: automation (the only solution).
8How much more could it be used?
- The translation and localisation industry has focussed on product documentation, which represents probably less than 20% of all text-based information repositories that need to be localised
- Five times the volume of text needs to be translated in practically no time.
- ⇒ Corporate decision makers will have to begin supporting multilingual communication initiatives and strategies.
9How much more could it be used?
- GIL market growing from 4.2 billion in 2001 to 8.9 billion in 2006, an annual growth rate of 16.3%. Localisation and translation services form by far the largest part of this market with 69.8% of the total, i.e. 2.9 billion in 2001 and 5.8 billion in 2006, an annual growth rate of 14.6%.
- Crosslingual applications are expected to grow from less than 1% of the total market in 2001 (42 million) to 193 million in 2006, 35% annual growth.
10Is MT any good? (1)
- Depends what you want to use it for and how you
use it!!
[Diagram labels: Cost, Input, MT, Output]
11Is MT any good? (2)
- No pre-editing → Lots of post-editing!
- Lots of pre-editing → No(t much) post-editing!
- GARBAGE IN, GARBAGE OUT!!!
12Is MT any good? (3)
- Sometimes no pre-editing is required
- for gisting
- for company-internal circulation
- etc etc
- What it's not good for is literary translation, i.e. it won't take translators' jobs: it will free them up for new (more interesting) tasks and create new niche markets
15MT Developers
- So MT is of use, and will become used much more than it is currently, so
- we need people out there who can improve current systems and develop new ones.
- ⇒ let's look at how people currently design MT systems
16MT Methods
- MT
  - Rule-Based MT: Transfer, Interlingua
  - Data-Driven MT: EBMT, SMT
17The Vauquois Pyramid for MT
- Apex: Interlingua
- Middle: Analysis (source side), Transfer (across), Generation (target side)
- Base: source → Direct → target
18Examples of MT methods Transfer
- English: SVO, Irish: VSO, Japanese: SOV. So translation between them is complicated by facts about word order.
- But at a deeper level, the languages are more similar ...
19Transfer (contd)
- e.g. John saw Mary → Chonaic Seán Máire
- English: S [HEAD: see, SUBJ: John, OBJ: Mary]
- Irish: S [GOV: feic, SUBJ: Seán, OBJ: Máire]
20Examples of MT methods Transfer
- e.g. John (SUBJ) likes Mary (OBJ) → Marie (SUBJ) plaît à Jean (IOBJ)
- Rule: like(A1,A2) → plaire(A2,A1).
- i.e. arguments are switched.
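The argument-switching rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table layout, the function name `transfer`, and the extra `see`/`voir` entry are hypothetical and do not come from any real MT system.

```python
# Hypothetical transfer lexicon: source predicate -> (target predicate, argument order).
# The order (1, 0) encodes like(A1, A2) -> plaire(A2, A1): the arguments swap places.
TRANSFER_RULES = {
    "like": ("plaire", (1, 0)),
    "see": ("voir", (0, 1)),  # hypothetical entry with no switching, for contrast
}

# Proper-name equivalents (John/Jean, Mary/Marie).
NAMES = {"John": "Jean", "Mary": "Marie"}


def transfer(predicate, args):
    """Map a source predicate-argument structure onto a target one."""
    target_predicate, order = TRANSFER_RULES[predicate]
    translated = [NAMES.get(arg, arg) for arg in args]
    return target_predicate, tuple(translated[i] for i in order)


print(transfer("like", ("John", "Mary")))  # ('plaire', ('Marie', 'Jean'))
```

Keeping the argument order in the lexicon entry means that structural divergences like this one are handled per predicate rather than by a general word-order rule.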
21Examples of MT methods Interlingua
- John likes Mary → Marie plaît à Jean
- lex: like/plaire
- sem: Experiencer, sem: Patient
- lex: John/Jean, lex: Mary/Marie
22Examples of MT methods EBMT
- Data-driven, compiles probabilities for translations
- Needs bilingual aligned corpora
- find best match(es) of source
- establish translational equivalents
- recombine to generate target.
23EBMT - translation chunks
- Sentence aligned
- The man swims → L'homme nage.
- The woman laughs → La femme rit.
- Sub-sententially aligned
- the man → L'homme, swims → nage, the → l', man → homme, the → la, woman → femme, laughs → rit ...
24EBMT deriving translations
- Let's now translate "The man laughs"
- Best matches:
- the man → L'homme
- laughs → rit
- Combined together, we get "L'homme rit"
- Great, can you see any problems?! We can fix these by looking on the Web
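The match-and-recombine step can be sketched as a greedy longest-match lookup over the aligned chunks. This is a toy under stated assumptions: the chunk table is lower-cased, only one translation is kept for "the", and the name `translate_ebmt` is hypothetical. One plausible reading of the "problems" hinted at above is what happens at chunk boundaries, which the second call illustrates.

```python
# Sub-sentential alignments from the example above (lower-cased).
# Simplification: only one translation of "the" is kept.
CHUNKS = {
    ("the", "man"): "l'homme",
    ("swims",): "nage",
    ("laughs",): "rit",
    ("the",): "l'",
    ("man",): "homme",
    ("woman",): "femme",
}


def translate_ebmt(sentence: str) -> str:
    """Greedily cover the input with the longest known chunks and recombine."""
    words = sentence.lower().split()
    output = []
    i = 0
    while i < len(words):
        # Try the longest chunk starting at position i first.
        for j in range(len(words), i, -1):
            chunk = tuple(words[i:j])
            if chunk in CHUNKS:
                output.append(CHUNKS[chunk])
                i = j
                break
        else:
            # No chunk matched: copy the word through unchanged.
            output.append(words[i])
            i += 1
    return " ".join(output)


print(translate_ebmt("The man laughs"))    # l'homme rit
print(translate_ebmt("The woman laughs"))  # l' femme rit
```

The second output is wrong because the chunk for "the" was learned next to a different noun; recombination alone cannot tell which determiner fits.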
25Web Validation of Translations
- Input string: "the personal computers"
- Chunks retrieved:
- personal computers → ordinateurs personnels
- the → le / la / l' / les
- Via Altavista, we get:
- Les ordinateurs personnels: 980 hits
- L' ordinateurs personnels: 0 hits
- La ordinateurs personnels: 0 hits
- Le ordinateurs personnels: 0 hits
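The selection step amounts to ranking the candidate strings by hit count and keeping the winner. The sketch below makes no live search-engine query: `hit_count` is a hypothetical stand-in that simply returns the hit counts listed above, and `best_candidate` is likewise an illustrative name.

```python
# Hit counts as listed above (from Altavista); no live query is made here.
HITS = {
    "les ordinateurs personnels": 980,
    "l' ordinateurs personnels": 0,
    "la ordinateurs personnels": 0,
    "le ordinateurs personnels": 0,
}


def hit_count(phrase: str) -> int:
    """Hypothetical stand-in for a search-engine phrase query."""
    return HITS.get(phrase.lower(), 0)


def best_candidate(determiners, noun_phrase):
    """Build one candidate per determiner and keep the most frequent one."""
    candidates = [f"{det} {noun_phrase}" for det in determiners]
    return max(candidates, key=hit_count)


print(best_candidate(["le", "la", "l'", "les"], "ordinateurs personnels"))
# les ordinateurs personnels
```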
26Examples of MT methods SMT
- Needs:
- bilingual aligned corpora
- statistical models of languages and translation.
- Works by assuming that French is like English in a noisy channel, i.e. in code!
- cf. Speech Processing models!
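The noisy-channel idea is usually written as a decoding problem. The formulation below is the standard textbook one rather than something stated in these slides; the symbols are assumptions: $f$ is the observed French sentence, $e$ a candidate English sentence, $P(e)$ the language model and $P(f \mid e)$ the translation model.

```latex
\hat{e} \;=\; \arg\max_{e} P(e \mid f)
        \;=\; \arg\max_{e} \frac{P(e)\,P(f \mid e)}{P(f)}
        \;=\; \arg\max_{e} P(e)\,P(f \mid e)
```

The denominator $P(f)$ can be dropped because it is the same for every candidate $e$.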
27Examples of MT methods Hybridity
- Rule-based Methods
- generate good translations (if it works!)
- encode rule-based phenomena
- sent(Num) --> nounphrase(Num), verbphrase(Num).
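The sent/nounphrase/verbphrase rule above says that a sentence is well formed only when the noun phrase and the verb phrase share the same number feature. A rough Python analogue, assuming a hypothetical toy lexicon (the phrases and the name `is_sentence` are invented for illustration):

```python
# Hypothetical toy lexicon: each phrase carries a number feature.
NOUN_PHRASES = {"the man": "sg", "the men": "pl"}
VERB_PHRASES = {"swims": "sg", "swim": "pl"}


def is_sentence(noun_phrase: str, verb_phrase: str) -> bool:
    """Accept the pair only if both phrases are known and agree in number."""
    np_number = NOUN_PHRASES.get(noun_phrase)
    vp_number = VERB_PHRASES.get(verb_phrase)
    return np_number is not None and np_number == vp_number


print(is_sentence("the man", "swims"))  # True
print(is_sentence("the men", "swims"))  # False
```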
28Examples of MT methods Hybridity
- Statistical Methods
- are robust
- can get a lot right automatically
- don't need specialised linguistic knowledge of source, target, and how they relate to one another.
- ⇒ So let's choose the best bits from each ...
29Do MT systems translate word-for-word?
    % Base case: the empty sentence translates to the empty sentence.
    translate([], []).
    % Translate the first word, then the rest of the sentence.
    translate([Head1|Tail1], [Head2|Tail2]) :-
        biling_lex(Head1, Head2),
        translate(Tail1, Tail2).
    biling_lex(john, jean).
    biling_lex(swims, nage).
    % etc.
- Well, the MT systems we're using are a black box (as opposed to a glass box), so we can't look at the rules to tell definitively
30Translating word-for-word
- How can we tell then?
- Compare the input and the output for a suite of test sentences and try to work out what's going on
31Translating word-for-word
- If on-line MT systems did translate word-for-word:
- they would pick the most likely translation of each word each time (i.e. no translational variation ever)
- we could build up the translation of the sentence compositionally.
- Let's see if this is what happens by looking at some real systems ...
32Translating word-for-word
- Let's translate "We have just finished reading this book" → French
- Word-for-word we get (from Babelfish):
- we = nous, have = ayez, just = juste, finished = fini,
- reading = lecture, this = ceci, book = livre
- Model 0 translation: "Nous ayez juste fini lecture ceci livre": hopeless!
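The "Model 0" translation can be reproduced with a plain dictionary lookup, one word at a time. This is a sketch of the word-for-word baseline only, not of how Babelfish works internally; the function name and the pass-through behaviour for unknown words are assumptions.

```python
# Single-word translations listed above (obtained from Babelfish).
BILING_LEX = {
    "we": "nous", "have": "ayez", "just": "juste", "finished": "fini",
    "reading": "lecture", "this": "ceci", "book": "livre",
}


def translate_word_for_word(sentence: str) -> str:
    """Replace each word independently, keeping the source word order."""
    words = sentence.lower().split()
    # Assumption: unknown words are copied through unchanged.
    return " ".join(BILING_LEX.get(word, word) for word in words)


result = translate_word_for_word("We have just finished reading this book")
print(result.capitalize())  # Nous ayez juste fini lecture ceci livre
```

Because every word is looked up in isolation, there is no way for the lookup to choose "avons" after "nous" or "ce" before "livre".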
33Translating word-for-word
- Let's give the MT system larger chunks:
- we have = nous avons, just finished reading = lecture finie just, this book = ce livre
- have just finished reading = ont juste fini la lecture
- have just this book = ont juste ce livre
35Translating word-for-word
- Typing in the whole sentence, we get:
- "nous avons juste fini de lire ce livre", not bad!
- Capitalizing the "we" and adding a full stop makes no difference to the translation here.
- Oracle translation: "nous venons de finir de lire ce livre", so you can see Babelfish hasn't done too badly here ...
36Translating word-for-word
- Let's try another sentence: "The thief was kicking the policeman"
- Word-for-word we get (from Reverso):
- the = le, thief = Voleur, was = Était, kicking = coup de pied, policeman = policier
- Model 0 translation: "le Voleur Était coup de pied le Policier", not very good!
37Translating word-for-word
- Building the translation up compositionally:
- the thief = Le voleur,
- was kicking = Donnait un coup de pied,
- the policeman = Le policier
- Final translation: "Le voleur donnait un coup de pied le policier", pretty good!
39EN→FR = FR→EN?!
- That is, do both components use the same rules and dictionaries?
- Are the translation components reversible?
- Are the structural and lexical rules bidirectional?
- Only one way to find out: let's see!
40EN→FR = FR→EN?!
- For our 2 strings, we get:
- Babelfish: Nous venons de finir de lire ce livre
- Reverso: Nous venons de finir de lire ce livre
- Reverso: Le voleur donnait un coup de pied au policier
- Babelfish: Le voleur donnait un coup de pied le policier
41EN→FR = FR→EN?!
- Let's see the pairwise translations. Babelfish:
- We have just finished reading this book → Nous avons juste fini de lire ce livre
- Nous venons de finir de lire ce livre → We have just finished reading this book
- Aha!
43EN→FR = FR→EN?!
- Babelfish, 2nd sentence pair:
- The thief was kicking the policeman → Le voleur donnait un coup de pied le policier
- Le voleur donnait un coup de pied au policier → The robber gave a kick to the police officer
- Aha!
46EN→FR = FR→EN?!
- Reverso, 1st sentence pair:
- We have just finished reading this book → Nous venons de finir de lire ce livre
- Nous venons de finir de lire ce livre → We have just stopped reading this book
- Aha!
49EN→FR = FR→EN?!
- Reverso, 2nd sentence pair:
- The thief was kicking the policeman → Le voleur donnait un coup de pied au policier
- Le voleur donnait un coup de pied au policier → The thief kicked the policeman
- Aha!
52How can we help MT Systems help us?
- These on-line MT systems are general-purpose systems. Generally, the problems are so great that we will never achieve FAHQMT for such language
- But we have more chance of success if we restrict the sorts of texts with which we confront our MT systems ...
53How to restrict MT Input?
- By constraining subject domain: construct sublanguage MT systems, e.g. Météo
- By constraining the language used, i.e. by using controlled languages
54How can we help MT Systems help us?
- Update dictionaries/glossaries/rules to the domain/text type we need to translate!
- Savings
- Time
- Customisation
55The Future?
- More of us will use MT, for more things
- It'll become (almost as) widely used as web browsers
- Speech-to-Speech Translation
- MT for specific websites, documents etc ...
- ⇒ we need people like you to get interested in MT and improve/develop systems!!
56Further Reading/More Information
- In the first instance, go to
- http://www.computing.dcu.ie/away/MT/mt.html
- I'll add more specific pointers suitable for 1st year students soon.