Web-Based Machine Translation - PowerPoint PPT Presentation


Transcript and Presenter's Notes

Title: Web-Based Machine Translation


1
Web-Based Machine Translation
  • Andy Way
  • School of Computing
  • Email: away@computing.dcu.ie
  • URL: www.computing.dcu.ie/away
  • Room: L245
  • Phone: (700) 5644

2
Plan of Attack (1)
  • What is MT?
  • Why do we do it? How much is it used? How much
    more could it be used?
  • Is it any good? What exactly is it good for? What
    is it not good for?
  • What MT methods are there?
  • Do on-line MT systems translate word-for-word?
    How might we be able to tell?

3
Plan of Attack (2)
  • Do pairs of on-line MT systems work the same in
    both directions?
  • How can we help these MT systems help us?
  • The Future (?!)
  • Further Reading/More Information

4
What is MT?
  • MT ≠ FAHQMT (Fully Automatic High-Quality Machine Translation)
  • MAHT (Machine-Aided Human Translation: on-line dictionaries, termbanks, TM, etc.)
  • CAT (Computer-Aided Translation)
  • HAMT (Human-Aided Machine Translation: resolving ambiguity, etc.)

5
Why do we do MT?
  • To communicate in languages other than the ones
    we know
  • (If we're a company) To increase/maintain market
    share
  • To speed up the translation process
  • etc etc ...

6
How much is it used?
  • In 2000, MT specialist Scott Bennett said
    "Altavista's BabelFish ... initiated in late
    1997, is now used a million times per day."
  • In 2001, Softissimo announced that "the Internet
    translation request volume processed by its
    Reverso translation engine (www.reverso.net) has
    now reached several million translation requests
    (of Web pages, e-mail, short texts and results of
    search engine requests) per month on its main
    translation portal and the portals of its
    Internet partners."
  • V.d. Meer (2003): "Every day, portals like
    Altavista and Google process nearly 10 million
    requests for automatic translation."

7
How much more could it be used?
  • The volume of text requiring translation already
    exceeds translators' capacity (demand outstrips
    supply). This imbalance will only get worse,
    cf. the accession of new Member States to the EU.
  • NB also the Official Languages Act 2003
  • ⇒ Solution: automation (the only solution).

8
How much more could it be used?
  • The translation and localisation industry has
    focussed on product documentation, which
    represents probably less than 20% of all
    text-based information repositories that need to
    be localised
  • Time: five times the volume of text needs to be
    translated in practically no time.
  • ⇒ Corporate decision makers will have to begin
    supporting multilingual communication initiatives
    and strategies.

9
How much more could it be used?
  • The GIL market is growing from $4.2 billion in 2001
    to $8.9 billion in 2006, an annual growth rate of
    16.3%. Localisation and translation services form
    by far the largest part of this market with 69.8%
    of the total, i.e. $2.9 billion in 2001 and $5.8
    billion in 2006, an annual growth rate of 14.6%.
  • W.r.t. crosslingual applications: these are
    expected to grow from less than 1% of the total
    market in 2001 ($42 million) to $193 million in
    2006, a 35% annual growth rate.

10
Is MT any good? (1)
  • Depends what you want to use it for and how you
    use it!!

  [Diagram: the cost of translation divided between the Input (pre-editing),
   the MT system itself, and the Output (post-editing)]
11
Is MT any good? (2)
  • No pre-editing → lots of post-editing!
  • Lots of pre-editing → no(t much) post-editing!
  • GARBAGE IN, GARBAGE OUT!!!

12
Is MT any good? (3)
  • Sometimes no pre-editing is required
  • for gisting
  • for company-internal circulation
  • etc etc
  • What it's not good for is literary translation,
    i.e. it won't take translators' jobs - it will
    free them up for new (more interesting) tasks and
    create new niche markets

13
(No Transcript)
14
(No Transcript)
15
MT Developers
  • So MT is of use, and will become used much more
    than it is currently, so
  • we need people out there who can improve
    current systems and develop new ones.
  • ⇒ let's look at how people currently design MT
    systems

16
MT Methods
  [Diagram: MT divides into Rule-Based MT (Transfer, Interlingua) and
   Data-Driven MT (EBMT, SMT)]

17
The Vauquois Pyramid for MT
  [Diagram: the Vauquois pyramid. Interlingua sits at the apex; Analysis
   climbs the source-language side and Generation descends the
   target-language side; Transfer crosses between them part-way up; Direct
   translation runs along the base from source to target.]

18
Examples of MT methods Transfer
  • English is SVO, Irish is VSO, Japanese is SOV, so
    translation between them is complicated by facts
    about word order.
  • But at a deeper level, the languages are more
    similar ...

19
Transfer (contd)
  • e.g. John saw Mary → Chonaic Seán Máire
  [Diagram: two parallel sentence trees whose HEAD/GOV, SUBJ and OBJ slots
   line up: see/feic, John/Seán, Mary/Máire]

20
Examples of MT methods Transfer
  • e.g. John likes Mary → Marie plaît à Jean
  •      (SUBJ)  (OBJ)     (SUBJ)         (IOBJ)
  • Rule: like(A1,A2) → plaire(A2,A1)
  • i.e. the arguments are switched (a small sketch of
    such a rule follows below).
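
A minimal sketch of such an argument-switching transfer rule (Python; the
predicate/argument representation and the small lexicon are invented for
illustration, not the notation of any particular system):

    # Hypothetical predicate-argument representation: (predicate, [arg1, arg2]).
    # A transfer rule maps a source predicate to a target predicate plus an
    # argument permutation; 'like' -> 'plaire' swaps its two arguments.
    TRANSFER_RULES = {
        "like": ("plaire", [1, 0]),   # like(A1, A2) -> plaire(A2, A1)
    }

    LEXICON = {"John": "Jean", "Mary": "Marie"}

    def transfer(pred, args):
        """Apply the transfer rule for pred, permuting and translating its arguments."""
        tgt_pred, order = TRANSFER_RULES[pred]
        tgt_args = [LEXICON.get(args[i], args[i]) for i in order]
        return tgt_pred, tgt_args

    print(transfer("like", ["John", "Mary"]))   # ('plaire', ['Marie', 'Jean'])

Generation would then realise plaire with Marie as subject and Jean as the
à-marked object, giving Marie plaît à Jean.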

21
Examples of MT methods Interlingua
  • John likes Mary → Marie plaît à Jean
  [Diagram: a single interlingual representation shared by both sentences:
   predicate lex = like/plaire, with an Experiencer role (lex = John/Jean)
   and a Patient role (lex = Mary/Marie)]

22
Examples of MT methods EBMT
  • Data-driven: compiles probabilities for
    translations. Needs:
  • bilingual aligned corpora
  • Translation then proceeds by:
  • finding the best match(es) for the source string
  • establishing their translational equivalents
  • recombining these to generate the target string.

23
EBMT - translation chunks
  • Sentence-aligned:
  • The man swims → L'homme nage.
  • The woman laughs → La femme rit.
  • Sub-sententially aligned:
  • the man → l'homme, swims → nage, the → l', man →
    homme, the → la, woman → femme, laughs → rit ...

24
EBMT deriving translations
  • Let's now translate The man laughs
  • Best matches:
  • the man → L'homme
  • laughs → rit
  • Combined together, we get L'homme rit
  • Great! Can you see any problems? We can fix
    these by looking on the Web (the match-and-recombine
    step is sketched below)
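
A rough illustration of the match-and-recombine step (Python; the chunk table
is the toy sub-sentential alignment from the previous slide, and the greedy
longest-match strategy is deliberately naive):

    # Toy sub-sentential alignments extracted from the aligned corpus.
    CHUNKS = {
        "the man": "l'homme",
        "the woman": "la femme",
        "swims": "nage",
        "laughs": "rit",
    }

    def ebmt_translate(sentence):
        """Cover the input with the longest matching chunks, left to right,
        then concatenate their translations (no reordering, no validation)."""
        words = sentence.lower().split()
        output, i = [], 0
        while i < len(words):
            for j in range(len(words), i, -1):   # longest span starting at i first
                span = " ".join(words[i:j])
                if span in CHUNKS:
                    output.append(CHUNKS[span])
                    i = j
                    break
            else:
                output.append(words[i])   # no match: pass the word through
                i += 1
        return " ".join(output)

    print(ebmt_translate("The man laughs"))   # l'homme rit

The problems the slide hints at (wrong determiner or agreement once chunks are
glued together) are what the Web validation on the next slide addresses.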

25
Web Validation of Translations
  • Input string: the personal computers
  • Chunks retrieved:
  • personal computers → ordinateurs personnels
  • the → le / la / l' / les
  • Via Altavista, we get (this hit-count ranking is
    sketched below):
  • Les ordinateurs personnels: 980 hits
  • L' ordinateurs personnels: 0 hits
  • La ordinateurs personnels: 0 hits
  • Le ordinateurs personnels: 0 hits
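
The determiner choice above amounts to ranking candidate strings by their web
hit counts. A minimal sketch (Python; hit_count is a stand-in for a real
search-engine query, and the stored figures are the ones reported on the slide):

    # Hit counts of the kind the Altavista queries above returned; a real
    # system would issue a web search here instead of a table lookup.
    HIT_COUNTS = {
        "les ordinateurs personnels": 980,
        "l' ordinateurs personnels": 0,
        "la ordinateurs personnels": 0,
        "le ordinateurs personnels": 0,
    }

    def hit_count(phrase):
        """Stand-in for a live web query."""
        return HIT_COUNTS.get(phrase.lower(), 0)

    def best_candidate(determiners, noun_phrase):
        """Pick the determiner whose combination with the noun phrase is most frequent."""
        return max((f"{det} {noun_phrase}" for det in determiners), key=hit_count)

    print(best_candidate(["le", "la", "l'", "les"], "ordinateurs personnels"))
    # -> les ordinateurs personnels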

26
Examples of MT methods SMT
  • Needs:
  • bilingual aligned corpora
  • statistical models of language and of translation.
  • Works by assuming that French is English that has
    passed through a noisy channel, i.e. English in
    code! (see the formula below)
  • cf. Speech Processing models!
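
In the standard noisy-channel formulation (not spelt out on the slide): given a
French sentence f, the decoder searches for the English sentence e that the
language model and the translation model jointly score highest,

    \hat{e} = \arg\max_e P(e \mid f) = \arg\max_e P(e)\, P(f \mid e)

where P(e) is a monolingual language model of English and P(f | e) is the
translation model estimated from the bilingual aligned corpus.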

27
Examples of MT methods Hybridity
  • Rule-based Methods
  • generate good translations (if it works!)
  • encode rule-based phenomena, e.g. the number
    agreement in this Prolog DCG rule:
  • sent(Num) --> nounphrase(Num),
  •   verbphrase(Num).

28
Examples of MT methods Hybridity
  • Statistical Methods
  • are robust
  • can get a lot right automatically
  • don't need specialised linguistic knowledge of
    the source, the target, and how they relate to
    one another.
  • So let's choose the best bits from each ...

29
Do MT systems translate word-for-word?
  • translate([Head1|Tail1], [Head2|Tail2]) :-
  •   biling_lex(Head1, Head2),
  •   translate(Tail1, Tail2).
  • translate([], []).
  • biling_lex(john, jean).
  • biling_lex(swims, nage).
  • etc. etc.
  • Well, the MT systems we're using are a black box
    (as opposed to a glass box), so we can't look at
    the rules to tell definitively

30
Translating word-for-word
  • How can we tell then?
  • Compare the input and the output for a suite of
    test sentences and try to work out what's going
    on

31
Translating word-for-word
  • If on-line MT systems did translate
    word-for-word, they would
  • pick the most likely translation of each word
    every time (i.e. no translational variation ever), and
  • we could build up the translation of the sentence
    compositionally.
  • Let's see if this is what happens by looking at
    some real systems ...

32
Translating word-for-word
  • Let's translate We have just finished reading
    this book → French
  • Word-for-word we get (from Babelfish):
  • we→nous, have→ayez, just→juste, finished→fini,
    reading→lecture, this→ceci, book→livre
  • Model 0 Translation: Nous ayez juste fini lecture
    ceci livre - hopeless! (see the sketch below)
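
This "Model 0" behaviour is easy to reproduce: look each word up in isolation
and concatenate the results. A minimal sketch (Python; the per-word entries are
the Babelfish translations listed above):

    # Per-word translations as returned for the isolated English words.
    WORD_LOOKUP = {
        "we": "nous", "have": "ayez", "just": "juste", "finished": "fini",
        "reading": "lecture", "this": "ceci", "book": "livre",
    }

    def model0(sentence):
        """Word-for-word translation: no context, no reordering, no agreement."""
        return " ".join(WORD_LOOKUP.get(w, w) for w in sentence.lower().split())

    print(model0("We have just finished reading this book"))
    # -> nous ayez juste fini lecture ceci livre   (the hopeless output above)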

33
Translating word-for-word
  • Let's give the MT system larger chunks:
  • we have→nous avons, just finished reading→
    lecture finie just, this book→ce livre
  • have just finished reading→ont juste fini la
    lecture
  • have just this book→ont juste ce livre

34
(No Transcript)
35
Translating word-for-word
  • Typing in the whole sentence, we get
  • nous avons juste fini de lire ce livre - not bad!
  • Capitalizing the "we" and adding a full stop makes
    no difference to the translation here.
  • Oracle translation: nous venons de finir de lire
    ce livre, so you can see Babelfish hasn't done
    too badly here ...

36
Translating word-for-word
  • Let's try another sentence: The thief was kicking
    the policeman
  • Word-for-word we get (from Reverso):
  • the→le, thief→Voleur, was→Était, kicking→coup de
    pied, policeman→policier
  • Model 0 Translation: le Voleur Était coup de pied
    le Policier - not very good!

37
Translating word-for-word
  • Building the translation up compositionally:
  • the thief→Le voleur,
  • was kicking→Donnait un coup de pied,
  • the policeman→Le policier
  • Final translation: Le voleur donnait un coup de
    pied le policier - pretty good!

38
(No Transcript)
39
EN→FR = FR→EN?!
  • That is, do both components use the same rules
    and dictionaries?
  • Are the translation components reversible?
  • Are the structural and lexical rules
    bidirectional?
  • Only one way to find out: let's see! (a round-trip
    test sketch follows below)
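
One way to probe this from outside the black box is a round-trip test: send a
sentence EN→FR through a system, feed the output back FR→EN, and compare. A
minimal sketch (Python; translate_en_fr and translate_fr_en are placeholders
for whatever on-line system is being tested, not real APIs):

    def round_trip(sentence, translate_en_fr, translate_fr_en):
        """Translate EN->FR, then FR->EN, and report whether the original survives."""
        french = translate_en_fr(sentence)
        back = translate_fr_en(french)
        return {
            "source": sentence,
            "french": french,
            "back_translation": back,
            "identical": back.strip().lower() == sentence.strip().lower(),
        }

    # Usage (the lambdas are placeholders; substitute real calls to Babelfish/Reverso):
    report = round_trip(
        "We have just finished reading this book",
        translate_en_fr=lambda s: "...",   # the system's EN->FR output goes here
        translate_fr_en=lambda s: "...",   # the system's FR->EN output goes here
    )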

40
EN→FR = FR→EN?!
  • For our 2 strings, we get:
  • Babelfish: Nous venons de finir de lire ce livre
  • Reverso: Nous venons de finir de lire ce livre
  • ------------------------------------------------
  • Reverso: Le voleur donnait un coup de pied au
    policier
  • Babelfish: Le voleur donnait un coup de pied le
    policier

41
EN→FR = FR→EN?!
  • Let's see the pairwise translations. Babelfish:
  • We have just finished reading this book → Nous
    avons juste fini de lire ce livre
  • Nous venons de finir de lire ce livre →
  • We have just finished reading this book
  • Aha!

42
(No Transcript)
43
EN→FR = FR→EN?!
  • Babelfish, 2nd sentence pair:
  • The thief was kicking the policeman → Le voleur
    donnait un coup de pied le policier
  • Le voleur donnait un coup de pied au policier →
    The robber gave a kick to the police officer
  • Aha!

44
(No Transcript)
45
(No Transcript)
46
EN→FR = FR→EN?!
  • Reverso, 1st sentence pair:
  • We have just finished reading this book → Nous
    venons de finir de lire ce livre
  • Nous venons de finir de lire ce livre →
  • We have just stopped reading this book
  • Aha!

47
(No Transcript)
48
(No Transcript)
49
EN→FR = FR→EN?!
  • Reverso, 2nd sentence pair:
  • The thief was kicking the policeman → Le voleur
    donnait un coup de pied au policier
  • Le voleur donnait un coup de pied au policier →
    The thief kicked the policeman
  • Aha!

50
(No Transcript)
51
(No Transcript)
52
How can we help MT Systems help us?
  • These on-line MT systems are general-purpose
    systems. Generally, the problems are so great
    that we will never achieve FAHQMT for such
    language.
  • But we have more chance of success if we
    restrict the sorts of texts with which we
    confront our MT systems ...

53
How to restrict MT Input?
  • By constraining the subject domain: construct
    sublanguage MT systems, e.g. Météo
  • By constraining the language used, i.e. by using
    controlled languages (a toy checker is sketched
    below)
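
A controlled language is usually enforced before translation by simple checks
on each sentence. A toy checker (Python; the length limit and the word list are
invented for illustration, real controlled languages such as Simplified
Technical English have far larger rule sets):

    MAX_WORDS = 20
    AMBIGUOUS_WORDS = {"may", "bank", "right"}   # words with several readings

    def check_controlled(sentence):
        """Return a list of controlled-language violations for one sentence."""
        words = sentence.lower().replace(",", "").replace(".", "").split()
        problems = []
        if len(words) > MAX_WORDS:
            problems.append(f"sentence longer than {MAX_WORDS} words")
        problems += [f"ambiguous word: {w}" for w in words if w in AMBIGUOUS_WORDS]
        return problems

    print(check_controlled("The unit may be mounted on the right side of the housing."))
    # -> ['ambiguous word: may', 'ambiguous word: right']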

54
How can we help MT Systems help us?
  • Update dictionaries/glossaries/rules to the
    domain/text type we need to translate!
  • Savings
  • Time
  • Customisation

55
The Future?
  • More of us will use MT, for more things
  • It'll become (almost as) widely used as web
    browsers
  • Speech-to-Speech Translation
  • MT for specific websites, documents, etc. ...
  • ⇒ we need people like you to get interested in MT
    and improve/develop systems!!

56
Further Reading/More Information
  • In the first instance, go to
  • http://www.computing.dcu.ie/away/MT/mt.html
  • I'll add more specific pointers suitable for 1st
    year students soon.

57
(No Transcript)