CSA405: Advanced Topics in NLP - PowerPoint PPT Presentation

Slides: 41
Provided by: michael307
Transcript and Presenter's Notes


1
CSA405 Advanced Topics in NLP
  • Machine Translation I
  • Introduction to MT

2
Outline
  • MT: Machine Translation
  • Why MT is important
  • What MT is and why MT is difficult
  • MT and the Human Translator

3
Why Machine Translation is Important
4
Implications of Multilinguality
5
Commercial Interest
  • US has invested in MT for intelligence purposes
  • MT is popular on the web - the most used of
    Google's special features
  • EU spends more than 1B per annum on translation

6
Academic Interest
  • Different NL technologies include
    • parsing
    • generation
    • morphology
    • pronoun resolution
    • understanding ...

7
Misconceptions about MT
  • MT is a waste of time because
    • you will never make a machine that can
      translate Shakespeare.
    • the quality of translation you can get from an
      MT system is very low.
  • MT threatens the jobs of translators.
  • MT systems are machines, and buying an MT system
    should be very much like buying a car.

8
Facts about MT
  • There are many situations where the ability to
    produce reliable, if less than perfect,
    translations at high speed is valuable.
  • MT systems can take over some of the boring,
    repetitive translation jobs and allow human
    translators to concentrate on more interesting
    specialist tasks.
  • Building an MT system is an arduous and
    time-consuming job, involving the construction of
    grammars and very large monolingual and bilingual
    dictionaries.

9
The Place for MT
  • Human Translators are good at
    • Getting the right turn of phrase
    • Preserving translation equivalence
  • Human Translators are bad at
    • Dictionary look-up
    • Consistency of translation
    • Translation of terminology
  • MT can exploit these weaknesses

10
Summary
  • MT is important because
    • There are too few human translators
    • Availability of materials in appropriate
      language has significant economic consequences.
    • Scientifically, it is still one of the best
      test areas for language technology

11
Why Translation is Difficult
12
What Makes MT Hard
  • Style and Meaning
  • Word Order
  • Word Sense
  • Pronouns
  • Tense
  • Idioms

13
Style and Meaning
  • As recently as a decade ago it was widely
    believed that infectious disease was no longer
    much of a threat in the developed world. The
    remaining challenges to public health there, it
    was thought, stemmed from noninfectious
    conditions such as cancer, heart disease and
    degenerative diseases.
  • Il y a une dizaine d'années, on croyait que les
    pays industrialisés étaient débarrassés des
    risques liés aux maladies infectieuses et que la
    santé publique n'était menacée que par des
    maladies comme le cancer, les troubles
    cardiaques, et les anomalies génétiques.

14
Style and Meaning
  • English
    • Two sentences
    • infectious disease was no longer much of a
      threat in the developed world
    • The remaining challenges to public health there
    • noninfectious conditions
  • French
    • One sentence
    • les pays industrialisés étaient débarrassés des
      risques liés aux maladies infectieuses
    • la santé publique n'était menacée que
    • maladies

15
Different word orders
  • English word order is subject - verb - object
  • Japanese word order is subject - object - verb
  • English: IBM bought Lotus
  • Japanese: IBM Lotus bought
  • English: Reporters said IBM bought Lotus
  • Japanese: Reporters IBM Lotus bought said
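The reordering on this slide can be sketched in a few lines. The following is a minimal illustration, assuming each clause has already been parsed into a nested (subject, verb, object) tuple; the function name `svo_to_sov` and the tuple representation are assumptions made for this example, and a real system would also have to translate the words themselves.

```python
def svo_to_sov(clause):
    """Flatten a nested (subject, verb, object) clause into SOV token order."""
    subject, verb, obj = clause
    tokens = [subject]
    if isinstance(obj, tuple):
        # The object is itself a clause: reorder it recursively.
        tokens.extend(svo_to_sov(obj))
    else:
        tokens.append(obj)
    # In SOV order the verb comes last.
    tokens.append(verb)
    return tokens


simple = ("IBM", "bought", "Lotus")
embedded = ("Reporters", "said", ("IBM", "bought", "Lotus"))

print(" ".join(svo_to_sov(simple)))    # IBM Lotus bought
print(" ".join(svo_to_sov(embedded)))  # Reporters IBM Lotus bought said
```

The recursion is what moves "said" to the very end of the embedded example: the inner clause is reordered first, then the outer verb is appended.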

16
Word Sense Ambiguity
  • Bank as in river
  • Bank as in financial institution
  • Plant as in tree
  • Plant as in factory
  • Different senses usually translate into different
    words
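A toy sketch of why word sense matters for translation: the same English word maps to different French words depending on context. The cue-word sets and the function `translate_word` below are invented for illustration only; they are not a method described in the slides, and real systems use far richer context models.

```python
# Each ambiguous word lists (context cue words, French translation) pairs.
# The cue words are hypothetical and chosen only for this example.
SENSES = {
    "bank": [
        ({"river", "water", "fishing"}, "rive"),
        ({"money", "loan", "account"}, "banque"),
    ],
    "plant": [
        ({"tree", "garden", "leaves"}, "plante"),
        ({"factory", "workers", "production"}, "usine"),
    ],
}


def translate_word(word, context):
    """Pick the translation whose cue words overlap most with the context."""
    context_words = set(context.lower().split())
    best_translation, best_overlap = None, 0
    for cues, translation in SENSES.get(word, []):
        overlap = len(cues & context_words)
        if overlap > best_overlap:
            best_translation, best_overlap = translation, overlap
    # None means the context gave no evidence for any sense.
    return best_translation


print(translate_word("bank", "they went fishing on the river bank"))  # rive
print(translate_word("bank", "she opened an account at the bank"))    # banque
```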

17
Hutchins & Somers (1992)
18
Problems: Contextual Interpretation
OPEN
19
Different Cultural Models
  • English: Health Insurance; German:
    Krankenversicherung; French: Assurance Maladie
  • English: validate; French: oblitérer
20
Differences in Marking of Semantic Information
  • Head marking vs. dependent marking
    • In English the possessive relation is marked on
      the dependent: The man's house
    • In Hungarian it is marked on the head: The man
      house-his
    • his house / sa maison
  • Direction and manner of motion marking
    • He ran into the room (English)
    • He entered the room running (French, literal
      gloss)

21
Summary
  • Translation is about more than equivalence of
    meaning.
  • Translation may involve the resolution of
    ambiguity.
  • Preservation of intention involves cultural
    background as well as linguistic knowledge.
  • Translation is a hard problem for humans, let
    alone machines.

22
Similarities and Differences Between Languages
  • Differences
    • Morphology
    • Word order and syntactic structures
    • Marking of semantic distinctions
    • Lexical
  • Similarities
    • Communicative function for survival
    • Mechanisms for reference to people, eating,
      politeness, time.
    • Syntactic complexity
    • Nouns
    • Verbs

23
Machine Translation and Human Translators
24
In the Beginning ... was the dream of FAMT
  • Fully Automatic (High Quality) Machine
    Translation (Bar-Hillel 1960)

[Diagram: Source Language text → FAHQMT → Target Language text]
25
FAMT
  • Basic Characteristics
    • No human intervention
    • Arbitrary text
  • Evaluation Criteria
    • Quality of output
    • Cost (/page)
    • Speed (pages/hour)

26
FAMT Success Story: TAUM METEO
  • Written by Chevalier et al. 1978.
  • Translation of weather reports from English to
    French
  • Highly constrained subset of English
  • Small number of senses for each word
  • Restricted syntactic constructions
  • System determines whether a given sentence is
    within its capabilities
  • Very fast, very accurate, no post-editing
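The idea of a system that first decides whether a sentence is within its capabilities can be sketched as follows. This is not how TAUM METEO was implemented; the tiny lexicon, the word-for-word translation, and the names `within_capabilities` and `translate_or_reject` are all assumptions made purely to illustrate the accept-or-reject step.

```python
# Hypothetical weather sublanguage lexicon (English to French).
LEXICON = {
    "cloudy": "nuageux",
    "sunny": "ensoleillé",
    "rain": "pluie",
    "snow": "neige",
    "today": "aujourd'hui",
    "tomorrow": "demain",
}


def tokenize(sentence):
    """Lowercase the sentence, drop full stops, and split on whitespace."""
    return sentence.lower().replace(".", "").split()


def within_capabilities(sentence):
    """Accept a sentence only if every word belongs to the sublanguage."""
    words = tokenize(sentence)
    return bool(words) and all(word in LEXICON for word in words)


def translate_or_reject(sentence):
    """Translate word for word, or return None to hand the sentence to a human."""
    if not within_capabilities(sentence):
        return None
    return " ".join(LEXICON[word] for word in tokenize(sentence))


print(translate_or_reject("Rain tomorrow."))    # pluie demain
print(translate_or_reject("IBM bought Lotus"))  # None
```

Rejecting out-of-scope input is what lets such a system skip post-editing for the sentences it does accept.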

27
FAMT MORAL
  • FAMT can work well, but only if we give up one or
    more of the goals, e.g.
    • Unrestricted text input
    • High quality translation
  • This observation has led to research on
    sub-languages
  • And to the use of FALQT

28
FAMT is not the only way
  • FAMT lies at one extreme of a continuum of ways
    in which technology can be brought to bear upon
    the translation problem
  • At the other extreme are word processing
    software, fax machines, and even mobile phones
  • Between these two extremes there are other points
    of interest where technology can radically affect
    the productivity of the individual translator.

29
MAHT and HAMT
  • Machine Aided Human Translation (MAHT)
  • Human Aided Machine Translation (HAMT).
  • The essential difference between these two lies
    not only in the way in which the person is
    involved but also in the extent of their
    involvement

30
MAHT - Translation Memories
  • Systems consist of a database in which each
    source sentence of a translation is stored
    together with the target sentence (this is called
    a translation memory "unit")
  • Each new source sentence is searched for in the
    database and a match value is calculated.
  • When the match value is 100%, the translation of
    the source sentence from the database is inserted
    into the text being translated.

31
MAHT - Translation Memories
  • If the match value is below 100% and above a
    certain user-definable percentage (a "fuzzy
    match"), the old translation will be inserted as
    a translation proposal for the translator to
    review and edit.
  • Sentences with match values below that margin
    have to be translated from scratch.
  • New and changed translation proposals will then
    be stored in the database for future use.
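The lookup procedure described on the two translation memory slides can be sketched as below. This is a minimal illustration, not the algorithm of any particular TM product: the match value is approximated with `difflib.SequenceMatcher` from the Python standard library, the 75% default threshold is an arbitrary assumption, and the class name `TranslationMemory` and the sample sentences are invented for the example.

```python
from difflib import SequenceMatcher


class TranslationMemory:
    """Stores (source, target) units and retrieves exact or fuzzy matches."""

    def __init__(self, fuzzy_threshold=75.0):
        self.units = {}                          # source sentence -> target sentence
        self.fuzzy_threshold = fuzzy_threshold   # user-definable percentage

    def store(self, source, target):
        self.units[source] = target

    def lookup(self, source):
        """Return (status, proposal); status is 'exact', 'fuzzy', or 'none'."""
        if source in self.units:
            return "exact", self.units[source]
        best_score, best_target = 0.0, None
        for stored_source, target in self.units.items():
            # Character-level similarity scaled to a percentage.
            score = 100.0 * SequenceMatcher(None, source, stored_source).ratio()
            if score > best_score:
                best_score, best_target = score, target
        if best_score >= self.fuzzy_threshold:
            return "fuzzy", best_target  # proposal for the translator to edit
        return "none", None              # translate from scratch


tm = TranslationMemory()
tm.store("Cloudy with rain this evening.", "Nuageux avec pluie ce soir.")

print(tm.lookup("Cloudy with rain this evening.")[0])  # exact
print(tm.lookup("Cloudy with rain this morning.")[0])  # fuzzy
print(tm.lookup("Sunny.")[0])                          # none
```

After the translator edits a fuzzy proposal or translates a sentence from scratch, calling `store` again adds the new unit for future use.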

32
MAHT - Translation Memories - Advantages
  • Avoid redoing translation of repeated material
  • Use previous texts as a model for new
    translations
  • Ensure consistency throughout a translation

33
MAHT - Translation Memories - Drawbacks
  • If terminology changes between projects the
    content of a TM needs to be updated to reflect
    these changes.
  • Blind faith in exact matches (without validation)
    can generate incorrect translation since there is
    no verification of the context where the new
    segment is used compared to where the original
    one was used.

34
MAHT - Translation Memories - Remarks
  • Translation process: TM tools may not easily fit
    into existing translation or localization
    processes; they work best where work can be
    signed off in pieces rather than as a whole.
  • Customisation: TM tools rarely work straight out
    of the box. Menu adaptation and filters for
    desktop applications may require significant
    effort.
  • Investment costs are high.
  • Setup and maintenance of TMs have to be factored
    in.
  • OpenTag/TMX formats exist for exchanging TM data
    between competing systems.
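TMX is an XML format in which each translation unit (`tu`) holds one variant (`tuv`) per language, with the text in a `seg` element. The sketch below reads such units with Python's standard `xml.etree.ElementTree`. The sample document, its header attribute values, and the function name `read_units` are invented for illustration; real TMX files carry more metadata and may contain inline markup inside segments.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Invented sample document for illustration.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="example" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Cloudy with rain this evening.</seg></tuv>
      <tuv xml:lang="fr"><seg>Nuageux avec pluie ce soir.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_units(tmx_text):
    """Return one {language: segment text} dict per translation unit."""
    root = ET.fromstring(tmx_text)
    units = []
    for tu in root.iter("tu"):
        unit = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            # itertext() also collects text nested inside inline elements.
            text = "".join(seg.itertext()) if seg is not None else ""
            unit[tuv.get(XML_LANG)] = text
        units.append(unit)
    return units


for unit in read_units(SAMPLE_TMX):
    print(unit["en"], "->", unit["fr"])
```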

35
MAHT - Other Technology
  • Communication/coordination amongst translators
  • Integration of internet technologies and web
    services.
  • Database technology, smart indexing, and
    networking
  • Improvements can be achieved that are well within
    the scope of current technology.

36
HAMT - Human Assisted Machine Translation
  • Machine retains the initiative but works in
    collaboration with human consultant.
  • System translates autonomously until it
    recognises that a linguistic difficulty of a
    certain type has arisen, e.g.
    • ambiguity
    • pronoun reference
    • unknown word
    • unrecognised construction
  • At this point it seeks help from the consultant.
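The control flow on this slide, where the machine keeps the initiative and asks for help only when it detects a difficulty, can be sketched as a loop with a callback. Everything here is a simplification assumed for illustration: translation is word for word, only two difficulty types (unknown word and ambiguity) are detected, and `hamt_translate`, `ask_consultant`, and the sample lexicon are hypothetical names and data.

```python
def hamt_translate(words, lexicon, ask_consultant):
    """Translate autonomously; call ask_consultant only when a difficulty arises."""
    output = []
    for word in words:
        candidates = lexicon.get(word, [])
        if len(candidates) == 1:
            # No difficulty: the machine proceeds on its own.
            output.append(candidates[0])
        elif not candidates:
            output.append(ask_consultant("unknown word", word, []))
        else:
            output.append(ask_consultant("ambiguity", word, candidates))
    return output


# Hypothetical lexicon: each source word maps to its candidate translations.
lexicon = {"the": ["la"], "bank": ["rive", "banque"], "closed": ["a fermé"]}

queries = []


def scripted_consultant(kind, word, candidates):
    """Stands in for a human: logs the query and picks the last candidate."""
    queries.append((kind, word))
    return candidates[-1] if candidates else word


print(hamt_translate(["the", "bank", "closed"], lexicon, scripted_consultant))
print(queries)
```

In this run the consultant is asked exactly once, about "bank"; the other words are translated without intervention.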

37
HAMT - Challenges
  • Reliable identification/classification of
    difficulty.
  • Reliable communication of difficulty to user.
  • Tradeoff between quality and scope of
    translation.

38
HAMT - Advantages
  • Modulo the challenges above, a high quality of
    translation can be guaranteed.
  • Speed, if large sections of text can be
    translated automatically.
  • Human consultant need not necessarily have all
    the skills of a human translator; native
    competence in one or both languages may suffice.

39
Summary
  • Machine Translation is a continuum
    • FAMT
    • HAMT
    • MAHT
  • The utility of a given type of system cannot be
    assessed with very simple criteria
  • The utility function involves at least the human
    cost, the machine cost, the quality of the
    result, and the nature of the translation
    requirements.
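One way to picture such a utility function is sketched below. The slides do not give a formula, so the form used here (value scaled by quality, minus human and machine cost, with a minimum quality set by the translation requirements) is purely an assumption, as are the names `SystemProfile` and `utility` and every number in the example.

```python
from dataclasses import dataclass


@dataclass
class SystemProfile:
    name: str
    human_cost_per_page: float    # cost of translator or consultant time
    machine_cost_per_page: float  # cost of software and hardware
    quality: float                # 0.0 (unusable) to 1.0 (publishable)


def utility(profile, required_quality, value_per_page):
    """Hypothetical utility: rule out systems below the quality requirement,
    otherwise weigh the value of the result against the combined costs."""
    if profile.quality < required_quality:
        return float("-inf")
    total_cost = profile.human_cost_per_page + profile.machine_cost_per_page
    return value_per_page * profile.quality - total_cost


# Invented figures, for illustration only.
systems = [
    SystemProfile("FAMT", human_cost_per_page=0.0, machine_cost_per_page=1.0, quality=0.5),
    SystemProfile("HAMT", human_cost_per_page=5.0, machine_cost_per_page=1.0, quality=0.8),
    SystemProfile("MAHT", human_cost_per_page=15.0, machine_cost_per_page=0.5, quality=0.95),
]

for required_quality in (0.4, 0.9):
    best = max(systems, key=lambda s: utility(s, required_quality, value_per_page=20.0))
    print(required_quality, best.name)
```

Under these made-up numbers the preferred system changes as the quality requirement rises, which is the point of the slide: no single criterion ranks the systems.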

40
Some References
  • Jonathan Slocum, Machine Translation: its
    History, Current Status, and Future Prospects,
    Proc ACL 1984, Stanford University,
    http://acl.ldc.upenn.edu/P/P84/P84-1116.pdf
  • Martin Kay, Machine Translation, Computational
    Linguistics vol 11 numbers 2-3, 1985.
  • Richard Kittredge, Sublanguages, Computational
    Linguistics vol 11 numbers 2-3, 1985.