Development of a German-English Translator

1
Development of a German-English Translator
  • Felix Zhang
  • Period 5
  • 2007-2008
  • Thomas Jefferson High School for Science and Technology,
    Computer Systems Research Lab

2
Summary of Quarter 2
  • NP Chunking
  • Lemmatization
  • Dictionary Lookup
  • Inflection
  • Noun-verb agreement

3
Scope for this quarter
  • Focus less on statistical methods
  • Get rudimentary grammar system working
  • Fix all the bugs I've made since September

4
New and Modified Components
  • More info stored in NP chunking
  • Better noun-verb agreement
  • Grammar
  • Element Assignment
  • Priority Number Assignment

5
Noun-verb agreement
  • Simple method to eliminate more ambiguities
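  • def eliminateother(attribs, sub, closest):
        # Remove nominative readings from every noun that is not the subject
        for x in attribs:
            if x[0][1] == "nou" and x != sub:
                for y in list(x[1]):  # iterate over a copy while removing
                    if y[0] == "nom":
                        attribs[attribs.index(x)][1].remove(y)
        return attribs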
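
A minimal runnable sketch of the function above, applied to two nouns from the example sentence. The nested layout of each entry, [[word, tag], [readings]], is an assumption inferred from the indexing in the function, since the slides show the lists without brackets. The nominative reading given to 'Mann' here is added purely for illustration, and `closest` is left unused, as on the slide.

```python
def eliminateother(attribs, sub, closest=None):
    """Remove nominative readings from every noun other than the subject."""
    for x in attribs:
        if x[0][1] == "nou" and x != sub:
            # Iterate over a copy so that removing a reading skips nothing.
            for y in list(x[1]):
                if y[0] == "nom":
                    x[1].remove(y)
    return attribs


# Assumed layout: [[word, part_of_speech], [[case, gender_or_number], ...]]
mann = [["Mann", "nou"], [["nom", "mas"], ["akk", "mas"], ["dat", "pl"]]]
kinder = [["Kinder", "nou"], [["nom", "pl"], ["akk", "pl"]]]
attribs = [mann, kinder]

result = eliminateother(attribs, sub=kinder)
print(result[0][1])  # readings left for 'Mann'
print(result[1][1])  # readings for 'Kinder', the subject, are untouched
```

Once the subject is known, no other noun can be nominative, so each remaining noun keeps only its non-nominative readings.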

6
Noun phrase chunking
  • Now used for English sentences
  • Stores more info for later methods
  • the man make the children
  • NP Chunked English: 'the', 'art', 'man',
    'nou', 'akk', 'mas', 'dat', 'pl',
    'make', 'ver', '3', 'pl', 'pres', 'the',
    'art', 'small', 'adj', 'child', 'nou',
    'nom', 'pl'

7
Element Assignment
  • Based on linguistic information
  • If case is nominative, chunk is subject
  • If accusative, chunk is direct object
  • 'the', 'art', 'man', 'nou', 'akk', 'mas',
    'dat', 'pl', 'dobj', 'make', 'ver', '3',
    'pl', 'pres', 'mverb', 'the', 'art',
    'small', 'adj', 'child', 'nou', 'nom',
    'pl', 'sub'
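
The case-to-element rule above can be sketched as a small lookup. This is an illustration under stated assumptions, not the project's code: the flat chunk lists follow the slide, the dative-to-`iobj` entry is an assumption (the slide only states the nominative and accusative rules), and the name `assign_element` is hypothetical. Cases are checked in the order nom, akk, dat, so the ambiguous 'man' chunk comes out as `dobj`, as it does on the slide.

```python
# Hypothetical mapping from case tag to sentence element.
# 'nom' and 'akk' come from the slide; 'dat' -> 'iobj' is an assumption.
CASE_TO_ELEMENT = {"nom": "sub", "akk": "dobj", "dat": "iobj"}


def assign_element(chunk):
    """Return a copy of the chunk with an element label appended."""
    if "ver" in chunk:
        # Simplification: every verb chunk is treated as the main verb.
        return chunk + ["mverb"]
    for case, element in CASE_TO_ELEMENT.items():
        if case in chunk:
            return chunk + [element]
    return chunk + ["unknown"]


chunks = [
    ["the", "art", "man", "nou", "akk", "mas", "dat", "pl"],
    ["make", "ver", "3", "pl", "pres"],
    ["the", "art", "small", "adj", "child", "nou", "nom", "pl"],
]
labelled = [assign_element(chunk) for chunk in chunks]
for chunk in labelled:
    print(chunk[-1])
```

Because the first matching case wins, a chunk that is still ambiguous between accusative and dative is silently treated as a direct object.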

8
Priority Assignment
  • Each sentence element is assigned priority number
  • Based on position in English sentence
  • Assignments:
  • sub: 1
  • mverb: 2
  • auxverb: 3
  • iobj: 4
  • dobj: 5
  • Sort by number for English grammar
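
A minimal sketch of the sort step, assuming each chunk is a flat list whose last item is its element label. The priority numbers are the ones listed above. The helper names `rearrange` and `surface_words` are hypothetical, as is the rule that a word is any item directly followed by a part-of-speech tag. This version looks the priority up from the label instead of prepending it to the chunk.

```python
# Priority numbers as listed on the slide.
PRIORITY = {"sub": 1, "mverb": 2, "auxverb": 3, "iobj": 4, "dobj": 5}
# Assumption: an item followed by one of these tags is a word.
POS_TAGS = {"art", "adj", "nou", "ver"}


def rearrange(chunks):
    """Sort chunks by the priority of their element label (last item)."""
    return sorted(chunks, key=lambda chunk: PRIORITY[chunk[-1]])


def surface_words(chunk):
    """Return the words in a chunk, dropping tags and labels."""
    return [chunk[i] for i in range(len(chunk) - 1) if chunk[i + 1] in POS_TAGS]


chunks = [
    ["the", "art", "man", "nou", "akk", "mas", "dat", "pl", "dobj"],
    ["make", "ver", "3", "pl", "pres", "mverb"],
    ["the", "art", "small", "adj", "child", "nou", "nom", "pl", "sub"],
]
ordered = rearrange(chunks)
sentence = " ".join(word for chunk in ordered for word in surface_words(chunk))
print(sentence)
```

The sketch prints root forms only, since inflection is a separate step in the pipeline.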

9
Full run of program
  • Input: den Mann machen die kleinen Kinder
  • Output: The small children make the man
  • fzhang@ltsp1 /research python proj.py
  • Part of speech tags: 'den', 'art', 'Mann',
    'nou', 'machen', 'ver', 'die', 'art',
    'kleinen', 'adj', 'Kinder', 'nou'
  • Morphological analysis: 'Mann', 'nou',
    'akk', 'mas', 'dat', 'pl', 'machen',
    'ver', '1', 'pl', '3', 'pl', 'pres',
    'kleinen', 'adj', 'nom', 'pl', 'akk',
    'pl', 'Kinder', 'nou', 'nom', 'pl',
    'akk', 'pl'
  • Disambiguated after noun-verb agreement:
    'Mann', 'nou', 'akk', 'mas', 'dat',
    'pl', 'machen', 'ver', '3', 'pl',
    'pres', 'kleinen', 'adj', 'nom', 'pl',
    'akk', 'pl', 'Kinder', 'nou', 'nom',
    'pl'
  • Lemmatized: 'Mann', 'Mann', 'Man',
    'machen', 'machen', 'kleinen', 'klein',
    'Kinder', 'Kind'
  • Root translated: 'den', 'the', 'Mann',
    'man', 'machen', 'make', 'die', 'the',
    'kleinen', 'small', 'Kinder', 'child'
  • NP Chunked English: 'the', 'art', 'man',
    'nou', 'akk', 'mas', 'dat', 'pl',
    'make', 'ver', '3', 'pl', 'pres', 'the',
    'art', 'small', 'adj', 'child', 'nou',
    'nom', 'pl'
  • Inflected (only works before chunking)
  • 'the', 'the' 'man', 'akk', 'mas', 'man'
    'man', 'dat', 'pl', 'mans' 'make', '3',
    'pl', 'make' 'the', 'the' 'small', 'small'
    'child', 'nom', 'pl', 'childs'
  • Assigned an element type
  • 'the', 'art', 'man', 'nou', 'akk', 'mas',
    'dat', 'pl', 'dobj', 'make', 'ver', '3',
    'pl', 'pres', 'mverb', 'the', 'art',
    'small', 'adj', 'child', 'nou', 'nom',
    'pl', 'sub'
  • Assigned priority
  • '5', 'the', 'art', 'man', 'nou', 'akk',
    'mas', 'dat', 'pl', 'dobj', '2', 'make',
    'ver', '3', 'pl', 'pres', 'mverb', '1',
    'the', 'art', 'small', 'adj', 'child',
    'nou', 'nom', 'pl', 'sub'
  • Rearranged to English structure
  • '1', 'the', 'art', 'small', 'adj',
    'child', 'nou', 'nom', 'pl', 'sub', '2',
    'make', 'ver', '3', 'pl', 'pres', 'mverb',
    '5', 'the', 'art', 'man', 'nou', 'akk',
    'mas', 'dat', 'pl', 'dobj'

10
Problems
  • Ambiguities (again)
  • One ambiguity can change the entire structure of
    the sentence
  • "I gave a horse the hat" vs. "I gave the hat a
    horse"
  • Attempt at all permutations possible
  • User disambiguation
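
One way to enumerate every combination of readings, sketched with `itertools.product`. This is an illustration of the "all permutations" idea rather than the project's code: the `(word, readings)` layout, the name `all_readings`, and the filter that keeps only combinations with exactly one nominative are all assumptions.

```python
from itertools import product


def all_readings(words):
    """Yield every combination that picks one case reading per word."""
    names = [name for name, _ in words]
    for combo in product(*(readings for _, readings in words)):
        yield list(zip(names, combo))


# Assumed layout: (word, list of possible cases)
words = [
    ("Mann", ["akk", "dat"]),
    ("Kinder", ["nom", "akk"]),
]
candidates = list(all_readings(words))

# Assumed filter: a simple clause has exactly one nominative (the subject).
valid = [
    combo for combo in candidates
    if sum(case == "nom" for _, case in combo) == 1
]
for combo in valid:
    print(combo)
```

Any combinations that survive the filter are the ones a user would be asked to choose between. The number of candidates is the product of the reading counts, so it grows quickly with sentence length.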

11
Problems
  • Inflexible
  • Grammar can only be rearranged in one specific
    way
  • Subject, main verb, indirect object, direct object,
    auxiliary verb
  • Does not accommodate prepositions,
    conjunctions, etc.

12
Future research
  • Implement more statistical methods
  • Morphological info
  • Actual translation: bilingual corpus
  • Create better parse tree: dependency grammar