IIT Kharagpur - PowerPoint PPT Presentation

About This Presentation
Title:

IIT Kharagpur

Description:

Title: Machine Translation among Indian languages, English to Bengali ... Grapheme-to-Phoneme mapper. Evaluation metrics: Usability, Coverage ...

Slides: 12
Provided by: Spe553

Transcript and Presenter's Notes


1
IIT Kharagpur
Components
  • Project proposal
  • Machine Translation

2
Components
  • Title: Machine Translation among Indian languages, English to Bengali
  • Proposers: Anupam Basu, Pabitra Mitra, Sudeshna Sarkar
  • Institution: IIT Kharagpur
  • Language: Bengali
  • Names of components that will be implemented:
    • Morphological Synthesis Engine
    • Annotation Standards
    • Components for Bengali:
      • POS Tagger
      • Named Entity Recognizer
      • Local Word Grouper (chunker)
      • Morphological Analyzer
      • Word Generator
      • Sentence Generator
  • Proposed domain: Bengali news corpus

3
Morphological Synthesis Engine
  • Language: Generic engine (horizontal)
  • Name of Component: Morphological Synthesis Engine
  • Techniques used:
    • Combination of language-specific rules and paradigm tables
  • Evaluation metrics:
    • Usability
    • Coverage
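
As a rough illustration of combining language-specific rules with paradigm tables, the sketch below inflects romanized Bengali noun roots. The paradigm names, the tiny lexicon, the simplified suffix tables, and the function names are all assumptions made for this example; none of them come from the slides.

```python
# Toy paradigm tables (assumed, heavily simplified): paradigm -> {case: suffix}.
PARADIGMS = {
    "vowel_final": {"nom": "", "gen": "r", "obj": "ke", "loc": "te"},
    "consonant_final": {"nom": "", "gen": "er", "obj": "ke", "loc": "e"},
}

# Hypothetical lexicon mapping known roots to their paradigm.
LEXICON = {"bari": "vowel_final", "ghor": "consonant_final"}

VOWELS = set("aeiou")


def guess_paradigm(root):
    """Language-specific fallback rule for roots missing from the lexicon."""
    return "vowel_final" if root[-1] in VOWELS else "consonant_final"


def synthesize(root, case):
    """Build a surface form by looking up the suffix in the root's paradigm."""
    paradigm = LEXICON.get(root) or guess_paradigm(root)
    return root + PARADIGMS[paradigm][case]


print(synthesize("bari", "gen"))    # barir
print(synthesize("ghor", "loc"))    # ghore
print(synthesize("chhele", "obj"))  # chheleke (root not in lexicon, rule applied)
```

The table lookup covers the regular cases, while the fallback rule gives some coverage for roots the lexicon does not list, which is one way the Coverage metric above could be exercised.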

4
Annotation Standard
  • Language: All
  • Annotation standards for:
    • Part-of-Speech, Chunking and Named Entity tags
  • Final size of the tag sets:
    • Part-of-Speech 25-40, Chunking 5-20, Named Entity 10-30
  • Average size of such tag sets in other languages:
    • Same as above
  • Estimation of the expected size (PERT chart):
    • All the standard tag sets are to be designed within the first 2 months
  • Evaluation metrics:
    • Usability
    • Coverage

CEL, IIT Kharagpur
5
POS Tagger for Bengali
  • Language: Bengali
  • Name of Component: Bengali Part-of-Speech Tagger
  • Techniques used:
    • Bi-gram Hidden Markov Model
    • Semi-supervised learning
    • Morphology-driven transformation-based learning for unknown word handling
  • Performance of techniques in other languages:
    • Bi-gram Hidden Markov Model: 97-98% for English
  • Estimate of expected performance (PERT chart)
  • Evaluation metrics:
    • Sentence/word-level accuracy
    • Known/unknown word accuracy
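
To make the bi-gram Hidden Markov Model technique concrete, here is a minimal sketch of a supervised bigram HMM tagger with Viterbi decoding. The toy English corpus, the tag names, the add-one smoothing, and all function names are assumptions for illustration; the slides do not specify the training data, the smoothing, or how the semi-supervised and unknown-word components plug in.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # hypothetical sentence-start pseudo-tag


def train(tagged_sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(Counter)
    emit = defaultdict(Counter)
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    vocab = {word for counts in emit.values() for word in counts}
    return trans, emit, vocab


def log_prob(counts, key, n_outcomes):
    """Add-one smoothed log probability, so unseen events keep some mass."""
    return math.log((counts[key] + 1) / (sum(counts.values()) + n_outcomes))


def viterbi(words, trans, emit, vocab):
    """Return the most probable tag sequence under the bigram HMM."""
    if not words:
        return []
    tags = list(emit)
    n_tags, n_words = len(tags), len(vocab) + 1  # +1 slot for unknown words
    # Each column maps tag -> (best log score, backpointer to previous tag).
    best = [{t: (log_prob(trans[START], t, n_tags)
                 + log_prob(emit[t], words[0], n_words), None) for t in tags}]
    for word in words[1:]:
        column = {}
        for t in tags:
            score, prev = max(
                (best[-1][p][0] + log_prob(trans[p], t, n_tags), p) for p in tags
            )
            column[t] = (score + log_prob(emit[t], word, n_words), prev)
        best.append(column)
    # Follow backpointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


# Toy training data (assumed, not from the slides).
CORPUS = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
MODEL = train(CORPUS)
print(viterbi(["the", "cat", "barks"], *MODEL))  # ['DET', 'NOUN', 'VERB']
```

Here an unknown word is handled only by smoothing; the morphology-driven component listed above would presumably replace that with suffix-based guesses.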

6
Named Entity Recognizer for Bengali
  • Language: Bengali
  • Name of Component: Bengali Named Entity Recognizer
  • Techniques used:
    • Maximum Entropy model, Conditional Random Field
  • Performance of techniques in other languages:
    • Precision: 90-95% for English
  • Estimate of expected performance (PERT chart)
  • Evaluation metrics:
    • Precision and Recall
    • F-measure
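
The evaluation metrics on this slide can be computed as in the sketch below, which scores predicted entities against gold entities by exact match. Representing an entity as a (start, end, type) tuple, the function name, and the sample spans are assumptions made for the example.

```python
def precision_recall_f(gold, predicted):
    """Exact-match precision, recall and F-measure over entity tuples."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical entities as (start token, end token, type).
gold = [(0, 2, "PER"), (5, 6, "LOC"), (8, 9, "ORG")]
predicted = [(0, 2, "PER"), (5, 6, "ORG")]

p, r, f = precision_recall_f(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f} f={f:.2f}")
```

Only the first predicted entity matches a gold entity in both span and type, so precision is 1/2 and recall is 1/3.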

[PERT chart: four stages labeled "T3 month" with values 60, 70, 75 and 80, followed by a stage labeled "T 6 months" with value 90]
7
Bengali Local Word Grouper
  • Language: Bengali
  • Name of Component: Bengali Local Word Grouper
  • Techniques used:
    • Feature structure unification using a greedy algorithm
    • Statistical chunking
    • MWE (multi-word expression) handling
  • Performance of techniques in other languages:
    • LWG accuracy: 90-95% for English
  • Estimate of expected performance (PERT chart)
  • Evaluation metrics:
    • F-score: harmonic mean of precision and recall
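
One possible reading of feature structure unification with a greedy algorithm is sketched below: each token carries a flat feature structure, and a group is extended left to right for as long as the next token unifies with it. The flat dictionaries, the feature names `cat` and `num`, the sample tokens, and the stop-on-clash policy are assumptions for this example; the slides do not describe the actual feature structures or grouping rules.

```python
def unify(a, b):
    """Unify two flat feature structures; return None if any value clashes."""
    result = dict(a)
    for key, value in b.items():
        if key in result and result[key] != value:
            return None
        result[key] = value
    return result


def greedy_group(tokens):
    """Greedily extend the current group while unification succeeds."""
    groups = []
    words, features = [], {}
    for word, token_features in tokens:
        merged = unify(features, token_features) if words else dict(token_features)
        if merged is None:
            # Clash: close the current group and start a new one.
            groups.append((words, features))
            words, features = [word], dict(token_features)
        else:
            words.append(word)
            features = merged
    if words:
        groups.append((words, features))
    return groups


# Hypothetical tokens with toy features.
tokens = [
    ("the", {"cat": "N"}),
    ("big", {"cat": "N"}),
    ("dog", {"cat": "N", "num": "sg"}),
    ("is", {"cat": "V", "num": "sg"}),
    ("running", {"cat": "V"}),
]
for words, features in greedy_group(tokens):
    print(words, features)
# ['the', 'big', 'dog'] {'cat': 'N', 'num': 'sg'}
# ['is', 'running'] {'cat': 'V', 'num': 'sg'}
```

The greedy strategy never revisits an earlier decision, which keeps grouping linear in sentence length at the cost of occasionally closing a group too early or too late.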

8
Morphological Analyzer for Bengali
  • Language: Bengali
  • Name of Component: Bengali Morphological Analyzer
  • Techniques used:
    • Backward traversal of a word along a DAG structure
  • Performance of techniques in other languages:
    • Morphological analyzer accuracy: 97-98% for English
  • Estimate of expected coverage (PERT chart)
  • Evaluation metrics:
    • Coverage
    • Correctness
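
The backward traversal can be pictured with the sketch below, which stores suffixes in a reversed trie (a simple kind of DAG) and walks a word from its last character toward its first, proposing a root wherever the trie accepts. The suffix inventory, the romanized lexicon, the `"$"` accept marker, and the function names are assumptions for illustration; the DAG used in the actual component is not described on the slide.

```python
# Toy suffix inventory (assumed, simplified): suffix -> case feature.
SUFFIXES = {"": "nom", "r": "gen", "er": "gen", "ke": "obj", "e": "loc", "te": "loc"}

# Hypothetical root lexicon in romanized form.
LEXICON = {"bari", "ghor", "chhele"}

ACCEPT = "$"  # marker key for nodes where a complete suffix ends


def build_reverse_trie(suffixes):
    """Insert each suffix character by character, last character first."""
    root = {}
    for suffix, feature in suffixes.items():
        node = root
        for ch in reversed(suffix):
            node = node.setdefault(ch, {})
        node.setdefault(ACCEPT, []).append(feature)
    return root


def analyze(word, trie, lexicon):
    """Walk the word backward; emit (root, feature) at accepting nodes."""
    analyses = []
    node, i = trie, len(word)
    while True:
        for feature in node.get(ACCEPT, []):
            root = word[:i]
            if root in lexicon:
                analyses.append((root, feature))
        if i == 0 or word[i - 1] not in node:
            break
        node = node[word[i - 1]]
        i -= 1
    return analyses


TRIE = build_reverse_trie(SUFFIXES)
print(analyze("ghorer", TRIE, LEXICON))  # [('ghor', 'gen')]
print(analyze("barite", TRIE, LEXICON))  # [('bari', 'loc')]
print(analyze("bari", TRIE, LEXICON))    # [('bari', 'nom')]
```

Coverage could then be estimated as the share of corpus words for which `analyze` returns at least one analysis, and correctness as the share of those analyses that match a hand-checked reference.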

9
Word Generator for Bengali
  • Language: Bengali
  • Name of Component: Bengali Word Generator
  • Techniques used:
    • Combination of rules and paradigm tables
  • Performance of techniques in other languages:
    • NA
  • Estimate of expected coverage (PERT chart)
  • Evaluation metrics:
    • Understandability
    • Quality
    • Completeness

10
Transliteration (Hindi-Bengali)
  • Language: Bengali to Hindi, Hindi to Bengali
  • Name of Component: Transliteration
  • Techniques used:
    • Character trigram substitution
    • Grapheme-to-phoneme mapper
  • Evaluation metrics:
    • Usability
    • Coverage
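
A grapheme-to-phoneme mapper can serve as a pivot between the two scripts: Hindi graphemes are mapped to phoneme labels, which are then mapped to Bengali graphemes. The sketch below shows the idea for a handful of characters in the Hindi-to-Bengali direction only. The phoneme labels, the table contents, and the function name are assumptions for this example, and real transliteration needs context-sensitive handling (conjuncts, inherent vowels) that this character-by-character version ignores.

```python
# Assumed grapheme -> phoneme-label table for a few Devanagari characters.
HINDI_G2P = {"क": "k", "म": "m", "ल": "l", "न": "n", "र": "r", "ा": "aa"}

# Assumed phoneme-label -> grapheme table for the matching Bengali characters.
BENGALI_P2G = {"k": "ক", "m": "ম", "l": "ল", "n": "ন", "r": "র", "aa": "া"}


def transliterate_hi_to_bn(text):
    """Map each Hindi grapheme to a phoneme label, then to a Bengali grapheme.

    Characters missing from either table are passed through unchanged.
    """
    output = []
    for ch in text:
        phoneme = HINDI_G2P.get(ch)
        output.append(BENGALI_P2G.get(phoneme, ch) if phoneme else ch)
    return "".join(output)


print(transliterate_hi_to_bn("नाम"))  # নাম
print(transliterate_hi_to_bn("कमल"))  # কমল
```

The reverse direction would use the inverted tables. Coverage for such a mapper could be measured as the share of input characters that both tables handle.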

11
Sentence Generator for Bengali
  • Language: Bengali
  • Name of Component: Sentence Generator for Bengali
  • Techniques used:
    • Grammar rule based
  • Evaluation metrics:
    • Translation quality (as judged by a native speaker)
    • Understandability
    • Stylistics