Title: Machine Translation and NLP
1Machine Translation and NLP
??? ???? ?????? ??????
- ????? ????
- ???? ??????? ?????
- ????? ?????
2- Machine Translation An Overview
- English to Urdu Translation
- Urdu to English Translation
3??? ???
- Machine Translation
- An Overview
4Natural Language Processing
- Natural language is a term that denotes a
(naturally occurring) human language, as opposed
to computer languages and other artificial
languages.
- Natural Language Processing is the field of
inquiry concerned with the study and development
of computer systems for processing natural
(human) languages.
5Applications of NLP
- Question Answering Systems
- Chat Bots
- NL Questions to DB or Search Engine
- NL Commands to Computers and other machines
- Information Extraction
- Grammar Checkers
- Machine Translation
- .
6Machine Translation
- Translation of text from one natural language into
another natural language.
- But what do Translation and Human Translation
mean?
7Machine Translation Problems
- A word can belong to more than one part of speech.
- A word with a single part of speech can have many meanings.
- A grammar that covers all possible sentences of a
natural language is not possible.
- Grammars of natural languages are ambiguous.
- Understanding a sentence requires background
knowledge of the world.
8MT Problems (some examples)
- Time flies like an arrow.
- Is "flies" a noun or a verb?
- The spirit is willing, but the flesh is weak
- is translated into Russian and back into English as
- The vodka is good, but the steak is lousy
- Cette personne n'est pas de permanence
aujourd'hui (This person is not on duty today)
- is translated into English as
- This person is not any today permanence
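The part-of-speech ambiguity in "Time flies like an arrow" can be made concrete with a small sketch. The toy lexicon below is hypothetical (it is not the translator's word database); it only shows how quickly the number of candidate readings grows when each word keeps all of its possible categories.

```python
from itertools import product

# Hypothetical toy lexicon: each word maps to the categories it may take.
LEXICON = {
    "time": ["Noun", "Verb"],
    "flies": ["Noun", "Verb"],
    "like": ["Verb", "Prep"],
    "an": ["Det"],
    "arrow": ["Noun"],
}

def tag_sequences(sentence):
    """Return every combination of part-of-speech tags for the sentence."""
    words = sentence.lower().rstrip(".").split()
    choices = [LEXICON[word] for word in words]
    return [list(zip(words, tags)) for tags in product(*choices)]

readings = tag_sequences("Time flies like an arrow.")
print(len(readings))  # 2 * 2 * 2 * 1 * 1 = 8 candidate taggings
```

Most of these taggings are not grammatical; it is the job of the syntax stage to reject them.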
9MT Problems (some examples)
- The federal cabinet here on Saturday approved a
new labour policy to meet the challenges of
globalization and emerging technologies, giving
new directions for improvement and guidance in
the labour sector. (Dawn, 22nd September, 2002)
- Real textual data is more like this text and not
like the traditionally quoted sentences such as
- I have a book.
- He goes to school.
10English to Urdu Translator
11Structure of Translator
- Lexical Module
- Syntax Module
- Transformation Module
12Lexical Module
- Pre Processor
- Detect Proper Nouns
- Convert short forms (don't → do not)
- Detect abbreviations like etc., Mr.
- Tokenizer
- Search Database of words and proper nouns and
generate all possible interpretations of a word.
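A minimal sketch of the lexical steps listed above, assuming small in-memory tables in place of the real word database. The table contents, the capitalization test for proper nouns, and the function names are illustrative assumptions, not the translator's actual implementation.

```python
# Hypothetical tables; a real system would read these from its database.
SHORT_FORMS = {"don't": "do not", "dont": "do not", "isn't": "is not"}
ABBREVIATIONS = {"etc.", "mr.", "dr."}
WORD_DB = {
    "we": ["Pronoun"],
    "do": ["Auxiliary Verb", "Verb"],
    "not": ["Adverb"],
    "like": ["Verb", "Prep"],
    "flies": ["Noun", "Verb"],
}

def preprocess(text):
    """Pre-process and tokenize: returns a list of (token, kind-or-None)."""
    tokens = []
    for i, raw in enumerate(text.split()):
        if raw.lower() in ABBREVIATIONS:
            tokens.append((raw, "Abbreviation"))   # keep the trailing dot
            continue
        word = raw.strip(".,;:!?")
        if not word:
            continue
        if word.lower() in SHORT_FORMS:
            # Expand the short form into separate tokens.
            tokens.extend((w, None) for w in SHORT_FORMS[word.lower()].split())
        elif i > 0 and word[0].isupper():
            # Naive assumption: a capitalized, non-initial word is a proper noun.
            tokens.append((word, "Proper Noun"))
        else:
            tokens.append((word.lower(), None))
    return tokens

def interpretations(tokens):
    """Attach every category the database lists for each token."""
    result = []
    for word, kind in tokens:
        categories = [kind] if kind else WORD_DB.get(word, ["Unknown"])
        result.append((word, categories))
    return result

analysis = interpretations(preprocess("We dont like Mr. Hamid."))
for word, categories in analysis:
    print(word, categories)
```

Every token keeps all of its categories at this stage; choosing among them is left to the later modules.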
13Structure of Lexicon
- Word
- Category
- Noun, Pronoun, ...
- SubCategory
- Auxiliary Verb, Possessive Pronoun,
ToPreposition, ...
- Sense
- Human, Animate, Inanimate
14Structure of Lexicon - Contd.
- Form
- Base, First, Second (for Verb Form); First,
Second, Third (for Person); Comparative,
Superlative (for Adjectives)
- Number
- Singular, Plural
- Gender
- Masculine, Feminine
- Object Preposition Subject Preposition
- ?? ? ??? ??
15Structure of Lexicon - Contd.
- Object Count
- Number of objects required by the verb
- Urdu Meaning
- Meaning for different forms
- Meaning of Adjective and Noun for different forms
of Gender and Number like ??? ??? ??? ???? ?
????? ????? ???
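The lexicon fields listed on the last three slides could be held in a record like the one sketched here. The field names follow the slides; the types, the default values, and the sample entries (including the romanized Urdu meanings and the choice to store the gender of the Urdu translation) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class LexiconEntry:
    word: str
    category: str                              # Noun, Pronoun, Verb, ...
    subcategory: Optional[str] = None          # e.g. Auxiliary Verb
    sense: Optional[str] = None                # Human, Animate, Inanimate
    form: Optional[str] = None                 # verb form, person, or degree
    number: Optional[str] = None               # Singular, Plural
    gender: Optional[str] = None               # Masculine, Feminine
    object_preposition: Optional[str] = None
    subject_preposition: Optional[str] = None
    object_count: int = 0                      # objects required by a verb
    # Urdu meaning per form; keys and romanization are illustrative only.
    urdu_meaning: Dict[str, str] = field(default_factory=dict)

book = LexiconEntry(
    word="book", category="Noun", sense="Inanimate",
    number="Singular", gender="Feminine",
    urdu_meaning={"Singular": "kitab", "Plural": "kitabein"},
)
write = LexiconEntry(word="write", category="Verb", form="Base", object_count=1)

print(book.urdu_meaning["Plural"], write.object_count)
```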
16Syntax Box
- Context Free Grammar of English Language.
- ?
- ? Noun
- ?
- ?
- ? Prep
17Some important points of Grammar
- Active and passive forms of positive and negative
sentences are modeled.
- Adverbial phrases at the beginning, middle, and
end of the sentence are modeled.
- Infinitive verb phrases (to VERB) are modeled.
- ..
18Partial Parsing
- The system uses a Bottom Up Chart Parser, which makes
Partial Parsing possible. Hence it can deal with
sentences that contain a small error (or
sentences that do not conform to the grammar).
- I know him He lives here.
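The idea of partial parsing with a bottom-up chart can be sketched as follows. This is a simplified CKY-style chart over a tiny hypothetical grammar, not the system's actual parser or rule set: when no single constituent covers the whole input, the longest sentence-level constituents found in the chart are returned instead.

```python
from collections import defaultdict

# Hypothetical toy lexicon and binary rules: (left, right) -> parent.
LEXICON = {
    "i": {"NP"}, "him": {"NP"}, "he": {"NP"},
    "know": {"V"}, "lives": {"V"}, "here": {"Adv"},
}
RULES = {("NP", "VP"): "S", ("V", "NP"): "VP", ("V", "Adv"): "VP"}

def build_chart(words):
    """Fill chart[(start, end)] with every category spanning those words."""
    n = len(words)
    chart = defaultdict(set)
    for i, word in enumerate(words):
        chart[(i, i + 1)] |= LEXICON.get(word, set())
    for width in range(2, n + 1):               # short spans before long ones
        for start in range(n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for left in chart[(start, mid)]:
                    for right in chart[(mid, end)]:
                        parent = RULES.get((left, right))
                        if parent:
                            chart[(start, end)].add(parent)
    return chart

def partial_parse(words, goal="S"):
    """Cover the input left to right with the longest goal constituents."""
    chart = build_chart(words)
    pieces, start = [], 0
    while start < len(words):
        for end in range(len(words), start, -1):
            if goal in chart[(start, end)]:
                pieces.append((goal, words[start:end]))
                start = end
                break
        else:                                   # nothing parsed from here
            pieces.append(("?", words[start:start + 1]))
            start += 1
    return pieces

pieces = partial_parse("i know him he lives here".split())
print(pieces)
```

For the run-on example, the chart contains no sentence spanning all six words, but it does contain one over the first three words and one over the last three, so both clauses can still be handled.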
19Transformational Module
- The Parse Structure from the Syntactical Module is
traversed.
- The Urdu translation is built by re-arrangement and
inflection of words and phrases.
20Transformational Module (contd.)
- If more than one parse is generated by the
Syntactical Module, heuristics are used to choose the
best interpretation.
- If an Auxiliary Verb is used as the Main Verb, that
interpretation has a negative weight.
- If an Adjective is used as a noun, that interpretation
has a negative weight.
- If a Verb is used as a noun, that interpretation has a
negative weight.
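A minimal sketch of how such heuristics could rank competing parses. The slide gives only the sign of each weight, so the value -1 and the representation of a parse as (word, lexical category, role in the parse) triples are assumptions.

```python
# (lexical category, role it plays in the parse) -> weight; -1 is assumed.
PENALTIES = {
    ("Auxiliary Verb", "Main Verb"): -1,
    ("Adjective", "Noun"): -1,
    ("Verb", "Noun"): -1,
}

def score(parse):
    """Sum the weights of every (category, role) pair used in the parse."""
    return sum(PENALTIES.get((cat, role), 0) for _, cat, role in parse)

def best_parse(parses):
    """Pick the interpretation with the highest total weight."""
    return max(parses, key=score)

# Two hypothetical interpretations of "He is running".
progressive = [("he", "Pronoun", "Subject"),
               ("is", "Auxiliary Verb", "Auxiliary Verb"),
               ("running", "Verb", "Main Verb")]
copular = [("he", "Pronoun", "Subject"),
           ("is", "Auxiliary Verb", "Main Verb"),
           ("running", "Verb", "Noun")]

print(score(progressive), score(copular))
print(best_parse([copular, progressive]) is progressive)
```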
21English and Urdu Comparison
- SVO and SOV
- Order of Words in Phrases
- Many Forms of Adjective and Prepositions
- Many Forms of Verb
- Object Preposition and Subject Preposition
22SVO vs SOV
- English is a Subject-Verb-Object language.
- Hamid writes a letter.
- Urdu is a Subject-Object-Verb language.
- ???? ?? ????? ??
23Order of Words in Phrases
- For English
- ? Prep
- Example: of red color.
- For Urdu
- -- Prep
- Example: ??? ??? ??
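The two ordering differences above (verb after its object, preposition after its noun phrase) can be sketched as a depth-first rewrite of an English parse tree. The tree encoding, the category labels, and the `HEAD_LAST` table are assumptions for illustration; the words are left in English, since only the re-arrangement step is shown.

```python
# A leaf is (category, "word"); an inner node is (label, [children]).
# Phrases whose listed child moves to the end in Urdu order (assumed labels).
HEAD_LAST = {"VP": "V", "PP": "Prep"}

def to_urdu_order(node):
    """Depth-first traversal that moves verbs and prepositions to the end."""
    label, body = node
    if isinstance(body, str):
        return node
    kids = [to_urdu_order(child) for child in body]
    head = HEAD_LAST.get(label)
    if head is not None:
        kids = ([k for k in kids if k[0] != head]
                + [k for k in kids if k[0] == head])
    return (label, kids)

def leaves(node):
    """Read the words of a tree from left to right."""
    label, body = node
    if isinstance(body, str):
        return [body]
    return [word for child in body for word in leaves(child)]

sentence = ("S", [("NP", [("N", "Hamid")]),
                  ("VP", [("V", "writes"),
                          ("NP", [("Det", "a"), ("N", "letter")])])])
phrase = ("PP", [("Prep", "of"),
                 ("NP", [("Adj", "red"), ("N", "color")])])

print(leaves(to_urdu_order(sentence)))  # ['Hamid', 'a', 'letter', 'writes']
print(leaves(to_urdu_order(phrase)))    # ['red', 'color', 'of']
```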
24Many Forms of Adjective and Prepositions
- Blue Book, Blue Books, Blue Pen, Blue Pens
- ???? ????? ???? ??????? ???? ??? ? ???? ???
- Price of Book, Writer of Book
- ???? ?? ????? ???? ?? ????
25Many Forms of Adj and Prep (Contd.)
- Blue Color
- ???? ???
- Book of Blue Color
- (???) ???? ??? ?? ????
- ???? ??? ?? ???? (????)
26Many Forms of Verb
- Rule Based System for Verb Inflection
- The inflected form of a verb can depend on
- Tense of Sentence
- Gender, Number and Person of Subject or Object
- Transitive and Intransitive Verb
- Subject Preposition and Object Preposition
27Many Forms of Verb (examples)
- Verb Form depends on Subject (Gender, Number and
Person) and Tense
- ???? ???? ?????? ??
- ???? ???? ?????? ??
- Verb Form depends on Object (Gender, Number and
Person) and Tense
- ???? ?? ??? ?????
- ???? ?? ???? ?????
- ???? ?? ???? ?????
- Verb Form Depends on Verb Gender and Tense
- ???? ?? ???? ?? ??? ??
- ???? ?? ???? ?? ??? ??
28Subj Preposition and Obj Preposition
- Used in Past Indefinite Tense with a Transitive
Urdu Verb
- Commonly ?? is used with the Subject and ?? is used
with the Object
- ?? ?? ?? ?? ?????
- In some cases, other prepositions like ?? can be
used.
- ?? ?? ?? ?? ?????.
- Presence or absence of the Object Preposition
depends on the sense (semantic type) of the verb.
- ?? ?? ?? ?? ????? (He asked you)
- ? ? ?? ??? ???? ????? (He asked a question)
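A heavily simplified sketch of rule-based verb inflection along these lines. It is not the system's rule base: it uses romanized Urdu, covers only third-person forms of a regular consonant-final stem such as "likh" (write), assumes the subject marker is "ne", and ignores the object preposition entirely.

```python
# Suffixes keyed by (gender, number); romanization is an assumption.
PERFECTIVE = {("M", "Sg"): "a", ("M", "Pl"): "e",
              ("F", "Sg"): "i", ("F", "Pl"): "in"}
IMPERFECTIVE = {("M", "Sg"): "ta", ("M", "Pl"): "te",
                ("F", "Sg"): "ti", ("F", "Pl"): "ti"}
AUX_PRESENT = {"Sg": "hai", "Pl": "hain"}   # third person only

def inflect(stem, tense, transitive, subject, obj=None):
    """Return (subject marker, inflected verb).

    subject and obj are (gender, number) pairs, e.g. ("F", "Sg").
    """
    if tense == "past" and transitive and obj is not None:
        # Transitive past: subject is marked and the verb agrees with the object.
        return "ne", stem + PERFECTIVE[obj]
    if tense == "past":
        return "", stem + PERFECTIVE[subject]
    if tense == "present":
        # Present: the verb agrees with the subject.
        return "", stem + IMPERFECTIVE[subject] + " " + AUX_PRESENT[subject[1]]
    raise ValueError(f"unsupported tense: {tense}")

print(inflect("likh", "present", True, ("M", "Sg")))             # ('', 'likhta hai')
print(inflect("likh", "present", True, ("F", "Sg")))             # ('', 'likhti hai')
print(inflect("likh", "past", True, ("M", "Sg"), ("F", "Sg")))   # ('ne', 'likhi')
```

Note how the same masculine subject yields a feminine verb form in the past tense once the object is feminine, which is the dependency on the object described above.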
29Implementation of Translator
- Bottom Up Chart Parsing Framework
- Words in Database
- Grammar Rules in Database
- Transformational Framework
- Depth First Traversal of Parse Structure
- Script a Rule Body (corresponding to each rule in
database)
- Can be customized to other NLP problems like
Grammar Checking etc.
30Future Directions
- Improvement in Grammar
- Interrogative Sentences
- Verb Phrases acting as Noun (Example: Reading is
a good hobby)
- ...
- Statistical Disambiguation
- will select a suitable interpretation of a word
depending on its adjacent words.
- Improvements in Chart Parser
- Every production will have a weight; high-weight
elements will be tried first to get quick
results.
- Rule-based system for prepositions
- There is no one-to-one relationship between
English and Urdu prepositions.
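One way the proposed statistical disambiguation might look, as a sketch only: pick the category of an ambiguous word that co-occurs most often with the preceding word. The counts below are invented for illustration and do not come from any corpus; a real system would estimate them from tagged text.

```python
from collections import Counter

# Invented counts of (previous word, category of the following word).
CONTEXT_COUNTS = Counter({
    ("the", "Noun"): 90, ("the", "Verb"): 2,
    ("to", "Verb"): 80, ("to", "Noun"): 15,
})

def choose_category(prev_word, candidates):
    """Select the candidate category seen most often after prev_word."""
    return max(candidates, key=lambda cat: CONTEXT_COUNTS[(prev_word.lower(), cat)])

# "book" can be a noun ("the book") or a verb ("to book").
print(choose_category("the", ["Noun", "Verb"]))  # Noun
print(choose_category("to", ["Noun", "Verb"]))   # Verb
```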
31Urdu to English Translation
32Urdu To English Translation Issues
- SOV and OSV
- Light Verbs
- Noun Phrase Boundary
33SOV and OSV
- Most of the time, Urdu sentences are in SOV
form, but OSV is also grammatically valid.
- ??? ?? ?? ?? ?????
- ?? ?? ??? ?? ?????
34Light Verbs
- Verbs that come after main verbs.
- ??? ??? ???? ???
- I do work. (Incorrect)
- I work. (Correct)
- ??? ?? ?? ??? ???
- I gave wrote a letter. (Incorrect)
- I wrote a letter. (Correct)
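A sketch of one possible treatment, assuming the Urdu sentence has already been analysed into (English lemma, tag, tense) triples and that light verbs carry a hypothetical "LightVerb" tag. The light verb is not translated as a word of its own; it only passes its tense to the element before it.

```python
def merge_light_verbs(tokens):
    """Fold each light verb into the preceding element, keeping its tense."""
    merged = []
    for lemma, tag, tense in tokens:
        if tag == "LightVerb" and merged:
            prev_lemma, _, _ = merged[-1]
            merged[-1] = (prev_lemma, "Verb", tense)
        else:
            merged.append((lemma, tag, tense))
    return merged

# Word-for-word glosses of the two examples, still in Urdu word order.
work = [("I", "Pronoun", None), ("work", "Noun", None),
        ("do", "LightVerb", "present")]
letter = [("I", "Pronoun", None), ("letter", "Noun", None),
          ("write", "Verb", None), ("give", "LightVerb", "past")]

print(merge_light_verbs(work))    # "work" becomes the present-tense verb
print(merge_light_verbs(letter))  # "write" becomes the past-tense verb
```

Re-ordering into English word order and inflecting "write" as "wrote" would happen in later steps.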
35Noun Phrase Boundary
- ??? ??????? ????? ???? ?? ?????? ?? ??? ???-
- (NP boundaries are specified by ?? and ??)
- ??? ??????? ????? ???? ?????? ???
- (No Hint for NP Boundaries)