Name the components that will be implemented:

Slides: 5
Provided by: tdilM
Transcript and Presenter's Notes

1
Title: Transliteration Module
Proposer: M. S. Sridhar
Institution: Cyberscape Multimedia Limited
Language/Language pair: Any language to any language. The languages covered are Hindi, Marathi, Sanskrit, Gujarati, Punjabi, Bengali, Oriya, Assamese, Tamil, Kannada, Telugu, Malayalam, Romanised diacritical fonts and English.
Name the components that will be implemented: Universal Transliteration Engine
2
Language/Language pair: Between any Indian languages
Name the component: Universal Transliteration Engine
List the technique(s) that will be used: The engine will keep ISCII as the base coding system and Devanagari as the base script. The approach is universal: the component can produce a properly transliterated output in any target language. It will be font independent and can handle a variety of source and target font encodings, including Unicode.
Special pre-processing and post-processing algorithms will handle exceptional conditions. E.g. the implied halant at the end of a word in North Indian scripts should be converted to an explicit halant for the Dravidian languages; otherwise Kamal would be read as Kamala, Anil as Anila, etc.
Special algorithms will handle the additional matras, vowels and consonants that are missing between the language pair. Examples are the short "ay" matra found in Dravidian languages, the special LA found in Tamil and Malayalam, the special NA found in Tamil, and the special Ja found in Oriya and Bengali.
English will also be treated as an Indian script, which makes transliteration possible between any two of the Indian languages, including English. The component will deliver the target script in any font encoding, including Unicode.
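The pivot idea described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the proposed engine: it works on Unicode text rather than ISCII bytes, relying on the fact that the Unicode blocks for the major Indian scripts follow a common ISCII-derived layout, so a character's offset inside its block can serve as the shared code. The names `transliterate_word`, `BLOCK_START` and `EXCEPTIONS`, and the choice to fold Tamil's missing KHA, GA and GHA into KA, are hypothetical and chosen for the example.

```python
import unicodedata

# Start of each script's Unicode block. The blocks share an ISCII-derived
# layout, so the offset inside a block acts as a script-neutral pivot code.
BLOCK_START = {
    "devanagari": 0x0900, "bengali": 0x0980, "gurmukhi": 0x0A00,
    "gujarati": 0x0A80, "oriya": 0x0B00, "tamil": 0x0B80,
    "telugu": 0x0C00, "kannada": 0x0C80, "malayalam": 0x0D00,
}
BLOCK_SIZE = 0x80
HALANT_OFFSET = 0x4D               # virama / halant sits at the same offset
CONSONANTS = range(0x15, 0x3A)     # offsets of KA .. HA
DRAVIDIAN = {"tamil", "telugu", "kannada", "malayalam"}

# Hypothetical exception table: pivot offsets the target script lacks,
# mapped to the nearest available letter (illustrative entries only).
EXCEPTIONS = {
    "tamil": {0x16: 0x15, 0x17: 0x15, 0x18: 0x15},  # KHA, GA, GHA -> KA
}


def transliterate_word(word: str, source: str, target: str) -> str:
    """Transliterate one word by re-basing each character's block offset."""
    src, dst = BLOCK_START[source], BLOCK_START[target]
    out = []
    for ch in word:
        offset = ord(ch) - src
        if not 0 <= offset < BLOCK_SIZE:
            out.append(ch)                      # not a source-script character
            continue
        offset = EXCEPTIONS.get(target, {}).get(offset, offset)
        candidate = chr(dst + offset)
        # Unassigned slot in the target block: keep the source character so
        # the gap stays visible instead of emitting an invalid code point.
        out.append(ch if unicodedata.category(candidate) == "Cn" else candidate)

    # Post-processing: a word-final bare consonant in a North Indian script
    # carries an implied halant; make it explicit for Dravidian targets.
    if word and source not in DRAVIDIAN and target in DRAVIDIAN:
        if ord(word[-1]) - src in CONSONANTS:
            out.append(chr(dst + HALANT_OFFSET))
    return "".join(out)


print(transliterate_word("कमल", "devanagari", "oriya"))
print(transliterate_word("कमल", "devanagari", "telugu"))
```

The second call appends the Telugu halant after the final consonant, while the Oriya output does not, which mirrors the Kamal example above. A real engine would need far richer exception logic (the special LA, NA and Ja cases, font encodings, and English) than this table-driven sketch covers.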
3
What is the performance of these techniques in other languages? The method employed will produce better results in an automated mode. With a little exception logic to handle the peculiarities of each language, the accuracy can be increased, and a final accuracy of 95% can be obtained.
Give an estimate of the expected performance: It can be 99% among the languages of a group (Dravidian languages, eastern languages, etc.). It will be 95% and above for the other combinations.
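The slides do not say how these accuracy figures would be measured. One plausible reading, shown here purely as an assumption, is word-level accuracy: the percentage of words whose transliteration exactly matches a hand-checked reference. The function name `word_accuracy` and the romanised sample words are hypothetical.

```python
def word_accuracy(outputs: list[str], references: list[str]) -> float:
    """Percentage of output words that exactly match the reference words."""
    if len(outputs) != len(references):
        raise ValueError("outputs and references must have the same length")
    if not references:
        return 0.0
    correct = sum(out == ref for out, ref in zip(outputs, references))
    return 100.0 * correct / len(references)


# Illustrative data only: three of four words match the reference.
engine_output = ["kamal", "anil", "rama", "sita"]
checked_reference = ["kamal", "anil", "rama", "seeta"]
print(word_accuracy(engine_output, checked_reference))  # 75.0
```

A character-level measure would give higher numbers for the same output, so any quoted percentage should state which unit it counts.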
Name the domain for which the performance will be optimized: The domains will be selected in order of importance and commercial interest. The major domains are government administration, health and medicine, religion, politics, sports, science and technology, etc. The output generated can be used with any commercially available font. It can also be in Unicode.
4
A Typical Transliteration from Hindi to Oriya.