Transcription of names written in Farsi into English

Provided by: joshuaj7

1
Transcription of names written in Farsi into
English
  • Joshua Johanson
  • Inxight Software/Business Objects

2
Why Transcription?
  • محمدی
  • Transliteration: mhmdi
  • Transcription: Mohammadi

3
Standardize
  • Dictionary (6750 names): محمدی → Mohammadi
  • Orthographic (9355; e.g. Reza, Ali, Mohammadi): mhmdi → Mohammadi
  • Phonetic (13,758): فرنسیسکو → frnsisku → /frænsɪskoʊ/ → Francisco
  • Basic 8: mhmdi → Mhmdi

4
Standardize Input
  • Arabic letters are converted into Persian letters
  • ي → ی
  • ك → ک
  • Nonstandard encodings are standardized
  • ?, ??? - ??
  • Convert to UTF-16
  • Spacing and non-joiners are dealt with in the
    dictionary
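
The input standardization above can be sketched in a few lines. This is a minimal illustration, not the presenter's code: it assumes the two usual Arabic-to-Persian substitutions (yeh and kaf), and the names ARABIC_TO_PERSIAN, standardize, and to_utf16 are hypothetical.

```python
# Hypothetical mapping: only the two common Arabic/Persian letter pairs.
ARABIC_TO_PERSIAN = str.maketrans({
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
})


def standardize(name: str) -> str:
    """Replace Arabic code points with their Persian counterparts."""
    return name.translate(ARABIC_TO_PERSIAN)


def to_utf16(name: str) -> bytes:
    """Standardize, then encode as UTF-16 (little-endian, no BOM)."""
    return standardize(name).encode("utf-16-le")
```

A real pipeline would extend the table with whatever nonstandard encodings it meets in its own data.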

5
Dictionary Lookup
6
Create Dictionary
  • Web sites - such as yellow pages and baby name
    lists.
  • Account for spacing
  • Hand translated by Shahla Fahimi
  • Websites, internal lexicons, sample extractions

7
Dictionary Lookup
  • Look for complete names
  • Check for variance in diacritics
  • Diacritics only make a difference in 64 names
  • If the name is not one of those 64, ignore its diacritics
  • Check for possible affixes
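
The diacritic check on this slide might look like the sketch below. It assumes the dictionary stores the few diacritic-sensitive names with their diacritics and everything else without; DICTIONARY, strip_diacritics, and lookup are hypothetical names, and the single entry is toy data.

```python
import unicodedata

# Toy dictionary (hypothetical): Persian spelling -> English transcription.
DICTIONARY = {
    "\u0645\u062D\u0645\u062F\u06CC": "Mohammadi",
}


def strip_diacritics(name: str) -> str:
    """Drop combining marks (category Mn), which covers the Arabic short-vowel signs."""
    return "".join(ch for ch in name if unicodedata.category(ch) != "Mn")


def lookup(name: str):
    """Try the spelling as written, then fall back to the diacritic-free form."""
    if name in DICTIONARY:
        return DICTIONARY[name]
    return DICTIONARY.get(strip_diacritics(name))
```

Trying the exact spelling first is what lets a diacritic-sensitive entry win over its bare form.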

8
  • Prefix: پور (Pour), پیر (Pir), پیروز (Piruz)
  • Root: ????? (Ghorabi), ????? (Ghoraie), قربان (Ghorban)
  • Suffix: نژاد (Nejad), نیت (Neyat), نژادی (Nezhadi)
  • Example: قربان‌نژاد (Ghorbannejad)
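
The affix check can be sketched as a search over prefix, root, and suffix tables. This is an assumed decomposition strategy rather than the one actually used; the tables hold only a few entries from the slide, and split_affixes is a hypothetical helper.

```python
ZWNJ = "\u200c"  # zero-width non-joiner, often written between name parts

# Tiny illustrative tables (hypothetical subset of the real lexicon).
PREFIXES = {"پور": "Pour", "پیر": "Pir"}
ROOTS = {"قربان": "Ghorban"}
SUFFIXES = {"نژاد": "Nejad"}


def split_affixes(name: str):
    """Return a transcription if name is [prefix] + root + [suffix], else None."""
    bare = name.replace(ZWNJ, "").replace(" ", "")
    for p_fa, p_en in [("", "")] + list(PREFIXES.items()):
        if not bare.startswith(p_fa):
            continue
        rest = bare[len(p_fa):]
        for s_fa, s_en in [("", "")] + list(SUFFIXES.items()):
            if s_fa and not rest.endswith(s_fa):
                continue
            root = rest[:len(rest) - len(s_fa)]
            if root in ROOTS:
                # Join the parts and keep only the first letter capitalized.
                return (p_en + ROOTS[root] + s_en).capitalize()
    return None
```

Stripping spaces and non-joiners first means the same entry matches whether the parts are written joined or apart.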
9
Testing
  • Took names from Persian Wikipedia
  • Variety of origins
  • 62% of names were found by the dictionary

10
Orthographic Comparison
11
Orthographic Comparison
  • Gather lists of names (Dictionary, Wikipedia, web)
  • Use names to create a Finite State Transducer
  • Allow possible short vowels and doubled letters
  • Allow all ambiguous combinations
  • Special sequences - خوا (Kha), او (U), ای (I)
  • Check for joined names, like محمدرضا (Mohammad
    Reza).
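
The idea of allowing unwritten short vowels and doubled letters can be illustrated with a regular expression standing in for the finite-state transducer. This is a simplified sketch, not the presenter's FST: LETTER_OPTIONS covers only the letters needed for two examples, and pattern_for, match, and the name list are hypothetical.

```python
import re

# Possible Latin readings per Persian letter (hypothetical, partial table).
LETTER_OPTIONS = {
    "\u0645": ["m"],       # meem
    "\u062D": ["h"],       # heh
    "\u062F": ["d"],       # dal
    "\u06CC": ["i", "y"],  # yeh: vowel or consonant
    "\u06A9": ["k"],       # kaf
    "\u0627": ["a"],       # alef
    "\u0646": ["n"],       # noon
}


def pattern_for(persian: str) -> re.Pattern:
    """Each letter: one reading, optionally doubled, then an optional short vowel."""
    parts = []
    for ch in persian:
        readings = "|".join(LETTER_OPTIONS[ch])
        parts.append(f"(?:{readings})" + "{1,2}[aeo]?")
    return re.compile("".join(parts))


def match(persian: str, names) -> list:
    """Return the known names whose spelling the pattern accepts."""
    pattern = pattern_for(persian)
    return sorted(n for n in names if pattern.fullmatch(n))


KNOWN_NAMES = ["mohammadi", "kian", "koyan", "reza", "ali"]
```

With this table, match("\u06A9\u06CC\u0627\u0646", KNOWN_NAMES) accepts both kian and koyan, which mirrors the ambiguity a ranking step would have to resolve.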

12
(Diagram: finite-state transducer paths for کیان, yielding Koyan and Kian.)
13
Testing
  • Took names from dictionary
  • Gave the correct final answer 76% of the time
  • Found a right answer 83% of the time
  • Gave a wrong answer 13% of the time

14
Phonetic Comparison
15
Phonetic Comparison
  • Reverse transcription for names that have been
    transcribed into Farsi from another language
  • Transcription can be based on pronunciation.
  • Michael - مایکل (maykl)
  • Transcription can be based on spelling
  • Graham, Holmes
  • Alexander
  • Rider/Writer

16
Methodology
  • Use names from 1990 US Census
  • Names from a variety of origins
  • More resources for American names
  • Use CMU pronunciation dictionary for
    pronunciation of names
  • Convert to closest Persian pronunciation
  • Create FST similar to Orthographic comparison

17
Persianization
  • Split diphthongs
  • They can be written with two vowels, even if only
    one is used in English.
  • Convert non-existent sounds to closest existing
    sounds
  • Like /?/ and /w/
  • /ɾ/ (the flap) is not written in Persian; convert it to
    either t or d.
  • /z̥/ (voiceless z) can be /s/ or /z/
  • Add ا before names beginning with an s followed
    by a consonant
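
The Persianization rules above could be applied to CMU-style ARPAbet phoneme lists roughly as follows. Everything specific here is an assumption for illustration: the SPLIT and NEAREST tables, the choice of "EH" as the added initial vowel, and the function name persianize. Stress digits are assumed to be stripped already.

```python
# ARPAbet vowel symbols as used by the CMU pronouncing dictionary.
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
          "EY", "IH", "IY", "OW", "OY", "UH", "UW"}

# Hypothetical tables: how to split diphthongs and replace missing sounds.
SPLIT = {"AY": ["AA", "Y"], "AW": ["AA", "W"]}
NEAREST = {"W": ["V"], "TH": ["T", "S"]}


def persianize(phones):
    """Return, per output position, the list of allowed Persian-compatible phones."""
    out = []
    # Insert a vowel before an initial s + consonant cluster.
    if len(phones) > 1 and phones[0] == "S" and phones[1] not in VOWELS:
        out.append(["EH"])
    for p in phones:
        for q in SPLIT.get(p, [p]):          # split diphthongs first
            out.append(NEAREST.get(q, [q]))  # then map missing sounds
    return out
```

Keeping a list of alternatives at each position, instead of picking one, leaves the final choice to the later comparison step.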

18
(Diagram: phonetic finite-state transducer paths for the candidates Paul /pɔl/, Powell /paʊəl/, and Pavel /pæv?l/.)
19
(No Transcript)
20
Testing
  • Took list of 2190 foreign names
  • Not all names were American
  • Found answer for 1337 names

21
Basic Transcription
22
Standard Transcription
  • Most of the work was done by Karine Megerdoomian
  • I added diacritics
  • Most letters and diacritics have a direct,
    unambiguous transcription
  • Short vowels are not guessed
  • Intelligent guess is made for ambiguous characters
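
A letter-by-letter fallback of this kind can be sketched as a plain lookup table. The table below is a hypothetical fragment, and basic_transcribe is an illustrative name; the reading chosen for ی stands in for the "intelligent guess" and is an assumption.

```python
# Hypothetical fragment of a one-to-one letter table.
BASIC = {
    "\u0645": "m",  # meem
    "\u062D": "h",  # heh
    "\u062F": "d",  # dal
    "\u06CC": "i",  # yeh (ambiguous; "i" is an assumed default guess)
}


def basic_transcribe(name: str) -> str:
    """Map each letter directly; short vowels are not guessed."""
    return "".join(BASIC[ch] for ch in name).capitalize()
```

For محمدی this produces Mhmdi, the vowel-less form that the earlier stages are meant to improve on.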

23
(No Transcript)
24
(No Transcript)
25
Questions?