Interlingual word mapping



1
Interlingual word mapping
  • MTP I Stage Project Presentation
  • Guided by: Prof. Pushpak Bhattacharyya
  • Presented by: Abhijeet Padhye
  • Department of Computer Science and Engineering
  • Indian Institute of Technology, Bombay

2
Presentation pathway
  1. Motivation
  2. Introduction
  3. Introduction to Transliteration
  4. Syllables and their structure types
  5. Sonority Theory
  6. Relation between Sonority and Syllables
  7. What is Schwa?
  8. A Sonority theory based Syllabification module
  9. Results obtained
  10. References

3
Motivation
  • Language is an integral part of society
  • Each language has its own structure and rules
  • Some basic concepts are common to all languages
  • These shared concepts help in processes like transliteration,
    ultimately leading to better Cross-Lingual Information Retrieval (CLIR)
  • We try to exploit them for the process of
    syllabification

4
Problem statement
  • To study phonological similarities
    between English, Hindi and Marathi and exploit
    them to achieve high-accuracy
    transliteration, so as to
    tackle problems like out-of-vocabulary (OOV) words during
    Cross-Lingual Information Retrieval.

5
Introduction
  • Concepts being emphasized
  • Transliteration
  • Theory of Syllables
  • Sonority Theory
  • Their relation
  • Theory of Schwa and Schwa deletion
  • All are mainly based on the properties of sound
  • Sound is the driving force behind word pronunciation in any
    language

6
Introduction to transliteration
  • A process of phonetically translating named
    entities like proper nouns from a source language
    to a target language [1].
  • The process of transliteration should be as
    accurate as possible.
  • It faces the problem of multiple variants of words.

7
Proposed transliteration model

8
Basics of syllables
  • A syllable is a unit of spoken language
    consisting of a single uninterrupted sound, generally formed
    by a vowel, optionally preceded or followed by
    one or more consonants.
  • Vowels are the heart of a syllable (its most sonorous
    element)
  • Consonants act as sounds attached to vowels.

9
Syllable structure
  • A syllable consists of three major parts:
  • Onset (C)
  • Nucleus (V)
  • Coda (C)
  • Vowels sit in the Nucleus of a syllable
  • Consonants may get attached as Onset or Coda.
  • Basic structure - CV

10
Possible syllable structures
  • The Nucleus is always present
  • Onset and Coda may be absent
  • Possible structures
  • V
  • CV
  • VC
  • CVC
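
The four shapes listed above can be checked mechanically. The following is a minimal sketch, not part of the original module: the class name `CvSkeleton` and its methods are hypothetical, and it assumes romanized input in which only the letters a, e, i, o, u count as vowels.

```java
final class CvSkeleton {
    // Assumption: only the five vowel letters are nuclei (V); every other letter is a consonant (C).
    static String skeleton(String syllable) {
        StringBuilder sb = new StringBuilder();
        for (char ch : syllable.toLowerCase().toCharArray()) {
            if (!Character.isLetter(ch)) {
                continue; // ignore digits, hyphens, spaces
            }
            sb.append("aeiou".indexOf(ch) >= 0 ? 'V' : 'C');
        }
        return sb.toString();
    }

    // True only for the four shapes named on the slide: V, CV, VC, CVC.
    static boolean isBasicStructure(String syllable) {
        String s = skeleton(syllable);
        return s.equals("V") || s.equals("CV") || s.equals("VC") || s.equals("CVC");
    }
}
```

For instance, `skeleton("ma")` gives `CV` and `skeleton("dip")` gives `CVC`. A syllable with a consonant cluster, such as "plo" (CCV), falls outside the four basic shapes, so this check is only a first approximation.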

11
Syllable theories
  • Prominence Theory
  • E.g. entertaining /entəteɪnɪŋ/
  • The peaks of prominence are the vowels /e ə eɪ ɪ/
  • Number of syllables: 4
  • Chest Pulse Theory
  • Based on muscular activities
  • Sonority Theory
  • Based on the relative sonority (loudness) of segments within
    words

12
Introduction to sonority theory
  • The sonority of a sound is its loudness
    relative to other sounds with the same length,
    stress and pitch.
  • Every language has its own inventory of sounds
  • Some sounds are more sonorous than others
  • Words in a language can be divided into syllables
  • Sonority theory delimits syllables on the
    basis of the sonority of their sounds.

13
Sonority hierarchy
  • Defined on the basis of the amount of sound
    associated with each class of segment
  • The sonority hierarchy, from most to least sonorous:
  • Vowels (a, e, i, o, u)
  • Glides and liquids (y, r, l, v)
  • Nasals (n, m)
  • Fricatives (s, z, f, sh, th, etc.)
  • Affricates (ch, j)
  • Stops (b, d, g, p, t, k)
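
A hierarchy like this maps naturally onto a lookup table. The sketch below is an illustration, not the presenter's code: the class name `SonorityHierarchy`, the numeric ranks 1 to 6, and the treatment of `h` as a fricative are assumptions. Digraphs such as sh, th and ch are left out to keep one letter per sound.

```java
import java.util.HashMap;
import java.util.Map;

final class SonorityHierarchy {
    private static final Map<Character, Integer> RANK = new HashMap<>();

    private static void put(String letters, int rank) {
        for (char ch : letters.toCharArray()) {
            RANK.put(ch, rank);
        }
    }

    static {
        put("aeiou", 6);   // vowels: most sonorous
        put("yrlv", 5);    // second tier of the slide: y, r, l, v
        put("nm", 4);      // nasals
        put("szfh", 3);    // fricatives ('h' is an assumption, it is not listed on the slide)
        put("j", 2);       // affricates (the digraph "ch" is omitted)
        put("bdgptk", 1);  // stops: least sonorous
    }

    // Returns 0 for letters that are not in the table.
    static int rank(char ch) {
        return RANK.getOrDefault(Character.toLowerCase(ch), 0);
    }
}
```

With this table, `rank('a')` is greater than `rank('n')`, which in turn is greater than `rank('t')`, mirroring the order of the list above.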

14
Sonority scale
  • Obstruents can be further classified into-
  • Fricatives
  • Affricates
  • Stops

15
Sonority theory and syllables
  • A syllable is a cluster of sonority, defined by
    a sonority peak acting as a structural magnet to
    the surrounding lower sonority elements.
  • Represented as a wave of sonority, the Sonority
    Profile of that syllable
  • The Nucleus is the peak of the wave
  • The Onset and Coda lie on the slopes on either side

16
Sonority sequencing principle
  • The Sonority Profile of a syllable must rise
    until its Peak (Nucleus), and then fall.
  • Onset (rising) → Peak (Nucleus) → Coda (falling)
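
The principle can be tested on a single candidate syllable. This is a minimal sketch under stated assumptions: the class `SonoritySequencing`, the tier strings, and the strict rise/fall comparison are illustrative choices, and each letter is treated as one sound.

```java
final class SonoritySequencing {
    // Tiers from least to most sonorous, following the hierarchy of the earlier slide.
    // Assumption: 'h' is treated as a fricative; digraphs are not handled.
    private static final String[] TIERS = {"bdgptk", "j", "szfh", "nm", "yrlv", "aeiou"};
    private static final int VOWEL_RANK = TIERS.length;

    static int rank(char ch) {
        char c = Character.toLowerCase(ch);
        for (int i = 0; i < TIERS.length; i++) {
            if (TIERS[i].indexOf(c) >= 0) {
                return i + 1;
            }
        }
        return 0; // unknown letter
    }

    // True if sonority strictly rises to a vowel peak and strictly falls after it.
    static boolean obeysSsp(String syllable) {
        int n = syllable.length();
        if (n == 0) {
            return false;
        }
        int i = 0;
        while (i + 1 < n && rank(syllable.charAt(i)) < rank(syllable.charAt(i + 1))) {
            i++; // climb the rising slope (onset)
        }
        if (rank(syllable.charAt(i)) != VOWEL_RANK) {
            return false; // the peak must be a vowel
        }
        while (i + 1 < n && rank(syllable.charAt(i)) > rank(syllable.charAt(i + 1))) {
            i++; // descend the falling slope (coda)
        }
        return i == n - 1; // every letter must lie on one of the two slopes
    }
}
```

Here `obeysSsp("plo")` and `obeysSsp("art")` hold, while `obeysSsp("lpo")` does not, because sonority falls before the vowel is reached. Because the comparison is strict, a doubled vowel letter as in "jeet" is rejected; a fuller version would first merge such letters into one nucleus.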

17
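Examples
  • ABHIJEET
  • Sonority Profile 1 (letters grouped by sonority level, highest first)
  • Vowels: A I E E
  • Fricative and affricate: H J
  • Stops: B T
  • Sonority Profile 2 (letters grouped by sonority level, highest first)
  • Vowels: A I E E
  • Fricative and affricate: H J
  • Stops: B T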

18
Maximal onset principle
  • Intervocalic consonants are maximally
    assigned to the onsets of syllables, in conformity
    with universal and language-specific conditions.
  • Determines the underlying syllable division
  • Example
  • DIPLOMA
  • DIP-LO-MA becomes DI-PLO-MA (PL is a legal onset, so both consonants go to the onset)
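
The DIPLOMA example can be reproduced with a small routine. This is a sketch only: the class `MaximalOnset` is hypothetical, the set of legal onset clusters is a deliberately tiny sample chosen for illustration, and every single consonant is assumed to be a legal onset.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class MaximalOnset {
    // Hypothetical sample of legal onset clusters; a real list is language-specific.
    private static final Set<String> LEGAL_CLUSTERS = new HashSet<>(Arrays.asList(
            "pl", "pr", "bl", "br", "tr", "dr", "kl", "kr", "gl", "gr", "fl", "fr", "st", "sp"));

    private static boolean isVowel(char ch) {
        return "aeiou".indexOf(ch) >= 0;
    }

    // Assumption: the empty string and any single consonant are legal onsets.
    private static boolean isLegalOnset(String s) {
        return s.length() <= 1 || LEGAL_CLUSTERS.contains(s);
    }

    static List<String> syllabify(String word) {
        String w = word.toLowerCase();
        List<String> syllables = new ArrayList<>();
        int start = 0; // index where the current syllable begins
        int i = 0;
        while (i < w.length() && !isVowel(w.charAt(i))) {
            i++; // word-initial consonants belong to the first onset
        }
        while (i < w.length()) {
            while (i < w.length() && isVowel(w.charAt(i))) {
                i++; // consume the nucleus
            }
            int clusterStart = i;
            while (i < w.length() && !isVowel(w.charAt(i))) {
                i++; // consume the consonants that follow
            }
            if (i == w.length()) {
                break; // word-final consonants stay as the coda of the last syllable
            }
            // Give the next syllable the longest legal onset taken from the end of the cluster.
            int split = clusterStart;
            while (!isLegalOnset(w.substring(split, i))) {
                split++;
            }
            syllables.add(w.substring(start, split));
            start = split;
        }
        if (start < w.length()) {
            syllables.add(w.substring(start));
        }
        return syllables;
    }
}
```

`MaximalOnset.syllabify("diploma")` yields the syllables di, plo, ma, while "banda" is split as ban, da because "nd" is not in the sample onset set. Aspirated consonants written as two letters (for example "bh") would need to be treated as single units before this routine is applied to Hindi or Marathi names.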

19
The concept of schwa
  • The first letter of the alphabet in Indo-Aryan languages (IAL): a
  • An unstressed and toneless neutral vowel
  • Sanskrit is phonetically perfect: no neutral
    vowels
  • Hindi, Bengali etc. allow schwa to be neutral
  • Some schwas are deleted and some are not
  • Schwa deletion is an important issue for grapheme-to-phoneme
    conversion

20
Schwa deletion contexts
  1. Saphalya and Amantrana
  2. Priya and Tritiya
  3. Kavya and Ashva
  4. Badhai
  5. Samuha and Chehara
  6. Badara and Kalama
  7. Kalama and Banda

21
A sonority-theoretic model
  • Developed completely in Java
  • Platform independent
  • Attempts to syllabify words
  • Builds on the concepts of Sonority theory, mainly the
    sonority sequencing principle
  • Uses Java's HashMap to save
    execution time.

22
Technical overview
  • Consists of three major functions, plus one under development:
  • SonorityHierarchy()
  • syllabify(String word)
  • accuracy()
  • Delete_schwa() (under development)
  • Stores the sonority hierarchy in a HashMap and looks it up
    from there
  • Tries to find the syllable boundaries according
    to the sonority profile of the word
  • Tries to delete schwas present in the input
23
Results
  • Syllabification and PRR generation modules
    implemented
  • Number of manually syllabified words: 27614
  • No. of words fed as input: 27614
  • No. of words correctly syllabified: 26253
  • Accuracy obtained: 95.86% for English and about
    70% for Hindi
  • Accuracy of Schwa deletion in English: 77%
  • Schwa deletion for Hindi is under development
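
Word-level accuracy is normally the share of input words whose output matches the manual syllabification. The helper below is a hypothetical sketch of that calculation; the class and method names are not taken from the original module.

```java
final class Accuracy {
    // Percentage of correctly syllabified words among all input words.
    static double percent(int correct, int total) {
        if (total <= 0) {
            throw new IllegalArgumentException("total must be positive");
        }
        return 100.0 * correct / total;
    }
}
```

Applied to the counts listed above, `Accuracy.percent(26253, 27614)` comes to roughly 95.07. The slide reports 95.86 for English, and the transcript does not say how that figure was computed.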

24
Problems and future work
  • Problems faced
  • First rule-based implementation failed
  • Some specific consonant and vowel clusters still
    result in erroneous syllabification
  • Future work
  • Schwa deletion for Hindi and Marathi
  • Implementation of Maximal Onset First principle
  • Packaging the above implementation in a stable
    transliteration module to be used further in CLIR

25
References
  1. Giegerich, H. J. 1992. English Phonology: An
    Introduction.
  2. Kahn, Daniel. 1976. Syllable-based
    generalizations in English phonology.
  3. Lass, Roger. 1984. Phonology: An Introduction to Basic
    Concepts. Cambridge University Press.