Computational Investigation of Palestinian Arabic Dialects

1
Computational Investigation of Palestinian
Arabic Dialects
  • Ezra Daya
  • Rafi Talmon
  • Shuly Wintner

2
Background
  • The fieldwork study covers the Arabic dialects
    spoken in 250 localities:
    • Northern and central parts of Israel.
    • Localities in the West Bank.
    • Southern Lebanese communities in the Galilee.
    • 1948 Palestinian refugees living in existing Arab
      localities.

3
Background cont.
  • Features of colloquial Arabic:
    • A non-official spoken language, usually not
      written.
    • Differs from place to place.
    • The similarity/distance between the Arabic
      dialects can be measured.
    • Considered by its speakers less prestigious
      than official Arabic.

4
Background cont.
  • Work performed by dedicated teams:
    • Collecting and processing fieldwork material such
      as recorded interviews and linguistic
      questionnaires.
    • Transcribing the material, which constitutes
      the basis of our work.
    • Producing an accurate description of the language
      varieties of Palestinian colloquial Arabic, their
      characteristics, and their geographical
      distribution.

5
Transcribed Text Sample
6
Objectives
  • Publish the vast collected material using
    computational linguistic techniques, in order to:
    • Create lexicons and glossaries for the Arabic
      dialects automatically.
    • Create a linguistic atlas that graphically
      displays the measured similarities among the
      dialects.
    • Better understand morphological and phonemic
      dialectological features.

7
Linguistic Atlas
8
The Challenge: Rich Morphology
  • Semitic languages such as Arabic have a rich
    morphology with highly inflected forms.
  • Example:
    • axdat is the 3rd person, singular, feminine,
      past form of the verb axad.
    • It is obtained by concatenating the suffix -at
      to the base axad and reducing (dropping) the
      base's second vowel a: axad + at → axdat.
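The derivation of axdat can be illustrated as two steps: attach the suffix, then apply a vowel-reduction rule. The minimal Python sketch below assumes a simplified Latin transcription with short vowels a, i, u and a simplified rule (drop the base's last vowel when the base ends in vowel + consonant and the suffix begins with a vowel); the real conditioning in Palestinian dialects (stress, syllable structure) is more complex. The function names are hypothetical.

```python
# Short vowels in a simplified Latin transcription (assumption).
SHORT_VOWELS = set("aiu")


def reduce_stem(base: str, suffix: str) -> str:
    """Drop the base's last vowel when the base ends in vowel + consonant
    and the suffix starts with a vowel (simplified, hypothetical rule)."""
    if (
        suffix
        and suffix[0] in SHORT_VOWELS
        and len(base) >= 2
        and base[-2] in SHORT_VOWELS
        and base[-1] not in SHORT_VOWELS
    ):
        return base[:-2] + base[-1]
    return base


def inflect(base: str, suffix: str) -> str:
    """Concatenate the (possibly reduced) base with the suffix."""
    return reduce_stem(base, suffix) + suffix


print(inflect("axad", "at"))  # axdat (3rd person singular feminine past)
print(inflect("axad", "t"))   # axadt (consonant-initial suffix: no reduction)
```

Keeping concatenation and reduction as separate functions mirrors the slide's description of the form as concatenation plus a vowel change.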

9
Rich Morphology cont.
  • Arabic has a complex morphological system based
    on triconsonantal roots, as is common in Semitic
    languages.
  • For example, there are 10 verb patterns, each
    of which can be inflected for 3 numbers, 2
    genders, 3 persons, and several tenses and
    aspects, and can take several pronominal
    suffixes.
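A quick way to see how these dimensions multiply is to enumerate the feature combinations. The sketch below uses only the counts stated on the slide (10 patterns, 3 numbers, 2 genders, 3 persons) and leaves out tenses/aspects and pronominal suffixes, since the slide only says "several"; the feature labels and function names are hypothetical.

```python
from itertools import product
from math import prod

# Feature inventories as stated on the slide. Tenses/aspects and pronominal
# suffixes are omitted because the slide only says "several".
FEATURES = {
    "pattern": [f"form {i}" for i in range(1, 11)],  # 10 verb patterns
    "number": ["singular", "dual", "plural"],         # 3 numbers
    "gender": ["masculine", "feminine"],              # 2 genders
    "person": ["1st", "2nd", "3rd"],                  # 3 persons
}


def feature_bundles(features):
    """Yield every combination of feature values as a dict.

    This is an upper bound: some combinations are not realized
    (e.g. Arabic 1st-person verb forms do not distinguish gender).
    """
    names = list(features)
    for values in product(*(features[name] for name in names)):
        yield dict(zip(names, values))


def bundle_count(features):
    """Number of combinations, computed without enumerating them."""
    return prod(len(values) for values in features.values())


print(bundle_count(FEATURES))  # 180
```

Each tense/aspect and each pronominal suffix multiplies this count again, per root, which is why listing forms by hand scales poorly.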

10
Traditional Approach
  • Linguists perform grammatical analysis of the
    transcribed texts and manually create the
    lexicon, glossaries and linguistic atlas.
  • Disadvantages:
    • Lack of sophistication.
    • Time-consuming.
    • Requires expensive human resources.

11
Innovative Approach
  • Devise an automated analysis of the
    transcribed texts in order to obtain:
    • Automated creation of a glossary that organizes
      all lexical items by grammatical features,
      e.g. root and pattern.
    • Isolation of the phonetic and morphological
      features characteristic of specific dialects
      in the surveyed area.
    • Measurement of dialect similarity.
  • Automated processing provides accuracy
    and efficiency.
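A glossary organized by root and pattern can be pictured as root-and-pattern matching: a CV template such as maCCaC, where each C marks a root-consonant slot, is matched against a transcribed word to recover its root. This minimal sketch assumes the pattern inventory is given, that words are already stripped of affixes, and that long vowels are written doubled; the templates and example words (forms of k-t-b "write" and k-s-r "break") are illustrative only, and the function names are hypothetical.

```python
from collections import defaultdict

VOWELS = set("aiu")  # simplified Latin transcription (assumption)


def extract_root(word, pattern):
    """Match a word against a CV template ('C' = root consonant slot).

    Return the root as a 'k-t-b' style string, or None if the word
    does not fit the template.
    """
    if len(word) != len(pattern):
        return None
    root = []
    for char, slot in zip(word, pattern):
        if slot == "C":
            if char in VOWELS:  # a vowel cannot fill a consonant slot
                return None
            root.append(char)
        elif char != slot:      # fixed template material must match exactly
            return None
    return "-".join(root)


def build_glossary(words, patterns):
    """Group words by root, then by the pattern they instantiate."""
    glossary = defaultdict(lambda: defaultdict(list))
    for word in words:
        for pattern in patterns:
            root = extract_root(word, pattern)
            if root is not None:
                glossary[root][pattern].append(word)
    return glossary


PATTERNS = ["CaCaC", "maCCaC", "CiCaaC"]
WORDS = ["katab", "maktab", "kitaab", "kasar"]

for root, by_pattern in sorted(build_glossary(WORDS, PATTERNS).items()):
    print(root, dict(by_pattern))
```

A real analyzer would also need to resolve words that fit more than one pattern; this sketch simply lists every match.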

12
Linguistic Technologies
  • For this research we intend to exploit existing
    computational linguistics technology to
    investigate Palestinian Arabic dialects, using:
    • Finite-state technology.
    • Machine learning techniques.
    • Computational dialectology.

13
Finite-State Technology
  • Employing the Xerox finite-state tools and
    techniques, which:
    • Are useful and efficient programs for processing
      natural-language text.
    • Concentrate on morphological analysis and
      generation.
    • Give access to finite-state operations and a
      regular-expression compiler.
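To illustrate why transducers suit both morphological analysis and generation, here is a toy finite-state transducer in plain Python. It is not Xerox tool syntax or API: the arcs are hand-built for the single verb axad, and the tag names (+Past+3SgF, +Past+1Sg) are hypothetical. Each arc consumes a (possibly empty) input string and emits an output string; inverting the arcs turns the generator into an analyzer.

```python
from typing import NamedTuple


class Arc(NamedTuple):
    src: int
    inp: str  # string consumed from the input ("" = epsilon)
    out: str  # string emitted on the output
    dst: int


def transduce(arcs, start, finals, text):
    """Return every output for `text` (depth-first search over paths).

    Assumes the transducer has no cycles of epsilon-input arcs.
    """
    results = set()
    stack = [(start, 0, "")]
    while stack:
        state, pos, out = stack.pop()
        if pos == len(text) and state in finals:
            results.add(out)
        for arc in arcs:
            if arc.src == state and text.startswith(arc.inp, pos):
                stack.append((arc.dst, pos + len(arc.inp), out + arc.out))
    return results


def invert(arcs):
    """Swap input and output sides: a generator becomes an analyzer."""
    return [Arc(a.src, a.out, a.inp, a.dst) for a in arcs]


# Lexical side -> surface side for the verb axad (hand-built toy).
GENERATOR = [
    Arc(0, "a", "a", 1),
    Arc(1, "x", "x", 2),
    Arc(2, "a", "a", 3),            # keep the second vowel ...
    Arc(2, "a", "", 4),             # ... or delete it
    Arc(3, "d", "d", 5),
    Arc(4, "d", "d", 6),
    Arc(5, "+Past+1Sg", "t", 7),    # consonant-initial suffix, full stem
    Arc(6, "+Past+3SgF", "at", 7),  # vowel-initial suffix, reduced stem
]
FINALS = {7}

print(transduce(GENERATOR, 0, FINALS, "axad+Past+3SgF"))  # {'axdat'}
print(transduce(invert(GENERATOR), 0, FINALS, "axdat"))   # {'axad+Past+3SgF'}
```

Real finite-state toolkits compile such transducers from lexicons and rewrite rules rather than hand-listed arcs; the point here is only that one relation serves both directions.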

14
Machine Learning
  • Machine learning is concerned with the question
    of how to construct computer programs that
    automatically improve with experience.
  • Two distinct learning frameworks, according to
    the amount of supervision used:
    • Supervised learning: the learning algorithm is
      presented with pairs of strings of symbols, i.e.
      inflected and uninflected forms.
    • Unsupervised learning: the algorithm is
      presented merely with a single set of words and
      must work out what the morphological
      relationships are.
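The two frameworks can be contrasted with deliberately simple sketches. The supervised part learns "replace ending X with ending Y" rules from (inflected, uninflected) pairs; the unsupervised part counts endings whose removal leaves another word in the list. Both are simplified heuristics chosen for illustration, not a specific published algorithm, and the word lists are hypothetical toy data in a simplified transcription.

```python
from collections import Counter

# Toy data in a simplified transcription (hypothetical examples).
PAIRS = [("katabt", "katab"), ("axadt", "axad"), ("darast", "daras"),
         ("katabna", "katab"), ("darasna", "daras")]
WORDS = ["katab", "katabt", "katabna", "daras", "darast", "darasna",
         "axad", "axadt", "kasar"]


def common_prefix_len(a, b):
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def learn_suffix_rules(pairs):
    """Supervised: each (inflected, uninflected) pair yields a rule
    'replace ending X of the inflected form with ending Y'."""
    rules = Counter()
    for inflected, lemma in pairs:
        k = common_prefix_len(inflected, lemma)
        rules[(inflected[k:], lemma[k:])] += 1
    return rules


def lemmatize(word, rules):
    """Apply the most frequent matching rule (longer endings win ties)."""
    matches = [(count, len(strip), strip, add)
               for (strip, add), count in rules.items()
               if strip and word.endswith(strip)]
    if not matches:
        return word
    _, _, strip, add = max(matches)
    return word[:len(word) - len(strip)] + add


def suffix_candidates(words, max_len=3):
    """Unsupervised: count endings whose removal leaves another attested word."""
    vocabulary = set(words)
    counts = Counter()
    for word in words:
        for k in range(1, min(max_len, len(word) - 1) + 1):
            if word[:-k] in vocabulary:
                counts[word[-k:]] += 1
    return counts


rules = learn_suffix_rules(PAIRS)
print(lemmatize("kasart", rules))  # kasar
print(suffix_candidates(WORDS))    # Counter({'t': 3, 'na': 2})
```

Such plain suffix rules break down on stem changes: the learned rule t → "" would map a reduced form like katbat to katba, which is exactly the kind of rich-morphology case a real learner has to handle.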

15
Computational Dialectology
  • Use measures to compute the distance between
    two given dialects and to define geographical
    dialect boundaries.
  • Example: edit distance.
    • The distance can be made sensitive to
      phonological similarities.
  • Example
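A phonologically sensitive edit distance can be sketched as Levenshtein distance where substitutions between similar segments cost less than 1. In this sketch the similarity classes and the 0.5 cost are hypothetical choices (grouping q, ʔ, k, g as possible reflexes of Classical q is an illustrative assumption), the example strings are illustrative, and dialect distance is taken, as one simple option, to be the mean length-normalized distance over aligned word lists.

```python
# Hypothetical classes of phonologically similar segments.
SIMILAR_CLASSES = [set("qʔkg"), set("ae"), set("ou")]
SIMILAR_COST = 0.5  # hypothetical cost for a "similar" substitution


def substitution_cost(a, b):
    if a == b:
        return 0.0
    if any(a in cls and b in cls for cls in SIMILAR_CLASSES):
        return SIMILAR_COST
    return 1.0


def weighted_edit_distance(s, t):
    """Levenshtein distance with unit insertions/deletions and
    phonologically weighted substitutions (dynamic programming)."""
    prev = [float(j) for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [float(i)] + [0.0] * len(t)
        for j in range(1, len(t) + 1):
            cur[j] = min(
                prev[j] + 1.0,     # delete s[i-1]
                cur[j - 1] + 1.0,  # insert t[j-1]
                prev[j - 1] + substitution_cost(s[i - 1], t[j - 1]),
            )
        prev = cur
    return prev[-1]


def dialect_distance(words_a, words_b):
    """Mean length-normalized distance over two aligned word lists
    (position i holds the same concept in both dialects)."""
    pairs = list(zip(words_a, words_b))
    total = sum(weighted_edit_distance(a, b) / max(len(a), len(b), 1)
                for a, b in pairs)
    return total / len(pairs)


print(weighted_edit_distance("ʔalb", "galb"))  # 0.5
print(weighted_edit_distance("ʔalb", "salb"))  # 1.0
```

Computing dialect_distance for every pair of surveyed localities would give a distance matrix from which boundaries could then be drawn.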

16
Previous Related Work
  • Morphological Tagging of the Quran
    • The system supports a variety of queries on the
      Quranic text that refer to its words and their
      linguistic attributes, and it provides full
      morphological tagging of its words.
    • The core of the system is a set of finite-state
      rules that describe the morpho-phonological and
      morpho-syntactic phenomena of Quranic
      language.
    • The system is currently used for teaching and
      research purposes.