Investigating the Ancient Meroitic Language Using Statistical Natural Language Techniques: Zipf's Law and Word Co-Occurrences

1
Investigating the Ancient Meroitic Language Using
Statistical Natural Language Techniques: Zipf's
Law and Word Co-Occurrences
  • Reginald Smith
  • August 10, 2006
  • Sudan Studies Association Conference
  • Rhode Island College

2
Meroitic is the language of the ancient kingdom
of Kush
  • Used for almost six hundred years, from the 2nd
    century BCE to the 4th century CE
  • Written in a phonetic script, right to left (like
    Arabic)
  • Transliteration made possible by the work of British
    archaeologist F. Ll. Griffith around 1910

3
Meroitic remains largely undeciphered and an
enigma
  • No complete vocabulary is available
  • Some words such as place names, loan words, or
    simple concepts are known
  • For example, "qore" means
    "king"
  • Perhaps "qes" is "Kush"
  • Many attempts have been made to understand
    Meroitic using phonology or comparative
    linguistics
  • Scholars have tried in vain to find a known
    language that is a relative (see sources in
    paper)
  • We wish we had a bilingual text like the Rosetta
    Stone to guide us

4
A new method could use mathematics and linguistics
  • Statistical natural language processing analyzes
    the properties of language using a mix of
    statistics and linguistics
  • Several statistical properties are the same in
    all human languages
  • Certain techniques may also help us infer the
    meanings of words (by relating them to other
    known words)

5
Zipf's Law: Frequencies of Words
  • If you rank words in a text by how frequent they
    are (the number of times each word appears, with
    rank 1 being the most frequent) and then relate
    the rank to the frequency of the word, you get
    Zipf's Law
  • Zipf's Law: F = C / R^a, where F is the frequency
    of a word, C is a constant, R is the rank, and a is
    known as the power law exponent
  • For all languages, a ≈ 1
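
The ranking step described above can be sketched in a few lines of Python. This is only an illustration on a toy English sentence, not the code or data behind the slides; the function name `rank_frequency` is made up for the example.

```python
from collections import Counter

def rank_frequency(words):
    """Return (rank, word, frequency) tuples, with rank 1 = most frequent."""
    counts = Counter(words)
    # Sort by descending frequency; break ties alphabetically so the
    # output is deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, word, freq)
            for rank, (word, freq) in enumerate(ordered, start=1)]

# Toy English sample (an assumption for illustration, not Meroitic data)
sample = "the cat and the dog and the bird".split()
table = rank_frequency(sample)

for rank, word, freq in table:
    print(rank, word, freq)
```

Each row pairs a rank R with a frequency F, which is the input needed to check Zipf's Law on a real corpus.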

6
Zipf's Law Graphs
  • When you graph the frequency vs. the rank on a
    log-log graph (graphing the logarithm of
    frequency vs. the logarithm of rank) you get a
    straight line whose slope is -a
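
Taking logarithms of Zipf's Law gives log F = log C - a log R, so an ordinary least-squares line through the log-log points estimates the exponent. The sketch below assumes a plain least-squares fit; the slides do not say which fitting method was used, and `loglog_slope` is a hypothetical helper name.

```python
import math

def loglog_slope(frequencies):
    """Least-squares slope of log(frequency) vs. log(rank).

    Assumes at least two frequencies, all greater than zero.
    """
    freqs = sorted(frequencies, reverse=True)   # rank 1 = largest frequency
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Synthetic frequencies that follow F = 1000 / R exactly (so a = 1)
ideal = [1000 / r for r in range(1, 51)]
slope = loglog_slope(ideal)
print(round(slope, 3))
```

On real text the points only approximate a line, so the fitted slope is an estimate of -a rather than an exact value.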

[Figure: Zipf line fit on data. The red line is the line fitted
to the data points. Picture source: University of Helsinki CS
department]
7
Does Meroitic follow Zipf's Law?
  • The two graphs below show log-log plots of
    frequency vs. rank for the Meroitic words in 69
    texts. The slope is shown for each
  • The "normal" plot counts the words as they are. The
    "morpheme out" plot splits out suffixes like "lowi"
    into the separate words "lo" and "wi"
  • Since it has a slope of nearly -1, the "morpheme
    out" model of Meroitic seems to follow Zipf's Law

[Figure: two log-log plots of frequency vs. rank.
Normal plot: slope -0.81. Morpheme out plot: slope -1.03]
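
The "morpheme out" preprocessing can be pictured as a small token rewrite applied before counting. The sketch below is an assumption about how such a split might be coded: only the "lowi" to "lo" + "wi" rule comes from the slides, while the table name `SUFFIX_SPLITS`, the function `split_morphemes`, and the placeholder tokens are invented for illustration.

```python
# Hypothetical suffix table: only the "lowi" -> "lo" + "wi" rule is taken
# from the slides; a real analysis would need a fuller, vetted list.
SUFFIX_SPLITS = {"lowi": ["lo", "wi"]}

def split_morphemes(tokens, suffix_splits=SUFFIX_SPLITS):
    """Split known suffixes off each token and emit them as separate words."""
    out = []
    for token in tokens:
        for suffix, parts in suffix_splits.items():
            # Only split when a non-empty stem is left over.
            if token.endswith(suffix) and len(token) > len(suffix):
                out.append(token[: -len(suffix)])
                out.extend(parts)
                break
        else:
            # No suffix matched: keep the token unchanged.
            out.append(token)
    return out

# Placeholder tokens, not real Meroitic words
tokens = ["abclowi", "def", "ghilowi"]
split_tokens = split_morphemes(tokens)
print(split_tokens)
```

Counting `split_tokens` instead of `tokens` raises the frequency of the suffix pieces, which is one way the fitted slope could change between the two plots.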
8
So what does this show us (besides graphs)?
  • Despite the apparently small number of texts
    available, our sample of Meroitic is structured
    just like all other human languages (English,
    Chinese, etc.)
  • Therefore, even though we don't know the meaning
    of the words, we know that the sample of the
    language we have is representative
  • This holds even though most of our samples are
    redundant funeral stelae
  • We can then proceed to use other statistical
    techniques on Meroitic and also compare its
    statistical features to other languages

9
Step Two: Word Co-occurrence
  • When words occur together in a text, they are
    said to co-occur
  • "I am here" has co-occurrences between "I"-"am" and
    "am"-"here"
  • Co-occurrences can tell us about the words if we
    have enough of them
  • Words that co-occur with the same words often
    have similar parts of speech or even meanings
  • Can we use word co-occurrence in Meroitic to
    analyze classes of words?
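
As a small illustration of the definition above, adjacent-word co-occurrences can be extracted and counted like this. It is a sketch on English toy sentences, and the helper names `adjacent_pairs` and `cooccurrence_counts` are assumptions made for the example.

```python
from collections import Counter

def adjacent_pairs(words):
    """Return each pair of neighbouring words, in text order."""
    return list(zip(words, words[1:]))

def cooccurrence_counts(sentences):
    """Count adjacent-word pairs across a list of sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(adjacent_pairs(sentence.split()))
    return counts

pairs = adjacent_pairs("I am here".split())
print(pairs)

counts = cooccurrence_counts(["I am here", "I am there"])
print(counts[("I", "am")])
```

Pairs are only formed within a sentence, so the last word of one sentence is never paired with the first word of the next.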

10
What I did with Meroitic
  • I analyzed Meroitic by matching together words
    that co-occurred with the same types of words
  • For example, take the two sentences "I eat
    horses" and "We eat lizards"
  • I match "I" and "We" because they both co-occur
    with "eat"
  • I also match "horses" and "lizards" because they
    also co-occur with "eat" (in the opposite
    direction)
  • I then graph connected words together and analyze
    them with software
  • What happens?

Technical note: I actually used undirected edges
for co-occurring words in the graph shown on the
next slide
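
The matching idea from the "I eat horses" / "We eat lizards" example can be sketched as follows. This is a guess at one simple way to code it, not the software used for the slides: words are matched when they share a left neighbour or share a right neighbour, and each match is stored as an unordered pair. The name `match_by_shared_context` is hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def match_by_shared_context(sentences):
    """Return unordered word pairs that share a neighbouring word."""
    followers = defaultdict(set)   # word -> words seen right after it
    preceders = defaultdict(set)   # word -> words seen right before it
    for sentence in sentences:
        words = sentence.split()
        for left, right in zip(words, words[1:]):
            followers[left].add(right)
            preceders[right].add(left)

    matches = set()
    # Words that follow the same word, e.g. "horses"/"lizards" after "eat".
    for group in followers.values():
        matches.update(frozenset(p) for p in combinations(sorted(group), 2))
    # Words that precede the same word, e.g. "I"/"We" before "eat".
    for group in preceders.values():
        matches.update(frozenset(p) for p in combinations(sorted(group), 2))
    return matches

matches = match_by_shared_context(["I eat horses", "We eat lizards"])
for pair in sorted(sorted(p) for p in matches):
    print(pair)
```

Using `frozenset` for each pair means a match has no direction, so the pairs can be used directly as edges of an undirected graph.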
11
Meroitic Words Graph
  • Four main groups of words form, and they correspond
    well to Meroitic categories, including positions
    and titles, verbs, places, and miscellaneous nouns

[Figure: graph of Meroitic words with clusters labeled
Group 1, Group 2, Group 3, and Group 4]
12
Results
  • Techniques like word co-occurrence matching
    can help us categorize Meroitic words whose
    category we previously had to guess, by mapping
    them against words whose part of speech we
    already know
  • Similar statistical techniques may allow us to
    match words with a similar meaning to infer the
    meanings of some words
  • This is still speculative though
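
One very simple way to picture "mapping against words we already know" is a majority vote over the known categories of a word's matches. The sketch below is an assumption for illustration only: the slides do not describe the actual procedure, the function `guess_category` is hypothetical, and the words and labels are English toy data.

```python
from collections import Counter

def guess_category(word, matches, known):
    """Majority vote over the known categories of words matched with `word`.

    Returns None when no matched word has a known category.
    """
    votes = Counter(
        known[other] for other in matches.get(word, ()) if other in known
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]

# Toy data (assumed): which words were matched together, and which
# words already have a known part of speech.
matches = {"lizards": {"horses", "cows"}, "We": {"I"}, "zzz": set()}
known = {"horses": "noun", "cows": "noun", "I": "pronoun"}

print(guess_category("lizards", matches, known))
print(guess_category("We", matches, known))
print(guess_category("zzz", matches, known))
```

A guess made this way is only as reliable as the matches and the known labels behind it, which fits the caution that the results remain speculative.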

13
Conclusion
  • Statistical natural language processing is a new
    approach to Meroitic that could supplement other
    current efforts in the language
  • Much more work remains to be done, but this new
    avenue may help us move closer to the goal of
    understanding this beautiful and mysterious
    language
  • Acknowledgements: I give my boundless
    appreciation to Dr. Richard Lobban and Dr.
    Laurance Doyle for the help and advice they gave
    me on this paper's topics