With 6,500 languages in the world, we must explore new ways to learn, document, and share our linguistic knowledge.

About This Presentation

Title: Unlocking and Sharing LCTL Linguistic Knowledge. Last modified by: John Kovarik. Created: 11/14/2002 10:30:14 PM. Format: PowerPoint presentation.

Slides: 19

Transcript and Presenter's Notes



1
With 6,500 languages in the world, we must
explore new ways to learn, document,
and share our linguistic knowledge.
  • John J. Kovarik
  • NSA/CSS Senior Language Technology Authority

2
Unlocking and Sharing LCTL Linguistic Knowledge
Keywords: CFG parsing, language generation,
computational linguistics
  • CALICO 05
  • University of Michigan
  • Ann Arbor, MI May 17-20, 2005

3
The Challenges of Learning and Sharing Knowledge
of an LCTL in the 21st Century
  • John J. Kovarik
  • National Security Agency

4
Presentation Overview
  • General LCTL Challenges
  • Challenges of Learning Mongolian
  • Recipe for New Approach
  • Khalkha Mongolian Parts of Speech
  • Mongolian Morphological Affixes
  • Method of Lexical Knowledge Representation
  • Analyze, Parse, Build Grammar Model, Test
  • Iterate Repeatedly

5
LCTL Learning Challenges
  • Fewer Learned Resources to Learn from
  • Less Recognition Nationally
  • Fewer Opportunities to Document What's Learned
  • Very Few Students to Learn from You
  • Almost All Learning Done Manually
  • Few Reliable 21st Century Applications
  • Microsoft IME
  • Font

6
Mongolian Learning Challenges
  • Input Method Editor (IME)
  • Microsoft IME
  • Keyboard arranged for native Mongols
  • American Mongolists prefer phonetic keyboard
  • Cyrillic а key on Mongolian keyboard mapped to
    ASCII a, etc.
  • Fonts commonly used on Internet
  • Russian Cyrillic fonts are commonly used
  • and 0 commonly substituted for ? and ?
  • ? and ? often freely extended to ? and ?

7
Recipe for a New Approach
  • Take a student with a computational linguistics
    background
  • Infuse with curiosity and energy
  • Stir in access to the Internet
  • Add Mongolian syntax and morphology
  • Create morphological analyzer, context free
    parser, and grammatical generator for Mongolian
  • Resulting lexicons, software, and grammar models
    can be used by other linguistically adept
    students

8
Khalkha Mongolian Parts of Speech
  • Declinable Nouns
  • Declinable Adjectives
  • Inflected Verbs
  • Unchanging Adverbs
  • Declinable Converbs
  • Unchanging Postpositions
  • Unchanging Conjunctions
  • Unchanging Particles

9
Mongol Morphological Affixes
  • 27 verbal suffixes denoting tense and mood
  • 2 verb infixes denoting verb manner
  • Consultative
  • Passive
  • 6 verb paradigms or verb types
  • 3 irregular common verbs
  • 6 cases in singular and plural number
  • Both nouns and adjectives are declined

10
Lexical Knowledge Representations
  • Unchanging adverbs, conjunctions, particles, etc.
    and irregular verb forms (unchanging.txt file)
  • Lemmas of declinable nouns and adjectives
    (declinables.txt file)
  • Inflected verbs and nominalized verbs (regvb.txt
    file)
  • Affix files (casendings.txt, reflex.txt,
    infixes.txt, vbforms.txt)
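
The slides do not show the exact layout of these files, and the original tools were Perl scripts. As a rough Python sketch only, assuming each line holds one code followed by one word form, a loader might look like this. The function name `load_lexicon` and the sample entries are hypothetical; the codes N and Q are taken from the examples slide, but their pairing with these transliterated words is an assumption.

```python
import io


def load_lexicon(stream):
    """Read 'CODE FORM' pairs, one per line, into a dict mapping form -> code."""
    lexicon = {}
    for line in stream:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        code, form = line.split(maxsplit=1)
        lexicon[form] = code
    return lexicon


# Hypothetical stand-in for declinables.txt, in Latin transliteration.
sample = io.StringIO("N HURAL\nN AJIL\n\nQ BAGA\n")
declinables = load_lexicon(sample)
print(declinables)
```

Keying the dictionary on the word form keeps the later lemma lookup to a single dictionary access.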

11
Some Examples
  • declinables.txt file
  • N ??? Q ???
  • regverb.txt file
  • V ?? V ??
  • Affix files
  • casendings.txt g ??? d ? a ?? b ???
  • reflex.txt ?? ?? ??
  • infixes.txt C ?? R ?? P ??
  • vbforms.txt ipf ?? i1p ? i3p ??? Ypf ?????
  • unchanging.txt file
  • Pg -> ?????? Pc -> ????????????

12
Merge Morphology Knowledge with the Power of the
Computer
  • Wrote yalgah.pl to serve as a tireless lexical
    pedagogue
  • Searches for identifiable affixes by comparison
    with lexical knowledge affix files
  • Matches resulting lemma against lexical knowledge
    declinables, verbs, and unchanging words, then
    outputs word/part of speech tag to standard
    output file plus expository lexicon
  • If the lemma cannot be matched, outputs the lemma
    to the Out Of Vocabulary (oov) file, noting the
    affixes found
  • If the lemma can be matched, outputs the
    word/part of speech tag to the standard output file
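
The affix-stripping and lemma-matching steps above can be sketched roughly as follows. This is a minimal Python illustration, not the author's yalgah.pl. The words are the transliterated forms from the later slides, but the lexicon contents, the endings D and LOO, and the function name `analyze` are assumptions made for the example.

```python
# Toy lexical knowledge in Latin transliteration. All entries are
# illustrative assumptions, not the contents of the author's files.
DECLINABLES = {"HURAL": "N", "AJIL": "N", "BAGA": "J", "IH": "J"}
VERB_STEMS = {"OR"}
CASE_ENDINGS = {"D": "d"}      # assumed dative-locative ending
REFLEXIVE_ENDINGS = ["AA"]     # reflexive ending, as on the oov slide
VERB_FORMS = {"LOO": "i2p"}    # assumed second-past ending


def analyze(word):
    """Return (tag, notes); tag is None when the word is out of vocabulary."""
    notes = []
    # 1. Whole-word match: nominative nouns and bare adjectives.
    if word in DECLINABLES:
        pos = DECLINABLES[word]
        return ("Nn" if pos == "N" else pos), notes
    # 2. Verb stem plus a known verbal ending.
    for ending, code in VERB_FORMS.items():
        if word.endswith(ending):
            notes.append(f"possible verbal ending <{code}>-<{ending}>")
            if word[: -len(ending)] in VERB_STEMS:
                return "V" + code, notes
    # 3. Peel a reflexive ending, then a case ending, off a noun.
    stem = word
    for ending in REFLEXIVE_ENDINGS:
        if stem.endswith(ending):
            notes.append(f"possible reflexive ending <{ending}>")
            stem = stem[: -len(ending)]
            break
    for ending, code in CASE_ENDINGS.items():
        if stem.endswith(ending):
            notes.append(f"possible case ending <{code}>-<{ending}>")
            if DECLINABLES.get(stem[: -len(ending)]) == "N":
                return "N" + code, notes
    return None, notes


tags = [analyze(w)[0] for w in "BAGA HURAL IH AJILDAA ORLOO".split()]
print(" ".join(tags))  # J Nn J Nd Vi2p
```

When no lemma matches, the returned notes play the role of the affix hints written to the oov file.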

13
Additional Outputs
  • Expository Morphology File (named morphlex.txt)
  • IR -> verb command imperative 2nd person singular
  • IREEREY -> converb future perfect continuative
  • IREG -> verb command concessive 3rd person
    singular/plural
  • BAGA -> adjective
  • HURAL -> noun nominative
  • IH -> adjective
  • AJILDAA -> reflexive noun dative-locative
  • ORLOO -> verb indicative second past
  • Out Of Vocabulary File (named oov)
  • C > 5 0 E 0 0 A 0 0 (UNKNOWNAHAASAA)
    WORD 0 LINE 2
  • FALLS OUTSIDE OF VOCABULARY
  • possible reflexive ending <0 0 >-<AA>
  • possible declinable case ending <b>-<0 0 A >-<AAS>
  • possible verbal part of speech <Ypf >-<0 E >-<AH>
  • possible participial/converbal stem <C > 5
    >--<UNKNOWN>

14
Feed Analytic Output to Parser
  • Developed context-free grammar (CFG) rules for
    both discourse and newspaper texts
  • S -> Sbj Prd, S -> Prd, Sbj -> Nn, Sbj -> NP
  • NP -> Tg Nn, NP -> Tg Ng Nn, Prd -> J
  • Wrote parse.pl to validate CFG rules against
    input text tagged as to part of speech
  • When each sentence can be fully parsed, outputs a
    parse tree and an English gloss.
  • Working on "BAGA HURAL IH AJILDAA ORLOO ."
  • ENGLISH GLOSS large hural great work began .
  • The sentence does parse.
  • Branch nodes on tree
  • S -> (Sbj Prd)
  • Sbj -> (NP)
  • NP -> (J Nn)
  • Prd -> (NPd Vi2p)
  • NPd -> (J Nd)
  • POS J Nn J Nd Vi2p
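
To illustrate how CFG rules can be validated against a tagged sentence, here is a small top-down parser in Python. It is a sketch under stated assumptions, not the author's parse.pl: only the five rules visible in the parse tree on this slide are encoded, and the names `RULES`, `parse`, `expand`, and `parse_sentence` are hypothetical.

```python
# Rules taken from the branch nodes of the parse tree on this slide.
RULES = {
    "S": [["Sbj", "Prd"]],
    "Sbj": [["NP"]],
    "NP": [["J", "Nn"]],
    "Prd": [["NPd", "Vi2p"]],
    "NPd": [["J", "Nd"]],
}


def parse(symbol, tags, start):
    """Yield (tree, next_index) for each way symbol can cover tags from start."""
    if symbol not in RULES:
        # Terminal: it must equal the part-of-speech tag at this position.
        if start < len(tags) and tags[start] == symbol:
            yield symbol, start + 1
        return
    for rhs in RULES[symbol]:
        yield from expand(symbol, rhs, tags, start)


def expand(symbol, rhs, tags, start):
    """Match the right-hand side left to right, keeping every alternative."""
    partial = [([], start)]
    for child in rhs:
        partial = [
            (kids + [tree], nxt)
            for kids, pos in partial
            for tree, nxt in parse(child, tags, pos)
        ]
    for kids, end in partial:
        yield (symbol, kids), end


def parse_sentence(tags):
    """Return the first tree that covers every tag, or None."""
    for tree, end in parse("S", tags, 0):
        if end == len(tags):
            return tree
    return None


print(parse_sentence("J Nn J Nd Vi2p".split()))
```

This simple strategy does not terminate on left-recursive rules, so a grammar containing them would need a chart parser or a rewritten rule set.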

15
Feed Output to Generator
  • Wrote gramgen.pl to generate sentences based on
    lexical knowledge, morphological knowledge, and
    syntactic knowledge gained
  • Output routinely reviewed for accuracy and
    Chomskian explanatory adequacy of the grammar
    models created for the parser and generator
    engines
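
Generation from the same kind of grammar can be sketched as a random expansion of the start symbol. The Python below is only an illustration, not the author's gramgen.pl. The rules come from the parse tree shown earlier, while the `WORDS` table, which assigns the transliterated example words to tags, and the function name `generate` are assumptions.

```python
import random

# Rules from the parse tree on the parser slide.
RULES = {
    "S": [["Sbj", "Prd"]],
    "Sbj": [["NP"]],
    "NP": [["J", "Nn"]],
    "Prd": [["NPd", "Vi2p"]],
    "NPd": [["J", "Nd"]],
}
# Assumed tag-to-word table built from the example sentence.
WORDS = {
    "J": ["BAGA", "IH"],
    "Nn": ["HURAL"],
    "Nd": ["AJILDAA"],
    "Vi2p": ["ORLOO"],
}


def generate(symbol="S", rng=random):
    """Expand symbol into a list of words, choosing rules and words at random."""
    if symbol in WORDS:
        return [rng.choice(WORDS[symbol])]
    words = []
    for child in rng.choice(RULES[symbol]):
        words.extend(generate(child, rng))
    return words


rng = random.Random(0)  # fixed seed so a review session can be repeated
for _ in range(3):
    print(" ".join(generate("S", rng)), ".")
```

Reading such output is what exposes overgeneration: any sentence a fluent reader rejects points to a rule or lexicon entry that is too permissive.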

16
Iterative Process
  • First take new newspaper article or dialogue and
    run morphological analyzer on it until all words
    are listed within vocabulary (no output in the
    oov Out Of Vocabulary file)
  • Run output through parser, creating new CFG rules
    until new text parses
  • Run generator for a hundred or more examples to
    ensure adequacy of new rules
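
One pass of this loop can be sketched as a function that reports what still needs work. This Python sketch is hypothetical throughout: `review_text` is an invented name, and the analyzer and parser are passed in as plain callables, with toy stand-ins used in the demonstration.

```python
def review_text(sentences, analyze, parse):
    """Return (oov_words, unparsed_sentences) for one pass over a text.

    analyze(word) returns a tag or None; parse(tags) returns a tree or None.
    """
    oov_words, unparsed = [], []
    for sentence in sentences:
        words = sentence.split()
        tags = []
        for word in words:
            tag = analyze(word)
            if tag is None:
                oov_words.append(word)   # extend the lexicon, then rerun
            else:
                tags.append(tag)
        # Only try to parse once every word has a tag.
        if len(tags) == len(words) and parse(tags) is None:
            unparsed.append(sentence)    # add CFG rules, then rerun
    return oov_words, unparsed


# Toy stand-ins for the analyzer and parser, for demonstration only.
toy_lexicon = {"BAGA": "J", "HURAL": "Nn"}
toy_parse = lambda tags: ("NP", tags) if tags == ["J", "Nn"] else None

print(review_text(["BAGA HURAL", "HURAL BAGA", "IH HURAL"],
                  toy_lexicon.get, toy_parse))
```

The text is ready for the generator step once both returned lists are empty.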

17
Morpho-analyzer, Parser, Generator Software Led
This Student to Deeper Understanding of Mongolian
  • A linguistically adept learner can thus write
    software that helps one learn more deeply and faster
  • Language tool development is thus grounded in
    gaining and applying language knowledge in a
    systematic and linguistically principled manner
    for oneself and others

18
Contact Information
  • John Kovarik
  • Email kovarik_at_afterlife.ncsc.mil
  • Home Page http//www.worldnet.att/kovariks
  • Phone 443-479-7188