Morphological Analysis of Hungarian in NooJ

1
Morphological Analysis of Hungarian in NooJ
  • Peter Vajda
  • Hungarian Academy of Sciences
  • Research Institute for Linguistics

2
Summary
  • Hungarian morphology
  • Linguistic resources
  • Some experiments with INTEX/NooJ
  • The solution
  • Examples
  • Derivation

3
Hungarian morphology
  • Agglutinative (and sometimes inflectional)
  • The suffixes
  • Can have many forms (vowel harmony)
  • Can change the form of the stem (there are groups
    of variants)
  • bokor (sg.) → bokr-ok (pl.); alma (sg.) → almá-k (pl.)
  • Sometimes begin with a linking vowel
  • plural -k / -ak / -ek / -ok / -ök
  • A noun (adj., num.) can have 700-800 forms
  • A verb can have 80 forms
  • Orthography: difficulties arise when digraphs are
    doubled
  • cs → cscs → ccs, gy → gygy → ggy
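
The digraph rule above can be sketched in a few lines of Python. This is only an illustration of the spelling convention, not part of the NooJ resources; the list of multi-letter consonants and the function name `double_consonant` are assumptions made for the example.

```python
# Hungarian consonants written with more than one letter (assumed inventory).
MULTIGRAPHS = ("dzs", "cs", "dz", "gy", "ly", "ny", "sz", "ty", "zs")


def double_consonant(letter: str) -> str:
    """Return the orthographic doubled form of a consonant letter."""
    if letter in MULTIGRAPHS:
        # Only the first character is repeated: cs -> ccs, not cscs.
        return letter[0] + letter
    # Single-letter consonants are simply written twice.
    return letter * 2


for consonant in ("cs", "gy", "t"):
    print(consonant, "->", double_consonant(consonant))
```

A finite-state description has to encode the same shortcut, since a naive concatenation would produce the unshortened spellings cscs and gygy.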

4
Nominal inflections
  • 18 cases (nominative, accusative, dative, ...):
    grammatical relations that French/English express
    with prepositions
  • Possession is expressed by suffixes
  • These mark the number and person of the possessor
    and the number of the possessed
  • ház-a-m, ház-a-d, ház-a (my/your/his house)
  • ház-a-i-m, ház-a-i-d, ház-a-i (my/your/his
    houses)
  • Anaphorical possessive
  • A ház Péteré → The house is Péter's; A házak
    Péteréi → The houses are Péter's
  • A word can carry up to five inflectional suffixes
  • barát-ai-tok-é-i-t
  • (I can see) those (things) of your friends

5
Verbal inflections
  • Two tenses: present, past
  • Three moods: indicative, conditional, imperative
  • definite and indefinite conjugations
  • Néz-ek egy asztalt vs. Néz-em az asztalt
  • I watch a table vs. I watch the table
  • one special form where the subject is in 1st
    person and the object is in the 2nd
  • néz-lek (I watch you)
  • infinitive and conjugated infinitive (sometimes
    rendered by the subjunctive in French)

6
The resources
  • Dictionary of Hungarian inflections (Elekfi,92)
  • A traditional description, profound and
    exhaustive
  • Two dimensional classification
  • Vowel harmony (3 classes) and
  • complex features of the stems (stem-types,
    linking vowel, etc., 55 classes)
  • Altogether 1700 different sub-classes
    (paradigms)
  • systematic differences and similarities are
    hidden
  • not convenient to use in finite-state transducers
  • We have converted it into a database from which
    we can retrieve all the forms

7
The experiments with INTEX/NooJ
  • Brute-force method
  • We created one graph per sub-class for testing
    INTEX
  • 1700 sub-graphs
  • 45000 paths in the graphs
  • Using only dictionaries (.nod)
  • Dictionary of stems (70000 words)
  • ház,ház,NC2Astem1NW
  • Dictionary of suffixes (one million entries)
  • ()ak,<1NC2Astem1>0,1L,N1SanaPL
  • ()am,<1NC2Astem1>0,1L,N1SanaPSe1
  • ()at,<1NC2Astem1>0,1L,N1SanaACC
  • ()at,<1NC2A1stem1>0,1L,N1SanaACC
  • ()amat,<1NC2Astem1>0,1L,N1SanaPSe1ACC
  • dictionary of lexical forms (which have a zero
    morpheme as suffix)
  • ház,ház,NanaNOM
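
The dictionary-only experiment can be pictured with a toy lookup: split the word at every position, look the left part up among the stems and the right part among the suffixes, and accept the split when both belong to the same sub-class. The sketch below is an assumption-laden illustration, not NooJ's actual lookup; the sub-class string is copied from the transcript and used only as an opaque key, and the names `STEMS`, `SUFFIXES`, `LEXICAL_FORMS` and `analyze` are hypothetical.

```python
# Toy versions of the three dictionaries described above.
STEMS = {"ház": "NC2Astem1"}  # stem -> paradigm sub-class (opaque key)
SUFFIXES = {  # suffix -> list of (sub-class, analysis)
    "ak": [("NC2Astem1", "PL")],
    "am": [("NC2Astem1", "PSe1")],
    "at": [("NC2Astem1", "ACC")],
    "amat": [("NC2Astem1", "PSe1ACC")],
}
LEXICAL_FORMS = {"ház": [("ház", "NOM")]}  # forms with a zero suffix


def analyze(word):
    """Return every (stem, analysis) pair licensed by the dictionaries."""
    results = list(LEXICAL_FORMS.get(word, []))
    for cut in range(1, len(word)):
        stem, suffix = word[:cut], word[cut:]
        stem_class = STEMS.get(stem)
        if stem_class is None:
            continue
        for suffix_class, analysis in SUFFIXES.get(suffix, []):
            # The stem and the suffix must belong to the same sub-class.
            if suffix_class == stem_class:
                results.append((stem, analysis))
    return results


print(analyze("házamat"))
print(analyze("ház"))
```

Because every suffix sequence has to be listed once per sub-class, the suffix dictionary grows very quickly, which is consistent with the one million entries reported above.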

8
The linguistic solution
  • transform the database into a grammar based on
    morpho-phonological features
  • The grammatical features of stems and morphemes
    are in the dictionary
  • The features of the stems and the suffixes can be
    unified
  • Grammar
  • We have to describe the order of the morphemes
  • Introduce features which select from the
    allomorphs

9
The order of morphemes for nominals

10
The order of morphemes for nominals
  • barát-a-i-tok-é-i-t
  • barát,N PS PL ps_2 ps_pl ANAPi ACC

11
(No Transcript)

12
Morpho-phonological features
  • To introduce features we examine the allomorphs
  • HÁZ / HAJÓ
  • HÁZ-A / HAJÓ-JA
  • ház,,N+nonj / hajó,,N+j
  • HÁZ-AT / HAJÓ-T
  • ház,,N+nonj+acclink / hajó,,N+j+accnolink
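
The way such features select an allomorph can be sketched as a subset test: each suffix variant names the stem features it requires, and the first variant whose requirements are met is attached. This is a minimal illustration under assumptions, not the NooJ grammar itself; the feature names follow the slide, while the table layout, the tags `PS` and `ACC`, and the function `inflect` are hypothetical. Vowel harmony is ignored, so only the back-vowel variants appear.

```python
# Stem entries with their morpho-phonological features (from the slide).
STEMS = {
    "ház": {"N", "nonj", "acclink"},
    "hajó": {"N", "j", "accnolink"},
}

# Each allomorph lists the stem features it requires.
ALLOMORPHS = {
    "PS": [("a", {"nonj"}), ("ja", {"j"})],
    "ACC": [("at", {"acclink"}), ("t", {"accnolink"})],
}


def inflect(stem, morpheme):
    """Attach the allomorph whose required features the stem provides."""
    features = STEMS[stem]
    for form, required in ALLOMORPHS[morpheme]:
        if required <= features:  # all required features are present
            return stem + form
    raise ValueError(f"no allomorph of {morpheme} fits {stem}")


for stem in ("ház", "hajó"):
    print(inflect(stem, "PS"), inflect(stem, "ACC"))
```

The point of the design is that the stem entry carries a handful of features instead of a paradigm number, so stems that behave alike for one suffix share that feature even when they differ elsewhere.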

13
The dictionary

14
(No Transcript)

15
The plural and the accusative
  • kalap-ot (hat, SG+ACC)
  • kalap-ok-at (hats, PL+ACC)

16
Derivation
  • Can change the category (POS) or leave it unchanged
  • Introduces new features
  • kosár → kosar-ak (pl.): basket
  • kosar-as → kosar-as-ok (pl.): basketball
    player
  • Simple cases are handled by graphs
  • Others are listed as lemmas in the dictionary

17
Assimilation and digraphs
  • some suffixes (e.g. -val/-vel) enforce total
    assimilation
  • LÉC + VEL → LÉCCEL
  • PÉCS + VEL → PÉCCSEL
  • PLÉD + VEL → PLÉDDEL
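
The three examples above can be reproduced with a short Python sketch. It is a simplification under stated assumptions, not the NooJ treatment: the suffix vowel is chosen by a crude harmony test (any back vowel in the stem selects -val), stems that already end in a doubled consonant are not handled, and the names `MULTIGRAPHS`, `final_consonant` and `instrumental` are hypothetical.

```python
# Multi-letter consonants, trigraph first so that it wins over "zs".
MULTIGRAPHS = ("dzs", "cs", "dz", "gy", "ly", "ny", "sz", "ty", "zs")
VOWELS = set("aáeéiíoóöőuúüű")
BACK_VOWELS = set("aáoóuú")


def final_consonant(stem):
    """Return the last consonant letter of the stem, treating multigraphs as one."""
    for multigraph in MULTIGRAPHS:
        if stem.endswith(multigraph):
            return multigraph
    return stem[-1]


def instrumental(stem):
    """Attach -val/-vel, assimilating v to a stem-final consonant."""
    stem = stem.lower()
    # Crude vowel harmony (assumption): any back vowel selects -val.
    vowel = "a" if any(ch in BACK_VOWELS for ch in stem) else "e"
    if stem[-1] in VOWELS:
        return stem + "v" + vowel + "l"  # v survives after a vowel
    last = final_consonant(stem)
    body = stem[: len(stem) - len(last)]
    # A doubled multigraph repeats only its first letter: cs -> ccs.
    doubled = last[0] + last if len(last) > 1 else last * 2
    return body + doubled + vowel + "l"


for word in ("LÉC", "PÉCS", "PLÉD"):
    print(word, "->", instrumental(word).upper())
```

The stem-final consonant has to be identified as a unit before it is doubled, which is why the digraph list is consulted first; treating PÉCS as ending in plain s would give the wrong spelling.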

18
Conclusion
  • We have adapted the traditional description
  • We have described the inflectional morphology of
    Hungarian in NooJ grammars/dictionaries
  • Handled some of the derivational morphology
  • Objectives
  • Find a simpler method for derivation
  • Disambiguation
  • Automatic methods to expand the dictionary
  • Automatic delegation of features

19
Thank you