Title: Learning Morphological Disambiguation Rules for Turkish
1. Learning Morphological Disambiguation Rules for Turkish
- Deniz Yuret
- Ferhan Türe
- Koç University, Istanbul
2. Overview
- Turkish morphology
- The morphological disambiguation task
- The Greedy Prepend Algorithm
- Training
- Evaluation
3. Turkish Morphology
- Turkish is an agglutinative language. Many syntactic phenomena expressed by function words and word order in English are expressed by morphology in Turkish.
- I will be able to go.
- (go) (able to) (will) (I)
- git ebil ecek im
- Gidebileceğim.
4. Fun with Turkish Morphology
Avrupalılaştıramadıklarımızdanmışsınız
- Avrupa Europe
- lı European
- laş become
- tır make
- ama not able to
- dık we were
- larımız those that
- dan from
- mış were
- sınız you
- (You are said to be one of those we could not Europeanize.)
5. So how long can words be?
- uyu sleep
- uyut make X sleep
- uyuttur have Y make X sleep
- uyutturt have Z have Y make X sleep
- uyutturttur have W have Z have Y make X sleep
- uyutturtturt have Q have W have Z have Y make X sleep
6. Morphological Analyzer for Turkish
- masalı
- masal+Noun+A3sg+Pnon+Acc (the story)
- masal+Noun+A3sg+P3sg+Nom (his story)
- masa+Noun+A3sg+Pnon+Nom^DB+Adj+With (with tables)
- Oflazer, K. (1994). Two-level description of Turkish morphology. Literary and Linguistic Computing.
- Oflazer, K., Hakkani-Tür, D. Z., and Tür, G. (1999). Design for a Turkish treebank. EACL'99.
- Beesley, K. R. and Karttunen, L. (2003). Finite State Morphology. CSLI Publications.
7. Features, IGs and Tags
masa+Noun+A3sg+Pnon+Nom^DB+Adj+With
- 8 unique tags
- 11,084 distinct tags observed in a 1M-word training corpus
- 126 unique features
- 9,129 unique IGs
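To make the tag / IG / feature terminology concrete, here is a minimal Python sketch that splits a tag like the example above into its inflectional groups and features. The helper name and string handling are illustrative assumptions, not part of the original system.

# Minimal sketch: splitting a morphological tag into its inflectional groups
# (IGs) and features. Derivation boundaries (^DB) separate IGs; '+' separates
# the root and the features. Illustrative only.

def parse_tag(tag):
    igs = tag.split("^DB")                      # derivation boundaries separate IGs
    first = igs[0].split("+")
    root, first_features = first[0], first[1:]
    rest = [ig.lstrip("+").split("+") for ig in igs[1:]]
    return root, [first_features] + rest

root, igs = parse_tag("masa+Noun+A3sg+Pnon+Nom^DB+Adj+With")
print(root)   # masa
print(igs)    # [['Noun', 'A3sg', 'Pnon', 'Nom'], ['Adj', 'With']]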
8. Why not just do POS tagging?
[Figure from Oflazer (1999)]
9. Why not just do POS tagging?
- Inflectional groups can independently act as heads or modifiers in syntactic dependencies.
- Full morphological analysis is essential for further syntactic analysis.
10. Morphological disambiguation
- Ambiguity is rare in English
- lives: live+s or life+s
- More serious in Turkish
- 42.1% of the tokens are ambiguous
- 1.8 parses per token on average
- 3.8 parses for ambiguous tokens
11. Morphological disambiguation
- Task: pick the correct parse given the context
- masal+Noun+A3sg+Pnon+Acc
- masal+Noun+A3sg+P3sg+Nom
- masa+Noun+A3sg+Pnon+Nom^DB+Adj+With
- Uzun masalı anlat (Tell the long story)
- Uzun masalı bitti (His long story ended)
- Uzun masalı oda (Room with a long table)
12. Morphological disambiguation
- Task: pick the correct parse given the context
- masal+Noun+A3sg+Pnon+Acc
- masal+Noun+A3sg+P3sg+Nom
- masa+Noun+A3sg+Pnon+Nom^DB+Adj+With
- Key Idea
- Build a separate classifier for each feature.
13. Decision Lists
- Rule 1: If (W = çok) and (R1 = +DA) Then W has +Det
- Rule 2: If (L1 = pek) Then W has +Det
- Rule 3: If (W = +AzI) Then W does not have +Det
- Rule 4: If (W = çok) Then W does not have +Det
- Rule 5: If TRUE Then W has +Det
- pek çok alanda (matches Rule 1)
- pek çok insan (matches Rule 2)
- insan çok daha (matches Rule 4)
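As an illustration of how such a decision list is applied (rules are tried top to bottom and the first one whose conditions all hold determines the answer), here is a minimal Python sketch. The attribute encoding is an assumption made for illustration, not the authors' code.

# Minimal sketch: applying the +Det decision list above. An instance is a set of
# (position, attribute) pairs; a rule fires when its conditions are a subset of them.
det_rules = [
    ({("W", "çok"), ("R1", "+DA")}, True),   # Rule 1: W has +Det
    ({("L1", "pek")}, True),                 # Rule 2: W has +Det
    ({("W", "+AzI")}, False),                # Rule 3: W does not have +Det
    ({("W", "çok")}, False),                 # Rule 4: W does not have +Det
    (set(), True),                           # Rule 5: default (If TRUE)
]

def classify(instance, rules):
    """Return the class of the first rule whose conditions are all satisfied."""
    for conditions, label in rules:
        if conditions <= instance:
            return label

# "pek çok alanda": target word çok, left neighbor pek, right neighbor carries +DA
print(classify({("W", "çok"), ("L1", "pek"), ("R1", "+DA")}, det_rules))  # True (Rule 1)
# "insan çok daha": Rules 1-3 fail, Rule 4 fires
print(classify({("W", "çok"), ("L1", "insan"), ("R1", "daha")}, det_rules))  # False (Rule 4)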
14Greedy Prepend Algorithm
GPA(data) 1 dlist NIL 2 default-class
Most-Common-Class(data) 3 rule If TRUE Then
default-class 4 while Gain(rule, dlist, data) gt
0 5 do dlist prepend(rule, dlist) 6
rule Max-Gain-Rule(dlist, data) 7 return
dlist
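Below is a minimal runnable Python sketch of GPA under some simplifying assumptions of mine: instances are frozensets of attributes, candidate rules are formed by adding one attribute to a rule already in the list, and Gain is the change in the number of correctly classified training instances. It is one reading of the pseudocode above, not the authors' implementation.

# Minimal GPA sketch (assumptions noted above; illustrative only).
from collections import Counter

def classify(dlist, instance):
    for conditions, label in dlist:           # first matching rule wins
        if conditions <= instance:
            return label

def correct(dlist, data):
    return sum(classify(dlist, x) == y for x, y in data)

def gain(rule, dlist, data):
    return correct([rule] + dlist, data) - correct(dlist, data)

def max_gain_rule(dlist, data, labels):
    attributes = {a for x, _ in data for a in x}
    best, best_gain = None, float("-inf")
    for conditions, _ in dlist:               # specialize an existing rule by one attribute
        for attr in attributes - conditions:
            for label in labels:
                candidate = (conditions | {attr}, label)
                g = gain(candidate, dlist, data)
                if g > best_gain:
                    best, best_gain = candidate, g
    return best

def gpa(data):
    labels = {y for _, y in data}
    dlist = []                                            # dlist <- NIL
    default = Counter(y for _, y in data).most_common(1)[0][0]
    rule = (frozenset(), default)                         # If TRUE Then default-class
    while rule is not None and gain(rule, dlist, data) > 0:
        dlist = [rule] + dlist                            # prepend the max-gain rule
        rule = max_gain_rule(dlist, data, labels)
    return dlist

# Tiny example: learn when the target word should get +Det
data = [
    (frozenset({("W", "çok"), ("L1", "pek"), ("R1", "+DA")}), True),
    (frozenset({("W", "çok"), ("L1", "pek")}), True),
    (frozenset({("W", "çok"), ("R1", "daha")}), False),
]
print(gpa(data))  # learned list: If (R1, daha) then False; If TRUE then True (default)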
15. Training Data
- 1M words of news material
- Semi-automatically disambiguated
- Created 126 separate training sets, one for each feature
- Each training set only contains instances which have the corresponding feature in at least one of their parses
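A minimal sketch of how such per-feature training sets could be assembled. The data layout (each token given as its attributes, its candidate parses, and the correct parse, with parses represented as feature sets) is an assumption for illustration.

# Minimal sketch: one training set per feature. A token contributes an instance
# to feature f only if f occurs in at least one of its candidate parses; the
# label records whether the correct parse contains f. Data layout is assumed.
from collections import defaultdict

def build_training_sets(tokens):
    training_sets = defaultdict(list)
    for attributes, candidate_parses, correct_parse in tokens:
        for feature in set().union(*candidate_parses):
            training_sets[feature].append((attributes, feature in correct_parse))
    return training_sets

# Example token: "masalı" with its three candidate parses
token = (frozenset({("W", "masalı")}),
         [{"Noun", "A3sg", "Pnon", "Acc"},
          {"Noun", "A3sg", "P3sg", "Nom"},
          {"Noun", "A3sg", "Pnon", "Nom", "Adj", "With"}],
         {"Noun", "A3sg", "Pnon", "Acc"})          # correct parse
sets = build_training_sets([token])
print(sorted(sets))          # every feature seen in some parse gets this instance
print(sets["Acc"])           # [(frozenset({('W', 'masalı')}), True)]
print(sets["P3sg"])          # [(frozenset({('W', 'masalı')}), False)]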
16. Input attributes
- For a five-word window
- The exact word string (e.g. W=Ali'nin)
- The lowercase version (e.g. W=ali'nin)
- All suffixes (e.g. W=n, W=In, W=nIn, W='nIn, etc.)
- Character types (e.g. Ali'nin would be described with W=UPPER-FIRST, W=LOWER-MID, W=APOS-MID, W=LOWER-LAST)
- Average 40 attributes per instance.
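A minimal sketch of this attribute extraction for a single word position. The attribute naming follows the examples above; the vowel-harmony generalization of suffixes (capitalized archiphonemes such as I and A) is omitted, and the code is illustrative rather than the authors' implementation.

# Minimal sketch: input attributes for one word position ("W"; the same is done
# for the neighbors L2, L1, R1, R2 in the five-word window). Suffixes are kept
# literal rather than generalized with archiphonemes.

def char_type(c, position):
    if c == "'":
        return "APOS-" + position
    if c.isupper():
        return "UPPER-" + position
    if c.islower():
        return "LOWER-" + position
    return "OTHER-" + position

def word_attributes(word, prefix="W"):
    attrs = {f"{prefix}={word}", f"{prefix}={word.lower()}"}   # exact and lowercase forms
    for i in range(1, len(word)):                              # all proper suffixes
        attrs.add(f"{prefix}={word.lower()[i:]}")
    if word:                                                   # character-type attributes
        attrs.add(f"{prefix}={char_type(word[0], 'FIRST')}")
        attrs.add(f"{prefix}={char_type(word[-1], 'LAST')}")
        for c in word[1:-1]:
            attrs.add(f"{prefix}={char_type(c, 'MID')}")
    return attrs

print(sorted(word_attributes("Ali'nin")))
# includes W=Ali'nin, W=ali'nin, W=n, W=in, W=nin, W='nin, ...,
# W=UPPER-FIRST, W=LOWER-MID, W=APOS-MID, W=LOWER-LAST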
17. Sample decision lists
Acc:
  0
  1  W=InI
  1  W=yI
  1  W=UPPER0
  1  W=IzI
  1  L1=bu
  1  W=onu
  1  R1=mAK
  1  W=beni
  0  W=günü
  1  W=InlArI
  1  W=onları
  0  W=olAyI
  0  W=sorunu
  (672 rules)
Prop:
  1
  0  W=STFIRST
  0  W=Türk
  1  W=STFIRST R1=UCFIRST
  0  L1=.
  0  W=AnAl
  1  R1=,
  0  W=yAD
  1  W=UPPER0
  0  W=lAD
  0  W=AK
  1  R1=UPPER
  0  W=Milli
  1  W=STFIRST R1=UPPER0
  (3476 rules)
18. Models for individual features
19. Combining models
- masal+Noun+A3sg+P3sg+Nom
- masal+Noun+A3sg+Pnon+Acc
- Decision list results and confidence (only distinguishing features necessary):
- P3sg: yes (89.53%)
- Nom: no (93.92%)
- Pnon: no (95.03%)
- Acc: yes (89.24%)
- score(P3sg+Nom) = 0.8953 × (1 - 0.9392)
- score(Pnon+Acc) = (1 - 0.9503) × 0.8924
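A minimal sketch of this scoring step, using the predictions and confidences shown above; the function name and data layout are illustrative assumptions.

# Minimal sketch: score each candidate parse by multiplying, over its
# distinguishing features, the classifier's confidence that the feature is
# present (or 1 - confidence when the classifier voted "absent").

# (prediction, confidence) per feature, from the example above
predictions = {"P3sg": (True, 0.8953), "Nom": (False, 0.9392),
               "Pnon": (False, 0.9503), "Acc": (True, 0.8924)}

def parse_score(features, predictions):
    score = 1.0
    for f in features:
        present, conf = predictions[f]
        score *= conf if present else (1 - conf)
    return score

# Only the distinguishing features of the two parses are needed here.
print(parse_score(["P3sg", "Nom"], predictions))   # 0.8953 * (1 - 0.9392) ≈ 0.0544
print(parse_score(["Pnon", "Acc"], predictions))   # (1 - 0.9503) * 0.8924 ≈ 0.0444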
20. Evaluation
- Test corpus: 1000 words, hand-tagged
- Accuracy: 95.87% (confidence interval 94.57-97.08)
- Better than the training data!?
21. Other Experiments
- Retraining on own output: 96.03%
- Training on unambiguous data: 82.57%
- Forget disambiguation, let's do tagging with a single decision list: 91.23%, 10,000 rules
22. Contributions
- Learning morphological disambiguation rules using the GPA decision list learner.
- Reducing data sparseness and increasing noise tolerance by using separate models for individual output features.
- ECOC, WSD, etc.