Morphology - PowerPoint PPT Presentation

Title:

Morphology

Description:

Words usually consist of a root plus affix(es), though some ... Morphophonemics. Morphemes and allomorphs. e.g. {plur}: (e)s, vowel change, y→ies, f→ves, um ...

Slides: 32
Provided by: Har134

Transcript and Presenter's Notes



1
Morphology
  • See:
  • Harald Trost, "Morphology", Chapter 2 of R. Mitkov
    (ed.), The Oxford Handbook of Computational
    Linguistics, Oxford: OUP (2004)
  • D. Jurafsky & J.H. Martin, Speech and Language
    Processing, Upper Saddle River, NJ: Prentice
    Hall (2000), Chapter 3 (quite technical)

2
Morphology - reminder
  • Internal analysis of word forms
  • morpheme, allomorphic variation
  • Words usually consist of a root plus affix(es),
    though some words can have multiple roots, and
    some consist of a single morpheme
  • lexeme: abstract notion of a group of word forms
    that belong together
  • Related notions: lexeme, root, stem, base form,
    dictionary (citation) form

3
Role of morphology
  • Commonly made distinction: inflectional vs
    derivational
  • Inflectional morphology is grammatical
  • e.g. number, tense, case, gender
  • Derivational morphology concerns word building
  • e.g. part-of-speech derivation
  • words with related meaning

4
Inflectional morphology
  • Grammatical in nature
  • Does not carry meaning, other than grammatical
    meaning
  • Highly systematic, though there may be
    irregularities and exceptions
  • Simplifies the lexicon: only exceptions need to be
    listed
  • Unknown words may be guessable
  • Language-specific and sometimes idiosyncratic
  • (Mostly) helpful in parsing

5
Derivational morphology
  • Lexical in nature
  • Can carry meaning
  • Fairly systematic, and predictable up to a point
  • Simplifies description of the lexicon: regularly
    derived words need not be listed
  • Unknown words may be guessable
  • But:
  • Apparent derivations may have specialised meanings
  • Some derivations are missing
  • Languages often have parallel derivations which
    may be translatable

6
Morphological processes
  • Affixes: prefix, suffix, infix, circumfix
  • Vowel change (umlaut, ablaut)
  • Gemination, (partial) reduplication
  • Root and pattern
  • Stress (or tone) change
  • Sandhi

7
Morphophonemics
  • Morphemes and allomorphs
  • e.g. {plur}: (e)s, vowel change, y→ies, f→ves,
    um→a, ∅, ...
  • Morphophonemic variation
  • Affixes and stems may have variants which are
    conditioned by context
  • e.g. -ing in lifting, swimming, boxing, raining,
    hoping, hopping (consonant doubling, e-deletion)
  • Rules may be generalisable across morphemes
  • e.g. (e)s in cats, boxes, tomatoes, matches,
    dishes, buses
  • Applies to both {plur} (nouns) and {3rd sing
    pres} (verbs)
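The (e)s and -ing alternations above can be sketched as plain string-handling rules. This is a minimal sketch under simplifying assumptions, and the function names are hypothetical: it covers e-insertion after sibilants, y→ie after a consonant, e-deletion and consonant doubling, but not -o plurals such as tomatoes, which would have to be listed separately.

```python
import re

VOWELS = set("aeiou")


def add_s(stem: str) -> str:
    """Attach -(e)s, i.e. {plur} on nouns or {3rd sing pres} on verbs.

    Only three spelling rules are modelled:
      - e-insertion after s, x, z, ch, sh: box -> boxes, match -> matches
      - y -> ie after a consonant:          spy -> spies (but toy -> toys)
      - otherwise plain -s:                 cat -> cats
    """
    if re.search(r"(s|x|z|ch|sh)$", stem):
        return stem + "es"
    if re.search(r"[^aeiou]y$", stem):
        return stem[:-1] + "ies"
    return stem + "s"


def add_ing(stem: str) -> str:
    """Attach -ing, modelling two stem alternations.

      - final silent e is deleted: hope -> hoping (but see -> seeing)
      - a single final consonant after a single vowel is doubled:
        hop -> hopping, swim -> swimming (w, x, y are never doubled)
    Stress-sensitive cases such as visit -> visiting are NOT handled.
    """
    if stem.endswith("e") and not stem.endswith("ee"):
        return stem[:-1] + "ing"
    if (len(stem) >= 3 and stem[-1] not in VOWELS | set("wxy")
            and stem[-2] in VOWELS and stem[-3] not in VOWELS):
        return stem + stem[-1] + "ing"
    return stem + "ing"
```

For example, add_s("match") gives "matches" and add_ing("swim") gives "swimming"; visit would wrongly come out as visitting, which is where lexical information has to take over.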

8
Morphology in NLP
  • Analysis vs synthesis
  • "What does dogs mean?" vs "What is the plural
    of dog?"
  • Analysis
  • Need to identify lexeme
  • Tokenization
  • To access lexical information
  • Inflections (etc.) carry information that will be
    needed by other processes (e.g. agreement is useful
    in parsing; inflections can carry meaning, e.g.
    tense, number)
  • Morphology can be ambiguous
  • May need another process to disambiguate (e.g.
    German -en)
  • Synthesis
  • Need to generate appropriate inflections from
    underlying representation
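Analysis and synthesis can be contrasted with a tiny full-form lexicon. In this sketch the lexicon and the feature labels (N+PL, V+3SG) are hypothetical; note that analysis may return more than one reading, which is exactly the ambiguity another process has to resolve.

```python
# Hypothetical full-form lexicon: (surface form, lemma, features).
ENTRIES = [
    ("dog", "dog", "N+SG"),
    ("dogs", "dog", "N+PL"),
    ("dogs", "dog", "V+3SG"),   # "she dogs his footsteps": a second reading
    ("goose", "goose", "N+SG"),
    ("geese", "goose", "N+PL"),
]


def analyse(form: str) -> list[tuple[str, str]]:
    """Analysis: surface form -> every (lemma, features) reading."""
    return [(lemma, feats) for surf, lemma, feats in ENTRIES if surf == form]


def synthesise(lemma: str, feats: str) -> list[str]:
    """Synthesis: lemma + features -> surface form(s)."""
    return [surf for surf, lem, f in ENTRIES if lem == lemma and f == feats]
```

Listing every form like this is what the rule-based approaches discussed next try to avoid.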

9
Morphology in NLP
  • String-handling programs can be written
  • More general approach
  • formalism to write rules which express
    correspondence between surface and underlying
    form (e.g. dogs ↔ dog + plur)
  • Computational algorithm (program) which can apply
    those rules to actual instances
  • Especially of interest if the rules (though not
    the program) are independent of direction:
    analysis or synthesis
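One way to make the rules independent of direction is to store them as data and let two small functions read them forwards (synthesis) or backwards (analysis). The rule table, tags and stem lexicon below are hypothetical illustrations rather than a standard formalism; the stem lexicon also shows how category information restricts which rules apply.

```python
# Hypothetical rule table: (lexical tag, surface suffix, category the rule applies to).
RULES = [
    ("+PL", "s", "N"),
    ("+SG", "", "N"),
    ("+3SG", "s", "V"),
]
# Hypothetical stem lexicon carrying category information.
STEMS = {"dog": {"N", "V"}, "cat": {"N"}, "walk": {"V"}}


def generate(stem: str, tag: str) -> list[str]:
    """Synthesis: stem + lexical tag -> surface form(s)."""
    return [stem + surf for lex, surf, cat in RULES
            if lex == tag and cat in STEMS.get(stem, set())]


def analyse(word: str) -> list[str]:
    """Analysis: surface form -> stem + lexical tag, reading the same rules backwards."""
    results = []
    for lex, surf, cat in RULES:
        if word.endswith(surf):
            # Slice by length so that an empty suffix leaves the word intact.
            stem = word[: len(word) - len(surf)]
            if cat in STEMS.get(stem, set()):
                results.append(stem + lex)
    return results
```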

10
Role of lexicon in morphology
  • Rules interact with the lexicon
  • Obviously category information
  • e.g. rules that apply to nouns
  • Note also morphology-related subcategories
  • e.g. -er verbs in French, rules for gender
    agreement
  • Other lexical information can impact on
    morphology
  • e.g. all fish have two forms of the plural (s and
    ∅)
  • in Slavic languages, case inflections differ for
    inanimate and animate nouns

11
Problems with rules
  • Exceptions have to be covered
  • Including systematic irregularities
  • May be a trade-off between treating something as
    a small group of irregularities or as a list of
    unrelated exceptions (e.g. French irregular verbs,
    English f→ves)
  • Rules must not over/under-generate
  • Must cover all and only the correct cases
  • May depend on what order the rules are applied in
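The effect of rule order, and of leaving exceptions uncovered, can be shown with an ordered list of rules where the first rule that applies wins. This is a hypothetical sketch: with the exception list first the output is correct, while with the catch-all -s rule first the rules overgenerate gooses and foxs.

```python
from typing import Callable, Optional

Rule = Callable[[str], Optional[str]]

# Hypothetical list of irregular plurals that must be covered explicitly.
IRREGULAR = {"goose": "geese", "mouse": "mice", "sheep": "sheep"}


def exception_rule(noun: str) -> Optional[str]:
    return IRREGULAR.get(noun)


def sibilant_rule(noun: str) -> Optional[str]:
    return noun + "es" if noun.endswith(("s", "x", "z", "ch", "sh")) else None


def default_rule(noun: str) -> Optional[str]:
    return noun + "s"   # always applies, so anything listed after it never fires


def pluralise(noun: str, rules: list[Rule]) -> str:
    """Apply the first rule in the list that produces a result."""
    for rule in rules:
        result = rule(noun)
        if result is not None:
            return result
    raise ValueError(f"no rule applies to {noun!r}")


GOOD_ORDER = [exception_rule, sibilant_rule, default_rule]
BAD_ORDER = [default_rule, sibilant_rule, exception_rule]
```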

12
Tokenization
  • The simplest form of analysis is to reduce
    different word forms to tokens
  • Also called normalization
  • For example, if you want to count how many times
    a given word occurs in a text
  • Or you want to search for texts containing
    certain words (e.g. Google)
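A minimal normalization step for counting word occurrences might case-fold the text and split it into alphabetic tokens. This sketch assumes punctuation can simply be discarded; note that dog and dogs are still counted separately, which is what stemming addresses.

```python
import re
from collections import Counter


def normalise(text: str) -> list[str]:
    """Case-fold and split into alphabetic tokens (punctuation is dropped)."""
    return re.findall(r"[a-z]+", text.lower())


counts = Counter(normalise("The dog barked. Dogs bark; the DOG barks."))
print(counts["dog"], counts["the"], counts["dogs"])  # 2 2 1
```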

13
Morphological processing
  • Stemming
  • String-handling approaches
  • Regular expressions
  • Mapping onto finite-state automata
  • 2-level morphology
  • Mapping between surface form and lexical
    representation

14
Stemming
  • Stemming is the particular case of tokenization
    which reduces inflected forms to a single base
    form or stem
  • (Recall our discussion of stem, base form,
    dictionary form, citation form)
  • Stemming algorithms are basic string-handling
    algorithms, which depend on rules which identify
    affixes that can be stripped
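A stemming algorithm of this kind can be sketched as longest-match suffix stripping. The suffix list and minimum stem length below are hypothetical choices for illustration; well-known stemmers such as Porter's use many more rules and conditions.

```python
# Hypothetical suffix list, tried longest first.
SUFFIXES = ["ing", "ed", "es", "s"]


def stem(word: str, min_stem: int = 3) -> str:
    """Strip the longest listed suffix, provided at least min_stem letters remain."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word
```

stem("boxes") gives "box" and stem("bus") is left alone, but stem("hoping") gives "hop" and stem("swimming") gives "swimm": the stem need not be a real word, and distinct lexemes can collapse together.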

15
Finite state automata
  • A finite state automaton is a simple and
    intuitive formalism with straightforward
    computational properties (so easy to implement)
  • A bit like a flow chart, but can be used for both
    recognition (analysis) and generation
  • FSAs have a close relationship with regular
    expressions, a formalism for describing sets of
    strings, mainly used for searching texts or
    stipulating patterns of strings

16
Finite state automata
  • A bit like a flow chart, but can be used for both
    recognition and generation
  • Transition network
  • Unique start point
  • Series of states linked by transitions
  • Transitions represent input to be accounted for,
    or output to be generated
  • Legal exit-point(s) explicitly identified

17
Example (Jurafsky & Martin, Figure 2.10)
  • Loop on q3 means that it can accept strings of
    unbounded length
  • Deterministic because in any state, its
    behaviour is fully predictable
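The figure itself is not preserved. Assuming it is the familiar "sheep talk" automaton for baa!, baaa!, and so on (consistent with the loop on q3), a deterministic recognizer can be sketched as a transition table plus a single current state:

```python
# Transition table: state -> {input symbol: next state}.
# Assumed to match the sheep-talk automaton b a a+ !, with the loop on q3.
TRANSITIONS = {
    "q0": {"b": "q1"},
    "q1": {"a": "q2"},
    "q2": {"a": "q3"},
    "q3": {"a": "q3", "!": "q4"},   # loop on q3: any number of extra a's
}
START, FINALS = "q0", {"q4"}


def accepts(string: str) -> bool:
    """Deterministic recognition: at most one move is possible in any state."""
    state = START
    for symbol in string:
        state = TRANSITIONS.get(state, {}).get(symbol)
        if state is None:          # no transition defined: reject
            return False
    return state in FINALS
```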

18
Non-deterministic FSA (Jurafsky & Martin, Figure 2.18)
  • At state q2 with input a there is a choice of
    transitions
  • We can also have jump arcs (or empty
    transitions), which also introduce non-determinism
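Non-determinism can be handled by tracking the set of all states the automaton could be in. This sketch assumes Figure 2.18 is a non-deterministic version of the sheep-talk automaton (a choice at q2 on input a); the second machine, with a jump arc from q3 back to q2, is an illustrative variant accepting the same strings.

```python
def eps_closure(states, eps):
    """All states reachable from `states` via empty (jump) transitions."""
    stack, closure = list(states), set(states)
    while stack:
        for nxt in eps.get(stack.pop(), ()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return closure


def accepts(string, delta, eps, start, finals):
    """Non-deterministic recognition: follow every possible path in parallel."""
    current = eps_closure({start}, eps)
    for symbol in string:
        moved = {nxt for s in current for nxt in delta.get((s, symbol), ())}
        current = eps_closure(moved, eps)
        if not current:
            return False
    return bool(current & finals)


# Choice at q2 on input "a": stay in q2 or move on to q3.
NFSA = {("q0", "b"): {"q1"}, ("q1", "a"): {"q2"},
        ("q2", "a"): {"q2", "q3"}, ("q3", "!"): {"q4"}}
# Illustrative variant: no loop on q2, but a jump arc from q3 back to q2.
NFSA_EPS = {("q0", "b"): {"q1"}, ("q1", "a"): {"q2"},
            ("q2", "a"): {"q3"}, ("q3", "!"): {"q4"}}
EPS_ARCS = {"q3": {"q2"}}
```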

19
An FSA to handle morphology
Spot the deliberate mistake: overgeneration
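The original figure is not preserved, so the exact mistake cannot be recovered. As an assumed illustration, an FSA whose arcs are labelled with lexical classes (regular noun, irregular singular, irregular plural, plural -s) overgenerates foxs and, without a spelling rule, also fails to accept foxes. The lexicon and class names are hypothetical.

```python
# Hypothetical lexicon, grouped by the class labels used on the arcs.
LEXICON = {
    "reg-noun": {"fox", "cat", "dog"},
    "irreg-sg-noun": {"goose", "mouse", "sheep"},
    "irreg-pl-noun": {"geese", "mice", "sheep"},
    "plural": {"s"},
}
# Arcs labelled with classes rather than single letters.
ARCS = {
    "q0": [("reg-noun", "q1"), ("irreg-sg-noun", "q2"), ("irreg-pl-noun", "q2")],
    "q1": [("plural", "q2")],
}
FINALS = {"q1", "q2"}


def accepts(word, state="q0"):
    """True if some path from `state` consumes the whole word and ends in a final state."""
    if word == "" and state in FINALS:
        return True
    for label, nxt in ARCS.get(state, []):
        for item in LEXICON[label]:
            if word.startswith(item) and accepts(word[len(item):], nxt):
                return True
    return False
```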
20
Finite State Transducers
  • A transducer defines a relationship (a mapping)
    between two things
  • Typically used for two-level morphology, but
    can be used for other things
  • Like an FSA, but each state transition stipulates
    a pair of symbols, and thus a mapping

21
Finite State Transducers
  • Three functions
  • Recognizer (verification): takes a pair of
    strings and verifies whether the FST can map
    them onto each other
  • Generator (synthesis): generates legal pairs of
    strings
  • Translator (transduction): given one string,
    generates the corresponding string
  • Mapping usually between levels of representation
  • spy+s → spies
  • Lexical:intermediate: fox N P s → fox s
  • Intermediate:surface: fox s → foxes
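The three functions can be sketched over one small transducer stored as a list of arcs, each carrying an upper:lower symbol pair. The arcs for spy+s : spies and toy+s : toys are hypothetical (the + boundary symbol is assumed), and the code assumes the transducer has no cycles.

```python
EPS = ""  # the empty string stands for epsilon on either tape

# (source, upper, lower, target); '+' as the boundary symbol is an assumption.
ARCS = [
    ("q0", "s", "s", "a1"), ("a1", "p", "p", "a2"), ("a2", "y", "i", "a3"),
    ("a3", "+", "e", "a4"), ("a4", "s", "s", "qF"),
    ("q0", "t", "t", "b1"), ("b1", "o", "o", "b2"), ("b2", "y", "y", "b3"),
    ("b3", "+", EPS, "b4"), ("b4", "s", "s", "qF"),
]
START, FINALS = "q0", {"qF"}


def generate(state=START, upper="", lower=""):
    """Generator: enumerate every (upper, lower) pair the FST defines."""
    if state in FINALS:
        yield upper, lower
    for src, u, l, dst in ARCS:
        if src == state:
            yield from generate(dst, upper + u, lower + l)


def recognise(upper, lower, state=START, i=0, j=0):
    """Recognizer: can the FST map `upper` onto `lower`?"""
    if i == len(upper) and j == len(lower) and state in FINALS:
        return True
    return any(upper.startswith(u, i) and lower.startswith(l, j)
               and recognise(upper, lower, dst, i + len(u), j + len(l))
               for src, u, l, dst in ARCS if src == state)


def translate(upper, state=START, i=0):
    """Translator: every lower string the FST pairs with `upper`."""
    results = [""] if i == len(upper) and state in FINALS else []
    for src, u, l, dst in ARCS:
        if src == state and upper.startswith(u, i):
            results += [l + rest for rest in translate(upper, dst, i + len(u))]
    return results
```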

22
Some conventions
  • Transitions are marked by ':'
  • A non-changing transition x:x can be shown
    simply as x
  • Wild-cards are shown as @
  • Empty string shown as ε

23
An example (based on Trost, p. 42)
[Figure: transducer paths for spy+s → spies (y:i, with e inserted), toy+s → toys (y unchanged), and stems with f:v, e.g. wife → wives]
24
Using wild cards and loops
[Figure: the paths s p y:i e s 0 0 and t o y 0 s 0 0, drawn side by side]
Can be collapsed into a single FST
25
Another example (JM Fig. 3.9, p.74)
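[Figure: lexical:intermediate FST over states q0–q7. Regular noun stems (fox, cat, dog): q0→q1, N:ε to q4, then P (adding s) or S to q7. Irregular singular stems (goose, sheep, mouse): q0→q2, N:ε to q5, S to q7. Irregular plural stems (g o:e o:e s e, sheep, m o:i u:ε s:c e): q0→q3, N:ε to q6, P to q7.]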
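A lexical:intermediate transducer in the spirit of this figure can be sketched with one arc per stem. The arc list is a reconstruction and partly an assumption: in particular, the ^ boundary symbol on the regular plural output follows Jurafsky & Martin's convention and is not visible in this transcript.

```python
EPS = ""  # empty output

# (source, upper symbol, lower string, target). Whole stems label single arcs,
# standing in for the letter-by-letter stem networks drawn on the slides.
ARCS = [
    ("q0", "fox", "fox", "q1"), ("q0", "cat", "cat", "q1"), ("q0", "dog", "dog", "q1"),
    ("q0", "goose", "goose", "q2"), ("q0", "sheep", "sheep", "q2"),
    ("q0", "mouse", "mouse", "q2"),
    ("q0", "goose", "geese", "q3"), ("q0", "sheep", "sheep", "q3"),
    ("q0", "mouse", "mice", "q3"),
    ("q1", "N", EPS, "q4"), ("q2", "N", EPS, "q5"), ("q3", "N", EPS, "q6"),
    ("q4", "P", "^s", "q7"),  # assumed: ^ marks the morpheme boundary
    ("q4", "S", EPS, "q7"), ("q5", "S", EPS, "q7"), ("q6", "P", EPS, "q7"),
]
FINALS = {"q7"}


def lexical_to_intermediate(symbols, state="q0"):
    """Every intermediate string paired with a lexical symbol sequence."""
    if not symbols:
        return [""] if state in FINALS else []
    return [lower + rest
            for src, upper, lower, dst in ARCS
            if src == state and upper == symbols[0]
            for rest in lexical_to_intermediate(symbols[1:], dst)]
```

Irregular stems appear twice, once on the singular path and once on the plural path, so the N and P/S symbols that follow decide which path survives.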
26
[Figure: the stem arc from q0 to q1 (f o x, c a t, d o g) expanded letter by letter through intermediate states s1–s6]
27
  • 0 f:f o:o x:x 1 N:ε 4 P s:s 7
  • 0 f:f o:o x:x 1 N:ε 4 S 7
  • 0 c:c a:a t:t 1 N:ε 4 P s:s 7
  • 0 s:s h:h e:e e:e p:p 2 N:ε 5 S 7
  • 0 g:g o:e o:e s:s e:e 3 N:ε 5 P 7

f o x N P s : f o x s
f o x N S : f o x
c a t N P s : c a t s
s h e e p N S : s h e e p
g o o s e N P : g e e s e
28
Lexical:surface mapping (JM Fig. 3.14, p.78)
[Figure: lexical and surface tapes for f o x N P s and c a t N P s]
ε → e / {x, s, z} __ s
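The rule inserts e after a stem-final x, s or z when s follows. As a sketch, it can be applied as a regular-expression rewrite on an intermediate string; this assumes, following Jurafsky & Martin, that ^ marks the morpheme boundary and # the end of the word, neither of which is shown on the slide.

```python
import re


def e_insertion(intermediate: str) -> str:
    """Apply  epsilon -> e / {x, s, z} ^ __ s #  and then erase the boundary symbols.

    Assumes '^' = morpheme boundary and '#' = word end on the intermediate tape.
    Replacing '^' by 'e' amounts to ^:epsilon followed by an inserted epsilon:e.
    """
    surface = re.sub(r"(?<=[xsz])\^(?=s#)", "e", intermediate)
    return surface.replace("^", "").replace("#", "")
```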
29
[Figure: e-insertion FST, states q0–q5, with arcs labelled z, s, x; ε:e; s; other]
  • 0 f:f 0 o:o 0 x:x 1 ε 2 ε:e 3 s:s 4 0   (f o x s : f o x e s)
  • 0 c:c 0 a:a 0 t:t 0 ε 0 s:s 0 0   (c a t s : c a t s)

30
FST
  • But you don't have to draw all these FSTs
  • They map neatly onto rule formalisms
  • What is more, these can be generated
    automatically
  • Therefore, a slightly different formalism is used

31
FST compiler: http://www.xrce.xerox.com/competencies/content-analysis/fsCompiler/fsinput.html
[d o g N P .x. d o g s] | [c a t N P .x. c a t s] |
[f o x N P .x. f o x e s] | [g o o s e N P .x. g e e s e]
s0: c -> s1, d -> s2, f -> s3, g -> s4.
s1: a -> s5.
s2: o -> s6.
s3: o -> s7.
s4: <o:e> -> s8.
s5: t -> s9.
s6: g -> s9.
s7: x -> s10.
s8: <o:e> -> s11.
s9: <N:s> -> s12.
s10: <N:e> -> s13.
s11: s -> s14.
s12: <P:0> -> fs15.
s13: <P:s> -> fs15.
s14: e -> s16.
fs15: (no arcs)
s16: <N:0> -> s12.
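The compiled network printed above can be run directly. The sketch below transcribes its arcs into a Python dictionary (the function name apply_fst is hypothetical, not part of any Xerox tool) and walks them in either direction, treating 0 as epsilon as the compiler output does.

```python
EPS = "0"  # the compiler output writes epsilon as 0

# Arcs transcribed from the printed network: state -> [(upper, lower, next)].
ARCS = {
    "s0": [("c", "c", "s1"), ("d", "d", "s2"), ("f", "f", "s3"), ("g", "g", "s4")],
    "s1": [("a", "a", "s5")],
    "s2": [("o", "o", "s6")],
    "s3": [("o", "o", "s7")],
    "s4": [("o", "e", "s8")],
    "s5": [("t", "t", "s9")],
    "s6": [("g", "g", "s9")],
    "s7": [("x", "x", "s10")],
    "s8": [("o", "e", "s11")],
    "s9": [("N", "s", "s12")],
    "s10": [("N", "e", "s13")],
    "s11": [("s", "s", "s14")],
    "s12": [("P", EPS, "fs15")],
    "s13": [("P", "s", "fs15")],
    "s14": [("e", "e", "s16")],
    "s16": [("N", EPS, "s12")],
}
FINALS = {"fs15"}


def apply_fst(word, direction="down"):
    """'down': read the upper side (e.g. 'foxNP') and write the lower side;
    'up': read the lower side (e.g. 'foxes') and write the upper side."""
    results = []

    def walk(state, i, out):
        if i == len(word) and state in FINALS:
            results.append(out)
        for upper, lower, nxt in ARCS.get(state, []):
            read, write = (upper, lower) if direction == "down" else (lower, upper)
            written = "" if write == EPS else write
            if read == EPS:                      # epsilon on the input side
                walk(nxt, i, out + written)
            elif i < len(word) and word[i] == read:
                walk(nxt, i + 1, out + written)

    walk("s0", 0, "")
    return results
```

apply_fst("foxNP") yields ["foxes"] and apply_fst("geese", "up") yields ["gooseNP"]. Because the same arcs serve both directions, this illustrates the earlier point that the rules, though not the program, can be independent of analysis or synthesis.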