Title: Morphology
1Morphology
- What is morphology?
- Finite State Transducers
- Two Level Morphology
2What is morphology?
- Decomposition of words into meaningful units
- anti dis establish ment arian ism
- Interacts with syntax (categories and word order)
- establish (verb) + ment = noun
- Interacts with phonology
- divine / divinity
- obscene / obscenity
- Interacts with semantics
- boy / boys
- Peter / Peterchen
3 Phonological String
--> morphological analyzer --> dictionary lookup --> syntactic analyzer --> lexical-semantic analysis --> discourse processing
4Why store all words as morphemes rather than all morphological combinations as words?
What does the morphological analyzer have to output?
5The what and the how
- Efficient and effective algorithm to decompose categories into, or build categories from, component morphemes.
- What this algorithm will be depends on the problems it has to solve.
- These in turn depend on the representations computed.
- Given a stem/lemma (e.g. jump), add material to change the category or the grammatical properties of the word: jumped, jumpable
- Order of composition matters
- ride / riding
- ennoble / ennobling / *nobling (Adj --> V, V --> V-ing)
- trance / trancing / entrance / entrancing
6CONCATENATIVE MORPHOLOGICAL PROCESSES
- COMPOUNDING: fire + fighter
- PREFIXATION: un + well
- INFIXATION (Bontoc; Tagalog uses the same -um- infix): fikas "strong", f-um-ikas "be strong"
- SUFFIXATION: kick + er
- CIRCUMFIXATION (German): ge + sag + t (past prefix + "say" + past suffix)
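The processes on this slide can be sketched as plain string operations. This is a minimal illustration, not an analyzer: the function names are hypothetical, and the infix rule assumes the affix always goes right after the first letter of the stem.

```python
def compound(*stems):
    """Compounding: concatenate whole stems."""
    return "".join(stems)

def prefix(affix, stem):
    """Prefixation: affix before the stem."""
    return affix + stem

def suffix(stem, affix):
    """Suffixation: affix after the stem."""
    return stem + affix

def circumfix(pre, stem, post):
    """Circumfixation: one affix split around the stem."""
    return pre + stem + post

def infix(stem, affix):
    """Infixation (assumption: the affix follows the first letter)."""
    return stem[0] + affix + stem[1:]

examples = [
    compound("fire", "fighter"),
    prefix("un", "well"),
    suffix("kick", "er"),
    circumfix("ge", "sag", "t"),
    infix("fikas", "um"),
]
```

Each call only glues strings together, which is why these processes are the easy case for finite-state methods.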
7Inflectional Morphology
- Non category changing, required by syntax
- Agreement: person/number
- Je parle
- Nous parlons
- Ils parlent
- Gender
- la petite (the little one (fem))
- le petit (the little one (masc))
- le squelette (the skeleton)
8Derivational Morphology
- Changes category; not required by syntax
- Deverbal nominals
- bake + er, catch + er: 'er' = agent of the action
- catcher of the ball
- John's catcher of the ball: 'John' = one who caught
- -tion: destroy / destruction
- the Romans' destruction of the city
9Regular vs Irregular
- Regular: jump/jumped
- Irregular: hit/hit, bring/brought, sing/sang
- Productive: adore/adorable, kick/kickable, fax/faxable
- Non-productive: produce/production, destroy/destruction, but *graft/graftuction
10Regular (English) Verbs
Morphological form classes    Regularly inflected verbs
Stem                          walk      merge     try      map
-s form                       walks     merges    tries    maps
-ing form                     walking   merging   trying   mapping
Past form or -ed participle   walked    merged    tried    mapped
11Irregular (English) Verbs
Morphological form classes    Irregularly inflected verbs
Stem                          eat       catch      cut
-s form                       eats      catches    cuts
-ing form                     eating    catching   cutting
Past form                     ate       caught     cut
-ed participle                eaten     caught     cut
12To love in Spanish
13- Productive and rule governed
- fax / fax + er
- ??? Crudoy / cruduction
- Category sensitivity
- breakable / *manable
- sensitivity / *hittivity
- Semantic sensitivity
- un + well, un + happy
- *un + ill, *un + sad
14Store morphemes or words?
- German: Lebensversicherungsgesellschaftsangestellter = Leben(s) + Versicherung(s) + Gesellschaft(s) + Angestellter (life + insurance + company + Poss + employee)
- Turkish: verbs have 40k forms
15Non-concatenative Morphology
- Templatic morphology (Semitic languages): root lmd (learn), lamad (he studied), limed (he taught), lumad (he was taught)
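Templatic morphology interleaves a consonantal root with a vowel pattern instead of concatenating pieces. The sketch below is a minimal illustration for the l-m-d examples only; `apply_template` and the pattern table are hypothetical names, and the fixed C-V-C-V-C shape is an assumption that real Semitic templates go well beyond.

```python
def apply_template(root, vowels):
    """Interleave a three-consonant root with a two-vowel pattern (C V C V C)."""
    c1, c2, c3 = root
    v1, v2 = vowels
    return c1 + v1 + c2 + v2 + c3

# Vowel patterns and glosses taken from the slide
PATTERNS = {
    "aa": "he studied",
    "ie": "he taught",
    "ua": "he was taught",
}

forms = {apply_template("lmd", vowels): gloss for vowels, gloss in PATTERNS.items()}
```

Because the root letters are not adjacent in the surface form, a simple prefix/suffix stripper cannot recover `lmd` from `limed`.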
16Concatenation: beads on a string
Agglutinative (concatenative) languages are well behaved for FSAs as long as we don't include phonological or spelling changes.
- Verb lexicon: jump, kiss, stream, hop
- Surface forms: jumped, kissed, streamed, hopped ??? (plain hop + ed does not give hopped)
- FSA (diagram): q0 --verb--> q1 --ed--> q2
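The verb + ed automaton on this slide can be sketched as a tiny recognizer. This is an assumption-laden illustration: it treats both q1 (bare stem) and q2 (stem + ed) as final states, and `recognize` is a hypothetical name.

```python
VERBS = {"jump", "kiss", "stream", "hop"}  # the verb lexicon from the slide

def recognize(word):
    """Accept words of the form verb or verb + 'ed' by pure concatenation."""
    for verb in VERBS:
        # q0 --verb--> q1: a lexicon entry must start the word
        if word.startswith(verb):
            rest = word[len(verb):]
            # q1 is final (bare stem); q1 --ed--> q2 is final (past form)
            if rest in ("", "ed"):
                return True
    return False
```

With pure concatenation the recognizer rejects `hopped` and accepts `hoped` as hop + ed, which is exactly the spelling problem the "???" on the slide points at.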
17Pieces of a Morphological Analyzer
- FSA (diagram): q0 --un--> q1 --adj-root--> q2 --er, est, ly--> q3
- Lexicon: stores the lemmas and divides them into adjective classes (really/clearly vs. *bigly/*redly)
- Morphotactics: the state sequence indicates the order of morpheme composition, e.g. comparative or adverb formation is by suffixation
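The morphotactics on this slide can be written as a transition table over morphemes. The sketch below is a minimal illustration; the root list is a hypothetical toy lexicon, and it assumes that un- is optional and that a bare root is a complete word.

```python
ADJ_ROOTS = {"clear", "real", "big", "red", "happy"}  # hypothetical toy lexicon

TRANSITIONS = {
    ("q0", "un"): "q1",
    ("q0", "ADJ"): "q2",   # assumption: the un- prefix may be skipped
    ("q1", "ADJ"): "q2",
    ("q2", "er"): "q3",
    ("q2", "est"): "q3",
    ("q2", "ly"): "q3",
}
FINAL = {"q2", "q3"}

def accepts(morphemes):
    """Run the FSA over a list of morphemes; True if it ends in a final state."""
    state = "q0"
    for morpheme in morphemes:
        label = "ADJ" if morpheme in ADJ_ROOTS else morpheme
        state = TRANSITIONS.get((state, label))
        if state is None:
            return False
    return state in FINAL
```

With a single adjective class the machine also accepts `["big", "ly"]`, which is why the lexicon has to divide roots into classes before suffixes like -ly are licensed.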
18Lexicon
- Arranged as a TRIE (words share the letter strings they have in common relative to position), e.g. d-o branching to -g (dog) and -n-k-e-y (donkey)
- Classed by part-of-speech category (noun, verb) and by morphotactic considerations (which other affixes can precede or follow) or orthographic considerations.
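A trie lexicon can be sketched with nested dictionaries, one letter per level, with the word class stored at the end of each path. This is a minimal illustration: the entries, the category labels, and the `"#"` end marker are all assumptions made for the example.

```python
END = "#"  # hypothetical marker for "a word ends here"

def make_trie(entries):
    """Build a letter-by-letter trie from (word, category) pairs."""
    root = {}
    for word, category in entries:
        node = root
        for letter in word:
            node = node.setdefault(letter, {})
        node[END] = category  # class stored at the end of the path
    return root

def lookup(trie, word):
    """Return the category stored for word, or None if it is not an entry."""
    node = trie
    for letter in word:
        node = node.get(letter)
        if node is None:
            return None
    return node.get(END)

TRIE = make_trie([("do", "verb"), ("dog", "noun"), ("donkey", "noun")])
```

All three entries share the path d-o, so the common prefix is stored and searched only once.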
19Orthography
- Spelling rules handle phonological or orthographic variation in a morpheme
- try / trying / tries
- cringe / cringing / cringes
20FSA for Inflectional Morphology: English Nouns
21FSA for Inflectional Morphology: English Verbs
22FSA for Derivational Morphology: Adjectival Formation
23More Complex Derivational Morphology
24Using FSAs for Recognition: English Nouns and their Inflection
25- Orthographic
- Want association between morpheme and semantic function
- Want association between allographs or allophones of the same phoneme
- Allographs
- city / cities
- bake / baking
- divine / divinity
- try / tried
26Finite State Transducers (FSTs): the Big Idea
We need to relate the lexical level, which gives us the morphological analysis (plural, -able), to the surface level, which keeps track of phonological or graphological (spelling) changes.
27Parsing vs recognition
- An FSA can give you the string composition of a morphological sequence, and can tell you whether a given morphological string is or is not in the language. It recognizes the string.
- An FST parses the string. It tells you the morphological structure associated with the string.
- Other instances of parsing?
28Formal definition
- An FST defines a relation between sets of strings, i.e. a set of pairs of strings
- It contains at least a lexical level that is a concatenation of morphemes
- and a surface level that shows the correct spelling for each morpheme in a given context
- e.g. lexical level: noun (instantiated from the lexicon: cat / sheep) + plural
- surface level: plural realized as s or ε: cats / sheep
29- Q: finite set of states q0 to qn
- Σ: finite alphabet of complex symbols (feasible pairs) i:o, with one symbol i from the input alphabet and one symbol o from the output alphabet
- q0: the start state
- F: set of final states
- δ(q, i:o): the transition function or matrix between states. Takes a state from Q and a complex symbol i:o from Σ and returns a new state.
- Feasible pair: a relation of a symbol on one tape to a symbol on the other tape, e.g. pl:s
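The definition above can be made concrete with a very small transducer whose transition function is a dictionary from (state, feasible pair) to the next state. This is a sketch under stated assumptions: the states, the `+pl` symbol, and the `analyze` helper are hypothetical, and pairs with an empty surface side are left out to keep the search simple.

```python
# delta: (state, (lexical symbol, surface symbol)) -> next state
DELTA = {
    (0, ("c", "c")): 1,
    (1, ("a", "a")): 2,
    (2, ("t", "t")): 3,
    (3, ("+pl", "s")): 4,
}
FINAL = {3, 4}  # "cat" and "cat" + plural are both complete

def analyze(surface, state=0, out=()):
    """Yield every lexical-tape sequence that the FST pairs with the surface string."""
    if not surface and state in FINAL:
        yield list(out)
    for (q, (lex, surf)), nxt in DELTA.items():
        # Follow a transition when its surface side matches the remaining input
        if q == state and surf and surface.startswith(surf):
            yield from analyze(surface[len(surf):], nxt, out + (lex,))

parses = list(analyze("cats"))
```

Reading the lexical side of each pair while consuming the surface side is what turns recognition into parsing; swapping the two sides of every pair would run the same machine as a generator.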
30- Default pair: the upper tape is the same as the lower tape (same input as output), e.g. cat: c:c a:a t:t +pl:s
- Feasible pairs are either stated in the lexicon if irregular, e.g. g:g o:e o:e s:s e:e (goose/geese),
- or given by an automaton that stipulates the correspondence in a rule-governed way if the relation is regular. If regular, they are indicated as default pairs and usually represented by one symbol.
- FSTs are closed under
- inversion: switches the input/output labels
- composition: runs two transducers one after the other
- union of two transducers
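The closure operations can be illustrated on the relations that transducers define rather than on the machines themselves. The sketch below treats a transducer as a finite set of (input, output) string pairs, which is a simplifying assumption; the tape strings and the `^`, `#`, `+N`, `+PL` symbols are hypothetical examples.

```python
def invert(relation):
    """Inversion: swap input and output of every pair."""
    return {(out, inp) for inp, out in relation}

def compose(first, second):
    """Composition: feed the output of the first relation into the second."""
    return {(a, c) for a, b in first for b2, c in second if b == b2}

def union(first, second):
    """Union: any pair accepted by either relation."""
    return first | second

LEX_TO_INTER = {("fox+N+PL", "fox^s#"), ("cat+N+PL", "cat^s#")}
INTER_TO_SURF = {("fox^s#", "foxes"), ("cat^s#", "cats")}

generate = compose(LEX_TO_INTER, INTER_TO_SURF)  # lexical -> surface
parse = invert(generate)                         # surface -> lexical
```

Inverting the composed relation gives the parser for free, which is the practical payoff of closure under inversion and composition.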
31Trie in lexicon: categories arranged by letter, one at a time, with the class at the end. Allows parallel search as long as things match, e.g. metal <N>, meta <root>: metal, meta-language
32Kimmo-Based Morphological Parsing
- Two-level morphology: lexical level, surface level (Koskenniemi 83)
- Finite-state transducers (FST): input-output pairs
33Four-Fold View of FSTs
- As a recognizer
- As a generator
- As a translator
- As a set relater
34Terminology for Kimmo
- Upper = lexical tape
- Lower = surface tape
- Characters correspond to pairs, written a:b
- If a:a, write a for shorthand
- Two-level lexical entries
- # = word boundary
- ^ = morpheme boundary
- Other = any feasible pair that is not in this transducer
35Nominal Inflection FST
36Lexical and Intermediate Tapes
37Spelling Rules
Name                 Rule description                               Example
Consonant doubling   1-letter consonant doubled before -ing/-ed     beg/begging
E-deletion           Silent e dropped before -ing and -ed           make/making
E-insertion          e added after s, z, x, ch, sh before -s        watch/watches
Y-replacement        -y changes to -ie before -s, -i before -ed     try/tries
K-insertion          Verbs ending with vowel + -c add -k            panic/panicked
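The five rules in the table can be approximated with a few regular-expression checks. This is a rough sketch, not a full spelling component: `attach` is a hypothetical helper, consonant doubling is assumed to apply only to one-syllable consonant-vowel-consonant stems, and every stem-final e (except ee) is assumed to be silent.

```python
import re

def attach(stem, suffix):
    """Attach -s, -ed or -ing to a verb stem using the five spelling rules."""
    if suffix == "s":
        if re.search(r"(s|z|x|ch|sh)$", stem):           # E-insertion
            return stem + "es"
        if re.search(r"[^aeiou]y$", stem):                # Y-replacement (-ie before -s)
            return stem[:-1] + "ies"
        return stem + "s"
    # From here on the suffix is "ed" or "ing"
    if re.search(r"[aeiou]c$", stem):                     # K-insertion
        return stem + "k" + suffix
    if suffix == "ed" and re.search(r"[^aeiou]y$", stem):  # Y-replacement (-i before -ed)
        return stem[:-1] + "ied"
    if stem.endswith("e") and not stem.endswith("ee"):    # E-deletion (approximation)
        return stem[:-1] + suffix
    if re.fullmatch(r"[^aeiou]*[aeiou][^aeiouwxy]", stem):  # Consonant doubling (approximation)
        return stem + stem[-1] + suffix
    return stem + suffix
```

The order of the checks matters: K-insertion and Y-replacement have to be tried before the default concatenation, in the same way that spelling transducers have to see the morpheme boundary before it is erased.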
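38 Notation
ε --> e / {x, s, z} ^ __ s #
(insert e after a morpheme-final x, s, or z when the next morpheme is s at the end of the word; ^ is the morpheme boundary, # the word boundary)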
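As a sketch of how such a rule acts on an intermediate-tape string, the function below assumes the rule reads "insert e after a morpheme-final x, s, or z when the next morpheme is s at the end of the word", with `^` as the morpheme boundary and `#` as the word boundary. Both symbols and the function name are assumptions of this example.

```python
import re

def e_insertion(intermediate):
    """Map an intermediate-tape string such as 'fox^s#' to its surface form."""
    # Insert e at a boundary that follows x, s or z and precedes word-final s
    with_e = re.sub(r"(?<=[xsz])\^(?=s#)", "^e", intermediate)
    # Erase the boundary symbols to obtain the surface string
    return with_e.replace("^", "").replace("#", "")

surface_forms = [e_insertion(s) for s in ("fox^s#", "cat^s#", "kiss^s#")]
```

The left and right contexts of the rule become a lookbehind and a lookahead, so the rewrite touches only the boundary position itself.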
39Intermediate-to-Surface Transducer
40Two-Level Morphology
41Sample Run
42FSTs and ambiguity
- Parse Example 1: unionizable
- Parse Example 2: assess
43What to do about Global Ambiguity?
- Accept first successful structure
- Run parser through all possible paths
- Bias the search in some manner
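The three options can be compared on a word like unionizable by enumerating every segmentation that a lexicon allows. This is a toy sketch: the morpheme list is hypothetical, `iz` stands in for the surface form of -ize after e-deletion, and preferring the parse with the fewest morphemes is just one possible bias.

```python
MORPHEMES = {"un", "ion", "union", "iz", "able"}  # hypothetical toy lexicon

def segmentations(word):
    """Return every way to split word into a sequence of known morphemes."""
    if not word:
        return [[]]
    results = []
    for i in range(1, len(word) + 1):
        head = word[:i]
        if head in MORPHEMES:
            for rest in segmentations(word[i:]):
                results.append([head] + rest)
    return results

all_parses = segmentations("unionizable")   # run through all possible paths
first_parse = all_parses[0]                 # accept the first successful structure
biased_parse = min(all_parses, key=len)     # bias: prefer the fewest morphemes
```

Here the first parse found is un + ion + iz + able, while the biased search returns union + iz + able, so the choice of strategy changes the answer.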
44Some Limitations
47Stemming
- For some applications, we don't need full morphological analysis.
- IR: we don't care that e.g. logician is related to logical. We just want to know that if you are interested in articles about logic, you may want the former two classes as well. So we just want to get back to the root list.
- Relate two forms by having a literal relation rule, e.g. al --> 0
- Is it useful? In a big document it may not be necessary, because the word will appear in many forms, including the form in the query.
48- Stemming is morphologically impoverished and so error-prone
- It can't distinguish rules that apply at morpheme boundaries from rules that apply internal to the root
- patronization: patron + ize + ation
- organization: organize + ation
- But the stemmer will treat these as a single class and derive organ as an underlying root.
- adverse / adversity
- universe / university
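The failure mode described above shows up even in a very small suffix stripper. The sketch below is not the Porter stemmer or any published algorithm; the rule list and the minimum-length check are assumptions chosen only to reproduce the examples on this slide.

```python
# Hypothetical literal rules of the form "suffix --> 0", longest first
SUFFIXES = ["ization", "ation", "ity", "ize", "al", "e"]

def stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

pairs = {w: stem(w) for w in
         ["patronization", "organization", "logical", "universe", "university"]}
```

The stripper has no notion of morpheme boundaries, so organization collapses to organ just as patronization collapses to patron, and universe and university fall into the same class.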
49Psycholinguistics
- Is the human lexicon efficient in the way computational lexica are?
- Stanners et al (1979): where two words are related inflectionally, the root is stored and the other forms are rule derived. Where there is a derivational relationship, both forms are stored.
- Paradigm: repetition priming
- great, happy, peachy, adorable, round, short, great, small
- Repetition priming for turns given turning, but not for select given selective
50- Marslen-Wilson et al (1994): may have priming for semantically similar, derivationally related words
- permit / permission
- create / creativity
- On-line versus long-term storage lexicon
- Speech errors: "we have screw looses"