Title: CMSC 723 / LING 645: Intro to Computational Linguistics
September 15, 2004
More about FSAs, Finite State Morphology (JM 3)
Prof. Bonnie J. Dorr, Dr. Christof Monz, TA: Adam Lee
More about FSAs
- Transducers
- Equivalence of DFSAs and NFSAs
- Recognition as search: depth-first, breadth-first
Recognition using NFSAs
NFSA Recognition of baaa!
Breadth-first Recognition of baaa!
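Recognition as search can be sketched in a few lines of Python. This assumes the sheeptalk NFSA from JM (states 0 to 4, with two arcs on "a" leaving state 2); the names `TRANSITIONS`, `ACCEPT` and `nd_recognize` are illustrative, not from the slides. Depth-first and breadth-first recognition differ only in whether the agenda is used as a stack or as a queue.

```python
from collections import deque

# Assumed sheeptalk NFSA (baa+!): state 2 has two arcs on "a",
# which is where the nondeterminism comes from.
TRANSITIONS = {
    (0, "b"): {1},
    (1, "a"): {2},
    (2, "a"): {2, 3},
    (3, "!"): {4},
}
ACCEPT = {4}

def nd_recognize(tape, breadth_first=False):
    """Search the space of (state, tape position) pairs for an accepting path.

    The agenda holds search states still to explore: popped as a stack it
    gives depth-first search, popped as a queue it gives breadth-first search.
    """
    agenda = deque([(0, 0)])  # (machine state, index into tape)
    while agenda:
        state, pos = agenda.popleft() if breadth_first else agenda.pop()
        if pos == len(tape):
            if state in ACCEPT:
                return True
            continue  # input used up in a non-final state: dead end
        for nxt in TRANSITIONS.get((state, tape[pos]), ()):
            agenda.append((nxt, pos + 1))
    return False

print(nd_recognize("baaa!"), nd_recognize("baaa!", breadth_first=True))
```

Because every arc consumes one input symbol, the search always terminates; an NFSA with ε-arcs would also need a check for revisited search states.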
Regular languages
- Regular languages are characterized by FSAs
- For every NFSA, there is an equivalent DFSA.
- Regular languages are closed under concatenation,
Kleene closure, union.
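The NFSA/DFSA equivalence is usually shown with the subset construction: each DFSA state stands for the set of NFSA states the machine could be in after reading the same input. A minimal sketch, assuming an NFSA without ε-arcs given as a dict from (state, symbol) to a set of states; the function and variable names are hypothetical.

```python
from collections import deque

def nfsa_to_dfsa(alphabet, delta, start, finals):
    """Subset construction: every DFSA state is a frozenset of NFSA states.

    `delta` maps (state, symbol) to a set of next states (no epsilon arcs).
    Returns the DFSA transition dict, its start state and its final states.
    """
    start_set = frozenset([start])
    dfa_delta, seen, queue = {}, {start_set}, deque([start_set])
    while queue:
        current = queue.popleft()
        for sym in alphabet:
            # every NFSA state reachable on `sym` from any state in `current`
            target = frozenset(t for q in current for t in delta.get((q, sym), ()))
            if not target:
                continue  # missing arc = implicit reject (dead) state
            dfa_delta[(current, sym)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    dfa_finals = {S for S in seen if S & finals}
    return dfa_delta, start_set, dfa_finals

def dfsa_accepts(dfa_delta, start, finals, tape):
    state = start
    for sym in tape:
        state = dfa_delta.get((state, sym))
        if state is None:
            return False
    return state in finals

# The sheeptalk NFSA (state 2 has two arcs on "a")
SHEEP_NFSA = {(0, "b"): {1}, (1, "a"): {2}, (2, "a"): {2, 3}, (3, "!"): {4}}
DFA, DFA_START, DFA_FINALS = nfsa_to_dfsa("ba!", SHEEP_NFSA, 0, {4})
```

Here the sheeptalk NFSA becomes a five-state DFSA with states {0}, {1}, {2}, {2, 3} and {4}. In the worst case an NFSA with n states can need up to 2^n DFSA states.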
Concatenation
Kleene Closure
Union
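The three closure properties correspond to simple constructions that glue NFSAs together with ε-arcs. The sketch below is one common way to write them; the `NFSA` class and helper names are assumptions for illustration, and every machine is assumed to be built from fresh states, so the same machine object should not be used twice in one expression.

```python
import functools
import itertools

_fresh = itertools.count()  # global supply of unused state numbers

class NFSA:
    """NFSA with epsilon arcs: an arc is (source, symbol, target), symbol None = epsilon."""
    def __init__(self, start, finals, arcs):
        self.start, self.finals, self.arcs = start, set(finals), list(arcs)

def symbol(c):
    """Machine accepting exactly the one-symbol string c."""
    s, f = next(_fresh), next(_fresh)
    return NFSA(s, {f}, [(s, c, f)])

def concat(m1, m2):
    # epsilon arcs from every final state of m1 to the start of m2
    arcs = m1.arcs + m2.arcs + [(f, None, m2.start) for f in m1.finals]
    return NFSA(m1.start, m2.finals, arcs)

def union(m1, m2):
    # a new start state with epsilon arcs into both machines
    s = next(_fresh)
    arcs = m1.arcs + m2.arcs + [(s, None, m1.start), (s, None, m2.start)]
    return NFSA(s, m1.finals | m2.finals, arcs)

def star(m):
    # a new start state that is also final (accepts the empty string),
    # plus epsilon arcs from the old final states back to the old start
    s = next(_fresh)
    arcs = m.arcs + [(s, None, m.start)] + [(f, None, m.start) for f in m.finals]
    return NFSA(s, m.finals | {s}, arcs)

def eps_closure(m, states):
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for src, sym, dst in m.arcs:
            if src == q and sym is None and dst not in closure:
                closure.add(dst)
                stack.append(dst)
    return closure

def accepts(m, tape):
    current = eps_closure(m, {m.start})
    for c in tape:
        step = {dst for src, sym, dst in m.arcs if src in current and sym == c}
        current = eps_closure(m, step)
    return bool(current & m.finals)

# baa+! built as b . a . a . a* . !, then a union with moo!
sheep = functools.reduce(concat, [symbol("b"), symbol("a"), symbol("a"),
                                  star(symbol("a")), symbol("!")])
moo = functools.reduce(concat, [symbol(c) for c in "moo!"])
animal = union(sheep, moo)
```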
Morphology
- Definitions and Problems
- What is Morphology?
- Typology of Morphologies
- Approaches to Computational Morphology
- Lexicons and Rules
- Computational Morphology Approaches
Morphology
- The study of the way words are built up from smaller meaning units called morphemes
- Abstract versus realized: HOP + PAST → hop + ed → hopped → /hapt/
Syntax | Lexeme / Inflected Lexeme | Grammars | sentences
Morphology | Morpheme / Allomorph | Morphotactics | words
Phonology | Phoneme / Allophone | Phonotactics | letters
Phonology and Morphology
- Phonology vs. Orthography
- Historical spelling
- night, nite
- attention, mission, fish
- Script limitations
- Spoken English has 14 vowels
- heed, hid, hayed, head, had, hoed, hood, who'd, hide, how'd, taught, Tut, toy, enough
- English alphabet has 5
- Use vowel combinations: far, fair, fare
- Consonantal doubling (hopping vs. hoping)
Syntax and Morphology
- Phrase-level agreement
- Subject-Verb
- John studies hard (STUDY+3SG)
- Noun-Adjective
- Las vacas hermosas ('the beautiful cows', Spanish)
- Sub-word phrasal structures
- That+in+book+PL+Poss:1PL
- 'Which are in our books'
Typology of Morphologies
- Concatenative vs. Templatic
- Derivational vs. Inflectional
- Regular vs. Irregular
Concatenative Morphology
- Morpheme + Morpheme + Morpheme + ...
- Stems: also called lemma, base form, root, lexeme
- hope + ing → hoping; hop + ing → hopping
- Affixes
- Prefixes: Antidisestablishmentarianism (anti-, dis-)
- Suffixes: Antidisestablishmentarianism (-ment, -arian, -ism)
- Infixes: hingi (borrow) → humingi (borrower) in Tagalog (infix -um-)
- Circumfixes: sagen (say) → gesagt (said) in German (ge-...-t)
- Agglutinative languages
- uygarlastiramadiklarimizdanmissinizcasina (Turkish)
- 'Behaving as if you are among those whom we could not cause to become civilized'
Templatic Morphology
- Root: K T B
- maktuub: written (Arabic)
- ktuuv: written (Hebrew)
Templatic Morphology: Root Meaning
- Words built on the root K T B, by meaning: book, write, spelling, library, letter, address, office, writer
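Combining a root with a template can be sketched as filling consonant slots. The notation is hypothetical: a template is a string in which each `C` marks a slot for the next root consonant and every other character is copied unchanged. The example templates are assumptions chosen to produce familiar Arabic forms of k-t-b.

```python
def interdigitate(root, template):
    """Fill the 'C' slots of `template` with the consonants of `root`, in order."""
    if template.count("C") != len(root):
        raise ValueError("number of C slots must equal number of root consonants")
    consonants = iter(root)
    # copy template characters, replacing each slot with the next root consonant
    return "".join(next(consonants) if ch == "C" else ch for ch in template)

ROOT = "ktb"
for template in ["maCCuuC", "CiCaaC", "CaaCiC"]:
    print(template, "->", interdigitate(ROOT, template))
```

The three templates yield maktuub 'written', kitaab 'book' and kaatib 'writer': the root carries the shared meaning, the template the grammatical pattern.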
Derivational vs. Inflectional
- Word Classes
- Parts of speech: nouns, verbs, adjectives, etc.
- Word class dictates how a word combines with
morphemes to form new words
Derivational morphology
- Nominalization: computerization, appointee, killer, fuzziness
- Formation of adjectives: computational, clueless, embraceable
- CatVar: Categorial Variation Database
- http://clipdemos.umiacs.umd.edu/catvar/
Inflectional morphology
- Adds: tense, number, person, mood, aspect
- Word class doesn't change
- Word serves new grammatical role
- Five verb forms in English
- Other languages have (lots) more
Nouns and Verbs (in English)
- Nouns have simple inflectional morphology
- cat
- cats, cats
- Verbs have more complex morphology
Regulars and Irregulars
- Nouns
- Cat/Cats
- Mouse/Mice, Ox/Oxen, Goose/Geese
- Verbs
- Walk/Walked
- Go/Went, Fly/Flew
Regular (English) Verbs
Morphological form classes | Regularly inflected verbs
Stem | walk | merge | try | map
-s form | walks | merges | tries | maps
-ing form | walking | merging | trying | mapping
Past form or -ed participle | walked | merged | tried | mapped
Irregular (English) Verbs
Morphological form classes | Irregularly inflected verbs
Stem | eat | catch | cut
-s form | eats | catches | cuts
-ing form | eating | catching | cutting
Past form | ate | caught | cut
-ed participle | eaten | caught | cut
To love in Spanish
Computational Morphology
- Finite State Morphology
- Finite State Transducers (FST)
- Input/Output
- Analysis/Generation
Computational Morphology
WORD | STEM (+FEATURES)
cats | cat +N +PL
cat | cat +N +SG
cities | city +N +PL
geese | goose +N +PL
ducks | (duck +N +PL) or (duck +V +3SG)
merging | merge +V +PRES-PART
caught | (catch +V +PAST-PART) or (catch +V +PAST)
Building a Morphological Parser
- The Rules and the Lexicon
- General versus Specific
- Regular versus Irregular
- Accuracy, speed, space
- The Morphology of a language
- Approaches
- Lexicon only
- Lexicon and Rules
- Finite-state Automata
- Finite-state Transducers
- Rules only
Lexicon-only Morphology
- The lexicon lists all surface-level and lexical-level pairs
- No rules
- Analysis/Generation is easy
- Very large for English
- What about Arabic or Turkish?
- Chinese?
Surface | Lexical | Tag
acclaim | acclaim | N
acclaim | acclaim | V0
acclaimed | acclaim | Ved
acclaimed | acclaim | Ven
acclaiming | acclaim | Ving
acclaims | acclaim | Ns
acclaims | acclaim | Vs
acclamation | acclamation | N
acclamations | acclamation | Ns
acclimate | acclimate | V0
acclimated | acclimate | Ved
acclimated | acclimate | Ven
acclimates | acclimate | Vs
acclimating | acclimate | Ving
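With a lexicon of this kind, analysis and generation are both plain table lookups, which is why the approach is easy but large. A minimal sketch using the sample entries from this slide; the dictionary and function names are hypothetical.

```python
from collections import defaultdict

# (surface, stem, tag) triples from the sample lexicon
LEXICON = [
    ("acclaim", "acclaim", "N"), ("acclaim", "acclaim", "V0"),
    ("acclaimed", "acclaim", "Ved"), ("acclaimed", "acclaim", "Ven"),
    ("acclaiming", "acclaim", "Ving"), ("acclaims", "acclaim", "Ns"),
    ("acclaims", "acclaim", "Vs"), ("acclamation", "acclamation", "N"),
    ("acclamations", "acclamation", "Ns"), ("acclimate", "acclimate", "V0"),
    ("acclimated", "acclimate", "Ved"), ("acclimated", "acclimate", "Ven"),
    ("acclimates", "acclimate", "Vs"), ("acclimating", "acclimate", "Ving"),
]

ANALYSES = defaultdict(list)   # surface form -> list of (stem, tag)
SURFACE = {}                   # (stem, tag) -> surface form
for surface, stem, tag in LEXICON:
    ANALYSES[surface].append((stem, tag))
    SURFACE[(stem, tag)] = surface

def analyze(word):
    """All (stem, tag) analyses listed for `word`; empty if it is not in the lexicon."""
    return ANALYSES.get(word, [])

def generate(stem, tag):
    """The surface form listed for (stem, tag), or None."""
    return SURFACE.get((stem, tag))
```

Ambiguous forms fall out naturally: `analyze("acclaimed")` returns both the Ved and the Ven reading. The cost is that every inflected form of every word must be stored.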
Lexicon and Rules: FSA for Inflectional Noun Morphology
- reg-noun: fox, cat, dog
- irreg-pl-noun: geese, sheep, mice
- irreg-sg-noun: goose, sheep, mouse
- plural: -s
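A recognizer for this FSA can label its arcs with morpheme classes rather than letters and expand each class from the lexicon. The sketch assumes the arc layout of the JM noun FSA (a regular noun optionally followed by plural -s, or an irregular singular or plural noun on its own); the names are illustrative.

```python
# Morpheme classes from the slide's lexicon
LEXICON = {
    "reg-noun": {"fox", "cat", "dog"},
    "irreg-pl-noun": {"geese", "sheep", "mice"},
    "irreg-sg-noun": {"goose", "sheep", "mouse"},
    "plural": {"s"},
}
# Assumed arcs: q0 -reg-noun-> q1 -plural-> q2; q0 -irreg-(sg|pl)-noun-> q2
ARCS = [(0, "reg-noun", 1), (1, "plural", 2),
        (0, "irreg-sg-noun", 2), (0, "irreg-pl-noun", 2)]
FINALS = {1, 2}

def recognize(word, state=0):
    """True if `word` spells out a path of morphemes ending in a final state."""
    if word == "":
        return state in FINALS
    for src, cls, dst in ARCS:
        if src == state:
            for morph in LEXICON[cls]:
                if word.startswith(morph) and recognize(word[len(morph):], dst):
                    return True
    return False
```

At this level the FSA accepts foxs and rejects foxes: spelling changes such as E-insertion are not yet modelled.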
Lexicon and Rules: FSA for English Verb Inflectional Morphology
- reg-verb-stem: walk, fry, talk, impeach
- irreg-verb-stem: cut, speak, sing
- irreg-past-verb: spoken, sang, caught, ate, eaten
- past: -ed
- past-part: -ed
- pres-part: -ing
- 3sg: -s
FSA for Derivational Morphology: Adjectival Formation
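The figure for this slide did not survive extraction. As an assumed example, an adjectival FSA in the style of JM accepts an optional un-, an adjective root, and an optional -er, -est or -ly. The sketch runs it over morpheme sequences, so spelling changes like happy/happier are ignored; the root list and all names are hypothetical.

```python
ADJ_ROOTS = {"big", "cool", "happy", "red", "real", "clear"}  # illustrative roots
SUFFIXES = {"er", "est", "ly"}

# Assumed arcs: q0 -un-> q1, q0 -eps-> q1, q1 -adj-root-> q2, q2 -suffix-> q3
ARCS = {(0, "un"): 1, (1, "adj-root"): 2, (2, "suffix"): 3}
FINALS = {2, 3}

def morph_class(m):
    if m == "un":
        return "un"
    if m in ADJ_ROOTS:
        return "adj-root"
    if m in SUFFIXES:
        return "suffix"
    return None  # unknown morpheme: no arc will match

def accepts(morphemes):
    states = {0, 1}  # q0 plus q1, reachable from q0 by the epsilon arc
    for m in morphemes:
        cls = morph_class(m)
        states = {ARCS[(q, cls)] for q in states if (q, cls) in ARCS}
    return bool(states & FINALS)
```

It also accepts sequences such as red + ly or un + big, which shows how easily a simple FSA of this shape overgenerates.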
More Complex Derivational Morphology
Using FSAs for Recognition: English Nouns and their Inflection
Morphological Parsing
- Finite-state automata (FSA)
- Recognizer
- One-level morphology
- Finite-state transducers (FST)
- Two-level morphology
- PC-Kimmo (Koskenniemi 1983)
- Input-output pairs
Terminology for PC-Kimmo
- Upper: lexical tape
- Lower: surface tape
- Characters correspond to pairs, written a:b
- If a:a, write a for shorthand
- Two-level lexical entries
- #: word boundary
- ^: morpheme boundary
- Other: any feasible pair that is not in this transducer
- Final states indicated with ':' and non-final states indicated with '.'
Four-Fold View of FSTs
- As a recognizer
- As a generator
- As a translator
- As a set relater
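A small FST makes the four-fold view concrete: the same arcs can be read upper-to-lower (generation), lower-to-upper (analysis), or checked against a given pair (recognition). This sketch assumes a JM-style nominal inflection transducer for regular nouns that writes the intermediate tape, with ^ for a morpheme boundary and # for a word boundary; the state names and helper functions are hypothetical.

```python
import itertools

def build_noun_fst(reg_nouns):
    """Arcs (source, upper, lower, target); "" on one side means epsilon."""
    arcs, fresh = [], itertools.count(1)
    for noun in reg_nouns:
        # one chain of states per noun: letters map to themselves (c:c, a:a, ...)
        states = [0] + [next(fresh) for _ in noun[:-1]] + ["stem"]
        for ch, src, dst in zip(noun, states, states[1:]):
            arcs.append((src, ch, ch, dst))
    arcs += [("stem", "+N", "", "num"),     # +N:epsilon
             ("num", "+SG", "#", "end"),    # +SG:#
             ("num", "+PL", "^s#", "end")]  # +PL:^s#
    return arcs

def transduce(arcs, tape, direction="generate"):
    """All outputs paired with `tape`: generate reads the upper side, analyze the lower."""
    read, write = (1, 2) if direction == "generate" else (2, 1)
    results, agenda = set(), [(0, 0, "")]  # (state, position in tape, output so far)
    while agenda:
        state, pos, out = agenda.pop()
        if state == "end" and pos == len(tape):
            results.add(out)
        for arc in arcs:
            if arc[0] == state and tape.startswith(arc[read], pos):
                agenda.append((arc[3], pos + len(arc[read]), out + arc[write]))
    return results

def accepts_pair(arcs, upper, lower):
    """Recognizer view: does the FST relate this lexical/intermediate pair?"""
    return lower in transduce(arcs, upper, "generate")

FST = build_noun_fst(["cat", "dog", "fox"])
print(transduce(FST, "fox+N+PL"), transduce(FST, "cat#", "analyze"))
```

Returning a set rather than a single string is what the set-relater view suggests: an FST defines a relation, so one input may pair with several outputs.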
Nominal Inflection FST
Lexical and Intermediate Tapes
Spelling Rules
Name | Rule description | Example
Consonant doubling | 1-letter consonant doubled before -ing/-ed | beg/begging
E-deletion | silent e dropped before -ing and -ed | make/making
E-insertion | e added after s, z, x, ch, sh before -s | watch/watches
Y-replacement | -y changes to -ie before -s, -i before -ed | try/tries
K-insertion | verbs ending with vowel + -c add -k | panic/panicked
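The five rules in the table can be approximated directly in code for generating a verb form from a stem plus -s, -ing or -ed. This is a heuristic sketch, not a transducer-based treatment: real consonant doubling depends on stress, which the code approximates by doubling only in stems with a single vowel letter. The function name `add_suffix` is hypothetical.

```python
VOWELS = set("aeiou")

def add_suffix(stem, suffix):
    """Attach "s", "ing" or "ed" to a verb stem, applying the five spelling rules."""
    if suffix == "s":
        if stem.endswith(("s", "z", "x", "ch", "sh")):        # E-insertion
            return stem + "es"
        if stem.endswith("y") and stem[-2] not in VOWELS:     # Y-replacement (-ie)
            return stem[:-1] + "ies"
        return stem + "s"
    # suffix is "ing" or "ed"
    if stem.endswith("c") and stem[-2] in VOWELS:             # K-insertion
        return stem + "k" + suffix
    if suffix == "ed" and stem.endswith("y") and stem[-2] not in VOWELS:
        return stem[:-1] + "ied"                              # Y-replacement (-i)
    if stem.endswith("e") and not stem.endswith("ee"):        # E-deletion
        return stem[:-1] + suffix
    if (len(stem) >= 3 and stem[-1] not in VOWELS | set("wxy")
            and stem[-2] in VOWELS and stem[-3] not in VOWELS
            and sum(ch in VOWELS for ch in stem) == 1):       # consonant doubling
        return stem + stem[-1] + suffix                       # (heuristic: one-vowel stems only)
    return stem + suffix
```

The rules are checked in a fixed order and the first match wins, so a stem like panic gets K-insertion before any other rule is considered.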
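Chomsky and Halle Notation
$$\varepsilon \rightarrow e \;/\; \{x, s, z\}\ \hat{}\ \_\_\ s\,\#$$
- Insert e after a morpheme boundary (^) that follows x, s, or z, when the next symbols are s and a word boundary (#)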
Intermediate-to-Surface Transducer
State Transition Table
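The E-insertion rule (ε → e / {x, s, z} ^ __ s #) can be applied to an intermediate tape with a left-to-right scan. This is a direct rendering of the rule, not the state-transition table from the slide; the function name is hypothetical, and ^ and # are assumed to mark morpheme and word boundaries.

```python
def e_insertion(tape):
    """Map an intermediate tape such as 'fox^s#' to its surface form.

    '^' (morpheme boundary) and '#' (word boundary) are deleted on the
    surface; an 'e' is emitted at a '^' whose left neighbour is x, s or z
    and whose right context is 's#'.
    """
    out = []
    for i, ch in enumerate(tape):
        if ch == "^":
            if i > 0 and tape[i - 1] in "xsz" and tape[i + 1:i + 3] == "s#":
                out.append("e")
            continue  # boundary symbol itself maps to nothing
        if ch == "#":
            continue
        out.append(ch)
    return "".join(out)

print(e_insertion("fox^s#"), e_insertion("cat^s#"))
```

Following the rule as stated, only x, s and z trigger insertion; handling watch^s# would need the extra ch and sh contexts listed in the spelling-rules table.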
Two-Level Morphology
Sample Run
KIMMO DEMO
FSTs and ambiguity
- Parse Example 1: unionizable
- union + ize + able
- un + ion + ize + able
- Parse Example 2: assess
- assess (V)
- ass (N) + ess (N)
- Parse Example 3: tender
- tender (AJ)
- ten (Num) + d (AJ) + er (CMP)
What to do about Global Ambiguity?
- Accept first successful structure
- Run parser through all possible paths
- Bias the search in some manner
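The three strategies can be compared on a toy segmenter that splits a string into morphemes from a small lexicon. The morpheme list, the function names, and the choice of "fewest morphemes" as the bias are all assumptions for illustration; spelling rules are ignored, so unionizable is given in its unreduced lexical form unionizeable.

```python
# Hypothetical toy morpheme lexicon covering the three parse examples
MORPHEMES = {"un", "union", "ion", "ize", "able", "assess", "ass", "ess",
             "tender", "ten", "d", "er"}

def parses(word):
    """Yield every segmentation of `word` into lexicon morphemes, depth-first."""
    if not word:
        yield []
        return
    for i in range(1, len(word) + 1):
        prefix = word[:i]
        if prefix in MORPHEMES:
            for rest in parses(word[i:]):
                yield [prefix] + rest

def first_parse(word):
    """Strategy 1: accept the first successful structure."""
    return next(parses(word), None)

def all_parses(word):
    """Strategy 2: run the parser through all possible paths."""
    return list(parses(word))

def biased_parse(word):
    """Strategy 3: bias the search, here toward the fewest morphemes."""
    return min(parses(word), key=len, default=None)

print(all_parses("unionizeable"))
```

The first successful structure for unionizeable is un + ion + ize + able, while the biased choice picks union + ize + able; which answer is right depends on the bias, not on the FST alone.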
Computational Morphology
- The Rules and the Lexicon
- General versus Specific
- Regular versus Irregular
- Accuracy, speed, space
- The Morphology of a language
- Approaches
- Lexicon only
- Lexicon and Rules
- Finite-state Automata
- Finite-state Transducers
- Rules only (next time!!)
Readings for next time