CMSC 723 / LING 645: Intro to Computational Linguistics

1
CMSC 723 / LING 645: Intro to Computational Linguistics
September 15, 2004 (Dorr): More about FSAs, Finite State Morphology (J&M 3)
Prof. Bonnie J. Dorr
Dr. Christof Monz
TA: Adam Lee
2
More about FSAs
  • Transducers
  • Equivalence of DFSAs and NFSAs
  • Recognition as search: depth-first,
    breadth-first

3
Recognition using NFSAs
4
NFSA Recognition of baaa!
5
Breadth-first Recognition of baaa!
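
Breadth-first recognition can be sketched as an agenda-driven search over (machine state, tape position) pairs. The slide's diagram is not in the transcript, so the state numbers and arcs below are an assumed NFSA for the sheep language (b, then two or more a's, then !), and the function name is hypothetical.

```python
from collections import deque

# Assumed NFSA for the sheep language b a a+ ! (states and arcs are illustrative).
TRANSITIONS = {
    (0, "b"): {1},
    (1, "a"): {2},
    (2, "a"): {2, 3},   # nondeterministic choice: stay in 2 or move on to 3
    (3, "!"): {4},
}
START, FINALS = 0, {4}


def bfs_recognize(tape):
    """Return True if some path through the NFSA consumes the whole tape."""
    agenda = deque([(START, 0)])      # search states: (machine state, tape index)
    seen = set()
    while agenda:
        state, i = agenda.popleft()   # FIFO order makes the search breadth-first
        if i == len(tape):
            if state in FINALS:
                return True
            continue
        if (state, i) in seen:
            continue
        seen.add((state, i))
        for nxt in TRANSITIONS.get((state, tape[i]), ()):
            agenda.append((nxt, i + 1))
    return False


print(bfs_recognize("baaa!"))  # True
print(bfs_recognize("ba!"))    # False
```

Replacing `agenda.popleft()` with `agenda.pop()` turns the queue into a stack and gives the depth-first variant of the same search.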
6
Regular languages
  • Regular languages are characterized by FSAs
  • For every NFSA, there is an equivalent DFSA.
  • Regular languages are closed under concatenation,
    Kleene closure, union.
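
The NFSA/DFSA equivalence is usually shown with the subset construction, where each DFSA state is a set of NFSA states. The sketch below assumes an NFSA without epsilon arcs and reuses an assumed sheep-language machine; the function names are illustrative, not from the slides.

```python
# Assumed NFSA for b a a+ ! with no epsilon arcs.
NFSA = {
    (0, "b"): {1},
    (1, "a"): {2},
    (2, "a"): {2, 3},
    (3, "!"): {4},
}


def determinize(transitions, start, finals):
    """Subset construction: every DFSA state is a frozenset of NFSA states."""
    alphabet = {sym for (_, sym) in transitions}
    start_set = frozenset([start])
    dfsa, todo = {}, [start_set]
    while todo:
        current = todo.pop()
        if current in dfsa:
            continue
        dfsa[current] = {}
        for sym in alphabet:
            target = frozenset(
                t for s in current for t in transitions.get((s, sym), ())
            )
            if target:                      # skip the empty (dead) state
                dfsa[current][sym] = target
                todo.append(target)
    dfsa_finals = {subset for subset in dfsa if subset & finals}
    return dfsa, start_set, dfsa_finals


def accepts(dfsa, start_set, dfsa_finals, tape):
    """Deterministic recognition: exactly one state is tracked at a time."""
    state = start_set
    for sym in tape:
        state = dfsa[state].get(sym)
        if state is None:
            return False
    return state in dfsa_finals


DFSA, D_START, D_FINALS = determinize(NFSA, 0, {4})
print(len(DFSA))                                   # 5 reachable subset states
print(accepts(DFSA, D_START, D_FINALS, "baaa!"))   # True
```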

7
Concatenation
8
Kleene Closure
9
Union
10
Morphology
  • Definitions and Problems
  • What is Morphology?
  • Topology of Morphologies
  • Approaches to Computational Morphology
  • Lexicons and Rules
  • Computational Morphology Approaches

11
Morphology
  • The study of the way words are built up from
    smaller meaning units called Morphemes
  • Abstract versus Realized: HOP + PAST → hop + ed →
    hopped → /hapt/

Syntax     | Lexeme/Inflected Lexeme | Grammars      | sentences
Morphology | Morpheme/Allomorph      | Morphotactics | words
Phonology  | Phoneme/Allophone       | Phonotactics  | letters
12
Phonology and Morphology
  • Phonology vs. Orthography
  • Historical spelling
  • night, nite
  • attention, mission, fish
  • Script Limitations
  • Spoken English has 14 vowels
  • heed hid hayed head had hoed hood who'd hide
    how'd taught Tut toy enough
  • English alphabet has 5
  • Use vowel combinations: far, fair, fare
  • Consonantal doubling (hopping vs. hoping)

13
Syntax and Morphology
  • Phrase-level agreement
  • Subject-Verb
  • John studies hard (STUDY+3SG)
  • Noun-Adjective
  • Las vacas hermosas
  • Sub-word phrasal structures
  • [two examples in non-Latin script lost in extraction]
  • That+in+book+PL+Poss1PL
  • "Which are in our books"

14
Topology of Morphologies
  • Concatenative vs. Templatic
  • Derivational vs. Inflectional
  • Regular vs. Irregular

15
Concatenative Morphology
  • Morpheme+Morpheme+Morpheme
  • Stems: also called lemma, base form, root, lexeme
  • hope+ing → hoping, hop+ing → hopping
  • Affixes
  • Prefixes: Antidisestablishmentarianism
  • Suffixes: Antidisestablishmentarianism
  • Infixes: hingi (borrow) → humingi (borrower) in
    Tagalog
  • Circumfixes: sagen (say) → gesagt (said) in
    German
  • Agglutinative Languages
  • uygarlastiramadiklarimizdanmissinizcasina
  • uygar+las+tir+ama+dik+lar+imiz+dan+mis+siniz+casina
  • Behaving as if you are among those whom we could
    not cause to become civilized

16
Templatic Morphology
  • Roots and Patterns

[Figure: the root K T B combined with vowel patterns, e.g. maktuub (written) and ktuuv (written); original-script forms lost in extraction]
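
Root-and-pattern formation can be sketched as slotting the root consonants into a template. The template notation here (an uppercase C for each consonant slot) and the function name are assumptions made for illustration; only the root K T B and the form maktuub come from the slide.

```python
def interdigitate(root, template):
    """Fill each 'C' slot in the template with the next root consonant."""
    if template.count("C") != len(root):
        raise ValueError("template needs one C slot per root consonant")
    consonants = iter(root)
    return "".join(next(consonants) if ch == "C" else ch for ch in template)


print(interdigitate("ktb", "maCCuuC"))  # maktuub
```

A different template over the same root yields a different word, which is the sense in which the root carries the shared meaning and the pattern carries the rest.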
17
Templatic Morphology: Root Meaning
  • KTB: writing "stuff"

[Figure: words built on the root KTB, glossed as book, write, spelling, library, letter, address, office, writer; original-script forms lost in extraction]
18
Derivational vs. Inflectional
  • Word Classes
  • Parts of speech: noun, verb, adjective, etc.
  • Word class dictates how a word combines with
    morphemes to form new words

19
Derivational morphology
  • Nominalization: computerization, appointee,
    killer, fuzziness
  • Formation of adjectives: computational, clueless,
    embraceable
  • CatVar: Categorial Variation Database
  • http://clipdemos.umiacs.umd.edu/catvar/

20
Inflectional morphology
  • Adds tense, number, person, mood, aspect
  • Word class doesn't change
  • Word serves new grammatical role
  • Five verb forms in English
  • Other languages have (lots more)

21
Nouns and Verbs (in English)
  • Nouns have simple inflectional morphology
  • cat
  • cats, cats
  • Verbs have more complex morphology

22
Regulars and Irregulars
  • Nouns
  • Cat/Cats
  • Mouse/Mice, Ox/Oxen, Goose/Geese
  • Verbs
  • Walk/Walked
  • Go/Went, Fly/Flew

23
Regular (English) Verbs
Morphological Form Classes  | Regularly Inflected Verbs
Stem                        | walk    | merge   | try    | map
-s form                     | walks   | merges  | tries  | maps
-ing form                   | walking | merging | trying | mapping
Past form or -ed participle | walked  | merged  | tried  | mapped
24
Irregular (English) Verbs
Morphological Form Classes | Irregularly Inflected Verbs
Stem                       | eat    | catch    | cut
-s form                    | eats   | catches  | cuts
-ing form                  | eating | catching | cutting
Past form                  | ate    | caught   | cut
-ed participle             | eaten  | caught   | cut
25
To love in Spanish
26
Computational Morphology
  • Finite State Morphology
  • Finite State Transducers (FST)
  • Input/Output
  • Analysis/Generation

27
Computational Morphology
  • WORD STEM (FEATURES)
  • cats cat N PL
  • cat cat N SG
  • cities city N PL
  • geese goose N PL
  • ducks (duck N PL) or (duck V 3SG)
  • merging merge V PRES-PART
  • caught (catch V PAST-PART) or (catch V PAST)

28
Building a Morphological Parser
  • The Rules and the Lexicon
  • General versus Specific
  • Regular versus Irregular
  • Accuracy, speed, space
  • The Morphology of a language
  • Approaches
  • Lexicon only
  • Lexicon and Rules
  • Finite-state Automata
  • Finite-state Transducers
  • Rules only

29
Lexicon-only Morphology
  • The lexicon lists all surface level and lexical
    level pairs
  • No rules ?
  • Analysis/Generation is easy
  • Very large for English
  • What about Arabic or Turkish?
  • Chinese?

acclaim      | acclaim     | N
acclaim      | acclaim     | V0
acclaimed    | acclaim     | Ved
acclaimed    | acclaim     | Ven
acclaiming   | acclaim     | Ving
acclaims     | acclaim     | Ns
acclaims     | acclaim     | Vs
acclamation  | acclamation | N
acclamations | acclamation | Ns
acclimate    | acclimate   | V0
acclimated   | acclimate   | Ved
acclimated   | acclimate   | Ven
acclimates   | acclimate   | Vs
acclimating  | acclimate   | Ving
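
A lexicon-only system amounts to a pair of lookup tables, which is why analysis and generation are easy and why size is the problem. The sketch below uses the "acclaim" rows of the listing; the table and function names are hypothetical.

```python
from collections import defaultdict

# (surface form, lexical form, category) triples taken from the sample listing.
ENTRIES = [
    ("acclaim", "acclaim", "N"),
    ("acclaim", "acclaim", "V0"),
    ("acclaimed", "acclaim", "Ved"),
    ("acclaimed", "acclaim", "Ven"),
    ("acclaiming", "acclaim", "Ving"),
    ("acclaims", "acclaim", "Ns"),
    ("acclaims", "acclaim", "Vs"),
]

analyses = defaultdict(list)   # surface -> [(lexical, category), ...]
surfaces = {}                  # (lexical, category) -> surface
for surface, lexical, category in ENTRIES:
    analyses[surface].append((lexical, category))
    surfaces[(lexical, category)] = surface


def analyze(word):
    """Analysis: every listed reading of a surface form (may be ambiguous)."""
    return analyses.get(word, [])


def generate(lexical, category):
    """Generation: the surface form for a lexical form and category, if listed."""
    return surfaces.get((lexical, category))


print(analyze("acclaims"))           # [('acclaim', 'Ns'), ('acclaim', 'Vs')]
print(generate("acclaim", "Ving"))   # acclaiming
```

Any form missing from the list, such as a newly coined word, simply has no analysis, since there are no rules to fall back on.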
30
Building a Morphological Parser
  • The Rules and the Lexicon
  • General versus Specific
  • Regular versus Irregular
  • Accuracy, speed, space
  • The Morphology of a language
  • Approaches
  • Lexicon only
  • Lexicon and Rules
  • Finite-state Automata
  • Finite-state Transducers
  • Rules only

31
Lexicon and Rules: FSA Inflectional Noun
Morphology
  • English Noun Lexicon

reg-noun | irreg-pl-noun | irreg-sg-noun | plural
fox      | geese         | goose         | -s
cat      | sheep         | sheep         |
dog      | mice          | mouse         |
  • English Noun Rule
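
The lexicon-plus-FSA idea can be sketched with arcs labeled by morpheme classes, each class expanded from the noun lexicon above. The rule diagram is not in the transcript, so the state names and arc layout below are assumptions.

```python
LEXICON = {
    "reg-noun": {"fox", "cat", "dog"},
    "irreg-pl-noun": {"geese", "sheep", "mice"},
    "irreg-sg-noun": {"goose", "sheep", "mouse"},
    "plural": {"s"},
}

# Assumed machine: regular nouns may take the plural; irregular forms stand alone.
ARCS = {
    ("q0", "reg-noun"): "q1",
    ("q0", "irreg-sg-noun"): "q2",
    ("q0", "irreg-pl-noun"): "q2",
    ("q1", "plural"): "q2",
}
FINALS = {"q1", "q2"}


def recognize(word, state="q0"):
    """Accept if the word can be split into morphemes along a path to a final state."""
    if not word:
        return state in FINALS
    for (src, morph_class), dst in ARCS.items():
        if src != state:
            continue
        for morph in LEXICON[morph_class]:
            if word.startswith(morph) and recognize(word[len(morph):], dst):
                return True
    return False


print(recognize("cats"))    # True
print(recognize("gooses"))  # False
```

This machine only concatenates morphemes, so it accepts "foxs" and rejects "foxes"; spelling changes need separate rules.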

32
Lexicon and Rules: FSA English Verb Inflectional
Morphology
reg-verb-stem   | walk, fry, talk, impeach
irreg-verb-stem | cut, speak, spoken, sing, sang
irreg-past-verb | caught, ate, eaten
past            | -ed
past-part       | -ed
pres-part       | -ing
3sg             | -s
33
FSA for Derivational Morphology: Adjectival
Formation
34
More Complex Derivational Morphology
35
Using FSAs for Recognition: English Nouns and
their Inflection
36
Morphological Parsing
  • Finite-state automata (FSA)
  • Recognizer
  • One-level morphology
  • Finite-state transducers (FST)
  • Two-level morphology
  • PC-Kimmo (Koskenniemi 83)
  • input-output pair

37
Terminology for PC-Kimmo
  • Upper: lexical tape
  • Lower: surface tape
  • Characters correspond to pairs, written a:b
  • If a:a, write a for shorthand
  • Two-level lexical entries
  • #: word boundary
  • ^: morpheme boundary
  • Other: any feasible pair that is not in this
    transducer
  • Final states indicated with : and non-final
    states indicated with .

38
Four-Fold View of FSTs
  • As a recognizer
  • As a generator
  • As a translator
  • As a set relater
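
The four views can be illustrated on one tiny transducer. This is a sketch for an acyclic machine only (it enumerates every path), and the arcs, symbols such as `+N` and `+PL`, and function names are all assumptions made for the example.

```python
# Arcs: (source state, upper symbol, lower symbol, target state); "" is epsilon.
ARCS = [
    (0, "c", "c", 1),
    (1, "a", "a", 2),
    (2, "t", "t", 3),
    (3, "+N", "", 4),
    (4, "+SG", "", 5),
    (4, "+PL", "s", 5),
]
START, FINALS = 0, {5}


def pairs(state=START, upper="", lower=""):
    """Generator view: enumerate every (upper, lower) string pair."""
    if state in FINALS:
        yield upper, lower
    for src, up, lo, dst in ARCS:
        if src == state:
            yield from pairs(dst, upper + up, lower + lo)


def recognize(upper, lower):
    """Recognizer view: is this pair of strings in the relation?"""
    return (upper, lower) in set(pairs())


def translate(upper):
    """Translator view: read the upper string, write the lower string(s)."""
    return [lo for up, lo in pairs() if up == upper]


def relate(uppers):
    """Set-relater view: map a set of upper strings to a set of lower strings."""
    return {lo for up, lo in pairs() if up in uppers}


print(list(pairs()))            # [('cat+N+SG', 'cat'), ('cat+N+PL', 'cats')]
print(translate("cat+N+PL"))    # ['cats']
```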

39
Nominal Inflection FST
40
Lexical and Intermediate Tapes
41
Spelling Rules
Name               | Rule Description                            | Example
Consonant Doubling | 1-letter consonant doubled before -ing/-ed  | beg/begging
E-deletion         | Silent e dropped before -ing and -ed        | make/making
E-insertion        | e added after s, z, x, ch, sh before s      | watch/watches
Y-replacement      | -y changes to -ie before -s, -i before -ed  | try/tries
K-insertion        | verbs ending with vowel + -c add -k         | panic/panicked
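
The five rules can be approximated with a few regular-expression checks when attaching a suffix. This is a rough heuristic sketch, not the transducer formulation: the function name is hypothetical, "silent e" is approximated as a final e after a consonant, and doubling is limited to one-syllable consonant-vowel-consonant stems.

```python
import re


def attach(stem, suffix):
    """Join a stem and a suffix ('s', 'ed' or 'ing') with rough spelling rules."""
    if suffix == "s":
        if re.search(r"(s|z|x|ch|sh)$", stem):
            return stem + "es"                 # E-insertion: watch -> watches
        if re.search(r"[^aeiou]y$", stem):
            return stem[:-1] + "ies"           # Y-replacement: try -> tries
        return stem + "s"
    # From here on the suffix is 'ed' or 'ing'.
    if re.search(r"[aeiou]c$", stem):
        return stem + "k" + suffix             # K-insertion: panic -> panicked
    if suffix == "ed" and re.search(r"[^aeiou]y$", stem):
        return stem[:-1] + "ied"               # Y-replacement: try -> tried
    if re.search(r"[^aeiou]e$", stem):
        return stem[:-1] + suffix              # E-deletion: make -> making
    if re.fullmatch(r"[^aeiou]*[aeiou][^aeiouwxy]", stem):
        return stem + stem[-1] + suffix        # Consonant doubling: beg -> begging
    return stem + suffix


for stem in ("walk", "merge", "try", "map"):
    print(stem, attach(stem, "s"), attach(stem, "ing"), attach(stem, "ed"))
```

The loop reproduces the -s, -ing and past forms of the four regular verbs walk, merge, try and map; irregular verbs still have to come from the lexicon.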
42
Chomsky and Halle Notation
ε → e / {x, s, z} ^ __ s #
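
Read as "insert e after a morpheme boundary when the boundary follows x, s or z and precedes a word-final s", the E-insertion rule can be sketched as a regular-expression rewrite on the intermediate tape. The function names are hypothetical, and `^` and `#` are taken to be the morpheme and word boundary symbols.

```python
import re


def e_insertion(intermediate):
    """Rewrite '^' as '^e' when it follows x, s or z and precedes 's#'."""
    return re.sub(r"(?<=[xsz])\^(?=s#)", "^e", intermediate)


def to_surface(intermediate):
    """Apply E-insertion, then erase the boundary symbols."""
    return re.sub(r"[\^#]", "", e_insertion(intermediate))


print(e_insertion("fox^s#"))  # fox^es#
print(to_surface("fox^s#"))   # foxes
print(to_surface("cat^s#"))   # cats
```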
43
Intermediate-to-Surface Transducer
44
State Transition Table
45
Two-Level Morphology
46
Sample Run
KIMMO DEMO
47
FSTs and ambiguity
  • Parse Example 1: unionizable
  • union +ize +able
  • un+ ion +ize +able
  • Parse Example 2: assess
  • assess+V
  • ass+N ess+N
  • Parse Example 3: tender
  • tender+AJ
  • ten+Num d+AJ er+CMP
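
The ambiguity in the first example can be reproduced by enumerating every way to split a word into known morphemes. The toy lexicon below is an assumption; it maps the surface piece "iz" to the lexical form "ize" so that the dropped e does not block the parse.

```python
# Toy lexicon (assumed): surface piece -> lexical morpheme.
MORPHEMES = {
    "un": "un",
    "ion": "ion",
    "union": "union",
    "iz": "ize",
    "able": "able",
}


def segmentations(word):
    """Return every split of the word into lexicon morphemes, left to right."""
    if not word:
        return [[]]
    results = []
    for i in range(1, len(word) + 1):
        prefix = word[:i]
        if prefix in MORPHEMES:
            for rest in segmentations(word[i:]):
                results.append([MORPHEMES[prefix]] + rest)
    return results


for parse in segmentations("unionizable"):
    print(" + ".join(parse))
# un + ion + ize + able
# union + ize + able
```

Taking only the first element of the result corresponds to accepting the first successful structure, while the full list corresponds to running the parser through all possible paths.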

48
What to do about Global Ambiguity?
  • Accept first successful structure
  • Run parser through all possible paths
  • Bias the search in some manner

49
Computational Morphology
  • The Rules and the Lexicon
  • General versus Specific
  • Regular versus Irregular
  • Accuracy, speed, space
  • The Morphology of a language
  • Approaches
  • Lexicon only
  • Lexicon and Rules
  • Finite-state Automata
  • Finite-state Transducers
  • Rules only

50
Computational Morphology
  • The Rules and the Lexicon
  • General versus Specific
  • Regular versus Irregular
  • Accuracy, speed, space
  • The Morphology of a language
  • Approaches
  • Lexicon only
  • Lexicon and Rules
  • Finite-state Automata
  • Finite-state Transducers
  • Rules only (next time!!)

51
Readings for next time
  • J&M Chapter 6