Natural Language Processing - PowerPoint PPT Presentation

Slides: 37
Provided by: james1100
Transcript and Presenter's Notes

1
Natural Language Processing
  • Lecture Notes 3

2
Morphology (Ch 3)
  • Finite-state methods are particularly useful in
    dealing with a lexicon.
  • So we'll switch to talking about some facts about
    words and then come back to computational methods

3
English Morphology
  • Morphology is the study of the ways that words
    are built up from smaller meaningful units called
    morphemes
  • Two classes of morphemes
  • Stems: the core meaning-bearing units
  • Affixes: adhere to stems to change their meanings
    and grammatical functions

4
Examples
  • Insubstantial, trying, unreadable

5
English Morphology
  • We can also divide morphology up into two broad
    classes
  • Inflectional
  • Derivational

6
Word Classes
  • Things like nouns and verbs
  • We'll go into the gory details when we cover POS
    tagging
  • Relevant now: how stems and affixes combine
    depends on the word class of the stem

7
Inflectional Morphology
  • Inflectional morphology concerns the combination
    of stems and affixes where the resulting word
  • Has the same word class as the original
  • Serves a grammatical purpose different from the
    original (agreement, tense)
  • bird → birds
  • like → likes or liked

8
Nouns and Verbs (English)
  • Nouns are simple
  • Markers for plural and possessive
  • Verbs are only slightly more complex
  • Markers appropriate to the tense of the verb

9
Regulars and Irregulars
  • Some words misbehave
  • Mouse/mice, goose/geese, ox/oxen
  • Go/went, fly/flew
  • The terms regular and irregular will be used to
    refer to words that follow the rules and those
    that don't.
  • (Different meaning than regular languages!)

10
Regular and Irregular Verbs
  • Regulars
  • Walk, walks, walking, walked, walked
  • Irregulars
  • Eat, eats, eating, ate, eaten
  • Catch, catches, catching, caught, caught
  • Cut, cuts, cutting, cut, cut

11
Regular Verbs
  • If you know a regular verb stem, you can predict
    the other forms by adding a predictable ending
    and making regular spelling changes (details in
    the chapter)
  • The regular class is productive: it includes new
    verbs.
  • Emailed, instant-messaged, faxed, googled, ...
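
A rough sketch of what "predictable ending plus regular spelling changes" could look like in Python. The function name and the particular spelling rules are assumptions made for illustration; they cover only silent-e and sibilant endings, not consonant doubling (stop/stopping) or y-changes (try/tried).

```python
def regular_verb_forms(stem):
    """Return (stem, -s form, -ing form, past, past participle) for a regular verb.

    Illustrative only: handles silent-e and sibilant endings, nothing else.
    """
    # fax -> faxes, catch-like endings also take -es
    third = stem + ("es" if stem.endswith(("s", "x", "z", "ch", "sh")) else "s")
    # google -> googling (drop the silent e), but keep a double e as in agree
    drop_e = stem.endswith("e") and not stem.endswith("ee")
    ing = (stem[:-1] if drop_e else stem) + "ing"
    # google -> googled, walk -> walked
    past = stem + ("d" if stem.endswith("e") else "ed")
    return (stem, third, ing, past, past)


for verb in ["walk", "email", "fax", "google"]:
    print(regular_verb_forms(verb))
```

Because the regular class is productive, the same function works unchanged on a brand-new verb stem, which is the point of the slide.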

12
Derivational Morphology
  • Quasi-systematicity
  • Irregular meaning changes
  • Healthful vs. Healthy
  • Clue → clueless (lacking understanding)
  • Art → artless (without guile, not artificial)
  • Changes of word class
  • Examples
  • Computerize (V) → Computerization (N)
  • Appoint (V) → Appointee (N)
  • Computation (N) → Computational (Adj)
  • But not: *eatation, *spellation, *sleepable,
    *scienceless

13
Morphological Processing Requirements
  • Lexicon
  • Word repository
  • Stems and affixes (with corresponding parts of
    speech)
  • Morphotactics
  • Morpheme ordering
  • Orthographic Rules
  • Spelling changes due to affixation
  • city + -s → cities (not citys)
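
To make the three requirements concrete, here is a minimal sketch of an orthographic rule applied when the plural -s attaches to a stem. The function name and the regular expressions are assumptions for illustration, not the rule set from the chapter.

```python
import re


def attach_plural(stem):
    """Attach plural -s to a regular noun stem, applying two spelling rules."""
    # Consonant + y: city + -s -> cities (not citys)
    if re.search(r"[^aeiou]y$", stem):
        return stem[:-1] + "ies"
    # Sibilant endings take -es: fox + -s -> foxes (not foxs)
    if re.search(r"(s|x|z|ch|sh)$", stem):
        return stem + "es"
    # Default: plain concatenation
    return stem + "s"


for noun in ["city", "fox", "cat", "boy"]:
    print(noun, "->", attach_plural(noun))
```

The lexicon (which stems exist) and the morphotactics (that -s follows a noun stem) are taken for granted here; only the spelling change is modelled.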

14
Morphotactics using FSAs
  • English nominal inflection
  • Inputs: cats, goose, geese

15
Adding the Words
  • Expand each non-terminal into each stem in its
    class: reg-noun → cat, dog, ...
  • Then expand each stem to the letters it includes

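The expansion described on slides 14 and 15 can be sketched as a small letter-level recognizer. This is a sketch under assumptions: the word lists, state numbering, and function names are made up for illustration, and no spelling rules are applied, so the machine only checks morphotactics.

```python
from collections import defaultdict


def build_noun_fsa(reg_nouns, irreg_sg_nouns, irreg_pl_nouns):
    """Build a nondeterministic FSA for English nominal inflection."""
    delta = defaultdict(set)          # (state, letter) -> set of next states
    start, after_reg, final = 0, 1, 2
    next_id = 3

    def add_word(word, target):
        # Expand one stem into a chain of letter arcs ending in `target`.
        nonlocal next_id
        state = start
        for letter in word[:-1]:
            delta[(state, letter)].add(next_id)
            state = next_id
            next_id += 1
        delta[(state, word[-1])].add(target)

    for word in reg_nouns:
        add_word(word, after_reg)
    for word in irreg_sg_nouns | irreg_pl_nouns:
        add_word(word, final)
    delta[(after_reg, "s")].add(final)   # plural -s only after a regular noun
    return delta, start, {after_reg, final}


def accepts(fsa, word):
    """Return True if the FSA recognizes the word."""
    delta, start, accepting = fsa
    states = {start}
    for letter in word:
        states = {t for s in states for t in delta.get((s, letter), ())}
    return bool(states & accepting)


NOUN_FSA = build_noun_fsa({"cat", "dog"}, {"goose", "mouse", "ox"},
                          {"geese", "mice", "oxen"})
for w in ["cats", "goose", "geese", "gooses"]:
    print(w, accepts(NOUN_FSA, w))
```

The recognizer answers only yes or no for each input.
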
16
Derivational Rules
17
Limitations
  • FSAs can only tell us whether a word is in the
    language or not, but what if we want to know
    more?
  • What is the stem?
  • What are the affixes, and of what sort?

18
Parsing/Generation vs. Recognition
  • Recognition is usually not quite what we need.
  • Usually if we find some string in the language we
    need to find the structure in it (parsing)
  • Or we have some structure and we want to produce
    a surface form (production/generation)
  • Examples
  • From cats to cat +N +PL
  • From cat +N +PL to cats

19
Applications
  • The kind of parsing we're talking about is
    normally called morphological analysis
  • It can either be
  • An important stand-alone component of an
    application (spelling correction, information
    retrieval)
  • Or simply a step in a processing sequence

20
Finite State Transducers
  • Basic idea
  • Add another tape
  • Add extra symbols to the transitions
  • On one tape we read cats, on the other we write
    cat +N +PL, or the other way around.

21
FSTs
  • Two-level morphology represents a word as a
    correspondence between lexical (the morphemes)
    and surface (the orthographic word) levels
  • Parsing maps surface to lexical level
  • Visualize an FST as a 2-tape FSA which
    recognizes/generates pairs of strings

22
Transitions
  • c:c means read a c on one tape and write a c on
    the other
  • +N:ε means read a +N symbol on one tape and write
    nothing on the other
  • +PL:s means read +PL and write an s
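
The transition labels above can be written down as a table of arcs. A minimal sketch, assuming a one-word machine for "cat"; the tuple layout, the state numbers, and the extra +SG tag are assumptions added for illustration.

```python
EPS = ""  # epsilon: write nothing

# Each arc is (source state, lexical symbol, surface symbol, target state).
ARCS = [
    (0, "c", "c", 1),      # c:c
    (1, "a", "a", 2),      # a:a
    (2, "t", "t", 3),      # t:t
    (3, "+N", EPS, 4),     # +N:epsilon
    (4, "+PL", "s", 5),    # +PL:s
    (4, "+SG", EPS, 5),    # +SG:epsilon (assumed tag, not on the slide)
]
FINAL_STATES = {5}


def generate(lexical_symbols):
    """Read the lexical tape symbol by symbol and write the surface tape."""
    state, surface = 0, []
    for symbol in lexical_symbols:
        for src, lex, surf, dst in ARCS:
            if src == state and lex == symbol:
                surface.append(surf)
                state = dst
                break
        else:
            return None  # no arc matches: the input is rejected
    return "".join(surface) if state in FINAL_STATES else None


print(generate(["c", "a", "t", "+N", "+PL"]))
print(generate(["c", "a", "t", "+N", "+SG"]))
```

This direction (lexical to surface) is deterministic for this toy machine, so a simple walk is enough; going the other way needs a search because of the epsilon arcs.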

23
Typical Uses
  • Typically, we'll read from one tape using the
    first symbol
  • And write to the second tape using the other
    symbol
  • Closure properties of FSTs: inversion and
    composition
  • So, they may be used in reverse and they may be
    cascaded
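
Inversion can be sketched by swapping the two symbols on every arc, which turns a generator into a parser. The arc table, the tag names (+N, +PL, +SG), and the function names below are assumptions for illustration; the search assumes the machine has no cycles of epsilon-input arcs.

```python
# Each arc is (source state, input symbol, output symbol, target state);
# the empty string stands for epsilon.
LEXICAL_TO_SURFACE = [
    (0, "c", "c", 1), (1, "a", "a", 2), (2, "t", "t", 3),
    (3, "+N", "", 4), (4, "+PL", "s", 5), (4, "+SG", "", 5),
]
FINAL_STATES = {5}


def invert(arcs):
    """Swap input and output on every arc."""
    return [(src, out, inp, dst) for src, inp, out, dst in arcs]


def transduce(arcs, finals, symbols, start=0):
    """Return every output sequence the machine produces for `symbols`."""
    results = []
    stack = [(start, 0, ())]           # (state, input position, output so far)
    while stack:
        state, pos, out = stack.pop()
        if pos == len(symbols) and state in finals:
            results.append(out)
        for src, inp, o, dst in arcs:
            if src != state:
                continue
            new_out = out + ((o,) if o else ())
            if inp == "":                                   # epsilon input
                stack.append((dst, pos, new_out))
            elif pos < len(symbols) and symbols[pos] == inp:
                stack.append((dst, pos + 1, new_out))
    return results


SURFACE_TO_LEXICAL = invert(LEXICAL_TO_SURFACE)
print(transduce(LEXICAL_TO_SURFACE, FINAL_STATES, ["c", "a", "t", "+N", "+PL"]))
print(transduce(SURFACE_TO_LEXICAL, FINAL_STATES, list("cats")))
```

Composition is not shown here; cascading amounts to feeding each output of one machine into `transduce` for the next.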

24
Ambiguity
  • Recall that in non-deterministic recognition
    multiple paths through a machine may lead to an
    accept state.
  • It didn't matter which path was actually traversed
  • In FSTs the path to an accept state does matter
    since different paths represent different parses
    and different outputs will result

25
Ambiguity
  • What's the right parse for
  • Unionizable
  • Union-ize-able
  • Un-ion-ize-able
  • Each represents a valid path through the
    derivational morphology machine.

26
Ambiguity
  • There are a number of ways to deal with this
    problem
  • Simply take the first output found
  • Find all the possible outputs (all paths) and
    return them all (without choosing)
  • Bias the search so that only one or a few likely
    paths are explored
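
The three strategies can be sketched with a plain segmentation search rather than a full transducer. The toy morpheme table (including "iz" as the surface spelling of -ize before -able) and the "fewest morphemes" preference are assumptions for illustration only.

```python
# Surface spelling -> morpheme (hypothetical toy lexicon).
MORPHEMES = {"un": "un", "ion": "ion", "union": "union",
             "iz": "ize", "able": "able"}


def all_parses(word):
    """Return every way to split `word` into known morphemes."""
    if not word:
        return [[]]
    parses = []
    for end in range(1, len(word) + 1):
        piece = word[:end]
        if piece in MORPHEMES:
            for rest in all_parses(word[end:]):
                parses.append([MORPHEMES[piece]] + rest)
    return parses


parses = all_parses("unionizable")

first_found = parses[0]              # strategy 1: take the first output found
everything = parses                  # strategy 2: return all paths
preferred = min(parses, key=len)     # strategy 3: bias (here: fewest morphemes)

print("-".join(first_found))
print(["-".join(p) for p in everything])
print("-".join(preferred))
```

Note that "first found" depends entirely on search order, which is why it can return the un-ion-ize-able reading here.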

27
Details
  • Of course, it's not as easy as
  • cat +N +PL <-> cats
  • As we saw earlier there are geese, mice and oxen
  • But there are also spelling/pronunciation changes
    that go along with inflectional changes (e.g.,
    plural of fox is foxes, not foxs)

28
Multi-Tape Machines
  • Add another tape, and use the output of one tape
    machine as the input to the next
  • To handle irregular spelling changes we'll add
    intermediate tapes with intermediate symbols

29
Multi-Level Tape Machines
  • We use one machine to transduce between the
    lexical and the intermediate level, and another
    to handle the spelling changes to the surface
    tape
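
A minimal sketch of the two-machine cascade, written as two ordinary functions rather than real transducers. The intermediate symbols "^" (morpheme boundary) and "#" (word boundary) follow a common textbook convention and are assumptions here, as are the function names and the simplified e-insertion rule (after x, s, or z only).

```python
import re


def lexical_to_intermediate(stem, tags):
    """Map e.g. fox +N +PL to the intermediate form fox^s#."""
    form = stem
    for tag in tags:
        if tag == "+PL":
            form += "^s"      # plural contributes a boundary plus s
        # +N (and any other tag) contributes nothing at this level
    return form + "#"


def intermediate_to_surface(form):
    """Apply e-insertion, then erase the boundary symbols."""
    form = re.sub(r"([xsz])\^s#", r"\1es#", form)   # fox^s# -> foxes#
    return form.replace("^", "").replace("#", "")


def generate(stem, tags):
    """Cascade: the output of the first machine feeds the second."""
    return intermediate_to_surface(lexical_to_intermediate(stem, tags))


print(lexical_to_intermediate("fox", ["+N", "+PL"]))
print(generate("fox", ["+N", "+PL"]))
print(generate("cat", ["+N", "+PL"]))
```

Keeping the spelling rule in its own stage means the first machine never has to know which stems end in a sibilant.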

30
Lexical to Intermediate Level
31
Intermediate to Surface
  • The "add an e" rule, as in foxs → foxes
  • Figure note: epsilon (error in the text)

32
  • Succeeds iff the rule applies
  • Removes all ^ symbols from the output string;
    useful if < 2 spelling rules apply
  • (^ separates all affixes and stems; another NLP
    process may care about mid-word morphemes)
  • Suppose q0, q1, q2 were not final states
  • Figure note: epsilon (error in the text)

33
Overall Plan
34
Summing Up
  • FSTs allow us to take an input and deliver a
    structure based on it
  • Or take a structure and create a surface form
  • Or take a structure and create another structure

35
Summing Up
  • In many applications it's convenient to decompose
    the problem into a set of cascaded transducers
    where
  • The output of one feeds into the input of the
    next.
  • We'll see this scheme again for deeper semantic
    processing.

36
Summing Up
  • FSTs provide a useful tool for implementing a
    standard model of morphological analysis
    (two-level morphology)
  • Toolkits such as the AT&T FSM Toolkit are available
  • Other approaches are also used, e.g., the
    rule-based Porter Stemmer, and memory-based
    learning