Fall 2005

About This Presentation
Title:

Fall 2005

Description:

EECS 595 / LING 541 / SI 661 & 761 Natural Language Processing, Fall 2005, Lecture Notes #2. Course logistics. Instructor: Prof. Dragomir Radev (radev_at_umich.edu), Ph.D ...

Provided by: clairSiU


Transcript and Presenter's Notes



1

EECS 595 / LING 541 / SI 661 & 761
Natural Language Processing
  • Fall 2005
  • Lecture Notes #2

2
Course logistics
  • Instructor: Prof. Dragomir Radev
    (radev_at_umich.edu). Ph.D., Computer Science,
    Columbia University. Formerly at IBM TJ Watson
    Research Center.
  • Times: Thursdays 2:40-5:25 PM, in 411 West Hall
  • Office hours: TBA, 3080 West Hall Connector

Course home page:
http://www.si.umich.edu/radev/NLP-fall2005
3
Linguistic Fundamentals
4
Syntactic categories
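  • Substitution test: words that can replace one
    another in the same slot belong to the same
    category

Nathalie likes {black, Persian, tabby, small} cats.
  • Open (lexical) and closed (functional) categories

Open: No-fly-zone, yadda yadda yadda
Closed: the, in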
5
Morphology
The dog chased the yellow bird.
  • Parts of speech: eight (or so) general types
  • Inflection (number, person, tense)
  • Derivation (adjective-adverb, noun-verb)
  • Compounding (separate words or single word)
  • Part-of-speech tagging
  • Morphological analysis (prefix, root, suffix,
    ending)

6
Part of speech tags
From Church (1991): 79 tags

Tag   Description
NN    singular noun
IN    preposition
AT    article
NP    proper noun
JJ    adjective
,     comma
NNS   plural noun
CC    conjunction
RB    adverb
VB    un-inflected verb
VBN   verb +en (taken, looked (passive, perfect))
VBD   verb +ed (took, looked (past tense))
CS    subordinating conjunction
7
Jabberwocky (Lewis Carroll)
  • 'Twas brillig, and the slithy toves
    Did gyre and gimble in the wabe
    All mimsy were the borogoves,
    And the mome raths outgrabe.
    "Beware the Jabberwock, my son!
    The jaws that bite, the claws that catch!
    Beware the Jubjub bird, and shun
    The frumious Bandersnatch!"

8
Nouns
  • Nouns: dog, tree, computer, idea
  • Nouns vary in number (singular, plural), gender
    (masculine, feminine, neuter), case (nominative,
    genitive, accusative, dative)
  • Latin: filius (m), filia (f), filium (object)
  • German: Mädchen
  • Clitics ('s)

9
Pronouns
  • Pronouns: she, ourselves, mine
  • Pronouns vary in person, gender, number, case (in
    English: nominative, accusative, possessive, 2nd
    possessive, reflexive)

Mary saw her in the mirror. Mary saw herself in
the mirror.
  • Anaphors: herself, each other

10
Determiners and adjectives
  • Articles: the, a
  • Demonstratives: this, that
  • Adjectives describe properties
  • Attributive and predicative adjectives
  • Agreement in gender, number
  • Comparative and superlative (derivative and
    periphrastic)
  • Positive form

11
Verbs
  • Actions, activities, and states (throw, walk,
    have)
  • English: four verb forms
  • tenses: present, past, future
  • other inflection: number, person
  • gerunds and infinitive
  • aspect: progressive, perfective
  • voice: active, passive
  • participles, auxiliaries
  • irregular verbs
  • French and Finnish: many more inflections than
    English

12
Other parts of speech
  • Adverbs, prepositions, particles
  • phrasal verbs (the plane took off, take it off)
  • particles vs. prepositions (she ran up a
    bill/hill)
  • Coordinating conjunctions: and, or, but
  • Subordinating conjunctions: if, because, that,
    although
  • Interjections: Ouch!

13
Phrase structure
  • Constraints on word order
  • Constituents: NP, PP, VP, AP
  • Phrase structure grammars

[S [NP [PN Spot]] [VP [V chased] [NP [Det a] [N bird]]]]
14
Phrase structure
  • Paradigmatic relationships (e.g., constituency)
  • Syntagmatic relationships (e.g., collocations)

[S [NP That man] [VP [VBD caught] [NP the butterfly] [PP [IN with] [NP a net]]]]
15
Phrase-structure grammars
Peter gave Mary a book. Mary gave Peter a book.
  • Constituent order (SVO, SOV)
  • imperative forms
  • sentences with auxiliary verbs
  • interrogative sentences
  • declarative sentences
  • start symbol and rewrite rules
  • context-free view of language

16
Sample phrase-structure grammar
S → NP VP
NP → AT NNS
NP → AT NN
NP → NP PP
VP → VP PP
VP → VBD
VP → VBD NP
PP → IN NP

AT → the
NNS → children
NNS → students
NNS → mountains
VBD → slept
VBD → ate
VBD → saw
IN → in
IN → of
NN → cake
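
To make the rewrite rules concrete, here is a minimal sketch of a CKY-style recognizer for this sample grammar. It assumes the prepositional-phrase rule is read as PP → IN NP, and the function names (`close_unary`, `recognize`) are illustrative rather than part of the lecture.

```python
from collections import defaultdict

# Binary rules (parent, left child, right child) from the sample grammar.
# Assumption: the prepositional-phrase rule is read as PP -> IN NP.
BINARY = [
    ("S", "NP", "VP"),
    ("NP", "AT", "NNS"),
    ("NP", "AT", "NN"),
    ("NP", "NP", "PP"),
    ("VP", "VP", "PP"),
    ("VP", "VBD", "NP"),
    ("PP", "IN", "NP"),
]
# The only unary rule between non-lexical categories: VP -> VBD.
UNARY = [("VP", "VBD")]
LEXICON = {
    "the": {"AT"}, "children": {"NNS"}, "students": {"NNS"},
    "mountains": {"NNS"}, "slept": {"VBD"}, "ate": {"VBD"},
    "saw": {"VBD"}, "in": {"IN"}, "of": {"IN"}, "cake": {"NN"},
}


def close_unary(cats):
    """Add every parent reachable through unary rules."""
    changed = True
    while changed:
        changed = False
        for parent, child in UNARY:
            if child in cats and parent not in cats:
                cats.add(parent)
                changed = True
    return cats


def recognize(sentence):
    """Return True if the grammar derives the sentence from S."""
    words = sentence.lower().rstrip(".").split()
    n = len(words)
    if n == 0:
        return False
    chart = defaultdict(set)  # (start, end) -> categories spanning it
    for i, word in enumerate(words):
        chart[i, i + 1] = close_unary(set(LEXICON.get(word, ())))
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            k = i + width
            cell = set()
            for j in range(i + 1, k):  # try every split point
                for parent, left, right in BINARY:
                    if left in chart[i, j] and right in chart[j, k]:
                        cell.add(parent)
            chart[i, k] = close_unary(cell)
    return "S" in chart[0, n]


print(recognize("The children ate the cake."))
print(recognize("Slept children the."))
```

A bottom-up chart is used here because the grammar contains left-recursive rules (NP → NP PP, VP → VP PP), which would send a naive top-down recursive parser into an infinite loop.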
17
Phrase structure grammars
  • Local dependencies
  • Non-local dependencies
  • Subject-verb agreement

The women who found the wallet were given a
reward.
  • wh-extraction

Should Peter buy a book? Which book should Peter
buy?
  • Empty nodes

18
Dependency: arguments and adjuncts
Sue watched the man at the next table.
  • Event dependents (verb arguments are usually
    NPs)
  • agent, patient, instrument, goal - semantic roles
  • subject, direct object, indirect object
  • transitive, intransitive, and ditransitive verbs
  • active and passive voice

19
Subcategorization
  • Arguments: subject, complements
  • adjuncts vs. complements
  • adjuncts are optional and describe time, place,
    manner
  • subordinate clauses
  • subcategorization frames

20
Subcategorization
  • Subject: The children eat candy.
  • Object: The children eat candy.
  • Prepositional phrase: She put the book on the
    table.
  • Predicative adjective: We made the man angry.
  • Bare infinitive: She helped me walk.
  • To-infinitive: She likes to walk.
  • Participial phrase: She stopped singing that tune
    at the end.
  • That-clause: She thinks that it will rain
    tomorrow.
  • Question-form clauses: She asked me what book I
    was reading.

21
Subcategorization frames
  • Intransitive verbs: The woman walked
  • Transitive verbs: John loves Mary
  • Ditransitive verbs: Mary gave Peter flowers
  • Intransitive with PP: I rent in Paddington
  • Transitive with PP: She put the book on the table
  • Sentential complement: I know that she likes you
  • Transitive with sentential complement: She told
    me that Gary is coming on Tuesday
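
One way to picture a subcategorization frame is as a lookup table from a verb to the complement sequences it licenses. The sketch below encodes the seven example sentences above. The category labels (NP, PP, S), the table name `FRAMES`, and the helper `licenses` are illustrative assumptions, not notation from the lecture.

```python
# Each verb maps to the complement sequences it allows after the subject.
# Entries follow the example sentences on the slide; the labels are assumed.
FRAMES = {
    "walk": [()],            # intransitive: The woman walked
    "love": [("NP",)],       # transitive: John loves Mary
    "give": [("NP", "NP")],  # ditransitive: Mary gave Peter flowers
    "rent": [("PP",)],       # intransitive with PP: I rent in Paddington
    "put": [("NP", "PP")],   # transitive with PP: She put the book on the table
    "know": [("S",)],        # sentential complement: I know that she likes you
    "tell": [("NP", "S")],   # transitive with sentential complement
}


def licenses(verb, complements):
    """Return True if the verb's frames allow this complement sequence."""
    return tuple(complements) in FRAMES.get(verb, [])


print(licenses("put", ["NP", "PP"]))  # She put the book on the table
print(licenses("put", ["NP"]))        # *She put the book
```

A real lexicon would list several frames per verb; a single frame each is kept here only to mirror the slide.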

22
Selectional restrictions and preferences
  • Subcategorization frames capture syntactic
    regularities about complements
  • Selectional restrictions and preferences capture
    semantic regularities: bark, eat

23
Phrase structure ambiguity
  • Grammars are used for generating and parsing
    sentences
  • Parses
  • Syntactic ambiguity
  • Attachment ambiguity: Our company is training
    workers.
  • The children ate the cake with a spoon.
  • High vs. low attachment
  • Garden path sentences: The horse raced past the
    barn fell. Is the book on the table red?
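
Attachment ambiguity can be made visible by counting parses. The sketch below reuses the phrase-structure rules of the sample grammar from earlier in these notes (reading its prepositional-phrase rule as PP → IN NP) and adds three lexicon entries, "a", "with", and "spoon", that are assumptions introduced only for this example. The function name `count_parses` is likewise illustrative.

```python
from collections import defaultdict

BINARY = [
    ("S", "NP", "VP"),
    ("NP", "AT", "NNS"),
    ("NP", "AT", "NN"),
    ("NP", "NP", "PP"),   # low attachment: PP modifies the noun phrase
    ("VP", "VP", "PP"),   # high attachment: PP modifies the verb phrase
    ("VP", "VBD", "NP"),
    ("PP", "IN", "NP"),
]
UNARY = [("VP", "VBD")]
# "a", "with", and "spoon" are assumed additions to the sample lexicon.
LEXICON = {
    "the": ["AT"], "a": ["AT"], "children": ["NNS"], "cake": ["NN"],
    "spoon": ["NN"], "ate": ["VBD"], "with": ["IN"],
}


def count_parses(sentence):
    """Count distinct parse trees rooted in S (CKY with counts)."""
    words = sentence.lower().rstrip(".").split()
    n = len(words)
    # chart[(start, end)][category] = number of trees for that span
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for tag in LEXICON.get(word, []):
            chart[i, i + 1][tag] += 1
    for width in range(1, n + 1):
        for i in range(n - width + 1):
            k = i + width
            cell = chart[i, k]
            for j in range(i + 1, k):
                for parent, left, right in BINARY:
                    cell[parent] += chart[i, j].get(left, 0) * chart[j, k].get(right, 0)
            for parent, child in UNARY:
                cell[parent] += cell.get(child, 0)
    return chart[0, n].get("S", 0)


print(count_parses("The children ate the cake with a spoon."))
print(count_parses("The children ate the cake."))
```

The two trees for the first sentence differ only in whether "with a spoon" attaches to the verb phrase (high) or to "the cake" (low).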

24
Ungrammaticality vs. semantic abnormality
Slept children the. Colorless green ideas
sleep furiously. The cat barked.
25
Semantics and pragmatics
  • Lexical semantics and compositional semantics
  • Hypernyms, hyponyms, antonyms, meronyms and
    holonyms (part-whole relationship, tire is a
    meronym of car), synonyms, homonyms
  • Senses of words, polysemous words
  • Homophony (bass).
  • Collocations: white hair, white wine
  • Idioms: to kick the bucket

26
Discourse analysis
  • Anaphoric relations

1. Mary helped Peter get out of the car. He
thanked her.
2. Mary helped the other passenger out of the car.
The man had asked her for help because of his
foot injury.
  • Information extraction problems (entity
    cross-referencing)

Hurricane Hugo destroyed 20,000 Florida homes. At
an estimated cost of one billion dollars, the
disaster has been the most costly in the state's
history.
27
Pragmatics
  • The study of how knowledge about the world and
    language conventions interact with literal
    meaning.
  • Speech acts
  • Research issues: resolution of anaphoric
    relations, modeling of speech acts in dialogues

28
Other areas of NLP
  • Linguistics is traditionally divided into
    phonetics, phonology, morphology, syntax,
    semantics, and pragmatics.
  • Sociolinguistics: interactions of social
    organization and language.
  • Historical linguistics: change over time.
  • Linguistic typology
  • Language acquisition
  • Psycholinguistics: real-time production and
    perception of language

29
Word classes and part-of-speech tagging
30
Part of speech tagging
  • Problems: transport, object, discount, address
  • More problems: content
  • French: est, président, fils
  • "Book that flight": what is the part of speech
    associated with "book"?
  • POS tagging: assigning parts of speech to words
    in a text.
  • Three main techniques: rule-based tagging,
    stochastic tagging, transformation-based tagging

31
Rule-based POS tagging
  • Use dictionary or FST to find all possible parts
    of speech
  • Use disambiguation rules (e.g., ART+V)
  • Typically hundreds of constraints can be designed
    manually

32
Example in French
Word       Candidate tags         Reading
<S>                               beginning of sentence
La         rf b nms u             article
teneur     nfs nms                noun feminine singular
Moyenne    jfs nfs v1s v2s v3s    adjective feminine singular
en         p a b                  preposition
uranium    nms                    noun masculine singular
des        p r                    preposition
rivieres   nfp                    noun feminine plural
,          x                      punctuation
bien_que   cs                     subordinating conjunction
délicate   jfs                    adjective feminine singular
À          p                      preposition
calculer   v                      verb
33
Sample rules
  • BS3 BI1: A BS3 (3rd person subject personal
    pronoun) cannot be followed by a BI1 (1st person
    indirect personal pronoun). In the example "il
    nous faut" (we need), "il" has the tag BS3MS and
    "nous" has the tags BD1P, BI1P, BJ1P, BR1P, and
    BS1P. The negative constraint "BS3 BI1" rules
    out "BI1P", and thus leaves only 4 alternatives
    for the word "nous".
  • N K: The tag N (noun) cannot be followed by a tag
    K (interrogative pronoun). An example in the test
    corpus would be "... fleuve qui ..." (...river,
    that...). Since "qui" can be tagged both as an
    "E" (relative pronoun) and a "K" (interrogative
    pronoun), the "E" will be chosen by the tagger,
    since an interrogative pronoun cannot follow a
    noun ("N").
  • R V: A word tagged with R (article) cannot be
    followed by a word tagged with V (verb), for
    example "l' appelle" (calls him/her). The word
    "appelle" can only be a verb, but "l'" can be
    either an article or a personal pronoun. Thus,
    the rule will eliminate the article tag, giving
    preference to the pronoun.
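
The three negative constraints above can be sketched as a pruning pass over each word's candidate tags. This is an illustration only: matching constraints by tag prefix (so that "BS3" covers "BS3MS") is an assumption, "PRON" is a hypothetical placeholder for the personal-pronoun tag of "l'", which is not given above, and `violates` and `prune` are made-up helper names.

```python
# Negative constraints: (left tag prefix, right tag prefix) may not be adjacent.
# Prefix matching is an assumption made for this sketch.
CONSTRAINTS = [("BS3", "BI1"), ("N", "K"), ("R", "V")]


def violates(left_tag, right_tag):
    """True if some constraint forbids left_tag immediately before right_tag."""
    return any(left_tag.startswith(a) and right_tag.startswith(b)
               for a, b in CONSTRAINTS)


def prune(candidates):
    """candidates: list of (word, [tags]); returns a pruned copy.

    A tag is dropped when every tag of the neighbouring word conflicts
    with it. A word's tag list is never emptied.
    """
    words = [w for w, _ in candidates]
    tags = [list(t) for _, t in candidates]
    changed = True
    while changed:
        changed = False
        for i in range(len(tags) - 1):
            left, right = tags[i], tags[i + 1]
            new_right = [r for r in right
                         if any(not violates(l, r) for l in left)] or right
            new_left = [l for l in left
                        if any(not violates(l, r) for r in new_right)] or left
            if new_left != left or new_right != right:
                tags[i], tags[i + 1] = new_left, new_right
                changed = True
    return list(zip(words, tags))


print(prune([("il", ["BS3MS"]),
             ("nous", ["BD1P", "BI1P", "BJ1P", "BR1P", "BS1P"])]))
print(prune([("fleuve", ["N"]), ("qui", ["E", "K"])]))
# "PRON" is a hypothetical stand-in for the pronoun reading of "l'".
print(prune([("l'", ["R", "PRON"]), ("appelle", ["V"])]))
```

Constraints of this kind only remove readings; whatever ambiguity remains afterwards still has to be resolved by other rules or by a statistical model.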

34
Stochastic POS tagging
  • HMM tagger
  • Pick the most likely tag for this word
  • $P(\text{word} \mid \text{tag}) \times P(\text{tag} \mid \text{previous } n \text{ tags})$: find the tag
    sequence that maximizes this product
  • A bigram-based HMM tagger chooses the tag $t_i$ for
    word $w_i$ that is most probable given the previous
    tag $t_{i-1}$ and the current word $w_i$
  • $t_i = \arg\max_j P(t_j \mid t_{i-1}, w_i)$
  • $t_i = \arg\max_j P(t_j \mid t_{i-1}) P(w_i \mid t_j)$: HMM equation
    for a single tag

35
Example
  • Secretariat/NNP is/VBZ expected/VBN to/TO race/VB
    tomorrow/ADV
  • People/NNS continue/VBP to/TO inquire/VB the/DT
    reason/NN for/IN the/DT race/NN for/IN outer/JJ
    space/NN
  • $P(\text{VB} \mid \text{TO}) P(\text{race} \mid \text{VB})$
  • $P(\text{NN} \mid \text{TO}) P(\text{race} \mid \text{NN})$
  • TO: to+VB (to sleep), to+NN (to school)

36
Example (contd)
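  • $P(\text{NN} \mid \text{TO}) = .021$
  • $P(\text{VB} \mid \text{TO}) = .34$
  • $P(\text{race} \mid \text{NN}) = .00041$
  • $P(\text{race} \mid \text{VB}) = .00003$
  • $P(\text{VB} \mid \text{TO}) P(\text{race} \mid \text{VB}) = .00001$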
  • $P(\text{NN} \mid \text{TO}) P(\text{race} \mid \text{NN}) = .0000086$
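
The comparison on this slide can be reproduced in a few lines. The sketch below plugs the four probabilities listed above into the single-tag HMM equation and picks the tag with the larger product. The table and function names are illustrative.

```python
# Probabilities taken from the slide.
P_TAG_GIVEN_PREV = {("VB", "TO"): 0.34, ("NN", "TO"): 0.021}
P_WORD_GIVEN_TAG = {("race", "VB"): 0.00003, ("race", "NN"): 0.00041}


def best_tag(word, prev_tag, candidates):
    """Score each candidate tag t by P(t | prev_tag) * P(word | t)."""
    scores = {
        tag: P_TAG_GIVEN_PREV[tag, prev_tag] * P_WORD_GIVEN_TAG[word, tag]
        for tag in candidates
    }
    return max(scores, key=scores.get), scores


tag, scores = best_tag("race", "TO", ["VB", "NN"])
print(tag)
for candidate, score in scores.items():
    print(candidate, score)
```

Although "race" is far more likely as a noun when considered in isolation, the transition probability from TO is what tips the product in favour of the verb reading.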

37
HMM Tagging
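  • $\hat{T} = \arg\max_T P(T \mid W)$, where $T = t_1, t_2, \ldots, t_n$
  • By Bayes' rule: $P(T \mid W) = P(T) P(W \mid T) / P(W)$
  • Thus we are attempting to choose the sequence of
    tags that maximizes the rhs of the equation
  • $P(W)$ can be ignored, since it is the same for
    every candidate tag sequence
  • $P(T) P(W \mid T) = \prod_{i=1}^{n} P(w_i \mid w_1 t_1 \ldots w_{i-1} t_{i-1} t_i) P(t_i \mid w_1 t_1 \ldots w_{i-1} t_{i-1})$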

38
Transformation-based learning
  • $P(\text{NN} \mid \text{race}) = .98$
  • $P(\text{VB} \mid \text{race}) = .02$
  • Change NN to VB when the previous tag is TO
  • Types of rules
  • The preceding (following) word is tagged z
  • The word two before (after) is tagged z
  • One of the two preceding (following) words is
    tagged z
  • One of the three preceding (following) words is
    tagged z
  • The preceding word is tagged z and the following
    word is tagged w
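
As a rough sketch of the idea, a transformation-based tagger first assigns every word its most likely tag and then applies learned rewrite rules. The example below applies only the single rule "change NN to VB when the previous tag is TO". The initial tag for "race" follows the probabilities above, the remaining tags are copied from the tagged example sentences earlier in these notes, and the names `MOST_LIKELY_TAG`, `initial_tags`, and `apply_rule` are made up for illustration. Learning the rules from a corpus is not shown.

```python
# Most likely tag per word. "race" -> NN follows P(NN | race) = .98;
# the other entries are copied from the tagged example sentences.
MOST_LIKELY_TAG = {
    "secretariat": "NNP", "is": "VBZ", "expected": "VBN", "to": "TO",
    "race": "NN", "tomorrow": "ADV", "the": "DT", "for": "IN",
    "outer": "JJ", "space": "NN",
}


def initial_tags(words):
    """Step 1: label every word with its most likely tag."""
    return [MOST_LIKELY_TAG[w.lower()] for w in words]


def apply_rule(tags, from_tag, to_tag, prev_tag):
    """Step 2: change from_tag to to_tag when the preceding tag is prev_tag."""
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            out[i] = to_tag
    return out


words = "Secretariat is expected to race tomorrow".split()
before = initial_tags(words)
after = apply_rule(before, "NN", "VB", "TO")
print(list(zip(words, before)))
print(list(zip(words, after)))
```

The rule leaves "the race for outer space" untouched, because there "race" follows DT rather than TO.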

39
Confusion matrix
Most confusing: NN vs. NNP vs. JJ, VBD vs. VBN
vs. JJ
40
Readings
  • J&M Chapters 1, 2, 3, 8
  • "What is Computational Linguistics?" by Hans
    Uszkoreit:
    http://www.coli.uni-sb.de/hansu/what_is_cl.html
  • Lecture notes 1

41
Readings
  • J&M Chapters 3, 8
  • Lecture notes 2