Title: Natural Language Processing
1Natural Language Processing
- Lecture Notes 13
- Chapter 18
2Outline
- Reference
- Kinds of reference phenomena
- Constraints on co-reference
- Preferences for co-reference
- The Lappin-Leass algorithm for coreference
- Coherence
- Hobbs coherence relations
- Rhetorical Structure Theory
3Part I Reference Resolution
- John went to Bill's car dealership to check out
an Acura Integra. He looked at it for half an
hour.
- I'd like to get from Boston to San Francisco, on
either December 5th or December 6th. It's ok if
it stops in another city along the way.
4Some terminology
- John went to Bill's car dealership to check out
an Acura Integra. He looked at it for half an
hour.
- Reference: the process by which speakers use words
such as "John" and "he" to denote a particular person
- Referring expression: "John", "he"
- Referent: the actual entity (but as a shorthand
we might call "John" the referent)
- "John" and "he" corefer
- Antecedent: "John"
- Anaphor: "he"
5Discourse Model
- Model of the entities the discourse is about
- A referent is first evoked into the model. Then
later it is accessed from the model
[Diagram: "John" evokes the referent into the discourse model; "He" accesses it; "John" and "He" corefer.]
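The evoke/access distinction can be sketched as a tiny data structure. This is only an illustration: the class and method names (`DiscourseModel`, `evoke`, `access`) are made up for this note and do not come from any particular system.

```python
class DiscourseModel:
    """Toy discourse model: maps an entity id to the phrases that refer to it."""

    def __init__(self):
        self.entities = {}

    def evoke(self, entity_id, phrase):
        # First mention: create a new entry for the referent.
        self.entities[entity_id] = [phrase]

    def access(self, entity_id, phrase):
        # Later mention: attach the phrase to an already-evoked referent.
        self.entities[entity_id].append(phrase)

    def corefer(self, phrase_a, phrase_b):
        # Two phrases corefer if some entity lists both of them.
        return any(phrase_a in p and phrase_b in p for p in self.entities.values())


model = DiscourseModel()
model.evoke("e1", "John")   # "John went to Bill's car dealership ..."
model.access("e1", "He")    # "He looked at it ..."
print(model.corefer("John", "He"))  # True
```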
6Many types of reference
- (after Webber 91)
- According to John, Bob bought Sue an Integra, and
Sue bought Fred a Legend
- But that turned out to be a lie (a speech act)
- But that was false (proposition)
- That struck me as a funny way to describe the
situation (manner of description)
- That caused Sue to become rather poor (event)
- That caused them both to become rather poor
(combination of several events)
7Reference Phenomena
- Indefinite noun phrases: generally new
- I saw an Acura Integra today
- Some Acura Integras were being unloaded
- I am going to the dealership to buy an Acura
Integra today. (specific/non-specific)
- I hope they still have it
- I hope they have a car I like
- Definite noun phrases: identifiable to hearer
because
- Mentioned: I saw an Acura Integra today. The
Integra was white
- Identifiable from beliefs: The Indianapolis 500
- Inherently unique: The fastest car in
8Reference Phenomena: Pronouns
- I saw an Acura Integra today. It was white
- Compared to definite noun phrases, pronouns
require more referent salience.
- John went to Bob's party, and parked next to a
beautiful Acura Integra
- He got out and talked to Bob, the owner, for more
than an hour.
- Bob told him that he recently got engaged and
that they are moving into a new home on Main
Street.
- ??He also said that he bought it yesterday.
- He also said that he bought the Acura yesterday
9Salience Via Structural Recency
- E: So, you have the engine assembly finished.
Now attach the rope. By the way, did you buy the
gas can today?
- A: Yes
- E: Did it cost much?
- A: No
- E: Good. Ok, have you got it attached yet?
10More on Pronouns
- Cataphora: pronoun appears before referent
- Before he bought it, John checked over the
Integra very carefully.
11Inferrables
- I almost bought an Acura Integra today, but the
engine seemed noisy.
- Mix the flour, butter, and water.
- Knead the dough until smooth and shiny
- Spread the paste over the blueberries
- Stir the batter until all lumps are gone.
12Discontinuous sets
- John has an Acura and Mary has a Subaru. They
drive them all the time.
13Generics
- I saw no less than 6 Acura Integras today. They
are the coolest cars.
14Pronominal Reference Resolution
- Given a pronoun, find the referent (either in
text or as an entity in the world)
- We will approach this today in 3 steps
- Hard constraints on reference
- Soft constraints on reference
- Algorithms which use these constraints
15Why people care
- Classic "text understanding"
- Information extraction, information retrieval,
summarization
16What influences pronoun resolution?
- Syntax
- Semantics/world knowledge
17Why syntax matters
- John kicked Bill. Mary told him to go home.
- Bill was kicked by John. Mary told him to go
home.
- John kicked Bill. Mary punched him.
18Why syntax matters
- John kicked Bill. Mary told him to go home.
- Bill was kicked by John. Mary told him to go
home.
- John kicked Bill. Mary punched him.
John
19Why syntax matters
- John kicked Bill. Mary told him to go home.
- Bill was kicked by John. Mary told him to go
home.
- John kicked Bill. Mary punched him.
Bill
20Why syntax matters
- John kicked Bill. Mary told him to go home.
- Bill was kicked by John. Mary told him to go
home.
- John kicked Bill. Mary punched him.
Bill
21Why syntax matters
- John kicked Bill. Mary told him to go home.
- Bill was kicked by John. Mary told him to go
home.
- John kicked Bill. Mary punched him.
Grammatical role hierarchy
22Why syntax matters
- John kicked Bill. Mary told him to go home.
- Bill was kicked by John. Mary told him to go
home.
- John kicked Bill. Mary punched him.
Grammatical role parallelism
23Why semantics matters
- The city council denied the demonstrators a
permit because they feared/advocated violence.
26Why knowledge matters
- John hit Bill. He was severely injured.
27Margaret Thatcher admires Hillary Clinton, and
George W. Bush absolutely worships her.
28Hard constraints on coreference
- Number agreement
- John has an Acura. It is red.
- Person and case agreement
- John and Mary have Acuras. We love them (where
We = John and Mary)
- Gender agreement
- John has an Acura. He/it/she is attractive.
- Syntactic constraints
- John bought himself a new Acura (himself = John)
- John bought him a new Acura (him ≠ John)
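The agreement constraints above act as a filter on candidate antecedents. The sketch below shows that filtering step only. The feature encoding (`Features`, the `"sg"`/`"m"` codes) is an assumption made for illustration, and the syntactic (reflexive) constraints are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Features:
    number: str  # "sg" or "pl"
    person: int  # 1, 2 or 3
    gender: str  # "m", "f" or "n"


# Hypothetical feature table for a few third-person pronouns.
PRONOUNS = {
    "he": Features("sg", 3, "m"),
    "she": Features("sg", 3, "f"),
    "it": Features("sg", 3, "n"),
}


def compatible(pronoun, candidates):
    """Keep only candidates whose number, person and gender match the pronoun."""
    wanted = PRONOUNS[pronoun]
    return [name for name, feats in candidates.items() if feats == wanted]


candidates = {
    "John": Features("sg", 3, "m"),
    "an Acura": Features("sg", 3, "n"),
}
print(compatible("it", candidates))  # ['an Acura']
print(compatible("he", candidates))  # ['John']
```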
29Pronoun Interpretation Preferences
- Selectional Restrictions
- John parked his Acura in the garage. He had
driven it around for hours.
- Recency
- John has an Integra. Bill has a Legend. Mary
likes to drive it.
30Pronoun Interpretation Preferences
- Grammatical role: subject preference
- John went to the Acura dealership with Bill. He
bought an Integra.
- Bill went to the Acura dealership with John. He
bought an Integra.
- (?) John and Bill went to the Acura dealership.
He bought an Integra.
31Repeated Mention preference
- John needed a car to get to his new job. He
decided that he wanted something sporty. Bill
went to the Acura dealership with him. He bought
an Integra.
32Parallelism Preference
- Mary went with Sue to the Acura dealership.
Sally went with her to the Mazda dealership.
- Mary went with Sue to the Acura dealership.
Sally told her not to buy anything.
33Verb Semantics Preferences
- John telephoned Bill. He lost the pamphlet on
Acuras.
- John criticized Bill. He lost the pamphlet on
Acuras.
- Implicit causality
- Implicit cause of criticizing is object.
- Implicit cause of telephoning is subject.
34Verb Preferences
- John seized the Acura pamphlet from Bill. He
loves reading about cars.
- John passed the Acura pamphlet to Bill. He loves
reading about cars.
35Pronoun Resolution Algorithm
- Lappin and Leass (1994): given he/she/it, assign
an antecedent.
- Implements only recency and syntactic preferences
- Two steps
- Discourse model update
- When a new noun phrase is encountered, add a
representation to the discourse model with a salience
value
- Modify saliences.
- Pronoun resolution
- Choose the most salient antecedent
36Salience Factors and Weights
Factor                                Weight
Sentence recency                      100
Subject emphasis                      80
Existential emphasis                  70
Accusative (direct object) emphasis   50
Ind. obj and oblique emphasis         40
Non-adverbial emphasis                50
Head noun emphasis                    80
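As a rough sketch of how these weights are used, the salience contributed by one mention is the sum of the weights of the factors that apply to it. The dictionary keys and the function name `salience` are labels chosen here for illustration; the numbers are the ones listed above.

```python
# Weights as listed above; the key names are illustrative labels only.
WEIGHTS = {
    "recency": 100,
    "subject": 80,
    "existential": 70,
    "object": 50,
    "indirect_object_or_oblique": 40,
    "non_adverbial": 50,
    "head_noun": 80,
}


def salience(factors):
    """Sum the weights of every factor that applies to a single mention."""
    return sum(WEIGHTS[f] for f in factors)


# A non-embedded subject NP in the current sentence:
print(salience(["recency", "subject", "non_adverbial", "head_noun"]))  # 310
```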
37Recency
- Weights are cut in half after each sentence is
processed
- This, and a sentence recency weight (100 for new
sentences, cut in half each time), captures the
recency preferences
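The halving rule amounts to dividing a value by 2 once per sentence processed since it was assigned. A minimal sketch, with `decayed` as a hypothetical helper name:

```python
def decayed(value, sentences_elapsed):
    """Halve a salience value once for each sentence processed since it was assigned."""
    return value / (2 ** sentences_elapsed)


print(decayed(100, 0))  # 100.0
print(decayed(100, 1))  # 50.0
print(decayed(100, 3))  # 12.5
```

A referent that is never mentioned again therefore fades quickly, while one that is re-mentioned keeps receiving fresh weight.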
38Lappin and Leass (cont)
- Grammatical role preference
- Subject > existential predicate nominal > object
> indirect object > demarcated adverbial PP
- Examples
- An Acura Integra is parked in the lot (subject)
- There is an Acura Integra parked in the lot (ex.
pred nominal)
- John parked an Acura Integra in the lot (object)
- John gave his Acura Integra a bath (indirect obj)
- In his Acura Integra, John showed Susan his new
CD player (demarcated adverbial PP)
- The head noun emphasis factor gives each of the above
80 points, but gives nothing to an embedded NP, as in the following
- The owner's manual for an Acura Integra is on
John's desk
39Lappin and Leass Algorithm
- Collect the potential referents (up to 4
sentences back)
- Remove potential referents that do not agree in
number or gender with the pronoun
- Remove potential referents that do not pass
syntactic coreference constraints
- Compute the total salience value of each referent from all
factors, including, if applicable, role
parallelism (+35) or cataphora (-175).
- Select the referent with the highest salience value. In
case of a tie, select the closest.
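The steps above can be sketched as a single function. This is a simplified illustration, not the published implementation: the `Candidate` and `Pronoun` records, their field names, and the precomputed `passes_syntax` flag are all assumptions, and the accumulated salience is taken as given.

```python
from dataclasses import dataclass

ROLE_PARALLELISM = 35    # bonus: candidate has the same grammatical role as the pronoun
CATAPHORA = -175         # penalty: candidate appears after the pronoun
MAX_SENTENCES_BACK = 4


@dataclass
class Pronoun:
    number: str
    gender: str
    role: str


@dataclass
class Candidate:
    name: str
    salience: float          # accumulated salience from the discourse model
    number: str
    gender: str
    role: str
    sentences_back: int      # 0 = same sentence as the pronoun
    position: int            # larger = closer to the pronoun (assumes it precedes it)
    follows_pronoun: bool = False
    passes_syntax: bool = True   # assumed to be computed by a separate syntactic filter


def resolve(pronoun, candidates):
    # Steps 1-3: window, agreement and syntactic filters.
    pool = [c for c in candidates
            if c.sentences_back <= MAX_SENTENCES_BACK
            and c.number == pronoun.number
            and c.gender == pronoun.gender
            and c.passes_syntax]
    if not pool:
        return None

    # Step 4: total salience, including the pronoun-specific adjustments.
    def score(c):
        total = c.salience
        if c.role == pronoun.role:
            total += ROLE_PARALLELISM
        if c.follows_pronoun:
            total += CATAPHORA
        return total

    # Step 5: highest score wins; ties go to the closest candidate.
    return max(pool, key=lambda c: (score(c), c.position))


it = Pronoun(number="sg", gender="n", role="object")
candidates = [
    Candidate("John", 155, "sg", "m", "subject", 1, 0),
    Candidate("Integra", 140, "sg", "n", "object", 1, 5),
    Candidate("dealership", 115, "sg", "n", "oblique", 1, 8),
]
print(resolve(it, candidates).name)  # Integra
```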
40Example
- John saw a beautiful Acura Integra at the
dealership. He showed it to Bob. He bought it.
Sentence 1
Referent     Rec  Subj  Exist  Obj  Ind-obj  Non-adv  Head N  Total
John         100    80                            50      80    310
Integra      100                50                50      80    280
dealership   100                                  50      80    230
41After sentence 1
Referent    Phrases                    Value
John        John                       155
Integra     a beautiful Acura Integra  140
dealership  the dealership             115
42He showed it to Bob
- "He" specifies male gender
- So Step 2 reduces the set of referents to only John.
- Now update the discourse model
- "He" is in the current sentence (recency = 100), in subject
position (80), not adverbial (50), and not embedded
(80), so add 310
Referent    Phrases                    Value
John        John, he1                  155 + 310 = 465
Integra     a beautiful Acura Integra  140
dealership  the dealership             115
43He showed it to Bob
- "it" can be Integra or dealership.
- Need to add weights
- Parallelism: "it" and Integra are objects (dealership
is not), so +35 for Integra
- Integra 175 to dealership 115, so pick Integra
- Update discourse model: "it" is a nonembedded object,
so it gets 100 + 50 + 50 + 80 = 280
44He showed it to Bob
Referent    Phrases                         Value
John        John, he1                       465
Integra     a beautiful Acura Integra, it1  420
dealership  the dealership                  115
45He showed it to Bob
- Bob is a new referent and an oblique argument, so its weight
is 100 + 40 + 50 + 80 = 270
Referent    Phrases                         Value
John        John, he1                       465
Integra     a beautiful Acura Integra, it1  420
Bob         Bob                             270
dealership  the dealership                  115
46He bought it
Referent    Phrases                         Value
John        John, he1                       232.5
Integra     a beautiful Acura Integra, it1  210
Bob         Bob                             135
dealership  the dealership                  57.5
He2 will be resolved to John, and it2 to Integra
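The values in the tables for this example can be checked mechanically. The sketch below replays only the bookkeeping (adding weights per mention, halving at each sentence boundary). The helper names `add` and `end_sentence` are made up here, and the resolution decisions themselves (he1 = John, it1 = Integra) are simply taken from the slides.

```python
WEIGHTS = {"recency": 100, "subject": 80, "existential": 70, "object": 50,
           "indirect_object_or_oblique": 40, "non_adverbial": 50, "head_noun": 80}

model = {}  # referent -> accumulated salience


def add(referent, *factors):
    """Add the weights of one mention to a referent's running total."""
    model[referent] = model.get(referent, 0) + sum(WEIGHTS[f] for f in factors)


def end_sentence():
    """Halve every salience value once the sentence has been processed."""
    for referent in model:
        model[referent] /= 2


# Sentence 1: John saw a beautiful Acura Integra at the dealership.
add("John", "recency", "subject", "non_adverbial", "head_noun")       # 310
add("Integra", "recency", "object", "non_adverbial", "head_noun")     # 280
add("dealership", "recency", "non_adverbial", "head_noun")            # 230
end_sentence()                                                         # 155, 140, 115

# Sentence 2: He showed it to Bob. (he1 = John, it1 = Integra)
add("John", "recency", "subject", "non_adverbial", "head_noun")       # 155 + 310 = 465
add("Integra", "recency", "object", "non_adverbial", "head_noun")     # 140 + 280 = 420
add("Bob", "recency", "indirect_object_or_oblique",
    "non_adverbial", "head_noun")                                      # 270
end_sentence()

print(model)
# {'John': 232.5, 'Integra': 210.0, 'dealership': 57.5, 'Bob': 135.0}
```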
47A search-based solution
- Hobbs (1978), "Resolving pronoun references"
48Hobbs 1978
- Assessment of difficulty of problem
- Incidence of the phenomenon
- A simple algorithm that has become a baseline
- See handout
49A parse tree
50Hobbs's point
- the naïve approach is quite good.
Computationally speaking, it will be a long time
before a semantically based algorithm is
sophisticated enough to perform as well, and
these results set a very high standard for any
other approach to aim for.
51Hobbs's point
- Yet there is every reason to pursue a
semantically based approach. The naïve algorithm
does not work. Any one can think of examples
where it fails. In these cases it not only
fails; it gives no indication that it has failed
and offers no help in finding the real
antecedent.
- (p. 345)
52Reference Resolution Summary
- Lots of other algorithms and other constraints
- Centering theory: constraints which focus on
discourse state, and focus. (read on your own)
- Hobbs: ref. resolution as by-product of general
reasoning (later in these notes)
- Mitkov et al. (e.g.): machine learning
53Part II Text Coherence
54What Makes a Discourse Coherent?
- The reason is that these utterances, when
juxtaposed, will not exhibit coherence. Almost
certainly not. Do you have a discourse? Assume
that you have collected an arbitrary set of
well-formed and independently interpretable
utterances, for instance, by randomly selecting
one sentence from each of the previous chapters
of this book.
55Better?
- Assume that you have collected an arbitrary set
of well-formed and independently interpretable
utterances, for instance, by randomly selecting
one sentence from each of the previous chapters
of this book. Do you have a discourse? Almost
certainly not. The reason is that these
utterances, when juxtaposed, will not exhibit
coherence.
56Coherence
- John hid Bill's car keys. He was drunk
- ??John hid Bill's car keys. He likes spinach
57What makes a text coherent?
- Appropriate use of coherence relations between
subparts of the discourse -- rhetorical structure
- Appropriate sequencing of subparts of the
discourse -- discourse/topic structure
- Appropriate use of referring expressions
58Hobbs 1979 Coherence Relations
- Result
- Infer that the state or event asserted by S0
causes or could cause the state or event asserted
by S1.
- John bought an Acura. His father went ballistic.
59Hobbs: Explanation
- Infer that the state or event asserted by S1
causes or could cause the state or event asserted
by S0
- John hid Bill's car keys. He was drunk
60Hobbs: Parallel
- Infer p(a1, a2, ...) from the assertion of S0 and
p(b1, b2, ...) from the assertion of S1, where ai and
bi are similar, for all i.
- John bought an Acura. Bill leased a BMW.
61Hobbs: Elaboration
- Infer the same proposition P from the assertions
of S0 and S1
- John bought an Acura this weekend. He purchased a
beautiful new Integra for 20 thousand dollars at
Bill's dealership on Saturday afternoon.
62An Inference-Based Algorithm
- Abduction: from A → B and B, infer A (unsound)
- All Jaguars are fast. John's car is fast.
Abductively infer John's car is a Jaguar.
- Defeasible: John's car is a Porsche, though.
- When we use abduction to recognize discourse
coherence, we want the best explanation.
- Probabilities, heuristics, or both (Hobbs)
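A toy sketch of abduction with defeasibility, using the Jaguar example. The rule list, the plausibility scores, and the function name `best_explanation` are invented for illustration and stand in for whatever probabilities or heuristics a real system would use.

```python
# Each rule reads "cause -> effect"; the scores are made-up plausibilities.
RULES = [
    ("jaguar", "fast", 0.6),
    ("porsche", "fast", 0.4),
]


def best_explanation(observation, facts=frozenset()):
    """Pick the most plausible cause of an observation (unsound, defeasible)."""
    candidates = [(cause, score) for cause, effect, score in RULES
                  if effect == observation]
    # A cause that is already known to hold overrides any abductive guess.
    known = [cause for cause, _ in candidates if cause in facts]
    if known:
        return known[0]
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[1])[0]


print(best_explanation("fast"))               # jaguar
print(best_explanation("fast", {"porsche"}))  # porsche
```

The second call shows the defeasible part: once it is known that the car is a Porsche, the earlier guess is withdrawn.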
63Example
64Rhetorical Structure Theory
- One theory of discourse structure, based on
identifying relations between segments of the
text
- Nucleus/satellite notion encodes asymmetry
- Some rhetorical relations
- Elaboration (set/member, class/instance,
whole/part)
- Contrast: multinuclear
- Condition: Sat presents precondition for N
- Purpose: Sat presents goal of the activity in N
65Relations
- A sample definition
- Relation: evidence
- Constraints on N: H might not believe N as much
as S thinks s/he should
- Constraints on Sat: H already believes or will
believe Sat
- An example
- The governor supports big business.
- He is sure to veto House Bill 1711.
66Automatic Rhetorical Structure Labeling
- Supervised machine learning
- Get a group of annotators to assign a set of RST
relations to a text
- Extract a set of surface features from the text
that might signal the presence of the rhetorical
relations in that text
- Train a supervised ML system based on the
training set
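The train-then-label loop above can be illustrated with a deliberately tiny count-based learner. Everything here is a stand-in: the training examples are invented rather than real annotations, and a practical system would use a proper classifier over many more features.

```python
from collections import Counter, defaultdict

# Hypothetical annotated data: (surface features, RST relation label).
TRAINING = [
    ({"marker": "however"}, "contrast"),
    ({"marker": "however"}, "contrast"),
    ({"marker": "if"}, "condition"),
    ({"marker": "if"}, "condition"),
    ({"marker": "to", "infinitive": True}, "purpose"),
]


def train(examples):
    """Count how often each (feature, value) pair co-occurs with each label."""
    counts = defaultdict(Counter)
    for features, label in examples:
        for item in features.items():
            counts[item][label] += 1
    return counts


def predict(model, features):
    """Let every known (feature, value) pair vote for the labels it was seen with."""
    votes = Counter()
    for item in features.items():
        votes.update(model.get(item, Counter()))
    return votes.most_common(1)[0][0] if votes else None


model = train(TRAINING)
print(predict(model, {"marker": "however"}))  # contrast
print(predict(model, {"marker": "unseen"}))   # None
```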
67Features
- Explicit markers: because, however, therefore,
then, etc.
- Tendency of certain syntactic structures to
signal certain relations: infinitives are often
used to signal purpose relations, e.g. "Use rm to
delete files."
- Ordering
- Tense/aspect
- Intonation
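A minimal sketch of extracting the first two kinds of feature from a text segment. The function name and feature names are hypothetical, and the infinitive check is a crude surface test (any non-final "to"), not real syntactic analysis; ordering, tense/aspect and intonation are not covered.

```python
import re

MARKERS = ("because", "however", "therefore", "then")


def extract_features(segment):
    """Return a dict of simple surface features for one text segment."""
    tokens = re.findall(r"[a-z']+", segment.lower())
    features = {f"has_{marker}": marker in tokens for marker in MARKERS}
    # Crude proxy for an infinitive: the word "to" followed by at least one more word.
    features["has_to_infinitive"] = "to" in tokens[:-1]
    features["num_tokens"] = len(tokens)
    return features


feats = extract_features("Use rm to delete files.")
print(feats["has_to_infinitive"], feats["has_because"], feats["num_tokens"])
# True False 5
```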
68Some Problems with RST
- How many rhetorical relations are there?
- How can we use RST in dialogue as well as
monologue?
- RST forces an artificial tree structure on
discourses
- Difficult to get annotators to agree on labeling
the same texts
69Summary
- Reference
- Kinds of reference phenomena
- Constraints on co-reference
- Preferences for co-reference
- The Lappin-Leass algorithm for coreference
- Coherence
- Hobbs coherence relations
- Rhetorical Structure Theory