Title: CS3730/ISP3120 Discourse Processing and Pragmatics
1CS3730/ISP3120 Discourse Processing and Pragmatics
2Outline
- Finish going over the syllabus and list of
assigned papers.
- Sign up for presentation dates.
- Questions about Bonnie Webber's chapter in the
Handbook of Discourse Processing?
- Questions about the Intro to the Handbook of
Discourse Processing?
- Information about Reference
- Information about Annotation
3Syllabus and List of Assigned Papers
- Note that the weighting for calculating final
grades has changed, to give you more credit for
presentations and reaction essays.
- Instructions for the course Yahoo! group were
added to the syllabus. The data cannot be posted
to the web, so information about where it is has
been posted to the group.
- Note that a journal article was removed from the
list of assigned papers, and the schedule changed
accordingly.
- Auditors: the grade you will receive is NC, for
no credit. This is a change effective this term.
4List of Assigned Papers
5Signup for Presentation Dates
- Non-auditors (NAs) should sign up for two
lectures.
- NAs should randomly select a number.
- NAs will select a presentation date in increasing
order of their numbers, and then select their second
presentation date in decreasing order.
6Signup for presentation dates
- Let doubles = (number of NAs × 2) - 24
- At most doubles/2 (rounded up) paired
presentations may be chosen during the first
round.
- An individual NA may be involved in at most one
paired presentation.
- A day may have at most 2 presenters.
- The following papers must have sole presenters:
Lappin & Leass 94, Mann & Thompson 88, Hobbs 79
7Questions on Bonnie Webber's Chapter?
- Good for pointers into the literature.
- P. 800: ellipsis, eventualities, information
structure (general idea of what they are?)
- P. 802-803: understand the examples?
8Questions about the Introduction to The Handbook
of DP?
- Many fields study discourse.
- Different definitions, theoretical paradigms, and
methodologies
- Interesting links are described on page 6
- Most of the topics in Part 1, Discourse Analysis
and Linguistics, have influenced work in NLP
(except historical linguistics, i.e. diachronic)
- Throughout, many possibilities for NLP, some in
interdisciplinary work
9Reference
- Reference
- Kinds of reference phenomena
- Constraints on co-reference
- Preferences for co-reference
From Dan Jurafsky's lecture notes. Here are his
acknowledgements: Thanks to Diane Litman, Andy
Kehler, Jim Martin!!! This material is from J&M
Chapter 18, written by Andy Kehler; slides
inspired by Diane Litman & Jim Martin
10Reference Resolution
- John went to Bill's car dealership to check out
an Acura Integra. He looked at it for half an
hour.
- I'd like to get from Boston to San Francisco, on
either December 5th or December 6th. It's ok if
it stops in another city along the way.
11Why reference resolution?
- Conversational agents: an airline reservation system
needs to know what "it" refers to in order to
book the correct flight.
- Information extraction: First Union Corp. is
continuing to wrestle with severe problems
unleashed by a botched merger and a troubled
business strategy. According to industry
insiders at Paine Webber, their president, John
R. Georgius, is planning to retire by the end of
the year.
12Some terminology
- John went to Bill's car dealership to check out
an Acura Integra. He looked at it for half an
hour.
- Reference: process by which speakers use words
such as "John" and "he" to denote a particular person
- Referring expression: "John", "he"
- Referent: the actual entity (but as a shorthand
we might call "John" the referent).
- "John" and "he" corefer
- Antecedent: "John"
- Anaphor: "he"
13Many types of reference
- (after Webber 91)
- According to John, Bob bought Sue an Integra, and
Sue bought Fred a Legend.
- But that turned out to be a lie (a speech act)
- But that was false (proposition)
- That struck me as a funny way to describe the
situation (manner of description)
- That caused Sue to become rather poor (event)
- That caused them both to become rather poor
(combination of several events)
14Reference Phenomena
- Indefinite noun phrases: new to hearer
- I saw an Acura Integra today
- Some Acura Integras were being unloaded
- I am going to the dealership to buy an Acura
Integra today. (specific/non-specific)
- I hope they still have it
- I hope they have a car I like
- Definite noun phrases: identifiable to hearer
because
- Mentioned: I saw an Acura Integra today. The
Integra was white
- Identifiable from beliefs: The Indianapolis 500
- Inherently unique: The fastest car in
15Indefinites (an aside)
- Lots of complexities; an example of one type:
- The king and his men don't know Merry and Pippin,
and they can't even see what they are
(superordinate term "figures" versus basic-level
term "hobbits")
- There they [the King and his men] saw close
beside them a great rubbleheap and suddenly they
were aware of two small figures lying on it at
their ease, grey-clad, hardly to be seen among
the stones. (The Two Towers, Tolkien)
16Reference Phenomena Pronouns
- I saw an Acura Integra today. It was white
- Compared to definite noun phrases, pronouns
require more referent salience.
- John went to Bob's party, and parked next to a
beautiful Acura Integra
- He went inside and talked to Bob for more than an
hour.
- Bob told him that he recently got engaged.
- ??He also said that he bought it yesterday.
- OK: He also said that he bought the Acura yesterday
17More on Pronouns
- Cataphora: pronoun appears before referent
- Before he bought it, John checked over the
Integra very carefully.
18Inferrables
- I almost bought an Acura Integra today, but the
engine seemed noisy.
- Mix the flour, butter, and water.
- Knead the dough until smooth and shiny
- Spread the paste over the blueberries
- Stir the batter until all lumps are gone.
19Generics
- I saw no less than 6 Acura Integras today. They
are the coolest cars.
20Pronominal Reference Resolution
- Given a pronoun, find the referent (either in
text or as an entity in the world)
- We will look at constraints. The first student
presentations will look at resolution algorithms.
- Hard constraints on reference
- Soft constraints on reference
21Hard constraints on coreference
- Number agreement
- John has an Acura. It is red.
- Person and case agreement
- John and Mary have Acuras. We love them (where
We = John and Mary)
- Gender agreement
- John has an Acura. He/it/she is attractive.
- Syntactic constraints
- John bought himself a new Acura (himself = John)
- John bought him a new Acura (him ≠ John)
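A minimal sketch of how the number and gender constraints above could act as a filter on candidate antecedents. The `Entity` class, the `PRONOUN_FEATURES` table, and the feature labels are hypothetical names introduced for illustration; person, case, and syntactic (binding) constraints are left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Entity:
    """A candidate antecedent with hand-assigned agreement features."""
    text: str
    number: str  # "sg" or "pl"
    gender: str  # "masc", "fem", or "neut"


# Hypothetical feature table: pronoun -> (number, gender).
# A gender of None means the pronoun places no gender restriction.
PRONOUN_FEATURES = {
    "he": ("sg", "masc"),
    "she": ("sg", "fem"),
    "it": ("sg", "neut"),
    "they": ("pl", None),
}  # type: dict[str, Tuple[str, Optional[str]]]


def compatible(pronoun: str, entity: Entity) -> bool:
    """True if the entity agrees with the pronoun in number and gender."""
    number, gender = PRONOUN_FEATURES[pronoun.lower()]
    if number != entity.number:
        return False
    return gender is None or gender == entity.gender


def filter_candidates(pronoun: str, candidates: List[Entity]) -> List[Entity]:
    """Drop every candidate that violates a hard agreement constraint."""
    return [c for c in candidates if compatible(pronoun, c)]


john = Entity("John", "sg", "masc")
acura = Entity("an Acura", "sg", "neut")
acuras = Entity("Acuras", "pl", "neut")

# "John has an Acura. It is red."
print([c.text for c in filter_candidates("it", [john, acura, acuras])])
# prints ['an Acura']
```

Hard constraints only remove candidates; choosing among the survivors is left to the preferences.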
22Pronoun Interpretation Preferences
- Selectional Restrictions
- John parked his Acura in the garage. He had
driven it around for hours.
- Recency
- John has an Integra. Bill has a Legend. Mary
likes to drive it.
23Pronoun Interpretation Preferences
- Grammatical Role: Subject preference
- John went to the Acura dealership with Bill. He
bought an Integra.
- Bill went to the Acura dealership with John. He
bought an Integra.
- (?) John and Bill went to the Acura dealership.
He bought an Integra.
24Repeated Mention preference
- John needed a car to get to his new job. He
decided that he wanted something sporty. Bill
went to the Acura dealership with him. He bought
an Integra.
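The recency, subject, and repeated-mention preferences can be combined into a single salience score, sketched below. The weights and the `Candidate` fields are illustrative assumptions and are not taken from any published resolution algorithm. Candidates are assumed to have already passed the hard constraints.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    sentences_back: int  # 1 = the sentence just before the pronoun
    is_subject: bool     # was its most recent mention a grammatical subject?
    mentions: int        # how many times it has been mentioned so far


# Illustrative weights only (assumptions, chosen by hand).
RECENCY_WEIGHT = 3.0
SUBJECT_BONUS = 2.0
MENTION_WEIGHT = 1.0


def score(c: Candidate) -> float:
    """Higher score = more salient = preferred antecedent."""
    recency = RECENCY_WEIGHT / c.sentences_back
    subject = SUBJECT_BONUS if c.is_subject else 0.0
    repeated = MENTION_WEIGHT * c.mentions
    return recency + subject + repeated


def resolve(candidates: List[Candidate]) -> Candidate:
    """Pick the highest-scoring candidate."""
    return max(candidates, key=score)


# "John went to the Acura dealership with Bill. He bought an Integra."
subject_case = [
    Candidate("John", sentences_back=1, is_subject=True, mentions=1),
    Candidate("Bill", sentences_back=1, is_subject=False, mentions=1),
]
print(resolve(subject_case).name)  # prints John

# Repeated-mention example: John is mentioned five times (John, his, He,
# he, him), while Bill is the subject of the most recent sentence.
repeated_case = [
    Candidate("John", sentences_back=1, is_subject=False, mentions=5),
    Candidate("Bill", sentences_back=1, is_subject=True, mentions=1),
]
print(resolve(repeated_case).name)  # prints John
```

Because these are preferences rather than constraints, they can conflict, and the outcome depends entirely on the chosen weights.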
25Parallelism Preference
- Mary went with Sue to the Acura dealership.
Sally went with her to the Mazda dealership.
- Mary went with Sue to the Acura dealership.
Sally told her not to buy anything.
26Verb Semantics Preferences
- John telephoned Bill. He lost the pamphlet on
Acuras.
- John criticized Bill. He lost the pamphlet on
Acuras.
- Implicit causality
- Implicit cause of criticizing is object.
- Implicit cause of telephoning is subject.
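One simple way to picture the implicit-causality preference is a verb lexicon that records which argument a following pronoun is biased toward. The `IMPLICIT_CAUSE` table below is a hypothetical two-entry lexicon built only from the two verbs on this slide; a real system would need a much larger, empirically derived resource.

```python
from typing import Optional

# Hypothetical lexicon: verb -> argument that is the implicit cause.
IMPLICIT_CAUSE = {
    "telephoned": "subject",
    "criticized": "object",
}


def preferred_antecedent(subject: str, verb: str, obj: str) -> Optional[str]:
    """Return the argument the verb biases a following pronoun toward,
    or None if the verb is not in the lexicon."""
    role = IMPLICIT_CAUSE.get(verb)
    if role == "subject":
        return subject
    if role == "object":
        return obj
    return None


# "John telephoned Bill. He lost the pamphlet on Acuras."
print(preferred_antecedent("John", "telephoned", "Bill"))  # prints John
# "John criticized Bill. He lost the pamphlet on Acuras."
print(preferred_antecedent("John", "criticized", "Bill"))  # prints Bill
```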
27Manual Annotation
28From Webber's Chapter
- The aims of computational work in discourse and
dialog:
- Modeling particular phenomena in discourse and
dialog in terms of underlying computational
processes
- Providing useful natural language services, whose
success depends in part on handling aspects of
discourse and dialog
- What computation contributes is a coherent
framework for modeling these phenomena in terms
of search through a space of possible candidate
interpretations (in language analysis) or
candidate realizations (in language generation)
29Desiderata
- Interesting and rich enough
- Not so rich that automation is too far ahead of
the current state of the art
- Too complex logical structure
- Knowledge bottleneck (without viable source)
- Too fine-grained or subtle
- Annotation instructions (aka coding manual)
feasible
- Time required for training is reasonable
- Annotators can reliably perform the annotations
in a reasonable amount of time
30Minimal Process for NLP
1. Develop initial coding manual
2. At least two people perform sample annotations,
and discuss their disagreements and experiences
3. Revise coding manual
4. Repeat 2-3 until agreement on training data is
sufficient
5. Independently annotate a fresh test set
6. Evaluate agreement
31Additional Steps
1. Develop initial coding manual
2. At least two people perform sample annotations,
and discuss their disagreements and experiences.
Analysis of patterns of agreement and
disagreement using probability models (Wiebe et
al. ACL-99; Bruce & Wiebe NLE-99; from work in
applied statistics)
3. Revise coding manual
4. Repeat 2-3 until agreement on training data is
sufficient
5. Independently annotate a fresh test set
6. Evaluate agreement
7. Train more annotators, assess average time for
training and annotation
8. Evaluate other types of reliability (psychology,
content analysis, applied statistics literatures)
32Measures of Agreement
- Percentage agreement: OK, but not sufficient
- If the distribution of classes is highly skewed,
then the baseline algorithm of always assigning
the most frequent class would have high agreement
- Kappa: measures agreement over and above
agreement expected by chance
- Details available in section 3 of this paper by
our group
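To make the contrast between percentage agreement and kappa concrete, here is a small sketch. It assumes the two-annotator Cohen formulation, kappa = (P_o - P_e) / (1 - P_e), where P_o is observed agreement and P_e is the agreement expected by chance from each annotator's label distribution. The slides do not say which kappa variant is intended, and the labels and data below are made up for illustration.

```python
from collections import Counter
from typing import List


def percent_agreement(a: List[str], b: List[str]) -> float:
    """Fraction of items on which the two annotators chose the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a: List[str], b: List[str]) -> float:
    """Cohen's kappa for two annotators labelling the same items.
    Undefined (division by zero) when chance agreement P_e equals 1."""
    n = len(a)
    p_o = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: probability both pick the same label independently.
    p_e = sum(count_a[l] * count_b[l] for l in set(a) | set(b)) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Made-up, highly skewed data: each annotator uses "obj" 18 times out of 20,
# but they never agree on which items are "subj".
ann_a = ["obj"] * 18 + ["subj"] * 2
ann_b = ["obj"] * 16 + ["subj"] * 2 + ["obj"] * 2

print(f"{percent_agreement(ann_a, ann_b):.2f}")  # prints 0.80
print(f"{cohen_kappa(ann_a, ann_b):.2f}")        # prints -0.11
```

Here 80% raw agreement comes entirely from the dominant class. Chance agreement is 0.82, so kappa is slightly negative, meaning the annotators agree no better than chance.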