SIGIR, August 2005, Salvador, Brazil On the Collective Classification of Email presentation

1
SIGIR, August 2005, Salvador, Brazil
On the Collective Classification of Email Speech Acts
Vitor R. Carvalho, William W. Cohen
Carnegie Mellon University
2
Outline

Email Speech Acts and Applications
Sequential Nature of Negotiations
Collective Classification and Results

3
Classifying Email into Acts (Cohen, Carvalho &
Mitchell, EMNLP-04)

An Act is a verb-noun pair (e.g., propose
meeting)
One single email message may contain multiple
acts. Not all pairs make sense.
Try to describe commonly observed behaviors,
rather than all possible speech acts.
Also include non-linguistic usage of email
(delivery of files)
Most of the acts can be learned (EMNLP-04)
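
As a minimal sketch of the representation described above, an act can be modeled as a verb-noun pair and a message as a set of such pairs. The class name `EmailAct` and the two vocabularies are assumptions made for illustration; they only list names that appear elsewhere in these slides, and the rule that "not all pairs make sense" is not modeled here.

```python
from dataclasses import dataclass

# Illustrative vocabularies (assumption): only names mentioned in these slides.
VERBS = {"request", "propose", "commit", "deliver", "amend"}
NOUNS = {"meeting", "data"}


@dataclass(frozen=True)
class EmailAct:
    """One email act: a verb-noun pair such as (propose, meeting)."""
    verb: str
    noun: str

    def __post_init__(self):
        # Reject anything outside the illustrative vocabularies.
        if self.verb not in VERBS:
            raise ValueError(f"unknown verb: {self.verb}")
        if self.noun not in NOUNS:
            raise ValueError(f"unknown noun: {self.noun}")


# A single message may contain multiple acts, so it maps to a set of pairs.
message_acts = {EmailAct("propose", "meeting"), EmailAct("deliver", "data")}
```

Because each act is treated as its own label, a learner would typically train one binary classifier per verb or noun rather than a single multi-class model.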

[Figure: taxonomy of email act Verbs and Nouns]
4
Email Acts - Applications

Email overload: need for improved email clients.
Negotiating/managing shared tasks is a central
use of email
Tracking commitments, delegations, pending
answers
Integrating to-do/task lists into email, etc.
Iterative Learning of Email Tasks and Speech Acts
(Kushmerick & Khoussainov, 2005)
Predicting Social Roles and Group Leadership
(Leuski, 2004; Carvalho et al., in progress)

5
Idea: Predicting Acts from Surrounding Acts
Example of Email Thread Sequence

Strong correlation with the acts of the previous
and next messages

[Figure: example email thread; messages linked by <<In-ReplyTo>> relations carry acts such as Request, Proposal, Delivery and Commit]

Act has little or no correlation with other acts
of same message
6
Related work on the Sequential Nature of
Negotiations

Winograd and Flores, 1986: Conversation for
Action Structure
Murakoshi et al., 1999: Construction of
Deliberation Structure in Email

7
Related work on the Sequential Nature of
Negotiations

Kushmerick & Lau, 2005: Learning the structure
of interactions between buyers and e-commerce
vendors

8
Data: CSPACE Corpus

Few large, free, natural email corpora are
available
CSPACE corpus (Kraut & Fussell)
Emails associated with a semester-long project
for Carnegie Mellon MBA students in 1997
15,000 messages from 277 students, divided into 50
teams (4 to 6 students/team)
Rich in task negotiation.
1500 messages (4 teams) had their Speech Acts
labeled.
One of the teams was double labeled, and the
inter-annotator agreement ranges from 0.72 to 0.83
(Kappa) for the most frequent acts.

9
Evidence of Sequential Correlation of Acts

Transition diagram for most common verbs from
CSPACE corpus
It is NOT a Probabilistic DFA
Act sequence patterns: (Request, Deliver),
(Propose, Commit, Deliver), (Propose,
Deliver); the most common act was Deliver
Less regularity than expected (considering
previous deterministic negotiation state diagrams)
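
One way to look for this kind of sequential correlation is to count how often one act follows another along a thread. The sketch below is an assumption-laden illustration, not the method used to draw the slide's diagram: it treats each thread as a linear list with one act per message, and `toy_threads` simply re-uses the three patterns named above rather than real corpus data.

```python
from collections import Counter, defaultdict


def transition_counts(threads):
    """Count (previous act -> next act) pairs over linear threads."""
    counts = defaultdict(Counter)
    for thread in threads:
        # Pair every act with the act of the message that follows it.
        for prev_act, next_act in zip(thread, thread[1:]):
            counts[prev_act][next_act] += 1
    return counts


def transition_frequencies(counts):
    """Turn raw counts into relative frequencies per previous act."""
    freqs = {}
    for prev_act, nexts in counts.items():
        total = sum(nexts.values())
        freqs[prev_act] = {act: n / total for act, n in nexts.items()}
    return freqs


# Toy input (assumption): the act patterns listed on the slide, nothing more.
toy_threads = [
    ["Request", "Deliver"],
    ["Propose", "Commit", "Deliver"],
    ["Propose", "Deliver"],
]

counts = transition_counts(toy_threads)
freqs = transition_frequencies(counts)
```

On real data a message can carry several acts and a thread is a tree, so the pairing step would need to iterate over parent-child links instead of adjacent list items.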

10
Content versus Context

Content: Bag-of-Words features only
Context: Parent and Child features only (see table
below)
8 MaxEnt classifiers, trained on 3F2 and tested
on 1F3 team dataset
Only the 1st child message was considered (vast
majority, more than 95%)

[Figure: a message with unknown acts (???) shown between its Parent message and Child message, with example acts Request, Proposal, Delivery and Commit]
Set of Context Features (Relational)
Parent Boolean Features: Parent_Request, Parent_Deliver, Parent_Commit, Parent_Propose, Parent_Directive, Parent_Commissive, Parent_Meeting, Parent_dData
Child Boolean Features: Child_Request, Child_Deliver, Child_Commit, Child_Propose, Child_Directive, Child_Commissive, Child_Meeting, Child_dData
[Figure: Kappa values on 1F3 using Relational (Context) features and Textual (Content) features]
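
The context features listed above can be sketched as a small function that turns the acts of the parent and child messages into 16 Boolean features. The function name `context_features` and the use of plain sets of act names are assumptions for illustration; only the feature names themselves come from the slide.

```python
# Act names as they appear in the feature names on the slide.
ACTS = ["Request", "Deliver", "Commit", "Propose",
        "Directive", "Commissive", "Meeting", "dData"]


def context_features(parent_acts, child_acts):
    """Build the Parent_* and Child_* Boolean features for one message.

    parent_acts / child_acts: iterables of act names, or None when the
    message has no parent or no child in the thread.
    """
    parent_acts = set(parent_acts or ())
    child_acts = set(child_acts or ())
    features = {}
    for act in ACTS:
        features[f"Parent_{act}"] = act in parent_acts
        features[f"Child_{act}"] = act in child_acts
    return features


# Example: the parent is a Request, the (first) child delivers data.
example = context_features({"Request", "Directive"}, {"Deliver", "dData"})
```

A message at the root of a thread simply gets all `Parent_*` features set to False, and a leaf gets all `Child_*` features set to False.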
11
Dependency Network

Dependency networks are probabilistic graphical
models in which the full joint distribution of
the network is approximated with a set of
conditional distributions that can be learned
independently. The conditional probability
distributions in a DN are calculated for each
node given its neighboring nodes (its Markov
blanket).

Approximate inference (Gibbs sampling)
Markov blanket: parent message and child
message
Heckerman et al., JMLR-2000; Neville & Jensen,
KDD-MRDM-2003.

12
Collective Classification Procedure (based on
Dependency Networks Model)
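
The details of the procedure shown on this slide did not survive in the transcript. The following is only a generic sketch of iterative collective classification under stated assumptions: labels are first set from content alone, then each message is repeatedly re-labeled using the current labels of its parent and child. It uses a simple deterministic update rather than Gibbs sampling, and the names `collective_classify`, `content_predict` and `context_predict`, as well as the toy predictors, are hypothetical.

```python
def collective_classify(messages, content_predict, context_predict,
                        acts, n_iterations=10):
    """Iteratively re-label messages from their neighbors' current labels.

    messages: dict id -> {"text": str, "parent": id or None, "child": id or None}
    content_predict(text, act) -> bool
    context_predict(text, act, parent_acts, child_acts) -> bool
    """
    # Step 1: initial labels from message content only.
    labels = {mid: {a for a in acts if content_predict(m["text"], a)}
              for mid, m in messages.items()}
    # Step 2: update each message given its parent's and child's labels.
    for _ in range(n_iterations):
        changed = False
        for mid, m in messages.items():
            parent_acts = labels.get(m["parent"], set())
            child_acts = labels.get(m["child"], set())
            new = {a for a in acts
                   if context_predict(m["text"], a, parent_acts, child_acts)}
            if new != labels[mid]:
                labels[mid] = new
                changed = True
        if not changed:  # stop early once the labeling is stable
            break
    return labels


# Toy predictors (assumption): keyword matching stands in for trained models.
def toy_content(text, act):
    return act.lower() in text.lower()


def toy_context(text, act, parent_acts, child_acts):
    if toy_content(text, act):
        return True
    # A short "ok" reply to a Request is read as a Commit.
    return act == "Commit" and "Request" in parent_acts and "ok" in text.lower()


toy_messages = {
    1: {"text": "Request: please send the report", "parent": None, "child": 2},
    2: {"text": "ok, will do", "parent": 1, "child": None},
}
result = collective_classify(toy_messages, toy_content, toy_context,
                             ["Request", "Commit"])
```

In the toy thread the reply has no content evidence for any act, so it only receives the Commit label once its parent's Request label is taken into account.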
13
Improvement over Content-only baseline
Kappa often improves after iteration
Kappa unchanged for deliver
14
Leave-one-team-out Experiments
Kappa Values

4 teams: 1f3 (170 msgs), 2f2 (137 msgs), 3f2 (249 msgs), 4f4 (165 msgs)
[Figure: Kappa values per act and team; x-axis: Bag-of-words only; y-axis: Collective classification results]
Different teams present different styles for
negotiations and task delegation.
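
A leave-one-team-out split can be sketched as follows: each team in turn is held out for testing while the other three are used for training. The team names and message counts are taken from this slide; the helper name `leave_one_team_out` is an assumption for illustration.

```python
# Labeled teams and their message counts, as listed on the slide.
TEAM_SIZES = {"1f3": 170, "2f2": 137, "3f2": 249, "4f4": 165}


def leave_one_team_out(teams):
    """Yield (training teams, held-out team) pairs, one per team."""
    for held_out in teams:
        train = [t for t in teams if t != held_out]
        yield train, held_out


folds = list(leave_one_team_out(list(TEAM_SIZES)))

for train, held_out in folds:
    n_train = sum(TEAM_SIZES[t] for t in train)
    print(f"train on {train} ({n_train} msgs), "
          f"test on {held_out} ({TEAM_SIZES[held_out]} msgs)")
```

Splitting by team rather than by message keeps all messages of one thread on the same side of the split, which matters when parent and child labels are used as features.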

15
Leave-one-team-out Experiments
Kappa Values

Consistent improvement of Commissive, Commit and
Meet acts

16
Leave-one-team-out Experiments

Deliver and dData performance usually decreases
Associated with data distribution, FYI, file
sharing, etc.
For non-delivery, improvement in avg. Kappa is
statistically significant (p0.01 on a two-tailed
T-test)

Kappa Values
17
Act by Act Comparative Results
Kappa values with and without collective
classification, averaged over the four test sets
in the leave-one-team-out experiment.
18
Conclusion

Sequential patterns of email acts were studied in
the CSPACE corpus. Less regularity than expected.
We proposed a collective classification procedure
for Email Speech Acts based on a Dependency Net
model.
Modest improvements over the baseline on acts
related to negotiation (Request, Commit, Propose,
Meet, etc.). For Deliver/dData (acts less associated
with negotiations), no improvement or a
deterioration was observed.
The degree of linkage in our dataset is small, which
makes the observed results encouraging.

19
Thank you!
21
Inter-Annotator Agreement

Kappa Statistic: Kappa = (A - R) / (1 - R)
A: probability of agreement in a category
R: probability of agreement for 2 annotators labeling
at random
Kappa range: -1 to 1
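
As a worked illustration of the Kappa statistic, the sketch below computes it for two annotators from the observed agreement A and the chance agreement R, using the standard form Kappa = (A - R) / (1 - R). The function name `cohen_kappa` and the toy labels are assumptions; for a single act, each label would be a Boolean saying whether the annotator marked that act on a message.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Kappa = (A - R) / (1 - R) for two annotators over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # A: observed proportion of items on which the annotators agree.
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # R: agreement expected if each annotator labeled at random
    # according to their own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if chance == 1.0:
        # Degenerate case: both used one identical label throughout.
        # Returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (observed - chance) / (1 - chance)


# Toy labels (assumption): did each annotator mark "Request" on 4 messages?
annotator_1 = [True, True, False, False]
annotator_2 = [True, False, False, False]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Here the annotators agree on 3 of 4 messages (A = 0.75) and chance agreement is R = 0.5, so Kappa is 0.5.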

Email Act    Kappa
Deliver      0.75
Commit       0.72
Request      0.81
Amend        0.83
Propose      0.72
