Title: Learning to Classify Email into Speech Acts

1. Learning to Classify Email into Speech Acts
- William W. Cohen, Vitor R. Carvalho and Tom M. Mitchell
- Presented by Vitor R. Carvalho
- IR Discussion Series, August 12th 2004, CMU
2. Imagine a hypothetical email assistant that can detect speech acts
(1) "Do you have any data with xml-tagged names? I need it ASAP!"
    Urgent Request detected - may take action - request pending
(2) "Sure. I'll put it together by Sunday."
    A Commitment is detected. Should I add this Commitment to your to-do list? Should I send Vitor a reminder on Sunday?
(3) "Here's the tar ball on afs: vitor/names.tar.gz"
    A Delivery of data is detected - pending request cancelled - Delivery is sent - to-do list updated
3. Outline
- Setting the base
- Email speech act Taxonomy
- Data
- Inter-annotator agreement
- Results
- Learnability of email acts
- Different learning algorithms, acts, etc
- Different representations
- Improvements
- Collective/Relational/Iterative classification
4. Related Work
- Email classification for:
  - topic/folder identification
  - spam/non-spam
- Speech-act classification in conversational speech
  - email is a new domain; multiple acts per message
- Winograd's Coordinator (1987): users manually annotated email with intent
  - Extra work for (lazy) users
- Murakoshi et al. (1999): hand-coded rules for identifying speech-act-like labels in Japanese emails
5. Email Acts Taxonomy

From: Benjamin Han
To: Vitor Carvalho
Subject: LTI Student Research Symposium

Hey Vitor,
When exactly is the LTI SRS submission deadline? Also, don't forget to ask Eric about the SRS webpage.
See you,
Ben
- A single email message may contain multiple acts
- An act is described as a verb-noun pair (e.g., propose meeting, request information); not all pairs make sense
- The goal is to describe commonly observed behaviors, rather than all possible speech acts in English
- Also includes non-linguistic usage of email (e.g., delivery of files)
Acts in this message: Request (Information), Remind (Action/Task)
6. A Taxonomy of Email Acts
Verb
Negotiate
Other
Remind
Greet
Conclude
Initiate
Amend
Propose
Request
Deliver
Refuse
Commit
7. A Taxonomy of Email Acts
Noun
Activity
Information
Opinion
Ongoing Activity
Data
Single Event
Meeting Logistics Data
Other Data
Committee
Meeting
Other Short Term Task
<Verb><Noun>
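The <Verb><Noun> pairing above can be sketched as a validated tuple; the label sets mirror the lists on slides 6-7, and the helper name is illustrative, not from the paper:

```python
# Sketch: an email act as a <Verb><Noun> pair from the taxonomy above.
VERBS = {"Remind", "Greet", "Initiate", "Conclude", "Amend", "Propose",
         "Request", "Deliver", "Refuse", "Commit", "Other"}
NOUNS = {"Activity", "Information", "Opinion", "Ongoing Activity", "Data",
         "Single Event", "Meeting Logistics Data", "Other Data",
         "Committee", "Meeting", "Other Short Term Task"}

def make_act(verb, noun):
    """Return a (verb, noun) act, rejecting labels outside the taxonomy."""
    if verb not in VERBS or noun not in NOUNS:
        raise ValueError(f"unknown act: {verb}/{noun}")
    return (verb, noun)
```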
8. A Taxonomy of Email Acts (Noun, continued)
- Future work: integration with task-oriented email clustering
- Will only consider predicting top-level tasks, not the recursive structure
9. Corpora
- 4 different datasets
- From CSpace (management game at GSIA):
  - N01F3 (351 email messages)
  - N02F2 (341 email messages)
  - N03F2 (443 email messages)
- From Project World CALO (simulation game at SRI):
  - Pw_calo (222 email messages)
- 4 to 6 participants in each group
- N03F2 was manually labeled by 2 different annotators (what's the agreement?)
10. Corpora
- Few large, natural email corpora are available
- CSPACE corpus (Kraut & Fussell)
  - Email associated with a semester-long project for GSIA MBA students in 1997
  - 15,000 messages from 277 students in 50 teams (4 to 6 per team)
  - Rich in task negotiation
  - N02F2, N01F3, N03F2: all messages from students in three teams (341, 351, 443 messages)
- SRI's Project World CALO corpus
  - 6 people in an artificial task scenario over four days
  - 222 messages (publicly available), double-labeled
11. Inter-Annotator Agreement
- Kappa Statistic: Kappa = (A - R) / (1 - R)
  - A: probability of agreement in a category
  - R: probability of agreement for 2 annotators labeling at random
  - Kappa ranges from -1 to 1
Email Act Kappa
Deliver 0.75
Commit 0.72
Request 0.81
Amend 0.83
Propose 0.72
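The agreement numbers above are kappa values; a minimal sketch of computing Kappa = (A - R) / (1 - R) from two annotators' label sequences (helper name illustrative):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Observed agreement A, corrected for the agreement R expected
    if both annotators assigned labels at random (by their marginals)."""
    n = len(labels_a)
    a = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    ca, cb = Counter(labels_a), Counter(labels_b)
    r = sum(ca[k] * cb[k] for k in ca) / (n * n)
    return (a - r) / (1 - r)
```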
12. Inter-Annotator Agreement for messages with a single verb
13. Learnability of Email Acts
- Features: unweighted word-frequency counts (BOW)
- 5-fold cross-validation
- (Directive = Request or Propose or Amend)
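The setup above (BOW features, 5-fold cross-validation) can be sketched with stdlib helpers; both function names are illustrative:

```python
from collections import Counter

def bow_features(message):
    """Unweighted word-frequency counts (bag of words)."""
    return Counter(message.lower().split())

def five_fold(items):
    """Yield (train, test) splits for 5-fold cross-validation."""
    k = 5
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        yield train, test
```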
14. Using Different Learners
(Directive Act = Request or Propose or Amend)
15. Learning Requests Only
16. Learning Commissives
(Commissive Act = Delivery or Commitment)
17. Learning Deliveries Only
18. Learning to Recognize Commitments
19. Overview on Entire Corpus

Act (pos/neg)         Metric  Voted Perceptron  AdaBoost  SVM   Decision Trees
Request (450/907)     Error   0.25              0.22      0.23  0.20
                      F1      0.58              0.65      0.64  0.69
Propose (140/1217)    Error   0.11              0.12      0.12  0.10
                      F1      0.19              0.26      0.44  0.13
Deliver (873/484)     Error   0.26              0.28      0.27  0.30
                      F1      0.80              0.78      0.78  0.76
Commit (208/1149)     Error   0.15              0.14      0.17  0.15
                      F1      0.21              0.44      0.47  0.11
Directive (605/752)   Error   0.25              0.23      0.23  0.19
                      F1      0.72              0.73      0.73  0.78
Commissive (993/364)  Error   0.23              0.23      0.24  0.22
                      F1      0.84              0.84      0.83  0.85
20. Multi-class learning algorithm vs. annotator agreement (for messages with a single category)

Annotator 1 x Learner
          Req  Prop  Amd  Cmt  Dlv
Request    27     0    4    3   24
Propose     1     2    0    4    8
Amend       2     0    6    6    4
Commit      1     1    3   11   11
Deliver    17     2    5   12  104

Annotator 1 x Annotator 2
          Req  Prop  Amd  Cmt  Dlv
Request    55     1    0    1    1
Propose     0    11    1    3    0
Amend       0     0   15    1    2
Commit      0     0    0   24    3
Deliver     0     1    0    4  135
21. Most Informative Features (are common words)
[Slide shows the top features for each act: Request, Amend, Propose, Commit, Deliver]
22. Learning: Document Representation
- Variants explored:
  - TFIDF -> TF weighting (don't downweight common words)
  - Bigrams
    - For Commit: "i will", "i agree" in top 5 features
    - For Directive: "do you", "could you", "can you", "please advise" in top 25
  - Count of time expressions
  - Words near a time expression
  - Words near a proper noun or pronoun
  - POS counts
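A minimal sketch of the bigram variant, assuming simple whitespace tokenization (helper name illustrative); phrases like "i will" then surface as single features:

```python
from collections import Counter

def bigram_features(message):
    """Count adjacent word pairs so multi-word cues become features."""
    toks = message.lower().split()
    return Counter(zip(toks, toks[1:]))
```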
23. Baseline classifier: linear-kernel SVM with TFIDF weighting
24. Collective Classification (relational)
25. Collective Classification
- BOW classifier output used as features (7 binary features: req, dlv, amd, prop, etc.)
- MaxEnt learner; training set: N03f2, test set: N01f3
- Features: current msg + parent msg + child message (1st child only)
- Related msgs: messages with a parent and/or child message
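A sketch of assembling the current, parent, and first-child act predictions into one feature vector, as described above; the act list and names are illustrative, not the paper's exact feature set:

```python
def relational_features(msg, parent=None, child=None):
    """Each argument is a dict of act -> 0/1 binary predictions from the
    BOW classifier; missing relatives contribute all-zero features."""
    acts = ["req", "prop", "amd", "cmt", "dlv", "rmd", "grt"]
    feats = {}
    for prefix, preds in [("cur", msg), ("par", parent), ("chd", child)]:
        for a in acts:
            feats[f"{prefix}_{a}"] = (preds or {}).get(a, 0)
    return feats
```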
N01f3 dataset            Metric  Req    Dlv    Cmt    Prop   Amd    Req/Amd/Prop  Dlv/Cmt
Entire dataset (351)     F1      54.61  74.47  34.61  28.98  16.00  68.30         80.97
Entire dataset (351)     Kappa   28.21  34.88  23.94  21.76  13.02  35.00         22.84
Related msgs only (170)  F1      56.92  71.71  38.09  39.21  22.22  75.00         80.47
Related msgs only (170)  Kappa   33.08  32.74  24.02  28.72  17.93  43.70         27.14

-> Relational features are useful for related messages
26. Collective/Iterative Classification

[Figure: message thread over TIME, with per-message posterior probabilities 0.53, 0.65, 0.85, 0.85, 0.95, 0.93]

- Start with baseline (BOW)
- How to make updates?
  - Chronological order
  - Using family heuristics (child first, parent first, etc.)
  - Using posterior probability
- (Maximum Entropy learner)
- (Threshold, ranking, etc.)
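The update loop above can be sketched as follows; `base_predict` and `relational_predict` are hypothetical stand-ins for the baseline BOW and MaxEnt relational models, and only parent links are modeled for brevity:

```python
def iterative_classify(thread, base_predict, relational_predict, rounds=3):
    """Start from baseline predictions, then repeatedly re-predict each
    message using the current label of its parent, until labels stabilize.
    `thread` maps msg id -> (features, parent_id or None)."""
    labels = {m: base_predict(feat) for m, (feat, _) in thread.items()}
    for _ in range(rounds):
        new = {m: relational_predict(feat, labels.get(parent))
               for m, (feat, parent) in thread.items()}
        if new == labels:       # converged
            break
        labels = new
    return labels
```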
27. Iterative Classification: Commitment
28. Iterative Classification: Request
29. Iterative Classification: Dlv/Cmt
30. Conclusions/Summary
- Negotiating/managing shared tasks is a central use of email
- Proposed a taxonomy for email acts; could be useful for tracking commitments, delegations, and pending answers, and for integrating to-do lists and calendars with email
- Inter-annotator agreement: kappa in the 0.70-0.80s
- Learned classifiers can do this with reasonable accuracy (90% precision at 50-60% recall for the top level of the taxonomy)
- Fancy tricks with IE, bigrams, and POS offer modest improvement over the baseline TF-weighted systems
31. Conclusions/Future Work
- Teamwork (collective/iterative classification) seems to help a lot!
- Future work:
  - Integrate all features + best learners + tricks; tune the system
  - Social network analysis