Title: Modeling Intention in Email - Vitor R. Carvalho
1. Modeling Intention in Email
Vitor R. Carvalho
- Ph.D. Thesis Defense
- Language Technologies Institute, School of Computer Science, Carnegie Mellon University
- July 22nd, 2008
- Thesis Committee: William W. Cohen (chair), Tom M. Mitchell, Robert E. Kraut, Lise Getoor (Univ. of Maryland)
2. Outline
- Motivation
- Email Acts
- Preventing Email Information Leaks
- Recommending Email Recipients
- Learning Robust Ranking Models
- User Study
3. Why Email?
- The most successful e-communication application.
- Great tool for collaboration, especially across different time zones.
- Very cheap, fast, and convenient.
- Multiple uses: task manager, contact manager, document archive, to-do list, etc.
- Increasingly popular
  - The Clinton administration left 32 million emails to the National Archives
  - The Bush administration is expected to leave more than 100 million in 2009
- Visible impact
  - Office workers in the U.S. spend at least 25% of the day on email, not counting handheld use
Shipley & Schwalbe, 2007
4. Hard to manage
- People get overwhelmed
  - Costly interruptions
  - Serious impacts on work productivity
  - Increasingly difficult to manage requests, negotiate shared tasks, and keep track of different commitments
- People make horrible mistakes
  - Send messages to the wrong persons
  - Forget to address intended recipients
  - "Oops, did I just hit reply-to-all?"
Dabbish & Kraut, CSCW-2006; Bellotti et al., HCI-2005
5. Thesis
- We present evidence that email management can potentially be improved by the effective use of machine learning techniques to model different aspects of user intention.
6. Outline
- Motivation
- Email Acts ←
- Preventing Email Information Leaks
- Recommending Email Recipients
- Learning Robust Ranking Models
- User Study
7. Classifying Email into Acts
Cohen, Carvalho & Mitchell, EMNLP-04
- An act is described as a verb-noun pair (e.g., propose meeting, request information)
- Not all pairs make sense
- A single email message may contain multiple acts
- The taxonomy tries to describe commonly observed behaviors, rather than all possible speech acts in English
- It also includes non-linguistic usage of email (e.g., delivery of files)
[Diagram: taxonomy of act verbs and nouns]
8. Data & Features
- Data: Carnegie Mellon MBA students competition
  - Semester-long project for CMU MBA students. Total of 277 students, divided into 50 teams (4 to 6 students/team). Rich in task negotiation.
  - 1700 messages (from 5 teams) were manually labeled. One of the teams was double-labeled, and the inter-annotator agreement ranges from 0.72 to 0.83 (Kappa) for the most frequent acts.
- Features
  - N-grams: 1-gram, 2-gram, 3-gram, 4-gram, and 5-gram
  - Pre-processing (sketched below)
    - Remove signature files and quoted lines (in-reply-to) with the Jangada package
    - Entity normalization and substitution patterns: "Sunday", "Monday", ... → day; number:number → hour; "me", "her", "him", "us", "them" → me; "after", "before", "during" → time; etc.
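A minimal sketch of this kind of preprocessing and n-gram feature extraction in Python (scikit-learn), assuming plain-text message bodies; the regular expressions and placeholder tokens are illustrative, and the Jangada signature-removal step is omitted.

```python
import re
from sklearn.feature_extraction.text import CountVectorizer

# Illustrative substitution patterns (placeholder tokens, not the exact thesis patterns).
SUBSTITUTIONS = [
    (re.compile(r"\b(sunday|monday|tuesday|wednesday|thursday|friday|saturday)\b"), "_day_"),
    (re.compile(r"\b\d{1,2}:\d{2}\b"), "_hour_"),
    (re.compile(r"\b(me|her|him|us|them)\b"), "_me_"),
    (re.compile(r"\b(after|before|during)\b"), "_time_"),
]

def preprocess(body: str) -> str:
    # Drop quoted reply lines; signature removal (done with Jangada in the thesis) is omitted here.
    lines = [ln for ln in body.splitlines() if not ln.lstrip().startswith(">")]
    text = " ".join(lines).lower()
    for pattern, token in SUBSTITUTIONS:
        text = pattern.sub(token, text)
    return text

# Bag of 1- to 5-grams over the normalized text, as on the slide.
vectorizer = CountVectorizer(preprocessor=preprocess, ngram_range=(1, 5), binary=True)
X = vectorizer.fit_transform(["Are we meeting Monday at 3:30?", "> quoted line\nI will send it to them."])
```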
9. Classification Performance
Carvalho & Cohen, HLT-ACTS-06; Cohen, Carvalho & Mitchell, EMNLP-04
5-fold cross-validation over 1716 emails, SVM with linear kernel (see sketch below)
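A minimal sketch of the evaluation setup, assuming one binary linear SVM per act; the corpus below is a tiny stand-in for the labeled CMU data, so it uses 2 folds instead of the 5 folds reported on the slide.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Tiny stand-in for the labeled CMU corpus: binary labels for one act (e.g., "Request").
messages = [
    "could you send me the draft by friday",
    "here is the final report, attached",
    "please review section 2 and reply",
    "i will take care of the slides tonight",
]
labels = np.array([1, 0, 1, 0])

# One linear-kernel SVM per act; the slide reports 5-fold CV over 1716 emails,
# cv=2 is used here only because this toy corpus is tiny.
model = make_pipeline(CountVectorizer(ngram_range=(1, 5)), LinearSVC())
print(cross_val_score(model, messages, labels, cv=2).mean())
```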
10. Predicting Acts from Surrounding Acts
Carvalho & Cohen, SIGIR-05
[Diagram: example of an email thread sequence, with messages labeled Request, Propose, Deliver, and Commit]
- Strong correlation between the acts of previous and next messages
- Both context and content have predictive value for email act classification
- Collective classification problem → Dependency Network
11. Collective Classification with Dependency Networks (DN)
Carvalho & Cohen, SIGIR-05
- In DNs, the full joint probability distribution is approximated with a set of conditional distributions that can be learned independently. The conditional probabilities are calculated for each node given its Markov blanket.
- Inference: temperature-driven Gibbs sampling (see sketch below)
Heckerman et al., JMLR-00; Neville & Jensen, JMLR-07
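A minimal sketch of temperature-driven Gibbs sampling over a dependency network; cond_prob stands in for the independently learned local conditional model (a classifier over a message's own features plus its neighbors' current act labels) and is an assumption, not the thesis implementation.

```python
import numpy as np

def gibbs_infer(nodes, neighbors, cond_prob, n_iters=50, t0=4.0):
    """Temperature-driven Gibbs sampling sketch for a dependency network.
    nodes:     per-message feature objects
    neighbors: dict node index -> indices in its Markov blanket (parent/child in the thread)
    cond_prob: hypothetical local model, cond_prob(features, neighbor_labels) -> prob. vector over acts
    """
    rng = np.random.default_rng(0)
    n_acts = len(np.asarray(cond_prob(nodes[0], []), dtype=float))
    labels = [int(rng.integers(n_acts)) for _ in nodes]          # random initialization
    for it in range(n_iters):
        temperature = max(t0 * (1.0 - it / n_iters), 1e-3)       # anneal toward a greedy assignment
        for i in range(len(nodes)):
            nb_labels = [labels[j] for j in neighbors.get(i, [])]
            p = np.asarray(cond_prob(nodes[i], nb_labels), dtype=float)
            p = p ** (1.0 / temperature)                          # sharpen as the temperature drops
            p = p / p.sum()
            labels[i] = int(rng.choice(n_acts, p=p))
    return labels
```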
12. Act-by-Act Comparative Results
- Modest improvements over the baseline
- Improvements only on acts related to negotiation: Request, Commit, Propose, Meet, Commissive, etc.
[Chart: Kappa values with and without collective classification, averaged over the four team test sets in the leave-one-team-out experiment]
13. Key Ideas
- Summary
  - Introduced a new taxonomy of acts tailored to email communication
  - Good levels of inter-annotator agreement
  - Showed that the categorization can be automated
  - Proposed a collective classification algorithm for threaded messages
- Related Work
  - Speech Act Theory [Austin, 1962; Searle, 1969], the Coordinator system [Winograd, 1987], Dialog Acts for speech recognition, machine translation, and other dialog-based systems [Stolcke et al., 2000; Levin et al., 2003], etc.
- Related applications
  - Focus message in threads/discussions [Feng et al., 2006], action-item discovery [Bennett & Carbonell, 2005], task-focused email summarization [Corston-Oliver et al., 2004], predicting social roles [Leuski, 2004], etc.
14. Applications of Email Acts
- Iterative Learning of Email Tasks and Email Acts (Kushmerick & Khousainov, IJCAI-05)
- Predicting Social Roles and Group Leadership (Leuski, SIGIR-04; Carvalho et al., CEAS-07)
- Detecting Focus on Threaded Discussions (Feng et al., HLT/NAACL-06)
- Semantically Enhanced Email (Scerri et al., DEXA-07)
- Email Act Taxonomy Refinements (Lampert et al., AAAI-2008 EMAIL workshop)
15. Outline
- Motivation
- Email Acts
- Preventing Email Information Leaks ←
- Recommending Email Recipients ←
- Learning Robust Ranking Models
- User Study
16. [Image-only slide: no transcript]
17. [Image-only slide: no transcript]
18. http://www.sophos.com/
19. Preventing Email Info Leaks
Carvalho & Cohen, SDM-07
- Email leak: an email accidentally sent to the wrong person
  - Similar first or last names, aliases, etc.
  - Aggressive auto-completion of email addresses
  - Typos
  - Keyboard settings
- Disastrous consequences: expensive lawsuits, brand reputation damage, negotiation setbacks, etc.
20. Preventing Email Info Leaks
Carvalho & Cohen, SDM-07
- Method
  - Create simulated/artificial leak recipients (similar first or last names, aliases, aggressive auto-completion of email addresses, typos, keyboard settings, etc.)
  - Build a model for (message, recipients): train a classifier on real data to detect synthetically created outliers (added to the true recipient list)
  - Features: textual (subject, body) and network features (frequencies, co-occurrences, etc.)
  - Detect potential outliers and warn the user based on confidence
21. Simulating Email Leaks
- Several options
  - Frequent typos, same/similar last names, identical/similar first names, aggressive auto-completion of addresses, etc.
- We adopted the 3g-address criteria
  - On each trial, one of the message recipients is randomly chosen and an outlier is generated as follows (see sketch below):
    - with probability a: pick a similar address from the Address Book (e.g., sharing a character 3-gram with the chosen recipient's address)
    - with probability 1 - a: generate a random email address NOT in the Address Book
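A minimal sketch of this kind of leak simulation, assuming a flat list of known addresses; the trigram-matching rule and the mixing probability alpha are illustrative stand-ins for the exact 3g-address settings.

```python
import random
import string

def char_trigrams(address: str) -> set:
    return {address[i:i + 3] for i in range(len(address) - 2)}

def simulate_leak(true_recipients, address_book, alpha=0.8, seed=0):
    """Pick one true recipient and return a plausible simulated 'leak' address."""
    rng = random.Random(seed)
    target = rng.choice(true_recipients)
    if rng.random() < alpha:
        # An address-book entry sharing at least one character 3-gram with the chosen recipient.
        candidates = [a for a in address_book
                      if a not in true_recipients and char_trigrams(a) & char_trigrams(target)]
        if candidates:
            return rng.choice(candidates)
    # Otherwise fabricate a random address that is NOT in the address book.
    while True:
        fake = "".join(rng.choices(string.ascii_lowercase, k=8)) + "@example.com"
        if fake not in address_book:
            return fake

# The returned address is appended to the true recipient list, and the classifier is trained
# to single it out as the outlier.
```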
22. Data and Baselines
- Enron email dataset, with a realistic setting
  - For each user, the 10 most recent sent messages were used as the test set
  - Some basic preprocessing
- Baseline methods: textual similarity, common baselines in IR (sketched below)
  - Rocchio/TFIDF Centroid (1971): create a TFIDF centroid for each user in the Address Book; for testing, rank users according to the cosine similarity between the test message and each centroid.
  - Knn-30 (Yang & Chute, 1994): given a test message, get the 30 most similar messages in the training set; rank each user according to the sum of its similarities over the 30-message set.
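A minimal sketch of the two IR baselines (TFIDF/Rocchio centroid and Knn-30), assuming each training message comes with its list of recipient addresses; function and variable names are hypothetical.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def fit_baselines(train_msgs, train_rcpts):
    """train_msgs: message texts; train_rcpts: recipient-address list for each message."""
    vec = TfidfVectorizer()
    X = vec.fit_transform(train_msgs)
    centroids = {}
    for rcpt in {r for rs in train_rcpts for r in rs}:
        rows = [i for i, rs in enumerate(train_rcpts) if rcpt in rs]
        centroids[rcpt] = np.asarray(X[rows].mean(axis=0))   # TFIDF centroid for this recipient
    return vec, X, centroids

def rank_rocchio(test_msg, vec, centroids):
    """Rank Address Book entries by cosine similarity to their TFIDF centroid."""
    q = vec.transform([test_msg])
    scores = {r: cosine_similarity(q, c)[0, 0] for r, c in centroids.items()}
    return sorted(scores, key=scores.get, reverse=True)

def rank_knn30(test_msg, vec, X_train, train_rcpts, k=30):
    """Rank recipients by summed similarity over the k most similar training messages."""
    sims = cosine_similarity(vec.transform([test_msg]), X_train).ravel()
    top = np.argsort(sims)[::-1][:k]
    scores = {}
    for i in top:
        for r in train_rcpts[i]:
            scores[r] = scores.get(r, 0.0) + sims[i]
    return sorted(scores, key=scores.get, reverse=True)
```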
23. Using Non-Textual Features
- Frequency features
  - Number of received, sent, and sent+received messages (from/to this user)
- Co-occurrence features
  - Number of times a user co-occurred with all other recipients
- Auto features
  - For each recipient R, find Rm (the address with the maximum score from R's 3g-address list), then use score(R) - score(Rm) as a feature
- Combine the text-based feature (KNN-30 score or TFIDF score) with the non-textual features using perceptron-based reranking, trained on simulated leaks (see sketch below)
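A minimal sketch of perceptron-based reranking over the combined features, assuming each training pair holds the feature vectors of a genuine recipient and of a simulated leak from the same message; the thesis's exact reranker (e.g., averaging or voting variants) may differ.

```python
import numpy as np

def train_rerank_perceptron(pairs, n_features, epochs=10, lr=1.0):
    """Pairwise perceptron sketch. Each pair is (x_true, x_leak): combined feature vectors
    (text score plus non-textual features) for a genuine recipient and a simulated leak
    added to the same message. Learns w so that w . x_true > w . x_leak."""
    w = np.zeros(n_features)
    for _ in range(epochs):
        for x_true, x_leak in pairs:
            if w @ x_true <= w @ x_leak:        # misranked pair
                w += lr * (x_true - x_leak)     # standard perceptron update
    return w

# At send time, score every recipient of the outgoing message with w @ x; the lowest-scoring
# recipient is flagged as a potential leak when its score falls below a confidence threshold.
```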
24. Email Leak Results
Carvalho & Cohen, SDM-07
25. Finding Real Leaks in Enron
- "Sorry. Sent this to you by mistake.", "I accidentally sent you this reminder"
- How can we find them?
  - Grep for "mistake", "sorry", or "accident"
  - Note: must be from one of the Enron users
  - Found 2 valid cases
    - Message germany-c/sent/930: the message has 20 recipients, the leak is alex.perkins_at_
    - Message kitchen-l/sent items/497: it has 44 recipients, the leak is rita.wynne_at_
- Prediction results
  - The proposed algorithm was able to find both of these leaks
26. Another Email Addressing Problem
Sometimes people just forget an intended recipient
27. Forgetting an intended recipient
- Particularly in large organizations, it is not uncommon to forget to CC an important collaborator: a manager, a colleague, a contractor, an intern, etc.
- More frequent than expected (from the Enron Collection)
  - At least 9.27% of the users have forgotten to add a desired email recipient.
  - At least 20.52% of the users were not included as recipients (even though they were intended recipients) in at least one received message.
- The cost of errors in task management can be high
  - Communication delays, missed deadlines
  - Wasted opportunities, costly misunderstandings, task delays
Carvalho & Cohen, ECIR-2008
28. Data and Features
- Easy to obtain labeled data
- Two ranking problems
  - Predicting TO+CC+BCC
  - Predicting CC+BCC
- Features & methods
  - Textual: Rocchio (TFIDF) and KNN
  - Non-textual: frequency, recency, and co-occurrence (sketched below)
    - Number of messages received and/or sent (from/to this user)
    - How often a particular user was addressed in the last 100 messages
    - Number of times a user co-occurred with all other recipients ("co-occur" means two recipients were addressed in the same message in the training set)
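A minimal sketch of the frequency, recency, and co-occurrence statistics, assuming sent_history is a chronologically ordered list of recipient lists; the exact feature definitions in the thesis (e.g., received-message counts) may differ.

```python
from collections import Counter
from itertools import combinations

def recipient_stats(sent_history, recency_window=100):
    """sent_history: chronologically ordered list of recipient lists from the user's sent messages."""
    frequency = Counter(r for rcpts in sent_history for r in rcpts)                   # how often addressed
    recency = Counter(r for rcpts in sent_history[-recency_window:] for r in rcpts)   # last 100 messages
    cooccurrence = Counter()
    for rcpts in sent_history:
        for a, b in combinations(sorted(set(rcpts)), 2):
            cooccurrence[(a, b)] += 1
    return frequency, recency, cooccurrence

# For a candidate recipient c of a partially addressed message, a co-occurrence feature can be
# the sum of cooccurrence[(min(c, r), max(c, r))] over the recipients r already on the message.
```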
29. Email Recipient Recommendation
Carvalho & Cohen, ECIR-08
[Chart: results over 36 Enron users (about 44,000 queries, avg. 1,267 queries/user); MRR around 0.5]
30. Rank Aggregation
Aslam & Montague, 2001; Ogilvie & Callan, 2003; Macdonald & Ounis, 2006
- Many data fusion methods, of two types
  - Normalized scores: CombSUM, CombMNZ, etc.
  - Unnormalized scores: BordaCount, Reciprocal Rank Sum, etc.
- Reciprocal Rank: the sum of the inverse of the rank of the document in each ranking (see sketch below)
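A minimal sketch of Reciprocal Rank Sum fusion; the candidate names and the two input rankings are made up for illustration.

```python
def reciprocal_rank_fusion(rankings):
    """Each input ranking is an ordered list of candidates; a candidate's fused score is
    the sum of 1/rank over the rankings that contain it."""
    scores = {}
    for ranking in rankings:
        for position, candidate in enumerate(ranking, start=1):
            scores[candidate] = scores.get(candidate, 0.0) + 1.0 / position
    return sorted(scores, key=scores.get, reverse=True)

# Example: fusing a TFIDF-based and a frequency-based recipient ranking.
fused = reciprocal_rank_fusion([["ann", "bob", "carl"], ["ann", "dave", "bob"]])
# fused == ['ann', 'bob', 'dave', 'carl']  (ann: 1 + 1 = 2.0, bob: 1/2 + 1/3 = 0.83, ...)
```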
31. Rank Aggregation Results
32. Intelligent Email Auto-completion
Carvalho & Cohen, ECIR-08
[Charts: auto-completion results for TO+CC+BCC and CC+BCC prediction]
33. Related Work
- Email leaks
  - Boufaden et al., 2005 proposed a privacy-enforcement system to monitor specific privacy breaches (student names, student grades, IDs)
  - Lieberman and Miller, 2007 prevent leaks based on faces
- Recipient recommendation
  - Pal & McCallum, 2006 (CC prediction problem); Dredze et al., 2008 (recipient prediction based on summary keywords)
- Expert search in email
  - Dom et al., 2003; Campbell et al., 2003; Balog & de Rijke, 2006; Balog et al., 2006; Soboroff, Craswell & de Vries (TREC Enterprise 2005-2007)
34. Outline
- Motivation
- Email Acts
- Preventing Email Information Leaks
- Recommending Email Recipients
- Learning Robust Ranking Models ←
- User Study
35. Can we learn a better ranking function?
- Learning to Rank: machine learning to improve ranking
- Many recently proposed methods
  - RankSVM (Joachims, KDD-02)
  - RankBoost (Freund et al., 2003)
  - Committee of Perceptrons (Elsas, Carvalho & Carbonell, WSDM-08)
- Meta-learning method
  - Learn robust ranking models in the pairwise-based framework
36. Pairwise-based Ranking
- Goal: induce a ranking function f(d) such that, for a query q with documents d1, d2, d3, ..., dT, f(di) > f(dj) whenever di should be ranked above dj
- We assume a linear function f(d) = w . x_d
- Constraints: w . x_i > w . x_j for every preference pair, i.e., w . (x_i - x_j) > 0
- Paired instance: each preference pair becomes a single training instance with feature vector (x_i - x_j) (see sketch below)
- Problem: O(n) mislabels produce O(n^2) mislabeled pairs
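A minimal sketch of how a ranked list is turned into paired instances for the pairwise framework; variable names are illustrative.

```python
import numpy as np

def pairwise_instances(X, relevance):
    """Turn one query's feature matrix X (n x d) and relevance labels into paired
    instances z = x_i - x_j, one per preference pair where item i should outrank item j."""
    pairs = []
    n = len(relevance)
    for i in range(n):
        for j in range(n):
            if relevance[i] > relevance[j]:
                pairs.append(X[i] - X[j])
    return np.array(pairs)

# A linear ranker f(d) = w . x_d is then trained to satisfy w . z > 0 for every paired
# instance z; a single mislabeled item can corrupt up to n-1 of these pairs.
```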
37. Effect of Pairwise Outliers
[Plot: RankSVM results on the SEAL-1 dataset with pairwise outliers]
38. Effect of Pairwise Outliers
[Plot: RankSVM loss function vs. pairwise score Pl]
39. Effect of Pairwise Outliers
[Plot: loss function vs. pairwise score Pl; a bounded loss is robust to outliers, but not convex]
40. Ranking Models in 2 Stages
[Diagram: base ranking model → SigmoidRank → final model]
- Stage 1, Base Ranker: any base ranking model, e.g., RankSVM, Perceptron, ListNet, etc.
- Stage 2, SigmoidRank (non-convex): minimize (a very close approximation of) the empirical error, i.e., the number of misranks. Robust to outliers (label noise).
41. Learning
- SigmoidRank loss: a smooth sigmoid approximation of the 0/1 misrank loss on each paired instance
- Learning with gradient descent (see sketch below)
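A minimal sketch of a sigmoid-shaped pairwise loss minimized by gradient descent, in the spirit of the SigmoidRank stage above; the exact loss form, scaling constant s, and learning-rate schedule used in the thesis may differ.

```python
import numpy as np

def sigmoid_rank_loss(w, pairs, s=1.0):
    """pairs: array of paired instances (x_i - x_j) that should score positively.
    Per-pair loss = 1 / (1 + exp(s * w.z)), a bounded approximation of the misrank indicator."""
    z = pairs @ w
    return np.mean(1.0 / (1.0 + np.exp(s * z)))

def sigmoid_rank_grad(w, pairs, s=1.0):
    z = pairs @ w
    sig = 1.0 / (1.0 + np.exp(s * z))
    # d/dw of 1/(1+exp(s*z)) = -s * sig * (1 - sig) * (x_i - x_j)
    return (-s * (sig * (1.0 - sig))[:, None] * pairs).mean(axis=0)

def train_sigmoid_rank(pairs, w0, lr=0.1, epochs=200, s=1.0):
    """Start from a base ranker's weights w0 (stage 1) and refine with gradient descent (stage 2)."""
    w = w0.copy()
    for _ in range(epochs):
        w -= lr * sigmoid_rank_grad(w, pairs, s)
    return w
```

Because the per-pair loss is bounded, badly misranked pairs (outliers) contribute at most a constant and almost no gradient, which is the source of the robustness; the price is non-convexity, hence the warm start from the base ranker.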
42. Email Recipient Results
Carvalho, Elsas, Cohen & Carbonell, SIGIR-2008 LR4IR workshop
[Chart: results over 36 Enron users (about 44,000 queries, avg. 1,267 queries/user); relative gains between -0.09 and 13.2, with significance values ranging from p<0.01 to p=0.74]
43. Email Recipient Results
Carvalho, Elsas, Cohen & Carbonell, SIGIR-2008 LR4IR workshop
44. Email Recipient Results
Carvalho, Elsas, Cohen & Carbonell, SIGIR-2008 LR4IR workshop
45. Set Expansion (SEAL) Results
Wang & Cohen, ICDM-2007; Carvalho et al., SIGIR-2008 LR4IR workshop
[Chart: 18 features, 120/60 train/test splits, half relevant]
46. LETOR Results
Carvalho et al., SIGIR-2008 LR4IR workshop
[Charts: LETOR datasets with (queries/features) of (106/25), (50/44), and (75/44)]
47. Related Work
- Classification with non-convex loss functions: trade-offs among outlier robustness, accuracy, scalability, etc.
  - Perez-Cruz et al., 2003; Xu et al., 2006; Zhan & Shen, 2005; Collobert et al., 2006; Liu et al., 2005; Yang and Hu, 2008
- Ranking with other non-convex loss functions
  - FRank (Tsai et al., 2007): a fidelity-based loss function optimized in the boosting framework; query normalization may be interfering with performance gains; not a general second-stage (meta) learner
48. Outline
- Motivation
- Email Acts
- Preventing Email Information Leaks
- Recommending Email Recipients
- Learning Robust Ranking Models
- User Study ←
49. User Study
- Choosing an email system
  - Gmail, Yahoo! Mail, etc.: widely adopted, but interface/compatibility issues
  - Develop a new client: perfect control, but longer development and low adoption
  - Mozilla Thunderbird: open-source community, easy mechanism to install extensions, millions of users
50. User Study: Cut Once
Balasubramanyan, Carvalho & Cohen, AAAI-2008 EMAIL workshop
- Cut Once: a Mozilla Thunderbird extension for leak detection and recipient recommendation
- A few issues
  - Poor documentation and limited customization of the interface
  - JavaScript is slow: imposed computational restrictions
    - Disregard rare words and rare users
    - Implement two lightweight ranking methods: 1) TFIDF; 2) MRR (frequency, recency, TFIDF)
51. Cut Once Screenshots
[Screenshot: main window after installation]
52. Cut Once Screenshots
[Screenshot]
53. Cut Once Screenshots
[Screenshot: logged Cut Once usage - time, confidence, and rank position of clicked recommendations; baseline ranking method]
54. User Study Description
- 4-week-long study; most subjects from Pittsburgh
- After 1 week, qualified users were invited to continue; 20% of the compensation was paid after 1 week
- After 4 weeks, users were fully compensated (plus a final questionnaire)
- 26 subjects finished the study
  - 4 female and 22 male; median age 28.5
  - Total of 2315 sent messages; an average of 113 address book entries
  - Mostly students; a few sys admins, 1 professor, 2 staff members
- Subjects were randomly assigned to the two different ranking methods: TFIDF and MRR
55. Recipient Suggestions
- 17 subjects used the recommendation functionality (in 5.28% of their sent messages)
- Average of 1 accepted suggestion per 24.37 sent messages
56. Comparison of Ranking Methods
- MRR better than TFIDF
  - Average rank: 3.14 versus 3.69
  - Rank quality: 3.51 versus 3.43
- The difference is not statistically significant
  - Rough estimate: a factor of 5.5 more data would be needed, i.e., 5.5 x 4 = 22 weeks of user study, or 5.5 x 26 = 143 subjects for 4 weeks
[Chart: distribution of clicked rank]
57. Results: Leak Detection
- 18 subjects used the leak-deletion feature (in 2.75% of their sent messages)
- The most frequently reported use was to clean up the addressee list
  - Removing unwanted people (inserted by Reply-all)
  - Removing themselves (automatically added)
- 5 real leaks were reported, from 4 different users
  - These users did not use Cut Once to remove the leaks: they clicked the cancel button and removed the address manually
  - Reasons: uncomfortable or unfamiliar with the interface; under pressure because of the 10-second timer
58. Results: Leak Detection
- 5 leaks from 4 users
  - Network admin: two users with similar userIDs
  - System admin: wrong auto-completion in 2 or 3 situations
  - Undergrad: two acquaintances with similar names
  - Grad student: reply-all case
- Correlations
  - 2 users used TFIDF, 2 used MRR
  - No significant correlation with the size of the Address Book or the number of sent messages
  - Correlation with non-student occupations (95% confidence)
- Estimate: 1 leak every 463 sent messages
  - Assuming a binomial distribution with p = 5/2315, then 1066 messages are required to send at least one leak (with 90% confidence); see the worked check below
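A quick check of the estimate above, assuming each sent message independently leaks with probability p = 5/2315: the smallest n with 1 - (1 - p)^n >= 0.9 comes out near the slide's figure.

```python
import math

p = 5 / 2315                                    # estimated per-message leak probability (about 1 in 463)
n = math.ceil(math.log(0.1) / math.log(1 - p))  # smallest n with 1 - (1 - p)**n >= 0.9
print(n)                                        # 1065, close to the ~1066 quoted on the slide
```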
59. Final Questionnaire
[Chart: questionnaire responses (higher is better)]
60. Frequent Complaints
[Chart: complaints grouped into "Training and Optimization" and "Interface"]
61. Final Questionnaire
- Compose-then-address instead of address-then-compose behavior
62. Conclusions
- Email acts
  - A taxonomy of intentions in email communication
  - Categorization can be automated
- Addressed two types of email addressing mistakes
  - Email leaks (accidentally adding non-intended recipients)
  - Recipient recommendation (forgetting intended recipients)
  - Framed both as supervised learning problems
  - Introduced new methods for leak detection
  - Proposed several models for recipient recommendation, including combinations of base methods
- Proposed a new general-purpose ranking algorithm
  - Robust to outliers; outperformed state-of-the-art rankers on the recipient recommendation task (and other ranking tasks)
- User study using a Mozilla Thunderbird extension
  - Caught 5 real leaks and showed reasonably good prediction quality
  - Showed clear potential to be adopted by a large number of email users
63. Proof: non-existence of a better advisor
Given the finite set of good advisors An [equations not transcribed]
64. Proof: non-existence of a better advisor
Given the finite set of advisors An
Assuming [equations not transcribed]
Q.E.D.
65. Thank you.