Title: An Email and Meeting Assistant Using Graph Walks
1An Email and Meeting Assistant Using Graph Walks
Einat Minkov, William W. Cohen
CEAS-2006
2Documents and Links
- PageRank (Brin and Page, 98), HITS (Kleinberg, 98)
- Co-training (Blum and Mitchell)
- Documents are not isolated objects: they are connected to other documents via hyperlinks
- Document similarity/relatedness via random graph walk
3Structured Documents
- In structured data, documents are inter-connected via other common objects.
- Email and meeting entries are examples of structured data: text + meta-data
Framework
- Represent email and meetings as a joint graph
- Derive extended similarity measures between graph objects using lazy graph walks.
Questions we can ask
- Show me recent messages relevant to this message
- What is the full name of the "Danny" mentioned in this message?
4Email as a Graph
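[Figure: one email message (file1) drawn as a graph. Node labels: file1; 1.22.00; Chris; Melissa Germany; Chris.germany_at_enron.com; Mgermany_at_ch2m.com; terms work, where, yo, Im, you. Edge labels: sent_from, sent_from_email, sent_to, sent_to_email, alias, On_date, has_subj_term, has_term.]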
5Email as a Graph
- A directed graph
- A node carries an entity type
- An edge carries a relation type
- Edges are bi-directional: each edge has an inverse edge, so the graph is cyclic
- Nodes inter-connect via linked entities.
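The properties listed above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `TypedGraph` class, the `add_email` helper, the `_inv` suffix for inverse edges, the edge directions, and the second message `file2` are all assumptions made for the example. Only the node and edge label names are taken from the slides.

```python
from collections import defaultdict


class TypedGraph:
    """Directed graph with typed nodes and labeled edges (illustrative only)."""

    def __init__(self):
        self.node_type = {}  # node -> entity type
        # node -> edge label -> set of target nodes
        self.out_edges = defaultdict(lambda: defaultdict(set))

    def add_node(self, node, ntype):
        self.node_type[node] = ntype

    def add_edge(self, src, label, dst):
        # Store the relation and its inverse so every link can be walked
        # in both directions (the "_inv" naming is an assumption).
        self.out_edges[src][label].add(dst)
        self.out_edges[dst][label + "_inv"].add(src)


def add_email(graph, file_id, sender, sender_addr, date, terms):
    """Add one message and the entities it mentions (hypothetical helper)."""
    graph.add_node(file_id, "file")
    graph.add_node(sender, "person")
    graph.add_node(sender_addr, "email_address")
    graph.add_node(date, "date")
    graph.add_edge(file_id, "sent_from", sender)
    graph.add_edge(file_id, "sent_from_email", sender_addr)
    graph.add_edge(file_id, "on_date", date)
    for term in terms:
        graph.add_node(term, "term")
        graph.add_edge(file_id, "has_term", term)


g = TypedGraph()
add_email(g, "file1", "Chris", "Chris.germany_at_enron.com", "1.22.00",
          ["work", "where"])
# file2 is made up, to show two messages linked through shared entities.
add_email(g, "file2", "Chris", "Chris.germany_at_enron.com", "1.23.00",
          ["work"])
```

Because both messages share the person node Chris and the term node work, the two file nodes become connected through those entities, which is what "nodes inter-connect via linked entities" refers to.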
6Meetings
- Like email messages, meeting entries are structured.
- They share entities with email
- Email and meetings can be naturally represented as a joint graph.
[Figure labels: TIME, TEXT, PERSONS]
7The Joint Graph
[Figure: the joint graph. Labels: node x, Shared content, Social network, Timeline]
8Edge Weights
- Graph G
  - nodes x, y, z
  - node types T(x), T(y), T(z)
  - edge labels, with parameters
- Edge weight x → y
- Probability distribution over the outgoing edges of x:
  a. Pick an outgoing edge label
  b. Pick node y uniformly among the nodes reachable over that label
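The two-step choice above can be written out as a small function. This is a sketch under stated assumptions: the slide's formula did not survive extraction, so `label_weight` is a hypothetical stand-in for the edge-label parameters, normalized here over the labels that actually leave x. The function name and the toy data are illustrative.

```python
def transition_probs(out_edges, x, label_weight):
    """Return {y: Pr(x -> y)} for the two-step choice: label, then node.

    out_edges:    {node: {label: [neighbor, ...]}}
    label_weight: {label: non-negative weight} (assumed parameterization)
    """
    labels = out_edges.get(x, {})
    total = sum(label_weight[label] for label in labels)
    if total == 0:
        return {}  # no outgoing edges, or all label weights are zero
    probs = {}
    for label, neighbors in labels.items():
        p_label = label_weight[label] / total  # step a: pick an edge label
        for y in neighbors:
            # step b: pick y uniformly among nodes reachable over that label
            probs[y] = probs.get(y, 0.0) + p_label / len(neighbors)
    return probs


# Toy data: one message with one sender and four terms.
out_edges = {
    "file1": {
        "sent_from": ["Chris"],
        "has_term": ["work", "where", "you", "yo"],
    }
}
uniform = {"sent_from": 1.0, "has_term": 1.0}
probs = transition_probs(out_edges, "file1", uniform)
```

With uniform label weights the single sender receives half of the probability mass and the four terms split the other half, so a node reached over a rare label is weighted more heavily than one reached over a crowded label.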
9Graph Similarity
- Defined by lazy graph walks over k steps.
- Given:
  - Stay probability (larger values favor shorter paths)
  - A transition matrix
  - Initial node distribution
- Result: output node distribution
We use this platform to SEARCH for related items in the graph: a query is an initial distribution Vq over nodes and a desired output type Tout.
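A lazy walk and the search built on it can be sketched as follows. The update rule used here (keep a fraction `gamma` of the mass in place, move the rest along the transition probabilities) is a common form of lazy walk and is an assumption, since the slide's own formula was lost. The function names, the toy graph, and the value `gamma=0.5` are illustrative.

```python
def lazy_walk(transitions, v0, gamma, k):
    """Run k lazy steps: stay with probability gamma, else follow an edge."""
    v = dict(v0)
    for _ in range(k):
        nxt = {}
        for x, mass in v.items():
            nxt[x] = nxt.get(x, 0.0) + gamma * mass
            for y, p in transitions.get(x, {}).items():
                nxt[y] = nxt.get(y, 0.0) + (1.0 - gamma) * mass * p
        v = nxt
    return v


def search(transitions, node_type, v_q, t_out, gamma=0.5, k=3):
    """Rank nodes of type t_out by their probability after the walk."""
    v = lazy_walk(transitions, v_q, gamma, k)
    hits = [(n, s) for n, s in v.items()
            if node_type.get(n) == t_out and s > 0]
    return sorted(hits, key=lambda item: -item[1])


# Toy joint graph (made up): a meeting shares the term "budget" with file1.
node_type = {
    "m1": "meeting", "budget": "term", "lunch": "term",
    "file1": "file", "file2": "file",
    "a_at_x.com": "email_address", "b_at_x.com": "email_address",
}
transitions = {
    "m1": {"budget": 1.0},
    "budget": {"m1": 0.5, "file1": 0.5},
    "file1": {"budget": 0.5, "a_at_x.com": 0.5},
    "a_at_x.com": {"file1": 1.0},
    "lunch": {"file2": 1.0},
    "file2": {"lunch": 0.5, "b_at_x.com": 0.5},
    "b_at_x.com": {"file2": 1.0},
}
results = search(transitions, node_type, {"m1": 1.0}, "email_address")
```

In this toy graph only a_at_x.com is reachable from the meeting within three steps (meeting, term, file, address), so it is the only result. A larger `gamma` keeps more mass near the query nodes, which is the sense in which larger values favor shorter paths.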
10Evaluation
Many tasks/applications can be phrased as search queries in this framework.
TASK I: Find Meeting Attendees
- Given a meeting's text + date
- Retrieve a ranked list of relevant email addresses (potential attendees)
TASK II: Find Email Aliases
- Given a person's name
- Retrieve a ranked list of his/her email addresses
11Methods
Corpus
- 346 email files (Meetings folder)
- 334 meeting entries (Palm)
- Both over the same time span (about 6 months)
- The joint graph includes 3,680 nodes
Baseline: string matching
- Uses a distance metric (Jaro-Winkler)
- Finds email addresses similar to the personal / project names mentioned.
Graph walk
- 3 steps
- Uniform weights
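For reference, the string-matching baseline can be sketched with a plain-Python Jaro-Winkler similarity. The slides do not say which implementation or settings were used, so the prefix scale of 0.1, the 4-character prefix cap, the `rank_addresses` helper, and the choice to compare a name against the part of the address before `_at_` are all assumptions.

```python
def jaro(s, t):
    """Jaro similarity in [0, 1]."""
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_hit = [False] * len(s)
    t_hit = [False] * len(t)
    matches = 0
    for i, ch in enumerate(s):
        lo = max(0, i - window)
        hi = min(len(t), i + window + 1)
        for j in range(lo, hi):
            if not t_hit[j] and t[j] == ch:
                s_hit[i] = t_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that line up in a different order are transpositions.
    s_seq = [c for c, hit in zip(s, s_hit) if hit]
    t_seq = [c for c, hit in zip(t, t_hit) if hit]
    transpositions = sum(a != b for a, b in zip(s_seq, t_seq)) // 2
    m = float(matches)
    return (m / len(s) + m / len(t) + (m - transpositions) / m) / 3.0


def jaro_winkler(s, t, prefix_scale=0.1, max_prefix=4):
    """Jaro similarity boosted for a shared prefix (assumed standard settings)."""
    base = jaro(s, t)
    prefix = 0
    for a, b in zip(s, t):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1.0 - base)


def rank_addresses(name, addresses):
    """Rank addresses by similarity of their local part to the name."""
    scored = [(addr, jaro_winkler(name.lower(), addr.lower().split("_at_")[0]))
              for addr in addresses]
    return sorted(scored, key=lambda item: -item[1])


ranked = rank_addresses(
    "Chris", ["Mgermany_at_ch2m.com", "Chris.germany_at_enron.com"])
```

Here the address whose local part starts with "chris" ranks first. A baseline like this sees only surface similarity, so it cannot find an address that shares no characters with the name.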
12Results: Find Meeting Attendees
A. All email addresses
- 11-point precision-recall curve, averaged over 13 examples
B. One address per person
[Figures: precision-recall curves for panels A and B. Residual diagram labels: meeting, term, date, file, e-address]
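An 11-point precision-recall curve can be computed as sketched below. The slides do not state the interpolation rule, so this assumes the usual one, where the precision at a recall level is the best precision achieved at that recall or higher. The function names and the toy rankings are illustrative, not data from the evaluation.

```python
def eleven_point_precision(ranked, relevant):
    """Interpolated precision at recall 0.0, 0.1, ..., 1.0 for one query."""
    relevant = set(relevant)
    points = []  # (recall, precision) measured at each relevant hit
    hits = 0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    curve = []
    for i in range(11):
        level = i / 10
        # Best precision at any recall >= level (small tolerance for floats).
        candidates = [p for r, p in points if r >= level - 1e-12]
        curve.append(max(candidates) if candidates else 0.0)
    return curve


def average_curves(curves):
    """Average several 11-point curves level by level."""
    return [sum(col) / len(col) for col in zip(*curves)]


# Two made-up queries: ranked result lists and their relevant items.
curve_1 = eleven_point_precision(["a", "x", "b", "y"], {"a", "b"})
curve_2 = eleven_point_precision(["x", "a"], {"a"})
mean_curve = average_curves([curve_1, curve_2])
```

Averaging the per-query curves level by level yields a single curve of the kind reported on this slide.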
13Results: Find Email Aliases
A. By first name
- 14 examples (2 to 5 email aliases each)
B. By full name
[Figures: results for panels A and B. Residual diagram labels: term, file, person, e-address]
14Summary
- A joint representation of email and meetings
- Denser links
- Augments social network information
- Supports meeting management applications
- Preliminary results are promising.
- Application of learning, and more results for email-related tasks, are available in: "Contextual Search and Name Disambiguation in Email Using Graphs", Einat Minkov, William W. Cohen, Andrew Y. Ng, SIGIR 2006
15Thanks! Questions?