SIGIR, August 2005, Salvador, Brazil On the Collective Classification of Email




1
SIGIR, August 2005, Salvador, Brazil
On the Collective Classification of Email Speech Acts
Vitor R. Carvalho and William W. Cohen
Carnegie Mellon University
2
Outline
  1. Email Speech Acts and Applications
  2. Sequential Nature of Negotiations
  3. Collective Classification and Results

3
Classifying Email into Acts [Cohen, Carvalho &
Mitchell, EMNLP-04]
  • An Act is a verb-noun pair (e.g., propose
    meeting)
  • One single email message may contain multiple
    acts. Not all pairs make sense.
  • Try to describe commonly observed behaviors,
    rather than all possible speech acts.
  • Also include non-linguistic usage of email
    (delivery of files)
  • Most of the acts can be learned (EMNLP-04)

[Figure: taxonomy of email acts, split into Verbs and Nouns]
4
Email Acts - Applications
  • Email overload → improved email clients.
  • Negotiating/managing shared tasks is a central
    use of email
  • Tracking commitments, delegations, pending
    answers
  • Integrating to-do/task lists into email, etc.
  • Iterative Learning of Email Tasks and Speech Acts
    [Kushmerick & Khoussainov, 2005]
  • Predicting Social Roles and Group Leadership
    [Leuski, 2004; Carvalho et al., in progress]

5
Idea: Predicting Acts from Surrounding Acts
Example of Email Thread Sequence
  • Strong correlation with the acts of the previous
    and next messages

  • An act has little or no correlation with other acts
    of the same message

[Figure: example email thread; messages are linked by <<In-ReplyTo>> and labeled with acts such as Delivery, Request, Proposal, and Commit]
6
Related work on the Sequential Nature of
Negotiations
  • [Winograd and Flores, 1986] Conversation for
    Action Structure
  • [Murakoshi et al., 1999] Construction of
    Deliberation Structure in Email

7
Related work on the Sequential Nature of
Negotiations
  • [Kushmerick & Lau, 2005] Learning the structure
    of interactions between buyers and e-commerce
    vendors

8
Data: CSPACE Corpus
  • Few large, free, natural email corpora are
    available
  • CSPACE corpus (Kraut & Fussell)
  • Emails associated with a semester-long project
    for Carnegie Mellon MBA students in 1997
  • 15,000 messages from 277 students, divided into 50
    teams (4 to 6 students/team)
  • Rich in task negotiation.
  • 1500 messages (4 teams) had their Speech Acts
    labeled.
  • One of the teams was double labeled, and the
    inter-annotator agreement ranges from 0.72 to 0.83
    (Kappa) for the most frequent acts.

9
Evidence of Sequential Correlation of Acts
  • Transition diagram for most common verbs from
    CSPACE corpus
  • It is NOT a Probabilistic DFA
  • Act sequence patterns: (Request, Deliver),
    (Propose, Commit, Deliver), (Propose,
    Deliver); the most common act was Deliver
  • Less regularity than expected (considering
    previous deterministic negotiation state diagrams)
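
A transition diagram like the one described above can be estimated by counting which acts appear in a reply given the acts of the message it answers. The following is a minimal sketch, assuming the thread data is available as (parent acts, child acts) pairs; the function name and the toy pairs are hypothetical and are not taken from the CSPACE corpus.

```python
from collections import Counter, defaultdict

def transition_probabilities(pairs):
    """Estimate P(child act | parent act) from (parent_acts, child_acts) pairs.

    A message may carry several acts, so every parent-act/child-act
    combination in a pair is counted once. Rows are then normalized.
    """
    counts = defaultdict(Counter)
    for parent_acts, child_acts in pairs:
        for p in parent_acts:
            for c in child_acts:
                counts[p][c] += 1
    probs = {}
    for p, row in counts.items():
        total = sum(row.values())
        probs[p] = {c: n / total for c, n in row.items()}
    return probs

# Toy, made-up pairs for illustration only.
toy_pairs = [
    ({"Request"}, {"Deliver"}),
    ({"Request"}, {"Deliver"}),
    ({"Request"}, {"Commit"}),
    ({"Propose"}, {"Commit"}),
    ({"Commit"}, {"Deliver"}),
]
probs = transition_probabilities(toy_pairs)
```

Because one message can hold several acts at once, the resulting table describes co-occurrence across linked messages rather than a single state machine, which is consistent with the remark that the diagram is not a probabilistic DFA.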

10
Content versus Context
  • Content: bag-of-words features only
  • Context: parent and child features only (see table
    below)
  • 8 MaxEnt classifiers, trained on the 3F2 team
    dataset and tested on the 1F3 team dataset
  • Only the 1st child message was considered (vast
    majority: more than 95%)

[Figure: a message with unknown acts (???) between its parent message and its child message; example labels: Request, Proposal, Delivery, Commit]

Set of Context Features (Relational)
Parent Boolean Features: Parent_Request, Parent_Deliver, Parent_Commit, Parent_Propose, Parent_Directive, Parent_Commissive, Parent_Meeting, Parent_dData
Child Boolean Features: Child_Request, Child_Deliver, Child_Commit, Child_Propose, Child_Directive, Child_Commissive, Child_Meeting, Child_dData

Kappa Values on 1F3 using Relational (Context)
features and Textual (Content) features.
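
The context feature set listed on this slide can be built mechanically from the acts of a message's parent and first child. The sketch below is one possible way to do that, assuming each neighbor's acts are available as a set of strings; the helper name `context_features` is hypothetical, and only the feature names come from the slide.

```python
# Act names as they appear in the slide's feature list.
ACTS = ["Request", "Deliver", "Commit", "Propose",
        "Directive", "Commissive", "Meeting", "dData"]

def context_features(parent_acts, child_acts):
    """Return the 16 boolean context features for one message.

    parent_acts / child_acts: sets of act names for the parent message and
    the first child message. Pass an empty set when a neighbor is missing.
    """
    features = {}
    for act in ACTS:
        features["Parent_" + act] = act in parent_acts
        features["Child_" + act] = act in child_acts
    return features

# Example: the parent is a Request, the child delivers data.
feats = context_features({"Request"}, {"Deliver", "dData"})
```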
11
Dependency Network
  • Dependency networks are probabilistic graphical
    models in which the full joint distribution of
    the network is approximated with a set of
    conditional distributions that can be learned
    independently. The conditional probability
    distributions in a DN are calculated for each
    node given its neighboring nodes (its Markov
    blanket).
  • Approximate inference (Gibbs sampling)
  • Markov blanket: parent message and child
    message
  • [Heckerman et al., JMLR-2000], [Neville & Jensen,
    KDD-MRDM-2003]
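
To make the inference step concrete, here is a simplified Gibbs sampling sketch for a single binary act. It is not the procedure from the talk: it assumes a content-only classifier already supplies a log-odds score per message, and it replaces the learned conditional model with a hypothetical logistic form whose weights `w_parent` and `w_child` act on the current labels of the parent and first child.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gibbs_collective(content_score, parent, child, w_parent, w_child,
                     n_iter=200, burn_in=50, seed=0):
    """Estimate P(act present) for each message, for one binary act.

    content_score[i]: log-odds from a content-only classifier (assumed given).
    parent[i], child[i]: index of message i's parent / first child, or None.
    w_parent, w_child: hypothetical weights on the neighbors' current labels.
    """
    rng = random.Random(seed)
    n = len(content_score)
    # Initialize labels from the content-only scores.
    labels = [1 if s > 0 else 0 for s in content_score]
    totals = [0] * n
    for it in range(n_iter):
        for i in range(n):
            z = content_score[i]
            if parent[i] is not None:
                z += w_parent * labels[parent[i]]
            if child[i] is not None:
                z += w_child * labels[child[i]]
            # Resample message i given its Markov blanket (parent and child).
            labels[i] = 1 if rng.random() < sigmoid(z) else 0
        if it >= burn_in:
            for i in range(n):
                totals[i] += labels[i]
    kept = n_iter - burn_in
    return [t / kept for t in totals]

# Toy thread of three messages: 0 -> 1 -> 2 (made-up scores).
marginals = gibbs_collective(
    content_score=[10.0, -10.0, 0.0],
    parent=[None, 0, 1],
    child=[1, 2, None],
    w_parent=2.0,
    w_child=2.0,
)
```

The averaged samples after burn-in approximate each message's marginal probability; a full version would run one such conditional model per act and let each model see all of the neighbors' acts.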

12
Collective Classification Procedure (based on
Dependency Networks Model)
13
Improvement over Content-only baseline
Kappa often improves after iteration
Kappa unchanged for deliver
14
Leave-one-team-out Experiments
Kappa Values
  • 4 teams
  • 1f3 (170 msgs)
  • 2f2 (137 msgs)
  • 3f2 (249 msgs)
  • 4f4 (165 msgs)
  • (x-axis) Bag-of-words only
  • (y-axis) Collective classification results
  • Different teams present different styles for
    negotiations and task delegation.
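
The leave-one-team-out protocol holds out each team in turn and trains on the remaining three. A minimal sketch of the split logic follows, assuming messages are grouped by team in a dictionary; the function name is hypothetical, and the message ids are placeholders (only the team names and sizes come from the slide).

```python
def leave_one_team_out(messages_by_team):
    """Yield (held_out_team, train_messages, test_messages) for each team."""
    for held_out in messages_by_team:
        train = [m for team, msgs in messages_by_team.items()
                 if team != held_out for m in msgs]
        test = list(messages_by_team[held_out])
        yield held_out, train, test

# Placeholder message ids; only the team sizes come from the slide.
teams = {
    "1f3": list(range(170)),
    "2f2": list(range(137)),
    "3f2": list(range(249)),
    "4f4": list(range(165)),
}
splits = list(leave_one_team_out(teams))
```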

15
Leave-one-team-out Experiments
Kappa Values
  • Consistent improvement of Commissive, Commit and
    Meet acts

16
Leave-one-team-out Experiments
  • Deliver and dData performance usually decreases
  • Associated with data distribution, FYI, file
    sharing, etc.
  • For non-delivery acts, the improvement in avg. Kappa
    is statistically significant (p=0.01 on a two-tailed
    T-test)

Kappa Values
17
Act by Act Comparative Results
Kappa values with and without collective
classification, averaged over the four test sets
in the leave-one-team-out experiment.
18
Conclusion
  • Sequential patterns of email acts were studied in
    the CSPACE corpus. Less regularity than expected.
  • We proposed a collective classification procedure
    for Email Speech Acts based on a Dependency Net
    model.
  • Modest improvements over the baseline on acts
    related to negotiation (Request, Commit, Propose,
    Meet, etc.). No improvement, or a deterioration, was
    observed for Deliver/dData (acts less associated
    with negotiations).
  • The degree of linkage in our dataset is small, which
    makes the observed results encouraging.

19
Thank you!
21
Inter-Annotator Agreement
  • Kappa Statistic
  • A: probability of agreement in a category
  • R: probability of agreement for 2 annotators labeling
    at random
  • Kappa range: [-1, 1]
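
With A and R as defined above, the usual form of the statistic is Kappa = (A - R) / (1 - R). The sketch below computes it for two annotators, assuming Cohen's variant in which R is estimated from each annotator's own label frequencies; the slide does not say which variant was used, and the function name is hypothetical.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # A: observed agreement.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # R: agreement expected if each annotator labeled at random
    # according to their own label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if chance == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - chance) / (1.0 - chance)

# Toy binary labels (1 = act present) for illustration only.
perfect = cohen_kappa([1, 1, 0, 0], [1, 1, 0, 0])
chance_level = cohen_kappa([1, 1, 0, 0], [1, 0, 1, 0])
opposite = cohen_kappa([1, 1, 0, 0], [0, 0, 1, 1])
```

The three toy cases cover the ends and the midpoint of the range: full agreement, agreement no better than chance, and complete disagreement.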

Inter-Annotator Agreement
Email Act   Kappa
Deliver     0.75
Commit      0.72
Request     0.81
Amend       0.83
Propose     0.72