Collective Classification: A brief overview and possible connections to email-acts classification

Slides: 14
Provided by: sarahej
Learn more at: http://www.cs.cmu.edu
Transcript and Presenter's Notes


1
Collective Classification: A brief overview and
possible connections to email-acts classification
  • Vitor R. Carvalho
  • Text Learning Group Meetings,
  • Carnegie Mellon University
  • November 10th, 2004

2
Data Representation
  • Flat Data
    • Object: email msgs
    • Attributes: words, sender, etc.
    • Class: spam/not spam
    • Usually assumed IID
  • Sequential Data
    • Object: words in text
    • Attributes: capitalized, number, dict
    • Class: POS (or name/not)
  • Relational Data
    • class, attributes
    • links (relations)
    • Example: webpages
[Figure: example class labels: spam / not spam for email messages; pron, name, det, name, verb for words in text]
3
J. Neville et al., 2003
4
Relational Data and Collective Classification
  • Different objects interact
  • Different types of relations (links)
  • Attributes may be correlated
  • Examples
    • actors, directors, movies, companies
    • papers, authors, conferences, citations
    • company, employee, customer, etc.

Classify objects collectively: use predictions on some
objects to improve predictions on related objects
5
Collective Classification Methods
  • Relational Probability Trees (RPT)
  • Iterative methods (Relaxation-based Methods)
  • Relational Dependency Networks (RDN)
  • Relational Bayesian Networks (RBN/PRM)
  • Relational Markov Networks (RMN)
  • Other models (ILP based, Vector Space based,
    etc.)
  • Overall
    • Lack of direct comparison among methods
    • Results are usually compared to a flat model
    • Splitting data into train/test sets can be an
      issue

6
Relational Probability Trees
  • Decision Trees applied to Relational data
  • Predicts the target class label based on
    • attributes of the same object
    • attributes and links in the relational
      neighborhood (one link away)
    • counts of attributes and links in the
      neighborhood
  • Enhanced feature selection (Chi-square, pruning,
    randomization tests)
  • Results were not exciting
  • Neville et al., KDD-2003; related work from
    Blockeel et al. (Artificial Intelligence, 1998)
    and Kramer, AAAI-96
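
The kinds of inputs listed on this slide (own attributes, plus counts over objects one link away) can be illustrated with a small Python sketch. This is not the RPT algorithm, only a hypothetical feature builder of the sort a tree learner could consume. The function name `neighborhood_features`, the toy pages, and the `topic` attribute are all assumptions made for illustration.

```python
from collections import Counter

def neighborhood_features(node, attrs, links):
    """Build a flat feature dict for `node` from its own attributes
    and from counts over the objects exactly one link away."""
    features = {f"self.{k}": v for k, v in attrs[node].items()}
    neighbors = links.get(node, [])
    features["degree"] = len(neighbors)  # count of links
    # Count how often each (attribute, value) pair occurs among neighbors.
    counts = Counter((k, v) for n in neighbors for k, v in attrs[n].items())
    for (k, v), c in counts.items():
        features[f"nbr.count({k}={v})"] = c
    return features

# Hypothetical toy data: three web pages with one attribute each.
attrs = {
    "p1": {"topic": "ml"},
    "p2": {"topic": "ml"},
    "p3": {"topic": "db"},
}
links = {"p1": ["p2", "p3"], "p2": ["p1"], "p3": ["p1"]}

feats = neighborhood_features("p1", attrs, links)
```

The resulting dict mixes intrinsic and aggregated features, so any flat learner could be trained on it; the relational structure enters only through the aggregation step.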

7
Iterative Methods
  • Predicts the target class label based on
    • Attributes of the same object
    • Attributes and links of relational neighborhood
    • CLASS LABELS of the neighborhood
    • Features derived from CLASS LABELS
  • Different update strategies
    • By threshold in prediction confidence
    • By top-N most confident predictions
    • Heuristic-based
  • Slattery & Mitchell, ICML-2000; Neville & Jensen,
    AAAI-2000; Chakrabarti et al., ACM-SIGMOD-98
  • Some results with Email-acts
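
A minimal Python sketch of the threshold-based update strategy follows. It is not taken from any of the cited papers: the equal 0.5/0.5 mix of the flat-model probability and the neighbor vote, the function name `iterative_classify`, and the toy graph are assumptions chosen to keep the example short.

```python
from collections import Counter

def iterative_classify(priors, links, known, threshold=0.6, max_iters=10):
    """priors: node -> {label: prob} from a flat (attribute-only) model.
    links: node -> list of neighbor nodes.
    known: node -> label for nodes that are already labeled.
    Returns node -> label for every node in `priors`."""
    labels = dict(known)
    for _ in range(max_iters):
        changed = False
        for node in priors:
            if node in labels:
                continue
            # Votes from neighbors whose labels are already committed.
            votes = Counter(labels[n] for n in links.get(node, []) if n in labels)
            total = sum(votes.values())
            scores = {}
            for lab, p in priors[node].items():
                nbr = votes[lab] / total if total else p
                scores[lab] = 0.5 * p + 0.5 * nbr  # assumed mixing rule
            best = max(scores, key=scores.get)
            # Commit only predictions that clear the confidence threshold.
            if scores[best] >= threshold:
                labels[node] = best
                changed = True
        if not changed:
            break
    # Nodes that never cleared the threshold fall back to the flat model.
    for node in priors:
        labels.setdefault(node, max(priors[node], key=priors[node].get))
    return labels

# Hypothetical chain a - b - c; the flat model alone would label c as "ham".
priors = {
    "a": {"spam": 0.9, "ham": 0.1},
    "b": {"spam": 0.5, "ham": 0.5},
    "c": {"spam": 0.45, "ham": 0.55},
}
links = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
result = iterative_classify(priors, links, known={})
```

In this toy run the confident prediction for `a` is committed first, which then pulls `b` and `c` toward the same label. A top-N strategy would differ only in how the set of committed predictions is chosen each round.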

8
Relational Bayesian Networks (RBN/PRM)
  • Bayes Net extended to Relational domain
  • Given an instantiation, it induces a Bayes net
    that specifies a joint probability distribution
    over all attributes of all entities
  • Directed graphical model, with acyclicity
    constraint
  • Exact model: closed form for parameter
    estimation (products of conditional
    probabilities)
  • Was applied only to simple domains, since the
    acyclicity constraint is too restrictive for
    most relational applications
  • Friedman et al., IJCAI-99; Getoor et al.,
    ICML-2001; Taskar et al., IJCAI-2001
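
The "products of conditional probabilities" and "closed form" points can be written out as follows. The notation is an assumption, not taken from the slides: Pa(x_i) denotes the parents of attribute x_i in the induced Bayes net, and N(.) counts occurrences in fully observed training data, pooled over all objects that share the same conditional probability table.

```latex
% Joint distribution of the induced Bayes net
P(\mathbf{x}) = \prod_{i} P\bigl(x_i \mid \mathrm{Pa}(x_i)\bigr)

% Maximum-likelihood estimate of one table entry (complete data, tabular CPDs)
\hat{\theta}_{x \mid u} = \frac{N(x, u)}{N(u)}
```

The product is a valid joint distribution only when the induced graph is acyclic, which is why the acyclicity constraint matters.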

9
Relational Markov Networks (RMN)
  • Extension of CRF idea to Relational Domain
  • Given an instantiation, it induces a Markov
    Network that specifies a probability distribution
    of labels, given links and attributes
  • Undirected, Discriminative model
  • Parameter estimation is expensive, requires
    approximate probabilistic inference (belief
    propagation)
  • Taskar et al., UAI-2002
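
For reference, the general form of a conditional Markov network is sketched below. The notation is an assumption: y are the labels, x the observed attributes and links, C the set of cliques in the induced network, and phi_c a non-negative clique potential.

```latex
P(\mathbf{y} \mid \mathbf{x})
  = \frac{1}{Z(\mathbf{x})} \prod_{c \in \mathcal{C}} \phi_c(\mathbf{x}_c, \mathbf{y}_c),
\qquad
Z(\mathbf{x}) = \sum_{\mathbf{y}'} \prod_{c \in \mathcal{C}} \phi_c(\mathbf{x}_c, \mathbf{y}'_c)
```

The normalizer Z(x) sums over all joint labelings, which is the source of the cost noted on this slide and the reason approximate inference is used during training.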

10
Relational Dependency Networks (RDN)
  • Dependency Networks extended to Relational
    domain
  • $P(X) \approx \prod_i P(X_i \mid \mathrm{Neighbor}(X_i))$
  • Given an instantiation, it induces a DN that
    specifies an approximate joint probability
    distribution over all attributes of all objects
  • Undirected graphical model, no acyclicity
    constraint
  • Approximate model: simple parameter estimation,
    approximate inference (Gibbs sampling)
  • Neville & Jensen, KDD-MRDM-2003
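
Gibbs sampling over per-variable conditionals can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the RDN implementation: the logistic conditional in `conditional_p1`, its `weight`, and the three-node chain are hypothetical stand-ins for learned local models.

```python
import math
import random

def conditional_p1(node, state, links, weight=1.5):
    """Hypothetical local model: P(X_node = 1 | neighbors) as a logistic
    function of the signed sum of neighbor values (1 -> +1, 0 -> -1)."""
    s = sum(1 if state[n] == 1 else -1 for n in links[node])
    return 1.0 / (1.0 + math.exp(-weight * s))

def gibbs_marginals(links, evidence, n_samples=2000, burn_in=200, seed=0):
    """Estimate P(X_i = 1) for every non-evidence node by repeatedly
    resampling each node from its conditional given its neighbors."""
    rng = random.Random(seed)
    state = {n: evidence.get(n, rng.choice([0, 1])) for n in links}
    free = [n for n in links if n not in evidence]
    counts = {n: 0 for n in free}
    for it in range(burn_in + n_samples):
        for n in free:
            state[n] = 1 if rng.random() < conditional_p1(n, state, links) else 0
        if it >= burn_in:  # discard the burn-in sweeps
            for n in free:
                counts[n] += state[n]
    return {n: counts[n] / n_samples for n in free}

# Hypothetical chain e - b - c with node e observed as 1.
links = {"e": ["b"], "b": ["e", "c"], "c": ["b"]}
marginals = gibbs_marginals(links, evidence={"e": 1})
```

Each conditional is estimated and sampled independently, which keeps training simple, but nothing forces the conditionals to be consistent with a single joint distribution; that is the sense in which the model is approximate.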

11
Other Models
From Neville et al., 2003
12
Comparing Some Results
  • Comparing PRM, RMN, SVM and M3N
  • Diff PRM and RMN
  • Diff mSVM and RMN
  • RN (Relational Neighbor) is a very simple
    Relational Classifier
  • RN (Macskassy et al., 2003)
  • M3N (Taskar et al., 2003)
[Figures: result panels labeled PRM and RMN]
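
To show how simple the Relational Neighbor idea is, here is a minimal Python sketch: the class distribution of a node is estimated purely from the (weighted) labels of its labeled neighbors. The function name, the data layout, and the toy weights are assumptions for illustration.

```python
def relational_neighbor(node, links, labels):
    """links: node -> {neighbor: edge weight}; labels: node -> known label.
    Returns {label: probability} from the weighted labels of the node's
    labeled neighbors, or {} if none of its neighbors is labeled."""
    totals = {}
    for nbr, w in links.get(node, {}).items():
        if nbr in labels:
            totals[labels[nbr]] = totals.get(labels[nbr], 0.0) + w
    z = sum(totals.values())  # normalizer over labeled neighbors
    return {lab: t / z for lab, t in totals.items()} if z else {}

# Hypothetical node "x" with three labeled neighbors.
links = {"x": {"a": 1.0, "b": 1.0, "c": 2.0}}
labels = {"a": "spam", "b": "ham", "c": "spam"}
probs = relational_neighbor("x", links, labels)
```

No attributes of the node itself are used, so the classifier works only when linked objects tend to share a class.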
13
End of overview; now, the email-act problem
  • Strong correlation with previous and next message
[Figure: message thread labeled with email acts: Commit, Proposal, Request, Delivery, Acknowled…]
  • A verb has little or no correlation with other
    verbs of same message
  • Flat data?
  • Sequential data?
[Figure: Commit and Delivery acts along a Time axis]