Title: Heterogeneous Consensus Learning via Decision Propagation and Negotiation
1. Heterogeneous Consensus Learning via Decision Propagation and Negotiation
KDD'09, Paris, France
- Jing Gao, Wei Fan, Yizhou Sun, Jiawei Han
- University of Illinois at Urbana-Champaign
- IBM T. J. Watson Research Center
2. Information Explosion
Not only in scale, but also in the number of available sources!
(Figure: one entity described by many sources - descriptions, reviews, blogs, videos, pictures, fan sites)
3. Multiple Source Classification
- Research area prediction: publication and co-authorship network, published papers, ...
- Like? Dislike? (user interest prediction): movie genres, cast, director, plots; users' viewing history, movie ratings
- Image categorization: images, descriptions, notes, comments, albums, tags
4. Model Combination Helps!
- Single models, supervised or unsupervised, each make mistakes, e.g. in research area prediction:
  - Some areas share similar keywords
  - People may publish in relevant but different areas
  - There may be cross-discipline collaborations
5. Motivation
- Multiple sources provide complementary information
  - We may want to use all of them to derive a better classification solution
- Concatenation of information sources is impossible
  - Information sources have different formats
  - We may only have access to classification or clustering results due to privacy issues
- Ensemble of supervised and unsupervised models
  - Combine their outputs on the same set of objects
  - Derive a consolidated solution
    - Reduce errors made by individual models
    - More robust and stable
6. Consensus Learning
7. Related Work
- Ensemble of classification models
  - Bagging, boosting, ...
  - Focus on how to construct and combine weak classifiers
- Ensemble of clustering models
  - Derive a consolidated clustering solution
- Semi-supervised (transductive) learning; link-based classification
  - Use link or manifold structure to help classification
  - One unlabeled source
- Multi-view learning
  - Construct a classifier from multiple sources
8. Problem Formulation
- Principles
  - Consensus: maximize agreement among supervised and unsupervised models
  - Constraints: label predictions should stay close to the outputs of the supervised models
- Objective function: a consensus term subject to the supervised constraints; optimizing it exactly is NP-hard! (An illustrative form is sketched below.)
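The slide does not reproduce the objective itself; purely as an illustrative sketch (not the paper's exact formulation), a consensus objective with supervised constraints could take a form such as

    \max_{\hat{y}} \; \sum_{i=1}^{r} \mathrm{agreement}\big(\hat{y}, M_i\big)
    \quad \text{s.t.} \quad \mathrm{dist}\big(\hat{y}, M_j\big) \le \epsilon \;\; \text{for every supervised model } M_j

where agreement(.,.) measures how often two labelings place objects in the same groups (the consensus term) and dist(.,.) measures how far the consolidated labels drift from a supervised model's outputs (the constraint term).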
9. Methodology
- Step 1: Group-level predictions
  - How to propagate and negotiate?
- Step 2: Combine multiple models using local weights
  - How to compute local model weights?
10. Group-level Predictions (1)
- Groups
  - Similarity: percentage of common members (see the sketch below)
  - Initial labeling: category information from the supervised models
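As an illustration of the similarity measure above, a minimal sketch (hypothetical helper name, not the paper's implementation) computing the similarity of two groups as the fraction of common members:

    def group_similarity(group_a, group_b):
        """Similarity between two groups (a group = one predicted class or one
        cluster from a model), measured by the percentage of common members."""
        if not group_a or not group_b:
            return 0.0
        common = len(set(group_a) & set(group_b))
        # Normalize by the smaller group so the score stays in [0, 1];
        # the exact normalization used in the paper may differ.
        return common / min(len(group_a), len(group_b))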
11. Group-level Predictions (2)
(Figure: a graph over labeled and unlabeled group nodes, each carrying a class-probability vector, e.g. [0.93, 0.07, 0] and [0.16, 0.16, 0.98])
- Principles (see the propagation sketch below)
  - Conditional probability estimates are smooth over the graph
  - They do not deviate too much from the initial labeling
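A minimal sketch of a propagation step following these two principles, assuming a group-group similarity matrix S, an initial label matrix Y0 (rows of zeros for groups from unsupervised models), and a trade-off parameter alpha (all names are hypothetical; this is not the paper's exact update rule):

    import numpy as np

    def propagate(S, Y0, alpha=0.5, iters=50):
        """Smooth class-probability estimates over the group graph.
        S  : (s, s) nonnegative group similarity matrix
        Y0 : (s, c) initial labeling from the supervised models
        """
        # Row-normalize similarities so each row acts as averaging weights.
        W = S / np.maximum(S.sum(axis=1, keepdims=True), 1e-12)
        Y = Y0.astype(float).copy()
        for _ in range(iters):
            # Pull each group toward its neighbors' weighted average (smoothness)
            # while anchoring it to the initial labeling (fidelity).
            Y = alpha * (W @ Y) + (1 - alpha) * Y0
        # Renormalize rows so they remain probability estimates.
        return Y / np.maximum(Y.sum(axis=1, keepdims=True), 1e-12)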
12. Local Weighting Scheme (1)
- Principle
  - If model M makes a more accurate prediction on x, M's weight on x should be higher
- Difficulty
  - Unsupervised models are part of the combination, so cross-validation cannot be used to estimate accuracy
13. Local Weighting Scheme (2)
- Method
  - Consensus weighting (see the sketch below)
    - To compute Mi's weight on x, treat each of M1, ..., Mi-1, Mi+1, ..., Mr in turn as the true model and compute the average accuracy
    - Use the consistency of the two models' label predictions on x's neighbors to approximate accuracy
  - Random weighting
    - Assign equal weights to all the models
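A sketch of the consensus weighting idea (hypothetical data layout; the paper's exact estimator may differ): model Mi's local weight on x is its average agreement with every other model over x's neighborhood, while the random scheme would simply return 1/r for every model.

    import numpy as np

    def consensus_weights(neighbor_preds):
        """Local weights for r models on one example x.
        neighbor_preds[i][k] is the label model i assigns to the k-th neighbor
        of x; agreement with the other models approximates accuracy."""
        preds = [np.asarray(p) for p in neighbor_preds]
        r = len(preds)
        weights = np.zeros(r)
        for i in range(r):
            # Treat each other model in turn as the "true" model and average
            # the agreement on x's neighborhood.
            weights[i] = np.mean([np.mean(preds[i] == preds[j])
                                  for j in range(r) if j != i])
        total = weights.sum()
        return weights / total if total > 0 else np.full(r, 1.0 / r)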
14. Algorithm and Time Complexity
- For each pair of groups: compute similarity and local consistency (O(s^2))
- Iterate f steps: for each group, compute probability estimates from the weighted average of its neighbors (O(f c s^2))
- For each example, for each model: compute local weights, then combine the models' predictions using the local weights (O(r n))
- Overall running time is linear in the number of examples! (A sketch of the final combination step follows.)
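To make the last step concrete, a small sketch of combining the models' class-probability estimates for one example with its local weights (hypothetical array layout, assuming each model's estimate comes from the propagated group-level predictions):

    import numpy as np

    def combine(prob_estimates, weights):
        """prob_estimates : (r, c) array, row i = model i's class probabilities for x
        weights          : length-r local weights from the consensus scheme"""
        combined = weights @ prob_estimates               # weighted average, O(r c)
        return combined / max(combined.sum(), 1e-12)      # renormalize

    # Tiny usage example: 3 models, 2 classes.
    probs = np.array([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]])
    w = np.array([0.5, 0.3, 0.2])
    print(combine(probs, w))   # consolidated class probabilities for x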
15. Experiments - Data Sets
- 20 Newsgroups
  - Newsgroup message categorization
  - Only text information available
- Cora
  - Research paper area categorization
  - Paper abstracts and citation information available
- DBLP
  - Researchers' area prediction
  - Publication and co-authorship network, and publication content
  - Conferences' areas are known
- Yahoo! Movies
  - User viewing interest analysis (favored movie types)
  - Movie ratings and synopses
  - Movie genres are known
16. Experiments - Baseline Methods
- Single models
  - 20 Newsgroups: logistic regression, SVM, K-means, min-cut
  - Cora: abstracts, citations (with or without a labeled set)
  - DBLP: publication titles, links (with or without labels from conferences)
  - Yahoo! Movies: movie ratings and synopses (with or without labels from movies)
- Ensemble approaches
  - Majority-voting classification ensemble
  - Majority-voting clustering ensemble
  - Clustering ensemble on all of the four models
17. Experiments - Evaluation Measures
- Classification accuracy
  - Clustering algorithms map each cluster to the best possible class label, so each algorithm is credited with the best accuracy it can achieve (see the sketch below)
- Clustering quality
  - Normalized mutual information (NMI)
  - Build a "true" model from the ground-truth labels
  - Compute the information shared between the true model and each algorithm
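For reference, a minimal sketch of both measures with scikit-learn and SciPy (assuming integer-encoded labels; the Hungarian assignment is one standard way to realize the "best possible" cluster-to-class mapping described above):

    import numpy as np
    from scipy.optimize import linear_sum_assignment
    from sklearn.metrics import confusion_matrix, normalized_mutual_info_score

    def clustering_accuracy(y_true, y_pred):
        """Accuracy after mapping each cluster to its best-matching class."""
        C = confusion_matrix(y_true, y_pred)
        rows, cols = linear_sum_assignment(-C)   # maximize matched counts
        return C[rows, cols].sum() / C.sum()

    y_true = np.array([0, 0, 1, 1, 2, 2])
    y_pred = np.array([1, 1, 0, 0, 2, 2])        # cluster ids permuted w.r.t. classes
    print(clustering_accuracy(y_true, y_pred))           # 1.0 after the best mapping
    print(normalized_mutual_info_score(y_true, y_pred))  # 1.0: identical partitions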
18. Empirical Results - Accuracy
19. Empirical Results - NMI
20. Empirical Results - DBLP Data
21. Empirical Results - Yahoo! Movies
22. Empirical Results - Scalability
23. Conclusions
- Summary
  - We propose to integrate multiple information sources for better classification
  - We study the problem of consolidating outputs from multiple supervised and unsupervised models
  - The proposed two-step algorithm solves the problem by propagating and negotiating among multiple models
  - The algorithm runs in time linear in the number of examples
  - Results on various data sets show the improvements
- Follow-up work
  - Algorithm and theory
  - Applications
24. Thanks!
http://www.ews.uiuc.edu/jinggao3/kdd09clsu.htm
jinggao3@illinois.edu, Office 2119B