Title: Semi-supervised Classification
1. Semi-supervised Classification

Jieping Ye
Department of Computer Science and Engineering
Arizona State University
http://www.public.asu.edu/jye02

2. Outline of lecture

- Overview of semi-supervised clustering
- What is semi-supervised classification?
- Algorithms for semi-supervised classification
  - Graph mincuts
  - Harmonic approach
  - Consistency approach
  - Transductive SVM

3. Overview of semi-supervised clustering

- Domain knowledge
  - Partial label information is given
  - Apply some constraints (must-links and cannot-links)
- Approaches
  - Search-based semi-supervised clustering: alter the clustering algorithm using the constraints
  - Similarity-based semi-supervised clustering: alter the similarity measure based on the constraints
  - Combination of both

4. What is semi-supervised classification?

- Use a small number of labeled examples to help label a large amount of unlabeled data.
  - Labeling is expensive.
- Basic idea
  - Similar data should have the same class label.
- Typical examples
  - Web page classification
  - Document classification
  - Protein classification

5. Problem setting

- $X = \{x_1, \ldots, x_l, x_{l+1}, \ldots, x_n\} \subset \mathbb{R}^m$
- Label set $L = \{0, 1\}$
- The first $l$ points have been labeled: $y_i \in \{0, 1\}$, $i = 1, \ldots, l$.
- For points with $i > l$, $y_i$ is unknown.
- The error is checked on the unlabeled examples only (the transductive setting).

6. Problem setting

[Figure: the data matrix X, split into labeled data and unlabeled data, and the vector y holding the labels (0 or 1) of the labeled data. Goal: predict the labels of the unlabeled data.]

7. The cluster assumption

- The basic assumption of most semi-supervised learning algorithms:
  - Nearby points are likely to have the same label.
  - Two points that are connected by a path going through high-density regions should have the same label.

8-10. Cluster Assumption: Example

[Figures illustrating the cluster assumption omitted.]

11. Semi-supervised classification algorithms

- Semi-supervised EM [Nigam et al., Machine Learning 2000]
- Co-training [Blum & Mitchell, COLT 1998]
- Graph-based algorithms [Blum & Chawla, ICML 2001; Joachims, ICML 2003; Zhu et al., ICML 2003; Zhou et al., NIPS 2003]
- Transductive SVM [Vapnik, 1998; Joachims, ICML 1999]

12. Graph-based approaches

- Construct the graph $G = (V, E)$ corresponding to the $n$ data points.
- The $n \times n$ weight matrix $W$ on this graph is given, for example, by the Gaussian weights of slide 26: $W_{ij} = \exp(-\|x_i - x_j\|^2 / (2\sigma^2))$ for $i \neq j$, and $W_{ii} = 0$.

13. Construct the graph

[Figure: the data points connected by weighted edges; W is the similarity matrix.]

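The slides contain no code, so here is a minimal numpy sketch of this construction step, using the Gaussian weights that slide 26 specifies; the function name and the bandwidth parameter `sigma` are my own choices:

```python
# A minimal sketch: Gaussian (RBF) affinity matrix over the n data points.
import numpy as np

def gaussian_affinity(X, sigma=1.0):
    # Pairwise squared Euclidean distances between all rows of X.
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / (2 * sigma ** 2))   # W_ij = exp(-||xi - xj||^2 / 2 sigma^2)
    np.fill_diagonal(W, 0.0)                   # no self-loops: W_ii = 0
    return W
```

In practice the bandwidth sigma (and any sparsification, e.g., keeping only the k nearest neighbors) strongly affects all the graph-based methods below.
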
14. Graph-based algorithms

- Graph-based algorithms for semi-supervised classification
  - Graph mincuts
  - Harmonic approach
  - Consistency approach
  - Many others
- Basics
  - Build the weighted graph
  - Solve an optimization problem
  - Use an objective function based on the cluster assumption

15. Graph mincuts

- Paper: Learning from Labeled and Unlabeled Data using Graph Mincuts. Blum and Chawla, ICML 2001.
- Build a weighted graph $G = (V, W)$, where $V$ contains the $n$ data points plus an artificial source $v_+$ and sink $v_-$.
- Set edges of infinite weight from $v_+$ to the positively labeled points and from the negatively labeled points to $v_-$.
- Determine a minimum cut for the graph:
  - Use the max-flow algorithm in which $v_+$ is the source and $v_-$ is the sink, and the weights are treated as capacities.
  - Unlabeled points on the source side of the cut are labeled 1; those on the sink side are labeled 0.

16. Graph mincuts

- Equivalent objective: find binary labels $y_i \in \{0, 1\}$ consistent with the given labels that minimize the cut weight $\sum_{i,j} W_{ij} \, |y_i - y_j|$.
- Solved by linear programming (or exactly by max-flow, as above).

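As a rough illustration of the construction (not Blum and Chawla's own code), a sketch using networkx's max-flow based minimum cut; `mincut_labels`, the dense-matrix input, and the index lists `pos`/`neg` are my own conventions:

```python
# A minimal sketch of mincut labeling: tie positively labeled points to an
# artificial source and negatively labeled points to an artificial sink,
# then read labels off the minimum cut.
import networkx as nx

def mincut_labels(W, pos, neg):
    n = len(W)
    G = nx.Graph()
    for i in range(n):
        for j in range(i + 1, n):
            if W[i][j] > 0:
                G.add_edge(i, j, capacity=float(W[i][j]))   # weights as capacities
    for i in pos:
        G.add_edge("v+", i, capacity=float("inf"))  # source side: label fixed to 1
    for j in neg:
        G.add_edge(j, "v-", capacity=float("inf"))  # sink side: label fixed to 0
    _, (source_side, _) = nx.minimum_cut(G, "v+", "v-")
    # Points left on the source side of the cut get label 1, the rest label 0.
    return [1 if i in source_side else 0 for i in range(n)]
```
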
17. Harmonic approach

- Paper: Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions. Zhu et al., ICML 2003.
- Basics
  - Build the weighted graph.
  - The labels on the labeled data are fixed.
  - Determine the labels of the unlabeled data based on the cluster assumption.

18. Intuition

[Figure: the mincut objective uses a discrete f and is non-differentiable; the harmonic approach relaxes f to real values, keeps the values of f on the labeled data fixed, and determines the labels via thresholding.]

19. Main idea

- Define a real-valued function $f: V \to \mathbb{R}$ on $G$ with certain properties.
- Goal: determine the labels of the unlabeled data from $f$.
- Intuition: nearby points in the graph have the same label.

Optimization problem: compute the optimal $f$ minimizing the energy $E(f) = \frac{1}{2} \sum_{i,j} W_{ij} \, (f(i) - f(j))^2$, subject to the constraint that the values of $f$ on the labeled data are fixed.

20. Harmonic function

- The optimization problem: minimize $E(f) = \frac{1}{2} \sum_{i,j} W_{ij} \, (f(i) - f(j))^2$ subject to $f(i) = y_i$ on the labeled points.
- The optimal solution $f$ is harmonic on the unlabeled points: $f(i) = \frac{1}{d_i} \sum_j W_{ij} f(j)$ with $d_i = \sum_j W_{ij}$, i.e., the value at each unlabeled point is the weighted average of the values at its neighbors.

21. Optimal solution in matrix form

Order the points so the labeled ones come first, and partition $W$ and the diagonal degree matrix $D$ into labeled ($l$) and unlabeled ($u$) blocks. The harmonic solution on the unlabeled points is $f_u = (D_{uu} - W_{uu})^{-1} W_{ul} \, f_l$.

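A minimal numpy sketch of this closed form, assuming the first l rows/columns of W correspond to the labeled points:

```python
# Harmonic solution on the unlabeled points: f_u = (D_uu - W_uu)^{-1} W_ul f_l.
import numpy as np

def harmonic_solution(W, f_l):
    l = len(f_l)                      # number of labeled points (listed first)
    D = np.diag(W.sum(axis=1))
    L = D - W                         # unnormalized graph Laplacian
    L_uu = L[l:, l:]                  # unlabeled-unlabeled block of L
    W_ul = W[l:, :l]                  # unlabeled-labeled block of W
    # Solve L_uu f_u = W_ul f_l instead of forming the inverse explicitly.
    f_u = np.linalg.solve(L_uu, W_ul @ np.asarray(f_l, dtype=float))
    return f_u                        # threshold at 0.5 to recover 0/1 labels
```
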
22. Comparison with graph mincuts

- Label function f
  - Continuous versus discrete
- Objective function
  - Similar cluster assumption
  - Difference: differentiable versus non-differentiable
- Computation
  - Matrix computations versus linear programming (max-flow algorithm)

23. Consistency approach

- Paper: Learning with Local and Global Consistency. Zhou et al., NIPS 2003.
- Key ideas
  - Use the labeled points as sources that spread the labels of the different classes through the graph, and use the newly labeled points as additional sources, until a stable state is reached.
  - The label of each unlabeled point is set to the class from which it has received the most information during the iteration process.

24. Notations

- $X = \{x_1, \ldots, x_l, x_{l+1}, \ldots, x_n\} \subset \mathbb{R}^m$ and label set $L = \{0, 1\}$.
- The first $l$ points have been labeled: $y_i \in \{0, 1\}$. For points with $i > l$, $y_i$ is unknown.
- The classification is represented by a non-negative vector $F$:
  - $y_i = 0$ if $F_i < 0.5$, and $y_i = 1$ otherwise.
- Let $Y$ be the vector with elements $Y_i = 1$ if point $i$ has label $y_i = 1$, and $Y_i = 0$ otherwise.
- For multi-class problems, the classification is represented by an $n \times k$ non-negative matrix $F$; the classification of point $x_i$ is $y_i = \arg\max_j F_{ij}$.

25. Main idea

[Figure omitted.]

26. The Main Algorithm

- Form the affinity matrix $W$ defined by $W_{ij} = \exp(-\|x_i - x_j\|^2 / (2\sigma^2))$ if $i \neq j$, and $W_{ii} = 0$.
- Compute the matrix $S = D^{-1/2} W D^{-1/2}$.
  - $D$ is a diagonal matrix with its $(i, i)$ element equal to the sum of the $i$-th row of $W$. The spectrum of $S$ reflects the spectral clustering structure of the data.
- Iterate $F(t+1) = \alpha S F(t) + (1 - \alpha) Y$ until convergence, where $\alpha \in (0, 1)$.
- Let $F^*$ denote the limit of the sequence $\{F(t)\}$.
- Label the unlabeled point $x_i$ by $y_i = 0$ if $F^*_i < 0.5$, and $y_i = 1$ otherwise.

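Putting the steps together, a minimal numpy sketch of the algorithm, written in the $n \times k$ matrix form of slide 24 with $k = 2$; the function name and default parameters are my own:

```python
# A sketch of the consistency method: normalize the affinity matrix and
# iterate F(t+1) = alpha * S * F(t) + (1 - alpha) * Y.
import numpy as np

def consistency_method(X, labeled_idx, labeled_y, alpha=0.9, sigma=0.5, iters=500):
    n = X.shape[0]
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)                    # W_ii = 0
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))             # S = D^{-1/2} W D^{-1/2}
    Y = np.zeros((n, 2))
    Y[labeled_idx, labeled_y] = 1.0             # one-hot rows for labeled points
    F = Y.copy()
    for _ in range(iters):
        F = alpha * S @ F + (1 - alpha) * Y     # the propagation iteration
    return F.argmax(axis=1)                     # y_i = argmax_j F_ij
```
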
27. Consistency Algorithm: Convergence

- Show that the algorithm converges to $F^* = (1 - \alpha)(I - \alpha S)^{-1} Y$.
- Without loss of generality, let $F(0) = Y$.
- $F(t+1) = \alpha S F(t) + (1 - \alpha) Y$
- Unrolling the recursion: $F(t) = (\alpha S)^t Y + (1 - \alpha) \sum_{i=0}^{t-1} (\alpha S)^i Y$.

28. Consistency Algorithm: Convergence

- Show that the algorithm converges to $F^* = (1 - \alpha)(I - \alpha S)^{-1} Y$.
- $F(t) = (\alpha S)^t Y + (1 - \alpha) \sum_{i=0}^{t-1} (\alpha S)^i Y$.
- Since $0 < \alpha < 1$ and the eigenvalues of $S$ lie in $[-1, 1]$:
  - $\lim_{t \to \infty} (\alpha S)^t = 0$
  - $\lim_{t \to \infty} \sum_{i=0}^{t-1} (\alpha S)^i = (I - \alpha S)^{-1}$
- Hence $F^* = \lim_{t \to \infty} F(t) = (1 - \alpha)(I - \alpha S)^{-1} Y$.

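A quick standalone numerical check (on random data of my own construction) that the iteration reaches this closed form:

```python
# Verify F(t) -> (1 - alpha)(I - alpha S)^{-1} Y numerically.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
W = np.exp(-sq)                                  # unit bandwidth for simplicity
np.fill_diagonal(W, 0.0)
d = W.sum(axis=1)
S = W / np.sqrt(np.outer(d, d))
Y = np.zeros(40); Y[:5] = 1.0                    # five "positive" labeled points
alpha = 0.9

F = Y.copy()
for _ in range(2000):
    F = alpha * S @ F + (1 - alpha) * Y          # the iteration of slide 26
F_star = (1 - alpha) * np.linalg.solve(np.eye(40) - alpha * S, Y)
print(np.allclose(F, F_star))                    # True
```
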
29. Regularization Framework

- Define a cost function associated with the iteration:
  $Q(F) = \frac{1}{2} \sum_{i,j} W_{ij} \left\| \frac{F_i}{\sqrt{D_{ii}}} - \frac{F_j}{\sqrt{D_{jj}}} \right\|^2 + \mu \sum_i \|F_i - Y_i\|^2$
- The classifying function is $F^* = \arg\min_F Q(F)$.
- Smoothness constraint (the first term): a good classifying function should not change too much between nearby points.

30. Regularization Framework

- Fitting constraint (the second term): a good classifying function should not change too much from the initial label assignment.
- $\mu > 0$: trade-off between the two constraints.

31. Regularization Framework

Minimizing $Q(F)$ in closed form recovers the limit of the iteration: $F^* = (1 - \alpha)(I - \alpha S)^{-1} Y$ with $\alpha = 1/(1 + \mu)$. The derivation is sketched below.

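A reconstruction of the standard calculation from the cited paper:

```latex
% The smoothness term equals \operatorname{tr}\bigl(F^{\top}(I - S)F\bigr),
% so setting the gradient of Q to zero:
\frac{\partial Q}{\partial F}\Big|_{F = F^*}
  = (I - S)F^* + \mu\,(F^* - Y) = 0
\;\Longrightarrow\;
F^* = \frac{\mu}{1+\mu}\Bigl(I - \tfrac{1}{1+\mu}\,S\Bigr)^{-1} Y
    = (1-\alpha)\,(I - \alpha S)^{-1} Y,
\qquad \alpha = \frac{1}{1+\mu}.
```
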
32-35. Results: Two-Moon Toy Problem

[Figures showing the method on the two-moon data set omitted.]

36. Experiments

[Experimental results omitted.] Source: Learning with Local and Global Consistency. Zhou et al., NIPS 2003.

37. Discussion

- Graph mincuts
- Harmonic approach
- Consistency approach
- Objective function
  - Graph mincuts and the harmonic approach preserve the labels of the labeled data (labels are fixed).
  - The consistency approach applies a penalty term for the labeled data (labels may change).

38. Semi-supervised classification algorithms

- Semi-supervised EM [Nigam et al., Machine Learning 2000]
- Co-training [Blum & Mitchell, COLT 1998]
- Graph-based algorithms [Blum & Chawla, ICML 2001; Joachims, ICML 2003; Zhu et al., ICML 2003; Zhou et al., NIPS 2003]
- Transductive SVM [Vapnik, 1998; Joachims, ICML 1999]

39. Transductive SVM: Formulation

In the standard formulation (Vapnik, 1998; Joachims, ICML 1999; hard-margin case), find labels $y^*_{l+1}, \ldots, y^*_n \in \{-1, +1\}$ for the unlabeled points and a hyperplane $(w, b)$ minimizing $\frac{1}{2}\|w\|^2$, subject to $y_i (w \cdot x_i + b) \geq 1$ for the labeled points and $y^*_j (w \cdot x_j + b) \geq 1$ for the unlabeled points.

40. Transductive SVM: Intuition

[Figure omitted.] Intuition: the transductive SVM places the decision boundary in a low-density region, maximizing the margin over both the labeled and the unlabeled points.

41. Transductive SVM: An extension

The soft-margin extension (Joachims, ICML 1999) adds slack variables for both labeled and unlabeled examples, with separate penalty parameters $C$ and $C^*$.

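Joachims' full algorithm alternates pairwise label switches inside an SVM solver. As a rough, much-simplified stand-in (my own sketch, not the cited method), the flavor can be illustrated with scikit-learn by retraining an SVM with a growing weight on its own guesses for the unlabeled points:

```python
# A simplified transductive-style loop: self-label the unlabeled data and
# retrain, gradually trusting the guessed labels more (this omits the
# pairwise label-switching step of the real TSVM).
import numpy as np
from sklearn.svm import SVC

def simple_tsvm(X_lab, y_lab, X_unlab, c_unlab_max=1.0, steps=5):
    clf = SVC(kernel="linear", C=1.0)
    clf.fit(X_lab, y_lab)
    y_guess = clf.predict(X_unlab)              # initial guesses for unlabeled points
    X_all = np.vstack([X_lab, X_unlab])
    for c_unlab in np.linspace(c_unlab_max / steps, c_unlab_max, steps):
        weights = np.concatenate([np.ones(len(X_lab)),            # trust labels fully
                                  np.full(len(X_unlab), c_unlab)])
        clf.fit(X_all, np.concatenate([y_lab, y_guess]), sample_weight=weights)
        y_guess = clf.predict(X_unlab)          # re-guess after each retraining
    return clf, y_guess
```
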
42. References

- Learning from Labeled and Unlabeled Data using Graph Mincuts
  http://www.cs.cmu.edu/afs/cs.cmu.edu/Web/People/avrim/Papers/mincut.ps
- Learning with Local and Global Consistency
  http://www.kyb.mpg.de/publications/pdfs/pdf2333.pdf
- Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions
  http://www.hpl.hp.com/conferences/icml2003/papers/132.pdf
- Transductive Inference for Text Classification using Support Vector Machines
  http://www.cs.cornell.edu/People/tj/publications/joachims_99c.pdf
- Semi-Supervised Classification by Low Density Separation
  http://eprints.pascal-network.org/archive/00000388/01/pdf2899.pdf

43. Next class

- Topics
  - Feature reduction (PCA, CCA)
- Readings
  - Geometric Methods for Feature Extraction and Dimensional Reduction
    http://www.public.asu.edu/jye02/CLASSES/Fall-2005/PAPERS/Burge-featureextraction.pdf