Title: Semi-supervised Learning
1. Semi-supervised Learning
2. Semi-supervised Learning
- Label propagation
- Transductive learning
- Co-training
- Active learning
3. Label Propagation
- A toy problem: two labeled examples
- Each node in the graph is an example
- Two examples are labeled
- Most examples are unlabeled
- Compute the similarity $S_{ij}$ between examples
- Connect examples to their most similar examples
- How to predict labels for the unlabeled nodes using this graph?
[Figure: similarity graph with edge weights $w_{ij}$; most nodes are unlabeled examples]
4. Label Propagation
5. Label Propagation
- Forward propagation
6. Label Propagation
- Forward propagation, repeated step by step
- How to resolve conflicting cases: what label should be given to this node?
7. Label Propagation
- Let $S$ be the similarity matrix, $S = [S_{ij}]_{n \times n}$
- Let $D$ be a diagonal matrix with $D_i = \sum_{j \neq i} S_{ij}$
- Compute the normalized similarity matrix $\bar{S} = D^{-1/2} S D^{-1/2}$
- Let $Y$ be the initial assignment of class labels
  - $Y_i = 1$ when the i-th node is assigned to the positive class
  - $Y_i = -1$ when the i-th node is assigned to the negative class
  - $Y_i = 0$ when the i-th node is not initially labeled
- Let $F$ be the predicted class labels
  - The i-th node is assigned to the positive class if $F_i > 0$
  - The i-th node is assigned to the negative class if $F_i < 0$
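A minimal numpy sketch of these definitions (illustrative variable names; assumes a precomputed similarity matrix S with zero diagonal and a label vector y with entries +1, -1, and 0 for unlabeled nodes):

    import numpy as np

    def normalized_similarity(S):
        # D_i = sum_{j != i} S_ij; with a zero diagonal this is the row sum
        d = S.sum(axis=1)
        d_inv_sqrt = 1.0 / np.sqrt(d)
        # S_bar = D^{-1/2} S D^{-1/2}
        return S * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # y: initial class labels, e.g. np.array([+1, -1, 0, 0, ...])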
9. Label Propagation
- One iteration: $F = Y + \alpha \bar{S} Y = (I + \alpha \bar{S}) Y$
  - $\alpha$ weights the propagated values
- Two iterations: $F = Y + \alpha \bar{S} Y + \alpha^2 \bar{S}^2 Y = (I + \alpha \bar{S} + \alpha^2 \bar{S}^2) Y$
- What about infinitely many iterations?
  $F = \left( \sum_{n=0}^{\infty} \alpha^n \bar{S}^n \right) Y = (I - \alpha \bar{S})^{-1} Y$
- Any problems with such an approach?
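Continuing the sketch above, both the step-by-step update and the infinite-iteration limit; convergence of the geometric series is assumed, which holds for 0 < alpha < 1 since the spectral radius of the normalized matrix is at most 1:

    def propagate_iterative(S_bar, y, alpha=0.5, n_iters=50):
        # F_{k+1} = Y + alpha * S_bar @ F_k, i.e. after k steps
        # F = (I + alpha*S_bar + ... + alpha^k S_bar^k) Y
        F = y.astype(float)
        for _ in range(n_iters):
            F = y + alpha * (S_bar @ F)
        return F

    def propagate_closed_form(S_bar, y, alpha=0.5):
        # Infinite-iteration limit: F = (I - alpha*S_bar)^{-1} Y
        n = S_bar.shape[0]
        return np.linalg.solve(np.eye(n) - alpha * S_bar, y.astype(float))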
10. Label Consistency Problem
- The predicted vector $F$ may not be consistent with the initially assigned class labels $Y$
11. Energy Minimization
- Using the same notation:
  - $S_{ij}$: similarity between the i-th node and the j-th node
  - $Y$: initially assigned class labels
  - $F$: predicted class labels
- Energy: $E(F) = \sum_{i,j} S_{ij} (F_i - F_j)^2$
- Goal: find a label assignment $F$ that is consistent with the labeled examples $Y$ and meanwhile minimizes the energy function $E(F)$
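The quadratic form used on the next slide follows by expanding the square and using the symmetry of $S$; a one-step derivation (the factor of 2 is a constant and does not affect the minimizer, so the slides absorb it into $E(F)$):

    \sum_{i,j} S_{ij} (F_i - F_j)^2
      = \sum_{i,j} S_{ij} (F_i^2 - 2 F_i F_j + F_j^2)
      = 2 \sum_i D_i F_i^2 - 2 \sum_{i,j} S_{ij} F_i F_j
      = 2\, F^\top (D - S)\, F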
12. Harmonic Function
- $E(F) = \sum_{i,j} S_{ij} (F_i - F_j)^2 = F^\top (D - S) F$
- Thus the minimizer of $E(F)$ should satisfy $(D - S) F = 0$, while $F$ stays consistent with $Y$
- Partition into labeled and unlabeled blocks: $F^\top = (F_l^\top, F_u^\top)$, $Y^\top = (Y_l^\top, Y_u^\top)$
- $F_l = Y_l$
- $F_u = (D_{uu} - S_{uu})^{-1} S_{ul} Y_l$
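A sketch of this closed-form solution, with the same imports as above (assumes the labeled nodes are ordered first and uses the unnormalized $S$ and $D$ defined earlier):

    def harmonic_solution(S, y_l):
        # F_l = Y_l; F_u = (D_uu - S_uu)^{-1} S_ul Y_l,
        # i.e. solve (D - S) F = 0 on the unlabeled block
        l = len(y_l)                  # labeled nodes come first
        L = np.diag(S.sum(axis=1)) - S
        F_u = np.linalg.solve(L[l:, l:], S[l:, :l] @ y_l)
        return np.concatenate([y_l, F_u])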
13. Optical Character Recognition
- Given an image of a handwritten digit, determine its value
14. Optical Character Recognition
- Labeled examples + unlabeled examples = 4000
- CMN: label propagation (with class mass normalization)
- 1NN: each unlabeled example takes the label of its closest labeled neighbor
15. Spectral Graph Transducer
- Problem with the harmonic function
- Why could this happen?
- The condition $(D - S) F = 0$ does not hold for the constrained case
17. Spectral Graph Transducer
- $\min_F \; F^\top L F + c \, (F - Y)^\top C (F - Y)$
- s.t. $F^\top F = n$, $F^\top e = 0$
- $C$ is the diagonal cost matrix: $C_{ii} = 1$ if the i-th node is initially labeled, zero otherwise
- The parameter $c$ controls the balance between the consistency requirement and the energy-minimization requirement
- Can be solved efficiently through the computation of eigenvectors, as sketched below
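A rough sketch of that eigenvector computation, continuing with the same imports (the hard constraints are handled only approximately here, by dropping the constant eigenvector and rescaling at the end; Joachims' actual spectral graph transducer treats them more carefully):

    def sgt_sketch(S, y, c=1.0, k=16):
        # Expand F in the k smallest non-constant eigenvectors of the
        # Laplacian L, then minimize z' diag(lam) z + c (Vz - y)' C (Vz - y)
        n = S.shape[0]
        L = np.diag(S.sum(axis=1)) - S
        evals, evecs = np.linalg.eigh(L)
        lam, V = evals[1:k + 1], evecs[:, 1:k + 1]   # drop constant eigenvector
        C = np.diag((y != 0).astype(float))          # C_ii = 1 on labeled nodes
        A = np.diag(lam) + c * V.T @ C @ V
        z = np.linalg.solve(A, c * V.T @ C @ y)
        F = V @ z
        return np.sqrt(n) * F / np.linalg.norm(F)    # rescale so F'F = n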
18. Empirical Studies
19. Problems with the Spectral Graph Transducer
- $\min_F \; F^\top L F + c \, (F - Y)^\top C (F - Y)$, s.t. $F^\top F = n$, $F^\top e = 0$
- The obtained solution differs from the desirable one, which minimizes the energy function while remaining consistent with the labeled examples $Y$
- It is difficult to extend the approach to multi-class classification
20. Green's Function
- The problem of minimizing the energy while remaining consistent with the initially assigned class labels can be formulated as a Green's function problem
- Minimizing $E(F) = F^\top L F$ $\Rightarrow$ $L F = 0$
- It turns out $L$ can be viewed as the Laplacian operator in the discrete case: $L F = 0 \Rightarrow \nabla^2 F = 0$
- Thus, our problem is to find a solution $F$ with $\nabla^2 F = 0$, s.t. $F = Y$ for the labeled examples
- We can treat the constraint $F = Y$ on the labeled examples as a boundary condition (a Dirichlet boundary condition, since it fixes the values of $F$ on the boundary nodes)
- A standard Green's function problem
21. Why Energy Minimization?
[Figure: final classification results]
22. Label Propagation
- How do the unlabeled data help classification?
23. Label Propagation
- How do the unlabeled data help classification?
- Consider a smaller number of unlabeled examples
- The classification results can be very different
24. Cluster Assumption
- Cluster assumption: the decision boundary should pass through low-density regions
- Unlabeled data provide a more accurate estimate of the local density
25. Cluster Assumption vs. Maximum Margin
- Maximum margin classifier (e.g., SVM): decision boundary $w^\top x + b = 0$
- Maximum margin $\Rightarrow$ low density around the decision boundary $\Rightarrow$ cluster assumption
- Any thoughts about utilizing the unlabeled data in a support vector machine?
26. Transductive SVM
- Decision boundary given a small number of labeled examples
27. Transductive SVM
- Decision boundary given a small number of labeled examples
- How will the decision boundary change given both labeled and unlabeled examples?
28. Transductive SVM
- Decision boundary given a small number of labeled examples
- Move the decision boundary to a region of low local density
29. Transductive SVM
- Decision boundary given a small number of labeled examples
- Move the decision boundary to a region of low local density
- Classification results
- How to formulate this idea?
30. Transductive SVM: Formulation
- Labeled data $L$
- Unlabeled data $D$
- Maximum margin principle for the mixture of labeled and unlabeled data:
  - For each label assignment of the unlabeled data, compute its maximum margin
  - Find the label assignment whose maximum margin is maximized
31. Transductive SVM
Different label assignments for the unlabeled data $\Rightarrow$ different maximum margins
32. Transductive SVM: Formulation
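The formulation itself appears only as an image in the original slides; a reconstruction of the standard hard-margin TSVM objective, consistent with the description on slide 30 (real implementations typically add slack variables):

    \min_{y_{n+1}, \dots, y_{n+m}} \; \min_{w, b} \; \frac{1}{2} \|w\|^2
    \quad \text{s.t.} \quad y_i (w^\top x_i + b) \ge 1, \; i = 1, \dots, n + m,
    \qquad y_j \in \{-1, +1\}, \; j = n + 1, \dots, n + m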
33. Computational Issues
- No longer a convex optimization problem (why?)
- How to optimize the transductive SVM?
- Alternating optimization
34. Alternating Optimization
- Step 1: fix $y_{n+1}, \dots, y_{n+m}$ and learn the weights $w$
- Step 2: fix the weights $w$ and predict $y_{n+1}, \dots, y_{n+m}$ (how? see the sketch below)
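A minimal sketch of this alternating scheme, assuming a linear SVM (scikit-learn's LinearSVC) and taking step 2 to be the sign of the decision function; practical TSVMs (e.g., SVM-light) also anneal the weight on the unlabeled loss and enforce a class-balance constraint, both omitted here:

    import numpy as np
    from sklearn.svm import LinearSVC

    def tsvm_alternate(X_lab, y_lab, X_unl, n_iters=10):
        # Step 1: fix y_{n+1..n+m}, learn w; Step 2: fix w, re-predict labels
        svm = LinearSVC().fit(X_lab, y_lab)
        y_unl = np.where(svm.decision_function(X_unl) >= 0, 1.0, -1.0)
        for _ in range(n_iters):
            svm.fit(np.vstack([X_lab, X_unl]),
                    np.concatenate([y_lab, y_unl]))
            y_new = np.where(svm.decision_function(X_unl) >= 0, 1.0, -1.0)
            if np.array_equal(y_new, y_unl):   # labels stable: converged
                break
            y_unl = y_new
        return svm, y_unl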
35. Empirical Study with Transductive SVM
- 10 categories from the Reuters collection
- 3,299 test documents
- 1,000 informative words selected using the mutual information (MI) criterion
36. Co-training for Semi-supervised Learning
- Consider the task of classifying web pages into two categories: pages of students and pages of professors
- Two aspects of a web page should be considered
  - Content of the web page
    - "I am currently a second-year Ph.D. student ..."
  - Hyperlinks
    - "My advisor is ..."
37. Co-training for Semi-supervised Learning
38. Co-training for Semi-supervised Learning
It is easier to classify this web page using its hyperlinks.
It is easy to classify the type of this web page based on its content.
39. Co-training
- Two representations for each web page:
  - Content representation: (doctoral, student, computer, university)
  - Hyperlink representation: Inlinks: Prof. Cheng; Outlinks: Prof. Cheng
40. Co-training: Classification Scheme
- Train a content-based classifier using the labeled web pages
- Apply the content-based classifier to classify the unlabeled web pages
- Label the web pages that have been confidently classified
- Train a hyperlink-based classifier using the web pages that are initially labeled or were labeled by the content-based classifier
- Apply the hyperlink-based classifier to classify the unlabeled web pages
- Label the web pages that have been confidently classified
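A minimal sketch of this loop, assuming two bag-of-words views as feature matrices (X_content and X_links are hypothetical names), scikit-learn naive Bayes classifiers, and a fixed confidence threshold; Blum and Mitchell's original algorithm additionally limits how many examples of each class are added per round:

    import numpy as np
    from sklearn.naive_bayes import MultinomialNB

    def cotrain(X_content, X_links, y, n_rounds=10, threshold=0.95):
        # y holds +1 / -1 for labeled pages and 0 for unlabeled ones;
        # each view's classifier labels the pages it is confident about
        y = y.astype(float).copy()
        for _ in range(n_rounds):
            for X in (X_content, X_links):
                labeled = y != 0
                if labeled.all():
                    return y
                clf = MultinomialNB().fit(X[labeled], y[labeled])
                proba = clf.predict_proba(X[~labeled])
                confident = proba.max(axis=1) >= threshold
                idx = np.flatnonzero(~labeled)[confident]
                y[idx] = clf.classes_[proba[confident].argmax(axis=1)]
        return y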
41. Co-training
- Train a content-based classifier
42. Co-training
- Train a content-based classifier using the labeled examples
- Label the unlabeled examples that are confidently classified
43. Co-training
- Train a content-based classifier using the labeled examples
- Label the unlabeled examples that are confidently classified
- Train a hyperlink-based classifier
  - Prof. outlinks to students
44. Co-training
- Train a content-based classifier using the labeled examples
- Label the unlabeled examples that are confidently classified
- Train a hyperlink-based classifier
  - Prof. outlinks to students
- Label the unlabeled examples that are confidently classified
45. Co-training
- Train a content-based classifier using the labeled examples
- Label the unlabeled examples that are confidently classified
- Train a hyperlink-based classifier
  - Prof. outlinks to students
- Label the unlabeled examples that are confidently classified