Semi-supervised Learning - PowerPoint PPT Presentation

Title: Semi-supervised Learning
Provided by: rongjin
Learn more at: http://www.cse.msu.edu

Transcript and Presenter's Notes

1
Semi-supervised Learning
  • Rong Jin

2
Semi-supervised learning
  • Label propagation
  • Transductive learning
  • Co-training
  • Active learning

3
Label Propagation
  • A toy problem
  • Each node in the graph is an example
  • Two examples are labeled
  • Most examples are unlabeled
  • Compute the similarity $S_{ij}$ between examples
  • Connect examples to their most similar examples
  • How to predict labels for unlabeled nodes using
    this graph? (a sketch of one common graph
    construction follows the figure)

[Figure: toy graph with two labeled examples, unlabeled examples, and edge weights $w_{ij}$]
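
A minimal sketch of one common graph construction (the slides do not fix a similarity measure; the RBF kernel, bandwidth `sigma`, and neighbor count `k` below are assumptions):

```python
import numpy as np

def knn_similarity_graph(X, k=5, sigma=1.0):
    """Build a similarity graph: RBF weights, kept only for each node's k nearest neighbors."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    S = np.exp(-d2 / (2 * sigma ** 2))                   # RBF similarity S_ij
    np.fill_diagonal(S, 0.0)                             # no self-loops
    keep = np.argsort(S, axis=1)[:, -k:]                 # k most similar neighbors per row
    mask = np.zeros_like(S, dtype=bool)
    np.put_along_axis(mask, keep, True, axis=1)
    return np.where(mask | mask.T, S, 0.0)               # symmetrize: keep edge if either end keeps it
```
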
4
Label Propagation
  • Forward propagation

5
Label Propagation
  • Forward propagation (repeated: labels spread one
    more hop through the graph)

6
Label Propagation
  • Forward propagation (repeated)
  • How to resolve conflicting cases?

[Figure: a node reached by propagation from both classes; what label should this node be given?]
7
Label Propagation
  • Let S be the similarity matrix, $S = [S_{i,j}]_{n \times n}$
  • Let D be a diagonal matrix where $D_i = \sum_{j \neq i} S_{i,j}$
  • Compute the normalized similarity matrix
    $\bar{S} = D^{-1/2} S D^{-1/2}$
  • Let Y be the initial assignment of class labels
  • $Y_i = 1$ when the i-th node is assigned to the
    positive class
  • $Y_i = -1$ when the i-th node is assigned to the
    negative class
  • $Y_i = 0$ when the i-th node is not initially
    labeled
  • Let F be the predicted class labels
  • The i-th node is assigned to the positive class
    if $F_i > 0$
  • The i-th node is assigned to the negative class
    if $F_i < 0$

9
Label Propagation
  • One iteration:
    $F = Y + \alpha \bar{S} Y = (I + \alpha \bar{S}) Y$
  • $\alpha$ weights the propagation values
  • Two iterations:
    $F = Y + \alpha \bar{S} Y + \alpha^2 \bar{S}^2 Y = (I + \alpha \bar{S} + \alpha^2 \bar{S}^2) Y$
  • What about infinitely many iterations?
    $F = \left( \sum_{n=0}^{\infty} \alpha^n \bar{S}^n \right) Y = (I - \alpha \bar{S})^{-1} Y$
  • Any problems with such an approach?
    (a code sketch of the closed form follows)

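A minimal sketch of this closed-form propagation in Python/NumPy (assuming a symmetric similarity matrix S with zero diagonal and a label vector Y as defined earlier; $\alpha < 1$ so the geometric series converges):

```python
import numpy as np

def propagate_labels(S, Y, alpha=0.9):
    """Closed-form label propagation: F = (I - alpha * S_bar)^{-1} Y."""
    d = S.sum(axis=1)                                  # degrees D_i = sum_{j != i} S_ij
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    S_bar = D_inv_sqrt @ S @ D_inv_sqrt                # normalized similarity D^{-1/2} S D^{-1/2}
    n = S.shape[0]
    # Sum of the geometric series (I + aS + a^2 S^2 + ...) Y
    F = np.linalg.solve(np.eye(n) - alpha * S_bar, Y)
    return np.sign(F)                                  # positive class if F_i > 0, negative if F_i < 0
```
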
10
Label Consistency Problem
  • The predicted vector F may not be consistent with
    the initially assigned class labels Y

11
Energy Minimization
  • Using the same notation
  • $S_{i,j}$: similarity between the i-th and j-th nodes
  • Y: initially assigned class labels
  • F: predicted class labels
  • Energy: $E(F) = \sum_{i,j} S_{i,j} (F_i - F_j)^2$
  • Goal: find a label assignment F that is consistent
    with the labeled examples Y and meanwhile minimizes
    the energy function E(F)

12
Harmonic Function
  • $E(F) = \sum_{i,j} S_{i,j} (F_i - F_j)^2 = F^T (D - S) F$
  • Thus, the minimizer of E(F) should satisfy
    $(D - S) F = 0$, and meanwhile F should be
    consistent with Y.
  • Partition into labeled and unlabeled parts:
    $F^T = (F_l^T, F_u^T)$, $Y^T = (Y_l^T, Y_u^T)$
  • Constraint: $F_l = Y_l$ (a solution sketch follows)

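A minimal sketch of solving this constrained system (restricting $(D - S)F = 0$ to the unlabeled rows gives $(D_{uu} - S_{uu}) F_u = S_{ul} Y_l$, which the code solves directly):

```python
import numpy as np

def harmonic_function(S, Y, labeled):
    """Solve (D - S) F = 0 subject to F_l = Y_l.

    S: (n, n) similarity matrix, Y: (n,) initial labels,
    labeled: (n,) boolean mask of initially labeled nodes.
    """
    L = np.diag(S.sum(axis=1)) - S                      # graph Laplacian D - S
    u, l = ~labeled, labeled
    F = Y.astype(float).copy()                          # labeled part stays fixed: F_l = Y_l
    # Unlabeled rows of (D - S) F = 0:  L_uu F_u = S_ul Y_l
    F[u] = np.linalg.solve(L[np.ix_(u, u)], S[np.ix_(u, l)] @ Y[l])
    return F
```
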
13
Optical Character Recognition
  • Given an image of a handwritten digit, determine
    its value

14
Optical Character Recognition
  • Labeled_Examples + Unlabeled_Examples = 4000
  • CMN: label propagation with class mass
    normalization
  • 1NN: each unlabeled example takes the label of its
    closest labeled neighbor

15
Spectral Graph Transducer
  • Problem with the harmonic function
  • Why could this happen?
  • The condition $(D - S)F = 0$ does not hold at the
    constrained (labeled) nodes

17
Spectral Graph Transducer
  • $\min_F \; F^T L F + c\,(F - Y)^T C (F - Y)$
  • s.t. $F^T F = n$, $F^T e = 0$
  • C is the diagonal cost matrix: $C_{i,i} = 1$ if the
    i-th node is initially labeled, zero otherwise
  • The parameter c controls the balance between the
    consistency requirement and the energy-minimization
    requirement
  • Can be solved efficiently through the computation
    of eigenvectors (a relaxed-version sketch follows)

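For intuition only (this is not Joachims's full spectral algorithm), dropping the two constraints and setting the gradient of the objective to zero gives $(L + cC)F = cCY$; a sketch:

```python
import numpy as np

def sgt_unconstrained(S, Y, labeled, c=1.0):
    """Minimize F^T L F + c (F - Y)^T C (F - Y) without the norm/balance constraints."""
    L = np.diag(S.sum(axis=1)) - S            # graph Laplacian
    C = np.diag(labeled.astype(float))        # C_ii = 1 for initially labeled nodes
    # Stationarity: 2 L F + 2 c C (F - Y) = 0  =>  (L + c C) F = c C Y
    return np.linalg.solve(L + c * C, c * (C @ Y))
```
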
18
Empirical Studies
19
Problems with Spectral Graph Transducer
  • $\min_F \; F^T L F + c\,(F - Y)^T C (F - Y)$
  • s.t. $F^T F = n$, $F^T e = 0$
  • The obtained solution differs from the desirable
    one: minimizing the energy function while staying
    consistent with the labeled examples Y
  • It is difficult to extend the approach to
    multi-class classification

20
Green's Function
  • The problem of minimizing the energy while staying
    consistent with the initially assigned class labels
    can be formulated as a Green's function problem
  • Minimizing $E(F) = F^T L F$ gives $LF = 0$
  • It turns out that L can be viewed as a Laplacian
    operator in the discrete case:
    $LF = 0 \Leftrightarrow \nabla^2 F = 0$
  • Thus, our problem is to find a solution F with
    $\nabla^2 F = 0$, s.t. $F = Y$ for labeled examples
  • We can treat the constraint $F = Y$ for labeled
    examples as a boundary condition (a Dirichlet
    boundary condition)
  • A standard Green's function problem

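To make the discrete-Laplacian claim concrete, a standard identity (with $S_{i,i} = 0$ and $D_i = \sum_j S_{i,j}$):

```latex
(LF)_i = D_i F_i - \sum_j S_{i,j} F_j = \sum_j S_{i,j}\,(F_i - F_j),
\qquad
LF = 0 \;\Longleftrightarrow\; F_i = \frac{\sum_j S_{i,j} F_j}{\sum_j S_{i,j}}
```

So each node's value equals the similarity-weighted average of its neighbors' values, the discrete analogue of a harmonic function.
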
21
Why Energy Minimization?
[Figure: final classification results]
22
Label Propagation
  • How do the unlabeled data help classification?

23
Label Propagation
  • How do the unlabeled data help classification?
  • Consider a smaller number of unlabeled examples
  • The classification results can be very different

24
Cluster Assumption
  • Cluster assumption
  • The decision boundary should pass through
    low-density areas
  • Unlabeled data provide a more accurate estimate of
    the local density

25
Cluster Assumption vs. Maximum Margin
  • Maximum margin classifier (e.g. SVM)

[Figure: separating hyperplane $w \cdot x + b = 0$]
  • Maximum margin
  • ⇒ low density around the decision boundary
  • ⇒ cluster assumption
  • Any thoughts on utilizing the unlabeled data in a
    support vector machine?

26
Transductive SVM
  • Decision boundary given a small number of labeled
    examples

27
Transductive SVM
  • Decision boundary given a small number of labeled
    examples
  • How will the decision boundary change given both
    labeled and unlabeled examples?

28
Transductive SVM
  • Decision boundary given a small number of labeled
    examples
  • Move the decision boundary to a region of low
    local density

29
Transductive SVM
  • Decision boundary given a small number of labeled
    examples
  • Move the decision boundary to a region of low
    local density
  • Classification results
  • How to formulate this idea?

30
Transductive SVM Formulation
  • Labeled data L
  • Unlabeled data D
  • Maximum margin principle for a mixture of labeled
    and unlabeled data
  • For each label assignment of unlabeled data,
    compute its maximum margin
  • Find the label assignment whose maximum margin is
    maximized

31
Transductive SVM
Different label assignments for the unlabeled data ⇒
different maximum margins
32
Transductive SVM Formulation
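
The formulation itself appears only as an image in the original slides; as a sketch, the standard transductive SVM objective (after Joachims, 1999), with $\ell$ labeled examples, $m$ unlabeled examples, and assumed cost parameters $C$, $C^*$:

```latex
\min_{y^*_1, \ldots, y^*_m,\; w,\, b,\, \xi,\, \xi^*} \;
\frac{1}{2}\|w\|^2 + C \sum_{i=1}^{\ell} \xi_i + C^* \sum_{j=1}^{m} \xi^*_j
```

subject to $y_i (w \cdot x_i + b) \ge 1 - \xi_i$ for labeled examples, $y^*_j (w \cdot x_j + b) \ge 1 - \xi^*_j$ for unlabeled examples, and $\xi_i, \xi^*_j \ge 0$.
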
33
Computational Issue
  • No longer a convex optimization problem (why?)
  • How to optimize transductive SVM?
  • Alternating optimization

34
Alternating Optimization
  • Step 1: fix $y_{n_1}, \ldots, y_{n_m}$, learn the
    weights w
  • Step 2: fix the weights w, try to predict
    $y_{n_1}, \ldots, y_{n_m}$ (How? see the sketch
    below)

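A minimal sketch of this alternating loop (assumptions: scikit-learn's LinearSVC as the base SVM; real TSVM solvers additionally enforce a class-balance constraint and anneal the unlabeled-example cost):

```python
import numpy as np
from sklearn.svm import LinearSVC

def tsvm_alternating(X_l, y_l, X_u, max_iter=20):
    """Alternate between fixing the unlabeled labels and fitting the weights."""
    clf = LinearSVC().fit(X_l, y_l)              # initialize w from labeled data only
    y_u = clf.predict(X_u)                       # initial guess for the unlabeled labels
    for _ in range(max_iter):
        # Step 1: fix y_{n_1}, ..., y_{n_m}, learn weights w
        clf = LinearSVC().fit(np.vstack([X_l, X_u]),
                              np.concatenate([y_l, y_u]))
        # Step 2: fix w, re-predict the unlabeled labels
        y_new = clf.predict(X_u)
        if np.array_equal(y_new, y_u):           # stop when the labels stabilize
            break
        y_u = y_new
    return clf, y_u
```
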
35
Empirical Study with Transductive SVM
  • 10 categories from the Reuters collection
  • 3,299 test documents
  • 1,000 informative words selected using the mutual
    information (MI) criterion

36
Co-training for Semi-supervised Learning
  • Consider the task of classifying web pages into
    two categories: a category for students and a
    category for professors
  • Two aspects of web pages should be considered
  • Content of web pages
  • e.g., "I am currently a second-year Ph.D.
    student ..."
  • Hyperlinks
  • e.g., "My advisor is ..."
  • e.g., "Students: ..."

37
Co-training for Semi-Supervised Learning
38
Co-training for Semi-Supervised Learning
[Figure: two example web pages; one is easier to
classify using its hyperlinks, the other is easy to
classify based on its content]
39
Co-training
  • Two representations for each web page

Content representation: (doctoral, student,
computer, university)
Hyperlink representation: Inlinks: Prof. Cheng;
Outlinks: Prof. Cheng
40
Co-training Classification Scheme
  1. Train a content-based classifier using labeled
    web pages
  2. Apply the content-based classifier to classify
    unlabeled web pages
  3. Label the web pages that have been confidently
    classified
  4. Train a hyperlink-based classifier using the web
    pages that were initially labeled plus those
    labeled by the content-based classifier
  5. Apply the hyperlink-based classifier to classify
    the unlabeled web pages
  6. Label the web pages that have been confidently
    classified (a loop sketch follows)

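A minimal sketch of this loop (assumptions: scikit-learn's MultinomialNB over word/link counts for both view-specific classifiers; the confidence threshold and round count are hypothetical knobs):

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def co_train(views_l, y_l, views_u, rounds=10, threshold=0.95):
    """Co-training over two feature views (view 0: content, view 1: hyperlinks)."""
    for _ in range(rounds):
        if len(views_u[0]) == 0:                        # no unlabeled pages left
            break
        for v in (0, 1):                                # alternate between the two views
            clf = MultinomialNB().fit(views_l[v], y_l)
            proba = clf.predict_proba(views_u[v])
            confident = proba.max(axis=1) >= threshold  # confidently classified pages
            if not confident.any():
                continue
            y_new = clf.classes_[proba.argmax(axis=1)][confident]
            # Move the confident pages (both views) into the labeled pool
            views_l = [np.vstack([views_l[w], views_u[w][confident]]) for w in (0, 1)]
            y_l = np.concatenate([y_l, y_new])
            views_u = [views_u[w][~confident] for w in (0, 1)]
    return views_l, y_l
```
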
41
Co-training
  • Train a content-based classifier

42
Co-training
  • Train a content-based classifier using labeled
    examples
  • Label the unlabeled examples that are confidently
    classified

43
Co-training
  • Train a content-based classifier using labeled
    examples
  • Label the unlabeled examples that are confidently
    classified
  • Train a hyperlink-based classifier
  • Prof. outlinks to students

44
Co-training
  • Train a content-based classifier using labeled
    examples
  • Label the unlabeled examples that are confidently
    classified
  • Train a hyperlink-based classifier
  • Prof. outlinks to students
  • Label the unlabeled examples that are confidently
    classified

45
Co-training
  • Train a content-based classifier using labeled
    examples
  • Label the unlabeled examples that are confidently
    classified
  • Train a hyperlink-based classifier
  • Prof. outlinks to students
  • Label the unlabeled examples that are confidently
    classified