1
Semi-supervised Classification
Jieping Ye
Department of Computer Science and Engineering, Arizona State University
http://www.public.asu.edu/~jye02
2
Outline of lecture
  • Overview of semi-supervised clustering
  • What is semi-supervised classification?
  • Algorithms for semi-supervised classification
  • Graph mincuts
  • Harmonic approach
  • Consistency approach
  • Transductive SVM

3
Overview of semi-supervised clustering
  • Domain knowledge
  • Partial label information is given
  • Some constraints are applied (must-links and cannot-links)
  • Approaches
  • Search-based semi-supervised clustering: alter the clustering algorithm using the constraints
  • Similarity-based semi-supervised clustering: alter the similarity measure based on the constraints
  • Combination of both

4
What is semi-supervised classification?
  • Use a small number of labeled data to label a large amount of unlabeled data
  • Labeling is expensive
  • Basic idea: similar data should have the same class label
  • Typical examples: web page classification, document classification, protein classification

5
Problem setting
  • X = {x1, ..., xl, xl+1, ..., xn} ⊂ R^m
  • Label set L = {0, 1}
  • The first l points have been labeled: yi ∈ {0, 1}, i = 1, ..., l.
  • For points with i > l, yi is unknown.
  • The error is checked on the unlabeled examples only (a small setup sketch follows below).
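To make the notation concrete, here is a small Python setup sketch; the values and the names X, y, labeled_idx are purely illustrative.

```python
# Illustrative setup: n points in R^m, the first l of them labeled with 0/1.
import numpy as np

n, m, l = 100, 2, 10                    # sizes chosen arbitrarily
rng = np.random.default_rng(0)
X = rng.normal(size=(n, m))             # X = {x1, ..., xl, xl+1, ..., xn}
y = np.full(n, -1)                      # -1 marks an unknown label
y[:l] = rng.integers(0, 2, size=l)      # yi in {0, 1} for i = 1, ..., l
labeled_idx = np.arange(l)              # error is measured on indices >= l
```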

6
Problem setting
Given the data matrix X (labeled and unlabeled points) and the vector y holding the labels (0 or 1) of the labeled points, the goal is to predict the labels of the unlabeled data.
7
The cluster assumption
  • The basic assumption of most semi-supervised learning algorithms:
  • Nearby points are likely to have the same label.
  • Two points connected by a path going through high-density regions should have the same label.

8-10
Cluster Assumption Example (figures)
11
Semi-supervised classification algorithms
  • Semi-supervised EM [Nigam, ML 2000]
  • Co-training [Blum, COLT 1998]
  • Graph-based algorithms [Blum, ICML 2001; Joachims, ICML 2003; Zhu, ICML 2003; Zhou, NIPS 2003]
  • Transductive SVM [Vapnik, 1998; Joachims, ICML 1999]

12
Graph based approaches
  • Construct the graph G = (V, E) corresponding to the n data points.
  • The n×n weight matrix W on this graph is given by Wij = exp(−||xi − xj||^2 / (2σ^2)) if i ≠ j, and Wii = 0.

13
Construct the graph
W is the similarity matrix
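A minimal sketch of building W in Python (assuming NumPy; the helper name affinity_matrix is this sketch's own):

```python
# Affinity matrix Wij = exp(-||xi - xj||^2 / (2 sigma^2)) with Wii = 0.
import numpy as np

def affinity_matrix(X, sigma=1.0):
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)            # Wii = 0 by definition
    return W
```

In practice the graph is often sparsified (e.g., keeping only k-nearest-neighbor edges) so that W is not fully dense.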
14
Graph based algorithms
  • Graph-based algorithms for semi-supervised classification:
  • Graph mincuts
  • Harmonic approach
  • Consistency approach
  • Many others
  • Basics:
  • Build the weighted graph
  • Solve an optimization problem whose objective function is based on the cluster assumption

15
Graph mincuts
  • Paper: Learning from Labeled and Unlabeled Data using Graph Mincuts. Blum and Chawla, ICML 2001.
  • Build a weighted graph G = (V, W), where V contains the n data points plus a source v+ and a sink v−.
  • Set edges of infinite weight connecting v+ to every positively labeled point and v− to every negatively labeled point.
  • Determine a minimum (v+, v−) cut for the graph: use the max-flow algorithm in which v+ is the source and v− is the sink, and the weights are treated as capacities.
  • Unlabeled points on the v+ side of the cut are labeled 1, the rest 0 (a code sketch follows below).
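A minimal sketch of this construction (an assumption: networkx is available; the helper name mincut_labels is this sketch's own):

```python
# Graph-mincut labeling via max-flow/min-cut.
import numpy as np
import networkx as nx

def mincut_labels(W, labeled_idx, labels):
    """W: (n, n) symmetric affinity matrix; labeled_idx, labels: indices and
    0/1 labels of the labeled points. Returns a 0/1 label for every point."""
    n = W.shape[0]
    G = nx.DiGraph()
    for i in range(n):
        for j in range(n):
            if i != j and W[i, j] > 0:
                G.add_edge(i, j, capacity=float(W[i, j]))
    # Tie labeled points to the source/sink. Edges without a 'capacity'
    # attribute are treated by networkx as having infinite capacity.
    for i, yi in zip(labeled_idx, labels):
        if yi == 1:
            G.add_edge('v+', int(i))
        else:
            G.add_edge(int(i), 'v-')
    _, (source_side, _) = nx.minimum_cut(G, 'v+', 'v-')
    y = np.zeros(n, dtype=int)
    for i in source_side:
        if i != 'v+':
            y[i] = 1
    return y
```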

16
Graph mincuts
The minimum cut can equivalently be found by linear programming (the max-flow linear program).
17
Harmonic approach
  • Paper: Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions. Zhu et al., ICML 2003.
  • Basics:
  • Build the weighted graph.
  • The labels on the labeled data are fixed.
  • Determine the labels of the unlabeled data based on the cluster assumption.

18
Intuition
  • Mincut's label function f is discrete, so the objective is non-differentiable.
  • Instead, fix the values of f on the labeled data, relax f to real values, and determine the labels via thresholding.
19
Main idea
  • Define a real-valued function f: V → R on G with certain properties.
  • Goal: determine the labels of the unlabeled data by f.
  • Intuition: nearby points in the graph should have the same label.

Optimization problem: compute the optimal f that minimizes the energy E(f) = (1/2) Σ_{i,j} Wij (f(i) − f(j))^2, subject to the constraint that the values of f on the labeled data are fixed.
20
Harmonic function
  • The optimization problem: minimize E(f) = (1/2) Σ_{i,j} Wij (f(i) − f(j))^2 subject to f(i) = yi on the labeled points.
  • The optimal solution f is harmonic: f(i) = (1/Dii) Σ_j Wij f(j) on unlabeled points, where Dii = Σ_j Wij.
21
Optimal solution in matrix form
Order the points so that the labeled ones come first, and partition W into blocks Wll, Wlu, Wul, Wuu (and D accordingly). With fl fixed to the given labels, the harmonic solution on the unlabeled points is fu = (Duu − Wuu)^(−1) Wul fl.
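A minimal sketch of this closed form (assuming NumPy; harmonic_labels is this sketch's own name, and W can come from the affinity_matrix helper sketched earlier):

```python
# Harmonic solution: solve L_uu f_u = W_ul f_l with L = D - W, then threshold.
import numpy as np

def harmonic_labels(W, labeled_idx, f_l, threshold=0.5):
    n = W.shape[0]
    u = np.setdiff1d(np.arange(n), labeled_idx)   # unlabeled indices
    L = np.diag(W.sum(axis=1)) - W                # graph Laplacian L = D - W
    f_u = np.linalg.solve(L[np.ix_(u, u)], W[np.ix_(u, labeled_idx)] @ f_l)
    f = np.empty(n)
    f[labeled_idx] = f_l                          # labeled values stay fixed
    f[u] = f_u
    return (f > threshold).astype(int), f
```

Note that L_uu = D_uu − W_uu, so the solve implements exactly the formula above.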
22
Comparison with graph mincuts
  • Label function f: continuous versus discrete
  • Objective function: similar cluster assumption; difference: differentiable versus non-differentiable
  • Computation: matrix computations versus linear programming (max-flow algorithm)

23
Consistency approach
  • Paper: Learning with Local and Global Consistency. Zhou et al., NIPS 2003.
  • Key ideas:
  • Use the labeled points as sources to pump the different classes' labels through the graph, and use the newly labeled points as additional sources until a stable state is reached.
  • The label of each unlabeled point is set to the class from which it has received the most information during the iteration process.

24
Notations
  • X = {x1, ..., xl, xl+1, ..., xn} ⊂ R^m and label set L = {0, 1}.
  • The first l points have been labeled: yi ∈ {0, 1}. For points with i > l, yi is unknown.
  • The classification is represented by a non-negative vector F: label yi = 1 if Fi > 0.5 and yi = 0 otherwise.
  • Let Y be the vector with Yi = 1 if point i has label yi = 1, and Yi = 0 otherwise.
  • For a multi-class problem, the classification is represented by an n × k non-negative matrix F; the classification of point xi is yi = argmax_{j ≤ k} Fij.
25
Main idea
26
The Main Algorithm
  • Form the affinity matrix W defined by Wij = exp(−||xi − xj||^2 / (2σ^2)) if i ≠ j and Wii = 0.
  • Compute the matrix S = D^(−1/2) W D^(−1/2), where D is the diagonal matrix whose (i, i) element equals the sum of the i-th row of W. (The spectrum of S reflects the spectral clusters of the data.)
  • Iterate F(t+1) = αSF(t) + (1−α)Y until convergence, with α ∈ (0, 1).
  • Let F* denote the limit of the sequence {F(t)}.
  • Label the unlabeled point xi by yi = 1 if F*i > 0.5 and yi = 0 otherwise (a code sketch follows below).
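A minimal sketch of the iteration (assuming NumPy; consistency_labels is this sketch's own name, and Y is used in the n × k one-hot form from the Notations slide, which also covers the binary case with k = 2):

```python
# Consistency iteration F(t+1) = alpha S F(t) + (1 - alpha) Y.
import numpy as np

def consistency_labels(W, Y, alpha=0.99, n_iter=1000):
    """W: (n, n) affinity with zero diagonal; Y: (n, k) with Y[i, c] = 1
    iff point i is labeled with class c. Returns predicted labels and F*."""
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))     # S = D^(-1/2) W D^(-1/2)
    F = Y.astype(float).copy()
    for _ in range(n_iter):
        F = alpha * (S @ F) + (1 - alpha) * Y
    return F.argmax(axis=1), F
```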

27
Consistency Algorithm: Convergence
  • Show that the algorithm converges to F* = (1−α)(I − αS)^(−1)Y.
  • Without loss of generality, let F(0) = Y.
  • F(t+1) = αSF(t) + (1−α)Y
  • Therefore F(t) = (αS)^t Y + (1−α) Σ_{i=0..t−1} (αS)^i Y.

28
Consistency Algorithm: Convergence
  • F(t) = (αS)^t Y + (1−α) Σ_{i=0..t−1} (αS)^i Y.
  • Since 0 < α < 1 and the eigenvalues of S lie in [−1, 1]:
  • lim_{t→∞} (αS)^t = 0
  • lim_{t→∞} Σ_{i=0..t−1} (αS)^i = (I − αS)^(−1)
  • Hence F* = lim_{t→∞} F(t) = (1−α)(I − αS)^(−1)Y (a numeric check follows below).
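The limit can be checked numerically; a quick sketch on small random data (all values illustrative):

```python
# Verify the iteration converges to (1 - alpha)(I - alpha S)^(-1) Y.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
W = np.exp(-((X[:, None] - X[None]) ** 2).sum(-1) / (2 * 0.5 ** 2))
np.fill_diagonal(W, 0.0)                # Wii = 0
d = W.sum(axis=1)
S = W / np.sqrt(np.outer(d, d))         # S = D^(-1/2) W D^(-1/2)
Y = np.zeros((20, 1)); Y[:3] = 1.0      # a few labeled points
alpha = 0.9
F = Y.copy()
for _ in range(2000):                   # iterate far past convergence
    F = alpha * (S @ F) + (1 - alpha) * Y
F_closed = (1 - alpha) * np.linalg.solve(np.eye(20) - alpha * S, Y)
print(np.allclose(F, F_closed))         # True
```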

29
Regularization Framework
  • Define a cost function for the iteration:
    Q(F) = (1/2) Σ_{i,j} Wij || Fi/√Dii − Fj/√Djj ||^2 + μ Σ_i || Fi − Yi ||^2
  • The classifying function is F* = argmin_F Q(F).
  • Smoothness constraint (the first term): a good classifying function should not change too much between nearby points.

30
Regularization Framework
  • Fitting constraint (the second term): a good classifying function should not change too much from the initial label assignment.
  • μ > 0 controls the trade-off between the two constraints.

31
Regularization Framework
Setting the derivative of Q(F) to zero yields the closed-form minimizer F* = (1−α)(I − αS)^(−1)Y with α = 1/(1+μ), so the iteration converges exactly to the minimizer of Q.
32-35
Results: Two Moon Toy Problem (figures)
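The figures are not reproduced here, but the experiment can be sketched with the helpers from the earlier slides and scikit-learn's moon generator (σ and α below are illustrative guesses, not the paper's settings):

```python
# Two-moon toy run: one labeled point per moon, spread by consistency.
import numpy as np
from sklearn.datasets import make_moons

X, y_true = make_moons(n_samples=200, noise=0.1, random_state=0)
W = affinity_matrix(X, sigma=0.15)       # helper sketched earlier
Y = np.zeros((len(X), 2))
for c in (0, 1):
    Y[np.flatnonzero(y_true == c)[0], c] = 1.0   # one label per class
y_pred, F = consistency_labels(W, Y, alpha=0.99)
print((y_pred == y_true).mean())         # accuracy; typically close to 1.0
```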
36
Experiments
Source: Learning with Local and Global Consistency, Zhou et al.
37
Discussion
  • Graph mincuts, harmonic approach, consistency approach
  • Objective function: all three rely on a similar cluster assumption
  • Graph mincuts and the harmonic approach fix the labels of the labeled data (labels are preserved)
  • The consistency approach applies a penalty term to the labeled data (labels may change)

38
Semi-supervised classification algorithms
  • Semi-supervised EM [Nigam, ML 2000]
  • Co-training [Blum, COLT 1998]
  • Graph-based algorithms [Blum, ICML 2001; Joachims, ICML 2003; Zhu, ICML 2003; Zhou, NIPS 2003]
  • Transductive SVM [Vapnik, 1998; Joachims, ICML 1999]

39
Transductive SVM: Formulation
Find a labeling y*_{l+1}, ..., y*_n of the unlabeled data and a hyperplane (w, b) that maximize the margin over both labeled and unlabeled data:
minimize (1/2)||w||^2 subject to yi (w · xi + b) ≥ 1 for i = 1, ..., l, and y*_j (w · xj + b) ≥ 1 for j = l+1, ..., n.
40
Transductive SVM: Intuition
Because the margin is measured on the unlabeled points as well, the decision boundary is pushed into low-density regions, away from both labeled and unlabeled data.
41
Transductive SVM: An extension
A soft-margin extension adds slack variables for the labeled and unlabeled points with separate penalty parameters C and C*, as in Joachims, ICML 1999.
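Joachims' actual algorithm alternates label switching with retraining; the rough self-labeling sketch below (assuming scikit-learn; tsvm_sketch and its annealing schedule are this sketch's own simplification, not the paper's procedure) conveys the idea of gradually trusting the unlabeled data:

```python
# Simplified TSVM-style self-labeling with an annealed unlabeled weight.
import numpy as np
from sklearn.svm import SVC

def tsvm_sketch(X_l, y_l, X_u, C=1.0, C_star=1.0, n_rounds=10):
    clf = SVC(kernel='linear', C=C).fit(X_l, y_l)   # start from labeled data
    X = np.vstack([X_l, X_u])
    for k in range(1, n_rounds + 1):
        y_u = clf.predict(X_u)                      # current guessed labels
        y = np.concatenate([y_l, y_u])
        # sample_weight scales C per point, so the unlabeled penalty grows
        # from C*/n_rounds up to C* over the rounds.
        w = np.concatenate([np.ones(len(y_l)),
                            np.full(len(X_u), k * C_star / (n_rounds * C))])
        clf = SVC(kernel='linear', C=C).fit(X, y, sample_weight=w)
    return clf, clf.predict(X_u)
```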
42
Reference
  • Learning from Labeled and Unlabeled Data using Graph Mincuts
    http://www.cs.cmu.edu/afs/cs.cmu.edu/Web/People/avrim/Papers/mincut.ps
  • Learning with Local and Global Consistency
    http://www.kyb.mpg.de/publications/pdfs/pdf2333.pdf
  • Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions
    http://www.hpl.hp.com/conferences/icml2003/papers/132.pdf
  • Transductive Inference for Text Classification using Support Vector Machines
    http://www.cs.cornell.edu/People/tj/publications/joachims_99c.pdf
  • Semi-Supervised Classification by Low Density Separation
    http://eprints.pascal-network.org/archive/00000388/01/pdf2899.pdf

43
Next class
  • Topics: feature reduction (PCA, CCA)
  • Readings: Geometric Methods for Feature Extraction and Dimensional Reduction
    http://www.public.asu.edu/~jye02/CLASSES/Fall-2005/PAPERS/Burge-featureextraction.pdf