Transcript and Presenter's Notes

Title: An introduction to self-taught learning


1
An introduction to self-taught learning
Raina et al., 2007. Self-taught Learning:
Transfer Learning from Unlabeled Data
  • Presented by Zenglin Xu
  • 10-09-2007

2
Outline
  • Related learning paradigms
  • A self-taught learning algorithm

3
Related learning paradigms
  • Semi-supervised learning
  • Transfer learning
  • Multi-task learning
  • Domain adaptation
  • Biased sample selection
  • Self-taught learning

4
Semi-supervised learning
  • In addition to the labeled training data, a
    large set of unlabeled test data is available
  • The training data and test data are drawn from
    the same distribution
  • Unlabeled data can be assigned the supervised
    learning task's class labels
  • References
  • Chapelle et al., 2006. Semi-Supervised Learning
  • Zhu, 2005. Semi-Supervised Learning Literature
    Survey

5
Transfer learning
  • The theory of transfer of learning was introduced
    by Thorndike and Woodworth (1901). They explored
    how individuals transfer learning in one context
    to another context that shares similar
    characteristics
  • Transfer of knowledge from one supervised task to
    another; requires labeled data from a different
    but related task
  • E.g., transferring knowledge from Newsgroups data
    to Reuters data
  • Related work in computer science
  • Thrun & Mitchell, 1995. Learning One More Thing
  • Ando & Zhang, 2005. A Framework for Learning
    Predictive Structures from Multiple Tasks and
    Unlabeled Data

6
Multi-task learning
  • Learns a problem together with other related
    problems at the same time, using a shared
    representation
  • This often leads to a better model for the main
    task, because it allows the learner to exploit
    the commonality among the tasks
  • Multi-task learning is a kind of inductive
    transfer
  • It learns tasks in parallel with a shared
    representation: what is learned for each task can
    help the other tasks be learned better
  • References
  • Caruana, 1997. Multitask Learning
  • Ben-David & Schuller, 2003. Exploiting Task
    Relatedness for Multiple Task Learning

7
Domain adaptation
  • A popular term in natural language processing
  • Indeed, it can be viewed as a form of transfer
    learning
  • The supervised setting is usually as follows (see
    the sketch after the references)
  • A large pool of out-of-domain labeled data
  • A small pool of in-domain labeled data
  • References
  • Daumé III, 2007. Frustratingly Easy Domain
    Adaptation
  • Daumé III & Marcu, 2006. Domain Adaptation for
    Statistical Classifiers
  • Ben-David et al., 2006. Analysis of
    Representations for Domain Adaptation
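
As a concrete illustration of the first reference, a minimal sketch of
the feature-augmentation trick from Daumé III, 2007: each feature vector
is copied into a shared block plus a domain-specific block, after which
any ordinary classifier is trained on the augmented data. The block
layout follows the paper; the helper and placeholder names below are
illustrative assumptions.

    import numpy as np

    def augment(X, domain):
        """Map x to (x, x, 0) for source (out-of-domain) examples and
        (x, 0, x) for target (in-domain) examples."""
        zeros = np.zeros_like(X)
        if domain == "source":
            return np.hstack([X, X, zeros])
        return np.hstack([X, zeros, X])

    # Usage: stack augmented out-of-domain and in-domain data, train once.
    # X_aug = np.vstack([augment(X_out, "source"), augment(X_in, "target")])
    # y_aug = np.concatenate([y_out, y_in])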

8
Biased sample selection
  • Also called covariate shift
  • It deals with the case where the training data
    and test data are drawn from different
    distributions over the same domain
  • The objective is to correct the bias (see the
    sketch after the references)
  • References
  • Shimodaira, 2000. Improving Predictive Inference
    under Covariate Shift
  • Zadrozny, 2004. Learning and Evaluating
    Classifiers under Sample Selection Bias
  • Bickel et al., 2007. Discriminative Learning for
    Differing Training and Test Distributions
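
A minimal sketch of the usual correction, in the spirit of the
discriminative approach of Bickel et al., 2007: train a probabilistic
classifier to distinguish training inputs from test inputs and use its
odds as per-example importance weights. The choice of logistic
regression and the exact weight formula are illustrative assumptions.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def importance_weights(X_train, X_test):
        """Estimate w(x) proportional to p_test(x) / p_train(x) with a
        domain classifier (label 0 = train, 1 = test)."""
        X = np.vstack([X_train, X_test])
        d = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_test))])
        p = LogisticRegression().fit(X, d).predict_proba(X_train)[:, 1]
        return p / (1.0 - p)  # odds ratio approximates the density ratio

    # Usage: pass the weights to any learner with per-example weighting:
    # LogisticRegression().fit(X_train, y_train,
    #     sample_weight=importance_weights(X_train, X_test))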

9
Self-taught learning
  • Uses unlabeled data
  • Does not require the unlabeled data to have the
    same generative distribution as the labeled data
  • The unlabeled data may have labels different from
    those of the supervised learning task's data
  • Reference
  • Raina et al., 2007. Self-taught Learning:
    Transfer Learning from Unlabeled Data

10
(No Transcript)
11
Outline
  • Related learning paradigms
  • A self-taught learning algorithm
  • Algorithm
  • Experiment

12
Sparse coding: a self-taught learning algorithm
  • Learn a high-level feature representation using
    unlabeled data
  • E.g., random unlabeled images usually contain
    basic visual patterns (like edges) that are
    similar to those in the images (like those of an
    elephant) that need to be classified
  • Apply the representation to the labeled data and
    use it for classification

13
Step 1: learning higher-level representations
Given unlabeled data $\{x_u^{(1)}, \ldots, x_u^{(k)}\}$,
optimize the following:
$$\min_{b,\,a}\ \sum_i \Big\| x_u^{(i)} - \sum_j a_j^{(i)} b_j \Big\|_2^2
  + \beta \big\| a^{(i)} \big\|_1
  \quad \text{s.t.}\ \| b_j \|_2 \le 1,\ \forall j$$
where $b = \{b_1, \ldots, b_s\}$ are the basis vectors and
$a = \{a^{(1)}, \ldots, a^{(k)}\}$ are the activations.
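
A minimal NumPy sketch of this optimization, under illustrative
assumptions: it alternates ISTA updates for the activations with a
projected gradient step for the bases, which need not match the
authors' actual solver; all hyperparameters are placeholders.

    import numpy as np

    def soft_threshold(z, t):
        """Proximal operator of the L1 norm (entrywise shrinkage)."""
        return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

    def learn_bases(X, num_bases, beta=0.4, n_outer=50, n_ista=50, seed=0):
        """Learn bases B (n x s) and activations A (s x k) from unlabeled
        X (n x k), approximately minimizing
        sum_i ||x_i - B a_i||^2 + beta * ||a_i||_1  s.t.  ||b_j||_2 <= 1."""
        rng = np.random.default_rng(seed)
        n, k = X.shape
        B = rng.standard_normal((n, num_bases))
        B /= np.linalg.norm(B, axis=0, keepdims=True)
        A = np.zeros((num_bases, k))
        for _ in range(n_outer):
            # (1) Fix B; update A by ISTA on L1-regularized least squares.
            L = np.linalg.norm(B, 2) ** 2  # squared spectral norm of B
            for _ in range(n_ista):
                grad = B.T @ (B @ A - X)   # half-gradient of squared loss
                A = soft_threshold(A - grad / L, beta / (2 * L))
            # (2) Fix A; gradient step on B, then project each column back
            #     onto the unit ball to enforce ||b_j||_2 <= 1.
            G = (B @ A - X) @ A.T
            B -= G / (np.linalg.norm(A, 2) ** 2 + 1e-8)
            B /= np.maximum(np.linalg.norm(B, axis=0, keepdims=True), 1.0)
        return B, A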
14
Bases learned from image patches and speech data
15
Step 2: apply the representation to the labeled
data and use it for classification
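
Per the paper, each labeled example $x$ is represented by its
activation vector with the learned bases held fixed:
$\hat{a}(x) = \arg\min_a \| x - \sum_j a_j b_j \|_2^2 + \beta \|a\|_1$.
A minimal sketch, reusing soft_threshold and learn_bases from the
Step 1 sketch; LinearSVC and the placeholder names (X_unlabeled,
X_train, ...) are illustrative assumptions, not the authors' choices.

    import numpy as np
    from sklearn.svm import LinearSVC

    def encode(B, X, beta=0.4, n_ista=100):
        """Sparse features for labeled data: each column x of X is encoded
        as argmin_a ||x - B a||^2 + beta * ||a||_1, with B held fixed."""
        A = np.zeros((B.shape[1], X.shape[1]))
        L = np.linalg.norm(B, 2) ** 2
        for _ in range(n_ista):
            grad = B.T @ (B @ A - X)
            A = soft_threshold(A - grad / L, beta / (2 * L))
        return A.T  # one feature row per labeled example

    # Usage: learn bases on unlabeled data, encode, train a classifier.
    # B, _ = learn_bases(X_unlabeled, num_bases=512)
    # clf = LinearSVC().fit(encode(B, X_train), y_train)
    # predictions = clf.predict(encode(B, X_test))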
16
High-level features computed
Using a set of 512 learned image bases (Figure 2,
left), Figure 3 illustrates a solution to the
previous optimization problem
17
High-level features computed
18
High-level features computed
19
(No Transcript)
20
Connection to PCA
21
Connection to PCA
  • PCA results in linear feature extraction, in that
    the features $a_j^{(i)}$ are simply a linear
    function of the input
  • The bases $b_j$ must be orthogonal, so the number
    of PCA features cannot be greater than the
    dimension n of the input. Sparse coding has
    neither of these limitations (see the sketch
    below)
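
A small illustration of this contrast, assuming scikit-learn for PCA
and the helpers from the earlier sketches for sparse coding: PCA
features are a fixed linear map of the (centered) input, while the
sparse-coding encoding solves an L1-regularized problem, is not linear
in x, and can use more bases than input dimensions.

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.standard_normal((1000, 64))  # 1000 inputs of dimension n = 64

    pca = PCA(n_components=64).fit(X)    # at most n orthogonal components
    a_linear = pca.transform(X)          # linear in the (centered) input

    # Sparse coding (learn_bases/encode from the Step 1/2 sketches): the
    # basis count may exceed n, and encoding is not linear in x.
    # B, _ = learn_bases(X.T, num_bases=512)
    # a_sparse = encode(B, X.T)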

22
Outline
  • Related learning paradigms
  • A self-taught learning algorithm
  • Algorithm
  • Experiment

23
Experiment setting
24
Experiment setting
25
Experimental results on image
26
Experimental results on characters
27
Experimental results on music data
28
Experimental results on text data
29
Comparison with results using features trained on
labeled data
Table 7. Accuracy on the self-taught learning
tasks when sparse coding bases are learned on
unlabeled data (third column), or when principal
components/sparse coding bases are learned on the
labeled training set (fourth/fifth column).
30
Discussion
  • Is it useful to learn a high-level feature
    representation in a unified process using both
    the labeled data and the unlabeled data?
  • How does the similarity between the labeled data
    and the unlabeled data affect performance?
  • And more?