Machine Learning Tutorial - PowerPoint PPT Presentation

1
Machine Learning Tutorial
  • Amit Gruber
  • The Hebrew University of Jerusalem

2
Example: Spam Filter
  • Spam message: an unwanted email message
  • Dozens or even hundreds per day
  • Goal: automatically distinguish between spam and
    non-spam email messages

3
Spam message 1
4
Spam message 2
5
Spam message 3
6
Spam message 4
7
How to Distinguish?
  • Message contents?
  • Automatic semantic analysis is not yet a solved problem
  • Message sender?
  • What about unfamiliar senders or fake senders?
  • Collection of keywords?
  • Message length?
  • Mail server? Time of delivery?

8
How to Distinguish?
  • It's hard to define an explicit set of rules to
    distinguish between spam and non-spam
  • Learn the concept of spam from examples!

Machine Learning!
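To make "learn the concept of spam from examples" concrete, here is a minimal sketch. It assumes a naive Bayes word-count model with Laplace smoothing, which is only one possible learner and not necessarily the one the presenter had in mind. The toy messages and the names `TRAIN`, `train`, `log_score` and `is_spam` are all made up for illustration.

```python
import math
from collections import Counter

# Hypothetical toy training set (not from the slides): (message, is_spam)
TRAIN = [
    ("win money now", True),
    ("cheap pills win prize", True),
    ("free money offer", True),
    ("meeting agenda for monday", False),
    ("lunch on monday", False),
    ("project report attached", False),
]

def train(examples):
    """Count word occurrences per class and messages per class."""
    word_counts = {True: Counter(), False: Counter()}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, class_counts, vocab

def log_score(text, label, model):
    """Log of prior times smoothed word likelihoods for one class."""
    word_counts, class_counts, vocab = model
    total_words = sum(word_counts[label].values())
    score = math.log(class_counts[label] / sum(class_counts.values()))
    for word in text.lower().split():
        if word not in vocab:
            continue  # ignore words never seen during training
        # Laplace smoothing: add 1 so unseen word/class pairs are not zero
        score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
    return score

def is_spam(text, model):
    return log_score(text, True, model) > log_score(text, False, model)

model = train(TRAIN)
print(is_spam("win free money", model))
print(is_spam("monday project meeting", model))
```

No rule about specific keywords is written by hand; the decision comes entirely from the labeled examples.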
9
Example: Gender Classification
10
The Power of Learning: Real-Life Example
  • How much time does it take you to get to work?
  • First approach: analyze your route
  • Distance, traffic lights, traffic, etc.
  • Can be quite complicated
  • Second approach: how much time does it usually
    take?
  • Despite some variance, works remarkably well!
  • Requires training for different times
  • May fail in special cases
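The second approach can be sketched in a few lines, assuming a made-up commute log and a model that simply averages past travel times per departure hour. The names `LOG`, `fit` and `predict` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical commute log: (departure hour, minutes taken)
LOG = [(8, 42), (8, 38), (8, 40), (10, 25), (10, 27), (8, 44), (10, 23)]

def fit(log):
    """'Training': average the observed travel time for each departure hour."""
    by_hour = defaultdict(list)
    for hour, minutes in log:
        by_hour[hour].append(minutes)
    return {hour: mean(times) for hour, times in by_hour.items()}

def predict(model, hour):
    """Return the usual travel time, or None for an hour never observed."""
    return model.get(hour)

model = fit(LOG)
print(predict(model, 8))   # usual time when leaving at 8
print(predict(model, 23))  # no training data for this hour
```

The `None` for an unseen hour mirrors the bullet "requires training for different times"; an accident on the road would be one of the special cases where the average fails.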

11
Machine Translation
12
Collaborative Filtering
  • Collaborative filtering: prediction of user
    ratings based on the ratings of other users
  • Examples:
  • Movie ratings
  • Product recommendation
  • Is this of merely theoretical interest?

13
Netflix Prize
Over 100 million ratings from 480 thousand
customers over 17,000 movie titles (sparsity
0.0123, i.e. only about 1.2% of all customer-movie
pairs have a rating)
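The sparsity figure can be reproduced from the rounded numbers on the slide, assuming "sparsity" here means the fraction of customer-movie pairs that actually have a rating:

```python
# Rounded figures as quoted on the slide
ratings = 100_000_000
customers = 480_000
movies = 17_000

# Fraction of the customers x movies rating matrix that is filled in
fraction_rated = ratings / (customers * movies)
print(round(fraction_rated, 4))  # prints 0.0123
```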
14
Recommendation system
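As a rough illustration of predicting a rating from other users' ratings, here is a minimal user-based sketch. The ratings, the similarity measure (one over one plus the mean absolute rating difference on co-rated movies) and the names `RATINGS`, `similarity` and `predict` are all assumptions chosen for brevity, not the method used by any real recommendation system.

```python
# Hypothetical ratings on a 1-5 scale: user -> {movie: rating}
RATINGS = {
    "ann": {"A": 5, "B": 1, "C": 4},
    "bob": {"A": 5, "B": 1, "C": 5, "D": 4},
    "cat": {"A": 1, "B": 5, "D": 1},
}

def similarity(r1, r2):
    """Higher when two users agree on the movies both have rated."""
    common = set(r1) & set(r2)
    if not common:
        return 0.0
    mean_abs_diff = sum(abs(r1[m] - r2[m]) for m in common) / len(common)
    return 1.0 / (1.0 + mean_abs_diff)

def predict(ratings, user, movie):
    """Similarity-weighted average of other users' ratings for the movie."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, theirs in ratings.items():
        if other == user or movie not in theirs:
            continue
        w = similarity(ratings[user], theirs)
        weighted_sum += w * theirs[movie]
        weight_total += w
    return weighted_sum / weight_total if weight_total else None

# ann has not rated D; bob agrees with her far more than cat does
print(predict(RATINGS, "ann", "D"))
```

The prediction lands much closer to bob's rating of 4 than to cat's rating of 1, because ann and bob rated their common movies almost identically.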
15
Machine Learning Applications
  • Search Engines
  • Collaborative Filtering (Netflix, Amazon)
  • Face, speech and pattern recognition
  • Machine Translation
  • Natural language processing
  • Medical diagnosis and treatment
  • Bioinformatics
  • Computer games
  • Many more!

16
Generalization: Train vs. Test
  • The central assumption we make is that the train
    set and the new examples are similar
  • Formally, the assumption is that samples are
    drawn from the same distribution
  • Is this assumption realistic?

17
Train vs. Test: Might Fail to Generalize
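One way such a failure can show up, as a minimal sketch on made-up 1-D data: a threshold classifier is fitted on the train set, then evaluated once on test data from the same range and once on test data whose feature values have been shifted. The data and the names `fit_threshold`, `accuracy`, `same_dist_test` and `shifted_test` are hypothetical.

```python
def fit_threshold(examples):
    """Place the decision threshold midway between the two class means."""
    pos = [x for x, y in examples if y == 1]
    neg = [x for x, y in examples if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def accuracy(threshold, examples):
    """Fraction of examples where 'x > threshold' matches the label."""
    correct = sum((x > threshold) == (y == 1) for x, y in examples)
    return correct / len(examples)

# Hypothetical data: (feature value, label)
train = [(1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1), (9.0, 1)]
same_dist_test = [(1.5, 0), (2.5, 0), (7.5, 1), (8.5, 1)]
# Shifted test set: every feature value moved by +4, labels unchanged
shifted_test = [(x + 4.0, y) for x, y in same_dist_test]

t = fit_threshold(train)
print(accuracy(t, same_dist_test))  # prints 1.0
print(accuracy(t, shifted_test))    # prints 0.5
```

The learned threshold is perfectly good for data that looks like the train set and no better than a coin flip once the distribution moves.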
18
Acquiring a good train set
  • Have a huge train set
  • Train data might be available on the web
  • Use humans to collect data
  • Collect results (or aggregations thereof) of user
    actions
  • Unsupervised methods require only raw data, no
    need for labels!

19
Machine Learning Strategies
  • Discriminative Approach
  • Feature selection: find the features that carry
    the most information for separation
  • Generative Approach
  • Model the data using a generative process
  • Estimate the parameters of the model

20
Supervised vs. Unsupervised
  • Supervised Machine Learning
  • Classification (learning)
  • Collecting a large, representative train set
    might not be simple
  • Unsupervised Machine Learning
  • Clustering
  • The number of clusters may be known or unknown
  • Usually plenty of train data is available
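For the unsupervised case, clustering can be sketched with k-means on unlabeled 1-D points. This assumes the number of clusters is known (two) and uses fixed starting centers so the run is deterministic; k-means is just one common clustering method, and the data and the name `kmeans_1d` are made up.

```python
def kmeans_1d(points, centers, iterations=10):
    """Alternate between assigning points to the nearest center
    and moving each center to the mean of its assigned points."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical unlabeled data: no labels are given, only raw values
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centers, clusters = kmeans_1d(points, centers=[0.0, 6.0])
print(centers)
print(clusters)
```

The two groups around 1 and 5 are recovered without any labels, which is the point of the last bullet: raw data is usually easy to get.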

21
Discriminative Learning
  • Data representation and feature selection: what
    is relevant for classification?
  • Gender classification: hair, ears, make-up,
    beard, moustache, etc.
  • Linear separation
  • SVM, Fisher LDA, Perceptron and more
  • Different criteria for separation: what would
    generalize well?
  • Non-linear separation

22
Linear Separation
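Of the linear methods listed on the previous slide, the perceptron is the shortest to write down. The following is a minimal sketch on made-up, linearly separable 2-D points; the data and the names `EXAMPLES`, `train_perceptron` and `predict` are hypothetical.

```python
def train_perceptron(examples, epochs=20):
    """Classic perceptron rule: on each mistake, move the separating
    line toward the misclassified point. Labels are -1 or +1."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for (x1, x2), y in examples:
            if y * (w[0] * x1 + w[1] * x2 + b) <= 0:
                w[0] += y * x1
                w[1] += y * x2
                b += y
                mistakes += 1
        if mistakes == 0:
            break  # every training point is on the correct side
    return w, b

def predict(w, b, x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else -1

# Hypothetical linearly separable data: ((x1, x2), label)
EXAMPLES = [
    ((2.0, 3.0), 1), ((3.0, 3.0), 1), ((3.0, 2.0), 1),
    ((0.0, 0.0), -1), ((1.0, 0.0), -1), ((0.0, 1.0), -1),
]

w, b = train_perceptron(EXAMPLES)
print(w, b)
print(all(predict(w, b, x) == y for x, y in EXAMPLES))
```

The perceptron stops at the first separating line it finds; SVM and Fisher LDA differ in which line they prefer, which is the "different criteria for separation" question raised earlier.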
23
Nonlinear Separation (Kernel Trick)
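A small sketch of the idea, assuming a kernel perceptron with the degree-2 polynomial kernel k(x, z) = (1 + x·z)^2 on XOR-style data, which no straight line can separate. The choice of kernel and the names `XOR`, `poly_kernel`, `train_kernel_perceptron` and `decide` are illustrative assumptions.

```python
def poly_kernel(x, z):
    """Degree-2 polynomial kernel: an inner product in a higher-dimensional
    feature space, computed without building that space explicitly."""
    return (1.0 + sum(a * b for a, b in zip(x, z))) ** 2

def train_kernel_perceptron(examples, kernel, epochs=20):
    """Perceptron in dual form: the data enters only through kernel values."""
    alphas = [0] * len(examples)  # mistake count per training point

    def decision(x):
        return sum(a * y_i * kernel(x_i, x)
                   for a, (x_i, y_i) in zip(alphas, examples))

    for _ in range(epochs):
        mistakes = 0
        for i, (x, y) in enumerate(examples):
            if y * decision(x) <= 0:
                alphas[i] += 1
                mistakes += 1
        if mistakes == 0:
            break
    return decision

# XOR-style data: label +1 when the two coordinates differ in sign
XOR = [((1, 1), -1), ((-1, -1), -1), ((1, -1), 1), ((-1, 1), 1)]

decide = train_kernel_perceptron(XOR, poly_kernel)
print([(decide(x) > 0) == (y > 0) for x, y in XOR])
```

Swapping `poly_kernel` for a plain dot product turns this back into a linear perceptron, which cannot classify all four XOR points correctly.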
24
Generative Approach
  • Model the observations using a generative process
  • The generative process induces a distribution
    over the observations
  • Learn a set of parameters
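As a minimal sketch of these three bullets, assume each class generates a single 1-D feature from its own Gaussian. The learned parameters are then the per-class mean, variance and prior, and a new point is assigned to the class under which it is most probable. The data and the names `DATA`, `fit_gaussian`, `log_density` and `classify` are made up.

```python
import math
from statistics import mean, pvariance

def fit_gaussian(values):
    """Maximum-likelihood estimates of a Gaussian's mean and variance."""
    return mean(values), pvariance(values)

def log_density(x, mu, var):
    """Log of the Gaussian probability density at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

# Hypothetical labeled observations: class -> feature values
DATA = {"a": [1.0, 2.0, 3.0], "b": [6.0, 7.0, 8.0]}

# Learn the parameters of the generative process
params = {label: fit_gaussian(values) for label, values in DATA.items()}
n_total = sum(len(values) for values in DATA.values())
priors = {label: len(values) / n_total for label, values in DATA.items()}

def classify(x):
    """Pick the class maximizing log prior + log likelihood."""
    return max(params,
               key=lambda c: math.log(priors[c]) + log_density(x, *params[c]))

print(classify(2.5))
print(classify(6.0))
```

Unlike the discriminative sketches, this model describes how the observations themselves are distributed, so it could also be used to sample new data.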

25
Statistical Approach: Real-Life Example
  • You're stuck in traffic. Which lane is faster?
  • The complicated approach
  • Consider the traffic, trucks, merging lanes, etc.
  • The statistical (Bayesian) approach
  • Which lane is usually faster? (prior)
  • What are you seeing? (evidence)
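Combining the prior with the evidence is Bayes' rule: the posterior is proportional to prior times likelihood. A minimal sketch with made-up numbers follows; the probabilities and the name `posterior` are hypothetical.

```python
def posterior(prior, likelihood):
    """Bayes' rule: multiply prior by likelihood, then normalize."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: value / total for h, value in unnormalized.items()}

# Hypothetical prior: how often each lane is the faster one
prior = {"left": 0.7, "right": 0.3}
# Hypothetical likelihood: probability of what you currently see
# (say, a slow truck ahead in the left lane) if that lane were the faster one
likelihood = {"left": 0.2, "right": 0.8}

post = posterior(prior, likelihood)
print(post)
```

With these numbers the evidence outweighs the prior: the left lane is usually faster, but given what you are seeing, the right lane is the better bet.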

26
Summary
  • Machine learning: learn a concept from examples
  • For good generalization, train data has to
    faithfully represent test data
  • Many potential applications
  • Already in use and works remarkably well