Machine Learning Tutorial

1
Machine Learning Tutorial

Amit Gruber
The Hebrew University of Jerusalem

2
Example: Spam Filter

Spam message: an unwanted email message
Users may receive dozens or even hundreds per day
Goal: automatically distinguish between spam and non-spam email messages

3
Spam message 1
4
Spam message 2
5
Spam message 3
6
Spam message 4
7
How to Distinguish?

Message contents?
Automatic semantic analysis is still an unsolved problem
Message sender?
What about unfamiliar or forged senders?
Collection of keywords?
Message length?
Mail server? Time of delivery?
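Each cue on this slide can be turned into a numeric feature that a learning algorithm consumes. The sketch below is only an illustration: the keyword list `SPAM_KEYWORDS` and the helper `extract_features` are hypothetical and are not taken from the slides.

```python
import re
from typing import Dict, Set

# Hypothetical keyword list; a learned filter would discover useful words itself.
SPAM_KEYWORDS = {"free", "winner", "viagra", "offer", "click"}

def extract_features(sender: str, body: str, known_senders: Set[str]) -> Dict[str, float]:
    """Map one message to the candidate cues listed above (keywords, length, sender)."""
    words = re.findall(r"[a-z']+", body.lower())
    return {
        "keyword_count": float(sum(w in SPAM_KEYWORDS for w in words)),
        "length": float(len(body)),
        "known_sender": 1.0 if sender in known_senders else 0.0,
    }

features = extract_features(
    "promo@example.com",
    "You are a WINNER! Click here for a free offer.",
    known_senders={"alice@example.com"},
)
print(features)
```

Hand-tuning a rule over these features is still difficult, and that difficulty motivates the next slide.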

8
How to Distinguish?

It's hard to define an explicit set of rules that
distinguishes spam from non-spam
Learn the concept of spam from examples!

Machine Learning!
9
Example: Gender Classification
10
The Power of Learning: A Real-Life Example

How much time does it take you to get to work?
First approach: analyze your route
Distance, traffic lights, traffic, etc.
Can be quite complicated
Second approach: how long does it usually take?
Despite some variance, this works remarkably well!
Requires training for different times of day
May fail in special cases
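The second approach is learning from examples in its simplest form. Training means averaging past observations. This minimal sketch assumes a hypothetical trip log of (departure hour, minutes) pairs and keeps a separate average for each departure hour. That reflects "requires training for different times". An hour with no data returns `None`, which is one of the special cases where the approach fails.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional, Tuple

def fit_commute_model(trips: List[Tuple[int, float]]) -> Dict[int, float]:
    """'Training': average the observed commute time for each departure hour."""
    by_hour: Dict[int, List[float]] = defaultdict(list)
    for hour, minutes in trips:
        by_hour[hour].append(minutes)
    return {hour: mean(times) for hour, times in by_hour.items()}

def predict(model: Dict[int, float], hour: int) -> Optional[float]:
    """Return the learned estimate, or None for an hour never seen in training."""
    return model.get(hour)

# Hypothetical log of past trips: (departure hour, minutes taken).
history = [(8, 42.0), (8, 38.0), (8, 40.0), (22, 21.0), (22, 19.0)]
model = fit_commute_model(history)
print(predict(model, 8))   # 40.0
print(predict(model, 22))  # 20.0
print(predict(model, 14))  # None
```

The model never needs to know about distance or traffic lights. The cost is that it knows nothing beyond what it has observed.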

11
Machine Translation
12
Collaborative Filtering

Collaborative filtering: predicting a user's ratings
based on the ratings of other users
Examples:
Movie ratings
Product recommendation
Is this of merely theoretical interest?
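The slides do not name a specific algorithm. One common textbook variant is user-based collaborative filtering. It predicts a rating as the user's own mean plus a similarity-weighted average of how other users deviated from their means on that item. The sketch below uses Pearson correlation over co-rated movies as the similarity measure. That choice, the helper names, and the tiny rating table are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean
from typing import Dict, Optional

Ratings = Dict[str, Dict[str, float]]  # user -> {movie -> rating}

def pearson(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Correlation between two users over the movies both have rated."""
    common = a.keys() & b.keys()
    if len(common) < 2:
        return 0.0
    ma = mean(a[m] for m in common)
    mb = mean(b[m] for m in common)
    cov = sum((a[m] - ma) * (b[m] - mb) for m in common)
    va = sqrt(sum((a[m] - ma) ** 2 for m in common))
    vb = sqrt(sum((b[m] - mb) ** 2 for m in common))
    return cov / (va * vb) if va > 0 and vb > 0 else 0.0

def predict(ratings: Ratings, user: str, movie: str) -> Optional[float]:
    """User's mean rating plus similarity-weighted deviations of other users."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or movie not in theirs:
            continue
        w = pearson(ratings[user], theirs)
        num += w * (theirs[movie] - mean(theirs.values()))
        den += abs(w)
    if den == 0:
        return None  # nobody comparable has rated this movie
    return mean(ratings[user].values()) + num / den

# Made-up ratings: dana agrees with ann and disagrees with cat.
ratings: Ratings = {
    "ann": {"Alien": 5.0, "Heat": 1.0, "Up": 5.0},
    "cat": {"Alien": 1.0, "Heat": 5.0, "Up": 1.0},
    "dana": {"Alien": 5.0, "Heat": 1.0},
}
print(round(predict(ratings, "dana", "Up"), 2))  # 4.33
```

Both neighbours push the prediction up. Ann is similar to dana and liked "Up". Cat is anti-correlated with dana and disliked it.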

13
Netflix Prize
Over 100 million ratings from 480,000 customers
across 17,000 movie titles (sparsity: only about 1.2% of
all user-movie pairs are rated, since 10^8 / (480,000 × 17,000) ≈ 0.0123)
14
Recommendation system
15
Machine Learning Applications

Search Engines
Collaborative Filtering (Netflix, Amazon)
Face, speech, and pattern recognition
Machine Translation
Natural language processing
Medical diagnosis and treatment
Bioinformatics
Computer games
Many more!

16
Generalization: Train vs. Test

The central assumption we make is that the train
set and the new examples are similar
Formally, we assume that training and test samples are
drawn from the same distribution
Is this assumption realistic?
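A small simulation shows why the same-distribution assumption matters. In this sketch, which uses made-up data rather than anything from the slides, a 1-D threshold classifier is fit on two Gaussian classes. Class 0 is N(0, 1) and class 1 is N(2, 1). The classifier is then tested once on fresh data from the same distribution and once on data where both classes have drifted by +1.5. Theory predicts roughly 0.84 accuracy on the first test set and roughly 0.65 on the shifted one.

```python
import random
from statistics import mean

def sample(rng: random.Random, n: int, shift: float = 0.0):
    """Draw n labelled points: class 0 ~ N(0,1), class 1 ~ N(2,1), both moved by `shift`."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = rng.gauss(2.0 * y + shift, 1.0)
        data.append((x, y))
    return data

def fit_threshold(train):
    """Place the decision threshold halfway between the two class means."""
    m0 = mean(x for x, y in train if y == 0)
    m1 = mean(x for x, y in train if y == 1)
    return (m0 + m1) / 2

def accuracy(threshold, data):
    """Fraction of points where 'x > threshold' matches the true label."""
    return mean(float((x > threshold) == bool(y)) for x, y in data)

rng = random.Random(0)
train = sample(rng, 2000)
t = fit_threshold(train)
acc_same = accuracy(t, sample(rng, 2000))                # same distribution as training
acc_shifted = accuracy(t, sample(rng, 2000, shift=1.5))  # test data has drifted
print(f"threshold={t:.2f}  same={acc_same:.2f}  shifted={acc_shifted:.2f}")
```

The learned rule is unchanged between the two tests. Only the test distribution differs, and that alone is enough to cause the drop in accuracy.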

17
Train vs. Test Might Fail to Generalize
18
Acquiring a good train set

Have a huge train set
Training data might be available on the web
Use humans to collect data
Collect results (or aggregations of results) of user
actions
Unsupervised methods require only raw data, with no
need for labels!

19
Machine Learning Strategies

Discriminative Approach
Feature selection: find the features that carry
the most information for separating the classes
Generative Approach
Model the data using a generative process
Estimate the parameters of the model
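The slides name the two strategies without committing to particular algorithms, so the following is only one possible illustration on the spam example. For feature selection, it scores each word by the mutual information between "word present" and the spam label. For the generative approach, it fits a Bernoulli naive Bayes model. That model assumes each word appears independently given the class and estimates P(spam) and P(word | class) from counts with Laplace smoothing. The six training messages are invented.

```python
from math import log

# Made-up training set: (set of words in the message, 1 = spam / 0 = not spam).
TRAIN = [
    ({"free", "offer", "click"}, 1),
    ({"winner", "free", "prize"}, 1),
    ({"click", "prize", "now"}, 1),
    ({"meeting", "tomorrow", "agenda"}, 0),
    ({"lunch", "tomorrow"}, 0),
    ({"agenda", "offer", "review"}, 0),
]

def mutual_information(word, data):
    """I(word present; label) in nats: how much the word reveals about the label."""
    n = len(data)
    mi = 0.0
    for w in (0, 1):
        for c in (0, 1):
            joint = sum(1 for words, y in data if (word in words) == w and y == c) / n
            if joint == 0:
                continue
            pw = sum(1 for words, _ in data if (word in words) == w) / n
            pc = sum(1 for _, y in data if y == c) / n
            mi += joint * log(joint / (pw * pc))
    return mi

def fit_naive_bayes(data, vocab, alpha=1.0):
    """Generative model: estimate P(class) and P(word present | class), Laplace-smoothed."""
    params = {}
    for c in (0, 1):
        docs = [words for words, y in data if y == c]
        prior = len(docs) / len(data)
        p_word = {v: (sum(v in d for d in docs) + alpha) / (len(docs) + 2 * alpha) for v in vocab}
        params[c] = (prior, p_word)
    return params

def log_joint(params, words, c):
    """log P(class) + sum over the vocabulary of log P(word present/absent | class)."""
    prior, p_word = params[c]
    score = log(prior)
    for v, p in p_word.items():
        score += log(p if v in words else 1.0 - p)
    return score

def classify(params, words):
    return max((0, 1), key=lambda c: log_joint(params, words, c))

vocab = sorted(set().union(*(words for words, _ in TRAIN)))
ranked = sorted(vocab, key=lambda v: mutual_information(v, TRAIN), reverse=True)
print([(v, round(mutual_information(v, TRAIN), 3)) for v in ranked[:5]])

params = fit_naive_bayes(TRAIN, vocab)
print(classify(params, {"free", "prize", "click"}))  # 1 (spam)
print(classify(params, {"meeting", "agenda"}))       # 0 (not spam)
```

The word "offer" appears once in each class, so its mutual information is zero. A feature-selection step would discard it first.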

20
Supervised vs. Unsupervised