CS 60050 Machine Learning - PowerPoint PPT Presentation

Slides: 22
Provided by: facwebIit


1
CS 60050 Machine Learning
2
What is Machine Learning?
  • Adapt to, or learn from, data in order to
    optimize a performance function
  • Can be used to:
    • Extract knowledge from data
    • Learn tasks that are difficult to formalise
    • Create software that improves over time

3
  • When to learn:
    • Human expertise does not exist (navigating on
      Mars)
    • Humans are unable to explain their expertise
      (speech recognition)
    • Solution changes over time (routing on a
      computer network)
    • Solution needs to be adapted to particular cases
      (user biometrics)
  • Learning involves:
    • Learning general models from data
    • Data is cheap and abundant; knowledge is
      expensive and scarce
    • Example: from customer transactions to consumer
      behaviour
    • Building a model that is a good and useful
      approximation to the data

4
Applications
  • Speech and hand-writing recognition
  • Autonomous robot control
  • Data mining and bioinformatics (motifs,
    alignment)
  • Playing games
  • Fault detection
  • Clinical diagnosis
  • Spam email detection
  • Credit scoring, fraud detection
  • Web mining (search engines)
  • Market basket analysis
  • Applications are diverse but methods are generic

5
Generic methods
  • Learning from labelled data (supervised learning)
    • e.g. classification, regression, prediction,
      function approximation
  • Learning from unlabelled data (unsupervised
    learning)
    • e.g. clustering, visualisation, dimensionality
      reduction
  • Learning from sequential data
    • e.g. speech recognition, DNA data analysis
  • Associations
  • Reinforcement learning
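Associations are listed without an example. As a minimal sketch, assuming the market basket setting mentioned under Applications, the two standard quantities for a rule X → Y are support (how often X and Y occur together) and confidence (an estimate of P(Y | X)). The function names and the baskets are hypothetical toy data.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) = support(X and Y) / support(X)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical toy baskets, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(support(baskets, {"bread", "milk"}))       # 0.5
print(confidence(baskets, {"bread"}, {"milk"}))  # 2/3, about 0.667
```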

6
Statistical Learning
  • Machine learning methods can be unified within
    the framework of statistical learning
  • Data is considered to be a sample from a
    probability distribution.
  • Typically, we don't expect perfect learning, only
    probably correct learning.
  • Statistical concepts are the key to measuring our
    expected performance on novel problem instances.
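One way to make the last point concrete: if test examples are an i.i.d. sample from the same distribution as future instances, the test error rate estimates the true error, and its uncertainty can be quantified. The sketch below uses the common normal approximation to the binomial; the function name and the 12-out-of-200 figures are hypothetical.

```python
import math


def error_interval(n_errors, n_test, z=1.96):
    """Sample error rate plus an approximate confidence interval.

    Assumes the n_test examples are drawn i.i.d. from the same distribution
    as future instances. Uses the normal approximation to the binomial,
    reasonable when n_test is large; z=1.96 corresponds to roughly 95%.
    """
    p = n_errors / n_test
    half_width = z * math.sqrt(p * (1 - p) / n_test)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical result: 12 mistakes on 200 held-out examples.
rate, low, high = error_interval(12, 200)
print(f"error = {rate:.3f}, approx. 95% interval = [{low:.3f}, {high:.3f}]")
```

The interval narrows roughly as 1/sqrt(n_test), which is why a larger test set gives a more trustworthy estimate of performance on novel instances.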

7
Induction and inference
  • Induction: generalizing from specific examples.
  • Inference: drawing conclusions from possibly
    incomplete knowledge.
  • Learning machines need to do both.

8
Inductive learning
  • Data produced by target.
  • Hypothesis learned from data in order to
    explain, predict, model or control target.
  • Generalisation ability is essential.
  • Inductive learning hypothesis: if the hypothesis
    works for enough data, then it will work on new
    examples.
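A small illustration of the inductive learning hypothesis under assumed conditions: the learner sees only noisy samples of a target (here a hypothetical line, 2x + 1), fits a least-squares hypothesis, and is then checked on inputs it never saw. Least squares is chosen only for simplicity; the slide does not prescribe a method, and `fit_line` and `target` are illustrative names.

```python
import random


def fit_line(xs, ys):
    """Least-squares hypothesis h(x) = a*x + b fitted to the training data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def target(x):
    # Hypothetical target, unknown to the learner; only noisy samples are seen.
    return 2.0 * x + 1.0


rng = random.Random(0)
train_x = [rng.uniform(0, 10) for _ in range(50)]
train_y = [target(x) + rng.gauss(0, 0.5) for x in train_x]
a, b = fit_line(train_x, train_y)

# Generalisation check: compare h with the target on inputs not used in training.
test_x = [rng.uniform(0, 10) for _ in range(20)]
mse = sum((a * x + b - target(x)) ** 2 for x in test_x) / len(test_x)
print(f"h(x) = {a:.2f}x + {b:.2f}, mean squared error on new inputs = {mse:.4f}")
```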

9
Example 1: Hand-written digits
  • Data representation: greyscale images
  • Task: classification (0, 1, 2, 3, ..., 9)
  • Problem features:
    • Highly variable inputs from the same class,
      including some weird inputs
    • Imperfect human classification
    • High cost associated with errors, so a "don't
      know" answer may be useful
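The "don't know" option can be sketched as a reject rule on top of any classifier that outputs class probabilities. This is a minimal sketch: the function name, the posterior values and the 0.9 threshold are assumptions for illustration, not values from the lecture.

```python
def classify_with_reject(posteriors, threshold=0.9):
    """Return the most probable class, or None ("don't know") if not confident enough.

    posteriors: dict mapping class label -> estimated P(class | image).
    threshold is a hypothetical tuning parameter; in practice it would be set
    from the relative costs of a wrong answer and of a rejection.
    """
    label, p = max(posteriors.items(), key=lambda kv: kv[1])
    return label if p >= threshold else None


print(classify_with_reject({3: 0.97, 8: 0.02, 5: 0.01}))  # 3
print(classify_with_reject({3: 0.55, 8: 0.45}))           # None: defer to a human
```

Raising the threshold trades more rejections for fewer costly errors.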

11
Example 2: Speech recognition
  • Data representation: features from spectral
    analysis of speech signals (two in this simple
    example)
  • Task: classification of vowel sounds in words of
    the form h-?-d (e.g. "heed", "hid", "head")
  • Problem features:
    • Highly variable data with the same classification
    • Good feature selection is very important
    • Speech recognition is often broken into a number
      of smaller tasks like this
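With only two features per example, each utterance is a point in the plane and classification amounts to dividing that plane into regions. A nearest-centroid rule is one of the simplest ways to do this; it is used here purely as an illustration (the lecture may use a different classifier), and the feature values are made up.

```python
import math


def train_centroids(examples):
    """examples: list of ((f1, f2), label). Returns the mean feature vector per class."""
    sums, counts = {}, {}
    for (f1, f2), label in examples:
        s = sums.setdefault(label, [0.0, 0.0])
        s[0] += f1
        s[1] += f2
        counts[label] = counts.get(label, 0) + 1
    return {lab: (s[0] / counts[lab], s[1] / counts[lab]) for lab, s in sums.items()}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda lab: math.dist(x, centroids[lab]))


# Synthetic two-feature data for two vowel classes (made-up values).
train = [((300, 2300), "heed"), ((320, 2200), "heed"),
         ((700, 1200), "hod"), ((680, 1100), "hod")]
centroids = train_centroids(train)
print(predict(centroids, (310, 2250)))  # heed
```

The decision boundary of this rule is the perpendicular bisector between the two centroids, so it works only when classes form compact, well-separated clouds; highly variable data usually needs a more flexible model.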

13
Example 3: DNA microarrays
  • DNA from 10,000 genes is attached to a glass
    slide (the microarray).
  • Green and red labels are attached to mRNA from two
    different samples.
  • The mRNA is hybridized (stuck) to the DNA on the
    chip, and the green/red ratio is used to measure
    the relative abundance of gene products.
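Ratios are often analysed on a log scale so that up- and down-regulation are symmetric around zero. The helper below is a hypothetical sketch of that convention (log base 2 is an assumption, not stated on the slide) and assumes intensities are already background-corrected and positive.

```python
import math


def log_ratios(green, red):
    """Per-gene log2(green / red) expression ratios.

    0 means equal abundance in both samples; +1 means twice as much in the
    green-labelled sample, -1 twice as much in the red-labelled one.
    Assumes both intensity lists are background-corrected and positive.
    """
    return [math.log2(g / r) for g, r in zip(green, red)]


print(log_ratios([200.0, 50.0, 1000.0], [100.0, 50.0, 4000.0]))  # [1.0, 0.0, -2.0]
```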

15
DNA microarrays
  • Data representation: 10,000 green/red intensity
    levels ranging from 10 to 10,000.
  • Tasks: sample classification, gene
    classification, visualisation and clustering of
    genes/samples.
  • Problem features:
    • High-dimensional data but a relatively small
      number of examples.
    • Extremely noisy data (noise signal).
    • Lack of good domain knowledge.

16
Projection of 10,000-dimensional data onto 2D
using PCA effectively separates cancer subtypes.
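PCA finds the orthogonal directions of largest variance; projecting onto the first two gives a 2D picture like the one described. The sketch below is a pure-Python illustration using power iteration on the covariance matrix, with synthetic data (two groups in 5 dimensions) standing in for real expression profiles; `pca_2d` is a hypothetical helper. For 10,000 dimensions, building a 10,000 × 10,000 covariance matrix would be wasteful, and a library SVD such as `numpy.linalg.svd` on the centred data is the usual choice.

```python
import random


def pca_2d(rows, iters=200, seed=0):
    """Project rows onto their top two principal components.

    Centres the data, forms the sample covariance matrix, and finds the
    leading eigenvectors by power iteration with deflation.
    Returns (projected_rows, components).
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    X = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(x[i] * x[j] for x in X) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(2):
        v = [rng.random() for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(c * c for c in w) ** 0.5
            v = [c / norm for c in w]
        # Rayleigh quotient gives the eigenvalue (variance) along v.
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        components.append(v)
        # Deflate so the next pass converges to the next component.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    projected = [[sum(x[j] * c[j] for j in range(d)) for c in components] for x in X]
    return projected, components


# Synthetic stand-in for expression data: two groups of 10 samples in 5 dimensions.
rng = random.Random(1)
group_a = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(10)]
group_b = [[rng.gauss(5, 1) for _ in range(5)] for _ in range(10)]
proj, comps = pca_2d(group_a + group_b)
for point in proj:
    print(f"{point[0]:7.2f} {point[1]:7.2f}")
```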
17
Probabilistic models
  • A large part of the module will deal with methods
    that have an explicit probabilistic
    interpretation:
    • Good for dealing with uncertainty (e.g. is a
      handwritten digit a three or an eight?)
    • Provides interpretable results
    • Unifies methods from different fields
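The "three or eight?" question is answered probabilistically via Bayes' rule: combine the prior probability of each class with the likelihood of the observed image under that class, then normalise. The `posterior` helper and the numbers below are hypothetical; in practice the likelihoods would come from a fitted model.

```python
def posterior(priors, likelihoods):
    """Bayes' rule: P(c | x) = p(x | c) P(c) / sum_k p(x | k) P(k)."""
    joint = {c: likelihoods[c] * priors[c] for c in priors}
    evidence = sum(joint.values())
    return {c: joint[c] / evidence for c in joint}


# Hypothetical numbers for one ambiguous image x.
priors = {"three": 0.5, "eight": 0.5}
likelihoods = {"three": 0.06, "eight": 0.02}  # p(x | class), e.g. from a density model
print(posterior(priors, likelihoods))  # three about 0.75, eight about 0.25
```

The output is a graded belief rather than a hard label, which is what makes it useful for decisions such as the reject option in Example 1.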

18
Relevant disciplines
  • Algorithms
  • Artificial intelligence
  • Control
  • Statistics
  • Information theory
  • Dynamical systems
  • Neurobiology
  • Signal processing
  • Linear algebra
  • Etc.
  • Researchers in machine learning come from a
    variety of backgrounds.

19
Textbooks
  • E. Alpaydin's Introduction to Machine Learning
  • T. Mitchell's Machine Learning

20
Supervised Learning: Uses
  • Prediction of future cases
  • Knowledge extraction
  • Compression
  • Outlier detection

21
Unsupervised Learning
  • Clustering: grouping similar instances
  • Example applications:
    • Customer segmentation in CRM
    • Learning motifs in bioinformatics
    • Clustering items based on similarity
    • Clustering users based on interests
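Clustering can be illustrated with k-means (Lloyd's algorithm), one common method, chosen here as an example rather than as the method taught in the module. The two-dimensional "customer" points are invented; think of them as two hypothetical features such as spend and visit frequency.

```python
import math
import random


def k_means(points, k, iters=100, seed=0):
    """Lloyd's k-means: alternate assigning points to the nearest centre
    and moving each centre to the mean of its assigned points."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centres[i]))
            clusters[nearest].append(p)
        # Keep the old centre if a cluster ends up empty.
        new_centres = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centres[i]
            for i, c in enumerate(clusters)
        ]
        if new_centres == centres:  # assignments have stabilised
            break
        centres = new_centres
    return centres, clusters


# Hypothetical customers described by two features.
customers = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
centres, groups = k_means(customers, k=2)
print(groups)
```

k-means needs k chosen in advance and can land in a poor local optimum on harder data, so it is commonly run from several random starts.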

22
Reinforcement Learning
  • Learning a policy: a sequence of outputs
  • No supervised output but delayed reward
  • Credit assignment problem
  • Game playing
  • Robot in a maze
  • Multiple agents, partial observability
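Delayed reward and credit assignment can be seen in a tiny example. The sketch below uses tabular Q-learning, one standard reinforcement learning algorithm (not named on the slide), on a hypothetical five-cell corridor standing in for "robot in a maze": the only reward arrives at the goal, and the discounted update passes credit back to earlier moves. All parameters (alpha, gamma, epsilon, episode count) are illustrative assumptions.

```python
import random

N_STATES = 5        # hypothetical corridor: cells 0..4, cell 4 is the goal
ACTIONS = (-1, +1)  # move left or move right


def step(state, action):
    """Toy deterministic environment: the only reward is 1 on reaching the goal."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def greedy(Q, s, rng):
    """Best action under the current estimates, breaking ties at random."""
    best = max(Q[(s, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if Q[(s, a)] == best])


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy: usually exploit, occasionally explore.
            a = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(Q, s, rng)
            s2, r, done = step(s, a)
            # Bootstrapped target: the delayed reward is passed back one step at a
            # time, which is how Q-learning tackles the credit assignment problem.
            target = r if done else r + gamma * max(Q[(s2, b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q


Q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)  # expected after training: +1 (move right) in every non-goal cell
```

No individual move is labelled right or wrong; the agent infers from the eventual reward which earlier actions deserve credit.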