Multimedia search: From Lab to Web

Transcript and Presenter's Notes


1
KI2 - 5
  • Computational Learning Theory
  • PAC
  • IID
  • VC Dimension
  • SVM

Marius Bulacu, prof. dr. Lambert Schomaker
Kunstmatige Intelligentie / RuG
2
Learning
  • Learning is essential for unknown environments
  • i.e., when the designer lacks omniscience
  • Learning is useful as a system construction
    method
  • i.e., expose the agent to reality rather than
    trying to write it down
  • Learning modifies the agent's decision
    mechanisms to improve performance

3
Learning Agents
4
Learning Element
  • Design of a learning element is affected by
  • Which components of the performance element are
    to be learned
  • What feedback is available to learn these
    components
  • What representation is used for the components
  • Type of feedback
  • Supervised learning: correct answers for each
    example
  • Unsupervised learning: correct answers not given
  • Reinforcement learning: occasional rewards

5
Inductive Learning
  • Simplest form: learn a function from examples
  • - f is the target function
  • - an example is a pair (x, f(x))
  • Problem: find a hypothesis h
  • such that h ≈ f
  • given a training set of examples
  • This is a highly simplified model of real
    learning
  • - ignores prior knowledge
  • - assumes examples are given

6
Inductive Learning Method
  • Construct/adjust h to agree with f on training
    set
  • (h is consistent if it agrees with f on all
    examples)
  • E.g., curve fitting (a small sketch follows below)
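
A minimal curve-fitting sketch, assuming Python with numpy (the sample data
and the noisy quadratic target are illustrative, not from the slides): adjust
polynomial hypotheses h of increasing degree to agree with f on a small
training set and compare their training errors.

import numpy as np

# Illustrative training examples (x, f(x)); the target f is simulated
# here as a noisy quadratic.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 8)
y = 1.0 + 2.0 * x - 3.0 * x**2 + 0.05 * rng.standard_normal(x.size)

# Candidate hypotheses h: polynomials of increasing degree.
for degree in (1, 2, 3, 6):
    coeffs = np.polyfit(x, y, degree)              # adjust h to fit the examples
    mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    print("degree", degree, "training MSE", round(mse, 5))

# Higher degrees agree with the examples ever more closely; Occam's razor
# (next slides) says to prefer the simplest hypothesis consistent with the data.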

7 - 11
Inductive Learning Method
  (slides 7-11 repeat the bullets of slide 6; the accompanying
  curve-fitting figures are not included in the transcript)

Occam's razor: prefer the simplest hypothesis
consistent with the data
12
Occam's Razor
  • If two theories explain the facts equally well,
    then the simpler theory is to be preferred.
  • Rationale
  • There are fewer short hypotheses than long
    hypotheses.
  • A short hypothesis that fits the data is
    unlikely to be a coincidence.
  • A long hypothesis that fits the data may be a
    coincidence.
  • Formal treatment in computational learning theory

William of Occam (1285-1349, England)
13
The Problem
  • Why does learning work?
  • How do we know that the learned hypothesis h is
    close to the target function f if we do not know
    what f is?

answer provided by computational learning theory
14
The Answer
  • Any hypothesis h that is consistent with a
    sufficiently large number of training examples is
    unlikely to be seriously wrong.

Therefore it must be Probably Approximately
Correct (PAC)
15
The Stationarity Assumption
  • The training and test sets are drawn randomly
    from the same population of examples using the
    same probability distribution.

Therefore training and test data
are Independently and Identically
Distributed (IID): the future is like the past
16
How many examples are needed?
  Quantities in the bound:
  • δ - probability that some wrong hypothesis exists that is consistent
    with all the examples
  • |H| - size of the hypothesis space
  • N - number of examples
  • ε - probability that h and f disagree on a single example
  Sample complexity: N ≥ (1/ε) (ln |H| + ln (1/δ))
  (a small computational sketch follows below)
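
A small computational sketch of the reconstructed bound above, assuming
Python; the hypothesis-space size and the values of ε and δ are illustrative.

import math

def sample_complexity(h_size, eps, delta):
    # N >= (1/eps) * (ln|H| + ln(1/delta)): examples needed so that, with
    # probability at least 1 - delta, any consistent hypothesis has error <= eps.
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / eps)

# e.g. H = all 2**16 Boolean functions of 4 binary features, eps = delta = 0.05
print(sample_complexity(h_size=2**16, eps=0.05, delta=0.05))   # about 282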
17
Formal Derivation
  H (the set of all possible hypotheses)
  H_bad (the hypotheses whose error with respect to f exceeds ε)
  • P(one bad hypothesis is consistent with a single example) ≤ 1 - ε
  • P(one bad hypothesis is consistent with all N examples) ≤ (1 - ε)^N
  • P(H_bad contains a hypothesis consistent with all N examples)
    ≤ |H_bad| (1 - ε)^N ≤ |H| (1 - ε)^N
  • Requiring this to be at most δ, and using 1 - ε ≤ e^(-ε), gives
    N ≥ (1/ε) (ln |H| + ln (1/δ))
18
What if hypothesis space is infinite?
  • Can't use our result for finite H
  • Need some other measure of complexity for H
  • Vapnik-Chervonenkis (VC) dimension: the size of the largest set of
    points that hypotheses in H can shatter (e.g., linear separators in
    the plane can shatter 3 points in general position, but not 4)

19
(No Transcript)
20
(No Transcript)
21
(No Transcript)
22
Shattering two binary dimensions over a number of
classes
  • In order to understand the principle of
    shattering sample points into classes, we will
    look at the simple case of
  • two dimensions
  • of binary value

23
2-D feature space
(figure: the four cells of the binary f1-f2 plane; slides 23-39 repeat this
figure with the two-class assignment named in each slide title)
24
2-D feature space, 2 classes
25
the other class
26
2 left vs 2 right
27
top vs bottom
28
right vs left
29
bottom vs top
30
lower-right outlier
31
lower-left outlier
32
upper-left outlier
33
upper-right outlier
34
etc.
35
2-D feature space
36
2-D feature space
37
2-D feature space
38
XOR configuration A
39
XOR configuration B
40
2-D feature space, two classes: 16 hypotheses
(table: the 16 possible two-class labelings, numbered 0-15, of the four
cells of the binary f1-f2 space)
  • hypothesis = a possible class partitioning of all data samples
41
2-D feature space, two classes, 16 hypotheses
(same table as slide 40)
  • the two XOR class configurations (2/16 of the hypotheses) require a
    non-linear separatrix (see the enumeration sketch below)
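
A small enumeration sketch of the claim above, assuming Python with numpy and
scipy (the helper name linearly_separable is mine): each of the 16 two-class
labelings of the four cells is tested for linear separability by checking
feasibility of y_i (w . x_i + b) >= 1 with a linear program; the two
infeasible cases are the XOR configurations.

from itertools import product
import numpy as np
from scipy.optimize import linprog

cells = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

def linearly_separable(labels):
    # Feasibility LP in (w1, w2, b): y_i * (w . x_i + b) >= 1 for all cells.
    y = np.asarray(labels, dtype=float)
    A_ub = -y[:, None] * np.hstack([cells, np.ones((4, 1))])
    res = linprog(c=[0.0, 0.0, 0.0], A_ub=A_ub, b_ub=-np.ones(4),
                  bounds=[(None, None)] * 3)
    return res.success

non_separable = [lab for lab in product([-1, 1], repeat=4)
                 if not linearly_separable(lab)]
print(len(non_separable), "of 16 labelings need a non-linear separatrix")
print(non_separable)   # the two XOR configurations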
42
XOR, a possible non-linear separation
(figure: a non-linear separatrix isolating the XOR configuration in the
f1-f2 plane)
43
XOR, a possible non-linear separation
(figure: another possible non-linear separatrix in the f1-f2 plane)
44
2-D feature space, three classes, hypotheses?
(table: three-class labelings of the four cells, numbered 0, 1, 2, ...)

45
2-D feature space, three classes, hypotheses?
(same table)

3^4 = 81 possible hypotheses
46
Maximum, discrete space
  • Four classes: 4^4 = 256 hypotheses
  • Assume that there are no more classes than
    discrete cells
  • N_hyp,max = n_classes ^ n_cells (a one-line check follows below)
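
A one-line check of the counts above, assuming Python; it reproduces the 16,
81 and 256 hypotheses from the preceding slides.

# Every assignment of one of n_classes labels to each of n_cells cells is a
# hypothesis, so there are n_classes ** n_cells of them.
for n_classes, n_cells in [(2, 4), (3, 4), (4, 4)]:
    print(n_classes, "classes,", n_cells, "cells:", n_classes ** n_cells, "hypotheses")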

47
2-D feature space, three classes
In this example, one class is linearly separable from the rest, as is a
second, but the third class is not linearly separable from the rest of
the classes.
(figure: three classes in the binary f1-f2 plane)
48
2-D feature space, four classes
(figure: four classes in the binary f1-f2 plane)
  • Minsky & Papert: simple table lookup or logic will do nicely.
49
2-D feature space, four classes
(figure: four classes in the f1-f2 plane)
  • Spheres or radial-basis functions may offer a compact class
    encapsulation in case of limited noise and limited overlap (but in
    the end the data will tell: experimentation required!)
    A small sketch follows below.
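
A minimal sketch of the radial-basis idea, assuming Python with numpy and
scikit-learn (the four Gaussian clusters are illustrative data, not from the
slides): an RBF-kernel SVM wraps each class in smooth, sphere-like regions,
which works well when noise and overlap are limited.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
centers = np.array([[0, 0], [0, 3], [3, 0], [3, 3]], dtype=float)
X = np.vstack([c + 0.3 * rng.standard_normal((20, 2)) for c in centers])
y = np.repeat([0, 1, 2, 3], 20)                     # four classes

clf = SVC(kernel="rbf", gamma=1.0, C=10.0).fit(X, y)
print("training accuracy:", clf.score(X, y))        # expected to be near 1.0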
50
SVM (1): Kernels
  • Implicit mapping to a higher-dimensional space
    where linear separation is possible (a small sketch follows below).

(figure: left panel, axes f1 and f2 - a complicated separation boundary in
the original space; right panel, axes f1, f2, f3 - a simple separation
boundary, a hyperplane, after the mapping)
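
A minimal sketch of the implicit-mapping idea, assuming Python with numpy
(the added feature f1*f2 and the hyperplane weights are my illustrative
choices, not from the slides): the XOR labeling of the four cells is not
linearly separable in (f1, f2), but after mapping each point to
(f1, f2, f1*f2) a single hyperplane separates the classes; kernel methods
obtain the same effect implicitly, without building the mapped features.

import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1])            # XOR labeling of the four cells

# Explicit map to 3-D: append the product feature f3 = f1 * f2.
Phi = np.hstack([X, (X[:, 0] * X[:, 1])[:, None]])

# A hand-picked separating hyperplane in the mapped space.
w, b = np.array([1.0, 1.0, -2.0]), -0.5
print(np.sign(Phi @ w + b))             # [-1.  1.  1. -1.], matching y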
51
SVM (2): Max Margin
(figure: two classes in the f1-f2 plane, the best separating hyperplane,
the maximum-margin band, and the support vectors; a large margin gives
good generalization)
  • From all the possible separating hyperplanes,
    select the one that gives the maximum margin.
  • Solution found by quadratic optimization (learning);
    a small sketch follows below.
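
A minimal sketch of max-margin training, assuming Python with numpy and
scikit-learn (the six sample points are illustrative): a linear SVM with a
large C approximates the hard-margin solution found by quadratic
optimization, and the support vectors and margin width 2/||w|| can be read
off the fitted model.

import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.2],
              [3.0, 3.0], [3.0, 4.0], [4.0, 3.2]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)     # large C ~ hard margin
w = clf.coef_[0]

print("support vectors:\n", clf.support_vectors_)
print("margin width:", 2.0 / np.linalg.norm(w)) # the maximized margin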