Title: Multimedia search: From Lab to Web
1. KI2 - 5
- Computational Learning Theory
- PAC
- IID
- VC Dimension
- SVM
Marius Bulacu
prof. dr. Lambert Schomaker
Kunstmatige Intelligentie / RuG
2. Learning
- Learning is essential for unknown environments
  - i.e., when designer lacks omniscience
- Learning is useful as a system construction method
  - i.e., expose the agent to reality rather than trying to write it down
- Learning modifies the agent's decision mechanisms to improve performance
3. Learning Agents
4. Learning Element
- Design of a learning element is affected by
  - Which components of the performance element are to be learned
  - What feedback is available to learn these components
  - What representation is used for the components
- Type of feedback
  - Supervised learning: correct answers for each example
  - Unsupervised learning: correct answers not given
  - Reinforcement learning: occasional rewards
5. Inductive Learning
- Simplest form: learn a function from examples
  - f is the target function
  - an example is a pair (x, f(x))
- Problem: find a hypothesis h
  - such that h ≈ f
  - given a training set of examples
- This is a highly simplified model of real learning
  - ignores prior knowledge
  - assumes examples are given
6. Inductive Learning Method
- Construct/adjust h to agree with f on the training set
- (h is consistent if it agrees with f on all examples)
- E.g., curve fitting
7-11. Inductive Learning Method
- Construct/adjust h to agree with f on the training set
- (h is consistent if it agrees with f on all examples)
- E.g., curve fitting (slides 7-11 repeat this text; the original figures show different candidate curves fitted to the same data points)
Occam's razor: prefer the simplest hypothesis consistent with the data
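To make the curve-fitting picture concrete, here is a minimal sketch (my own illustration, not from the slides) using hypothetical data points: polynomials of increasing degree are fitted to the same examples, the highest degree interpolates them exactly, and yet Occam's razor favours the simplest hypothesis that fits.

```python
# Minimal sketch: fit polynomials of increasing degree to the same points.
# A degree-4 polynomial passes through all 5 points (a consistent hypothesis),
# but Occam's razor prefers the nearly-linear fit that explains them as well.
import numpy as np

# hypothetical training examples (x, f(x))
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 0.9, 2.1, 2.9, 4.2])

for degree in (1, 2, 4):
    coeffs = np.polyfit(x, y, degree)                 # least-squares fit
    max_err = np.abs(np.polyval(coeffs, x) - y).max() # training error
    print(f"degree {degree}: max training error = {max_err:.3f}")
```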
12. Occam's Razor
- If two theories explain the facts equally well, then the simpler theory is to be preferred.
- Rationale
  - There are fewer short hypotheses than long hypotheses.
  - A short hypothesis that fits the data is unlikely to be a coincidence.
  - A long hypothesis that fits the data may be a coincidence.
- Formal treatment in computational learning theory
William of Occam (1285-1349, England)
13. The Problem
- Why does learning work?
- How do we know that the learned hypothesis h is close to the target function f if we do not know what f is?
The answer is provided by computational learning theory.
14. The Answer
- Any hypothesis h that is consistent with a sufficiently large number of training examples is unlikely to be seriously wrong.
Therefore it must be Probably Approximately Correct (PAC).
15. The Stationarity Assumption
- The training and test sets are drawn randomly from the same population of examples using the same probability distribution.
Therefore training and test data are Independently and Identically Distributed (IID): the future is like the past.
16. How many examples are needed?
- Probability that some wrong hypothesis is consistent with all examples: at most |H| (1 - ε)^N
  - |H|: size of the hypothesis space
  - N: number of examples
  - ε: probability that h and f disagree on a randomly drawn example
- Requiring this probability to stay below a small δ gives the sample complexity: N ≥ (1/ε) (ln(1/δ) + ln |H|)
17. Formal Derivation
- H: the set of all possible hypotheses
- H_bad: the set of wrong hypotheses, i.e., those that disagree with f on a randomly drawn example with probability greater than ε
- A hypothesis in H_bad is consistent with one random example with probability at most (1 - ε), hence with N independent examples with probability at most (1 - ε)^N
- P(H_bad contains a hypothesis consistent with all N examples) ≤ |H_bad| (1 - ε)^N ≤ |H| (1 - ε)^N
- Setting this bound ≤ δ and solving for N yields N ≥ (1/ε) (ln(1/δ) + ln |H|)
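A minimal sketch (not part of the slides) that plugs numbers into this bound; the choices of ε, δ and |H| below are illustrative assumptions.

```python
# PAC sample-complexity bound for a finite hypothesis space:
#   N >= (1/eps) * (ln(1/delta) + ln|H|)
from math import ceil, log

def pac_sample_size(eps: float, delta: float, h_size: int) -> int:
    """Examples needed so that, with probability >= 1 - delta, any hypothesis
    consistent with them all has error <= eps."""
    return ceil((log(1.0 / delta) + log(h_size)) / eps)

# e.g. all Boolean functions of n attributes: |H| = 2 ** (2 ** n)
for n in (5, 10):
    print(n, pac_sample_size(eps=0.1, delta=0.05, h_size=2 ** (2 ** n)))
```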
18. What if the hypothesis space is infinite?
- Can't use our result for finite H
- Need some other measure of complexity for H
  - Vapnik-Chervonenkis (VC) dimension
19-21. (Figure slides; no transcript available)
22. Shattering two binary dimensions over a number of classes
- In order to understand the principle of shattering sample points into classes, we will look at the simple case of
  - two dimensions
  - of binary value
23-37. A sequence of figure slides, each showing the same 2×2 binary feature space (axes f1 and f2, values 0 and 1); the titles indicate the class configuration drawn:
- 23. 2-D feature space
- 24. 2-D feature space, 2 classes
- 25. the other class
- 26. 2 left vs 2 right
- 27. top vs bottom
- 28. right vs left
- 29. bottom vs top
- 30. lower-right outlier
- 31. lower-left outlier
- 32. upper-left outlier
- 33. upper-right outlier
- 34. etc.
- 35-37. 2-D feature space
38-39. XOR configuration A / XOR configuration B
[Figures: the two diagonal (XOR) labelings of the 2×2 binary feature space, axes f1 and f2]
40. 2-D feature space, two classes: 16 hypotheses
[Table: hypotheses numbered 0-15, one column per hypothesis; rows give the class assigned to the cells f1 = 0/1, f2 = 0/1]
- A hypothesis is a possible class partitioning of all data samples; with 4 cells and 2 classes there are 2^4 = 16 of them.
41. 2-D feature space, two classes, 16 hypotheses
[Table: the same 16 hypotheses]
- The two XOR class configurations (2 of the 16 hypotheses) require a non-linear separatrix, as the sketch below verifies.
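A minimal sketch (my own illustration, not from the slides) that enumerates all 16 labelings of the four cells and brute-forces a linear separatrix over a small candidate grid of weights; the two labelings for which none is found are exactly the XOR configurations, which are known to be non-linearly separable.

```python
# Enumerate all 2^4 = 16 two-class labelings ("hypotheses") of the four cells
# of the binary f1 x f2 space and test which ones a linear separatrix
# sign(w1*f1 + w2*f2 + b) can realize. For these four points the small
# candidate grid below finds a separator whenever one exists.
from itertools import product

points = [(0, 0), (0, 1), (1, 0), (1, 1)]          # the four cells
weights = [-1.0, 0.0, 1.0]
biases = [-1.5, -0.5, 0.5, 1.5]

def linearly_separable(labels):
    for w1, w2, b in product(weights, weights, biases):
        if all((w1 * f1 + w2 * f2 + b > 0) == bool(c)
               for (f1, f2), c in zip(points, labels)):
            return True
    return False

non_separable = [labels for labels in product((0, 1), repeat=4)
                 if not linearly_separable(labels)]
print(len(non_separable), non_separable)
# -> 2 [(0, 1, 1, 0), (1, 0, 0, 1)]   i.e. the two XOR configurations
```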
42-43. XOR, a possible non-linear separation
[Figures: non-linear separation boundaries for the XOR configuration in the 2×2 binary feature space, axes f1 and f2]
44-45. 2-D feature space, three classes: how many hypotheses?
[Table: enumeration of hypotheses 0, 1, 2, ... over the cells f1 = 0/1, f2 = 0/1]
- 3^4 = 81 possible hypotheses
46. Maximum, discrete space
- Four classes: 4^4 = 256 hypotheses
- Assume that there are no more classes than discrete cells
- N_hyp,max = n_classes^n_cells (each of the n_cells cells can be assigned any of the n_classes labels)
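A quick check (my own, not from the slides) of the counts quoted on slides 40-46, reading the formula as n_classes raised to the power n_cells, which is the order consistent with 16, 81 and 256:

```python
# Number of possible class partitionings of a discretised feature space:
# each of the n_cells cells independently takes one of n_classes labels.
def max_hypotheses(n_cells: int, n_classes: int) -> int:
    return n_classes ** n_cells

print(max_hypotheses(4, 2), max_hypotheses(4, 3), max_hypotheses(4, 4))
# -> 16 81 256
```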
47. 2-D feature space, three classes
In this example, one class is linearly separable from the rest, as is a second; but the third class is not linearly separable from the rest of the classes.
[Figure: 2×2 binary feature space, axes f1 and f2, with three class labels]
48. 2-D feature space, four classes
Minsky & Papert: simple table lookup or logic will do nicely.
[Figure: 2×2 binary feature space, axes f1 and f2, with four class labels]
49. 2-D feature space, four classes
Spheres or radial-basis functions may offer a compact class encapsulation in case of limited noise and limited overlap (but in the end the data will tell: experimentation required!)
[Figure: 2×2 binary feature space, axes f1 and f2, with class regions enclosed by circular boundaries]
50. SVM (1): Kernels
- Implicit mapping to a higher-dimensional space where linear separation is possible.
[Figure: left, the original f1 × f2 space with a complicated separation boundary; right, the mapped f1 × f2 × f3 space with a simple separation boundary (a hyperplane)]
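A minimal sketch (my own illustration, not from the slides) of an explicit version of such a mapping: adding a third feature f3 = f1·f2 to the XOR data makes the two classes linearly separable, which is the effect a kernel achieves implicitly.

```python
# The explicit map (f1, f2) -> (f1, f2, f1*f2) makes the XOR labeling
# linearly separable; kernels perform such mappings without building them.
points = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0]                                # XOR configuration

def phi(f1, f2):
    return (f1, f2, f1 * f2)                         # mapped 3-D feature vector

# the hyperplane f1 + f2 - 2*f3 - 0.5 = 0 separates the mapped points
w, b = (1.0, 1.0, -2.0), -0.5
for (f1, f2), c in zip(points, labels):
    score = sum(wi * xi for wi, xi in zip(w, phi(f1, f2))) + b
    print((f1, f2), "class", c, "predicted positive:", score > 0)
```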
51. SVM (2): Max Margin
[Figure: f1 × f2 space showing the best separating hyperplane, its max margin, and the support vectors lying on the margin; a large margin gives good generalization]
- From all the possible separating hyperplanes, select the one that gives the max margin.
- The solution is found by quadratic optimization (learning).
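To close, a minimal sketch (not from the slides) that fits a max-margin linear SVM on illustrative toy data with scikit-learn; the quadratic-optimization step is handled inside SVC.fit, and support_vectors_ exposes the points that define the margin.

```python
# Fit a max-margin (linear-kernel) SVM on toy 2-D data with scikit-learn.
import numpy as np
from sklearn.svm import SVC

# two illustrative point clouds in the (f1, f2) plane
X = np.array([[0.0, 0.2], [0.3, 0.1], [0.2, 0.4],     # class 0
              [1.0, 0.9], [0.8, 1.1], [1.1, 0.8]])    # class 1
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1e3)   # large C approximates a hard margin
clf.fit(X, y)

print("support vectors:\n", clf.support_vectors_)
print("separating hyperplane: w =", clf.coef_[0], " b =", clf.intercept_[0])
print("prediction for (0.5, 0.5):", clf.predict([[0.5, 0.5]])[0])
```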