1
Chapter 6 The Structural Risk Minimization
Principle
  • Junping Zhang
  • jpzhang@fudan.edu.cn
  • Intelligent Information Processing Laboratory,
    Fudan University
  • March 23, 2004

2
Objectives
3
Structural risk minimization
4
Two other induction principles
5
The Scheme of the SRM induction principle
6
Real-Valued functions
13
Principle of SRM
21
SRM
23
Minimum Description Length and SRM inductive
principles
  • The idea about the Nature of Random Phenomena
  • Minimum Description Length Principle for the
    Pattern Recognition Problem
  • Bounds for the MDL
  • SRM for the simplest Model and MDL
  • The Shortcoming of the MDL

24
The idea about the Nature of Random Phenomena
  • Probability theory (1930s, Kolmogorov)
  • Formal inference
  • The axiomatization does not consider the nature of randomness
  • The axioms take probability measures as given

25
The idea about the Nature of Random Phenomena
  • The model of randomness
  • Solomonoff (1965), Kolmogorov (1965), Chaitin (1966)
  • Algorithmic (descriptive) complexity: the length of the shortest binary computer program that describes the object
  • Up to an additive constant, it does not depend on the type of computer
  • It is a universal characteristic of the object

26
  • A relatively long string describing an object is random if the algorithmic complexity of the object is high, i.e., if the given description of the object cannot be compressed significantly (a formal sketch follows below)
  • MML (Wallace and Boulton, 1968); MDL (Rissanen, 1978)
  • Algorithmic complexity is the main tool of inductive inference for learning machines
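
A minimal formal sketch of these notions, following the standard Solomonoff-Kolmogorov-Chaitin formulation (the slides' own formulas were not transcribed):

```latex
% Algorithmic (descriptive) complexity of a binary string s with
% respect to a universal computer U: the length of the shortest
% program p that makes U output s.
K_U(s) = \min \{\, \lvert p \rvert : U(p) = s \,\}

% Invariance: for universal computers U and V there is a constant
% c_{U,V}, independent of s, such that
\lvert K_U(s) - K_V(s) \rvert \le c_{U,V}

% Incompressibility view of randomness: a long string s is random
% if its description cannot be shortened by more than a constant c:
K_U(s) \ge \lvert s \rvert - c
```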

27
Minimum Description Length Principle for the
Pattern Recognition Problem
  • Given l pairs (ω_1, x_1), ..., (ω_l, x_l) containing the vectors x_i and the binary values ω_i ∈ {0, 1}
  • Consider two strings: the binary string ω_1, ..., ω_l (146) and the string of vectors x_1, ..., x_l (147)

28
Question
  • Q: Given the string (147), is the binary string (146) a random object?
  • A: Analyze the complexity of the string (146) in the spirit of the Solomonoff-Kolmogorov-Chaitin ideas

29
Compress its description
  • Since the ω_i, i = 1, ..., l, are binary values, the string (146) is described by l bits (a sketch of the resulting compression coefficient follows below)
  • Since the training pairs were drawn randomly and independently, the value ω_i may depend on the vector x_i but not on the vectors x_j, j ≠ i
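
A sketch of how this description can be compressed, assuming a fixed code book of N tables that contains a table T reproducing the string (146) from (147) without error; the general case adds a correction term for the errors the table makes (the slides' formulas were not transcribed):

```latex
% The string (146) is then described by the index of T in the code
% book, i.e. by
\lceil \log_2 N \rceil \text{ bits instead of } \ell \text{ bits,}

% so the compression coefficient of the table T is
K(T) = \frac{\lceil \log_2 N \rceil}{\ell} < 1 .
```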

30
Model
34
General Case: the code book does not contain the perfect table.
36
Randomness
37
Bounds for the MDL
  • Q: Does the compression coefficient K(T) determine the probability of test error when classifying (decoding) the vectors x by the table T?
  • A: Yes (the bound is sketched below)
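
A sketch of the bound in the form it takes in Vapnik's MDL theorem, reconstructed from that formulation since the slide equations were not transcribed: for a table T with compression coefficient K(T) on a training string of length l,

```latex
% With probability at least 1 - \eta, the probability of error when
% classifying vectors x by the table T is bounded by
R(T) \le 2 \left( K(T) \ln 2 - \frac{\ln \eta}{\ell} \right).
```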

38
Comparison between the MDL and ERM in the
simplest model
41
SRM for the simplest Model and MDL
44
The power of compression coefficient
  • To obtain a bound on the probability of error, only information about the compression coefficient needs to be known

45
The power of the compression coefficient
  • To obtain the bound, one does not need to know (a numerical sketch follows this list):
  • How many examples were used
  • How the structure of the code books was organized
  • Which code book was used and how many tables were in this code book
  • How many errors were made by the table from the code book we used
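
A minimal numerical sketch of evaluating the bound; the function name and the example numbers are ours, and the bound form with explicit l and η follows the reconstruction sketched earlier:

```python
import math

def mdl_error_bound(K, l, eta=0.05):
    """Reconstructed MDL bound (an assumption, see above): with
    probability at least 1 - eta, the test error of a table with
    compression coefficient K on a training string of length l is
    at most 2 * (K * ln 2 - ln(eta) / l)."""
    return 2.0 * (K * math.log(2.0) - math.log(eta) / l)

# Hypothetical example: a code book of N = 1024 tables containing a
# perfect table, so K = log2(N) / l.
l, N = 1000, 1024
K = math.log2(N) / l                 # = 0.01
print(mdl_error_bound(K, l))         # about 0.0199
```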

46
MDL principle
  • To minimize the probability of error, one has to minimize the compression coefficient

47
The shortcoming of the MDL
  • MDL uses code books with a finite number of tables
  • If the set of functions depends continuously on parameters, one has to first quantize that set to make the tables

48
Quantization
  • How do we make a smart quantization for a given number of observations?
  • For a given set of functions, how can we construct a code book with a small number of tables but with good approximation ability? (a toy sketch follows below)
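
A toy sketch of such a quantization for the illustrative (assumed) family of one-dimensional threshold classifiers; all names here are ours:

```python
import numpy as np

def threshold_codebook(x_train, n_tables=32):
    """Quantize the continuous family of threshold classifiers
    f_t(x) = 1[x >= t] into a finite code book: each table is the
    vector of labels the quantized classifier assigns to the
    training inputs, with n_tables thresholds spread over the
    empirical range of the data."""
    thresholds = np.quantile(x_train, np.linspace(0.0, 1.0, n_tables))
    return [(t, (x_train >= t).astype(int)) for t in thresholds]

# A small code book: few tables, yet every labeling achievable by a
# threshold rule on this sample is closely approximated.
x = np.sort(np.random.rand(20))
book = threshold_codebook(x, n_tables=8)
print(len(book), book[0][1])
```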

49
The shortcoming of the MDL
  • Finding a good quantization is extremely difficult; this is the main shortcoming of the MDL principle
  • The MDL principle works well when the problem of constructing reasonable code books has a good solution

50
Consistency of the SRM principle and asymptotic
bounds on the rate of convergence
  • Q: Is the SRM principle consistent?
  • Q: What is the bound on the (asymptotic) rate of convergence?

53
Consistency of the SRM principle.
54
Simplified version
59
Remark
  • To avoid choosing the minimum of functional (156) over the infinite number of elements of the structure, an additional constraint is imposed
  • Choose the minimum over the first l elements of the structure, where l is equal to the number of observations (see the sketch below)
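
A minimal sketch of SRM selection under this constraint. The guaranteed-risk form used here, empirical risk plus a standard VC-type confidence term, is our assumption; the slide's functional (156) itself was not transcribed:

```python
import math

def srm_select(emp_risks, vc_dims, l):
    """Pick the structure element minimizing a guaranteed-risk
    bound, searching only the first l elements as the remark
    requires.

    emp_risks[k] -- empirical risk of the best function in S_k
    vc_dims[k]   -- VC dimension h_k of S_k (nondecreasing in k)
    l            -- number of observations"""
    best_k, best_bound = None, float("inf")
    for k in range(min(l, len(emp_risks))):
        h = vc_dims[k]
        # A standard VC-type confidence term (one common choice).
        confidence = math.sqrt(h * (math.log(2.0 * l / h) + 1.0) / l)
        bound = emp_risks[k] + confidence
        if bound < best_bound:
            best_k, best_bound = k, bound
    return best_k, best_bound

# Hypothetical usage: empirical risk falls as capacity grows;
# SRM balances the two terms.
print(srm_select([0.30, 0.15, 0.10, 0.09], [1, 3, 8, 25], l=100))
```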

62
Discussions and Example
63
  • The rate of convergence is determined by two contradictory requirements on the rule n = n(l)
  • The first summand: the larger n(l), the smaller the deviation
  • The second summand: the larger n(l), the larger the deviation
  • For structures with a known bound on the rate of approximation, select the rule n = n(l) that assures the largest rate of convergence (a schematic form follows below)
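
A schematic of this trade-off. The exact bound on the slides was not transcribed, so the form below is a standard sketch rather than the slide's formula; r_n denotes the rate of approximation of element S_n and h_n its VC dimension:

```latex
% Guaranteed risk of SRM with the rule n = n(l): an approximation
% term that decreases in n plus an estimation term that grows in n.
V(l) \;\asymp\; r_{n(l)} \;+\; \sqrt{\frac{h_{n(l)} \ln l}{l}}
```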

70
Bounds for the regression estimation problem
71
The model of regression estimation by series
expansion
77
Example
80
The problem of approximating functions
91
  • To get a high asymptotic rate of approximation, the only constraint is that the kernel should be a bounded function that can be described as a family of functions possessing finite VC dimension

92
Problem of local risk minimization
96
Local Risk Minimization Model
106
Note
  • Using local risk minimization methods, one probably does not need rich sets of approximating functions, whereas the classical semi-local methods are based on using a set of constant functions

107
Note
  • For local estimation of functions in the one-dimensional case, it is probably enough to consider elements S_k, k = 0, 1, 2, 3, containing the polynomials of degree 0, 1, 2, 3 (a sketch follows below)
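
A minimal sketch of such a local low-degree polynomial estimator; the hard window used as the locality function and all names are our assumptions, since the slides' model was not transcribed:

```python
import numpy as np

def local_poly_fit(x, y, x0, degree=1, width=0.1):
    """Fit a polynomial of the given degree (0..3) by least squares
    on the training points falling in a window around x0 and return
    the fitted value at x0. Degree 0 recovers the classical "local
    constant" (semi-local) method mentioned above."""
    mask = np.abs(x - x0) <= width
    if mask.sum() <= degree:          # too few local points to fit
        return float("nan")
    coeffs = np.polyfit(x[mask] - x0, y[mask], deg=degree)
    return np.polyval(coeffs, 0.0)    # value of the local fit at x0

# Hypothetical usage on noisy samples of a smooth target.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
y = np.sin(2.0 * np.pi * x) + 0.1 * rng.standard_normal(200)
print(local_poly_fit(x, y, x0=0.5, degree=2))
```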

108
Summary
  • MDL
  • SRM
  • Local Risk Functional