Transcript and Presenter's Notes

Title: Part II: Practical Implementations.


1
Part II: Practical Implementations.
2
Modeling the Classes
Stochastic Discrimination
3
Algorithm for Training an SD Classifier
Generate a projectable weak model
Evaluate the model w.r.t. the training set; check enrichment
Check uniformity w.r.t. the existing collection
Add to the discriminant
(A minimal sketch of this loop follows.)
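The loop can be made concrete with a short sketch. Assumptions not in the slides: a two-class problem with labels 1 and 2, axis-aligned rectangles as the projectable weak models, and illustrative thresholds; helper names such as make_random_rectangle are ours.

```python
# Minimal sketch of the training loop above (assumptions: two classes labeled
# 1 and 2, axis-aligned rectangles as projectable weak models, illustrative
# thresholds; helper names are hypothetical, not from the original work).
import numpy as np

rng = np.random.default_rng(0)

def make_random_rectangle(X):
    """Generate a random axis-aligned rectangle inside the data's bounding box."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    a = rng.uniform(lo, hi)
    b = rng.uniform(a, hi)
    return a, b

def covers(model, X):
    a, b = model
    return np.all((X >= a) & (X <= b), axis=1)   # boolean coverage mask

def train_sd(X, y, n_models=100, margin=0.05):
    models = []
    counts = np.zeros(len(X))                    # how often each point is covered so far
    while len(models) < n_models:
        m = make_random_rectangle(X)             # 1. generate projectable weak model
        inside = covers(m, X)
        if not inside.any():
            continue
        p1, p2 = inside[y == 1].mean(), inside[y == 2].mean()
        if p1 - p2 < margin:                     # 2. enrichment check w.r.t. training set
            continue
        if counts.mean() > 0 and counts[inside].mean() > counts.mean():
            continue                             # 3. crude uniformity check vs. collection
        models.append(m)                         # 4. add to the discriminant
        counts += inside
    return models

def discriminant(models, X):
    """Y(x): fraction of accepted models that cover x."""
    return np.mean([covers(m, X) for m in models], axis=0)
```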
4
Dealing with Data Geometry: SD in Practice
5
2D Example
  • Adapted from Kleinberg, PAMI, May 2000

6
  • A random r = 1/2 subset of the feature space:
    one that covers half of all the points

7
  • Watch how many such subsets cover a particular
    point, say, (2,17)

8
After each new subset, note whether (2,17) is inside it and update the running
fraction Y of subsets covering it:
Out: it's in 0/1 models, Y = 0/1 = 0.0
In: it's in 1/2 models, Y = 1/2 = 0.5
In: it's in 2/3 models, Y = 2/3 ≈ 0.67
In: it's in 3/4 models, Y = 3/4 = 0.75
In: it's in 4/5 models, Y = 4/5 = 0.8
In: it's in 5/6 models, Y = 5/6 ≈ 0.83
9
Out: it's in 5/7 models, Y = 5/7 ≈ 0.71
In: it's in 6/8 models, Y = 6/8 = 0.75
In: it's in 7/9 models, Y = 7/9 ≈ 0.78
In: it's in 8/10 models, Y = 8/10 = 0.8
Out: it's in 8/11 models, Y = 8/11 ≈ 0.73
Out: it's in 8/12 models, Y = 8/12 ≈ 0.67
(A short script reproducing this counting process follows.)
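A sketch of the counting process; the 20×20 integer grid used as the feature space and the random seed are illustrative assumptions, not from the slides.

```python
# Track the fraction Y of r = 1/2 subsets that cover one chosen point
# as more subsets are drawn.
import numpy as np

rng = np.random.default_rng(1)
points = np.array([(i, j) for i in range(20) for j in range(20)])
target = np.array([2, 17])          # the point watched on these slides
target_idx = np.flatnonzero((points == target).all(axis=1))[0]

in_count, history = 0, []
for t in range(1, 1001):
    # one r = 1/2 subset: a random half of all points in the space
    subset = rng.choice(len(points), size=len(points) // 2, replace=False)
    in_count += target_idx in subset
    history.append(in_count / t)    # running fraction Y after t subsets

print(history[9], history[99], history[-1])   # Y drifts toward 0.5
```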
10
  • Fraction of r = 1/2 random subsets covering point
    (2,17) as more such subsets are generated

11
  • Fractions of r = 1/2 random subsets covering
    several selected points as more such subsets are
    generated

12
  • Distribution of model coverage for all points in
    space, with 100 models

13
  • Distribution of model coverage for all points in
    space, with 200 models

14
  • Distribution of model coverage for all points in
    space, with 300 models

15
  • Distribution of model coverage for all points in
    space, with 400 models

16
  • Distribution of model coverage for all points in
    space, with 500 models

17
  • Distribution of model coverage for all points in
    space, with 1000 models

18
  • Distribution of model coverage for all points in
    space, with 2000 models

19
  • Distribution of model coverage for all points in
    space, with 5000 models

20
  • Introducing enrichment
  • For any discrimination to happen, the models
    must have some difference in coverage for
    different classes.

21
  • Enforcing enrichment (adding in a bias) requires
    each subset to cover more points of one class
    than of the other (a minimal check is sketched below)

Class distribution
A biased (enriched) weak model
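A weak model's enrichment can be checked directly from its coverage mask. A minimal sketch, assuming two classes labeled 1 and 2; the margin value is illustrative.

```python
# Minimal enrichment check (assumes two classes labeled 1 and 2; the margin
# is illustrative, not from the original slides).
import numpy as np

def is_enriched(covered, y, margin=0.05):
    """covered: boolean mask of training points inside the weak model."""
    p1 = covered[y == 1].mean()   # fraction of class-1 points covered
    p2 = covered[y == 2].mean()   # fraction of class-2 points covered
    return p1 - p2 >= margin      # biased toward class 1
```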
22
  • Distribution of model coverage for points in each
    class, with 100 enriched weak models

23
  • Distribution of model coverage for points in each
    class, with 200 enriched weak models

24
  • Distribution of model coverage for points in each
    class, with 300 enriched weak models

25
  • Distribution of model coverage for points in each
    class, with 400 enriched weak models

26
  • Distribution of model coverage for points in each
    class, with 500 enriched weak models

27
  • Distribution of model coverage for points in each
    class, with 1000 enriched weak models

28
  • Distribution of model coverage for points in each
    class, with 2000 enriched weak models

29
  • Distribution of model coverage for points in each
    class, with 5000 enriched weak models

30
  • The error rate decreases as the number of models
    increases
  • Decision rule: if Y < 0.5 then class 2,
    else class 1 (sketched below)
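In code, the rule is a single thresholding step over the computed discriminant values (the function name is ours):

```python
# The decision rule stated above, applied to an array of discriminant values Y.
import numpy as np

def decide(Y, threshold=0.5):
    return np.where(np.asarray(Y) < threshold, 2, 1)   # low coverage -> class 2
```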

31
  • Sparse Training Data
  • Incomplete knowledge about class distributions

Training Set
Test Set
32
  • Distribution of model coverage for points in each
    class, with 100 enriched weak models

Training Set
Test Set
33
  • Distribution of model coverage for points in each
    class, with 200 enriched weak models

Training Set
Test Set
34
  • Distribution of model coverage for points in each
    class, with 300 enriched weak models

Training Set
Test Set
35
  • Distribution of model coverage for points in each
    class, with 400 enriched weak models

Training Set
Test Set
36
  • Distribution of model coverage for points in each
    class, with 500 enriched weak models

Training Set
Test Set
37
  • Distribution of model coverage for points in each
    class, with 1000 enriched weak models

Training Set
Test Set
38
  • Distribution of model coverage for points in each
    class, with 2000 enriched weak models

Training Set
Test Set
39
  • Distribution of model coverage for points in each
    class, with 5000 enriched weak models

No discrimination!
Training Set
Test Set
40
  • Models of this type, when enriched for the training
    set, are not necessarily enriched for the test set

Training Set
Test Set
Random model with 50% coverage of space
41
  • Introducing projectability
  • Maintain local continuity of class
    interpretations.
  • Neighboring points of the same class should
    share similar model coverage.

42
  • Allow some local continuity in model membership,
    so that interpretation of a training point can
    generalize to its immediate neighborhood

Class distribution
A projectable model
43
  • Distribution of model coverage for points in each
    class, with 100 enriched, projectable weak models

Training Set
Test Set
44
  • Distribution of model coverage for points in each
    class, with 300 enriched, projectable weak models

Training Set
Test Set
45
  • Distribution of model coverage for points in each
    class, with 400 enriched, projectable weak models

Training Set
Test Set
46
  • Distribution of model coverage for points in each
    class, with 500 enriched, projectable weak models

Training Set
Test Set
47
  • Distribution of model coverage for points in each
    class, with 1000 enriched, projectable weak
    models

Training Set
Test Set
48
  • Distribution of model coverage for points in each
    class, with 2000 enriched, projectable weak
    models

Training Set
Test Set
49
  • Distribution of model coverage for points in each
    class, with 5000 enriched, projectable weak
    models

Training Set
Test Set
50
  • Promoting uniformity
  • All points in the same class should be equally
    likely to be covered by a model of each
    particular rating.
  • Retain models that cover points that are still
    under-covered by the current collection
    (a minimal check is sketched below)
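A sketch of the retention heuristic just described; the counts array is assumed to be maintained by the training loop, and the helper name is ours.

```python
# Accept a new model only if the points it covers are, on average, no better
# covered than the training set as a whole.
import numpy as np

def promotes_uniformity(covered, counts):
    """counts[i] = number of already-accepted models covering training point i."""
    if not covered.any():
        return False
    return counts[covered].mean() <= counts.mean()
```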

51
  • Distribution of model coverage for points in each
    class, with 100 enriched, projectable, uniform
    weak models

Training Set
Test Set
52
  • Distribution of model coverage for points in each
    class, with 1000 enriched, projectable, uniform
    weak models

Training Set
Test Set
53
  • Distribution of model coverage for points in each
    class, with 5000 enriched, projectable, uniform
    weak models

Training Set
Test Set
54
  • Distribution of model coverage for points in each
    class, with 10000 enriched, projectable, uniform
    weak models

Training Set
Test Set
55
  • Distribution of model coverage for points in each
    class, with 50000 enriched, projectable, uniform
    weak models

Training Set
Test Set
56
The 3 necessary conditions
Enrichment ↔ Discriminating power
Uniformity ↔ Complementary information
Projectability ↔ Generalization power
57
Extensions and Comparisons
58
Alternative Discriminants
  • Berlind 1994
  • Different discriminants for N-class problems
  • Additional condition on symmetry
  • Approximate uniformity
  • Hierarchy of indiscernibility

59
Estimates of Classification Accuracies
  • Chen 1997
  • Statistical estimates of classification accuracy
    under weaker conditions:
  • Approximate uniformity
  • Approximate indiscernibility

60
Multi-class Problems
  • For n classes, define n discriminants Yi, one
    for each class i vs. all the others
  • Classify an unknown point to the class i for
    which the computed Yi is largest (see the sketch
    below)
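The multi-class rule is an argmax over the per-class discriminants; a minimal sketch with our own naming:

```python
# One discriminant per class (class i vs. the rest); each point goes to the
# class with the largest discriminant value.
import numpy as np

def classify_multiclass(Y):
    """Y[j, i] = value of discriminant Y_i at point j; returns class indices."""
    return np.argmax(np.asarray(Y), axis=1)
```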

61
Ho & Kleinberg, ICPR 1996
62
(No Transcript)
63
(No Transcript)
64
(No Transcript)
65
Open Problems
  • Algorithm for uniformity enforcement
  • Deterministic methods?
  • Desirable form of weak models
  • Fewer, more sophisticated classifiers?
  • Other ways to address the 3-way trade-off
  • Enrichment / Uniformity / Projectability

66
Random Decision Forest
  • Ho, 1995, 1998
  • A structured way to create models: fully split a
    tree and use the leaves as models
  • Perfect enrichment and uniformity on the training set
  • Promote projectability by subspace projection
    (a sketch follows)
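A compressed sketch of the construction, not Ho's original code: it assumes scikit-learn is available, integer class labels, and our own choice of using half of the features per tree.

```python
# Each tree is fully split on a random subspace of features, its leaves act as
# the weak models, and the trees' votes are combined.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_forest(X, y, n_trees=100, n_features=None, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    k = n_features or max(1, d // 2)
    forest = []
    for _ in range(n_trees):
        feats = rng.choice(d, size=k, replace=False)   # random subspace projection
        tree = DecisionTreeClassifier()                # fully split by default
        tree.fit(X[:, feats], y)
        forest.append((feats, tree))
    return forest

def predict(forest, X):
    votes = np.stack([t.predict(X[:, f]) for f, t in forest])
    # plurality vote over the trees (assumes integer class labels)
    return np.apply_along_axis(lambda v: np.bincount(v.astype(int)).argmax(), 0, votes)
```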

67
Compact Distribution Maps
  • Ho & Baird, 1993, 1997
  • Another structured way to create models
  • Start with projectable models obtained by coarse
    quantization of each feature's value range
  • Then seek enrichment and uniformity (see the
    sketch below)
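The quantization step can be illustrated for a single feature; this is a sketch under our own naming, not the papers' exact construction, and the bin count is illustrative.

```python
# Cut one feature's range into a few bins; each bin is a candidate projectable
# model whose enrichment and uniformity can then be evaluated.
import numpy as np

def bin_models(x, n_bins=8):
    """x: one feature column. Returns one boolean coverage mask per bin."""
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    bins = np.digitize(x, edges[1:-1])        # bin index 0 .. n_bins-1 per point
    return [bins == b for b in range(n_bins)]
```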

68
SD & Other Ensemble Methods
  • Ensemble learning via boosting
  • A sequential way to promote uniformity of
    ensemble element coverage
  • XCS (a genetic algorithm)
  • A way to create, filter, and use stochastic
    models that are regions in feature space

69
XCS Classifier System
  • Wilson, 1995
  • A recent focus of the GA community
  • Good performance
  • Reinforcement Learning + Genetic Algorithms
  • Model: a set of rules

if (shape = square and number > 10) then class = red
if (shape = circle and number < 5) then class = yellow
[Diagram: the Environment supplies an input and a reward; the Set of Rules assigns a class; Reinforcement Learning updates the rules and Genetic Algorithms search for new ones]
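A minimal way to hold the two example rules as (condition, class) pairs; real XCS also tracks per-rule fitness, applies reinforcement-learning updates, and uses a GA to discover new rules, all of which is omitted in this sketch.

```python
rules = [
    (lambda s: s["shape"] == "square" and s["number"] > 10, "red"),
    (lambda s: s["shape"] == "circle" and s["number"] < 5, "yellow"),
]

def classify(sample):
    for condition, label in rules:
        if condition(sample):
            return label
    return None   # no rule matches

print(classify({"shape": "square", "number": 12}))   # -> red
```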
70
Multiple Classifier Systems: Examples in Word Image Recognition
71
Complementary Strengths of Classifiers
Rank of true class out of a lexicon of 1091
words, by 10 classifiers for 20 images
  • The case for classifier combination
  • decision fusion
  • mixture of experts
  • committee decision making

72
Classifier Combination Methods
  • Decision Optimization
  • find consensus among a given set of classifiers
  • Coverage Optimization
  • create a set of classifiers that work best with
    a given decision combination function

73
Decision Optimization
  • Develop classifiers with expert knowledge
  • Try to make the best use of their decisions
  • via majority/plurality vote, sum/product rules,
    probabilistic methods, Bayesian methods,
    rank/confidence-score combination (two common
    combiners are sketched below)
  • The joint capability of the classifiers sets an
    intrinsic limit on the combined accuracy
  • There is no way to handle the blind spots
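Two of the listed combiners, sketched for concreteness; the helper names and input shapes are assumptions.

```python
import numpy as np

def sum_rule(probas):
    """probas: shape (n_classifiers, n_samples, n_classes) of probability estimates."""
    return np.argmax(np.asarray(probas).sum(axis=0), axis=1)

def plurality_vote(labels):
    """labels: shape (n_classifiers, n_samples) of integer class labels."""
    labels = np.asarray(labels)
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, labels)
```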

74
Difficulties in Decision Optimization
  • Reliability versus overall accuracy
  • Fixed or trainable combination function
  • Simple models or combinatorial estimates
  • How to model complementary behavior

75
Coverage Optimization
  • Fix a decision combination function
  • Generate classifiers automatically and
    systematically
  • via training set sub-sampling (stacking,
    bagging, boosting),
  • subspace projection (RSM),
  • superclass/subclass decomposition (ECOC),
  • random perturbation of training processes, noise
    injection
  • Need enough classifiers to cover all blind spots
  • (how many are enough?)
  • What else is critical?

76
Difficulties in Coverage Optimization
  • What kind of differences to introduce
  • Subsamples? Subspaces? Super/Subclasses?
  • Training parameters?
  • Model geometry?
  • The 3-way tradeoff
  • discrimination / diversity / generalization
  • Effects of the form of component classifiers

77
Dilemmas and Paradoxes in Classifier Combination
  • Weaken individuals for a stronger whole?
  • Sacrifice known samples for unseen cases?
  • Seek agreements or differences?

78
Stochastic Discrimination
  • A mathematical theory that relates several key
    concepts in pattern recognition
  • Discriminative power ↔ enrichment
  • Complementary information ↔ uniformity
  • Generalization power ↔ projectability
  • It offers a way to describe the complementary
    behavior of classifiers
  • It offers guidelines to design multiple
    classifier systems (classifier ensembles)