Nonparametric Techniques

Transcript and Presenter's Notes

1
Nonparametric Techniques
  • Shyh-Kang Jeng
  • Department of Electrical Engineering/
  • Graduate Institute of Communication/
  • Graduate Institute of Networking and Multimedia,
    National Taiwan University

2
Problems of Parameter Estimation Approaches
  • Common parametric forms rarely fit the densities
    actually encountered in practice
  • All of the classical parametric densities are
    unimodal
  • Many practical problems involve multimodal
    densities
  • A high-dimensional density can rarely be
    represented as a product of one-dimensional functions

3
Nonparametric Methods
  • Can be used with arbitrary distributions
  • Need no assumptions about the forms of the
    underlying densities
  • Basic types
  • Estimating the density function p(x|ωj) from
    samples
  • Directly estimating the a posteriori
    probabilities P(ωj|x)
  • Bypassing the probability distributions and going
    directly to decision functions

4
Density Estimation: Naïve Approach
5
Density Estimation: Naïve Approach
(Figure: relative probability Pk/Pk,max)
6
Problems of Naïve Approach
  • If the volume is fixed and we take more samples,
    we obtain only a space-averaged value of p(x)
  • If the volume approaches zero with n fixed, the
    estimated p(x) will be close to zero or infinity and
    therefore useless (see the estimate sketched below)

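For reference, the relative-frequency estimate underlying this naïve approach (a standard formulation, reconstructed here because the slide equations were images): if k of the n samples fall inside a region of volume V around x, then

    p(x) ≈ (k / n) / V

Fixing V while n grows smooths this into a space average of p(x); shrinking V with n fixed leaves so few samples in the cell that the estimate becomes erratic.
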
7
Better Approaches
8
Hypercube Parzen Windows
9
Parzen Windows for Interpolation
10
Examples of Parzen Windows
11
Convergence Considerations
12
Convergence of the Mean
13
Convergence of the Variance
14
Convergence of the Variance
15
Illustration 1: 1D Gaussian
16
Illustration 1: 1D Gaussian
17
Illustration 2: 2D Gaussian
18
Illustration 3: Uniform and Triangular
19
Classification Examples
20
Pros and Cons of Nonparametric Methods
  • Generality
  • Number of samples needed may be very large
  • Much larger than required if we know the form of
    the density
  • Curse of dimensionality
  • High-dimensional functions are much more
    complicated and harder to discern
  • Better to incorporate knowledge about the data
    that is correct

21
Probabilistic Neural Networks (PNN)
22
PNN Training
23
Activation Function
24
PNN Classification
25
kn-Nearest-Neighbor Estimation
  • Let cell volume be a function of the test data
  • Prototypes
  • Training samples
  • Estimate p(x)
  • Center a cell about x
  • Let the cell grow until it captures kn samples
    (a code sketch follows below)

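A minimal one-dimensional sketch of this estimator (illustrative only; the function name and the choice kn = sqrt(n) are assumptions, not from the slides):

    # Sketch of k_n-nearest-neighbor density estimation in 1-D (illustrative).
    import numpy as np

    def knn_density(x, samples, k):
        """Estimate p(x) by growing a cell about x until it captures k samples."""
        n = len(samples)
        dists = np.sort(np.abs(samples - x))    # distances from x to every sample
        radius = dists[k - 1]                   # cell just large enough to hold k samples
        volume = 2.0 * radius                   # "volume" of a 1-D interval
        return (k / n) / volume

    # Example: estimate a standard normal density near x = 0 with k_n = sqrt(n)
    samples = np.random.randn(1000)
    print(knn_density(0.0, samples, k=int(np.sqrt(1000))))
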
26
kn-Nearest-Neighbor Estimation
27
kn-Nearest-Neighbor Estimation
28
kn-Nearest-Neighbor Estimation
  • Necessary and sufficient conditions for pn(x) to
    converge to p(x)
  • Example

29
kn-Nearest-Neighbor Estimation
30
Estimation of A Posteriori Probabilities
  • Place a cell of volume V (Parzen or
    kn-nearest-neighbor) around x
  • Capture k samples
  • ki of them are labeled ωi
  • Estimate of the joint probability p(x, ωi)
  • Estimate for P(ωi|x) (both estimates are written
    out below)

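The resulting estimates, reconstructed here in the standard form (the slide equations were images): with ki of the k captured samples labeled ωi,

    pn(x, ωi) = (ki / n) / V
    Pn(ωi|x) = pn(x, ωi) / Σj pn(x, ωj) = ki / k

so the estimated posterior is simply the fraction of the captured samples carrying label ωi.
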
31
Nearest-Neighbor Rule
  • Let Dn = {x1, . . ., xn} denote a set of n labeled
    prototypes
  • Let x' in Dn be the prototype nearest to a test
    point x
  • Assign x the label associated with x'
  • Suboptimal
  • Leads to an error rate greater than the Bayes rate,
    but never worse than twice the Bayes rate
    (a code sketch follows below)

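A minimal sketch of the rule as a brute-force search (illustrative; names are not from the slides):

    # Nearest-neighbor rule: assign x the label of its closest prototype.
    import numpy as np

    def nearest_neighbor_classify(x, prototypes, labels):
        dists = np.linalg.norm(prototypes - x, axis=1)   # Euclidean distance to each prototype
        return labels[np.argmin(dists)]                  # label of the nearest one

    prototypes = np.array([[0.0, 0.0], [1.0, 1.0]])
    labels = np.array(["omega_1", "omega_2"])
    print(nearest_neighbor_classify(np.array([0.2, 0.1]), prototypes, labels))
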
32
Heuristic Understanding
  • The label θ' associated with the nearest neighbor
    is a random variable
  • P(θ' = ωi|x') = P(ωi|x')
  • When the number of samples is very large, we assume
    that x' is so close to x that P(ωi|x') approximately
    equals P(ωi|x)

33
Voronoi Tessellation
34
Probability of Error
35
Convergence of Nearest Neighbor
36
Error Rate for Nearest-Neighbor Rule
37
Error Rate for Nearest-Neighbor Rule
38
Approximate Error Bound
39
A More Rigorous Approach
40
A More Rigorous Approach
41
A More Rigorous Approach
42
A More Rigorous Approach
  • The upper bound is achieved in the
    zero-information case (the bounds themselves are
    reproduced below)
  • The densities p(x|ωi) are identical
  • P(ωi|x) = P(ωi)
  • P(e|x) is independent of x
  • P lies between 0 and (c-1)/c

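For context, the resulting bounds on the large-sample nearest-neighbor error rate P in terms of the Bayes rate P* are the classical Cover-Hart bounds (reconstructed here, since the slide equations were images):

    P* ≤ P ≤ P* (2 − (c / (c − 1)) P*)
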
43
Bounds of Nearest-Neighbor Error Rate
44
Convergence Speed
  • Convergence can be arbitrarily slow
  • Pn(e) need not even decrease monotonically with n

45
k-Nearest-Neighbor Rule
46
Simplified Analysis Results
  • Two-class case with k odd
  • The labels on each of the k nearest neighbors are
    random variables
  • They independently assume the values ωi with
    probabilities P(ωi|x)
  • Select ωm if a majority of the k nearest
    neighbors are labeled ωm, which occurs with the
    probability sketched below

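A reconstruction of that probability in the standard binomial form (the slide equation was an image): with k odd, a majority of the k independent labels equal ωm with probability

    sum over i from (k+1)/2 to k of  C(k, i) [P(ωm|x)]^i [1 − P(ωm|x)]^(k−i)

where C(k, i) is the binomial coefficient.
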
47
Simplified Analysis Results
  • The larger the value of k, the greater the
    probability that wm will be selected
  • Large-sample two-class error rate is bounded
    above by Ck(P) defined to be the smallest
    concave function of P greater than

48
Upper Bounds of Error Rate for Two-Class Cases
49
More Comments on k-Nearest-Neighbor Rule
  • An attempt to estimate P(ωi|x) from samples
  • Needs a larger k to obtain a reliable estimate
  • Want all k nearest neighbors x' to be very near x
    to ensure that P(ωi|x') is approximately the same
    as P(ωi|x)
  • k should be a small fraction of n
  • Only when n goes to infinity can we be assured of
    nearly optimal behavior

50
Computational Complexity of the Nearest-Neighbor
Rule
  • Inspect each stored point in turn: O(n)
  • Calculate its Euclidean distance to x: O(d) per
    calculation
  • Retain the identity of only the closest point seen
    so far
  • Total complexity: O(dn)

51
A Parallel Nearest-Neighbor Circuit
52
Reducing Computational Burden in Nearest-Neighbor
Search
  • Computing partial distances (see the sketch below)
  • Prestructuring
  • Create some form of search tree, e.g., a quad-tree
    with representative points
  • Prototypes are selectively linked
  • Not guaranteed to find the closest prototype
  • Editing the stored prototypes

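A minimal sketch of the partial-distance idea (illustrative; names are not from the slides): stop accumulating a squared distance as soon as it exceeds the best squared distance found so far.

    import numpy as np

    def partial_distance_nn(x, prototypes):
        """Index of the nearest prototype, abandoning hopeless candidates early."""
        best_idx, best_sq = -1, np.inf
        for i, p in enumerate(prototypes):
            sq = 0.0
            for xj, pj in zip(x, p):          # accumulate squared distance per dimension
                sq += (xj - pj) ** 2
                if sq >= best_sq:             # already worse than the best: abandon
                    break
            else:                             # finished all dimensions: new best
                best_idx, best_sq = i, sq
        return best_idx

    prototypes = np.array([[0.0, 0.0, 0.0], [2.0, 2.0, 2.0], [0.1, 0.0, 0.2]])
    print(partial_distance_nn(np.array([0.0, 0.1, 0.1]), prototypes))   # prints 0
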
53
Nearest-Neighbor Editing
    begin initialize j ← 0, D ← data set, n ← number of prototypes
      construct the full Voronoi diagram of D
      do j ← j + 1; for each prototype xj
        find the Voronoi neighbors of xj
        if any neighbor is not from the same class as xj, then mark xj
      until j = n
      discard all points that are not marked
      construct the Voronoi diagram of the remaining
      (marked) prototypes
    end

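A rough sketch of this editing procedure (illustrative only; it relies on the fact that Voronoi neighbors are exactly the neighbors in the Delaunay triangulation, and the function name is hypothetical):

    import numpy as np
    from scipy.spatial import Delaunay

    def edit_prototypes(points, labels):
        """Keep only prototypes with at least one differently labeled Voronoi neighbor."""
        tri = Delaunay(points)
        indptr, indices = tri.vertex_neighbor_vertices
        marked = []
        for j in range(len(points)):
            neighbors = indices[indptr[j]:indptr[j + 1]]
            if np.any(labels[neighbors] != labels[j]):   # a neighbor from another class
                marked.append(j)                         # mark x_j: it shapes the boundary
        return points[marked], labels[marked]

    pts = np.random.rand(200, 2)
    labs = (pts[:, 0] > 0.5).astype(int)       # two classes separated at x = 0.5
    kept_pts, kept_labs = edit_prototypes(pts, labs)
    print(len(kept_pts), "of", len(pts), "prototypes retained")
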
54
Nearest-Neighbor Editing
  • Complexity
  • Does not guarantee the minimum set of points
  • Reduces complexity without affecting accuracy
  • Generally cannot add training data later
  • Can be combined with prestructuring and partial
    distances

55
Properties of Metrics
56
Effect of Scaling in Euclidean Distance
57
Minkowski Metric (Lk Norm)
58
Tanimoto Metric
  • Finds most use in taxonomy
  • Applies when two patterns or features are either
    the same or different
  • Distance between two sets (see the sketch below)

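A reconstruction of the standard Tanimoto distance between two sets S1 and S2, where n1 and n2 are the numbers of elements in each set and n12 is the number common to both (the slide formula was an image):

    DT(S1, S2) = (n1 + n2 − 2 n12) / (n1 + n2 − n12)

A minimal sketch in code (names are illustrative):

    def tanimoto_distance(s1, s2):
        n1, n2 = len(s1), len(s2)
        n12 = len(set(s1) & set(s2))   # elements common to both sets
        return (n1 + n2 - 2 * n12) / (n1 + n2 - n12)

    print(tanimoto_distance({"a", "b", "c"}, {"b", "c", "d"}))   # 0.5
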
59
Uncritical Use of Euclidean Metric
60
A Naïve Approach
  • Compute distance only when the patterns have been
    transformed to be as similar to one another as
    possible
  • The computational burden is prohibitive
  • We do not know the proper parameters for the
    transformation ahead of time
  • The burden is more serious still if several
    transformations must be considered for each stored
    prototype during classification

61
Tangent Vector
  • r transformations are applicable
  • e.g., horizontal translation, shear, line thinning
    for handwritten images
  • For each prototype x', perform each of the
    transformations Fi(x'; αi)
  • Tangent vector for each transformation (a sketch
    follows below)

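A minimal sketch of constructing tangent vectors (illustrative; the toy transformations and parameter values below are assumptions, not from the slides):

    import numpy as np

    def tangent_vectors(x, transforms, alpha=1.0):
        """One tangent vector per transformation: TV_i = F_i(x; alpha) - x."""
        return np.stack([F(x, alpha) - x for F in transforms])   # shape (r, d)

    # Hypothetical 1-D "image" and two toy transformations (shift and scale)
    shift = lambda x, a: np.roll(x, int(a))
    scale = lambda x, a: (1.0 + 0.1 * a) * x
    x = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
    T = tangent_vectors(x, [shift, scale])
    print(T.shape)    # (2, 5): r = 2 tangent vectors in d = 5 dimensions
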
62
Linear Combination of Tangent Vectors
63
Tangent Distance
  • Construct an r-by-d matrix T from the tangent
    vectors at x'
  • Tangent distance from x' to x (sketched below)

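A reconstruction of the tangent distance, Dtan(x', x) = min over a of ||(x' + Tᵀa) − x||, where a holds the combination coefficients of the r tangent vectors (the rows of T). A minimal least-squares sketch (the lstsq solve is an implementation choice, not from the slides):

    import numpy as np

    def tangent_distance(x_prime, x, T):
        """min over a of || (x_prime + T.T @ a) - x ||, solved by least squares."""
        a, *_ = np.linalg.lstsq(T.T, x - x_prime, rcond=None)   # best coefficients
        return np.linalg.norm(x_prime + T.T @ a - x)

    x_prime = np.array([0.0, 1.0, 2.0])
    x = np.array([0.2, 1.1, 1.9])
    T = np.array([[1.0, 0.0, 0.0]])    # r = 1 tangent vector in d = 3 dimensions
    print(tangent_distance(x_prime, x, T))
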
64
Concept of Tangent Distance
65
Category Membership Functions in Fuzzy Logic
(Figure: membership functions for the categories dark, medium-dark, medium-light, and light)
66
Conjunction Rule and Discriminant Function
67
Cox-Jaynes Axioms for Category Membership
Functions
68
Contribution and Limitation
  • Guiding the steps by which one takes knowledge in
    a linguistic form and casts it into discriminant
    functions
  • Does not rely on data

69
Reduced Coulomb Energy (RCE) Networks
70
RCE Training
71
RCE Training
72
RCE Classification
73
Approximations by Series Expansions
74
One-Dimensional Example