Title: Nonparametric Techniques
1. Nonparametric Techniques
- Shyh-Kang Jeng
- Department of Electrical Engineering/
- Graduate Institute of Communication/
- Graduate Institute of Networking and Multimedia,
National Taiwan University
2. Problems of Parameter Estimation Approaches
- Common parametric forms rarely fit the densities actually encountered in practice
- All of the classical parametric densities are unimodal
- Many practical problems involve multimodal densities
- A high-dimensional density can rarely be represented as a product of one-dimensional functions
3. Nonparametric Methods
- Can be used with arbitrary distributions
- Need no assumptions about the forms of the underlying densities
- Basic types
  - Estimating the density function p(x | ωj) from samples
  - Directly estimating the a posteriori probabilities P(ωj | x)
  - Bypassing probability estimation and going directly to decision functions
4. Density Estimation: Naïve Approach
5. Density Estimation: Naïve Approach
(Figure: relative probability Pk/Pk,max)
6. Problems of the Naïve Approach
- If the volume is fixed and we take more samples, we get only a space-averaged value of p(x)
- If the volume approaches zero with n fixed, the estimated p(x) will be close to zero or infinity, and hence useless
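
The estimate these two bullets refer to is the standard relative-frequency one: if k of the n samples fall inside a region of volume V around x, then

$$p(\mathbf{x}) \;\approx\; \frac{k/n}{V},$$

so fixing V averages p over the region, while shrinking V with n fixed starves the cell of samples.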
7. Better Approaches
8. Hypercube Parzen Windows
9. Parzen Windows for Interpolation
10. Examples of Parzen Windows
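
To make the Parzen-window slides concrete, here is a minimal 1-D sketch with a Gaussian window; the function name, the choice hn = h1/sqrt(n), and the toy data are illustrative assumptions, not the lecture's code.

```python
import numpy as np

def parzen_estimate(x, samples, h1):
    """Parzen-window density estimate at points x from 1-D samples,
    using a Gaussian window with width h_n = h1 / sqrt(n)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    hn = h1 / np.sqrt(n)                             # shrink the window as n grows
    u = (x[:, None] - samples[None, :]) / hn         # scaled distances to each sample
    phi = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)   # Gaussian window function
    return phi.sum(axis=1) / (n * hn)                # average of the window contributions

# Hypothetical usage: estimate a standard normal density from 1000 samples.
rng = np.random.default_rng(0)
samples = rng.standard_normal(1000)
print(parzen_estimate(np.linspace(-4, 4, 9), samples, h1=1.0))
```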
11. Convergence Considerations
12. Convergence of the Mean
13. Convergence of the Variance
14. Convergence of the Variance
15. Illustration 1: 1-D Gaussian
16. Illustration 1: 1-D Gaussian
17. Illustration 2: 2-D Gaussian
18. Illustration 3: Uniform and Triangular
19. Classification Examples
20. Pros and Cons of Nonparametric Methods
- Generality
- The number of samples needed may be very large
  - Much larger than required if we knew the form of the density
- Curse of dimensionality
  - High-dimensional functions are much more complicated and harder to discern
- Better to incorporate prior knowledge about the data, provided that knowledge is correct
21. Probabilistic Neural Networks (PNN)
22. PNN Training
23. Activation Function
24. PNN Classification
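
As a rough sketch of how the PNN described on slides 21-24 operates, the code below normalizes each training pattern into a pattern unit, computes Gaussian activations exp((net - 1)/σ²), and lets category units sum them; all names, the value of sigma, and the toy data are illustrative assumptions, not the lecture's code.

```python
import numpy as np

def normalize(patterns):
    """Normalize each pattern to unit length, as PNN training requires."""
    patterns = np.asarray(patterns, dtype=float)
    return patterns / np.linalg.norm(patterns, axis=1, keepdims=True)

def pnn_classify(x, train_x, train_y, sigma=0.5):
    """Classify x with a probabilistic neural network:
    each training pattern is a pattern unit with activation
    exp((w.x - 1) / sigma^2); category units sum the activations."""
    w = normalize(train_x)                        # pattern-unit weights
    x = np.asarray(x, dtype=float)
    x = x / np.linalg.norm(x)                     # the test pattern is normalized too
    net = w @ x                                   # inner products (net activations)
    act = np.exp((net - 1.0) / sigma**2)          # Gaussian window activations
    classes = np.unique(train_y)
    scores = [act[train_y == c].sum() for c in classes]   # category-unit sums
    return classes[int(np.argmax(scores))]

# Hypothetical usage with a tiny two-class data set.
train_x = np.array([[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]])
train_y = np.array([0, 0, 1, 1])
print(pnn_classify([0.15, 0.9], train_x, train_y))   # expected: class 1
```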
25. kn-Nearest-Neighbor Estimation
- Let the cell volume be a function of the test data
- Prototypes
  - Training samples
- Estimate p(x) (see the sketch below)
  - Center a cell about x
  - Let the cell grow until it captures kn samples
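
A minimal 1-D sketch of this cell-growing estimate, taking pn(x) = (kn/n)/Vn with Vn the length of the interval that just captures the kn nearest samples; the choice kn = sqrt(n) and the toy data are illustrative assumptions.

```python
import numpy as np

def knn_density(x, samples, kn=None):
    """k_n-nearest-neighbor density estimate in 1-D:
    grow an interval around each x until it captures k_n samples,
    then p_n(x) = (k_n / n) / V_n, with V_n the interval length."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    if kn is None:
        kn = max(1, int(np.sqrt(n)))              # one common choice: k_n = sqrt(n)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    dists = np.abs(x[:, None] - samples[None, :])
    r = np.sort(dists, axis=1)[:, kn - 1]         # radius to the k_n-th nearest sample
    vol = 2.0 * r                                 # 1-D "volume" of the grown cell
    return (kn / n) / vol

# Hypothetical usage on samples from a standard normal density.
rng = np.random.default_rng(1)
print(knn_density([0.0, 1.0, 2.0], rng.standard_normal(500)))
```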
26. kn-Nearest-Neighbor Estimation
27. kn-Nearest-Neighbor Estimation
28. kn-Nearest-Neighbor Estimation
- Necessary and sufficient conditions for pn(x) to converge to p(x)
- Example
29. kn-Nearest-Neighbor Estimation
30. Estimation of A Posteriori Probabilities
- Place a cell of volume V (Parzen or kn-nearest-neighbor) around x
  - Capture k samples
  - ki of them are labeled ωi
- Estimate of the joint probability p(x, ωi)
- Estimate of P(ωi | x)
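
Written out in standard notation, the two estimates referred to above are

$$p_n(\mathbf{x}, \omega_i) = \frac{k_i/n}{V}, \qquad P_n(\omega_i \mid \mathbf{x}) = \frac{p_n(\mathbf{x}, \omega_i)}{\sum_{j=1}^{c} p_n(\mathbf{x}, \omega_j)} = \frac{k_i}{k},$$

i.e., the posterior estimate is simply the fraction of the captured samples that carry the label ωi.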
31. Nearest-Neighbor Rule
- Let Dn = {x1, . . ., xn} denote a set of n labeled prototypes
- Let x' in Dn be the prototype nearest to a test point x
- Assign x the label associated with x'
- Suboptimal
  - Leads to an error rate greater than the Bayes rate
  - But never worse than twice the Bayes rate
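
A minimal brute-force sketch of the rule as stated above; the function name and toy prototypes are illustrative assumptions.

```python
import numpy as np

def nearest_neighbor_classify(x, prototypes, labels):
    """Nearest-neighbor rule: assign x the label of the closest prototype."""
    prototypes = np.asarray(prototypes, dtype=float)
    dists = np.linalg.norm(prototypes - np.asarray(x, dtype=float), axis=1)
    return labels[int(np.argmin(dists))]

# Hypothetical usage with three labeled prototypes.
protos = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.5]]
labels = ["w1", "w2", "w2"]
print(nearest_neighbor_classify([0.9, 0.8], protos, labels))   # expected: "w2"
```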
32. Heuristic Understanding
- The label θ' associated with the nearest neighbor is a random variable
- P(θ' = ωi | x') = P(ωi | x')
- When the number of samples is very large, we assume that x' is close enough to x that P(ωi | x') approximately equals P(ωi | x)
33. Voronoi Tessellation
34. Probability of Error
35. Convergence of Nearest Neighbor
36. Error Rate for Nearest-Neighbor Rule
37. Error Rate for Nearest-Neighbor Rule
38. Approximate Error Bound
39. A More Rigorous Approach
40. A More Rigorous Approach
41. A More Rigorous Approach
42. A More Rigorous Approach
- The upper bound is achieved in the zero-information case
  - The densities p(x | ωi) are identical
  - P(ωi | x) = P(ωi)
  - P(e | x) is independent of x
  - P* lies between 0 and (c - 1)/c
43. Bounds of Nearest-Neighbor Error Rate
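
The bounds summarized on slide 43 are the standard ones relating the large-sample nearest-neighbor error rate P to the Bayes rate P* for c classes:

$$P^{*} \;\le\; P \;\le\; P^{*}\!\left(2 - \frac{c}{c-1}\,P^{*}\right).$$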
44. Convergence Speed
- Convergence can be arbitrarily slow
- Pn(e) need not even decrease monotonically with n
45. k-Nearest-Neighbor Rule
46. Simplified Analysis Results
- Two-class case with k odd
- The labels on each of the k nearest neighbors are random variables
  - They independently assume the values ωi with probabilities P(ωi | x)
- Select ωm if a majority of the k nearest neighbors are labeled ωm, which occurs with the probability given below
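
The probability mentioned in the last bullet is the usual binomial majority expression (reconstructed here in standard notation rather than copied from the slide): with k odd,

$$P(\text{select } \omega_m \mid \mathbf{x}) \;=\; \sum_{i=(k+1)/2}^{k} \binom{k}{i}\, P(\omega_m \mid \mathbf{x})^{\,i}\,\bigl[1 - P(\omega_m \mid \mathbf{x})\bigr]^{\,k-i}.$$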
47. Simplified Analysis Results
- The larger the value of k, the greater the probability that ωm will be selected
- The large-sample two-class error rate is bounded above by Ck(P*), defined to be the smallest concave function of P* greater than a particular binomial sum
48. Upper Bounds of Error Rate for Two-Class Cases
49. More Comments on the k-Nearest-Neighbor Rule
- Can be viewed as an attempt to estimate P(ωi | x) from samples
  - Needs a larger k to obtain a reliable estimate
- Want all k nearest neighbors x' to be very near x, to ensure that P(ωi | x') is approximately the same as P(ωi | x)
  - So k should be a small fraction of n
- Only when n goes to infinity can we be assured of nearly optimal behavior
50. Computational Complexity of the Nearest-Neighbor Rule
- Inspect each stored point in turn
  - O(n)
- Calculate its Euclidean distance to x
  - Each calculation is O(d)
- Retain the identity of only the current closest one
- Total complexity O(dn)
51. A Parallel Nearest-Neighbor Circuit
52. Reducing the Computational Burden in Nearest-Neighbor Search
- Computing partial distances (see the sketch below)
- Prestructuring
  - Create some form of search tree, e.g., a quad-tree with representative points
  - Prototypes are selectively linked
  - Not guaranteed to find the closest prototype
- Editing the stored prototypes
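
To illustrate the partial-distance idea in the first bullet, a minimal sketch that accumulates the squared distance one dimension at a time and abandons a prototype as soon as the partial sum exceeds the best full distance found so far; the function name and toy data are illustrative assumptions.

```python
import numpy as np

def nn_partial_distance(x, prototypes):
    """Nearest-neighbor search with partial distances:
    stop accumulating a prototype's squared distance as soon as it
    already exceeds the best (smallest) squared distance seen so far."""
    x = np.asarray(x, dtype=float)
    best_idx, best_d2 = -1, np.inf
    for idx, p in enumerate(np.asarray(prototypes, dtype=float)):
        d2 = 0.0
        for xi, pi in zip(x, p):              # accumulate one dimension at a time
            d2 += (xi - pi) ** 2
            if d2 >= best_d2:                 # partial sum already too large: abandon
                break
        else:                                 # loop finished: full distance computed
            best_idx, best_d2 = idx, d2
    return best_idx, np.sqrt(best_d2)

# Hypothetical usage.
protos = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.2, 0.1, 0.0]]
print(nn_partial_distance([0.1, 0.1, 0.1], protos))   # expected index: 2
```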
53. Nearest-Neighbor Editing
- begin initialize j ← 0, D ← data set, n ← number of prototypes
- construct the full Voronoi diagram of D
- do j ← j + 1; for each prototype x'j
- find the Voronoi neighbors of x'j
- if any neighbor is not from the same class as x'j, then mark x'j
- until j = n
- discard all points that are not marked
- construct the Voronoi diagram of the remaining (marked) prototypes
- end
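
A minimal sketch of the editing algorithm above, using the fact that two points are Voronoi neighbors exactly when they are joined by an edge of the Delaunay triangulation; scipy is assumed to be available, and the function name and toy data are my own illustration rather than the lecture's code.

```python
import numpy as np
from scipy.spatial import Delaunay

def voronoi_edit(points, labels):
    """Nearest-neighbor editing: keep only prototypes that have at least
    one Voronoi (Delaunay) neighbor from a different class."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    tri = Delaunay(points)
    neighbors = [set() for _ in range(len(points))]
    for simplex in tri.simplices:             # vertices of a simplex are mutual neighbors
        for i in simplex:
            neighbors[i].update(simplex)
    marked = []
    for j, nbrs in enumerate(neighbors):
        if any(labels[k] != labels[j] for k in nbrs):   # a neighbor from another class
            marked.append(j)                  # keep: it helps define the decision boundary
    return points[marked], labels[marked]

# Hypothetical usage with a small 2-D two-class set.
rng = np.random.default_rng(2)
pts = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(3, 1, (20, 2))])
labs = np.array([0] * 20 + [1] * 20)
kept_pts, kept_labs = voronoi_edit(pts, labs)
print(len(kept_pts), "of", len(pts), "prototypes kept")
```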
54. Nearest-Neighbor Editing
- Complexity
- Does not guarantee the minimum set of points
- Reduces complexity without affecting accuracy
- Generally cannot add training data later
- Can be combined with prestructuring and partial distances
55. Properties of Metrics
56. Effect of Scaling in Euclidean Distance
57. Minkowski Metric (Lk Norm)
58. Tanimoto Metric
- Finds most use in taxonomy
  - When two patterns or features are either the same or different
- A distance between two sets
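
To make slides 57-58 concrete, a small sketch of both metrics: the Minkowski Lk distance (Σ|ai - bi|^k)^(1/k) and the Tanimoto distance (n1 + n2 - 2·n12)/(n1 + n2 - n12), where n1 and n2 are the set sizes and n12 the size of their intersection; the function names are illustrative assumptions.

```python
def minkowski_distance(a, b, k=2):
    """Minkowski (L_k) distance: (sum_i |a_i - b_i|**k) ** (1/k)."""
    return sum(abs(x - y) ** k for x, y in zip(a, b)) ** (1.0 / k)

def tanimoto_distance(s1, s2):
    """Tanimoto distance between two sets:
    (n1 + n2 - 2*n12) / (n1 + n2 - n12), with n12 = |s1 & s2|."""
    s1, s2 = set(s1), set(s2)
    n12 = len(s1 & s2)
    return (len(s1) + len(s2) - 2 * n12) / (len(s1) + len(s2) - n12)

# Hypothetical usage.
print(minkowski_distance([0, 0], [3, 4], k=2))               # Euclidean: 5.0
print(minkowski_distance([0, 0], [3, 4], k=1))               # city block: 7.0
print(tanimoto_distance({"a", "b", "c"}, {"b", "c", "d"}))   # 0.5
```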
59. Uncritical Use of Euclidean Metric
60. A Naïve Approach
- Compute the distance only after the patterns have been transformed to be as similar to one another as possible
- The computational burden is prohibitive
  - We do not know the proper parameters for the transformation ahead of time
  - More serious if several transformations of each stored prototype are to be considered during classification
61. Tangent Vector
- r transformations are applicable
  - e.g., horizontal translation, shear, and line thinning for hand-written images
- For each prototype x', perform each of the transformations Fi(x'; ai)
- Compute a tangent vector for each transformation
62. Linear Combination of Tangent Vectors
63. Tangent Distance
- Construct an r-by-d matrix T from the tangent vectors at x'
- Tangent distance from x' to x
64. Concept of Tangent Distance
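
A minimal one-sided tangent-distance sketch for slides 62-64, taking T to be the r-by-d matrix whose rows are the tangent vectors at x' (as on slide 63) and solving min over a of ||(x' + Tᵀa) - x|| by least squares; the function name and toy example are illustrative assumptions.

```python
import numpy as np

def tangent_distance(x_prime, x, T):
    """One-sided tangent distance.
    T is an r-by-d matrix whose rows are the tangent vectors at x_prime.
    Find coefficients a minimizing ||(x_prime + T^T a) - x|| by least squares."""
    x_prime = np.asarray(x_prime, dtype=float)
    x = np.asarray(x, dtype=float)
    T = np.atleast_2d(np.asarray(T, dtype=float))
    a, *_ = np.linalg.lstsq(T.T, x - x_prime, rcond=None)  # best combination of tangent vectors
    residual = x_prime + T.T @ a - x
    return np.linalg.norm(residual)

# Hypothetical usage: one tangent vector (e.g., a translation direction).
x_prime = np.array([1.0, 0.0, 0.0])
T = np.array([[0.0, 1.0, 0.0]])           # r = 1 tangent vector of dimension d = 3
x = np.array([1.0, 0.5, 0.2])
print(tangent_distance(x_prime, x, T))    # 0.2: the translation explains the 0.5 offset
```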
65. Category Membership Functions in Fuzzy Logic
(Figure: membership functions for "dark", "medium-dark", "medium-light", and "light")
66. Conjunction Rule and Discriminant Function
67. Cox-Jaynes Axioms for Category Membership Functions
68. Contributions and Limitations
- Guides the steps by which one takes knowledge in linguistic form and casts it into discriminant functions
- Does not rely on data
69. Reduced Coulomb Energy (RCE) Networks
70. RCE Training
71. RCE Training
72. RCE Classification
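
As a rough sketch of the RCE scheme outlined on slides 69-72: during training each prototype receives the largest radius, capped at λmax, that keeps prototypes of other classes outside its sphere; at classification time the test point collects the labels of all spheres containing it. The names, the cap value, and the toy data below are illustrative assumptions, not the lecture's code.

```python
import numpy as np

def rce_train(points, labels, lambda_max=0.5):
    """RCE training: each prototype's radius is the distance to the nearest
    prototype of a different class, capped at lambda_max."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    radii = np.empty(len(points))
    for j, p in enumerate(points):
        other = points[labels != labels[j]]               # prototypes of the other classes
        d = np.linalg.norm(other - p, axis=1).min()
        radii[j] = min(d, lambda_max)
    return points, labels, radii

def rce_classify(x, points, labels, radii):
    """RCE classification: collect the labels of all spheres containing x;
    return the label if they agree, otherwise report ambiguity."""
    d = np.linalg.norm(points - np.asarray(x, dtype=float), axis=1)
    covering = set(labels[d <= radii])
    if len(covering) == 1:
        return covering.pop()
    return "ambiguous" if covering else "no decision"

# Hypothetical usage with a tiny two-class set.
pts = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [1.1, 0.9]])
labs = np.array([0, 0, 1, 1])
model = rce_train(pts, labs)
print(rce_classify([0.1, 0.0], *model))    # expected: 0
```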
73. Approximations by Series Expansions
74. One-Dimensional Example