Transcript and Presenter's Notes

Title: The Maturation of Nearest Neighbors Techniques


1
The Maturation of Nearest Neighbors Techniques
Ronald E. McRoberts
Northern Research Station, U.S. Forest Service, St. Paul, Minnesota
Western Mensurationists Meeting, 22-23 June 2009, Vancouver, WA
2
Some nearest neighbors terminology
• Response variable: variable for which predictions are desired
• Feature space variable: ancillary variable with an observation available for every population unit
• Reference set: population units with observations of both response and feature space variables
• Target set: population units for which predictions of response variables are desired
3
The k Nearest Neighbors (k-NN) technique
4
k-Nearest Neighbors
{y_ji : j = 1, ..., k} is the set of k reference pixels nearest to the i-th pixel in feature space with respect to a distance metric, d.
Primary parameters: k, t, M
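The prediction formula on this slide is not reproduced in the transcript. Below is a minimal sketch of a distance-weighted k-NN prediction consistent with the terminology above, assuming M is the distance matrix and t an inverse-distance weighting exponent (the role of t is an assumption, not stated on the slide); the function and variable names are illustrative only.

```python
import numpy as np

def knn_predict(x_target, X_ref, y_ref, k=5, t=2.0, M=None):
    """Distance-weighted k-NN prediction for a single target unit.

    x_target : (p,) feature-space vector of the target unit
    X_ref    : (n, p) feature-space matrix of the reference set
    y_ref    : (n,) observed responses of the reference set
    k        : number of nearest neighbours
    t        : inverse-distance weighting exponent (assumed meaning; t=0 gives equal weights)
    M        : (p, p) positive semi-definite distance matrix (identity if None)
    """
    p = X_ref.shape[1]
    if M is None:
        M = np.eye(p)
    diff = X_ref - x_target                        # differences in feature space, (n, p)
    d2 = np.einsum("ij,jk,ik->i", diff, M, diff)   # squared distances diff' M diff, (n,)
    idx = np.argsort(d2)[:k]                       # indices of the k nearest reference units
    d = np.sqrt(np.maximum(d2[idx], 0.0))          # guard against tiny negative round-off
    w = 1.0 / np.maximum(d, 1e-12) ** t            # inverse-distance weights
    w /= w.sum()
    return float(np.sum(w * y_ref[idx]))           # weighted mean of neighbour responses
```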
5
Two primary applications
• Filling holes in databases (classic imputation)
  - target set < reference set
• Spatial estimation - map-based inference
  - target set >> reference set
6
Issues in Nearest Neighbors prediction
• Search for the nearest neighbors
• Search for parameter values
  - optimal distance metric
  - optimal weights w_ji
  - optimal k
• Inference
• Diagnostic tools
7
Searching for nearest neighbors: k-d tree searching
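As an illustration of how the neighbour search is typically accelerated, the sketch below builds a k-d tree over the reference set with scipy.spatial.cKDTree and queries it for a few target units; the array sizes and names are placeholders rather than values from the presentation.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X_ref = rng.normal(size=(100_000, 4))   # feature-space values of the reference set
X_tgt = rng.normal(size=(10, 4))        # feature-space values of a few target units

tree = cKDTree(X_ref)                   # build the k-d tree once over the reference set
dist, idx = tree.query(X_tgt, k=5)      # the 5 nearest reference units for every target unit
# dist[i, j] and idx[i, j] give the distance to and index of the j-th nearest
# reference unit for target unit i; brute-force search would cost O(n) per target.
```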
8
(No Transcript)
9
Diagnostics
• Extrapolations
• Influential observations
• Preserving covariances
10
Diagnostic tools: Ranges of feature space variables
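A simple way to implement this range diagnostic is to flag target units whose feature-space values fall outside the per-variable minimum/maximum of the reference set, i.e., units for which the k-NN prediction would be an extrapolation. The sketch below is an assumed implementation, not code from the presentation.

```python
import numpy as np

def extrapolation_flags(X_tgt, X_ref):
    """Return a boolean vector: True where a target unit lies outside the
    per-variable min/max range of the reference set in at least one dimension."""
    lo, hi = X_ref.min(axis=0), X_ref.max(axis=0)
    outside = (X_tgt < lo) | (X_tgt > hi)      # (n_tgt, p) element-wise range check
    return outside.any(axis=1)
```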
11
(No Transcript)
12
(No Transcript)
13
Influential reference elements
Particularly relevant when combining information from different sources, e.g., registering plot data to remotely sensed data
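One assumed way to quantify how influential a single reference element is: compare target predictions with and without that element (a leave-one-out check). The helper knn_mean and the function below are hypothetical names used only for this sketch, not the presenter's diagnostic.

```python
import numpy as np

def knn_mean(x, X_ref, y_ref, k=5):
    """Unweighted k-NN prediction: mean response of the k nearest reference units."""
    d2 = np.sum((X_ref - x) ** 2, axis=1)
    return y_ref[np.argsort(d2)[:k]].mean()

def influence_of_reference_unit(j, X_tgt, X_ref, y_ref, k=5):
    """Mean absolute change in target predictions when reference unit j is removed."""
    full = np.array([knn_mean(x, X_ref, y_ref, k) for x in X_tgt])
    keep = np.arange(len(y_ref)) != j
    drop = np.array([knn_mean(x, X_ref[keep], y_ref[keep], k) for x in X_tgt])
    return float(np.mean(np.abs(full - drop)))
```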
14
Diagnostic tools: Preserving covariances

              Reference set                  Target set
       FOR    VOL    BA     TD        FOR    VOL    BA     TD
k=1
  FOR  0.96   0.95   0.95   0.96      0.85   0.91   0.92   0.93
  VOL         0.98   0.98   0.98             1.07   1.07   1.06
  BA                 0.97   0.98                    1.09   1.08
  TD                        0.99                           1.08
k=5
  FOR  0.65   0.72   0.73   0.74      0.56   0.65   0.66   0.67
  VOL         0.39   0.41   0.48             0.40   0.42   0.48
  BA                 0.43   0.48                    0.44   0.49
  TD                        0.47                           0.49
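A covariance-preservation diagnostic of this kind can be computed as the element-wise ratio of the covariance matrix of the k-NN predictions to that of the observations, with values near 1 indicating that the covariance structure is preserved. The slide does not state the exact ratio used, so the sketch below is an assumed formulation.

```python
import numpy as np

def covariance_ratio(Y_pred, Y_obs):
    """Element-wise ratio of the covariance matrix of predicted responses (n, q)
    to the covariance matrix of observed responses (n, q); assumes the observed
    covariances are non-zero."""
    C_pred = np.cov(Y_pred, rowvar=False)   # (q, q) covariances of predictions
    C_obs = np.cov(Y_obs, rowvar=False)     # (q, q) covariances of observations
    return C_pred / C_obs
```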
15
Map-based scientific inference: Probability-based (design-based) inference
- validity based on randomization in the sampling design
- one and only one value for each population unit

                    True
Predicted      C1   ...   Cp    Total
  C1           n11  ...   n1p   n1.
  ...          ...        ...   ...
  Cp           np1  ...   npp   np.
  Total        n.1  ...   n.p
16
Inference
• Complete enumeration
• Sample-based
  - expression of results in a probabilistic manner, typically a confidence interval
  - requires bias assessment
  - requires variance estimate
17
Map-based scientific inference: Probability-based (design-based) inference
- validity based on randomization in the sampling design
- one and only one value for each population unit

Difference estimator
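The difference estimator itself is not transcribed. A common model-assisted form, given here as an assumed reconstruction under an equal-probability sample, adds the sample mean of the observation-minus-prediction differences to the mean of the unit-level k-NN predictions over the whole population.

```python
import numpy as np

def difference_estimator(yhat_population, y_sample, yhat_sample):
    """Model-assisted difference estimator of the population mean:
    mean of the unit-level predictions over all population units, corrected by
    the sample mean of the (observation - prediction) differences."""
    return float(yhat_population.mean() + np.mean(y_sample - yhat_sample))
```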
18
Map-based scientific inference: Model-based inference
- validity based on the model
- an entire distribution of possible values for each population unit
19
Bias assessment
• Bootstrap
• Compare to estimates that are unbiased in expectation and asymptotically unbiased
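A bootstrap bias assessment can be sketched as resampling the reference set, re-predicting the reference units, and comparing the mean prediction with the mean observation. The code below is a generic illustration under those assumptions, with knn_mean again a hypothetical helper rather than the presenter's estimator.

```python
import numpy as np

def knn_mean(x, X_ref, y_ref, k=5):
    """Unweighted k-NN prediction: mean response of the k nearest reference units."""
    d2 = np.sum((X_ref - x) ** 2, axis=1)
    return y_ref[np.argsort(d2)[:k]].mean()

def bootstrap_bias(X_ref, y_ref, k=5, n_boot=200, seed=0):
    """Bootstrap estimate of the bias of the mean k-NN prediction over the reference set."""
    rng = np.random.default_rng(seed)
    n = len(y_ref)
    diffs = []
    for _ in range(n_boot):
        b = rng.integers(0, n, size=n)                               # resample the reference set
        preds = np.array([knn_mean(x, X_ref[b], y_ref[b], k) for x in X_ref])
        diffs.append(preds.mean() - y_ref.mean())                    # mean prediction - mean observation
    return float(np.mean(diffs))
```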
20
[Figure: tree count / tree density (count/ha)]
21
Optimal distance matrix, M
Find a positive semi-definite matrix M that minimizes [equation not transcribed], where "nearest" is defined as [equation not transcribed].
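The objective and the definition of "nearest" appear only as images in the original. A plausible reading, consistent with the later slides (weighted Euclidean vs. full matrix, m_12 = m_21, M ≥ 0), is that nearness uses the squared distance d^2(x_i, x_j) = (x_i - x_j)' M (x_i - x_j) and that M is chosen to minimize the squared k-NN prediction errors over the reference set. The short sketch below encodes that assumed distance; it is an interpretation, not the transcribed formula.

```python
import numpy as np

def weighted_distance2(xi, xj, M):
    """Assumed distance used to define 'nearest':
    d^2(x_i, x_j) = (x_i - x_j)' M (x_i - x_j), with M positive semi-definite."""
    diff = xi - xj
    return float(diff @ M @ diff)

# The (assumed) optimization problem: choose M >= 0 that minimizes the sum of
# squared differences between observed responses and their k-NN predictions
# over the reference set, where the neighbour sets are induced by d^2 above.
```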
22
Approaches
• Canonical correlation analysis (Moeur, Stage, et al.)
• Canonical correspondence analysis (Ohmann et al.)
• Mahalanobis
• Genetic algorithm for weighted Euclidean (Tomppo et al.)
• Bayesian for full matrix (Finley et al.)
• Steepest descent (nonlinear regression) (McRoberts et al.)
23
Steepest descent
24
Steepest descent
such that m_12 = m_21 and M ≥ 0
25
Steepest descent
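As an illustration of the steepest-descent idea under the same assumed objective, the sketch below takes finite-difference gradient steps on the elements of a symmetric M and projects back onto M ≥ 0 after each step. It is not the authors' algorithm, and the rough, multi-minimum surface noted on the next slide is exactly what such a crude scheme will encounter.

```python
import numpy as np

def knn_loss(M, X_ref, y_ref, k=5, t=1.0):
    """Leave-one-out sum of squared k-NN prediction errors on the reference set,
    with d^2(x_i, x_j) = (x_i - x_j)' M (x_i - x_j) and inverse-distance weights."""
    loss = 0.0
    for i in range(len(y_ref)):
        diff = X_ref - X_ref[i]
        d2 = np.einsum("ij,jk,ik->i", diff, M, diff)
        d2[i] = np.inf                                   # exclude unit i from its own neighbours
        idx = np.argsort(d2)[:k]
        w = 1.0 / np.maximum(np.sqrt(np.maximum(d2[idx], 0.0)), 1e-12) ** t
        loss += (y_ref[i] - np.sum(w * y_ref[idx]) / w.sum()) ** 2
    return loss

def project_psd(M):
    """Symmetrise M and clip negative eigenvalues so that m_12 = m_21 and M >= 0."""
    M = (M + M.T) / 2.0
    vals, vecs = np.linalg.eigh(M)
    return vecs @ np.diag(np.maximum(vals, 0.0)) @ vecs.T

def steepest_descent_M(X_ref, y_ref, k=5, step=1e-3, h=1e-4, n_iter=50):
    """Crude steepest descent on knn_loss using a finite-difference gradient."""
    p = X_ref.shape[1]
    M = np.eye(p)
    for _ in range(n_iter):
        base = knn_loss(M, X_ref, y_ref, k)
        grad = np.zeros((p, p))
        for a in range(p):
            for b in range(a, p):                        # symmetric elements m_ab = m_ba
                E = np.zeros((p, p))
                E[a, b] = E[b, a] = h
                grad[a, b] = grad[b, a] = (knn_loss(M + E, X_ref, y_ref, k) - base) / h
        M = project_psd(M - step * grad)                 # descend, then enforce the constraints
    return M
```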
26
Consequences for finding an optimal distance matrix
• surface has many local minima and maxima
• surface is very rough
• surface is dependent on the reference set
• consequences similar for any approach
27
Synthetic dataset

            Weighted        Full matrix
            Euclidean
Dataset     m_22            m_12 = m_21    m_22
1           0.61            0.73           0.66
2           1.50            0.92           0.87
3           0.45            0.46           0.50
4           0.40            0.98           0.95
5           0.60            1.02           1.05
28
Conclusions
• k-NN is a powerful multivariate, non-parametric technique
• efficient algorithms required for selecting parameter values
• diagnostic tools required for evaluating underlying assumptions: unbiasedness, homogeneity of variance, influential reference elements
• inferential methods required
• new thinking required for the optimal distance matrix
29
South Savo, Finland

k     CanCor   Mah    Euc    Opt
1     125.6    87.1   89.1   75.2
5      95.0    70.2   67.2   64.3
10     91.1    68.5   66.0   62.5
15     90.2    68.1   65.7   62.9
20     88.9    68.2   65.3   62.4
30     88.5    68.0   65.1   61.0