1. The Maturation of Nearest Neighbors Techniques
Ronald E. McRoberts
Northern Research Station, U.S. Forest Service
St. Paul, Minnesota
Western Mensurationists Meeting, 22-23 June 2009, Vancouver, WA
2. Some nearest neighbors terminology
- Response variable: variable for which predictions are desired
- Feature space variable: ancillary variable with an observation available for every population unit
- Reference set: population units with observations of both response and feature space variables
- Target set: population units for which predictions of response variables are desired
3. The k Nearest Neighbors (k-NN) technique
4. k-Nearest Neighbors
$\{y_j^i,\ j = 1, \ldots, k\}$ is the set of k reference pixels nearest to the ith pixel in feature space with respect to a distance metric, d.
Primary parameters: k, t, M
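The slide's prediction formula was not transcribed. This sketch assumes a commonly used form that involves the three primary parameters: $\hat{y}_i = \sum_{j=1}^{k} w_{ji}\, y_j^i$, with weights proportional to $d_{ji}^{-t}$ and normalized to sum to 1, and $d^2(x_i, x_j) = (x_i - x_j)^\top M (x_i - x_j)$. It is an illustration under those assumptions, not the author's implementation. The function names `distance` and `knn_predict` are hypothetical.

```python
from math import sqrt

def distance(x, z, M):
    """Distance induced by a positive semi-definite matrix M: sqrt((x - z)' M (x - z))."""
    diff = [a - b for a, b in zip(x, z)]
    quad = sum(diff[r] * M[r][c] * diff[c]
               for r in range(len(diff)) for c in range(len(diff)))
    return sqrt(max(quad, 0.0))  # guard against tiny negative round-off

def knn_predict(x_target, ref_X, ref_y, k, t, M):
    """Weighted k-NN prediction; weights proportional to d**(-t), summing to 1.

    Assumed form: t = 0 gives the unweighted mean of the k nearest responses.
    """
    # Sort reference units by distance to the target and keep the k nearest.
    nearest = sorted((distance(x_target, x, M), y) for x, y in zip(ref_X, ref_y))[:k]
    # Neighbors at distance 0 would get infinite weight when t > 0; average them instead.
    exact = [y for d, y in nearest if d == 0.0]
    if exact and t > 0:
        return sum(exact) / len(exact)
    raw = [d ** (-t) if t > 0 else 1.0 for d, _ in nearest]
    total = sum(raw)
    return sum(w / total * y for w, (_, y) in zip(raw, nearest))
```

For example, with identity M, a target at (0.1, 0) and reference pixels at (0, 0) and (1, 0) with responses 10 and 20, k = 2 gives 15 for t = 0 and 11 for t = 1. The closer pixel dominates as t grows.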
5. Two primary applications
- Filling holes in databases (classic imputation)
  - target set < reference set
- Spatial estimation
  - map-based inference
  - target set ≫ reference set
6. Issues in Nearest Neighbors prediction
- Search for the nearest neighbors
- Search for parameter values
  - optimal distance metric
  - optimal weights $w_{ji}$
  - optimal k
- Inference
- Diagnostic tools
7. Searching for nearest neighbors: k-d tree searching
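A k-d tree avoids comparing each target pixel against every reference pixel. It recursively splits the reference set on one coordinate at a time and skips any subtree whose splitting plane lies farther away than the current k-th best candidate. This minimal pure-Python sketch uses Euclidean distance. A general matrix $M = L L^\top$ can be handled by first transforming the features to $L^\top x$. `KDNode`, `build_kdtree` and `kd_search` are hypothetical names, and production code would normally use a library implementation.

```python
import heapq

class KDNode:
    def __init__(self, point, index, axis, left, right):
        self.point, self.index, self.axis = point, index, axis
        self.left, self.right = left, right

def build_kdtree(points, indices=None, depth=0):
    """Recursively split the reference points at the median of one coordinate."""
    if indices is None:
        indices = list(range(len(points)))
    if not indices:
        return None
    axis = depth % len(points[0])  # cycle through the feature space variables
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return KDNode(points[indices[mid]], indices[mid], axis,
                  build_kdtree(points, indices[:mid], depth + 1),
                  build_kdtree(points, indices[mid + 1:], depth + 1))

def kd_search(root, query, k):
    """Return sorted (squared distance, index) pairs for the k nearest points."""
    heap = []  # max-heap of the current k best, stored as (-squared distance, index)

    def visit(node):
        if node is None:
            return
        d2 = sum((a - b) ** 2 for a, b in zip(node.point, query))
        if len(heap) < k:
            heapq.heappush(heap, (-d2, node.index))
        elif d2 < -heap[0][0]:
            heapq.heapreplace(heap, (-d2, node.index))
        diff = query[node.axis] - node.point[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)
        # Cross the splitting plane only if it could hold a closer point.
        if len(heap) < k or diff ** 2 < -heap[0][0]:
            visit(far)

    visit(root)
    return sorted((-neg, i) for neg, i in heap)
```

Building the tree costs roughly O(n log² n) with this simple sorting approach. Each query then typically visits only a small fraction of the reference set when the number of feature space variables is small.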
9. Diagnostics
- Extrapolations
- Influential observations
- Preserving covariances
10. Diagnostic tools: ranges of feature space variables
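A weighted-mean k-NN prediction is always an average of reference responses, so it cannot leave the range of the reference observations. Target units whose feature values fall outside the reference-set ranges are therefore effectively extrapolations. This minimal check flags them, assuming features are stored as lists of numbers. `out_of_range` is a hypothetical name.

```python
def out_of_range(ref_X, target_X):
    """Flag target units with a feature outside the reference-set range.

    Returns (target index, feature index) pairs, one per violation.
    """
    n_feat = len(ref_X[0])
    lo = [min(x[f] for x in ref_X) for f in range(n_feat)]
    hi = [max(x[f] for x in ref_X) for f in range(n_feat)]
    flags = []
    for i, x in enumerate(target_X):
        for f in range(n_feat):
            if not lo[f] <= x[f] <= hi[f]:
                flags.append((i, f))
    return flags
```

A per-variable range check like this one cannot detect target units that are inside every marginal range but still far from all reference units in the joint feature space.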
13. Influential reference elements
Particularly relevant when combining information from different sources, e.g., registering plot data to remotely sensed data
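The slide does not say how influence is measured. One simple indicator, assumed here for illustration, is how often each reference element serves as one of the k nearest neighbors across the target set. An element chosen far more often than the others, for example a plot mis-registered to the imagery, can dominate many predictions. `neighbor_counts` is a hypothetical name, and the sketch uses Euclidean distance.

```python
from collections import Counter
from math import dist

def neighbor_counts(ref_X, target_X, k):
    """Count how often each reference element is among the k nearest
    (Euclidean) neighbors of a target unit. Large counts flag potentially
    influential reference elements."""
    counts = Counter()
    for x in target_X:
        order = sorted(range(len(ref_X)), key=lambda j: dist(x, ref_X[j]))
        counts.update(order[:k])
    return counts
```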
14. Diagnostic tools: preserving covariances

          Reference set             Target set
k   Var   FOR   VOL   BA    TD      FOR   VOL   BA    TD
1   FOR   0.96  0.95  0.95  0.96    0.85  0.91  0.92  0.93
    VOL         0.98  0.98  0.98          1.07  1.07  1.06
    BA                0.97  0.98                1.09  1.08
    TD                      0.99                      1.08
5   FOR   0.65  0.72  0.73  0.74    0.56  0.65  0.66  0.67
    VOL         0.39  0.41  0.48          0.40  0.42  0.48
    BA                0.43  0.48                0.44  0.49
    TD                      0.47                      0.49
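The slide does not state how its entries were computed. One plausible way to check whether k-NN predictions preserve covariances among the response variables, assumed here, is to compare the sample covariance matrix of the predictions with that of the observations element by element. A ratio near 1 means the covariance is preserved. Averaging over more neighbors generally shrinks the variability of predictions, which would push such ratios below 1. `covariance_matrix` and `covariance_ratios` are hypothetical names.

```python
def covariance_matrix(rows):
    """Sample covariance matrix (divisor n - 1) of the columns of rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[c] for r in rows) / n for c in range(p)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
             for b in range(p)] for a in range(p)]

def covariance_ratios(observed, predicted):
    """Element-wise cov(predicted) / cov(observed); 1.0 means preserved.

    Raises ZeroDivisionError if an observed covariance is exactly zero.
    """
    co, cp = covariance_matrix(observed), covariance_matrix(predicted)
    p = len(co)
    return [[cp[a][b] / co[a][b] for b in range(p)] for a in range(p)]
```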
15. Map-based scientific inference
Probability (design-based) inference
- validity based on randomization in sampling design
- one and only one value for each population unit

                Predicted
True     C1     ...    Cp     Total
C1       n11    ...    n1p    n1·
...      ...    ...    ...    ...
Cp       np1    ...    npp    np·
Total    n·1    ...    n·p
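As a design-based illustration of the error matrix above, this sketch estimates overall accuracy as the proportion of sample units on the diagonal, along with the row and column totals. It assumes simple random sampling, rows indexed by true class and columns by predicted class, and it ignores the finite population correction. The variance estimator $\hat{p}(1-\hat{p})/(n-1)$ is the standard one for a proportion under those assumptions. `overall_accuracy` and `margins` are hypothetical names.

```python
from math import sqrt

def overall_accuracy(matrix):
    """Overall accuracy and its estimated standard error from an error matrix.

    matrix[i][j] counts sample units with true class i and predicted class j.
    Assumes simple random sampling; the finite population correction is ignored.
    """
    n = sum(sum(row) for row in matrix)
    p_hat = sum(matrix[i][i] for i in range(len(matrix))) / n
    se = sqrt(p_hat * (1 - p_hat) / (n - 1))
    return p_hat, se

def margins(matrix):
    """Row totals n_i. (true classes) and column totals n_.j (predicted classes)."""
    rows = [sum(row) for row in matrix]
    cols = [sum(col) for col in zip(*matrix)]
    return rows, cols
```

For a two-class matrix with 40 and 45 correct out of 100 sample units, the overall accuracy is 0.85 with a standard error of about 0.036.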
16. Inference
- Complete enumeration
- Sample-based
  - expression of results in a probabilistic manner
  - typically a confidence interval
  - requires bias assessment
  - requires variance estimate
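For a sample-based estimate, the usual probabilistic expression is a confidence interval built from the estimate and its variance estimate. This minimal sketch computes an approximate 95% interval for a population mean from a sample of plot observations. It assumes simple random sampling and a normal approximation, and it ignores the finite population correction. `mean_with_ci` is a hypothetical name.

```python
from math import sqrt
from statistics import mean, stdev

def mean_with_ci(sample, z=1.96):
    """Sample mean and an approximate confidence interval, mean ± z * SE.

    Assumes simple random sampling; z = 1.96 gives roughly 95% coverage
    under the normal approximation.
    """
    m = mean(sample)
    se = stdev(sample) / sqrt(len(sample))
    return m, (m - z * se, m + z * se)
```

For the sample 10, 12, 14, 16, 18, the mean is 14 with a standard error of √2, giving an interval of about (11.23, 16.77).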
17. Map-based scientific inference
Probability (design-based) inference
- validity based on randomization in sampling design
- one and only one value for each population unit
Difference estimator
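The slide names the difference estimator but its formula was not transcribed. Under simple random sampling, a standard form combines the mean of the map predictions over all N population units with a correction based on the n sample units: $\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N}\hat{y}_i - \frac{1}{n}\sum_{i \in S}\varepsilon_i$, where $\varepsilon_i = \hat{y}_i - y_i$. Its variance estimate is $\frac{1}{n(n-1)}\sum_{i \in S}(\varepsilon_i - \bar{\varepsilon})^2$. This sketch assumes that form and ignores the finite population correction. `difference_estimator` is a hypothetical name.

```python
from statistics import mean, variance

def difference_estimator(map_predictions, sample_predictions, sample_observations):
    """Difference estimator of the population mean under simple random sampling.

    map_predictions: k-NN predictions for all N population units.
    sample_predictions, sample_observations: predicted and observed values
    for the n sample units. The finite population correction is ignored.
    """
    errors = [p - y for p, y in zip(sample_predictions, sample_observations)]
    estimate = mean(map_predictions) - mean(errors)  # map mean, corrected for bias
    var_estimate = variance(errors) / len(errors)    # (1 / n) * sample variance of errors
    return estimate, var_estimate
```

If the map systematically overpredicts on the sample by 2 units, the estimator subtracts 2 from the map mean. This bias correction is the point of using it instead of the raw map mean.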
18. Map-based scientific inference
Model-based inference
- validity based on model
- an entire distribution of possible values for each population unit
19. Bias assessment
- Bootstrap
- Compare to estimates that are unbiased in expectation and asymptotically unbiased
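A bootstrap bias assessment resamples the data with replacement, recomputes the estimator on each resample, and takes the mean of the replicates minus the original estimate as the bias estimate. This is a generic sketch. In the k-NN setting, `data` might be the reference set and `statistic` an estimator built from k-NN predictions, but that pairing is an illustrative assumption. `bootstrap_bias` is a hypothetical name.

```python
import random
from statistics import mean

def bootstrap_bias(data, statistic, n_boot=2000, seed=1):
    """Bootstrap estimate of the bias of statistic(data).

    Bias is estimated as the mean of the statistic over n_boot resamples
    drawn with replacement minus the statistic on the original data.
    """
    rng = random.Random(seed)  # fixed seed for reproducible resamples
    theta_hat = statistic(data)
    replicates = [statistic(rng.choices(data, k=len(data))) for _ in range(n_boot)]
    return mean(replicates) - theta_hat
```

The sample maximum is a simple test case. A resample can never exceed the original maximum, so its bootstrap bias estimate is negative, matching the known downward bias of the maximum as an estimator.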
20. Tree density
(Figure; axis: tree count, count/ha)
21. Optimal distance matrix, M
Find a positive semi-definite matrix M that minimizes [...] where nearest is defined as [...]
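The objective and the definition of "nearest" on this slide were not transcribed. This sketch assumes a common choice: the criterion is the leave-one-out root mean square error of k-NN predictions over the reference set, and "nearest" is defined by $d^2(x_i, x_j) = (x_i - x_j)^\top M (x_i - x_j)$. It also includes the positive semi-definite check for a symmetric 2 × 2 M, which holds when both diagonal entries and the determinant are non-negative. `quad_distance`, `loo_rmse` and `is_psd_2x2` are hypothetical names.

```python
from math import sqrt

def quad_distance(x, z, M):
    """Squared distance (x - z)' M (x - z); same neighbor ranking as the distance."""
    diff = [a - b for a, b in zip(x, z)]
    return sum(diff[r] * M[r][c] * diff[c]
               for r in range(len(diff)) for c in range(len(diff)))

def loo_rmse(ref_X, ref_y, M, k=1):
    """Leave-one-out RMSE of unweighted k-NN predictions under matrix M."""
    sq_err = 0.0
    for i, (xi, yi) in enumerate(zip(ref_X, ref_y)):
        # Predict unit i from the other reference units only.
        others = sorted((quad_distance(xi, xj, M), yj)
                        for j, (xj, yj) in enumerate(zip(ref_X, ref_y)) if j != i)
        pred = sum(y for _, y in others[:k]) / k
        sq_err += (yi - pred) ** 2
    return sqrt(sq_err / len(ref_y))

def is_psd_2x2(M):
    """A symmetric 2 x 2 matrix is PSD iff its diagonal and determinant are >= 0."""
    return (M[0][1] == M[1][0] and M[0][0] >= 0 and M[1][1] >= 0
            and M[0][0] * M[1][1] - M[0][1] * M[1][0] >= 0)
```

If the response depends only on the first feature, a matrix that gives zero weight to the second feature yields a lower leave-one-out RMSE than the identity. Searching over M exploits exactly this effect.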
22. Approaches
- Canonical correlation analysis (Moeur, Stage, et al.)
- Canonical correspondence analysis (Ohmann et al.)
- Mahalanobis
- Genetic algorithm for weighted Euclidean (Tomppo et al.)
- Bayesian for full matrix (Finley et al.)
- Steepest descent (nonlinear regression) (McRoberts et al.)
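In the Mahalanobis approach, M is not optimized at all. It is the inverse of the covariance matrix of the feature space variables, which rescales and decorrelates the features before distances are computed. This sketch handles two feature variables and inverts the 2 × 2 covariance matrix in closed form. `covariance_2d` and `mahalanobis_matrix` are hypothetical names.

```python
def covariance_2d(X):
    """Sample covariance matrix (divisor n - 1) of two feature variables."""
    n = len(X)
    m0 = sum(x[0] for x in X) / n
    m1 = sum(x[1] for x in X) / n
    s00 = sum((x[0] - m0) ** 2 for x in X) / (n - 1)
    s11 = sum((x[1] - m1) ** 2 for x in X) / (n - 1)
    s01 = sum((x[0] - m0) * (x[1] - m1) for x in X) / (n - 1)
    return [[s00, s01], [s01, s11]]

def mahalanobis_matrix(X):
    """M = inverse of the 2 x 2 sample covariance matrix of the features."""
    (a, b), (_, d) = covariance_2d(X)
    det = a * d - b * b
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    # Closed-form inverse of [[a, b], [b, d]].
    return [[d / det, -b / det], [-b / det, a / det]]
```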
23. Steepest descent
24. Steepest descent
such that $m_{12} = m_{21}$ and $M \succeq 0$
25. Steepest descent
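The steepest-descent equations on these slides were not transcribed, so this is a sketch of one way to run such a search, not the author's method. It writes $M = L L^\top$ with $L = \begin{pmatrix} 1 & 0 \\ l_{21} & l_{22} \end{pmatrix}$. That gives $m_{11} = 1$, $m_{12} = m_{21} = l_{21}$ and $m_{22} = l_{21}^2 + l_{22}^2$, so both constraints hold automatically. Fixing $m_{11} = 1$ loses nothing here, because scaling M by a positive constant leaves neighbor rankings and normalized $d^{-t}$ weights unchanged. The gradient is taken by finite differences, and a step is kept only if it lowers the assumed leave-one-out RMSE objective. `m_from_params`, `weighted_loo_rmse` and `steepest_descent` are hypothetical names.

```python
from math import sqrt

def m_from_params(l21, l22):
    """M = L L' with L = [[1, 0], [l21, l22]]: symmetric, PSD, m11 fixed at 1."""
    return [[1.0, l21], [l21, l21 * l21 + l22 * l22]]

def weighted_loo_rmse(ref_X, ref_y, M, k=3, t=1.0):
    """Leave-one-out RMSE of k-NN predictions with weights proportional to d**(-t)."""
    sq = 0.0
    for i, (xi, yi) in enumerate(zip(ref_X, ref_y)):
        cand = []
        for j, (xj, yj) in enumerate(zip(ref_X, ref_y)):
            if j == i:
                continue
            d0, d1 = xi[0] - xj[0], xi[1] - xj[1]
            q = M[0][0] * d0 * d0 + 2 * M[0][1] * d0 * d1 + M[1][1] * d1 * d1
            cand.append((sqrt(max(q, 0.0)), yj))
        cand.sort()
        w = [max(d, 1e-12) ** (-t) for d, _ in cand[:k]]  # floor avoids division by zero
        pred = sum(wi * y for wi, (_, y) in zip(w, cand[:k])) / sum(w)
        sq += (yi - pred) ** 2
    return sqrt(sq / len(ref_y))

def steepest_descent(ref_X, ref_y, start=(0.0, 1.0), step=0.5, h=1e-3, iters=50):
    """Minimize weighted_loo_rmse over (l21, l22) by finite-difference descent.

    A step is accepted only if it lowers the objective; otherwise the step
    length is halved. On a rough surface this reaches a local minimum at best.
    """
    params = list(start)
    f = weighted_loo_rmse(ref_X, ref_y, m_from_params(*params))
    for _ in range(iters):
        grad = []
        for p in range(2):
            shifted = params[:]
            shifted[p] += h
            grad.append((weighted_loo_rmse(ref_X, ref_y, m_from_params(*shifted)) - f) / h)
        norm = sqrt(grad[0] ** 2 + grad[1] ** 2)
        if norm == 0.0:
            break  # flat region: finite differences give no descent direction
        trial = [params[p] - step * grad[p] / norm for p in range(2)]
        f_trial = weighted_loo_rmse(ref_X, ref_y, m_from_params(*trial))
        if f_trial < f:
            params, f = trial, f_trial
        else:
            step /= 2
    return m_from_params(*params), f
```

Because neighbor sets change abruptly as M changes, the objective is rough. The result depends on the starting point and on the finite-difference step h.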
26. Consequences for finding an optimal distance matrix
- surface has many local minima and maxima
- surface is very rough
- surface dependent on reference set
- consequences similar for any approach
27. Synthetic dataset

          Weighted Euclidean    Full matrix
Dataset   m22                   m12 = m21    m22
1         0.61                  0.73         0.66
2         1.50                  0.92         0.87
3         0.45                  0.46         0.50
4         0.40                  0.98         0.95
5         0.60                  1.02         1.05
28. Conclusions
- k-NN is a powerful multivariate, non-parametric technique
- efficient algorithms required for selecting parameter values
- diagnostic tools required for evaluating underlying assumptions, unbiasedness, homogeneity of variance, influential reference elements
- inferential methods required
- new thinking required for optimal distance matrix
29. South Savoy, Finland

k     Can Cor   Mah    Euc    Opt
1     125.6     87.1   89.1   75.2
5      95.0     70.2   67.2   64.3
10     91.1     68.5   66.0   62.5
15     90.2     68.1   65.7   62.9
20     88.9     68.2   65.3   62.4
30     88.5     68.0   65.1   61.0