Use of K-nearest Neighbor Imputation for Modeling Forest Inventory Data - PowerPoint PPT Presentation

1
Use of K-nearest Neighbor Imputation for Modeling Forest Inventory Data
Andrew J. Lister
USDA Forest Service, Northern Research Station
Forest Inventory and Analysis, Northern Monitoring Program
2
Original rationale for needing classified maps: plot stratification for variance reduction.
> 400,000 photointerpretation points in the 13 Northeastern states alone.
3
Instead of photos, use classified imagery to group and weight plots: 96.3% accurate, with a kappa statistic of 0.81.
4
Better hardware, software and skill → sophisticated, regional and national maps could be produced → growth in demand.
5
After 2000, the advent of the MODIS sensor made
regional mapping even more feasible.
Concurrently, computer capabilities and software
advances made this type of modeling more
accessible to natural resource agencies.
6
The USDA Forest Service's Remote Sensing Applications Center and its Forest Inventory and Analysis unit assembled a set of several dozen GIS and imagery data layers that could be used for a first iteration of national mapping efforts, etc.
7
With these new software and hardware solutions, FIA produced a first iteration of a national product: Forest Biomass of the United States.
8
The problem with the regression tree approach is
that every map requires a separate modeling
effort. The ideal approach would be to make one
map with entire plot records imputed to pixels.
9
  • Approach: a mapping project in support of the PA state report
  1. Clean data: remove whacko plots, plots with any nonforest on them, and categorical predictors. Good idea?
  2. Cull confounding predictor data by choosing a subset of variables that effectively group forest inventory data into homogeneous groups.
First, use k-means clustering to group the FIA data into 10 clusters using, e.g., total volume per species as the distance-defining variables → each plot is assigned a species composition group (cluster).
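The k-means grouping step can be sketched as below. This is a minimal illustration, not the authors' actual workflow: the plot counts, species columns, and volume values are all synthetic, and scikit-learn stands in for whatever statistical software was actually used.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical FIA-style plot table: rows are plots, columns are
# total volume per species (synthetic values, for illustration only).
rng = np.random.default_rng(0)
volumes = rng.gamma(shape=2.0, scale=500.0, size=(200, 12))  # 200 plots, 12 species

# Standardize so no single species dominates the Euclidean distance,
# then group plots into 10 species-composition clusters.
X = StandardScaler().fit_transform(volumes)
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X)

# Each plot is assigned a species-composition group (cluster label 0-9).
cluster_of_plot = kmeans.labels_
```

The cluster labels then serve as the "correct" classes for the feature selection step that follows.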
10
Next, run a feature selection algorithm: iteratively assess every predictor for its ability to improve the classification of training data into their correct species composition clusters, and rank them based on this ability.
Why not do data reduction by creating composite variables, e.g., principal components or canonical variates?
Good question…
Intuitively, I feel that adding another layer of modeled data to the modeling process further dissociates the phenomenon being predicted from the measurements that were taken.
I'm probably wrong!
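One simple form of the ranking idea is scored below: each predictor is evaluated one at a time by how well it alone classifies plots into their species-composition clusters. The data are synthetic (one predictor is deliberately made informative), and the one-variable-at-a-time decision-tree scoring is an assumption; the presentation does not say which feature selection algorithm was used.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: 8 predictor layers sampled at 200 training
# plots (e.g., precip, soils, MODIS bands) and the cluster label each
# plot received from the k-means step (20 plots per cluster).
rng = np.random.default_rng(1)
predictors = rng.normal(size=(200, 8))
clusters = rng.permutation(np.repeat(np.arange(10), 20))
predictors[:, 3] += clusters  # make predictor 3 informative

# Score each predictor by cross-validated accuracy when it alone is
# used to classify plots into clusters, then rank best-first.
scores = []
for j in range(predictors.shape[1]):
    clf = DecisionTreeClassifier(max_depth=5, random_state=0)
    scores.append(cross_val_score(clf, predictors[:, [j]], clusters, cv=5).mean())
ranking = np.argsort(scores)[::-1]  # best predictor first
```

With this setup the informative predictor should land at the top of the ranking, mirroring the ranked-variable figures on the following slides.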
11
Next, associate the training plots with the subset of standardized predictor data.
Perform fuzzy classification, whereby each unknown pixel is given the plot id number of the known plot that is most similar to it (simple Euclidean distance).
Finally, recode the plot-id image with a lookup table created by summarizing the FIA database to the plot level. For example, plot 17 has 1,500 cubic feet of volume/acre, so every pixel that was assigned plot 17 as its nearest neighbor gets a value of 1,500.
The principal advantage: you do the classification once, and then simply recode the plot-id map for every attribute of interest.
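The assign-once, recode-per-attribute idea above can be sketched as follows. Everything here is synthetic: the plot ids, predictor values, image size, and volume figures are invented, and a KD-tree query stands in for whatever image-classification software was actually used.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(2)

# Standardized predictor values for 50 known (training) plots...
plot_predictors = rng.normal(size=(50, 4))
plot_ids = np.arange(100, 150)                # arbitrary plot id numbers

# ...and for a 20x20 image of unknown pixels (flattened to rows).
pixel_predictors = rng.normal(size=(400, 4))

# Step 1 (done once): give each pixel the id of the nearest plot in
# predictor space, by simple Euclidean distance.
_, nearest = cKDTree(plot_predictors).query(pixel_predictors, k=1)
plot_id_map = plot_ids[nearest].reshape(20, 20)

# Step 2 (repeat per attribute): recode the plot-id map with a lookup
# table summarized from the plot database, e.g., cubic feet of volume/acre.
volume_per_acre = dict(zip(plot_ids, rng.uniform(0, 3000, size=50).round()))
volume_map = np.vectorize(volume_per_acre.get)(plot_id_map)
```

Mapping a second attribute (basal area, biomass, forest type) only requires a new lookup table, not a new classification, which is the advantage the slide claims.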
12
Results of feature selection procedure: which layers best put plots into homogeneous groups?
13
Ranks of tree volume-related variables (figure): x, x², soil pH, coarse fraction 1, August precip, coarse fraction 2, y², y, xy, November precip, September precip, June precip, total annual precip, December precip, plasticity
14
Ranks of species composition-related variables
15
Ranks of total green biomass variables (figure): MODIS May 9 b2, MODIS May 9 b5, MODIS May 9 b6, MODIS July 12 b2, percent conifer forest, MODIS June 10 b6, percent conifer forest, MODIS September 14 b2, MODIS April 7 b2, MODIS June 10 b2, MODIS June 10 b7, MODIS Aug 13 b2, MODIS June 10 b1, MODIS NDVI, August 13 EVI
16
Plot-id map: note spatial clustering of similar values
17
A small number of final recoded, masked maps
18
A couple more…
19
Just one more: a wafer-thin map…
20
Are we there yet?
21
Examples of final product: Maple/Beech/Birch forest type
22
Examples of final product: Chestnut Oak
23
Examples of final product: Eastern Hemlock
24
Examples of final product: Red Maple
25
Quality assurance
Compute map- and plot-based totals per window, and assess the quality of the map.
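The window-based check can be sketched as below. The surfaces, window size, and agreement measure are all assumptions for illustration; in practice the "plot-based" values would come from field plots falling in each window rather than a synthetic grid.

```python
import numpy as np

rng = np.random.default_rng(3)
# Synthetic stand-in surfaces: plot-based "truth" and a noisier imputed map.
plot_surface = rng.uniform(500, 2500, size=(100, 100))
imputed_map = plot_surface + rng.normal(0, 200, size=(100, 100))

def window_means(grid, w):
    """Mean of each non-overlapping w x w window of a 2-D grid."""
    h, wd = grid.shape
    return grid.reshape(h // w, w, wd // w, w).mean(axis=(1, 3))

# Map-based vs plot-based summaries per 20x20 window...
map_means = window_means(imputed_map, 20)
plot_means = window_means(plot_surface, 20)

# ...and their agreement (Pearson correlation across windows).
r = np.corrcoef(map_means.ravel(), plot_means.ravel())[0, 1]
```

A high correlation between map- and plot-based window summaries is one way to express the "assess quality of map" step quantitatively.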
26
Examples of quality assurance
Percent Dead White Pine
Board Foot Volume Sweet Birch
Percent Basal Area Red Oak
Percent Damage Red Maple
27
Next steps
  • See if data reduction via creation of composite variables improves quality over using a subset of raw variables
  • Try different distance metrics
  • Assess varying levels of k; could use the same recoding approach
  • Evaluate a hybrid unsupervised-supervised approach, similar to classical remote sensing
  • Automate and generalize the process (feature selection, clustering, etc.). Ideally, a user could submit a training data set, and a map would be created via a set of heuristics.
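The "varying levels of k" idea could look like the sketch below: instead of copying the single nearest plot's value, combine the attribute over the k nearest plots. The data are synthetic, and the inverse-distance weighting and the particular k values tried are assumptions, not something the presentation specifies.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(4)
plot_predictors = rng.normal(size=(50, 4))   # standardized predictors at plots
plot_volume = rng.uniform(0, 3000, size=50)  # hypothetical plot attribute
pixel_predictors = rng.normal(size=(400, 4)) # predictors at unknown pixels

tree = cKDTree(plot_predictors)
estimates = {}
for k in (1, 3, 5):
    dist, idx = tree.query(pixel_predictors, k=k)
    dist = dist.reshape(len(pixel_predictors), k)  # uniform 2-D shape for k = 1
    idx = idx.reshape(len(pixel_predictors), k)
    w = 1.0 / (dist + 1e-9)                        # inverse-distance weights
    estimates[k] = (plot_volume[idx] * w).sum(axis=1) / w.sum(axis=1)
```

With k = 1 this reduces to the plot-id recoding used in the main workflow; larger k trades the "whole plot record per pixel" property for smoother estimates.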
28
Why worry so much about mapping approaches for
forestry data?
alister@fs.fed.us