Title: Predictive Models I
1Predictive Models I
- RNR/Geog 420/520
- Spring 2000
2Predictive Models
- Important to understand what we are attempting to predict
- These models predict location
- This prediction is based on reasoned or measured relationships
- No predictive model is perfect
- Some are more efficient than others
3Broad Model Types
- Deductive models are based on reasoning in which the conclusion follows necessarily from the presented premises
- Inductive models base validity on observations about part of a class as evidence for a proposition about the whole class
4The Goal of Predictive Models
- Models in a GIS should maintain a high percentage
of correct predictions while decreasing the area
needed to obtain these predictions
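The trade-off in this goal can be made concrete with two numbers: the share of known locations that fall inside the predicted zone, and the share of the study area that zone occupies. The sketch below is only an illustration; the grid, the site cells, and the function name `prediction_summary` are hypothetical and do not come from the course material.

```python
import numpy as np

def prediction_summary(predicted_mask, site_rows, site_cols):
    """Return (fraction of sites captured, fraction of area flagged)."""
    predicted_mask = np.asarray(predicted_mask, dtype=bool)
    hits = predicted_mask[site_rows, site_cols]   # True where a site lies in the predicted zone
    return float(hits.mean()), float(predicted_mask.mean())

# Hypothetical 4 x 5 grid: 1 marks cells the model flags as likely locations.
mask = np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
], dtype=bool)

# Hypothetical known sites, given as (row, col) cell indices.
rows = np.array([0, 1, 1, 3])
cols = np.array([0, 1, 0, 4])

correct, area = prediction_summary(mask, rows, cols)
print(f"sites captured: {correct:.0%}, area flagged: {area:.0%}")
```

A model that captures most sites while flagging a small share of the area is the more efficient one in the sense used above.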
5Why Make Predictive Models
- Resource management
- Reduce costs but maintain service
- Planning decisions
- Discover favored habitats
- Understand behavior
- Discover preferences
- Prove theories
- Disprove theories
6Components of a Model
- Variables
- Study Group
- Control Group
- Suitable Statistical model/test
7Variable Selection
- Variables are usually selected because they are thought to exert influence on the phenomena being studied
- The data type (e.g. nominal vs. continuous) of a variable can restrict the types of models it is possible to make
- A GIS allows the researcher to control continuous data
8Study Group
- Locations where the phenomena being investigated are found
- A good study group requires a good collection strategy
- SAMPLE(<mask_grid>, grid, ..., grid)
- zone x y cellvalue1 ... cellvaluen
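For readers without access to GRID, the idea behind SAMPLE can be sketched with NumPy: for every cell marked in a mask, list the zone, the cell-centre coordinates, and the value of each input grid. This is not a reproduction of GRID's SAMPLE. The function name `sample_grids`, the rule that any nonzero mask cell is sampled, and the toy grids are all assumptions made for illustration.

```python
import numpy as np

def sample_grids(mask, grids, xmin, ymax, cellsize):
    """List (zone, x, y, value1, ..., valuen) for every nonzero mask cell."""
    rows, cols = np.nonzero(mask)
    records = []
    for r, c in zip(rows, cols):
        x = xmin + (c + 0.5) * cellsize   # cell-centre easting
        y = ymax - (r + 0.5) * cellsize   # cell-centre northing (row 0 is the top row)
        values = [float(g[r, c]) for g in grids]
        records.append((int(mask[r, c]), float(x), float(y), *values))
    return records

# Hypothetical 3 x 3 grids with 10 m cells.
mask = np.array([[0, 1, 0],
                 [0, 0, 0],
                 [0, 0, 1]])
elevation = np.arange(9, dtype=float).reshape(3, 3) * 10.0
slope = np.full((3, 3), 2.0)

records = sample_grids(mask, [elevation, slope], xmin=0.0, ymax=30.0, cellsize=10.0)
for rec in records:
    print(rec)
```

Each record follows the zone, x, y, cell value layout listed above, so the result can be written to a text file and passed to a statistics package.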
9Control Group
- Often necessary to discover the significance of spatial patterns
- For example, is it significant if 75% of coyote dens are located within 50 meters of houses?
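The coyote den question only has an answer once the same measurement is taken for a control group. The sketch below compares the share of dens within 50 m of a house against the share of the whole study area within 50 m. The distance raster and den cells are invented, and using every cell as the control group is an assumption; a random sample of non-den locations is another common choice.

```python
import numpy as np

# Hypothetical raster of distance to the nearest house, in metres:
# 10 rows by 20 columns, with distance growing by 10 m per column.
dist = np.tile(np.arange(20) * 10.0, (10, 1))

# Hypothetical den locations as (row, col) cell indices.
den_rows = np.array([0, 1, 2, 3, 4, 5, 6, 7])
den_cols = np.array([0, 1, 2, 3, 4, 5, 10, 15])

# Share of dens within 50 m versus share of the whole study area within 50 m.
den_share = float(np.mean(dist[den_rows, den_cols] <= 50.0))
control_share = float(np.mean(dist <= 50.0))

print(f"dens within 50 m:    {den_share:.0%}")
print(f"control within 50 m: {control_share:.0%}")
```

If most of the study area were within 50 m of a house, a den share of 75% would say little. The gap between the two shares is what a significance test then evaluates.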
10Trend Surface Models
- Based strictly on location
- Trend surface models use a polynomial regression to fit a least-squares surface to input points
- As the order of the polynomial is increased, the surface being fitted becomes progressively more complex
- Two variations: Linear and Logistic
- TREND(<point_cover | point_file>, spot_item, order, LINEAR | LOGISTIC, cellsize, xmin, ymin, xmax, ymax)
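A linear trend surface of a given order can be sketched as an ordinary least-squares fit on polynomial terms of x and y. The code below is a minimal illustration with NumPy, not GRID's TREND implementation; the function names and the sample points are assumptions.

```python
import numpy as np

def design_matrix(x, y, order):
    """Columns x**i * y**j for all i + j <= order (the constant term is included)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cols = [x**i * y**j for i in range(order + 1) for j in range(order + 1 - i)]
    return np.column_stack(cols)

def fit_trend_surface(x, y, z, order):
    """Least-squares polynomial coefficients for the surface z = f(x, y)."""
    coef, *_ = np.linalg.lstsq(design_matrix(x, y, order),
                               np.asarray(z, dtype=float), rcond=None)
    return coef

def predict_trend_surface(coef, x, y, order):
    """Evaluate the fitted surface at new coordinates."""
    return design_matrix(x, y, order) @ coef

# Hypothetical points lying exactly on the plane z = 2 + 3x - y.
x = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2], dtype=float)
z = 2.0 + 3.0 * x - y

coef = fit_trend_surface(x, y, z, order=1)   # terms in order: 1, y, x
print(coef)
```

A first-order fit has 3 terms, a second-order fit 6, and a third-order fit 10, which is why the surface can bend more as the order increases.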
11Linear/Logistic Trend Surfaces
- Linear Trend Surfaces
  - Useful for continuous data
  - Uses x, y, and z values to model data trends
  - Z values are continuous
  - Creates smooth surfaces
  - Surface complexity increases as order of polynomial increases
- Logistic Trend Surfaces
  - Useful for binary types of data (e.g. yes/no)
  - Uses x, y, and z to model trends
  - Z values are 0 or 1
  - Creates smooth surfaces
  - Surface complexity increases as order of polynomial increases
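When z is 0 or 1, the same x and y terms can be fed to a logistic model so that the surface is a probability between 0 and 1. The sketch below fits a first-order logistic trend surface by Newton's method. How GRID's TREND fits its LOGISTIC option is not stated in these slides, so this is a generic maximum-likelihood fit; the function names and the presence/absence lattice are hypothetical.

```python
import numpy as np

def fit_logistic_trend(x, y, z, iterations=25):
    """First-order logistic trend surface fitted by Newton's method."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    X = np.column_stack([np.ones_like(x), x, y])      # terms: 1, x, y
    beta = np.zeros(X.shape[1])
    for _ in range(iterations):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))         # current probabilities
        weights = p * (1.0 - p)
        gradient = X.T @ (z - p)
        hessian = X.T @ (X * weights[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta

def predict_logistic_trend(beta, x, y):
    """Probability surface for the fitted first-order model."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * x + beta[2] * y)))

# Hypothetical 5 x 5 lattice of surveyed points; 1 = site present.
xs, ys = np.meshgrid(np.arange(5.0), np.arange(5.0))
presence = np.array([
    [0, 1, 0, 1, 1],   # y = 0
    [0, 0, 1, 1, 1],   # y = 1
    [1, 0, 0, 1, 0],   # y = 2
    [0, 0, 1, 0, 1],   # y = 3
    [0, 0, 0, 1, 1],   # y = 4
])

beta = fit_logistic_trend(xs.ravel(), ys.ravel(), presence.ravel())
prob = predict_logistic_trend(beta, xs, ys)
print(beta)
```

In this toy data sites become more common toward larger x, so the fitted x coefficient is positive and the probability surface rises from west to east.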
12First Order Trend Surfaces
Based on Site Locations (Logistic)
Based on Pottery Counts (Linear)
13Third Order Trend Surfaces
Based on Site Locations (Logistic)
Based on Pottery Counts (Linear)
14Locational Characteristic Models
- Based on values of variables at the study group locations
- Univariate Analysis (Kolmogorov-Smirnov)
- Multivariate Analysis (multiple regression, cluster, classification, principal component)
15Kolmogorov-Smirnov Test
- Statistic is the maximum difference between the cumulative proportions of two samples, usually the study group and the control group
- Use GRID's SAMPLE command to extract values for both groups
- Preference can be seen graphically
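Significance at the 5% level is reached if $D > 1.36\sqrt{(n_1 + n_2)/(n_1 n_2)}$, where $D$ is the maximum difference and $n_1$, $n_2$ are the two sample sizes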
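The two-sample statistic can be computed directly from the sampled values. The sketch below is a minimal NumPy version; the function names and the slope values are hypothetical. The 5% threshold uses the common large-sample approximation 1.36 * sqrt((n1 + n2) / (n1 * n2)), which is a standard textbook value and is assumed here rather than taken from the slide.

```python
import numpy as np

def ks_two_sample(a, b):
    """Maximum difference between the cumulative proportions of two samples."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    values = np.concatenate([a, b])                    # evaluate at every observed value
    cdf_a = np.searchsorted(a, values, side="right") / a.size
    cdf_b = np.searchsorted(b, values, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

def critical_value_5pct(n1, n2):
    """Large-sample 5% critical value (assumed standard approximation)."""
    return float(1.36 * np.sqrt((n1 + n2) / (n1 * n2)))

# Hypothetical slope values (degrees) at study and control locations.
study = [1.0, 2.0, 3.0, 4.0]
control = [3.0, 4.0, 5.0, 6.0]

d = ks_two_sample(study, control)
threshold = critical_value_5pct(len(study), len(control))
print(d, threshold, d > threshold)
```

With only four values per group the threshold is large, so even a clear shift between the groups does not reach significance. Real study and control groups need many more locations.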
16Multiple Regression Models
- Regression models measure the relationship between dependent and independent variables
- The dependent variable in linear regression is generally a real number
- The dependent variable in logistic regression is either a 1 or a 0
17Data Used in Regressions
- Linear
    Dep    var1   var2
    43     840    149
    22     852    155
    69.4   854    151
    15     805    134
    46     853    062
- Logistic
    Dep    var1   var2
    1      840    149
    0      852    155
    1      854    151
    0      805    134
    1      853    062
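To show how rows laid out like the linear example feed a regression, the sketch below fits a constant and two coefficients to those five rows with NumPy's least-squares solver. Five rows are far too few for a meaningful model, so this only illustrates the mechanics. Reading the value "062" as 62 is an assumption.

```python
import numpy as np

# The five rows of the linear example: dependent variable, var1, var2.
dep = np.array([43.0, 22.0, 69.4, 15.0, 46.0])
var1 = np.array([840.0, 852.0, 854.0, 805.0, 853.0])
var2 = np.array([149.0, 155.0, 151.0, 134.0, 62.0])   # "062" read as 62 (assumption)

# Design matrix: a column of ones for the constant, then the independent variables.
X = np.column_stack([np.ones_like(dep), var1, var2])
coef, *_ = np.linalg.lstsq(X, dep, rcond=None)
constant, b1, b2 = coef

fitted = X @ coef
residuals = dep - fitted
print(constant, b1, b2)
```

The logistic layout is identical except that the dependent column holds only 0 or 1, and the fit is made by maximum likelihood rather than least squares.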
18Creating Multiple Regression Models in GRID
- Subject SAMPLE results to regression
  - Statistics Software
  - GRID's REGRESSION command
- Results of the regression include coefficients and a constant, or y-intercept
- Model made by multiplying variables by coefficients and adding the constant
- surface = 1.250 + (-0.029 x img1) + (0.263 x img2)
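Applying the fitted equation is cell-by-cell arithmetic on the input grids. The sketch below does this with NumPy arrays, assuming that the operators lost from the slide's equation are an equals sign and plus signs. The function name `regression_surface` and the two small input grids are hypothetical.

```python
import numpy as np

def regression_surface(constant, coefficients, grids):
    """Constant plus the sum of coefficient * grid, evaluated for every cell."""
    surface = np.full(np.shape(grids[0]), float(constant))
    for coef, grid in zip(coefficients, grids):
        surface = surface + coef * np.asarray(grid, dtype=float)
    return surface

# Hypothetical 2 x 2 input grids standing in for img1 and img2.
img1 = np.array([[10.0, 20.0], [30.0, 40.0]])
img2 = np.array([[1.0, 2.0], [3.0, 4.0]])

surface = regression_surface(1.250, [-0.029, 0.263], [img1, img2])
print(surface)
```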
19Results of GRID's Regression
Grid > regression hsam.txt logistic brief <
coef #    coef
------    ----------------
0         -3.797
1         -0.001
2          0.014
3          0.006
4          0.000
5          0.055
------    ----------------
RMS Error     0.393
Chi-Square    51.608
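Logistic coefficients like these give a linear score that is then passed through the logistic function to obtain a probability between 0 and 1. The sketch below assumes that coefficient 0 is the constant and that coefficients 1 to 5 belong to five independent variables in the order they appear in the sample file. The slide does not name those variables, so the input values are hypothetical.

```python
import numpy as np

# Coefficients from the output above; index 0 is assumed to be the constant.
coef = np.array([-3.797, -0.001, 0.014, 0.006, 0.000, 0.055])

def logistic_probability(values, coef=coef):
    """Probability from the logistic model for one set of variable values."""
    score = coef[0] + np.dot(np.asarray(values, dtype=float), coef[1:])
    return float(1.0 / (1.0 + np.exp(-score)))

# Hypothetical values for the five independent variables at two locations.
p_zero = logistic_probability([0.0, 0.0, 0.0, 0.0, 0.0])
p_high = logistic_probability([0.0, 0.0, 0.0, 0.0, 60.0])
print(p_zero, p_high)
```

Because the fifth coefficient is positive, raising the fifth variable raises the predicted probability.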
20Results of STATA's Regression
Logit Estimates                              Number of obs =      383
                                             chi2(13)      =    53.88
                                             Prob > chi2   =   0.0000
Log Likelihood = -220.37417                  Pseudo R2     =   0.1089
-----------------------------------------------------------------------------
    site |      Coef.   Std. Err.        t    P>|t|     [95% Conf. Interval]
---------+-------------------------------------------------------------------
   aspew |   .0004698    .002341     0.201    0.841    -.0041336    .0050731
   aspns |   .0021229   .0023099     0.919    0.359    -.0024193     .006665
    elev |  -.0038272   .0042056    -0.910    0.363    -.0120971    .0044428
   relfa |  -.1647048   .0695988    -2.366    0.018    -.3015648   -.0278448
   relfm |   .2218111   .0720802     3.077    0.002     .0800717    .3635505
 texture |  -.5435591   .2748572    -1.978    0.049    -1.084042   -.0030762
   ridge |   .0014501   .0032384     0.448    0.655    -.0049179    .0078182
     sd1 |   .0001864   .0004607     0.405    0.686    -.0007195    .0010924
     sd2 |  -.0001118   .0012555    -0.089    0.929    -.0025806    .0023571
     sd3 |  -.0052209   .0021802    -2.395    0.017     -.009508   -.0009337
 shelter |  -.0012764   .0015435    -0.827    0.409    -.0043115    .0017587
   slope |   .0752924   .0386194     1.950    0.052    -.0006493    .1512342
  wadist |   .0007286   .0007215     1.010    0.313    -.0006902    .0021474
   _cons |   3.439327   4.429328     0.776    0.438    -5.270564    12.14922
-----------------------------------------------------------------------------
The corrected Y-intercept constant = 4.0704389
21Regression Model
22Probability Models
Group 1
Group 2
23Model Strength