Title: Model Selection and Inference:
1 Model Selection and Inference: Motivation, Mechanics, and Interpretation
Gail Olson and Dan Rosenberg, Department of Fisheries and Wildlife, Oregon State University
www.oregonstate.edu/rosenbed/workshop.htm
2 Goal of Workshop
- Provide motivation for a conceptually simple approach for the analysis of data using multiple models, emphasizing an a priori approach
- Provide the mechanics of how to use AIC
- Provide guidance on how to interpret results from an AIC approach
- Discuss how this may benefit your research
3 Starters
We assume:
- The research started with an intriguing and important question, AND
- You used a proper experimental or probability-based sampling design
Analytical strategies cannot account for the failure of these points
4 Goal of Research in Management of Natural
Resources
- Understand nature and how it reacts to perturbations
- Make predictions based on inferences from analysis of empirical data
5 Steps in Making Reliable Inferences
- Inference from Sample to the Population
- Identify and understand patterns and mechanisms
- Statistical models to aid detection and interpretation
(Figure: Pr(use) versus distance)
"All models are wrong, but some are useful" (Box 1976)
6 What is Meant by Model?
- Theory: a hypothesis that has survived repeated efforts to falsify it
- Hypothesis: a story about how the world works
- Model: an abstraction or simplification of the real world; models serve as tools for the evaluation of hypotheses
- Statistical models separate noise from information inherent in data
This is particularly important in the model selection framework: recognition that there is not necessarily a single model appropriate for inference
7 Single vs Multiple Models
Traditional Hypothesis Testing (Single Model)
8 Traditional Hypothesis Testing (Single Model)
All we typically learn is that the sample sizes
were not large enough to detect differences
9 Single vs Multiple Models
- Probability of use is
- unrelated to distance from a nest
- related linearly
- related exponentially
All hypotheses receive equal initial weight in evaluation, and all models can be used in inference, so one does not have to select a single model
10 Emphasis on an a priori Model Set
11 Hypotheses Expressed as Statistical Models
- A Global Model
- has many parameters representing plausible effects and the state of the science, as well as relevant study design issues; the most complex model of the set
- Subsets
- can be considered special cases of the global model; fewer parameters, not necessarily nested; always of the same response variable and estimated from the same set of data
12 Developing an a priori Model Set
1. Have the question crystal-clear
2. Bring in your (team's) understanding of the problem
3. Incorporate past research via literature review
4. Understand the expectation of the process based on theory and include this expectation in your model set
5. Include models of opposing views
6. Should be subjective: bring in various views and thoughts
7. Avoid including all possible models just because you can
8. Number of parameters must be considered in terms of sample size
9. Number of models should be a balance between keeping a small number of biologically plausible models and not excluding potentially important models
13 A Model of Habitat Selection
Per unit area, Pr(use) = f(dist. to focal site) + barriers + attractants
14 Hypotheses and their Rationale
A. Hypotheses related to distance effects
(Figure: Pr(use) versus distance from the nest)
15 Hypotheses and their Rationale
16 The Set of Candidate Models
- Global Model: the most complex model
- Pr(use) = distance (polynomial), crop types, patch type, distance to perennial crop, dominant in home range
- Model Subsets: include one or more parameters
- distance (linear)
- distance (log)
- distance (polynomial)
- Crop-Only models
- includes parameter for each crop type
- Crop types combined into structure classes
- Best distance model + crop parameters
- Best distance model + structure parameters
- No effects model
- Best distance + cover or crop model + patch type
- Etc.
18 Conformity of Burrowing Owl Space-Use Patterns to the Central-Place Model
Large individual (and/or sampling) variation
(Figure: percent of locations versus distance (km) from nest; panels: Agriculture, Fragmented)
19 Summary: Motivation for an a priori Model Selection Approach
- Statistical models to separate pattern from noise
- Single vs. multiple model approaches
- Insignificance of Statistical Significance Testing (Johnson 1999)
- Emphasis on parameter estimation and uncertainty
- Ranking and evaluating competing hypotheses
- Inference from multiple models: it is often difficult to identify the best model
20 Akaike's Information Criterion (AIC)
- Metric to rank and compare models
- Hirotugu Akaike (1973)
- An Information Criterion
- Simple metric with DEEP theory
- Boltzmann's entropy: Physics
- Kullback-Leibler discrepancy: Information theory
- Maximum Likelihood Theory: Statistics
21 Kullback-Leibler Discrepancy
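The transcript for this slide carries no body text. As a hedged illustration only, the Kullback-Leibler discrepancy between a "true" discrete distribution f and an approximating model g is I(f, g) = sum of f_i log(f_i / g_i). The sketch below assumes discrete distributions; the function name and all three distributions are hypothetical and not taken from the workshop.

```python
import math

def kl_discrepancy(f, g):
    """Discrete Kullback-Leibler discrepancy I(f, g) = sum f_i * log(f_i / g_i).

    f is treated as "truth" and g as the approximating model.
    Terms with f_i == 0 contribute nothing to the sum.
    """
    return sum(fi * math.log(fi / gi) for fi, gi in zip(f, g) if fi > 0)

# Hypothetical distributions, for illustration only.
truth = [0.5, 0.3, 0.2]
model_a = [0.4, 0.4, 0.2]   # fairly close to truth
model_b = [0.1, 0.1, 0.8]   # far from truth

loss_a = kl_discrepancy(truth, model_a)
loss_b = kl_discrepancy(truth, model_b)
print(loss_a, loss_b)
```

The model with the smaller discrepancy loses less information about truth. In practice truth is unknown, which is why AIC estimates only relative differences between candidate models.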
22 Maximum Likelihood (ML)
- Good statistical properties
- Asymptotically unbiased
- Asymptotically minimum variance
- Links models, parameters, data
- $L(\text{parameters} \mid \text{model}, \text{data})$
- Usually expressed as a log value
- $\log L(\theta \mid g(y), y)$
- Aim is to maximize the log value
23 ML Example
- Binomial model
- $L(p \mid \text{binomial}, y)$
For $n = 11$ and $y = 7$
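A minimal sketch of the binomial example, assuming y = 7 successes in n = 11 trials as on the slide. It evaluates the log-likelihood over a grid of p values and compares the grid maximum with the closed-form estimate y/n. The function name and grid resolution are illustrative choices, not part of the workshop material.

```python
import math

def binomial_log_likelihood(p, n, y):
    """log L(p | binomial, y) for y successes in n trials."""
    return (math.log(math.comb(n, y))
            + y * math.log(p)
            + (n - y) * math.log(1.0 - p))

n, y = 11, 7

# Search a grid of candidate p values strictly inside (0, 1).
grid = [i / 1000 for i in range(1, 1000)]
p_hat_grid = max(grid, key=lambda p: binomial_log_likelihood(p, n, y))

# Closed-form maximum likelihood estimate for the binomial model.
p_hat_closed = y / n

print(p_hat_grid, p_hat_closed)
```

The grid maximum lands next to y/n = 7/11, which is the value of p that makes the observed data most likely under the binomial model.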
24 Model over-fitting
25 Principle of Parsimony
26 AIC Basics
$\mathrm{AIC} = -2\log L + 2k$
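As a small sketch of the AIC formula, the function below takes a maximized log-likelihood and a parameter count k. The two log-likelihood values are hypothetical, chosen only to show how the 2k penalty can outweigh a small gain in fit.

```python
def aic(log_likelihood, k):
    """AIC = -2 log(L) + 2k, where k counts estimated parameters."""
    return -2.0 * log_likelihood + 2.0 * k

# Hypothetical maximized log-likelihoods for two candidate models.
simple_model = aic(log_likelihood=-100.0, k=3)
complex_model = aic(log_likelihood=-99.5, k=6)

print(simple_model, complex_model)
```

Here the more complex model fits slightly better, but its three extra parameters give it the higher (worse) AIC, which is the parsimony trade-off in action.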
27 AICc for small sample sizes
- Less biased
- Use when n/k < 40
- Better, use all the time!
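The slide does not spell out the correction term. The sketch below assumes the commonly used small-sample form, AICc = AIC + 2k(k+1)/(n - k - 1), with n the sample size and k the number of estimated parameters. The input values are hypothetical.

```python
def aicc(log_likelihood, k, n):
    """Small-sample corrected AIC (assumed form):
    AICc = -2 log(L) + 2k + 2k(k + 1) / (n - k - 1).
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    aic = -2.0 * log_likelihood + 2.0 * k
    return aic + (2.0 * k * (k + 1)) / (n - k - 1)

# Hypothetical example: same fit, small versus very large sample.
small_n = aicc(log_likelihood=-100.0, k=3, n=30)
large_n = aicc(log_likelihood=-100.0, k=3, n=1_000_000)

print(small_n, large_n)
```

As n grows relative to k, the correction shrinks toward zero and AICc converges to AIC, which is why using AICc all the time costs nothing.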
28 Model Selection
- Compute AICc for each model
- Rank lowest to highest
- Lowest AICc = best model
- Example
- Northern Spotted Owl Survival Analysis
- Effects of Seasonal Climate covariates
- (Precipitation and Temperature)
29 Model ranking by AICc
30 ΔAICc
- ΔAICc = AICc(model) - AICc(min)
- Compare model relative to best model
- Rules of Thumb (B&A)
- 0-2: Competing, substantial support
- 4-7: Less supported
- > 10: Essentially no support
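A short sketch of the ΔAICc calculation and ranking. The model names and AICc values are hypothetical and are not the Northern Spotted Owl results.

```python
def delta_aicc(aicc_values):
    """Return AICc(model) - AICc(min) for every model in the set."""
    best = min(aicc_values.values())
    return {name: value - best for name, value in aicc_values.items()}

# Hypothetical AICc values for three candidate models.
aicc_values = {"model_A": 310.0, "model_B": 311.5, "model_C": 322.25}

deltas = delta_aicc(aicc_values)

# Rank from best (smallest delta) to worst.
ranking = sorted(deltas, key=deltas.get)

for name in ranking:
    print(name, deltas[name])
```

Under the rules of thumb above, model_B (delta 1.5) would compete with the best model, while model_C (delta above 10) would have essentially no support.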
31 Relative rankings
32 Akaike weights
- Relative likelihood of each model
- Specific to model set ($\sum w_i = 1$)
33 Model weights
$w_i = \exp(-\Delta_i/2) \,/\, \sum_{r=1}^{R} \exp(-\Delta_r/2)$
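A minimal sketch of Akaike weights, assuming the standard definition: each model's relative likelihood is exp(-ΔAICc/2), normalized so the weights sum to 1 over the model set. The ΔAICc values are hypothetical.

```python
import math

def akaike_weights(deltas):
    """Convert a list of delta-AICc values into Akaike weights."""
    relative_likelihoods = [math.exp(-0.5 * d) for d in deltas]
    total = sum(relative_likelihoods)
    return [r / total for r in relative_likelihoods]

# Hypothetical delta-AICc values for a four-model set.
deltas = [0.0, 1.5, 4.0, 12.0]
weights = akaike_weights(deltas)

for d, w in zip(deltas, weights):
    print(d, round(w, 4))
```

Because the weights are normalized within the set, adding or dropping a model changes every weight. They are only meaningful relative to the models actually considered.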
34 Model weights
35 Fun things to do with weights
- Evidence ratios
- Compare one model to another
- Confidence sets
- What models are more likely?
- Importance values
- What variables are most important?
36 Evidence Ratios
Compare best model (Pen) with no climate model
$w_{\text{Pen}} = 0.3318$, $w_{\text{no climate}} = 0.1040$
ER = 0.3318/0.1040 = 3.19
Pen model is about 3 times more likely than the no climate model
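The evidence ratio is simply the ratio of two Akaike weights. This sketch reproduces the slide's arithmetic using the two weights given above; the function name is illustrative.

```python
def evidence_ratio(w_i, w_j):
    """Ratio of Akaike weights: how much more likely model i is than model j."""
    return w_i / w_j

w_pen = 0.3318          # weight of the best (Pen) model, from the slide
w_no_climate = 0.1040   # weight of the no climate model, from the slide

er = evidence_ratio(w_pen, w_no_climate)
print(round(er, 2))  # 3.19
```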
37 Confidence Set
95%
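One common way to build a 95% confidence set, sketched here as an assumption about what the slide showed: rank models by Akaike weight and add them in order until the cumulative weight reaches 0.95. Model names and weights are hypothetical.

```python
def confidence_set(weights, level=0.95):
    """Smallest set of top-ranked models whose weights sum to at least `level`."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    chosen = []
    cumulative = 0.0
    for name, weight in ranked:
        chosen.append(name)
        cumulative += weight
        if cumulative >= level:
            break
    return chosen

# Hypothetical Akaike weights for a five-model set.
weights = {"m1": 0.50, "m2": 0.30, "m3": 0.12, "m4": 0.05, "m5": 0.03}

print(confidence_set(weights))
```

With these weights the first four models are needed to reach 95%, so only m5 falls outside the set.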
38 Importance values
- Cement Hardening Example (B&A)
- Time to hardening (y) based on composition of 4 different ingredients ($x_i$)
- Regression
- $y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + b_4 x_4$
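A hedged sketch of variable importance values, assuming the usual definition: the importance of a variable is the sum of the Akaike weights of every model in the set that contains it. The model set and weights below are hypothetical and are not the cement hardening results.

```python
def importance(models, variable):
    """Sum the Akaike weights of all models that include `variable`."""
    return sum(weight for variables, weight in models if variable in variables)

# Hypothetical model set: (variables in the model, Akaike weight).
models = [
    ({"x1", "x2"}, 0.5),
    ({"x1", "x4"}, 0.3),
    ({"x3"}, 0.2),
]

for variable in ["x1", "x2", "x3", "x4"]:
    print(variable, importance(models, variable))
```

For a fair comparison, each variable should appear in a similar number of candidate models; otherwise a variable can look important simply because it was included more often.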
39 AIC in regression analyses
- Number of parameters
- $k$ = number of variables ($x_i$)
- + intercept (if used)
- + error variance ($\sigma^2$)
- AIC may be calculated from ($\sigma^2$) as
- $\mathrm{AIC} = n\log(\sigma^2) + 2k$
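A minimal sketch of the regression form of AIC. It assumes the error variance is the maximum likelihood estimate, residual sum of squares divided by n (not n minus the number of coefficients), and counts k as described above. The residual sum of squares and sample size are hypothetical.

```python
import math

def regression_aic(rss, n, n_predictors, intercept=True):
    """AIC = n * log(sigma^2) + 2k for a least-squares regression.

    sigma^2 is taken as the ML estimate rss / n (an assumption here).
    k = predictors + intercept (if used) + 1 for the error variance.
    """
    sigma2 = rss / n
    k = n_predictors + (1 if intercept else 0) + 1
    return n * math.log(sigma2) + 2 * k

# Hypothetical fit: 13 observations, 2 predictors, residual sum of squares 26.
value = regression_aic(rss=26.0, n=13, n_predictors=2)
print(value)
```

Forgetting to count the error variance (or the intercept) as a parameter is an easy way to get k wrong, which then shifts every AIC in the set.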
40 Multi-model inference: Model Averaging
- Incorporates model selection uncertainty
- Used for parameter estimation
- Directly estimated or not
- E.g. Regression coefficients, predicted values
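Model averaging can be sketched as a weighted mean: each model's estimate of a parameter is weighted by that model's Akaike weight. The estimates and weights below are hypothetical, and the function renormalizes the weights in case they cover only a subset of the full model set.

```python
def model_average(estimates, weights):
    """Akaike-weighted average of per-model estimates of one parameter."""
    total = sum(weights)
    return sum(w * e for w, e in zip(weights, estimates)) / total

# Hypothetical survival estimates from three models and their Akaike weights.
estimates = [0.80, 0.85, 0.90]
weights = [0.5, 0.3, 0.2]

averaged = model_average(estimates, weights)
print(round(averaged, 3))
```

The averaged estimate draws on every model in proportion to its support instead of depending on a single selected model, which is how model selection uncertainty enters the inference.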
41 Pitfalls to avoid
- Use same data set for all models
- Caution: missing values
- Transform X's but not Y
- Number of parameters known?
- hidden parameters
- lost parameters
- Bottom line
- Know what you are doing!
42 Interpreting Results
- Some issues
- Models differing by 1 parameter
- Model ambiguity
- Null model best
- Model redundancy
44 Model Ambiguity
NSO productivity modeled as a function of habitat covariates