Provided by: wallyer

1
STATISTICAL MODELING PROCEDURES: Chapter 2

RSF CONFERENCE
JANUARY 10, 2003

2
Outline

Introduction to Model Building
Simple Comparisons/Graphical Methods
Statistical Models in RSF (e.g., linear regression)
Hypothesis Tests
Model Selection
Multiple Testing
Bootstrapping

3
Dr. George Box Quotes

Modelling is an art, not a science,
All models are wrong, some are useful, and we
should seek out those.

4
Dr. Fisher discussing model specification

as for problems of specification, these are
entirely a matter for the practical
statistician.

5
General Principles of Modeling (McCullagh and Nelder 1989)

Search for useful models, and know that eternal
truth is not within our grasp.
Do not fall in love with a single model, to the
exclusion of alternatives.
Thoroughly check the fit of the model.

6
John Stuart Mill (1879) writing in his System of
Logic

The guesses which served to give mental unity
and wholeness to a chaos of scattered
particulars, are accidents which rarely occur to
any minds but those abounding in knowledge and
disciplined in intellectual combinations
INTERPRETATION: MODELING IS NOT FOR THE MENTALLY CHALLENGED

7
Modeling Approaches

simple sample comparisons/graphical displays
linear regression
logistic regression
log-linear models
proportional hazard models
generalized linear models

8
T-tests
9
Graphing Example: Chipping sparrow RSF
[Figure: mean of each habitat variable (CANOPY, DEBRIS, LIVE TREE, SAPLINGS, SHRUBS) for used and unused units; vertical axis from 0 to 40]
10
Simple Sample Comparisons: Graphical Example
11
Linear Regression: Analysis of Continuous Measures of the Amount of Use

Assume the amount of use of a resource unit is a
continuous variable Y.
Standard statistical methods should be
sufficient.
The linear regression model:
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \varepsilon$, (2.1)
where $\beta_0$ to $\beta_p$ are constants to be estimated from data, and
$\varepsilon \sim N(0, \sigma^2)$
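
As a rough illustration of equation (2.1) with a single predictor, the sketch below computes the least-squares estimates of the intercept and slope. The data values and the function name `fit_simple_linear` are made up for this example and do not come from the presentation.

```python
# Hypothetical data: x is one predictor, y is the amount of use (made-up numbers).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

def fit_simple_linear(x, y):
    """Least-squares estimates (b0, b1) for y = b0 + b1*x + error."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx               # slope estimate
    b0 = mean_y - b1 * mean_x    # intercept estimate
    return b0, b1

b0, b1 = fit_simple_linear(x, y)
# Residuals estimate the error term; with an intercept they sum to zero.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
```

With several predictors the same idea applies, but the estimates are normally obtained from a statistics package rather than by hand.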

12
Linear Regression/RSF Example

Y = biomass of eelgrass in 1 m x 1 m quadrats
X1 = depth

13
(No Transcript)
14
Logistic Regression

An assumption for this type of model is that the probability of a success is given by the equation
$\pi = \dfrac{\exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p)}{1 + \exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p)}$, (2.2)
where $\beta_0$ to $\beta_p$ are constants to be estimated from the available data, and
$X_1$ to $X_p$ are the variables that the probability of a success is to be related to.
The number of successes observed in n trials follows a binomial distribution with mean $n\pi$ and variance $n\pi(1 - \pi)$
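
A minimal sketch of equation (2.2) and the binomial mean and variance that go with it. The coefficient values and function names are hypothetical, chosen only so the arithmetic is easy to follow.

```python
import math

def success_probability(betas, xs):
    """Probability of a success: exp(eta) / (1 + exp(eta)),
    where eta = b0 + b1*x1 + ... + bp*xp."""
    eta = betas[0] + sum(b * x for b, x in zip(betas[1:], xs))
    # 1 / (1 + exp(-eta)) is algebraically the same as exp(eta) / (1 + exp(eta)).
    return 1.0 / (1.0 + math.exp(-eta))

def binomial_mean_variance(n, prob):
    """Mean and variance of the number of successes in n trials."""
    return n * prob, n * prob * (1.0 - prob)

# Hypothetical coefficients: eta = -1.0 + 0.5 * 2.0 = 0, so the probability is 0.5.
prob = success_probability([-1.0, 0.5], [2.0])
mean, variance = binomial_mean_variance(20, prob)
```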

15
Logistic Regression Example

Chipping sparrow resource selection
Used and unused determined based on
presence/absence on point count stations
Design I, sampling protocol D

16
Resulting Model

$w(x) = \dfrac{\exp(\eta)}{1 + \exp(\eta)}$,
where $\eta$ is the linear predictor built from the constant 3.215 and the terms 0.088 canopy, 0.053 seedling, 0.019 nsapling, and 0.676 grndb
[operator signs not preserved in transcript]

17
Log-linear Model

Y are counts of the number of occurrences of a
certain event under different conditions
A natural assumption is that the counts follow a Poisson distribution
$E(Y) = \mu = \exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p)$. (2.3)
Examples, number of animals observed within
blocks of land, with covariates measured on those
blocks
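
To make equation (2.3) concrete, the sketch below evaluates the expected count for one block of land and the Poisson log-likelihood of a set of counts. The function names and the example coefficients are assumptions for illustration only.

```python
import math

def expected_count(betas, xs):
    """Expected count mu = exp(b0 + b1*x1 + ... + bp*xp)."""
    return math.exp(betas[0] + sum(b * x for b, x in zip(betas[1:], xs)))

def poisson_log_likelihood(counts, means):
    """Log-likelihood of observed counts under Poisson means (all means > 0)."""
    return sum(y * math.log(mu) - mu - math.lgamma(y + 1)
               for y, mu in zip(counts, means))

# Hypothetical coefficients: exp(0 + log(2) * 3) = 2**3 = 8 animals expected.
mu = expected_count([0.0, math.log(2.0)], [3.0])
# A single count of 0 with mean 1 has log-likelihood -1.
loglik = poisson_log_likelihood([0], [1.0])
```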

18
Example
19
Generalized Linear Models (McCullagh and Nelder,
1989).
$E(Y) = f(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p)$, (2.7)
with the distribution of Y being suitably defined:
$f(z) = z$ and Y ~ Normal gives ordinary linear regression
$f(z) = \exp(z)/\{1 + \exp(z)\}$ and Y ~ binomial gives logistic regression
$f(z) = \exp(z)$ and Y ~ Poisson gives the log-linear model
$f(z) = 1 - \exp\{-\exp(z)\,g(t)\}$ and Y ~ binomial gives the proportional hazards model
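
One way to see how equation (2.7) unifies these models is to treat f as a swappable function applied to the same linear predictor. The sketch below covers three of the four cases; the proportional hazards case is left out because g(t) is not defined on the slide. The dictionary and function names are hypothetical.

```python
import math

# The function f(z) for three of the models named on the slide.
INVERSE_LINKS = {
    "linear": lambda z: z,
    "logistic": lambda z: math.exp(z) / (1.0 + math.exp(z)),
    "log-linear": lambda z: math.exp(z),
}

def expected_value(model, betas, xs):
    """Apply the chosen f to the linear predictor z = b0 + b1*x1 + ... + bp*xp.
    For the logistic case the result is the success probability of one trial."""
    z = betas[0] + sum(b * x for b, x in zip(betas[1:], xs))
    return INVERSE_LINKS[model](z)

linear_mean = expected_value("linear", [1.0, 2.0], [3.0])      # 1 + 2*3
logistic_prob = expected_value("logistic", [0.0, 1.0], [0.0])  # z = 0
poisson_mean = expected_value("log-linear", [0.0], [])         # exp(0)
```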
20
Statistical Software

Fitting log-linear models and other generalized linear models requires a suitable computer program.
Many Poisson regression programs are now available, including
SAS's PROC GENMOD
GLIM
S-Plus's glm()
SPSS
SYSTAT
Quattro or Excel can also be used.

21
Tests Used in Modeling

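The hypothesis $\beta_i = 0$ can be tested by comparing
$b_i / \mathrm{SE}(b_i)$
with critical values from a standard normal distribution, where $b_i$ is the estimate of $\beta_i$. Approximate confidence intervals for $\beta_i$ are of the form
$b_i \pm z_{\alpha/2}\,\mathrm{SE}(b_i)$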
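
The test on this slide is the usual large-sample comparison of an estimate with its standard error. The sketch below assumes that form; the estimate and standard error are made-up numbers, and `wald_test` is a hypothetical helper name.

```python
import math

def wald_test(estimate, std_error, z_crit=1.96):
    """Return the z statistic, two-sided p-value, and approximate 95% interval."""
    z = estimate / std_error
    # Two-sided standard normal tail area: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    interval = (estimate - z_crit * std_error, estimate + z_crit * std_error)
    return z, p_value, interval

# Hypothetical coefficient estimate 0.5 with standard error 0.25.
z, p_value, interval = wald_test(0.5, 0.25)
```

Here z is 2.0, which just exceeds the 1.96 critical value, so the interval narrowly excludes zero.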
22
Deviance

Deviance measures closeness of model to data
Analogous to Residual Sum-of-Squares
$D = -2\{\log_e(L_M) - \log_e(L_F)\}$, (2.8)
$L_M$ = likelihood of the fitted model
$L_F$ = likelihood of the full model
Can be used as a general measure of fit by comparing the observed value to a chi-square distribution with df = (number of observations - number of parameters), ALTHOUGH THIS IS NOT VERY ROBUST
General rule: counts in observed cells > 5
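
As an illustration of equation (2.8), the sketch below computes the deviance for Poisson counts. It assumes the full model is the one whose fitted mean equals each observed count; the counts and fitted means are hypothetical.

```python
import math

def poisson_loglik(counts, means):
    """Poisson log-likelihood; y*log(mu) is taken as 0 when y == 0."""
    total = 0.0
    for y, mu in zip(counts, means):
        term = y * math.log(mu) if y > 0 else 0.0
        total += term - mu - math.lgamma(y + 1)
    return total

def deviance(counts, fitted_means):
    """D = -2 * (log L_M - log L_F) for Poisson counts."""
    log_lm = poisson_loglik(counts, fitted_means)
    log_lf = poisson_loglik(counts, counts)  # full model reproduces the data exactly
    return -2.0 * (log_lm - log_lf)

# Hypothetical counts and fitted means.
d = deviance([2, 4], [3.0, 3.0])
d_perfect = deviance([2, 4], [2.0, 4.0])  # a perfect fit has zero deviance
```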

23
Difference in Deviance

Difference in deviance for nested models is approximated by a chi-square distribution with $p_2 - p_1$ degrees of freedom
$\Delta D_{12} = -2\{\log_e(L_1) - \log_e(L_2)\}$, (2.8)
Model 1 is a subset of model 2
Model selection tool for GLIM
Overall test of selection (no selection or null
model versus full model)
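
A small sketch of the nested-model comparison. The log-likelihoods and parameter counts are hypothetical. The p-value helper only handles the one-degree-of-freedom case, where the chi-square tail area has a closed form; other cases would need a chi-square table or a statistics package.

```python
import math

def deviance_difference(loglik_1, p1, loglik_2, p2):
    """Model 1 is nested in model 2 (p1 < p2 parameters).
    Returns the difference in deviance and its degrees of freedom."""
    delta_d = -2.0 * (loglik_1 - loglik_2)
    return delta_d, p2 - p1

def p_value_df1(delta_d):
    """P(chi-square with 1 df > delta_d) = erfc(sqrt(delta_d / 2))."""
    return math.erfc(math.sqrt(delta_d / 2.0))

# Hypothetical fits: adding one variable raises the log-likelihood by 3.
delta_d, df = deviance_difference(-105.0, 2, -102.0, 3)
p_value = p_value_df1(delta_d)
```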

24
Model Selection

Art and not a science
Most RSF analyses are based on observational data
and are exploratory in nature
Limit number of variables based on professional
judgement/knowledge of issues.
Do not limit yourself to one model unless
obvious. Make sure statistical inference is
understood. Replicate study when possible.

25
Model Selection Criteria

Nested models: analysis of deviance
Stepwise
AIC
AICc
BIC

26
Akaike's Information Criterion

Burnham and Anderson (1999)
$\mathrm{AIC} = -2\log_e(L_M) + 2p$, (2.9)
where p is the number of unknown parameters in the model that must be estimated
Smaller values of AIC suggest a better model
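
Equation (2.9) is simple enough to compute by hand, but a short sketch shows how it is used to rank candidates. The model names, log-likelihoods, and parameter counts below are hypothetical.

```python
def aic(log_likelihood, p):
    """AIC = -2 * log-likelihood + 2 * (number of estimated parameters)."""
    return -2.0 * log_likelihood + 2 * p

# Hypothetical candidates: name -> (maximized log-likelihood, number of parameters).
candidates = {
    "model A": (-120.0, 2),
    "model B": (-117.5, 3),
    "model C": (-117.4, 5),
}
scores = {name: aic(ll, p) for name, (ll, p) in candidates.items()}
best = min(scores, key=scores.get)  # smallest AIC
```

Model C fits slightly better than model B but pays for two extra parameters, so model B has the smallest AIC.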

27
Other Measures

Corrected AIC
$\mathrm{AIC}_c = -2\log_e(L_M) + 2p\,n/(n - p - 1)$, (2.10)
Useful when sample sizes are relatively small
Bayesian information criterion (BIC)
$\mathrm{BIC} = -2\log_e(L_M) + p\log_e(n)$. (2.11)
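
The two criteria in equations (2.10) and (2.11) differ from AIC only in the penalty term. A minimal sketch, with a hypothetical log-likelihood, parameter count, and sample size:

```python
import math

def aicc(log_likelihood, p, n):
    """Corrected AIC: -2 log L + 2p * n / (n - p - 1)."""
    if n - p - 1 <= 0:
        raise ValueError("AICc needs n > p + 1")
    return -2.0 * log_likelihood + 2.0 * p * n / (n - p - 1)

def bic(log_likelihood, p, n):
    """BIC: -2 log L + p * log(n)."""
    return -2.0 * log_likelihood + p * math.log(n)

# Hypothetical fit: log-likelihood -100 with 3 parameters from 50 observations.
aicc_value = aicc(-100.0, 3, 50)
bic_value = bic(-100.0, 3, 50)
```

As n grows with p fixed, n/(n - p - 1) approaches 1 and AICc approaches AIC.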

28
Model Selection Simulation

Case of 4 variables: 2 proportions and 2 continuous
-simulated no selection case, and selection for one categorical variable (R = .75/.25).
-assumed 50 used units, and either 10000 or 1000 available units.
-looked at which models are selected as best by AIC.
-MODEL 0 - no selection
-MODEL 1 - P1
-MODEL 2 - D1
-MODEL 3 - D2
-MODEL 4 - P1 + D1
-MODEL 5 - P1 + D2
-MODEL 6 - D1 + D2
-MODEL 7 - P1 + D1 + D2
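
The eight candidate models listed above are all subsets of three variables. A sketch of how such a candidate set could be generated, assuming the variable labels P1, D1, and D2 from the slide; this is not the simulation code used in the presentation.

```python
from itertools import combinations

variables = ["P1", "D1", "D2"]

# All subsets, smallest first; the empty tuple is the no-selection model.
models = [combo
          for size in range(len(variables) + 1)
          for combo in combinations(variables, size)]

for number, combo in enumerate(models):
    label = " + ".join(combo) if combo else "no selection"
    print(f"MODEL {number} - {label}")
```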

29
Simulation Using AIC
30
Model Averaging

Do not fall in love with a single model, to the
exclusion of alternatives.
Using information from multiple models to improve
inference and interpretation
Has been shown to improve prediction
Allows for assessing importance of individual
variables

31
Process
AIC WEIGHTS
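
The slide gives no formula, so the sketch below assumes the standard Akaike weights: each model's weight is exp(-delta/2) divided by the sum of that quantity over all models, where delta is the model's AIC minus the smallest AIC. The AIC values are hypothetical.

```python
import math

def akaike_weights(aic_values):
    """Akaike weights: exp(-delta_i / 2) normalized to sum to 1."""
    best = min(aic_values)
    relative = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(relative)
    return [r / total for r in relative]

# Hypothetical AIC values for two candidate models.
weights = akaike_weights([100.0, 102.0])
```

A difference of 2 AIC units gives the better model a weight of about 0.73.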
32
Example
33
Importance Values
Variable      Alder Flycatcher   Blackpoll Warbler   Savannah Sparrow
river         0.7 (-)            0.09 (-)            0
lake          0                  0                   1 (-)
band 1        0                  0.93 (-)            0.95 ( )
band 4        1 ( )              1 ( )               0.13 ( )
band 5        0                  0                   0.06 ( )
Std band 1    0.28 (-)           0                   0.94 ( )
Std band 3    0.89 ( )           0.72 ( )            0.05 (-)
Std band 4    0.61 ( )           0                   0.06 ( )
Std band 7    0.39 ( )           0.20 ( )            0.06 ( )
elevation     1 (-)              0.93 (-)            1 (-)
slope         0                  0                   0
aspect        0                  0                   0
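
The slides do not say how the importance values were calculated. A common definition, assumed here, is the sum of the model weights over every model that contains the variable. The model set, variable names, and weights below are hypothetical.

```python
def importance_values(models, weights):
    """Sum the weights of every model that contains each variable."""
    totals = {}
    for variables, weight in zip(models, weights):
        for name in variables:
            totals[name] = totals.get(name, 0.0) + weight
    return totals

# Hypothetical model set with weights that sum to 1.
models = [("A",), ("A", "B"), ("B", "C")]
weights = [0.5, 0.3, 0.2]
importance = importance_values(models, weights)
```

Under this definition a value of 1 means the variable appears in every model that carries weight, and a variable that appears in no weighted model has importance 0.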

34
Multiple Testing

Inflated experiment-wise Type I error can occur when several significance tests or several confidence intervals are conducted at once
Example: 10 independent tests carried out at the 5% level with the null hypothesis true; the probability of one or more significant results is
$1 - 0.95^{10} = 0.40$
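
The arithmetic in this example generalizes to any number of independent tests. A minimal sketch; the function name is hypothetical.

```python
def familywise_error(alpha, k):
    """Probability of at least one false positive in k independent tests
    when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** k

rate = familywise_error(0.05, 10)  # the slide's example: about 0.40
```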

35
Approaches to Address Multiple Testing

Bonferroni procedure: a conservative approach
Test each comparison at the $100(\alpha/k)\%$ level, with k being the number of comparisons
For example, 3 discrete habitat types, n = 3 comparisons: test each at the 100(0.05/3) = 1.67% level
10 comparisons: adjusted level is 0.005
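
The two adjusted levels quoted above can be checked with a one-line function. The name `bonferroni_level` is hypothetical.

```python
def bonferroni_level(alpha, k):
    """Per-comparison significance level for k comparisons."""
    return alpha / k

three_habitats = bonferroni_level(0.05, 3)     # about 0.0167, i.e. 1.67%
ten_comparisons = bonferroni_level(0.05, 10)   # 0.005
```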

36
Holm's Method

Decide on overall $\alpha$ level
Calculate p-values
Sort the p-values in ascending order
See if $p_1 < \alpha/k$
If no, stop; if yes, determine if $p_2 < \alpha/(k-1)$
If no, stop; if yes, determine if $p_3 < \alpha/(k-2)$
If no, stop; if yes, continue in the same way
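
The steps above can be sketched as a short loop. The p-values are hypothetical, and the strict "less than" comparison follows the slide.

```python
def holm_rejections(p_values, alpha=0.05):
    """Return a list of booleans (in the original order) marking which
    hypotheses are rejected by the step-down procedure."""
    k = len(p_values)
    order = sorted(range(k), key=lambda i: p_values[i])  # ascending p-values
    rejected = [False] * k
    for step, index in enumerate(order):
        # Thresholds are alpha/k, alpha/(k-1), alpha/(k-2), ...
        if p_values[index] < alpha / (k - step):
            rejected[index] = True
        else:
            break  # stop at the first non-significant result
    return rejected

# Hypothetical p-values from three comparisons.
result = holm_rejections([0.01, 0.04, 0.03])
```

In this example 0.01 is below 0.05/3, but 0.03 is not below 0.05/2, so testing stops after the first rejection.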

37
Example
38
Bootstrap Methods

IDEA: when the only information available about a statistical population consists of a random sample from that population, the best guide as to what might happen when resampling the population is obtained by resampling the sample
The sample is assumed to represent the population well
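
A minimal sketch of the idea: resample the observed sample with replacement many times, recompute the statistic each time, and use the spread of the recomputed values as an estimate of its standard error. The data, the function name, and the choice of 1000 resamples are assumptions for illustration.

```python
import random
import statistics

def bootstrap_se(sample, statistic, n_boot=1000, seed=0):
    """Bootstrap standard error of a statistic computed from a sample."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(sample)
    replicates = []
    for _ in range(n_boot):
        # Draw n values from the sample with replacement.
        resample = [rng.choice(sample) for _ in range(n)]
        replicates.append(statistic(resample))
    return statistics.stdev(replicates)

# Hypothetical measurements.
data = [3.1, 4.7, 2.2, 5.9, 4.0, 3.6, 6.3, 2.8]
se_median = bootstrap_se(data, statistics.median)
```

The same function works for any statistic that can be recomputed from a resample, which is what makes the approach useful for complicated statistics with no simple variance formula.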

39
Applications

Variance of a complicated sample statistic
probability of use of a unit with certain characteristics
Model weights in model averaging
Importance values for variables

40
Applications

Incorporation of between animal variability or
between true experimental unit variability
Radioed animals and logistic regression
Transects used to gather use information (walking
transects)
