STATISTICAL MODELING PROCEDURES Chapter 2 - PowerPoint PPT Presentation

Provided by: wallyer
1
STATISTICAL MODELING PROCEDURES: Chapter 2
  • RSF CONFERENCE
  • JANUARY 10, 2003

2
Outline
  • Introduction to Model Building
  • Simple Comparisons/Graphical Methods
  • Statistical Models in RSF (e.g., linear
    regression)
  • Hypothesis Tests
  • Model Selection
  • Multiple Testing
  • Bootstrapping

3
Dr. George Box Quotes
  • Modelling is an art, not a science,
  • All models are wrong, some are useful, and we
    should seek out those.

4
Dr. Fisher discussing model specification
  • as for problems of specification, these are
    entirely a matter for the practical
    statistician.

5
General Principles of Modeling (McCullagh and
Nelder 1989)
  • Search for useful models, and know that eternal
    truth is not within our grasp.
  • Do not fall in love with a single model, to the
    exclusion of alternatives.
  • Thoroughly check the fit of the model.

6
John Stuart Mill (1879) writing in his System of
Logic
  • The guesses which served to give mental unity
    and wholeness to a chaos of scattered
    particulars, are accidents which rarely occur to
    any minds but those abounding in knowledge and
    disciplined in intellectual combinations
  • INTERPRETATION: MODELING IS NOT FOR THE MENTALLY
    CHALLENGED

7
Modeling Approaches
  • simple sample comparisons/graphical displays
  • linear regression
  • logistic regression
  • log-linear models
  • proportional hazard models
  • generalized linear models

8
T-tests
9
Graphing Example: Chipping sparrow RSF
[Figure: bar chart of the mean (axis 0 to 40) of each habitat variable at Used and Unused sites; variables: CANOPY, DEBRIS, LIVE TREE, SAPLINGS, SHRUBS]
10
Simple Sample Comparisons: Graphical Example
11
Linear Regression: Analysis of Continuous Measures
of the Amount of Use
  • Assume the amount of use of a resource unit is a
    continuous variable Y.
  • Standard statistical methods should be
    sufficient.
  • The linear regression model is
  • $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \varepsilon$, (2.1)
  • where $\beta_0$ to $\beta_p$ are constants to be estimated from
    data, and
  • $\varepsilon \sim N(0, \sigma^2)$
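
A model of the form (2.1) is usually fitted by least squares. The sketch below handles only the one-predictor case (as in a biomass-versus-depth setting) in plain Python. The function name `fit_simple_regression` and the depth and biomass numbers are hypothetical, made up for illustration and not taken from the presentation.

```python
# Ordinary least squares for Y = b0 + b1*X1 + error with a single predictor.
# All data values below are invented for illustration only.

def fit_simple_regression(x, y):
    """Return (b0, b1), the least-squares intercept and slope."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squares of X and sum of cross-products of X and Y
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

depth = [1.0, 2.0, 3.0, 4.0, 5.0]      # hypothetical X1 values
biomass = [9.0, 7.0, 5.0, 3.0, 1.0]    # hypothetical Y values (exactly 11 - 2*depth)

b0, b1 = fit_simple_regression(depth, biomass)
print(b0, b1)
```

With more than one predictor, a matrix-based least-squares routine from a statistics package would normally be used instead.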

12
Linear Regression/RSF Example
  • Y = biomass of eelgrass in 1 m x 1 m quadrats
  • X1 = depth

14
Logistic Regression
  • An assumption for this type of model is that the
    probability of a success is given by the equation
  • $\pi = \dfrac{\exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p)}{1 + \exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p)}$, (2.2)
  • where $\beta_0$ to $\beta_p$ are constants to be estimated from
    the available data, and
  • X1 to Xp are the variables that the probability
    of a success is to be related to.
  • The number of successes observed in n trials follows
    a binomial distribution with mean $n\pi$ and variance
    $n\pi(1 - \pi)$
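
The logistic probability and the binomial mean and variance described on this slide can be evaluated directly. This is a minimal sketch; the function names and the coefficient values are assumptions chosen for illustration.

```python
import math

def success_probability(betas, xs):
    """pi = exp(eta) / (1 + exp(eta)), with eta = b0 + b1*x1 + ... + bp*xp."""
    eta = betas[0] + sum(b * x for b, x in zip(betas[1:], xs))
    return math.exp(eta) / (1.0 + math.exp(eta))

def binomial_mean_variance(n, pi):
    """Mean n*pi and variance n*pi*(1 - pi) of the number of successes."""
    return n * pi, n * pi * (1.0 - pi)

# Hypothetical coefficients: the linear predictor is 0, so pi is 0.5
pi = success_probability([0.0, 1.0], [0.0])
mean, var = binomial_mean_variance(20, pi)
print(pi, mean, var)
```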

15
Logistic Regression Example
  • Chipping sparrow resource selection
  • Used and unused determined based on
    presence/absence on point count stations
  • Design I, sampling protocol D

16
Resulting Model
  • $w(x) = \dfrac{\exp(\eta)}{1 + \exp(\eta)}$, where
  • $\eta = \pm 3.215 \pm 0.088\,\text{canopy} \pm 0.053\,\text{seedling} \pm 0.019\,\text{nsapling} \pm 0.676\,\text{grndb}$
  • (each ± marks a sign that is not preserved in the transcript)

17
Log-linear Model
  • Y are counts of the number of occurrences of a
    certain event under different conditions
  • A natural assumption is that the counts follow a
    Poisson distribution
  • $E(Y) = \mu = \exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p)$. (2.3)
  • Example: number of animals observed within
    blocks of land, with covariates measured on those
    blocks
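
The log-linear mean in (2.3) and the Poisson probabilities it implies can be computed as follows. The function names and the coefficient and covariate values are hypothetical, chosen only to show the calculation.

```python
import math

def expected_count(betas, xs):
    """mu = exp(b0 + b1*x1 + ... + bp*xp), the log-linear mean of a count."""
    eta = betas[0] + sum(b * x for b, x in zip(betas[1:], xs))
    return math.exp(eta)

def poisson_pmf(k, mu):
    """Probability of observing exactly k events when the mean is mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

# Hypothetical block of land with one covariate value of 2.0
mu = expected_count([0.0, 0.5], [2.0])   # exp(1.0)
print(mu, poisson_pmf(0, mu), poisson_pmf(3, mu))
```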

18
Example
19
Generalized Linear Models (McCullagh and Nelder,
1989)
  • $E(Y) = f(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p)$, (2.7)
  • with the distribution of Y being suitably defined
  • $f(z) = z$ and Y ~ Normal gives ordinary linear
    regression
  • $f(z) = \exp(z)/[1 + \exp(z)]$ and Y ~ binomial gives
    logistic regression
  • $f(z) = \exp(z)$ and Y ~ Poisson gives the log-linear
    model
  • $f(z) = 1 - \exp\{-\exp(z)\,g(t)\}$ and Y ~ binomial gives
    the proportional hazards model
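
The four choices of f(z) listed on this slide can be written as small functions. This is a sketch only: the function names are hypothetical, and `g_t` stands for a supplied value of g(t), whose form the slide does not specify.

```python
import math

def f_linear(z):
    """f(z) = z: ordinary linear regression."""
    return z

def f_logistic(z):
    """f(z) = exp(z) / (1 + exp(z)): logistic regression."""
    return math.exp(z) / (1.0 + math.exp(z))

def f_log_linear(z):
    """f(z) = exp(z): log-linear (Poisson) model."""
    return math.exp(z)

def f_prop_hazards(z, g_t):
    """f(z) = 1 - exp(-exp(z) * g(t)); g_t is an assumed value of g(t)."""
    return 1.0 - math.exp(-math.exp(z) * g_t)

z = 0.0  # value of the linear predictor b0 + b1*x1 + ... + bp*xp
print(f_linear(z), f_logistic(z), f_log_linear(z), f_prop_hazards(z, 1.0))
```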
20
Statistical Software
  • Fitting log-linear models and other generalized
    linear models requires a suitable computer
    program.
  • Many Poisson regression programs are now
    available, including
  • SAS's PROC GENMOD
  • GLIM
  • S-Plus's glm()
  • SPSS
  • SYSTAT
  • Quattro or Excel can also be used.

21
Tests Used in Modeling
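  • Hypotheses of the form $\beta_i = 0$ can be tested by comparing
    $b_i / \mathrm{SE}(b_i)$, the estimate divided by its standard error,
    with critical values from a standard normal distribution.
  • Approximate confidence intervals for $\beta_i$ are
    of the form $b_i \pm z_{\alpha/2}\,\mathrm{SE}(b_i)$.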
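
One common version of this test is the Wald statistic: the coefficient estimate divided by its standard error, referred to the standard normal distribution. The sketch below assumes that form. The function names are hypothetical, and the estimate and standard error are made-up numbers.

```python
import math

def wald_z(estimate, std_error):
    """Test statistic for the hypothesis that the coefficient is 0."""
    return estimate / std_error

def two_sided_p(z):
    """P(|Z| > |z|) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def wald_interval(estimate, std_error, z_crit=1.96):
    """Approximate 95% interval: estimate +/- z_crit * standard error."""
    half_width = z_crit * std_error
    return estimate - half_width, estimate + half_width

# Hypothetical coefficient estimate and standard error
z = wald_z(0.088, 0.02)
print(z, two_sided_p(z), wald_interval(0.088, 0.02))
```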
22
Deviance
  • Deviance measures closeness of model to data
  • Analogous to Residual Sum-of-Squares
  • $D = -2[\log_e(L_M) - \log_e(L_F)]$, (2.8)
  • $L_M$ = likelihood of the fitted model
  • $L_F$ = likelihood of the full model
  • Can be used as a general measure of fit by
    comparing the observed value to a chi-square
    distribution with df = (observations -
    parameters), ALTHOUGH NOT VERY ROBUST
  • General rule: counts in observed cells > 5
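
For Poisson counts the deviance can be computed directly from the two log-likelihoods, taking the full model to be the one whose fitted means equal the observed counts. The sketch below assumes that setting. The function names and the counts are hypothetical.

```python
import math

def poisson_loglik(counts, means):
    """Log-likelihood of independent Poisson counts with the given means."""
    total = 0.0
    for y, mu in zip(counts, means):
        total += -mu - math.lgamma(y + 1)   # -mu - log(y!)
        if y > 0:
            total += y * math.log(mu)       # y*log(mu); contributes 0 when y == 0
    return total

def deviance(counts, fitted):
    """D = -2 * [log L_M - log L_F], with L_F from the full model."""
    log_lm = poisson_loglik(counts, fitted)
    log_lf = poisson_loglik(counts, counts)  # full model: mean = observed count
    return -2.0 * (log_lm - log_lf)

counts = [6, 10, 14]            # hypothetical observed counts (all above 5)
fitted = [10.0, 10.0, 10.0]     # fitted means from a one-parameter model
d = deviance(counts, fitted)
df = len(counts) - 1            # observations minus parameters
print(d, df)
```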

23
Difference in Deviance
  • Difference in deviance for nested models is
    approximated by a chi-square distribution with $p_2 - p_1$
    degrees of freedom
  • $\Delta D_{12} = -2[\log_e(L_1) - \log_e(L_2)]$, (2.8)
  • model 1 is a subset of model 2
  • Model selection tool for GLIM
  • Overall test of selection (no selection or null
    model versus full model)
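
A difference in deviance can be turned into an approximate p-value by referring it to a chi-square distribution. The sketch below uses only the standard library, with a closed-form chi-square tail for integer degrees of freedom. The function names, log-likelihoods and parameter counts are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable, integer df >= 1."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite sum of Poisson-type terms
        total = sum(half ** j / math.factorial(j) for j in range(df // 2))
        return math.exp(-half) * total
    # Odd df: start from the df = 1 tail and add the remaining terms
    tail = math.erfc(math.sqrt(half))
    for j in range(1, (df - 1) // 2 + 1):
        tail += math.exp(-half) * half ** (j - 0.5) / math.gamma(j + 0.5)
    return tail

def deviance_difference(loglik_1, loglik_2):
    """-2 * [log L_1 - log L_2], where model 1 is nested in model 2."""
    return -2.0 * (loglik_1 - loglik_2)

# Hypothetical maximized log-likelihoods and numbers of parameters
loglik_1, p1 = -120.0, 2
loglik_2, p2 = -115.0, 4
delta_d = deviance_difference(loglik_1, loglik_2)
p_value = chi2_sf(delta_d, p2 - p1)
print(delta_d, p_value)
```

A small p-value suggests that the extra parameters of model 2 improve the fit by more than chance alone would explain.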

24
Model Selection
  • Art and not a science
  • Most RSF analyses are based on observational data
    and are exploratory in nature
  • Limit number of variables based on professional
    judgement/knowledge of issues.
  • Do not limit yourself to one model unless
    obvious. Make sure statistical inference is
    understood. Replicate study when possible.

25
Model Selection Criteria
  • Nested models: analysis of deviance
  • Stepwise
  • AIC
  • AICc
  • BIC

26
Akaike's Information Criterion
  • Burnham and Anderson (1999)
  • $\mathrm{AIC} = -2\log_e(L_M) + 2p$, (2.9)
  • where p is the number of unknown parameters in
    the model that must be estimated
  • Smaller values of AIC suggest a better model

27
Other Measures
  • Corrected AIC
  • $\mathrm{AIC}_c = -2\log_e(L_M) + 2p\,n/(n - p - 1)$, (2.10)
  • Useful when sample sizes are relatively small
  • Bayesian information criterion (BIC)
  • $\mathrm{BIC} = -2\log_e(L_M) + p\log_e(n)$. (2.11)
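
The three criteria differ only in the penalty added to -2 times the maximized log-likelihood: 2p for AIC, 2p n/(n - p - 1) for AICc, and p log(n) for BIC. A minimal sketch follows. The function names and the values of the log-likelihood, p and n are hypothetical.

```python
import math

def aic(loglik, p):
    """AIC = -2 log L + 2p."""
    return -2.0 * loglik + 2.0 * p

def aicc(loglik, p, n):
    """Corrected AIC = -2 log L + 2p * n / (n - p - 1)."""
    return -2.0 * loglik + 2.0 * p * n / (n - p - 1)

def bic(loglik, p, n):
    """BIC = -2 log L + p * log(n)."""
    return -2.0 * loglik + p * math.log(n)

# Hypothetical fitted model: log-likelihood -100 with 3 parameters, 50 observations
loglik, p, n = -100.0, 3, 50
print(aic(loglik, p), aicc(loglik, p, n), bic(loglik, p, n))
```

As n grows with p fixed, n/(n - p - 1) approaches 1 and AICc approaches AIC.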

28
Model Selection Simulation
  • Case of 4 variables: 2 proportions and 2
    continuous
  • Simulated the no-selection case, and selection for
    one categorical variable (R = .75/.25).
  • Assumed 50 used units, and either 10000 or 1000
    available units.
  • Looked at which models are selected as best by
    AIC.
  • MODEL 0 - no selection
  • MODEL 1 - P1
  • MODEL 2 - D1
  • MODEL 3 - D2
  • MODEL 4 - P1 + D1
  • MODEL 5 - P1 + D2
  • MODEL 6 - D1 + D2
  • MODEL 7 - P1 + D1 + D2

29
Simulation Using AIC
30
Model Averaging
  • Do not fall in love with a single model, to the
    exclusion of alternatives.
  • Using information from multiple models to improve
    inference and interpretation
  • Has been shown to improve prediction
  • Allows for assessing importance of individual
    variables

31
Process
AIC WEIGHTS
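
The transcript does not include the formula on this slide. The sketch below assumes the usual Akaike weights, in which each model's weight is proportional to exp(-Δ/2) with Δ its AIC minus the smallest AIC, and computes a variable's importance as the summed weight of the models that contain it. The function names, AIC values and model definitions are hypothetical.

```python
import math

def akaike_weights(aic_values):
    """Weights proportional to exp(-delta/2), normalized to sum to 1."""
    best = min(aic_values)
    relative = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(relative)
    return [r / total for r in relative]

def importance(weights, models, variable):
    """Sum of the weights of all models that include the variable."""
    return sum(w for w, m in zip(weights, models) if variable in m)

# Hypothetical candidate models and their AIC values
models = [{"P1"}, {"P1", "D1"}, {"D1"}]
aic_values = [100.0, 102.0, 104.0]

weights = akaike_weights(aic_values)
print(weights)
print(importance(weights, models, "P1"), importance(weights, models, "D1"))
```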
32
Example
33
Importance Values
Variable      Alder Flycatcher   Blackpoll Warbler   Savannah Sparrow
river         0.7 (-)            0.09 (-)            0
lake          0                  0                   1 (-)
band 1        0                  0.93 (-)            0.95 (+)
band 4        1 (+)              1 (+)               0.13 (+)
band 5        0                  0                   0.06 (+)
Std band 1    0.28 (-)           0                   0.94 (+)
Std band 3    0.89 (+)           0.72 (+)            0.05 (-)
Std band 4    0.61 (+)           0                   0.06 (+)
Std band 7    0.39 (+)           0.20 (+)            0.06 (+)
elevation     1 (-)              0.93 (-)            1 (-)
slope         0                  0                   0
aspect        0                  0                   0

34
Multiple Testing
  • Inflated experiment-wise Type I error can occur
    when several significance tests or several
    confidence intervals are conducted at once
  • Example: 10 independent tests carried out at the
    5% level with the null true; the probability of one or
    more significant results is
  • $1 - 0.95^{10} = 0.40$
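
The arithmetic in this example generalizes to any number of independent tests: the chance of at least one false positive is 1 - (1 - alpha)^k. A short sketch, with a hypothetical function name:

```python
def familywise_error(alpha, k):
    """Probability of at least one significant result among k independent
    tests at level alpha when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** k

print(round(familywise_error(0.05, 10), 2))  # 0.4
```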

35
Approaches to Address Multiple Testing
  • Bonferroni procedure: a conservative approach
  • Test each comparison at the $100(\alpha/k)\%$ level, with k
    being the number of comparisons
  • For example, with 3 discrete habitat types there are n = 3
    comparisons; test each at the 100(0.05/3)% = 1.67% level
  • With 10 comparisons, the adjusted level is 0.005
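
The Bonferroni adjustment amounts to dividing the overall level by the number of comparisons and testing each p-value against the result. A minimal sketch follows. The function names and the p-values are hypothetical.

```python
def bonferroni_level(alpha, k):
    """Per-comparison level when k comparisons share an overall level alpha."""
    return alpha / k

def bonferroni_reject(p_values, alpha=0.05):
    """True for each p-value below the adjusted level alpha / k."""
    level = bonferroni_level(alpha, len(p_values))
    return [p < level for p in p_values]

print(bonferroni_level(0.05, 3))     # about 0.0167, i.e. the 1.67% level
print(bonferroni_level(0.05, 10))    # about 0.005
print(bonferroni_reject([0.001, 0.015, 0.02, 0.04]))  # hypothetical p-values
```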

36
Holm's Method
  • Decide on the overall $\alpha$ level
  • Calculate p-values
  • Sort the p-values in ascending order
  • See if $p_1 < \alpha/k$
  • If no, stop; if yes, determine if $p_2 < \alpha/(k-1)$
  • If no, stop; if yes, determine if $p_3 < \alpha/(k-2)$
  • If no, stop; if yes, ...
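
The step-down procedure listed on this slide can be sketched as a loop over the sorted p-values that stops at the first one failing its threshold. The function name `holm` and the p-values are hypothetical.

```python
def holm(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected."""
    k = len(p_values)
    # Indices of the p-values in ascending order
    order = sorted(range(k), key=lambda i: p_values[i])
    reject = [False] * k
    for rank, idx in enumerate(order):
        # Thresholds are alpha/k, alpha/(k-1), alpha/(k-2), ...
        if p_values[idx] < alpha / (k - rank):
            reject[idx] = True
        else:
            break  # stop at the first non-significant p-value
    return reject

print(holm([0.001, 0.015, 0.02, 0.04]))
print(holm([0.01, 0.04, 0.03, 0.005]))
```

For the first set of p-values a single threshold of 0.05/4 = 0.0125 applied to every test would reject only 0.001, while the step-down thresholds reject all four.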

37
Example ? ?
38
Bootstrap Methods
  • IDEA: when the only information available about a
    statistical population is a random
    sample from that population, the best guide to
    what might happen when resampling the population is
    resampling the sample
  • The sample is assumed to represent the population
    well
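
The resampling idea can be sketched as follows: draw many samples with replacement from the observed sample, recompute the statistic each time, and use the spread of the recomputed values as an estimate of its standard error. The function name `bootstrap_se`, the number of resamples and the data are assumptions for illustration.

```python
import random
import statistics

def bootstrap_se(sample, statistic, n_boot=2000, seed=0):
    """Bootstrap standard error of a statistic computed from the sample."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(sample)
    replicates = []
    for _ in range(n_boot):
        # Resample n values from the sample, with replacement
        resample = [rng.choice(sample) for _ in range(n)]
        replicates.append(statistic(resample))
    return statistics.stdev(replicates)

sample = [12.0, 15.0, 9.0, 20.0, 14.0, 11.0, 17.0, 13.0]  # hypothetical data
se_median = bootstrap_se(sample, statistics.median)
print(se_median)
```

Any function of a sample can be passed as `statistic`, which is what makes the approach convenient for statistics with no simple variance formula.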

39
Applications
  • Variance of a complicated sample statistic
  • Example: probability of use of a unit with certain
    characteristics
  • Model weights in model averaging
  • Importance values for variables

40
Applications
  • Incorporation of between animal variability or
    between true experimental unit variability
  • Radioed animals and logistic regression
  • Transects used to gather use information (walking
    transects)