Transcript and Presenter's Notes

Title: Model Evaluation and Selection


1
Model Evaluation and Selection
2
Example Objective
Demonstrate how to evaluate a single model and how to compare alternative models.
3
Evaluating the Sufficiency of a Single Model (follow-up to the Mediation Test example)
When this model is run, a variety of measures of model fit are generated. A question of importance is, "Is the fit of the model sufficiently good to yield reliable results?"
The alternative model is one in which there is also an arrow from s_age to tcov. In other words, does fire severity explain the effect of stand age on cover, or is there another pathway of influence independent of fire severity?
4
Finding Measures of Model Fit in Amos I
The model chi-square is the most commonly used
measure of absolute model fit.
It is always good to check the section of the output called Notes for Model. There we can see that a minimum was achieved, along with the full p-value for the chi-square. A p-value greater than 0.05 suggests that we can accept this model (it indicates no major deviations between the data and the model).
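For readers who want to reproduce the reported p-value outside of Amos, here is a minimal sketch using scipy. The chi-square of 3.243 comes from this example; treating the model as having one degree of freedom (the single omitted path from s_age to tcov) is an assumption consistent with the comparisons on the following slides.

```python
# Sketch: recover the model chi-square p-value reported by Amos.
from scipy.stats import chi2

model_chisq = 3.243  # model chi-square from this example
df = 1               # assumed: one omitted path (s_age -> tcov)
p_value = chi2.sf(model_chisq, df)  # upper-tail probability
print(f"p = {p_value:.3f}")         # about 0.072, greater than 0.05
```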
5
Further Considerations of Model Chi-square
It is well known that model p-values are not always the best way to decide whether a model is adequate (in an absolute sense) or the best model (in a relative sense). This is a complex topic and one that lacks complete consensus. What is generally agreed upon is:
(1) Chi-squares automatically increase with increasing sample size, and p-values reflect the increasing power for detecting deviations.
(2) P-values for model chi-squares are quite useful when sample sizes are less than 200, especially for models that do not include latent variables possessing multiple indicators.
(3) It is recommended to look at multiple measures.
6
Further Considerations (cont.)
One useful way to evaluate model adequacy is to see whether the addition of a pathway causes the model chi-square to drop by more than 3.84 units. This is the "single-degree-of-freedom chi-square test". If adding a path reduces the chi-square by less than 3.84, the added path is not strongly supported by the data.
In the current example, the model chi-square is 3.243, which tells us that adding a path from s_age to tcov could reduce the model chi-square by at most 3.243. This further indicates that our model can be considered adequate.
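The 3.84 threshold is simply the 0.05 critical value of a chi-square distribution with one degree of freedom. A minimal sketch of the test logic, using the values from this example:

```python
# Sketch: single-degree-of-freedom chi-square test for one added path.
from scipy.stats import chi2

critical = chi2.ppf(0.95, df=1)  # about 3.841 -- the "3.84 units" rule of thumb
max_drop = 3.243                 # largest possible drop from adding s_age -> tcov
if max_drop > critical:
    print("added path is supported by the data")
else:
    print("added path is not strongly supported by the data")
```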
7
Finding Measures of Model Fit in Amos II
Cmin means the minimum value of the discrepancy function, i.e., the model chi-square.
The Model Fit tab gives us several measures to consider.
8
continued
Clicking on the labels gives additional information.
9
continued
The RMSEA indicates close fit; it also shows that a value of 0 (perfect fit) cannot be ruled out.
The AIC for our model (the default model) is 13.243, which could only be reduced to 12.000 by saturating the model. This difference is less than the minimum recommended AIC difference of 2.0, suggesting the models are indistinguishable. BUT, AIC is often not a reliable measure.
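The AIC values above can be reproduced from the formula Amos uses, AIC = CMIN + 2q, where q is the number of estimated parameters. The parameter counts below (5 for the default model, 6 for the saturated model) are assumptions chosen because they reproduce the reported values of 13.243 and 12.000.

```python
# Sketch: AIC comparison between the default and saturated models.
def aic(cmin, q):
    # Amos-style AIC: minimum discrepancy plus twice the number of parameters
    return cmin + 2 * q

aic_default = aic(3.243, 5)    # 13.243
aic_saturated = aic(0.0, 6)    # 12.000
print(f"AIC difference = {aic_default - aic_saturated:.3f}")  # 1.243, less than 2.0
```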
10
continued some more
The CAIC (consistent AIC) is generally viewed as a better measure than AIC. Here we see that the default model value is more than 2.0 units smaller than that of the saturated model, supporting the conclusion that our model is adequate.
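As a rough check on this claim, Bozdogan's consistent AIC, CAIC = CMIN + q(ln n + 1), gives a difference slightly larger than 2.0 units for a sample size near 90. Both the formula (the one Amos is generally understood to report) and n = 90 are assumptions here, not values shown in the output; the same n = 90 is used in the BIC and Hoelter sketches below.

```python
# Sketch: CAIC comparison under the stated assumptions.
import math

def caic(cmin, q, n):
    # Bozdogan's consistent AIC (assumed formulation)
    return cmin + q * (math.log(n) + 1)

n = 90  # assumed sample size
print(f"CAIC difference = {caic(0.0, 6, n) - caic(3.243, 5, n):.3f}")  # about 2.26, more than 2.0
```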
11
and still some more
The BIC (Bayesian Information Criterion) is one of the more popular measures at the moment. In this case, the saturated model BIC is only 1.257 greater, which is less than the 2.0 difference recommended for picking among models. This index tells us that while the evidence is better for the default model, the saturated model can't be ruled out.
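Using BIC = CMIN + q ln(n), the single-group formulation Amos is generally understood to report, the 1.257 gap is reproduced when the sample size is about 90; as in the CAIC sketch, both the formula and n = 90 are assumptions.

```python
# Sketch: BIC comparison under the stated assumptions.
import math

def bic(cmin, q, n):
    # BIC as minimum discrepancy plus q * ln(sample size) (assumed formulation)
    return cmin + q * math.log(n)

n = 90  # assumed sample size
print(f"BIC difference = {bic(0.0, 6, n) - bic(3.243, 5, n):.3f}")  # about 1.257
```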
12
and even still some more
The Hoelter index relates back to our model chi-square and its p-value. It tells us that at a sample size of 106, we would have enough power to detect an additional path from s_age to tcov with a p-value less than 0.05; 183 samples would be required to obtain a p-value less than 0.01.
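Hoelter's critical N can be sketched directly from the model chi-square: it asks at what sample size the observed discrepancy would just become significant at the chosen alpha. The formula below and the underlying sample size (n = 90, as in the sketches above) are assumptions that reproduce the reported values of 106 and 183.

```python
# Sketch: Hoelter's critical N at two alpha levels.
import math
from scipy.stats import chi2

def hoelter_critical_n(model_chisq, df, n, alpha):
    # sample size at which the observed discrepancy would become significant
    crit = chi2.ppf(1 - alpha, df)
    return math.floor(crit * (n - 1) / model_chisq) + 1

n = 90  # assumed sample size
print(hoelter_critical_n(3.243, 1, n, 0.05))  # 106
print(hoelter_critical_n(3.243, 1, n, 0.01))  # 183
```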
13
AIC difference criteria
AIC difference    Support for equivalency of models
0-2               substantial
4-7               weak
> 10              none

Burnham, K.P. and Anderson, D.R. 2002. Model Selection and Multimodel Inference, 2nd ed. Springer-Verlag, p. 70.
14
BIC difference criteria
BIC difference    Support for a difference between models
0-2               weak
2-6               positive
6-10              strong
> 10              very strong

Raftery, A.E. 1995. Bayesian model selection in social research. Sociological Methodology 25: 111-163.
15
What do we conclude in this case?
Given the data we have available, we could
justify (in my view) omitting the pathway from
s_age to tcov. However, we must recognize that
this is an approximation of the truth. If we had
more samples, would they lead us to decide that
we needed to include a path from s_age to tcov?
Without the additional samples we don't really
know. Comparing the path coefficients for the two
models would allow us to decide the scientific
consequences of our model choice.
16
What is the SEM perspective on model selection?
In SEM we use our scientific knowledge to guide
our decisions, and this applies especially to
model selection. Do we believe it serves our
scientific purposes to omit the path from s_age
to tcov? We certainly can present the results for
the path in the following fashion if we think it
merits discussion.
[Path diagram: s_age -> fidx -> tcov, with error terms e1 and e2 on fidx and tcov; coefficients 0.45 and -0.35 on the mediated paths, and -0.19 (ns) on the s_age -> tcov path under discussion.]
17
Final thought
"Statistical tests are aids to (hopefully wise)
judgement, not two-valued logical declarations of
truth or falsity". Abelson, R.P. 1995. Statistics as Principled Argument. Lawrence Erlbaum Associates, Hillsdale, NJ.