Title: Using Quasivariance to Communicate Sociological Results from Statistical Models
1 Using Quasi-variance to Communicate Sociological Results from Statistical Models
- Vernon Gayle and Paul S. Lambert, University of Stirling
- Gayle and Lambert (2007) Sociology, 41(6): 1191-1208.
2
- One of the useful things about mathematical and
statistical models of educational realities is
that, so long as one states the assumptions
clearly and follows the rules correctly, one can
obtain conclusions which are, in their own terms,
beyond reproach. The awkward thing about these
models is the snares they set for the casual
user, the person who needs the conclusions, and
perhaps also supplies the data, but is untrained
in questioning the assumptions.
3
- What makes things more difficult is that, in
trying to communicate with the casual user, the
modeller is obliged to speak his or her language,
to use familiar terms in an attempt to capture
the essence of the model. It is hardly surprising
that such an enterprise is fraught with
difficulties, even when the attempt is genuinely
one of honest communication rather than
compliance with custom or even subtle
indoctrination (Goldstein 1993, p. 141).
4 A little biography (or narrative)
- Since being at the Centre for Applied Stats in 1998/9 I have been thinking about the issue of model presentation
- Did some work on Sample Enumeration Methods with Richard Davies
- Summer 2004 (with David Steele's help) began to think about quasi-variance
- Summer 2006 began writing a paper with Paul Lambert
5 The Reference Category Problem
- In standard statistical models the effects of a categorical explanatory variable are assessed by comparison to one category (or level) that is set as a benchmark against which all other categories are compared
- The benchmark category is usually referred to as the reference or base category
6 The Reference Category Problem
- An example of some English Government Office Regions
- 0 North East of England (reference category)
- 1 North West England
- 2 Yorkshire and Humberside
- 3 East Midlands
- 4 West Midlands
- 5 East of England
7 Government Office Region
8 Table 1: Logistic regression prediction that self-rated health is good (parameter estimates for Model 1)
11 Conventional Confidence Intervals
- Since these confidence intervals overlap we might be beguiled into concluding that the two regions are not significantly different to each other
- However, this conclusion represents a common misinterpretation of regression estimates for categorical explanatory variables
- These confidence intervals are not estimates of the difference between the North West and Yorkshire and Humberside, but instead they indicate the difference between each category and the reference category (i.e. the North East)
- Critically, there is no confidence interval for the reference category because it is forced to equal zero
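The overlap can be checked with a short sketch. It assumes the rounded coefficients 0.09 (North West) and 0.12 (Yorkshire and Humberside) quoted on the Firth's Method slide, the two variances from the Model 1 variance-covariance matrix, and a 1.96 multiplier for 95% intervals. The helper name `conventional_ci` is hypothetical.

```python
import math

# Rounded coefficients (contrasts with the North East) quoted in the slides.
b_nw, b_yh = 0.09, 0.12
# Variances taken from the diagonal of the Model 1 variance-covariance matrix.
var_nw, var_yh = 0.00010483, 0.00011543


def conventional_ci(estimate, variance, z=1.96):
    """Interval for a category's contrast with the reference category."""
    half_width = z * math.sqrt(variance)
    return (estimate - half_width, estimate + half_width)


ci_nw = conventional_ci(b_nw, var_nw)
ci_yh = conventional_ci(b_yh, var_yh)

# Two intervals overlap when each one starts before the other ends.
overlap = ci_nw[1] >= ci_yh[0] and ci_yh[1] >= ci_nw[0]
print(ci_nw, ci_yh, overlap)
```

Under these assumptions the two intervals do overlap, yet each interval only describes a contrast with the North East, so the overlap says nothing direct about the North West versus Yorkshire and Humberside.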
12 Formally Testing the Difference Between Parameters
- The banana skin is here!
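13 Standard Error of the Difference
$$\text{s.e.}(b_1 - b_2) = \sqrt{\operatorname{var}(b_1) + \operatorname{var}(b_2) - 2\operatorname{cov}(b_1, b_2)}$$
- $\operatorname{var}(b_1)$: variance for North West (s.e.²)
- $\operatorname{var}(b_2)$: variance for Yorkshire and Humberside (s.e.²)
- $\operatorname{cov}(b_1, b_2)$: only available in the variance-covariance matrix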
14 Covariance
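15 Standard Error of the Difference
$$\text{s.e.}(b_1 - b_2) = \sqrt{\operatorname{var}(b_1) + \operatorname{var}(b_2) - 2\operatorname{cov}(b_1, b_2)} = 0.0083$$
- $\operatorname{var}(b_1)$: variance for North West (s.e.²)
- $\operatorname{var}(b_2)$: variance for Yorkshire and Humberside (s.e.²)
- $\operatorname{cov}(b_1, b_2)$: only available in the variance-covariance matrix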
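The value 0.0083 can be reproduced with a small sketch, assuming the two variances and the covariance are the North West and Yorkshire and Humberside entries of the Model 1 variance-covariance matrix shown later in the slides. The function name `se_difference` is hypothetical.

```python
import math

# Entries read from the Model 1 variance-covariance matrix.
var_nw = 0.00010483     # variance, North West
var_yh = 0.00011543     # variance, Yorkshire and Humberside
cov_nw_yh = 0.00007543  # covariance between the two estimates


def se_difference(var_a, var_b, cov_ab):
    """Standard error of the difference between two parameter estimates."""
    return math.sqrt(var_a + var_b - 2.0 * cov_ab)


se_diff = se_difference(var_nw, var_yh, cov_nw_yh)
print(round(se_diff, 4))
```

The covariance term is the piece a reader cannot recover from a published table of coefficients and standard errors alone.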
16 Formal Tests
- $t = -0.03 / 0.0083 = -3.6$
- Wald $\chi^2 = (-0.03 / 0.0083)^2 = 12.97$, $p = 0.0003$
- Remember that earlier, because the two sets of confidence intervals overlapped, we could wrongly conclude that the two regions were not significantly different to each other
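A sketch of the same arithmetic, assuming the rounded coefficients 0.09 and 0.12 and the variance-covariance entries from Model 1. The p-value uses the identity that the upper tail of a chi-squared distribution with 1 degree of freedom equals `erfc(sqrt(x / 2))`, so only the standard library is needed.

```python
import math

# Difference between the North West and Yorkshire and Humberside coefficients.
diff = 0.09 - 0.12
# Standard error of that difference from the variance-covariance matrix.
se_diff = math.sqrt(0.00010483 + 0.00011543 - 2.0 * 0.00007543)

t = diff / se_diff
wald = t ** 2
# Upper-tail probability of a chi-squared variable with 1 degree of freedom.
p = math.erfc(math.sqrt(wald / 2.0))

print(round(t, 1), round(wald, 2), round(p, 4))
```

The Wald statistic matches the slide when the unrounded standard error is carried through the calculation rather than 0.0083.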
17 Comment
- Only the primary analyst has the opportunity to make formal comparisons
- Reporting the matrix is seldom, if ever, feasible in paper-based publications
- In a model with q parameters there would, in general, be ½q(q-1) covariances to report
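To give a feel for how quickly the reporting burden grows, here is a small illustration of the ½q(q-1) count. The values of q are arbitrary examples, not figures from the paper.

```python
def n_covariances(q):
    """Number of distinct off-diagonal covariances among q parameter estimates."""
    return q * (q - 1) // 2


# Arbitrary illustrative model sizes.
counts = {q: n_covariances(q) for q in (3, 10, 20)}
print(counts)  # {3: 3, 10: 45, 20: 190}
```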
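18 Firth's Method (made simple)
s.e. difference $\approx \sqrt{\text{qv}_1 + \text{qv}_2}$, where $\text{qv}$ is the quasi-variance of each category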
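20 Firth's Method (made simple)
s.e. difference $\approx \sqrt{\text{qv}_1 + \text{qv}_2} = 0.0083$, where $\text{qv}$ is the quasi-variance of each category
- $t = (0.09 - 0.12) / 0.0083 = -3.6$
- Wald $\chi^2 = (-0.03 / 0.0083)^2 = 12.97$, $p = 0.0003$
- These results are identical to the results calculated by the conventional method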
21 The QV-based comparison intervals no longer overlap
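A hedged sketch of why the comparison intervals separate. The slides do not list the quasi-variances themselves, so this example assumes they can be approximated as each variance minus a common covariance of 0.00007543, which is reasonable here only because the off-diagonal entries of the Model 1 matrix are almost identical. The coefficients 0.09 and 0.12 are the rounded values quoted in the slides, and `comparison_interval` is a hypothetical helper.

```python
import math

b_nw, b_yh = 0.09, 0.12
var_nw, var_yh = 0.00010483, 0.00011543
# Assumption: one shared covariance stands in for all off-diagonal entries.
common_cov = 0.00007543

# Approximate quasi-variances under the equal-covariance assumption.
qv_nw = var_nw - common_cov
qv_yh = var_yh - common_cov


def comparison_interval(estimate, quasi_variance, z=1.96):
    """Interval built from a quasi standard error instead of the usual s.e."""
    half_width = z * math.sqrt(quasi_variance)
    return (estimate - half_width, estimate + half_width)


int_nw = comparison_interval(b_nw, qv_nw)
int_yh = comparison_interval(b_yh, qv_yh)
overlap = int_nw[1] >= int_yh[0] and int_yh[1] >= int_nw[0]
print(int_nw, int_yh, overlap)
```

Under these assumptions the two intervals are disjoint, which agrees with the formal test of the difference.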
22 Firth QV Calculator (on-line)
24 Information from the Variance-Covariance Matrix Entered into the Data Window (Model 1)
0
0 0.00010483
0 0.00007543 0.00011543
0 0.00007543 0.00007543 0.00012312
0 0.00007543 0.00007543 0.00007543 0.00011337
0 0.00007544 0.00007543 0.00007543 0.00007543 0.00011480
0 0.00007545 0.00007544 0.00007544 0.00007544 0.00007545 0.00010268
0 0.00007544 0.00007543 0.00007544 0.00007543 0.00007544 0.00007546 0.00011802
0 0.00007552 0.00007548 0.00007550 0.00007547 0.00007554 0.00007572 0.00007558 0.00015002
0 0.00007547 0.00007545 0.00007546 0.00007545 0.00007548 0.00007555 0.00007549 0.00007598 0.00012356
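A rough sketch of how quasi-variances could be derived from this lower-triangular matrix. It is not the fitting criterion used by the on-line calculator: as a simplifying assumption it uses plain least squares, choosing one value per category so that each pairwise sum is as close as possible to the exact variance of the corresponding difference. All function names are hypothetical.

```python
import math

# Lower triangle of the Model 1 variance-covariance matrix; row 0 is the
# reference category, whose variance and covariances are fixed at zero.
LOWER = [
    [0],
    [0, 0.00010483],
    [0, 0.00007543, 0.00011543],
    [0, 0.00007543, 0.00007543, 0.00012312],
    [0, 0.00007543, 0.00007543, 0.00007543, 0.00011337],
    [0, 0.00007544, 0.00007543, 0.00007543, 0.00007543, 0.00011480],
    [0, 0.00007545, 0.00007544, 0.00007544, 0.00007544, 0.00007545, 0.00010268],
    [0, 0.00007544, 0.00007543, 0.00007544, 0.00007543, 0.00007544, 0.00007546,
     0.00011802],
    [0, 0.00007552, 0.00007548, 0.00007550, 0.00007547, 0.00007554, 0.00007572,
     0.00007558, 0.00015002],
    [0, 0.00007547, 0.00007545, 0.00007546, 0.00007545, 0.00007548, 0.00007555,
     0.00007549, 0.00007598, 0.00012356],
]
n = len(LOWER)


def cov(i, j):
    """Look up a (co)variance from the symmetric matrix."""
    return LOWER[i][j] if i >= j else LOWER[j][i]


def var_diff(i, j):
    """Exact variance of the difference between estimates i and j."""
    return cov(i, i) + cov(j, j) - 2.0 * cov(i, j)


def least_squares_qv():
    """Closed-form minimiser of sum over pairs of (q_i + q_j - var_diff(i, j))^2."""
    row_sums = [sum(var_diff(i, j) for j in range(n) if j != i) for i in range(n)]
    total = sum(row_sums) / (2 * n - 2)  # sum of all the q values
    return [(r - total) / (n - 2) for r in row_sums]


qv = least_squares_qv()
approx_se = math.sqrt(qv[1] + qv[2])  # North West vs Yorkshire and Humberside
exact_se = math.sqrt(var_diff(1, 2))
print(round(approx_se, 4), round(exact_se, 4))
```

With these inputs the approximate and exact standard errors of the North West versus Yorkshire and Humberside difference agree to the precision reported in the slides, and the reference category receives a quasi-variance of its own.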
26 Conclusion: We should start using this method
- Benefits
  - Overcomes the reference category problem when presenting models
  - Provides reliable results (even though based on an approximation)
  - Easy(ish) to calculate
  - Has extensions to other models
- Costs
  - Extra column in results
  - Time convincing colleagues that this is a good thing
27 Conclusion
- Why have we told you this?
  - Categorical X vars are ubiquitous
  - Interpretation of coefficients is critical to sociological analyses
  - Subtleties / slipperiness
  - (cf. Economics, where the emphasis is often on precision rather than communication)