Lab 14

Provided by: lisawil5

1
Lab 14

Curvilinear analysis and detailed example of
categorical and continuous variables analysis

2
Curvilinear Regression

Linear regression assumes that a straight line
properly represents the relations between each IV
and the DV.
This is not always the case. For example, it has
been found that the relationship between job
satisfaction and job tenure (length of time in a
job) is a curvilinear relationship. Employees
with low and high tenure have high satisfaction
and employees with moderate tenure have the
lowest satisfaction.

3
Example of a Curvilinear Relationship: Job Satisfaction and Tenure
4
How to test this with SAS

In polynomial regression we conduct a sequence of tests. We start by
regressing the DV on the IV.
Then we add IV*IV (the squared term) to the model to see whether it
accounts for a significant amount of additional variance.
If it does, we add IV*IV*IV (the cubed term) to see whether it adds
further variance. We stop when adding a successive power term fails to
increase the variance accounted for.
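
Written out, the sequence of models and the test for each added term can be sketched as follows. The symbols b0 to b3, m (number of terms added), k (number of predictors in the larger model), and N (sample size) are notation assumed here, not taken from the slides.

```latex
\[
\begin{aligned}
\text{Step 1 (linear):}\quad    & Y = b_0 + b_1 X \\
\text{Step 2 (quadratic):}\quad & Y = b_0 + b_1 X + b_2 X^2 \\
\text{Step 3 (cubic):}\quad     & Y = b_0 + b_1 X + b_2 X^2 + b_3 X^3
\end{aligned}
\]
\[
F_{\text{change}} =
\frac{\left(R^2_{\text{larger}} - R^2_{\text{smaller}}\right)/m}
     {\left(1 - R^2_{\text{larger}}\right)/(N - k - 1)},
\qquad df = (m,\; N - k - 1)
\]
```

When a single power term is added (m = 1), this F equals the square of the t value for that term in the larger model, so the t test on the highest-order term gives the same decision.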

5
Example

A sports physiologist is interested in the
effects of diet on strength of athletes. He
measures strength and the amount of protein
consumed and he wants to know what the
relationship is between these two variables.
Form quadratic and cubic terms.
Run the regressions to test for trends and
identify the best model.
Graph the relations between X and Y for evidence
of nonlinearity.

6
Example program

data d1;
  input protein strength;
  * create power terms;
  protein2 = protein*protein;
  protein3 = protein2*protein;
cards;
[data lines omitted]
;
* regressions with linear, quadratic, and cubic models;
* linear;
proc reg;
  model strength = protein;
  plot strength*protein r.*p.;
run;
* quadratic;
proc reg;
  model strength = protein protein2;
  plot r.*p.;
run;
* cubic;
proc reg;
  model strength = protein protein2 protein3;
  plot r.*p.;
run;

7
Output Model 1

Model: MODEL1
Dependent Variable: strength

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               1            16191         16191    646.01   <.0001
Error             248       6215.86885      25.06399
Corrected Total   249            22407

Root MSE            5.00639   R-Square   0.7226
Dependent Mean    202.56800   Adj R-Sq   0.7215
Coeff Var           2.47146

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1            145.33012          2.27414     63.91     <.0001
protein      1              0.81480          0.03206     25.42     <.0001

8
(No Transcript)
9
(No Transcript)
10
Model 2 Output

Model: MODEL1
Dependent Variable: strength

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               2            19145    9572.45217    724.73   <.0001
Error             247       3262.43966      13.20826
Corrected Total   249            22407

Root MSE            3.63432   R-Square   0.8544
Dependent Mean    202.56800   Adj R-Sq   0.8532
Coeff Var           1.79412

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1             22.06447          8.40699      2.62     0.0092
protein      1              4.42387          0.24247     18.24     <.0001
protein2     1             -0.02589          0.00173    -14.95     <.0001

11
(No Transcript)
12
Model 3 Output

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               3            19145    6381.64432    481.20   <.0001
Error             246       3262.41104      13.26183
Corrected Total   249            22407

Root MSE            3.64168   R-Square   0.8544
Dependent Mean    202.56800   Adj R-Sq   0.8526
Coeff Var           1.79776

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1             20.15763         41.90111      0.48     0.6309
protein      1              4.51006          1.87112      2.41     0.0167
protein2     1             -0.02716          0.02742     -0.99     0.3230
protein3     1           0.00000613       0.00013194      0.05     0.9630

13
(No Transcript)
14
Conclusions

The b-weight for the highest-order term is significant in the quadratic
model (protein2) but not in the cubic model (protein3). The quadratic
equation therefore appears to be the best fit for these data
(Y = 22.06 + 4.42X1 - .026X1^2), and it accounts for 85% of the variance.
Looking back at the graph (strength*protein), it appears that the benefit
of protein is large at first and then levels off, with athletes receiving
little to no additional benefit at around the 70 mark.
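
As a rough arithmetic check on this conclusion, the sketch below takes the error sums of squares printed in the Model 1 and Model 2 output and computes the F test for the variance added by the quadratic term, then locates the turning point of the fitted quadratic. The variable names are illustrative and are not part of the original program.

```sas
data _null_;
  /* values copied from the Model 1 and Model 2 output */
  sse_linear = 6215.86885;   /* error SS, linear model */
  sse_quad   = 3262.43966;   /* error SS, quadratic model */
  mse_quad   = 13.20826;     /* error mean square, quadratic model (247 df) */

  /* F for the one added term: drop in error SS over the larger model's MSE */
  f_change = (sse_linear - sse_quad) / mse_quad;
  p_change = 1 - probf(f_change, 1, 247);

  /* turning point of Y = b0 + b1*X + b2*X**2 lies at X = -b1/(2*b2) */
  b1 = 4.42387;
  b2 = -0.02589;
  turning_point = -b1 / (2*b2);

  put f_change= p_change= turning_point=;
run;
```

The F change works out to roughly 223.6, which agrees with the square of the t value for protein2 (-14.95) up to rounding. The fitted curve reaches its maximum at a protein value of about 85.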

15
Detailed Example

Events variable is a person's score on a life
event scale, indicating the number and severity
of recent life events.
Status variable is a measure of whether a person
co-habits with a partner (a 0 indicates that they
do not, and a 1 indicates that they do).
Stress variable is the score on self-report
measure of experienced stress

16
Hypotheses

1. The more life events, the greater the stress.
2. Those who live with their partner will have lower stress than
participants who do not live with a partner.
3. The relationship between events and stress is predicted to be
moderated by status. Participants who cohabit with a partner are
predicted to be less stressed by life events than those who do not live
with a partner.

17
Evaluate Normality

Check normality in variables.
proc univariate normal plot;
run;
Check normality by status (the data must be sorted by status first).
proc sort; by status; run;
proc univariate normal plot;
  by status;
run;

18
Results of normality

Box plots: Stress looks normal, but Events is positively skewed, with few
people having high scores. No evident outliers.
Shapiro-Wilk supports the visual conclusions: Stress was not significant
(W = 0.981, ns) and Events was significant (W = 0.935, p < .05),
indicating non-normality, with a small percentage of participants
reporting a large number of life events.
Good distribution of status: 30 in a relationship and 30 not in a
relationship.

19
Normality by Status

Participants not in a relationship had higher means on life events than
those in a relationship. Variability on the events variable was similar
in both status groups.
Participants not in a relationship had higher means on the stress
variable than those in a relationship, providing visual support for
hypothesis 2. There were two outliers in the relationship group, and the
variance appears smaller in the relationship group.

20
Descriptive stats

Means, SD, and correlations.
proc means; run;
proc corr; run;

21
Proc mean and corr results

Both independent variables, Status and Events, had significant
relationships with stress.
Status had a significant negative relationship with stress
(r(58) = -.49, p < .05; 0 = does not cohabit, 1 = does cohabit).
Events had a significant positive relationship with stress
(r(58) = .41, p < .05).
The independent variables were not significantly correlated with one
another (status and events: r(58) = -.12, ns), which indicates that
collinearity is not a problem with these data.
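
The significance of each correlation can be checked by hand from r and its degrees of freedom. The sketch below does this for the three correlations reported on this slide; it is a side calculation with illustrative variable names, not part of the lab program.

```sas
data _null_;
  df = 58;                             /* N - 2 with N = 60 */
  do r = -0.49, 0.41, -0.12;           /* correlations reported on this slide */
    t = r * sqrt(df / (1 - r**2));     /* t statistic for H0: rho = 0 */
    p = 2 * (1 - probt(abs(t), df));   /* two-tailed p value */
    put r= t= p=;
  end;
run;
```

The first two t values (about -4.28 and 3.42) exceed the two-tailed .05 critical value of roughly 2.00 for 58 df, and the third (about -0.92) does not, in line with the results above.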

22
Linearity, Outliers, and Homoscedasticity

Look at plots for heteroscedasticity and nonlinearity.
proc gplot;
  plot stress*event;
run;
proc gplot;
  plot stress*event;
  by status;
run;

23
Graphs

No evidence of heteroscedasticity or nonlinear trends.
There does appear to be a stronger relationship
between stress and events for those participants
who do not live with a partner.

24
Statistical test for Curvilinear data

Create power terms
event2 = event*event;
Standardize variables
proc standard mean=0;
run;
Run regression on linear and quadratic models
proc reg;
  model stress = event;
run;
proc reg;
  model stress = event event2;
run;

25
Results of curvilinear analysis

The linear model is significant and accounts for 17% of the variance in
stress (F(1, 58) = 11.85, p < .05).
The quadratic model is also significant (F(2, 57) = 6.80, p < .05) and
accounts for 19% of the variance, but the b-weight for the quadratic
term is not significant (t(57) = 1.27, ns).
Therefore, the linear model appears to be the best fit for these data.
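
The variance accounted for can be recovered from each reported F and its degrees of freedom, which gives a quick consistency check on the percentages above. This is standard algebra rather than anything shown on the slide.

```latex
\[
R^2 = \frac{df_1 \, F}{df_1 \, F + df_2}
\]
\[
\text{Linear: } \frac{1(11.85)}{1(11.85) + 58} \approx .17
\qquad
\text{Quadratic: } \frac{2(6.80)}{2(6.80) + 57} \approx .19
\]
```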

26
Data fit, outliers and homoscedasticity

Run regression and check for outliers.
proc reg;
  model stress = event status / stb r influence;
  plot p.*r. stress*p.;
run;

27
Results outliers

Predicted by residuals plot showed no apparent
heteroscedasticity. The values appeared to be
randomly scattered around the zero residual line.
The predicted-by-actual plot demonstrates a positive relationship. No
apparent outliers.

28
Results outliers (cont.)

Three outliers were identified with a studentized residual greater than
2: cases 10, 29, and 54.
Leverage > 2(k+1)/N = .10
Cook's D > .2
DFBETAS > .26
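
One way to apply these cutoffs in SAS is to write the diagnostics to a data set and flag the cases that exceed them. This is only a sketch: the data set names d1, diag, and flagged and the variable names rstud, lev, and cd are hypothetical, and it uses externally studentized residuals (RSTUDENT=), which may differ slightly from the statistic used on the slide.

```sas
/* Assumes the lab data are in a data set named d1 (hypothetical name) */
proc reg data=d1 noprint;
  model stress = event status;
  output out=diag rstudent=rstud h=lev cookd=cd;
run;

data flagged;
  set diag;
  case = _n_;                  /* observation number, as used on the slides */
  n = 60;                      /* sample size */
  k = 2;                       /* number of predictors */
  lev_cut = 2*(k + 1)/n;       /* leverage cutoff, equals .10 here */
  /* keep only cases that exceed at least one cutoff */
  if abs(rstud) > 2 or lev > lev_cut or cd > .2;
run;

proc print data=flagged;
  var case rstud lev cd;
run;
```

DFBETAS are not available through the OUTPUT statement; they appear in the listing produced by the INFLUENCE option. The .26 cutoff is consistent with the common 2/sqrt(N) rule, since 2/sqrt(60) is about .26.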

29
Outlier conclusions

There do not appear to be any large problems with outliers. Case 29 did
have some influence, so we will try running the regression analysis
without it at the end to see whether the significance tests change.

30
Collinearity

Analyze regression with collinearity diagnostics included.
proc reg;
  model stress = event status / vif tol collin;
run;

31
Collinearity results

32
Analyze Regression Results

Create interaction term
inter = status*event;
Run regression analysis with and without interaction.
proc reg;
  model stress = status event / stb;
  model stress = status event inter / stb;
run;
Go to flow chart on the next slide.

Y = a + b1X1(groupvar) + b2X2(continvar) + b3X1X2(inter)
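
Because the group variable is coded 0 and 1 in this example, substituting each code into the equation gives one regression line per group. A short derivation using the slide's own symbols:

```latex
\[
\begin{aligned}
X_1 = 0:\quad & Y = a + b_2 X_2 \\
X_1 = 1:\quad & Y = (a + b_1) + (b_2 + b_3) X_2
\end{aligned}
\]
```

So b1 is the difference between the two group intercepts and b3 is the difference between the two group slopes, which is why a significant b3 indicates that the group variable moderates the effect of the continuous variable.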

34
(No Transcript)
35
Regression results

The overall model without the interaction was significant
(F(2,57) = 16.51, p < .05) and accounted for 37% of the variance.
Both life events (β = .36, t(57) = 3.35, p < .05) and status
(β = -.45, t(57) = -4.21, p < .05) were significant predictors of stress.
The overall model with the interaction was also significant
(F(3,56) = 12.98, p < .05) and accounted for 41% of the variance.
The interaction was significant (β = -.40, t(56) = -2.00, p < .05), but
status was no longer significant (β = -.13, t(56) = -.67, ns).
Therefore, the slopes of the two groups differ.
Compute separate regressions for each group.
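
A minimal sketch of the separate regressions, assuming the lab data are in a data set named d1 (a hypothetical name) with the variables stress, event, and status used elsewhere in this lab:

```sas
/* BY-group processing requires the data to be sorted by the BY variable */
proc sort data=d1;
  by status;
run;

/* one regression of stress on life events per status group */
proc reg data=d1;
  by status;
  model stress = event / stb;
run;
```

Each BY group gets its own intercept and slope, so the two slopes can be compared directly with the interaction term from the combined model.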

36
Produce regression on the same graph, correlation
by status, proc means

proc sort; by status; run;
proc means;
  by status;
run;
Run correlation by group
proc corr;
  var stress event;
  by status;
run;
Overlay regressions for two groups
symbol1 color=blue interpol=rl value=none;
symbol2 color=black interpol=rl value=none;
proc gplot;
  plot stress*event=status;
run;

37
Conclusions

For participants who did not live with a partner, the correlation
between stress and life events was not significant (r(28) = .10, ns).
For participants who did live with a partner, the correlation between
stress and life events was significant (r(28) = .62, p < .05).
The graph of the two regression lines illustrates the interaction
effect, with almost no slope for those not living with a partner and a
moderate slope for those living with a partner.
Those participants living with a partner did show lower levels of stress
(M = 18.3, SD = 5.47) than participants who do not live with a partner
(M = 24.3, SD = 6.14), but this difference was not significant when the
interaction was added to the model.

38
Oops, one last thing, we forgot to run the model
again deleting participant 29

Delete participant 29 and rerun the analysis.
if _n_ = 29 then delete;

39
Conclusions after deleting

After deleting that one case, the interaction term is no longer
significant (β = -.32, t(55) = -1.62, ns). You would want to look at that
one value and see if it was an error.
If you feel that the data point is a true score, you should probably
report results before and after.
A big limitation of this example is the small sample size.
If the sample size were larger, the interaction would probably be
significant. There seemed to be a large effect: even after the outlier
was deleted, the correlations for the two groups were .62 and .19.
One might test for a significant difference between the two
correlations, even though this test generally has less power.
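
One common way to compare two independent correlations is Fisher's r-to-z transformation. The sketch below applies it to the two correlations quoted above (.62 and .19). The group sizes are an assumption: 30 per group is the split before participant 29 was removed, and the slides do not say which group that participant belonged to.

```sas
data _null_;
  r1 = 0.62;  n1 = 30;                  /* group sizes are assumed, see note above */
  r2 = 0.19;  n2 = 30;
  z1 = 0.5*log((1 + r1)/(1 - r1));      /* Fisher r-to-z for group 1 */
  z2 = 0.5*log((1 + r2)/(1 - r2));      /* Fisher r-to-z for group 2 */
  se = sqrt(1/(n1 - 3) + 1/(n2 - 3));   /* standard error of z1 - z2 */
  z  = (z1 - z2)/se;
  p  = 2*(1 - probnorm(abs(z)));        /* two-tailed p value */
  put z= p=;
run;
```

With these rounded correlations the z statistic comes out a little under 1.96, whether one group is taken as 30 or 29, which fits the point that this test has relatively low power in a sample this small.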