Lab 14 - PowerPoint PPT Presentation

Slides: 40
Provided by: lisawil5

Transcript and Presenter's Notes

1
Lab 14
  • Curvilinear analysis and detailed example of
    categorical and continuous variables analysis

2
Curvilinear Regression
  • Linear regression assumes that a straight line
    properly represents the relations between each IV
    and the DV.
  • This is not always the case. For example, it has
    been found that the relationship between job
    satisfaction and job tenure (length of time in a
    job) is a curvilinear relationship. Employees
    with low and high tenure have high satisfaction
    and employees with moderate tenure have the
    lowest satisfaction.

3
Example of Curvilinear: Job Satisfaction and Tenure
4
How to test this with SAS
  • What we do in polynomial regression is to conduct
    a sequence of tests. We start with regressing DV
    on IV.
  • Then add IV*IV to the model to see if that accounts
    for a significant amount of additional variance.
  • If it does, we add IV*IV*IV to see if it accounts
    for still more variance. We stop when adding a
    successive power term fails to add significantly
    to the variance accounted for.
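One common way to formalize "a significant amount of additional variance" is the F test for the change in R-square between two nested models. The notation here is assumed for illustration and does not come from the slides: the subscript r marks the reduced model (fewer power terms), f the full model, k the number of predictors in each, and N the number of cases.

```latex
F = \frac{\left(R_f^2 - R_r^2\right) / \left(k_f - k_r\right)}
         {\left(1 - R_f^2\right) / \left(N - k_f - 1\right)},
\qquad
df = \left(k_f - k_r,\; N - k_f - 1\right)
```

When a single power term is added, this F equals the square of that term's t value in the full model, so the t test on the highest-order term gives the same answer.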

5
Example
  • A sports physiologist is interested in the
    effects of diet on strength of athletes. He
    measures strength and the amount of protein
    consumed and he wants to know what the
    relationship is between these two variables.
  • Form quadratic and cubic terms.
  • Run the regressions to test for trends and
    identify the best model.
  • Graph the relations between X and Y for evidence
    of nonlinearity.

6
Example program
  • data d1;
  • input protein strength;
  • * create power terms;
  • protein2 = protein*protein;
  • protein3 = protein2*protein;
  • cards;
  • * regressions with linear, quadratic, and cubic
    models;
  • * linear;
  • proc reg;
  • model strength = protein;
  • plot strength*protein r.*p.;
  • * quadratic;
  • proc reg;
  • model strength = protein protein2;
  • plot r.*p.;
  • * cubic;
  • proc reg;
  • model strength = protein protein2 protein3;
  • plot r.*p.;

7
Output Model 1
  Model: MODEL1
  Dependent Variable: strength

  Analysis of Variance
                               Sum of          Mean
  Source             DF       Squares        Square    F Value    Pr > F
  Model               1         16191         16191     646.01    <.0001
  Error             248    6215.86885      25.06399
  Corrected Total   249         22407

  Root MSE            5.00639    R-Square    0.7226
  Dependent Mean    202.56800    Adj R-Sq    0.7215
  Coeff Var           2.47146

  Parameter Estimates
                      Parameter      Standard
  Variable     DF      Estimate         Error    t Value    Pr > |t|
  Intercept     1     145.33012       2.27414      63.91      <.0001
  protein       1       0.81480       0.03206      25.42      <.0001

8
(No Transcript)
9
(No Transcript)
10
Model 2 Output
  Model: MODEL1
  Dependent Variable: strength

  Analysis of Variance
                               Sum of          Mean
  Source             DF       Squares        Square    F Value    Pr > F
  Model               2         19145    9572.45217     724.73    <.0001
  Error             247    3262.43966      13.20826
  Corrected Total   249         22407

  Root MSE            3.63432    R-Square    0.8544
  Dependent Mean    202.56800    Adj R-Sq    0.8532
  Coeff Var           1.79412

  Parameter Estimates
                      Parameter      Standard
  Variable     DF      Estimate         Error    t Value    Pr > |t|
  Intercept     1      22.06447       8.40699       2.62      0.0092
  protein       1       4.42387       0.24247      18.24      <.0001
  protein2      1      -0.02589       0.00173     -14.95      <.0001
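The gain from adding the quadratic term can be checked by hand from the linear and quadratic ANOVA tables. The following is a minimal Python sketch, not part of the SAS lab itself; it uses only the error sums of squares and error degrees of freedom reported in the output, and the function name `f_change` is made up for illustration.

```python
def f_change(sse_reduced, sse_full, df_error_reduced, df_error_full):
    """F statistic for the terms added when moving from the reduced to the full model."""
    df_added = df_error_reduced - df_error_full
    ms_added = (sse_reduced - sse_full) / df_added
    ms_error_full = sse_full / df_error_full
    return ms_added / ms_error_full

# Error SS and error df copied from the linear and quadratic output
f_quad = f_change(6215.86885, 3262.43966, 248, 247)
t_quad = -14.95  # t value reported for protein2 in the quadratic model

print(round(f_quad, 1))       # F for adding protein2, on (1, 247) df
print(round(t_quad ** 2, 1))  # with one added term, F change matches t squared up to rounding
```

The two printed values should agree closely, which is why the t test on the highest-order term is enough to decide whether that term adds variance.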

11
(No Transcript)
12
Model 3 Output

  Analysis of Variance
                               Sum of          Mean
  Source             DF       Squares        Square    F Value    Pr > F
  Model               3         19145    6381.64432     481.20    <.0001
  Error             246    3262.41104      13.26183
  Corrected Total   249         22407

  Root MSE            3.64168    R-Square    0.8544
  Dependent Mean    202.56800    Adj R-Sq    0.8526
  Coeff Var           1.79776

  Parameter Estimates
                      Parameter      Standard
  Variable     DF      Estimate         Error    t Value    Pr > |t|
  Intercept     1      20.15763      41.90111       0.48      0.6309
  protein       1       4.51006       1.87112       2.41      0.0167
  protein2      1      -0.02716       0.02742      -0.99      0.3230
  protein3      1    0.00000613    0.00013194       0.05      0.9630

13
(No Transcript)
14
Conclusions
  • The b-weight for the highest-order term is
    significant in the quadratic model but not in the
    cubic model. It therefore appears that the
    quadratic equation is the best fit for these data
    ($Y = 22.06 + 4.42X_1 - .026X_1^2$), and
    it accounts for 85% of the variance.
  • Looking back at the graph (strength*protein), it
    appears that the benefit of protein is large at
    first and then levels off, with athletes receiving
    little to no benefit at around the 70 mark.
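To see how the fitted quadratic flattens, the slope of the curve can be computed at a few protein values. This is a rough Python sketch using the coefficients from the quadratic model output; the protein values 50 and 70 are illustrative choices, not values taken from the data, and `predicted_strength` is a hypothetical helper name.

```python
# Coefficients copied from the quadratic model output
b0, b1, b2 = 22.06447, 4.42387, -0.02589

def predicted_strength(protein):
    """Predicted strength from the fitted quadratic equation."""
    return b0 + b1 * protein + b2 * protein ** 2

# The fitted parabola peaks where its slope, b1 + 2*b2*protein, equals zero
peak_protein = -b1 / (2 * b2)

for protein in (50, 70, peak_protein):
    slope = b1 + 2 * b2 * protein
    print(f"protein={protein:6.1f}  predicted={predicted_strength(protein):7.2f}  slope={slope:5.2f}")
```

Under this equation the slope shrinks steadily as protein increases and reaches zero at a protein value of roughly 85, beyond which the fitted curve turns downward.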

15
Detailed Example
  • Events variable is a person's score on a life
    event scale, indicating the number and severity
    of recent life events.
  • Status variable is a measure of whether a person
    cohabits with a partner (a 0 indicates that they
    do not, and a 1 indicates that they do).
  • Stress variable is the score on a self-report
    measure of experienced stress.

16
Hypotheses
  • 1. The more life events, the greater the stress.
  • 2. Those who live with their partner will have
    lower stress than participants who do not live
    with a partner.
  • 3. The relationship between events and stress is
    predicted to be moderated by status.
    Participants who cohabit with a partner are
    predicted to be less stressed by life events than
    those who do not live with a partner.

17
Evaluate Normality
  • Check normality in variables.
  • proc univariate normal plot;
  • Check normality by Status.
  • proc univariate normal plot;
  • by status;

18
Results of normality
  • Box plots: the Stress variable looks normal, but
    Events is positively skewed, with few people
    having high scores. No evident outliers.
  • Shapiro-Wilk supports the visual conclusions: Stress
    was not significant (W = 0.981, ns) and Events
    was significant (W = 0.935, p < .05), indicating
    non-normality, with a small percentage of
    participants reporting a large number of life
    events.
  • Status is evenly distributed: 30 in a relationship
    and 30 not in a relationship.

19
Normality by Status
  • Participants not in a relationship had a higher
    mean on life events than those in a
    relationship. Variability on the events variable
    was similar in the two status groups.
  • Participants not in a relationship had a higher
    mean on the stress variable than those in a
    relationship, providing visual support for
    hypothesis 2. There were two outliers in the
    relationship group, and the variance appears
    smaller in the relationship group.

20
Descriptive stats
  • Means, SD, and correlations.
  • proc means;
  • proc corr;

21
Proc means and corr results
  • Both independent variables, Status and Events,
    had significant relationships with stress.
  • Status had a significant negative relationship
    with stress (r(58) = -.49, p < .05; 0 = does not
    cohabit, 1 = does cohabit).
  • Events had a significant positive relationship
    with stress (r(58) = .41, p < .05).
  • The independent variables were not significantly
    correlated with one another (status and events,
    r(58) = -.12, ns), which indicates that
    collinearity is not a problem with these data.
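The significance of each correlation can be reproduced from r and the sample size alone. This is a minimal Python sketch, separate from the SAS output; it assumes N = 60 (30 participants per status group) and uses 2.00 as an approximate two-tailed .05 critical value for df = 58, taken from a standard t table rather than from the slides.

```python
import math

def t_from_r(r, n):
    """t statistic for testing a Pearson correlation against zero, with df = n - 2."""
    df = n - 2
    return r * math.sqrt(df) / math.sqrt(1 - r ** 2)

N = 60         # 30 participants in each status group
T_CRIT = 2.00  # approximate two-tailed .05 critical t for df = 58 (assumed from a t table)

correlations = {"status-stress": -0.49, "events-stress": 0.41, "status-events": -0.12}
results = {}
for label, r in correlations.items():
    t = t_from_r(r, N)
    results[label] = (t, abs(t) > T_CRIT)
    print(f"{label}: t({N - 2}) = {t:.2f}, significant = {abs(t) > T_CRIT}")
```

The two correlations with stress exceed the critical value, while the correlation between the two predictors does not, in line with the bullets above.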

22
Linearity, Outliers, and Homoscedasticity
  • Look at plots for heteroscedasticity and
    nonlinearity.
  • proc gplot;
  • plot stress*event;
  • proc gplot;
  • plot stress*event;
  • by status;

23
Graphs
  • No evidence of heteroscedasticity or nonlinear
    trends.
  • There does appear to be a stronger relationship
    between stress and events for those participants
    who do not live with a partner.

24
Statistical test for Curvilinear data
  • Create power terms
  • event2 = event*event;
  • Standardize variables
  • proc standard mean=0;
  • Run regression on linear and quadratic models
  • proc reg;
  • model stress = event;
  • proc reg;
  • model stress = event event2;

25
Results of curvilinear analysis
  • The linear model is significant and accounts for
    17% of the variance in stress (F(1,58) = 11.85,
    p < .05).
  • The quadratic model is also significant (F(2,57) =
    6.80, p < .05) and accounts for 19% of the
    variance, but the beta-weight for the quadratic
    term is not significant (t(57) = 1.27, ns).
  • Therefore, the linear model appears to be the
    best fit for these data.
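The variance percentages can be recovered from the reported F tests, which gives a quick consistency check on the write-up. This is a small Python sketch using only the F values and degrees of freedom quoted above; `r_squared_from_f` is an illustrative helper name.

```python
def r_squared_from_f(f_value, df_model, df_error):
    """Recover R-square from an overall F test: R2 = F*df1 / (F*df1 + df2)."""
    return f_value * df_model / (f_value * df_model + df_error)

r2_linear = r_squared_from_f(11.85, 1, 58)
r2_quadratic = r_squared_from_f(6.80, 2, 57)

# F for the single added quadratic term, on (1, 57) df, and the matching t value
f_added = (r2_quadratic - r2_linear) / ((1 - r2_quadratic) / 57)
t_added = f_added ** 0.5

print(round(r2_linear, 2), round(r2_quadratic, 2))
print(round(f_added, 2), round(t_added, 2))
```

The recovered R-square values round to .17 and .19, and the t value implied by the change in R-square is close to the 1.27 reported for the quadratic term.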

26
Data fit, outliers and homoscedasticity
  • Run regression and check for outliers.
  • proc reg;
  • model stress = event status / stb r influence;
  • plot p.*r. stress*p.;

27
Results outliers
  • The predicted-by-residual plot showed no apparent
    heteroscedasticity. The values appeared to be
    randomly scattered around the zero residual line.
  • The predicted-by-actual plot demonstrates a
    positive relationship. No apparent outliers.

28
Results outliers (cont.)
  • Three cases (10, 29, and 54) were identified with
    a studentized residual greater than 2.
  • Leverage > 2(k+1)/N = .10.
  • Cook's D > .2
  • DFBETAS > .26
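The leverage and DFBETAS cutoffs follow from the sample size and the number of predictors. This is a short Python sketch under two assumptions: N = 60 with k = 2 predictors (event and status), and the DFBETAS cutoff comes from the common 2/sqrt(N) rule of thumb, which the slide does not state explicitly.

```python
import math

n_cases = 60      # 30 participants per status group
k_predictors = 2  # event and status

leverage_cutoff = 2 * (k_predictors + 1) / n_cases
dfbetas_cutoff = 2 / math.sqrt(n_cases)  # common rule of thumb, assumed here

print(f"leverage > {leverage_cutoff:.2f}")
print(f"|DFBETAS| > {dfbetas_cutoff:.2f}")
```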

29
Outlier conclusions
  • There do not appear to be any large problems
    with outliers. Case 29 did have some influence, so
    we will try running the regression analysis
    without it at the end and see if there are
    differences in significance.

30
Collinearity
  • Analyze regression with collinearity diagnostics
    included.
  • proc reg;
  • model stress = event status / vif tol collin;

31
Collinearity results

32
Analyze Regression Results
  • Create interaction term
  • inter = status*event;
  • Run regression analysis with and without
    interaction.
  • proc reg;
  • model stress = status event inter / stb;
  • Go to flow chart on the next slide.

33
  • $Y = a + b_1X_1\,(\text{groupvar}) + b_2X_2\,(\text{continvar}) + b_3X_1X_2\,(\text{inter})$
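With a 0/1 coded group variable, substituting each code into the equation gives one regression line per group. The derivation below is a sketch that assumes the coding used in this example (0 = does not cohabit, 1 = cohabits), with X2 as the continuous predictor.

```latex
\begin{aligned}
X_1 = 0:&\quad Y = a + b_2 X_2 \\
X_1 = 1:&\quad Y = (a + b_1) + (b_2 + b_3)\, X_2
\end{aligned}
```

Read this way, b3 is the difference between the two groups' slopes, and b1 is the difference between the groups only at X2 = 0. This is why the test of the group variable can change once the interaction term is in the model.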

34
(No Transcript)
35
Regression results
  • The overall model without the interaction was
    significant (F(2,57) = 16.51, p < .05) and
    accounted for 37% of the variance.
  • Both life events (β = .36, t(57) = 3.35, p < .05)
    and status (β = -.45, t(57) = -4.21, p < .05) were
    significant predictors of stress.
  • The overall model with the interaction was also
    significant (F(3,56) = 12.98, p < .05) and
    accounted for 41% of the variance.
  • The interaction was significant (β = -.40, t(56) =
    -2.00, p < .05), but status was no longer
    significant (β = -.13, t(56) = -.67, ns).
  • Therefore, the slopes of the two groups differ.
    Compute separate regressions for each group.

36
Produce regression on the same graph, correlation
by status, proc means
  • proc sort; by status;
  • proc means;
  • by status;
  • Run correlation by group
  • proc corr;
  • var stress event;
  • by status;
  • Overlay regressions for two groups
  • symbol1 color=blue interpol=rl value=none;
  • symbol2 color=black interpol=rl value=none;
  • proc gplot;
  • plot stress*event=status;

37
Conclusions
  • For participants who did not live with a partner,
    the correlation between stress and life events
    was not significant (r(28) = .10, ns).
  • For participants who did live with a partner, the
    correlation between stress and life events was
    significant (r(28) = .62, p < .05).
  • The graph of the two regression lines illustrates
    the interaction effect, with almost no slope for
    those not living with a partner and a moderate
    slope for those living with a partner.
  • Those participants living with a partner did show
    lower levels of stress (M = 18.3, SD = 5.47) than
    participants who do not live with a partner (M =
    24.3, SD = 6.14), but this difference was not
    significant when the interaction was added to the
    model.

38
Oops, one last thing, we forgot to run the model
again deleting participant 29
  • Delete participant 29 and rerun the analysis.
  • if _n_ = 29 then delete;

39
Conclusions after deleting
  • After deleting that one case, the interaction
    term is no longer significant (β = -.32, t(55) =
    -1.62, ns). You would want to look at that one
    value and see if it was an error.
  • If you feel that the data point is a true score,
    you should probably report results before and
    after.
  • A big limitation of this example is the small
    sample size.
  • If the sample size were larger, the interaction
    would probably be significant. There seemed to
    be a large effect. Even after the outlier was
    deleted, the correlations for the two groups were
    .62 and .19.
  • One might also test for a significant difference
    between the two correlations, even though this
    test generally has less power.
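The test for a difference between two independent correlations is usually done with Fisher's r-to-z transformation. This is a minimal Python sketch, assuming group sizes of 30 and 30 before the deletion and 30 and 29 after it; the standard error is the same whichever group lost the case. The function name `fisher_z_test` is illustrative, and the p value uses the normal approximation.

```python
import math

def fisher_z_test(r1, n1, r2, n2):
    """Two-tailed z test for the difference between two independent correlations."""
    z_diff = math.atanh(r1) - math.atanh(r2)     # Fisher r-to-z transform of each r
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))  # standard error of the difference
    z = z_diff / se
    p = math.erfc(abs(z) / math.sqrt(2))         # two-tailed normal p value
    return z, p

# Before deletion: correlations .62 and .10, 30 participants per group
z_before, p_before = fisher_z_test(0.62, 30, 0.10, 30)

# After deleting one case: correlations .62 and .19, group sizes 30 and 29
z_after, p_after = fisher_z_test(0.62, 30, 0.19, 29)

print(f"before: z = {z_before:.2f}, p = {p_before:.3f}")
print(f"after:  z = {z_after:.2f}, p = {p_after:.3f}")
```

Under these assumptions the difference between the correlations clears the .05 level before the deletion and falls just short of it afterward, which parallels the change in the interaction test.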