Psychology 412 - PowerPoint PPT Presentation

About This Presentation
Title: Psychology 412
Description: The standardized average directional distance from the mean. Research overview: Correlations ... Testing our parameters. Is β0 significantly different from 0? ...
Provided by: adamk2

Transcript and Presenter's Notes

1
Psychology 412
  • Instructor: Adam Kramer
  • Week 3

2
Last week
  • Sampling distributions lead to p-values
  • Dependent data
  • Deriving a correlation
  • The standardized average directional distance
    from the mean
  • Research overview: Correlations

3
Re-deriving correlation
  • Standardized average directional difference
  • You can also standardize first
  • Then, r is the covariance of the standardized data
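To make this concrete, a minimal sketch in Python with synthetic data (numpy assumed; these are not the course data): standardize both variables, then average the products of the z-scores, and you get exactly the Pearson correlation.

```python
import numpy as np

# Synthetic correlated pair (assumed data, for illustration only)
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(size=200)

# Standardize first (population SDs, ddof=0)...
zx = (x - x.mean()) / x.std()
zy = (y - y.mean()) / y.std()
# ...then r is just the average product of the standardized scores
r_from_z = np.mean(zx * zy)

# Same number as the usual Pearson correlation
r_builtin = np.corrcoef(x, y)[0, 1]
```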

4
Using correlations
  • Correlations are standardized twice
  • Once for each variable
  • The result is directionless.
  • Our task, however, is likely prediction
  • If R² tells us the explained variance, which
    variance do we care to explain?

5
Predicting
  • Back to Extraversion and SWL
  • r = .357, t(271) = 6.29, p < .001
  • So, E explains SWL variance
  • So, if we use E, we can do BETTER than guessing
    the mean for SWL

6
Predicting: Our task
7
Predicting: Mean SWL?
8
Predicting: Mean BFIE?
9
Predicting: Means?
10
Predicting: Some diagonal?
11
Predicting: Your choice?
12
Predicting: Why?
13
Predicting: Why?
  • Minimization of squared residuals
  • Residual is the unexplained part
  • In this case, the distance from the line
  • We're trying to predict SWL
  • Squared because it can be differentiated (to find
    the minimum)

14
Residuals
15
Residuals
16
Predicting: Why?
17
Predicting: How?
  • We use the correlation, but un-standardize it
  • Put it back in y units by multiplying by σy
  • Divide by σx, and you're in y units per x
    unit.
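The recipe above is b = r × (sd of y) / (sd of x). A quick check with synthetic stand-ins for BFIE and SWL (assumed data, not the course dataset) shows this is exactly the slope least squares finds:

```python
import numpy as np

# Synthetic stand-ins for BFIE (x) and SWL (y); shapes are assumed
rng = np.random.default_rng(1)
x = rng.normal(loc=25, scale=6, size=273)
y = 0.41 * x + rng.normal(scale=5, size=273)

r = np.corrcoef(x, y)[0, 1]
# Multiply by sd(y) to get into y units, divide by sd(x) for y-per-x units
slope = r * y.std() / x.std()

# Least squares arrives at exactly the same slope
ls_slope = np.polyfit(x, y, 1)[0]
```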

18
Predicting SWL from E
  • SWL = y, E = x
  • (by convention, y is the DV)
  • y units per x unit
  • We expect someone's SWL to increase by 0.41 for
    every one-point increase in BFIE
  • But that's only MORE; where do we start?

19
Predicting SWL from E
  • Start at zero.
  • The regression line hits (µX, µY)
  • It has to, or it wouldn't minimize variance
  • (variance is computed relative to the mean)
  • So, go back µX × slope points, and you're at 0 on
    the x axis, and 40.311 on the y axis
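In other words, the intercept is mean(y) minus slope × mean(x). A sketch with synthetic stand-ins (assumed data, not the course dataset):

```python
import numpy as np

# Synthetic stand-ins for BFIE (x) and SWL (y)
rng = np.random.default_rng(2)
x = rng.normal(loc=25, scale=6, size=273)
y = 40 + 0.41 * x + rng.normal(scale=5, size=273)

slope, intercept = np.polyfit(x, y, 1)
# Step back slope * mean(x) from mean(y) and you land on the intercept,
# which is the same as saying the line passes through (mean(x), mean(y))
intercept_by_hand = y.mean() - slope * x.mean()
```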

20
Modeling
  • SWL = 40.311 + 0.411 × Extraversion
  • We say, "We have a MODEL of SWL, based on
    extraversion."
  • Generally, Y = β0 + β1X
  • The DV is a function of the IV, plus an intercept;
    estimates are represented with indexed betas
    (sometimes alpha is the intercept)

21
Modeling
  • One issue here is that we're not really dealing
    with SWL and E
  • These were just self-report scales
  • There is likely some error involved
  • Y = β0 + β1X + ε
  • The regression model.

22
Testing the model
  • How good is our model?
  • Can it fly like a REAL airplane?
  • We have estimated two parameters
  • So, we have two parameters to test
  • Y = β0 + β1X
  • Is β0 significantly different from 0?
  • Is β1 significantly different from 0?
  • These sound like t-test questions

23
t-tests
  • In general, we've been looking at mean
    differences over standard errors to compute t
  • Really, t-tests are more general: if you can
    estimate a standard error for ANY value, t is
    the ratio.
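That ratio view can be checked directly: divide a regression slope by its standard error and you reproduce the reported t and p. A sketch with synthetic data (scipy assumed; not the course dataset):

```python
import numpy as np
from scipy import stats

# Synthetic data standing in for the course dataset
rng = np.random.default_rng(3)
x = rng.normal(size=100)
y = 0.4 * x + rng.normal(size=100)

res = stats.linregress(x, y)

# t is just the estimate divided by its standard error...
t_slope = res.slope / res.stderr
# ...and the p-value comes from a t distribution with n - 2 df
p_slope = 2 * stats.t.sf(abs(t_slope), df=len(x) - 2)
```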

24
Standard errors
  • The standard error of the model has to do with
    the deviation from the line
  • Y given X
  • Numerator is deviations from the line
  • Denominator is n − 2 because we estimated two
    parameters (the slope and the intercept)
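In symbols (standard textbook formulas, not shown on the slide), the standard error of the estimate divides squared deviations from the line by n − 2, and the intercept and slope each get a different standard error built from it, which is why the next slide notes they are not the same:

```latex
s_{Y|X} = \sqrt{\frac{\sum_i (y_i - \hat{y}_i)^2}{n-2}}, \qquad
SE(b_1) = \frac{s_{Y|X}}{\sqrt{\sum_i (x_i - \bar{x})^2}}, \qquad
SE(b_0) = s_{Y|X}\,\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum_i (x_i - \bar{x})^2}}
```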

25
Standard errors
  • Standard error of the intercept and slope are not
    the same.
  • wtf.

26
Testing our parameters
  • Is β0 significantly different from 0?
  • When BFIE is zero, is SWL significantly different
    from zero?

27
Testing our parameters
  • Is β0 significantly different from 0?
  • When BFIE is zero, is SWL significantly different
    from zero?
  • t(271) = 10.44, p < .001
  • YES. Even total introverts have some modicum of
    satisfaction with their life.

28
Testing our parameters
  • Is β1 significantly different from 0?
  • Does BFIE help us predict SWL better than the
    mean of SWL?

29
Testing our parameters
  • Is β1 significantly different from 0?
  • Does BFIE help us predict SWL better than the
    mean of SWL?
  • t(271) = 6.32, p < .001
  • YES. BFIE helps us predict SWL: every point of
    extraversion predicts 0.411 SWL.
  • Note that the t is the same as for r. Why?

30
Reporting our regression
  • Coefficients:
  •              Estimate  Std. Error  t value  Pr(>|t|)
  • (Intercept)  40.34084     3.97257   10.155
  • bfie          0.40994     0.06516    6.291  1.26e-09
  • Computers do the math for us!
  • An appropriately reported regression:
  • A line for each estimated parameter
  • List estimate, SE, t, p

31
Intercepts
  • Even total introverts have some modicum of
    satisfaction with their life.
  • Uhh, duh. So what?
  • The intercept tests Y when X is zero
  • If we change Y (SWL), we change the intercept
  • What is a MEANINGFUL level to test Y against?

32
Intercepts
  • If we first center our dataset, the units remain
    the same, but 0 becomes synonymous with the
    mean.
  • So the intercept would test: Is SWL significantly
    above or below the mean when BFIE is zero?
  • And we could center BFIE:
  • Is SWL significantly above or below the mean for
    the AVERAGE BFIE?
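Centering the predictor can be sketched in a few lines (synthetic stand-ins for BFIE and SWL, not the course data): the slope is untouched, but the intercept becomes the predicted Y at the average x, which equals the mean of y.

```python
import numpy as np

# Synthetic stand-ins for BFIE (x) and SWL (y)
rng = np.random.default_rng(4)
x = rng.normal(loc=25, scale=6, size=273)
y = 40 + 0.41 * x + rng.normal(scale=5, size=273)

slope, intercept = np.polyfit(x, y, 1)
# Center the predictor: same units, but 0 now means "average x"
slope_c, intercept_c = np.polyfit(x - x.mean(), y, 1)
# slope_c equals slope; intercept_c now equals mean(y), so its t-test
# asks "is Y at average x different from Y's mean?"
```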

33
Intercepts
This is where lecture ended on Tuesday.
  •                 Estimate  Std. Error  t value  Pr(>|t|)
  • (Intercept)     -0.03148     1.20244   -0.026      0.98
  • center(kbfie)    0.40994     0.06516    6.291  1.26e-09
  • The intercept is no longer significant: at mean
    extraversion level, SWL is no different from its
    mean
  • Note that the estimate, SE, t are the same for
    BFIE
  • Why ask? Because we can.

34
Intercepts
  • Centering is not the only way
  • Test whether, when X = 0, Y = 50? 100?
  • Add 100 to the Y variable

35
Intercepts
  • Centering is not the only way
  • Test whether, when X = 0, Y = 50? 100?
  • Add 100 to the Y variable
  •              Estimate  Std. Error  t value  Pr(>|t|)
  • (Intercept)  40.34084     3.97257   10.155
  • bfie          0.40994     0.06516    6.291  1.26e-09
  • versus
  • (Intercept) 140.34084     3.97257   35.327
  • bfie          0.40994     0.06516    6.291  1.26e-09
  • The estimate went up by 100; t by 25!
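A quick sketch of this shift trick with synthetic stand-ins (assumed data, not the course dataset): adding a constant to Y moves only the intercept, so its t-test now compares Y at X = 0 against the shifted baseline.

```python
import numpy as np

# Synthetic stand-ins for BFIE (x) and SWL (y)
rng = np.random.default_rng(5)
x = rng.normal(loc=25, scale=6, size=273)
y = 40 + 0.41 * x + rng.normal(scale=5, size=273)

slope, intercept = np.polyfit(x, y, 1)
# Shift Y up by a constant: the slope (and its SE, t, p) cannot change,
# because every point moves up by the same amount
slope2, intercept2 = np.polyfit(x, y + 100, 1)
```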

36
Slopes
  • "Every point of extraversion predicts 0.411
    SWL."
  • Once again, so what does 0.411 mean?
  • The slope converts X units to Y units, but X and
    Y may not have meaningful units

37
Slopes
  • The meaning of a unit
  • A one-point increase is 1/7 of the scale
  • But if everyone's a 4, 5, or 6, that's a lot
  • Meaningful units
  • 0.411 contented sighs per day
  • Percentage points on a test
  • Standard deviations
  • Addresses the variability within the variable;
    scale doesn't matter as much

38
Slopes
  • Standardization can help
  • If we standardized SWL, we would see how many
    standard deviations in SWL are predicted by
    unit-changes in E
  • If we standardized E, we would see how much a
    standard deviation's worth of difference in E
    predicted an increase in SWL units
  • If we standardize both, we would see how many SD
    in SWL are predicted by a 1-SD increase in E
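The third case is worth verifying by hand: standardize both variables and the fitted slope is numerically the correlation. A sketch with synthetic data (assumed, not the course dataset):

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(size=273)            # stand-in for E
y = 0.4 * x + rng.normal(size=273)  # stand-in for SWL

def z(v):
    # z-score: subtract the mean, divide by the SD
    return (v - v.mean()) / v.std(ddof=1)

# Slope of the regression after standardizing BOTH variables
beta_std = np.polyfit(z(x), z(y), 1)[0]
r = np.corrcoef(x, y)[0, 1]
# beta_std and r are the same number: the standardized slope IS r
```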

39
Slopes
  •               Estimate  Std. Error  t value  Pr(>|t|)
  • (Intercept)  -0.001483    0.056640   -0.026      0.98
  • scale(kbfie)  0.356068    0.056598    6.291  1.26e-09
  • As BFIE goes up by one SD, we expect SWL to go up
    by 0.356 SD
  • SE, t, p the same
  • Same centered intercept
  • The standardized slope is the correlation!

40
Residuals
  • Remember those residuals? You can plot them.
  • "EPDAA" doesn't have the ring of "EDA"
  • Plot a residual for a data point against the
    point's predicted value
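Computing those residuals is one subtraction (sketch with synthetic data; plotting call omitted so the snippet stays self-contained): because the model includes an intercept, the residuals average zero and are uncorrelated with the fitted values, so a residual-vs-fitted scatterplot should look like a shapeless cloud.

```python
import numpy as np

# Synthetic data (assumed, not the course dataset)
rng = np.random.default_rng(7)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(size=200)

slope, intercept = np.polyfit(x, y, 1)
fitted = intercept + slope * x
# Residual = observed minus predicted: the part the line leaves unexplained
residuals = y - fitted
```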

41
Residuals
  • Looks like a mess!
  • That's good.
  • It's as if we took the regression line and
    rotated our plot so that it's flat.
  • Note the scale differences
  • Range is

42
Residuals
  • New data set: Airplane load factor and stall
    speed
  • LF: The amount of weight placed on a maneuvering
    aircraft relative to its ground weight
  • Stall speed: The speed at which the wing loses
    its lift

43
Residuals
  • OK, do some EDA.
  • Hmm, it's definitely line-like... maybe kinda curvy
  • May as well try a regression

44
Residuals
  • OK, do some EDA.
  • Hmm, it's definitely line-like... maybe kinda curvy
  • May as well try a regression
  • Ooh, nice! Good fit!
  • Not perfect, but good.

45
Residuals
  •              Estimate  Std. Error  t value  Pr(>|t|)
  • (Intercept)   35.2289      1.5102    23.33
  • LF            17.9096      0.4609    38.85
  • Looks good!
  • Each additional point in load factor predicts
    almost 18 MPH higher stall speed
  • So, clearly, if you're gonna maneuver, y'gotta
    fly faster.
  • And my God, R² = .9655! Almost perfect!
  • Well, it is an engineering issue, not psych.

46
Residuals
  • That does NOT look messy.
  • The deviation from the line depends on what we
    predict
  • The issue is LINES
  • We shot a line through a curve; there's still a
    curve.

47
Residuals
  • Our line explains variance
  • But our errors depend on one of our variables
  • We are clearly missing something.

48
Residuals
  • When we plot residuals, we're looking for
    patterns. A pattern indicates that our regression
    is better for predicting SOME data points than
    OTHERS.
  • We're still explaining variance, we're still
    making better estimates than nothing, but there
    is something we are missing.
  • If we can articulate how the error relates to
    where on the line we are...

49
Residuals
  • What to do?
  • Best option: Explain it away. WHY is that curve
    there?
  • Probably because a line is insufficient
  • Another option: Accept it.
  • We have a significant LINEAR relationship; it
    explains 95% of the variance. There is something
    left, and we don't know what
  • Is our research question answered?

50
Explain it away.
  • In many cases, we can transform the data so that
    it no longer irks us
  • Experience teaches us what graphs look like: log,
    sqrt, square, sine

51
Explain it away.
  • X = LF on left, X = sqrt(LF) on right
  •              Estimate   Std. Error    t value  Pr(>|t|)
  • (Intercept)  6.748e-04   1.156e-03      0.583     0.562
  • sqrt(lfLF)   5.400e+01   6.851e-04  78823.615
  • R² = 1 (physics, not psychology)
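The transform can be sketched with hypothetical stall-speed data built from the physics relation Vs = Vs0 × sqrt(load factor); the baseline of 54 mph is an assumption chosen to echo the 5.400e+01 coefficient above, not the course data:

```python
import numpy as np

# Hypothetical data: stall speed grows with the square root of load factor
lf = np.linspace(1, 9, 50)
stall = 54 * np.sqrt(lf)

# A straight line through the raw data fits well but leaves curved residuals
slope_raw, int_raw = np.polyfit(lf, stall, 1)
resid_raw = stall - (int_raw + slope_raw * lf)

# Regressing on sqrt(LF) instead makes the relationship exactly linear
slope_sqrt, int_sqrt = np.polyfit(np.sqrt(lf), stall, 1)
resid_sqrt = stall - (int_sqrt + slope_sqrt * np.sqrt(lf))
```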

52
Assumption 1: Linearity
  • Regression assumes a linear relationship between
    X and Y
  • Why would we assume this?

53
Assumption 1: Linearity
  • Predicting beyond the data
  • In our example, the further from the line we get,
    the worse our line performs
  • Prediction accuracy is not consistent
  • But do we need to predict there?
  • A clearly flawed model
  • A pattern in the residuals means we're missing
    something
  • But do we care?

54
Assumption 1: Linearity
  • R², β, t are inaccurate
  • They do NOT estimate parameters
  • They are too small when curving makes them err,
    to the extent that our data does not represent
    the population.
  • They still describe the data set, and they still
    test the linear component.

55
Assumption 2: Normality
  • Regression assumes that errors are normally
    distributed along the regression line
  • NOT that X or Y is normal!

56
Assumption 2: Normality
57
Assumption 2: Normality
  • The distance here from the line is constant, not
    normal
  • In this case, there are two effects.
  • Once again, the model is incomplete
  • What is the grouping?

58
Assumption 2: Normality
  • However, the regression line still represents the
    data
  • Explanation is mediocre
  • Prediction is bad
  • But there is on average a linear effect.

59
Assumption 2: Normality
  • What to do about it?
  • If the errors are NOT normally distributed, what
    does that mean?

60
Assumption 2: Normality
  • What to do about it?
  • If the errors are NOT normally distributed, what
    does that mean?
  • Variation in X or Y is due to something else
  • Neither X nor Y is measured precisely; there may
    be multiple causes of a certain score
  • Random causes are normally distributed in
    aggregate, so nonrandom effects indicate another
    cause of the report
  • So, seek the other cause

61
Assumption 3: Homoscedasticity
  • A word too big to fit on one title line
  • Scedasticity refers to the extent to which
    variance is consistent.

62
Assumption 3: Homoscedasticity
  • Heteroscedastic plots make us worry about our
    slope
  • Wide-varianced points have more influence

63
Assumption 3: Homoscedasticity
  • Solutions:
  • Transform the data, or give the outliers less
    influence. Don't end up like this graph

64
Assumption 3: Homoscedasticity
  • But if you do, same issues
  • Does this answer your question?
  • The line still summarizes the data linearly

65
Assumption 4: Independence
  • Every X has a corresponding Y
  • But the pairs should be independent.
  • Each x and y value contributes equally to the x
    and y mean; each value contributes equally to the
    x and y SD
  • So if you can predict one x from another, those
    two x's are closer together than the others
  • So the variance is underestimated.

66
Assumption 4: Independence
  • Example: Parent/child height
  • Hypothesis: You can predict a child's height from
    a biological parent's
  • Method: Get height for mother, father, children
  • One data point for each parent/child pair
  • How does this violate independence?

67
Assumption 4: Independence
68
Assumption 4: Independence
69
Assumption 4: Independence
  • But how do you compute the means and covariance?
  • How many observations do you have?
  • How many degrees of freedom?
  • There are ways to handle data like these, but
    simple regression is not the way.

70
Assumption 5: Independence?
  • ...of errors.
  • How far off from the line we are for one (x,y)
    data point should not predict how far from the
    line we are for any other data point.

71
Assumption 5: Independence?
  • Example: Happiness over the month.

72
Assumption 5: Independence?
  • Example: Happiness over the month.

73
Assumption 5: Independence?
  • Oh, right, weekends.

74
Assumption 5: Independence?
  • t(29) = 0.06, n.s. What does this tell us?

75
Assumption 5: Independence?
  • What does this tell us?
  • There's more to the story than just the
    regression
  • Overall, happiness does not change over the
    course of the month
  • The correlation is negative but insignificant.

76
Assumptions
  • ...underlie conclusions.
  • You make assumptions in your interpretation
  • The math is deterministic
  • The regression line is always the best fit to the
    data
  • But whether that's a good enough fit is your
    decision

77
Assumptions
  • Regression estimates a line; it's the best line
  • If a line is not the way, find another way
  • If there's more than a line in your data, that is
    a good thing. Your data answers questions you
    didn't ask!
  • Oh, right, questions!
  • Does fitting a line to the data answer your
    question?