Transcript and Presenter's Notes

Title: Simple Regression


1
Simple Regression
  • Regression is a way to measure the relationship
    between two variables
  • Especially useful when equality doesn't make
    sense
  • SAT, QPR might be related, but certainly cannot
    be equal

2
Simple Regression
  • Simple regression model
  • Y = b0 + b1 X + e
  • e has a normal distribution, mean 0, SD unknown
  • Need to estimate b0, b1

3
Simple Regression
  • Use the principle of least squares
  • Find b0, b1 to minimize
  • Σ (y - (b0 + b1 x))²
  • Residuals are the differences between observed Y
    and our model

4
Simple Regression
  • Can use calculus to find formulas for b0, b1, but
    it is built into MATLAB
  • b0 = yavg - b1·xavg
  • So (xavg, yavg) is on the LS line
  • Really should use point-slope rather than
    slope-intercept
  • b1 = Σ(x - xavg)(y - yavg) / Σ(x - xavg)²
  • Denominator is a sum of squares
  • Numerator is a sum of products, a "generalized" sum
    of squares (see the sketch below)
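  A sketch of these formulas in MATLAB, assuming x and y are column
  vectors of the same length:

    xc = x - mean(x);             % centered x
    yc = y - mean(y);             % centered y
    b1 = (xc'*yc) / (xc'*xc);     % slope: sum of products over sum of squares
    b0 = mean(y) - b1*mean(x);    % intercept: the line passes through (xavg, yavg)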

5
Simple Regression
  • Consider Fig 12.05, p 536
  • >> x = d(:,1); y = d(:,2);
  • >> plot(x,y,'o')
  • >> xlabel('Production')
  • >> ylabel('Usage')
  • >> grid

6
Simple Regression
  • (Figure: scatter plot of Usage vs. Production)
7
Simple Regression
  • Need to construct a col of 1s
  • >> b = [ones(length(x),1) x]\y
  •        0.4090
  •        0.4988
  • b0 = 0.4090 (intercept)
  • b1 = 0.4988 (slope)

8
Simple Regression
  • >> yh = [ones(length(x),1) x]*b;
  • >> plot(x,y,'o',x,yh,'-')
  • >> grid

9
Simple Regression
  • (Figure: scatter plot with the fitted least-squares line)
10
Simple Regression
  • Exercises
  • 1. Calculate the slope and intercept for Fig 12.08, p. 538
  • 2. Repeat for Fig 12.11, p. 540

11
Simple Regression
  • In Fig 12.13, p 540, Run is a function of pushups
  • What if we had done it the other way around?
  • NOTE: we generally do NOT get the reciprocal slope
  • VERY IMPORTANT FACT
  • Comes from the fact that LS minimizes differences
    in a particular direction
  • In the direction of the variable being fitted
  • That direction changes between the two models, so
    we get different results

12
Simple Regression
  • Note that in the text, he gives a name to the SS
    that we have minimized
  • SSE = Σ (y - (b0 + b1 x))²
  •     = sum((y-yh).^2)
  •     = (y-yh)'*(y-yh)  (if y is a column vector)
  • SSE = 0.2991 for Fig 12.05
  • This is the same notation as in ANOVA
  • Return to it later

13
Simple Regression
  • Some other important sums of squares
  • SXY = Σ (x - xavg)(y - yavg)
  • SXX = Σ (x - xavg)²
  • SYY = Σ (y - yavg)²
  • Each can be calculated with expressions like XX'*XX
    for centered vectors (see the sketch below)
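  A sketch in MATLAB, assuming x and y are the column vectors from the
  original problem:

    xc = x - mean(x);     % centered x
    yc = y - mean(y);     % centered y
    sxy = xc'*yc;         % sum of products
    sxx = xc'*xc;         % sum of squares for x
    syy = yc'*yc;         % sum of squares for y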

14
Simple Regression
  • Most inferences are on the slope
  • This tells us about the relation between X and Y
  • b1 is a linear combination of our observations
  • Observations are assumed to be normal
  • So b1 is also normal
  • Unknown SD
  • SD(b1) = SD(e) / √SXX
  • Est. SD(e) = √(SSE/(N-2))

15
Simple Regression
  • Since we have used an estimate of the SD, we use
    the t-distribution
  • df = N - 2 (the divisor in the estimate)

16
Simple Regression
  • Return to original problem
  • x = d(:,1); y = d(:,2); b = [ones(length(x),1) x]\y;
  • yh = [ones(length(x),1) x]*b;
  • sse = (y-yh)'*(y-yh);
  • df = length(y)-2;
  • rmse = sqrt(sse/df);
  • sxx = (x-mean(x))'*(x-mean(x));
  • pv = tprob(0,rmse/sqrt(sxx),df,b(2),99)
  • 4.0879e-005 (one-sided)

17
Simple Regression
  • 95% confidence interval
  • >> bu = bisect(@(d) tprob(d,rmse/sqrt(sxx),
    df,-99,b(2)),.025,-9,9,.00001)
  • 0.6734
  • >> bl = bisect(@(d) tprob(d,rmse/sqrt(sxx),
    df,b(2),99),.025,-9,9,.00001)
  • 0.32426
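  bisect and tprob here are course-supplied functions. If the Statistics
  Toolbox is available, a sketch of the same interval using the standard
  tinv function:

    se = rmse/sqrt(sxx);                     % estimated SD of the slope
    ci = b(2) + tinv([0.025 0.975], df)*se   % should match the bisect results above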

18
Simple Regression
  • Exercises
  • 1. Calculate one- and two-sided p-values and a 95%
    CI for the slope for Fig 12.08, p. 538
  • 2. Repeat for Fig 12.11, p. 540

19
Simple Regression
  • ANOVA table for regression
  • SSE = (y-yh)'*(y-yh)
  • df = N - 2
  • Instead of SSTr, we have SSR
  • SSR = (yavg-yh)'*(yavg-yh)
  • df = 1
  • SSTotal is still (y-yavg)'*(y-yavg)
  • df = N - 1
  • If the PV for F is small, we reject H0: slope = 0 vs.
    Ha: slope ≠ 0

20
Simple Regression
  • For the original problem
  • >> yavg = mean(y);
  • >> sst = (y-yavg)'*(y-yavg)
  • 1.5115
  • >> ssr = (yh-yavg)'*(yh-yavg)
  • 1.2124
  • >> f = (ssr/1)/(sse/df)
  • 40.5330
  • >> fprob(1,df,f,999)
  • 8.1759e-005
  • >> pv*2
  • 8.1759e-005 (two-sided t test)

21
Simple Regression
  • ANOVA gives exactly the same PV as two-sided t
    test
  • How can ANOVA work this problem and the problems
    in the previous chapter?

22
Simple Regression
  • Common thread is evaluating a model
  • In regression, the line is the model
  • In ANOVA, our model is that each group has its
    own mean (which we estimate by the avgs)
  • If we use avg_j for yh, then the formulas match up
  • This is why SSTr has the factor of n_j: the model
    is the same for each value in the group

23
Simple Regression
  • Exercises
  • 1. Calculate the ANOVA for Fig 12.08, p. 538. Compare
    the ANOVA PV with the (two-sided) PV for the slope
  • 2. Repeat for Fig 12.11, p. 540

24
R2
  • In ANOVA table, we want SSR to be large and SSE
    to be small
  • They sum to SST, so we could calculate what
    percent of SST is in SSR and what percent is in
    SSE
  • R² = SSR/SST
  • Coefficient of determination
  • (Index of determination)
  • R² large -> points are near the line
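  Using the sums of squares computed for the original problem in the
  ANOVA slide above, a one-line check:

    r2 = ssr/sst        % about 0.80 for the original problem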

25
R2
  • Recall that if we swap X and Y, then the slopes
    are NOT reciprocals of each other
  • If we multiply the two slopes together, we can
    see how different this is from 1
  • In our problem, the original slope = 0.4988
  • Swapping X and Y, the slope = 1.6080
  • Product = 0.8021
  • = R², amazingly enough (see the sketch below)
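  A quick sketch of the check, using the same x and y as before:

    bxy = [ones(length(x),1) x]\y;   % usual fit, slope about 0.4988
    byx = [ones(length(y),1) y]\x;   % roles swapped, slope about 1.6080
    bxy(2)*byx(2)                    % about 0.8021, the same as R^2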

26
Correlation
  • Correlation can be thought of as a general notion
    of slope that does not depend on which way we use
    X, Y
  • See p. 585: r = SXY/√(SXX·SYY)
  • Note: no units
  • Also, symmetric in X, Y
  • Also, -1 ≤ r ≤ 1
  • (Cauchy-Schwarz inequality)

27
Correlation
  • r is how many SDs one variable changes when the
    other variable changes by 1 SD
  • SD(r) = sqrt((1-r²)/df)
  • Test H0: r = 0 vs. Ha: r ≠ 0
  • >> r = sxy/sqrt(sxx*syy)
  • 0.8956
  • >> tprob(0,sqrt((1-r^2)/df),df,r,99)*2
  • 8.1759e-005
  • Same PV as everything else!

28
Correlation
  • Also note that correlation² = R²
  • Could be used to show that the correlation is
    always ≤ 1 in absolute value

29
Indicators
  • What would happen if we didn't have an X
    variable?
  • Recall that we always add a col of 1s
  • Suppose that's the only variable
  • X = ones(length(y),1)
  • B = X\y
  • B turns out to be the avg of Y
  • So averages are the least squares estimate of a
    constant
  • (Also turns out that SD(B) = SD(y)/√N)
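  A quick check, assuming y is the column vector from the original
  problem:

    X = ones(length(y),1);    % intercept-only model
    B = X\y;
    [B mean(y)]               % the two values agree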

30
Indicators
  • Usually think of regression as relating 2
    quantitative variables
  • Consider the case where X is an indicator
    variable (and we include the col of 1s)
  • X = 1 if the obs is from one group and X = 0 if the
    obs is from the other group (see the sketch below)
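  A minimal sketch of that fit (the variable name hi is hypothetical):
  y holds all the responses and hi is 1 for observations in one group,
  0 for the other.

    Xind = [ones(length(y),1) hi];   % col of 1s plus the indicator (hi is hypothetical)
    b = Xind\y                       % b(1) = avg of the 0 group, b(2) = difference of group avgs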

31
Indicators
  • Consider Fig9.42, p 413 (problem 9.3.20)
  • Let X1 for High level (2nd col)
  • B 87.4375
  • 1.9309
  • Sdslope 1.2892
  • F2.2434
  • Dfe33

32
Indicators
  • Compare to sumstats (add 1 to X)
  •   n        avg      SD
  •   16.0000  87.4375  2.7318
  •   19.0000  89.3684  4.4995
  • (First row is the Low level, 2nd row is the High level)
  • Next, calculate Sp·√(1/n1 + 1/n2) = 1.2892
  • And Av2 - Av1 = 1.9309

33
Indicators
  • Comparing these, we find
  • b0 = avg where X = 0
  • b1 = diff of avgs (X = 1 minus X = 0)
  • SD(slope) = Sp·√(1/n1 + 1/n2)
  • dfe = n1 - 1 + n2 - 1 = df for the t test
  • P-values are the same

34
Indicators
  • This gives us THREE ways to do a 2-sample t test
  • 1. Do t test
  • 2. Do ANOVA
  • 3. Do regression with indicator variables

35
Indicators
  • Why do we care?
  • Next chapter, we will be able to use multiple
    indicator variables
  • Will solve NEW problems for us then (as well as
    old ones)

36
Other than LS
  • The idea of minimizing sums of squares is not a
    natural one
  • We certainly want to minimize a non-negative
    quantity
  • Seems like we ought to treat negative e's the
    same as positive e's

37
Other than LS
  • One reason for LS is that it leads to known
    distributions
  • In a certain way, LS leads to the normal distribution
  • LS also leads to averages and sums, which will be
    approximately normal

38
Other than LS
  • MAD = Minimum Absolute Deviation
  • We could fit the model by minimizing
  • Σ |y - yh|
  • The solution to this problem is to try a bunch of
    lines and see which one works best
  • Made easier by the fact that the MAD solution must
    pass through 2 of our data points (see the sketch below)
  • Hard to do p-values, etc.
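  A brute-force sketch of that idea (not from the slides): fit the line
  through every pair of data points and keep the one with the smallest
  sum of absolute deviations.

    best = Inf;
    n = length(x);
    for i = 1:n-1
      for j = i+1:n
        if x(j) == x(i), continue, end      % skip vertical lines
        b1 = (y(j)-y(i)) / (x(j)-x(i));     % slope through points i and j
        b0 = y(i) - b1*x(i);                % intercept through point i
        s = sum(abs(y - (b0 + b1*x)));      % sum of absolute deviations
        if s < best
          best = s; bmad = [b0; b1];        % keep the best line so far
        end
      end
    end
    bmad                                    % MAD intercept and slope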

39
Other than LS
  • Robust regression
  • We can think of LS as minimizing
  • Σ (y - yh)·w(y - yh)
  • where w() is a weighting function
  • For LS, w(x) = x
  • This says that the weight we give to a deviation
    constantly increases as the deviation increases
40
Other than LS
  • This results in a fit that really tries to avoid
    large deviations
  • But sometimes, we choose to ignore large
    deviations
  • Think they might be caused by other things
  • This suggests that, at some point, w() should
    stop increasing

41
Other than LS
  • We could either use a w() that becomes constant or
    a w() that actually asymptotes back to 0
  • Then we need to decide how large a deviation
    should be before we start to give it less weight
  • And the size of the deviation depends on the
    model we fit
  • Chicken and egg

42
Other than LS
  • On the other hand, we see that w(x) = x leads to LS
  • Also, w(x) = sign(x) leads to MAD
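  A sketch of these weight functions, plus a clipped one that stops
  increasing beyond a cutoff (the cutoff c is an illustrative choice,
  not from the slides):

    w_ls   = @(e) e;                    % least squares: weight grows with the deviation
    w_mad  = @(e) sign(e);              % MAD: every deviation gets the same weight
    c = 1.5;                            % hypothetical cutoff, in residual units
    w_clip = @(e) max(min(e, c), -c);   % robust: weight stops increasing beyond +/- c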