Title: Regression Lecture 8
1. Regression Lecture 8
2. Aims for Today - Regression
- Drawing lines on scatterplots
- The regression line: predicting values
- Correlation
- Rank-based correlation
- Break/Handout
- Examples by Dan
-  Chile and maybe being hit by a car
- How-tos
4. Scatter Plot
- Plotting 2 continuous-ish variables 
- Exploring their association 
- One of the most used and most useful techniques 
 in science.
6. Several ways to make one in SPSS.
7. The default shows what appears to be a negative relationship, but the graphs can be improved.
14. Graphing 3 Variables (London et al., 2007)
-  4- to 9-year-olds
-  2-week recall
-  10-month recall
15. Can you see the 8s?
18. Is this the right approach?
- Fitting a straight line (a linear relationship)
19. Finding the Regression Line
- Very general procedure (easily expanded) 
- Simple linear regression 
- Easiest way is just to draw a straight line 
 yourself
- A more formal method has some value 
- The model is $y_i = \beta_0 + \beta_1 x_i + e_i$, and the method finds the $\beta_0$ and $\beta_1$ which minimize $\sum e_i^2$
- Least squares is also used in the t test and the mean
-  (least absolute value is used for the median)
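To make the least squares idea on this slide concrete, here is a minimal sketch in R. The data are simulated (the intercept 2, slope 0.5, and the names x, y, and sse are made up for illustration and are not from the lecture). It compares the standard closed-form estimates for simple linear regression with what lm returns.

```r
# Simulated data: the "true" intercept (2) and slope (0.5) are arbitrary choices
set.seed(1)
x <- runif(50, 0, 10)
y <- 2 + 0.5 * x + rnorm(50)

# Closed-form least squares estimates for simple linear regression
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# Sum of squared residuals for any candidate line
sse <- function(beta0, beta1) sum((y - beta0 - beta1 * x)^2)

# lm should give the same intercept and slope
fit <- lm(y ~ x)
c(b0 = b0, b1 = b1)
coef(fit)

# A line with a slightly different slope has a larger sum of squares
sse(b0, b1) < sse(b0, b1 + 0.1)
```

Drawing a line by eye amounts to picking some other beta0 and beta1; sse() gives one way to compare such a line against the least squares one.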
21.
- Minimizing the squared residuals: $\min \sum e_i^2$
- Is least squares regression better than eyeballing it?
- Are there better formal methods?
22. Do you need to know the equations for $\beta_0$ and $\beta_1$? Not really. Would they be worth seeing once? Probably.
(Just look at them, don't write.)
23. Regressions are sometimes used to predict values (data based on Tytherleigh, 2002)
24. Running a regression in R
lm stands for Linear Model
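A small sketch of what running lm and predicting a value might look like. The data here are simulated stand-ins, NOT the Tytherleigh (2002) data, and the variable names hours and score are hypothetical.

```r
# Simulated stand-in data; these are NOT the data shown in the lecture
set.seed(2)
hours <- runif(40, 0, 20)                      # hypothetical predictor
score <- 30 + 2 * hours + rnorm(40, sd = 5)    # hypothetical response
dat <- data.frame(hours, score)

# The formula reads "response ~ predictor"
fit <- lm(score ~ hours, data = dat)
summary(fit)   # coefficients, r-squared, significance tests

# Predicted value for a new case with hours = 10
new_case <- data.frame(hours = 10)
pred <- predict(fit, newdata = new_case)
pred

# The same prediction by hand from the fitted equation
by_hand <- coef(fit)[["(Intercept)"]] + coef(fit)[["hours"]] * 10
by_hand
```

predict() needs the new data in a data frame whose column name matches the predictor used in the formula.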
26. $r^2$ or adjusted $r^2$, and r or R
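How these quantities relate can be checked on simulated data (all numbers below are made up). With one predictor, the r-squared reported by summary() should equal the squared correlation, and the adjusted version follows the usual formula with p = 1 predictor.

```r
# Simulated data for illustration only
set.seed(3)
x <- rnorm(30)
y <- 1 + 0.8 * x + rnorm(30)

fit <- lm(y ~ x)
s <- summary(fit)

r <- cor(x, y)   # Pearson correlation
n <- length(y)
p <- 1           # number of predictors in this model

# With a single predictor, r squared equals the model's R squared
r^2
s$r.squared

# Adjusted R squared penalizes for the number of predictors
adj_by_hand <- 1 - (1 - s$r.squared) * (n - 1) / (n - p - 1)
adj_by_hand
s$adj.r.squared
```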
28. Assessing the Fit: The Correlation
29. Equation bit
- Top part determines whether positive or negative. If $x_i$ and $y_i$ are on the same side of their means, positive; otherwise negative.
- If as one goes up, the other goes up, positive.
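The "top part" idea can be sketched with the standard formula for Pearson's r: the sum of cross-products of deviations over the square root of the product of the sums of squares. The data are simulated and the names dx, dy, top, and bottom are just for illustration.

```r
# Simulated data for illustration only
set.seed(4)
x <- rnorm(25)
y <- 0.6 * x + rnorm(25)

dx <- x - mean(x)   # deviations of x from its mean
dy <- y - mean(y)   # deviations of y from its mean

# Top part: each term is positive when x_i and y_i are on the
# same side of their means, negative otherwise
top <- sum(dx * dy)

# Bottom part: always positive, it only rescales
bottom <- sqrt(sum(dx^2) * sum(dy^2))

r_by_hand <- top / bottom
r_by_hand
cor(x, y)

# How many points fall on the same side of both means?
table(same_side = dx * dy > 0)
```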
30. Correlation: Strength of the linear relationship
- Can get to it in several ways.
- The correlation squared is the proportion of shared variance.
- The correlation can range only from -1 to 1. 
- Does a correlation between x and y mean x caused 
 y?
- Does a correlation between x and y mean that 
 there is some causal relationship in the network
 of hypotheses that include x and y?
- Are the most parsimonious ones x -> y and y -> x?
31. Significance Testing
- $H_0: \rho = 0$ (rho)
- Almost always use two-tailed tests
- You must know the sample size
- r = 0.1 is significant with n = 500 at 5%
- r = 0.4 is not significant with n = 20 at 5%
- (Cohen sizes: .1 small, .3 medium, .5 large)
32. Significance Testing and Confidence Intervals: The equations
with df = n - 2, and df = 1, n - 2
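The equations themselves did not survive on this slide. Assuming they are the usual t test for a correlation (df = n - 2) and its equivalent F test (df = 1, n - 2), a sketch in R might look like this. The function name r_test is made up for illustration.

```r
# Two-tailed test of H0: rho = 0, given only r and n.
# Assumes the standard t statistic for a Pearson correlation.
r_test <- function(r, n) {
  df <- n - 2
  t_stat <- r * sqrt(df) / sqrt(1 - r^2)
  f_stat <- t_stat^2   # equivalent F statistic with df = 1, n - 2
  p <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)
  c(t = t_stat, F = f_stat, df = df, p = p)
}

# The two examples from the Significance Testing slide
small_r_big_n   <- r_test(0.1, 500)
medium_r_small_n <- r_test(0.4, 20)
small_r_big_n
medium_r_small_n

# The F test gives the same p value as the two-tailed t test
pf(small_r_big_n[["F"]], 1, small_r_big_n[["df"]], lower.tail = FALSE)
```

This is why the sample size must be known: the same r can land on either side of the 5% line depending on n.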
33. Making Confidence Intervals
-  Several programs on the web: http://glass.ed.asu.edu/stats/analysis/rci.html
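One common asymptotic approach is Fisher's z transformation, sketched below. Whether the web programs mentioned on the slide use exactly this method is an assumption, and the function name r_ci is hypothetical.

```r
# Approximate confidence interval for a correlation via Fisher's z.
# Needs only r and n; assumes the usual bivariate normal conditions.
r_ci <- function(r, n, level = 0.95) {
  z <- atanh(r)                       # Fisher's z transformation
  se <- 1 / sqrt(n - 3)               # approximate standard error of z
  crit <- qnorm(1 - (1 - level) / 2)  # e.g. about 1.96 for 95%
  tanh(c(lower = z - crit * se, upper = z + crit * se))
}

ci_small_n <- r_ci(0.4, 20)    # interval includes 0
ci_big_n   <- r_ci(0.1, 500)   # interval excludes 0
ci_small_n
ci_big_n

# Check against cor.test on simulated data (illustration only)
set.seed(8)
x <- rnorm(40)
y <- 0.5 * x + rnorm(40)
from_cor_test <- as.numeric(cor.test(x, y)$conf.int)
from_r_ci <- unname(r_ci(cor(x, y), length(x)))
```

Because the interval is built on the z scale and transformed back with tanh, its bounds cannot fall outside -1 to 1.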
35. Notice the Normal and Basic Bootstraps give impossible upper bounds.
BCa is very similar to the asymptotic methods.
40. r = .64; w/o outlier, r = .92; w/o influential point, r = .38
41. Assumptions for Significance
- Random sampling 
- It must make sense to talk about the response 
 variable (the DV) as being continuous.
- No weird patterns (or non-linearity in general) in the residuals. The variance of the residuals should be homoscedastic (i.e., not varying with other variables, which would be heteroscedastic).
- Examination of outliers
42. What to do if assumptions are not met (to get the data, install and load mrt, then data(crime) and attach)
43. Rank-based Correlation
- Spearman's rho
- Rank the data and use Pearson's, plus stuff for ties.
- r = .94 and Spearman's $r_S$ = .78.
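The "rank the data and use Pearson's" recipe can be checked directly. The data below are simulated, with one extreme point and some ties created by rounding; they are not the data behind the values on the slide.

```r
# Simulated data with one extreme point; rounding creates some ties
set.seed(5)
x <- round(c(rnorm(19), 8), 1)
y <- round(c(x[1:19] + rnorm(19), 9), 1)

pearson  <- cor(x, y)                        # default method is Pearson
spearman <- cor(x, y, method = "spearman")   # change the method

# Spearman by hand: Pearson on the ranks.
# rank() gives tied values the average of their ranks by default.
pearson_on_ranks <- cor(rank(x), rank(y))

c(pearson = pearson, spearman = spearman, by_hand = pearson_on_ranks)
```

Since ranks cap how far out a single point can sit, the rank-based value is usually less sensitive to one extreme observation.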
44. In SPSS and R, just tick a box or change the method.
Doesn't print a confidence interval.
45.
- Same correlation estimate.
- But the CI really does meet appropriate assumptions.
47. Break Time
- Short break, we have a lot to get through 
 afterwards.
- In 4 groups 
- Look at the handout that I am about to give you. 
 Discuss how you would report your findings in a
 scientific journal versus People magazine. Are
 there any other statistics you would want to do?
- Talk about what you wrote for: Suppose an undergraduate said "Since it is for looking at differences among means, why is it called an Analysis of Variance?"
48. Some Examples
- Chile Heat: to discuss re-expression and what to do with outliers.
- Automobile Accidents: to discuss using theory to guide your statistics.
49. Are smaller chiles hotter?
- How to measure length and heat.
- Length is skewed
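50. Testing Normality
51.
par(mfrow = c(1, 2))         # two plots side by side
qqnorm(LENGTH)               # raw length
qqline(LENGTH)
qqnorm(log(LENGTH * 2.54))   # log of rescaled length
qqline(log(LENGTH * 2.54))
par(mfrow = c(1, 1))         # back to a single plot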
52. Measuring Heat: Scoville units or the number of chiles?
55. Command Summary
- r1 <- lm(HEAT ~ LENGTH)
- r2 <- lm(HEAT[LENGTH < 30] ~ LENGTH[LENGTH < 30])
- r3 <- lm(HEAT ~ log(LENGTH * 2.54))
58. plot(r1): Nu Mex is hotter than predicted for its length
59. What to do with
-  Genetically engineered.
-  Depends on the population and purpose.
60. What is a "linear model"?
$Y = \beta X + e$
Don't worry if you dislike matrix notation
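For those who do like matrix notation, here is a rough sketch of the idea on simulated data. It assumes the common convention in which X holds one row per observation and a column of 1s for the intercept, so that the fitted values are X %*% beta. The variable names and coefficients are made up.

```r
# Simulated data with two predictors (illustration only)
set.seed(6)
n <- 40
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)

# Design matrix: a column of 1s for the intercept, then the predictors
X <- cbind(Intercept = 1, x1 = x1, x2 = x2)

# Least squares solution of the normal equations (X'X) beta = X'y
beta_hat <- solve(crossprod(X), crossprod(X, y))
beta_hat

# lm fits the same model
fit <- lm(y ~ x1 + x2)
coef(fit)
```

The same code works for any number of predictors, which is one reason the procedure is described as easily expanded.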
62. Vehicle-Pedestrian Accidents
- What is the relationship between the impact 
 velocity of a vehicle and the throw of a
 pedestrian?
- A lot is known about how a body should move when 
 hit by a car at a certain velocity.
- Good reason to suggest $\text{throw}_i = k v_i^2 + e_i$
-  Dan will glance around to see if anyone 
 looks interested in "why" this equation makes
 sense, and may skip the next two slides.
63. Why Theoretical Sense?
-  On impact, the body takes on the horizontal velocity of the car, v, at an angle $\theta$ above the horizontal.
- Vertical velocity: $v_y = v \sin\theta$
- Horizontal velocity: $v_x = v \cos\theta$
-  Time in air, t, is related only to $v_y$: $t = 2 v_y / g$, where g is the constant for gravity on Earth, about $10\,\mathrm{m/s^2}$.
64.
- Without friction, $v_x$ is constant and thus throw should be $\text{throw} = v_x t = 2 v^2 \sin\theta \cos\theta / g$
- and if $\theta$ is the same for all cars, $\text{throw} = v^2 k$, where k is a constant.
- Thus, $\text{throw}_i = k v_i^2 + e_i$
- Simpler: only 1 unknown (k) to solve for, AND it has some empirical meaning
65. Wood, Simms & Walsh (2005)
66. Otte's work with crash test dummies
67.
- reglin <- lm(distance ~ speed)
- regpoly <- lm(distance ~ speed + spsq)
- regmodel <- lm(distance ~ spsq - 1)
This can be done in SPSS too. Tick no intercept/constant.
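A sketch of how the three models might be compared, using simulated data rather than the crash data from the lecture. The value k_true = 0.05 and the range of speeds are arbitrary assumptions; spsq is taken to be speed squared.

```r
# Simulated data following throw = k * v^2 + e (NOT the lecture's data)
set.seed(7)
k_true <- 0.05                        # hypothetical constant
speed <- runif(200, 5, 20)            # hypothetical impact speeds
spsq <- speed^2                       # speed squared
distance <- k_true * spsq + rnorm(200, sd = 1)

reglin   <- lm(distance ~ speed)          # straight line
regpoly  <- lm(distance ~ speed + spsq)   # quadratic with intercept
regmodel <- lm(distance ~ spsq - 1)       # theory model: "- 1" drops the intercept

# The theory model has a single coefficient, the estimate of k
k_hat <- coef(regmodel)[["spsq"]]
k_hat

# Least squares through the origin has a simple closed form
k_by_hand <- sum(spsq * distance) / sum(spsq^2)
k_by_hand

# Number of estimated coefficients in each model
sapply(list(reglin = reglin, regpoly = regpoly, regmodel = regmodel),
       function(m) length(coef(m)))
```

The theory-based model uses one parameter where the polynomial uses three, which is the sense in which it is simpler.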
69. Summary
70. This week's journal
-  Try help(par)
-  Write an equation in Word
-  Access these data from the web: fishstock in .dat (use read.table) or .sav (use SPSS or read.spss)
-  Variables are ocean (how much the winter low temperature is above freezing, in Celsius) and fishstock (> 2 cm, in thousands per cubic kilometer).
-  What are the correlation and the regression equation?
-  Write a sentence about the results.