Correlation 2 - PowerPoint PPT Presentation

Provided by: RobertK170
1
Correlation 2
  • Computations, and the best fitting line.

2
Computing r from a more realistic set of data
  • A study was performed to investigate whether the
    quality of an image affects reading time.
  • The experimental hypothesis was that reduced
    quality would slow down reading time.
  • Quality was measured on a scale of 1 to 10.
    Reading time was in seconds.

3
Quality vs Reading Time data: Compute the correlation
Quality (scale 1-10):   4.30 4.55 5.55 5.65 6.30 6.45 6.45
Reading time (seconds): 8.1  8.5  7.8  7.3  7.5  7.3  6.0
Is there a relationship? Check for linearity. Compute r.
4
Calculate t scores for X
X 4.30 4.55 5.55 5.65 6.30 6.45 6.45
5
Calculate t scores for Y
Y:               8.1  8.5  7.8  7.3   7.5  7.3   6.0
ΣY = 52.5, n = 7, Y-bar = 7.50
Y - Y-bar:       0.60 1.00 0.30 -0.20 0.00 -0.20 -1.50
(Y - Y-bar)^2:   0.36 1.00 0.09 0.04  0.00 0.04  2.25
MSW = 3.78/(7-1) = 0.63
sY = 0.79
tY = (Y - Y-bar) / sY: 0.76 1.26 0.38 -0.25 0.00 -0.25 -1.89
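The steps on this slide can be sketched in Python. This is only an illustration using the standard library; the helper name `t_scores` is an assumption and does not come from the slides. It divides each deviation from the mean by the estimated standard deviation, which uses n - 1 in the denominator as the MSW line does.

```python
import math

def t_scores(values):
    """Return (mean, s, t scores), with s estimated using n - 1."""
    n = len(values)
    mean = sum(values) / n
    ss = sum((v - mean) ** 2 for v in values)   # sum of squared deviations
    s = math.sqrt(ss / (n - 1))                 # square root of MSW
    return mean, s, [(v - mean) / s for v in values]

reading_time = [8.1, 8.5, 7.8, 7.3, 7.5, 7.3, 6.0]
mean_y, s_y, t_y = t_scores(reading_time)
print(round(mean_y, 2), round(s_y, 2))
print([round(t, 2) for t in t_y])
```

The same helper applies unchanged to the quality scores to obtain tX.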
6
Plot t scores
tY 0.76 1.28 0.39 -0.25 0.00 -0.25 -1.89
tX -1.48 -1.19 -0.07 0.05 0.78 0.95 0.95
7
t score plot with best fitting line. Linear? YES!
8
Calculate r
tY:          0.76  1.28  0.39 -0.25 0.00 -0.25 -1.88
tX:         -1.48 -1.19 -0.07  0.05 0.78  0.95  0.95
tX - tY:    -2.24 -2.47 -0.46  0.30 0.78  1.20  2.83
(tX - tY)^2: 5.02  6.10  0.21  0.09 0.61  1.44  8.01
Σ(tX - tY)^2 / (nP - 1) = 21.48 / 6 = 3.580
r = 1 - (1/2)(3.580)
  = 1 - 1.79 = -0.790
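As a rough check on the hand calculation, the formula can be sketched in Python. The function names here are illustrative assumptions, not part of the slides. The second function uses the algebraically equivalent form, the sum of tX times tY divided by nP - 1, so the two results should agree. Carried at full precision, r comes out near -0.78; the -0.790 on the slide reflects t scores rounded to two decimals.

```python
import math

def t_scores(values):
    """Convert raw scores to t scores, estimating s with n - 1."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / s for v in values]

def r_from_t_scores(x, y):
    """r = 1 - (1/2) * sum((tX - tY)^2) / (nP - 1)."""
    t_x, t_y = t_scores(x), t_scores(y)
    mean_sq_diff = sum((a - b) ** 2 for a, b in zip(t_x, t_y)) / (len(x) - 1)
    return 1 - 0.5 * mean_sq_diff

def r_from_products(x, y):
    """Equivalent form: sum(tX * tY) / (nP - 1)."""
    t_x, t_y = t_scores(x), t_scores(y)
    return sum(a * b for a, b in zip(t_x, t_y)) / (len(x) - 1)

quality = [4.30, 4.55, 5.55, 5.65, 6.30, 6.45, 6.45]
reading_time = [8.1, 8.5, 7.8, 7.3, 7.5, 7.3, 6.0]
r1 = r_from_t_scores(quality, reading_time)
r2 = r_from_products(quality, reading_time)
print(round(r1, 3), round(r2, 3))
```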
9
Best fitting line
10
The definition of the best fitting line plotted
on t axes
  • A best fitting line minimizes the average
    squared vertical distance of Y scores in the
    sample (expressed as tY scores) from the line.
  • The best fitting line is a least squares,
    unbiased estimate of values of Y in the sample.
  • The generic formula for a line is Y = mX + b, where m
    is the slope and b is the Y intercept.
  • Thus, any specific line, such as the best fitting
    line, can be defined by its slope and its
    intercept.

11
The intercept of the best fitting line plotted on
t axes
  • The origin is the point where both tX and
    tY = 0.000.
  • So the origin represents the mean of both the X
    and Y variables.
  • When plotted on t axes all best fitting lines go
    through the origin.
  • Thus, the tY intercept of the best fitting line
    = 0.000.

12
The slope of and formula for the best fitting line
  • When plotted on t axes the slope of the best
    fitting line = r, the correlation coefficient.
  • To define a line we need its slope and Y
    intercept.
  • r = the slope and the tY intercept = 0.00.
  • The formula for the best fitting line is
    therefore tY = r(tX) + 0.00, or tY = r(tX).
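The claim that the least squares line on t axes has slope r can be checked numerically. The sketch below is an illustration only; `mean_squared_distance` and the list of candidate slopes are assumptions introduced for the demonstration. It uses the quality and reading time data from the earlier slides and compares the average squared vertical distance for a few slopes around r.

```python
import math

def t_scores(values):
    """Convert raw scores to t scores, estimating s with n - 1."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / s for v in values]

quality = [4.30, 4.55, 5.55, 5.65, 6.30, 6.45, 6.45]
reading_time = [8.1, 8.5, 7.8, 7.3, 7.5, 7.3, 6.0]
t_x, t_y = t_scores(quality), t_scores(reading_time)
n = len(t_x)
r = sum(a * b for a, b in zip(t_x, t_y)) / (n - 1)

def mean_squared_distance(slope):
    """Average squared vertical distance of tY scores from tY = slope * tX."""
    return sum((b - slope * a) ** 2 for a, b in zip(t_x, t_y)) / (n - 1)

# Try a few slopes around r; the smallest distance should occur at r itself.
candidates = [r - 0.2, r - 0.1, r, r + 0.1, r + 0.2]
best = min(candidates, key=mean_squared_distance)
print(round(best, 3), round(mean_squared_distance(best), 3))
```

With this divisor, the distance at slope r works out to 1 - r squared, so a stronger correlation leaves less scatter around the line.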

13
Here's how a visual representation of the best
fitting line (slope = r, Y intercept = 0.000) and
the dots representing tX and tY scores might be
described. (Whether the correlation is positive
or negative doesn't matter.)
  • Perfect - scores fall exactly on a straight
    line.
  • Strong - most scores fall near the line.
  • Moderate - some are near the line, some not.
  • Weak - the scores are only mildly linear.
  • Independent - the scores are not linear at all.

14
Strength of a relationship
15
Strength of a relationship
16
Strength of a relationship
Moderate: r about .500
17
Strength of a relationship: r about 0.000
18
r = .800, the formula for the best fitting line
= ???
19
r = -.800, the formula for the best fitting line
= ???
20
r = 0.000, the formula for the best fitting line is
21
Notice what that formula for independent
variables says
  • tY = r(tX) = 0.000(tX) = 0.000
  • When tY = 0.000, you are at the mean of Y.
  • So, when variables are independent, the best
    fitting line says that the best estimate of Y
    scores in the sample is back to the mean of Y
    regardless of your score on X
  • Thus, when variables are independent we go back
    to saying everyone will score right at the mean
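A small sketch of this point, under stated assumptions: the function `estimated_y` is hypothetical, and converting tY back to Y units simply inverts the t score formula from the earlier slides. The mean and s are the reading time values from the worked example. With r = 0.000, every tX leads to the same estimate, the mean of Y.

```python
def estimated_y(t_x, r, mean_y, s_y):
    """Best-fitting-line estimate: tY = r * tX, converted back to Y units."""
    t_y = r * t_x
    return mean_y + t_y * s_y

mean_y, s_y = 7.50, 0.79   # reading time mean and s from the worked example
estimates = [estimated_y(t_x, 0.0, mean_y, s_y) for t_x in (-1.48, -0.07, 0.95)]
print(estimates)
```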

22
A note of caution: Watch out for the plot for
which the best fitting line is a curve.
23
Confidence intervals around rhoT: relation to
Chapter 6
  • In Chapter 6 we learned to create confidence
    intervals around muT that allowed us to test a
    theory.
  • To test our theory about mu we took a random
    sample, computed the sample mean and standard
    deviation, and determined whether the sample mean
    fell into that interval.
  • If it did not, we had shown the theory that led
    us to predict muT was false.
  • We then discarded the theory and muT and used the
    sample mean as our best estimate of the true
    population mean.

24
If we discard muT, what do we use as our best
estimate of mu?
  • Generally, our best estimate of a population
    parameter is the sample statistic that estimates
    it.
  • Our best estimate of mu has been and is the
    sample mean, X-bar.
  • Since we have discarded our theory, we go back
    to using X-bar as our best (least squares,
    unbiased, consistent) estimate of mu.

25
More generally, we can test a theory (hypothesis)
about any population parameter using a similar
confidence interval.
  • We theorize about what the value of the
    population parameter is.
  • We get an estimate of the variability of the
    parameter.
  • We construct a confidence interval (usually a 95%
    confidence interval) in which our hypothesis says
    that the sample statistic should fall.
  • We obtain a random sample and determine whether
    the sample statistic falls inside or outside our
    confidence interval.

26
The sample statistic will fall inside or outside
of the CI.95
  • If the sample statistic falls inside the
    confidence interval, our theory has received some
    support and we hold on to it.
  • But the more interesting case is when the sample
    statistic falls outside the confidence interval.
  • Then we must discard the theory and the
    theory-based estimate of the population parameter.
  • In that case, our best estimate of the population
    parameter is the sample statistic.
  • Remember, the sample statistic is a least
    squares, unbiased, consistent estimate of its
    population parameter.

27
We are going to do the same thing with a theory
about rho
  • rho is the correlation coefficient for the
    population.
  • If we have a theory about rho, we can create a
    95% confidence interval into which we expect r
    will fall.
  • An r computed from a random sample will then fall
    inside or outside the confidence interval.

28
When r falls inside or outside of the CI.95
around rhoT
  • If r falls inside the confidence interval, our
    theory about rho has received some support and we
    hold on to it.
  • But the more interesting case is when r falls
    outside the confidence interval.
  • Then we must discard the theory and the
    theory-based estimate of the population parameter.
  • In that case, our best estimate of rho is the r
    we found in our random sample.
  • Thus, when r falls outside the CI.95 we can go
    back to using it as a least squares unbiased
    estimate of rho.

29
Chapter 7 slides end here
  • Rest of slides are for other chapters and should
    not be reviewed here.
  • RK 10/24

30
Why is it so important to determine whether r
fits a theory?
  • In Chapter 8 we go on to predict values of Y from
    values of X and r.
  • The formula we use is called the regression
    equation; it is very much like the formula for
    the best fitting line.
  • The only difference is that the best fitting line
    describes the relationship among the Y scores in
    the sample.
  • But in Chapter 8 we move to predicting scores for
    people who are in the population from which the
    sample was drawn, but not in the sample.

31
That's dangerous.
  • Let me give you an example.

32
Assume you are the personnel officer for a
mid-size company.
  • You need to hire a typist.
  • There are 2 applicants for the job.
  • You give the applicants a typing test.
  • Which would you hire: someone who types 6 words a
    minute with 12 mistakes or someone who types 100
    words a minute with 1 mistake?

33
Who would you hire?
  • Of course, you would predict that the second
    person will be a better typist and hire that
    person.
  • Notice that we never gave the person with 6
    words/minute a chance to be a typist in our firm.
  • We prejudged her on the basis of the typing test.
  • That is probably valid in this case: a typing
    test probably predicts fairly well how good a
    typist someone will be.

34
But say the situation is a little more
complicated!
  • You have several applicants for a leadership
    position in your firm.
  • But it is not 2002, it is 1957, when we knew that
    only white males were capable of leadership in
    corporate America.
  • That is, we all know that leadership ability is
    correlated with both gender and skin color, white
    and male are associated with high leadership
    ability and darker skin color and female gender
    with lower leadership ability.
  • We now know this is absurd, but lots of people
    were never

35
Confidence intervals around muT
36
Confidence intervals and hypothetical means
  • We frequently have a theory about what the mean
    of a distribution should be.
  • To be scientific, that theory about mu must be
    able to be proved wrong (falsified).
  • One way to test a theory about a mean is to state
    a range where sample means should fall if the
    theory is correct.
  • We usually state that range as a 95% confidence
    interval.

37
  • To test our theory, we take a random sample from
    the appropriate population and see if the sample
    mean falls where the theory says it should,
    inside the confidence interval.
  • If the sample mean falls outside the 95%
    confidence interval established by the theory,
    the evidence suggests that our theoretical
    population mean and the theory that led to its
    prediction are wrong.
  • When that happens our theory has been falsified.
    We must discard it and look for an alternative
    explanation of our data.

38
For example
  • For example, let's say that we had a new
    antidepressant drug we wanted to peddle. Before
    we can do that we must show that the drug is
    safe.
  • Drugs like ours can cause problems with body
    temperature. People can get chills or fever.
  • We want to show that body temperature is not
    affected by our new drug.

39
Testing a theory
  • Everyone knows that normal body temperature for
    healthy adults is 98.6°F.
  • Therefore, it would be nice if we could show that
    after taking our drug, healthy adults still had
    an average body temperature of 98.6°F.
  • So we might test a sample of 16 healthy adults,
    first giving them a standard dose of our drug
    and, when enough time had passed, taking their
    temperature to see whether it was 98.6°F on the
    average.

40
Testing a theory - 2
  • Of course, even if we are right and our drug has
    no effect on body temperature, we wouldn't expect
    a sample mean to be precisely 98.600000.
  • We would expect some sampling fluctuation around
    a population mean of 98.6°F.
  • So, if our drug does not cause change in body
    temperature, the sample mean should be close to
    98.6. It should, in fact, be within the 95%
    confidence interval around muT, 98.6.
  • SO WE MUST CONSTRUCT A 95% CONFIDENCE INTERVAL
    AROUND 98.6° AND SEE WHETHER OUR SAMPLE MEAN
    FALLS INSIDE OR OUTSIDE THE CI.

41
To create a confidence interval around muT, we
must estimate sigma from a sample.
  • We randomly select a group of 16 healthy
    individuals from the population.
  • We administer a standard clinical dose of our new
    drug for 3 days.
  • We carefully measure body temperature.
  • RESULTS: We find that the average body
    temperature in our sample is 99.5°F with an
    estimated standard deviation of 1.40° (s = 1.40).
  • IS 99.5°F IN THE 95% CI AROUND MUT???

42
Knowing s and n we can easily compute the
estimated standard error of the mean.
  • Let's say that s = 1.40° and n = 16.
  • sX-bar = s / sqrt(n) = 1.40/4.00
    = 0.35
  • Using this estimated standard error we can
    construct a 95% confidence interval for the body
    temperature of a sample of 16 healthy adults.

43
We learned how to create confidence intervals
with the Z distribution in Chapter 4. 95% of
sample means will fall in a symmetrical interval
around mu that goes from 1.960 standard errors
below mu to 1.960 standard errors above mu.
  • A way to write that fact in statistical language
    is
  • CI.95: mu ± ZCRIT * sigmaX-bar or
  • CI.95: mu - ZCRIT * sigmaX-bar < X-bar < mu +
    ZCRIT * sigmaX-bar
  • For a 95% CI, ZCRIT = 1.960

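The Z-based interval can be sketched with Python's standard library. The function name and the input values (mu = 100, sigma = 15, n = 25) are hypothetical placeholders chosen only to make the arithmetic visible; they do not come from the slides. `NormalDist().inv_cdf` supplies the critical value, which is about 1.960 for a 95% interval.

```python
import math
from statistics import NormalDist

def z_confidence_interval(mu, sigma, n, level=0.95):
    """Return (low, high, ZCRIT) for sample means around mu, sigma known."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.960 for 95%
    sigma_xbar = sigma / math.sqrt(n)                # standard error of the mean
    return mu - z_crit * sigma_xbar, mu + z_crit * sigma_xbar, z_crit

# Hypothetical values for illustration only.
low, high, z_crit = z_confidence_interval(mu=100.0, sigma=15.0, n=25)
print(round(z_crit, 3), round(low, 2), round(high, 2))
```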
44
  • But when we must estimate sigma with s, we must
    use the t distribution to define critical
    intervals around mu or muT.
  • Here is how we would write the formulae
    substituting t for Z and s for sigma
  • CI.95: muT ± tCRIT * sX-bar or
  • CI.95: muT - tCRIT * sX-bar < X-bar < muT +
    tCRIT * sX-bar
  • Notice that the critical value of t that includes
    95% of the sample means changes with the number
    of degrees of freedom for s, our estimate of
    sigma, and must be taken from the t table.
  • If n = 16 in a single sample, dfW = n - k = 15.

45
df       1      2      3      4      5      6      7      8
.05  12.706  4.303  3.182  2.776  2.571  2.447  2.365  2.306
.01  63.657  9.925  5.841  4.604  4.032  3.707  3.499  3.355

df       9     10     11     12     13     14     15     16
.05   2.262  2.228  2.201  2.179  2.160  2.145  2.131  2.120
.01   3.250  3.169  3.106  3.055  3.012  2.977  2.947  2.921

df      17     18     19     20     21     22     23     24
.05   2.110  2.101  2.093  2.086  2.080  2.074  2.069  2.064
.01   2.898  2.878  2.861  2.845  2.831  2.819  2.807  2.797

df      25     26     27     28     29     30     40     60
.05   2.060  2.056  2.052  2.048  2.045  2.042  2.021  2.000
.01   2.787  2.779  2.771  2.763  2.756  2.750  2.704  2.660

df     100    200    500   1000   2000  10000
.05   1.984  1.972  1.965  1.962  1.961  1.960
.01   2.626  2.601  2.586  2.581  2.578  2.576
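Looking up tCRIT can be sketched as a small Python helper. The dictionary copies the .05 row of the table above. The function name `t_crit_05` is an assumption, and so is the fallback rule for a df that is not tabled: it uses the nearest smaller tabled df, which gives a slightly wider interval. The slides do not state a rule for that case.

```python
# Two-tailed .05 critical values of t, copied from the table above.
T_CRIT_05 = {
    1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 6: 2.447,
    7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228, 11: 2.201, 12: 2.179,
    13: 2.160, 14: 2.145, 15: 2.131, 16: 2.120, 17: 2.110, 18: 2.101,
    19: 2.093, 20: 2.086, 21: 2.080, 22: 2.074, 23: 2.069, 24: 2.064,
    25: 2.060, 26: 2.056, 27: 2.052, 28: 2.048, 29: 2.045, 30: 2.042,
    40: 2.021, 60: 2.000, 100: 1.984, 200: 1.972, 500: 1.965,
    1000: 1.962, 2000: 1.961, 10000: 1.960,
}

def t_crit_05(df):
    """Return tCRIT at .05; untabled df fall back to the nearest smaller df."""
    usable = [d for d in T_CRIT_05 if d <= df]
    if not usable:
        raise ValueError("df must be at least 1")
    return T_CRIT_05[max(usable)]

print(t_crit_05(15))   # df for a single sample of n = 16
print(t_crit_05(35))   # not tabled, so the df = 30 value is used
```

As df grows, the tabled values approach the Z critical value of 1.960.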
46
So, muT = 98.6, tCRIT = 2.131, s = 1.40, n = 16. Here is
the confidence interval
  • CI.95: muT ± tCRIT * sX-bar
  • 98.6 ± (2.131)(1.40/sqrt(16))
  • 98.6 ± (2.131)(1.40/4)
  • 98.6 ± (2.131)(0.35) = 98.60 ± 0.75
  • CI.95: 97.85 < X-bar < 99.35
  • Our sample mean fell outside the CI.95 and
    falsifies the theory that our drug has no effect
    on body temperature. Our drug may cause a slight
    fever.
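The arithmetic on this slide can be reproduced with a short Python sketch. The function name `t_confidence_interval` is an assumption; tCRIT is passed in as the table value for df = 15 rather than computed.

```python
import math

def t_confidence_interval(mu_t, s, n, t_crit):
    """Return (low, high) for sample means around muT when sigma is estimated by s."""
    s_xbar = s / math.sqrt(n)        # estimated standard error of the mean
    half_width = t_crit * s_xbar
    return mu_t - half_width, mu_t + half_width

low, high = t_confidence_interval(mu_t=98.6, s=1.40, n=16, t_crit=2.131)
sample_mean = 99.5
inside = low < sample_mean < high
print(round(low, 2), round(high, 2), inside)
```

Here `inside` comes out false, matching the slide's conclusion that the sample mean of 99.5 falls outside the interval.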
