Title: ARCH 2126/6126
1. ARCH 2126/6126
- Session 11: Relating variables to each other
2. Bivariate statistics
- Considering 2 numerical variables observed on the same set of cases simultaneously
- For example:
- length and breadth of a set of scrapers
- height and weight of a set of people
- spleen size and malaria positivity
- mother's education level and use of traditional medicines
3. Or sometimes
- Some cases may have missing values for one variable, which you want to estimate from the other, e.g.:
- You want to estimate stature from femur length
4. Association of metric variables can be shown visually
- Here is a simple hypothetical example of a positive association between variables
- They increase together
- Labels, units, caption, number, to be added
5. And similarly
- Here is a simple hypothetical example of a negative association between variables
- As one increases, the other decreases
6. And again
- Here is a simple hypothetical example of no (significant) association between variables
- There are other possibilities, but let's consider these
7. The approach is similar in many ways to univariate
- In the population out there, there is a measurable association (or lack of it) between these two variables: the population parameter
- The association that we measure in the sample we have collected is an estimate of it: the sample statistic
- The H0 is normally no association
8. Last time: correlation as a measure of association
- Correlation (positive or negative) is a specific term with a specific meaning
- The usual measure is the Pearson or product-moment correlation coefficient
- The usual symbols are ρ (rho) for the population parameter and r for the sample statistic
- This is a quantity we can calculate
9. Summary on calculation of r
- r = SPxy / √(SSx · SSy), where:
- SPxy = Σxy − ((Σx)(Σy)/n)
- SSx = Σ(x²) − ((Σx)²/n)
- SSy = Σ(y²) − ((Σy)²/n)
- n = number of cases
- d.f. = n − 2
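A minimal sketch of this calculation in Python (the slides give no code, and the data values below are invented purely for illustration):

```python
# Sketch: Pearson's r from the component sums defined above.
from math import sqrt

x = [2.0, 4.0, 5.0, 7.0, 9.0]   # hypothetical x values
y = [1.5, 3.0, 4.5, 6.0, 8.5]   # hypothetical y values
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(a * b for a, b in zip(x, y))

SPxy = sum_xy - (sum_x * sum_y) / n   # sum of products
SSx = sum_x2 - sum_x ** 2 / n         # sum of squares of x
SSy = sum_y2 - sum_y ** 2 / n         # sum of squares of y

r = SPxy / sqrt(SSx * SSy)
print(f"r = {r:.4f}  (d.f. = {n - 2})")
```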
10. Some points about r
- Ranges from −1 through 0 to +1
- No units
- The extremes indicate perfect straight-line variation, upwards or downwards, unlikely to be seen in real data
- Intermediate values indicate how tightly the points on a scatterplot cluster around a straight line
11. Some further points
- r² tells us the proportion of the variance in one variable that can be explained by straight-line dependence on the other
- Notice the stress on straight-line association: r does not automatically measure other forms of association (curvilinear, U- or J-shaped, bimodal)
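For instance, r = 0.8 gives r² = 0.64, so about 64% of the variance in one variable is accounted for by straight-line dependence on the other; a weaker r = 0.3 accounts for only 9%.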
12. Let's re-consider the same situation
- We have, for a sample of n cases, the values for each case of the 2 variables:
- x1, x2, x3, …, xn and y1, y2, y3, …, yn
13. But now let's ask a different question
- Not: how strong is the association between co-varying variables?
- But: how can we state the mathematical relationship between 2 variables? How would you predict one, knowing the other?
14. The appropriate approach for that question is regression
- Also deals with 2 variables in 1 sample
- But the two are not treated in the same way
- Mathematically related to correlation
- Statpacks will often give you both measures together for a bivariate data set
- Comes in a variety of versions, suited to different situations
15. The relationship between variable x and variable y
- x, by convention plotted on the horizontal axis, is the independent, predictor, or controlled variable
- y, on the vertical axis, is the dependent or response variable
- This does not always literally mean a hypothesis that x causes y
- Linear regression finds the straight line that best fits this relationship
16. Works in, but is not restricted to, experiments
- There is an interest in explaining the dependent (y) variable in terms of the independent (x) one
- Thus ideally x can be controlled, or at least measured without error, by the researcher
- The distinction depends on the purpose of the research, not the nature of the variable
17. Simple examples
- y = x
- y = x + 2
- y = x − 5
- y = x + 10
- y = 2x
- y = ½x
- y = ½x + 10
- y = 1.5x + 3.5
- All these equations symbolize straight lines that could be graphed, by hand or by computer
- In each case, they state a relationship whereby, if you know x, you can work out y
18. [Graph of the example lines: y = x − 5, y = x, y = x + 2, y = 2x, y = x/2, y = 1.5x + 3.5]
19. [Graph of further example lines, including y = −0.8x, y = 3x/4 − 4, and y = −2 − x/2]
20. The general form of the equation is y = bx + a, where
- b is the coefficient of x and governs the slope of the linear relationship
- If b is positive, the line rises from left to right; if negative, it falls
- a is the value of y when x = 0, i.e. where the line crosses the y axis, and is known as the y intercept
- If a is positive, the line crosses the y axis above the origin; if negative, below it
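A minimal sketch (the slope and intercept values are arbitrary, chosen for illustration) of how the equation turns any x into a y:

```python
# Sketch of the general straight-line equation y = bx + a.
def line(x, b=1.5, a=3.5):
    """Return y for a given x on the line y = bx + a."""
    return b * x + a

for x in (0, 2, 10):
    print(f"x = {x:>2} -> y = {line(x)}")
# At x = 0 the output equals a (the y intercept); a positive b
# means y grows with x, so the line rises from left to right.
```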
21. This is known as the regression line of y on x
- When the points on a scatterplot are a cloud rather than a row, how do we find the line of best fit through it?
- By statistics rather than by eye
- We take the values of x as given and try to minimize the overall deviations (or residuals) of y from the line, so that the sum of squares of all the residuals is least
22. [Image slide, no text transcribed]
23. So this is sometimes called a least-squares regression
- And again there is a not-too-complex formula for it, based on the observations and the means for each variable
- We need eight quantities: the means of x and y, n, Σx and Σx², Σy and Σy², and Σxy
24. Let's do it: all components are already familiar
- Enter data as two columns
- Sum each column to get Σx and Σy
- Also square each value and sum these, to get Σx² and Σy²
- Also, for each case, multiply the two values and sum the products, to get Σxy
25. Calculate sums of squares as before
- SSx = Σ(x²) − ((Σx)²/n)
- SSy = Σ(y²) − ((Σy)²/n)
- and the sum of products as before
- SPxy = Σxy − ((Σx)(Σy)/n)
26. Now we are ready for the new and final steps
- b = SPxy / SSx
- a = ȳ − (b · x̄)
- As a reminder, the basic regression equation is y = bx + a
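Continuing the earlier sketch (same invented data), the final steps in Python:

```python
# Sketch: least-squares slope b and intercept a from the formulas above.
x = [2.0, 4.0, 5.0, 7.0, 9.0]
y = [1.5, 3.0, 4.5, 6.0, 8.5]
n = len(x)

SPxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
SSx = sum(xi * xi for xi in x) - sum(x) ** 2 / n

b = SPxy / SSx                       # slope
a = sum(y) / n - b * (sum(x) / n)    # intercept: a = y-bar - b * x-bar

print(f"regression line: y = {b:.3f}x + {a:.3f}")
```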
27. Now we are in a position to
- Review this in relation to r² (calculated before) to see how much variation the equation can account for
- Test the null hypothesis that the slope is not significantly different from 0 (by an F or t test)
- Predict values of y from x (with confidence intervals)
- Analyse residuals to check the fit of the model (the scatter should show no particular shape)
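One way to carry out several of these steps (a sketch; the slides do not prescribe a particular package) is SciPy's linregress, using the same invented data as before:

```python
# Sketch: slope test, prediction and residuals via scipy.stats.linregress.
from scipy import stats

x = [2.0, 4.0, 5.0, 7.0, 9.0]
y = [1.5, 3.0, 4.5, 6.0, 8.5]

res = stats.linregress(x, y)
print(f"b = {res.slope:.3f}, a = {res.intercept:.3f}")
print(f"r^2 = {res.rvalue ** 2:.3f}")   # variation accounted for
print(f"p = {res.pvalue:.4f}")          # t test of H0: slope = 0

# Predict y for each x, then inspect residuals for any pattern
y_hat = [res.slope * xi + res.intercept for xi in x]
residuals = [yi - yh for yi, yh in zip(y, y_hat)]
print("residuals:", [round(e, 3) for e in residuals])
```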
28. [Image slide, no text transcribed]
29. Variations on regression
- There are other methods for defining the line of best fit, within the overall simple linear regression approach
- There are also more complex regression methods which go beyond this one in a) fitting curves rather than straight lines and b) having a number of independent variables, not just one, i.e. multiple regression
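As a brief sketch of point a), one common tool (not named in the slides, and the data here are invented) is numpy.polyfit, which fits a curve by the same least-squares idea:

```python
# Sketch: least-squares fit of a quadratic curve with numpy.polyfit.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.8, 10.1, 17.2, 26.9])  # values that curve upwards

coeffs = np.polyfit(x, y, deg=2)   # fit y = c2*x^2 + c1*x + c0
print("quadratic coefficients:", np.round(coeffs, 3))

# deg=1 would recover the simple straight-line regression; several
# independent variables (multiple regression) need e.g. np.linalg.lstsq.
```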