Matters arising - PowerPoint PPT Presentation

Provided by: colin111

Transcript and Presenter's Notes



1
Matters arising
  • Summary of last week's lecture
  • The exercises
  • Your queries

2
The Pearson correlation (r)
  • The PEARSON CORRELATION is a measure of a
    supposed linear association between two
    variables.

3
Linear, but imperfect association
  • If the scatterplot is elliptical in shape, a
    linear association is indicated.
  • In psychology, all measurement is subject to
    random error.
  • No association between measured variables is ever
    perfect.
  • That is why the points do not all lie on a
    straight line.

4
The Pearson correlation
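$$ r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}} $$
Numerator: the sum of products (SP). Denominator: the square root of the product of the two sums of squares.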
5
Explanation
  • The numerator of r is known as a SUM OF PRODUCTS
    (SP).
  • It is the sum of products that captures the
    extent to which X and Y are associated, or
    CO-VARY.
  • The sums of squares in the denominator merely
    constrain r to the range from -1 to +1.
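To make the formula concrete, here is a minimal Python sketch that computes SP, the two sums of squares and r directly from deviation scores. The function name `pearson_r` and the five pairs of scores are hypothetical illustrations, not the lecture's Violence data.

```python
import math

def pearson_r(xs, ys):
    """Pearson r computed as SP / sqrt(SSx * SSy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sp = sum(a * b for a, b in zip(dx, dy))  # sum of products: captures covariation
    ss_x = sum(a * a for a in dx)            # sum of squares for X
    ss_y = sum(b * b for b in dy)            # sum of squares for Y
    return sp / math.sqrt(ss_x * ss_y)       # denominator keeps r within [-1, +1]

# Hypothetical scores, for illustration only
exposure = [2, 4, 5, 7, 9]
violence = [3, 5, 4, 8, 9]
print(round(pearson_r(exposure, violence), 3))
```

Because the sums of squares only rescale SP, doubling every X score (say) leaves r unchanged.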

6
The sum of products captures covariation
  • Points in the upper right quadrant have positive
    deviation products; points in the lower left also
    have positive deviation products (a minus times a
    minus is a plus).
  • Points in the other two quadrants have negative
    products.
  • Since the positive products predominate, we can
    expect the covariance to be large and positive.
  • The negative products are small: the points
    concerned are near the intersection of the mean
    lines.

[Figure: scatterplot divided into quadrants by the Mean Preference score and Mean Actual Violence score lines]
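The quadrant argument can be checked by grouping each point's deviation product according to the quadrant it falls in. This is a sketch; the function `quadrant_products` and the six data points are made up for illustration.

```python
def quadrant_products(xs, ys):
    """Group the deviation products (X - mean X)(Y - mean Y) by quadrant.

    Quadrants are defined by the two mean lines. A point lying exactly on a
    mean line has a product of zero; here it is counted on the upper/right
    side of that line.
    """
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    groups = {"upper right": [], "lower left": [],
              "upper left": [], "lower right": []}
    for x, y in zip(xs, ys):
        dx, dy = x - mean_x, y - mean_y
        vertical = "upper" if dy >= 0 else "lower"
        horizontal = "right" if dx >= 0 else "left"
        groups[f"{vertical} {horizontal}"].append(dx * dy)
    return groups

# Hypothetical data: a mostly positive trend with two off-trend points
preference = [2, 3, 4, 6, 7, 9]
violence = [3, 7, 4, 6, 8, 9]
for quadrant, products in quadrant_products(preference, violence).items():
    print(quadrant, round(sum(products), 2))
```

Adding up all four groups gives SP itself, so the quadrant totals show directly which points push r up and which pull it down.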
7
An elliptical scatterplot
  • This is fine.
  • The elliptical scatterplot indicates that there
    is indeed a basically linear relationship between
    variable Y1 and variable X1.

8
No association
  • There is NO association between Z and Y.
  • The high value of r is driven solely by the
    presence of a single OUTLIER.

9
Anscombe's rule
  • When you examine a scatterplot (something you
    should ALWAYS do when interpreting a
    correlation), ask yourself the following
    question:
  • Would the removal of one or two points at random
    affect the basically elliptical shape of the
    scatterplot? If the shape would remain
    essentially the same, the value of r accurately
    reflects the association between the variables.
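Anscombe's rule is a visual check, but a rough numerical counterpart (our own addition, not part of the rule itself) is to recompute r with each point left out in turn. If a single deletion changes r drastically, as with the single outlier in the Z and Y example, r is being driven by that point. The data and function names below are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson r as SP / sqrt(SSx * SSy)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

def leave_one_out_r(xs, ys):
    """r recomputed with each point removed in turn."""
    return [pearson_r(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
            for i in range(len(xs))]

# Hypothetical data: no real association, plus one extreme point
z = [1, 2, 1, 2, 1, 10]
y = [2, 1, 1, 2, 2, 10]
print("r with all points:", round(pearson_r(z, y), 2))
print("r with each point left out:", [round(r, 2) for r in leave_one_out_r(z, y)])
```

Here r is high with all six points, but dropping the last point alone makes it slightly negative.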

10
Summary
  • The Pearson correlation r is a measure of the
    strength of a SUPPOSED linear relationship
    between 2 variables.
  • It is one of the most widely used statistical
    measures, but it is also one of the most misused.
  • You should always try to see the scatterplot when
    interpreting a value of r.

11
Exercise
  • From the Violence data, obtain a scatterplot and
    calculate the Pearson correlation.

12
Direction of causation
  • When we measure two variables and obtain the
    correlation between them, we nearly always do so
    because we believe that one variable, X, causes or
    influences the other, Y.
  • We have measured Exposure (X) and Violence (Y)
    because we have the hypothesis that X causes Y.

13
The scatterplot of Y against X
  • If we believe that X causes Y, we want to PLOT Y
    AGAINST X.
  • We want a scatterplot with Y on the vertical axis
    and X on the horizontal axis.

14
Ordering the plot
15
The default graph
16
The vertical scale
  • Notice that the vertical axis begins at 3, rather
    than at zero.
  • I like to see the whole scale on the vertical
    axis.
  • Double-click on the graph to enter the Chart
    Editor.
  • Double-click on the vertical axis to enter a
    dialog which will enable you to control the
    amount of the vertical scale that you can see.

17
Ordering the full Y scale
Uncheck Auto and enter zero into the Custom slot.
18
Final version
19
Why do I like to see the entire scale on
the vertical axis?
20
Beware!
  • Modern computing packages such as SPSS afford a
    bewildering variety of attractive graphs and
    displays to help you bring out the most important
    features of your results. You should certainly
    use them.
  • But there are pitfalls awaiting the unwary.

21
Performance profiles
  • We often want to see how mean performance varies
    (or not) over various treatment conditions.
  • We might want to compare the performance of
    participants who have ingested different kinds
    (or dosages) of drugs with that of a comparison
    or control group.
  • There is a set of methods known as Analysis of
    Variance (ANOVA) which enable us to do that.

22
Ordering a means plot
23
A picture of the results
24
The picture is false!
  • The table of means shows minuscule differences
    among the five group means!
  • The graph suggests that there are vast
    differences among the means!

25
A small scale view
  • Only a microscopically small section of the scale
    is shown on the vertical axis.
  • This greatly magnifies even small differences
    among the group means.

26
Putting things right
  • Double-click on the image to get into the Chart
    Editor.
  • Double-click on the vertical axis to access the
    scale specifications.

27
Putting things right
  • Uncheck the minimum value box and enter zero as
    the desired minimum point.
  • Click Apply.

28
The true picture!
29
The true picture
  • The effect is dramatic.
  • The profile now reflects the true situation.
  • ALWAYS BE SUSPICIOUS OF GRAPHS THAT DO NOT SHOW
    THE COMPLETE VERTICAL SCALE!

30
Your queries
  • Several of you have e-mailed me asking how to
    fit a line to a scatterplot.
  • Last week, I said that an elliptical scatterplot
    indicated that the relationship between the
    variables was basically LINEAR.
  • So we want the best-fitting straight line through
    the points.
  • This is known as the REGRESSION LINE.
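For readers who want to see what the fitted line is, the usual choice is the ordinary least-squares line, whose slope is SP/SSx and which passes through the point of means. The sketch below assumes that this is the line wanted; the function name `regression_line` and the data are hypothetical.

```python
def regression_line(xs, ys):
    """Least-squares line y = a + b*x, with b = SP / SSx and a = mean_y - b * mean_x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))  # sum of products
    ss_x = sum((x - mean_x) ** 2 for x in xs)                      # sum of squares for X
    slope = sp / ss_x
    intercept = mean_y - slope * mean_x  # forces the line through (mean_x, mean_y)
    return intercept, slope

# Hypothetical scores, for illustration only
exposure = [2, 4, 5, 7, 9]
violence = [3, 5, 4, 8, 9]
a, b = regression_line(exposure, violence)
print(f"predicted violence = {a:.2f} + {b:.2f} * exposure")
```

Note that Y is regressed on X here, matching the slide's advice to plot Y (the supposed effect) against X (the supposed cause).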

31
Drawing the regression line through the points
Choose Fit Line at Total.
To leave the Chart Editor, choose Close from the
Edit menu or double-click on the Viewer outside
the rectangle around the figure.
32
Finding the value of r
33
Hypothesis testing
  • In HYPOTHESIS TESTING, a proposition known as the
    NULL HYPOTHESIS (H0) is set up.
  • H0 is the NEGATION of your scientific hypothesis.
  • So if our scientific hypothesis is that there is
    an association, H0 says there is NO association.

34
The p-value
  • To test H0, we gather our data and calculate the
    value of a TEST STATISTIC.
  • We then ask: if the null hypothesis were true,
    how probable would it be to obtain a value of the
    test statistic at least as extreme as ours?
  • The answer is given by a probability known as the
    p-value.
  • SPSS calls the p-value the Sig., i.e., the
    SIGNIFICANCE PROBABILITY.
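For a Pearson correlation, the usual test statistic is t = r√(n - 2)/√(1 - r²), referred to the t distribution with n - 2 degrees of freedom. The sketch below computes a two-sided p-value for r in plain Python, integrating the t density numerically with Simpson's rule so that no statistics library is needed. The function names are hypothetical; in practice SPSS (or a library routine such as SciPy's `scipy.stats.pearsonr`) reports this value for you.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p_for_t(t, df, steps=2000):
    """P(|T| >= |t|) = 1 - 2 * (area under the density from 0 to |t|)."""
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)

def p_value_for_r(r, n):
    """Two-sided p-value for a Pearson r from n pairs (assumes |r| < 1)."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r * r)
    return two_sided_p_for_t(t, df)
```

The two-sided version is assumed here because it matches "at least as extreme as ours" in either direction; r = ±1 would divide by zero and is excluded.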

35
A significant result
  • A SIGNIFICANCE LEVEL is a small probability
    accepted by convention as a criterion for a
    decision about a statistical test.
  • Most commonly, the 0.05 significance level is
    accepted by psychologists.
  • If the p-value of your test statistic is LESS
    than the 0.05 significance level, your result is
    said to be significant beyond the 0.05 level.

36
The result
[Figure: SPSS correlation output, with the p-value highlighted]
Never report a p-value like this! Report the
p-value to 2 places of decimals; if it's less
than .01, use the inequality sign <.
  • Report this result as follows:
  • r(27) = 0.89, p < .01
  (27 = number of pairs; 0.89 = value of r; p < .01 = the p-value)
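The reporting rule on this slide is mechanical enough to write down as a small helper. This sketch (the names `format_p` and `format_r` are hypothetical) follows the slide's convention of giving the number of pairs in parentheses; it rounds p to two decimal places and switches to `p < .01` below .01.

```python
def format_p(p):
    """Report p to 2 decimal places, or as 'p < .01' when p is below .01."""
    if p < 0.01:
        return "p < .01"
    return "p = " + f"{p:.2f}".lstrip("0")  # drop the leading zero: .01, not 0.01

def format_r(n_pairs, r, p):
    """Format a correlation as on the slide: r(pairs) = value, p-value."""
    return f"r({n_pairs}) = {r:.2f}, {format_p(p)}"

print(format_r(27, 0.89, 0.0001))
```

The same `format_p` rule gives `p = .01` for a p-value of .014, as used later for the chi-square result.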
37
Lecture 9: MORE ON ASSOCIATION
38
We have shown that there is a strong
association between a child's violence and the
amount of violent screen material watched...
39
...but have we really gathered evidence for
the hypothesis that exposure to screen violence
promotes actual violence?
40
Remember
  • CORRELATION
  • does not necessarily mean
  • CAUSATION

41
One causal model
  • The hypothesis implies this CAUSAL MODEL.
  • The results are CONSISTENT with the hypothesis.
  • The correlation may indeed arise because exposure
    to violence causes actual violence.

42
Another causal model
  • The child's violent tendencies and appetite
    for violence lead him (or her) to watch violent
    programmes as often as possible.
  • This model is also consistent with the data.

43
A third causal model
  • NEITHER variable causes the other.
  • Both are determined by the behaviour of the
    child's parents.

44
The choice
  • Does exposure cause violence (top model)?
  • Does Violence lead to more exposure (middle
    model)?
  • Are both exposure and violence caused by a third,
    background, variable (bottom model)?

45
A background variable
  • Perhaps neither Exposure nor Actual violence
    causes the other.
  • Perhaps they are caused by a background parental
    behaviour variable.
  • We have data on such a variable.
  • The background variable correlates highly with
    both Exposure and Actual violence.

46
Partial correlation
  • A PARTIAL CORRELATION is what remains of a
    Pearson correlation between two variables when
    the influence of a third variable has been
    removed, or PARTIALLED OUT.

47
Three variables
  • Let X1, X2 and X3 be three variables.
  • Let r12 be the Pearson correlation between X1 and
    X2.
  • Let r(12.3) be the partial correlation between
    X1 and X2 when the covariation of each with X3
    has been removed.

48
Partial correlation
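The formula itself did not survive in this transcript. Assuming the slide shows the standard first-order partial correlation, written in the notation of the previous slide, it reads:

```latex
r_{12.3} = \frac{r_{12} - r_{13}\, r_{23}}{\sqrt{\left(1 - r_{13}^{2}\right)\left(1 - r_{23}^{2}\right)}}
```

If X3 is uncorrelated with both X1 and X2 (r13 = r23 = 0), the partial correlation reduces to r12.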
49
Explanation
The numerator removes the influence of the third variable.
The denominator rescales with the new (residual) variances, so
that the range is again from -1 to +1.
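The idea of "removing the influence" can be checked numerically: the partial correlation computed from the three pairwise correlations should equal the ordinary correlation between what remains of X1 and of X2 once a straight-line fit on X3 has been subtracted from each. The sketch below shows this with hypothetical data; all function names are illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson r as SP / sqrt(SSx * SSy)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

def partial_r(r12, r13, r23):
    """First-order partial correlation r12.3 from the three pairwise r's."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

def residuals(ys, xs):
    """What is left of ys after removing its least-squares linear fit on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]

# Hypothetical data: X3 (a background variable) is related to both X1 and X2
x3 = [1, 2, 3, 4, 5, 6]
x1 = [2, 3, 5, 4, 7, 8]
x2 = [1, 3, 2, 5, 6, 7]
r12, r13, r23 = pearson_r(x1, x2), pearson_r(x1, x3), pearson_r(x2, x3)
print("r12 =", round(r12, 2), " r12.3 =", round(partial_r(r12, r13, r23), 2))
```

A large r12 that shrinks towards zero once X3 is partialled out is the pattern the third-variable model predicts.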
50
Obtaining a partial correlation
51
The partial correlation
  • The partial correlation fails to reach
    significance.
  • Now that we have taken the background variable
    into consideration, we see that there is no
    significant correlation between Exposure and
    Actual violence.
  • It appears that, of the three possible causal
    models, the third-variable model gives the most
    convincing account of the data.

52
Levels of measurement
  • There are three levels:
  • 1. The SCALE level. The data are measures on an
    independent scale with units. Heights, weights,
    performance scores and IQs are scale data. Each
    score has stand-alone meaning.
  • 2. The ORDINAL level. Data in the form of RANKS
    (1st, 3rd, 53rd). A rank has meaning only in
    relation to the other individuals in the sample.
    A rank does not express, in units, the extent to
    which a property is possessed.
  • 3. The NOMINAL level. Assignments to categories
    (so many males, so many females).

53
3. Nominal data
  • NOMINAL data relate to qualitative variables or
    attributes, such as gender or blood group, and
    are merely records of CATEGORY MEMBERSHIP.
  • Nominal data are merely LABELS: they may take the
    form of numbers, but such numbers are arbitrary
    code numbers representing, say, the different
    blood groups or different nationalities. ANY
    numbers will do, as long as they are all
    different.

54
A set of nominal data
  • A medical researcher wishes to test the
    hypothesis that people with a certain type of
    body tissue (Critical) are more likely to show
    the presence of a potentially harmful antibody.
  • Data are obtained on 79 people, who are
    classified with respect to 2 attributes:
  • 1. Tissue Type
  • 2. Whether the antibody is present or absent.

55
The research question
  • Do more of the people in the critical group have
    the antibody?
  • We are asking whether there is an ASSOCIATION
    between the variables of category membership
    (tissue type) and presence/absence of the
    antibody.
  • This is the SCIENTIFIC hypothesis.

56
The null hypothesis
  • The NULL HYPOTHESIS is the negation of the
    scientific hypothesis.
  • The null hypothesis states that there is NO
    association between tissue type and presence of
    the antibody.

57
Contingency tables (cross-tabulations)
  • When we wish to investigate whether an
    association exists between qualitative or
    categorical variables, the starting point is
    usually a display known as a CONTINGENCY TABLE,
    whose rows and columns represent the categories
    of the qualitative variables we are studying.
  • Contingency tables are also known as
    CROSS-TABULATIONS, or CROSSTABS.

58
The contingency table
  • Is there an association between Tissue Type and
    Presence of the antibody?
  • It looks as if the antibody is indeed more in
    evidence in the Critical tissue group.

59
The null hypothesis
  • The null hypothesis is the negation of our
    scientific hypothesis, namely, the statement that
    the two variables are INDEPENDENT.
  • In other words, any differences in the relative
    incidence of the antibody in the different tissue
    groups have resulted from SAMPLING ERROR.

60
Expected cell frequencies
  • The pattern of the OBSERVED FREQUENCIES (O) would
    suggest that there is a greater incidence of the
    antibody in the Critical tissue group.
  • But the marginal totals showing the frequencies
    of the various groups in the sample also vary.
  • What cell frequencies would we expect under the
    independence hypothesis?

61
Expected cell frequencies (E)
  • According to the null hypothesis, the presence of
    the antibody and membership of a particular tissue
    group are independent events.
  • The probability of the joint occurrence of
    independent events is the product of their
    separate probabilities.
  • We find the expected frequencies (E) by
    multiplying together the marginal totals that
    intersect at the cells concerned and dividing by
    the total number of observations.
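Applying this rule to every cell at once gives a short sketch. The function name `expected_frequencies` is hypothetical, and the 4 × 2 table of counts is invented for illustration; it is not the tissue-type data.

```python
def expected_frequencies(observed):
    """E for each cell = (row total * column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Invented counts: 4 groups (rows) by antibody No/Yes (columns)
observed = [[5, 10], [8, 12], [6, 14], [15, 10]]
for row in expected_frequencies(observed):
    print([round(e, 2) for e in row])
```

By construction, the expected frequencies have the same marginal totals as the observed ones; only the pattern inside the table changes.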

62
The expected frequencies
  • To obtain, say, the value of E for the top left
    cell, multiply the intersecting marginal totals
    (36 and 22) and divide by 79 (the total
    frequency), obtaining
  • (36 × 22)/79 = 10.03.
  • In the Critical group, there seem to be large
    differences between O and E: fewer Nos than
    expected and more Yeses.

63
The chi-square (χ²) statistic
  • We need a statistic which compares the
    differences between the O and E, so that a large
    value will cast doubt upon the null hypothesis of
    independence.
  • Such a statistic is CHI-SQUARE (χ²).

64
Formula for chi-square
$$ \chi^2 = \sum \frac{(O - E)^2}{E} $$
  • Each element of chi-square expresses the square of
    the difference between O and E as a proportion of
    E.
  • Add up these terms for all the cells in the
    contingency table.
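Once the expected frequencies are available, chi-square is a single sum over the cells. This sketch repeats the expected-frequency step so that it runs on its own; the function names and test tables are hypothetical.

```python
def expected_frequencies(observed):
    """E for each cell = (row total * column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square(observed):
    """Sum of (O - E)^2 / E over every cell of the contingency table."""
    expected = expected_frequencies(observed)
    return sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row))

# A table whose rows are exactly proportional gives chi-square = 0;
# a table with every case on the diagonal gives a large value.
print(chi_square([[1, 2], [2, 4]]), chi_square([[10, 0], [0, 10]]))
```

For a table with R rows and C columns the sum has R × C terms, which is why a 4 × 2 table gives 8.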

65
The value of chi-square
  • There are 8 terms in the summation, but only the
    first two and the last are shown in the
    calculation below.

66
Degrees of freedom
  • To decide whether a given value of chi-square is
    significant, we must specify the DEGREES OF
    FREEDOM df.
  • If a contingency table has R rows and C columns,
    the degrees of freedom is given by
  • df = (R - 1)(C - 1)
  • In our example, R = 4, C = 2 and so
  • df = (4 - 1)(2 - 1) = 3.

67
Significance
  • SPSS will tell us that the p-value of a
    chi-square with a value of 10.655 in the
    chi-square distribution with three degrees of
    freedom is .014.
  • We should write this result as
  • χ²(3) = 10.66, p = .01.
  • Since the result is significant beyond the .05
    level, we have evidence against the null
    hypothesis of independence and evidence for the
    scientific hypothesis.
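The degrees of freedom and the p-value can also be computed outside SPSS. The sketch below uses the standard power series for the regularized lower incomplete gamma function, whose complement is the upper tail of the chi-square distribution. It is adequate for moderate values like the one here, but it is not a replacement for a statistics library. The function names are hypothetical.

```python
import math

def degrees_of_freedom(rows, cols):
    """df = (R - 1)(C - 1) for an R x C contingency table."""
    return (rows - 1) * (cols - 1)

def chi2_sf(x, df):
    """P(chi-square with df degrees of freedom >= x).

    Computed as 1 - P(df/2, x/2), where P is the regularized lower
    incomplete gamma function, evaluated by its power series.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    term = 1.0 / a
    total = term
    n = 0
    while term > total * 1e-15:  # stop once terms no longer change the sum
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return 1.0 - lower

df = degrees_of_freedom(4, 2)
# The slide reports p = .014 for chi-square = 10.655 with 3 df
print(df, round(chi2_sf(10.655, df), 3))
```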

68
Summary
  • This week I extended my discussion of statistical
    association to the topic of partial correlation.
  • A partial correlation can help the researcher to
    choose from different causal models.
  • I also considered the analysis of nominal data in
    the form of contingency tables.
  • The chi-square statistic can be used to test for
    the presence of an association between
    qualitative or categorical variables.

69
Multiple-choice example
70
Multiple-choice example
71
Another example