Sociology 690 - PowerPoint PPT Presentation

About This Presentation
Title:

Sociology 690

Description:

Sociology 690 Data Analysis Simple Quantitative Data Analysis Four Issues in Describing Quantity 1. Grouping/Graphing Quantitative Data 2. – PowerPoint PPT presentation

Slides: 26
Provided by: Jeraldf4
Learn more at: http://www.csun.edu

Transcript and Presenter's Notes

1
Sociology 690 Data Analysis
  • Simple Quantitative
  • Data Analysis

2
Four Issues in Describing Quantity
  • 1. Grouping/Graphing Quantitative Data
  • 2. Describing Central Tendency
  • 3. Describing Variation
  • 4. Describing Co-variation

3
1. Grouping Quantitative Data
If there are a large number of quantitative
scores, one would not simply create a raw score
frequency distribution, as that would contain too
many unique scores and, therefore, not fulfill
the data reduction goal.
  • Intervals and Real Limits
  • Widths and midpoints
  • Graphing grouped data

4
Grouping Data - Intervals
  • To group quantitative data, three rules are
    followed
  • 1. Make the intervals no wider than the maximum
    amount of information you are willing to lose.
  • 2. Make the interval widths multiples of five.
  • 3. Make the distribution intervals few enough to
    be internalized at a glance.

5
Grouping Data - Intervals Example
  • If these are the scores on a midterm
  • 9,13,18,19,22,25,31,34,35,36,36,38,41,43,44,45
  • The corresponding grouped frequency distribution
    would look like
  • Interval (i)   Frequency (fi)
  • 01-10          1
  • 11-20          3
  • 21-30          2
  • 31-40          6
  • 41-50          4
  • Total          16
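
The tally above can be sketched in a few lines of Python. This is only an illustration: the function name grouped_frequencies and its width and start parameters are assumptions made for the example, not part of the original slides.

```python
from collections import Counter

def grouped_frequencies(scores, width=10, start=1):
    """Return (low, high, frequency) rows for intervals of the given width."""
    # Interval index 0 covers start..start+width-1, index 1 the next, etc.
    counts = Counter((s - start) // width for s in scores)
    table = []
    for k in range(max(counts) + 1):
        low = start + k * width
        table.append((low, low + width - 1, counts.get(k, 0)))
    return table

scores = [9, 13, 18, 19, 22, 25, 31, 34, 35, 36, 36, 38, 41, 43, 44, 45]
for low, high, f in grouped_frequencies(scores):
    print(f"{low:02d}-{high:02d} {f}")
```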

6
Grouping Data - Real Limits
  • Because there are gaps between these stated
    intervals (e.g. between 10 and 11), real limits
    are needed. The real limits of an interval are
    the numbers one-half unit below the lower stated
    limit and one-half unit above the upper stated
    limit
  • For example
  • the interval 11-20 becomes 10.5-20.5
  • the interval 3.5-4.5 becomes 3.45-4.55

7
Grouped Data - Width and Midpoint
  • The width of an interval is simply the difference
    between the upper and lower real limits.
  • e.g. 11-20 → 20.5 - 10.5 = 10
  • The midpoint is determined by calculating the
    interval width, dividing it by 2, and adding that
    number to the lower real limit.
  • e.g. 10/2 + 10.5 = 15.5
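
As a rough sketch of the real-limit, width, and midpoint rules, assuming the measurement unit is passed in explicitly (the helper names and the unit parameter are illustrative, not from the slides):

```python
def real_limits(low, high, unit=1.0):
    """Extend the stated limits by half a measurement unit on each side."""
    half = unit / 2
    return low - half, high + half

def width_and_midpoint(low, high, unit=1.0):
    """Width is upper minus lower real limit; midpoint is lower + width / 2."""
    lower, upper = real_limits(low, high, unit)
    width = upper - lower
    return width, lower + width / 2

print(real_limits(11, 20))         # real limits of the interval 11-20
print(width_and_midpoint(11, 20))  # width and midpoint of 11-20
print(real_limits(3.5, 4.5, 0.1))  # scores measured to one decimal place
```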

8
Graphing Grouped Data
  • A quantitative version of a bar graph is called
    a histogram
  • When the frequencies are connected by a
    line, the graph is called a frequency polygon

9
2. Describing Central Tendency
But we can do more than simply create a frequency
distribution. We can also describe how these
observations bunch up and how they
distribute. Describing how they bunch up
involves measures of
  • Modes
  • Medians
  • Means
  • Skew

10
Central Tendency - Modes
  • The mode for raw data is simply the most frequent
    score, e.g. for 2,3,5,6,6,8 the mode is 6.
  • The mode for grouped data is the midpoint of the
    interval containing the highest frequency
    (35.5 here)

  • Interval (i)   Frequency (fi)
  • 01-10          1
  • 11-20          3
  • 21-30          2
  • 31-40          6
  • 41-50          4
  • Total          16

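
Both versions of the mode can be checked with a short Python sketch. The function names are illustrative assumptions, and ties are resolved by taking the first most-frequent value or interval, a choice the slides do not address.

```python
from collections import Counter

def mode_raw(scores):
    """Most frequent raw score (first one encountered if there is a tie)."""
    return Counter(scores).most_common(1)[0][0]

def mode_grouped(table, unit=1.0):
    """Midpoint of the interval with the highest frequency."""
    low, high, _ = max(table, key=lambda row: row[2])
    return ((low - unit / 2) + (high + unit / 2)) / 2

# (low, high, frequency) rows of the midterm distribution
table = [(1, 10, 1), (11, 20, 3), (21, 30, 2), (31, 40, 6), (41, 50, 4)]
print(mode_raw([2, 3, 5, 6, 6, 8]))
print(mode_grouped(table))
```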
11
Central Tendency - Medians
  • The median for raw data is simply the score
    at the middle position. This involves taking the
    (N+1)/2 position and stating the value
    attached to it
  • e.g. 2,3,5,6,8: (5+1)/2 → the third position
    score
  • The third position score is 5.
  • e.g. 2,3,5,8: (4+1)/2 → the 2.5 position
    score
  • The 2.5 position score is (3+5)/2 = 4
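
The (N+1)/2 position rule can be sketched as follows. The function name median_raw is an assumption for illustration, and the sketch assumes a non-empty list of scores.

```python
import math

def median_raw(scores):
    """Value at the (N+1)/2 position; average the two neighbours if fractional."""
    ordered = sorted(scores)
    position = (len(ordered) + 1) / 2          # 1-based position
    below = ordered[math.floor(position) - 1]  # score at or just below the position
    above = ordered[math.ceil(position) - 1]   # score at or just above the position
    return (below + above) / 2

print(median_raw([2, 3, 5, 6, 8]))  # odd N: the third score
print(median_raw([2, 3, 5, 8]))     # even N: average of the second and third scores
```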

12
Medians for Grouped Data
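  • The median for grouped data is
  • Median = L + ((N/2 - cf) / f) x w
    where L is the lower real limit of the interval
    containing the median, cf is the cumulative
    frequency below that interval, f is the
    frequency in that interval, and w is its width
  • For our previous distribution of scores,
  • the answer would be
  • 30.5 + ((16/2 - 6)/6) x 10
  • = 30.5 + 3.33 = 33.83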

  • Interval (i)   Frequency (fi)
  • 01-10          1
  • 11-20          3
  • 21-30          2
  • 31-40          6
  • 41-50          4
  • Total          16

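
A minimal sketch of the grouped median, assuming the usual interpolation within the interval that contains the N/2-th score (lower real limit, plus the share of the interval needed to reach N/2, times the width). The function name and the (low, high, frequency) row layout are illustrative.

```python
def median_grouped(table, unit=1.0):
    """Interpolated median from (low, high, frequency) rows in ascending order."""
    n = sum(f for _, _, f in table)
    cumulative = 0  # frequency accumulated below the current interval
    for low, high, f in table:
        if cumulative + f >= n / 2:
            lower = low - unit / 2
            width = (high + unit / 2) - lower
            return lower + ((n / 2 - cumulative) / f) * width
        cumulative += f

table = [(1, 10, 1), (11, 20, 3), (21, 30, 2), (31, 40, 6), (41, 50, 4)]
print(round(median_grouped(table), 2))
```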
13
Central Tendency - Mean
  • For raw data, the mean is simply the sum of the
    values divided by N
  • Suppose Xi = 2,3,5,6
  • The mean would be 16/4 = 4

14
Means for Grouped Data
  • For grouped data, the mean would be the sum of
    the frequencies times midpoints for each
    interval, that sum divided by N
  • For our previous distribution,
  • Interval (i)   Frequency (fi)
  • 01-10          1
  • 11-20          3
  • 21-30          2
  • 31-40          6
  • 41-50          4
  • Total          16
  • the answer would be
    (1(5.5) + 3(15.5) + 2(25.5) + 6(35.5) + 4(45.5)) / 16
    = 498 / 16 = 31.125
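
The grouped mean can be verified with a short sketch. The function name mean_grouped is an assumption, and each midpoint is taken as the average of the stated limits, which equals the midpoint between the real limits.

```python
def mean_grouped(table):
    """Sum of frequency x midpoint over all intervals, divided by N."""
    n = sum(f for _, _, f in table)
    total = sum(f * (low + high) / 2 for low, high, f in table)
    return total / n

table = [(1, 10, 1), (11, 20, 3), (21, 30, 2), (31, 40, 6), (41, 50, 4)]
print(mean_grouped(table))
```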

15
3. Describing Variation
  • Range
  • Mean Deviation
  • Variance
  • Standard Scores (Z score)

16
Describing Variation - Range
  • The Range for raw scores is the highest minus the
    lowest score, plus one (i.e. inclusive)
  • The Range for grouped scores is the upper real
    limit of the highest interval minus the lower
    real limit of the lowest interval. In the case
    of our previous distribution this would be
  • 50.5 - 0.5 = 50

  • Interval (i)   Frequency (fi)
  • 01-10          1
  • 11-20          3
  • 21-30          2
  • 31-40          6
  • 41-50          4
  • Total          16

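
Both range definitions are easy to check in code. In this sketch the function names and the unit parameter are illustrative assumptions; the raw scores are the midterm scores used earlier.

```python
def range_raw(scores, unit=1):
    """Inclusive range: highest minus lowest score, plus one unit."""
    return max(scores) - min(scores) + unit

def range_grouped(table, unit=1.0):
    """Upper real limit of the highest interval minus lower real limit of the lowest."""
    lowest = min(low for low, _, _ in table)
    highest = max(high for _, high, _ in table)
    return (highest + unit / 2) - (lowest - unit / 2)

scores = [9, 13, 18, 19, 22, 25, 31, 34, 35, 36, 36, 38, 41, 43, 44, 45]
table = [(1, 10, 1), (11, 20, 3), (21, 30, 2), (31, 40, 6), (41, 50, 4)]
print(range_raw(scores))
print(range_grouped(table))
```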
17
Describing Variation - Mean Deviation
  • The mean deviation is the sum of all deviations,
    in absolute numbers, divided by N.
  • Consider the set of observations 6,7,9,10. The
    mean is 8 and the MD is
    (|6-8| + |7-8| + |9-8| + |10-8|) / 4 = 6/4 = 1.5
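
A minimal sketch of the mean deviation for raw data (the function name mean_deviation is an assumption for illustration):

```python
def mean_deviation(values):
    """Average absolute deviation of the values from their mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

print(mean_deviation([6, 7, 9, 10]))
```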

18
Mean Deviation for Grouped Data
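  • Again, grouped data implies we substitute
    frequencies and midpoints for values

The calculation below uses midpoints (in thousands)
of 38, 43, 48, 53, 58 and 63 with frequencies of
6, 8, 12, 12, 8 and 4 (N = 50).
The mean would be 50,000 (satisfy yourself that
this is true) and the MD would be
(6|38-50| + 8|43-50| + 12|48-50| + 12|53-50|
+ 8|58-50| + 4|63-50|) / 50
= (72 + 56 + 24 + 36 + 64 + 52) / 50
= 304/50 = 6.080 x 1000 = 6,080
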
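
The grouped mean and mean deviation can be checked with the sketch below. The midpoints and frequencies are read off the arithmetic on this slide (midpoints in thousands); the function names are illustrative assumptions.

```python
def grouped_mean(freqs, midpoints):
    """Sum of frequency x midpoint, divided by N."""
    return sum(f * m for f, m in zip(freqs, midpoints)) / sum(freqs)

def grouped_mean_deviation(freqs, midpoints):
    """Frequency-weighted average absolute deviation from the grouped mean."""
    mean = grouped_mean(freqs, midpoints)
    return sum(f * abs(m - mean) for f, m in zip(freqs, midpoints)) / sum(freqs)

midpoints = [38, 43, 48, 53, 58, 63]  # in thousands
freqs = [6, 8, 12, 12, 8, 4]          # N = 50

print(grouped_mean(freqs, midpoints))            # in thousands
print(grouped_mean_deviation(freqs, midpoints))  # in thousands
```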
19
Variation - The Variance
  • The variance for raw data is the sum of the
    squared deviations divided by N
  • Consider the set Xi = 6,7,9,10. The mean is 8
    and the variance is
    ((6-8)^2 + (7-8)^2 + (9-8)^2 + (10-8)^2) / 4
    = 10/4 = 2.5
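
A short sketch of the raw-data variance, dividing by N as on the slide (the function name variance is an assumption; Python's statistics.pvariance computes the same population variance):

```python
def variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)

print(variance([6, 7, 9, 10]))
```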

20
Variance for Grouped Data
  • Frequencies and midpoints are still
    substituted for the values of Xi.

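Again the mean is 50 and the Variance is
(6(38-50)^2 + 8(43-50)^2 + 12(48-50)^2 + 12(53-50)^2
+ 8(58-50)^2 + 4(63-50)^2) / 50
= (864 + 392 + 48 + 108 + 512 + 676) / 50
= 2600/50 = 52 (in squared thousands), which is
52 x 1000^2 = 52,000,000 in squared original units.
The Standard Deviation is the square root of this:
about 7.211 thousand, or 7,211.
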
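
The grouped variance and standard deviation can be recomputed with the sketch below, using the midpoints (in thousands) and frequencies from this example. The function name is an illustrative assumption. Evaluated directly, the squared-deviation terms sum to 2600, so the variance is 52 in squared thousands.

```python
import math

def grouped_variance(freqs, midpoints):
    """Frequency-weighted sum of squared deviations from the mean, divided by N."""
    n = sum(freqs)
    mean = sum(f * m for f, m in zip(freqs, midpoints)) / n
    return sum(f * (m - mean) ** 2 for f, m in zip(freqs, midpoints)) / n

midpoints = [38, 43, 48, 53, 58, 63]  # in thousands
freqs = [6, 8, 12, 12, 8, 4]          # N = 50

var_k = grouped_variance(freqs, midpoints)  # squared thousands
sd_k = math.sqrt(var_k)                     # thousands
print(var_k, round(sd_k, 3))
```

Because the variance is in squared units, converting from thousands back to original units multiplies it by 1000 squared, while the standard deviation is multiplied by 1000.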
21
4. Covariance and Correlation
  • The Definition and Concept
  • The Formula
  • Proportional Reduction in Error and r^2

22
Correlation - Definition and Concept
  • Visually, we can observe the co-variation of
    two variables in a scatter diagram, where the
    abscissa and ordinate are the quantitative
    continua and each point simultaneously
    maps one pair of scores.

23
Correlation - Formula
  • Think of the correlation as a proportional
    measure of the relationship between two
    variables. It consists of the co-variation
    divided by the average variation
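
One way to read "co-variation divided by the average variation" is the covariance divided by the square root of the product of the two variances (their geometric mean), which is Pearson's r. The sketch below takes that reading as an assumption; the function name and the small data set are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Covariance of x and y divided by sqrt(variance of x * variance of y)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    return cov / math.sqrt(var_x * var_y)

xs = [1, 2, 3, 4, 5]  # hypothetical scores
ys = [2, 4, 5, 4, 5]  # hypothetical scores
print(round(pearson_r(xs, ys), 3))
```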

24
Correlation and P.R.E.
Consider this scatter diagram. The proportion of
variation around the Y mean (variation before
knowing X), less the proportion of variation
around the regression line (variation after
knowing X), is r^2.
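
The proportional-reduction-in-error idea can be illustrated numerically. This sketch assumes an ordinary least-squares regression line and a small hypothetical data set; the function name is illustrative.

```python
def proportional_reduction_in_error(xs, ys):
    """1 - (variation around the regression line) / (variation around the Y mean)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx                   # least-squares slope
    intercept = mean_y - slope * mean_x
    total = sum((y - mean_y) ** 2 for y in ys)  # before knowing X
    error = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))  # after knowing X
    return 1 - error / total

xs = [1, 2, 3, 4, 5]  # hypothetical scores
ys = [2, 4, 5, 4, 5]  # hypothetical scores
print(round(proportional_reduction_in_error(xs, ys), 3))
```

For a least-squares line, this value equals the square of Pearson's r for the same data.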
25
Partial Correlation
IV. Quantitative Statistical Example of
Elaboration
Step 1: Construct the zero-order Pearson's
correlations (r).
Assume rxy = .55, where x = divorce rates and y =
suicide rates.
Further, assume that unemployment rates (z) are
our control variable and that rxz = .60 and ryz =
.40.
Step 2: Calculate the partial correlation
(rxy.z)
rxy.z = (rxy - (rxz)(ryz)) / sqrt((1 - rxz^2)(1 - ryz^2))
      = (.55 - (.60)(.40)) / sqrt((1 - .36)(1 - .16))
      = .31 / .733 = .42
Before z: (rxy)^2 = .30
After z: (rxy.z)^2 = .18
Step 3: Draw conclusions
Therefore, z accounts for (.30 - .18) or 12% of the
variation in y and (.12/.30) or 40% of the
relationship between x and y.
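
The partial correlation in this example can be reproduced with the standard first-order partial correlation formula, which is assumed here because the slide's own formula did not survive in the transcript. The function name is illustrative.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

r_xy, r_xz, r_yz = 0.55, 0.60, 0.40
r_xy_z = partial_correlation(r_xy, r_xz, r_yz)

before = r_xy ** 2   # variation explained before controlling for z
after = r_xy_z ** 2  # variation explained after controlling for z
print(round(r_xy_z, 2), round(before, 2), round(after, 2))
```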