Title: Data Preparation
1Chapter 14
- Data Preparation and Basic Data Analysis
2Some types of Variable Respecification (1)
- Reversal of reverse coding
- take number of alternatives plus one, minus the response
- for a 7-point scale, 8 - score
- Standardization
- equalizing the scale of multiple variables to allow direct comparison
- use the z transformation
- $z = (x - \bar{x}) / s_x$, where $\bar{x}$ is the mean and $s_x$ the standard deviation of x
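The two respecifications on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the SPSS procedure: the function names `reverse_code` and `z_scores` and the sample values are assumptions made for the example, and the standard deviation is taken to be the sample standard deviation (n - 1 denominator).

```python
import statistics


def reverse_code(response, n_alternatives):
    """Reverse a response on a 1..n_alternatives scale (e.g. 8 - score for 7 points)."""
    return n_alternatives + 1 - response


def z_scores(values):
    """Standardize values: subtract the mean, divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # assumption: sample (n - 1) standard deviation
    return [(x - mean) / sd for x in values]


# Hypothetical responses on a 7-point scale
responses = [1, 4, 7]
reversed_responses = [reverse_code(r, 7) for r in responses]

# Hypothetical raw scores to standardize
standardized = z_scores([2, 4, 6])
```

After standardization each variable has mean 0 and standard deviation 1, which is what allows variables measured on different scales to be compared directly.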
3Some types of Variable Respecification (2)
- Additive scale construction
- adding scale variables that represent a common dimension
- e.g. summing five seven-point semantic differential items to create one scale ranging from 5 to 35, with greater reliability than any single item
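As a rough sketch of additive scale construction, the snippet below sums one respondent's item scores after checking that each lies on the item scale. The function name `additive_scale` and the scores are hypothetical, and the sketch assumes any reverse-worded items have already been reversed.

```python
def additive_scale(item_scores, n_points=7):
    """Sum item scores that each lie on a 1..n_points scale."""
    for score in item_scores:
        if not 1 <= score <= n_points:
            raise ValueError(f"score {score} is outside 1..{n_points}")
    return sum(item_scores)


# Hypothetical respondent: five seven-point semantic differential items
respondent = [6, 5, 7, 4, 6]
total = additive_scale(respondent)

# With five seven-point items the summed scale can range from 5 to 35
lowest, highest = 5 * 1, 5 * 7
```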
4Some types of Variable Respecification (3)
- Dummy coding
- creating binary (two-class) variables from multi-class categorical data
- requires k - 1 dummy variables, where k is the number of original categories
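Dummy coding might look like the following in plain Python. The helper `dummy_code`, the `is_` naming of the new variables, and the region data are all assumptions for illustration; the category chosen as the reference gets no dummy variable of its own, which is why k categories need only k - 1 dummies.

```python
def dummy_code(values, reference=None):
    """Return one dict of 0/1 dummy variables per observation (k - 1 dummies)."""
    categories = sorted(set(values))
    if reference is None:
        reference = categories[0]  # assumption: first category is the baseline
    kept = [c for c in categories if c != reference]
    return [{f"is_{c}": int(v == c) for c in kept} for v in values]


# Hypothetical categorical variable with k = 3 categories
regions = ["north", "south", "west", "south"]
dummies = dummy_code(regions, reference="north")
```

An observation in the reference category is identified by all of its dummy variables being 0.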
5Frequency Distributions (one-way tabulations)
- The count, or frequency of occurrence, of the alternative responses for a variable
- often produced with a plot (e.g. box or pie)
- SPSS: Statistics/Summarize/Frequencies
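Outside SPSS, a one-way tabulation can be approximated with the standard library. The sketch below is only an illustration: `frequency_table` and the response data are hypothetical, and the output is a count and a percentage per response alternative.

```python
from collections import Counter


def frequency_table(responses):
    """Map each response alternative to (count, percentage of all responses)."""
    counts = Counter(responses)
    n = len(responses)
    return {value: (counts[value], 100 * counts[value] / n)
            for value in sorted(counts)}


# Hypothetical responses to a three-alternative question
table = frequency_table([1, 2, 2, 3, 3, 3, 3, 2])

for value, (count, percent) in table.items():
    print(f"{value}: {count} ({percent:.1f}%)")
```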
6SPSS-Frequencies
7SPSS-Frequencies--Output
8Descriptive Statistics
- mean
- median
- mode
- range
- interquartile range
- variance and standard deviation
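The statistics listed above can all be computed with Python's `statistics` module, as in this sketch. The function `describe` and the scores are hypothetical. Two assumptions are worth flagging: variance and standard deviation use the sample (n - 1) formulas, and quartiles use the module's default "exclusive" method, which may differ slightly from the quartile rule SPSS applies.

```python
import statistics


def describe(values):
    """Return the basic descriptive statistics for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "variance": statistics.variance(values),  # sample variance
        "std_dev": statistics.stdev(values),      # sample standard deviation
    }


# Hypothetical metric data
scores = [2, 4, 4, 5, 7, 9, 11]
summary = describe(scores)
```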
9SPSS Descriptive Statistics
10SPSS Descriptive Statistics
11Summary of Summary Statistics and Types of Data
- Non-metric Data (Nominal and Ordinal)
- frequencies
- mode
- median (ordinal only)
- range and interquartile range (ordinal only)
- Metric Data (Interval and Ratio)
- mean
- median
- mode
- standard deviation
- range and interquartile range
12Cross-tabulations (contingency tables)
- The simultaneous frequency distribution of the alternative responses on two or more variables
- Most popular method of studying associations
- Simplest form is two-way (two variables)
- Usually translated into percentages
- A third variable often provides more insight
- SPSS: Statistics/Summarize/Crosstabs
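A two-way cross-tabulation and its translation into percentages can be sketched as follows. The functions `crosstab` and `row_percentages` and the income and car data are invented for illustration and are not the figures from the transparencies. Row percentages are shown; column percentages follow the same pattern with the roles of the two variables swapped.

```python
from collections import Counter


def crosstab(rows, cols):
    """Count how often each (row level, column level) pair occurs."""
    counts = Counter(zip(rows, cols))
    row_levels = sorted(set(rows))
    col_levels = sorted(set(cols))
    return {r: {c: counts[(r, c)] for c in col_levels} for r in row_levels}


def row_percentages(table):
    """Convert each row of counts into percentages of that row's total."""
    result = {}
    for r, cells in table.items():
        total = sum(cells.values())
        result[r] = {c: 100 * n / total for c, n in cells.items()}
    return result


# Hypothetical data: family income level and number of cars owned
income = ["low", "low", "low", "high", "high", "high", "high", "low"]
cars = ["0-1", "0-1", "2+", "2+", "2+", "0-1", "2+", "0-1"]

counts = crosstab(income, cars)
percents = row_percentages(counts)
```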
13Transparency 89 Family Income and Number of Cars Family Owns
14Transparency 90 Number of Cars by Family Income
15Transparency 91 Family Income by Number of Cars
16Transparency 93 Number of Cars by Size of Family
17Transparency 95 Number of Cars by Income and Size of Family
18Addition of a Third Variable in a Cross-tab (1)
- refine association, i.e., show that the relationship is more pronounced for one level of a third factor than for another
- reveal suppressed association, i.e., show that an apparent lack of association in aggregated data arises because the association is contingent upon the levels of a third variable
19Addition of a Third Variable in a Cross-tab (2)
- Reveal spurious relationship, i.e., an apparent relationship is actually driven by a third variable related to both of the original variables
- Indicate no change in original relationship
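The effect of adding a third variable can be illustrated with a small constructed example of a suppressed association. The cell counts below are hypothetical and deliberately chosen so that the aggregated two-way table shows no association while each level of the third variable shows a clear but opposite pattern. The names `aggregate` and `by_level` are assumptions for the sketch.

```python
from collections import Counter

# Hypothetical counts: (third variable, row variable, column variable) -> frequency
cell_counts = {
    ("small", "low", "few"): 3, ("small", "low", "many"): 1,
    ("small", "high", "few"): 1, ("small", "high", "many"): 3,
    ("large", "low", "few"): 1, ("large", "low", "many"): 3,
    ("large", "high", "few"): 3, ("large", "high", "many"): 1,
}


def aggregate(counts):
    """Collapse over the third variable to get the two-way table."""
    table = Counter()
    for (_, row, col), n in counts.items():
        table[(row, col)] += n
    return dict(table)


def by_level(counts):
    """Split into one two-way table per level of the third variable."""
    tables = {}
    for (level, row, col), n in counts.items():
        tables.setdefault(level, {})[(row, col)] = n
    return tables


two_way = aggregate(cell_counts)   # every cell equals 4: no visible association
three_way = by_level(cell_counts)  # each level shows an association
```

In the "small" table high values of the row variable go with "many", while in the "large" table they go with "few"; the two patterns cancel when the data are pooled.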
20SPSS Cross-tab Cells
21SPSS Cross-tab Output
22Simple Bar Chart
23Simple Bar Chart
24Simple Bar Chart--Output
25Simple Bar Chart--missing data
26Two-way Bar-chart
27Two-way Bar-chart--Output
28Clustered (three-level) bar-chart (1)
29Clustered (three-level) bar-chart (2)
30Clustered (three-level) bar-chart--Output
31Chi-square (1)
- A general test of association
- Sum, over all cells in the cross-tab, of the squared difference between observed and expected cell frequencies divided by the expected frequency: $\chi^2 = \sum (O - E)^2 / E$
- Null: no association
- df (for table) = (number of rows - 1) x (number of columns - 1)
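The chi-square statistic and its degrees of freedom can be computed by hand from a table of observed frequencies, as in this sketch. Expected frequencies are taken as row total times column total divided by the grand total, which is the usual expectation under the null of no association. The function names and the observed counts are hypothetical, and the sketch stops at the statistic and df without looking up a p-value.

```python
def expected_frequencies(observed):
    """Expected cell counts under no association: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]


def chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for a two-way table."""
    expected = expected_frequencies(observed)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


# Hypothetical 2 x 2 table of observed frequencies (not percentages)
observed = [[30, 10],
            [10, 30]]
statistic, df = chi_square(observed)
```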
32Chi-square (2)
- Should not use
- with percentages--use frequencies only
- if any expected cell frequency is 0, or if more than 20 percent of expected cell frequencies are less than 5
- SPSS: from Cross-tabs or Non-Parametric/Chi-square
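The rule of thumb about expected cell frequencies can be turned into a simple check. This is a sketch under the assumption that the rule is applied to a table of expected frequencies that has already been computed; the function name `chi_square_is_appropriate` and the example tables are hypothetical.

```python
def chi_square_is_appropriate(expected):
    """Apply the rule of thumb to a table of expected cell frequencies.

    Returns False if any expected frequency is 0, or if more than
    20 percent of the expected frequencies are below 5.
    """
    cells = [e for row in expected for e in row]
    if any(e == 0 for e in cells):
        return False
    share_small = sum(e < 5 for e in cells) / len(cells)
    return share_small <= 0.20


# Hypothetical expected-frequency tables
ok_table = [[20.0, 20.0], [20.0, 20.0]]    # no small cells
sparse_table = [[2.0, 8.0], [18.0, 72.0]]  # 1 of 4 cells (25%) below 5
empty_cell_table = [[0.0, 10.0], [10.0, 20.0]]

results = [chi_square_is_appropriate(t)
           for t in (ok_table, sparse_table, empty_cell_table)]
```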