Title: Chapter 15 Differences Between Groups and Relationships Among Variables
1Chapter 15: Differences Between Groups and Relationships Among Variables
2LEARNING OUTCOMES
After studying this chapter, you should be able to
- Understand what multivariate statistical analysis involves and know the two types of multivariate analysis
- Interpret results from multiple regression analysis
- Interpret results from multivariate analysis of variance (MANOVA)
- Interpret basic exploratory factor analysis results
3What Is the Appropriate Test of Difference?
- Test of Differences
  - An investigation of a hypothesis that two (or more) groups differ with respect to measures on a variable.
  - Measures may capture behavior, characteristics, beliefs, opinions, emotions, or attitudes.
- Bivariate Tests of Differences
  - Involve only two variables: a variable that acts like a dependent variable and a variable that acts as a classification variable.
  - Examine differences in mean scores between groups, or compare how two groups' scores are distributed across possible response categories.
4EXHIBIT 15.1 Choosing the Right Statistic
5EXHIBIT 15.1 Choosing the Right Statistic (contd)
6Common Bivariate Tests
Type of Measurement | Differences between two independent groups | Differences among three or more independent groups
Interval and ratio  | Independent groups: t-test or Z-test       | One-way ANOVA
Ordinal             | Mann-Whitney U-test, Wilcoxon test         | Kruskal-Wallis test
Nominal             | Z-test (two proportions), Chi-square test  | Chi-square test
7Cross-Tabulation Tables: The χ² Test for Goodness-of-Fit
- Cross-Tabulation (Contingency) Table
  - A joint frequency distribution of observations on two or more variables.
- χ² Distribution
  - Provides a means for testing the statistical significance of a contingency table.
  - Involves comparing observed frequencies (Oi) with expected frequencies (Ei) in each cell of the table.
  - Captures the goodness- (or closeness-) of-fit of the observed distribution with the expected distribution.
8Example: Papa John's Restaurants
Univariate Hypothesis: Papa John's restaurants are more likely to be located in a stand-alone location or in a shopping center.
Bivariate Hypothesis: Stand-alone locations are more likely to be profitable than are shopping center locations.
9Chi-Square Test
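$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$
where χ² = chi-square statistic, Oi = observed frequency in the ith cell, and Ei = expected frequency in the ith cell.
$$E_{ij} = \frac{R_i C_j}{n}$$
where Ri = total observed frequency in the ith row, Cj = total observed frequency in the jth column, and n = sample size.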
10Degrees of Freedom (d.f.)
d.f. = (R - 1)(C - 1), where R = number of rows and C = number of columns in the contingency table.
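The chi-square computation can be sketched in a few lines of Python. This is only an illustration: the cell counts below are hypothetical (they are not the Papa John's data from the text), and the function name `chi_square` is an assumption made for this example. Each expected frequency is the row total times the column total divided by n, and the degrees of freedom are (R - 1)(C - 1).

```python
def chi_square(observed):
    """Return (chi2, df, expected) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    # Expected frequency for each cell: (row total * column total) / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Sum (O - E)^2 / E over every cell of the table
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


# Hypothetical counts. Rows: stand-alone, shopping center.
# Columns: profitable, not profitable.
table = [[50, 10], [30, 30]]
chi2, df, expected = chi_square(table)
print(chi2, df)  # compare chi2 with the critical value 3.841 (d.f. = 1, alpha = .05)
```

With these made-up counts the statistic exceeds the critical value, so the hypothesis of no relationship between location type and profitability would be rejected at the .05 level.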
11The t-Test for Comparing Two Means
- Independent Samples t-Test
  - A test of the hypothesis that the mean scores on some interval- or ratio-scaled variable differ between groups formed by some less-than-interval classificatory variable.
12The t-Test for Comparing Two Means (contd)
- Determining when an independent samples t-test is appropriate
  - Is the dependent variable interval or ratio?
  - Can the dependent variable scores be grouped based upon some categorical variable?
  - Does the grouping result in scores drawn from independent samples?
  - Are two groups involved in the research question?
13The t-Test for Comparing Two Means (contd)
- Pooled Estimate of the Standard Error
  - An estimate of the standard error for a t-test of independent means that assumes the variances of both groups are equal.
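A minimal sketch of the pooled standard error and the resulting t statistic, assuming equal group variances as the definition above requires. The function name `pooled_t` is hypothetical, and the two small samples are illustrative values only (the Day-shift and Night-shift coffee figures used in the ANOVA illustration elsewhere in this chapter).

```python
import math
from statistics import mean, variance


def pooled_t(group1, group2):
    """Independent samples t-test using a pooled estimate of the standard error."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2

    # Pooled variance: weighted average of the two sample variances
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df

    # Pooled standard error of the difference between the two means
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

    t = (mean(group1) - mean(group2)) / se
    return t, df, se


day = [1, 3, 4, 0, 2]      # illustrative scores, group 1
night = [6, 8, 3, 7, 6]    # illustrative scores, group 2
t, df, se = pooled_t(day, night)
print(round(t, 3), df)     # compare |t| with the critical t for df degrees of freedom
```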
14The t-Test for Comparing Two Means (contd)
15EXHIBIT 15.2 Independent Samples t-Test Results
16What Is ANOVA?
- Analysis of Variance (ANOVA)
  - An analysis involving the investigation of the effects of one treatment variable on an interval-scaled dependent variable.
  - A hypothesis-testing technique to determine whether statistically significant differences in means occur between two or more groups.
  - A method of comparing variances to make inferences about the means.
  - ANOVA tests whether grouping observations explains variance in the dependent variable.
17Simple Illustration of ANOVA
- How much coffee respondents report drinking each
day based on which shift they work (GY stands for
Graveyard shift).
Shift | Cups of coffee per day (one value per respondent)
Day   | 1, 3, 4, 0, 2
GY    | 7, 2, 1, 6
Night | 6, 8, 3, 7, 6
18EXHIBIT 15.3 Illustration of ANOVA Logic
19Partitioning Variance in ANOVA
- Total Variability
  - Grand mean
    - The mean of a variable over all observations.
  - SST
    - The total observed variation across all groups and individual observations.
    - SST = Total of (observed value - grand mean)²
20Partitioning Variance in ANOVA
- Between-groups Variance
  - The sum of differences between the group mean and the grand mean summed over all groups for a given set of observations.
- SSB
  - Systematic variation of scores between groups due to manipulation of an experimental variable or group classifications of a measured independent variable; also called between-group variance.
  - SSB = Total of n_group × (group mean - grand mean)²
21Partitioning Variance in ANOVA
- Within-group Error or Variance
  - The sum of the differences between observed values and the group mean for a given set of observations; also known as total error variance.
- SSE
  - Variation of scores due to random error, or within-group variance due to individual differences from the group mean.
  - This is the error of prediction.
  - SSE = Total of (observed value - group mean)²
22The F-Test
- F-Test
  - Is used to determine whether there is more variability in the scores of one sample than in the scores of another sample.
  - Variance components are used to compute F-ratios.
    - SSE, SSB, SST
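The variance partition and the F-ratio can be traced step by step with the coffee-drinking figures from the simple ANOVA illustration (Day: 1, 3, 4, 0, 2; GY: 7, 2, 1, 6; Night: 6, 8, 3, 7, 6). The sketch below is an assumption about how one might code the calculation, not the textbook's own procedure; variable names are chosen for this example.

```python
from statistics import mean

# Cups of coffee per day, grouped by shift (GY = graveyard shift)
groups = {
    "Day": [1, 3, 4, 0, 2],
    "GY": [7, 2, 1, 6],
    "Night": [6, 8, 3, 7, 6],
}

all_scores = [x for scores in groups.values() for x in scores]
grand_mean = mean(all_scores)

# SST: every observation against the grand mean
sst = sum((x - grand_mean) ** 2 for x in all_scores)

# SSB: each group mean against the grand mean, weighted by group size
ssb = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in groups.values())

# SSE: every observation against its own group mean
sse = sum((x - mean(s)) ** 2 for s in groups.values() for x in s)

df_between = len(groups) - 1               # number of groups - 1
df_within = len(all_scores) - len(groups)  # observations - number of groups

# F-ratio: between-groups mean square over within-groups mean square
f_ratio = (ssb / df_between) / (sse / df_within)
print(sst, ssb, sse, round(f_ratio, 2))
```

Note that SSB and SSE add up to SST, which is the sense in which ANOVA partitions the total variability.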
23EXHIBIT 15.4 Interpreting ANOVA
24Correlation Coefficient Analysis
- Correlation coefficient
  - A statistical measure of the covariation, or association, between two at-least interval variables.
- Covariance
  - Extent to which two variables are associated systematically with each other.
25Simple Correlation Coefficient
- Correlation coefficient (r)
  - Ranges from +1 to -1
  - Perfect positive linear relationship = +1
  - Perfect negative (inverse) linear relationship = -1
  - No correlation = 0
- Correlation coefficient for two variables (X,Y)
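For reference, the standard Pearson formula for the correlation between X and Y can be written as follows. The notation (sample means written with bars, n paired observations indexed by i) is an assumption of this sketch, since the slide's own formula did not survive in the text.

```latex
r_{xy} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}
              {\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \; \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}
```

The numerator captures how X and Y covary; the denominator rescales that covariation so the result always falls between -1 and +1.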
26Correlation, Covariance, and Causation
- When two variables covary, they display concomitant variation.
- This systematic covariation does not in and of itself establish causality.
- Example: a rooster's crow and the rising of the sun
  - The rooster does not cause the sun to rise.
27Coefficient of Determination
- Coefficient of Determination (R²)
  - A measure obtained by squaring the correlation coefficient; the proportion of the total variance of a variable accounted for by knowing the value of another variable.
  - Measures that part of the total variance of Y that is accounted for by knowing the value of X.
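As a small worked illustration, the correlation coefficient and the coefficient of determination can be computed directly from paired scores. The data below are hypothetical and the helper name `pearson_r` is an assumption made for this sketch.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # covariation of X and Y
    sxx = sum((a - mx) ** 2 for a in x)                   # variation in X
    syy = sum((b - my) ** 2 for b in y)                   # variation in Y
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
r_squared = r ** 2  # share of the variance in Y accounted for by X
print(round(r, 4), round(r_squared, 2))
```

Here a correlation of roughly .77 corresponds to about 60 percent of the variance in Y being accounted for by X, which shows why squaring r gives a more conservative picture than r itself.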
28Regression Analysis
- Simple (Bivariate) Linear Regression
  - A measure of linear association that investigates straight-line relationships between a continuous dependent variable and an independent variable that is usually continuous, but can be a categorical dummy variable.
- The Regression Equation (Y = α + βX)
  - Y = the continuous dependent variable
  - X = the independent variable
  - α = the Y intercept (where the regression line intercepts the Y axis)
  - β = the slope coefficient (rise over run)
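A minimal least-squares sketch of estimating the intercept and slope of the regression equation from data. The observations are hypothetical, and `fit_line` is a name invented for this example; the slope is the covariation of X and Y divided by the variation in X, and the intercept follows from the two means.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares estimates of the intercept and slope of Y on X."""
    mx, my = mean(x), mean(y)
    slope = (
        sum((a - mx) * (b - my) for a, b in zip(x, y))
        / sum((a - mx) ** 2 for a in x)
    )
    intercept = my - slope * mx  # the line passes through (mean X, mean Y)
    return intercept, slope


# Hypothetical observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

intercept, slope = fit_line(x, y)
predicted_at_6 = intercept + slope * 6  # forecast of Y when X = 6
print(round(intercept, 2), round(slope, 2), round(predicted_at_6, 2))
```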
29The Regression Equation
- Parameter Estimate Choices
  - β is indicative of the strength and direction of the relationship between the independent and dependent variable.
  - α (Y intercept) is a fixed point that is considered a constant (how much Y can exist without X).
- Standardized Regression Coefficient (β)
  - Estimated coefficient of the strength of relationship between the independent and dependent variables.
  - Expressed on a standardized scale where higher absolute values indicate stronger relationships (range is from -1 to 1).
30EXHIBIT 15.5 The Advantage of Standardized
Regression Weights
31The Regression Equation (contd)
- Parameter Estimate Choices (contd)
  - Raw regression estimates (b1)
    - Raw regression weights have the advantage of retaining the scale metric, which is also their key disadvantage.
    - If the purpose of the regression analysis is forecasting, then raw parameter estimates must be used.
    - This is another way of saying when the researcher is interested only in prediction.
  - Standardized regression estimates (β1)
    - Standardized regression estimates have the advantage of a constant scale.
    - Standardized regression estimates should be used when the researcher is testing explanatory hypotheses.
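The trade-off between raw and standardized estimates can be seen by changing the units of the independent variable. In this sketch (hypothetical data, invented function names) the raw slope shrinks by a factor of 100 when X is multiplied by 100, while the standardized slope, computed as the raw slope times the ratio of the standard deviations of X and Y, stays the same.

```python
from statistics import mean, stdev


def raw_slope(x, y):
    """Raw (unstandardized) least-squares slope of Y on X."""
    mx, my = mean(x), mean(y)
    return (
        sum((a - mx) * (b - my) for a, b in zip(x, y))
        / sum((a - mx) ** 2 for a in x)
    )


def standardized_slope(x, y):
    """Standardized slope: raw slope rescaled by sd(X) / sd(Y)."""
    return raw_slope(x, y) * stdev(x) / stdev(y)


# Hypothetical observations; x_rescaled expresses X in units 100 times smaller
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
x_rescaled = [100 * v for v in x]

b_raw = raw_slope(x, y)
b_raw_rescaled = raw_slope(x_rescaled, y)
beta = standardized_slope(x, y)
beta_rescaled = standardized_slope(x_rescaled, y)
print(b_raw, b_raw_rescaled, round(beta, 4), round(beta_rescaled, 4))
```

The raw estimate is what a forecaster needs because it stays in the original units; the standardized estimate is what allows comparison across variables measured on different scales.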
32Multiple Regression Analysis
- Multiple Regression Analysis
  - An analysis of association in which the effects of two or more independent variables on a single, interval-scaled dependent variable are investigated simultaneously.
- Dummy variable
  - The way a dichotomous (two group) independent variable is represented in regression analysis by assigning a 0 to one group and a 1 to the other.
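To make the two definitions above concrete, here is a small sketch that fits a multiple regression with one continuous predictor and one 0/1 dummy variable by solving the normal equations. Everything in it is an assumption for illustration: the function names, the six hypothetical stores, and the sales figures, which are constructed without noise from known coefficients so that the fit can be checked by eye.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    design = [[1.0] + list(r) for r in rows]  # prepend the constant term
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical stores: (competitor sales, dummy = 1 if a sales office is present)
predictors = [(10, 0), (20, 1), (30, 0), (40, 1), (50, 0), (60, 1)]
# Sales built from known coefficients (100, -0.5, 15) with no random error
sales = [100 - 0.5 * comp + 15 * office for comp, office in predictors]

coefs = fit_ols(predictors, sales)
print([round(c, 3) for c in coefs])
```

The coefficient on the dummy variable is read as the difference in the dependent variable between the group coded 1 and the group coded 0, holding the other predictor constant.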
33Multiple Regression Analysis (contd)
- A Simple Example
  - Assume that a toy manufacturer wishes to explain store sales (dependent variable) using a sample of stores from Canada and Europe.
  - Several hypotheses are offered:
    - H1: Competitors' sales are related negatively to sales.
    - H2: Sales are higher in communities with a sales office than when no sales office is present.
    - H3: Grammar school enrollment in a community is related positively to sales.
34Multiple Regression Analysis (contd)
- Statistical Results of the Multiple Regression
  - Regression Equation
  - Coefficient of multiple determination (R²) = 0.845
  - F-value = 14.6, p < .05
35Multiple Regression Analysis (contd)
- Regression Coefficients in Multiple Regression
  - Partial correlation
    - The correlation between two variables after taking into account the fact that they are correlated with other variables too.
- R² in Multiple Regression
  - The coefficient of multiple determination in multiple regression indicates the percentage of variation in Y explained by all independent variables.
36Multiple Regression Analysis (contd)
- Coefficients of Partial Regression
  - bn
    - Independent variables correlated with one another
    - The percentage of variance in the dependent variable that is explained by a single independent variable, holding other independent variables constant
  - R²
    - The percentage of variance in the dependent variable that is explained by the variation in the independent variables.
37Multiple Regression Analysis (contd)
- Statistical Significance in Multiple Regression
  - F-test
    - Tests statistical significance by comparing the variation explained by the regression equation to the residual error variation.
    - Allows for testing of the relative magnitudes of the sum of squares due to the regression (SSR) and the error sum of squares (SSE).
38Multiple Regression Analysis (contd)
- Degrees of Freedom (d.f.)
  - k = number of independent variables
  - n = number of observations or respondents
- Calculating Degrees of Freedom (d.f.)
  - d.f. for the numerator = k
  - d.f. for the denominator = n - k - 1
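Because R² is SSR divided by the total sum of squares, the model F-ratio can be recovered from R², n, and k alone. The sketch below uses the R² of 0.845 reported for the toy example and assumes k = 3 independent variables (one per hypothesis); the sample size n = 12 is purely hypothetical, since the number of stores is not stated here, and `regression_f` is a name invented for this example.

```python
def regression_f(r_squared, n, k):
    """Model F-ratio and its degrees of freedom from R-squared, n, and k."""
    df_num = k            # d.f. for the numerator
    df_den = n - k - 1    # d.f. for the denominator
    # (explained variance per predictor) / (unexplained variance per residual d.f.)
    f_value = (r_squared / df_num) / ((1 - r_squared) / df_den)
    return f_value, df_num, df_den


# R-squared from the toy example; n = 12 is an assumed sample size, not a reported one
f_value, df_num, df_den = regression_f(0.845, n=12, k=3)
print(round(f_value, 2), df_num, df_den)
```

The resulting F is then compared with the critical F value for (k, n - k - 1) degrees of freedom to judge whether the model as a whole is statistically significant.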
39F-test
40EXHIBIT 15.4 Interpreting Multiple Regression Results
41Steps in Interpreting a Multiple Regression Model
- Examine the model F-test.
- Examine the individual statistical tests for each parameter estimate.
- Examine the model R².
- Examine collinearity diagnostics.
42Other Multivariate Techniques
- Multivariate Data Analysis
  - A group of statistical techniques allowing for the simultaneous analysis of three or more variables.
- Multivariate Techniques
  - Exploratory factor analysis
  - Confirmatory factor analysis
  - Multivariate analysis of variance (MANOVA)
  - Multiple discriminant analysis
  - Cluster analysis
43Key Terms and Concepts
- Cross-tabulation (contingency table)
- Univariate analysis
- Bivariate analysis
- Analysis of variance (ANOVA)
- Grand mean
- Between-groups variance
- Within-group error or variance
- F-test
- Within-group variation
- Between-group variance
- Total variability (SST)
- Correlation coefficient
- Coefficient of determination (r²)
- Simple linear regression
- Standardized regression coefficient (ß)
- Multiple regression analysis
- Multivariate data analysis