Title: Introduction to Inferential Statistics
1. Introduction to Inferential Statistics
2. Inferential statistics
- So far we've assessed relationships between variables in two ways:
  - Categorical variables: tables and proportions (percentages)
  - Continuous variables: scattergrams and simple correlation (r)
- Inferential statistics are an extension of these procedures
- They provide far more precise assessments of relationships
  - Examples: Higher rank → more stress; higher income → less crime (e.g., a correlation of r = -.6 implies r² = .36; a short code sketch follows)
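To make the r and r² idea concrete, here is a minimal sketch (my own illustration, not from the slides) that computes a simple correlation with SciPy; the rank and stress numbers are invented for demonstration only.

```python
import numpy as np
from scipy import stats

# Hypothetical data: rank (1 = lowest) and a stress score for eight people
rank = np.array([1, 2, 3, 4, 5, 6, 7, 8])
stress = np.array([30, 34, 31, 38, 40, 37, 44, 46])

r, p_value = stats.pearsonr(rank, stress)   # simple correlation (r)
r_squared = r ** 2                          # proportion of shared variance (r^2)
print(f"r = {r:.2f}, r^2 = {r_squared:.2f}, p = {p_value:.3f}")
```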
3. Using inferential statistics
- Examples of inferential statistics (a brief code sketch follows this list)
  - Categorical variables: Chi-Square (χ²)
  - Combination of a categorical dependent and a continuous independent variable: difference between the means test (t statistic)
  - Continuous variables: correlation and regression (r and r²) can be used inferentially; the b statistic is generated through regression analysis
  - Combination of nominal and continuous variables: logistic regression, which generates b and exp(b) (odds ratio) statistics
- Requirements
  - Must use probability sampling techniques (e.g., random sampling)
  - Parametric inferential statistics, including r, r², b and t: variables must be continuous and approximately normally distributed in the population
  - Non-parametric statistics: variables need not be normally distributed. We will cover one, Chi-Square (χ²).
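As a rough illustration of the tests listed above, the sketch below runs a Chi-Square test, a difference-between-means test, and a logistic regression on invented data using SciPy and statsmodels; all variable names and values are my own assumptions, not from the lecture.

```python
import numpy as np
from scipy import stats
import statsmodels.api as sm

# Chi-Square (χ²): two categorical variables summarized as a 2x2 frequency table
observed = np.array([[20, 30],
                     [35, 15]])
chi2, p_chi, dof, expected = stats.chi2_contingency(observed)

# Difference between means (t): a dichotomous IV defines two groups, DV is continuous
rng = np.random.default_rng(0)
group_a = rng.normal(50, 10, 40)
group_b = rng.normal(55, 10, 40)
t_stat, p_t = stats.ttest_ind(group_a, group_b)

# Logistic regression: dichotomous DV, continuous IV -> b and exp(b) (odds ratio)
x = rng.normal(0, 1, 200)
y = (rng.random(200) < 1 / (1 + np.exp(-x))).astype(int)
logit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
b = logit.params[1]            # slope for x
odds_ratio = np.exp(b)         # exp(b)

print(f"chi2 = {chi2:.2f} (p={p_chi:.3f}), t = {t_stat:.2f} (p={p_t:.3f}), "
      f"b = {b:.2f}, exp(b) = {odds_ratio:.2f}")
```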
4. General procedure
- Types of hypotheses
  - Working hypothesis: what a regular hypothesis is called
  - Null hypothesis: a fixed presumption that any observed relationship between two variables is caused by chance
- Draw one or more samples and code the independent and dependent variables
- Use a test statistic (e.g., r) to assess the hypothesized relationship
- The computer calculates a coefficient for the test statistic (e.g., r = .21)
- These coefficients are the sum of two components:
  - Systematic variance: the actual, systematic relationship between the variables
  - Error variance: an apparent relationship caused by sampling error. The size of this component can be precisely calculated and shrinks as sample size increases (see the simulation sketch below).
- The big question: once we remove the error component, is there enough of a real relationship left to reject the null hypothesis?
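The following simulation sketch (my own illustration, not part of the slides) shows the error-variance idea: when two variables are actually unrelated, the correlations produced purely by sampling error shrink as the sample size grows.

```python
import numpy as np

rng = np.random.default_rng(0)

for n in (25, 100, 1000):
    # 2,000 samples drawn from a population in which the true r is 0
    chance_rs = [abs(np.corrcoef(rng.normal(size=n), rng.normal(size=n))[0, 1])
                 for _ in range(2000)]
    print(f"n = {n:4d}: typical chance |r| ~ {np.mean(chance_rs):.3f}")
```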
5. Test statistics and the null hypothesis
- To reject the null hypothesis of no relationship, the test statistic coefficient (e.g., r = .7) must remain sufficiently large after subtracting sampling error
- How much room is required? Enough to yield a probability of less than five in one hundred (< .05) that the relationship between the variables was produced by chance.
- If the computer decides that the coefficient is sufficiently large, it will award at least one asterisk (*). The relationship between the variables is statistically significant and the null hypothesis (no relationship) is FALSE.
- If the coefficient is too small, no asterisk is awarded. The association between the variables is deemed non-significant and the null hypothesis is TRUE. Working hypotheses that depend on this relationship must be rejected.
- For significant relationships, one to three asterisks usually appear next to the test statistic's coefficient (e.g., .25*, .36**, .41***). More asterisks mean greater confidence that a relationship is systematic, not the product of chance:
  - * ("good"): probability less than 5 in 100 that a coefficient was produced by chance (p < .05)
  - ** ("better"): probability less than 1 in 100 that a coefficient was produced by chance (p < .01)
  - *** ("best"): probability less than 1 in 1,000 that a coefficient was produced by chance (p < .001)
- Instead of asterisks, the actual probability that a coefficient was produced by chance is sometimes given, usually in a column labeled "p"
- Again, significant relationships are denoted by p's less than .05 (a small helper illustrating the asterisk convention appears below)
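A small illustrative helper (my own naming, not from the slides) that maps a p-value to the conventional asterisks described above:

```python
def significance_stars(p: float) -> str:
    """Return the conventional asterisks for a p-value (illustrative helper)."""
    if p < 0.001:
        return "***"   # less than 1 in 1,000 ("best")
    if p < 0.01:
        return "**"    # less than 1 in 100 ("better")
    if p < 0.05:
        return "*"     # less than 5 in 100 ("good")
    return ""          # non-significant

for p in (0.20, 0.04, 0.008, 0.0004):
    print(p, significance_stars(p) or "(n.s.)")
```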
6. Some statistics used for testing relationships
Procedure: Correlation
  Level of measurement: All variables continuous
  Statistic: r
  Interpretation: Range -1 to +1, with 0 meaning no relationship. For example, .35 denotes a moderately strong positive relationship.

Procedure: Regression
  Level of measurement: All variables continuous
  Statistics: r², R², b
  Interpretation: r² is the proportion of change in the dependent variable accounted for by change in the independent variable; R² denotes the cumulative effect of multiple independent variables; b is the unit change in the dependent variable caused by a one-unit change in the independent variable.

Procedure: Logistic regression
  Level of measurement: DV nominal dichotomous; IVs nominal or continuous
  Statistics: b and exp(B)
  Interpretation: Don't try to interpret b directly. exp(B) gives the odds that the DV will change if the IV changes one unit (or, if the IV is dichotomous, changes its state). Range 0 to infinity; 1 denotes even odds, meaning no relationship. Higher than 1 means a positive relationship, lower than 1 a negative relationship. Use a percentage to describe the likelihood of the effect.

Procedure: Chi-Square
  Level of measurement: All variables categorical, not ordinal
  Statistic: χ²
  Interpretation: Reflects the difference between observed and expected frequencies. Use a table to determine whether the coefficient is sufficiently large to reject the null hypothesis.

Procedure: Difference between means
  Level of measurement: IV dichotomous, DV continuous
  Statistic: t
  Interpretation: Reflects the magnitude of the difference. Use a table to determine whether the coefficient is sufficiently large to reject the null hypothesis.
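To connect the regression row above to output one might actually see, here is a brief sketch using statsmodels with invented income and crime data; the variable names and coefficient values are assumptions for illustration only.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
income = rng.normal(50, 10, 200)                    # hypothetical IV
crime = 80 - 0.6 * income + rng.normal(0, 5, 200)   # hypothetical DV

model = sm.OLS(crime, sm.add_constant(income)).fit()
b = model.params[1]           # unit change in DV per one-unit change in IV
r_squared = model.rsquared    # proportion of variance accounted for
print(f"b = {b:.2f}, R^2 = {r_squared:.2f}")
```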
7. A caution on hypothesis testing
- Probability statistics are the most common way to evaluate relationships, but they have been criticized for suggesting misleading results. (Click here for a summary of the arguments.)
- We normally use p values to accept or reject null hypotheses. But the actual meaning is more subtle.
- Formally, p < .05 means that, if an association between variables was tested an infinite number of times, a test statistic coefficient as large as the one actually obtained (say, an r of .3) would come up less than five times in a hundred if the null hypothesis of no relationship was actually true. (The simulation sketched below illustrates this definition.)
- For our purposes, as long as we keep in mind the inherent sloppiness of social science, and the difficulties of accurately quantifying social science phenomena, it's sufficient to use p-values to accept or reject null hypotheses.
- We should always be skeptical of findings of significance, particularly when very large samples are involved, as even weak relationships will tend to be statistically significant. (More on this later.)
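The simulation below (an illustration I've added, not from the slides) mirrors that formal definition: with no true relationship in the population, it counts how often a sample r at least as large as .3 turns up by chance alone.

```python
import numpy as np

rng = np.random.default_rng(42)
n, trials, observed_r = 50, 10_000, 0.3

hits = 0
for _ in range(trials):
    x = rng.normal(size=n)
    y = rng.normal(size=n)   # unrelated to x, so the null hypothesis is true
    if abs(np.corrcoef(x, y)[0, 1]) >= observed_r:
        hits += 1

print(f"Chance of an |r| >= {observed_r} under the null: {hits / trials:.3f}")
```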
8. Examples of tables from articles, panels 1-12
9. Panel 1
Hypothesis: Alcohol consumption → Victimization
Method: Logistic regression
Statistics: b and odds ratio (exp b)
Richard B. Felson and Keri B. Burchfield, "Alcohol and the Risk of Physical and Sexual Assault Victimization," Criminology 42:4 (2004)
10. Panel 2
Hypothesis: Black race and related factors → Distrust of police
Method: Logistic regression
Statistic: b (called the "Estimate")
Elaine B. Sharp and Paul E. Johnson, "Accounting for Variation in Distrust of Local Police," Justice Quarterly 26:1 (2009)
11. Panel 3
Hypothesis: Race and class → Satisfaction with police
Method: Logistic regression
Statistics: b and exp b (odds ratio)
Yuning Wu, Ivan Y. Sun and Ruth A. Triplett, "Race, Class or Neighborhood Context: Which Matters More in Measuring Satisfaction With Police?," Justice Quarterly 26:1 (2009)
12. Panel 4
Hypothesis: Low self-control → More contact with police
Method: Logistic regression
Statistics: b and exp b (odds ratio)
Kevin M. Beaver, Matt DeLisi, Daniel P. Mears and Eric Stewart, "Low Self-Control and Contact with the Criminal Justice System in a Nationally Representative Sample of Males," Justice Quarterly 26:4 (2009)
13. Panel 5
Hypothesis: Gender and race of victim → Imposition of death sentence
Method: Logistic regression
Statistics: b (coefficient) and odds ratio (exp b)
Marian R. Williams, Stephen Demuth and Jefferson E. Holcomb, "Understanding the Influence of Victim Gender in Death Penalty Cases: The Importance of Victim Race, Sex-Related Victimization, and Jury Decision Making," Criminology 45:4 (2007)
14. Panel 6
Hypothesis: Academic performance → Delinquency
Method: Tobit regression (best when the DV has a zero value for a large proportion of cases)
Statistic: b
Richard B. Felson and Jeremy Staff, "Explaining the Academic Performance-Delinquency Relationship," Criminology 44:2 (2006)
15. Panel 7
Hypothesis: Strains of imprisonment → Recidivism
Method: Logistic regression
Statistics: B and exp B (odds ratio)
Shelley Johnson Listwan, Christopher J. Sullivan, Robert Agnew, Francis T. Cullen and Mark Colvin, "The Pains of Imprisonment Revisited: The Impact of Strain on Inmate Recidivism," Justice Quarterly 30:1 (2013)
16. Panel 8
Hypothesis: Father's incarceration → Son's delinquency
Method: Logistic regression
Statistic: Odds ratio (standard error in parentheses)
Michael E. Roettger and Raymond R. Swisher, "Associations of Fathers' History of Incarceration With Sons' Delinquency and Arrest Among Black, White and Hispanic Males in the United States," Criminology 49:4 (2011)
17. Panel 9
Hypothesis: Officer and driver race → Vehicle search
Method: Logistic regression
Statistics: Odds ratio (standard error in parentheses)
Jeff Rojek, Richard Rosenfeld and Scott Decker, "Policing Race: The Racial Stratification of Searches in Police Traffic Stops," Criminology 50:4 (2012)
18. Panel 10
Brian D. Johnson and Stephanie M. Dipietro, "The Power of Diversion: Intermediate Sanctions and Sentencing Disparity Under Presumptive Guidelines," Criminology 50:3 (2012)
19. Panel 11
Hypothesis: Child abuse and neighborhood factors → Child's subsequent violent behavior
Method: Logistic regression
Statistic: b (coefficient)
Emily M. Wright and Abigail A. Fagan, "The Cycle of Violence in Context: Exploring the Moderating Roles of Neighborhood Disadvantage and Cultural Norms," Criminology 51:2 (2013)
20. Panel 12
Hypothesis: Marriage → Desistance from crime
Method: HLM (similar to logistic regression)
Statistic: b (coefficient; log odds can be computed)
Bianca E. Bersani and Elaine Eggleston Doherty, "When the Ties That Bind Unwind: Examining the Enduring and Situational Processes of Change Behind the Marriage Effect," Criminology 51:2 (2013)