Title: Statistics for Decision Making
1. Statistics for Decision Making
QM 2113 -- Spring 2003
- Wrapping Up Descriptive Statistics
Instructor: John Seydel, Ph.D.
2. Student Objectives
- Determine and apply correlation measures
- Avoid modeling errors in regression analysis
- Use regression appropriately with categorical variables
- Perform basic multiple regression modeling and analysis
- Use Excel's Data Analysis tool for regression
- Interpret histograms
- Apply the standard deviation
- Define sampling error and use estimates to determine population averages
3. Comments/Questions, Homework, Other Stuff
- Technical reports using nontechnical language!
- Questions about mechanics?
- Crosstabs
- PivotTables
- Interpretations for crosstabs?
- Joint relative frequencies
- Contingent frequencies
- Collect homework
- Report
- Crosstabs (WA and KIVZ)
- How does this all fit together?
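The two crosstab interpretations listed above can be sketched in a few lines of Python. This is only an illustration: the records below are hypothetical (not the homework data), and "contingent frequency" is taken here to mean a cell count divided by its row total.

```python
from collections import Counter

# Hypothetical records: (gender, department). Illustrative only.
records = [
    ("F", "Sales"), ("F", "Sales"), ("F", "Ops"), ("F", "Ops"),
    ("M", "Sales"), ("M", "Ops"), ("M", "Ops"), ("M", "Ops"),
]

counts = Counter(records)                       # cell counts of the crosstab
n = len(records)                                # grand total
row_totals = Counter(gender for gender, _ in records)

# Joint relative frequency: cell count / grand total
joint = {cell: count / n for cell, count in counts.items()}

# Contingent (conditional) frequency: cell count / row total
conditional = {
    (gender, dept): count / row_totals[gender]
    for (gender, dept), count in counts.items()
}

for cell in sorted(counts):
    print(cell, counts[cell], joint[cell], conditional[cell])
```

Joint relative frequencies sum to 1 over the whole table, while contingent frequencies sum to 1 within each row, which is why the two answer different questions.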
4. Now, Briefly Back to Regression Analysis
- Often you'll see a single statistic used to summarize a bivariate relationship
- Correlation coefficient (r)
- Summarizes the estimated strength of the relationship
- Square root of R² (in magnitude)
- Same sign as b1
- -1 ≤ r ≤ 1
- What not to do
- Typical modeling errors
- Reverse Y and X
- Treat categorical variables as numeric
- Use Excel shortcuts to create inflexible worksheets
- Data analysis tool (demo)
- Plot trend line (demo)
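The properties of r listed on this slide (square root of R², same sign as b1, bounded by -1 and 1) can be checked numerically. The sketch below uses made-up data and plain Python rather than Excel; the variable names are illustrative assumptions.

```python
import math

# Hypothetical (x, y) observations. Illustrative only.
x = [1, 2, 3, 4, 5]
y = [12, 11, 9, 8, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

b1 = sxy / sxx                       # regression slope
b0 = y_bar - b1 * x_bar              # regression intercept
r = sxy / math.sqrt(sxx * syy)       # correlation coefficient

# R^2 = 1 - SSE/SST for the fitted line
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
r_squared = 1 - sse / syy

# Rebuild r from R^2, taking the sign from the slope
r_from_r2 = math.copysign(math.sqrt(r_squared), b1)

print(b1, r, r_squared, r_from_r2)
```

Note that swapping x and y (the "Reverse Y and X" error) leaves r unchanged but changes b0 and b1, so the correlation alone will not reveal that mistake.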
5. That Said, Let's Try Something
- Some interesting scatterplots for GNI data
- Salary versus Race (demo)
- Salary versus Gender (demo)
- Does this present any problems?
- Yes!
- That is, we're using the wrong technique to incorporate categorical variables
- Thus, these analyses are useless, or are they?
6. Treating Categorical Variables as Explanatory Factors
- Scatterplots
- We can get a sense of how Y differs across different categories
- Example (Salary vs Race)
- It appears that salaries are about the same for all races
- We can estimate/describe (by inspection)
- Averages
- Variation
- Need to use ANOVA to get better understanding
- Binary variables (i.e., only two possible categories)
- Scatterplots and regression might have some value
- Regression: remember what the statistics represent
- Slope: estimated average change in Y as X changes by 1 unit
- Intercept: estimated average Y when X is 0
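Describing Y across categories "by inspection" amounts to comparing per-category averages and variation. A minimal sketch, assuming hypothetical category labels and salaries (in $000), is shown below; it is a descriptive summary only and not a substitute for ANOVA.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical (category, salary in $000) pairs. Illustrative only.
observations = [
    ("A", 50), ("A", 54), ("A", 58),
    ("B", 49), ("B", 55), ("B", 58),
    ("C", 52), ("C", 53), ("C", 57),
]

# Group the salaries by category
groups = defaultdict(list)
for category, salary in observations:
    groups[category].append(salary)

# Per-category average and standard deviation
summary = {c: (mean(v), stdev(v)) for c, v in groups.items()}

for category, (avg, sd) in sorted(summary.items()):
    print(category, avg, round(sd, 2))
```

In this made-up data the averages are identical across categories while the spread differs, which is the kind of pattern a scatterplot against a categorical axis lets you see.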
7. Consider Analysis of Salary versus Gender
- Independent variable is binary
- Let's do some univariate analyses
- Average salary for males
- Average salary for females
- Average salary difference between males and females
- Now, look at the regression statistics
- Intercept = average salary for males (assuming males are the category coded X = 0)
- Slope = average salary difference between males and females
- Hence, there is some value to using binary variables as independent variables in regression analysis
- This must, nevertheless, be done with caution
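The link between the group averages and the regression statistics can be verified directly. The sketch below assumes a 0/1 coding (0 = male, 1 = female) and hypothetical salaries in $000; neither the coding nor the numbers come from the GNI data.

```python
from statistics import mean

# Assumed coding: 0 = male, 1 = female. Salaries are hypothetical ($000).
gender = [0, 0, 0, 1, 1, 1, 1]
salary = [50, 54, 58, 45, 47, 49, 51]

def simple_regression(x, y):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

b0, b1 = simple_regression(gender, salary)

# Univariate averages for comparison
male_avg = mean(s for g, s in zip(gender, salary) if g == 0)
female_avg = mean(s for g, s in zip(gender, salary) if g == 1)

print(b0, male_avg)                  # intercept vs. average for the group coded 0
print(b1, female_avg - male_avg)     # slope vs. difference in averages
```

If the coding were reversed (0 = female), the intercept would become the female average and the slope would change sign, which is one reason the slide urges caution.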
8. Regression: One Last Thing (for Now)
- Consider Salary vs Performance
- R² = 81%
- What does this mean?
- Yes, we estimate that 81% of all the variation in Salary is explained by Performance
- Now, what about the other 19%?
- Let's expand our bivariate analysis into a multivariate one
9. Multiple Regression
- Just as bivariate analyses consider two variables simultaneously, multivariate analyses consider many variables at the same time
- Allows us to identify interaction among the explanatory variables
- Easily accomplished with Excel's Data Analysis tool
- But look at all that output
- Compare it to the same output for a bivariate analysis
- Our models
- Simple regression: y-hat = b0 + b1x
- Multiple regression: y-hat = b0 + b1x1 + b2x2 + b3x3 + . . .
- Interpretation of the bi values: marginal effect, holding other factors constant (i.e., ceteris paribus)
- Now, we're really getting beyond the scope of this course
- More study/application of this in the next course (?)
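To make the step from simple to multiple regression concrete, here is a small least-squares sketch in plain Python (solving the normal equations) rather than Excel's Data Analysis tool. The data are hypothetical and constructed so that salary = 20 + 3(performance) + 2(years) exactly, which makes the coefficients easy to check; `solve` and `fit` are illustrative helper names.

```python
def solve(a, b):
    """Solve the linear system a*x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]      # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]           # partial pivoting
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit(columns, y):
    """Return ([b0, b1, ...], R^2) for y regressed on the given columns."""
    rows = [[1.0, *vals] for vals in zip(*columns)]   # leading 1 for b0
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    b = solve(xtx, xty)
    y_bar = sum(y) / len(y)
    sse = sum((yi - sum(bj * rj for bj, rj in zip(b, r))) ** 2
              for r, yi in zip(rows, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return b, 1 - sse / sst

# Hypothetical data, constructed for illustration only
performance = [1, 2, 3, 4, 5, 6]
years = [2, 1, 4, 3, 6, 5]
salary = [27, 28, 37, 38, 47, 48]

b_simple, r2_simple = fit([performance], salary)
b_multi, r2_multi = fit([performance, years], salary)

print(b_simple, r2_simple)
print(b_multi, r2_multi)
```

Adding the second explanatory variable raises R², and each coefficient in the multiple model is read as the change in salary per unit of that variable with the other held constant.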
10. Now, Let's Take Another Look at Histograms
- Concepts to consider
- Symmetry, skew, and modality
- Estimating descriptive statistics by inspection
- Skew: direction of the long, skinny part (the tail)
- If on right, then average > median
- Let's look at some histograms and identify the skew
- What does this tell us?
- Now, let's estimate
- Median (remember, it's the 50th percentile)
- Mode: the most likely data value (or range of values)
- Average (hint: the balance point)
- Standard deviation (hint: R/6 < s < R/4, where R is the range)
- So, what's the big deal?
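The by-inspection estimates above can be compared against computed values. The sketch below starts from a hypothetical histogram (value mapped to frequency, right-skewed by construction) and checks the average-versus-median comparison and the R/6 < s < R/4 hint. Both are rules of thumb, so other data sets may not satisfy them.

```python
from statistics import mean, median, mode, stdev

# Hypothetical histogram: value -> frequency. Illustrative only.
histogram = {1: 2, 2: 6, 3: 9, 4: 7, 5: 5, 6: 3, 7: 2, 8: 1, 9: 1, 11: 1}

# Expand the frequencies back into raw observations
data = [value for value, count in histogram.items() for _ in range(count)]

avg = mean(data)
med = median(data)
most_common = mode(data)
s = stdev(data)
data_range = max(data) - min(data)

# Rough skew check: tail on the right pulls the average above the median
if avg > med:
    skew = "right"
elif avg < med:
    skew = "left"
else:
    skew = "roughly symmetric"

low, high = data_range / 6, data_range / 4    # rule-of-thumb bounds for s

print(avg, med, most_common, skew)
print(low, s, high)
```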
11. Finally, Let's Revisit the Average and Standard Deviation
- A couple of rules
- Tchebycheff's rule: at least (100 - 100/h²)% of values lie within h standard deviations of the average
- The empirical rule
- Now, how well does x-bar estimate μ?
- Consider the standard deviation
- Use the observed average to estimate the population/process average
- Related to, but different from, the empirical rule
- Consider the concept of sampling error
- Difference between observed and population measure
- Average sampling error for x-bar is s/√n
- We can be 95% confident that μ = x-bar ± 2s/√n (approximately)
- Application: GNI salary data
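Both ideas on this slide reduce to short calculations. The sketch below uses hypothetical salaries (in $000, not the GNI data) and assumes a multiplier of about 2 standard errors for 95% confidence; for small samples a t-value would be more exact.

```python
import math
from statistics import mean, stdev

def tchebycheff_min_percent(h):
    """Minimum % of values within h standard deviations of the average (h > 1)."""
    return 100 - 100 / h ** 2

# Hypothetical salaries in $000. Illustrative only.
salaries = [42, 45, 47, 50, 51, 53, 55, 58, 60, 64, 70, 85]

n = len(salaries)
x_bar = mean(salaries)
s = stdev(salaries)

# Observed share of values within 2 standard deviations of the average
within_2s = 100 * sum(abs(v - x_bar) <= 2 * s for v in salaries) / n

# Average sampling error of x-bar and an approximate 95% interval for mu
std_error = s / math.sqrt(n)
margin = 2 * std_error                 # assumed multiplier of about 2
interval = (x_bar - margin, x_bar + margin)

print(tchebycheff_min_percent(2), within_2s)
print(x_bar, std_error, interval)
```

Tchebycheff's rule describes where individual values fall around the average, while the interval describes how far x-bar is likely to be from μ, which is why the second uses s/√n rather than s.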
12. Summary of Objectives
- Determine and apply correlation
- Avoid modeling errors in regression analysis
- Use regression appropriately with categorical variables
- Perform basic multiple regression modeling and use Excel for calculations
- Interpret histograms
- Apply the standard deviation
- Define sampling error and use estimates to determine population averages
13. Next Time . . .
- Probability concepts and notation
- Homework
- Read probability material from text
- Perform crosstab analysis of GNI data
- Create PivotTables
- Report on relationship
- Look at last year's midterm exam
- Review notation (prepare for quiz)
14. Appendix
15. Populations and Samples
[Diagram labels: Population, Sample, Statistic, Parameter]
16. Schematic View
17. Nontechnical . . . ?