Title: Using Real Data For Decision Analysis
1. Using Real Data For Decision Analysis
- Decision Theory Analysis
- David M. Dilts
2. Which class would you take?
- Next semester, two professors will be teaching
Decision Theory.
Dr. Pieceofcake vs. Dr. Uragonnafail
- According to RateMyProfessor.com, these are the mean grades posted over the past five years:
mean = 1750.82 vs. mean = 1770.16
(out of 2000 points)
Which professor would you sign up for?
3. Finding the Distribution
mean = 1750.82 vs. mean = 1770.16
(out of 2000 points)
After a quick graphical analysis, you realize that both distributions are normal, with approximately the same shape.
Would you still sign up for the same professor?
4. Standardizing the axis
mean = 1750.82, var = 63.8635
mean = 1770.16, var = 7.139664
(out of 2000 points)
Which professor would you sign up for now?
5. A matter of perspective
- Which professor would you want if you were a 4.0 student?
- What if you were a slacker?
- Remember: it is not just the central tendency that is important; the dispersion is also critical.
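One way to make the perspective question concrete is to treat each set of grades as a normal distribution with the mean and variance listed on slide 4, then ask how likely a given score is. This is only a sketch: the labels `first` and `second` follow the order in which the figures appear (no professor is assigned to either), the target scores are made-up, and `prob_at_least` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

# Means and variances as listed on the slides (out of 2000 points).
# "first" and "second" only reflect slide order; which professor each
# belongs to is not assumed here.
first = NormalDist(mu=1750.82, sigma=sqrt(63.8635))
second = NormalDist(mu=1770.16, sigma=sqrt(7.139664))

def prob_at_least(dist, threshold):
    """Probability that a score drawn from dist is at least threshold."""
    return 1.0 - dist.cdf(threshold)

# Hypothetical target scores chosen purely for illustration.
for target in (1760, 1775):
    print(target,
          round(prob_at_least(first, target), 4),
          round(prob_at_least(second, target), 4))
```

Changing the target score changes how much the spread matters, which is the point of asking the question from different students' perspectives.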
6. There are a variety of probability models that can be used to help make decisions
- Binomial distribution
- Poisson distribution
- Normal distribution
- Exponential distribution
- Beta distribution
- etc.
- It is always important to use the correct distribution to explain your data, BUT
- More importantly, it is essential to always consider the context in which you are making the decision
7. Caveats in using real data to make decisions
- Fallacy of Averages
- Assumptions of normality
- Errors in estimations
- Impact of outliers
- Residual Values
8. Fallacy of Averages
9. Fallacy of Averages
Mean Duration of Typical Process
Mean Duration of Complex Processes
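A simple case of the fallacy of averages is the overall height of a specific population. Men's and women's heights are each approximately normally distributed, but around different means, so the combined population is a mixture of two distributions and the overall average describes neither group well.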
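The effect of averaging over two subgroups can be sketched with a 50/50 mixture of two normal curves. The means and standard deviations below are made-up values (in cm), chosen so the effect is easy to see; they are not measured height data, and `mixture_pdf` is a hypothetical helper name.

```python
from statistics import NormalDist

# Made-up illustrative parameters (cm); not measured data.
group_a = NormalDist(mu=162.0, sigma=5.0)
group_b = NormalDist(mu=176.0, sigma=5.0)

def mixture_pdf(x):
    """Density at x of a 50/50 mixture of the two groups."""
    return 0.5 * group_a.pdf(x) + 0.5 * group_b.pdf(x)

# Mean of the combined population (equal group sizes assumed).
overall_mean = 0.5 * group_a.mean + 0.5 * group_b.mean

print("density at overall mean:", mixture_pdf(overall_mean))
print("density at group A mean:", mixture_pdf(group_a.mean))
print("density at group B mean:", mixture_pdf(group_b.mean))
```

With these assumed parameters the combined density is lower at the overall mean than at either group mean, so the "average" value is less typical than either subgroup's own average.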
10. Assumptions of Normality
- Bell-shaped distributions are not always normal. Requirements for a normal distribution:
  - symmetrical
  - follows the 68, 95, 99.7 rule
- Not all distributions are normal, and few are perfectly normally distributed
- Much of decision analysis relies on assumptions of the normal curve to make calculations easier. It is important to understand the limitations of the normal curve when basing your decisions on it.
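A quick, rough check of the 68, 95, 99.7 rule is to count how much of a sample falls within one, two, and three standard deviations of its mean. The sketch below uses a made-up, evenly spread sample, which is symmetric but not normal; `fraction_within` is a hypothetical helper name, and this is a sanity check rather than a formal normality test.

```python
from statistics import fmean, pstdev

def fraction_within(data, k):
    """Fraction of values within k standard deviations of the mean."""
    mu = fmean(data)
    sd = pstdev(data)
    return sum(abs(x - mu) <= k * sd for x in data) / len(data)

# Hypothetical sample: evenly spread values (symmetric, but not normal).
flat = list(range(1001))

for k, normal_share in ((1, 0.68), (2, 0.95), (3, 0.997)):
    print(k, round(fraction_within(flat, k), 3), "normal curve:", normal_share)
```

A sample can be perfectly symmetric and still miss these shares by a wide margin, which is one sign that normal-curve calculations may not apply to it.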
11. Errors in Estimations
Some linear regressions force the line through the 0-0 point (meaning x-intercept = 0 and y-intercept = 0). When fitting regressions to the data, understand the range over which your estimate is reasonable.
Would it make sense to force the regression line through the 0-0 point in this case?
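The difference between the two kinds of fit can be sketched with a few lines of least-squares arithmetic. The data below is hypothetical (points lying exactly on y = 2x + 10), and the function names are illustrative, not taken from any particular tool.

```python
def fit_with_intercept(xs, ys):
    """Ordinary least squares; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def fit_through_origin(xs, ys):
    """Least squares with the intercept forced to 0; returns the slope."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Hypothetical data lying exactly on y = 2x + 10.
xs = [1, 2, 3, 4]
ys = [12, 14, 16, 18]

slope, intercept = fit_with_intercept(xs, ys)
slope_origin = fit_through_origin(xs, ys)
print("with intercept:", slope, intercept)
print("forced through origin:", slope_origin)
```

Forcing the line through the origin here gives a much steeper slope than the data supports, because the line has to make up for the intercept it is not allowed to have.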
12. Impact of Outliers
- Outliers can bias the regression estimate to accommodate the extreme data point(s).
- What happens if you add Maudie Hopkins (19) and William Cantrell (86) (the last Civil War widow), or Anna Nicole Smith (26) and J. Howard Marshall II (89)? Or, more recently, Demi Moore (42) and Ashton Kutcher (27)?
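To see how a few extreme couples can pull a fitted line, the sketch below regresses wife's age on husband's age with and without the couples named above. The "typical" couples are made-up values for illustration only, and `ols_slope` is a hypothetical helper name.

```python
def ols_slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    return sxy / sxx

# Hypothetical "typical" couples as (husband's age, wife's age).
typical = [(30, 28), (35, 33), (40, 37), (45, 44), (50, 47), (55, 53)]
# The couples named on the slide, in the same (husband, wife) order.
outliers = [(86, 19), (89, 26), (27, 42)]

slope_typical = ols_slope(typical)
slope_all = ols_slope(typical + outliers)
print("slope without outliers:", round(slope_typical, 3))
print("slope with outliers:", round(slope_all, 3))
```

With this made-up baseline, three extreme couples are enough to flip the sign of the fitted slope.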
13. Impact of Outliers
Outliers can also give a false sense of correlation between two variables. In a correlation test, the strength of the relationship between the husband's age and the wife's age would be incorrectly accentuated, giving a false impression of how compact the dataset is.
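A minimal sketch of how a single extreme point can manufacture correlation: the four clustered points below are uncorrelated, and one made-up far-away point is then added. All values are hypothetical, and `pearson_r` is an illustrative hand-written helper.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical tight cluster with no linear relationship.
cluster_x = [1, 2, 1, 2]
cluster_y = [2, 1, 1, 2]

r_cluster = pearson_r(cluster_x, cluster_y)
# Same cluster plus one made-up extreme point at (20, 20).
r_with_outlier = pearson_r(cluster_x + [20], cluster_y + [20])
print("cluster only:", r_cluster)
print("with outlier:", round(r_with_outlier, 4))
```

The coefficient jumps from zero to nearly one even though the relationship among the original points has not changed, so it is worth plotting the data before trusting the number.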
14. Residual value
Two datasets can give identical regression estimates. The graph on the right has larger residuals, and therefore greater error. In other words, x is not as strong a predictor of y.
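The idea can be sketched with two made-up datasets built to share the same fitted line (y = 2x + 1) while differing in scatter. The noise pattern, scale factors, and function names are all assumptions chosen for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def sum_squared_residuals(xs, ys, slope, intercept):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4]
# This pattern sums to zero and is uncorrelated with xs, so adding any
# multiple of it leaves the fitted line unchanged.
pattern = [1, -1, -1, 1]
tight = [2 * x + 1 + 0.1 * e for x, e in zip(xs, pattern)]
loose = [2 * x + 1 + 1.0 * e for x, e in zip(xs, pattern)]

fit_tight = fit_line(xs, tight)
fit_loose = fit_line(xs, loose)
sse_tight = sum_squared_residuals(xs, tight, *fit_tight)
sse_loose = sum_squared_residuals(xs, loose, *fit_loose)
print("fits:", fit_tight, fit_loose)
print("sum of squared residuals:", sse_tight, sse_loose)
```

Both fits report essentially the same slope and intercept, so only the residuals reveal that predictions from the second dataset are far less reliable.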
15. Things to remember about real data
- It is messy!
- It is not always memoryless
  - i.e., the recent past really is a better indicator of future performance
- It can have many outliers that are
  - Important indications of new trends, or
  - Oddities that should be eliminated
- There is a major difference between statistical significance and practical significance
- Don't just look at the statistical results, look at the data itself!