1 What on earth is a p value, a Process sigma,
Cronbach's alpha, the Black-Scholes formula, a
Priority in AHP, or the Sunday Times score for
Portsmouth University? On the interpretability of
measurements based on mathematical models.
- Michael Wood
- June 2011
- http://userweb.port.ac.uk/woodm/presentations.htm
2
- Management makes use of many measurements based
on mathematical models, but these are often
difficult to interpret sensibly. This talk will
look at some examples of such measurements, and
the consequences of the problems of their
interpretation, including the employment of
unnecessary academics to teach what should be
obvious, and support for the bad decisions which
led to the recent financial crash. I will then
discuss how these, and other, measurements could
be redesigned to make them more useful and
user-friendly.
3 I'll look at four examples
- Six sigma and the process sigma measurement
- Null hypothesis significance tests and p values
- University league tables
- Risk measurements and the normal (Gaussian)
distribution
4 Four examples with some imaginary dialogues
between the expert and a naive user ...
5 Process sigma: the measurement linked to the Six
Sigma philosophy
- The process sigma for this process is 4.833.
- What on earth does this mean?
- It means there are 430 dpmo (defects per million
opportunities). Use this Sigma calculator.
- So why not just say 430 dpmo? Keep it simple!
- But this would be dumbing down. Life is difficult
and we mustn't join the modern trend of trying to
make it easier.
- Why not? The complicated version adds nothing
except confusing the uninitiated. (Similar
comments apply to Cpk.)
- ... which must be a good thing!
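The link between the two figures in this dialogue can be checked with a few lines of code. The sketch below assumes the conventional Six Sigma definition, in which the process sigma is the normal z-score for the defect-free proportion plus a 1.5 sigma shift. The slide does not state this definition, and the function names are illustrative.

```python
from statistics import NormalDist

SHIFT = 1.5  # assumed conventional Six Sigma shift; not stated on the slide


def process_sigma(dpmo: float) -> float:
    """Convert defects per million opportunities to a process sigma."""
    defect_free = 1 - dpmo / 1_000_000
    return NormalDist().inv_cdf(defect_free) + SHIFT


def dpmo_from_sigma(sigma: float) -> float:
    """Convert a process sigma back to defects per million opportunities."""
    return (1 - NormalDist().cdf(sigma - SHIFT)) * 1_000_000


print(round(process_sigma(430), 2))
print(round(dpmo_from_sigma(4.833)))
```

Under that assumption, 430 dpmo comes out at roughly 4.83, in line with the figure quoted in the dialogue, which supports the naive user's point that the two numbers carry the same information.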
6 p values
- We've done a survey and found that women are more
intelligent than men. The p value is 0.004.
- What does the p value mean?
- It tells us how sure we can be about our results,
taking sampling error into account.
- 0.0002 is very small. Not very impressive!
- It's a bit difficult to explain p values to
someone like you, but smaller is better. Less
than 5% means you can be fairly sure women are
cleverer than men; less than 1% is almost
conclusive.
- Sounds like you're trying to confuse me.
- Reverse measure of wrong thing, misinterpreted
- Statman bits. User friendly units - /inch, etc.
7 p values
- I'm told that if the p value is 0.004 this means
that we can be 99.8% confident that women really
are more intelligent, based on this data. Isn't
that a better way to put it?
- No, that's a common misunderstanding ... you need
to go on a course, although I'm not sure you'll
take it in ...
- There are lots of common misunderstandings, but
I'm sure about the 99.8% confident ...
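The two figures in this dialogue are related by simple arithmetic: 99.8% equals 1 - p/2 with p = 0.004. The sketch below assumes that this is how the figure was obtained, by halving a two-tailed p value to allow for the difference running the other way. The slide does not say so, and whether the result deserves to be called a confidence level is exactly what the two speakers disagree about.

```python
def confidence_from_p(p_two_tailed: float) -> float:
    """Return 1 - p/2, the arithmetic that links 0.004 to 99.8%.

    Assumption: p is a two-tailed p value and the hypothesis is directional.
    This illustrates the arithmetic only; it is not an endorsement of it.
    """
    if not 0 <= p_two_tailed <= 1:
        raise ValueError("a p value must lie between 0 and 1")
    return 1 - p_two_tailed / 2


print(f"{confidence_from_p(0.004):.1%}")
```

The function name `confidence_from_p` is hypothetical. The point of the sketch is that the user-friendly figure is a one-line transformation of the p value, with a direction (bigger is better) and a unit (percent) that match everyday intuition.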
8 University League tables
- The Sunday Times score for Portsmouth University
is 599.
- What does that mean?
- Well, e.g. Southampton got 783 points, so
Southampton is obviously a better place to study.
- What are the points based on?
- Lots of things, e.g. student satisfaction,
research quality.
- So do Southampton do better on these two? ...
9 ... University League tables
- Actually Portsmouth do a little better on student
satisfaction (174 vs 169 out of 250), but Southampton
do better on research quality (136 vs 112 out of 200).
- But student satisfaction is more important to
students than research quality ...
- You've got to balance the two. The experts at the
Sunday Times have done this.
- But different people may want different things ...
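The last point in this dialogue can be made concrete by letting each user choose their own weights. The sketch below uses only the two component scores quoted on the slide. The weighting scheme and function names are a hypothetical illustration, not the Sunday Times method, which the slide does not describe.

```python
# Component scores quoted on the slide: (points awarded, maximum points).
SCORES = {
    "Portsmouth": {"satisfaction": (174, 250), "research": (112, 200)},
    "Southampton": {"satisfaction": (169, 250), "research": (136, 200)},
}


def personal_score(university: str, satisfaction_weight: float) -> float:
    """Weighted average of the two components, each scaled to 0-1.

    satisfaction_weight is the user's own weight for student satisfaction;
    research quality gets the remainder. This scheme is hypothetical.
    """
    parts = SCORES[university]
    sat = parts["satisfaction"][0] / parts["satisfaction"][1]
    res = parts["research"][0] / parts["research"][1]
    return satisfaction_weight * sat + (1 - satisfaction_weight) * res


def preferred(satisfaction_weight: float) -> str:
    """Return the university with the higher personal score."""
    return max(SCORES, key=lambda u: personal_score(u, satisfaction_weight))


for weight in (0.5, 0.9):
    print(weight, preferred(weight))
```

On these two components alone, equal weights put Southampton ahead, while a user who gives student satisfaction 90% of the weight would rank Portsmouth first. A single published score hides that dependence on the chosen weights.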
10 Measurements of risk
- Muddled Michael has a habit of losing his car
keys when he goes on holiday. He reckons he has a
25% chance of losing his keys. He decides to
consult an expert on risk.
- Easy! If he takes 9 spare keys with him, then the
probability of losing all 10 keys is 0.25^10, which
is about one chance in a million, which seems an
acceptable risk.
- Michael puts all 10 keys on the same key ring (he
doesn't want to confuse himself by putting them
in different places) and goes on holiday.
- The problem here is that the maths assumes that
losing each key is an independent event. In fact,
if he loses one key he will probably lose the
rest as well, so a more realistic estimate of the
chance of losing all his keys is 25%!
- There are similar assumptions underlying most
risk calculations, but if the calculations are
more complicated it is easy not to notice.
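The gap between the expert's answer and Michael's real risk can be shown in a short sketch. It compares the independence calculation with a simulation in which all the keys share one key ring. The simulation model (the ring is lost or kept as a whole) and the function names are assumptions made for illustration.

```python
import random


def p_all_lost_independent(p_one: float, n_keys: int) -> float:
    """Probability of losing every key if each loss is independent."""
    return p_one ** n_keys


def simulate_same_ring(p_one: float, n_keys: int, trials: int, seed: int = 0) -> float:
    """Estimate the probability of losing all keys kept on one key ring.

    Assumption: the ring is lost or kept as a whole, so all keys share
    one random draw per holiday.
    """
    rng = random.Random(seed)
    lost_all = 0
    for _ in range(trials):
        ring_lost = rng.random() < p_one
        keys_lost = [ring_lost] * n_keys  # every key shares the ring's fate
        lost_all += all(keys_lost)
    return lost_all / trials


print(p_all_lost_independent(0.25, 10))
print(simulate_same_ring(0.25, 10, trials=100_000))
```

The first figure is about one in a million; the second stays close to 0.25 however many spare keys are added, because the spares provide no independent protection.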
11 Risk and the weather
- The probability of more than 1 mm of rain falling
in Southampton in one day is 31.5%.
- (Estimated from Met Office graph based on
1971-2000 data.)
- Then, theoretically, the probability of a week
when it rains every day is 0.315^7, which suggests
that this happens about every 9 years.
- Two weeks with rain every day is a once in 29000
years event.
- Almost certainly happens more often: the last time
was 20-30 November 2009, and the time before was
10-16 of the same month.
- (Southampton Weather website)
- The theory is wrong because the assumptions are
wrong!
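The slide's figures can be reproduced with the arithmetic below. This is a rough sketch that makes the same independence assumption the slide criticises, and it further assumes that a run can start on any day, so that the average wait is about 1 / p^n days. The constant and function names are illustrative.

```python
P_RAIN = 0.315         # daily probability quoted on the slide
DAYS_PER_YEAR = 365.25


def years_between_runs(p_daily: float, run_length: int) -> float:
    """Rough average wait, in years, for a run of consecutive rainy days.

    Assumptions: days are independent, and a run can start on any day,
    so the wait is about 1 / p**run_length days.
    """
    p_run = p_daily ** run_length
    return 1 / p_run / DAYS_PER_YEAR


print(round(years_between_runs(P_RAIN, 7), 1))
print(round(years_between_runs(P_RAIN, 14)))
```

The results are roughly 9 years for a wet week and roughly 29,000 years for a wet fortnight. Since such spells were observed twice in one month, it is the independence assumption, not the arithmetic, that fails.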
12 Risk and the normal distribution
- Very similar assumptions underlie the normal
(Gaussian) distribution. This assumes that the
variable depends on a large number of small
independent factors. If not, the predictions can
be misleading, especially for rare events.
- Many finance measurements depend on the normal
distribution and similar assumptions, e.g. the
Black-Scholes formula. OK in normal times, but tends
to seriously underestimate the probability of big
falls.
- If the Dow Jones Industrial Average moved in
accordance with a normal distribution, it would
have moved by 4.5% or more on only six days
between 1916 and 2003. In reality it did so 366 times
(Mandelbrot cited by Buckley, 2011, p. 140).
- Black Monday (1987) was a 20 sd event. Once-in-a-
million-years events have been experienced several
times by people much younger than a million years
(Buckley, 2011, p. 141).
- Measures understood but not assumptions: trust
in a misunderstood version.
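To see how strongly the normal distribution discounts extreme events, the tail probabilities can be computed directly. The sketch below uses the standard relation between the normal tail and the complementary error function. The function names are illustrative, and no market data is assumed.

```python
import math


def upper_tail(k_sd: float) -> float:
    """P(X > mean + k_sd standard deviations) for a normal variable."""
    return 0.5 * math.erfc(k_sd / math.sqrt(2))


def two_sided_tail(k_sd: float) -> float:
    """P(|X - mean| > k_sd standard deviations) for a normal variable."""
    return math.erfc(k_sd / math.sqrt(2))


for k in (2, 5, 10, 20):
    print(k, upper_tail(k))
```

Under the normal model the probability shrinks extremely fast as the number of standard deviations grows, so a 20 sd move should essentially never be observed. When one is observed, that says more about the model's assumptions than about bad luck.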
13 What can go wrong?
- Unnecessary time and effort expended
  - E.g. 50% of time spent on stats courses could be
    saved by redesigning concepts? Big savings in
    time and effort possible!
- Failure to understand
  - Complete
  - Subtleties
- Misunderstanding
  - Of basic concept
  - Of assumptions, leading to misleading uses
14 ... for example ...
- P values
  - Massive amount of wasted time and energy (think
    of all those journal articles), general
    confusion, misinterpretations like
    significant = important
- University league tables
  - Scores taken too seriously, specific requirements
    ignored, creates uniformity because everyone
    thinks the same; a rational world would be more
    varied
- Risk
  - Ignoring unrealistic assumptions led to
    over-confidence in mathematical measures, which
    helped the financial crash ...
15 Principles for designing measurements for
understanding
- Remember most measurements are determined by
historical accident, and therefore can probably be
improved for current users and uses. Design, not
discovery.
- Name should reflect meaning of result, not the
method used to get there.
- Make sure the direction is intuitive; use units
and percentages as appropriate.
- Must be an accurate description of meaning of
measurement in users' language.
- Users must understand key assumptions (which are
not irrelevant technicalities). If possible, users
should follow general idea of derivation.
16 Reasons for the persistence of strange
measurements
- Aim often ticking a box, not understanding
- Users don't see problem
- Interests of experts and teachers
  - Mystification is good for business! Some
    measurements (e.g. process sigma) invented solely
    for this purpose?
- The dumbing down myth
  - Increased user-friendliness should lead to more,
    not less, powerful use of measurements
  - We need to dumb up so that even the dumb won't do
    dumb things
17 References
- Buckley, Adrian (2011). Financial Crisis: causes,
context and consequences. Harlow: Pearson
Education.
- iSixSigma (2011). Sigma calculator, available at
http://www.isixsigma.com
- Met Office graph
- Southampton Weather website