Title: Simplicity and Truth: an Alternative Explanation of Ockham's Razor
1Simplicity and Truth: an Alternative Explanation of Ockham's Razor
- Kevin T. Kelly
- Conor Mayo-Wilson
- Department of Philosophy
- Joint Program in Logic and Computation
- Carnegie Mellon University
- www.hss.cmu.edu/philosophy/faculty-kelly.php
2I. The Simplicity Puzzle
3Which Theory is Right?
???
4Ockham Says
Choose the Simplest!
5But Why?
Gotcha!
6Puzzle
- An indicator must be sensitive to what it
indicates.
simple
7Puzzle
- A reliable indicator must be sensitive to what it
indicates.
complex
8Puzzle
- But Ockham's razor always points at simplicity.
simple
9Puzzle
- But Ockham's razor always points at simplicity.
complex
10Puzzle
- How can a broken compass help you find something
unless you already know where it is?
complex
11Standard Accounts
- 1. Prior Simplicity Bias
- Bayes, BIC, MDL, MML, etc.
- 2. Risk Minimization
- SRM, AIC, cross-validation, etc.
12 1. Prior Simplicity Bias
The simple theory is more plausible now because
it was more plausible yesterday.
13More Subtle Version
- Simple data are a miracle in the complex theory
but not in the simple theory.
Regularity: retrograde motion of Venus at solar conjunction
Has to be!
P
C
14However
- e would not be a miracle given P(θ)
Why not this?
P
C
15The Real Miracle
C
P
Ignorance about model: p(C) ≈ p(P).
Ignorance about parameter setting: p(P(θ) | P) ≈ p(P(θ′) | P).
Knowledge about C vs. P(θ): p(P(θ)) ≪ p(C).
Lead into gold. Perpetual motion. Free lunch.
Ignorance is knowledge. War is peace. I love Big
Bayes.
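The arithmetic behind this slide can be spelled out. The following is a sketch under an illustrative assumption that is not in the slides: the model P has n mutually exclusive parameter settings, written \theta_1, ..., \theta_n, and ignorance about the setting is modeled as a roughly uniform distribution over them.

```latex
\begin{align*}
p(C) &\approx p(P)
  && \text{(ignorance about model)} \\
p(P(\theta_i) \mid P) &\approx \tfrac{1}{n} \quad \text{for each } i
  && \text{(ignorance about parameter setting)} \\
p(P(\theta_i)) &= p(P(\theta_i) \mid P)\, p(P) \approx \tfrac{1}{n}\, p(C) \ll p(C)
  && \text{(for large } n\text{)}
\end{align*}
```

On this reading, two declarations of ignorance jointly amount to a strong opinion about C versus any particular setting of P.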
16Standard Paradox of Indifference
Ignorance of red vs. not-red plus ignorance over not-red yields knowledge about red vs. white.
Knognorance: all the privileges of knowledge with none of the responsibilities.
Yeah!
17The Ellsberg Paradox
1/3
?
?
18Human Preference
1/3
?
?
b
a
b
>
<
19Human View
1/3
?
?
knowledge
ignorance
b
a
b
>
knowledge
ignorance
<
20Bayesian View
1/3
?
?
knognorance
knognorance
b
a
b
>
knognorance
knognorance
>
21In Any Event
The coherentist foundations of Bayesianism have
nothing to do with short-run truth-conduciveness.
Not so loud!
22Bayesian Convergence
- Too-simple theories get shot down
Updated opinion
Theories
Complexity
23Bayesian Convergence
- Plausibility is transferred to the next-simplest
theory
Updated opinion
Theories
Plink!
Complexity
Blam!
26Bayesian Convergence
- The true theory is nailed to the fence.
Updated opinion
Zing!
Theories
Complexity
Blam!
27Convergence
- But alternative strategies also converge
- Anything in the short run is compatible with
convergence in the long run.
28Summary of Bayesian Approach
- Prior-based explanations of Ockham's razor are circular and based on a faulty model of ignorance.
- Convergence-based explanations of Ockham's razor fail to single out Ockham's razor.
29 2. Risk Minimization
- Ockham's razor minimizes expected distance of
empirical estimates from the true value.
Truth
30 Unconstrained Estimates
- Centered on truth but spread around it.
Pop! Pop! Pop! Pop!
Unconstrained aim
31 Constrained Estimates
- Off-center but less spread.
Truth
Clamped aim
32 Constrained Estimates
- Off-center but less spread
- Overall improvement in expected distance from
truth
Pop! Pop! Pop! Pop!
Truth
Clamped aim
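The trade-off between centering and spread can be checked with exact arithmetic. The sketch below is only an illustration: the estimator (a sample mean scaled by a hypothetical factor c) and the numbers are assumptions, not taken from the slides. It uses the standard decomposition of expected squared error into variance plus squared bias.

```python
def mse_of_scaled_mean(mu, sigma, n, c):
    """Exact expected squared distance from mu of the estimator c * (sample mean).

    The sample mean of n i.i.d. draws has variance sigma**2 / n, so
    MSE = variance + bias**2 = c**2 * sigma**2 / n + ((c - 1) * mu)**2.
    """
    variance = c ** 2 * sigma ** 2 / n
    bias = (c - 1.0) * mu
    return variance + bias ** 2


# Hypothetical truth, noise level, and (small) sample size.
mu, sigma, n = 1.0, 3.0, 4

unconstrained = mse_of_scaled_mean(mu, sigma, n, c=1.0)  # centered, widely spread
clamped = mse_of_scaled_mean(mu, sigma, n, c=0.5)        # off-center, less spread

print(unconstrained, clamped)
```

With these numbers the clamped estimator has the smaller expected squared distance from the truth even though it is aimed at the wrong value. With a large enough sample the ordering reverses, because the bias term does not shrink.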
33 Doesn't Find True Theory
- The theory that minimizes estimation risk can be
quite false.
Four eyes!
Clamped aim
34 Makes Sense
- when loss of an answer is similar in nearby
distributions.
Close is good enough!
Loss
Similarity
p
35 But Truth Matters
- when loss of an answer is discontinuous with
similarity.
Loss
Close is no cigar!
Similarity
p
36 E.g. Science
If you want true laws, false laws aren't good enough.
37 E.g. Science
You must be a philosopher. This is a machine
learning conference.
38 E.g., Causal Data Mining
Protein A
Protein C
Cancer protein
Protein B
Now you're talking! I'm on a cilantro-only diet to get my protein C level under control.
Practical enough?
39 Central Idea
- Correlation does imply causation if there are multiple variables, some of which are common effects. [Pearl; Spirtes, Glymour and Scheines]
Protein A
Protein C
Cancer protein
Protein B
40Core assumptions
- Joint distribution p is causally compatible with directed, acyclic graph G iff:
- Causal Markov Condition: each variable X is independent of its non-effects given its immediate causes.
- Faithfulness Condition: no other conditional independence relations hold in p.
41 Tell-tale Dependencies
C
C
H
F1
F
Given F, H gives some info about
C (Faithfulness)
Given C, F1 gives no further info about
F2 (Markov)
42Common Applications
- Linear Causal Case: each variable X is a linear function of its parents and a normally distributed hidden variable called an error term. The error terms are mutually independent.
- Discrete Multinomial Case: each variable X takes on a finite range of values.
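In the linear case, the tell-tale dependencies can be computed exactly from the model coefficients. The sketch below is an illustration under assumptions that are not in the slides: three variables A, B, C, hypothetical coefficients a and b, and unit-variance error terms. It contrasts a common effect (A -> C <- B) with a chain (A -> C -> B) using the standard formula for partial correlation.

```python
import math


def corr(cov, var_x, var_y):
    """Correlation from a covariance and the two variances."""
    return cov / math.sqrt(var_x * var_y)


def partial_corr(r_xy, r_xz, r_yz):
    """Partial correlation of X and Y given Z, from pairwise correlations."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


a, b = 0.8, 0.6  # hypothetical linear coefficients; every error term has variance 1

# Common effect: A -> C <- B, with C = a*A + b*B + error, and Var(A) = Var(B) = 1.
var_c = a ** 2 + b ** 2 + 1.0
collider_ab = 0.0  # the two causes are independent
collider_ab_given_c = partial_corr(
    collider_ab, corr(a, 1.0, var_c), corr(b, 1.0, var_c))

# Chain: A -> C -> B, with C = a*A + error and B = b*C + error, and Var(A) = 1.
var_c2 = a ** 2 + 1.0
var_b2 = b ** 2 * var_c2 + 1.0
chain_ab = corr(a * b, 1.0, var_b2)
chain_ab_given_c = partial_corr(
    chain_ab, corr(a, 1.0, var_c2), corr(b * var_c2, var_c2, var_b2))

print(collider_ab, collider_ab_given_c)  # independent, then dependent given C
print(chain_ab, chain_ab_given_c)        # dependent, then independent given C
```

The two structures leave opposite dependence signatures. That difference is what lets a procedure assuming the Markov and Faithfulness conditions orient edges toward a common effect.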
43 A Very Optimistic Assumption
- No unobserved latent confounding causes
I'll give you this one.
What's he up to?
44 Current Nutrition Wisdom
Protein A
Protein C
Cancer protein
Protein B
Are you kidding? It's dripping with Protein C!
English Breakfast?
45 As the Sample Increases
Protein A
Protein C
Cancer protein
weak
Protein B
Protein D
This situation approximates the last one. So who cares?
I do! Out of my way!
46 As the Sample Increases Again
Protein E
Protein A
weak
Protein C
Cancer protein
weak
Protein B
weak
Protein D
Wasn't that last approximation to the truth good enough?
Aaack! I'm poisoned!
47 Causal Flipping Theorem
- No matter what a consistent causal discovery procedure has seen so far, there exists a pair (G, p) satisfying the assumptions such that the current sample is arbitrarily likely and the procedure produces arbitrarily many opposite conclusions in p as sample size increases.
oops
I meant
oops
I meant
oops
I meant
48The Wrong Reaction
- The demon undermines justification of science.
- He must be defeated to forestall skepticism.
- Bayesian circularity
- Classical instrumentalism
Urk!
Grrrr!
49Another View
- "Many explanations have been offered to make sense of the here-today-gone-tomorrow nature of medical wisdom - what we are advised with confidence one year is reversed the next - but the simplest one is that it is the natural rhythm of science."
- ("Do We Really Know What Makes Us Healthy?", NY Times Magazine, Sept. 16, 2007).
50Zen Approach
- Get to know the demon.
- Locate the justification of Ockham's razor in his
power.
51Connections to the Truth
- Short-run Reliability
- Too strong to be feasible when theory matters.
- Long-run Convergence
- Too weak to single out Ockham's razor.
Complex
Simple
Simple
Complex
52Middle Path
- Short-run Reliability
- Too strong to be feasible when theory matters.
- Straightest convergence
- Just right?
- Long-run Convergence
- Too weak to single out Ockham's razor.
Complex
Simple
53II. Navigation by Broken Compass
simple
54Asking for Directions
Where's
55Asking for Directions
Turn around. The freeway ramp is on the left.
56Asking for Directions
57Best Route
58Best Route to Any Goal
59Disregarding Advice is Bad
Extra U-turn
60Best Route to Any Goal
so fixed advice can help you reach a hidden
goal without circles, evasions, or magic.
61In Step with the Demon
There yet?
Maybe.
Cubic
Quadratic
Linear
Constant
65Ahead of Mother Nature
There yet?
Maybe.
Cubic
Quadratic
Linear
Constant
66Ahead of Mother Nature
I know you're coming!
Cubic
Quadratic
Linear
Constant
67Ahead of Mother Nature
Maybe.
Cubic
Quadratic
Linear
Constant
68Ahead of Mother Nature
!!!
Hmm, it's quite nice here
Cubic
Quadratic
Linear
Constant
69Ahead of Mother Nature
You're back! Learned your lesson?
Cubic
Quadratic
Linear
Constant
70Ockham Violator's Path
See, you shouldn't run ahead, even if you are right!
Cubic
Quadratic
Linear
Constant
71Ockham Path
Cubic
Quadratic
Linear
Constant
72Empirical Problems
- Set K of infinite input sequences.
- Partition of K into alternative theories.
K
T1
T2
T3
73Empirical Methods
- Map finite input sequences to theories or to ?.
T3
K
T1
T2
T3
e
74Method Choice
Output history
At each stage, scientist can choose a new method
(agreeing with past theory choices).
T1
T2
T3
e1
e2
e3
e4
Input history
75Aim: Converge to the Truth
T3
?
T2
?
T1
T1
T1
T1
. . .
T1
T1
T1
K
T1
T2
T3
76Retraction
- Choosing T and then not choosing T next
T
T
?
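The definition of a retraction can be sketched in a few lines. This is only an illustration: the function name and the sample output history are hypothetical, and "?" stands for declining to choose a theory.

```python
def retraction_times(outputs):
    """Return the stages at which a retraction occurs.

    A retraction occurs at stage i when the previous output was a theory
    (not "?") and the output at stage i is anything else.
    """
    times = []
    for i in range(1, len(outputs)):
        previous, current = outputs[i - 1], outputs[i]
        if previous != "?" and current != previous:
            times.append(i)
    return times


# Hypothetical output history of a method.
outputs = ["?", "T1", "T1", "?", "T2", "T2"]
print(retraction_times(outputs))
```

Moving from "?" to a theory is not counted as a retraction, since nothing was chosen and then dropped. Moving from a theory to "?" is counted.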
77Aim: Eliminate Needless Retractions
Truth
79Ancient Roots
"Living in the midst of ignorance and considering
themselves intelligent and enlightened, the
senseless people go round and round, following
crooked courses, just like the blind led by the
blind." Katha Upanishad, I. ii. 5, c. 600 BCE.
80Aim: Eliminate Needless Delays to Retractions
theory
81Aim: Eliminate Needless Delays to Retractions
application
theory
application
application
application
corollary
application
application
application
corollary
application
corollary
82Why Timed Retractions?
Retraction minimization = generalized significance level.
Retraction time minimization = generalized power.
83Easy Retraction Time Comparisons
Method 1: T1 T1 T2 T2 T2 T2 T4 T4 T4 . . .
Method 2: T1 T1 T2 T2 T3 T3 T2 T4 T4 . . .
Method 2: at least as many retractions, at least as late.
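The comparison on this slide can be made mechanical. The sketch below rests on two assumptions: that the first output sequence on the slide belongs to Method 1 and the second to Method 2, and that "at least as many, at least as late" means each retraction of the better method can be paired with a distinct retraction of the worse method that occurs no earlier. The function names are hypothetical.

```python
def retraction_times(outputs):
    """Stages at which a theory chosen at the previous stage is dropped."""
    return [i for i in range(1, len(outputs))
            if outputs[i - 1] != "?" and outputs[i] != outputs[i - 1]]


def at_least_as_bad(times_a, times_b):
    """True if b has at least as many retractions as a, at least as late.

    Pairs a's retractions, in order, with the latest retractions of b.
    """
    a, b = sorted(times_a), sorted(times_b)
    if len(b) < len(a):
        return False
    latest_b = b[len(b) - len(a):]
    return all(ta <= tb for ta, tb in zip(a, latest_b))


method_1 = ["T1", "T1", "T2", "T2", "T2", "T2", "T4", "T4", "T4"]
method_2 = ["T1", "T1", "T2", "T2", "T3", "T3", "T2", "T4", "T4"]

times_1 = retraction_times(method_1)
times_2 = retraction_times(method_2)
print(times_1, times_2, at_least_as_bad(times_1, times_2))
```

Pairing against the latest retractions of the longer list suffices, because if any pairing works then that one does.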
84Worst-case Retraction Time Bounds
(1, 2, 8)
Output sequences:
T1 T2 T3 T3 T3 T4 T3 . . .
T1 T2 T3 T3 T3 T4 T4 . . .
T1 T2 T3 T3 T4 T4 T4 . . .
T1 T2 T4 T3 T4 T4 T4 . . .
85III. Ockham Without Circles, Evasions, or Magic
86Curve Fitting
- Data: open intervals around Y at rational values of X.
90Empirical Effects
92Empirical Effects
May take arbitrarily long to discover
99Empirical Theories
- True theory determined by which effects appear.
100Empirical Complexity
More complex
101Background Constraints
More complex
103Ockham's Razor
- Don't select a theory unless it is uniquely simplest in light of experience.
104Weak Ockham's Razor
- Don't select a theory unless it is among the simplest in light of experience.
105Stalwartness
- Don't retract your answer while it is uniquely simplest.
107Timed Retraction Bounds
- r(M, e, n) = the least timed retraction bound covering the total timed retractions of M along input streams of complexity n that extend e.
[Figure: timed retraction bounds of M across empirical complexity classes 0, 1, 2, 3, . . .]
108Efficiency of Method M at e
- M converges to the truth no matter what.
- For each convergent M′ that agrees with M up to the end of e, and for each n:
- r(M, e, n) ≤ r(M′, e, n).
[Figure: retraction bounds of M and M′ across empirical complexity classes 0, 1, 2, 3, . . .]
109M is Beaten at e
- There exists a convergent M′ that agrees with M up to the end of e, such that:
- For each n, r(M, e, n) ≥ r(M′, e, n).
- There exists n such that r(M, e, n) > r(M′, e, n).
[Figure: retraction bounds of M and M′ across empirical complexity classes 0, 1, 2, 3, . . .]
110Basic Idea
- Ockham efficiency: Nature can force an arbitrary convergent M to produce the successive answers down an effect path arbitrarily late, so stalwart, Ockham solutions are efficient.
111Basic Idea
- Unique Ockham efficiency: A violator of Ockham's razor or stalwartness can be forced into an extra retraction or a late retraction in complexity class zero at the time of the violation, so the violator is beaten by each stalwart, Ockham solution.
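The forcing argument can be simulated in the polynomial setting of the earlier slides (Constant, Linear, Quadratic, Cubic). The following is a toy sketch under stated assumptions, not the general construction: an input stream is represented simply as the number of effects seen so far at each stage, answers are polynomial degrees, and the violator and its `patience` parameter are hypothetical.

```python
def ockham_method(effects_seen):
    """Stalwart Ockham answer: the degree equal to the number of effects seen."""
    return effects_seen


def make_violator(patience):
    """A convergent method that leaps to degree 1 before any effect appears,
    then backs down after `patience` effect-free stages."""
    def method(effects_seen, stage):
        if effects_seen == 0 and stage < patience:
            return 1
        return effects_seen
    return method


def count_retractions(outputs):
    """Number of stages at which the answer changes (no "?" outputs here)."""
    return sum(1 for prev, cur in zip(outputs, outputs[1:]) if cur != prev)


violator = make_violator(patience=3)

# Nature withholds the first effect until the violator has backed down.
stream = [0, 0, 0, 0, 0, 1, 1, 1]
ockham_outputs = [ockham_method(k) for k in stream]
violator_outputs = [violator(k, t) for t, k in enumerate(stream)]

# In a world of complexity zero, no effect ever appears.
flat = [0] * 8
ockham_flat = [ockham_method(k) for k in flat]
violator_flat = [violator(k, t) for t, k in enumerate(flat)]

print(count_retractions(ockham_outputs), count_retractions(violator_outputs))
print(count_retractions(ockham_flat), count_retractions(violator_flat))
```

In both streams the violator incurs one more retraction than the Ockham method. This holds even in the first stream, where its early guess of degree 1 turns out to be true.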
112Ockham Efficiency Theorem
- Let M be a solution. The following are equivalent:
- M is always strongly Ockham and stalwart.
- M is always efficient.
- M is never weakly beaten.
113Example: Causal Inference
- Effects are conditional statistical dependence relations.
X dep Y | {Z}, {W}, {Z,W}
Y dep Z | {X}, {W}, {X,W}
X dep Z | {Y}, {Y,W}
. . .
114Causal Discovery: Ockham's Razor
Variables: X, Y, Z, W
115Causal Discovery: Ockham's Razor
Variables: X, Y, Z, W
X dep Y | {Z}, {W}, {Z,W}
116Causal Discovery: Ockham's Razor
Variables: X, Y, Z, W
X dep Y | {Z}, {W}, {Z,W}
Y dep Z | {X}, {W}, {X,W}
X dep Z | {Y}, {Y,W}
117Causal Discovery: Ockham's Razor
Variables: X, Y, Z, W
X dep Y | {Z}, {W}, {Z,W}
Y dep Z | {X}, {W}, {X,W}
X dep Z | {Y}, {W}, {Y,W}
118Causal Discovery: Ockham's Razor
Variables: X, Y, Z, W
X dep Y | {Z}, {W}, {Z,W}
Y dep Z | {X}, {W}, {X,W}
X dep Z | {Y}, {W}, {Y,W}
Z dep W | {X}, {Y}, {X,Y}
Y dep W | {Z}, {X,Z}
119Causal Discovery: Ockham's Razor
Variables: X, Y, Z, W
X dep Y | {Z}, {W}, {Z,W}
Y dep Z | {X}, {W}, {X,W}
X dep Z | {Y}, {W}, {Y,W}
Z dep W | {X}, {Y}, {X,Y}
Y dep W | {X}, {Z}, {X,Z}
120IV. Simplicity Defined
121Approach
- Empirical complexity reflects nested problems of induction posed by the problem.
- Hence, simplicity is problem-relative but topologically invariant.
122Empirical Problems
- Set K of infinite input sequences.
- Partition Q of K into alternative theories.
K
T1
T2
T3
123Simplicity Concepts
- A simplicity concept for (K, Q) is just a well-founded order < on a partition S of K with ascending chains of order type not exceeding omega such that:
- Each element of S is included in some answer in Q.
- Each downward union in (S, <) is closed.
- Incomparable sets share no boundary point.
- Each element of S is included in the boundary of its successor.
124Empirical Complexity Defined
- Let K_e denote the set of all possibilities compatible with observations e.
- Let (S, <) be a simplicity concept for (K_e, Q).
- Define c(w, e) = the length of the longest < path to the cell of S that contains w.
- Define c(T, e) = the least c(w, e) such that T is true in w.
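The two definitions can be illustrated on a small finite order. This is a toy sketch under assumptions that are not in the slides: the cells of S are the four polynomial classes from the earlier slides, the order is given by a hypothetical table of immediate predecessors, and the observations e are represented by the set of cells still compatible with them.

```python
# Hypothetical simplicity order: each cell of S mapped to its immediate predecessors.
predecessors = {
    "constant": [],
    "linear": ["constant"],
    "quadratic": ["linear"],
    "cubic": ["quadratic"],
}


def cell_complexity(cell, compatible):
    """c(w, e) for any w in `cell`: length of the longest path to `cell`
    through cells that are still compatible with the observations."""
    preds = [p for p in predecessors[cell] if p in compatible]
    return max((1 + cell_complexity(p, compatible) for p in preds), default=0)


def theory_complexity(cells_where_true, compatible):
    """c(T, e): least complexity over compatible cells in which T is true."""
    return min(cell_complexity(c, compatible)
               for c in cells_where_true if c in compatible)


everything = set(predecessors)                    # before any observations
after_effect = {"linear", "quadratic", "cubic"}   # "constant" has been refuted

print(cell_complexity("cubic", everything))       # complexity equals degree
print(cell_complexity("cubic", after_effect))     # one step simpler than before
print(theory_complexity({"quadratic", "cubic"}, everything))
```

Complexity is relative to experience: once an effect has been observed, the cells that posit it move down by one step, and the simplest remaining cell has complexity zero.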
125Applications
- Polynomial laws: complexity = degree.
- Conservation laws: complexity = particle types − conserved quantities.
- Causal networks: complexity = number of logically independent conditional dependencies entailed by faithfulness.
126General Ockham Efficiency Theorem
- Let M be a solution. The following are equivalent:
- M is always strongly Ockham and stalwart.
- M is always efficient.
- M is never beaten.
127Conclusions
- Causal truths are necessary for counterfactual predictions.
- Ockham's razor is necessary for staying on the straightest path to the true theory but does not point at the true theory.
- No evasions or circles are required.
128Further Directions
- Extension of unique efficiency theorem to mixed strategies, stochastic model selection and numerical computations.
- Latent variables as Ockham conclusions.
- Degrees of retraction.
- Ockham pooling of marginal Ockham conclusions.
- Retraction efficiency assessment of standard model selection methods.
129Suggested Reading
- "Ockham's Razor, Truth, and Information", in Handbook of the Philosophy of Information, J. van Benthem and P. Adriaans, eds., to appear.
- "Ockham's Razor, Empirical Complexity, and Truth-finding Efficiency", Theoretical Computer Science, 383: 270-289, 2007.
- Both available as pre-prints at www.hss.cmu.edu/philosophy/faculty-kelly.php