Simplicity and Truth: an Alternative Explanation of Ockham's Razor

1
Simplicity and Truth: an Alternative Explanation of Ockham's Razor

Kevin T. Kelly
Conor Mayo-Wilson
Department of Philosophy
Joint Program in Logic and Computation
Carnegie Mellon University
www.hss.cmu.edu/philosophy/faculty-kelly.php

2
I. The Simplicity Puzzle
3
Which Theory is Right?
???
4
Ockham Says
Choose the Simplest!
5
But Why?
Gotcha!
6
Puzzle

An indicator must be sensitive to what it
indicates.

simple
7
Puzzle

A reliable indicator must be sensitive to what it
indicates.

complex
8
Puzzle

But Ockham's razor always points at simplicity.

simple
9
Puzzle

But Ockham's razor always points at simplicity.

complex
10
Puzzle

How can a broken compass help you find something
unless you already know where it is?

complex
11
Standard Accounts

1. Prior Simplicity Bias
Bayes, BIC, MDL, MML, etc.
2. Risk Minimization
SRM, AIC, cross-validation, etc.

12
1. Prior Simplicity Bias
The simple theory is more plausible now because
it was more plausible yesterday.
13
More Subtle Version

Simple data are a miracle in the complex theory
but not in the simple theory.

Regularity: retrograde motion of Venus at solar conjunction.
Has to be!
[Figure: complex theory P and simple theory C]
14
However

e would not be a miracle given P(θ).

Why not this?
[Figure: theories P and C]
15
The Real Miracle
[Figure: theories C and P]
Ignorance about model: $p(C) \approx p(P)$.
Ignorance about parameter setting: $p(P(\theta) \mid P) \approx p(P(\theta') \mid P)$.
Knowledge about C vs. P(θ): $p(P(\theta)) \ll p(C)$.
Lead into gold. Perpetual motion. Free lunch.
Ignorance is knowledge. War is peace. I love Big
Bayes.
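The arithmetic behind this slide can be checked directly. A minimal sketch, assuming (our simplification, not the slide's) that the parameter θ of P takes n equally likely discrete settings; the function `indifference_priors` is a hypothetical name.

```python
from fractions import Fraction


def indifference_priors(n_settings):
    """Return (p(C), p(P(theta))) under the two indifference assumptions.

    Ignorance about model: p(C) = p(P) = 1/2.
    Ignorance about parameter setting: p(P(theta) | P) = 1/n_settings.
    """
    p_C = Fraction(1, 2)
    p_P = Fraction(1, 2)
    p_theta_given_P = Fraction(1, n_settings)
    # p(P(theta)) = p(P) * p(P(theta) | P)
    p_P_theta = p_P * p_theta_given_P
    return p_C, p_P_theta


for n in (10, 1000):
    p_C, p_P_theta = indifference_priors(n)
    # The simple theory C ends up n times as probable as any single P(theta).
    print(n, p_C, p_P_theta, p_C / p_P_theta)
```

With n = 1000 the simple theory C is already a thousand times as probable as any particular P(θ), although both assumptions were meant to express ignorance. Reading C and P as red and not-red, and the settings as the non-red colors, gives the same arithmetic as the paradox of indifference.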
16
Standard Paradox of Indifference
Ignorance of red vs. not-red, plus ignorance over not-red, yields knowledge about red vs. white.
Knognorance: all the privileges of knowledge, with none of the responsibilities.
Yeah!
17
The Ellsberg Paradox
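[Figure: urn in which one color has the known chance 1/3 and the other two colors have unknown chances (?, ?)]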
18
Human Preference
[Figure: the same urn (1/3, ?, ?) with bets labeled a and b; preferences marked > and <]
19
Human View
[Figure: the same urn and bets, with the known chance labeled knowledge and the unknown chances labeled ignorance; preferences marked > and <]
20
Bayesian View
[Figure: the same urn and bets, with every chance labeled knognorance; preferences marked > and >]
21
In Any Event
The coherentist foundations of Bayesianism have
nothing to do with short-run truth-conduciveness.
Not so loud!
22
Bayesian Convergence

Too-simple theories get shot down

[Figure: updated opinion over theories, ordered by complexity]
23
Bayesian Convergence

Plausibility is transferred to the next-simplest
theory

[Figure: updated opinion over theories, ordered by complexity]
24
Bayesian Convergence

Plausibility is transferred to the next-simplest
theory

[Figure: updated opinion over theories, ordered by complexity]
25
Bayesian Convergence

Plausibility is transferred to the next-simplest
theory

[Figure: updated opinion over theories, ordered by complexity]
26
Bayesian Convergence

The true theory is nailed to the fence.

[Figure: updated opinion over theories, ordered by complexity]
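The transfer of plausibility can be mimicked with a toy posterior. A minimal sketch, assuming a hypothetical prior proportional to 2^-(k+1) over polynomial degrees 0 to 5 and data that simply refute too-simple degrees (no likelihoods); `normalize` and `refute` are illustrative names, not part of the talk.

```python
from fractions import Fraction


def normalize(weights):
    """Rescale nonnegative weights so that they sum to 1."""
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def refute(prior, refuted):
    """Condition on data that rule out every theory in `refuted`."""
    return normalize({k: w for k, w in prior.items() if k not in refuted})


# Hypothetical simplicity-biased prior over polynomial degrees 0..5.
prior = normalize({k: Fraction(1, 2 ** (k + 1)) for k in range(6)})

true_degree = 3
refuted = set()
most_plausible = []
for degree in range(true_degree):  # the data shoot down degrees 0, 1, 2 in turn
    refuted.add(degree)
    posterior = refute(prior, refuted)
    # Plausibility moves to the next-simplest surviving theory.
    most_plausible.append(max(posterior, key=posterior.get))

print(most_plausible)
print(posterior[true_degree])
```

This prints `[1, 2, 3]` and `4/7`: in the refutation-only cartoon the true degree becomes and stays most plausible but keeps only part of the mass, so a fuller treatment with likelihoods would be needed to concentrate it on the truth.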
27
Convergence

But alternative strategies also converge
Anything in the short run is compatible with
convergence in the long run.

28
Summary of Bayesian Approach

Prior-based explanations of Ockham's razor are circular and based on a faulty model of ignorance.
Convergence-based explanations of Ockham's razor fail to single out Ockham's razor.

29
2. Risk Minimization

Ockham's razor minimizes the expected distance of empirical estimates from the true value.

[Figure: target labeled Truth]
30
Unconstrained Estimates

are centered on the truth but spread around it.

[Figure: unconstrained aim]
31

Constrained Estimates

Off-center but less spread.

[Figure: clamped aim relative to the truth]
32
Constrained Estimates

Off-center but less spread.
Overall improvement in expected distance from the truth.

[Figure: clamped aim relative to the truth]
33
Doesn't Find True Theory

The theory that minimizes estimation risk can be quite false.

Four eyes!
[Figure: clamped aim]
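The trade-off can be seen in a small simulation. A minimal sketch, assuming normally distributed data with a hypothetical true mean 0.1, σ = 1 and n = 10; the `clamped` estimator stands in for the simpler, false theory that the mean is 0. The helper names are hypothetical.

```python
import random
import statistics


def risk(estimator, true_mean, sigma, n, trials=20000, seed=0):
    """Monte Carlo estimate of the expected squared distance from the true mean."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        total += (estimator(sample) - true_mean) ** 2
    return total / trials


def unconstrained(sample):
    # Centered on the truth but spread around it (variance sigma**2 / n).
    return statistics.fmean(sample)


def clamped(sample):
    # The simpler theory "the mean is 0": no spread at all, but off-center.
    return 0.0


true_mean, sigma, n = 0.1, 1.0, 10  # hypothetical values
print(risk(unconstrained, true_mean, sigma, n))
print(risk(clamped, true_mean, sigma, n))
```

The unconstrained estimate has risk σ²/n = 0.1 and the clamped one has risk 0.1² = 0.01, so the false theory wins on expected distance whenever the squared true mean is below σ²/n.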
34
Makes Sense

when loss of an answer is similar in nearby
distributions.

Close is good enough!
[Figure: loss vs. similarity to the true distribution p]
35
But Truth Matters

when loss of an answer is discontinuous with
similarity.

Close is no cigar!
[Figure: loss vs. similarity to the true distribution p]
36
E.g. Science
If you want true laws, false laws aren't good enough.
37
E.g. Science
You must be a philosopher. This is a machine
learning conference.
38
E.g., Causal Data Mining
[Figure: causal graph over Protein A, Protein B, Protein C, and Cancer protein]
Now you're talking! I'm on a cilantro-only diet to get my protein C level under control.
Practical enough?
39
Central Idea

Correlation does imply causation if there are multiple variables, some of which are common effects (Pearl; Spirtes, Glymour, and Scheines).

[Figure: causal graph over Protein A, Protein B, Protein C, and Cancer protein]
40
Core assumptions

Joint distribution p is causally compatible with directed acyclic graph G iff:
Causal Markov Condition: each variable X is independent of its non-effects given its immediate causes.
Faithfulness Condition: no other conditional independence relations hold in p.

41
Tell-tale Dependencies
[Figure: graphs over C, H, F, F1, F2]
Given F, H gives some info about C (Faithfulness).
Given C, F1 gives no further info about F2 (Markov).
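The two tell-tale patterns can be checked numerically. A minimal sketch, assuming a linear-Gaussian model (an instance of the linear causal case on the next slide) with unit coefficients chosen by us: H -> F <- C and F1 <- C -> F2. The helpers `corr`, `partial_corr`, and `simulate` are hypothetical names.

```python
import math
import random


def corr(a, b):
    """Sample correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def partial_corr(a, b, c):
    """First-order partial correlation of a and b given c."""
    r_ab, r_ac, r_bc = corr(a, b), corr(a, c), corr(b, c)
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


def simulate(n=20000, seed=0):
    """Hypothetical model: F is a common effect of H and C; C causes F1 and F2."""
    rng = random.Random(seed)

    def noise():
        return rng.gauss(0.0, 1.0)

    H, C, F, F1, F2 = [], [], [], [], []
    for _ in range(n):
        h, c = noise(), noise()
        H.append(h)
        C.append(c)
        F.append(h + c + noise())  # common effect of H and C
        F1.append(c + noise())     # F1 and F2 share the common cause C
        F2.append(c + noise())
    return H, C, F, F1, F2


H, C, F, F1, F2 = simulate()
print(round(corr(H, C), 2), round(partial_corr(H, C, F), 2))
print(round(corr(F1, F2), 2), round(partial_corr(F1, F2, C), 2))
```

In the population the four numbers are 0, -0.5, 0.5 and 0: conditioning on the common effect F creates a dependence between H and C, while conditioning on the common cause C removes the one between F1 and F2.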
42
Common Applications

Linear Causal Case: each variable X is a linear function of its parents and a normally distributed hidden variable called an error term. The error terms are mutually independent.
Discrete Multinomial Case: each variable X takes on a finite range of values.

43
A Very Optimistic Assumption

No unobserved latent confounding causes.

I'll give you this one.
What's he up to?
44
Current Nutrition Wisdom
[Figure: causal graph over Protein A, Protein B, Protein C, and Cancer protein]
Are you kidding? It's dripping with Protein C!
English Breakfast?
45
As the Sample Increases
[Figure: causal graph over Proteins A, B, C, D and Cancer protein, with one weak link]
This situation approximates the last one. So who cares?
I do! Out of my way!
46
As the Sample Increases Again
[Figure: causal graph over Proteins A to E and Cancer protein, with three weak links]
Wasn't that last approximation to the truth good enough?
Aaack! I'm poisoned!
47
Causal Flipping Theorem

No matter what a consistent causal discovery procedure has seen so far, there exists a pair (G, p) satisfying the assumptions such that the current sample is arbitrarily likely and the procedure produces arbitrarily many opposite conclusions in p as the sample size increases.

oops
I meant
oops
I meant
oops
I meant
48
The Wrong Reaction

The demon undermines justification of science.
He must be defeated to forestall skepticism.
Bayesian circularity
Classical instrumentalism

Urk!
Grrrr!
49
Another View

"Many explanations have been offered to make sense of the here-today-gone-tomorrow nature of medical wisdom (what we are advised with confidence one year is reversed the next), but the simplest one is that it is the natural rhythm of science."
("Do We Really Know What Makes Us Healthy?", NY Times Magazine, Sept. 16, 2007)

50
Zen Approach

Get to know the demon.
Locate the justification of Ockham's razor in his power.

51
Connections to the Truth

Short-run Reliability: too strong to be feasible when theory matters.
Long-run Convergence: too weak to single out Ockham's razor.

[Figure: Simple vs. Complex under each standard]
52
Middle Path

Short-run Reliability: too strong to be feasible when theory matters.
Straightest convergence: just right?
Long-run Convergence: too weak to single out Ockham's razor.

[Figure: Simple vs. Complex]
53
II. Navigation by Broken Compass
simple
54
Asking for Directions
Where's
55
Asking for Directions
Turn around. The freeway ramp is on the left.
56
Asking for Directions
57
Best Route
58
Best Route to Any Goal
59
Disregarding Advice is Bad
Extra U-turn
60
Best Route to Any Goal
so fixed advice can help you reach a hidden
goal without circles, evasions, or magic.
61
In Step with the Demon
There yet?
Maybe.
[Figure: answers Constant, Linear, Quadratic, Cubic]
62
In Step with the Demon
There yet?
Maybe.
[Figure: answers Constant, Linear, Quadratic, Cubic]
63
In Step with the Demon
There yet?
Maybe.
[Figure: answers Constant, Linear, Quadratic, Cubic]
64
In Step with the Demon
There yet?
Maybe.
[Figure: answers Constant, Linear, Quadratic, Cubic]
65
Ahead of Mother Nature
There yet?
Maybe.
[Figure: answers Constant, Linear, Quadratic, Cubic]
66
Ahead of Mother Nature
I know you're coming!
[Figure: answers Constant, Linear, Quadratic, Cubic]
67
Ahead of Mother Nature
Maybe.
[Figure: answers Constant, Linear, Quadratic, Cubic]
68
Ahead of Mother Nature
!!!
Hmm, it's quite nice here.
[Figure: answers Constant, Linear, Quadratic, Cubic]
69
Ahead of Mother Nature
You're back! Learned your lesson?
[Figure: answers Constant, Linear, Quadratic, Cubic]
70
Ockham Violator's Path
See, you shouldn't run ahead, even if you are right!
[Figure: answers Constant, Linear, Quadratic, Cubic]
71
Ockham Path
[Figure: answers Constant, Linear, Quadratic, Cubic]
72
Empirical Problems

Set K of infinite input sequences.
Partition of K into alternative theories.

[Figure: K partitioned into theories T1, T2, T3]
73
Empirical Methods

Map finite input sequences to theories or to ?.

[Figure: a finite input sequence e in K mapped to T3; K partitioned into T1, T2, T3]
74
Method Choice
At each stage, the scientist can choose a new method (agreeing with past theory choices).
[Figure: input history e1, e2, e3, e4 and the corresponding output history over T1, T2, T3]
75
Aim: Converge to the Truth
[Figure: outputs T3, ?, T2, ?, T1, T1, T1, T1, . . . settling on T1, within K partitioned into T1, T2, T3]
76
Retraction

Choosing T and then not choosing T next

[Figure: outputs T, T, ?]
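Counting retractions is mechanical. A minimal sketch, assuming a method's outputs are listed stage by stage and that "?" marks suspension of judgment; the function `retractions` is hypothetical, and the violator's path is our reading of the cartoon in slides 61 to 71 (one entry per change of answer).

```python
def retractions(outputs, suspend="?"):
    """Count stages where the previous answer was a theory and the current
    answer differs from it (dropping to "?" counts; leaving "?" does not)."""
    return sum(
        1
        for prev, cur in zip(outputs, outputs[1:])
        if prev != suspend and cur != prev
    )


# Paths read off the cartoon (hypothetical compression, one entry per change).
ockham_path = ["Constant", "Linear", "Quadratic", "Cubic"]
violator_path = ["Constant", "Linear", "Quadratic", "Linear", "Quadratic", "Cubic"]

print(retractions(ockham_path), retractions(violator_path))
print(retractions(["T", "T", "?"]), retractions(["?", "T"]))
```

The Ockham path costs 3 retractions and the violator's path 5: running ahead to Quadratic and being sent back adds two retractions, even though the violator's guess was eventually right.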
77
Aim: Eliminate Needless Retractions
Truth
78
Aim: Eliminate Needless Retractions
Truth
79
Ancient Roots
"Living in the midst of ignorance and considering
themselves intelligent and enlightened, the
senseless people go round and round, following
crooked courses, just like the blind led by the
blind." Katha Upanishad, I. ii. 5, c. 600 BCE.
80
Aim: Eliminate Needless Delays to Retractions
[Figure: a theory]
81
Aim: Eliminate Needless Delays to Retractions
[Figure: a theory with its applications and corollaries]
82
Why Timed Retractions?
Retraction minimization = generalized significance level.
Retraction time minimization = generalized power.
83
Easy Retraction Time Comparisons
Method 1: T1 T1 T2 T2 T2 T2 T4 T4 T4 . . .
Method 2: T1 T1 T2 T2 T3 T3 T2 T4 T4 . . .
Method 2 retracts at least as many times, and at least as late.
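The comparison can be made precise in code. A minimal sketch, assuming stages are numbered from 0 and reading "at least as many, at least as late" as: some subsequence of the second method's retraction times is pointwise at least the first method's. `retraction_times` and `no_worse` are hypothetical names.

```python
def retraction_times(outputs, suspend="?"):
    """Stages (0-indexed) at which the previously chosen theory is dropped."""
    return [
        i
        for i in range(1, len(outputs))
        if outputs[i - 1] != suspend and outputs[i] != outputs[i - 1]
    ]


def no_worse(s, t):
    """True if t has at least as many retractions as s, at least as late:
    each time in s is matched, in order, to a distinct time in t that is >= it.
    Greedy matching is enough because both lists are sorted."""
    j = 0
    for time in s:
        while j < len(t) and t[j] < time:
            j += 1
        if j == len(t):
            return False
        j += 1  # t[j] >= time is used up by this retraction of s
    return True


method_1 = "T1 T1 T2 T2 T2 T2 T4 T4 T4".split()
method_2 = "T1 T1 T2 T2 T3 T3 T2 T4 T4".split()
r1, r2 = retraction_times(method_1), retraction_times(method_2)
print(r1, r2)
print(no_worse(r1, r2), no_worse(r2, r1))
```

This prints `[2, 6] [2, 4, 6, 7]` and `True False`: Method 1's timed retractions are no worse than Method 2's, but not the other way around.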
84
Worst-case Retraction Time Bounds
(1, 2, 8)
Output sequences:
T1 T2 T3 T3 T3 T4 T3 . . .
T1 T2 T3 T3 T3 T4 T4 . . .
T1 T2 T3 T3 T4 T4 T4 . . .
T1 T2 T4 T3 T4 T4 T4 . . .
. . .
85
III. Ockham Without Circles, Evasions, or Magic
86
Curve Fitting

Data: open intervals around Y at rational values of X.

87
Curve Fitting

No effects

88
Curve Fitting

First-order effect

89
Curve Fitting

Second-order effect

90
Empirical Effects
91
Empirical Effects
92
Empirical Effects
May take arbitrarily long to discover
93
Empirical Effects
May take arbitrarily long to discover
94
Empirical Effects
May take arbitrarily long to discover
95
Empirical Effects
May take arbitrarily long to discover
96
Empirical Effects
May take arbitrarily long to discover
97
Empirical Effects
May take arbitrarily long to discover
98
Empirical Effects
May take arbitrarily long to discover
99
Empirical Theories

True theory determined by which effects appear.

100
Empirical Complexity
More complex
101
Background Constraints
More complex
102
Background Constraints
More complex
103
Ockham's Razor

Don't select a theory unless it is uniquely simplest in light of experience.
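Ockham's razor for the curve-fitting problem can be sketched for low degrees. A minimal sketch, assuming data are open intervals of half-width 0.5 at integer (hence rational) values of X, a hypothetical true curve y = 0.001x², and a method that only distinguishes constant, linear, and higher; the linear check eliminates the intercept exactly (Fourier-Motzkin elimination). All names are hypothetical.

```python
import math


def constant_fits(data):
    """Some horizontal line passes through every open interval (x, low, high)."""
    return max(low for _, low, _ in data) < min(high for _, _, high in data)


def linear_fits(data):
    """Some line y = a + b*x passes through every open interval.

    Eliminating the intercept a leaves, for every pair of intervals i, j,
    the constraint b * (x_j - x_i) < high_j - low_i on the slope b.
    """
    slope_lo, slope_hi = -math.inf, math.inf
    for xi, low_i, _ in data:
        for xj, _, high_j in data:
            if xj > xi:
                slope_hi = min(slope_hi, (high_j - low_i) / (xj - xi))
            elif xj < xi:
                slope_lo = max(slope_lo, (high_j - low_i) / (xj - xi))
            elif high_j <= low_i:
                return False
    return slope_lo < slope_hi


def ockham_answer(data):
    """Select the uniquely simplest polynomial degree compatible with the data."""
    if constant_fits(data):
        return "constant"
    if linear_fits(data):
        return "linear"
    return "quadratic or higher"  # hypothetical cut-off: higher degrees not checked


def observe(f, xs, half_width=0.5):
    """Open intervals of the given half-width around f(x) at each x."""
    return [(x, f(x) - half_width, f(x) + half_width) for x in xs]


def truth(x):
    return 0.001 * x * x  # hypothetical weak second-order effect


for last_x in (20, 40, 80):
    print(last_x, ockham_answer(observe(truth, range(last_x + 1))))
```

The method answers constant, then linear, then quadratic or higher as the weak second-order effect slowly shows up. In this setup the constant answer survives while 0.001 times the square of the largest observed x stays below 1, so shrinking the coefficient delays each effect as long as one likes.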

104
Weak Ockham's Razor

Don't select a theory unless it is among the simplest in light of experience.

105
Stalwartness

Don't retract your answer while it is uniquely simplest.

106
Stalwartness

Don't retract your answer while it is uniquely simplest.

107
Timed Retraction Bounds

r(M, e, n) = the least timed retraction bound covering the total timed retractions of M along input streams of complexity n that extend e.

[Figure: timed retractions of M along input streams, by empirical complexity 0, 1, 2, 3, . . .]
108
Efficiency of Method M at e

M converges to the truth no matter what.
For each convergent M' that agrees with M up to the end of e, and for each n: $r(M, e, n) \le r(M', e, n)$.

[Figure: timed retraction bounds of M and M', by empirical complexity 0, 1, 2, 3, . . .]
109
M is Beaten at e

There exists a convergent M' that agrees with M up to the end of e, such that:
For each n, $r(M, e, n) \ge r(M', e, n)$.
There exists n such that $r(M, e, n) > r(M', e, n)$.

[Figure: timed retraction bounds of M and M', by empirical complexity 0, 1, 2, 3, . . .]
110
Basic Idea

Ockham efficiency: Nature can force an arbitrary convergent M to produce the successive answers down an effect path arbitrarily late, so stalwart Ockham solutions are efficient.

111
Basic Idea

Unique Ockham efficiency: a violator of Ockham's razor or stalwartness can be forced into an extra retraction or a late retraction in complexity class zero at the time of the violation, so the violator is beaten by each stalwart Ockham solution.

112
Ockham Efficiency Theorem

Let M be a solution. The following are equivalent:
M is always strongly Ockham and stalwart.
M is always efficient.
M is never weakly beaten.

113
Example Causal Inference

Effects are conditional statistical dependence relations.

X dep Y | {Z}, {W}, {Z, W}
Y dep Z | {X}, {W}, {X, W}
X dep Z | {Y}, {Y, W}
. . .
114
Causal Discovery: Ockham's Razor
[Figure: graph over X, Y, Z, W]
115
Ockham's Razor
[Figure: graph over X, Y, Z, W]
X dep Y | {Z}, {W}, {Z, W}
116
Causal Discovery: Ockham's Razor
[Figure: graph over X, Y, Z, W]
X dep Y | {Z}, {W}, {Z, W}
Y dep Z | {X}, {W}, {X, W}
X dep Z | {Y}, {Y, W}
117
Causal Discovery: Ockham's Razor
[Figure: graph over X, Y, Z, W]
X dep Y | {Z}, {W}, {Z, W}
Y dep Z | {X}, {W}, {X, W}
X dep Z | {Y}, {W}, {Y, W}
118
Causal Discovery: Ockham's Razor
[Figure: graph over X, Y, Z, W]
X dep Y | {Z}, {W}, {Z, W}
Y dep Z | {X}, {W}, {X, W}
X dep Z | {Y}, {W}, {Y, W}
Z dep W | {X}, {Y}, {X, Y}
Y dep W | {Z}, {X, Z}
119
Causal Discovery: Ockham's Razor
[Figure: graph over X, Y, Z, W]
X dep Y | {Z}, {W}, {Z, W}
Y dep Z | {X}, {W}, {X, W}
X dep Z | {Y}, {W}, {Y, W}
Z dep W | {X}, {Y}, {X, Y}
Y dep W | {X}, {Z}, {X, Z}
120
IV. Simplicity Defined
121
Approach

Empirical complexity reflects nested problems of
induction posed by the problem.
Hence, simplicity is problem-relative but
topologically invariant.

122
Empirical Problems

Set K of infinite input sequences.
Partition Q of K into alternative theories.

[Figure: K partitioned by Q into theories T1, T2, T3]
123
Simplicity Concepts

A simplicity concept for (K, Q) is just a well-founded order < on a partition S of K with ascending chains of order type not exceeding ω such that:
Each element of S is included in some answer in Q.
Each downward union in (S, <) is closed.
Incomparable sets share no boundary point.
Each element of S is included in the boundary of its successor.

124
Empirical Complexity Defined

Let $K_e$ denote the set of all possibilities compatible with observations e.
Let (S, <) be a simplicity concept for ($K_e$, Q).
Define c(w, e) = the length of the longest < path to the cell of S that contains w.
Define c(T, e) = the least c(w, e) such that T is true in w.
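The two definitions can be mirrored in a few lines for a finite simplicity concept. A minimal sketch, assuming the cells of S are finitely many and are given by listing, for each cell, the cells directly beneath it; complexity is counted in edges, so minimal cells get 0. The names `make_cell_complexity`, `theory_complexity`, and the example cells are hypothetical.

```python
from functools import lru_cache


def make_cell_complexity(below):
    """below[s] lists the cells directly beneath cell s in the order <.

    Returns c(s): the length of the longest < path ending at s, which is
    c(w, e) for every world w in cell s. Finiteness plus well-foundedness
    guarantees that the recursion terminates.
    """
    @lru_cache(maxsize=None)
    def c(cell):
        preds = below.get(cell, ())
        return 0 if not preds else 1 + max(c(p) for p in preds)

    return c


def theory_complexity(c, cells_of_theory):
    """c(T, e): the least complexity of a cell in which theory T is true."""
    return min(c(cell) for cell in cells_of_theory)


# Hypothetical chain of polynomial cells.
poly = {"linear": ["constant"], "quadratic": ["linear"], "cubic": ["quadratic"]}
c_poly = make_cell_complexity(poly)
print([c_poly(s) for s in ("constant", "linear", "quadratic", "cubic")])
print(theory_complexity(c_poly, ["quadratic", "cubic"]))

# Hypothetical branching order: two independent effects a and b.
effects = {"{a}": ["{}"], "{b}": ["{}"], "{a,b}": ["{a}", "{b}"]}
c_eff = make_cell_complexity(effects)
print(c_eff("{a,b}"))
```

For the polynomial chain this prints `[0, 1, 2, 3]`, then `2` for the answer "quadratic or cubic", then `2` for the branching example, so complexity tracks degree in the first application listed on the next slide.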

125
Applications

Polynomial laws: complexity = degree.
Conservation laws: complexity = particle types - conserved quantities.
Causal networks: complexity = number of logically independent conditional dependencies entailed by faithfulness.

126
General Ockham Efficiency Theorem

Let M be a solution. The following are equivalent:
M is always strongly Ockham and stalwart.
M is always efficient.
M is never beaten.

127
Conclusions

Causal truths are necessary for counterfactual
predictions.
Ockham's razor is necessary for staying on the straightest path to the true theory but does not point at the true theory.
No evasions or circles are required.

128
Further Directions

Extension of unique efficiency theorem to mixed
strategies, stochastic model selection and
numerical computations.
Latent variables as Ockham conclusions.
Degrees of retraction.
Ockham pooling of marginal Ockham conclusions.
Retraction efficiency assessment of standard
model selection methods.

129
Suggested Reading

"Ockham's Razor, Truth, and Information", in Handbook of the Philosophy of Information, J. van Benthem and P. Adriaans, eds., to appear.
"Ockham's Razor, Empirical Complexity, and Truth-finding Efficiency", Theoretical Computer Science, 383: 270-289, 2007.
Both available as pre-prints at www.hss.cmu.edu/philosophy/faculty-kelly.php
