Simplicity and Truth: an Alternative Explanation of Ockham's Razor - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Simplicity and Truth: an Alternative Explanation of Ockham's Razor


1
Simplicity and Truth: an Alternative Explanation of Ockham's Razor
  • Kevin T. Kelly
  • Conor Mayo-Wilson
  • Department of Philosophy
  • Joint Program in Logic and Computation
  • Carnegie Mellon University
  • www.hss.cmu.edu/philosophy/faculty-kelly.php

2
I. The Simplicity Puzzle
3
Which Theory is Right?
???
4
Ockham Says
Choose the Simplest!
5
But Why?
Gotcha!
6
Puzzle
  • An indicator must be sensitive to what it
    indicates.

simple
7
Puzzle
  • A reliable indicator must be sensitive to what it
    indicates.

complex
8
Puzzle
  • But Ockham's razor always points at simplicity.

simple
9
Puzzle
  • But Ockham's razor always points at simplicity.

complex
10
Puzzle
  • How can a broken compass help you find something
    unless you already know where it is?

complex
11
Standard Accounts
  • 1. Prior Simplicity Bias
  • Bayes, BIC, MDL, MML, etc.
  • 2. Risk Minimization
  • SRM, AIC, cross-validation, etc.

12
1. Prior Simplicity Bias
The simple theory is more plausible now because
it was more plausible yesterday.
13
More Subtle Version
  • Simple data are a miracle in the complex theory
    but not in the simple theory.

Regularity: retrograde motion of Venus at solar conjunction.
Has to be!
P
C
14
However
  • e would not be a miracle given P(θ).

Why not this?
P
C
15
The Real Miracle
C
P
Ignorance about model: p(C) ≈ p(P).
Ignorance about parameter setting: p(P(θ) | P) ≈ p(P(θ′) | P).
Knowledge about C vs. P(θ): p(P(θ)) << p(C).
Lead into gold. Perpetual motion. Free lunch.
Ignorance is knowledge. War is peace. I love Big
Bayes.
16
Standard Paradox of Indifference
Ignorance of red vs. not-red + ignorance over the not-red colors = knowledge about red vs. white.
Knognorance: all the privileges of knowledge with none of the responsibilities.
Yeah!
17
The Ellsberg Paradox
(Urn: one color with known probability 1/3; the other two colors in unknown proportion.)
18
Human Preference
(Figure: the urn again; human preferences over bets a and b flip between the two comparisons, > in one and < in the other.)
19
Human View
(Figure: the same bets; the known 1/3 color is treated as knowledge, the unknown colors as ignorance.)
20
Bayesian View
(Figure: the same bets; for the Bayesian every option is "knognorance", so both comparisons come out the same way.)
21
In Any Event
The coherentist foundations of Bayesianism have
nothing to do with short-run truth-conduciveness.
Not so loud!
22
Bayesian Convergence
  • Too-simple theories get shot down

Updated opinion
Theories
Complexity
23
Bayesian Convergence
  • Plausibility is transferred to the next-simplest
    theory

Updated opinion
Theories
Plink!
Complexity
Blam!
24
Bayesian Convergence
  • Plausibility is transferred to the next-simplest
    theory

Updated opinion
Theories
Plink!
Complexity
Blam!
25
Bayesian Convergence
  • Plausibility is transferred to the next-simplest
    theory

Updated opinion
Theories
Plink!
Complexity
Blam!
26
Bayesian Convergence
  • The true theory is nailed to the fence.

Updated opinion
Zing!
Theories
Complexity
Blam!
27
Convergence
  • But alternative strategies also converge.
  • Anything in the short run is compatible with convergence in the long run.

28
Summary of Bayesian Approach
  • Prior-based explanations of Ockham's razor are circular and based on a faulty model of ignorance.
  • Convergence-based explanations of Ockham's razor fail to single out Ockham's razor.

29
2. Risk Minimization
  • Ockham's razor minimizes expected distance of empirical estimates from the true value.

Truth
30
Unconstrained Estimates
  • are centered on truth but spread around it.

Pop! Pop! Pop! Pop!
Unconstrained aim
31

Constrained Estimates
  • Off-center but less spread.

Truth
Clamped aim
32
Constrained Estimates
  • Off-center but less spread
  • Overall improvement in expected distance from
    truth

Pop! Pop! Pop! Pop!
Truth
Clamped aim
33
Doesn't Find True Theory
  • The theory that minimizes estimation risk can be
    quite false.

Four eyes!
Clamped aim
34
Makes Sense
  • when loss of an answer is similar in nearby
    distributions.

Close is good enough!
Loss
Similarity
p
35
But Truth Matters
  • when loss of an answer is discontinuous with
    similarity.

Loss
Close is no cigar!
Similarity
p
36
E.g. Science
If you want true laws, false laws aren't good enough.
37
E.g. Science
You must be a philosopher. This is a machine
learning conference.
38
E.g., Causal Data Mining
Protein A
Protein C
Cancer protein
Protein B
Now you're talking! I'm on a cilantro-only diet to get my protein C level under control.
Practical enough?
39
Central Idea
  • Correlation does imply causation if there are multiple variables, some of which are common effects (Pearl; Spirtes, Glymour, and Scheines).

Protein A
Protein C
Cancer protein
Protein B
40
Core assumptions
  • A joint distribution p is causally compatible with a directed, acyclic graph G iff:
  • Causal Markov Condition: each variable X is independent of its non-effects given its immediate causes (sketched below).
  • Faithfulness Condition: no other conditional independence relations hold in p.

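A minimal sketch (assumptions mine: the toy graph and variable names are invented, not the authors' example) of what the Causal Markov Condition asserts, in Python:

    # Hypothetical DAG: A -> C, B -> C, C -> Cancer.
    from itertools import chain

    parents = {"A": [], "B": [], "C": ["A", "B"], "Cancer": ["C"]}

    def effects(v):
        """All variables downstream of v (its effects)."""
        kids = [x for x, ps in parents.items() if v in ps]
        return set(kids) | set(chain.from_iterable(effects(k) for k in kids))

    # Causal Markov Condition: X is independent of its non-effects given its immediate causes.
    for v in parents:
        non_effects = set(parents) - {v} - effects(v) - set(parents[v])
        if non_effects:
            print(f"{v} _||_ {sorted(non_effects)} | {sorted(parents[v])}")
    # prints, e.g.:  Cancer _||_ ['A', 'B'] | ['C']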
41
Tell-tale Dependencies
C
C
H
F1
F
Given F, H gives some info about C (Faithfulness).
Given C, F1 gives no further info about F2 (Markov).
42
Common Applications
  • Linear Causal Case: each variable X is a linear function of its parents and a normally distributed hidden variable called an error term. The error terms are mutually independent (a small data-generating sketch follows below).
  • Discrete Multinomial Case: each variable X takes on a finite range of values.

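A minimal data-generating sketch for the linear causal case; the graph, coefficients, and sample size are illustrative assumptions, not the authors' example:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000

    # Hypothetical DAG: A -> C <- B, C -> Cancer; each variable is a linear
    # function of its parents plus an independent Gaussian error term.
    A = rng.normal(size=n)
    B = rng.normal(size=n)
    C = 0.8 * A + 0.5 * B + rng.normal(size=n)
    Cancer = 1.2 * C + rng.normal(size=n)

    # A and Cancer are marginally dependent, but (per the Markov condition)
    # nearly uncorrelated once the immediate cause C is regressed out.
    print(round(np.corrcoef(A, Cancer)[0, 1], 3))        # clearly nonzero
    resid = Cancer - np.polyfit(C, Cancer, 1)[0] * C     # crude residual on C
    print(round(np.corrcoef(A, resid)[0, 1], 3))         # approximately 0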
43
A Very Optimistic Assumption
  • No unobserved latent confounding causes

I'll give you this one.
What's he up to?
44
Current Nutrition Wisdom
Protein A
Protein C
Cancer protein
Protein B
Are you kidding? It's dripping with Protein C!
English Breakfast?
45
As the Sample Increases
Protein A
Protein C
Cancer protein
weak
Protein B
Protein D
This situation approximates the last one. So who cares?
I do! Out of my way!
46
As the Sample Increases Again
Protein E
Protein A
weak
Protein C
Cancer protein
weak
Protein B
weak
Protein D
Wasn't that last approximation to the truth good enough?
Aaack! I'm poisoned!
47
Causal Flipping Theorem
  • No matter what a consistent causal discovery procedure has seen so far, there exists a pair (G, p) satisfying the assumptions such that the current sample is arbitrarily likely and the procedure produces arbitrarily many opposite conclusions in p as sample size increases.

oops
I meant
oops
I meant
oops
I meant
48
The Wrong Reaction
  • The demon undermines justification of science.
  • He must be defeated to forestall skepticism.
  • Bayesian circularity
  • Classical instrumentalism

Urk!
Grrrr!
49
Another View
  • Many explanations have been offered to make sense of the here-today-gone-tomorrow nature of medical wisdom (what we are advised with confidence one year is reversed the next), but the simplest one is that it is the natural rhythm of science.
  • ("Do We Really Know What Makes Us Healthy?", NY Times Magazine, Sept. 16, 2007).

50
Zen Approach
  • Get to know the demon.
  • Locate the justification of Ockham's razor in his power.

51
Connections to the Truth
  • Short-run Reliability
  • Too strong to be feasible when theory matters.
  • Long-run Convergence
  • Too weak to single out Ockham's razor.

Complex
Simple
Simple
Complex
52
Middle Path
  • Short-run Reliability
  • Too strong to be feasible when theory matters.
  • Straightest convergence
  • Just right?
  • Long-run Convergence
  • Too weak to single out Ockham's razor.

Complex
Simple
53
II. Navigation by Broken Compass
simple
54
Asking for Directions
Where's ...?
55
Asking for Directions
Turn around. The freeway ramp is on the left.
56
Asking for Directions
57
Best Route
58
Best Route to Any Goal
59
Disregarding Advice is Bad
Extra U-turn
60
Best Route to Any Goal
so fixed advice can help you reach a hidden
goal without circles, evasions, or magic.
61
In Step with the Demon
There yet?
Maybe.
Cubic
Quadratic
Linear
Constant
62
In Step with the Demon
There yet?
Maybe.
Cubic
Quadratic
Linear
Constant
63
In Step with the Demon
There yet?
Maybe.
Cubic
Quadratic
Linear
Constant
64
In Step with the Demon
There yet?
Maybe.
Cubic
Quadratic
Linear
Constant
65
Ahead of Mother Nature
There yet?
Maybe.
Cubic
Quadratic
Linear
Constant
66
Ahead of Mother Nature
I know you're coming!
Cubic
Quadratic
Linear
Constant
67
Ahead of Mother Nature
Maybe.
Cubic
Quadratic
Linear
Constant
68
Ahead of Mother Nature
!!!
Hmm, it's quite nice here.
Cubic
Quadratic
Linear
Constant
69
Ahead of Mother Nature
You're back! Learned your lesson?
Cubic
Quadratic
Linear
Constant
70
Ockham Violator's Path
See, you shouldn't run ahead, even if you are right!
Cubic
Quadratic
Linear
Constant
71
Ockham Path
Cubic
Quadratic
Linear
Constant
72
Empirical Problems
  • Set K of infinite input sequences.
  • Partition of K into alternative theories.

K
T1
T2
T3
73
Empirical Methods
  • Map finite input sequences to theories or to ?.

T3
K
T1
T2
T3
e
74
Method Choice
Output history
At each stage, the scientist can choose a new method (agreeing with past theory choices).
T1
T2
T3
e1
e2
e3
e4
Input history
75
Aim: Converge to the Truth
T3  ?  T2  ?  T1  T1  T1  T1  . . .  T1  T1  T1
K
T1
T2
T3
76
Retraction
  • Choosing T and then not choosing T next

T
T
?
77
Aim: Eliminate Needless Retractions
Truth
78
Aim: Eliminate Needless Retractions
Truth
79
Ancient Roots
"Living in the midst of ignorance and considering
themselves intelligent and enlightened, the
senseless people go round and round, following
crooked courses, just like the blind led by the
blind." Katha Upanishad, I. ii. 5, c. 600 BCE.
80
Aim: Eliminate Needless Delays to Retractions
theory
81
Aim: Eliminate Needless Delays to Retractions
application
theory
application
application
application
corollary
application
application
application
corollary
application
corollary
82
Why Timed Retractions?
Retraction minimization = generalized significance level.
Retraction time minimization = generalized power.
83
Easy Retraction Time Comparisons
Method 1: T1 T1 T2 T2 T2 T2 T4 T4 T4 . . .
Method 2: T1 T1 T2 T2 T3 T3 T2 T4 T4 . . .
Method 2 retracts at least as many times, and at least as late.
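A minimal sketch (assumptions mine): count the stages at which a method's output switches to a different answer (a retraction, as defined on the Retraction slide) and check that the second method's retractions are at least as many and at least as late.

    def retraction_times(outputs):
        """Stages at which the output switches to a different answer ('?' means no answer)."""
        times, last = [], None
        for t, ans in enumerate(outputs):
            if ans != "?":
                if last is not None and ans != last:
                    times.append(t)
                last = ans
        return times

    def as_many_and_as_late(worse, better):
        """True if 'worse' retracts at least as often as 'better', and its latest
        retractions are at least as late, pairing them from the end."""
        rw = sorted(retraction_times(worse), reverse=True)
        rb = sorted(retraction_times(better), reverse=True)
        return len(rw) >= len(rb) and all(w >= b for b, w in zip(rb, rw))

    method1 = ["T1", "T1", "T2", "T2", "T2", "T2", "T4", "T4", "T4"]
    method2 = ["T1", "T1", "T2", "T2", "T3", "T3", "T2", "T4", "T4"]
    print(retraction_times(method1))               # [2, 6]
    print(retraction_times(method2))               # [2, 4, 6, 7]
    print(as_many_and_as_late(method2, method1))   # True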
84
Worst-case Retraction Time Bounds
(1, 2, 8)
Output sequences:
T1 T2 T3 T3 T3 T4 T3 . . .
T1 T2 T3 T3 T3 T4 T4 . . .
T1 T2 T3 T3 T4 T4 T4 . . .
T1 T2 T4 T3 T4 T4 T4 . . .
85
IV. Ockham Without Circles, Evasions, or Magic
86
Curve Fitting
  • Data: open intervals around Y at rational values of X.

87
Curve Fitting
  • No effects

88
Curve Fitting
  • First-order effect

89
Curve Fitting
  • Second-order effect

90
Empirical Effects
91
Empirical Effects
92
Empirical Effects
May take arbitrarily long to discover
93
Empirical Effects
May take arbitrarily long to discover
94
Empirical Effects
May take arbitrarily long to discover
95
Empirical Effects
May take arbitrarily long to discover
96
Empirical Effects
May take arbitrarily long to discover
97
Empirical Effects
May take arbitrarily long to discover
98
Empirical Effects
May take arbitrarily long to discover
99
Empirical Theories
  • True theory determined by which effects appear.

100
Empirical Complexity
More complex
101
Background Constraints
More complex
102
Background Constraints
More complex
103
Ockham's Razor
  • Don't select a theory unless it is uniquely simplest in light of experience.

104
Weak Ockham's Razor
  • Don't select a theory unless it is among the simplest in light of experience.

105
Stalwartness
  • Don't retract your answer while it is uniquely simplest.

106
Stalwartness
  • Don't retract your answer while it is uniquely simplest.

107
Timed Retraction Bounds
  • r(M, e, n) = the least timed retraction bound covering the total timed retractions of M along input streams of complexity n that extend e.

(Figure: worst-case timed retractions of M over empirical complexity classes 0, 1, 2, 3.)
108
Efficiency of Method M at e
  • M converges to the truth no matter what.
  • For each convergent M′ that agrees with M up to the end of e, and for each n:
  • r(M, e, n) ≤ r(M′, e, n)

(Figure: worst-case timed retractions of M and a competitor M′ over empirical complexity classes 0, 1, 2, 3.)
109
M is Beaten at e
  • There exists a convergent M′ that agrees with M up to the end of e, such that:
  • For each n, r(M′, e, n) ≤ r(M, e, n)
  • For some n, r(M′, e, n) < r(M, e, n).

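A simplified sketch (my own simplification, not the authors' formal definition): represent r(M, e, n) as a single worst-case number per complexity class n and compare methods pointwise, as in the efficiency and beatenness definitions above.

    def beaten(r_M, r_Mprime):
        """M is beaten if M' is no worse in every complexity class and strictly
        better in at least one."""
        pairs = list(zip(r_Mprime, r_M))
        return all(p <= m for p, m in pairs) and any(p < m for p, m in pairs)

    def efficient(r_M, competitors):
        """M is efficient if no competitor improves on it in any complexity class."""
        return all(m <= c for r_C in competitors for m, c in zip(r_M, r_C))

    r_ockham   = [0, 2, 3, 4]   # hypothetical bounds for a stalwart Ockham solution
    r_violator = [1, 2, 3, 5]   # an extra retraction in class 0, a later one in class 3
    print(beaten(r_violator, r_ockham))       # True: the violator is beaten
    print(efficient(r_ockham, [r_violator]))  # True in this toy comparison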
(Figure: worst-case timed retractions of M and the competitor M′ over empirical complexity classes 0, 1, 2, 3.)
110
Basic Idea
  • Ockham efficiency: Nature can force an arbitrary, convergent M to produce the successive answers down an effect path arbitrarily late, so stalwart, Ockham solutions are efficient.

111
Basic Idea
  • Unique Ockham efficiency: A violator of Ockham's razor or stalwartness can be forced into an extra retraction or a late retraction in complexity class zero at the time of the violation, so the violator is beaten by each stalwart, Ockham solution.

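A toy simulation (assumptions mine: effect names, timing, and the violator's policy are invented) of the forcing idea: a method that leaps ahead of the data pays an extra retraction when the anticipated effect fails to appear.

    def run(method, effect_times, horizon):
        """Feed the method the effects visible at each stage; count its retractions."""
        answers, retractions = [], 0
        for t in range(horizon):
            seen = frozenset(e for e, when in effect_times.items() if when <= t)
            ans = method(seen, t)
            if answers and ans != answers[-1]:
                retractions += 1
            answers.append(ans)
        return retractions

    def ockham(seen, t):
        # conjecture exactly the effects seen so far (the uniquely simplest theory)
        return seen

    def violator(seen, t):
        # leap ahead to a more complex theory early on, then give up
        return seen | {"anticipated"} if t < 5 else seen

    effect_times = {"first-order": 3, "second-order": 7}   # hypothetical appearance times
    print(run(ockham, effect_times, 10))     # 2 retractions, one per real effect
    print(run(violator, effect_times, 10))   # 3 retractions: the leap costs an extra one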
112
Ockham Efficiency Theorem
  • Let M be a solution. The following are equivalent:
  • M is always strongly Ockham and stalwart;
  • M is always efficient;
  • M is never weakly beaten.

113
Example: Causal Inference
  • Effects are conditional statistical dependence
    relations.

X dep Y | {Z}, {W}, {Z,W}
Y dep Z | {X}, {W}, {X,W}
X dep Z | {Y}, {Y,W}
. . .
. . .
114
Causal Discovery: Ockham's Razor
X
Y
Z
W
115
Ockham's Razor
X
Y
Z
W
X dep Y | {Z}, {W}, {Z,W}
116
Causal Discovery: Ockham's Razor
X
Y
Z
W
X dep Y | {Z}, {W}, {Z,W}
Y dep Z | {X}, {W}, {X,W}
X dep Z | {Y}, {Y,W}
117
Causal Discovery: Ockham's Razor
X
Y
Z
W
X dep Y | {Z}, {W}, {Z,W}
Y dep Z | {X}, {W}, {X,W}
X dep Z | {Y}, {W}, {Y,W}
118
Causal Discovery: Ockham's Razor
X
Y
Z
W
X dep Y | {Z}, {W}, {Z,W}
Y dep Z | {X}, {W}, {X,W}
X dep Z | {Y}, {W}, {Y,W}
Z dep W | {X}, {Y}, {X,Y}
Y dep W | {Z}, {X,Z}
119
Causal Discovery: Ockham's Razor
X
Y
Z
W
X dep Y | {Z}, {W}, {Z,W}
Y dep Z | {X}, {W}, {X,W}
X dep Z | {Y}, {W}, {Y,W}
Z dep W | {X}, {Y}, {X,Y}
Y dep W | {X}, {Z}, {X,Z}
120
IV. Simplicity Defined
121
Approach
  • Empirical complexity reflects nested problems of
    induction posed by the problem.
  • Hence, simplicity is problem-relative but
    topologically invariant.

122
Empirical Problems
  • Set K of infinite input sequences.
  • Partition Q of K into alternative theories.

K
T1
T2
T3
123
Simplicity Concepts
  • A simplicity concept for (K, Q) is just a well-founded order < on a partition S of K, with ascending chains of order type not exceeding ω, such that:
  • Each element of S is included in some answer in Q.
  • Each downward union in (S, <) is closed.
  • Incomparable sets share no boundary point.
  • Each element of S is included in the boundary of its successor.

124
Empirical Complexity Defined
  • Let K_e denote the set of all possibilities compatible with observations e.
  • Let (S, <) be a simplicity concept for (K_e, Q).
  • Define c(w, e) = the length of the longest <-path to the cell of S that contains w.
  • Define c(T, e) = the least c(w, e) such that T is true in w.

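One way to write the two definitions compactly (LaTeX notation mine):

    c(w, e) \;=\; \max\bigl\{\, \mathrm{length}(\pi) \;:\; \pi \text{ is a $<$-path in } (S, <) \text{ ending at the cell of } S \text{ that contains } w \,\bigr\}

    c(T, e) \;=\; \min\bigl\{\, c(w, e) \;:\; w \in K_e \text{ and } T \text{ is true in } w \,\bigr\}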
125
Applications
  • Polynomial laws: complexity = degree.
  • Conservation laws: complexity = particle types minus conserved quantities.
  • Causal networks: complexity = number of logically independent conditional dependencies entailed by faithfulness.

126
General Ockham Efficiency Theorem
  • Let M be a solution. The following are equivalent:
  • M is always strongly Ockham and stalwart;
  • M is always efficient;
  • M is never beaten.

127
Conclusions
  • Causal truths are necessary for counterfactual
    predictions.
  • Ockham's razor is necessary for staying on the straightest path to the true theory but does not point at the true theory.
  • No evasions or circles are required.

128
Further Directions
  • Extension of unique efficiency theorem to mixed
    strategies, stochastic model selection and
    numerical computations.
  • Latent variables as Ockham conclusions.
  • Degrees of retraction.
  • Ockham pooling of marginal Ockham conclusions.
  • Retraction efficiency assessment of standard
    model selection methods.

129
Suggested Reading
  • "Ockhams Razor, Truth, and Information", in
    Handbook of the Philosophy of Information, J. van
    Behthem and P. Adriaans, eds., to appear.
  • "Ockhams Razor, Empirical Complexity, and
    Truth-finding Efficiency", Theoretical Computer
    Science, 383 270-289, 2007.
  • Both available as pre-prints at
    www.hss.cmu.edu/philosophy/faculty-kelly.php