Truth-conduciveness Without Reliability: A Non-Theological Explanation of Ockham's Razor

Transcript and Presenter's Notes

Title: Truth-conduciveness Without Reliability: A Non-Theological Explanation of Ockham's Razor


1
Truth-conduciveness Without Reliability: A
Non-Theological Explanation of Ockham's Razor
  • Kevin T. Kelly
  • Department of Philosophy
  • Carnegie Mellon University
  • www.cmu.edu

2
I. The Puzzle
3
Which Theory is True?
???
4
Ockham Says
Choose the Simplest!
5
But Why?
Gotcha!
6
Puzzle
  • An indicator must be sensitive to what it
    indicates.

simple
7
Puzzle
  • An indicator must be sensitive to what it
    indicates.

complex
8
Puzzle
  • But Ockham's razor always points at simplicity.

simple
9
Puzzle
  • But Ockham's razor always points at simplicity.

complex
10
Puzzle
  • If a broken compass is known to point North, then
    we already know where North is.

complex
11
Puzzle
  • But then who needs the compass?

complex
12
Proposed Answers
  • Evasive
  • Circular
  • Magical

13
A. Evasions
Truth
14
A. Evasions
Brevity
Testability
Unity
Explanation
Truth
15
Virtues
  • Simple theories have virtues
  • Testable
  • Unified
  • Explanatory
  • Symmetrical
  • Bold
  • Compress data

16
Virtues
  • Simple theories have virtues
  • Testable
  • Unified
  • Explanatory
  • Symmetrical
  • Bold
  • Compress data
  • But to assume that the truth has these virtues is
    wishful thinking.

van Fraassen
17
Convergence
  • At least a simplicity bias doesn't prevent
    convergence to the truth.

truth
Complexity
18
Convergence
  • At least a simplicity bias doesn't prevent
    convergence to the truth.

truth
Plink!
Blam!
Complexity
19
Convergence
  • At least a simplicity bias doesn't prevent
    convergence to the truth.

truth
Plink!
Blam!
Complexity
20
Convergence
  • At least a simplicity bias doesn't prevent
    convergence to the truth.

truth
Plink!
Blam!
Complexity
21
Convergence
  • Convergence allows for any theory choice whatever
    in the short run, so this is not an argument for
    Ockham's razor now.

truth
Alternative ranking
22
Overfitting
  • Empirical estimates based on complex models have
    greater expected distance from the truth.

Truth
23
Overfitting
  • Empirical estimates based on complex models have
    greater expected distance from the truth.

Pop! Pop! Pop! Pop!
24
Overfitting
  • Empirical estimates based on complex models have
    greater expected distance from the truth.

Truth
clamp
25
Overfitting
  • Empirical estimates based on complex models have
    greater expected distance from the truth.

Pop! Pop! Pop! Pop!
Truth
clamp
26
Overfitting
  • ...even if the simple theory is known to be false.

Four eyes!
clamp
27
B. Circles
28
Prior Probability
  • Assign high prior probability to simple theories.

Simplicity is plausible now because it was
yesterday.
29
Miracle Argument
  • e would not be a miracle given C
  • e would be a miracle given P.

θ
P
C
30
Miracle Argument
  • e would not be a miracle given C
  • e would be a miracle given P.

θ
C
S
31
However
  • e would not be a miracle given P(θ)

Why not this?
θ
C
S
32
The Real Miracle
Ignorance about model: p(C) ≈ p(P).
Ignorance about parameter setting: p(P(θ) | P) ≈ p(P(θ′) | P).
Knowledge about C vs. P(θ): p(P(θ)) << p(C).
Is it knognorance or ignoredge?
33
The Ellsberg Paradox
1/3
?
?
34
The Ellsberg Paradox
1/3
?
?
> 1/3
Human betting preferences
>
35
The Ellsberg Paradox
1/3
?
?
> 1/3
< 1/3
Human betting preferences
>
>
36
Human View
knowledge
ignorance
1/3
?
?
Human betting preferences
>
>
37
Bayesian View
ignoredge
ignoredge
1/3
1/3
1/3
Human betting preferences
>
>
38
Moral
1/3
?
?
Even in the most mundane contexts, when Bayesians
offer to replace our ignorance with ignoredge, we
vote with our feet.
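For readers who want the standard Ellsberg reasoning spelled out (it is only implicit in the "> 1/3" and "< 1/3" labels above): suppose a single prior assigns probability p to drawing black. Preferring the bet on red to the bet on black requires p < 1/3, while preferring the bet on black-or-yellow to the bet on red-or-yellow requires p + p(yellow) > 1/3 + p(yellow), that is, p > 1/3. No single probability assignment rationalizes both preferences, which is the sense in which people treat the unknown proportions as ignorance rather than "ignoredge".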
39
Probable Tracking
  • If the simple theory S were true, then the data
    would probably be simple, so Ockham's razor would
    probably believe S.
  • If the simple theory S were false, then the
    complex alternative theory C would be true, so
    the data would probably be complex so you would
    probably believe C rather than S.

40
Probable Tracking
Given that you use Ockham's razor:
p(B(S) | S) = p(e_S | S) ≈ 1.
p(not-B(S) | not-S) = 1 - p(e_S | C) ≈ 1.
41
Probable Tracking
Given that you use Ockham's razor:
p(B(C) | C) = 1 - the probability that the data look simple given C.
p(B(C) | not-C) = 0 + the probability that the data look simple given the alternative theory P.
42
C. Magic
Truth
Simplicity
43
Magic
  • Simplicity informs via hidden causes.

G
44
Magic
  • Simpler to explain Ockham's razor without hidden
    causes.

?
45
Reductio of Naturalism (Koons 2000)
  • Suppose that the crucial probabilities p(T_θ | T)
    in the Bayesian miracle argument are natural
    chances, so that Ockham's razor really is
    reliable.
  • Suppose that T is the fundamental theory of
    natural chance, so that T_θ determines the true p_θ
    for some choice of θ.
  • But if p_t(T_θ) is defined at all, it should be 1
    if t = θ and 0 otherwise.
  • So natural science can only produce fundamental
    knowledge of natural chance if there are
    non-natural chances.

46
Diagnosis
  • Indication or tracking
  • Too strong
  • Circles, evasions, or magic required.
  • Convergence
  • Too weak
  • Doesn't single out simplicity

Complex
Simple
Simple
Complex
47
Diagnosis
  • Indication or tracking
  • Too strong
  • Circles or magic required.
  • Convergence
  • Too weak
  • Doesn't single out simplicity
  • Straightest convergence
  • Just right?

Complex
Simple
Simple
Complex
Complex
Simple
48
II. Straightest Convergence
Complex
Simple
49
Empirical Problems
  • Set K of infinite input sequences.
  • Partition of K into alternative theories.

K
T1
T2
T3
50
Empirical Methods
  • Map finite input sequences to theories or to ?.

T3
K
T1
T2
T3
e
51
Method Choice
Output history
At each stage, scientist can choose a new method
(agreeing with past theory choices).
T1
T2
T3
e1
e2
e3
e4
Input history
52
Aim: Converge to the Truth
Output sequence: T3, ?, T2, ?, T1, T1, T1, T1, . . .
(K partitioned into T1, T2, T3)
53
Retraction
  • Choosing T and then not choosing T next

T
T
?
54
Aim: Eliminate Needless Retractions
Truth
55
Aim: Eliminate Needless Retractions
Truth
56
Aim: Eliminate Needless Delays to Retractions
theory
57
Aim: Eliminate Needless Delays to Retractions
application
theory
application
application
application
corollary
application
application
application
corollary
application
corollary
58
Easy Retraction Time Comparisons
Method 1: T1, T1, T2, T2, T2, T2, T4, T4, T4, . . .
Method 2: T1, T1, T2, T2, T3, T3, T2, T4, T4, . . .
(at least as many retractions, at least as late)
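A minimal sketch (not from the slides) of how such retraction-time comparisons can be computed; the function name, the list encoding of output sequences, and the use of '?' for suspension of judgment are illustrative assumptions.

    def retraction_times(outputs):
        # Stages (1-indexed) at which a method retracts: it outputs an answer
        # different from its previous non-'?' output.
        times, previous = [], None
        for stage, answer in enumerate(outputs, start=1):
            if answer != '?' and previous is not None and answer != previous:
                times.append(stage)
            if answer != '?':
                previous = answer
        return times

    method_1 = ['T1', 'T1', 'T2', 'T2', 'T2', 'T2', 'T4', 'T4', 'T4']
    method_2 = ['T1', 'T1', 'T2', 'T2', 'T3', 'T3', 'T2', 'T4', 'T4']
    print(retraction_times(method_1))  # [3, 7]
    print(retraction_times(method_2))  # [3, 5, 7, 8]: at least as many, at least as late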
59
Worst-case Retraction Time Bounds
(1, 2, 8)
Output sequences:
T1, T2, T3, T3, T3, T4, T3, . . .
T1, T2, T3, T3, T3, T4, T4, . . .
T1, T2, T3, T3, T4, T4, T4, . . .
T1, T2, T4, T3, T4, T4, T4, . . .
60
III. Ockham Without Circles, Evasions, or Magic
61
Curve Fitting
  • Data: open intervals around Y at rational values
    of X.

62
Curve Fitting
  • No effects

63
Curve Fitting
  • First-order effect

64
Curve Fitting
  • Second-order effect

65
Empirical Effects
66
Empirical Effects
67
Empirical Effects
May take arbitrarily long to discover
68
Empirical Effects
May take arbitrarily long to discover
69
Empirical Effects
May take arbitrarily long to discover
70
Empirical Effects
May take arbitrarily long to discover
71
Empirical Effects
May take arbitrarily long to discover
72
Empirical Effects
May take arbitrarily long to discover
73
Empirical Effects
May take arbitrarily long to discover
74
Empirical Theories
  • True theory determined by which effects appear.

75
Empirical Complexity
More complex
76
Background Constraints
More complex
77
Background Constraints
?
More complex
78
Background Constraints
?
More complex
79
Ockhams Razor
  • Don't select a theory unless it is uniquely
    simplest in light of experience.

80
Weak Ockhams Razor
  • Don't select a theory unless it is among the
    simplest in light of experience.

81
Stalwartness
  • Don't retract your answer while it is uniquely
    simplest.

82
Stalwartness
  • Don't retract your answer while it is uniquely
    simplest.
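A minimal sketch of a strongly Ockham, stalwart method in the effects setting above, under the simplifying assumption that there are no background constraints, so that the uniquely simplest theory compatible with the data posits exactly the effects seen so far; the names are illustrative.

    def ockham_stalwart(effects_seen):
        # With no background constraints, the theory positing exactly the effects
        # observed so far is the uniquely simplest one compatible with the data,
        # and it is retained until a new effect appears (stalwartness).
        return frozenset(effects_seen)

    stream = [set(), set(), {'a'}, {'a'}, {'a', 'b'}]
    print([sorted(ockham_stalwart(e)) for e in stream])
    # [[], [], ['a'], ['a'], ['a', 'b']]: retracts only when a new effect appears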

83
Uniform Problems
  • All paths of accumulating effects starting at a
    level have the same length.

84
Timed Retraction Bounds
  • r(M, e, n) = the least timed retraction bound
    covering the total timed retractions of M along
    input streams of complexity n that extend e.

M
. . .
. . .
Empirical Complexity
0
1
2
3
85
Efficiency of Method M at e
  • M converges to the truth no matter what.
  • For each convergent M′ that agrees with M up to
    the end of e, and for each n:
  • r(M, e, n) ≤ r(M′, e, n).

M
M
. . .
. . .
Empirical Complexity
0
1
2
3
86
M is Strongly Beaten at e
  • There exists a convergent M′ that agrees with M up
    to the end of e, such that
  • for each n, r(M, e, n) > r(M′, e, n).

M
M
. . .
. . .
Empirical Complexity
0
1
2
3
87
M is Weakly Beaten at e
  • There exists a convergent M′ that agrees with M up
    to the end of e, such that
  • for each n, r(M, e, n) ≥ r(M′, e, n), and
  • for some n, r(M, e, n) > r(M′, e, n).

M
M
. . .
. . .
Empirical Complexity
0
1
2
3
88
Idea
  • No matter what convergent M has done in the past,
    nature can force M to produce each answer down an
    arbitrary effect path, arbitrarily often.
  • Nature can also force violators of Ockham's razor
    or stalwartness either into an extra retraction
    or a late retraction in each complexity class.

89
Ockham Violation with Retraction
Extra retraction in each complexity class
Ockham violation
90
Ockham Violation without Retraction
Late retraction in each complexity class
Ockham violation
91
Uniform Ockham Efficiency Theorem
  • Let M be a solution to a uniform problem. The
    following are equivalent
  • M is strongly Ockham and stalwart at e
  • M is efficient at e
  • M is not strongly beaten at e.

92
Idea
  • Similar, but if convergent M already violates
    strong Ockham's razor by favoring an answer T at
    the root of a longer path, sticking with T may
    reduce retractions in complexity classes reached
    only along the longer path.

93
Violation Favoring Shorter Path
Non-uniform problem
?
Late or extra retraction in each complexity class
Ockham violation
94
Violation Favoring Longer Path without Retraction
Non-uniform problem
?
Ouch! Extra retraction in each complexity class!
Ockham violation
95
But at First Violation
Non-uniform problem
?
?
Breaks even each class.
?
First Ockham violation
96
But at First Violation
Non-uniform problem
?
?
Breaks even each class.
?
Loses in class 0 when truth is red.
First Ockham violation
97
Ockham Efficiency Theorem
  • Let M be a solution. The following are
    equivalent
  • M is always strongly Ockham and stalwart
  • M is always efficient
  • M is never weakly beaten.

98
Application Causal Inference
  • Causal graph theory: more correlations ⇒ more
    causes.
  • Idealized data: list of conditional dependencies
    discovered so far.
  • Anomaly: the addition of a conditional
    dependency to the list.

partial correlations
S
G(S)
99
Causal Path Rule
  • X, Y are dependent conditional on set S of
    variables not containing X, Y iff X, Y are
    connected by at least one path in which
  • no non-collider is in S and
  • each collider has a descendant in S.

X
Y
S
Pearl, SGS
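A minimal sketch of the path rule above on a small hypothetical graph; the edge set, the encoding of the DAG as parent-child pairs, and the helper names are illustrative assumptions (a collider counts as its own descendant here).

    from itertools import chain

    DAG = {('X', 'Y'), ('Z', 'Y'), ('Z', 'W')}   # hypothetical edges, parent -> child

    def descendants(node, dag):
        kids = {c for p, c in dag if p == node}
        return kids | set(chain.from_iterable(descendants(k, dag) for k in kids))

    def active_path(path, s, dag):
        # A path is active given S iff no non-collider on it is in S and every
        # collider is in S or has a descendant in S.
        for i in range(1, len(path) - 1):
            prev, node, nxt = path[i - 1], path[i], path[i + 1]
            collider = (prev, node) in dag and (nxt, node) in dag
            if collider:
                if node not in s and not (descendants(node, dag) & s):
                    return False
            elif node in s:
                return False
        return True

    def simple_paths(x, y, dag):
        adj = {}
        for p, c in dag:
            adj.setdefault(p, set()).add(c)
            adj.setdefault(c, set()).add(p)
        def walk(node, seen):
            if node == y:
                yield tuple(seen)
                return
            for nbr in adj.get(node, ()):
                if nbr not in seen:
                    yield from walk(nbr, seen + [nbr])
        yield from walk(x, [x])

    def dependent(x, y, s, dag):
        return any(active_path(p, s, dag) for p in simple_paths(x, y, dag))

    print(dependent('X', 'W', set(), DAG))   # False: the collider Y blocks the path
    print(dependent('X', 'W', {'Y'}, DAG))   # True: conditioning on the collider opens it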
100
Forcible Sequence of Models
X
Y
Z
W
101
Forcible Sequence of Models
X
Y
Z
W
X dep Y | {Z}, {W}, {Z,W}
102
Forcible Sequence of Models
X
Y
Z
W
X dep Y | {Z}, {W}, {Z,W}
Y dep Z | {X}, {W}, {X,W}
X dep Z | {Y}, {Y,W}
103
Forcible Sequence of Models
X
Y
Z
W
X dep Y | {Z}, {W}, {Z,W}
Y dep Z | {X}, {W}, {X,W}
X dep Z | {Y}, {W}, {Y,W}
104
Forcible Sequence of Models
X
Y
Z
W
X dep Y | {Z}, {W}, {Z,W}
Y dep Z | {X}, {W}, {X,W}
X dep Z | {Y}, {W}, {Y,W}
Z dep W | {X}, {Y}, {X,Y}
Y dep W | {Z}, {X,Z}
105
Forcible Sequence of Models
X
Y
Z
W
X dep Y | {Z}, {W}, {Z,W}
Y dep Z | {X}, {W}, {X,W}
X dep Z | {Y}, {W}, {Y,W}
Z dep W | {X}, {Y}, {X,Y}
Y dep W | {X}, {Z}, {X,Z}
106
Policy Prediction
  • Consistent policy estimator can be forced into
    retractions.
  • Failure of uniform consistency.
  • No non-trivial confidence interval.

(figure: alternative causal models over Y and Z)
Robins, Wasserman, Zhang
107
Moral
  • Not true model vs. prediction.
  • Issue: actual vs. counterfactual model selection
    and prediction.
  • In counterfactual prediction, the form of the model
    matters and retractions are unavoidable.

108
IV. Simplicity
109
Aim
  • General definition of simplicity.
  • Prove Ockham efficiency theorem for general
    definition.

110
Approach
  • Empirical complexity reflects nested problems of
    induction posed by the problem.
  • Hence, simplicity is problem-relative.

111
Empirical Problems
  • Set K of infinite input sequences.
  • Partition of K into alternative theories.

K
T1
T2
T3
112
Grove Systems
  • A sphere system for K is just a downward-nested
    sequence of subsets of K starting with K.

K
2
1
0
113
Grove Systems
  • Think of successive differences as levels of
    increasing empirical complexity in K.

2
1
0
114
Answer-preserving Grove Systems
  • No answer is split across levels.

2
1
0
115
Answer-preserving Grove Systems
  • Refine offending answer if necessary.

2
1
0
116
Data-driven Grove Systems
  • Each answer is decidable given a complexity
    level.
  • Each upward union of levels is verifiable.

Verifiable
Decidable
Decidable
117
Grove System Update
  • Update by restriction.

118
Grove System Update
  • Update by restriction

1
0
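A minimal sketch of update by restriction, with a Grove system represented as a downward-nested list of sets of worlds; the world names and the list encoding are illustrative assumptions.

    def update(grove_system, event):
        # Update by restriction: intersect every sphere with the data event.
        return [sphere & event for sphere in grove_system]

    K = {'w0', 'w1', 'w2', 'w3'}                # hypothetical worlds
    grove = [K, {'w1', 'w2', 'w3'}, {'w3'}]     # downward-nested spheres
    print(update(grove, {'w1', 'w3'}))          # spheres restricted to the event:
                                                # [{'w1','w3'}, {'w1','w3'}, {'w3'}]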
119
Forcible Grove Systems
  • At each stage, the data presented by a world at a
    level are compatible with the next level up (if
    there is a next level).

. . .
120
Forcible Path
  • A forcible restriction of a Grove system.

121
Forcible Path to Top
  • A forcible restriction of a Grove system that
    intersects with every level.

122
Simplicity Concept
  • A data-driven, answer-preserving Grove system for
    which each restriction to a possible data event
    has a forcible path to the top.

123
Uniform Simplicity Concepts
  • If a data event intersects a level, it intersects
    each higher level.

124
Uniform Ockham Efficiency Theorem
  • Let M be a solution to a uniform problem. The
    following are equivalent
  • M is strongly Ockham and stalwart at e
  • M is efficient at e
  • M is not strongly beaten at e.

125
Ockham Efficiency Theorem
  • Let M be a solution. The following are
    equivalent
  • M is always strongly Ockham and stalwart
  • M is always efficient
  • M is never weakly beaten.

126
V. Stochastic Ockham

127
Mixed Strategies
  • Require that the strategy converge in chance to
    the true model.

Chance of producing the true model at parameter θ
. . .
Sample size
128
Retractions in Chance
  • Total drop in chance of producing an arbitrary
    answer as sample size increases.
  • Retraction in signal, not actual retractions due
    to noise.

Chance of producing the true model at parameter θ
. . .
Sample size
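A minimal sketch of the notion above: the retractions in chance charged to one answer are the summed drops in the chance of producing it as sample size grows; the numbers are illustrative, not taken from the slides.

    def retractions_in_chance(chances):
        # Total drop in the chance of producing a given answer across sample sizes.
        return sum(max(0.0, a - b) for a, b in zip(chances, chances[1:]))

    chances = [0.9, 0.95, 0.45, 0.45, 0.5, 0.0]   # chance of outputting one model
    print(retractions_in_chance(chances))          # 0.5 + 0.5 = 1.0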
129
Ockham Efficiency
  • Bound retractions in chance by easy comparisons
    of time and magnitude.
  • Ockham efficiency still follows.

(0, 0, .5, 0, 0, 0, .5, 0, 0, )
Chance of producing the true model at parameter θ
. . .
Sample size
130
Classification Problems
  • Points from plane sampled IID, labeled with
    half-plane membership. Edge of half-plane is
    some polynomial. What is its degree?
  • Uniform Ockham efficiency theorem applies.

Cosma Shalizi
131
Model Selection Problems
  • Random variables.
  • IID sampling.
  • Joint distribution continuously parametrized.
  • Partition over parameter space.
  • Each partition cell is a model.
  • Method maps sample sequences to models.

132
Two Dimensional Example
  • Assume independent bivariate normal distribution
    of unit variance.
  • Question: how many components of the joint mean
    are zero?
  • Intuition: more nonzeros = more complex.
  • Puzzle: how does it help to favor simplicity in
    less-than-simplest worlds?

133
A Standard Model Selection Method
  • Bayes Information Criterion (BIC)
  • BIC(M, sample) =
  • - log(max prob that M can assign to sample)
  • + log(sample size) × model complexity × ½.
  • BIC method: choose M with least BIC score.

134
Official BIC Property
  • In the limit, minimizing BIC finds a model with
    maximal conditional probability when the prior
    probability is flat over models and fairly flat
    over parameters within a model.
  • But it is also mind-change-efficient.

135
Toy Problem
  • Truth is bivariate normal of known covariance.
  • Count non-zero components of mean vector.
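A minimal sketch of the BIC rule from the slide above applied to this toy problem (bivariate normal data, known identity covariance, models distinguished by which mean components may be nonzero); the use of numpy, the function names, and the sample values are illustrative assumptions.

    import numpy as np
    from itertools import combinations

    def bic_score(sample, free_components):
        # BIC(M, sample) = -log(max prob that M can assign to the sample)
        #                  + log(sample size) * model complexity * 1/2,
        # with complexity = number of free (possibly nonzero) mean components
        # and the covariance fixed at the known identity matrix.
        n, d = sample.shape
        free = list(free_components)
        mu = np.zeros(d)
        mu[free] = sample[:, free].mean(axis=0)          # maximum-likelihood mean
        resid = sample - mu
        max_loglik = -0.5 * n * d * np.log(2 * np.pi) - 0.5 * np.sum(resid ** 2)
        return -max_loglik + 0.5 * len(free) * np.log(n)

    def bic_choice(sample):
        d = sample.shape[1]
        models = [c for k in range(d + 1) for c in combinations(range(d), k)]
        return min(models, key=lambda m: bic_score(sample, m))

    rng = np.random.default_rng(0)
    sample = rng.normal(loc=[.05, .005], scale=1.0, size=(100, 2))
    print(bic_choice(sample))   # at n = 100 the simplest model, (), typically wins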

136
Pure Method
  • Acceptance zones for different answers in sample
    mean space.

Simple
Complex
137
Performance in Simplest World
  • n = 2
  • m = (0, 0).
  • Retractions: 0

Simple
Complex
95
138
Performance in Simplest World
  • n = 2
  • m = (0, 0).
  • Retractions: 0

Simple
Complex
139
Performance in Simplest World
  • n = 100
  • m = (0, 0).
  • Retractions: 0

Simple
Complex
140
Performance in Simplest World
  • n = 4,000,000
  • m = (0, 0).
  • Retractions: 0

Simple
Complex
141
Performance in Simplest World
  • n = 20,000,000
  • m = (0, 0).
  • Retractions: 0

Simple
Complex
142
Performance in Complex World
  • n = 2
  • m = (.05, .005).
  • Retractions: 0

Simple
Complex
95
143
Performance in Complex World
  • n = 100
  • m = (.05, .005).
  • Retractions: 0

Simple
Complex
144
Performance in Complex World
  • n = 30,000
  • m = (.05, .005).
  • Retractions: 1

Simple
Complex
145
Performance in Complex World
  • n = 4,000,000 (!)
  • m = (.05, .005).
  • Retractions: 2

Simple
Complex
146
Causal Inference from Stochastic Data
  • Suppose that the true linear causal model is

Variables are standard normal
.998
X
Y
Z
W
.1
.99
-.99
147
Causal Inference from Stochastic Data
Scheines, Mayo-Wilson, and Fancsali
Sample size 40. In 9 out of 10 samples, PC
algorithm outputs
X
Y
Z
W
Sample size 100,000. In 9 out of 10 samples, PC
outputs truth
Variables standard normal
X
Y
Z
W
148
Deterministic Sub-problems
Membership Degree 1
w
Membership degree 0
n
  • Worst-case cost at w =
    sup_w′ mem(w, w′) × cost(w′).
  • Worst-case cost = sup_w worst-case cost at w.

149
Statistical Sub-problems
Membership(p, p′) = 1 - r(p, p′)
p
p
p
  • Worst-case cost at p =
    sup_p′ mem(p, p′) × cost(p′).
  • Worst-case cost = sup_p worst-case cost at p.

150
Future Direction
  • α-Consistency: converge to production of the true
    answer with chance > 1 - α.
  • Compare worst-case timed bounds on retractions in
    chance of α-consistent methods over each
    complexity class.
  • Generalized power: minimizing retraction time
    forces simple acceptance zones to be powerful.
  • Generalized significance: minimizing retractions
    forces the simple zone to have size α.
  • Balance: the balance depends on α.

151
VI. Conclusion
152
Ockham's Razor
  • Necessary for staying on the straightest path to
    the truth
  • Does not point at or indicate the truth.
  • Works without circles, evasions, or magic.
  • Such a theory is motivated in counterfactual
    inference and estimation.

153
Further Reading
(with C. Glymour) "Why Probability Does Not Capture the Logic of Scientific
Justification," in C. Hitchcock, ed., Contemporary Debates in the Philosophy
of Science, Oxford: Blackwell, 2004.
"Justification as Truth-finding Efficiency: How Ockham's Razor Works," Minds
and Machines 14 (2004), pp. 485-505.
"Ockham's Razor, Efficiency, and the Unending Game of Science," forthcoming
in proceedings, Foundations of the Formal Sciences 2004: Infinite Game
Theory, Springer, under review.
"How Simplicity Helps You Find the Truth Without Pointing at It," forthcoming
in V. Harazinov, M. Friend, and N. Goethe, eds., Philosophy of Mathematics
and Induction, Dordrecht: Springer.
"Ockham's Razor, Empirical Complexity, and Truth-finding Efficiency,"
forthcoming, Theoretical Computer Science.
"Learning, Simplicity, Truth, and Misinformation," forthcoming in Van
Benthem, J., and Adriaans, P., eds., Philosophy of Information.
154
II. Navigation Without a Compass
155
Asking for Directions
Where's
156
Asking for Directions
Turn around. The freeway ramp is on the left.
157
Asking for Directions
158
Helpful Advice
159
Best Route
160
Best Route to Any Goal
161
Disregarding Advice is Bad
Extra U-turn
162
Best Route to Any Goal
so fixed advice can help you reach a hidden
goal without circles, evasions, or magic.
163
  • There is no difference whatsoever in It. He goes
    from death to death, who sees difference, as it
    were, in It Brihadaranyaka 4.4.19-20
  • "Living in the midst of ignorance and considering
    themselves intelligent and enlightened, the
    senseless people go round and round, following
    crooked courses, just like the blind led by the
    blind." Katha Upanishad I. ii. 5.

164
Academic
165
Academic
Poof!
If there weren't an apple on the table I wouldn't
be a brain in a vat, so I wouldn't see one.