Title: Computational Discovery of Communicable Knowledge
1. The Computational Discovery of Communicable Knowledge
Pat Langley
Computational Learning Laboratory
Center for the Study of Language and Information
Stanford University, Stanford, CA 94304
http://hypatia.stanford.edu/cll/
langley_at_csli.stanford.edu
Also affiliated with the DaimlerChrysler Research and Technology Center and the Institute for the Study of Learning and Expertise.
2. The Problem and the Potential
Our society is collecting increasing amounts of data in commercial and scientific domains. These include complex spatial and temporal data sets such as:
- traces of traffic behavior from GPS and cell phones
- prices of stocks and currencies from exchanges
- measurements of climate and ecosystem variables
Computational techniques should let us find relations in these data that are useful for business and society.
3. Drawbacks of Current Approaches
The fields of machine learning and data mining have developed methods to find regularities in data. Despite many successful applications, most techniques:
- assume attribute-value representations that cannot handle time or space
- cannot tell interesting discoveries from mundane ones
- state the discovered knowledge in some opaque form
This indicates the need for alternative methods that can address these issues.
4. Paradigms for Machine Learning
- decision-tree induction
- induction of logical rules
- case-based learning
- neural networks
- probabilistic induction
5. Paradigms for Scientific Discovery
- taxonomy formation
- qualitative law discovery
- equation discovery
- structural model construction
- process model formation
6. Discovering Numeric Laws
- Statement of the task
  - Given: quantitative measurements about objects or events in the world.
  - Find: numeric relations that hold among variables that describe these items and that predict future behavior.
- Historical examples
  - Kepler's three laws of planetary motion
  - Archimedes' principle of displacement in water
  - Black's law relating specific heat, mass, and temperature
  - Proust's and Gay-Lussac's laws of definite proportions
7. BACON on Kepler's Third Law
BACON carries out heuristic search through a space of numeric terms, looking for constant values and linear relations.
This example shows the system's progression from primitive variables (distance and period of Jupiter's moons) to a complex term that has a nearly constant value.
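The progression described above can be illustrated with a minimal Python sketch. It is not BACON itself: it assumes only two heuristics (form a ratio when two terms rise together, form a product when one rises as the other falls), and it represents each term as exponents (a, b) of D**a * P**b. The moon data are approximate values supplied for illustration, not taken from the slide, and all function names are hypothetical.

```python
# Approximate orbital data for Jupiter's Galilean moons (assumed values, not
# from the slide): distance D in thousands of km, period P in days.
MOONS = [(421.7, 1.769), (671.0, 3.551), (1070.4, 7.155), (1882.7, 16.689)]


def values(term):
    """Evaluate the term D**a * P**b on every observation."""
    a, b = term
    return [d ** a * p ** b for d, p in MOONS]


def is_constant(vals, tol=0.01):
    """True if every value lies within a relative tolerance of the mean."""
    mean = sum(vals) / len(vals)
    return all(abs(v - mean) <= tol * abs(mean) for v in vals)


def trend(xs, ys):
    """Return +1 if ys rises with xs, -1 if it falls, 0 if neither."""
    ordered = [y for _, y in sorted(zip(xs, ys))]
    diffs = [b - a for a, b in zip(ordered, ordered[1:])]
    if all(d > 0 for d in diffs):
        return 1
    if all(d < 0 for d in diffs):
        return -1
    return 0


def propose(terms):
    """Define one new term, trying the most recently defined pairs first."""
    for i in reversed(range(len(terms))):
        for j in reversed(range(i)):
            older, newer = terms[j], terms[i]
            direction = trend(values(older), values(newer))
            if direction > 0:    # rise together: consider their ratio
                new = (older[0] - newer[0], older[1] - newer[1])
            elif direction < 0:  # one rises as the other falls: consider product
                new = (older[0] + newer[0], older[1] + newer[1])
            else:
                continue
            if new != (0, 0) and new not in terms:
                return new
    return None


def bacon_search(max_steps=10):
    """Search for a term with a nearly constant value."""
    terms = [(1, 0), (0, 1)]  # primitive variables D and P
    for _ in range(max_steps):
        for term in terms:
            if is_constant(values(term)):
                return term, terms
        new = propose(terms)
        if new is None:
            break
        terms.append(new)
    return None, terms


law, history = bacon_search()
print(law)      # exponents (a, b) of the constant term D**a * P**b
print(history)  # terms in the order they were defined
```

On these data the sketch defines D/P, then D^2/P, then D^3/P^2, and stops because the last term varies by well under one percent across the four moons.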
8. Some Laws Discovered by BACON
- Basic numeric relations
  - Ideal gas law: $PV = aNT + bN$
  - Kepler's third law: $D^3 [(A - k)/t]^2 = j$
  - Coulomb's law: $F D^2 / (Q_1 Q_2) = c$
  - Ohm's law: $T D^2 / (LI - rI) = r$
- Relations with intrinsic properties
  - Snell's law of refraction: $\sin I / \sin R = n_1 / n_2$
  - Archimedes' law: $C = V + i$
  - Momentum conservation: $m_1 V_1 = m_2 V_2$
  - Black's specific heat law: $c_1 m_1 T_1 + c_2 m_2 T_2 = (c_1 m_1 + c_2 m_2) T_f$
9. Temporal Laws of Ecological Behavior (Todorovski & Dzeroski, 1997)
Input:
time     phyt     zoo     phosp     temp
time_1   phyt_1   zoo_1   phosp_1   temp_1
time_2   phyt_2   zoo_2   phosp_2   temp_2
...      ...      ...     ...       ...
time_m   phyt_m   zoo_m   phosp_m   temp_m
Input: a context-free grammar of domain constraints
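To show how a context-free grammar can constrain the space of candidate equations, here is a minimal Python sketch that enumerates right-hand sides from a toy grammar. The grammar is hypothetical (the slide does not show the one actually used), `const` stands for a numeric parameter that would later be fitted to the data, and the fitting step is omitted.

```python
from itertools import product

# Hypothetical toy grammar over the measured variables; not from the slide.
# Nonterminals are the dictionary keys; every other symbol is a terminal.
GRAMMAR = {
    "E": [["T"], ["T", "+", "E"]],
    "T": [["const", "*", "V"], ["const", "*", "V", "*", "V"]],
    "V": [["phyt"], ["zoo"], ["phosp"], ["temp"]],
}


def expand(symbol, depth):
    """Yield every token list derivable from symbol within depth rule levels."""
    if symbol not in GRAMMAR:  # terminal symbol
        yield [symbol]
        return
    if depth == 0:             # depth budget exhausted
        return
    for rule in GRAMMAR[symbol]:
        parts = [list(expand(s, depth - 1)) for s in rule]
        for combo in product(*parts):
            yield [tok for part in combo for tok in part]


candidates = [" ".join(tokens) for tokens in expand("E", 3)]
print(len(candidates))
print(candidates[:3])
```

Raising the depth limit admits sums of terms, so the grammar and the depth bound together determine which equation structures the search will consider.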
10. Formulating Structural Models
- Statement of the task
  - Given: qualitative or numeric empirical laws that describe observed phenomena.
  - Find: explanatory models of these phenomena in terms of component objects and their relations.
- Historical examples
  - Dalton's and Avogadro's molecular models of chemicals
  - Mendel's genetic model of inherited traits
  - Quark models of elementary particles
  - Structural models of planets, comets, and stars
11. DALTON on Chemical Reactions
Initial state:
(reacts in hydrogen oxygen out water)
(reacts in hydrogen nitrogen out ammonia)
(reacts in oxygen nitrogen out nitrous oxide)
. . .
Final state:
2 hydrogen + 1 oxygen → 2 water
3 hydrogen + 1 nitrogen → 2 ammonia
2 oxygen + 1 nitrogen → 2 nitrous oxide
hydrogen → h h
oxygen → o o
nitrogen → n n
water → h h o
ammonia → h h h n
nitrous oxide → n o o
. . .
DALTON finds these structural models through a depth-first search process constrained by conservation assumptions.
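A depth-first search constrained by conservation can be sketched in a few lines of Python. This is a simplification rather than DALTON's actual procedure: it takes the reaction coefficients from the final state as given and searches only over the number of atoms in each elementary molecule, rejecting any choice that would leave a product with a fractional atom count. All names are hypothetical.

```python
# Reactions from the final state: input coefficients and (product, coefficient).
REACTIONS = [
    ({"hydrogen": 2, "oxygen": 1}, ("water", 2)),
    ({"hydrogen": 3, "nitrogen": 1}, ("ammonia", 2)),
    ({"oxygen": 2, "nitrogen": 1}, ("nitrous oxide", 2)),
]
ATOM = {"hydrogen": "h", "oxygen": "o", "nitrogen": "n"}


def product_formula(inputs, n_out, sizes):
    """Atoms per product molecule implied by conservation, or None if uneven."""
    formula = {}
    for element, coeff in inputs.items():
        total = coeff * sizes[element]
        if total % n_out:
            return None  # atoms cannot be split evenly among product molecules
        formula[ATOM[element]] = total // n_out
    return formula


def dfs(elements, sizes, max_size=4):
    """Assign atoms-per-molecule to each element depth first, pruning early."""
    for inputs, (_, n_out) in REACTIONS:
        if all(e in sizes for e in inputs):
            if product_formula(inputs, n_out, sizes) is None:
                return None  # conservation violated: backtrack
    if not elements:
        models = {e: {ATOM[e]: k} for e, k in sizes.items()}
        for inputs, (name, n_out) in REACTIONS:
            models[name] = product_formula(inputs, n_out, sizes)
        return models
    first, rest = elements[0], elements[1:]
    for k in range(1, max_size + 1):  # try the simplest molecules first
        result = dfs(rest, {**sizes, first: k}, max_size)
        if result is not None:
            return result
    return None


models = dfs(list(ATOM), {})
for substance, formula in models.items():
    print(substance, formula)
```

The search rejects monatomic hydrogen because three hydrogen atoms cannot be divided between two ammonia molecules, and it settles on diatomic hydrogen, oxygen, and nitrogen.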
12. Constructing Process Models
- Statement of the task
  - Given: qualitative or numeric empirical laws that describe temporal phenomena.
  - Find: explanatory models of these phenomena in terms of processes among component objects.
- Historical examples
  - Caloric and kinetic theories of heat phenomena
  - Reaction pathways in chemistry and nucleosynthesis
  - Models of continental drift and plate tectonics
  - Process models of stellar evolution and destruction
13. ASTRA on Nucleosynthesis
Inputs:
- quantum properties for elements and isotopes
- conservation relations among these properties
- an element to be explained (e.g., O or C)
- elements to be assumed (e.g., H or He)
Outputs:
- elementary reactions that obey conservation laws
- reaction pathways that explain the element's evolution
ASTRA uses depth-first search to find reaction pathways for:
- proton and neutron captures
- neutron and deuteron production
- generation of helium (He) from hydrogen (H)
- generation of carbon (C) and oxygen (O)
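The inputs and outputs above suggest the following minimal Python sketch of a depth-first search for a reaction pathway. It is an illustration under strong assumptions, not ASTRA's implementation: mass number and charge stand in for the quantum properties, only two-body fusions with a single product are generated, and the species table is a small hand-picked set.

```python
# Species as (mass number A, charge Z); a small set assumed for illustration.
SPECIES = {"4He": (4, 2), "8Be": (8, 4), "12C": (12, 6)}


def fusions(available):
    """Yield reactions a + b -> c that conserve mass number and charge."""
    names = sorted(available)
    for i, a in enumerate(names):
        for b in names[i:]:
            total = (SPECIES[a][0] + SPECIES[b][0],
                     SPECIES[a][1] + SPECIES[b][1])
            for c, props in SPECIES.items():
                if props == total and c not in available:
                    yield (a, b, c)


def find_pathway(available, target, depth=4):
    """Depth-first search for reactions leading from available to target."""
    if target in available:
        return []
    if depth == 0:
        return None
    for a, b, c in fusions(available):
        rest = find_pathway(available | {c}, target, depth - 1)
        if rest is not None:
            return [(a, b, c)] + rest
    return None


pathway = find_pathway(frozenset({"4He"}), "12C")
for a, b, c in pathway:
    print(f"{a} + {b} -> {c}")
```

With helium assumed and carbon to be explained, the sketch finds a two-step pathway through 8Be. A fuller search would enlarge the species table and allow reactions with more than one product.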
14. Three Pathways for Carbon Synthesis
Standard pathway:
4He + 4He → 8Be
4He + 8Be → 12C
Alternative pathways:
4He + D → 6Li
3He + 6Li → 9Be
4He + 9Be → 12C + n

4He + D → 6Li
4He + 6Li → 10Be
4He + 10Be → 12C + D
ASTRA generates many pathways novel to astrophysics, some of which have viable reaction rates.
15. Proposed Research
We plan to develop and evaluate discovery methods that:
- are designed to process temporal and structured data
- use techniques from computational scientific discovery
- describe new knowledge in a communicable form
Likely notations for the discovered knowledge include:
- structural models of relations among entities
- process models of change over time
- sets of simultaneous differential equations
We will apply our methods to domains that benefit from such communicable representations.
16. Benefits of the Approach
Unlike most previous work on data mining and knowledge discovery, our methods will:
- support discoveries in domains that involve complex spatial, temporal, or relational data
- use domain knowledge to retain only discoveries that are interesting and novel to the domain user
- present the new knowledge in an understandable notation that can be communicated among humans
Such techniques will improve the way we manipulate and understand complex data.