Computational Discovery of Communicable Knowledge

Slides: 21

Provided by: Lang8

Learn more at: http://www.isle.org

Transcript and Presenter's Notes

1
Robust Induction of Process Models from Time-Series Data
Pat Langley, Dileep George, Stephen Bay
Computational Learning Laboratory, Center for the Study of Language and Information, Stanford University, Stanford, CA
Kazumi Saito
NTT Communication Science Laboratories, Soraku, Kyoto, JAPAN
This research was funded in part by NTT Communication Science Laboratories and in part by Grant NCC 2-1220 from NASA Ames Research Center.
2
A Process Model for an Aquatic Ecosystem
model AquaticEcosystem
  variables phyto, zoo, nitro, residue
  observables phyto, nitro
  process phyto_exponential_decay
    equations d[phyto,t,1] = -0.307 * phyto
              d[residue,t,1] = 0.307 * phyto
  process zoo_exponential_decay
    equations d[zoo,t,1] = -0.251 * zoo
              d[residue,t,1] = 0.251 * zoo
  process zoo_phyto_predation
    equations d[zoo,t,1] = 0.615 * 0.495 * zoo
              d[residue,t,1] = 0.385 * 0.495 * zoo
              d[phyto,t,1] = -0.495 * zoo
  process nitro_uptake
    conditions nitro > 0
    equations d[phyto,t,1] = 0.411 * phyto
              d[nitro,t,1] = -0.098 * 0.411 * phyto
  process nitro_remineralization
    equations d[nitro,t,1] = 0.005 * residue
              d[residue,t,1] = -0.005 * residue
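To make the model's semantics concrete, here is a minimal Python sketch that simulates it with fixed-step Euler integration. It rests on several assumptions that the slide does not state: the d[x,t,1] terms are read as first derivatives with respect to time, contributions from all active processes are summed per variable, the truncated zoo decay term for residue is taken to be 0.251 * zoo by analogy with phyto decay, and the names PROCESSES, derivatives, and simulate, along with the step size and any initial values, are purely illustrative.

```python
# Each process is a (condition, equations) pair; each equation contributes
# one term to the first derivative of a variable (assumed semantics).
PROCESSES = [
    # phyto_exponential_decay
    (lambda s: True, {"phyto": lambda s: -0.307 * s["phyto"],
                      "residue": lambda s: 0.307 * s["phyto"]}),
    # zoo_exponential_decay (residue term assumed; the slide text is cut off)
    (lambda s: True, {"zoo": lambda s: -0.251 * s["zoo"],
                      "residue": lambda s: 0.251 * s["zoo"]}),
    # zoo_phyto_predation
    (lambda s: True, {"zoo": lambda s: 0.615 * 0.495 * s["zoo"],
                      "residue": lambda s: 0.385 * 0.495 * s["zoo"],
                      "phyto": lambda s: -0.495 * s["zoo"]}),
    # nitro_uptake, active only while nitro > 0
    (lambda s: s["nitro"] > 0, {"phyto": lambda s: 0.411 * s["phyto"],
                                "nitro": lambda s: -0.098 * 0.411 * s["phyto"]}),
    # nitro_remineralization
    (lambda s: True, {"nitro": lambda s: 0.005 * s["residue"],
                      "residue": lambda s: -0.005 * s["residue"]}),
]


def derivatives(state):
    """Sum the contributions of every process whose condition holds."""
    rates = {name: 0.0 for name in state}
    for condition, equations in PROCESSES:
        if condition(state):
            for variable, term in equations.items():
                rates[variable] += term(state)
    return rates


def simulate(initial, steps, dt=0.1):
    """Fixed-step Euler integration; returns the list of visited states."""
    state = dict(initial)
    trajectory = [dict(state)]
    for _ in range(steps):
        rates = derivatives(state)
        state = {name: state[name] + dt * rates[name] for name in state}
        trajectory.append(dict(state))
    return trajectory
```

Calling simulate with a dictionary of initial values for phyto, zoo, nitro, and residue yields a trajectory whose phyto and nitro columns correspond to the model's observables.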
3
Predictions from the Ecosystem Model
4
Advantages of Quantitative Process Models
Process models are a good target for discovery
systems because

they refer to notations and mechanisms familiar
to scientists
they embed quantitative relations within
qualitative structure
they provide dynamical predictions of changes
over time
they offer causal and explanatory accounts of
phenomena
while retaining the modularity needed to support
induction.

Quantitative process models provide an important
alternative to formalisms used currently in
machine learning and discovery.
5
Inductive Process Modeling
training data: Observed values for a set of continuous variables as they vary over time or situations
background knowledge: Generic processes that characterize causal relationships among variables in terms of conditional equations
Induction
learned model: A specific process model that explains the observed values and predicts future data accurately
6
Generic Processes as Background Knowledge
Our framework casts background knowledge as
generic processes that specify

the variables involved in a process and their
types
the parameters appearing in a process and their
ranges
the forms of conditions on the process and
the forms of associated equations and their
parameters.

Generic processes are building blocks from which
one can compose a specific quantitative process
model.
7
Generic Processes for Aquatic Ecosystems
generic process exponential_decay
  variables S{species}, D{detritus}
  parameters α [0, 1]
  equations d[S,t,1] = -1 * α * S
            d[D,t,1] = α * S
generic process remineralization
  variables N{nutrient}, D{detritus}
  parameters π [0, 1]
  equations d[N,t,1] = π * D
            d[D,t,1] = -1 * π * D
generic process predation
  variables S1{species}, S2{species}, D{detritus}
  parameters ρ [0, 1], γ [0, 1]
  equations d[S1,t,1] = γ * ρ * S1
            d[D,t,1] = (1 - γ) * ρ * S1
            d[S2,t,1] = -1 * ρ * S1
generic process constant_inflow
  variables N{nutrient}
  parameters ν [0, 1]
  equations d[N,t,1] = ν
generic process nutrient_uptake
  variables S{species}, N{nutrient}
  parameters τ [0, ∞], β [0, 1], μ [0, 1]
  conditions N > τ
  equations d[S,t,1] = μ * S
            d[N,t,1] = -1 * β * μ * S
8
Previous Results: The IPM Algorithm
Langley et al. (2002) reported IPM, an algorithm that constructs process models from generic components in four stages:
1. Find all ways to instantiate known generic processes with specific variables, subject to type constraints
2. Combine instantiated processes into candidate generic models, with limits on the total number of processes
3. For each generic model, carry out gradient descent search through parameter space to find good parameter values
4. Select the parameterized model that produces the lowest mean squared error on the training data.
We showed that IPM could induce accurate process
models from noisy time series, but it tended to
include extra processes.
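The first two stages are essentially combinatorial, and a short Python sketch can illustrate them. This is not the IPM implementation: the dictionaries GENERIC and VARIABLES, the functions instantiate and candidate_structures, and the rule that a variable may fill only one slot of a process are all assumptions made for illustration, and stages 3 and 4 (parameter fitting and model selection) are left out.

```python
from itertools import combinations, product

# Hypothetical, simplified generic processes: name -> required type per slot.
GENERIC = {
    "exponential_decay": ["species", "detritus"],
    "remineralization": ["nutrient", "detritus"],
    "constant_inflow": ["nutrient"],
}

# Hypothetical typed variables for the aquatic ecosystem domain.
VARIABLES = {
    "phyto": "species",
    "zoo": "species",
    "nitro": "nutrient",
    "residue": "detritus",
}


def instantiate(generic, variables):
    """Stage 1: bind every slot to each variable of the matching type."""
    instances = []
    for name, slot_types in generic.items():
        choices = [[v for v, t in variables.items() if t == slot_type]
                   for slot_type in slot_types]
        for binding in product(*choices):
            # Assumed rule: one variable cannot fill two slots of a process.
            if len(set(binding)) == len(binding):
                instances.append((name, binding))
    return instances


def candidate_structures(instances, max_processes):
    """Stage 2: every non-empty set of instances up to the size limit."""
    for size in range(1, max_processes + 1):
        yield from combinations(instances, size)
```

With these toy inputs, instantiate produces four process instances, and a limit of two processes gives ten candidate structures, which shows how quickly the search space grows as the limit is raised.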
9
The Revised IPM Algorithm
We have revised and extended the IPM algorithm so
that it now

Accepts as input those variables that can appear
in the induced model, both observable and
unobservable
Utilizes the parameter-fitting routine to
estimate initial values for unobservable
variables
Invokes the parameter-fitting method to induce
the thresholds on process conditions and
Selects the parameterized model with the lowest description length, M_d = (M_v + M_c) · log(n) + n · log(M_e).

We have evaluated the new system on synthetic and
natural data.
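As a rough illustration of selection by description length, the Python sketch below scores candidate models with M_d = (M_v + M_c) · log(n) + n · log(M_e). The slide does not define the symbols, so the reading used here is an assumption: M_v is the number of variables, M_c the number of parameters, n the number of training cases, and M_e the mean squared error, with natural logarithms. The function names and the candidate dictionaries are hypothetical.

```python
import math


def description_length(num_variables, num_parameters, num_cases, mse):
    """M_d = (M_v + M_c) * log(n) + n * log(M_e), under the assumed reading."""
    complexity = (num_variables + num_parameters) * math.log(num_cases)
    fit = num_cases * math.log(mse)
    return complexity + fit


def select_model(candidates, num_cases):
    """Return the candidate with the smallest description length."""
    return min(
        candidates,
        key=lambda c: description_length(
            c["variables"], c["parameters"], num_cases, c["mse"]
        ),
    )


# Made-up candidates: B fits slightly better but uses three more parameters.
candidates = [
    {"name": "A", "variables": 4, "parameters": 6, "mse": 2.0},
    {"name": "B", "variables": 4, "parameters": 9, "mse": 1.95},
]
best = select_model(candidates, num_cases=100)
```

In this example the small error reduction of B does not pay for its extra parameters, so A is selected, which is the behavior one would want from a criterion meant to discourage extra processes.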
10
Evaluation of the IPM Algorithm
To demonstrate IPM's ability to induce process
models, we ran it on synthetic data for a known
system
1. We used the aquatic ecosystem model to generate data sets over 100 time steps for the variables nitro and phyto
2. We replaced each true value x with x · (1 + r · n), where r followed a Gaussian distribution (μ = 0, σ = 1) and n > 0
3. We ran IPM on these noisy data, giving it type constraints and generic processes as background knowledge.
In two experiments, we let IPM determine the initial values and thresholds given the correct structure; in a third study, we let it search through a space of 256 generic model structures.
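The noise step can be sketched in a few lines of Python. The sketch assumes the corruption is multiplicative, x · (1 + r · n) with r drawn from a standard Gaussian and n the noise level; the function name add_noise and the optional rng argument are illustrative choices, not part of the original system.

```python
import random


def add_noise(values, noise_level, rng=None):
    """Replace each x with x * (1 + r * noise_level), where r ~ N(0, 1)."""
    if rng is None:
        rng = random.Random()
    return [x * (1 + rng.gauss(0.0, 1.0) * noise_level) for x in values]


# Example: corrupt a short, made-up series at a noise level of 0.05.
noisy = add_noise([10.0, 9.5, 9.1], 0.05, rng=random.Random(0))
```

Because the noise scales with x, larger true values receive proportionally larger perturbations. Passing a seeded random.Random instance makes a run reproducible.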
11
Experimental Results with IPM
The main results of our studies with IPM on
synthetic data were
1. The system infers accurate estimates for the initial values of unobservable variables like zoo and residue
2. The system induces estimates of condition thresholds on nitro that are close to the target values
3. The MDL criterion selects the correct model structure in all runs with 5% noise, but in only 40% of runs with 10% noise.
These suggest that the basic approach is sound,
but that we should consider other MDL schemes and
other responses to overfitting.
12
Results with Unobserved Initial Values
13
Electric Power on the International Space Station
14
Telemetry Data from Space Station Batteries
Predictor variables included the battery's current and temperature.
15
Induced Process Model for Battery Behavior
model Battery
  variables Rs, Vcb, soc, Vt, i, temperature
  observable soc, Vt, i, temperature
  process voltage_charge
    conditions i >= 0
    equations Vt = Vcb + 6.105 * Rs * i
  process voltage_discharge
    conditions i < 0
    equations Vt = Vcb * 1.0 / (Rs + 1.0)
  process charge_transfer
    equations d[soc,t,1] = i - Vcb/179.38
  process quadratic_influence_Vcb_soc
    equations Vcb = 41.32 * soc * soc
  process linear_influence_Vcb_temp
    equations Vcb = 0.2592 * temperature
  process linear_influence_Rs_soc
    equations Rs = 0.03894 * soc
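The algebraic part of this model can be evaluated directly, as in the Python sketch below. It depends on a reading of the slide that should be treated as an assumption: the two influences on Vcb are summed, the charge equation is Vt = Vcb + 6.105 * Rs * i for i >= 0, and the discharge equation is Vt = Vcb * 1.0 / (Rs + 1.0) for i < 0. The function name terminal_voltage is hypothetical, and the charge_transfer process is not simulated.

```python
def terminal_voltage(soc, temperature, current):
    """Evaluate Vt from the induced battery model's algebraic equations."""
    # Assumed: the quadratic soc influence and the linear temperature
    # influence on Vcb combine additively.
    vcb = 41.32 * soc * soc + 0.2592 * temperature
    rs = 0.03894 * soc
    if current >= 0:
        # voltage_charge process
        return vcb + 6.105 * rs * current
    # voltage_discharge process
    return vcb * 1.0 / (rs + 1.0)
```

The condition on the sign of the current selects exactly one of the two voltage processes, mirroring how conditions gate processes in the model listing.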
16
Results on Battery Test Data
17
Best Fit to Data on Protozoan Predation
18
Intellectual Influences
Our work on inductive process modeling
incorporates ideas from many traditions

computational scientific discovery (e.g., Langley
et al., 1983)
knowledge-based learning methods (e.g., ILP,
theory revision)
qualitative physics and simulation (e.g., Forbus,
1984)
scientific simulation environments (e.g., STELLA,
MATLAB)

However, the most similar research comes from
Todorovski and Dzeroski (1997) and from Bradley,
Easley, and Stolle (2001). Their approaches also
use knowledge to guide the induction of
differential equation models, though without a
process formalism.
19
Directions for Future Research
Despite our progress to date, we need further
work in order to