Title: Computational Discovery of Communicable Knowledge
1Computational Revision of Ecological Process
Models
Nima Asgharbeygi, Pat Langley, Stephen Bay
Center for the Study of Language and Information, Stanford University
Kevin Arrigo
Department of Geophysics, Stanford University
Thanks to S. Dzeroski, J. Sanchez, K. Saito, J.
Shrager, and L. Todorovski for their
contributions to this research, which is funded
by the US National Science Foundation.
2Data Mining vs. Scientific Discovery
There exist two computational paradigms for
discovering explicit knowledge from data. The
data mining movement develops computational
methods that
- induce predictive models from large (often business) data sets
- represent models in notations invented by AI researchers.
In contrast, computational scientific discovery
focuses on
- constructing models from (often small) scientific data sets
- stated in formalisms invented by scientists themselves.
This talk focuses on applications of the second
framework to environmental and ecosystem
modeling.
3Observations from the Ross Sea
4A Model of Ross Sea Ecosystem
model RossSeaEcosystem
  variables phyto, zoo, nitro, residue
  observables phyto, nitro
  d[phyto,t,1] = -0.307 * phyto - 0.495 * zoo + 0.411 * phyto
  d[zoo,t,1] = -0.251 * zoo + 0.615 * 0.495 * zoo
  d[residue,t,1] = 0.307 * phyto + 0.251 * zoo + 0.385 * 0.495 * zoo - 0.005 * residue
  d[nitro,t,1] = -0.098 * 0.411 * phyto + 0.005 * residue
5Inductive Revision of Ecosystem Models
[Diagram: observations and the initial model (the RossSeaEcosystem equations from the previous slide) feed a Revision step, which outputs a revised model.]
6A Space of Ecosystem Models
Model revision requires ways to constrain search
through this space.
7Phytoplankton Loss in Ross Sea Ecosystem
model RossSeaEcosystem
  variables phyto, zoo, nitro, residue
  observables phyto, nitro
  d[phyto,t,1] = -0.307 * phyto - 0.495 * zoo + 0.411 * phyto
  d[zoo,t,1] = -0.251 * zoo + 0.615 * 0.495 * zoo
  d[residue,t,1] = 0.307 * phyto + 0.251 * zoo + 0.385 * 0.495 * zoo - 0.005 * residue
  d[nitro,t,1] = -0.098 * 0.411 * phyto + 0.005 * residue
Phytoplankton loss is a process that affects two
variables (phyto and residue, through the
0.307 * phyto terms); no model should include one
influence without the other.
8Grazing in the Ross Sea Ecosystem
model RossSeaEcosystem
  variables phyto, zoo, nitro, residue
  observables phyto, nitro
  d[phyto,t,1] = -0.307 * phyto - 0.495 * zoo + 0.411 * phyto
  d[zoo,t,1] = -0.251 * zoo + 0.615 * 0.495 * zoo
  d[residue,t,1] = 0.307 * phyto + 0.251 * zoo + 0.385 * 0.495 * zoo - 0.005 * residue
  d[nitro,t,1] = -0.098 * 0.411 * phyto + 0.005 * residue
We can view an ecosystem model as a set of
processes that provide an alternative way to
encode its assumptions.
9Process Model of Ross Sea Ecosystem
model RossSeaEcosystem
  variables phyto, zoo, nitro, residue
  observables phyto, nitro
  process phyto_loss
    equations d[phyto,t,1] = -0.307 * phyto
              d[residue,t,1] = 0.307 * phyto
  process zoo_loss
    equations d[zoo,t,1] = -0.251 * zoo
              d[residue,t,1] = 0.251 * zoo
  process zoo_phyto_grazing
    equations d[zoo,t,1] = 0.615 * 0.495 * zoo
              d[residue,t,1] = 0.385 * 0.495 * zoo
              d[phyto,t,1] = -0.495 * zoo
  process nitro_uptake
    equations d[phyto,t,1] = 0.411 * phyto
              d[nitro,t,1] = -0.098 * 0.411 * phyto
  process nitro_remineralization
    equations d[nitro,t,1] = 0.005 * residue
              d[residue,t,1] = -0.005 * residue
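The process view and the equation view describe the same model: summing every process's contribution to a variable gives that variable's differential equation. The sketch below illustrates this in Python. The dictionary encoding, the function name derivatives, and the state values are assumptions made for illustration; the coefficients are read from the slide, with the operators lost in extraction taken to be minus and multiplication.

```python
from collections import defaultdict

# Each process maps a variable to the term it contributes to that
# variable's derivative, written as a function of the current state.
PROCESSES = {
    "phyto_loss": {
        "phyto": lambda s: -0.307 * s["phyto"],
        "residue": lambda s: 0.307 * s["phyto"],
    },
    "zoo_loss": {
        "zoo": lambda s: -0.251 * s["zoo"],
        "residue": lambda s: 0.251 * s["zoo"],
    },
    "zoo_phyto_grazing": {
        "zoo": lambda s: 0.615 * 0.495 * s["zoo"],
        "residue": lambda s: 0.385 * 0.495 * s["zoo"],
        "phyto": lambda s: -0.495 * s["zoo"],
    },
    "nitro_uptake": {
        "phyto": lambda s: 0.411 * s["phyto"],
        "nitro": lambda s: -0.098 * 0.411 * s["phyto"],
    },
    "nitro_remineralization": {
        "nitro": lambda s: 0.005 * s["residue"],
        "residue": lambda s: -0.005 * s["residue"],
    },
}


def derivatives(state, processes=PROCESSES):
    """Sum every process's contribution to each variable's derivative."""
    totals = defaultdict(float)
    for terms in processes.values():
        for variable, term in terms.items():
            totals[variable] += term(state)
    return dict(totals)


# Made-up state values, for illustration only.
state = {"phyto": 2.0, "zoo": 1.0, "nitro": 30.0, "residue": 0.5}
rates = derivatives(state)

# Deleting a process removes all of its influences at once, so the
# phyto and residue terms of phyto_loss can never be separated.
without_loss = {k: v for k, v in PROCESSES.items() if k != "phyto_loss"}
rates_without_loss = derivatives(state, without_loss)
```

Treating the process as the unit of deletion is what keeps a revision from retaining one influence of phytoplankton loss while dropping the other.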
10Inductive Revision of Process Models
[Diagram: the initial model (the RossSeaEcosystem equations) and a set of generic processes feed the Revision step.]
11Generic Processes for Aquatic Ecosystems
generic process exponential_loss
  variables S{species}, D{detritus}
  parameters α [0, 1]
  equations d[S,t,1] = -1 * α * S
            d[D,t,1] = α * S

generic process remineralization
  variables N{nutrient}, D{detritus}
  parameters π [0, 1]
  equations d[N,t,1] = π * D
            d[D,t,1] = -1 * π * D

generic process grazing
  variables S1{species}, S2{species}, D{detritus}
  parameters ρ [0, 1], γ [0, 1]
  equations d[S1,t,1] = γ * ρ * S1
            d[D,t,1] = (1 - γ) * ρ * S1
            d[S2,t,1] = -1 * ρ * S1

generic process constant_inflow
  variables N{nutrient}
  parameters ν [0, 1]
  equations d[N,t,1] = ν

generic process nutrient_uptake
  variables S{species}, N{nutrient}
  parameters τ [0, ∞], β [0, 1], μ [0, 1]
  conditions N > τ
  equations d[S,t,1] = μ * S
            d[N,t,1] = -1 * β * μ * S
12A Method for Process Model Revision
We have implemented RPM, an algorithm that
revises an initial process model in four main
stages:
1. Find all ways to instantiate available generic processes with specific variables, subject to type constraints.
2. Generate candidate model structures by deleting current processes and adding new ones, subject to complexity limits.
3. For each generic model structure, search through parameter space to find good coefficients (difficult).
4. Return a list of revised models ordered by their overall scores.
The evaluation metric can be squared error or
description length based on error and distance
from the initial model.
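The first two stages can be sketched as plain enumeration. The Python below is a minimal illustration, not the RPM implementation: the encodings GENERIC and VARIABLE_TYPES, the function names, and the limits max_changes and max_size are all assumptions chosen to make the idea concrete.

```python
from itertools import combinations, permutations

# Hypothetical, simplified encodings: each generic process lists the types
# of its variable slots, and each model variable has a single type.
GENERIC = {
    "exponential_loss": ("species", "detritus"),
    "grazing": ("species", "species", "detritus"),
    "remineralization": ("nutrient", "detritus"),
}
VARIABLE_TYPES = {"phyto": "species", "zoo": "species",
                  "nitro": "nutrient", "residue": "detritus"}


def instantiations(generic, variable_types):
    """Stage 1: bind distinct variables to slots whose types match."""
    found = []
    for name, slot_types in generic.items():
        for binding in permutations(variable_types, len(slot_types)):
            if all(variable_types[v] == t
                   for v, t in zip(binding, slot_types)):
                found.append((name, binding))
    return found


def candidate_structures(initial, available, max_changes=1, max_size=4):
    """Stage 2: delete and/or add processes within complexity limits."""
    initial = frozenset(initial)
    addable = [p for p in available if p not in initial]
    candidates = set()
    for n_del in range(max_changes + 1):
        for removed in combinations(sorted(initial), n_del):
            for n_add in range(max_changes - n_del + 1):
                for added in combinations(addable, n_add):
                    model = (initial - set(removed)) | set(added)
                    if model and len(model) <= max_size:
                        candidates.add(frozenset(model))
    return candidates


available = instantiations(GENERIC, VARIABLE_TYPES)
initial_model = [("exponential_loss", ("phyto", "residue"))]
structures = candidate_structures(initial_model, available)
```

Even this toy space grows quickly with the number of permitted changes, which is why the type constraints and complexity limits matter before any parameter fitting begins.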
13Observations from the Ross Sea
14Revised Model of Ross Sea Ecosystem
model RossSeaEcosystem
  variables phyto, zoo, nitro, residue, light, G, growth_rate, nitro_rate, light_rate
  observables phyto, nitro, light
  d[phyto,t,1] = -0.307 * phyto - G * zoo + growth_rate * phyto
  d[zoo,t,1] = 0.615 * G * zoo
  d[residue,t,1] = 0.307 * phyto + 0.385 * G * zoo - 0.083 * residue
  d[nitro,t,1] = -1 * n_to_c * growth_rate * phyto + 0.083 * n_to_c * residue
  G = 0.415 * (1 - exp(-1 * 0.27 * phyto))
  growth_rate = r_max * min(nitro_rate, light_rate)
  nitro_rate = nitro / (nitro + 4.33)
  light_rate = light / (light + 11.67)
  n_to_c = 0.251, r_max = 0.194, remin_rate = 0.0676
15Initial Results on Ross Sea Training Data
The best revised model reproduces the
observations quite well.
16Initial Results on Ross Sea Test Data
But the model predicts nearly the same behavior
for both years.
17Revised Results on Ross Sea Test Data
Refitting initial values for zooplankton gives
better generalization.
18Results on Data from Protist Study
19Results on Data from Ringkøbing Fjord
20Interfacing with Scientists
Because few scientists want to be replaced, we
are developing PROMETHEUS, an interactive
environment that lets users
- specify a quantitative process model of the target system
- display and edit the model's structure and details graphically
- simulate the model's behavior over time and situations
- compare the model's predicted behavior to observations
- invoke a revision module in response to detected anomalies.
The environment offers computational assistance
in forming and evaluating models but lets the
user retain control.
21Viewing and Editing a Process Model
22Intellectual Influences
Our approach to computational discovery
incorporates ideas from many traditions
- computational scientific discovery (e.g., Langley et al., 1983)
- theory revision in machine learning (e.g., Towell, 1991)
- qualitative physics and simulation (e.g., Forbus, 1984)
- languages for scientific simulation (e.g., STELLA, MATLAB)
- interactive tools for data analysis (e.g., Shneiderman, 2001).
Our work combines ideas from machine learning,
AI, programming languages, and human-computer
interaction.
23Directions for Future Research
Despite our progress to date, we need further
work in order to
- produce additional results on other ecosystem modeling tasks
- develop improved methods for fitting model parameters
- implement heuristic methods for searching the structure space
- utilize knowledge of subsystems to further constrain search
- augment the modeling environment to make it more usable.
Process modeling has great potential to aid model
development in environmental science.
24Contributions of the Research
In summary, our work on computational discovery
has produced
- a new formalism for representing scientific process models
- an encoding for background knowledge as generic processes
- an algorithm for revising process models with time-series data
- an interactive environment for model construction/utilization.
We have demonstrated this approach to model
revision on both ecosystem modeling and an
environmental domain. The PROMETHEUS
modeling/revision environment is available at
http://www.isle.org/process.html
25End of Presentation
26The Challenge of Systems Science
Disciplines like Earth science differ from
traditional disciplines by
- focusing on synthesis rather than analysis in their operation
- using computer modeling as one of their central methods
- developing system-level models with many variables / relations
- evaluating models on observational, not experimental, data.
Constructing such models is a complex task that
would benefit from computational aids, but
existing methods are insufficient.
27Why Are Process Models Interesting?
Process models are a crucial target for machine
learning because
- they incorporate scientific formalisms rather than AI notations
  - that are easily communicable to scientists and engineers
- they move beyond descriptive generalization to explanation
  - while retaining the modularity needed to support induction.
These reasons point to process models as an ideal
representation for scientific and engineering
knowledge. Process models are an important
alternative to formalisms used currently in
machine learning.
28Advantages of Quantitative Process Models
Process models offer scientists a promising
framework because
- they embed quantitative relations within qualitative structure
  - that refer to notations and mechanisms familiar to experts
- they provide dynamical predictions of changes over time
- they offer causal and explanatory accounts of phenomena
  - while retaining the modularity needed to support induction.
Quantitative process models provide an important
alternative to formalisms used currently in
ecosystem modeling.
29Inductive Process Modeling
Our response is to design, construct, and
evaluate computational methods for inductive
process modeling, which
- represent scientific models as sets of quantitative processes
- use these models to predict and explain observational data
- search a space of process models to find good candidates
- utilize background knowledge to constrain this search.
This framework has great potential to aid
environmental science, but it raises new
computational challenges.
30Challenges of Inductive Process Modeling
Process model induction differs from typical
learning tasks in that
- process models characterize behavior of dynamical systems
- variables are continuous but can have discontinuous behavior
- observations are not independently and identically distributed
- models may contain unobservable processes and variables
- multiple processes can interact to produce complex behavior.
Compensating factors include a focus on
deterministic systems and the availability of
background knowledge.
31Generating Predictions and Explanations
To utilize or evaluate a given process model, we
must simulate its behavior over time
- specify initial values for input variables and time step size
- on each time step, determine which processes are active
- solve active algebraic/differential equations with known values
- propagate values and recursively solve other active equations
- when multiple processes influence the same variable, assume their effects are additive.
This performance method makes specific
predictions that we can compare to observations.
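The simulation steps above can be sketched as a forward-Euler loop. This is a minimal Python illustration under stated assumptions: the dictionary layout of a process, the function name simulate, and the example values are hypothetical, and algebraic equations are solved in listed order rather than by the recursive propagation the slide describes.

```python
def simulate(processes, state, steps, dt=1.0):
    """Forward-Euler simulation of a process model.

    Each process is a dict with optional keys:
      "condition": state -> bool   (process is active when true)
      "algebraic": {var: state -> value}
      "rates":     {var: state -> contribution to d[var,t,1]}
    """
    trajectory = [dict(state)]
    for _ in range(steps):
        # Determine which processes are active on this step.
        active = [p for p in processes
                  if p.get("condition", lambda s: True)(state)]
        # Solve algebraic equations first, in listed order, so later
        # equations can use values computed by earlier ones.
        current = dict(state)
        for p in active:
            for var, f in p.get("algebraic", {}).items():
                current[var] = f(current)
        # Influences of several processes on one variable are summed.
        deltas = {}
        for p in active:
            for var, f in p.get("rates", {}).items():
                deltas[var] = deltas.get(var, 0.0) + f(current)
        state = dict(current)
        for var, d in deltas.items():
            state[var] = current[var] + dt * d
        trajectory.append(dict(state))
    return trajectory


# A made-up three-variable example: loss, nutrient-limited growth, uptake.
processes = [
    {"rates": {"phyto": lambda s: -0.1 * s["phyto"],
               "residue": lambda s: 0.1 * s["phyto"]}},
    {"algebraic": {"growth_rate":
                   lambda s: 0.23 * s["nitro"] / (s["nitro"] + 5)}},
    {"rates": {"phyto": lambda s: s["growth_rate"] * s["phyto"]}},
    {"condition": lambda s: s["nitro"] > 0,
     "rates": {"nitro": lambda s: -0.204 * s["growth_rate"] * s["phyto"]}},
]
trajectory = simulate(processes,
                      {"phyto": 1.0, "nitro": 5.0, "residue": 0.0},
                      steps=1)
```

The resulting trajectory is what gets compared against the observed time series when a candidate model is scored.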
32Generic Processes as Background Knowledge
Our framework casts background knowledge as
generic processes that specify
- the variables involved in a process and their types
- the parameters appearing in a process and their ranges
- the forms of conditions on the process and
- the forms of associated equations and their parameters.
Generic processes are building blocks from which
one can compose a specific process model.
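One way to picture a generic process is as a small record holding typed variable slots, parameter ranges, and equation templates. The Python sketch below is an illustration only: the class GenericProcess, its fields, the template syntax, and the parameter name alpha are assumptions rather than the authors' encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenericProcess:
    name: str
    variables: dict      # slot name -> required variable type
    parameters: dict     # parameter name -> (low, high) range
    equations: tuple     # equation templates with {slot} placeholders
    conditions: tuple = ()

    def instantiate(self, bindings, types, values):
        """Fill slots and parameters, enforcing types and ranges."""
        for slot, required in self.variables.items():
            if types[bindings[slot]] != required:
                raise ValueError(f"{bindings[slot]} is not a {required}")
        for param, (low, high) in self.parameters.items():
            if not low <= values[param] <= high:
                raise ValueError(f"{param} outside [{low}, {high}]")
        fill = {**bindings, **values}
        return tuple(eq.format(**fill) for eq in self.equations)


exponential_loss = GenericProcess(
    name="exponential_loss",
    variables={"S": "species", "D": "detritus"},
    parameters={"alpha": (0.0, 1.0)},
    equations=("d[{S},t,1] = -1 * {alpha} * {S}",
               "d[{D},t,1] = {alpha} * {S}"),
)

types = {"phyto": "species", "residue": "detritus"}
phyto_loss = exponential_loss.instantiate(
    bindings={"S": "phyto", "D": "residue"},
    types=types,
    values={"alpha": 0.307},
)
```

Binding the slots to phyto and residue with a value of 0.307 yields the two equations of a specific phyto_loss process, while a binding of the wrong type or an out-of-range value is rejected.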
33Estimating Parameters in Process Models
To estimate the parameters for each generic model
structure, the IPM algorithm
1. Selects random initial values that fall within the ranges specified in the generic processes.
2. Improves these parameters using the Levenberg-Marquardt method until it reaches a local optimum.
3. Generates new candidate values through random jumps along dimensions of the parameter vector and continues the search.
4. If no improvement occurs after N jumps, restarts the search from a new random initial point.
This multi-level method gives reasonable fits to
time-series data from a number of domains, but it
is computationally intensive.
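The control flow of this multi-level search can be sketched in a few lines of Python. To keep the example dependency-free, the local step below is a simple coordinate-wise hill climber standing in for Levenberg-Marquardt; that substitution, the function names, the restart count, and the toy decay model are all assumptions made for illustration.

```python
import random


def local_search(loss, params, bounds, step=0.1, shrink=0.5, tol=1e-6):
    """Coordinate-wise hill climbing; a stand-in for Levenberg-Marquardt."""
    params = list(params)
    best = loss(params)
    while step > tol:
        improved = False
        for i, (lo, hi) in enumerate(bounds):
            for delta in (step, -step):
                trial = list(params)
                trial[i] = min(hi, max(lo, trial[i] + delta))
                value = loss(trial)
                if value < best:
                    params, best, improved = trial, value, True
        if not improved:
            step *= shrink
    return params, best


def fit(loss, bounds, max_jumps=5, restarts=3, seed=0):
    rng = random.Random(seed)
    best_params, best = None, float("inf")
    for _ in range(restarts):
        # Step 1: random initial values within the declared ranges.
        start = [rng.uniform(lo, hi) for lo, hi in bounds]
        # Step 2: improve to a local optimum.
        params, value = local_search(loss, start, bounds)
        # Steps 3-4: random jumps along one dimension at a time; after
        # max_jumps consecutive failures, fall through to a restart.
        failures = 0
        while failures < max_jumps:
            jumped = list(params)
            i = rng.randrange(len(bounds))
            jumped[i] = rng.uniform(*bounds[i])
            cand, cand_value = local_search(loss, jumped, bounds)
            if cand_value < value - 1e-12:
                params, value, failures = cand, cand_value, 0
            else:
                failures += 1
        if value < best:
            best_params, best = params, value
    return best_params, best


# Toy data: exponential loss d[x,t,1] = -a * x with a = 0.3, x(0) = 1.
observed = []
x = 1.0
for _ in range(10):
    observed.append(x)
    x = x - 0.3 * x


def sse(params):
    """Squared error between a simulated trajectory and the toy data."""
    a, x, total = params[0], 1.0, 0.0
    for target in observed:
        total += (x - target) ** 2
        x = x - a * x
    return total


fitted, error = fit(sse, bounds=[(0.0, 1.0)])
```

Every loss evaluation requires a full simulation of the model, which is one reason this kind of search is computationally intensive.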
34A Process Model for an Aquatic Ecosystem
model Ross_Sea_Ecosystem
  variables phyto, nitro, residue, light, growth_rate, effective_light, ice_factor
  observables phyto, nitro, light, ice_factor
  process phyto_loss
    equations d[phyto,t,1] = -0.1 * phyto
              d[residue,t,1] = 0.1 * phyto
  process phyto_growth
    equations d[phyto,t,1] = growth_rate * phyto
  process phyto_uptakes_nitro
    conditions nitro > 0
    equations d[nitro,t,1] = -1 * 0.204 * growth_rate * phyto
  process growth_limitation
    equations growth_rate = 0.23 * min(nitrate_rate, light_rate)
  process nitrate_availability
    equations nitrate_rate = nitrate / (nitrate + 5)
  process light_availability
    equations light_rate = effective_light / (effective_light + 50)
  process light_attenuation
    equations effective_light = light * ice_factor
36Inductive Process Modeling
37The NPPc Portion of CASA
NPPc = Σ_month max(E * IPAR, 0)
E = 0.56 * T1 * T2 * W
T1 = 0.8 + 0.02 * Topt - 0.0005 * Topt^2
T2 = 1.18 / [(1 + e^(0.2 * (Topt - Tempc - 10))) * (1 + e^(0.3 * (Tempc - Topt - 10)))]
W = 0.5 + 0.5 * EET / PET
PET = 1.6 * (10 * Tempc / AHI)^A * PET-TW-M   if Tempc > 0
PET = 0                                       if Tempc < 0
A = 0.00000068 * AHI^3 - 0.000077 * AHI^2 + 0.018 * AHI + 0.49
IPAR = 0.5 * FPAR-FAS * Monthly-Solar * Sol-Conver
FPAR-FAS = min[(SR-FAS - 1.08) / SR(UMD-VEG), 0.95]
SR-FAS = -(Mon-FAS-NDVI + 1000) / (Mon-FAS-NDVI - 1000)
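The efficiency term E and the monthly sum can be written out as a short Python sketch. The operators in the slide were lost in extraction, so the signs used here follow the standard form of these CASA equations and should be read as an assumption; the function names and the dictionary keys for monthly inputs are hypothetical.

```python
import math


def t1(topt):
    """Temperature stress term T1 as a function of optimal temperature."""
    return 0.8 + 0.02 * topt - 0.0005 * topt ** 2


def t2(topt, tempc):
    """Temperature stress term T2; falls off as tempc departs from topt."""
    return 1.18 / ((1 + math.exp(0.2 * (topt - tempc - 10)))
                   * (1 + math.exp(0.3 * (tempc - topt - 10))))


def w(eet, pet):
    """Water stress term W; pet must be nonzero in this sketch."""
    return 0.5 + 0.5 * eet / pet


def efficiency(topt, tempc, eet, pet):
    """Photosynthetic efficiency E = 0.56 * T1 * T2 * W."""
    return 0.56 * t1(topt) * t2(topt, tempc) * w(eet, pet)


def nppc(months):
    """Sum max(E * IPAR, 0) over a list of monthly input records."""
    return sum(
        max(efficiency(m["topt"], m["tempc"], m["eet"], m["pet"])
            * m["ipar"], 0.0)
        for m in months
    )


# Made-up monthly inputs, for illustration only.
months = [
    {"topt": 20.0, "tempc": 20.0, "eet": 1.0, "pet": 2.0, "ipar": 100.0},
    {"topt": 20.0, "tempc": 5.0, "eet": 1.0, "pet": 2.0, "ipar": 0.0},
]
total = nppc(months)
```

Constants such as 0.56 and 1.18 and the exponents on T1, T2, and W are the kinds of quantities a revision of this model can adjust.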
38Results of Revising the NPP Model
Initial model:
  E = 0.56 * T1 * T2 * W
  T2 = 1.18 / [(1 + e^(0.2 * (Topt - Tempc - 10))) * (1 + e^(0.3 * (Tempc - Topt - 10)))]
  PET = 1.6 * (10 * Tempc / AHI)^A * PET-TW-M
  SR ∈ {3.06, 4.35, 4.35, 4.05, 5.09, 3.06, 4.05, 4.05, 4.05, 5.09, 4.05}
  RMSE on training data = 465.212 and r^2 = 0.799

Revised model:
  E = 0.353 * T1^0.00 * T2^0.08 * W^0.00
  T2 = 0.83 / [(1 + e^(1.0 * (Topt - Tempc - 6.34))) * (1 + e^(1.0 * (Tempc - Topt - 11.52)))]
  PET = 1.6 * (10 * Tempc / AHI)^A * PET-TW-M
  SR ∈ {0.61, 3.99, 2.44, 10.0, 2.21, 2.13, 2.04, 0.43, 1.35, 1.85, 1.61}
  Cross-validated RMSE = 397.306 and r^2 = 0.853 (15% reduction)
39Generic Processes for Photosynthesis Regulation
generic process translation
  variables P{protein}, M{mRNA}
  parameters α [0, 1]
  equations d[P,t,1] = α * M

generic process transcription
  variables M{mRNA}, R{rate}
  parameters
  equations d[M,t,1] = R

generic process regulate_one
  variables R{rate}, S{signal}
  parameters α [-1, 1]
  equations R = α * S

generic process regulate_two
  variables R{rate}, S{signal}
  parameters α [-1, 1], β [0, 1]
  equations R = α * S
            d[S,t,1] = -1 * β * S

generic process automatic_degradation
  variables C{concentration}
  conditions C > 0
  parameters α [0, 1]
  equations d[C,t,1] = -1 * α * C

generic process controlled_degradation
  variables D{concentration}, E{concentration}
  conditions D > 0, E > 0
  parameters α [0, 1]
  equations d[D,t,1] = -1 * α * E
            d[E,t,1] = -1 * α * E

generic process photosynthesis
  variables L{light}, P{protein}, R{redox}, S{ROS}
  parameters α [0, 1], β [0, 1]
  equations d[R,t,1] = α * L * P
            d[S,t,1] = β * L * P
40A Process Model for Photosynthetic Regulation
model photo_regulation
  variables light, mRNA, protein, ROS, redox, transcription_rate
  observables light, mRNA
  process photosynthesis
    equations d[redox,t,1] = 0.0155 * light * protein
              d[ROS,t,1] = 0.019 * light * protein
  process protein_translation
    equations d[protein,t,1] = 7.54 * mRNA
  process mRNA_transcription
    equations d[mRNA,t,1] = transcription_rate
  process regulate_one_1
    equations transcription_rate = 0.99 * light
  process regulate_two_2
    equations transcription_rate = 1.203 * redox
              d[redox,t,1] = -0.0002 * redox
  process automatic_degradation_1
    conditions protein > 0
    equations d[protein,t,1] = -1.91 * protein
  process controlled_degradation_1
    conditions redox > 0, ROS > 0
    equations d[redox,t,1] = -0.0003 * ROS
              d[ROS,t,1] = -0.0003 * ROS