Computational Discovery of Communicable Knowledge

Computational Revision of Ecological Process Models. Nima Asgharbeygi, Pat Langley, Stephen Bay, Center for the Study of Language and Information

Learn more at: http://www.isle.org
Transcript and Presenter's Notes

Title: Computational Discovery of Communicable Knowledge


1
Computational Revision of Ecological Process
Models
Nima Asgharbeygi, Pat Langley, Stephen Bay
Center for the Study of Language and Information,
Stanford University
Kevin Arrigo
Department of Geophysics, Stanford University
Thanks to S. Dzeroski, J. Sanchez, K. Saito, J.
Shrager, and L. Todorovski for their
contributions to this research, which is funded
by the US National Science Foundation.
2
Data Mining vs. Scientific Discovery
There exist two computational paradigms for
discovering explicit knowledge from data. The
data mining movement develops computational
methods that
  • induce predictive models from large (often
    business) data sets
  • represent models in notations invented by AI
    researchers.

In contrast, computational scientific discovery
focuses on
  • constructing models from (often small) scientific
    data sets
  • stated in formalisms invented by scientists
    themselves.

This talk focuses on applications of the second
framework to environmental and ecosystem
modeling.
3
Observations from the Ross Sea
4
A Model of Ross Sea Ecosystem
model RossSeaEcosystem
  variables: phyto, zoo, nitro, residue
  observables: phyto, nitro
  d[phyto,t,1]   = -0.307 * phyto - 0.495 * zoo + 0.411 * phyto
  d[zoo,t,1]     = -0.251 * zoo + 0.615 * 0.495 * zoo
  d[residue,t,1] = 0.307 * phyto + 0.251 * zoo + 0.385 * 0.495 * zoo - 0.005 * residue
  d[nitro,t,1]   = -0.098 * 0.411 * phyto + 0.005 * residue
5
Inductive Revision of Ecosystem Models
[Diagram: observations and an initial model enter a
Revision step, which yields a revised model. The
initial and revised model boxes both show the
RossSeaEcosystem model from slide 4.]
6
A Space of Ecosystem Models
Model revision requires ways to constrain search
through this space.
7
Phytoplankton Loss in Ross Sea Ecosystem
model RossSeaEcosystem
  variables: phyto, zoo, nitro, residue
  observables: phyto, nitro
  d[phyto,t,1]   = -0.307 * phyto - 0.495 * zoo + 0.411 * phyto
  d[zoo,t,1]     = -0.251 * zoo + 0.615 * 0.495 * zoo
  d[residue,t,1] = 0.307 * phyto + 0.251 * zoo + 0.385 * 0.495 * zoo - 0.005 * residue
  d[nitro,t,1]   = -0.098 * 0.411 * phyto + 0.005 * residue
Phytoplankton loss is a process that affects two
variables; no model should include one influence
without the other.
8
Grazing in the Ross Sea Ecosystem
model RossSeaEcosystem
  variables: phyto, zoo, nitro, residue
  observables: phyto, nitro
  d[phyto,t,1]   = -0.307 * phyto - 0.495 * zoo + 0.411 * phyto
  d[zoo,t,1]     = -0.251 * zoo + 0.615 * 0.495 * zoo
  d[residue,t,1] = 0.307 * phyto + 0.251 * zoo + 0.385 * 0.495 * zoo - 0.005 * residue
  d[nitro,t,1]   = -0.098 * 0.411 * phyto + 0.005 * residue
We can view an ecosystem model as a set of
processes that provide an alternative way to
encode its assumptions.
9
Process Model of Ross Sea Ecosystem
model RossSeaEcosystem
  variables: phyto, zoo, nitro, residue
  observables: phyto, nitro

  process phyto_loss
    equations: d[phyto,t,1] = -0.307 * phyto
               d[residue,t,1] = 0.307 * phyto
  process zoo_loss
    equations: d[zoo,t,1] = -0.251 * zoo
               d[residue,t,1] = 0.251 * zoo
  process zoo_phyto_grazing
    equations: d[zoo,t,1] = 0.615 * 0.495 * zoo
               d[residue,t,1] = 0.385 * 0.495 * zoo
               d[phyto,t,1] = -0.495 * zoo
  process nitro_uptake
    equations: d[phyto,t,1] = 0.411 * phyto
               d[nitro,t,1] = -0.098 * 0.411 * phyto
  process nitro_remineralization
    equations: d[nitro,t,1] = 0.005 * residue
               d[residue,t,1] = -0.005 * residue
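The following Python sketch illustrates how a set of processes encodes the same assumptions as the flat equations on slide 4. It is not the authors' code. Coefficients are copied from this slide; where the transcript dropped operators, the signs are assumed to follow the usual reading (losses negative, gains positive). The names PROCESSES, combine, and derivatives are hypothetical.

```python
from collections import defaultdict

# Each process lists (target, coefficient, source) terms: the term adds
# coefficient * source to d[target,t,1]. Signs are an assumed reading.
PROCESSES = {
    "phyto_loss": [("phyto", -0.307, "phyto"), ("residue", 0.307, "phyto")],
    "zoo_loss": [("zoo", -0.251, "zoo"), ("residue", 0.251, "zoo")],
    "zoo_phyto_grazing": [
        ("zoo", 0.615 * 0.495, "zoo"),
        ("residue", 0.385 * 0.495, "zoo"),
        ("phyto", -0.495, "zoo"),
    ],
    "nitro_uptake": [
        ("phyto", 0.411, "phyto"),
        ("nitro", -0.098 * 0.411, "phyto"),
    ],
    "nitro_remineralization": [
        ("nitro", 0.005, "residue"),
        ("residue", -0.005, "residue"),
    ],
}


def combine(processes):
    """Sum the terms of all processes into one linear equation per variable."""
    equations = defaultdict(lambda: defaultdict(float))
    for terms in processes.values():
        for target, coeff, source in terms:
            equations[target][source] += coeff
    return {target: dict(terms) for target, terms in equations.items()}


def derivatives(equations, state):
    """Evaluate d[v,t,1] for every variable v at the given state."""
    return {
        target: sum(coeff * state[source] for source, coeff in terms.items())
        for target, terms in equations.items()
    }


EQUATIONS = combine(PROCESSES)
```

Dropping a whole process, for example `combine({k: v for k, v in PROCESSES.items() if k != "phyto_loss"})`, removes both of its influences at once, which is the constraint stated on slide 7.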
10
Inductive Revision of Process Models
[Diagram: an initial model (the RossSeaEcosystem
model from slide 4) and generic processes enter a
Revision step.]
11
Generic Processes for Aquatic Ecosystems
generic process exponential_loss
  variables: S{species}, D{detritus}
  parameters: α [0, 1]
  equations: d[S,t,1] = -1 * α * S
             d[D,t,1] = α * S

generic process remineralization
  variables: N{nutrient}, D{detritus}
  parameters: π [0, 1]
  equations: d[N,t,1] = π * D
             d[D,t,1] = -1 * π * D

generic process grazing
  variables: S1{species}, S2{species}, D{detritus}
  parameters: ρ [0, 1], γ [0, 1]
  equations: d[S1,t,1] = γ * ρ * S1
             d[D,t,1] = (1 - γ) * ρ * S1
             d[S2,t,1] = -1 * ρ * S1

generic process constant_inflow
  variables: N{nutrient}
  parameters: ν [0, 1]
  equations: d[N,t,1] = ν

generic process nutrient_uptake
  variables: S{species}, N{nutrient}
  parameters: τ [0, ∞], β [0, 1], μ [0, 1]
  conditions: N > τ
  equations: d[S,t,1] = μ * S
             d[N,t,1] = -1 * β * μ * S
12
A Method for Process Model Revision
We have implemented RPM, an algorithm that
revises an initial process model in four main
stages
1. Find all ways to instantiate available generic
   processes with specific variables, subject to
   type constraints
2. Generate candidate model structures by deleting
   the current processes and adding new ones,
   subject to complexity limits
3. For each generic model, carry out search through
   parameter space to find good coefficients
   (difficult)
4. Return a list of revised models ordered by their
   overall scores.
The evaluation metric can be squared error or
description length based on error and distance
from the initial model.
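As a rough illustration of stage 2, the Python sketch below enumerates candidate structures by deleting some current processes and adding some new ones. It is not the RPM implementation: the limits max_deletions, max_additions, and max_size are assumed stand-ins for the slide's "complexity limits", and the names in ADDABLE are hypothetical placeholders for instantiated generic processes.

```python
from itertools import combinations


def candidate_structures(initial, addable,
                         max_deletions=1, max_additions=1, max_size=6):
    """Yield revised structures as frozensets of process names."""
    initial = frozenset(initial)
    extra = sorted(set(addable) - initial)
    seen = set()
    for n_del in range(max_deletions + 1):
        for removed in combinations(sorted(initial), n_del):
            kept = initial - set(removed)
            for n_add in range(max_additions + 1):
                for added in combinations(extra, n_add):
                    structure = kept | set(added)
                    too_big = len(structure) > max_size
                    if structure and not too_big and structure not in seen:
                        seen.add(structure)
                        yield structure


# Process names of the initial Ross Sea model (slide 9).
INITIAL = ["phyto_loss", "zoo_loss", "zoo_phyto_grazing",
           "nitro_uptake", "nitro_remineralization"]
# Hypothetical instantiated processes that revision may add.
ADDABLE = ["candidate_A", "candidate_B"]

STRUCTURES = list(candidate_structures(INITIAL, ADDABLE))
```

Each structure would then go to parameter search (stage 3) and be scored (stage 4). The unchanged initial structure is included so that it competes with its revisions.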
13
Observations from the Ross Sea
14
Revised Model of Ross Sea Ecosystem
model RossSeaEcosystem
  variables: phyto, zoo, nitro, residue, light, G, growth_rate, nitro_rate, light_rate
  observables: phyto, nitro, light
  d[phyto,t,1]   = -0.307 * phyto - G * zoo + growth_rate * phyto
  d[zoo,t,1]     = 0.615 * G * zoo
  d[residue,t,1] = 0.307 * phyto + 0.385 * G * zoo - 0.083 * residue
  d[nitro,t,1]   = -1 * n_to_c * growth_rate * phyto + 0.083 * n_to_c * residue
  G           = 0.415 * (1 - exp(-1 * 0.27 * phyto))
  growth_rate = r_max * min(nitro_rate, light_rate)
  nitro_rate  = nitro / (nitro + 4.33)
  light_rate  = light / (light + 11.67)
  n_to_c = 0.251, r_max = 0.194, remin_rate = 0.0676
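The algebraic terms of the revised model can be evaluated directly, as in the Python sketch below. The constants come from this slide. The operators are an assumed reading: saturating forms x / (x + k) for the two limitation rates, and an Ivlev-style form 1 - exp(-0.27 * phyto) for G. The function and argument names are hypothetical.

```python
import math


def saturating(x, half_saturation):
    """Rises from 0 toward 1 and equals 0.5 when x == half_saturation."""
    return x / (x + half_saturation)


def growth_rate(nitro, light, r_max=0.194, k_nitro=4.33, k_light=11.67):
    """Growth is set by whichever resource is more limiting."""
    return r_max * min(saturating(nitro, k_nitro), saturating(light, k_light))


def grazing_rate(phyto, g_max=0.415, delta=0.27):
    """Assumed Ivlev-style grazing term G, saturating at g_max."""
    return g_max * (1.0 - math.exp(-delta * phyto))
```

With abundant light, `growth_rate(4.33, 1e9)` comes out near half of r_max, since nitro then sits at its half-saturation value.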
15
Initial Results on Ross Sea Training Data
The best revised model reproduces the
observations quite well.
16
Initial Results on Ross Sea Test Data
But the model predicts nearly the same behavior
for both years.
17
Revised Results on Ross Sea Test Data
Refitting initial values for zooplankton gives
better generalization.
18
Results on Data from Protist Study
19
Results on Data from Ringkøbing Fjord
20
Interfacing with Scientists
Because few scientists want to be replaced, we
are developing PROMETHEUS, an interactive
environment that lets users
  • specify a quantitative process model of the
    target system
  • display and edit the model's structure and
    details graphically
  • simulate the model's behavior over time and
    situations
  • compare the model's predicted behavior to
    observations
  • invoke a revision module in response to detected
    anomalies.

The environment offers computational assistance
in forming and evaluating models but lets the
user retain control.
21
Viewing and Editing a Process Model
22
Intellectual Influences
Our approach to computational discovery
incorporates ideas from many traditions
  • computational scientific discovery (e.g., Langley
    et al., 1983)
  • theory revision in machine learning (e.g.,
    Towell, 1991)
  • qualitative physics and simulation (e.g., Forbus,
    1984)
  • languages for scientific simulation (e.g.,
    STELLA, MATLAB)
  • interactive tools for data analysis (e.g.,
    Shneiderman, 2001).

Our work combines ideas from machine learning,
AI, programming languages, and human-computer
interaction.
23
Directions for Future Research
Despite our progress to date, we need further
work in order to
  • produce additional results on other ecosystem
    modeling tasks
  • develop improved methods for fitting model
    parameters
  • implement heuristic methods for searching the
    structure space
  • utilize knowledge of subsystems to further
    constrain search
  • augment the modeling environment to make it more
    usable

Process modeling has great potential to aid model
development in environmental science.
24
Contributions of the Research
In summary, our work on computational discovery
has produced
  • a new formalism for representing scientific
    process models
  • an encoding for background knowledge as generic
    processes
  • an algorithm for revising process models with
    time-series data
  • an interactive environment for model
    construction/utilization.

We have demonstrated this approach to model
revision on both ecosystem modeling and an
environmental domain. The PROMETHEUS
modeling/revision environment is available at
http://www.isle.org/process.html
25
End of Presentation
26
The Challenge of Systems Science
Disciplines like Earth science differ from
traditional disciplines by
  • focusing on synthesis rather than analysis in
    their operation
  • using computer modeling as one of their central
    methods
  • developing system-level models with many
    variables / relations
  • evaluating models on observational, not
    experimental, data.

Constructing such models is a complex task that
would benefit from computational aids, but
existing methods are insufficient.
27
Why Are Process Models Interesting?
Process models are a crucial target for machine
learning because
  • they incorporate scientific formalisms rather
    than AI notations
  • that are easily communicable to scientists and
    engineers
  • they move beyond descriptive generalization to
    explanation
  • while retaining the modularity needed to support
    induction.

These reasons point to process models as an ideal
representation for scientific and engineering
knowledge. Process models are an important
alternative to formalisms used currently in
machine learning.
28
Advantages of Quantitative Process Models
Process models offer scientists a promising
framework because
  • they embed quantitative relations within
    qualitative structure
  • that refer to notations and mechanisms familiar
    to experts
  • they provide dynamical predictions of changes
    over time
  • they offer causal and explanatory accounts of
    phenomena
  • while retaining the modularity needed to support
    induction.

Quantitative process models provide an important
alternative to formalisms used currently in
ecosystem modeling.
29
Inductive Process Modeling
Our response is to design, construct, and
evaluate computational methods for inductive
process modeling, which
  • represent scientific models as sets of
    quantitative processes
  • use these models to predict and explain
    observational data
  • search a space of process models to find good
    candidates
  • utilize background knowledge to constrain this
    search.

This framework has great potential to aid
environmental science, but it raises new
computational challenges.
30
Challenges of Inductive Process Modeling
Process model induction differs from typical
learning tasks in that
  • process models characterize behavior of dynamical
    systems
  • variables are continuous but can have
    discontinuous behavior
  • observations are not independently and
    identically distributed
  • models may contain unobservable processes and
    variables
  • multiple processes can interact to produce
    complex behavior.

Compensating factors include a focus on
deterministic systems and the availability of
background knowledge.
31
Generating Predictions and Explanations
To utilize or evaluate a given process model, we
must simulate its behavior over time
  • specify initial values for input variables and
    time step size
  • on each time step, determine which processes are
    active
  • solve active algebraic/differential equations
    with known values
  • propagate values and recursively solve other
    active equations
  • when multiple processes influence the same
    variable, assume their effects are additive.

This performance method makes specific
predictions that we can compare to observations.
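The simulation loop described above can be sketched in Python as follows. This is a simplification under stated assumptions: forward-Euler steps, differential equations only (algebraic equations and recursive propagation are left out), and a threshold of 0 on nitro_uptake that is assumed for illustration. Coefficients come from slide 9; the data layout and the name simulate are hypothetical.

```python
def always(state):
    return True


PROCESSES = [
    {"name": "phyto_loss", "condition": always,
     "effects": lambda s: {"phyto": -0.307 * s["phyto"],
                           "residue": 0.307 * s["phyto"]}},
    {"name": "nitro_remineralization", "condition": always,
     "effects": lambda s: {"nitro": 0.005 * s["residue"],
                           "residue": -0.005 * s["residue"]}},
    # Assumed condition: uptake is active only while nitro is positive.
    {"name": "nitro_uptake", "condition": lambda s: s["nitro"] > 0,
     "effects": lambda s: {"phyto": 0.411 * s["phyto"],
                           "nitro": -0.098 * 0.411 * s["phyto"]}},
]


def simulate(processes, initial_state, steps, dt=1.0):
    """Forward-Euler simulation that sums the effects of active processes."""
    state = dict(initial_state)
    trajectory = [dict(state)]
    for _ in range(steps):
        change = {name: 0.0 for name in state}
        for proc in processes:
            if proc["condition"](state):           # is the process active?
                for var, rate in proc["effects"](state).items():
                    change[var] += rate            # effects are additive
        for var in state:
            state[var] += dt * change[var]
        trajectory.append(dict(state))
    return trajectory


TRAJECTORY = simulate(PROCESSES,
                      {"phyto": 10.0, "nitro": 0.0, "residue": 0.0}, steps=3)
```

The resulting trajectory is what a revision method would compare against the observed time series.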
32
Generic Processes as Background Knowledge
Our framework casts background knowledge as
generic processes that specify
  • the variables involved in a process and their
    types
  • the parameters appearing in a process and their
    ranges
  • the forms of conditions on the process and
  • the forms of associated equations and their
    parameters.

Generic processes are building blocks from which
one can compose a specific process model.
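One way to picture a generic process and its type-constrained instantiation is the Python sketch below. It records only variable types and parameter ranges, leaving out condition and equation forms for brevity. The class GenericProcess, the function instantiations, and the parameter names rho and gamma are hypothetical, not the authors' data structures.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class GenericProcess:
    name: str
    variable_types: tuple    # required type of each variable slot, in order
    parameter_ranges: dict   # parameter name -> (low, high)


def instantiations(generic, typed_variables):
    """Yield each binding of distinct variables to slots of matching type."""
    slots = generic.variable_types
    for combo in permutations(list(typed_variables), len(slots)):
        if all(typed_variables[var] == kind for var, kind in zip(combo, slots)):
            yield (generic.name, combo)


# Variable types for the Ross Sea model.
VARIABLES = {"phyto": "species", "zoo": "species",
             "nitro": "nutrient", "residue": "detritus"}

GRAZING = GenericProcess(
    name="grazing",
    variable_types=("species", "species", "detritus"),
    parameter_ranges={"rho": (0.0, 1.0), "gamma": (0.0, 1.0)},
)

BINDINGS = list(instantiations(GRAZING, VARIABLES))
```

Here the type constraints leave two bindings for grazing: phyto grazing on zoo and zoo grazing on phyto, each with residue as the detritus.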
33
Estimating Parameters in Process Models
To estimate the parameters for each generic model
structure, the IPM algorithm
1. Selects random initial values that fall within
   ranges specified in the generic processes
2. Improves these parameters using the
   Levenberg-Marquardt method until it reaches a
   local optimum
3. Generates new candidate values through random
   jumps along dimensions of the parameter vector
   and continues the search
4. If no improvement occurs after N jumps, restarts
   the search from a new random initial point.
This multi-level method gives reasonable fits to
time-series data from a number of domains, but it
is computationally intensive.
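The control flow of this multi-level search can be sketched in Python as below. To stay dependency-free, the local step is a simple coordinate search with shrinking steps, swapped in for Levenberg-Marquardt; it is not the IPM implementation. The target is synthetic: data generated from an exponential-loss process with rate 0.307. All function names and defaults are hypothetical.

```python
import random


def local_optimize(loss, params, bounds, step=0.1, tol=1e-6):
    """Stand-in for Levenberg-Marquardt: shrinking-step coordinate search."""
    params, best = list(params), loss(params)
    while step > tol:
        improved = False
        for i, (lo, hi) in enumerate(bounds):
            for delta in (step, -step):
                trial = list(params)
                trial[i] = min(hi, max(lo, trial[i] + delta))
                value = loss(trial)
                if value < best:
                    params, best, improved = trial, value, True
        if not improved:
            step /= 2.0
    return params, best


def fit(loss, bounds, restarts=3, max_failed_jumps=5, jump_scale=0.2, seed=0):
    rng = random.Random(seed)
    overall = (None, float("inf"))
    for _ in range(restarts):  # step 4: each pass starts from a new point
        # Steps 1 and 2: random start within ranges, then local improvement.
        start = [rng.uniform(lo, hi) for lo, hi in bounds]
        params, best = local_optimize(loss, start, bounds)
        failed = 0
        while failed < max_failed_jumps:
            # Step 3: random jump along one dimension, then search again.
            i = rng.randrange(len(bounds))
            lo, hi = bounds[i]
            jumped = list(params)
            jump = rng.gauss(0.0, jump_scale * (hi - lo))
            jumped[i] = min(hi, max(lo, jumped[i] + jump))
            cand, value = local_optimize(loss, jumped, bounds)
            if value < best - 1e-12:
                params, best, failed = cand, value, 0
            else:
                failed += 1
        if best < overall[1]:
            overall = (params, best)
    return overall


def simulate_loss(alpha, x0=10.0, steps=20):
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] - alpha * xs[-1])
    return xs


OBSERVED = simulate_loss(0.307)  # synthetic observations


def squared_error(params):
    predicted = simulate_loss(params[0])
    return sum((p - o) ** 2 for p, o in zip(predicted, OBSERVED))


BEST_PARAMS, BEST_ERROR = fit(squared_error, bounds=[(0.0, 1.0)])
```

Every evaluation of the loss runs a full simulation, which is why this kind of search becomes expensive as models and data grow.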
34
A Process Model for an Aquatic Ecosystem
model Ross_Sea_Ecosystem
  variables: phyto, nitro, residue, light, growth_rate, effective_light, ice_factor
  observables: phyto, nitro, light, ice_factor

  process phyto_loss
    equations: d[phyto,t,1] = -0.1 * phyto
               d[residue,t,1] = 0.1 * phyto
  process phyto_growth
    equations: d[phyto,t,1] = growth_rate * phyto
  process phyto_uptakes_nitro
    conditions: nitro > 0
    equations: d[nitro,t,1] = -1 * 0.204 * growth_rate * phyto
  process growth_limitation
    equations: growth_rate = 0.23 * min(nitrate_rate, light_rate)
  process nitrate_availability
    equations: nitrate_rate = nitrate / (nitrate + 5)
  process light_availability
    equations: light_rate = effective_light / (effective_light + 50)
  process light_attenuation
    equations: effective_light = light * ice_factor
35
Generic Processes for Aquatic Ecosystems
generic process exponential_loss
  variables: S{species}, D{detritus}
  parameters: α [0, 1]
  equations: d[S,t,1] = -1 * α * S
             d[D,t,1] = α * S

generic process remineralization
  variables: N{nutrient}, D{detritus}
  parameters: π [0, 1]
  equations: d[N,t,1] = π * D
             d[D,t,1] = -1 * π * D

generic process grazing
  variables: S1{species}, S2{species}, D{detritus}
  parameters: ρ [0, 1], γ [0, 1]
  equations: d[S1,t,1] = γ * ρ * S1
             d[D,t,1] = (1 - γ) * ρ * S1
             d[S2,t,1] = -1 * ρ * S1

generic process constant_inflow
  variables: N{nutrient}
  parameters: ν [0, 1]
  equations: d[N,t,1] = ν

generic process nutrient_uptake
  variables: S{species}, N{nutrient}
  parameters: τ [0, ∞], β [0, 1], μ [0, 1]
  conditions: N > τ
  equations: d[S,t,1] = μ * S
             d[N,t,1] = -1 * β * μ * S
36
Inductive Process Modeling
37
The NPPc Portion of CASA
NPPc = sum over months of max(E * IPAR, 0)
E = 0.56 * T1 * T2 * W
T1 = 0.8 + 0.02 * Topt - 0.0005 * Topt^2
T2 = 1.18 / ((1 + e^(0.2 * (Topt - Tempc - 10))) * (1 + e^(0.3 * (Tempc - Topt - 10))))
W = 0.5 + 0.5 * EET / PET
PET = 1.6 * (10 * Tempc / AHI)^A * PET-TW-M   if Tempc > 0
PET = 0                                       if Tempc < 0
A = 0.00000068 * AHI^3 - 0.000077 * AHI^2 + 0.018 * AHI + 0.49
IPAR = 0.5 * FPAR-FAS * Monthly-Solar * Sol-Conver
FPAR-FAS = min((SR-FAS - 1.08) / SR(UMD-VEG), 0.95)
SR-FAS = -(Mon-FAS-NDVI + 1000) / (Mon-FAS-NDVI - 1000)
38
Results of Revising the NPP Model
Initial model:
  E = 0.56 * T1 * T2 * W
  T2 = 1.18 / ((1 + e^(0.2 * (Topt - Tempc - 10))) * (1 + e^(0.3 * (Tempc - Topt - 10))))
  PET = 1.6 * (10 * Tempc / AHI)^A * PET-TW-M
  SR in {3.06, 4.35, 4.35, 4.05, 5.09, 3.06, 4.05, 4.05, 4.05, 5.09, 4.05}
  RMSE on training data = 465.212 and r^2 = 0.799

Revised model:
  E = 0.353 * T1^0.00 * T2^0.08 * W^0.00
  T2 = 0.83 / ((1 + e^(1.0 * (Topt - Tempc - 6.34))) * (1 + e^(1.0 * (Tempc - Topt - 11.52))))
  PET = 1.6 * (10 * Tempc / AHI)^A * PET-TW-M
  SR in {0.61, 3.99, 2.44, 10.0, 2.21, 2.13, 2.04, 0.43, 1.35, 1.85, 1.61}
  Cross-validated RMSE = 397.306 and r^2 = 0.853 (15% reduction)



39
Generic Processes for Photosynthesis Regulation
generic process translation
  variables: P{protein}, M{mRNA}
  parameters: α [0, 1]
  equations: d[P,t,1] = α * M

generic process transcription
  variables: M{mRNA}, R{rate}
  parameters:
  equations: d[M,t,1] = R

generic process regulate_one
  variables: R{rate}, S{signal}
  parameters: β [-1, 1]
  equations: R = β * S

generic process regulate_two
  variables: R{rate}, S{signal}
  parameters: μ [-1, 1], δ [0, 1]
  equations: R = μ * S
             d[S,t,1] = -1 * δ * S

generic process automatic_degradation
  variables: C{concentration}
  conditions: C > 0
  parameters: α [0, 1]
  equations: d[C,t,1] = -1 * α * C

generic process controlled_degradation
  variables: D{concentration}, E{concentration}
  conditions: D > 0, E > 0
  parameters: β [0, 1]
  equations: d[D,t,1] = -1 * β * E
             d[E,t,1] = -1 * β * E

generic process photosynthesis
  variables: L{light}, P{protein}, R{redox}, S{ROS}
  parameters: γ [0, 1], δ [0, 1]
  equations: d[R,t,1] = γ * L * P
             d[S,t,1] = δ * L * P
40
A Process Model for Photosynthetic Regulation
model photo_regulation
  variables: light, mRNA, protein, ROS, redox, transcription_rate
  observables: light, mRNA

  process photosynthesis
    equations: d[redox,t,1] = 0.0155 * light * protein
               d[ROS,t,1] = 0.019 * light * protein
  process protein_translation
    equations: d[protein,t,1] = 7.54 * mRNA
  process mRNA_transcription
    equations: d[mRNA,t,1] = transcription_rate
  process regulate_one_1
    equations: transcription_rate = 0.99 * light
  process regulate_two_2
    equations: transcription_rate = 1.203 * redox
               d[redox,t,1] = -0.0002 * redox
  process automatic_degradation_1
    conditions: protein > 0
    equations: d[protein,t,1] = -1.91 * protein
  process controlled_degradation_1
    conditions: redox > 0, ROS > 0
    equations: d[redox,t,1] = -0.0003 * ROS
               d[ROS,t,1] = -0.0003 * ROS