Extracting Cellular Automaton Rules Directly from Experimental Data

Provided by: geoHunt

1
Extracting Cellular Automaton Rules Directly from
Experimental Data¹
  • Richards, F.C., T.P. Meyer and N.H. Packard,
    Physica D 45 (1990) 189-202

¹ Extensive quoting and paraphrasing
2
Objectives
  • Construct models for two-dimensional spatial
    patterns directly from experimental data
  • Use probabilistic CA rules as possible models for
    the data
  • Use a genetic (learning) algorithm to search the
    space of rules for the ones that most accurately
    predict the behavior of evolving patterns.

3
Background
  • Studies of CA show that complicated global patterns
    can be generated by relatively simple rules
  • CA are the simplest model for investigating
    spatially dynamic phenomena
  • Assumption: the observed global structure can be
    generated by a rule that is local in both space
    and time

4
CA rules
  • A CA rule maps the state of a given site on a
    discrete lattice to a future state
  • The future state is some function of the states of
    the sites in a neighborhood containing the given site
  • Therefore we need to specify not only the mapping
    function but also the structure of this neighborhood
    for the CA rule.

5
Mapping function
  • A template determines which of the sites in
    a neighborhood around a given lattice point are
    used to define an input state for a CA rule
  • Let the experimental data determine the mapping
    function for any given template
  • The mapping function is a probability histogram
    derived from the frequency of occurrence of each
    input/output-state pair when the sites of the
    experimental data are viewed through the template
  • The number of templates can run into the hundreds
    of millions
  • Need to find the templates that rank highest
    under a fitness function
  • Use a genetic algorithm to search through the space
    of possible templates to find the fittest ones.
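
The histogram construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the names `collect_histogram` and `probability_of_one`, the representation of a template as a list of `(di, dj)` offsets, and the periodic boundaries are all assumptions made for the example.

```python
from collections import Counter

def collect_histogram(frames, offsets):
    """Count (input state, output value) pairs seen through a template.

    frames  : list of 2-D binary lattices, one per time step
    offsets : list of (di, dj) site offsets (assumed template format)
    """
    counts = Counter()
    height, width = len(frames[0]), len(frames[0][0])
    for t in range(len(frames) - 1):
        for i in range(height):
            for j in range(width):
                # Input state: template sites at time t (periodic wrap is assumed).
                x = tuple(frames[t][(i + di) % height][(j + dj) % width]
                          for di, dj in offsets)
                # Output state: the same site one step later.
                y = frames[t + 1][i][j]
                counts[(x, y)] += 1
    return counts

def probability_of_one(counts):
    """Convert raw counts into P(output = 1 | input state)."""
    totals, ones = Counter(), Counter()
    for (x, y), n in counts.items():
        totals[x] += n
        ones[x] += n * y  # y is 0 or 1, so this counts only the 1 outcomes
    return {x: ones[x] / totals[x] for x in totals}
```

For example, with two 2x2 frames in which every site flips and a template containing only the site itself, `probability_of_one` maps input `(0,)` to 1.0 and input `(1,)` to 0.0.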

6
CA rule space
  • $a_{i,j}^{t+1} = F(y_{i,j}^{t})$
  • $y_{i,j}^{t}$: values of the lattice sites in some
    neighborhood around site $a_{i,j}$ at time $t$
  • $F$ is a local map which takes $y_{i,j}^{t}$ to
    $a_{i,j}^{t+1}$
  • The indices $i,j$ indicate the spatial position of
    the site.
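
To make the update rule concrete, here is a small Python sketch of one synchronous CA step. The parity map, the four-site neighborhood, and the periodic boundaries are illustrative assumptions chosen for brevity; they are not taken from the paper.

```python
def step(lattice, offsets, local_map):
    """Apply the local map to the neighborhood of every site (periodic wrap assumed)."""
    height, width = len(lattice), len(lattice[0])
    return [[local_map(tuple(lattice[(i + di) % height][(j + dj) % width]
                             for di, dj in offsets))
             for j in range(width)]
            for i in range(height)]

# Hypothetical neighborhood: the four nearest neighbors of a site.
FOUR_NEIGHBORS = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def parity(neighborhood):
    """Example local map F: next state is the parity of the neighborhood."""
    return sum(neighborhood) % 2

start = [[0, 0, 0],
         [0, 1, 0],
         [0, 0, 0]]
after = step(start, FOUR_NEIGHBORS, parity)
```

Here a single active site in the center activates its four neighbors and switches itself off on the next step.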

7
Spatial temporal domain
  • Expand the definition to include dynamics on
    different time scales and different length
    scales
  • Time: $a_{i,j}^{t+1}$ is a function of the state of
    the four nearest neighbors at time $t$ and at time
    $t-1$
  • Using two time steps is hoped to capture the dynamics
  • May also be able to model fast and slow growth

8
Spatial temporal domain (cont.)
  • Expand the definition of F to incorporate
    information from different length scales
  • Choose a length scale at which behavior is dominated
    by dynamics instead of noise
  • Represent the data in a pyramid in which each level
    represents information about a different length
    scale (3x3, 9x9, etc.)
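
One way to picture the data pyramid is as repeated coarse-graining of the lattice in 3x3 blocks. The slides do not say how a block is summarized, so the majority vote used below is purely an assumption, as are the function names.

```python
def coarse_grain(lattice, block=3):
    """Summarize each block x block patch by majority vote (an assumed reduction)."""
    rows, cols = len(lattice) // block, len(lattice[0]) // block
    result = []
    for bi in range(rows):
        row = []
        for bj in range(cols):
            total = sum(lattice[bi * block + a][bj * block + b]
                        for a in range(block) for b in range(block))
            row.append(1 if 2 * total > block * block else 0)
        result.append(row)
    return result

def build_pyramid(lattice, levels=3):
    """Level 1 is the raw data; each further level covers 3x the length scale."""
    pyramid = [lattice]
    for _ in range(levels - 1):
        pyramid.append(coarse_grain(pyramid[-1]))
    return pyramid
```

With a 9x9 input, the three levels have sizes 9x9, 3x3, and 1x1, matching the 3x3 and 9x9 length scales mentioned above.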

9
Model
  • $a_{i,j}^{t+1} = F({y}_{i,j}^{t}, {y'}_{i,j}^{t}, {y''}_{i,j}^{t},
    {y}_{i,j}^{t-1}, {y'}_{i,j}^{t-1}, {y''}_{i,j}^{t-1})$
  • The different neighborhood configurations $y$,
    $y'$, $y''$ are taken from the different levels of
    the data pyramid.
  • So ${y}_{i,j}^{t}$ represents values of sites taken
    from the first level of the pyramid for some
    neighborhood around $a_{i,j}^{t}$.
  • State vector ${y'}_{i,j}^{t}$ represents values of
    sites from the second-level neighborhood around
    $a_{i,j}^{t}$.
  • State vector ${y''}_{i,j}^{t}$ represents values of
    sites from the third-level neighborhood around
    $a_{i,j}^{t}$.
  • Therefore the future value $a_{i,j}^{t+1}$ depends on
    neighborhood configurations around the site at
    times $t$ and $t-1$, and these neighborhood
    configurations can be derived from data at many
    different length scales.

10
Model (cont.)
  • $a_{i,j}^{t+1} = F(Y_{i,j}^{t,\,t-1})$
  • $Y$: neighborhood values at multiple scales of the
    pyramid around $a_{i,j}$
  • Question: to find the CA rule that best determines
    the future state of a given site, how do we
    determine the future site value for a given input
    configuration (i.e., what is F)?
  • Answer: the local mapping function, F, is
    determined empirically from the pattern data.
  • For every input state we record how many times
    $a_{i,j}^{t+1}$ is 0 and how many times it is 1, in
    order to build a probability histogram.
  • The probability histogram defines a probabilistic
    CA rule
  • So for any given set of sites defining an input
    state there is only one probabilistic CA
    rule defined by the observational data.
  • Neighborhood configurations are defined by a
    template.
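
Once a histogram has been turned into probabilities, it can be used as a probabilistic CA rule by sampling each future site value. The sketch below assumes the rule is stored as a dictionary from input state to P(next value = 1); the function name, the offset-list template, the periodic boundaries, and the fallback probability for unseen input states are all hypothetical choices.

```python
import random

def probabilistic_step(lattice, offsets, p_one, rng, default=0.5):
    """Sample the next lattice from P(next = 1 | input state).

    p_one   : dict mapping input-state tuples to the probability of a 1
    default : probability used for input states never seen in the data (assumption)
    """
    height, width = len(lattice), len(lattice[0])
    new_lattice = []
    for i in range(height):
        row = []
        for j in range(width):
            x = tuple(lattice[(i + di) % height][(j + dj) % width]
                      for di, dj in offsets)
            p = p_one.get(x, default)
            row.append(1 if rng.random() < p else 0)
        new_lattice.append(row)
    return new_lattice

# A rule that always flips the site, expressed as probabilities.
flip_rule = {(0,): 1.0, (1,): 0.0}
result = probabilistic_step([[0, 1], [1, 0]], [(0, 0)], flip_rule, random.Random(0))
```

Because the probabilities in `flip_rule` are all 0 or 1, this particular rule behaves deterministically; intermediate values give genuinely stochastic updates.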

11
Genetic algorithm
  • If the master template has M sites we can construct
    up to $2^{M}$ different templates
  • It is not feasible to construct probabilistic CA
    rules for every template to find the one that best
    describes the pattern dynamics.
  • Need to employ a genetic algorithm to find the
    optimal (or nearly optimal) solution to a problem
    given a large set of possible solutions

12
Genetic algorithm outline
  • Define the population; in this case the population
    is the set of CA rule templates
  • Select some small subset of the population at random
  • Assign a fitness to each member
  • Retain the fittest members and discard the least
    fit members
  • To the labels of the fittest members, apply the
    transformation operators, appropriately named
    the genetic operators, to produce new labels of
    other members in the population (mutate)
  • Assign fitness to the new members, compare with
    those previously selected, retain the fittest,
    discard the least fit, and reiterate the
    process.
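
The outline above can be sketched as a short loop over integer labels. This is only a schematic under stated assumptions: the population size, number of survivors, and generation count are arbitrary, `genetic_search` and `flip_one_bit` are hypothetical names, and the toy fitness used in the usage lines (number of set bits) merely stands in for the information-based fitness.

```python
import random

def genetic_search(fitness, mutate, n_bits, pop_size=20, keep=5,
                   generations=50, seed=0):
    """Search the space of n_bits-wide integer labels for a high-fitness label."""
    rng = random.Random(seed)
    # Select a small random subset of the 2**n_bits possible labels.
    population = [rng.getrandbits(n_bits) for _ in range(pop_size)]
    for _ in range(generations):
        # Rank by fitness, retain the fittest, discard the rest.
        survivors = sorted(set(population), key=fitness, reverse=True)[:keep]
        # Apply a genetic operator to survivors to produce new labels.
        children = [mutate(rng.choice(survivors), n_bits, rng)
                    for _ in range(pop_size - len(survivors))]
        # New members compete with those previously selected on the next pass.
        population = survivors + children
    return max(population, key=fitness)

def flip_one_bit(label, n_bits, rng):
    """A simple mutation: flip one randomly chosen bit of the label."""
    return label ^ (1 << rng.randrange(n_bits))

def count_set_bits(label):
    """Toy fitness used only for demonstration."""
    return bin(label).count("1")

best = genetic_search(count_set_bits, flip_one_bit, n_bits=8)
```

Because the fittest members are always carried over, the best fitness in the population never decreases from one generation to the next.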

13
Genetic algorithm questions
  • How do we represent the CA rules in a symbolic
    genetic form?
  • What do we mean by fitness?
  • What are genetic operators?

14
CA rules: master template
  • $a_{i,j}^{t+1}$ may depend on:
  • Any of the eight nearest-neighbor (nn) values from
    time $t$
  • Any of the four nn values from time $t-1$
  • The four 3x3 nn values from the 2nd level at times
    $t$ and $t-1$
  • The four 9x9 nn values from the 3rd level at times
    $t$ and $t-1$

15
CA rules: sub-template
  • Rules within the rule space
  • A sub-template contains only those sites from the
    master template that are used to determine the
    input states for a rule
  • A 28-bit integer (the master template has 28
    sites, giving $2^{28}$ templates), analogous to a
    gene pattern in biology, specifies the form of Y.
  • By manipulating the bits, we create a new template.
  • Each sub-template has probabilities associated
    with it, collected from the
    transition-probability histogram.
  • Use the genetic operators to move through the
    rule space in search of the optimal rule for a
    given set of data.
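
The 28-bit labelling can be illustrated by decoding an integer into the master-template sites it selects. The site counts (8 + 4 + 8 + 8 = 28) follow the master template described in these slides, but the bit ordering and the `(level, time, index)` tuples below are assumptions made only for this sketch.

```python
def build_master_sites():
    """List the 28 master-template sites in an assumed bit order."""
    sites = [("level1", "t", k) for k in range(8)]      # eight nn values at time t
    sites += [("level1", "t-1", k) for k in range(4)]   # four nn values at time t-1
    for level in ("level2", "level3"):                  # 3x3 and 9x9 pyramid levels
        for time in ("t", "t-1"):
            sites += [(level, time, k) for k in range(4)]
    return sites

MASTER = build_master_sites()

def decode(label):
    """Return the sites whose bits are set in the integer label."""
    return [site for bit, site in enumerate(MASTER) if (label >> bit) & 1]

def encode(sites):
    """Return the integer label for a collection of master-template sites."""
    return sum(1 << MASTER.index(site) for site in sites)
```

For instance, `decode(0b101)` selects the first and third sites in the assumed ordering, and `encode(decode(n))` returns `n` for any label below 2**28.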

16
CA rules: fitness
  • How well the rule can regenerate the behavior
    observed in the experimental data
  • A fit rule reproduces both global and local behavior
  • Rules are rated by how much information their past
    and present sites contain, on average, about the
    future site value.

17
Shannon's measure of information and mutual
information
  • Information
  • $H(x) = -\sum_{i} P(x_i) \log P(x_i)$
  • Mutual information
  • $I(x,y) = \sum_{i,j} P(x_i,y_j) \log \frac{P(x_i,y_j)}{P(x_i)\,P(y_j)}$
  • $P(x_i,y_j)$ is the joint probability distribution, or
    the probability of finding the variable x in state
    i and simultaneously finding y in state j.
  • Alternate definition of mutual information
  • $I(x,y) = H(x) + H(y) - H(x,y)$
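
Both definitions of mutual information can be checked numerically from a table of joint counts. The sketch below uses base-2 logarithms (the slides do not specify a base, so bits are an assumption), and the function names are illustrative.

```python
from collections import Counter
from math import log2

def entropy(counts):
    """Shannon entropy, in bits, of a distribution given as counts."""
    n = sum(counts.values())
    return -sum(c / n * log2(c / n) for c in counts.values() if c)

def marginals(joint):
    """Sum joint counts over y and over x to get the two marginal count tables."""
    px, py = Counter(), Counter()
    for (x, y), c in joint.items():
        px[x] += c
        py[y] += c
    return px, py

def mutual_information(joint):
    """I(x,y) from the joint-probability definition."""
    n = sum(joint.values())
    px, py = marginals(joint)
    return sum(c / n * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in joint.items() if c)

def mutual_information_from_entropies(joint):
    """I(x,y) from the alternate definition H(x) + H(y) - H(x,y)."""
    px, py = marginals(joint)
    return entropy(px) + entropy(py) - entropy(joint)

correlated = {(0, 0): 2, (1, 1): 2}   # y always equals x
independent = {(0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 1): 1}
```

When y always equals x, both functions give one bit; when x and y are independent and uniform, both give zero.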

18
Fitness function
  • $F = I(x,y) - 2^{m}/N$
  • $m$ is the number of cells in our rule template
  • $N$ is the number of experimentally determined data
    points
  • $y$ represents the state of the future site
    variable, and $x$ represents the state of the sites
    in the sub-template
  • The $2^{m}/N$ term corrects for overestimation of the
    mutual information
  • Fitness is measured by scanning the pattern data
    with the appropriate sub-template and collecting
    probability histograms for P(x), P(y) and P(x,y),
    which are used to calculate I(x,y); this is combined
    with template size $m$ and data-set size $N$ to
    calculate F.
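
A short worked example shows how the correction term behaves as the template grows. The function name and the numbers are illustrative assumptions; only the formula comes from the slide.

```python
def fitness(mutual_info, m, n_points):
    """Mutual information minus the 2**m / N correction term."""
    return mutual_info - 2 ** m / n_points

# With N = 10,000 data points, a 4-site template is barely penalized...
small_template = fitness(0.5, m=4, n_points=10_000)    # penalty 16 / 10000 = 0.0016
# ...while a 20-site template is penalized far beyond any possible gain in I.
large_template = fitness(0.9, m=20, n_points=10_000)   # penalty 1048576 / 10000
```

The penalty doubles with every site added, so larger templates are favored only when enough data is available to fill their histograms.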

19
Genetic operators
  • Having defined a space of possible models for the
    pattern data and a notion of fitness by which to
    rank rules, we now need to search efficiently
    through this space to find the fittest model.
  • Genetic operators:
  • Must be closed with respect to the space of rules
  • Should transform the rule labels, which are
    integer representations of the sub-templates,
    from one value to another.

20
Genetic operators (cont.)
  • Point-mutation operator
  • Changes just one bit (remember, each bit
    corresponds to a site in the master template)
  • Crossover operator
  • Replaces a section of one rule with the
    corresponding section from another
  • In the hope of generating a better rule
21
Genetic operators (cont.)
  • Genetic operators provide motion in the space of
    CA sub-templates
  • Point mutations make small changes
  • Crossover operations provide a long-jump
    mechanism

22
Summary of Methodology
  • Defined a space of templates, and hence a space of
    probabilistic models for 2-D pattern data, using
    the concept of master templates
  • Members are specified as sub-templates
  • Each has an integer label
  • Provided a measure of fitness for members of the
    population
  • Introduced the genetic operators
  • Point mutations
  • Crossovers
  • Thus we can move through the given space of 2-D
    probabilistic models in search of the best one