Continuous States and Distributed Symbols: Toward a Biological Theory of Computation

Simon D. Levy
Computer Science Department, Washington and Lee University, Lexington, VA 24450
levys@wlu.edu
Introduction
"If we want to imitate human memory with models, we must take account of the weaknesses of the nervous system as well as its powers." (D. Gabor [1])
The classical theory of computation rests on two fundamental assumptions: states are finite, and symbols are atomic. Although automata built on these assumptions are extremely successful at solving many computational tasks, the assumptions are highly implausible for human and animal cognition. First, the signals used by the brain and other biological systems are mainly continuous, as evidenced by the widespread use of differential equations in modeling these systems. For this reason, it makes little sense to view mental states as countable, let alone finite. Second, there is very little reason to believe that mental representations involve locally stored atomic symbols. Consequently, classical pointer-based discrete structures over such symbols, and algorithms operating on such structures, are not biologically realistic. Experimental evidence instead favors a view in which the representations of entities, concepts, relations, etc., are distributed over a large number of individually meaningless elements in a way that supports similarity metrics and content-based retrieval.

Although both continuous-state computation and distributed representations have received a fair amount of research attention, it is uncommon to see them discussed together in the unconventional-computation literature (except, perhaps, as part of a general survey). In our presentation we argue that a biologically plausible theory of computation will require both a continuous-state automaton component and a distributed-memory component, much as a classical pushdown automaton uses both a finite-state automaton and a pushdown stack. We show further that stack-like operations (PUSH and POP) over distributed representations can be performed as simple vector addition and scalar multiplication, in a way reminiscent of foreground/background effects in visual processing. This possibility suggests that higher mental functions like language and abstract thought might be exploiting existing neural circuitry already available for other purposes. We conclude with a simple visual example and some speculation about possible new directions and guiding principles for biologically inspired unconventional computation.
Figure 3. Unbinding the tensor representation of "John loves Mary" by probing with the LOVEE role produces a noisy version of MARY.
Because the dimension of the role/filler binding increases with each binding operation, tensor products grow exponentially as more recursive embedding is performed. The solution is to collapse the bound N × N role/filler matrix back into a length-N vector. As shown in Figure 4, there are two ways of doing this. In Binary Spatter Codes (BSC) [9], only the elements along the main diagonal are kept, and the rest are discarded. In Holographic Reduced Representations (HRR) [10], the sum of each diagonal is taken, with wraparound (circular convolution) keeping the length of all diagonals equal. Both approaches use very large (> 1000-element) vectors of random values drawn from a fixed set or interval. Despite the size of the representations, both approaches are computationally efficient, requiring no back-propagation or other costly iterative algorithm, and can be done in parallel. Even in a serial implementation, the BSC approach is O(N) for a vector of length N, and the HRR approach can be implemented using the Fast Fourier Transform, which is O(N log N). The price paid is that the binding operation becomes a variety of lossy compression, collapsing N² pieces of information down to N. As with the noise introduced by the unbinding operation, this noise can be dealt with by a cleanup memory.
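The following is a minimal NumPy sketch of the two compression schemes just described. The vector length, the random ±1 role/filler vectors, and the variable names are illustrative assumptions, not code from [9] or [10]:

import numpy as np

N = 1024                                   # assumed vector length (> 1000, per the text)
rng = np.random.default_rng(0)
role = rng.choice([-1.0, 1.0], size=N)     # random vectors drawn from a fixed set
filler = rng.choice([-1.0, 1.0], size=N)

outer = np.outer(role, filler)             # full N x N tensor-product binding

# BSC-style compression: keep only the main diagonal of the outer product (O(N)).
bsc_binding = np.diag(outer)               # same as role * filler, element-wise

# HRR-style compression: sum each wrapped diagonal (circular convolution),
# computed here via the Fast Fourier Transform in O(N log N).
hrr_binding = np.real(np.fft.ifft(np.fft.fft(role) * np.fft.fft(filler)))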
Figure 4. Two methods for maintaining fixed dimensionality in tensor-product representations. Holographic Reduced Representation sums along the main diagonal and the off-diagonals; the Binary Spatter Code uses only the elements on the main diagonal.
Dynamical Automata
Dynamical automata [2, 3, 4] are neurally inspired models designed specifically to behave like pushdown automata (PDA) recognizing infinite languages. The back-propagation algorithm [5] can be used to train dynamical automata to recognize different languages [2], or the automata can be hard-wired to recognize specific languages [4]. Either way, their functioning is built on a simple principle: given sufficient numerical precision, the state of the machine can be represented as a single number or set of numbers. Push and pop operations can then be represented as multiplication by various constants, implemented as weighted connections in a recurrent neural network. For example, Figure 1 illustrates a dynamical automaton that recognizes the language of balanced parentheses. This automaton starts with an empty-stack value of 1, multiplies by 0.5 on seeing a left parenthesis, and by 2 (modulo 2) on seeing a right parenthesis [2]. Input is accepted when the stack value gets sufficiently close to 1 (e.g., > 0.75). The stack and discrete state space of the classical PDA are thus replaced by a continuous state space having a fractal property [6]. Dynamical automata thus provide a principle by which infinite-state recursion can be performed in a neural-like architecture.
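As a hedged sketch (not code from [2] or [4]; the function name and acceptance threshold argument are illustrative), the balanced-parentheses automaton of Figure 1 can be simulated directly in Python:

def accepts_balanced(s, threshold=0.75):
    z = 1.0                      # empty-stack state value
    for ch in s:
        if ch == '(':
            z *= 0.5             # PUSH: descend one level in the fractal state space
        elif ch == ')':
            z = (z * 2.0) % 2.0  # POP: multiply by 2, modulo 2
        else:
            return False         # reject symbols outside the alphabet
    return z > threshold         # accept when the state returns close to 1

assert accepts_balanced("(()())") is True
assert accepts_balanced("(()") is False
assert accepts_balanced("())") is False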
Stacking Distributed Representations

Consistent with an approach driven by neural plausibility, we would like a way to represent the classical automata-theoretic operations (PUSH, POP) in a neurally plausible algorithm. One way to do this is to modify the Hopfield network learning equation via a coefficient on each vector to be learned:

w_ij = Σ_p a^p x_i^p x_j^p

Here, w_ij is the weight to be learned between the ith and jth elements of vector x, p is the number of such vector patterns to be learned, and a is the strength coefficient assigned to each vector. By increasing or decreasing a over the sequence of patterns, a stack- or queue-like behavior can be easily implemented. As with a standard Hopfield network, the "top" pattern on the stack/queue can then be recovered by iterating the following equation until its output converges:

s_i = Σ_j w_ij u_j

where u_i = 1 initially, and thereafter u_i = 0 for s_i < 0 and u_i = 1 otherwise. Popping the stack then corresponds to running the Hopfield unlearning algorithm on this recovered image, using the additive inverse of the coefficient associated with learning it.
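A minimal sketch of this coefficient-weighted Hopfield stack follows. For readability it assumes bipolar (+1/-1) patterns with a sign threshold, whereas the poster describes 0/1 bit vectors; the function names are illustrative:

import numpy as np

def push(W, x, a):
    """Learn pattern x with strength coefficient a: W_ij += a * x_i * x_j."""
    return W + a * np.outer(x, x)

def pop(W, x, a):
    """Unlearn x using the additive inverse of the coefficient it was pushed with."""
    return W - a * np.outer(x, x)

def top(W, n_iter=20):
    """Approximately recover the most strongly weighted pattern by iterating
    u_i <- threshold(sum_j W_ij u_j), starting from u = all ones."""
    u = np.ones(W.shape[0])
    for _ in range(n_iter):
        u = np.where(W @ u < 0, -1.0, 1.0)
    return u

Pushing patterns with increasing coefficients (e.g., a = 1.0, 2.0, 3.0) makes the most recently pushed pattern dominate retrieval; popping it by subtracting its contribution exposes the pattern below.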
Because it is difficult to make visual sense of random vectors, Figure 5 illustrates the use of this "Hopfield Stack" network to encode images: specifically, 70 × 50-pixel images of three Civil War generals. Patterns are represented as bit vectors of length 3500, with a 3500 × 3500 matrix of weights implementing the stack. Contrast the stack visualization at right with a classical stack, in which only the top (last-in) object would be visible.
Figure 1. A neural network implementing a dynamical automaton that accepts the language of balanced parentheses. The node labeled z holds a state variable whose initial value is 1. Nodes labeled P compute the products of their inputs; the node labeled Σ computes the sum, weighted by the values on the edges. (Unlabeled edges have a weight of 1.0.) Node f is the modulo-2 function. The dashed line represents a feedback connection that copies f's output back to z at each input step. Node g is a threshold function that outputs 1 (accept) when its input goes above 0.75, and 0 (reject) otherwise. The left-parenthesis symbol is encoded as the vector [1 0]^T, and the right parenthesis as [0 1]^T.
Figure 5. A "Hopfield Stack" for images. The image at right represents the result of pushing the first three images in the order given, with a_1 = 1.0, a_2 = 2.0, a_3 = 3.0.
Conclusions and Future Work
This presentation has shown concrete ways in which biologically motivated representations can be used to perform some of the critical operations associated with classical computation: namely, tracking state and maintaining a stack or queue of recursively decomposable symbols. This approach contrasts favorably with localist representations, which merely implement classical computation with a "one neuron per symbol" approach, or which ignore the issue of symbol content entirely. The next obvious step would be to build a model integrating the state and stack components in a way that supports learning grammar-like mappings between recursively structured meanings and symbol sequences, for parsing and generating language. Such a model could represent a significant step in overcoming the unnaturalness of conventional computing approaches to cognition, AI, and related fields.
Tensor Products and Related Models

Classical computation relies crucially on the ability to bind values (e.g., 3.14) to variables (x) and roles (AGENT/PATIENT) to fillers (JOHN/MARY), and to represent such bindings in a way that supports recursion. Tensor-product models represent an effort to deal explicitly with role/filler, variable/value, and other binding tasks in a distributed representation. Tensor-product models represent fillers (and roles) as vectors of values, supporting distributed representations of arbitrary size. In the simplest formulation [7], roles are vectors of the same length as their fillers. Binding is implemented by taking the tensor (outer) product of a role vector and a filler vector, resulting in a mathematical object (matrix) having one more dimension than the filler. Given vectors of sufficient length, each role/filler matrix will be unique. As shown in Figure 2, another crucial property of such representations is that role/filler bindings can be bundled to produce more complex structures through simple element-wise addition. This capability opens the door to recursion, allowing entire bundles of structure (John loves Mary) to fill roles (Bill thinks John loves Mary).
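A small NumPy illustration of binding and bundling as just described; the vector length, the random role and filler vectors, and the names JOHN, MARY, LOVER, and LOVEE are assumptions made for the example:

import numpy as np

N = 512
rng = np.random.default_rng(1)
JOHN, MARY, LOVER, LOVEE = (rng.choice([-1.0, 1.0], size=N) for _ in range(4))

# Bind each filler to its role with a tensor (outer) product ...
john_as_lover = np.outer(LOVER, JOHN)
mary_as_lovee = np.outer(LOVEE, MARY)

# ... and bundle the two bindings by simple element-wise addition.
john_loves_mary = john_as_lover + mary_as_lovee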
References
[1] Gabor, D.: Improved holographic model of temporal recall. Nature 217 (1968) 1288-1289
[2] Pollack, J.: The induction of dynamical recognizers. Machine Learning 7 (1991) 227-252
[3] Moore, C.: Dynamical recognizers: Real-time language recognition by analog computers. Theoretical Computer Science 201 (1998) 99-136
[4] Tabor, W.: Dynamical automata. Technical Report TR98-1694, Computer Science Department, Cornell University (1998)
[5] Rumelhart, D., Hinton, G., Williams, R.: Learning internal representations by error propagation. In Rumelhart, D., McClelland, J., eds.: Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Volume 1. MIT Press (1986)
[6] Mandelbrot, B.B.: The Fractal Geometry of Nature. W.H. Freeman and Company (1988)
[7] Smolensky, P.: Tensor product variable binding and the representation of symbolic structures in connectionist systems. Artificial Intelligence 46 (1990) 159-216
[8] Hopfield, J.: Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences 79 (1982) 2554-2558
[9] Kanerva, P.: The binary spatter code for encoding concepts at many levels. In Marinaro, M., Morasso, P., eds.: ICANN '94: Proceedings of the International Conference on Artificial Neural Networks. Volume 1. London, Springer-Verlag (1994) 226-229
[10] Plate, T.A.: Holographic Reduced Representation: Distributed Representation for Cognitive Science. CSLI Publications (2003)
Figure 2. Binding and bundling with tensor products to represent the proposition "John loves Mary". Roles are column vectors, fillers row vectors. Black squares represent nonzero values, white squares zero values.
Crucially, the original fillers of all roles must be recoverable from such bundled representations; i.e., there must be an unbinding operation as well. Figure 3 shows an example of unbinding, in which the transposed role vector LOVEE is multiplied by the bundled tensor product for "John loves Mary", producing a noisy version of the original vector for the filler MARY. The introduction of noise requires that the unbinding process employ a "cleanup memory" to restore the fillers to their original form. The cleanup memory can be implemented using neurally plausible Hebbian auto-association, like a Hopfield network [8]. In such networks the original fillers are attractor basins in the network's dynamical state space.
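The sketch below continues the binding example given earlier: it unbinds by probing with the LOVEE role vector, and a simple nearest-neighbour lookup stands in for the Hopfield-style cleanup memory described above. All names and values are illustrative assumptions:

import numpy as np

N = 512
rng = np.random.default_rng(1)
JOHN, MARY, LOVER, LOVEE = (rng.choice([-1.0, 1.0], size=N) for _ in range(4))
john_loves_mary = np.outer(LOVER, JOHN) + np.outer(LOVEE, MARY)

# Unbind: multiply the transposed role vector by the bundled tensor product,
# giving a noisy version of the filler MARY.
noisy_mary = LOVEE @ john_loves_mary

# Cleanup memory: map the noisy result back to the closest known filler.
lexicon = {"JOHN": JOHN, "MARY": MARY}
cleaned = max(lexicon, key=lambda name: np.dot(lexicon[name], noisy_mary))
print(cleaned)   # expected output: MARY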