Title: Connectionist Knowledge Representation and Reasoning (Part I)
1. Connectionist Knowledge Representation and Reasoning (Part I)
SCREECH - Neural Networks and Structured Knowledge

Fodor & Pylyshyn: "What's deeply wrong with Connectionist architecture is this: Because it acknowledges neither syntactic nor semantic structure in mental representations, it perforce treats them not as a generated set but as a list." (Connectionism and Cognitive Architecture, 1988)

Our claim: state-of-the-art connectionist architectures do adequately deal with structures!
2. Tutorial Outline (Part I): Neural networks and structured knowledge
- Feedforward networks
  - The good old days: KBANN and co.
  - Useful: neurofuzzy systems, data mining pipeline
  - State of the art: structure kernels
- Recurrent networks
  - The basics: partially recurrent networks
  - Lots of theory: principled capacity and limitations
  - To do: challenges
- Recursive data structures
  - The general idea: recursive distributed representations
  - One breakthrough: recursive networks
  - Going on: towards more complex structures
3. Tutorial Outline (Part I): the outline above repeated; next up: Feedforward networks, the good old days (KBANN and co.).
4. The good old days: KBANN and co.
Feedforward neural network f_w: R^n → R^o
- black box
- distributed representation
- connection to rules for symbolic I/O?
[Figure: network with input x, output y, and a single neuron highlighted]
5. The good old days: KBANN and co.
Knowledge Based Artificial Neural Networks [Towell/Shavlik, AI 94]
- start with a network which represents the known rules (a small weight-initialization sketch follows below)
- train using additional data
- extract a set of symbolic rules after training
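As an illustration of "rules in, network out", here is a minimal Python sketch of the rule-to-weights translation in the spirit of KBANN. The weight magnitude OMEGA and the bias rule (positive antecedents get weight ω, negated ones −ω, bias (P − 1/2)·ω for an AND unit) follow the usual KBANN convention as we recall it; treat the exact constants and the function names as assumptions for illustration, not the authors' code.

```python
import numpy as np

OMEGA = 4.0  # weight magnitude; an assumption, chosen so the sigmoid saturates

def rule_to_neuron(antecedents, all_symbols):
    """Translate one conjunctive rule body into initial weights and a bias.

    antecedents: list of (symbol, positive?) pairs, e.g. [("A", True), ("B", False)].
    The neuron's net input exceeds the bias only if all antecedents are satisfied.
    """
    w = np.zeros(len(all_symbols))
    n_pos = 0
    for sym, positive in antecedents:
        w[all_symbols.index(sym)] = OMEGA if positive else -OMEGA
        n_pos += int(positive)
    bias = (n_pos - 0.5) * OMEGA          # AND unit: all positive antecedents needed
    return w, bias

def neuron_output(w, bias, x):
    """Sigmoidal unit; with OMEGA = 4 the output is close to 0/1 for boolean inputs."""
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) - bias)))

# Example: rule "C :- A, not B" over symbols A, B
symbols = ["A", "B"]
w, b = rule_to_neuron([("A", True), ("B", False)], symbols)
print(neuron_output(w, b, np.array([1.0, 0.0])))  # A true, B false -> near 1
print(neuron_output(w, b, np.array([1.0, 1.0])))  # B true blocks the rule -> near 0
```

Such pre-wired neurons form the initial hidden layer; training (next slides) then refines the weights with additional data.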
6. The good old days: KBANN and co.
7. The good old days: KBANN and co.
[Figure: initial rule-based network, then training with data]
Train using some form of backpropagation; add a penalty to the error, e.g. for changing the weights.
- The initial network biases the training result, but
- there is no guarantee that the initial rules are preserved
- there is no guarantee that the hidden neurons maintain their semantics
8. The good old days: KBANN and co.
From the trained network back to (complete) rules:
- There is no exact, direct correspondence between a neuron and a single rule, although each neuron (and the overall mapping) can be approximated arbitrarily well by a set of rules.
- It is NP-complete to find a minimum logical description for a trained network [Golea, AISB'96].
- Therefore, a couple of different rule extraction algorithms have been proposed, and this is still a topic of ongoing research.
9. The good old days: KBANN and co.
Extracting (complete) rules:
- decompositional approach
- pedagogical approach
10. The good old days: KBANN and co.
- Decompositional approaches
  - subset algorithm, MofN algorithm: describe single neurons by sets of active predecessors [Craven/Shavlik, 94] (a toy sketch follows this slide)
  - local activation functions (RBF-like) allow an approximate direct description of single neurons [Andrews/Geva, 96]
  - MLP2LN biases the weights towards 0/-1/1 during training and can then extract exact rules [Duch et al., 01]
  - prototype-based networks can be decomposed along relevant input dimensions by decision tree nodes [Hammer et al., 02]
- Observation
  - usually some variation of if-then rules is achieved
  - small rule sets are only achieved if further constraints guarantee that single weights/neurons have a meaning
  - tradeoff between accuracy and size of the description
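The following is a deliberately naive sketch of the subset idea for a single thresholded neuron, not Craven/Shavlik's exact algorithm: enumerate small sets of positively weighted inputs whose weights alone push the net input past the bias, even if all negatively weighted inputs are active. Function names and the pruning of superset rules are our own simplifications.

```python
from itertools import combinations

def subset_rules(weights, bias, max_size=3):
    """Naive subset-style extraction for one neuron with boolean inputs.

    weights: dict mapping input name -> weight.
    Returns antecedent sets that guarantee activation in the worst case.
    """
    pos = {n: w for n, w in weights.items() if w > 0}
    neg_total = sum(w for w in weights.values() if w < 0)   # all negative inputs on
    rules = []
    for k in range(1, max_size + 1):
        for subset in combinations(pos, k):
            if sum(pos[n] for n in subset) + neg_total > bias:
                # skip supersets of rules already found (they add nothing)
                if not any(set(r) <= set(subset) for r in rules):
                    rules.append(subset)
    return rules

# Example neuron that roughly encodes "A and B":
print(subset_rules({"A": 3.0, "B": 3.0, "C": -1.0}, bias=4.0))  # -> [('A', 'B')]
```

The combinatorial search explains the tradeoff noted above: exact, small rule sets only come out if the weights are constrained to be meaningful.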
11. The good old days: KBANN and co.
- Pedagogical approaches
  - extraction of conjunctive rules by extensive search [Saito/Nakano, 88]
  - interval propagation [Gallant, 93; Thrun, 95]
  - extraction by minimum separation [Tickle/Diederich, 94]
  - extraction of decision trees [Craven/Shavlik, 94]
  - evolutionary approaches [Markovska, 05]
- Observation
  - usually some variation of if-then rules is achieved
  - essentially symbolic rule induction, with a little (or a bit more) help from a neural network
12. The good old days: KBANN and co.
- What is this good for?
  - Nobody uses FNNs these days?
  - Insertion of prior knowledge might be valuable, but efficient training algorithms allow this to be substituted by additional training data (generated via rules).
  - Validation of the network output might be valuable, but there exist alternative (good) guarantees from statistical learning theory.
  - If-then rules are not very interesting, since there exist good symbolic learners for propositional classification rules.
  - Propositional rule insertion/extraction is often an essential part of more complex rule insertion/extraction mechanisms.
  - It demonstrates a key problem, the different modes of representation, in a very nice way.
  - Some people, e.g. in the medical domain, also want an explanation for a classification.
  - There are at least two application domains where if-then rules are very interesting and not so easy to learn: fuzzy control and unsupervised data mining.
13. Tutorial Outline (Part I): recap; next up: useful neurofuzzy systems and the data mining pipeline.
14. Useful: neurofuzzy systems
[Figure: control loop with a process, observation as input, control as output]
Fuzzy control: rules of the form "if (observation ∈ FM_I) then (control ∈ FM_O)" with fuzzy sets FM_I, FM_O.
15. Useful: neurofuzzy systems
Fuzzy control: if (observation ∈ FM_I) then (control ∈ FM_O)
Neurofuzzy control: realize the rule base as a neural network.
Benefit: the form of the fuzzy rules (i.e. the neural architecture) and the shape of the fuzzy sets (i.e. the neural weights) can be learned from data!
16. Useful: neurofuzzy systems
- NEFCON implements Mamdani control [Nauck/Klawonn/Kruse, 94]
- ANFIS implements Takagi-Sugeno control [Jang, 93] (a toy Takagi-Sugeno sketch follows this slide)
- and many others
- Learning
  - of rules: evolutionary methods or clustering
  - of fuzzy set parameters: reinforcement learning or some form of Hebbian learning
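For concreteness, here is a minimal sketch of first-order Takagi-Sugeno inference for a single input, assuming Gaussian membership functions; the rule parameters below are purely illustrative, and in ANFIS both the premise parameters (centers, widths) and the consequent parameters (a, b) would be learned from data rather than set by hand.

```python
import numpy as np

def gauss(x, c, sigma):
    """Gaussian fuzzy membership function."""
    return np.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

def takagi_sugeno(x, rules):
    """First-order Takagi-Sugeno inference for a scalar input x.

    rules: list of ((c, sigma), (a, b)): antecedent fuzzy set N(c, sigma),
    consequent linear model a*x + b.  Output = firing-strength-weighted average.
    """
    strengths = np.array([gauss(x, c, s) for (c, s), _ in rules])
    consequents = np.array([a * x + b for _, (a, b) in rules])
    return float(np.dot(strengths, consequents) / strengths.sum())

# Two illustrative rules: "if x is LOW then y = 0.2x + 1", "if x is HIGH then y = -0.5x + 4"
rules = [((0.0, 1.0), (0.2, 1.0)),
         ((5.0, 1.0), (-0.5, 4.0))]
print(takagi_sugeno(2.5, rules))   # blend of both rules, here 2.125
```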
17. Useful: data mining pipeline
- Task: describe given inputs (no class information) by if-then rules
- Data mining with emergent SOM, clustering, and rule extraction [Ultsch, 91]
18. Tutorial Outline (Part I): recap; next up: state of the art, structure kernels.
19. State of the art: structure kernels
[Figure: data → kernel k(x,x′) → kernel-based learner (e.g. SVM)]
Just compute pairwise kernel values for this complex data using the structure information: sets, sequences, tree structures, graph structures.
20. State of the art: structure kernels
- Closure properties of kernels [Haussler; Watkins]
- Principled problems for complex structures: computing informative graph kernels is at least as hard as graph isomorphism [Gärtner]
- Several promising proposals; a taxonomy [Gärtner], from syntax to semantics:
  - count common substructures
  - derived from a probabilistic model
  - derived from local transformations
21. State of the art: structure kernels
- Count common substructures, e.g. the 2-mers GA, AG, AT (see the sketch after this slide):

            GA  AG  AT
  GAGAGA     3   2   0
  GAT        1   0   1

  k(GAGAGA, GAT) = 3·1 + 2·0 + 0·1 = 3
- Efficient computation: dynamic programming, suffix trees
- Examples:
  - locality improved kernel [Sonnenburg et al.], bag of words [Joachims], string kernel [Lodhi et al.], spectrum kernel [Leslie et al.], word-sequence kernel [Cancedda et al.]
  - convolution kernels for language [Collins/Duffy; Kashima/Koyanagi; Suzuki et al.], kernels for relational learning [Zelenko et al.; Cumby/Roth; Gärtner et al.]
  - graph kernels based on paths or subtrees [Gärtner et al.; Kashima et al.], kernels for Prolog trees based on similar symbols [Passerini/Frasconi/De Raedt]
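To make the counting concrete, here is a minimal Python sketch of a spectrum (k-mer count) kernel. It reproduces the 2-mer counts and the kernel value from the table above; the function names are ours, and this is an illustration rather than any of the cited implementations (which use suffix trees or dynamic programming for efficiency).

```python
from collections import Counter

def kmer_counts(s, k=2):
    """Count all length-k substrings (k-mers) of s."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))

def spectrum_kernel(s, t, k=2):
    """Spectrum kernel: inner product of the k-mer count vectors of s and t."""
    cs, ct = kmer_counts(s, k), kmer_counts(t, k)
    return sum(cs[m] * ct[m] for m in cs if m in ct)

print(kmer_counts("GAGAGA"))             # GA: 3, AG: 2
print(kmer_counts("GAT"))                # GA: 1, AT: 1
print(spectrum_kernel("GAGAGA", "GAT"))  # 3 * 1 = 3
```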
22. State of the art: structure kernels
- Derived from a probabilistic model
  - describe the data by a probabilistic model P(x)
  - compare characteristics of P(x)
- Examples: Fisher kernel [Jaakkola et al.; Karchin et al.; Pavlidis et al.; Smith/Gales; Sonnenburg et al.; Siolas et al.] (sketch after this slide), tangent vector of log odds [Tsuda et al.], marginalized kernels [Tsuda et al.; Kashima et al.], kernels of Gaussian models [Moreno et al.; Kondor/Jebara]
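A minimal sketch of the Fisher kernel idea, assuming a toy unigram (symbol-frequency) model as the generative model and the identity matrix in place of the inverse Fisher information; real applications use HMMs or other sequence models, and all names and parameters below are illustrative.

```python
import numpy as np

def fisher_score(seq, theta, alphabet):
    """Gradient of log P(seq | theta) for a unigram model P(seq) = prod theta[c]."""
    counts = np.array([seq.count(c) for c in alphabet], dtype=float)
    return counts / theta            # d/d theta_c of sum_c n_c(seq) * log theta_c

def fisher_kernel(s, t, theta, alphabet):
    """Fisher kernel: inner product of the Fisher scores of s and t."""
    return float(np.dot(fisher_score(s, theta, alphabet),
                        fisher_score(t, theta, alphabet)))

alphabet = ["A", "G", "T", "C"]
theta = np.array([0.4, 0.4, 0.1, 0.1])   # illustrative model parameters
print(fisher_kernel("GAGAGA", "GAT", theta, alphabet))
```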
23. State of the art: structure kernels
- Derived from local transformations
  - define which elements are locally "similar to" each other: a local neighborhood, generator H
  - expand the local similarity to a global kernel
- Example: diffusion kernel [Kondor/Lafferty; Lafferty/Lebanon; Vert/Kanehisa] (sketch after this slide)
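A short sketch of the diffusion-kernel construction on a graph: take the negative graph Laplacian as the local generator H and expand it into a global kernel via the matrix exponential, K = exp(βH). The helper name and the toy graph are ours; scipy's matrix exponential does the expansion.

```python
import numpy as np
from scipy.linalg import expm

def diffusion_kernel(adjacency, beta=1.0):
    """Diffusion kernel K = exp(beta * H) with H = A - D (negative graph Laplacian)."""
    A = np.asarray(adjacency, dtype=float)
    H = A - np.diag(A.sum(axis=1))
    return expm(beta * H)

# Path graph 1 - 2 - 3: nearby nodes end up with larger kernel values
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
print(diffusion_kernel(A, beta=0.5))
```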
24. State of the art: structure kernels
- Intelligent preprocessing (kernel extraction) allows an adequate integration of semantic/syntactic structure information.
- This can be combined with state-of-the-art neural methods such as the SVM.
- Very promising results for
  - classification of documents and text [Duffy; Leslie; Lodhi; ...]
  - detecting remote homologies for genomic sequences and further problems in genome analysis [Haussler; Sonnenburg; Vert; ...]
  - quantitative structure-activity relationships in chemistry [Baldi et al.]
25. Conclusions: feedforward networks
- Propositional rule insertion and extraction are possible (to some extent).
- Useful for neurofuzzy systems and data mining.
- Structure-based kernel extraction followed by learning with an SVM yields state-of-the-art results.
- But: this is a sequential pipeline rather than a fully integrated neuro-symbolic approach.
- FNNs themselves are restricted to flat data which can be processed in one shot; no recurrence.
26. Tutorial Outline (Part I): recap; next up: Recurrent networks, the basics (partially recurrent networks).
27. The basics: partially recurrent networks
[Elman, Finding structure in time, Cognitive Science, 1990]
A very natural architecture for processing speech, temporal signals, control, robotics: the hidden state is fed back as context, x_{t+1} = f(x_t, I_t).
- can process time series of arbitrary length
- interesting for speech processing, see e.g. [Kremer, 02]
- training uses a variation of backpropagation, see e.g. [Pearlmutter, 95]
(A minimal forward-pass sketch follows this slide.)
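A minimal numpy sketch of the Elman forward dynamics x_t = f(W_in I_t + W_rec x_{t-1} + b); all weight names and dimensions are illustrative, the weights are random rather than trained, and training would use a backpropagation variant as noted above.

```python
import numpy as np

def elman_forward(inputs, W_in, W_rec, W_out, b_h, b_o):
    """Forward pass of a simple Elman network with tanh units."""
    x = np.zeros(W_rec.shape[0])                    # context units start at zero
    outputs = []
    for I_t in inputs:
        x = np.tanh(W_in @ I_t + W_rec @ x + b_h)   # new hidden state = new context
        outputs.append(np.tanh(W_out @ x + b_o))    # output read from the state
    return outputs

# Toy dimensions: 2-dim input, 3 hidden/context units, 1 output
rng = np.random.default_rng(0)
W_in, W_rec = rng.normal(size=(3, 2)), rng.normal(size=(3, 3))
W_out, b_h, b_o = rng.normal(size=(1, 3)), np.zeros(3), np.zeros(1)
print(elman_forward([np.array([1.0, 0.0]), np.array([0.0, 1.0])],
                    W_in, W_rec, W_out, b_h, b_o))
```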
28. Tutorial Outline (Part I): recap; next up: lots of theory, principled capacity and limitations.
29. Lots of theory: principled capacity and limitations
- RNNs and finite automata [Omlin/Giles, 96]
[Figure: a DFA with input, output, and state; the RNN realizes the dynamics of the transition function of the DFA]
30. Lots of theory: principled capacity and limitations
- Unary (one-hot) input and unary state representation.
- Implement (approximately) the boolean formula corresponding to the state transition within a two-layer network.
- ⇒ RNNs can exactly simulate finite automata. (A toy construction in code follows this slide.)
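A toy version of this construction, written with explicit loops rather than weight matrices: one-hot state and input vectors, one hidden threshold unit per (state, symbol) pair (weights 1 on both, threshold 1.5, so it fires only for the conjunction), and next-state units that sum the hidden units mapping to them (threshold 0.5). The DFA, names, and thresholds are illustrative; the point is only that the two-layer threshold network reproduces the transition function exactly.

```python
import numpy as np

def dfa_rnn_step(state, symbol, delta, n_states, n_symbols):
    """One step of a threshold RNN that simulates a DFA exactly.

    state, symbol: one-hot ("unary") vectors.  delta: dict (q, a) -> q'.
    """
    step = lambda z: (z > 0).astype(float)              # Heaviside threshold units
    # hidden unit (q, a): active iff state q and symbol a are both on
    hidden = np.zeros(n_states * n_symbols)
    for q in range(n_states):
        for a in range(n_symbols):
            hidden[q * n_symbols + a] = step(np.array([state[q] + symbol[a] - 1.5]))[0]
    # next-state unit q': sums the hidden units that map to it
    nxt = np.zeros(n_states)
    for (q, a), q2 in delta.items():
        nxt[q2] += hidden[q * n_symbols + a]
    return step(nxt - 0.5)

# DFA over {0,1} with 2 states, tracking the parity of the number of 1s
delta = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
state = np.array([1.0, 0.0])                            # start in state 0
for bit in [1, 1, 1]:
    state = dfa_rnn_step(state, np.eye(2)[bit], delta, 2, 2)
print(state)                                            # odd number of 1s -> state 1
```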
31. Lots of theory: principled capacity and limitations
- Conversely: unary input, but in general a distributed state representation.
- Cluster the network states into disjoint subsets corresponding to automaton states and observe their behavior ⇒ an approximate description.
- ⇒ Approximate extraction of automaton rules is possible.
32. Lots of theory: principled capacity and limitations
- The principled capacity of RNNs can be characterized exactly:
  - RNNs with arbitrary weights: non-uniform Boolean circuits (super-Turing capability) [Siegelmann/Sontag]
  - RNNs with rational weights: Turing machines [Siegelmann/Sontag]
  - RNNs with limited noise: finite state automata [Omlin/Giles; Maass/Orponen]
  - RNNs with small weights or Gaussian noise: finite memory models [Hammer/Tino; Maass/Sontag]
33. Lots of theory: principled capacity and limitations
- However, learning might be difficult:
  - gradient-based learning schemes face the problem of long-term dependencies [Bengio/Frasconi]
  - RNNs are not PAC-learnable (infinite VC dimension); only distribution-dependent bounds can be derived [Hammer]
  - there exist only few general guarantees for the long-term behavior of RNNs, e.g. stability [Suykens; Steil; ...]
[Figure: error for increasingly long sequences ta, tatata, tatatatatata, ..., illustrating the long-term dependency problem]
34. Lots of theory: principled capacity and limitations
- RNNs
  - naturally process time series
  - incorporate plausible regularization, such as a bias towards finite memory models
  - have sufficient power for interesting dynamics (context-free, context-sensitive languages, arbitrary attractors and chaotic behavior)
- but
  - training is difficult
  - there are only limited guarantees for the long-term behavior and the generalization ability
- ⇒ symbolic descriptions/knowledge can provide solutions
35. Lots of theory: principled capacity and limitations
Recurrent symbolic system vs. RNN: what is the correspondence? E.g. attractor/repeller dynamics for counting ⇒ a^n b^n c^n.
- RNN: real numbers, iterated function systems; these give rise to fractals/attractors/chaos, implicit memory
- recurrent symbolic system: discrete states, crisp boolean functions on the states, explicit memory
36. Tutorial Outline (Part I): recap; next up: to do, challenges.
37. To do: challenges
- Training RNNs
  - search for appropriate regularizations inspired by a focus on specific functionalities: architecture (e.g. local), weights (e.g. bounded), activation function (e.g. linear), cost term (e.g. additional penalties) [Hochreiter; Boden; Steil; Kremer]
  - insertion of prior knowledge: finite automata and beyond (e.g. context-free/context-sensitive languages, specific dynamical patterns/attractors) [Omlin; Croog; ...]
- Long-term behavior
  - enforce appropriate constraints while training
  - investigate the dynamics of RNNs: rule extraction, investigation of attractors, relating dynamics and symbolic processing [Omlin; Pasemann; Haschke; Rodriguez; Tino; ...]
38. To do: challenges
- Some further issues
  - processing spatial data [figure: a sequence x1 ... x10]: bicausal networks [Pollastri et al.], contextual RCC [Micheli et al.]
  - unsupervised processing: TKM [Chappell/Taylor], RecSOM [Voegtlin], SOMSD [Sperduti et al.], MSOM [Hammer et al.], a general formulation [Hammer et al.]
39. Conclusions: recurrent networks
- The capacity of RNNs is well understood and promising, e.g. for natural language processing, control, ...
- The recurrence of symbolic systems has a natural counterpart in the recurrence of RNNs.
- Training and generalization face problems which could be solved by hybrid systems.
- Discrete dynamics with explicit memory versus real-valued iterated function systems.
- Sequences are nice, but not enough.
40. Tutorial Outline (Part I): recap; next up: Recursive data structures, the general idea (recursive distributed representations).
41. The general idea: recursive distributed representations
- How to turn tree structures/acyclic graphs into a connectionist representation?
42. The general idea: recursive distributed representations
Recursion! A network f: R^i × R^c × R^c → R^c (input label plus two context codes in, one code out) yields the encoding
  f_enc, where f_enc(empty tree) = 0 and f_enc(a(l,r)) = f(a, f_enc(l), f_enc(r)).
43. The general idea: recursive distributed representations
Encoding f_enc: (R^n)^{2*} → R^c (binary trees with labels in R^n to codes):
  f_enc(empty tree) = 0,  f_enc(a(l,r)) = f(a, f_enc(l), f_enc(r)),  with f: R^{n+2c} → R^c.
Decoding h_dec: R^o → (R^n)^{2*}:
  h_dec(0) = empty tree,  h_dec(x) = h_0(x)( h_dec(h_1(x)), h_dec(h_2(x)) ),  with h: R^o → R^{n+2o} split into a label part h_0 and context parts h_1, h_2.
In between, a mapping g: R^c → R^o.
(A small encoding sketch in code follows this slide.)
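A minimal sketch of the recursive encoding f_enc for binary trees with real-vector labels, using a single tanh layer as f; the weights here are random and untrained, the dimensions are illustrative, and in the approaches below (RAAM and, later, recursive networks) f would instead be trained, e.g. for the identity or for a target task.

```python
import numpy as np

rng = np.random.default_rng(1)
N, C = 2, 4                                   # label dimension n, code dimension c
W = rng.normal(scale=0.5, size=(C, N + 2 * C))
b = np.zeros(C)

def f(label, code_left, code_right):
    """f: R^n x R^c x R^c -> R^c, one layer of the (untrained) encoding network."""
    return np.tanh(W @ np.concatenate([label, code_left, code_right]) + b)

def f_enc(tree):
    """f_enc(empty) = 0;  f_enc(a(l, r)) = f(a, f_enc(l), f_enc(r))."""
    if tree is None:                          # empty tree
        return np.zeros(C)
    label, left, right = tree
    return f(label, f_enc(left), f_enc(right))

# Encode the tree a(b, c) with 2-dim labels into a fixed-size code vector
tree = (np.array([1.0, 0.0]),
        (np.array([0.0, 1.0]), None, None),
        (np.array([1.0, 1.0]), None, None))
print(f_enc(tree))
```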
44. The general idea: recursive distributed representations
- Recursive distributed description [Hinton, 90]
  - general idea without a concrete implementation
- Tensor construction [Smolensky, 90]
  - encoding/decoding given by (a,b,c) → a⊗b⊗c
  - increasing dimensionality
- Holographic reduced representations [Plate, 95]
  - circular correlation/convolution (see the sketch after this slide)
  - fixed encoding/decoding with fixed dimensionality (but potential loss of information)
  - necessity of chunking or clean-up for decoding
- Binary spatter codes [Kanerva, 96]
  - binary operations, fixed dimensionality, potential loss of information
  - necessity of chunking or clean-up for decoding
- RAAM [Pollack, 90], LRAAM [Sperduti, 94]
  - trainable networks, trained for the identity, fixed dimensionality
  - encoding optimized for the given training set
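A small sketch of holographic reduced representations: binding by circular convolution and approximate unbinding by circular correlation with the involuted cue. The dimensionality, random vectors, and function names are illustrative; the noisy result of unbinding is exactly why a clean-up memory (as noted above) is needed in practice.

```python
import numpy as np

def bind(a, b):
    """HRR binding: circular convolution, computed via the FFT."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def unbind(c, a):
    """Approximate decoding: circular correlation = convolution with the involution of a."""
    a_inv = np.concatenate(([a[0]], a[:0:-1]))   # involution: a*(i) = a(-i mod n)
    return bind(c, a_inv)

rng = np.random.default_rng(0)
n = 512
a = rng.normal(scale=1 / np.sqrt(n), size=n)
b = rng.normal(scale=1 / np.sqrt(n), size=n)
c = bind(a, b)
# unbinding recovers a noisy version of b; similarity to b should be close to 1
print(np.dot(unbind(c, a), b))
```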
45. The general idea: recursive distributed representations
- Nevertheless, the results were not promising.
- Theorem [Hammer]
  - There exists a fixed-size neural network which can uniquely encode tree structures of arbitrary depth with discrete labels.
  - For every code, decoding all trees up to height T requires O(2^T) neurons for sigmoidal networks.
- ⇒ Encoding seems possible, but no fixed-size architecture exists for decoding.
46. Tutorial Outline (Part I): recap; next up: one breakthrough, recursive networks.
47. One breakthrough: recursive networks
- Recursive networks [Goller/Küchler, 96]
  - do not use decoding
  - combine encoding and mapping (encoding → transformation → output y)
  - train this combination directly for the given task with backpropagation through structure
- ⇒ An efficient, data- and problem-adapted encoding is learned.
48. One breakthrough: recursive networks
- Applications
  - term classification [Goller, Küchler, 1996]
  - automated theorem proving [Goller, 1997]
  - learning tree automata [Küchler, 1998]
  - QSAR/QSPR problems [Schmitt, Goller, 1998; Bianucci, Micheli, Sperduti, Starita, 2000; Vullo, Frasconi, 2003]
  - logo recognition, image processing [Costa, Frasconi, Soda, 1999; Bianchini et al., 2005]
  - natural language parsing [Costa, Frasconi, Sturt, Lombardo, Soda, 2000, 2005]
  - document classification [Diligenti, Frasconi, Gori, 2001]
  - fingerprint classification [Yao, Marcialis, Roli, Frasconi, Pontil, 2001]
  - prediction of contact maps [Baldi, Frasconi, Pollastri, Vullo, 2002]
  - protein secondary structure prediction [Frasconi et al., 2005]
49. One breakthrough: recursive networks
Desired: approximation completeness, i.e. for every (reasonable) function f and every ε > 0 there exists a RecNN which approximates f up to ε (with an appropriate distance measure).
- Approximation properties can be measured in several ways: given f, ε, a probability P, and data points x_i, find f_w such that
  - P(|f(x) − f_w(x)| > ε) is small (L1 norm), or
  - |f(x) − f_w(x)| < ε for all x (max norm), or
  - f(x_i) = f_w(x_i) for all x_i (interpolation of points).
50. One breakthrough: recursive networks
- Approximation properties for RecNNs and tree-structured data
  - RecNNs can approximate every continuous function in the max-norm on inputs of restricted height, and every measurable function in the L1-norm (σ a squashing function) [Hammer]
  - they can interpolate every set f(x_1), ..., f(x_m) with O(m^2) neurons (σ squashing and C^2 in a neighborhood of a point t with σ′(t) ≠ 0) [Hammer]
  - they can approximate every tree automaton for arbitrarily large inputs [Küchler]
  - ... but they cannot approximate every f: {1,2}* → {0,1} (for realistic σ) [Hammer]
- Fairly good results: 3 to 1.
51. Tutorial Outline (Part I): recap; next up: going on towards more complex structures.
52. Going on: towards more complex structures
- More general trees: an arbitrary number of non-positioned children, aggregated e.g. by a weighted mean over the children with weights attached to the edge labels, roughly
  f_enc(v) = f( label(v), 1/|ch(v)| · Σ_{c ∈ ch(v)} w_{edgelabel(v,c)} · f_enc(c) )
- approximation complete for appropriate edge labels [Bianchini et al., 2005]
53. Going on: towards more complex structures
[Figure; Baldi, Frasconi, et al., 2002]
54. Going on: towards more complex structures
[Figure: contextual processing with hidden states q1, ...]
Contextual cascade correlation [Micheli/Sperduti, 03]: approximation complete (under a mild structural restriction), even for structural transduction [Hammer/Micheli/Sperduti, 05].
55. Going on: towards more complex structures
[Figure: incorporating the neighborhood of a vertex; Micheli, 05]
56. Conclusions: recursive networks
- Very promising neural architectures for the direct processing of tree structures.
- Successful applications and a solid mathematical background.
- Connections to symbolic mechanisms (tree automata).
- Extensions to more complex structures (graphs) are under development.
- Only few approaches achieve structured outputs so far.
57. Tutorial Outline (Part I): recap; next up: Conclusions (Part I).
58. Conclusions (Part I)
- Overview literature
  - FNNs and rules: Duch, Setiono, Zurada, Computational intelligence methods for understanding of data, Proceedings of the IEEE 92(5):771-805, 2004
  - Structure kernels: Gärtner, Lloyd, Flach, Kernels and distances for structured data, Machine Learning 57, 2004 (a new overview is forthcoming)
  - RNNs: Hammer, Steil, Perspectives on learning with recurrent networks, in Verleysen (ed.), ESANN'2002, D-side publications, 357-368, 2002
  - RNNs and rules: Jacobsson, Rule extraction from recurrent neural networks: a taxonomy and review, Neural Computation 17:1223-1263, 2005
  - Recursive representations: Hammer, Perspectives on Learning Symbolic Data with Connectionistic Systems, in Kühn, Menzel, Menzel, Ratsch, Richter, Stamatescu (eds.), Adaptivity and Learning, 141-160, Springer, 2003
  - Recursive networks: Frasconi, Gori, Sperduti, A General Framework for Adaptive Processing of Data Structures, IEEE Transactions on Neural Networks 9(5):768-786, 1998
  - Neural networks and structures: Hammer, Jain, Neural methods for non-standard data, in Verleysen (ed.), ESANN'2004, D-side publications, 281-292, 2004
59. Conclusions (Part I)
- There exist networks which can directly and successfully deal with structures (sequences, trees, graphs): kernel machines, recurrent and recursive networks.
- Efficient training algorithms and theoretical foundations exist.
- (Loose) connections to symbolic processing have been established and indicate benefits.
- Now: towards strong connections.
- ⇒ PART II: Logic and neural networks