Connectionist Knowledge Representation and Reasoning (Part I)


Transcript and Presenter's Notes

Title: Connectionist Knowledge Representation and Reasoning (Part I)


1
Connectionist Knowledge Representation and
Reasoning (Part I)
SCREECH
  • Neural Networks and Structured Knowledge

Fodor, Pylyshyn: "What's deeply wrong with
Connectionist architecture is this: Because it
acknowledges neither syntactic nor semantic
structure in mental representations, it perforce
treats them not as a generated set but as a
list." (Connectionism and Cognitive Architecture, 1988)

Our claim: state-of-the-art connectionist
architectures do adequately deal with structures!
2
Tutorial Outline (Part I): Neural networks and
structured knowledge
  • Feedforward networks
  • The good old days: KBANN and co.
  • Useful: neurofuzzy systems, data mining pipeline
  • State of the art: structure kernels
  • Recurrent networks
  • The basics: partially recurrent networks
  • Lots of theory: principled capacity and
    limitations
  • To do: challenges
  • Recursive data structures
  • The general idea: recursive distributed
    representations
  • One breakthrough: recursive networks
  • Going on: towards more complex structures

3
Tutorial Outline (Part I): Neural networks and structured knowledge

4
The good old days: KBANN and co.
feedforward neural network
  1. black box
  2. distributed representation
  3. connection to rules for symbolic I/O?

(Figure: feedforward network of neurons mapping input x to output y,
computing f_w: R^n → R^o)
5
The good old days: KBANN and co.
  • Knowledge-Based Artificial Neural Networks
    [Towell/Shavlik, AI 94]
  • start with a network which represents known rules
    (see the sketch below)
  • train using additional data
  • extract a set of symbolic rules after training
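A minimal sketch of the rule-to-network step, assuming simple propositional
if-then rules; the weight value OMEGA, the bias scheme, and the toy rules are
illustrative, not Towell/Shavlik's exact encoding:

import numpy as np

# Hypothetical rule set: each rule is (head, positive antecedents, negated antecedents).
features = ["b", "c", "d"]
rules = [("a",  ["b", "c"], []),     # a  :- b, c
         ("a2", ["b"], ["d"])]       # a2 :- b, not d

OMEGA = 4.0  # large weight so the sigmoid behaves almost like a step function

def rules_to_layer(rules, features, omega=OMEGA):
    """Initialise one hidden layer so each unit approximates one rule (AND of literals)."""
    W = np.zeros((len(rules), len(features)))
    b = np.zeros(len(rules))
    for i, (_, pos, neg) in enumerate(rules):
        for p in pos:
            W[i, features.index(p)] = omega
        for n in neg:
            W[i, features.index(n)] = -omega
        # active only if all positive antecedents are on and all negated ones are off
        b[i] = -omega * (len(pos) - 0.5)
    return W, b

W, b = rules_to_layer(rules, features)
x = np.array([1.0, 1.0, 0.0])                 # b=1, c=1, d=0
hidden = 1.0 / (1.0 + np.exp(-(W @ x + b)))   # ≈1 where the corresponding rule fires
print(hidden)   # W, b then serve as the initial network for backpropagation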

6
The good old days: KBANN and co.
7
The good old days: KBANN and co.
train on data: use some form of backpropagation, and add a penalty
to the error, e.g. for changing the weights (see the sketch below)
  1. The initial network biases the training result,
    but
  2. There is no guarantee that the initial rules are
    preserved
  3. There is no guarantee that the hidden neurons
    maintain their semantics
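A minimal sketch of the penalty idea for a single sigmoid unit, assuming a
squared-error cost plus lam * ||w - w_init||^2; the model, data, and constants
are illustrative:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_with_rule_penalty(w_init, X, Y, lam=0.01, lr=0.1, epochs=500):
    """Gradient descent on E(w) = sum_i (sigmoid(w.x_i) - y_i)^2 + lam*||w - w_init||^2.
    The penalty biases training toward the rule-derived initial weights, but,
    as noted above, it does not guarantee that the initial rules are preserved."""
    w = w_init.copy()
    for _ in range(epochs):
        p = sigmoid(X @ w)
        g_data = X.T @ (2.0 * (p - Y) * p * (1.0 - p))   # backpropagation for one unit
        g_pen = 2.0 * lam * (w - w_init)                 # penalty for changing the weights
        w -= lr * (g_data + g_pen)
    return w

# toy usage: weights initialised from a rule "output :- x1, x2"
w_init = np.array([4.0, 4.0, 0.0])
X = np.array([[1, 1, 0], [1, 0, 1], [0, 1, 1], [0, 0, 0]], dtype=float)
Y = np.array([1.0, 0.0, 0.0, 0.0])
w = train_with_rule_penalty(w_init, X, Y)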

8
The good old days: KBANN and co.
(complete) rules
  1. There is no exact direct correspondence between a
    neuron and a single rule, although each neuron
    (and the overall mapping) can be approximated by
    a set of rules arbitrarily well
  2. It is NP-complete to find a minimum logical
    description for a trained network [Golea,
    AISB'96]
  3. Therefore, a couple of different rule extraction
    algorithms have been proposed, and this is still
    a topic of ongoing research

9
The good old days: KBANN and co.
(complete) rules: two routes from a trained network to rules,
the decompositional approach and the pedagogical approach
10
The good old days: KBANN and co.
  • Decompositional approaches
  • subset algorithm, MofN algorithm: describe single
    neurons by sets of active predecessors
    [Craven/Shavlik, 94]
  • local activation functions (RBF-like) allow an
    approximate direct description of single neurons
    [Andrews/Geva, 96]
  • MLP2LN biases the weights towards 0/-1/1 during
    training and can then extract exact rules [Duch
    et al., 01]
  • prototype-based networks can be decomposed along
    relevant input dimensions by decision tree nodes
    [Hammer et al., 02]
  • Observation
  • usually some variation of if-then rules is
    achieved
  • small rule sets are only achieved if further
    constraints guarantee that single weights/neurons
    have a meaning
  • tradeoff between accuracy and size of the
    description

11
The good old days: KBANN and co.
  • Pedagogical approaches
  • extraction of conjunctive rules by extensive
    search [Saito/Nakano 88]
  • interval propagation [Gallant 93, Thrun 95]
  • extraction by minimum separation
    [Tickle/Diederich, 94]
  • extraction of decision trees [Craven/Shavlik, 94]
    (see the sketch below)
  • evolutionary approaches [Markovska, 05]
  • Observation
  • usually some variation of if-then rules is
    achieved
  • symbolic rule induction, with a little
    (or a bit more) help from a neural network
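A minimal sketch of the pedagogical idea (network as oracle, symbolic learner on
top), in the spirit of decision-tree extraction; the toy network, the sampling
scheme, and the use of scikit-learn's DecisionTreeClassifier are all illustrative:

import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

class ToyNet:
    """Stand-in for any trained feedforward network with a predict(X) method."""
    def predict(self, X):
        return ((X[:, 0] > 0.5) & (X[:, 2] < 0.5)).astype(int)

net = ToyNet()
X_query = np.random.rand(2000, 3)     # sample the input space and query the network
y_net = net.predict(X_query)          # labels come from the network, not from the data

tree = DecisionTreeClassifier(max_depth=3).fit(X_query, y_net)   # small tree = readable rules
print(export_text(tree, feature_names=["x1", "x2", "x3"]))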

12
The good old days: KBANN and co.
  • What is this good for?
  • Nobody uses FNNs these days?
  • Insertion of prior knowledge might be valuable,
    but efficient training algorithms allow one to
    substitute this by additional training data
    (generated via rules)
  • Validation of the network output might be
    valuable, but there exist alternative (good)
    guarantees from statistical learning theory
  • If-then rules are not very interesting, since
    there exist good symbolic learners for learning
    propositional rules for classification
  • Propositional rule insertion/extraction is often
    an essential part of more complex rule
    insertion/extraction mechanisms
  • Demonstrates a key problem, different modes of
    representation, in a very nice way
  • Some people, e.g. in the medical domain, also want
    an explanation for a classification
  • There are at least two application domains where
    if-then rules are very interesting and not so
    easy to learn: fuzzy control and unsupervised
    data mining

13
Tutorial Outline (Part I): Neural networks and structured knowledge

14
Useful neurofuzzy systems
(Figure: control loop with a process, its observation, and the control input)
Fuzzy control:
if (observation ∈ FM_I) then (control ∈ FM_O)
15
Useful neurofuzzy systems
Fuzzy control:
if (observation ∈ FM_I) then (control ∈ FM_O)
Neurofuzzy control:
Benefit: the form of the fuzzy rules (i.e. the neural
architecture) and the shape of the fuzzy sets
(i.e. the neural weights) can be learned from data!
16
Useful neurofuzzy systems
  • NEFCON implements Mamdani control
    [Nauck/Klawonn/Kruse, 94]
  • ANFIS implements Takagi-Sugeno control [Jang, 93]
  • and many others
  • Learning
  • of rules: evolutionary or clustering
  • of fuzzy set parameters: reinforcement learning
    or some form of Hebbian learning (see the sketch below)
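A minimal sketch of why fuzzy rules map onto a trainable architecture: the
membership-function parameters play the role of neural weights. The Gaussian
memberships, the zero-order Takagi-Sugeno combination, and all constants are
illustrative choices, not the exact NEFCON/ANFIS formulation:

import numpy as np

def gauss(x, c, s):
    """Gaussian fuzzy membership; centre c and width s are the 'weights' to be learned."""
    return np.exp(-0.5 * ((x - c) / s) ** 2)

# two rules on one observation, zero-order Takagi-Sugeno style:
#   if (observation is LOW)  then control = u_low
#   if (observation is HIGH) then control = u_high
params = {"c": np.array([0.2, 0.8]), "s": np.array([0.15, 0.15]),
          "u": np.array([1.0, -1.0])}          # illustrative values

def fuzzy_control(obs, p):
    mu = gauss(obs, p["c"], p["s"])            # rule activations (memberships)
    return float(mu @ p["u"] / mu.sum())       # weighted-average defuzzification

print(fuzzy_control(0.3, params))
# everything is differentiable, so c, s, u can be fitted from data
# (gradient descent, or the reinforcement/Hebbian schemes named on this slide)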

17
Useful data mining pipeline
  • Task: describe given inputs (no class
    information) by if-then rules
  • Data mining with emergent SOM, clustering, and
    rule extraction [Ultsch, 91]

18
Tutorial Outline (Part I): Neural networks and structured knowledge

19
State of the art: structure kernels
kernel k(x, x')
data: sets, sequences, tree structures, graph structures
just compute pairwise distances/similarities for this complex
data using structure information
20
State of the art: structure kernels
  • Closure properties of kernels [Haussler, Watkins]
  • Principled problems for complex structures:
    computing informative graph kernels is at least
    as hard as graph isomorphism [Gärtner]
  • Several promising proposals - taxonomy [Gärtner],
    spanning the range from semantic to syntactic:
    kernels derived from local transformations,
    kernels counting common substructures, and
    kernels derived from a probabilistic model
21
State of the art: structure kernels
  • Count common substructures

Example: count the substrings GA, AG, AT in two sequences.
GAGAGA contains them 3, 2, and 0 times; GAT contains them 1, 0, and 1 times.
The kernel value is k(GAGAGA, GAT) = 3·1 + 2·0 + 0·1 = 3.
Efficient computation: dynamic programming, suffix trees.
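A minimal sketch of this counting idea in the style of a spectrum kernel over
k-mers; the naive dictionary count below stands in for the efficient suffix-tree
or dynamic-programming implementations cited on this slide:

from collections import Counter

def kmer_counts(s, k=2):
    """Count all substrings of length k (naive; suffix trees make this linear)."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))

def spectrum_kernel(s, t, k=2):
    """k(s,t) = sum over shared k-mers of count_s * count_t."""
    cs, ct = kmer_counts(s, k), kmer_counts(t, k)
    return sum(cs[m] * ct[m] for m in cs if m in ct)

# the example above: GA appears 3x and 1x, AG 2x and 0x, AT 0x and 1x
print(spectrum_kernel("GAGAGA", "GAT"))   # 3*1 + 2*0 + 0*1 = 3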
locality improved kernel [Sonnenburg et al.], bag
of words [Joachims], string kernel [Lodhi et al.],
spectrum kernel [Leslie et al.], word-sequence
kernel [Cancedda et al.];
convolution kernels for language [Collins/Duffy,
Kashima/Koyanagi, Suzuki et al.]; kernels for
relational learning [Zelenko et al., Cumby/Roth,
Gärtner et al.];
graph kernels based on paths or subtrees [Gärtner
et al., Kashima et al.]; kernels for prolog trees
based on similar symbols [Passerini/Frasconi/deRaedt]
22
State of the art: structure kernels
  • Derived from a probabilistic model

describe the data by a probabilistic model P(x),
then compare characteristics of P(x) (see the formula below)
Fisher kernel [Jaakkola et al., Karchin et al.,
Pavlidis et al., Smith/Gales, Sonnenburg et al.,
Siolas et al.]; tangent vector of log odds [Tsuda
et al.]; marginalized kernels [Tsuda et al.,
Kashima et al.];
kernel of Gaussian models [Moreno et al.,
Kondor/Jebara]
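For reference, the Fisher kernel in its usual formulation (notation added here,
consistent with Jaakkola et al. but not taken from the slide):

U_x = \nabla_\theta \log P(x \mid \theta), \qquad
k(x, x') = U_x^{\top} \, \mathcal{I}^{-1} \, U_{x'}, \qquad
\mathcal{I} = \mathbb{E}_x\!\left[ U_x U_x^{\top} \right]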
23
State of the art: structure kernels
  • Derived from local transformations

start from a local neighborhood relation ("is similar to"),
encoded by a generator H, and expand it to a global kernel:
diffusion kernel [Kondor/Lafferty,
Lafferty/Lebanon, Vert/Kanehisa]
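For reference, the usual exponential form of the diffusion kernel (notation added
here; H is the local generator, e.g. built from the graph Laplacian):

K = e^{\beta H} = \sum_{k=0}^{\infty} \frac{\beta^k}{k!} H^k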
24
State of the art: structure kernels
  • Intelligent preprocessing (kernel extraction)
    allows an adequate integration of
    semantic/syntactic structure information
  • This can be combined with state-of-the-art neural
    methods such as the SVM
  • Very promising results for
  • Classification of documents, text [Duffy, Leslie,
    Lodhi, ...]
  • Detecting remote homologies for genomic sequences
    and further problems in genome analysis
    [Haussler, Sonnenburg, Vert, ...]
  • Quantitative structure-activity relationships in
    chemistry [Baldi et al.]

25
Conclusions: feedforward networks
  • propositional rule insertion and extraction (to
    some extent) are possible
  • useful for neurofuzzy systems, data mining
  • structure-based kernel extraction followed by
    learning with SVM yields state-of-the-art results
  • but: sequential instead of fully integrated
    neuro-symbolic approach
  • FNNs themselves are restricted to flat data which
    can be processed in one shot; no recurrence

26
Tutorial Outline (Part I): Neural networks and structured knowledge

27
The basics: partially recurrent networks
[Elman, Finding structure in time, CogSci 90]
a very natural architecture for processing
speech/temporal signals/control/robotics
  1. can process time series of arbitrary length
  2. interesting for speech processing, see e.g.
    [Kremer, 02]
  3. training using a variation of backpropagation,
    see e.g. [Pearlmutter, 95]

state update: x_{t+1} = f(x_t, I_t)
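A minimal sketch of this state update in an Elman-style cell; dimensions, the
random initialisation, and the tanh nonlinearity are illustrative:

import numpy as np

class ElmanCell:
    """The context (state) x_t is fed back into the hidden layer at the next step."""
    def __init__(self, n_in, n_state, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W_i = rng.normal(scale=0.3, size=(n_state, n_in))     # input -> state
        self.W_c = rng.normal(scale=0.3, size=(n_state, n_state))  # context -> state
        self.W_o = rng.normal(scale=0.3, size=(n_out, n_state))    # state -> output

    def step(self, x, inp):
        x_new = np.tanh(self.W_i @ inp + self.W_c @ x)   # x_{t+1} = f(x_t, I_t)
        return x_new, self.W_o @ x_new                   # new state and output

cell = ElmanCell(n_in=3, n_state=5, n_out=1)
x = np.zeros(5)
for inp in np.random.default_rng(1).normal(size=(4, 3)):  # a time series of arbitrary length
    x, y = cell.step(x, inp)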
28
Tutorial Outline (Part I): Neural networks and structured knowledge

29
Lots of theory: principled capacity and
limitations
  • RNNs and finite automata [Omlin/Giles, 96]

(Figure: RNN with input, internal state, and output, realizing
the dynamics of the transition function of a DFA)
30
Lots of theory: principled capacity and
limitations
  • DFA → RNN

unary (one-hot) input and unary state representation:
implement (approximately) the Boolean formula
corresponding to the state transition within a
two-layer network
⇒ RNNs can exactly simulate finite automata
(a minimal sketch follows below)
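A minimal sketch of the DFA → RNN direction with one-hot states and inputs: one
threshold layer realises the AND over (state, symbol) pairs, a second the OR over
pairs leading to the same next state. The toy automaton (even number of b's) and
all constants are illustrative:

import numpy as np

def heaviside(z):
    return (z > 0).astype(float)

states, alphabet = ["even", "odd"], ["a", "b"]
delta = {("even", "a"): "even", ("even", "b"): "odd",
         ("odd", "a"): "odd",  ("odd", "b"): "even"}

nS, nA = len(states), len(alphabet)
pairs = [(s, a) for s in states for a in alphabet]

# layer 1: one unit per (state, symbol) pair, active iff both its state and symbol are active
W1 = np.zeros((len(pairs), nS + nA)); b1 = -1.5 * np.ones(len(pairs))
for i, (s, a) in enumerate(pairs):
    W1[i, states.index(s)] = 1.0
    W1[i, nS + alphabet.index(a)] = 1.0

# layer 2: one unit per next state, an OR over the pairs that map to it
W2 = np.zeros((nS, len(pairs))); b2 = -0.5 * np.ones(nS)
for i, (s, a) in enumerate(pairs):
    W2[states.index(delta[(s, a)]), i] = 1.0

def rnn_step(state_vec, sym):
    inp = np.zeros(nA); inp[alphabet.index(sym)] = 1.0
    hidden = heaviside(W1 @ np.concatenate([state_vec, inp]) + b1)
    return heaviside(W2 @ hidden + b2)

state = np.array([1.0, 0.0])            # start in "even"
for sym in "abba":
    state = rnn_step(state, sym)
print(states[int(np.argmax(state))])    # "even": the network tracks the automaton exactly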
31
Lots of theory: principled capacity and
limitations
  • RNN → DFA

unary input, but in general a distributed state representation:
cluster the states into disjoint subsets corresponding to
automaton states and observe their behavior ⇒ approximate
description
⇒ approximate extraction of automata rules is
possible (a minimal sketch follows below)
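A minimal sketch of the extraction direction: cluster the hidden activations of a
trained RNN and read off an approximate transition table by observing which
cluster follows which under each input symbol. The use of k-means, the majority
vote, and the commented usage line are illustrative assumptions; step_fn stands
for any trained recurrent step function (e.g. rnn_step above):

import numpy as np
from sklearn.cluster import KMeans

def extract_automaton(step_fn, init_state, sequences, n_clusters=4, seed=0):
    """Approximate a trained RNN by a finite automaton via clustering of hidden states."""
    records = []                                    # (state_before, symbol, state_after)
    for seq in sequences:
        x = init_state.copy()
        for sym in seq:
            x_new = step_fn(x, sym)
            records.append((x.copy(), sym, x_new.copy()))
            x = x_new
    all_states = np.array([r[0] for r in records] + [r[2] for r in records])
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(all_states)

    transitions = {}                                # majority vote per (cluster, symbol)
    for before, sym, after in records:
        key = (int(km.predict(before[None])[0]), sym)
        transitions.setdefault(key, []).append(int(km.predict(after[None])[0]))
    return {k: max(set(v), key=v.count) for k, v in transitions.items()}

# usage (hypothetical): table = extract_automaton(rnn_step, initial_state, ["abba", "bab"])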
32
Lots of theory: principled capacity and
limitations
  • The principled capacity of RNNs can be
    characterized exactly:

  RNNs with arbitrary weights  = non-uniform Boolean circuits
                                 (super-Turing capability) [Siegelmann/Sontag]
  RNNs with rational weights   = Turing machines [Siegelmann/Sontag]
  RNNs with limited noise      = finite state automata [Omlin/Giles, Maass/Orponen]
  RNNs with small weights or
  Gaussian noise               = finite memory models [Hammer/Tino, Maass/Sontag]
33
Lots of theory: principled capacity and
limitations
  • However, learning might be difficult
  • gradient-based learning schemes face the problem
    of long-term dependencies [Bengio/Frasconi]
  • RNNs are not PAC-learnable (infinite VC-dimension);
    only distribution-dependent bounds can be derived
    [Hammer]
  • there exist only few general guarantees for the
    long-term behavior of RNNs, e.g. stability
    [Suykens, Steil, ...]

(Figure: training error as a function of input length for sequences
"tata...ta" of increasing length, illustrating the long-term
dependency problem)
34
Lots of theory: principled capacity and
limitations
  • RNNs
  • naturally process time series
  • incorporate plausible regularization such as a
    bias towards finite memory models
  • have sufficient power for interesting dynamics
    (context free, context sensitive, arbitrary
    attractors and chaotic behavior)
  • but
  • training is difficult
  • only limited guarantees for the long-term
    behavior and generalization ability
  • ⇒ symbolic description/knowledge can provide
    solutions

35
Lots of theory: principled capacity and
limitations
recurrent symbolic system vs. RNN:
what is the correspondence? e.g. attractor/repellor for
counting ⇒ a^n b^n c^n
RNN: real numbers; iterated function systems give rise
to fractals/attractors/chaos; implicit memory
symbolic system: discrete states; crisp Boolean functions on the
states; explicit memory
36
Tutorial Outline (Part I): Neural networks and structured knowledge

37
To do: challenges
  • Training RNNs
  • search for appropriate regularizations inspired
    by a focus on specific functionalities:
    architecture (e.g. local), weights (e.g.
    bounded), activation function (e.g. linear), cost
    term (e.g. additional penalties)
    [Hochreiter, Boden, Steil, Kremer]
  • insertion of prior knowledge: finite automata
    and beyond (e.g. context free/sensitive, specific
    dynamical patterns/attractors) [Omlin, Croog, ...]
  • Long-term behavior
  • enforce appropriate constraints while training
  • investigate the dynamics of RNNs: rule
    extraction, investigation of attractors, relating
    dynamics and symbolic processing
    [Omlin, Pasemann, Haschke, Rodriguez, Tino, ...]

38
To do: challenges
  • Some further issues
  • processing spatial data: bicausal networks
    [Pollastri et al.], contextual RCC [Micheli et al.]
  • unsupervised processing: TKM [Chappell/Taylor],
    RecSOM [Voegtlin], SOMSD [Sperduti et al.],
    MSOM [Hammer et al.], general formulation
    [Hammer et al.]
39
Conclusions: recurrent networks
  • the capacity of RNNs is well understood and
    promising, e.g. for natural language processing,
    control, ...
  • recurrence of symbolic systems has a natural
    counterpart in the recurrence of RNNs
  • training and generalization face problems which
    could be solved by hybrid systems
  • discrete dynamics with explicit memory versus
    real-valued iterated function systems
  • sequences are nice, but not enough

40
Tutorial Outline (Part I): Neural networks and structured knowledge

41
The general idea: recursive distributed
representations
  • How to turn tree structures/acyclic graphs into a
    connectionist representation?

42
The general idea: recursive distributed
representations
Recursion! A map f: R^i x R^c x R^c → R^c
(input label, left context, right context) yields the encoding
f_enc, where f_enc(⊥) = 0 for the empty tree and
f_enc(a(l,r)) = f(a, f_enc(l), f_enc(r))
43
The general idea: recursive distributed
representations
encoding f_enc: (R^n)^{2*} → R^c (binary trees with labels in R^n):
f_enc(⊥) = 0, f_enc(a(l,r)) = f(a, f_enc(l), f_enc(r)),
realized by a network f: R^{n+2c} → R^c together with g: R^c → R^o
decoding h_dec: R^o → (R^n)^{2*}:
h_dec(0) = ⊥, h_dec(x) = h_0(x)(h_dec(h_1(x)), h_dec(h_2(x))),
realized by a network h: R^o → R^{n+2o} with parts h_0, h_1, h_2
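A minimal sketch of the encoding direction with a single-layer f; label and
context dimensions, the tanh nonlinearity, and the random weights are
illustrative:

import numpy as np

rng = np.random.default_rng(0)
n_label, n_context = 3, 4
W_a = rng.normal(scale=0.4, size=(n_context, n_label))     # label part of f
W_l = rng.normal(scale=0.4, size=(n_context, n_context))   # left-context part
W_r = rng.normal(scale=0.4, size=(n_context, n_context))   # right-context part

def f(a, cl, cr):
    """One-layer realisation of f: R^i x R^c x R^c -> R^c."""
    return np.tanh(W_a @ a + W_l @ cl + W_r @ cr)

def f_enc(tree):
    """f_enc(empty) = 0;  f_enc(a(l, r)) = f(a, f_enc(l), f_enc(r))."""
    if tree is None:                       # empty tree
        return np.zeros(n_context)
    label, left, right = tree
    return f(label, f_enc(left), f_enc(right))

# a small binary tree with 3-dimensional labels: a(b(empty, empty), c(empty, empty))
a, b, c = np.eye(3)
tree = (a, (b, None, None), (c, None, None))
code = f_enc(tree)      # fixed-dimensional code of the whole structure
# in a recursive network (later slides) f is trained jointly with an output mapping g(code)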
44
The general idea: recursive distributed
representations
  • recursive distributed description [Hinton, 90]
  • general idea without concrete implementation
  • tensor construction [Smolensky, 90]
  • encoding/decoding given by (a,b,c) ↦ a⊗b⊗c
  • increasing dimensionality
  • Holographic reduced representation [Plate, 95]
  • circular correlation/convolution
  • fixed encoding/decoding with fixed dimensionality
    (but potential loss of information)
  • necessity of chunking or clean-up for decoding
    (see the sketch below)
  • Binary spatter codes [Kanerva, 96]
  • binary operations, fixed dimensionality,
    potential loss
  • necessity of chunking or clean-up for decoding
  • RAAM [Pollack, 90], LRAAM [Sperduti, 94]
  • trainable networks, trained for the identity,
    fixed dimensionality
  • encoding optimized for the given training set
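A minimal sketch of holographic reduced representations: binding by circular
convolution (via FFT), approximate unbinding by correlation, and the clean-up
step against a memory of known vectors. The dimensionality, the role/filler
names, and the random vectors are illustrative:

import numpy as np

rng = np.random.default_rng(0)
d = 512
def rand_vec():                        # HRR items: i.i.d. normal with variance 1/d
    return rng.normal(scale=1.0 / np.sqrt(d), size=d)

def bind(a, b):                        # circular convolution via FFT
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def unbind(c, a):                      # approximate inverse: convolve with the involution of a
    a_inv = np.concatenate(([a[0]], a[-1:0:-1]))
    return bind(c, a_inv)

role_left, role_right, filler_A, filler_B = (rand_vec() for _ in range(4))
tree = bind(role_left, filler_A) + bind(role_right, filler_B)   # fixed dimensionality, lossy

retrieved = unbind(tree, role_left)
memory = {"A": filler_A, "B": filler_B}                 # clean-up against known fillers
print(max(memory, key=lambda k: memory[k] @ retrieved)) # expected: "A"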

45
The general idea: recursive distributed
representations
  • Nevertheless, results are not promising
  • Theorem [Hammer]:
  • There exists a fixed size neural network which
    can uniquely encode tree structures of arbitrary
    depth with discrete labels
  • For every code, decoding of all trees up to
    height T requires O(2^T) neurons for sigmoidal
    networks
  • ⇒ encoding seems possible, but no fixed size
    architecture exists for decoding

46
Tutorial Outline (Part I): Neural networks and structured knowledge

47
One breakthrough: recursive networks
  • Recursive networks [Goller/Küchler, 96]
  • do not use decoding
  • combine encoding and mapping
  • train this combination directly for the given
    task with backpropagation through structure
  • ⇒ an efficient, data- and problem-adapted encoding
    is learned

(Figure: input tree → encoding network → transformation → output y)
48
One breakthrough: recursive networks
  • Applications
  • term classification [Goller, Küchler, 1996]
  • automated theorem proving [Goller, 1997]
  • learning tree automata [Küchler, 1998]
  • QSAR/QSPR problems [Schmitt, Goller, 1998;
    Bianucci, Micheli, Sperduti, Starita, 2000;
    Vullo, Frasconi, 2003]
  • logo recognition, image processing [Costa,
    Frasconi, Soda, 1999; Bianchini et al., 2005]
  • natural language parsing [Costa, Frasconi, Sturt,
    Lombardo, Soda, 2000, 2005]
  • document classification [Diligenti, Frasconi,
    Gori, 2001]
  • fingerprint classification [Yao, Marcialis, Roli,
    Frasconi, Pontil, 2001]
  • prediction of contact maps [Baldi, Frasconi,
    Pollastri, Vullo, 2002]
  • protein secondary structure prediction [Frasconi
    et al., 2005]

49
One breakthrough: recursive networks
Desired: approximation completeness - for every
(reasonable) function f and ε > 0 there exists a RecNN
which approximates f up to ε (with an appropriate
distance measure)
  • Approximation properties can be measured in
    several ways:
  • given f, ε, probability P, data points x_i, find
    f_w such that
  • P(x : |f(x) - f_w(x)| > ε) is small (L1 norm), or
  • |f(x) - f_w(x)| < ε for all x (max norm), or
  • f(x_i) = f_w(x_i) for all x_i (interpolation of
    points)

50
One breakthrough: recursive networks
  • Approximation properties for RecNNs and
    tree-structured data:
  • capable of approximating every continuous
    function in max-norm with restricted height, and
    every measurable function in L1-norm
    (σ squashing) [Hammer]
  • capable of interpolating every set
    f(x_1), ..., f(x_m) with O(m^2) neurons (σ squashing,
    C^2 in a neighborhood of some t with σ(t) ≠ 0) [Hammer]
  • can approximate every tree automaton for
    arbitrarily large inputs [Küchler]
  • ... but cannot approximate every f: {1,2}* → {0,1}
    (for realistic σ) [Hammer]
  • fairly good results - 3:1

51
Tutorial Outline (Part I): Neural networks and structured knowledge

52
Going on: towards more complex structures
  • More general trees
  • arbitrary number of non-positioned children

f_enc = f( (1/|ch|) Σ_ch w_(edge,label) f_enc(ch), label )
approximation complete for appropriate edge
labels [Bianchini et al., 2005]
53
Going on: towards more complex structures
  • Planar graphs

[Baldi, Frasconi, ..., 2002]

54
Going on: towards more complex structures
  • Acyclic graphs

Contextual cascade correlation [Micheli, Sperduti, 03]
Approximation complete (under a mild
structural restriction) even for structural
transduction [Hammer, Micheli, Sperduti, 05]
55
Going on: towards more complex structures
  • Cyclic graphs

(Figure: a vertex processed via its neighbors) [Micheli, 05]
56
Conclusions: recursive networks
  • Very promising neural architectures for direct
    processing of tree structures
  • Successful applications and mathematical
    background
  • Connections to symbolic mechanisms (tree
    automata)
  • Extensions to more complex structures (graphs)
    are under development
  • Only few approaches which achieve structured
    outputs

57
Tutorial Outline (Part I): Neural networks and structured knowledge

58
Conclusions (Part I)
  • Overview literature
  • FNN and rules: Duch, Setiono, Zurada, Computational
    intelligence methods for understanding of data,
    Proc. of the IEEE 92(5):771-805, 2004
  • Structure kernels: Gärtner, Lloyd, Flach, Kernels
    and distances for structured data, Machine
    Learning, 57, 2004 (new overview is forthcoming)
  • RNNs: Hammer, Steil, Perspectives on learning with
    recurrent networks, in Verleysen, ESANN'2002,
    D-side publications, 357-368, 2002
  • RNNs and rules: Jacobsson, Rule extraction from
    recurrent neural networks: a taxonomy and review,
    Neural Computation, 17:1223-1263, 2005
  • Recursive representations: Hammer, Perspectives
    on Learning Symbolic Data with Connectionistic
    Systems, in Kühn, Menzel, Menzel, Ratsch,
    Richter, Stamatescu, Adaptivity and Learning,
    141-160, Springer, 2003
  • Recursive networks: Frasconi, Gori, Sperduti, A
    General Framework for Adaptive Processing of Data
    Structures, IEEE Transactions on Neural Networks,
    9(5):768-786, 1998
  • Neural networks and structures: Hammer, Jain,
    Neural methods for non-standard data, in
    Verleysen, ESANN'2004, D-side publications,
    281-292, 2004

59
Conclusions (Part I)
  • There exist networks which can directly deal with
    structures (sequences, trees, graphs) with good
    success: kernel machines, recurrent and recursive
    networks
  • Efficient training algorithms and theoretical
    foundations exist
  • (Loose) connections to symbolic processing have
    been established and indicate benefits
  • Now: towards strong connections
  • ⇒ PART II: Logic and neural networks
