Organization and Emergence of Semantic Knowledge: A Parallel-Distributed Processing Approach

1
Organization and Emergence of Semantic Knowledge
A Parallel-Distributed Processing Approach

James L. McClelland
Department of Psychology
and
Center for Mind, Brain, and Computation
Stanford University

2
Some Phenomena in Conceptual Development

Progressive differentiation of concepts
Illusory correlations and U-shaped developmental
trajectories
Conceptual reorganization
Domain- and property-specific constraints on
generalization
Acquired sensitivity to an object's causal
properties
What underlies these phenomena?

3
Naïve Domain Theories?

Mechanisms of learning are thought to be too weak
They learn by contiguity and generalize by
similarity
But generalization is domain dependent.
... so it is proposed instead that development
begins with initial constraints, in the form of
innately pre-specified proto-theories that guide
inference and learning.

4
An Alternative View: Sensitivity to Coherent
Covariation

Coherent Covariation
The tendency of properties of objects to co-occur
in clusters.
e.g.
Has wings
Can fly
Is light
Or
Has roots
Has rigid cell walls
Can grow tall

5
Our Answer in More Detail

Domain general mechanisms sensitive to experience
underlie the development and elaboration of
conceptual knowledge.
These mechanisms exploit the principles of
parallel-distributed processing.
Models built on these principles are sensitive to
coherent covariation.
This sensitivity is the main cause of all of the
phenomena.

6
Principles of Parallel Distributed Processing
/h/ /i/ /n/ /t/

Processing occurs through interactions among
neuron-like processing units, via weighted
connections.
A representation is a pattern of activation.
The knowledge is in the connections.
Learning occurs through gradual connection
adjustment, driven by experience.
Both representation and processing are affected.

H I N T
10
Differentiation in Development and in a Simple
PDP Network
12
The Rumelhart Model
13
Quillian's Hierarchical Propositional Model
14
The Rumelhart Model: Target output for the "robin
can" input
15
The Training Data
All propositions true of items at the bottom
level of the tree, e.g., Robin can {fly, move,
grow}
16
Forward Propagation of Activation
17
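Back Propagation of Error ($\delta$)
[Figure: unit $j$ (activation $a_j$) connects through weight $w_{ij}$ to unit $i$ (activation $a_i$), which connects through weight $w_{ki}$ to output unit $k$]
Error signal at the output layer: $\delta_k \propto (t_k - a_k)$
Error signal at the prior layer: $\delta_i \propto \sum_k \delta_k w_{ki}$
Error-correcting learning:
At the output layer: $\Delta w_{ki} = \epsilon \, \delta_k \, a_i$
At the prior layer: $\Delta w_{ij} = \epsilon \, \delta_i \, a_j$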
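The update rules on this slide can be sketched in a few lines of code. This is only an illustration under stated assumptions, not the model from the talk: the layer sizes, the learning rate `eps`, the random initial weights, and the helper names `forward` and `train_step` are all hypothetical, and the units are assumed to be sigmoid, so each error signal is multiplied by the slope a(1 - a).

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(a_in, w_hid, w_out):
    """Propagate activation forward: input -> hidden -> output."""
    a_hid = [sigmoid(sum(w * a for w, a in zip(row, a_in))) for row in w_hid]
    a_out = [sigmoid(sum(w * a for w, a in zip(row, a_hid))) for row in w_out]
    return a_hid, a_out

def train_step(a_in, target, w_hid, w_out, eps=0.5):
    """Apply one error-correcting update in place.

    Returns the summed squared error measured before the update.
    """
    a_hid, a_out = forward(a_in, w_hid, w_out)
    # Output layer: delta_k is (t_k - a_k) times the assumed sigmoid slope.
    d_out = [(t - a) * a * (1.0 - a) for t, a in zip(target, a_out)]
    # Prior layer: delta_i is sum_k delta_k * w_ki, times the sigmoid slope.
    d_hid = [
        sum(d_out[k] * w_out[k][i] for k in range(len(w_out)))
        * a_hid[i] * (1.0 - a_hid[i])
        for i in range(len(a_hid))
    ]
    # Weight change = eps * (delta of receiving unit) * (activation of sender).
    for k in range(len(w_out)):
        for i in range(len(a_hid)):
            w_out[k][i] += eps * d_out[k] * a_hid[i]
    for i in range(len(w_hid)):
        for j in range(len(a_in)):
            w_hid[i][j] += eps * d_hid[i] * a_in[j]
    return sum((t - a) ** 2 for t, a in zip(target, a_out))

# Hypothetical toy problem: one input pattern, one target pattern.
rng = random.Random(0)
w_hid = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(4)]
w_out = [[rng.uniform(-0.5, 0.5) for _ in range(4)] for _ in range(2)]
errors = [train_step([1.0, 0.0, 1.0], [1.0, 0.0], w_hid, w_out)
          for _ in range(200)]
print(errors[0], errors[-1])
```

Repeated small updates shrink the error gradually, which is the sense in which learning here is "gradual connection adjustment, driven by experience."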
18
The Rumelhart Model
22
What Drives Progressive Differentiation?

Waves of differentiation reflect coherent
covariation of properties across items.
Patterns of coherent covariation are reflected in
the principal components of the property
covariance matrix.
Figure shows attribute loadings on the first
three principal components
1. Plants vs. animals
2. Birds vs. fish
3. Trees vs. flowers
Features shown in the same color covary within a
component; features shown in different colors
anti-covary.
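The link between coherent covariation and principal components can be illustrated on a toy item-by-property matrix. The sketch below is an assumption-laden example, not the training set from the talk: the eight items, the nine properties, and the helper names are hypothetical, and the components are found by simple power iteration with deflation rather than a library routine.

```python
import math

PROPS = ["has roots", "can move", "has skin", "has wings", "can fly",
         "has gills", "can swim", "has bark", "has petals"]
# Hypothetical toy data: one row of 0/1 property values per item.
ITEMS = {
    "pine":    [1, 0, 0, 0, 0, 0, 0, 1, 0],
    "oak":     [1, 0, 0, 0, 0, 0, 0, 1, 0],
    "rose":    [1, 0, 0, 0, 0, 0, 0, 0, 1],
    "daisy":   [1, 0, 0, 0, 0, 0, 0, 0, 1],
    "robin":   [0, 1, 1, 1, 1, 0, 0, 0, 0],
    "canary":  [0, 1, 1, 1, 1, 0, 0, 0, 0],
    "sunfish": [0, 1, 1, 0, 0, 1, 1, 0, 0],
    "salmon":  [0, 1, 1, 0, 0, 1, 1, 0, 0],
}

def covariance(rows):
    """Property-by-property covariance matrix across items."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    cen = [[r[j] - means[j] for j in range(m)] for r in rows]
    return [[sum(c[i] * c[j] for c in cen) / n for j in range(m)]
            for i in range(m)]

def top_component(cov, iters=200):
    """Power iteration: largest eigenvalue and its unit eigenvector."""
    m = len(cov)
    v = [float(i + 1) for i in range(m)]  # arbitrary starting vector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(m))
              for i in range(m))
    return lam, v

def principal_components(cov, k):
    """First k components, removing each one before finding the next."""
    cov = [row[:] for row in cov]
    found = []
    for _ in range(k):
        lam, v = top_component(cov)
        found.append((lam, v))
        for i in range(len(cov)):
            for j in range(len(cov)):
                cov[i][j] -= lam * v[i] * v[j]
    return found

pcs = principal_components(covariance(list(ITEMS.values())), 3)
for lam, v in pcs:
    print(round(lam, 3), {p: round(x, 2) for p, x in zip(PROPS, v)})
```

In this toy set the first component separates plant properties from animal properties, the second separates bird from fish properties, and the third separates tree from flower properties, in decreasing order of variance. The overall sign of each component is arbitrary.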

23
Now wait just a minute

Didn't you tell the network the taxonomic
organization directly?
Pine ISA Tree, Plant
Robin ISA Bird, Animal
Yes we did.
We do think names kids hear for things affect
their conceptual representations.
But labels aren't necessary as long as an item's
properties exhibit coherent covariation.

24
Coherence Training Patterns
[Figure: Items by Properties matrix; properties are either Coherent or Incoherent]
No labels are provided. Each item and each
property occurs with equal frequency.
25
[Figure labels: Items; Contexts (IS, CAN, HAS); Properties (Coherent, Incoherent)]
Note: coherence is present between, not within,
training experiences!
26
Effects of Coherence on Learning
[Figure panels: Coherent Properties; Incoherent Properties]
27
Effect of Coherence on Representation
28
Effects of Coherent Variation on Learning in
Connectionist Models

Attributes that vary together create the acquired
concepts that populate the taxonomic hierarchy,
and determine which properties are central and
which are incidental to a given concept.
Labeling of these concepts or their properties is
in no way necessary.
But it is easy to learn names for such concepts.
Arbitrary properties (those that do not co-vary
with others) are very difficult to learn.
And it is harder to learn names for concepts that
are only differentiated by such arbitrary
properties.

29
Where are we on that list of Phenomena?

Progressive differentiation of concepts
Illusory correlations and U-shaped developmental
trajectories
Conceptual reorganization
Domain- and property-specific constraints on
generalization
Acquired sensitivity to an object's causal
properties

30
Illusory Correlations

Rochel Gelman found that children think that all
animals have feet.
Even animals that look like small furry balls and
don't seem to have any feet at all.

31
A typical property that a particular object
lacks (e.g., pine has leaves)
An infrequent, atypical property
32
Conceptual Reorganization (Carey, 1985)

Carey demonstrates that children discover
the unity of plants and animals as living things
with many shared properties only around the age
of 10.
She suggests that the coalescence of the concept
of living thing depends on learning about diverse
aspects of plants and animals, including:
Nature of life-sustaining processes
What it means to be dead vs. alive
Reproductive properties
Can reorganization occur in a connectionist net?

33
Conceptual Reorganization in the Model

Suppose superficial appearance information, which
is not coherent with much else, is always
available,
and there is a pattern of coherent covariation
across information that is contingently available
in different contexts.
The model forms initial representations based on
superficial appearances.
Later, it discovers the shared structure that
cuts across the different contexts, reorganizing
its representations.

34
Organization of Conceptual Knowledge Early and
Late in Development
35
Inference and Generalization in the PDP Model

A semantic representation for a new item can be
derived by error propagation from given
information, using knowledge already stored in
the weights.
Crucially:
The similarity structure, and hence the pattern
of generalization, depends on the knowledge
already stored in the weights.

36
Start with a neutral representation on the
representation units. Use backprop to adjust the
representation to minimize the error.
37
The result is a representation similar to that of
the average bird
38
Use the representation to infer what this new
thing can do.
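The three steps above (start neutral, adjust the representation by backprop, then read out inferences) can be sketched as follows. Everything specific here is an assumption made for illustration: the two representation units, the hand-set weight matrix `W` standing in for knowledge "already stored in the weights", the property names, the step count, and the learning rate. The representation is treated as a real-valued vector that is adjusted directly while the weights stay frozen.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

PROPS = ["is a bird", "can fly", "is a fish", "can swim"]
# Hypothetical frozen weights from 2 representation units to each property.
W = [[4.0, -4.0], [4.0, -4.0], [-4.0, 4.0], [-4.0, 4.0]]

def outputs(rep):
    """Property activations produced by a given representation."""
    return [sigmoid(sum(w * r for w, r in zip(row, rep))) for row in W]

def backprop_to_representation(known, steps=100, eps=0.1):
    """Adjust only the representation to reduce error on the known facts.

    `known` maps a property index to its target value (0 or 1).
    """
    rep = [0.0, 0.0]  # neutral starting representation
    for _ in range(steps):
        act = outputs(rep)
        for k, t in known.items():
            delta = (t - act[k]) * act[k] * (1.0 - act[k])
            for j in range(len(rep)):
                rep[j] += eps * delta * W[k][j]  # weights W are not changed
    return rep

# Given information about the new item: it is a bird.
rep = backprop_to_representation({PROPS.index("is a bird"): 1.0})
inferred = dict(zip(PROPS, outputs(rep)))
print(rep, inferred)
```

Because "can fly" shares weight structure with "is a bird" in this toy setup, the derived representation activates it strongly and suppresses "can swim". The inference falls out of the stored weights, not out of anything told to the network about flying.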
40
Domain Specificity

What constraints are required for development and
elaboration of domain-specific knowledge?
Are domain-specific constraints required?
Or are there general principles that allow for
acquisition of conceptual knowledge of all
different types?

41
Differential Importance (Macario, 1991)

3- to 4-year-old children see a puppet and are told he
likes to eat, or play with, a certain object
(e.g., top object at right).
Children then must choose another one that will
be the same kind of thing to eat or that will
be the same kind of thing to play with.
In the first case they tend to choose the object
with the same color.
In the second case they will tend to choose the
object with the same shape.

Can the knowledge that one kind of property is
important for one type of thing while another is
important for a different type of thing be
learned?

43
Adjustments to Training Environment

Among the plants
All trees are large
All flowers are small
Either can be bright or dull
Among the animals
All birds are bright
All fish are dull
Either can be small or large
In other words
Size covaries with properties that differentiate
different types of plants
Brightness covaries with properties that
differentiate different types of animals
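The structure of this adjusted environment can be written out and checked directly. The sketch below is a hypothetical encoding (the dictionary keys and the helper `predicts_kind` are not from the talk); it builds the eight size-by-brightness combinations the slide allows and tests which feature lines up with kind inside each domain.

```python
def make_items():
    """Enumerate the item types permitted by the slide's constraints."""
    items = []
    # Plants: size is tied to kind, brightness is free to vary.
    for kind, size in (("tree", "large"), ("flower", "small")):
        for brightness in ("bright", "dull"):
            items.append({"domain": "plant", "kind": kind,
                          "size": size, "brightness": brightness})
    # Animals: brightness is tied to kind, size is free to vary.
    for kind, brightness in (("bird", "bright"), ("fish", "dull")):
        for size in ("large", "small"):
            items.append({"domain": "animal", "kind": kind,
                          "size": size, "brightness": brightness})
    return items

def predicts_kind(items, domain, feature):
    """True if each value of `feature` picks out one kind within `domain`."""
    kinds_by_value = {}
    for item in items:
        if item["domain"] == domain:
            kinds_by_value.setdefault(item[feature], set()).add(item["kind"])
    return all(len(kinds) == 1 for kinds in kinds_by_value.values())

items = make_items()
for domain in ("plant", "animal"):
    for feature in ("size", "brightness"):
        print(domain, feature, predicts_kind(items, domain, feature))
```

Size is informative about kind only among plants and brightness only among animals, which is the domain-dependent covariation the network has to pick up.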

44
Testing Feature Importance

After partial learning, model is shown eight test
objects
Four Animals
All have skin
All combinations of bright/dull and large/small
Four Plants
All have roots
All combinations of bright/dull and large/small
Representations are generated by
using back-propagation to representation.

45
Similarities of Obtained Representations
Brightness is relevant for Animals
Size is relevant for Plants
46

In Rogers and McClelland (2004) we also address:
Conceptual differentiation in prelinguistic
infants.
Many of the phenomena addressed by classic work
on semantic knowledge from the 1970s:
Basic level
Typicality
Frequency
Expertise
Disintegration of conceptual knowledge in
semantic dementia
How the model can be extended to capture causal
properties of objects and explanations.
What properties a network must have to be
sensitive to coherent covariation.

47
Coherence Requires Convergence
48
Semantic Representation in the Brain

Damage to temporal pole is associated with
semantic dementia, a domain-general loss of
semantic information
Imaging and lesion studies suggest that other
brain areas are associated with more specific
types of information.
We suggest that the temporal pole serves as the
convergent semantic representation in the brain,
with bidirectional connections to regions
containing modality-specific information.
The interface with language occurs via
connections between language areas and the temporal
pole.

In summary
Sensitivity to coherent co-variation in
experience can explain many aspects of conceptual
development.
PDP networks subject to a domain-general
architectural constraint provide the necessary
mechanisms.
Our simulations do not prove that domain-general
learning methods will turn out to be fully
sufficient.
There is still room for domain- or
content-specific constraints,
and the framework is fully compatible with their
integration.
But our findings suggest it may be worth
exploring how far we can go without them.

50
Thanks for your attention!
52
Proposed Architecture for the Organization of
Semantic Memory
name
action
motion
Temporal pole
color
form
valence
53
Generalization of different property types

At different points in training, the network is
taught one of:
Maple can queem
Maple is queem
Maple has queem
Only weights from hidden to output are allowed to
change.
The network is then tested to see how strongly
queem is activated when the same relation is paired
with other items.

54
Generalization to other concepts after training
with can, has, or is queem
56
Overview

The PDP Framework for Processing, Representation
and Learning
Complementary Learning Systems in Hippocampus and
Neocortex
Differentiation and Reorganization of Conceptual
Knowledge
Inference and Generalization
How the Complementary Learning Systems Cooperate
What kinds of innate constraints are necessary?

57
Modeling Inductive Inference (Osherson et al.,
1990)

General
If a dolphin, a whale, and a zebra have biotin in
their blood, how strong is the implication that
all mammals have biotin in their blood?
Specific
If a seal and a cow have biotin in their blood,
how strong is the implication that a horse has
biotin in its blood?

58
PDP (as in Rogers & McClelland, 2004)

Train a network on the item-feature matrix (50
animals have a 0 or 1 for each of 85 features)

Animals
Hidden Layers
Features
59
PDP

Add a new feature, and train the net using the
given examples. Only allow the weights to the new
feature node to change, and train to a threshold
of 0.85.

Animals
Hidden Layers
Features
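The procedure on this slide (freeze the trained network, add one feature node, train only its incoming weights to a 0.85 threshold) can be sketched as below. The hidden-layer patterns in `HIDDEN` are made-up stand-ins for representations a trained network would supply, and the learning rate, the absence of a bias term, and the function names are likewise assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical hidden-layer patterns; a real run would take these from
# the network trained on the 50 x 85 item-feature matrix.
HIDDEN = {
    "seal":  [0.9, 0.1, 0.8],
    "cow":   [0.8, 0.9, 0.1],
    "horse": [0.8, 0.8, 0.2],
}

def activation(w, h):
    """Activation of the new feature node for hidden pattern h."""
    return sigmoid(sum(wi * hi for wi, hi in zip(w, h)))

def train_new_feature(premises, threshold=0.85, eps=0.5, max_epochs=10000):
    """Train only the weights into the new feature node.

    Stops once every premise item activates the node to the threshold.
    """
    w = [0.0, 0.0, 0.0]
    for _ in range(max_epochs):
        if all(activation(w, HIDDEN[a]) >= threshold for a in premises):
            break
        for a in premises:
            h = HIDDEN[a]
            act = activation(w, h)
            delta = (1.0 - act) * act * (1.0 - act)  # target is 1 (has feature)
            w = [wi + eps * delta * hi for wi, hi in zip(w, h)]
    return w

# Specific argument: seal and cow have the feature; what about horse?
w_new = train_new_feature(["seal", "cow"])
strength = activation(w_new, HIDDEN["horse"])
print(w_new, strength)
```

The activation of the new node for an untrained item then serves as the model's estimate of argument strength. It depends on how similar that item's hidden pattern is to the premise items' patterns.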
60
Results

Using Osherson et al.'s similarity-based model
with the network's hidden representations, which
emphasize coherent covariation, results in
improved performance.
Kemp, Perfors, and Tenenbaum's Bayes Tree model
seems to do even better, but we suspect possible
over-fitting.

61
Use the hippocampal memory system to store a
memory for the learning episode.
[Figure labels: Hippocampus; sparrow]
If the pattern can be reinstated at a later time,
it can be used to support further inferences.
62
Relation-specific representations

IS representations (top) reflect idiosyncratic
appearance properties.
HAS representations are similar to the
context-general representations (middle).
CAN representations collapse differences between
plants, since there is little that plants can do.
The fish are all the same, because there's no
difference in what they can do.

63
What About Causal Knowledge?

Young children can attribute causal powers to
objects based on single observations of scenarios
in which the objects participate.
Gopnik's blicket experiments
Causal powers are central to children's
generalization of category membership.
They assign the same name to objects with
different appearance properties but similar
causal powers.
Do we need an innate mechanism for causal
inference, as Gopnik suggests, to address these
findings and other aspects of children's causal
reasoning abilities?

64
My Perspective

Causal relations are not that different from
other kinds of relations.
Domain general mechanisms that are sensitive to
experience underlie the development and
elaboration of causal as well as other forms of
conceptual knowledge.
These mechanisms acquire sensitivity to causal
structure through gradual learning.

65
Extension of the Model to Causal Inference (In my
dreams?)

Networks can learn to form internal
representations that capture causal powers of
objects, based on the consequences of their
participation in events.
Appearance properties don't covary that well
with the causal powers of objects.
Furthermore, the names of (man-made) objects
co-vary with their causal powers, not with their
appearance.
Radio
Telephone
Razor
Switch
Thus, it would be natural for networks to learn
to generalize names for objects based on their
causal powers, rather than their appearance.

Item
Context (External and Internal)
Sequelae
66
But haven't you still left something under the
rug?

No, not really:
everything is right out in the open.
Here's the situation:
Each example of an item always activates the same
input unit.
Each context always activates the same context
unit.
Each property always activates the same property
unit.
Each item, context, and property unit is like one
of Fodor's atomic concept representations:
"A representation R expresses the property P in
virtue of its being a law that things that are P
cause tokenings of R."
Such stipulations are by no means unproblematic.
But everyone has this problem, including Jerry.

67
This Problem is Solved by Distributed
Representations

The localist input, context and output units can
be replaced with distributed patterns of
activation
(Rogers & McClelland, Chapter 5; Rogers et al.,
2005; Dilkina and McClelland).
The units correspond to atomic microconcepts
Each item, context, and property is represented
by a (possibly somewhat variable) ensemble of
them.
The number of possible concepts that can be
distinguished is now far greater ($2^N$ vs. $N$).
Networks can learn
Which microconcepts are important (and which
combinations are important)
Which microconcepts should be treated as
equivalent
Which microconcepts should be ignored
All of this depends on patterns of covariation.
This is a very good thing for everyone (even
Jerry!):
it makes it possible for a system with finite
resources to cover the space of possible concepts
that might turn out to be needed.
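The capacity claim on this slide can be checked by brute-force enumeration. The sketch assumes binary (on/off) units, which is a simplification; the function names are illustrative only.

```python
from itertools import product

def localist_codes(n):
    """One unit per concept: exactly one of the n units is on."""
    return [tuple(1 if i == j else 0 for j in range(n)) for i in range(n)]

def distributed_codes(n):
    """Any on/off pattern over the n units counts as a distinct code."""
    return list(product((0, 1), repeat=n))

n = 4
n_localist = len(localist_codes(n))
n_distributed = len(distributed_codes(n))
print(n_localist, n_distributed)
```

With four units the localist scheme distinguishes 4 concepts and the distributed scheme 16, and the gap widens exponentially as units are added.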