Title: Organization and Emergence of Semantic Knowledge: A Parallel-Distributed Processing Approach
1 Organization and Emergence of Semantic Knowledge
A Parallel-Distributed Processing Approach
- James L. McClelland
- Department of Psychology and Center for Mind, Brain, and Computation
- Stanford University
2 Some Phenomena in Conceptual Development
- Progressive differentiation of concepts
- Illusory correlations and U-shaped developmental trajectories
- Conceptual reorganization
- Domain- and property-specific constraints on generalization
- Acquired sensitivity to an object's causal properties
- What underlies these phenomena?
3 Naïve Domain Theories?
- Mechanisms of learning are thought to be too weak:
  - They learn by contiguity and generalize by similarity.
  - But generalization is domain dependent.
- ...so it is proposed instead that development begins with initial constraints, in the form of innately pre-specified proto-theories that guide inference and learning.
4 An Alternative View: Sensitivity to Coherent Covariation
- Coherent covariation: the tendency of properties of objects to co-occur in clusters, e.g.:
  - Has wings, can fly, is light
  - Or: has roots, has rigid cell walls, can grow tall
5 Our Answer in More Detail
- Domain-general mechanisms sensitive to experience underlie the development and elaboration of conceptual knowledge.
- These mechanisms exploit the principles of parallel-distributed processing.
- Models built on these principles are sensitive to coherent covariation.
- This sensitivity is the main cause of all of the phenomena listed above.
6 Principles of Parallel Distributed Processing
[Figure: letter units H I N T connected to phoneme units /h/ /i/ /n/ /t/]
- Processing occurs via interactions among neuron-like processing units through weighted connections.
- A representation is a pattern of activation.
- The knowledge is in the connections.
- Learning occurs through gradual connection adjustment, driven by experience.
- Both representation and processing are affected.
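These principles can be illustrated with a minimal sketch, assuming logistic units and the delta rule as the learning procedure (the function names and the toy input/target pattern are hypothetical). The list returned by `activate` is the representation, the `weights` matrix holds all of the knowledge, and each call to `learn` nudges the weights by a small, experience-driven step.

```python
import math

def logistic(x):
    """Squash a unit's net input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def activate(inputs, weights):
    """Each receiving unit sums its weighted inputs; the result is a pattern of activation."""
    return [logistic(sum(w * a for w, a in zip(row, inputs))) for row in weights]

def learn(inputs, targets, weights, epsilon=0.1):
    """Delta rule: move each weight a small step in proportion to (target - output) * input."""
    outputs = activate(inputs, weights)
    for i, (t, o) in enumerate(zip(targets, outputs)):
        for j, a in enumerate(inputs):
            weights[i][j] += epsilon * (t - o) * a

# Hypothetical experience: input pattern [1, 0, 1] should yield output pattern [1, 0].
weights = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # the network starts with no knowledge
x, target = [1.0, 0.0, 1.0], [1.0, 0.0]
before = activate(x, weights)
for _ in range(200):  # gradual adjustment over repeated experience
    learn(x, target, weights)
after = activate(x, weights)
print(before, after)
```

Because each step is small, no single experience rewrites the network; the output pattern drifts toward the target over many repetitions, and weights from inactive input units are left unchanged.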
10 Differentiation in Development and in a Simple PDP Network
12 The Rumelhart Model
13 Quillian's Hierarchical Propositional Model
14 The Rumelhart Model
[Figure: target output for the input "robin can"]
15 The Training Data
All propositions true of items at the bottom level of the tree, e.g., robin can fly, move, grow.
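One way to lay out such propositions for training is sketched below. Only the robin facts come from the slide; the other items, the attribute list, and the helper names are illustrative assumptions, not the model's actual training corpus. Each (item, relation) pair becomes a one-hot item input, a one-hot relation input, and a target that is on for every attribute true of that pair.

```python
# Hypothetical subset of a Rumelhart-style training environment (illustrative only).
ITEMS = ["robin", "canary", "salmon", "oak"]
RELATIONS = ["ISA", "is", "can", "has"]
FACTS = {
    ("robin", "can"): {"grow", "move", "fly"},
    ("canary", "can"): {"grow", "move", "fly", "sing"},
    ("salmon", "can"): {"grow", "move", "swim"},
    ("oak", "can"): {"grow"},
}
ATTRIBUTES = sorted(set().union(*FACTS.values()))

def one_hot(name, vocabulary):
    """A localist code: exactly one unit on."""
    return [1.0 if v == name else 0.0 for v in vocabulary]

def training_patterns(facts):
    """Turn each (item, relation) query into one-hot inputs and a multi-hot target."""
    patterns = []
    for (item, relation), attrs in sorted(facts.items()):
        target = [1.0 if a in attrs else 0.0 for a in ATTRIBUTES]
        patterns.append((one_hot(item, ITEMS), one_hot(relation, RELATIONS), target))
    return patterns

for item_vec, relation_vec, target_vec in training_patterns(FACTS):
    print(item_vec, relation_vec, target_vec)
```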
16 Forward Propagation of Activation
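A forward pass through a Rumelhart-style network can be sketched as below: the item activates the representation layer, the representation and the relation together activate the hidden layer, and the hidden layer activates the attribute units. The eight items and four relations follow the model (pine, oak, rose, daisy, robin, canary, sunfish, salmon; ISA, is, can, has), but the other layer sizes, the random weights, and the omission of bias terms are assumptions for illustration.

```python
import math
import random

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights):
    """One feed-forward step: each receiving unit is a logistic function of its summed weighted input."""
    return [logistic(sum(w * a for w, a in zip(row, inputs))) for row in weights]

def random_weights(n_out, n_in, rng, scale=0.5):
    return [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]

def forward(item, relation, net):
    """Item -> representation; (representation + relation) -> hidden; hidden -> attributes."""
    rep = layer(item, net["item_rep"])
    hidden = layer(rep + relation, net["hidden"])
    attributes = layer(hidden, net["out"])
    return rep, hidden, attributes

# Assumed sizes: 8 items, 4 relations, 8 representation, 15 hidden, 36 attribute units.
rng = random.Random(0)
n_item, n_rel, n_rep, n_hid, n_attr = 8, 4, 8, 15, 36
net = {
    "item_rep": random_weights(n_rep, n_item, rng),
    "hidden": random_weights(n_hid, n_rep + n_rel, rng),
    "out": random_weights(n_attr, n_hid, rng),
}
robin = [1.0 if i == 4 else 0.0 for i in range(n_item)]  # fifth item in the list above
can = [1.0 if r == 2 else 0.0 for r in range(n_rel)]      # third relation in the list above
rep, hidden, attrs = forward(robin, can, net)
print([round(a, 2) for a in attrs])
```

Before training, the attribute outputs hover near 0.5; learning shapes both the item-to-representation weights and the later weights, so the representation layer comes to encode item similarity.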
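17 Back Propagation of Error (δ)
[Figure: unit j sends activation $a_j$ to unit i via weight $w_{ij}$; unit i sends $a_i$ to output unit k via weight $w_{ki}$]
Error-correcting learning:
- At the output layer: $\delta_k = t_k - a_k$ and $\Delta w_{ki} = \varepsilon\,\delta_k\,a_i$
- At the prior layer: $\delta_i = f'(net_i)\sum_k \delta_k w_{ki}$ and $\Delta w_{ij} = \varepsilon\,\delta_i\,a_j$, where $f'(net_i)$ is the slope of unit i's activation function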
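The two pairs of equations can be sketched directly in code, assuming logistic units (so that $f'(net_i) = a_i(1 - a_i)$) and a cross-entropy error, under which the output delta is exactly $t_k - a_k$. The toy patterns, learning rate, and function names are hypothetical.

```python
import math
import random

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(a_j, w_ij, w_ki):
    """Input activations a_j -> hidden activations a_i -> output activations a_k."""
    a_i = [logistic(sum(w * a for w, a in zip(row, a_j))) for row in w_ij]
    a_k = [logistic(sum(w * a for w, a in zip(row, a_i))) for row in w_ki]
    return a_i, a_k

def backprop_step(a_j, t, w_ij, w_ki, eps=0.5):
    a_i, a_k = forward(a_j, w_ij, w_ki)
    # Output layer: delta_k = t_k - a_k
    d_k = [tk - ak for tk, ak in zip(t, a_k)]
    # Prior layer: delta_i = a_i (1 - a_i) * sum_k delta_k * w_ki  (uses the old w_ki)
    d_i = [ai * (1.0 - ai) * sum(d_k[k] * w_ki[k][i] for k in range(len(d_k)))
           for i, ai in enumerate(a_i)]
    # Delta w_ki = eps * delta_k * a_i
    for k, dk in enumerate(d_k):
        for i, ai in enumerate(a_i):
            w_ki[k][i] += eps * dk * ai
    # Delta w_ij = eps * delta_i * a_j
    for i, di in enumerate(d_i):
        for j, aj in enumerate(a_j):
            w_ij[i][j] += eps * di * aj

def cross_entropy(patterns, w_ij, w_ki):
    total = 0.0
    for a_j, t in patterns:
        _, a_k = forward(a_j, w_ij, w_ki)
        total -= sum(tk * math.log(ak) + (1 - tk) * math.log(1 - ak) for tk, ak in zip(t, a_k))
    return total

# Hypothetical toy data; the last input unit is always on and acts as a bias.
patterns = [([1, 0, 0, 1], [1, 0]), ([0, 1, 0, 1], [0, 1]), ([0, 0, 1, 1], [1, 1])]
rng = random.Random(0)
w_ij = [[rng.uniform(-0.5, 0.5) for _ in range(4)] for _ in range(3)]
w_ki = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)]
start = cross_entropy(patterns, w_ij, w_ki)
for _ in range(2000):
    for a_j, t in patterns:
        backprop_step(a_j, t, w_ij, w_ki)
end = cross_entropy(patterns, w_ij, w_ki)
print(start, end)
```

The hidden deltas are computed before the output weights change, so both layers are updated from the same forward pass, as the equations assume.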
18 The Rumelhart Model
22 What Drives Progressive Differentiation?
- Waves of differentiation reflect coherent covariation of properties across items.
- Patterns of coherent covariation are reflected in the principal components of the property covariance matrix.
- The figure shows attribute loadings on the first three principal components:
  1. Plants vs. animals
  2. Birds vs. fish
  3. Trees vs. flowers
- Features of the same color covary within a component; features of different colors anti-covary.
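The link between coherent covariation and the first principal component can be checked on a small hand-built example. The sketch below assumes a hypothetical eight-item, twelve-property matrix (not the model's training set), computes the property covariance across items, and finds the leading eigenvector by power iteration. Properties shared by all animals load strongly with one sign, properties shared by all plants with the other, and bird, fish, tree, and flower properties load more weakly.

```python
import math

PROPS = ["move", "skin", "roots", "cell_walls", "wings", "fly",
         "gills", "swim", "bark", "branches", "petals", "pretty"]
# Hypothetical item-by-property matrix (1 = property is true of the item).
ITEMS = {
    "robin":   [1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0],
    "canary":  [1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0],
    "salmon":  [1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0],
    "sunfish": [1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0],
    "oak":     [0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0],
    "pine":    [0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0],
    "rose":    [0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1],
    "daisy":   [0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1],
}

def covariance(rows):
    """Property-by-property covariance across items."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    c = [[r[j] - means[j] for j in range(m)] for r in rows]
    return [[sum(r[a] * r[b] for r in c) / n for b in range(m)] for a in range(m)]

def leading_component(cov, iterations=200):
    """Power iteration for the eigenvector with the largest eigenvalue."""
    m = len(cov)
    v = [float(i + 1) for i in range(m)]  # arbitrary start (not orthogonal to the answer here)
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    if v[0] < 0:  # fix the arbitrary sign so that "move" loads positively
        v = [-x for x in v]
    return dict(zip(PROPS, v))

loadings = leading_component(covariance(list(ITEMS.values())))
for prop, value in loadings.items():
    print(f"{prop:>10s} {value:+.3f}")
```

Further components (birds vs. fish, trees vs. flowers) could be obtained by deflating the covariance matrix and repeating the iteration.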
23 Now wait just a minute...
- Didn't you tell the network the taxonomic organization directly?
  - Pine ISA Tree, Plant
  - Robin ISA Bird, Animal
- Yes, we did.
- We do think the names kids hear for things affect their conceptual representations.
- But labels aren't necessary as long as an item's properties exhibit coherent covariation.
24 Coherence Training Patterns
[Figure: items × properties, with coherent and incoherent property sets]
- No labels are provided.
- Each item and each property occurs with equal frequency.
25 [Figure: items × properties in the IS, CAN, and HAS contexts, with coherent and incoherent property sets]
Note: coherence is present between, not within, training experiences!
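The difference between coherent and incoherent property sets can be made concrete with a hypothetical construction (not the one used in the simulations). In both sets below, every property is true of four of sixteen items and every item has four properties. Coherent properties come in clusters that always co-occur; incoherent properties use a cyclic design in which any two properties share at most one item.

```python
N_ITEMS = 16

def coherent_properties():
    """Four clusters of four items; every property is shared by exactly one whole cluster."""
    return [set(range(4 * g, 4 * g + 4)) for g in range(4) for _ in range(4)]

def incoherent_properties():
    """Cyclic design: property p is true of items p, p+1, p+3 and p+7 (mod 16)."""
    return [{(p + o) % N_ITEMS for o in (0, 1, 3, 7)} for p in range(N_ITEMS)]

def item_frequencies(props):
    """How many properties each item has."""
    return [sum(item in p for p in props) for item in range(N_ITEMS)]

def max_overlap(props):
    """Largest fraction of items shared by two different properties."""
    return max(len(props[a] & props[b]) / len(props[a])
               for a in range(len(props)) for b in range(a + 1, len(props)))

for name, props in [("coherent", coherent_properties()),
                    ("incoherent", incoherent_properties())]:
    print(name, set(map(len, props)), set(item_frequencies(props)), max_overlap(props))
```

Frequencies are matched, so any learning advantage for the coherent set must come from co-occurrence structure rather than from how often items or properties appear.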
26 Effects of Coherence on Learning
[Figure panels: coherent properties; incoherent properties]
27 Effect of Coherence on Representation
28 Effects of Coherent Covariation on Learning in Connectionist Models
- Attributes that vary together create the acquired concepts that populate the taxonomic hierarchy, and determine which properties are central and which are incidental to a given concept.
  - Labeling of these concepts or their properties is in no way necessary.
  - But it is easy to learn names for such concepts.
- Arbitrary properties (those that do not covary with others) are very difficult to learn.
  - And it is harder to learn names for concepts that are differentiated only by such arbitrary properties.
29 Where are we on that list of phenomena?
- Progressive differentiation of concepts
- Illusory correlations and U-shaped developmental trajectories
- Conceptual reorganization
- Domain- and property-specific constraints on generalization
- Acquired sensitivity to an object's causal properties
30 Illusory Correlations
- Rochel Gelman found that children think that all animals have feet,
  - even animals that look like small furry balls and don't seem to have any feet at all.
31 [Figure panels: a typical property that a particular object lacks (e.g., pine has leaves); an infrequent, atypical property]
32 Conceptual Reorganization (Carey, 1985)
- Carey demonstrates that young children discover the unity of plants and animals as living things with many shared properties only around the age of 10.
- She suggests that the coalescence of the concept of living thing depends on learning about diverse aspects of plants and animals, including:
  - The nature of life-sustaining processes
  - What it means to be dead vs. alive
  - Reproductive properties
- Can reorganization occur in a connectionist net?
33 Conceptual Reorganization in the Model
- Suppose superficial appearance information, which is not coherent with much else, is always available.
- And there is a pattern of coherent covariation across information that is contingently available in different contexts.
- The model forms initial representations based on superficial appearances.
- Later, it discovers the shared structure that cuts across the different contexts, reorganizing its representations.
34 Organization of Conceptual Knowledge Early and Late in Development
35 Inference and Generalization in the PDP Model
- A semantic representation for a new item can be derived by error propagation from given information, using knowledge already stored in the weights.
- Crucially:
  - The similarity structure, and hence the pattern of generalization, depends on the knowledge already stored in the weights.
36 Start with a neutral representation on the representation units. Use backprop to adjust the representation to minimize the error.
37 The result is a representation similar to that of the average bird.
38 Use the representation to infer what this new thing can do.
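The three steps above (neutral start, backprop to representation, inference) can be sketched with a deliberately tiny stand-in network. The assumptions are loud ones: a single layer of hypothetical "already learned" weights maps two representation units to four attributes, replacing the model's representation, hidden, and relation pathway. Only the representation's net inputs are adjusted; the weights stay fixed, and error is counted only on the attribute we were told about.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

ATTRS = ["wings", "fly", "gills", "swim"]
# Hypothetical "already learned" weights from 2 representation units to 4 attributes.
W = [[4.0, -4.0], [4.0, -4.0], [-4.0, 4.0], [-4.0, 4.0]]

def outputs(rep):
    return [logistic(sum(w * r for w, r in zip(row, rep))) for row in W]

def infer_representation(observed, steps=200, eps=1.0):
    """Backprop to representation: weights stay fixed; only the representation's net inputs move."""
    net = [0.0, 0.0]  # neutral start: every representation unit at 0.5
    for _ in range(steps):
        rep = [logistic(n) for n in net]
        out = outputs(rep)
        # Error signal only on attributes that were given for the new item.
        d_out = [(observed[a] - out[k]) if a in observed else 0.0 for k, a in enumerate(ATTRS)]
        for r in range(len(net)):
            net[r] += eps * rep[r] * (1 - rep[r]) * sum(d_out[k] * W[k][r] for k in range(len(W)))
    return [logistic(n) for n in net]

# Told only that the new thing has wings; then ask what else it is likely to be like.
rep = infer_representation({"wings": 1.0})
prediction = dict(zip(ATTRS, outputs(rep)))
print(rep, prediction)
```

The inferred representation moves toward the bird-like region, and the unobserved attributes (can fly, can swim) are then read out from the same fixed weights.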
40 Domain Specificity
- What constraints are required for the development and elaboration of domain-specific knowledge?
  - Are domain-specific constraints required?
  - Or are there general principles that allow for the acquisition of conceptual knowledge of all different types?
41 Differential Importance (Macario, 1991)
- 3- to 4-year-old children see a puppet and are told he likes to eat, or play with, a certain object (e.g., top object at right).
- Children then must choose another object that will be the same kind of thing to eat, or the same kind of thing to play with.
- In the first case, they tend to choose the object with the same color.
- In the second case, they tend to choose the object with the same shape.
42 Can the knowledge that one kind of property is important for one type of thing, while another is important for a different type of thing, be learned?
43 Adjustments to Training Environment
- Among the plants:
  - All trees are large.
  - All flowers are small.
  - Either can be bright or dull.
- Among the animals:
  - All birds are bright.
  - All fish are dull.
  - Either can be small or large.
- In other words:
  - Size covaries with properties that differentiate different types of plants.
  - Brightness covaries with properties that differentiate different types of animals.
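The covariation built into this environment can be verified with a small sketch. The eight items below are hypothetical, constructed only to obey the slide's rules; the code correlates size and brightness with the within-domain type distinction (tree vs. flower, bird vs. fish).

```python
import math

# Hypothetical items that follow the rules above: (name, kind, large, bright).
ITEMS = [
    ("tree_a", "tree", 1, 1), ("tree_b", "tree", 1, 0),
    ("flower_a", "flower", 0, 1), ("flower_b", "flower", 0, 0),
    ("bird_a", "bird", 1, 1), ("bird_b", "bird", 0, 1),
    ("fish_a", "fish", 1, 0), ("fish_b", "fish", 0, 0),
]

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (both must vary)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def covariation(domain_kinds, first_kind):
    """Correlate size and brightness with the within-domain type distinction."""
    rows = [r for r in ITEMS if r[1] in domain_kinds]
    is_first = [1 if r[1] == first_kind else 0 for r in rows]
    size = [r[2] for r in rows]
    bright = [r[3] for r in rows]
    return {"size": pearson(size, is_first), "brightness": pearson(bright, is_first)}

plants = covariation({"tree", "flower"}, "tree")
animals = covariation({"bird", "fish"}, "bird")
print("plants:", plants)
print("animals:", animals)
```

Size perfectly tracks the tree/flower split but is uncorrelated with the bird/fish split, and brightness shows the reverse pattern; a network sensitive to covariation can pick this up without any domain-specific rule.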
44 Testing Feature Importance
- After partial learning, the model is shown eight test objects:
  - Four animals:
    - All have skin.
    - All combinations of bright/dull and large/small.
  - Four plants:
    - All have roots.
    - All combinations of bright/dull and large/small.
- Representations are generated by using back-propagation to representation.
45 Similarities of Obtained Representations
Brightness is relevant for Animals
Size is relevant for Plants
46 In Rogers and McClelland (2004), we also address:
- Conceptual differentiation in prelinguistic infants.
- Many of the phenomena addressed by classic work on semantic knowledge from the 1970s:
  - Basic level
  - Typicality
  - Frequency
  - Expertise
- Disintegration of conceptual knowledge in semantic dementia.
- How the model can be extended to capture causal properties of objects and explanations.
- What properties a network must have to be sensitive to coherent covariation.
47 Coherence Requires Convergence
48 Semantic Representation in the Brain
- Damage to the temporal pole is associated with semantic dementia, a domain-general loss of semantic information.
- Imaging and lesion studies suggest that other brain areas are associated with more specific types of information.
- We suggest that the temporal pole serves as the convergent semantic representation in the brain,
  - with bidirectional connections to regions containing modality-specific information.
- The interface with language occurs via connections between language areas and the temporal pole.
49 In summary:
- Sensitivity to coherent covariation in experience can explain many aspects of conceptual development.
- PDP networks subject to a domain-general architectural constraint provide the necessary mechanisms.
- Our simulations do not prove that domain-general learning methods will turn out to be fully sufficient.
  - There is still room for domain- or content-specific constraints,
  - and the framework is fully compatible with their integration.
- But our findings suggest it may be worth exploring how far we can go without them.
50 Thanks for your attention!
52 Proposed Architecture for the Organization of Semantic Memory
[Figure: the temporal pole linked to regions for name, action, motion, color, form, and valence]
53 Generalization of different property types
- At different points in training, the network is taught one of:
  - Maple can queem
  - Maple is queem
  - Maple has queem
- Only weights from hidden to output are allowed to change.
- The network is then tested to see how strongly queem is activated when the same relation is paired with other items.
54 Generalization to other concepts after training with can, has, or is queem
56 Overview
- The PDP Framework for Processing, Representation, and Learning
- Complementary Learning Systems in Hippocampus and Neocortex
- Differentiation and Reorganization of Conceptual Knowledge
- Inference and Generalization
- How the Complementary Learning Systems Cooperate
- What kinds of innate constraints are necessary?
57 Modeling Inductive Inference (Osherson et al., 1990)
- General:
  - If a dolphin, a whale, and a zebra have biotin in their blood, how strong is the implication that all mammals have biotin in their blood?
- Specific:
  - If a seal and a cow have biotin in their blood, how strong is the implication that a horse has biotin in its blood?
58 PDP (as in Rogers & McClelland, 2004)
- Train a network on the item-feature matrix (50 animals, each with a 0 or 1 for each of 85 features).
[Figure: network with layers Animals → Hidden Layers → Features]
59 PDP
- Add a new feature, and train the net using the given examples. Only allow the weights to the new feature node to change, and train to a threshold of 0.85.
[Figure: network with layers Animals → Hidden Layers → Features]
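This step can be sketched as follows, assuming the premise animals are given as positive examples and that their hidden representations are already fixed by earlier training. The three-unit representations here are made up for illustration (the actual network is trained on the 50 × 85 matrix), and the function names are hypothetical. Only the weights into the new feature unit (plus a bias) change, and training stops once every given example reaches 0.85.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical fixed hidden representations (the trained network itself is not touched).
HIDDEN = {
    "seal":  [0.9, 0.8, 0.1],
    "cow":   [0.2, 0.9, 0.8],
    "horse": [0.2, 0.8, 0.9],
    "trout": [0.9, 0.1, 0.1],
}

def feature(w, h):
    """Activation of the new feature unit for hidden pattern h (last weight acts as a bias)."""
    return logistic(sum(wj * hj for wj, hj in zip(w, h + [1.0])))

def train_new_feature(examples, threshold=0.85, eps=0.5, max_steps=10000):
    """Adjust only the weights into the new unit until every given example reaches threshold."""
    w = [0.0] * (len(HIDDEN[examples[0]]) + 1)
    for _ in range(max_steps):
        acts = [feature(w, HIDDEN[item]) for item in examples]
        if min(acts) >= threshold:
            break
        for item, a in zip(examples, acts):
            for j, hj in enumerate(HIDDEN[item] + [1.0]):
                w[j] += eps * (1.0 - a) * hj
    return w

# "If a seal and a cow have biotin in their blood..."
w = train_new_feature(["seal", "cow"])
horse = feature(w, HIDDEN["horse"])
trout = feature(w, HIDDEN["trout"])
print(round(horse, 3), round(trout, 3))
```

The activation of the new feature for an untrained item (here, horse) serves as the network's graded judgment of argument strength; it depends on how similar that item's hidden representation is to those of the premises.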
60 Results
- Using Osherson et al.'s similarity-based model with the network's hidden representations, which emphasize coherent covariation, results in improved performance.
- Kemp, Perfors, and Tenenbaum's Bayes Tree model seems to do even better, but we suspect possible over-fitting.
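Osherson et al.'s similarity-coverage idea combines two terms: how similar the conclusion item is to its most similar premise, and how well the premises "cover" the lowest-level category that includes premises and conclusion (here assumed to be mammals). The sketch below is a simplified version, assuming cosine similarity between hidden representations, a mixing weight alpha = 0.5, and made-up three-unit representations; it is not the exact formulation used in the reported simulations.

```python
import math

# Hypothetical three-unit hidden representations (made up for illustration).
REPS = {
    "seal":    [1.0, 0.2, 0.0],
    "dolphin": [1.0, 0.0, 0.1],
    "whale":   [0.9, 0.1, 0.0],
    "cow":     [0.1, 1.0, 0.2],
    "horse":   [0.1, 0.9, 0.3],
    "zebra":   [0.1, 0.8, 0.4],
    "mouse":   [0.1, 0.3, 1.0],
}
MAMMALS = list(REPS)

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def max_similarity(premises, target):
    """Similarity of a target item to its most similar premise item."""
    return max(cosine(REPS[p], REPS[target]) for p in premises)

def argument_strength(premises, conclusion, category, alpha=0.5):
    """alpha * similarity(premises, conclusion) + (1 - alpha) * coverage of the category."""
    similarity = max_similarity(premises, conclusion)
    coverage = sum(max_similarity(premises, m) for m in category) / len(category)
    return alpha * similarity + (1 - alpha) * coverage

diverse = argument_strength(["seal", "cow"], "horse", MAMMALS)
narrow = argument_strength(["seal", "dolphin"], "horse", MAMMALS)
print(round(diverse, 3), round(narrow, 3))
```

Swapping in the network's hidden representations for the similarity function is what lets the coherent-covariation structure learned by the network shape the model's inductive judgments.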
61 Use the hippocampal memory system to store a memory for the learning episode.
[Figure: Hippocampus; sparrow]
If the pattern can be reinstated at a later time, it can be used to support further inferences.
62 Relation-specific representations
- IS representations (top) reflect idiosyncratic appearance properties.
- HAS representations are similar to the context-general representations (middle).
- CAN representations collapse differences between plants, since there is little that plants can do.
  - The fish are all the same, because there's no difference in what they can do.
63 What About Causal Knowledge?
- Young children can attribute causal powers to objects based on single observations of scenarios in which the objects participate.
  - Gopnik's blicket experiments
- Causal powers are central to children's generalization of category membership.
  - They assign the same name to objects with different appearance properties but similar causal powers.
- Do we need an innate mechanism for causal inference, as Gopnik suggests, to address these findings and other aspects of children's causal reasoning abilities?
64 My Perspective
- Causal relations are not that different from other kinds of relations.
- Domain-general mechanisms that are sensitive to experience underlie the development and elaboration of causal as well as other forms of conceptual knowledge.
- These mechanisms acquire sensitivity to causal structure through gradual learning.
65 Extension of the Model to Causal Inference (In my dreams?)
- Networks can learn to form internal representations that capture causal powers of objects, based on the consequences of their participation in events.
- Appearance properties don't covary that well with the causal powers of objects.
- Furthermore, the names of (man-made) objects covary with their causal powers, not with their appearance:
  - Radio
  - Telephone
  - Razor
  - Switch
- Thus, it would be natural for networks to learn to generalize names for objects based on their causal powers, rather than their appearance.
[Figure: Item; Context (External and Internal); Sequelae]
66 But haven't you still left something under the rug?
- No, not really:
  - everything is right out in the open.
- Here's the situation:
  - Each example of an item always activates the same input unit.
  - Each context always activates the same context unit.
  - Each property always activates the same property unit.
- Each item, context, and property unit is like one of Fodor's atomic concept representations:
  - A representation R expresses the property P in virtue of its being a law that things that are P cause tokenings of R.
- Such stipulations are by no means unproblematic.
  - But everyone has this problem, including Jerry.
67 This Problem is Solved by Distributed Representations
- The localist input, context, and output units can be replaced with distributed patterns of activation (Rogers & McClelland, Chapter 5; Rogers et al., 2005; Dilkina & McClelland).
  - The units correspond to atomic microconcepts.
  - Each item, context, and property is represented by a (possibly somewhat variable) ensemble of them.
  - The number of possible concepts that can be distinguished is now far greater (2^N vs. N).
- Networks can learn:
  - Which microconcepts are important (and which combinations are important)
  - Which microconcepts should be treated as equivalent
  - Which microconcepts should be ignored
- All of this depends on patterns of covariation.
- This is a very good thing for everyone (even Jerry!):
  - it makes it possible for a system with finite resources to cover the space of possible concepts that might turn out to be needed.
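The 2^N vs. N comparison can be made concrete by enumerating codes, under the simplifying assumption that each microconcept unit is either on or off. A localist scheme with N units gives one pattern per unit, while a distributed scheme can use any on/off combination. The function names are hypothetical.

```python
from itertools import product

def localist_codes(n):
    """One dedicated unit per concept: exactly n distinguishable patterns."""
    return [tuple(1 if i == j else 0 for i in range(n)) for j in range(n)]

def distributed_codes(n):
    """Any on/off combination of n microconcept units: 2**n patterns."""
    return list(product((0, 1), repeat=n))

for n in (4, 8, 12):
    print(n, len(localist_codes(n)), len(distributed_codes(n)))
```

In practice, learned representations use graded activations and only a fraction of the possible patterns, but the count shows why a finite pool of microconcepts can cover a very large space of potential concepts.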