Organization and Emergence of Semantic Knowledge: A Parallel-Distributed Processing Approach
1
Organization and Emergence of Semantic Knowledge
A Parallel-Distributed Processing Approach
  • James L. McClelland
  • Department of Psychology
  • and
  • Center for Mind, Brain, and Computation
  • Stanford University

2
Some Phenomena in Conceptual Development
  • Progressive differentiation of concepts
  • Illusory correlations and U-shaped developmental
    trajectories
  • Conceptual reorganization
  • Domain- and property-specific constraints on
    generalization
  • Acquired sensitivity to an object's causal properties
  • What underlies these phenomena?

3
Naïve Domain Theories?
  • Mechanisms of learning are thought to be too weak:
  • They learn by contiguity and generalize by similarity.
  • But generalization is domain-dependent...
  • ...so it is proposed instead that development begins with initial constraints, in the form of innately pre-specified proto-theories that guide inference and learning.

4
An Alternative View: Sensitivity to Coherent Covariation
  • Coherent Covariation
  • The tendency of properties of objects to co-occur
    in clusters.
  • e.g.
  • Has wings
  • Can fly
  • Is light
  • Or
  • Has roots
  • Has rigid cell walls
  • Can grow tall

5
Our Answer in More Detail
  • Domain general mechanisms sensitive to experience
    underlie the development and elaboration of
    conceptual knowledge.
  • These mechanisms exploit the principles of
    parallel-distributed processing.
  • Models built on these principles are sensitive to
    coherent covariation.
  • This sensitivity is the main cause of all of the
    phenomena.

6
Principles of Parallel Distributed Processing
  • Processing occurs via interactions among
    neuron-like processing units via weighted
    connections.
  • A representation is a pattern of activation.
  • The knowledge is in the connections.
  • Learning occurs through gradual connection
    adjustment, driven by experience.
  • Both representation and processing are affected.

(Figure: network mapping between the letters H I N T and the phonemes /h/ /i/ /n/ /t/)
7
(Same as slide 6)
8
(Same as slide 6)
9
(Same as slide 6)
10
Differentiation in Development and in a simple PDP network
11
(No Transcript)
12
The Rumelhart Model
13
Quillian's Hierarchical Propositional Model
14
The Rumelhart Model: Target output for "robin can" input
15
The Training Data
All propositions true of items at the bottom level of the tree (e.g., Robin can fly, move, grow)
16
Forward Propagation of Activation
17
Back Propagation of Error (δ)

δ_i = Σ_k δ_k w_ki        (error back-propagated to unit i)
δ_k = (t_k − a_k)         (error at output unit k)

Error-correcting learning:
At the output layer: Δw_ki = ε δ_k a_i
At the prior layer:  Δw_ij = ε δ_i a_j
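For concreteness, these update rules can be written out in a few lines of numpy. This is a toy sketch: the two-layer linear network, layer sizes, learning rate, and data below are illustrative, not the simulation's actual parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 0.1                                  # learning rate (epsilon above)

a_j = rng.random(4)                        # activations at the prior layer (units j)
W_ij = 0.1 * rng.standard_normal((5, 4))   # weights from units j to units i
W_ki = 0.1 * rng.standard_normal((3, 5))   # weights from units i to output units k

a_i = W_ij @ a_j                           # hidden activations (linear units)
a_k = W_ki @ a_i                           # output activations
t_k = np.array([1.0, 0.0, 1.0])            # target output pattern

delta_k = t_k - a_k                        # delta_k = (t_k - a_k)
delta_i = W_ki.T @ delta_k                 # delta_i = sum_k delta_k * w_ki

W_ki += eps * np.outer(delta_k, a_i)       # Δw_ki = ε δ_k a_i
W_ij += eps * np.outer(delta_i, a_j)       # Δw_ij = ε δ_i a_j
```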
18
The Rumelhart Model
19
(No Transcript)
20
(No Transcript)
21
(No Transcript)
22
What Drives Progressive Differentiation?
  • Waves of differentiation reflect coherent
    covariation of properties across items.
  • Patterns of coherent covariation are reflected in
    the principal components of the property
    covariance matrix.
  • (Figure: attribute loadings on the first three principal components)
  • 1. Plants vs. animals
  • 2. Birds vs. fish
  • 3. Trees vs. flowers
  • Features drawn in the same color covary within a component; features in different colors anti-covary.
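The analysis itself is compact. Below is a sketch of it in numpy; the toy item-by-property matrix is illustrative (it is not the model's actual training corpus), but it shows how attribute loadings on the leading components fall out of the property covariance matrix.

```python
import numpy as np

# Toy item-by-property matrix: rows = items (2 trees, 2 flowers, 2 birds,
# 2 fish), columns = properties; 1 means the item has the property.
P = np.array([
    # roots tall pretty skin wings fins
    [1, 1, 0, 0, 0, 0],   # pine
    [1, 1, 0, 0, 0, 0],   # oak
    [1, 0, 1, 0, 0, 0],   # rose
    [1, 0, 1, 0, 0, 0],   # daisy
    [0, 0, 0, 1, 1, 0],   # robin
    [0, 0, 0, 1, 1, 0],   # canary
    [0, 0, 0, 1, 0, 1],   # salmon
    [0, 0, 0, 1, 0, 1],   # sunfish
], dtype=float)

C = np.cov(P.T)                  # property covariance matrix
vals, vecs = np.linalg.eigh(C)   # eigenvalues in ascending order
loadings = vecs[:, ::-1][:, :3]  # attribute loadings, largest components first
print(loadings)                  # first component: plants vs. animals; later
                                 # components pick up the finer splits
```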

23
Now wait just a minute
  • Didn't you tell the network the taxonomic organization directly?
  • Pine ISA Tree, Plant
  • Robin ISA Bird, Animal
  • Yes we did.
  • We do think names kids hear for things affect
    their conceptual representations.
  • But labels aren't necessary as long as an item's properties exhibit coherent covariation.

24
Coherence Training Patterns
(Figure: items × properties training matrix; properties divided into coherent and incoherent sets)
No labels are provided. Each item and each property occurs with equal frequency.
25
(Figure: the same items × properties matrix, now divided by context: IS, CAN, and HAS)
Note: coherence is present between, not within, training experiences!
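A sketch of how training patterns with this structure might be generated; the layout is assumed from the slides' description, not taken from the actual simulation code.

```python
import numpy as np

rng = np.random.default_rng(1)
n_items, n_clusters, n_incoherent = 16, 4, 8

# Coherent properties: an item's cluster determines a block of three
# properties that occur together (across, not within, training experiences).
cluster = np.repeat(np.arange(n_clusters), n_items // n_clusters)
coherent = np.kron(np.eye(n_clusters)[cluster], np.ones((1, 3)))

# Incoherent properties: each occurs in exactly half of the items, but
# independently of every other property, so it covaries with nothing.
half = [0] * (n_items // 2) + [1] * (n_items // 2)
incoherent = np.column_stack([rng.permutation(half) for _ in range(n_incoherent)])

# No labels; items are presented equally often.
patterns = np.hstack([coherent, incoherent])
```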
26
Effects of Coherence on Learning
(Figure: learning curves for coherent vs. incoherent properties)
27
Effect of Coherence on Representation
28
Effects of Coherent Variation on Learning in
Connectionist Models
  • Attributes that vary together create the acquired
    concepts that populate the taxonomic hierarchy,
    and determine which properties are central and
    which are incidental to a given concept.
  • Labeling of these concepts or their properties is
    in no way necessary.
  • But it is easy to learn names for such concepts.
  • Arbitrary properties (those that do not co-vary
    with others) are very difficult to learn.
  • And it is harder to learn names for concepts that
    are only differentiated by such arbitrary
    properties.

29
Where are we on that list of Phenomena?
  • Progressive differentiation of concepts
  • Illusory correlations and U-shaped developmental
    trajectories
  • Conceptual reorganization
  • Domain- and property-specific constraints on
    generalization
  • Acquired sensitivity to an object's causal properties

30
Illusory Correlations
  • Rochel Gelman found that children think that all
    animals have feet.
  • Even animals that look like small furry balls and don't seem to have any feet at all.

31
(Figure: a typical property that a particular object lacks, e.g., "pine has leaves," vs. an infrequent, atypical property)
32
Conceptual Reorganization (Carey, 1985)
  • Carey demonstrates that young children discover
    the unity of plants and animals as living things
    with many shared properties only around the age
    of 10.
  • She suggests that the coalescence of the concept of living thing depends on learning about diverse aspects of plants and animals, including:
  • Nature of life-sustaining processes
  • What it means to be dead vs. alive
  • Reproductive properties
  • Can reorganization occur in a connectionist net?

33
Conceptual Reorganization in the Model
  • Suppose superficial appearance information, which
    is not coherent with much else, is always
    available
  • And there is a pattern of coherent covariation
    across information that is contingently available
    in different contexts.
  • The model forms initial representations based on
    superficial appearances.
  • Later, it discovers the shared structure that
    cuts across the different contexts, reorganizing
    its representations.

34
Organization of Conceptual Knowledge Early and
Late in Development
35
Inference and Generalization in the PDP Model
  • A semantic representation for a new item can be
    derived by error propagation from given
    information, using knowledge already stored in
    the weights.
  • Crucially:
  • The similarity structure, and hence the pattern of generalization, depends on the knowledge already stored in the weights.

36
Start with a neutral representation on the
representation units. Use backprop to adjust the
representation to minimize the error.
37
The result is a representation similar to that of
the average bird
38
Use the representation to infer what this new
thing can do.
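In numpy, the procedure of the last three slides looks roughly like this. A hypothetical frozen network stands in for the trained Rumelhart model; the sizes, weights, and the particular "given" properties are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n_rep, n_hid, n_out = 8, 15, 12
lr, n_steps = 0.5, 200

# Weights of an (assumed) already-trained network; these stay frozen.
W_rh = 0.5 * rng.standard_normal((n_hid, n_rep))
W_ho = 0.5 * rng.standard_normal((n_out, n_hid))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# The given information: only a few output properties are observed.
known = np.zeros(n_out, dtype=bool)
known[:3] = True                   # e.g., told that the new thing "can fly"
t = np.zeros(n_out)
t[:3] = 1.0

rep = np.zeros(n_rep)              # start from a neutral representation
for _ in range(n_steps):
    h = sigmoid(W_rh @ rep)
    o = sigmoid(W_ho @ h)
    d_o = np.where(known, (t - o) * o * (1 - o), 0.0)  # error only on given info
    d_h = (W_ho.T @ d_o) * h * (1 - h)
    rep += lr * (W_rh.T @ d_h)     # adjust the representation, not the weights

# Use the settled representation to infer what the new thing can do.
print(sigmoid(W_ho @ sigmoid(W_rh @ rep)))
```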
39
Inference and Generalization in the PDP Model
  • A semantic representation for a new item can be
    derived by error propagation from given
    information, using knowledge already stored in
    the weights.
  • Crucially:
  • The similarity structure, and hence the pattern
    of generalization, depends on the knowledge
    already stored in the weights.

40
Domain Specificity
  • What constraints are required for development and
    elaboration of domain-specific knowledge?
  • Are domain-specific constraints required?
  • Or are there general principles that allow for
    acquisition of conceptual knowledge of all
    different types?

41
Differential Importance (Macario, 1991)
  • 3- to 4-year-old children see a puppet and are told he likes to eat, or play with, a certain object (e.g., top object at right)
  • Children then must choose another one that will be "the same kind of thing to eat" or "the same kind of thing to play with."
  • In the first case they tend to choose the object
    with the same color.
  • In the second case they will tend to choose the
    object with the same shape.

42
  • Can the knowledge that one kind of property is
    important for one type of thing while another is
    important for a different type of thing be
    learned?

43
Adjustments to Training Environment
  • Among the plants
  • All trees are large
  • All flowers are small
  • Either can be bright or dull
  • Among the animals
  • All birds are bright
  • All fish are dull
  • Either can be small or large
  • In other words
  • Size covaries with properties that differentiate
    different types of plants
  • Brightness covaries with properties that
    differentiate different types of animals

44
Testing Feature Importance
  • After partial learning, the model is shown eight test objects:
  • Four Animals
  • All have skin
  • All combinations of bright/dull and large/small
  • Four Plants
  • All have roots
  • All combinations of bright/dull and large/small
  • Representations are generated by using back-propagation to the representation layer.

45
Similarities of Obtained Representations
Brightness is relevant for Animals
Size is relevant for Plants
46
  • In Rogers and McClelland (2004) we also address:
  • Conceptual differentiation in prelinguistic
    infants.
  • Many of the phenomena addressed by classic work on semantic knowledge from the 1970s:
  • Basic level
  • Typicality
  • Frequency
  • Expertise
  • Disintegration of conceptual knowledge in
    semantic dementia
  • How the model can be extended to capture causal
    properties of objects and explanations.
  • What properties a network must have to be
    sensitive to coherent covariation.

47
Coherence Requires Convergence
48
Semantic Representation in the Brain
  • Damage to temporal pole is associated with
    semantic dementia, a domain-general loss of
    semantic information
  • Imaging and lesion studies suggest that other
    brain areas are associated with more specific
    types of information.
  • We suggest that the temporal pole serves as the
    convergent semantic representation in the brain.
  • With bi-directional connections to regions containing modality-specific information.
  • The interface with language occurs via
    connections between language areas and temporal
    pole.

49
  • In summary:
  • Sensitivity to coherent covariation in experience can explain many aspects of conceptual development.
  • PDP networks subject to a domain-general architectural constraint provide the necessary mechanisms.
  • Our simulations do not prove that domain-general learning methods will turn out to be fully sufficient.
  • There is still room for domain- or
    content-specific constraints
  • And the framework is fully compatible with their
    integration.
  • But our findings suggest it may be worth
    exploring how far we can go without them.

50
Thanks for your attention!
51
(No Transcript)
52
Proposed Architecture for the Organization of
Semantic Memory
(Figure: the temporal pole is bidirectionally connected to regions representing name, action, motion, color, form, and valence)
53
Generalization of different property types
  • At different points in training, the network is taught one of:
  • "Maple can queem"
  • "Maple is queem"
  • "Maple has queem"
  • Only weights from hidden to output are allowed to change.
  • Network is then tested to see how strongly "queem" is activated when the same relation is paired with other items.

(Figure: the network with a new output unit, "queem")
54
Generalization to other concepts after training with "can queem," "has queem," or "is queem"
55
(No Transcript)
56
Overview
  • The PDP Framework for Processing, Representation
    and Learning
  • Complementary Learning Systems in Hippocampus and Neocortex
  • Differentiation and Reorganization of Conceptual
    Knowledge
  • Inference and Generalization
  • How the Complementary Learning Systems Cooperate
  • What kinds of innate constraints are necessary?

57
Modeling Inductive Inference (Osherson et al., 1990)
  • General
  • If a dolphin, a whale, and a zebra have biotin in
    their blood, how strong is the implication that
    all mammals have biotin in their blood?
  • Specific
  • If a seal and a cow have biotin in their blood,
    how strong is the implication that a horse has
    biotin in its blood?

58
PDP (as in Rogers & McClelland, 2004)
  • Train a network on the item-feature matrix (50
    animals have a 0 or 1 for each of 85 features)

(Network diagram: Animals → Hidden Layers → Features)
59
PDP
  • Add a new feature, and train the net using the given examples. Only allow the weights to the new feature node to change, and train to a threshold of 0.85 (see the sketch below).

(Network diagram: Animals → Hidden Layers → Features, including the new feature unit)
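A sketch of this regime: the 50 items and the 0.85 criterion come from these slides, while the hidden representations, premise indices, hidden size, and learning rate are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(3)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hidden representations of the 50 animals from an (assumed) trained network.
H = rng.random((50, 16))

premises = [3, 7, 21]   # e.g., dolphin, whale, zebra (illustrative indices)
w = np.zeros(16)        # weights into the new feature unit; every other
b = 0.0                 # weight in the network stays frozen

# Train only the new feature's weights until each premise item exceeds 0.85.
while sigmoid(H[premises] @ w + b).min() < 0.85:
    for i in premises:
        out = sigmoid(H[i] @ w + b)
        grad = (1.0 - out) * out * (1.0 - out)   # delta rule, target = 1
        w += 0.1 * grad * H[i]
        b += 0.1 * grad

strength = sigmoid(H @ w + b)   # generalization of the new feature to all items
```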
60
Results
  • Using Osherson et al.'s similarity-based model with the network's hidden representations, which emphasize coherent covariation, results in improved performance (see the sketch below).
  • Kemp, Perfors, and Tenenbaum's Bayes Tree model seems to do even better, but we suspect possible over-fitting.
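As a deliberately simplified stand-in for that scoring scheme, one can rank arguments by nearest-premise cosine similarity over the network's hidden representations. Osherson et al.'s actual similarity-coverage model is more elaborate; the representations and indices here are illustrative.

```python
import numpy as np

def argument_strength(H, premises, conclusion_set):
    """Average, over the conclusion set, of each item's cosine similarity
    to its nearest premise item in hidden-representation space."""
    Hn = H / np.linalg.norm(H, axis=1, keepdims=True)
    sims = Hn[conclusion_set] @ Hn[premises].T
    return sims.max(axis=1).mean()

rng = np.random.default_rng(4)
H = rng.random((50, 16))   # hypothetical hidden representations of 50 animals

# "General": dolphin, whale, zebra -> all mammals (indices illustrative)
print(argument_strength(H, [3, 7, 21], list(range(20))))
# "Specific": seal, cow -> horse
print(argument_strength(H, [5, 11], [13]))
```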

61
Use the hippocampal memory system to store a memory for the learning episode.
(Figure: hippocampus coupled to the network; example input "sparrow")
If the pattern can be reinstated at a later time, it can be used to support further inferences.
62
Relation-specific representations
  • IS representations (top) reflect idiosyncratic appearance properties.
  • HAS representations are similar to the context-general representations (middle).
  • CAN representations collapse differences between plants, since there is little that plants can do.
  • The fish are all the same, because there's no difference in what they can do.

63
What About Causal Knowledge?
  • Young children can attribute causal powers to
    objects based on single observations of scenarios
    in which the objects participate.
  • Gopnik's blicket experiments
  • Causal powers are central to children's generalization of category membership:
  • They assign the same name to objects with different appearance properties but similar causal powers.
  • Do we need an innate mechanism for causal inference, as Gopnik suggests, to address these findings and other aspects of children's causal reasoning abilities?

64
My Perspective
  • Causal relations are not that different from other kinds of relations.
  • Domain general mechanisms that are sensitive to
    experience underlie the development and
    elaboration of causal as well as other forms of
    conceptual knowledge.
  • These mechanisms acquire sensitivity to causal
    structure through gradual learning.

65
Extension of the Model to Causal Inference (In my dreams?)
  • Networks can learn to form internal
    representations that capture causal powers of
    objects, based on the consequences of their
    participation in events.
  • Appearance properties don't covary all that well with the causal powers of objects.
  • Furthermore, the names of (man-made) objects
    co-vary with their causal powers, not with their
    appearance.
  • Radio
  • Telephone
  • Razor
  • Switch
  • Thus, it would be natural for networks to learn
    to generalize names for objects based on their
    causal powers, rather than their appearance.

(Network diagram: Item and Context (external and internal) → Sequelae)
66
But haven't you still left something under the rug?
  • No, not really:
  • everything is right out in the open.
  • Here's the situation:
  • Each example of an item always activates the same
    input unit.
  • Each context always activates the same context
    unit.
  • Each property always activates the same property
    unit.
  • Each item, context, and property unit is like one of Fodor's atomic concept representations:
  • "A representation R expresses the property P in virtue of its being a law that things that are P cause tokenings of R."
  • Such stipulations are by no means unproblematic
  • But everyone has this problem, including Jerry

67
This Problem is Solved by Distributed
Representations
  • The localist input, context and output units can
    be replaced with distributed patterns of
    activation
  • (Rogers & McClelland, Chapter 5; Rogers et al., 2005; Dilkina & McClelland).
  • The units correspond to atomic microconcepts
  • Each item, context, and property is represented
    by a (possibly somewhat variable) ensemble of
    them.
  • The number of possible concepts that can be distinguished is now far greater (2^N vs. N).
  • Networks can learn
  • Which microconcepts are important (and which
    combinations are important)
  • Which microconcepts should be treated as
    equivalent
  • Which microconcepts should be ignored
  • All of this depends on patterns of covariation.
  • This is a very good thing for everyone (even
    Jerry!)
  • it makes it possible for a system with finite
    resources to cover the space of possible concepts
    that might turn out to be needed.