Title: Optimality in Cognition and Grammar
1Optimality in Cognition and Grammar
- Paul Smolensky
- Cognitive Science Department, Johns Hopkins University
- Plan of lectures
- Cognitive architecture: symbols and optimization in neural networks
- Optimization in grammar: HG → OT. From numerical to algebraic optimization in grammar
- OT and nativism: the initial state and neural/genomic encoding of UG
2The ICS Hypothesis
- The Integrated Connectionist/Symbolic Cognitive Architecture (ICS)
- In higher cognitive domains, representations and functions are well approximated by symbolic computation
- The Connectionist Hypothesis is correct
- Thus, cognitive theory must supply a computational reduction of symbolic functions to PDP computation
3Levels
4The ICS Architecture
5Representation
6Tensor Product Representations
[Figure: depth 0]
7Local tree realizations
8The ICS Isomorphism
Tensor product representations
Tensorial networks
9Tensor Product Representations
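10Binding by Synchrony
- give(John, book, Mary) (Shastri & Ajjanagadde 1993)
- [Figure: unit activity against time; one time slot is labeled $r_1 \otimes [f_{\text{book}} + f_{\text{give-obj}}]$]
- $s = r_1 \otimes [f_{\text{book}} + f_{\text{give-obj}}] + r_3 \otimes [f_{\text{Mary}} + f_{\text{recipient}}] + r_2 \otimes [f_{\text{giver}} + f_{\text{John}}]$
- Tesar & Smolensky 1994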
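The binding of fillers to roles on this slide can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the one-hot role and filler vectors, the feature names, and the helper functions `outer`, `unbind` and `active` are hypothetical and do not come from the lecture. With orthonormal role vectors, each filler bundle can be recovered exactly from the superposition.

```python
# Sketch: tensor product binding, s = sum_i r_i (x) f_i, with hypothetical vectors.

def outer(r, f):
    """Tensor (outer) product of role r and filler f: entry [i][j] = r[i] * f[j]."""
    return [[ri * fj for fj in f] for ri in r]

def madd(m1, m2):
    """Elementwise sum of two equally shaped matrices (superposition)."""
    return [[x + y for x, y in zip(row1, row2)] for row1, row2 in zip(m1, m2)]

def vadd(u, v):
    """Elementwise sum of two vectors."""
    return [x + y for x, y in zip(u, v)]

def unbind(s, r):
    """Recover the filler bound to role r (exact when roles are orthonormal)."""
    return [sum(r[i] * s[i][j] for i in range(len(r))) for j in range(len(s[0]))]

# Assumed orthonormal role vectors (time slots, under binding by synchrony).
r1, r2, r3 = [1, 0, 0], [0, 1, 0], [0, 0, 1]

# Assumed one-hot filler vectors, one per feature.
names = ["book", "give-obj", "giver", "John", "Mary", "recipient"]
f = {n: [1 if i == k else 0 for i in range(len(names))] for k, n in enumerate(names)}

# Superimpose the three role/filler bindings for give(John, book, Mary).
s = outer(r1, vadd(f["book"], f["give-obj"]))
s = madd(s, outer(r2, vadd(f["giver"], f["John"])))
s = madd(s, outer(r3, vadd(f["Mary"], f["recipient"])))

def active(vec):
    """Names of the features that are switched on in a filler vector."""
    return [n for n, x in zip(names, vec) if x == 1]

slot1 = active(unbind(s, r1))
slot2 = active(unbind(s, r2))
slot3 = active(unbind(s, r3))
```

One-hot vectors keep the arithmetic transparent; distributed (non-one-hot) vectors would work the same way as long as the role vectors stay linearly independent.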
11The ICS Architecture
12Two Fundamental Questions
- Harmony maximization is satisfaction of parallel, violable constraints
- 2. What are the constraints?
- Knowledge representation
- Prior question
- 1. What are the activation patterns (data structures, mental representations) evaluated by these constraints?
13Representation
14Two Fundamental Questions
- Harmony maximization is satisfaction of parallel, violable constraints
- 2. What are the constraints?
- Knowledge representation
- Prior question
- 1. What are the activation patterns (data structures, mental representations) evaluated by these constraints?
15Constraints
NOCODA: A syllable has no coda (Maori/French/English)
$H(a_{[\sigma\, k\, \text{æ}\, t]}) = -s_{\mathrm{NOCODA}} < 0$
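A minimal sketch of how a single violable constraint contributes negative Harmony, reading the slide as saying that a coda costs a strength s_NOCODA. The numerical strength, the vowel inventory and the helper names are assumptions made for illustration only.

```python
# Sketch: Harmony contribution of NOCODA for a syllabified form.
S_NOCODA = 1.0  # hypothetical strength; the lecture gives no numerical value

def count_codas(syllables, vowels="aeiouæ"):
    """Count syllables that end in a consonant, i.e. that have a coda."""
    return sum(1 for syl in syllables if syl and syl[-1] not in vowels)

def harmony_nocoda(syllables, strength=S_NOCODA):
    """Each coda lowers Harmony by `strength`."""
    return -strength * count_codas(syllables)

h_with_coda = harmony_nocoda(["kæt"])   # one syllable, coda t
h_no_coda = harmony_nocoda(["kæ"])      # one syllable, no coda
```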
16The ICS Architecture
[Figure labels: kæt; σ k æ t; A]
17The ICS Architecture
[Figure labels: kæt; σ k æ t; A]
18Constraint Interaction I
- ICS → Grammatical theory
- Harmonic Grammar
- Legendre, Miyata & Smolensky 1990 et seq.
19Constraint Interaction I
The grammar generates the representation that maximizes H: this best satisfies the constraints, given their differential strengths.
Any formal language can be so generated.
20The ICS Architecture
[Figure labels: G; kæt; σ k æ t; A]
21Harmonic Grammar Parser
- Simple, comprehensible network
- Simple grammar G
- X → A B; Y → B A
- Language processing: completion
22The ICS Architecture
23Simple Network Parser
- Fully self-connected, symmetric network
- Like the previously shown network, except with 12 units; representations and connections shown below
24Harmonic Grammar Parser
H(Y, A) > 0; H(Y, B) > 0
- Weight matrix for Y → B A
25Harmonic Grammar Parser
- Weight matrix for X → A B
26Harmonic Grammar Parser
- Weight matrix for entire grammar G
27Bottom-up Processing
28Top-down Processing
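The idea behind the parser slides (one weight matrix per rule, summed into a matrix for the whole grammar, then completion by Harmony maximization) can be sketched as follows. This is a toy under loud assumptions: it uses 6 local units rather than the 12-unit network of the lecture, the weights are all 1, Harmony is taken to be H(a) = ½ aᵀWa, and the candidate set is restricted to one symbol per position. None of these choices is taken from the slides.

```python
# Sketch: rule weight matrices, their superposition, and completion.
from itertools import product

# Hypothetical local units: one per (position, symbol) pair.
UNITS = [("parent", "X"), ("parent", "Y"),
         ("left", "A"), ("left", "B"),
         ("right", "A"), ("right", "B")]
IDX = {u: i for i, u in enumerate(UNITS)}

def rule_weights(parent, left, right):
    """Symmetric matrix rewarding co-activation of a parent with its two children."""
    n = len(UNITS)
    w = [[0.0] * n for _ in range(n)]
    p = IDX[("parent", parent)]
    for child in (IDX[("left", left)], IDX[("right", right)]):
        w[p][child] = w[child][p] = 1.0
    return w

def madd(w1, w2):
    """Elementwise sum of two weight matrices."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(w1, w2)]

W_X = rule_weights("X", "A", "B")   # X -> A B
W_Y = rule_weights("Y", "B", "A")   # Y -> B A
W_G = madd(W_X, W_Y)                # whole grammar G

def harmony(a, w=W_G):
    """Assumed Harmony function H(a) = 1/2 * a^T W a."""
    n = len(a)
    return 0.5 * sum(a[i] * w[i][j] * a[j] for i in range(n) for j in range(n))

def state(parent, left, right):
    """Activation vector with exactly one unit on per position."""
    a = [0.0] * len(UNITS)
    for u in (("parent", parent), ("left", left), ("right", right)):
        a[IDX[u]] = 1.0
    return a

def complete(parent=None, left=None, right=None):
    """Fill the unspecified positions with the Harmony-maximizing symbols."""
    candidates = [(p, l, r) for p, l, r in product("XY", "AB", "AB")
                  if parent in (None, p) and left in (None, l) and right in (None, r)]
    return max(candidates, key=lambda c: harmony(state(*c)))

bottom_up = complete(left="A", right="B")   # children given, parent inferred
top_down = complete(parent="Y")             # parent given, children inferred
```

Bottom-up and top-down processing are the same operation here: clamp part of the pattern and let Harmony maximization fill in the rest.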
29Scaling up
- Not yet
- Still conceptual obstacles to surmount
30Explaining Productivity
- Approaching full-scale parsing of formal languages by neural-network Harmony maximization
- Have other networks (like PassiveNet) that provably compute recursive functions
- → productive competence
- How to explain?
31 1. Structured representations
32 2. Structured connections
33 Proof of Productivity
- Productive behavior follows mathematically from combining
- the combinatorial structure of the vectorial representations encoding inputs and outputs
- and
- the combinatorial structure of the weight matrices encoding knowledge
34Explaining Productivity I
- Intra-level decomposition (PSA and ICS): [A B] → {A, B}
- Inter-level decomposition (ICS): [A B] → (1, 0, −1, …, 1)
35Explaining Productivity II
Functions Semantics
- Intra-level decomposition (ICS and PSA): G → {X → A B, Y → B A}
- Inter-level decomposition: W(G) → (1, 0, −1, 0)
36The ICS Architecture
37The ICS Architecture
38Constraint Interaction II: OT
- ICS → Grammatical theory
- Optimality Theory
- Prince & Smolensky 1991, 1993/2004
39Constraint Interaction II: OT
- Differential strength encoded in strict domination hierarchies (≫)
- Every constraint has complete priority over all lower-ranked constraints (combined)
- Approximate numerical encoding employs special (exponentially growing) weights
- Grammars can't count
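A small sketch of why exponentially growing weights can mimic strict domination, and why the match is only approximate. The violation profiles and the helper names are hypothetical. The check below compares lexicographic (OT-style) evaluation with a weighted sum (HG-style) whose weights are powers of a base: the two agree whenever no constraint is violated as many times as the base, and can disagree otherwise.

```python
# Sketch: strict domination versus exponentially weighted Harmony.
from itertools import product

def ot_better(v1, v2):
    """Strict domination: compare violation profiles lexicographically,
    highest-ranked constraint first; fewer violations is better."""
    return tuple(v1) < tuple(v2)

def hg_harmony(violations, base):
    """Weighted sum with weights base**k, growing toward the top of the ranking."""
    n = len(violations)
    return -sum(v * base ** (n - 1 - k) for k, v in enumerate(violations))

def agree(base, max_violations, n_constraints):
    """True if both evaluations order every pair of profiles the same way."""
    profiles = list(product(range(max_violations + 1), repeat=n_constraints))
    return all(ot_better(p, q) == (hg_harmony(p, base) > hg_harmony(q, base))
               for p in profiles for q in profiles)

# Hypothetical profiles, top-ranked constraint first.
a = (0, 3, 3)   # satisfies the top constraint, violates the lower ones three times each
b = (1, 0, 0)   # violates only the top constraint, once

wide_enough = agree(base=4, max_violations=3, n_constraints=3)
too_narrow = agree(base=2, max_violations=3, n_constraints=3)
```

With base 2, three violations of a lower constraint outweigh one violation of the top constraint, so the weighted grammar starts to count; strict domination never does.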
40Constraint Interaction II: OT
- Stress is on the initial heavy syllable iff the number of light syllables n obeys [condition not recoverable]
- No way, man
41Constraint Interaction II: OT
- Differential strength encoded in strict domination hierarchies (≫)
- Constraints are universal (Con)
- Candidate outputs are universal (Gen)
- Human grammars differ only in how these constraints are ranked
- factorial typology
- First true contender for a formal theory of cross-linguistic typology
- 1st innovation of OT: constraint ranking
- 2nd innovation: Faithfulness
42The Faithfulness/Markedness Dialectic
- cat: /kat/ → kæt violates NOCODA: why?
- FAITHFULNESS requires pronunciation = lexical form
- MARKEDNESS often opposes it
- Markedness-Faithfulness dialectic → diversity
- English: FAITH ≫ NOCODA
- Polynesian: NOCODA ≫ FAITH (French)
- Another markedness constraint M:
- Nasal Place Agreement, or Assimilation (NPA):
ŋg ≻ ŋb, ŋd velar
nd ≻ md, ŋd coronal
mb ≻ nb, ŋb labial
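The re-ranking idea on this slide can be sketched as a tiny factorial typology. The two-candidate set for /kat/ and its violation counts are hypothetical simplifications (a real Gen supplies many more candidates); the function name `optimal` is likewise an assumption.

```python
# Sketch: factorial typology over FAITH and NOCODA for input /kat/.
from itertools import permutations

# Hypothetical candidates with violation counts per constraint.
CANDIDATES = {
    "kat": {"FAITH": 0, "NOCODA": 1},   # faithful, but has a coda
    "ka":  {"FAITH": 1, "NOCODA": 0},   # deletes /t/, so no coda
}

def optimal(ranking, candidates=CANDIDATES):
    """Candidate whose violation profile is lexicographically least under the
    ranking (highest-ranked constraint first), i.e. the strict-domination winner."""
    return min(candidates, key=lambda c: tuple(candidates[c][k] for k in ranking))

# One grammar per ranking: 2 constraints give 2! = 2 possible grammars.
typology = {ranking: optimal(ranking)
            for ranking in permutations(["FAITH", "NOCODA"])}
```

The ranking FAITH ≫ NOCODA keeps the coda (the English-type outcome), and NOCODA ≫ FAITH removes it (the Polynesian-type outcome).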
43The ICS Architecture
44Optimality Theory
- Diversity of contributions to theoretical linguistics
- Phonology and phonetics
- Syntax
- Semantics and pragmatics
- e.g., following lectures. Now:
- Can strict domination be explained by connectionism?
45Case study
- Syllabification in Berber
- Plan
- Data, then
- OT grammar → Harmonic Grammar → Network
46Syllabification in Berber
- Dell & Elmedlaoui, 1985: Imdlawn Tashlhit Berber
- Syllable nucleus can be any segment
- But driven by universal preference for nuclei to be highest-sonority segments
47Berber syllable nuclei have maximal sonority
48OT Grammar: BrbrOT
- HNUC: A syllable nucleus is sonorous
- ONSET: A syllable has an onset
- Strict Domination (Prince & Smolensky 93/04)
49Harmonic Grammar: BrbrHG
- HNUC: A syllable nucleus is sonorous
- Nucleus of sonority s: Harmony = $2^s - 1$
- $s \in \{1, 2, \ldots, 8\}$ for the segment classes {t, d, f, z, n, l, i, a}
- ONSET: *VV: Harmony = $-2^8$
- Theorem. The global Harmony maxima are the correct Berber core syllabifications of Dell & Elmedlaoui (no sonority plateaux, as in OT analysis, here and henceforth)
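A brute-force sketch of the Harmony function on this slide, reading the nucleus reward as 2^s − 1 and the *VV penalty as −2^8 per pair of adjacent nuclei. The exhaustive search over binary nucleus assignments, the adjacency form of the penalty, and the helper names (`harmony`, `best_parse`, `syllabify`) are assumptions for illustration; they are not the network of the lecture.

```python
# Sketch: global Harmony maximum of BrbrHG by exhaustive search.
from itertools import product

def harmony(profile, nuclei):
    """Sum of (2**s - 1) over nuclei, minus 2**8 for each adjacent pair of nuclei."""
    h = sum(2 ** s - 1 for s, v in zip(profile, nuclei) if v)
    h -= 2 ** 8 * sum(1 for v, w in zip(nuclei, nuclei[1:]) if v and w)
    return h

def best_parse(profile):
    """Nucleus assignment (1 = nucleus, 0 = margin) with maximal Harmony."""
    return max(product((0, 1), repeat=len(profile)),
               key=lambda nuclei: harmony(profile, nuclei))

def syllabify(profile, nuclei):
    """Render a legal parse with dots at syllable boundaries. A margin right
    before a nucleus is an onset; any other margin is a coda."""
    out = ""
    for i, s in enumerate(profile):
        onset = not nuclei[i] and i + 1 < len(nuclei) and nuclei[i + 1]
        bare_nucleus = nuclei[i] and (i == 0 or nuclei[i - 1])
        if onset or bare_nucleus:
            out += "."
        out += str(s)
    return out + "."

hardest = (1, 2, 3, 7, 8)                    # sonority profile 12378
longer = (8, 1, 2, 1, 3, 4, 5, 7, 8, 7)      # sonority profile 8121345787

hardest_parse = syllabify(hardest, best_parse(hardest))
longer_parse = syllabify(longer, best_parse(longer))
```

Exhaustive search is only feasible for short inputs (2^n assignments for n segments); it serves here as a reference against which a network's output could be compared.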
50BrbrNet realizes BrbrHG
51BrbrNet's Global Harmony Maximum is the correct parse
- Contrasts with Goldsmith's Dynamic Linear Models (Goldsmith & Larson 90; Prince 93)
- For a given input string, a state of BrbrNet is a global Harmony maximum if and only if it realizes the syllabification produced by the serial Dell-Elmedlaoui algorithm
52BrbrNet's Search Dynamics
- Greedy local optimization
- at each moment, make a small change of state so as to maximally increase Harmony
- (gradient ascent: mountain climbing in fog)
- guaranteed to construct a local maximum
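Greedy local optimization can be sketched as projected gradient ascent on a continuous Harmony function. Everything specific below is an assumption rather than the lecture's network: the Harmony function H(a) = Σ (2^s_i − 1) a_i − 2^8 Σ a_i a_{i+1}, the Euler step size, the all-zero starting state, the clipping of activations to [0, 1], and the sonority profile used in the example.

```python
# Sketch: projected gradient ascent on an assumed BrbrNet-style Harmony function.

def gradient(profile, a):
    """dH/da_i for H(a) = sum_i (2**s_i - 1) a_i - 2**8 * sum_i a_i a_{i+1}."""
    n = len(a)
    g = []
    for i, s in enumerate(profile):
        left = a[i - 1] if i > 0 else 0.0
        right = a[i + 1] if i + 1 < n else 0.0
        g.append((2 ** s - 1) - 2 ** 8 * (left + right))
    return g

def greedy_ascent(profile, step=1e-4, n_steps=20000):
    """Repeatedly take a small step uphill, keeping activations inside [0, 1]."""
    a = [0.0] * len(profile)
    for _ in range(n_steps):
        g = gradient(profile, a)
        a = [min(1.0, max(0.0, ai + step * gi)) for ai, gi in zip(a, g)]
    return a

# Hypothetical five-segment sonority profile.
profile = (1, 3, 4, 5, 1)
final_state = greedy_ascent(profile)
nuclei = [round(x) for x in final_state]
```

High-sonority units have the steepest gradient, so they switch on first and suppress their neighbours; in this example the run settles on nuclei at the second and fourth segments.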
53/txznt/ → tx.znt 'you (sing.) stored'
[Figure: plot of Harmony H]
54The Hardest Case: 12378 /t.bx.ya/
- hypothetical, but compare t.bx.la.kkw 'she even behaved as a miser' [tbx.lakkw]
55Subsymbolic Parsing
[Figure: row of units, each labeled V]
56Parsing sonority profile 8121345787
- a.tb.kf.zn.yay
- Finds best of infinitely many representations: 1024 corners/parses
57BrbrNet has many Local Harmony Maxima
- An output pattern in BrbrNet is a local Harmony maximum if and only if it realizes a sequence of legal Berber syllables (i.e., an output of Gen)
- That is, every activation value is 0 or 1, and the sequence of values is that realizing a sequence of substrings taken from the syllable inventory {CV, CVC, #V, #VC},
- where C = 0, V = 1 and # = word edge
- Greedy optimization avoids local maxima: why?
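The correspondence between local Harmony maxima and legal syllable sequences can be checked by brute force on a short input. This sketch rests on several assumptions: Harmony is taken to be Σ (2^s_i − 1) a_i − 2^8 Σ a_i a_{i+1}, a corner counts as a local maximum when no single unit flip raises Harmony (a discrete stand-in for the continuous notion), and onsetless syllables (V, VC) are read as allowed only at the left word edge.

```python
# Sketch: corners that are local Harmony maxima versus legal C/V strings.
import re
from itertools import product

def flip_gain(profile, state, i):
    """Change in Harmony when unit i is flipped."""
    on_neighbours = sum(state[j] for j in (i - 1, i + 1) if 0 <= j < len(state))
    gain_if_turned_on = (2 ** profile[i] - 1) - 2 ** 8 * on_neighbours
    return -gain_if_turned_on if state[i] else gain_if_turned_on

def is_local_max(profile, state):
    """True if every single-unit flip lowers Harmony."""
    return all(flip_gain(profile, state, i) < 0 for i in range(len(state)))

# With C = 0 and V = 1: an optional word-initial V or VC, then any number of CV or CVC.
LEGAL = re.compile(r"^(10?)?(010?)*$")

def is_legal(state):
    """True if the 0/1 string is a sequence of syllables from the inventory."""
    return bool(state) and LEGAL.match("".join(map(str, state))) is not None

profile = (1, 2, 3, 7, 8)
corners = list(product((0, 1), repeat=len(profile)))
local_maxima = [s for s in corners if is_local_max(profile, s)]
legal_parses = [s for s in corners if is_legal(s)]
```

For this five-segment input, 4 of the 32 corners are local maxima, and they coincide with the legal parses; only one of them is the global maximum.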
58HG → OT's Strict Domination
- Strict Domination: Baffling from a connectionist perspective?
- Explicable from a connectionist perspective?
- Exponential BrbrNet escapes local H maxima
- Linear BrbrNet does not
59Linear BrbrNet makes errors
- (≈ Goldsmith-Larson network)
- Error: /12378/ → .123.78. (correct: .1.23.78.)
60Subsymbolic Harmony optimization can be stochastic
- The search for an optimal state can employ randomness
- Equations for units' activation values have random terms
- $\mathrm{pr}(a) \propto e^{H(a)/T}$
- T (temperature) ~ randomness → 0 during search
- Boltzmann Machine (Hinton and Sejnowski 1983, 1986); Harmony Theory (Smolensky 1983, 1986)
- Can guarantee computation of global optimum in principle
- In practice: how fast? Exponential vs. linear BrbrNet
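The role of temperature can be illustrated by computing the distribution pr(a) ∝ exp(H(a)/T) exactly over all binary states of a short input. This is a sketch, not a simulation of the stochastic network: it enumerates states instead of sampling, and it assumes the Harmony function Σ (2^s_i − 1) a_i − 2^8 Σ a_i a_{i+1} together with two arbitrary temperature values.

```python
# Sketch: exact Boltzmann distribution over nucleus assignments at two temperatures.
import math
from itertools import product

def harmony(profile, state):
    """Assumed Harmony: nucleus rewards minus a penalty per adjacent nucleus pair."""
    h = sum(2 ** s - 1 for s, v in zip(profile, state) if v)
    return h - 2 ** 8 * sum(1 for v, w in zip(state, state[1:]) if v and w)

def boltzmann(profile, temperature):
    """Probability of every binary state, pr(a) proportional to exp(H(a)/T)."""
    states = list(product((0, 1), repeat=len(profile)))
    h = [harmony(profile, s) for s in states]
    h_max = max(h)  # subtracting the maximum avoids overflow in exp
    weights = [math.exp((x - h_max) / temperature) for x in h]
    z = sum(weights)
    return {s: w / z for s, w in zip(states, weights)}

profile = (1, 2, 3, 7, 8)
best = (1, 0, 1, 0, 1)

hot = boltzmann(profile, temperature=100.0)   # much randomness
cold = boltzmann(profile, temperature=1.0)    # little randomness
```

Lowering T shifts probability mass toward the global Harmony maximum, which is the intuition behind letting T approach 0 during the search.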
61Stochastic BrbrNet: Exponential can succeed fast
62Stochastic BrbrNet: Linear can't succeed fast
63Stochastic BrbrNet (Linear)
5-run average
64The ICS Architecture