Title: Learning Logic and Grammar
1 Learning Logic and Grammar
Grammar Induction and Semantic Learning
Pieter Adriaans, Universiteit van Amsterdam
pietera_at_science.uva.nl
2 Structure of talk: grammar induction and semantic learning
- Introduction
- Induction and learning
- Motivation: Virtual Lab adaptive information disclosure; language development of children with cochlear implants
- The challenge: learning from text
- Early research: Gold, Horning, Valiant
- EMILE 2: Kolmogorov complexity, universal distribution, shallowness
- EMILE 3: characteristic expressions and contexts, bootstrapping
- Empirical tests: at first not very convincing. Why?
- Semantic distributions: not a bug, a feature!
- From grammar induction to semantic learning
- Grammar induction under semantic distributions = semantic learning
- Conclusion
3 Induction: a desperate discipline
- 400 BC: Herakleitos, "panta rhei"
- 400 BC: Parmenides, "phusis kruptesthai philei"
- 200 BC - 150 AD: Pyrrhonism, knowledge is not possible
- 1750: Hume, the problem of induction
- 1910: Russell, paradox and the ramified theory of types
- 1930: Gödel, incompleteness of arithmetic
- 1935: Turing/Church, undecidability and incomputability
- 1935: Popper, asymmetry between verification and falsification
- 1965: Kolmogorov, complexity is non-constructive
- 1967: Gold, superfinite sets are not identifiable in the limit
- 1988: Pitt and Warmuth, selecting the smallest automaton is NP-hard
- 1995: Wolpert and Macready, No Free Lunch theorem
4 There is no universal learning method
- What we can do: identify classes of algorithms that work for classes of problems
- The number of such classes is in principle infinite
- Machine Learning is essentially heuristic
- Machine Learning is essentially empirical: which methods work for which domains in reality?
5 Paul Feyerabend, Against Method (1975)
- "Science is an essentially anarchistic enterprise: theoretical anarchism is more humanitarian and more likely to encourage progress than its law-and-order alternatives."
- "This is shown both by an examination of historical episodes and by an abstract analysis of the relation between idea and action. The only principle that does not inhibit progress is: anything goes."
6 Learning: adaptive systems
- What is Machine Learning? The use of computational techniques to detect structures in datasets.
- What kind of datasets? Continuous or discrete, finite or infinite, one-dimensional or multidimensional, binary or non-binary: strings, sets, trees, databases, pictures, etc.
7 Base case: 2-part code optimization
(Diagram: Observed Data; Learning = lossless compression of the data into a Program plus Input, the Learned Theory; Theory < Data)
8 Paradigm case: a finite binary string
- Data: 000110100110011111010101011010100000 (Theory: ?)
- Data: 010101010101010101010101010101010101
- Theory: Program "For i = 1 to x print y" with input x = 18, y = 01
- Theory (Program + input) < Data
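As a rough illustration of the two-part code idea on this slide, here is a minimal Python sketch (mine, not part of the talk); the program and input strings are hypothetical stand-ins for the "theory".

```python
# A minimal sketch of the two-part code idea: for regular data, a short program
# plus its input describes the string in fewer symbols than the literal data.
regular = "01" * 18            # the 36-bit data string from the slide

# Hypothetical two-part code: the 'theory' is a tiny program, the rest is its input.
program = "print(y * x)"       # theory
inputs  = "x=18; y='01'"       # input to the theory

theory_length = len(program) + len(inputs)
print("data length            :", len(regular))    # 36
print("theory (program+input) :", theory_length)   # 24, and the gap grows with longer regular data
print("compresses?            :", theory_length < len(regular))
```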
9 Unsupervised Learning
(Diagram: an unknown system, a non-random computational process, turns input into observed output; learning is lossless compression of the observed output into a program plus input, the learned theory, with Theory < Data)
10 Supervised Learning
(Diagram: as in the unsupervised case, but both the input and the output of the unknown system are observed; learning compresses the observed input/output pairs into a program, the learned theory, with Theory < Data)
11 Adaptive System
(Diagram: the supervised setting extended with a feedback loop: the learned theory/program supplies new input to the unknown, non-random computational process)
12 Agent System
(Diagram: an agent system composed of adaptive systems, each acting as an unknown, non-random computational process for the others)
13 Practical applications
- Virtual Lab: adaptive information disclosure (Pieter Adriaans); TNO, IBM, UvA, Unilever
- Language development of children with cochlear implants: Jacqueline van Kampen (UiL OTS, Utrecht), Pieter Adriaans, Dick de Jongh (ILLC, Amsterdam), Guido Smoorenburg (AZU, Utrecht)
14 Application cases
(Diagram of the Virtual Laboratory e-Science project: application cases on top of an e-Science Application layer (P1), a Generic Virtual Laboratory layer (P2), and a Large Distributed System layer (P3))
15 (Diagram of the subprojects:
- Application cases: Data intensive science SP1.1, Food Informatics SP1.2, Medical diagnosis and imaging SP1.3, Bio-diversity SP1.4, Bio-Informatics SP1.5, Telescience SP1.6
- Virtual laboratory subprojects SP2.1 - SP2.5: Adaptive Information Disclosure, Collaborative Information Management, Interactive PSE, User Interface and Virtual Reality, Virtual lab / System Integration
- Infrastructure: HPDC Processor and Data co-allocation, Security and Generic AAA, Optical Networking)
16 Adaptive Information Disclosure: Research Questions
- The engineering challenge: build a routine that can enrich an under-specified model A on the basis of a collection of ontologies B1,...,Bn and a collection of corpora C1,...,Ck, and present the results in mode D
- Research Question 1: Can we learn an ontology for a certain domain from scratch on the basis of a collection of documents describing that domain?
- Research Question 2: Can we use grammar induction on a collection of documents describing a domain to formulate suggestions for the enrichment and adaptation of an ontology for this domain?
17-19 (No transcript)
20 Hearing versus Speech/Language for CI group
21 Hearing versus Speech/Language for CI group
22 PET: duration of hearing impairment versus brain activity in the auditory cortex (Lee et al., Nature, Jan. 2001)
23 Cochlear implants: Research Questions
- Research Question 1: Can we make a formal model of language development of young children that allows us to understand:
- Why is the process efficient?
- Why is the process discontinuous?
- Research Question 2: Can this formal model be used to develop diagnostic tests for (congenitally deaf) children with language problems?
24 The underlying challenge: learning natural language from text
Searle's Chinese Room
(Source of illustration: Scientific American, Jan. 1990)
25 The underlying challenge
- Research Question: Can we learn natural language efficiently from text? How much text is needed? How much processing is needed?
- First hypothesis: clustering of expressions and contexts seems a good idea
- John (makes) tea
- John (drinks) tea
- Mary drinks tea
- John drinks coffee
- Second hypothesis: context-free is a good first approximation of natural language.
26 1. Clustering of expressions and contexts seems a good idea
- Lamb 1961
- Wolff 1978, 1988
- Langley 1980
- Carroll and Charniak 1992
- Pereira and Schabes 1992
- Adriaans 1992, 1999
- Brill 1993
- Stolcke and Omohundro 1994
- Adriaans and Haas 1999
- Van Zaanen 2000
- Clark 2001
- Klein and Manning 2001, 2003
27 2. Context-free is a good approximation of natural language
- 1967 Gold, Identification in the Limit: context-free grammars are not identifiable in the limit from positive data
- 1969 Horning: probabilistic context-free grammars can be learned from positive data
- 1984 Valiant: distribution-free PAC learning
- 1991 Li and Vitányi: PAC learning under simple distributions
- 1992 Adriaans: efficient learning of shallow languages
- 2000 Vervoort: implementation of the EMILE tool
- 2001 Van Zaanen: Alignment Based Learning (ABL), induction of bracket structure from strings
- 2004 Solan et al.: ADIOS
28 The Gold paradigm
29 Game theory
- Challenger selects a language
- Presents an enumeration of the language
- Learner produces an infinite number of guesses
- If there is a winning strategy for the learner, then the language is learnable
- Theorem (Gold): In any grammar system, a class G of grammars is not learnable if L(G) contains all finite languages and at least one infinite language
- Limit points
- Locking sequences
- Finite elasticity
30 Learnability results (Gold)
31 Why superfinite sets are not identifiable in the limit
- 1: {a}
- 2: {a, aa}
- 3: {a, aa, aaa}
- 4: {a, aa, aaa, aaaa}
- 5: {a, aa, aaa, aaaa, aaaaa}
- ...
- ∞: {a, aa, aaa, aaaa, aaaaa, ...}
- Overgeneralisation
32 So we have to turn our attention to probabilistic solutions
- 1969 Horning: probabilistic context-free grammars can be learned from positive data
- Given a text T and two grammars G1 and G2 we are able to approximate max(P(G1|T), P(G2|T))
- So let's try distribution-free PAC (Probably Approximately Correct) learning, Valiant 1984
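To make the Horning-style comparison of P(G1|T) and P(G2|T) concrete, here is a small sketch (my own illustration, not Horning's algorithm); the two candidate grammars, their sizes and the corpus are entirely made up, and each grammar is simplified to a uniform distribution over a finite set of sentences.

```python
# A toy illustration of MAP grammar selection:
# posterior P(G|T) is proportional to P(G) * P(T|G), with P(G) = 2^(-size of G).
import math

# Two hypothetical candidate "grammars", each given as the set of sentences it
# generates (with a uniform distribution over that set) plus a description size in bits.
grammars = {
    "G1": {"sentences": {"john walks", "mary walks", "john loves mary", "mary loves john"},
           "size_bits": 40},
    "G2": {"sentences": {"john walks", "mary walks", "john loves mary", "mary loves john",
                         "walks john", "mary john loves"},   # overly general but smaller
           "size_bits": 25},
}

corpus = ["john walks", "mary loves john", "john loves mary", "mary walks"]

def log_posterior(g):
    """log2 P(G) + log2 P(T|G) under a uniform distribution over L(G)."""
    log_p = -g["size_bits"]                        # prior: shorter grammars are more probable
    for sentence in corpus:
        if sentence not in g["sentences"]:
            return -math.inf                       # grammar cannot generate the text
        log_p += -math.log2(len(g["sentences"]))   # likelihood of each observed sentence
    return log_p

for name, g in grammars.items():
    print(name, "log2 posterior:", round(log_posterior(g), 2))
# With only four observed sentences the smaller, overly general G2 still wins;
# as the text grows, the tighter G1 eventually pays off its extra description length.
```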
33-37 PAC Learning
(Figures: a probability distribution P on Σ*; a target concept f ⊆ Σ*; a hypothesis g ⊆ Σ*; their symmetric difference f Δ g; the requirement P(f Δ g) ≤ ε with probability (1-δ))
38 PAC Learning
- For all target concepts f ∈ F and all probability distributions P on Σ*, the algorithm A outputs a concept g ∈ F such that, with probability (1-δ), P(f Δ g) ≤ ε
- F: concept class; δ: confidence parameter; ε: error parameter; f Δ g = (f-g) ∪ (g-f)
- Polynomial in 1/ε and 1/δ
- BAD NEWS: power laws dominate word frequencies!
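As a reading aid for the definition above, a tiny Monte Carlo sketch (my own) of the quantity P(f Δ g) that the PAC guarantee bounds; the concepts f and g and the distribution P are made up.

```python
# Estimating P(f Δ g) by sampling: the quantity the PAC bound constrains.
import random

# Hypothetical target concept f and learned hypothesis g over binary strings.
def f(x: str) -> bool:          # target: strings containing "11"
    return "11" in x

def g(x: str) -> bool:          # hypothesis: strings ending in "1" (imperfect)
    return x.endswith("1")

def sample_P(length: int = 8) -> str:
    """Draw a string from the (here: uniform) sampling distribution P."""
    return "".join(random.choice("01") for _ in range(length))

n = 100_000
errors = sum(f(x) != g(x) for x in (sample_P() for _ in range(n)))
print("estimated P(f Δ g) ≈", errors / n)
# PAC learning asks that, with probability 1-δ over the sample,
# the learner outputs a g for which this value is at most ε.
```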
39 Scientific Text: Bitterbase (Unilever)
- The bitter taste of naringin and limonin was not
affected by glutamic acid rmflav 160 Exp.Ok
Naringin, the second of the two bitter principles
in citrus, has been shown to be a depressor of
limonin bitterness detection thresholds rmflav
1591 Florisil reduces bitterness and tartness
without altering ascorbic acid and soluble solids
(primarily sugars) content rmflav 584
Influence of pH on system was studied. The best
substrate for Rhodococcus fascians at pH 7.0 was
limonoate whereas at pH 4.0 to 5.5 it appeared to
be limonin. Results suggest that the citrus juice
debittering process start only once the natural
precursor of limonin (limonoate A ring lactone)
has been transformed into limonin, the
equilibrium displacement being governed by the
citrus juice pH. rmflav 474rmflav 504
Limonin D-ring lactone hydrolase, the enzyme
catalysing the reversible lactonization/hydrolysis
of D-ring in limonin, has been purified from
citrus seeds and immobilized on Q-Sepharose to
produce homogeneous limonoate A-ring lactone
solutions. The immobilized limonin D-ring lactone
hydrolase showed a good operational stability and
was stable after sixty-seventy operations and
storing at 4C for six months.
40 Study of Benign Distributions
41 Colloquial Speech: Corpus Spoken Dutch
- " omdat ik altijd iets met talen wilde
doen.""dat stond in elk geval uh voorop bij
mij.""en Nederlands leek me leuk.""da's
natuurlijk een erg afgezaagd antwoord maar dat
was 't wel.""en uhm ik ben d'r maar gewoon aan
begonnen aan de en ik uh heb 't met uh ggg
gezondheid.""ggg.""ik heb 't met uh met veel
plezier gedaan.""ja prima.""ja 'k vind 't nog
steeds leuk."
42 Study of Benign Distributions
43 Motherese: Sarah and Jacqueline
- JAC kijk, hier heb je ook puzzeltjes.
- SAR die (i)s van mij.
- JAC die zijn van jouw, ja.
- SAR die (i)s ...
- JAC kijken wat dit is.
- SAR kijken.
- JAC we hoeven natuurlijk niet alle zooi te
bewaren. - SAR en die.
- SAR die (i)s van mij, die.
- JAC die is niet kompleet.
- JAC die legt mamma maar terug.
- SAR die (i)s van mij.
- SAR xxx.
- SAR die ga in de kast, deze.
- JAC die <gaat in de kast>, ja.
- JAC molenspel.
- SAR mole(n)spel ".
44 Study of Benign Distributions
45 Observation
- Word frequencies in human utterances are dominated by power laws
- High-frequency core
- Low-frequency heavy tail
- Third hypothesis: Language is open. Grammar is elastic. The occurrence of new words is a natural phenomenon. Syntactic/semantic bootstrapping must play an important role in language learning.
- Bootstrapping might be important for ontology learning as well as for child language acquisition
- A better understanding of distributions is necessary (a small frequency sketch follows below)
46 What kind of distributions?
47 Kolmogorov complexity (for dummies)
- 01010101010101010101010...
- 11000110111001010010011101...
- The Kolmogorov complexity of a binary object is the length of the shortest program that generates this object on a universal Turing machine
- Random strings are not compressible
- A message with low Kolmogorov complexity is compressible
- Problem: Kolmogorov complexity is non-constructive!
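Since K itself is not computable, a standard trick is to use a real compressor as an upper-bound proxy. A small sketch (my own) applying zlib to strings like the two on this slide: the regular one shrinks, the random-looking one does not.

```python
# zlib as a computable upper bound on Kolmogorov complexity: the regular string
# compresses well, the (pseudo-)random one hardly at all.
import random
import zlib

regular = "01" * 500                                        # highly regular, like the first string
noise = "".join(random.choice("01") for _ in range(1000))   # random-looking, like the second

for name, s in [("regular", regular), ("random-looking", noise)]:
    packed = len(zlib.compress(s.encode()))
    print(f"{name:15s} length {len(s)} -> compressed to {packed} bytes")
```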
48 Kolmogorov complexity
- Let U be a universal Turing machine. The (prefix) Kolmogorov complexity of string x given string y with respect to U is
  K_U(x|y) = min { |P| : P ∈ {0,1}*, U(P, y) = x }
  K(x) = K_U(x|ε)
  The universal probability of a binary string x is
  P_U(x) = Σ_{P : U(P) = x} 2^{-|P|}
49-51 Universal Distributions
A recursively enumerable semi-measure μ is called universal if it multiplicatively dominates every other recursively enumerable semi-measure ν, i.e. c·ν(x) ≤ μ(x) for a fixed constant c independent of x.
(Figures: probabilities of the strings 1, 0, 11, 10, 01, 00, 111, 110, 10011, 101100, ... under ν(x), under c·ν(x), and under the dominating measure μ(x) = m(x))
52 The universal distribution m
- The coding theorem (Levin): -log m(x) = -log P_U(x) + O(1) = K(x) + O(1)
- A distribution is simple if it is dominated by a recursively enumerable distribution
- Li and Vitányi: A concept class C is learnable under m(x) iff C is also learnable under any arbitrary simple distribution P(x), provided the samples are taken according to m(x)
53 Problem: a finite recursive grammar with infinitely many sentences
- How do we know we have enough examples?
- The notion of a characteristic sample. Informally: a sample S of a language L(G) is characteristic if we can reconstruct G from S
54 How can we draw a characteristic sample under m?
- Solution: the notion of shallowness. Informally: a language is shallow if we only need short examples to learn it
- A language S = L(G) is shallow if there exists a characteristic sample C_G for S such that ∀ s ∈ C_G: |s| ≤ c · log K(G)
55 Simple and Shallow for dummies
- Solution: the notion of shallowness. Informally: a language is shallow if we only need short examples to learn it
- Shallow structures are constructed from an exponential number of small building blocks
- Simple distributions are typical for those sets that are generated by a computational process
- The universal distribution m is a non-computable distribution that dominates all simple distributions multiplicatively
- Objects with low Kolmogorov complexity have high probability
- Li and Vitányi: A concept class C is learnable under m(x) iff C is also learnable under any arbitrary simple distribution P(x), provided the samples are taken according to m(x)
56 Characteristic sample
Let Σ be an alphabet and Σ* the set of all strings over Σ. L(G) = S ⊆ Σ* is the language generated by a grammar G. C_G ⊆ S is a characteristic sample for G.
(Figure: nested sets Σ* ⊇ S ⊇ C_G)
57 Shallowness
- Seems to be an independent category
- There are finite languages that are not shallow, e.g. {01, 11, 101, 100, 001, 110, 1110, 0001, 1011, 0000000000011, 00101011000010010010100101010}
(Figure: Chomsky hierarchy (regular, context-free, context-sensitive, type 0) with the class of shallow languages cutting across it)
58 Shallowness is very restrictive
- In terms of natural growth: if one expands the longest rule by 1 bit, one must double the number of rules
(Figure: a context-free grammar G as a table of rules (Head, Body), roughly < |G| rules of length < log |G| each)
59 The learning algorithm
- One can prove that, using clustering techniques, shallow CFGs can be learned efficiently from positive examples drawn under m.
- General idea: if αβγ is a sentence of the language, then β is an expression and α\S/γ is its context
(Figure: a sentence split into an expression slot and the surrounding context)
60 Grammar Formalisms: Context-free
- Context-free grammar:
  Sentence → Name Verb
  Sentence → Name T_Verb Name
  Name → Mary | John
  Verb → walks
  T_Verb → loves
- Sentences: "John loves Mary", "Mary walks" (a small generator sketch follows)
61 Grammar Formalisms: Categorial Grammars
- Categorial grammar (lexicalistic):
  loves → (Name\Sentence)/Name
  walks, runs → Name\Sentence
  Mary, John → Name
- Parsing as deduction: α, α\β ⇒ β and β/α, α ⇒ β
(Derivation:
  Sentence
  Name  Name\Sentence
  Name  (Name\Sentence)/Name  Name
  John  loves  Mary)
62 Categorial Grammar: propositional calculus without structural rules
- Interchange: x, A, y, B, z ⊢ C  ⇒  x, B, y, A, z ⊢ C
- Contraction: x, A, A, y ⊢ C  ⇒  x, A, y ⊢ C
- Thinning: x, y ⊢ C  ⇒  x, A, y ⊢ C
- Logic: A, (A → B) ⊢ B and (A → B), A ⊢ B
- Grammar: A, (A\B) ⊢ B and (A/B), A ⊢ B
63 Categorial Grammar Formalism: algebraic specification
- M is a multiplicative system
- A • B = { x•y ∈ M | x ∈ A, y ∈ B }
- C / B = { x ∈ M | ∀ y ∈ B: x•y ∈ C }
- A \ C = { y ∈ M | ∀ x ∈ A: x•y ∈ C }
64 Categorial Grammar Formalism: algebraic specification as database operations
- Name = {John, Mary}
- Verb = {walks, runs}
- S = Name • Verb = {John, Mary} • {walks, runs} = {John walks, John runs, Mary walks, Mary runs}
65 Categorial Grammar Formalism: algebraic specification as database operations
- Name \ S = {John, Mary} \ {John walks, John runs, Mary walks, Mary runs} = {walks, runs}
- S / Verb = {John walks, John runs, Mary walks, Mary runs} / {walks, runs} = {John, Mary}
(A small sketch of these operations follows.)
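A minimal sketch (my own) of these "database operations" on sets of word strings: concatenation A•B and the two residuals A\C and C/B, reproducing the example above. The search for candidate strings is restricted to substrings of C, a simplification of quantifying over all of M.

```python
# Set-level categorial operations from slides 63-65: concatenation and residuals.
def concat(A, B):
    """A . B: all concatenations of a member of A with a member of B."""
    return {f"{x} {y}" for x in A for y in B}

def left_residual(A, C):
    """A \\ C: the strings y such that 'x y' is in C for every x in A."""
    return {y for y in candidates(C) if all(f"{x} {y}" in C for x in A)}

def right_residual(C, B):
    """C / B: the strings x such that 'x y' is in C for every y in B."""
    return {x for x in candidates(C) if all(f"{x} {y}" in C for y in B)}

def candidates(C):
    """All proper prefixes and suffixes (split on spaces) of the strings in C."""
    out = set()
    for s in C:
        words = s.split()
        for i in range(1, len(words)):
            out.add(" ".join(words[:i]))
            out.add(" ".join(words[i:]))
    return out

Name, Verb = {"John", "Mary"}, {"walks", "runs"}
S = concat(Name, Verb)
print(S)                           # {'John walks', 'John runs', 'Mary walks', 'Mary runs'}
print(left_residual(Name, S))      # {'walks', 'runs'}
print(right_residual(S, Verb))     # {'John', 'Mary'}
```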
66 EMILE 3.0 stages: take sample
Sample: John loves Mary; Mary walks
67 EMILE 3.0 stages: first-order explosion
S/loves Mary → John
S/Mary → John loves
S → John loves Mary
John\S/Mary → loves
John\S → loves Mary
John loves\S → Mary
S/walks → Mary
S → Mary walks
Mary\S → walks
68 EMILE 3.0 stages: first-order explosion (sample: John loves Mary; Mary walks)
69 EMILE 3.0 stages: complete first-order explosion (sample: John loves Mary; Mary walks)
70 EMILE 3.0 stages: clustering (sample: John loves Mary; Mary walks)
71 EMILE 3.0 stages: clustering (sample: John loves Mary; Mary walks)
72 EMILE 3.0 stages: clusters → non-terminal names A, B, C, D, E
73 EMILE 3.0 stages: protorules
S/loves Mary → A
S/Mary → B
S → C
John\S/Mary → D
John\S → E
John loves\S → A
S/walks → A
Mary\S → E
A → John
B → John loves
C → John loves Mary
D → loves
E → loves Mary
A → Mary
C → Mary walks
E → walks
74 EMILE 3.0 stages: generalize into context-free rules
John\S/Mary → D
_____________________________
S → John D Mary

A → John (characteristic expression)
A → Mary (characteristic expression)
_____________________________
S → A D A

Grammar:
S → A D A | B A | C | A E
A → John | Mary
B → A D
C → A D A | A E
D → loves
E → A D | walks
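A minimal sketch (my own, not Vervoort's implementation) of the first-order explosion and clustering steps above: split each sentence into every (expression, context) pair and group expressions that share the same set of contexts.

```python
# EMILE-style first-order explosion and clustering (toy version):
# every split of a sentence yields an (expression, context) pair; expressions
# that occur in exactly the same contexts are clustered into one type.
from collections import defaultdict

sentences = ["John walks", "Mary walks", "John loves tea", "Mary loves tea"]

contexts_of = defaultdict(set)          # expression -> set of contexts it appears in
for s in sentences:
    words = s.split()
    for i in range(len(words)):
        for j in range(i + 1, len(words) + 1):
            expression = " ".join(words[i:j])
            context = (" ".join(words[:i]), " ".join(words[j:]))   # (left, right)
            contexts_of[expression].add(context)

# Cluster: expressions with identical context sets get the same type.
types = defaultdict(list)
for expression, ctxs in contexts_of.items():
    types[frozenset(ctxs)].append(expression)

for i, members in enumerate(types.values()):
    if len(members) > 1:
        print(f"type {i}: {members}")
# With this sample, 'John' and 'Mary' form one cluster, as do 'walks' and
# 'loves tea', and the four complete sentences themselves (the type S).
```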
75 Theorem (Adriaans 1992)
- If a language L has a context-free grammar G, is shallow, is sampled according to the universal distribution, and there is a member-check function available, then it can be learned efficiently from text
- Assumptions: natural language is shallow; the distribution of sentences in a text is simple
76 EMILE 3.0 (1992): problems, not very practical
- Take sample: positive examples
- First-order explosion: deduction
- Complete first-order explosion: positive and negative examples
- Clustering: deduction
- Non-terminal names: deduction
- Proto-rules: induction
- Context-free rules: induction
77 EMILE 3.0 (1992): problems
- (Stages as on slide 76)
- Supervised, no text: speakers do not give negative examples
78 EMILE 3.0 (1992): problems
- (Stages as on slide 76)
- Supervised, no text: speakers do not give negative examples
- Polynomial, but very complex due to overlapping clusters
79 EMILE 3.0 (1992): only theoretical value
- (Stages as on slide 76)
- Supervised, no text: speakers do not give negative examples
- Polynomial, but very complex due to overlapping clusters
- Batch oriented, not incremental
80 New theory necessary: bootstrapping phenomena!
- Lewis Carroll's famous poem 'Jabberwocky' starts with:
- 'Twas brillig, and the slithy toves
- Did gyre and gimble in the wabe:
- All mimsy were the borogoves,
- And the mome raths outgrabe.
81 New theory necessary: bootstrapping phenomena!
(Figure: word-frequency distribution with a structured high-frequency core and a heavy low-frequency tail)
82 New theory necessary: bootstrapping phenomena!
- An expression of a type T is characteristic for T if it only appears with contexts of type T
- Similarly, a context of a type T is characteristic for T if it only appears with expressions of type T
- Let G be a grammar (context-free or otherwise) of a language L. G has context separability if each type of G has a characteristic context, and expression separability if each type of G has a characteristic expression
- Natural languages seem to be context- and expression-separable
- This is nothing but stating that languages can define their own concepts internally ("... is a noun", "... is a verb")
83 Natural languages are shallow
- A class of languages C is shallow if for each language L it is possible to find a context- and expression-separable grammar G, and a set of sentences S inducing characteristic contexts and expressions for all the types of G, such that the size of S and the length of the sentences of S are logarithmic in the descriptive length of L (relative to C)
- Seems to hold for natural languages: large dictionaries
84 EMILE 4.1 (2000), Vervoort
- Unsupervised
- Two-dimensional clustering: random search for maximized blocks in the matrix
- Incremental: thresholds for the filling degree of blocks
- Simple (but sloppy) rule induction using characteristic expressions
85 EMILE 4.1: clustering sparse matrices of contexts and expressions
(Figure: a sparse boolean matrix with expressions as rows and contexts as columns; a characteristic expression picks out a row, a characteristic context a column, and a type corresponds to a filled block. A toy block-search sketch follows.)
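A minimal sketch (my own, far simpler than EMILE 4.1's randomized search) of finding a dense block in the expression × context matrix: pick a seed context, collect the expressions that occur in it, and keep the contexts in which (almost) all of those expressions occur, governed by a support threshold. All data here is made up.

```python
# Toy block search in a boolean expression x context co-occurrence matrix,
# in the spirit of EMILE 4.1's two-dimensional clustering (greatly simplified).
pairs = {                        # observed (expression, context) co-occurrences
    ("John",  ("", "walks")), ("Mary",  ("", "walks")),
    ("John",  ("", "loves tea")), ("Mary", ("", "loves tea")),
    ("the man", ("", "walks")),
    ("tea",   ("John loves", "")), ("coffee", ("John loves", "")),
}

def block_from_seed(seed_context, support=1.0):
    """Grow an (expressions, contexts) block around one seed context."""
    expressions = {e for e, c in pairs if c == seed_context}
    contexts = set()
    for c in {c for _, c in pairs}:
        covered = sum((e, c) in pairs for e in expressions)
        if covered >= support * len(expressions):   # filling-degree threshold
            contexts.add(c)
    return expressions, contexts

exprs, ctxs = block_from_seed(("", "walks"), support=1.0)
print("expressions:", exprs)     # {'John', 'Mary', 'the man'}
print("contexts   :", ctxs)      # only contexts filled by *all* of those expressions

exprs, ctxs = block_from_seed(("", "walks"), support=0.6)
print("relaxed    :", ctxs)      # lowering the threshold admits ('', 'loves tea') too
```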
86 EMILE guaranteed to find types with the right settings
- Let T be a type with a characteristic context c_ch and a characteristic expression e_ch. Suppose that the maximum lengths for primary contexts and expressions are set to at least |c_ch| and |e_ch|, and suppose that the total_support, expression_support and context_support settings are all set to 100%. Let T<maxC and T<maxE be the sets of contexts and expressions of T that are small enough to be used as primary contexts and expressions. If EMILE is given a sample containing all combinations of contexts from T<maxC and expressions from T<maxE, then EMILE will find type T. (Vervoort 2000)
87 Original grammar
- S → NP V_i ADV | NP_a VP_a | NP_a V_s that S
- NP → NP_a | NP_p
- VP_a → V_t NP | V_t NP P NP_p
- NP_a → John | Mary | the man | the child
- NP_p → the car | the city | the house | the shop
- P → with | near | in | from
- V_i → appears | is | seems | looks
- V_s → thinks | hopes | tells | says
- V_t → knows | likes | misses | sees
- ADV → large | small | ugly | beautiful
88 Learned grammar after 100,000 examples
- 0 → 17 6
- 0 → 17 22 17 6
- 0 → 17 22 17 22 17 22 17 6
- 6 → misses 17 | likes 17 | knows 17 | sees 17
- 6 → 22 17 6
- 6 → appears 34 | looks 34 | is 34 | seems 34
- 6 → 6 near 17 | 6 from 17 | 6 in 17 | 6 with 17
- 17 → the child | Mary | the city | the man | John | the car | the house | the shop
- 22 → tells that | thinks that | hopes that | says that
- 22 → 22 17 22
- 34 → small | beautiful | large | ugly
89 Hypothesis: natural languages are shallow, so we can learn them from text
(Plot: grammar size, on the order of 5·10^3, against sample size, on the order of 5·10^7 sentences)
90 Bible books
- King James version
- 31,102 verses, 82,935 lines
- 4.8 MB of English text
- 001001 In the beginning God created the heaven and the earth.
- 66 experiments with increasing sample size
- Initially: Book of Genesis, Book of Exodus, ...
- Full run: 40 minutes, 500 MB on an Ultra-2 Sparc
91 Bible books
92 GI on the Bible
- 0 → Thou shall not 582
- 0 → Neither shalt thou 582
- 582 → eat it
- 582 → kill .
- 582 → commit adultery .
- 582 → steal .
- 582 → bear false witness against thy neighbour .
- 582 → abhor an Edomite
93 Knowledge base in the Bible
- Dictionary Type 76: Esau, Isaac, Abraham, Rachel, Leah, Levi, Judah, Naphtali, Asher, Benjamin, Eliphaz, Reuel, Anah, Shobal, Ezer, Dishan, Pharez, Manasseh, Gershon, Kohath, Merari, Aaron, Amram, Mushi, Shimei, Mahli, Joel, Shemaiah, Shem, Ham, Salma, Laadan, Zophah, Elpaal, Jehieli
- Dictionary Type 362: plague, leprosy
- Dictionary Type 414: Simeon, Judah, Dan, Naphtali, Gad, Asher, Issachar, Zebulun, Benjamin, Gershom
- Dictionary Type 812: two, three, four
- Dictionary Type 1056: priests, Levites, porters, singers, Nethinims
- Dictionary Type 978: afraid, glad, smitten, subdued
- Dictionary Type 2465: holy, rich, weak, prudent
- Dictionary Type 3086: Egypt, Moab, Dumah, Tyre, Damascus
- Dictionary Type 4082: heaven, Jerusalem
94 A simple partial grammar (Dutch)
- 0 → 12 ?
- 12 → Waar 23
- 12 → Wie 46
- 12 → Hoe 13
- 12 → Wat 31
- 23 → wonen jullie
- 23 → woon jij
- 23 → woont u
- 23 → wasje
- 23 → ben je geweest
- 46 → kan 49
- 46 → weet hoe laat de treinen uit Rotterdam vertrekken
- 46 → heeft gewonnen
- 49 → dat betalen
- 49 → mij dat uitleggen
- 49 → dat verklaren
- 49 → voor dit verschijnsel een verklaring geven
- 13 → heet jij
- 13 → heet u
- 13 → laat begint de les
- 13 → lang doet de trein over de afstand Rotterdam-Den Haag
- 13 → komt dat
- 13 → vieren mensen eigenlijk feest
- 31 → is 33
- 31 → heb je gisteren gedaan
- 31 → vind jij daarvan
- 33 → jouw naam
- 33 → uw naam
- 33 → je leeftijd
- 33 → jouw mening hierover
- 33 → jouw opvatting over dit onderwerp
95 Problems
- Results at first sight disappointing
- Convergence to meaningful syntactic types is rarely observed
- Types seem to be semantic rather than syntactic
- Why?
- Hypothesis: the distribution in real-life text is semantic, not syntactic
- A semantic grammar is an intermediate compression level between the term algebra and the syntactic algebra
96 Characteristic sample: Semantic Learning
Let Σ be an alphabet and Σ* the set of all strings over Σ. L(G) = S ⊆ Σ* is the language generated by a grammar G. C_G ⊆ S is a characteristic sample for G.
(Figure: nested sets Σ* ⊇ S ⊇ True ⊇ C_G: the characteristic sample is drawn from the true sentences)
97 Syntactic Learning: substitution salva beneformatione
Tweety is_a bird   Tweety is_a dog   Tweety is_a horse   Tweety is_a mammal
Fido is_a bird     Fido is_a dog     Fido is_a horse     Fido is_a mammal
Ed is_a bird       Ed is_a dog       Ed is_a horse       Ed is_a mammal
Types: Sentence; Noun = {bird, dog, horse, mammal}; Name = {Ed, Fido, Tweety}
98 Semantic Learning: substitution salva veritate
(Same sentence matrix as above, but substitutions must now preserve truth (True), not just well-formedness)
Types: Sentence; Noun = {bird, dog, horse, mammal}; Name = {Ed, Fido, Tweety}
99 Semantic Learning: substitution salva veritate
(Same sentence matrix as above)
Compositionality: semantics as an intermediate compression level
(Diagram of levels: syntactic types Sentence, Noun, Name; semantic types Mammal, Horse, Dog, Bird and individuals Ed, Fido, Tweety; truth values True, False. A small sketch of the two substitution criteria follows.)
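A small sketch (my own) of the contrast between the two substitution criteria: cluster nouns by well-formedness of the sentences they appear in (syntactic) versus by the truth values of those sentences (semantic). The facts about Ed, Fido and Tweety are made up for illustration.

```python
# Syntactic vs. semantic clustering of the nouns on slides 97-99.
from collections import defaultdict

names = ["Tweety", "Fido", "Ed"]
nouns = ["bird", "dog", "horse", "mammal"]

# Made-up world: which "X is_a Y" sentences count as true (purely illustrative;
# Ed the horse is deliberately not listed as a mammal here).
true_facts = {("Tweety", "bird"), ("Fido", "dog"), ("Fido", "mammal"), ("Ed", "horse")}

# Syntactic learning (salva beneformatione): every "Name is_a Noun" combination
# is well-formed, so all nouns get the same distributional profile.
syntactic_profile = {noun: tuple(True for _ in names) for noun in nouns}

# Semantic learning (salva veritate): profile each noun by the truth values of
# the sentences it occurs in; nouns with identical profiles form one semantic type.
semantic_profile = {noun: tuple((name, noun) in true_facts for name in names)
                    for noun in nouns}

clusters = defaultdict(list)
for noun, profile in semantic_profile.items():
    clusters[profile].append(noun)

print("all syntactic profiles identical:", len(set(syntactic_profile.values())) == 1)
print("semantic clusters:", list(clusters.values()))
# -> [['bird'], ['dog', 'mammal'], ['horse']]: 'dog' and 'mammal' merge because
#    they are true of the same individuals in this toy world.
```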
100 Not a bug, but a feature: semantic learning
- Dictionary Type 362: plague, leprosy
- Dictionary Type 1056: priests, Levites, porters, singers, Nethinims
- Dictionary Type 978: afraid, glad, smitten, subdued
- Dictionary Type 2465: holy, rich, weak, prudent
101 EMILE 4.1: Molecular Biology of the Cell, 3rd edition
- 2.5 megabytes of data (text only)
- 60% of the text used
- Number of different sentences read: 5461
- Number of different words: 13396
- Number of different contexts: 896343
- Number of different expressions: 782123
- Number of different grammatical types: 67
- Number of dictionary types: 17
- Potential problems: (1) language is learned from redundancy; (2) high linguistic complexity
- Potential solutions: (1) more data; (2) provide seed grammars
102 EMILE 4.1: Molecular Biology of the Cell, 3rd edition
- Extracts cell information:
- 0 → Eucaryotic Cells 14
- 14 → Contain Several Distinctive Organelles
- 14 → Depend on Mitochondria for Their Oxidative Metabolism
- 14 → Contain a Rich Array of Internal Membranes
- 14 → Have a Cytoskeleton
- Extracts chemical information:
- 0 → 1 .
- 1 → B-OH ATP - 19 2
- 19 → > B-O-P ADP
- 19 → > B-O-P-P AMP
- 0 → The 2
- 2 → 35 Is Asymmetrical
- 35 → DNA Replication Fork
- 35 → Lipid Bilayer
103 EMILE 4.1: Molecular Biology of the Cell, 3rd edition
- Extracts biological information:
- 0 → 1 .
- 1 → This 3
- 3 → phenomenon is 37
- 37 → known as gene conversion
- 37 → called genomic imprinting
- 0 → 64 in the Golgi Apparatus
- 64 → Oligosaccharide Chains Are Processed
- 64 → Proteoglycans Are Assembled
- 0 → Mitochondria and Chloroplasts Contain 66
- 66 → Complete Genetic Systems
- 66 → Tissue-specific Proteins
104 The underlying challenge: conclusion
- Research Question: Can we learn natural language efficiently from text? How much text is needed? How much processing is needed?
- Yes, under reasonable assumptions, but the type of text we need to do that is not available. We find semantic distributions, not syntactically distributed corpora. Child language acquisition may be a good domain to look at. Bootstrapping plays an important role.
- Natural language is shallow and has context and expression separability. That is what makes it learnable.
105 Adaptive Information Disclosure: conclusion
- The engineering challenge: build a routine that can enrich an under-specified model A on the basis of a collection of ontologies B1,...,Bn and a collection of corpora C1,...,Ck, and present the results in mode D
- Research Question 1: Can we learn an ontology for a certain domain from scratch on the basis of a collection of documents describing that domain?
- Probably: expression separability and context separability should do the job
- Research Question 2: Can we use grammar induction on a collection of documents describing a domain to formulate suggestions for the enrichment and adaptation of an ontology for this domain?
- Yes!
106 Cochlear implants: Research Questions
- Research Question 1: Can we make a formal model of language development of young children that allows us to understand:
- Why is the process efficient?
- Why is the process discontinuous?
- Promising!
- Research Question 2: Can this formal model be used to develop diagnostic tests for (congenitally deaf) children with language problems?
- Maybe.
107 Language acquisition phases: hybrid learning
- Phase 1 (0-9 months): linking acoustics and events; babbling. Model: DFA; learning strategy: evidence-based state merging
- Phase 2 (9-24 months): children categorize words into word classes and show evidence of early sensitivity to syntax; word classes, markers. Model: some complex interaction between deixis and babbling; learning strategy: semantic learning
- Phase 3 (2-3.5 years): language meaning and syntactic structure are acquired; recursive rules. Model: context-free grammar; learning strategy: Seginer, EMILE
108 Study of Benign Distributions
109 Further work
- Better understanding of semantic learning
- Incremental learning with background knowledge
- Learning partial ontologies
- Integrating ontologies in the grammar induction process
- Pieter Adriaans, pietera_at_science.uva.nl
- WEB: http://turing.wins.uva.nl/pietera/ALS/