Title: Learning Logic and Grammar
1 Learning Logic and Grammar
Grammar Induction and Semantic Learning
Pieter Adriaans, Universiteit van Amsterdam
pietera_at_science.uva.nl
2 Structure of talk: grammar induction and semantic learning
- Introduction
- Induction and learning
- Motivation: Virtual Lab adaptive information disclosure; language development of children with cochlear implants
- The challenge: learning from text
- Early research: Gold, Horning, Valiant
- EMILE 2: Kolmogorov complexity, universal distribution, shallowness
- EMILE 3: characteristic expressions and contexts, bootstrapping
- Empirical tests: at first not very convincing. Why?
- Semantic distributions: not a bug, a feature!
- From grammar induction to semantic learning
- Grammar induction under semantic distributions = semantic learning
- Conclusion
3 Induction: a desperate discipline
- 400 BC: Herakleitos, "panta rhei"
- 400 BC: Parmenides, "phusis kruptesthai philei"
- 200 BC - 150 AD: Pyrrhonism, knowledge is not possible
- 1750: Hume, the problem of induction
- 1910: Russell, paradox and the ramified theory of types
- 1930: Gödel, incompleteness of arithmetic
- 1935: Turing/Church, undecidability and incomputability
- 1935: Popper, asymmetry between verification and falsification
- 1965: Kolmogorov, complexity is non-constructive
- 1967: Gold, superfinite sets are not identifiable in the limit
- 1988: Pitt and Warmuth, selecting the smallest automaton is NP-hard
- 1995: Wolpert and Macready, No Free Lunch theorem
4 There is no universal learning method
- What we can do: identify classes of algorithms that work for classes of problems
- The number of such classes is in principle infinite
- Machine Learning is essentially heuristic
- Machine Learning is essentially empirical: which methods work for which domains in reality?
5 Paul Feyerabend, Against Method (1975)
- "Science is an essentially anarchistic enterprise: theoretical anarchism is more humanitarian and more likely to encourage progress than its law-and-order alternatives."
- "This is shown both by an examination of historical episodes and by an abstract analysis of the relation between idea and action. The only principle that does not inhibit progress is: anything goes."
6 Learning: adaptive systems
- What is Machine Learning? The use of computational techniques to detect structures in datasets.
- What kind of datasets? Continuous or discrete, finite or infinite, one-dimensional or multidimensional, binary or non-binary: strings, sets, trees, databases, pictures, etc.
7 Base case: 2-part code optimization
(Diagram: Observed Data; Learning = lossless compression of the data into a Program plus Input, the Learned Theory; Theory < Data)
8 Paradigm case: a finite binary string
- Data: 000110100110011111010101011010100000 (Theory: ?)
- Data: 010101010101010101010101010101010101
- Theory: Program "For i = 1 to x print y" with input x = 18, y = 01
- Theory (Program + input) < Data
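As a rough illustration of the two-part code idea on this slide, here is a minimal Python sketch (mine, not part of the talk); the program and input strings are hypothetical stand-ins for the "theory".

```python
# A minimal sketch of the two-part code idea: for regular data, a short program
# plus its input describes the string in fewer symbols than the literal data.
regular = "01" * 18            # the 36-bit data string from the slide

# Hypothetical two-part code: the 'theory' is a tiny program, the rest is its input.
program = "print(y * x)"       # theory
inputs  = "x=18; y='01'"       # input to the theory

theory_length = len(program) + len(inputs)
print("data length            :", len(regular))    # 36
print("theory (program+input) :", theory_length)   # 24, and the gap grows with longer regular data
print("compresses?            :", theory_length < len(regular))
```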
9 Unsupervised Learning
(Diagram: an unknown system, a non-random computational process, turns input into observed output; learning is lossless compression of the observed output into a program plus input, the learned theory, with Theory < Data)
10 Supervised Learning
(Diagram: as in the unsupervised case, but both the input and the output of the unknown system are observed; learning compresses the observed input/output pairs into a program, the learned theory, with Theory < Data)
11 Adaptive System
(Diagram: the supervised setting extended with a feedback loop: the learned theory/program supplies new input to the unknown, non-random computational process)
12 Agent System
(Diagram: an agent system composed of adaptive systems, each acting as an unknown, non-random computational process for the others)
13 Practical applications
- Virtual Lab: adaptive information disclosure (Pieter Adriaans); TNO, IBM, UvA, Unilever
- Language development of children with cochlear implants: Jacqueline van Kampen (UiL OTS, Utrecht), Pieter Adriaans, Dick de Jongh (ILLC, Amsterdam), Guido Smoorenburg (AZU, Utrecht)
14 Application cases
(Diagram of the Virtual Laboratory e-Science project: application cases on top of an e-Science Application layer (P1), a Generic Virtual Laboratory layer (P2), and a Large Distributed System layer (P3))
15 (Diagram of the subprojects:
- Application cases: Data intensive science SP1.1, Food Informatics SP1.2, Medical diagnosis and imaging SP1.3, Bio-diversity SP1.4, Bio-Informatics SP1.5, Telescience SP1.6
- Virtual laboratory subprojects SP2.1 - SP2.5: Adaptive Information Disclosure, Collaborative Information Management, Interactive PSE, User Interface and Virtual Reality, Virtual lab / System Integration
- Infrastructure: HPDC Processor and Data co-allocation, Security and Generic AAA, Optical Networking)
16 Adaptive Information Disclosure: Research Questions
- The engineering challenge: build a routine that can enrich an under-specified model A on the basis of a collection of ontologies B1,...,Bn and a collection of corpora C1,...,Ck, and present the results in mode D
- Research Question 1: Can we learn an ontology for a certain domain from scratch on the basis of a collection of documents describing that domain?
- Research Question 2: Can we use grammar induction on a collection of documents describing a domain to formulate suggestions for the enrichment and adaptation of an ontology for this domain?
17-19 (No transcript)
20 Hearing versus Speech/Language for CI group
21 Hearing versus Speech/Language for CI group
22 PET: duration of hearing impairment versus brain activity in the auditory cortex (Lee et al., Nature, Jan. 2001)
23 Cochlear implants: Research Questions
- Research Question 1: Can we make a formal model of language development of young children that allows us to understand:
- Why is the process efficient?
- Why is the process discontinuous?
- Research Question 2: Can this formal model be used to develop diagnostic tests for (congenitally deaf) children with language problems?
24 The underlying challenge: learning natural language from text
Searle's Chinese Room
(Source of illustration: Scientific American, Jan. 1990)
25 The underlying challenge
- Research Question: Can we learn natural language efficiently from text? How much text is needed? How much processing is needed?
- First hypothesis: clustering of expressions and contexts seems a good idea
- John (makes) tea
- John (drinks) tea
- Mary drinks tea
- John drinks coffee
- Second hypothesis: context-free is a good first approximation of natural language.
26 1. Clustering of expressions and contexts seems a good idea
- Lamb 1961
- Wolff 1978, 1988
- Langley 1980
- Carroll and Charniak 1992
- Pereira and Schabes 1992
- Adriaans 1992, 1999
- Brill 1993
- Stolcke and Omohundro 1994
- Adriaans and Haas 1999
- Van Zaanen 2000
- Clark 2001
- Klein and Manning 2001, 2003
27 2. Context-free is a good approximation of natural language
- 1967 Gold, Identification in the Limit: context-free grammars are not identifiable in the limit from positive data
- 1969 Horning: probabilistic context-free grammars can be learned from positive data
- 1984 Valiant: distribution-free PAC learning
- 1991 Li and Vitányi: PAC learning under simple distributions
- 1992 Adriaans: efficient learning of shallow languages
- 2000 Vervoort: implementation of the EMILE tool
- 2001 Van Zaanen: Alignment Based Learning (ABL), induction of bracket structure from strings
- 2004 Solan et al.: ADIOS
28 The Gold paradigm
29 Game theory
- Challenger selects a language
- Presents an enumeration of the language
- Learner produces an infinite number of guesses
- If there is a winning strategy for the learner, then the language is learnable
- Theorem (Gold): In any grammar system, a class G of grammars is not learnable if L(G) contains all finite languages and at least one infinite language
- Limit points
- Locking sequences
- Finite elasticity
30 Learnability results (Gold)
31 Why superfinite sets are not identifiable in the limit
- 1: {a}
- 2: {a, aa}
- 3: {a, aa, aaa}
- 4: {a, aa, aaa, aaaa}
- 5: {a, aa, aaa, aaaa, aaaaa}
- ...
- ∞: {a, aa, aaa, aaaa, aaaaa, ...}
- Overgeneralisation
32 So we have to turn our attention to probabilistic solutions
- 1969 Horning: probabilistic context-free grammars can be learned from positive data
- Given a text T and two grammars G1 and G2 we are able to approximate max(P(G1|T), P(G2|T))
- So let's try distribution-free PAC (Probably Approximately Correct) learning, Valiant 1984
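To make the Horning-style comparison of P(G1|T) and P(G2|T) concrete, here is a small sketch (my own illustration, not Horning's algorithm); the two candidate grammars, their sizes and the corpus are entirely made up, and each grammar is simplified to a uniform distribution over a finite set of sentences.

```python
# A toy illustration of MAP grammar selection:
# posterior P(G|T) is proportional to P(G) * P(T|G), with P(G) = 2^(-size of G).
import math

# Two hypothetical candidate "grammars", each given as the set of sentences it
# generates (with a uniform distribution over that set) plus a description size in bits.
grammars = {
    "G1": {"sentences": {"john walks", "mary walks", "john loves mary", "mary loves john"},
           "size_bits": 40},
    "G2": {"sentences": {"john walks", "mary walks", "john loves mary", "mary loves john",
                         "walks john", "mary john loves"},   # overly general but smaller
           "size_bits": 25},
}

corpus = ["john walks", "mary loves john", "john loves mary", "mary walks"]

def log_posterior(g):
    """log2 P(G) + log2 P(T|G) under a uniform distribution over L(G)."""
    log_p = -g["size_bits"]                        # prior: shorter grammars are more probable
    for sentence in corpus:
        if sentence not in g["sentences"]:
            return -math.inf                       # grammar cannot generate the text
        log_p += -math.log2(len(g["sentences"]))   # likelihood of each observed sentence
    return log_p

for name, g in grammars.items():
    print(name, "log2 posterior:", round(log_posterior(g), 2))
# With only four observed sentences the smaller, overly general G2 still wins;
# as the text grows, the tighter G1 eventually pays off its extra description length.
```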
33-37 PAC Learning
(Figures: a probability distribution P on Σ*; a target concept f ⊆ Σ*; a hypothesis g ⊆ Σ*; their symmetric difference f Δ g; the requirement P(f Δ g) ≤ ε with probability (1-δ))
38 PAC Learning
- For all target concepts f ∈ F and all probability distributions P on Σ*, the algorithm A outputs a concept g ∈ F such that, with probability (1-δ), P(f Δ g) ≤ ε
- F: concept class; δ: confidence parameter; ε: error parameter; f Δ g = (f-g) ∪ (g-f)
- Polynomial in 1/ε and 1/δ
- BAD NEWS: power laws dominate word frequencies!
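As a reading aid for the definition above, a tiny Monte Carlo sketch (my own) of the quantity P(f Δ g) that the PAC guarantee bounds; the concepts f and g and the distribution P are made up.

```python
# Estimating P(f Δ g) by sampling: the quantity the PAC bound constrains.
import random

# Hypothetical target concept f and learned hypothesis g over binary strings.
def f(x: str) -> bool:          # target: strings containing "11"
    return "11" in x

def g(x: str) -> bool:          # hypothesis: strings ending in "1" (imperfect)
    return x.endswith("1")

def sample_P(length: int = 8) -> str:
    """Draw a string from the (here: uniform) sampling distribution P."""
    return "".join(random.choice("01") for _ in range(length))

n = 100_000
errors = sum(f(x) != g(x) for x in (sample_P() for _ in range(n)))
print("estimated P(f Δ g) ≈", errors / n)
# PAC learning asks that, with probability 1-δ over the sample,
# the learner outputs a g for which this value is at most ε.
```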
39 Scientific Text: Bitterbase (Unilever)
- The bitter taste of naringin and limonin was not
affected by glutamic acid rmflav 160 Exp.Ok
Naringin, the second of the two bitter principles
in citrus, has been shown to be a depressor of
limonin bitterness detection thresholds rmflav
1591 Florisil reduces bitterness and tartness
without altering ascorbic acid and soluble solids
(primarily sugars) content rmflav 584
Influence of pH on system was studied. The best
substrate for Rhodococcus fascians at pH 7.0 was
limonoate whereas at pH 4.0 to 5.5 it appeared to
be limonin. Results suggest that the citrus juice
debittering process start only once the natural
precursor of limonin (limonoate A ring lactone)
has been transformed into limonin, the
equilibrium displacement being governed by the
citrus juice pH. rmflav 474rmflav 504
Limonin D-ring lactone hydrolase, the enzyme
catalysing the reversible lactonization/hydrolysis
of D-ring in limonin, has been purified from
citrus seeds and immobilized on Q-Sepharose to
produce homogeneous limonoate A-ring lactone
solutions. The immobilized limonin D-ring lactone
hydrolase showed a good operational stability and
was stable after sixty-seventy operations and
storing at 4C for six months.
40 Study of Benign Distributions
41 Colloquial Speech: Corpus Spoken Dutch
- " omdat ik altijd iets met talen wilde
doen.""dat stond in elk geval uh voorop bij
mij.""en Nederlands leek me leuk.""da's
natuurlijk een erg afgezaagd antwoord maar dat
was 't wel.""en uhm ik ben d'r maar gewoon aan
begonnen aan de en ik uh heb 't met uh ggg
gezondheid.""ggg.""ik heb 't met uh met veel
plezier gedaan.""ja prima.""ja 'k vind 't nog
steeds leuk."
42 Study of Benign Distributions
43 Motherese: Sarah and Jacqueline
- JAC kijk, hier heb je ook puzzeltjes.
- SAR die (i)s van mij.
- JAC die zijn van jouw, ja.
- SAR die (i)s ...
- JAC kijken wat dit is.
- SAR kijken.
- JAC we hoeven natuurlijk niet alle zooi te
bewaren. - SAR en die.
- SAR die (i)s van mij, die.
- JAC die is niet kompleet.
- JAC die legt mamma maar terug.
- SAR die (i)s van mij.
- SAR xxx.
- SAR die ga in de kast, deze.
- JAC die <gaat in de kast>, ja.
- JAC molenspel.
- SAR mole(n)spel ".
44 Study of Benign Distributions
45 Observation
- Word frequencies in human utterances are dominated by power laws
- High-frequency core
- Low-frequency heavy tail
- Third hypothesis: Language is open. Grammar is elastic. The occurrence of new words is a natural phenomenon. Syntactic/semantic bootstrapping must play an important role in language learning.
- Bootstrapping might be important for ontology learning as well as for child language acquisition
- A better understanding of distributions is necessary (a small frequency sketch follows below)
46 What kind of distributions?
47 Kolmogorov complexity (for dummies)
- 01010101010101010101010...
- 11000110111001010010011101...
- The Kolmogorov complexity of a binary object is the length of the shortest program that generates this object on a universal Turing machine
- Random strings are not compressible
- A message with low Kolmogorov complexity is compressible
- Problem: Kolmogorov complexity is non-constructive!
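Since K itself is not computable, a standard trick is to use a real compressor as an upper-bound proxy. A small sketch (my own) applying zlib to strings like the two on this slide: the regular one shrinks, the random-looking one does not.

```python
# zlib as a computable upper bound on Kolmogorov complexity: the regular string
# compresses well, the (pseudo-)random one hardly at all.
import random
import zlib

regular = "01" * 500                                        # highly regular, like the first string
noise = "".join(random.choice("01") for _ in range(1000))   # random-looking, like the second

for name, s in [("regular", regular), ("random-looking", noise)]:
    packed = len(zlib.compress(s.encode()))
    print(f"{name:15s} length {len(s)} -> compressed to {packed} bytes")
```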
48 Kolmogorov complexity
- Let U be a universal Turing machine. The (prefix) Kolmogorov complexity of string x given string y with respect to U is
  K_U(x|y) = min { |P| : P ∈ {0,1}*, U(P, y) = x }
  K(x) = K_U(x|ε)
  The universal probability of a binary string x is
  P_U(x) = Σ_{P : U(P) = x} 2^{-|P|}
49-51 Universal Distributions
A recursively enumerable semi-measure μ is called universal if it multiplicatively dominates every other recursively enumerable semi-measure ν, i.e. c·ν(x) ≤ μ(x) for a fixed constant c independent of x.
(Figures: probabilities of the strings 1, 0, 11, 10, 01, 00, 111, 110, 10011, 101100, ... under ν(x), under c·ν(x), and under the dominating measure μ(x) = m(x))
52 The universal distribution m
- The coding theorem (Levin): -log m(x) = -log P_U(x) + O(1) = K(x) + O(1)
- A distribution is simple if it is dominated by a recursively enumerable distribution
- Li and Vitányi: A concept class C is learnable under m(x) iff C is also learnable under any arbitrary simple distribution P(x), provided the samples are taken according to m(x)
53 Problem: a finite recursive grammar with infinitely many sentences
- How do we know we have enough examples?
- The notion of a characteristic sample. Informally: a sample S of a language L(G) is characteristic if we can reconstruct G from S
54 How can we draw a characteristic sample under m?
- Solution: the notion of shallowness. Informally: a language is shallow if we only need short examples to learn it
- A language S = L(G) is shallow if there exists a characteristic sample C_G for S such that ∀ s ∈ C_G: |s| ≤ c · log K(G)
55 Simple and Shallow for dummies
- Solution: the notion of shallowness. Informally: a language is shallow if we only need short examples to learn it
- Shallow structures are constructed from an exponential number of small building blocks
- Simple distributions are typical for those sets that are generated by a computational process
- The universal distribution m is a non-computable distribution that dominates all simple distributions multiplicatively
- Objects with low Kolmogorov complexity have high probability
- Li and Vitányi: A concept class C is learnable under m(x) iff C is also learnable under any arbitrary simple distribution P(x), provided the samples are taken according to m(x)
56 Characteristic sample
Let Σ be an alphabet and Σ* the set of all strings over Σ. L(G) = S ⊆ Σ* is the language generated by a grammar G. C_G ⊆ S is a characteristic sample for G.
(Figure: nested sets Σ* ⊇ S ⊇ C_G)
57 Shallowness
- Seems to be an independent category
- There are finite languages that are not shallow, e.g. {01, 11, 101, 100, 001, 110, 1110, 0001, 1011, 0000000000011, 00101011000010010010100101010}
(Figure: Chomsky hierarchy (regular, context-free, context-sensitive, type 0) with the class of shallow languages cutting across it)
58 Shallowness is very restrictive
- In terms of natural growth: if one expands the longest rule by 1 bit, one must double the number of rules
(Figure: a context-free grammar G as a table of rules (Head, Body), roughly < |G| rules of length < log |G| each)
59 The learning algorithm
- One can prove that, using clustering techniques, shallow CFGs can be learned efficiently from positive examples drawn under m.
- General idea: if αβγ is a sentence of the language, then β is an expression and α\S/γ is its context
(Figure: a sentence split into an expression slot and the surrounding context)
60 Grammar Formalisms: Context-free
- Context-free grammar:
  Sentence → Name Verb
  Sentence → Name T_Verb Name
  Name → Mary | John
  Verb → walks
  T_Verb → loves
- Sentences: "John loves Mary", "Mary walks" (a small generator sketch follows)
61 Grammar Formalisms: Categorial Grammars
- Categorial grammar (lexicalistic):
  loves → (Name\Sentence)/Name
  walks, runs → Name\Sentence
  Mary, John → Name
- Parsing as deduction: α, α\β ⇒ β and β/α, α ⇒ β
(Derivation:
  Sentence
  Name  Name\Sentence
  Name  (Name\Sentence)/Name  Name
  John  loves  Mary)
62 Categorial Grammar: propositional calculus without structural rules
- Interchange: x, A, y, B, z ⊢ C  ⇒  x, B, y, A, z ⊢ C
- Contraction: x, A, A, y ⊢ C  ⇒  x, A, y ⊢ C
- Thinning: x, y ⊢ C  ⇒  x, A, y ⊢ C
- Logic: A, (A → B) ⊢ B and (A → B), A ⊢ B
- Grammar: A, (A\B) ⊢ B and (A/B), A ⊢ B
63 Categorial Grammar Formalism: algebraic specification
- M is a multiplicative system
- A • B = { x•y ∈ M | x ∈ A, y ∈ B }
- C / B = { x ∈ M | ∀ y ∈ B: x•y ∈ C }
- A \ C = { y ∈ M | ∀ x ∈ A: x•y ∈ C }
64 Categorial Grammar Formalism: algebraic specification as database operations
- Name = {John, Mary}
- Verb = {walks, runs}
- S = Name • Verb = {John, Mary} • {walks, runs} = {John walks, John runs, Mary walks, Mary runs}
65 Categorial Grammar Formalism: algebraic specification as database operations
- Name \ S = {John, Mary} \ {John walks, John runs, Mary walks, Mary runs} = {walks, runs}
- S / Verb = {John walks, John runs, Mary walks, Mary runs} / {walks, runs} = {John, Mary}
(A small sketch of these operations follows.)
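A minimal sketch (my own) of these "database operations" on sets of word strings: concatenation A•B and the two residuals A\C and C/B, reproducing the example above. The search for candidate strings is restricted to substrings of C, a simplification of quantifying over all of M.

```python
# Set-level categorial operations from slides 63-65: concatenation and residuals.
def concat(A, B):
    """A . B: all concatenations of a member of A with a member of B."""
    return {f"{x} {y}" for x in A for y in B}

def left_residual(A, C):
    """A \\ C: the strings y such that 'x y' is in C for every x in A."""
    return {y for y in candidates(C) if all(f"{x} {y}" in C for x in A)}

def right_residual(C, B):
    """C / B: the strings x such that 'x y' is in C for every y in B."""
    return {x for x in candidates(C) if all(f"{x} {y}" in C for y in B)}

def candidates(C):
    """All proper prefixes and suffixes (split on spaces) of the strings in C."""
    out = set()
    for s in C:
        words = s.split()
        for i in range(1, len(words)):
            out.add(" ".join(words[:i]))
            out.add(" ".join(words[i:]))
    return out

Name, Verb = {"John", "Mary"}, {"walks", "runs"}
S = concat(Name, Verb)
print(S)                           # {'John walks', 'John runs', 'Mary walks', 'Mary runs'}
print(left_residual(Name, S))      # {'walks', 'runs'}
print(right_residual(S, Verb))     # {'John', 'Mary'}
```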
66 EMILE 3.0 stages: take sample
Sample: John loves Mary; Mary walks
67 EMILE 3.0 stages: first-order explosion
S/loves Mary → John
S/Mary → John loves
S → John loves Mary
John\S/Mary → loves
John\S → loves Mary
John loves\S → Mary
S/walks → Mary
S → Mary walks
Mary\S → walks
68 EMILE 3.0 stages: first-order explosion (sample: John loves Mary; Mary walks)
69 EMILE 3.0 stages: complete first-order explosion (sample: John loves Mary; Mary walks)
70 EMILE 3.0 stages: clustering (sample: John loves Mary; Mary walks)
71 EMILE 3.0 stages: clustering (sample: John loves Mary; Mary walks)
72 EMILE 3.0 stages: clusters → non-terminal names A, B, C, D, E
73 EMILE 3.0 stages: protorules
S/loves Mary → A
S/Mary → B
S → C
John\S/Mary → D
John\S → E
John loves\S → A
S/walks → A
Mary\S → E
A → John
B → John loves
C → John loves Mary
D → loves
E → loves Mary
A → Mary
C → Mary walks
E → walks
74 EMILE 3.0 stages: generalize into context-free rules
John\S/Mary → D
_____________________________
S → John D Mary

A → John (characteristic expression)
A → Mary (characteristic expression)
_____________________________
S → A D A

Grammar:
S → A D A | B A | C | A E
A → John | Mary
B → A D
C → A D A | A E
D → loves
E → A D | walks
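A minimal sketch (my own, not Vervoort's implementation) of the first-order explosion and clustering steps above: split each sentence into every (expression, context) pair and group expressions that share the same set of contexts.

```python
# EMILE-style first-order explosion and clustering (toy version):
# every split of a sentence yields an (expression, context) pair; expressions
# that occur in exactly the same contexts are clustered into one type.
from collections import defaultdict

sentences = ["John walks", "Mary walks", "John loves tea", "Mary loves tea"]

contexts_of = defaultdict(set)          # expression -> set of contexts it appears in
for s in sentences:
    words = s.split()
    for i in range(len(words)):
        for j in range(i + 1, len(words) + 1):
            expression = " ".join(words[i:j])
            context = (" ".join(words[:i]), " ".join(words[j:]))   # (left, right)
            contexts_of[expression].add(context)

# Cluster: expressions with identical context sets get the same type.
types = defaultdict(list)
for expression, ctxs in contexts_of.items():
    types[frozenset(ctxs)].append(expression)

for i, members in enumerate(types.values()):
    if len(members) > 1:
        print(f"type {i}: {members}")
# With this sample, 'John' and 'Mary' form one cluster, as do 'walks' and
# 'loves tea', and the four complete sentences themselves (the type S).
```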
75 Theorem (Adriaans 1992)
- If a language L has a context-free grammar G, is shallow, is sampled according to the universal distribution, and there is a member-check function available, then it can be learned efficiently from text
- Assumptions: natural language is shallow; the distribution of sentences in a text is simple
76 EMILE 3.0 (1992): problems, not very practical
- Take sample: positive examples
- First-order explosion: deduction
- Complete first-order explosion: positive and negative examples
- Clustering: deduction
- Non-terminal names: deduction
- Proto-rules: induction
- Context-free rules: induction
77 EMILE 3.0 (1992): problems
- (Stages as on slide 76)
- Supervised, no text: speakers do not give negative examples
78 EMILE 3.0 (1992): problems
- (Stages as on slide 76)
- Supervised, no text: speakers do not give negative examples
- Polynomial, but very complex due to overlapping clusters
79 EMILE 3.0 (1992): only theoretical value
- (Stages as on slide 76)
- Supervised, no text: speakers do not give negative examples
- Polynomial, but very complex due to overlapping clusters
- Batch oriented, not incremental
80 New theory necessary: bootstrapping phenomena!
- Lewis Carroll's famous poem 'Jabberwocky' starts with:
- 'Twas brillig, and the slithy toves
- Did gyre and gimble in the wabe:
- All mimsy were the borogoves,
- And the mome raths outgrabe.
81 New theory necessary: bootstrapping phenomena!
(Figure: word-frequency distribution with a structured high-frequency core and a heavy low-frequency tail)
82 New theory necessary: bootstrapping phenomena!
- An expression of a type T is characteristic for T if it only appears with contexts of type T
- Similarly, a context of a type T is characteristic for T if it only appears with expressions of type T
- Let G be a grammar (context-free or otherwise) of a language L. G has context separability if each type of G has a characteristic context, and expression separability if each type of G has a characteristic expression
- Natural languages seem to be context- and expression-separable
- This is nothing but stating that languages can define their own concepts internally ("... is a noun", "... is a verb")
83 Natural languages are shallow
- A class of languages C is shallow if for each language L it is possible to find a context- and expression-separable grammar G, and a set of sentences S inducing characteristic contexts and expressions for all the types of G, such that the size of S and the length of the sentences of S are logarithmic in the descriptive length of L (relative to C)
- Seems to hold for natural languages: large dictionaries
84 EMILE 4.1 (2000), Vervoort
- Unsupervised
- Two-dimensional clustering: random search for maximized blocks in the matrix
- Incremental: thresholds for the filling degree of blocks
- Simple (but sloppy) rule induction using characteristic expressions
85 EMILE 4.1: clustering sparse matrices of contexts and expressions
(Figure: a sparse boolean matrix with expressions as rows and contexts as columns; a characteristic expression picks out a row, a characteristic context a column, and a type corresponds to a filled block. A toy block-search sketch follows.)
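A minimal sketch (my own, far simpler than EMILE 4.1's randomized search) of finding a dense block in the expression × context matrix: pick a seed context, collect the expressions that occur in it, and keep the contexts in which (almost) all of those expressions occur, governed by a support threshold. All data here is made up.

```python
# Toy block search in a boolean expression x context co-occurrence matrix,
# in the spirit of EMILE 4.1's two-dimensional clustering (greatly simplified).
pairs = {                        # observed (expression, context) co-occurrences
    ("John",  ("", "walks")), ("Mary",  ("", "walks")),
    ("John",  ("", "loves tea")), ("Mary", ("", "loves tea")),
    ("the man", ("", "walks")),
    ("tea",   ("John loves", "")), ("coffee", ("John loves", "")),
}

def block_from_seed(seed_context, support=1.0):
    """Grow an (expressions, contexts) block around one seed context."""
    expressions = {e for e, c in pairs if c == seed_context}
    contexts = set()
    for c in {c for _, c in pairs}:
        covered = sum((e, c) in pairs for e in expressions)
        if covered >= support * len(expressions):   # filling-degree threshold
            contexts.add(c)
    return expressions, contexts

exprs, ctxs = block_from_seed(("", "walks"), support=1.0)
print("expressions:", exprs)     # {'John', 'Mary', 'the man'}
print("contexts   :", ctxs)      # only contexts filled by *all* of those expressions

exprs, ctxs = block_from_seed(("", "walks"), support=0.6)
print("relaxed    :", ctxs)      # lowering the threshold admits ('', 'loves tea') too
```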
86 EMILE guaranteed to find types with the right settings
- Let T be a type with a characteristic context c_ch and a characteristic expression e_ch. Suppose that the maximum lengths for primary contexts and expressions are set to at least |c_ch| and |e_ch|, and suppose that the total_support, expression_support and context_support settings are all set to 100%. Let T<maxC and T<maxE be the sets of contexts and expressions of T that are small enough to be used as primary contexts and expressions. If EMILE is given a sample containing all combinations of contexts from T<maxC and expressions from T<maxE, then EMILE will find type T. (Vervoort 2000)
87 Original grammar
- S → NP V_i ADV | NP_a VP_a | NP_a V_s that S
- NP → NP_a | NP_p
- VP_a → V_t NP | V_t NP P NP_p
- NP_a → John | Mary | the man | the child
- NP_p → the car | the city | the house | the shop
- P → with | near | in | from
- V_i → appears | is | seems | looks
- V_s → thinks | hopes | tells | says
- V_t → knows | likes | misses | sees
- ADV → large | small | ugly | beautiful
88 Learned grammar after 100,000 examples
- 0 → 17 6
- 0 → 17 22 17 6
- 0 → 17 22 17 22 17 22 17 6
- 6 → misses 17 | likes 17 | knows 17 | sees 17
- 6 → 22 17 6
- 6 → appears 34 | looks 34 | is 34 | seems 34
- 6 → 6 near 17 | 6 from 17 | 6 in 17 | 6 with 17
- 17 → the child | Mary | the city | the man | John | the car | the house | the shop
- 22 → tells that | thinks that | hopes that | says that
- 22 → 22 17 22
- 34 → small | beautiful | large | ugly
89 Hypothesis: natural languages are shallow, so we can learn them from text
(Plot: grammar size, on the order of 5·10^3, against sample size, on the order of 5·10^7 sentences)
90 Bible books
- King James version
- 31,102 verses, 82,935 lines
- 4.8 MB of English text
- 001001 In the beginning God created the heaven and the earth.
- 66 experiments with increasing sample size
- Initially: Book of Genesis, Book of Exodus, ...
- Full run: 40 minutes, 500 MB on an Ultra-2 Sparc
91 Bible books
92 GI on the Bible
- 0 → Thou shall not 582
- 0 → Neither shalt thou 582
- 582 → eat it
- 582 → kill .
- 582 → commit adultery .
- 582 → steal .
- 582 → bear false witness against thy neighbour .
- 582 → abhor an Edomite
93 Knowledge base in the Bible
- Dictionary Type 76: Esau, Isaac, Abraham, Rachel, Leah, Levi, Judah, Naphtali, Asher, Benjamin, Eliphaz, Reuel, Anah, Shobal, Ezer, Dishan, Pharez, Manasseh, Gershon, Kohath, Merari, Aaron, Amram, Mushi, Shimei, Mahli, Joel, Shemaiah, Shem, Ham, Salma, Laadan, Zophah, Elpaal, Jehieli
- Dictionary Type 362: plague, leprosy
- Dictionary Type 414: Simeon, Judah, Dan, Naphtali, Gad, Asher, Issachar, Zebulun, Benjamin, Gershom
- Dictionary Type 812: two, three, four
- Dictionary Type 1056: priests, Levites, porters, singers, Nethinims
- Dictionary Type 978: afraid, glad, smitten, subdued
- Dictionary Type 2465: holy, rich, weak, prudent
- Dictionary Type 3086: Egypt, Moab, Dumah, Tyre, Damascus
- Dictionary Type 4082: heaven, Jerusalem
94 A simple partial grammar (Dutch)
- 0 → 12 ?
- 12 → Waar 23
- 12 → Wie 46
- 12 → Hoe 13
- 12 → Wat 31
- 23 → wonen jullie
- 23 → woon jij
- 23 → woont u
- 23 → wasje
- 23 → ben je geweest
- 46 → kan 49
- 46 → weet hoe laat de treinen uit Rotterdam vertrekken
- 46 → heeft gewonnen
- 49 → dat betalen
- 49 → mij dat uitleggen
- 49 → dat verklaren
- 49 → voor dit verschijnsel een verklaring geven
- 13 → heet jij
- 13 → heet u
- 13 → laat begint de les
- 13 → lang doet de trein over de afstand Rotterdam-Den Haag
- 13 → komt dat
- 13 → vieren mensen eigenlijk feest
- 31 → is 33
- 31 → heb je gisteren gedaan
- 31 → vind jij daarvan
- 33 → jouw naam
- 33 → uw naam
- 33 → je leeftijd
- 33 → jouw mening hierover
- 33 → jouw opvatting over dit onderwerp
95 Problems
- Results at first sight disappointing
- Convergence to meaningful syntactic types is rarely observed
- Types seem to be semantic rather than syntactic
- Why?
- Hypothesis: the distribution in real-life text is semantic, not syntactic
- A semantic grammar is an intermediate compression level between the term algebra and the syntactic algebra
96 Characteristic sample: Semantic Learning
Let Σ be an alphabet and Σ* the set of all strings over Σ. L(G) = S ⊆ Σ* is the language generated by a grammar G. C_G ⊆ S is a characteristic sample for G.
(Figure: nested sets Σ* ⊇ S ⊇ True ⊇ C_G: the characteristic sample is drawn from the true sentences)
97 Syntactic Learning: substitution salva beneformatione
Tweety is_a bird   Tweety is_a dog   Tweety is_a horse   Tweety is_a mammal
Fido is_a bird     Fido is_a dog     Fido is_a horse     Fido is_a mammal
Ed is_a bird       Ed is_a dog       Ed is_a horse       Ed is_a mammal
Types: Sentence; Noun = {bird, dog, horse, mammal}; Name = {Ed, Fido, Tweety}
98 Semantic Learning: substitution salva veritate
(Same sentence matrix as above, but substitutions must now preserve truth (True), not just well-formedness)
Types: Sentence; Noun = {bird, dog, horse, mammal}; Name = {Ed, Fido, Tweety}
99 Semantic Learning: substitution salva veritate
(Same sentence matrix as above)
Compositionality: semantics as an intermediate compression level
(Diagram of levels: syntactic types Sentence, Noun, Name; semantic types Mammal, Horse, Dog, Bird and individuals Ed, Fido, Tweety; truth values True, False. A small sketch of the two substitution criteria follows.)
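A small sketch (my own) of the contrast between the two substitution criteria: cluster nouns by well-formedness of the sentences they appear in (syntactic) versus by the truth values of those sentences (semantic). The facts about Ed, Fido and Tweety are made up for illustration.

```python
# Syntactic vs. semantic clustering of the nouns on slides 97-99.
from collections import defaultdict

names = ["Tweety", "Fido", "Ed"]
nouns = ["bird", "dog", "horse", "mammal"]

# Made-up world: which "X is_a Y" sentences count as true (purely illustrative;
# Ed the horse is deliberately not listed as a mammal here).
true_facts = {("Tweety", "bird"), ("Fido", "dog"), ("Fido", "mammal"), ("Ed", "horse")}

# Syntactic learning (salva beneformatione): every "Name is_a Noun" combination
# is well-formed, so all nouns get the same distributional profile.
syntactic_profile = {noun: tuple(True for _ in names) for noun in nouns}

# Semantic learning (salva veritate): profile each noun by the truth values of
# the sentences it occurs in; nouns with identical profiles form one semantic type.
semantic_profile = {noun: tuple((name, noun) in true_facts for name in names)
                    for noun in nouns}

clusters = defaultdict(list)
for noun, profile in semantic_profile.items():
    clusters[profile].append(noun)

print("all syntactic profiles identical:", len(set(syntactic_profile.values())) == 1)
print("semantic clusters:", list(clusters.values()))
# -> [['bird'], ['dog', 'mammal'], ['horse']]: 'dog' and 'mammal' merge because
#    they are true of the same individuals in this toy world.
```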
100 Not a bug, but a feature: semantic learning
- Dictionary Type 362: plague, leprosy
- Dictionary Type 1056: priests, Levites, porters, singers, Nethinims
- Dictionary Type 978: afraid, glad, smitten, subdued
- Dictionary Type 2465: holy, rich, weak, prudent
101 EMILE 4.1: Molecular Biology of the Cell, 3rd edition
- 2.5 megabytes of data (text only)
- 60% of the text used
- Number of different sentences read: 5461
- Number of different words: 13396
- Number of different contexts: 896343
- Number of different expressions: 782123
- Number of different grammatical types: 67
- Number of dictionary types: 17
- Potential problems: (1) language is learned from redundancy; (2) high linguistic complexity
- Potential solutions: (1) more data; (2) provide seed grammars
102 EMILE 4.1: Molecular Biology of the Cell, 3rd edition
- Extracts cell information:
- 0 → Eucaryotic Cells 14
- 14 → Contain Several Distinctive Organelles
- 14 → Depend on Mitochondria for Their Oxidative Metabolism
- 14 → Contain a Rich Array of Internal Membranes
- 14 → Have a Cytoskeleton
- Extracts chemical information:
- 0 → 1 .
- 1 → B-OH ATP - 19 2
- 19 → > B-O-P ADP
- 19 → > B-O-P-P AMP
- 0 → The 2
- 2 → 35 Is Asymmetrical
- 35 → DNA Replication Fork
- 35 → Lipid Bilayer
103 EMILE 4.1: Molecular Biology of the Cell, 3rd edition
- Extracts biological information:
- 0 → 1 .
- 1 → This 3
- 3 → phenomenon is 37
- 37 → known as gene conversion
- 37 → called genomic imprinting
- 0 → 64 in the Golgi Apparatus
- 64 → Oligosaccharide Chains Are Processed
- 64 → Proteoglycans Are Assembled
- 0 → Mitochondria and Chloroplasts Contain 66
- 66 → Complete Genetic Systems
- 66 → Tissue-specific Proteins
104 The underlying challenge: conclusion
- Research Question: Can we learn natural language efficiently from text? How much text is needed? How much processing is needed?
- Yes, under reasonable assumptions, but the type of text we need to do that is not available. We find semantic distributions, not syntactically distributed corpora. Child language acquisition may be a good domain to look at. Bootstrapping plays an important role.
- Natural language is shallow and has context and expression separability. That is what makes it learnable.
105 Adaptive Information Disclosure: conclusion
- The engineering challenge: build a routine that can enrich an under-specified model A on the basis of a collection of ontologies B1,...,Bn and a collection of corpora C1,...,Ck, and present the results in mode D
- Research Question 1: Can we learn an ontology for a certain domain from scratch on the basis of a collection of documents describing that domain?
- Probably: expression separability and context separability should do the job
- Research Question 2: Can we use grammar induction on a collection of documents describing a domain to formulate suggestions for the enrichment and adaptation of an ontology for this domain?
- Yes!
106 Cochlear implants: Research Questions
- Research Question 1: Can we make a formal model of language development of young children that allows us to understand:
- Why is the process efficient?
- Why is the process discontinuous?
- Promising!
- Research Question 2: Can this formal model be used to develop diagnostic tests for (congenitally deaf) children with language problems?
- Maybe.
107 Language acquisition phases: hybrid learning
- Phase 1 (0-9 months): linking acoustics and events; babbling. Model: DFA; learning strategy: evidence-based state merging
- Phase 2 (9-24 months): children categorize words into word classes and show evidence of early sensitivity to syntax; word classes, markers. Model: some complex interaction between deixis and babbling; learning strategy: semantic learning
- Phase 3 (2-3.5 years): language meaning and syntactic structure are acquired; recursive rules. Model: context-free grammar; learning strategy: Seginer, EMILE
108 Study of Benign Distributions
109 Further work
- Better understanding of semantic learning
- Incremental learning with background knowledge
- Learning partial ontologies
- Integrating ontologies in the grammar induction process
- Pieter Adriaans, pietera_at_science.uva.nl
- WEB: http://turing.wins.uva.nl/pietera/ALS/