Learning Logic and Grammar
Transcript and Presenter's Notes
1
Learning Logic and Grammar
Grammar Induction and Semantic Learning
Pieter Adriaans, Universiteit van Amsterdam
pietera@science.uva.nl
2
Structure of talk: grammar induction and semantic learning
  • Introduction
  • Induction and learning
  • Motivation
  • Virtual Lab: adaptive information disclosure
  • Language development of children with cochlear implants
  • The challenge: learning from text
  • Early research: Gold, Horning, Valiant
  • Emile 2: Kolmogorov complexity, universal distribution, shallowness
  • Emile 3: characteristic expressions and contexts, bootstrapping
  • Empirical tests: at first not very convincing. Why?
  • Semantic distributions: not a bug, a feature!
  • From grammar induction to semantic learning
  • Grammar induction under semantic distributions = semantic learning
  • Conclusion

3
Induction: a desperate discipline
  • 400 BC Herakleitos: panta rhei (everything flows)
  • 400 BC Parmenides: phusis kruptesthai philei (nature loves to hide)
  • 200 BC - 150 AD Pyrrhonism: knowledge is not possible
  • 1750 Hume: the problem of induction
  • 1910 Russell: paradox, ramified theory of types
  • 1930 Gödel: incompleteness of arithmetic
  • 1935 Turing/Church: undecidability, incomputability
  • 1935 Popper: asymmetry between verification and falsification
  • 1965 Kolmogorov: complexity is non-constructive
  • 1967 Gold: superfinite sets are not identifiable in the limit
  • 1988 Pitt & Warmuth: selecting the smallest automaton is NP-hard
  • 1995 Wolpert & Macready: No Free Lunch theorem

4
There is no universal learning method
  • What we can do: identify classes of algorithms that work for classes of problems
  • The number of such classes is in principle infinite
  • Machine Learning is essentially heuristic
  • Machine Learning is essentially empirical: which methods work for which domains in reality?

5
Paul Feyerabend, Against Method, 1975
  • Science is an essentially anarchistic enterprise: theoretical anarchism is more humanitarian and more likely to encourage progress than its law-and-order alternatives.
  • This is shown both by an examination of historical episodes and by an abstract analysis of the relation between idea and action. The only principle that does not inhibit progress is: anything goes.

6
Learning adaptive systems
  • What is Machine Learning? The use of computational techniques to detect structures in datasets
  • What kind of datasets? Continuous vs. discrete, finite vs. infinite, one-dimensional vs. multidimensional, binary vs. non-binary; strings, sets, trees, databases, pictures, etc.

7
Base case: 2-part code optimization
[Diagram: Observed Data → Learning (lossless compression) → Learned Theory = Program + Input, with |Theory| < |Data|]
8
Paradigm case: finite binary string
  • Data: 000110100110011111010101011010100000
  • Data: 010101010101010101010101010101010101
  • Theory = Program + Input
  • Program: for i = 1 to x print y
  • Input: x = 18, y = 01
  • |Theory| = |Program| + |Input| < |Data|
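To make the two-part idea concrete, here is a small Python sketch (not part of the original slides; the program text and the character-count size measure are illustrative assumptions). The regular string is described by a fixed program plus a short input, and this description grows only logarithmically while the data grows linearly.

    # Two-part code for the regular string "0101...01" (sizes counted in characters).
    theory = "print(y * x)"                # the learned theory: repeat pattern y, x times

    def description_length(x, y):
        # fixed theory, plus the decimal encoding of x, plus the pattern y itself
        return len(theory) + len(str(x)) + len(y)

    for x in (18, 1000, 10**6):
        data = "01" * x
        print(len(data), description_length(x, "01"))
    # Already at x = 18 the two-part description (16 characters) is shorter than the
    # data (36 characters), and the gap grows; a string with no regularity admits no
    # description shorter than itself.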
9
Unsupervised Learning
[Diagram: Input → Unknown System (non-random, computational process) → Observed Output. Learning by lossless compression of the observed output yields a Learned Theory = Program + Input, with |Theory| < |Data|]
10
Supervised Learning
[Diagram: as above, but both the Observed Input and the Observed Output of the unknown system are available to the learner; again the learned theory (program + input) satisfies |Theory| < |Data|]
11
Adaptive System
[Diagram: observed input and output of the unknown (non-random, computational) process feed a learning component that maintains a Learned Theory = Program + Input]
12
Agent System
[Diagram: an agent system consisting of adaptive systems interacting with an unknown system (a non-random, computational process)]
13
Practical applications
  • Virtual Lab: adaptive information disclosure (Pieter Adriaans)
  • TNO, IBM, UvA, Unilever
  • Language development of children with cochlear implants
  • Jacqueline Van Kampen (UiL OTS, Utrecht), Pieter Adriaans, Dick de Jongh (ILLC, Amsterdam), Guido Smoorenburg (AZU, Utrecht)

14

[Diagram: application cases stacked on three layers: E-Science Application (P1), Generic Virtual Laboratory (P2), Large Distributed System (P3)]
15

[Diagram: VL-e subprojects. Applications: Data-intensive science (SP1.1), Food Informatics (SP1.2), Medical diagnosis imaging (SP1.3), Bio-diversity (SP1.4), Bio-Informatics (SP1.5), Telescience (SP1.6). Middleware (SP2.1-SP2.5): Interactive PSE, Adaptive Information Disclosure, Collaborative Information Management, User Interface / Virtual Reality, Virtual lab system integration. Infrastructure: HPDC processor/data co-allocation, Security / generic AAA, Optical Networking]
16
Adaptive Information Disclosure: Research Questions
  • The engineering challenge: build a routine that can enrich an under-specified model A on the basis of a collection of ontologies B1, ..., Bn and a collection of corpora C1, ..., Ck and present the results in mode D
  • Research Question 1: Can we learn an ontology for a certain domain from scratch on the basis of a collection of documents describing that domain?
  • Research Question 2: Can we use grammar induction on a collection of documents describing a domain to formulate suggestions for the enrichment and adaptation of an ontology for this domain?

17
(No Transcript)
18
(No Transcript)
19
(No Transcript)
20
Hearing versus Speech/Language for CI group
21
Hearing versus Speech/Language for CI group
22
PET: duration of hearing impairment versus brain activity in the auditory cortex (Nature, Jan 2001, Lee et al.)
23
Cochlear implants: Research Questions
  • Research Question 1: Can we make a formal model of language development of young children that allows us to understand:
  • Why the process is efficient?
  • Why the process is discontinuous?
  • Research Question 2: Can this formal model be used to develop diagnostic tests for (congenitally deaf) children with language problems?

24
The underlying challenge: learning natural language from text
Searle's Chinese Room
Illustration source: Scientific American, Jan. 1990
25
The underlying challenge
  • Research Question: Can we learn natural language efficiently from text?
  • How much text is needed? How much processing is needed?
  • First hypothesis: clustering of expressions and contexts seems a good idea
  • John (makes) tea
  • John (drinks) tea
  • Mary drinks tea
  • John drinks coffee
  • Second hypothesis: context-free is a good first approximation of natural language.

26
1. Clustering of expressions and contexts seems a good idea
  • Lamb 1961
  • Wolff 1978, 1988
  • Langley 1980
  • Carroll & Charniak 1992
  • Pereira & Schabes 1992
  • Adriaans 1992, 1999
  • Brill 1993
  • Stolcke & Omohundro 1994
  • Adriaans & Haas 1999
  • Van Zaanen 2000
  • Clark 2001
  • Klein & Manning 2001, 2003

27
2. Context-free is a good approximation of natural language
  • 1967 Gold, identification in the limit: context-free languages are not identifiable in the limit from positive data
  • 1969 Horning: probabilistic context-free grammars can be learned from positive data
  • 1984 Valiant: distribution-free PAC learning
  • 1991 Li & Vitányi: PAC learning under simple distributions
  • 1992 Adriaans: efficient learning of shallow languages
  • 2000 Vervoort: implementation of the EMILE tool
  • 2001 Van Zaanen: Alignment Based Learning (ABL), induction of bracket structure from strings
  • 2004 Solan et al.: ADIOS

28
The Gold paradigm
29
Game theory
  • Challenger selects a language
  • Challenger presents an enumeration of the language
  • Learner produces an infinite number of guesses
  • If there is a winning strategy for the learner, then the language is learnable
  • Theorem (Gold): in any grammar system, a class G of grammars is not learnable if L(G) contains all finite languages and at least one infinite language
  • Limit points
  • Locking sequences
  • Finite elasticity

30
Learnability results (Gold)
31
Why superfinite sets are not identifiable in the
limit
  • 1: {a}
  • 2: {a, aa}
  • 3: {a, aa, aaa}
  • 4: {a, aa, aaa, aaaa}
  • 5: {a, aa, aaa, aaaa, aaaaa}
  • ...
  • ω: {a, aa, aaa, aaaa, aaaaa, ...}
  • Overgeneralisation

32
So we have to turn our attention to probabilistic
solutions
  • 1969 Horning: probabilistic context-free grammars can be learned from positive data
  • Given a text T and two grammars G1 and G2 we are able to approximate max(P(G1|T), P(G2|T))
  • So let's try distribution-free PAC (Probably Approximately Correct) learning, Valiant 1984

33
PAC Learning
[Diagram: a probability distribution P on Σ*]
34
PAC Learning
[Diagram: P on Σ*, with the target concept f]
35
PAC Learning
[Diagram: P on Σ*, with a hypothesis g]
36
PAC Learning
[Diagram: P on Σ*, with f, g and their symmetric difference f Δ g]
37
PAC Learning
[Diagram: as above; P(f Δ g) ≤ ε with probability (1 - δ)]
38
PAC Learning
  • For all target concepts f ∈ F and all probability distributions P on Σ*, the algorithm A outputs a concept g ∈ F such that, with probability (1 - δ), P(f Δ g) ≤ ε
  • F: concept class; δ: confidence parameter; ε: error parameter; f Δ g = (f - g) ∪ (g - f)
  • Polynomial in 1/ε and 1/δ (a standard sample-size bound is sketched below)
  • BAD NEWS: power laws dominate word frequencies!
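For a finite concept class there is a standard textbook sample-size bound for a consistent PAC learner; it is not stated on the slide and is given here only as an illustrative sketch: roughly (1/ε)(ln|F| + ln(1/δ)) examples suffice.

    import math

    def pac_sample_size(concept_class_size, epsilon, delta):
        # number of i.i.d. examples after which any hypothesis consistent with the
        # sample has error at most epsilon, with probability at least 1 - delta
        return math.ceil((math.log(concept_class_size) + math.log(1 / delta)) / epsilon)

    # e.g. a million concepts, 1% error, 99% confidence
    print(pac_sample_size(10**6, epsilon=0.01, delta=0.01))   # 1843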

39
Scientific Text Bitterbase (Unilever)
  • The bitter taste of naringin and limonin was not
    affected by glutamic acid rmflav 160 Exp.Ok
    Naringin, the second of the two bitter principles
    in citrus, has been shown to be a depressor of
    limonin bitterness detection thresholds rmflav
    1591 Florisil reduces bitterness and tartness
    without altering ascorbic acid and soluble solids
    (primarily sugars) content rmflav 584
    nfluence pH on system was studied. The best
    substrate for Rhodococcus fascians at pH 7.0 was
    limonoate whereas at pH 4.0 to 5.5 it appeared to
    be limonin. Results suggest that the citrus juice
    debittering process start only once the natural
    precursor of limonin (limonoate A ring lactone)
    has been transformed into limonin, the
    equilibrium displacement being governed by the
    citrus juice pH. rmflav 474rmflav 504
    Limonin D-ring lactone hydrolase, the enzyme
    catalysing the reversible lactonization/hydrolysis
    of D-ring in limonin, has been purified from
    citrus seeds and immobilized on Q-Sepharose to
    produce homogeneous limonoate A-ring lactone
    solutions. The immobilized limonin D-ring lactone
    hydrolase showed a good operational stability and
    was stable after sixty-seventy operations and
    storing at 4C for six months.

40
Study of Benign Distributions
41
Colloquial Speech Corpus Spoken Dutch
  • " omdat ik altijd iets met talen wilde
    doen.""dat stond in elk geval uh voorop bij
    mij.""en Nederlands leek me leuk.""da's
    natuurlijk een erg afgezaagd antwoord maar dat
    was 't wel.""en uhm ik ben d'r maar gewoon aan
    begonnen aan de en ik uh heb 't met uh ggg
    gezondheid.""ggg.""ik heb 't met uh met veel
    plezier gedaan.""ja prima.""ja 'k vind 't nog
    steeds leuk."

42
Study of Benign Distributions
43
Motherese Sarah-Jaqueline
  • JAC kijk, hier heb je ook puzzeltjes.
  • SAR die (i)s van mij.
  • JAC die zijn van jouw, ja.
  • SAR die (i)s ...
  • JAC kijken wat dit is.
  • SAR kijken.
  • JAC we hoeven natuurlijk niet alle zooi te
    bewaren.
  • SAR en die.
  • SAR die (i)s van mij, die.
  • JAC die is niet kompleet.
  • JAC die legt mamma maar terug.
  • SAR die (i)s van mij.
  • SAR xxx.
  • SAR die ga in de kast, deze.
  • JAC die <gaat in de kast> ", ja.
  • JAC molenspel.
  • SAR mole(n)spel ".

44
Study of Benign Distributions
45
Observation
  • Word frequencies in human utterances are dominated by power laws
  • High-frequency core
  • Low-frequency heavy tail
  • Third hypothesis: language is open, grammar is elastic, the occurrence of new words is a natural phenomenon; syntactic/semantic bootstrapping must play an important role in language learning
  • Bootstrapping might be important for ontology learning as well as for child language acquisition
  • Better understanding of distributions is necessary

46
What kind of distributions?
  • Simple distributions

47
Kolmogorov complexity (for dummies)
  • 01010101010101010101010..
  • 11000110111001010010011101 ..
  • The Kolmogorov complexity of a binary object is
    the length of the shortest program that generates
    this object on a universal Turing machine
  • Random strings are not compressible
  • A message with low Kolmogorov complexity is
    compressible
  • Problem: Kolmogorov complexity is non-constructive!

48
Kolmogorov complexity
  • Let U be a universal Turing machine. The (prefix) Kolmogorov complexity of a string x given a string y with respect to U is K_U(x|y) = min { |P| : P ∈ {0,1}*, U(P, y) = x }, and K(x) = K_U(x|ε). The universal probability of a binary string x is P_U(x) = Σ_{P : U(P) = x} 2^-|P|
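Since K(x) itself is non-computable, a common practical stand-in (an assumption of this sketch, not something the slides prescribe) is the length of a standard compressed encoding, which upper-bounds the complexity up to a constant.

    import random
    import zlib

    def compressed_length(s: str) -> int:
        # crude, computable upper bound on the Kolmogorov complexity of s
        return len(zlib.compress(s.encode()))

    regular = "01" * 500                                        # highly regular string
    random.seed(0)
    noisy = "".join(random.choice("01") for _ in range(1000))   # random-looking string

    print(len(regular), compressed_length(regular))   # compresses to a handful of bytes
    print(len(noisy), compressed_length(noisy))       # stays near its 1000 bits of information content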

49
Universal Distributions
A recursively enumerable semi-measure μ is called universal if it multiplicatively dominates every other semi-measure ν
[Diagram: probabilities between 0 and 1 over the strings 1, 0, 11, 10, 01, 00, 111, 110, 10011, 101100, ...]
50
Universal Distributions
A recursively enumerable semi-measure μ is called universal if it multiplicatively dominates every other semi-measure ν
[Diagram: as above, with the curve ν(x) drawn in]
51
Universal Distributions
A recursively enumerable semi-measure μ is called universal if it multiplicatively dominates every other semi-measure ν,
i.e. c·μ(x) ≥ ν(x) for a fixed c independent of x
[Diagram: c·μ(x) dominating ν(x), with μ(x) = m(x)]
52
The universal distribution m
  • The coding theorem (Levin): -log m(x) = -log P_U(x) + O(1) = K(x) + O(1)
  • A distribution is simple if it is dominated by a recursively enumerable distribution
  • Li & Vitányi: a concept class C is learnable under m(x) iff C is also learnable under any arbitrary simple distribution P(x), provided the samples are taken according to m(x)

53
Problem: a finite recursive grammar with infinitely many sentences
  • How do we know we have enough examples?
  • The notion of a characteristic sample. Informally: a sample S of a language G is characteristic if we can reconstruct G from S

54
How can we draw a characteristic sample under m?
  • Solution: the notion of shallowness. Informally: a language is shallow if we only need short examples to learn it
  • A language S = L(G) is shallow if there exists a characteristic sample C_G for S such that ∀ s ∈ C_G : |s| ≤ c · log K(G)

55
Simple and Shallow for dummies
  • Solution: the notion of shallowness. Informally: a language is shallow if we only need short examples to learn it
  • Shallow structures are constructed from an exponential number of small building blocks
  • Simple distributions are typical for those sets that are generated by a computational process
  • The universal distribution m is a non-computable distribution that dominates all simple distributions multiplicatively
  • Objects with low Kolmogorov complexity have high probability
  • Li & Vitányi: a concept class C is learnable under m(x) iff C is also learnable under any arbitrary simple distribution P(x), provided the samples are taken according to m(x)

56
Characteristic sample
Let Σ be an alphabet and Σ* the set of all strings over Σ. L(G) = S ⊆ Σ* is the language generated by a grammar G. C_G ⊆ S is a characteristic sample for G.
[Diagram: nested sets C_G ⊆ S ⊆ Σ*]
57
Shallowness
  • Shallowness seems to be an independent category
  • There are finite languages that are not shallow, for example {01, 11, 101, 100, 001, 110, 1110, 0001, 1011, 0000000000011, 00101011000010010010100101010}
[Diagram: the Chomsky hierarchy (type 0, context-sensitive, context-free, regular) with "shallow" as a category cutting across it]
58
Shallowness is very restrictive
[Diagram: a context-free grammar G as a list of rules (head, body); there are fewer than |G| rules and each rule body is shorter than log |G|]
In terms of natural growth: if one expands the longest rule by 1 bit, one must double the number of rules.
59
The learning algorithm
  • One can prove that, using clustering techniques, shallow CFGs can be learned efficiently from positive examples drawn under m.
  • General idea: if αβγ ∈ L then β ∈ α\L/γ; αβγ is the sentence, β an expression, α\_/γ its context (a small sketch follows below)
[Diagram: a sentence split into a left context α, an expression β and a right context γ]
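A minimal Python sketch of this general idea (the names and representation are my own; this is not the EMILE implementation): every split of a sentence αβγ yields an expression β and a context (α, γ).

    def splits(sentence):
        """All (context, expression) pairs obtained by splitting a tokenised sentence."""
        words = sentence.split()
        pairs = []
        for i in range(len(words)):
            for j in range(i + 1, len(words) + 1):
                expression = tuple(words[i:j])
                context = (tuple(words[:i]), tuple(words[j:]))   # (alpha, gamma)
                pairs.append((context, expression))
        return pairs

    for s in ("John loves Mary", "Mary walks"):
        for (left, right), expression in splits(s):
            print(" ".join(left), "[", " ".join(expression), "]", " ".join(right))
    # clustering expressions that share many contexts (and vice versa) yields the types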
60
Grammar formalisms: context-free
  • Context-free grammar:
    Sentence → Name Verb
    Sentence → Name T_Verb Name
    Name → Mary | John
    Verb → Walks
    T_Verb → Loves
  • Sentences: John loves Mary; Mary walks

61
Grammar formalisms: categorial grammars
  • Categorial grammar (lexicalistic):
    loves → Name\Sentence/Name
    walks, runs → Name\Sentence
    Mary, John → Name
  • Parsing as deduction: A, A\B ⊢ B and B/A, A ⊢ B
[Derivation: John : Name, loves : (Name\Sentence)/Name, Mary : Name; loves Mary : Name\Sentence; John loves Mary : Sentence]
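A small sketch of parsing as deduction over this lexicon (the Python representation is an assumption, not from the slides): forward application reduces C/A followed by A to C, backward application reduces A followed by A\C to C.

    NAME, S = "Name", "Sentence"

    # lexicon from the slide: loves : (Name\Sentence)/Name, walks/runs : Name\Sentence
    LEXICON = {
        "John":  NAME,
        "Mary":  NAME,
        "walks": ("\\", NAME, S),
        "runs":  ("\\", NAME, S),
        "loves": ("/", ("\\", NAME, S), NAME),
    }

    def reduce_once(types):
        """Apply one forward or backward application step, or return None."""
        for i in range(len(types) - 1):
            left, right = types[i], types[i + 1]
            if isinstance(left, tuple) and left[0] == "/" and left[2] == right:
                return types[:i] + [left[1]] + types[i + 2:]      # C/A , A  =>  C
            if isinstance(right, tuple) and right[0] == "\\" and right[1] == left:
                return types[:i] + [right[2]] + types[i + 2:]     # A , A\C  =>  C
        return None

    def derives_sentence(words):
        types = [LEXICON[w] for w in words]
        while len(types) > 1:
            types = reduce_once(types)
            if types is None:          # no rule applies: derivation stuck
                return False
        return types == [S]

    print(derives_sentence("John loves Mary".split()))   # True
    print(derives_sentence("Mary walks".split()))        # True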
62
Categorial grammar: propositional calculus without structural rules
  • Interchange: x, A, y, B, z ⊢ C gives x, B, y, A, z ⊢ C
  • Contraction: x, A, A, y ⊢ C gives x, A, y ⊢ C
  • Thinning: x, y ⊢ C gives x, A, y ⊢ C
  • Logic: A, (A → B) ⊢ B and (A → B), A ⊢ B
  • Grammar: A, (A \ B) ⊢ B and (A / B), A ⊢ B

63
Categorial grammar formalism: algebraic specification
  • M is a multiplicative system
  • A • B = { x · y ∈ M | x ∈ A, y ∈ B }
  • C / B = { x ∈ M | ∀ y ∈ B : x · y ∈ C }
  • A \ C = { y ∈ M | ∀ x ∈ A : x · y ∈ C }

64
Categorial grammar formalism: algebraic specification as database operations
  • Name = {John, Mary}
  • Verb = {walks, runs}
  • S = Name • Verb = {John, Mary} • {walks, runs} = {John walks, John runs, Mary walks, Mary runs}

65
Categorial grammar formalism: algebraic specification as database operations
  • Name \ S = {John, Mary} \ {John walks, John runs, Mary walks, Mary runs} = {walks, runs}
  • S / Verb = {John walks, John runs, Mary walks, Mary runs} / {walks, runs} = {John, Mary}
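These database-style operations are easy to spell out over finite sets; the following Python sketch (helper names are assumptions, not the slides' notation) reproduces the example, with string concatenation as the product.

    def product(A, B):
        """A . B = { x y | x in A, y in B }"""
        return {x + " " + y for x in A for y in B}

    def left_residual(A, C):
        """A \\ C = { y | for all x in A: x y in C } (candidates from one-word splits)."""
        candidates = {s.split(" ", 1)[1] for s in C if " " in s}
        return {y for y in candidates if all(x + " " + y in C for x in A)}

    def right_residual(C, B):
        """C / B = { x | for all y in B: x y in C } (candidates from one-word splits)."""
        candidates = {s.rsplit(" ", 1)[0] for s in C if " " in s}
        return {x for x in candidates if all(x + " " + y in C for y in B)}

    Name = {"John", "Mary"}
    Verb = {"walks", "runs"}
    S = product(Name, Verb)

    print(sorted(S))                        # John runs, John walks, Mary runs, Mary walks
    print(sorted(left_residual(Name, S)))   # ['runs', 'walks']  i.e. Name \ S = Verb
    print(sorted(right_residual(S, Verb)))  # ['John', 'Mary']   i.e. S / Verb = Name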

66
EMILE 3.0 stages Take Sample
John loves Mary Mary walks
67
EMILE 3.0 stages First order Explosion
John loves Mary Mary walks
S/loves Mary → John
S/Mary → John loves
S → John loves Mary
John\S/Mary → loves
John\S → loves Mary
John loves\S → Mary
S/walks → Mary
S → Mary walks
Mary\S → walks
68
EMILE 3.0 stages First order Explosion
John loves Mary Mary walks
69
EMILE 3.0 stages Complete First order Explosion
John loves Mary Mary walks
70
EMILE 3.0 stages Clustering
John loves Mary Mary walks
71
EMILE 3.0 stages Clustering
John loves Mary Mary walks
72
EMILE 3.0 stages: clusters → non-terminal names
John loves Mary Mary walks
A
B
C
D
E
73
EMILE 3.0 stages: protorules
S/loves Mary → A
S/Mary → B
S → C
John\S/Mary → D
John\S → E
John loves\S → A
S/walks → A
Mary\S → E
A → John
B → John loves
C → John loves Mary
D → loves
E → loves Mary
A → Mary
C → Mary walks
E → walks
74
Emile 3.0 stages: generalize into context-free rules
John\S/Mary → D
_____________________________
S → John D Mary
A → John (characteristic expression)
A → Mary (characteristic expression)
_____________________________
S → A D A

Grammar:
S → A D A
S → B A
S → C
S → A E
A → John | Mary
B → A D
C → A D A | A E
D → loves
E → A D | walks
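A tiny sketch of this generalisation step (assumed representation; the real EMILE also handles multi-word characteristic expressions): once John and Mary are known to be characteristic expressions of type A, replacing them by A in rule bodies turns the protorules into context-free rules.

    # characteristic expression -> the type it is characteristic for
    CHARACTERISTIC = {"John": "A", "Mary": "A"}

    def generalise(body):
        """Replace characteristic (single-token) expressions in a rule body by their type."""
        return [CHARACTERISTIC.get(token, token) for token in body.split()]

    print(generalise("John D Mary"))       # ['A', 'D', 'A']   i.e.  S -> A D A
    print(generalise("John loves Mary"))   # ['A', 'loves', 'A']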
75
Theorem (Adriaans 92)
  • If a language L has a context-free grammar G, is shallow, is sampled according to the universal distribution, and there is a member-check function available, then it can be learned efficiently from text
  • Assumptions: natural language is shallow; the distribution of sentences in a text is simple

76
EMILE 3.0 (1992) Problems, not very practical
  • Take Sample: positive examples
  • First-order explosion: deduction
  • Complete first-order explosion: positive and negative examples
  • Clustering: deduction
  • Non-terminal names: deduction
  • Proto-rules: induction
  • Context-free rules: induction

77
EMILE 3.0 (1992) Problems
  • Take Sample: positive examples
  • First-order explosion: deduction
  • Complete first-order explosion: positive and negative examples
  • Clustering: deduction
  • Non-terminal names: deduction
  • Proto-rules: induction
  • Context-free rules: induction
  • Supervised, no text: speakers do not give negative examples

78
EMILE 3.0 (1992) Problems
  • Take Sample: positive examples
  • First-order explosion: deduction
  • Complete first-order explosion: positive and negative examples
  • Clustering: deduction
  • Non-terminal names: deduction
  • Proto-rules: induction
  • Context-free rules: induction
  • Supervised, no text: speakers do not give negative examples
  • Polynomial, but very complex due to overlapping clusters

79
EMILE 3.0 (1992) Only theoretical value
  • Take Sample: positive examples
  • First-order explosion: deduction
  • Complete first-order explosion: positive and negative examples
  • Clustering: deduction
  • Non-terminal names: deduction
  • Proto-rules: induction
  • Context-free rules: induction
  • Supervised, no text: speakers do not give negative examples
  • Polynomial, but very complex due to overlapping clusters
  • Batch oriented, not incremental

80
New theory necessary: bootstrapping phenomena!
  • Lewis Carroll's famous poem 'Jabberwocky' starts with:
  • 'Twas brillig, and the slithy toves
  • Did gyre and gimble in the wabe:
  • All mimsy were the borogoves,
  • And the mome raths outgrabe.

81
New theory necessary: bootstrapping phenomena!
[Diagram: word-frequency distribution with a structured high-frequency core and a heavy low-frequency tail]
82
New theory necessary: bootstrapping phenomena!
  • An expression of a type T is characteristic for T if it only appears with contexts of type T
  • Similarly, a context of a type T is characteristic for T if it only appears with expressions of type T
  • Let G be a grammar (context-free or otherwise) of a language L. G has context separability if each type of G has a characteristic context, and expression separability if each type of G has a characteristic expression.
  • Natural languages seem to be context- and expression-separable.
  • This is nothing but stating that languages can define their own concepts internally ("... is a noun", "... is a verb").

83
Natural languages are shallow
  • A class of languages C is shallow if for each language L it is possible to find a context- and expression-separable grammar G, and a set of sentences S inducing characteristic contexts and expressions for all the types of G, such that the size of S and the length of the sentences of S are logarithmic in the descriptive length of L (relative to C).
  • This seems to hold for natural languages → large dictionaries

84
EMILE 4.1 (2000) Vervoort
  • Unsupervised
  • Two-dimensional clustering: random search for maximized blocks in the matrix
  • Incremental thresholds for the filling degree of blocks
  • Simple (but sloppy) rule induction using characteristic expressions

85
Emile 4.1: clustering sparse matrices of contexts and expressions
[Diagram: a sparse boolean matrix with contexts as rows and expressions as columns; a characteristic expression and a characteristic context single out a filled block]
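The following Python sketch (a toy version, not Vervoort's implementation) shows the core of the two-dimensional clustering: starting from one filled cell, greedily add expressions and contexts whose filling degree in the current block stays above a support threshold.

    def grow_block(matrix, seed_context, seed_expression, support=1.0):
        """matrix: set of (context, expression) pairs attested in the sample."""
        contexts, expressions = {seed_context}, {seed_expression}
        all_contexts = {c for c, _ in matrix}
        all_expressions = {e for _, e in matrix}
        changed = True
        while changed:
            changed = False
            for e in all_expressions - expressions:
                filled = sum((c, e) in matrix for c in contexts) / len(contexts)
                if filled >= support:
                    expressions.add(e); changed = True
            for c in all_contexts - contexts:
                filled = sum((c, e) in matrix for e in expressions) / len(expressions)
                if filled >= support:
                    contexts.add(c); changed = True
        return contexts, expressions

    # toy matrix over the running example (contexts abbreviated as strings)
    matrix = {("_ walks", "John"), ("_ walks", "Mary"),
              ("_ loves Mary", "John"), ("_ loves Mary", "Mary"),
              ("John loves _", "John"), ("John loves _", "Mary")}
    print(grow_block(matrix, "_ walks", "John"))
    # -> all three contexts paired with the expression cluster {John, Mary}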
86
EMILE is guaranteed to find types, given the right settings
  • Let T be a type with a characteristic context c_ch and a characteristic expression e_ch. Suppose that the maximum lengths for primary contexts and expressions are set to at least |c_ch| and |e_ch|, and suppose that the total_support, expression_support and context_support settings are all set to 100%. Let T<maxC and T<maxE be the sets of contexts and expressions of T that are small enough to be used as primary contexts and expressions. If EMILE is given a sample containing all combinations of contexts from T<maxC and expressions from T<maxE, then EMILE will find type T. (Vervoort 2000)

87
Original grammar
  • S → NP V_i ADV | NP_a VP_a | NP_a V_s that S
  • NP → NP_a | NP_p
  • VP_a → V_t NP | V_t NP P NP_p
  • NP_a → John | Mary | the man | the child
  • NP_p → the car | the city | the house | the shop
  • P → with | near | in | from
  • V_i → appears | is | seems | looks
  • V_s → thinks | hopes | tells | says
  • V_t → knows | likes | misses | sees
  • ADV → large | small | ugly | beautiful

88
Learned grammar after 100,000 examples
  • 0 → 17 6
  • 0 → 17 22 17 6
  • 0 → 17 22 17 22 17 22 17 6
  • 6 → misses 17 | likes 17 | knows 17 | sees 17
  • 6 → 22 17 6
  • 6 → appears 34 | looks 34 | is 34 | seems 34
  • 6 → 6 near 17 | 6 from 17 | 6 in 17 | 6 with 17
  • 17 → the child | Mary | the city | the man | John | the car | the house | the shop
  • 22 → tells that | thinks that | hopes that | says that
  • 22 → 22 17 22
  • 34 → small | beautiful | large | ugly

89
Hypothesis: natural languages are shallow. We can learn them from text.
[Plot: grammar size (about 5·10^3) against sample size (about 5·10^7 sentences)]
90
Bible books
  • King James Version
  • 31,102 verses of 82,935 lines
  • 4.8 MB of English text
  • 001001 In the beginning God created the heaven and the earth.
  • 66 experiments with increasing sample size
  • Initially: Book Genesis, Book Exodus, ...
  • Full run: 40 minutes, 500 MB on an Ultra-2 SPARC

91
Bible books
92
GI on the bible
  • 0 → Thou shall not 582
  • 0 → Neither shalt thou 582
  • 582 → eat it
  • 582 → kill .
  • 582 → commit adultery .
  • 582 → steal .
  • 582 → bear false witness against thy neighbour .
  • 582 → abhor an Edomite

93
Knowledge base in Bible
  • Dictionary Type 76
  • Esau, Isaac, Abraham, Rachel, Leah, Levi, Judah,
    Naphtali, Asher, Benjamin, Eliphaz, Reuel, Anah,
    Shobal, Ezer, Dishan, Pharez, Manasseh, Gershon,
    Kohath, Merari, Aaron, Amram, Mushi, Shimei,
    Mahli,Joel, Shemaiah, Shem, Ham, Salma, Laadan,
    Zophah, Elpaal, Jehieli
  • Dictionary Type 362
  • plague, leprosy
  • Dictionary Type 414
  • Simeon, Judah, Dan, Naphtali, Gad, Asher,
    Issachar, Zebulun, Benjamin, Gershom
  • Dictionary Type 812
  • two, three, four
  • Dictionary Type 1056
  • priests, Levites, porters, singers, Nethinims
  • Dictionary Type 978
  • afraid, glad, smitten, subdued
  • Dictionary Type 2465
  • holy, rich, weak, prudent
  • Dictionary Type 3086
  • Egypt, Moab, Dumah, Tyre, Damascus
  • Dictionary Type 4082
  • heaven, Jerusalem

94
A simple partial grammar
  • 0 → 12 ?
  • 12 → Waar 23
  • 12 → Wie 46
  • 12 → Hoe 13
  • 12 → Wat 31
  • 23 → wonen jullie
  • 23 → woon jij
  • 23 → woont u
  • 23 → wasje
  • 23 → ben je geweest
  • 46 → kan 49
  • 46 → weet hoe laat de treinen uit Rotterdam vertrekken
  • 46 → heeft gewonnen
  • 49 → dat betalen
  • 49 → mij dat uitleggen
  • 49 → dat verklaren
  • 49 → voor dit verschijnsel een verklaring geven
  • 13 → heet jij
  • 13 → heet u
  • 13 → laat begint de les
  • 13 → lang doet de trein over de afstand Rotterdam-Den Haag
  • 13 → komt dat
  • 13 → vieren mensen eigenlijk feest
  • 31 → is 33
  • 31 → heb je gisteren gedaan
  • 31 → vind jij daarvan
  • 33 → jouw naam
  • 33 → uw naam
  • 33 → je leeftijd
  • 33 → jouw mening hierover
  • 33 → jouw opvatting over dit onderwerp

95
Problems
  • Results at first sight disappointing
  • Conversion to a meaningful syntactic type rarely observed
  • Types seem to be semantic rather than syntactic
  • Why?
  • Hypothesis: the distribution in real-life text is semantic, not syntactic
  • A semantic grammar is an intermediate compression level between the term algebra and the syntactic algebra

96
Characteristic sample: semantic learning
Let Σ be an alphabet and Σ* the set of all strings over Σ. L(G) = S ⊆ Σ* is the language generated by a grammar G. C_G ⊆ S is a characteristic sample for G.
[Diagram: nested sets C_G ⊆ True ⊆ S ⊆ Σ*]
97
Syntactic learning: substitution salva beneformatione
[Table: all twelve sentences "X is_a Y" for X ∈ {Tweety, Fido, Ed} and Y ∈ {bird, dog, horse, mammal} are well-formed; types: Sentence, Noun (bird, dog, horse, mammal), Name (Ed, Fido, Tweety)]
98
Semantic learning: substitution salva veritate
[Table: the same twelve sentences "X is_a Y", now evaluated against truth (True); types: Sentence, Noun (bird, dog, horse, mammal), Name (Ed, Fido, Tweety)]
99
Semantic learning: substitution salva veritate
[Table: the same twelve sentences; compositional semantics as an intermediate compression level: truth values {True, False}, names {Ed, Fido, Tweety}, noun denotations {Mammal, Horse, Dog, Bird}; types: Sentence, Noun, Name]
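A toy Python sketch of the contrast (the data and encoding are my own assumptions): distributional clustering over all well-formed sentences lumps every noun into one syntactic class, while clustering over the true sentences only, i.e. under a semantic distribution, yields the finer, meaning-based types.

    names = ["Tweety", "Fido", "Ed"]
    nouns = ["bird", "dog", "horse", "mammal"]

    # the true "X is_a Y" facts in the intended model
    true_facts = {("Tweety", "bird"),
                  ("Fido", "dog"), ("Fido", "mammal"),
                  ("Ed", "horse"), ("Ed", "mammal")}

    all_wellformed = {(x, y) for x in names for y in nouns}   # salva beneformatione

    def contexts(noun, corpus):
        """Distributional profile of a noun: the names it is predicated of in the corpus."""
        return frozenset(name for name, n in corpus if n == noun)

    for label, corpus in (("syntactic (all well-formed)", all_wellformed),
                          ("semantic (only true)", true_facts)):
        clusters = {}
        for noun in nouns:
            clusters.setdefault(contexts(noun, corpus), []).append(noun)
        print(label, "->", sorted(sorted(v) for v in clusters.values()))
    # syntactic -> one class [bird, dog, horse, mammal];
    # semantic  -> finer classes determined by what the words are true of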
100
Not a bug, but a feature: semantic learning
  • Dictionary Type 362
  • plague, leprosy
  • Dictionary Type 1056
  • priests, Levites, porters, singers, Nethinims
  • Dictionary Type 978
  • afraid, glad, smitten, subdued
  • Dictionary Type 2465
  • holy, rich, weak, prudent

101
EMILE 4.1 Molecular Biology of the Cell, 3rd
edition
  • 2.5 megabytes of data (text only)
  • 60% of the text used
  • Number of different sentences read: 5,461
  • Number of different words: 13,396
  • Number of different contexts: 896,343
  • Number of different expressions: 782,123
  • Number of different grammatical types: 67
  • Number of dictionary types: 17
  • Potential problems: (1) language is learned from redundancy; (2) high linguistic complexity
  • Potential solutions: (1) more data; (2) provide seed grammars

102
EMILE 4.1 Molecular Biology of the Cell, 3rd
edition
  • Extracts cell information
  • 0 → Eucaryotic Cells 14
  • 14 → Contain Several Distinctive Organelles
  • 14 → Depend on Mitochondria for Their Oxidative Metabolism
  • 14 → Contain a Rich Array of Internal Membranes
  • 14 → Have a Cytoskeleton
  • Extracts chemical information
  • 0 → 1 .
  • 1 → B-OH ATP - 19 2
  • 19 → > B-O-P ADP
  • 19 → > B-O-P-P AMP
  • 0 → The 2
  • 2 → 35 Is Asymmetrical
  • 35 → DNA Replication Fork
  • 35 → Lipid Bilayer

103
EMILE 4.1 Molecular Biology of the Cell, 3rd
edition
  • Extracts biological information
  • 0 → 1 .
  • 1 → This 3
  • 3 → phenomenon is 37
  • 37 → known as gene conversion
  • 37 → called genomic imprinting
  • 0 → 64 in the Golgi Apparatus
  • 64 → Oligosaccharide Chains Are Processed
  • 64 → Proteoglycans Are Assembled
  • 0 → Mitochondria and Chloroplasts Contain 66
  • 66 → Complete Genetic Systems
  • 66 → Tissue-specific Proteins

104
The underlying challenge: conclusion
  • Research Question: Can we learn natural language efficiently from text?
  • How much text is needed? How much processing is needed?
  • Yes, under reasonable assumptions, but the type of text we need to do that is not available: we find semantic distributions, not syntactically distributed corpora. Child language acquisition may be a good domain to look at. Bootstrapping plays an important role.
  • Natural language is shallow and has context and expression separability. That is what makes it learnable.

105
Adaptive Information Disclosure: conclusion
  • The engineering challenge: build a routine that can enrich an under-specified model A on the basis of a collection of ontologies B1, ..., Bn and a collection of corpora C1, ..., Ck and present the results in mode D
  • Research Question 1: Can we learn an ontology for a certain domain from scratch on the basis of a collection of documents describing that domain?
  • Probably: expression separability and context separability should do the job.
  • Research Question 2: Can we use grammar induction on a collection of documents describing a domain to formulate suggestions for the enrichment and adaptation of an ontology for this domain?
  • Yes!

106
Cochlear implants: Research Questions
  • Research Question 1: Can we make a formal model of language development of young children that allows us to understand:
  • Why the process is efficient?
  • Why the process is discontinuous?
  • Promising!
  • Research Question 2: Can this formal model be used to develop diagnostic tests for (congenitally deaf) children with language problems?
  • Maybe.

107
Language acquisition phases: hybrid learning
  • Phase 1 (0-9 months): linking acoustics and events
  • Babbling. Model: DFA; learning strategy: evidence-based state merging
  • Phase 2 (9-24 months): children categorize words into word classes and show evidence of early sensitivity to syntax
  • Word classes, markers. Model: some complex interaction between deixis and babbling; learning strategy: semantic learning
  • Phase 3 (2-3.5 years): language meaning and syntax structure is acquired
  • Recursive rules. Model: context-free grammar; learning strategy: Seginer, EMILE

108
Study of Benign Distributions
109
Further work
  • Better understanding of semantic learning
  • Incremental learning with background knowledge
  • Learning partial ontologies
  • Integrating ontologies in the grammar induction process
  • Pieter Adriaans, pietera@science.uva.nl
  • WEB: http://turing.wins.uva.nl/pietera/ALS/