Theoretical Corpus Analysis: Four Examples

1
Theoretical Corpus Analysis: Four Examples
  • Brian MacWhinney
  • CMU

2
Example 1: Structural Dependency
  • Noisy input
  • Incomplete input
  • Totally absent input
  • Feedback is unreliable
  • Even if it is reliable, children ignore it

3
Two forms of the problem
  • Strong Form
  • inviolable constraints, error-free learning
  • Weak Form
  • violable constraints, recovery from error

4
The weak form deals with recovery. The strong form avoids this.
5
Structural Dependency
  • The man who is first in line is coming.
  • *Is the man who __ first in line is coming?
    (fronting the embedded is; ungrammatical)
  • Is the man who is first in line ___ coming?
    (fronting the main-clause is)
  • This constraint is non-parameterized and inviolable.
  • Learning is said to be error-free.
  • No recovery is needed.

6
Structural Dependency
The boy who is smoking is crazy.
7
No need for positive evidence
  • Chomsky (1980): "A person might go through much or
    all of his life without ever having been exposed
    to relevant evidence, but he will nevertheless
    unerringly employ the structure-dependent
    generalization, on the first relevant occasion."
  • Hornstein and Lightfoot (1986): "People attain
    knowledge of the structure of their language for
    which no evidence is available in the data to
    which they are exposed as children."
  • Crain (1991): "...every child comes to know facts
    about the language for which there is no decisive
    evidence from the environment. In some cases,
    there appears to be no evidence at all."

8
Emergentist solution
  • Item-based learning for linking of aux/tense to
    the main verb
  • Learning based on evidence from main clauses
    (Lightfoot, degree-0 learnability)
  • Learning from positive instances only
  • No evidence for the contrary movement pattern ever
    occurs, so competition favors the basic item-based
    pattern

9
Two searches
  • Pullum and Scholz (2002): nearly 1% of the Wall
    Street Journal corpus in the Penn Treebank
    consists of relevant positive exemplars
  • Lewis and Elman (2000): searched the CHILDES
    database and found two relevant examples
    (CHILDES has about 3 million utterances)

10
A new search
  • Complete morphological tagging of all English
    corpora from normally developing children
    (mor +1 *.cha)
  • Training of the POSTTRAIN database file on the
    Eve corpus to reach a 95% accuracy level
  • Running POST on the complete corpus
    (post +1 +s0 +tengtags.cut *.cha); a scripted
    version of these commands is sketched below
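
A minimal sketch of scripting the two commands above in Python,
assuming the CLAN command-line tools are installed and on PATH; the
option strings are reproduced from the slide as reconstructed and may
not match current CLAN releases:

    import subprocess

    # Run the slide's CLAN commands over a corpus directory ("corpus"
    # is an assumed path). shell=True lets the shell expand *.cha.
    for cmd in ("mor +1 *.cha", "post +1 +s0 +tengtags.cut *.cha"):
        subprocess.run(cmd, shell=True, check=True, cwd="corpus")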

11
KWAL for the relevant patterns
  • aux/cop ... rel ... aux/cop
  • is/can/would ... who/that/what ... is/can/would
  • Test file:
  • *CHI: is the boy who is next in line tall?
  • %mor: v|be&3S det|the n|boy pro:wh|who v|be&3S
    adj|next prep|in n|line adj|tall ?
  • *CHI: does the boy who does the dishes run?
  • %mor: v:aux|do&3s det|the n|boy pro:wh|who
    v:aux|do&3s det|the n|dish-PL v|run ?
  • *CHI: can the boy that can run walk?
  • %mor: v:aux|can det|the n|boy pro:dem|that
    v:aux|can v|run n|walk ?
  • RESULT: NONE (a rough scripted equivalent of this
    search follows below)
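
KWAL is CLAN's own search tool; as a rough Python stand-in, one can
scan the %mor tiers directly. The tag labels below are taken from the
slide's examples (v|be&3S, v:aux|do&3s, pro:wh|who, pro:dem|that) and
are not a complete aux/cop inventory; "corpus" is an assumed path:

    import re
    from pathlib import Path

    # aux/cop + rel + aux/cop over the %mor tier, mirroring KWAL.
    AUX = r"(?:v:aux\|\S+|v\|be\S*)"   # auxiliary or copula be
    REL = r"(?:pro:wh|pro:dem)\|\S+"   # who / what / that

    PATTERN = re.compile(rf"^{AUX}\s.*?{REL}\s+{AUX}")

    def find_matches(corpus_dir):
        """Yield (file name, %mor tier) pairs that fit the pattern."""
        for path in Path(corpus_dir).glob("*.cha"):
            for line in path.read_text(errors="ignore").splitlines():
                if line.startswith("%mor:"):
                    tier = line.split(":", 1)[1].strip()
                    if PATTERN.search(tier):
                        yield path.name, tier

    for name, tier in find_matches("corpus"):
        print(name, tier)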

12
A broader search
  • Any initial aux followed by any embedded clause:
    aux/cop ... rel
  • Checked for that mistagged as pro:dem

13
7 close matches
  • Brown, "adam20.cha", line 3282
  • *MOT: are you the kind of nut that a squirrel
    likes?
  • Hall White working-class, "stl.cha", line 15497
  • *FAT: does he know who you are?
  • Hall White professional, "gat.cha", line 2889
  • *MOT: isn't that the little boy who a few months
    ago saw your lunch box and liked it?
  • Hall Black working-class, "roj.cha", line 8387
  • *EXP: is that where you're from?
  • Hall Black working-class, "chj.cha", line 848
  • *MOT: is that the person you were talking about?
  • Bates Snack 28, "ivy.cha", line 175
  • *MOT: can you tell them how old you are?
  • Bates Snack 28, "rick.cha", line 118
  • *MOT: can you show them what you are eating?

14
Two closer matches (spotted by Lewis and Elman)
  • Brown, "adam30.cha", line 2130
  • *MOT: is the ball you were speaking of in the box
    with the bowling pin?
  • Korman, "st11.cha", line 386
  • *MOT: where's this little boy who's full of
    smiles?

15
What did we learn?
  • Parents don't provide this input, but children
    don't say anything close to it either
  • Probably this construction is learned during
    adolescence
  • However, there is plenty of positive input
    demonstrating that the moved auxiliary or tense
    marker comes from the main clause

16
But isn't that Chomsky's point?
  • If the child must process structure to get it
    right, then structure is innate
  • But what level of structure is innate?
  • Minimum needed:
  • main verb
  • aux that codes the tense of the verb
  • item-based relation
  • Pairs and nested pairs rather than abstract trees

17
Item-based patterns
  • MacWhinney (1976, 1978, 1982, 1987)
  • is (pres, 3s, inter, init) --- X (action, state)
  • can (pres, inter, init) --- X (action, state)
  • have (pres, -3s, inter, init) --- X (action,
    state)
  • -----
  • featural pattern (a toy encoding follows below):
  • (pres, inter, init) --- V
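
A toy encoding of this schematization step, using the feature names on
the slide; the featural pattern emerges as the intersection of the
item-based patterns:

    # Each lexical item links its feature set to an X slot.
    ITEM_PATTERNS = {
        "is":   {"pres", "3s", "inter", "init"},
        "can":  {"pres", "inter", "init"},
        "have": {"pres", "-3s", "inter", "init"},
    }

    # Schematization: features shared by all items form the pattern.
    featural = set.intersection(*ITEM_PATTERNS.values())
    print(featural)   # {'pres', 'inter', 'init'} --- generalizes to V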

18
Verb Island Constructions at About Two Years of Age
(pictures from Tomasello)
19
Superimposition and Schematization
The dog eats the bone.
The cat the fish.
A bird a ladybug.
This one that one.
Two pin two dogs.
20
He's push -ing it.
He's kill
He's pull
He's show
He's draw
He's deed
(verbs slotting into the same "He's __ -ing it" frame)
21
Kemp (2002): Easy to Hard
  • The cow is jumping → The pig is jumping
  • The pig is jumping → Zibby is jumping
  • Zibby is jumping → The pig is jumping
  • The cow is niffing → The pig is niffing
  • The cow is niffing → Zibby is niffing
  • Zibby is niffing → The pig is niffing

22
Three mechanisms
  • ROTE: repeated use makes other uses sound
    unconventional
  • Child hears "X hit Y" many times, but never "Y
    hit"
  • As a result, "X hit Y" is strengthened and
    entrenched
  • ANALOGY: semantic subclasses of verbs
  • Child learns a verb for causing direct motion
    (remove)
  • Child assumes it behaves like other verbs of the
    same type, i.e., as fixed transitive (e.g., bring,
    take, etc.)
  • COMPETITION: alternative forms block the
    extension of a verb to a construction (a toy
    sketch follows below)
  • Child watches as an adult tickles a sibling.
  • Sibling says "I can't stop laughing."
  • Child now expects the sibling to say "Don't laugh
    me."
  • Sibling says "Don't make me laugh."
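
A toy illustration of the COMPETITION mechanism, with invented
exposure counts: the unattested extension ("don't laugh me") is
blocked once the attested competitor is sufficiently entrenched:

    from collections import Counter

    # Hypothetical counts; all numbers are invented for illustration.
    heard = Counter({"don't make me laugh": 12, "don't laugh me": 0})

    def blocked(candidate, competitor, margin=5):
        """A candidate is blocked when a competitor with the same
        meaning is more entrenched by at least `margin` exposures."""
        return heard[competitor] - heard[candidate] >= margin

    print(blocked("don't laugh me", "don't make me laugh"))   # True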

23
Aside: Wh-raising from complex NPs is not so
error-free
  • What am I cooking on a hot __?
  • Who did pictures of __ surprise you?
  • What do you think a picture of __ should be on my
    cake?
  • What is this a funny __?

24
Example 2: Attachment Competition
  • The cop saw the thief with a revolver.
  • The cop saw the thief with a telescope.
  • The daughter of the colonel who bought the watch
    entered the room.
  • Competition Model claim: the strength of a
    competitor is a function of cue validity
  • If listeners prefer an attachment that is less
    frequent, the model is falsified (a worked
    arithmetic sketch follows below)
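
A minimal sketch of the Competition Model claim as arithmetic, with
hypothetical attachment counts standing in for the Brown Corpus
estimates on the next slide:

    # Invented counts of attachments observed at each site.
    counts = {"N1": 120, "N2": 45, "N3": 20}

    # Cue validity estimated as relative frequency; strongest wins.
    total = sum(counts.values())
    preference = {site: n / total for site, n in counts.items()}

    winner = max(preference, key=preference.get)
    # The model is falsified if listeners prefer a rarer site.
    print(winner, preference)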

25
[Figure: usage frequencies (low, high, mid) vs. attachment preferences
for the sites NP1, NP2, and NP4; Gibson & Schütze, 1999]
26
Statistics of training data (estimated from Brown
Corpus)
27
[Figure: preference (0-100%) by preferred attachment site: N1 with
high, N2 with mid, and N3 with low usage frequency]
28
Example 3: How degenerate is the input?
  • Tagged the Eve corpus with MOR and POST
  • Applied LC-Flex (Rosé and Lavie, 2001) and got
    47% accuracy
  • Adding a statistical parser raised this to 60%
  • Robustness methods raised it to 78.5%
  • This compares to 85% for the WSJ

29
Example 4: Growing Lexicon Model
  • Current network models assume a static lexicon,
    but children's lexicons are growing
  • the catastrophic interference problem
  • Some current network models make neurologically
    improbable assumptions
  • Children learn from both direct experience
    (Jusczyk, Aslin, Saffran) and cooccurrence
  • Current models have no good links to
    morphological analysis

30
Components
  • FLO to strip CHILDES codes
  • FREQ to extract the 300 most frequent words
  • WCD (word cooccurrence detector, as in HAL) to
    acquire bigram cooccurrences (sketched below)
  • Compression to a constant dimensionality
  • GLM (growing lexicon model): a self-organizing
    map (SOM) with node insertion
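
A minimal HAL-style WCD sketch, assuming FLO and FREQ have already
produced clean tokens and the 300-word vocabulary: each tracked word
accumulates counts of its immediate left and right neighbors:

    from collections import Counter, defaultdict

    def cooccurrences(tokens, vocab):
        """Bigram cooccurrence vectors keyed by neighbor direction."""
        vec = defaultdict(Counter)
        for left, right in zip(tokens, tokens[1:]):
            if left in vocab:
                vec[left][("R", right)] += 1   # right neighbor
            if right in vocab:
                vec[right][("L", left)] += 1   # left neighbor
        return vec

    tokens = "the dog eats the bone the cat eats the fish".split()
    print(cooccurrences(tokens, {"dog", "cat"}))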

31
DevLex - Farkas, Li, MacWhinney 2001/2002
32
Results
  • Confusions were within part of speech:
  • mummy and daddy
  • wouldn't and didn't
  • car and truck
  • We disambiguated these using additional features
    from WordNet (courtesy of Robert Harms); one way
    to pull such features is sketched below
  • Thus, the model requires both cooccurrence and
    perceptual feature information
  • Next: linking GLM to DisLex
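
The slide's WordNet features came via Robert Harms; one way to pull
comparable features today is NLTK's WordNet interface (an assumption,
not the tool used in the original work):

    from nltk.corpus import wordnet as wn   # needs nltk + wordnet data

    def hypernym_features(word):
        """Collect hypernym names along every path to the root."""
        feats = set()
        for synset in wn.synsets(word):
            for path in synset.hypernym_paths():
                feats.update(node.name() for node in path)
        return feats

    # car and truck cooccur alike but differ in WordNet features:
    print(hypernym_features("car") ^ hypernym_features("truck"))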

33
DevLex
34
Lexical and semantic maps
35
Growing Hierarchical Maps: Dittenbach et al.
(2002), GHSOM
36
Syntax emerging from items
  • jump, jumps, jumped, jumped, jumping
  • run, runs, ran, run, running
  • pull, pulls, -, -, pulling
  • want, wants, -, -, wanting
  • bet, bets, -, -, betting
  • The model is given pull + Past and the sound
    "pulled"
  • -ed should be learned as the past-tense marker
  • Its position should be learned from the example
  • The semantics of the head should also be learned
    (a toy version of this suffix discovery follows
    below)
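
A toy version of the suffix-discovery step: given stem + Past paired
with the heard sound, the residue shared across pairs is learned as
the past-tense marker, and its position (after the stem) falls out of
the alignment. The pairs come from the paradigm above:

    pairs = [("pull", "pulled"), ("jump", "jumped"), ("want", "wanted")]

    def past_marker(pairs):
        """Residue left after stripping the stem, shared by all pairs."""
        residues = {form[len(stem):]
                    for stem, form in pairs if form.startswith(stem)}
        return residues.pop() if len(residues) == 1 else residues

    print(past_marker(pairs))   # 'ed', positioned after the stem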

37
Summary
  • Four examples of theoretical corpus analysis:
  • Structural dependency
  • Attachment competition
  • Parsability of parental input
  • Growing lexicon model
  • New directions:
  • Complete parsing of the database
  • New taggers for Spanish, Italian, Japanese,
    French, German
  • Conversion of the database to XML