Distributional learning - PowerPoint PPT Presentation

1
Distributional learning
  • Holger Diessel
  • University of Jena
  • holger.diessel_at_uni-jena.de
  • http://www.holger-diessel.de/

2
Semantic bootstrapping

Pinker (1984)
  • Grammatical categories such as nouns and verbs
    are part of our genetic endowment.
  • There is only one reliable cue in the ambient
    language that children of all languages can use
    to identify grammatical categories: their
    meaning.
  • Structural cues (e.g. inflection) are not
    reliable because they are language-specific.

3
Semantic bootstrapping

Step 1: Children construct semantic word classes
based on the words they encounter in the
ambient language.
Step 2: The semantically defined word classes are
hooked up to the categories of UG.
Step 3: Once the connections are established,
children can use language-specific
structural properties to identify semantically
atypical category members (e.g. deverbal
nouns).
4
Cues for category acquisition

Cues for grammatical category acquisition
  • Semantic cues (e.g. Gentner 1982; Pinker 1984)
  • Pragmatic cues (e.g. Bruner 1975)
  • Phonological cues (e.g. Kelly 1992)
  • Distributional cues (e.g. Maratsos & Chalkley
    1980)

5
Maratsos & Chalkley 1980

Nouns: the __, x-s
Verbs: will __, x-ing, x-ed
6
Theoretical arguments
The vast number of possible relationships that
might be included in a distributional analysis is
likely to overwhelm any distributional learning
mechanism in a combinatorial explosion. (Pinker
1984)
  • Distributional learning mechanisms do not search
    blindly for all possible relationships between
    linguistic items, i.e. the search is focused on
    specific distributional cues (Redington et al.
    1998).

7
Theoretical arguments
The interesting properties of linguistic
categories are abstract and such abstract
properties cannot be detected in the input.
(Pinker 1984)
  • This assumption crucially relies on Pinker's
    particular view of grammar. If you take a
    construction grammar perspective, grammar (or
    syntax) is much more concrete (Redington et al.
    1998).

8
Theoretical arguments
Even if the child is able to determine certain
correlations between distributional regularities
and syntactic categories, this information is of
little use because there are so many different
cross-linguistic correlations that the child
wouldn't know which ones are relevant in his/her
language. (Pinker 1984)
  • Syntactic categories vary to some extent across
    languages (i.e. there are no fixed categories).
    Children recognize any distributional pattern
    regardless of the particular properties that
    categories in different languages may have
    (Redington et al. 1998).

9
Theoretical arguments
Spurious correlations will occur in the input
that will be misleading. For instance, if the
child hears "John eats meat", "John eats
slowly", and "The meat is good", he may erroneously
infer that "The slowly is good" is a possible English
sentence. (Pinker 1984)
  • Children do not learn categories from isolated
    examples (Redington et al. 1998).

10
Redington et al. 1998. Distributional
information: A powerful cue for acquiring
syntactic categories. Cognitive Science 22:
425-469.
11
Redington et al. 1998

Steps of analysis
  • Measuring the distribution of contexts within
    which each word occurs.
  • Comparing the distributions of contexts for pairs
    of words.
  • Grouping together words with similar
    distributions of contexts.

12
Redington et al. 1998
Corpus: speech of all adult speakers in the CHILDES
database (2.5 million words).
Bigram statistics
  • Target words: 1000 most frequent words in the corpus
  • Context words: 150 most frequent words in the corpus
  • Context size: 2 words preceding and 2 words
    following the target word
Example contexts:
  • x the __ of x
  • in the __ x x
  • will have __ the x
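A minimal sketch of how such context counts could be collected, assuming a tokenized corpus and treating each (position, context word) pair as one dimension of a word's vector. The toy corpus and the function names are hypothetical; the slides do not specify an implementation.

```python
from collections import Counter, defaultdict

def most_frequent(tokens, n):
    """Return the n most frequent word types in the token list."""
    return [word for word, _ in Counter(tokens).most_common(n)]

def context_vectors(tokens, targets, context_words, offsets=(-2, -1, 1, 2)):
    """For each target word, count which context word occurs at each offset."""
    targets, context_words = set(targets), set(context_words)
    counts = defaultdict(Counter)
    for i, word in enumerate(tokens):
        if word not in targets:
            continue
        for off in offsets:
            j = i + off
            if 0 <= j < len(tokens) and tokens[j] in context_words:
                counts[word][(off, tokens[j])] += 1
    return counts

# Hypothetical toy corpus; the study used 2.5 million words of adult speech.
tokens = "the dog sees the cat and the cat sees the dog".split()
targets = most_frequent(tokens, 1000)   # 1000 most frequent words as targets
contexts = most_frequent(tokens, 150)   # 150 most frequent words as contexts
vectors = context_vectors(tokens, targets, contexts)

# How often does "the" immediately precede "dog"?
print(vectors["dog"][(-1, "the")])
```

In the toy corpus "dog" and "cat" end up with the same count for a preceding "the", which is the kind of overlap the later comparison step relies on.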
13
Distributional learning
Distributional context: 2 words preceding and 2
words following the target word
  • x the __ of x
  • in the __ x x
  • I have __ x x
Bigram statistics
14
Distributional learning
                 Context 1
                 (the __ of)
Target word 1    210
Target word 2    376
Target word 3    0
Target word 4    1
Etc.
15
Distributional learning
                 Context 1      Context 2
                 (the __ of)    (at the __ is)
Target word 1    210            321
Target word 2    376            917
Target word 3    0              1
Target word 4    1              4
Etc.
16
Distributional learning
                 Context 1      Context 2         Context 3
                 (the __ of)    (at the __ is)    (has __ him)
Target word 1    210            321               2
Target word 2    376            917               1
Target word 3    0              1                 1078
Target word 4    1              4                 987
Etc.
17
Distributional learning
                 Context 1      Context 2         Context 3       Context 4
                 (the __ of)    (at the __ is)    (has __ him)    (He __ in)
Target word 1    210            321               2               0
Target word 2    376            917               1               5
Target word 3    0              1                 1078            1298
Target word 4    1              4                 987             1398
Etc.
18
Distributional learning
                 Context 1      Context 2         Context 3       Context 4
                 (the __ of)    (at the __ is)    (has __ him)    (He __ in)
Target word 1    210            321               2               0
Target word 2    376            917               1               5
Target word 3    0              1                 1078            1298
Target word 4    1              4                 987             1398
Etc.
Context vectors
Target word 1: 210-321-2-0
Target word 2: 376-917-1-5
Target word 3: 0-1-1078-1298
Target word 4: 1-4-987-1398
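A small sketch of how the four context vectors above could be compared pairwise. The slides do not name a similarity measure; cosine similarity is used here purely as an illustrative assumption.

```python
import math

# Context vectors for the four target words shown on the slide.
vectors = {
    "target word 1": [210, 321, 2, 0],
    "target word 2": [376, 917, 1, 5],
    "target word 3": [0, 1, 1078, 1298],
    "target word 4": [1, 4, 987, 1398],
}

def cosine(u, v):
    """Cosine similarity: 1.0 for identical direction, 0.0 for no overlap."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

sim_12 = cosine(vectors["target word 1"], vectors["target word 2"])
sim_34 = cosine(vectors["target word 3"], vectors["target word 4"])
sim_13 = cosine(vectors["target word 1"], vectors["target word 3"])
print(round(sim_12, 3), round(sim_34, 3), round(sim_13, 3))
```

Target words 1 and 2 pattern together (frequent in the noun-like contexts 1 and 2), as do target words 3 and 4 (frequent in the verb-like contexts 3 and 4), while words from the two groups have almost no contexts in common.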
19
Statistical analysis
  • Hierarchical cluster analysis over context
    vectors (dendrogram)
  • Slicing of the dendrogram
  • Treatment of polysemous words
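
The clustering and slicing steps can be sketched as follows, using the four context vectors from slide 18. Average linkage, cosine distance, and the cut value of 0.5 are assumptions made for illustration only; the slides do not state which settings were used.

```python
import math
from itertools import combinations

# Context vectors from slide 18.
vectors = {
    "target word 1": [210, 321, 2, 0],
    "target word 2": [376, 917, 1, 5],
    "target word 3": [0, 1, 1078, 1298],
    "target word 4": [1, 4, 987, 1398],
}

def cosine_distance(u, v):
    """1 minus cosine similarity, so similar vectors have a small distance."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - (dot / norm if norm else 0.0)

def average_linkage(c1, c2):
    """Mean distance between all members of two clusters."""
    dists = [cosine_distance(vectors[a], vectors[b]) for a in c1 for b in c2]
    return sum(dists) / len(dists)

def slice_clusters(cut):
    """Merge the two closest clusters until the closest pair exceeds `cut`.

    Stopping at `cut` corresponds to slicing the dendrogram at that height.
    """
    clusters = [[name] for name in vectors]
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        if average_linkage(clusters[i], clusters[j]) > cut:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

clusters = slice_clusters(cut=0.5)
print(clusters)
```

A lower cut yields more and smaller clusters, a higher cut fewer and larger ones.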

20
Statistical analysis
Slicing of the dendrogram
21

Cluster analysis
Dendrogram clusters:
  • Pronouns, auxiliaries (49)
  • Question words, pronouns-auxiliaries (53)
  • Verb (105)
  • Verb (62)
  • Verb, present PTC (50)
  • Determiner, possessive pronoun (29)
  • Conjunction, interjection, proper noun (91)
  • Proper noun (91)
  • Preposition (33)
  • Noun (317)
  • Adjective (92)
  • Proper noun (10)
22
Benchmark
Category        Example                    N
Noun            truck, card, hand          407
Adjective       little, favorite, white    81
Numeral         two, ten, three            10
Verb            could, hope, empty         239
Article         the, a                     3
Pronoun         you, whose, more           52
Adverb          rather, always, softly     60
Preposition     in, around, between        21
Conjunction     because, while, and        9
Interjection    oh, huh, wow               16
Contractions    I'll, can't, there's       58
Collins Cobuild Dictionary
23
Scoring
[Two diagrams comparing the cluster analysis with the benchmark: Accuracy and Completeness]
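The slide only names the two scores. One common pairwise reading, assumed here, is: accuracy is the share of word pairs grouped together by the cluster analysis that are also grouped together in the benchmark, and completeness is the share of word pairs grouped together in the benchmark that the cluster analysis also groups together. The cluster assignment below is invented for illustration; the category labels follow the benchmark table.

```python
from itertools import combinations

def same_group_pairs(assignment):
    """Return the set of word pairs that share a group label."""
    return {
        frozenset((a, b))
        for a, b in combinations(sorted(assignment), 2)
        if assignment[a] == assignment[b]
    }

def accuracy_and_completeness(clusters, benchmark):
    """Pairwise scores (assumed definitions, see the paragraph above)."""
    cluster_pairs = same_group_pairs(clusters)
    benchmark_pairs = same_group_pairs(benchmark)
    hits = cluster_pairs & benchmark_pairs
    accuracy = len(hits) / len(cluster_pairs) if cluster_pairs else 0.0
    completeness = len(hits) / len(benchmark_pairs) if benchmark_pairs else 0.0
    return accuracy, completeness

# Benchmark categories for five example words from the benchmark table.
benchmark = {"truck": "Noun", "card": "Noun", "hand": "Noun",
             "hope": "Verb", "could": "Verb"}
# Hypothetical output of a cluster analysis (cluster ids are arbitrary).
clusters = {"truck": 1, "card": 1, "hand": 2, "hope": 2, "could": 3}

accuracy, completeness = accuracy_and_completeness(clusters, benchmark)
print(accuracy, completeness)  # 0.5 0.25
```

Under this reading, many small clusters tend to raise accuracy and lower completeness, and a few large clusters do the opposite.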
24

Exp. 1 Context size
Result: Local contexts have the strongest effect;
the word immediately preceding the target word is
especially important.
"Learners might be innately biased towards
considering only these local contexts, whether as
a result of limited processing abilities (e.g.
Elman 1993) or as a result of language specific
representational bias." (Redington et al. 1998)
25

Exp. 2 Number of target words

[Figure: level of accuracy by number of target words]
Distributional learning is most efficient for
high frequency open class words.
26

Exp. 3 Category type
Result: Nouns are categorized best, followed by
verbs; function words are categorized worst.
"Although content words are typically much less
frequent, their context is relatively predictable.
Because there are many more content words, the
context of function words will be relatively
amorphous." (Redington et al. 1998)
27

Exp. 4 Corpus size

[Figure: level of accuracy by number of words]
28

Exp. 5 Utterance boundaries
Result: Including information about utterance
boundaries did not improve the level of
accuracy.
29

Exp. 6 Frequency vs. occurrence
Frequency vectors were replaced by occurrence
vectors.
Frequency vector        Occurrence vector
27-0-12-0-0-12-2        1-0-1-0-0-1-1
0-213-2-1-45-3-0        0-1-1-1-1-1-0
Result: The cluster analysis still revealed
significant clusters, but performance was much
better when frequency information was included.
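The replacement of frequency vectors by occurrence vectors amounts to recording only whether a context occurred at all. A minimal sketch, using the two example vectors from this slide (the function name is hypothetical):

```python
def to_occurrence(frequency_vector):
    """Replace every non-zero count by 1 and keep zeros as 0."""
    return [1 if count > 0 else 0 for count in frequency_vector]

frequency_vectors = [
    [27, 0, 12, 0, 0, 12, 2],
    [0, 213, 2, 1, 45, 3, 0],
]
occurrence_vectors = [to_occurrence(v) for v in frequency_vectors]

for freq, occ in zip(frequency_vectors, occurrence_vectors):
    print(freq, "->", occ)
```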
30

Exp. 7 Removing function words
Early child language includes very few function
words. Thus, Redington et al. removed all
function words from the context and repeated the
cluster analysis without function words.
Result: Performance decreased, but the results were
still significant.
31

Exp. 8 Knowledge of word classes
The cluster analyses were performed over the
distribution of individual items. It is
conceivable that the child recognizes at some
point discrete syntactic categories (e.g. nouns),
which may facilitate the categorization task.
Result: Representing particular word classes
through discrete category labels (e.g. N) does
not improve the categorization of other
categories (e.g. V).
32
Mintz, Toben, Elissa L. Newport, and Thomas
Bever. 2002. The distributional structure of
grammatical categories in speech to young
children. Cognitive Science 26: 393-424.
33
Mintz et al. 2002. Cognitive Science
(1) The man in the yellow car
(2) She has not yet been to NY.
1. Information about phrasal boundaries improves
   performance.
2. Local contexts have the strongest effect (cf.
   Redington et al. 1998).
3. The results for Ns are better than the results
   for Vs (cf. Redington et al. 1998).
34
Monaghan, Padraic, Nick Chater, and Morton
Christiansen. 2005. The differential role of
phonological and distributional cues in
grammatical categorization. Cognition 96: 143-182.
35
Monaghan et al. 2005. Cognition
(1) Nouns vs. verbs
(2) Open class vs. closed class
1. Distributional information
2. Phonological information
36

Monaghan et al. 2005. Cognition
  • Length: Open class words are longer than
    closed class words
  • Stress: Closed class words usually do not
    carry stress
  • Stress: Nouns tend to be trochaic more often
    than verbs (i.e. verbs are often iambic)
  • Consonants: Closed class words have fewer
    consonant clusters
  • Reduced vowels: Closed class words include a
    higher proportion of reduced vowels than
    open class words

37

Monaghan et al. 2005. Cognition
  • Interdentals: Closed class words are more likely
    to begin with an interdental fricative than
    open class words
  • Nasals: Nouns are more likely than verbs to
    include nasals
  • Final voicing: Nouns are more likely than verbs to
    end in a voiced consonant
  • Vowel position: Nouns tend to include more back
    vowels than verbs
  • Vowel height: The vowels of verbs tend to be
    higher than the vowels of nouns

38

Monaghan et al. 2005. Cognition
"For high-frequency items, distributional
information is extremely useful, but drops off
dramatically for lower frequency items. For the
phonological cues, the opposite pattern is
observed: better performance for lower frequency
words." (168)
39

Monaghan et al. 2005. Cognition
Phonological features do not just reinforce
distributional information, but seem to be
especially powerful in domains in which
distributional information is not so easily
available.
  1. Distributional information is especially useful
    for categorization of high frequency open class
    words.
  2. Phonological information is more useful for
    categorization of low frequency open class words
    (Zipf 1935).
  3. Phonological information is also useful for the
    distinction between open and closed class words.

40
Distributional learning
We found confirmation for our hypothesis that
phonological and distributional information
contributed differentially towards
categorization. At points where distributional
information was better for classification (the
high frequency items), phonological cues were found
to be of less value. Conversely, for the
lower-frequency items, where distributional
information was less useful, phonological
information contributed towards more accurate
classification.
41
Distributional learning
But are children able to detect and compute the
distributional information that is available in
the ambient language?
42
Saffran et al. 1996
Nonce words: tupiro, golabu, bidaku, padoti
Subjects: 8-month-old infants
43
Saffran et al. 1996
tupiro bidaku padoti bidaku golabu
44
Saffran et al. 1996
Condition 1: tupiro-bidaku-
Condition 2: da-pi-ku-ro-tu-
45
Head-turn procedure
light auditory stimulus
green light
46
Saffran et al. 1996
47
Saffran et al. 1996
tu-pi-ro bi-da-ku padoti bidaku golabu
Transitional probabilities (in percent): 100 within
a word (e.g. tu to pi), 25 across a word boundary
(e.g. ro to bi)
48
Saffran et al. 1996
Condition 1: 100-100-25-100-100-25
Condition 2: 8.3-8.3-8.3-8.3-8.3
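A rough sketch of how such transitional probabilities can be computed from a syllable stream: the probability of syllable B given syllable A is the count of the pair A-B divided by the count of A. The stream below is generated by picking one of the four nonce words at random (repetitions allowed), which is an assumption about the stimulus made only for illustration; the syllable splits for golabu and padoti are likewise assumed.

```python
import random
from collections import Counter

# The four nonce words, split into syllables.
WORDS = {
    "tupiro": ["tu", "pi", "ro"],
    "golabu": ["go", "la", "bu"],
    "bidaku": ["bi", "da", "ku"],
    "padoti": ["pa", "do", "ti"],
}

def transitional_probabilities(syllables):
    """P(next | current) = count(current, next) / count(current)."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

# Build a continuous stream with no pauses between words.
rng = random.Random(0)
names = list(WORDS)
stream = []
for _ in range(2000):
    stream.extend(WORDS[rng.choice(names)])

tp = transitional_probabilities(stream)
print(tp[("tu", "pi")])   # within a word
print(tp[("ro", "bi")])   # across a word boundary
```

Within-word transitions are fully predictable, while a word-final syllable can be followed by the first syllable of any of the four words, so the probability at a boundary is close to one in four. That dip is the cue that marks word boundaries in the continuous stream.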
49
Saffran et al. 1996
the existence of computational abilities that
extract structure so rapidly suggests that it is
premature to assert a priori how much of the
striking knowledge base of human infants is
primarily a result of experience-independent
mechanisms. In particular, some aspects of early
development may turn out to be best characterized
as resulting from innately biased statistical
learning mechanisms rather than innate knowledge.
If this is the case, then the massive amount of
experience gathered by infants during the first
postnatal year may play a far greater role in
development than has previously been recognized.