Gambel - PowerPoint PPT Presentation


1
Gambell & Yang's "Mechanisms and Constraints in Word
Segmentation"
  • Virginia Savova
  • Statistical Language Learning Reading Group,
  • MIT 2005

2
Goals
  • Evaluate statistical word segmentation strategies
  • Establish the role of domain-specific (prosodic)
    constraints
  • Propose an algebraic strategy based on a
    domain-specific but language-independent
    constraint

3
Critique of existing statistical word
segmentation algorithms
  • Off-line, global optimization approaches
      • overestimate learners' computational capabilities
  • Start from phonemes, rather than syllables
      • underestimate what was learned in previous stages
  • Discussion
      • To what extent can off-line methods be converted
        to on-line ones?
      • Does assuming discrete learning stages make
        sense?

4
Corpus
[Diagram: CHILDES → (DICTIONARY) → PHONEMES + STRESS → (MAX ONSET, PHONOTACTICS) → SYLLABLES]
Test and train on the same corpus

5
Local Minima of Transitional Probabilities
  • Formalization of Saffran et al.'s hypothesized
    learning mechanism

$\mathrm{TP}(a \to b) = P(ab) / P(a)$
  • Local minima = word boundaries
  • $\mathrm{TP}_{X-1} > \mathrm{TP}_X < \mathrm{TP}_{X+1}$
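
The local-minimum criterion can be sketched in a few lines of Python. This is a minimal illustration, not the implementation evaluated in the paper: the function names are hypothetical, TP is estimated as count(ab) / count(a), and utterance-edge transitions are never treated as minima because they lack a neighbour on one side.

```python
from collections import Counter


def transitional_probs(utterances):
    """Estimate TP(a -> b) as count(ab) / count(a) over syllabified utterances."""
    unigrams = Counter()
    bigrams = Counter()
    for utt in utterances:
        unigrams.update(utt)
        bigrams.update(zip(utt, utt[1:]))
    return {(a, b): n / unigrams[a] for (a, b), n in bigrams.items()}


def segment_local_minima(utt, tp):
    """Split one utterance wherever TP dips below both neighbouring TPs."""
    # tps[i] is the TP between utt[i] and utt[i + 1]; unseen pairs get 0.0.
    tps = [tp.get(pair, 0.0) for pair in zip(utt, utt[1:])]
    words, start = [], 0
    for i in range(1, len(tps) - 1):
        if tps[i - 1] > tps[i] < tps[i + 1]:
            words.append(utt[start:i + 1])
            start = i + 1
    words.append(utt[start:])
    return words
```

Because two adjacent transitions cannot both be strict local minima, a learner following this sketch can never separate two consecutive monosyllabic words.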

6
Local Minima of Transitional Probabilities
  • Results
      • 23.3% recall
      • 41.6% precision
  • Problem: monosyllables (two adjacent boundaries cannot both
    be local minima)
  • Discussion
      • Amend the local-minimum (LM) criterion?
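
For reference, precision and recall of a segmentation can be computed as below. This is a sketch under an assumption: it scores word tokens (a predicted word counts as correct only if both of its edges match the gold segmentation), which may differ from the exact scoring used in the paper. The function names are hypothetical.

```python
def word_spans(words):
    """Turn a segmented utterance into a set of (start, end) syllable spans."""
    spans, start = set(), 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def precision_recall(predicted, gold):
    """Word-token precision and recall over parallel lists of segmented utterances."""
    hits = n_predicted = n_gold = 0
    for pred_utt, gold_utt in zip(predicted, gold):
        pred_spans, gold_spans = word_spans(pred_utt), word_spans(gold_utt)
        hits += len(pred_spans & gold_spans)
        n_predicted += len(pred_spans)
        n_gold += len(gold_spans)
    # Assumes at least one predicted and one gold word overall.
    return hits / n_predicted, hits / n_gold
```

Under this scoring, an utterance of monosyllables left unsegmented contributes one wrong predicted word and several missed gold words, which lowers recall more than precision.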

[Figure: Example monosyllable TP profile]

7
Swingley's TP algorithm
  • Probability of syllable, bigram, trigram
  • Mutual information (in the role of transitional probability)

$I(x, y) = \log_2 \frac{P(xy)}{P(x)\,P(y)}$
$P(s/b/t) = F(s/b/t) / n$
  • Fixed threshold θ
  • If $P(s/b/t) > \theta$, then s/b/t is a word.
  • If $I(x, y) > \theta$, then xy is a word.
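
A rough sketch of the two measures, restricted to syllable bigrams for brevity. Several things here are assumptions rather than Swingley's actual procedure: probabilities are plain relative frequencies, the two thresholds `theta_freq` and `theta_mi` are hypothetical parameters, and a bigram is accepted only when it passes both, which is just one of several ways the measures could be combined.

```python
import math
from collections import Counter


def ngram_counts(utterances, n):
    """Count syllable n-grams (as tuples) within utterance boundaries."""
    counts = Counter()
    for utt in utterances:
        counts.update(tuple(utt[i:i + n]) for i in range(len(utt) - n + 1))
    return counts


def mutual_information(x, y, uni, bi):
    """I(x, y) = log2(P(xy) / (P(x) P(y))) from relative frequencies."""
    if bi[(x, y)] == 0:
        return float("-inf")  # unseen pair: treat as maximally unassociated
    p_xy = bi[(x, y)] / sum(bi.values())
    p_x = uni[(x,)] / sum(uni.values())
    p_y = uni[(y,)] / sum(uni.values())
    return math.log2(p_xy / (p_x * p_y))


def candidate_bigram_words(utterances, theta_freq, theta_mi):
    """Bigrams whose relative frequency and mutual information both exceed a threshold."""
    uni = ngram_counts(utterances, 1)
    bi = ngram_counts(utterances, 2)
    n_bi = sum(bi.values())
    return {
        pair
        for pair, freq in bi.items()
        if freq / n_bi > theta_freq
        and mutual_information(pair[0], pair[1], uni, bi) > theta_mi
    }
```

The counts are taken over the whole corpus before any decision is made, which is what makes this kind of procedure off-line.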

8
Swingley's TP algorithm
  • Off-line
  • Combines two measures in an arbitrary way
  • Arbitrary cutoff at trigram
  • Results
  • high precision overall (?)
  • low 3-syllable precision (23%)
  • low recall (27%)
  • ⇒ TPs alone don't work

9
TP with Prosodic Constraints
  • Unique Stress Constraint (USC): a word bears at most
    one (primary) stress
  • universal (language-independent) constraint
  • Strong (stressed) and weak (unstressed) syllables
  • (a) CONflict versus (to) conFLICT
  • TP + USC
      • S1 S2 → boundary between S1 and S2
      • S1 W1...Wn S2 → boundary chosen by pairwise TPs (?)
  • Results for TP + USC
      • 73.5% precision
      • 71.2% recall
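
One way to read the TP + USC rule is sketched below. The interpretation is an assumption: between two consecutive stressed syllables, exactly one boundary is placed at the transition with the lowest TP in that window (for adjacent stressed syllables there is only one candidate). The function name and the `(syllable, is_strong)` input format are hypothetical.

```python
def segment_tp_usc(utt, tp):
    """Segment one utterance given stress marks and a TP table.

    utt: list of (syllable, is_strong) pairs.
    tp:  dict mapping (a, b) syllable pairs to TP(a -> b); unseen pairs count as 0.0.
    """
    strong = [i for i, (_, is_strong) in enumerate(utt) if is_strong]
    boundaries = set()  # i means a boundary between utt[i] and utt[i + 1]
    for s1, s2 in zip(strong, strong[1:]):
        # Every transition from S1 up to S2 is a candidate; USC says at
        # least one of them must be a word boundary.
        boundaries.add(
            min(range(s1, s2), key=lambda i: tp.get((utt[i][0], utt[i + 1][0]), 0.0))
        )
    words, start = [], 0
    for b in sorted(boundaries):
        words.append([syl for syl, _ in utt[start:b + 1]])
        start = b + 1
    words.append([syl for syl, _ in utt[start:]])
    return words
```

Unlike the plain local-minimum learner, this version can separate adjacent stressed monosyllables, since the stress pattern alone forces the boundary.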

10
Algebraic learning with the Unique Stress
Constraint
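  • S1 S2 → boundary between S1 and S2
  • S1 W1...Wi-1 Wi...Wj Wj+1...Wn S2, where the stretches
    containing S1 and S2 are known words → Wi...Wj is a word
  • Add words to lexicon
  • S1 W1...Wn S2, otherwise → split at random
  • Results
      • 95.9% precision
      • 93.4% recall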
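
The algebraic strategy can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the lexicon is a set of syllable tuples, a known word on the left must contain S1 and end before S2, a known word on the right must start after S1 and contain S2, only the inferred middle stretch is added to the lexicon, and every other case falls back to a random split. The function and helper names are hypothetical.

```python
import random


def segment_algebraic(utt, lexicon, rng=random):
    """Segment one utterance with the USC and a lexicon (updated in place).

    utt: list of (syllable, is_strong) pairs; lexicon: set of syllable tuples.
    """
    syls = [syl for syl, _ in utt]
    strong = [i for i, (_, is_strong) in enumerate(utt) if is_strong]

    def known(a, b):
        return tuple(syls[a:b]) in lexicon

    cuts = set()  # a cut at k is a boundary just before syls[k]
    for s1, s2 in zip(strong, strong[1:]):
        if s2 == s1 + 1:
            cuts.add(s2)  # S1 S2: two stresses cannot share a word
            continue
        # Right edge of a known word that contains S1 and stops before S2.
        left = next((e for e in range(s2, s1, -1)
                     if any(known(a, e) for a in range(s1 + 1))), None)
        # Left edge of a known word that starts after S1 and contains S2.
        right = next((b for b in range(s1 + 1, s2 + 1)
                      if any(known(b, e) for e in range(s2 + 1, len(syls) + 1))), None)
        if left is not None and right is not None and left <= right:
            cuts.update((left, right))
            if left < right:
                lexicon.add(tuple(syls[left:right]))  # Wi...Wj is a new word
        else:
            cuts.add(rng.randrange(s1 + 1, s2 + 1))  # no evidence: random split
    edges = [0] + sorted(cuts) + [len(syls)]
    return [syls[a:b] for a, b in zip(edges, edges[1:])]
```

No probabilities are estimated anywhere in this sketch; the learner relies only on the stress constraint and on subtracting known words from the input.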

11
Conclusions
  • Local minima of TPs alone don't work
  • Prosodic constraints help
  • Universal prosodic constraint is sufficient
  • Statistical learning is not necessary, algebraic
    does better