Statistical Syllabification with Supervised and Unsupervised Algorithms

1
Statistical Syllabification with Supervised and
Unsupervised Algorithms
  • Shankar Ananthakrishnan
  • CS562 Term Project
  • December 2, 2004

2
Definitions
  • A syllable is a unit of spoken language larger
    than a phoneme, smaller than a word
  • Consists of three parts
  • nucleus: main sonorant (vowel/diphthong)
  • onset: set of consonants preceding the nucleus
  • coda: set of consonants succeeding the nucleus
  • Onset and coda are optional
  • Example: alert → ah l er t → (ah) (l er t)
  • Two syllables: (ah) and (l er t)
  • (ah) has only a nucleus, no onset or coda
  • (l er t) has onset, nucleus, and coda (see the
    sketch below)
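
A minimal sketch of this three-part structure in Python (the class and helper are illustrative, not from the slides; phones are assumed to be ARPAbet strings as in the CMU Pronouncing Dictionary):

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative representation of a syllable's three parts.
@dataclass
class Syllable:
    nucleus: str                                    # main sonorant: vowel or diphthong
    onset: List[str] = field(default_factory=list)  # consonants before the nucleus (optional)
    coda: List[str] = field(default_factory=list)   # consonants after the nucleus (optional)

    def phones(self) -> List[str]:
        return self.onset + [self.nucleus] + self.coda

# "alert" -> ah l er t -> (ah) (l er t)
alert = [Syllable(nucleus="ah"),
         Syllable(nucleus="er", onset=["l"], coda=["t"])]
print([s.phones() for s in alert])  # [['ah'], ['l', 'er', 't']]
```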

3
Usefulness
  • Syllables capture higher level contextual
    information about speech than phonemes
  • Can be used as acoustic units in ASR
  • Sethy et al., Syllable-based recognition of
    spoken names, ISCA PMLA Workshop, 2002
  • Syllable-like units can better model
    co-articulatory dependencies
  • Useful for natural speech synthesis
  • Syllabic stress is an important
    linguistic-prosodic feature
  • Word sense disambiguation, POS tagging

4
Syllabification
  • Basic task: given a phoneme sequence, find the
    correct syllable bracketing
  • (ah) (l er t) is correct but (ah l) (er t) is not
  • How? Based on phonotactic rules of the language
  • Can a machine learn these rules?
  • Sure, just code them right in (NIST tsylb2)
  • Downside: need deep knowledge of phonology
  • Or, learn them in a statistical fashion from
    large corpora (no linguistics, generalizes across
    languages)
  • Labeled corpora: supervised learning (ML)
  • Unlabeled corpora: unsupervised learning (EM)

5
Framework
  • Every vowel is the nucleus of a syllable
  • Noisy-channel model
  • Source generates nuclei (vowels/diphthongs)
  • Channel transduces nuclei to syllables
  • Source output (vowel sequence) is given
  • Model parameters: channel probabilities p(S|N)
    (see the enumeration sketch below)

[Diagram: the source emits the nucleus sequence "ah er"
with probability p(N); the channel transduces it into
the syllabification "(ah) (l er t)" with probability
p(S|N)]
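
Since every vowel is a nucleus, a candidate syllabification is fully determined by how each run of consonants between two nuclei is split into a coda and the following onset. A hypothetical enumerator of the channel's possible outputs (the vowel set and all names here are my own, not from the slides):

```python
from itertools import product

# Assumption: the 15 ARPAbet vowel symbols used by the CMU Pronouncing Dictionary.
VOWELS = {"aa", "ae", "ah", "ao", "aw", "ay", "eh", "er", "ey",
          "ih", "iy", "ow", "oy", "uh", "uw"}

def bracketings(phones):
    """Yield every bracketing in which each vowel is the nucleus of one syllable.
    Syllables are (onset, nucleus, coda) triples."""
    nuclei = [i for i, p in enumerate(phones) if p in VOWELS]
    if not nuclei:
        return
    # Consonant runs strictly between consecutive nuclei; a run of k consonants
    # can be split into coda | next onset in k + 1 ways.
    gaps = [phones[a + 1:b] for a, b in zip(nuclei, nuclei[1:])]
    for splits in product(*(range(len(g) + 1) for g in gaps)):
        sylls, onset = [], phones[:nuclei[0]]          # word-initial consonants -> first onset
        for j, n in enumerate(nuclei):
            if j < len(gaps):
                coda, next_onset = gaps[j][:splits[j]], gaps[j][splits[j]:]
            else:
                coda, next_onset = phones[n + 1:], []  # word-final consonants -> last coda
            sylls.append((tuple(onset), phones[n], tuple(coda)))
            onset = next_onset
        yield sylls

for b in bracketings(["ah", "l", "er", "t"]):
    print(b)  # (ah)(l er t) and (ah l)(er t)
```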
6
Channel Model
  • p(S|N) = p(O1, C1, O2, C2, …, On, Cn | N1, N2, …, Nn)
  • Complete channel model
  • Too many parameters
  • Extremely complex, intractable
  • Simplification: assume short-term dependencies
    for onsets, nuclei, and codas
  • p(S|N) ≈ p(O1, C1 | N1, N2) · p(O2, C2 | N1, N2, N3) ·
    p(O3, C3 | N2, N3, N4) ⋯ p(On, Cn | Nn-1, Nn)
  • Model is still complex
  • Use graphical techniques to simplify

7
Graphical Model
[Graphical model: nucleus nodes Nn-1, Nn, Nn+1 with
onset/coda nodes Cn-1, On, Cn, On+1]
  • p(On, Cn | Nn-1, Nn, Nn+1)
  • Millions of parameters!
  • Impossible to estimate from real data
  • Graphical representation suggests good ways to
    prune model

8
Graphical Model
[Graphical model (pruned): the same nodes, with edges
removed according to the assumptions below]
  • Assume
  • Nuclei generated independently of one another
  • Coda independent of preceding nucleus
  • Onset independent of succeeding nucleus
  • Onset, coda conditionally independent given
    nucleus

9
Graphical Model
[Graphical model (final): On depends on Nn-1, Nn;
Cn depends on Nn, Nn+1]
  • p(On, Cn | Nn-1, Nn, Nn+1) = p(On | Nn-1, Nn) ·
    p(Cn | Nn, Nn+1)
  • Manageable set of parameters (w/ smoothing); a
    scoring sketch follows
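
A sketch of how this factorization scores a candidate bracketing (the dict-backed probability tables and the floor constant are my assumptions; the slides do not specify an implementation):

```python
import math

def log_channel_prob(sylls, p_onset, p_coda, floor=1e-12):
    """log p(S|N) = sum over syllables of
       log p(On | Nn-1, Nn) + log p(Cn | Nn, Nn+1),
    with None standing in for the missing nucleus context at word edges.
    p_onset and p_coda are assumed to be dicts keyed by (onset, n_prev, n)
    and (coda, n, n_next) respectively."""
    nuclei = [s[1] for s in sylls]
    logp = 0.0
    for i, (onset, nuc, coda) in enumerate(sylls):
        n_prev = nuclei[i - 1] if i > 0 else None
        n_next = nuclei[i + 1] if i + 1 < len(nuclei) else None
        logp += math.log(p_onset.get((onset, n_prev, nuc), floor))
        logp += math.log(p_coda.get((coda, nuc, n_next), floor))
    return logp
```

Decoding then picks the highest-scoring candidate, e.g. `max(bracketings(phones), key=lambda s: log_channel_prob(s, p_onset, p_coda))`.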

10
Parameter Estimation
  • Supervised learning
  • Training data has syllable alignments
  • Maximum likelihood: count-and-divide
  • Unsupervised learning
  • No training alignments: EM!
  • Similar to the English-Japanese phoneme
    alignment problem
  • Initially assume all alignments are equiprobable
  • Iteratively update channel parameters to maximize
    the likelihood of the observed training corpus
    (see the estimation sketch below)
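
Count-and-divide for the supervised case might look like the following (the triple format matches the sketches above; all names are illustrative):

```python
from collections import Counter

def train_ml(labeled_words):
    """Count-and-divide ML estimates from syllable-aligned training data.
    labeled_words: iterable of bracketings, each a list of (onset, nucleus, coda).
    Returns dicts approximating p(On | Nn-1, Nn) and p(Cn | Nn, Nn+1)."""
    o_num, o_den = Counter(), Counter()
    c_num, c_den = Counter(), Counter()
    for sylls in labeled_words:
        nuclei = [s[1] for s in sylls]
        for i, (onset, nuc, coda) in enumerate(sylls):
            n_prev = nuclei[i - 1] if i > 0 else None
            n_next = nuclei[i + 1] if i + 1 < len(nuclei) else None
            o_num[(onset, n_prev, nuc)] += 1   # joint count for the onset term
            o_den[(n_prev, nuc)] += 1          # context count (the "divide")
            c_num[(coda, nuc, n_next)] += 1
            c_den[(nuc, n_next)] += 1
    p_onset = {k: v / o_den[k[1:]] for k, v in o_num.items()}
    p_coda = {k: v / c_den[k[1:]] for k, v in c_num.items()}
    return p_onset, p_coda
```

In the unsupervised case the same counting runs inside EM: starting from a uniform model, the E-step spreads each word's count over all of its bracketings in proportion to their posterior probability under the current parameters, and the M-step re-runs the division on those fractional counts.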

11
Smoothing
  • Parameters are of the form p(On | Nn-1, Nn)
  • Sparsity: parameters may not cover the test corpus
  • Simple interpolation
  • p'(On | Nn-1, Nn) = a1 · p(On | Nn-1, Nn)
    + a2 · p(On | Nn)
    + a3 · p(onset | Nn)
  • a1 + a2 + a3 = 1.0
  • Similar smoothing for the coda parameters (sketch
    below)
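
A sketch of the interpolation. I read the slide's third term, p(onset | Nn), as the probability that any onset is present given the nucleus; the table names and default weights are assumptions:

```python
def smoothed_onset_prob(onset, n_prev, n, p1, p2, p3, a=(0.6, 0.3, 0.1)):
    """p'(On | Nn-1, Nn) = a1*p(On | Nn-1, Nn) + a2*p(On | Nn) + a3*p(onset | Nn),
    with a1 + a2 + a3 = 1.0.  p1, p2, p3 are dicts for the full-context,
    nucleus-only, and onset-presence distributions (each estimated by the
    same count-and-divide with reduced context)."""
    a1, a2, a3 = a
    return (a1 * p1.get((onset, n_prev, n), 0.0)
            + a2 * p2.get((onset, n), 0.0)
            + a3 * p3.get((len(onset) > 0, n), 0.0))
```

The coda parameters would get the mirror-image treatment over the (Nn, Nn+1) context.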

12
Data / Tools
  • CMU Pronouncing Dictionary
  • Training corpus: 123,777 words
  • Test corpus: 3,007 words
  • Held-out data: 2,891 words (to be used later)
  • Syllabification tool: NIST tsylb2
  • Deterministic rule-based syllabifier
  • Daniel Kahn, Syllable-based Generalizations in
    English Phonology, Ph.D. thesis, Massachusetts
    Institute of Technology, 1976

13
Evaluation
  • Baseline to compare against
  • Alignment based on maximal onset principle
  • Syllables prefer a maximum number of consonants
    in their onset and a minimum number in their coda
  • t r ae n s f er ih ng → (t r ae) (n s f er)
    (ih ng)
  • Results reported in terms of
  • Word accuracy: how many words did the
    algorithm(s) align correctly?
  • Syllable accuracy: how many syllables overall did
    the algorithm(s) correctly identify? (sketches of
    the baseline and both measures follow)
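
A sketch of the baseline and the two measures. With no phonotactic constraints, "maximal onset" amounts to giving every inter-nucleus consonant run to the following onset, which reproduces the (t r ae) (n s f er) (ih ng) example above; the exact syllable-accuracy definition is my reading of the slide:

```python
from collections import Counter

def maximal_onset(phones, vowels):
    """Baseline bracketing: all consonants between nuclei join the next onset;
    word-final consonants become the last syllable's coda."""
    sylls, onset = [], []
    for p in phones:
        if p in vowels:
            sylls.append((tuple(onset), p, ()))
            onset = []
        else:
            onset.append(p)
    if sylls and onset:  # trailing consonants -> final coda
        o, n, _ = sylls[-1]
        sylls[-1] = (o, n, tuple(onset))
    return sylls

print(maximal_onset("t r ae n s f er ih ng".split(), {"ae", "er", "ih"}))
# [(('t', 'r'), 'ae', ()), (('n', 's', 'f'), 'er', ()), ((), 'ih', ('ng',))]

def word_accuracy(pred, gold):
    """Fraction of words whose whole bracketing matches the reference."""
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)

def syllable_accuracy(pred, gold):
    """Fraction of reference syllables recovered, matched per word."""
    hit = total = 0
    for p, g in zip(pred, gold):
        cp, cg = Counter(p), Counter(g)
        hit += sum(min(cp[s], n) for s, n in cg.items())
        total += sum(cg.values())
    return hit / total
```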

14
Results
[Table 1. Syllabification accuracy for the various
algorithms]
15
EM: Unexpected Results
16
To-do List
  • Supervised learning
  • Use EM to optimize smoothing parameters on
    held-out data (possible!)
  • Unsupervised learning
  • Why is there a paradox in EM syllabification?
  • Figure out a way to lump smoothing parameters
    along with the rest and jointly optimize w/EM
    (possible?)
  • Overall goal
  • Compare all techniques (including rule-based) on
    human-labeled corpus (need data!)