Towards a solution for the sharing of phonological data presentation

1
Towards a solution for the sharing of
phonological data

Yvan Rose
Memorial University of Newfoundland
Brian MacWhinney
Carnegie Mellon University

2
Map of presentation

Context: no specialized tool to facilitate
research in phonological development
A preliminary attempt: ChildPhon
A more promising solution: Phon
Current state of the Phon project
Developments in foreseeable future
Potential
Publicly-available cross-linguistic database
Proposal

3
Context (until recently)

CHILDES tools (focus on CLAN)
Number of tools for multimedia data storage and
analysis
Mostly deals with morphological and syntactic
aspects of development
Not easily extensible
What about phonology?
No CHILDES tool adapted for phonology
Data sharing and broad-based investigations are
challenging

4
A first attempt

ChildPhon (Rose 2003)
Analytical (relational) database for child
language data
Designed within FileMaker Pro
Main features
Interface for double-blind transcriptions
Automatic functions based on phonetic
transcriptions
Syllabification of transcribed forms
Detection of common processes observed in child
language (e.g. onset cluster reduction)

5
Problems with ChildPhon

No support for Unicode fonts → no cross-platform
compatibility (Macintosh-only)
Not compatible with CHILDES / TalkBank → no data
exchange functions
Automatic parses limited, not customizable
Multimedia capabilities are minimal (at best)
Requires use of proprietary software and font
Algorithms are destructive
Statistical functions are minimal
No web implementation
In sum: good idea, bad implementation

6
Phon: a more promising solution

Interdisciplinary project (First of its kind
between Linguistics and Computer Science at
Memorial University of Newfoundland)
Software designers and programmers: Rodrigue
Byrne, Gregory Hedlund, Philip O'Brien, Yvan
Rose, Harold Wareham
Financial Support
Faculty of Arts, Memorial University
Social Sciences and Humanities Research Council
of Canada (SSHRC)
Canada Foundation for Innovation (CFI)
National Science Foundation (NSF)

7
Phon Overview

Software underpinnings
Programmed in Java, Unicode font encoding
Cross-platform compatible (Mac, Windows, ...)
XML data storage structure
Compatible with TalkBank schema
User management system
Extended multimedia capabilities
More flexible automatic algorithms
Specialized query language
Offers a complete solution for data sharing

8
Phon usability

Intuitive graphical user interface
Helpful wizards (e.g. project creation, queries)
Record navigator
Custom selection of data fields
General / record-by-record
Intuitive query language
Standard terminology
Built-in queries (modifiable by user)
Query memorization and saving

9
Phon main functions

User management
Media segmentation
Phonetic transcription
Transcription merging (Selection of final
transcriptions for analysis)
Phrase segmentation and alignment (Further
segmentation according to research needs)
Syllable alignment (Alignment of syllables of
target and actual forms)
Database query

10
User management

Secure login
User tasks / privileges management

11
Media segmentation

Generally similar to CLAN
Hit the space bar to define a speech segment
Default segment length is user-defined
Useful for working on small speech segments
Segment editing
Change numerical value
Stretch the time segment by sliding pointer

Yvan Rose: Replace the yellow line in the segment
timebar with a waveform.

12
Transcription: general interface

13
Transcription

Built-in IPA character map
Symbol categories
Access to sound segment
Interface for double-blind transcriptions
Tied with user management functions

Yvan Rose: Link adult transcription to an
electronic IPA dictionary. Need to develop a
transcription system for sounds that can't be
transcribed easily: the ability to assign a
feature set to a dummy character, and the ability
to use the forward slash to assign two competing
symbols to a given sound (e.g. p/b would imply
that voicing cannot be transcribed accurately);
the alternants will be considered as one
consonant by the syllabifier and query
interpreter.
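
The slash notation described in this note can be sketched as a small tokenizer. This is only an illustration in Python, not Phon's implementation: the function names `tokenize` and `matches` are hypothetical, and the sketch assumes one character per symbol (no diacritics or multi-character IPA symbols).

```python
def tokenize(transcription):
    """Split a transcription into segments; 'p/b' becomes ONE segment with two alternants."""
    segments = []
    i = 0
    while i < len(transcription):
        alternants = {transcription[i]}
        i += 1
        # Absorb every "/x" that follows, so "p/b" stays a single segment.
        while i + 1 < len(transcription) and transcription[i] == "/":
            alternants.add(transcription[i + 1])
            i += 2
        segments.append(frozenset(alternants))
    return segments


def matches(segment, symbol):
    """A query symbol matches a segment if it is one of the segment's alternants."""
    return symbol in segment


word = tokenize("p/bat")
print(len(word))               # three segments, not five characters
print(matches(word[0], "b"))   # the first segment accepts either alternant
```

Because the alternants are stored together as one set, a syllabifier or query interpreter working on the token list counts "p/b" as a single consonant.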
14
Transcription merging

Comparison of competing transcriptions
Direct access to media segment
Selection of most accurate transcription
Further refinement of selected transcription

Yvan Rose: People requested an algorithm that
would enable a comparison of transcriptions based
on specific parameters (e.g. voicing). This
algorithm could build on the feature sets
associated with each segment transcribed.
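
A comparison of this kind could look roughly like the sketch below. It is an assumption-laden illustration, not the algorithm planned for Phon: the `FEATURES` table is a hypothetical, deliberately tiny feature set, and the two transcriptions are assumed to have the same number of segments.

```python
# Hypothetical, deliberately tiny feature table; a real one would cover the full IPA.
FEATURES = {
    "p": {"place": "labial", "manner": "stop", "voice": False},
    "b": {"place": "labial", "manner": "stop", "voice": True},
    "t": {"place": "coronal", "manner": "stop", "voice": False},
    "d": {"place": "coronal", "manner": "stop", "voice": True},
    "a": {"place": "low", "manner": "vowel", "voice": True},
}


def disagreements(first, second, parameter):
    """Return the positions where two transcriptions differ on one feature."""
    if len(first) != len(second):
        raise ValueError("transcriptions must have the same number of segments")
    return [
        i
        for i, (x, y) in enumerate(zip(first, second))
        if FEATURES[x][parameter] != FEATURES[y][parameter]
    ]


# Two transcribers heard the same word differently.
print(disagreements("pat", "bad", "voice"))  # positions that differ in voicing
print(disagreements("pat", "bad", "place"))  # positions that differ in place
```

Comparing on one parameter at a time shows whether two competing transcriptions differ only in voicing or more substantially.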
15
Phrase alignment

Further segmentation of the utterances
Useful for research on phonological domains
A simple mouse click sets and resets the domain
boundaries

Yvan Rose: Several people requested different
levels of segmentation. This includes
morpho-syntactic levels of segmentation, as well
as various levels of the prosodic hierarchy.
Also add a PLAY button in the interface of this
module.
16
Syllabification algorithm

Syllabification algorithm
Refined labeling of each syllabic position
Each label is a valid object for query

[Figure: syllable trees with nodes labelled R (rhyme), O (onset) and N (nucleus)]
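
To make the idea of labelling each syllabic position concrete, here is a minimal sketch of an onset-maximizing syllabifier in Python. It is not Phon's algorithm. The vowel set and the list of legal onset clusters are hypothetical defaults that stand in for user-definable parameters, and each character is assumed to be one segment. Consonants before the first vowel are all treated as an onset without any legality check.

```python
def syllabify(segments, vowels=frozenset("aeiou"),
              legal_onsets=frozenset({"tr", "dr", "pl", "bl"})):
    """Label each segment O (onset), N (nucleus) or C (coda), grouped by syllable."""
    nuclei = [i for i, seg in enumerate(segments) if seg in vowels]
    syllables = []
    start = 0  # index of the first segment not yet assigned to a syllable
    for n, nucleus in enumerate(nuclei):
        onset = [(seg, "O") for seg in segments[start:nucleus]]
        if n + 1 < len(nuclei):
            cluster = segments[nucleus + 1:nuclei[n + 1]]
            # Onset maximization: the next syllable gets the longest legal onset,
            # and whatever precedes it becomes the coda of this syllable.
            split = len(cluster)
            for k in range(len(cluster) + 1):
                candidate = "".join(cluster[k:])
                if len(candidate) <= 1 or candidate in legal_onsets:
                    split = k
                    break
            coda_end = nucleus + 1 + split
        else:
            coda_end = len(segments)  # last syllable takes all remaining consonants
        coda = [(seg, "C") for seg in segments[nucleus + 1:coda_end]]
        syllables.append(onset + [(segments[nucleus], "N")] + coda)
        start = coda_end
    return syllables


for syllable in syllabify("tundras"):
    print(syllable)
```

Since every segment carries its own label, each label can serve directly as an object for queries, e.g. "all syllables with two O positions".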
17
Syllabification algorithm

Parameters of syllabification are user-definable

Timing tier
Syllable constituents
Yvan Rose: The parameters will be revised
thoroughly. To add (among others): word-final
codas, a list of exceptional clusters. Also add,
to complement stress attraction, an option of
ambisyllabic syllabification of intervocalic
consonants at Strong-Weak syllable junctures. In
addition, we need a way to manually assign a
syllabification to each consonant which cannot be
accounted for by the automatic algorithm.
18
Syllable alignment

Automatic alignment of syllables
Manual modifications
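
Automatic alignment of target and actual forms can be illustrated with a plain dynamic-programming (edit-distance) alignment. The sketch below is a generic technique with hypothetical unit costs, not Phon's algorithm and not a feature-based method. It is shown on segments, but the same scheme works on any sequence, including lists of syllables.

```python
def align(target, actual, gap=1, mismatch=1):
    """Globally align two sequences; None marks a deleted or inserted element."""
    rows, cols = len(target) + 1, len(actual) + 1
    cost = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        cost[i][0] = i * gap
    for j in range(1, cols):
        cost[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            same = target[i - 1] == actual[j - 1]
            substitution = cost[i - 1][j - 1] + (0 if same else mismatch)
            cost[i][j] = min(substitution, cost[i - 1][j] + gap, cost[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    pairs = []
    i, j = len(target), len(actual)
    while i > 0 or j > 0:
        same = i > 0 and j > 0 and target[i - 1] == actual[j - 1]
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (0 if same else mismatch):
            pairs.append((target[i - 1], actual[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + gap:
            pairs.append((target[i - 1], None))  # element missing from actual form
            i -= 1
        else:
            pairs.append((None, actual[j - 1]))  # element added in actual form
            j -= 1
    pairs.reverse()
    return pairs


print(align("tra", "ta"))
```

The returned pairs are a natural starting point for manual modification: a user only has to move the `None` gaps where the automatic result is wrong.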

19
Query language

Quick and accurate queries on large amounts of
data
Language features
Uses terms familiar to phonologists to compose
queries
Syllable constituents: onset, nucleus, ...
Stressed vs. unstressed syllables
Custom predicates
History of recent queries
Ability to save queries

20
Query language components

Selectors (e.g. Onset(Syllable x))
Predicates (e.g. Branching(Onset(Syllable x)))
Boolean connectives
Example

let corpusName "TestCorpus",
let corpus Corpus(corpusName),
let records Records(corpus)
foreach r in records
  foreach p in Phrases(r)
    foreach s in Syllables(p)
      Branching(Onset(TargetSyllable(s)))
      AND NOT Branching(Onset(ActualSyllable(s)))
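
For readers more used to a general-purpose language, the logic of the example query can be paraphrased in Python. This is a sketch under stated assumptions, not the Phon query engine: the `Syllable` and `AlignedSyllable` classes, the dictionary layout of `records` and the one-character-per-segment convention are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Syllable:
    onset: str  # consonants before the nucleus
    rhyme: str  # nucleus plus any coda


@dataclass
class AlignedSyllable:
    target: Syllable  # adult (target) form
    actual: Syllable  # child's (actual) production


def branching(constituent):
    """A constituent branches when it holds more than one segment."""
    return len(constituent) > 1


def onset_reductions(records):
    """Yield (record id, syllable) where a branching target onset is produced as non-branching."""
    for record_id, phrases in records.items():
        for phrase in phrases:
            for s in phrase:
                if branching(s.target.onset) and not branching(s.actual.onset):
                    yield record_id, s


# One record holding one phrase of two aligned syllables.
records = {
    "r1": [[
        AlignedSyllable(Syllable("t", "un"), Syllable("d", "un")),
        AlignedSyllable(Syllable("dr", "as"), Syllable("d", "as")),
    ]],
}
for record_id, s in onset_reductions(records):
    print(record_id, s.target.onset, "->", s.actual.onset)
```

The nested loops mirror the three `foreach` clauses of the query, and the `if` condition corresponds to its final Boolean expression.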
21
Query tree structure

Branching onset reduction in 2nd syllable

[Figure: query tree. A Record contains a TargetPhrase (segments T U N D R A S) and an ActualPhrase (segments D U N D A S), each parsed into Syllable, Onset, Rhyme, Nucleus and Coda nodes. The query branching(onset(pos(TargetPhrase, 2))) AND NOT branching(onset(pos(ActualPhrase, 2))) evaluates to TRUE AND NOT FALSE, giving a MATCH.]
22
Query results

View in application
Use to generate textual reports
Recording session (e.g. to exemplify a given
process)
Time slice (e.g. to exemplify a stage of
acquisition)
Entire database (to exemplify a learning curve)
Export
As Unicode file
As ASCII file (modulo font conversion
limitations)

23
Enhancements (short term)

Improvement of syllable alignment algorithm
(building on Kondrak's 2003 algorithm)
Import function
ChildPhon files (including font translator
--almost done!)
CHAT files
Incorporation of user-defined fields
Incorporation of statistical functions
Chart report generator
Ability to select various chart formats
Bar graphs (for proportions within and across
sessions)
Line graphs (for learning curves)

24
Enhancements (longer term)

Interoperability with Praat
Export to Praat (similar to CLAN function)
Interface to accommodate acoustic measurement
data
Web-based interface
Data sharing at a distance
Easy query of corpora on CHILDES database
Further automation
Automatic detection of pre-identified processes

Yvan Rose: Include a function to extract phonetic
inventories per session/stage. Get examples of
canned analyses in the literature on clinical
phonology.
25
Development timeline

End of fall of 2004
Completion of current development phase
Release of testing (Beta) version
Winter of 2005
Bug fixes
Improvement of functionality and user interface
(including short-term enhancements)
Website creation (http://www.phon.ca/)
Completion of technical documentation
Notes to programmers
User guide
Summer of 2005
Release of Phon 1.0 as open-source freeware

26
Potential

Standard for data sharing
Large-scale investigations
Cross-linguistic investigations
Enhancement to CHILDES
Elaboration of a database fulfilling the needs of
acquisitionists focussing on phonology and
related issues
Investigation of interface issues (e.g. between
morpho-syntax and phonology)

27
How to realize this potential

Team of researchers specializing in
Early acquisition (including babbling)
Segmental development
Prosodic development
Phonological disorders
Second language acquisition
Feedback on software development project
Data contribution
Existing corpora in digital format
Conversion of printed corpora
Identification of corpora (printed, with or
without audio files)
Setting of conventions for data conversion

28
Our proposal

Constitution of a research team to develop a
phonological component of CHILDES
Database
Supporting software
Elaboration, with the research team, of a grant
application to support
Database elaboration
Software development
Periodical meetings
Workshops

29
Concretely

Feedback on software project
Software needs for various types of research: let
us know what you need
Implementation: let us know how you want it to
work
Contribution to grant application
What kinds of research would the new database
enable? Let us know what you would like to do
Impacts of this research (e.g. theoretical,
clinical, ...)
Supporting letters
Contribution to the public database
Sharing of existing / future corpora
Establishment of conventions to format older
corpora

30
Special thanks

The Phon team at Memorial
Rodrigue Byrne
Harold Wareham
Gregory Hedlund
Philip O'Brien
For his great help with the TalkBank XML schema
Franklin Chen (Carnegie Mellon University)
For their useful feedback on an early version of
this software
Heather Goad (McGill), Paula Fikkert (Nijmegen),
Clara Levelt (Leiden), Katherine Demuth (Brown),
Mark Johnson (Brown), Carrie Dyck (Memorial),
Phil Branigan (Memorial), Brian MacWhinney
(Carnegie Mellon), Bryan Gick (UBC), Sophie
Wauquier-Gravelines (Nantes), Sharon Inkelas (UC
Berkeley), Conxita Lleó, Sonia Frota (Lisbon),
Maria João Freitas (Lisbon), Ronald Sprouse (UC
Berkeley), Joe Pater (UMass, Amherst), John
Archibald (Calgary), Éliane Lebel (Memorial)
hoping that no one was forgotten
