Towards a solution for the sharing of phonological data


Title: Towards a solution for the sharing of phonological data


1
Towards a solution for the sharing of
phonological data
  • Yvan Rose
  • Memorial University of Newfoundland
  • Brian MacWhinney
  • Carnegie Mellon University

2
Map of presentation
  • Context: no specialized tool to facilitate
    research in phonological development
  • A preliminary attempt: ChildPhon
  • A more promising solution: Phon
  • Current state of the Phon project
  • Developments in foreseeable future
  • Potential
  • Publicly-available cross-linguistic database
  • Proposal

3
Context (until recently)
  • CHILDES tools (focus on CLAN)
  • Number of tools for multimedia data storage and
    analysis
  • Mostly deals with morphological and syntactic
    aspects of development
  • Not easily extensible
  • What about phonology?
  • No CHILDES tool adapted for phonology
  • Data sharing and broad-based investigations are
    challenging

4
A first attempt
  • ChildPhon (Rose 2003)
  • Analytical (relational) database for child
    language data
  • Designed within FileMaker Pro
  • Main features
  • Interface for double-blind transcriptions
  • Automatic functions based on phonetic
    transcriptions
  • Syllabification of transcribed forms
  • Detection of common processes observed in child
    language (e.g. onset cluster reduction)

5
Problems with ChildPhon
  • No support for Unicode fonts; no cross-platform
    compatibility (Macintosh-only)
  • Not compatible with CHILDES / TalkBank; no data
    exchange functions
  • Automatic parses limited, not customizable
  • Multimedia capabilities are minimal (at best)
  • Requires use of proprietary software and font
  • Algorithms are destructive
  • Statistical functions are minimal
  • No web implementation
  • In sum: good idea, bad implementation

6
Phon: a more promising solution
  • Interdisciplinary project (First of its kind
    between Linguistics and Computer Science at
    Memorial University of Newfoundland)
  • Software designers and programmers: Rodrigue
    Byrne, Gregory Hedlund, Philip O'Brien, Yvan
    Rose, Harold Wareham
  • Financial Support
  • Faculty of Arts, Memorial University
  • Social Sciences and Humanities Research Council
    of Canada (SSHRC)
  • Canada Foundation for Innovation (CFI)
  • National Science Foundation (NSF)

7
Phon Overview
  • Software underpinnings
  • Programmed in Java, Unicode font encoding
  • Cross-platform compatible (Mac, Windows, ...)
  • XML data storage structure
  • Compatible with TalkBank schema
  • User management system
  • Extended multimedia capabilities
  • More flexible automatic algorithms
  • Specialized query language
  • Offers a complete solution for data sharing

8
Phon usability
  • Intuitive graphical user interface
  • Helpful wizards (e.g. project creation, queries)
  • Record navigator
  • Custom selection of data fields
  • General / record-by-record
  • Intuitive query language
  • Standard terminology
  • Built-in queries (modifiable by user)
  • Query memorization and saving

9
Phon main functions
  • User management
  • Media segmentation
  • Phonetic transcription
  • Transcription merging (Selection of final
    transcriptions for analysis)
  • Phrase segmentation and alignment (Further
    segmentation according to research needs)
  • Syllable alignment (Alignment of syllables of
    target and actual forms)
  • Database query

10
User management
  • Secure login
  • User tasks / privileges management

11
Media segmentation
  • Generally similar to CLAN
  • Hit the space bar to define a speech segment
  • Default segment length: user-defined
  • Useful for working on small speech segments
  • Segment editing
  • Change numerical value
  • Stretch the time segment by sliding pointer

Yvan Rose: Replace yellow line in segment
timebar with waveform.
12
Transcription: general interface
13
Transcription
  • Built-in IPA character map
  • Symbol categories
  • Access to sound segment
  • Interface for double-blind transcriptions
  • Tied with user management functions

Yvan Rose: Link adult transcription to an
electronic IPA dictionary. Need to develop a
transcription system for sounds that can't be
transcribed easily: ability to assign a
feature set to a dummy character; ability to use
the forward slash to assign two competing
symbols to a given sound (e.g. p/b would imply
that voicing cannot be transcribed accurately).
The alternants will be considered as one
consonant by the syllabifier and query
interpreter.
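
To make the slash convention in this note concrete, here is a minimal Python sketch. It is not Phon's implementation: the function name `parse_alternants`, the tuple representation, and the one-character-per-symbol assumption are all introduced here for illustration.

```python
def parse_alternants(transcription):
    """Split a transcription into slots; 'p/b' becomes one slot with two candidates.

    Hypothetical helper: assumes each symbol is a single character and that a
    forward slash joins the symbols immediately before and after it.
    """
    slots = []
    i = 0
    while i < len(transcription):
        candidates = [transcription[i]]
        i += 1
        # Absorb every '/x' continuation into the same slot.
        while i + 1 < len(transcription) and transcription[i] == "/":
            candidates.append(transcription[i + 1])
            i += 2
        slots.append(tuple(candidates))
    return slots


slots = parse_alternants("p/bat")
print(slots)       # [('p', 'b'), ('a',), ('t',)]
print(len(slots))  # 3: the two alternants count as a single consonant
```

Keeping the alternants inside one slot means that anything counting consonants, such as a syllabifier, sees a single segment whichever symbol is eventually chosen.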
14
Transcription merging
  • Comparison of competing transcriptions
  • Direct access to media segment
  • Selection of most accurate transcription
  • Further refinement of selected transcription

Yvan Rose: People requested an algorithm that
would enable a comparison of transcriptions based
on specific parameters (e.g. voicing). This
algorithm could build on the feature sets
associated with each segment transcribed.
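
A feature-based comparison of this kind might look like the following Python sketch. The feature table is a toy assumption, not Phon's feature set, and `differing_positions` is a hypothetical name; the sketch also assumes both transcriptions have the same number of segments.

```python
# Toy feature table (an assumption for illustration, not Phon's feature set).
FEATURES = {
    "p": {"labial", "stop"},
    "b": {"labial", "stop", "voiced"},
    "t": {"coronal", "stop"},
    "d": {"coronal", "stop", "voiced"},
    "a": {"vowel", "low"},
}


def differing_positions(trans_a, trans_b, ignore=frozenset()):
    """Return indices where two equal-length transcriptions differ,
    disregarding any feature listed in `ignore`."""
    if len(trans_a) != len(trans_b):
        raise ValueError("this sketch only compares transcriptions of equal length")
    diffs = []
    for i, (x, y) in enumerate(zip(trans_a, trans_b)):
        if FEATURES[x] - ignore != FEATURES[y] - ignore:
            diffs.append(i)
    return diffs


print(differing_positions("pat", "bad"))                     # [0, 2]
print(differing_positions("pat", "bad", ignore={"voiced"}))  # []
```

With voicing ignored, the two transcribers agree everywhere, which is the sort of parameterized comparison the note asks for.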
15
Phrase alignment
  • Further segmentation of the utterances
  • Useful for research on phonological domains
  • A simple mouse click sets and resets the domain
    boundaries

Yvan Rose: Several people requested different
levels of segmentation. This includes
morpho-syntactic levels of segmentation, as well
as various levels of the prosodic
hierarchy. Also add a PLAY button in the interface
of this module.
16
Syllabification algorithm
  • Syllabification algorithm
  • Refined labeling of each syllabic position
  • Each label is a valid object for query

[Figure: two syllabified forms, each with a top node (symbol lost in extraction) and positions labeled R, O and N]
17
Syllabification algorithm
  • Parameters of syllabification are user-definable

[Figure labels: Timing tier; Syllable constituents]
Yvan Rose: The parameters will be revised
thoroughly. To add (among others): word-final
codas, list of exceptional clusters. Also add, to
complement stress attraction, an option of
ambisyllabic syllabification of
intervocalic consonants in Strong-Weak syllable
juncture. In addition to this, we also need a
way to manually assign a syllabification to each
consonant which cannot be accounted for by the
automatic algorithm.
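
As a rough picture of what user-definable syllabification parameters can mean, the Python sketch below labels each segment as onset, nucleus or coda using a simple onset-maximization rule. It is not Phon's algorithm: the vowel set, the cluster list, the function name and the one-character-per-segment assumption are all hypothetical.

```python
VOWELS = set("aeiou")
# User-definable parameter in this sketch: clusters allowed as branching onsets.
LEGAL_ONSET_CLUSTERS = {"tr", "dr", "pl", "br"}


def syllabify(word, clusters=LEGAL_ONSET_CLUSTERS):
    """Label each segment O (onset), N (nucleus) or C (coda).

    Assumptions of this sketch: one character per segment, every vowel is a
    nucleus, and consonants between vowels join the following onset only as
    far as `clusters` allows.
    """
    labels = [None] * len(word)
    nuclei = [i for i, seg in enumerate(word) if seg in VOWELS]
    for i in nuclei:
        labels[i] = "N"
    for k, n in enumerate(nuclei):
        # Consonants between the previous nucleus (or word start) and this one.
        start = nuclei[k - 1] + 1 if k > 0 else 0
        run = word[start:n]
        if k == 0:
            onset_len = len(run)          # word-initial consonants are onsets
        elif len(run) >= 2 and run[-2:] in clusters:
            onset_len = 2                 # permitted branching onset
        else:
            onset_len = min(1, len(run))  # at most one onset consonant
        for j in range(start, n):
            labels[j] = "O" if j >= n - onset_len else "C"
    # Anything after the last nucleus is a coda.
    last = nuclei[-1] if nuclei else -1
    for j in range(last + 1, len(word)):
        labels[j] = "C"
    return list(zip(word, labels))


print("".join(label for _, label in syllabify("tundras")))  # ONCOONC
print("".join(label for _, label in syllabify("dundas")))   # ONCONC
```

Passing a different `clusters` set changes the parse without touching the code, which is the point of exposing the parameters; a manual override for individual consonants, as the note requests, would sit on top of this.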
18
Syllable alignment
  • Automatic alignment of syllables
  • Manual modifications
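
One way to picture automatic alignment of target and actual syllables is a standard edit-distance alignment. The Python sketch below uses that generic dynamic-programming technique for illustration only; it is not Phon's algorithm, and the function name, gap cost and character-mismatch cost are assumptions.

```python
def align_syllables(target, actual, gap="-"):
    """Align two syllable lists with a standard edit-distance table.

    Generic technique, not Phon's algorithm. Substitution cost is the number
    of mismatching characters (after padding); a gap costs 2.
    """
    def sub_cost(a, b):
        width = max(len(a), len(b))
        return sum(x != y for x, y in zip(a.ljust(width), b.ljust(width)))

    gap_cost = 2
    rows, cols = len(target) + 1, len(actual) + 1
    cost = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        cost[i][0] = i * gap_cost
    for j in range(1, cols):
        cost[0][j] = j * gap_cost
    for i in range(1, rows):
        for j in range(1, cols):
            cost[i][j] = min(
                cost[i - 1][j - 1] + sub_cost(target[i - 1], actual[j - 1]),
                cost[i - 1][j] + gap_cost,  # target syllable deleted
                cost[i][j - 1] + gap_cost,  # syllable inserted in actual form
            )
    # Trace back from the bottom-right corner to recover the aligned pairs.
    pairs, i, j = [], len(target), len(actual)
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                cost[i][j] == cost[i - 1][j - 1] + sub_cost(target[i - 1], actual[j - 1])):
            pairs.append((target[i - 1], actual[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + gap_cost:
            pairs.append((target[i - 1], gap))
            i -= 1
        else:
            pairs.append((gap, actual[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


# A truncated production: three target syllables, two actual syllables.
print(align_syllables(["ba", "na", "na"], ["na", "na"]))
# [('ba', '-'), ('na', 'na'), ('na', 'na')]
```

The manual modifications listed on the slide would then amount to editing the returned pairs wherever the automatic result is not the intended one.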

19
Query language
  • Quick and accurate queries on large amounts of
    data
  • Language features
  • Uses terms familiar to phonologists to compose
    queries
  • Syllable constituents: onset, nucleus, ...
  • Stressed vs. unstressed syllables
  • Custom predicates
  • History of recent queries
  • Ability to save queries

20
Query language components
  • Selectors (e.g. Onset(Syllable x))
  • Predicates (e.g. Branching(Onset(Syllable x)))
  • Boolean connectives
  • Example

let corpusName "TestCorpus",
let corpus Corpus(corpusName),
let records Records(corpus)
foreach r in records
  foreach p in Phrases(r)
    foreach s in Syllables(p)
      Branching(Onset(TargetSyllable(s))) AND NOT
      Branching(Onset(ActualSyllable(s)))
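
For readers unfamiliar with the query language, the same logic can be approximated in Python. This is only an analogy under stated assumptions: the in-memory corpus layout, the dictionary keys and the function names are hypothetical and do not reflect Phon's data model or API.

```python
# Hypothetical in-memory corpus: each record holds phrases, and each phrase is
# a list of aligned (target, actual) syllables given as dicts of constituents.
corpus = [
    {"id": 1, "phrases": [[
        ({"onset": "t", "rhyme": "un"}, {"onset": "d", "rhyme": "un"}),
        ({"onset": "dr", "rhyme": "as"}, {"onset": "d", "rhyme": "as"}),
    ]]},
    {"id": 2, "phrases": [[
        ({"onset": "pl", "rhyme": "i"}, {"onset": "pl", "rhyme": "i"}),
    ]]},
]


def branching(constituent):
    """A constituent branches when it holds more than one segment."""
    return len(constituent) > 1


def onset_reductions(records):
    """Yield (record id, target, actual) where a branching target onset
    is realized as a non-branching onset."""
    for r in records:
        for p in r["phrases"]:
            for target, actual in p:
                if branching(target["onset"]) and not branching(actual["onset"]):
                    yield r["id"], target, actual


for rec_id, target, actual in onset_reductions(corpus):
    print(rec_id, target["onset"], "->", actual["onset"])  # 1 dr -> d
```

The three nested loops mirror the three `foreach` clauses, and the `if` condition mirrors the predicate combined with AND NOT.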
21
Query tree structure
  • Branching onset reduction in 2nd syllable

[Figure: query tree. A Record dominates a TargetPhrase (segments T U N D R A S) and an ActualPhrase (segments D U N D A S), each parsed into syllables with Onset, Rhyme, Nucleus and Coda nodes. The query branching(onset(pos(TargetPhrase, 2))) AND NOT branching(onset(pos(ActualPhrase, 2))) evaluates to TRUE AND NOT FALSE, giving a MATCH.]
22
Query results
  • View in application
  • Use to generate textual reports
  • Recording session (e.g. to exemplify a given
    process)
  • Time slice (e.g. to exemplify a stage of
    acquisition)
  • Entire database (to exemplify a learning curve)
  • Export
  • As Unicode file
  • As ASCII file (modulo font conversion
    limitations)

23
Enhancements (short term)
  • Improvement of syllable alignment algorithm
    (building on Kondrak's 2003 algorithm)
  • Import function
  • ChildPhon files (including font translator,
    almost done!)
  • CHAT files
  • Incorporation of user-defined fields
  • Incorporation of statistical functions
  • Chart report generator
  • Ability to select various chart formats
  • Bar graphs (for proportions within and across
    sessions)
  • Line graphs (for learning curves)

24
Enhancements (longer term)
  • Interoperability with Praat
  • Export to Praat (similar to CLAN function)
  • Interface to accommodate acoustic measurement
    data
  • Web-based interface
  • Data sharing at a distance
  • Easy query of corpora on CHILDES database
  • Further automation
  • Automatic detection of pre-identified processes

Yvan Rose: Include function to extract phonetic
inventories per session/stage/... Get examples of
canned analyses in the literature on clinical
phonology.
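
The inventory function mentioned in this note could be as simple as the following Python sketch. The record format (session label plus transcribed form), the function name and the one-character-per-segment assumption are hypothetical and not taken from Phon.

```python
from collections import defaultdict


def inventories_by_session(records):
    """Collect the set of segments attested in each session.

    `records` is an iterable of (session, transcription) pairs; each character
    of the transcription is treated as one segment (a simplifying assumption).
    """
    inventory = defaultdict(set)
    for session, transcription in records:
        inventory[session].update(transcription)
    return {session: sorted(segments) for session, segments in inventory.items()}


records = [("s1", "ba"), ("s1", "da"), ("s2", "bat")]
print(inventories_by_session(records))
# {'s1': ['a', 'b', 'd'], 's2': ['a', 'b', 't']}
```

Grouping by stage rather than by session would only require a different key in the input pairs.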
25
Development timeline
  • End of fall of 2004
  • Completion of current development phase
  • Release of testing (Beta) version
  • Winter of 2005
  • Bug fixes
  • Improvement of functionality and user interface
    (including short-term enhancements)
  • Website creation (http://www.phon.ca/)
  • Completion of technical documentation
  • Notes to programmers
  • User guide
  • Summer of 2005
  • Release of Phon 1.0 as open-source freeware

26
Potential
  • Standard for data sharing
  • Large-scale investigations
  • Cross-linguistic investigations
  • Enhancement to CHILDES
  • Elaboration of a database fulfilling the needs of
    acquisitionists focussing on phonology and
    related issues
  • Investigation of interface issues (e.g. between
    morpho-syntax and phonology)

27
How to realize this potential
  • Team of researchers specializing in
  • Early acquisition (including babbling)
  • Segmental development
  • Prosodic development
  • Phonological disorders
  • Second language acquisition
  • Feedback on software development project
  • Data contribution
  • Existing corpora in digital format
  • Conversion of printed corpora
  • Identification of corpora (printed, with or
    without audio files)
  • Setting of conventions for data conversion

28
Our proposal
  • Constitution of a research team to develop a
    phonological component of CHILDES
  • Database
  • Supporting software
  • Elaboration, with the research team, of a grant
    application to support
  • Database elaboration
  • Software development
  • Periodical meetings
  • Workshops

29
Concretely
  • Feedback on software project
  • Software needs for various types of research: let
    us know what you need
  • Implementation: let us know how you want it to
    work
  • Contribution to grant application
  • What kinds of research would the new database
    enable? Let us know what you would like to do
  • Impacts of this research (e.g. theoretical,
    clinical, ...)
  • Supporting letters
  • Contribution to the public database
  • Sharing of existing / future corpora
  • Establishment of conventions to format older
    corpora

30
Special thanks
  • The Phon team at Memorial
  • Rodrigue Byrne
  • Harold Wareham
  • Gregory Hedlund
  • Philip O'Brien
  • For his great help with the TalkBank XML schema
  • Franklin Chen (Carnegie Mellon University)
  • For their useful feedback on an early version of
    this software
  • Heather Goad (McGill), Paula Fikkert (Nijmegen),
    Clara Levelt (Leiden), Katherine Demuth (Brown),
    Mark Johnson (Brown), Carrie Dyck (Memorial),
    Phil Branigan (Memorial), Brian MacWhinney
    (Carnegie Mellon), Bryan Gick (UBC), Sophie
    Wauquier-Gravelines (Nantes), Sharon Inkelas (UC
    Berkeley), Conxita Lleó, Sonia Frota (Lisbon),
    Maria João Freitas (Lisbon), Ronald Sprouse (UC
    Berkeley), Joe Pater (UMass, Amherst), John
    Archibald (Calgary), Éliane Lebel (Memorial)
    hoping that no one was forgotten