Title:

IPSOM

Slides: 13
Provided by: antniojoaq

Transcript and Presenter's Notes

1
IPSOM
  • Indexing, Integration and Sound Retrieval in
    Multimedia Documents

2
Outline
  • Objectives 3
  • Problems to be addressed 4
  • Research team 5
  • Background 6
  • Work plan 7
  • Dissemination of results 8

3
Objectives
  • Improved access to spoken information
      • spoken interface (accessible by the visually
        impaired)
      • detection and indexing of units in spoken books:
        words, sentences, topics
  • Development of multimedia spoken books
      • broaden the usage of spoken books (didactic
        applications, etc.)
      • multimedia interfaces for access and retrieval

4
Problems to be Addressed
  • spoken books offer the visually impaired
    community a powerful source of information and
    leisure
  • however, information is sequentially stored in
    analogue form
      • 30,000 hours (2,000 books)
  • information retrieval is
      • extremely slow and difficult
      • error prone
      • done on a trial-and-error basis
      • not structured

5
Research Team
  • Speech Processing Group of INESC Lisboa
      • António Serralheiro (PhD)
      • Isabel Trancoso (PhD)
      • Carlos Teixeira (PhD)
      • Diamantino Caseiro (MSc)
      • Rui Amaral (MSc)
      • Hugo Amorim (UG)
  • Large Scale Informatics Laboratory of Faculdade
    de Ciências de Lisboa
      • Nuno Guimarães (PhD)
      • Teresa Chambel (MSc)
  • National Library
      • José Borbinha (MSc)

6
Background
  • Previous work on speech recognition and
    synthesis, development of spoken corpora and
    alignment tools
  • Current work on topic detection for broadcast
    news recognition (ALERT project)
  • Previous work on video segmentation and indexing
  • Current work on techniques and methods for
    integrating digital video with text (UNIBASE
    project)
  • Collaboration with the NISO efforts for the
    Talking Book standard
  • Collaboration in the DAISY (world-wide
    initiative) project for digital talking books

7
Work Plan
  • Duration: 36 months
  • Manpower: 193 person-months

8
Dissemination of Results
  • Conferences and Workshops
      • National (e.g. RECPAD, PROPOR)
      • International (e.g. AACE, ICASSP, ASRU,
        EUROSPEECH, ICSLP, ECDL)
  • Final Workshop
      • specially aimed at the visually impaired
        community
  • Web Site
      • dissemination of didactic or other multimedia
        applications
  • Browsing tools for spoken books
  • Digitally stored and indexed spoken books
    (distributed through BN)
      • invaluable resource for data-driven prosody
        modelling and unit selection for text-to-speech
        synthesis

9
Budget
  • Overall funding

10
Budget
  • Requested Funding by Institution

11
Spoken Corpora Alignment
  • Generation of an N-gram framework for topic
    detection.
  • Generation of phonetic transcriptions for the
    spoken book texts
      • To be done automatically using the
        grapheme-to-phone module of the DIXI project
      • Pronunciation of specific proper names or
        technical terms may need to be manually
        corrected
  • Generation of speaker-dependent acoustic models,
    adapted to the reader of the spoken book
      • Context-dependent models are the state of the
        art for LVCSR tasks
      • Especially important to model the intra- and
        inter-word vowel reduction phenomena that
        characterise European Portuguese
  • Labelling of the spoken corpora
      • Initial segmentation stage
      • Knowledge sources derived from the above
        subtasks
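
The transcription step on this slide (automatic grapheme-to-phone conversion, with manual corrections for proper names and technical terms) could be organised roughly as below. This is only a sketch under stated assumptions: `toy_g2p` is a hypothetical stand-in that spells words letter by letter, not the DIXI module, and the phone symbols in the manual lexicon are illustrative rather than a real transcription.

```python
from typing import Callable, Dict, List

Phones = List[str]


def toy_g2p(word: str) -> Phones:
    """Hypothetical stand-in for an automatic grapheme-to-phone module.

    It is NOT the DIXI module: it simply emits one symbol per letter.
    """
    return [ch for ch in word.lower() if ch.isalpha()]


def transcribe(
    words: List[str],
    g2p: Callable[[str], Phones],
    manual_lexicon: Dict[str, Phones],
) -> List[Phones]:
    """Transcribe each word, preferring manually corrected entries."""
    transcriptions = []
    for word in words:
        key = word.lower()
        if key in manual_lexicon:
            # Proper names and technical terms corrected by hand.
            transcriptions.append(manual_lexicon[key])
        else:
            # Everything else goes through the automatic converter.
            transcriptions.append(g2p(word))
    return transcriptions


# Illustrative manual entry; the symbols are made up for the example.
manual = {"inesc": ["i", "n", "E", "S", "k"]}
result = transcribe(["INESC", "livro"], toy_g2p, manual)
```

Keeping the manual entries in a separate lexicon means they survive any re-run of the automatic conversion over the same book text.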

12
Multimedia Applications for Indexing and Retrieval
  • Search and retrieval models
      • keyword, combined indexes, topics, and
        standardised metadata
  • Access interface, visualisation and navigation on
    the "spoken books" base
      • Search
      • Query
      • Retrieval
      • Navigation
  • Integration of the tools/applications for
    "browsing" and "retrieval" with the phonographic
    base
  • Access interfaces have to be designed and
    performance issues addressed
  • Usability testing and evaluation
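
As an illustration of keyword search over indexed spoken books, the sketch below builds an inverted index from word-level alignments to playback positions. The `AlignedWord` record, the book identifiers, and the timings are all hypothetical; the slides do not specify a data model.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class AlignedWord(NamedTuple):
    """One word of a spoken book with its start time (hypothetical format)."""
    book_id: str
    word: str
    start_s: float


Index = Dict[str, List[Tuple[str, float]]]


def build_index(aligned: Iterable[AlignedWord]) -> Index:
    """Map each lower-cased word to the (book, start time) pairs where it occurs."""
    index: Index = defaultdict(list)
    for item in aligned:
        index[item.word.lower()].append((item.book_id, item.start_s))
    return index


def search(index: Index, keyword: str) -> List[Tuple[str, float]]:
    """Return playback positions for a keyword, ordered by book and time."""
    return sorted(index.get(keyword.lower(), []))


# Made-up alignment data for the example.
aligned = [
    AlignedWord("book-1", "Lisboa", 12.4),
    AlignedWord("book-1", "livro", 15.0),
    AlignedWord("book-2", "lisboa", 3.2),
]
index = build_index(aligned)
hits = search(index, "Lisboa")
```

With such an index a player can jump straight to the returned time offsets instead of scanning the recording sequentially.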