Multimodal multilingual information processing for automatic subtitle generation: Resources, Methods and System Architecture (MUSA)

1
Multimodal multilingual information processing for automatic subtitle generation: Resources, Methods and System Architecture (MUSA)

S. Piperidis, I. Demiros, P. Prokopidis
{spip, iason, prokopis}@ilsp.gr

2
Objectives

explore the degree to which subtitling can be automated by using the appropriate technologies
focus on human language technologies
explore the degree to which speech and language technologies can be integrated
try out system architectures that simulate the underlying cognitive processes

3
Challenges of Subtitling

the challenge in automated subtitle generation is that the subtitles must agree with both the spoken source language and the corresponding image
generated subtitles must meet a set of constraints imposed by the visual context of the text and by spatio-temporal factors
subtitle text is no longer normal written text but rather oral text

4
Experiments in MUSA

experiments on monolingual and multilingual subtitle generation
Languages: English (source and target); French and Greek (target)
Technologies used:
an English ASR component for transcribing audio streams into text
a subtitling component producing English subtitles from the English audio transcriptions
a translation component integrating machine translation and translation memory, for EN-FR and EN-EL
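The data flow between these three components can be pictured with a minimal Python sketch. Everything in it is hypothetical: `recognise`, `make_subtitle` and `machine_translate` are placeholder stubs standing in for the ASR, subtitling and MT engines, and `TRANSLATION_MEMORY` is a toy lookup table. The sketch only illustrates the ordering (ASR output feeds English subtitling, whose output feeds translation) and one plausible way of combining a translation memory with machine translation: use an exact memory match when there is one and fall back to MT otherwise. The slides do not say how MUSA actually combines the two.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Segment:
    start: float  # seconds from the start of the programme
    end: float
    text: str


def recognise(audio_path: str) -> list[Segment]:
    """Hypothetical stub for the English ASR component."""
    return [Segment(0.0, 2.5, "It was clear that the plan had failed")]


def make_subtitle(seg: Segment) -> Segment:
    """Hypothetical stub for the English subtitling (compression) component."""
    return replace(seg, text=seg.text.replace("It was clear that", "Clearly"))


# Hypothetical translation memory: (language pair, English subtitle) -> target text.
TRANSLATION_MEMORY = {
    ("EN-FR", "Clearly the plan had failed"): "Manifestement, le plan avait échoué",
}


def machine_translate(text: str, pair: str) -> str:
    """Placeholder for an MT engine: it only tags the input text."""
    return f"[MT {pair}] {text}"


def translate(seg: Segment, pair: str) -> Segment:
    # Use an exact translation-memory match if there is one, otherwise fall back to MT.
    target = TRANSLATION_MEMORY.get((pair, seg.text))
    if target is None:
        target = machine_translate(seg.text, pair)
    return replace(seg, text=target)


def run_pipeline(audio_path: str, pairs=("EN-FR", "EN-EL")) -> dict[str, list[Segment]]:
    # ASR -> English subtitles -> one subtitle track per target language.
    english = [make_subtitle(s) for s in recognise(audio_path)]
    subtitles = {"EN": english}
    for pair in pairs:
        target_lang = pair.split("-")[1]
        subtitles[target_lang] = [translate(s, pair) for s in english]
    return subtitles


if __name__ == "__main__":
    for lang, subs in run_pipeline("programme.wav").items():
        for s in subs:
            print(lang, s.start, s.end, s.text)
```

Here the French subtitle comes from the memory, while the Greek one falls through to the MT placeholder; timings are carried unchanged from the English segment to every target language.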

5
Architecture

6
Resources for subtitling

in order to train and evaluate the system components, an array of application-specific resources is needed
primary audiovisual data from BBC World Service: documentaries and news-oriented current affairs programmes
for each programme, the following parallel data are sourced:
the actual video of the programme
its script or hand-made transcript
English, Greek and French subtitles
topically relevant newspaper and web-sourced texts

7
Resources overview
          Scripts  Transcripts  Scripts+Transcripts  EN subtitles  EL subtitles  FR subtitles
Horizon   110.452  55.224       165.676              121.036       106.668       38.875
Panorama  -        87.039       87.039               43.981        35.623        25.891
Misc      563.155  -            563.155              408.214       351.857       64.381
DVDs      89.882   -            89.882               77.629        58.427        -
Totals    763.489  142.263      905.752              650.860       552.575       129.147

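As a quick consistency check, the Resources overview table can be encoded and its totals recomputed. In this illustrative sketch the figures are stored as integers (110.452 becomes 110452) so the sums are exact regardless of how the "." in the slide's figures is read; the unit is not stated. Empty cells (`None`) sit where the row and column totals require them: Panorama has no scripts, Misc and DVDs have no transcripts, and DVDs have no French subtitles.

```python
COLUMNS = ["Scripts", "Transcripts", "Scripts+Transcripts",
           "EN subtitles", "EL subtitles", "FR subtitles"]

# Figures from the table with the "." dropped (110.452 -> 110452); None marks an empty cell.
ROWS = {
    "Horizon": [110452, 55224, 165676, 121036, 106668, 38875],
    "Panorama": [None, 87039, 87039, 43981, 35623, 25891],
    "Misc": [563155, None, 563155, 408214, 351857, 64381],
    "DVDs": [89882, None, 89882, 77629, 58427, None],
}
TOTALS = [763489, 142263, 905752, 650860, 552575, 129147]


def column_sums(rows):
    # Add up each column, skipping empty cells.
    sums = [0] * len(COLUMNS)
    for values in rows.values():
        for i, value in enumerate(values):
            if value is not None:
                sums[i] += value
    return sums


def check_table(rows, totals):
    # Columns whose per-programme figures do not add up to the Totals row.
    bad_columns = [c for c, s, t in zip(COLUMNS, column_sums(rows), totals) if s != t]
    # Rows where Scripts+Transcripts differs from Scripts plus Transcripts.
    bad_rows = [name for name, v in rows.items() if (v[0] or 0) + (v[1] or 0) != v[2]]
    return bad_columns, bad_rows


print(check_table(ROWS, TOTALS))  # ([], [])
```

Both lists come back empty: every column adds up to the Totals row, and every Scripts+Transcripts cell equals the sum of the Scripts and Transcripts cells.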
8
Speech recognition component

Use of a parallel corpus of BBC programmes (audio and hand-made transcripts), as well as topically relevant newspaper texts
Tuning of the acoustic and language models of the KUL/ESAT recogniser
Background noise and non-native speech hinder the recognition process
Aligning the audio with the hand-made transcripts proved to be a workable solution, helping to overcome the problems caused by noise and non-native speakers
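The slides do not describe how the alignment was carried out. Audio is usually aligned to a transcript by forced alignment with the recogniser's acoustic models, which is beyond a short example. The Python sketch below shows a simpler, text-level variant under stated assumptions: the recogniser is assumed to emit `(word, start, end)` tuples, and the hypothetical helper `align_transcript` uses the standard library's `difflib.SequenceMatcher` to copy timings from recognised words onto the matching words of the hand-made transcript, leaving misrecognised stretches untimed.

```python
from difflib import SequenceMatcher


def normalise(word: str) -> str:
    # Compare words case-insensitively and without surrounding punctuation.
    return word.lower().strip(".,;:!?\"'")


def align_transcript(asr_words, transcript):
    """Copy recogniser timings onto the matching words of a hand-made transcript.

    asr_words: list of (word, start, end) tuples (an assumed recogniser output format).
    transcript: the hand-made transcript as a plain string.
    Returns a list of (word, start, end); start and end are None for transcript
    words that no recognised word matched.
    """
    ref = transcript.split()
    matcher = SequenceMatcher(
        None,
        [normalise(w) for w, _, _ in asr_words],
        [normalise(w) for w in ref],
        autojunk=False,
    )
    timed = [(w, None, None) for w in ref]
    for block in matcher.get_matching_blocks():
        # Each block says asr_words[a:a+size] matches ref[b:b+size].
        for k in range(block.size):
            _, start, end = asr_words[block.a + k]
            timed[block.b + k] = (ref[block.b + k], start, end)
    return timed


asr = [("the", 0.0, 0.2), ("whether", 0.2, 0.6), ("is", 0.6, 0.8), ("bad", 0.8, 1.1)]
print(align_transcript(asr, "The weather is bad."))
# [('The', 0.0, 0.2), ('weather', None, None), ('is', 0.6, 0.8), ('bad.', 0.8, 1.1)]
```

The misrecognised "whether" leaves "weather" untimed; a fuller version would interpolate such gaps from the neighbouring matched words, while the transcript, not the noisy recogniser output, supplies the words themselves.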

9
Speech recognition component (2)

10
Constraints and Requirements

subtitling conventions in various EU countries

these constraints mean that transcript segments must be compressed
the compression rate is expressed in terms of the words and characters to delete
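The slide gives no formula for the compression rate. A minimal sketch, assuming a convention defined by a reading speed in characters per second and a maximum number of lines and characters per line, could derive the characters to delete from a segment's duration and estimate the corresponding number of words from the segment's average word length. The `Convention` values below are hypothetical placeholders, not figures from MUSA or from any country's guidelines.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Convention:
    # Hypothetical values; real reading speeds and line limits vary by country and broadcaster.
    chars_per_second: float = 15.0
    max_lines: int = 2
    chars_per_line: int = 37


def compression_target(text: str, duration_s: float, conv: Convention = Convention()) -> dict:
    """Return how much a transcript segment must shrink to fit one subtitle."""
    # The subtitle may hold no more than can be read in the time available,
    # and no more than fits on the permitted number of lines.
    allowed = min(int(conv.chars_per_second * duration_s), conv.max_lines * conv.chars_per_line)
    chars_to_delete = max(0, len(text) - allowed)
    words = text.split()
    # Average characters per word, counting the separating spaces.
    avg_word_len = len(text) / len(words) if words else 1.0
    words_to_delete = math.ceil(chars_to_delete / avg_word_len)
    return {
        "allowed_chars": allowed,
        "chars_to_delete": chars_to_delete,
        "words_to_delete": words_to_delete,
        "rate": chars_to_delete / len(text) if text else 0.0,
    }


print(compression_target(" ".join(["word"] * 20), 4.0)["chars_to_delete"])  # 39
```

For a 99-character, 20-word segment lasting 4 seconds, these placeholder settings allow 60 characters, so 39 characters (about 8 words) would have to go; a short segment within the limits needs no compression at all.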

11
Subtitling engine resources

Use of a parallel corpus of BBC programmes featuring hand-validated programme transcripts and their hand-made subtitles
Align sentences and words in the parallel corpus
Extract a table of paraphrases to aid compression
Example:
Within the next few years -> Soon
During the years when -> While
It was clear that -> Clearly
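Applying such a table can be sketched in a few lines of Python. Only the three example entries come from the slide; the function name `apply_paraphrases`, the longest-phrase-first ordering and the capitalisation handling are illustrative assumptions, and a real table extracted from the aligned corpus would be far larger and would need context checks before substituting.

```python
import re

# The slide's three example entries; a full table would be extracted from the aligned corpus.
PARAPHRASES = {
    "within the next few years": "soon",
    "during the years when": "while",
    "it was clear that": "clearly",
}


def apply_paraphrases(text: str, table: dict = PARAPHRASES) -> str:
    """Replace long phrases with shorter paraphrases from the table."""
    # Try longer source phrases first, so a long entry wins over a shorter one it contains.
    for source in sorted(table, key=len, reverse=True):
        target = table[source]
        pattern = re.compile(r"\b" + re.escape(source) + r"\b", re.IGNORECASE)

        def substitute(match, target=target):
            # Keep a sentence-initial capital on the replacement.
            return target.capitalize() if match.group(0)[0].isupper() else target

        text = pattern.sub(substitute, text)
    return text


before = "It was clear that the talks would resume within the next few years."
after = apply_paraphrases(before)
print(after)                     # Clearly the talks would resume soon.
print(len(before) - len(after))  # 31
```

The characters saved by paraphrasing count towards the compression rate; deletion is only needed for whatever remains.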

12
Subtitling engine resources (2)

If the compression rate is not reached by paraphrasing, apply syntactic rules to delete low-importance units (e.g. adverbs, adjectives, etc.)
Hand-crafted deletion rules making use of:
a shallow parse of the segments
surprise values for each word, computed on the basis of a large text corpus
If more deletable segments exist than necessary, delete the least important segments first.
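The slides name the ingredients (deletable units found by hand-crafted rules over a shallow parse, and per-word surprise values from a large corpus) but not the formulas. The Python sketch below fills them in with assumptions that are labelled as such: surprise is taken as the unigram surprisal -log2 P(word) with add-one smoothing, a unit's importance as the mean surprise of its words, and the candidate spans are supplied directly rather than produced by rules. Units are deleted from least to most important until the required number of characters has gone.

```python
import math
from collections import Counter


def make_surprise(corpus_tokens):
    """Return a function giving each word's surprise value in bits.

    Assumption: surprise = -log2 P(word), with P estimated from unigram
    counts in a reference corpus and add-one smoothing for unseen words.
    """
    counts = Counter(t.lower() for t in corpus_tokens)
    denominator = sum(counts.values()) + len(counts) + 1  # +1 slot for unseen words

    def surprise(word):
        return -math.log2((counts[word.lower()] + 1) / denominator)

    return surprise


def delete_units(tokens, candidates, chars_to_delete, surprise):
    """Delete low-importance units until enough characters are removed.

    tokens: the words of one segment.
    candidates: (start, end) token spans marked as deletable (hypothetical
        input; the deletion rules themselves are not shown).
    Importance is assumed to be the mean surprise of a unit's words, so
    predictable units are deleted first.
    """
    def importance(span):
        start, end = span
        return sum(surprise(t) for t in tokens[start:end]) / (end - start)

    deleted, removed = set(), 0
    for start, end in sorted(candidates, key=importance):
        if removed >= chars_to_delete:
            break
        if deleted.intersection(range(start, end)):
            continue  # overlaps a unit that has already been deleted
        deleted.update(range(start, end))
        removed += sum(len(t) + 1 for t in tokens[start:end])  # each word plus one space
    kept = [t for i, t in enumerate(tokens) if i not in deleted]
    return " ".join(kept), removed


corpus = ("very " * 50 + "really " * 20 + "remarkable " + "the results were good " * 10).split()
surprise = make_surprise(corpus)
tokens = "The results were very good and really remarkable".split()
print(delete_units(tokens, [(3, 4), (6, 7), (7, 8)], 6, surprise))
# ('The results were good and remarkable', 12)
```

In this toy corpus "very" and "really" are frequent and therefore carry little surprise, so they go first, while the rare "remarkable" survives.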

13
Translation component