1
Automatic Sentence Compression in the MUSA project
  • Walter Daelemans, Anja Höthker
  • walter.daelemans@ua.ac.be
  • http://cnts.uia.ac.be
  • CNTS, University of Antwerp, Belgium
  • Languages & The Media 2004, Berlin

2
MUSA
  • MUltilingual Subtitling of multimediA content
  • EU IST 5th framework, Sep. 2002 - Feb. 2005
  • Goals
  • Conversion of audio streams into TV subtitles
    (monolingual)
  • Translation of subtitles into French or Greek

3
(No Transcript)
4
Partners
  • ILSP, Athens: coordination, integration
  • ESAT, KU Leuven: Automatic Speech Recognition
  • CNTS, U. Antwerp: Sentence compression
  • Systran, Paris: Machine Translation
  • BBC, London: Main User, Data Provider, Evaluation
  • Lumiere, Athens: Main User, Multilingual Data Provider, Evaluation

5
Goals for Sentence Compression
  • Automatically and dynamically generate subtitles
    based on constraints (words and characters)
  • Reduce the time an expert subtitler needs to produce subtitles
  • Provide an architecture that can easily be ported
    to other languages

6
Example
  • SPEECH:
  • The task force is in place and ready to attack without mercy.
  • Constraints (computed as in the sketch below):
  • Delete 3 words and 14 characters
  • Compression Module output:
  • The task force is in place and ready to fight without mercy.
  • SUBTITLE:
  • The task force is ready ...
  • ...to fight without mercy.
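A note on where the targets come from: they follow directly from the subtitle length limits. Below is a minimal Python sketch of that arithmetic; the limits and the function name deletion_targets are our illustration, not part of MUSA.

    def deletion_targets(sentence: str, max_words: int, max_chars: int) -> tuple[int, int]:
        # How many words and characters must be removed for the sentence
        # to fit the subtitle limits (0, 0 if it already fits).
        words = sentence.split()
        return (max(0, len(words) - max_words),
                max(0, len(sentence) - max_chars))

    sent = "The task force is in place and ready to attack without mercy."
    print(deletion_targets(sent, max_words=9, max_chars=47))  # -> (3, 14)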

7
Approach
  • Remove disfluencies: compress the sentence by removing repetitions introduced by hesitation (a toy sketch follows this list)
  • I, I know that this war, this war will last for years
  • Paraphrasing: replace part of the input sentence by a shorter paraphrase
  • an increasing number of → more and more
  • Rule-Based Approach: compress sentences based on handcrafted deletion rules that combine:
  • Shallow-parsing information (identifying the constituents used by deletion rules)
  • Relevance measures (determining in which order to delete constituents)
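As announced above, here is a toy Python sketch of the disfluency step: it drops immediately repeated word n-grams, the hesitation pattern in the example. The slides do not show MUSA's actual implementation, so everything below is an illustrative assumption.

    import re

    def remove_repetitions(text: str, max_ngram: int = 3) -> str:
        # Drop immediately repeated word n-grams, comparing tokens
        # case-insensitively and ignoring punctuation.
        tokens = text.split()
        norm = [re.sub(r"\W+", "", t).lower() for t in tokens]
        out, i = [], 0
        while i < len(tokens):
            for n in range(max_ngram, 0, -1):
                if norm[i:i + n] and norm[i:i + n] == norm[i + n:i + 2 * n]:
                    i += n  # skip the first copy of the repeated n-gram
                    break
            else:
                out.append(tokens[i])
                i += 1
        return " ".join(out)

    print(remove_repetitions("I, I know that this war, this war will last for years"))
    # -> I know that this war will last for years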

8
Shallow Parsing: POS Tagging
  • The/Det woman/NN will/MD give/VB Mary/NNP a/Det
    book/NN
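The rule sketches later in this deck work on (token, tag) pairs; a small helper (our own convenience function, not part of MBSP) turns the slide's word/TAG notation into that form:

    def parse_tagged(s: str) -> list[tuple[str, ...]]:
        # rsplit on the last "/" keeps tokens that contain a slash intact
        return [tuple(tok.rsplit("/", 1)) for tok in s.split()]

    print(parse_tagged("The/Det woman/NN will/MD give/VB Mary/NNP a/Det book/NN"))
    # -> [('The', 'Det'), ('woman', 'NN'), ('will', 'MD'), ('give', 'VB'),
    #     ('Mary', 'NNP'), ('a', 'Det'), ('book', 'NN')]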

9
Shallow Parsing: Chunking
  • The/Det woman/NN-NP will/MD give/VB-VP Mary/NNP-NP a/Det book/NN-NP

10
Shallow Parsing: Sense Tagging
  • The/Det woman/NN-NP-PERSON will/MD give/VB-VP Mary/NNP-NP-PERSON a/Det book/NN-NP-MATERIAL-OBJECT

11
Shallow Parsing: Relation Finding
  • (Diagram: grammatical relations between the verb and its arguments; the entities carry the sense tags person, person, and material-object)
12
MBSP (Perl)
  • Pipeline: Text In → Tokenizer (Perl) → POS Tagger (MBT server, with TiMBL servers for known and unknown words) → Phrase Chunker (TiMBL server) → Concept Tagger (MBT server, with TiMBL servers for known and unknown words) → Relation Finder (TiMBL server)
  • TiMBL 5.0, MBT 2.0: http://ilk.uvt.nl/
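To show how the stages chain, here is a skeletal Python rendering of the diagram. The stand-in functions only attach dummy tags; the real stages are MBT/TiMBL client-server processes, so all names and bodies below are placeholders.

    Token = dict  # one feature dict per token, enriched stage by stage

    def tokenize(text: str) -> list[Token]:
        return [{"token": t} for t in text.split()]  # Tokenizer stand-in

    def pos_tag(tokens: list[Token]) -> list[Token]:
        for t in tokens:
            t["pos"] = "NN"  # MBT POS-tagger stand-in (dummy tag)
        return tokens

    def chunk(tokens: list[Token]) -> list[Token]:
        for t in tokens:
            t["chunk"] = "NP"  # TiMBL phrase-chunker stand-in (dummy tag)
        return tokens

    def run_pipeline(text: str) -> list[Token]:
        # Each stage consumes the previous stage's annotations, mirroring
        # the chain in the MBSP diagram (the concept tagger and relation
        # finder would follow the same pattern).
        tokens = tokenize(text)
        for stage in (pos_tag, chunk):
            tokens = stage(tokens)
        return tokens

    print(run_pipeline("The woman will give Mary a book")[1])
    # -> {'token': 'woman', 'pos': 'NN', 'chunk': 'NP'}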
13
Rule-Based Approach (syntax)
  • Deletion rules mark phrases for deletion based on
    shallow parser output
  • Rules for adverbs, adjectives, PNPs, subordinate
    sentences, interjections, ...
  • Phrases are deleted iteratively until the target compression rate is met (a minimal sketch of this loop follows below)
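The announced sketch of the deletion loop, in Python. The data layout and scores are hypothetical; the relevance scores themselves are the topic of slide 15, and the example data anticipates slide 16.

    def compress(tokens: list[str],
                 candidates: list[tuple[float, list[int]]],
                 target_words: int) -> list[str]:
        # Apply rule-marked deletion spans, least relevant first, and
        # stop as soon as the sentence fits the word target.
        keep = [True] * len(tokens)
        for _score, span in sorted(candidates):
            if sum(keep) <= target_words:
                break
            for i in span:
                keep[i] = False
        return [t for t, k in zip(tokens, keep) if k]

    sent = "This is a basic summarizer for English used for demonstration purposes".split()
    cands = [(10.0, [5, 6]), (11.0, [3]), (12.0, [8, 9, 10])]  # (relevance, span)
    print(" ".join(compress(sent, cands, target_words=9)))
    # -> This is a basic summarizer used for demonstration purposes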

14
  • Example Rule: ADJECTIVES
  • if (POS(word) == JJ && CHUNK(word) != ADJP-END && word-1 not in {most, least, more, less})
  •   delete(word)
  •   if (word-1 == CC && word-2 == JJ)
  •     delete(word-1)
  •   elseif (word+1 == CC && word+2 == JJ)
  •     delete(word+1)
  • Adam's only serious childhood illness had been measles
  • The virus triggered an [1 extremely ]1 [2 rare [3 and ]2 fatal ]3 condition (the numbered brackets mark three alternative deletions: "extremely", "rare and", "and fatal")
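The same rule rendered as runnable Python over (token, POS, chunk) triples. The triple format is our assumption; the conditions are the slide's.

    def adjective_deletions(tagged: list[tuple[str, str, str]]) -> list[list[int]]:
        # Candidate deletion spans proposed by the ADJECTIVES rule.
        spans = []
        for i, (_, pos, chunk) in enumerate(tagged):
            if pos != "JJ" or chunk == "ADJP-END":
                continue
            if i > 0 and tagged[i - 1][0].lower() in {"most", "least", "more", "less"}:
                continue  # keep graded adjectives ("most serious") intact
            span = [i]
            if i >= 2 and tagged[i - 1][1] == "CC" and tagged[i - 2][1] == "JJ":
                span.insert(0, i - 1)  # "... JJ and JJ": drop "and" + 2nd JJ
            elif i + 2 < len(tagged) and tagged[i + 1][1] == "CC" and tagged[i + 2][1] == "JJ":
                span.append(i + 1)     # "JJ and JJ ...": drop 1st JJ + "and"
            spans.append(span)
        return spans

    sent = [("The", "DT", "NP"), ("virus", "NN", "NP"), ("triggered", "VBD", "VP"),
            ("an", "DT", "NP"), ("extremely", "RB", "NP"), ("rare", "JJ", "NP"),
            ("and", "CC", "NP"), ("fatal", "JJ", "NP"), ("condition", "NN", "NP")]
    print(adjective_deletions(sent))  # -> [[5, 6], [6, 7]] ("rare and", "and fatal")

The deletion of "extremely" (bracket 1 on the slide) would come from the rule for adverbs, not from this one.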

15
Relevance Measures (semantics)
  • Deletion rules suggest more deletions than necessary for reaching the target compression
  • The system rates the different possibilities and starts by deleting the least important phrases
  • Relevance measures in MUSA are based on a weighted combination of:
  • Word frequencies (in the BNC corpus)
  • Rule probabilities (as encountered in a parallel BBC corpus of transcripts with associated subtitles)
  • Word durations (compare estimates with actual durations); one possible weighting is sketched below
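The announced sketch of one possible weighting. The formula, weights, and normalisation below are our assumptions; the slide only names the three ingredients.

    import math

    def relevance(mean_bnc_freq: float, rule_prob: float, dur_score: float,
                  weights: tuple[float, float, float] = (0.4, 0.4, 0.2)) -> float:
        # Lower relevance = deleted earlier. Frequent words carry less
        # information, so frequency enters inverted; rule_prob and
        # dur_score are assumed to be normalised to [0, 1].
        rarity = 1.0 / (1.0 + math.log1p(mean_bnc_freq))
        return weights[0] * rarity + weights[1] * rule_prob + weights[2] * dur_score

    print(round(relevance(mean_bnc_freq=5000.0, rule_prob=0.3, dur_score=0.5), 3))
    # -> 0.262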

16
Example
  • This is a basic summarizer for English used for demonstration purposes.
  • (NP This) is (NP a basic[11] summarizer) (PNP for English)[10] used (PNP for demonstration purposes)[12].
  • The bracketed numbers are relevance scores; the least relevant phrase is deleted first:
  • This is a basic summarizer used for demonstration purposes
  • This is a summarizer used for demonstration purposes
  • This is a summarizer used

17
Evaluation Data (Lumiere)
  • MMR - Every Parent's Choice
  • 243 segments
  • 39.5% of the segments need compression
  • Average target compression rate: 4.58 words, 1.98 chars
  • The Tranquiliser Trap
  • 287 segments
  • 50.52% of the segments need compression
  • Average target compression rate: 3.21 words, 2.0 chars

18
Human Evaluation
(No Transcript)
19
Conclusions
  • We presented the Sentence Compression Module of
    the MUSA system
  • Eclectic system combining statistical techniques
    for relevance detection with handcrafted deletion
    rules based on shallow parser output
  • Evaluation suggests usefulness (with transcripts
    as input)
  • Future Work
  • Porting to other languages
  • Machine Learning of paraphrases

20
Demos
  • Sentence Compression: http://cnts.uia.ac.be/cgi-bin/anja/musa
  • MUSA demo: http://sifnos.ilsp.gr/musa/demos