Summarization Nisha - PowerPoint PPT Presentation

Provided by: Samu55

Transcript and Presenter's Notes

1
Summarization-Nisha Lakshmi
2
Summarization
  • Single document summarization.
  • Keywords
  • Headline words
  • Combination of headline, position and keywords.
  • Lexical chains.
  • Rely on empirical methodology.
  • Headline generation using HMM.

3
Statistical Generation
  • Word frequency
  • - calculate word frequency
  • - threshold = mean + k × SD
  • - select words greater than threshold
  • - select sentences containing these words.
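
The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: the function names, the regular-expression tokenizer, the default k = 1, and the use of the population standard deviation are all assumptions, and no stop-word filtering is applied.

```python
import re
from collections import Counter
from statistics import mean, pstdev


def frequent_words(text, k=1.0):
    """Return words whose count exceeds mean + k * SD of all word counts."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    values = list(counts.values())
    # Assumed form of the threshold: mean + k * SD (population SD).
    threshold = mean(values) + k * pstdev(values)
    return {w for w, c in counts.items() if c > threshold}


def select_sentences(text, k=1.0):
    """Keep sentences, in original order, that contain a frequent word."""
    keywords = frequent_words(text, k)
    # Naive sentence splitter: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences
            if keywords & set(re.findall(r"[a-z']+", s.lower()))]
```

Raising k makes the threshold stricter, so fewer words qualify and the summary gets shorter.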

4
Statistical Generation (continued)
  • In the season's greatest shock thus far, it was
    Michael Johns, instead, who received the fewest
    votes this week. Why was Michael Johns ejected
    from the Idoldome when he was a fixture in many
    "Idol" forecasters Top Three lists? I was asked
    on Thursday, more than once, whether I thought
    Syesha Mercado or Carly Smithson was headed home.
    After a solid performance of "Dream On" on
    Tuesday, Michael Johns seemed a shoo-in to the
    next round. When he was revealed to be in the
    bottom three, Randy said it was a "pretty
    surprising bottom three," and Paula said that
    America got it "partly right, partly wrong.
    Brooke White needs our votes too. Just this week,
    Brooke got choked up about missing her sister's
    wedding. And what of Jason Castro? Jason
    explained that on "Idol," "there's just been a
    lot more not-music stuff than I thought. " We
    knew what they meant. " And Michael wailed his
    way through "Dream On" like he had nothing to
    lose because finally, he did. After that scare,
    America made sure to take good care of Syesha.

5
  • Headline Words
  • Select the sentences containing the headline
    words.
  • Output
  • I was asked on Thursday, more than once, whether
    I thought Syesha Mercado or Carly Smithson was
    headed home. In the season's greatest shock thus
    far, it was Michael Johns, instead, who received
    the fewest votes this week. After a solid
    performance of "Dream On" on Tuesday, Michael
    Johns seemed a shoo-in to the next round. America
    thought Michael didn't need their help. After
    David's lackluster performance on Tuesday, fans
    rushed to vote for him, because he needed their
    support. Jason explained that on "Idol," "there's
    just been a lot more not-music stuff than I
    thought.
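
Selecting sentences by headline words might look like the sketch below. The stop-word list and function names are hypothetical, and matching is on exact lowercase tokens with no stemming, which may differ from what the presenter used.

```python
import re

# Hypothetical stop-word list, so that words like "the" do not match everything.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "to", "and", "is", "was"}


def tokens(sentence):
    """Lowercase word set of a sentence."""
    return set(re.findall(r"[a-z']+", sentence.lower()))


def headline_sentences(headline, sentences):
    """Return the sentences that share at least one content word with the headline."""
    head = tokens(headline) - STOPWORDS
    return [s for s in sentences if head & tokens(s)]
```

Because matching is exact, "voted" in a headline does not match "votes" in a sentence; a stemmer would be needed for that.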

6
  • Combined Approach
  • Word frequency
  • Headline words
  • Frequency weighted by a (a = 2)
  • Position of the sentence
  • Groups of 5 sentences
  • N = number of groups
  • Frequency adjusted by 5(N - i), where i is the group number
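
One possible reading of the combined score is sketched below. The transcript does not preserve the operators, so this sketch assumes that headline words have their frequency multiplied by a, that the position term 5(N - i) is added to the sentence score, and that groups are numbered from 1. The function name and tokenizer are also assumptions.

```python
import math
import re
from collections import Counter


def combined_scores(headline, sentences, a=2, group_size=5):
    """Score each sentence by weighted word frequency plus a position bonus."""
    def tok(s):
        return re.findall(r"[a-z']+", s.lower())

    freq = Counter(w for s in sentences for w in tok(s))
    head = set(tok(headline))
    n_groups = math.ceil(len(sentences) / group_size)
    scores = []
    for idx, s in enumerate(sentences):
        # Assumption: headline words count a times as much as other words.
        word_score = sum(freq[w] * (a if w in head else 1) for w in set(tok(s)))
        group = idx // group_size + 1          # 1-based group number i
        scores.append(word_score + 5 * (n_groups - group))  # assumed additive bonus
    return scores
```

Under these assumptions, sentences in earlier groups receive a larger bonus and the last group receives none, which favors the opening of the story.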

7
  • I was asked on Thursday, more than once, whether
    I thought Syesha Mercado or Carly Smithson was
    headed home. There was no question, in the minds
    of many, that one of the two would be
    eliminated. But that's not what happened. In the
    season's greatest shock thus far, it was Michael
    Johns, instead, who received the fewest votes
    this week. And he was as surprised as we were.
    After a solid performance of "Dream On" on
    Tuesday, Michael Johns seemed a shoo-in to the
    next round. When he was revealed to be in the
    bottom three, Randy said it was a "pretty
    surprising bottom three," and Paula said that
    America got it "partly right, partly wrong.
  • " We knew what they meant. Then, Ryan Seacrest
    added insult to injury by announcing that both
    Carly and Syesha were safe. The audience gasped
    in horror, and Carly's jaw dropped about a foot.
  • Why was Michael Johns ejected from the Idoldome
    when he was a fixture in many "Idol" forecasters
    Top Three lists?

8
Lexical Chains
  • Most prevalent discourse topic will play an
    important role in the summary.
  • Lexical cohesion
  • Used WordNet for determining relatedness of
    words.

9
  • Algorithm
  • Select a set of candidate words.
  • Nouns
  • Chunking
  • Find an appropriate chain.
  • Distance threshold
  • Hypernymy and hyponymy
  • Senses considered
  • Insert word in chain.
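
The chaining loop can be illustrated with a greedy sketch. Everything here is an assumption made for illustration: a tiny hand-written hypernym table stands in for WordNet, distance is measured in sentences, candidate nouns are given as input rather than produced by chunking, and word senses are ignored entirely.

```python
# Toy stand-in for WordNet: maps a noun to its (hypothetical) hypernym.
HYPERNYMS = {"singer": "person", "guy": "person", "show": "competition"}


def related(a, b):
    """True if one word is the other's hypernym or both share a hypernym."""
    return (HYPERNYMS.get(a) == b
            or HYPERNYMS.get(b) == a
            or (a in HYPERNYMS and HYPERNYMS.get(a) == HYPERNYMS.get(b)))


def build_chains(candidates, related, max_distance=3):
    """Greedy chaining over (sentence_index, noun) pairs in text order."""
    chains = []  # each chain is a list of (sentence_index, noun)
    for pos, word in candidates:
        for chain in chains:
            close_enough = pos - chain[-1][0] <= max_distance
            if close_enough and any(word == w or related(word, w) for _, w in chain):
                chain.append((pos, word))  # insert the word in the first matching chain
                break
        else:
            chains.append([(pos, word)])   # no suitable chain: start a new one
    return chains
```

A sense-aware version would keep several interpretations per word and prune them as the chains grow; this sketch commits to the first matching chain.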

10
  • Scoring Chains
  • Good predictors for strength of the chain
  • Length: number of occurrences of chain members
  • Homogeneity index
  • 1 - (number of distinct occurrences / length)
  • Scoring function
  • Score(chain) = Length × HomogeneityIndex
  • Strength criterion
  • Score(chain) > Average(scores) + w × SD(scores)
  • where w = 1
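
As a worked sketch of the scoring step, assume each chain is stored as a dictionary from member word to occurrence count (a representation chosen here for illustration). The function names and the use of the population standard deviation are likewise assumptions.

```python
from statistics import mean, pstdev


def chain_score(chain):
    """Score = length * homogeneity, where chain maps word -> occurrence count."""
    length = sum(chain.values())             # total occurrences of chain members
    homogeneity = 1 - len(chain) / length    # 1 - distinct occurrences / length
    return length * homogeneity


def strong_chains(chains, w=1.0):
    """Keep chains whose score exceeds average + w * SD of all chain scores."""
    scores = [chain_score(c) for c in chains]
    cutoff = mean(scores) + w * pstdev(scores)
    return [c for c, s in zip(chains, scores) if s > cutoff]
```

For example, a chain containing only 'johns' with 5 occurrences has length 5 and homogeneity 1 - 1/5, giving a score of 4, while a chain with a single occurrence of one word scores 0.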

11
  • Lexical chain
  • 'johns': 5
    'competition': 1
    'idol': 3, 'picture': 2, 'guy': 1
    2

12
  • Extracting the sentences
  • Choose the sentence that contains the first
    appearance of a representative chain member in
    the text.
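
A minimal sketch of this extraction rule follows. It assumes that the representative member is simply the chain's most frequent word and that a chain is a dictionary from word to count; both are illustrative choices, and the slides do not say how the representative is picked.

```python
import re


def extract_for_chain(chain, sentences):
    """Return the first sentence containing the chain's representative word."""
    # Assumption: the representative member is the most frequent word in the chain.
    representative = max(chain, key=chain.get)
    for s in sentences:
        if representative in re.findall(r"[a-z']+", s.lower()):
            return s
    return None  # the representative word never appears in the given sentences
```

Applying this to each strong chain and concatenating the results in text order yields the extract.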

13
  • Output
  • I was asked on Thursday, more than once, whether
    I thought Syesha Mercado or Carly Smithson was
    headed home. There was no question, in the minds
    of many, that one of the two would be eliminated.
    In the season's greatest shock thus far, it was
    Michael Johns, instead, who received the fewest
    votes this week. After a solid performance of
    "Dream On" on Tuesday, Michael Johns seemed a
    shoo-in to the next round.  

14
  • Precision and Recall
    Method      Precision   Recall
    Head        0.372       0.216
    Combine     0.523       0.378
    Frequency   0.258       0.279
    Lexicals    0.258       0.279
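
The slides do not say how precision and recall were measured. One common setup, assumed here, compares the set of sentences a method selects against the sentences of a reference summary; the function name and the use of sentence indices are illustrative.

```python
def precision_recall(selected, reference):
    """Sentence-level precision and recall of a system extract against a reference."""
    selected, reference = set(selected), set(reference)
    overlap = len(selected & reference)
    precision = overlap / len(selected) if selected else 0.0
    recall = overlap / len(reference) if reference else 0.0
    return precision, recall
```

With this definition, a method that selects four sentences of which two appear in a three-sentence reference has precision 2/4 and recall 2/3.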

15
Headline Generation
  • Hidden Markov Model
  • First N words (N = 60)
  • Extracted in order
  • Generative process
  • Switches between Headline state and Story state
  • Use Viterbi algorithm
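
The decoding step can be illustrated with a generic log-space Viterbi over two states, H (word kept in the headline) and S (word left in the story). This is a deliberate simplification with made-up toy probabilities; the state topology of the cited headline-generation work is richer, and none of the numbers below come from the presentation.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for the observations."""
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s](observations[0]))
             for s in states}]
    back = []
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Best predecessor state for reaching s at this step.
            prev = max(states, key=lambda p: best[-1][p] + math.log(trans_p[p][s]))
            scores[s] = (best[-1][prev] + math.log(trans_p[prev][s])
                         + math.log(emit_p[s](obs)))
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Backtrack from the best final state.
    state = max(states, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def emitter(probs, default):
    """Emission function with a fallback probability for unlisted words."""
    return lambda word: probs.get(word, default)


# Toy model (all values hypothetical).
states = ("H", "S")
start_p = {"H": 0.5, "S": 0.5}
trans_p = {"H": {"H": 0.6, "S": 0.4}, "S": {"H": 0.4, "S": 0.6}}
emit_p = {"H": emitter({"johns": 0.4, "eliminated": 0.4}, 0.01),
          "S": emitter({}, 0.1)}

words = ["michael", "johns", "was", "eliminated", "tonight"]
path = viterbi(words, states, start_p, trans_p, emit_p)
headline = [w for w, s in zip(words, path) if s == "H"]
```

Because the words tagged H are read off in story order, the headline preserves the order of the original text.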

17
  • Language model
  • Probabilities estimated from training corpus.
  • Bigram probabilities of the headline words used
    in the story.
  • Unigram probabilities of the non-headline words
    used in the story.
  • Smoothing.
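
Estimating smoothed bigram probabilities from a corpus of headlines could look like the sketch below. The slides do not name the smoothing method, so add-one (Laplace) smoothing is assumed here; the function name, the "<s>" start marker, and whitespace tokenization are also illustrative.

```python
from collections import Counter


def train_bigram(headlines):
    """Return (prob, vocab), where prob(prev, word) is add-one smoothed."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for h in headlines:
        words = ["<s>"] + h.lower().split()
        vocab.update(words[1:])              # "<s>" is a context, not a predicted word
        contexts.update(words[:-1])          # how often each word starts a bigram
        bigrams.update(zip(words, words[1:]))

    def prob(prev, word):
        # Add-one smoothing: every vocabulary word gets one extra count.
        return (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))

    return prob, vocab
```

With smoothing, a bigram never seen in training still receives a small non-zero probability, so one unseen word pair does not drive the probability of a whole candidate headline to zero.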

18
  • Headlines
  • an army
  • a senior malaysian indian leader has
  • an indian
  • nd then told us that

19
References
  • 1) Natural Language Toolkit
  • 2) OpenNLP tools
  • 3) Automatic Headline Generation for Newspaper
    Stories, Zajic et al.
  • 4) Hedge Trimmer: A Parse-and-Trim Approach to
    Headline Generation, Zajic et al.
  • 5) Headline Generation using a Training Corpus,
    Jin et al.
  • 6) Improved Algorithms for Keyword Extraction and
    Headline Generation from Unstructured Text,
    Mondal et al.
  • 7) Using Lexical Chains for Text Summarization,
    Barzilay et al.