1
Knowledge-rich approaches for text summarization
  • Minna Vasankari
  • 27.11.2001

2
Structure
  • 1. The idea
  • 2. Conceptual summarization
  • 3. Linguistic summarization
  • 4. Example system: Plandoc
  • 5. Summary

3
The idea
  • Full text is not the only possible source
    material for summarization
  • Other sources:
    • databases
    • simulation data
    • user interaction sequences
    • etc.

4
The idea
  • Data with structure
    • is easier to interpret than full text
  • No source text → no shortcuts
    • sentences cannot simply be extracted; they must be generated
    • the text generation phase is hard
  • Domain dependency

5
Conceptual summarization
  • Sorting the source material
    • facts, events
  • Choosing what is important
    • must be included in the summary
  • ... and what is potentially important
    • can be left out or included

6
Conceptual summarization
  • What is important?
    • depends on the domain
    • depends on the input material
    • depends on the user

7
Conceptual summarization
  • Importance of a fact
    • manual decision
  • Importance of an event
    • manual decision
    • frequency analysis

8
Conceptual summarization
  • Potentially important facts/events are included
    only if they fit in
  • Determined by:
    • space limit
    • linguistic constraints
    • possible ordering of facts
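A minimal sketch of the selection step, assuming each fact is already tagged as important or potentially important and has a known cost in words. Only the space limit is modeled; linguistic and ordering constraints are ignored. The names `Fact`, `select_facts` and the word costs are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class Fact:
    text: str
    important: bool  # True: must be included; False: only potentially important
    words: int       # hypothetical cost: words needed to express the fact

def select_facts(facts, word_limit):
    """Keep all important facts, then add potentially important ones while they fit."""
    chosen = [f for f in facts if f.important]
    used = sum(f.words for f in chosen)
    for f in facts:  # greedy, in input order
        if not f.important and used + f.words <= word_limit:
            chosen.append(f)
            used += f.words
    return chosen

facts = [
    Fact("Jay Humphries scored 24 points", True, 5),
    Fact("Jay Humphries came in as a reserve", False, 7),
    Fact("Karl Malone scored 39 points", False, 5),
]
print([f.text for f in select_facts(facts, 10)])
# ['Jay Humphries scored 24 points', 'Karl Malone scored 39 points']
```

Important facts are always kept, even if they alone exceed the limit. The greedy pass skips a potentially important fact that does not fit and still tries the shorter ones after it.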

9
Linguistic summarization
  • Expressing the same information in fewer
    sentences
  • Method: linguistic constructs, revision
  • Danger: over-effective compression leads to
    unreadable sentences

10
Linguistic summarization
  • Linguistic constructs:
    • semantically rich words
    • modifiers of nouns or verbs
    • conjunction and ellipsis
    • abridged references
    • abstraction
    • aggregation
    • presentational techniques

11
Linguistic summarization
  • Semantically rich words
    • killing two birds with one stone
    • Karl Malone scored 39 points.
    • Karl Malone's 39-point performance is equal to
      his season high.
    • becomes
    • Karl Malone tied his season high with 39 points.
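As a sketch of how a generator might pick the semantically rich verb, here is a single hypothetical rule (`realize_scoring`). It assumes the season-high value is available as a separate fact and only covers the "tied" case from the slide.

```python
def realize_scoring(player, points, season_high):
    """Realize a scoring fact, folding in the season-high fact when the two coincide."""
    if points == season_high:
        # One semantically rich verb ("tied") expresses both facts in one clause.
        return f"{player} tied his season high with {points} points."
    return f"{player} scored {points} points."

print(realize_scoring("Karl Malone", 39, 39))
# Karl Malone tied his season high with 39 points.
```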

12
Linguistic summarization
  • Modifiers of nouns or verbs
    • one fact specifies a verb or a noun in another
      fact
    • Jay Humphries scored 24 points. He came in as a
      reserve.
    • becomes
    • Reserve Jay Humphries scored 24 points.
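A minimal sketch of folding one fact into another as a noun modifier. It assumes the role fact ("came in as a reserve") has already been reduced to a single word; `realize_with_modifier` is a hypothetical name.

```python
def realize_with_modifier(player, points, role=None):
    """Fold a role fact (e.g. 'reserve') into the subject noun phrase as a pre-modifier."""
    subject = f"{role.capitalize()} {player}" if role else player
    return f"{subject} scored {points} points."

print(realize_with_modifier("Jay Humphries", 24, role="reserve"))
# Reserve Jay Humphries scored 24 points.
```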

13
Linguistic summarization
  • Conjunction
    • joining facts with "and" or "or"
    • Mick Reynes scored 265 points last season and
      Jack Jones scored 265 points last season.
  • Ellipsis
    • removing repetition
    • Mick Reynes and Jack Jones scored 265 points last
      season.
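The conjunction-plus-ellipsis step can be sketched as follows. The sketch assumes facts are (subject, predicate) pairs and handles only the simplest case, where the whole predicate is identical. `join_and` and `conjoin_with_ellipsis` are hypothetical names.

```python
def join_and(items):
    """'A', 'A and B', 'A, B and C'."""
    return items[0] if len(items) == 1 else ", ".join(items[:-1]) + " and " + items[-1]

def conjoin_with_ellipsis(facts):
    """Group (subject, predicate) facts by identical predicate and state each predicate once."""
    groups = {}  # predicate -> subjects, in first-seen order (dicts keep insertion order)
    for subject, predicate in facts:
        groups.setdefault(predicate, []).append(subject)
    return [f"{join_and(subjects)} {predicate}." for predicate, subjects in groups.items()]

facts = [
    ("Mick Reynes", "scored 265 points last season"),
    ("Jack Jones", "scored 265 points last season"),
]
print(conjoin_with_ellipsis(facts))
# ['Mick Reynes and Jack Jones scored 265 points last season.']
```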

14
Linguistic summarization
  • Abridged references
    • using shorter names for already introduced things
    • San Antonio Spurs took a 127-111 victory over
      Denver Nuggets and handed Denver their seventh
      straight loss.
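A minimal sketch of abridged references. The full name is used on the first mention and a short form afterwards. The short-form table and the name `abridge` are hypothetical; a real generator would need domain knowledge to know that "Denver" can stand for "Denver Nuggets".

```python
# Hypothetical table of short forms (assumed domain knowledge).
SHORT_NAMES = {"San Antonio Spurs": "San Antonio", "Denver Nuggets": "Denver"}

def abridge(mentions):
    """Use the full name on first mention and the short form on later mentions."""
    introduced = set()
    result = []
    for name in mentions:
        result.append(SHORT_NAMES.get(name, name) if name in introduced else name)
        introduced.add(name)
    return result

print(abridge(["San Antonio Spurs", "Denver Nuggets", "Denver Nuggets"]))
# ['San Antonio Spurs', 'Denver Nuggets', 'Denver']
```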

15
Linguistic summarization
  • Abstraction
    • replacing a series of events with a single event
    • mission start, movements, firing, damages,
      mission abort → failed mission
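Abstraction can be sketched as a rewrite rule over an event sequence. The rule here is an assumption for illustration: any run from "mission start" to the next "mission abort" collapses into a single "failed mission" event. `abstract_events` is a hypothetical name.

```python
def abstract_events(events):
    """Replace each 'mission start' ... 'mission abort' run with one 'failed mission' event."""
    result, i = [], 0
    while i < len(events):
        if events[i] == "mission start" and "mission abort" in events[i:]:
            end = events.index("mission abort", i)  # first abort after this start
            result.append("failed mission")
            i = end + 1
        else:
            result.append(events[i])  # events outside a matched run are kept as-is
            i += 1
    return result

print(abstract_events(["mission start", "movements", "firing", "damages", "mission abort"]))
# ['failed mission']
```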

16
Linguistic summarization
  • Aggregation
    • connecting events with spatial or temporal
      adverbials
    • Site-A and Site-B simultaneously fired a missile.
  • Presentational techniques
    • using spatial or temporal adverbs
    • Site-A fired a missile at 1302. Three minutes
      later Site-B fired a missile.
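A small sketch covering both examples. It assumes times are "HHMM" strings on the same day, with the second event not earlier than the first. Events at the same time are aggregated with "simultaneously"; otherwise a temporal adverbial is computed from the time difference. `describe_firings` and the number-word table are hypothetical.

```python
NUMBER_WORDS = {1: "One", 2: "Two", 3: "Three", 4: "Four", 5: "Five"}

def minutes(hhmm):
    """Convert an 'HHMM' string to minutes since midnight."""
    return int(hhmm[:2]) * 60 + int(hhmm[2:])

def describe_firings(a, b):
    """a, b: (site, 'HHMM') pairs, b assumed not earlier than a."""
    (site_a, t_a), (site_b, t_b) = a, b
    if t_a == t_b:
        # Aggregation: one sentence with a shared verb and a temporal adverbial.
        return f"{site_a} and {site_b} simultaneously fired a missile."
    gap = minutes(t_b) - minutes(t_a)
    word = NUMBER_WORDS.get(gap, str(gap))
    unit = "minute" if gap == 1 else "minutes"
    return f"{site_a} fired a missile at {t_a}. {word} {unit} later {site_b} fired a missile."

print(describe_firings(("Site-A", "1302"), ("Site-B", "1305")))
# Site-A fired a missile at 1302. Three minutes later Site-B fired a missile.
```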

17
Linguistic summarization
  • Revision approach 1
    • first create a draft summary from important facts
    • then enrich the draft with potentially important
      facts
  • Revision approach 2
    • generate the draft by collecting similar facts
      into each sentence
    • compress the sentences with ellipsis etc.

18
Example system: Plandoc
  • Application developed by K. McKeown, J. Robin and
    K. Kukich at Columbia University, New York and
    Bell Communications Research (1995)
  • Problem:
    • a telephone company engineer plans how a
      telephone route should be developed over the next
      20 years
    • the engineer uses the PLAN planning software
  • Goal: documentation of the planning process

19
Plandoc: input and output
  • Input: a trace of the user's actions with the PLAN
    system
    • 1. RUNID fiberall FIBER 6/19/93 act yes
    • 2. FA 1301 2 1995
    • 3. FA 1201 2 1995
    • 4. FA 1501 3 1995
    • 5. ANF 1201 1301 2 1995 24
    • END. 856.0 670.2

20
Plandoc: input and output
  • Output: a 1-2 page report describing
    • the initial plan proposed by PLAN
    • refinements the engineer made
    • alternative refinements the engineer tried but
      rejected
    • the final plan
  • Purpose: documentation

21
Plandoc: conceptual summarization
  • Important facts:
    • accepted parts of the initial plan and accepted
      refinements to it
    • the final plan
  • Rejected refinements?
    • the engineer decides whether they are included

22
Plandoc: overview of the method
  • Fact generator converts the input to an internal
    representation
    • facts are presented as feature structures
      (attribute/value pairs)
  • Ontologizer enriches the facts with, e.g., price
    information
  • Discourse planner groups the facts
  • Lexicalizer/sentence generator converts the
    groups into English

23
Plandoc: processing the input
  • Example: FA 1301 2 1995
  • Enriched feature structure:
    • class: refinement
    • ref-type: fiber
    • action: activation
    • csa-site: 1301
    • date: year 1995, quarter 2
    • price: 56.00K
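The first two pipeline stages can be sketched with feature structures as Python dicts. This covers only the `FA` trace code, using the feature values shown on the slide. It assumes the line layout is `code site quarter year`, and it uses a per-site price table. `CODES`, `fact_generator`, `ontologizer` and the price table are hypothetical stand-ins, not Plandoc's actual components.

```python
# Hypothetical mapping from PLAN trace codes to features; only "FA" is covered.
CODES = {"FA": {"class": "refinement", "ref-type": "fiber", "action": "activation"}}

def fact_generator(line):
    """Convert a trace line such as 'FA 1301 2 1995' into a feature structure (dict)."""
    code, site, quarter, year = line.split()  # assumed layout: code site quarter year
    fact = dict(CODES[code])
    fact["csa-site"] = site
    fact["date"] = {"year": int(year), "quarter": int(quarter)}
    return fact

def ontologizer(fact, price_table):
    """Enrich a fact with domain knowledge: here a price looked up per CSA site (assumed)."""
    enriched = dict(fact)
    enriched["price"] = price_table.get(fact["csa-site"])  # None if unknown
    return enriched

fact = ontologizer(fact_generator("FA 1301 2 1995"), {"1301": "56.00K"})
print(fact)
```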

24
Plandoc: grouping facts into sentences
  • Let's construct a sentence from the FA facts:
    • FA 1301 2 1995
    • FA 1201 2 1995
    • FA 1501 3 1995
  • 1. Group facts by common action
    • the action is activation for all facts
      → one sentence is needed
    • FA 1301 2 1995
    • FA 1201 2 1995
    • FA 1501 3 1995

25
Plandoc: grouping facts into sentences
  • 2. For each common-action group (sentence):
    • (a) Collapse groups that differ by one feature
      into a single group
    • two groups remain:
      • FA 1301, 1201 2 1995
      • FA 1501 3 1995
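Step 2(a) can be sketched as a fixed-point loop. Every feature value is assumed to be stored as a tuple, so merging two groups that differ in exactly one feature concatenates that feature's values. `find_mergeable` and `collapse` are hypothetical names. The walrus operator requires Python 3.8+.

```python
def find_mergeable(groups):
    """Return (i, j, feature) for the first pair of groups differing in exactly one feature."""
    for i in range(len(groups)):
        for j in range(i + 1, len(groups)):
            diff = [k for k in groups[i] if groups[i][k] != groups[j][k]]
            if len(diff) == 1:
                return i, j, diff[0]
    return None

def collapse(groups):
    """Step 2(a): merge groups that differ by one feature until no such pair remains."""
    groups = [dict(g) for g in groups]  # copy so the caller's facts are not modified
    while (pair := find_mergeable(groups)) is not None:
        i, j, key = pair
        groups[i][key] = groups[i][key] + groups[j][key]  # concatenate the tuple values
        del groups[j]
    return groups

facts = [
    {"action": ("activation",), "csa-site": ("1301",), "date": ("1995 Q2",)},
    {"action": ("activation",), "csa-site": ("1201",), "date": ("1995 Q2",)},
    {"action": ("activation",), "csa-site": ("1501",), "date": ("1995 Q3",)},
]
for group in collapse(facts):
    print(group)
```

On the slide's facts, this leaves the same two groups: 1301 and 1201 share a date and merge, while 1501 differs in both site and date.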

26
Plandoc: grouping facts into sentences
  • (b) If more than one group remains (the sentence is
    broken into clauses by conjunction):
  • i. Find the feature that is shared by most groups
    but does not have the same value in all of them
    • FA 1301, 1201 2 1995
    • FA 1501 3 1995
    • only the date feature is left, and it has two
      values → two clauses are needed

27
Plandoc: grouping facts into sentences
  • ii. Sort the groups into subgroups by the most
    common shared feature (nested conjunction inside
    the clause)
    • each group has only one member
    • FA 1301, 1201 2 1995
    • FA 1501 3 1995

28
Plandoc: grouping facts into sentences
  • iii. Repeat the selection of the most common shared
    feature and the sorting into subgroups until all
    groups have been sorted
    • no subgroups left
  • iv. Sort the clauses by date
    • FA 1301, 1201 2 1995
    • FA 1501 3 1995

29
Plandoc: grouping facts into sentences
  • FA 1301, 1201 2 1995
  • FA 1501 3 1995
  • The produced sentence:
    • This refinement activated fiber for CSAs 1301 and
      1201 in 1995 Q2 and this refinement activated
      fiber for CSA 1501 in 1995 Q3.
  • The final sentence after ellipsis:
    • This refinement activated fiber for CSAs 1301 and
      1201 in 1995 Q2 and for CSA 1501 in 1995 Q3.
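A sketch of the final realization for the two collapsed groups. Clauses are ordered by date (step iv), and the shared verb phrase is either repeated or stated once (ellipsis). The fixed wording in `PREFIX` only fits FA (fiber activation) facts. It assumes each group carries a single date string in "YYYY Qn" form, which sorts correctly as text. All names are hypothetical.

```python
PREFIX = "This refinement activated fiber"  # hypothetical fixed wording for FA facts

def join_and(items):
    items = list(items)
    return items[0] if len(items) == 1 else ", ".join(items[:-1]) + " and " + items[-1]

def clause(group):
    sites = group["csa-site"]
    noun = "CSA" if len(sites) == 1 else "CSAs"  # nested conjunction inside the clause
    return f"for {noun} {join_and(sites)} in {group['date']}"

def realize(groups, ellipsis=True):
    groups = sorted(groups, key=lambda g: g["date"])  # step iv: order clauses by date
    if ellipsis:
        # The shared verb phrase is stated once; later clauses keep only "for ... in ...".
        return f"{PREFIX} {' and '.join(clause(g) for g in groups)}."
    lower = PREFIX[0].lower() + PREFIX[1:]
    parts = [f"{PREFIX if i == 0 else lower} {clause(g)}" for i, g in enumerate(groups)]
    return " and ".join(parts) + "."

groups = [
    {"csa-site": ("1501",), "date": "1995 Q3"},
    {"csa-site": ("1301", "1201"), "date": "1995 Q2"},
]
print(realize(groups, ellipsis=False))
print(realize(groups))
```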

30
Plandoc: grouping facts into sentences
  • Readability
    • too many conjoined facts make a sentence hard
      to read:
    • This refinement extended fiber from fiber hub
      8107 to CSAs 8128, 8126, 8121 and 8113 and from
      fiber hub 8120 to the CO in 1994 Q1 and from the
      CO to CSA 8120 in 1994 Q3, with the active fibers
      placed on the primary path.
  • Remedies:
    • limit the number of facts conjoined
    • limit the number of embedded conjunctions inside
      a clause
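The first remedy can be sketched as a cap on how many clauses one sentence may conjoin, with the remaining clauses moved into further sentences. The cap value, the function name `split_conjunction` and the example clauses are hypothetical. A cap on embedded conjunctions inside a clause would work the same way on a clause's value list.

```python
def split_conjunction(prefix, clauses, max_conjoined=2):
    """Emit several sentences so that no sentence conjoins more than max_conjoined clauses."""
    sentences = []
    for start in range(0, len(clauses), max_conjoined):
        chunk = clauses[start:start + max_conjoined]
        sentences.append(f"{prefix} {' and '.join(chunk)}.")
    return sentences

clauses = [  # hypothetical clauses, not taken from a real PLAN trace
    "for CSA 1301 in 1995 Q2",
    "for CSA 1501 in 1995 Q3",
    "for CSA 1201 in 1995 Q4",
]
for sentence in split_conjunction("This refinement activated fiber", clauses):
    print(sentence)
```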

31
Summary
  • Sources other than text can also be summarized
  • Problems:
    • choosing the important elements
    • generating a compact and readable summary text
    • domain dependency

32
Summary
  • Applications:
    • automatic weather reports (not predictions!)
    • simulation reports
    • patient monitoring system summaries
    • etc.