Title: Blog Summarization
Student: Wang Xuan
Supervisor: Kan Min-Yen
Abstract
We have built a blog summarization system to help people extract opinions from blogs. After identifying topic-relevant sentences, our system takes a further step and picks opinionated sentences to form a summary. Opinions are identified through unsupervised iterative training. Evaluation shows that the sentence-level accuracy of our Opinion Identification Module is 79.7%. The document-level accuracy is 71.8%, which outperforms an existing sentiment analysis system by 2.8%.
Evaluation
Since our main contribution is the Opinion Identification Module, we evaluate the performance of that module only.
System Design
Pipeline: Opinion Query -> Topic Relevance Module -> Opinion Identification Module -> Summarization Module -> Summary
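The three-stage design can be sketched as a simple chain of filters. This is only a minimal illustration: the function name `summarize_blog` and the three callables are hypothetical stand-ins, since the poster does not describe how each module is implemented.

```python
from typing import Callable, List


def summarize_blog(
    sentences: List[str],
    query: str,
    is_relevant: Callable[[str, str], bool],
    is_opinionated: Callable[[str], bool],
    compose: Callable[[List[str]], str],
) -> str:
    """Chain the three stages named in the system design.

    All three callables are placeholders (assumptions); the real
    modules are not specified in the source.
    """
    # Stage 1: Topic Relevance Module keeps sentences related to the query.
    relevant = [s for s in sentences if is_relevant(s, query)]
    # Stage 2: Opinion Identification Module keeps opinionated sentences.
    opinions = [s for s in relevant if is_opinionated(s)]
    # Stage 3: Summarization Module turns the survivors into a summary.
    return compose(opinions)


if __name__ == "__main__":
    demo = [
        "Abortion should stay legal, I believe.",
        "The clinic opened in 1990.",
    ]
    print(
        summarize_blog(
            demo,
            "abortion",
            is_relevant=lambda s, q: q in s.lower(),   # toy relevance test
            is_opinionated=lambda s: "believe" in s,   # toy opinion test
            compose=lambda ss: " ".join(ss),           # toy summarizer
        )
    )
```

Passing the stages in as callables keeps the sketch independent of any particular relevance or opinion classifier.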
Literature Review
- Summarization Approach
  - Features: location, thematic, fixed phrases, add term
  - Similarity of two text units, distance between text units, semantic relationships among words
  - Document format, topic structure, rhetorical structure of the text
- Sentiment Analysis
  - Identify prior subjectivity and sentiments
  - Identify subjective language and its contextual polarity
  - Subjective and sentiment analysis in NLP applications

- Split sentence into zones
Sentence: I grew up with all women, and happen to think I hate to generalize, but must they are smarter than men.
Zone 1: I grew up with all women
Zone 2: and happen to think I hate to generalize
Zone 3: but must they are smarter than men.
Important parameters
- Influence of negation word in zone
  - Whole zone
  - Three-word window
- Word to be added to potential seed list
  - All parts of speech
  - Only noun, adverb, adjective
- Influence of seed word in zone
  - Whole zone
  - Six-word window for all words
  - Three words after adjective, adverb
  - Three words before noun
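
Zone splitting and the part-of-speech-dependent seed window can be sketched as follows. This is a rough illustration under stated assumptions: the poster does not say which delimiters define a zone, so splitting at commas and semicolons is assumed here because it reproduces the three zones of the example sentence. The function names and the tag names `ADJ`, `ADV`, `NOUN` are hypothetical, and only one of the listed window options is shown.

```python
import re
from typing import List


def split_into_zones(sentence: str) -> List[str]:
    """Split a sentence into zones at commas and semicolons.

    The delimiter set is an assumption, not taken from the source.
    """
    parts = re.split(r"[,;]", sentence)
    return [p.strip() for p in parts if p.strip()]


def influence_window(num_tokens: int, index: int, pos: str, size: int = 3) -> List[int]:
    """Token positions influenced by a seed word at `index`.

    Implements one listed option: `size` words after an adjective or
    adverb, `size` words before a noun. Other tags influence nothing.
    """
    if pos in ("ADJ", "ADV"):
        first, last = index + 1, index + size
    elif pos == "NOUN":
        first, last = index - size, index - 1
    else:
        return []
    # Clip the window to the zone boundaries.
    return list(range(max(first, 0), min(last, num_tokens - 1) + 1))


if __name__ == "__main__":
    text = ("I grew up with all women, and happen to think I hate "
            "to generalize, but must they are smarter than men.")
    for i, zone in enumerate(split_into_zones(text), start=1):
        print(f"Zone {i}: {zone}")
```

The "whole zone" and "six-word window" options would replace `influence_window` with a function that returns every position in the zone or a fixed-width span.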
Human Summary Survey
- Seven students
- Twelve blogs on abortion
- One hour
- 100-word summary on "What are people's opinions towards abortion?"

- Polarity Identifier
  - Match of seed word and part of speech
  - Negation word
Zone: We do not have the advantage of seeing that.
Polarity: negative (advantage: positive; not: negation)
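
The polarity example above can be reproduced with a small sketch. It assumes the "whole zone" negation option, a tiny hypothetical negation list, and a seed lexicon keyed by (word, part of speech). None of these details are specified in the source, and double negation is not handled.

```python
from typing import Dict, List, Tuple

# Assumed negation words; the actual list is not given in the source.
NEGATIONS = {"not", "no", "never"}


def zone_polarity(
    tagged_zone: List[Tuple[str, str]],
    seeds: Dict[Tuple[str, str], int],
) -> int:
    """Score a zone: sum the polarity of seed matches, flip on negation.

    A token counts as a seed match only if both the word and its
    part-of-speech tag match a lexicon entry. Any negation word in
    the zone flips the sign of the whole zone.
    """
    total = sum(seeds.get((word.lower(), pos), 0) for word, pos in tagged_zone)
    negated = any(word.lower() in NEGATIONS for word, _ in tagged_zone)
    return -total if negated else total


if __name__ == "__main__":
    zone = [
        ("We", "PRON"), ("do", "VERB"), ("not", "ADV"), ("have", "VERB"),
        ("the", "DET"), ("advantage", "NOUN"), ("of", "ADP"),
        ("seeing", "VERB"), ("that", "PRON"),
    ]
    seeds = {("advantage", "NOUN"): 1}  # +1 marks a positive seed
    score = zone_polarity(zone, seeds)
    print("negative" if score < 0 else "positive" if score > 0 else "neutral")
```

A three-word negation window, the other listed option, would flip only the seed words within three tokens of the negation word instead of the whole zone.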
Common Behaviors
- Read the blogs to gain an understanding of their contents.
- Identify the information in each blog that is relevant to the given question.
- Use subjective information and discard information that expresses facts.
- Group the information into categories.
- Extract and organize the information into well-formed sentences.
- Combine these sentences into a paragraph.

- Identify New Seed word
  - Influenced by existing seed words based on its part of speech
Zone: one of the very best movies ever made about the life of movie making
Potential POS_Seed: movies (best: positive; movie: noun)
- Significance of co-existence
  - If (difference > 1), score = Fp/(Fp+Fn)
  - If (difference < -1), score = -Fn/(Fp+Fn)
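
The scoring rule above can be written as a short function. The source does not define Fp, Fn, or "difference", so this sketch assumes that Fp and Fn count how often a candidate word co-occurs with positive and negative seed evidence, and that difference means Fp - Fn. Returning 0.0 when the counts are close is also an assumption. The name `coexistence_score` is hypothetical.

```python
def coexistence_score(fp: int, fn: int) -> float:
    """Signed score for a candidate seed word.

    fp, fn: assumed co-occurrence counts with positive and negative
    seed evidence. `difference` is assumed to be fp - fn.
    """
    difference = fp - fn
    if difference > 1:
        # Leans positive: share of positive co-occurrences.
        return fp / (fp + fn)
    if difference < -1:
        # Leans negative: negated share of negative co-occurrences.
        return -fn / (fp + fn)
    # Counts too close to call (assumed behavior, not in the source).
    return 0.0


if __name__ == "__main__":
    for fp, fn in [(5, 1), (1, 5), (2, 2)]:
        print(fp, fn, coexistence_score(fp, fn))
```

Under these assumptions the denominator is never zero in either branch, because a difference beyond 1 in either direction requires at least two co-occurrences.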
Conclusion and Future Work
- Inspired by the human behavior experiment, we built a three-stage blog summarization system.
- We employ an Opinion Identification Module to extract opinionated sentences from blog articles, to fit the characteristics of blogs.
- We use an unsupervised iterative training process in the Opinion Identification Module.
- More opinionated evidence is added through the iterative process.
- In evaluation, the Opinion Identification Module achieved an accuracy of 79.7% at the sentence level.
- Future work can investigate the relationship between zones.