Summarization of Multiple, Metadata Rich, Product Reviews

1
Summarization of Multiple, Metadata Rich, Product Reviews
Department of Informatics, Aristotle University of Thessaloniki
LPIS Group, http://lpis.csd.auth.gr
  • Fotis Kokkoras, Efstratia Lampridou,
    Konstantinos Ntonas, Ioannis Vlahavas

MSoDa '08 ECAI 2008 Workshop on Mining Social
Data
2
Introduction
  • Modern, successful on-line shops allow consumers
    to express their opinion on products and services
    they purchased.
  • These reviews are valuable for new customers.
  • If there are dozens, or even hundreds, of
    reviews for a single product, their utilization
    is time-consuming.
  • The need for automatically generated summaries of
    these reviews is obvious.

3
Summarization Background
  • Types of summary
  • Extractive: use sentences from the original text
  • Abstractive: reuse sentence fragments
  • Text features usually used
  • frequency and location of words, sentence
    location in article, syntactic rules,
    dictionaries of important words
  • Various Techniques/Approaches
  • Machine Learning Techniques
  • LSA (Latent Semantic Analysis)
  • Lexical Chains
  • Cluster-based
  • They perform well on article-style texts.

4
The Special Nature of Reviews
  • On-line product reviews in e-shops are quite
    different from article-style texts
  • They are usually short and do not obey strict
    syntactic rules.
  • They convey only the subjective opinion of each
    reviewer.
  • there are a lot of reviewers!
  • They include a lot of repeated content.
  • There are usually too many reviews.

5
What is the problem?
  • Traditional summarization techniques do not work
    very well on such data.
  • Why?
  • a frequently mentioned problem can be reported
    many times in the summary of summarizers that
    work on the sentence level
  • reuse of sentence fragments to construct new
    sentences is risky because reviews are short with
    weak/poor syntax
  • it is difficult to detect biased reviews based on
    their text only

6
Motivation




  • On-line reviews are usually accompanied by
    various metadata, such as
  • buyer's technology level,
  • ownership of the product,
  • overall judgment for the product or service, in
    some scale,
  • labeled (positive or negative) or unlabeled
    comments,
  • usefulness of the review to other customers, etc.
  • How can these metadata help in summarization?

7
Our Approach
  • ReSum Algorithm (Review Summarizer)
  • Creates extractive summary
  • Uses dictionary of important words and metadata
  • Is applied separately for (+) and (-) comments
  • For each product two summaries are created
  • How it works
  • Scores the sentences based on their words
  • Adjusts the initial score based on the metadata
  • Selects sentences avoiding repetition of concepts
  • Tested on newegg.com

8
Requirements
  • A dictionary D of important words for the domain
  • automatically created from a few thousand
    reviews of the domain in question
  • concatenation of reviews
  • removal of common (500) English words
  • selection of the top 150 most frequent words
  • Access to the reviews (and their metadata)
  • we use DEiXTo, an in-house developed, web
    content extraction system
  • HTML/DOM based extraction rules
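
The dictionary construction listed above (concatenate, remove common words, keep the most frequent ones) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `build_dictionary`, the tokenization rule, and the toy data are assumptions; only the steps and the default size of 150 come from the slide.

```python
import re
from collections import Counter

def build_dictionary(reviews, common_words, size=150):
    """Return the `size` most frequent words that are not common English words."""
    text = " ".join(reviews).lower()        # concatenation of reviews
    tokens = re.findall(r"[a-z]+", text)    # assumed tokenization: alphabetic runs only
    counts = Counter(t for t in tokens if t not in common_words)
    return {word for word, _ in counts.most_common(size)}

# Toy data (hypothetical); the slide uses a few thousand reviews
# and a list of 500 common English words.
reviews = [
    "The picture is sharp and the price is fair.",
    "Picture quality and price matter.",
    "The picture is bright.",
]
common = {"the", "is", "and"}
D = build_dictionary(reviews, common, size=2)
print(sorted(D))
```

With real data the `common_words` set would be the 500-word list mentioned on the slide and `size` would stay at its default.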

9
ReSum Initial Scoring
  • Step 1
  • Concatenate all positive (or negative) comments
    and divide them into separate sentences.
  • Remove stop words, punctuation, numbers, etc.
  • Count the frequency f_v of every word v.
  • Step 2
  • Score every sentence i based on its words and the
    dictionary D
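
The two steps above can be sketched as follows. The scoring formula itself did not survive in this transcript, so the rule used here (the score R_i of sentence i is the sum of f_v over its distinct words that appear in D) is an assumption made only for illustration, as are the function names and the sentence-splitting rule.

```python
import re
from collections import Counter

def split_sentences(comments):
    """Concatenate the comments and split them on '.', '!' or '?'."""
    text = " ".join(comments)
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]

def tokenize(sentence, stop_words):
    """Lowercase, drop punctuation and numbers, remove stop words."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower())
            if w not in stop_words]

def initial_scores(comments, dictionary, stop_words):
    sentences = split_sentences(comments)
    tokens = [tokenize(s, stop_words) for s in sentences]
    # Step 1: frequency f_v of every word v over all sentences
    freq = Counter(w for toks in tokens for w in toks)
    # Step 2 (ASSUMED formula): R_i = sum of f_v over the distinct
    # dictionary words of sentence i
    scores = [sum(freq[w] for w in set(toks) if w in dictionary)
              for toks in tokens]
    return sentences, tokens, scores

# Toy data (hypothetical)
comments = ["Great picture. Great price!", "The picture is sharp."]
sentences, tokens, scores = initial_scores(
    comments, dictionary={"picture", "price"}, stop_words={"the", "is"})
print(list(zip(sentences, scores)))
```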

10
ReSum Metadata Contribution
  • Metadata used
  • Reviewer's Technology Level (w1)
  • Ownership duration of the product (w2)
  • Usefulness of a review to other users (w3)
  • Step 3
  • Initial score Ri is adjusted based on the
    metadata, in a weighted fashion
  • weights are initialized using multicriteria
    techniques (will be explained later)
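
A weighted adjustment of this kind might look like the sketch below. The slide does not give the actual formula, so the multiplicative form, the normalization of each metadata value to the range [0, 1], and all names are assumptions; the three weight values are the ones reported later in the presentation.

```python
from dataclasses import dataclass

@dataclass
class ReviewMeta:
    # Each field is ASSUMED to be normalized to [0, 1]
    tech_level: float   # reviewer's technology level
    ownership: float    # ownership duration of the product
    usefulness: float   # usefulness of the review to other users

# w1, w2, w3 as reported on the Weight Initialization (3/3) slide
WEIGHTS = (0.14, 0.24, 0.62)

def adjust_score(initial_score, meta, weights=WEIGHTS):
    """ASSUMED form: S_i = R_i * (1 + w1*m1 + w2*m2 + w3*m3)."""
    w1, w2, w3 = weights
    factor = w1 * meta.tech_level + w2 * meta.ownership + w3 * meta.usefulness
    return initial_score * (1.0 + factor)

# Two sentences with the same initial score: the one from the review that
# other users found useful ends up ranked higher.
useful = adjust_score(10.0, ReviewMeta(tech_level=0.0, ownership=0.0, usefulness=1.0))
expert = adjust_score(10.0, ReviewMeta(tech_level=1.0, ownership=0.0, usefulness=0.0))
print(useful > expert)
```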

11
ReSum Redundancy Elimination
  • Step 4
  • Select the sentence with the highest score S.
  • Penalize the remaining sentences that share
    words with the selected one.
  • This eliminates redundancy.
  • The step is repeated until the desired number of
    sentences is reached.
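
Step 4 is a greedy select-and-penalize loop, which can be sketched as follows. The size of the penalty is not stated on the slide; scaling a score by the fraction of a sentence's words that are not shared with the selected sentence is an assumption chosen for illustration, as is the function name.

```python
def select_sentences(token_sets, scores, n):
    """Greedily pick up to n sentence indices, penalizing word overlap."""
    remaining = dict(enumerate(scores))
    chosen = []
    while remaining and len(chosen) < n:
        best = max(remaining, key=remaining.get)   # highest current score
        chosen.append(best)
        del remaining[best]
        for i in remaining:
            words = token_sets[i]
            if not words:
                continue
            # ASSUMED penalty: shrink the score by the share of overlapping words
            overlap = len(words & token_sets[best]) / len(words)
            remaining[i] *= (1.0 - overlap)
    return chosen

# Toy data (hypothetical): sentence 1 repeats "picture" from sentence 0
token_sets = [{"great", "picture"}, {"picture", "sharp"}, {"weak", "stand"}]
scores = [5.0, 4.0, 3.0]
chosen = select_sentences(token_sets, scores, n=2)
print(chosen)
```

Without the penalty the two top-scored sentences (0 and 1) would both be selected, even though they cover the same concept.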

12
Weight Initialization (1/3)
  • Subjective task
  • we need a consistent way for weight
    initialization
  • Analytic Hierarchy Process (AHP) [Saaty 99]
  • multicriteria method
  • provides a methodology to calculate consistent
    weights for selection criteria, according to the
    importance we assign to them
  • importance values are selected from a predefined
    scale (defined by AHP)
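
To illustrate how AHP turns pairwise importance judgments into weights, here is a minimal sketch. It uses the row geometric-mean approximation rather than the principal-eigenvector computation of the original method, and the comparison values in the matrix are hypothetical: the judgments actually used by the authors are not preserved in this transcript, so the output is not expected to reproduce their weights.

```python
import math

def ahp_weights(matrix):
    """Approximate AHP priority weights via normalized row geometric means."""
    geo_means = [math.prod(row) ** (1.0 / len(row)) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]

# HYPOTHETICAL reciprocal comparison matrix on the 1-9 AHP scale.
# Entry [i][j] says how much more important criterion i is than criterion j.
# Order: technology level, ownership duration, usefulness.
comparisons = [
    [1.0, 1 / 2, 1 / 5],
    [2.0, 1.0, 1 / 3],
    [5.0, 3.0, 1.0],
]
w1, w2, w3 = ahp_weights(comparisons)
print(round(w1, 2), round(w2, 2), round(w3, 2))
```

The weights always sum to 1, so they can be used directly as relative contributions of the three metadata criteria.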

13
Weight Initialization (2/3)
  • Fundamental Scale of AHP
  • Subjective Importance Values we used

14
Weight Initialization (3/3)
  • Calculated weights: w1 = 0.14, w2 = 0.24, w3 = 0.62
  • Initial weights were further adjusted based on
    the metadata values

15
Experimental Results (1/2)
  • Dataset
  • 1587 reviews from newegg.com
  • 3 domains (monitors, printers, cpu coolers)
  • 9 products (3 from each domain)
  • Reference Summary
  • manually generated by 3 human experts
  • Comparison Systems
  • Two commercial summarizers
  • TextAnalyst (Megaputer Intelligence Inc)
  • Copernic (Copernic Inc)
  • Naive ReSum
  • contribution of metadata (step 3) was removed

16
Experimental Results (2/2)
  • Average Recall: 91.7 (78.8), 69.5, 54
  • Average Precision: 73.3 (62.8), 58.3, 53.3

17
Interesting Facts in our Summaries
  • Neither biased nor abusive comments appeared
  • this did happen in the other 3 systems
  • Comments with low frequency but with significant
    meaning were included
  • this was not the case for the other 3 systems
  • Repetition of concepts was minimal or absent
    thanks to the redundancy elimination step
  • that's why naive ReSum performed so well
  • repetition in Copernic and TextAnalyst was evident

18
Conclusions
  • Metadata can contribute to a better summary.
  • We proposed an algorithm for summarizing on-line,
    metadata rich, product reviews.
  • Is statistical in nature.
  • Assumes labeled comments (pros and cons).
  • Works at the sentence level
  • Ranks sentences based on some "importance"
    measure and selects the N most important of them.
  • Uses metadata to make "good" ranking.

19
Future Work
  • Generalize our methodology to adapt to the
    availability or not of the various metadata.
  • the scoring algorithm is modular: weights/metadata
    can easily be added or removed
  • Remove the requirement for categorized reviews
    (positive and negative)

20
Thank you!
21
Monitor A - ReSum
  • PROS
  • Great resolution, clear picture, very very good
    price, 24in monitors are gigantic, widescreen
    aspect ratio makes dvds look awesome
  • Very, VERY bright, HDMI, no dead pixels, looks
    much nicer than online photos, unbeatable viewing
    angle
  • Excellent color reproduction fantastic image and
    text quality very good brightness and contrast
    HDMI input unbeatable value
  • Several things stood out above all other monitors
    I'd considered Almost non-existent issues of
    dead/stuck pixels
  • Resolution sharpness is amazing In my opinion,
    sleek design Functional speakers (not the best)
    Audio output is available Multiple inputs
  • CONS
  • So when Windows power management turns off the
    monitor signal, instead of turning off the
    monitor goes to bluescreen and says "no signal"
    on the HDMI input
  • no height or rotation adjustments flimsy base
    awkward location of OSD buttons no DVI
    connection (no DVI to HDMI cable included)
  • Weak stand, awful menu controls, no audio out, no
    USB ports, low buzzing sound when brightness
    turned down
  • This monitor is so darn tall it strains my neck a
    bit to view it - but that's simply a natural
    consequence of its size
  • Doesn't come with a DVI to HDMI cable that you
    will need to run this with a computer to get a
    good picture (don't use the vga port)