11.2 Feature-Based Opinion Mining and Summarization

Provided by: dily7

1
11.2 Feature-Based Opinion Mining and
Summarization
2
Feature-Based Opinion Mining and Summarization
  • The reviewer usually writes both positive and
    negative aspects of the product.
  • To obtain such detailed aspects, two tasks are needed:
  • Identifying and extracting the product features
    that have been commented on, e.g., "picture quality"
    in "the picture quality of this camera is amazing".
  • Determining whether the opinions on the features
    are positive, negative or neutral.

3
11.2.1 Problem Definition
  • Definition (object)
  • An object O is an entity which can be a product,
    person, event, organization, or topic.
  • O: (T, A), where T is a hierarchy or taxonomy of
    components, and A is a set of attributes of O.
  • Example 2:
  • Components of a digital camera: lens, battery,
    view-finder, etc.
  • Attributes of the digital camera: picture quality,
    size, weight.
  • Attributes of the battery component: battery life,
    battery size, battery weight.

4
Problem Definition
  • An object is represented as a tree.
  • The root is the object itself.
  • Each non-root node is a component or
    subcomponent.
  • Each link represents a part-of relationship.
  • Each node is also associated with a set of
    attributes.
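The tree representation can be sketched in a few lines. This is only an illustrative encoding, not something given in the slides: the class name `Node` and its fields are assumptions, and the data is Example 2.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node of the object tree: the object itself or one of its (sub)components."""
    name: str
    attributes: set = field(default_factory=set)
    parts: list = field(default_factory=list)  # children linked by part-of

    def features(self) -> set:
        """Collect every node name and attribute in this subtree."""
        found = {self.name} | self.attributes
        for part in self.parts:
            found |= part.features()
        return found


# Example 2 from the slides, encoded as a tree (the root is the object itself).
camera = Node(
    "digital camera",
    attributes={"picture quality", "size", "weight"},
    parts=[
        Node("lens"),
        Node("battery", attributes={"battery life", "battery size", "battery weight"}),
        Node("view-finder"),
    ],
)

print(sorted(camera.features()))
```

An opinion can then be attached to any name returned by `features()`, whether it is a node or an attribute of a node.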

5
Problem Definition
  • An opinion can be expressed on any node and any
    attribute of the node.
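  • Example 3:
  • "I do not like this camera." (opinion on the object
    itself)
  • "The picture quality of this camera is poor."
    (opinion on an attribute of the object)
  • "The battery of this camera is bad." (opinion on a
    component)
  • "The battery life of this camera is too short."
    (opinion on an attribute of a component)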

6
Problem Definition
  • To simplify our discussion, we use the word
    features to represent both components and
    attributes.
  • Let the evaluative text (e.g., a product review)
    be r.
  • r consists of a sequence of sentences.

7
Problem Definition
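  • Definition (explicit and implicit feature)
  • If a feature f appears in evaluative text r, it
    is called an explicit feature in r. If f does not
    appear in r but is implied, it is called an
    implicit feature in r.
  • Example 4:
  • "The battery life of this camera is too short."
    (explicit feature: battery life)
  • "This camera is too large." (implicit feature: size)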

8
Problem Definition
  • Definition (opinion passage on a feature)
  • The opinion passage on feature f of an object
    evaluated in r is a group of consecutive
    sentences that expresses a positive or negative
    opinion on f.
  • A single sentence can also express opinions on more
    than one feature, e.g., "The picture quality is
    good, but the battery life is short."

9
Problem Definition
  • Definition (explicit and implicit opinion)
  • Explicit opinion: an opinion expressed explicitly
    in a subjective sentence.
  • "The picture quality of this camera is amazing."
  • Implicit opinion: an opinion implied in an
    objective sentence.
  • "The earphone broke in two days."
  • Definition (opinion holder)
  • The holder of a particular opinion is a person or
    an organization that holds the opinion.

10
Problem Definition
  • We define a model of an object and a set of
    opinions on the object.
  • An object is represented with a finite set of
    features, F = {f1, f2, ..., fn}.
  • Each feature fi in F can be expressed with a
    finite set of words or phrases Wi, which are
    synonyms. Together they form W = {W1, W2, ..., Wn}.
  • Each opinion holder j comments on a subset Sj of the
    features in F.
  • For each feature fk in Sj that j comments on, he or she
    chooses a word or phrase from Wk to describe the
    feature, and then expresses an opinion on it.

11
Problem Definition
  • This model introduces three main practical
    problems.
  • Problem 1: Both F and W are unknown. Then, in opinion
    mining, we need to perform three tasks:
  • Task 1: Identifying and extracting object
    features that have been commented on in each
    evaluative text.
  • Task 2: Determining whether the opinions are
    positive, negative or neutral.
  • Task 3: Grouping synonyms of features, as
    different people may use different words or
    phrases to express the same feature.

12
Problem Definition
  • Problem 2: F is known but W is unknown.
  • Task 3 becomes the problem of matching discovered
    features with the set of given features F.
  • Problem 3: W is known (then F is also known).
  • Only Task 2 needs to be performed.
  • Example 6:
  • A cellular phone company wants to mine customer
    reviews on a few models of its phones. The company
    can supply F and W itself, so there is
    no need to perform Tasks 1 and 3.
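When W is known, mapping a mention in a review to its feature reduces to a lookup. The sketch below assumes two made-up synonym sets; the names `W`, `PHRASE_TO_FEATURE` and `match_feature` are illustrative and do not come from the slides.

```python
from typing import Optional

# Hypothetical synonym sets W_i, keyed by the feature name f_i (illustrative data).
W = {
    "picture quality": {"picture quality", "image quality", "photo quality"},
    "battery life": {"battery life", "battery duration"},
}

# Invert W into a lookup table: word or phrase -> feature name.
PHRASE_TO_FEATURE = {phrase: f for f, phrases in W.items() for phrase in phrases}


def match_feature(mention: str) -> Optional[str]:
    """Return the feature in F that a mention refers to, or None if it is not in W."""
    return PHRASE_TO_FEATURE.get(mention.strip().lower())


print(match_feature("Image Quality"))
print(match_feature("zoom"))
```

In Problem 2, where only F is given, this exact-match table would have to be replaced by some similarity-based matching, which the slides do not specify.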

13
Problem Definition
  • Output: The final output is a set of pairs. Each
    pair is denoted by (f, SO), where f is a feature
    and SO is the semantic orientation of the opinion
    expressed on f.
  • A simple way to use the results is to produce a
    feature-based summary of opinions on the object.
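A feature-based summary can be built by counting orientations per feature. This is a minimal sketch: the (f, SO) pairs are invented for illustration and `summarize` is a hypothetical helper name.

```python
from collections import Counter, defaultdict


def summarize(pairs):
    """Count the opinions of each orientation per feature from (f, SO) pairs."""
    summary = defaultdict(Counter)
    for feature, orientation in pairs:
        summary[feature][orientation] += 1
    return {f: dict(c) for f, c in summary.items()}


# Hypothetical (f, SO) pairs pooled from several reviews of one camera.
pairs = [
    ("picture quality", "positive"),
    ("picture quality", "positive"),
    ("battery life", "negative"),
    ("picture quality", "negative"),
]
result = summarize(pairs)
for feature, counts in result.items():
    print(feature, counts)
```

The per-feature counts are exactly what a bar chart of positive versus negative opinions would plot.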

14
Problem Definition
  • Example 7: Assume we summarize the reviews of a
    particular digital camera, digital_camera_1.

15
Problem Definition
  • The summary can also be visualized using a bar
    chart.

16
Problem Definition
17
Problem Definition
  • Four other important issues
  • 1. Separation of Opinions on the Object itself
    and its Features.
  • 2. Granularity of Analysis.
  • At level 1, identify opinions on the object itself
    and its attributes.
  • At level 2, identify opinions on the major
    components and also opinions on the attributes of
    the components.
  • At other levels, similar tasks can be performed.

18
Problem Definition
  • Example 8:
  • "I like this camera. Its picture quality is
    amazing. However, the battery life is a little
    short."
  • 3. Opinion Holder Identification
  • Opinion holders are more useful for news
    articles, in which the person or organization
    that expressed an opinion is usually stated in
    the text explicitly.
  • The opinion holders are often the authors of
    discussion posts, bloggers, or reviewers, whose
    login ids are often known although their true
    identities in the real-world may be unknown.

19
Problem Definition
  • 4. Opinioned Object Identification and Pronoun
    Resolution.
  • I have a Canon S50 camera purchased from Amazon.
    It takes great photos.
  • (1) What object does the post praise?
  • (2) What does "it" refer to in the second sentence?
  • Automatically discovering the answers is a very
    challenging problem.

20
11.2.2 Object Feature Extraction
  • There are three main review formats on the Web.
  • Different review formats may need different
    techniques to perform the feature extraction
    task.
  • Format 1 - Pros, cons and the detailed review

21
Object Feature Extraction
  • Format 2 - Pros and cons

22
Object Feature Extraction
  • Format 3 - Free format

23
Object Feature Extraction
  • In both formats 1 and 2, opinion orientations are
    already known because pros and cons are separated,
    so only product features need to be identified.
  • For format 3, we need to identify both product
    features and opinion orientations.

24
11.2.3 Feature Extraction from Pros and Cons of
Format 1
  • A product feature can be expressed with a noun,
    adjective, verb or adverb.
  • The labels and their POS tags used in mining label
    sequential rules (LSRs) are {$feature, NN},
    {$feature, JJ}, {$feature, VB} and {$feature, RB}.
  • Each sentence segment in pros and cons contains
    only one feature. Sentence segments are separated
    by commas, periods, semi-colons, hyphens, "&",
    "and", "but", etc.
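Splitting pros and cons into sentence segments can be approximated with a regular expression. This sketch makes two assumptions that the slides do not state: a hyphen only separates segments when it has spaces around it (so "view-finder" stays whole), and "and"/"but" only separate when they are standalone words. The name `split_segments` is illustrative.

```python
import re

# Separators named in the slides: commas, periods, semi-colons, hyphens, "&", "and", "but".
_SEPARATORS = re.compile(
    r"\s*[,.;&]\s*"          # punctuation separators and "&"
    r"|\s+-\s+"              # a hyphen with spaces on both sides (assumption)
    r"|\s+(?:and|but)\s+",   # standalone "and" / "but"
    re.IGNORECASE,
)


def split_segments(text: str) -> list:
    """Split a pros or cons string into non-empty sentence segments."""
    return [seg.strip() for seg in _SEPARATORS.split(text) if seg and seg.strip()]


print(split_segments("great photos, easy to use and good battery life; small"))
print(split_segments("view-finder too small - heavy."))
```

A real system would need extra care with periods inside numbers or abbreviations, which this pattern would also split on.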

25
Feature Extraction from Pros and Cons of Format 1
  • We call a word that indicates an implicit feature
    an implicit feature indicator.

26
Feature Extraction from Pros and Cons of Format 1
  • 1. Training data preparation for LSR mining
  • Part-Of-Speech (POS) tagging and sequence
    generation:
  • "Included memory is stingy" ->
  • <{included, VB}{memory, NN}{is, VB}{stingy, JJ}>
  • Replace the actual feature words with {$feature,
    <tag>}:
  • <{included, VB}{$feature, NN}{is, VB}{stingy, JJ}>

27
Feature Extraction from Pros and Cons of Format 1
  • Use n-grams to produce shorter segments from
    long ones.
  • The sequence above generates two trigram sequences:
  • <{included, VB}{$feature, NN}{is, VB}>
  • <{$feature, NN}{is, VB}{stingy, JJ}>
  • Perform word stemming.
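The sequence preparation can be sketched as below. The POS tags are written by hand rather than produced by a tagger, stemming is left out, and keeping only the windows that contain the $feature label is an assumption of this sketch. The function names are illustrative.

```python
def label_feature(tagged, feature_index):
    """Replace the word at feature_index with the label "$feature", keeping its POS tag."""
    return [
        ("$feature", tag) if i == feature_index else (word, tag)
        for i, (word, tag) in enumerate(tagged)
    ]


def ngrams_with_feature(sequence, n=3):
    """All length-n windows of the sequence that contain the $feature label."""
    windows = [sequence[i:i + n] for i in range(max(len(sequence) - n + 1, 1))]
    return [w for w in windows if any(word == "$feature" for word, _ in w)]


# "Included memory is stingy", hand-tagged as in the slides.
tagged = [("included", "VB"), ("memory", "NN"), ("is", "VB"), ("stingy", "JJ")]
labeled = label_feature(tagged, feature_index=1)
trigrams = ngrams_with_feature(labeled, n=3)
for gram in trigrams:
    print(gram)
```

For this four-item sequence both trigram windows contain the feature, so the output matches the two sequences listed in the slide.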

28
Feature Extraction from Pros and Cons of Format 1
  • 2. Label sequential rule mining
  • An LSR mining system is applied to find rules.
  • A suitable minimum confidence and minimum support
    should be used.
  • <{easy, JJ}{to}{*, VB}>
  • -> <{easy, JJ}{to}{$feature, VB}> (language
    pattern)

29
Feature Extraction from Pros and Cons of Format 1
  • 3. Feature extraction
  • The word in the sentence segment that matches
    $feature in a language pattern is extracted.
  • Three situations are considered in extraction:
  • If a sentence segment satisfies multiple rules,
    we search for a matching rule in the following
    order: {$feature, NN}, {$feature, JJ},
    {$feature, VB} and {$feature, RB}.

30
Feature Extraction from Pros and Cons of Format 1
  • For sentence segments to which no rule applies, nouns
    or noun phrases produced by a POS tagger are
    extracted as features.
  • For a sentence segment with only a single word,
    that word is treated as a feature.
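The three extraction situations can be put together as follows. This is a simplified sketch, not the original system: patterns are matched against contiguous windows only, a pattern item with tag `None` matches any tag, and the two patterns shown are made up for illustration.

```python
TAG_PRIORITY = ["NN", "JJ", "VB", "RB"]  # search order given in the slides


def match_pattern(pattern, segment):
    """Return the word aligned with $feature if the pattern matches a contiguous window."""
    n = len(pattern)
    for start in range(len(segment) - n + 1):
        feature = None
        for (p_word, p_tag), (word, tag) in zip(pattern, segment[start:start + n]):
            if p_tag is not None and p_tag != tag:
                break
            if p_word == "$feature":
                feature = word
            elif p_word != word:
                break
        else:  # no break: every pattern item matched
            return feature
    return None


def feature_tag(pattern):
    """POS tag attached to the $feature label of a pattern."""
    return next(tag for word, tag in pattern if word == "$feature")


def extract_features(segment, patterns):
    """Apply the three situations to one POS-tagged sentence segment."""
    if len(segment) == 1:                          # single-word segment
        return [segment[0][0]]
    ordered = sorted(patterns, key=lambda p: TAG_PRIORITY.index(feature_tag(p)))
    for pattern in ordered:                        # rules tried in priority order
        found = match_pattern(pattern, segment)
        if found is not None:
            return [found]
    return [w for w, t in segment if t == "NN"]    # fallback: nouns from the tagger


# Hypothetical language patterns (right-hand sides of mined rules).
patterns = [
    [("easy", "JJ"), ("to", None), ("$feature", "VB")],
    [("good", "JJ"), ("$feature", "NN")],
]
print(extract_features([("easy", "JJ"), ("to", "TO"), ("use", "VB")], patterns))
print(extract_features([("small", "JJ"), ("screen", "NN")], patterns))
print(extract_features([("heavy", "JJ")], patterns))
```

The last call returns the single word "heavy", which would then still have to be mapped to the implicit feature it indicates.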

31
Feature Extraction from Pros and Cons of Format 1
  • Mapping to Implicit Features
  • There are many types of implicit feature
    indicators. Their exact meaning can be domain
    dependent.
  • Grouping Synonyms
  • It is common that people use different words or
    phrases to describe the same feature.
  • Many synonyms are domain dependent.
  • Granularity of Features
  • In a practical application, we need to determine
    the right level of analysis.

32
11.2.4 Feature Extraction from Reviews of
Formats 2 and 3
  • Complete sentences are more complex and contain a
    large amount of irrelevant information.
  • An unsupervised method finds explicit
    features that are nouns and noun phrases.
  • This method requires a large number of reviews,
    and consists of two steps:
  • 1. Finding frequent nouns and noun phrases. Most
    product features are nouns, and those
    nouns that are frequently talked about are
    usually genuine and important features.
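The frequency idea behind step 1 can be sketched with a simple counter. This is a simplification under stated assumptions: reviews are already POS-tagged (by hand here), only single nouns are counted rather than noun phrases, and support is measured as the fraction of reviews mentioning the noun. `frequent_nouns` and `min_support` are illustrative names.

```python
from collections import Counter


def frequent_nouns(tagged_reviews, min_support=0.5):
    """Nouns that occur in at least min_support (a fraction) of the reviews."""
    counts = Counter()
    for review in tagged_reviews:
        # Count each noun once per review so one long review cannot dominate.
        counts.update({word for word, tag in review if tag.startswith("NN")})
    threshold = min_support * len(tagged_reviews)
    return sorted(w for w, c in counts.items() if c >= threshold)


# Hypothetical hand-tagged reviews.
reviews = [
    [("battery", "NN"), ("lasts", "VB"), ("long", "RB")],
    [("great", "JJ"), ("battery", "NN"), ("and", "CC"), ("screen", "NN")],
    [("my", "PRP$"), ("friend", "NN"), ("likes", "VB"), ("the", "DT"), ("screen", "NN")],
]
candidates = frequent_nouns(reviews, min_support=0.6)
print(candidates)
```

The noun "friend" falls below the threshold and is dropped, which illustrates why the method needs many reviews before frequency becomes a reliable signal.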

33
Feature Extraction from Reviews of Formats 2 and 3
  • 2. Finding infrequent features by making use of
    sentiment words. The same opinion word can be
    used to describe different objects, so opinion words
    that modify frequent features can also be used to
    find infrequent ones.
  • Popescu and Etzioni improved the precision of step 1
    by computing a PMI (pointwise mutual information)
    score between each candidate phrase and
    meronymy discriminators associated with the
    product class, e.g., "of scanner" or "scanner has"
    for the scanner class.

34
Feature Extraction from Reviews of Formats 2 and 3
  • PMI(f, d) = hits(f ∧ d) / (hits(f) hits(d)),
    where f is a candidate feature identified in step
    1 and d is a discriminator.
  • Web search is used to find the number of hits.
  • If the PMI value of a candidate feature is too
    low, it may not be a component of the product
    because f and d do not co-occur frequently.
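The PMI check can be illustrated with plain hit counts. This sketch assumes the score is the ratio hits(f AND d) / (hits(f) * hits(d)); the hit counts below are invented, no web search is performed, and the function name `pmi` is illustrative.

```python
def pmi(hits_f_and_d: int, hits_f: int, hits_d: int) -> float:
    """PMI(f, d) = hits(f AND d) / (hits(f) * hits(d)); 0.0 if either count is zero."""
    if hits_f == 0 or hits_d == 0:
        return 0.0
    return hits_f_and_d / (hits_f * hits_d)


# Hypothetical hit counts for two candidate features and one discriminator.
hits_d = 2000
genuine = pmi(hits_f_and_d=400, hits_f=1000, hits_d=hits_d)
spurious = pmi(hits_f_and_d=2, hits_f=1000, hits_d=hits_d)
print(genuine, spurious)
```

A candidate whose score is far below that of known components would be discarded; the cut-off value itself is not given in the slides.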

35
11.2.5 Opinion Orientation Classification
  • 1. Using sentiment words and phrases
  • Manually find a set of seed positive and negative
    words.
  • Grow each seed set by iteratively
    searching for synonyms and antonyms in
    WordNet until convergence.
  • Manually inspect the results to remove
    incorrect words.
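The seed-growing loop can be sketched with a toy thesaurus standing in for WordNet. The dictionaries `SYNONYMS` and `ANTONYMS` are invented data, not WordNet output, and `grow_seeds` is a hypothetical helper name. Synonyms join the same set and antonyms join the opposite set until nothing new is added.

```python
# Hypothetical toy thesaurus standing in for WordNet lookups.
SYNONYMS = {"good": {"great", "fine"}, "great": {"excellent"},
            "bad": {"poor"}, "poor": {"awful"}}
ANTONYMS = {"good": {"bad"}, "bad": {"good"}, "excellent": {"awful"}}


def grow_seeds(positive, negative):
    """Expand both seed sets until no new word is added (convergence)."""
    positive, negative = set(positive), set(negative)
    changed = True
    while changed:
        changed = False
        for same, other in ((positive, negative), (negative, positive)):
            for word in list(same):
                new_same = SYNONYMS.get(word, set()) - same
                new_other = ANTONYMS.get(word, set()) - other
                if new_same or new_other:
                    same |= new_same      # synonyms keep the orientation
                    other |= new_other    # antonyms take the opposite one
                    changed = True
    return positive, negative


pos, neg = grow_seeds({"good"}, {"bad"})
print(sorted(pos), sorted(neg))
```

With a real thesaurus a word can end up in both sets, which is one reason the manual inspection step is needed.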

36
Opinion Orientation Classification
  • Using the final lists of positive and negative
    words, phrases, idioms and patterns, each
    sentence that contains product features can be
    classified as follows:
  • A positive word or phrase is assigned a score of
    +1 and a negative word or phrase is assigned a
    score of -1. All the scores are then summed up.
  • If the final total is positive, then the sentence
    is positive; otherwise it is negative.
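The summation rule is short enough to show directly. The two word lists are tiny hypothetical lexicons, and only single words are scored (no phrases or idioms); `sentence_orientation` is an illustrative name.

```python
POSITIVE = {"amazing", "good", "great"}   # hypothetical lexicon entries
NEGATIVE = {"bad", "poor", "short"}


def sentence_orientation(sentence: str) -> str:
    """Sum +1 per positive word and -1 per negative word, then classify by the total."""
    words = sentence.lower().replace(",", " ").replace(".", " ").split()
    total = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
    # Following the rule in the slides, a total of zero falls into "negative".
    return "positive" if total > 0 else "negative"


print(sentence_orientation("The picture quality of this camera is amazing."))
print(sentence_orientation("The battery life of this camera is too short."))
```

Treating "short" as negative only makes sense for features such as battery life, so the lexicon here is specific to this example.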

37
Opinion Orientation Classification
  • If a negation word is near a sentiment word, the
    opinion is reversed.
  • A sentence that contains a "but" clause
    (a sub-sentence that starts with "but", "however",
    etc.) indicates a sentiment change for the
    feature in the clause.
  • The opinion orientations of many words are domain
    and/or sentence context dependent.
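Negation and "but" clauses can be layered on top of the word scores. This sketch rests on several assumptions of its own: "near" means within two words before the sentiment word, the lexicons are tiny hypothetical lists, and a "but" clause without any sentiment word takes the opposite score of the clause before it.

```python
import re

POSITIVE = {"amazing", "good", "great"}   # hypothetical lexicon entries
NEGATIVE = {"bad", "poor", "short"}
NEGATIONS = {"not", "no", "never"}
BUT_WORDS = {"but", "however"}


def clause_score(words, window=2):
    """Sum +1/-1 scores, flipping a score if a negation occurs within `window` words before it."""
    total = 0
    for i, w in enumerate(words):
        score = (w in POSITIVE) - (w in NEGATIVE)
        if score and NEGATIONS & set(words[max(0, i - window):i]):
            score = -score
        total += score
    return total


def clause_orientations(sentence):
    """Split the sentence at but-words and score each clause separately."""
    words = re.findall(r"[a-z']+", sentence.lower())
    clauses, current = [], []
    for w in words:
        if w in BUT_WORDS:
            if current:
                clauses.append(current)
            current = []
        else:
            current.append(w)
    if current:
        clauses.append(current)
    scored = []
    for k, clause in enumerate(clauses):
        score = clause_score(clause)
        if k > 0 and score == 0:
            # Assumption: a but-clause with no sentiment word reverses the previous clause.
            score = -scored[-1][1]
        scored.append((" ".join(clause), score))
    return scored


print(clause_orientations("The picture quality is not good, but the battery life is great."))
print(clause_orientations("The pictures are great, however the battery could last longer."))
```

Scoring each clause separately lets each feature receive the orientation of its own clause instead of the sentence total.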