Title: 11.2 Feature-Based Opinion Mining and Summarization
11.2 Feature-Based Opinion Mining and Summarization
- A reviewer usually writes about both positive and negative aspects of a product.
- To obtain these detailed aspects, two tasks are needed:
- Identifying and extracting the product features that have been commented on.
- e.g., "The picture quality of this camera is amazing" (feature: picture quality).
- Determining whether the opinions on the features are positive, negative, or neutral.
11.2.1 Problem Definition
- Definition (object)
- An object O is an entity, which can be a product, person, event, organization, or topic.
- It is associated with a pair O: (T, A), where T is a hierarchy or taxonomy of components and A is a set of attributes of O.
- Example 2
- digital camera -> lens, battery, view-finder, etc. (components)
- digital camera: picture quality, size, weight (attributes)
- battery: battery life, battery size, battery weight (attributes of a component)
- An object is represented as a tree.
- The root is the object itself.
- Each non-root node is a component or subcomponent.
- Each link represents a part-of relationship.
- Each node is also associated with a set of attributes.
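The tree representation can be sketched in a few lines of Python. This is only an illustrative sketch: the class name `Node` and the method `features` are hypothetical, and the camera data is taken from Example 2.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """The object itself (root) or one of its components."""
    name: str
    attributes: List[str] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)  # part-of links

    def features(self) -> List[str]:
        """Collect every component and attribute name in this subtree."""
        found = [self.name] + list(self.attributes)
        for child in self.children:
            found.extend(child.features())
        return found


# The digital camera of Example 2: the root carries its own attributes,
# and the battery component carries attributes of its own.
camera = Node(
    "digital camera",
    attributes=["picture quality", "size", "weight"],
    children=[
        Node("lens"),
        Node("battery", attributes=["battery life", "battery size", "battery weight"]),
        Node("view-finder"),
    ],
)

print(camera.features())
```

Flattening the tree with `features()` mirrors the later simplification in which components and attributes are both called features.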
- An opinion can be expressed on any node and any attribute of the node.
- Example 3
- "I do not like this camera." (opinion on the object itself)
- "The picture quality of this camera is poor." (opinion on an attribute of the object)
- "The battery of this camera is bad." (opinion on a component)
- "The battery life of this camera is too short." (opinion on an attribute of a component)
- To simplify the discussion, we use the word "features" to represent both components and attributes.
- Let the evaluative text (e.g., a product review) be r.
- r consists of a sequence of sentences.
- Definition (explicit and implicit feature)
- If a feature f appears in evaluative text r, it is called an explicit feature in r.
- If f does not appear in r but is implied, it is called an implicit feature in r.
- Example 4
- "The battery life of this camera is too short." (explicit feature: battery life)
- "This camera is too large." (implicit feature: size)
- Definition (opinion passage on a feature)
- The opinion passage on feature f of an object evaluated in r is a group of consecutive sentences in r that expresses a positive or negative opinion on f.
- A single sentence can also express opinions on more than one feature, e.g., "The picture quality is good, but the battery life is short."
- Definition (explicit and implicit opinion)
- Explicit opinion: a subjective sentence that directly expresses a positive or negative opinion.
- e.g., "The picture quality of this camera is amazing."
- Implicit opinion: an objective sentence that implies a positive or negative opinion.
- e.g., "The earphone broke in two days."
- Definition (opinion holder)
- The holder of a particular opinion is the person or organization that holds the opinion.
- The goal is to define a model of an object and a set of opinions on the object.
- An object is represented with a finite set of features, F = {f1, f2, ..., fn}.
- Each feature fi in F can be expressed with a finite set of words or phrases Wi, which are synonyms. W = {W1, W2, ..., Wn} denotes the set of these synonym sets.
- Each opinion holder j comments on a subset Sj of the features in F.
- For each feature fk in Sj that j comments on, the holder chooses a word or phrase from Wk to describe the feature, and then expresses an opinion on it.
- This model introduces three main practical problems.
- Problem 1: F and W are unknown. Then, in opinion mining, we need to perform three tasks:
- Task 1: Identifying and extracting object features that have been commented on in each evaluative text.
- Task 2: Determining whether the opinions on the features are positive, negative, or neutral.
- Task 3: Grouping synonyms of features, as different people may use different words or phrases to express the same feature.
- Problem 2: F is known but W is unknown.
- All three tasks are still needed, but Task 3 becomes the problem of matching discovered features with the set of given features F.
- Problem 3: W is known (then F is also known).
- Only Task 2 needs to be performed.
- Example 6
- A cellular phone company wants to mine customer reviews on a few models of its phones. If the company supplies the features it is interested in and their synonyms, there is no need to perform Tasks 1 and 3.
- Output: The final output is a set of pairs. Each pair is denoted by (f, SO), where f is a feature and SO is the semantic orientation of the opinion expressed on f.
- A simple way to use the results is to produce a feature-based summary of opinions on the object.
- Example 7: Assume we summarize the reviews of a particular digital camera, digital_camera_1.
- The summary can also be visualized using a bar chart.
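A feature-based summary can be produced by counting orientations per feature. The sketch below assumes the mined output is already available as (f, SO) pairs. The function name `summarize` and the sample pairs are hypothetical and are not real review data.

```python
from collections import defaultdict


def summarize(pairs):
    """Count positive and negative opinions for each feature."""
    summary = defaultdict(lambda: {"positive": 0, "negative": 0})
    for feature, orientation in pairs:
        # Neutral pairs are skipped: they add nothing to either count.
        if orientation in ("positive", "negative"):
            summary[feature][orientation] += 1
    return dict(summary)


# Hypothetical (f, SO) pairs mined from reviews of digital_camera_1.
pairs = [
    ("picture quality", "positive"),
    ("picture quality", "positive"),
    ("picture quality", "negative"),
    ("battery life", "negative"),
    ("size", "neutral"),
]

for feature, counts in summarize(pairs).items():
    print(f"{feature}: +{counts['positive']} / -{counts['negative']}")
```

The two counts per feature are what a bar chart of the summary would plot, with positive counts above and negative counts below the axis.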
- Four other important issues
- 1. Separation of opinions on the object itself and on its features.
- 2. Granularity of analysis.
- At level 1, identify opinions on the object itself and its attributes.
- At level 2, identify opinions on the major components and also opinions on the attributes of the components.
- At other levels, similar tasks can be performed.
- Example 8
- "I like this camera. Its picture quality is amazing. However, the battery life is a little short."
- The first two sentences give level 1 opinions (on the camera itself and on its attribute, picture quality). The third gives a level 2 opinion (on battery life, an attribute of the battery component).
- 3. Opinion holder identification.
- Opinion holders are more useful for news articles, in which the person or organization that expressed an opinion is usually stated explicitly in the text.
- In user-generated content, the opinion holders are often the authors of discussion posts, bloggers, or reviewers, whose login ids are often known although their real-world identities may be unknown.
- 4. Opinionated object identification and pronoun resolution.
- "I have a Canon S50 camera purchased from Amazon. It takes great photos."
- (1) What object does the post praise?
- (2) What does "It" refer to in the second sentence?
- Discovering the answers automatically is a very challenging problem.
11.2.2 Object Feature Extraction
- There are three main review formats on the Web.
- Different review formats may need different techniques to perform the feature extraction task.
- Format 1: Pros, cons, and the detailed review
- In both formats 1 and 2, only product features need to be identified, because the separation into pros and cons already gives the opinion orientations.
- For format 3, we need to identify both product features and opinion orientations.
11.2.3 Feature Extraction from Pros and Cons of Format 1
- A product feature can be expressed with a noun, adjective, verb, or adverb.
- The labels and their POS tags used in mining label sequential rules (LSRs) are {$feature, NN}, {$feature, JJ}, {$feature, VB}, and {$feature, RB}.
- Each sentence segment in pros and cons contains only one feature. Sentence segments are separated by commas, periods, semi-colons, hyphens, &, "and", "but", etc.
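Splitting pros and cons into sentence segments can be sketched with a regular expression. This is a rough sketch under stated assumptions: the function name `split_segments` is hypothetical, and only hyphens surrounded by spaces are treated as separators so that words such as "view-finder" stay intact.

```python
import re

# Separators named in the text: commas, periods, semi-colons, hyphens,
# ampersands, and the words "and" / "but".
# Assumption: a hyphen separates segments only when it has spaces around it.
_SEPARATORS = re.compile(r"\s*(?:[,.;&]|\s-\s|\band\b|\bbut\b)\s*", re.IGNORECASE)


def split_segments(text):
    """Return the non-empty sentence segments of a pros or cons string."""
    return [seg.strip() for seg in _SEPARATORS.split(text) if seg.strip()]


print(split_segments("Great photos, easy to use; small and light"))
print(split_segments("Included memory is stingy - slow flash."))
```

Splitting on every period also breaks decimals such as "2.5", so a real system would need a more careful tokenizer.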
- We call a word that indicates an implicit feature an implicit feature indicator, e.g., "large" in "This camera is too large" indicates size.
- 1. Training data preparation for LSR mining
- Part-of-speech (POS) tagging and sequence generation:
- "Included memory is stingy" ->
- <{included, VB}{memory, NN}{is, VB}{stingy, JJ}>
- Replace the actual feature words with {$feature, <tag>}:
- <{included, VB}{$feature, NN}{is, VB}{stingy, JJ}>
- Use an n-gram to produce shorter segments from long ones. For example, the labeled sequence for "Included memory is stingy" generates two trigram sequences:
- <{included, VB}{$feature, NN}{is, VB}>
- <{$feature, NN}{is, VB}{stingy, JJ}>
- Perform word stemming.
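The preparation steps can be sketched as follows, starting from a segment that has already been POS-tagged. The helper names are hypothetical, stemming is left out, and keeping only the n-grams that contain `$feature` is an assumption of this sketch.

```python
FEATURE = "$feature"


def label_features(tagged, feature_words):
    """Replace each known feature word with the $feature label, keeping its tag."""
    return [(FEATURE if word in feature_words else word, tag) for word, tag in tagged]


def ngrams_with_feature(sequence, n=3):
    """Return every window of n items that contains a $feature item."""
    if len(sequence) <= n:
        return [sequence]  # already short enough, keep as is
    windows = [sequence[i:i + n] for i in range(len(sequence) - n + 1)]
    return [w for w in windows if any(word == FEATURE for word, _ in w)]


# "Included memory is stingy", tagged as in the example above.
tagged = [("included", "VB"), ("memory", "NN"), ("is", "VB"), ("stingy", "JJ")]
labeled = label_features(tagged, {"memory"})
trigrams = ngrams_with_feature(labeled)

for trigram in trigrams:
    print(trigram)
```

For this four-word segment the sketch yields the same two trigram sequences listed above.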
- 2. Label sequential rule mining
- An LSR mining system is applied to find rules.
- A suitable minimum confidence and minimum support should be used.
- Example rule: <{easy, JJ}{to}{*, VB}> -> <{easy, JJ}{to}{$feature, VB}>
- The right-hand side of a rule is also called a language pattern.
- 3. Feature extraction
- The word in the sentence segment that matches $feature in a language pattern is extracted.
- Three situations are considered in extraction:
- If a sentence segment satisfies multiple rules, we search for a matching rule in the following order: {$feature, NN}, {$feature, JJ}, {$feature, VB}, and {$feature, RB}.
- For sentence segments to which no rule applies, nouns or noun phrases produced by a POS tagger are extracted as features.
- For a sentence segment with only a single word, the single word is treated as a feature.
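The three extraction situations can be sketched as one function. Everything here is illustrative: the pattern encoding (a list of word/tag pairs, with `*` as a word wildcard and `None` as a tag wildcard), the function names, and the two sample patterns are assumptions, and the fallback returns only the first noun rather than all nouns and noun phrases.

```python
FEATURE = "$feature"
TAG_PRIORITY = ["NN", "JJ", "VB", "RB"]


def match_at(pattern, segment, start):
    """Return the word in the $feature slot if pattern matches segment at start."""
    if start + len(pattern) > len(segment):
        return None
    extracted = None
    for (p_word, p_tag), (word, tag) in zip(pattern, segment[start:]):
        if p_tag is not None and p_tag != tag:
            return None
        if p_word == FEATURE:
            extracted = word
        elif p_word != "*" and p_word != word:
            return None
    return extracted


def extract_feature(segment, patterns):
    # A one-word segment is itself treated as the feature.
    if len(segment) == 1:
        return segment[0][0]

    # Try patterns ordered by the tag of their $feature slot: NN, JJ, VB, RB.
    def priority(pattern):
        tag = next(t for w, t in pattern if w == FEATURE)
        return TAG_PRIORITY.index(tag)

    for pattern in sorted(patterns, key=priority):
        for start in range(len(segment)):
            found = match_at(pattern, segment, start)
            if found is not None:
                return found

    # No rule applies: fall back to a noun (simplified to the first one).
    nouns = [word for word, tag in segment if tag == "NN"]
    return nouns[0] if nouns else None


# Hypothetical language patterns in the encoding described above.
PATTERNS = [
    [("easy", "JJ"), ("to", None), (FEATURE, "VB")],
    [(FEATURE, "NN"), ("is", "VB"), ("*", "JJ")],
]

print(extract_feature([("easy", "JJ"), ("to", "TO"), ("use", "VB")], PATTERNS))
print(extract_feature([("included", "VB"), ("memory", "NN"), ("is", "VB"), ("stingy", "JJ")], PATTERNS))
```

Sorting by the tag of the `$feature` slot is one simple way to realize the NN, JJ, VB, RB search order.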
- Mapping to implicit features
- There are many types of implicit feature indicators. Their exact meaning can be domain dependent.
- Grouping synonyms
- It is common for people to use different words or phrases to describe the same feature.
- Many synonyms are domain dependent.
- Granularity of features
- In a practical application, we need to determine the right level of analysis.
11.2.4 Feature Extraction from Reviews of Formats 2 and 3
- Complete sentences are more complex and contain a large amount of irrelevant information.
- An unsupervised method can be used to find explicit features that are nouns and noun phrases.
- This method requires a large number of reviews and consists of two steps:
- 1. Finding frequent nouns and noun phrases. Most product features are nouns, and those nouns that are frequently talked about are usually genuine and important features.
- 2. Finding infrequent features by making use of sentiment words. The same opinion word can be used to describe different objects, so an opinion word that modifies a frequent feature can also point to an infrequent one.
- The precision of step 1 of the above algorithm was improved by Popescu and Etzioni by computing a PMI score between the candidate phrase and meronymy discriminators associated with the product class.
- PMI(f, d) = hits(f AND d) / (hits(f) * hits(d)), where f is a candidate feature identified in step 1 and d is a discriminator.
- Web search is used to find the number of hits.
- If the PMI value of a candidate feature is too low, it may not be a component of the product because f and d do not co-occur frequently.
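The PMI filter can be sketched as follows, assuming the hit-count form PMI(f, d) = hits(f AND d) / (hits(f) * hits(d)) and a single discriminator. The function names, the threshold, and all hit counts below are made up for illustration. A real system would obtain the counts from Web search.

```python
def pmi(hits_f_and_d, hits_f, hits_d):
    """PMI score computed from search hit counts."""
    if hits_f == 0 or hits_d == 0:
        return 0.0
    return hits_f_and_d / (hits_f * hits_d)


def keep_candidates(hit_counts, threshold):
    """Keep the candidate features whose PMI reaches the threshold."""
    kept = []
    for f, counts in hit_counts.items():
        if pmi(counts["both"], counts["f"], counts["d"]) >= threshold:
            kept.append(f)
    return kept


# Hypothetical hit counts for two candidates and one discriminator.
hit_counts = {
    "lens": {"both": 500, "f": 10_000, "d": 2_000},
    "weekend": {"both": 2, "f": 50_000, "d": 2_000},
}

print(keep_candidates(hit_counts, threshold=1e-6))
```

The threshold is a tuning parameter: a candidate that rarely co-occurs with the discriminator gets a low score and is dropped.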
11.2.5 Opinion Orientation Classification
- 1. Using sentiment words and phrases
- Manually find a set of seed positive and negative words.
- Grow each seed set by iteratively searching for its synonyms and antonyms in WordNet until convergence.
- Manually inspect the results to remove incorrect words.
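The growing step can be sketched with plain dictionaries standing in for WordNet lookups. The function name `grow` and the tiny synonym and antonym tables are hypothetical. Synonyms join the same set, antonyms join the opposite set, and the loop stops when nothing new is added.

```python
def grow(positive_seeds, negative_seeds, synonyms, antonyms):
    """Expand both seed sets until no new word is added."""
    positive, negative = set(positive_seeds), set(negative_seeds)
    changed = True
    while changed:
        changed = False
        for same, other in ((positive, negative), (negative, positive)):
            for word in list(same):
                new_same = set(synonyms.get(word, ())) - same
                new_other = set(antonyms.get(word, ())) - other
                if new_same or new_other:
                    same |= new_same    # synonyms share the orientation
                    other |= new_other  # antonyms take the opposite one
                    changed = True
    return positive, negative


# Toy lookup tables standing in for WordNet (illustrative only).
synonyms = {"good": ["great"], "great": ["excellent"], "bad": ["poor"]}
antonyms = {"good": ["bad"], "excellent": ["terrible"]}

positive, negative = grow({"good"}, {"bad"}, synonyms, antonyms)
print(sorted(positive), sorted(negative))
```

The sketch does not resolve a word that ends up in both sets, which is one reason the manual inspection step is still needed.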
- Using the final lists of positive and negative words, phrases, idioms, and patterns, each sentence that contains product features can be classified as follows:
- A positive word or phrase is assigned a score of +1 and a negative word or phrase is assigned a score of -1. All the scores are then summed up.
- If the final total is positive, then the sentence is positive; otherwise it is negative.
- If a negation word is near a sentiment word, the opinion is reversed.
- A sentence that contains a "but" clause (a subsentence that starts with "but", "however", etc.) indicates a sentiment change for the feature in the clause.
- The opinion orientations of many words are domain and/or sentence context dependent.
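The scoring rule with negation handling can be sketched as follows. The word lists, the function name `orientation`, and the two-word negation window are assumptions made for illustration. "But" clauses and domain-dependent words are not handled.

```python
# Tiny hypothetical lexicons; real lists come from the seed-growing step.
POSITIVE = {"good", "great", "amazing"}
NEGATIVE = {"bad", "poor", "terrible"}
NEGATIONS = {"not", "no", "never"}


def orientation(sentence, window=2):
    """Sum +1/-1 word scores, flipping a score when a negation word is near."""
    words = sentence.lower().replace(",", " ").replace(".", " ").split()
    total = 0
    for i, word in enumerate(words):
        score = 1 if word in POSITIVE else -1 if word in NEGATIVE else 0
        # "Near" is taken to mean within `window` words before the sentiment word.
        if score and any(w in NEGATIONS for w in words[max(0, i - window):i]):
            score = -score
        total += score
    # As stated above: positive total means positive, otherwise negative.
    return "positive" if total > 0 else "negative"


print(orientation("The picture quality is amazing."))
print(orientation("The battery life is not good."))
```

With this rule a sentence whose scores sum to zero is labeled negative, which follows the classification rule as stated rather than adding a neutral class.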