Title: Document-level Semantic Orientation and Argumentation
Document-level Semantic Orientation and Argumentation
- Presented by Marta Tatu
- CS7301
- March 15, 2005
Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews
- Peter D. Turney
- ACL-2002
Overview
- Unsupervised learning algorithm for classifying reviews as recommended or not recommended
- The classification is based on the semantic orientation of the phrases in the review that contain adjectives and adverbs
Algorithm
- Input: review
- Identify phrases that contain adjectives or adverbs by using a part-of-speech tagger
- Estimate the semantic orientation of each phrase
- Assign a class to the given review based on the average semantic orientation of its phrases
- Output: classification (recommended or not recommended)
Step 1
- Apply Brill's part-of-speech tagger to the review
- Adjectives are good indicators of subjective sentences, but in isolation they are ambiguous: unpredictable steering (negative) vs. unpredictable plot (positive)
- Extract two consecutive words where one is an adjective or adverb and the other provides the context (see the sketch after the pattern table below)
First Word Second Word Third Word (not extracted)
1. JJ NN or NNS Anything
2. RB, RBR, or RBS JJ Not NN nor NNS
3. JJ JJ Not NN nor NNS
4. NN or NNS JJ Not NN nor NNS
5. RB, RBR, or RBS VB, VBD, VBN, or VBG Anything
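A minimal sketch of this extraction step in Python, using NLTK's default tagger as a stand-in for Brill's tagger (an assumption; the original work used Brill's tagger directly):

```python
import nltk  # assumes the NLTK tokenizer and tagger models are installed

# The five extraction patterns from the table above:
# (first-word tags, second-word tags, third-word tags that block extraction)
PATTERNS = [
    ({"JJ"}, {"NN", "NNS"}, set()),
    ({"RB", "RBR", "RBS"}, {"JJ"}, {"NN", "NNS"}),
    ({"JJ"}, {"JJ"}, {"NN", "NNS"}),
    ({"NN", "NNS"}, {"JJ"}, {"NN", "NNS"}),
    ({"RB", "RBR", "RBS"}, {"VB", "VBD", "VBN", "VBG"}, set()),
]

def extract_phrases(review):
    """Extract two-word phrases whose POS tags match one of the patterns."""
    tagged = nltk.pos_tag(nltk.word_tokenize(review))
    phrases = []
    for i in range(len(tagged) - 1):
        (w1, t1), (w2, t2) = tagged[i], tagged[i + 1]
        t3 = tagged[i + 2][1] if i + 2 < len(tagged) else ""
        for first, second, blocked_third in PATTERNS:
            if t1 in first and t2 in second and t3 not in blocked_third:
                phrases.append(f"{w1} {w2}")
                break
    return phrases

print(extract_phrases("The bank offers low fees and direct deposit."))
```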
Step 2
- Estimate the semantic orientation of the extracted phrases using PMI-IR (Turney, 2001)
- Pointwise Mutual Information (Church and Hanks, 1989): PMI(word1, word2) = log2 [ p(word1 ∧ word2) / (p(word1) · p(word2)) ]
- Semantic Orientation: SO(phrase) = PMI(phrase, "excellent") − PMI(phrase, "poor")
- PMI-IR estimates PMI by issuing queries to a search engine (AltaVista, 350 million pages), which simplifies to
SO(phrase) = log2 [ hits(phrase NEAR "excellent") · hits("poor") / (hits(phrase NEAR "poor") · hits("excellent")) ]
Step 2 continued
- Added 0.01 to the hits to avoid division by zero
- If hits(phrase NEAR excellent) and hits(phrase NEAR poor) are both ≤ 4, then eliminate the phrase
- Added AND (NOT host:epinions) to the queries so as not to include pages from the Epinions website itself
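A minimal sketch of this estimation step, assuming a hypothetical hits(query) function that returns a page count (AltaVista's NEAR operator no longer exists, so the query strings are illustrative):

```python
import math

def semantic_orientation(phrase, hits):
    """SO(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor"),
    estimated from search-engine hit counts; hits(query) is assumed."""
    h_exc = hits(f'{phrase} NEAR excellent AND (NOT host:epinions)')
    h_poor = hits(f'{phrase} NEAR poor AND (NOT host:epinions)')
    if h_exc <= 4 and h_poor <= 4:
        return None  # eliminate phrases with too few co-occurrence hits
    # 0.01 is added to the hits to avoid division by zero
    return math.log2(((h_exc + 0.01) * hits('poor')) /
                     ((h_poor + 0.01) * hits('excellent')))
```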
Step 3
- Calculate the average semantic orientation of the phrases in the given review (example below)
- If the average is positive, then recommended (thumbs up)
- If the average is negative, then not recommended (thumbs down)
Phrase POS tags SO
direct deposit JJ NN 1.288
local branch JJ NN 0.421
small part JJ NN 0.053
online service JJ NN 2.780
well other RB JJ 0.237
low fees JJ NNS 0.333
true service JJ NN -0.732
other bank JJ NN -0.850
inconveniently located RB VBN -1.541
Average Semantic Orientation 0.322
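Putting the three steps together, a sketch of the final classification, reusing extract_phrases and semantic_orientation from the earlier sketches:

```python
def classify_review(review, hits):
    """Average the SO of extracted phrases; positive average -> recommended."""
    sos = [so for phrase in extract_phrases(review)
           if (so := semantic_orientation(phrase, hits)) is not None]
    average = sum(sos) / len(sos) if sos else 0.0
    return "recommended" if average > 0 else "not recommended"
```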
Experiments
- 410 reviews from Epinions
- 170 (41%) not recommended
- 240 (59%) recommended
- Average phrases per review: 26
- Baseline accuracy: 59% (always guessing the majority class)
Domain Accuracy (%) Correlation
Automobiles 84.00 0.4618
Banks 80.00 0.6167
Movies 65.83 0.3608
Travel Destinations 70.53 0.4155
All 74.39 0.5174
Discussion
- What makes movies hard to classify?
- The average SO tends to classify a recommended movie as not recommended
- Evil characters make good movies
- The whole is not necessarily the sum of its parts
- Good beaches do not necessarily add up to a good vacation
- But good automobile parts usually add up to a good automobile
Applications
- Summary statistics for search engines
- Summarization of reviews
- Pick out the sentence with the highest positive/negative semantic orientation given a positive/negative review
- Filtering flames for newsgroups
- When the semantic orientation drops below a threshold, the message might be a potential flame
Questions ?
- Comments ?
- Observations ?
Thumbs Up? Sentiment Classification using Machine Learning Techniques
- Bo Pang, Lillian Lee and Shivakumar Vaithyanathan
- EMNLP-2002
Overview
- Consider the problem of classifying documents by overall sentiment
- Three machine learning methods, compared against baselines built from human-generated lists of words:
- Naïve Bayes
- Maximum Entropy
- Support Vector Machines
Experimental Data
- Movie-review domain
- Source: Internet Movie Database (IMDb)
- Star or numerical ratings converted into positive, negative, or neutral, so there is no need to hand-label the data for training or testing
- Maximum of 20 reviews per author per sentiment category
- 752 negative reviews
- 1,301 positive reviews
- 144 reviewers
List of Words Baseline
- Maybe there are certain words that people tend to use to express strong sentiments
- Classification is done by counting the number of positive and negative words in the document
- Random-choice baseline: 50%
Machine Learning Methods
- Bag-of-features framework
- f1, ..., fm: predefined set of m features
- ni(d): number of times fi occurs in document d
- Naïve Bayes: PNB(c | d) = P(c) · ∏i P(fi | c)^ni(d) / P(d)
Machine Learning Methods continued
- Maximum Entropy: PME(c | d) = exp( Σi λi,c Fi,c(d, c) ) / Z(d)
- where Fi,c is a feature/class function, the λi,c are learned weights, and Z(d) is a normalizing factor
- Support vector machines: find the hyperplane w that maximizes the margin between the classes. The solution of the constrained optimization problem is w = Σj αj cj dj, with αj ≥ 0
- cj ∈ {1, −1} is the correct class of document dj (a scikit-learn sketch follows)
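A minimal sketch of the bag-of-features setup using scikit-learn as a stand-in (an assumption: the paper used its own implementations, and load_reviews below is a hypothetical loader; LogisticRegression stands in for Maximum Entropy):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression  # stand-in for MaxEnt
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

docs, labels = load_reviews()  # hypothetical loader: texts plus pos/neg labels

# Binary presence of unigrams occurring at least 4 times, as in the paper
X = CountVectorizer(binary=True, min_df=4).fit_transform(docs)

for clf in (MultinomialNB(), LogisticRegression(max_iter=1000), LinearSVC()):
    scores = cross_val_score(clf, X, labels, cv=3)  # 3 folds, as in the paper
    print(type(clf).__name__, round(scores.mean(), 3))
```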
Evaluation
- 700 positive-sentiment and 700 negative-sentiment documents
- 3 equal-sized folds
- The tag NOT_ was added to every word between a negation word (not, isn't, didn't) and the first punctuation mark, so good and the NOT_good of not very good become distinct features (see the sketch after this list)
- Features
- 16,165 unigrams appearing at least 4 times in the 1,400-document corpus
- 16,165 most frequently occurring bigrams in the same data
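A sketch of the NOT_ tagging in Python (the negation list and tokenizer here are illustrative; the slide names only not, isn't, didn't):

```python
import re

NEGATIONS = {"not", "isn't", "didn't", "no", "never"}  # illustrative list

def add_not_tags(text):
    """Prefix NOT_ to every token between a negation word and the
    first following punctuation mark."""
    tokens = re.findall(r"[\w']+|[.,!?;]", text.lower())
    out, negating = [], False
    for tok in tokens:
        if tok in ".,!?;":
            negating = False
            out.append(tok)
        elif negating:
            out.append("NOT_" + tok)
        else:
            out.append(tok)
            if tok in NEGATIONS:
                negating = True
    return " ".join(out)

print(add_not_tags("This movie is not very good, but the cast shines."))
# -> this movie is not NOT_very NOT_good , but the cast shines .
```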
Results
- POS information was added to differentiate between I love this movie and This is a love story
Conclusion
- Results produced by the machine learning techniques are better than the human-generated baselines
- SVMs tend to do the best
- Unigram presence information is the most effective feature set
- Frequency vs. presence: feature presence works better than feature frequency
- Thwarted expectations: many words in a review can indicate the sentiment opposite to that of the entire review
- Some form of discourse analysis is necessary
Questions ?
- Comments ?
- Observations ?
Summarizing Scientific Articles: Experiments with Relevance and Rhetorical Status
- Simone Teufel and Marc Moens
- CL-2002
Overview
- Summarization of scientific articles: restore the discourse context of extracted material by adding the rhetorical status of each sentence in the document
- Gold-standard data for summaries, consisting of computational linguistics articles annotated with the rhetorical status and relevance of each sentence
- Supervised learning algorithm that classifies sentences into 7 rhetorical categories
Why?
- Knowledge about the rhetorical status of a sentence enables tailoring summaries to the user's expertise and task
- Nonexpert summary: background information and the general purpose of the paper
- Expert summary: no background; instead, the differences between this approach and similar ones
- Contrasts or complementarity among articles can be expressed
Rhetorical Status
- Generalizations about the nature of scientific texts provide information that enables the construction of better summaries
- Problem structure: problems (research goals), solutions (methods), and results
- Intellectual attribution: what the new contribution is, as opposed to previous work and background (generally accepted statements)
- Scientific argumentation
- Attitude toward other people's work: a rival approach, a prior approach with a fault, or an approach contributing parts of the authors' own solution
Metadiscourse and Agentivity
- Metadiscourse is an aspect of scientific argumentation and a way of expressing attitude toward previous work: "we argue that ...", "in contrast to common belief, we ..."
- Agent roles in argumentation: rivals, contributors of part of the solution (they), the entire research community, or the authors of the paper (we)
Citations and Relatedness
- Just knowing that an article cites another is often not enough
- One needs to read the context of the citation to understand the relation between the articles
- Article cited negatively or contrastively
- Article cited positively, or one where the citing authors state that their own work originates from the cited work
Rhetorical Annotation Scheme
- Only one category assigned to each full sentence
- Nonoverlapping, nonhierarchical scheme
- The rhetorical status is determined on the basis
of the global context of the paper
Relevance
- Select important content from the text
- Highly subjective; low human agreement
- A sentence is considered relevant if it describes the research goal or states a difference from a rival approach
- Other definitions: a sentence is relevant if it shows a high level of similarity with a sentence in the abstract
Corpus
- 80 conference articles
- Association for Computational Linguistics (ACL)
- European Chapter of the Association for Computational Linguistics (EACL)
- Applied Natural Language Processing (ANLP)
- International Joint Conference on Artificial Intelligence (IJCAI)
- International Conference on Computational Linguistics (COLING)
- XML markup added
The Gold Standard
- 3 task-trained annotators
- 17 pages of guidelines
- 20 hours of training
- No communication between the annotators
- Evaluation measures of the annotation:
- Stability
- Reproducibility
Results of Annotation
- Kappa coefficient K (Siegel and Castellan, 1988): K = (P(A) − P(E)) / (1 − P(E))
- where P(A) is the observed pairwise agreement and P(E) the agreement expected by chance
- Stability: K = .82, .81, .76 (N = 1,220 and k = 2)
- Reproducibility: K = .71
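A small sketch of the kappa computation for two annotators, following the Siegel and Castellan formulation (illustrative, not the authors' tooling; the category labels are examples):

```python
from collections import Counter

def kappa(labels_a, labels_b):
    """K = (P(A) - P(E)) / (1 - P(E)) for two annotators."""
    n = len(labels_a)
    p_agree = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # chance agreement P(E) from the pooled category distribution
    pooled = Counter(labels_a) + Counter(labels_b)
    p_chance = sum((count / (2 * n)) ** 2 for count in pooled.values())
    return (p_agree - p_chance) / (1 - p_chance)

print(kappa(["AIM", "OWN", "OTHER", "OWN"], ["AIM", "OWN", "OWN", "OWN"]))
```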
The System
- Supervised machine learning: Naïve Bayes
Features
- Absolute location of a sentence
- Limitations of the authors' own method can be expected to be found toward the end, while limitations of other researchers' work are discussed in the introduction
Features continued
- Section structure: relative and absolute position of the sentence within its section
- First, last, second or third, second-last or third-last, or somewhere in the first, second, or last third of the section
- Paragraph structure: relative position of the sentence within its paragraph
- Initial, medial, or final
Features continued
- Headlines: type of headline of the current section
- Introduction, Implementation, Example, Conclusion, Result, Evaluation, Solution, Experiment, Discussion, Method, Problems, Related Work, Data, Further Work, Problem Statement, or Non-Prototypical
- Sentence length
- Longer or shorter than 12 words (threshold)
Features continued
- Title word contents: does the sentence contain words that also occur in the title?
- TF*IDF word contents
- High values for words that occur frequently in one document but rarely in the overall collection of documents
- Do any of the 18 highest-scoring TF*IDF words occur in the sentence? (see the sketch after this list)
- Verb syntax: voice, tense, and modal linguistic features
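A sketch of three of these surface features in Python (the exact TF*IDF weighting and the data layout are assumptions; the paper does not fix an implementation here):

```python
import math
from collections import Counter

def surface_features(sentence, title, docs, doc_index):
    """Title-word, length, and TF*IDF features for one sentence.
    `docs` is a list of documents, each a list of lowercased words."""
    words = set(sentence.lower().split())
    doc = docs[doc_index]

    # TF*IDF over the collection; keep the document's 18 top-scoring words
    tf = Counter(doc)
    idf = {w: math.log(len(docs) / sum(w in d for d in docs)) for w in tf}
    top18 = {w for w, _ in sorted(tf.items(), key=lambda kv: kv[1] * idf[kv[0]],
                                  reverse=True)[:18]}
    return {
        "contains_title_word": bool(words & set(title.lower().split())),
        "long_sentence": len(sentence.split()) > 12,  # threshold from the slide
        "contains_top_tfidf_word": bool(words & top18),
    }
```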
Features continued
- Citation
- Citation (self), citation (other), author name, or none; plus the location of the citation in the sentence (beginning, middle, or end)
- History: the most probable previous category
- AIM tends to follow CONTRAST
- Calculated as a second pass during training
Features continued
- Formulaic expressions: a list of phrases described by regular expressions, divided into 18 classes and comprising a total of 644 patterns (an illustrative sketch follows)
- Clustering the patterns into classes prevents data sparseness
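An illustrative sketch of such regular-expression classes (the class names and patterns below are invented examples, not the authors' actual 644 patterns):

```python
import re

# Invented examples of two formulaic-expression classes
FORMULAIC_CLASSES = {
    "GAP": [
        re.compile(r"to (our|my) knowledge", re.I),
        re.compile(r"has not been (addressed|studied)", re.I),
    ],
    "CONTINUATION": [
        re.compile(r"following the (approach|method) of", re.I),
        re.compile(r"we have (already )?(shown|argued)", re.I),
    ],
}

def formulaic_class(sentence):
    """Return the first formulaic-expression class that matches, if any."""
    for cls, patterns in FORMULAIC_CLASSES.items():
        if any(p.search(sentence) for p in patterns):
            return cls
    return None

print(formulaic_class("To our knowledge, this problem has not been addressed."))
```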
Features continued
- Agent: 13 types, 167 patterns
- The placeholder WORK_NOUN can be replaced by any of a set of 37 nouns, including theory, method, prototype, and algorithm
- Agent classes whose distribution was very similar to the overall distribution of the target categories were excluded
Features continued
- Action: 365 verbs clustered into 20 classes based on semantic concepts such as similarity and contrast
- PRESENTATION_ACTIONs: present, report, state
- RESEARCH_ACTIONs: analyze, conduct, define, and observe
- Negation is taken into account
System Evaluation
Feature Impact
- The most distinctive single feature is Location,
followed by SegAgent, Citations, Headlines, Agent
and Formulaic
Questions ?
- Comments ?
- Observations ?
Thank You !