Sentiment and Opinion - PowerPoint PPT Presentation

1
Sentiment and Opinion
  • Sept 4, 2007
  • Analysis of Social Media Seminar
  • William Cohen

2
Announcements
  • The first few classes will be lectures that cover some
    background
  • Some tools commonly used in analysis of social
    media
  • Some ideas that have been widely explored in
    social media
  • So, today: tools for sentiment and opinion!
  • Next week's class will give some background on
    graph analysis
  • The Web as a Graph: Measurements, Models and
    Methods, Kleinberg et al., invited survey at the
    International Conference on Combinatorics and
    Computing, 1999.
  • The PageRank Citation Ranking: Bringing Order to
    the Web, Page et al., 1999.
  • Will start splitting time with students soon
  • Enrolled students: expect to lead ½ a meeting
  • Inspire discussion!

3
Manual and Automatic Subjectivity and Sentiment
Analysis
Content cheerfully pilfered from this 250-slide
tutorial: EUROLAN SUMMER SCHOOL 2007, Semantics,
Opinion and Sentiment in Text, July 23-August 3,
University of Iasi, Romania
http://www.cs.pitt.edu/wiebe/tutorialsExtendedTalks.html
  • Jan Wiebe
  • Josef Ruppenhofer
  • Swapna Somasundaran
  • University of Pittsburgh

4
Some sentences expressing opinion or something
a lot like opinion
  • Wow, this is my 4th Olympus camera.
  • Most voters believe that he's not going to raise
    their taxes.
  • The United States fears a spill-over from the
    anti-terrorist campaign.
  • "We foresaw electoral fraud but not daylight
    robbery," Tsvangirai said.

5
One motivation: Opinion Question Answering
Q: What is the international reaction to the
reelection of Robert Mugabe as President of
Zimbabwe?
A: African observers generally approved of his
victory while Western governments denounced it.
6
More motivations
  • Product review mining: What features of the
    ThinkPad T43 do customers like and which do they
    dislike?
  • Review classification: Is a review positive or
    negative toward the movie?
  • Tracking sentiments toward topics over time: Is
    anger ratcheting up or cooling down?
  • Etc.

These are all ways to summarize one sort of
content that is common on blogs, bboards,
newsgroups, etc. -W
7
First, some early influential papers on opinion
8
Turney's paper
  • Goal: classify reviews as positive or
    negative.
  • Labels: Epinions "recommended" / "not recommended",
    as given by the review authors.
  • Method:
  • Find (possibly) meaningful phrases in the review
    (e.g., "bright display", "inspiring lecture", ...),
    based on POS patterns like ADJ NOUN
  • Estimate the semantic orientation of each candidate
    phrase
  • Assign the overall orientation of the review by
    averaging the orientation of its phrases
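
The method steps on this slide can be sketched roughly as follows. This is a minimal illustration and not Turney's implementation: it assumes the review is already POS-tagged, it uses only a single ADJ NOUN pattern, and the scores in the hypothetical table `PHRASE_SO` are made-up placeholders for orientations that would really be estimated from a separate corpus.

```python
from statistics import mean

# Hypothetical orientation scores, for illustration only. In the method
# described on the slide these are estimated from a separate corpus.
PHRASE_SO = {
    ("bright", "display"): 1.2,
    ("inspiring", "lecture"): 2.0,
    ("poor", "battery"): -1.5,
}


def candidate_phrases(tagged_tokens):
    """Yield (adjective, noun) pairs for each adjacent ADJ NOUN pattern."""
    for (w1, t1), (w2, t2) in zip(tagged_tokens, tagged_tokens[1:]):
        if t1 == "ADJ" and t2 == "NOUN":
            yield (w1.lower(), w2.lower())


def classify_review(tagged_tokens, phrase_so=PHRASE_SO):
    """Average the orientation of the known phrases in a review."""
    scores = [phrase_so[p] for p in candidate_phrases(tagged_tokens)
              if p in phrase_so]
    if not scores:
        return "unknown", 0.0
    avg = mean(scores)
    return ("recommended" if avg > 0 else "not recommended"), avg


# Toy review, tagged by hand (the tag names are an assumption).
review = [("The", "DET"), ("bright", "ADJ"), ("display", "NOUN"),
          ("cannot", "VERB"), ("save", "VERB"), ("the", "DET"),
          ("poor", "ADJ"), ("battery", "NOUN")]
label, avg = classify_review(review)
```

Because the decision is a plain average, one strongly negative phrase can outweigh a mildly positive one, which is one reason mixed reviews are hard for this approach.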

9
Semantic orientation (SO) of phrases
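
The body of this slide was not captured in the transcript. For reference, Turney (2002) defines semantic orientation through pointwise mutual information (PMI) with the seed words "excellent" and "poor", and estimates it from search-engine hit counts using a NEAR operator. The notation below is a restatement under that reading, not necessarily the slide's own.

```latex
\mathrm{PMI}(w_1, w_2) = \log_2 \frac{p(w_1 \wedge w_2)}{p(w_1)\, p(w_2)}

\mathrm{SO}(\mathit{phrase}) =
  \mathrm{PMI}(\mathit{phrase}, \text{excellent}) - \mathrm{PMI}(\mathit{phrase}, \text{poor})

\mathrm{SO}(\mathit{phrase}) \approx \log_2
  \frac{\mathrm{hits}(\mathit{phrase}\ \mathrm{NEAR}\ \text{excellent}) \cdot \mathrm{hits}(\text{poor})}
       {\mathrm{hits}(\mathit{phrase}\ \mathrm{NEAR}\ \text{poor}) \cdot \mathrm{hits}(\text{excellent})}
```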

12
William's picture of Jan's picture of this paper
[Diagram labels: Seeds ("excellent", "poor"); Separate corpus
(AltaVista); Distributional similarity (appear in same
contexts); Review]
13
Key ideas in Turney 2002
  • Simplification
  • Classify an entire document, not a piece of it.
    (Many reviews are mixed.)
  • Focus on what seems important
  • Extract semantically oriented words/phrases from
    the document. (Phrases are less ambiguous than
    words, e.g., "Even poor students will learn a lot
    from this lecture".)
  • Bootstrapping/semi-supervised learning
  • To assess the orientation of phrases, use some kind
    of contextual similarity of phrases

14
Pang et al., EMNLP 2002
15
Methods
  • Method one: count human-provided polar words
    (sort of like Turney)
  • E.g., "love, wonderful, best, great, superb, still,
    beautiful" vs. "bad, worst, stupid, waste, boring,
    ?, !" gives 69% accuracy on 700+/700- movie
    reviews
  • Method two: plain ol' text classification
  • E.g., Naïve Bayes with bag of words: 78.7%; SVM-light
    with set of words: 82.9%, which was the best result
  • Follow-up work (ACL 2004) improves on this by
  • Classifying based on the most subjective
    sentences
  • Using discourse (proximity) to help predict
    subjectivity
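
Method one is simple enough to sketch directly. The two cue lists below are the ones quoted on the slide; the tokenizer and the handling of ties are assumptions made for this illustration and are not taken from the paper.

```python
import re

# Cue lists as quoted on the slide ("?" and "!" count as negative cues).
POSITIVE = {"love", "wonderful", "best", "great", "superb", "still", "beautiful"}
NEGATIVE = {"bad", "worst", "stupid", "waste", "boring", "?", "!"}


def tokenize(text):
    """Lowercase the text and split it into words plus '?' and '!' marks."""
    return re.findall(r"[a-z']+|[?!]", text.lower())


def classify_by_counts(text):
    """Label a review by comparing counts of positive and negative cues."""
    tokens = tokenize(text)
    pos = sum(tok in POSITIVE for tok in tokens)
    neg = sum(tok in NEGATIVE for tok in tokens)
    if pos == neg:
        return "tie"  # assumption: ties are reported rather than broken
    return "positive" if pos > neg else "negative"


print(classify_by_counts("A wonderful, beautiful film with a great cast."))
print(classify_by_counts("What a stupid, boring waste of time!"))
```

A counter like this ignores negation and context entirely, which fits the slide's point that a learned classifier does noticeably better.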

16
Pang, Lee, Vaithyanathan EMNLP 2002
A different approach
  • Movie review classification using Naïve Bayes,
    Maximum Entropy, SVM
  • Results do not reach levels achieved in topic
    categorization
  • Various feature combinations (unigram, bigram,
    POS, text position)
  • Unigram presence works best
  • Challenge: discourse structure
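
To make "unigram presence" concrete, here is a small Naïve Bayes sketch in which each distinct word counts once per document, however often it occurs. It is an illustration under simplifying assumptions (whitespace tokenization, add-one smoothing, a four-document toy training set) and not the experimental setup of the paper. The class name `PresenceNaiveBayes` is hypothetical.

```python
import math
from collections import Counter, defaultdict


def presence_features(text):
    """Unigram presence: the set of distinct lowercase tokens."""
    return set(text.lower().split())


class PresenceNaiveBayes:
    def fit(self, docs, labels):
        self.class_docs = Counter(labels)        # documents per class
        self.word_counts = defaultdict(Counter)  # per class: docs containing word
        self.vocab = set()
        for text, label in zip(docs, labels):
            feats = presence_features(text)
            self.word_counts[label].update(feats)
            self.vocab |= feats
        return self

    def predict(self, text):
        feats = presence_features(text) & self.vocab  # ignore unseen words
        n_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_docs.items():
            total = sum(self.word_counts[label].values())
            score = math.log(n / n_docs)  # log prior
            for w in feats:
                # Add-one (Laplace) smoothing over the vocabulary.
                score += math.log((self.word_counts[label][w] + 1)
                                  / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


train_docs = ["a wonderful superb film",
              "great acting and a beautiful score",
              "a boring stupid plot",
              "bad script and a waste of time"]
train_labels = ["pos", "pos", "neg", "neg"]
model = PresenceNaiveBayes().fit(train_docs, train_labels)
```

Switching `presence_features` to return token counts instead of a set would give the frequency-based variant that the slide says performs worse.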

17
Manual and Automatic Subjectivity and Sentiment
Analysis
  • Jan Wiebe
  • Josef Ruppenhofer
  • Swapna Somasundaran
  • University of Pittsburgh

18
Everyone knows that dragons don't exist. But
while this simplistic formulation may satisfy the
layman, it does not suffice for the scientific
mind. The School of Higher Neantical Nillity is
in fact wholly unconcerned with what does exist.
Indeed, the banality of existence has been so
amply demonstrated, there is no need for us to
discuss it any further here. The brilliant
Cerebron, attacking the problem analytically,
discovered three distinct kinds of dragon: the
mythical, the chimerical, and the purely
hypothetical. They were all, one might say,
nonexistent, but each nonexisted in an entirely
different way... - Stanislaw Lem, The Cyberiad
19
Preliminaries
  • What do we mean by subjectivity?
  • The linguistic expression of somebody's emotions,
    sentiments, evaluations, opinions, beliefs,
    speculations, etc.
  • Wow, this is my 4th Olympus camera.
  • Staley declared it to be "one hell of a
    collection".
  • Most voters believe that he's not going to raise
    their taxes.

20
Corpus Annotation
Wiebe, Wilson, Cardie 2005
Annotating Expressions of Opinions and
Emotions in Language
Leaving aside what's possible, what sort of
inferences about sentiment, opinion, etc. would we
like to be able to make?
21
Overview
  • Fine-grained: expression level rather than
    sentence or document level
  • The photo quality was the best that I have seen
    in a camera.
  • Annotate:
  • expressions of opinions, evaluations, emotions
  • material attributed to a source, but presented
    objectively

22
Overview
  • Fine-grained: expression level rather than
    sentence or document level
  • The photo quality was the best that I have seen
    in a camera.
  • Annotate:
  • expressions of opinions, evaluations, emotions,
    beliefs
  • material attributed to a source, but presented
    objectively

23
Overview
  • Opinions, evaluations, emotions, speculations are
    private states.
  • They are expressed in language by subjective
    expressions.

Private state: a state that is not open to
objective observation or verification.
Quirk, Greenbaum, Leech, Svartvik (1985). A
Comprehensive Grammar of the English Language.
24
Overview
  • Focus on three ways private states are expressed
    in language
  • Direct subjective expressions
  • Expressive subjective elements
  • Objective speech events

25
Direct Subjective Expressions
  • Direct mentions of private states:
  • The United States fears a spill-over from the
    anti-terrorist campaign.
  • Private states expressed in speech events:
  • "We foresaw electoral fraud but not daylight
    robbery," Tsvangirai said.

This implies a private state
26
Expressive Subjective Elements (Banfield 1982)
  • "We foresaw electoral fraud but not daylight
    robbery," Tsvangirai said.
  • The part of the US human rights report about
    China is full of absurdities and fabrications.

"We foresaw difficulties with the electoral
process but not to this extent," Tsvangirai
said.
The part of the US human rights report
about China contains many statements that we were
unable to verify.
27
Objective Speech Events
  • Material attributed to a source, but presented as
    objective fact
  • The government, it added, has amended the
    Pakistan Citizenship Act 10 of 1951 to enable
    women of Pakistani descent to claim Pakistani
    nationality for their children born to foreign
    husbands.

What does this have to do with opinion? You need
it to sort out who has opinions about what -W
28
Nested Sources
(Writer)
29
Nested Sources
(Writer, Xirao-Nima)
30
Nested Sources
(Writer, Xirao-Nima)
31
"The report is full of absurdities," Xirao-Nima
said the next day.

Objective speech event
  anchor: the entire sentence
  source:
  implicit: true

Direct subjective
  anchor: said
  source:
  intensity: high
  expression intensity: neutral
  attitude type: negative
  target: report

Expressive subjective element
  anchor: full of absurdities
  source:
  intensity: high
  attitude type: negative

Attributes
  • The anchor is the linguistic expression (the stretch
    of text) that tells us that there is a private state.
    (Where to hang the annotation. -W)
  • The source is the person to whom the private state
    is attributed. Note that this can be a chain of
    people.
  • The target is the content of the private state, or
    what the private state is about.
  • Attitude type: if not specified, it is to be
    understood as neutral, but it can be set to positive
    or negative as required.
  • Intensity records the intensity of the private state
    as a whole. (What? -W)
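
One way to picture the three annotation frames on this slide is as plain records. The sketch below is an assumption about a convenient in-memory form, not the format of the released corpus. The class `Frame` and helper `opinions_of` are hypothetical names, and the `source` chains are filled in from the Nested Sources slides because the transcript lost them here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Frame:
    kind: str                  # frame type, e.g. "direct subjective"
    anchor: str                # stretch of text the annotation hangs on
    source: Tuple[str, ...]    # nested chain, outermost (writer) first
    intensity: Optional[str] = None
    expression_intensity: Optional[str] = None
    attitude_type: Optional[str] = None  # None is read as neutral
    target: Optional[str] = None
    implicit: bool = False


sentence = '"The report is full of absurdities," Xirao-Nima said the next day.'

# Source chains are an assumption taken from the Nested Sources slides.
frames = [
    Frame("objective speech event", anchor=sentence,
          source=("writer",), implicit=True),
    Frame("direct subjective", anchor="said",
          source=("writer", "Xirao-Nima"), intensity="high",
          expression_intensity="neutral", attitude_type="negative",
          target="report"),
    Frame("expressive subjective element", anchor="full of absurdities",
          source=("writer", "Xirao-Nima"), intensity="high",
          attitude_type="negative"),
]


def opinions_of(frames, holder):
    """Frames with a non-neutral attitude whose innermost source is holder."""
    return [f for f in frames
            if f.source[-1] == holder
            and f.attitude_type in ("positive", "negative")]


for f in opinions_of(frames, "Xirao-Nima"):
    print(f.kind, "->", f.attitude_type, "about", f.target)
```

Keeping the source as a chain rather than a single name is what lets a query separate what the writer asserts from what the writer reports Xirao-Nima as saying.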
32
Corpus
  • www.cs.pitt.edu/mqpa/databaserelease (version 2)
  • English language versions of articles from the
    world press (187 news sources)
  • Themes of the instructions:
  • No rules about how particular words should be
    annotated.
  • Don't take expressions out of context and think
    about what they could mean, but judge them as
    they are used in that sentence.
  • Inter-annotator agreement: kappa around 0.7-0.8.
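
For readers unfamiliar with kappa, the sketch below computes Cohen's kappa for two annotators, kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is the agreement expected by chance. The slide does not say which variant of kappa was used, so this form is an assumption, and the ten labels are invented toy data.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Invented toy annotations: is each expression subjective or objective?
ann_1 = ["subj", "subj", "obj", "obj", "subj", "obj", "subj", "obj", "subj", "obj"]
ann_2 = ["subj", "subj", "obj", "obj", "subj", "obj", "obj", "obj", "subj", "subj"]
kappa = cohen_kappa(ann_1, ann_2)
```

Here the annotators agree on 8 of 10 items, but half of that agreement is expected by chance, so kappa is lower than the raw 80% agreement suggests.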

33
More reasons for fine-grain annotation and
analysis
  • Turney and Pang et al.: document D is about a known
    product P_D, and sentiment refers to P_D. Life is more
    complicated:
  • The part of the US human rights report about
    China is full of absurdities and fabrications
  • What is absurd and fabricated? The part, the US,
    the report, or China?
  • For sentiment about products we want to know what
    is good or bad; there are usually tradeoffs:
  • Huge screen <-> very heavy
  • Very fast <-> really expensive

34
And more
  • Sentiment in action!

35
Demos
1) opinmind.com
  • Searches for positive/negative sentiments about a
    search term
  • Sample queries: iphone; google vs. microsoft
    (http://www.opinmind.com/search.jsp?qgooglevsmicrosoft);
    cmu vs. stanford
36
Demos
2) opine (Ana-Maria Popescu, Bao Nguyen, Oren
Etzioni)
  • Sentiment-feature labeling of hotel reviews
  • http://www.cs.washington.edu/research/knowitall/opine/
  • Sample: new york, new york; attribute: bed
37
Demos
3) OASYS
  • Sentiment analysis of news sources, sliced by
    country and source
  • http://oasys.umiacs.umd.edu/oasysnew/oasys.php
  • login/password: guest3 (run w/ allow pop-ups)
  • Does anaphora resolution
  • Sample queries: Musharraf, Karzai, Apple, Dell
38
Demos
4) TextMap
  • Entity sentiment over news and blogs
  • News: http://www.textmap.com
  • Blog: http://www.textblg.com
  • http://www.icwsm.org/papers/3--Godbole-Srinivasaiah-Skiena.pdf
  • Daily sentiment report
  • Top entities
  • Heatmaps
39
Demos
5) Moodviews
  • http://ilps.science.uva.nl/MoodViews/
  • Moodteller - predict aggregate mood from text
  • Moodspotter - explain discrepancies between
    predicted and actual aggregate mood
40
And more
  • Quick overview of some of Jan's other slides

41
Dave, Lawrence, Pennock 2003
Mining the Peanut Gallery: Opinion Extraction and
Semantic Classification of Product Reviews
  • Product-level review-classification
  • Train Naïve Bayes classifier using a corpus of
    self-tagged reviews available from major web
    sites (Cnet, amazon)
  • Refine the classifier using the same corpus
    before evaluating it on sentences mined from
    broad web searches

42
Hu & Liu 2004
Mining Opinion Features in Customer Reviews
  • Here: explicit product features only, expressed
    as nouns or compound nouns
  • Use an association rule mining technique rather than
    a symbolic or statistical approach to terminology
  • Extract associated items (item-sets) based on
    support (1%)
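
The support-based extraction can be illustrated with a toy frequent-itemset counter. This sketch enumerates itemsets by brute force instead of running a real association rule miner such as Apriori, so it only suits tiny inputs. The function name, the sample sentences and the 0.4 threshold are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def frequent_feature_sets(transactions, min_support=0.01, max_size=2):
    """Map each itemset (a sorted tuple) to its support if it is frequent.

    Each transaction is the set of nouns found in one review sentence.
    Support is the fraction of transactions that contain the itemset.
    """
    counts = Counter()
    for nouns in transactions:
        items = sorted(set(nouns))
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    n = len(transactions)
    return {itemset: c / n for itemset, c in counts.items()
            if c / n >= min_support}


# Invented noun sets from five review sentences about a camera.
sentences = [
    {"picture", "quality"},
    {"battery", "life"},
    {"picture", "quality", "zoom"},
    {"battery", "life", "price"},
    {"picture"},
]
frequent = frequent_feature_sets(sentences, min_support=0.4)
for itemset, support in sorted(frequent.items()):
    print(itemset, round(support, 2))
```

Compound features such as ("picture", "quality") survive because the two nouns co-occur often enough, while one-off nouns like "zoom" fall below the threshold.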

43
Yi & Niblack 2005
Sentiment mining in WebFountain
44
Takamura et al. 2007
Extracting Semantic Orientations of Phrases from Dictionary
  • Use a Potts model to categorize Adj-Noun phrases
  • Targets ambiguous adjectives like "low", "high",
    "small", "large"
  • Connect two nouns, if one appears in gloss of
    other
  • Nodes have orientation values (pos, neg, neu) and
    are connected by same or different orientation
    links

45
Popescu & Etzioni 2005
  • Report on a product review mining system that
    extracts and labels opinion expressions and their
    attributes
  • They use the relaxation-labeling technique from
    computer vision to perform unsupervised
    classification satisfying local constraints
    (which they call neighborhood features)
  • The system tries to solve several classification
    problems (e.g. opinion and target finding) at the
    same time rather than separately.

46
Blog analysis
  • Analysis of sentiments on Blog posts
  • Chesley et al.(2006)
  • Perform subjectivity and polarity classification
    on blog posts
  • Sentiment has been used for blog analysis
  • Balog et al. (2006)
  • Discover irregularities in temporal mood patterns
    (fear, excitement, etc) appearing in a large
    corpus of blogs
  • Kale et al. (2007)
  • Use link polarity information to model trust and
    influence in the blogosphere
  • Blog sentiment has been used in applications
  • Mishne and Glance (2006)
  • Analyze blog sentiment about movies and
    correlate it with their sales

47
Trends & Buzz
  • Stock market
  • Koppel & Shtrimberg (2004)
  • Correlate positive/negative news stories about
    publicly traded companies and the stock price
    changes
  • Market Intelligence from message boards, forums,
    blogs.
  • Glance et al. (2005)

48
Bethard et al. 2004
Automatic Extraction of Opinion Propositions and
their Holders
  • Find verbs that express opinions in propositional
    form, and their holders
  • Still, Vista officials realize they're relatively
    fortunate.
  • Modify algorithms developed in earlier work on
    semantic parsing to perform binary classification
    (opinion or not)
  • Use presence of subjectivity clues to identify
    opinionated uses of verbs

49
Choi et al. 2005
Identifying sources of opinions with conditional
random fields and extraction patterns
  • Treats source finding as a combined sequential
    tagging and information extraction task
  • IE patterns are high precision, lower recall
  • Base CRF uses information about noun phrase
    semantics, morphology, syntax
  • IE patterns connect opinion words to sources
  • Conditional Random Fields given IE features
    perform better than CRFs alone

50
Kim & Hovy 2006
Extracting opinions, opinion holders, and topics
expressed in online news media text
  • Perform semantic role labeling (FrameNet) for a
    set of adjectives and verbs (pos, neg)
  • Map semantic roles to holder and target
  • E.g., for the Desiring frame: Experiencer -> Holder
  • Train on FN data, test on FN data and on news
    sentences collected and annotated by the authors'
    associates
  • Precision is higher for topics, recall for
    holders

51
Choi, Breck, Cardie 2006
Joint extraction of entities and relations for
opinion recognition
  • Find direct expressions of opinions and their
    sources jointly
  • Uses sequence-tagging CRF classifiers for opinion
    expressions, sources, and potential link
    relations
  • Integer linear programming combines local
    knowledge and incorporates constraints
  • Performance better even on the individual tasks

52
2007 NLP papers: NAACL
  • N07-1037: Hiroya Takamura, Takashi Inui,
    Manabu Okumura. Extracting Semantic Orientations
    of Phrases from Dictionary
  • N07-1038: Benjamin Snyder, Regina
    Barzilay. Multiple Aspect Ranking Using the Good
    Grief Algorithm
  • N07-1039: Kenneth Bloom, Navendu Garg,
    Shlomo Argamon. Extracting Appraisal Expressions

53
2007 NLP papers: ACL 1
  • P07-1053: Anindya Ghose, Panagiotis
    Ipeirotis, Arun Sundararajan. Opinion Mining using
    Econometrics: A Case Study on Reputation Systems
  • P07-1054: Andrea Esuli, Fabrizio
    Sebastiani. PageRanking WordNet Synsets: An
    Application to Opinion Mining
  • P07-1055: Ryan McDonald, Kerry Hannan,
    Tyler Neylon, Mike Wells, Jeff Reynar. Structured
    Models for Fine-to-Coarse Sentiment Analysis
  • P07-1056: John Blitzer, Mark Dredze,
    Fernando Pereira. Biographies, Bollywood,
    Boom-boxes and Blenders: Domain Adaptation for
    Sentiment Classification

54
2007 NLP papers: ACL 2
  • P07-1123: Rada Mihalcea, Carmen Banea,
    Janyce Wiebe. Learning Multilingual Subjective
    Language via Cross-Lingual Projections
  • P07-1124: Ann Devitt, Khurshid
    Ahmad. Sentiment Polarity Identification in
    Financial News: A Cohesion-based Approach
  • P07-1125: Ben Medlock, Ted Briscoe. Weakly
    Supervised Learning for Hedge Classification in
    Scientific Literature

55
2007 NLP papers: EMNLP
  • D07-1113: Soo-Min Kim, Eduard Hovy. Crystal:
    Analyzing Predictive Opinions on the Web
  • D07-1114: Nozomi Kobayashi, Kentaro Inui,
    Yuji Matsumoto. Extracting Aspect-Evaluation and
    Aspect-Of Relations in Opinion Mining
  • D07-1115: Nobuhiro Kaji, Masaru
    Kitsuregawa. Building Lexicon for Sentiment
    Analysis from Massive Collection of HTML
    Documents

56
Bibliographies
  • Bibliography of papers in this tutorial
  • www.cs.pitt.edu/wiebe/eurolan07.bib
  • www.cs.pitt.edu/wiebe/eurolan07.html
  • Andrea Esuli's extensive Sentiment
    Classification bibliography (not limited to
    sentiment or classification)
  • http://liinwww.ira.uka.de/bibliography/Misc/Sentiment.html

57
Yahoo! Group
  • SentimentAI
  • http://tech.groups.yahoo.com/group/SentimentAI/