Title: Recognising Emotional and Evaluative Language in Text
1. Recognising Emotional and Evaluative Language in Text
- Jonathon Read
- Email: j.l.read_at_sussex.ac.uk
- Homepage: http://www.sussex.ac.uk/Users/jlr24/
2. Jon Read
- Brighton, England
- DPhil student at the University of Sussex, supervised by Dr John Carroll
- Research interests:
  - Evaluative and emotional language
  - Sentiment analysis
3. Presentation Outline
- Recognising emotion in text
- Examples of dependencies in machine-learning techniques for sentiment classification
- Using diverse training data for sentiment classification
- Future work and directions for the DPhil thesis
4. Recognising Emotion in Text
- Masters project, Summer 2004
- How can we computationally recognise the emotional (affective) states of authors from their text?
5. Test Data Acquisition
- Researchers have recently compiled collections of blog posts labelled by their authors (Mishne 2005)
- Corpora annotated or labelled with emotion were uncommon at the time
- Built an original collection: Fifty-Word Fiction
  - 155 texts, 756 sentences, 7,750 words
6. A Two-Factor Structure of Affect
[Figure: the affect circumplex of Watson and Tellegen (1985), with example mood words for each octant]
- Strong Engagement: aroused, astonished, surprised
- High Positive Affect: active, enthusiastic, excited
- Pleasantness: happy, pleased, kindly
- High Negative Affect: distressed, hostile, nervous
- Low Negative Affect: calm, placid, relaxed
- Unpleasantness: sad, lonely, grouchy
- Low Positive Affect: drowsy, dull, sleepy
- Disengagement: quiescent, quiet, still
7. Affect Annotation Experiment
- 1 month, 49 coders, 3,301 annotations
8. Affect Annotation Experiment
- Expert coder
- Expert choice: the most frequently chosen class
- Human coders assessed with the Kappa Coefficient of Agreement (Carletta 1996), sketched below
- A human coder's annotations were ignored if their K score fell below a threshold
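For reference, a minimal sketch of the Kappa calculation for two coders (the 49-coder study would call for a multi-coder generalisation such as Fleiss' Kappa; the label lists below are hypothetical):

    from collections import Counter

    def cohens_kappa(labels_a, labels_b):
        """Chance-corrected agreement between two coders (cf. Carletta 1996)."""
        n = len(labels_a)
        # Observed agreement: proportion of items the coders label identically.
        p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
        # Expected agreement: chance of an identical label, estimated from
        # each coder's marginal label distribution.
        freq_a, freq_b = Counter(labels_a), Counter(labels_b)
        p_e = sum((freq_a[c] / n) * (freq_b[c] / n)
                  for c in freq_a.keys() | freq_b.keys())
        return (p_o - p_e) / (1 - p_e)

    # Hypothetical affect labels from two coders over five sentences.
    print(cohens_kappa(["hpa", "hna", "hpa", "pl", "hna"],
                       ["hpa", "hna", "pl", "pl", "hna"]))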
9. Sentiment Annotations
10. Affect Annotations
11. Annotations Usefulness
- Number of classes (s)
- Number of annotations made (n)
- Number of annotations made to the most-annotated class (a); see the sketch below
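The slide lists the ingredients but not the formula itself; one plausible usefulness score, assumed here rather than taken from the talk, normalises the majority proportion a/n against the chance level 1/s:

    def usefulness(s, n, a):
        # Assumed formulation: 0 when the majority class is at chance level,
        # 1 when every annotation agrees. Not confirmed as the talk's formula.
        chance = 1 / s
        return ((a / n) - chance) / (1 - chance)

    print(usefulness(s=8, n=10, a=9))  # strong consensus -> near 1
    print(usefulness(s=8, n=10, a=2))  # weak consensus -> near 0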
12. Sentiment Annotations
13. Affect Annotations
14. Affect Annotation Experiment
- A supervised model is impractical
- Instead, base a semi-supervised model on SO-PMI-IR (Turney 2002)
- Output represents a location on an axis
- Paradigm words provide seeds
15. SO-PMI-IR
- Semantic Orientation using Pointwise Mutual Information and Information Retrieval (Turney 2002); the calculation is sketched below
- Accuracy of 74.4% in recognising the sentiment (positive or negative) of product reviews in a variety of domains
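A minimal sketch of the SO-PMI-IR calculation as described by Turney (2002); hits() is a stand-in for a search-engine hit count, and the single seed pair excellent/poor follows the original paper:

    import math

    def so_pmi(phrase, hits, pos_word="excellent", neg_word="poor"):
        # hits(query) is a placeholder for a search-engine hit count
        # (originally AltaVista's NEAR operator); 0.01 smooths zero counts.
        pos = hits(f'"{phrase}" NEAR "{pos_word}"') + 0.01
        neg = hits(f'"{phrase}" NEAR "{neg_word}"') + 0.01
        # Positive score: the phrase co-occurs more with "excellent" than
        # with "poor", relative to each seed word's overall frequency.
        return math.log2((pos * hits(f'"{neg_word}"')) /
                         (neg * hits(f'"{pos_word}"')))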
16. AO-PMI-IR
- Evaluate SO-PMI-IR for each dimension in the Two-Factor Structure of Affect (see the sketch below)
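A sketch of that extension: the SO-PMI-IR calculation is run once per axis, seeded with high-pole and low-pole paradigm words. hits() is again a stand-in for corpus counts, and the seed lists are abbreviated from slide 6:

    import math

    # Abbreviated seeds from slide 6; the experiment expanded these via WordNet.
    AXES = {
        "positive affect": (["enthusiastic", "excited"], ["drowsy", "sleepy"]),
        "negative affect": (["distressed", "nervous"], ["calm", "relaxed"]),
    }

    def ao_pmi(phrase, hits, axes=AXES):
        scores = {}
        for axis, (high, low) in axes.items():
            hi = sum(hits(f'"{phrase}" NEAR "{w}"') for w in high) + 0.01
            lo = sum(hits(f'"{phrase}" NEAR "{w}"') for w in low) + 0.01
            hi_prior = sum(hits(f'"{w}"') for w in high) + 0.01
            lo_prior = sum(hits(f'"{w}"') for w in low) + 0.01
            # Each score locates the phrase on one axis of the structure.
            scores[axis] = math.log2((hi * lo_prior) / (lo * hi_prior))
        return scores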
17. AO-PMI-IR
18. PMI-IR and Very Large Corpora
- Turney (2002) used the World Wide Web to obtain frequency counts, via AltaVista
- AltaVista has since stopped providing a NEAR operator in its search engine
- Waterloo MultiText System (about 1 terabyte)
19. Paradigm Word Selection
- Mood words from the Two-Factor Structure of Affect
- Obviously ambiguous words dropped (active, content, dull, still, strong)
- Remaining words used as starting points to derive a list of synonyms using WordNet, as sketched below
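A minimal sketch of that expansion step using NLTK's WordNet interface (the original may have queried WordNet differently, e.g. restricting senses by hand):

    from nltk.corpus import wordnet as wn  # needs nltk plus the wordnet data

    def expand(seed):
        # Collect lemma names from every synset containing the seed word.
        lemmas = set()
        for synset in wn.synsets(seed):
            for lemma in synset.lemmas():
                lemmas.add(lemma.name().replace("_", " "))
        lemmas.discard(seed)
        return sorted(lemmas)

    print(expand("excited"))  # lemmas such as 'aroused', 'emotional', ...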
20. Experiment Baselines
- Baseline 1: with prior knowledge of the distribution, choose the most frequently occurring type (which is unclassifiable)
- Baseline 2: choose a class at random
- (Both baselines are sketched below)
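A minimal sketch of both baselines, assuming plain lists of gold-standard labels:

    import random
    from collections import Counter

    def majority_baseline(train_labels, test_labels):
        # Baseline 1: always predict the most frequent training class.
        majority = Counter(train_labels).most_common(1)[0][0]
        return sum(y == majority for y in test_labels) / len(test_labels)

    def random_baseline(classes, test_labels, seed=0):
        # Baseline 2: predict a class uniformly at random.
        rng = random.Random(seed)
        return sum(rng.choice(classes) == y for y in test_labels) / len(test_labels)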
21. SO-PMI-IR Results
22. AO-PMI-IR Results
23. Misclassifications
- Distribution of misclassifications inspected
- SO-PMI-IR: fairly uniform
- AO-PMI-IR: biased towards Low Positive Affect
  - This class describes a lack of affect (e.g. being asleep)
- Few mismatches against opposite poles of the same axis
24. Accuracy vs. Annotator Agreement
25. AO-PMI-IR Summary
- An algorithm for the recognition of affect (emotion) in text, using pointwise mutual information
- Limited success, but outperforms a naïve baseline
- Can perhaps inform a more thorough approach to recognising affect
26. Sentiment Classification
- Determining an author's general feeling toward their subject; that is, is a unit of text generally positive or generally negative?
- Filtering flames (Spertus 1997)
- Recommender systems (Pang et al. 2002)
- Analysis of market trends (Dave et al. 2002)
27. Supervised Approach
- Pang et al. (2002)
  - Collated a corpus of movie reviews from an IMDb archive
  - Naïve Bayes, Maximum Entropy and Support Vector Machine classifiers
  - Trained using unigram and bigram features
  - Best result from an SVM, at around 83%
- Pang and Lee (2004)
  - Disregarding objective sentences (Wiebe et al. 2004) improves this to around 87%
28. Supervised Approach
- Engström (2004): a bag-of-words approach is topic-dependent
- Turney (2002):
  - Movie review: "unpredictable plot" → positive
  - Automobile review: "unpredictable steering" → negative
29. Dependencies in Sentiment Classification
- Experimental set-up (sketched below)
  - Classification tools: Naïve Bayes and SVMlight (Joachims 1999)
  - Feature selection: unigram presence (Pang et al. 2002)
  - Evaluation: 3-fold cross-validation
  - Significance determined using a paired-sample t-test
  - Each experiment involves training on one subset and testing on the others
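A sketch of a comparable set-up using scikit-learn stand-ins (the original used SVMlight and a separate Naïve Bayes implementation); binary=True gives unigram presence rather than frequency:

    from scipy.stats import ttest_rel
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import BernoulliNB
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    def compare(texts, labels):
        # Unigram *presence* features (Pang et al. 2002), not counts.
        nb = make_pipeline(CountVectorizer(binary=True), BernoulliNB())
        svm = make_pipeline(CountVectorizer(binary=True), LinearSVC())
        nb_acc = cross_val_score(nb, texts, labels, cv=3)
        svm_acc = cross_val_score(svm, texts, labels, cv=3)
        # Paired-sample t-test over the per-fold accuracies.
        _, p = ttest_rel(nb_acc, svm_acc)
        return nb_acc.mean(), svm_acc.mean(), p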
30. Dependencies in Sentiment Classification
- Cross-training/testing by topic
- Datasets from business news (Newswire):
  - Finance (FIN)
  - Mergers and Acquisitions (MA)
  - Mixed (MIX)
31. Dependencies in Sentiment Classification
32. Dependencies in Sentiment Classification
- Cross-training/testing by domain
- Datasets:
  - Business news (Newswire)
  - Movie reviews (Polarity 1.0) (Pang et al. 2002)
33. Dependencies in Sentiment Classification
34. Dependencies in Sentiment Classification
- Cross-training/testing by time period
- Datasets:
  - Movie reviews before 2002 (Polarity 1.0)
  - Movie reviews after 2002 (Polarity 2004)
- Available for download at http://www.sussex.ac.uk/Users/jlr24/data
35. Dependencies in Sentiment Classification
36. Dependencies in Sentiment Classification
- The performance of machine-learning techniques for sentiment classification depends on a good match between the training and test data, with respect to:
  - Topic,
  - Domain, and
  - Time period
37. Using Emoticons for Sentiment Classification
- Dependency can perhaps be solved by acquiring a large and diverse collection of general text annotated for sentiment
- Emoticons can perhaps be assumed to mark up text according to its sentiment, if we assume:
  - :-) is positive
  - :-( is negative
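A minimal labelling sketch under those assumptions; the regular expressions cover only a few common variants, not the full emoticon list used in the experiment:

    import re

    SMILE = re.compile(r"[:;]-?\)")  # :-) :) ;-) ;)
    FROWN = re.compile(r":-?\(")     # :-( :(

    def emoticon_label(text):
        smile, frown = SMILE.search(text), FROWN.search(text)
        if smile and not frown:
            return "positive"
        if frown and not smile:
            return "negative"
        return None  # no emoticon, or mixed signals

    print(emoticon_label("Just got the job :-)"))  # positive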
38. Using Emoticons for Sentiment Classification
- Usenet articles downloaded if they contained one of a list of smile and frown emoticons
39. Using Emoticons for Sentiment Classification
- Extracted a paragraph from an article if it contained a smile or a frown, and was English text
- 26,000 article extracts
- 50/50 split between positive and negative
- 748,685 words
- Available for download at http://www.sussex.ac.uk/Users/jlr24/data
40. Optimisation on Emoticons
- Emoticon corpus optimised for the sentiment classification task
- 4,000 articles held out
- Increasing articles in the training set from 2,000 to 22,000, in increments of 500
- Increasing context from 10 to 1,000 tokens, in increments of 10, under two schemes (sketched below):
  - Window around an emoticon
  - Before an emoticon
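A sketch of the two context schemes, given a tokenised extract and the position of its emoticon; the function and parameter names are illustrative:

    def context(tokens, emoticon_index, size, before_only=False):
        if before_only:
            # Scheme 2: `size` tokens preceding the emoticon.
            return tokens[max(0, emoticon_index - size):emoticon_index]
        # Scheme 1: a window of `size` tokens centred on the emoticon.
        half = size // 2
        left = tokens[max(0, emoticon_index - half):emoticon_index]
        right = tokens[emoticon_index + 1:emoticon_index + 1 + half]
        return left + right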
41. Optimisation on Emoticons
- Optimal parameters
- Naïve Bayes
  - Training: 22,000 articles
  - Context: 130 tokens in a window
- SVM
  - Training: 20,000 articles
  - Context: 150 tokens in a window
42. Initial Results
- Predicting sentiment of article extracts (10-fold cross-validation)
  - Naïve Bayes: 61.5%
  - SVM: 70.1%
- Predicting sentiment of movie reviews
  - Naïve Bayes: 59.1%
  - SVM: 52.1%
43. Optimisation on Reviews
- Optimisation repeated using held-out movie reviews from Polarity 1.0
- Naïve Bayes
  - Training: 21,000 articles
  - Context: 50 tokens in a window
- SVM
  - Training: 20,000 articles
  - Context: 510 tokens before
44. Experiments and Results
- Accuracy of emoticon-trained classifiers across business news topics
45. Experiments and Results
- Accuracy of emoticon-trained classifiers across the domains of news articles and movie reviews
46. Experiments and Results
- Accuracy of emoticon-trained classifiers across movie reviews from different time periods
47. Performance Summary
- Good at predicting Usenet article extracts
- Okay at predicting movie reviews
- Bad at predicting newswire articles
- Performance reasonably consistent over time periods
48. Coverage of the Emoticons Classifier
- Coverage of unique token types is low
- More training texts may improve coverage
- Other sources:
  - Online bulletin boards
  - Chat forums
  - Web logs
  - Google Groups
  - Usenet
49. Noise in Emoticons Training Data
- Optimising the SVM classifier against movie reviews
50. Noise in Emoticons Training Data
- Mixed sentiment: "Sorry about venting my frustration here but I just lost it. :-( Happy thanks giving everybody :-)"
- Sarcasm: "Thank you so much, thats really encouraging :-("
- Spelling mistakes: "The movies where for me a major desapointment :-("
51. Future Work and Directions
- Collect more examples of text marked up with emoticons
- Experiment with techniques to automatically remove noisy examples from the data (one simple filter is sketched below)
- Investigate the nature of dependency in sentiment classification
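As a hypothetical starting point for noise removal, a filter suggested by the mixed-sentiment example on the previous slide: discard extracts that carry both polarities, since their emoticon label is unreliable:

    import re

    SMILE, FROWN = re.compile(r"[:;]-?\)"), re.compile(r":-?\(")

    def drop_mixed(extracts):
        # Keep only extracts whose emoticons give a single, consistent signal.
        return [text for text in extracts
                if not (SMILE.search(text) and FROWN.search(text))]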
52. Future Work and Directions
- What is the nature of these dependencies?
- It seems classifiers may be learning authors' sentiment toward concepts, rather than the language associated with communicating emotion and evaluation
- Classifiers are not learning authors' sentiment toward named entities
- Perhaps classifiers learn the words associated with the sentiment of named entities: the "ice-axe effect"?
53. Future Work and Directions
- Refined dependency experiments:
  - Tag movie reviews with their precise year and perform cross-training/testing based on year
  - Remove named entities from training data
  - Remove all but one review from each author
  - Remove all but one review of a given movie
- If accuracy is reduced, this can be taken as evidence of dependencies
54. Future Work and Directions
- Feature engineering for machine learning:
  - OddsRatio can be employed to mark features with temporal senses (Liebscher and Belew 2005)
  - Richly-engineered features based on linguistic theory
  - Automatic feature induction, maximising performance whilst minimising dependency
[Figure: the token "SCHWARZENEGGER" shifting between SPORTS, MOVIES and POLITICS over TIME]
55. Future Work and Directions
- Improving the automatic acquisition of sentiment lexicons
- SO-PMI-IR
  - An independent measure, but performs with varying success in different domains (Turney 2002)
  - Identify the topic of a problem text, and supplement the paradigm words with these keywords?
- Distributional similarity
  - Distributional similarity (see the survey by Weeds (2003)) has been shown to be a reasonable approximation of semantic similarity (Curran and Moens 2002)
56. Future Work and Directions
- Finding a metric
- SO-PMI-IR and distributional similarity do not describe metric spaces
- A true metric may increase performance
- Example ordering along the negative-neutral-positive axis: worst → worse → okay → better → best
57. Appraisal Theory
- "An approach to exploring, describing and explaining the way language is used to evaluate, to adopt stances, to construct textual personas and to manage interpersonal positionings and relationships."
- http://www.grammatics.com/appraisal/
- J. R. Martin and P. R. R. White. 2005. The Language of Evaluation: Appraisal in English.
58. Appraisal Theory
59. Thank You!
- Email: j.l.read_at_sussex.ac.uk
- Homepage: http://www.sussex.ac.uk/Users/jlr24