Title: Recognising Emotional and Evaluative Language in Text
1. Recognising Emotional and Evaluative Language in Text
- Jonathon Read
- Email: j.l.read_at_sussex.ac.uk
- Homepage: http://www.sussex.ac.uk/Users/jlr24/
2. Jon Read
- Brighton, England
- DPhil student at the University of Sussex, supervised by Dr John Carroll
- Research interests:
  - Evaluative and emotional language
  - Sentiment analysis
3. Presentation Outline
- Recognising emotion in text
- Examples of dependencies in machine-learning techniques for sentiment classification
- Using diverse training data for sentiment classification
- Future work and directions for the DPhil thesis
4. Recognising Emotion in Text
- Masters project, Summer 2004
- How can we computationally recognise the emotional (affective) states of authors from their text?
5. Test Data Acquisition
- Researchers have recently compiled collections of blog posts labelled by their authors (Mishne 2005)
- Corpora annotated or labelled with emotion were uncommon at the time
- Built an original collection: Fifty-Word Fiction
  - 155 texts, 756 sentences, 7,750 words
6. A Two-Factor Structure of Affect
[Figure: the affect circumplex of Watson and Tellegen (1985), with example mood words for each octant]
- Strong Engagement: aroused, astonished, surprised
- High Positive Affect: active, enthusiastic, excited
- Pleasantness: happy, pleased, kindly
- High Negative Affect: distressed, hostile, nervous
- Low Negative Affect: calm, placid, relaxed
- Unpleasantness: sad, lonely, grouchy
- Low Positive Affect: drowsy, dull, sleepy
- Disengagement: quiescent, quiet, still
7. Affect Annotation Experiment
- 1 month, 49 coders, 3,301 annotations
8. Affect Annotation Experiment
- Expert coder
- Expert choice: the most frequently chosen class
- Human coders assessed with the Kappa Coefficient of Agreement (Carletta 1996), sketched below
- A human coder's annotations were ignored if their K score fell below a threshold
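For reference, a minimal sketch of the Kappa calculation for two coders (the 49-coder study would call for a multi-coder generalisation such as Fleiss' Kappa; the label lists below are hypothetical):

    from collections import Counter

    def cohens_kappa(labels_a, labels_b):
        """Chance-corrected agreement between two coders (cf. Carletta 1996)."""
        n = len(labels_a)
        # Observed agreement: proportion of items the coders label identically.
        p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
        # Expected agreement: chance of an identical label, estimated from
        # each coder's marginal label distribution.
        freq_a, freq_b = Counter(labels_a), Counter(labels_b)
        p_e = sum((freq_a[c] / n) * (freq_b[c] / n)
                  for c in freq_a.keys() | freq_b.keys())
        return (p_o - p_e) / (1 - p_e)

    # Hypothetical affect labels from two coders over five sentences.
    print(cohens_kappa(["hpa", "hna", "hpa", "pl", "hna"],
                       ["hpa", "hna", "pl", "pl", "hna"]))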
9. Sentiment Annotations
10. Affect Annotations
11. Annotations Usefulness
- Number of classes (s)
- Number of annotations made (n)
- Number of annotations made to the most-annotated class (a); see the sketch below
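The slide lists the ingredients but not the formula itself; one plausible usefulness score, assumed here rather than taken from the talk, normalises the majority proportion a/n against the chance level 1/s:

    def usefulness(s, n, a):
        # Assumed formulation: 0 when the majority class is at chance level,
        # 1 when every annotation agrees. Not confirmed as the talk's formula.
        chance = 1 / s
        return ((a / n) - chance) / (1 - chance)

    print(usefulness(s=8, n=10, a=9))  # strong consensus -> near 1
    print(usefulness(s=8, n=10, a=2))  # weak consensus -> near 0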
12. Sentiment Annotations
13. Affect Annotations
14. Affect Annotation Experiment
- A supervised model is impractical
- Instead, base a semi-supervised model on SO-PMI-IR (Turney 2002)
- Output represents a location on an axis
- Paradigm words provide seeds
15. SO-PMI-IR
- Semantic Orientation using Pointwise Mutual Information and Information Retrieval (Turney 2002); the calculation is sketched below
- Accuracy of 74.4% in recognising the sentiment (positive or negative) of product reviews in a variety of domains
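A minimal sketch of the SO-PMI-IR calculation as described by Turney (2002); hits() is a stand-in for a search-engine hit count, and the single seed pair excellent/poor follows the original paper:

    import math

    def so_pmi(phrase, hits, pos_word="excellent", neg_word="poor"):
        # hits(query) is a placeholder for a search-engine hit count
        # (originally AltaVista's NEAR operator); 0.01 smooths zero counts.
        pos = hits(f'"{phrase}" NEAR "{pos_word}"') + 0.01
        neg = hits(f'"{phrase}" NEAR "{neg_word}"') + 0.01
        # Positive score: the phrase co-occurs more with "excellent" than
        # with "poor", relative to each seed word's overall frequency.
        return math.log2((pos * hits(f'"{neg_word}"')) /
                         (neg * hits(f'"{pos_word}"')))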
16. AO-PMI-IR
- Evaluate SO-PMI-IR for each dimension in the Two-Factor Structure of Affect (see the sketch below)
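A sketch of that extension: the SO-PMI-IR calculation is run once per axis, seeded with high-pole and low-pole paradigm words. hits() is again a stand-in for corpus counts, and the seed lists are abbreviated from slide 6:

    import math

    # Abbreviated seeds from slide 6; the experiment expanded these via WordNet.
    AXES = {
        "positive affect": (["enthusiastic", "excited"], ["drowsy", "sleepy"]),
        "negative affect": (["distressed", "nervous"], ["calm", "relaxed"]),
    }

    def ao_pmi(phrase, hits, axes=AXES):
        scores = {}
        for axis, (high, low) in axes.items():
            hi = sum(hits(f'"{phrase}" NEAR "{w}"') for w in high) + 0.01
            lo = sum(hits(f'"{phrase}" NEAR "{w}"') for w in low) + 0.01
            hi_prior = sum(hits(f'"{w}"') for w in high) + 0.01
            lo_prior = sum(hits(f'"{w}"') for w in low) + 0.01
            # Each score locates the phrase on one axis of the structure.
            scores[axis] = math.log2((hi * lo_prior) / (lo * hi_prior))
        return scores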
17. AO-PMI-IR
18. PMI-IR and Very Large Corpora
- Turney (2002) used the World Wide Web to obtain frequency counts, via AltaVista
- AltaVista has since stopped providing a NEAR operator in its search engine
- Waterloo MultiText System (about 1 terabyte)
19. Paradigm Word Selection
- Mood words from the Two-Factor Structure of Affect
- Obviously ambiguous words dropped (active, content, dull, still, strong)
- Remaining words used as starting points to derive a list of synonyms using WordNet, as sketched below
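A minimal sketch of that expansion step using NLTK's WordNet interface (the original may have queried WordNet differently, e.g. restricting senses by hand):

    from nltk.corpus import wordnet as wn  # needs nltk plus the wordnet data

    def expand(seed):
        # Collect lemma names from every synset containing the seed word.
        lemmas = set()
        for synset in wn.synsets(seed):
            for lemma in synset.lemmas():
                lemmas.add(lemma.name().replace("_", " "))
        lemmas.discard(seed)
        return sorted(lemmas)

    print(expand("excited"))  # lemmas such as 'aroused', 'emotional', ...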
20. Experiment Baselines
- Baseline 1: with prior knowledge of the distribution, choose the most frequently occurring type (which is unclassifiable)
- Baseline 2: choose a class at random
- (Both baselines are sketched below)
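A minimal sketch of both baselines, assuming plain lists of gold-standard labels:

    import random
    from collections import Counter

    def majority_baseline(train_labels, test_labels):
        # Baseline 1: always predict the most frequent training class.
        majority = Counter(train_labels).most_common(1)[0][0]
        return sum(y == majority for y in test_labels) / len(test_labels)

    def random_baseline(classes, test_labels, seed=0):
        # Baseline 2: predict a class uniformly at random.
        rng = random.Random(seed)
        return sum(rng.choice(classes) == y for y in test_labels) / len(test_labels)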
21. SO-PMI-IR Results
22. AO-PMI-IR Results
23. Misclassifications
- Distribution of misclassifications inspected
- SO-PMI-IR: fairly uniform
- AO-PMI-IR: biased towards Low Positive Affect
  - This class describes a lack of affect (e.g. being asleep)
- Few mismatches against opposite poles of the same axis
24. Accuracy vs. Annotator Agreement
25. AO-PMI-IR Summary
- An algorithm for the recognition of affect (emotion) in text, using pointwise mutual information
- Limited success, but outperforms a naïve baseline
- Can perhaps inform a more thorough approach to recognising affect
26. Sentiment Classification
- Determining an author's general feeling toward their subject; that is, is a unit of text generally positive or generally negative?
- Filtering flames (Spertus 1997)
- Recommender systems (Pang et al. 2002)
- Analysis of market trends (Dave et al. 2002)
27. Supervised Approach
- Pang et al. (2002)
  - Collated a corpus of movie reviews from an IMDb archive
  - Naïve Bayes, Maximum Entropy and Support Vector Machine classifiers
  - Trained using unigram and bigram features
  - Best result from an SVM, at around 83%
- Pang and Lee (2004)
  - Disregarding objective sentences (Wiebe et al. 2004) improves this to around 87%
28. Supervised Approach
- Engström (2004): a bag-of-words approach is topic-dependent
- Turney (2002):
  - Movie review: "unpredictable plot" → positive
  - Automobile review: "unpredictable steering" → negative
29. Dependencies in Sentiment Classification
- Experimental set-up (sketched below)
  - Classification tools: Naïve Bayes and SVMlight (Joachims 1999)
  - Feature selection: unigram presence (Pang et al. 2002)
  - Evaluation: 3-fold cross-validation
  - Significance determined using a paired-sample t-test
  - Each experiment involves training on one subset and testing on the others
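A sketch of a comparable set-up using scikit-learn stand-ins (the original used SVMlight and a separate Naïve Bayes implementation); binary=True gives unigram presence rather than frequency:

    from scipy.stats import ttest_rel
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import BernoulliNB
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    def compare(texts, labels):
        # Unigram *presence* features (Pang et al. 2002), not counts.
        nb = make_pipeline(CountVectorizer(binary=True), BernoulliNB())
        svm = make_pipeline(CountVectorizer(binary=True), LinearSVC())
        nb_acc = cross_val_score(nb, texts, labels, cv=3)
        svm_acc = cross_val_score(svm, texts, labels, cv=3)
        # Paired-sample t-test over the per-fold accuracies.
        _, p = ttest_rel(nb_acc, svm_acc)
        return nb_acc.mean(), svm_acc.mean(), p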
30. Dependencies in Sentiment Classification
- Cross-training/testing by topic
- Datasets from business news (Newswire):
  - Finance (FIN)
  - Mergers and Acquisitions (MA)
  - Mixed (MIX)
31. Dependencies in Sentiment Classification
32. Dependencies in Sentiment Classification
- Cross-training/testing by domain
- Datasets:
  - Business news (Newswire)
  - Movie reviews (Polarity 1.0) (Pang et al. 2002)
33. Dependencies in Sentiment Classification
34. Dependencies in Sentiment Classification
- Cross-training/testing by time period
- Datasets:
  - Movie reviews before 2002 (Polarity 1.0)
  - Movie reviews after 2002 (Polarity 2004)
- Available for download at http://www.sussex.ac.uk/Users/jlr24/data
35. Dependencies in Sentiment Classification
36. Dependencies in Sentiment Classification
- The performance of machine-learning techniques for sentiment classification depends on a good match between the training and test data, with respect to:
  - Topic,
  - Domain, and
  - Time period
37. Using Emoticons for Sentiment Classification
- Dependency can perhaps be solved by acquiring a large and diverse collection of general text annotated for sentiment
- Emoticons can perhaps be assumed to mark up text according to its sentiment, if we assume:
  - :-) is positive
  - :-( is negative
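A minimal labelling sketch under those assumptions; the regular expressions cover only a few common variants, not the full emoticon list used in the experiment:

    import re

    SMILE = re.compile(r"[:;]-?\)")  # :-) :) ;-) ;)
    FROWN = re.compile(r":-?\(")     # :-( :(

    def emoticon_label(text):
        smile, frown = SMILE.search(text), FROWN.search(text)
        if smile and not frown:
            return "positive"
        if frown and not smile:
            return "negative"
        return None  # no emoticon, or mixed signals

    print(emoticon_label("Just got the job :-)"))  # positive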
38. Using Emoticons for Sentiment Classification
- Usenet articles downloaded if they contained one of a list of smile and frown emoticons
39. Using Emoticons for Sentiment Classification
- Extracted a paragraph from an article if it contained a smile or a frown, and was English text
- 26,000 article extracts
- 50/50 split between positive and negative
- 748,685 words
- Available for download at http://www.sussex.ac.uk/Users/jlr24/data
40. Optimisation on Emoticons
- Emoticon corpus optimised for the sentiment classification task
- 4,000 articles held out
- Increasing articles in the training set from 2,000 to 22,000, in increments of 500
- Increasing context from 10 to 1,000 tokens, in increments of 10, under two schemes (sketched below):
  - Window around an emoticon
  - Before an emoticon
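A sketch of the two context schemes, given a tokenised extract and the position of its emoticon; the function and parameter names are illustrative:

    def context(tokens, emoticon_index, size, before_only=False):
        if before_only:
            # Scheme 2: `size` tokens preceding the emoticon.
            return tokens[max(0, emoticon_index - size):emoticon_index]
        # Scheme 1: a window of `size` tokens centred on the emoticon.
        half = size // 2
        left = tokens[max(0, emoticon_index - half):emoticon_index]
        right = tokens[emoticon_index + 1:emoticon_index + 1 + half]
        return left + right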
41. Optimisation on Emoticons
- Optimal parameters
- Naïve Bayes
  - Training: 22,000 articles
  - Context: 130 tokens in a window
- SVM
  - Training: 20,000 articles
  - Context: 150 tokens in a window
42. Initial Results
- Predicting sentiment of article extracts (10-fold cross-validation)
  - Naïve Bayes: 61.5%
  - SVM: 70.1%
- Predicting sentiment of movie reviews
  - Naïve Bayes: 59.1%
  - SVM: 52.1%
43. Optimisation on Reviews
- Optimisation repeated using held-out movie reviews from Polarity 1.0
- Naïve Bayes
  - Training: 21,000 articles
  - Context: 50 tokens in a window
- SVM
  - Training: 20,000 articles
  - Context: 510 tokens before
44. Experiments and Results
- Accuracy of emoticon-trained classifiers across business news topics
45. Experiments and Results
- Accuracy of emoticon-trained classifiers across the domains of news articles and movie reviews
46. Experiments and Results
- Accuracy of emoticon-trained classifiers across movie reviews from different time periods
47. Performance Summary
- Good at predicting Usenet article extracts
- Okay at predicting movie reviews
- Bad at predicting newswire articles
- Performance reasonably consistent over time periods
48. Coverage of the Emoticons Classifier
- Coverage of unique token types is low
- More training texts may improve coverage
- Other sources:
  - Online bulletin boards
  - Chat forums
  - Web logs
  - Google Groups
  - Usenet
49. Noise in Emoticons Training Data
- Optimising the SVM classifier against movie reviews
50. Noise in Emoticons Training Data
- Mixed sentiment: "Sorry about venting my frustration here but I just lost it. :-( Happy thanks giving everybody :-)"
- Sarcasm: "Thank you so much, thats really encouraging :-("
- Spelling mistakes: "The movies where for me a major desapointment :-("
51. Future Work and Directions
- Collect more examples of text marked up with emoticons
- Experiment with techniques to automatically remove noisy examples from the data (one simple filter is sketched below)
- Investigate the nature of dependency in sentiment classification
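As a hypothetical starting point for noise removal, a filter suggested by the mixed-sentiment example on the previous slide: discard extracts that carry both polarities, since their emoticon label is unreliable:

    import re

    SMILE, FROWN = re.compile(r"[:;]-?\)"), re.compile(r":-?\(")

    def drop_mixed(extracts):
        # Keep only extracts whose emoticons give a single, consistent signal.
        return [text for text in extracts
                if not (SMILE.search(text) and FROWN.search(text))]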
52. Future Work and Directions
- What is the nature of these dependencies?
- It seems classifiers may be learning authors' sentiment toward concepts, rather than the language associated with communicating emotion and evaluation
- Classifiers are not learning authors' sentiment toward named entities
- Perhaps classifiers learn the words associated with the sentiment of named entities: the "ice-axe effect"?
53. Future Work and Directions
- Refined dependency experiments:
  - Tag movie reviews with their precise year and perform cross-training/testing based on year
  - Remove named entities from training data
  - Remove all but one review from each author
  - Remove all but one review of a given movie
- If accuracy is reduced, this can be taken as evidence of dependencies
54. Future Work and Directions
- Feature engineering for machine learning:
  - OddsRatio can be employed to mark features with temporal senses (Liebscher and Belew 2005)
  - Richly-engineered features based on linguistic theory
  - Automatic feature induction, maximising performance whilst minimising dependency
[Figure: the token "SCHWARZENEGGER" shifting between SPORTS, MOVIES and POLITICS over TIME]
55. Future Work and Directions
- Improving the automatic acquisition of sentiment lexicons
- SO-PMI-IR
  - An independent measure, but performs with varying success in different domains (Turney 2002)
  - Identify the topic of a problem text, and supplement the paradigm words with these keywords?
- Distributional similarity
  - Distributional similarity (see the survey by Weeds (2003)) has been shown to be a reasonable approximation of semantic similarity (Curran and Moens 2002)
56. Future Work and Directions
- Finding a metric
- SO-PMI-IR and distributional similarity do not describe metric spaces
- A true metric may increase performance
- Example ordering along the negative-neutral-positive axis: worst → worse → okay → better → best
57. Appraisal Theory
- "An approach to exploring, describing and explaining the way language is used to evaluate, to adopt stances, to construct textual personas and to manage interpersonal positionings and relationships."
- http://www.grammatics.com/appraisal/
- J. R. Martin and P. R. R. White. 2005. The Language of Evaluation: Appraisal in English.
58. Appraisal Theory
59. Thank You!
- Email: j.l.read_at_sussex.ac.uk
- Homepage: http://www.sussex.ac.uk/Users/jlr24