Title: Ann Devitt
1Wednesday 18 February 2009
- The Languages of Emotion and Financial News
- Ann Devitt
- Khurshid Ahmad
2Sentiment and the Markets
4Specialised Language of Financial News
Global chipmakers, battling slower technology
demand, are betting size matters as they pin
their hopes for future growth on small and easy
to carry mobile devices such as netbooks and
smartphones.
Bloomberg.com, 18/2/09
8Sentiment and the Markets
10Engle and Ng (1993) Asymmetry Curve
11Outline
- Current psychological theory of emotion
- Evaluation of lexical emotion resources
- Corpus analysis of language of emotion
13Cognitive Theory of Emotion: Categorical
Ekman (1975)
14Cognitive Theory of Emotion: Dimensions
- Osgood / Russell: Evaluation, Activity, Potency
- Mehrabian PAD: Pleasure, Activation, Dominance
15Cognitive Theory of Emotion
Watson and Tellegen (1985)
16Outline
- Current psychological theory of emotion
- Evaluation of lexical emotion resources
- Corpus analysis of language of emotion
17Lexical Resource Evaluation
SentiWordNet
Whissell
General Inquirer
WNA
19Lexical Resource Evaluation: SentiWordNet
Word    PositiveVal  NegativeVal
Happy   0.9          0.0
Sad     0.0          0.9
- 39066 terms
- Evaluation dimension, scale 0 to 1
- Low average values: Pos = 0.18, Neg = 0.23
- More extreme Neg values
- Error-prone: rude (pos 0.875), gladsome (neg 0.875)
21Lexical Resource Evaluation: General Inquirer
- ECSTATIC: Pos, Pleasure
- SORROWFUL: Neg, Pain
- Hand-coded, content analysis basis
- 8641 terms
- 184 binary categories (including MAB dimensions)
- Negative > Positive
- Active > Passive
- Strong > Weak
23Lexical Resource Evaluation: Whissell Dictionary of Affect
Word        Eval    Activ   Imag
great       2.6250  2.1250  1.0
disastrous  1.4444  2.4000  2.0
- Corpus selection, hand-coded
- 8742 terms
- Dimensional representation on a 1-3 scale
- Evaluation, Activation, Imagery
25Lexical Resource Evaluation: WordNet Affect
Word        BinaryFeatures
Loneliness  cognitive state, emotion
Happiness   cognitive state, emotion
- 5432 terms
- Domains of emotional experience
- No polarity
- Short-term: Mood, Manner
- Long-term: Attribute, Trait
26Lexical Resource Evaluation: Lexical Overlap
- Are the lexica consistent?
- Are they mutually exclusive?
- Dice, Jaccard, Asymmetric coefficients
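The overlap measures named on this slide can be sketched as follows. This is a minimal illustration, assuming each lexicon is reduced to a set of lower-cased headwords and that the "asymmetric" coefficient means the share of one lexicon's terms found in the other. The function names and the two toy word sets are hypothetical and are not taken from the lexica themselves.

```python
def dice(a: set, b: set) -> float:
    """Dice coefficient: 2|A & B| / (|A| + |B|)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient: |A & B| / |A | B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def asymmetric(a: set, b: set) -> float:
    """Share of A's terms that also occur in B: |A & B| / |A| (assumed definition)."""
    return len(a & b) / len(a) if a else 0.0


# Toy word sets, invented for illustration only.
lexicon_a = {"happy", "sad", "great", "rude"}
lexicon_b = {"happy", "sad", "disastrous", "ecstatic", "sorrowful", "lonely"}

print(dice(lexicon_a, lexicon_b))        # symmetric
print(jaccard(lexicon_a, lexicon_b))     # symmetric
print(asymmetric(lexicon_a, lexicon_b))  # coverage of A by B
print(asymmetric(lexicon_b, lexicon_a))  # coverage of B by A
```

The asymmetric measure is the useful one when the lexica differ greatly in size: a small lexicon can be almost fully contained in a large one while Dice and Jaccard both stay low.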
28Lexical Resource Evaluation: Lexical Overlap
SentiWordNet vs. General Inquirer
- Statistically significant agreement for polarity assignment (chi-square test)
- Very weak correlation for activation features
29Lexical Resource Evaluation: Lexical Overlap
SentiWordNet vs. Whissell
- Weak correlation of SWN with Whissell evaluation
- No correlation with Whissell activation dimension
- SWN positive negatively correlated with imageability
30Lexical Resource Evaluation: Lexical Overlap
SentiWordNet vs. WNA
- SWN tends to negative for short-term WNA features
- SWN tends to positive for long-term WNA features
32Lexical Resource Evaluation: Lexical Overlap
- WNA feature division
Short-term     Long-term
Negative       Positive
Physical       Cognitive
More active    Less active
Internal       External
Less abstract  More concrete
33Lexical Resource Evaluation: Some conclusions
- The lexica are quite consistent
- The lexica can be used in combination
- SentiWordNet: largely unexplored territory
34Outline
- Current psychological theory of emotion
- Evaluation of lexical emotion resources
- Corpus analysis of language of emotion
- General Language
35Emotion in General Language: Corpus Study Aims
- Does emotion constitute a distinct sub-language?
- Is there a polarity bias in General Language? (the Pollyanna Hypothesis of Boucher and Osgood)
- What is the impact of using different lexica?
36Corpus Analysis: The Data
- BNC
- 100 million words
- Balanced, broad corpus
37Corpus Analysis: Methodology
- Is emotion a distinct sub-language?
- Examine distribution type
- Examine distribution spread
- Bootstrap sampling distribution
38Corpus Analysis: Distribution Type
- Zipfian: both the BNC and the emotion lexica
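One rough way to examine whether a frequency list looks Zipfian is to regress log frequency on log rank and check that the slope is close to -1. The sketch below assumes only a list of raw term frequencies. The function name `zipf_slope` and the synthetic data are hypothetical, and ordinary least squares on log-log data is a crude diagnostic, not a formal goodness-of-fit test.

```python
import math


def zipf_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank)."""
    # Rank 1 is the most frequent term; zero counts are dropped.
    ranked = sorted((f for f in frequencies if f > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(freq) for freq in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic, perfectly Zipfian frequencies (frequency proportional to 1/rank).
ideal = [1000.0 / rank for rank in range(1, 201)]
slope = zipf_slope(ideal)
print(round(slope, 6))
```

Running the same function on the frequency list of the whole corpus and on the frequency list restricted to a lexicon's terms gives two slopes that can be compared directly.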
39Corpus Analysis: Distribution shape
- Comparison of means: Student's t-test
- BNC ≠ Emotion Lexica (p < 0.001)
- Different sample means
- 5-30 times more frequent than general language
- Assumptions of test?
40Corpus Analysis: Bootstrap Sampling Distribution
- Are sentiment-bearing terms a statistically distinct and highly frequent subset of English?
- 1000 random samples of terms from BNC
- Sample size = size of sentiment lexicon
- H0: Observed sample mean falls within 95% of the bootstrap random sampling distribution of means
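The procedure on this slide can be sketched as below. This is an illustration under stated assumptions: terms are drawn without replacement within each random sample, the 95% region is the central interval of the sorted sample means, and the frequency data are synthetic. All function names are hypothetical.

```python
import random
import statistics


def bootstrap_means(population_freqs, sample_size, n_samples=1000, seed=0):
    """Sorted means of n_samples random samples of term frequencies."""
    rng = random.Random(seed)
    return sorted(
        statistics.fmean(rng.sample(population_freqs, sample_size))
        for _ in range(n_samples)
    )


def central_interval(sorted_means, coverage=0.95):
    """Bounds enclosing the central `coverage` share of the sample means."""
    n = len(sorted_means)
    k = round((1 - coverage) / 2 * n)  # number of means cut from each tail
    return sorted_means[k], sorted_means[n - 1 - k]


def rejects_h0(lexicon_mean, sorted_means, coverage=0.95):
    """True if the lexicon's mean frequency lies outside the central interval."""
    low, high = central_interval(sorted_means, coverage)
    return not (low <= lexicon_mean <= high)


# Synthetic Zipf-like vocabulary; the "lexicon" is its 50 most frequent terms.
vocabulary_freqs = [1000.0 / rank for rank in range(1, 5001)]
lexicon_freqs = vocabulary_freqs[:50]

means = bootstrap_means(vocabulary_freqs, sample_size=len(lexicon_freqs))
lexicon_mean = statistics.fmean(lexicon_freqs)
print(central_interval(means), lexicon_mean, rejects_h0(lexicon_mean, means))
```

Because the random samples match the lexicon in size, the comparison does not depend on the t-test's distributional assumptions, which matters for heavily skewed frequency data.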
41Corpus Analysis: Bootstrap Sampling Distribution
- Are sentiment-bearing terms a statistically distinct and highly frequent subset of English?
- For all lexica
- Mean term frequency of lexicon well outside 95% of the bootstrap distribution
- Sentiment lexica are not representative of BNC (p < 0.05)
42Corpus Analysis: Sentiment Features
- Is there a polarity bias in General Language?
- Positive polarity bias
- Statistically significant for all lexica (χ² test of independence)
43Corpus Analysis: Sentiment Features
- Is there a polarity bias in General Language when you include intensity of polarity?
- Positive polarity bias
- Statistically significant for all lexica
- χ² = 158.5, df = 1, p < 0.0001 for General Inquirer
- χ² = 63.6, df = 1, p < 0.0001 for Whissell
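The reported significance levels can be checked from the statistics alone. With one degree of freedom, a χ² variable is the square of a standard normal variable, so the upper-tail probability is erfc(sqrt(x / 2)). The short sketch below uses only the Python standard library; the function name `chi2_sf_df1` is hypothetical.

```python
import math


def chi2_sf_df1(statistic: float) -> float:
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


p_general_inquirer = chi2_sf_df1(158.5)
p_whissell = chi2_sf_df1(63.6)

# Both values fall far below the 0.0001 level quoted on the slide.
print(p_general_inquirer < 0.0001, p_whissell < 0.0001)
```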
44Corpus Analysis: Some conclusions
- Sentiment-bearing terms are a distinct subset of English
- Positive polarity bias in BNC
- General Inquirer and Whissell: low coverage and high frequency
- SentiWordNet: wide coverage and much lower frequency
45Outline
- Current psychological theory of emotion
- Evaluation of lexical emotion resources
- Corpus analysis of language of emotion
- Comparative
46Comparative Corpus Analysis: Aims
- Examine affective term use
- Identify statistically different distributions
- Is there a dominant feature/polarity?
47Comparative Corpus Analysis: The Data
- Financial Language
- 2 million words
- On-line financial news
- Reuters, CNN, Bloomberg
- Newspapers
- General Language
- BNC
- 100 million words
- Balanced, broad corpus
48Comparative Corpus Analysis: The Data
- BNC sub-corpora
- Imaginative written English
- 16 million words
- Informative written English
- 70 million words
49Comparative Corpus Analysis: Methodology
- Compare proportions of Sentiment Features
- χ² Test of Independence
- H0: p(FinCorpus) = p(BNC)
50Comparative Corpus Analysis: Methodology
- Statistical significance of different proportion
- χ² > 7.8794
- p < 0.005
- Features
- 41 Lexicon Sentiment Features from 4 lexica
- Frequency per million words
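A minimal sketch of the comparison for a single sentiment feature, assuming a 2x2 table of feature tokens against all other tokens in each corpus. The counts are invented for illustration and the function names are hypothetical. The threshold 7.8794 is the χ² critical value for one degree of freedom at the 0.005 significance level.

```python
CRITICAL_VALUE = 7.8794  # chi-square critical value, df = 1, alpha = 0.005


def per_million(hits: int, total_words: int) -> float:
    """Normalise a raw count to frequency per million words."""
    return hits * 1_000_000 / total_words


def chi_square_2x2(hits_a: int, total_a: int, hits_b: int, total_b: int) -> float:
    """Pearson chi-square statistic for feature vs. non-feature tokens in two corpora."""
    observed = [
        [hits_a, total_a - hits_a],
        [hits_b, total_b - hits_b],
    ]
    row_totals = [total_a, total_b]
    col_totals = [hits_a + hits_b, (total_a - hits_a) + (total_b - hits_b)]
    grand_total = total_a + total_b
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under H0: same proportion in both corpora.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed[i][j] - expected) ** 2 / expected
    return statistic


# Invented counts: a 2-million-word financial corpus and the 100-million-word BNC.
fin_hits, fin_total = 30_000, 2_000_000
bnc_hits, bnc_total = 1_000_000, 100_000_000

statistic = chi_square_2x2(fin_hits, fin_total, bnc_hits, bnc_total)
print(per_million(fin_hits, fin_total), per_million(bnc_hits, bnc_total))
print(statistic > CRITICAL_VALUE)  # True means H0 is rejected at the 0.005 level
```

Repeating the test for each of the 41 features gives one decision per feature. The function assumes the feature occurs at least once across the two corpora, since otherwise an expected count is zero.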
51Comparative Corpus Analysis: Financial Corpus
- WRT Imaginative: more affective terms
- WRT Informative: many more affective terms
- WRT BNC: dependent on feature type
- Distributions are statistically distinct
52Comparative Corpus Analysis: Positive GI Features
54Comparative Corpus Analysis: Negative GI Features
57Some conclusions
- Lexical resources for sentiment are consistent
- Financial news is a sub-language
- Affective content is statistically distinct relative to general language
- Text polarity is asymmetric, positive skew
- Different skews for different domains
58Something to think about
- If different language varieties and domains have distinct use of sentiment terms and their own polarity bias
- Individual sentiment values are not informative
- So what do we need?
59Thank You!