Title: Approaches for Automatically Tagging Affect
1. Approaches for Automatically Tagging Affect
- Nathanael Chambers, Joel Tetreault, James Allen
- University of Rochester
- Department of Computer Science
2. Affective Computing
- Why use computers to detect affect?
- Make human-computer interaction more natural
- Computers express emotion
- And detect the user's emotion
- Tailor responses to the situation
- Use affect for text summarization
- Understanding affect improves computer-human interaction systems
3. From the Psychologist's P.O.V.
- If computers can detect affect, they can also help humans understand affect
- By observing the changes in emotion and attitude in people conversing, psychologists can determine correct treatments for patients
4. Marriage Counseling
- Emotion and communication are important to mental and physical health
- Psychological theories suggest that how well a couple copes with serious illness is related to how well they interact to deal with it
- Poor interactions (i.e. disengagement during conversations) can at times exacerbate an illness
- Tested this hypothesis by observing the engagement levels of conversation between married couples presented with a task
5. Example Interactions
- Good interaction sequence
  W: Well I guess we'd just have to develop a plan, wouldn't we?
  H: And we would be just more watchful or plan or maybe not, or be together more when the other one went to do something
  W: In other words, going together
  H: Going together more
  W: That's right. And working more closely together and like you say, doing things more closely together. And I think we certainly would want to share with the family openly what we felt was going on so we could kind of work out family plans
- Poor interaction sequence
  W: So how would you deal with that?
  H: I don't know. I'd probably try to help. And you know, go with you or do things like that if I, if I could. And you know, I don't know. I would try to do the best I could to help you
6. Testing the Theory
- Record and transcribe conversations of married couples presented with a what-if scenario of one of them having Alzheimer's
- Participants asked to discuss how they would deal with the sickness
- Tag sentences of the transcripts with affect-related codes; certain textual patterns evoke negative or positive connotations
- Use the distribution of tags to look for correlations between communication and marital satisfaction
- Use the tag distribution to decide on a treatment for the couple
7. Problem
- However, tagging (step 2) is time-consuming, requires training time for new annotators, and can be unreliable
- Solution: use computers to do the tagging work so psychologists can spend more time with patients and less time coding
8. Goals
- Develop algorithms to automatically tag transcripts of a Marriage Counseling Corpus (Shields, 1997)
- Develop a tool with which human annotators can pre-tag a transcript using the best algorithm, and then quickly correct it
9. Outline
- Background
- Marriage Counseling Corpus
- N-gram based approaches
- Information-Retrieval/Call Routing approaches
- Results
- CATS Tool
10. Background
- Affective computing, or detecting emotion in texts or from a user, is a young field
- Earliest approaches used keyword matching
- Tagged dictionaries with grammatical features (Boucouvalas and Ze, 2002)
- Statistical methods: LSA (Webmind project), TSB (Wu et al., 2000) to tag a dialogue
- Liu et al. (2003) use common-sense rules to detect emotion in emails
11. New Methods for Tagging Affect
- Our approaches differ from others in two ways:
- Use different statistical methods based on computing N-grams
- Tag individual sentences as opposed to discourse chunks
- Our approaches are based on methods that have been successful in another domain: discourse act tagging
12. Marriage Counseling Corpus
- 45 annotated transcripts of married couples working on a task about Alzheimer's
- Collected by psychologists in the Center for Future Health, Rochester, NY
- Transcripts broken into thought units: one or more sentences that represent how the speaker feels toward a topic (4,040 total)
- Tagging thought units takes into account positive and negative words, level of detail, comments on health, family, travel, etc., and sensitivity
13. Code Tags
- DTL, Detail (11.2%): speaker's verbal content is concise and distinct with regard to illness, emotions, dealing with death
- "It would be hard for me to see you so helpless"
- GEN, General (41.6%): verbal content toward illness is vague or generic, or speaker does not take ownership of emotions
- "I think that it would be important"
14. Code Tags
- SAT, Statements About the Task (7.2%): couple discusses what the task is and how to perform it
- "I thought I would be the caregiver"
- TNG, Tangent (2.9%): statements that are way off topic
- ACK, Acknowledgments (22.8%): of the other speaker's comments
- "Yeah right"
15. N-Gram Based Approaches
- n-gram: a sequential list of n words, used to encode the likelihood that the phrase will appear in the future
- Involves splitting a sentence into chunks of n consecutive words
  "I don't know what to say"
  1-gram (unigram): I, don't, know, what, to, say
  2-gram (bigram): I don't, don't know, know what, what to, to say
  3-gram (trigram): I don't know, don't know what, know what to, etc.
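The chunking above can be sketched in a few lines of Python (a minimal illustration, not the authors' implementation):

```python
def ngrams(tokens, n):
    """Return the list of n-grams (tuples of n consecutive words) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "I don't know what to say".split()
unigrams = ngrams(tokens, 1)  # [('I',), ("don't",), ('know',), ...]
bigrams = ngrams(tokens, 2)   # [('I', "don't"), ("don't", 'know'), ...]
trigrams = ngrams(tokens, 3)  # [('I', "don't", 'know'), ...]
```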
16. Frequency Table (Training)

  n-gram              GEN   DTL   ACK   SAT
  I                   0.5   0.2   0.2   0.1
  Yeah                0.3   0.2   0.4   0.1
  Don't want to be    0.2   0.8   0.0   0.0
  I don't want to be  0.0   1.0   0.0   0.0

Each entry: the probability that the n-gram is labeled with a certain tag
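A table like this can be estimated by relative-frequency counting over the tagged corpus. A minimal sketch, where the `(tokens, tag)` input format and the function name are assumptions for illustration:

```python
from collections import Counter, defaultdict

def train_table(tagged_units, max_n=3):
    """Estimate P(tag | ngram) by relative frequency over a tagged corpus.

    tagged_units: list of (tokens, tag) pairs, one per thought unit.
    Returns a dict mapping each n-gram tuple to {tag: probability}.
    """
    counts = defaultdict(Counter)  # ngram -> Counter of tags it appeared under
    for tokens, tag in tagged_units:
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])][tag] += 1
    table = {}
    for gram, tag_counts in counts.items():
        total = sum(tag_counts.values())
        table[gram] = {t: c / total for t, c in tag_counts.items()}
    return table
```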
17. N-Gram Motivation
- Advantages
- Encode not just keywords, but also word ordering, automatically
- Models are not biased by hand-coded lists of words, but are completely dependent on real data
- Learning features of each affect type is relatively fast and easy
- Disadvantages
- Long-range dependencies are not captured
- Dependent on having a corpus of data to train from
- Sparse data for low-frequency affect tags adversely affects the quality of the n-gram model
18. Naïve Approach
- P(tag_i | utt) = max_{j,k} P(tag_i | ngram_{jk})
- Where tag_i is one of GEN, DTL, ACK, SAT, TNG
- And ngram_{jk} is the j-th n-gram of length k
- So for all n-grams in a thought unit, find the one with the highest probability for a given tag, and select that tag
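This argmax rule can be sketched as follows, assuming a `table` mapping each n-gram tuple to its tag probabilities (as estimated during training):

```python
def naive_tag(tokens, table, tags=("GEN", "DTL", "ACK", "SAT", "TNG"), max_n=3):
    """Select the tag whose best-matching n-gram probability is highest."""
    best_tag, best_p = None, -1.0
    for tag in tags:
        # max over all n-grams of length 1..max_n of P(tag | ngram)
        p = max((table.get(tuple(tokens[i:i + n]), {}).get(tag, 0.0)
                 for n in range(1, max_n + 1)
                 for i in range(len(tokens) - n + 1)), default=0.0)
        if p > best_p:
            best_tag, best_p = tag, p
    return best_tag, best_p
```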
19. Naïve Approach Example
- "I don't want to be chained to a wall."
20. N-Gram Approaches
- Weighted Approach
- Weight the longer n-grams higher in the stochastic model
- Lengths Approach
- Include a length-of-utterance factor, capturing the differences in utterance length between affect tags
- Weights with Lengths Approach
- Combine Weighted with Lengths
- Repetition Approach
- Combine all of the above information with word overlap between thought units
21. Repetition Approach
- Many acknowledgment (ACK) utterances were being mistagged as GEN by the previous approaches. Most of the errors came from grounding that involves word repetition
- A: "so then you check that your tire is not flat."
- B: "check the tire"
- We created a model that takes into account word repetition in adjacent utterances in a dialogue
- We also include a length probability, to capture the Lengths Approach
- Only unigrams are used, to avoid sparseness in the training data
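The repetition signal could be computed as simple unigram overlap between adjacent utterances. The exact formula in the authors' model is not given here, so the following is an illustrative assumption:

```python
def repetition_overlap(prev_tokens, cur_tokens):
    """Fraction of the current utterance's words that repeat words
    from the previous utterance (unigram overlap, case-insensitive)."""
    if not cur_tokens:
        return 0.0
    prev = {w.lower() for w in prev_tokens}
    return sum(1 for w in cur_tokens if w.lower() in prev) / len(cur_tokens)
```

On the example above, "check" and "tire" recur in B's three-word reply, giving an overlap of 2/3, a strong hint that B is grounding rather than adding new content.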
22. IR-Based Approaches
- Work based on the call-routing algorithm of Chu-Carroll and Carpenter (1999)
- Problem: route a user's call to a financial call center to the correct destination
- Do this by converting a query from the user (speech converted to text) into a vector, which is compared with a list of possible destination vectors in a database
23. Database Table (Training)

  Query: "yeah, that's right"

  n-gram              GEN   DTL   ACK   SAT    Query
  I                   0.5   0.2   0.2   0.1    0.0
  yeah                0.3   0.2   0.4   0.1    1.0
  Don't want to be    0.2   0.8   0.0   0.0    0.0
  I don't want to be  0.0   1.0   0.0   0.0    0.0

The query (thought unit) is compared against each tag vector in the database using a cosine comparison
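The cosine comparison treats each tag column as a vector over the n-gram rows. A minimal sketch, with hypothetical vectors for illustration:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical column vectors over the four n-gram rows of the table above.
ack_vector = [0.2, 0.4, 0.0, 0.0]
query = [0.0, 1.0, 0.0, 0.0]  # "yeah, that's right" hits only the "yeah" row
similarity = cosine(query, ack_vector)
```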
24. Database Creation
- Construct the database in the same manner as the N-gram approaches
- Database then normalized
- Filter: Inverse Document Frequency (IDF) lowers the weight of terms that occur in many documents
- IDF(t) = log2(N / d(t))
- Where d(t) is the number of tags containing n-gram t, and N is the total number of tags
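The IDF filter follows directly from the formula above. Here the input format, a map from each n-gram to the set of tags it occurs under, is an assumption for illustration:

```python
import math

def idf_weights(ngram_tags, n_tags):
    """Compute IDF(t) = log2(N / d(t)) for each n-gram t.

    ngram_tags: dict mapping each n-gram to the set of tags containing it.
    n_tags: N, the total number of tags.
    """
    return {gram: math.log2(n_tags / len(tags))
            for gram, tags in ngram_tags.items()}
```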
25. Method 1: Routing-Based Method
- Modified the call-routing method with entropy (amount of disorder) to further reduce the contribution of terms that occur frequently
- Also created two more terms (rows in the database)
- Sentence length: tags may be correlated with sentences of a certain length
- Repetition: acknowledgments tend to repeat the words stated in the previous thought unit
26. Method 1 Example

  Cosine scores for each tag compared against the query vector for "I don't want to be chained to a wall":
  ACK 0.002, DTL 0.073, GEN 0.072, SAT 0.014, TNG 0.0001
27. Method 2: Direct Comparison
- Instead of comparing queries to a normalized database of exemplar documents, compare them to all test sentences
- Advantage: no normalizing or construction of documents
- A cosine test is used to get the top ten matches. Sum the scores of matches with the same tag. The tag with the highest sum is selected.
28. Method 2 Example
- DTL selected, with a total score of 1.11
29. Evaluation
- Performed six-fold cross-validation over the Marriage Counseling Corpus and the Switchboard Corpus
- Averaged scores from each of the six evaluations
30. Results
- 6-Fold Cross Validation for N-gram Methods (chart)
- 6-Fold Cross Validation for IR Methods (chart)
31. Discussion
- N-gram approaches do slightly better than IR over the Marriage Counseling Corpus
- Incorporating the additional features of sentence length and repetition improves both models
- The entropy model is better than IDF in the call-routing system (a 4% boost)
- Psychologists are currently using the tool to tag their work. Note: sometimes the computer tags better than the human annotators
32. CATS
- CATS: An Automated Tagging System for affect and other similar information retrieval tasks
- Written in Java for cross-platform interoperability
- Implements the Naïve approach with unigrams and bigrams only
- Builds the stochastic models automatically from a tagged corpus, input by the user through the GUI display
- Automatically tags new data using the user's models. Each tag also receives a confidence score, allowing the user to hand-check the dialogue quickly and with greater confidence
33. The CATS GUI provides a clear workspace for text and tags. Tagging new data and training on old data is done with a mouse click.
34. Customizable models are available. Create your own list of tags, provide a training corpus, and build a new model.
35. Tags are marked with confidence scores based on the probabilistic models.