Automatic Domain Adaptive Sentiment Analysis Phase 1 presentation


1
Automatic Domain Adaptive Sentiment Analysis
Phase 1

Justin Martineau

2
Outline

Introduction
Problem Definition
Thesis Statement
Motivation
Background and Related Work
Challenges
Approaches
Research Plan
Approach
Evaluation
Timeline
Conclusion

3
Problem Definition
1. Intro - 2. Related Work - 3. Research Plan -
4. Conclusion

Sentiment Analysis is the automatic detection and
measurement of sentiment in text segments by
machines.
3 Sub Tasks
Objective vs. Subjective
Topic Detection
Positive vs. Negative
Commonly applied to web data
Very Domain Dependent

4
Sentiment Analysis Example
5
Thesis Statement

This dissertation will develop and evaluate
techniques to discover and encode
domain-specific, domain-independent, and semantic
knowledge to improve both single and multiple
domain sentiment analysis problems on textual
data given low labeled data conditions.

6
Motivation: Private Sector

Market Research
Surveys
Focus Groups
Feature Analysis
Customer targeting (Free samples etc)
Consumer Sentiment Search
Compare pros and cons
Overall opinion of products/services

7
Motivation: Public Sector

Political
Alternative Polling
Determine popular support for legislation
Choose campaign issues
National Security
Detect individuals at risk for radicalization
Determine local sentiment about US policy
Determine local values and sentimental icons
Portray actions positively using local flavor
Public Health
Detect potential suicide victims
Detect mentally unstable people

8
Challenges

Text Representation
Unedited Text
Sentiment Drift
Negation
Sarcasm
Sentiment Target Identification
Granularity
Domain Dependence

9
Domain Dependence 1: Domain-Dependent Sentiment

The same sentence can mean two very different
things in different domains
Ex: "Read the book." Good for books, bad for
movies.
Ex: "Jolting, heart pounding, you're in for one
hell of a bumpy ride!" Good for movies and books,
bad for cars.
Sentimental word associations change with domain
Fuzzy cameras are bad, but fuzzy teddy bears are
good.
Big trucks are good, but big iPods are bad.
Bad is bad, but bad villains are good.

10
Domain Dependence 2: Endless Possibilities
11
Domain Dependence 3: Organization and Granularity
12
Theory of the Three Signals

Authors communicate messages using three types of
signals
Domain-Specific Signals
Domain-Independent Signals
Semantic Signals
More specific signals are generally more powerful
than more generic signals

13
Domain-Specific Signals

Fuzzy teddy bears
Sharp pictures
Sharp knives
Smooth rides
New ideas
Fast servers
Fast cars
Slow roasted burgers
Slow motion
Small cameras
Big cars

Dependent on problem and domain
Considered more useful by readers
Tells what is good or bad about topic
Domain knowledge determines sentiment orientation
Very strong in context, but weak or misleading
out of context
Can cause overgeneralization errors when
overvalued
New domain-specific signal words are ignored in
CDT

14
Proposed Approach

Sentiment Search is more than just a
classification problem
Detecting and Using the three signals
Dynamic Domain Adapting Classifiers
Generic Feature Detection using unlabeled data
Semantic Feature Spaces

15
Dynamic Domain Adapting Classifiers

A (preferably domain-independent) model is built
using computationally intense algorithms before
query time on a set of labeled data.
Users interact at a query box level
Query results define the domain of interest
Domain-specific adaptations are calculated
Compare how the domain of interest differs
from known cases
Use semantic knowledge about word senses and
relations
Must use a fast algorithm, because users are waiting
Domain-specific adaptations are woven into the
domain-independent model
The resulting model is temporary
It is used to classify documents as positive,
negative, or objective
Sentimental search results are processed for
significant components and presented for human
consumption
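
One way to picture the "weaving" step is sketched below. This is only an illustration under stated assumptions: the slides do not say what form the model takes, so the sketch assumes a linear bag-of-words model whose weights are stored in a dictionary. The function names `adapt_model` and `classify`, the additive adjustment, and the score threshold for "objective" are all hypothetical.

```python
def adapt_model(base_weights, domain_adjustments):
    """Return a temporary model: the base weights shifted by domain adjustments.

    Assumption: both arguments map a term to a signed sentiment weight.
    The base (domain-independent) model is copied, never modified.
    """
    adapted = dict(base_weights)
    for term, delta in domain_adjustments.items():
        adapted[term] = adapted.get(term, 0.0) + delta
    return adapted


def classify(tokens, weights, threshold=0.5):
    """Label a document by the sum of its term weights (hypothetical rule)."""
    score = sum(weights.get(t, 0.0) for t in tokens)
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "objective"


# Illustrative weights only: "fuzzy" is bad in general, good for teddy bears.
base = {"good": 1.0, "bad": -1.0, "fuzzy": -1.0}
teddy_bear_adjustments = {"fuzzy": 2.0}
temporary_model = adapt_model(base, teddy_bear_adjustments)

label_before = classify(["fuzzy", "bear"], base)
label_after = classify(["fuzzy", "bear"], temporary_model)
```

Because `adapt_model` copies the base weights, the temporary model can be discarded after the query without touching the model built before query time.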

16
Overview
Key: User Level, Source Data, Knowledge, Labeled
Data, Algorithms, Search Results
17
Subjective Context Scoring

Multiply
PMI(Word,Context)
IDF
Co-occurrence with known generic sentiment seed
words times their bias (from movie reviews)
Seeds
bad, worst, stupid, ridiculous, terrible, poorly
great, best, perfect, wonderful, excellent, effective
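
The product listed above could be sketched roughly as follows. The slide gives only the three factors, so every estimator here is an assumption: probabilities are estimated from document frequencies, PMI(Word, Context) compares the word's frequency in the query-result documents with its frequency in the whole corpus, and each seed's bias is fixed at +1 or -1 as a stand-in for the biases the slide says were learned from movie reviews. The function name is hypothetical.

```python
import math

NEGATIVE_SEEDS = ["bad", "worst", "stupid", "ridiculous", "terrible", "poorly"]
POSITIVE_SEEDS = ["great", "best", "perfect", "wonderful", "excellent", "effective"]

# Assumption: placeholder biases of -1/+1; the slides derive them from movie reviews.
SEED_BIAS = {s: -1.0 for s in NEGATIVE_SEEDS}
SEED_BIAS.update({s: 1.0 for s in POSITIVE_SEEDS})


def subjective_context_score(word, context_docs, corpus_docs):
    """PMI(word, context) * IDF * bias-weighted seed co-occurrence (a sketch)."""
    ctx = [set(d) for d in context_docs]
    corpus = [set(d) for d in corpus_docs]
    df_ctx = sum(1 for d in ctx if word in d)
    df_all = sum(1 for d in corpus if word in d)
    if df_ctx == 0 or df_all == 0:
        return 0.0
    # How much more common is the word in the context than in the corpus?
    pmi = math.log((df_ctx / len(ctx)) / (df_all / len(corpus)))
    # Rare words in the corpus get a higher weight.
    idf = math.log(len(corpus) / df_all)
    # Count context documents where the word appears with each seed, signed by bias.
    seed_term = sum(
        bias * sum(1 for d in ctx if word in d and seed in d)
        for seed, bias in SEED_BIAS.items()
    )
    return pmi * idf * seed_term


context = [["battery", "great"], ["battery", "best"]]
corpus = context + [["movie", "bad"], ["movie", "great"]]
battery_score = subjective_context_score("battery", context, corpus)
```

With these toy documents, "battery" is over-represented in the context and co-occurs only with positive seeds, so its score comes out positive.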

18
Rocchio Baseline

Rocchio - Query Expansion algorithm for search
Similar goals to ours, find more relevant words
Does not account for sentiment
The new query is a weighted sum of
Matching document vectors
Query vector
Non-matching document vectors (negative value).
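
The weighted sum described above can be sketched as follows. The weights `alpha`, `beta`, and `gamma` are commonly used defaults rather than values from the slides, and representing vectors as term-to-weight dictionaries is an assumption made for brevity.

```python
from collections import Counter


def rocchio(query, matching, non_matching, alpha=1.0, beta=0.75, gamma=0.15):
    """Return an expanded query as a term -> weight dictionary.

    query:        term -> weight for the original query vector
    matching:     list of term -> weight dicts for matching documents
    non_matching: list of term -> weight dicts for non-matching documents
    """
    new_query = Counter()
    for term, w in query.items():
        new_query[term] += alpha * w
    # Add the centroid of the matching documents.
    for doc in matching:
        for term, w in doc.items():
            new_query[term] += beta * w / len(matching)
    # Subtract the centroid of the non-matching documents.
    for doc in non_matching:
        for term, w in doc.items():
            new_query[term] -= gamma * w / len(non_matching)
    # Common practice: drop terms whose weight is not positive.
    return {t: w for t, w in new_query.items() if w > 0}


expanded = rocchio(
    query={"ipod": 1.0},
    matching=[{"ipod": 1.0, "battery": 2.0}],
    non_matching=[{"banana": 1.0}],
)
```

Here "battery" enters the expanded query because it appears in a matching document, regardless of whether it carries any sentiment, which is the limitation the slide points out.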

19
Papa Johns According to TFIDF
20
Papa Johns According to Subjective Context
21
George Bush According to TFIDF
22
George Bush According to Subjective Context
23
iPod according to Rocchio
24
iPod according to TFIDF
Positive Sentiment in Movie Reviews
Negative Sentiment in Movie Reviews
25
Sentimental Context

Components
PMI(Word,Context)
TF
IDF
log(actual co-occurrence of word and seed in
context / probability of co-occurrence by chance)
Values
Abnormality to other docs
Popular words in context
Rare words in the corpus
Words that occur with sentiment words in the
query documents
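
The last component, the log ratio of actual to chance co-occurrence, might be computed as in the sketch below. The slide does not define "by chance", so this assumes the word and the seed occur independently within the query documents, and it estimates all probabilities from document counts. The function name is hypothetical.

```python
import math


def seed_cooccurrence_log_ratio(word, seed, context_docs):
    """log(P(word and seed) / (P(word) * P(seed))) over the context documents.

    Returns None when the word and seed never co-occur, since log(0) is undefined.
    """
    docs = [set(d) for d in context_docs]
    n = len(docs)
    both = sum(1 for d in docs if word in d and seed in d)
    if both == 0:
        return None
    n_word = sum(1 for d in docs if word in d)
    n_seed = sum(1 for d in docs if seed in d)
    expected_by_chance = (n_word / n) * (n_seed / n)
    return math.log((both / n) / expected_by_chance)


docs = [
    ["battery", "great"],
    ["battery", "great"],
    ["screen", "bad"],
    ["case", "ok"],
]
ratio = seed_cooccurrence_log_ratio("battery", "great", docs)
```

A positive value means the word appears with the seed more often than independence would predict; a negative value means less often.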

26
iPod according to Sentimental Context
27
iPod Nike according to Sentimental Context
28
iPodNike According to Apple
29
iPod Audio according to Sentiment Context
30
iPod Shuffle According to Sentiment Context
31
iPod Warranty According to Sentimental Context
32
iPod Battery according to Sentiment Context
33
iPod nano battery According to Sentimental Context
34
Google Hits (Battery Related)

iPod battery good 13.5 Mill
iPod battery bad 900 K
iPod nano battery good 3 Mill
iPod nano battery bad 785 K
iPod shuffle battery good 1.6 Mill
iPod shuffle battery bad 230 K
iPod shuffle battery price good 2.6 Mill (not a
typo)
iPod shuffle battery price bad 230 K
iPod battery price good 13.5 Mill
iPod battery price bad 850 K
iPod nano battery price good 3 Mill
iPod nano battery price bad 785 K

36
Summary

Interesting problem with many potential
applications
Domain dependence is the core challenge
The keys to success are
Vast quantities of unlabeled data
Semantic knowledge from freely available sources
Semantics must guide and influence but not
overrule the statistics

37
Questions?
38
BACKUP SLIDES
39
PMI - Pointwise Mutual Information

a.k.a. Specific Mutual Information
Do two variables occur together more often than
would be expected by chance?
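
For reference, the standard definition for two outcomes x and y, with a small worked example (the probabilities in the example are made up for illustration):

```latex
\mathrm{PMI}(x, y) = \log \frac{p(x, y)}{p(x)\, p(y)}

% Worked example with illustrative values:
% p(x) = 0.1, \quad p(y) = 0.2, \quad p(x, y) = 0.05
\mathrm{PMI}(x, y) = \log_2 \frac{0.05}{0.1 \times 0.2} = \log_2 2.5 \approx 1.32
```

A value above zero means the pair occurs together more often than independence would predict, zero means exactly as often, and a negative value means less often.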
