Title: Automatic Domain Adaptive Sentiment Analysis Phase 1
Outline
- Introduction
  - Problem Definition
  - Thesis Statement
  - Motivation
- Background and Related Work
  - Challenges
  - Approaches
- Research Plan
  - Approach
  - Evaluation
  - Timeline
- Conclusion
Problem Definition
1. Intro - 2. Related Work - 3. Research Plan - 4. Conclusion
- Sentiment Analysis is the automatic detection and measurement of sentiment in text segments by machines.
- 3 Sub Tasks
  - Objective vs. Subjective
  - Topic Detection
  - Positive vs. Negative
- Commonly applied to web data
- Highly domain dependent
Sentiment Analysis Example
Thesis Statement
- This dissertation will develop and evaluate techniques to discover and encode domain-specific, domain-independent, and semantic knowledge in order to improve both single-domain and multiple-domain sentiment analysis on textual data when little labeled data is available.
Motivation: Private Sector
- Market Research
  - Surveys
  - Focus Groups
  - Feature Analysis
  - Customer targeting (free samples, etc.)
- Consumer Sentiment Search
  - Compare pros and cons
  - Overall opinion of products/services
Motivation: Public Sector
- Political
  - Alternative Polling
  - Determine popular support for legislation
  - Choose campaign issues
- National Security
  - Detect individuals at risk for radicalization
  - Determine local sentiment about US policy
  - Determine local values and sentimental icons
  - Portray actions positively using local flavor
- Public Health
  - Detect people at risk of suicide
  - Detect mentally unstable people
Challenges
- Text Representation 
- Unedited Text 
- Sentiment Drift 
- Negation 
- Sarcasm 
- Sentiment Target Identification 
- Granularity 
- Domain Dependence
Domain Dependence 1: Domain-Dependent Sentiment
- The same sentence can mean two very different things in different domains
  - Ex: "Read the book." Good for books, bad for movies
  - Ex: "Jolting, heart-pounding, you're in for one hell of a bumpy ride!" Good for movies and books, bad for cars
- Sentimental word associations change with domain
  - Fuzzy cameras are bad, but fuzzy teddy bears are good.
  - Big trucks are good, but big iPods are bad.
  - Bad is bad, but bad villains are good.
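A minimal Python sketch of the modifier examples above, assuming a hypothetical hand-built lexicon keyed by (modifier, head noun) pairs with a context-free fallback. The entries are lifted from the bullets on this slide; `CONTEXT_LEXICON`, `GENERIC_LEXICON`, and `phrase_polarity` are illustrative names, not part of the proposed system.

```python
# Hypothetical lexicons for illustration only: +1 = positive, -1 = negative, 0 = neutral.
# Context-specific entries: polarity of a modifier given the noun it describes.
CONTEXT_LEXICON = {
    ("fuzzy", "camera"): -1,
    ("fuzzy", "teddy bear"): +1,
    ("big", "truck"): +1,
    ("big", "ipod"): -1,
    ("bad", "villain"): +1,
}

# Context-free entries, as a single domain-independent lexicon would store them.
GENERIC_LEXICON = {"bad": -1, "fuzzy": 0, "big": 0}


def phrase_polarity(modifier: str, head: str) -> int:
    """Polarity of 'modifier head', preferring a context-specific entry when one exists."""
    key = (modifier.lower(), head.lower())
    if key in CONTEXT_LEXICON:
        return CONTEXT_LEXICON[key]
    # No entry for this pairing: fall back to the modifier's generic polarity (0 if unknown).
    return GENERIC_LEXICON.get(modifier.lower(), 0)


if __name__ == "__main__":
    pairs = [("fuzzy", "camera"), ("fuzzy", "teddy bear"), ("bad", "villain"), ("bad", "service")]
    for modifier, head in pairs:
        print(f"{modifier} {head}: {phrase_polarity(modifier, head):+d}")
```

A generic lexicon alone would score "bad villain" as negative; the pair-level entry flips it. The catch is that such pair entries are domain knowledge and would have to be discovered for every new domain.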
Domain Dependence 2: Endless Possibilities
Domain Dependence 3: Organization and Granularity
Theory of the Three Signals
- Authors communicate messages using three types of signals
  - Domain-Specific Signals
  - Domain-Independent Signals
  - Semantic Signals
- More specific signals are generally more powerful than more generic signals
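One way to read "more specific signals are generally more powerful" is as a weighted vote in which domain-specific evidence counts most. The sketch below assumes each signal has already been turned into a score in [-1, 1] (or is missing); the weights 3/2/1 and the example scores are hypothetical placeholders, not values from this work.

```python
from typing import Dict, Optional

# Hypothetical weights: more specific signal types get larger weights.
SIGNAL_WEIGHTS: Dict[str, float] = {
    "domain_specific": 3.0,
    "domain_independent": 2.0,
    "semantic": 1.0,
}


def combine_signals(scores: Dict[str, Optional[float]]) -> float:
    """Weighted average of the available signal scores (each in [-1, 1]).

    Signals that are absent or None are skipped, so the result is driven by
    whichever signals were actually observed; returns 0.0 if none were.
    """
    total = 0.0
    weight_sum = 0.0
    for name, weight in SIGNAL_WEIGHTS.items():
        score = scores.get(name)
        if score is None:
            continue
        total += weight * score
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


if __name__ == "__main__":
    # Hypothetical scores for "fuzzy teddy bear": strongly positive in its domain,
    # mildly negative according to a generic lexicon, no semantic evidence.
    print(combine_signals({"domain_specific": 1.0, "domain_independent": -0.2}))
```

When the domain-specific signal is missing (a new domain), the same function degrades gracefully to the generic and semantic signals.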
Domain-Specific Signals
- Fuzzy teddy bears 
- Sharp pictures 
- Sharp knives 
- Smooth rides 
- New ideas 
- Fast servers 
- Fast cars 
- Slow-roasted burgers
- Slow motion 
- Small cameras 
- Big cars
- Dependent on problem and domain
- Considered more useful by readers
- Tell what is good or bad about the topic
- Domain knowledge determines sentiment orientation
- Very strong in context, but weak or misleading out of context
- Can cause overgeneralization errors when overvalued
- New domain-specific signal words are ignored in CDT
Proposed Approach
- Sentiment Search is more than just a classification problem
- Detecting and Using the three signals 
- Dynamic Domain Adapting Classifiers 
- Generic Feature Detection using unlabeled data 
- Semantic Feature Spaces
Dynamic Domain-Adapting Classifiers
- A (preferably domain-independent) model is built before query time from a set of labeled data, using computationally intensive algorithms.
- Users interact at the query-box level
- Query results define the domain of interest
- Domain-specific adaptations are calculated
  - Compare how the domain of interest differs from known cases
  - Use semantic knowledge about word senses and relations
  - Must be a fast algorithm, since users are waiting
- Domain-specific adaptations are woven into the domain-independent model
  - The resulting model is temporary
  - It is used to classify documents as positive, negative, or objective
- Sentimental search results are processed for significant components and presented for human consumption
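The query-time flow above can be sketched in Python under heavy simplifying assumptions: documents are token lists, the offline domain-independent model is a hypothetical linear lexicon (`BASE_WEIGHTS`), and the domain-specific adaptation is a cheap seed co-occurrence vote over the retrieved documents. The semantic-knowledge step is omitted. All names, seeds, and the `mix` and `threshold` parameters are illustrative.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical domain-independent model, assumed to be trained offline on labeled data.
BASE_WEIGHTS: Dict[str, float] = {"great": 1.0, "excellent": 1.0, "bad": -1.0, "terrible": -1.0}
POSITIVE_SEEDS = {"great", "excellent"}
NEGATIVE_SEEDS = {"bad", "terrible"}


def domain_adaptations(results: List[List[str]]) -> Dict[str, float]:
    """Estimate domain-specific word weights from the documents a query returned.

    Each retrieved document votes +1 (more positive than negative seeds), -1 (the
    reverse), or 0; a non-seed word's weight is the mean vote of the documents it
    appears in. This is a single pass over the results, so it is cheap per query.
    """
    votes: Counter = Counter()
    doc_freq: Counter = Counter()
    for doc in results:
        words = set(doc)
        balance = len(words & POSITIVE_SEEDS) - len(words & NEGATIVE_SEEDS)
        vote = (balance > 0) - (balance < 0)
        for word in words - POSITIVE_SEEDS - NEGATIVE_SEEDS:
            doc_freq[word] += 1
            votes[word] += vote
    return {word: votes[word] / doc_freq[word] for word in doc_freq}


def adapted_model(results: List[List[str]], mix: float = 0.5) -> Dict[str, float]:
    """Weave the adaptations into a copy of the base model; the copy is discarded after the query."""
    model = dict(BASE_WEIGHTS)
    for word, adjustment in domain_adaptations(results).items():
        model[word] = model.get(word, 0.0) + mix * adjustment
    return model


def classify(doc: List[str], model: Dict[str, float], threshold: float = 0.5) -> str:
    """Label a document positive, negative, or objective by its summed word weights."""
    score = sum(model.get(word, 0.0) for word in doc)
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "objective"


if __name__ == "__main__":
    results = [["fuzzy", "bear", "great"], ["fuzzy", "soft", "excellent"], ["torn", "bear", "bad"]]
    model = adapted_model(results)
    for doc in (["fuzzy", "soft"], ["torn", "bear"], ["bear"]):
        print(doc, classify(doc, model))
```

Because `adapted_model` copies `BASE_WEIGHTS`, the expensive offline model is never modified; each query gets its own short-lived variant.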
Overview
Figure key: User Level, Source Data, Knowledge, Labeled Data, Algorithms, Search Results
Subjective Context Scoring
- Multiply
  - PMI(Word, Context)
  - IDF
  - Co-occurrence with known generic sentiment seed words, times their bias (from movie reviews)
- Seeds
  - Negative: bad, worst, stupid, ridiculous, terrible, poorly
  - Positive: great, best, perfect, wonderful, excellent, effective
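A minimal sketch of the score above, assuming it is the plain product of the three factors. Documents are token sets; the "context" is the set of documents returned for the query, taken to be a subset of the corpus. PMI and IDF are estimated from document frequencies, negative PMI is clamped to zero, and each seed gets a hypothetical bias of ±1 (the slide derives the biases from movie reviews, which this sketch does not reproduce).

```python
import math
from typing import Dict, List, Set

# Seed words from the slide; the ±1 biases are placeholders for biases learned from movie reviews.
NEGATIVE = ["bad", "worst", "stupid", "ridiculous", "terrible", "poorly"]
POSITIVE = ["great", "best", "perfect", "wonderful", "excellent", "effective"]
SEED_BIAS: Dict[str, float] = {**{w: -1.0 for w in NEGATIVE}, **{w: 1.0 for w in POSITIVE}}


def subjective_context_score(word: str, context: List[Set[str]], corpus: List[Set[str]]) -> float:
    """PMI(word, context) * IDF(word) * sum of seed co-occurrences weighted by seed bias."""
    n = len(corpus)
    df = sum(1 for doc in corpus if word in doc)
    df_ctx = sum(1 for doc in context if word in doc)
    if df == 0 or df_ctx == 0:
        return 0.0
    # PMI(word, context) = log(P(word | context) / P(word)); negative values clamped to 0 (an assumption).
    pmi = max(0.0, math.log((df_ctx / len(context)) / (df / n)))
    idf = math.log(n / df)
    # Each context document containing the word adds the bias of every seed it contains.
    seed_term = sum(SEED_BIAS[s] for doc in context if word in doc for s in doc if s in SEED_BIAS)
    return pmi * idf * seed_term


if __name__ == "__main__":
    corpus = [{"battery", "great"}, {"battery", "excellent"}, {"pizza", "bad"}, {"pizza", "best"}]
    context = corpus[:2]  # documents returned for a hypothetical "ipod" query
    for w in ("battery", "pizza"):
        print(w, round(subjective_context_score(w, context, corpus), 3))
```

The sign of the score comes from the seed term, while PMI and IDF scale it up for words that are both typical of the query results and rare overall.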
Rocchio Baseline
- Rocchio: a query expansion algorithm for search
- Similar goals to ours: find more relevant words
- Does not account for sentiment
- The new query is a weighted sum of
  - Matching document vectors
  - The query vector
  - Non-matching document vectors (negative weight)
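For reference, the textbook form of Rocchio relevance feedback is q' = α·q + β·(1/|Dr|)·Σ_{d∈Dr} d − γ·(1/|Dnr|)·Σ_{d∈Dnr} d, where Dr and Dnr are the matching and non-matching documents. The sketch below uses sparse term-weight dictionaries, the commonly cited defaults α = 1, β = 0.75, γ = 0.15, and the common convention of dropping terms whose weight ends up non-positive. Nothing in it looks at sentiment, which is the limitation noted above.

```python
from collections import defaultdict
from typing import Dict, List

Vector = Dict[str, float]  # sparse vector: term -> weight


def rocchio(query: Vector, relevant: List[Vector], nonrelevant: List[Vector],
            alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> Vector:
    """Return the expanded query vector from Rocchio relevance feedback."""
    expanded: Dict[str, float] = defaultdict(float)
    for term, weight in query.items():
        expanded[term] += alpha * weight
    # Add the centroid of matching documents and subtract the centroid of non-matching ones.
    for docs, coefficient in ((relevant, beta), (nonrelevant, -gamma)):
        for doc in docs:
            for term, weight in doc.items():
                expanded[term] += coefficient * weight / len(docs)
    # Keep only positively weighted terms.
    return {term: weight for term, weight in expanded.items() if weight > 0}


if __name__ == "__main__":
    query = {"ipod": 1.0}
    relevant = [{"ipod": 1.0, "battery": 1.0}, {"ipod": 1.0, "nano": 1.0}]
    nonrelevant = [{"ipod": 1.0, "case": 1.0}]
    print(sorted(rocchio(query, relevant, nonrelevant).items()))
```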
Papa John's According to TF-IDF
Papa John's According to Subjective Context
George Bush According to TF-IDF
George Bush According to Subjective Context
iPod According to Rocchio
iPod According to TF-IDF
Figure key: Positive Sentiment in Movie Reviews, Negative Sentiment in Movie Reviews
Sentimental Context
- Components
  - PMI(Word, Context)
  - TF
  - IDF
  - log(actual co-occurrence of Word and Seed in Context / co-occurrence expected by chance)
- Values
  - Abnormality relative to other docs
  - Popular words in context
  - Rare words in the corpus
  - Words that occur with sentiment words in the query documents
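A sketch of the component list above, assuming the four components are multiplied. Documents are token lists, the "context" is the query's result documents (a subset of the corpus), PMI and IDF come from document frequencies with negative PMI clamped to zero, and the chance baseline for co-occurrence is the expected number of context documents containing both the word and a seed if the two were independent. The seed sets, the example documents, and returning zero when the word never meets a seed are assumptions.

```python
import math
from typing import List, Set


def cooccurrence_lift(word: str, seeds: Set[str], context: List[List[str]]) -> float:
    """log(observed / expected) number of context docs containing both the word and a seed."""
    n = len(context)
    with_word = sum(1 for doc in context if word in doc)
    with_seed = sum(1 for doc in context if seeds & set(doc))
    both = sum(1 for doc in context if word in doc and seeds & set(doc))
    if both == 0:
        return 0.0  # no co-occurrence evidence
    expected = with_word * with_seed / n  # count expected if word and seeds were independent
    return math.log(both / expected)


def sentimental_context_score(word: str, seeds: Set[str], context: List[List[str]],
                              corpus: List[List[str]]) -> float:
    """PMI(word, context) * TF(word in context) * IDF(word) * co-occurrence lift."""
    n = len(corpus)
    df = sum(1 for doc in corpus if word in doc)
    df_ctx = sum(1 for doc in context if word in doc)
    if df == 0 or df_ctx == 0:
        return 0.0
    pmi = max(0.0, math.log((df_ctx / len(context)) / (df / n)))  # abnormality vs. other docs
    tf = sum(doc.count(word) for doc in context)  # popular words in the context
    idf = math.log(n / df)  # rare words in the corpus
    return pmi * tf * idf * cooccurrence_lift(word, seeds, context)


if __name__ == "__main__":
    context = [["battery", "life", "great"], ["battery", "great"], ["screen", "bad"], ["case", "ok"]]
    corpus = context + [["battery", "pizza"], ["pizza", "bad"], ["pizza", "ok"], ["burger", "ok"]]
    print(round(sentimental_context_score("battery", {"great", "excellent"}, context, corpus), 3))
```

Running it once with positive seeds and once with negative seeds gives a word's association with each polarity inside the query's domain.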
iPod According to Sentimental Context
iPod Nike According to Sentimental Context
iPod Nike According to Apple
iPod Audio According to Sentimental Context
iPod Shuffle According to Sentimental Context
iPod Warranty According to Sentimental Context
iPod Battery According to Sentimental Context
iPod nano Battery According to Sentimental Context
Google Hits (Battery Related)
- "iPod battery good": 13.5 million / "iPod battery bad": 900K
- "iPod nano battery good": 3 million / "iPod nano battery bad": 785K
- "iPod shuffle battery good": 1.6 million / "iPod shuffle battery bad": 230K
- "iPod shuffle battery price good": 2.6 million (not a typo) / "iPod shuffle battery price bad": 230K
- "iPod battery price good": 13.5 million / "iPod battery price bad": 850K
- "iPod nano battery price good": 3 million / "iPod nano battery price bad": 785K
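The counts above can be reduced to a crude polarity signal: the ratio of "good" hits to "bad" hits for each query, in the spirit of hit-count based semantic orientation. The sketch only does this arithmetic on the slide's numbers; search-engine hit counts are rough estimates (note the shuffle price anomaly), so the ratios are illustrative at best. `good_bad_ratio` is a hypothetical helper name.

```python
import math

# ("good" hits, "bad" hits) per query, copied from the slide above.
HITS = {
    "iPod battery": (13_500_000, 900_000),
    "iPod nano battery": (3_000_000, 785_000),
    "iPod shuffle battery": (1_600_000, 230_000),
    "iPod shuffle battery price": (2_600_000, 230_000),
    "iPod battery price": (13_500_000, 850_000),
    "iPod nano battery price": (3_000_000, 785_000),
}


def good_bad_ratio(good: int, bad: int) -> float:
    """How many times more hits the query gets with "good" appended than with "bad"."""
    return good / bad


if __name__ == "__main__":
    for query, (good, bad) in HITS.items():
        ratio = good_bad_ratio(good, bad)
        print(f"{query:28s} ratio = {ratio:5.2f}  log2 = {math.log2(ratio):+.2f}")
```

Every query leans positive, but the nano queries (about 3.8 to 1) lean much less than the generic iPod queries (15 to 1 or more).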
Summary
- Interesting problem with many potential applications
- Domain dependence is the core challenge
- The keys to success are
  - Vast quantities of unlabeled data
  - Semantic knowledge from freely available sources
  - Semantics must guide and influence, but not overrule, the statistics
Questions?
BACKUP SLIDES
PMI: Pointwise Mutual Information
- a.k.a. Specific Mutual Information
- Do two variables occur together more often than chance would predict?
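The standard definition is PMI(x, y) = log2 [ p(x, y) / (p(x) p(y)) ]: zero when x and y co-occur exactly as often as independence predicts, positive when more often, negative when less. A small sketch estimating it from counts over n observations (for example, documents); the function name and the count-based estimate are illustrative.

```python
import math


def pmi(count_xy: int, count_x: int, count_y: int, n: int) -> float:
    """Pointwise mutual information, in bits, estimated from raw counts over n observations.

    With p(x, y) = count_xy / n, p(x) = count_x / n and p(y) = count_y / n, the ratio
    p(x, y) / (p(x) p(y)) simplifies to count_xy * n / (count_x * count_y).
    """
    if count_xy == 0:
        return float("-inf")  # never observed together
    return math.log2(count_xy * n / (count_x * count_y))


if __name__ == "__main__":
    # Hypothetical counts over 1,000 documents.
    print(pmi(10, 20, 50, 1000))  # together 10 times more often than chance
    print(pmi(1, 10, 100, 1000))  # together exactly as often as chance
```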