Transcript and Presenter's Notes

Title: Automatic Domain Adaptive Sentiment Analysis Phase 1


1
Automatic Domain Adaptive Sentiment Analysis
Phase 1
  • Justin Martineau

2
Outline
  • Introduction
  • Problem Definition
  • Thesis Statement
  • Motivation
  • Background and Related Work
  • Challenges
  • Approaches
  • Research Plan
  • Approach
  • Evaluation
  • Timeline
  • Conclusion

3
Problem Definition
  • Sentiment Analysis is the automatic detection and
    measurement of sentiment in text segments by
    machines.
  • Three subtasks (sketched below)
    • Objective vs. Subjective
    • Topic Detection
    • Positive vs. Negative
  • Commonly applied to web data
  • Very Domain Dependent
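  A minimal sketch of how the three subtasks could be chained into a single pipeline; the classifier arguments (is_subjective, detect_topic, polarity) are hypothetical placeholders, not part of the proposed system.

    # Hypothetical pipeline over the three subtasks listed above.
    def analyze(text, is_subjective, detect_topic, polarity):
        if not is_subjective(text):                    # 1. objective vs. subjective
            return ("objective", None, None)
        topic = detect_topic(text)                     # 2. topic detection
        return ("subjective", topic, polarity(text))   # 3. positive vs. negative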

4
Sentiment Analysis Example
5
Thesis Statement
  • This dissertation will develop and evaluate
    techniques to discover and encode
    domain-specific, domain-independent, and semantic
    knowledge to improve performance on both single-
    and multiple-domain sentiment analysis problems
    on textual data under low labeled-data conditions.

6
Motivation: Private Sector
  • Market Research
    • Surveys
    • Focus Groups
    • Feature Analysis
    • Customer targeting (free samples, etc.)
  • Consumer Sentiment Search
    • Compare pros and cons
    • Overall opinion of products/services

7
Motivation: Public Sector
  • Political
    • Alternative polling
    • Determine popular support for legislation
    • Choose campaign issues
  • National Security
    • Detect individuals at risk for radicalization
    • Determine local sentiment about US policy
    • Determine local values and sentimental icons
    • Portray actions positively using local flavor
  • Public Health
    • Detect potential suicide victims
    • Detect mentally unstable people

8
Challenges
  • Text Representation
  • Unedited Text
  • Sentiment Drift
  • Negation
  • Sarcasm
  • Sentiment Target Identification
  • Granularity
  • Domain Dependence

9
Domain Dependence 1: Domain-Dependent Sentiment
  • The same sentence can mean two very different
    things in different domains
    • Ex: "Read the book." Good for books, bad for
      movies.
    • Ex: "Jolting, heart pounding, you're in for one
      hell of a bumpy ride!" Good for movies and
      books, bad for cars.
  • Sentimental word associations change with domain
    • Fuzzy cameras are bad, but fuzzy teddy bears
      are good.
    • Big trucks are good, but big iPods are bad.
    • Bad is bad, but bad villains are good.

10
Domain Dependence 2: Endless Possibilities
11
Domain Dependence 3: Organization and Granularity
12
Theory of the Three Signals
  • Authors communicate messages using three types of
    signals
    • Domain-Specific Signals
    • Domain-Independent Signals
    • Semantic Signals
  • More specific signals are generally more powerful
    than more generic signals

13
Domain-Specific Signals
  • Examples
    • Fuzzy teddy bears
    • Sharp pictures
    • Sharp knives
    • Smooth rides
    • New ideas
    • Fast servers
    • Fast cars
    • Slow-roasted burgers
    • Slow motion
    • Small cameras
    • Big cars
  • Dependent on the problem and domain
  • Considered more useful by readers
  • Tell what is good or bad about the topic
  • Domain knowledge determines sentiment orientation
  • Very strong in context, but weak or misleading
    out of context
  • Can cause over-generalization error when
    overvalued
  • New domain-specific signal words are ignored in
    CDT

14
Proposed Approach
  • Sentiment Search is more than just a
    classification problem
  • Detecting and Using the three signals
  • Dynamic Domain Adapting Classifiers
  • Generic Feature Detection using unlabeled data
  • Semantic Feature Spaces

15
Dynamic Domain Adapting Classifiers
  • A (preferably domain-independent) model is built
    using computationally intense algorithms before
    query time on a set of labeled data.
  • Users interact at a query-box level
  • Query results define the domain of interest
  • Domain-specific adaptations are calculated
    • Compares how the domain of interest differs
      from known cases
    • Uses semantic knowledge about word senses and
      relations
    • Must be a fast algorithm; users are waiting
  • Domain-specific adaptations are woven into the
    domain-independent model
    • The resulting model is temporary
    • It is used to classify documents as positive,
      negative, or objective
  • Sentimental search results are processed for
    significant components and presented for human
    consumption (query-time flow sketched below)
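  A minimal, hypothetical sketch of the query-time flow above, in Python; the interfaces (retrieve, adapt, classify) and the dictionary-of-weights model are assumptions for illustration, not the author's implementation.

    # Sketch of dynamic domain adaptation at query time (assumed interfaces).
    def sentiment_search(query, retrieve, base_model, adapt, classify):
        docs = retrieve(query)                   # query results define the domain of interest
        temp_model = {**base_model,              # weave fast, query-time adaptations into
                      **adapt(docs, base_model)} # the offline domain-independent model
        # The merged model is temporary; label each result for presentation.
        return {doc: classify(doc, temp_model) for doc in docs}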

16
Overview
Key: User Level, Source Data, Knowledge, Labeled
Data, Algorithms, Search Results
17
Subjective Context Scoring
  • Multiply (see the sketch below)
    • PMI(Word, Context)
    • IDF
    • Co-occurrence with known generic sentiment seed
      words, times their bias (from movie reviews)
  • Seeds
    • bad, worst, stupid, ridiculous, terrible, poorly
    • great, best, perfect, wonderful, excellent,
      effective
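  A rough sketch of the score described above, assuming documents are represented as sets of tokens and that seed bias is simply +1 or -1; the exact counting and weighting scheme is an assumption, not taken from the slides.

    import math

    NEG_SEEDS = {"bad", "worst", "stupid", "ridiculous", "terrible", "poorly"}
    POS_SEEDS = {"great", "best", "perfect", "wonderful", "excellent", "effective"}

    def subjective_context_score(word, context_docs, all_docs):
        n_ctx = sum(1 for d in context_docs if word in d)  # context docs containing the word
        n_all = sum(1 for d in all_docs if word in d)      # corpus docs containing the word
        if n_ctx == 0 or n_all == 0:
            return 0.0
        # PMI(word, context) from document-level probabilities.
        pmi = math.log((n_ctx / len(all_docs)) /
                       ((n_all / len(all_docs)) * (len(context_docs) / len(all_docs))))
        idf = math.log(len(all_docs) / n_all)              # favors words rare in the corpus
        # Co-occurrence with seed words, weighted by their bias (+1 positive, -1 negative).
        seeds = POS_SEEDS | NEG_SEEDS
        bias = sum((1 if s in POS_SEEDS else -1)
                   for d in context_docs if word in d
                   for s in seeds if s in d)
        return pmi * idf * bias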

18
Rocchio Baseline
  • Rocchio: a query-expansion algorithm for search
  • Similar goals to ours: find more relevant words
  • Does not account for sentiment
  • The new query is a weighted sum of (standard
    formula below)
    • Matching document vectors
    • The original query vector
    • Non-matching document vectors (with negative
      weight)
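  For reference, the standard Rocchio update, with relevant documents D_r, non-relevant documents D_nr, and weights alpha, beta, gamma:

    \vec{q}_{new} = \alpha\,\vec{q} \;+\; \frac{\beta}{|D_r|} \sum_{\vec{d}_j \in D_r} \vec{d}_j \;-\; \frac{\gamma}{|D_{nr}|} \sum_{\vec{d}_k \in D_{nr}} \vec{d}_k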

19
Papa Johns According to TFIDF
20
Papa Johns According to Subjective Context
21
George Bush According to TFIDF
22
George Bush According to Subjective Context
23
iPod according to Rocchio
24
iPod according to TFIDF
(Figure: Positive Sentiment in Movie Reviews vs.
Negative Sentiment in Movie Reviews)
25
Sentimental Context
  • Components (combined as in the formula below)
    • PMI(Word, Context)
    • TF
    • IDF
    • log(actual co-occurrence of word and seed in
      the context / co-occurrence expected by chance)
  • Values
    • Abnormality relative to other docs
    • Popular words in the context
    • Rare words in the corpus
    • Words that occur with sentiment words in the
      query documents
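  Assuming, as on the Subjective Context Scoring slide, that these components are multiplied, the score of a word w in a query context c would take roughly this form (notation mine):

    \mathrm{score}(w) \;=\; \mathrm{PMI}(w, c) \cdot \mathrm{TF}(w, c) \cdot \mathrm{IDF}(w) \cdot \log\frac{C(w, \mathrm{seed} \mid c)}{E[\,C(w, \mathrm{seed} \mid c)\,]}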

26
iPod according to Sentimental Context
27
iPod Nike according to Sentimental Context
28
iPod Nike According to Apple
29
iPod Audio according to Sentiment Context
30
iPod Shuffle According to Sentiment Context
31
iPod Warranty According to Sentimental Context
32
iPod Battery according to Sentiment Context
33
iPod nano battery According to Sentimental Context
34
Google Hits (Battery Related)
  • "iPod battery good": 13.5 million
  • "iPod battery bad": 900 K
  • "iPod nano battery good": 3 million
  • "iPod nano battery bad": 785 K
  • "iPod shuffle battery good": 1.6 million
  • "iPod shuffle battery bad": 230 K
  • "iPod shuffle battery price good": 2.6 million
    (not a typo)
  • "iPod shuffle battery price bad": 230 K
  • "iPod battery price good": 13.5 million
  • "iPod battery price bad": 850 K
  • "iPod nano battery price good": 3 million
  • "iPod nano battery price bad": 785 K
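  An illustrative reading of the counts above as good-to-bad hit ratios (this calculation is not on the slide):

    iPod battery:          13.5 M / 900 K  ≈ 15
    iPod nano battery:      3 M   / 785 K  ≈ 3.8
    iPod shuffle battery:   1.6 M / 230 K  ≈ 7.0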

35
(No Transcript)
36
Summary
  • Interesting problem with many potential
    applications
  • Domain dependence is the core challenge
  • The keys to success are
    • Vast quantities of unlabeled data
    • Semantic knowledge from freely available sources
    • Semantics must guide and influence but not
      overrule the statistics

37
Questions?
38
BACKUP SLIDES
39
PMI - Pointwise Mutual Information
  • a.k.a. Specific Mutual Information
  • Do two variables occur together more often than
    chance would predict? (formula below)
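  For two events x and y:

    \mathrm{PMI}(x, y) \;=\; \log \frac{P(x, y)}{P(x)\,P(y)}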