Title: Opinion Retrieval from Blogs
1Opinion Retrieval from Blogs
Wei Zhang (1), Clement Yu (1), Weiyi Meng (2)
wzhang_at_cs.uic.edu, yu_at_cs.uic.edu, meng_at_cs.binghamton.edu
(1) Department of Computer Science, University of Illinois at Chicago
(2) Department of Computer Science, Binghamton University
CIKM 2007
2Outline
- Overview of the opinion retrieval
- Topic retrieval
- Opinion identification
- Ranking documents by opinion similarity
- Experimental results
3Overview of the Opinion Retrieval
- Opinion retrieval
- Given a query, find documents that have subjective opinions about the query
- A query: "book"
- Relevant: "This is a very good book."
- Irrelevant: "This book has 123 pages."
4Overview of the Opinion Retrieval
- Introduced at TREC 2006 Blog Track
- 14 groups, 57 submitted runs in TREC 2006
- 20 groups, 104 runs in TREC 2007 (ongoing)
- Key problems
- Opinion features
- Query-related opinions
- Rank the retrieved documents
5Our Algorithm
6Topic Retrieval
- Retrieve query-relevant documents
- No opinion involved
- Features
- Phrase recognition
- Query expansion
- Two document-query similarities
7Topic Retrieval Phrase Recognition
- Semantic relationship among the words
- For phrase similarity calculation purpose
- 4 types
- Proper noun: "University of Lisbon"
- Dictionary phrase: "computer science"
- Simple phrase: "white car"
- Complex phrase: "small white car"
8Topic Retrieval Query Expansion
- Find the synonyms
- wto <-> world trade organization
- Same importance
- Add additional terms
- wto -> negotiate, agreements, tariffs, ...
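As a rough illustration of the two expansion steps on this slide, the sketch below gives synonyms the same weight as the original term and adds related terms at a lower weight. The lookup tables, the 0.5 weight, and the function name `expand_query` are assumptions for illustration; the slides do not say how synonyms or additional terms are found or weighted.

```python
# Hypothetical lookup tables; the slides do not say where these come from.
SYNONYMS = {"wto": ["world trade organization"]}
RELATED_TERMS = {"wto": ["negotiate", "agreements", "tariffs"]}


def expand_query(query, related_weight=0.5):
    """Return {term: weight} for the query, its synonyms, and related terms."""
    key = query.lower()
    weights = {key: 1.0}
    # Synonyms get the same importance as the original query term.
    for synonym in SYNONYMS.get(key, []):
        weights[synonym] = 1.0
    # Additional terms get a lower weight (0.5 is an arbitrary assumption).
    for term in RELATED_TERMS.get(key, []):
        weights.setdefault(term, related_weight)
    return weights


expanded = expand_query("wto")
```

Keeping the result as a term-to-weight mapping lets a downstream scorer treat synonyms and added terms differently without changing the expansion step.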
9Topic Retrieval - Similarity
- Sim(Query, Doc) = <Sim_P, Sim_T>
- Phrase similarity
- Having or not having a phrase
- Sim_P = sum( idf(P_i) ) over the query phrases P_i found in the document
- Term similarity
- Sum of the Okapi scores of all the query terms
- Document ranking
- D1 is ranked higher than D2 if
- (Sim_P1 > Sim_P2) OR (Sim_P1 = Sim_P2 AND Sim_T1 > Sim_T2)
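The ranking rule on this slide is a lexicographic comparison: phrase similarity decides first, and term similarity only breaks ties. A minimal sketch, assuming the two similarities have already been computed per document (the `ScoredDoc` type and the scores are made up for illustration):

```python
from typing import NamedTuple


class ScoredDoc(NamedTuple):
    doc_id: str
    sim_p: float  # phrase similarity: sum of idf over matched query phrases
    sim_t: float  # term similarity: sum of Okapi scores of the query terms


def rank_documents(docs):
    """Sort by Sim_P first; break ties on Sim_T (both descending)."""
    return sorted(docs, key=lambda d: (d.sim_p, d.sim_t), reverse=True)


# Made-up scores for illustration only.
docs = [
    ScoredDoc("d1", sim_p=2.0, sim_t=5.0),
    ScoredDoc("d2", sim_p=3.5, sim_t=1.0),
    ScoredDoc("d3", sim_p=2.0, sim_t=7.5),
]
ranking = [d.doc_id for d in rank_documents(docs)]
```

Here `d2` comes first despite its low term similarity, because it has the highest phrase similarity; `d3` beats `d1` only on the tie-breaker.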
10Opinion Identification
[Figure: pipeline diagram. Elements: subjective training data, objective training data, feature selection, SVM classifier; retrieved documents (from topic retrieval) in, opinionative documents (to opinion ranking) out]
11Opinion Identification Training Data
- Subjective training data
- Review web sites
- Documents having opinionative phrases
- Objective training data
- Dictionary entries
- Documents not having opinionative phrases
12Opinion Identification Feature Selection
- The words expressing opinions
- Pearson's chi-square test
- Test of the independence between subjectivity label and words via contingency table
- Count the number of sentences
- Unigrams and bigrams
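For a single candidate feature, the test above reduces to a 2x2 contingency table of sentence counts (feature present or absent versus subjective or objective). The sketch below uses the standard closed form of Pearson's statistic for a 2x2 table; the counts are made up, and how the authors thresholded the statistic is not stated in the slides.

```python
def chi_square(a, b, c, d):
    """Pearson's chi-square statistic for a 2x2 contingency table.

    a: subjective sentences containing the feature
    b: objective sentences containing the feature
    c: subjective sentences without the feature
    d: objective sentences without the feature
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # a row or column is empty, so the statistic is undefined
    return n * (a * d - b * c) ** 2 / denominator


# Made-up sentence counts for two candidate features.
score_opinion_word = chi_square(a=80, b=20, c=920, d=980)
score_neutral_word = chi_square(a=50, b=50, c=950, d=950)
```

A feature that occurs equally often in both classes scores zero, while one skewed toward subjective sentences scores high and would be a candidate for the feature set.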
13Opinion Identification Classifier
- A support vector machine (SVM) classifier
[Figure: training diagram. Elements: objective sentences, subjective sentences, features, feature vector representation, training, SVM classifier]
14Opinion Identification Classifier
[Figure: a document is split into sentences 1..n and the SVM classifier labels each one, e.g. sentence 1: objective, sentence 2: subjective, sentence n: objective]
15Opinion Similarity - Query-Related Opinions
- Find the query-related opinions
[Figure: a document showing the query, an opinionative sentence, and the text window around that sentence]
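One way to read this slide: an opinionative sentence counts as query-related when a query term appears within a text window around it. The sketch below is a simplified interpretation under that assumption; measuring the window in whole sentences, its size, and the plain substring matching are illustrative choices, not details given in the slides.

```python
def query_related_opinions(sentences, labels, query_terms, window=1):
    """Return indices of subjective sentences that have a query term nearby.

    sentences: list of sentence strings, in document order
    labels: parallel list of "subjective" / "objective" labels
    window: number of neighbouring sentences on each side to search (assumed)
    """
    terms = [t.lower() for t in query_terms]
    related = []
    for i, label in enumerate(labels):
        if label != "subjective":
            continue
        start = max(0, i - window)
        nearby_text = " ".join(sentences[start:i + window + 1]).lower()
        # Simple substring match; a real system would match tokens or phrases.
        if any(term in nearby_text for term in terms):
            related.append(i)
    return related


sentences = [
    "I bought a new book last week.",
    "It is wonderful.",
    "The weather was cold.",
    "I hate mornings.",
]
labels = ["objective", "subjective", "objective", "subjective"]
related = query_related_opinions(sentences, labels, ["book"], window=1)
```

In this toy document, "It is wonderful." is kept because "book" appears in the preceding sentence, while "I hate mornings." is dropped because no query term falls inside its window.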
16Opinion Similarity Similarity 1
- Assumption 1
- Higher topic relevance
- -> Higher rank
- OSim_ir = Sim(Query, Doc)
17Opinion Similarity Similarity 2
- Assumption 2
- More query-related opinions
- -> Higher rank
- OSim_stcc = total number of query-related opinionative sentences
- OSim_stcs = total score of those sentences
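Both measures on this slide can be computed from the same per-document list of sentence scores. A minimal sketch, assuming each query-related opinionative sentence carries a classifier score (the input format and the example values are hypothetical):

```python
def opinion_scores(sentence_scores):
    """Return (OSim_stcc, OSim_stcs) for one document.

    sentence_scores: classifier scores of the sentences judged to be
    query-related opinions (hypothetical input; higher = more subjective).
    """
    osim_stcc = len(sentence_scores)  # total number of sentences
    osim_stcs = sum(sentence_scores)  # total score of sentences
    return osim_stcc, osim_stcs


stcc, stcs = opinion_scores([0.5, 1.5, 0.25])
```

The count treats every qualifying sentence equally, whereas the score total lets strongly subjective sentences contribute more.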
18Opinion Similarity Similarity 3
- A linear combination of 1 and 2
- a * OSim_ir + (1 - a) * OSim_stcc
- b * OSim_ir + (1 - b) * OSim_stcs
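The combination can be sketched as a single weighted sum. The example assumes both inputs are scalars on comparable scales, which the slides do not specify; the values and the 0.6 weight are made up for illustration.

```python
def combine(osim_ir, osim_opinion, weight):
    """Linear combination: weight * OSim_ir + (1 - weight) * opinion score."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return weight * osim_ir + (1.0 - weight) * osim_opinion


# Made-up values; in practice the two scores would need comparable scales.
osim_ir = 0.8    # topic relevance score
osim_stcc = 0.4  # e.g. a normalized count of query-related opinionative sentences
score = combine(osim_ir, osim_stcc, weight=0.6)
```

Setting the weight to 1 recovers pure topic ranking (Similarity 1), and setting it to 0 recovers pure opinion ranking (Similarity 2).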
19Opinion Similarity Experimental Results
- TREC 2006 Blog Track data
- 50 queries, 3.2 million blog documents
- UIC at TREC 2006 Blog Track
- Title-only query run ranked first
- 28% to 32% higher than the best TREC 2006 scores
- Good things learned
- More training data
- Combined similarity function
20Conclusions
- Designed and implemented an opinion retrieval system
- IR + text classification for opinion retrieval
- The best known retrieval effectiveness on TREC 2006 blog data
- Extend to polarity classification: positive/negative/mixed
- Plan to improve feature selection
21Questions?
- wzhang_at_cs.uic.edu
- http://www.cs.uic.edu/wzhang/