Title: Personalizing Web Search
1Personalizing Web Search
- Jaime Teevan, MIT
- with Susan T. Dumais
- and Eric Horvitz, MSR
3Demo
4Personalizing Web Search
- Motivation
- Algorithms
- Results
- Future Work
6Study of Personal Relevancy
- 15 SIS users x 10 queries
- Evaluate 50 results
- Highly relevant / Relevant / Irrelevant
- Query selection
- Previously issued query
- Chose from 10 pre-selected queries
- Collected evaluations for 137 queries
- 53 were pre-selected queries (2-9 evaluations per query)
7Relevant Results Have Low Rank
[Chart; legend: Highly Relevant, Relevant, Irrelevant]
8Same Query, Different Intent
- Different meanings
- Information about the astronomical/astrological sign of cancer
- information about cancer treatments
- Different intents
- is there any new tests for cancer?
- information about cancer treatments
9Same Intent, Different Evaluation
- Query: Microsoft
- information about microsoft, the company
- Things related to the Microsoft corporation
- Information on Microsoft Corp
- 31 of 50 results rated as not irrelevant
- For only 6 of the 31 does more than one evaluator agree
- All three agree only for www.microsoft.com
10More to Understand
- Do people cluster?
- Even if they can't state their intention
- How are the differences reflected?
- Can they be seen from the information on a person's computer?
- Can we do better than the ranking that would make everyone the most happy?
- Best common ranking: 38
- Best personalized ranking: 55
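The gap between a best common ranking and a best personalized ranking can be illustrated with a small sketch. It assumes graded gains (2 = highly relevant, 1 = relevant, 0 = irrelevant) and a DCG-style quality measure; the gain values, the measure, and the ratings below are assumptions for illustration and are not taken from the slides.

```python
import math

def dcg(gains):
    """Discounted cumulative gain of gains listed in ranked order."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))

def best_personal_dcg(user_gains):
    """DCG when one user's results are sorted by that user's own gains."""
    return dcg(sorted(user_gains.values(), reverse=True))

def best_common_order(all_gains):
    """Single ordering shared by everyone: sort documents by summed gain."""
    docs = list(next(iter(all_gains.values())))
    return sorted(docs, key=lambda d: sum(g[d] for g in all_gains.values()),
                  reverse=True)

# Made-up ratings for one query from two hypothetical users
ratings = {
    "user_a": {"d1": 2, "d2": 0, "d3": 1},
    "user_b": {"d1": 0, "d2": 2, "d3": 1},
}

common = best_common_order(ratings)
common_avg = sum(dcg([g[d] for d in common]) for g in ratings.values()) / len(ratings)
personal_avg = sum(best_personal_dcg(g) for g in ratings.values()) / len(ratings)
```

When users disagree about which results are relevant, as in this toy data, no single ordering can match the per-user orderings, so `personal_avg` exceeds `common_avg`.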
11Personalizing Web Search
- Motivation
- Algorithms
- Results
- Future Work
12Personalization Algorithms
- Standard IR
- Related to relevance feedback
- Query expansion v. Result re-ranking
[Diagram labels: Query, Server, Document, Client, User]
13Result Re-Ranking
- Takes full advantage of SIS
- Ensures privacy
- Good evaluation framework
- Look at lightweight user models
- Collected on server side
- Sent as query expansion
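A minimal sketch of client-side result re-ranking, under stated assumptions: the server returns an ordinary result list, and the client reorders it using term weights derived from a local user model, so the model never leaves the machine. The function names, the result fields (`url`, `snippet`), and the example weights are hypothetical.

```python
from collections import Counter

def personal_score(text, term_weights):
    """Sum of term frequency times weight over the terms in a result's text."""
    tf = Counter(text.lower().split())
    return sum(count * term_weights.get(term, 0.0) for term, count in tf.items())

def rerank(results, term_weights):
    """Reorder server results on the client; ties keep the server order."""
    return sorted(results,
                  key=lambda r: personal_score(r["snippet"], term_weights),
                  reverse=True)

results = [
    {"url": "http://example.com/a", "snippet": "cancer zodiac sign horoscope"},
    {"url": "http://example.com/b", "snippet": "cancer treatment research"},
]
weights = {"treatment": 1.5, "research": 0.8}  # hypothetical user-model weights
reranked = rerank(results, weights)
```

Because only the returned results are rescored, the cost is bounded by the size of the result set rather than the size of the Web index.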
14BM25 with Relevance Feedback
$$\text{Score} = \sum_i tf_i \, w_i$$
$$w_i = \log \frac{N}{n_i}$$
[Diagram labels: N, n_i, R, r_i]
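The scoring rule on this slide (a sum of term frequency times term weight, with weight log(N / n_i)) can be sketched as follows. Here N is taken to be the number of documents in the corpus and n_i the number containing term i; the function names and the choice to sum over all document terms with known statistics are assumptions. Full BM25 also saturates term frequency and normalizes for document length, which this sketch leaves out, as the slide does.

```python
import math
from collections import Counter

def idf_weight(N, n_i):
    """w_i = log(N / n_i): rarer terms receive larger weights."""
    return math.log(N / n_i)

def score(doc_terms, doc_freq, N):
    """Score = sum over terms of tf_i * w_i, for terms with known n_i."""
    tf = Counter(doc_terms)
    return sum(count * idf_weight(N, doc_freq[term])
               for term, count in tf.items() if term in doc_freq)

# Toy statistics: "a" occurs in 10 of 1000 documents, "b" in 100 of 1000
example = score(["a", "a", "b"], {"a": 10, "b": 100}, 1000)
```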
15BM25 with Relevance Feedback
$$\text{Score} = \sum_i tf_i \, w_i$$
$$w_i = \log \frac{(r_i + 0.5)(N - n_i - R + r_i + 0.5)}{(n_i - r_i + 0.5)(R - r_i + 0.5)}$$
[Diagram labels: N, n_i, R, r_i]
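The relevance-feedback term weight can be written as a small function. This is a sketch assuming the standard reading of the symbols: N documents in the corpus, n_i of them containing term i, R documents known to be relevant, and r_i of those containing term i. The function name is hypothetical.

```python
import math

def rf_weight(N, n_i, R, r_i):
    """Term weight with relevance feedback, using 0.5 smoothing in each factor."""
    numerator = (r_i + 0.5) * (N - n_i - R + r_i + 0.5)
    denominator = (n_i - r_i + 0.5) * (R - r_i + 0.5)
    return math.log(numerator / denominator)

# A term found in most relevant documents outweighs one found in few of them
strong = rf_weight(N=1000, n_i=50, R=10, r_i=8)
weak = rf_weight(N=1000, n_i=50, R=10, r_i=1)
```

With no feedback (R = 0 and r_i = 0) the weight reduces to log((N - n_i + 0.5) / (n_i + 0.5)), a smoothed inverse document frequency.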
16User Model as Relevance Feedback
$$\text{Score} = \sum_i tf_i \, w_i$$
$$N' = N + R, \qquad n_i' = n_i + r_i$$
$$w_i = \log \frac{(r_i + 0.5)(N' - n_i' - R + r_i + 0.5)}{(n_i' - r_i + 0.5)(R - r_i + 0.5)}$$
[Diagram labels: N, R, r_i, n_i]
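A short worked simplification may help here. It reads the substitution on this slide as N' = N + R and n_i' = n_i + r_i, meaning the user's documents are treated as additions to the world corpus rather than a subset of it; that reading is an assumption based on the garbled slide text.

```latex
\begin{align*}
N' - n_i' - R + r_i + 0.5 &= (N + R) - (n_i + r_i) - R + r_i + 0.5 = N - n_i + 0.5 \\
n_i' - r_i + 0.5 &= (n_i + r_i) - r_i + 0.5 = n_i + 0.5 \\
w_i &= \log \frac{(r_i + 0.5)(N - n_i + 0.5)}{(n_i + 0.5)(R - r_i + 0.5)}
\end{align*}
```

Under this substitution the world statistics (N, n_i) and the user statistics (R, r_i) separate cleanly, so each can be estimated from a different source.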
17User Model as Relevance Feedback
$$\text{Score} = \sum_i tf_i \, w_i$$
[Diagram: World (N, n_i); User (R, r_i)]
18User Model as Relevance Feedback
$$\text{Score} = \sum_i tf_i \, w_i$$
[Diagram: World (N, n_i); User (R, r_i); World related to query (N, n_i)]
19User Model as Relevance Feedback
$$\text{Score} = \sum_i tf_i \, w_i$$
[Diagram: World (N, n_i); User (R, r_i); World related to query (N, n_i); User related to query (R, r_i)]
- Query Focused Matching
20User Model as Relevance Feedback
$$\text{Score} = \sum_i tf_i \, w_i$$
[Diagram: World (N, n_i); User (R, r_i); Web related to query (N, n_i); User related to query (R, r_i)]
- World Focused Matching
- Query Focused Matching
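One way to read the difference between world focused and query focused matching is in how the counts are collected: over all documents, or only over documents related to the query. The sketch below illustrates this for the user statistics R and r_i. Treating "related to the query" as "contains at least one query term" is an assumption, as are the function name and the toy documents.

```python
def doc_stats(docs, term, query_terms=None):
    """Return (number of documents, number containing `term`).

    docs is a list of sets of terms. If query_terms is given, only documents
    sharing at least one term with the query are counted (query focused).
    """
    if query_terms is not None:
        docs = [d for d in docs if d & query_terms]
    return len(docs), sum(1 for d in docs if term in d)

user_docs = [
    {"cancer", "treatment", "trial"},
    {"cancer", "zodiac"},
    {"python", "treatment"},
    {"holiday", "photos"},
]

R_world, r_world = doc_stats(user_docs, "treatment")              # all user documents
R_query, r_query = doc_stats(user_docs, "treatment", {"cancer"})  # query-related only
```

The same function could be applied to a corpus sample to obtain N and n_i under either matching mode.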
21Parameters
- Matching
- User representation
- World representation
- Query expansion
22Parameters
- Matching: Query focused, World focused
- User representation
- World representation
- Query expansion
24User Representation
- Stuff I've Seen (SIS) index
- Recently indexed documents
- Web documents in SIS index
- Query history
- Relevance judgments
- None
25Parameters
- Matching: Query focused, World focused
- User representation: All SIS, Recent SIS, Web SIS, Query history, Relevance feedback, None
- World representation
- Query expansion
27World Representation
- Document Representation
- Full text
- Title and snippet
- Corpus Representation
- Web
- Result set title and snippet
- Result set full text
28Parameters
- Matching: Query focused, World focused
- User representation: All SIS, Recent SIS, Web SIS, Query history, Relevance feedback, None
- World representation: Full text, Title and snippet (document); Web, Result set full text, Result set title and snippet (corpus)
- Query expansion
30Query Expansion
- All words in document
- Query focused
Example text (shown once for each option): The American Cancer Society is dedicated to eliminating cancer as a major health problem by preventing cancer, saving lives, and diminishing suffering through ...
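The two expansion options can be contrasted with a short sketch: "all words" keeps every word of the text, while "query focused" keeps only words near an occurrence of the query term. The window size of two words on each side, the simple tokenization, and the function names are assumptions for illustration.

```python
def all_words(text):
    """Lowercased words of the text, with commas treated as spaces."""
    return text.lower().replace(",", " ").split()

def query_focused_words(text, query_term, window=2):
    """Words within `window` positions of any occurrence of query_term."""
    words = all_words(text)
    keep = set()
    for i, word in enumerate(words):
        if word == query_term:
            keep.update(range(max(0, i - window), min(len(words), i + window + 1)))
    return [words[i] for i in sorted(keep)]

text = ("The American Cancer Society is dedicated to eliminating cancer as a "
        "major health problem by preventing cancer, saving lives")
everything = all_words(text)
focused = query_focused_words(text, "cancer")
```

For the query "cancer", the focused list keeps words such as "society" and "preventing" and drops words such as "dedicated" and "health" that are farther from any occurrence of the query term.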
31Parameters
- Matching: Query focused, World focused
- User representation: All SIS, Recent SIS, Web SIS, Query history, Relevance feedback, None
- World representation: Full text, Title and snippet (document); Web, Result set full text, Result set title and snippet (corpus)
- Query expansion: All words, Query focused
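The parameter space listed on this slide can be enumerated mechanically. The sketch below takes the option names from the slides and splits the world representation into its document and corpus parts; the dictionary keys and function name are hypothetical, and nothing here implies that every combination is meaningful or was evaluated.

```python
from itertools import product

PARAMETERS = {
    "matching": ["query focused", "world focused"],
    "user_representation": ["all SIS", "recent SIS", "Web SIS",
                            "query history", "relevance feedback", "none"],
    "document_representation": ["full text", "title and snippet"],
    "corpus_representation": ["Web", "result set full text",
                              "result set title and snippet"],
    "query_expansion": ["all words", "query focused"],
}

def parameter_settings(parameters):
    """Yield one dict per combination of parameter values."""
    names = list(parameters)
    for values in product(*(parameters[name] for name in names)):
        yield dict(zip(names, values))

settings = list(parameter_settings(PARAMETERS))
```

With these option lists the full cross product has 2 x 6 x 2 x 3 x 2 = 144 settings.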
34Personalizing Web Search
- Motivation
- Algorithms
- Results
- Future Work
35Baselines
- Best possible
- Random
- Text based ranking
- Web ranking
- URL Boost
[Figure: URL boost example using http://mail.yahoo.com/inbox/msg10]
36Best Parameter Settings
- Richer user representation better
- SIS > Recent > Web > Query History > None
- Suggests rich client important
- Efficiency hacks don't hurt
- Snippets query focused
- Length normalization not an issue
- Query focus good
37Text Alone Not Enough
- Better than some baselines
- Better than random
- Better than no user representation
- Better than relevance feedback
- Worse than Web results
- Blend in other features
- Web ranking
- URL boost
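A hedged sketch of what blending could look like: a text-based personalization score is mixed with a score derived from the original Web rank, and results whose URL the user has visited before receive a bonus. The linear mix, the `alpha` and `url_boost` parameters, the rank-to-score mapping, and the assumption that text scores lie in [0, 1] are all illustrative choices, not the method from the talk.

```python
def blend(results, alpha=0.5, url_boost=1.0, visited=frozenset()):
    """Order URLs by a blend of Web rank and text score.

    results: list of (url, web_rank, text_score), with web_rank starting at 1
    and text_score assumed to be normalized to [0, 1].
    """
    n = len(results)
    scored = []
    for url, web_rank, text_score in results:
        web_score = (n - web_rank + 1) / n  # rank 1 maps to 1.0
        total = alpha * web_score + (1 - alpha) * text_score
        if url in visited:
            total += url_boost  # hypothetical bonus for previously visited URLs
        scored.append((total, url))
    return [url for total, url in sorted(scored, key=lambda s: s[0], reverse=True)]

results = [("a", 1, 0.1), ("b", 2, 0.9), ("c", 3, 0.2)]
without_boost = blend(results)
with_boost = blend(results, visited={"c"})
```

Setting `alpha` to 1.0 reproduces the Web ranking, and setting it to 0.0 ranks by text score alone, so the two baselines are the endpoints of the blend.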
38Good, but Lots of Room to Grow
- Best combination: 9.1% improvement
- Best possible: 51.5% improvement
- Assumes best Web combination selected
- Only improves results 2/3 of the time
39Personalizing Web Search
- Motivation
- Algorithms
- Results
- Future Work
40Finding the Best Parameter Setting
- Almost always some parameter setting that improves results
- Use learning to select parameters
- Based on individual
- Based on query
- Based on results
- Give user control?
41Further Exploration of Algorithms
- Larger parameter space to explore
- More complex user model subsets
- Different parsing (e.g., phrases)
- Tune BM25 parameters
- What is really helping?
- Generic user model or personal model
- Use different indices for the queries
- Deploy system
42Practical Issues
- Efficiency issues
- Can interfaces mitigate some of the issues?
- Merging server and client
- Query expansion
- Get more relevant results in the set to be re-ranked
- Design snippets for personalization
43Thank you!