Title: Technologies for Personalized Web search
Outline
- Search Histories
- Query Terms Expansion
- Browse Histories
- Relevance Feedback
1. Search Histories
- Phases
- - Collecting information from users (all searches for which at least one of the results was clicked were logged per user)
- - Creating the user profile and the result profile
- - Evaluation
1.1 System Architecture
- GoogleWrapper
- - Monitors users, maintaining a log of
- - - submitted queries for which at least one result was visited
- - - user-selected snippets from the retrieved results
- - - the retrieved top 10 results (title and summary)
- Categorizer
- - Classifies each query and snippet for each user, as well as the snippets of the search engine results
- - SVM, K-NN
1.2 Creation of User Profile
- User Interests
- - snippets (titles plus the textual summaries)
- Representation
- - ODP concept hierarchy, 1869 concepts, the top 3 levels
- Concept weight
- Item Classification
- - each query item is added to 4 concepts
- - each snippet item is added to 5 concepts
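The profile construction described above can be sketched roughly as follows. This is only an illustration: the input format (a list of (concept, weight) matches per classified item), the function name build_profile, and the example concept names are assumptions, not details taken from the slides.

```python
from collections import defaultdict

def build_profile(classified_items, top_k):
    """Accumulate concept weights over a user's classified items.

    classified_items: one list of (concept, weight) pairs per query or
    snippet, in a hypothetical format a categorizer might return.
    top_k: number of best-matching concepts each item contributes to
    (the slides use 4 for queries and 5 for snippets).
    """
    profile = defaultdict(float)
    for matches in classified_items:
        # Keep only the top_k concepts with the highest classifier weight.
        best = sorted(matches, key=lambda cw: cw[1], reverse=True)[:top_k]
        for concept, weight in best:
            profile[concept] += weight
    return dict(profile)

# Hypothetical classifier output for two queries of one user.
queries = [
    [("Computers/Internet", 0.9), ("Science/Math", 0.4), ("Arts/Music", 0.1)],
    [("Computers/Internet", 0.5), ("Arts/Music", 0.3)],
]
query_profile = build_profile(queries, top_k=4)
```

A snippet-based profile would be built the same way with top_k=5 over the classified snippets.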
1.2 Personalizing Search Results
- Calculate the conceptual match based on the similarity between each document profile and the user profile
- - The search result titles and summaries are classified to create a document profile
- Re-rank results based on the conceptual match
- - The resulting rank order is called the conceptual rank
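A minimal sketch of the re-ranking step follows. The slides do not name the similarity measure here, so cosine similarity over concept-weight vectors is an assumption, as are the function names and the toy profiles.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse concept-weight vectors (dicts)."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def conceptual_ranking(user_profile, doc_profiles):
    """Return result ids ordered by decreasing match with the user profile."""
    return sorted(
        doc_profiles,
        key=lambda d: cosine(user_profile, doc_profiles[d]),
        reverse=True,
    )

# Hypothetical user profile and document profiles for three results.
user = {"Computers": 2.0, "Science": 1.0}
docs = {
    "r1": {"Arts": 1.0},
    "r2": {"Computers": 1.0},
    "r3": {"Science": 1.0, "Computers": 1.0},
}
order = conceptual_ranking(user, docs)
```

The position of a result in the returned list (starting at 1) plays the role of its conceptual rank.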
1.2 Personalizing Search Results
- α has a value between 0 and 1
- The conceptual and search-engine-based rankings can be blended in different proportions by varying the value of α
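The blending formula itself did not survive on this slide. One plausible reading, shown here as an assumption, is a linear combination in which the blending parameter (alpha in the code) weights the conceptual rank and 1 - alpha weights the search engine rank. The function name and the tie-breaking rule are illustrative choices.

```python
def blend_ranks(conceptual_rank, engine_rank, alpha):
    """Order results by alpha * conceptual rank + (1 - alpha) * engine rank.

    ASSUMPTION: a linear blend; the slide only states that alpha lies in
    [0, 1] and controls the proportion of the two rankings.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    scores = {
        r: alpha * conceptual_rank[r] + (1.0 - alpha) * engine_rank[r]
        for r in engine_rank
    }
    # Lower blended score is better; ties fall back to the engine rank.
    return sorted(scores, key=lambda r: (scores[r], engine_rank[r]))

# Hypothetical ranks (1 = best) for three results.
engine = {"a": 1, "b": 2, "c": 3}
conceptual = {"a": 3, "b": 1, "c": 2}
```

Under this reading, alpha = 0 reproduces the original search engine order and alpha = 1 uses the conceptual rank alone.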
1.3 Experimental Setup
- Preliminary study
- - Randomly selected 100 queries out of the 576 collected
- - The top-ranked result was the most frequently selected (60), followed by the second (20) and the third (14)
- Conclusions
- - Users rarely look at results beyond the first page
- - User judgments are affected by presentation order
- Top 10 results
- - Reimplemented GoogleWrapper so that only the top 10 results are displayed, in random order
1.4 Experimental Data
- Monitored six volunteers (6 months)
- Removed duplicate queries for each user
- Distributed 45 queries per user (270 total) into the following sets
- - 240 (40 per user) queries were used for training the 2 user profiles (query-based and snippet-based)
- - 30 (5 per user) queries were used for testing personalized search parameters
1. Search Histories - Experiment 1
- Profile built by classifying and combining each query (4 concepts from each classified query)
- Average Google rank is 5.1
- Best average conceptual rank is 3.4 (using 1 concept from the user profile and 30 queries to create the profile)
1. Search Histories - Experiment 2
- Profile built by classifying and combining each snippet (5 concepts from each classified snippet)
- Average Google rank is 5.1
- Best average conceptual rank is 3.2 (using 20 concepts from the user profile and 30 snippets to create the profile)
1. Search Histories - Experiment 3
- Final rank: combination of the original rank with the conceptual rank
- Query-based profile
1. Search Histories - Experiment 4
- Final rank: combination of the original rank with the conceptual rank
- Snippet-based profile
1. Search Histories - Experiment 5
- Goal: verify that the 2 types of profiles (query-based and snippet-based) also improve the ranking for queries never seen before
- 12 (2 per user) testing queries never seen before
- Query-based profile (30 queries and 4 concepts per query)
- Snippet-based profile (30 snippets and 20 concepts per snippet)
2. Query Terms Expansion
- Expanding with Local Desktop Analysis
- - Term and Document Frequency
- - Lexical Compounds
- - Sentence Selection
- Expanding with Global Desktop Analysis
- - Term Co-occurrence Statistics
- - Thesaurus Based Expansion
- Experiments
- Adaptive Analysis
2.1 Expanding with Local Desktop Analysis
- Term and Document Frequency
- - Calculation of the term score
- - Calculation of the document frequency
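The scoring formulas for this slide are not reproduced in the text, so the following is only a generic sketch of the two raw statistics involved: how often a term occurs overall (term frequency) and in how many documents it occurs (document frequency). Whitespace tokenization, the function name, and the sample documents are assumptions.

```python
from collections import Counter

def term_and_document_frequency(documents):
    """Return (tf, df) counters over a list of document strings.

    tf[t]: total occurrences of term t across all documents.
    df[t]: number of documents containing t at least once.
    """
    tf = Counter()
    df = Counter()
    for text in documents:
        tokens = text.lower().split()  # naive tokenizer, for illustration only
        tf.update(tokens)
        df.update(set(tokens))
    return tf, df

# Hypothetical set of relevant Desktop documents.
desktop_docs = ["web search personalization", "desktop search", "web web crawler"]
tf, df = term_and_document_frequency(desktop_docs)
```

Candidate expansion terms could then be ranked by either counter, for instance with tf.most_common(4).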
2.1 Expanding with Local Desktop Analysis
- Lexical Compounds
- - select frequent compound terms
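As a rough illustration of selecting frequent compound terms, the sketch below counts adjacent word pairs and keeps those that occur often enough. Treating every adjacent pair as a candidate compound is a simplifying assumption (a real system would likely restrict candidates, for example to noun phrases); the function name and the min_count threshold are hypothetical.

```python
from collections import Counter

def frequent_compounds(documents, min_count=2):
    """Return two-word compounds seen at least min_count times, most frequent first."""
    counts = Counter()
    for text in documents:
        tokens = text.lower().split()
        # Each pair of adjacent tokens is a candidate compound.
        counts.update(zip(tokens, tokens[1:]))
    return [" ".join(pair) for pair, n in counts.most_common() if n >= min_count]

# Hypothetical Desktop documents.
compound_docs = [
    "query expansion for web search",
    "personalized query expansion",
    "web search engine",
]
compounds = frequent_compounds(compound_docs)
```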
2.1 Expanding with Local Desktop Analysis
- Sentence Selection
- - Method
- - - (1) Identify the set of relevant Desktop documents
- - - (2) Output a summary containing the most important sentences
- - Calculation of the sentence score
- - - $PS = (\mathrm{Avg}(NS) - \mathrm{SentenceIndex}) / \mathrm{Avg}^2(NS)$ for the first 10 sentences, and 0 otherwise
- - - $TQ$: the number of query terms present in the sentence
- - - $NQ$: the total number of terms in the query
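The complete sentence-score formula is not preserved in this text; only the position score PS and the quantities TQ and NQ are defined. The sketch below computes PS as stated and, purely as an assumption, adds a query-overlap term TQ^2 / NQ, which is how these two quantities are often combined in summarization-based expansion. Zero-based sentence indices and all function names are also assumptions.

```python
def position_score(sentence_index, avg_num_sentences):
    """PS = (Avg(NS) - SentenceIndex) / Avg(NS)^2 for the first 10 sentences, else 0.

    ASSUMPTION: sentence_index is zero-based.
    """
    if sentence_index >= 10:
        return 0.0
    return (avg_num_sentences - sentence_index) / avg_num_sentences ** 2

def query_overlap(sentence, query_terms):
    """Return (TQ, NQ): query terms present in the sentence, total query terms."""
    words = set(sentence.lower().split())
    tq = sum(1 for t in query_terms if t.lower() in words)
    return tq, len(query_terms)

def sentence_score(sentence, sentence_index, query_terms, avg_num_sentences):
    """Position score plus an assumed query-overlap term TQ^2 / NQ."""
    tq, nq = query_overlap(sentence, query_terms)
    overlap = (tq ** 2) / nq if nq else 0.0  # ASSUMPTION: not given on the slide
    return position_score(sentence_index, avg_num_sentences) + overlap

score = sentence_score(
    "Personalized web search uses profiles", 0, ["web", "search", "history"], 20.0
)
```

The highest-scoring sentences would form the summary from which expansion terms are drawn.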
2.2 Expanding with Global Desktop Analysis
- Term Co-occurrence Statistics
- - Co-occurrence based keyword similarity search
2.2 Expanding with Global Desktop Analysis
- Term Co-occurrence Statistics
- - similarity calculation
- (1) Cosine Similarity
- (2) Mutual Information
- (3) Likelihood Ratio
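The exact formulas behind these three measures are not shown in the text. As a hedged illustration, the sketch below uses common document-frequency formulations of the first two: cosine similarity as DF(x,y) / sqrt(DF(x) * DF(y)) and mutual information as log(N * DF(x,y) / (DF(x) * DF(y))). The likelihood ratio is omitted. Function names, the tokenizer, and the sample documents are assumptions.

```python
import math
from collections import Counter
from itertools import combinations

def cooccurrence_counts(documents):
    """Return (df, pair_df): per-term and per-pair document frequencies."""
    df, pair_df = Counter(), Counter()
    for text in documents:
        terms = sorted(set(text.lower().split()))
        df.update(terms)
        # Pairs are keyed as alphabetically ordered tuples.
        pair_df.update(combinations(terms, 2))
    return df, pair_df

def cosine_similarity(df_x, df_y, df_xy):
    """DF(x, y) / sqrt(DF(x) * DF(y))."""
    return df_xy / math.sqrt(df_x * df_y)

def mutual_information(df_x, df_y, df_xy, n_docs):
    """log(N * DF(x, y) / (DF(x) * DF(y))); -inf if the terms never co-occur."""
    if df_xy == 0:
        return float("-inf")
    return math.log(n_docs * df_xy / (df_x * df_y))

# Hypothetical Desktop collection.
cooc_docs = ["apple fruit", "apple fruit pie", "apple computer", "fruit salad"]
df, pair_df = cooccurrence_counts(cooc_docs)
cs = cosine_similarity(df["apple"], df["fruit"], pair_df[("apple", "fruit")])
mi = mutual_information(df["apple"], df["fruit"], pair_df[("apple", "fruit")], len(cooc_docs))
```

Terms with the highest similarity to the query keywords would be the expansion candidates.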
2.2 Expanding with Global Desktop Analysis
- Thesaurus Based Expansion
2.3 Experiments
- Experimental Setup
- 18 subjects
- Indexing documents with Lucene
- Selection of queries
- - (1) Top log query in AltaVista, avg. length = 2
- - (2) Randomly selected log query, avg. length = 2.3
- - (3) Self-selected specific query, only one meaning, avg. length = 2.9
- - (4) Self-selected ambiguous query, at least 3 meanings, avg. length = 1.8
2.3 Experiments
- Experimental Setup (continued)
- Evaluation of query terms, score of 1 to 5
- - top: 3.11
- - random: 3.72
- - self-selected specific: 4.45
- - self-selected ambiguous: 4.39
- Issued 72 queries, evaluated 6000 URLs on a 0-2 scale (non-relevant, relevant, highly relevant)
- Evaluation metric: NDCG
- Number of generated expansion keywords: 4
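For reference, NDCG over graded judgments such as the 0-2 scale above can be computed along these lines. Several DCG variants exist; this sketch assumes the exponential gain 2^rel - 1 with a log2 position discount, and it normalizes against the ideal ordering of the same judged list. The slides do not state which variant was used.

```python
import math

def dcg(gains):
    """Discounted cumulative gain with gain 2^rel - 1 and log2(position + 1) discount."""
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg(gains):
    """DCG normalized by the DCG of the ideal (descending) ordering of the same gains."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0

# Hypothetical judgments for a ranked list: 2 = highly relevant, 1 = relevant, 0 = non-relevant.
good_order = ndcg([2, 1, 0])
bad_order = ndcg([0, 1, 2])
```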
2.3.1 Experiment 1
2.3.2 Experiment 2
3. Adaptive Analysis
- Query Clarity level
- Query Length
- - the number of non-stop words in the query
- The distribution of informative amount in query terms
- - $\sigma_{idf}$ is the standard deviation of the idf of the terms in Q
- - $idf_{max}$ and $idf_{min}$ are the maximum and minimum idf among the terms in Q, respectively
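The two formulas that used these quantities are missing from the text. A common choice for such predictors, assumed here, is to use the standard deviation of the idf values directly and the ratio of the maximum to the minimum idf. The idf definition (natural log of N over document frequency), the function names, and the sample statistics are likewise assumptions.

```python
import math
import statistics

def idf(doc_freq, n_docs):
    """Plain inverse document frequency, log(N / df). ASSUMPTION: unsmoothed."""
    return math.log(n_docs / doc_freq)

def idf_spread(query_terms, doc_freq, n_docs):
    """Return (sigma_idf, idf_max / idf_min) over the non-stop query terms."""
    idfs = [idf(doc_freq[t], n_docs) for t in query_terms]
    sigma_idf = statistics.pstdev(idfs)
    lowest = min(idfs)
    ratio = max(idfs) / lowest if lowest > 0 else float("inf")
    return sigma_idf, ratio

# Hypothetical collection statistics: 1000 documents.
doc_freq = {"jaguar": 10, "car": 100}
sigma, ratio = idf_spread(["jaguar", "car"], doc_freq, 1000)
```

A large spread suggests that the query mixes highly specific and very general terms.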
3. Adaptive Analysis
- Query Clarity level (continued)
- Query Scope (relates to the IDF of the entire query)
- Query Clarity (measures the divergence between the language model associated with the user query and the language model associated with the collection)
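Neither measure's formula is given in the text, so the following sketch relies on assumed formulations: query scope as -log(n_Q / N), where n_Q is the number of documents containing at least one query term, and query clarity as the Kullback-Leibler divergence between a maximum-likelihood query model and the collection model. Documents are represented as token lists; all names and data are hypothetical.

```python
import math
from collections import Counter

def query_scope(query_terms, documents):
    """-log(n_Q / N), with n_Q = documents containing at least one query term."""
    terms = set(query_terms)
    n_q = sum(1 for doc in documents if terms & set(doc))
    if n_q == 0:
        return float("inf")
    return -math.log(n_q / len(documents))

def query_clarity(query_terms, documents):
    """Sum over query terms of P(w|Q) * log2(P(w|Q) / P(w|collection)).

    ASSUMPTION: P(w|Q) is the term's relative frequency in the query;
    terms unseen in the collection are skipped.
    """
    collection = Counter(w for doc in documents for w in doc)
    total = sum(collection.values())
    query_counts = Counter(query_terms)
    score = 0.0
    for w, qtf in query_counts.items():
        if collection[w] == 0:
            continue
        p_query = qtf / len(query_terms)
        p_coll = collection[w] / total
        score += p_query * math.log2(p_query / p_coll)
    return score

# Hypothetical tokenized collection.
token_docs = [["web", "search"], ["web", "page"], ["desktop", "search"], ["music", "player"]]
scope = query_scope(["web"], token_docs)
clarity = query_clarity(["web"], token_docs)
```

Under these definitions, a query matching few documents has a high scope value, and a query whose term distribution differs strongly from the collection has a high clarity value.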
3. Adaptive Analysis
- Metrics for Clarity
- Query Scope
- Query Clarity
3. Adaptive Analysis
- Categorizing Clarity Prediction
3. Adaptive Analysis - Experiment