Technologies for Personalized Web search - PowerPoint PPT Presentation

Slides: 34

Provided by: rucX

Transcript and Presenter's Notes

1
Technologies for Personalized Web search
2
Outline

Search Histories
Query Terms Expansion
Browse Histories
Relevance Feedback

3
1. Search Histories

Phases
- collecting information from users (all searches for which at least one result was clicked were logged per user)
- creating the user profile and the result profile
- evaluation
- Evaluation

4
1.1 System Architecture

GoogleWrapper
- monitors users, maintaining a log of
  - submitted queries for which at least one result was visited
  - user-selected snippets from the retrieved results
  - the top 10 retrieved results (title and summary)
Categorizer
- classifies each user's queries and selected snippets, as well as the snippets of the search engine results
- classifiers: SVM, k-NN
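
The Categorizer's k-NN option can be sketched as below. This is a minimal illustration under stated assumptions, not the system's implementation: items are lowercase bag-of-words term-frequency vectors compared by cosine similarity, a concept's weight is the summed similarity of its examples among the k nearest neighbours, and the training pairs are a hypothetical stand-in for ODP-labelled pages. The slide names SVM and k-NN but gives neither the features nor k; all function names are illustrative.

```python
import math
from collections import Counter


def vectorize(text):
    """Term-frequency vector over lowercase whitespace tokens (assumed features)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_classify(text, training, k=3, top_n=4):
    """Return up to top_n (concept, weight) pairs for `text`, best first.

    training: list of (example_text, concept) pairs, a hypothetical stand-in
    for ODP training pages. A concept's weight is the summed similarity of
    its examples among the k nearest neighbours (assumption).
    """
    vec = vectorize(text)
    neighbours = sorted(
        ((cosine(vec, vectorize(doc)), concept) for doc, concept in training),
        reverse=True,
    )[:k]
    weights = Counter()
    for sim, concept in neighbours:
        if sim > 0:  # ignore neighbours sharing no terms with the item
            weights[concept] += sim
    return weights.most_common(top_n)
```

With hypothetical training pairs such as ("python programming language tutorial", "Computers/Programming") and ("soccer world cup team", "Sports/Soccer"), knn_classify("learn python programming", training) returns Computers/Programming as the top concept. The top_n argument mirrors the per-item concept cutoff used when building profiles.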

5
1.2 Creation of User Profile

User Interests
- snippets (titles plus the textual
summaries)
Representation
- the top 3 levels of the ODP concept hierarchy (1,869 concepts)
Concept weight
Item Classification
- each classified query adds weight to 4 concepts in the profile
- each classified snippet adds weight to 5 concepts in the profile
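
Building a profile from classified items might look like this. The slide's concept-weight formula is not recoverable, so the sketch assumes a concept's profile weight is simply the sum of the classification weights it receives; the per-item cutoffs (4 concepts per query, 5 per snippet) come from the slide. The input format, a ranked list of (concept, weight) pairs per item, and the function name are hypothetical choices.

```python
from collections import defaultdict

QUERY_CONCEPTS = 4    # concepts kept per classified query (from the slide)
SNIPPET_CONCEPTS = 5  # concepts kept per classified snippet (from the slide)


def build_profile(classified_items, per_item):
    """Accumulate concept weights over one user's classified items.

    classified_items: iterable of lists of (concept, weight), highest weight first.
    per_item: how many of each item's top concepts are added to the profile.
    Assumption: profile weight = sum of the item-level classification weights.
    """
    profile = defaultdict(float)
    for concepts in classified_items:
        for concept, weight in concepts[:per_item]:
            profile[concept] += weight
    return dict(profile)
```

A query-based profile would call build_profile(classified_queries, QUERY_CONCEPTS) and a snippet-based one build_profile(classified_snippets, SNIPPET_CONCEPTS).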

6
1.2 Personalizing Search Results

Calculate the conceptual match as the similarity between each document profile and the user profile
- the titles and summaries of the search results are classified to create a document profile
Re-rank the results based on the conceptual match
- the resulting rank order is called the conceptual rank
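
A possible sketch of conceptual matching and re-ranking, assuming the similarity is a dot product over the concepts shared by the two {concept: weight} profiles (the slide only says "similarity"; cosine similarity would be an equally plausible choice). Function names and the result format are hypothetical.

```python
def conceptual_match(user_profile, doc_profile):
    """Similarity of two {concept: weight} profiles (assumption: dot product)."""
    return sum(w * doc_profile[c] for c, w in user_profile.items() if c in doc_profile)


def conceptual_rank(results, user_profile):
    """Re-rank search results by conceptual match.

    results: list of (url, doc_profile) in the search engine's order.
    Returns {url: conceptual rank}, where 1 is the best match.
    """
    ordered = sorted(
        results,
        key=lambda r: conceptual_match(user_profile, r[1]),
        reverse=True,
    )
    return {url: rank for rank, (url, _) in enumerate(ordered, start=1)}
```

Because Python's sort is stable (also with reverse=True), results with equal conceptual match keep their search engine order.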

7
1.2 Personalizing Search Results

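The final rank blends the conceptual rank with the original search engine rank:

$FinalRank = \alpha \cdot ConceptualRank + (1 - \alpha) \cdot SearchEngineRank$

$\alpha$ has a value between 0 and 1
The conceptual and search engine rankings can be blended in different proportions by varying the value of $\alpha$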
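
Blending the two rankings can be sketched as follows, assuming the linear combination FinalRank = α·ConceptualRank + (1 - α)·SearchEngineRank, with lower values meaning better positions. Breaking ties by the original search engine rank is an added assumption, as are the function and argument names.

```python
def final_order(engine_ranks, conceptual_ranks, alpha):
    """Return URLs ordered by the blended final rank.

    engine_ranks, conceptual_ranks: {url: rank}, 1 = best.
    alpha in [0, 1] weights the conceptual rank (assumed linear blend);
    ties are broken by the original search engine rank (assumption).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")

    def final_rank(url):
        return alpha * conceptual_ranks[url] + (1 - alpha) * engine_ranks[url]

    return sorted(engine_ranks, key=lambda url: (final_rank(url), engine_ranks[url]))
```

Setting alpha to 0 reproduces the search engine order and setting it to 1 reproduces the conceptual rank; intermediate values trade the two off.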

8
1.3 Experimental Setup

Preliminary study
- in 100 queries randomly selected from the 576 collected, the top-ranked result was the most frequently selected (60), followed by the second (20) and the third (14)
Conclusions
- users rarely look at results beyond the first page
- user judgments are affected by presentation order
Top 10 results
- GoogleWrapper was reimplemented so that only the top 10 results are displayed, in random order

9
1.4 Experimental Data

Monitored six volunteers for 6 months
Removed duplicate queries for each user
Distributed 45 queries per user (270 total) into the following sets
- 240 queries (40 per user) were used to train the 2 user profiles (query-based and snippet-based)
- 30 queries (5 per user) were used to test the personalized search parameters

10
1. Search Histories - Experiment 1

Profile built by classifying each query and combining the results (4 concepts from each classified query)
Average Google rank is 5.1
Best average conceptual rank is 3.4 (using 1 concept from the user profile and 30 queries to create the profile)

11
1. Search Histories - Experiment 2

Profile built by classifying each snippet and combining the results (5 concepts from each classified snippet)
Average Google rank is 5.1
Best average conceptual rank is 3.2 (using 20 concepts from the user profile and 30 snippets to create the profile)

12
1. Search Histories - Experiment 3

Final rank: a combination of the original rank and the conceptual rank
Query-based profile

13
1. Search Histories - Experiment 4

Final rank: a combination of the original rank and the conceptual rank
Snippet-based profile

14
1. Search Histories - Experiment 5

Goal: verify that both types of profile (query-based and snippet-based) improve results for queries never seen before
12 test queries (2 per user) never seen before
Query-based profile (30 queries and 4 concepts
per query)
Snippet-based profile (30 snippets and 20
concepts per snippet)

15
2. Query Terms Expansion

Expanding with Local Desktop Analysis
- Term and Document Frequency
- Lexical Compounds
- Sentence Selection
Expanding with Global Desktop Analysis
- Term Co-occurrence Statistics
- Thesaurus Based Expansion
Experiments
Adaptive Analysis

16
2.1 Expanding with Local Desktop Analysis

Term and Document Frequency
- Calculation of Term Score
- Calculation of Document Frequency
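
The slide's term-score and document-frequency formulas did not survive extraction. As a hedged illustration only, the sketch below assumes a score that grows with log(1 + TF) and is boosted for terms whose first occurrence is early in the document, and counts document frequency as the number of relevant Desktop documents containing a term. Both formulas and all names are assumptions, not necessarily the presentation's.

```python
import math
from collections import Counter


def term_scores(doc_tokens):
    """Score each term of one document (assumed formula).

    score = (1/2 + 1/2 * (n - pos) / n) * log(1 + tf), where n is the document
    length and pos is the 0-based position of the term's first occurrence.
    """
    n = len(doc_tokens)
    tf = Counter(doc_tokens)
    first_pos = {}
    for i, tok in enumerate(doc_tokens):
        first_pos.setdefault(tok, i)  # remember only the first occurrence
    return {t: (0.5 + 0.5 * (n - first_pos[t]) / n) * math.log(1 + tf[t]) for t in tf}


def document_frequency(docs):
    """Number of documents (token lists) that contain each term."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df
```

Candidate expansion terms could then be ranked by summing these scores over the relevant documents, with document frequency as a secondary signal; the slide does not state how the two were combined.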

17
2.1 Expanding with Local Desktop Analysis

Lexical Compounds
- select frequent compound terms
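
Selecting frequent compounds can be approximated as below. The slide only says that frequent compound terms are selected; this sketch assumes compounds are adjacent word pairs and triples containing no stop word (no part-of-speech filtering) and keeps those seen at least min_count times. The stop-word list and parameter names are illustrative.

```python
from collections import Counter

# Illustrative stop-word list (assumption, not the one used in the presentation).
STOP_WORDS = {"the", "a", "an", "of", "and", "for", "in", "on", "to", "is"}


def frequent_compounds(docs, max_len=3, min_count=2):
    """Return (compound, count) pairs, most frequent first.

    Compounds are word n-grams of length 2..max_len that contain no stop
    word and occur at least min_count times across the documents.
    """
    counts = Counter()
    for text in docs:
        tokens = text.lower().split()
        for n in range(2, max_len + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                if not any(t in STOP_WORDS for t in gram):
                    counts[" ".join(gram)] += 1
    return [(c, k) for c, k in counts.most_common() if k >= min_count]
```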

18
2.1 Expanding with Local Desktop Analysis

Sentence Selection

- method
(1) Identify the set of relevant Desktop documents
(2) Output a summary containing the most important sentences
- calculation of sentence score, using
  - position score: $PS = \frac{Avg(NS) - SentenceIndex}{Avg^2(NS)}$ for the first 10 sentences, and 0 otherwise
  - TQ: the number of query terms present in the sentence
  - NQ: the total number of terms in the query
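
The sentence-score components whose definitions survive on the slide can be computed as below. The sketch treats Avg(NS) as a precomputed number and uses 1-based sentence indices, both assumptions; the slide's formula for combining PS, TQ and NQ into the final sentence score is not recoverable, so it is not implemented here. Function names are illustrative.

```python
def position_score(sentence_index, avg_ns, cutoff=10):
    """PS = (Avg(NS) - SentenceIndex) / Avg(NS)^2 for the first `cutoff` sentences, else 0.

    sentence_index: 1-based position of the sentence in its document (assumption).
    avg_ns: Avg(NS), passed in as a precomputed number (assumption).
    """
    if sentence_index > cutoff:
        return 0.0
    return (avg_ns - sentence_index) / avg_ns ** 2


def query_term_counts(sentence, query):
    """Return (TQ, NQ): query terms present in the sentence, and total query terms."""
    query_terms = query.lower().split()
    sentence_terms = set(sentence.lower().split())
    tq = sum(1 for t in query_terms if t in sentence_terms)
    return tq, len(query_terms)
```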
19
2.2 Expanding with Global Desktop Analysis

Term Co-occurrence Statistics
- Co-occurrence based keyword similarity search

20
2.2 Expanding with Global Desktop Analysis

Term Co-occurrence Statistics
- similarity calculation
(1) Cosine Similarity
(2) Mutual Information
(3) Likelihood Ratio
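
The three similarity measures can be sketched from document-level co-occurrence counts, assuming two terms co-occur when they appear in the same Desktop document. Mutual information is implemented as pointwise mutual information and the likelihood ratio as Dunning's G² statistic over a 2×2 contingency table; these are standard formulations chosen for illustration, not necessarily the exact variants used in the presentation. Function names are illustrative.

```python
import math


def cosine_sim(df_a, df_b, df_ab):
    """Co-occurrence cosine: df(a,b) / sqrt(df(a) * df(b))."""
    return df_ab / math.sqrt(df_a * df_b) if df_a and df_b else 0.0


def pmi(df_a, df_b, df_ab, n_docs):
    """Pointwise mutual information: log(N * df(a,b) / (df(a) * df(b)))."""
    if df_ab == 0:
        return float("-inf")
    return math.log(n_docs * df_ab / (df_a * df_b))


def log_likelihood_ratio(df_a, df_b, df_ab, n_docs):
    """Dunning's G^2 over the 2x2 table of documents containing a and/or b."""
    observed = [
        [df_ab, df_a - df_ab],                         # a present: b present / absent
        [df_b - df_ab, n_docs - df_a - df_b + df_ab],  # a absent:  b present / absent
    ]
    rows = [sum(r) for r in observed]
    cols = [observed[0][j] + observed[1][j] for j in range(2)]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute nothing
                e = rows[i] * cols[j] / n_docs
                g2 += o * math.log(o / e)
    return 2.0 * g2
```

For two independent terms (for example N = 100, df(a) = 10, df(b) = 20, df(a,b) = 2) both PMI and G² are 0, while terms that always co-occur score higher on all three measures.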

21
2.2 Expanding with Global Desktop Analysis

Thesaurus Based Expansion

22
2.3 Experiments

Experimental Setup
18 subjects
Documents indexed with Lucene
Selection of queries
(1) Top queries from the AltaVista log, avg. length 2
(2) Randomly selected log queries, avg. length 2.3
(3) Self-selected specific queries (only one meaning), avg. length 2.9
(4) Self-selected ambiguous queries (at least 3 meanings), avg. length 1.8

23
2.3 Experiments

Experimental Setup (continued)
Query terms evaluated on a score of 1 to 5 (average per query type)
- top log queries: 3.11
- random log queries: 3.72
- self-selected specific queries: 4.45
- self-selected ambiguous queries: 4.39
Issued 72 queries and evaluated 6000 URLs on a 0-2 scale (0 = non-relevant, 1 = relevant, 2 = highly relevant)
Evaluation metric: NDCG
Number of generated expansion keywords: 4
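
NDCG over the 0-2 relevance grades can be sketched as below. It assumes the common gain 2^rel - 1 with a log2(rank + 1) discount and normalises against the ideal ordering of the same judged list; the cutoff k and the gain variant used in the experiments are not given on the slide.

```python
import math


def dcg(grades, k):
    """Discounted cumulative gain over the first k grades (rank 1 = first element)."""
    return sum((2 ** g - 1) / math.log2(rank + 1)
               for rank, g in enumerate(grades[:k], start=1))


def ndcg(grades, k=10):
    """grades: 0 = non-relevant, 1 = relevant, 2 = highly relevant, in ranked order."""
    ideal = dcg(sorted(grades, reverse=True), k)
    return dcg(grades, k) / ideal if ideal > 0 else 0.0
```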

24
2.3.1 Experiment 1
25
2.3.2 Experiment 2
26
2.4 Adaptive Analysis

Query Clarity level
Query Length
- the number of non-stop words in the query
The distribution of the informative amount in query terms, measured with
- $\sigma_{idf}$, the standard deviation of the idf of the terms in Q
- $idf_{max}$ and $idf_{min}$, the maximum and minimum idf among the terms in Q, respectively
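
The query-length and idf-based indicators can be computed as below. Assumptions: idf = log(N / df) with unseen terms treated as df = 1, σ_idf is the population standard deviation, the stop-word list is illustrative, and idf_max and idf_min are combined as a ratio, which is only one possible choice since the slide's own formula is not recoverable. Function names are illustrative.

```python
import math
import statistics

# Illustrative stop-word list (assumption, not the one used in the experiments).
STOP_WORDS = {"the", "a", "an", "of", "and", "for", "in", "on", "to", "is"}


def query_length(query):
    """Query length: the number of non-stop words in the query."""
    return sum(1 for t in query.lower().split() if t not in STOP_WORDS)


def idf(term, df, n_docs):
    """Assumed idf form: log(N / df), with unseen terms treated as df = 1."""
    return math.log(n_docs / df.get(term, 1))


def idf_indicators(query_terms, df, n_docs):
    """Return (sigma_idf, idf_max / idf_min) over a non-empty list of query terms.

    The ratio is an assumed way of combining idf_max and idf_min.
    """
    idfs = [idf(t, df, n_docs) for t in query_terms]
    sigma_idf = statistics.pstdev(idfs)
    idf_min, idf_max = min(idfs), max(idfs)
    ratio = idf_max / idf_min if idf_min > 0 else float("inf")
    return sigma_idf, ratio
```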

27
2.4 Adaptive Analysis

Query Clarity level (continued)
Query Scope (related to the IDF of the query as a whole)
Query Clarity (measures the divergence between the language model of the user query and the language model of the collection)
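
Both collection-level indicators can be sketched in a simplified form. Assumptions: query scope is computed as -log(n_Q / N), where n_Q is the number of documents containing at least one query term, and query clarity uses the maximum-likelihood query model taken directly from the query terms (rather than a model estimated from retrieved documents), measured in bits against a hypothetical dictionary of collection term probabilities. Function names are illustrative.

```python
import math
from collections import Counter


def query_scope(query_terms, doc_term_sets):
    """-log(n_Q / N): n_Q = documents containing at least one query term (assumed form)."""
    n = len(doc_term_sets)
    n_q = sum(1 for terms in doc_term_sets if any(t in terms for t in query_terms))
    return -math.log(n_q / n) if n_q else float("inf")


def simplified_clarity(query_terms, collection_prob):
    """KL divergence (bits) between the ML query model and the collection model.

    collection_prob: {term: P(term | collection)}; every query term needs a
    nonzero probability (assumption).
    """
    counts = Counter(query_terms)
    total = len(query_terms)
    return sum((c / total) * math.log2((c / total) / collection_prob[t])
               for t, c in counts.items())
```

Under this simplification, a query made of rare terms diverges more from the collection model and therefore gets a higher clarity score.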

28
2.4 Adaptive Analysis
29
2.4 Adaptive Analysis