Technologies for Personalized Web search - PowerPoint PPT Presentation

Slides: 34
Provided by: rucX

1
Technologies for Personalized Web search
2
Outline
  • Search Histories
  • Query Terms Expansion
  • Browse Histories
  • Relevance Feedback

3
1. Search Histories
  • Phases
  • - Collecting information from users (all
    searches for which at least one of the results
    was clicked were logged per user)
  • - Creating the user profile and the result profile
  • - Evaluation

4
1.1 System Architecture
  • GoogleWrapper
  • - monitors users, maintaining a log of
  • (1) submitted queries for which at least one
    result was visited
  • (2) user-selected snippets from the retrieved
    results
  • (3) the top 10 retrieved results (title and
    summary)
  • Categorizer
  • - classifies the queries and selected snippets of
    each user, as well as the snippets of the search
    engine results
  • - SVM, K-NN

5
1.2 Creation of User Profile
  • User Interests
  • - snippets (titles plus the textual
    summaries)
  • Representation
  • - ODP Concept Hierarchy, 1869 concepts, the top
    3 levels
  • Concept weight
  • Item Classification
  • - each classified query is added to 4 concepts
  • - each classified snippet is added to 5 concepts
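
A minimal sketch of how such a concept-weight profile might be accumulated. The classifier output format (a list of (concept, weight) pairs), the concept names, and the function name `update_profile` are assumptions made for illustration; the slides do not show the actual implementation.

```python
from collections import defaultdict

def update_profile(profile, classified_item, top_n):
    """Add the top_n concept weights of one classified item to the profile.

    classified_item: (concept, weight) pairs from a hypothetical classifier.
    top_n: 4 for a query and 5 for a snippet, following the slide.
    """
    ranked = sorted(classified_item, key=lambda cw: cw[1], reverse=True)
    for concept, weight in ranked[:top_n]:
        profile[concept] += weight
    return profile

# Illustrative data only: concept names and weights are made up.
profile = defaultdict(float)
query_concepts = [("Arts/Music", 0.5), ("Shopping/Music", 0.25),
                  ("Computers/Software", 0.125), ("News/Media", 0.0625),
                  ("Sports/Soccer", 0.03125)]
update_profile(profile, query_concepts, top_n=4)
```

Only the four highest-weighted concepts of the query contribute, so the fifth concept never enters the profile.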

6
1.2 Personalizing Search Results
  • Calculate conceptual match based on similarity
    between each document profile and user profile
  • - The search result titles and summaries are
    classified to create a document profile
  • Re-rank results based on conceptual match
  • - the resulting rank order is called the
    conceptual rank
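
The conceptual match can be sketched as a similarity between two concept-weight vectors. The slide only says "similarity"; the use of cosine similarity here, and the names `cosine` and `conceptual_rank`, are assumptions for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse concept-weight dictionaries."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def conceptual_rank(user_profile, doc_profiles):
    """Return result ids ordered by decreasing similarity to the user profile."""
    return sorted(doc_profiles,
                  key=lambda d: cosine(user_profile, doc_profiles[d]),
                  reverse=True)

# Illustrative profiles only.
user = {"Arts/Music": 2.0, "Computers/Software": 1.0}
docs = {"r1": {"Sports/Soccer": 1.0},
        "r2": {"Arts/Music": 1.0},
        "r3": {"Computers/Software": 1.0}}
order = conceptual_rank(user, docs)
```

Here the result whose document profile shares the user's strongest concept moves to the top, and the result with no shared concept moves to the bottom.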

7
1.2 Personalizing Search Results
  • The final rank
  • - α has a value between 0 and 1
  • - The conceptual and search engine based
    rankings can be blended in different
    proportions by varying the value of α
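
One blend consistent with this description is a linear combination of the two rank positions, weighted by the parameter between 0 and 1. The exact form below is an assumption, since the transcript does not include the formula; `final_ranking` and its arguments are hypothetical names.

```python
def final_ranking(conceptual_rank, engine_rank, alpha):
    """Order results by alpha * conceptual rank + (1 - alpha) * engine rank.

    Both inputs map result id -> rank position (1 = best), so lower blended
    scores are better. Ties fall back to the search engine order.
    This linear form is an assumed reading of the slide.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    score = {r: alpha * conceptual_rank[r] + (1 - alpha) * engine_rank[r]
             for r in engine_rank}
    return sorted(score, key=lambda r: (score[r], engine_rank[r]))

# Illustrative ranks only.
engine = {"r1": 1, "r2": 2, "r3": 3}
concept = {"r1": 3, "r2": 1, "r3": 2}
blended = final_ranking(concept, engine, alpha=0.5)
```

With the weight at 0 the search engine order is returned unchanged, and with the weight at 1 the order is purely conceptual.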

8
1.3 Experimental Setup
  • Preliminary study
  • - randomly selected 100 queries out of the 576
    collected; the top-ranked result was the most
    frequently selected (60), followed by the second
    (20) and the third (14)
  • Conclusions
  • - users rarely look at results beyond the first
    page
  • - user judgments are affected by presentation
    order
  • Top 10 results
  • - GoogleWrapper was reimplemented so that only
    the top 10 results are displayed, in random order

9
1.4 Experimental Data
  • Monitored six volunteers (6 months).
  • Removed duplicate queries for each user
  • Distributed 45 queries per user (270 total) into
    the following sets
  • - 240 (40 per user) queries were used for
    training the 2 user profiles (query-based and
    snippet-based)
  • - 30 (5 per user) queries were used for testing
    personalized search parameters

10
1. Search Histories - Experiment 1
  • Profile built by classifying and combining each
    query (4 concepts from each classified query).
  • Average Google rank is 5.1
  • Best average conceptual rank is 3.4 (using 1
    concept from the user profile and 30 queries to
    create the profile)

11
1. Search Histories - Experiment 2
  • Profile built by classifying and combining each
    snippet (5 concepts from each classified
    snippet).
  • Average Google rank is 5.1
  • Best average conceptual rank is 3.2 (using 20
    concepts from the user profile and 30 snippets
    to create the profile)

12
1. Search Histories - Experiment 3
  • Final rank: combination of the original rank
    with the conceptual rank.
  • Query-based profile.

13
1. Search Histories - Experiment 4
  • Final rank: combination of the original rank
    with the conceptual rank.
  • Snippet-based profile.

14
1. Search Histories - Experiment 5
  • Goal: verify that the 2 types of profiles
    (query-based and snippet-based) can improve
    results for queries never seen before.
  • 12 (2 per user) testing queries never seen
    before
  • Query-based profile (30 queries and 4 concepts
    per query)
  • Snippet-based profile (30 snippets and 20
    concepts per snippet)

15
2. Query Terms Expansion
  • Expanding with Local Desktop Analysis
  • - Term and Document Frequency
  • - Lexical Compounds
  • - Sentence Selection
  • Expanding with Global Desktop Analysis
  • - Term Co-occurrence Statistics
  • - Thesaurus Based Expansion
  • Experiments
  • Adaptive Analysis

16
2.1 Expanding with Local Desktop Analysis
  • Term and Document Frequency
  • - Calculation of Term Score
  • - Calculation of Document Frequency

17
2.1 Expanding with Local Desktop Analysis
  • Lexical Compounds
  • - select frequent compound terms
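
As a rough illustration of selecting frequent compound terms, the sketch below counts adjacent word pairs across a set of documents and keeps the most frequent ones. Treating a compound as any two adjacent non-stopwords is an assumption, as are the stopword list and the name `frequent_compounds`; the slide does not say which compound patterns were used.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not from the slides).
STOPWORDS = {"the", "a", "of", "and", "for", "in", "to", "is"}

def frequent_compounds(documents, top_k=3):
    """Return the top_k most frequent two-word compounds in the documents."""
    counts = Counter()
    for text in documents:
        words = re.findall(r"[a-z]+", text.lower())
        for first, second in zip(words, words[1:]):
            if first not in STOPWORDS and second not in STOPWORDS:
                counts[f"{first} {second}"] += 1
    return [compound for compound, _ in counts.most_common(top_k)]

# Illustrative documents only.
docs = ["Personalized web search uses the search history.",
        "Web search engines rank pages.",
        "A web search log records queries."]
best = frequent_compounds(docs, top_k=1)
```

The pair that occurs in all three documents is selected as the expansion candidate.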

18
2.1 Expanding with Local Desktop Analysis
  • Sentence Selection
  • - method
  • (1) Identify the set of relevant Desktop
    documents
  • (2) Output a summary containing the most
    important sentences
  • - calculation of the sentence score

  • - PS = (Avg(NS) - SentenceIndex) / Avg^2(NS) for
    the first 10 sentences, and 0 otherwise
  • - TQ: the number of query terms present in the
    sentence
  • - NQ: the total number of terms in the query

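
The position score PS and the counts TQ and NQ defined above can be computed as in the following sketch. It covers only those components, not the full sentence score. The 1-based sentence index, whitespace tokenization, counting distinct query terms, and the function names are assumptions.

```python
def position_score(sentence_index, avg_sentences):
    """PS = (Avg(NS) - SentenceIndex) / Avg(NS)^2 for the first 10 sentences.

    sentence_index is assumed to be 1-based; avg_sentences stands for Avg(NS).
    """
    if sentence_index < 1 or sentence_index > 10:
        return 0.0
    return (avg_sentences - sentence_index) / avg_sentences ** 2

def query_overlap(sentence, query):
    """Return (TQ, NQ): query terms found in the sentence, and query size."""
    query_terms = set(query.lower().split())
    sentence_terms = set(sentence.lower().split())
    return len(query_terms & sentence_terms), len(query_terms)

ps_first = position_score(1, 20.0)
tq, nq = query_overlap("personalized web search on the desktop",
                       "web search engine")
```

Earlier sentences receive a higher position score, and any sentence after the tenth receives 0.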
19
2.2 Expanding with Global Desktop Analysis
  • Term Co-occurrence Statistics
  • - Co-occurrence based keyword similarity search

20
2.2 Expanding with Global Desktop Analysis
  • Term Co-occurrence Statistics
  • - similarity calculation
  • (1) Cosine Similarity
  • (2) Mutual Information
  • (3) Likelihood Ratio
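
Two of these measures can be sketched from document frequencies alone. The document-frequency formulations below are one common way to define them and are an assumption here, since the transcript does not include the formulas; the likelihood ratio is left out for brevity.

```python
import math

def cosine_similarity(df_x, df_y, df_xy):
    """Co-occurrence cosine: DF(x, y) / sqrt(DF(x) * DF(y))."""
    return df_xy / math.sqrt(df_x * df_y)

def mutual_information(df_x, df_y, df_xy, n_docs):
    """Pointwise mutual information: log(N * DF(x, y) / (DF(x) * DF(y)))."""
    if df_xy == 0:
        return float("-inf")  # the terms never co-occur
    return math.log(n_docs * df_xy / (df_x * df_y))

# Illustrative counts: x occurs in 4 documents, y in 9, both together in 3.
cs = cosine_similarity(4, 9, 3)
mi = mutual_information(10, 10, 10, 100)
```

Terms that co-occur with the query keywords more often than chance would predict score higher and become expansion candidates.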

21
2.2 Expanding with Global Desktop Analysis
  • Thesaurus Based Expansion

22
2.3 Experiments
  • Experimental Setup
  • 18 Subjects
  • Indexing Documents with Lucene
  • Selection of Queries
  • (1) Top log queries in AltaVista, avg. length: 2
  • (2) Randomly selected log queries, avg. length:
    2.3
  • (3) Self-selected specific queries, only one
    meaning, avg. length: 2.9
  • (4) Self-selected ambiguous queries, at least 3
    meanings, avg. length: 1.8

23
2.3 Experiments
  • Experimental Setup (continued)
  • Evaluation of query terms: score of 1 to 5
  • - top log queries: 3.11
  • - randomly selected log queries: 3.72
  • - self-selected specific queries: 4.45
  • - self-selected ambiguous queries: 4.39
  • Issued 72 queries, evaluated 6000 URLs on a
    0-2 scale (non-relevant, relevant, highly
    relevant).
  • Evaluation metric: NDCG
  • Number of generated expansion keywords: 4
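
For reference, NDCG over graded judgments on the 0-2 scale can be computed as below. Several DCG variants exist; this sketch uses gain divided by log2(rank + 1), which is an assumption about the variant used in the experiments.

```python
import math

def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg(gains):
    """DCG normalized by the DCG of the ideal (sorted) ordering."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0

# Relevance grades of the returned URLs, in ranked order (illustrative).
perfect = ndcg([2, 1, 0])
reversed_order = ndcg([0, 1, 2])
```

A ranking that places the highly relevant URL first scores 1.0, and any ordering that pushes it down scores lower.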

24
2.3.1 Experiment 1
25
2.3.2 Experiment 2
26
3. Adaptive Analysis
  • Query Clarity level
  • Query Length
  • - the number of non-stop words in the query
  • Distribution of the amount of information
    across query terms
  • - σ_idf
  • - idf_max / idf_min
  • σ_idf is the standard deviation of the idf of the
    terms in Q
  • idf_max and idf_min are the maximum and minimum
    idf among the terms in Q, respectively
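
The two statistics defined above can be computed as in this sketch. The plain idf definition log(N / df), the use of the max-to-min ratio as the second measure, and the name `idf_spread` are assumptions; the slides do not show the formulas.

```python
import math
import statistics

def idf(df, n_docs):
    """Plain inverse document frequency, log(N / df) (an assumed variant)."""
    return math.log(n_docs / df)

def idf_spread(query_terms, doc_freq, n_docs):
    """Return (standard deviation of idf, idf_max / idf_min) for a query."""
    idfs = [idf(doc_freq[t], n_docs) for t in query_terms]
    sigma = statistics.pstdev(idfs)
    lowest = min(idfs)
    ratio = max(idfs) / lowest if lowest > 0 else float("inf")
    return sigma, ratio

# Illustrative document frequencies in a 1000-document collection.
doc_freq = {"web": 100, "search": 100, "jaguar": 10}
flat = idf_spread(["web", "search"], doc_freq, 1000)
skewed = idf_spread(["web", "jaguar"], doc_freq, 1000)
```

A query whose terms are equally informative has zero spread and a ratio of 1, while a query that mixes a common term with a rare one shows a larger spread.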

27
3. Adaptive Analysis
  • Query Clarity level (continued)
  • Query Scope (relates to the IDF of the entire
    query)
  • Query Clarity (measures the divergence between
    the language model associated to the user query
    and the language model associated to the
    collection)
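
A sketch of both measures under simplifying assumptions. Query scope is taken as -log(n_Q / N), where n_Q is the number of documents containing at least one query term. Clarity is approximated with a simplified score that uses the maximum-likelihood model of the query text itself rather than a model estimated from retrieved documents. Both formulas and all names are assumptions, since the slide gives only verbal definitions.

```python
import math
from collections import Counter

def query_scope(n_matching_docs, n_docs):
    """-log(n_Q / N): larger values mean the query matches fewer documents."""
    return -math.log(n_matching_docs / n_docs)

def simplified_clarity(query_terms, coll_prob):
    """KL divergence (base 2) between the query model and the collection model.

    coll_prob maps each term to its relative frequency in the collection.
    """
    counts = Counter(query_terms)
    total = len(query_terms)
    score = 0.0
    for term, count in counts.items():
        p_query = count / total
        score += p_query * math.log2(p_query / coll_prob[term])
    return score

# Illustrative probabilities only.
broad = query_scope(1000, 1000)
clear = simplified_clarity(["jaguar"], {"jaguar": 0.25})
```

A query matching every document has a scope of 0, and a query whose terms are rare in the collection diverges more from the collection model and is therefore clearer.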

28
3. Adaptive Analysis
29
3. Adaptive Analysis
  • Metrics for Clarity
  • Query Scope
  • Query Clarity

30
3. Adaptive Analysis
  • Categorizing Clarity Prediction

31
3. Adaptive Analysis
  • Analyzed Result

32
3. Adaptive Analysis-Experiment
33
3. Adaptive Analysis-Experiment