Personalized Ontologies for Web Search and Caching - PowerPoint PPT Presentation

Slides: 21
Provided by: ale106
Learn more at: http://www.ittc.ku.edu
Transcript and Presenter's Notes

1
Personalized Ontologies for Web Search and Caching
  • Susan Gauch
  • Information and Telecommunications Technology
    Center
  • Electrical Engineering and Computer Science
  • The University of Kansas

2
Outline
  • Motivation
  • User profiles
  • creation and maintenance
  • evaluation
  • Applications
  • re-ranking (and filtering) search results
  • Web caching
  • Conclusions

3
Motivation
  • Decrease access time for Web pages
  • Server approaches
  • use access logs to decrease access times for
    popular pages
  • not tailored to individuals
  • doesn't decrease network traffic
  • Network approaches
  • cache popular pages multiple places in the
    network
  • not tailored to individuals

4
Personalization
  • Different information needs for different users
  • can we learn users' interests?
  • Explicitly?
  • Implicitly?
  • can we use this information?
  • improved search
  • improved browsing
  • faster Web page access

5
Intelligent Web Caching
  • Improved (and faster) search results
  • pre-caching all search results expensive
  • Internet search engines return 50% irrelevant
    pages
  • improved knowledge of users' likely behavior
  • intelligent pre-caching
  • use past behaviors to predict future behaviors
  • pre-cache best pages close to individuals

6
Context
  • ProFusion: www.profusion.com
  • OBIWAN: distributed content-based IR
  • Web clustered into regions
  • clustering criteria: content, location, company
  • search query brokered to best regions; within a
    region, brokered to most promising sites
  • browsing a region means browsing its sites
    simultaneously
  • www.ittc.ukases.edu/obiwan

7
User Profiles
  • Applications
  • Usenet news filtering
  • recommendation services: web browsing, books
  • intelligent pre-caching
  • Should
  • accurately reflect actual interests
  • require as little feedback as possible
  • be dynamic

8
User Profiles: Creation
  • Obvious and often used: keywords
  • not structured (ambiguous)
  • static
  • have to be explicitly mentioned
  • Our approach
  • watch over a user's shoulder while surfing
  • automatically determine each document's content
  • central large ontology (concept hierarchy)

9
Document Classification
  • Documents as weighted keyword vectors
  • n different words -> n dimensions
  • weights based on
  • word frequency and rarity
  • Browsing hierarchy: 10 web pages per node
  • Concatenate them -> keyword vector
  • Content of a page: most similar vector
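
The classification step on this slide can be sketched roughly as follows. This is only an illustration, not the presenters' implementation: the tf * idf weighting, the cosine similarity measure, the function names, and the two toy categories are all assumptions chosen because they match "word frequency and rarity" and "most similar vector".

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a real system would also drop stop words)."""
    return text.lower().split()


def build_category_vectors(category_pages):
    """Concatenate each node's sample pages and weight every word by tf * idf."""
    term_counts = {
        name: Counter(tokenize(" ".join(pages)))
        for name, pages in category_pages.items()
    }
    n_categories = len(term_counts)
    # Rarity: in how many category vectors does each word occur?
    doc_freq = Counter(word for counts in term_counts.values() for word in counts)
    return {
        name: {
            word: tf * math.log(1 + n_categories / doc_freq[word])
            for word, tf in counts.items()
        }
        for name, counts in term_counts.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(word, 0.0) for word, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(page_text, category_vectors, top_k=5):
    """Return the top_k (category, similarity) pairs for a surfed page."""
    page_vector = dict(Counter(tokenize(page_text)))
    scored = [(cosine(page_vector, vec), name) for name, vec in category_vectors.items()]
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored[:top_k]]


# Hypothetical two-node hierarchy with two sample pages per node.
category_pages = {
    "Sports/Soccer": ["soccer goal league match", "soccer player goal"],
    "Computers/Caching": ["web cache proxy server", "cache hit rate proxy"],
}
vectors = build_category_vectors(category_pages)
top = classify("proxy cache for web pages", vectors, top_k=1)
```

Here `top` holds the single best-matching node for the page; with a full hierarchy the slide's approach would keep the top few nodes and their weights.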

10
Updating profiles
  • Static: document related
  • content weights of top nodes for surfed document
  • length of page
  • Dynamic: time spent
  • Combine them
  • for instance: weight (time/length)
  • changes in interest in the five categories
  • User profile: weighted ontology
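
One way to read the update rule is sketched below. The slide does not say how the weight and the time/length ratio are combined, so the multiplication here is an assumption, as are the function name, the additive accumulation, and the sample numbers.

```python
def update_profile(profile, category_weights, seconds_spent, page_length):
    """Add weight * (time / length) to each category the page was classified into.

    profile and category_weights are dicts mapping category name -> weight.
    The product form is an assumed reading of the slide, not a confirmed formula.
    """
    if page_length <= 0:
        return profile  # nothing to learn from an empty page
    attention = seconds_spent / page_length
    for category, weight in category_weights.items():
        profile[category] = profile.get(category, 0.0) + weight * attention
    return profile


# Hypothetical browsing session: two pages, one category seen twice.
profile = {}
update_profile(profile, {"Computers/Caching": 0.8, "Computers/Search": 0.4},
               seconds_spent=60, page_length=2000)
update_profile(profile, {"Computers/Caching": 0.5},
               seconds_spent=30, page_length=1000)
```

After these two calls the profile is a small weighted ontology in which "Computers/Caching" carries more interest than "Computers/Search".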

11
Profile evaluation
  • Accordance with actual user interests
  • 10/20 interest categories describe actual
    interests
  • describe interests pretty well: 3.5/5
  • Convergence
  • stabilization of number of categories over time?
  • do converge after 320 surfed pages!

12
Profiles: Summary
  • Stored as weighted ontologies
  • Profiles represent actual interests quite well
  • Up to 150 top categories
  • Two adjustment functions make profiles converge
    after 320 pages
  • length of page doesn't really matter, but time
    spent does

13
Personalizing Search Results
  • 50% of top 20 results irrelevant
  • Same search mechanism for 200 million people?
  • Goal
  • identify relevant documents and put them on top
    of the result list
  • (pre-fetch relevant results)
  • Difficult problem: a 10% increase is very good

14
Re-Ranking
  • Ranking: a function of
  • search engine's original ranking
  • extents to which top 5 categories describe
    document's content
  • personal interest in each of these top categories
  • More relevant items on top of result list
  • system's ability to present all relevant items
  • system's ability to present only relevant items
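
A minimal sketch of such a ranking function follows. The slide names the three inputs but not how they are combined, so the formula below (reciprocal of the original position, boosted by match weight times personal interest) is an assumption, and the URLs and weights are made up for illustration.

```python
def rerank(results, profile):
    """Re-order search results using a user profile.

    results: list of (url, categories) in the search engine's original order,
             where categories maps each top category -> how well it describes the page.
    profile: dict mapping category -> the user's interest weight.
    """
    scored = []
    for position, (url, categories) in enumerate(results, start=1):
        engine_score = 1.0 / position  # stands in for the original ranking
        interest = sum(match * profile.get(category, 0.0)
                       for category, match in categories.items())
        scored.append((engine_score * (1.0 + interest), url))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [url for _, url in scored]


# Hypothetical profile and result list.
profile = {"Computers/Caching": 2.0}
results = [
    ("a.example/soccer", {"Sports/Soccer": 0.9}),
    ("b.example/proxy", {"Computers/Caching": 0.8}),
    ("c.example/news", {}),
]
reranked = rerank(results, profile)
```

In this toy case the caching page moves from second to first place because it matches the only category the user has shown interest in.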

15
Recall and Precision
  • Combination: Recall/Precision graphs
  • Example: ranked documents 1, ..., 20
  • relevant: 2, 5, 10, 14, 19
  • recall points: 1/5, 2/5, 3/5, 4/5, 5/5
  • precisions: 1/2, 2/5, 3/10, 4/14, 5/19
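
The example on this slide can be reproduced with a few lines of code. The function name is an assumption; the document ranks and the relevant set are taken from the slide. Exact fractions are used so the values can be compared directly with the ones listed above (note that 4/14 is printed in lowest terms as 2/7).

```python
from fractions import Fraction


def recall_precision_points(ranked, relevant):
    """Return one (recall, precision) pair at each rank where a relevant document appears."""
    relevant = set(relevant)
    points = []
    found = 0
    for position, doc in enumerate(ranked, start=1):
        if doc in relevant:
            found += 1
            recall = Fraction(found, len(relevant))
            precision = Fraction(found, position)
            points.append((recall, precision))
    return points


# Documents 1..20 in ranked order; the relevant ones are at ranks 2, 5, 10, 14, 19.
points = recall_precision_points(range(1, 21), [2, 5, 10, 14, 19])
for recall, precision in points:
    print(f"recall {recall}  precision {precision}")
```

Plotting precision against recall for these five points gives the kind of combined recall/precision graph the slide refers to.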

16
Re-Ranking Evaluation
  • Overall performance increase of up to 8%
  • at each recall cutoff, up to 10% more relevant
    documents have been retrieved

17
Browsing Assistance
  • Analyze current page
  • locate links
  • Identify which links are most likely to be
    followed by the user
  • popularity of the link overall
  • relevance of linked page to user's interests
  • Problem
  • if you have to download the whole page to analyze
    it, you've increased the network utilization
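
As a rough sketch of the link-selection idea, the two signals named above could be blended into one score per link. The weighted sum, the `alpha` parameter, and the sample scores are assumptions for illustration only; the slide does not say how popularity and relevance are combined, and the relevance values are taken as given here rather than computed from a downloaded page.

```python
def rank_links(links, alpha=0.5):
    """Order links by a blend of overall popularity and relevance to the user.

    links: list of (url, popularity, relevance), both scores assumed to lie in [0, 1].
    alpha: assumed mixing weight; 1.0 uses popularity only, 0.0 relevance only.
    """
    return sorted(
        links,
        key=lambda link: alpha * link[1] + (1 - alpha) * link[2],
        reverse=True,
    )


# Hypothetical links found on the current page.
links = [
    ("/sports", 0.9, 0.1),
    ("/caching", 0.4, 0.8),
    ("/about", 0.2, 0.2),
]
ranked_links = rank_links(links)
prefetch_candidate = ranked_links[0][0]
```

With equal weighting the less popular but more relevant link comes first, which is the case where a personal profile changes what would be pre-cached.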

18
Privacy
  • Is the user aware that their behavior is being
    monitored?
  • Can users turn it off?
  • Where are profiles stored?
  • With whom are profiles shared?
  • How are profiles protected?
  • How are profiles used?

19
Conclusions
  • Automatic creation of structured user profiles is
    possible
  • Profiles are reasonably accurate
  • Applications in improving the search quality and
    Web page access efficiency
  • Evaluation of re-ranking search results:
    performance increase of up to 8%

20
Future Work
  • Incorporating profile generator into browser
  • Connect system to ProFusion, OBIWAN
  • Personalize structure of ontology
  • Re-train classifier
  • More applications: recommendation service, web
    caching, browsing, ...
  • Explicit user feedback?