1
Navigation-Aided Retrieval
  • Shashank Pandit and Christopher Olston

Presentation by Yang Yu, CSE 450 Web Data Mining
2
Outline
  • Introduction
  • Related Work
  • System Model
  • Prototype System
  • Evaluation
  • Summary and Future Work

3
Introduction
  • Background: reasons for this work
  • Difficulty in formulating appropriate queries
  • Open-ended search tasks
  • Preference for orienteering
  • Navigation-Aided Retrieval (NAR)

4
Introduction
  • Organic versus Synthetic Structure
  • Synthetic: structure is synthesized automatically
    and imposed on the query results
  • Organic: structure that naturally exists in the
    documents is used
  • Advantages of organic NAR
  • Human oversight
  • Familiar user interface
  • A single view of the document collection
  • Robust implementation by a third party
  • Contributions
  • Formal model of navigation-aided retrieval
  • An overview of techniques for a NAR-based
    retrieval system
  • Empirical evaluation via a user study

5
Related Work
  • Selecting Starting Points
  • Best Trails system
  • Uses an ad-hoc scoring function for starting points
  • Restricts starting points to documents that
    themselves match the query
  • Does not take navigability factors into account
  • User interface departs substantially from the
    traditional interface
  • Topic distillation, mainly based on HITS
  • Only effective for broad topic areas for which
    there are many hubs and authorities
  • Guiding Navigation
  • WebWatcher highlights hyperlinks along paths
    taken by previous users who had posed similar
    queries.
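
For readers unfamiliar with HITS, mentioned above as the basis of topic distillation, here is a minimal sketch of its hub/authority iteration. The function name `hits`, the `links` dictionary layout, and the fixed iteration count are illustrative assumptions, not something taken from the slides or from any particular library.

```python
import math

def hits(links, iterations=50):
    """Run a basic HITS iteration.

    links: dict mapping a page to the list of pages it links to
           (hypothetical input format chosen for this sketch).
    Returns (hub, auth): two dicts of L2-normalized scores.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking in.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                new_auth[t] += hub[src]
        # Hub score: sum of authority scores of the pages linked to.
        new_hub = {p: sum(new_auth[t] for t in links.get(p, [])) for p in pages}
        # Normalize so the scores stay bounded across iterations.
        a_norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in new_auth.items()}
        hub = {p: v / h_norm for p, v in new_hub.items()}
    return hub, auth

# Tiny example graph: two hub-like pages pointing at two content pages.
hub, auth = hits({"h1": ["a", "b"], "h2": ["a"]})
```

With few hubs and authorities in the link graph the scores carry little signal, which is consistent with the limitation noted on the slide.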

6
System Model
  • Generic Model
  • Query submodel
  • Navigation submodel
  • Generic scoring function
  • Assumption: every member of the relevance set St
    is a singleton set
  • "Flatten" St into d1, d2, ..., dn

7
System Model
  • Instantiations of Generic Model
  • Conventional Probabilistic IR Model
  • Navigation-Conscious Model
  • The two terms embody the two key factors
  • The number of documents reachable from d that are
    relevant to the search task
  • The ease and accuracy with which the user is able
    to navigate to those documents
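
The slides do not reproduce the scoring formula itself. One plausible way to combine the two factors, written with the symbols that appear on the later slides (R(d, q) for relevance and W(N(d2), d1, d2) for the navigability of d2 from d1), is shown below. The summation form, the collection symbol D, and the name Score are assumptions made for illustration only.

```latex
\[
\mathrm{Score}(d, q) \;=\; \sum_{d' \in D} R(d', q)\; W\bigl(N(d'),\, d,\, d'\bigr)
\]
```

Under this assumed form, setting W to 1 when d' = d and to 0 otherwise makes the sum collapse to R(d, q), which is one way to see the conventional model as the special case in which no navigation takes place.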

8
Prototype System
  • Preprocessing
  • Content Engine
  • Connectivity Engine: tuples
    <d1, d2, dW, W(N(d2), d1, d2)>
  • Intermediary

9
Prototype System
10
Prototype System
  • Selecting Starting Points
  • 1. Retrieve from the content engine all documents
    d relevant to q.
  • 2. For each relevant document d retrieved in
    Step 1, retrieve from the connectivity engine
    all documents d' that can navigate to d.
  • 3. For each unique document d' found in Step 2,
    compute the starting point score.
  • 4. Sort the documents in decreasing order of this
    score and truncate after the top k documents.
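
The four steps above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions: the two engines are stood in for by plain dictionaries, the function and parameter names are hypothetical, and the starting point score is assumed to be the sum of relevance times navigability over reachable documents, which the slides do not spell out.

```python
from collections import defaultdict

def select_starting_points(relevance, connectivity, k):
    """Rank candidate starting points for a query.

    relevance:    dict doc -> R(doc, q), the documents relevant to q
                  (stand-in for the content engine, Step 1).
    connectivity: dict doc -> list of (source_doc, W) pairs, the
                  documents from which doc can be reached
                  (stand-in for the connectivity engine, Step 2).
    k:            number of starting points to return.
    """
    scores = defaultdict(float)
    for d, r in relevance.items():                    # Step 1
        for source, w in connectivity.get(d, []):     # Step 2
            scores[source] += r * w                   # Step 3 (assumed score)
    # Step 4: sort by decreasing score (ties broken by name), keep top k.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]

# Hypothetical toy data: "hub" can reach both relevant documents.
relevance = {"a": 0.9, "b": 0.5}
connectivity = {
    "a": [("a", 1.0), ("hub", 0.8)],
    "b": [("b", 1.0), ("hub", 0.8)],
}
top = select_starting_points(relevance, connectivity, k=2)
```

In this toy data the page "hub" matches the query only indirectly, yet it outranks both relevant documents because it leads to each of them. That is the behavior that separates this approach from restricting starting points to documents that themselves match the query.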

11
Prototype System
  • Adding Navigation Guidance
  • 1. Retrieve from the content engine all documents
    d for which R(d, q) > T.
  • 2. For each document d retrieved in Step 1,
    retrieve from the connectivity engine the tuple
    corresponding to <d', d>, where d' is the document
    being viewed, if it exists.
  • 3. For each <d1, d2, dW, W(N(d2), d1, d2)> tuple
    retrieved in Step 2, highlight links on d' that
    point to dW.
  • Efficiency and Scalability
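
The three navigation guidance steps listed on this slide can be sketched as follows. All names here are hypothetical: the connectivity engine is modeled as a dictionary keyed by (current page, target document) whose value is the pair (dW, W), and `outlinks` is an assumed map from a page to the pages it links to.

```python
def links_to_highlight(current, relevance, tuples, outlinks, threshold):
    """Return the set of link targets to highlight on the current page.

    current:   the page the user is viewing.
    relevance: dict doc -> R(doc, q) (stand-in for the content engine).
    tuples:    dict (d1, d2) -> (dW, W), a stand-in for the connectivity
               engine's <d1, d2, dW, W(N(d2), d1, d2)> tuples.
    outlinks:  dict page -> set of pages it links to.
    threshold: the relevance cutoff T.
    """
    # Step 1: documents whose relevance exceeds the threshold T.
    targets = [d for d, r in relevance.items() if r > threshold]
    highlighted = set()
    for d in targets:
        # Step 2: look up the tuple for (current page, target), if any.
        entry = tuples.get((current, d))
        if entry is None:
            continue
        next_hop, _weight = entry
        # Step 3: highlight links on the current page that point to dW.
        if next_hop in outlinks.get(current, set()):
            highlighted.add(next_hop)
    return highlighted

# Hypothetical toy data.
relevance = {"x": 0.9, "y": 0.2, "z": 0.7}
tuples = {("home", "x"): ("news", 0.6), ("home", "y"): ("misc", 0.9)}
outlinks = {"home": {"news", "misc", "about"}}
marked = links_to_highlight("home", relevance, tuples, outlinks, threshold=0.5)
```

Because the tuples are looked up rather than computed at query time, the per-page work in this sketch is one dictionary lookup per relevant document.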

12
Evaluation
  • Experimental Hypotheses
  • In query-only scenarios, Volant (the prototype
    system) does not perform significantly worse
  • In combined query/navigation scenarios, Volant
    performs better
  • The best organic starting point is of higher
    quality than one that can be synthesized using
    existing techniques.
  • Search Task Test Sets
  • Unambiguous
  • Ambiguous
  • Performance on Unambiguous Queries

13
Evaluation
  • Performance on Ambiguous Queries
  • Four criteria: breadth, accessibility, appeal,
    usefulness

14
Summary and Future Work
  • Summary
  • Effectiveness
  • Relationship to conventional IR
  • Relationship to synthetic approaches
  • Future Work
  • Add redundancy to corpora
  • Tune scoring function to be applicable for
    synthetic starting points
  • A unified method that supports both exploration
    and direct document retrieval

15
  • Thank you!
  • Questions or Comments?