Title: TOPIC CENTRIC QUERY ROUTING
- Research Methods (CS689)
- 11/21/00
By Anupam Khanal
Introduction
- What is query routing?
- Searching online can be both rewarding and frustrating.
- General search engines such as Yahoo and Lycos return a lot of information that is irrelevant to the user's query.
- In this context, query routing attempts to dynamically route each user's query to the appropriate specialized search engine.
Problem Description
- There are many general search engines, such as Yahoo, Lycos, and Alta Vista.
- There are also many topic-specific search engines, such as VaccationSpot.com and KidsHealth.com.
- However, many casual users are not familiar with all of these topic-specific search engines.
- In this context, topic-centric query expansion is important.
Why Topic Centric Query Routing
- Before discussing the importance of topic-centric routing, it is worth analyzing other query routing systems.
- Manual Query Routing Services
  - provide a categorized list of specialized search engines
  - users have to choose the search engines themselves
  - although a keyword search interface is provided, the terms accepted as keywords are limited
- Query Routing Based on Centroids
  - uses centroids, which are summaries of databases
  - these summaries consist of a complete list of the terms in each database and their frequencies
  - a search engine is located by deciding which databases are relevant to a user query, comparing the query with each centroid
  - this technique cannot be applied to most topic-specific search engines on the Web, because access to their internal databases is restricted
- Query Routing Without Centroids
  - instead of centroids, these systems generate a short text that describes each database
  - a search engine is located only if the search keywords are contained in that text
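As a rough illustration of the centroid-based approach described above, the sketch below summarizes each database as a term-frequency table and ranks databases by how closely the query matches each summary. The database names, the sample texts, and the use of cosine similarity are assumptions made for illustration; the slides do not say how the comparison is computed.

```python
import math
from collections import Counter


def build_centroid(documents):
    """Summarize a database as term -> frequency over all of its documents."""
    centroid = Counter()
    for text in documents:
        centroid.update(text.lower().split())
    return centroid


def cosine(query_terms, centroid):
    """Cosine similarity between a bag-of-words query and a centroid."""
    query = Counter(query_terms)
    dot = sum(count * centroid[term] for term, count in query.items())
    norm_q = math.sqrt(sum(c * c for c in query.values()))
    norm_c = math.sqrt(sum(c * c for c in centroid.values()))
    if norm_q == 0 or norm_c == 0:
        return 0.0
    return dot / (norm_q * norm_c)


def route_by_centroid(query, centroids):
    """Return database names ranked by similarity to the query, best first."""
    terms = query.lower().split()
    scored = [(cosine(terms, c), name) for name, c in centroids.items()]
    return [name for score, name in sorted(scored, reverse=True) if score > 0]


# Hypothetical databases; building real centroids requires full index access,
# which is exactly what most topic-specific engines on the Web do not offer.
centroids = {
    "movies": build_centroid(["monty python film comedy", "film review"]),
    "programming": build_centroid(["python language tutorial", "java language"]),
}
print(route_by_centroid("python tutorial", centroids))
```

Note that a query with no terms in any centroid is routed nowhere, which mirrors the limitation of depending entirely on database summaries.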
Research Objective
- In this context, Topic Centric Query Routing is appropriate because it uses the routing model to expand the query.
- The general framework of the query routing model is as follows:
- Getting relevant terms from the Web
  - the routing model does not use any special dictionaries; it uses the Web as the source of relevant terms
  - it finds Web documents relevant to the user query dynamically by submitting that query to a general search engine
  - the relevant terms are extracted from those documents
- Co-occurrence-based evaluation of term relevance
  - the mutual relevance of terms is evaluated on the basis of their co-occurrences in the documents
  - the co-occurrences of the search keywords are counted in all the documents retrieved by the general search engine
  - the routing model lists all the distinct terms contained in all the documents and counts, for each term, the number of documents that contain both the search keyword and that term
- Using a pseudo-feedback technique
  - it is difficult to determine term relevance from only the results of a single document search on the general search engine
  - even relevant terms often have few co-occurrences in the documents selected by the first search
  - in this context, the query routing model re-evaluates such low co-occurrence terms by selecting terms to be re-evaluated from the first search results, formulating new queries by adding the selected terms to the original query, and performing the co-occurrence-based evaluation for each formulated query
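The co-occurrence counting and the pseudo-feedback pass described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions: `CORPUS` and `search` are hypothetical stand-ins for the Web and a general search engine, and the `threshold` that separates high from low co-occurrence terms is an invented parameter, since the slides give no value.

```python
from collections import Counter

# Tiny in-memory corpus standing in for the Web (hypothetical data).
CORPUS = [
    "python monty film comedy",
    "python monty circus comedy",
    "python language programming",
    "python language java programming",
    "monty circus sketch",
]


def search(query, limit=10):
    """Stand-in for a general search engine: documents containing every query term."""
    terms = set(query.lower().split())
    return [doc for doc in CORPUS if terms <= set(doc.split())][:limit]


def cooccurrence_counts(query, documents):
    """For each distinct term, count the documents that contain it and all query keywords."""
    keywords = set(query.lower().split())
    counts = Counter()
    for doc in documents:
        words = set(doc.split())
        if keywords <= words:
            counts.update(words - keywords)
    return counts


def reevaluate_low_terms(query, threshold=2):
    """First pass: split terms by co-occurrence count.
    Second pass: re-run the evaluation for each low-count term added to the query."""
    first = cooccurrence_counts(query, search(query))
    high = {term: n for term, n in first.items() if n >= threshold}
    low = sorted(term for term, n in first.items() if n < threshold)
    second_pass = {}
    for term in low:
        new_query = f"{query} {term}"
        second_pass[term] = cooccurrence_counts(new_query, search(new_query))
    return high, second_pass


high, second_pass = reevaluate_low_terms("python")
print(high)
print(second_pass["java"])
```

In this toy corpus, "java" has only one co-occurrence with "python" in the first search, but the follow-up query "python java" shows that it co-occurs with "language" and "programming", which is the kind of evidence the re-evaluation step is meant to surface.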
Figure 4: Query expansion procedure.
6. Cluster all terms in D0 into at most three clusters W1 = {w11, ..., w1m}, W2 = {w21, ..., w2k}, and W3 = {w31, ..., w3j}.
7. Formulate three queries Q1-Q3 by combining W1-W3 with Q0 (for example, Q1 = "w01 ... w0n w11 ... w1m").
8. Get document sets DT1-DT4 and D1-D3 by sending QT1-QT4 and Q1-Q3 independently to a general search engine.
9. Count co-occurrences in DT1-DT4 and D1-D3. The sets of high co-occurrence terms WTH1-WTH4 and WH1-WH3, as well as WH0 from step 3, are the query expansion results.
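Step 7 can be illustrated with a short sketch that appends the terms of each cluster to the original query Q0. The function name `formulate_queries` and the sample clusters are hypothetical; how the clusters themselves are produced in step 6 is not specified in this part of the slides, so they are simply given as input here.

```python
def formulate_queries(q0_terms, clusters, max_clusters=3):
    """Build one expanded query per cluster by appending its terms to Q0."""
    queries = []
    for cluster in clusters[:max_clusters]:
        # Skip terms already in Q0 so that no keyword is repeated.
        extra = [word for word in cluster if word not in q0_terms]
        queries.append(" ".join(list(q0_terms) + extra))
    return queries


# Hypothetical clusters W1-W3 for the original query Q0 = "python".
q0 = ["python"]
clusters = [["monty", "comedy"], ["language", "programming"], ["java"]]
print(formulate_queries(q0, clusters))
```

Each resulting string corresponds to one of Q1-Q3 and would be sent independently to the general search engine in step 8.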
Query Routing Result
- User query: python
- Phrase to explain topic: "If you are looking for information about movie - monty python"
- Recommended topic search engines:
  - 1600 Search/Go to Search the Internet Movie Database
  - 1600 Search/Go to The Roger Ebert Movie Files
  - 1600 Search/Go to Horror Search
Other Topics
- Object oriented programming in python
  - 7500 Search/Go to Index to Object Oriented Information Sources
  - 3600 Search/Go to Unix Programming
- jpython - python in java
  - 6300 Search/Go to java.sun.com The Source for Java Technology
  - 5641 Search/Go to Gamelan - The official Java Directory
  - 4921 Search/Go to JCentral Search the web for Java
  - 4266 Search/Go to Index to Object Oriented Information Sources
Importance of Topic Centric Query Routing
- A query routing model is used.
- The query routing model does not generate centroids.
- It consists of an offline pre-processing component and an online interface.
- The offline query routing model takes a set of search engines as input and creates, for each engine, an approximate textual model of that engine's content or scope.
- The online query routing model takes a user query as input and applies a novel query expansion technique to the query.
- It then clusters the output of the query expansion to suggest multiple topics that the user may be interested in.
- Each topic is associated with a set of search engines, e.g., for the query "python".
- The query expansion model can automatically obtain terms relevant to a query from the Web.
- With the query expansion model, it is not necessary to maintain a massive dictionary of terms covering a wide range of fields.
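The split between the offline and online components can be sketched as below. This is only an illustration under assumptions: the engine names are taken from the result slides, but the description texts, the bag-of-words model, and the overlap score are all hypothetical, since the slides do not state how the textual models are built or how engines are scored against a topic.

```python
from collections import Counter


def build_engine_models(engine_descriptions):
    """Offline step: map each engine to a bag-of-words model of its scope."""
    return {name: Counter(text.lower().split())
            for name, text in engine_descriptions.items()}


def route_topic(topic_terms, models, top_n=3):
    """Online step: score each engine by its overlap with a topic's expanded terms."""
    scores = {}
    for name, model in models.items():
        score = sum(model[term] for term in topic_terms)
        if score > 0:
            scores[name] = score
    # Highest score first; ties are broken by engine name for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Hypothetical scope descriptions for three engines named in the slides.
descriptions = {
    "Internet Movie Database": "movie film actor director comedy",
    "java.sun.com": "java language programming technology",
    "Unix Programming": "unix programming shell language",
}
models = build_engine_models(descriptions)

# Hypothetical topics produced by clustering the expansion of the query "python".
topics = {
    "movie": ["python", "monty", "comedy", "film"],
    "programming": ["python", "language", "programming"],
}
for topic, terms in topics.items():
    print(topic, route_topic(terms, models))
```

Because the models are built ahead of time from text alone, no access to an engine's internal database is needed at query time, which is the property that separates this approach from centroid-based routing.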
Conclusion
- Topic-centric query routing uses a query expansion model.
- The query expansion model obtains all the information necessary for query routing from the Web.
- Thus, the query routing model is an intelligent agent that uses the Web as its knowledge source and dynamically identifies the topics of given queries by query expansion.