DBXplorer:%20A%20System%20for%20Keyword-Based%20Search%20over%20Relational%20Databases - PowerPoint PPT Presentation

About This Presentation
Title:

DBXplorer:%20A%20System%20for%20Keyword-Based%20Search%20over%20Relational%20Databases

Description:

Internet search engines have popularized keyword-based search. ... pub-cell returns more number of SQL statements than pub-col as for a keyword in ... – PowerPoint PPT presentation

Number of Views:70
Avg rating:3.0/5.0
Slides: 17
Provided by: cmk264
Learn more at: https://crystal.uta.edu
Category:

less

Transcript and Presenter's Notes

Title: DBXplorer:%20A%20System%20for%20Keyword-Based%20Search%20over%20Relational%20Databases


1
DBXplorer A System for Keyword-Based Search over
Relational Databases
  • Sanjay Agrawal
  • Surajit Chaudhuri
  • Gautam Das
  • Presented by
  • Bhushan Pachpande

2
Contents
  1. Introduction
  2. Overview of DBXplorer
  3. Symbol Table Design - Publish
  4. Keyword Search
  5. Support for Generalized Matches
  6. Conclusion

3
Introduction
  • Internet search engines have popularized
    keyword-based search.
  • Searching on traditional database management
    system is done through customized applications
    which are closely tied to the database schema.
  • Traditional database management systems do not
    support keyword-based search.
  • e.g. search the Microsoft intranet on Jim Gray
    to obtain matched rows, i.e., rows in the
    database where Jim Gray occur.
  • In this paper, DBXplorer, an efficient and
    scalable keyword search utility for relational
    databases, is described.
  • The goal is to enable such searches without
    necessarily requiring the users to know the
    schema of the respective databases.

4
Example
Searching for a book
  • Keywords Programming by Ritchie
  • Less probability of presence of both keywords in
    single row of table
  • For result, rows need to be generated by joining
    tables on the fly (all possible combinations)

Authors
AuthorsBooks
Books
BookStores
Store
5
Overview of DBXplorer
  • Objective - Given a set of query keywords,
    DBXplorer returns all rows (either from single
    tables, or by joining tables connected by
    foreign-key joins) such that the each row
    contains all keywords.
  • Applying IR techniques from the documents world
    to databases is difficult, because of
  • Database normalization by which logical units
    of information may be fragmented and scattered
    across several tables
  • Matching row may be obtained by joining several
    tables on the fly
  • IR techniques use Inverted Lists Symbol Table
    in databases
  • Symbol Table - Stores the information about
    keywords at different granularities (column/row),
    i.e. for each keyword stores the list of all rows

6
Overview of DBXplorer - Publish
  • Enabling keyword search in DBXplorer requires 2
    steps
  • Publish
  • enables database for keywords by building Symbol
    table and associated structures
  • Search
  • retrieves matching rows from published database
  • Steps in Publish
  • Step 1 A database is identified, along with the
    set of tables and columns within the database to
    be published.
  • Step 2 Auxiliary tables like Symbol table are
    created for supporting keyword searches.

7
Overview of DBXplorer - Search
  • Steps in Search
  • Looks up symbol table to find tables / columns
    which contain keywords
  • Identify all possible joins (subsets of table if
    joined) whose rows might contain required
    keywords
  • Generate SQL statements for each join (gives rows
    which contain all keywords), rank rows and return

8
Architecture of DBXplorer
  • The publish component provides interfaces to
  • select a database,
  • select tables/columns within the database to
    publish, and
  • modify/remove/maintain the publication.
  • For a given set of keywords, the search component
    provides interfaces to
  • retrieve matching databases from a set of
    published databases, and
  • selectively identify tables, columns/rows that
    need to be searched within each database
    identified in step (1).

Architecture of DBXplorer
9
Symbol Table Design
  • Exact match problem considered only
  • Symbol Table (S) - stores the information about
    keywords at different granularities (column/row),
    i.e. for each keyword stores the list of all rows
  • Column Level granularity (Pub-Col)
  • For every keyword S maintains list of all
    database columns (i.e. table.column)
  • Cell level granularity (Pub-cell)
  • For every keyword S maintains list of all
    database cells (i.e. table.column.rowid)

10
Symbol Table - Design factors
  • Space and Time Requirements
  • Size pub-col are smaller than pub-cell since
    repetition of keyword in a column does not
    increases entries in case of pub-col
  • Time pub-col takes less time to build
  • Keyword search performance
  • Depends on efficient generation and execution of
    SQL statement (built from symbol table entries)
  • pub-cell returns more number of SQL statements
    than pub-col as for a keyword in column there are
    multiple entries
  • Ease of maintenance
  • Insert/Update required for insertion of distinct
    new value in case of pub-col while pub-cell needs
    for every update/insert
  • Same for delete

11
Storing Symbol Table
  • Store symbol tables (pub-col) in database as
    (keyword hash , column Id)
  • FK Compression (Foreign Key)
  • If there is foreign key relationship between c1
    and c2, store only c1
  • CP Compression
  • Partition H into a minimum number of bipartite
    cliques (a bipartite clique is any subgraph of H
    with a maximal number of edges).
  • Compress each clique.
  • Stores symbol table (pub-cell) in database as
    (keyword hash, list of all cellids)

v2 v3 v4
c1 c2
x
Uncompressed hash table
ColumnsMap table
Compressed hash table
12
Search - Enumerating Join Trees
  • Step1 - Looks up symbol table to find tables /
    columns which contain keywords
  • Step2 - Enumerate join trees
  • Identify and enumerate all potential subsets of
    tables in the database that, if joined, might
    contain rows having all keywords.
  • The resulting relation will contain all potential
    rows having all keywords specified in the query.

keywords
  • If we view the schema graph G as an undirected
    graph, this step enumerates join trees, i.e.,
    sub-trees of G such that
  • the leaves belong to MatchedTables
  • together, the leaves contain all keywords of the
    query

Join Trees
13
Search Identify matching rows
  • The input to this final search step is the
    enumerated join trees.
  • Each join tree is then mapped to a single SQL
    statement that joins the tables as specified in
    the tree, and selects those rows that contain all
    keywords.
  • The retrieved rows are ranked before being
    output.
  • Rows ranked by number of joins involved (ties
    broken arbitrarily) (same as keywords occurring
    close to one another in documents are ranked
    higher)

Join Trees
14
Generalized Matches Token Matches
  • Token matches - the keyword in the query matches
    only a token or sub-string of an attribute value
    (e.g., retrieve rows of address by specifying
    only a street name).
  • Pub-Prefix method
  • B tree indexes can be used to retrieve rows
    whose cell matches a given prefix string
  • This clause is of the form
  • WHERE T.C LIKE PK
  • During publishing of a database, for every
    keyword K, the entry (hash(K), T.C, P) is kept in
    the symbol table if there exists a string in
    column T.C which
  • contains a token K, and
  • has prefix P

15
Generalized Matches - Token Matches
Let the hash values of the searchable tokens
i.e., string, ball and round be 1, 2 and 3
respectively
RowId C
1 this is a string
2 this string
3 this is a ball
4 no string
5 any ball is round
Pub-Prefix table
Database table T
Consider searching keyword string Pub-Prefix
table returns prefixes th and no and
subsequent SQL will contain (T.C LIKE
thstring) OR (T.C LIKE nostring)
16
Conclusion
  • This paper discusses DBXplorer, a system that
    enables keyword-based search in relational
    databases.
  • DBXplorer uses symbol table alternatives to store
    the location of keywords in database.
  • DBXplorer support exact matches and generalized
    matches upto some extent.
Write a Comment
User Comments (0)
About PowerShow.com