1
Answering Imprecise Queries over Autonomous Web
Databases
  • Ullas Nambiar
  • Dept. of Computer Science
  • University of California, Davis

Subbarao Kambhampati Dept. of Computer
Science Arizona State University
5th April, ICDE 2006, Atlanta, USA
2
Dichotomy in Query Processing
  • IR Systems
  • User has an idea of what she wants
  • User query captures the need to some degree
  • Answers ranked by degree of relevance
  • Databases
  • User knows what she wants
  • User query completely expresses the need
  • Answers exactly matching query constraints

3
Why Support Imprecise Queries?
4
Others are following
5

What does Supporting Imprecise Queries Mean?
  • The Problem: Given a conjunctive query Q over a
    relation R, find a set of tuples that will be
    considered relevant by the user.
  • Ans(Q) = { x | x ∈ R, Relevance(Q, x) > c }
  • Objectives
  • Minimal burden on the end user
  • No changes to existing database
  • Domain independent
  • Motivation
  • How far can we go with a relevance model estimated
    from the database?
  • Tuples represent real-world objects and
    relationships between them
  • Use the estimated relevance model to provide a
    ranked set of tuples similar to the query
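The answer-set definition above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `overlap` relevance model and the threshold value are assumed stand-ins for the relevance model AIMQ estimates from the database.

```python
# Sketch of Ans(Q) = {x | x ∈ R, Relevance(Q, x) > c}.
# `relevance` stands in for AIMQ's estimated relevance model;
# the overlap measure and threshold below are illustrative only.

def answers(query, relation, relevance, c=0.5):
    """Tuples of `relation` whose relevance to `query` exceeds c, best first."""
    scored = [(relevance(query, t), t) for t in relation]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [t for score, t in scored if score > c]

def overlap(query, tup):
    """Toy relevance model: fraction of query bindings the tuple matches."""
    return sum(tup.get(k) == v for k, v in query.items()) / len(query)

cars = [{"Make": "Toyota", "Model": "Camry", "Year": 2000},
        {"Make": "Honda", "Model": "Civic", "Year": 2001}]
print(answers({"Make": "Toyota", "Model": "Camry"}, cars, overlap, c=0.4))
```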

6
Challenges
  • Estimating Query-Tuple Similarity
  • Weighted summation of attribute similarities
  • Need to estimate semantic similarity
  • Measuring Attribute Importance
  • Not all attributes equally important
  • Users cannot quantify importance

7
Our Solution: AIMQ
8
An Illustrative Example
  • Relation: CarDB(Make, Model, Price, Year)
  • Imprecise query
  • Q :- CarDB(Model like "Camry", Price like "10k")
  • Base query
  • Qpr :- CarDB(Model = "Camry", Price = "10k")
  • Base set Abs
  • Make = "Toyota", Model = "Camry", Price = "10k", Year = "2000"
  • Make = "Toyota", Model = "Camry", Price = "10k", Year = "2001"
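The step from imprecise query to base set can be sketched as follows, assuming an in-memory relation in the shape of the CarDB example (the operator representation and sample rows are illustrative):

```python
# Hedged sketch: turn an imprecise query (attribute, "like", value) into the
# base query Qpr by binding every "like" constraint with equality, then select
# exactly-matching tuples as the base set.

def base_set(imprecise_query, relation):
    """Tuples matching the base query: all 'like' constraints made equalities."""
    base_query = {attr: val for attr, (op, val) in imprecise_query.items()
                  if op == "like"}
    return [t for t in relation if all(t[a] == v for a, v in base_query.items())]

cardb = [
    {"Make": "Toyota", "Model": "Camry", "Price": "10k", "Year": 2000},
    {"Make": "Toyota", "Model": "Camry", "Price": "10k", "Year": 2001},
    {"Make": "Honda", "Model": "Accord", "Price": "12k", "Year": 2001},
]
q = {"Model": ("like", "Camry"), "Price": ("like", "10k")}
print(base_set(q, cardb))  # the two Camry tuples
```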

9
Obtaining Extended Set
  • Problem: Given the base set, find tuples from the
    database similar to the tuples in the base set.
  • Solution
  • Consider each tuple in the base set as a selection query,
  • e.g. Make = "Toyota", Model = "Camry", Price = "10k", Year = "2000"
  • Relax each such query to obtain similar precise queries,
  • e.g. Make = "Toyota", Model = "Camry", Year = "2000" (Price relaxed)
  • Execute and determine tuples having similarity
    above some threshold.
  • Challenge: Which attribute should be relaxed first?
  • Make? Model? Price? Year?
  • Solution: Relax the least important attribute first.
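The relaxation loop above can be sketched as a generator that drops attributes from a base-set tuple, least important first. The importance order here is assumed for illustration, not mined as AIMQ does:

```python
# Sketch of the relaxation step: treat a base-set tuple as a selection query
# and relax (drop) attribute bindings starting with the least important one.
# The importance order is an assumption, not the mined AFD-based order.

def relaxations(tuple_query, importance_order):
    """Yield progressively relaxed queries, dropping least-important attrs first.

    importance_order lists attributes from most to least important.
    """
    query = dict(tuple_query)
    for attr in reversed(importance_order):   # least important first
        query = {a: v for a, v in query.items() if a != attr}
        if query:
            yield dict(query)

base_tuple = {"Make": "Toyota", "Model": "Camry", "Price": "10k", "Year": 2000}
order = ["Model", "Make", "Year", "Price"]    # assumed: Price least important
for q in relaxations(base_tuple, order):
    print(q)
```

The first relaxed query drops only Price, matching the slide's example; each later query is strictly more general.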

10
Least Important Attribute
  • Definition: An attribute whose binding value, when
    changed, has minimal effect on the values binding
    other attributes.
  • Does not decide the values of other attributes
  • Its own value may depend on other attributes
  • E.g. changing/relaxing Price will usually not
    affect other attributes, but changing Model
    usually affects Price
  • Requires dependence information between attributes
    to decide their relative importance
  • Attribute dependence information is not provided by
    the sources
  • Learn it using Approximate Functional Dependencies and
    Approximate Keys
  • Approximate Functional Dependency (AFD)
  • X → A is an FD over r′, r′ ⊆ r
  • If error(X → A) = |r − r′| / |r| < 1, then X → A
    is an AFD over r.
  • Approximate in the sense that they are obeyed by
    a large percentage (but not all) of the tuples in
    the database
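The AFD error measure can be computed on a small relation as below. This is a sketch of the usual "fraction of tuples to delete" error (for each X-value, keep the most common A-value); the CarDB-style rows are illustrative:

```python
from collections import Counter, defaultdict

# Sketch of the AFD error measure: for X -> A, the minimal violating set r - r'
# keeps, per X-value, only the tuples with the most common A-value;
# error = removed / total.

def afd_error(rows, X, A):
    """Fraction of tuples that must be deleted to make X -> A an exact FD."""
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[x] for x in X)
        groups[key][row[A]] += 1
    kept = sum(counter.most_common(1)[0][1] for counter in groups.values())
    return 1 - kept / len(rows)

rows = [
    {"Make": "Toyota", "Model": "Camry"},
    {"Make": "Toyota", "Model": "Camry"},
    {"Make": "Toyota", "Model": "Corolla"},
    {"Make": "Honda", "Model": "Accord"},
]
print(afd_error(rows, ["Make"], "Model"))  # 0.25: one Toyota row violates Make -> Model
```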

11
Deciding Attribute Importance
  • Mine AFDs and Approximate Keys
  • Create a dependence graph using the AFDs
  • The graph is strongly connected, hence a topological
    sort is not possible
  • Using the approximate key with highest support,
    partition the attributes into
  • a deciding set
  • a dependent set
  • Sort the subsets using dependence and influence
    weights
  • Measure attribute importance from these weights
  • Attribute relaxation order: all non-key attributes
    first, then keys
  • Greedy multi-attribute relaxation

12
Query-Tuple Similarity
  • Tuples in extended set show different levels of
    relevance
  • Ranked according to their similarity to the
    corresponding tuples in base set using
  • n Count(Attributes(R)) and Wimp is the
    importance weight of the attribute
  • Euclidean distance as similarity for numerical
    attributes e.g. Price, Year
  • VSim semantic value similarity estimated by
    AIMQ for categorical attributes e.g. Make, Model
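The weighted-sum similarity can be sketched as below. The importance weights, the learned VSim table, and the numeric normalisation are all assumed placeholders, not values from the paper:

```python
# Sketch of query-tuple similarity as a weighted sum of attribute similarities.
# W_IMP and VSIM are illustrative stand-ins for the mined importance weights
# and the supertuple-based value similarities.

W_IMP = {"Make": 0.3, "Model": 0.4, "Price": 0.2, "Year": 0.1}
VSIM = {("Camry", "Corolla"): 0.6}            # assumed learned value similarity

def attr_sim(qval, tval):
    if isinstance(qval, (int, float)):        # numeric: distance-based similarity
        return 1 / (1 + abs(qval - tval))
    if qval == tval:
        return 1.0
    return VSIM.get((qval, tval), VSIM.get((tval, qval), 0.0))

def similarity(query, tup):
    """Sim(Q, t) = sum over query attributes of Wimp(Ai) * VSim(Q.Ai, t.Ai)."""
    return sum(W_IMP[a] * attr_sim(v, tup[a]) for a, v in query.items())

q = {"Model": "Camry", "Year": 2000}
t = {"Make": "Toyota", "Model": "Corolla", "Price": 6500, "Year": 2001}
print(round(similarity(q, t), 3))
```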

13
Categorical Value Similarity
  • Two values are semantically similar if they have a
    common context (an idea borrowed from NLP)
  • The context of a value is represented as a set of
    bags of co-occurring values, called a supertuple
  • Value similarity is estimated as the percentage of
    common (Attribute, Value) pairs
  • Measured as the Jaccard similarity among the
    supertuples representing the values

Supertuple for the concept Make = Toyota, ST(Q, Make = Toyota):
Model: { Camry: 3, Corolla: 4, ... }
Year:  { 2000: 6, 1999: 5, 2001: 2, ... }
Price: { 5995: 4, 6500: 3, 4000: 6 }
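The Jaccard similarity between supertuples can be sketched with bag (multiset) semantics, using min/max of the co-occurrence counts. The toy supertuples below are illustrative, not mined from CarDB:

```python
from collections import Counter

# Sketch of categorical value similarity: bag-Jaccard between the supertuples
# (bags of co-occurring values) of two attribute values, averaged over
# attributes. The supertuple contents are made up for illustration.

def bag_jaccard(a: Counter, b: Counter) -> float:
    inter = sum((a & b).values())   # elementwise min of multiplicities
    union = sum((a | b).values())   # elementwise max of multiplicities
    return inter / union if union else 0.0

def vsim(st1, st2) -> float:
    """Average bag-Jaccard over the attributes of two supertuples."""
    attrs = set(st1) | set(st2)
    return sum(bag_jaccard(st1.get(a, Counter()), st2.get(a, Counter()))
               for a in attrs) / len(attrs)

toyota = {"Model": Counter({"Camry": 3, "Corolla": 4}),
          "Year": Counter({2000: 6, 1999: 5, 2001: 2})}
honda = {"Model": Counter({"Accord": 5, "Civic": 2}),
         "Year": Counter({2000: 4, 2001: 3})}
print(vsim(toyota, honda))
```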
14
Empirical Evaluation
  • Goal
  • Test robustness of learned dependencies
  • Evaluate the effectiveness of the query
    relaxation and similarity estimation
  • Database
  • Used car database CarDB based on Yahoo Autos
  • CarDB( Make, Model, Year, Price, Mileage,
    Location, Color)
  • Populated using 100k tuples from Yahoo Autos
  • Census Database from UCI Machine Learning
    Repository
  • Populated using 45k tuples
  • Algorithms
  • AIMQ
  • RandomRelax: randomly picks an attribute to relax
  • GuidedRelax: uses the relaxation order determined
    using approximate keys and AFDs
  • ROCK: RObust Clustering using linKs (Guha et al.,
    ICDE 1999)
  • Computes neighbours and links between every pair of tuples
  • Neighbours: tuples similar to each other
  • Link: number of common neighbours between two
    tuples
  • Clusters tuples having common neighbours
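ROCK's neighbour and link computation can be sketched as follows; the similarity function and threshold are illustrative stand-ins, not the paper's settings:

```python
# Sketch of ROCK's building blocks: tuples i and j are neighbours if their
# similarity is at least theta; link(a, b) counts their common neighbours.
# The attribute-overlap similarity and theta below are assumptions.

def neighbours(tuples, sim, theta):
    n = len(tuples)
    return [{j for j in range(n) if j != i and sim(tuples[i], tuples[j]) >= theta}
            for i in range(n)]

def links(nbrs, a, b):
    """Number of common neighbours between tuples a and b."""
    return len(nbrs[a] & nbrs[b])

def overlap(t1, t2):
    return sum(t1[k] == t2[k] for k in t1) / len(t1)

data = [{"Make": "Toyota", "Year": 2000},
        {"Make": "Toyota", "Year": 2001},
        {"Make": "Toyota", "Year": 2000},
        {"Make": "Honda", "Year": 1999}]
nbrs = neighbours(data, overlap, theta=0.5)
print(links(nbrs, 0, 1))
```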

15
Robustness of Dependencies
Attribute dependence order and key quality are
unaffected by sampling
16
Robustness of Value Similarities
Value         Similar Values   25k    100k
Make=Kia      Hyundai          0.17   0.17
              Isuzu            0.15   0.15
              Subaru           0.13   0.13
Make=Bronco   Aerostar         0.19   0.21
              F-350            0      0.12
              Econoline Van    0.11   0.11
Year=1985     1986             0.16   0.16
              1984             0.13   0.14
              1987             0.12   0.12
17
Efficiency of Relaxation
Guided Relaxation
Random Relaxation
18
Accuracy over CarDB
  • 14 queries over 100K tuples
  • Similarity learned using 25k sample
  • Mean Reciprocal Rank (MRR): the reciprocal rank of
    the first relevant answer, averaged over all queries
  • Overall high MRR shows high relevance of the
    suggested answers
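The MRR metric used above can be computed as follows; the ranked answer lists and relevance sets are made-up illustrations:

```python
# Sketch of Mean Reciprocal Rank: for each query, take 1/rank of the first
# relevant answer (0 if none appears), then average over all queries.

def mean_reciprocal_rank(ranked_answers, relevant):
    """ranked_answers: per-query ranked id lists; relevant: per-query id sets."""
    rr = []
    for answers, rel in zip(ranked_answers, relevant):
        rr.append(next((1 / (i + 1) for i, a in enumerate(answers) if a in rel),
                       0.0))
    return sum(rr) / len(rr)

ranked = [["t3", "t1", "t7"], ["t2", "t5"]]   # illustrative answer lists
relevant = [{"t1"}, {"t9"}]                   # illustrative relevant ids
print(mean_reciprocal_rank(ranked, relevant))  # (1/2 + 0) / 2 = 0.25
```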

19
Accuracy over CensusDB
  • 1000 randomly selected tuples as queries
  • Overall high MRR for AIMQ shows higher relevance
    of suggested answers

20
AIMQ - Summary
  • An approach for answering imprecise queries over
    Web databases
  • Mine and use AFDs to determine attribute order
  • Domain independent semantic similarity estimation
    technique
  • Automatically compute attribute importance scores
  • Empirical evaluation shows
  • Efficiency and robustness of algorithms
  • Better performance than current approaches
  • High relevance of suggested answers
  • Domain independence
