1. Answering Imprecise Queries over Autonomous Web Databases

- Ullas Nambiar, Dept. of Computer Science, University of California, Davis
- Subbarao Kambhampati, Dept. of Computer Science, Arizona State University

5th April, ICDE 2006, Atlanta, USA
2. Dichotomy in Query Processing
- IR Systems
- User has an idea of what she wants
- User query captures the need to some degree
- Answers ranked by degree of relevance
- Databases
- User knows what she wants
- User query completely expresses the need
- Answers exactly matching query constraints
3. Why Support Imprecise Queries?

4. Others are following
5. What Does Supporting Imprecise Queries Mean?

- The problem: Given a conjunctive query Q over a relation R, find a set of tuples that will be considered relevant by the user:
  Ans(Q) = {x | x ∈ R, Relevance(Q, x) > c}
- Objectives
- Minimal burden on the end user
- No changes to existing database
- Domain independent
- Motivation
  - How far can we go with a relevance model estimated from the database itself?
  - Tuples represent real-world objects and relationships between them
  - Use the estimated relevance model to provide a ranked set of tuples similar to the query
6. Challenges
- Estimating Query-Tuple Similarity
- Weighted summation of attribute similarities
- Need to estimate semantic similarity
- Measuring Attribute Importance
- Not all attributes equally important
- Users cannot quantify importance
7. Our Solution: AIMQ
8. An Illustrative Example

- Relation: CarDB(Make, Model, Price, Year)
- Imprecise query
  - Q :- CarDB(Model like "Camry", Price like "10k")
- Base query
  - Qpr :- CarDB(Model = "Camry", Price = "10k")
- Base set Abs
  - Make = "Toyota", Model = "Camry", Price = "10k", Year = "2000"
  - Make = "Toyota", Model = "Camry", Price = "10k", Year = "2001"
9. Obtaining the Extended Set

- Problem: Given the base set, find tuples from the database similar to the tuples in the base set.
- Solution
  - Consider each tuple in the base set as a selection query,
    e.g. Make = "Toyota", Model = "Camry", Price = "10k", Year = "2000"
  - Relax each such query to obtain similar precise queries,
    e.g. Make = "Toyota", Model = "Camry", Price = *, Year = "2000"
  - Execute the relaxed queries and keep the tuples whose similarity is above some threshold.
- Challenge: Which attribute should be relaxed first?
  - Make? Model? Price? Year?
- Solution: Relax the least important attribute first.
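The greedy, least-important-first relaxation above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the relaxation order is assumed as input, and a real system would issue each relaxed query against the source database.

```python
def relaxed_queries(base_tuple, relax_order):
    """Greedy relaxation: drop bindings one at a time, least important
    attribute first, yielding progressively more general precise queries."""
    query = dict(base_tuple)
    for attr in relax_order[:-1]:  # always keep at least one binding
        del query[attr]            # relax (unbind) this attribute
        yield dict(query)

# Example: a base-set tuple from CarDB, with Price deemed least important
base = {"Make": "Toyota", "Model": "Camry", "Price": "10k", "Year": "2000"}
for q in relaxed_queries(base, ["Price", "Year", "Make", "Model"]):
    print(q)
```

The first relaxed query drops only Price; each later one is strictly more general, so cheap, near-exact matches are retrieved before broader ones.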
10. Least Important Attribute

- Definition: An attribute whose binding value, when changed, has minimal effect on the values binding the other attributes.
  - Does not decide the values of other attributes
  - Its own value may depend on other attributes
  - E.g. changing/relaxing Price usually does not affect the other attributes, but changing Model usually affects Price
- Requires dependence information between attributes to decide relative importance
  - Attribute dependence information is not provided by the sources
  - Learn it using Approximate Functional Dependencies (AFDs) and approximate keys
- Approximate Functional Dependency (AFD)
  - If X → A is an FD over some r′ ⊆ r and error(X → A) = |r − r′| / |r| < 1, then X → A is an AFD over r
  - Approximate in the sense that the dependency is obeyed by a large percentage (but not all) of the tuples in the database
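The AFD error can be computed directly from a table sample. The sketch below uses the standard g3-style measure (the minimum fraction of tuples that must be removed for the dependency to hold exactly), consistent with the |r − r′|/|r| definition above; the toy CarDB rows are illustrative.

```python
from collections import Counter, defaultdict

def afd_error(tuples, lhs, rhs):
    """error(lhs -> rhs): minimum fraction of tuples that must be
    removed so the FD holds exactly on the remaining tuples."""
    groups = defaultdict(Counter)
    for t in tuples:
        groups[tuple(t[a] for a in lhs)][t[rhs]] += 1
    # within each lhs-group, keep the majority rhs value; the rest violate the FD
    kept = sum(c.most_common(1)[0][1] for c in groups.values())
    return 1 - kept / len(tuples)

cars = [
    {"Make": "Toyota", "Model": "Camry"},
    {"Make": "Toyota", "Model": "Corolla"},
    {"Make": "Honda",  "Model": "Civic"},
]
print(afd_error(cars, ["Model"], "Make"))   # 0.0: Model -> Make holds exactly
print(afd_error(cars, ["Make"], "Model"))   # ~0.33: Make -> Model is only approximate
```

On this sample, Model → Make is an exact FD (error 0), while Make → Model holds for only two of three tuples, matching the intuition that Model usually decides Make but not vice versa.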
11. Deciding Attribute Importance

- Mine AFDs and approximate keys
- Create a dependence graph using the AFDs
  - The graph is strongly connected, hence a topological sort is not possible
- Using the approximate key with the highest support, partition the attributes into
  - a deciding set
  - a dependent set
- Sort the subsets using dependence and influence weights
- Measure attribute importance as a function of these dependence and influence weights
- Attribute relaxation order: all non-key attributes first, then key attributes
  - Greedy multi-attribute relaxation
12. Query-Tuple Similarity

- Tuples in the extended set show different levels of relevance
- They are ranked according to their similarity to the corresponding tuples in the base set:
  Sim(Q, t) = Σ_{i=1..n} Wimp_i × VSim(Q.A_i, t.A_i)
  where n = Count(Attributes(R)) and Wimp_i is the importance weight of attribute A_i
- Euclidean distance is used as the similarity for numerical attributes, e.g. Price, Year
- VSim is the semantic value similarity estimated by AIMQ for categorical attributes, e.g. Make, Model
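The weighted summation can be sketched as below. This is an assumption-laden illustration: the normalized numeric similarity stands in for the Euclidean measure, and the learned VSim values and importance weights are supplied as plain dictionaries.

```python
def tuple_similarity(query, tup, weights, vsim):
    """Weighted sum of per-attribute similarities: importance weights
    combine a numeric distance measure with learned categorical VSim.
    (A sketch; the numeric measure here approximates the Euclidean one.)"""
    score = 0.0
    for attr, w in weights.items():
        q, t = query[attr], tup[attr]
        if isinstance(q, (int, float)):
            # numeric attribute: distance turned into a [0, 1] similarity
            sim = 1.0 - abs(q - t) / max(abs(q), abs(t), 1)
        else:
            # categorical attribute: learned semantic value similarity
            sim = 1.0 if q == t else vsim.get((attr, q, t), 0.0)
        score += w * sim
    return score
```

With weights such as {"Model": 0.5, "Price": 0.3, "Year": 0.2}, an exact match scores 1.0 and a tuple differing only in Model is discounted by that attribute's VSim and importance weight.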
13. Categorical Value Similarity

- Two words are semantically similar if they have a common context (an idea from NLP)
- The context of a value is represented as a set of bags of co-occurring values, called a supertuple
- Value similarity: estimated as the percentage of common (Attribute, Value) pairs
  - Measured as the Jaccard similarity among the supertuples representing the values

Supertuple for the concept Make = Toyota, ST(Q_Make=Toyota):
  Model: Camry: 3, Corolla: 4, ...
  Year:  2000: 6, 1999: 5, 2001: 2
  Price: 5995: 4, 6500: 3, 4000: 6
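Since supertuples are bags of co-occurring values, the natural comparison is bag (multiset) Jaccard: intersection takes the minimum count per value, union the maximum. A minimal sketch, with illustrative counts rather than real Yahoo Autos data:

```python
from collections import Counter

def bag_jaccard(b1, b2):
    """Jaccard similarity between two bags: |intersection| / |union|,
    using multiset intersection (min counts) and union (max counts)."""
    union = sum((b1 | b2).values())
    return sum((b1 & b2).values()) / union if union else 0.0

def value_similarity(st1, st2):
    """Average bag-Jaccard over the attributes shared by two supertuples."""
    attrs = st1.keys() & st2.keys()
    return sum(bag_jaccard(st1[a], st2[a]) for a in attrs) / len(attrs)

# Supertuples built from values co-occurring with each Make (counts are made up)
toyota = {"Model": Counter({"Camry": 3, "Corolla": 4}),
          "Year":  Counter({2000: 6, 1999: 5, 2001: 2})}
honda  = {"Model": Counter({"Civic": 5, "Accord": 2}),
          "Year":  Counter({2000: 4, 1999: 3})}
print(value_similarity(toyota, honda))
```

The two makes share no models, so the Model bags contribute 0, while the overlapping Year bags give a partial score; the result lies strictly between 0 and 1.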
14. Empirical Evaluation

- Goal
  - Test the robustness of the learned dependencies
  - Evaluate the effectiveness of the query relaxation and similarity estimation
- Databases
  - Used-car database CarDB based on Yahoo Autos
    - CarDB(Make, Model, Year, Price, Mileage, Location, Color)
    - Populated with 100k tuples from Yahoo Autos
  - Census database from the UCI Machine Learning Repository
    - Populated with 45k tuples
- Algorithms
  - AIMQ
    - RandomRelax: randomly picks the attribute to relax
    - GuidedRelax: uses the relaxation order determined from approximate keys and AFDs
  - ROCK: RObust Clustering using linKs (Guha et al., ICDE 1999)
    - Computes neighbours and links between every pair of tuples
      - Neighbours: tuples similar to each other
      - Link: the number of common neighbours between two tuples
    - Clusters tuples having common neighbours
15. Robustness of Dependencies

The attribute dependence order and key quality are unaffected by sampling.
16. Robustness of Value Similarities

| Value       | Similar Values | 25k  | 100k |
|-------------|----------------|------|------|
| Make=Kia    | Hyundai        | 0.17 | 0.17 |
|             | Isuzu          | 0.15 | 0.15 |
|             | Subaru         | 0.13 | 0.13 |
| Make=Bronco | Aerostar       | 0.19 | 0.21 |
|             | F-350          | 0    | 0.12 |
|             | Econoline Van  | 0.11 | 0.11 |
| Year=1985   | 1986           | 0.16 | 0.16 |
|             | 1984           | 0.13 | 0.14 |
|             | 1987           | 0.12 | 0.12 |
17. Efficiency of Relaxation

Plots compare Guided Relaxation against Random Relaxation.
18. Accuracy over CarDB

- 14 queries over 100k tuples
- Similarity learned using a 25k-tuple sample
- Mean Reciprocal Rank (MRR) estimated as the average over queries of 1/rank of the first relevant answer
- The overall high MRR shows the high relevance of the suggested answers
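MRR is a standard retrieval metric and easy to reproduce. The sketch below assumes each query contributes the rank of its first user-relevant answer, with 0 marking a query for which no relevant answer was returned:

```python
def mean_reciprocal_rank(first_relevant_ranks):
    """MRR over a query set: average of 1/rank of the first relevant
    answer per query; rank 0 means no relevant answer was returned."""
    n = len(first_relevant_ranks)
    return sum(1.0 / r for r in first_relevant_ranks if r > 0) / n

print(mean_reciprocal_rank([1, 2, 1, 4]))  # (1 + 0.5 + 1 + 0.25) / 4 = 0.6875
```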
19. Accuracy over CensusDB

- 1000 randomly selected tuples used as queries
- The overall high MRR for AIMQ shows the higher relevance of its suggested answers
20. AIMQ - Summary

- An approach for answering imprecise queries over Web databases
- Mines and uses AFDs to determine the attribute relaxation order
- Domain-independent semantic similarity estimation technique
- Automatically computes attribute importance scores
- Empirical evaluation shows
  - Efficiency and robustness of the algorithms
  - Better performance than current approaches
  - High relevance of the suggested answers
  - Domain independence