Title: Supporting Queries with Imprecise Constraints
1 Supporting Queries with Imprecise Constraints
- Ullas Nambiar, Dept. of Computer Science, University of California, Davis
- Subbarao Kambhampati, Dept. of Computer Science, Arizona State University
18th July, AAAI-06, Boston, USA
2 Dichotomy in Query Processing
- IR Systems
  - User has an idea of what she wants
  - User query captures the need to some degree
  - Answers ranked by degree of relevance
- Databases
  - User knows what she wants
  - User query completely expresses the need
  - Answers exactly matching query constraints
3 Why Support Imprecise Queries?
4 Others are following
5 What does Supporting Imprecise Queries Mean?
- The Problem: Given a conjunctive query Q over a relation R, find a set of tuples that will be considered relevant by the user.
  - $Ans(Q) = \{x \mid x \in R,\ Rel(x \mid Q, U) > c\}$
- Constraints
  - Minimal burden on the end user
  - No changes to existing database
  - Domain independent
Relevance Assessment
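The definition of Ans(Q) can be read as a threshold filter over R. The following is a minimal sketch under stated assumptions: `toy_rel` is a hypothetical stand-in for Rel(x|Q,U) that ignores the user U entirely, and the threshold `c` is an arbitrary illustrative value; neither comes from the talk.

```python
def toy_rel(x, query):
    """Hypothetical relevance: fraction of query constraints x matches exactly."""
    return sum(x.get(attr) == value for attr, value in query.items()) / len(query)

def answers(relation, query, rel, c):
    """Return the tuples x in relation with rel(x, query) > c, most relevant first."""
    scored = [(rel(x, query), x) for x in relation]
    kept = [(score, x) for score, x in scored if score > c]
    kept.sort(key=lambda pair: pair[0], reverse=True)  # stable sort keeps input order on ties
    return [x for _, x in kept]
```

The point of the sketch is only the shape of the problem: everything interesting lives inside the relevance function, which is what the rest of the talk is about estimating.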
6 Assessing Relevance Function Rel(x|Q,U)
- We looked at a variety of non-intrusive relevance assessment methods
- Basic idea is to learn the relevance function for the user population rather than single users
- Methods
  - From the analysis of the (sample) data itself
    - Allows us to understand the relative importance of attributes, and the similarity between the values of an attribute (ICDE 2006; WWW 2005 poster)
  - From the analysis of query logs
    - Allows us to identify related queries, and then throw in their answers (WIDM 2003; WebDB 2004)
  - From co-click patterns
    - Allows us to identify similarity based on user click patterns (Under Review)
7 Our Solution: AIMQ
8 The AIMQ Approach
For the special case of empty query, we start with a relaxation that uses AFD analysis
9 An Illustrative Example
- Relation: CarDB(Make, Model, Price, Year)
- Imprecise query
  - Q :- CarDB(Model like Camry, Price like 10k)
- Base query
  - Qpr :- CarDB(Model = Camry, Price = 10k)
- Base set Abs
  - Make = Toyota, Model = Camry, Price = 10k, Year = 2000
  - Make = Toyota, Model = Camry, Price = 10k, Year = 2001
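The step from the imprecise query Q to the base query Qpr and then to the base set can be sketched as follows. This is an illustration only: `CARDB` is a hypothetical in-memory stand-in for the relation, the query encoding as (attribute, operator, value) triples is an assumption, and `to_base_query` and `select` are made-up helper names.

```python
# Hypothetical in-memory stand-in for CarDB; prices in dollars.
CARDB = [
    {"Make": "Toyota", "Model": "Camry", "Price": 10000, "Year": 2000},
    {"Make": "Toyota", "Model": "Camry", "Price": 10000, "Year": 2001},
    {"Make": "Toyota", "Model": "Camry", "Price": 12000, "Year": 2002},
    {"Make": "Honda", "Model": "Accord", "Price": 10000, "Year": 2000},
]

def to_base_query(imprecise_query):
    """Map Q to Qpr by tightening every 'like' constraint to equality."""
    return {attr: value for attr, _op, value in imprecise_query}

def select(relation, query):
    """Return the tuples that satisfy every equality constraint in query."""
    return [t for t in relation if all(t[a] == v for a, v in query.items())]

Q = [("Model", "like", "Camry"), ("Price", "like", 10000)]
Qpr = to_base_query(Q)
base_set = select(CARDB, Qpr)  # the two Camry tuples priced at 10k
```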
10 Obtaining Extended Set
- Problem: Given base set, find tuples from database similar to tuples in base set.
- Solution
  - Consider each tuple in base set as a selection query.
    - e.g. Make = Toyota, Model = Camry, Price = 10k, Year = 2000
  - Relax each such query to obtain similar precise queries.
    - e.g. Make = Toyota, Model = Camry, Year = 2000 (Price relaxed)
  - Execute and determine tuples having similarity above some threshold.
- Challenge: Which attribute should be relaxed first?
  - Make? Model? Price? Year?
- Solution: Relax least important attribute first.
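The relax-execute-filter loop described on this slide can be sketched as below. It is a simplified illustration, not the AIMQ implementation: it relaxes only one attribute at a time, the relaxation `order`, the similarity function `sim`, and the `threshold` are all supplied by the caller, and every name here is hypothetical.

```python
def relaxations(tuple_query, order):
    """Yield one relaxed query per attribute, dropping attributes in the given order."""
    for attr in order:
        yield {a: v for a, v in tuple_query.items() if a != attr}

def matches(t, query):
    """True if tuple t satisfies every equality constraint in query."""
    return all(t[a] == v for a, v in query.items())

def extended_set(base_tuple, relation, order, sim, threshold):
    """Collect tuples returned by relaxed queries whose similarity exceeds threshold."""
    found = []
    for query in relaxations(base_tuple, order):
        for t in relation:
            if t == base_tuple or t in found:
                continue  # skip the base tuple itself and duplicates
            if matches(t, query) and sim(base_tuple, t) > threshold:
                found.append(t)
    return found
```

Because the least important attribute is tried first, the earliest answers come from the relaxations that disturb the query least.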
11 Least Important Attribute
- Definition: An attribute whose binding value, when changed, has minimal effect on values binding other attributes.
  - Does not decide values of other attributes
  - Value may depend on other attributes
- E.g. changing/relaxing Price will usually not affect other attributes, but changing Model usually affects Price
- Dependence between attributes is useful to decide relative importance
  - Approximate Functional Dependencies & Approximate Keys
  - Approximate in the sense that they are obeyed by a large percentage (but not all) of tuples in the database
  - Can use TANE, an algorithm by Huhtala et al., 1999
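To make "obeyed by a large percentage (but not all) of tuples" concrete, the sketch below computes the g3 error of a candidate dependency: the smallest fraction of tuples that would have to be removed for the dependency to hold exactly. This is a brute-force illustration on a toy table, not TANE itself, which searches the space of dependencies far more efficiently; the function name and the sample rows are hypothetical.

```python
from collections import Counter, defaultdict

def g3_error(rows, lhs, rhs):
    """Fraction of rows to remove so that the dependency lhs -> rhs holds exactly."""
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[a] for a in lhs)
        groups[key][row[rhs]] += 1
    # Within each lhs group, keep only the rows carrying the most common rhs value.
    kept = sum(max(counts.values()) for counts in groups.values())
    return 1 - kept / len(rows)

rows = [
    {"Model": "Camry", "Make": "Toyota", "Price": 10000},
    {"Model": "Camry", "Make": "Toyota", "Price": 9000},
    {"Model": "Camry", "Make": "Honda", "Price": 8000},   # a data-entry error
    {"Model": "Accord", "Make": "Honda", "Price": 10000},
]
model_to_make = g3_error(rows, ["Model"], "Make")    # small error: Model nearly decides Make
model_to_price = g3_error(rows, ["Model"], "Price")  # larger error: Model does not decide Price
```

A dependency counts as an AFD when its error falls below a chosen threshold; an approximate key can be treated the same way, with the error measuring how far a set of attributes is from uniquely identifying tuples.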
12 Deciding Attribute Importance
- Mine AFDs and Approximate Keys
- Create dependence graph using AFDs
  - Strongly connected, hence a topological sort is not possible
- Using Approximate Key with highest support, partition attributes into
  - Deciding set
  - Dependent set
- Sort the subsets using dependence and influence weights
- Measure attribute importance as
- Attribute relaxation order is all non-keys first, then keys
- Greedy multi-attribute relaxation
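The "non-keys first, then keys" ordering can be sketched as follows. This is only an illustration of the partition-then-sort idea: the approximate key and the per-attribute weights are hypothetical inputs (the slide's importance formula is not preserved in this transcript), and a single `weight` map stands in for the separate dependence and influence weights.

```python
def relaxation_order(attributes, approx_key, weight):
    """Order attributes for relaxation: dependent (non-key) set first, then deciding (key) set.

    Within each set, attributes with smaller weight are relaxed earlier.
    """
    dependent = [a for a in attributes if a not in approx_key]
    deciding = [a for a in attributes if a in approx_key]
    by_weight = lambda a: weight[a]
    return sorted(dependent, key=by_weight) + sorted(deciding, key=by_weight)

# Hypothetical inputs for CarDB(Make, Model, Price, Year).
order = relaxation_order(
    ["Make", "Model", "Price", "Year"],
    approx_key={"Model", "Year"},
    weight={"Make": 0.3, "Model": 0.4, "Price": 0.1, "Year": 0.2},
)
```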
13 Tuple Similarity
- Tuples obtained after relaxation are ranked according to their similarity to the corresponding tuples in base set
  - where $W_i$ are normalized influence weights, $\sum_i W_i = 1$, $i = 1$ to $|Attributes(R)|$
- Value Similarity
  - Euclidean for numerical attributes, e.g. Price, Year
  - Concept Similarity for categorical attributes, e.g. Make, Model
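The slide's similarity formula is not preserved in this transcript. A weighted sum of per-attribute similarities is one reading consistent with normalized weights that sum to 1, and that is what the sketch below assumes. The normalization of numeric distance by a per-attribute `scale`, the exact-match `categorical_sim` placeholder, and all names are assumptions for illustration.

```python
def numeric_sim(a, b, scale):
    """1 minus the absolute difference normalized by scale, clipped to [0, 1]."""
    return max(0.0, 1.0 - abs(a - b) / scale)

def tuple_similarity(t1, t2, weights, categorical_sim, scales):
    """Weighted sum of attribute similarities; weights are assumed to sum to 1."""
    total = 0.0
    for attr, w in weights.items():
        if attr in scales:   # numerical attribute
            s = numeric_sim(t1[attr], t2[attr], scales[attr])
        else:                # categorical attribute
            s = categorical_sim(attr, t1[attr], t2[attr])
        total += w * s
    return total

def exact_match(attr, v1, v2):
    """Placeholder categorical similarity: 1 for identical values, else 0."""
    return 1.0 if v1 == v2 else 0.0
```

In practice the categorical placeholder would be replaced by a learned value similarity, so that, for example, two different models can still score above zero.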
14 Categorical Value Similarity
- Two words are semantically similar if they have a common context (from NLP)
- Context of a value represented as a set of bags of co-occurring values, called a Supertuple
- Value Similarity: Estimated as the percentage of common Attribute-Value pairs
  - Measured as the Jaccard Similarity among supertuples representing the values

ST(Q_{Make=Toyota})
Model | Camry: 3, Corolla: 4, ...
Year  | 2000: 6, 1999: 5, 2001: 2, ...
Price | 5995: 4, 6500: 3, 4000: 6

Supertuple for Concept Make = Toyota
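A supertuple can be represented as one bag of co-occurring values per attribute, and two values compared by Jaccard similarity over those bags. The sketch below is an illustration under assumptions: the Toyota counts are the ones shown on the slide, the Honda supertuple is invented for the example, and averaging the per-attribute scores with equal weight is a simplification chosen here, not something the slide states.

```python
from collections import Counter

def bag_jaccard(a, b):
    """Jaccard similarity under bag semantics: sum of min counts over sum of max counts."""
    inter = sum((a & b).values())  # Counter & keeps the minimum count per key
    union = sum((a | b).values())  # Counter | keeps the maximum count per key
    return inter / union if union else 0.0

def supertuple_similarity(st1, st2):
    """Unweighted mean of per-attribute bag Jaccard scores (an assumed combination rule)."""
    attrs = set(st1) | set(st2)
    scores = [bag_jaccard(st1.get(x, Counter()), st2.get(x, Counter())) for x in attrs]
    return sum(scores) / len(attrs)

toyota = {
    "Model": Counter({"Camry": 3, "Corolla": 4}),
    "Year": Counter({2000: 6, 1999: 5, 2001: 2}),
    "Price": Counter({5995: 4, 6500: 3, 4000: 6}),
}
honda = {  # hypothetical supertuple, for illustration only
    "Model": Counter({"Accord": 5, "Civic": 2}),
    "Year": Counter({2000: 3, 1999: 5, 2001: 5}),
    "Price": Counter({5995: 2, 6500: 3, 7000: 2}),
}
make_sim = supertuple_similarity(toyota, honda)
```

Here the two makes share no models, so all of their similarity comes from overlapping years and prices.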
15 Value Similarity Graph
16 Empirical Evaluation
- Goal
  - Evaluate the effectiveness of the query relaxation and similarity estimation
- Database
  - Used car database CarDB based on Yahoo Autos
    - CarDB(Make, Model, Year, Price, Mileage, Location, Color)
    - Populated using 100k tuples from Yahoo Autos
  - Census Database from UCI Machine Learning Repository
    - Populated using 45k tuples
- Algorithms
  - AIMQ
    - RandomRelax: randomly picks attribute to relax
    - GuidedRelax: uses relaxation order determined using approximate keys and AFDs
  - ROCK: RObust Clustering using linKs (Guha et al., ICDE 1999)
    - Compute Neighbours and Links between every tuple
      - Neighbour: tuples similar to each other
      - Link: number of common neighbours between two tuples
    - Cluster tuples having common neighbours
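The neighbour and link notions used by ROCK can be sketched as below. This covers only those two definitions, not the clustering step that merges tuples based on links. The similarity function and threshold `theta` are caller-supplied assumptions, and excluding a tuple from its own neighbour set is a simplifying choice made for this example.

```python
from itertools import combinations

def neighbours(points, sim, theta):
    """Map each point index to the set of other indices with similarity >= theta."""
    nbrs = {i: set() for i in range(len(points))}
    for i, j in combinations(range(len(points)), 2):
        if sim(points[i], points[j]) >= theta:
            nbrs[i].add(j)
            nbrs[j].add(i)
    return nbrs

def links(nbrs):
    """Number of common neighbours for every pair of point indices."""
    return {(i, j): len(nbrs[i] & nbrs[j]) for i, j in combinations(sorted(nbrs), 2)}

def set_jaccard(a, b):
    """Plain set Jaccard similarity, used here as a toy similarity function."""
    return len(a & b) / len(a | b)
```

Two tuples with many links sit in the same dense neighbourhood even if their direct similarity is modest, which is the intuition behind clustering on links rather than on pairwise similarity alone.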
17 Efficiency of Relaxation
[Figure panels: Guided Relaxation; Random Relaxation]
18 Accuracy over CarDB
- 14 queries over 100K tuples
- Similarity learned using 25k sample
- Mean Reciprocal Rank (MRR) estimated as
- Overall high MRR shows high relevance of
suggested answers
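The formula that followed "estimated as" on this slide is not preserved in the transcript. For orientation only, the sketch below computes the textbook Mean Reciprocal Rank: the average, over queries, of the reciprocal of the rank at which the first relevant answer appears. The evaluation reported here may have used a variant of this definition, so treat the code as background rather than as the measure behind the slide's results.

```python
def mean_reciprocal_rank(first_relevant_ranks):
    """Textbook MRR over 1-based ranks; None means no relevant answer was returned."""
    if not first_relevant_ranks:
        return 0.0
    total = sum(1.0 / r if r else 0.0 for r in first_relevant_ranks)
    return total / len(first_relevant_ranks)
```

A value close to 1 means the relevant answer is usually at or near the top of the ranked list.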
19 Handling Imprecision & Incompleteness
- Incompleteness in data
  - Databases are being populated by
    - Entry by lay people
    - Automated extraction
  - E.g. entering an Accord without mentioning Honda
- Imprecision in queries
  - Queries posed by lay users
    - Who combine querying and browsing
General Solution: Expected Relevance Ranking
Challenge: Automated & Non-intrusive assessment of Relevance and Density functions
20 Handling Imprecision & Incompleteness