Title: CS246
1CS246
2Mind Your Vocabulary
- Q: What is the problem?
- A: How to integrate heterogeneous sources whose schemas and capabilities differ
3Bestbookbuys.com
[Figure: Mediator over the sources Amazon.com and bn.com. Query labels: [fn = Tom] ∧ [ln = Clancy] and [au = "Clancy, Tom"]]
4Framework
- User expresses a query using a mediator schema
- Mediator translates the query to source-supported queries
- Mediator collects and post-processes results from the sources
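[Figure: the user query [fn = Tom] ∧ [ln = Clancy] enters the Mediator, which queries Amazon.com and bn.com. Query labels toward the sources: [fn = Tom] ∧ [ln = Clancy] and [au = "Clancy, Tom"]]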
5Difference From Previous Studies?
- Heterogeneous attributes
- Different vocabularies
- Semantic translation necessary
- Previous studies assumed homogeneous attributes for all sources
- Complex Boolean queries
- Not just conjunctive queries
6Main Challenge
- How to best translate a query when the mediator and the source use different models/schemas?
- Author ↔ lastname, firstname
- Western calendar ↔ Chinese lunar calendar
7Query Translation Example
- Q: For the above schema, what is the best translation for [last = Clancy] ∧ [year = 1998] ∧ [month = Jan]?
- A: [author = Clancy] ∧ [date = "winter, 1998"]?
8More Translation Examples
- More translations for the same schemas
- [publisher = p] ∧ [last = l] ∧ [first = f] → [publisher = p] ∧ [author = "l, f"]
- [title = t] ∧ [last = l] ∧ [first = f] → [title = t] ∧ [author = "l, f"]
- Do we have to translate every possible query manually? Is it necessary to have separate rules for the above translations? Can the system automatically translate queries?
- Any idea?
9Observations
- The system cannot figure out [last = l] ∧ [first = f] → [author = "l, f"] on its own
- No semantic knowledge
- User needs to provide these types of mappings
- There seem to exist basic mappings
- However, the system may compose a correct translation from basic translations
- [last = l] ∧ [first = f] → [author = "l, f"]
- [year = yy] ∧ [month = Jan] → [date = "spring, yy"]
10Framework
Mediator Context
Source Context
- Human expert provides a set of basic rules
- [last = l] ∧ [first = f] → [author = "l, f"]
- [year = yy] ∧ [month = Jan] → [date = "spring, yy"]
Basic rules
11Framework
- Given a query, the system automatically
translates the query using the basic rules
Basic rules
Qm: [first = Tom] ∧ [last = Clancy]
Qs: [author = "Clancy, Tom"]
Translation Algorithm
12Advantage of the Proposed Framework
- Minimizes manual intervention
- Human input only for the initial rule writing
- Can translate any query
- Not just template queries
13Questions
- How do we know whether a translation is good or correct?
- What basic rules are necessary?
- Do we need a rule for [last = l] ∧ [first = f]?
- How do we translate?
- Algorithm for good translation?
14Good Translation?
- Q: Why do we think these are good translations?
- [last = Clancy] ∧ [first = Tom] → [author = "Clancy, Tom"]
- [year = 2002] ∧ [month = Jan] → [date = "winter, 2002"]
- A: Results for the translated queries are close to those of the original queries
15Minimal Subsuming (or Containing) Translation
- Definition of closeness in the paper
- Q: original query → S(Q): translated query
- We also use Q and S(Q) to denote their result sets
- S(Q): the minimal superset of Q expressible in the source's terms
16Minimal Subsuming Translation
- Find the minimal subsuming translation of the original query
- Filter out false positives by applying a filtering condition at the mediator
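A minimal sketch of the translate-then-filter idea, assuming a toy in-memory catalog and a source that only understands season and year. The names `BOOKS`, `SEASON`, `source_search`, and `mediator_search`, the records, and the month-to-season mapping are all hypothetical and are not taken from the paper.

```python
# Toy source catalog; records and field names are made up for illustration.
BOOKS = [
    {"title": "A", "year": 2002, "month": "Jan"},
    {"title": "B", "year": 2002, "month": "Feb"},
    {"title": "C", "year": 2002, "month": "Jul"},
    {"title": "D", "year": 2001, "month": "Jan"},
]

# Assumed month-to-season vocabulary of the source.
SEASON = {
    "Dec": "winter", "Jan": "winter", "Feb": "winter",
    "Mar": "spring", "Apr": "spring", "May": "spring",
    "Jun": "summer", "Jul": "summer", "Aug": "summer",
    "Sep": "fall", "Oct": "fall", "Nov": "fall",
}


def source_search(season, year):
    """Hypothetical source interface: it only accepts (season, year)."""
    return [b for b in BOOKS
            if SEASON[b["month"]] == season and b["year"] == year]


def mediator_search(year, month):
    """Answer [year] AND [month] using a source that only knows seasons."""
    # S(Q): the subsuming query, expressed in the source's vocabulary.
    superset = source_search(SEASON[month], year)
    # Filter: re-apply the original constraints to drop false positives.
    return [b for b in superset
            if b["year"] == year and b["month"] == month]


superset_titles = [b["title"] for b in source_search("winter", 2002)]
answer_titles = [b["title"] for b in mediator_search(2002, "Jan")]
print(superset_titles, answer_titles)
```

Because the source query returns a superset, the mediator-side filter can only remove rows, so the final answer has no false negatives.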
17Any Alternative for Closeness?
- What about maximal subsumed translation?
- Definition used in previous studies
- Maybe a good definition when the result is large or filtering is impossible
18Any Alternative for Closeness?
- Consider both false positives and false negatives
- Maximize |S(Q) ∩ Q| / |S(Q) ∪ Q|
- Other definitions possible depending on scenario
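As an illustration, the ratio can be computed over result sets. This sketch assumes the measure is intersection over union (Jaccard similarity) of the original and translated results; the function name `closeness` and the sample book ids are hypothetical.

```python
def closeness(original, translated):
    """Return |S(Q) ∩ Q| / |S(Q) ∪ Q| for two result sets."""
    original, translated = set(original), set(translated)
    union = original | translated
    if not union:
        return 1.0  # two empty results are treated as identical
    return len(original & translated) / len(union)


q = {"b1", "b2", "b3"}                 # results of the original query
subsuming = {"b1", "b2", "b3", "b4"}   # one false positive
subsumed = {"b1", "b2"}                # one false negative

print(closeness(q, subsuming))
print(closeness(q, subsumed))
```

Under this measure, a translation is penalized for extra results and for missing results alike, unlike the minimal subsuming definition, which forbids missing results outright.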
19Questions
- How do we know whether a translation is good or correct?
- Minimal subsuming translation
- What basic rules are necessary?
- Do we need a rule for [last = l] ∧ [first = f]?
- How do we translate?
- Algorithm for good translation?
20Three Main Concepts
- Query Separability
- Query Safety
- Cross matching
21Query Separability
- Q = [ln = Clancy] ∧ [fn = Tom] ∧ [p = Wiley]
- We still get the minimal subsuming translation if we separately translate
- [ln = Clancy] ∧ [fn = Tom] and [p = Wiley]
- Q = C1 ⊙ C2 ⊙ C3 (⊙ is ∧ or ∨) is separable if S(Q) = S(C1) ⊙ S(C2) ⊙ S(C3)
22Disjunction Separability Theorem [CGM96]
- Disjunctions are always separable
- Q = C1 ∨ C2 ∨ C3 ⇒ S(Q) = S(C1) ∨ S(C2) ∨ S(C3) for any C1, C2 and C3
- Assuming minimal subsuming translation semantics
- Implication
- Basic rules are necessary only for conjunctions
- e.g., c1 ∧ c2, but not c1 ∨ c2
- Why?
- Any complex query can be transformed to DNF
- Significant simplification for a rule writer
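To see why rules for conjunctions suffice, here is a small sketch of DNF conversion: every disjunct of the result is a simple conjunction, which can then be translated on its own. The tuple-based tree encoding, the function name `to_dnf`, and the value "Thomas" are assumptions made for this example.

```python
from itertools import product


def to_dnf(q):
    """Return the DNF of q as a list of conjunctions (tuples of leaf constraints).

    A query is either a leaf string or a tuple ("and" | "or", child, child, ...).
    """
    if isinstance(q, str):              # leaf constraint
        return [(q,)]
    op, *children = q
    parts = [to_dnf(c) for c in children]
    if op == "or":                      # OR: concatenate the children's disjuncts
        return [conj for part in parts for conj in part]
    if op == "and":                     # AND: distribute over the children's disjuncts
        return [sum(combo, ()) for combo in product(*parts)]
    raise ValueError(f"unknown operator: {op}")


query = ("and", ("or", "fn = Tom", "fn = Thomas"), "ln = Clancy")
dnf = to_dnf(query)
print(dnf)
```

Each tuple in `dnf` is a conjunction such as `("fn = Tom", "ln = Clancy")`, which is exactly the shape the basic rules are written for.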
23Basic Rules
- Simple conjunction of constraints
- Separability of conjunctions is determined by a human expert
- e.g., a rule for [ln] ∧ [fn], but not for [ln] ∧ [publisher]
- User-provided basic rules are assumed to be sound and complete
- Soundness: All mappings are correct (minimal subsuming translation)
- Completeness: Contains all inseparable simple conjunctions
24Questions
- How do we know whether a translation is good or correct?
- What basic rules are necessary?
- Do we need a rule for [last = l] ∧ [first = f]?
- How do we translate?
- Algorithm for good translation?
25Translation Algorithm
- Simple conjunction query
- Step 1: Find all matching rules
Q: [ln = l] ∧ [fn = f] ∧ [p = p]
Rules:
[ln = l] → [au = l]
[ln = l] ∧ [fn = f] → [au = "l, f"]
[p = p] → [p = p]
26Translation Algorithm
- Simple conjunction query
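- Step 2: Remove subset matching
- Superset matching is more precise
Q: [ln = l] ∧ [fn = f] ∧ [p = p]
Rules:
[ln = l] → [au = l]
[ln = l] ∧ [fn = f] → [au = "l, f"]
[p = p] → [p = p]
Matchings: [au = l] (subset matching, removed), [au = "l, f"], [p = p]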
27Translation Algorithm
- Simple conjunction query
- Step 3: Generate translated query
Q: [ln = l] ∧ [fn = f] ∧ [p = p]
Rules:
[ln = l] → [au = l]
[ln = l] ∧ [fn = f] → [au = "l, f"]
[p = p] → [p = p]
Translated query: [au = "l, f"] ∧ [p = p]
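The three steps for a simple conjunction can be sketched as follows. This is a simplified illustration, not the paper's implementation: a query is assumed to be a dict from attribute to value, a rule is a pair of (attributes it matches, function that emits the source constraint), and the names `RULES` and `translate_simple_conjunction` are hypothetical.

```python
# Each rule: (mediator attributes it matches, function building the source constraint).
RULES = [
    ({"ln"}, lambda q: f'au = {q["ln"]}'),
    ({"ln", "fn"}, lambda q: f'au = "{q["ln"]}, {q["fn"]}"'),
    ({"p"}, lambda q: f'p = {q["p"]}'),
]


def translate_simple_conjunction(query, rules=RULES):
    """Translate a conjunction of attribute = value constraints."""
    present = set(query)
    # Step 1: find all rules whose attributes all occur in the query.
    matchings = [(attrs, emit) for attrs, emit in rules if attrs <= present]
    # Step 2: drop a matching when another matching covers a strict superset.
    kept = [(attrs, emit) for attrs, emit in matchings
            if not any(attrs < other for other, _ in matchings)]
    # Step 3: conjoin the source constraints emitted by the remaining matchings.
    return " AND ".join(emit(query) for _, emit in kept)


result = translate_simple_conjunction({"ln": "Clancy", "fn": "Tom", "p": "Wiley"})
print(result)
```

Dropping the `ln`-only matching in step 2 is what keeps the translation minimal: the author constraint built from both names is strictly more precise than the one built from the last name alone.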
28Translation Algorithm
Q
29Solution 1 (Algorithm DNF)
- Convert to DNF and translate
- Disjunctions are always separable
- We can individually translate each
disjunct
Q
30What's Wrong with DNF?
- DNF conversion is exponential
- DNF parse tree is not compact
- Global conversion often not necessary
- Translation of C3 is independent of others
[Figure: query tree with conjuncts C1, C2, C3; leaf labels x: fn, y: fn, z: ln ...; p ... is marked independent]
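A quick back-of-the-envelope check of the blow-up, assuming a query that is an AND of ten two-way ORs. The helper name `dnf_disjunct_count` and the chosen sizes are illustrative only.

```python
from math import prod


def dnf_disjunct_count(or_sizes):
    """Disjuncts produced when an AND of ORs (with these fan-outs) is fully distributed."""
    return prod(or_sizes)


or_sizes = [2] * 10                        # ten conjuncts, each a two-way OR
leaves_before = sum(or_sizes)              # constraints in the original tree
disjuncts = dnf_disjunct_count(or_sizes)
leaves_after = disjuncts * len(or_sizes)   # each disjunct holds one leaf per conjunct
print(leaves_before, disjuncts, leaves_after)
```

The original tree has 20 leaf constraints, while its DNF has 1024 disjuncts and 10240 leaves, which is why rewriting the whole query up front is wasteful when most conjuncts are independent.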
31Conjunction Partitioning
- Partition conjuncts into independent groups
- Translate each group separately
- By rewriting local groups
- Top level AND of C3 is preserved.
[Figure: query tree with conjuncts C1, C2, C3; leaf labels x: fn, y: fn, z: ln ...; p ... is marked independent]
32Independent Groups?
- Q: How do we know G1 and G2 are independent?
- A: Q = G1 ∧ G2 is separable
- Q: How do we know Q = G1 ∧ G2 is separable?
33Safety Condition
- Query separability is difficult to check directly
- Safety condition: A practical way to check query separability
- Sufficient condition for query separability
- But not a necessary condition
34Safety Condition for Simple Conjunction
- M(Q): Matching rules for Q
- Q = G1 ∧ G2
- G1 and G2 are simple conjunctions
- G1 = C1 ∧ C2, G2 = C3 ∧ C4
- Q is safe iff M(Q) = M(G1) ∪ M(G2)
- That is, Q is safe if there is no cross matching among G1 and G2
- Cross matching: a rule that matches some constraints in G1 and some constraints in G2
- Example
- G1 = [fn = f1] ∧ [fn = f2], G2 = [ln = ln]
- Q = G1 ∧ G2 is unsafe: cross matching of [fn] ∧ [ln] → [au]
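The safety test M(Q) = M(G1) ∪ M(G2) can be sketched directly. As a simplifying assumption, each group is modeled only by the set of attribute names it constrains, and a rule matches when all of its attributes are present; `RULE_PATTERNS`, `matching_rules`, and `is_safe` are hypothetical names.

```python
# Left-hand sides of the basic rules, as sets of mediator attributes (assumed rule set).
RULE_PATTERNS = [frozenset({"ln"}), frozenset({"ln", "fn"}), frozenset({"p"})]


def matching_rules(attrs, rules=RULE_PATTERNS):
    """M(Q): the rules whose attributes all appear in the conjunction."""
    attrs = set(attrs)
    return {rule for rule in rules if rule <= attrs}


def is_safe(g1, g2, rules=RULE_PATTERNS):
    """Safe iff no rule matches G1 AND G2 without already matching G1 or G2 alone."""
    whole = matching_rules(set(g1) | set(g2), rules)
    parts = matching_rules(g1, rules) | matching_rules(g2, rules)
    return whole == parts


print(is_safe({"fn"}, {"ln"}))        # the ln-and-fn rule straddles both groups
print(is_safe({"ln", "fn"}, {"p"}))   # no rule spans the two groups
```

Any rule that appears in the combined matching set but in neither group's own set is a cross matching, so the equality check and the "no cross matching" phrasing describe the same condition in this model.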
35Safety Condition for Complex Disjunction
- M(Q): Matching rules for Q
- Q = G1 ∧ G2
- G1 and G2 are complex disjunctions
- G1 = C1 ∨ C2, G2 = C3 ∨ C4
- Disjunctivize Q: Q = (C1 ∧ C3) ∨ (C1 ∧ C4) ∨ (C2 ∧ C3) ∨ (C2 ∧ C4)
- Q is safe iff every disjunct is safe, i.e., if all of C1 ∧ C3, C1 ∧ C4, C2 ∧ C3, and C2 ∧ C4 are safe
36Important Theorem
- A query is separable if it is safe (i.e., query separability ⇐ safety)
- A query is safe if there is no cross matching (i.e., safety ⇐ no cross matching)
- If there is a cross-matching between conjuncts, we cannot separately translate them
- Put them into the same group
37Algorithm TDQM
- Recursively traverse the query tree in top-down order
- At a disjunction node
- Separately translate its children
- At a conjunction node
- Put the children with cross matching into the same group and rewrite the query locally in each group
38Algorithm TDQM
[Figure: query tree with leaf labels x: fn, y: fn, z: ln ..., v: p ..., w: y ...]
Recursively traverse the tree top-down
- At a disjunction node
- Separately apply TDQM to each child
- Disjunction separability theorem
39Algorithm TDQM
[Figure: query tree with leaf labels x: fn, y: fn, z: ln ..., v: p ..., w: y ...]
- At a conjunction node
- Group children by identifying cross-matchings
- No cross-matching between groups (safety
condition)
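The grouping step at a conjunction node might look like the sketch below. It is a conservative approximation under stated assumptions: each child is summarized by the set of attributes it mentions, a rule with at least two attributes that is fully covered by the node ties together every child it touches, and groups are the resulting connected components. The names `group_conjuncts`, `children`, and `rules` are hypothetical.

```python
def group_conjuncts(children, rules):
    """Partition the children of an AND node into groups linked by cross matchings.

    children: dict mapping a child name to the set of attributes it mentions.
    rules: iterable of sets of attributes (left-hand sides of basic rules).
    """
    names = list(children)
    parent = {n: n for n in names}

    def find(n):
        # Union-find root lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    all_attrs = set().union(*children.values())
    for rule in rules:
        rule = set(rule)
        if len(rule) < 2 or not rule <= all_attrs:
            continue  # cannot span two children, or does not match at this node
        touching = [n for n in names if children[n] & rule]
        for other in touching[1:]:
            parent[find(other)] = find(touching[0])

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return list(groups.values())


children = {"C1": {"fn"}, "C2": {"ln"}, "C3": {"p"}}
rules = [{"ln"}, {"ln", "fn"}, {"p"}]
groups = group_conjuncts(children, rules)
print(groups)
```

Here C1 and C2 end up in one group because the rule over ln and fn spans them, while C3 stays alone and can be translated independently.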
40Algorithm TDQM
wy ...
- For groups with more than one conjunct
- Locally rewrite into a disjunctive form (not DNF)
41Algorithm TDQM
wy ...
- For groups with more than one conjunct
- Locally rewrite into a disjunctive form (not DNF)
42Algorithm TDQM
[Figure: rewritten query tree; labels w: y ..., v: p ..., and the pairs (x, z) and (y, z)]
- Continue tree traversal until we reach a simple conjunction, then apply basic mappings
43Algorithm TDQM
- Generates minimal subsuming translation
- Resulting translation is compact
- Assuming the original query is compact
- Convert the tree only when it is necessary
44TDQM Summary
- Key concepts: Separability ⇐ Safety ⇔ no cross matching
- Local rewriting for compact translation
45A Few Remarks
- Final algorithm is straightforward
- Simply put, separately translate each term if there is no cross-matching
- Many people can come up with the algorithm
- But the author developed an amazing theory by carefully studying basic questions
- Initial problem looks rather trivial
- But a mine-field of interesting research topics
46Questions?