CS246 - PowerPoint PPT Presentation

About This Presentation
Title:

CS246


Slides: 47
Provided by: junghoo

Transcript and Presenter's Notes


1
CS246
  • Query Translation

2
Mind Your Vocabulary
  • Q: What is the problem?
  • A: How to integrate heterogeneous sources when
    their schemas and capabilities are different

3
Bestbookbuys.com
  • How to integrate?

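[Figure: a Mediator over two sources, Amazon.com and bn.com; the same author
search is written [fn = Tom] ∧ [ln = Clancy] at one source and
[au = Clancy, Tom] at the other]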
4
Framework
  • User expresses a query using a mediator schema
  • Mediator translates the query to source-supported
    queries
  • Mediator collects and postprocesses results from
    the sources

[Figure: the user query [fn = Tom] ∧ [ln = Clancy] enters the Mediator, which
sends [fn = Tom] ∧ [ln = Clancy] and [au = Clancy, Tom] to the sources
Amazon.com and bn.com]
5
Difference From Previous Studies?
  • Heterogeneous attributes
  • Different vocabularies
  • Semantic translation necessary
  • Previous studies assumed homogeneous attributes
    for all sources
  • Complex Boolean queries
  • Not just conjunctive queries

6
Main Challenge
  • How to best translate a query when the mediator
    and the source use different models/schemas?
  • Author ↔ lastname, firstname
  • Western calendar ↔ Chinese lunar calendar

7
Query Translation Example
  • Q: For the above schema, what is the best translation for
    [last = Clancy] ∧ [year = 1998] ∧ [month = Jan]?
  • A: [author = Clancy] ∧ [date = winter, 1998]?

8
More Translation Examples
  • More translations for the same schemas
  • [publisher = p] ∧ [last = l] ∧ [first = f] →
    [publisher = p] ∧ [author = l, f]
  • [title = t] ∧ [last = l] ∧ [first = f] →
    [title = t] ∧ [author = l, f]
  • Do we have to translate every possible query
    manually? Is it necessary to have separate rules
    for the above translations? Can the system
    automatically translate queries?
  • Any idea?

9
Observations
  • The system cannot figure out [last = l] ∧ [first = f]
    → [author = l, f] by itself
  • No semantic knowledge
  • User needs to provide these types of mappings
  • There seem to exist basic mappings
  • However, the system may compose a correct translation
    from basic translations
  • [last = l] ∧ [first = f] → [author = l, f]
  • [year = yy] ∧ [month = Jan] → [date = spring,
    yy]

10
Framework
Mediator Context
Source Context
  • Human expert provides a set of basic rules
  • [last = l] ∧ [first = f] → [author = l, f]
  • [year = yy] ∧ [month = Jan] → [date = spring,
    yy]

Basic rules
11
Framework
  • Given a query, the system automatically
    translates the query using the basic rules

Basic rules
Qm: [First = Tom] ∧ [Last = Clancy]
Qs: [Author = Clancy, Tom]
Translation Algorithm
12
Advantage of the Proposed Framework
  • Minimizes manual intervention
  • Human input only for the initial rule writing
  • Can translate any query
  • Not just template queries

13
Questions
  • How do we know whether a translation is good or
    correct?
  • What basic rules are necessary?
  • Do we need a rule for [last = l] ∧ [first = f]?
  • How do we translate?
  • Algorithm for good translation?

14
Good Translation?
  • Q: Why do we think these are good translations?
  • [last = Clancy] ∧ [first = Tom] → [author =
    Clancy, Tom]
  • [year = 2002] ∧ [month = Jan] → [date =
    winter, 2002]
  • A: Results for the translated queries are close
    to those of the original queries

15
Minimal Subsuming (or Containing) Translation
  • Definition of closeness in the paper
  • Q: original query → S(Q): translated query
  • We also use Q and S(Q) to represent results
  • S(Q): the minimal superset of Q expressed in the
    source terms

16
Minimal Subsuming Translation
  • Find the minimal subsuming translation from the
    original query
  • Filter out false positives by applying
    filtering condition at the mediator
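
The two steps above can be sketched in Python. This is only an illustration under stated assumptions: the catalog, the `SEASON` table (which places Jan in winter, as in the slide 7 example), and both function names are hypothetical and not part of the original slides.

```python
# Hypothetical catalog held by a source that can only filter by (season, year).
BOOKS = [
    {"title": "A", "month": "Jan", "year": 1998},
    {"title": "B", "month": "Feb", "year": 1998},
    {"title": "C", "month": "Jul", "year": 1998},
    {"title": "D", "month": "Jan", "year": 1999},
]

# Assumed month-to-season mapping used by the translation rule.
SEASON = {"Dec": "winter", "Jan": "winter", "Feb": "winter", "Jul": "summer"}


def source_query(season, year):
    """S(Q): the query the source can evaluate natively."""
    return [b for b in BOOKS
            if SEASON[b["month"]] == season and b["year"] == year]


def mediator_query(month, year):
    """Q: send the subsuming translation, then filter out false positives."""
    superset = source_query(SEASON[month], year)
    return [b for b in superset if b["month"] == month]


superset_titles = [b["title"] for b in source_query("winter", 1998)]
exact_titles = [b["title"] for b in mediator_query("Jan", 1998)]
print(superset_titles)  # what the source returns for S(Q)
print(exact_titles)     # what remains after filtering at the mediator
```

The source answer is a superset of the original query's answer, so the mediator never misses a result; it only has to discard the extra ones.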

17
Any Alternative for Closeness?
  • What about maximal subsumed translation?
  • Definition of previous studies
  • Maybe a good definition when result is large or
    filtering is impossible

18
Any Alternative for Closeness?
  • Consider both false positives and false
    negatives: maximize |S(Q) ∩ Q| / |S(Q) ∪ Q|
  • Other definitions possible depending on scenario
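
A small sketch of such a measure, assuming the ratio on this slide is the size of the intersection over the size of the union of the two result sets (the operators did not survive in the transcript, so this reading is an assumption). The function name `closeness` and the sample sets are hypothetical.

```python
def closeness(translated, original):
    """Overlap ratio of two result sets: |S(Q) & Q| / |S(Q) | Q|."""
    translated, original = set(translated), set(original)
    union = translated | original
    if not union:
        return 1.0  # two empty results are treated as identical
    return len(translated & original) / len(union)


original = {"A", "B", "C"}
subsuming = {"A", "B", "C", "D"}   # one false positive, no false negatives
subsumed = {"A", "B"}              # one false negative, no false positives
mixed = {"A", "B", "D"}            # one of each

scores = [closeness(s, original) for s in (subsuming, subsumed, mixed)]
print(scores)
```

Under this measure a subsuming and a subsumed translation can score equally well, which is why the choice of definition depends on whether false positives can be filtered at the mediator.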

19
Questions
  • How do we know whether a translation is good or
    correct?
  • Minimal subsuming translation
  • What basic rules are necessary?
  • Do we need a rule for [last = l] ∧ [first = f]?
  • How do we translate?
  • Algorithm for good translation?

20
Three Main Concepts
  • Query Separability
  • Query Safety
  • Cross matching

21
Query Separability
  • Q = [ln = Clancy] ∧ [fn = Tom] ∧ [p =
    Wiley]
  • We still get the minimal subsuming translation if we
    separately translate
  • [ln = Clancy] ∧ [fn = Tom] and [p = Wiley]
  • Q = C1 ∘ C2 ∘ C3 (∘ is ∧ or ∨) is separable if
    S(Q) = S(C1) ∘ S(C2) ∘ S(C3)

22
Disjunction Separability Theorem [CGM96]
  • Disjunctions are always separable
  • Q = C1 ∨ C2 ∨ C3 ⇒ S(Q) = S(C1) ∨ S(C2) ∨
    S(C3) for any C1, C2 and C3
  • Assuming minimal subsuming translation semantics
  • Implication
  • Basic rules are necessary only for conjunctions
  • e.g., c1 ∧ c2, but not c1 ∨ c2
  • Why?
  • Any complex queries can be transformed to DNF
  • Significant simplification for a rule writer

23
Basic Rules
  • Simple conjunction of constraints
  • Separability of conjunctions is determined by a
    human expert
  • e.g., a rule for [ln] ∧ [fn], but not for [ln] ∧ [publisher]
  • User-provided basic rules are assumed to be sound
    and complete
  • Soundness: all mappings are correct (minimal
    subsuming translation)
  • Completeness: the rules cover all inseparable simple
    conjunctions

24
Questions
  • How do we know whether a translation is good or
    correct?
  • What basic rules are necessary?
  • Do we need a rule for [last = l] ∧ [first = f]?
  • How do we translate?
  • Algorithm for good translation?

25
Translation Algorithm
  • Simple conjunction query
  • Step 1: Find all matching rules

Q: [ln = l] ∧ [fn = f] ∧ [p = p]
Rules:
  [ln = l] → [au = l]
  [ln = l] ∧ [fn = f] → [au = l, f]
  [p = p] → [p = p]
26
Translation Algorithm
  • Simple conjunction query
  • Step 2: Remove subset matchings
  • A superset matching is more precise

Q: [ln = l] ∧ [fn = f] ∧ [p = p]
Rules:
  [ln = l] → [au = l]
  [ln = l] ∧ [fn = f] → [au = l, f]
  [p = p] → [p = p]
Matchings: [au = l] (subset matching), [au = l, f], [p = p]
27
Translation Algorithm
  • Simple conjunction query
  • Step 3: Generate the translated query

Q: [ln = l] ∧ [fn = f] ∧ [p = p]
Rules:
  [ln = l] → [au = l]
  [ln = l] ∧ [fn = f] → [au = l, f]
  [p = p] → [p = p]
Translated query: [au = l, f] ∧ [p = p]
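
The three steps for a simple conjunction can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a rule matches whenever all of its mediator attributes appear in the query, and the names `RULES` and `translate_conjunction` are hypothetical.

```python
# Each basic rule: (mediator attributes it matches, function that emits the
# source constraint). The three rules mirror the ones listed on the slides.
RULES = [
    (frozenset({"ln"}), lambda q: f"[au = {q['ln']}]"),
    (frozenset({"ln", "fn"}), lambda q: f"[au = {q['ln']}, {q['fn']}]"),
    (frozenset({"p"}), lambda q: f"[p = {q['p']}]"),
]


def translate_conjunction(query, rules=RULES):
    """Translate a simple conjunction given as {attribute: value}."""
    attrs = set(query)
    # Step 1: find all matching rules.
    matches = [(lhs, emit) for lhs, emit in rules if lhs <= attrs]
    # Step 2: drop a matching when another matching covers a strict superset.
    kept = [(lhs, emit) for lhs, emit in matches
            if not any(lhs < other for other, _ in matches)]
    # Step 3: conjoin the outputs of the remaining rules.
    return " AND ".join(emit(query) for _, emit in kept)


full = translate_conjunction({"ln": "Clancy", "fn": "Tom", "p": "Wiley"})
last_only = translate_conjunction({"ln": "Clancy"})
print(full)
print(last_only)
```

When only the last name is given, the single-attribute rule is the only matching and is kept; when both names are given, it is a subset matching of the two-attribute rule and is removed.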
28
Translation Algorithm
  • Complex Boolean query?

Q


29
Solution 1 (Algorithm DNF)
  • Convert to DNF and translate
  • Disjunctions are always separable
  • We can individually translate each
    disjunct

Q
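
A minimal sketch of Algorithm DNF, assuming a query is either a leaf constraint (a string) or a pair `(op, children)` with `op` equal to `"and"` or `"or"`. This representation, and the names `to_dnf` and `translate_via_dnf`, are hypothetical; the per-conjunction translator is passed in as a function.

```python
from itertools import product


def to_dnf(query):
    """Return the DNF of `query` as a list of conjunctions (lists of leaves)."""
    if isinstance(query, str):
        return [[query]]
    op, children = query
    parts = [to_dnf(child) for child in children]
    if op == "or":
        # Union of the children's disjuncts.
        return [conj for part in parts for conj in part]
    # op == "and": pick one disjunct from every child and concatenate them.
    return [sum(combo, []) for combo in product(*parts)]


def translate_via_dnf(query, translate_conjunction):
    """Translate each disjunct on its own; the results are OR-ed together."""
    return [translate_conjunction(conj) for conj in to_dnf(query)]


query = ("and", [("or", ["x", "y"]), "z"])
disjuncts = to_dnf(query)
translated = translate_via_dnf(query, lambda conj: "S(" + " AND ".join(conj) + ")")
print(disjuncts)
print(translated)
```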
30
What's Wrong with DNF?
  • DNF conversion is exponential
  • DNF parse tree is not compact
  • Global conversion often not necessary
  • Translation of C3 is independent of others


[Figure: query tree whose top-level AND has conjuncts C1, C2, and C3; leaves
x (fn), y (fn), z (ln ...); C3 (p ...) is independent]
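
To see the exponential growth concretely, consider a conjunction of n two-way disjunctions, (a0 OR b0) AND ... AND (a(n-1) OR b(n-1)). The sketch below counts the disjuncts of its DNF; the query shape and the name `dnf_size` are illustrative assumptions.

```python
from itertools import product


def dnf_size(n):
    """Number of DNF disjuncts for a conjunction of n two-way disjunctions."""
    clauses = [(f"a{i}", f"b{i}") for i in range(n)]
    # Each DNF disjunct picks exactly one literal from every clause.
    return sum(1 for _ in product(*clauses))


sizes = {n: dnf_size(n) for n in (1, 2, 4, 8, 16)}
print(sizes)
```

The original query has 2n leaves, while its DNF has 2^n disjuncts of n leaves each.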
31
Conjunction Partitioning
  • Partition conjuncts into independent groups
  • Translate each group separately
  • By rewriting local groups
  • Top level AND of C3 is preserved.


[Figure: query tree whose top-level AND has conjuncts C1, C2, and C3; leaves
x (fn), y (fn), z (ln ...); C3 (p ...) is independent]
32
Independent Groups?
  • Q: How do we know G1 and G2 are independent?
  • A: Q = G1 ∧ G2 is separable
  • Q: How do we know Q = G1 ∧ G2 is separable?

33
Safety Condition
  • Query separability is difficult to check directly
  • Safety condition: a practical way to check query
    separability
  • Sufficient condition for query separability
  • But not a necessary condition

34
Safety Condition for Simple Conjunction
  • M(Q): matching rules for Q
  • Q = G1 ∧ G2
  • G1 and G2 are simple conjunctions
  • G1 = C1 ∧ C2, G2 = C3 ∧ C4
  • Q is safe iff M(Q) = M(G1) ∪ M(G2)
  • That is, Q is safe if there is no cross
    matching among G1 and G2
  • Cross matching: a rule that matches some
    constraints in G1 and some constraints in G2
  • Example
  • G1: [fn = f1], [fn = f2]; G2: [ln = ln]
  • Q = G1 ∧ G2 is unsafe: cross matching of [fn] ∧ [ln] →
    [au]
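
The safety test for two simple conjunctions can be sketched as a set comparison. This assumes the condition on this slide reads M(Q) = M(G1) ∪ M(G2), and that a rule matches when all of its attributes are present. The rule set and function names are hypothetical.

```python
# Rules are identified by the mediator attributes on their left-hand side.
RULES = [frozenset({"ln"}), frozenset({"ln", "fn"}), frozenset({"p"})]


def matching_rules(attrs, rules=RULES):
    """M(.): the rules whose attributes all occur in `attrs`."""
    return {rule for rule in rules if rule <= set(attrs)}


def is_safe(g1, g2, rules=RULES):
    """Q = G1 AND G2 is safe iff M(Q) equals M(G1) united with M(G2)."""
    whole = matching_rules(set(g1) | set(g2), rules)
    return whole == matching_rules(g1, rules) | matching_rules(g2, rules)


fn_with_ln = is_safe({"fn"}, {"ln"})          # the fn/ln rule crosses the groups
names_with_p = is_safe({"ln", "fn"}, {"p"})   # no rule spans both groups
print(fn_with_ln, names_with_p)
```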

35
Safety Condition for Complex Disjunction
  • M(Q): matching rules for Q
  • Q = G1 ∧ G2
  • G1 and G2 are complex disjunctions
  • G1 = C1 ∨ C2, G2 = C3 ∨ C4
  • Disjunctivize Q: Q = (C1 ∧ C3) ∨ (C1 ∧ C4) ∨
    (C2 ∧ C3) ∨ (C2 ∧ C4)
  • Q is safe iff every disjunct is safe, i.e., if
    all of C1 ∧ C3, C1 ∧ C4, C2 ∧ C3, and C2 ∧
    C4 are safe
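
For two disjunctions, the check reduces to testing every pair of disjuncts, one taken from each side. The sketch below assumes each disjunct is a simple conjunction represented by its set of attribute names; the rule set and all function names are hypothetical.

```python
from itertools import product


def matching_rules(attrs, rules):
    """The rules whose attributes all occur in `attrs`."""
    return {rule for rule in rules if rule <= attrs}


def conjunction_is_safe(c1, c2, rules):
    """Safety of C1 AND C2 for two simple conjunctions."""
    return matching_rules(c1 | c2, rules) == (
        matching_rules(c1, rules) | matching_rules(c2, rules))


def disjunctions_are_safe(g1, g2, rules):
    """G1 AND G2 is safe iff every pairwise conjunction of disjuncts is safe."""
    return all(conjunction_is_safe(c1, c2, rules)
               for c1, c2 in product(g1, g2))


RULES = [frozenset({"ln"}), frozenset({"ln", "fn"}), frozenset({"p"})]
crossing = disjunctions_are_safe([{"fn"}, {"p"}], [{"ln"}], RULES)
independent = disjunctions_are_safe([{"p"}], [{"ln"}, {"ln", "fn"}], RULES)
print(crossing, independent)
```

One unsafe pair is enough to make the whole conjunction unsafe, as in the first call, where the disjunct on `fn` meets the disjunct on `ln`.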

36
Important Theorem
  • A query is separable if it is safe (i.e., query
    separability ⇐ safety)
  • A query is safe if there is no cross
    matching (i.e., safety ⇔ no cross matching)
  • If there is a cross-matching between conjuncts,
    we cannot separately translate them
  • Put them into the same group

37
Algorithm TDQM
  • Recursively traverse the query tree in the
    top-down order
  • At a disjunction node
  • Separately translate its children
  • At a conjunction node
  • Put the children with cross matching into the
    same group and rewrite the query locally in each
    group

38
Algorithm TDQM



[Figure: query tree with leaves x (fn), y (fn), z (ln ...), v (p ...), w (y ...)]
Recursively traverse the tree top-down
  • At a disjunction node
  • Separately apply TDQM to each child
  • Disjunction separability theorem

39
Algorithm TDQM



[Figure: query tree with leaves x (fn), y (fn), z (ln ...), v (p ...), w (y ...)]
  • At a conjunction node
  • Group children by identifying cross-matchings
  • No cross-matching between groups (safety
    condition)
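
Grouping the children of a conjunction node can be sketched as finding connected components, where two children are linked when some rule cross-matches them. The representation (each child summarized by the set of attributes it mentions), the rule set, and the name `group_conjuncts` are assumptions made for illustration.

```python
def group_conjuncts(children, rules):
    """Group child indices so that no rule cross-matches two different groups."""
    parent = list(range(len(children)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    all_attrs = set().union(*children)
    for rule in rules:
        if not rule <= all_attrs:
            continue  # the rule does not match this conjunction at all
        if any(rule <= child for child in children):
            continue  # matched inside a single child: not a cross matching
        touched = [i for i, child in enumerate(children) if rule & child]
        for i in touched[1:]:
            parent[find(i)] = find(touched[0])

    groups = {}
    for i in range(len(children)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


RULES = [frozenset({"ln"}), frozenset({"ln", "fn"}),
         frozenset({"p"}), frozenset({"y"})]
# Children of one AND node: (x OR y) on fn, z on ln, v on p, w on y.
children = [{"fn"}, {"ln"}, {"p"}, {"y"}]
groups = group_conjuncts(children, RULES)
print(groups)
```

Here the fn child and the ln child land in the same group because of the rule over both attributes, while the other two children stay on their own.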

40
Algorithm TDQM

[Figure: query tree, showing leaf w (y ...)]
  • For groups with more than one conjunct
  • Locally rewrite into a disjunctive form (not DNF)
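
A sketch of the local rewrite for one group, assuming a node is either a leaf (a string) or a pair `(op, children)`. Only the top-level ORs of the group's conjuncts are distributed, so anything nested deeper is left as it is, which is what keeps the result from being a full DNF. The name `local_rewrite` is hypothetical.

```python
from itertools import product


def local_rewrite(group):
    """Rewrite AND(group) as an OR of ANDs, distributing one level only."""
    alternatives = [
        node[1] if isinstance(node, tuple) and node[0] == "or" else [node]
        for node in group
    ]
    return ("or", [("and", list(combo)) for combo in product(*alternatives)])


# The group {(x OR y), z} becomes (x AND z) OR (y AND z).
rewritten = local_rewrite([("or", ["x", "y"]), "z"])
print(rewritten)
```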

41
Algorithm TDQM

[Figure: query tree, showing leaf w (y ...)]
  • For groups with more than one conjunct
  • Locally rewrite into a disjunctive form (not DNF)

42
Algorithm TDQM
[Figure: query tree after local rewriting, with leaves w (y ...), v (p ...),
and x, z, y, z]
  • Continue tree traversal until we reach simple
    conjunction and apply basic mappings

43
Algorithm TDQM
  • Generates minimal subsuming translation
  • Resulting translation is compact
  • Assuming the original query is compact
  • Convert the tree only when it is necessary

44
TDQM Summary
  • Key concepts: Separability ⇐ Safety ⇔ no cross
    matching
  • Local rewriting for compact translation

45
A Few Remarks
  • Final algorithm is straightforward
  • Simply put, separately translate each term if
    there is no cross-matching
  • Many people can come up with the algorithm
  • But the author developed an amazing theory by
    carefully studying basic questions
  • Initial problem looks rather trivial
  • But a mine-field of interesting research topics

46
Questions?