The Mediation of Information using Xml project - PowerPoint PPT Presentation

Slides: 91
Provided by: len68
1
The Mediation of Information using XML project
  • By Amir Atauna, Michael Brautbar

2
What is a Mediator and Why is it Needed?
  • Huge quantity of information on the web.
  • Users want to find information on the web that
    is related to their problem.
  • Problem: the information is distributed across
    many sources; each source provides a
    different interface and exports the data
    in a different format.

3
  • Mediator systems will assist the users by
    providing them integrated views of the data
    they are interested in.
  • Example: a Web-shopping mediator provides
    the Web value-shopper with a view that lists the
    lowest price for each product.
  • The goal of MIX is to facilitate the development
    of such mediators.

4
Is the mediator concept new?
  • No, the TSIMMIS mediator uses the semistructured
    model OEM (Object Exchange Model).
  • Wrappers export the source data translated to
    OEM.
  • The mediator exports an integrated view of the
    wrapper data based on a view definition provided
    by the administrator.

5
  • The view definition is expressed in the Mediator
    Specification Language (MSL).
  • At runtime the mediator receives queries, which
    refer to the view objects and are expressed in MSL.
  • First, the incoming query is combined with the
    view definition into a query which refers
    directly to source data.
  • Then the optimizer finds a plan to execute the
    latter query by sending queries to the wrappers
    and combining their results in the mediator.

6
  • The wrappers translate the queries they receive
    into queries understood by the sources.
  • The MSL specifications can be very loose about
    how much information they give on the structures
    they describe.
  • This is a valuable feature when working with
    dynamic semistructured sources.
  • There are two weak points.
  • First - the user does not know the structure of the
    underlying data, and this impedes his efforts to
    formulate reasonable queries.

7
  • Second - the mediator may have incomplete or
    no information about the metadata and structure of
    each source, and this leads to a heavy loss of
    performance.
  • MIX solves these problems with DTDs.

8
The Philosophy of MIX: The Web as a Distributed
Database
  • The developers of this system strongly believe
    that the Web will emerge as a distributed
    database and XML (or some extension/modification
    of XML) will be the data model of this huge
    database.
  • The MIX mediator views XML as a database model
    and uses the mediator concept as known in the DB
    area.

9
(No Transcript)
10
  • Sources will be exporting an XML view of their
    data along with semantic descriptions of the
    content (Source DTDs) and descriptions of the
    interfaces (XML queries) that may be used for
    accessing the data.
  • Users and applications will then be able to
    query these view documents using some XML query
    language.
  • The MIX mediator uses the source DTDs to assist
    the user in query formulation and the query
    processors in running queries more efficiently.

11
  • MIX's query evaluation follows a lazy (on-demand)
    approach, i.e. XML queries (expressed in XMAS)
    are unfolded and rewritten at runtime.
  • In the other approach, the eager (warehousing)
    one, the data integration occurs in a separate
    materialization step, before the actual user
    queries.
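
The difference between the two approaches can be illustrated with a minimal Python sketch. This is not MIX code: the source function `fetch_prices`, the classes `EagerView` and `LazyView`, and the price data are all hypothetical, and the sketch only shows when the source is contacted, not the query unfolding and rewriting that MIX performs.

```python
from typing import Callable, Dict, List

# Hypothetical source; the counter records how often it is contacted.
calls = {"count": 0}

def fetch_prices() -> Dict[str, int]:
    calls["count"] += 1
    return {"camera": 300, "phone": 500}

class EagerView:
    """Warehousing: integrate the data once, before any user query."""
    def __init__(self, source: Callable[[], Dict[str, int]]):
        self.materialized = source()      # separate materialization step

    def query(self, max_price: int) -> List[str]:
        return sorted(p for p, price in self.materialized.items()
                      if price <= max_price)

class LazyView:
    """On demand: contact the source only when a query arrives."""
    def __init__(self, source: Callable[[], Dict[str, int]]):
        self.source = source              # nothing is fetched yet

    def query(self, max_price: int) -> List[str]:
        return sorted(p for p, price in self.source().items()
                      if price <= max_price)

lazy = LazyView(fetch_prices)
calls_before_query = calls["count"]       # the lazy view has not touched the source
lazy_answer = lazy.query(400)
eager = EagerView(fetch_prices)           # the eager view fetches immediately
eager_answer = eager.query(400)
```

Both views return the same answer; they differ in when the source is read, which matters when the sources change often.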

12
  • Conventional data repositories are not expected
    to be converted to XML.
  • Wrapper technologies allow us to logically
    view an information source (which may be a
    relational database, a collection of HTML pages,
    or even a legacy information system) as a large
    XML source.
  • The wrappers are able to translate XMAS queries
    into queries or commands that the underlying
    source understands.
  • They are also able to translate the result of the
    source into XML.
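
As a rough illustration of the last point, the sketch below presents the rows of a relational table as an XML tree. It is an assumption-laden example, not a MIX wrapper: the table `homes`, its columns, the prices, and the function `export_as_xml` are hypothetical, and the translation of XMAS queries into source queries is not shown.

```python
import sqlite3
import xml.etree.ElementTree as ET

def export_as_xml(conn, table, root_tag, row_tag):
    """Present the rows of a relational table as one XML tree.

    The table name is interpolated directly, so it must come from
    trusted code; this is acceptable only in a sketch.
    """
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]
    root = ET.Element(root_tag)
    for row in cur:
        elem = ET.SubElement(root, row_tag)
        for col, value in zip(columns, row):
            # Each column becomes a subelement holding the value as text.
            ET.SubElement(elem, col).text = str(value)
    return root

# Hypothetical relational source held in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE homes (zip TEXT, price INTEGER)")
conn.executemany("INSERT INTO homes VALUES (?, ?)",
                 [("91901", 300000), ("91903", 280000)])

xml_view = export_as_xml(conn, "homes", "homes", "home")
print(ET.tostring(xml_view, encoding="unicode"))
```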

13
Creating Mediated Views Using MIX mediator and
Querying them with BBQ
  • The XML documents have to be integrated.
  • One goal of MIX is to make the development of
    integrated views fast.
  • For this the developers use XMAS as the view
    definition language.

14
(No Transcript)
15
  • The BBQ (Blended Browsing and Querying) user
    interface enables users to formulate XMAS
    queries using a GUI that resembles
    query-by-example interfaces in relational
    databases.

16
The MIX Architecture
17
  • The graphical user interface BBQ allows the
    construction of queries.
  • In order to accomplish the integration, the MIX
    mediator comprises several modules.
  • - Its main inputs are XMAS queries generated by
    the BBQ, and the mediator view definition (also
    in XMAS) for the integrated view.
  • - The resolution module resolves the user query
    with the mediator view definition, resulting in a
    set of unfolded XML queries that refer to the
    wrapper views.

18
  • - The simplification module is used to further
    simplify the XML queries based on the underlying
    XML DTDs.
  • - The DTD inference module can be used to
    automatically derive view DTDs from source DTDs
    and queries for supporting the integration task
    of the mediation engineer (This is done
    off-line).
  • - The translation module maps the simplified
    queries into the XMAS algebra.

19
  • - The optimization module can be used to
    further optimize the XMAS queries.
  • - The execution engine issues XMAS queries
    against the wrappers, and returns the requested
    XML data to the user, after integrating the
    retrieved data according to the mediator view.
  • The wrappers are used to export data in a uniform
    format to the mediator

20
The XMAS Language
  • The data model of the sources of the MIX mediator
    is valid XML documents.
  • We need a way to formulate queries that can
    relate to data in multiple XML documents.
  • An XML document may be as tightly structured
    as a relational database or
    have no structure at all.

21
The XMAS Language Cont.
  • So we need a query language that is as expressive
    as relational algebra.
  • Preferable features of the language:
  • - Simple formulation of queries
  • - Queries logically describe what we
    want to say

22
Solution: XMAS
  • XMAS stands for XML Matching And Structuring
    language.
  • Declarative, high-level language.
  • Builds upon ideas of languages like XML-QL
    and MSL.

23
General Structure Of An XMAS Query
  • CONSTRUCT head
    WHERE body1 IN source1
    (AND | OR | NOT) body2 IN source2
    (AND | OR | NOT) body3 IN source3
    ...
    (AND | OR | NOT) bodyn IN sourcen
    (AND | OR) predicate

24
  • Body (the where clause)
    specifies the data which is to be extracted from
    the XML sources
  • Head (the construct clause)
    describes how the extracted data is arranged into
    a new answer XML document. In this part we may
    use the collection operator and the ordering
    operator. (Will be
    explained later on)
  • (Head and body roughly resemble the SELECT and
    WHERE clauses of SQL.)

25
  • Predicate: defines conditions on the variables
    occurring in the sources.
  • Let's look at an example.

  • population)

    type (pcdata) (pcdata)

26
For Example We Can Have The Following
XML Doc For That DTD
  • ip91901 alpine
    rural/town 13238n p91903 alpine r
    ural/town 4783

27
Query Example
  • Suppose we want to retrieve the names of all big
    neighborhoods, say those whose population is greater
    than 30000.
  • In XMAS we can write the following
    query:

28
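  • CONSTRUCT
      <big_neighborhoods>
        <big_neighborhood>
          <name> $N </name>
        </big_neighborhood> {$N}
      </big_neighborhoods>
    WHERE
      <big_neighborhoods>
        <big_neighborhood>
          <name> $N </name>
          <population> $P </population>
        </big_neighborhood>
      </big_neighborhoods>
      IN "http://www.npaci.edu/DICE/MIX/tutorial/neighborhoods.xml"
      AND $P > 30000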
29
How Does It Work?
  • Let's look at the body of the query above. This
    tree pattern mimics the tree structure of the
    input XML document.
  • The variables N and P are used to get hold
    of the data at the corresponding locations in the
    tree structure representing the input XML document.
    In other words, the tree pattern specifies
    that the root element of the XML document is
    of type big_neighborhoods.

30
  • Within big_neighborhoods there must be some
    big_neighborhood subelement, which itself contains
    name and population subelements.
  • In this way, the tree pattern specifies a list
    of pairs of variable bindings for N and P.
  • From this list we want to select only those which
    satisfy the condition P > 30000.
  • To summarize, the body defines a list (n1,
    p1), ..., (nk, pk) of all variable bindings for
    (N, P) which match (or satisfy) the body.
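
The binding semantics described above can be mimicked in a few lines of Python. This is a sketch under assumptions, not an XMAS evaluator: the sample document, the function `match_body`, and every population value other than alpine's are hypothetical; the element names follow the description in these slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data shaped the way the slides describe the source.
doc = ET.fromstring("""
<big_neighborhoods>
  <big_neighborhood><name>alpine</name><population>13238</population></big_neighborhood>
  <big_neighborhood><name>la jolla</name><population>42000</population></big_neighborhood>
  <big_neighborhood><name>hillcrest</name><population>31000</population></big_neighborhood>
</big_neighborhoods>
""")

def match_body(root):
    """Return the list of (N, P) bindings produced by the tree pattern."""
    bindings = []
    if root.tag != "big_neighborhoods":       # the root must have the pattern's type
        return bindings
    for nb in root.findall("big_neighborhood"):
        for name in nb.findall("name"):
            for pop in nb.findall("population"):
                bindings.append((name.text, int(pop.text)))
    return bindings

all_bindings = match_body(doc)                 # (n1, p1), ..., (nk, pk)
selected = [(n, p) for (n, p) in all_bindings if p > 30000]   # the predicate
print(selected)
```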

31
  • The head consists of an XML tree pattern which
    contains some or all of the variables of the
    body.
  • In the example above, the head defines a root
    element big_neighborhoods with a big_neighborhood
    subelement, which in turn has a name subelement.
    The latter is used to hold the bindings for
    N which have been obtained through the body.
  • Using N expresses that we want to have only
    one big_neighborhoods element that has a number
    of big_neighborhood subelements. (One for each
    name N obtained from the body)

32
The Collection Operator
  • Is used to collect all bindings of the subelement
    to be put under the parent element.
  • Has two kinds: implicit and explicit.
  • The usage for the explicit version is N where
    N is a free variable in that level
  • For example (of the explicit usage),
    consider the previous example

33
The Collection Operator Cont
  • We create exactly one big_neighborhood element
    for each binding n1 ... nk of N (thereby
    binding the value of N within the big_neighborhood
    element to one ni), and all these
    elements are collected as subelements of the
    parent element.
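
A minimal Python sketch of this collection step follows. The function `construct_head` and the input names are hypothetical, and treating a repeated value of N as a single binding is an assumption of this sketch rather than something the slides state.

```python
import xml.etree.ElementTree as ET

def construct_head(n_bindings):
    """Build one answer document from the bindings of N."""
    root = ET.Element("big_neighborhoods")    # the parent element is created once
    seen = set()
    for n in n_bindings:
        if n in seen:                         # assumption: repeated values count once
            continue
        seen.add(n)
        nb = ET.SubElement(root, "big_neighborhood")   # one element per binding
        ET.SubElement(nb, "name").text = n             # N is bound to this ni
    return root

answer = construct_head(["la jolla", "hillcrest", "la jolla"])
answer_xml = ET.tostring(answer, encoding="unicode")
print(answer_xml)
```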

34
The Collection Operator Cont
  • For elements in the head which do not have an
    explicit collection label, an implicit collection
    label may be used
  • The implicit collection variables of an element E
    are those which are free in E
  • The usage for the explicit version is ...
    where is before the beginning of the section
    and is at its end

35
The Collection Operator Cont
  • For example consider the following code
    A
    B C

  • The above corresponds to a nested loop structure

36
The Ordering Operator
  • All subelement bindings may be ordered by a given
    order.
  • If no order is specified, a default order is
    used (based on the order in which the data was
    found).
  • Example: consider the next DTD and the
    query given after it.

37

  • pcdata required
  • And the query is CONSTRUCT
    H order by H.Price
    WHERE
    H
    IN "http//www.Mine.Xml"
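
The effect of ordering the bindings of H by price can be sketched in Python as follows. The element names `homes`, `home`, `addr`, `price` and `answer`, the data, and the function `construct_ordered` are all hypothetical; the sketch only contrasts an explicit order with the default order in which the data was found.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical source document; the real DTD is not recoverable from the slide.
source = ET.fromstring(
    "<homes>"
    "<home><addr>A</addr><price>320000</price></home>"
    "<home><addr>B</addr><price>250000</price></home>"
    "<home><addr>C</addr><price>299000</price></home>"
    "</homes>"
)

def construct_ordered(bindings, order_by=None):
    """Collect the bindings under one answer element, optionally sorted."""
    answer = ET.Element("answer")
    if order_by is not None:
        bindings = sorted(bindings, key=lambda h: int(h.findtext(order_by)))
    for h in bindings:
        answer.append(copy.deepcopy(h))       # leave the source document untouched
    return answer

h_bindings = source.findall("home")
by_price = [h.findtext("price") for h in construct_ordered(h_bindings, "price")]
default = [h.findtext("price") for h in construct_ordered(h_bindings)]
```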

38
So, Mmm, Is XMAS So Powerful?
  • Home buyer's scenario: a user wants
    to buy a home and
    wants to make use of
    information available from the web to guide this
    decision. A possible query that the user may
    issue is "Find all houses with 3 bedrooms, 2
    baths, interior area at least 1600 sq. ft.,
    priced between 250k and 350k, in
    regions where the school rating is at least 70
    (out of 100) and the crime rate is no more
    than 15 incidents per year. Group the answers by
    region and order them by price. For each home
    also show the nearby schools."

39
(No Transcript)
40
As Strong As Relational Algebra
  • As mentioned before, one of the features of XMAS
    is that it is as expressive as relational algebra.
    Some examples:
  • Selection: a
    selection on a variable is made in the
    predicate part of the query.
  • Projection: write in the head just those variables
    that you want to project.

41
  • A natural join can be obtained by equating
    variables in the body
  • Cartesian product may also be expressed easily

42
CONSTRUCT
  N S
  N, S
WHERE
  N Z
  IN "http://www.npaci.edu/DICE/MIX/tutorial/neighborhoods.xml"
  AND S Z1
  IN "http://www.npaci.edu/DICE/MIX/tutorial/schools.xml"
  AND Z = Z1
Cartesian product is easily expressed by removing
the condition Z = Z1.
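
In terms of variable bindings, the join and the Cartesian product differ only by that one condition. The Python sketch below works on hypothetical binding lists (the school names and the second zip code are invented) and uses a hypothetical helper `combine`; it is not how the MIX mediator evaluates the query.

```python
from itertools import product

# Hypothetical bindings: (N, Z) from the neighborhoods source,
# (S, Z1) from the schools source.
neighborhoods = [("alpine", "91901"), ("la jolla", "92037")]
schools = [("school A", "91901"), ("school B", "92037"), ("school C", "91901")]

def combine(nbs, schs, equate_zip=True):
    """Pair up bindings; with equate_zip=False the result is the Cartesian product."""
    return [(n, s)
            for (n, z), (s, z1) in product(nbs, schs)
            if not equate_zip or z == z1]

joined = combine(neighborhoods, schools)                     # condition Z = Z1 kept
cartesian = combine(neighborhoods, schools, equate_zip=False)  # condition removed
print(joined)
```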
43
Merry XMAS
44
DTD Inference
45
The MIX mediator and the advantages of living
with DTD-provided structure
  • The MIX mediator employs DTDs to assist the user
    in information discovery, query formulation and
    to allow the query processor to derive more
    efficient plans.
  • The view DTD inference module derives the view DTD
    given the source DTDs and the view.

46
  • The view DTD is passed to the DTD-based query
    interface to enable query formulation.
  • A DTD inference algorithm has been developed for a
    limited class of XMAS queries/views:
  • - pick-element XMAS queries, i.e., queries
    whose SELECT clause has a single variable, called the
    pick-variable, that binds to elements and whose WHERE
    clause consists of a single condition that is
    applied to only one source.

47
  • It is easy to compute a loose DTD for a view, but
    it is critical for the query interface and the
    query processor to get the one that describes the
    view as precisely as possible.

48
  • Precise view DTDs may also have
    applications other than ours; for example, they may be
    used in a toolkit for generating XSL style sheets
    for presentation of the view.
  • A criterion for judging the precision of a view
    DTD is tightness.
  • A DTD d1 is tighter than a DTD d2 if every
    document described by d1 is also described by d2.
  • The tightness criterion can be a benchmark for
    other powerful view definition languages and view
    inference algorithms.

49
  • So the view DTD inference algorithm attempts to
    derive the tightest DTD that contains all the
    possible documents that may appear as the content
    of the view.
  • Unfortunately, even the tightest view DTD
    may describe structures that can never appear as the
    view's content.
  • For this reason the view DTD inference algorithm derives
    an extended form of DTDs, known as specialized
    DTDs, that typically does not
    have non-tightness problems.

50
Model and Query Language Framework
  • The focus is on XML documents that meet the
    following requirements:
  • - The XML is always valid, i.e. it has a DTD.
  • - There are no attributes other than the ID
    attribute, and all elements have an ID attribute.
  • - There are no empty elements, but elements with
    empty content are allowed.
  • - Mixed-content elements, i.e. elements whose
    content mixes strings with elements, are not allowed.

51
  • Definition (Element): An element e is a triplet
    consisting of a name, name(e), a unique ID and
    content, content(e), which is a sequence of
    elements or a PCDATA value.
  • Definition: A DTD is a set of pairs (n, type(n)),
    where n is in N, N is
    the set of names, and type(n) is either a regular
    expression over N or PCDATA.
  • L(r) is the regular language described by r.

52
  • Definition: An element e satisfies a DTD D
    if the following conditions hold:
  • - name(e) is in N, where N is the set of
    element names;
  • - if content(e) = e1, e2, ..., em, then name(e1)
    ... name(em) is in L(type(name(e))) and each ei
    satisfies D, for 1 ≤ i ≤ m;
  • - else, if content(e) is a string, then
    type(name(e)) = PCDATA.
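
The definition translates almost directly into a recursive check. The Python sketch below is one possible reading, under assumptions: the DTD is a hypothetical dictionary, each type is written as a Python regular expression in which every child name is followed by a comma (an encoding chosen here so that whole names are matched), IDs are ignored, and an element with no content is accepted for a PCDATA type.

```python
import re
import xml.etree.ElementTree as ET

PCDATA = "PCDATA"

# Hypothetical DTD: name -> type. Element names are for illustration only.
dtd = {
    "department": r"(professor,|grad,)*",
    "professor": r"name,(publication,)*",
    "grad": r"name,(publication,)*",
    "publication": PCDATA,
    "name": PCDATA,
}

def satisfies(e, dtd):
    """Check whether element e satisfies the DTD, following the definition."""
    if e.tag not in dtd:                      # name(e) must be in N
        return False
    children = list(e)
    if dtd[e.tag] == PCDATA:                  # string content only
        return not children
    if (e.text or "").strip():                # no strings where elements are expected
        return False
    child_names = "".join(c.tag + "," for c in children)
    return (re.fullmatch(dtd[e.tag], child_names) is not None
            and all(satisfies(c, dtd) for c in children))

good = ET.fromstring(
    "<department><professor><name>A</name><publication>P1</publication>"
    "</professor><grad><name>B</name></grad></department>")
bad = ET.fromstring(
    "<department><professor><publication>P1</publication></professor></department>")
print(satisfies(good, dtd), satisfies(bad, dtd))
```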

53
Soundness and Tightness
  • Definition: A view DTD DV is sound if, given
    source DTDs D1, D2, ..., Dn and a view definition V,
    for every tuple (d1, d2, ..., dn) of n documents
    such that d1 satisfies D1, d2 satisfies D2, ..., dn
    satisfies Dn, the view document V(d1, d2, ..., dn)
    satisfies DV.
  • Definition: A DTD D is tighter than a DTD D' if
    every document satisfying D satisfies D'.
  • A type r is tighter than a type r'
    if L(r) is contained in L(r').
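
For small types, the containment of L(r) in L(r') can be explored by brute force. The Python sketch below is only a bounded check, not a decision procedure: it compares the two languages on all strings up to a fixed length. The single-letter encoding (p for professor, g for grad), the length bound, and the function names are assumptions of this sketch.

```python
import re
from itertools import product

def language(regex, alphabet, max_len):
    """All strings over the alphabet, up to max_len symbols, matched by regex."""
    pattern = re.compile(regex)
    words = set()
    for k in range(max_len + 1):
        for symbols in product(alphabet, repeat=k):
            word = "".join(symbols)
            if pattern.fullmatch(word):
                words.add(word)
    return words

def tighter_up_to(r1, r2, alphabet, max_len=6):
    """True if every string of L(r1) up to max_len is also in L(r2)."""
    return language(r1, alphabet, max_len) <= language(r2, alphabet, max_len)

# Hypothetical types over the names p (professor) and g (grad).
print(tighter_up_to("p+g*", "(p|g)*", "pg"))   # professors first, then grads
print(tighter_up_to("(p|g)*", "p+g*", "pg"))   # the reverse containment fails
```

A real implementation would decide containment exactly, for example by comparing automata; the bounded check is only meant to make the definition tangible.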

54
  • Definition: A DTD DV is a tightest view DTD for
    given source DTDs D1, D2, ..., Dn and a view
    definition V if there is no other view DTD DV' such
    that DV' is tighter than DV.

55
Structural Tightness
  • In many practical cases even the tightest view
    DTDs describe view document structures that
    cannot be produced by the view.
  • This information loss phenomenon is formalized
    by introducing the structural tightness property
    of view DTDs.

56
(No Transcript)
57
  • Definition A structural class of documents is a
    set of documents such that for every two
    documents d1,d2 in the class there is a mapping
    that maps
  • - every string of d1 on a string of d2 and vice
    versa.
  • - every id of d1 into an id of d2 and vice
    versa
  • - if the mappings are applied to d1 , d1
    becomes identical to d2 and vice versa

58
  • Definition A structural class of documents
    satisfies a DTD D if the documents of the class
    satisfy D.
  • Definition: Given a set of source DTDs D1, ..., Dn
    and a view V, a DTD DV is structurally tight if
  • - it is the tightest DTD of the view given the
    source DTDs;
  • - for every structural class S that satisfies
    DV there is a view document I in S that satisfies DV,
    and there are also source documents I1, ..., In
    satisfying D1, ..., Dn such that I = V(I1, ..., In).

59
Specialized DTDs
  • Specialized DTDs resolve the inherent
    non-tightness problems of DTDs
  • Query Find all the professor and grad
    sub-elements of department with one journal
    publication.

60
How are specialized DTDs computed?
  • The DTD tightening algorithm recursively
    tightens each type of the initial DTD by means
    of the type refinement algorithm.
  • Definition: The type refinement refine(r, n) of a
    regular expression r given a name n is the
    regular expression r' that describes all strings in
    L(r) that contain at least one instance of n.
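
One way to realise this definition is by structural recursion on the regular expression. The Python sketch below uses a textbook-style construction and is not claimed to be the algorithm of the MIX papers; the tuple encoding of expressions, the helper names, and the names `journal` and `conference` are assumptions made for illustration.

```python
from itertools import product

# Hypothetical encoding of regular expressions over names:
# ("sym", a), ("seq", r1, r2), ("alt", r1, r2), ("star", r), ("eps",), ("empty",)
EMPTY, EPS = ("empty",), ("eps",)

def seq(r1, r2):
    if EMPTY in (r1, r2):
        return EMPTY
    return r2 if r1 == EPS else r1 if r2 == EPS else ("seq", r1, r2)

def alt(r1, r2):
    return r2 if r1 == EMPTY else r1 if r2 == EMPTY else ("alt", r1, r2)

def refine(r, n):
    """Expression for the strings of L(r) with at least one instance of n."""
    kind = r[0]
    if kind == "sym":
        return r if r[1] == n else EMPTY
    if kind in ("eps", "empty"):
        return EMPTY
    if kind == "alt":
        return alt(refine(r[1], n), refine(r[2], n))
    if kind == "seq":    # n occurs in the left part or in the right part
        return alt(seq(refine(r[1], n), r[2]), seq(r[1], refine(r[2], n)))
    if kind == "star":   # some iteration contains n
        return seq(r, seq(refine(r[1], n), r))
    raise ValueError(kind)

def matches(r, word):
    """Naive membership test; word is a tuple of names."""
    kind = r[0]
    if kind == "empty":
        return False
    if kind == "eps":
        return word == ()
    if kind == "sym":
        return word == (r[1],)
    if kind == "alt":
        return matches(r[1], word) or matches(r[2], word)
    if kind == "seq":
        return any(matches(r[1], word[:i]) and matches(r[2], word[i:])
                   for i in range(len(word) + 1))
    return word == () or any(matches(r[1], word[:i]) and matches(r, word[i:])
                             for i in range(1, len(word) + 1))   # star

names = ("journal", "conference")
r = ("star", ("alt", ("sym", "journal"), ("sym", "conference")))
refined = refine(r, "journal")
words = [w for k in range(5) for w in product(names, repeat=k)]
agrees = all(matches(refined, w) == (matches(r, w) and "journal" in w)
             for w in words)
```

The final lines compare the refined expression against the definition on all strings of up to four names, which is a sanity check rather than a proof.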

61
Converting s-DTDs to DTDs
  • First we obtain the images of all types of the
    s-DTDs.
  • Then we merge all images that have the same name.
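
The two steps can be sketched in Python as below. Everything here is an assumption made for illustration: specialized names are written with a `^k` suffix, types are plain strings, the image of a type is obtained by dropping the suffixes, and images with the same name are merged by taking their union. The slides do not spell out the merge operation, so the union is this sketch's reading.

```python
import re

def image(text):
    """Drop the specialization suffix, e.g. 'publication^1' -> 'publication'."""
    return re.sub(r"\^\d+", "", text)

def to_dtd(s_dtd):
    """Convert a specialized DTD (dict of name -> type) to an ordinary DTD."""
    images = {}
    for s_name, s_type in s_dtd.items():
        name, typ = image(s_name), image(s_type)       # step 1: take the images
        alternatives = images.setdefault(name, [])
        if typ not in alternatives:
            alternatives.append(typ)
    # Step 2: merge the images that share a name (here: union of the types).
    return {name: types[0] if len(types) == 1
            else " | ".join(f"({t})" for t in types)
            for name, types in images.items()}

# Hypothetical s-DTD in which publication^1 specializes publication.
s_dtd = {
    "professor": "name, publication^1, publication*",
    "publication": "journal | conference",
    "publication^1": "journal",
}
plain_dtd = to_dtd(s_dtd)
print(plain_dtd)
```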

62
Schema Inference Algorithm
  • Refinement
  • - Tightens individual types
  • Specialization
  • - uses the refinement algorithm and tightens
    the whole input document.
  • Result List Type Inference.
  • - Discovers the names and order of the types
    that appear in the result.

63
Future Work
  • Powerful Query Languages
  • - group-by, nest, navigation using recursive
    paths in the vertical and horizontal direction,
    check order, manipulate order.
  • More powerful/flexible schema descriptions
  • - XML-Data, DCDs, many academic proposals
  • Conditions for existence of tight/tightest DTDs.
  • Other quality metrics for a view DTD.

64
The BBQ application
introduction
  • BBQ stands for Blended Browsing and
    Querying - a graphical user interface
    for browsing and querying XML data
    sources.
  • There are very few visual interfaces for querying
    and browsing semistructured data, and fewer for
    XML.

65
introduction cont.
  • BBQ supports query refinement by having query
    results be sources used in subsequent queries.
    Users can construct a query result document
    (essentially a virtual view) and that document
    becomes a first-class data source within BBQ,
    meaning it can be browsed, queried, or used to
    construct another query result document.

66
introduction cont.
  • This is quite useful if the user does not know,
    in advance, exactly what he is looking for.
  • The interface allows users to quickly create
    complex queries without writing XMAS syntax by
    hand.
  • BBQ displays the structure of multiple data
    sources using a paradigm that resembles
    drilling down in Windows directory structures.

67
Mix Mediator
Wrapper
Wrapper
XML Data Source
Data Source
Computational Source
68
The BBQ interface
  • BBQ, which is XML driven, uses a set of DTDs
    exported by the MIX mediator. They will be
    referred to from now on as base DTDs.
  • The BBQ interface consists of one main window and
    zero or more floating windows. The main window
    contains a toolbar, a split pane, and a
    message console, while the floating windows
    contain a toolbar and split pane only.

69
(No Transcript)
70
  • From now on we will use the following DTDs which
    will represent the base DTDs .
  • CSEStudents (CSEStudent)
    degree)

    (PCDATA)

71
  • Interns (Intern) (name, supervisor, sponsor)


72
BBQ power selecting and browsing XML
source DTD and data
  • The DTDs are represented as trees in the obvious
    hierarchical manner: an element name is a parent
    node, and that element's sub-elements are its
    children.
  • BBQ features special tree nodes to represent XML
    DTD's structural operators such as the choice and
    the seq(uence).

73
  • These special tree nodes give the user a more
    accurate view of the DTD's structure than other
    semistructured-data viewing systems,
    and they also facilitate more complex queries.
  • For example, a default order constraint is
    introduced, namely the one that corresponds to
    the order in which elements are listed on the
    screen.

74
  • XML data corresponding to a given DTD is
    represented as a directory tree.
  • The XML data is materialized on demand from the
    source.
  • The buttons labeled next and previous in the XML
    panel retrieve the next and previous n
    instances, respectively.

75
(No Transcript)
76
BBQ power cont. Creating XMAS
Queries with BBQ
  • A query session is the set of events that occur
    while BBQ is connected to the mediator.
  • Each query session consists of one or more query
    cycles. A query cycle is the set of events that
    starts with the user constructing a query, and
    ends with the user browsing the query result.

77
  • The basic BBQ query cycle takes place in four
    steps:
  • First, constraints are set on the data sources.
  • Second, a tree representing the query result
    schema is created by dragging and dropping
    elements.
  • Third, the XMAS query is generated and submitted
    to the mediator.
  • Fourth, a DTD is generated for the query result
    and the query result schema and data are
    displayed.

78
First step: setting constraints
  • Constraints can be set on the leaf nodes of the
    DTD tree or XML tree. Constraints cannot be set
    on nonleaf nodes
  • The operators are a basic set of comparators
    (,, , substr)

79
Example
  • The user right-clicks the degree element and
    selects "View/Edit Constraint..."
    from the popup menu. This action
    brings up the "View/Edit Constraint"
    dialog box, where is selected as the
    operator, and PhD is typed in as the
    operand. At this point,
    the user clicks OK

80
(No Transcript)
81
  • Joins can take place within a data source or
    across data sources. Creating a join in BBQ is as
    simple as selecting one leaf element and
    dragging and dropping it onto another leaf
    element.
  • Suppose the user is interested in CSEStudents
    who are also interns, and whose advisor is also
    their supervisor.

82
(No Transcript)
83
Second: construct the head
  • Construct a tree that the answer document(s) must
    conform to, called the head or query result tree.
    The right panel of BBQ's main window is where the
    head is built.
  • The head is composed of elements (and their
    sub-trees) dragged from source DTDs, and tags
    created on the spot with the Create New Child
    popup menu item.
  • Ordering and group - by operators are also used
    in the creation of the head.

84
(No Transcript)
85
Third and fourth steps
  • BBQ converts the visual layout into XMAS query
    language, contacts the MIX mediator and submits
    the query.
  • Finally, BBQ generates a DTD for the query result
    and it is displayed with the corresponding data

86
BBQ Interface
Query in xmas
Xml result ,DTD
Mix mediator
wrapper
wrapper
OODB Database
87
Important things to remember about
the BBQ
  • Enables the query creator to construct queries in
    an easy, graphical way.
  • Graphically supports all the features of the XMAS
    query language.
  • Supports blended browsing and querying.
  • Accurate representation of DTDs and XML data.

88
  • Allows graphical representation of the query
    result as well.
  • The DTD for the result XML page of the given query is
    created by the DTD-inference mechanism.
  • Because of that, we may treat the query result as
    any other XML source we use (so we may use this
    result as one of the sources used to build new
    queries).

89
  • This is usually the case when we want to get
    some information from the internet. We don't know
    exactly what we are looking for, and the results
    of the first queries guide us towards the goal of
    our search.

Mix mediator
90
Selected bibliography
  • Enhancing Semistructured Data Mediators with
    Document Type Definitions, by
    Yannis Papakonstantinou, Pavel
    Velikhov
  • BBQ: A Visual Interface for Integrated Browsing
    and Querying of XML, by Kevin D.
    Munroe, Yannis Papakonstantinou
  • XML-Based Information Mediation with MIX, by
    Chaitanya Baru, Amarnath Gupta, Bertram Ludascher
  • Introduction to XMAS, by
    the XMAS sub-group of MIX