WebOQL - PowerPoint PPT Presentation

Provided by: ariel5
Learn more at: https://web.mst.edu
Transcript and Presenter's Notes

1
WebOQL
  • A Web Object Query Language

2
Overview
  • The data model supports abstractions for modeling
    record-based data, structured documents and
    hypertexts.
  • Supports querying small databases represented
    as documents (such as catalogs), restructuring
    single pages (for example, converting a large page
    into smaller pages), and restructuring sets of
    pages, for example, creating an index page with a
    hyperlink to each page and adding to each page
    a hyperlink back to the index page.
  • Supports restructuring the content of a web site
    in order to show the same content in another view.

3
Data Model
The WebOQL data model introduces the hypertree, a
tree-based data model that represents structured
documents containing hyperlinks.
Hypertrees are ordered, arc-labeled trees with
two kinds of arcs: internal and external.
4
Data Model
Example (arc labels of the hypertree)
Group: students
Group: professors
Name: oded, Seniority: 8
Name: moshe, Sem: 5
Name: arik, Sem: 8
Label: arik home page, URL: www/index.html
Label: seminar in www, URL: www/s.html
Label: databases, URL: www/index.html
Label: moshe home page, URL: www/index.html
5
Data Model
Hypertrees are a useful data structure because
they support three important abstractions:
  • Collections
  • Nesting
  • Ordering

The notion of reference, which is central to
the structure of the web, is captured through the
distinction between internal and external arcs.
Because the nodes are untyped, a tree can hold
heterogeneous records on its arcs.
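One way to picture a hypertree is the minimal Python sketch below. The class names Arc and Hypertree, the external flag, and the assignment of links to people are assumptions made for illustration; they are not part of WebOQL itself.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Arc:
    label: dict                            # the record labeling the arc
    subtree: Optional["Hypertree"] = None  # tree at the end of an internal arc
    external: bool = False                 # True for a hyperlink-like arc


@dataclass
class Hypertree:
    arcs: list = field(default_factory=list)  # ordered arcs leaving the root


# Records with different fields (Sem vs. Seniority) can sit side by side.
example = Hypertree([
    Arc({"Group": "students"}, Hypertree([
        Arc({"Name": "moshe", "Sem": 5}, Hypertree([
            Arc({"Label": "moshe home page", "URL": "www/index.html"}, external=True),
        ])),
        Arc({"Name": "arik", "Sem": 8}, Hypertree([
            Arc({"Label": "arik home page", "URL": "www/index.html"}, external=True),
            Arc({"Label": "seminar in www", "URL": "www/s.html"}, external=True),
        ])),
    ])),
    Arc({"Group": "professors"}, Hypertree([
        Arc({"Name": "oded", "Seniority": 8}, Hypertree([
            Arc({"Label": "databases", "URL": "www/index.html"}, external=True),
        ])),
    ])),
])


def external_urls(tree):
    """Collect the URL field of every external arc, in document order."""
    urls = []
    for arc in tree.arcs:
        if arc.external:
            urls.append(arc.label.get("URL"))
        elif arc.subtree is not None:
            urls.extend(external_urls(arc.subtree))
    return urls
```

Nesting comes from subtrees, collections and ordering from the arc list, and references from the external arcs.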
6
Data Abstractions
WEB
A pair (t, F), where t is a hypertree (the
schema) and F is the browsing function.
PAGE
F(u), where u is a URL.
7
Tree operators
Definitions
Tails: the tails of a tree t are the trees obtained by
chopping prefixes of t.
Simple trees: the simple trees of a tree t are the trees
composed of one arc that stems from the
root of t together with the subtree at its end.
Subtrees: the subtrees of t are the trees at the end
of the arcs that stem from the root of t.
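The three definitions can be sketched in Python as follows. The representation is an assumption: a tree is a list of (label, subtree) pairs, labels are plain strings for brevity, and the empty list stands for the null tree. Whether the empty tree counts as a tail is not stated in the slides; this sketch leaves it out.

```python
def tails(t):
    """Trees obtained by chopping a prefix of arcs off t (t itself included)."""
    return [t[i:] for i in range(len(t))]


def simple_trees(t):
    """One tree per root arc: the arc together with its subtree."""
    return [[arc] for arc in t]


def subtrees(t):
    """The trees at the end of each root arc."""
    return [subtree for _label, subtree in t]


# Hypothetical tree with three root arcs.
t = [
    ("Label1", [("A1", []), ("A2", [])]),
    ("Label2", [("B1", [])]),
    ("Label3", []),
]
```

With this tree, tails(t) has three elements, simple_trees(t) has one single-arc tree per label, and the last element of subtrees(t) is the null tree.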
8
[Figure: example trees labeled q4, q5, q5!, q52, q6 and q7]
9
Tree operators
Concatenate
Tree1 + Tree2
Connects two trees by their roots.
10
Tree operators
Hang
Arc1 / Tree1
Hangs the tree from a new arc.
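Concatenate and Hang can be sketched in a few lines of Python. This assumes a tree is a list of (label, subtree) pairs with dictionaries as labels; the function names are illustrative, not WebOQL syntax.

```python
def concatenate(t1, t2):
    """Join two trees at their roots: the arcs of t1 followed by those of t2."""
    return t1 + t2


def hang(label, tree):
    """Build a one-arc tree whose single arc carries `label` and ends in `tree`."""
    return [(label, tree)]


# A list of three items hung under a UL arc, followed by a link arc.
items = concatenate(
    concatenate(
        hang({"Tag": "LI", "Text": "First Child"}, []),
        hang({"Tag": "LI", "Text": "Second Child"}, []),
    ),
    hang({"Tag": "LI", "Text": "Third Child"}, []),
)
page = concatenate(
    hang({"Tag": "UL"}, items),
    hang({"Url": "http://a.b.c.", "Label": "Click Here"}, []),
)
```

Because trees are ordered, concatenate is not commutative: swapping its arguments changes the order of the arcs.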
11
Tree operators
Prime
Tree'
Returns the first subtree of the argument.
12
Tree operators
Head
Tree & x
Returns the first x simple trees of the argument; if x is
not specified, only the first simple tree.
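A small Python sketch of Prime and Head, under the assumption that a tree is a list of (label, subtree) pairs. Returning the null tree when Prime is applied to an empty tree is a choice made here, not something the slides specify.

```python
def prime(t):
    """First subtree of t; the null tree if t has no arcs (an assumption)."""
    return t[0][1] if t else []


def head(t, x=1):
    """Tree made of the first x root arcs of t (one arc by default)."""
    return t[:x]


# Hypothetical tree used only for illustration.
t = [
    ("Label1", [("A1", []), ("A2", [])]),
    ("Label2", [("B1", [])]),
    ("Label3", []),
]
```

Here prime(t) is the tree with arcs A1 and A2, head(t) keeps only the Label1 arc, and head(t, 2) keeps Label1 and Label2.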
13
Tails of T (obtained by chopping prefixes)
[Figure: the tails of a tree T with arcs Label1, Label2 and Label3 and nodes A1, A2 and B1]
14
Tree t
[Figure: tree t with arcs Label1, Label2 and Label3 and nodes A1, A2 and B1]
Simple trees of t
[Figure: one simple tree per arc of t]
Subtrees of t
[Figure: null, A1, A2, B1]
15
HANG
  • Label: papers from smith, Format: ps.Z / q1
  • Tag: UL / Tag: LI, Text: First Child
  • Tag: LI, Text: Second Child
  • Tag: LI, Text: Third Child
  • Url: http://a.b.c., Label: Click Here

Label: Papers from smith, Format: ps.Z
Title: Recent.., Url: http://..
Title: Are.., Url: http://www.
HANG and concatenate
Url: http://a.b.c., Label: Click Here
Tag: UL
Tag: LI, Text: First Child


16
Tree operators
Peek
Arc.field
Extracts a field from an arc's label, e.g.
Example.Group can have a value of students.
If this field does not exist, a value of nil is
returned.
IsField
Arc?field
Tests for the presence of a field in an arc's
label, e.g. Example?Group evaluates to true,
while Example?Name evaluates to false.
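Peek and IsField map naturally onto dictionary lookups. The sketch below assumes an arc label is a Python dict and uses None to stand in for nil; the function names are illustrative.

```python
def peek(label, field_name):
    """Value of a field in an arc label, or None (standing in for nil)."""
    return label.get(field_name)


def is_field(label, field_name):
    """True if the arc label has the given field."""
    return field_name in label


example_label = {"Group": "students"}
```

With this label, peek(example_label, "Group") gives "students", while peek(example_label, "Name") gives None and is_field(example_label, "Name") is False.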
17
  • Page: a hypertree that has an associated URL
    identifying it.
  • Web: a collection of interrelated pages.
  • External arc: each external arc of a page is a link in the web.
  • Schema: a web can optionally have a
    distinguished page that provides an entry point to the
    web.

18
  • No schema: one must know the URL of one or more pages.

http://a.b.c./three.html
http://a.b.c./one.html
http://a.b.c./two.html
19
WebOQL query
[Figure: a WebOQL query between two webs, with a schema page; pages shown:]
http://a.b.c./three.html
http://a.b.c./one.html
http://a.b.c./four.html
http://a.b.c./two.html
20
  • <UL>
  • <LI> First Child
  • <LI> Second Child
  • <LI> Third Child
  • </UL>
  • <A HREF="http://a.b.c."> Click Here </A>

21
Url: http://a.b.c., Label: Click here
Tag: LI, Text: First Child
Tag: LI, Text: Second Child
Tag: LI, Text: Third Child
Tree representing an HTML document consisting of a
list and a hyperlink
  • Trees are ordered
  • Arcs are labelled not with atomic values but with
    records

22
Paper database: CS papers
Group: DBMS
Group: Card
Group: ProgLang
Title: Recent, Authors: Smith, Publications: Tech
Title: Are, Authors: Smith, Publications: ACM
Label: Abstract, Url: www
Label: Full Papers, Url: www
23
SELECT - FROM - WHERE
This familiar query language construct is used by
WebOQL as the main construct of queries.
Query to evaluate
y.Label, y.URL
Definition of variables
x in example, y in x!
A boolean condition
x.Seniority 8
24
SELECT - FROM - WHERE
For each instantiation of the variables in the
from clause, check the condition in the where
clause; if it is true, evaluate the query in
the select clause and append it to the result.
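The evaluation loop can be sketched in Python as nested iteration. Several things here are assumptions: trees are lists of (label, subtree) pairs, the input is flattened to one arc per person, y ranges over the arcs below x, and the where condition is read as an equality test on Seniority.

```python
# Hypothetical input: one arc per person, links hanging below each person.
example = [
    ({"Name": "oded", "Seniority": 8},
     [({"Label": "databases", "URL": "www/index.html"}, [])]),
    ({"Name": "moshe", "Sem": 5},
     [({"Label": "moshe home page", "URL": "www/index.html"}, [])]),
]


def bindings(tree):
    """From clause: every (x, y) pair with x a root arc and y an arc below x."""
    for x_label, x_subtree in tree:
        for y_label, _ in x_subtree:
            yield {"x": x_label, "y": y_label}


def run_query(tree):
    result = []
    for env in bindings(tree):
        # Where clause: a missing field yields None, so the test is simply false.
        if env["x"].get("Seniority") == 8:
            # Select clause: build a new arc from y.Label and y.URL.
            new_label = {"Label": env["y"].get("Label"), "URL": env["y"].get("URL")}
            result.append((new_label, []))
    return result
```

Records without a Seniority field, such as moshe's, do not raise an error; they simply fail the condition, which fits the heterogeneous nature of the data.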
25
Select y.Title, y.Publication
From x in csPapers, y in x'
Missing data: Publication is undefined
26
  • Compute a listing of the papers' publication data
    grouped by title.
  • Select x.Title /
  • Select z.Publication from y in csPapers, z in
    y'
  • Where x.Title = y.Title
  • From w in csPapers, x in w'

27
  • Schema: a distinguished hypertree.
  • Browsing function: maps strings (URLs) to
    hypertrees. It defines a graph in which the nodes are
    pages and there is an arc from node a to node b if
    the content of the page at node a contains an
    external arc whose Url attribute is the URL of
    the page at node b.
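The graph induced by a browsing function can be sketched in Python. The assumptions: the browsing function is a dict from URL to page, a page is a list of (label, subtree) pairs, and an arc counts as external when its label has a Url field. The links between the three pages are invented for illustration.

```python
def external_urls(tree):
    """Url fields of all external arcs in a page, in document order."""
    urls = []
    for label, subtree in tree:
        if "Url" in label:
            urls.append(label["Url"])
        urls.extend(external_urls(subtree))
    return urls


def link_graph(browse):
    """Adjacency lists: page a points to page b if a has an external arc to b."""
    return {
        url: [target for target in external_urls(page) if target in browse]
        for url, page in browse.items()
    }


# Hypothetical web of three pages.
web = {
    "http://a.b.c./one.html": [
        ({"Tag": "P", "Text": "intro"},
         [({"Url": "http://a.b.c./two.html", "Label": "two"}, [])]),
    ],
    "http://a.b.c./two.html": [
        ({"Url": "http://a.b.c./three.html", "Label": "three"}, []),
        ({"Url": "http://a.b.c./one.html", "Label": "one"}, []),
    ],
    "http://a.b.c./three.html": [],
}
graph = link_graph(web)
```

Links to URLs that the browsing function does not map are dropped from the graph in this sketch.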

28
  • Analogy with relational databases
  • Hypertrees => relations
  • Webs => databases
  • Schema of a web => catalog of a database

29
  • Select x.Tag
  • From x in
  • browse(http://www.cs.toronto.edu)

Tag: body
Tag: head
30
  • SFW creates a web
  • Select y.Title, y.URL as schema
  • From x in csPapers, y in x'
  • Where y.authors smith
  • Create a web page with the URL Group Names whose
    content is the list of group names (assume that
    there is no such page in the current web)
  • Select x.Group as Group Names from x in
    csPapers

31
  • Create several pages, one for each research
    group (using the group name as the URL). Each page
    contains the publications of the corresponding
    group.
  • Select x as x.Group from x in csPapers
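The effect of the as clause can be imitated in Python by building a dict from URL to page. The representation is an assumption (trees as lists of (label, subtree) pairs), and so is the placement of the two papers under the Card group.

```python
# Hypothetical csPapers tree: one arc per group, papers hanging below.
cs_papers = [
    ({"Group": "DBMS"}, []),
    ({"Group": "Card"}, [
        ({"Title": "Recent", "Authors": "Smith", "Publications": "Tech"}, []),
        ({"Title": "Are", "Authors": "Smith", "Publications": "ACM"}, []),
    ]),
    ({"Group": "ProgLang"}, []),
]


def pages_per_group(tree):
    """One page per group; the group name is used as the page URL."""
    return {label["Group"]: subtree for label, subtree in tree}


def group_names_page(tree):
    """A single page, with URL 'Group Names', listing the group names."""
    return {"Group Names": [({"Group": label["Group"]}, []) for label, _ in tree]}


web = pages_per_group(cs_papers)
```

A string after as names a page, so each distinct string produced by the query becomes a new URL in the resulting web.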

32
Data Model
  • Records as Labels on Arcs
  • Internal and External Arcs

Tag: UL, Text: one of the
Tag: H1, Text: City Overview
Tag: LI, Text: If you are interested
Tag: LI, Text: One of the
Tag: LI, Text: All the hotels
Tag: XYZ, Text: If you are
Tag: XYZ, Text:
Label: Theatres Online, Url: http://www, Base: http://www, Text: This page contains...
Tag: XYZ, Text: Contains
Label: Sports Zone, Url: http://www, Base: http://www, Text: Sports Zone
Tag: XYZ, Text: One of the
Label: All the Hotels, Url: http://www, Base: http://www, Text: These are all
33
Query: list elements containing "ticket"
  • doc http://www.citynet.com/overview.html
  • Tag: UL /
  • Select y
  • from y in doc!
  • where y.Text ~ ticket

Tag: UL
Tag: LI
Tag: LI
Tag: XYZ, Text:
Label: Theatres Online, Url: http://www, Base: http://www, Text: This page contains...
Label: Sports Zone, Url: http://www, Base: http://www, Text: Sports Zone
Tag: XYZ, Text: If you are
Tag: XYZ, Text: One of the
34
Web restructuring
Using these tree operators, we have shown how a
tree can be restructured.
To restructure a web, we need a function
that maps one web to another. The new web has
some hypertree as its schema, while its browsing
function is an extension of the old web's
browsing function: it also maps URLs that were not
previously mapped.
In WebOQL this is done with the as
clause.
35
Web restructuring
In general, the select clause of WebOQL has the
form
Select q1 as s1, q2 as s2, ..., qn as sn
Each si can be either the keyword schema or a string
query.
An as clause with the keyword schema defines
the schema of the web.
Title: y.Group as schema
36
Web restructuring
An as clause that evaluates to a string defines
a page; the string is treated as the URL of that page.
x.Name as y.Group
37
Web restructuring
After a web is created, there are two
possibilities: either query it further
(restructure it) or return it to the host
application.
If we want to return the web to the host
application so that it can be shown in a
browser, we must format the pages in an
HTML-compliant way. This is easily done by
restructuring the web using HTML tags as labels.
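Once arcs carry HTML tags as labels, turning a page into markup is a simple tree walk. The Python sketch below assumes trees are lists of (label, subtree) pairs, that arcs with a Url field are links, and that every other arc has a Tag field; the function name render is illustrative.

```python
from html import escape


def render(tree):
    """Serialize a tree whose arc labels carry Tag/Text or Url/Label fields."""
    parts = []
    for label, subtree in tree:
        if "Url" in label:
            # External arc: emit an anchor pointing at the URL.
            href = escape(label["Url"])
            text = escape(label.get("Label", ""))
            parts.append(f'<A HREF="{href}">{text}</A>')
        else:
            # Internal arc: emit the element, its text, then its children.
            tag = label["Tag"]
            text = escape(label.get("Text", ""))
            parts.append(f"<{tag}>{text}{render(subtree)}</{tag}>")
    return "".join(parts)


page = [
    ({"Tag": "UL"}, [
        ({"Tag": "LI", "Text": "First Child"}, []),
        ({"Tag": "LI", "Text": "Second Child"}, []),
        ({"Tag": "LI", "Text": "Third Child"}, []),
    ]),
    ({"Url": "http://a.b.c.", "Label": "Click Here"}, []),
]
html_text = render(page)
```

Text and URLs are escaped so that field values cannot break the generated markup.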
38
Document restructuring
Web documents are a perfect example of semi-structured
data, since they do not have a fixed
schema and can have various irregularities. In
an HTML document, most tags may appear any
number of times or not at all.
WebOQL uses a wrapper that creates abstract
syntax trees (ASTs) from arbitrary HTML
documents. This is straightforward because the markup
tags of HTML reflect the logical
relationships between the various information
items.
Example: <UL> <LI> item 1. </LI> <LI>
item 2. </LI> <LI> item 2. </LI> </UL>
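A wrapper of this kind can be approximated with Python's standard html.parser module. This is only a sketch under stated assumptions: trees are lists of (label, subtree) pairs, every element is explicitly closed and well nested, and attributes are ignored. It is not the wrapper used by WebOQL.

```python
from html.parser import HTMLParser


class TreeBuilder(HTMLParser):
    """Builds a list-of-arcs tree from well-nested HTML."""

    def __init__(self):
        super().__init__()
        self.root = []
        self.stack = [self.root]  # child lists of the currently open elements
        self.labels = []          # labels of the currently open elements

    def handle_starttag(self, tag, attrs):
        label = {"Tag": tag.upper()}
        children = []
        self.stack[-1].append((label, children))
        self.stack.append(children)
        self.labels.append(label)

    def handle_endtag(self, tag):
        if len(self.stack) > 1:
            self.stack.pop()
            self.labels.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.labels:
            label = self.labels[-1]
            label["Text"] = (label.get("Text", "") + " " + text).strip()


def html_to_tree(source):
    builder = TreeBuilder()
    builder.feed(source)
    builder.close()
    return builder.root


tree = html_to_tree(
    "<UL> <LI> item 1. </LI> <LI> item 2. </LI> <LI> item 2. </LI> </UL>"
)
```

Unclosed elements, which are common in real HTML, would end up nested inside each other here; a real wrapper has to handle such irregularities.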
39
Document restructuring
Navigation patterns
In the examples we have seen, the variables used
in the queries ranged over the simple trees of the
tree we queried. On the WWW, however, variables may
range over several linked subtrees
whose structure is not fully known to us.
select x.text from x in someones.html via
^ Tag = H2
^ - record predicate which is true for every
internal arc.
Tag = H2 - record predicate which is true for
every arc which has an H2 tag.
40
Document restructuring
Navigation patterns
select x.text from x in someones.html via
> not(Tag = H2)
> - record predicate which is true for every
external arc.
not(Tag = H2) - record predicate which is true
for every arc which does not have an H2 tag.
41
Document restructuring
Navigation patterns
When navigation patterns are omitted, the
query is treated as if it had a navigation
pattern that always evaluates to true.
Variables are instantiated in left-to-right
depth-first or breadth-first order. The
default is breadth-first; to use depth-first, the
keyword viadfs is used instead of via.
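The difference between the two instantiation orders can be seen in a short Python sketch. It assumes trees are lists of (label, subtree) pairs with string labels; the function names are illustrative.

```python
from collections import deque


def arcs_bfs(tree):
    """Visit arc labels level by level, left to right."""
    queue = deque(tree)
    while queue:
        label, subtree = queue.popleft()
        yield label
        queue.extend(subtree)


def arcs_dfs(tree):
    """Visit arc labels left to right, descending before moving on."""
    for label, subtree in tree:
        yield label
        yield from arcs_dfs(subtree)


# Hypothetical tree: a has children b and c, d has child e.
t = [
    ("a", [("b", []), ("c", [])]),
    ("d", [("e", [])]),
]
```

On this tree the breadth-first order is a, d, b, c, e and the depth-first order is a, b, c, d, e, so the choice affects the order of the arcs in the query result.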
42
  • Select q1 as s1, q2 as s2, q3 as s3, ..., qm as sm
    where the qi are queries and each si is either a string
    query or the keyword schema.
  • Generate a web consisting of a page for each
    research group, containing the title and authors of
    all its publications, and an index web page
    that lists all the groups and provides links to
    their pages.
  • newWeb Select unique Name: x.Group, Url:
    x.Group as schema,
  • Title: y.Title, Authors: y.Authors as x.Group
  • From x in csPapers, y in x'

43
As schema
Name: Card Punching, Url: Card Punching
Name: Prog. Lang, Url: Prog. Lang
Name: .., Url: ..
As x.Group
Card Punching
Titles: Recent, Authors: Smith
Titles: Are, Authors: Smith
Prog. Lang.
Titles: Assembly Lan, Authors: John,..
Titles: Cobol, Authors: James J
44
  • NewerWeb < newWeb
  • select Tag: H3, Text: y.Title
  • Tag: BR, Text: y.Publication
  • Tag: BR, Text: y.Authors
  • Tag: P
  • as x.Name
  • from x in schema, y in x.Name
  • select Tag: H2, Text: Publications of the
    x.Name Group x.Name
  • Tag: A, Label: To Index, Url:
    http://a.b.c/Index of Projects.html
  • as http://a.b.c/ x.Name .html
  • from x in schema

45
  • select Url: http://a.b.c/Index of
    Projects.html as schema,
  • Tag: H2, Text: Index of Projects
  • Tag: UL /
  • select Tag: LI /
  • Tag: A, Label: x.Name,
  • Url: http://a.b.c/ x.Url .html
  • from x in schema
  • as http://a.b.c/Index of Projects.html

46
  • <H2> Index of Projects </H2>
  • <UL>
  • <LI> <A HREF="http://a.b.c./cardpunching.html">
  • Card Punching
  • </A>
  • </LI>
  • <LI> <A HREF="http://a.b.c./programminglanguages.html">
  • Programming Languages
  • </A>
  • </LI>
  • <LI> ..
  • </UL>

Index Page
47
  • <H2>Publications of the Card Punching group </H2>
  • <H3> Recent Discoveries in Card Punching </H3>
  • <BR> Technical Report TROIS
  • <BR> Peter Smith, John Brown
  • <P>
  • <H3> Are Magnetic Media Better ? </H3>
  • <BR> ACM TOCP Vol 3 No. (1942) pp.2337
  • <BR> Peter Smith, John Brown
  • <P>
  • <A HREF="http://a.b.c./IndexnProject.html">
  • To index
  • </A>

Group Pages
48
Navigation Pattern
  • not(Tag = A)* - paths of any length composed
    of arcs that do not have an attribute Tag with value
    A.
  • Tag = LI Tag = A - paths of length 2
  • ^*> - all paths in a tree that lead from the root to
    an external arc
  • Select x.url
  • from x in http://a.b.c./index.html
  • Via not(Tag = Table)* >
  • All the external arcs in the document at that URL
    that do not occur within a table
49
  • Select x.url, x.text
  • From x in http://a.b.c./root.html
  • Via (Label = Next >)

50
Architecture