Title: WebOQL
1WebOQL
- A Web Object Query Language
2Overview
- Data model supports abstractions for modeling
record-based data, structured documents and
hypertexts
- Supports querying small databases represented
as documents (such as catalogs), restructuring
single pages (converting a large page into
smaller pages), and restructuring sets of pages, for
example, creating an index page containing a
hyperlink to each of them and adding to each page
a hyperlink to the index page.
- Restructuring the content of a web site in order
to show the same content in another view.
3Data Model
The WebOQL data model introduces the hypertree, a
tree-based data model representing a structured
document containing hyperlinks.
Hypertrees are ordered arc-labeled trees with
two kinds of arcs: internal and external.
4Data Model
Example (arc labels of the hypertree)
[Group: students]
[Group: professors]
[Name: oded, Seniority: 8]
[Name: moshe, Sem: 5]
[Name: arik, Sem: 8]
[Label: arik home page, URL: www/index.html]
[Label: seminar in www, URL: www/s.html]
[Label: databases, URL: www/index.html]
[Label: moshe home page, URL: www/index.html]
5Data Model
Hypertrees are a useful data structure because
they support three important abstractions:
- Collections
- Nesting
- Ordering
The notion of reference, which is very important to
the structure of the web, is captured through the
distinction between internal and external arcs.
Because the nodes have no type, the tree can hold
heterogeneous records on its arcs.
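A minimal Python sketch of a hypertree, offered as an illustration only. The class names `Arc` and `Hypertree`, the helper `external_urls`, and the nesting of the sample data are assumptions made here; they are not part of WebOQL.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Arc:
    label: Dict[str, str]        # a record: field name -> value
    external: bool = False       # True for a hyperlink (external arc)
    # Tree hanging from this arc; external arcs are kept as leaves here.
    subtree: "Hypertree" = field(default_factory=lambda: Hypertree())


@dataclass
class Hypertree:
    arcs: List[Arc] = field(default_factory=list)   # order is significant


def external_urls(tree: Hypertree) -> List[str]:
    """Collect the Url field of every external arc, in document order."""
    urls = []
    for arc in tree.arcs:
        if arc.external:
            urls.append(arc.label["Url"])
        urls.extend(external_urls(arc.subtree))
    return urls


# Hypothetical nesting of some labels from the example slide.
example = Hypertree([
    Arc({"Group": "students"}, subtree=Hypertree([
        Arc({"Name": "moshe", "Sem": "5"}, subtree=Hypertree([
            Arc({"Label": "moshe home page", "Url": "www/index.html"},
                external=True),
        ])),
    ])),
    Arc({"Group": "professors"}, subtree=Hypertree([
        Arc({"Name": "oded", "Seniority": "8"}),
    ])),
])
```

Collections show up as the ordered list of arcs, nesting as the subtree of each arc, and heterogeneity as records with different fields (`Sem` on one arc, `Seniority` on another).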
6Data Abstractions
WEB: a pair (t, F), where t is a hypertree (the schema) and F is the browsing function
PAGE: F(u), where u is a URL
7Tree operators
Definitions
Tails: the tails of a tree t are the trees obtained by
chopping prefixes off t.
Simple trees: the simple trees of a tree t are the trees
composed of one arc that stems from the
root of t together with its subtree.
Subtrees: the subtrees of t are the trees at the end
of the arcs that stem from the root of t.
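The three definitions can be sketched in Python. The representation is an assumption chosen for brevity: a tree is an ordered list of arcs, and an arc is a pair (label record, subtree). The sketch also assumes that t itself counts as a tail (empty prefix chopped) and that the empty tree does not.

```python
def tails(t):
    """Trees obtained by chopping prefixes of arcs off t."""
    return [t[i:] for i in range(len(t))]


def simple_trees(t):
    """One tree per root arc: that arc together with its subtree."""
    return [[arc] for arc in t]


def subtrees(t):
    """The trees found at the end of each root arc."""
    return [sub for _label, sub in t]


# Illustrative data; names mirror the figure labels, structure is assumed.
A1 = [({"Leaf": "A1"}, [])]
A2 = [({"Leaf": "A2"}, [])]
B1 = [({"Leaf": "B1"}, [])]
t = [({"L": "Label1"}, A1), ({"L": "Label2"}, A2), ({"L": "Label3"}, B1)]
```

With this data, `tails(t)` yields three trees (starting at Label1, Label2 and Label3), `simple_trees(t)` yields three one-arc trees, and `subtrees(t)` yields A1, A2 and B1.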
9Tree operators
Concatenate
Tree1 Tree2
Connects two trees by their roots
10Tree operators
Hang
Arc1 / Tree1
Hangs the tree from a new arc.
11Tree operators
Prime
Tree
The first subtree of the argument.
12Tree operators
Head
Tree x
The first x simple trees of the argument; if x is
not specified, only the first simple tree.
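The four operators above can be sketched in Python under one assumption: a tree is an ordered list of arcs and an arc is a pair (label record, subtree). The function names are chosen here for illustration and are not WebOQL syntax.

```python
def concatenate(t1, t2):
    """Connect two trees at their roots: arcs of t1, then arcs of t2."""
    return t1 + t2


def hang(label, t):
    """Hang tree t from a new arc carrying the given label record."""
    return [(label, t)]


def prime(t):
    """The first subtree of t (empty tree if t has no arcs)."""
    return t[0][1] if t else []


def head(t, x=1):
    """The first x simple trees of t, kept together as one tree."""
    return t[:x]


# Building the list-plus-hyperlink tree used as an example in these slides.
items = concatenate(
    concatenate(hang({"Tag": "LI", "Text": "First Child"}, []),
                hang({"Tag": "LI", "Text": "Second Child"}, [])),
    hang({"Tag": "LI", "Text": "Third Child"}, []),
)
page = concatenate(hang({"Tag": "UL"}, items),
                   hang({"Url": "http://a.b.c.", "Label": "Click Here"}, []))
```

Here `prime(page)` gives back the three list items, and `head(items, 2)` keeps only the first two of them.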
13Tails of T (prefixes)
[Figure: the tails of a tree T, drawn with arcs Label1, Label2, Label3 and subtrees A1, A2, B1.]
14Tree t
[Figure: a tree t with arcs Label1, Label2, Label3, shown next to the simple trees of t and the subtrees of t. Labels in the figure: A1, A2, B1, null.]
15HANG
- [Label: papers from smith, Format: ps.Z / q1]
- [Tag: UL / [Tag: LI, Text: First Child]
- [Tag: LI, Text: Second Child]
- [Tag: LI, Text: Third Child]]
- [Url: http://a.b.c., Label: Click Here]
[Figure: the trees built by the hang and concatenate expressions, with arcs [Label: Papers from smith, Format: ps.Z], [Title: Recent.., Url: http://..], [Title: Are.., Url: http://www.], [Tag: UL], [Tag: LI, Text: First Child] and [Url: http://a.b.c., Label: Click Here].]
16Tree operators
Peek
Arc.field
Extracts a field from an arc's label, e.g.
Example.Group can have a value of students.
If this field does not exist, a value of nil is
returned.
IsField
Arc?field
Tests for the presence of a field in an arc's
label, e.g. Example?Group evaluates to true,
while Example?Name evaluates to false.
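A small Python sketch of Peek and IsField, assuming an arc label is a plain dictionary and that Python's `None` stands in for nil. The function names are illustrative.

```python
def peek(label, field_name):
    """Arc.field: the value of the field, or None (nil) if absent."""
    return label.get(field_name)


def is_field(label, field_name):
    """Arc?field: True if the label has the field, False otherwise."""
    return field_name in label


# Label of the first arc of the example tree from the earlier slide.
example_label = {"Group": "students"}
```

With this label, `peek(example_label, "Group")` gives students, `peek(example_label, "Name")` gives nil, and the two `is_field` tests give true and false respectively.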
17- Page: a hypertree that has an associated URL
identifying it.
- Web: a collection of interrelated pages.
- Each external arc of a page is a link in the web.
- Schema: a web can optionally have a
distinguished page that provides an entry point to the
web.
18- No schema: one must know the URL of one or more pages
http://a.b.c./three.html
http://a.b.c./one.html
http://a.b.c./two.html
19WebOQL query
[Figure: a WebOQL query takes a web and produces a web with a schema.]
http://a.b.c./three.html
http://a.b.c./one.html
http://a.b.c./four.html
http://a.b.c./two.html
20- <UL>
- <LI> First Child
- <LI> Second Child
- <LI> Third Child
- </UL>
- <A HREF=http://a.b.c.> Click Here </A>
21[Figure: hypertree with arcs [Tag: LI, Text: First Child], [Tag: LI, Text: Second Child], [Tag: LI, Text: Third Child] and [Url: http://a.b.c., Label: Click here]]
Tree representing an HTML document consisting of a
list and a hyperlink
- Trees are ordered
- Arcs are labelled not with atomic values but with
records
22[Group: DBMS]
[Group: Card]
[Group: ProgLang]
[Title: Recent, Authors: Smith, Publications: Tech]
[Title: Are, Authors: Smith, Publications: ACM]
[Label: Abstract, Url: www]
[Label: Full Papers, Url: www]
Paper database: CS papers
23SELECT - FROM - WHERE
This familiar query language construct is the
main construct of WebOQL queries.
Select: the query to evaluate
y.Label, y.URL
From: the definition of variables
x in example, y in x!
Where: a boolean condition
x.Seniority = 8
24SELECT - FROM - WHERE
For each instantiation of the variables in the
from clause, check the condition in the where
clause; if it is true, evaluate the query in
the select clause and append it to the result.
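The evaluation rule above can be sketched as a loop in Python. This is only an illustration: `select_from_where` and `instantiations` are names invented here, a tree is assumed to be a list of (label, subtree) pairs, and the sample data (including the author Jones) is hypothetical.

```python
def select_from_where(instantiations, where, select):
    """For each variable binding, test `where`; if true, evaluate
    `select` and append (concatenate) its tree to the result."""
    result = []
    for env in instantiations():
        if where(env):
            result.extend(select(env))
    return result


# Hypothetical paper database: groups on top, papers beneath.
cs_papers = [
    ({"Group": "DBMS"}, [
        ({"Title": "Recent", "Authors": "Smith"}, []),
        ({"Title": "Are", "Authors": "Jones"}, []),
    ]),
    ({"Group": "ProgLang"}, [
        ({"Title": "Cobol", "Authors": "Smith"}, []),
    ]),
]


def instantiations():
    """from x in cs_papers, y in the subtree of x (left to right)."""
    for x_label, x_sub in cs_papers:
        for y_label, _y_sub in x_sub:
            yield {"x": x_label, "y": y_label}


titles = select_from_where(
    instantiations,
    where=lambda env: "Smith" in env["y"]["Authors"],
    select=lambda env: [({"Title": env["y"]["Title"]}, [])],
)
```

The result is a tree with one arc per paper that passed the condition, in the order in which the bindings were produced.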
25select y.Title, y.Publication
from x in csPapers, y in x
Missing data: when a field such as Publication is absent, its value is undefined.
26- Compute a listing of the papers' publication data
grouped by title.
- select x.Title /
- select z.Publication from y in csPapers, z in y
- where x.Title = y.Title
- from w in csPapers, x in w
27- Schema: a distinguished hypertree.
- Browsing function: maps strings (URLs) to
hypertrees. It defines a graph where the nodes are
pages and there is an arc from node a to node b if
the content of the page at node a contains an
external arc whose url attribute is the URL of
the page at node b.
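A sketch of the graph defined by a browsing function, in Python. Assumptions made here: the browsing function is a dictionary from URL to tree, a tree is a list of (label, subtree) pairs, and any arc whose label has a `Url` field is treated as external. The link structure between the three pages is invented for the example.

```python
def external_urls(tree):
    """Url fields of all external arcs in the tree, in document order."""
    urls = []
    for label, sub in tree:
        if "Url" in label:
            urls.append(label["Url"])
        urls.extend(external_urls(sub))
    return urls


def page_graph(browse):
    """Arc a -> b when page a has an external arc whose Url is page b."""
    return {
        url: [target for target in external_urls(tree) if target in browse]
        for url, tree in browse.items()
    }


# Hypothetical web of three pages.
browse = {
    "http://a.b.c./one.html": [
        ({"Tag": "P"}, [
            ({"Label": "two", "Url": "http://a.b.c./two.html"}, []),
        ]),
    ],
    "http://a.b.c./two.html": [
        ({"Label": "three", "Url": "http://a.b.c./three.html"}, []),
        ({"Label": "elsewhere", "Url": "http://x.y.z/"}, []),
    ],
    "http://a.b.c./three.html": [],
}

graph = page_graph(browse)
```

Links to URLs that are not pages of the web (such as the hypothetical `http://x.y.z/`) do not produce arcs in the graph.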
28- Analogy with relational databases
- Hypertrees -> relations
- Webs -> databases
- Schema of a web -> catalog of a database
29- select x.Tag
- from x in
- browse(http://www.cs.toronto.edu)
Result:
[Tag: body]
[Tag: head]
30- SFW creates a web
- select y.Title, y.URL as schema
- from x in csPapers, y in x
- where y.Authors ~ smith
- Create a web page with URL "Group Names" whose
content is the list of group names (assume that
there is no such page in the current web)
- select x.Group as "Group Names" from x in
csPapers
31- Create several pages, one for each research
group (using the group name as URL). Each page
contains the publications of the corresponding
group.
- select x as x.Group from x in csPapers
32Data Model
- Records as Labels on Arcs
- Internal and External Arcs
[Figure: hypertree of a city overview page. Internal arcs carry records such as [Tag: H1, Text: City Overview], [Tag: UL, Text: one of the ...], [Tag: LI, Text: If you are interested ...] and [Tag: LI, Text: All the hotels ...]. External arcs carry records such as [Label: Theatres Online, Url: http://www, Base: http://www, Text: This page contains...], [Label: Sports Zone, Url: http://www, Base: http://www, Text: Sports Zone] and [Label: All the Hotels, Url: http://www, Base: http://www, Text: These are all ...].]
33Query: list elements containing "ticket"
- doc: http://www.citynet.com/overview.html
- [Tag: UL /
- select y
- from y in doc!
- where y.Text ~ ticket]
[Figure: the result tree, with arcs [Tag: UL], [Tag: LI], [Tag: LI] and the external arcs [Label: Theatres Online, Url: http://www, Base: http://www, Text: This page contains...] and [Label: Sports Zone, Url: http://www, Base: http://www, Text: Sports Zone].]
34Web restructuring
Using these tree operators, we have shown how a
tree can be restructured.
To restructure a web, we need a function
that maps one web to another. The new web has
some hypertree as its schema, while its browsing
function is an extension of the old web's
browsing function: it targets URLs which were not
previously targeted.
In WebOQL this is done with the as
clause.
35Web restructuring
Generally, the select clause of WebOQL has the
form
select q1 as s1, q2 as s2, ..., qn as sn
Each si can be either the keyword schema or a string
query.
An as clause whose target is schema defines
the schema of the web.
[Title: y.Group] as schema
36Web restructuring
Generally, the select clause of WebOQL has the
form
select q1 as s1, q2 as s2, ..., qn as sn
Each si can be either the keyword schema or a string
query.
An as clause whose target evaluates to a string defines
a page, and the string is treated as its URL.
x.Name as y.Group
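The effect of the as clause can be sketched in Python. Everything named here is an assumption for illustration: `build_web`, the `SCHEMA` marker standing in for the keyword schema, the list-of-(label, subtree) tree representation, and the sample data.

```python
SCHEMA = object()   # stands in for the keyword `schema`


def build_web(tree, clauses):
    """Evaluate (query, target) pairs for every arc of `tree`.
    A target equal to SCHEMA extends the schema; a string target
    names a page (its URL) and the query result is appended to it."""
    schema, pages = [], {}
    for arc in tree:
        for query, target in clauses:
            destination = target(arc)
            fragment = query(arc)
            if destination is SCHEMA:
                schema.extend(fragment)
            else:
                pages.setdefault(destination, []).extend(fragment)
    return schema, pages


# Hypothetical paper database: one arc per research group.
cs_papers = [
    ({"Group": "DBMS"}, [({"Title": "Recent", "Authors": "Smith"}, [])]),
    ({"Group": "ProgLang"}, [({"Title": "Cobol", "Authors": "James J"}, [])]),
]

schema, pages = build_web(cs_papers, [
    # [Name: x.Group, Url: x.Group] as schema
    (lambda x: [({"Name": x[0]["Group"], "Url": x[0]["Group"]}, [])],
     lambda x: SCHEMA),
    # the papers of the group, as x.Group
    (lambda x: x[1], lambda x: x[0]["Group"]),
])
```

The returned pair plays the role of the new web: `schema` is its distinguished tree and `pages` extends the browsing function with one entry per group name.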
37Web restructuring
After a web is created there are two
possibilities: either query it further
(restructure it) or return it to the host
application.
To return the web to the host
application for display in a
browser, the pages must be formatted in an HTML-compliant
way. This is done by
restructuring the web using HTML tags as labels.
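Once the labels are HTML tags, turning a tree into markup is mechanical. The Python sketch below is a hypothetical renderer, not part of WebOQL: it assumes a tree is a list of (label, subtree) pairs, that `Tag`, `Text`, `Url` and `Label` are the only fields that matter, and that every element gets a closing tag.

```python
from html import escape


def to_html(tree):
    """Render arcs labeled with Tag/Text as elements and arcs with a
    Url field as anchors; arcs with neither field are skipped."""
    parts = []
    for label, sub in tree:
        if "Url" in label:
            parts.append('<A HREF="%s">%s</A>' % (
                escape(label["Url"]), escape(label.get("Label", ""))))
        elif "Tag" in label:
            tag = label["Tag"]
            parts.append("<%s>%s%s</%s>" % (
                tag, escape(label.get("Text", "")), to_html(sub), tag))
    return "".join(parts)


# The list-plus-hyperlink tree used as an example in these slides.
page = [
    ({"Tag": "UL"}, [
        ({"Tag": "LI", "Text": "First Child"}, []),
        ({"Tag": "LI", "Text": "Second Child"}, []),
        ({"Tag": "LI", "Text": "Third Child"}, []),
    ]),
    ({"Url": "http://a.b.c.", "Label": "Click Here"}, []),
]

markup = to_html(page)
```

A real formatter would also need to treat void elements such as BR differently; that is left out of this sketch.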
38Document restructuring
Web documents are a typical example of semistructured
data, since they do not have a fixed
schema and can have various irregularities. In
an HTML document most of the tags may appear any
number of times or not at all.
WebOQL uses a wrapper which creates abstract
syntax trees (ASTs) from arbitrary HTML
documents. This works because the markup
tags of HTML reflect the logical
relationships between the various information
items.
Example: <UL> <LI> item 1. </LI> <LI>
item 2. </LI> <LI> item 2. </LI> </UL>
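A toy wrapper can be sketched with Python's standard `html.parser` module. This is not the WebOQL wrapper; it is a minimal illustration that assumes well-nested markup with explicit closing tags, stores element text in a `Text` field, and copies an `href` attribute into a `Url` field. The class and function names are invented here.

```python
from html.parser import HTMLParser


class HypertreeBuilder(HTMLParser):
    """Builds a list of (label, subtree) arcs from well-nested HTML."""

    def __init__(self):
        super().__init__()
        self.root = []
        # Stack of (label, children) for the currently open elements.
        self._open = [({}, self.root)]

    def handle_starttag(self, tag, attrs):
        label = {"Tag": tag.upper()}
        href = dict(attrs).get("href")
        if href is not None:
            label["Url"] = href
        children = []
        self._open[-1][1].append((label, children))
        self._open.append((label, children))

    def handle_endtag(self, tag):
        if len(self._open) > 1:
            self._open.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and len(self._open) > 1:
            label = self._open[-1][0]
            label["Text"] = (label.get("Text", "") + " " + text).strip()


def html_to_hypertree(html):
    builder = HypertreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


tree = html_to_hypertree("<UL><LI>item 1.</LI><LI>item 2.</LI></UL>")
```

Unclosed tags (for example a bare LI or BR) would nest incorrectly in this sketch; a production wrapper has to know the HTML content model.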
39Document restructuring
Navigation patterns
In the examples we have seen, the variables used
in the queries ranged over the simple trees of the
tree we queried. On the WWW, however, variables may
range over several linked subtrees
whose structure is not fully known to us.
select x.text from x in someones.html via
^*[Tag = H2]
^ - record predicate which is true for every
internal arc.
[Tag = H2] - record predicate which is true for
every arc which has an H2 tag.
40Document restructuring
Navigation patterns
In the examples we have seen, the variables used
in the queries ranged over the simple trees of the
tree we queried. On the WWW, however, variables may
range over several linked subtrees
whose structure is not fully known to us.
select x.text from x in someones.html via
>[not(Tag = H2)]
> - record predicate which is true for every
external arc.
[not(Tag = H2)] - record predicate which is true
for every arc which does not have an H2 tag.
41Document restructuring
Navigation patterns
When a navigation pattern is omitted, the
query is treated as if it had a navigation
pattern that always evaluates to true.
Variables are instantiated left to right, in
depth-first or breadth-first order. The
default is breadth-first; to use depth-first, the
keyword viadfs is used instead of via.
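The difference between the two instantiation orders can be sketched in Python for one simple family of patterns: any number of arcs satisfying a `step` predicate, followed by one arc satisfying a `final` predicate. The functions `via_bfs` and `via_dfs`, the predicates, and the sample document are all assumptions made for this illustration; real navigation patterns are more general.

```python
from collections import deque


def via_bfs(tree, step, final):
    """Breadth-first: labels of arcs matching `final`, reached through
    arcs matching `step`, level by level and left to right."""
    found, queue = [], deque([tree])
    while queue:
        for label, sub in queue.popleft():
            if final(label):
                found.append(label)
            if step(label):
                queue.append(sub)
    return found


def via_dfs(tree, step, final):
    """Depth-first: same matches, but each subtree is explored fully
    before moving on to the next arc."""
    found = []
    for label, sub in tree:
        if final(label):
            found.append(label)
        if step(label):
            found.extend(via_dfs(sub, step, final))
    return found


# Hypothetical document: an H2 nested under an H1, then a top-level H2.
doc = [
    ({"Tag": "H1", "Text": "a"}, [({"Tag": "H2", "Text": "b"}, [])]),
    ({"Tag": "H2", "Text": "c"}, []),
]


def any_arc(label):
    return True


def is_h2(label):
    return label.get("Tag") == "H2"


bfs_texts = [label["Text"] for label in via_bfs(doc, any_arc, is_h2)]
dfs_texts = [label["Text"] for label in via_dfs(doc, any_arc, is_h2)]
```

Both orders find the same arcs; only the order of the results differs (the top-level heading comes first breadth-first, the nested one comes first depth-first).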
42- select q1 as s1, q2 as s2, q3 as s3, ..., qm as sm
where the qi are queries and each si is either a string
query or the keyword schema.
- Generate a web consisting of a page for each
research group containing the title and authors of
all its publications, and an index web page
that lists all the groups and provides links to
their pages
- newWeb: select unique [Name: x.Group, Url:
x.Group] as schema,
- [y.Title, y.Authors] as x.Group
- from x in csPapers, y in x
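43[Figure: the resulting web. The schema ("as schema") holds arcs such as [Name: Card Punching, Url: Card Punching] and [Name: Prog. Lang, Url: Prog.Lang..]. The pages created "as x.Group" (Card Punching, Prog. Lang.) hold arcs such as [Titles: Assembly Lan, Authors: John,..], [Titles: Cobol, Authors: James J], [Titles: Recent, Authors: Smith] and [Titles: Arc, Authors: Smith].]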
44- NewerWeb < newWeb
- select [Tag: H3, Text: y.Title],
- [Tag: BR, Text: y.Publication],
- [Tag: BR, Text: y.Authors],
- [Tag: P]
- as x.Name
- from x in schema, y in x.Name
- select [Tag: H2, Text: Publications of the
x.Name Group] x.Name
- [Tag: A, Label: To Index, Url:
http://a.b.c/Index of Projects.html]
- as http://a.b.c/ x.Name .html
- from x in schema
45- select [Url: http://a.b.c/Index of
Projects.html] as schema,
- [Tag: H2, Text: Index of Projects]
- [Tag: UL /
- select [Tag: LI /
- [Tag: A, Label: x.Name,
- Url: http://a.b.c/ x.Url .html]]
- from x in schema]
- as http://a.b.c/Index of Projects.html
46- <H2> Index of Projects </H2>
- <UL>
- <LI> <A HREF=http://a.b.c./cardpunching.html>
- Card Punching
- </A>
- </LI>
- <LI> <A HREF=http://a.b.c./programminglanguages.html>
- Programming Languages
- </A>
- </LI>
- <LI> ..
- </UL>
Index Page
47- <H2>Publications of the Card Punching group </H2>
- <H3> Recent Discoveries in Card Punching </H3>
- <BR> Technical Report TROIS
- <BR> Peter Smith, John Brown
- <P>
- <H3> Are Magnetic Media Better ? </H3>
- <BR> ACM TOCP Vol 3 No. (1942) pp.2337
- <BR> Peter Smith, John Brown
- <P>
- <A HREF=http://a.b.c./IndexnProject.html>
- To index
- </A>
Group Pages
48Navigation Pattern
- [not(Tag = A)]* - paths of any length composed
of arcs not having an attribute Tag with value
A.
- [Tag = LI][Tag = A] - paths of length 2
- ^*> - all paths in a tree that lead from the root to
an external arc
- select x.url
- from x in http://a.b.c./index.html
- via [not(Tag = Table)]*>
- All the external arcs in the document pointed to
by the URL that do not occur within a table
49- select x.url, x.text
- from x in http://a.b.c./root.html
- via (Label = Next >)
50Architecture