Title: Modeling and Querying Web Data: A Survey
1. Modeling and Querying Web Data: A Survey
2. Overview
- Introduction
- Data Representation for Querying the Web
- Modeling and Querying the Web
- Summary and Future
3. Introduction
- Background
  - The most common techniques for finding information on
    the Web are based on sending information retrieval
    requests to index servers.
  - Web query techniques can be used to locate, filter, and
    present Web information.
- Challenges
  - It is difficult to build a common model for the Web.
  - It is hard to extract information from Web data.
4. Data Representation for Querying the Web
- Graph Data Models
  - Based on a labeled graph in which the nodes represent
    Web pages, the edges represent links between Web pages,
    and the labels on the edges can be attribute names.
  - Capable of expressing navigational queries over the
    graph structure.
- Semistructured Data Models
  - Based on labeled directed graphs. There is no
    restriction on the number of edges that can leave a
    given node, or on the type of an attribute value.
  - Able to query the schema, or the labels on the edges of
    the graph.
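The labeled-graph idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `LabeledGraph`, its methods, and the page names and labels are all hypothetical and do not come from any of the surveyed systems.

```python
from collections import defaultdict


class LabeledGraph:
    """Directed graph whose edges carry a label, such as an attribute name."""

    def __init__(self):
        # node -> list of (label, target); any number of edges may leave a node
        self._out = defaultdict(list)

    def add_edge(self, source, label, target):
        self._out[source].append((label, target))

    def follow(self, node, label):
        """Navigational step: targets reached from `node` over `label` edges."""
        return [t for (l, t) in self._out.get(node, []) if l == label]

    def labels(self, node):
        """Schema-style query: the set of edge labels leaving `node`."""
        return {l for (l, _) in self._out.get(node, [])}


g = LabeledGraph()
# Hypothetical pages and labels, used only for illustration.
g.add_edge("home.html", "publications", "pubs.html")
g.add_edge("home.html", "contact", "contact.html")
g.add_edge("pubs.html", "paper", "paper1.html")
g.add_edge("pubs.html", "paper", "paper2.html")
# Attribute values are not restricted to one type: a string and an integer.
g.add_edge("paper1.html", "title", "Querying the Web")
g.add_edge("paper1.html", "year", 1998)

# Navigational query: follow "publications", then "paper".
papers = [p
          for pubs in g.follow("home.html", "publications")
          for p in g.follow(pubs, "paper")]
print(papers)                          # ['paper1.html', 'paper2.html']
print(sorted(g.labels("paper1.html")))  # ['title', 'year']
```

The `labels` method hints at why semistructured models can query the schema: the edge labels are ordinary data that can be enumerated like any other value.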
5. Data Representation for Querying the Web (cont.)
Figure: A hypertree containing a publications database
(WebOQL) [AM98]
6. Data Representation for Querying the Web (cont.)
- Semantic Web Data Models
  - The Semantic Web is a Web whose content can be
    annotated with metadata and processed automatically by
    machines.
  - Semantic assertions on the Semantic Web are formulated
    in the Resource Description Framework (RDF) model
    [LS99], which can be viewed as a partially labeled
    directed graph.
  - These models can exploit the semantics of Web content
    and can provide better query results than counterparts
    based only on the content and structure of Web data.
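As a rough sketch of how RDF statements form a directed graph, each statement can be held as a (subject, predicate, object) triple, where the predicate labels the edge from subject to object. The URIs and property names below are hypothetical. The `_:` prefix for the unlabeled (blank) node follows the N-Triples convention, and reading "partially labeled" as "some nodes carry no URI" is an interpretation, not something the slides state.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical statements, for illustration only.
triples = [
    Triple("http://example.org/paper1", "title", "Querying the Web"),
    Triple("http://example.org/paper1", "creator", "_:b1"),  # blank node
    Triple("_:b1", "name", "J. Smith"),
    Triple("_:b1", "homepage", "http://example.org/people/smith"),
]


def out_edges(node):
    """Labeled edges leaving `node`, as (predicate, object) pairs."""
    return [(t.predicate, t.obj) for t in triples if t.subject == node]


def nodes():
    """All nodes of the graph: every subject and every object."""
    return {t.subject for t in triples} | {t.obj for t in triples}


def blank_nodes():
    """Nodes that carry no URI label."""
    return {n for n in nodes() if n.startswith("_:")}


print(out_edges("http://example.org/paper1"))
print(blank_nodes())  # {'_:b1'}
```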
7. Data Representation for Querying the Web (cont.)
Figure: An example RDF graph [WWW1]
8. Modeling and Querying the Web
- Query Languages for Graph Representation of Websites
  - These query languages combine content-based queries
    with structure-based queries. They can therefore
    formulate regular path expression queries and express
    navigational queries over the graph structure.
  - WebSQL [MMM97], W3QL [KS95], WebLog [LSS96]
  - Example: WebSQL [MMM97]
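To make "regular path expression query" concrete, here is a minimal sketch that is not taken from any of the languages listed above. It assumes every edge label is a single character ("L" for a local link, "G" for a global link), so the labels along a path form a string that can be tested with Python's `re` module. The graph, the function name `match_paths`, and the `max_len` bound are all hypothetical.

```python
import re

# Hypothetical link graph: node -> list of (label, target).
GRAPH = {
    "a": [("L", "b"), ("G", "x")],
    "b": [("L", "c")],
    "c": [("G", "y")],
    "x": [],
    "y": [],
}


def match_paths(start, pattern, max_len=4):
    """Nodes reachable from `start` by a path of at most `max_len` edges
    whose label string fully matches the regular expression `pattern`."""
    regex = re.compile(pattern)
    found = set()
    frontier = [(start, "")]  # (node, labels seen so far)
    for _ in range(max_len + 1):
        next_frontier = []
        for node, labels in frontier:
            if regex.fullmatch(labels):
                found.add(node)
            for label, target in GRAPH.get(node, []):
                next_frontier.append((target, labels + label))
        frontier = next_frontier
    return found


print(sorted(match_paths("a", "L*")))   # ['a', 'b', 'c']
print(sorted(match_paths("a", "L*G")))  # ['x', 'y']
```

Enumerating bounded paths keeps the sketch short. A real evaluator would more likely traverse the graph and an automaton for the expression together, which also handles cycles without a length bound.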
9. Modeling and Querying the Web (cont.)
- WebSQL [MMM97]
  - Models the Web as a relational database with two
    virtual relations, Document and Anchor.
    - Document(url, title, text, type, length, modif)
    - Anchor(base, href, label)
  - To map onto the graph structure of the WWW, each
    document in the Document relation is mapped to a node
    object in the graph, and each hypertext link between
    two documents in the Anchor relation is represented by
    a link object.
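The two virtual relations and their mapping onto a graph might be pictured as follows. The attribute names come from the slide. The field types, the sample rows, and the helper `build_graph` are assumptions made for the sake of a runnable illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    url: str
    title: str
    text: str
    type: str
    length: int   # assumed to be a size in bytes
    modif: str    # assumed to be a last-modification date


@dataclass(frozen=True)
class Anchor:
    base: str     # URL of the document containing the link
    href: str     # URL of the document the link points to
    label: str    # anchor text


def build_graph(documents, anchors):
    """Map each Document row to a node and each Anchor row to a link."""
    nodes = {d.url: d for d in documents}
    links = [(a.base, a.label, a.href)
             for a in anchors
             if a.base in nodes and a.href in nodes]
    return nodes, links


# Hypothetical rows, for illustration only.
docs = [
    Document("http://example.org/", "Home", "...", "text/html", 1200, "1998-02-01"),
    Document("http://example.org/pubs", "Papers", "...", "text/html", 3400, "1998-02-03"),
]
anchors = [
    Anchor("http://example.org/", "http://example.org/pubs", "Publications"),
    Anchor("http://example.org/", "http://elsewhere.example/", "External"),
]

nodes, links = build_graph(docs, anchors)
print(links)  # [('http://example.org/', 'Publications', 'http://example.org/pubs')]
```

The second anchor is dropped here only because its target is not among the sample documents. In WebSQL the relations are virtual, so rows are materialized on demand rather than stored up front.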
10. Modeling and Querying the Web (cont.)
- Sample query [FLM98]: find a list of tuples of the form
  (d1, d2, label), where d1 is a document stored at the
  local site, d2 is a document stored somewhere else, and
  d1 points to d2 by a link labeled label. Suppose all the
  local documents are reachable from www.mysite.start.
    SELECT d.url, e.url, a.label
    FROM Document d SUCH THAT "www.mysite.start" →* d,
         Document e SUCH THAT d ⇒ e,
         Anchor a SUCH THAT a.base = d.url
    WHERE a.href = e.url
- Here → denotes a local link (same server), →* a path of
  zero or more local links, and ⇒ a global link (different
  server).
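The intent of the sample query can be imitated in plain Python. This is a sketch under assumptions, not WebSQL's evaluation strategy: the Anchor relation is a hard-coded list, a link is treated as local when both URLs share a host, and the `http://` scheme and all URLs other than the start host are made up.

```python
from urllib.parse import urlparse

# Hypothetical (base, href, label) rows standing in for the Anchor relation.
ANCHORS = [
    ("http://www.mysite.start/", "http://www.mysite.start/a.html", "A"),
    ("http://www.mysite.start/a.html", "http://www.mysite.start/b.html", "B"),
    ("http://www.mysite.start/a.html", "http://other.example/x.html", "X"),
    ("http://www.mysite.start/b.html", "http://other.example/y.html", "Y"),
    ("http://other.example/x.html", "http://third.example/z.html", "Z"),
]


def is_local(base, href):
    """Assumption: a link is local when both ends are on the same host."""
    return urlparse(base).netloc == urlparse(href).netloc


def local_closure(start):
    """Documents reachable from `start` over zero or more local links."""
    seen = {start}
    stack = [start]
    while stack:
        url = stack.pop()
        for base, href, _ in ANCHORS:
            if base == url and is_local(base, href) and href not in seen:
                seen.add(href)
                stack.append(href)
    return seen


def outgoing_global_links(start):
    """(d.url, e.url, a.label) for global links leaving local documents."""
    local_docs = local_closure(start)
    return sorted((base, href, label)
                  for base, href, label in ANCHORS
                  if base in local_docs and not is_local(base, href))


for row in outgoing_global_links("http://www.mysite.start/"):
    print(row)
```

The link from `other.example` to `third.example` is global but is not reported, because its source document is not reachable through local links from the start page.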
11. Modeling and Querying the Web (cont.)
- Query Languages for Semistructured Representation of
  Websites
  - These languages discover the implicit structure within
    semistructured Web data and then recast the data to fit
    the discovered structure.
  - WG-Log [CDPT98], ULIXES and PENELOPE [AMM97a],
    WebOQL [AM98]
  - Example: WebOQL [AM98]
12. Modeling and Querying the Web (cont.)
- WebOQL [AM98]
  - Introduces the hypertree data structure. A hypertree is
    an ordered arc-labeled tree with two kinds of arcs:
    internal arcs and external arcs. Internal arcs
    represent structured objects, and external arcs
    represent hyperlinks among objects. Arcs are labeled
    with records.
Figure: A hypertree containing a publications database
(WebOQL) [AM98]
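A hypertree as described above could be modeled roughly like this. The class and field names, the convention that an external arc's record holds a `Url` field, and the sample data are assumptions for illustration and are not WebOQL's actual implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Arc:
    record: dict                          # arcs are labeled with records
    external: bool = False                # True for a hyperlink arc
    child: Optional["Hypertree"] = None   # subtree below an internal arc


@dataclass
class Hypertree:
    arcs: List[Arc] = field(default_factory=list)  # list order = arc order


def external_urls(tree):
    """Collect the Url field of every external arc, in tree order."""
    urls = []
    for arc in tree.arcs:
        if arc.external:
            urls.append(arc.record.get("Url"))
        elif arc.child is not None:
            urls.extend(external_urls(arc.child))
    return urls


# Hypothetical publications data.
paper = Hypertree([
    Arc({"Label": "Abstract", "Url": "http://example.org/p1-abstract.html"},
        external=True),
    Arc({"Label": "Full version", "Url": "http://example.org/p1.ps"},
        external=True),
])
publications = Hypertree([
    Arc({"Title": "Querying the Web", "Authors": "Smith"}, child=paper),
])

print(external_urls(publications))
```

Keeping the arcs in a list preserves their order, which matters because a hypertree is an ordered tree.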
13. Modeling and Querying the Web (cont.)
- Web pages are represented by a hypertree and a mapping
  function. The mapping function maps URLs to the
  corresponding hypertrees. The hypertree and the mapping
  function are also called the schema and the browsing
  function of the Web, respectively.
- Sample query [FLM98]: extract the title and URL of the
  full version of papers authored by Smith from the
  csPapers database.
    SELECT y.Title, y.Url
    FROM x in csPapers, y in x
    WHERE y.Authors ~ "Smith"
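The shape of this query can be mimicked with a nested comprehension. This sketch assumes `csPapers` is a list of groups, each holding paper records with `Title`, `Authors`, and `Url` fields. That nesting, the sample records, and the use of a substring test in place of WebOQL's pattern matching are all assumptions.

```python
# Hypothetical data: a list of groups, each a list of paper records.
csPapers = [
    [
        {"Title": "Querying the Web", "Authors": "Smith, Jones",
         "Url": "http://example.org/p1.ps"},
        {"Title": "Graph Models", "Authors": "Lee",
         "Url": "http://example.org/p2.ps"},
    ],
    [
        {"Title": "Hypertrees", "Authors": "Smith",
         "Url": "http://example.org/p3.ps"},
    ],
]

# x ranges over the groups, y over the papers in each group.
result = [
    (y["Title"], y["Url"])
    for x in csPapers
    for y in x
    if "Smith" in y["Authors"]
]

for title, url in result:
    print(title, url)
```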
14. Modeling and Querying the Web
- Query Languages for the Semantic Web
  - The Semantic Web is a Web whose content can be
    annotated with metadata and processed automatically by
    machines.
  - Semantic queries can exploit the semantics of Web
    content.
  - RQL [KACPS02], SquishQL [MSR02], TRIPLE [SBAHKW02]
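One simple way to picture a query over semantic annotations is matching triple patterns with variables against a set of statements. The sketch below is a generic illustration and does not reproduce the syntax or semantics of RQL, SquishQL, or TRIPLE. The `?name` variable convention, the function names, and the data are hypothetical.

```python
# Hypothetical (subject, predicate, object) statements.
TRIPLES = [
    ("paper1", "creator", "smith"),
    ("paper2", "creator", "jones"),
    ("smith", "name", "J. Smith"),
    ("jones", "name", "A. Jones"),
    ("paper1", "title", "Querying the Web"),
]


def match(pattern, triple, bindings):
    """Extend `bindings` so that `pattern` equals `triple`, or return None."""
    new = dict(bindings)
    for p, v in zip(pattern, triple):
        if p.startswith("?"):          # a variable
            if new.setdefault(p, v) != v:
                return None            # conflicts with an earlier binding
        elif p != v:                   # a constant that does not match
            return None
    return new


def query(patterns, triples):
    """All variable bindings that satisfy every pattern."""
    results = [{}]
    for pattern in patterns:
        extended = []
        for bindings in results:
            for triple in triples:
                new = match(pattern, triple, bindings)
                if new is not None:
                    extended.append(new)
        results = extended
    return results


# Titles of papers whose creator is named "J. Smith".
answers = query(
    [("?paper", "creator", "?person"),
     ("?person", "name", "J. Smith"),
     ("?paper", "title", "?title")],
    TRIPLES,
)
print([a["?title"] for a in answers])  # ['Querying the Web']
```

The query joins on the meaning carried by the `creator` and `name` properties rather than on page text or link structure, which is the distinction the slide draws.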
15. Summary and Future
- Summary
  - Web data models are divided into three main
    categories: the graph data model, the semistructured
    data model, and the Semantic Web data model.
  - Based on these data models, Web query languages are
    also classified into three primary groups.
- Future
  - Developing techniques to manipulate dynamic pages could
    benefit Web query applications and may be a promising
    direction for future research.
  - Combining query results from different resources on the
    Web, especially results from both structured and
    unstructured data sources, also poses challenges for
    future research.
16. References
- [AM98] G. Arocena, A. Mendelzon, WebOQL: Restructuring
  Documents, Databases, and Webs, Proc. ICDE'98, Orlando,
  Florida, Feb. 1998.
- [CDPT98] S. Comai, E. Damiani, R. Posenato, L. Tanca, A
  Schema-based Approach to Modeling and Querying WWW Data,
  Proc. of FQAS'98, Roskilde, May 1998, LNAI 1495.
- [AMM97a] P. Atzeni, G. Mecca, P. Merialdo, To Weave the
  Web, International Conference on Very Large Data Bases
  (VLDB'97), Athens, Greece, August 26-29, 1997, pages
  206-215.
- [FLM98] D. Florescu, A. Levy, A. Mendelzon, Database
  Techniques for the World-Wide Web: A Survey, SIGMOD
  Record 27, 3 (1998), 59-74.
- [KACPS02] G. Karvounarakis, S. Alexaki, V. Christophides,
  D. Plexousakis, M. Scholl, RQL: A Declarative Query
  Language for RDF, WWW2002, May 2002, Honolulu, Hawaii.
- [KS95] D. Konopnicki and O. Shmueli, W3QS: A query system
  for the World Wide Web, In Proc. of the Int. Conf. on
  Very Large Data Bases (VLDB), pages 54-65, Zurich,
  Switzerland, 1995.
- [LSS96] L. V. S. Lakshmanan, F. Sadri, L. N. Subramanian,
  A declarative language for querying and restructuring
  the Web, In Proc. of the Sixth International Workshop on
  Research Issues in Data Engineering, RIDE96, New Orleans,
  February 1996.
- [MM97] A. O. Mendelzon, T. Milo, Formal Models of Web
  Queries, Proceedings of the Sixteenth ACM Symposium on
  Principles of Database Systems, 134-143, 1997.
- [MMM97] A. Mendelzon, G. Mihaila, T. Milo, Querying the
  world wide web, International Journal on Digital
  Libraries, 1(1):54-67, 1997.
- [MSR02] L. Miller, A. Seaborne, A. Reggiori, Three
  Implementations of SquishQL, a Simple RDF Query Language,
  Proceedings of 1st International Semantic Web Conference
  (ISWC2002), Sardinia, Italy, June 9-12, 2002.
- [SBAHKW02] A. Sheth, C. Bertram, D. Avant, B. Hammond,
  K. Kochut, Y. Warke, Semantic Content Management for
  Enterprises and the Web, IEEE Internet Computing,
  July/August 2002, pp. 80-87.
- [WWW1] http://www.amk.ca/talks/semweb-intro, Introduction
  to the Semantic Web and RDF