Modeling and Querying Web Data A Survey - PowerPoint PPT Presentation

Slides: 18
Provided by: admi1144
Learn more at: https://s2.smu.edu
Transcript and Presenter's Notes

1
Modeling and Querying Web Data: A Survey
  • By Li Lu

2
Overview
  • Introduction
  • Data Representation for Querying the Web
  • Modeling and Querying the Web
  • Summary and Future

3
Introduction
  • Background
  • The most common techniques used in searching
    information from the Web are based on sending
    information retrieval requests to index servers.
  • Use web query techniques to locate, filter and
    present web information.
  • Challenges
  • Difficult to build a common model for Web.
  • Hard to extract information from web data.

4
Data Representation for Querying the Web
  • Graph Data Models
  • Based on a labeled graph in which the nodes
    represent web pages, edges represent links
    between web pages, and the labels on the edges
    can be attribute names
  • Capable of expressing navigational queries over the
    graph structure.
  • Semistructured Data Models
  • Based on labeled directed graphs. There is no
    restriction on the number of edges that can go
    out from a given node, or on the type of
    attribute value.
  • Able to query the schema or the labels on the
    edges of the graph.
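
The labeled-graph view described above can be sketched in Python. This is only an illustration under assumptions: the class name `LabeledGraph`, its methods, and the page names are hypothetical and do not come from any of the surveyed systems. It shows a navigational query (following a sequence of edge labels) and a query over the labels themselves.

```python
from collections import defaultdict


class LabeledGraph:
    """Directed graph whose edges carry labels (for example attribute names)."""

    def __init__(self):
        # node -> list of (label, target); any number of outgoing edges is allowed
        self.edges = defaultdict(list)

    def add_edge(self, source, label, target):
        self.edges[source].append((label, target))

    def follow(self, start, labels):
        """Return the nodes reached from `start` by following `labels` in order."""
        frontier = {start}
        for label in labels:
            frontier = {
                target
                for node in frontier
                for (edge_label, target) in self.edges.get(node, [])
                if edge_label == label
            }
        return frontier

    def labels(self):
        """Return every edge label used in the graph (a schema-level query)."""
        return {label for out in self.edges.values() for (label, _) in out}


# Hypothetical pages and link labels, for illustration only
g = LabeledGraph()
g.add_edge("home.html", "publications", "pubs.html")
g.add_edge("pubs.html", "paper", "paper1.html")
g.add_edge("pubs.html", "paper", "paper2.html")
g.add_edge("home.html", "contact", "contact.html")

print(sorted(g.follow("home.html", ["publications", "paper"])))
print(sorted(g.labels()))
```

The first call is a navigational query over the structure; the second inspects the labels, which is the kind of schema query the semistructured models allow.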

5
Data Representation for Querying the Web (cont.)
Figure: A Hypertree Containing a Publications Database (WebOQL) [AM98]
6
Data Representation for Querying the Web (cont.)
  • Semantic Web Data Models
  • Semantic Web is a Web whose content can be
    annotated by metadata and be processed
    automatically by machines.
  • The formulation of semantic assertions on the
    Semantic Web is based on the Resource Description
    Framework (RDF) model [LS99], which can be viewed
    as a partially labeled directed graph.
  • Semantic Web data models can exploit the semantics
    of Web content and can provide better query
    results than counterparts based only on the
    content and structure of Web data.
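
As a rough sketch of the RDF view, statements can be held as (subject, predicate, object) triples, which together form a directed graph with predicates as edge labels. The URIs, property names, and the `match` helper below are assumptions made for illustration, not part of any RDF library.

```python
# Each statement is a (subject, predicate, object) triple; hypothetical data
triples = {
    ("http://example.org/paper1", "creator", "Smith"),
    ("http://example.org/paper1", "title", "Querying the Web"),
    ("http://example.org/paper2", "creator", "Jones"),
}


def match(pattern, store):
    """Return the triples matching `pattern`; None acts as a wildcard."""
    return [
        triple
        for triple in store
        if all(p is None or p == value for p, value in zip(pattern, triple))
    ]


# Which resources have Smith as creator?
smith_papers = [s for (s, _, _) in match((None, "creator", "Smith"), triples)]
print(smith_papers)
```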

7
Data Representation for Querying the Web (cont.)
Figure: An Example RDF Graph [WWW1]
8
Modeling and Querying the Web
  • Query Languages for Graph Representation of
    Website
  • The query languages combine both the
    content-based queries and structure-based
    queries. Therefore, they are able to formulate
    regular path expression queries and to express
    navigational queries over the graph structure.
  • WebSQL [MMM97], W3QL [KS95], WebLog [LSS96]
  • Example: WebSQL [MMM97]

9
Modeling and Querying the Web (cont.)
  • WebSQL [MMM97]
  • Models the Web as a relational database with two
    virtual relations, Document and Anchor:
  • Document[url, title, text, type, length, modif]
  • Anchor[base, href, label]
  • To map onto the graph structure of the WWW, each
    document in the Document relation is mapped to a
    node object in the graph and each hypertext link
    between two documents in Anchor relation is
    represented by a link object.
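
The two virtual relations and their mapping onto a graph might be sketched as follows. The attribute names follow the slide; the Python types, the `to_graph` helper, and the sample URLs are assumptions for illustration and are not WebSQL's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    url: str
    title: str
    text: str
    type: str
    length: int
    modif: str  # last-modification date, kept as a plain string here


@dataclass(frozen=True)
class Anchor:
    base: str   # URL of the document containing the link
    href: str   # URL of the link target
    label: str  # anchor text


def to_graph(documents, anchors):
    """Map Document tuples to nodes and Anchor tuples to labeled links."""
    nodes = {d.url: d for d in documents}
    links = {url: [] for url in nodes}
    for a in anchors:
        # Only keep links whose two endpoints are known documents
        if a.base in nodes and a.href in nodes:
            links[a.base].append((a.label, a.href))
    return nodes, links


# Hypothetical sample tuples
docs = [
    Document("http://a.example/index", "Home", "...", "text/html", 120, "1998-01-01"),
    Document("http://a.example/pubs", "Papers", "...", "text/html", 300, "1998-02-01"),
]
anchors = [Anchor("http://a.example/index", "http://a.example/pubs", "publications")]

nodes, links = to_graph(docs, anchors)
print(links["http://a.example/index"])
```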

10
Modeling and Querying the Web (cont.)
  • Sample query [FLM98]: find a list of tuples of
    the form (d1, d2, label), where d1 is a document
    stored at the local site, d2 is a document stored
    somewhere else, and d1 points to d2 by a link
    labeled label. Assume all the local documents
    are reachable from www.mysite.start.
  • SELECT d.url, e.url, a.label
    FROM Document d SUCH THAT
           "www.mysite.start" →* d,
         Document e SUCH THAT d ⇒ e,
         Anchor a SUCH THAT a.base = d.url
    WHERE a.href = e.url
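
One way to read the sample query is sketched below in plain Python over an in-memory list of Anchor tuples. The sketch assumes that a link is local when both ends share a host, that d ranges over documents reachable from the start page through zero or more local links, and that e is reached from d by a single link to another host. The data and function names are hypothetical, and a real WebSQL engine would fetch pages rather than scan a list.

```python
from urllib.parse import urlparse

# Hypothetical Anchor tuples: (base, href, label)
ANCHORS = [
    ("http://www.mysite.start/", "http://www.mysite.start/a", "about"),
    ("http://www.mysite.start/a", "http://other.example/x", "partner"),
    ("http://www.mysite.start/", "http://else.example/y", "sponsor"),
    ("http://other.example/x", "http://third.example/z", "unrelated"),
]


def is_local(base, href):
    """Assumed definition: a link is local when both ends share a host."""
    return urlparse(base).netloc == urlparse(href).netloc


def local_closure(start, anchors):
    """All documents reachable from `start` via zero or more local links."""
    seen, stack = {start}, [start]
    while stack:
        url = stack.pop()
        for base, href, _ in anchors:
            if base == url and is_local(base, href) and href not in seen:
                seen.add(href)
                stack.append(href)
    return seen


def outgoing_links(start, anchors):
    """Tuples (d1, d2, label): d1 is local, d2 is elsewhere, d1 links to d2."""
    local_docs = local_closure(start, anchors)
    return sorted(
        (base, href, label)
        for base, href, label in anchors
        if base in local_docs and not is_local(base, href)
    )


for row in outgoing_links("http://www.mysite.start/", ANCHORS):
    print(row)
```

The link from other.example to third.example is excluded because its source is not a local document.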

11
Modeling and Querying the Web (cont.)
  • Query Languages for Semi-Structured
    Representation of Website
  • To discover the implicit structure within the
    semistructured Web data and then recast the Web
    data to fit into the discovered structure
  • WG-Log [CDPT98], ULIXES and PENELOPE [AMM97a],
    WebOQL [AM98]
  • Example: WebOQL [AM98]

12
Modeling and Querying the Web (cont.)
  • WebOQL [AM98]
  • Introduces the hypertree data structure. A hypertree
    is an ordered arc-labeled tree with two kinds of
    arcs: internal arcs and external arcs. Internal
    arcs represent structured objects, and
    external arcs represent hyperlinks
    among objects. Arcs are labeled with records.
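
A simplified sketch of a hypertree in Python is given below. The `Arc` and `Hypertree` classes, their fields, and the sample records are assumptions for illustration; WebOQL's own representation may differ in detail.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Arc:
    label: dict                          # arcs are labeled with records (field -> value)
    child: Optional["Hypertree"] = None  # set for internal arcs (structured objects)
    url: Optional[str] = None            # set for external arcs (hyperlinks)

    @property
    def is_external(self):
        return self.url is not None


@dataclass
class Hypertree:
    # A list keeps the arcs ordered, as the definition requires
    arcs: List[Arc] = field(default_factory=list)


# Hypothetical publication entry: one internal arc and one external arc
paper = Hypertree([
    Arc({"Title": "Querying the Web", "Authors": "Smith"}, child=Hypertree()),
    Arc({"Label": "Full version"}, url="http://example.org/paper.ps"),
])
db = Hypertree([Arc({"Group": "Databases"}, child=paper)])

print([arc.is_external for arc in paper.arcs])
```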

Figure: A Hypertree Containing a Publications Database (WebOQL) [AM98]
13
Modeling and Querying the Web (cont.)
  • Web pages are represented by hypertrees and a
    mapping function that maps URLs to their
    corresponding hypertrees. The hypertrees and the
    mapping function are also called the schema and
    the browsing function of the Web, respectively.
  • Sample query [FLM98]: extract the title and
    URL of the full version of papers authored by
    Smith from the csPapers database.
  • SELECT y.Title, y.Url
    FROM x in csPapers, y in x
    WHERE y.Authors ~ "Smith"
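
The shape of this query can be mimicked with a nested Python comprehension. The sketch assumes csPapers is a list of groups, each holding paper records with Title, Authors, and Url fields, and that the WHERE condition means the Authors field contains "Smith". The data is invented, and the flattening loses the hypertree structure that WebOQL actually operates on.

```python
# Hypothetical stand-in for the csPapers database: groups of paper records
cs_papers = [
    [
        {"Title": "Querying the Web", "Authors": "Smith, Jones",
         "Url": "http://example.org/p1.ps"},
        {"Title": "Graph Models", "Authors": "Lee",
         "Url": "http://example.org/p2.ps"},
    ],
    [
        {"Title": "Hypertrees", "Authors": "Smith",
         "Url": "http://example.org/p3.ps"},
    ],
]

result = [
    (y["Title"], y["Url"])          # SELECT y.Title, y.Url
    for x in cs_papers              # FROM x in csPapers,
    for y in x                      #      y in x
    if "Smith" in y["Authors"]      # WHERE Authors contains "Smith"
]

for title, url in result:
    print(title, url)
```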

14
Modeling and Querying the Web
  • Query Languages for Semantic Web
  • Semantic web is a web whose content can be
    annotated by metadata and be processed
    automatically by machines.
  • Semantic query has the ability to exploit the
    semantics of the Web content.
  • RQL [KACPS02], SquishQL [MSR02], TRIPLE
    [SBAHKW02].
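
To illustrate what exploiting semantics can mean, the sketch below answers "find all artists" by also using subclass statements, so instances of Painter and Sculptor are returned even though none is typed as Artist directly. This is a plain Python illustration, not the syntax of RQL, SquishQL, or TRIPLE; the class names, property names, and data are hypothetical.

```python
# Hypothetical triples mixing schema statements and instance statements
TRIPLES = {
    ("Painter", "subClassOf", "Artist"),
    ("Sculptor", "subClassOf", "Artist"),
    ("picasso", "type", "Painter"),
    ("rodin", "type", "Sculptor"),
    ("louvre", "type", "Museum"),
}


def subclasses(cls, triples):
    """Return `cls` plus every class transitively declared a subclass of it."""
    found, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if p == "subClassOf" and o == current and s not in found:
                found.add(s)
                stack.append(s)
    return found


def instances_of(cls, triples):
    """Instances of `cls` or of any of its subclasses, sorted by name."""
    classes = subclasses(cls, triples)
    return sorted(s for s, p, o in triples if p == "type" and o in classes)


print(instances_of("Artist", TRIPLES))
```

A query based only on content and structure would match no resource typed as Artist here; the subclass statements are what make the two instances retrievable.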

15
Summary and Future
  • Summary
  • Web data models are divided into three main
    categories: graph data models, semistructured data
    models, and Semantic Web data models.
  • Based on these data models, Web query languages
    are also classified into three primary groups.
  • Future
  • Developing techniques to manipulate dynamic pages
    could benefit Web query applications and
    may be a promising direction for future
    research.
  • Combining query results from different
    resources on the Web, especially results from
    both structured and unstructured data sources,
    also poses challenges for future research.

16
References
  • [AM98] G. Arocena, A. Mendelzon, WebOQL:
    Restructuring Documents, Databases, and Webs,
    Proc. ICDE'98, Orlando, Florida, Feb. 1998.
  • [CDPT98] S. Comai, E. Damiani, R. Posenato, L.
    Tanca, A Schema-based Approach to Modeling and
    Querying WWW Data, Proc. of FQAS'98, Roskilde,
    May 1998, LNAI 1495.
  • [AMM97a] P. Atzeni, G. Mecca, P. Merialdo, To
    Weave the Web, International Conference on Very
    Large Data Bases (VLDB'97), Athens, Greece,
    August 26-29, 1997, pages 206-215.
  • [FLM98] D. Florescu, A. Levy, A. Mendelzon,
    Database Techniques for the World-Wide Web: A
    Survey, SIGMOD Record 27, 3 (1998), 59-74.
  • [KACPS02] G. Karvounarakis, S. Alexaki, V.
    Christophides, D. Plexousakis, M. Scholl, RQL: A
    Declarative Query Language for RDF, WWW2002, May
    2002, Honolulu, Hawaii.
  • [KS95] D. Konopnicki and O. Shmueli,
    W3QS: A query system for the World Wide Web, In
    Proc. of the Int. Conf. on Very Large Data Bases
    (VLDB), pages 54-65, Zurich, Switzerland, 1995.
  • [LSS96] L. V. S. Lakshmanan, F. Sadri, I.
    N. Subramanian, A declarative language for
    querying and restructuring the Web, In Proc. of
    the Sixth International Workshop on Research
    Issues in Data Engineering, RIDE'96, New Orleans,
    February 1996.
  • [MM97] A. O. Mendelzon, T.
    Milo, Formal Models of Web Queries, Proceedings
    of the Sixteenth ACM Symposium on Principles of
    Database Systems, 134-143, 1997.
  • [MMM97] A. Mendelzon, G. Mihaila, T. Milo,
    Querying the World Wide Web, International
    Journal on Digital Libraries, 1(1):54-67, 1997.

17
  • [MSR02] L. Miller, A. Seaborne,
    A. Reggiori, Three Implementations of SquishQL,
    a Simple RDF Query Language, Proceedings of the 1st
    International Semantic Web Conference, ISWC2002,
    Sardinia, Italy, June 9-12, 2002.
  • [SBAHKW02] A. Sheth, C. Bertram, D. Avant, B.
    Hammond, K. Kochut, Y. Warke, Semantic Content
    Management for Enterprises and the Web, IEEE
    Internet Computing, July/August 2002, pp. 80-87.
  • [WWW1] http://www.amk.ca/talks/semweb-intro,
    Introduction to the Semantic Web and RDF