Web Data - PowerPoint PPT Presentation

Slides: 19
Provided by: ccfdb
Tags: data | trenton | web

Transcript and Presenter's Notes


1
Web Data
  • Semistructured data
  • XML data

2
Classes of XML Documents
  • Structured
      • Un-normalized relational data
      • Ex: product catalogs, inventory data, medical
        records, network messages, logs, stock quotes
  • Mixed
      • Structured data embedded in large text fragments
      • Ex: on-line manuals, transcripts, tax forms
  • Application may process XML in both classes
      • Ex: SOAP messages
      • Header is structured; payload is mixed

3
Structured Data: HL7 Lab Report
  • Health-care industry data-exchange format
    <HL7>
      <PATIENT>
        <PID IDNum="PATID1234">
          <PaNa><FaNa>Jones</FaNa><GiNa>William</GiNa></PaNa>
          <DTofBi><date>1961-06-13</date></DTofBi>
          <Sex>M</Sex>
        </PID>
        <OBX SetID="1">
          <ObsVa>150</ObsVa>
          <ObsId>Na</ObsId>
          <AbnFl>Above high</AbnFl>
        </OBX>
        ...
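Because the HL7 fragment above is regular, it can be flattened into relational-style rows. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`. The closing `</PATIENT>` and `</HL7>` tags, the nesting of `OBX` inside `PATIENT`, and the function name `lab_rows` are assumptions made for illustration; the slide truncates the document with "...".

```python
import xml.etree.ElementTree as ET

# Assumed well-formed completion of the slide's fragment (closing tags added).
HL7_DOC = """
<HL7>
  <PATIENT>
    <PID IDNum="PATID1234">
      <PaNa><FaNa>Jones</FaNa><GiNa>William</GiNa></PaNa>
      <DTofBi><date>1961-06-13</date></DTofBi>
      <Sex>M</Sex>
    </PID>
    <OBX SetID="1">
      <ObsVa>150</ObsVa>
      <ObsId>Na</ObsId>
      <AbnFl>Above high</AbnFl>
    </OBX>
  </PATIENT>
</HL7>
"""


def lab_rows(xml_text):
    """Return one flat row (dict) per observation (OBX) of each PATIENT."""
    root = ET.fromstring(xml_text)
    rows = []
    for patient in root.iter("PATIENT"):
        pid = patient.find("PID")
        for obx in patient.iter("OBX"):
            rows.append({
                "patient_id": pid.get("IDNum"),
                "family_name": pid.findtext("PaNa/FaNa"),
                "observation": obx.findtext("ObsId"),
                "value": obx.findtext("ObsVa"),
                "flag": obx.findtext("AbnFl"),
            })
    return rows


rows = lab_rows(HL7_DOC)
```

Every field sits at a fixed path, which is what makes this class of document behave like un-normalized relational data.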

4
Mixed Data: Library of Congress
  • Documents of U.S. Legislation
    <bill bill-stage="Introduction">
      <congress>110th CONGRESS</congress>
      <session>1st Session</session>
      <legis-num>H.R. 133</legis-num>
      <current-chamber>IN THE HOUSE OF REPRESENTATIVES</current-chamber>
      <action date="June 5, 2008">
        <action-desc>
          <sponsor>Mr. English</sponsor> (for himself and
          <cosponsor>Mr. Coyne</cosponsor>) introduced the following
          bill which was referred to the
          <committee-name>Committee on Financial Services</committee-name> ...
        </action-desc>
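In mixed content, the running text and the tagged spans both carry information, so an application typically needs both views. The sketch below shows one way to recover them with `xml.etree.ElementTree`. The shortened `FRAGMENT` and the function name `split_mixed` are illustrative assumptions, not part of the slides.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed version of the <action-desc> element from the slide.
FRAGMENT = (
    "<action-desc>"
    "<sponsor>Mr. English</sponsor> (for himself and "
    "<cosponsor>Mr. Coyne</cosponsor>) introduced the following bill"
    "</action-desc>"
)


def split_mixed(xml_text):
    """Return (full running text, list of (tag, text) for the child elements)."""
    elem = ET.fromstring(xml_text)
    # itertext() yields element text and the text between tags in document order.
    full_text = "".join(elem.itertext())
    tagged = [(child.tag, child.text) for child in elem]
    return full_text, tagged


full_text, tagged = split_mixed(FRAGMENT)
```

Here `full_text` is the sentence a reader sees, while `tagged` holds the structured facts (sponsor, cosponsor) embedded in it.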

5
Where's the XML Data?

6
Schemas
here lies our interest
  • why?
      • XML: to describe semantics
      • semistructured data: to improve processing
  • what?
      • semistructured data: foundational
      • XML: several concrete proposals

7
Schemas
  • when?
      • semistructured data, XML: a posteriori
      • RDBMS: a priori, to interpret binary data
  • how?
      • semistructured data: schema is independent
      • XML: schema is hardwired with the data

8
Outline
  • schemas for semistructured data
      • foundations
      • schema extraction
  • schemas for XML
      • DTD
      • XML-Schema
      • RDF-Schema

9
Schemas: An Example
[Figure: some database]

10
Lower-Bound Schemas
[Figure: lower-bound schema graph. Labels: Root, person, company, works-for, managed-by, Employee, Company, c.e.o., name, address, name, string]

11
Upper-Bound Schemas
[Figure: upper-bound schema graph. Labels: Root, person, company, works-for, managed-by, Employee, Company, c.e.o. employee, name address url, name phone position, description, string, Any, -]

12
The Two Questions to Ask
  • Conformance: does the data conform to this
    schema?
  • Classification: if so, which objects belong
    to which classes?

13
Schemas for Semistructured XML Data
  • Motivations for considering schemas
      • Optimize query evaluation
      • Improve storage efficiency
      • Support index construction
      • Facilitate the description of database content
      • Facilitate query formulation
      • Facilitate data integration

14
Application 1: Improve Secondary Storage
Lower-bound schema
Store rest in overflow graph

15
Application 2: Query Optimization
select X.title from Bib._ X where X.*.zip = 12345
can be rewritten as
select X.title from Bib.book X where X.address.zip = 12345
using an upper-bound schema
Fernandez, Suciu 1998
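The rewrite above can be sketched as a search over the upper-bound schema: keep only the bindings of the wildcard `_` from which a `zip` edge is reachable, and replace the arbitrary path by the concrete label paths found. This is an illustrative sketch, not the algorithm of Fernandez and Suciu. The schema contents (`Book`, `Paper`, `Address`) and the function names are assumptions.

```python
# Hypothetical upper-bound schema: node -> list of (label, target) edges.
SCHEMA = {
    "Bib": [("book", "Book"), ("paper", "Paper")],
    "Book": [("title", "string"), ("address", "Address")],
    "Paper": [("title", "string"), ("author", "string")],
    "Address": [("zip", "string")],
    "string": [],
}


def paths_ending_in(schema, start, last_label):
    """All simple label paths from `start` whose final edge is `last_label`."""
    found = []

    def walk(node, path, on_path):
        for label, target in schema[node]:
            if label == last_label:
                found.append(path + [label])
            if target not in on_path:  # avoid looping on cyclic schemas
                walk(target, path + [label], on_path | {target})

    walk(start, [], {start})
    return found


def expand_wildcards(schema, root, last_label):
    """For `root._ X where X.<any path>.last_label`: map each usable binding
    of `_` to the concrete paths that lead to `last_label`."""
    result = {}
    for label, cls in schema[root]:
        paths = paths_ending_in(schema, cls, last_label)
        if paths:
            result[label] = paths
    return result


plan = expand_wildcards(SCHEMA, "Bib", "zip")
```

For this schema, `plan` contains only `book`, with the single path `address.zip`, so the evaluator never needs to scan `paper` objects.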
16
Schema Extraction (From Data)
  • Problem statement
      • given data instance D
      • find the most specific schema S for D
  • In practice: S too large, need to relax

S. Nestorov, S. Abiteboul, and R. Motwani.
Inferring structure in semistructured data. In
Proc. of the Workshop on Management of
Semi-structured Data, 1997
17
Schema Extraction (From Data)
  • Roy Goldman, Jennifer Widom: DataGuides: Enabling
    Query Formulation and Optimization in
    Semistructured Databases. VLDB 1997
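A schema in the spirit of a DataGuide can be extracted with a subset construction, much like converting an NFA to a DFA: each schema node stands for the set of data objects reachable by one label path, so every label path of the data appears exactly once. The following is a simplified sketch under that assumption, not the paper's implementation. The graph encoding and the name `build_dataguide` are hypothetical.

```python
from collections import deque


def build_dataguide(data, root):
    """Return (guide, start): guide maps a frozenset of data objects to
    {label: frozenset of all objects reached via that label}."""
    start = frozenset([root])
    guide = {}
    queue = deque([start])
    while queue:
        targets = queue.popleft()
        if targets in guide:  # this target set was already expanded
            continue
        by_label = {}
        for obj in targets:
            for label, child in data[obj]:
                by_label.setdefault(label, set()).add(child)
        guide[targets] = {l: frozenset(c) for l, c in by_label.items()}
        queue.extend(guide[targets].values())
    return guide, start


# Illustrative data graph: node -> list of (label, target) edges.
DATA = {
    "r": [("person", "p1"), ("person", "p2"), ("company", "c1")],
    "p1": [("name", "n1")],
    "p2": [("name", "n2"), ("phone", "t1")],
    "c1": [("name", "n3")],
    "n1": [], "n2": [], "n3": [], "t1": [],
}

guide, start = build_dataguide(DATA, "r")
```

Both `person` edges of the root collapse into a single schema edge whose target is `{p1, p2}`, and that node offers `name` and `phone`, the union of what the two objects provide. On highly irregular data the number of target sets can grow quickly, which is one reason extracted schemas may need to be relaxed in practice.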

18
Schema Extraction (From XML Data)
  • Minos N. Garofalakis, Aristides Gionis, Rajeev
    Rastogi, S. Seshadri, Kyuseok Shim: XTRACT: A
    System for Extracting Document Type Descriptors
    from XML Documents. SIGMOD Conference 2000