Semi-Structured Data and XML

Jacob (Jack) Gryn - Presented November 28, 2002

1
Semi-Structured Data and XML
2
Agenda
  • Semi-Structured Data
  • XML

3
Semi-Structured Data: an Introduction
  • What is structured data
  • What is non-structured data
  • What is semi-structured data
  • How is semi-structured data represented?
  • What can we do with semi-structured data?

4
What is Structured Data?
  • Strongly typed variables/attributes
  • (e.g. int, float, string[20])
  • Every attribute in a relation is defined for all
    records
  • Data is represented in some organized fashion

5
An Example of Structured Data
A relational database can be considered
structured data
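As a minimal sketch of the idea, the snippet below declares a small relational table with Python's built-in sqlite3 module. The table name, column names and rows are hypothetical, chosen only for illustration. Note that SQLite itself enforces declared types loosely, so this shows the shape of a schema rather than strict type checking.

```python
import sqlite3

# Hypothetical table: names and rows are illustrative, not from the slides.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " id INTEGER PRIMARY KEY,"
    " name VARCHAR(20) NOT NULL,"
    " salary REAL)"
)

# Every record has every attribute; an unknown value is stored as NULL.
conn.executemany(
    "INSERT INTO employee (id, name, salary) VALUES (?, ?, ?)",
    [(1, "Bob", 52000.0), (2, "Bill", None)],
)

rows = conn.execute("SELECT id, name, salary FROM employee ORDER BY id").fetchall()
print(rows)
```

Bill's salary is unknown, yet the attribute still exists for his record and holds NULL (None in Python).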
6
What is Non-Structured Data?
  • Data that has no type definitions
  • Data is not organized according to any pattern
  • No concept of variables or attributes

7
An Example of Non-Structured Data
Bob was born sometime in August of 1949. He has
a reasonable salary of 52000. Someone else was
born on the 12th of a different month, his name
is Bill. By the way, Bob was born on the 13th of
August.
As you can see, it would be almost impossible for
a computer to parse such data automatically.
8
Then what is Semi-Structured Data?
  • Anything in between structured and non-structured
    data!

9
Then what is Semi-Structured Data?
  • Everything in between structured and
    non-structured data
  • Variables are loosely typed
  • x = 1 is valid, so is x = "hello"
  • A record does not need to have all attributes
    defined
  • e.g. In a database of cars, if we don't know the
    engine type, we can choose not to define the
    field for that particular record. Whereas in a
    structured database, the attribute would be
    defined, but set to NULL.
  • An attribute of a record could be another record
  • It does not necessarily have to differentiate
    between an identifier and a value

10
So how is semi-structured data represented?
  • Semi-Structured data can be represented as a tree

11
So how is semi-structured data represented?
  • Semi-Structured data can be represented in the
    form of indented text

Bob
    Birthday
        1949
        August
        13
    Salary
        52,000
Bill
    Birthday
        1967
        April
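Indented text maps directly onto a tree, because each line's indentation depth says which earlier line is its parent. The following is a rough sketch, assuming two spaces per level and a hypothetical helper named parse_indented; it is one possible way to read such text, not a standard format.

```python
def parse_indented(text, indent=2):
    """Turn indented lines into nested (label, children) tuples."""
    root = ("root", [])
    stack = [(-1, root)]  # (depth, node) pairs along the current path
    for line in text.splitlines():
        if not line.strip():
            continue
        depth = (len(line) - len(line.lstrip(" "))) // indent
        node = (line.strip(), [])
        # Climb back up until the top of the stack is this node's parent.
        while stack[-1][0] >= depth:
            stack.pop()
        stack[-1][1][1].append(node)
        stack.append((depth, node))
    return root[1]


sample = """\
Bob
  Birthday
    1949
    August
    13
  Salary
    52,000
Bill
  Birthday
    1967
    April
"""

tree = parse_indented(sample)
print([label for label, _ in tree])
```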
12
So how is semi-structured data represented?
  • Semi-Structured data can be represented as a
    markup language (e.g. HTML, XML, LISP, AceDB,
    Tsimmis)

<employee id="3">
  <name>Bob</name>
  <extension>5513</extension>
  <department>Sales</department>
  <salary>45000</salary>
</employee>
<employee id="1">
  <name>Ed</name>
  <extension>6766</extension>
  <office>312</office>
  <department>Executive</department>
  <salary>Confidential</salary>
</employee>
13
Overview
  • Semi-Structured data is not necessarily created
    with the intention of being processed.
  • e.g. Web pages are not necessarily intended to be
    queried by a language like SQL; the web designer,
    not taking this into consideration, may not make
    it easy for the data to be processed by a machine.

14
What can we do with Semi-Structured Data?
  • Since there is some structure, it can be scanned
    and parsed
  • Once the data is parsed, we can query it using
    specialized query languages such as UnQL, GEXT
    and Lorel
  • We can clean it up to be placed into a
    structured relational database
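The parse-then-clean-up path could look roughly like the sketch below, which uses Python's standard xml.etree.ElementTree and sqlite3 modules. The <staff> wrapper element, the table layout, and the rule that non-numeric salaries become NULL are all assumptions made for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# The <staff> wrapper is an assumption; an XML document needs a single root.
xml_text = """
<staff>
  <employee id="3"><name>Bob</name><department>Sales</department><salary>45000</salary></employee>
  <employee id="1"><name>Ed</name><office>312</office><department>Executive</department><salary>Confidential</salary></employee>
</staff>
"""


def to_int(text):
    """Clean-up rule (an assumption): missing or non-numeric values become None."""
    try:
        return int(text)
    except (TypeError, ValueError):
        return None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER, name TEXT, office TEXT, salary INTEGER)")

# Parse the markup, then store each record with every column defined.
for emp in ET.fromstring(xml_text).iter("employee"):
    conn.execute(
        "INSERT INTO employee VALUES (?, ?, ?, ?)",
        (
            int(emp.get("id")),
            emp.findtext("name"),
            emp.findtext("office"),  # None when the field is absent
            to_int(emp.findtext("salary")),
        ),
    )

rows = conn.execute("SELECT * FROM employee ORDER BY id").fetchall()
print(rows)
```

Fields that were simply absent in the semi-structured input end up as explicit NULLs in the relational table.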

15
XML: an Introduction to XML
  • What is XML?
  • What does it offer to creators of DBs?
  • How can XML be used as a DB?
  • Representations of XML
  • Other features of XML
  • Disadvantages to XML

16
Summary / Key Points of Semi-Structured data
  • In between structured and non-structured data
  • Loosely typed attributes
  • Not all attributes need to be defined for every
    record
  • Can be parsed and queried

17
What is XML?
  • XML stands for eXtensible Markup Language
  • Based on tags similar to HTML
  • Actually, XHTML is a form of XML
  • Used to define markup languages

18
What does XML offer to database designers?
  • Readable by humans using Unicode or ASCII text
  • Easy for computers to parse
  • Can easily be used as back-end for web sites

19
How can XML be used as a database?
Consider the following data

id  name  extension  office  department  salary
3   Bob   5513               Sales       45000
1   Ed    6766       312     Executive   Confidential

It can be written in XML as follows

<employee id="3">
  <name>Bob</name>
  <extension>5513</extension>
  <department>Sales</department>
  <salary>45000</salary>
</employee>
<employee id="1">
  <name>Ed</name>
  <extension>6766</extension>
  <office>312</office>
  <department>Executive</department>
  <salary>Confidential</salary>
</employee>

Notice that this is semi-structured data, since
not all the fields are filled in and because they
are loosely typed.
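To illustrate treating the two employee records above as a small database, here is a sketch using the limited XPath support in Python's xml.etree.ElementTree. The <staff> root element is an assumption added so the fragment forms one document; the SQL in the comments is only an analogy.

```python
import xml.etree.ElementTree as ET

# <staff> is a hypothetical wrapper; the employee records come from the slide.
staff = ET.fromstring(
    '<staff>'
    '<employee id="3"><name>Bob</name><extension>5513</extension>'
    '<department>Sales</department><salary>45000</salary></employee>'
    '<employee id="1"><name>Ed</name><extension>6766</extension><office>312</office>'
    '<department>Executive</department><salary>Confidential</salary></employee>'
    '</staff>'
)

# Roughly: SELECT name FROM employee WHERE department = 'Sales'
sales_names = [e.findtext("name") for e in staff.findall("employee[department='Sales']")]

# Roughly: SELECT * FROM employee WHERE id = 1
ed = staff.find("employee[@id='1']")

# Records that define an <office> field at all
with_office = [e.get("id") for e in staff.findall("employee[office]")]

print(sales_names, ed.findtext("name"), with_office)
```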
20
In XML, there are few restrictions to how data
can be laid out
  • The tag names can represent either attribute
    names or data itself
  • Tag names can be defined to anything the creator
    wishes

21
But, there are still a few restrictions
  • Every tag that is opened must be closed.
  • <name>Bob</name>
  • A separate close tag is not needed for an empty
    element
  • <myelement/>
  • If one tag is opened inside the field of another
    tag, it must be closed before the outer tag is
    closed.
  • Incorrect: <employee><name>Bob</employee></name>
  • Correct: <employee><name>Bob</name></employee>
  • Tags are case sensitive

22
How can XML be represented?
  • As a tree structure
  • As text/markup tags

23
How can XML be represented?
  • As a tree structure

Take our previous example
  • Leaf nodes generally, but not necessarily,
    store the data
  • Recent web browsers will show this structure

24
How can XML be represented?
  • As a text/markup language

Take our previous example
<employee id="3">
  <name>Bob</name>
  <extension>5513</extension>
  <department>Sales</department>
  <salary>45000</salary>
</employee>
<employee id="1">
  <name>Ed</name>
  <extension>6766</extension>
  <office>312</office>
  <department>Executive</department>
  <salary>Confidential</salary>
</employee>
25
Other features of XML
  • It is easy to parse
  • It can be queried like a database
  • It can be used with XSL Templates to easily
    generate web pages from data
  • It can be used with DTD (Document Type
    Definition) to run as a fully structured database

26
Disadvantages to XML
  • Difficult to create indexes on
  • Difficult to optimize queries
  • Requires additional disk space
  • Text format
  • Redundant data in tags
  • No single standard of how data should be stored
    in XML

27
Summary / Key points of XML
  • Data stored using text-based markup language
  • Can also be represented in tree format
  • Can store structured and semi-structured data
  • Easy to parse and query, but inefficient

28
Where to Get More Information
  • Search the web, you'll find something!