Title: Introduction to the course and XML
1 Introduction to the course and XML
2 Course Info
- CSC8710 Seminar on Database Management
- Time: 6:00-7:20 PM, Tu/Th
- Place: State Hall 213
- Webpage: http://www.cs.wayne.edu/csc8710
3 What is this course about?
- Reading XML papers
- Doing XML projects
- Conducting XML research (writing research papers on XML)
4 Reading XML papers
- Each student will present one paper selected from the list of papers covering the following topics:
  - Using an RDBMS to store and query XML
  - Publishing relational data as XML
  - XML constraints
  - XML integration
  - Querying and searching XML data
  - XML algebra, XPath, type checking, etc.
5 Doing XML projects
- As a warm-up for the final research project, two small programming projects will be given.
  - They are simple, but good practice in XML programming.
  - Detailed specifications will be given.
6 Conducting XML research
- Each group selects one project in consultation with the instructor.
- Teamwork: each group has two students.
- Work closely with your partner and the instructor.
- To promote progress, a series of assignments will be given. They will not be graded, but will affect the instructor's impression of your progress.
7 Goals of the course
- Have a broad knowledge of the XML literature
- Improve presentation skills
- Have deep knowledge of and experience with the specific research topic
- Be ready to do XML research
8 Course Info
- Prerequisites
  - CSC6710 and CSC7710, or permission of the instructor.
- Instructor
  - Shiyong Lu (shiyong_at_cs.wayne.edu)
  - Office: 430 State Hall
  - Telephone: 577-1667
  - Office hours: Tu, Th 4:00-5:00 PM or by appointment.
9 Prerequisites
- CSC6710 and CSC7710, or permission of the instructor.
10 Course load and grading
- (0%) A series of assignments will be given, although they will not be graded.
- (30%) Two programming projects (15 pts each)
- (15%) Lecture presentation
- (15%) Final project demonstration
- (40%) Final professional, publication-quality research paper.
- The grade will be given on a group basis (groups of two students), except for the individual lecture presentation.
11 Late work penalty
- You can have one late assignment submission, up to one week late, without any penalty. Please indicate on the cover page of your submission when you use your late excuse. If the late excuse is not used, a penalty of 2% per day will be assessed for up to one week. No credit will be given for work handed in more than one week after the due date. The late excuse cannot be used for the final project.
12 Academic Honesty
- Copying an assignment from another student in this class or obtaining a solution from some other source will lead to an automatic failure for this course and to disciplinary action. Allowing another student to copy one's work will be treated as an act of academic dishonesty, leading to the same penalty as copying (i.e., a failure).
13 What is XML?
- Example:
<?xml version="1.0"?>
<email>
  <from>smith_at_cs.wayne.edu</from>
  <to>shiyong_at_cs.wayne.edu</to>
  <subject>What is XML</subject>
  <body>
    Can you tell me what XML is all about?
  </body>
</email>
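To make the example concrete, here is a minimal sketch of reading such a document with Python's standard `xml.etree.ElementTree` module. The constant `EMAIL_XML` and the helper `read_email` are hypothetical names introduced only for this illustration; they are not part of the course material.

```python
import xml.etree.ElementTree as ET

# The e-mail document from the slide. The XML declaration is left out
# because the text is already a Python str.
EMAIL_XML = """
<email>
  <from>smith_at_cs.wayne.edu</from>
  <to>shiyong_at_cs.wayne.edu</to>
  <subject>What is XML</subject>
  <body>
    Can you tell me what XML is all about?
  </body>
</email>
"""


def read_email(xml_text: str) -> dict:
    """Return the children of the root element as a tag -> text mapping."""
    root = ET.fromstring(xml_text.strip())
    return {child.tag: (child.text or "").strip() for child in root}


email = read_email(EMAIL_XML)
print(email["from"], "->", email["to"])
print(email["subject"])
```

The tags themselves (`from`, `to`, `subject`, `body`) carry the meaning of each field, which is what "self-describing" refers to.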
14 What is XML? (cont)
- XML: eXtensible Markup Language
- No fixed collection of markup tags (a meta-language)
- Semi-structured and self-describing
- Separates syntax from semantics
- The standard for representing and exchanging information on the WWW
15 HTML example
<h1>Rhubarb Cobbler</h1>
<h2>Maggie.Herrick_at_bbs.mhv.net</h2>
<h3>Wed, 14 Jun 95</h3>
Rhubarb Cobbler made with bananas as the main sweetener. It was delicious. Basically it was
<table>
<tr><td> 2 1/2 cups <td> diced rhubarb
<tr><td> 2 tablespoons <td> sugar
<tr><td> 2 <td> fairly ripe bananas
<tr><td> 1/4 teaspoon <td> cinnamon
<tr><td> dash of <td> nutmeg
</table>
Combine all and use as cobbler, pie, or crisp.
Related recipes: <a href="GardenQuiche">Garden Quiche</a>
16 A corresponding XML doc
<recipe id="117" category="dessert">
  <title>Rhubarb Cobbler</title>
  <author><email>Maggie.Herrick_at_bbs.mhv.net</email></author>
  <date>Wed, 14 Jun 95</date>
  <description>
    Rhubarb Cobbler made with bananas as the main sweetener. It was delicious.
  </description>
  <ingredients>
    <item><amount>2 1/2 cups</amount><type>diced rhubarb</type></item>
    <item><amount>2 tablespoons</amount><type>sugar</type></item>
    <item><amount>2</amount><type>fairly ripe bananas</type></item>
    <item><amount>1/4 teaspoon</amount><type>cinnamon</type></item>
    <item><amount>dash of</amount><type>nutmeg</type></item>
  </ingredients>
  <preparation>
    Combine all and use as cobbler, pie, or crisp.
  </preparation>
  <related url="GardenQuiche">Garden Quiche</related>
</recipe>
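Because the XML version marks up logical structure rather than layout, a program can pull out the ingredients directly. The following is a small sketch using Python's standard `xml.etree.ElementTree`; the shortened document `RECIPE_XML` and the helper `ingredients` are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the recipe document (three of the five items).
RECIPE_XML = """<recipe id="117" category="dessert">
  <title>Rhubarb Cobbler</title>
  <ingredients>
    <item><amount>2 1/2 cups</amount><type>diced rhubarb</type></item>
    <item><amount>2 tablespoons</amount><type>sugar</type></item>
    <item><amount>dash of</amount><type>nutmeg</type></item>
  </ingredients>
</recipe>"""


def ingredients(xml_text: str) -> list:
    """Return (amount, type) pairs for every <item> in the recipe."""
    root = ET.fromstring(xml_text)
    return [(item.findtext("amount"), item.findtext("type"))
            for item in root.iter("item")]


recipe = ET.fromstring(RECIPE_XML)
print(recipe.get("category"), "-", recipe.findtext("title"))
for amount, kind in ingredients(RECIPE_XML):
    print(f"{amount}: {kind}")
```

Doing the same against the HTML version would mean guessing which table cell holds an amount and which holds an ingredient name.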
17 HTML vs XML
- The markup tags are chosen purely for logical structure
- This is just one choice of markup detail level
- We need to define which XML documents we regard as "recipe collections" (XML Schema)
- We need a stylesheet to define browser presentation semantics (XSL)
- We need to express queries in a general way (XQuery)
18 A conceptual view of XML
- Character data (XML content)
- XML elements
- XML attributes
19 A concrete view of XML
- Markup tags denote elements: ...<foo attr="val" ...>...</foo>...
  - <foo attr="val" ...> is an element start tag with name foo
  - attr="val" is an attribute with name attr and value val; values are enclosed by ' or "
  - the text between the tags is the contents of the element
  - </foo> is a matching element end tag
- There is a short-hand notation for empty elements: ...<foo attr="val" .../>...
20 A concrete view of XML (cont)
- An XML document must be well-formed:
  - start and end tags must match
  - element tags must be properly nested
  - some more subtle syntactical requirements
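The well-formedness rules above can be tried out with any XML parser, since a conforming parser must reject documents that break them. Below is a minimal sketch using Python's standard `xml.etree.ElementTree`; the helper `is_well_formed` and the sample strings are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


samples = {
    "matching tags": '<foo attr="val">text</foo>',
    "empty-element shorthand": '<foo attr="val"/>',
    "mismatched end tag": "<foo>text</bar>",
    "improper nesting": "<a><b></a></b>",
    "unquoted attribute value": "<foo attr=val/>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

The last sample is one of the "more subtle" requirements: attribute values must always be quoted.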
21 A concrete view of XML (cont)
- XML is case sensitive!
- Special characters can be escaped using Unicode character references or predefined entity references:
  - &#60; and &lt; both yield <
  - &#38; and &amp; both yield &
- <!-- comment -->
- <!DOCTYPE ...> document type declaration (described later...)
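A short sketch of the escaping and case-sensitivity rules, using Python's standard library (`xml.etree.ElementTree` for parsing and `xml.sax.saxutils.escape` for writing). The helpers `text_of` and `parses` are hypothetical names used only here.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def text_of(xml_text: str) -> str:
    """Parse a one-element document and return its character data."""
    return ET.fromstring(xml_text).text


def parses(xml_text: str) -> bool:
    """Return True if the text is accepted by the parser."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# A numeric character reference and the predefined entity give the same character.
print(text_of("<t>&#60;</t>") == text_of("<t>&lt;</t>") == "<")
print(text_of("<t>&#38;</t>") == text_of("<t>&amp;</t>") == "&")

# When writing XML, the special characters have to be escaped again.
print(escape("a < b & c"))

# Tag names are case sensitive, so <Foo> is not closed by </foo>.
print(parses("<Foo></Foo>"), parses("<Foo></foo>"))
```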
22 XML application examples
- XHTML, W3C's XMLization of HTML 4.0
- CML, Chemical Markup Language
- WML, Wireless Markup Language for WAP services
- ThML, Theological Markup Language
- Many more
23 Why XML?
- It is hot (in both industry and academia)!
- Syntax itself is not enough; we also need tools and languages to process XML
- For database people: how to manage XML data (storage, update, query, exchange, transformation, etc.)
24 XML techniques
- common extensions to the core XML specification: a namespace mechanism, document inclusion, etc.
- schemas: grammars to define classes of documents
- linking between documents: a generalization of HTML anchors and links
- addressing parts of read-only documents: flexible and robust pointers into documents
- transformation: conversion from one document class to another
- querying: extraction of information, generalizing relational databases
25 XML namespaces
<widget type="gadget">
  <head size="medium"/>
  <big><subwidget ref="gizmo"/></big>
  <info>
    <head>
      <title>Description of gadget</title>
    </head>
    <body>
      <h1>Gadget</h1>
      A gadget contains a big gizmo
    </body>
  </info>
</widget>
- Problem: the meaning of head and big depends on the context!
26 XML namespaces (cont)
- Simple solution: qualify names with URIs (Universal Resource Identifiers)
  <{http://www.w3.org/TR/xhtml1}head>
  - http://www.w3.org/TR/xhtml1 is the qualifying URI
  - head is the local name
- Do not be confused by the use of URIs for namespaces:
  - they are not supposed to point to anything
  - it is simply the cheapest way of getting unique names
  - we rely on existing organizations that control domain names
27 XML namespaces (cont)
<... xmlns:foo="http://www.w3.org/TR/xhtml1">
  ...
  <foo:head>...</foo:head>
  ...
</...>
28 XML namespaces (cont)
- xmlns:prefix="URI" declares a namespace with a prefix and a URI
- the scope of a declaration is lexical: the element containing the declaration and all its descendants; it can be overridden by a nested declaration
- both element and attribute names can be qualified with namespaces
- the name of the prefix is irrelevant; applications should use only the URI
29 XML namespaces (cont)
<widget xmlns="http://www.widget.org"
        xmlns:xhtml="http://www.w3.org/TR/xhtml1"
        type="gadget">
  <head size="medium"/>
  <big><subwidget ref="gizmo"/></big>
  <info>
    <xhtml:head>
      <xhtml:title>Description of gadget</xhtml:title>
    </xhtml:head>
    <xhtml:body>
      <xhtml:h1>Gadget</xhtml:h1>
      A gadget contains a big gizmo
    </xhtml:body>
  </info>
</widget>
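As a rough illustration of how an application sees the namespaced widget document, the sketch below parses it with Python's standard `xml.etree.ElementTree`, which reports every element name as `{URI}local-name`. The prefixes `w` and `h` in the lookup table are chosen arbitrarily for this example, which is the point: only the URIs matter.

```python
import xml.etree.ElementTree as ET

WIDGET_XML = """<widget xmlns="http://www.widget.org"
        xmlns:xhtml="http://www.w3.org/TR/xhtml1"
        type="gadget">
  <head size="medium"/>
  <big><subwidget ref="gizmo"/></big>
  <info>
    <xhtml:head>
      <xhtml:title>Description of gadget</xhtml:title>
    </xhtml:head>
    <xhtml:body>
      <xhtml:h1>Gadget</xhtml:h1>
      A gadget contains a big gizmo
    </xhtml:body>
  </info>
</widget>"""

root = ET.fromstring(WIDGET_XML)

# The two "head" elements are now distinguishable by their qualifying URI.
heads = [el.tag for el in root.iter() if el.tag.endswith("}head")]
for tag in heads:
    print(tag)

# Prefixes used for lookup need not match the ones in the document.
ns = {"w": "http://www.widget.org", "h": "http://www.w3.org/TR/xhtml1"}
widget_head = root.find("w:head", ns)
xhtml_title = root.find("w:info/h:head/h:title", ns)
print(widget_head.get("size"), "|", xhtml_title.text)
```

Note that the unprefixed attributes (`type`, `size`, `ref`) are not placed in the default namespace; the default declaration applies to element names only.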
30 XML schemas
- A schema is a definition of the syntax of an XML-based language (i.e., a class of XML documents).
- A schema language is a formal language for expressing schemas (e.g., DTD, XML Schema).
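31 DTD: Document Type Definition
- <!DOCTYPE root-element [ doctype-declaration... ]>
- <!ELEMENT element-name content-model>, where content models are built with the operators | (choice), , (sequence), * (zero or more), + (one or more), and ? (optional)
- <!ATTLIST element-name attr-name attr-type attr-default ...>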
32 Element Type Declaration
- elementdecl ::= '<!ELEMENT' Name contentspec '>'
- contentspec ::= 'EMPTY' | 'ANY' | Mixed | children
- No element type may be declared more than once
33 Element Type Declaration Example
- <!ELEMENT br EMPTY>
- <!ELEMENT p (#PCDATA|emph)* >
- <!ELEMENT %name.para; %content.para; >
- <!ELEMENT container ANY>
34 Empty Elements
- EmptyElemTag ::= '<' Name (Attribute)* '/>'
- Example:
  - <IMG align="left" src="http://www.w3.org/Icons/WWW/w3c_home" />
  - <br/>
- Question: is <a></a> an empty element?
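One way to explore the question is to compare how a parser treats the two spellings. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper `is_empty` is a hypothetical definition (no child elements and no character data) written for this illustration.

```python
import xml.etree.ElementTree as ET


def is_empty(element: ET.Element) -> bool:
    """An element with no children and no character data has no content."""
    return len(element) == 0 and not element.text


pair = ET.fromstring("<a></a>")     # start tag immediately followed by end tag
short = ET.fromstring("<a/>")       # empty-element tag
spaced = ET.fromstring("<a> </a>")  # whitespace counts as content

print(is_empty(pair), is_empty(short), is_empty(spaced))

# After parsing, the two empty spellings are indistinguishable.
print(ET.tostring(pair) == ET.tostring(short))
```

This matches the XML specification's wording that an element with no content is empty, whichever of the two forms is used to write it.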
35 DTD (cont)
- <!ATTLIST element-name attr-name attr-type attr-default ...> declares which attributes are allowed or required in which elements
- attribute types:
  - CDATA: any value is allowed (the default)
  - (value|...): enumeration of allowed values
  - ID, IDREF, IDREFS: ID attribute values must be unique (contain "element identity"), IDREF attribute values must match some ID (reference to an element)
  - ENTITY, ENTITIES, NMTOKEN, NMTOKENS, NOTATION: just forget these... (consider them deprecated)
- attribute defaults:
  - #REQUIRED: the attribute must be explicitly provided
  - #IMPLIED: attribute is optional, no default provided
  - "value": if not explicitly provided, this value is inserted by default
  - #FIXED "value": as above, but only this value is allowed
36 Attribute-list declaration example
<!ATTLIST termdef
          id      ID      #REQUIRED
          name    CDATA   #IMPLIED>
<!ATTLIST list
          type    (bullets|ordered|glossary)  "ordered">
<!ATTLIST form
          method  CDATA   #FIXED "POST">
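To show what the four kinds of attribute defaults mean in practice, here is a rough Python sketch that applies them by hand. The tables `ATTLISTS` and `ENUMS` are a hand-written transcription of the three declarations above, and `apply_attlist` is a hypothetical helper; a real validating parser reads these rules from the DTD itself and also checks things this sketch ignores, such as ID uniqueness.

```python
import xml.etree.ElementTree as ET

# element name -> attribute name -> (kind of default, default value)
ATTLISTS = {
    "termdef": {"id": ("REQUIRED", None), "name": ("IMPLIED", None)},
    "list": {"type": ("DEFAULT", "ordered")},
    "form": {"method": ("FIXED", "POST")},
}
# Enumerated attribute types: (element, attribute) -> allowed values
ENUMS = {("list", "type"): {"bullets", "ordered", "glossary"}}


def apply_attlist(element: ET.Element) -> dict:
    """Return the attributes with defaults filled in; raise ValueError on a violation."""
    attrs = dict(element.attrib)
    for name, (kind, value) in ATTLISTS.get(element.tag, {}).items():
        if name not in attrs:
            if kind == "REQUIRED":
                raise ValueError(f"{element.tag}: attribute {name!r} is required")
            if kind in ("DEFAULT", "FIXED"):
                attrs[name] = value  # the default is inserted
        elif kind == "FIXED" and attrs[name] != value:
            raise ValueError(f"{element.tag}: {name!r} must be {value!r}")
        allowed = ENUMS.get((element.tag, name))
        if allowed and name in attrs and attrs[name] not in allowed:
            raise ValueError(f"{element.tag}: {attrs[name]!r} is not an allowed value")
    return attrs


print(apply_attlist(ET.fromstring("<list/>")))
print(apply_attlist(ET.fromstring('<list type="bullets"/>')))
print(apply_attlist(ET.fromstring("<form/>")))
print(apply_attlist(ET.fromstring('<termdef id="t1"/>')))
```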
37 A DTD example
<!ELEMENT collection (description,recipe*)>
<!ELEMENT description ANY>
<!ELEMENT recipe (title,ingredient*,preparation,comment?,nutrition)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT ingredient (ingredient*,preparation)?>
<!ATTLIST ingredient name CDATA #REQUIRED
                     amount CDATA #IMPLIED
                     unit CDATA #IMPLIED>
<!ELEMENT preparation (step*)>
<!ELEMENT step (#PCDATA)>
<!ELEMENT comment (#PCDATA)>
<!ELEMENT nutrition EMPTY>
<!ATTLIST nutrition protein CDATA #REQUIRED
                    carbohydrates CDATA #REQUIRED
                    fat CDATA #REQUIRED
                    calories CDATA #REQUIRED
                    alcohol CDATA #IMPLIED>
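A DTD content model is essentially a regular expression over child element names, which suggests a simple way to check one by hand. The sketch below is not a DTD validator: the helpers `model_to_regex` and `children_match` are hypothetical, they handle only element-content models (not EMPTY, ANY, or #PCDATA), and the model string for recipe is an assumption, written with `*` on ingredient.

```python
import re
import xml.etree.ElementTree as ET

# Assumed content model for <recipe>.
RECIPE_MODEL = "(title,ingredient*,preparation,comment?,nutrition)"


def model_to_regex(model: str) -> re.Pattern:
    """Translate an element-content model into a regex over 'name ' tokens."""
    compact = re.sub(r"\s+", "", model)
    # Each element name becomes a group matching that name plus one space.
    pattern = re.sub(r"[A-Za-z_][\w.-]*",
                     lambda m: "(?:" + re.escape(m.group(0)) + " )",
                     compact)
    # ',' means sequence, which is plain concatenation in a regex;
    # '|', '*', '+', '?' and parentheses already mean the same thing.
    return re.compile(pattern.replace(",", ""))


def children_match(element: ET.Element, model: str) -> bool:
    """Check the sequence of child element names against the content model."""
    sequence = "".join(child.tag + " " for child in element)
    return model_to_regex(model).fullmatch(sequence) is not None


good = ET.fromstring(
    "<recipe><title/><ingredient/><ingredient/><preparation/><nutrition/></recipe>")
missing = ET.fromstring("<recipe><title/><preparation/></recipe>")
misplaced = ET.fromstring(
    "<recipe><title/><comment/><preparation/><nutrition/></recipe>")

for doc in (good, missing, misplaced):
    print(children_match(doc, RECIPE_MODEL))
```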
38 XML Schema language requirements
- more expressive than XML DTDs
- expressed in XML
- self-describing
- simple enough to be implemented with modest design and runtime resources
- structures and data types
39 XML doc and schema examples
<card xmlns="http://businesscard.org">
  <name>John Doe</name>
  <title>CEO, Widget Inc.</title>
  <email>john.doe_at_widget.com</email>
  <phone>(202) 456-1414</phone>
  <logo url="widget.gif"/>
</card>
40 XML doc and schema examples (business_card.xsd, cont)
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:b="http://businesscard.org"
        targetNamespace="http://businesscard.org">
  <element name="card" type="b:card_type"/>
  <element name="name" type="string"/>
  <element name="title" type="string"/>
  <element name="email" type="string"/>
  <element name="phone" type="string"/>
  <element name="logo" type="b:logo_type"/>
  <complexType name="card_type">
    <sequence>
      <element ref="b:name"/>
      <element ref="b:title"/>
      <element ref="b:email"/>
      <element ref="b:phone" minOccurs="0"/>
      <element ref="b:logo" minOccurs="0"/>
    </sequence>
  </complexType>
  <complexType name="logo_type">
    <attribute name="url" type="anyURI"/>
  </complexType>
</schema>
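Since an XML Schema is itself an XML document, it can be inspected with the same tools as any other XML. The sketch below reads the business card schema with Python's standard `xml.etree.ElementTree` and lists its global element declarations and the element references in `card_type`. It only reads the schema; it does not validate documents against it, and the variable names are chosen for this illustration.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"  # XML Schema namespace in {URI} form

SCHEMA_XML = """<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:b="http://businesscard.org"
        targetNamespace="http://businesscard.org">
  <element name="card" type="b:card_type"/>
  <element name="name" type="string"/>
  <element name="title" type="string"/>
  <element name="email" type="string"/>
  <element name="phone" type="string"/>
  <element name="logo" type="b:logo_type"/>
  <complexType name="card_type">
    <sequence>
      <element ref="b:name"/>
      <element ref="b:title"/>
      <element ref="b:email"/>
      <element ref="b:phone" minOccurs="0"/>
      <element ref="b:logo" minOccurs="0"/>
    </sequence>
  </complexType>
  <complexType name="logo_type">
    <attribute name="url" type="anyURI"/>
  </complexType>
</schema>"""

schema = ET.fromstring(SCHEMA_XML)

# Global element declarations: element name -> type name.
element_types = {el.get("name"): el.get("type")
                 for el in schema.findall(XS + "element")}

# Element references inside card_type, with minOccurs (which defaults to 1).
sequence = schema.find(f"{XS}complexType[@name='card_type']/{XS}sequence")
card_sequence = [(ref.get("ref"), ref.get("minOccurs", "1")) for ref in sequence]

for name, type_name in element_types.items():
    print(f"{name}: {type_name}")
for ref, min_occurs in card_sequence:
    print(f"{ref} (minOccurs={min_occurs})")
```

The prefixed values such as `b:card_type` are returned as plain strings here; resolving the `b` prefix to its namespace URI is left to a schema processor.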
41 Overview of XML Schema
- a (global) element declaration associates an element name with a type
- a complex type definition defines requirements for attributes, sub-elements, and character data in elements of that type
- attribute declarations describe which attributes may or must appear
- element references describe which sub-elements may or must appear, how many, and in which order
- a simple type definition defines a set of strings to be used as attribute values or character data