Chapter 4 - Quality Control with Schemas Learning XML by Erik T. Ray - PowerPoint PPT Presentation

Slides: 31
Provided by: jackd64

Transcript and Presenter's Notes

1
Chapter 4 - Quality Control with Schemas
Learning XML by Erik T. Ray
  • Slides were developed by Jack Davis, College of
    Information Science and Technology, Radford
    University

2
Schemas
  • define an XML tag set: primarily elements,
    attributes, entities, and structure
  • are a pass-or-fail test for XML documents
    (validation)
  • ensure that a document fulfills a minimum set of
    requirements, finding flaws that could result in
    anomalous processing
  • are not required
  • A validating XML parser takes an XML instance as
    input and produces a validation report as output.
    This report typically lists the errors found in the
    document (places where it does not conform to the
    schema).
  • Validation considers structure, data typing,
    integrity (status of links between nodes and
    resources), and business rules (spell checking,
    checksums).

3
Schema Types
  • DTD - Document Type Definition. The oldest and
    most widely supported schema language. DTDs
    don't support namespaces (can't mix tag sets
    within a single DTD) and have very weak data
    typing.
  • The W3C built XML Schema. XML Schemas are
    themselves XML documents, so they can be checked
    for well-formedness and validity. XML Schema
    supports namespaces and has a much broader
    ability to specify data types, including things
    like date types.
  • Other schema definition languages are available
    (RELAX NG, Schematron, ...).

4
DTD's
  • XML elements and attributes are defined in a DTD
  • DTD's are extensible - meaning they can be
    extended to meet the needs of the current task
  • A DTD can be specified within an XML document
    (internal) or in a separate file (external).
  • Many free DTD's exist on the internet today and
    can be freely downloaded
  • DTD's declare a set of allowed elements. A
    conforming XML document can't use any elements
    not defined in this set.
  • DTD's define a content model for each element.
    This describes what elements or data can go
    inside an element, in what order, in what number,
    and whether they are required or optional.
  • DTD's declare a set of allowed attributes for
    each element with data types and default values.
  • DTD's provide mechanisms to manage the model,
    providing links to other components.

5
Element Declarations
  • Element declaration:
    <!ELEMENT element_name (content model)>
  • Content model: Text
    Description: text or character data
    Syntax: (#PCDATA)
  • Content model: Elements
    Description: contains other elements
    Syntax: (element_1, element_2, ...)
  • Content model: Mixed Content
    Description: contains both text and other
    elements
    Syntax: (#PCDATA | element_1 | element_2 | ...)*
  • Content model: Empty
    Description: does not contain any content
    Syntax: EMPTY
  • Content model: Any
    Description: can contain text or elements
    Syntax: ANY

6
Element Declaration Syntax
  • Declaration syntax is flexible when it comes to
    whitespace. You can add extra space anywhere
    except in the string of characters at the
    beginning that identifies the declaration
    type. For example, these are all acceptable:
    <!ELEMENT thingie ANY>
    <!ELEMENT   thingie     ANY>
    <!ELEMENT thingie ( foo |
                        bar |
                        zap )*>

7
Element Character Notations
  • Question mark
    Character: ?
    Description: element may occur zero or one time
    Usage: email?
  • Asterisk
    Character: *
    Description: element may occur zero or more times
    Usage: email*
  • Plus
    Character: +
    Description: element may occur one or more times
    Usage: email+

8
Element Character Notations (cont.)
  • Parentheses
    Character: ( )
    Description: used to group elements into a set
    Usage: (name, address, zip_code)
  • Vertical bar
    Character: |
    Description: used to indicate a choice among a
    set of alternatives
    Usage: (a | b | c)
  • Comma
    Character: ,
    Description: used to indicate element sequence
    Usage: (a, b, c)
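  • The occurrence and grouping characters above behave much like
    regular-expression operators. The following is a minimal Python
    sketch, not how a validating parser is actually implemented: it
    turns an element-only content model into a regular expression
    and checks a list of child element names against it. The helper
    names content_model_to_regex and children_match are hypothetical,
    and the sketch does not handle #PCDATA, EMPTY, or ANY.

```python
import re


def content_model_to_regex(model: str) -> re.Pattern:
    """Translate an element-only DTD content model into a regex.

    Assumption: child elements are encoded as names that are each
    followed by one space, e.g. "subject body reply ".
    """
    # Whitespace inside a content model is not significant.
    compact = re.sub(r"\s+", "", model)
    # Each element name becomes a group matching "name ".
    pattern = re.sub(
        r"[A-Za-z_][\w.\-]*",
        lambda m: "(?:" + re.escape(m.group(0)) + " )",
        compact,
    )
    # A comma means sequence, which is plain concatenation in a regex.
    # The characters ( ) | ? * + keep the same meaning in both notations.
    pattern = pattern.replace(",", "")
    return re.compile(pattern)


def children_match(model: str, children: list) -> bool:
    """Return True if the child element names satisfy the content model."""
    encoded = "".join(name + " " for name in children)
    return content_model_to_regex(model).fullmatch(encoded) is not None


if __name__ == "__main__":
    print(children_match("(subject?, body, reply)", ["body", "reply"]))
    print(children_match("(a | b | c)", ["a", "b"]))
```

    With the content model (subject?, body, reply), the child list
    body, reply is accepted because subject is optional, while
    reply, body is rejected because the comma fixes the order.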

9
Attribute Declarations
  • <!ATTLIST element_name
        attribute_name-1 datatype default_value
        attribute_name-2 datatype default_value
        attribute_name-3 datatype default_value>
  • <!ATTLIST student
        level CDATA #REQUIRED>
  • <!ATTLIST student level (fr | soph | jr | sr) "fr">

10
Attribute Data Types
  • Data type: CDATA
    Description: character data
  • Data type: ID
    Description: unique identifier to give an
    element a label
  • Data type: Enumerated list (e.g., (a | b | c))
    Description: list of all possible values that the
    attribute can contain

11
Attributes Default Values
  • Attribute type: #FIXED
    Description: value of the attribute must match
    the value assigned in the DTD
  • Attribute type: #REQUIRED
    Description: element must contain the attribute
    to be valid
  • Attribute type: #IMPLIED
    Description: attribute is optional

12
Example XML Document
  • <?xml version="1.0" standalone="yes"?>
  • <emails>
  • <message num="a1" to="joe@acmeshipping.com"
  • from="brenda@xyzcompany.com" date="02/09/01">
  • <subject title="Order 10011"/>
  • <body>
  • Joe, Please let me know if order number 10011
    has shipped.
  • Thanks,
  • Brenda
  • </body>
  • <reply status="yes"/>
  • </message>
  • </emails>

13
Internal DTD
  • <!DOCTYPE emails [
  • <!ELEMENT emails (message)>
  • <!ELEMENT message (subject?, body, reply)>
  • <!ATTLIST message
  • num ID #REQUIRED
  • to CDATA #REQUIRED
  • from CDATA #FIXED "brenda@xyzcompany.com"
  • date CDATA #REQUIRED>
  • <!ELEMENT subject EMPTY>
  • <!ATTLIST subject
  • title CDATA #IMPLIED>
  • <!ELEMENT body ANY>
  • <!ELEMENT reply EMPTY>
  • <!ATTLIST reply
  • status (yes | no) "no">
  • ]>
  • In a standalone XML document, this internal DTD is
    placed in the document itself, after the XML
    declaration and ahead of the root element. If the
    DTD is external, the XML document must instead
    contain a declaration like one of the following:
    <!DOCTYPE emails SYSTEM "emails.dtd">
    or
    <!DOCTYPE emails SYSTEM "http://...">
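  • To see the default and #FIXED values from an internal DTD in
    action, here is a small Python sketch using the standard
    library's xml.etree.ElementTree. It is a variation on the
    e-mail example, with the from and status attributes left out
    of the instance on purpose. The underlying parser is
    non-validating: it is expected to fill in attribute defaults
    declared in the internal subset, but it will not report a
    document that breaks the content models.

```python
import xml.etree.ElementTree as ET

# Variation on the slides' e-mail example: the "from" attribute on
# <message> and the "status" attribute on <reply> are omitted so the
# values declared in the internal DTD can be observed.
DOCUMENT = """<?xml version="1.0" standalone="yes"?>
<!DOCTYPE emails [
  <!ELEMENT emails (message)>
  <!ELEMENT message (subject?, body, reply)>
  <!ATTLIST message
      num  ID    #REQUIRED
      to   CDATA #REQUIRED
      from CDATA #FIXED "brenda@xyzcompany.com"
      date CDATA #REQUIRED>
  <!ELEMENT subject EMPTY>
  <!ATTLIST subject title CDATA #IMPLIED>
  <!ELEMENT body ANY>
  <!ELEMENT reply EMPTY>
  <!ATTLIST reply status (yes | no) "no">
]>
<emails>
  <message num="a1" to="joe@acmeshipping.com" date="02/09/01">
    <subject title="Order 10011"/>
    <body>Joe, Please let me know if order number 10011 has shipped.</body>
    <reply/>
  </message>
</emails>
"""

root = ET.fromstring(DOCUMENT)
message = root.find("message")
reply = message.find("reply")

if __name__ == "__main__":
    # Attributes written in the instance plus those supplied by the DTD.
    print(message.attrib)
    print(reply.attrib)
```

    Actually enforcing the DTD (for example, rejecting a message
    with no body) requires a validating parser, which the Python
    standard library does not include.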

14
DTD Census Example
  • Here's a typical example XML document: a census
    record created after an interview with one
    family. Consider that all such documents
    could be compiled and overall statistics
    generated. (Example 4-1)
  • Here's the DTD that defines the rules by which
    the Census XML documents are created. (Example
    4-2)

15
DTD Design
  • DTD design and construction is part science and
    part art form. The basic concepts are simple,
    but maintaining hundreds of element and attribute
    declarations while keeping them readable and
    bug-free can be a challenge.
  • Keep it organized. Good comments can save hours of
    scrutinizing later; do not wait until the end to
    document. Keep declarations separated into
    sections by their purpose.
  • Pad declarations with lots of whitespace. Content
    models and attribute lists suffer from dense
    syntax, so spacing out the parts, even placing
    them on separate lines, helps. Indent lines
    inside declarations to make the delimiters
    clearer. Use extra space between logical
    divisions.
  • DTDs will require updating as requirements
    change. Number versions to avoid lots of
    confusion later.

16
DTD Design (cont.)
  • Parameter entities. Parameter entities can hold
    recurring parts of declarations and allow you to
    edit them in one place. In the external subset,
    they can be used in element-type declarations to
    hold element groups and content models, or in
    attribute list declarations to hold attribute
    definitions. For example, assume you want every
    element to have an optional ID attribute for
    linking and an optional class attribute to assign
    specific role information. Parameter entities,
    which apply only in DTDs, look much like ordinary
    general entities, but have an extra % in the
    declaration. You can declare a parameter entity
    as in the following:
    <!ENTITY % common.atts "
        id    ID    #IMPLIED
        class CDATA #IMPLIED">
    The entity can then be used in attribute list
    declarations:
    <!ATTLIST foo %common.atts;>
    <!ATTLIST bar %common.atts;
        extra CDATA #FIXED "blah">
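  • Conceptually, a parameter entity reference is replaced by the
    entity's text before the declaration is read. The Python sketch
    below imitates that with plain string substitution. It is a
    simplified illustration only: the function name
    expand_parameter_entities is hypothetical, it handles only
    internal entities declared with double quotes, and it does not
    follow nested references or SYSTEM/PUBLIC identifiers.

```python
import re

# Matches declarations such as: <!ENTITY % name "replacement text">
ENTITY_DECL = re.compile(r'<!ENTITY\s+%\s+([\w.\-]+)\s+"([^"]*)"\s*>')
# Matches references such as: %name;
ENTITY_REF = re.compile(r"%([\w.\-]+);")


def expand_parameter_entities(dtd_text: str) -> str:
    """Return dtd_text with parameter entity references expanded."""
    entities = {}

    def remember(match):
        # Record the entity and drop its declaration from the output.
        entities[match.group(1)] = match.group(2)
        return ""

    without_decls = ENTITY_DECL.sub(remember, dtd_text)
    # A reference to an undeclared entity raises KeyError in this sketch.
    return ENTITY_REF.sub(lambda m: entities[m.group(1)], without_decls)


SAMPLE_DTD = (
    '<!ENTITY % common.atts "id ID #IMPLIED class CDATA #IMPLIED">\n'
    "<!ATTLIST foo %common.atts;>\n"
    '<!ATTLIST bar %common.atts; extra CDATA #FIXED "blah">\n'
)

if __name__ == "__main__":
    print(expand_parameter_entities(SAMPLE_DTD))
```

    After expansion, both foo and bar carry the id and class
    attribute definitions, and editing the entity text in one
    place changes both declarations.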

17
Attributes vs. Elements
  • Making a DTD from scratch is not easy. You have
    to break information down into its conceptual
    atoms and package it as a hierarchical structure,
    but it's not always clear how to divide the
    information.
  • Choose names that make sense. Element names like
    thing, object, and chunk are nearly impossible to
    figure out.
  • Hierarchy adds information. A newspaper has
    articles that contain paragraphs and heads.
    Containers create boundaries to make it easier to
    write stylesheets and processing applications.
  • Strive for a tree structure that resembles a
    wide, bushy shrub. If you go too deep, the markup
    begins to overwhelm the content and it becomes
    harder to edit a document; too shallow, and the
    information content is diluted.

18
Attributes vs. Elements (cont.)
  • Know when to use elements over attributes. An
    element holds content that is part of your
    document. An attribute modifies the behavior of
    an element. The trick is to find a balance
    between using general elements with attributes to
    specify purpose and creating an element for every
    single contingency.
  • There are advantages to splitting a monolithic
    DTD into smaller components, or modules. The
    first is that a modularized DTD can be easier to
    maintain. XML provides two ways to modularize
    your DTD. The first is to store parts in separate
    files, then import them with external parameter
    entities. The second is to use a syntactic device
    called a conditional section.

19
Importing Modules
  • To import whole DTDs or parts of DTDs, use an
    external parameter entity.
    <!ELEMENT catalog (title, metadata, front, entries)>
    <!ENTITY % basic.stuff SYSTEM "basics.mod">
    %basic.stuff;
    <!ENTITY % frnt.matter SYSTEM "front.mod">
    %frnt.matter;
    <!ENTITY % metadata PUBLIC
        "-//Standards Stuff//DTD Metadata v3.2//EN"
        "http://www.standards-...">
    %metadata;
  • This DTD has two local components, which are
    specified by system identifiers. Each component
    has a .mod filename extension, which is a
    traditional way to show that a file contains
    declarations but should not be used as a DTD on
    its own.

20
Examples
  • standalone.xml
  • itfac.xml: Review the itfac.xml document; students
    should then develop the DTD.
  • faculty.dtd
  • faculty.css

21
XML Schema Overview
  • The XML Schema specification was released by the
    W3C in May 2001 and contains two parts
  • Part I - structure
  • Part II - data types
  • Developed as an alternative to DTDs and is much
    more powerful
  • Features
  • Pattern matching
  • Rich set of data types
  • Attribute grouping
  • Supports XML namespaces
  • Follows XML syntax

22
XML Schemas
  • The XML Schema specification was released by the
    W3C in May of 2001
  • XML Schemas, like DTDs, are used to describe the
    structure of an XML document
  • The XML Schema specification consists of two
    parts
  • XML Schema Structures. This specification
    consists of a definition language for describing
    and constraining the content of XML documents
  • XML Schema Datatypes. This specification defines
    the datatypes to be used in XML schemas.
  • The namespace for XML Schema is
    http://www.w3.org/2001/XMLSchema

23
XML Schema - advantages
  • XML Schema allows you to import vocabularies (tag
    sets).
  • XML Schemas are XML documents, so they can be
    validated
  • The XML Schema specification contains a number of
    built-in datatypes, and also allows developers to
    create their own datatypes
  • Some of the datatypes are:
    xs:string - text
    xs:token - contains textual tokens
    xs:QName - namespace-qualified name
    xs:decimal - positive or negative decimal
    numbers, including integers
    xs:integer - integers
    xs:float - floating-point number
    xs:ID, xs:IDREF - identification token
    xs:boolean - true or false
    xs:time - HH:MM:SS
    xs:date - CCYY-MM-DD
    xs:dateTime - CCYY-MM-DDTHH:MM:SS, optionally
    followed by a time zone

24
Complex Elements
  • Most elements are not simple. They can contain
    elements, attributes, and character data with
    specialized formats. So, complex elements can be
    defined. Here's an example complex type
    definition.
    <xs:element name="date">
      <xs:complexType>
        <xs:all>
          <xs:element ref="year"/>
          <xs:element ref="mo"/>
          <xs:element ref="day"/>
        </xs:all>
      </xs:complexType>
    </xs:element>
    <xs:element name="year" type="xs:integer"/>
    <xs:element name="mo" type="xs:integer"/>
    <xs:element name="day" type="xs:integer"/>

25
Restriction Elements
  • In the previous example the month number was just
    given as type integer. However, this would allow
    the user to insert any integer into the document
    for the month number. We'd like to restrict the
    month number to 1-12.
    <xs:simpleType name="monthNum">
      <xs:restriction base="xs:integer">
        <xs:minInclusive value="1"/>
        <xs:maxInclusive value="12"/>
      </xs:restriction>
    </xs:simpleType>
    <xs:element name="mo" type="monthNum"/>

26
Restriction Elements (cont.)
  • Restrictions can create fixed values, constrain
    the length of strings, and match patterns with
    regular expressions. Here's an example that
    restricts a postal code (three digits followed by
    three capital letters).
    <xs:element name="postalcode" type="pcode"/>
    <xs:simpleType name="pcode">
      <xs:restriction base="xs:token">
        <xs:pattern value="[0-9]{3}[A-Z]{3}"/>
      </xs:restriction>
    </xs:simpleType>
  • Can also implement enumeration types
    <xs:simpleType name="gender">
      <xs:restriction base="xs:token">
        <xs:enumeration value="female"/>
        <xs:enumeration value="male"/>
      </xs:restriction>
    </xs:simpleType>
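  • The three restrictions shown on these slides (a numeric range,
    a pattern, and an enumeration) can be mirrored in a few lines
    of Python. This is only a sketch of what the facets mean, not a
    schema validator: the function names are hypothetical, and
    whitespace normalization for xs:token is ignored. XML Schema
    patterns must match the whole value, which is why the sketch
    uses re.fullmatch.

```python
import re


def is_month_num(text: str) -> bool:
    """Mirror monthNum: an xs:integer with minInclusive 1, maxInclusive 12."""
    text = text.strip()
    # Lexical form of an integer: optional sign followed by digits.
    if not re.fullmatch(r"[+-]?[0-9]+", text):
        return False
    return 1 <= int(text) <= 12


def is_pcode(text: str) -> bool:
    """Mirror pcode: three digits followed by three capital letters."""
    return re.fullmatch(r"[0-9]{3}[A-Z]{3}", text) is not None


def is_gender(text: str) -> bool:
    """Mirror gender: one of the enumerated values, case-sensitive."""
    return text in {"female", "male"}


if __name__ == "__main__":
    for value in ("7", "13", "x"):
        print(value, is_month_num(value))
    for value in ("123ABC", "123abc"):
        print(value, is_pcode(value))
    for value in ("male", "Male"):
        print(value, is_gender(value))
```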

27
XML Schema Occurrence Constraints
  • Occurrence constraints define the number of times
    a particular element can or must occur
  • Attributes
  • minOccurs: Defines the minimum number of times an
    element can occur. Default value is 1.
  • maxOccurs: Defines the maximum number of times
    an element can occur. Default value is 1.
  • Can set the value of the maxOccurs attribute to
    unbounded to indicate that there is no maximum
    number of times the element can occur

28
XML Schema Simple Type Example
  • XML schemas are put together like DTD's with
    element and attribute declarations along with
    type declarations. A simple example shows the
    structure.
  • XML file:
    <?xml version="1.0"?>
    <email
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="email_schema.xsd">
    This is my e-mail message
    </email>
  • Schema file:
    <?xml version="1.0"?>
    <xsd:schema
        xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      <xsd:element name="email" type="xsd:string"/>
    </xsd:schema>

29
XML Schemas
  • XML Schemas utilize type extension, type
    restriction, lists, unions, namespace
    features, and much, much more. This brief
    presentation only scratches the surface of XML
    schemas.

30
XML Schema Example
  • Here's a schema for the Census example that a DTD
    was defined for. Note the differences. (Example
    4-6)