Title: Extensible Markup Language XML
1 Extensible Markup Language (XML)
2 Databases: Types
- Data is facts and figures
- A database is a related set of data
- Kinds of databases
  - Unstructured
    - Meaning of the data is interpreted by the user
  - Semi-structured
    - Structure of the data is wrapped around the data
  - Structured
    - Fixed structure of data
    - Data is added to the fixed structure
3 XML: Definition and Example
- XML is a text-based markup language that is fast becoming a standard for data interchange
  - An open standard from the W3C
  - A direct descendant of SGML
- Example: product inventory data
  - <Product>
    - <Name>Refrigerator</Name>
    - <Model_Number>R3456d2h</Model_Number>
    - <Manufacturer>General Electric</Manufacturer>
    - <Price>1290.00</Price>
    - <Quantity>1200</Quantity>
  - </Product>
4 XML: Data Interchange
- XML's key role is data interchange
- Two business partners want to exchange customer data
  - Agree on a set of tags
  - Exchange data without having to change internal databases
- Other business partners can participate by using the same tag set
- New tags can be added to extend the functionality
- The key to successful data interchange is building consensus on tag sets and standardizing them
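The exchange described above can be sketched in a few lines of Python. This is only an illustration: the tag list is taken from the Product example, while the function names `to_xml` and `from_xml` and the use of a dict as the "internal database record" are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

# Tag set both partners have agreed on (taken from the Product example).
AGREED_TAGS = ["Name", "Model_Number", "Manufacturer", "Price", "Quantity"]


def to_xml(record):
    """Sender side: serialize an internal record (a dict) using the agreed tags."""
    product = ET.Element("Product")
    for tag in AGREED_TAGS:
        ET.SubElement(product, tag).text = str(record[tag])
    return ET.tostring(product, encoding="unicode")


def from_xml(text):
    """Receiver side: parse a message back into a dict keyed by tag name."""
    product = ET.fromstring(text)
    return {child.tag: child.text for child in product}


sent = {
    "Name": "Refrigerator",
    "Model_Number": "R3456d2h",
    "Manufacturer": "General Electric",
    "Price": "1290.00",
    "Quantity": "1200",
}
message = to_xml(sent)        # what travels between the partners
received = from_xml(message)  # what the other partner stores internally
```

Neither side needs to know how the other stores the record; only the tag set is shared.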
5 XML: Universal Data
- TCP/IP → Universal Networking
- HTML → Universal Rendering
- Java → Universal Code
- XML → Universal Data
- Numerous standards bodies have been set up to standardize tags in different domains
  - ebXML
  - XBRL
  - MML
  - CML
6 HTML vs. XML: Comparison
- Both are markup languages
  - HTML has a fixed set of tags
  - XML allows users to specify tags based on their requirements
- Usage
  - HTML tags specify how to display data
  - XML tags specify the semantics of the data
- Tag interpretation
  - HTML specifies what each tag and attribute means
  - XML tags delimit data and leave interpretation to the parsing application
- Well-formedness
  - HTML is very tolerant of rule violations (nesting, matching tags)
  - XML strictly enforces the rules of well-formedness
7 XML: Structure
- Prolog
  - Tells the parser what it is parsing
  - Contains processing instructions for the processor
- Body
  - Tags: entities
  - Attributes: properties of entities
  - Comments: statements for clarification in the document
- Example
  - <?xml version="1.0" encoding="UTF-8" standalone="yes"?> ← Prolog
  - <contact> ← Body
    - <name>
      - <first_name>Sanjay</first_name>
      - <last_name>Goel</last_name>
    - </name>
    - <address>
      - <street>56 Della Street</street>
      - <city>Phoenix</city>
      - <state>AZ</state>
      - <zip>15784</zip>
    - </address>
  - </contact>
8 XML: Prolog
- Syntax: <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
- Contains the declaration that identifies a document as XML
- Version
  - Version of the XML markup language used in the data
  - Not optional
- Encoding
  - Identifies the character set used to encode the data
  - Default: UTF-8 (compressed Unicode)
- Standalone
  - Tells whether or not this document references an external entity
- May contain entity definitions and tag specifications
9 XML Syntax: Elements and Attributes
- Uses less-than and greater-than characters (< >) as delimiters
- Every opening tag must have an accompanying closing tag
  - <First_Name>Sanjay</First_Name>
- Empty tags do not require an accompanying closing tag
  - Empty tags have a forward slash before the greater-than sign, e.g. <Name/>
- Tags can have attributes, whose values must be enclosed in quotes (double or single)
  - <name first="Sanjay" last="Goel">
- Elements must be properly nested
  - The nesting cannot be interleaved
- Each document must have exactly one root element
- Element and attribute names are case sensitive
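A quick way to see these rules in action is to hand small fragments to a parser and note which ones it rejects. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` and the sample fragments are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


samples = {
    "matched tags": "<name><first>Sanjay</first></name>",
    "empty tag": "<name/>",
    "missing closing tag": "<name><first>Sanjay</name>",
    "interleaved nesting": "<a><b></a></b>",
    "unquoted attribute": "<name first=Sanjay/>",
    "two root elements": "<a/><b/>",
    "case mismatch": "<Name></name>",
}

# Map each label to the parser's verdict.
results = {label: is_well_formed(xml) for label, xml in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'rejected'}")
```

Only the first two fragments are accepted; each of the others breaks one of the rules listed above.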
10 Tree Structure: Elements
- XML documents have a tree structure containing multiple levels of nested tags
  - The root element is a single XML element that encloses all of the other XML elements and data in the document
  - All other elements are children (direct or indirect) of the root element
- Example
  - <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
  - <contact> ← Root element
    - <name>
      - <first_name>Sanjay</first_name>
      - <last_name>Goel</last_name>
    - </name>
    - <address>
      - <street>56 Della Street</street> ← Child elements
      - <city>Phoenix</city>
      - <state>AZ</state>
      - <zip>15784</zip>
    - </address>
  - </contact>
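The tree can be made visible by walking it recursively. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the contact document is the one from the slide (with `first_name` and `last_name` written with underscores so the names are legal), and the `walk` helper is a name chosen for this example.

```python
import xml.etree.ElementTree as ET

CONTACT = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<contact>
  <name>
    <first_name>Sanjay</first_name>
    <last_name>Goel</last_name>
  </name>
  <address>
    <street>56 Della Street</street>
    <city>Phoenix</city>
    <state>AZ</state>
    <zip>15784</zip>
  </address>
</contact>"""


def walk(element, depth=0):
    """Yield (depth, tag, text) for the element and all of its descendants."""
    text = (element.text or "").strip()
    yield depth, element.tag, text
    for child in element:
        yield from walk(child, depth + 1)


# Parse from bytes so the encoding declaration in the prolog is honored.
root = ET.fromstring(CONTACT.encode("utf-8"))
rows = list(walk(root))
for depth, tag, text in rows:
    print("  " * depth + tag + (": " + text if text else ""))
```

The indentation of the printed output mirrors the nesting depth of each element under the root.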
11 Attributes: Definition and Example
- Attributes are properties associated with an element
  - Each attribute is a name-value pair
  - No element may contain two attributes with the same name
  - Name and value are strings
- Example
  - <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
  - <contact>
    - <name first="Sanjay" last="Goel"></name> ← Attributes
    - <address>
      - <street>56 Della Street</street> ← Nested elements
      - <city>Phoenix</city>
      - <state>AZ</state>
      - <zip>15784</zip>
    - </address>
  - </contact>
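As a small illustration of the points above, a parser exposes attributes as string name-value pairs and refuses an element that repeats an attribute name. This sketch uses Python's standard `xml.etree.ElementTree`; the variable names and the `middle` attribute are hypothetical.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('<contact><name first="Sanjay" last="Goel"></name></contact>')
name = root.find("name")

attributes = dict(name.attrib)    # name-value pairs; both sides are strings
first = name.get("first")         # look up a single attribute
missing = name.get("middle", "")  # default returned when the attribute is absent

# Two attributes with the same name on one element are not well-formed.
try:
    ET.fromstring('<name first="Sanjay" first="S."/>')
    duplicate_accepted = True
except ET.ParseError:
    duplicate_accepted = False
```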
12 Elements vs. Attributes: Comparison
- Data should be stored in elements
- Information about data (metadata) should be stored in attributes
- When in doubt, use elements
- Rules of thumb
  - Elements should hold information that someone may want to read
  - Attributes are appropriate for information about the document that has nothing to do with the content of the document
    - e.g. URLs, units, references, and ids belong in attributes
  - What is metadata to you may be someone else's data
13 Comments: Basics
- XML comments begin with <!-- and end with -->
  - All data between these delimiters is discarded
  - <!-- This is a list of names of people -->
- Comments must not come before the XML declaration
- Comments cannot be placed inside a tag
- Comments may be used to surround and hide tags
  - <Name>
    - <first>Sanjay</first>
    - <!-- <last>Goel</last> --> ← The last element is ignored
  - </Name>
- The string -- may not occur inside a comment except as part of its opening and closing delimiters
  - <!-- the Red door -- that is the second --> ← Illegal
14 Namespaces: Basics
- XML documents come from different sources
  - Combining elements from different sources can result in name conflicts
  - Namespaces allow the interpreter to resolve the elements
- Namespaces
  - Declared within an element start-tag using the attribute xmlns
  - Represented as an actual URI (since namespaces are globally unique)
  - e.g. <Collection xmlns:book="http://www.mjyOnline.com/books" xmlns:cd="http://www.mjyOnline.com/cds">
  - Here book and cd are shorthands for the full namespace names
- The default namespace is used if no other namespace is defined
  - It does not have any prefix associated with it
15 Namespaces: Example
- <?xml version="1.0"?>
- <!-- File Name: Collection.xml -->
- <COLLECTION
  - xmlns:book="http://www.mjyOnline.com/books"
  - xmlns:cd="http://www.mjyOnline.com/cds">
  - <ITEM Status="in">
    - <TITLE>The Adventures of Huckleberry Finn</TITLE>
    - <AUTHOR>Mark Twain</AUTHOR>
    - <PRICE>5.49</PRICE>
  - </ITEM>
  - <ITEM Status="in">
    - <TITLE>The Marble Faun</TITLE>
    - <AUTHOR>Nathaniel Hawthorne</AUTHOR>
    - <PRICE>10.95</PRICE>
  - </ITEM>
  - <ITEM Status="out">
    - <TITLE>Leaves of Grass</TITLE>
    - <AUTHOR>Walt Whitman</AUTHOR>
    - <PRICE>7.75</PRICE>
  - </ITEM>
- </COLLECTION>
- <?xml version="1.0"?>
- <!-- File Name: Collection.xml -->
- <COLLECTION>
  - <ITEM>
    - <TITLE>Violin Concertos Numbers 1, 2, and 3</TITLE>
    - <COMPOSER>Mozart</COMPOSER>
    - <PRICE>16.49</PRICE>
  - </ITEM>
  - <ITEM>
    - <TITLE>Violin Concerto in D</TITLE>
    - <COMPOSER>Beethoven</COMPOSER>
    - <PRICE>14.95</PRICE>
  - </ITEM>
- </COLLECTION>
- Books and CDs are tracked in different files; combining them leads to name conflicts (ITEM, TITLE, PRICE)
16 Namespaces: Example
- <?xml version="1.0"?>
- <!-- File Name: Collection.xml -->
- <COLLECTION
  - xmlns:book="http://www.mjyOnline.com/books"
  - xmlns:cd="http://www.mjyOnline.com/cds">
  - <book:ITEM Status="in">
    - <book:TITLE>The Adventures of Huckleberry Finn</book:TITLE>
    - <book:AUTHOR>Mark Twain</book:AUTHOR>
    - <book:PRICE>5.49</book:PRICE>
  - </book:ITEM>
  - <cd:ITEM>
    - <cd:TITLE>Violin Concerto in D</cd:TITLE>
    - <cd:COMPOSER>Beethoven</cd:COMPOSER>
    - <cd:PRICE>14.95</cd:PRICE>
  - </cd:ITEM>
  - <book:ITEM Status="out">
    - <book:TITLE>Leaves of Grass</book:TITLE>
    - <book:AUTHOR>Walt Whitman</book:AUTHOR>
    - <book:PRICE>7.75</book:PRICE>
  - </book:ITEM>
  - <cd:ITEM>
    - <cd:TITLE>Violin Concertos Numbers 1, 2, and 3</cd:TITLE>
    - <cd:COMPOSER>Mozart</cd:COMPOSER>
    - <cd:PRICE>16.49</cd:PRICE>
  - </cd:ITEM>
  - <book:ITEM Status="out">
    - <book:TITLE>The Legend of Sleepy Hollow</book:TITLE>
    - <book:AUTHOR>Washington Irving</book:AUTHOR>
    - <book:PRICE>2.95</book:PRICE>
  - </book:ITEM>
  - <book:ITEM Status="in">
    - <book:TITLE>The Marble Faun</book:TITLE>
    - <book:AUTHOR>Nathaniel Hawthorne</book:AUTHOR>
    - <book:PRICE>10.95</book:PRICE>
  - </book:ITEM>
- </COLLECTION>
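To show how an application tells the two kinds of ITEM apart, here is a short sketch using Python's standard `xml.etree.ElementTree` on a trimmed version of the combined collection. The prefix-to-URI dictionary `NS` is local to this code; it happens to reuse the prefixes from the document, but only the URIs have to match.

```python
import xml.etree.ElementTree as ET

COLLECTION = """
<COLLECTION
    xmlns:book="http://www.mjyOnline.com/books"
    xmlns:cd="http://www.mjyOnline.com/cds">
  <book:ITEM Status="in">
    <book:TITLE>The Adventures of Huckleberry Finn</book:TITLE>
    <book:PRICE>5.49</book:PRICE>
  </book:ITEM>
  <cd:ITEM>
    <cd:TITLE>Violin Concerto in D</cd:TITLE>
    <cd:PRICE>14.95</cd:PRICE>
  </cd:ITEM>
</COLLECTION>
"""

# Prefixes used in the search paths below, mapped to the namespace URIs.
NS = {
    "book": "http://www.mjyOnline.com/books",
    "cd": "http://www.mjyOnline.com/cds",
}

root = ET.fromstring(COLLECTION)
book_titles = [t.text for t in root.findall("book:ITEM/book:TITLE", NS)]
cd_titles = [t.text for t in root.findall("cd:ITEM/cd:TITLE", NS)]

# Internally the parser expands each prefix to the full namespace name.
first_tag = root[0].tag
```

The expanded tag name shows that the prefix is only a shorthand: the parser works with the full URI.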
17 Display XML: Style Sheets
- A style sheet is a file that contains instructions for rendering individual elements in an XML document
- Two kinds of style sheets exist
  - Cascading Style Sheets (CSS)
  - Extensible Stylesheet Language (XSL, including XSLT)
- Please refer to the following web site for comprehensive information on style sheets
  - http://www.w3schools.com/css/default.asp
18 Cascading Style Sheets: Example
- <?xml version="1.0"?>
- <!-- File Name: Inventory01.xml -->
- <?xml-stylesheet type="text/css" href="Inventory01.css"?>
- <INVENTORY>
  - <BOOK>
    - <TITLE>The Adventures of Huckleberry Finn</TITLE>
    - <AUTHOR>Mark Twain</AUTHOR>
    - <BINDING>mass market paperback</BINDING>
    - <PAGES>298</PAGES>
    - <PRICE>5.49</PRICE>
  - </BOOK>
  - <BOOK>
    - <TITLE>Leaves of Grass</TITLE>
    - <AUTHOR>Walt Whitman</AUTHOR>
    - <BINDING>hardcover</BINDING>
    - <PAGES>462</PAGES>
    - <PRICE>7.75</PRICE>
  - </BOOK>
  - <BOOK>
    - <TITLE>The Legend of Sleepy Hollow</TITLE>
    - <AUTHOR>Washington Irving</AUTHOR>
    - <BINDING>mass market paperback</BINDING>
    - <PAGES>98</PAGES>
    - <PRICE>2.95</PRICE>
  - </BOOK>
  - <BOOK>
    - <TITLE>The Marble Faun</TITLE>
    - <AUTHOR>Nathaniel Hawthorne</AUTHOR>
    - <BINDING>trade paperback</BINDING>
    - <PAGES>473</PAGES>
    - <PRICE>10.95</PRICE>
  - </BOOK>
  - <BOOK>
    - <TITLE>Moby-Dick</TITLE>
    - <AUTHOR>Herman Melville</AUTHOR>
    - <BINDING>hardcover</BINDING>
    - <PAGES>724</PAGES>
    - <PRICE>9.95</PRICE>
  - </BOOK>
- </INVENTORY>
19 Cascading Style Sheets: Example
- /* File Name: Inventory02.css */
- BOOK { display: block; margin-top: 12pt; font-size: 10pt }
- TITLE { display: block; font-size: 12pt; font-weight: bold; font-style: italic }
- AUTHOR { display: block; margin-left: 15pt; font-weight: bold }
- BINDING { display: block; margin-left: 15pt }
- PAGES { display: none }
- PRICE { display: block; margin-left: 15pt }
20 Cascading Style Sheets: Display
21 Formal Languages/Grammars: Basics
- A formal language is a set of strings
  - It is characterized by a set of rules that determine which strings are part of the language and which are not
  - In the case of programming languages, programs that compile are grammatically correct (others are not)
  - In a natural language, like English, correct sentences follow the rules of English grammar
- More precisely, a grammar defines four things
  - A vocabulary out of which the strings are constructed (terminal symbols)
  - A vocabulary that is used to formulate grammar rules (nonterminal symbols)
  - Grammar rules (productions), each of which has a left-hand side (lhs) and a right-hand side (rhs)
  - A designated start symbol
22 Validated XML Document: Basics
- An XML document is valid if it conforms to the grammar of the language
  - Validity is different from well-formedness
- Two ways to specify the grammar of the language
  - Document Type Definition (DTD)
  - XML Schema
- Why bother with the language grammar?
  - It provides the blueprint of the language
  - It ensures that the data is interchangeable
  - It eliminates processing errors in custom software that expects a particular document content and structure
- Validity of the document is checked by using a validator
23 Document Type Declaration: Basics
- A document type declaration is a block of XML markup added to the prolog of the document
  - It has to follow the XML declaration
  - It has to be outside of any other markup
  - It defines the content and structure of the language
- Without a document type declaration or schema, a document is checked only for well-formedness, not for validity
- The form of a document type declaration is
  - <!DOCTYPE Name DTD>
  - DTD is the document type definition
  - Name specifies the name of the document element
24 Document Type Definitions: Basics
- A document type definition (DTD) consists of a series of markup declarations enclosed in square brackets
  - <?xml version="1.0" standalone="yes"?>
  - <!DOCTYPE GREETING [
    - <!ELEMENT GREETING (#PCDATA)>
  - ]>
  - <GREETING>
    - Hello XML!
  - </GREETING>
- A DTD can also be stored separately from the XML document and referenced in it
25 Document Type Definitions: Syntax
- Element type declaration
  - Syntax: <!ELEMENT Name contentspec>
    - Name is the name of the element
    - contentspec is the content specification
  - Example
    - <!ELEMENT Title (#PCDATA)>
- The content specification can have four types of values
  - EMPTY content: the element must not have content
    - <!ELEMENT Image EMPTY>
  - ANY content: the element can contain anything
    - <!ELEMENT misc ANY>
  - Element content: child elements but no character data
    - <!DOCTYPE BOOK [
      - <!ELEMENT BOOK (TITLE, AUTHOR)>
      - <!ELEMENT TITLE (#PCDATA)>
      - <!ELEMENT AUTHOR (#PCDATA)>
    - ]>
  - Mixed content: character data and child elements interspersed
26 Element Content Specification: Types
- The content specification indicates the allowed child elements and their order
  - If an element has element content, it cannot contain any character data
- Types of content specifications
  - Sequence: indicates that each element must have a specific sequence of child elements
  - Example
    - <!DOCTYPE MOUNTAIN [
      - <!ELEMENT MOUNTAIN (NAME, HEIGHT, STATE)>
      - <!ELEMENT NAME (#PCDATA)>
      - <!ELEMENT HEIGHT (#PCDATA)>
      - <!ELEMENT STATE (#PCDATA)>
    - ]>
  - Valid XML
    - <MOUNTAIN>
      - <NAME>Wheeler</NAME>
      - <HEIGHT>13161</HEIGHT>
      - <STATE>New Mexico</STATE>
    - </MOUNTAIN>
27 Element Content Specification: Types
- Types of content specifications
  - Choice: indicates that the element can have one of a series of child elements
    - The alternatives are separated by a | sign
  - Example
    - <!DOCTYPE FILM [
      - <!ELEMENT FILM (STAR | NARRATOR | INSTRUCTOR)>
      - <!ELEMENT STAR (#PCDATA)>
      - <!ELEMENT NARRATOR (#PCDATA)>
      - <!ELEMENT INSTRUCTOR (#PCDATA)>
    - ]>
  - Valid XML
    - <FILM>
      - <STAR>ROBERT REDFORD</STAR>
    - </FILM>
  - Invalid XML
    - <FILM>
      - <NARRATOR>Sir Gregory Parsloe</NARRATOR>
      - <INSTRUCTOR>Galahad Threepwood</INSTRUCTOR>
    - </FILM>
28 Element Content Specification: Number of Elements
- Specifying the number of elements allowed
  - ? zero or one
  - + one or more
  - * zero or more
- Example
  - <!DOCTYPE MOUNTAIN [
    - <!ELEMENT MOUNTAIN (NAME+, HEIGHT?, STATE)>
    - <!ELEMENT NAME (#PCDATA)>
    - <!ELEMENT HEIGHT (#PCDATA)>
    - <!ELEMENT STATE (#PCDATA)>
  - ]>
- Valid XML
  - <MOUNTAIN>
    - <NAME>Pueblo Peak</NAME>
    - <NAME>Taos Mountain</NAME>
    - <STATE>New Mexico</STATE>
  - </MOUNTAIN>
29 Element Content Specification: Modification
- Modifying a group of elements
- Example
  - <!DOCTYPE FILM [
    - <!ELEMENT FILM (STAR | NARRATOR | INSTRUCTOR)*>
    - <!ELEMENT STAR (#PCDATA)>
    - <!ELEMENT NARRATOR (#PCDATA)>
    - <!ELEMENT INSTRUCTOR (#PCDATA)>
  - ]>
- Valid XML
  - <FILM>
    - <NARRATOR>Sir Gregory Parsloe</NARRATOR>
    - <STAR>ROBERT REDFORD</STAR>
    - <NARRATOR>PLUG BASHMAN</NARRATOR>
  - </FILM>
30 Element Content Specification: Nesting
- Nesting in the specification
- Example
  - <!DOCTYPE FILM [
    - <!ELEMENT FILM (TITLE, CLASS, (STAR | NARRATOR | INSTRUCTOR))>
    - <!ELEMENT TITLE (#PCDATA)>
    - <!ELEMENT CLASS (#PCDATA)>
    - <!ELEMENT STAR (#PCDATA)>
    - <!ELEMENT NARRATOR (#PCDATA)>
    - <!ELEMENT INSTRUCTOR (#PCDATA)>
  - ]>
- Valid XML
  - <FILM>
    - <TITLE>The Net</TITLE>
    - <CLASS>Action</CLASS>
    - <STAR>Sandra Bullock</STAR>
  - </FILM>
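Sequence (,), choice (|), the occurrence indicators (? + *), and nested groups behave much like a regular expression over the names of an element's children. The sketch below makes that analogy concrete in Python. It is not a DTD validator: the helper names `model_to_regex` and `children_match` are invented here, only element content models are handled (no #PCDATA, EMPTY, or ANY), and only the direct children of one element are checked.

```python
import re
import xml.etree.ElementTree as ET

NAME = re.compile(r"[A-Za-z_][\w.-]*")


def model_to_regex(model):
    """Translate an element content model into a regex over child tag names."""
    # Each element name becomes a group that consumes "NAME;".
    pattern = NAME.sub(lambda m: "(?:" + m.group(0) + ";)", model)
    # A comma means "followed by", which is plain concatenation in a regex;
    # whitespace carries no meaning. |, ?, + and * keep their regex meaning.
    return re.sub(r"[,\s]", "", pattern)


def children_match(xml_text, model):
    """Check the direct children of the root element against the model."""
    element = ET.fromstring(xml_text)
    child_tags = "".join(child.tag + ";" for child in element)
    return re.fullmatch(model_to_regex(model), child_tags) is not None


sequence_ok = children_match(
    "<MOUNTAIN><NAME>Wheeler</NAME><HEIGHT>13161</HEIGHT>"
    "<STATE>New Mexico</STATE></MOUNTAIN>",
    "(NAME, HEIGHT, STATE)")
occurrence_ok = children_match(
    "<MOUNTAIN><NAME>Pueblo Peak</NAME><NAME>Taos Mountain</NAME>"
    "<STATE>New Mexico</STATE></MOUNTAIN>",
    "(NAME+, HEIGHT?, STATE)")
choice_ok = children_match(
    "<FILM><STAR>ROBERT REDFORD</STAR></FILM>",
    "(STAR | NARRATOR | INSTRUCTOR)")
choice_bad = children_match(
    "<FILM><NARRATOR>Sir Gregory Parsloe</NARRATOR>"
    "<INSTRUCTOR>Galahad Threepwood</INSTRUCTOR></FILM>",
    "(STAR | NARRATOR | INSTRUCTOR)")
nested_ok = children_match(
    "<FILM><TITLE>The Net</TITLE><CLASS>Action</CLASS>"
    "<STAR>Sandra Bullock</STAR></FILM>",
    "(TITLE, CLASS, (STAR | NARRATOR | INSTRUCTOR))")
```

A real validating parser also checks attributes, entities, and every element in the document, so this should be read only as an aid to understanding the notation.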
31 Element Content Specification: Mixed Content Model
- The mixed content model allows an element to contain
  - Character data
  - Child elements in any position and any frequency (zero or more repetitions)
  - Child elements can be interspersed with data
- Character data only
  - Example
    - <!ELEMENT TITLE (#PCDATA)>
- Character data and elements
  - Example
    - <!ELEMENT TITLE (#PCDATA | SUBTITLE)*>
    - <!ELEMENT SUBTITLE (#PCDATA)>
  - Valid XML
    - <TITLE>Moby Dick <SUBTITLE>Or, The Whale</SUBTITLE></TITLE>
    - <TITLE><SUBTITLE>Or, The Whale</SUBTITLE>Moby Dick</TITLE>
32 Attribute Specification: Basics
- In a valid document, all attributes need to be declared using an attribute-list declaration, which
  - Defines the name of each attribute
  - Defines the data type of each attribute
  - Specifies whether an attribute is required or not
- Syntax: <!ATTLIST Name AttDefs>
  - Name is the name of the element
  - AttDefs is a series of one or more attribute definitions
- Attribute definition syntax: Name AttType DefaultDecl
  - Name is the attribute name
  - AttType is the type of the attribute (CDATA, tokenized type, enumerated type)
  - DefaultDecl specifies whether the attribute is required and gives default values
- Example
  - <!ELEMENT FILM (TITLE, (STAR | NARRATOR | INSTRUCTOR))>
  - <!ATTLIST FILM Class CDATA "fictional" Year CDATA #REQUIRED>
33 Entity Specification: Types
- There are two kinds of entities in XML documents
  - Character entities, referred to by the character's Unicode number (e.g. &#169;)
  - Named entities, referred to by name (e.g. &lt;)
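A short check with Python's standard `xml.etree.ElementTree` shows both kinds being resolved by the parser. The sample text is made up for illustration: `&#169;` and `&#x41;` are numeric character references (decimal and hexadecimal), while `&amp;`, `&lt;`, and `&gt;` are named entities predefined by XML.

```python
import xml.etree.ElementTree as ET

element = ET.fromstring("<note>&#169; &#x41;&amp;B &lt;tag&gt;</note>")

# The application sees plain characters, not the references.
decoded = element.text

# On output, characters that would be mistaken for markup are escaped again.
encoded = ET.tostring(element, encoding="unicode")
```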
34 XML Parsing: Definition and Types
- An XML parser is a program that reads an XML document and makes its contents available for processing
- There are two standard types of parsers for XML
  - Document Object Model (DOM), which makes the document available as a tree
  - Simple API for XML (SAX), which associates an event with each tag and each block of text
- XML parsers are available from many vendors
  - Each vendor conforms to the standardized XML interfaces
  - Apache Xerces is a widely used parser
  - Sun's API for XML parsing is JAXP (it specifies the basic classes and interfaces that a Java XML parser should support)
- DOM parsers are often built on top of SAX parsers
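As a small illustration of the DOM style, the sketch below uses `xml.dom.minidom` from the Python standard library (one DOM implementation among many, chosen here only because it needs no installation). The whole document is loaded into a tree first and then navigated in any direction; the abbreviated contact document is adapted from the earlier slides.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<contact>"
    "<name><first_name>Sanjay</first_name><last_name>Goel</last_name></name>"
    "<address><city>Phoenix</city><state>AZ</state></address>"
    "</contact>"
)

root = doc.documentElement                       # the root element node
city = doc.getElementsByTagName("city")[0]       # search anywhere in the tree
city_text = city.firstChild.data                 # text is a child node of its own
parent_tag = city.parentNode.tagName             # navigate upward
child_tags = [node.tagName for node in root.childNodes
              if node.nodeType == node.ELEMENT_NODE]
```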
35 SAX Parser: Basics
- As the parser scans the document it sends notifications of events, for instance
  - Element start
  - Element end
  - Character sequence found between two elements
- SAX provides standard names for the callback functions that are triggered by these events
  - void characters(char[] ch, int start, int length): notification of character data
  - void startDocument(): notification of start of document
  - void endDocument(): notification of end of document
  - void startElement(String name, AttributeList atts): notification of start of element
  - void endElement(String name): notification of end of element
  - void processingInstruction(String target, String data): notification of processing instruction
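The same callback names exist in Python's standard `xml.sax` module, which makes it easy to watch the event stream. The signatures above are the Java ones; in the Python binding sketched here `characters` receives a single string, and the `EventLogger` class is a name invented for this example.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Record every notification the parser sends, in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append(("startDocument",))

    def endDocument(self):
        self.events.append(("endDocument",))

    def startElement(self, name, attrs):
        self.events.append(("startElement", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("endElement", name))

    def characters(self, content):
        # The parser may deliver text in several chunks; skip pure whitespace.
        if content.strip():
            self.events.append(("characters", content))


handler = EventLogger()
xml.sax.parseString(b'<name first="Sanjay"><last>Goel</last></name>', handler)

for event in handler.events:
    print(event)
```

Unlike DOM, nothing is kept in memory unless the handler stores it, which is why SAX suits large documents.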
36 SAX Parser: Example
- From Professional JSP, page 658
37 XSLT Parser: Definition and Uses
- XSLT is a language for transforming the structure of XML documents
  - Any tree-transforming language needs the ability to refer to tree paths
  - XPath is the sub-language underneath XSLT for describing tree paths
- There are two scenarios for the use of XSLT
  - The browser contains an XSLT processor and uses it to render XML documents
  - XSLT is used for changing the structure of an existing XML document
- To run XSLT the following components are required
  - Java 1.4 standard development kit
  - James Clark's xt (xt.jar)
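To give a feel for tree paths, the sketch below evaluates a few path expressions with Python's standard `xml.etree.ElementTree`. This is an assumption-laden stand-in: ElementTree supports only a limited subset of XPath and is not an XSLT processor, and the abbreviated inventory is adapted from the style sheet example.

```python
import xml.etree.ElementTree as ET

INVENTORY = """<INVENTORY>
  <BOOK><TITLE>The Adventures of Huckleberry Finn</TITLE>
        <BINDING>mass market paperback</BINDING><PRICE>5.49</PRICE></BOOK>
  <BOOK><TITLE>Leaves of Grass</TITLE>
        <BINDING>hardcover</BINDING><PRICE>7.75</PRICE></BOOK>
  <BOOK><TITLE>Moby-Dick</TITLE>
        <BINDING>hardcover</BINDING><PRICE>9.95</PRICE></BOOK>
</INVENTORY>"""

root = ET.fromstring(INVENTORY)

# Child steps separated by "/": every TITLE under every BOOK.
all_titles = [t.text for t in root.findall("BOOK/TITLE")]

# A predicate in square brackets filters on the text of a child element.
hardcovers = [b.findtext("TITLE")
              for b in root.findall("BOOK[BINDING='hardcover']")]

# A numeric predicate selects by position (counting from 1).
second_price = root.findtext("BOOK[2]/PRICE")

# ".//" searches at any depth below the current node.
prices = [p.text for p in root.findall(".//PRICE")]
```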
38 XSLT Parser: Basics
- An XSLT style sheet is an XML document
- It consists of two parts
  - Standard XML declaration, including namespace declarations
  - Top-level elements that set up the general framework for the output, e.g., variables or parameters imported from the command line
- Processing involves the following
  - A current list of nodes from the source document is created by matching a pattern
  - Output for the current node is generated by instantiating the template corresponding to the current pattern
  - In the process of transformation, new nodes can be added to the list
  - Processing begins with a list containing the entire document
  - Transformation ends when the node list is empty
39 XSLT Parser: Example
40 Web Services
41 Web Services: Definition
- Web services are software programs that use XML to exchange information with other software programs via common Internet protocols
  - Web services communicate over the network to provide specific methods that other applications can invoke
  - Thus applications residing on different computers can work together by invoking methods on each other
  - HTTP is the key protocol used for Web services
- Characteristics
  - Programmable
  - Encapsulate a task
  - XML-based data exchange allows programs on heterogeneous platforms to communicate (SOAP)
  - Self-describing (WSDL)
  - Discoverable (UDDI)
42 Web Services: SOAP
- SOAP: Simple Object Access Protocol
  - Enables data transfer between systems distributed over a network
  - A SOAP message sent to a Web service invokes a method provided by the service
  - The Web service may return the result via another SOAP message
- SOAP consists of standardized XML schemas
  - Defines a format for transmitting XML messages over a network
  - Includes data types and message structure
- Layered over an Internet protocol, such as HTTP, and can be used to transfer data across the Web and other networks
  - HTTP allows message transfer across firewalls, since HTTP messages are usually accepted by firewalls
43 Web Services: SOAP
- A SOAP message consists of three parts
  - Envelope
  - Header
  - Body
- The envelope wraps the entire message and contains the header and body
- The header (optional) provides information on security and routing
- The body contains the application-specific data that is being transferred
- XML-RPC is an alternative to SOAP
- SOAP is the de facto standard due to its simplicity, extensibility and interoperability
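The three-part structure can be sketched by building a message with Python's standard `xml.etree.ElementTree`. The envelope namespace below is the one defined for SOAP 1.1; everything else is hypothetical and chosen only for illustration: the `build_request` helper, the `GetPrice` method, the `Model` parameter, and the `http://example.com/inventory` service namespace.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
ET.register_namespace("soap", SOAP_NS)                 # use the "soap" prefix on output


def build_request(method, params, service_ns):
    """Build an Envelope containing an (empty) Header and a Body with one call."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")    # optional part, left empty here
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{service_ns}}}{method}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{service_ns}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical service and method, for illustration only.
SERVICE_NS = "http://example.com/inventory"
request = build_request("GetPrice", {"Model": "R3456d2h"}, SERVICE_NS)

parsed = ET.fromstring(request)
body = parsed.find(f"{{{SOAP_NS}}}Body")
print(request)
```

In practice the resulting text would be sent as the body of an HTTP POST, and the service would answer with another envelope of the same shape.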
44 Web Services: WSDL
- WSDL: Web Services Description Language
- Provides a means of publishing information about a Web service
  - Instructions for its use
  - Capabilities of the service
  - Information on how to connect to and communicate with the service
- The syntax is fairly complex
  - Normally created using automated tools
  - It is not important to understand the precise syntax of WSDL while developing Web services
45 Web Services: UDDI
- UDDI: Universal Description, Discovery and Integration
- Allows developers and businesses to publish and locate Web services on a network via the use of registries
  - The registries can be made private or public
- Structure similar to a phone book
  - White pages contain contact information and textual descriptions
  - Yellow pages provide classification information about companies and details of a company's electronic capabilities
  - Green pages list technical data relating to services and business processes