Title: Standards in Information Management: XML
1Standards in Information Management XML
2Learning Objectives
- Learn what XML is
- Learn the various ways in which XML is used
- Learn the key companion technologies
- See how XML is being used in industry as a
meta-language
3Agenda
- Overview
- Syntax and Structure
- The XML Alphabet Soup
- XML as a meta-language
4OverviewWhat is XML?
- A tag-based meta language
- Designed for structured data representation
- Represents data hierarchically (in a tree)
- Provides context to data (makes it meaningful)
- Self-describing data
- Separates presentation (HTML) from data (XML)
- An open W3C standard
- A subset of SGML
- vs. HTML, which is an application of SGML
5OverviewWhat is XML?
- XML is a use-everywhere data specification
6OverviewDocuments vs. Data
- XML is used to represent two main types of
things
- Documents
- Lots of text with tags to identify and annotate
portions of the document
- Data
- Hierarchical data structures
7OverviewXML and Structured Data
- Pre-XML representation of data
PO-1234,CUST001,X9876,5,14.98
- XML representation of the same data
<PURCHASE_ORDER>
  <PO_NUM>PO-1234</PO_NUM>
  <CUST_ID>CUST001</CUST_ID>
  <ITEM_NUM>X9876</ITEM_NUM>
  <QUANTITY>5</QUANTITY>
  <PRICE>14.98</PRICE>
</PURCHASE_ORDER>
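The step from the comma-separated record to the tagged record can be sketched in a few lines of Python. This is only an illustration: the element names come from the slide, while the column order in `FIELDS` and the helper name `record_to_xml` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Element names follow the PURCHASE_ORDER slide; the column order is an assumption.
FIELDS = ["PO_NUM", "CUST_ID", "ITEM_NUM", "QUANTITY", "PRICE"]


def record_to_xml(line: str) -> ET.Element:
    """Wrap each comma-separated value in an element named after its field."""
    values = line.strip().split(",")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    order = ET.Element("PURCHASE_ORDER")
    for name, value in zip(FIELDS, values):
        ET.SubElement(order, name).text = value
    return order


order = record_to_xml("PO-1234,CUST001,X9876,5,14.98")
xml_text = ET.tostring(order, encoding="unicode")
print(xml_text)
```

The tagged form carries its own field names, so a reader no longer needs to know the column order to interpret the value 5 as a quantity.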
8OverviewBenefits of XML
- Open W3C standard
- Representation of data across heterogeneous
environments
- Cross platform
- Allows for high degree of interoperability
- Strict rules
- Syntax
- Structure
- Case sensitive
9OverviewWho Uses XML?
- Submissions by
- Microsoft
- IBM
- Hewlett-Packard
- Fujitsu Laboratories
- Sun Microsystems
- Netscape (AOL), and others
- Technologies using XML
- SOAP, ebXML, BizTalk, WebSphere, many others
10Agenda
- Overview
- Syntax and Structure
- The XML Alphabet Soup
- XML as a meta-language
11Syntax and StructureComponents of an XML Document
- Elements
- Each element has a beginning and ending tag
- <TAG_NAME>...</TAG_NAME>
- Elements can be empty (<TAG_NAME />)
- Attributes
- Describe an element, e.g. data type, data range,
etc.
- Can only appear on beginning tag
- Processing instructions
- Encoding specification (Unicode by default)
- Namespace declaration
- Schema declaration
12Syntax and StructureComponents of an XML Document
<?xml version="1.0" ?>
<?xml-stylesheet type="text/xsl" href="template.xsl"?>
<ROOT>
  <ELEMENT1><SUBELEMENT1 /><SUBELEMENT2 /></ELEMENT1>
  <ELEMENT2> </ELEMENT2>
  <ELEMENT3 type="string"> </ELEMENT3>
  <ELEMENT4 type="integer" value="9.3"> </ELEMENT4>
</ROOT>
- Prologue (processing instructions): the two <?...?> lines
- Elements: ROOT, ELEMENT1, ELEMENT2 and the sub-elements
- Elements with Attributes: ELEMENT3, ELEMENT4
13Syntax and StructureRules For Well-Formed XML
- There must be one, and only one, root element
- Sub-elements must be properly nested
- A tag must end within the tag in which it was
started
- Attributes are optional
- Defined by an optional schema
- Attribute values must be enclosed in " or '
- Processing instructions are optional
- XML is case-sensitive
- <tag> and <TAG> are not the same type of element
14Syntax and StructureWell-Formed XML?
- No, CHILD2 and CHILD3 do not nest properly
<?xml version="1.0" ?>
<PARENT>
  <CHILD1>This is element 1</CHILD1>
  <CHILD2><CHILD3>Number 3</CHILD2></CHILD3>
</PARENT>
15Syntax and StructureWell-Formed XML?
- No, there are two root elements
<?xml version="1.0" ?>
<PARENT>
  <CHILD1>This is element 1</CHILD1>
</PARENT>
<PARENT>
  <CHILD1>This is another element 1</CHILD1>
</PARENT>
16Syntax and StructureWell-Formed XML?
<?xml version="1.0" ?>
<PARENT>
  <CHILD1>This is element 1</CHILD1>
  <CHILD2/>
  <CHILD3></CHILD3>
</PARENT>
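The three quiz documents can also be checked mechanically. The sketch below uses Python's standard `xml.etree.ElementTree` parser, which rejects anything that is not well-formed; the helper name `is_well_formed` is made up for the example, and the XML declaration is left out of the test strings for brevity.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as a complete XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# The three documents from the quiz slides, written on one line each.
bad_nesting = (
    "<PARENT><CHILD1>This is element 1</CHILD1>"
    "<CHILD2><CHILD3>Number 3</CHILD2></CHILD3></PARENT>"
)
two_roots = (
    "<PARENT><CHILD1>This is element 1</CHILD1></PARENT>"
    "<PARENT><CHILD1>This is another element 1</CHILD1></PARENT>"
)
nested_ok = (
    "<PARENT><CHILD1>This is element 1</CHILD1>"
    "<CHILD2/><CHILD3></CHILD3></PARENT>"
)
# Tag names are case-sensitive, so this start tag is never closed.
case_mismatch = "<tag>text</TAG>"

for name, doc in [("bad_nesting", bad_nesting), ("two_roots", two_roots),
                  ("nested_ok", nested_ok), ("case_mismatch", case_mismatch)]:
    print(name, is_well_formed(doc))
```

Only the third document passes: it has a single root, every tag closes inside the element where it was opened, and the empty element is written correctly.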
17Syntax and StructureAn XML Document
<?xml version='1.0'?>
<bookstore>
  <book genre="autobiography" publicationdate="1981" ISBN="1-861003-11-0">
    <title>The Autobiography of Benjamin Franklin</title>
    <author>
      <first-name>Benjamin</first-name>
      <last-name>Franklin</last-name>
    </author>
    <price>8.99</price>
  </book>
  <book genre="novel" publicationdate="1967" ISBN="0-201-63361-2">
    <title>The Confidence Man</title>
    <author>
      <first-name>Herman</first-name>
      <last-name>Melville</last-name>
    </author>
    <price>11.99</price>
  </book>
</bookstore>
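To show how a program consumes a document like this one, here is a minimal sketch that reads the bookstore with Python's standard library. Attributes are read with `get`, child element text with `findtext`; the dictionary layout and variable names are choices made for the example, not part of the slides.

```python
import xml.etree.ElementTree as ET

BOOKSTORE = """<bookstore>
  <book genre="autobiography" publicationdate="1981" ISBN="1-861003-11-0">
    <title>The Autobiography of Benjamin Franklin</title>
    <author><first-name>Benjamin</first-name><last-name>Franklin</last-name></author>
    <price>8.99</price>
  </book>
  <book genre="novel" publicationdate="1967" ISBN="0-201-63361-2">
    <title>The Confidence Man</title>
    <author><first-name>Herman</first-name><last-name>Melville</last-name></author>
    <price>11.99</price>
  </book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE)
books = []
for book in root.findall("book"):
    first = book.findtext("author/first-name")
    last = book.findtext("author/last-name")
    books.append({
        "genre": book.get("genre"),            # attribute on the start tag
        "title": book.findtext("title"),       # text of a child element
        "author": f"{first} {last}",
        "price": float(book.findtext("price")),
    })

total = sum(entry["price"] for entry in books)
for entry in books:
    print(entry)
```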
18Syntax and Structure Namespaces Overview
- Part of XML's extensibility
- Allows authors to differentiate between tags of
the same name (using a prefix)
- Frees the author to focus on the data and decide how
to best describe it
- Allows multiple XML documents from multiple
authors to be merged
- Identified by a URI (Uniform Resource Identifier)
- When a URL is used, it does NOT have to represent
a live server
19Syntax and Structure Namespaces Declaration
Namespace declaration examples
xmlns:bk="http://www.example.com/bookinfo/"
xmlns:bk="urn:mybookstuff.org:bookinfo"
Parts of xmlns:bk="http://www.example.com/bookinfo/"
- Namespace declaration: xmlns
- Prefix: bk
- URI (URL): http://www.example.com/bookinfo/
20Syntax and Structure Namespaces Examples
<BOOK xmlns:bk="http://www.bookstuff.org/bookinfo">
  <bk:TITLE>All About XML</bk:TITLE>
  <bk:AUTHOR>Joe Developer</bk:AUTHOR>
  <bk:PRICE currency="US Dollar">19.99</bk:PRICE>

<bk:BOOK xmlns:bk="http://www.bookstuff.org/bookinfo"
         xmlns:money="urn:finance:money">
  <bk:TITLE>All About XML</bk:TITLE>
  <bk:AUTHOR>Joe Developer</bk:AUTHOR>
  <bk:PRICE money:currency="US Dollar">19.99</bk:PRICE>
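A parser does not keep the prefix; it resolves each prefix to its URI. The sketch below parses the second example with Python's `xml.etree.ElementTree`. The closing `</bk:BOOK>` tag is added here so the fragment parses, and the lookup prefixes `b` and `m` are deliberately different from the ones in the document to show that only the URI matters.

```python
import xml.etree.ElementTree as ET

# Second example from the slide; the closing tag is added so it parses.
DOC = """<bk:BOOK xmlns:bk="http://www.bookstuff.org/bookinfo"
         xmlns:money="urn:finance:money">
  <bk:TITLE>All About XML</bk:TITLE>
  <bk:AUTHOR>Joe Developer</bk:AUTHOR>
  <bk:PRICE money:currency="US Dollar">19.99</bk:PRICE>
</bk:BOOK>"""

root = ET.fromstring(DOC)

# ElementTree stores qualified names as "{uri}local".
print(root.tag)

# Prefixes used for lookup are local to this mapping.
ns = {"b": "http://www.bookstuff.org/bookinfo", "m": "urn:finance:money"}
title = root.findtext("b:TITLE", namespaces=ns)
price = root.find("b:PRICE", ns)
currency = price.get("{urn:finance:money}currency")
print(title, price.text, currency)
```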
21Syntax and Structure Namespaces Default
Namespace
- An XML namespace declared without a prefix
becomes the default namespace for all
sub-elements
- All elements without a prefix will belong to the
default namespace
<BOOK xmlns="http://www.bookstuff.org/bookinfo">
  <TITLE>All About XML</TITLE>
  <AUTHOR>Joe Developer</AUTHOR>
22Syntax and Structure Namespaces Scope
- Unqualified elements belong to the inner-most
default namespace.
- BOOK, TITLE, and AUTHOR belong to the default
book namespace
- PUBLISHER and NAME belong to the default
publisher namespace
<BOOK xmlns="www.bookstuff.org/bookinfo">
  <TITLE>All About XML</TITLE>
  <AUTHOR>Joe Developer</AUTHOR>
  <PUBLISHER xmlns="urn:publishers:publinfo">
    <NAME>Microsoft Press</NAME>
  </PUBLISHER>
</BOOK>
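The scoping rule can be confirmed by asking a parser which namespace each element ended up in. This is a small sketch with Python's standard library; the helper `namespace_of` is a name invented for the example.

```python
import xml.etree.ElementTree as ET

DOC = """<BOOK xmlns="www.bookstuff.org/bookinfo">
  <TITLE>All About XML</TITLE>
  <AUTHOR>Joe Developer</AUTHOR>
  <PUBLISHER xmlns="urn:publishers:publinfo">
    <NAME>Microsoft Press</NAME>
  </PUBLISHER>
</BOOK>"""


def namespace_of(element):
    """Extract the URI from ElementTree's "{uri}local" tag form."""
    tag = element.tag
    return tag[1:].split("}")[0] if tag.startswith("{") else None


root = ET.fromstring(DOC)
# Map each local element name to the namespace the parser assigned to it.
scopes = {el.tag.split("}")[-1]: namespace_of(el) for el in root.iter()}
for name, uri in scopes.items():
    print(name, "->", uri)
```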
23Syntax and Structure Namespaces Attributes
- Unqualified attributes do NOT belong to any
namespace
- Even if there is a default namespace
- This differs from elements, which belong to the
default namespace
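The difference between elements and attributes shows up directly in a parsed tree. In this sketch the PRICE element carries one unqualified and one prefixed attribute; the value "USD" and the combination of both attributes on one element are made up for illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<BOOK xmlns="http://www.bookstuff.org/bookinfo"
      xmlns:money="urn:finance:money">
  <PRICE currency="US Dollar" money:currency="USD">19.99</PRICE>
</BOOK>"""

root = ET.fromstring(DOC)
price = root.find("{http://www.bookstuff.org/bookinfo}PRICE")

# The element picked up the default namespace...
print(price.tag)
# ...but the unqualified attribute has a bare name, while the prefixed
# attribute is stored with its namespace URI.
attribute_names = sorted(price.attrib)
print(attribute_names)
```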
24Syntax and Structure Entities
- Entities provide a mechanism for textual
substitution, e.g. &lt; stands for <
- You can define your own entities
- Parsed entities can contain text and markup
- Unparsed entities can contain any data
- JPEG photos, GIF files, movies, etc.
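Entity substitution is easy to observe with a parser. The sketch below shows the predefined entities being expanded on input and re-escaped on output, plus a user-defined internal parsed entity declared in a DOCTYPE. The entity name `publisher` and the element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Predefined entities (&lt; &gt; &amp; &quot; &apos;) need no declaration.
predefined = ET.fromstring("<NOTE>5 &lt; 7 &amp;&amp; 7 &gt; 5</NOTE>")
print(predefined.text)

# A user-defined internal parsed entity; "publisher" is a made-up name.
DOC = """<!DOCTYPE BOOK [
  <!ENTITY publisher "Microsoft Press">
]>
<BOOK><PUBLISHER>&publisher;</PUBLISHER></BOOK>"""
custom = ET.fromstring(DOC)
print(custom.findtext("PUBLISHER"))

# Serializing escapes the special characters again.
round_trip = ET.tostring(predefined, encoding="unicode")
print(round_trip)
```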
25Agenda
- Overview
- Syntax and Structure
- The XML Alphabet Soup
- XML as a meta-language
26The XML Alphabet Soup
- XML itself is fairly simple
- Most of the learning curve is knowing about all
of the related technologies
27The XML Alphabet Soup
28The XML Alphabet Soup
29The XML Alphabet Soup Schemas Overview
- DTD (Document Type Definitions)
- Not written in XML
- No support for data types or namespaces
- XSD (XML Schema Definition)
- Written in XML
- Supports data types
- Current standard recommended by W3C
30The XML Alphabet Soup Schemas Purpose
- Define the rules (grammar) of the document
- Data types
- Value bounds
- An XML document that conforms to a schema is said
to be valid
- More restrictive than well-formed XML
- Define which elements are present and in what
order
- Define the structural relationships of elements
31The XML Alphabet Soup Schemas DTD Example
<BOOK>
  <TITLE>All About XML</TITLE>
  <AUTHOR>Joe Developer</AUTHOR>
</BOOK>

<!DOCTYPE BOOK [
  <!ELEMENT BOOK (TITLE, AUTHOR)>
  <!ELEMENT TITLE (#PCDATA)>
  <!ELEMENT AUTHOR (#PCDATA)>
]>
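What "valid" adds on top of "well-formed" can be illustrated by hand. Python's standard library does not validate against a DTD, so the sketch below is not a DTD validator: it transcribes the three ELEMENT declarations into a dictionary and checks only child order, ignoring everything else a real validator would cover. The names `CONTENT_MODEL` and `conforms` are invented for the example.

```python
import xml.etree.ElementTree as ET

# Content models transcribed by hand from the DTD on the slide.
# An empty list stands for (#PCDATA): text only, no child elements.
CONTENT_MODEL = {
    "BOOK": ["TITLE", "AUTHOR"],
    "TITLE": [],
    "AUTHOR": [],
}


def conforms(element: ET.Element) -> bool:
    """Check that children appear exactly in the declared order, recursively."""
    expected = CONTENT_MODEL.get(element.tag)
    if expected is None:
        return False  # element was never declared
    if [child.tag for child in element] != expected:
        return False
    return all(conforms(child) for child in element)


valid = ET.fromstring(
    "<BOOK><TITLE>All About XML</TITLE><AUTHOR>Joe Developer</AUTHOR></BOOK>")
wrong_order = ET.fromstring(
    "<BOOK><AUTHOR>Joe Developer</AUTHOR><TITLE>All About XML</TITLE></BOOK>")
missing_author = ET.fromstring("<BOOK><TITLE>All About XML</TITLE></BOOK>")

print(conforms(valid), conforms(wrong_order), conforms(missing_author))
```

All three inputs are well-formed; only the first also follows the grammar, which is the distinction the schema slides draw.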
32The XML Alphabet Soup Schemas XSD Example
<CATALOG>
  <BOOK>
    <TITLE>All About XML</TITLE>
    <AUTHOR>Joe Developer</AUTHOR>
  </BOOK>
</CATALOG>
33The XML Alphabet Soup Schemas XSD Example
<xsd:schema id="NewDataSet"
    targetNamespace="http://tempuri.org/schema1.xsd"
    xmlns="http://tempuri.org/schema1.xsd"
    xmlns:xsd="http://www.w3.org/1999/XMLSchema"
    xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
  <xsd:element name="book">
    <xsd:complexType content="elementOnly">
      <xsd:all>
        <xsd:element name="title" minOccurs="0" type="xsd:string"/>
        <xsd:element name="author" minOccurs="0" type="xsd:string"/>
      </xsd:all>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="Catalog" msdata:IsDataSet="True">
    <xsd:complexType>
      <xsd:choice maxOccurs="unbounded">
        <xsd:element ref="book"/>
      </xsd:choice>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
34The XML Alphabet Soup Schemas Why You Should
Use XSD
- Newest W3C Standard
- Broad support for data types
- Reusable components
- Simple data types
- Complex data types
- Extensible
- Inheritance support
- Namespace support
- Ability to map to relational database tables
- XSD support in Visual Studio.NET
35The XML Alphabet Soup Transformations XSL
- Language for expressing document styles
- Specifies the presentation of XML
- More powerful than CSS
- Consists of
- XSLT
- XPath
- XSL Formatting Objects (XSL-FO)
36The XML Alphabet Soup Transformations Overview
- XSLT: a language used to transform XML data into
a different form (commonly XML or HTML)
[Diagram: XML input, transformed by XSLT, yields XML, HTML, ...]
37The XML Alphabet Soup Transformations XSLT
- The language used for converting XML documents
into other forms
- Describes how the document is transformed
- Expressed as an XML document (.xsl)
- Template rules
- Patterns match nodes in source document
- Templates instantiated to form part of result
document
- Uses XPath for querying, sorting, etc.
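The template-rule idea (a pattern selects nodes, a template builds the matching piece of the result) can be imitated in plain Python. This is not XSLT and does not use an XSLT processor, which the standard library lacks; it is a hand-written analogue in which a dictionary plays the role of the stylesheet. The rule names and the HTML list output are choices made for the example.

```python
import xml.etree.ElementTree as ET

SOURCE = """<bookstore>
  <book><title>The Autobiography of Benjamin Franklin</title><price>8.99</price></book>
  <book><title>The Confidence Man</title><price>11.99</price></book>
</bookstore>"""


def apply_templates(node):
    """Pick the rule whose pattern (here: the tag name) matches the node."""
    return TEMPLATES[node.tag](node)


def bookstore_rule(node):
    # Template for <bookstore>: an HTML list holding one item per book.
    ul = ET.Element("ul")
    ul.extend([apply_templates(book) for book in node.findall("book")])
    return ul


def book_rule(node):
    # Template for <book>: a list item showing title and price.
    li = ET.Element("li")
    li.text = f"{node.findtext('title')} ({node.findtext('price')})"
    return li


# The "stylesheet": pattern -> template.
TEMPLATES = {"bookstore": bookstore_rule, "book": book_rule}

html = ET.tostring(apply_templates(ET.fromstring(SOURCE)), encoding="unicode")
print(html)
```

A real stylesheet expresses the same two rules declaratively as `xsl:template` elements and leaves the traversal to the processor.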
38The XML Alphabet Soup XPath (XML Path Language)
- General purpose query language for identifying
nodes in an XML document
- Declarative (vs. procedural)
- Contextual: the results depend on current node
- Supports standard comparison, Boolean and
mathematical operators (=, <, and, or, *, +, etc.)
39The XML Alphabet Soup XPath Operators
40The XML Alphabet Soup XPath Query Examples
./author (find all author elements within
current context)
/bookstore (find the bookstore element at the root)
/* (find the root element)
//author (find all author elements anywhere in document)
/bookstore[@specialty = "textbooks"] (find all bookstores where the
specialty attribute = "textbooks")
/book[@style = /bookstore/@specialty] (find all books where the style
attribute = the specialty attribute of the bookstore element at the root)
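Several of these queries can be tried with Python's `xml.etree.ElementTree`, which implements only a subset of XPath: relative paths, `//`, and simple attribute predicates. Absolute paths and path-to-path comparisons are outside that subset, so the last query is done in two steps. The sample document, including the `magazine` element, is made up for the example.

```python
import xml.etree.ElementTree as ET

STORE = """<bookstore specialty="novel">
  <book style="autobiography"><author>Benjamin Franklin</author></book>
  <book style="novel"><author>Herman Melville</author></book>
  <magazine><author>Joe Developer</author></magazine>
</bookstore>"""

root = ET.fromstring(STORE)

# ./author : depends on the context node (here the first book).
first_book = root.find("book")
context_authors = [a.text for a in first_book.findall("./author")]

# //author : every author anywhere below the root.
all_authors = [a.text for a in root.findall(".//author")]

# book[@style = "novel"] : attribute predicate.
novels = root.findall("./book[@style='novel']")

# book[@style = /bookstore/@specialty] : read the root attribute first,
# then filter, because the subset cannot compare two paths.
specialty = root.get("specialty")
matching = root.findall(f"./book[@style='{specialty}']")

print(context_authors, all_authors, len(novels), len(matching))
```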
41More XPath Examples
42XPath Functions
- Accessor functions
- node-name, data, base-uri, document-uri
- Numeric value functions
- abs, ceiling, floor, round, ...
- String functions
- compare, concat, substring, string-length,
upper-case, lower-case, starts-with, ends-with,
matches, replace, ...
- Other functions include functions on boolean
values, dates, nodes, etc.
43The XML Alphabet Soup Data Islands
- XML embedded in an HTML document
- Manipulated via client side script or data binding
<XML id="XMLID">
  <BOOK>
    <TITLE>All About XML</TITLE>
    <AUTHOR>Joe Developer</AUTHOR>
  </BOOK>
</XML>
<XML id="XMLID" src="mydocument.xml">
44The XML Alphabet Soup Data Islands
- Can be embedded in an HTML SCRIPT element
- XML is accessible via the DOM
<SCRIPT language="xml" id="XMLID">
<SCRIPT type="text/xml" id="XMLID">
<SCRIPT language="xml" id="XMLID" src="mydocument.xml">
45The XML Alphabet Soup XML-Based Applications
- Microsoft SQL Server
- Retrieve relational data as XML
- Query XML data
- Join XML data with existing database tables
- Update the database via XML Updategrams
- New XML data type in SQL 2005
- Microsoft Exchange Server
- XML is native representation of many types of
data
- Used to enhance performance of UI scenarios (for
example, Outlook Web Access (OWA))
46Agenda
- Overview
- Syntax and Structure
- The XML Alphabet Soup
- XML as a meta-language
47XML as a Meta-Language
A Language to create Languages
CSS
SAX
DOM
DSSL
XSL
XML/DTD
XLL
XSLT
GO
XSchema
CML
XPath
MathML
WML
XPointer
BeanML
XQL
48Gene Ontology (GO)
- Describing and manipulating information about the
molecular function, biological process and
cellular component of gene products.
- Gene Ontology website
- http://www.geneontology.org
- GO DTD
- ftp://ftp.geneontology.org/pub/go/xml/dtd/go.dtd
- GO Browsers and tools
- http://www.geneontology.org/tools
- GO Resources and samples
- http://www.geneontology.org/annotations
49Math ML
- Describing and manipulating mathematical
notations
- MathML website
- www.w3.org/Math
- MathML DTD
- www.w3.org/Math/DTD
- MathML Browser
- www.w3.org/Amaya
- MathML Resources
- www.webeq.com/mathml (see sample documents here)
50Chemical ML
- Representing molecular and chemical information
- CML website
- www.xml-cml.org
- CML DTD
- www.xml-cml.org/dtdschema/index.html
- CML Browser and Authoring Environment
- www.xml-cml.org/jumbo.html
- CML Resources
- www.xml-cml.org/chimeral/index.html
- see sample documents here
- some require plug-in downloads, can be slow
51Wireless ML
- Allows web pages to be displayed on mobile
devices
- WML works with WAP to deliver the content
- Underlying model: a deck of cards that the user can
sift through
- WAP/WML website
- www.wapforum.org
- WML DTD
- www.wapforum.org/DTD/wml_1.1.xml
- WAP/WML Resources
- www.oasis-open.org/cover/wap-wml.html
- www.w3scripts.com/wap (tutorial on WML, also see
WAP Demo)
52Scalable Vector Graphics
- Describing vector graphics data for use over the
web
- Rendering is done on the browser
- Bandwidth demands lower, scaling easier
- SVG website
- www.w3.org/Graphics/SVG
- SVG Plug-Ins
- www.adobe.com/svg
- SVG Resources
- www.irt.org/articles/js176 (1999 article and good,
brief tutorial)
- planet.svg (an example from Deitel)
53Bean ML
- Describing software components such as Java Beans
- Defines how the components are interconnected and
can be used
- Bean ML Specs and Tools
- www.alphaworks.ibm.com/aw.nsf/techmain/bml
- Bean ML Resources
- www.oasis-open.org/cover/beanML.html
- With Bean ML
- You can mark-up beans using Bean ML
- And invoke different operations on Beans
- Includes BML Scripting Framework
54XBRL
- Extensible Business Reporting Language
- Capturing and representing financial and
accounting information
- Variety of situations
- e.g. publishing reports, extracting data for
analysis, regulatory forms etc.
- Initiated under the direction of AICPA
- XBRL website
- www.xbrl.org
- XBRL DTDs and Schemas
- http://www.xbrl.org/Core/2000-07-31/default.htm
- Demos and Tools
- http://www.xbrl.org/Demos/demos.htm
- http://www.xbrl.org/Tools.htm
55News ML
- Designed to be media-independent
- Initiated by International Press
Telecommunications Council
- Enables tracking of news stories over time
- NewsML website
- www.newsml.org
- NewsML DTD
- http://www.oasis-open.org/cover/newsML.html
- SportsML DTD (derived from NewsML DTD)
- http://xml.coverpages.org/sportsML.html
56cXML
- CommerceXML from Ariba plus 40 other companies
- cXML website
- www.cxml.org
- Primary Set of Tools/Implementations to support
cXML
- http://www.ariba.com/solutions/solutions_overview.cfm
- See also Whitepapers link explaining how these
can be used for
- E-procurement
- E-fulfillment
- And others ...
57xCBL
- xCBL from Microsoft, SAP, Sun
- xCBL website
- www.xcbl.org
- Marketed as XML component library for B2B
e-commerce
- Available Resources (see internal links)
- DTDs and Schemas
- XDK SOX Parser and an XSLT Engine
- Example Documents
58ebXML
- UN/CEFACT: the United Nations body whose mandate
covers worldwide policy and technical development
in the area of trade facilitation and electronic
business.
- www.uncefact.org
- ebXML website
- www.ebxml.org
- Current Endorsements
- http://www.ebxml.org/endorsements.htm
- Still needs buy-in from the larger IS/IT vendors
- Related Effort: RosettaNet
- http://www.rosettanet.org/rosettanet/Rooms/DisplayPages/LayoutInitial
- Business Processes for IT, Component and Chip
companies
59Conclusion
- Overview
- Syntax and Structure
- The XML Alphabet Soup
- XML as a meta-language
60Resources
- http://www.xml.com/
- http://www.w3.org/xml/
- http://www.w3schools.com/
- http://msdn.microsoft.com/xml/