Title: Fundamental XML for Developers
1Fundamental XML for Developers
- Dr. Timothy M. Chester
- Texas A&M University
2Timothy M. Chester is. . .
- Senior IT Manager, Texas A&M University
- Application Development, Systems Integration,
Developer Tools Training
- Lecturer, Texas A&M College of Business
- Courses on Business Programming Fundamentals
(VB.NET, C#), XML & Advanced Web Development
- Author
- Visual Studio Magazine, Dr. Dobb's Journal, IT
Professional
- Consultant
- President & Principal, eInternet Studios
- Contact Information
- E-mail: tim-chester_at_tamu.edu
- Web: http://tim-chester.tamu.edu
3Texas A&M University
4You Are. . .
- Software Developers
- New to XML, Object Oriented Development
- Require basics of XML course
- IT Managers
- Need familiarity with XML basics and terminology
- Interested in how XML can affect both software
development and legacy system integration
5This session . . .
- Assumes you know nothing about XML or XML-based
technologies
- Provides a basic introduction to XML-based
technologies
- Demonstrates some of the basics of working with
the DOM, XSLT, Schema, WSDL, and SOAP
6Agenda
- XML
- Document Object Model (DOM)
- XPATH
- XSLT
- Schema
- WSDL
- SOAP
- Questions
7Underlying Technologies: XML Is the Glue
- TCP/IP: connectivity (FTP, e-mail, Gopher);
connect the Web
- HTML: presentation (Web pages); browse the Web
- XML: connecting applications (Web services);
program the Web
8Evolution of Web
9Web Services Overview: Application Model
- YourCompany.com
- Application Business Logic Tier
- Data Access and Storage Tier
- Connected over the Internet, using XML, to
- End Users
- Partner Web Services
- Other Web Services
- Other Applications
10Introducing XML
- XML stands for Extensible Markup Language. A
markup language specifies the structure and
content of a document.
- Because it is extensible, XML can be used to
create a wide variety of document types.
11Introducing XML
- XML is a subset of the Standard Generalized
Markup Language (SGML), which was introduced in
the 1980s. SGML is very complex and can be
costly.
- These drawbacks led to the creation of Hypertext
Markup Language (HTML), a more easily used markup
language. XML can be seen as sitting between SGML
and HTML: easier to learn than SGML, but more
robust than HTML.
12The Limits of HTML
- HTML was designed for formatting text on a Web
page. It was not designed for dealing with the
content of a Web page. Additional features have
been added to HTML, but they do not solve data
description or cataloging issues in an HTML
document.
- Because HTML is not extensible, it cannot be
modified to meet specific needs. Browser
developers have added features making HTML more
robust, but this has resulted in a confusing mix
of different HTML standards.
13Introducing XML
- HTML cannot be applied consistently. Different
browsers require different standards making the
final document appear differently on one browser
compared with another.
14Introduction to XML Markup
- XML document (intro.xml)
- Marks up message as XML
- Commonly stored in text files
- Extension .xml
16Introduction to XML Markup (cont.)
- XML documents
- Must contain exactly one root element
- Attempting to create more than one root element
is erroneous
- Elements must be nested properly
- Incorrect: <x><y>hello</x></y>
- Correct: <x><y>hello</y></x>
- Must be well-formed
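The rules above can be checked mechanically. The following is a minimal sketch in Python, assuming the standard-library `xml.etree.ElementTree` parser; the helper name `is_well_formed` is made up for this illustration and is not part of the original slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts `text` as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # Raised for overlapping tags, multiple root elements, and similar errors
        return False


print(is_well_formed("<x><y>hello</y></x>"))   # properly nested
print(is_well_formed("<x><y>hello</x></y>"))   # tags overlap
print(is_well_formed("<x>one</x><x>two</x>"))  # two root elements
```

The parser reports only that the document is rejected; the text of the `ParseError` explains which rule was broken.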
17XML Parsers
- An XML processor (also called an XML parser)
evaluates the document to make sure it conforms
to all XML specifications for structure and
syntax.
- XML parsers are strict. It is this rigidity built
into XML that ensures XML code accepted by the
parser will work the same everywhere.
18XML Parsers
- Microsoft's parser is called MSXML and is built
directly into IE versions 5.0 and above.
- Netscape developed its own parser, called
Mozilla, which is built into Netscape version 6.0
and above.
19Parsers and Well-formed XML Documents (cont.)
- XML parsers support
- Document Object Model (DOM)
- Builds tree structure containing document data in
memory
- Simple API for XML (SAX)
- Generates events when tags, comments, etc. are
encountered
- (Events are notifications to the application)
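To make the contrast concrete, here is a small sketch that reads the same document both ways using Python's standard library (`xml.dom.minidom` for DOM, `xml.sax` for SAX). The sample document and the `EventLogger` class are assumptions made for illustration only.

```python
import xml.sax
from xml.dom import minidom

DOC = '<message from="Paul"><body>Hi, Tim!</body></message>'

# DOM: the whole document is turned into a tree held in memory.
dom = minidom.parseString(DOC)
dom_root_name = dom.documentElement.tagName


# SAX: the parser notifies the application as it meets each piece of markup.
class EventLogger(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def characters(self, content):
        # Text may arrive in more than one chunk; skip whitespace-only chunks
        if content.strip():
            self.events.append(("text", content))

    def endElement(self, name):
        self.events.append(("end", name))


logger = EventLogger()
xml.sax.parseString(DOC.encode("utf-8"), logger)

print(dom_root_name)
for event in logger.events:
    print(event)
```

DOM suits documents that must be navigated or modified after loading; SAX keeps nothing in memory unless the handler chooses to store it.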
20Parsing an XML Document with MSXML
- XML document
- Contains data
- Does not contain formatting information
- Load XML document into Internet Explorer 5.0
- Document is parsed by MSXML
- Places plus (+) or minus (-) signs next to
container elements
- Plus sign indicates that all child elements are
hidden
- Clicking plus sign expands container element
- Displays children
- Minus sign indicates that all child elements are
visible
- Clicking minus sign collapses container element
- Hides children
- Error generated if document is not well formed
21XML document shown in IE6.
22Character Set
- XML documents may contain
- Carriage returns
- Line feeds
- Unicode characters
- Enables computers to process characters for
several languages
23Characters vs. Markup
- XML must differentiate between
- Markup text
- Enclosed in angle brackets (< and >)
- e.g., child elements
- Character data
- Text between start tag and end tag
- Welcome to XML!
- Elements versus Attributes
24White Space, Entity References and Built-in
Entities
- Whitespace characters
- Spaces, tabs, line feeds and carriage returns
- Significant (preserved by application)
- Insignificant (not preserved by application)
- Normalization
- Whitespace collapsed into single whitespace
character
- Sometimes whitespace removed entirely
- <markup>This   is   character   data</markup>
- after normalization, becomes
- <markup>This is character data</markup>
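The normalization described above can be sketched in a few lines of Python. This is only an illustration of the idea, not what any particular XML parser does internally; the function name `normalize_whitespace` is hypothetical.

```python
def normalize_whitespace(text: str) -> str:
    """Collapse each run of spaces, tabs and line breaks into one space."""
    # str.split() with no arguments splits on any run of whitespace
    # and also drops leading and trailing whitespace.
    return " ".join(text.split())


raw = "This   is\n\tcharacter    data"
print(normalize_whitespace(raw))
```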
25White Space, Entity References and Built-in
Entities (cont.)
- XML-reserved characters
- Ampersand (&)
- Left-angle bracket (<)
- Right-angle bracket (>)
- Apostrophe (')
- Double quote (")
- Entity references
- Allow XML-reserved characters to be used in
character data
- Begin with ampersand (&) and end with semicolon
(;)
- Prevent the parser from misinterpreting character
data as markup
26White Space, Entity References and Built-in
Entities (cont.)
- Built-in entities
- Ampersand (&amp;)
- Left-angle bracket (&lt;)
- Right-angle bracket (&gt;)
- Apostrophe (&apos;)
- Quotation mark (&quot;)
- Mark up characters <>& in element message
- <message>&lt;&gt;&amp;</message>
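As a rough sketch of how the built-in entities are applied in practice, Python's standard `xml.sax.saxutils` module offers `escape` and `unescape`. They handle `&`, `<` and `>` by default; the `EXTRA` mapping for apostrophe and quotation mark is supplied here as an assumption of this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

# escape() covers &, < and > on its own; the quote characters are opt-in
EXTRA = {"'": "&apos;", '"': "&quot;"}

raw = "<>&"
escaped = escape(raw, EXTRA)
element = "<message>" + escaped + "</message>"

print(escaped)
print(element)
# A parser turns the entity references back into the original characters
print(ET.fromstring(element).text)
```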
28Introduction
- XML Document Object Model (DOM)
- Build tree structure in memory for XML documents
- DOM-based parsers parse these structures
- Exist in several languages (Java, C, C++, Python,
Perl, C#, VB.NET, VB, etc.)
29Introduction
- DOM tree
- Each node represents an element, attribute, etc.
- <?xml version = "1.0"?><message from = "Paul"
to = "Tim"> <body>Hi, Tim!</body></message>
- Node created for element message
- Element message has child node for body element
- Element body has child node for text "Hi, Tim!"
- Attributes from and to also have nodes in tree
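The tree described above can be inspected directly. This is a minimal sketch using Python's `xml.dom.minidom` (one DOM implementation among many, chosen here only for convenience); the document is written without whitespace between tags so that `firstChild` is the body element.

```python
from xml.dom import minidom

DOC = ('<?xml version="1.0"?>'
       '<message from="Paul" to="Tim"><body>Hi, Tim!</body></message>')

doc = minidom.parseString(DOC)
message = doc.documentElement   # node for element message
body = message.firstChild       # child node for element body
text = body.firstChild          # child node for the text

print(message.tagName, body.tagName, repr(text.data))
# Attributes are nodes too, reached through the element rather than as children
print(message.getAttributeNode("from").value)
print(message.getAttributeNode("to").value)
```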
30DOM Implementations
- DOM-based parsers
- Microsoft's MSXML
- Microsoft .NET System.Xml namespace
- Sun Microsystems' JAXP
31Creating Nodes
- Create XML document at run time
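A possible sketch of building a document at run time, again assuming Python's `xml.dom.minidom`. The element and attribute names are borrowed from the earlier message example for illustration.

```python
from xml.dom import minidom

doc = minidom.Document()

# Create the root element and attach it to the document
message = doc.createElement("message")
message.setAttribute("from", "Paul")
doc.appendChild(message)

# Create a child element holding a text node
body = doc.createElement("body")
body.appendChild(doc.createTextNode("Hi, Tim!"))
message.appendChild(body)

xml_text = message.toxml()
print(xml_text)
```

Nodes are always created through the owning document (`createElement`, `createTextNode`) and only become part of the tree once appended.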
32Traversing the DOM
- Use DOM to traverse XML document
- Output element nodes
- Output attribute nodes
- Output text nodes
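One way to traverse a DOM tree is a recursive walk that reports element, attribute and text nodes. The sketch below assumes Python's `xml.dom.minidom`; the function name `walk` and the output format are choices made for this example.

```python
from xml.dom import Node, minidom


def walk(node, depth=0, out=None):
    """Collect one line per element, attribute and non-blank text node."""
    out = [] if out is None else out
    indent = "  " * depth
    if node.nodeType == Node.ELEMENT_NODE:
        out.append(f"{indent}element {node.tagName}")
        attrs = node.attributes
        for i in range(attrs.length):
            attr = attrs.item(i)
            out.append(f"{indent}  attribute {attr.name}={attr.value}")
    elif node.nodeType == Node.TEXT_NODE and node.data.strip():
        out.append(f"{indent}text {node.data.strip()}")
    for child in node.childNodes:
        walk(child, depth + 1, out)
    return out


doc = minidom.parseString(
    '<message from="Paul" to="Tim"><body>Hi, Tim!</body></message>')
lines = walk(doc.documentElement)
print("\n".join(lines))
```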
33DOM Components
35Introduction
- XML Path Language (XPath)
- Syntax for locating information in XML document
- e.g., attribute values
- String-based language of expressions
- Not structural language like XML
- Used by other XML technologies
- XSLT
36Nodes
- XML document
- Tree structure with nodes
- Each node represents part of XML document
- Seven types
- Root
- Element
- Attribute
- Text
- Comment
- Processing instruction
- Namespace
- Attributes and namespaces are not children of
their parent node
- They describe their parent node
37XPath node types
38XPath node types. (Part 2)
39Location Paths
- Location path
- Expression specifying how to navigate XPath tree
- Composed of location steps
- Each location step composed of
- Axis
- Node test
- Predicate
40Axes
- XPath searches are made relative to context node
- Axis
- Indicates which nodes are included in search
- Relative to context node
- Dictates node ordering in set
- Forward axes select nodes that follow context
node
- Reverse axes select nodes that precede context
node
41Node Tests
- Node tests
- Refine set of nodes selected by axis
- Rely upon the axis's principal node type
- Corresponds to type of node axis can select
42Node-set Operators and Functions (cont.)
- Location-path expressions
- Combine node-set operators and functions
- Select all head and body children element nodes
- head | body
- Select last title element node in head element
node
- head/title[ last() ]
- Select third book element
- book[ position() = 3 ]
- Or alternatively
- book[ 3 ]
- Return total number of element-node children
- count( * )
- Select all book element nodes in document
- //book
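Several of these expressions can be tried with Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath. In this sketch the positional and descendant forms use the library's own path syntax, while `count( * )` and the `|` union are approximated in plain Python because the subset lacks them. The sample `library` document is invented for the example.

```python
import xml.etree.ElementTree as ET

DOC = """
<library>
  <book><title>A</title></book>
  <book><title>B</title></book>
  <book><title>C</title></book>
  <shelf><book><title>D</title></book></shelf>
</library>
"""
root = ET.fromstring(DOC)

third = root.find("book[3]")          # book[ 3 ]
last = root.find("book[last()]")      # book[ last() ]
all_books = root.findall(".//book")   # //book, relative to the root element

# Not available in ElementTree's XPath subset, so emulated:
child_count = len(list(root))                         # count( * )
union = root.findall("book") + root.findall("shelf")  # book | shelf (rough stand-in)

print(third.findtext("title"), last.findtext("title"))
print(len(all_books), child_count, len(union))
```

A full XPath engine would return the union in document order; the list concatenation above does not guarantee that in general.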
44Introduction
- Extensible Stylesheet Language (XSL)
- Used to format XML documents
- Consists of two parts
- XSL Transformation Language (XSLT)
- Transform XML document from one form to another
- Use XPath to match nodes
- XSL formatting objects
- Alternative to CSS
45Setup
- XSLT processor
- Microsoft Internet Explorer 6
- Java 2 Standard Edition
- Microsoft.NET System.Xml Namespace
46Templates
- XSLT document
- XML document with root element stylesheet
- template element
- Matches specific XML document nodes
- Uses XPath expression in attribute match
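Because an XSLT document is itself XML, it can be loaded and inspected like any other XML document. The sketch below does only that, using Python's standard library; it does not run a transformation, which would need an XSLT processor. The stylesheet content is an assumed example, not the one from the original presentation.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"  # XSLT 1.0 namespace

STYLESHEET = """<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html><body><xsl:apply-templates/></body></html>
  </xsl:template>
  <xsl:template match="message">
    <h1><xsl:value-of select="."/></h1>
  </xsl:template>
</xsl:stylesheet>
"""

root = ET.fromstring(STYLESHEET)

# The root element is stylesheet; each template carries an XPath match pattern
root_name = root.tag
patterns = [t.get("match") for t in root.findall(f"{{{XSL_NS}}}template")]

print(root_name)
print(patterns)
```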
47Templates (cont.)
- XSLT
- Two trees of nodes
- Source tree corresponds to original XML document
- Result tree contains nodes produced by
transformation
- Transforms intro.xml into HTML document
48Iteration and Sorting
- XSLT allows
- Iteration through node set
- Element for-each
- Sorting node set
- Element sort
- Attribute order set to ascending (i.e., A-Z)
- Attribute order set to descending (i.e., Z-A)
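Without an XSLT processor at hand, the effect of for-each combined with sort can be mimicked in plain Python. The sketch below is an analogy only, not XSLT itself; the comment shows the stylesheet fragment it stands in for, and the `books` document and the function name `for_each_sorted` are assumptions of this example.

```python
import xml.etree.ElementTree as ET

# Roughly what this XSLT fragment would produce, one value per book:
#   <xsl:for-each select="book">
#     <xsl:sort select="." order="ascending"/>
#     <xsl:value-of select="."/>
#   </xsl:for-each>

DOC = "<books><book>XSLT</book><book>DOM</book><book>SOAP</book></books>"
root = ET.fromstring(DOC)


def for_each_sorted(parent, tag, descending=False):
    """Iterate over the child elements named `tag`, sorted by their text."""
    nodes = parent.findall(tag)
    return sorted((n.text for n in nodes), reverse=descending)


print(for_each_sorted(root, "book"))
print(for_each_sorted(root, "book", descending=True))
```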
49Conditional Processing
- Perform conditional processing
- Such as if statement
- Use element choose
- Allows alternate conditional statements
- Similar to switch statement
- Has child elements when and otherwise
- when element content used if its condition is met
- otherwise element content used if none of the
when conditions are met
50XSLT and XPath
- XPath Expression
- locates elements, attributes and text in XML
document
52Working with Namespaces
- Name collision occurs when elements from two or
more documents share the same name.
- Name collision isn't a problem if you are not
concerned with validation. The document content
only needs to be well-formed.
- However, name collision will keep a document from
being validated.
53Name Collision
- This figure shows two documents each with a Name
element
54Using Namespaces to Avoid Name Collision
This figure shows how to use a namespace to avoid
collision
55Declaring a Namespace
- A namespace is a defined collection of element
and attribute names.
- Names that belong to the same namespace must be
unique. Elements can share the same name if they
reside in different namespaces.
- Namespaces must be declared before they can be
used.
56Declaring a Namespace
- A namespace can be declared in the prolog or as
an element attribute. The syntax to declare a
namespace in the prolog is
- <?xml:namespace ns="URI" prefix="prefix"?>
- Where URI is a Uniform Resource Identifier that
assigns a unique name to the namespace, and
prefix is a string of letters that associates
each element or attribute in the document with
the declared namespace.
57Declaring a Namespace
- For example,
- <?xml:namespace ns="http://uhosp/patients/ns"
prefix="pat"?>
- Declares a namespace with the prefix pat and
the URI http://uhosp/patients/ns.
- The URI is not necessarily a Web address. A URI
identifies a physical or an abstract resource.
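The element-attribute form of declaration (`xmlns:prefix="URI"`) is the one most parsers expect. The sketch below, using Python's `xml.etree.ElementTree`, shows two Name elements coexisting because they live in different namespaces. The second URI (`http://uhosp/doctors/ns`), the `doc` prefix and the sample names are hypothetical and were invented for this example.

```python
import xml.etree.ElementTree as ET

DOC = """
<report xmlns:pat="http://uhosp/patients/ns"
        xmlns:doc="http://uhosp/doctors/ns">
  <pat:Name>Patient A</pat:Name>
  <doc:Name>Doctor B</doc:Name>
</report>
"""

# Prefixes used in the lookups below; they need not match those in the document
NS = {
    "pat": "http://uhosp/patients/ns",
    "doc": "http://uhosp/doctors/ns",  # hypothetical URI
}

root = ET.fromstring(DOC)
patient = root.find("pat:Name", NS)
doctor = root.find("doc:Name", NS)

# The parser stores the URI, not the prefix, as part of each element name
print(patient.tag, patient.text)
print(doctor.tag, doctor.text)
```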
60Schemas
- A schema is an XML document that defines the
content and structure of one or more XML
documents.
- To avoid confusion, the XML document containing
the content is called the instance document.
- It represents a specific instance of the
structure defined in the schema.
61Comparing Schemas and DTDs
- This figure compares schemas and DTDs
62Schema Dialects
- There is no single schema form.
- Several schema dialects have been developed in
the XML language.
- Support for a particular schema depends on the
XML parser being used for validation.
63Starting a Schema File
- A schema is always placed in a separate XML
document that is referenced by the instance
document.
64Schema Types
- XML Schema recognizes two categories of element
types: complex and simple.
- A complex type element has one or more
attributes, or is the parent to one or more child
elements.
- A simple type element contains only character
data and has no attributes.
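The distinction can be seen in a small schema. The sketch below loads an assumed W3C XML Schema fragment with Python's standard library and labels each declared element as complex or simple. It does not validate anything, and it only recognizes complex types written inline, so an element whose `type` attribute points at a named complex type would be mislabeled. The `patient` schema and the function name `classify` are invented for this example.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="patient">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="age" type="xs:integer"/>
      </xs:sequence>
      <xs:attribute name="id" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def classify(schema_root):
    """Map each declared element name to 'complex' or 'simple'."""
    kinds = {}
    for el in schema_root.iter(f"{{{XS}}}element"):
        # Inline complexType child means child elements and/or attributes
        is_complex = el.find(f"{{{XS}}}complexType") is not None
        kinds[el.get("name")] = "complex" if is_complex else "simple"
    return kinds


kinds = classify(ET.fromstring(SCHEMA))
print(kinds)
```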
65Schema Types
- This figure shows types of elements
66Understanding Data Types
- XML Schema supports two data types: built-in and
user-derived.
- A built-in data type is part of the XML Schema
specifications and is available to all XML Schema
authors.
- A user-derived data type is created by the XML
Schema author for specific data values in the
instance document.
67Understanding Data Types
- A primitive data type, also called a base type,
is one of 19 fundamental data types not defined
in terms of other types.
- A derived data type is one of 25 data types that
the XML Schema developers created based on the 19
primitive types.
69WSDL
- Think "TypeLib for SOAP"
- WSDL: Web Services Description Language
- Uniform representation for services
- Transport Protocol neutral
- Access Protocol neutral (not only SOAP)
- Describes
- Schema for Data Types
- Call Signatures (Message)
- Interfaces (Port Types)
- Endpoint Mappings (Bindings)
- Endpoints (Services)
70UDDI
- Think "Yahoo!" for Web Services
- Universal Description, Discovery and Integration
- WebService-Programmable "Yellow Pages"
- Advertise Sites and Services
- May point to DISCO resources
- Initiative driven by Microsoft, IBM, Ariba
72SOAP Overview
- A lightweight protocol for exchanging information
in a distributed, heterogeneous environment
- It enables cross-platform interoperability
- Interoperable
- OS, object model, programming language neutral
- Hardware independent
- Protocol independent
- Works over existing Internet infrastructure
73SOAP Overview
- Guiding principle: invent no new technology
- Builds on key Internet standards
- SOAP = HTTP + XML
- Submitted to W3C
- The SOAP specification defines
- The SOAP message format
- How to send messages
- How to receive responses
- Data encoding
74SOAP: SOAP Is Not
- Objects-by-reference
- Distributed garbage collection
- Bi-directional HTTP
- Activation
- Complicated
- Doesn't try to solve every problem in distributed
computing
- Can be easily implemented
75SOAP: The HTTP Aspect
- SOAP requests are HTTP POST requests
POST /WebCalculator/Calculator.asmx HTTP/1.1
Content-Type: text/xml
SOAPAction: "http://tempuri.org/Add"
Content-Length: 386

<?xml version="1.0"?>
<soap:Envelope ...>
  ...
</soap:Envelope>
76SOAP: Message Structure
- SOAP Message: the complete SOAP message
- Headers: protocol binding headers
- SOAP Envelope: <Envelope> encloses payload
- SOAP Header: <Header> encloses headers
- Headers: individual headers
- SOAP Body: <Body> contains SOAP message name
- Message Name & Data: XML-encoded SOAP message
name & data
77SOAP: SOAP Message Format
- An XML document using the SOAP schema
<?xml version="1.0"?>
<soap:Envelope ...>
  <soap:Header ...>
    ...
  </soap:Header>
  <soap:Body>
    <Add xmlns="http://tempuri.org/">
      <n1>12</n1>
      <n2>10</n2>
    </Add>
  </soap:Body>
</soap:Envelope>
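A message of this shape can be assembled programmatically. The sketch below builds the Add request with Python's `xml.etree.ElementTree`, assuming the SOAP 1.1 envelope namespace (the slide elides the namespace declarations, so this is a guess at what they were). The function name `build_add_request` is hypothetical, and the optional Header element is omitted.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 (assumed)
APP_NS = "http://tempuri.org/"

# Ask the serializer to use the familiar "soap" prefix
ET.register_namespace("soap", SOAP_NS)


def build_add_request(n1, n2):
    """Return a SOAP envelope, as text, that calls Add with two numbers."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    add = ET.SubElement(body, f"{{{APP_NS}}}Add")
    ET.SubElement(add, f"{{{APP_NS}}}n1").text = str(n1)
    ET.SubElement(add, f"{{{APP_NS}}}n2").text = str(n2)
    return ET.tostring(envelope, encoding="unicode")


request_text = build_add_request(12, 10)
print(request_text)
```

The serializer picks its own prefix for the `http://tempuri.org/` namespace rather than using a default namespace; the two forms are equivalent to a namespace-aware parser.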
78SOAP: Server Responses
- Server replies with a result message
HTTP/1.1 200 OK
...
Content-Type: text/xml
Content-Length: 391

<?xml version="1.0"?>
<soap:Envelope ...>
  <soap:Body>
    <AddResult xmlns="http://tempuri.org/">
      <result>28.6</result>
    </AddResult>
  </soap:Body>
</soap:Envelope>
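On the client side, the result has to be pulled back out of the envelope. Here is a minimal sketch with Python's standard library, again assuming the SOAP 1.1 envelope namespace since the slide elides it. The function name `extract_result` and the `app` prefix are choices made for this example.

```python
import xml.etree.ElementTree as ET

RESPONSE = """<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <AddResult xmlns="http://tempuri.org/">
      <result>28.6</result>
    </AddResult>
  </soap:Body>
</soap:Envelope>
"""

NS = {
    "soap": "http://schemas.xmlsoap.org/soap/envelope/",  # SOAP 1.1 (assumed)
    "app": "http://tempuri.org/",
}


def extract_result(xml_text):
    """Return the numeric value of the result element in the SOAP body."""
    root = ET.fromstring(xml_text)
    node = root.find("soap:Body/app:AddResult/app:result", NS)
    if node is None:
        raise ValueError("no result element in SOAP body")
    return float(node.text)


print(extract_result(RESPONSE))
```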
79SOAPIndustry Support
- Microsoft
- Rogue Wave Software Inc.
- Scriptics Corp.
- Secret Labs AB
- UserLand Software Inc.
- Zveno Pty. Ltd.
- IBM
- Hewlett Packard
- Intel
- DevelopMentor Inc.
- Digital Creations
- IONA Technologies PLC
- Jetform
- ObjectSpace Inc.
- Rockwell Software Inc.
- SAP
- Compaq
81Questions
82Bibliography
- Harvey Deitel's XML: How To Program
- Prentice Hall XML Reference
- Microsoft Academic Resource Kit