Title: Module 1 Introduction and Motivation
1. Module 1: Introduction and Motivation
2. "If I invent another programming language, its
name will contain the letter X."
(N. Wirth, Software Pioniere Konferenz, Bonn 2001)
3. Google Indicator
4. A History of Language
5. 2 x (Descartes)
6. λx.2x (Church)
7. (LAMBDA (x) (* 2 x)) (McCarthy)
8. W3C
<?xml version="1.0"?>
<lambda-term>
  <varlist> <var>x</var> </varlist>
  <expression> <application>
    <expr><const>*</const></expr>
    <arg-list>
      <expr><const>2</const></expr>
      <expr><var>x</var></expr>
    </arg-list>
  </application> </expression>
</lambda-term>
9. What can the Web do for you?
- Download and show HTML documents
- Forms
- Pre-compiled point queries
- Updates in specific Web applications
- Everywhere, any time, platform independent
- Simple keyword search (Google)
- Good for human-human, human-machine communication
10. What can the Web not do?
- Applications do not understand HTML
- Machine-Machine communication difficult
- Distributed Updates
- Long transactions (business processes)
- Powerful Queries
  - "Where can I find a used car for CHF 1000?"
- Scalability to millions of machines
11. Design Principles of W3C
- Everybody is autonomous
- Everybody can participate (open)
- All Standards are compatible
- All Standards are downwards compatible
- Platform and vendor independence
12. A little bit of history
- Database world
  - 1970: relational databases
  - 1990: nested relational model and object-oriented databases
  - 1995: semi-structured databases
- Documents world
  - 1974: SGML (Standard Generalized Markup Language)
  - 1990: HTML (Hypertext Markup Language)
  - 1992: URL (Uniform Resource Locator)
- Data + documents = information
  - 1996: XML (Extensible Markup Language)
  - URI (Uniform Resource Identifier)
13. What is XML?
- The Extensible Markup Language (XML) is the
universal format for structured documents and
data on the Web.
- Base specifications
  - XML 1.0, W3C Recommendation, Feb '98
  - Namespaces, W3C Recommendation, Jan '99
14. XML Data Example
<book year="1967">
  <title>The politics of experience</title>
  <author>
    <firstname>Ronald</firstname>
    <lastname>Laing</lastname>
  </author>
</book>
Elements
- Syntax, no abstract model
- Documents, elements and attributes
- Tree-based, nested, hierarchically organized
structure
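The tree structure of the book example can be made visible with a few lines of code. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the helper name `walk` is illustrative and not part of the slides.

```python
import xml.etree.ElementTree as ET

# The book document from the slide, as a string.
BOOK_XML = """
<book year="1967">
  <title>The politics of experience</title>
  <author>
    <firstname>Ronald</firstname>
    <lastname>Laing</lastname>
  </author>
</book>
"""

def walk(element, depth=0):
    """Yield (depth, tag, attributes, text) for every element, in document order."""
    text = (element.text or "").strip()
    yield depth, element.tag, dict(element.attrib), text
    for child in element:
        yield from walk(child, depth + 1)

root = ET.fromstring(BOOK_XML)
nodes = list(walk(root))
for depth, tag, attrs, text in nodes:
    # Indentation mirrors the nesting of elements in the document.
    print("  " * depth + tag, attrs or "", text)
```

The attribute `year` travels with the `book` element, while `firstname` and `lastname` sit two levels below the root, which is the nested, hierarchical organization the slide describes.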
15. XML vs. relational data
- Relational data
  - Invented as a mathematically clean abstract data model
  - Philosophy: schema first, then data
  - Never had a standard syntax for data
  - Strict rules for data normalization, flat tables
  - Order is irrelevant; textual data supported but not a primary goal
- XML
  - First killer application: publishing industry
  - Invented as a syntax for data, only later an abstract data model
  - Philosophy: data and schemas should not be correlated; data can exist with or without a schema, or with multiple schemas
  - No data normalization; flexibility is a must, nesting is good
  - Order may be very important; textual data support is a primary goal
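To illustrate the contrast between nesting and flat tables, the sketch below flattens a small nested document into two relational-style row lists. The document content, the table layouts and the `position` column are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: a book with two authors, where author order matters.
DOC = """
<book year="1967">
  <title>Example title</title>
  <author>First Author</author>
  <author>Second Author</author>
</book>
"""

root = ET.fromstring(DOC)

# Flat "book" table: one row per book (book_id, title, year).
book_rows = [(1, root.findtext("title"), int(root.get("year")))]

# Flat "author" table: the nesting becomes a foreign key (book_id), and
# document order has to be stored in an explicit position column.
author_rows = [
    (1, position, author.text)
    for position, author in enumerate(root.findall("author"), start=1)
]
print(book_rows)
print(author_rows)
```

In the XML form the order of the authors is implicit in the document; in the flat form it survives only because the sketch adds a column for it.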
16. Reasons for the XML success
- XML is a general data representation format
- XML is human readable
- XML is machine readable
- XML is internationalized (UNICODE)
- XML is platform independent
- XML is vendor independent
- XML is endorsed by the World Wide Web Consortium
- XML is not a new technology
- XML is not only a data representation format,
it's a full infrastructure of technologies
17. XML as a family of technologies
- XML Information Set
- XML Schema
- XML Query
- The Extensible Stylesheet Language Transformations (XSLT)
- XLink, XPointer
- XML Forms
- XML Protocol
- XML Encryption
- XML Signature
- Others
- Almost all the pieces needed for a good
XML-based information hub
18. Overview of XML Technologies
- W3C Standards
  - Data: XML, Namespaces, Infoset, Schema
  - Communication: SOAP, Encryption, WSDL, UDDI
  - Processing: XPath, XSLT, XQuery, XUpdate, XQuery Text
  - Integration: RDF, OWL
- Other Standards
  - Vertical domains: RosettaNet, ebXML, ml
  - Workflow: BPEL
  - Interfaces: DOM, SAX, JAXP, SQL/XML
19. Killer Applications for XML
- Data lives forever (longer than program code)
- Legacy systems need to keep code to keep data
- Huge IT infrastructures
- "Hello world" program is very complex
- Model before data (you need to know what you want)
- Poor time to market, high cost
- SQL and objects are not enough
- Middleware, data marshalling, ...
- No querying of objects, no encapsulation in SQL
- Expensive (five-star guru) programmers needed
- XML: decouple data and schema!
20. Killer XML advantages
- Code/schema/data independence
- Covers the continuous spectrum from totally
structured data to documents
- From data management to information management
- Unique model for representing data, metadata and
code
21. Data, metadata, code
- Data (XML), schemas (XML Schemas) and code (XSLT,
XQuery): they all have an XML syntax
- Easy to mix and match
  - Data in the schemas (not yet)
  - Data in code (already done)
  - Code in schemas (not yet)
  - Code in the data (not yet): dynamic data
22. Misunderstanding about XML
- "Data is self-describing."
- Tags don't hold semantics; they only hold the
structure of the information
- The interpretation of the tags is in the
application that handles the data, not in the
tags themselves.
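The point that meaning lives in the application can be shown with two documents that share a tag name. The sketch below is purely illustrative: the element names, the sample values and both functions are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents: the same tag name, two unrelated meanings.
book = ET.fromstring("<item><title>The politics of experience</title></item>")
person = ET.fromstring("<item><title>Dr.</title></item>")

def as_book_title(item):
    """A library application reads <title> as the name of a work."""
    return "Work: " + item.findtext("title")

def as_honorific(item):
    """An address-book application reads <title> as a form of address."""
    return "Honorific: " + item.findtext("title")

# The parser sees identical structure; only the application assigns meaning.
same_structure = [e.tag for e in book.iter()] == [e.tag for e in person.iter()]
print(same_structure, as_book_title(book), as_honorific(person))
```

Nothing in either document tells the parser which reading is intended; swapping the two functions would run without error and produce nonsense.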
23. XML handicaps
- Tree, and not a graph.
  - Many limitations derive from here, and many complications in the XML processing languages.
  - Difficulty in modeling N:M relationships
  - The notion of reference (e.g. XLink, XPointer) not well integrated in the XML stack
- Duplication of concepts
  - Many ways to do the same thing
  - Justification for a simpler data model like RDF
- Concepts that seem logically unnecessary
  - PIs, comments, documents, etc.
- Additional complexity factors
  - xsi:nil, QName in content, etc.
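The difficulty with N:M relationships can be sketched as follows. Because the document is a tree, a shared author cannot be nested under two books at once, so the data falls back on identifiers that the application resolves itself. The element names, the `id`/`ref` attributes and the sample data are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: two books share authors (an N:M relationship).
DOC = """
<library>
  <author id="a1">Author One</author>
  <author id="a2">Author Two</author>
  <book title="Book A"><by ref="a1"/><by ref="a2"/></book>
  <book title="Book B"><by ref="a1"/></book>
</library>
"""

root = ET.fromstring(DOC)

# The tree has no edge from <by> to <author>; the application must build
# its own index and follow the references by hand.
authors = {a.get("id"): a.text for a in root.findall("author")}
books_by_author = {}
for book in root.findall("book"):
    for by in book.findall("by"):
        name = authors[by.get("ref")]
        books_by_author.setdefault(name, []).append(book.get("title"))
print(books_by_author)
```

The lookup table `authors` is exactly the part that a graph data model would provide for free.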
24. Advantages and disadvantages
- 1. Handles the dual aspect of information:
lexical and binary ("1" and "01")
- Essential feature for the 21st century
information management
- E.g. XML-based contract to be used in a legal
procedure
- Lots of complexity derives from here
- XML Schema deals with both lexical and binary
constraints
- XML Data Model has to include both the
dm:typed-value and dm:string-value
- Processing languages like XQuery and XSLT have to
define their semantics for both aspects
- XML data storage and indexing heavily impacted
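The lexical/binary duality can be seen with the "1" and "01" example. The sketch below assumes a hypothetical `quantity` element whose content the application treats as an integer; in a real system a schema would declare that type.

```python
import xml.etree.ElementTree as ET

# Hypothetical elements: the same number in two lexical forms.
a = ET.fromstring("<quantity>1</quantity>")
b = ET.fromstring("<quantity>01</quantity>")

# String value: the characters exactly as written in the document.
string_equal = a.text == b.text

# Typed value: the result of interpreting the text as an integer
# (here the application assumes the type; a schema would normally declare it).
typed_equal = int(a.text) == int(b.text)

print(string_equal, typed_equal)  # prints: False True
```

A legal contract needs the first comparison (the exact characters that were signed), while a query such as "quantity = 1" needs the second, which is why storage and indexing have to keep both views.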
25. Advantages and disadvantages
- 2. Data is context sensitive.
- We cannot do "cut and paste" in XML
- Certain aspects of the data depend on the context
where the fragment of data occurs (base URIs,
namespaces, etc.)
- Valuable feature for document management
- Very hard consequences on storing, indexing and
processing XML
- Semantics of expressions also depends on the
context where they appear
- Additional consequences on expression evaluation
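Namespaces give a compact demonstration of why "cut and paste" fails. In the sketch below, the namespace URI `urn:example:parts` and the element names are hypothetical; the behaviour shown is that of Python's standard `xml.etree.ElementTree` parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and element names, for illustration only.
DOC = '<order xmlns:p="urn:example:parts"><p:item>bolt</p:item></order>'

root = ET.fromstring(DOC)
item = root[0]
# In context, the prefix p resolves to the namespace declared on <order>.
print(item.tag)  # prints: {urn:example:parts}item

# "Cut" the fragment as raw text and try to "paste" it on its own.
fragment = "<p:item>bolt</p:item>"
try:
    ET.fromstring(fragment)
    pasted_ok = True
except ET.ParseError:
    # The prefix p is unbound outside its original context.
    pasted_ok = False
print(pasted_ok)  # prints: False
```

The fragment is only meaningful together with the declaration on its ancestor, so any store or index that splits documents into fragments has to carry that context along.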
26. Sources of XML data?
- Inter-application communication data (WS, REST,
etc.)
- Mobile devices communication data
- Logs
- Blogs (RSS)
- Metadata (e.g. Schema, WSDL, XMP)
- Presentation data (e.g. XHTML)
- Documents (e.g. Word)
- Views of other sources of data
  - Relational, LDAP, CSV, Excel, etc.
- Sensor data
- It would be interesting to know the pie chart
and the evolution of each branch!
27. Some vertical application domains for XML
- HealthCare Level Seven http://www.hl7.org/
- Geography Markup Language (GML)
- Systems Biology Markup Language (SBML)
http://sbml.org/
- XBRL, the XML-based Business Reporting standard
http://www.xbrl.org/
- Global Justice XML Data Model (GJXDM)
http://it.ojp.gov/jxdm
- ebXML http://www.ebxml.org/
- e.g. Encoded Archival Description Application
http://lcweb.loc.gov/ead/
- Digital photography metadata XMP
- An XML grammar for sensor data (SensorML)
- Really Simple Syndication (RSS 2.0)
- Basically everywhere.
28. RosettaNet
- http://www.rosettanet.org
- Non-profit organisation
- Sponsors: IBM, Oracle, NEC, ...
- More than 400 additional members
- Goals
  - Dynamic, flexible trading networks
  - Operational efficiency (cost reduction)
  - New business opportunities
- Technical Goals
  - Common language, standard processes for sharing of electronic business information
29. RosettaNet
- PIPs: Partner Interface Processes
- 8 Clusters
  - Support
  - Partner Product and Service Review
  - Product Information
  - Order Management
  - Inventory Management
  - Marketing Information Management
  - Service and Support
  - Manufacturing
- Segments with PIP definitions in each cluster
30. 3 Order Management
- Segment 3a: Quote and Order Entry
- Segment 3b: Transportation and Distribution
- Segment 3c: Returns and Finance
- Segment 3d: Product Configuration
31. Quote and Order Entry
32. Example DTD
<!--
  RosettaNet XML Message Schema
  3A1_MS_R02_00_QuoteRequest.dtd (16-Apr-2001 12:46)
  This document has been prepared by Edifecs (http://www.edifecs.com/)
  based on the Business Collaboration Framework from requirements
  in conformance with the RosettaNet methodology.
-->
<!ENTITY % common-attributes "id CDATA #IMPLIED" >
<!ELEMENT Pip3A1QuoteRequest (
    fromRole ,
    GlobalDocumentFunctionCode ,
    Quote ,
    thisDocumentGenerationDateTime ,
    thisDocumentIdentifier ,
    toRole ) >
33. Example DTD (ctd.)
<!ELEMENT fromRole ( PartnerRoleDescription ) >
<!ELEMENT PartnerRoleDescription (
    ContactInformation? ,
    GlobalPartnerRoleClassificationCode ,
    PartnerDescription ) >
<!ELEMENT ContactInformation (
    contactName ,
    EmailAddress ,
    facsimileNumber? ,
    telephoneNumber ) >
<!ELEMENT contactName ( FreeFormText ) >
<!ELEMENT FreeFormText ( #PCDATA ) >
<!ATTLIST FreeFormText xml:lang CDATA #IMPLIED >
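A content model such as the one declared for ContactInformation is a sequence in which a trailing "?" marks an optional child. The sketch below checks that rule by hand for a flat sequence; it is not a DTD validator (Python's `xml.etree.ElementTree` does not validate against DTDs), and the helper names and instance data are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Content model copied from the ContactInformation declaration;
# a trailing "?" marks an optional child.
MODEL = ["contactName", "EmailAddress", "facsimileNumber?", "telephoneNumber"]

def model_to_regex(model):
    """Translate a flat sequence model into a regex over space-terminated tag names."""
    parts = []
    for name in model:
        optional = name.endswith("?")
        tag = re.escape(name.rstrip("?")) + " "
        parts.append(f"(?:{tag})?" if optional else f"(?:{tag})")
    return re.compile("".join(parts))

def matches(element, model):
    """Return True if the element's children match the sequence model."""
    children = "".join(child.tag + " " for child in element)
    return model_to_regex(model).fullmatch(children) is not None

# Hypothetical instance data: facsimileNumber omitted, which is allowed.
ok = ET.fromstring(
    "<ContactInformation><contactName/><EmailAddress/>"
    "<telephoneNumber/></ContactInformation>"
)
# Same children in the wrong order, which the sequence model rejects.
bad = ET.fromstring(
    "<ContactInformation><EmailAddress/><contactName/>"
    "<telephoneNumber/></ContactInformation>"
)
print(matches(ok, MODEL), matches(bad, MODEL))
```

Translating the model to a regular expression works here because DTD content models are regular expressions over element names; the sketch handles only the "," and "?" operators used in this declaration.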
34. Example DTD (ctd.)
<!ELEMENT Quote (
    comments? ,
    financedBy? ,
    GlobalGovernmentPriorityRatingCode? ,
    GlobalQuoteTypeCode ,
    governmentContractIdentifier? ,
    PriceCondition? ,
    QuoteCustomerInformation? ,
    QuoteLineItem ,
    quoteRequestIdentifier ,
    requestedResponseDate? ,
    RequoteReference? ,
    respondTo ,
    submittedDate? ,
    TaxExemptStatus? ,
    transportedBy? ) >
35. ebXML
- http://www.ebxml.org
- OASIS: Organization for the Advancement of
Structured Information Standards
- Non-profit, ... (like RosettaNet)
- Competition to RosettaNet
- ebXML Mission: To provide an open XML-based
infrastructure enabling the global use of
electronic business information in an
interoperable, secure and consistent manner for
all parties.
- Uses XML Schema (not DTDs)
36. ebXML Example (SOAP)
<SOAP:Envelope xmlns:SOAP="http://schemas.xmlsoap.org/..."
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://schemas.xmlsoap.org/...">
  <SOAP:Header xmlns:eb="http://www.oasis-open.org/..."
      xsi:schemaLocation="http://...">
    <eb:MessageHeader ...> ... </eb:MessageHeader>
  </SOAP:Header>
  <SOAP:Body xmlns:eb="http://www.oasis-open.org/..."
      xsi:schemaLocation="http://...">
    <eb:Manifest eb:version="2.0"> ... </eb:Manifest>
  </SOAP:Body>
</SOAP:Envelope>
37. ebXML Header Info (among others)
- Conversation ID
  <eb:ConversationID>2000-33-15-7</eb:ConversationID>
- Sender and Recipient
  <eb:From>
    <eb:PartyId eb:type="urn:duns">123</eb:PartyId>
    <eb:PartyId eb:type="SCAC">RDWY</eb:PartyId>
    <eb:Role>http://rosettanet.org/roles/Buyer</eb:Role>
  </eb:From>
  <eb:To>
    <eb:PartyId>mailto:joe@example.com</eb:PartyId>
    <eb:Role>http://rosettanet.org/roles/Seller</eb:Role>
  </eb:To>
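A receiving application would typically pull the conversation ID and the party identifiers out of such a header. The sketch below shows one way to do that with Python's standard `xml.etree.ElementTree`. It rests on several assumptions: the namespace URI `urn:example:eb` is a placeholder (the real URI is elided on the slide), and the enclosing `eb:MessageHeader` wrapper and the order of its children are arranged for illustration only.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI: the real ebXML URI is elided on the slide.
EB = "urn:example:eb"
NS = {"eb": EB}

# Header fields from the slide, wrapped in an assumed eb:MessageHeader element.
HEADER = f"""
<eb:MessageHeader xmlns:eb="{EB}">
  <eb:From>
    <eb:PartyId eb:type="urn:duns">123</eb:PartyId>
    <eb:PartyId eb:type="SCAC">RDWY</eb:PartyId>
    <eb:Role>http://rosettanet.org/roles/Buyer</eb:Role>
  </eb:From>
  <eb:To>
    <eb:PartyId>mailto:joe@example.com</eb:PartyId>
    <eb:Role>http://rosettanet.org/roles/Seller</eb:Role>
  </eb:To>
  <eb:ConversationID>2000-33-15-7</eb:ConversationID>
</eb:MessageHeader>
"""

root = ET.fromstring(HEADER)
conversation_id = root.findtext("eb:ConversationID", namespaces=NS)

# Attribute names are namespace-qualified too, so the key uses {uri}local form.
sender_ids = {
    party.get(f"{{{EB}}}type"): party.text
    for party in root.findall("eb:From/eb:PartyId", NS)
}
recipient = root.findtext("eb:To/eb:PartyId", namespaces=NS)
sender_role = root.findtext("eb:From/eb:Role", namespaces=NS)
print(conversation_id, sender_ids, recipient, sender_role)
```

The sender carries two identifiers of different types (a DUNS number and a SCAC code), so the sketch keys them by the `eb:type` attribute rather than taking only the first one.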