Title: Combined XML, SGML Issues
1 Combined XML, SGML Issues
- William J. "Bill" McCalpin
- MIT, LIT, CDIA, EDP
- AIIM 2002 - March 6, 2002
2 About MHE
- MHE is the print2image2Internet consulting firm
- MHE's principals have nearly 40 years of experience in electronic print streams, in taking electronic print streams to imaging systems, and now in taking legacy information to the Internet
- See http://www.mhe-consulting.com
3 About the Speaker
- William J. "Bill" McCalpin is a principal at MHE
- Mr. McCalpin was the first - and for years the only - person in the world to hold the MIT, LIT, CDIA, and EDP designations
- Mr. McCalpin serves on the AIIM Accreditation Committee and the AIIM Conference Committee
4 About the Speaker (cont.)
- Mr. McCalpin is on the Xplor Board of Directors and is its Treasurer
- Mr. McCalpin recently completed a two-year term as Xploration Editor-in-Chief
- Mr. McCalpin is a frequent speaker at both AIIM and Xplor
5 What Do You Say When They Ask You, "When Are You Going To Support XML?"
6 But The Real Question Is, "Why Should I Support XML?"
7 Agenda
- What is XML?
- What do we do in e-Business?
- When do you want to use XML?
- The Right Way and the Wrong Way to use XML
- The Flow of Information
- The XML Bubble
- The answer to when and why
8 What is XML?
9 XML And SGML
- XML is eXtensible Markup Language
- XML is a simplified subset of SGML (Standard Generalized Markup Language), an ISO standard (ISO 8879)
- XML is extensible because people and enterprises with common interests get together to define the tags which describe their data
10 XML and HTML
- HTML is a tagged language, but its tags are a fixed set of 40 or 50 grammatical tags like <p> or <h1>
- XML is a tagged language whose tags are (usually) created and agreed to by domains or vertical industry segments, e.g., <account_number> or <city>
11 The Document
- A document is an organized collection of information in time
- A document contains information which can be understood by human or machine, and has validity at some period in time
- The information in a document can be organized in many ways - as text, bitmaps, print streams, tagged languages, etc.
12 The New Document
- Per this definition, the document:
  - does not depend on which organization of the information is used (so long as author and recipient agree)
  - does not depend on the medium (paper, film, optical, magnetic, or even parchment are all fine)
  - does not have to have presentation information, because the recipient may be a machine
13 Three Parts of an XML Document
Tagged Data (in XML)
Tag Definitions (in DTD or Schema)
Presentation (in XSL or CSS)
14 The XML Document
- Data - data values bounded by XML tags
- Presentation
  - CSS - Cascading Style Sheets, as used for HTML
  - XSL - formatting information written in XML
- Tag Definitions
  - DTD - Document Type Definitions, the older SGML mechanism
  - Schema - definitions written in XML
15 Data In the XML Document
- Data is the purpose of an XML document
- Each piece of data is specifically identified by a tag
- Data is organized because the tags must match the patterns defined in the DTD or Schema
- An example of data in XML
16 Data Example in XML
    <AUTHOR>
      <NAME>William J. "Bill" McCalpin, EDPP, CDIA, MIT, LIT</NAME>
      <JOBTITLE>Principal</JOBTITLE>
      <AFFILIATION>MHE</AFFILIATION>
      <ADDRESS>
        <STREET>1400 Cheyenne Dr.</STREET>
        <CITY>Richardson</CITY>
        <STATE>Texas</STATE>
        <ZIPCODE>75080</ZIPCODE>
        <EMAIL>mccalpin_at_mhe-consulting.com</EMAIL>
      </ADDRESS>
    </AUTHOR>
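To see what "specifically identified by a tag" buys a receiving program, here is a minimal Python sketch using the standard-library xml.etree.ElementTree module. It reads fields by tag name and nesting path rather than by where they sit on a page; the helper read_author and the choice of fields are hypothetical.

```python
import xml.etree.ElementTree as ET

# The AUTHOR record from the slide (e-mail kept exactly as shown there).
AUTHOR_XML = """<AUTHOR>
  <NAME>William J. "Bill" McCalpin, EDPP, CDIA, MIT, LIT</NAME>
  <JOBTITLE>Principal</JOBTITLE>
  <AFFILIATION>MHE</AFFILIATION>
  <ADDRESS>
    <STREET>1400 Cheyenne Dr.</STREET>
    <CITY>Richardson</CITY>
    <STATE>Texas</STATE>
    <ZIPCODE>75080</ZIPCODE>
    <EMAIL>mccalpin_at_mhe-consulting.com</EMAIL>
  </ADDRESS>
</AUTHOR>"""

def read_author(xml_text):
    """Pick out fields by tag name and path, not by where they sit on a page."""
    root = ET.fromstring(xml_text)
    return {
        "name": root.findtext("NAME"),
        "title": root.findtext("JOBTITLE"),
        "city": root.findtext("ADDRESS/CITY"),
        "zip": root.findtext("ADDRESS/ZIPCODE"),
    }

record = read_author(AUTHOR_XML)
print(record["city"], record["zip"])  # Richardson 75080
```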
17 Presentation in XML
- Tags in XML don't have built-in formatting (unlike HTML), so if presentation is needed, it must be explicitly defined
- CSS can be used for both HTML and XML
- XSL is itself written in XML, so it can be parsed by an XML parser; it covers both formatting and transformation (XSLT)
- An XSL example
18 Presentation Example
    <?xml version="1.0"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
      <xsl:template match="author">
        <TABLE WIDTH="100%" BORDER="1" CELLSPACING="0...
          <TR>
            <TD COLSPAN="2">
              <TABLE WIDTH="100%" BORDER="1" CELLSPACING="0...
              <FONT COLOR="000000"><xsl:value-of select="name"/></FONT>
            </TD>
            ...
      </xsl:template>
    </xsl:stylesheet>
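Python's standard library has no XSLT processor, so the following is not XSL. It is a hypothetical stand-in (author_to_html is an illustrative name) showing the same division of labor as the template: the data stays in the tagged XML, and presentation is applied from outside it, one table row per tagged field.

```python
import xml.etree.ElementTree as ET
from html import escape

AUTHOR_XML = """<AUTHOR>
  <NAME>William J. "Bill" McCalpin</NAME>
  <JOBTITLE>Principal</JOBTITLE>
  <AFFILIATION>MHE</AFFILIATION>
</AUTHOR>"""

def author_to_html(xml_text):
    """Render each top-level field of an AUTHOR record as one HTML table row."""
    root = ET.fromstring(xml_text)
    rows = []
    for child in root:
        label = escape(child.tag.title())            # e.g. "NAME" -> "Name"
        value = escape((child.text or "").strip())   # escape &, <, > and quotes
        rows.append(f"<tr><td>{label}</td><td>{value}</td></tr>")
    return '<table width="100%" border="1" cellspacing="0">' + "".join(rows) + "</table>"

html_out = author_to_html(AUTHOR_XML)
print(html_out)
```

Swapping in a different function (or a different stylesheet) changes the look without touching the XML data.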
19 Why Two Style Sheet Languages?
20 DTD/Schema in XML
- The DTD is the old (SGML) way of defining not only which tags are valid, but also their relative order, number, mandatory/optional attributes, and so on
- The Schema is a total rewrite - written in XML itself - which defines all of the above as well as the possible legal values for a tag (e.g., integer, date, day of the week, etc.)
21 Schema Example
    <?xml version="1.0"?>
    <Schema name="sample_schema" ...>
      ...
      <!-- Element Types -->
      <!-- data -->
      <ElementType name="author">
        <element type="name" minOccurs="1" maxOccurs="1"/>
      </ElementType>
      ...
    </Schema>
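The schema syntax on these slides appears to be Microsoft's XML-Data Reduced (XDR) dialect, which predates the W3C XML Schema recommendation. As a rough sketch of what an occurrence constraint such as minOccurs="1" maxOccurs="1" means, the Python below counts child elements against hypothetical bounds. It is not a schema validator; a real schema processor also enforces element order, content models, and data types.

```python
import xml.etree.ElementTree as ET

# Hypothetical occurrence rules modeled on
# <element type="name" minOccurs="1" maxOccurs="1"/>.
# element name -> (minOccurs, maxOccurs); None stands for unbounded ("*").
AUTHOR_RULES = {"name": (1, 1), "email": (0, None)}

def check_occurrences(elem, rules):
    """Return a list of messages for child counts outside the allowed range."""
    problems = []
    for tag, (low, high) in rules.items():
        count = len(elem.findall(tag))  # direct children with this tag
        if count < low:
            problems.append(f"{tag}: expected at least {low}, found {count}")
        if high is not None and count > high:
            problems.append(f"{tag}: expected at most {high}, found {count}")
    return problems

good = ET.fromstring("<author><name>Bill</name></author>")
bad = ET.fromstring("<author><name>Bill</name><name>William</name></author>")
print(check_occurrences(good, AUTHOR_RULES))  # []
print(check_occurrences(bad, AUTHOR_RULES))   # ['name: expected at most 1, found 2']
```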
22 What do we do in e-Business?
23 What is e-Business?
- Of course, e-Business is really just doing business using 100% electronic methods, such as the Internet
- In e-Business, we do transactions or exchange information using electronic media rather than the usual paper media
- e-Business can be broken down into two parts:
  - B2C
  - B2B
24 B2C
- B2C is Business to Consumer
- Your business generates the information, and a consumer receives it
- The consumer is normally interested only in the data and its presentation
- Thus, in this scenario, the consumer needs only an XML document and CSS/XSL - which is more or less the same as HTML!
25 Important Fact 1
- When you are engaged in B2C, and the recipient is a consumer with a thin client, then HTML is usually sufficient
- Supplying the data in XML is usually a waste of time, because the recipient gets no additional value from the XML over HTML
- XHTML is just HTML which is XML compliant
26 B2B
- B2B is Business to Business
- Your business generates the information, and another business receives it
- Frequently, the recipient is not a person, but a software process in the business
- Thus, in this scenario, the recipient often needs only the XML data and a reference to the DTD or Schema - no presentation may be needed!
27 Important Fact 2
- When you are engaged in B2B, and the recipient is a software process, then XML is often the most appropriate format
- Binary data formats may be smaller, but will require more work and more maintenance
- Don't send presentation information unless the recipient actually wants your presentation information!
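To make the size-versus-maintenance trade-off concrete, this Python sketch encodes the same hypothetical invoice record (account number and amount in cents) both as XML and as a fixed binary layout built with the standard struct module. The binary form is much smaller, but only a receiver that already knows the exact field order and widths can read it.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical invoice record: account number and amount due in cents.
account_no, amount_cents = 12345, 9950

# XML: self-describing; every value travels with its tag.
xml_bytes = (f"<invoice><account_no>{account_no}</account_no>"
             f"<amount_cents>{amount_cents}</amount_cents></invoice>").encode("utf-8")

# Binary: two big-endian unsigned 32-bit integers. The bytes carry no names,
# so sender and receiver must both hard-code this exact layout.
LAYOUT = ">II"
binary = struct.pack(LAYOUT, account_no, amount_cents)

print(len(xml_bytes), "bytes as XML vs", len(binary), "bytes as binary")

# Receiver side: the XML reader looks values up by name...
xml_account = int(ET.fromstring(xml_bytes).findtext("account_no"))
# ...while the binary reader relies entirely on field order and width.
bin_account, bin_amount = struct.unpack(LAYOUT, binary)
```

If a field is later added, the LAYOUT string must change on both sides in lockstep, while the XML reader that looks up account_no by name keeps working unchanged.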
28 When do you want to use XML?
29 When Do I Use XML?
- As we have seen, XML is best suited for preserving the author's content
- And (X)HTML is best suited for presenting information to an end user
- And this leads us to...
30 Important Fact 3
- In today's market:
  - XML is better utilized when communicating with a thick client - that is, most B2B, in which a software process is the recipient
  - (X)HTML is better utilized when communicating with a thin client - that is, most B2C, in which an Internet browser is the recipient
- And when is this not true?
31 Exceptions to Fact 3
- XML can be used in B2C when the browser is used with so much Java and other local applications that the overall process resembles a thick client
- (X)HTML can be used in B2B if the recipient is just a human being rather than a software process, e.g., when information is transmitted only to be viewed
32 The Right Way And The Wrong Way To Use XML
33 CML: Chemical Markup Language
- One of the early vertical implementations of XML
- The official site is http://www.xml-cml.org/
- A better site is http://www.ch.ic.ac.uk/chimeral/
- CML uses the trio of tagged data, Schema, and XSL
34 A CML XML Document
    <molecule title="caffeine" id="mol_caffeine">
      <formula>C8 H10 N4 O2</formula>
      <string title="CAS">58-08-2</string>
      ...
    </molecule>
35 The CML Schema
    <?xml version="1.0"?>
    <Schema name="cml_dev_karne" xmlns="urn:schemas-microsoft-com:xml-data" xmlns:dt="urn:schemas-microsoft-com:datatypes">
      ...
      <!-- Element Types -->
      <!-- data -->
      <ElementType name="molecule" content="eltOnly" model="open" order="many">
        <element type="formula" minOccurs="0" maxOccurs="*"/>
        ...
36 A CML Stylesheet
    <xsl:template match="molecule">
      <TABLE WIDTH="100%" BORDER="1" CELLSPACING="0" CELLPADDING="3" BORDERCOLOR="CCCCFF" BGCOLOR="EEEEFF">
        <TR>
          <TD COLSPAN="2">
            <FONT COLOR="0000AA">Formula
            <FONT COLOR="000000"><xsl:value-of select="formula"/></FONT></TD><TD>
            ...
37 The CML Document
- Note that each data item is tagged
- Note that each tag matches the standard Schema
- Note that the data is used to create a complex
image in the browser - but not the only possible
image!
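The point that the browser image is "not the only possible image" can be illustrated by putting the same tagged CML data to a different use. This Python sketch reads the caffeine molecule from the slide and computes its molecular weight instead of drawing it. The formula parser handles only the simple space-separated notation shown here, the helper names are hypothetical, and the atomic weights are standard values rounded to three decimals.

```python
import xml.etree.ElementTree as ET

# The caffeine molecule from the CML slide (elided children omitted).
CML = """<molecule title="caffeine" id="mol_caffeine">
  <formula>C8 H10 N4 O2</formula>
  <string title="CAS">58-08-2</string>
</molecule>"""

# Standard atomic weights in g/mol, rounded to three decimals.
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

def parse_formula(text):
    """Turn space-separated 'C8 H10 N4 O2' into {'C': 8, 'H': 10, 'N': 4, 'O': 2}."""
    counts = {}
    for token in text.split():
        symbol = token.rstrip("0123456789")   # element symbol, e.g. "H"
        digits = token[len(symbol):]          # its count, e.g. "10" (absent means 1)
        counts[symbol] = counts.get(symbol, 0) + (int(digits) if digits else 1)
    return counts

mol = ET.fromstring(CML)
counts = parse_formula(mol.findtext("formula"))
weight = sum(ATOMIC_WEIGHT[el] * n for el, n in counts.items())
print(mol.get("title"), mol.find("string[@title='CAS']").text, round(weight, 2))
# caffeine 58-08-2 194.19
```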
38 A Print to XML/HTML Conversion
- A print stream does not contain any metadata, only data and presentation information
- Tags cannot be meaningful unless they are reverse-engineered
- The result might be only the tagged data and the stylesheet
- Too often, the XML looks like this:
39 Bad XML Example
    /* text positioning information */
    .ps0{position:absolute;top:533px;left:29px;width:40px}
    .ps1{position:absolute;top:533px;left:317px;width:38px}
    .ps2{position:absolute;top:533px;left:454px;width:90px}
    ...
    /* font properties information */
    .ft1{font-weight:bold;font-size:22px}
    .ft2{font-size:17px}
    .ft3{font-size:11px}
    <!-- text starts here -->
    <SPAN CLASS="ps0"><NOBR>Account Number</NOBR></SPAN>
    <SPAN CLASS="ps1"><NOBR>12345</NOBR></SPAN>
    <SPAN CLASS="ps2"><NOBR>Name</NOBR></SPAN>
    ...
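Because the converter kept only text and position, a receiving program has to guess which string is a label and which is a value. The Python sketch below shows one such reverse-engineering heuristic, using the top/left coordinates from the ps0, ps1, and ps2 classes in the example. The heuristic itself (label, value, label, value ... along each line) is hypothetical, and it breaks as soon as the layout changes. The value next to "Name" is elided on the slide, so the sketch leaves that label unpaired.

```python
# Text fragments as a print-to-HTML converter might leave them: only the
# string and its position survive. Coordinates come from .ps0/.ps1/.ps2.
fragments = [
    {"text": "Account Number", "top": 533, "left": 29},
    {"text": "12345", "top": 533, "left": 317},
    {"text": "Name", "top": 533, "left": 454},
]

def guess_pairs(frags):
    """Hypothetical heuristic: fragments on the same line alternate label, value."""
    lines = {}
    for frag in frags:
        lines.setdefault(frag["top"], []).append(frag)
    pairs = {}
    for top in sorted(lines):
        row = sorted(lines[top], key=lambda f: f["left"])  # left-to-right order
        # zip() silently drops a trailing label whose value is missing
        for label, value in zip(row[0::2], row[1::2]):
            pairs[label["text"]] = value["text"]
    return pairs

print(guess_pairs(fragments))  # {'Account Number': '12345'}
```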
40 An Image to XML Example
- Most of the information may not be tagged at all
    <invoice>
      <account_no>12345</account_no>
      <name>Bill McCalpin</name>
      <data>70 02 02 02 02 FE A7 47 47 48 03 F9 A7 42 27 4A 74.</data>
    </invoice>
41 The Flow of Information
42 The Flow of Information
- E-Business is about the flow of information between parties as well as within the enterprise
- Traditionally, as information moves through the business process, we lose as much information as we add
- Look at how we used to treat information
43 As Information Flow Used to Be
44 As Information Flow Used To Be
[Diagram labels: Data; Data awareness (metadata); Presentation information; Composer; Toner on paper; Scan; X010101 (bits); Archive; Zap!]
45 As Information Flow Is Today
46 As Information Flow Is Today
[Diagram labels: Data; Data awareness (metadata); Presentation information; Composer; Transform; Text and graphics; PDF; Web page, emails, etc.; Zap!]
47 As Information Flow Should Be
48 As Information Flow Should Be
[Diagram labels: Data; Data awareness (metadata); Presentation information; Complete XML documents; email; WAP; Web page; archive; paper; User]
49 Or, As In The XML Bubble...
[Diagram labels: Data, metadata; Process; Add presentation; Web page; email; Cell phones; B2B applications; Archive]
50 Important Fact 4
- Use XML to delay the loss of important information
- Don't throw away information until you commit the document to a final format which can't support it
- In other words, keep the information in XML as long as possible
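Here is a minimal Python sketch of keeping information in XML as long as possible: one hypothetical invoice (values taken from the image example) stays tagged, and each output channel (web page, e-mail, cell phone) commits to its final format only at the last step. The renderer names and the 20-character phone width are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical invoice, kept in tagged form until each channel needs it.
INVOICE = "<invoice><account_no>12345</account_no><name>Bill McCalpin</name></invoice>"

def to_web_page(root):
    """Final format 1: an HTML table (the tags become row labels)."""
    rows = "".join(f"<tr><td>{escape(c.tag)}</td><td>{escape(c.text or '')}</td></tr>"
                   for c in root)
    return f"<table>{rows}</table>"

def to_email(root):
    """Final format 2: plain text, one 'tag: value' line per field."""
    return "\n".join(f"{c.tag}: {c.text}" for c in root)

def to_cell_phone(root, width=20):
    """Final format 3: a tiny screen gets only the account number."""
    return f"Acct {root.findtext('account_no')}"[:width]

root = ET.fromstring(INVOICE)  # the XML source itself is never modified
outputs = {
    "web": to_web_page(root),
    "email": to_email(root),
    "phone": to_cell_phone(root),
}
for channel, text in outputs.items():
    print(f"--- {channel} ---\n{text}")
```

Because the source XML is never overwritten, a new channel can be added later without reverse-engineering any of the earlier outputs.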
51 The XML Bubble
52-58 [Diagram built up across slides 52-58. Labels: Compliance; Archive; New Sales; Reprints; Policy Print; Reports; Notices; CRM; 1:1 Mark.; Campaign Manage.; Billing; HR; Pol. Proc.; EDI. Slide 53 adds XML and EBPP; slides 54-58 show an XML Bubble together with EBPP.]
59 Today's Billing Process XML
[Diagram labels: Billing Extract; Post Process; Print/Format; Data Base; XML App.]
60 [Diagram labels: Driver (four instances); XML Applications with business rules; Email]
61 Remember the Question, "Why Should I Support XML?"
62 Why Should I Support XML?
- I should support XML in B2B, unless the recipient wants only to view my presentation
- I should support (X)HTML in B2C, unless the recipient has a thick client which can utilize the XML (cf. Quicken and OFX)
63 How Should I Use XML?
- Once information is in XML, I should keep it there as long as possible
- I should use industry-accepted DTDs and Schemas
- I shouldn't even think of merely well-formed XML (syntactically correct, but with no DTD/Schema) as real XML, to avoid confusion
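The distinction between well-formed and valid XML can be seen with Python's standard-library parser, which checks only well-formedness. In this sketch (is_well_formed is a hypothetical helper), any properly nested tags pass, including tags that no industry DTD or Schema defines. Checking against a DTD or Schema requires a validating parser; the third-party lxml package, for example, provides DTD and XMLSchema classes for this.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """True if the text parses as XML. Says nothing about DTD/Schema validity."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<invoice><account_no>12345</account_no></invoice>"))  # True
print(is_well_formed("<invoice><account_no>12345</invoice>"))  # False: mismatched tag
print(is_well_formed("<anything><I_just_made_up/></anything>"))  # True, yet no agreed vocabulary
```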
64 A Final Note
- The World Wide Web Consortium (www.w3.org) is the standards body for the generic protocols of XML, such as XML syntax itself, XSL, RDF, etc.
- Most domain-specific or vertical-industry XML definitions are supported by the verticals themselves, e.g., CML, GEML (Gene Expression Markup Language), etc.
65 A Final Note, Part Deux
- At www.xml.org, there are nearly 100 Schemas/DTDs listed from 31 different industries, from AIML (Astronomical Instrument Markup Language) to RecipeML (Recipe Markup Language) - yes, XML for the kitchen.
- Also see Robin Cover's excellent work at xml.coverpages.org/sgml-xml.html
66 Contact Information
- William J. "Bill" McCalpin
- MIT, LIT, CDIA, EDP
- Principal
- MHE
- 1400 Cheyenne Dr.
- Richardson, Texas 75080-3921 USA
- (972) 231-3660 (v) (972) 690-4521 (f)
- mccalpin_at_mhe-consulting.com
- www.mhe-consulting.com