Title: The XML Bubble
1The XML Bubble
- William J. "Bill" McCalpin
- EDPP, CDIA, MIT, LIT
- Principal, MHE
2- Xplor 21st Global Conference and Exhibit
- Miami Beach, Florida
- October 30, 2000
3Introduction
4Thesis, Antithesis, Synthesis
- In the philosophy of Hegel, these words show the
inevitable transition of thought, by
contradiction and reconciliation, from
an initial conviction to its opposite and then to
a new, higher conception that involves but
transcends both of them
5The Hegelian Dialectic
- Thesis: Most businesses have well-established, productive legacy systems
- Antithesis: XML is springing forth everywhere
- Synthesis: XML will be integrated with legacy systems, enhancing some processes, changing many others, and eliminating some altogether
- In short, XML will affect what you do
6The Document In The 20th Century
7What Is A Document?
- The American Heritage Dictionary defines a document as information in writing placed on a medium such as paper, often used as a record.
- Documents have been placed on clay tablets, gold leaf, animal skins, all types of paper, microfilm, optical storage, and so on
8Information And Presentation
- In every case, the document represents a fundamental union of information and presentation
- But presentation presumes that the primary audience for the document is a human being
- With the coming of the Internet, this is no longer the case
9The Curse Of Presentation
- Composition products require that you specify a
printer, even before you know where the document
will print
10Why Are Print, Image, And Presentation Formats
Incompatible?
11Printing And Imaging Formats
- Many printing formats: AFP, Metacode, DJDE, XES (UDK), PostScript, PCL, etc.
- All formats use external resources like fonts, forms, graphics, etc., although sometimes inconsistently
- Most are escape-sequence based, some are formal data architectures, and some are almost programming languages
12Printing And Imaging Formats
- Many imaging formats: while most used CCITT Group 4 for image compression, most also had proprietary data wrappers
- Later systems adopted text-based formats such as PDF, although storing other print streams is not unknown
- Systems which store text-based formats must wrestle with resource issues
13Different Print Formats
- Why do printers have different formats? Because of physical constraints imposed by the hardware
- resources reduce the amount of data sent through the pipeline to the printer
- pages must be imaged in a fraction of a second
- complex graphics can be developed on the printer, but this needs a special language
14Different Imaging Formats
- Why do imaging systems have different formats? Because of physical constraints imposed by the hardware
- Mass storage was expensive
- Indexing schemes were too close to the application
- Text is sometimes avoided because of resource issues
- Interoperability with other products was an issue
15Result
- In each case, data architecture decisions were made in order to enhance some aspect of legibility of the stored objects.
- If there were no requirement to present the information (to a human reader), then the requirement for custom data formats for each vendor would probably disappear!
16Universal Literacy
- Who's reading our documents?
17The Road To Universal Literacy
- First, only the few could read
- After the printing press, the many began to read
- Eventually, educational reforms brought the
ability to read to all
18Literacy In The Internet Age
- Can there be a spread of literacy beyond "all"?
- How many webpages have you ever read?
- You will never be able to keep up with the Web
alone
19Intelligent Agents
- Just around the corner is software that will read the Web for us: not search, but read
- So we have to spread literacy to an audience beyond "all" (all people, that is)
- Does increased quality in presentation mean better computer literacy?
20Noise On The Net
- Think of the average webpage
- three-dimensional spinning objects
- marquees scrolling across the bottom
- multiple frames, bookmarks
- audio
- These items are all designed to attract the eye: your eye
- This does nothing for the machine reading the webpage
21The Cost Of Data Differences
- "NASA lost a $125 million Mars orbiter because one engineering team used metric units while another used English units for a key spacecraft operation..." (CNN, 9/30/99)
22The Nature Of XML
23XML And SGML
- XML is eXtensible Markup Language
- XML is a restricted subset (an application profile) of SGML, Standard Generalized Markup Language, an ISO standard (ISO 8879)
- XML is "extensible" because people and enterprises with common interests get together to define the tags which describe their data
24XML And Print Formats
- In most print formats, something like an account number would be
- AMB 200 AMI 300 SCFL 01 STO 0,90 TRN 12345-67890
- In XML, the same information is
- <account_number>12345-67890</account_number>
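To make the contrast concrete from the machine reader's side, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It assumes the XML fragment is exactly the one-element form shown on the slide; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The XML form from the slide: the tag itself names the data,
# so the reader needs no knowledge of page layout or fonts.
fragment = "<account_number>12345-67890</account_number>"

element = ET.fromstring(fragment)
field_name = element.tag    # what the data is
field_value = element.text  # the data itself

print(f"{field_name} = {field_value}")
```

The print-format version carries the same digits, but nothing in it says that those digits are an account number.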
25XML And Print Formats
- The nature of all print formats is to be focused on the presentation of the information.
- The nature of XML is to be focused on the author's content; that is, information is described as what it is, not how it looks.
26Why XML Over Print?
- Given that print formats are focused on the presentation, it is often difficult for the non-human reader to derive information out of the print data.
- E.g., we could have
- AMB 200 AMI 300 SCFL 01 STO 0,90 TRN 12345 RMI 120 TRN - RMI 24 TRN 67890
- Note that the data is not required to be contiguous
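The following Python sketch illustrates why extracting data from such a stream takes extra work. It treats the slide's mnemonic shorthand as plain text (a real print stream is binary, so this is a simplification), and the helper names parse_controls and extract_text are hypothetical. It also assumes that every TRN fragment on the line belongs to the same field, which a real extractor could not take for granted.

```python
import xml.etree.ElementTree as ET

# Mnemonics exactly as written on the slide (textual shorthand only,
# not the binary encoding a real print stream would use).
MNEMONICS = {"AMB", "AMI", "SCFL", "STO", "TRN", "RMI"}

def parse_controls(stream):
    """Split the shorthand into (mnemonic, operand) pairs."""
    pairs = []
    for token in stream.split():
        if token in MNEMONICS:
            pairs.append([token, ""])
        elif pairs:
            pairs[-1][1] += token  # operand pieces are concatenated
    return [(mnemonic, operand) for mnemonic, operand in pairs]

def extract_text(stream):
    """Join the operands of all TRN controls, ignoring positioning controls."""
    return "".join(op for mnemonic, op in parse_controls(stream) if mnemonic == "TRN")

stream = "AMB 200 AMI 300 SCFL 01 STO 0,90 TRN 12345 RMI 120 TRN - RMI 24 TRN 67890"
account = extract_text(stream)

# Re-express the recovered value as tagged data.
element = ET.Element("account_number")
element.text = account
xml_text = ET.tostring(element, encoding="unicode")
print(xml_text)
```

The text fragments only become an account number because the extractor was told in advance how to reassemble them; the XML form states it directly.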
27Separating Information From Presentation
- XML enables the total separation of information from presentation
- Thus, some XML objects have only tagged information, while others have content and presentation information
[Diagram: XML alone (tagged information only); XML with XSL (content plus presentation)]
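As a rough sketch of this separation, the Python below keeps one XML object holding only tagged information and applies two different presentations to it. Plain Python functions stand in for XSL stylesheets here, since the standard library has no XSLT processor. The statement and balance elements, and the function names as_text and as_html, are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Content only: no fonts, positions, or printer details.
# (The statement and balance elements are hypothetical examples.)
STATEMENT = """<statement>
  <account_number>12345-67890</account_number>
  <balance>150.25</balance>
</statement>"""

def as_text(root):
    """One presentation: plain 'name: value' lines."""
    return "\n".join(f"{child.tag}: {child.text}" for child in root)

def as_html(root):
    """Another presentation of the same content: an HTML table."""
    rows = "".join(
        f"<tr><th>{escape(child.tag)}</th><td>{escape(child.text or '')}</td></tr>"
        for child in root
    )
    return f"<table>{rows}</table>"

root = ET.fromstring(STATEMENT)
text_view = as_text(root)
html_view = as_html(root)
print(text_view)
print(html_view)
```

Neither presentation changes the underlying data, so a non-human reader can keep working from the tagged information alone.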
28The Four Spaces
29Dr. Davidson's DocumentSpace
- Dr. Keith Davidson, EDPP, hypothesized that we work in something called the "DocumentSpace"
- He believes that industries will become "spaces" under the influence of the Internet
30Three Spaces
- Dr. Davidson stated that there were three spaces: PrintSpace, MarketSpace, and DecisionSpace
- PrintSpace comprises our existing industry
- MarketSpace covers documents used in financial transactions
- DecisionSpace deals with documents used in knowledge management
31Three Spaces Become Four
- I have added a fourth space: ArchiveSpace, the use of documents in archival and records management to preserve information
- These four spaces can be viewed as --->
32The Use Of The Document In The Four Spaces
33Document And Information
- The document is used as a container of information, particularly in the exchange of information across the boundaries between the four spaces
- Documents are used for two reasons:
- (1) The lack of common data standards across the four spaces, and
- (2) The requirement that humans be able to read and process the information
34Print To Image
35Print To Image Format
- Print formats are Metacode, DJDE, AFP, PCL, PostScript, and so on
- Image formats are TIFF, MODCA, other proprietary formats using CCITT Group 4, and PDF
- Only AFP and MODCA, and PostScript and PDF, are closely related, but PostScript to PDF requires a transform, and AFP and MODCA often aren't implemented the same way
36Print To Market
37Print To Market Formats
- Print formats are Metacode, DJDE, AFP, PCL, PostScript, and so on
- Financial interchange formats are OFX/IFX, XML, and transaction data
- The significant data must be extracted out of the print stream to create data for SGML formats, a sometimes hazardous process
- However, using the original transaction data may not be correct
38Print To Knowledge
39Print To Knowledge Formats
- Print formats are Metacode, DJDE, AFP, PCL, PostScript, and so on
- True Knowledge Management does not yet exist; it's often "blob management"
- XML and its many related standards will make KM possible, if you think of KM as something like human knowledge
- As noted, deriving XML out of existing processes can be hazardous
40The Growth Of The XML Bubble
41[Diagram labels: Compliance, Archive, New Sales, Reprints, Policy Print, Reports, Notices, CRM, 1:1 Mark., Campaign Manage., Billing, HR, Pol. Proc., EDI]
42[Diagram labels: Compliance, Archive, New Sales, Reprints, Policy Print, Reports, Notices, CRM, 1:1 Mark., Campaign Manage., Billing, HR, Pol. Proc., EDI, XML, EBPP]
43[Diagram labels: Compliance, Archive, New Sales, Reprints, Policy Print, Reports, Notices, CRM, 1:1 Mark., Campaign Manage., XML, Billing, HR, Pol. Proc., EDI, EBPP]
48William J. "Bill" McCalpin
- EDPP, CDIA, MIT, LIT
- Principal, MHE
- 1400 Cheyenne Dr.
- Richardson, Texas 75080-3921
- 972-231-3660 (v) 972-690-4521 (f)
- mccalpin_at_mhe-consulting.com