Title: XPipe - An XML Processing Methodology
1. XPipe - An XML Processing Methodology
- XML 2001 Florida, USA
- Sean McGrath
- CTO
- Propylon
2. What is XPipe?
- It is an architecture / methodology / framework for developing robust, scalable, manageable XML processing systems.
- It is based on proven mechanical manufacturing techniques. Specifically:
  - The Assembly Line Principle
  - Component assembly and component re-use
3. What is XPipe?
- An open source project hosted on Sourceforge
  - http://xpipe.sourceforge.net
- A contribution to the blossoming meme of using pipeline-based processing to tame the burgeoning complexity of XML transformations
  - (If you do not find XML transformation complicated, you are not sufficiently well informed.)
  - (And no, XSLT does not solve all your problems)
- A way of thinking about systems that focuses on information flows rather than APIs
4. Contents of this talk
- The XPipe philosophy
- Major functional elements
- Some examples
- Relationship to other technologies
- The XGrid
- Some anticipated objections (and answers)
- Current status
- Current problems
- Future plans
5. XPipe Philosophy
Cars are complex, hierarchical structures
Henry Ford's Model T Ford Assembly Line, 1914
6. XPipe Philosophy
Lunch is a complex, hierarchical structure
Lunch Assembly Line, 2001
7. XPipe Philosophy
We are complex, hierarchical structures
8. XPipe philosophy
- What have these scenes got in common?
- Complex construction of cars, tuna melts and tendons made possible and efficient through
  - assembly line manufacturing
  - re-usable component processes and component materials
- Why not apply this approach to XML manufacturing?
9. XPipe philosophy
- Why does the assembly line approach work?
  - Transformation task decomposition
  - Re-usable transformation components
- Transformation decomposition is the key to complexity management. Just ask
  - Henry Ford
  - Herbert Simon (the two watchmakers parable in The Architecture of Complexity)
  - George Miller (7 +/- 2)
  - Adam Smith (An Inquiry into the Nature and Causes of the Wealth of Nations, 1776)
  - Any electrical or chemical engineer.
10. XPipe philosophy
- Component re-use is the key to productivity
  - Ask any form of engineer (electrical, chemical etc.) apart from software engineers
- Component re-use remains a holy grail in software engineering
  - XPipe is yet another attempt
11. XPipe philosophy
- A lot of data processing will consist of XML to XML transformation
- A lot of non-XML data processing can consist of XML to XML transformations with the addition of top and tail transformations
- Mantra
  - Get data into XML as quickly as possible
  - Keep it in XML until the last possible minute
  - Bring all your XML tools to bear on solving the data processing problem
12. XPipe philosophy
[Diagram: Non-XML Input -> Top Transformation -> Input XML -> ... -> Output XML -> Tail Transformation -> Non-XML Output]
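The top-and-tail idea can be sketched in a few lines of Python. This is only an illustration, not part of XPipe: the CSV layout, the `rows`/`row` element names and all three function names are assumptions made up for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET


def top_transformation(csv_text):
    """Non-XML input (CSV) -> XML: get the data into XML as early as possible."""
    rows = ET.Element("rows")
    for record in csv.DictReader(io.StringIO(csv_text)):
        row = ET.SubElement(rows, "row")
        for field, value in record.items():
            ET.SubElement(row, field).text = value
    return rows


def uppercase_names(rows):
    """XML -> XML: a stand-in for the real work done in the middle of the pipe."""
    for name in rows.iter("name"):
        name.text = name.text.upper()
    return rows


def tail_transformation(rows):
    """XML -> non-XML output (plain text), applied at the last possible moment."""
    return "\n".join(
        f"{row.findtext('name')}: {row.findtext('qty')}" for row in rows.iter("row")
    )


# Hypothetical input data, invented for this sketch.
csv_input = "name,qty\nwidget,3\ngadget,5\n"
report = tail_transformation(uppercase_names(top_transformation(csv_input)))
print(report)
```

Everything between the first and last function sees only XML, so any XML tool could be swapped in for the middle stage.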
13. XPipe philosophy
- The philosophy hinges on the fact that every complex XML transformation can be broken down into a series of smaller ones that can be chained together
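A minimal sketch of chaining, assuming each small transformation takes XML text and returns XML text. The helper names (`as_stage`, `chain`) and the two toy transformations are hypothetical and chosen only to show the composition.

```python
from functools import reduce
import xml.etree.ElementTree as ET


def as_stage(edit_tree):
    """Wrap a tree-editing function so it takes and returns XML text."""
    def stage(xml_text):
        root = ET.fromstring(xml_text)
        edit_tree(root)
        return ET.tostring(root, encoding="unicode")
    return stage


def lowercase_tags(root):
    """Small transformation 1: lower-case every element name."""
    for element in root.iter():
        element.tag = element.tag.lower()


def trim_text(root):
    """Small transformation 2: strip whitespace around element text."""
    for element in root.iter():
        if element.text:
            element.text = element.text.strip()


def chain(*stages):
    """Compose small transformations into one larger transformation."""
    return lambda xml_text: reduce(lambda doc, stage: stage(doc), stages, xml_text)


normalize = chain(as_stage(lowercase_tags), as_stage(trim_text))
result = normalize("<Order><Item> pen </Item><Item> ink </Item></Order>")
print(result)  # <order><item>pen</item><item>ink</item></order>
```

Because every stage has the same text-in, text-out shape, stages can be reordered, tested alone, or reused in other chains.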
14. XPipe philosophy
- Only so many ways to re-arrange an XML tree structure
- A finite number of fundamental transformations, from which all higher order transformations can be derived
15. XPipe philosophy
- Transformation decomposition leads to
  - A series of small, manageable, stand-alone problems with an XML input spec and an XML output spec.
  - Can build, test, use and then re-use these transformation components
  - Very team development friendly
  - High cohesion, loose coupling, just like the professor advised
16. XPipe philosophy
- Pipeline approach means you can mix 'n' match black-box components that internally use whatever paradigm best suits the problem
  - Lexical
  - SAX
  - DOM
  - XSLT
  - XDuce, Pyxie, Haskell
17. Sample XPipe
[Diagram: a sample XPipe fed from a DB / CMS. Stage labels: Character Set Mods; Add Doctype, validate, strip doctype; Re-arrange Elements; Validation; Stats FTP; SQL Replace; XHTML Generate. Implementation labels: Lexical, Lexical, DOM, Schematron / RelaxNG / Rhino, Jython, Java, XSLT]
18. XPipe philosophy
- Assertion: developers would use a component-based approach to XML processing if they did not have to write the plumbing (orchestration, exception handling) themselves
  - "Gee, this problem is complex. Maybe I'll do it in multiple stages! Gee, now I have to orchestrate the stages somehow. Batch files/shell scripts/driver program: all ugly and error prone. Maybe I'll just write a single program after all."
19. XPipe philosophy
- "Professional developers spend 50 percent of their time writing plumbing" (Adam Bosworth)
- XPipe aims to look after the plumbing, letting developers concentrate on the interesting stuff
20. Major Functional Elements - XComponents
- Developed in any language that runs on the Java Virtual Machine (Jython, Java, XSLT, Rhino (JavaScript) etc.)
- All XComponents are standalone programs of the form
  - Name InputXML OutputXML ErrorXML
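As a rough sketch of the `Name InputXML OutputXML ErrorXML` shape, the following standalone script reads one XML file, writes the result to a second, and writes an error document to a third on failure. It is written in plain Python rather than Jython, and the `transform` body, the `processed` attribute and the `<error>` layout are assumptions for illustration; the slides do not specify an error format.

```python
import sys
import xml.etree.ElementTree as ET


def transform(root):
    """Placeholder transformation (hypothetical): stamp the root as processed."""
    root.set("processed", "yes")
    return root


def main(argv):
    # Calling convention from the slide: Name InputXML OutputXML ErrorXML
    name, input_path, output_path, error_path = argv
    try:
        root = transform(ET.parse(input_path).getroot())
        ET.ElementTree(root).write(output_path, encoding="utf-8", xml_declaration=True)
        return 0
    except Exception as exc:
        # Assumed error format: failures are reported as XML, not as a stack trace.
        error = ET.Element("error", {"component": name})
        error.text = str(exc)
        ET.ElementTree(error).write(error_path, encoding="utf-8", xml_declaration=True)
        return 1


# Run only when invoked with exactly the three file arguments.
if __name__ == "__main__" and len(sys.argv) == 4:
    sys.exit(main(sys.argv))
```

Keeping the error channel in XML means a failure report can itself be routed through further components.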
21. Major Functional Elements - XComponents
- XComponents are described in XML form. An XComponent consists of
  - Documentation
  - Unit Tests (input, output XML stream pairs)
  - Metadata for retrieval
  - Input and Output predicates: declarative (DTD/RelaxNG/Schema) or procedural (code)
22. Major Functional Elements - XComponent Unit Tester
- Standalone program analogous to JUnit or PyUnit but for XML transformation component testing
- Very outsource-friendly and "inbetweenable" approach (specify everything but the code: spec, doc and test harness all in one)
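One way such a tester could work, sketched under assumptions: the component is a function from XML text to XML text, the tests are (input, expected output) pairs, and results are compared after canonicalization so that insignificant serialization differences do not cause failures. `run_unit_tests` and the toy `shout` component are hypothetical names, not XPipe APIs.

```python
import xml.etree.ElementTree as ET


def run_unit_tests(component, test_pairs):
    """Run a component over (input, expected) XML pairs; return the failures."""
    failures = []
    for number, (input_xml, expected_xml) in enumerate(test_pairs, start=1):
        actual_xml = component(input_xml)
        # ET.canonicalize (Python 3.8+) normalizes attribute order, empty tags, etc.
        if ET.canonicalize(actual_xml) != ET.canonicalize(expected_xml):
            failures.append((number, expected_xml, actual_xml))
    return failures


def shout(input_xml):
    """Toy component under test: upper-case all element text."""
    root = ET.fromstring(input_xml)
    for element in root.iter():
        if element.text:
            element.text = element.text.upper()
    return ET.tostring(root, encoding="unicode")


pairs = [
    ("<a>hi</a>", "<a>HI</a>"),
    ("<a><b>x</b><c/></a>", "<a><b>X</b><c></c></a>"),
]
failures = run_unit_tests(shout, pairs)
print("failures:", failures)  # failures: []
```

Since the pairs are plain data, they can be written before the component exists and handed to whoever implements it.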
23. Major Functional Elements - XPipes
- Described in XML
- Consist of
  - Documentation
  - Input/Output Predicates (Schemas/Code)
  - Test Suite
  - References to XComponents, which are resolved when the XPipe is installed
24. Major Functional Elements - XPipe Executive
- Uniprocessor
  - XPipe executed on 1 machine, possibly with separate threads for each XComponent task
- Multiprocessor
  - XML-based protocol to implement Job Shop work distribution over a P2P network
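A deliberately tiny uniprocessor executive might look like the sketch below. It runs stages in order on one thread, checks that each hand-off is well-formed XML, and stops at the first failure. The function name `run_xpipe`, the `(name, callable)` stage format and the `<error>` element are assumptions for illustration, not the actual XPipe executive.

```python
import xml.etree.ElementTree as ET


def run_xpipe(stages, input_xml):
    """Run stages in order. Returns (output_xml, error_xml); one of them is None."""
    document = input_xml
    for name, stage in stages:
        try:
            document = stage(document)
            ET.fromstring(document)  # every hand-off must be well-formed XML
        except Exception as exc:
            error = ET.Element("error", {"stage": name})
            error.text = str(exc)
            return None, ET.tostring(error, encoding="unicode")
    return document, None


def wrap(xml_text):
    """Toy stage: wrap the document in a new root element."""
    return f"<wrapped>{xml_text}</wrapped>"


def broken(xml_text):
    """Toy stage with a bug: drops the closing tag."""
    return xml_text.replace("</wrapped>", "")


good_output, good_error = run_xpipe([("wrap", wrap)], "<a/>")
bad_output, bad_error = run_xpipe([("wrap", wrap), ("broken", broken)], "<a/>")
print(good_output)  # <wrapped><a/></wrapped>
print(bad_error)
```

The orchestration and exception handling live in one place, so individual stages stay free of plumbing.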
25. Major Functional Elements - XPipe Monitor
26. Some related open technologies
- Unix Pipes
- SAX Filters
- TRAX
- XBeans
- Cocoon
- AxKit
- JXTA
- Translets
- TupleSpaces
27. Simple XComponent examples
- Fundamental Operation - Rename Element
- Rename
  - Input: <foo>baz</foo>
  - Output: <bar>baz</bar>
[Diagram: tree foo -> baz becomes tree bar -> baz]
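A minimal sketch of the Rename operation using Python's standard library. The function name and its parameters are assumptions; the slides give only the input and output.

```python
import xml.etree.ElementTree as ET


def rename(xml_text, old_tag, new_tag):
    """Rename every element called old_tag to new_tag, leaving content untouched."""
    root = ET.fromstring(xml_text)
    for element in list(root.iter(old_tag)):
        element.tag = new_tag
    return ET.tostring(root, encoding="unicode")


renamed = rename("<foo>baz</foo>", "foo", "bar")
print(renamed)  # <bar>baz</bar>
```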
28. Simple XComponent examples
- Fundamental Operation - Peel
  - Input: <foo><bar>baz</bar></foo>
  - Output: <foo>baz</foo>
[Diagram: tree foo -> bar -> baz becomes tree foo -> baz]
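Peel removes an element but keeps its content in place. A sketch under assumptions (the function names are invented here, and the root element itself is never peeled because it has no parent to receive its content):

```python
import xml.etree.ElementTree as ET


def _append_text(parent, index, text):
    """Attach text immediately before position index among parent's children."""
    if not text:
        return
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        previous = parent[index - 1]
        previous.tail = (previous.tail or "") + text


def peel(xml_text, tag):
    """Remove every non-root element called tag, splicing its content into its parent."""
    root = ET.fromstring(xml_text)
    for parent in list(root.iter()):
        index = 0
        while index < len(parent):
            child = parent[index]
            if child.tag != tag:
                index += 1
                continue
            grandchildren = list(child)
            parent.remove(child)
            _append_text(parent, index, child.text)
            for offset, grandchild in enumerate(grandchildren):
                parent.insert(index + offset, grandchild)
            _append_text(parent, index + len(grandchildren), child.tail)
            # index is not advanced: promoted grandchildren are examined next
    return ET.tostring(root, encoding="unicode")


peeled = peel("<foo><bar>baz</bar></foo>", "bar")
print(peeled)  # <foo>baz</foo>
```

Most of the code deals with ElementTree's text/tail model; the operation itself is just "delete the wrapper, keep what was inside".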
29. Simple XComponent examples
- Compound Operation - Matryoshka
  - Input: <foo><bar>baz</bar></foo>
  - Output: <foo></foo><bar></bar>baz
[Diagram: nested tree foo -> bar -> baz becomes foo, bar and baz side by side]
30. Simple XComponent examples
- KlingonCloak
  - Input: <foo><bar>baz</bar></foo>
  - Output: <tag name="foo"><tag name="bar">baz</tag></tag>
[Diagram: tree foo -> bar -> baz becomes tree tag type=foo -> tag type=bar -> baz]
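KlingonCloak hides element names inside an attribute of a generic element. The sketch below follows the Output line and uses a `name` attribute (the diagram labels it `type`); the function name is an assumption, and an existing `name` attribute on an element would be overwritten.

```python
import xml.etree.ElementTree as ET


def klingon_cloak(xml_text):
    """Replace every element with a generic <tag> carrying the old name as an attribute."""
    root = ET.fromstring(xml_text)
    for element in list(root.iter()):
        element.set("name", element.tag)  # overwrites any existing name attribute
        element.tag = "tag"
    return ET.tostring(root, encoding="unicode")


cloaked = klingon_cloak("<foo><bar>baz</bar></foo>")
print(cloaked)  # <tag name="foo"><tag name="bar">baz</tag></tag>
```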
31. Sample XComponents
- Once you start thinking in terms of pipes, components appear everywhere
  - Regular fragmentations
  - Doctype changer
  - Namespace normalizer
  - Character set transcoder
  - Hash generator
  - RelaxNG/Schematron etc.
- A validator can be thought of as a component in an XPipe that mirrors its input on its output
32. Validation as an XComponent
[Diagram: XML A enters on Input of a RelaxNG / Schematron / Jython / Java / JACL XComponent and leaves unchanged as XML A on Output; the Validation Log leaves on Error]
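A sketch of a validator written as a pass-through component. It uses a procedural predicate in code because the Python standard library has no RelaxNG or Schematron support. The `qty` rule, the function names and the `<validation-log>` layout are all invented for illustration, and returning no output on failure is an assumption the slides do not settle.

```python
import xml.etree.ElementTree as ET


def every_item_has_qty(root):
    """Procedural predicate (hypothetical rule): list the problems found."""
    return [
        f"item {position} has no qty attribute"
        for position, item in enumerate(root.iter("item"), start=1)
        if "qty" not in item.attrib
    ]


def validate(input_xml, predicate):
    """Mirror valid input unchanged on the output; send a validation log to error."""
    problems = predicate(ET.fromstring(input_xml))
    if problems:
        log = ET.Element("validation-log")
        for problem in problems:
            ET.SubElement(log, "problem").text = problem
        return None, ET.tostring(log, encoding="unicode")
    return input_xml, None


valid_doc = '<order><item qty="1"/></order>'
ok_output, ok_error = validate(valid_doc, every_item_has_qty)
bad_output, bad_error = validate("<order><item/></order>", every_item_has_qty)
print(ok_output == valid_doc)  # True
print(bad_error)
```

Seen this way, a validator is just another stage and can be dropped in between any two components of a pipe.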
33. The XGrid
- Grid Technologies: computational power on tap (http://www.gridforum.org)
- The XGrid: computational power on tap to execute XPipes
34. The XGrid
35. Some objections (with some answers)
- It will be slow
  - No it won't
  - Premature optimization is the root of all evil!
  - Speed is a three-headed monster. I'm old enough to have left the X axis and currently heading for Y through Z
[Diagram: The 3 Axes to Speed]
36. Some objections (with some answers)
- It will be slow (cont.)
  - Massive parallelism will kill all von Neumann throughput arguments
  - Documents per second, not seconds per document
  - A myriad of compile-time optimizations on XPipes possible
  - Keep the architecture simple and speed will sort itself out
37. Some objections (with some answers)
- Pipes are not rich enough, real data flows require graphs
  - Inside every graph is a collection of straight segments
  - Do the smallest thing that can possibly work
  - XComponents can conditionally flow data in different directions (a graph)
38. Some objections (with some answers)
- Component based software? Harumph! We have heard that one before
  - XPipe is data flow based, not API based (COM, VBX, CORBA). The payload is what is important, not the plumbing
  - Information integration (needed on the server side), not application integration (needed on the client side)
39. Current Status
- Schemas for XPipes and XComponents on xpipe.sourceforge.net. Feedback required
- Sample components (Java/XSLT/Jython) and some documentation
- Simple, illustrative XPipe uniprocessor executives
- Draft of XJCL: XGrid Job Control Language
40. Current Status
- Uniprocessor XPipe used to develop
  - 80-C pipe from Hub notation for a complex document type to a legacy mainframe display notation. 120 page spec.
  - 20-C pipe for semantic validation of legislation documents
  - XPipe and XComponent validators
41. Current Problems
- Everybody agrees that an XML document is a tree but
  - The content and structure of the tree depends on the parser
  - The content and structure of re-generated XML (the round-tripping problem)
42. Current Problems
- Naming things
- Taxonomy of XTLs (XML Transformation Languages)
- Taxonomy of re-usable XComponents and XPipes
43. Current Problems
- Flexible transformation scheduling is hard
- Optimal transformation scheduling is very hard
- Packaging
44. Future Plans
- Evangelize the idea that DTD validated XML 1.0 is just Well Formed XML that has been through a pipe consisting of
  - A transclusion component (entity expansion)
  - A macro pre-processor (conditional marked sections)
  - An attribute decorator (implied/fixed attributes)
  - A grammar checker
45. Valid XML
[Diagram: Well Formed XML -> Parameter Entity Expansion -> Conditional Sections -> General Entity Expansion -> Attribute Decoration -> Grammar Validation -> Valid XML]
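To make one of these stages concrete, here is a hedged sketch of an attribute decorator: it adds default attribute values where they are missing, much as a DTD's attribute defaults would. The `DEFAULTS` table stands in for declarations that a real implementation would read from the DTD; it and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for attribute defaults declared in a DTD.
DEFAULTS = {"item": {"unit": "each"}}


def decorate_attributes(xml_text, defaults):
    """Add each default attribute to matching elements that do not already set it."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        for name, value in defaults.get(element.tag, {}).items():
            if name not in element.attrib:
                element.set(name, value)
    return ET.tostring(root, encoding="unicode")


decorated = decorate_attributes('<order><item/><item unit="box"/></order>', DEFAULTS)
print(decorated)
```

After this stage, downstream components no longer need the DTD to know an attribute's effective value.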
46. Future Plans
- XPipes and XComponents as web services (SOAP/XML-RPC, UDDI etc.)
- Getting the P2P and Grid Technology communities' input into XGrid.
- Getting help to develop the XPipe reference implementation on Sourceforge
47. Future Plans
- Development of commercial implementations of XPipe integrated with leading EAI systems (ongoing)
- Use of SCADA tools to develop XPipe process control and monitoring systems
48. Future Plans
- Use of Animation Engineering techniques for CAXTE tools (Computer Aided XML Transformation Engineering)
- Digging around hierarchy theory, self-assembly, bio-informatics and nanofabrication for concepts and tools applicable to XML transformations
49. In conclusion
- XPipe is simple
- Simplicity works!
- Plenty of evidence outside of XML engineering that this approach will work
- Plenty of lore and tools from other fields of science can be brought to bear to build systems using the XPipe approach
50. Thank you
- http://xpipe.sourceforge.net