XPipe - An XML Processing Methodology

1
XPipe - An XML Processing Methodology
  • XML 2001 Florida, USA
  • Sean McGrath
  • CTO
  • Propylon

2
What is XPipe?
  • It is an architecture / methodology / framework
    for developing robust, scalable, manageable XML
    processing systems.
  • Based on proven mechanical manufacturing
    techniques, specifically:
  • The Assembly Line Principle
  • Component assembly and component re-use

3
What is XPipe?
  • An open source project hosted on Sourceforge
  • http://xpipe.sourceforge.net
  • A contribution to the blossoming meme of using
    pipeline based processing to tame the burgeoning
    complexity of XML transformations
  • (If you do not find XML transformation
    complicated, you are not sufficiently well
    informed.)
  • (And no, XSLT does not solve all your problems)
  • A way of thinking about systems that focuses on
    information flows rather than APIs

4
Contents of this talk
  • The XPipe philosophy
  • Major functional elements
  • Some examples
  • Relationship to other technologies
  • The XGrid
  • Some anticipated objections (and answers)
  • Current status
  • Current problems
  • Future plans

5
XPipe Philosophy
Cars are complex, hierarchical structures
[Image: Henry Ford's Model T assembly line, 1914]
6
XPipe Philosophy
Lunch is a complex, hierarchical structure
[Image: lunch assembly line, 2001]
7
XPipe Philosophy
We are complex, hierarchical structures
8
XPipe philosophy
  • What have these scenes got in common?
  • Complex construction of cars, tuna melts and
    tendons made possible and efficient through:
  • assembly line manufacturing
  • re-usable component processes and component
    materials
  • Why not apply this approach to XML
    manufacturing?

9
XPipe philosophy
  • Why does the assembly line approach work?
  • Transformation task decomposition
  • Re-usable transformation components
  • Transformation decomposition is the key to
    complexity management. Just ask:
  • Henry Ford
  • Herbert Simon (the two watchmakers, "The
    Architecture of Complexity")
  • George Miller (7 ± 2)
  • Adam Smith (An Inquiry into the Nature and Causes
    of the Wealth of Nations, 1776)
  • Any electrical or chemical engineer.

10
XPipe philosophy
  • Component re-use is the key to productivity
  • Ask any form of engineer (electrical, chemical
    etc.) apart from software engineers
  • Component re-use remains a holy grail in software
    engineering
  • XPipe is yet another attempt

11
XPipe philosophy
  • A lot of data processing will consist of XML to
    XML transformation
  • A lot of non-XML data processing can consist of
    XML to XML transformations with the addition of
    top and tail transformations
  • Mantra:
  • Get data into XML as quickly as possible
  • Keep it in XML until the last possible minute
  • Bring all your XML tools to bear on solving the
    data processing problem
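
The "top and tail" idea can be sketched in a few lines. This is only an illustration, assuming Python's standard library and a made-up CSV layout; the function names top, keep_adults and tail are hypothetical and not part of XPipe.

```python
import csv
import io
import xml.etree.ElementTree as ET

def top(csv_text):
    """Top transformation: each CSV record becomes a <row> element.
    Assumes the column names are valid XML element names."""
    root = ET.Element("rows")
    for record in csv.DictReader(io.StringIO(csv_text)):
        row = ET.SubElement(root, "row")
        for field, value in record.items():
            ET.SubElement(row, field).text = value
    return root

def keep_adults(root):
    """XML-to-XML step: drop rows whose <age> is under 18."""
    for row in list(root):
        if int(row.findtext("age")) < 18:
            root.remove(row)
    return root

def tail(root):
    """Tail transformation: XML becomes plain text, one name per line."""
    return "\n".join(row.findtext("name") for row in root)

report = tail(keep_adults(top("name,age\nAda,36\nTim,12\nLin,41\n")))
```

Only top and tail know about the non-XML formats; everything in between sees XML in and XML out.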

12
XPipe philosophy
[Diagram: Non-XML Input -> Top Transformation -> Input XML -> Output XML -> Tail Transformation -> Non-XML Output]
13
XPipe philosophy
  • The philosophy hinges on the fact that every
    complex XML transformation can be broken down
    into a series of smaller ones that can be chained
    together
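
A minimal sketch of chaining small transformations, assuming each stage is a Python function that takes and returns an ElementTree element. The stage names (lowercase_tags, set_attr) and the run_pipe helper are hypothetical, chosen for illustration only.

```python
import xml.etree.ElementTree as ET
from functools import reduce

def lowercase_tags(root):
    """Small stage: fold every element name to lower case."""
    for el in root.iter():
        el.tag = el.tag.lower()
    return root

def set_attr(tag, name, value):
    """Build a small stage that sets name=value on every <tag> element."""
    def stage(root):
        for el in root.iter(tag):
            el.set(name, value)
        return root
    return stage

def run_pipe(stages, xml_text):
    """Feed the output of each stage into the next one, in order."""
    root = reduce(lambda tree, stage: stage(tree), stages, ET.fromstring(xml_text))
    return ET.tostring(root, encoding="unicode")

pipe = [lowercase_tags, set_attr("bar", "checked", "yes")]
result = run_pipe(pipe, "<Foo><BAR>baz</BAR></Foo>")
```

Each stage can be built and tested alone; the larger transformation is just the list.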

14
XPipe philosophy
  • Only so many ways to re-arrange an XML tree
    structure
  • A finite number of fundamental transformations,
    from which all higher order transformations can
    be derived

15
XPipe philosophy
  • Transformation Decomposition leads to
  • a series of small, manageable, stand alone
    problems with an XML input spec and an XML
    output spec.
  • Can build, test, use and then re-use these
    transformation components
  • Very team development friendly
  • High cohesion, loose coupling, just like the
    professor advised

16
XPipe philosophy
  • Pipeline approach means you can mix 'n' match
    black-box components that internally use whatever
    paradigm best suits the problem:
  • Lexical
  • SAX
  • DOM
  • XSLT
  • XDuce, Pyxie, Haskell
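
As a rough illustration of mixing paradigms, the sketch below chains a lexical stage and a tree-based stage behind the same text-in, text-out interface. It assumes Python's standard library; the stage names and the &nbsp; example are invented for the illustration.

```python
import xml.etree.ElementTree as ET

def lexical_stage(xml_text):
    """Lexical paradigm: plain string work, no parser involved.
    Replaces the undeclared &nbsp; entity with a numeric reference."""
    return xml_text.replace("&nbsp;", "&#160;")

def tree_stage(xml_text):
    """Tree paradigm: parse, then trim whitespace around leaf text."""
    root = ET.fromstring(xml_text)
    for el in root.iter():
        if len(el) == 0 and el.text:
            el.text = el.text.strip()
    return ET.tostring(root, encoding="unicode")

def run(stages, xml_text):
    """Each stage is a black box: XML text in, XML text out."""
    for stage in stages:
        xml_text = stage(xml_text)
    return xml_text

mixed = run([lexical_stage, tree_stage], "<a><b> hi&nbsp;there </b></a>")
```

The tree stage alone would reject this input, because &nbsp; is not declared; the lexical stage repairs it first.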

17
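Sample XPipe
[Diagram: a pipeline fed from a DB / CMS, with stages labelled Character Set
Mods; Add Doctype, validate, strip doctype; Re-arrange Elements; Validation;
SQL Replace; XHTML Generate; and Stats / FTP. The stages are tagged with the
technology used inside them: Lexical, DOM, Schematron / RelaxNG / Rhino,
Jython, Java and XSLT.]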
18
XPipe philosophy
  • Assertion: developers would use a component
    based approach to XML processing if they did not
    have to write the plumbing (orchestration,
    exception handling) themselves
  • "Gee, this problem is complex. Maybe I'll do it
    in multiple stages! Gee, now I have to
    orchestrate the stages somehow. Batch files/shell
    scripts/driver program: all ugly and error
    prone. Maybe I'll just write a single program
    after all."

19
XPipe philosophy
  • "Professional developers spend 50 percent of
    their time writing plumbing" (Adam Bosworth)
  • XPipe aims to look after the plumbing, letting
    developers concentrate on the interesting stuff

20
Major Functional Elements - XComponents
  • Developed in any language that runs on the Java
    Virtual Machine (Jython, Java, XSLT, Rhino
    (JavaScript) etc.)
  • All XComponents are standalone programs of the
    form:
  • Name InputXML OutputXML ErrorXML
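
One way to picture the "Name InputXML OutputXML ErrorXML" form is the sketch below. It is not taken from the XPipe code base: it assumes the three arguments are file paths, and the upcase component and the <errors> report format are hypothetical.

```python
import sys
import xml.etree.ElementTree as ET

def upcase(input_xml):
    """Core of the component: return (output_xml, error_xml) as strings."""
    errors = ET.Element("errors")  # hypothetical error-report format
    output = ""
    try:
        root = ET.fromstring(input_xml)
        for el in root.iter():
            if el.text:
                el.text = el.text.upper()
        output = ET.tostring(root, encoding="unicode")
    except ET.ParseError as exc:
        ET.SubElement(errors, "error").text = str(exc)
    return output, ET.tostring(errors, encoding="unicode")

def main(argv):
    """Command-line wrapper: upcase InputXML OutputXML ErrorXML."""
    if len(argv) != 4:
        sys.stderr.write("usage: upcase InputXML OutputXML ErrorXML\n")
        return 2
    _, in_path, out_path, err_path = argv
    with open(in_path, encoding="utf-8") as f:
        output, errors = upcase(f.read())
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(output)
    with open(err_path, "w", encoding="utf-8") as f:
        f.write(errors)
    return 0 if output else 1
```

A script would finish with sys.exit(main(sys.argv)). Keeping the core as a plain function makes it easy to test without touching the file system.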

21
Major Functional Elements - XComponents
  • XComponents are described in XML form. An
    XComponent consists of:
  • Documentation
  • Unit Tests (input, output XML stream pairs)
  • Metadata for retrieval
  • Input and Output predicates: declarative
    (DTD/RelaxNG/Schema) or procedural (code)

22
Major Functional Elements - XComponent Unit Tester
  • Standalone program analogous to JUnit or PyUnit
    but for XML transformation component testing
  • Very outsource-friendly and inbetweenable
    approach (specify everything but the code:
    spec + doc + test harness all in one)
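
A unit tester driven by input/output XML pairs could look roughly like this. The sketch assumes components are Python callables from XML text to XML text; run_unit_tests, normalize and the strip_attributes sample component are hypothetical names, not the real XPipe tester.

```python
import xml.etree.ElementTree as ET

def normalize(xml_text):
    """Parse and re-serialize so trivial lexical differences do not matter."""
    return ET.tostring(ET.fromstring(xml_text), encoding="unicode")

def run_unit_tests(component, cases):
    """Return (input, expected, actual) for every pair that fails."""
    failures = []
    for input_xml, expected_xml in cases:
        actual_xml = component(input_xml)
        if normalize(actual_xml) != normalize(expected_xml):
            failures.append((input_xml, expected_xml, actual_xml))
    return failures

def strip_attributes(xml_text):
    """Sample component under test: remove every attribute."""
    root = ET.fromstring(xml_text)
    for el in root.iter():
        el.attrib.clear()
    return ET.tostring(root, encoding="unicode")

cases = [
    ('<a id="1">x</a>', "<a>x</a>"),
    ('<a><b class="k"/></a>', "<a><b></b></a>"),
]
failures = run_unit_tests(strip_attributes, cases)
```

The cases list is the specification: whoever writes the component only has to make the failures list come back empty.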

23
Major Functional Elements - XPipes
  • Described in XML
  • Consist of:
  • Documentation
  • Input/Output Predicates (Schemas/Code)
  • Test Suite
  • References to XComponents which are resolved when
    the XPipe is installed

24
Major Functional Elements - XPipe Executive
  • Uniprocessor
  • XPipe executed on 1 machine, possibly with
    separate threads for each XComponent task
  • Multiprocessor
  • XML based protocol to implement Job Shop work
    distribution over a P2P network

25
Major Functional Elements - XPipe Monitor
26
Some related open technologies
  • Unix Pipes
  • SAX Filters
  • TRAX
  • XBeans
  • Cocoon
  • AxKit
  • JXTA
  • Translets
  • TupleSpaces

27
Simple XComponent examples
  • Fundamental Operation - Rename Element
  • Rename
  • Input: <foo>baz</foo>
  • Output: <bar>baz</bar>

[Tree diagram: foo containing baz becomes bar containing baz]
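
A minimal sketch of the Rename operation, assuming Python's xml.etree.ElementTree; the function name and signature are illustrative, not XPipe's own.

```python
import xml.etree.ElementTree as ET

def rename(xml_text, old, new):
    """Rename every <old> element to <new>, leaving content untouched."""
    root = ET.fromstring(xml_text)
    for el in root.iter(old):
        el.tag = new
    return ET.tostring(root, encoding="unicode")

renamed = rename("<foo>baz</foo>", "foo", "bar")
```
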
28
Simple XComponent examples
  • Fundamental Operation - Peel
  • Input: <foo><bar>baz</bar></foo>
  • Output: <foo>baz</foo>

[Tree diagram: foo > bar > baz becomes foo > baz]
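
Peel can be sketched as "remove the wrapper, keep what it contains". The version below is one possible reading, assuming Python's ElementTree; the helper names are hypothetical, and the root element itself is never peeled.

```python
import xml.etree.ElementTree as ET

def flatten(el, tag):
    """Yield the text pieces and kept elements that replace el's content."""
    yield el.text or ""
    for child in list(el):
        if child.tag == tag:
            yield from flatten(child, tag)  # splice the wrapper's content in
        else:
            peel_in_place(child, tag)
            yield child
        yield child.tail or ""

def peel_in_place(el, tag):
    """Rebuild el's content with every <tag> wrapper removed."""
    items = list(flatten(el, tag))
    for child in list(el):
        el.remove(child)
    el.text = ""
    last = None  # most recently kept element; loose text goes in its tail
    for item in items:
        if isinstance(item, str):
            if last is None:
                el.text += item
            else:
                last.tail = (last.tail or "") + item
        else:
            item.tail = ""
            el.append(item)
            last = item

def peel(xml_text, tag):
    root = ET.fromstring(xml_text)
    peel_in_place(root, tag)
    return ET.tostring(root, encoding="unicode")

peeled = peel("<foo><bar>baz</bar></foo>", "bar")
```

Most of the code deals with mixed content: text before, inside and after the peeled element has to be stitched back together in order.
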
29
Simple XComponent examples
  • Compound Operation - Matryoshka
  • Input:
  • <foo><bar>baz</bar></foo>
  • Output:
  • <foo></foo><bar></bar>baz

[Tree diagram: nested foo > bar > baz becomes the siblings foo, bar, baz]
30
Simple XComponent examples
  • KlingonCloak
  • Input:
  • <foo><bar>baz</bar></foo>
  • Output:
  • <tag name="foo"><tag name="bar">baz</tag></tag>

[Tree diagram: foo > bar > baz becomes tag type="foo" > tag type="bar" > baz]
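
A sketch of KlingonCloak under the assumption that the original element name moves into a name attribute, as in the bullet text above. It uses Python's ElementTree and would overwrite any name attribute already present, which a real component would have to handle.

```python
import xml.etree.ElementTree as ET

def cloak(xml_text):
    """Hide every element name inside a generic <tag name="..."> element."""
    root = ET.fromstring(xml_text)
    for el in root.iter():
        el.set("name", el.tag)  # clobbers an existing name attribute
        el.tag = "tag"
    return ET.tostring(root, encoding="unicode")

cloaked = cloak("<foo><bar>baz</bar></foo>")
```
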
31
Sample XComponents
  • Once you start thinking in terms of pipes,
    components appear everywhere
  • Regular fragmentations
  • Doctype changer
  • namespace normalizer
  • Character set transcoder
  • Hash generator
  • RelaxNG/Schematron etc
  • A validator can be thought of as a component in
    an XPipe that mirrors its input on its output

32
Validation as an XComponent
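[Diagram: XML A enters the validating XComponent (RelaxNG, Schematron,
Jython/Java/JACL) as Input and leaves unchanged as Output; the component also
produces a Validation Log and an Error stream.]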
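
The "validator mirrors its input" idea can be sketched as a function that returns the document untouched plus a log. This assumes Python's ElementTree and a procedural predicate; validate, every_item_has_id and the log wording are invented for the example, and returning None for input that is not well-formed is a design choice of this sketch.

```python
import xml.etree.ElementTree as ET

def validate(xml_text, predicate):
    """Return (output, log): the input mirrored unchanged, plus complaints.
    Output is None when the input is not even well-formed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return None, ["not well-formed: %s" % exc]
    return xml_text, list(predicate(root))

def every_item_has_id(root):
    """Hypothetical procedural predicate: each <item> needs an id attribute."""
    return ["item %d has no id" % n
            for n, item in enumerate(root.iter("item"), start=1)
            if "id" not in item.attrib]

doc = '<list><item id="a"/><item/></list>'
output, log = validate(doc, every_item_has_id)
```

Because the output is the input, a validator can be dropped between any two stages of a pipe without changing what flows through it.
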
33
The XGrid
  • Grid Technologies: computational power on tap
    (http://www.gridforum.org)
  • The XGrid: computational power on tap to
    execute XPipes

34
The XGrid
35
Some objections (with some answers)
  • It will be slow
  • No it won't - Premature optimization is the root
    of all evil!
  • Speed is a three headed monster. I'm old enough
    to have left the X axis and currently heading for
    Y through Z

[Figure: The 3 Axes to Speed]
36
Some objections (with some answers)
  • It will be slow (cont.)
  • Massive Parallelism will kill all von Neumann
    throughput arguments
  • Documents per second, not seconds per document
  • A myriad of compile time optimizations on
    XPipes possible
  • Keep the architecture simple and speed will
    sort itself out
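
"Documents per second" can be illustrated by pushing many documents through the same pipe at once. The sketch assumes Python's concurrent.futures and uses a stand-in pipe function invented for the example. Threads in CPython do not run CPU-bound work in parallel, so this only shows the shape of the idea; on a grid the workers would be separate processes or machines.

```python
from concurrent.futures import ThreadPoolExecutor
import xml.etree.ElementTree as ET

def pipe(xml_text):
    """Stand-in for a whole XPipe: annotate the root with its word count."""
    root = ET.fromstring(xml_text)
    words = "".join(root.itertext()).split()
    root.set("words", str(len(words)))
    return ET.tostring(root, encoding="unicode")

documents = ["<doc>one two</doc>", "<doc>three</doc>", "<doc/>"]

# Each document is independent, so the work distributes trivially.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(pipe, documents))
```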

37
Some objections (with some answers)
  • Pipes are not rich enough, real data flows
    require graphs
  • Inside every graph is a collection of straight
    segments
  • Do the smallest thing that can possibly work
  • XComponents can conditionally flow data in
    different directions, which gives you a graph
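
Conditional flow can be sketched as a stage that picks one of two downstream stages. This assumes stages are Python functions on ElementTree elements; route, mark and the priority attribute are hypothetical names used only for illustration.

```python
import xml.etree.ElementTree as ET

def mark(status):
    """Build a stage that stamps the root element with a status attribute."""
    def stage(root):
        root.set("status", status)
        return root
    return stage

def route(predicate, if_true, if_false):
    """Build a stage that sends each document down one of two branches."""
    def stage(root):
        branch = if_true if predicate(root) else if_false
        return branch(root)
    return stage

def is_urgent(root):
    return root.get("priority") == "high"

router = route(is_urgent, mark("expedited"), mark("queued"))
fast = router(ET.fromstring('<order priority="high"/>'))
slow = router(ET.fromstring("<order/>"))
```

Each branch is still a straight segment; the router is the only place where the graph forks.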

38
Some objections (with some answers)
  • Component based software? Harumph! We have heard
    that one before
  • XPipe is data flow based not API based (COM, VBX,
    CORBA). The payload is what is important, not
    the plumbing
  • Information integration (needed on the server
    side) not application integration (needed on the
    client side)

39
Current Status
  • Schemas for XPipes and XComponents on
    xpipe.sourceforge.net. Feedback required
  • Sample components (Java/XSLT/Jython) and some
    documentation
  • Simple, illustrative XPipe uniprocessor
    executives
  • Draft of XJCL (XGrid Job Control Language)

40
Current Status
  • Uniprocessor XPipe used to develop
  • 80-C pipe from Hub notation for a complex
    document type to a legacy mainframe display
    notation. 120 page spec.
  • 20-C pipe for semantic validation of legislation
    documents
  • XPipe and XComponent validators

41
Current Problems
  • Everybody agrees that an XML document is a tree
    but:
  • The content and structure of the tree depends on
    the parser
  • The content and structure of re-generated XML
    (The round-tripping problem)

42
Current Problems
  • Naming things
  • Taxonomy of XTLs (XML Transformation Languages)
  • Taxonomy of re-usable XComponents and XPipes

43
Current Problems
  • Flexible transformation scheduling is hard
  • Optimal transformation scheduling is very hard
  • Packaging

44
Future Plans
  • Evangelize the idea that DTD validated XML 1.0 is
    just Well Formed XML that has been through a pipe
    consisting of:
  • A transclusion component (entity expansion)
  • A macro pre-processor (conditional marked
    sections)
  • An attribute decorator (implied/fixed attributes)
  • A grammar checker
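
One of these stages, the attribute decorator, is easy to sketch. The version below assumes the defaults have already been extracted from the DTD into a plain dictionary; that dictionary, the decorate function and the sample document are all hypothetical.

```python
import xml.etree.ElementTree as ET

def decorate(root, defaults):
    """Add each default attribute to matching elements that lack it."""
    for el in root.iter():
        for name, value in defaults.get(el.tag, {}).items():
            if name not in el.attrib:
                el.set(name, value)
    return root

# Hypothetical defaults, as a DTD might declare for <note lang="en">.
defaults = {"note": {"lang": "en"}}
decorated = decorate(ET.fromstring('<doc><note/><note lang="fr"/></doc>'), defaults)
langs = [note.get("lang") for note in decorated.iter("note")]
```

Explicit attribute values win over defaults, which matches how a validating parser supplies them.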

45
Valid XML
[Diagram: Well Formed XML -> Parameter Entity Expansion -> Conditional Sections -> General Entity Expansion -> Attribute Decoration -> Grammar Validation -> Valid XML]
46
Future Plans
  • XPipes and XComponents as web services
    (SOAP/XML-RPC, UDDI etc.)
  • Getting the P2P and Grid Technology communities
    input into XGrid.
  • Getting help to develop the XPipe reference
    implementation on Sourceforge

47
Future Plans
  • Development of commercial implementations of
    XPipe integrated with leading EAI systems
    (Ongoing)
  • Use of SCADA tools to develop XPipe process
    control and monitoring systems

48
Future Plans
  • Use of Animation Engineering techniques for CAXTE
    tools (Computer Aided XML Transformation
    Engineering)
  • Digging around hierarchy theory, self-assembly,
    bio-informatics and nanofabrication for concepts
    and tools applicable to XML transformations

49
In conclusion
  • XPipe is simple
  • Simplicity works!
  • Plenty of evidence outside of XML engineering
    that this approach will work
  • Plenty of lore and tools from other fields of
    science can be brought to bear to build systems
    using the XPipe approach

50
Thank you
  • http://xpipe.sourceforge.net