Title: Streaming Processing of Large XML Data
1. Streaming Processing of Large XML Data
- Jana Dvoráková, Filip Zavoral
- processing of large XML data using XSLT with optimal memory complexity
- formal model / implementation framework
- analyzer, SSXT / BUXT transformer
2. SSXT - streaming transducer
- Simple Streaming Xml Transducer
- no backward axis, no predicates, no variables
- order-preserving
- branch-disjoint
- → stack / document depth
- BUXT - Buffering Transducer
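
The "stack / document depth" point can be illustrated with a minimal sketch. It is not the SSXT implementation; it only assumes a SAX-style event stream and shows that a processor which keeps just the path to the current element needs a stack as deep as the document, regardless of how many elements it reads. The names DepthTrackingHandler and stream_stats are hypothetical.

```python
import xml.sax


class DepthTrackingHandler(xml.sax.ContentHandler):
    """Hypothetical handler: stores only the path to the current element."""

    def __init__(self):
        super().__init__()
        self.stack = []          # open elements, one entry per document level
        self.max_depth = 0       # largest stack size observed
        self.elements_seen = 0   # total number of start-tags read

    def startElement(self, name, attrs):
        self.stack.append(name)
        self.elements_seen += 1
        self.max_depth = max(self.max_depth, len(self.stack))

    def endElement(self, name):
        # Leaving an element releases its stack entry immediately.
        self.stack.pop()


def stream_stats(xml_bytes):
    """Return (number of elements read, maximum stack depth)."""
    handler = DepthTrackingHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.elements_seen, handler.max_depth


# Illustrative input: many records, but only three levels of nesting.
record = b"<article><author>A</author><title>T</title></article>"
doc = b"<dblp>" + record * 1000 + b"</dblp>"
seen, depth = stream_stats(doc)
print(seen, depth)
```

Adding more records increases the number of elements read but leaves the maximum stack depth unchanged, which is the property the restrictions above are meant to guarantee.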
3. Xord framework - Analyzer
- Analyzer
- inputs: XSLT, XSD
- virtually applies templates to schema
- all possible node sequences are processed
- regexp
- all possible node sequences selected by XPath expressions
- possible reading orders of the elements
- names
- sequence of element names in the order they are called
- represents the processing order of the elements
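
The comparison between reading order and processing order can be sketched in a heavily simplified form. The sketch assumes the schema content model is a plain sequence of child element names (the real analyzer works with regular expressions over all possible node sequences), and it is not the Xord analyzer itself. The function name is_order_preserving and the example element names are hypothetical.

```python
def is_order_preserving(schema_order, call_order):
    """Check that elements are called in the order they can be read.

    schema_order: child element names in document order (assumed to be a
                  plain sequence content model).
    call_order:   element names in the order the templates call them.
    """
    position = {name: i for i, name in enumerate(schema_order)}
    # Names the schema does not allow select nothing, so they are skipped.
    indices = [position[name] for name in call_order if name in position]
    # Strictly increasing positions mean one forward pass is sufficient.
    return all(a < b for a, b in zip(indices, indices[1:]))


schema = ["author", "title", "year"]
print(is_order_preserving(schema, ["author", "year"]))   # forward only
print(is_order_preserving(schema, ["title", "author"]))  # needs to go back
```

Under these assumptions, a call order that goes backwards in the schema order would require buffering or a second pass, so such a stylesheet would not fit the simple streaming case.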
4. SSXT Transformer
- Polymorphic stack
- two types of transformation states: DFA and CC (cycle configuration)
- related to current document level
- sequence of deterministic finite automata states
- concurrent evaluation of XPath expressions
- single DFA for each expression
- start-tag → DFA transition
- final state → template call
- cycle configuration
- template and template call being processed
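
The DFA part of the stack can be sketched as follows. This is an illustration under stated assumptions, not the SSXT transformer: expressions are restricted to simple child-axis paths, the cycle configuration is not modelled, and a "template call" is only recorded in a list. PathDFA, StackTransducer, and run are hypothetical names.

```python
import xml.sax

DEAD = -1  # sink state: the expression cannot match below this node


class PathDFA:
    """DFA for a simple child-axis path such as /dblp/article/title."""

    def __init__(self, path):
        self.steps = path.strip("/").split("/")

    def step(self, state, name):
        # State i means "the first i steps have matched".
        if state != DEAD and state < len(self.steps) and self.steps[state] == name:
            return state + 1
        return DEAD

    def is_final(self, state):
        return state == len(self.steps)


class StackTransducer(xml.sax.ContentHandler):
    """Keeps one tuple of DFA states per open document level."""

    def __init__(self, paths):
        super().__init__()
        self.paths = list(paths)
        self.dfas = [PathDFA(p) for p in self.paths]
        self.stack = [tuple(0 for _ in self.dfas)]  # states at the document root
        self.calls = []  # stand-in for template calls

    def startElement(self, name, attrs):
        # start-tag: every DFA makes one transition from the current level.
        current = self.stack[-1]
        following = tuple(d.step(s, name) for d, s in zip(self.dfas, current))
        self.stack.append(following)
        # final state: the matching template would be called here.
        for path, dfa, state in zip(self.paths, self.dfas, following):
            if dfa.is_final(state):
                self.calls.append(path)

    def endElement(self, name):
        # end-tag: drop the states of the level being closed.
        self.stack.pop()


def run(xml_bytes, paths):
    transducer = StackTransducer(paths)
    xml.sax.parseString(xml_bytes, transducer)
    return transducer.calls


doc = (b"<dblp><article><title>T1</title><year>2008</year></article>"
       b"<book><title>T2</title></book></dblp>")
calls = run(doc, ["/dblp/article/title", "/dblp/book"])
print(calls)
```

All expressions advance together on each start-tag, so the input is read once, and the stack never holds more entries than the current nesting depth plus one.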
5. Evaluation Comparison
Memory consumption (MB) of the SSXT algorithm and tree-based XSLT processors for input XML data of different sizes
DBLP.xml: 700 MB
6. Future work
- buffering transformer optimizations and evaluation
- multipass streaming algorithms
- overcoming some restrictions to XSLT constructs