Title: OneSAF: XML Performance in Simulation 03SSIW030
1 OneSAF XML Performance in Simulation (03S-SIW-030)
- Boaventura (Ben) DaCosta
- Robin Outar
2 Agenda
- OneSAF Overview
- XML Overview
- Document Object Model (DOM) Overview
- Simple API for XML (SAX) Overview
- XML Performance Measures
- XML Parsing
- Validation
- Namespaces
- Performance Benchmarks
- XML Encoding
- OneSAF XML Performance Tips
- Future Efforts
3 OneSAF Overview
A composable, next-generation Computer-Generated Forces (CGF) that can represent a full range of operations, systems, and control processes (TTP) from entity up to brigade level, with variable levels of fidelity, and that supports applications in multiple Army Modeling and Simulation (M&S) domains (ACR, RDA, TEMO)
- Software only
- Automated, composable, extensible, interoperable
- Platform independent
- Fielded to: National Guard armories, RDECs / Battle Labs, Reserve training centers, and all active-duty brigades and battalions
- Designed to eventually replace legacy entity-based simulations: BBS, ModSAF, JANUS, CCTT SAF, AVCATT SAF
4 eXtensible Markup Language Overview
- XML is a meta-language, not a programming language
- XML is a family of technologies that includes XML, XML Schema, XSL/XSLT, and XPath
- XML helps define rules (grammar) for designing structured data
- XML is a language for describing languages
- XML was created by the World Wide Web Consortium (W3C) and released in 1998
- Think of structured data as spreadsheets, address books, configuration files, etc.
5 eXtensible Markup Language Overview
- XML looks like HTML, but it isn't
- HTML specifies what each tag means and how the text will be formatted
HTML (tags elements for presentation):
    <table>
      <tr><td>EMPNO</td><td>EMPNAME</td></tr>
      <tr><td>123456</td><td>John Adams</td></tr>
    </table>
- XML uses tags to delimit data; what the tags mean is left to the client
XML (tags elements as data):
    <?xml version="1.0"?>
    <EMPLOYEE>
      <EMPNO>123456</EMPNO>
      <EMPNAME>John Adams</EMPNAME>
    </EMPLOYEE>
- XML is a vehicle for sharing and interchanging structured data
6 Document Object Model (DOM) Overview
- Documents are typically logically structured in memory as a hierarchical tree
- The DOM is a platform-independent and language-neutral API that allows applications to dynamically (1) read, (2) manipulate, and (3) write the content, structure, and style of both HTML and XML documents
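The read, manipulate, and write steps can be sketched with the DOM support that ships in the JDK (JAXP). This is a minimal illustration, not OneSAF code: the class name `DomSketch`, its helper methods, and the employee record are assumptions made for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
class DomSketch {
    // (1) Read: parse the text into an in-memory tree.
    static Document read(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // (2) Manipulate: replace the text of the first EMPNAME element.
    static void rename(Document doc, String newName) {
        Element name = (Element) doc.getElementsByTagName("EMPNAME").item(0);
        name.setTextContent(newName);
    }

    // (3) Write: serialize the tree back to text with an identity transform.
    static String write(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    static String roundTrip(String xml, String newName) throws Exception {
        Document doc = read(xml);
        rename(doc, newName);
        return write(doc);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<EMPLOYEE><EMPNO>123456</EMPNO><EMPNAME>John Adams</EMPNAME></EMPLOYEE>";
        System.out.println(roundTrip(xml, "Jane Adams"));
    }
}
```

The whole document is held in memory between steps (1) and (3), which is what makes in-place edits possible and is also the source of DOM's memory cost.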
7 Simple API for XML (SAX) Overview
- SAX is considered by many to be the de facto standard for event-based XML parsing
- SAX supports only reading
XML Document:
    <X><Y1><Z>foo</Z></Y1></X>
SAX Callbacks:
    startDocument()
    startElement(X)
    startElement(Y1)
    startElement(Z)
    characters("foo")
    endElement(Z)
    endElement(Y1)
    endElement(X)
    endDocument()
- SAX is supported in a number of languages and parsers: Microsoft's MSXML 3.0, Pascal, SAX in C, Xerces-C, and others
- SAX is typically faster than DOM because it does not build an in-memory tree
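The callback sequence on this slide can be reproduced with the SAX parser bundled in the JDK. This is a small sketch under that assumption; the class `SaxSketch` and its `trace` method are names invented for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
class SaxSketch {
    // Records each SAX callback as a line of text, in the order the parser fires them.
    static List<String> trace(String xml) throws Exception {
        final List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override public void startDocument() { events.add("startDocument()"); }
            @Override public void endDocument() { events.add("endDocument()"); }
            @Override public void startElement(String uri, String localName, String qName, Attributes atts) {
                events.add("startElement(" + qName + ")");
            }
            @Override public void endElement(String uri, String localName, String qName) {
                events.add("endElement(" + qName + ")");
            }
            @Override public void characters(char[] ch, int start, int length) {
                // Skip whitespace-only text so indentation does not clutter the trace.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) events.add("characters(\"" + text + "\")");
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : trace("<X><Y1><Z>foo</Z></Y1></X>")) System.out.println(event);
    }
}
```

The handler only observes the document as it streams past; nothing is retained unless the application chooses to store it, which is why SAX is read-only.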
8 XML Performance Measures
- OneSAF XML performance has been measured in two ways
  - The amount of memory used in the parsing and/or translation of XML documents
  - The execution speed of the parsing and/or translation of XML documents
- A number of factors may affect these measures
  - Physical memory constraints (RAM)
  - Parsers used
  - Validation of XML documents
  - Use of namespaces
  - XML encoding
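Both measures can be approximated from inside a Java program. The sketch below is a crude stand-in for a profiler such as the JProbe tool named later in the slides: the heap figure is only a rough estimate because garbage collection can run at any time, and the class name, method names, and synthetic document are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical measurement helper, for illustration only.
class ParseMeasurementSketch {
    // Builds a synthetic document with the requested number of <item> children.
    static String syntheticDocument(int items) {
        StringBuilder sb = new StringBuilder("<root>");
        for (int i = 0; i < items; i++) {
            sb.append("<item id=\"").append(i).append("\">value</item>");
        }
        return sb.append("</root>").toString();
    }

    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Returns { elapsed nanoseconds, approximate heap growth in bytes, element count }.
    static long[] measureDomParse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        System.gc(); // only a hint to the JVM; the heap figure stays approximate
        long heapBefore = usedHeapBytes();
        long start = System.nanoTime();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        long elapsed = System.nanoTime() - start;
        long heapGrowth = usedHeapBytes() - heapBefore;
        long elements = doc.getElementsByTagName("*").getLength();
        return new long[] { elapsed, heapGrowth, elements };
    }

    public static void main(String[] args) throws Exception {
        long[] r = measureDomParse(syntheticDocument(40000));
        System.out.println("elements=" + r[2] + " elapsedMs=" + (r[0] / 1_000_000) + " heapGrowthBytes~" + r[1]);
    }
}
```

A single run like this is noisy; repeating the parse several times and discarding the first (warm-up) runs gives steadier numbers.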
9 XML Parsing
- XML documents must be parsed in order to access the data stored in them
- Parsing and validation of XML data in OneSAF is currently achieved using Xerces
  - Xerces is an XML parser that complies with XML Schema and provides support for XML validation and eXtensible Stylesheet Language Transformations (XSLT)
- The biggest problem with using DOM is memory limitations
  - A one-megabyte XML document can use as much as ten megabytes of RAM
- The biggest problem with using SAX is that it is read-only
- OneSAF uses a combination of DOM and SAX
10 XML Validation
- Validation ensures that data content conforms to the grammar and structure defined by the DTD or XML Schema that it references
- Validation is an important part of the OneSAF data architecture because it confirms and verifies that the data stored in any one XML document conforms to the grammar and structure that define it
- OneSAF currently uses Xerces 1.4.3 (with current efforts moving towards Xerces 2) to validate all XML content against both XML Schema and DTDs
  - Uses the W3C XML Schema 1.0 Recommendation
  - DTDs are supported only for legacy systems
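As an illustration of what schema validation checks, the sketch below validates a small document against an XML Schema using the JDK's standard `javax.xml.validation` API. It is not the Xerces 1.4.3 configuration the slides describe; the schema, the documents, and the class name `SchemaValidationSketch` are made up for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper class, for illustration only.
class SchemaValidationSketch {
    // Illustrative schema: an EMPLOYEE holds an integer EMPNO followed by a string EMPNAME.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='EMPLOYEE'><xs:complexType><xs:sequence>"
        + "<xs:element name='EMPNO' type='xs:int'/>"
        + "<xs:element name='EMPNAME' type='xs:string'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Returns true when the document conforms to the schema above.
    static boolean isValid(String xml) throws IOException {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, a validation error is thrown as a SAXException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(isValid("<EMPLOYEE><EMPNO>123456</EMPNO><EMPNAME>John Adams</EMPNAME></EMPLOYEE>"));
        System.out.println(isValid("<EMPLOYEE><EMPNO>not-a-number</EMPNO><EMPNAME>John Adams</EMPNAME></EMPLOYEE>"));
    }
}
```

Compiling the schema once and reusing the `Schema` object across documents avoids paying the schema-loading cost on every validation.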
11 XML Namespaces
- Namespaces allow documents to use multiple markup vocabularies from external sources through URI references
- Namespaces promote reuse of markup instead of re-inventing it
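The effect of namespace support on a parsed document can be seen by parsing the same text with the feature on and off. This is a sketch using the JDK's DOM parser; the `urn:example:` URIs, the element names, and the class `NamespaceSketch` are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
class NamespaceSketch {
    // Two vocabularies in one document: orders (default namespace) and units (prefix "u").
    static final String XML =
          "<order xmlns='urn:example:orders' xmlns:u='urn:example:units'>"
        + "<u:weight>12</u:weight></order>";

    // Returns { node name, namespace URI, local name } for the weight element.
    static String[] describeWeight(boolean namespaceAware) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
        Node weight = doc.getDocumentElement().getFirstChild();
        return new String[] { weight.getNodeName(), weight.getNamespaceURI(), weight.getLocalName() };
    }

    public static void main(String[] args) throws Exception {
        System.out.println(String.join(" | ", describeWeight(true)));
        String[] unaware = describeWeight(false);
        System.out.println(unaware[0] + " | " + unaware[1] + " | " + unaware[2]);
    }
}
```

With namespace support off, the parser treats `u:weight` as an opaque name and reports no namespace URI, which is less work for the parser but loses the link to the vocabulary.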
12 XML Validation and Namespace Performance
- OneSAF has examined XML performance using both version 1.4.3 and version 2 of the Xerces parser
- Performance benchmarking and testing were performed in the following environment
  - XML document specifications: file size 1.95 MB (2,055,937 bytes), element count 40,735
  - Software
    - JProbe 4.0.2 (benchmarking software)
    - JUnit testing framework (unit testing software)
    - Xerces 1.4.3 and Xerces 2.0 (XML parsing API)
    - JDK 1.4.1 (JVM version)
  - OneSAF software: DOMReader and TestDOMReader classes
  - Operating system: Microsoft Windows 2000 with Service Pack 3
  - Hardware: Pentium III 1.0 GHz with 512 MB RAM and 60 GB HDD
13 XML Validation and Namespace Performance
- OneSAF Xerces Benchmarking Results are summarized
here
14 XML Validation and Namespace Overall Performance
- Xerces 1.4.3
  - Overall, turning on namespace and validation support increased memory consumption by approximately 44.5% and made processing approximately 1.8 times slower than with both turned off
- Xerces 2
  - Overall, turning on namespace and validation support increased memory consumption by approximately 1% and made processing approximately 2.2 times slower than with both turned off
15 XML Validation and Namespace Overall Performance
- The performance improvement from Xerces 1.4.3 to Xerces 2 is significant
  - With validation and namespaces both turned on, documents took slightly longer to parse, but the memory consumed during parsing was less than half
  - This has allowed OneSAF to double the size of the XML documents it can successfully parse in DOM, compared with what it could originally parse using Xerces 1.4.3
16 XML Encoding
- The most commonly used encodings are ASCII (US-ASCII) and Unicode (UTF-8 and UTF-16)
- The W3C requires that all XML processors support UTF-8 and UTF-16
- US-ASCII characters are guaranteed to be a single byte and map directly to the equivalent Unicode value -- FAST
- UTF-8 and UTF-16 can require multiple-byte sequences to be read and converted for each character -- SLOW
- Use US-ASCII if characters DO NOT go beyond the ASCII range; otherwise use UTF-8 or UTF-16
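The byte counts behind the single-byte versus multiple-byte distinction can be checked directly. The sketch below is only an illustration of encoded sizes, not a parsing benchmark; the class name `EncodingSizeSketch` and the sample strings are assumptions.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class EncodingSizeSketch {
    // Returns the encoded length in bytes as { UTF-8, UTF-16 (big-endian, no byte order mark) }.
    static int[] encodedSizes(String text) {
        return new int[] {
            text.getBytes(StandardCharsets.UTF_8).length,
            text.getBytes(StandardCharsets.UTF_16BE).length
        };
    }

    public static void main(String[] args) {
        int[] ascii = encodedSizes("John Adams");      // ASCII-range characters only
        int[] accented = encodedSizes("M\u00fcller");  // contains one character beyond ASCII
        System.out.println("John Adams: utf8=" + ascii[0] + " utf16=" + ascii[1]);
        System.out.println("Mueller with umlaut: utf8=" + accented[0] + " utf16=" + accented[1]);
    }
}
```

ASCII-range characters still occupy one byte each in UTF-8, while UTF-16 uses at least two bytes per character; only characters beyond the ASCII range cost extra bytes in UTF-8.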
17 XML Encoding
- OneSAF testing and research found UTF-8 to be the best encoding
18 XML General Performance Tips
- Understand your data
  - Forecast what a typical fielded XML document size might be
  - Examine the worst-case scenario
  - Determine the best XML technologies for the short and long term
  - For example, documents may be small during development and DOM may suffice as a solution, but will these documents grow once fielded? Will DOM still be a good solution then? Is this solution scalable?
- Don't use XML where it doesn't make sense
  - Avoid using XML when there is no existing or future purpose
  - Doing a lot of translation and/or parsing may result in bad performance
  - Manipulating XML requires CPU resources and memory, and may be network intensive
19 XML General Performance Tips
- More than one XML solution may be needed
  - A number of XML technologies may be necessary to accomplish a task
  - For example, both DOM and SAX may have to be used
  - Don't limit an implementation to only one XML technology
- Examine hardware requirements
  - Depending on the implementation, XML can be CPU, memory, and network intensive
  - Make certain development and fielded hardware can support XML use
  - If memory is limited, avoid DOM and consider SAX
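When memory is the constraint, a SAX handler can compute a summary of a document without ever holding the document in memory. The sketch below counts elements this way; the class `SaxElementCounterSketch` is a hypothetical example, not part of OneSAF.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
class SaxElementCounterSketch {
    // Counts elements while streaming; the only state kept is a single counter.
    static int countElements(String xml) throws Exception {
        final int[] count = { 0 };
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                count[0]++;
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countElements("<X><Y1><Z>foo</Z></Y1></X>"));
    }
}
```

The same pattern extends to sums, lookups of a single value, or filtering, as long as the result does not require revisiting earlier parts of the document.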
20 XML Parsing Performance Tips
- Keep XML documents small
  - The bigger the documents, the higher the parsing/translation costs and the worse the performance
  - If documents are too large, consider logically breaking them up into smaller XML documents
- Reduce the character count
  - Replace elements with attributes where it makes sense
  - Avoid excessive use of spaces because parsers must scan through them
  - Use tabs in place of spaces if possible
  - Avoid lengthy element and attribute names
    - <systemRepositoryServiceIndexFileRootElementTag> becomes <index>
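The saving from shorter names is simple arithmetic: each element name appears in both its start tag and its end tag, so every character removed from the name saves two characters per occurrence. The sketch below works this out for the tag names on this slide; the class `MarkupSizeSketch`, the `<root>` wrapper, and the `<item>` record are assumptions for illustration.

```java
// Hypothetical helper class, for illustration only.
class MarkupSizeSketch {
    static final String LONG_TAG = "systemRepositoryServiceIndexFileRootElementTag";
    static final String SHORT_TAG = "index";

    // Wraps each record value in an element with the given tag name.
    static String withTagName(String tag, int records) {
        StringBuilder sb = new StringBuilder("<root>");
        for (int i = 0; i < records; i++) {
            sb.append('<').append(tag).append('>').append(i).append("</").append(tag).append('>');
        }
        return sb.append("</root>").toString();
    }

    // Characters saved by switching from the long tag name to the short one.
    static int charactersSaved(int records) {
        return withTagName(LONG_TAG, records).length() - withTagName(SHORT_TAG, records).length();
    }

    public static void main(String[] args) {
        System.out.println("saved over 1000 records: " + charactersSaved(1000));
        // Same data carried as a child element versus as an attribute.
        System.out.println("child element form: " + "<item><id>7</id></item>".length());
        System.out.println("attribute form:     " + "<item id=\"7\"/>".length());
    }
}
```

Shorter names trade readability for size, so the tip applies best to names that repeat many times in large documents.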
21 XML Parsing Performance Tips
- Explicit or meta-model
  - Look at whether an explicit or meta-model approach should be used
  - Meta-model approaches simplify schemas, but increase the grammar needed in content documents, which may also inhibit validation
  - Consider redesigning the XML grammar and structure to decrease the number of elements and attributes
- Avoid default values in attributes
  - Too many simply slow down processing
- Avoid external entities and DTDs
  - Loading and processing them adds overhead
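Where a document cannot be changed and still references an external DTD, one common workaround is a SAX `EntityResolver` that answers every external request with an empty stream, so nothing is fetched. This is a sketch of that idiom with the JDK's DOM parser, not a OneSAF technique; the class name and the DTD URL are hypothetical. The trade-off is that any default attribute values or entity definitions declared in the skipped DTD are lost.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
class SkipExternalDtdSketch {
    // Parses the document without fetching any external DTD or entity; returns the root name.
    static String rootNameIgnoringDtd(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(new EntityResolver() {
            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                // Supply empty content instead of letting the parser open systemId.
                return new InputSource(new StringReader(""));
            }
        });
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getNodeName();
    }

    public static void main(String[] args) throws Exception {
        // The DTD location below is a made-up address that does not need to exist.
        String xml = "<!DOCTYPE X SYSTEM \"http://example.invalid/missing.dtd\"><X><Y1/></X>";
        System.out.println(rootNameIgnoringDtd(xml));
    }
}
```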
22 XML Parsing Performance Tips
- Reuse parser instances whenever possible
  - Don't create a new parser each time you need one
  - Create a pool of reusable parser instances (especially in a multi-threaded environment where multiple parsers need to run at once)
- Turn validation off when not needed
  - Validation is expensive
  - Only validate when you have to
  - If using DTDs, avoid using DOCTYPE in XML documents. Some parsers will read the DTD if DOCTYPE is specified even if validation is turned off
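A parser pool can be sketched with a blocking queue: parsers are created once, borrowed for each parse, and returned afterwards. This is one possible design under the assumption that JAXP `DocumentBuilder` instances are the parsers being pooled; the class `ParserPoolSketch` and its methods are invented for the example.

```java
import java.io.StringReader;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical pool, for illustration only.
class ParserPoolSketch {
    private final BlockingQueue<DocumentBuilder> idle;

    // Creates all parsers up front so no parse pays the construction cost.
    ParserPoolSketch(int size) throws ParserConfigurationException {
        idle = new ArrayBlockingQueue<>(size);
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        for (int i = 0; i < size; i++) {
            idle.add(factory.newDocumentBuilder());
        }
    }

    Document parse(String xml) throws Exception {
        DocumentBuilder builder = idle.take(); // blocks until a parser is free
        try {
            return builder.parse(new InputSource(new StringReader(xml)));
        } finally {
            builder.reset();   // restore the builder's original configuration
            idle.add(builder); // hand the parser back for the next caller
        }
    }

    int idleCount() {
        return idle.size();
    }

    public static void main(String[] args) throws Exception {
        ParserPoolSketch pool = new ParserPoolSketch(2);
        System.out.println(pool.parse("<a/>").getDocumentElement().getNodeName());
        System.out.println("idle parsers: " + pool.idleCount());
    }
}
```

A `DocumentBuilder` is not safe for concurrent use by several threads, which is why each caller borrows one exclusively; the pool size then caps how many parses run at once.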
23 XML Parsing Performance Tips
- Check parser configuration carefully
  - Parsers may perform differently depending on whether DTDs or XML Schema are used
  - Check for the recommended parser configuration (if there is one)
  - Check for default features being turned on
  - Only use what you need
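One way to check what a parser has turned on by default is to ask the factory before configuring it. The sketch below reads the standard JAXP `DocumentBuilderFactory` settings; Xerces-specific features are configured differently and are not covered here. The class name `ParserDefaultsSketch` is an assumption.

```java
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical helper class, for illustration only.
class ParserDefaultsSketch {
    // Reports the settings of a freshly created, unconfigured factory.
    static String describeDefaults() {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        return "namespaceAware=" + factory.isNamespaceAware()
             + ", validating=" + factory.isValidating()
             + ", expandEntityReferences=" + factory.isExpandEntityReferences()
             + ", ignoringComments=" + factory.isIgnoringComments()
             + ", coalescing=" + factory.isCoalescing();
    }

    public static void main(String[] args) {
        System.out.println(describeDefaults());
    }
}
```

Printing the settings once at startup makes it obvious whether validation or namespace processing is on before any performance measurement is taken.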
24 XML Parsing Performance Tips
- Use the appropriate encoding
  - The three most common encoding schemes are ASCII ("US-ASCII") and Unicode ("UTF-8" or "UTF-16")
  - The W3C XML 1.0 Recommendation requires parsers to assume UTF-8 if no encoding is specified
  - US-ASCII is the fastest to parse because each character is guaranteed to be a single byte and maps directly to its equivalent Unicode value
  - Documents needing Unicode characters beyond the ASCII range must use either "UTF-8" or "UTF-16"
  - Multiple-byte sequences must be read and converted for each character, resulting in a performance hit
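The rule of thumb above (US-ASCII when the content stays within the ASCII range, Unicode otherwise) can be expressed as a small check. This is a sketch of that rule only, with UTF-8 chosen as the fallback; the class `EncodingChoiceSketch` and its methods are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class EncodingChoiceSketch {
    // Picks US-ASCII when every character is in the ASCII range, UTF-8 otherwise.
    static String chooseEncoding(String content) {
        boolean asciiOnly = StandardCharsets.US_ASCII.newEncoder().canEncode(content);
        return asciiOnly ? "US-ASCII" : "UTF-8";
    }

    // Builds the XML declaration that states the chosen encoding.
    static String declarationFor(String content) {
        return "<?xml version=\"1.0\" encoding=\"" + chooseEncoding(content) + "\"?>";
    }

    public static void main(String[] args) {
        System.out.println(declarationFor("<EMPNAME>John Adams</EMPNAME>"));
        System.out.println(declarationFor("<EMPNAME>M\u00fcller</EMPNAME>"));
    }
}
```

The declared encoding must match the bytes actually written, so the same choice has to be passed to whatever writer serializes the document.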
25 Summary
- OneSAF is still learning from the use of XML
- XML is a maturing technology
  - Today it has become more than just a document markup language; it is a viable vehicle for sharing and interchanging structured data
  - As XML grows in popularity and becomes more widespread, better solutions will become available to address the performance concerns being tackled today
26 Contact Information
- Boaventura (Ben) DaCosta
- Dynamics Research Corporation
- 407-380-1200
- bdacosta_at_drc.com
-
- Robin Outar
- Science Applications International Corporation
- 321-235-7660
- routar_at_ideorlando.org