Title: Towards software components
1. COMPUTATIONAL RESEARCH DIVISION
Towards software components for efficient and easy communication and data integration in P2P networks
W. Hoschek, Jan. 21, 2005
2. Overview
- Communication and data integration in P2P networks is difficult
- Most APIs for network I/O either do not scale well, or are hard to use
- XML is great but its APIs are harder to use than expected (generality)
- Large overheads related to XML serialization and deserialization
- It need not be that way
3. Overview
- Seek to enable
  - robust and powerful commodity XML tool chains
  - while retaining good messaging performance
- Components for
  - asynchronous non-blocking network I/O
  - binary encoding of XML
  - XQuery/XPath manipulations of messages
- Trade-offs formed by
  - performance, usability, flexibility, expressive power
- Some preliminary performance results
4. Synchronous blocking I/O
- Requires one thread per connection
- Many threads lead to scheduling inefficiencies
- Concurrency and synchronization issues are subtle and non-intuitive, hard to find and debug
- Serious degradation under overload
5. Async non-blocking I/O
- Stages, Queues, Events, Event Handlers, Threads
- One (or few) threads for N connections
- Event-driven design rather than OO call interface
- Few concurrency bugs since (mostly) single-threaded
- Can avoid overload via explicit queue shaping/priorities
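The stage/queue/handler idea above can be sketched in plain Java. This is only an illustration, not the SEA toolkit's API: the names `StageSketch`, `Stage`, `enqueue` and `runEvents` are hypothetical. A stage here is a bounded event queue drained by a single handler thread, and a full queue makes `enqueue` return false so the caller can shed load instead of blocking.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical minimal "stage": a bounded event queue drained by one handler thread. */
public class StageSketch {

    static final class Stage<E> {
        private final BlockingQueue<E> queue;
        private final Consumer<E> handler;
        private final Thread worker;
        private volatile boolean running = true;

        Stage(int capacity, Consumer<E> handler) {
            this.queue = new ArrayBlockingQueue<>(capacity);
            this.handler = handler;
            this.worker = new Thread(this::drain, "stage-worker");
        }

        void start() { worker.start(); }

        /** Non-blocking: returns false when the queue is full, so callers can shed load. */
        boolean enqueue(E event) { return queue.offer(event); }

        /** Single-threaded event loop: handlers never run concurrently with each other. */
        private void drain() {
            try {
                while (running || !queue.isEmpty()) {
                    E event = queue.poll(50, TimeUnit.MILLISECONDS);
                    if (event != null) handler.accept(event);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        /** Lets already queued events finish, then waits for the worker to exit. */
        void stop() throws InterruptedException {
            running = false;
            worker.join();
        }
    }

    /** Runs n events (n >= 1) through one stage and returns them in handling order. */
    static List<String> runEvents(int n) throws InterruptedException {
        List<String> handled = new ArrayList<>();   // only the worker writes; read after join()
        Stage<String> stage = new Stage<>(n, handled::add);
        stage.start();
        for (int i = 0; i < n; i++) stage.enqueue("event-" + i);
        stage.stop();
        return handled;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runEvents(3));   // [event-0, event-1, event-2]

        Stage<String> idle = new Stage<>(2, e -> { });   // never started, so nothing drains
        System.out.println(idle.enqueue("a") + " " + idle.enqueue("b") + " " + idle.enqueue("c"));
        // true true false
    }
}
```

Rejecting events at a full queue is only one possible overload policy; priorities or per-stage queue shaping would need a different queue type.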
6. Implementation
- SEA toolkit
  - A layer on top of Java NIO
  - TCP and UDP
- Overhead of toolkit
  - (preliminary, not measuring network)
  - 30000 msg/s (tiny msg size)
  - 200 MB/s (large msg size)
- Documentation
  - http://dsd.lbl.gov/sea
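7. Easy API (Hello World Server)
// set up a network agent and a stage, then listen on port 9000
agent = new NetAgent();
myStage = new StageManager().createStage().start();
agent.addListenPort(myStage, 9000);
agent.start();

// event handler: called when a client connection has been accepted
onAccepted(rsp) {
    rsp.getAgent().enqueue(
        new ChannelRequest.WriteData(rsp.getKey().channel(), "hello world"));
}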
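For comparison, here is a minimal sketch of the same hello-world server written directly against Java NIO, the layer the slides say SEA builds on. It is not SEA code and makes no claim about how SEA is implemented: the class `NioHelloServer` and its methods are hypothetical names, and the tiny greeting is written in one go rather than through an OP_WRITE registration, which a real server would need for larger messages.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

/** Hypothetical single-threaded, non-blocking hello-world server on plain Java NIO. */
public class NioHelloServer implements Runnable, AutoCloseable {
    private final Selector selector;
    private final ServerSocketChannel server;

    NioHelloServer(int port) throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        server.configureBlocking(false);
        server.bind(new InetSocketAddress("127.0.0.1", port));   // port 0 = any free port
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    int port() { return server.socket().getLocalPort(); }

    /** Event loop: one thread waits on the selector for all connections. */
    @Override
    public void run() {
        try {
            while (true) {
                selector.select();                                // blocks until a channel is ready
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (key.isValid() && key.isAcceptable()) onAccepted();
                }
            }
        } catch (ClosedSelectorException | IOException e) {
            // close() was called or an I/O error occurred: this sketch simply ends the loop
        }
    }

    /** Plays the role of the onAccepted handler on the slide: greet the client, then hang up. */
    private void onAccepted() throws IOException {
        SocketChannel client = server.accept();
        if (client == null) return;
        client.configureBlocking(false);
        ByteBuffer out = ByteBuffer.wrap("hello world".getBytes(StandardCharsets.UTF_8));
        while (out.hasRemaining()) client.write(out);             // tiny message, fits the send buffer
        client.close();
    }

    @Override
    public void close() throws IOException {
        selector.close();
        server.close();
    }

    /** Starts a server on a free local port, connects once and returns what the server sent. */
    static String fetchGreeting() throws Exception {
        try (NioHelloServer srv = new NioHelloServer(0)) {
            Thread loop = new Thread(srv, "selector-loop");
            loop.setDaemon(true);
            loop.start();
            try (Socket s = new Socket("127.0.0.1", srv.port())) {
                return new String(s.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fetchGreeting());   // hello world
    }
}
```

The selector bookkeeping, key handling and shutdown logic take up most of this listing, which is the kind of boilerplate a toolkit layer is meant to hide.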
8. XML Serialization and Deserialization Overheads
- XML is complex and very general
- Standards-compliant XML handling leads to inefficiencies
- Serialization
  - 4-5 MB/s (standard textual XML)
  - 15-50 MB/s (bnux binary XML)
- Deserialization (parsing)
  - 2-11 MB/s (standard textual XML)
  - 30-101 MB/s (bnux binary XML)
- Data compression factor
  - 1.2 - 4
- Guarantees well-formed XML, preserving W3C XML Infoset and W3C Canonical XML(!)
9. Binary XML Applications
- Tightly coupled high-performance systems
  - exchanging large volumes of networked XML data
- Compact main memory caches
- Short-term storage as BLOBs
  - in backend databases or files
  - e.g. "session" data with limited duration
- Not a standard, thus
  - not intended as a replacement for standard textual XML in loosely coupled systems where maximum long-term interoperability is the overarching concern
  - not intended for long-term data storage
10. XML Serialization and Deserialization Overheads
- BNUX Binary XML
  - Eliminate tag redundancy via tokenization and string pooling
  - Eliminate DTD and XML Schema checking
  - Efficient buffer (re)use
  - Fast Unicode conversions
  - Careful implementation (profiling)
- Simplicity is key to performance
- Guarantees well-formed XML, preserving W3C XML Infoset and W3C Canonical XML(!)
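To illustrate the tokenization and string pooling bullet, here is a small sketch of the general technique: each distinct name is stored once in a pool and every further occurrence is replaced by its integer token. This is not the actual bnux format, whose layout the slides do not describe; `StringPoolSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical string pool: stores each distinct name once and hands out small integer tokens. */
public class StringPoolSketch {
    private final Map<String, Integer> tokens = new HashMap<>();
    private final List<String> names = new ArrayList<>();

    /** Returns the token for a name, adding the name to the pool on first sight. */
    int tokenOf(String name) {
        Integer token = tokens.get(name);
        if (token == null) {
            token = names.size();
            tokens.put(name, token);
            names.add(name);
        }
        return token;
    }

    /** Reverse lookup, as a decoder would do when rebuilding the document. */
    String nameOf(int token) { return names.get(token); }

    int size() { return names.size(); }

    /** Replaces a sequence of (possibly repeated) names by their tokens. */
    int[] encode(String... sequence) {
        int[] out = new int[sequence.length];
        for (int i = 0; i < sequence.length; i++) out[i] = tokenOf(sequence[i]);
        return out;
    }

    public static void main(String[] args) {
        StringPoolSketch pool = new StringPoolSketch();
        // element names in document order for a small bibliography
        int[] encoded = pool.encode("bib", "book", "title", "book", "title", "book", "title");
        System.out.println(Arrays.toString(encoded));   // [0, 1, 2, 1, 2, 1, 2]
        System.out.println(pool.size());                // 3
        System.out.println(pool.nameOf(2));             // title
    }
}
```

Seven tag occurrences shrink to three pooled strings plus seven small integers, which is where the redundancy elimination in markup-heavy documents comes from.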
11. Easy BNUX API
Document doc = new Builder().build("/tmp/test.xml");

// write binary XML document to file
BinaryXMLCodec codec = new BinaryXMLCodec();
byte[] data = codec.serialize(doc, 0);
new FileOutputStream("/tmp/test.xml.bnux").write(data);

// read binary XML document from file
data = XOMUtil.toByteArray(new FileInputStream("/tmp/test.xml.bnux"));
doc = codec.deserialize(data);
System.out.println(doc);
12. Easy yet powerful XML: XQuery and XPath
- Manipulating and querying XML data
  - Manual SAX/DOM cumbersome at best
  - XSLT often too complicated
  - Most APIs have steep learning curve, contain quite a few bugs
- XQuery and XPath are powerful yet simple
  - Essentially SQL for XML data

Document doc = ...;
// retrieve timeout of a given transaction
// from XML protocol message
timeout = XQueryUtil.xquery(doc, "/open[transactionID=123]/scope/timeout");
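A comparable lookup can be tried without any extra library through the JDK's XPath 1.0 API. This is a sketch under assumptions: the message layout (an `open` element with `transactionID` and `scope/timeout` children) is a hypothetical reading of the slide's query, and `TimeoutLookup` is an illustrative name, not part of Nux.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

/** Hypothetical helper: reads the timeout of one transaction from an XML protocol message. */
public class TimeoutLookup {

    /** Returns the timeout text, or an empty string if no transaction with that id exists. */
    static String timeoutOf(String xml, int transactionId) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // assumed message layout: <open><transactionID/><scope><timeout/></scope></open>
        String query = "/open[transactionID=" + transactionId + "]/scope/timeout";
        return xpath.evaluate(query, new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String message =
            "<open><transactionID>123</transactionID>"
          + "<scope><timeout>30</timeout></scope></open>";
        System.out.println(timeoutOf(message, 123));   // 30
    }
}
```

One path expression replaces what would otherwise be a hand-written DOM walk or SAX handler.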
13. XQueries can be powerful
- List books published by Addison-Wesley after 1991, including their year and title

<bib>
{
  for $b in doc("http://bstore1.example.com/bib.xml")/bib/book
  where $b/publisher = "Addison-Wesley" and $b/@year > 1991
  return
    <book year="{ $b/@year }">
      { $b/title }
    </book>
}
</bib>
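For contrast, the same selection can be sketched with the JDK's XPath 1.0 API. XPath alone can filter the books, but it has no element constructors, so the result has to be assembled by hand in Java, which is the step the XQuery above expresses declaratively. The class `BibFilter` and the inline sample data are hypothetical, and the publisher string is concatenated into the query without escaping, which is acceptable only for a sketch.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: lists "year: title" for books of one publisher after a given year. */
public class BibFilter {

    static List<String> booksAfter(String xml, String publisher, int year) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // the filter part of the query; no escaping of the publisher value in this sketch
        String query = "/bib/book[publisher='" + publisher + "' and @year > " + year + "]";
        NodeList books = (NodeList) xpath.evaluate(
            query, new InputSource(new StringReader(xml)), XPathConstants.NODESET);

        // the "return" part has to be built manually
        List<String> result = new ArrayList<>();
        for (int i = 0; i < books.getLength(); i++) {
            Element book = (Element) books.item(i);
            String title = book.getElementsByTagName("title").item(0).getTextContent();
            result.add(book.getAttribute("year") + ": " + title);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // made-up sample data, not the contents of the bib.xml referenced on the slide
        String bib =
            "<bib>"
          + "<book year='1990'><title>Book A</title><publisher>Addison-Wesley</publisher></book>"
          + "<book year='1994'><title>Book B</title><publisher>Addison-Wesley</publisher></book>"
          + "<book year='2000'><title>Book C</title><publisher>Other Press</publisher></book>"
          + "</bib>";
        System.out.println(booksAfter(bib, "Addison-Wesley", 1991));   // [1994: Book B]
    }
}
```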
14. Implementation
- Leveraging existing software
  - standards compliance, efficiency, maturity
- Designed straightforward API (Nux)
  - Internally glues Saxon XQuery engine to XOM library
  - Tricky internals (!)
- Preliminary performance for simple queries
  - 2000 (100000) queries/sec over 100 (0.5) KB input documents, i.e. 200 (50) MB/s
  - served from memory, commodity PC 2004, Java 1.5
- Example ballpark figures
  - use cases, documents and query complexity can vary wildly
- Documentation at http://dsd.lbl.gov/nux
15. Putting it all together
- Components
  - Async non-blocking I/O
  - Binary XML encoding
  - XQuery and XPath
- Enable use of
  - robust and powerful commodity XML tool chains
  - while retaining good performance
- Sweet spot in the space of trade-offs formed by performance, usability, flexibility and expressive power
16. Future Work
- Integration with firefish, scishare, P2P routing and maintenance strategies, etc. as outlined in LDRD
- More detailed performance studies
  - End-to-end rather than isolated studies