Title: Tycho: A Resource Discovery and Messaging Framework for Distributed Applications
1. Tycho: A Resource Discovery and Messaging Framework for Distributed Applications
Matthew Grove, m.grove_at_reading.ac.uk
Viva Presentation, November 2006
2. Outline
- Research Goals,
- An Overview of Tycho,
- Comparative Benchmarks,
- Applications of Tycho,
- Tycho Swarm, a File Distribution Utility (Demo),
- Summary.
3. Some Background
- Two key services for distributed systems are a mechanism for discovering remote components (such as a registry) and a means of sending messages between these components.
- These two services are interdependent.
- Current solutions require application scientists to assemble their systems from a diverse range of services.
- One approach has been to produce toolkits with pre-selected sets of services bundled together, for example Globus.
4. Research Goals
- The thesis of this research is that combining registry and messaging into a single software framework simplifies the task of binding together distributed systems.
- The proposed solution uses an Internet-based architecture that keeps complexity at the edges of a robust and secure set of core services, a novel approach!
- This framework facilitates extensibility while limiting the installation and management costs of using the software.
- The design and development of the framework, known as Tycho, has the overarching goal of reducing the complexity of developing distributed applications.
5. High-level Requirements
- These are the desirable features for Tycho, as argued in the dissertation:
  - Scalability: be able to cope with the sizes typical of modern distributed systems,
  - High performance,
  - Extensibility: be able to add new features and interoperate with other systems,
  - Security out of the box,
  - Manageability: ease of installation and use.
    - For example, minimizing elements such as software dependencies, firewall requirements and the amount of configuration needed to deploy Tycho.
6. The Tycho Implementation
- Tycho is the reference implementation of the framework developed during the PhD.
- The Tycho components are:
  - Mediators,
  - Clients (Producers and Consumers),
  - Utilities.
- The Tycho mediator provides services that allow clients to discover each other using a Virtual Registry (VR) made up of a network of mediators; this also aids communication over both LAN and WAN.
- Utilities are extensions to Tycho's functionality.
- Tycho used to be called javaGMA or jGMA (poor choice of name!).
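The idea of pairing a registry with messaging in one component can be illustrated with a toy, single-process stand-in for a mediator. This is only a sketch under stated assumptions: the class `ToyMediator` and its methods `register`, `discover` and `send` are hypothetical names invented for illustration and are not Tycho's API, and a real mediator would route queries and messages across a network of mediators rather than a local map.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Toy in-memory stand-in for a mediator (hypothetical, not Tycho's API). */
final class ToyMediator {
    // Registry: advertised name -> handler that receives messages for that producer.
    private final Map<String, Consumer<String>> registry = new ConcurrentHashMap<>();

    /** A producer advertises itself under a name and supplies a message handler. */
    void register(String name, Consumer<String> handler) {
        registry.put(name, handler);
    }

    /** A consumer discovers producers whose advertised name starts with a prefix. */
    List<String> discover(String prefix) {
        List<String> matches = new ArrayList<>();
        for (String name : registry.keySet()) {
            if (name.startsWith(prefix)) {
                matches.add(name);
            }
        }
        Collections.sort(matches); // deterministic order for callers
        return matches;
    }

    /** Deliver a message to a registered producer; false if the name is unknown. */
    boolean send(String name, String message) {
        Consumer<String> handler = registry.get(name);
        if (handler == null) {
            return false;
        }
        handler.accept(message);
        return true;
    }

    public static void main(String[] args) {
        ToyMediator mediator = new ToyMediator();
        mediator.register("webcam/lab1", msg -> System.out.println("lab1 received: " + msg));
        // Discovery and messaging go through the same component.
        for (String name : mediator.discover("webcam/")) {
            mediator.send(name, "snapshot-request");
        }
    }
}
```

Because discovery and delivery share one registry, a consumer needs only a single point of contact to find a producer and talk to it, which is the simplification the thesis argues for.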
7. Tycho's Architecture
8. General Design Philosophy
- Reuse existing software components where possible, rather than reinvent existing services or functionality.
- Try to make use of existing software infrastructure.
- Ensure that Tycho is simple to install, configure and use.
- Provide a basic release that can be extended with further, more sophisticated components: Tycho utilities.
- Because we require portability and interoperability with other distributed systems, Java was a good choice of implementation language.
9. Tycho Mediator Implementation
- Tycho provides a choice of implementations for each core service.
- Tycho's design is described in a paper for the "Work-in-Progress Novel Grid Technologies" track of the IEEE International Symposium on Cluster Computing and the Grid 2005 (CCGrid 2005).
10. Tycho Clients and Utilities
- The Tycho Connector provides the API for building producers and consumers.
- Extra functionality can be added as utilities.
11. An Example of Tycho's Setup
12. Tycho Benchmarks
- Three rounds of benchmarking measured the performance of Tycho against state-of-the-art and widely used systems:
  - Communications: measured the performance of inter-client and inter-mediator messaging for Tycho and NaradaBrokering.
  - Virtual Registry tests: measured and compared the performance of the Tycho VR with Globus MDS4 and gLite R-GMA.
  - Component tests: different components of the VR were tested in various configurations.
- Results are presented in a paper in the proceedings of the IEEE International Conference on Cluster Computing 2006 (Cluster 2006).
13. Sample VR Benchmark Results
[Figure: sample VR benchmark plot; annotation on plot: "MDS4 out of memory"]
14. Benchmark Results Summary
- Tycho has better performance and client scalability than R-GMA, MDS4 and NaradaBrokering.
- R-GMA, MDS4 and NaradaBrokering all crashed during testing when they exceeded the maximum memory available for the tests (1.5 Gbytes).
- Memory management in Java systems is an issue:
  - Without bounded buffering or flow control, exhausting the Java heap is a problem.
- Storing information internally as XML seems to be a source of some of these memory problems.
- Java database solutions such as HSQLDB can provide a high-performance way of off-loading some of the storage requirements to disk.
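The point about buffering and flow control can be made concrete with a bounded queue that refuses new messages once it is full instead of growing until the heap is exhausted. This is a minimal sketch, not how Tycho or any of the benchmarked systems is implemented: the class `BoundedOutbox`, its capacity and its timeout are assumptions chosen for illustration. It relies only on `java.util.concurrent.ArrayBlockingQueue` from the standard library.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical outgoing-message buffer with a hard upper bound on queued messages. */
final class BoundedOutbox {
    private final BlockingQueue<byte[]> queue;

    BoundedOutbox(int capacity) {
        // The capacity is fixed at construction; the queue can never hold more.
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /**
     * Try to enqueue a message, waiting up to timeoutMillis for free space.
     * Returns false when the buffer stays full, so the sender has to slow down
     * or drop the message (simple back-pressure).
     */
    boolean offer(byte[] message, long timeoutMillis) {
        try {
            return queue.offer(message, timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            return false;
        }
    }

    /** Remove and return the next message, or null if the buffer is empty. */
    byte[] poll() {
        return queue.poll();
    }

    int size() {
        return queue.size();
    }

    public static void main(String[] args) {
        BoundedOutbox outbox = new BoundedOutbox(2);
        for (int i = 0; i < 3; i++) {
            boolean accepted = outbox.offer(new byte[1024], 50);
            System.out.println("message " + i + " accepted: " + accepted);
        }
    }
}
```

With a bound in place, memory use is capped at roughly capacity times the largest message size, and a slow receiver shows up as rejected sends rather than as an out-of-memory crash.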
15. Tycho Core Future Work
- Some more performance improvements:
  - Caching of local mediator queries to reduce response times,
  - Use of a hybrid VR interconnect that uses IRC for query routing and HTTP for transporting large responses.
- Additional functionality can be added to provide advanced services:
  - WS-based transport handlers for interoperability.
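Caching of local mediator queries could take many forms; one simple option is a small least-recently-used cache in front of the registry lookup. The sketch below is an assumption about what such a cache might look like, not a description of planned Tycho code: `QueryCache`, its size limit and the `backend` function are hypothetical. It uses the access-ordered mode of `java.util.LinkedHashMap` together with `removeEldestEntry`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical LRU cache for registry query results. */
final class QueryCache<K, V> {
    private final Map<K, V> entries;
    private int misses = 0;

    QueryCache(final int maxEntries) {
        // accessOrder = true keeps the least recently used entry first.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries; // evict once the limit is exceeded
            }
        };
    }

    /** Return the cached result, or ask the backend and remember its answer. */
    synchronized V get(K query, Function<K, V> backend) {
        V cached = entries.get(query);
        if (cached != null) {
            return cached;
        }
        misses++;
        V fresh = backend.apply(query);
        entries.put(query, fresh);
        return fresh;
    }

    /** Number of lookups that had to go to the backend. */
    synchronized int misses() {
        return misses;
    }

    public static void main(String[] args) {
        QueryCache<String, String> cache = new QueryCache<>(100);
        // Stands in for a (slow) registry query; hypothetical.
        Function<String, String> lookup = q -> "result for " + q;
        System.out.println(cache.get("type=webcam", lookup));
        System.out.println(cache.get("type=webcam", lookup));
        System.out.println("backend calls: " + cache.misses());
    }
}
```

A registry cache trades freshness for speed: entries would also need an expiry time or invalidation when producers register or leave, which this sketch omits.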
16. Tycho Applications
- We developed a number of applications to further validate the implementation.
- These include:
  - Demonstrations of publishing and discovering distributed webcams,
  - Remote resource discovery for the VOTechBroker project:
    - VOTechBroker is part of the European Virtual Observatory project; Tycho provides automatic resource discovery for job submission.
  - Binding components together for the Semantic Log Analyser (Slogger) project:
    - Here Tycho helps locate and gather distributed logs for analysis.
17. Content Distribution With Tycho
- We wanted to develop a Tycho utility that would demonstrate and validate the utility concept.
- We wanted to create something useful!
- We created a content distribution system called the Tycho swarm utility.
- The swarm utility provides content distribution similar to BitTorrent and overcomes the common 2 Gigabyte file size problem.
- Content is split into chunks and the VR is used to store chunk availability.
- Peers use the VR to locate each other and decide which chunks to download.
- Tycho messages are used to transfer the chunks between peers, and peers cooperate to distribute the content throughout the swarm.
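The chunking steps above can be sketched in a few lines. This is an illustration under assumptions, not the swarm utility's code: `SwarmSketch`, the 1 MiB chunk size and the use of `java.util.BitSet` to represent chunk availability are all hypothetical. The slides do not say how the 2 Gigabyte limit is avoided; the sketch simply assumes the limit comes from signed 32-bit sizes and therefore does its size arithmetic in `long`.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical helpers for chunk-based content distribution. */
final class SwarmSketch {
    /**
     * Number of chunks needed for a file, rounding up.
     * long arithmetic keeps sizes above 2 GiB from overflowing a 32-bit int.
     */
    static long chunkCount(long fileSizeBytes, int chunkSizeBytes) {
        return (fileSizeBytes + chunkSizeBytes - 1) / chunkSizeBytes;
    }

    /**
     * Chunks worth requesting from a peer: those the peer advertises
     * (for example via registry entries) that we do not hold yet.
     */
    static List<Integer> chunksToRequest(BitSet have, BitSet remoteHas) {
        BitSet wanted = (BitSet) remoteHas.clone();
        wanted.andNot(have); // keep only chunks we are missing
        List<Integer> result = new ArrayList<>();
        for (int i = wanted.nextSetBit(0); i >= 0; i = wanted.nextSetBit(i + 1)) {
            result.add(i);
        }
        return result;
    }

    public static void main(String[] args) {
        long threeGiB = 3L * 1024 * 1024 * 1024;
        System.out.println("chunks: " + chunkCount(threeGiB, 1024 * 1024));

        BitSet have = new BitSet();
        have.set(0);
        BitSet remoteHas = new BitSet();
        remoteHas.set(0, 4); // remote peer advertises chunks 0..3
        System.out.println("request: " + chunksToRequest(have, remoteHas));
    }
}
```

A real swarm would also need a policy for choosing among the missing chunks and among peers (BitTorrent clients commonly prefer the rarest chunks first); the slides do not state which policy the swarm utility uses.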
18. Swarm Utility Architecture
19. Swarm Utility Summary
- The utility was developed to test the potential of Tycho utilities and to further stress test the overall infrastructure by:
  - Simultaneously utilising the VR and messaging functionality,
  - Storing and updating thousands of entry records in the VR,
  - Sending thousands of multi-megabyte messages between clients.
- Its potential uses include:
  - Distributing files for collaboration purposes,
  - Staging data for computation,
  - Mirroring and managing large data sets.
20. Swarm Utility Demo
21. Summary
- The reference implementation of Tycho has been completed.
- Tycho has been released under the LGPL Open Source license:
  - http://acet.rdg.ac.uk/projects/tycho/
- The focus now is on developing Tycho utilities to provide more feature-rich functionality.
- This work has been summarised in a paper accepted for a special issue of The Journal of Supercomputing.
22. Research Goals
- Scalability and high performance have been demonstrated by the benchmarking.
- Extensibility has been shown with the development of the swarm utility and the different services and protocols supported by Tycho.
- Tycho has security out of the box, using HTTPS and passwords or certificates for wide-area access control and encryption; no comparable system we reviewed currently has this.
- Manageability has been maximised: Tycho requires one firewall port, has no external dependencies other than a JVM and can run with zero configuration.
23. Some Experiences / Observations
- Java developers should think carefully about how memory is used in their applications.
- Systems which store their data internally as XML will probably have relatively poor performance and require large amounts of memory and resources to work.
- If you use a servlet container, Jetty offers much better performance than Apache Tomcat in our experience.
- Instead of using a separate database, consider the Java-based HSQLDB; we have shown it can achieve excellent performance, and it removes an external dependency from your software.
- Java is not a magic bullet for portability; systems such as R-GMA are evidence of this.
24. Links
- Project Web page
  - http://acet.rdg.ac.uk/projects/tycho/
- The DSG Web page
  - http://dsg.port.ac.uk/
- The ACET Web page
  - http://acet.port.ac.uk/