Title: XML Metadata Services


1
XML Metadata Services
  • SKG06 http://www.culturegrid.net/SKG2006/
  • Guilin, China, November 3, 2006
  • Mehmet S. Aktas, Sangyoon Oh, Geoffrey C. Fox and Marlon Pierce
  • Presented by Geoffrey Fox (Computer Science, Informatics, Physics)
  • Pervasive Technology Laboratories
  • Indiana University, Bloomington, IN 47401
  • gcf@indiana.edu
  • http://www.infomall.org

2
Different Metadata Systems
  • There are many WS-* specifications addressing metadata, defined broadly:
  • WS-MetadataExchange
  • WS-RF
  • UDDI
  • WS-ManagementCatalog
  • WS-Context
  • ASAP
  • WBEM
  • WS-GAF
  • And many different implementations, from (extended) UDDI through MCAT of the Storage Resource Broker
  • And of course representations including RDF and OWL
  • Further, there is system metadata (such as UDDI for core services) and metadata catalogs for each application domain, such as WFS (Web Feature Service) for GIS (Geographical Information Systems)
  • They have different scope and different QoS trade-offs
  • e.g. Distributed Hash Tables (Chord) to achieve scalability in large-scale networks

3
Different Trade-offs
  • It has never been clear how a poor lonely service is meant to know where to look up metadata, and whether the metadata is meant to be thought of as a database (UDDI, WS-Context) or as the contents of a message (WS-RF, WS-MetadataExchange)
  • We identified two very distinct QoS trade-offs:
  • 1) Large-scale, relatively static metadata, as in a (UDDI) catalog of all the world's services
  • 2) Small-scale, highly dynamic metadata, as in dynamic workflows for sensor integration and collaboration
  • Fault tolerance and the ability to support dynamic changes with a few milliseconds' delay
  • But only a modest number of involved services (up to 1000s in a session)
  • Need Session, NOT Service/Resource, metadata, so don't use WS-RF

4
Hybrid WS-Context Service Architecture and Prototype
5
WS-Context compliant XML Metadata Services
  • We designed and built a WS-Context compliant XML Metadata service supporting distributed or central paradigms. This service
  • supports the extensive metadata requirements of rich interacting systems, such as
  • correlating activities of widely distributed services (e.g. workflow-style GIS Service Oriented Architectures), AND
  • optimizing Grid/Web Service messaging performance (e.g. mobile computing environments), AND
  • managing dynamic events, especially in multimedia collaboration (e.g. collaborative Grid/Web service applications), AND
  • providing information to enable session failure recovery capabilities.

6
Context as Service Metadata
  • We define all metadata (static, semi-static, dynamic) relevant to a service as Context.
  • Context can be associated with a single service, a session (service activity), or both.
  • Context can be independent of any interaction
  • slowly varying, quasi-static context
  • e.g. the type or endpoint of a service, which is less likely to change
  • Context can be generated as a result of service interactions
  • dynamic, frequently updated context
  • information associated with an activity or session
  • e.g. a session-id, or the URI of the coordinator of a workflow session (a minimal sketch of such a Context record follows below)
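
As a rough illustration of this Context notion, here is a minimal Java sketch of a context record holding an identifier, an optional session association, and a payload; the class and field names (Context, contextId, sessionId, value) are illustrative assumptions, not the actual WS-Context schema.

    // Minimal sketch of a Context record; field names are illustrative,
    // not the real WS-Context schema.
    public final class Context implements java.io.Serializable {
        public final String contextId;   // unique name of this context
        public final String sessionId;   // null if the context is session-independent
        public final String value;       // metadata payload, e.g. an XML fragment

        public Context(String contextId, String sessionId, String value) {
            this.contextId = contextId;
            this.sessionId = sessionId;
            this.value = value;
        }

        public boolean isSessionScoped() {
            return sessionId != null;
        }
    }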

7
Hybrid XML Metadata Services = WS-Context + extended UDDI
  • We combine the functionalities of these two services, WS-Context AND extended UDDI, in one hybrid service to manage Context (service metadata):
  • WS-Context controlling a workflow
  • (Extended) UDDI supporting semantic service discovery
  • This approach enables uniform query capabilities on the service metadata catalog (see the interface sketch below).
  • http://www.opengrids.org/wscontext/index.html
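
One way to picture the uniform query capability is a single Java facade that exposes both session-context operations (WS-Context style) and service-discovery operations (extended-UDDI style); the interface and method names below are hypothetical and do not reproduce the real WS-Context or UDDI APIs.

    // Illustrative-only facade combining session-context (WS-Context style)
    // and service-discovery (extended-UDDI style) operations in one service.
    // Method names are hypothetical, not the specification operations.
    import java.util.List;

    public interface HybridMetadataService {
        // WS-Context style: dynamic, session-scoped metadata
        void setContext(String sessionId, Context context);
        Context getContext(String sessionId, String contextId);

        // Extended-UDDI style: quasi-static service-discovery metadata
        List<String> findServices(String keyword);   // returns service keys/endpoints
    }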

8
Distributed Hybrid WS-Context XML Metadata Services
[Figure: Publisher and Subscriber clients connect over HTTP(S) to N identical Replica Servers (Replica Server-1 … Replica Server-N). All Replica Servers are identical in their capabilities; the figure illustrates the system from the perspective of one Replica Server.]
9
Key Features
  • Publish-Subscribe is exploited to support replicated storage, e.g.
  • Initial storage of context
  • Updates to keep copies consistent
  • Access to context
  • Use of a JavaSpaces cache running in memory on each WS-Context node
  • Naturally supports "Get Context by name" requests
  • Backed up every 30 milliseconds to a MySQL database
  • If a query can be satisfied by the JavaSpaces cache, it can be answered in < 1 ms plus the few milliseconds of Web service overhead (see the sketch below)
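
A much-simplified sketch of the cache-first lookup described above, using a plain ConcurrentHashMap as a stand-in for the JavaSpaces cache and empty placeholders for the MySQL backup; all class and method names are assumptions for illustration, not the actual implementation.

    // Simplified stand-in for the per-node cache: queries hit the in-memory map
    // first and fall back to the database; a timer flushes the cache to MySQL
    // every 30 ms (the backup interval quoted on this slide).
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class ContextCache {
        private final Map<String, Context> cache = new ConcurrentHashMap<>();
        private final ScheduledExecutorService backup =
                Executors.newSingleThreadScheduledExecutor();

        public ContextCache() {
            // Back up the in-memory cache to persistent storage every 30 ms.
            backup.scheduleAtFixedRate(this::flushToDatabase, 30, 30, TimeUnit.MILLISECONDS);
        }

        public void put(Context c) {
            cache.put(c.contextId, c);            // sub-millisecond in-memory write
        }

        public Context getByName(String contextId) {
            Context c = cache.get(contextId);     // "Get Context by name"
            return (c != null) ? c : loadFromDatabase(contextId);
        }

        private void flushToDatabase() { /* JDBC insert/update into MySQL (omitted) */ }
        private Context loadFromDatabase(String contextId) { return null; /* JDBC lookup (omitted) */ }
    }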

10
TupleSpaces-Based Caching Strategies
  • TupleSpaces is a communication paradigm
  • asynchronous communication
  • pioneered by David Gelernter
  • first described in the Linda project in 1982 at Yale
  • communication units are tuples
  • a data structure consisting of one or more typed fields
  • The Hybrid WS-Context Service employs/extends TupleSpaces
  • all memory accesses; overhead is negligible (less than 1 ms for inquiries)
  • data sharing - mutually exclusive access to tuples
  • associative lookup - content-based search, appropriate for key-based caching
  • temporal and spatial uncoupling of communicating parties
  • e.g. a tuple ("context_id", Context). This indicates a tuple with two fields: a) a string, "context_id", and b) a Java object, "Context" (see the JavaSpaces-style example below)
  • back-up at frequent time intervals for fault-tolerance
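
The tuple example above can be sketched with the standard JavaSpaces API, where an Entry with public fields is the tuple and a partially filled Entry is the template for associative (content-based) lookup; the ContextEntry class below is illustrative only, and locating the JavaSpace via Jini lookup is omitted.

    // Sketch of the tuple idea with the JavaSpaces API (net.jini.*).
    import net.jini.core.entry.Entry;
    import net.jini.core.lease.Lease;
    import net.jini.space.JavaSpace;

    public class ContextEntry implements Entry {
        public String contextId;    // typed field 1: the key
        public String contextXml;   // typed field 2: the context payload
        public ContextEntry() { }   // JavaSpaces entries need a public no-arg constructor

        static void demo(JavaSpace space) throws Exception {
            // Write a tuple ("context_id", Context) into the space.
            ContextEntry tuple = new ContextEntry();
            tuple.contextId = "session-42/coordinator";
            tuple.contextXml = "<context>...</context>";
            space.write(tuple, null, Lease.FOREVER);

            // Associative lookup: match on contextId, leave other fields null.
            ContextEntry template = new ContextEntry();
            template.contextId = "session-42/coordinator";
            ContextEntry found = (ContextEntry) space.read(template, null, 1000L);
        }
    }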

11
Managing Context: UDDI vs. WS-Context

purpose
  UDDI: standard way of publishing and discovering generic Web Service information
  WS-Context: standard way of maintaining distributed session state information
metadata characteristics
  UDDI: interaction-independent, rarely changing, small size
  WS-Context: interaction-dependent, highly dynamic, small size
types of typical queries
  UDDI: high degree of complexity in inquiry arguments, to improve the selectivity and increase the precision of the search results
  WS-Context: simple inquiry arguments, mostly key-based retrieval queries; the selectivity of queries is one
scalability
  UDDI: the whole Grid; UDDI is a domain-independent service for generic service metadata
  WS-Context: sub-Grids; a modest number of interacting Web Services participating in an activity
desired features
  UDDI: better expressive power of service metadata (e.g., RDF-enabled UDDI Registries), up-to-date service entries (e.g., leasing-capable UDDI Registries), domain-specific capabilities (e.g., geospatial query capabilities), persistent storage
  WS-Context: notification (members of an activity should be notified of the distributed state information), synchronous callback (loose coupling of services), high performance, light-weight storage
12
A general performance evaluation of the most
recent implementation of the Hybrid WS-Context
Service
13
Prototype Evaluation - I
  • Performance Experiment: We investigate the practical usefulness of the system by exploring the following research questions.
  • What is the baseline performance of the hybrid
    WS-Context Service implementation for given
    standard operations?
  • What is the effect of the network latency on the
    baseline performance of the system?
  • How does the performance compare with previous
    metadata management solutions?

14
PERFORMANCE TEST
15
TESTBED: Cluster node configuration
Processor: Intel Xeon CPU (2.40GHz)
RAM: 2GB total
Network Bandwidth: 900 Mbits/sec (among the cluster nodes)
OS: GNU/Linux (kernel release 2.4.22)
Java Version: Java 2 Platform, Standard Edition (1.4.2-beta-b19)
SOAP Engine: Axis 2 (in Tomcat 5.5.8)

Metadata Service       Avg. latency for inquiries
hybrid WS-Context      8.41 ms
extended UDDI          17.5 ms
JUDDI                  40 ms
UDDI-MT                20.37 ms
JWSD                   18.99 ms

Test 2 - Test 1 is the JavaSpaces overhead.
The experimental study indicates that the proposed system provides performance for standard operations comparable to existing metadata management services.
16
Prototype Evaluation - II
  • Scalability Experiment: We investigate the scalability of the system by finding answers to the following research questions.
  • What is the performance degradation of the
    system for standard operations under increasing
    message sizes?
  • What is the performance degradation of the
    system for standard operations under increasing
    message rates?
  • What is the scalability gain (both in numbers
    and in performance) of moving from a centralized
    system to a distributed system under the same
    workload?

17
SCALABILITY TEST-1
[Figure: a single-threaded WS-Context client (1 user, 100 transactions) invokes the Hybrid WS-Context service through its WSDL interface.]
TEST-1 - Hybrid WS-Context inquiry/publication with increasing message sizes
TEST-2 - Hybrid WS-Context inquiry/publication with increasing message rates (# of messages per second)
18
TESTBED: Cluster node configuration for hybrid WS-Context tests
Processor: Intel Xeon CPU (2.40GHz)
RAM: 2GB total
Network Bandwidth: 900 Mbits/sec (among the cluster nodes)
OS: GNU/Linux (kernel release 2.4.22)
Java Version: Java 2 Platform, Standard Edition (1.4.2-beta-b19)
SOAP Engine: Axis 2 (in Tomcat 5.5.8)

Metadata Service       Avg. latency for inquiries (64 KByte data retrieval)
hybrid WS-Context      14.55 ms
OGSA-DAI WSRF 2.1      232 ms

  • OGSA-DAI results are from
    http://www.ogsadai.org.uk/documentation/scenarios/-performance
  • Both the OGSA-DAI and WS-Context test cases were conducted on a tightly coupled network.

The results indicate that the cost of inquiry and publication operations remains roughly the same as the context's payload size increases from 100 bytes up to 10 KBytes. We also see that hybrid WS-Context presents better performance than the OGSA-DAI approach, although the latter technology is more powerful.
19
TESTBED: Cluster node configuration
Processor: Intel Xeon CPU (2.40GHz)
RAM: 2GB total
Network Bandwidth: 900 Mbits/sec (among the cluster nodes)
OS: GNU/Linux (kernel release 2.4.22)
Java Version: Java 2 Platform, Standard Edition (1.4.2-beta-b19)
SOAP Engine: Axis 2 (in Tomcat 5.5.8)

The results indicate that the proposed system can scale up to 940 simultaneous querying clients or 222 simultaneous publishing clients, with each client sending one query per second, for small-size context payloads with a 30-millisecond fault-tolerance (backup) interval. Multi-core hosts will improve performance dramatically.
20
4 cores give about 3000 messages per second, i.e. roughly one message per millisecond per core for the Opteron and one message per 2 ms for a Sun Niagara core.
21
DISTRIBUTION TEST
[Figure: clients distributed to cluster nodes 1 to 5, each running 1 to 15 threads firing messages over HTTP(S) at randomly selected servers.]
  • We investigate scalability when moving from a centralized server to a distributed one under heavy workloads.
  • Numbered rectangle shapes correspond to an N-node FTHPIS system with various Publish-Subscribe topologies (the topology does NOT affect performance).
  • Five different FTHPIS systems were tested, with N ranging from 1 to 5, under the same workload.
  • In each test case, the same volume of data is evenly distributed among the nodes.

22
TESTBED: Cluster node configuration
Processor: Intel Xeon CPU (2.40GHz)
RAM: 2GB total
Network Bandwidth: 900 Mbits/sec (among the cluster nodes)
OS: GNU/Linux (kernel release 2.4.22)
Java Version: Java 2 Platform, Standard Edition (1.4.2-beta-b19)
SOAP Engine: Axis 2 (in Tomcat 5.5.8)

Hybrid WS-Context inquiry operation
# of nodes   message rate (messages/sec)   mean (ms)   error (ms)   stdev (ms)
1            940                           47.05       0.24         33.52
2            1005                          40.76       0.43         38.22
3            1082                          38.58       0.45         34.93
4            1148                          36.28       0.42         32.24
5            1221                          34.13       0.40         30.76

The caching algorithm is non-optimal, as it performs the database access BEFORE Publish-Subscribe. Reversing this choice should lead to throughput linear in the number of nodes; the Pub-Sub overhead is about 2 ms.

The results indicate that the scalability of the metadata store can be increased by moving from a centralized service to a distributed system.
23
Prototype Evaluation - III
  • Fault-Tolerance Experiment: We investigate the empirical cost of providing fault-tolerance by finding answers to the following research questions.
  • What is the cost of fault-tolerance in terms of the execution time of standard operations on a tight cluster?
  • How does the cost of fault-tolerance change when the replica servers are separated by significant network distances?

24
FAULT-TOLERANCE TEST
25
FAULT-TOLERANCE EXPERIMENT TEST BED
Summary of machine configurations:
  gf6.ucs.indiana.edu (Bloomington, IN, USA): Intel Xeon CPU (2.40GHz), 2GB RAM, GNU/Linux (kernel release 2.4.22), Java 2 SE (1.4.2-beta-b19)
  complexity.ucs.indiana.edu (Indianapolis, IN, USA): Sun-Fire-880, sun4u sparc SUNW, 16GB RAM, SunOS 5.9, Java HotSpot(TM) 64-Bit Server VM (1.4.2-01)
  lonestar.tacc.utexas.edu (Austin, TX, USA): Intel(R) Xeon(TM) CPU 3.20GHz, 4GB RAM, GNU/Linux (kernel release 2.6.9), Java 2 SE (1.4.2-beta-b19)
  tg-login.sdsc.teragrid.org (San Diego, CA, USA): GenuineIntel IA-64, Itanium 2, 4 processors, 8GB RAM, GNU/Linux, Java 2 SE (1.4.2-beta-b19)
  vlab2.scs.fsu.edu (Tallahassee, FL, USA): Dual Core AMD Opteron(tm) Processor 270, 2GB RAM, GNU/Linux (kernel release 2.6.16), Java 2 SE (1.4.2-beta-b19)
26
FAULT-TOLERANCE TEST RESULTS
The results point out the inevitable trade-off between fault-tolerance (degree of replication, i.e. high availability of data) and performance: the lower the level of fault-tolerance, the higher the performance for publication operations. These results also indicate that a high degree of replication can be achieved (by utilizing an asynchronous communication model such as the publish-subscribe paradigm) without increasing the cost of fault-tolerance.
27
An Application Case Scenario and an
application-specific performance evaluation of
the Hybrid WS-Context Service
28
Application: Context Store usage in the communication of mobile Web Services
  • Handheld Flexible Representation (HHFR) is open source software for fast communication in mobile Web Services. HHFR supports
  • streaming messages, separation of message contents, and usage of a context store.
  • http://www.opengrids.org/hhfr/index.html
  • We use the WS-Context service as a context-store for the redundant message parts of SOAP messages.
  • The redundant data is the static XML fragments encoded in every SOAP message.
  • The redundant metadata is stored as context associated with the service conversation in its place (see the sketch below).
  • The empirical results show that we gain 83% in message size and on average 41% in transit time by using the WS-Context service.
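
A sketch of the context-store optimization just described: the static XML fragment of a conversation is published once as a context, and subsequent messages carry only a reference to it. The helper class and its methods are hypothetical, building on the earlier illustrative HybridMetadataService and Context sketches rather than the actual HHFR API.

    // Illustrative-only sketch: store the unchanging part of a SOAP message
    // once in the context-store, then send only a reference to it.
    public class RedundantPartOptimizer {
        private final HybridMetadataService store;   // from the earlier interface sketch

        public RedundantPartOptimizer(HybridMetadataService store) {
            this.store = store;
        }

        // Sender side, first message of a conversation: save the static XML fragment as context.
        public String publishStaticPart(String sessionId, String staticXmlFragment) {
            String contextId = sessionId + "/static-part";
            store.setContext(sessionId, new Context(contextId, sessionId, staticXmlFragment));
            return contextId;                        // later messages carry only this key
        }

        // Receiver side: resolve the reference back into the full XML fragment.
        public String resolveStaticPart(String sessionId, String contextId) {
            return store.getContext(sessionId, contextId).value;
        }
    }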

29
Optimizing Grid/Web Service Messaging Performance
The performance and efficiency of Web Services
can be greatly increased in conversational and
streaming message exchanges by removing the
redundant parts of the SOAP message.
30
Performance with and without Context-store
  • Experiments ran over HHFR
  • Optimized messages exchanged over HHFR after saving redundant/unchanging parts to the Context-store
  • Save on average
  • 83% of message size, 41% of transit time

Summary of the Round Trip Time (T_RTT)

Message Size          Without Context-store         With Context-store
                      Avg ± error (sec)   Stddev    Avg ± error (sec)   Stddev
Medium (513 bytes)    2.76 ± 0.034        0.187     1.75 ± 0.040        0.217
Large (2.61 KB)       5.20 ± 0.158        0.867     2.81 ± 0.098        0.538
31
System Parameters
  • T_access: time to access a Context-store (i.e. save a context to, or retrieve a context from, the Context-store) from a mobile client
  • T_RTT: Round Trip Time to exchange a message through an HHFR channel
  • N: number of simultaneous streams supported, summed over ALL mobile clients
  • T_wsctx: time to process the setContext operation
  • T_axis: time consumed by Axis processing
  • T_trans: transmission time through the network
  • T_stream: stream length

32
Context-store System Parameters
33
Summary of T_axis and T_wsctx measurements
  • T_access = T_wsctx + T_axis + T_trans
  • Data binding overhead at the Web Service container is the dominant factor in message processing

34
Performance Model and Measurements
  • C_hhfr = n * t_hhfr + O_a + O_b
  • C_soap = n * t_soap
  • Breakeven point:
  • n_be * t_hhfr + O_a + O_b = n_be * t_soap, i.e. n_be = (O_a + O_b) / (t_soap - t_hhfr)
  • O_a(WS) is roughly 20 milliseconds (a worked example follows below)

O_a: overhead for accessing the Context-store Service
O_b: overhead for negotiation

                              Average ± error (sec)   Stddev (sec)
Context-store Access (O_a)    4.127 ± 0.042           0.516
Negotiation (O_b)             5.133 ± 0.036           0.825
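
To make the breakeven relation concrete, the small calculation below plugs the measured O_a and O_b from the table into n_be = (O_a + O_b) / (t_soap - t_hhfr); the per-message times t_soap and t_hhfr are placeholder values chosen only to show how the formula is used, not measurements from this study.

    // Breakeven calculation for the performance model on this slide.
    // oA and oB come from the table above; tSoap and tHhfr are HYPOTHETICAL
    // per-message times inserted only to show how the formula is applied.
    public class BreakevenDemo {
        public static void main(String[] args) {
            double oA = 4.127;      // Context-store access overhead (sec, measured)
            double oB = 5.133;      // negotiation overhead (sec, measured)
            double tSoap = 2.0;     // assumed per-message time over plain SOAP (sec)
            double tHhfr = 1.0;     // assumed per-message time over HHFR (sec)

            // n_be * tHhfr + oA + oB = n_be * tSoap  =>  n_be = (oA + oB) / (tSoap - tHhfr)
            double nBe = (oA + oB) / (tSoap - tHhfr);
            System.out.printf("Breakeven at about %.1f messages per stream%n", nBe);
        }
    }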
35
String Concatenation
  • Measure the total time to process a stream (see the measurement sketch below)
  • Independent variables:
  • Number of messages per stream
  • Size of the message
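
A minimal Java sketch of this measurement: total processing time as a function of the two independent variables, number of messages per stream and message size; the processMessage placeholder stands in for the real string-concatenation/stream work and is not part of HHFR.

    // Measure total stream-processing time against the two independent variables.
    public class StreamTimer {
        public static long timeStream(int messagesPerStream, int messageSizeBytes) {
            byte[] payload = new byte[messageSizeBytes];     // synthetic message body
            long start = System.nanoTime();
            for (int i = 0; i < messagesPerStream; i++) {
                processMessage(payload);                     // placeholder for the per-message work
            }
            return (System.nanoTime() - start) / 1_000_000;  // elapsed time in ms
        }

        private static void processMessage(byte[] payload) { /* e.g. string concatenation step */ }
    }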