The Jgroup/ARM Dependable Computing Toolkit

1
The Jgroup/ARM Dependable Computing Toolkit
Hein Meling, Stavanger University College, Norway,
Department of Electrical and Computer Engineering
Alberto Montresor, University of Bologna, Italy,
Department of Computer Science
2
Context
  • (Distributed) systems that require
  • Reliable and high-availability operation
  • Fault tolerance
  • (Load balancing)
  • Based on cheap hardware and software
  • Commercial off the shelf, and not custom hardware
  • Heterogeneous software (OS) architectures
  • Middleware architectures for distributed
    computing
  • Middleware between the application and OS

3
Types of Failures
  • Processor failures
  • Crash failures
  • Value failures (very expensive)
  • Network failures
  • Operating System hangs
  • Memory leaks
  • Software design errors (beyond state-of-the-art)

4
Overview
  • Jgroup
  • A toolkit aimed at supporting the development of
    reliable and highly-available applications.
  • Autonomous Replication Management (ARM)
  • A framework for server replica deployment and
    recovery without user intervention.
  • History
  • Formal specification (1996-97)
  • Algorithm description and Jgroup implementation
  • Integration with existing technologies (Java RMI
    / Jini)
  • The ARM framework (2000-03)
  • Development of Jgroup-based applications

5
  • Summary
  • Introduction
  • Object Group Communication
  • The ARM framework
  • Integration with Java RMI / Jini
  • Conclusions

6
The Problem
  • Some environments supporting distributed
    computing
  • CORBA (OMG)
  • DCOM / .NET (Microsoft)
  • Java RMI / Jini / EJB (Sun)
  • Characteristics
  • Object-oriented
  • Based on client-server remote method
    invocations
  • Promote modularity, reusability,
    interoperability, portability

7
Java Remote Method Invocations
  • Java RMI protocol
  • enables objects residing in different JVMs to
    communicate through remote method invocations

[Diagram: JVM1 and JVM2]
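A minimal sketch of what a remote interface looks like in Java RMI. Only `java.rmi.Remote` and `java.rmi.RemoteException` come from the standard API; `Greeter`, `GreeterImpl`, and `RmiSketch` are hypothetical names chosen for illustration. The object is called locally here, so exporting it and binding it in a registry (the steps that make it reachable from another JVM) are deliberately left out.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;

// Hypothetical remote interface: every remotely callable method
// declares RemoteException.
interface Greeter extends Remote {
    String greet(String name) throws RemoteException;
}

// Server-side implementation. It is not exported in this sketch, so the
// call in main() is an ordinary local call inside a single JVM.
class GreeterImpl implements Greeter {
    @Override
    public String greet(String name) {
        return "Hello, " + name;
    }
}

class RmiSketch {
    public static void main(String[] args) {
        GreeterImpl server = new GreeterImpl();
        System.out.println(server.greet("JVM1"));
    }
}
```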
8
Java Remote Method Invocations
[Diagram: method() invocation and return x between JVM1 and JVM2]
9
The Problem
  • Distributed computing environments did not
    provide adequate support for developing reliable
    and highly available applications
  • Lack of reliable one-to-many interaction
    primitives
  • From the client's point of view: non-transparent
    access to replicated servers
  • From the server's point of view: no support for
    maintaining consistency

10
The Solution: The Object Group Paradigm
  • Object group
  • A dynamic collection of server objects that
    cooperate in order to deliver some service and
    maintain shared state
  • Group method invocations
  • The act of invoking a method on an object group
  • The method is executed by a certain number of
    servers in the object group, depending on the
    invocation semantics

ObjectGroup
11
The Solution: The Object Group Paradigm
  • From the client's point of view
  • Groups must be transparent, like standard remote
    objects
  • Clients need not be aware that they are
    interacting with an object group instead of a
    single server
  • From the server's point of view
  • Server implementation: as transparent as
    possible
  • Servers forming a group
  • must cooperate to maintain shared state and
  • to appear as a single object

12
Group Communication
  • Group communication has been shown to be a
    powerful paradigm for supporting the development
    of dependable applications in distributed systems
  • Management of dynamic groups (join/leave
    operations)
  • Failure monitoring (crashes / partitionings)
  • One-to-many communication
  • Ordering of events (FIFO, Causal, Atomic)
  • State synchronization tools

13
Other Object Group Systems
  • CORBA
  • Electra [Cornell, Zurich]
  • Object Group Service (OGS) [EPFL, Lausanne]
  • Eternal [UC Santa Barbara, Eternal Systems]
  • Newtop [Newcastle, UK]
  • Java RMI
  • Filterfresh [Bell Labs]
  • JavaGroups [Cornell]
  • Aroma [UC Santa Barbara]
  • DCOM
  • Quintet [Cornell]

14
Jgroup: Yet Another Object Group Service?
  • Support for partition-awareness
  • Modern wide-area communication networks are often
    characterized as highly partitionable
  • Jgroup supports the development of reliable and
    highly available applications in partitionable
    systems
  • Moreover
  • It extends modern technologies like Java RMI and
    Jini
  • It is completely written in Java (portability)
  • It supports a complex merging service
  • Extensible deployment, recovery and upgrade
    facilities

15
Autonomous Replication Management
  • Support for transparent replica deployment
  • Placing server replicas on machines in the
    network
  • Selecting machines so that each application can
    tolerate both network and machine failures
  • Support for replica recovery
  • Jgroup detects and reports failures
  • ARM replaces any crashed server replica with a new
    instance

16
  • Summary
  • Introduction
  • Object Group Communication
  • The ARM framework
  • Integration with Java RMI / Jini
  • Conclusions

17
Group Membership
  • The group membership service tracks both voluntary
    and involuntary changes in the group's membership
  • Variations are reported to group members through
    the installation of views
  • Installed views
  • Consist of a collection of members
  • Correspond to the group's current membership as
    perceived by the members included in the view
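The view mechanism can be illustrated with a small sketch in which every membership change installs a new, numbered view. This is not the Jgroup API: `ViewSketch`, `View`, `join`, and `remove` are hypothetical names, and a real membership service would also have to agree on the view among the members and notify them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch: each membership change installs a new view.
class ViewSketch {
    // A view: an identifier plus the members it contains.
    static final class View {
        final int id;
        final TreeSet<String> members;

        View(int id, TreeSet<String> members) {
            this.id = id;
            this.members = members;
        }

        @Override
        public String toString() {
            return "view " + id + " " + members;
        }
    }

    private View current = new View(0, new TreeSet<>());
    private final List<View> installed = new ArrayList<>();

    // Voluntary change: a member joins the group.
    View join(String member) {
        return install(member, true);
    }

    // Voluntary leave or involuntary removal (crash, partitioning).
    View remove(String member) {
        return install(member, false);
    }

    private View install(String member, boolean add) {
        TreeSet<String> next = new TreeSet<>(current.members);
        if (add) next.add(member); else next.remove(member);
        current = new View(current.id + 1, next);
        installed.add(current);   // a real service would notify the members here
        return current;
    }

    public static void main(String[] args) {
        ViewSketch group = new ViewSketch();
        group.join("S1");
        group.join("S2");
        System.out.println(group.remove("S1"));
    }
}
```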

18
Group Membership: A Simple Scenario
[Diagram: view]
19
Partition-awareness
  • What kind of behavior can we expect from
    fault-tolerant applications in the presence of
    network partitioning?
  • The primary-partition approach

20
Support for partition-awareness
  • Jgroup supports dependability in partitionable
    systems
  • Development of applications aware of the
    existence of partitions (on the server-side)
  • Partition-aware applications take advantage of
    their semantics in order to be more available
  • Computations continue in all partitions of the
    system

21
Group Membership: A Partitioning Scenario
22
Example: Task Execution Service
[Diagram panels: Primary Partition; Partition-aware]
23
Comparison
  • Primary-partition approach
  • Easy to maintain a single, coherent shared
    state (strong consistency)
  • Servers in non-primary partitions unable to serve
    requests (low availability)
  • Partition-aware approach
  • Servers in multiple partitions may be able to
    serve requests (high availability)
  • Partitions evolve independently, possibly leading
    to inconsistent states (loose consistency)
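The difference between the two approaches can be sketched as a decision about whether a partition keeps serving requests. The slides do not say how the primary partition is chosen; the strict-majority rule below is an assumption of this sketch, and all names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class PartitionPolicySketch {
    // Assumption: the primary partition is the one holding a strict
    // majority of the group's members.
    static boolean isPrimary(List<String> view, int groupSize) {
        return view.size() * 2 > groupSize;
    }

    // Primary-partition approach: only the primary partition serves requests.
    static boolean servesPrimaryPartition(List<String> view, int groupSize) {
        return isPrimary(view, groupSize);
    }

    // Partition-aware approach: every non-empty partition keeps serving.
    static boolean servesPartitionAware(List<String> view) {
        return !view.isEmpty();
    }

    public static void main(String[] args) {
        // A group of five servers splits into two partitions.
        List<String> left = Arrays.asList("S1", "S2", "S3");
        List<String> right = Arrays.asList("S4", "S5");
        System.out.println(servesPrimaryPartition(left, 5));
        System.out.println(servesPrimaryPartition(right, 5));
        System.out.println(servesPartitionAware(right));
    }
}
```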

24
Comparison (Cont.)
  • Primary-partition approach
  • Development of fault-tolerant applications is
    simpler (active replication of existing
    non-fault-tolerant servers)
  • Developers cannot exploit application semantics
    in order to provide a more available service
  • Partition-aware approach
  • Applications adapt their behavior and remain
    available in many partitions (perhaps by reducing
    their quality of service)
  • Development of fault-tolerant applications is
    more complex (case-by-case design is needed)

25
The State Merging Problem
  • During partitioning, the state of servers
    belonging to distinct partitions may become
    inconsistent
  • When the partition disappears, an
    application-specific state merging protocol may
    be needed
  • Servers participating in the protocol try to
    define a new shared state that reconciles (when
    possible) the divergences

27
The State Merging Problem
  • State merging protocols are based on the exchange
    of information among servers that have been
    partitioned
  • Jgroup provides a state merging service (SMS)
    that simplifies the development of state merging
    protocols
  • NOTE
  • Determining
  • what information needs to be exchanged
  • how to use it to construct a new consistent
    shared state
  • is an application-dependent problem

28
General Schema for State Merging Protocols
  • In each of the merging partitions, a coordinator
    is selected
  • SMS interrogates each coordinator to obtain
    information about its current state
  • State information from a coordinator is passed to
    servers that used to be partitioned from it
  • Each of the servers merges information from
    coordinators with its own state

[Diagram: servers S1 to S4 exchanging state through getState() and putState()]
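The general schema can be sketched as follows. Only the method names `getState()` and `putState()` come from the slide; the `Server` class, the choice of the first server in each partition as coordinator, and the set-union merge rule are assumptions made for illustration. The sketch also assumes that servers inside one partition already share the same state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class StateMergeSketch {
    // Hypothetical server whose state is a set of task identifiers.
    static final class Server {
        final String name;
        final Set<String> state = new TreeSet<>();

        Server(String name, String... tasks) {
            this.name = name;
            state.addAll(Arrays.asList(tasks));
        }

        // Invoked on a coordinator to obtain its current state.
        Set<String> getState() {
            return new TreeSet<>(state);
        }

        // Invoked on servers that were partitioned from the coordinator.
        // Merge rule (application-specific assumption): set union.
        void putState(Set<String> remote) {
            state.addAll(remote);
        }
    }

    // Each partition is a list of servers; its first element acts as coordinator.
    static void merge(List<List<Server>> partitions) {
        // Interrogate each coordinator for its current state.
        List<Set<String>> snapshots = new ArrayList<>();
        for (List<Server> partition : partitions) {
            snapshots.add(partition.get(0).getState());
        }
        // Pass each snapshot to the servers of every other partition.
        for (int i = 0; i < partitions.size(); i++) {
            for (int j = 0; j < partitions.size(); j++) {
                if (i == j) continue;
                for (Server s : partitions.get(j)) s.putState(snapshots.get(i));
            }
        }
    }

    // Two partitions, {S1, S2} and {S3, S4}, merge; returns S4's final state.
    static Set<String> demo() {
        List<Server> p1 = Arrays.asList(new Server("S1", "a"), new Server("S2", "a"));
        List<Server> p2 = Arrays.asList(new Server("S3", "b"), new Server("S4", "b"));
        merge(Arrays.asList(p1, p2));
        return p2.get(1).getState();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```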
30
Full Object-Orientation
  • Existing object group systems fail to provide a
    completely object-oriented environment for
    software developers

Remote method invocations
Message multicasting
31
View Synchrony
  • View synchrony (1)
  • If a correct server S executes an invocation
    during a view, then
  • all servers within the view will also execute the
    invocation,
  • or S will install a new view
  • View synchrony does not admit executions like
    this

[Diagram: an execution among S1 to S4 that violates this property]
32
View Synchrony
  • View Synchrony (2)
  • All servers that survive from one view to the
    same next view execute the same set of
    invocations in the original view
  • View synchrony does not admit executions like
    this

[Diagram: an execution among S1 to S4 that violates this property]
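Property (2) can be expressed as a check over an execution trace. The sketch below is a hypothetical checker, not part of Jgroup: each `Trace` records which invocations a server executed in the original view and which view it installed next.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ViewSynchronySketch {
    // What one server executed in the original view, and the view it installed next.
    static final class Trace {
        final String server;
        final int nextView;
        final Set<String> executed;

        Trace(String server, int nextView, String... executed) {
            this.server = server;
            this.nextView = nextView;
            this.executed = new HashSet<>(Arrays.asList(executed));
        }
    }

    // Property (2): servers that survive into the same next view must have
    // executed the same set of invocations in the original view.
    static boolean satisfiesProperty2(List<Trace> traces) {
        for (Trace a : traces) {
            for (Trace b : traces) {
                if (a.nextView == b.nextView && !a.executed.equals(b.executed)) {
                    return false;
                }
            }
        }
        return true;
    }

    // Admitted: S3 executed less, but it moves to a different view.
    static boolean demoAdmitted() {
        return satisfiesProperty2(Arrays.asList(
                new Trace("S1", 2, "m1", "m2"),
                new Trace("S2", 2, "m1", "m2"),
                new Trace("S3", 3, "m1")));
    }

    // Not admitted: S1 and S2 share the next view but executed different sets.
    static boolean demoNotAdmitted() {
        return satisfiesProperty2(Arrays.asList(
                new Trace("S1", 2, "m1", "m2"),
                new Trace("S2", 2, "m1")));
    }

    public static void main(String[] args) {
        System.out.println(demoAdmitted());
        System.out.println(demoNotAdmitted());
    }
}
```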
33
Internal Group Method Invocations
  • Synchronous invocations
  • The method invocation terminates by returning a
    vector of return values, one from each server at
    which the method was executed
  • Asynchronous invocations
  • The method invocation terminates immediately;
    replies (if any) are returned to a callback
    object
  • Can be used to simulate message multicasting
    through void methods (one-way)
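A rough sketch of the two invocation styles, using standard `java.util.concurrent` classes rather than the Jgroup API. The server names, `invokeSync`, and `invokeAsync` are hypothetical; the "method" is modelled as a function from a server name to a reply.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Function;

class InternalInvocationSketch {
    static final List<String> SERVERS = Arrays.asList("S1", "S2", "S3");

    // Synchronous: returns only after every server has executed the method,
    // with one return value per server.
    static List<String> invokeSync(Function<String, String> method) {
        List<String> results = new ArrayList<>();
        for (String server : SERVERS) results.add(method.apply(server));
        return results;
    }

    // Asynchronous: returns immediately; replies are delivered to the callback.
    static void invokeAsync(Function<String, String> method,
                            Consumer<String> callback,
                            ExecutorService pool) {
        for (String server : SERVERS) {
            pool.execute(() -> callback.accept(method.apply(server)));
        }
    }

    // Issues an asynchronous invocation and counts the replies received.
    static int demoAsyncReplyCount() {
        List<String> replies = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(SERVERS.size());
        invokeAsync(s -> s + ":done", replies::add, pool);
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);   // wait for the replies
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return replies.size();
    }

    public static void main(String[] args) {
        System.out.println(invokeSync(s -> s + ":done"));
        System.out.println(demoAsyncReplyCount());
    }
}
```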

34
Internal Invocations: Example
[Diagram: synchronous invocation among S1, S2, S3]
35
Internal Invocations: Example
[Diagram: S1, S2, S3]
36
External Group Method Invocations
  • Anycast invocations
  • Are executed by at least one server in the object
    group (unless the client is partitioned from the
    group)
  • Efficiency (same cost as standard RMI
    interactions)
  • Useful for read methods on replicated databases
  • Multicast invocations
  • Are executed by all servers in a view, following
    the view synchrony semantics
  • More costly (involve several servers)
  • Useful for write methods on replicated databases
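The read/write split can be sketched with a replicated registry in which writes go to every replica and reads go to one. All names are hypothetical, and the round-robin choice of the server that handles an anycast is an assumption of this sketch; failures and view changes are not modelled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ExternalInvocationSketch {
    // Hypothetical registry replica holding its own copy of the bindings.
    static final class Replica {
        final Map<String, String> bindings = new HashMap<>();
    }

    private final List<Replica> view = new ArrayList<>();
    private int next = 0;

    ExternalInvocationSketch(int replicas) {
        for (int i = 0; i < replicas; i++) view.add(new Replica());
    }

    // Multicast invocation: the write is executed by all servers in the view.
    void bind(String name, String obj) {
        for (Replica r : view) r.bindings.put(name, obj);
    }

    // Anycast invocation: the read is executed by a single server.
    // Round-robin selection is an assumption of this sketch.
    String lookup(String name) {
        Replica r = view.get(next++ % view.size());
        return r.bindings.get(name);
    }

    public static void main(String[] args) {
        ExternalInvocationSketch registry = new ExternalInvocationSketch(3);
        registry.bind("printer", "obj-1");
        for (int i = 0; i < 3; i++) {
            System.out.println(registry.lookup("printer"));
        }
    }
}
```

Because the write reached every replica, each anycast read returns the same binding regardless of which replica answers.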

37
External Invocations: Example
[Diagram: clients C1 and C2; servers S1, S2, S3]
Multicast invocation: registry.bind(name, obj)

Anycast invocation: registry.lookup(name)

38
  • Summary
  • Introduction
  • Object Group Communication
  • The ARM framework
  • Integration with Java RMI / Jini
  • Conclusions

39
Replication Management: The Problem
  • Object Group Systems support replication
    transparency
  • Membership management
  • Reliable multicast
  • But they do not support full failure transparency
  • Application or manual support to distribute
    replicas
  • Application support or manual intervention
    required to recover from replica failures
  • Complicated tasks
  • Application implementations prone to contain
    errors
  • These tasks should not be left to the application
    developer

40
Solution: Autonomous Replication Management
  • Support for creating object groups
  • By placing individual members on distinct
    machines
  • Each application may specify a replication policy
  • For example, redundancy level 3
  • Support for failure recovery
  • Jgroup detects and reports failures to ARM
  • ARM reacts by creating a replacement member for
    each failed member, perhaps on a different
    machine
  • Each application may specify a recovery policy
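A toy sketch of the two ARM responsibilities described above: initial placement on distinct machines and replacement after a failure. It is not the ARM implementation; `ArmSketch` and its methods are hypothetical, the policy is reduced to a single redundancy level, and the sketch does not track whether the failed machine itself is down.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class ArmSketch {
    private final List<String> hosts;                        // machines known to the manager
    private final int redundancy;                            // replication policy
    private final Set<String> replicaHosts = new TreeSet<>();

    ArmSketch(List<String> hosts, int redundancy) {
        this.hosts = hosts;
        this.redundancy = redundancy;
    }

    // Deployment: place one replica on each of `redundancy` distinct hosts.
    Set<String> deploy() {
        for (String host : hosts) {
            if (replicaHosts.size() == redundancy) break;
            replicaHosts.add(host);
        }
        return placement();
    }

    // Recovery: replace a failed replica with a new one on another host.
    // Returns the chosen host, or null if no replacement was created.
    String onFailure(String failedHost) {
        if (!replicaHosts.remove(failedHost)) return null;   // no replica ran there
        for (String host : hosts) {
            if (!host.equals(failedHost) && !replicaHosts.contains(host)) {
                replicaHosts.add(host);
                return host;
            }
        }
        return null;                                         // no spare host available
    }

    Set<String> placement() {
        return new TreeSet<>(replicaHosts);
    }

    public static void main(String[] args) {
        ArmSketch arm = new ArmSketch(Arrays.asList("a", "b", "c", "d"), 3);
        System.out.println(arm.deploy());
        System.out.println(arm.onFailure("b"));
        System.out.println(arm.placement());
    }
}
```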

41
ARM Replica Distribution
[Diagram: item.ntnu.no, management client, router, ux.his.no]
43
ARM Recovery from Crash Failure
[Diagram: item.ntnu.no, group leader, management client, router, ux.his.no]
45
  • Summary
  • Introduction
  • Object Group Communication
  • The ARM framework
  • Integration with Java RMI / Jini
  • Conclusions

46
Introduction to Jini
  • Jini is an API built on top of the Java 2
    platform
  • enables spontaneous networks of devices/software
    services to assemble into federations of objects
  • addresses the distribution problems in these
    federations through a set of simple interfaces
    and protocols

Jini Network
47
Jini Architecture
  • The components of the Jini architecture may be
    divided into three categories
  • Infrastructure: the components that enable
    building a federated Jini system
  • Programming model: supports and encourages the
    production of reliable distributed services
  • Services: can be made part of a federated
    Jini system and offer functionality to any
    other member of the federation
  • JavaSpaces

48
Jini Infrastructure
  • The infrastructure is composed of
  • Java RMI protocol: enables objects residing in
    different JVMs to communicate through remote
    method invocations

[Diagram: JVM1 and JVM2]
49
Jini Infrastructure
  • The infrastructure is composed of
  • Lookup Service: defines how services become
    part of a Jini system and how clients retrieve
    services by their types and attributes

50
The Jini Programming Model
  • The programming model is based on three distinct
    paradigms for distributed computing
  • Leases: extend the Java programming model by
    adding time to the notion of holding a
    reference to a resource
  • Transactions: allow a set of operations on one or
    more remote participants to be grouped in such a
    way that either all succeed or all fail
  • Events: enable objects to register interest in
    changes of the abstract state of remote objects

51
Jini and Fault Tolerance
  • Jini fault tolerance is based on leases and
    transactions
  • leases enable the detection of service failures
  • transactions provide consistency by guaranteeing
    all-or-nothing semantics
  • Unfortunately, no support for high-availability
    is present in Jini
  • No support for replication
  • Failure of transaction manager → clients and
    participants must wait for the recovery of the
    manager before serving further requests

52
Enhancing Jini with Fault-Tolerance
  • Extending Jini with the Object Group Paradigm
  • Infrastructure
  • Extending Java RMI for Group Method Invocations
  • Extending the Lookup Service for dealing with
    Group Proxies
  • Programming Model
  • Object Group Paradigm as alternative programming
    model
  • Integration between transactional and object
    group model
  • Services
  • Replicated JavaSpaces

53
Extending Java RMI
  • RMI group at Javasoft designed Java RMI in order
    to be extensible
  • The RemoteRef interface enables programmers to
    write their own references to remote objects on
    the client-side
  • Unfortunately, RemoteRefs are not sufficient
  • There is no possibility to modify the behavior of
    RMI on the server side

RemoteRef
54
The Jgroup Approach (Current Version)
Statically or dynamically generated
implements the remote interface
Method dispatchers
Fixed stub for server proxy
55
Designing a New Java RMI API
  • We have cooperated with Sun Microsystems to
    design a new RMI API
  • Fully customizable, on both the client-side and
    the server-side
  • Based on Dynamic Proxy Classes (JDK 1.3) (no need
    for static stub generators)
  • Two different versions
  • One-to-one (remote method invocations)
  • Voted down in JSR-078
  • Being included in the "Davis" release of Jini
  • One-to-many (group method invocations)
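To show why dynamic proxy classes remove the need for static stub generators, here is a small sketch built on the standard `java.lang.reflect.Proxy` API. It is not the RMI API discussed on the slide: the `Hello` interface, `groupProxy`, and the failover policy (try each server in turn until one answers) are assumptions made for illustration.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.List;

class DynamicProxySketch {
    // Hypothetical service interface.
    interface Hello {
        String hello(String who);
    }

    // Builds the client-side proxy at run time; no stub generator is involved.
    // The handler forwards each call to the first server that does not fail.
    static Hello groupProxy(List<Hello> servers) {
        InvocationHandler handler = (proxy, method, args) -> {
            Throwable last = new IllegalStateException("no servers in group");
            for (Hello server : servers) {
                try {
                    return method.invoke(server, args);
                } catch (InvocationTargetException e) {
                    last = e.getCause();   // this server failed; try the next one
                }
            }
            throw last;
        };
        return (Hello) Proxy.newProxyInstance(
                Hello.class.getClassLoader(),
                new Class<?>[] { Hello.class },
                handler);
    }

    // One crashed server and one live server behind a single proxy.
    static String demo() {
        Hello crashed = who -> { throw new IllegalStateException("crashed"); };
        Hello alive = who -> "hello " + who;
        return groupProxy(Arrays.asList(crashed, alive)).hello("client");
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The client only sees the `Hello` interface, so it cannot tell that the first server failed and a second one answered.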

56
Jgroup with 1-to-1 Customizable RMI
Statically or dynamically generated
implements the remote interface
Method dispatchers
57
Jgroup with 1-to-Many Customizable RMI
Customizable objects
Multicast RMI
58
Extending the Lookup Service
  • Jini enables the registration of customized
    proxies for services
  • this feature can be used to register group
    proxies using any implementation of the lookup
    service
  • Group proxies, however, differ from standard
    proxies as their contents may be dynamic
  • server registration → server reference added to
    group proxy
  • server removal, lease expired → server reference
    removed from group proxy
  • We have developed an alternative implementation
    of the lookup specification capable of dealing with
    group proxies
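The dynamic contents of a group proxy can be sketched as a table of server references, each guarded by a lease. This is neither the Jini lease API nor the Jgroup lookup service: the class and method names are hypothetical, and time is passed in explicitly as a millisecond value so that the behaviour is deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class GroupProxyLookupSketch {
    // Server reference mapped to the time (in ms) at which its lease expires.
    private final Map<String, Long> leases = new TreeMap<>();

    // Server registration or lease renewal: reference added to the group proxy.
    void register(String serverRef, long now, long durationMs) {
        leases.put(serverRef, now + durationMs);
    }

    // Explicit server removal: reference removed from the group proxy.
    void remove(String serverRef) {
        leases.remove(serverRef);
    }

    // Current contents of the group proxy; expired leases are dropped first.
    List<String> groupProxy(long now) {
        leases.values().removeIf(expiry -> expiry <= now);
        return new ArrayList<>(leases.keySet());
    }

    public static void main(String[] args) {
        GroupProxyLookupSketch lookup = new GroupProxyLookupSketch();
        lookup.register("S1", 0, 100);
        lookup.register("S2", 0, 500);
        System.out.println(lookup.groupProxy(50));
        System.out.println(lookup.groupProxy(200));
    }
}
```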

59
The Jgroup Lookup Service
Lookup
Invocation
60
Extending the Jini Programming Model
  • Jgroup/Jini programming model for
    fault tolerance
  • Leases and transactions
  • Object group communication
  • Problem
  • transactions and group communication considered
    as separate aspects of fault-tolerance
  • their composition does not result in any
    meaningful combination of their respective
    strengths
  • We need the possibility of using replication in
    transactions
  • Transaction managers
  • Participants
  • Clients

61
  • Summary
  • Introduction
  • Object Group Communication
  • The ARM framework
  • Integration with Java RMI / Jini
  • Conclusions

62
Applications (Research)
  • Jgroup/ARM is being used for
  • A distributed auction system
  • Partitionable auctions
  • Panzieri, Amoroso et al., University of Bologna,
    2002
  • An online-upgrade service for active replication
  • Solarski, GMD Fokus
  • A replication management framework
  • Application-specific replication and recovery
    strategies
  • Meling, HiS
  • Dependable naming service
  • Support for extensible group proxies (JERI)
  • Meling et al., HiS

63
Applications (Education)
  • Jgroup is being used at the
  • Stavanger University College in the Advanced
    Programming course
  • University of Bologna in the Distributed Systems
    course
  • Norwegian University of Science and Technology in
    the Dependable Systems course
  • Source for several projects and theses
  • Low-level communication protocols (Bologna)
  • Replication services (Bologna)
  • Wide-area distributed services (Padova)
  • Management and deployment issues (HiS)

64
  • Thank You!
  • http://jgroup.sourceforge.net/