Title: Lightweight Fault Tolerance for Distributed Real-Time Systems
Lightweight Fault Tolerance for Distributed Real-Time Systems
Revised Submission realtime/2008-05-01
Ottawa OMG Technical Meeting, June 2008
Realtime/08-06-01
Andy Foster, PrismTech
Robert Kukura, PrismTech
Contents
- Overview
- Basic Operation
- CORBA-specific Details
- Fault Tolerance
- LWFT Interfaces
- Summary of ORB Changes
- Revisions Made Since the Initial Proposal
- Conclusions
Overview
- Objective: to define a new Lightweight Fault Tolerance for Distributed Real-Time Systems specification.
- The existing FT CORBA specification is not compatible with the RT CORBA specification.
- It also requires the implementation of the IOGR type.
- Here we propose an alternative to FT CORBA:
  - Makes use of location forwarding techniques to provide indirect binding to Server replicas.
  - Uses the standard multi-profile IOR.
  - Requires minimal ORB modifications.
  - Is compatible with the RT CORBA specification.
  - Supports legacy ORB clients.
Basic Operation
- Based on the concept of redirecting Clients to a suitable replicated Server entity.
- Servers are registered with a central Registry which records their endpoint information.
- Servers substitute the endpoint information of a Forwarder component into any IORs that they produce.
- As a result, clients using these IORs will be directed to the Forwarder, which will forward them to the IOR of the Server replica that they should use.
Basic Operation
- Replication is managed on a per-process granularity.
- Primary focus is on keeping track of the registration of transport endpoints that are to host fault tolerant entities.
- Will allow the querying of registered endpoints regarding individual objects hosted there. This is required for compatibility with Real-time CORBA.
- Does not attempt to handle application state consistency.
- Mechanisms are provided to allow applications to manage their own state consistency.
- Focuses on ensuring the safe delivery of requests and responses.
Basic Operation
[Diagram: a Client, a Forwarder, a Registry, and Server replicas 1A, 1B, and 1C, with interactions numbered 1 to 6 that correspond to the steps listed here.]
1. Servers register themselves with the Registry.
2. Forwarder accesses the Registry's records to
update its own.
3. Client requests access to a Fault Tolerant
Object on Server 1.
4. Forwarder selects a replica (1B) to
handle this request.
5. Server 1B returns an object reference to the
Forwarder...
6. ...which is passed to the Client in the
response to their original request.
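The six steps can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: the class and method names, the endpoint strings, and the "first registered replica" policy are hypothetical and not part of the proposal, and no real ORB or GIOP traffic is involved.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    forwarder_endpoint: str  # hypothetical address handed to Servers
    # server_id (replication group) -> list of (process_id, endpoint)
    groups: dict = field(default_factory=dict)

    def register_process(self, server_id, process_id, endpoint):
        """Step 1: record a replica and hand back the Forwarder endpoint."""
        self.groups.setdefault(server_id, []).append((process_id, endpoint))
        return self.forwarder_endpoint


@dataclass
class Forwarder:
    registry: Registry
    records: dict = field(default_factory=dict)

    def refresh(self):
        """Step 2: copy the Registry's records into the Forwarder's own."""
        self.records = {g: list(m) for g, m in self.registry.groups.items()}

    def locate(self, server_id):
        """Steps 4 to 6: pick a replica and return its direct endpoint."""
        replicas = self.records.get(server_id)
        if not replicas:
            raise LookupError(f"unknown replication group {server_id!r}")
        _process_id, endpoint = replicas[0]  # assumed policy: first registered
        return endpoint


registry = Registry(forwarder_endpoint="forwarder-host:9000")
replicas = [("1A", "host-a:5001"), ("1B", "host-b:5001"), ("1C", "host-c:5001")]
advertised = [registry.register_process("Server1", pid, ep) for pid, ep in replicas]

forwarder = Forwarder(registry)
forwarder.refresh()
direct_endpoint = forwarder.locate("Server1")  # step 3: the Client's request
```

Every replica ends up advertising the same Forwarder endpoint, while the Client is handed the direct endpoint of whichever replica the selection policy picks.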
CORBA-Specific Details
- Object References
  - Will use regular IORs rather than FT CORBA's IOGRs.
  - Every replica in the same group will share a common Object Key.
  - Every member of the same replication group will share a common -ORBServerID property value, which should be recoverable from their Object Key.
  - Servers will use their own Object Key and the Forwarder's endpoint information (obtained during Registration) in any IORs that they create for FT Objects.
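A minimal sketch of how a Server might combine its own Object Key with the Forwarder's endpoint when creating a reference. The `Profile` type, the `make_ft_profile` helper, the host and port, and the key layout are all assumptions made for illustration; a real IOR profile carries more fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """One IOR profile, reduced to the fields that matter here."""
    host: str
    port: int
    object_key: bytes


def make_ft_profile(own_object_key: bytes, forwarder_host: str,
                    forwarder_port: int) -> Profile:
    # The Server keeps its own Object Key but advertises the Forwarder's
    # endpoint, so clients bind to the Forwarder first.
    return Profile(forwarder_host, forwarder_port, own_object_key)


shared_key = b"Server1/RootPOA/account-17"  # hypothetical key layout
profile_from_1a = make_ft_profile(shared_key, "forwarder-host", 9000)
profile_from_1b = make_ft_profile(shared_key, "forwarder-host", 9000)
```

Because replicas in a group share the Object Key and the Forwarder endpoint, the references they publish are interchangeable from the client's point of view.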
CORBA-Specific Details
- FT_Locate()
  - Fully compatible ORBs will contact the Forwarder using the new reserved operation name 'FT_Locate'. The only correct, non-exceptional response to it is a reply with a LOCATION_FORWARD status value.
  - A response to an FT_Locate call containing a BAD_OPERATION exception should be interpreted as meaning the object exists, but the server ORB does not support LOCATION_FORWARD.
  - In this situation, the Forwarder can build an IOR for that Object from the Server's registration data, but it will not be suitable for use in Real-time CORBA applications.
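The two reply cases can be summarised as a small decision function. This is a sketch only: `ReplyStatus`, `resolve_reference`, and the string references are hypothetical stand-ins, not ORB APIs.

```python
from enum import Enum, auto
from typing import Optional, Tuple


class ReplyStatus(Enum):
    LOCATION_FORWARD = auto()
    BAD_OPERATION = auto()


def resolve_reference(status: ReplyStatus,
                      forwarded_ref: Optional[str],
                      registration_ref: str) -> Tuple[str, bool]:
    """Return (reference to hand to the client, usable with Real-time CORBA)."""
    if status is ReplyStatus.LOCATION_FORWARD:
        if forwarded_ref is None:
            raise ValueError("LOCATION_FORWARD reply carried no reference")
        return forwarded_ref, True
    if status is ReplyStatus.BAD_OPERATION:
        # The object exists but the server ORB cannot forward, so fall back
        # to a reference built from the Server's registration data.
        return registration_ref, False
    raise ValueError(f"unexpected FT_Locate reply: {status}")


direct = resolve_reference(ReplyStatus.LOCATION_FORWARD, "ref-from-server", "ref-from-registry")
legacy = resolve_reference(ReplyStatus.BAD_OPERATION, None, "ref-from-registry")
```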
CORBA-Specific Details
- Behaviour of the Forwarder
  - Upon receiving an FT_Locate request, the Forwarder will decode the incoming Object Key.
  - Using the extracted -ORBServerID value to identify the replication group, the Forwarder will select an appropriate Server to use (each replica will have been initialised with its own unique -ORBProcessID property value).
  - The Forwarder will use the recorded endpoint information for that Server to send an FT_Locate request to it. The response to this request will be a LOCATION_FORWARD which points directly to the location of an FT Object.
  - The Forwarder will now return a LOCATION_FORWARD reply to the Client which points to this received location.
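The Forwarder's decode, select, and relay sequence might look like the following sketch. The `LocationForward` type, the `Server` and `Forwarder` classes, and the `server_id:object` key layout are assumptions for illustration; real Object Keys are ORB-specific and the calls would travel over GIOP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocationForward:
    """Stands in for a GIOP reply with LOCATION_FORWARD status."""
    target: str  # reference the caller should use next


class Server:
    def __init__(self, process_id, endpoint):
        self.process_id = process_id
        self.endpoint = endpoint

    def ft_locate(self, object_key: bytes) -> LocationForward:
        # The replica answers with the direct location of its FT Object.
        return LocationForward(f"{self.endpoint}/{object_key.decode('ascii')}")


class Forwarder:
    def __init__(self, decode_server_id, select_process, servers):
        self._decode_server_id = decode_server_id
        self._select_process = select_process
        self._servers = servers  # process_id -> Server (recorded endpoints)

    def ft_locate(self, object_key: bytes) -> LocationForward:
        server_id = self._decode_server_id(object_key)   # decode the key
        process_id = self._select_process(server_id)     # choose a replica
        reply = self._servers[process_id].ft_locate(object_key)
        return LocationForward(reply.target)             # relay to the Client


servers = {"1A": Server("1A", "host-a:5001"), "1B": Server("1B", "host-b:5001")}
order = {"Server1": ["1B", "1A"]}  # hypothetical preference order
forwarder = Forwarder(
    decode_server_id=lambda key: key.split(b":", 1)[0].decode("ascii"),
    select_process=lambda server_id: order[server_id][0],
    servers=servers,
)
reply = forwarder.ft_locate(b"Server1:obj42")
```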
Fault Tolerance
- Fault Detection using FT CORBA's FT_HB() operation.
  - See '23.2.9 Transport Heartbeats' in the CORBA specification.
  - Replicas detected to have failed will have their registrations removed from the Registry.
- Fault Recovery using the standard CORBA IOR fall-back mechanisms.
  - If a Server replica fails, the direct IOR passed to the client by the Forwarder will no longer work.
  - The client will simply fall back to the original IOR as per normal transparent reinvocation behaviour, which will result in it requesting another replica's IOR from the Forwarder.
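The recovery path can be sketched from the client's side. In a real ORB this fall-back happens inside the ORB's transparent reinvocation logic; the `Client` class, `CommFailure` exception, and the `locate` and `send` callables below are hypothetical stand-ins used only to show the control flow.

```python
class CommFailure(Exception):
    """Stands in for a transport-level failure reaching a replica."""


class Client:
    def __init__(self, locate, send):
        self._locate = locate   # asks the Forwarder for a direct reference
        self._send = send       # sends a request to a direct reference
        self._direct = None     # cached forwarded location

    def invoke(self, request, max_attempts=3):
        for _ in range(max_attempts):
            if self._direct is None:
                self._direct = self._locate()  # fall back to the original IOR
            try:
                return self._send(self._direct, request)
            except CommFailure:
                self._direct = None            # drop the stale direct reference
        raise CommFailure("no replica reachable")


registered = ["1A", "1B"]     # what the Registry currently lists
reachable = {"1A", "1B"}      # which replicas are actually up


def locate():
    if not registered:
        raise CommFailure("no replicas registered")
    return registered[0]


def send(target, request):
    if target not in reachable:
        raise CommFailure(target)
    return f"{target} handled {request}"


client = Client(locate, send)
first = client.invoke("ping")
reachable.discard("1A")       # replica 1A fails
registered.remove("1A")       # heartbeat failure removes its registration
second = client.invoke("ping")
```

The second call first tries the cached reference to 1A, fails, and then returns to the Forwarder, which now hands out 1B.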
LWFT Interfaces
- Server endpoint data is represented in the system using the following data structures:
  - Endpoint
  - Location
- The key components which form this mechanism are:
  - LWFTRegistry
  - LWFTRegistryAdmin
  - LWFTForwarder
  - LWFTObjectKeyDecoder
  - LWFTProcessSelector
LWFT Interfaces - Data Structures
- Endpoint
    struct Endpoint {
      Location endpoint_key;
      Object profiles;
    };
- Used to represent an accessible Server endpoint in the system.
- Built from the information gained by decoding an Object Key into its constituent parts.
LWFT Interfaces - Data Structures
- Location
    struct Location {
      ObjectLifeSpan lifespan;
      unsigned long vendor_orb_id;
      string server_id;
      string orb_id;
      CORBA::StringSeq poa_names;
      CORBA::OctetSeq object_id;
    };
- Decoded Object Keys will contain some or all of these variables (the more the better).
- The Object Key must contain the server_id parameter in order to identify the requested replication group.
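As an illustration of the "server_id is mandatory, everything else is optional" rule, here is a sketch of a decoder for an invented key layout. The `|`-separated encoding is purely an assumption for this example (real Object Key layouts are vendor-specific), and the `lifespan` and `vendor_orb_id` fields are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Location:
    server_id: str                      # required: names the replication group
    orb_id: Optional[str] = None
    poa_names: List[str] = field(default_factory=list)
    object_id: bytes = b""


def decode_object_key(key: bytes) -> Location:
    """Decode the hypothetical layout server_id|orb_id|poa/poa|object_id."""
    parts = key.split(b"|", 3)
    server_id = parts[0].decode("ascii")
    if not server_id:
        raise ValueError("object key carries no server_id")
    orb_id = parts[1].decode("ascii") if len(parts) > 1 and parts[1] else None
    poa_names = (parts[2].decode("ascii").split("/")
                 if len(parts) > 2 and parts[2] else [])
    object_id = parts[3] if len(parts) > 3 else b""
    return Location(server_id, orb_id, poa_names, object_id)


full = decode_object_key(b"Server1|orbA|root/child|\x00\x01")
minimal = decode_object_key(b"Server1")
```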
LWFT Interfaces - Registry
- LWFTRegistry
  - Manages Server replica registrations and passes them the Forwarder endpoint information to use in their IORs.
  - Responsible for listening to Servers' still_alive calls and deregistering them should a timeout occur.
  - Other Registry components can be registered as listeners on a set of defined Locations. The level of listening used can be defined by policy.
  - The LWFTRegistryAdmin interface can be used by applications to access the Registry's records of registered processes, and to manipulate the order in which backups will be selected within individual replication groups.
LWFT Interfaces - Registry
interface Registry {
  void register_process(inout ProcessID process_id,
                        in EndpointSeq process_endpoints,
                        out EndpointSeq forwarder_endpoints)
    raises (ReplicaMismatch, ProcessIDInUse, UnParsableEndpoints);
  void deregister_process(in ProcessID process_id,
                          in string message)
    raises (NotFound);
  void still_alive(in ProcessID from) raises (NotFound);
  void process_ready(in ProcessID process_id);
  enum LISTEN_LEVEL { DEREGISTRATION, REGISTRATION, FULL };
  void register_listener(in Registry ft_registry,
                         in LISTEN_LEVEL level,
                         in LocationSeq locations);
  void shutdown(in boolean wait);
};
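The still_alive timeout behaviour could be modelled roughly as follows. This is a sketch, not the Registry's actual implementation: the `HeartbeatRegistry` class, its `expire` method, the timeout value, and the injected clock are assumptions chosen to keep the example deterministic.

```python
import time


class NotFound(Exception):
    """Mirrors the NotFound exception raised by still_alive."""


class HeartbeatRegistry:
    def __init__(self, timeout_s, clock=time.monotonic):
        self._timeout = timeout_s
        self._clock = clock
        self._last_seen = {}  # process_id -> time of last registration or heartbeat

    def register_process(self, process_id):
        self._last_seen[process_id] = self._clock()

    def still_alive(self, process_id):
        if process_id not in self._last_seen:
            raise NotFound(process_id)
        self._last_seen[process_id] = self._clock()

    def expire(self):
        """Deregister every process whose last heartbeat is too old."""
        now = self._clock()
        dead = [p for p, seen in self._last_seen.items()
                if now - seen > self._timeout]
        for process_id in dead:
            del self._last_seen[process_id]
        return dead

    def processes(self):
        return sorted(self._last_seen)


class FakeClock:
    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


clock = FakeClock()
registry = HeartbeatRegistry(timeout_s=5.0, clock=clock)
registry.register_process("1A")
registry.register_process("1B")
clock.now = 4.0
registry.still_alive("1A")      # only 1A keeps reporting
clock.now = 6.0
expired = registry.expire()
remaining = registry.processes()
```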
LWFT Interfaces - RegistryAdmin
- LWFTRegistryAdmin
  - Can be used by applications to gain direct access to the contents of the Registry.
  - Allows applications to query the registered details of a Server and manipulate the order of backups used by individual replication groups.
LWFT Interfaces - RegistryAdmin
interface RegistryAdmin {
  readonly attribute Registry the_registry;
  void get_all_processes(out ProcessIDSeq all_processes);
  void get_all_locations(out LocationSeq all_locations);
  void get_location(in Location the_location,
                    out ProcessIDSeq processes,
                    out ForwardPolicy policy,
                    out RegistrySeq listeners,
                    out ForwarderSeq forwarders);
  void set_location(in Location the_location,
                    in ForwardPolicy policy,
                    in ProcessIDSeq ordered_list,
                    in ForwarderSeq forwarders);
  void get_process(in ProcessID process,
                   out EndpointSeq registered_endpoints);
};
LWFT Interfaces - Forwarder
- LWFTForwarder
  - Responsible for retrieving the direct IOR of an FT Object for a client.
  - Uses the LWFTObjectKeyDecoder and LWFTProcessSelector interfaces to decode an ObjectKey and identify a suitable replica for the client to use.
  - Supports the registration of other Forwarders as fall-backs. If a request is received for an unknown replication group, the Forwarder redirects the client to the fall-back Forwarder using a LOCATION_FORWARD.
LWFT Interfaces - Forwarder
interface Forwarder {
  // Pseudo operation
  // void FT_Locate();
  // Pseudo operation
  // void FT_HB();
  void register_forwarder(in Forwarder fall_back,
                          in LocationSeq locations);
};
LWFT Interfaces - ObjectKeyDecoder, ProcessSelector
- LWFTObjectKeyDecoder
  - Provides methods to decode an Object Key octet sequence into an LWFTLocation data structure.
  - Two versions of this interface will be used, LWFTObjectKeyDecoder and a local interface version LWFTObjectKeyDecoderLocal.
- LWFTProcessSelector
  - A local interface which provides methods to select a replica's unique processID value from a particular replication group, given that group's unique serverID value.
  - The order in which replicas will be selected is determined by the implementation of this interface or any interfaces which inherit from it.
LWFT Interfaces - ObjectKeyDecoder, ProcessSelector
interface ObjectKeyDecoder {
  readonly attribute unsigned long vendor_orb_id;
  boolean get_key_contents(in CORBA::OctetSeq object_key,
                           out Location key_contents);
  // For symmetry only
  boolean get_object_key(in Location contents,
                         out CORBA::OctetSeq object_key);
};

local interface ProcessSelector {
  void set_registry(in RegistryAdmin admin);
  void add_process(in ProcessID process);
  void remove_process(in ProcessID process);
  ProcessID select_process(in Location the_location);
  ProcessIDSeq ordered_process_set(in Location the_location);
};
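To show how the selection order can be left to the implementation, here is a sketch of two selectors: one that always prefers the first registered replica and a derived one that rotates through the group. Both classes are hypothetical, and unlike the IDL above they take the server_id explicitly in add_process, which is a simplification made for this example.

```python
class OrderedProcessSelector:
    """Prefers the earliest registered replica of each group."""

    def __init__(self):
        self._groups = {}  # server_id -> process ids, preferred first

    def add_process(self, server_id, process_id):
        members = self._groups.setdefault(server_id, [])
        if process_id not in members:
            members.append(process_id)

    def remove_process(self, process_id):
        for members in self._groups.values():
            if process_id in members:
                members.remove(process_id)

    def ordered_process_set(self, server_id):
        return list(self._groups.get(server_id, []))

    def select_process(self, server_id):
        members = self.ordered_process_set(server_id)
        if not members:
            raise LookupError(f"no replicas for {server_id!r}")
        return members[0]


class RoundRobinSelector(OrderedProcessSelector):
    """Overrides the selection order by rotating through the group."""

    def __init__(self):
        super().__init__()
        self._turn = {}  # server_id -> number of selections made so far

    def select_process(self, server_id):
        members = self.ordered_process_set(server_id)
        if not members:
            raise LookupError(f"no replicas for {server_id!r}")
        index = self._turn.get(server_id, 0) % len(members)
        self._turn[server_id] = index + 1
        return members[index]


primary_first = OrderedProcessSelector()
rotating = RoundRobinSelector()
for selector in (primary_first, rotating):
    selector.add_process("Server1", "1A")
    selector.add_process("Server1", "1B")

before_failure = primary_first.select_process("Server1")
primary_first.remove_process("1A")
after_failure = primary_first.select_process("Server1")
rotation = [rotating.select_process("Server1") for _ in range(3)]
```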
Summary of ORB Modifications
- Additional Object Key parameters
  - server_id, used to uniquely identify replication groups, is required.
  - It is preferred that as many of the Location structure's parameters are provided as possible.
- New reserved operation name FT_Locate
  - The expected response is a LOCATION_FORWARD reply.
  - If the requested object does exist but the receiving ORB does not support LOCATION_FORWARD, return a BAD_OPERATION exception.
Revisions Made Since the Initial Proposal
- Separate interfaces to handle Server registration and Client forwarding.
- Introduced a new LWFTRegistryAdmin interface
  - Allows manual manipulation of the Registry and its records, including the ordering of backups.
- Introduced a new LWFTProcessSelector interface
  - Allows developers to define their own methods of selecting which replica to use upon each request.
  - Can be linked to an LWFTRegistryAdmin, to provide access to the full range of data in the system when making a decision.
Conclusions
- There is a great need for a Fault Tolerance specification which is lightweight, flexible, and fully compatible with the Real-time CORBA specification.
- This solution:
  - Minimizes ORB changes to a few additional Object Key values and a new reserved operation name.
  - Uses regular IORs.
  - Provides replication on a per-process granularity.
  - Provides developers with the means to implement replica state consistency however they choose to do so (if at all).
  - Is compatible with RT CORBA.