Reliable Multicasting with JGroups

1
Reliable Multicasting with JGroups
  • Bela Ban, Jan 2004
  • belaban_at_yahoo.com
  • http://www.jgroups.org

2
Overview
  • API, architecture
  • Protocols
  • Building Blocks
  • Performance
  • Future, Conclusion

3
What Is It?
  • Toolkit for reliable multicasting
  • Fragmentation
  • Message retransmission
  • Ordering
  • Group membership, membership change notification
  • LAN or WAN based

4
License
  • JGroups is a toolkit (JAR), to be linked against
    an application
  • Open Source under LGPL
  • Commercial products can use JGroups without
    having to LGPL their code
  • Modifications to JGroups itself need to be
    LGPL'ed (if distributed)
  • Dual licensing in the future

5
API
  • Channel: similar to java.net.MulticastSocket
  • plus group membership, reliability
  • Operations
  • Create a channel with a set of properties
  • Connect to a group X. Everyone that connects to X
    will see each other
  • Send a message to all members of X
  • Send a message to a single member

6
API
  • Receive a message
  • Retrieve membership
  • Be notified when members join, leave (including
    crashes)
  • Disconnect from the group
  • Close the channel

7
API
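    import org.jgroups.JChannel;
    import org.jgroups.Message;

    // Create a channel from an XML stack configuration and join the group
    JChannel channel = new JChannel("file:///home/bela/default.xml");
    channel.connect("demo-group");
    System.out.println("members are " + channel.getView().getMembers());

    // A null destination sends the message to all members of the group
    Message msg = new Message(null, null, "Hello world");
    channel.send(msg);
    Message m = (Message) channel.receive(0);
    System.out.println("received msg from " + m.getSrc() + ": " + m.getObject());

    // Leave the group, then release the channel's resources
    channel.disconnect();
    channel.close();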
8
Group topology
9
Architecture of JGroups
10
Demo
  • Draw
  • ReplicatedTree shared state

11
Stats
  • JGroups has 90KLOC
  • 30KLOC protocols
  • 45KLOC main building blocks
  • 15KLOC unit tests
  • 90 protocols shipped with JGroups
  • Set of well-tested stacks (in XML files)

12
Available protocols I
  • Transport
  • UDP, TCP, TCP_NIO, TUNNEL, JMS, LOOPBACK
  • Discovery
  • PING, TCPPING, TCPGOSSIP, UDPPING
  • Group membership
  • Reliable delivery FIFO
  • NAKACK, SMACK, UNICAST

13
Available protocols II
  • Failure detection
  • FD, FD_SOCK, FD_PID, FD_SIMPLE, FD_PROB,
    VERIFY_SUSPECT
  • Security
  • ENCRYPT, SSL ConnectionTable (n/a)
  • Fragmentation (FRAG)
  • State transfer (STATE_TRANSFER)

14
Available protocols III
  • Ordering
  • FIFO, CAUSAL, TOTAL, TOTAL_TOKEN
  • Virtual Synchrony
  • FLUSH, QUEUE, VIEW_ENFORCER
  • Probabilistic Broadcast
  • PBCAST
  • Merging
  • MERGE(2), MERGEFAST

15
Available protocols IV
  • Distributed message garbage collection
  • STABLE
  • Debugging
  • PERF, TRACE, PRINTOBJS, SIZE, BSH
  • Simulation
  • SHUFFLE, DELAY, DISCARD, DEADLOCK, LOSS,
    PARTITIONER

16
Available protocols V
  • Dynamic configuration
  • AUTOCONF
  • Flow control
  • FLOW_CONTROL, FC
  • Misc
  • PIGGYBACK, COMPRESS

17
Transport
  • Task
  • Send messages from above to all members in the
    group, or to a single member
  • Receive messages from the network, pass them up the stack
  • UDP: multicast and multiple UDP unicasts
  • TCP: multicast done by multiple TCP unicasts
  • TUNNEL: send to external router, e.g. through
    firewall

18
Discovery
  • Task
  • Initial discovery of members
  • Used by GMS to determine coordinator to send JOIN
    request to
  • Each member returns its own addr, plus the addr
    of the coordinator
  • Typical response: (A,A), (B,A), (C,A)
  • Wait for n milliseconds or m responses
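
The discovery responses above can be turned into a JOIN target with a few lines of code. The following is a minimal sketch, not JGroups code: the class DiscoverySketch, the PingRsp type and the "most frequently named coordinator wins" rule are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not JGroups code: choose the coordinator named by most responses.
class DiscoverySketch {
    // One discovery response: the responder's own address and the coordinator it knows.
    static final class PingRsp {
        final String ownAddr;
        final String coordAddr;
        PingRsp(String ownAddr, String coordAddr) {
            this.ownAddr = ownAddr;
            this.coordAddr = coordAddr;
        }
    }

    // Returns the most frequently named coordinator, or null if there were no responses
    // (a joiner that gets no responses would become coordinator itself).
    static String determineCoordinator(List<PingRsp> responses) {
        Map<String, Integer> votes = new TreeMap<>();
        for (PingRsp rsp : responses) {
            votes.merge(rsp.coordAddr, 1, Integer::sum);
        }
        String best = null;
        int bestVotes = 0;
        for (Map.Entry<String, Integer> e : votes.entrySet()) {
            if (e.getValue() > bestVotes) {   // strict '>' keeps the first address on ties
                best = e.getKey();
                bestVotes = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<PingRsp> rsps = new ArrayList<>();
        rsps.add(new PingRsp("A", "A"));
        rsps.add(new PingRsp("B", "A"));
        rsps.add(new PingRsp("C", "A"));
        System.out.println("send JOIN to " + determineCoordinator(rsps));
    }
}
```

Because the result is only used to pick a JOIN target, a stale or partial set of responses is tolerable in this sketch.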

19
Discovery - UDP
  • Multicast discovery request
  • Each member responds with a unicast UDP datagram
    (local-addr, coord-addr), back to the sender

20
Discovery - TCPGOSSIP
  • Can be used by both UDP and TCP
  • External GossipServer
  • org.jgroups.stack.GossipServer
  • Maintains table of <group, members>
  • Each member registers (groupname, own addr)
  • Lease based - members have to periodically renew
    registration
  • Multiple GossipServers possible

21
Discovery - TCPGOSSIP
  • To obtain initial membership for a given group,
    TCPGOSSIP contacts the GossipServer
  • Membership info does not need to be accurate -
    only goal is to determine coord to send JOIN
    request to
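
The lease-based <group, members> table can be pictured as follows. This is a rough sketch under stated assumptions, not the org.jgroups.stack.GossipServer implementation: the class LeaseRegistrySketch, its method names and the explicit clock parameter are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a lease-based <group, members> table.
class LeaseRegistrySketch {
    private final long leaseMillis;
    // group name -> (member address -> lease expiry time in ms)
    private final Map<String, Map<String, Long>> table = new HashMap<>();

    LeaseRegistrySketch(long leaseMillis) {
        this.leaseMillis = leaseMillis;
    }

    // Registering and renewing are the same operation: both push the expiry forward.
    void register(String group, String addr, long nowMillis) {
        table.computeIfAbsent(group, g -> new TreeMap<>()).put(addr, nowMillis + leaseMillis);
    }

    // Returns the members whose lease is still valid and drops the expired ones.
    List<String> getMembers(String group, long nowMillis) {
        Map<String, Long> members = table.get(group);
        if (members == null) {
            return new ArrayList<>();
        }
        members.values().removeIf(expiry -> expiry <= nowMillis);
        return new ArrayList<>(members.keySet());
    }

    public static void main(String[] args) {
        LeaseRegistrySketch registry = new LeaseRegistrySketch(1000);
        registry.register("demo-group", "A", 0);
        registry.register("demo-group", "B", 0);
        registry.register("demo-group", "A", 900);                    // A renews, B does not
        System.out.println(registry.getMembers("demo-group", 500));   // both leases still valid
        System.out.println(registry.getMembers("demo-group", 1500));  // B's lease has expired
    }
}
```

The returned list may be slightly out of date, which matches the point above that membership info only has to be good enough to find the coordinator.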

22
Discovery - TCPPING
  • Give a set of well known members
  • For discovery, those members are pinged
  • If at least 1 responds, we can find the
    coordinator
  • Does not require additional process

23
Group Membership
  • Task
  • Maintain a list of members
  • Notify members when a new member joins, or an
    existing member leaves (or crashes)
  • Each member has the same ordered list
  • List can be retrieved by Channel.getView()
  • First (= oldest) member is coordinator
  • If coord crashes, 2nd oldest takes over
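
The "first member is coordinator" rule needs no election, only an ordered list. A minimal sketch, assuming members are identified by plain strings; ViewSketch and its methods are hypothetical names, not the JGroups View class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: an ordered member list whose first entry acts as coordinator.
class ViewSketch {
    private final List<String> members = new ArrayList<>(); // oldest member first

    void join(String addr) {
        if (!members.contains(addr)) {
            members.add(addr);                               // new members go to the end
        }
    }

    // Used for both a graceful leave and a detected crash.
    void remove(String addr) {
        members.remove(addr);
    }

    String coordinator() {
        return members.isEmpty() ? null : members.get(0);
    }

    List<String> getMembers() {
        return Collections.unmodifiableList(members);
    }

    public static void main(String[] args) {
        ViewSketch view = new ViewSketch();
        view.join("A");
        view.join("B");
        view.join("C");
        System.out.println(view.getMembers() + " coord=" + view.coordinator());
        view.remove("A");                                    // the coordinator crashes
        System.out.println(view.getMembers() + " coord=" + view.coordinator());
    }
}
```

Since every member holds the same ordered list, every member reaches the same conclusion about who the coordinator is after a crash.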

24
Group Membership - JOIN
  • New member uses discovery to find coord
  • If first member -> become coord
  • Else sends JOIN to coord
  • Coord adds new member to list, multicasts new
    view (member list) to all members
  • If 2 initial members are started at the same
    time, MERGE protocol merges them into a single
    group

25
Group Membership - LEAVE
  • Member sends LEAVE to coord
  • Coord multicasts new view to all members

26
Group membership - CRASH
  • Failure detection protocol sends up SUSPECT event
  • VERIFY_SUSPECT double checks
  • GMS multicasts new view (not containing crashed
    member)
  • If member resurfaces, it will be shunned
  • Has to leave and rejoin group

27
Failure detection
  • Task
  • Detect if a member has crashed and send SUSPECT
    event up the stack (to be handled by GMS)
  • Logical ring over membership
  • Each member pings its neighbor to the right
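
The logical ring falls out of the ordered member list: the neighbor to the right is the next entry, wrapping around at the end. A small sketch, with RingSketch and neighborToPing as hypothetical names:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: derive the ping target from the ordered member list.
class RingSketch {
    // Returns the member to the right of 'self', wrapping around at the end of the list,
    // or null if 'self' is not a member or there is nobody else to ping.
    static String neighborToPing(List<String> members, String self) {
        int idx = members.indexOf(self);
        if (idx < 0 || members.size() < 2) {
            return null;
        }
        return members.get((idx + 1) % members.size());
    }

    public static void main(String[] args) {
        List<String> members = Arrays.asList("A", "B", "C");
        for (String m : members) {
            System.out.println(m + " pings " + neighborToPing(members, m));
        }
    }
}
```

When a new view is installed, each member simply recomputes its neighbor from the new list.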

28
Failure detection - FD
29
Reliable delivery FIFO
  • Lossless and FIFO delivery for multicast and
    unicast messages
  • Multicast: NAK and ACK
  • Unicast: ACK
  • Missing messages (gaps) are retransmitted
  • Sender resends or
  • Receiver requests retransmission
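
Receiver-driven retransmission can be sketched with per-sender sequence numbers: messages are delivered only in order, and any gap below the highest seqno seen is reported for retransmission. This is an illustrative sketch, not the NAKACK implementation; NakReceiverSketch and the seqno scheme starting at 1 are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch of gap detection and FIFO delivery for a single sender.
class NakReceiverSketch {
    private long nextToDeliver = 1;                              // next seqno expected in order
    private final TreeMap<Long, String> buffer = new TreeMap<>(); // messages received out of order
    private final List<String> delivered = new ArrayList<>();

    // Accepts a message and returns the seqnos that are missing and should be requested again.
    List<Long> receive(long seqno, String payload) {
        if (seqno >= nextToDeliver) {
            buffer.putIfAbsent(seqno, payload);                  // duplicates and old messages are ignored
        }
        // Deliver everything that is now contiguous.
        while (buffer.containsKey(nextToDeliver)) {
            delivered.add(buffer.remove(nextToDeliver));
            nextToDeliver++;
        }
        // Anything between nextToDeliver and the highest buffered seqno that is not buffered is a gap.
        List<Long> missing = new ArrayList<>();
        if (!buffer.isEmpty()) {
            for (long s = nextToDeliver; s < buffer.lastKey(); s++) {
                if (!buffer.containsKey(s)) {
                    missing.add(s);
                }
            }
        }
        return missing;
    }

    List<String> delivered() {
        return delivered;
    }

    public static void main(String[] args) {
        NakReceiverSketch rx = new NakReceiverSketch();
        System.out.println("missing after #1: " + rx.receive(1, "m1"));
        System.out.println("missing after #3: " + rx.receive(3, "m3")); // #2 was lost
        System.out.println("missing after #2: " + rx.receive(2, "m2")); // retransmission arrives
        System.out.println("delivered: " + rx.delivered());
    }
}
```

The sender-side alternative (resend until an ACK arrives) is not shown here.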

30
Encryption
  • Uses public/private encryption to join new member
    and get shared group key
  • Shared key is used to encrypt all messages
  • Group key is recomputed on joins/leaves
  • SSL ConnectionTable
  • As alternative, to be used in TCP
  • Uses SSLSocket rather than Socket
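
The key handover described above can be sketched with the standard javax.crypto API. The slides do not name algorithms, so everything concrete here is an assumption: RSA-OAEP for wrapping, AES-GCM for messages, the key sizes, and the class and method names. It is not the ENCRYPT protocol's code.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical sketch: hand a shared group key to a joiner, then encrypt messages with it.
class GroupKeySketch {
    private static final String WRAP_ALG = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding"; // assumed choice

    static SecretKey newGroupKey() throws GeneralSecurityException {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(128);
        return gen.generateKey();
    }

    // Key holder side: encrypt the group key with the joiner's public key.
    static byte[] wrapForJoiner(SecretKey groupKey, PublicKey joinerKey) throws GeneralSecurityException {
        Cipher rsa = Cipher.getInstance(WRAP_ALG);
        rsa.init(Cipher.WRAP_MODE, joinerKey);
        return rsa.wrap(groupKey);
    }

    // Joiner side: recover the group key with the private key.
    static SecretKey unwrap(byte[] wrapped, PrivateKey joinerKey) throws GeneralSecurityException {
        Cipher rsa = Cipher.getInstance(WRAP_ALG);
        rsa.init(Cipher.UNWRAP_MODE, joinerKey);
        return (SecretKey) rsa.unwrap(wrapped, "AES", Cipher.SECRET_KEY);
    }

    // Encrypts or decrypts a message with the shared key, depending on 'mode'.
    static byte[] crypt(int mode, SecretKey key, byte[] iv, byte[] data) throws GeneralSecurityException {
        Cipher aes = Cipher.getInstance("AES/GCM/NoPadding");
        aes.init(mode, key, new GCMParameterSpec(128, iv));
        return aes.doFinal(data);
    }

    // Full round trip: join, receive the key, encrypt with one copy, decrypt with the other.
    static String roundTrip(String text) {
        try {
            KeyPairGenerator kpg = KeyPairGenerator.getInstance("RSA");
            kpg.initialize(2048);
            KeyPair joiner = kpg.generateKeyPair();
            SecretKey groupKey = newGroupKey();
            SecretKey received = unwrap(wrapForJoiner(groupKey, joiner.getPublic()), joiner.getPrivate());
            byte[] iv = new byte[12];
            new SecureRandom().nextBytes(iv);                 // fresh IV per message
            byte[] wire = crypt(Cipher.ENCRYPT_MODE, groupKey, iv, text.getBytes(StandardCharsets.UTF_8));
            return new String(crypt(Cipher.DECRYPT_MODE, received, iv, wire), StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("Hello world"));
    }
}
```

Recomputing the group key on joins and leaves would mean calling newGroupKey() again and redistributing the result to the current members; that step is left out of the sketch.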

31
Properties configuration
  • Plain string format
    "UDP(mcast_addr=228.8.8.8;mcast_port=45566;ip_ttl=32;" +
    "mcast_send_buf_size=64000;mcast_recv_buf_size=64000):" +
    "PING(timeout=2000;num_initial_members=3):" +
    "MERGE2(min_interval=5000;max_interval=10000):" +
    "FD_SOCK:" +
    "VERIFY_SUSPECT(timeout=1500):" +
    "pbcast.NAKACK(max_xmit_size=8096;gc_lag=50;retransmit_timeout=600,1200,2400):" +
    "UNICAST(timeout=600,1200,2400,4800):" +
    "pbcast.STABLE(desired_avg_gossip=20000):" +
    "FRAG(frag_size=8096;down_thread=false;up_thread=false):" +
    "pbcast.GMS(join_timeout=5000;join_retry_timeout=2000;" +
    "shun=false;print_local_addr=true)"
  • URL / XML
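
In the plain string format, protocols are separated by ':' (listed bottom of the stack first) and each protocol's properties are written as key=value pairs separated by ';' inside parentheses. The parser below is a minimal sketch of reading that format, assuming well-formed input; StackSpecSketch is a hypothetical helper, not the JGroups configurator.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: parse "NAME(k=v;k=v):NAME:NAME(k=v)" into protocol -> properties.
class StackSpecSketch {
    // Returns the protocols in the order they appear, each with its (possibly empty) properties.
    static Map<String, Map<String, String>> parse(String spec) {
        Map<String, Map<String, String>> stack = new LinkedHashMap<>();
        int depth = 0;
        int start = 0;
        for (int i = 0; i <= spec.length(); i++) {
            char c = i < spec.length() ? spec.charAt(i) : ':';   // virtual terminator at the end
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            } else if (c == ':' && depth == 0) {                 // only split outside parentheses
                String token = spec.substring(start, i).trim();
                start = i + 1;
                if (!token.isEmpty()) {
                    addProtocol(stack, token);
                }
            }
        }
        return stack;
    }

    private static void addProtocol(Map<String, Map<String, String>> stack, String token) {
        Map<String, String> props = new LinkedHashMap<>();
        String name = token;
        int open = token.indexOf('(');
        if (open >= 0) {
            name = token.substring(0, open);
            String body = token.substring(open + 1, token.lastIndexOf(')'));
            for (String kv : body.split(";")) {
                int eq = kv.indexOf('=');
                if (eq > 0) {
                    props.put(kv.substring(0, eq).trim(), kv.substring(eq + 1).trim());
                }
            }
        }
        stack.put(name, props);
    }

    public static void main(String[] args) {
        String spec = "UDP(mcast_addr=228.8.8.8;mcast_port=45566):"
                + "PING(timeout=2000;num_initial_members=3):FD_SOCK";
        parse(spec).forEach((name, props) -> System.out.println(name + " " + props));
    }
}
```

Comma-separated values such as retransmit_timeout=600,1200,2400 are kept as a single string; interpreting them is left to the protocol that owns the property.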

32
Advantages of protocol stacks
  • Each property is implemented by one protocol
  • Fragmentation, retransmission, ordering
  • Protocols are assembled into a stack
  • Stack has exactly the properties needed by the
    application / required by the network
  • Can't get this with java.net.Socket, which always
    comes with full TCP/IP
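
The idea of assembling single-purpose protocols into a stack can be sketched as a list of layers: a message travels down through every layer on send and up in reverse order on receive. The Protocol interface and the header-tagging layers below are illustrative assumptions, not the org.jgroups.stack classes.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: each layer does one job; the stack is just the ordered list of layers.
class ProtocolStackSketch {
    interface Protocol {
        String down(String msg); // toward the network
        String up(String msg);   // toward the application
    }

    // A toy layer that adds its header on the way down and strips it on the way up.
    static Protocol header(String name) {
        final String prefix = name + "|";
        return new Protocol() {
            public String down(String msg) {
                return prefix + msg;
            }
            public String up(String msg) {
                if (!msg.startsWith(prefix)) {
                    throw new IllegalArgumentException("missing " + name + " header");
                }
                return msg.substring(prefix.length());
            }
        };
    }

    // 'stack' lists layers from the top (application side) to the bottom (network side).
    static String sendDown(List<Protocol> stack, String msg) {
        for (Protocol p : stack) {
            msg = p.down(msg);
        }
        return msg;
    }

    static String passUp(List<Protocol> stack, String wire) {
        for (int i = stack.size() - 1; i >= 0; i--) {
            wire = stack.get(i).up(wire);
        }
        return wire;
    }

    public static void main(String[] args) {
        List<Protocol> stack = Arrays.asList(header("GMS"), header("NAKACK"), header("UDP"));
        String wire = sendDown(stack, "hello");
        System.out.println("on the wire: " + wire);
        System.out.println("delivered:   " + passUp(stack, wire));
    }
}
```

Swapping a layer in or out changes the list, not the application code that calls sendDown and passUp.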

33
Advantages of protocol stacks
  • Small scope: a protocol does just one job, but
    does it well
  • Protocol stacks are fashionable
  • Servlet 2.3 filters
  • Interceptors (Corba, JBoss)
  • AOP: separation of concerns, e.g. fragmentation
    should not be an application concern

34
Benefits
  • Same application code, different protocol stacks
    (deployment issue)
  • Application requirements reflected in protocol
    stack specification
  • App focuses on domain specific issues

35
Building Blocks
  • Replicated Cache
  • NotificationBus
  • Group RPC

36
Replicated Cache
  • Shared state across a group
  • Any change is replicated to all members
  • New members acquire initial state from coord
  • Structures supported
  • Tree
  • Hashmap
  • Queues

37
NotificationBus
  • Thin layer on Channel
  • Notifications sent to all members
  • Callback when notification is received
  • Hook for state sharing

38
Group RPC
  • Invoke a method call in all members
  • Get a list of responses
  • Wait for all responses, majority, first, or none
    response (use optional timeout)
  • Handles crashed members correctly (no blocking)
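
The four wait modes and the "no blocking on crashed members" rule boil down to a small completion check. The sketch below is one possible reading, not the JGroups dispatcher: the names are hypothetical, and computing the majority over the members that are still alive is an assumption.

```java
// Hypothetical sketch: decide when a group method call has collected enough responses.
class RpcWaitSketch {
    enum Mode { ALL, MAJORITY, FIRST, NONE }

    // Number of responses needed from 'alive' members before the call may return.
    static int required(Mode mode, int alive) {
        switch (mode) {
            case ALL:
                return alive;
            case MAJORITY:
                return alive == 0 ? 0 : alive / 2 + 1;
            case FIRST:
                return Math.min(1, alive);
            default:
                return 0;                                   // NONE: fire and forget
        }
    }

    // A suspected (crashed) member will never answer, so it is not waited for.
    static boolean done(Mode mode, int members, int suspected, int received) {
        return received >= required(mode, members - suspected);
    }

    public static void main(String[] args) {
        // 4 members, 1 crashed: ALL is satisfied by the 3 survivors.
        System.out.println(done(Mode.ALL, 4, 1, 3));
        System.out.println(done(Mode.MAJORITY, 5, 0, 2));
        System.out.println(done(Mode.NONE, 5, 0, 0));
    }
}
```

A real caller would combine this check with the optional timeout, returning whatever responses have arrived when the timeout fires.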

39
Theory
  • Virtual Synchrony
  • DEFAULT
  • Probabilistic Broadcast

40
Virtual Synchrony
  • A View is a list of members (A,B,C,D)
  • When members join/leave, a new view will be
    installed (A,C,D)
  • Every healthy member receives the same set of
    messages between subsequent views
  • Messages sent in V1 are received in V1
  • All msgs by sender received in same order

41
Virtual Synchrony
  • The FLUSH protocol ensures that all members have
    received all msgs in V1 before installing V2
  • New members won't receive messages from previous
    views
  • Members that left won't receive msgs

[Figure: members A, B, C and views V1, V2]
42
DEFAULT
  • VSYNC is expensive, doesn't scale well
  • Stop-the-world model on view changes
  • DEFAULT treats views as regular msgs
  • Less stringent reliability guarantees
  • Still good enough for most apps
  • SMACK does away with membership altogether, uses
    approximation of membership
  • Good for large groups (no coord)

43
Probabilistic Broadcast
- First a dirty multicast
44
Probabilistic Broadcast
- Then gossipping to repair failures
45
Probabilistic Broadcast
  • Epidemic style msg dissemination
  • Very resilient to attacks
  • Avoid nak implosions
  • Suited for large networks
  • Probability
  • Close to 1 that either all members or none receive the msg
  • Close to 0 that only a few members receive the msg
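
The two phases (an unreliable "dirty" multicast, then gossip to repair losses) can be simulated in a few lines. This is a toy model under stated assumptions, not the PBCAST protocol: member 0 is the only sender, each member pulls from one random peer per round, and the class and parameter names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical simulation of "dirty multicast, then gossip to repair".
class GossipSketch {
    // Returns the number of gossip rounds needed until every member holds every message,
    // or -1 if that did not happen within maxRounds.
    static int simulate(int members, int messages, double lossRate, long seed, int maxRounds) {
        Random rnd = new Random(seed);
        List<Set<Integer>> received = new ArrayList<>();
        for (int m = 0; m < members; m++) {
            received.add(new HashSet<>());
        }
        // Phase 1: unreliable multicast. Member 0 is the sender and keeps every message.
        for (int msg = 0; msg < messages; msg++) {
            received.get(0).add(msg);
            for (int m = 1; m < members; m++) {
                if (rnd.nextDouble() >= lossRate) {
                    received.get(m).add(msg);
                }
            }
        }
        // Phase 2: in each round, every member pulls what it is missing from one random peer.
        for (int round = 0; ; round++) {
            boolean complete = true;
            for (Set<Integer> s : received) {
                complete &= s.size() == messages;
            }
            if (complete) {
                return round;
            }
            if (round == maxRounds) {
                return -1;
            }
            for (int m = 0; m < members; m++) {
                int peer = rnd.nextInt(members);
                received.get(m).addAll(received.get(peer));
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("rounds to converge: " + simulate(16, 100, 0.3, 42L, 1000));
    }
}
```

Because repairs are pulled from arbitrary peers rather than requested from the sender, the sender is not flooded with retransmission requests, which is the NAK implosion the slide refers to.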

46
Serverless JMS
  • JMS based on JGroups
  • Peer-to-peer architecture rather than C/S
  • Client publishing to a topic
  • Instead of sending the msg to the server, which then
    distributes it to multiple clients, the publisher
    multicasts the message
  • JMS Server just another member
  • Handles persistent messages (DB)

47
Serverless JMS
Cost: 4 unicasts
Cost: 1 multicast
48
Serverless JMS
  • Clients are still able to publish even when
    server is down
  • Caveat: works in scenarios where client and server
    are in the same multicast-reachable network
  • Status
  • Topics/Queues available
  • No TX/XA, no durable subscriptions, no persistent
    messages
  • Download (standalone) beta at jboss.org

49
Session Replication in Tomcat
  • Done by Filip Hanik in Tomcat 4.x
  • Servlet sessions are replicated across Tomcat
    processes
  • New Tomcat instance gets sessions from existing
    Tomcat instance(s)
  • Modification (addition, removal of attributes) of
    session gets replicated

50
Session Clustering in Tomcat II
  • Expiry of session will expire session everywhere
    in the cluster
  • Last timestamp update
  • External load-balancer distributes requests to
    Tomcat instances
  • Round-robin
  • Sticky, next server on crash
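
The load-balancing policy (round-robin for new sessions, sticky afterwards, next server on crash) might look like the sketch below. It only works because sessions are replicated to the other Tomcat instances. The class StickyBalancerSketch and its methods are hypothetical; the slides do not describe a specific load balancer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a sticky round-robin balancer with failover to the next server.
class StickyBalancerSketch {
    private final List<String> servers;
    private final Set<String> crashed = new HashSet<>();
    private final Map<String, Integer> sticky = new HashMap<>(); // session id -> server index
    private int nextNew = 0;                                     // round-robin cursor for new sessions

    StickyBalancerSketch(List<String> servers) {
        this.servers = new ArrayList<>(servers);
    }

    void markCrashed(String server) {
        crashed.add(server);
    }

    String route(String sessionId) {
        Integer start = sticky.get(sessionId);
        if (start == null) {                                     // new session: round-robin
            start = nextNew;
            nextNew = (nextNew + 1) % servers.size();
        }
        // Known session: stay on its server, or move to the next live one after a crash.
        for (int i = 0; i < servers.size(); i++) {
            int candidate = (start + i) % servers.size();
            if (!crashed.contains(servers.get(candidate))) {
                sticky.put(sessionId, candidate);
                return servers.get(candidate);
            }
        }
        throw new IllegalStateException("no Tomcat instance available");
    }

    public static void main(String[] args) {
        StickyBalancerSketch lb = new StickyBalancerSketch(Arrays.asList("tomcat1", "tomcat2", "tomcat3"));
        System.out.println(lb.route("s1"));
        System.out.println(lb.route("s2"));
        lb.markCrashed("tomcat1");
        System.out.println(lb.route("s1"));                      // s1 fails over to the next server
    }
}
```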

51
Session Clustering in Tomcat III
52
Where is JGroups used?
  • JBoss
  • Clustering
  • Replication of entity beans, SLSBs and SFSBs
  • HA-JNDI
  • Cache invalidation
  • Session repl (integrated Tomcat, Jetty)
  • Serverless JMS
  • Cache
  • Replicated transactional clustered cache

53
Where is JGroups used?
  • JOnAS appserver (clustering)
  • GroupPac (FT-CORBA impl)
  • GCT port to .NET
  • Replicated Caching
  • OpenSymphony OSCache
  • Jakarta Turbine's JCS
  • Swarmcache

54
Where is JGroups used?
  • Session replication
  • Jetty
  • Tomcat 4.x
  • Work in progress on plugin architecture for
    Tomcat 5.x
  • Unofficial ones...

55
Performance
  • 4 nodes, 1 or 2 senders
  • 750MHz SunBlade 1000, 512MB RAM, 100Mbit/s switched
    Ethernet
  • JGroups 2.1
  • 8000 10K msgs, in 200 bursts of 20 (2 senders),
    sleep after burst: 5ms
  • 451 msgs/s, 4.5MB/s throughput
  • Resident heap size 35MB max (-Xmx128m)

56
Performance
  • 1.4 billion messages total
  • 4 nodes, 2 senders
  • Message size: 10K
  • Average msgs/s: 350
  • Max resident mem: 35M (-Xmx128m)
  • Tests available as part of JG distro
  • Includes gnuplot scripts to generate graphs

57
Current and future projects
  • JBossCache, Serverless JMS
  • Port to J2ME (first version available on
    www.jgroups-me.org)
  • hsqldb (HyperSonic) database replication
  • JCache JSR 107 compliant impl (JBoss Cache)
  • Potential work on GroupComm JSR
  • jcluster project on dev.java.net

58
Links
  • www.jgroups.org
  • "Papers and Articles" link to IBM developerWorks

59
Questions?