Scalable Distributed Stream Processing

About This Presentation
Title:

Scalable Distributed Stream Processing


Slides: 18
Provided by: webC
Learn more at: http://web.cs.wpi.edu

Transcript and Presenter's Notes


1
Scalable Distributed Stream Processing
  • Presented by Ming Jiang

2
Centralized stream processing review
3
Situation when distributed
  • A distributed federation of participating nodes
    in different administrative domains
  • Collaboration between different domains required

4
Two complementary efforts for the situation
  • Aurora
  • intra-participant distribution
  • Medusa
  • inter-participant distribution

5
Three pieces to be shared
  • Aurora
  • An overlay network of communication
  • Algorithms for high-availability

6
Three architectural issues
  • Communications
  • Load sharing
  • High availability in the presence of failure

7
Communications
  • Naming (participants, entity-name)
  • Routing
  • 1. A data source or an administrator registers a
    schema and a stream
  • 2. When a data source (DS) produces an event, it labels the event

8
Communications
  • Message Transport
  • multiplexing all the message streams on a single
    TCP connection
  • Remote definition, because process migration is too
    complicated
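
To make the multiplexing bullet concrete, here is a minimal sketch of carrying several message streams over one byte stream, such as a single TCP connection. The frame layout (a 2-byte stream id followed by a 4-byte payload length) and the names `mux` and `demux` are assumptions made for illustration; the slides do not specify a wire format.

```python
import struct
from collections import defaultdict

# Hypothetical frame header: stream id (2 bytes) + payload length (4 bytes),
# network byte order. This layout is an assumption, not taken from the slides.
HEADER = struct.Struct("!HI")

def mux(messages):
    """Interleave (stream_id, payload) pairs into one byte sequence."""
    out = bytearray()
    for stream_id, payload in messages:
        out += HEADER.pack(stream_id, len(payload))
        out += payload
    return bytes(out)

def demux(data):
    """Split one byte sequence back into per-stream lists of payloads."""
    streams = defaultdict(list)
    offset = 0
    while offset < len(data):
        stream_id, length = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        streams[stream_id].append(data[offset:offset + length])
        offset += length
    return dict(streams)

# Two logical streams share one "connection" (here just a bytes object).
wire = mux([(1, b"tick"), (2, b"alert"), (1, b"tock")])
per_stream = demux(wire)
```

Each stream keeps its own message order even though the frames are interleaved on the shared connection.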

9
Load Management
  • Repartitioning Aurora Networks, based on loads
    and resources
  • Box Sliding
  • Box Splitting

10
Box Sliding
  • Takes a box on the edge of a sub-network on one
    machine and shifts it to its neighbor.

(Figure: upstream box sliding)
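
Box sliding can be sketched by modelling each machine as an ordered list of boxes in a linear pipeline. The function names, the box names, and the reading of "upstream" as "toward the machine closer to the data source" are assumptions for illustration only.

```python
def slide_upstream(upstream_machine, downstream_machine):
    """Move the first box of the downstream machine onto its upstream neighbor."""
    if not downstream_machine:
        raise ValueError("downstream machine has no box to slide")
    box = downstream_machine.pop(0)
    upstream_machine.append(box)
    return box

def slide_downstream(upstream_machine, downstream_machine):
    """Move the last box of the upstream machine onto its downstream neighbor."""
    if not upstream_machine:
        raise ValueError("upstream machine has no box to slide")
    box = upstream_machine.pop()
    downstream_machine.insert(0, box)
    return box

# Hypothetical pipeline: filter -> map -> tumble -> union
machine_1 = ["filter", "map"]
machine_2 = ["tumble", "union"]

moved = slide_upstream(machine_1, machine_2)
pipeline = machine_1 + machine_2  # overall box order is unchanged
```

Only a box on the edge of a sub-network moves, so the logical query stays the same while the load on the two machines shifts.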
11
Box Splitting
  • Create a copy of a box, intended to run on a
    second machine, to offload work
  • Needs a filter to act as a router
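
The role of the filter as a router can be sketched for the simplest case, a stateless box. The helper `split_box` and the example box are hypothetical; the sketch runs both copies in one process and only illustrates how tuples are routed.

```python
def split_box(box, predicate):
    """Return a function that applies `box` through two copies.

    Tuples satisfying `predicate` go to the first copy (machine 1),
    all others to the second copy (machine 2). The outputs are then
    combined, so the result may be in a different order than with an
    unsplit box.
    """
    def run(tuples):
        first_input = [t for t in tuples if predicate(t)]
        second_input = [t for t in tuples if not predicate(t)]
        first_output = [box(t) for t in first_input]    # copy on machine 1
        second_output = [box(t) for t in second_input]  # copy on machine 2
        return first_output + second_output
    return run

def double(t):
    """Hypothetical stateless box."""
    return t * 2

split_double = split_box(double, lambda t: t < 3)
result = split_double([1, 5, 2, 4])
```

For a stateless box, combining the two outputs is enough. A stateful box such as an aggregate needs an extra merge step.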

12
Box splitting
  • Box splitting has to be transparent
(Figure labels: Tumble, Merge)
13
Box splitting
  • If the predicate in the filter is B < 3

Machine A: 1, 2, 3, 4, 7
Machine B: 5, 6
(Figure: per-machine results and the final result after merge)
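
A rough sketch of why a merge step is needed when an aggregate box is split: each machine sees only part of a group, so the partial results must be combined. The tuples below are hypothetical and are not the ones in the slide's figure. The aggregate is simplified to a count per value of A over a finite batch, and the routing predicate is assumed to read B < 3.

```python
from collections import Counter

def count_by_a(tuples):
    """Simplified aggregate: count tuples per value of field A."""
    return Counter(t["A"] for t in tuples)

# Hypothetical input tuples.
tuples = [
    {"A": 1, "B": 2},
    {"A": 1, "B": 3},
    {"A": 2, "B": 1},
    {"A": 2, "B": 6},
    {"A": 4, "B": 2},
]

# The filter routes each tuple to one of the two copies of the box.
machine_a_input = [t for t in tuples if t["B"] < 3]
machine_b_input = [t for t in tuples if not t["B"] < 3]

partial_a = count_by_a(machine_a_input)
partial_b = count_by_a(machine_b_input)

# Merge: summing the partial counts per group gives the unsplit result.
merged = partial_a + partial_b
unsplit = count_by_a(tuples)
```

Neither partial result alone matches the unsplit box. Only the merged counts do, which is what keeps the split transparent to downstream boxes.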
14
Key partitioning Challenges
  • Choosing what to offload
  • Choosing what to split
  • Choosing filters
  • Others

15
High Availability
Utilize the push-based nature
16
Failure detection and Recovery
  • 1. Periodically send heartbeat messages to upstream
    neighbors
  • 2. If a server does not reply within a predefined
    time, assume it has failed
  • 3. Initiate the recovery phase, emulating the processing
    of the failed server
  • (load shedding can be used)
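
The detection steps above can be sketched as a timeout check over the last heartbeat reply from each upstream neighbor. The class name, server names, and timeout value are hypothetical. Timestamps are passed in explicitly so the example is deterministic, and the recovery phase itself is not modelled.

```python
class FailureDetector:
    """Tracks heartbeat replies and flags servers that have gone silent."""

    def __init__(self, timeout):
        self.timeout = timeout  # predefined silence period, in seconds
        self.last_reply = {}    # server name -> time of the last reply

    def record_reply(self, server, now):
        """Note that `server` answered a heartbeat at time `now`."""
        self.last_reply[server] = now

    def failed_servers(self, now):
        """Return servers whose last reply is older than the timeout."""
        return sorted(
            server
            for server, replied_at in self.last_reply.items()
            if now - replied_at > self.timeout
        )

detector = FailureDetector(timeout=3.0)
detector.record_reply("upstream-1", now=0.0)
detector.record_reply("upstream-2", now=0.0)
detector.record_reply("upstream-1", now=4.0)  # upstream-2 stays silent

suspects = detector.failed_servers(now=5.0)
```

A server in `suspects` is only assumed to have failed. A slow network looks the same as a crash from the detector's point of view.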

17
  • Thank you!