A Recovery-Friendly, Self-Managing Session State Store

Description:

A Recovery-Friendly, Self-Managing Session State Store. Benjamin Ling, Emre Kiciman, Armando Fox, {bling,emrek,fox}_at_cs.stanford.edu

Slides: 45

Provided by: Benja54

Learn more at: http://roc.cs.berkeley.edu

Transcript and Presenter's Notes

1
A Recovery-Friendly, Self-Managing Session State
Store

Benjamin Ling, Emre Kiciman, Armando Fox
{bling,emrek,fox}_at_cs.stanford.edu

2
Outline

Motivation: What is Session State?
SSM
Architecture
Algorithm
Backpressure and Admission Control
SSM + Pinpoint
Self-recovering, self-monitoring
Benchmarks
Next steps: Sun Reference AppServer integration
Conclusion

3
Proliferation of J2EE and Web Services

J2EE embraced as industry standard
Framework
Simplifies development
Allows for portability of services
Standardized interfaces
However, difficulties remain

4
The Pain: Administration and Maintenance

Administration is difficult and costly
-- Database admins cost $200K/yr a head
Development efficiency negatively impacted
Failure/Recovery is costly
Recovery slow, especially site outages
Data loss on crashes
Users adversely affected

5
Not All State is Created Equal

Various types of state in J2EE
User profile state
Persistent shared state
Transaction history state
But usually stored in the same place
Stored in DB or FS
Focus on particular class
Exploit its properties
Simplify Administration and Maintenance

6
Example of Session State
7
Properties of Session State

Subcategory of session state
Single-user, serial access, semi-persistent data
Examples: temporary application data, application workflow
Example of usage (e.g. J2EE)

8
Goal

Build a session state store that is
Failure-friendly
Does not lose data on crash
Degrades gracefully
Recovery-friendly
Recovers fast
Self-Managing

9
Outline

Motivation: What is Session State?
SSM
Architecture
Algorithm
Backpressure and Admission Control
SSM + Pinpoint
Self-recovering, self-monitoring
Benchmarks
Next steps: Sun Reference AppServer integration
Conclusion

10
Session State Manager (SSM)
RAM, Network Interface
Redundant, in-memory hash table distributed across nodes

Algorithm: Redundancy similar to quorums
Write to many random nodes, wait for few (avoid performance coupling)
Read one
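
The write path above (write to many random nodes, wait for few) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the `Brick` class, the `write_session` function, the thread pool, and the timeout are all hypothetical names and choices made for this example. The defaults W = 4 and WQ = 2 match the values used in the deck's write example.

```python
import random
from concurrent.futures import ThreadPoolExecutor, as_completed
from concurrent.futures import TimeoutError as FuturesTimeout


class Brick:
    """Hypothetical storage node: an in-memory dict behind put/get."""

    def __init__(self, brick_id):
        self.brick_id = brick_id
        self.table = {}

    def put(self, key, value):
        self.table[key] = value
        return self.brick_id

    def get(self, key):
        return self.table[key]


def write_session(bricks, key, value, w=4, wq=2, timeout=1.0):
    """Send the write to w random bricks; return once wq have acknowledged.

    Returns the ids of the first wq bricks to reply. A caller would keep
    these ids as metadata so that a later read knows where to look.
    """
    targets = random.sample(bricks, w)
    pool = ThreadPoolExecutor(max_workers=w)
    futures = [pool.submit(brick.put, key, value) for brick in targets]
    acked = []
    try:
        for future in as_completed(futures, timeout=timeout):
            if future.exception() is None:
                acked.append(future.result())
            if len(acked) >= wq:
                return acked  # do not wait for the slower bricks
    except FuturesTimeout:
        pass  # fall through and report the failed write
    finally:
        pool.shutdown(wait=False)  # stragglers finish in the background
    raise RuntimeError(f"only {len(acked)} of the required {wq} bricks replied")


bricks = [Brick(i) for i in range(1, 6)]
cookie_metadata = write_session(bricks, "session-42", {"cart": ["book"]})
```

Returning after the first WQ acknowledgements is what keeps a single slow brick from delaying the caller, which is one reading of the slide's "avoid performance coupling".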

11-14
Write example: Write to Many, Wait for Few
Try to write to W random bricks, W = 4. Must wait for WQ bricks to reply, WQ = 2.
[Diagram, identical in the transcript of all four slides: Browser and Bricks 1 through 5]
15
Write example: Write to Many, Wait for Few
Try to write to W random bricks, W = 4. Must wait for WQ bricks to reply, WQ = 2.
[Diagram: Browser and Bricks 1 through 5. Cookie holds metadata: 1, 4]
16
Read example
Try to read from Bricks 1, 4
[Diagram: Browser and Bricks 1 through 5. Cookie metadata: 1, 4]
17
Read example
[Diagram: Browser and Bricks 1 through 5. Cookie metadata: 1, 4]
18
Read example
Brick 1 crashes
[Diagram: Browser and Bricks 1 through 5]
19
Read example
[Diagram: Browser and Bricks 2 through 5; Brick 1 is no longer shown]
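
The read path in the slides above (the cookie names the bricks holding the state, and the read survives the crash of one of them) might look like the following sketch. The data layout is an assumption for illustration only: bricks are modeled as plain dictionaries keyed by brick id, a crashed brick is simply missing from the map, and `read_session` is a hypothetical helper name.

```python
def read_session(bricks, key, cookie_brick_ids):
    """Try each brick named in the cookie; return the first copy found.

    bricks maps brick id -> that brick's in-memory table (a dict).
    A crashed brick is absent from the map, so it is skipped.
    """
    for brick_id in cookie_brick_ids:
        table = bricks.get(brick_id)
        if table is not None and key in table:
            return table[key]
    raise LookupError(f"no live brick in {cookie_brick_ids} holds {key!r}")


# Five bricks; the earlier write was acknowledged by Bricks 1 and 4.
bricks = {brick_id: {} for brick_id in range(1, 6)}
bricks[1]["session-42"] = "cart=book"
bricks[4]["session-42"] = "cart=book"
cookie = [1, 4]

del bricks[1]  # Brick 1 crashes
value = read_session(bricks, "session-42", cookie)  # served by Brick 4
```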
20
SSM Failure and Recovery

Failure of single node
No data loss, WQ-1 remain
State is available for R/W during failure
Recovery
Restart: no recovery step needed
No special case recovery code
State is available for R/W during brick restart
Session state is self-recovering
User's access pattern causes data to be rewritten

21
Backpressure and Admission Control
[Diagram: Bricks 1 through 5. Labels: Heavy flow to Brick 3, Drop Requests]
22
Backpressure and Admission Control
[Diagram: Bricks 1 through 5. Labels: Drop Requests, Reduce Sending, Reject requests]
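
One way to picture the three labels above (drop, reduce sending, reject) is a per-brick send window kept by the sender. The slides do not say how the sending rate is reduced, so the additive-increase, multiplicative-decrease policy below is an assumption, and `BrickWindow` and its parameters are hypothetical names for this sketch.

```python
class BrickWindow:
    """Cap on outstanding requests to one brick (assumed AIMD policy)."""

    def __init__(self, initial=8, minimum=1, maximum=64):
        self.limit = initial
        self.minimum = minimum
        self.maximum = maximum
        self.in_flight = 0

    def try_send(self):
        """Admit a request if the brick has room; otherwise reject it."""
        if self.in_flight >= self.limit:
            return False  # admission control: reject instead of queueing
        self.in_flight += 1
        return True

    def on_reply(self):
        """The brick answered: grow the window by one."""
        self.in_flight -= 1
        self.limit = min(self.maximum, self.limit + 1)

    def on_drop(self):
        """The brick dropped the request: halve the window."""
        self.in_flight -= 1
        self.limit = max(self.minimum, self.limit // 2)


window = BrickWindow(initial=4)
admitted = [window.try_send() for _ in range(5)]  # fifth request is rejected
window.on_drop()  # overloaded brick dropped one request; window shrinks
```

Rejecting at the sender keeps an overloaded brick from accumulating a backlog, at the cost of pushing the refusal up to the caller.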
23
Outline

Motivation: What is Session State?
SSM
Architecture
Algorithm
Backpressure and Admission Control
SSM + Pinpoint
Self-recovering, self-monitoring
Benchmarks
Next steps: Sun Reference AppServer integration
Conclusion

24
Recovery Philosophy
[Chart: RECOVERY COST (Cheap to Expensive) against DETECTION ACCURACY. Labels: Accurate, Lax, Aggressive]
25
Failure detection and Recovery
SSM: Failure masked
Instant recovery
26
False Positives
Normal Operation
False positive triggered
Instant recovery
27
Statistical Monitoring
Statistics: NumElements, MemoryUsed, InboxSize, NumDropped, NumReads, NumWrites
[Diagram: Bricks 1 through 5]
28
Statistical Monitoring
Statistics: NumElements, MemoryUsed, InboxSize, NumDropped, NumReads, NumWrites
[Diagram: Bricks 1 through 5. Label: REBOOT]
29
Statistical Monitoring
Statistics: NumElements, MemoryUsed, InboxSize, NumDropped, NumReads, NumWrites
[Diagram: Bricks 1 through 5]
30
SSM Monitoring

N replicated bricks handle read/write requests
Cannot do structural anomaly detection!
Alternative features (performance, mem usage, etc.)
Activity statistics: How often did a brick do something?
Msgs received/sec, dropped/sec, etc.
Same across all peers, assuming balanced workload
Use anomalies as likely failures
State statistics: Current state of system
Memory usage, queue length, etc.
Similar pattern across peers, but may not be in phase
Look for patterns in time-series; differences in patterns indicate failure at a node.
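
For activity statistics, "same across all peers" suggests comparing each brick against the others. The sketch below flags bricks whose rate sits far from the peer median. The slides do not name a specific test, so the median-absolute-deviation rule, the threshold of 3, the function name `anomalous_bricks`, and the sample numbers are all assumptions for illustration.

```python
from statistics import median


def anomalous_bricks(rates, threshold=3.0):
    """Return ids of bricks whose rate deviates strongly from their peers.

    rates maps brick id -> an activity statistic such as messages
    received per second. A brick is flagged when its distance from the
    peer median exceeds threshold times the median absolute deviation.
    """
    center = median(rates.values())
    spread = median(abs(rate - center) for rate in rates.values())
    if spread == 0:
        # Most peers agree exactly; anything different stands out.
        return [brick for brick, rate in rates.items() if rate != center]
    return [
        brick
        for brick, rate in rates.items()
        if abs(rate - center) / spread > threshold
    ]


# Made-up sample: messages received per second on five bricks.
msgs_per_sec = {1: 100, 2: 104, 3: 12, 4: 98, 5: 101}
suspects = anomalous_bricks(msgs_per_sec)
```

This only works under the balanced-workload assumption stated on the slide; a skewed workload would make healthy bricks look anomalous.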

31
Surprising Patterns in Time-Series

1. Discretize time-series into string. (Keogh)
0.2, 0.3, 0.4, 0.6, 0.8, 0.2 -> aaabba
2. Calculate the frequencies of short substrings in the string.
aa occurs twice; ab, bb, ba occur once.
3. Compare frequencies to normal, look for substrings that occur much less or much more than normal.
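
The three steps above can be sketched as follows. This is a simplified illustration, not the cited technique in full: the single breakpoint at 0.5 is an assumption chosen to be consistent with the slide's example, and the add-one smoothing, the factor of 2, and all function names are hypothetical choices for this sketch.

```python
from collections import Counter


def discretize(series, breakpoints=(0.5,), alphabet="ab"):
    """Step 1: map each value to a letter by counting breakpoints at or below it."""
    return "".join(alphabet[sum(x >= b for b in breakpoints)] for x in series)


def substring_counts(text, n=2):
    """Step 2: count every length-n substring, overlaps included."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def surprising_substrings(observed, normal, factor=2.0):
    """Step 3: substrings whose relative frequency differs from normal
    by at least the given factor, in either direction."""
    keys = set(observed) | set(normal)
    # Add-one smoothing so substrings unseen on one side do not divide by zero.
    observed_total = sum(observed.values()) + len(keys)
    normal_total = sum(normal.values()) + len(keys)
    result = []
    for key in sorted(keys):
        p_observed = (observed[key] + 1) / observed_total
        p_normal = (normal[key] + 1) / normal_total
        if p_observed >= factor * p_normal or p_normal >= factor * p_observed:
            result.append(key)
    return result


text = discretize([0.2, 0.3, 0.4, 0.6, 0.8, 0.2])
normal = substring_counts(text)
stuck_high = substring_counts(discretize([0.9] * 6))
flagged = surprising_substrings(stuck_high, normal)
```

Here `text` reproduces the slide's string and `normal` its substring counts; the made-up series stuck at 0.9 is then compared against that baseline.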

32
Outline