Title: Implementation and Evaluation of a Protocol for Recording Process Documentation in the Presence of Failures
1. Implementation and Evaluation of a Protocol for Recording Process Documentation in the Presence of Failures
- Zheng Chen and Luc Moreau
- zc05r_at_ecs.soton.ac.uk
- L.Moreau_at_ecs.soton.ac.uk
- University of Southampton
2. Outline
- Motivation
- Protocol Overview
- Implementation
- Experimental Setup
- Experimental Results and Analysis
- Conclusions and Future Work
3. Provenance and Process Documentation
- The provenance of a data product refers to the process that led to that data product
- Process documentation is a computer-based representation of a past process for determining provenance
- Process documentation consists of a set of p-assertions
- Process documentation is stored in provenance stores
- Provenance is obtained by querying provenance stores
4. PReP (Groth 04-08)
- A protocol to record process documentation
- Multiple provenance stores are interlinked to enable retrievability of distributed process documentation
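The interlinking idea above can be sketched as a tiny link table that lets a query follow documentation across stores. All names here (`LinkedStore`, `recordLink`, `resolve`) are illustrative, not part of the PReP implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: alongside its p-assertions, a provenance store keeps
// links naming the store that holds the rest of an interaction's
// documentation, so a query can hop from store to store.
public class LinkedStore {
    // interaction key -> URL of the store holding the remaining documentation
    private final Map<String, String> links = new HashMap<>();

    public void recordLink(String interactionKey, String remoteStoreUrl) {
        links.put(interactionKey, remoteStoreUrl);
    }

    /** Resolve where the remaining documentation of an interaction lives,
     *  or null if the documentation is entirely local. */
    public String resolve(String interactionKey) {
        return links.get(interactionKey);
    }
}
```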
5. Failures
- Provenance store crash, communication failures
- We do not consider application failures, e.g. actor crash
- Poor quality process documentation
  - Incomplete
  - Disconnected
6. Requirements
- Guaranteed Recording
  - After a process completes, the entire documentation of the process must eventually be recorded in provenance stores
- Link Accuracy
  - All the links recorded during a process must eventually be accurate, to enable retrievability of distributed documentation
- Efficient Recording
  - The protocol should be efficient and introduce minimal overhead
7. F-PReP
- A protocol for recording process documentation in the presence of failures
- Derives from PReP, inheriting its generic nature
- Introduces an Update Coordinator to facilitate updating links (we assume the coordinator does not crash)
- Actor side
  - Uses timeout and retransmission to record p-assertions
  - Chooses alternative provenance stores in case of failures
  - Requests the coordinator to update links
- Provenance store
  - Replies with an acknowledgement only after it has successfully recorded p-assertions in its persistent storage
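The actor-side behaviour above (timeout, retransmission, fallback to alternative stores) might be sketched as follows. `ActorRecorder`, `Store`, and `recordBatch` are assumed names for illustration, not the actual F-PReP API.

```java
import java.util.List;

// Hypothetical sketch of F-PReP's actor-side recording loop.
public class ActorRecorder {
    public interface Store {
        // Returns true iff the store acknowledged a durable write within the timeout.
        boolean recordBatch(List<String> pAssertions, long timeoutMs);
    }

    private final int maxRetries;

    public ActorRecorder(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    /**
     * Retransmit to the primary store up to maxRetries times; on repeated
     * failure, fall back to the first responsive alternative store (after
     * which a link-update request would be sent to the coordinator).
     * Returns the store that acknowledged the batch, or null if all failed.
     */
    public Store record(List<String> batch, Store primary,
                        List<Store> alternatives, long timeoutMs) {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            if (primary.recordBatch(batch, timeoutMs)) return primary;
        }
        for (Store alt : alternatives) {
            if (alt.recordBatch(batch, timeoutMs)) return alt;
        }
        return null; // batch kept locally; recording retried later
    }
}
```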
8. F-PReP
9. Implementation
- Provenance Store
  - Implemented as a Java Servlet
  - Backend store (Berkeley DB)
  - Disk cache: flushing OS buffers to disk before providing an ack to the actor
  - Update Plug-In
- Client Side Library
  - Remedial actions that cope with failures
  - Multithreading for the creation and recording of p-assertions
  - A local file store (Berkeley DB) for temporarily maintaining p-assertions
- Update Coordinator
  - Implemented as a Java Servlet
  - Berkeley DB is also employed to maintain request information
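A minimal sketch of the "flush OS buffers before acking" rule, using standard Java NIO rather than Berkeley DB; `DurableWriter` and `writeThenAck` are hypothetical names for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrative sketch: the store acknowledges only after the p-assertion
// bytes have been forced out of OS buffers onto the disk.
public class DurableWriter {
    public static boolean writeThenAck(Path file, byte[] pAssertion) {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            ch.write(ByteBuffer.wrap(pAssertion));
            ch.force(true); // flush file data and metadata to disk
            return true;    // only now may the ack be sent to the actor
        } catch (IOException e) {
            return false;   // no ack: the actor retransmits or picks an alt. PS
        }
    }
}
```

This is the property exploited in Experiment 5: without the forced flush, an OS crash can lose p-assertions that were already acknowledged.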
10. Performance Study
- Throughput of provenance store and coordinator
- Scalability of update coordinator
- Failure-free recording performance
- Overhead of taking remedial actions
- Performance impact on application
11. Experimental Setup
- Iridis cluster (Over 1000 processor-cores)
- Gigabit Ethernet
- Tomcat 5.0 container
- Berkeley DB Java Edition database
- Java 1.5
- A generator is used on an actor's side to inject random failure events
  - Failure to submit a batch of p-assertions to a provenance store
  - Failure to receive an acknowledgement from a provenance store before a timeout
  - Generates a failure event based on a failure rate, i.e., the number of failure events occurring over a total number of recordings
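The failure-event generator described above could look like this minimal sketch; the class name and its parameters are assumptions for illustration.

```java
import java.util.Random;

// Hypothetical sketch of the client-side failure injector: roughly
// `failureRate` of all recordings are turned into failure events
// (a dropped submission or a suppressed acknowledgement).
public class FailureGenerator {
    private final double failureRate; // fraction of recordings that should fail
    private final Random rng;

    public FailureGenerator(double failureRate, long seed) {
        this.failureRate = failureRate;
        this.rng = new Random(seed); // seeded for reproducible experiments
    }

    /** Decide whether the current recording should become a failure event. */
    public boolean shouldFail() {
        return rng.nextDouble() < failureRate;
    }
}
```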
12. Experiment 1: Provenance Store (PS) Throughput
- Setup: up to 512 clients sending 10k p-assertions to 1 PS in 10 min
- Hypothesis: the disk cache may sacrifice a provenance store's throughput
- Result: 20% decrease in throughput
13. Experiment 2: Coordinator Throughput
- Setup: up to 512 clients sending 100 requests to 1 coordinator in 10 min
- Hypothesis: the coordinator's throughput is high
- Result: 30,000 repair requests accepted in 10 min
14. Experiment 3: Throughput Experiment with Failures (1 client)
- Setup: 1 client sending 10k p-assertions to 1 PS; 1 alternative PS and 1 coordinator used in the case of failures
- Hypothesis: (a) For transient failures, resending to the same PS is preferred over an alternative PS
- (b) The update coordinator is not a bottleneck
15. Experiment 4: Throughput Experiment with Failures (128 clients)
- Setup: 128 clients sending 10k p-assertions to 1 PS; 1 alternative PS and 1 coordinator used in the case of failures
- Hypothesis: (a) Resending to an alternative PS is preferred over the same PS
- (b) The coordinator is not a bottleneck
16. Experiment 5: Failure-free Recording Performance
- Setup: 1 client recording 10,000 10k p-assertions to 1 PS; 100 p-assertions shipped in a single batch
- Hypothesis: the disk cache causes overhead
- Results: (a) 900 10k p-assertions may be lost if the PS's OS crashes (PReP)
- (b) 13.8% overhead, compared to PReP
17. Experiment 6: Overhead of Taking Remedial Actions
- Setup: 1 client recording 100 p-assertions to 1 PS; 1 alternative PS and 1 coordinator used in the case of failures
- Hypothesis: remedial actions have acceptable overhead
- Result: record time
18. Experiment 7: Performance Impact on Application
- Amino Acid Compressibility Experiment (ACE)
  - High performance and fine grained, thus representative
- One run of ACE: 20 parallel jobs, 54,000 interactions/job
- Extremely detailed process documentation
  - 1.08 GB of p-assertions/job in 25 minutes
19. Recording Performance in ACE
- Setup: 5 PSs and 1 coordinator; multithreading for the creation and recording of p-assertions
- Hypothesis: F-PReP has acceptable recording overhead
- Results: (a) similar overhead (12%) as PReP on application performance when no failure occurs
- (b) Timeout and queue management affect performance
20. Impact of Queue Management on Performance
- Hypothesis: flow control on the queue affects performance
- Conclusions: (a) The result supports our hypothesis
- (b) We can monitor the queue and take actions, e.g., employing the local file store
21. Experiment 8: Quality of Recorded Process Documentation
- Setup: using F-PReP and PReP to record p-assertions; querying the PS to verify the recorded documentation
- Results: (a) PReP: incomplete; F-PReP: complete
- (b) PReP: irretrievable; F-PReP: retrievable
22. Conclusions and Future Work
- The coordinator does not affect an actor's recording performance.
- In an application, F-PReP has recording overhead on application performance similar to PReP's when there is no failure.
- Although it introduces overhead in the presence of failures, we believe the overhead is still acceptable, given that it records high quality (i.e., complete and retrievable) process documentation.
- We are currently investigating how to create process documentation when an application has its own fault tolerance schemes to tolerate application-level failures.
- In future work, we plan to make use of the process documentation recorded in the presence of failures to diagnose failures.
23. Questions?
Thank you!