1
Implementation and Evaluation of a Protocol for
Recording Process Documentation in the Presence
of Failures
  • Zheng Chen and Luc Moreau
  • zc05r_at_ecs.soton.ac.uk
  • L.Moreau_at_ecs.soton.ac.uk
  • University of Southampton

2
Outline
  • Motivation
  • Protocol Overview
  • Implementation
  • Experimental Setup
  • Experimental Results Analysis
  • Conclusions and Future Work

3
  • The provenance of a data product refers to the
    process that led to that data product
  • Process documentation is a computer-based
    representation of a past process for determining
    provenance
  • Process documentation consists of a set of
    p-assertions
  • Process documentation is stored in provenance
    stores
  • Provenance is obtained by querying provenance
    stores

4
PReP (Groth 04-08)
  • A protocol to record process documentation
  • Multiple provenance stores are interlinked to
    enable retrievability of distributed process
    documentation

5
Failures
  • Provenance store crash, communication failures
  • We do not consider application failures, e.g.
    actor crash
  • Poor quality process documentation: incomplete,
    disconnected

6
Requirements
  • Guaranteed Recording: after a process completes,
    the entire documentation of the process must
    eventually be recorded in provenance stores
  • Link Accuracy: all the links recorded during a
    process must eventually be accurate to enable
    retrievability of distributed documentation
  • Efficient Recording: the protocol should be
    efficient and introduce minimal overhead

7
F-PReP
  • A protocol for recording process documentation in
    the presence of failures
  • Derives from PReP to inherit its generic nature
  • Introduces an Update Coordinator to facilitate
    updating links (We assume the coordinator does
    not crash)
  • Actor's side
  • Uses timeout and retransmission to record
    p-assertions
  • Chooses alternative provenance stores in case of
    failures
  • Requests the coordinator to update links
  • Provenance store
  • Replies with an acknowledgement only after it has
    successfully recorded p-assertions in its
    persistent storage.
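
The actor-side behaviour listed above (timeout and retransmission, falling back to an alternative provenance store, then asking the coordinator to update links) can be sketched roughly as follows. This is only an illustration under assumptions: `record_batch`, `FlakyStore`, `StoreUnavailable`, `request_link_update` and the retry limit are hypothetical names and values, not part of the F-PReP implementation, and a raised exception stands in for both a send failure and a missed acknowledgement.

```python
class StoreUnavailable(Exception):
    """Hypothetical error: a send failed or no ack arrived before the timeout."""


def record_batch(batch, primary, alternative, request_link_update, max_retries=3):
    """Record a batch, retransmitting to the primary store before falling back.

    Returns the name of the store that acknowledged the batch.
    """
    for _ in range(max_retries):
        try:
            primary.record(batch)  # returns only once the store has acknowledged
            return primary.name
        except StoreUnavailable:
            continue  # timeout or send failure: retransmit to the same store
    # The primary keeps failing: record in the alternative store, then ask the
    # coordinator to repair links that still point at the primary.
    # If the alternative also fails, StoreUnavailable propagates to the caller.
    alternative.record(batch)
    request_link_update(old=primary.name, new=alternative.name)
    return alternative.name


class FlakyStore:
    """Stand-in store: fails the first `failures` calls, then acknowledges."""

    def __init__(self, name, failures=0):
        self.name = name
        self.failures = failures
        self.recorded = []

    def record(self, batch):
        if self.failures > 0:
            self.failures -= 1
            raise StoreUnavailable(self.name)
        self.recorded.append(list(batch))
```

With a store that fails twice and then recovers, the batch is still recorded in the original store and no link update is requested; only a store that outlasts the retry limit triggers the fallback.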

8
F-PReP

9
Implementation
  • Provenance Store
  • Implemented as a Java Servlet
  • backend store (Berkeley DB)
  • Disk cache
  • Flushing OS buffers to disk before providing
    an ack to actor
  • Update Plug-In
  • Client Side Library
  • Remedial actions that cope with failures
  • Multithreading for the creation and recording of
    p-assertions
  • A local file store (Berkeley DB) for temporarily
    maintaining p-assertions
  • Update Coordinator
  • Implemented as a Java Servlet
  • Berkeley DB is also employed to maintain request
    information
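
The "flush OS buffers to disk before providing an ack" step can be illustrated with a minimal sketch. It assumes a plain append-only log file in place of the Berkeley DB backend, and `record_durably` is a hypothetical name; the point is only the ordering of write, flush, fsync and acknowledgement.

```python
import os
import tempfile


def record_durably(path, batch):
    """Append a batch to a log file and return an ack only after fsync."""
    with open(path, "ab") as log:
        for assertion in batch:
            log.write(assertion.encode("utf-8") + b"\n")
        log.flush()             # push Python's own buffer to the OS
        os.fsync(log.fileno())  # ask the OS to write its buffers to disk
    return "ack"                # sent only after the data has been flushed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(record_durably(os.path.join(tmp, "store.log"), ["pa-1", "pa-2"]))
```

Acknowledging before the fsync call would be faster, but batches still held in OS buffers could then be lost if the store's operating system crashed.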

10
Performance Study
  • Throughput of provenance store and coordinator
  • Scalability of update coordinator
  • Failure-free recording performance
  • Overhead of taking remedial actions
  • Performance impact on application

11
Experimental Setup
  • Iridis cluster (Over 1000 processor-cores)
  • Gigabit Ethernet
  • Tomcat 5.0 container
  • Berkeley DB Java Edition database
  • Java 1.5
  • A generator is used on an actor's side to inject
    random failure events
  • Failure to submit a batch of p-assertions to a
    provenance store
  • Failure to receive an acknowledgement from a
    provenance store before a timeout
  • Generates a failure event based on a failure
    rate, i.e., the number of failure events
    occurring after a total number of recordings
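
A failure generator of the kind described above might look like the sketch below. It is an assumption-laden illustration: the slides do not say how events are drawn, so this version treats the failure rate as a per-recording probability (failures divided by recordings), and `FailureGenerator`, `FAILURE_KINDS` and the event names are hypothetical.

```python
import random

# The two injected failure events named on the slide (labels are hypothetical).
FAILURE_KINDS = ("submit_failed", "ack_timeout")


class FailureGenerator:
    """Injects random failure events at a given rate."""

    def __init__(self, failures, recordings, seed=None):
        if recordings <= 0 or not 0 <= failures <= recordings:
            raise ValueError("need 0 <= failures <= recordings and recordings > 0")
        # Assumed interpretation: `failures` events per `recordings` recordings.
        self.probability = failures / recordings
        self.rng = random.Random(seed)

    def next_event(self):
        """Return a failure kind for this recording, or None for no failure."""
        if self.rng.random() < self.probability:
            return self.rng.choice(FAILURE_KINDS)
        return None
```

An actor-side test harness could call `next_event()` before each recording and simulate the returned failure instead of contacting the provenance store.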

12
1. Provenance Store (PS) Throughput
  • Setup: up to 512 clients sending 10k
    p-assertions to 1 PS in 10 min
  • Hypothesis: the disk cache may reduce a
    provenance store's throughput.
  • Result: 20% decrease in throughput

13
2. Coordinator Throughput
  • Setup: up to 512 clients sending 100 requests to
    1 coordinator in 10 min
  • Hypothesis: the coordinator's throughput is
    high.
  • Result: 30,000100 repair requests accepted in
    10 min

14
3. Throughput Experiment with Failures (1 client)
  • Setup: 1 client sending 10k p-assertions to 1 PS;
    1 alternative PS and 1 coordinator used in the
    case of failures
  • Hypothesis: (a) resending to the same PS is
    preferred over an alternative PS for transient
    failures; (b) the update coordinator is not a
    bottleneck.

15
4. Throughput Experiment with Failures (128
clients)
  • Setup: 128 clients sending 10k p-assertions to 1
    PS; 1 alternative PS and 1 coordinator used in
    the case of failures
  • Hypothesis: (a) resending to an alternative PS
    is preferred to the same PS; (b) the coordinator
    is not a bottleneck.

16
5. Failure-free Recording Performance
  • Setup: 1 client recording 10,000 10k
    p-assertions to 1 PS
  • 100 p-assertions shipped in a single batch
  • Hypothesis: the disk cache causes overhead.
  • Results: (a) with PReP, 900 10k p-assertions may
    be lost if the PS's OS crashes; (b) 13.8%
    overhead compared to PReP

17
6. Overhead of Taking Remedial Actions
  • Setup: 1 client recording 100 p-assertions to 1
    PS
  • 1 alternative PS and 1 coordinator used in the
    case of failures
  • Hypothesis: remedial actions have acceptable
    overhead.
  • Result: record time

18
7. Performance Impact on Application
  • Amino Acid Compressibility Experiment (ACE)
  • High performance and fine grained, thus
    representative
  • One run of ACE: 20 parallel jobs, 54,000
    interactions/job
  • Extremely detailed process documentation
  • 1.08 GB p-assertions/job in 25 minutes

19
Recording Performance in ACE
  • Setup: 5 PS and 1 coordinator
  • Multithreading for creating and recording
    p-assertions
  • Hypothesis: F-PReP has acceptable recording
    overhead.
  • Results: (a) similar overhead (12%) to PReP on
    application performance when no failure occurs;
    (b) timeout and queue management affect
    performance.

20
Impact of Queue Management on Performance
  • Hypothesis: flow control on the queue affects
    performance.
  • Conclusions: (a) the result supports our
    hypothesis; (b) we can monitor the queue and
    take actions, e.g., employing the local file
    store.
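
One way to picture "monitor the queue and take actions" is a bounded in-memory queue that spills overflow to a secondary store instead of blocking the application. The sketch below is an assumption, not the authors' design: `SpillingQueue` is a hypothetical class, and a Python list stands in for the local file store.

```python
import collections


class SpillingQueue:
    """FIFO queue that spills to a secondary store once memory is full."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.memory = collections.deque()
        self.spilled = []  # stand-in for the local file store

    def put(self, assertion):
        # Once anything has spilled, keep spilling so FIFO order is preserved.
        if self.spilled or len(self.memory) >= self.capacity:
            self.spilled.append(assertion)
        else:
            self.memory.append(assertion)

    def get(self):
        item = self.memory.popleft()  # raises IndexError when the queue is empty
        if self.spilled:
            # Refill memory with the oldest spilled p-assertion.
            self.memory.append(self.spilled.pop(0))
        return item
```

The producer never waits on a full queue here; the cost moves to the extra write and read of spilled p-assertions.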

21
8. Quality of Recorded Process Documentation
  • Setup: using F-PReP and PReP to record
    p-assertions; querying PS to verify recorded
    documentation
  • Results: (a) PReP: incomplete; F-PReP: complete
  • (b) PReP: irretrievable; F-PReP: retrievable

22
Conclusions and Future Work
  • The coordinator does not affect an actor's
    recording performance.
  • In an application, F-PReP has similar recording
    overhead to PReP on application performance when
    there is no failure.
  • Although it introduces overhead in the presence
    of failures, we believe the overhead is still
    acceptable, given that it can record high quality
    (i.e., complete and retrievable) process
    documentation.
  • We are currently investigating how to create
    process documentation when an application has its
    own fault tolerance schemes to tolerate
    application level failures.
  • In future work, we plan to make use of the
    process documentation recorded in the presence of
    failures to diagnose failures.

23
Questions?
Thank you!