Scalable Analysis of Distributed Workflow Traces


1
Scalable Analysis of Distributed Workflow Traces
  • Daniel K. Gunter and Brian Tierney
  • Distributed Systems Department
  • Lawrence Berkeley National Laboratory

2
Outline
  • Motivation / Why do we care?
  • Related Work / What have others done?
  • NetLogger's Objective / What would we like to do?
  • Background / What is NetLogger?
  • How does NetLogger address the problems?
  • What are the results / costs of the solution?

3
Motivation
  • Large-scale applications are widely used in
    science and business.
  • Astronomy, Biology, Weather Models, etc.
  • Large-scale apps are complex and difficult to
    debug and optimize.
  • Large number of concurrent operations
  • Distributed resources
  • Hard to find bottlenecks

4
Related Work
  • Applications can be tightly coupled, loosely
    coupled or uncoupled.
  • Tools have mostly focused on tightly coupled
    applications.
  • Profiling and tracing of code segments (TAU,
    Paraver, FPMPI, Intel Trace Collector).
  • Tools extended to loosely coupled apps:
  • SvPablo: automatic code instrumentation, with
    statistics collected for sections of source code.
  • Prophesy: automatic code instrumentation and a
    database of performance info. Tunable granularity.
  • Paradyn: dynamic insertion of instrumentation at
    runtime. Designed for message-passing and
    pthreads programs.

5
End Objective
  • Focus on loosely coupled and uncoupled
    applications.
  • We would like a tool that can combine performance
    information of multiple resources and application
    components and expose their interactions.

6
NetLogger Background
  • Log Generation: calls to logger libraries are
    added to source code at critical points to create
    event logs.
  • Log Management: the various logs are collected
    and merged based on event timestamps.
  • Visualization and Analysis: events, system
    stats and lifelines are displayed.
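
The log management step (merging per-host logs by event timestamp) can be illustrated with a minimal Python sketch. It is not NetLogger's own code: the `ts=... event=... id=...` line format, the event names, and the helpers `parse_event` and `merge_logs` are assumptions made for this example, and each input log is assumed to be already sorted by timestamp.

```python
import heapq

def parse_event(line):
    """Parse a 'key=value key=value ...' line into a dict (hypothetical format)."""
    fields = dict(pair.split("=", 1) for pair in line.split())
    fields["ts"] = float(fields["ts"])
    return fields

def merge_logs(*logs):
    """Lazily merge several event streams, each already sorted by timestamp."""
    streams = [map(parse_event, log) for log in logs]
    return heapq.merge(*streams, key=lambda ev: ev["ts"])

# Two hypothetical per-host logs for the same workflow (id=7).
host_a = ["ts=1.0 event=job.start id=7", "ts=4.0 event=job.end id=7"]
host_b = ["ts=2.5 event=io.start id=7", "ts=3.0 event=io.end id=7"]

merged = [ev["event"] for ev in merge_logs(host_a, host_b)]
print(merged)
```

Because `heapq.merge` reads its inputs lazily, only one pending event per log is held in memory at a time, which suits logs that are too large to load whole.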

7
Extensions to NetLogger
  • Scaling NetLogger to large-scale systems
    (hundreds of machines)
  • Collecting distributed log files
  • Evaluating large log data sets
  • Addition of workflow identifiers

8
Log Collection and Management
  • Netlogd
  • Collection daemon which accepts logs across the
    network (UDP or TCP)
  • Nlforward
  • For finer-grain instrumentation, events can be
    written to local disk and forwarded in batches
  • Nldemux
  • Server-side tool to scan incoming logs
  • Split events into separate files
  • Allows for log file rollovers.
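
The idea of splitting an incoming event stream into separate files can be sketched in a few lines of Python. This is only an illustration of the demultiplexing idea, not the actual behavior or configuration of Nldemux: the `demux` function, the `key=value` line format, and the choice to split on the `event` field are all assumptions.

```python
import os
import tempfile
from collections import defaultdict

def demux(lines, out_dir, key="event"):
    """Append each line to <out_dir>/<value of key>.log; return per-file counts."""
    counts = defaultdict(int)
    handles = {}
    try:
        for line in lines:
            fields = dict(pair.split("=", 1) for pair in line.split())
            name = fields.get(key, "unknown")
            if name not in handles:
                # One output file per distinct value of the chosen field.
                handles[name] = open(os.path.join(out_dir, name + ".log"), "a")
            handles[name].write(line + "\n")
            counts[name] += 1
    finally:
        for handle in handles.values():
            handle.close()
    return dict(counts)

# Hypothetical incoming events from several hosts.
incoming = [
    "ts=1.0 event=job.start id=7",
    "ts=1.2 event=job.start id=8",
    "ts=2.5 event=io.start id=7",
]
out_dir = tempfile.mkdtemp()
counts = demux(incoming, out_dir)
print(counts)
```

Opening files in append mode means a file that has been rotated away is simply recreated on the next run, which is one simple way to tolerate rollovers.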

9
Sifting Through the Data
  • The huge amount of log data from just 5 nodes
    obscures important events.

10
Anomalous Workflow Detection Tool
  • Define a linear sequence of events in a
    configuration file.
  • Mark any workflow lifeline that is missing these
    events.
  • Problems
  • We would like some context for normal behavior
    (solved by an option to include neighbors of
    anomalous lifelines).
  • Too many events to keep them all in memory for
    scanning.
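
A minimal Python sketch of the detection idea above: define a linear sequence of events, then mark every workflow lifeline that never completes it. The event names, the `(workflow_id, event_name)` input shape, and the function `find_incomplete` are assumptions for illustration, not the tool's real configuration format or code. To limit memory, the sketch stores only a position per open lifeline and forgets lifelines once they complete.

```python
# Hypothetical linear sequence, standing in for the configuration file.
EXPECTED = ["job.start", "io.start", "io.end", "job.end"]

def find_incomplete(events, expected=EXPECTED):
    """Return {workflow_id: first missing event} for lifelines that never finish.

    events: iterable of (workflow_id, event_name) pairs in timestamp order.
    """
    progress = {}  # workflow_id -> index of the next expected event
    for wf_id, name in events:
        pos = progress.get(wf_id, 0)
        if name == expected[pos]:
            pos += 1
        if pos == len(expected):
            progress.pop(wf_id, None)  # lifeline complete: free its state
        else:
            progress[wf_id] = pos
    return {wf_id: expected[pos] for wf_id, pos in progress.items()}

events = [
    (1, "job.start"), (2, "job.start"), (1, "io.start"), (2, "io.start"),
    (1, "io.end"), (1, "job.end"), (3, "job.start"), (2, "job.end"),
]
anomalies = find_incomplete(events)
print(anomalies)
```

In this run workflow 1 completes, workflow 2 never logs `io.end`, and workflow 3 stops after `job.start`, so only 2 and 3 are reported.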

11
Solutions
  • Solution 1:
  • Create a histogram with 100 bins for normal
    workflow execution times.
  • Time out a workflow once it runs past the 99th
    percentile.
  • Runs in a fixed memory footprint.
  • Supports additional parameters (min time, max
    time, etc.)
  • Solution 2:
  • Calculate a running mean and standard deviation
    of workflow runtimes.
  • Assumes statistically normal distribution of
    times.
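
Both solutions can be sketched in Python with constant memory per tracked quantity. The class names, the default bin range of 0 to 100 seconds, and the 3-sigma cutoff in `is_anomalous` are assumptions for this example and are not taken from NetLogger; the running mean and standard deviation use Welford's online update, which is one common way to compute them in a single pass.

```python
import math

class RuntimeHistogram:
    """Fixed-size histogram of workflow runtimes (Solution 1)."""

    def __init__(self, min_time=0.0, max_time=100.0, bins=100):
        self.min_time = min_time
        self.width = (max_time - min_time) / bins
        self.counts = [0] * bins
        self.total = 0

    def add(self, runtime):
        idx = int((runtime - self.min_time) / self.width)
        idx = min(max(idx, 0), len(self.counts) - 1)  # clamp outliers to edge bins
        self.counts[idx] += 1
        self.total += 1

    def percentile(self, pct=99.0):
        """Upper edge of the bin where the cumulative count reaches pct percent."""
        if self.total == 0:
            raise ValueError("no runtimes recorded")
        target = self.total * pct / 100.0
        seen = 0
        for idx, count in enumerate(self.counts):
            seen += count
            if seen >= target:
                return self.min_time + (idx + 1) * self.width
        return self.min_time + len(self.counts) * self.width

class RunningStats:
    """Running mean and sample standard deviation (Solution 2)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def stddev(self):
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def is_anomalous(self, x, k=3.0):
        # Assumed rule: flag runtimes more than k standard deviations from the mean.
        return self.n > 1 and abs(x - self.mean) > k * self.stddev()

hist = RuntimeHistogram()
for i in range(100):
    hist.add(i + 0.5)          # one runtime per 1-second bin
timeout = hist.percentile(99)  # runtimes past this value would be timed out

stats = RunningStats()
for runtime in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.add(runtime)
print(timeout, stats.mean, stats.stddev())
```

The histogram makes no assumption about the shape of the runtime distribution but needs a sensible min and max; the running statistics need no range but are only meaningful if runtimes are roughly normally distributed.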

12
NetLogger Workflow-logging Architecture

13
New Log Visualization
  • The 3 incomplete events from the previous picture
    are shown in blue, with context events shown in red.
  • Able to detect several errors in SNFactory
    Workflow application.

14
Key Differences in NetLogger
  • Use of lifelines to trace a sequence of actions.
  • Workflow anomaly detection.
  • Facilitates log collection from multiple
    locations.
  • Manual instrumentation of source code.
  • Must have source code and understand it.

15
The End.
  • Questions?
  • Comments?