DoCorp: Full Text System Information Integration

1
DoCorp: Full Text System Information Integration
  • Wei Xu
  • Armando Fox
  • David Patterson

2
RadLab research overview
[Diagram. Labeled components: DC spec, compiler, logical config, Policy-aware Switching Layer, policy verification, log mining, per-node SW stack, Web 2.0 apps, monitoring data, Ruby on Rails interpreter, web svc APIs, physical config, trace collection, drivers, local OS functions, VM monitor.]
3
Motivation
  • Development and operations are more tightly coupled
  • Less distinction between developers and operators
  • Distributed systems as building blocks
  • Few people understand the details of every component
  • Too much information for humans
  • Too unstructured for machines
  • Information asymmetry

4
Problem and related work
Human-oriented information:
  • Source code
  • Comments
  • Version control logs
  • Bug tracking
  • Debug logs
  • Scripts
  • Configuration files
  • Experience
  • Console logs
  • User feedback

Machine-oriented information:
  • Versioning diffs
  • Cruise Control
  • Test configuration
  • Profiling
  • Measurements from framework instrumentation
  • OS counters

[Other diagram labels: Developer, Operator]
5
Goal of DoCorp
[Same diagram as slide 4, with DoCorp ("Mining full text information") added between the human-oriented and the machine-oriented information.]
6
Goal of DoCorp
  • Developer-Operator Corpus Analysis Tool
  • Bridging operators and developers
      • Discover connections between operator data and developer data
      • Browse through different abstraction levels
      • Based on text mining
  • Bridging human and machine
      • Make unstructured data structured
          • Labeling
          • Mining structured data is easier
      • Make verbose data human-friendly
          • Indexing / search / selection
          • Visualization

7
Console logs
  • Console logs are the natural channel through which developers convey information to operators
  • A reasonable developer usually logs what he or she believes is important
  • Normally used as a debugging tool
  • Usually less verbose than automatic tracing logs
  • Give insight into program execution and internal state

8
Console log meets source code
  • Console logs are neither machine-friendly nor human-friendly
      • Highly unstructured
      • Very verbose
      • Related information is scattered
          • Multi-threading / distributed systems
  • Make console logs structured
      • Logs are generated from source code, which is designed to be machine-understandable
      • Helps new developers / operators understand the source code better
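To illustrate why structure helps with scattered information, here is a minimal Python sketch under stated assumptions: the log lines are made up in the format of the slide 9 example, and the regular expression is written by hand rather than derived from source code. Once each line is parsed into fields, messages that were interleaved in the raw log (for example by several hosts or threads) can be regrouped by an extracted field such as the host name. The function name `group_by_host` is hypothetical.

```python
import re
from collections import defaultdict

# Made-up interleaved console output from two hosts, in the format of the
# slide 9 example: "yyMMdd HHmmss <host> is managing <n> partitions, read<n>".
LOG_LINES = [
    "070505 122323 rad10 is managing 23 partitions, read32222",
    "070505 122324 rad11 is managing 19 partitions, read1200",
    "070505 122423 rad10 is managing 24 partitions, read33010",
]

# Hand-written pattern (an assumption): one named group per logged variable.
PATTERN = re.compile(
    r"(?P<ts>\d{6} \d{6}) (?P<host>\S+) is managing (?P<partitions>\d+) "
    r"partitions, read(?P<reads>\d+)"
)


def group_by_host(lines):
    """Parse each line and collect the parsed records per host, in log order."""
    groups = defaultdict(list)
    for line in lines:
        m = PATTERN.fullmatch(line)
        if m is None:
            continue  # line was not produced by this log statement
        record = m.groupdict()
        groups[record["host"]].append(record)
    return dict(groups)


if __name__ == "__main__":
    for host, records in group_by_host(LOG_LINES).items():
        print(host, [r["reads"] for r in records])
```

The same grouping could key on any extracted field (a request id, a thread name) once the log statement that produced it is known.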

9
Structured info extractor
Source:
    class Server {
      void periodicTask() {
        ...
        LOG.info(host + " is managing " + usage.partitionCnt
                 + " partitions, read" + usagestat.readcnt);
      }
    }

Regular expression:
    (.*) is managing (.*) partitions, read(.*)

Console log:
    070505 122323 rad10 is managing 23 partitions, read32222

Structured data:
    host = rad10
    usage.partitionCnt = 23
    usagestat.readcnt = 32222
    Class name: Server
    Function name: periodicTask
    Line number: 234
    Source version: 2345
    Timestamp: 070505 122323
    Position in log
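The extraction step on this slide can be sketched in Python as follows. This is a minimal illustration, not the DoCorp implementation: it assumes the `LOG.info` call has already been parsed out of the source into an ordered list of constant string literals and logged expressions (the `TEMPLATE` representation is hypothetical), and it assumes each console line starts with a `yyMMdd HHmmss` timestamp as in the example. Constant text is escaped, each logged expression becomes a capture group, and the match is combined with the source-side context (class, function, line number, version) copied from the slide.

```python
import re

# Hypothetical representation of the LOG.info call on this slide, assumed to
# have been parsed out of the source already: ("text", ...) entries are string
# literals, ("var", ...) entries are the logged expressions.
TEMPLATE = [
    ("var", "host"),
    ("text", " is managing "),
    ("var", "usage.partitionCnt"),
    ("text", " partitions, read"),
    ("var", "usagestat.readcnt"),
]
# Source-side context, copied from the slide's example.
SOURCE_INFO = {"class": "Server", "function": "periodicTask",
               "line": 234, "version": 2345}

# Assumed console-log layout: "yyMMdd HHmmss <message>".
LINE_RE = re.compile(r"(\d{6} \d{6}) (.*)")


def template_to_regex(template):
    """Escape the constant text and turn each logged variable into a group."""
    pattern = "".join(re.escape(value) if kind == "text" else "(.*?)"
                      for kind, value in template)
    names = [value for kind, value in template if kind == "var"]
    return re.compile(pattern), names


def extract(line, template, source_info, position):
    """Return a structured record for one log line, or None if it doesn't match."""
    m = LINE_RE.fullmatch(line)
    if m is None:
        return None
    timestamp, message = m.groups()
    regex, names = template_to_regex(template)
    mm = regex.fullmatch(message)
    if mm is None:
        return None
    record = dict(zip(names, mm.groups()))
    record.update(source_info, timestamp=timestamp, position=position)
    return record


if __name__ == "__main__":
    print(extract("070505 122323 rad10 is managing 23 partitions, read32222",
                  TEMPLATE, SOURCE_INFO, position=0))
```

The sketch uses non-greedy groups with `fullmatch` rather than the slide's greedy `(.*)`; for this template both give the same result, but anchoring the whole message keeps a line from another log statement from matching partially.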
10
System structure
[Diagram of the offline system. Components:]
  • Source File
  • Offline Console Logs
  • Structured Info Extractor
  • Structured data
  • Indexer (Lucene)
  • Indexed Logs
  • Searcher, with a Navigational UI
  • Other info (affects ranking/selection)
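The system indexes the structured data with Lucene. As a rough illustration of what an indexer and searcher do with such records, here is a toy in-memory inverted index in Python. It is a stand-in, not Lucene's API: the class name `ToyLogIndex`, the `field:value` term scheme, and the sample records are hypothetical, and unlike Lucene it does no ranking (on the slide, "other info" would affect ranking/selection).

```python
from collections import defaultdict


class ToyLogIndex:
    """Tiny in-memory inverted index over structured log records.

    Hypothetical stand-in for the Lucene indexer; not Lucene's API.
    Each record is a dict of field -> value, indexed as "field:value" terms.
    """

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of records containing it
        self.records = []

    def add(self, record):
        """Store a record and index each of its fields; return the record id."""
        rec_id = len(self.records)
        self.records.append(record)
        for field, value in record.items():
            self.postings[f"{field}:{value}"].add(rec_id)
        return rec_id

    def search(self, *terms):
        """Return records containing all given "field:value" terms (AND query)."""
        if not terms:
            return []
        ids = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return [self.records[i] for i in sorted(ids)]


if __name__ == "__main__":
    idx = ToyLogIndex()
    # The first record mirrors the slide 9 example; the second is made up.
    idx.add({"host": "rad10", "class": "Server", "function": "periodicTask",
             "usage.partitionCnt": "23"})
    idx.add({"host": "rad11", "class": "Server", "function": "periodicTask",
             "usage.partitionCnt": "19"})
    print(idx.search("function:periodicTask", "host:rad10"))
```

Indexing source-side fields (class, function) next to the logged values is what lets a query move between the log view and the code view.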
11
Future work
  • Integrating more information
      • Versioning
      • Bug tracking
      • Configuration
  • Correlation discovery among sources
  • Ranking and selection
      • Metrics based on textual info and other time-series data
  • Visualization and UI
      • Tools for analyzing extracted structured data
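One way extracted structured data could feed time-series metrics is sketched below in Python. This is only an illustration of the future-work idea under loud assumptions: `usagestat.readcnt` is assumed to be a cumulative counter, the timestamp is assumed to be `yyMMdd HHmmss` as in the slide 9 example, and the function names and the second sample record are hypothetical. Successive records from one host become a reads-per-second series.

```python
from datetime import datetime


def parse_ts(ts):
    # Assumed timestamp format "yyMMdd HHmmss", matching the slide 9 example.
    return datetime.strptime(ts, "%y%m%d %H%M%S")


def read_rate_series(records, field="usagestat.readcnt"):
    """Turn one host's records into (time, reads-per-second) points.

    Assumes `field` is a cumulative counter, so the rate is the difference
    between consecutive values divided by the elapsed seconds.
    """
    points = sorted((parse_ts(r["timestamp"]), int(r[field])) for r in records)
    series = []
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        seconds = (t1 - t0).total_seconds()
        if seconds > 0:  # skip records that share a timestamp
            series.append((t1, (v1 - v0) / seconds))
    return series


if __name__ == "__main__":
    # The first record is the slide 9 example; the second is made up.
    records = [
        {"timestamp": "070505 122323", "usagestat.readcnt": "32222"},
        {"timestamp": "070505 122423", "usagestat.readcnt": "33010"},
    ]
    print(read_rate_series(records))
```

A series like this could then be lined up with other time-series data (for example OS counters) when looking for correlations among sources.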