Title: DoCorp: Full Text System Information Integration
1DoCorp Full Text System Information Integration
- Wei Xu
- Armando Fox
- David Patterson
2RadLab research overview
DC spec
compiler
logical config
Policy-aware Switching Layer
policy verification
log mining
per node SW stack
Web 2.0 apps
monitoring data
Ruby on Rails interpreter
web svc APIs
physical config
trace collection
drivers
local OS functions
VM monitor
3Motivation
- Development and operations are more tightly coupled
- Less distinction between developers and operators
- Distributed systems as building blocks
- Few people can understand the details of every component
- Too much information for humans
- Too unstructured for machines
- Information asymmetry
4Problem and related work
- Source code
- Comments
- Version control logs
- Bug tracking
- Debug logs
- Scripts
- Configuration files
- Experience
- Console logs
- User feedback
Human
- Versioning diffs
- Cruise Control
- Test configuration
- Profiling
- Measurements from framework instrumentation
- OS counters
Machine
Developer
Operator
5Goal of DoCorp
- Source code
- Comments
- Version control logs
- Bug tracking
- Debug logs
- Scripts
- Configuration files
- Experience
- Console logs
- User feedback
Human
DoCorp Mining full text information
- Versioning diffs
- Cruise Control
- Test configuration
- Profiling
- Measurements from framework instrumentation
- OS counters
Machine
Developer
Operator
6Goal of DoCorp
- Developer-Operator Corpus Analysis Tool
- Bridging operators and developers
- Discover connections among operator data and developer data
- Browsing through different abstraction levels
- Based on text mining
- Bridging human and machine
- Make unstructured data structured
- Labeling
- Mining structured data is easier
- Make verbose data human friendly
- Indexing / search / selection
- Visualization
7Console logs
- Console logs are the natural way for developers to convey information to operators
- A reasonable developer usually logs what he/she believes to be important
- Normally used as a debugging tool
- Usually less verbose than automatic tracing logs
- Give insight into program execution and internal states
8Console log meets source code
- Console logs are neither machine-friendly nor human-friendly
- Highly unstructured
- Very verbose
- Related information is scattered
- Multi-threading / distributed systems
- Make console logs structured
- Logs are generated from source code, which is designed to be machine understandable
- Helps new developers / operators understand source code better
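The point that logs are generated from source code can be illustrated with a minimal sketch that scans source text for logging calls and records the constant text and variable names of each one. The `SOURCE` string, the `LOG.info` call shape, and every function and pattern name below are assumptions made for illustration; the slides do not say how DoCorp itself parses source.

```python
import re

# Hypothetical Java-like source, modeled on the Server example in these slides.
SOURCE = '''class Server {
  void periodicTask() {
    LOG.info(host + " is managing " + usage.partitionCnt
        + " partitions, read" + usagestat.readcnt);
  }
}
'''

# A LOG.info(...) call up to the first ");" (assumes no ");" inside a string literal).
LOG_CALL = re.compile(r'LOG\.info\((.*?)\);', re.DOTALL)
# Either a double-quoted string literal (group 1) or a dotted identifier (group 2).
TOKEN = re.compile(r'"((?:[^"\\]|\\.)*)"|([A-Za-z_][\w.]*)')
CLASS_DECL = re.compile(r'\bclass\s+(\w+)')


def extract_templates(source):
    """Return one dict per LOG.info call found in `source`."""
    templates = []
    for call in LOG_CALL.finditer(source):
        parts = []
        for token in TOKEN.finditer(call.group(1)):
            if token.group(2) is not None:
                parts.append(("var", token.group(2)))   # logged variable
            else:
                parts.append(("lit", token.group(1)))   # constant message text
        enclosing = CLASS_DECL.findall(source[:call.start()])
        templates.append({
            "class": enclosing[-1] if enclosing else None,
            "line": source.count("\n", 0, call.start()) + 1,
            "parts": parts,
        })
    return templates


templates = extract_templates(SOURCE)
print(templates[0])
```

A regular-expression scan like this is only a rough stand-in for real source analysis: it ignores comments, other logging methods, and format-string style calls.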
9Structured info extractor
Source
  class Server {
    void periodicTask() {
      ...
      LOG.info(host + " is managing " + usage.partitionCnt
          + " partitions, read" + usagestat.readcnt);
    }
  }
Regular expression
  (.*) is managing (.*) partitions, read(.*)
Console Log
  070505 122323 rad10 is managing 23 partitions, read32222
Structured Data
  host: rad10
  usage.partitionCnt: 23
  usagestat.readcnt: 32222
  Class name: Server
  Function name: periodicTask
  Line number: 234
  Source version: 2345
  timestamp: 070505 122323
  position in log
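The extraction step on this slide can be sketched as follows: a log statement's constant text and variable names become a regular expression, and matching a console log line against it yields named fields. This is a minimal sketch, not DoCorp's implementation. The `TEMPLATE` representation, the function names, and the assumption that every line starts with two six-digit timestamp fields are all illustrative.

```python
import re

# The slide's log statement as alternating variables and constant text.
TEMPLATE = [
    ("var", "host"),
    ("lit", " is managing "),
    ("var", "usage.partitionCnt"),
    ("lit", " partitions, read"),
    ("var", "usagestat.readcnt"),
]

# Source metadata shown on the slide for this statement.
SOURCE_INFO = {
    "class": "Server",
    "function": "periodicTask",
    "line": 234,
    "source_version": 2345,
}


def template_to_regex(template):
    """Turn each variable into a capture group and escape the constant text."""
    body = "".join(
        "(.*)" if kind == "var" else re.escape(text) for kind, text in template
    )
    # Assumed prefix: two six-digit timestamp fields, as in "070505 122323".
    return re.compile(r"^(\d{6} \d{6}) " + body + r"$")


def parse_line(line, template, source_info):
    """Return a structured record for `line`, or None if it does not match."""
    match = template_to_regex(template).match(line)
    if match is None:
        return None
    record = {"timestamp": match.group(1)}
    names = [text for kind, text in template if kind == "var"]
    record.update(zip(names, match.groups()[1:]))  # variable name -> logged value
    record.update(source_info)
    return record


record = parse_line(
    "070505 122323 rad10 is managing 23 partitions, read32222",
    TEMPLATE,
    SOURCE_INFO,
)
print(record)
```

Extracted values stay as strings here. Converting them to numbers would need type information from the source, which this sketch does not model.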
10System structure
Other info (affects ranking/selection)
Source File
Structured Info Extractor
Indexer (Lucene)
Indexed Logs
Offline Console Logs
Structured data
Offline
Searcher Navigational UI
Source File
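One reading of this diagram is that structured records from the extractor are indexed offline and then queried by the searcher. The sketch below imitates that flow with a small in-memory inverted index. It is a stand-in written for illustration and does not use or reproduce the Lucene API. `TinyIndex` and its methods are hypothetical names, and the second record is invented sample data.

```python
from collections import defaultdict


class TinyIndex:
    """In-memory exact-match index over structured log records."""

    def __init__(self):
        self.records = []
        self.postings = defaultdict(set)  # (field, value) -> set of record ids

    def add(self, record):
        """Index every field of `record` and return its id."""
        record_id = len(self.records)
        self.records.append(record)
        for field, value in record.items():
            self.postings[(field, str(value))].add(record_id)
        return record_id

    def search(self, criteria):
        """Return records matching every field=value pair in `criteria`."""
        ids = set(range(len(self.records)))
        for field, value in criteria.items():
            ids &= self.postings.get((field, str(value)), set())
        return [self.records[i] for i in sorted(ids)]


index = TinyIndex()
# Record taken from the structured-data example in these slides.
index.add({
    "timestamp": "070505 122323", "host": "rad10",
    "usage.partitionCnt": "23", "usagestat.readcnt": "32222",
    "class": "Server", "function": "periodicTask",
})
# Hypothetical second record, invented to make the search non-trivial.
index.add({
    "timestamp": "070505 122324", "host": "rad11",
    "usage.partitionCnt": "7", "usagestat.readcnt": "100",
    "class": "Server", "function": "periodicTask",
})

print(index.search({"host": "rad10"}))
```

Indexing by field lets a query combine log values with source metadata, for example every record logged by `Server` on a given host. A full-text engine would add tokenization and ranking on top of this.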
11Future work
- Integrating more information
- Versioning
- Bug Tracking
- Configuration
- Correlation discovery among sources
- Ranking and selection
- Metrics based on textual info and other time series data
- Visualization and UI
- Tools for analyzing extracted structured data