Title: Kinesthetics eXtreme: An External Infrastructure for Monitoring Distributed Legacy Systems
1Kinesthetics eXtreme: An External Infrastructure for Monitoring Distributed Legacy Systems
- Presented by
- Jie Feng
- jfeng_at_cse.unl.edu
- Advised by Dr. Ying Lu
2Give Credits
- I have borrowed some slides from Gail Kaiser, Programming Systems Lab, Columbia University
- I have modified them and added new slides
3Outline
- Introduction and Retrofitting autonomicity
- KX architecture and components
- Probes
- Gauges
- Coordinated Effectors
- Architectural Model-based Analysis and Decision
- Case studies
- Contributions and future work
4Introduction
- Self-managing systems
- Autonomic Computing
- Retrofitting Autonomicity
5Computer Software Systems are Not Self-Managing
NASA Satellite Support
6Why is this a Problem?
- Software bugs and hardware failures
- Too many users, not enough resources
- Too many computational devices
- Security: denial-of-service attacks, Internet worms and viruses, intruders (outsiders and insiders)
- Human management of increasingly complex software systems is expensive, time-consuming, and error-prone
7Autonomic Computing
- Goal is to develop technologies to enable self-managing software systems
- Many R&D communities are excited about the idea of self-managing systems: enterprise applications, networking, safety-critical systems, high-performance computing, ...
- And are building new software systems with self-management capabilities
8Legacy Systems
- A legacy system is any software system that already exists and is in use, such as the nation's critical information infrastructure and defense information systems
- Replacing all existing systems with new autonomic computing systems would be very expensive and take a long time
- The authors are investigating how to retrofit legacy systems with autonomic capabilities
9Retrofitting Autonomicity
- Inserting adaptation (self-xyz) mechanisms into existing application code is difficult, error-prone, and costly, and the result is hard to reuse or reason about
- The authors seek to enable autonomic properties through a solution orthogonal to the legacy system's main business logic and communication framework
- Introduce a common external infrastructure; this infrastructure becomes an integral part of the system-of-systems "self", co-existing and cooperating with the system's functional mechanisms
10Autonomizing Legacy Systems
(Diagram labels: legacy system, sensors, analysis, decision, effectors)
11Feedback Control Loop
- Monitor and Adapt
- OODA (Observation, Orientation, Decision, Action) loop
- Sensors observe, analysis orients, decision decides, effectors act
12Kinesthetics eXtreme
- An External Infrastructure for Monitoring Distributed Legacy Systems
13KX Architecture
(Architecture diagram labels: Decision, Controllers, Behavior Models, Gauge Bus, Interpretation, Gauges, Collection, Probe Bus, Probes (Sensors), Legacy System(s), Configuration (Adaptation), Effectors)
14KX Architecture Components
- Sensors instrument the components and connectors of a running system to collect data about the execution
- Gauges interpret sensor data by filtering and aggregating the data streams to recognize abstract semantic events
- Controllers determine which semantic events warrant system adaptation and construct repair processes
- Effectors apply reconfigurations, replacements, tuning, etc. to individual components of the running system
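The four roles above can be sketched as a small pipeline. This is only an illustration under stated assumptions, not KX code: the class names, the "cpu" metric, the 0.9 limit, and the "restart" repair step are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class RawEvent:
    """Low-level reading emitted by a sensor (hypothetical shape)."""
    component: str
    metric: str
    value: float


@dataclass
class SemanticEvent:
    """Abstract event recognized by a gauge (hypothetical shape)."""
    component: str
    kind: str


def gauge(raw_events: Iterable[RawEvent], limit: float = 0.9) -> List[SemanticEvent]:
    """Filter to CPU readings, aggregate per component, flag high averages."""
    readings: Dict[str, List[float]] = {}
    for ev in raw_events:
        if ev.metric == "cpu":  # filtering
            readings.setdefault(ev.component, []).append(ev.value)
    return [
        SemanticEvent(component, "overloaded")
        for component, values in readings.items()
        if sum(values) / len(values) > limit  # aggregation
    ]


def controller(event: SemanticEvent) -> Optional[List[str]]:
    """Decide whether an event warrants adaptation; return repair steps."""
    if event.kind == "overloaded":
        return [f"restart {event.component}"]
    return None


def effector(step: str, applied: List[str]) -> None:
    """Stand-in for a real reconfiguration: just record the step."""
    applied.append(step)


def run_once(raw_events: Iterable[RawEvent]) -> List[str]:
    """One pass: sensor data -> gauge -> controller -> effector."""
    applied: List[str] = []
    for semantic in gauge(raw_events):
        plan = controller(semantic)
        if plan:
            for step in plan:
                effector(step, applied)
    return applied
```

Feeding `run_once` two high CPU readings for one component and a low reading for another yields a single repair step for the first component only.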
15System overview
16Probes (Sensors)
- Called probes in the paper
- No specific probes are prescribed; the choice depends on the application
- A variety of technologies can be used
- Inject into source, byte codes, binaries
- Wrap DLLs or other dynamic libraries
- Monitor application and system traffic and logs
- Not a major focus of this paper
17Gauges
- A main focus of this paper
- Event Packager
- Event Distiller
- Asynchronous publish/subscribe content-based
routing from packagers to distillers, among
distillers, and from distillers to controllers
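Content-based routing means subscribers select events by predicates over the event's content rather than by a fixed channel name. A minimal in-process sketch follows; it is synchronous for brevity (the infrastructure described above is asynchronous), and the `ContentBasedBus` class and the event fields are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Event = Dict[str, Any]
Predicate = Callable[[Event], bool]
Handler = Callable[[Event], Any]


class ContentBasedBus:
    """Delivers each published event to every subscriber whose predicate matches."""

    def __init__(self) -> None:
        self._subscriptions: List[Tuple[Predicate, Handler]] = []

    def subscribe(self, predicate: Predicate, handler: Handler) -> None:
        self._subscriptions.append((predicate, handler))

    def publish(self, event: Event) -> int:
        """Route one event; return how many handlers received it."""
        delivered = 0
        for predicate, handler in list(self._subscriptions):
            if predicate(event):
                handler(event)
                delivered += 1
        return delivered


bus = ContentBasedBus()
distiller_inbox: List[Event] = []
controller_inbox: List[Event] = []

# A distiller wants raw method-call events; a controller wants severe alerts.
bus.subscribe(lambda e: e.get("type") == "method_call", distiller_inbox.append)
bus.subscribe(
    lambda e: e.get("type") == "alert" and e.get("severity", 0) >= 2,
    controller_inbox.append,
)

bus.publish({"type": "method_call", "name": "fetchNews"})
bus.publish({"type": "alert", "severity": 1})
bus.publish({"type": "alert", "severity": 3})
```

Only the severity-3 alert reaches the controller inbox; the low-severity alert matches no subscription and is dropped.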
18Event Packager
- Flight recorder
- Translates events and logs all events
- Global timestamp synchronization
- Logs data streams from one or more sensors to a relational database
- Supports later replay and data mining
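The flight-recorder idea (log every event to a relational database, replay later in timestamp order) can be sketched with Python's built-in sqlite3 module. The table layout and function names are assumptions made for illustration, and timestamps are assumed to be already synchronized.

```python
import json
import sqlite3
from typing import Any, Dict, List, Tuple


def open_recorder(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the event log; the schema is a hypothetical example."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "id INTEGER PRIMARY KEY, "
        "ts REAL NOT NULL, "
        "source TEXT NOT NULL, "
        "payload TEXT NOT NULL)"
    )
    return conn


def record(conn: sqlite3.Connection, ts: float, source: str,
           payload: Dict[str, Any]) -> None:
    """Append one event; the payload is stored as JSON text."""
    conn.execute(
        "INSERT INTO events (ts, source, payload) VALUES (?, ?, ?)",
        (ts, source, json.dumps(payload)),
    )
    conn.commit()


def replay(conn: sqlite3.Connection,
           since: float = 0.0) -> List[Tuple[float, str, Dict[str, Any]]]:
    """Return logged events from `since` onward, ordered by timestamp."""
    rows = conn.execute(
        "SELECT ts, source, payload FROM events WHERE ts >= ? ORDER BY ts, id",
        (since,),
    )
    return [(ts, source, json.loads(payload)) for ts, source, payload in rows]
```

Because replay orders by timestamp rather than arrival order, events from different sensors that arrive out of order are still replayed consistently.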
19Event Packager, cont.
- The Event Packager has several top-level constructs that do most of the work
- An Event Type provides storage and conversion facilities for the associated data
- One or more Plugins: an input, a transform, an output, a store
- And a Rule, which binds one or more Inputs, Outputs, and Transforms together to create a simple event workflow given a situation
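How a Rule might bind inputs, transforms, and outputs into a simple event workflow can be sketched as follows. The `Rule` class here is a hypothetical illustration, not the Event Packager's actual API; the Event Type and store constructs are omitted.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, List

Event = Dict[str, Any]


@dataclass
class Rule:
    """Binds input, transform, and output plugins into one workflow."""
    inputs: List[Callable[[], Iterable[Event]]]
    transforms: List[Callable[[Event], Event]] = field(default_factory=list)
    outputs: List[Callable[[Event], Any]] = field(default_factory=list)

    def run(self) -> int:
        """Pull from every input, apply transforms in order, fan out to outputs."""
        processed = 0
        for read in self.inputs:
            for event in read():
                for transform in self.transforms:
                    event = transform(event)
                for write in self.outputs:
                    write(event)
                processed += 1
        return processed


def log_input() -> List[Event]:
    """Hypothetical input plugin: raw log lines."""
    return [{"line": "login alice"}, {"line": "login bob"}]


def parse(event: Event) -> Event:
    """Hypothetical transform plugin: raw line -> structured event."""
    verb, user = event["line"].split()
    return {"type": verb, "user": user}


sink: List[Event] = []
rule = Rule(inputs=[log_input], transforms=[parse], outputs=[sink.append])
processed = rule.run()
```

Keeping plugins as plain callables means an input, transform, or output can be swapped without touching the rule itself.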
22Event Distiller
- Event Distiller: event pattern analysis and monitoring
23Event Distiller
- Uses nondeterministic temporal state machines to implement a broad array of correlation and pattern-matching of events across multiple streams/sources
- Complex temporal event patterns are defined as rules with a condition and success/failure actions
- New rules (and thus state machines) can be added, removed, and manipulated on the fly
- Supports timestamped event reordering, timeouts, logic constructs, loops, rule chaining, variable binding, garbage collection, ...
28Controllers and Effectors
- Workflakes: controller
- Worklets: effectors
29Controllers and Effectors
- Decentralized workflow engine constructs, instantiates, and coordinates processes enacted by one or more effectors
- Currently uses mobile agents as effectors, but other technologies could be substituted
- The subject of another paper
- Giuseppe Valetto and Gail Kaiser. Using Process Technology to Control and Coordinate Software Adaptation. International Conference on Software Engineering, May 2003, pp. 262-272.
30KX Implementation
- Written primarily in Java (some C glue code)
- Event Packager: XML, Siena, Elvin, SMTP, raw TCP, Java RMI, etc.
- Event Distiller: a typical rulebase is several hundred lines in an XML vocabulary
- Download from http://www.psl.cs.columbia.edu/software.html
31Case Studies
- The other main focus of this paper
32Beyond Conventional Autonomicity: (Simple) Spam Detection
- Specific attributes of each message were captured by probes
- The Event Distiller triggers if it detects multiple messages (>3) with the same source and message ID in a short timespan (<10 s)
- Repair reconfigures Sendmail to block further instances
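The detection condition above (more than 3 messages with the same source and message ID within 10 seconds) can be sketched with a plain sliding-window counter. This is not the Event Distiller's state-machine rule language; the class name and the assumption that timestamps arrive in nondecreasing order are illustrative.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Tuple


class DuplicateBurstDetector:
    """Flags a (source, message ID) pair seen more than `max_count` times
    within `window_seconds`. Assumes timestamps are nondecreasing."""

    def __init__(self, max_count: int = 3, window_seconds: float = 10.0) -> None:
        self.max_count = max_count
        self.window = window_seconds
        self._seen: Dict[Tuple[str, str], Deque[float]] = defaultdict(deque)

    def observe(self, ts: float, source: str, message_id: str) -> bool:
        """Record one message; return True if the burst condition now holds."""
        times = self._seen[(source, message_id)]
        times.append(ts)
        # Drop sightings that fell out of the window (strictly less than 10 s kept).
        while times and ts - times[0] >= self.window:
            times.popleft()
        return len(times) > self.max_count
```

A `True` result is the point at which a repair, such as reconfiguring the mail server to block the sender, would be triggered; the repair itself is outside this sketch.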
34Failure Detection and Load Balancing: GeoWorlds
- GIS Mapping System
- Failure detection
- Load balancing: proactively prevent thrashing
35Failure Detection and Load Balancing: GeoWorlds
- GIS-based intelligence analysis system developed by ISI and in experimental use at US PACOM
- Uses external news sources such as CNN, BBC, Yahoo, etc., so it is subject to frequent glitches (denial of service, a source redesigning its website)
36Failure Detection and Load Balancing: GeoWorlds, cont.
- Instrumented GeoWorlds Java source code using WPI's AIDE tool, which emits XML events captured by the Event Packager
- Event Distiller checks that initiation and termination method calls match up within a timebound
- Repair restarts or load balances internal services, and substitutes external services
- Event Distiller detects oscillations (thrashing), so the Controller tries a different repair
- Employed CMU's ACME architectural models
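The check that initiation and termination calls match up within a timebound can be sketched as a small matcher over start and end events. The `CallMatcher` class and its call IDs are hypothetical; the actual rules were expressed in the Event Distiller's XML vocabulary.

```python
from typing import Dict, List


class CallMatcher:
    """Pairs start/end events per call ID and reports calls that never
    terminated within `timebound` seconds."""

    def __init__(self, timebound: float) -> None:
        self.timebound = timebound
        self._pending: Dict[str, float] = {}  # call ID -> start timestamp

    def on_start(self, call_id: str, ts: float) -> None:
        self._pending[call_id] = ts

    def on_end(self, call_id: str, ts: float) -> bool:
        """Return True only if a matching start exists and the call ended in time."""
        started = self._pending.pop(call_id, None)
        return started is not None and ts - started <= self.timebound

    def expired(self, now: float) -> List[str]:
        """Return (and forget) call IDs whose timebound has passed without an end."""
        late = sorted(
            call_id for call_id, started in self._pending.items()
            if now - started > self.timebound
        )
        for call_id in late:
            del self._pending[call_id]
        return late
```

Each ID returned by `expired` corresponds to a suspected failure that a controller could answer with a restart or a substitute service.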
38Failure Detection and Load Balancing: GeoWorlds, cont.
- Probes measure the overall load on the system
- Results were piped into a custom plugin for the EP
- An ACME architectural description was created
- The rules were generated and fed into the ED dynamically
- If the load exceeds a threshold, the ED will detect and report it
- A triggered repair would cause the service to move to a different host
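The threshold check and the resulting move can be sketched as a pure planning function. The host names, the load scale, and the policy of moving services to the least-loaded host are assumptions for illustration; the slides do not say how the target host was chosen.

```python
from typing import Dict


def plan_moves(placement: Dict[str, str], loads: Dict[str, float],
               threshold: float) -> Dict[str, str]:
    """Map each service on an overloaded host to the least-loaded other host.

    placement: service name -> current host
    loads:     host -> measured load
    Returns service name -> target host (empty if no move is possible).
    """
    hot = {host for host, load in loads.items() if load > threshold}
    cool = sorted((host for host in loads if host not in hot),
                  key=lambda host: loads[host])
    if not hot or not cool:
        return {}
    target = cool[0]  # hypothetical policy: pick the least-loaded host
    return {service: target
            for service, host in sorted(placement.items())
            if host in hot}
```

A real controller would also need to guard against oscillation, for example by not moving a service back to a host it just left.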
39Failure Detection and Load Balancing: TILAB Instant Messaging
- TILAB IM System (Telecom Italia): KX validated for a variety of monitoring, reconfiguration, and repair requirements
40Failure Detection and Load Balancing: TILAB Instant Messaging
- Probing user sign-on events and server request queues determines the load of elements so that actions can be taken
- Failure detection is also supported from a load balancing standpoint
41Contributions
- KX: an implementation of an easily integrable external monitoring infrastructure
- A component-replaceable and event-driven meta-architecture
- Can be used to add autonomic self-management and self-healing functionality to legacy systems and large-scale systems of systems
- Reference implementation
- Several case study applications
42Future Work
- Automatic derivation of EP and ED rules
- Make KX internals more autonomic
- Machine learning of behavioral models, e.g., normal vs. anomalous event streams
- Applications to intrusion detection
- Continuous (as opposed to discrete) raw data streams, such as audio and video
43Thank you