LiveOps: Systems Management as a Service

1
LiveOps: Systems Management as a Service
  • Chad Verbowski, Software Architect, MSR
  • Juhan Lee, Xiaogang Liu, Microsoft MSN
  • Roussi Roussev, Florida Institute of Technology
  • Yi-Min Wang, Microsoft Research
  • 12/07/2006, Washington, D.C.

2
Overview
  • Problem Space
  • Approach
  • Architecture and Challenges
  • Deployment and Results
  • Tech Transfers, Related and Future Work

3
Why Systems Management Is Hard
  • Administrators Don't Understand Their Systems
    • What processes are valid and should be running?
    • Which configuration (binary/file/registry) does my app need?
    • Will this change affect my Line of Business application?
  • Management Practices Cannot Be Enforced
    • No changes during lockdown periods?
    • Consistent updates applied across all machines?
    • No copies of source code to removable media?
  • Consequences
    • Reactive: Problems are detected after they impact reliability
    • Expensive: Entropy in system configuration creates complexity
    • Makes it harder to troubleshoot, and mistakes more likely
    • Harder to secure systems, easier for malware and hackers to hide
    • Unreliable: Problems recur because the root cause is not found

4
Systems Management Surveys
  • Reduce Support/Ops Cost and Improve Effectiveness
    • More than 40% of Ops cost is people
    • People cost scales linearly with managed servers
  • Improve Server Reliability
    • 33% of outages are caused by human error
    • 76% of Time-To-Repair is operator activity
    • What changes impact this app and who/what made them?
    • The most costly problems are not the most common
  • Improve Application Performance
    • 33% of operator time is spent on optimization
    • Which process is causing the system to be sluggish?
    • What resource is hanging my app, and which app has it?

5
The Change Management Process
  • (Diagram; labels: Person or Automation, LiveOps, Is The Change Approved?, Change Request, FDR, Change Tools, Change Detected, OS Applications Platform)

6
(No Transcript)
7
Overview
  • Problem Space
  • Approach
  • Architecture and Challenges
  • Deployment and Results
  • Tech Transfers, Related and Future Work

8
The Traditional Approaches
  • Signature Based: Accept False (-) for Low False (+)
    • Rather than complete analysis, look for known bad
    • (AV/AS) Manual Sample Collection and Signature Derivation
    • (Mgmt) Manual event rules for well-known problems (requires tuning to specific environments)
  • Manifest: Manual Specification of Dependencies
    • Coverage for THINLeg Applications is hard (Third Party, In-House, and Legacy Applications)
    • Resolving Late-Bound dependencies is difficult
    • Expensive to create a manifest for large applications
    • Keeping the manifest current is challenging

9
Our Approach
  • Trace All Interactions Between Applications and Configuration
    • Interactions Provide Context for Understanding Configuration
    • Completeness Enables detection of anomalies we have not seen before
  • Comprehensive, always-on, black-box tracing of low-level system activities
    • All File, Registry, Process, and Module Load activities
    • Average of 20 MB/day, no discernible overhead
  • Extensive Cross-Time/Machine Analysis Minimizes False Positives
    • Automatically identifies patterns, baselines, and deviations
    • 3 seconds per machine-day of data to analyze/process
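
The slides do not say how patterns, baselines, and deviations are computed. As a minimal sketch of the cross-machine idea only, assume each machine reports the set of process names it ran, and treat anything seen on at least half the machines as baseline. The function name, the 0.5 threshold, and the sample machine and process names are all hypothetical.

```python
from collections import Counter

def find_deviations(observations, min_fraction=0.5):
    """Split observed items into a cross-machine baseline and per-machine deviations.

    observations: dict mapping machine name -> set of observed items
                  (for example, process names). This shape is an assumption.
    min_fraction: share of machines that must report an item for it to
                  count as baseline (hypothetical default).
    """
    counts = Counter(item for items in observations.values() for item in items)
    threshold = min_fraction * len(observations)
    baseline = {item for item, n in counts.items() if n >= threshold}
    deviations = {
        machine: items - baseline
        for machine, items in observations.items()
        if items - baseline
    }
    return baseline, deviations

# Hypothetical sample data: three servers, one running an extra process.
observations = {
    "web01": {"w3wp.exe", "svchost.exe"},
    "web02": {"w3wp.exe", "svchost.exe"},
    "web03": {"w3wp.exe", "svchost.exe", "mediaplayer.exe"},
}
baseline, deviations = find_deviations(observations)
```

Comparing against many machines (and, by extension, many days) is what keeps a one-off but legitimate item from being flagged everywhere; only the machine that differs from the majority is reported.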

10
Flight Data Recorder (FDR)
  • Airplanes Have Black Box Recorders
    • Track Performance Parameters and Cockpit Audio
    • Provide a Timeline of Events for Understanding What Happened
  • FDR Is the Black Box Recorder for Windows
    • All File, Registry, Module, Process interactions
    • Who, What, When, and How State is Used and Modified
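
To make the "Who, What, When, and How" idea concrete, here is a small sketch of what one trace record and a per-item timeline could look like. The slides do not describe FDR's actual record format, so the `TraceEvent` fields, the `timeline` helper, and the sample paths are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class TraceEvent:
    """One recorded interaction (hypothetical shape, not FDR's real format)."""
    when: datetime  # when the interaction happened
    who: str        # process that performed it
    what: str       # file path, registry key, module, or process touched
    how: str        # kind of access, e.g. "read", "write", "delete", "load"

def timeline(events, target):
    """Return all events touching `target`, oldest first."""
    return sorted((e for e in events if e.what == target), key=lambda e: e.when)

# Hypothetical sample events.
events = [
    TraceEvent(datetime(2006, 12, 1, 10, 5), "setup.exe", r"HKLM\Software\Example", "write"),
    TraceEvent(datetime(2006, 12, 1, 9, 0), "app.exe", r"HKLM\Software\Example", "read"),
    TraceEvent(datetime(2006, 12, 1, 9, 30), "app.exe", r"C:\app\app.ini", "read"),
]
history = timeline(events, r"HKLM\Software\Example")
```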

11
Overview
  • Problem Space
  • Approach
  • Architecture and Challenges
  • Deployment and Results
  • Tech Transfers, Related and Future Work

12
LiveOps Architecture and Data Flow
13
Challenges
14
Overview
  • Problem Space
  • Approach
  • Architecture and Challenges
  • Deployment and Results
  • Tech Transfers, Related and Future Work

15
FDR Agent Deployment
  • Current Deployment: 600 Machines
    • 500 Servers from 15 MSN properties
    • 107 Corporate Desktops
    • 1350 Distinct systems have provided data over 24 months
  • Expanding Deployment: 6500 Servers
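
A rough scale estimate for the expanded deployment, combining it with the figures from the "Our Approach" slide. This assumes the 20 MB/day average and the 3 seconds of analysis both apply per machine per day and stay constant at 6500 servers; neither assumption is stated in the slides.

```python
# Figures taken from the slides; per-machine interpretation is an assumption.
MB_PER_MACHINE_DAY = 20
ANALYSIS_SECONDS_PER_MACHINE_DAY = 3
SERVERS = 6500

# Total trace volume per day, in decimal gigabytes.
daily_trace_gb = SERVERS * MB_PER_MACHINE_DAY / 1000

# Total sequential analysis time per day, in hours.
daily_analysis_hours = SERVERS * ANALYSIS_SECONDS_PER_MACHINE_DAY / 3600

print(f"{daily_trace_gb:.0f} GB/day, {daily_analysis_hours:.1f} analysis hours/day")
```

Under these assumptions, one day of data from all servers is about 130 GB and takes under six hours to analyze sequentially.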

16
Server Deployment Results
  • Critical Changes
    • Lockdown Violations: analysis over 12 months
      • Most properties had at least 1 violation during each lockdown
    • Deleting Server Page File Settings
      • Happens every 2-3 months across tens of servers
      • FDR tracked it to a remote registry change, likely from a script
  • Daily Changes
    • Typically more than 1 change impacts OS and LOB applications
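
A lockdown-violation check can be sketched as a filter over recorded changes. The slides do not describe how LiveOps performs this analysis; the tuple layout, the half-open lockdown windows, and all dates and paths below are hypothetical.

```python
from datetime import datetime

def lockdown_violations(changes, lockdowns):
    """Return the changes that fall inside any lockdown window.

    changes:   iterable of (when, machine, target) tuples (assumed shape)
    lockdowns: list of (start, end) datetimes, treated as start <= t < end
    """
    violations = []
    for when, machine, target in changes:
        if any(start <= when < end for start, end in lockdowns):
            violations.append((when, machine, target))
    return violations

# Hypothetical one-week lockdown and two recorded changes.
lockdowns = [(datetime(2006, 11, 20), datetime(2006, 11, 27))]
changes = [
    (datetime(2006, 11, 19, 23, 0), "web01", r"C:\app\web.config"),
    (datetime(2006, 11, 22, 3, 15), "web02", r"HKLM\SYSTEM\ExampleKey"),
]
violations = lockdown_violations(changes, lockdowns)
```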

17
Server Deployment Results
  • Forensics
    • 29% of systems have run unauthorized processes
      • Email clients, Media Player, Java auto-updating clients
    • 8 processes that could not be identified by security experts
      • mlconv.exe, monnow.exe, siteremover.exe, lsacacheagent.exe,
  • Performance
    • LOB application reads crypto keys 240 times/second
    • Management agent continuously reading all service entries
  • Security Best Practices
    • 1/3 of systems were found to be running screen savers
    • 6 services not running within the machine account context
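
Findings like the 240 reads/second above fall out of counting read events per process and item over a time window. The sketch below shows that counting step only; the event tuple layout, the threshold, and the process and item names are hypothetical, and the slides do not describe the actual analysis.

```python
from collections import Counter

def read_rates(read_events, window_seconds):
    """Average reads per second for each (process, target) pair.

    read_events: iterable of (timestamp, process, target) tuples (assumed shape)
    """
    counts = Counter((process, target) for _, process, target in read_events)
    return {key: n / window_seconds for key, n in counts.items()}

def hot_readers(read_events, window_seconds, threshold_per_second):
    """Keep only the pairs whose read rate meets the threshold."""
    rates = read_rates(read_events, window_seconds)
    return {key: rate for key, rate in rates.items() if rate >= threshold_per_second}

# Hypothetical 2-second window: one process reads the same item 480 times.
events = [(i / 240, "lobapp.exe", "crypto-key") for i in range(480)]
events += [(0.5, "agent.exe", "service-entry"), (1.5, "agent.exe", "service-entry")]
hot = hot_readers(events, window_seconds=2, threshold_per_second=100)
```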

18
LiveOps Scenarios
  • Impact Analysis
  • Forensic Investigation
  • Enforcing Policies

19
Scenario 1: Impact Analysis
  • Stale Binary after Patch Installation
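
One way to read the stale-binary scenario: a process is running stale code if it loaded a module before the patch rewrote that file and has not restarted since. The sketch below encodes that reading; the input shapes, process IDs, and paths are hypothetical, and the slide does not describe how LiveOps detects this.

```python
from datetime import datetime

def stale_processes(module_loads, last_written, running_pids):
    """Find running processes that loaded a module before its last write.

    module_loads: iterable of (pid, module_path, loaded_at) tuples
    last_written: dict module_path -> time the file was last written (the patch)
    running_pids: set of process IDs that are still alive
    """
    stale = set()
    for pid, module_path, loaded_at in module_loads:
        patched_at = last_written.get(module_path)
        if patched_at is not None and pid in running_pids and loaded_at < patched_at:
            stale.add((pid, module_path))
    return sorted(stale)

# Hypothetical data: the patch lands at 22:00 on December 1.
module_loads = [
    (101, r"C:\app\lib.dll", datetime(2006, 12, 1, 8, 0)),   # loaded before patch
    (202, r"C:\app\lib.dll", datetime(2006, 12, 2, 9, 0)),   # loaded after patch
    (303, r"C:\app\lib.dll", datetime(2006, 12, 1, 7, 0)),   # no longer running
]
last_written = {r"C:\app\lib.dll": datetime(2006, 12, 1, 22, 0)}
running_pids = {101, 202}
stale = stale_processes(module_loads, last_written, running_pids)
```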
20
FDR Scenario 2: Forensic Investigation
  • Where did that process / file come from?
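
Answering "where did it come from?" amounts to walking recorded events backwards: from the file to the process that wrote it, then up the chain of parent processes. This is only an illustrative sketch; the two lookup tables, the function name, and every process and path name are hypothetical.

```python
def provenance(path, file_writers, process_parents):
    """Trace a file back through its writer and that writer's ancestors.

    file_writers:    dict file path -> process that first wrote the file
    process_parents: dict process -> parent process that created it
    Returns [path, writer, parent, grandparent, ...].
    """
    chain = [path]
    seen = set()
    proc = file_writers.get(path)
    # Stop at the top of the recorded history, or if the data contains a cycle.
    while proc is not None and proc not in seen:
        chain.append(proc)
        seen.add(proc)
        proc = process_parents.get(proc)
    return chain

# Hypothetical trace data.
file_writers = {r"C:\temp\tool.exe": "installer.exe"}
process_parents = {"installer.exe": "cmd.exe", "cmd.exe": "explorer.exe"}
origin = provenance(r"C:\temp\tool.exe", file_writers, process_parents)
```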
21
Historical Contextual Information (Drill In)
22
Scenario 3: Enforcing Policies
  • Was that an authorized and planned change?
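
Checking whether a detected change was authorized can be sketched as matching it against approved change requests by machine, location, and time window. The `ChangeRequest` fields, the prefix-matching rule, and the sample data are assumptions for illustration, not the LiveOps policy model.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class ChangeRequest:
    """An approved change (hypothetical fields)."""
    machine: str
    target_prefix: str  # path or registry prefix the change may touch
    start: datetime
    end: datetime

def is_authorized(change, requests):
    """True if some request covers this (when, machine, target) change."""
    when, machine, target = change
    return any(
        r.machine == machine
        and target.lower().startswith(r.target_prefix.lower())
        and r.start <= when <= r.end
        for r in requests
    )

def unauthorized_changes(changes, requests):
    """Detected changes that no approved request accounts for."""
    return [c for c in changes if not is_authorized(c, requests)]

# Hypothetical approved request and two detected changes.
requests = [
    ChangeRequest("web01", r"C:\inetpub", datetime(2006, 12, 1, 20, 0), datetime(2006, 12, 1, 23, 0)),
]
changes = [
    (datetime(2006, 12, 1, 21, 0), "web01", r"C:\inetpub\app.dll"),
    (datetime(2006, 12, 1, 21, 5), "web01", r"C:\Windows\system32\drivers\etc\hosts"),
]
flagged = unauthorized_changes(changes, requests)
```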
23
Questions?