Visible Ops: Building Effective - PowerPoint PPT Presentation

Provided by: gene71
Learn more at: http://www.hpc.unm.edu
1
Visible Ops: Building Effective, Auditable ITIL
Change Management Processes in 4 Steps
Phase One
  • Gene Kim, CTO, Tripwire, Inc.
  • October 27, 2004

2
The Challenges
  • How do I simultaneously contain costs, improve
    security and service levels, and address
    regulatory compliance?
  • What is my first step in building an ITIL change
    management process? How will I know that it's
    working?
  • What order should I tackle the ITIL process
    areas?
  • How do I attest to auditors that I have effective
    change management processes?
  • Sarbanes-Oxley Section 404
  • HIPAA, GLBA, CFR11a, etc.
  • How do COBIT and ITIL fit together?
  • How do I create a good working relationship with
    my auditors?
  • What do auditors doing controls-based audits
    look for?
  • What happens if they cannot find effective
    controls?

3
Agenda
  • Examine the high-performing IT operations and
    security organizations
  • What they all have in common
  • What we can learn from them
  • Define the ideal working relationship between IT
    and audit
  • Why auditors talk the way they do
  • What auditors need to see
  • Building auditable and effective change
    management processes in four steps
  • Stabilize Patient
  • Catch and Release, and Find Fragile Artifacts
  • Establish Repeatable Build Library
  • Enable Continuous Improvement

4
The Highest Performing IT Organizations
  • High performance Ops and Security organizations
    have
  • Highest ratio of staff deployed on pre-production
    processes
  • Lowest amount of unplanned work
  • Highest change success rate
  • Best posture of compliance and security

5
Common Process Areas Of High Performers
  • All the high-performers had self-derived the same
    way of working
  • Culture of change management
  • Culture of causality
  • Culture of compliance and desire to continually
    reduce variance

6
Common Traits Of The Highest Performers
  • Culture of change management
  • Integration of IT operations and security
    processes via problem management and change
    management processes
  • Processes that serve both organizational needs
    and business objectives
  • Highest rate of effective change (approved
    changes, change success rate)
  • Culture of causality
  • Highest service levels (MTTR, MTBF)
  • Highest first fix rate (unneeded rework)
  • Culture of compliance and continual reduction of
    operational variance
  • Production configurations
  • Highest level of pre-production staffing
  • Effective pre-production controls
  • Effective pairing of preventive and detective
    controls

7
Causal Factors of IT Downtime
  • Operator Error: 60%
  • System Outages: 20% (Security Related: 5%,
    Non-Security Related: 15%)
  • Application Failure: 20%
Source: IDC, 2004
8
Capability Levels
Organization controls the changes
  • 4 - Continuously Improving
  • < 5% of time spent on unplanned work
  • Change success rate is very high
  • Service levels are world class
  • IT operating costs are under control
  • Can scale IT capacity rapidly with marginal
    increases in IT costs
  • Change review and learning processes are in place
  • Able to increase capacity in a cost-effective way

Changes control the organization
  • 3 - Closed-Loop Process
  • 15-35% of time spent on unplanned work
  • Some ticketing / workflow system in place
  • Changes documented and approved
  • Change success rate is high
  • Service levels are pretty good
  • Server-to-admin ratio is good, but not BoB
  • IT costs are improving but still too high
  • Security incidents down

  • 2 - Using Honor System
  • 35-50% of time spent on unplanned work
  • Some technology deployed
  • You have the right vision but no accountability
  • Server-to-admin ratio is way too low
  • IT costs are too high
  • Process subverted by talking to the right people
  • 1 - Reactive
  • Over 50% of time spent on unplanned work
  • Chaotic environment: lots of firefighting
  • MTTR is very long; poor service levels
  • Can only scale by throwing people at the problem

Effectiveness, from lowest to highest: Reactive,
Using The Honor System, Closed-Loop Change Mgt,
Continuously Improving
Based on the IT Process Institute's Visible Ops
Framework
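
The capability levels above are keyed to the share of time spent on unplanned work, so they can be sketched as a simple lookup. This is only an illustration: the function name is hypothetical, the slide does not assign the 5-15% range to any level (the sketch returns None there), and treating exactly 35% as level 3 and exactly 50% as level 2 is an assumption.

```python
from typing import Optional


def capability_level(unplanned_pct: float) -> Optional[int]:
    """Map % of time spent on unplanned work to a capability level (1-4).

    Thresholds come from the slide; boundary handling is an assumption.
    """
    if not 0 <= unplanned_pct <= 100:
        raise ValueError("percentage must be between 0 and 100")
    if unplanned_pct < 5:
        return 4  # Continuously Improving
    if unplanned_pct < 15:
        return None  # 5-15% is not assigned to a level on the slide
    if unplanned_pct <= 35:
        return 3  # Closed-Loop Process (assumption: 35% counts as level 3)
    if unplanned_pct <= 50:
        return 2  # Using Honor System (assumption: 50% counts as level 2)
    return 1  # Reactive: over 50%
```

For example, an organization that estimates 40% of its time goes to unplanned work would land at level 2 under these assumptions.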
9
Why Auditors Do The Things They Do
  • Given enough time and resources, auditors would
    love to count all the beans
  • Go into the warehouses, open up all the
    containers, and inspect all the contents
  • Rarely does this actually happen, for obvious
    reasons
  • Instead, auditors go to the bean counting
    machine to see whether the results are
    trustworthy
  • What controls ensure that it hasn't been
    subverted?
  • What controls ensure that the results are
    correct?
  • For a variety of reasons, auditors are shifting
    from substantive audits to control audits

10
IT Controls 101
  • Preventative Controls
  • Separation of duties
  • Change management and authorization processes
  • Detective Controls
  • Production controls around change management and
    configuration management
  • Corrective Controls
  • Restoration and backup systems

11
Ideal Attestation of Controls
  • High performing shops typically have the highest
    service levels and the lowest cost of controls
  • Best service levels (MTTR, MTBF), lowest amount
    of unplanned, unscheduled work, highest
    server-to-sysadmin ratio
  • Best working relationship with audit.
  • Least amount of time dedicated to compliance
    activities
  • Why?
  • They can point to their change management and
    governance process (preventative controls)
  • They can show that the processes are working
    (detective controls)
  • How?
  • Change management meeting minutes
  • Three-ring binder of change orders and verified
    changes

12
COBIT and Change Management
13
COBIT AI6: Managing Changes
Control Objective and Tripwire's Role
  • 6.5 Documentation and Procedures
  • Control objective: The change process should ensure that whenever system changes are implemented, the associated documentation and procedures are updated accordingly.
  • Tripwire's role: Tripwire is used to validate that all changes are tracked, synchronized with documentation (run books, etc.), and applied consistently across the appropriate systems.
  • 6.6 Authorised Maintenance
  • Control objective: IT management should ensure maintenance personnel have specific assignments and that their work is properly monitored. In addition, their system access rights should be controlled to avoid risks of unauthorised access to automated systems.
  • Tripwire's role: By reporting what changed on each system, when it occurred, and who made the change, Tripwire is used to ensure that all changes made are authorized, and made by authorized personnel. Out of scope changes, inconsistently applied changes, changes that occur outside the maintenance window, and other inappropriate changes are therefore discovered before they impact system availability.
  • 6.7 Software Release Policy
  • Control objective: IT management should ensure that the release of software is governed by formal procedures ensuring sign-off, packaging, regression testing, handover, etc.
  • Tripwire's role: Tripwire enables IT management to validate that formal sign-off processes are adhered to. Tripwire is also commonly used to ensure that packages are not altered during handoffs (through pre- and post-handoff comparison of released packages).
  • 6.8 Distribution of Software
  • Control objective: Specific internal control measures should be established to ensure distribution of the correct software element to the right place, with integrity, and in a timely manner with adequate audit trails.
  • Tripwire's role: Tripwire can compare changes that occur on production systems back to a reference baseline to ensure that software distribution happens consistently across target systems, within the prescribed time. All changes are recorded for historical and audit-related reporting and analysis.
14
COBIT DS9: Managing The Configuration
Control Objective and Tripwire's Role
  • 9.2 Configuration Baseline
  • Control objective: IT management should be ensured that a baseline of configuration items is kept as a checkpoint to return to after changes.
  • Tripwire's role: Configuration baselines are a core competency of Tripwire software. Tripwire maintains a history of current and previously authorized baselines to determine whether the current (as-is) device and system configuration matches the authorized (as-specified) state, according to the configurations you've authorized for use in your environment. Tripwire can also enable rollback to an authorized state either by performing the rollback directly or providing a manifest to drive third-party restoration / provisioning systems.
  • 9.4 Configuration Control
  • Control objective: Procedures should ensure that the existence and consistency of recording of the IT configuration is periodically checked.
  • Tripwire's role: Tripwire integrity checks provide the means to assess the existence, consistency, and conformance of device and system configurations. These checks can be performed on an automatic, ongoing basis, as well as initiated on-demand by administrators.
  • 9.5 Unauthorised Software
  • Control objective: Clear policies restricting the use of personal and unlicensed software should be developed and enforced. The organisation should use virus detection and remedy software. Business and IT management should periodically check the organisation's personal computers for unauthorized software. Compliance with the requirements of software and hardware license agreements should be reviewed on a periodic basis.
  • Tripwire's role: Tripwire is frequently used by customers to identify unauthorized or rogue applications within the production environment. This aids in enforcing configuration standards, as well as assisting in identification, isolation, and recovery from day-zero attacks from viruses or worms.
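
The baseline comparison described for DS9 can be illustrated generically. The sketch below is not Tripwire's implementation or API. It assumes a baseline is simply a mapping from file path to SHA-256 digest, and the function names `snapshot` and `compare` are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def snapshot(root: Path) -> Dict[str, str]:
    """Record a SHA-256 digest for every regular file under root."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = str(path.relative_to(root))
            digests[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare(baseline: Dict[str, str], current: Dict[str, str]) -> Dict[str, List[str]]:
    """Report files added, removed, or modified relative to the baseline."""
    shared = baseline.keys() & current.keys()
    return {
        "added": sorted(set(current) - set(baseline)),
        "removed": sorted(set(baseline) - set(current)),
        "modified": sorted(p for p in shared if baseline[p] != current[p]),
    }
```

In this reading, a snapshot taken when a configuration is authorized serves as the "as-specified" state, and a later snapshot of the same directory is the "as-is" state passed to `compare`.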
15
The Tragic Truth About Auditors
  • Auditors gravitate to where controls appear
    weakest
  • To attract the attention of auditors, have
    unexplained outages and lots of unexplained
    changes
  • "The top leading indicators of risk when we look
    at an IT operation are poor service levels and
    unusual velocity of changes." -- Bill Philhower

16
Visible Ops: Four Steps To Build An Effective
Change Management Process
  • Each of the four Visible Ops steps is:
  • A finite project: not an ISO 9001 initiative or a
    vague 5-year vision
  • Catalytic: returns more resources to the
    organization than it consumes, fueling the next
    steps
  • Sustaining: process stays in place, even when the
    initial force behind it disappears
  • Auditable: supports factual reporting and
    attestation to process adherence and consistency
  • Ordered: must be done in the specified order to
    achieve the above
  • Model based on five years studying
    high-performing IT Ops and Security organizations
  • Visible Ops has been donated to the ITPI

17
Visible Ops: Four Steps To Build An Effective
Change Management Process
  • Phase 1: Electrify Fence, Modify First Response.
    Tripwire enforces the change process. Tripwire
    rules out change as early as possible in the
    repair cycle.
  • Phase 2: Catch and Release, Find Fragile
    Artifacts. Tripwire protects fragile artifacts.
    Tripwire enforces change freeze and prevents
    configuration drift.
  • Phase 3: Establish Repeatable Build Library.
    Tripwire captures known good state in
    preproduction. Tripwire captures production
    changes that need to be baked into the build.
  • Phase 4: Continually Improve
Tripwire detects change, which all process areas
hinge upon.
18
Phase 1: Stabilize Patient, Modify First Response
Tripwire and IP Services
19
Issues
  • We have a tendency to light and fight our own
    fires
  • 80% of outages are self-inflicted
  • 80% of MTTR is dominated by asking "what
    changed?"
  • With sufficiently low change success rate, high
    rate of change, and high MTTR, we are spending
    all our time doing unplanned, unscheduled work
  • Best in class: 5% of OpEx is spent on unplanned
    work
  • Average: estimated around 25-45%
  • Changes are made without authorization, proactive
    scheduling, or full documentation

"The most likely way the world will be destroyed,
most experts agree, is by accident. That's where
we come in; we're computer professionals. We
cause accidents." -- Nathaniel Borenstein
20
Stabilize Patient
  • Curb the major cause of outages: 80% of outages
    are self-inflicted
  • Identify critical patients, clear everyone away
    from them unless they are authorized to operate
  • Document this new change policy: no changes
    unless authorized (preventative)
  • At this point, anyone even holding a scalpel
    should be viewed with suspicion

21
Electrify The Fence
  • We have now prescribed our first preventative
    change process and policy
  • Why do most change management initiatives fail?
  • What is the top audit finding around change
    controls?
  • Now we must "manage by fact" instead of "manage
    by belief" by electrifying the fences
  • No one is allowed to be inside the change fence
    except on the weekends
  • Why did Joe Bob touch the fence on Monday at
    2:11am?
  • Document what should happen to Joe Bob
  • Public shaming, take a day off, or more

"What is often overlooked is that if one person
can single-handedly save the ship, that one
person can probably single-handedly sink the
ship, too." -- Unknown
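
The "electrified fence" above amounts to a detective control that flags any detected change made outside the allowed window. A minimal sketch follows, assuming the slide's example policy (changes only on weekends). The `DetectedChange` record and both function names are hypothetical, and the timestamps are assumed to come from some change-detection tool.

```python
from datetime import datetime
from typing import List, NamedTuple


class DetectedChange(NamedTuple):
    who: str
    system: str
    when: datetime


def in_change_window(when: datetime) -> bool:
    """Example policy from the slide: changes are allowed only on weekends."""
    return when.weekday() >= 5  # Monday is 0, so 5 and 6 are Saturday and Sunday


def fence_violations(changes: List[DetectedChange]) -> List[DetectedChange]:
    """Return every detected change made outside the change window."""
    return [c for c in changes if not in_change_window(c.when)]
```

A change recorded for Joe Bob on a Monday at 2:11am would be returned by `fence_violations`, which is the fact a change manager would then follow up on.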
22
Create Change Team
  • Get all necessary stakeholders who can best make
    decisions about changes, encompassing business
    goals, operational risks, technical risks, etc.
  • Key stakeholders for us: Security Lead, Ops
    Systems Engineering Lead, VP of Operations,
    Service Desk Manager, Director of Network
    Operations, and Internal Audit
  • Create weekly change management meetings,
    mandatory for all CAB (change advisory board)
    members.

23
Hold Weekly Change Management Meetings
  • Create a path from desired change, to requested
    change, authorized change, scheduled change,
    implemented change, verified change.
  • Review implemented changes and ensure that all
    actual changes mapped to authorized work
  • Enable highest change throughput for the
    organization, best serve business needs, with the
    least amount of bureaucracy possible
  • Weekly 15 min change management meetings are
    possible, with practice
  • Keep good records of requested changes,
    authorized changes, and scheduled changes
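
The path from desired change to verified change can be sketched as a small state machine in which no step may be skipped. This is one possible reading of the slide, not a prescribed design. The class and method names are hypothetical.

```python
from enum import IntEnum
from typing import List


class ChangeState(IntEnum):
    DESIRED = 0
    REQUESTED = 1
    AUTHORIZED = 2
    SCHEDULED = 3
    IMPLEMENTED = 4
    VERIFIED = 5


class ChangeRequest:
    def __init__(self, summary: str) -> None:
        self.summary = summary
        self.state = ChangeState.DESIRED
        self.history: List[ChangeState] = [ChangeState.DESIRED]

    def advance(self, new_state: ChangeState) -> None:
        """Move to the next state; skipping a step (for example,
        implementing a change that was never authorized) is rejected."""
        if new_state != self.state + 1:
            raise ValueError(
                f"cannot go from {self.state.name} to {new_state.name}"
            )
        self.state = new_state
        self.history.append(new_state)
```

The `history` list gives a record of each step a change passed through, which is the kind of record keeping the last bullet above asks for.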

24
Change Management Guidelines
  • Don't
  • Don't authorize changes that do not have rollback
    plans that everybody reviews
  • Don't allow rubber stamping approval of changes
  • Don't let any system changes off the hook:
    someone made it, so understand what caused it
  • Do
  • Do post-implementation reviews to determine
    whether the change succeeded or not
  • Do track the change success rate
  • Do use the change success rate to avoid making
    historically risky changes

"It's not the strongest species that survive, nor
the most intelligent, but the one most responsive
to change." -- Charles Darwin
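
Tracking the change success rate, and using it to spot historically risky changes, can be sketched as below. The grouping by "change type", the 0.8 threshold, and the function names are all assumptions made for illustration. The slides do not define them.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def success_rates(reviews: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Compute the success rate per change type.

    reviews: (change_type, succeeded) pairs taken from
    post-implementation reviews.
    """
    totals: Dict[str, int] = defaultdict(int)
    successes: Dict[str, int] = defaultdict(int)
    for change_type, succeeded in reviews:
        totals[change_type] += 1
        if succeeded:
            successes[change_type] += 1
    return {t: successes[t] / totals[t] for t in totals}


def risky_types(rates: Dict[str, float], threshold: float = 0.8) -> List[str]:
    """Change types whose historical success rate is below the threshold."""
    return sorted(t for t, rate in rates.items() if rate < threshold)
```

A change team could consult `risky_types` during the weekly meeting before authorizing another change of the same kind.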
25
Spectrum: Managing Change
  • Don't expect to be doing closed-loop change
    management right out of the chute: awareness is
    better than being oblivious, managed is better
    than unmanaged!
  • Spectrum
  • Oblivious to change: "Hey, did the switch just
    reboot?"
  • Aware of change: "Hey, who just rebooted the
    switch?"
  • Announcing change: "Hey, I'm rebooting the
    switch. Let me know if that will cause a
    problem."
  • Authorizing change: "Hey, I need to reboot the
    switch. Who needs to authorize this?"
  • Scheduling change: "When is the next maintenance
    window? I'd like to reboot the switch then."
  • Verifying change: "Looking at the fault manager
    logs, I can see that the switch rebooted as
    scheduled."

This is what SO-404 requires! (Preventative and
detective controls)
26
Create Trusted Authorized Work Queue and Change
Calendar
  • Create a work ticketing system that contains all
    the authorized work that went through the change
    management process
  • Create a change calendar (Forward Schedule Of
    Change) that the change manager uses to
    coordinate resources, manage risks, etc.

27
Modify First Response (1/2)
  • The key to a catalytic change management process
    is that it must return value back to the
    organization
  • Decrease MTTR, 80% of which is dominated by
    people asking "what changed?", by integrating the
    change management process into problem management
  • Whenever problem managers are mobilized, have all
    authorized changes and actual changes in the work
    ticket

The Microsoft MOF study showed that their best in
class customers rebooted their servers 20x less
often, and also had 5x fewer blue screens of
death.
28
Modify First Response (2/2)
  • Eliminate change as early as possible by
    identifying the assets directly involved in the
    ticket and auditing them against their
    configuration baseline for the last 72 hours. All
    changes found are attached to the ticket.
  • If no changes are found the circle is widened to
    include changes made to infrastructure supporting
    the target systems.

"Grant me the Serenity to accept the things I
cannot change, Courage to change the things I can,
and Wisdom to know the difference." -- Dr.
Reinhold Niebuhr (excerpt from the Serenity
Prayer)
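
The first-response procedure on this slide (check the ticket's assets for changes in the last 72 hours, then widen the circle to supporting infrastructure) can be sketched as follows. The change log, the `supporting` dependency map, and all names here are hypothetical stand-ins for whatever change-detection and asset data an organization actually has.

```python
from datetime import datetime, timedelta
from typing import Dict, List, NamedTuple, Set


class Change(NamedTuple):
    asset: str
    when: datetime
    description: str


def changes_for_ticket(
    ticket_assets: Set[str],
    supporting: Dict[str, List[str]],
    change_log: List[Change],
    now: datetime,
    window: timedelta = timedelta(hours=72),
) -> List[Change]:
    """Return recent changes on the ticket's assets; if there are none,
    widen the circle to the infrastructure supporting those assets."""
    cutoff = now - window
    recent = [c for c in change_log if cutoff <= c.when <= now]
    direct = [c for c in recent if c.asset in ticket_assets]
    if direct:
        return direct
    wider = {dep for asset in ticket_assets for dep in supporting.get(asset, [])}
    return [c for c in recent if c.asset in wider]
```

An empty result from both passes is what lets the problem manager rule out change early in the repair cycle.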
29
Phase 1: What You Have Built
  • Documented correct path from desired change to
    authorized change, scheduled change, implemented
    change, and verified change
  • Created documentation that the process is working
  • Returning value back to IT Ops by reducing MTTR,
    increasing change success rate and effective
    change throughput

30
What To Show The SO-404 Teams
  • Change governance and management processes
  • Meeting minutes of the change management meetings
  • Authorization processes
  • Three ring binder of stapled items
  • Authorized work order
  • Change report on infrastructure showing correct
    changes made
  • Signature of change manager verifying correct
    implementation of change

31
What To Show The Auditors
  • List of all outages and unscheduled downtime
  • Change management metrics
  • Change rate (per week)
  • Change success rate
  • MTTR, MTBF
  • This would make most auditors breathe a sigh of
    relief
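
MTTR and MTBF in the list above can be computed from a plain list of outages. The sketch below uses the common definitions (MTTR as average outage duration, MTBF as uptime divided by number of failures). The slides do not spell out a formula, so these definitions and the function names are assumptions.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Outage = Tuple[datetime, datetime]  # (start, end) of one outage


def total_downtime(outages: List[Outage]) -> timedelta:
    return sum((end - start for start, end in outages), timedelta())


def mttr(outages: List[Outage]) -> timedelta:
    """Mean time to repair: average outage duration."""
    if not outages:
        raise ValueError("no outages recorded")
    return total_downtime(outages) / len(outages)


def mtbf(outages: List[Outage], period_start: datetime, period_end: datetime) -> timedelta:
    """Mean time between failures: uptime in the period per failure."""
    if not outages:
        raise ValueError("no outages recorded")
    uptime = (period_end - period_start) - total_downtime(outages)
    return uptime / len(outages)
```

Fed from the same outage list shown to auditors, these two numbers can be reported alongside the weekly change rate and change success rate.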

32
Visible Ops: Four Steps To Build An Effective
Change Management Process
  • Phase 1: Electrify Fence, Modify First Response.
    Tripwire enforces the change process. Tripwire
    rules out change as early as possible in the
    repair cycle.
  • Phase 2: Catch and Release, Find Fragile
    Artifacts. Tripwire protects fragile artifacts.
    Tripwire enforces change freeze and prevents
    configuration drift.
  • Phase 3: Establish Repeatable Build Library.
    Tripwire captures known good state in
    preproduction. Tripwire captures production
    changes that need to be baked into the build.
  • Phase 4: Continually Improve
Tripwire detects change, which all process areas
hinge upon.
33
Which Metric Do You Want To Improve?
Phase 4
  • Release
  • Time to provision known good build
  • # of turns to a known good build
  • Shelf life of build
  • % of systems that match known good build
  • % of builds that have security sign-off
  • # of fast-tracked builds
  • Ratio of release engineers to sysadmins
  • Controls
  • # of changes authorized per week
  • # of actual changes made per week
  • Change success rate
  • # of emergency changes
  • # of service-affecting outages
  • # of special changes
  • # of business as usual changes
  • Change management overhead
  • Configuration variance
  • Resolution
  • MTTR, MTBF
  • % of time spent on unplanned work

34
Why Is Unplanned Work Such A Good Indicator?

                                     High performer   Average
# of production changes              > 1000 chg/wk    unknown, hundreds
  X
% failed change or unauth changes    < 1%             30-50% (avg)
  X
mean time to repair                  minutes          hours, days
% of time spent on unplanned work    < 5% of OpEx     35-45% of OpEx

Average: 35-45% of OpEx spent on unplanned
work! Impact: late projects, rework, compliance
issues, uncontrolled variance, etc.
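
One way to read the "X" signs on this slide is that unplanned work scales with the product of the three factors. The sketch below works that reading through with made-up numbers. The formula as an exact equality, the function name, and the input values are all assumptions, not figures from the slide.

```python
def unplanned_hours_per_week(
    changes_per_week: float,
    failed_fraction: float,
    mttr_hours: float,
) -> float:
    """Estimated repair hours per week caused by failed or unauthorized changes.

    Assumes every failed change costs one repair of average length.
    """
    return changes_per_week * failed_fraction * mttr_hours


# Hypothetical inputs, chosen only to show how the factors combine.
many_changes_low_failure = unplanned_hours_per_week(1000, 0.01, 0.25)
few_changes_high_failure = unplanned_hours_per_week(100, 0.40, 4.0)
```

Under these made-up inputs, the shop making ten times as many changes still spends far fewer hours on repair, because both its failure rate and its repair time are lower.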
35
What Affects These Variables?
Variables: # of production changes X % failed
change or unauth changes X mean time to repair;
% of time spent on unplanned work

Behaviors that increase change success rate
  • Effective change testing
  • Effective risk review when approving changes
  • Effective identification of change stakeholders
  • Effective change scheduling
Behaviors that decrease MTTR
  • Culture of causality: desire to rule out change
    first in problem repair cycle
  • Effective change management process that can
    report on authorized and scheduled changes
  • Ability to distinguish planned and unplanned
    outage events
  • Effective communications around scheduled changes
  • Effective monitoring of infrastructure for
    production changes
Behaviors that reduce unauthorized changes
  • Culture of change management
  • Management ownership of change process
  • Effective monitoring of infrastructure with
    detective controls to enforce change process
  • Management use of corrective action when change
    processes are not followed
36
What Do These Transformations Look Like?
  • Examples
  • Joe Judge at Adero
  • Ken Larson at Schlumberger-SEMA
  • Kevin Behr at IP Services
  • Financial returns of process transformations
  • Increased availability and decreased MTTR
  • Reduction of unplanned work from 50% to 5% of
    OpEx
  • Increased delivered capacity by 2x with 10%
    increase in OpEx
  • Increased delivery of planned projects that
    deliver higher value to the business
  • Fulfilled compliance and reduced cost of
    compliance

37
Why Do Auditors Love Continuous Improvement?
  • Controls are owned by the business to meet
    business objectives, instead of being there only
    to make auditors happy!
  • Auditors hate dragging organizations to implement
    controls, especially if it creates grudging and
    literal interpretations of findings
  • Continuous improvement requires process and
    controls, to detect and reduce variance

38
ITIL and COBIT
  • ITIL defines the set of all IT operational
    processes
  • COBIT defines all the controls that can be
    wrapped around them
  • ITIL and COBIT are complementary and orthogonal
  • Six Sigma defines how to build processes and
    their corresponding controls to continually
    monitor and reduce variance
  • ITIL defines the change management processes
  • COBIT defines the controls to ensure that the
    ITIL processes are auditable and effective

39
Caught in the Crossfire of Change
  • Rate of change is increasing with no signs of
    slowing

Business objectives
SarbOx, GLBA, CISP, etc.
Distributed systems
Heterogeneous environments
Quality improvement
Service levels
Risk mitigation
Staffing Budgets
40
Getting Control of Change
  • Control frameworks prescribe internal controls to
    enhance operational performance, security, and
    regulatory compliance
  • COBIT, ITIL, ISO17799, SAS70

Preventive
Corrective
Detective
Change Management
41
Tripwire Change Auditing Solutions
  1. Actual changes are detected on production systems
    and reconciled with approved and intended changes
  2. Change auditing results then flow back to change
    tickets, trouble tickets, audit and mgmt reports,
    plus configuration mgmt databases (CMDB)
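
The reconciliation in step 1 can be sketched with plain set operations. This is a generic illustration, not Tripwire's product behavior. Representing a change as a `(system, item)` pair and the names used here are assumptions.

```python
from typing import Dict, Iterable, List, Tuple

ChangeKey = Tuple[str, str]  # (system, changed item), e.g. ("web01", "/etc/hosts")


def reconcile(
    detected: Iterable[ChangeKey],
    approved: Iterable[ChangeKey],
) -> Dict[str, List[ChangeKey]]:
    """Split changes into authorized, unauthorized, and not yet implemented."""
    detected_set, approved_set = set(detected), set(approved)
    return {
        "authorized": sorted(detected_set & approved_set),
        "unauthorized": sorted(detected_set - approved_set),
        "not_implemented": sorted(approved_set - detected_set),
    }
```

Each of the three lists maps to a follow-up: authorized changes close out their tickets, unauthorized ones need an explanation, and approved changes that never appeared are still pending.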

42
Can You Answer These Questions?
  • Pick any piece of your infrastructure (router,
    server, firewall, etc.)
  • If a change is made to this device, how will you
    know?
  • How soon will you know?
  • How will you know if the change is good or bad?
  • How long will that process take?
  • What happens when the change is good?
  • What happens when the change is bad?
  • How do you verify that each change has been
    reconciled?
  • How do you report on all of the above?
  • Can you provide a historical report accounting
    for all changes in your environment?
  • This is what auditors want to know about how
    changes are managed in your IT infrastructure
  • With Tripwire, you can answer all of these
    questions

43
Improving Service Quality And Availability
Customer: IT Services operations of a major
energy services company
  • Problem: Change management in place, but lacked
    enforcement
  • Saw changes occurring, but didn't have the means
    to validate
  • Tripwire solution
  • Tripwire detects change and puts teeth in the
    process
  • Tracking What, When, Who, How and Why a change
    was made
  • Tripwire provides black and white documentation
    to enforce process
  • Increased staff efficiency, uptime, and service
    quality
  • "We used to spend 45 to 50% of our time on
    unplanned work. Now it's around 5%."
  • "In spite of force reductions, customers describe
    our services as phenomenally better now."

44
Get Involved!
  • Join ICOPL (ITPI Community Of Practice List-Serv)
  • http://www.itpi.org/home/icopl.php
  • There is now a Visible Ops Pocket Guide!
  • http://www.itpi.org/home/visibleops.php
  • We are looking for volunteers to help with our
    research projects.
  • IMCA is now online at the ITPI
  • http://www.itpi.org/home/imca.php
  • If you have a high performing organization, we
    want to study you!

45
Summary
  • Control is possible. We merely need to look at
    the high-performing IT organizations to confirm
    this.
  • Transformation is possible. Visible Ops is the
    result of years of studying high-performing IT
    operations and security organizations in
    conjunction with the ITPI
  • Visible Ops illustrates how interested
    organizations might replicate the processes of
    these high-performing organizations in just four,
    achievable steps
  • Gene Kim genek_at_tripwire.com