Transcript and Presenter's Notes

Title: Using Simplicity to Control Complexity


1
Using Simplicity to Control Complexity
  • Lui Sha
  • Department of Computer Science
  • lrs@cs.uiuc.edu
  • UIUC
  • June 2002

2
The Goal
  • Software systems are not static. They evolve.
  • Our goal is to develop an engineering foundation
    that allows us to evolve software systems
    dependably.
  • New features can be added easily, preferably
    online and without downtime.
  • The system never performs worse than before, even
    if the changes have bugs or contain malicious
    code.
  • To realize this goal, we need first to understand
    the nature of software reliability, and then
    demonstrate the viability of the idea in an
    important class of applications.

3
Which Side Would You Take?
  • How to improve the reliability and availability
    of increasingly complex software is a serious
    challenge. There are two philosophical positions:
  • The diversity camp: diversity in crops resists
    disease, so diversity in software should improve
    reliability. The likelihood of making the same
    mistakes decreases as the degree of diversity
    increases. Don't put all your eggs in one basket.
  • The bullet-proof-your-basket camp: concentrate
    all the available resources on one version and do
    it right. Do-it-right-the-first-time is the
    time-honored approach to quality products.

4
Software Development Postulates
  • In science we rely on facts and logic. Let's
    begin with well-known observations in software
    development. We make three postulates:
  • P1, Complexity Breeds Bugs: everything else being
    equal, the more complex the software project is,
    the harder it is to make it reliable.
  • P2, All Bugs Are Not Equal: you fix a bunch of
    obvious bugs quickly, but finding and fixing the
    last few bugs is much harder, if you can ever
    hunt them down.
  • P3, All Budgets Are Finite: there is only a
    finite amount of effort (budget) that we can
    spend on any project.
  • "Not so fast, Lui! Could you please define
    software complexity?"

5
Residual Logical Complexity
  • Computational complexity is modeled as the number
    of steps needed to complete the computation.
    Likewise, logical complexity can be viewed as the
    number of steps needed to verify correctness.
  • A program can have different logical and
    computational complexities. For example, compared
    with heapsort, bubble sort has lower logical
    complexity but higher computational complexity
    (see the sketch below). We focus on logical
    complexity in this talk.
  • Residual logical complexity: a program could have
    high logical complexity initially. However, if it
    has been verified and can be used as is, then its
    residual complexity is zero.
  • In the rest of the discussion, we shall focus on
    the (residual logical) complexity of software.
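
As an illustration (not from the original slides), here is a
minimal Python sketch of bubble sort. Its correctness argument
is a short loop invariant, which is what low logical complexity
means here, even though the algorithm costs O(n^2) comparisons;
heapsort runs faster but takes many more steps to verify.

    # Illustrative sketch: bubble sort is slow (O(n^2)) but easy to verify.
    def bubble_sort(a):
        a = list(a)                      # work on a copy
        n = len(a)
        for i in range(n):
            # Invariant: a[n-i:] holds the i largest elements, in final order.
            for j in range(n - 1 - i):
                if a[j] > a[j + 1]:
                    a[j], a[j + 1] = a[j + 1], a[j]
            # One pass bubbles the largest remaining element into place.
        return a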

6
The Implications of the 3 Postulates
  • P1, the Complexity Breeds Bugs postulate, implies
    that for a given mission duration t, the
    reliability of software decreases as complexity
    increases.
  • P2, the All Bugs Are Not Equal postulate, implies
    that for a given degree of complexity, the
    reliability function has a monotonically
    decreasing rate of improvement with respect to
    development effort.
  • A reliability function of the form
    R(Effort, Complexity, t) = e^(-kCt/E)
    satisfies P1 and P2 (see the sketch below).
  • P3, the Finite Budget postulate, implies that
    diversity is not free. That is, if we go for
    n-version diversity, we must divide the available
    effort n ways. This allows us to compare
    different approaches fairly.
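
A minimal Python sketch (not from the slides; the constant k and
the sample values are arbitrary) showing that
R(E, C, t) = e^(-kCt/E) behaves as the postulates require.

    import math

    def reliability(effort, complexity, t, k=1.0):
        # R(E, C, t) = exp(-k * C * t / E); k is an arbitrary scaling constant.
        return math.exp(-k * complexity * t / effort)

    # P1: for a fixed effort and mission time, more complexity means
    # lower reliability.
    print([round(reliability(10.0, c, 1.0), 3) for c in (1, 2, 4, 8)])
    # -> [0.905, 0.819, 0.67, 0.449]

    # P2: for a fixed complexity, each doubling of effort buys a smaller
    # reliability gain than the one before (diminishing returns).
    print([round(reliability(e, 1.0, 1.0), 3) for e in (1, 2, 4, 8, 16)])
    # -> [0.368, 0.607, 0.779, 0.882, 0.939]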

7
Modeling the Implications
  • This is equivalent to assuming that
  • the commonly used reliability function e^(-λt) is
    a useful model, and
  • the failure rate, λ, in R(t) is proportional to
    complexity but inversely proportional to the
    effort spent on the software.
  • "Hold on, Lui, how do you know the failure rate
    is proportional to complexity and inversely
    proportional to the effort spent? For God's sake,
    these could be very nonlinear relationships!"
  • "OK, we will examine nonlinear relationships
    later."

8
A Unified Framework
  • Recently, Larry Bernstein extended the
    reliability model as follows:
  • R = e^(-kCt/(Eε))
  • where ε expresses the ability to solve a problem
    with fewer instructions by using a new tool such
    as a compiler.
  • This equation expresses the reliability of a
    software system in a unified form, in terms of
    software engineering parameters. The longer the
    software system runs, the lower the reliability
    and the more likely a fault will be executed and
    become a failure. Reliability can be improved by
    investing in tools (ε), simplifying the design
    (C), or increasing the effort in development to
    do more inspections or testing than required by
    software effort estimation techniques.
  • This is a new idea. For this lecture, we assume
    ε = 1.

9
N-Version Programming - 1
  • Let's use the simple model to analyze N-version
    programming under the ideal condition that faults
    are independent. N-version programming suggests
    that we independently develop N versions of the
    program from the same specification, and then
    take the majority of the outputs.

(Figure: 3-version programming)
10
N-version Programming - 2
  • It turns out that "a single version is better
    than three versions" is a robust result. Here are
    two examples (see the sketch below).

(Figure: 3-version programming)
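
A minimal Python sketch (not from the slides; the sample values
of u = kCt/E are arbitrary) of the fixed-budget comparison behind
this claim: one version built with the whole effort E versus
three versions built with E/3 each and combined by majority vote,
assuming independent faults. For these values the single
full-budget version wins.

    import math

    def r_single(u):
        # One version built with the whole budget E; u = kCt/E.
        return math.exp(-u)

    def r_three_version(u):
        # Three versions built with E/3 each, so each failure rate triples.
        # The majority vote is correct when at least 2 of the 3 versions are.
        r = math.exp(-3 * u)
        return 3 * r**2 * (1 - r) + r**3

    for u in (0.1, 0.5, 1.0):
        print(u, round(r_single(u), 3), round(r_three_version(u), 3))
    # -> 0.1 0.905 0.833
    #    0.5 0.607 0.127
    #    1.0 0.368 0.007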
11
Recovery Block
  • The idea of the recovery block is that you
    develop several alternatives: checkpoint your
    state, try the primary, and test its output. If
    it passes the acceptance test, use it; otherwise,
    roll back and try another alternative (see the
    sketch below). We shall assume a perfect
    acceptance test for now.
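
A minimal Python sketch of the recovery-block pattern just
described (the function and variable names are illustrative, not
from the slides).

    import copy

    def recovery_block(state, alternatives, acceptance_test):
        # Try each alternative on a copy of the checkpointed state and
        # return the first result that passes the acceptance test.
        checkpoint = copy.deepcopy(state)          # checkpoint the state
        for alternative in alternatives:           # primary first, then backups
            try:
                result = alternative(copy.deepcopy(checkpoint))
                if acceptance_test(result):        # assumed perfect for now
                    return result
            except Exception:
                pass                               # a crash counts as a failure
            # Roll back (the checkpoint is untouched) and try the next one.
        raise RuntimeError("all alternatives failed the acceptance test")

    # Example: a buggy primary that forgets to sort, and a simple backup.
    print(recovery_block(
        [5, 3, 8, 1],
        alternatives=[lambda xs: xs,               # buggy primary
                      lambda xs: sorted(xs)],      # trusted backup
        acceptance_test=lambda xs: all(a <= b for a, b in zip(xs, xs[1:])),
    ))
    # -> [1, 3, 5, 8]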

12
The More Alternatives the Merrier?
13
Power of Simplicity
14
The Fly in the Ointment
  • Alas, it is difficult to develop high-coverage
    acceptance tests. Consider the case of a uniform
    random number generator.
  • Can you determine that the distribution is indeed
    uniform from one isolated data point? No.
  • Can you determine the distribution from a large
    sample? Yes. (See the sketch after this list.)
  • Many phenomena require a good-sized sample to
    diagnose; it is often difficult to diagnose a
    phenomenon from an isolated instance. This
    explains why it is so difficult to determine the
    correctness of each individual program output.
  • Unfortunately, in many applications we cannot
    buffer a long sequence of outputs before we
    release them. We can't do it in interactive
    applications, nor can we buffer up the outputs in
    control applications.
  • We need to find a way to tolerate incorrect
    outputs.
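
A small Python illustration (not from the slides) of the point
above: a single draw says nothing about uniformity, but a simple
chi-square style frequency check over a large sample exposes a
biased generator. The bin count and threshold are arbitrary
choices.

    import random

    def chi_square_uniform(samples, bins=10):
        # Chi-square statistic of samples in [0, 1) against a flat histogram.
        counts = [0] * bins
        for x in samples:
            counts[min(int(x * bins), bins - 1)] += 1
        expected = len(samples) / bins
        return sum((c - expected) ** 2 / expected for c in counts)

    good = [random.random() for _ in range(10000)]        # uniform generator
    bad = [random.random() ** 2 for _ in range(10000)]    # biased toward 0

    # One isolated value looks equally plausible from either generator...
    print(good[0], bad[0])
    # ...but over a large sample the statistic separates them clearly
    # (with 10 bins, values far above ~17 are suspicious at the 5% level).
    print(round(chi_square_uniform(good), 1), round(chi_square_uniform(bad), 1))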

15
Feedback Control of Software Execution
  • To tolerate output errors that cannot be detected
    instantaneously, the application should have the
    following characteristics:
  • Capability control: when the system is in an
    operational state, a single incorrect output
    cannot bring the system down instantaneously
    (cumulative errors can).
  • Measurable system behavior: we can evaluate the
    system's behavior under the software's control.
  • Control applications meet these two requirements:
    a control software error maps to a measurable
    actuation error, and errors can be bounded by a
    combination of control authority and monitoring
    frequency.
  • A simple and reliable core to provide acceptable
    performance.
  • Stability control: the system under complex
    software control must remain in states that are
    controllable by the simple and reliable
    controller.

16
The Idea
  • Joe is a new student who partied a bit too much.
    He has mastered bubble sort but has only a 50%
    chance of writing a correct quicksort program.
  • He must submit a program that will be evaluated
    as follows:
  • Correct and fast, O(n log n): A
  • Correct but slow: B
  • Incorrect: F
  • What is Joe's optimal strategy? (See the sketch
    below.)

(Figure: quicksort guarded by bubble sort. Stability
control: the set of numbers to be sorted cannot be
altered; this is the precondition for bubble sort.)
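
An illustrative Python sketch (not from the slides) of Joe's
optimal strategy: run the unverified quicksort, accept its output
only if it passes a cheap check (sorted order and an unaltered
multiset of numbers), and otherwise fall back to the trusted
bubble sort. At worst he earns a B; with a correct quicksort he
earns an A.

    from collections import Counter

    def trusted_bubble_sort(a):
        a = list(a)
        for i in range(len(a)):
            for j in range(len(a) - 1 - i):
                if a[j] > a[j + 1]:
                    a[j], a[j + 1] = a[j + 1], a[j]
        return a

    def joes_sort(data, unverified_quicksort):
        original = Counter(data)                     # stability control:
        try:                                         # the multiset of numbers
            out = unverified_quicksort(list(data))   # must not be altered
            ordered = all(x <= y for x, y in zip(out, out[1:]))
            if ordered and Counter(out) == original: # acceptance test passed
                return out                           # fast and correct: A
        except Exception:
            pass
        return trusted_bubble_sort(data)             # slow but correct: B

Sorting is a favorable case: unlike the random number generator
example, its acceptance test (sorted output, same multiset) is
cheap and complete.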
17
Simplex Architecture
A simple, verifiable core + diversity in the form of
two alternatives + feedback control of the software
execution. Components are online replaceable.
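
A minimal Python sketch of one period of such a loop (the names
and monitoring details are assumptions, not from the slides): the
complex controller keeps authority only while the plant state
stays inside the recovery region of the simple controller.

    def simplex_step(x, complex_controller, simple_controller,
                     in_recovery_region, use_complex=True):
        # One control period of a Simplex-style switch (illustrative only).
        #   x                   current plant state
        #   complex_controller  high-performance, unverified control law: x -> u
        #   simple_controller   simple, verified control law: x -> u
        #   in_recovery_region  predicate: can the simple law still recover from x?
        # Returns (command, use_complex) for the next period.
        if use_complex and in_recovery_region(x):
            try:
                return complex_controller(x), True   # keep the complex law
            except Exception:
                pass                                 # a crash also forces a switch
        # Feedback control of software execution: fall back to the simple core.
        return simple_controller(x), False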
18
Admissible States
  • In the operation of a plant, there is a set of
    state constraints representing safety, physical
    device limitations, environmental, and other
    operational requirements.
  • They can be represented as a normalized polytope,
    C^T X ≤ 1, in the N-dimensional state space. We
    must be able to:
  • take control away from a faulty controller before
    the system state becomes inadmissible, and
  • ensure that the future trajectory of the system
    state after the switch stays within the set of
    admissible states.

(Figure: operation constraints and admissible states.)
19
The Error Bounds
  • We cannot use the boundary of the admissible
    states as the switching rule, due to the inertia
    of the physical plant.
  • The recovery region is closed with respect to the
    operation of the simple controller. It is defined
    by a Lyapunov function inside the polytope.
  • The largest such recovery region can be found
    using LMI (linear matrix inequality) techniques.
    (See the sketch after this list.)
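
A small numpy sketch (illustrative only; the matrices and margin
are arbitrary) of the geometry involved: the recovery region is
an ellipsoid {x : x^T P x ≤ 1} given by a Lyapunov matrix P, it
lies inside the admissible polytope {x : c_i^T x ≤ 1} exactly
when c_i^T P^-1 c_i ≤ 1 for every constraint row c_i, and the
switching rule hands control to the simple controller before
x^T P x reaches 1. Finding the largest such P is the LMI problem
mentioned above; the code below only checks a given P.

    import numpy as np

    def ellipsoid_inside_polytope(P, C):
        # The ellipsoid {x : x' P x <= 1} fits inside {x : C[i]' x <= 1}
        # iff c' P^-1 c <= 1 for every row c of C (the max of c'x over
        # the ellipsoid is sqrt(c' P^-1 c)).
        P_inv = np.linalg.inv(P)
        return all(c @ P_inv @ c <= 1.0 for c in C)

    def should_switch(x, P, margin=0.9):
        # Switch to the simple controller before x leaves the recovery region.
        return x @ P @ x >= margin

    P = np.array([[4.0, 0.0],                # recovery region:
                  [0.0, 9.0]])               #   4*x1^2 + 9*x2^2 <= 1
    C = np.array([[1.0, 0.0],                # constraint x1 <= 1
                  [0.0, 1.0]])               # constraint x2 <= 1
    print(ellipsoid_inside_polytope(P, C))   # True (1/4 <= 1 and 1/9 <= 1)
    print(should_switch(np.array([0.1, 0.1]), P))   # 0.13 < 0.9  -> False
    print(should_switch(np.array([0.5, 0.1]), P))   # 1.09 >= 0.9 -> True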

20
System Development Process
  • The high-assurance control subsystem:
  • Application level: well-understood classical
    controllers
  • System software level: high-assurance OS kernels,
    such as a certifiable Ada runtime
  • Hardware level: well-established, simple
    fault-tolerant hardware configurations, such as
    pair-pair or TMR (triple modular redundancy)
  • High-assurance development and maintenance
    process, e.g., FAA DO-178B
  • Requirements management: requirements here are
    limited to critical properties
  • The high-performance control subsystem:
  • Application level: advanced control technologies
  • System software level: COTS real-time operating
    systems and middleware
  • Hardware level: standard industrial hardware,
    e.g., VME
  • Standard industrial software development process
  • Requirements management: features and performance
    are handled here
  • System evolution support, e.g., online-replaceable
    components

21
Semiconductor Wafer Process State Control

(Figure: wafer process state control. Labels include
deposition rate, refractive index, Si-H/Ni-H bonds,
uniformity, DC bias, mass 60 (disilane), mass 76
(triaminosilane), SiH4, RF power, and pressure.)
22
DoD Applications
Software fault tolerance is particularly useful for
cases in which some new functionality is available
that has been only partially tested but that might
help to achieve the success of a mission. By
providing protection from faults, Simplex enables
such functionality to be applied on a mission.

Joint Strike Fighter (JSF): the JSF mission software
architecture builds on the architectural principles
developed under the INSERT project.
http://www.sei.cmu.edu/pub/documents/99.reports/pdf/news-sei-fall-1999.pdf

The Space and Naval Warfare Systems Command (SPAWAR)
has initiated a process to transition SIMPLEX
technology. The technology will be transitioned to
the Surface Combatant for the 21st Century (SC21),
the Next Generation Carrier (CV(X)), and other Navy
systems. SIMPLEX includes a software architecture,
real-time middleware services, and supporting tools
to allow the safe insertion of new technology or the
upgrading of existing technology in high-assurance
real-time systems. It permits the new technology to
operate until an error condition (system, timing, or
semantic error) occurs, at which time the system
rolls back to the baseline technology.
http://www.rl.af.mil/tech/programs/edcs/Accomplishments.html
23
Summary
  • We should never trust complex software that is
    beyond our means to verify.
  • Untrusted complex software is useful, provided
    that when it malfunctions its adverse impact on
    system behavior is observable and bounded by
    design.
  • We need a simple and reliable core to provide
    minimal essential services and to constrain the
    impact of malfunctioning software, so that faults
    do not turn into failures.

"After 30 seconds of a planned 90-second flight in a
missile test in the '70s, the clock was not properly
reset. The missile blew up. Some twenty-five years
later, AT&T experienced a massive network failure
caused by a similar problem in the fault recovery
subsystem they were upgrading. In both cases, the
system failed because there were no limits placed on
the results the software could produce. There were no
boundary conditions set. Designers built with a point
solution in mind and without bounding the domain of
software execution. Testers were rushed to meet
schedules and the planned fault recovery mechanisms
did not work." --- Larry Bernstein
24
Software Fault Model
  • Timing fault: misses its deadlines
  • Capability abuse:
  • corrupting others' code or data
  • unauthorized acquisition of process/resource
    management capabilities
  • Semantic fault: incorrect results that can lead
    to
  • poor control performance
  • instability in the plant

25
Recent Extensions: Secured Reliable Upgrades
  • Code/data access attacks: compiler-based
    protection
  • Algorithmic attacks: algorithm-based protection
  • Resource depletion attacks: OS-based protection
26
Telelab
  • www-drii.cs.uiuc.edu/download