Backward and forward looking at dependable and secure computing PowerPoint PPT Presentation


Title: Backward and forward looking at dependable and secure computing


1
Backward and forward looking at dependable and
secure computing
  • Yinghua Min
  • Fellow of IEEE
  • Institute of Computing Technology,
  • Chinese Academy of Sciences, Beijing, China
  • At PRDC09, 2009/11/16

2
Outline
  • Historical review of dependable computing
  • FTCS
  • DSN
  • IFIP WG10.4
  • PRDC
  • New challenges of dependable and secure computing
  • Old techniques facing new environments
  • Concentrated on practical problems, rather than
    conceptual games

3
FTCS
  • Established in 1970
  • FTC for critical applications
  • Aviation
  • Spaceflight
  • Railway transportation
  • A highly academic symposium

4
Dependable computing
  • People understood that our area needed some
    extension.
  • A. Avizienis and Jean-Claude Laprie proposed the
    concept of Dependable Computing at FTCS-15 in
    1985.
  • Human beings were then included as part of the system.
  • Malicious faults
  • FTCS and DCCA merged into DSN in 2000

5
DSN
  • Since 2000
  • DSN has pioneered the fusion between security and
    dependability.
  • Understanding the need to simultaneously fight
    against cyber attacks, accidental faults, design
    errors, and unexpected operating conditions.

6
PRDC
  • 1989: Joint Symposium on Fault-Tolerant
    Computing, Chongqing, China, July 18-20, 1989
  • 1991: Pacific Rim International Symposium on
    Fault-Tolerant Systems, Kawasaki, Japan
  • 1999: Pacific Rim International Symposium on
    Dependable Computing, Hong Kong, China
  • Keynote: Computer Crime in Hong Kong (Mr.
    Anthony Fung)
  • From the HK police department
  • Computer Crime and Internet Fraud
  • Its evidence for litigation support

7
Trusted Computing
  • Trusted Computing Platform Alliance (TCPA) in
    1999
  • TCG since 2003
  • TPM → TCM (Trusted Cryptography Module), 2008
  • Trusted root → security chip → trusted BIOS →
    trusted OS → trusted systems
  • Basically for PCs in the area of secure computing

8
IEEE Transactions on Dependable and Secure
Computing
  • Since 2004
  • Separate dependable computing from secure
    computing

9
System dependability
  • The system dependability situation has been
    getting worse rather than improving in recent
    years. Quoting the AMSD Roadmap, the
    availability that was typically achievable by
    (wired) telecommunication services, and computer
    systems in the 1990s was 99.999 percent to 99.9
    percent. Now cellular phone services, and
    web-based services, typically achieve an
    availability of only 99 per cent to 90 per cent
    (AMSD Roadmap 2003, p. 31).
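
To make the quoted availability figures concrete, the sketch below converts an availability percentage into expected downtime per year. It assumes a 365-day year and treats availability as a simple fraction of uptime; the function name is illustrative and does not come from the AMSD Roadmap.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a non-leap year


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Expected unavailable minutes per year at the given availability."""
    return (1.0 - availability_percent / 100.0) * MINUTES_PER_YEAR


# The four availability levels quoted on the slide.
for pct in (99.999, 99.9, 99.0, 90.0):
    minutes = downtime_minutes_per_year(pct)
    print(f"{pct}% available: about {minutes:,.1f} minutes of downtime per year")
```

Under these assumptions, 99.999 percent corresponds to roughly five minutes of downtime per year, while 90 percent corresponds to more than a month.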

AMSD: the European Commission's Accompanying Measure on
System Dependability
10
New challenges
  • Three key requirements for computers
  • High performance
  • Low power
  • Dependability
  • Nano-ICs, more vulnerable
  • to transient (or soft) errors
  • to permanent malfunctions due to materials aging
    or wearout mechanisms.
  • Nano-scale IC reliability
  • Counterfeit ICs
  • Dependability and security in cloud computing
  • Signal integrity
  • Dependable software needs evidence.

11
Nano-scale IC reliability
  • The SIA's "International Technology Roadmap for
    Semiconductors" estimates that by 2019 the
    feature size of process technology will reach
    7 nm, but only 10% to 20% of chips will be
    defect-free.
  • Power densities to skyrocket and on-chip
    temperatures to increase
  • Small delay defects, adjacent line coupling,
    crosstalk and process variation induced
    unreliability
  • Variability-tolerant design
  • Appropriate measures must be taken, such as fault
    tolerance, redundancy, repair, and
    reconfiguration.

12
Counterfeit Electronic Components
  • These are incidents that jeopardize the
    performance and reliability of electronics.

13
Baofeng.com incident in China
  • Network outages in Jiangsu, Anhui, Guangxi,
    Henan, Gansu, and Zhejiang in China, May 19, 2009
  • The network failure was triggered by a domain name
    system (DNS) failure for Baofeng.com, the website
    of the Chinese music player provider.
  • The failure then caused a surge in DNS server
    queries and degraded the processing performance
    of the network.
  • The servers of DNSPod were attacked by a
    malicious virus.
  • Was the incident caused by a software fault or an
    attack? Maybe both.

14
Bohrbugs and Mandelbugs
  • Bohrbugs
  • An unusual software bug that consistently makes
    its presence known under conditions that are
    either well-defined, possibly unknown or both.
  • Mandelbugs
  • A bug whose behavior doesn't appear malicious,
    but has such a high level of complexity that it
    appears when errors are accumulated for some
    time.
  • Bohrbugs behaving like Mandelbugs
  • Becoming an attack

15
Dependability in the Cloud
  • On April 26, 2008, Amazon's Elastic Compute Cloud
    (EC2) had an outage
  • due to a single customer applying a very large
    set of unusual firewall rules
  • triggering a performance degradation bug in
    Amazon's distributed firewall.
  • Availability and privacy are serious challenges
    for applications hosted on cloud infrastructure.

16
Challenges on cloud infrastructure
  • Cloud applications increase risk levels
  • Sharing of cloud resources by entities that
    engage in a wide range of behaviors and employ
    best practices to varying degrees
  • An environment with a few large cloud
    infrastructure providers
  • increases the risk of common mode outages
    affecting a large number of applications
  • provides highly visible targets for attackers.
  • Multiple administrative domains between the
    application and infrastructure operators reduce
    end-to-end system visibility and error
    propagation information, making problem
    detection and diagnosis very difficult.
  • A cloud provider's economies of scale allow
    levels of investment in redundancy and
    dependability that smaller operators may not be
    able to afford.

17
Old FTC techniques facing new environments
  • Checkpointing
  • Redundancy
  • Software fault-tolerance in middleware
  • ECC in mass storage systems
  • Fault detection and diagnosis in virtual machines
  • Assessment of dependability and security

18
Checkpointing for supercomputers
  • Periodic checkpointing → cooperative
    checkpointing
  • At runtime, the application requests a
    checkpoint.
  • The system grants or denies the checkpoint (to
    skip some of them)
  • based on various system-wide heuristics,
    including disk or network usage and reliability
    information.
  • Using cooperative checkpointing in one instance
  • reduced bounded slowdown by a factor of nine,
  • improved system utilization, and lost no more
    work to failures than periodic checkpointing
  • even when event prediction had a 90% false
    negative rate.
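
The request/grant protocol described on this slide can be sketched as a small "gatekeeper" function. This is a minimal illustration, not the scheme evaluated in the cited results: the `SystemState` fields, the thresholds, and the `max_skips` bound are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class SystemState:
    io_utilization: float      # 0.0-1.0, current disk/network load (assumed metric)
    failure_risk: float        # 0.0-1.0, predicted chance of a failure soon (assumed)
    skipped_in_a_row: int = 0  # consecutive checkpoint requests denied so far


def grant_checkpoint(state: SystemState,
                     io_limit: float = 0.8,
                     risk_threshold: float = 0.1,
                     max_skips: int = 3) -> bool:
    """Decide whether to grant an application's checkpoint request."""
    if state.failure_risk >= risk_threshold:
        grant = True    # a failure looks likely: save work now
    elif state.skipped_in_a_row >= max_skips:
        grant = True    # bound the amount of unsaved work
    elif state.io_utilization > io_limit:
        grant = False   # I/O is saturated and risk is low: skip this one
    else:
        grant = True
    state.skipped_in_a_row = 0 if grant else state.skipped_in_a_row + 1
    return grant
```

The application still places checkpoint requests where saving state is cheap; the system only decides which of those requests are worth the I/O cost at that moment.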

19
Checkpointing at micro-operation level
[Figure: processor-state timeline marking the committed states, the point where a violation occurs, and the point where it is detected]
  • Sliding window based on sensor delay
  • Delayed commit: completed results are held in
    buffers until verified to be correct
  • Noise-speculative
  • Noise-verified
  • Rollback to a previous noise-verified state when
    a violation is detected
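
The delayed-commit and rollback idea can be modelled in software as follows. This is only a behavioural sketch under stated assumptions: states are plain values, the sliding window is a fixed number of steps standing in for the sensor delay, and the class and method names are hypothetical rather than taken from the presented design.

```python
from collections import deque


class DelayedCommitBuffer:
    """Hold speculative results until they outlive the sensor-delay window."""

    def __init__(self, initial_state, window: int):
        self.committed = initial_state  # last noise-verified state
        self.pending = deque()          # noise-speculative results, oldest first
        self.window = window            # steps before a result counts as verified

    def execute(self, new_state) -> None:
        """Buffer a completed result; commit results older than the window."""
        self.pending.append(new_state)
        while len(self.pending) > self.window:
            # No violation was reported in time, so the oldest result is verified.
            self.committed = self.pending.popleft()

    def violation_detected(self):
        """Discard all speculative results and return the rollback state."""
        self.pending.clear()
        return self.committed
```

For example, with `window=2`, three executed results leave the first one committed and the last two speculative; a detected violation then rolls back to that first result.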

20
Redundancy
  • At the application level and at a hardware level.
  • Byzantine fault tolerance
  • Algorithms that are robust to arbitrary types of
    failures in distributed systems.
  • They do not require any centralized control that
    is guaranteed to always work correctly.
  • Data integrity
  • Redundancy in different places
  • RAID (redundant array of independent disks), a
    fault-tolerant storage device that uses data
    redundancy.
  • Synchronization is a big challenge.
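
As a small illustration of redundancy at the application level, the sketch below takes the outputs of several replicas and accepts a value only when a strict majority agrees. This is plain majority voting, not a Byzantine agreement protocol: it assumes a trusted voter and ignores message passing and synchronization. The function name is illustrative.

```python
from collections import Counter


def majority_vote(replica_outputs):
    """Return the value reported by a strict majority of replicas, or None."""
    if not replica_outputs:
        return None
    value, count = Counter(replica_outputs).most_common(1)[0]
    # Require more than half of the replicas to agree.
    return value if count > len(replica_outputs) // 2 else None
```

With three replicas this masks one faulty output, which is the usual triple modular redundancy arrangement.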

21
Software fault-tolerance in middleware
  • Optimal fault tolerance strategy for both
    stateless and stateful Web services
  • Retry
  • Recovery block
  • N-version programming
  • Network characteristics
  • Freedom
  • Dynamic
  • Multi-tier service
  • Debugging performance problems of multi-tier
    services composed of black boxes.
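
Two of the strategies listed on this slide, retry and recovery block, can be sketched in a few lines. This is a generic illustration under simple assumptions (failures surface as exceptions, alternates are zero-argument callables); it is not the optimal-strategy selection the slide refers to, and all names are hypothetical.

```python
def retry(operation, attempts: int = 3):
    """Retry: invoke the same operation again after a failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return operation()
        except Exception as exc:  # broad on purpose for this sketch
            last_error = exc
    raise last_error


def recovery_block(alternates, acceptance_test):
    """Recovery block: try alternates in order until one passes the test."""
    for alternate in alternates:
        try:
            result = alternate()
        except Exception:
            continue  # treat a crash like a failed acceptance test
        if acceptance_test(result):
            return result
    raise RuntimeError("all alternates failed the acceptance test")
```

Retry suits transient failures of a single service, while a recovery block needs independently written alternates and a trustworthy acceptance test.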

22
Soft errors
  • Soft errors involve changes to data
  • Cosmic rays creating energetic neutrons and
    protons
  • The importance of soft errors increases as chip
    technology advances.
  • chip-level soft error
  • the radioactive atoms in the chip's material
    decay and release alpha particles into the chip.
  • Built-in Soft Error Resilience (BISER) Cell
  • system-level soft error
  • the data being processed is hit with a noise
    phenomenon

23
Transient Faults
  • Program replication
  • N-version programming
  • Time-redundancy techniques
  • Virtual duplex systems
  • Tandem NonStop Cyclone is a custom system
    designed to use process replicas for transaction
    processing workloads.
  • Transient Fault Tolerance for Multi-core
    Architectures
  • Redundancy at the process level
  • Ensuring correct hardware execution or ensuring
    correct software execution
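
Time redundancy can be illustrated by running the same computation twice and accepting the result only when both runs agree. The sketch assumes the computation is deterministic and side-effect free, so that any disagreement points to a transient fault; the function name and the `max_rounds` limit are assumptions for the example.

```python
def run_with_time_redundancy(computation, *args, max_rounds: int = 3):
    """Run `computation` twice per round; accept once two runs agree."""
    for _ in range(max_rounds):
        first = computation(*args)
        second = computation(*args)
        if first == second:
            return first
    raise RuntimeError("runs kept disagreeing; the fault may not be transient")
```

Comparing two runs detects a transient error but cannot say which run was wrong, which is why the sketch re-executes both instead of picking one.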

24
Assessment of dependability and security
  • The original definition of dependability is the
    ability to deliver service that can justifiably
    be trusted.
  • Justification
  • Evaluation
  • Benchmarking
  • Standardization
  • There is a dependability and security gap that is
    often perceived by users as a lack of
    trustworthiness in computer applications, and
    that is in fact undermining the network and
    service infrastructures that constitute the very
    core of the knowledge-based society.

25
Difficulties for assessment
  • The assessment of dependability in a standard and
    comparable way, considering all of the following:
  • Component failures
  • Software bugs
  • Human mistakes
  • Interaction mistakes
  • Malicious attacks
  • The quality of measurements
  • The assessment of dependability in component
    based, dynamic and adaptive systems and networks
  • The integration with the development process

26
Denial of service (DoS)
  • Effects of DoS attacks are experienced by users
    as a severe slowdown, service quality
    degradation, or service disruption.
  • We need accurate, quantitative, and versatile DoS
    impact metrics regardless of the underlying
    mechanism for service denial, attack dynamics,
    legitimate traffic mix, or network topology.
  • Measuring DoS through selected legitimate traffic
    parameters
  • packet loss,
  • traffic throughput or goodput,
  • request/response delay,
  • transaction duration, and
  • allocation of resources.
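
Two of the legitimate-traffic parameters listed above, packet loss and goodput, can be computed from simple counters. The sketch below is one possible formulation, assuming measurements are taken over a fixed interval for legitimate traffic only; the `TrafficSample` fields and the degradation ratio are illustrative, not a standardized DoS impact metric.

```python
from dataclasses import dataclass


@dataclass
class TrafficSample:
    packets_sent: int
    packets_received: int
    payload_bytes_delivered: int  # application bytes, excluding retransmissions
    duration_s: float             # length of the measurement interval in seconds


def packet_loss_rate(sample: TrafficSample) -> float:
    """Fraction of legitimate packets that never arrived."""
    if sample.packets_sent == 0:
        return 0.0
    return 1.0 - sample.packets_received / sample.packets_sent


def goodput_bps(sample: TrafficSample) -> float:
    """Useful application data delivered, in bits per second."""
    return 8 * sample.payload_bytes_delivered / sample.duration_s


def goodput_degradation(baseline: TrafficSample, attacked: TrafficSample) -> float:
    """Fraction of baseline goodput lost during the attack interval."""
    base = goodput_bps(baseline)
    return 0.0 if base == 0 else 1.0 - goodput_bps(attacked) / base
```

Comparing an attack interval against a baseline interval keeps the measure independent of how the denial of service was produced.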

27
Conceptual games
28
Concluding remarks
  • Dependable computing is an enduring topic for
    information technology
  • Dependability is as important as high
    performance and low power.
  • New challenges are coming with the advance of IT
  • The gap between academia and industry
  • Concentrate on practical problems, rather than
    conceptual games

29
  • Thank you for your attention!