1. Backward and forward looking at dependable and secure computing
- Yinghua Min
- Fellow of the IEEE
- Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- At PRDC 2009, 2009/11/16
2. Outline
- Historical review of dependable computing
- FTCS
- DSN
- IFIP WG10.4
- PRDC
- New challenges of dependable and secure computing
- Old techniques facing new environments
- Concentrate on practical problems rather than conceptual games
3. FTCS
- Established in 1970
- Fault-tolerant computing (FTC) for critical applications
- Aviation
- Spaceflight
- Railway transportation
- A highly academic symposium
4. Dependable computing
- People understood that our area needed some extension.
- A. Avizienis and Jean-Claude Laprie proposed the concept of Dependable Computing at FTCS-15 in 1985.
- Human beings were then included in systems.
- Malicious faults
- FTCS
- DCCA (Dependable Computing for Critical Applications)
- FTCS + DCCA → DSN in 2000
5. DSN
- Since 2000
- DSN has pioneered the fusion between security and dependability,
- understanding the need to simultaneously fight against cyber attacks, accidental faults, design errors, and unexpected operating conditions.
6PRDC
- 1989 Joint Symposium on Fault--Tolerant
Computing, Chongqing, China, July 18-20, 1989 - 1991 Pacific Rim international symposium on FTS,
Kawasaki, Japan - 1999 Pacific Rim international symposium on
Dependable Computing, Hong Kong, China. - Keynote Computer Crime in Hong Kong (Mr.
Anthony Fung) - From the HK police department
- Computer Crime and Internet Fraud
- Its evidence for litigation support
7. Trusted Computing
- Trusted Computing Platform Alliance (TCPA) in 1999
- TCG (Trusted Computing Group) since 2003
- TPM → TCM (Trusted Cryptography Module) in 2008
- Trusted root → security chip → trusted BIOS → trusted OS → trusted systems
- Basically for PCs in the area of secure computing
8. IEEE Transactions on Dependable and Secure Computing
- Since 2004
- Separate dependable computing from secure computing
9. System dependability
- The system dependability situation has been getting worse rather than improving in recent years. Quoting the AMSD Roadmap: the availability typically achieved by (wired) telecommunication services and computer systems in the 1990s was 99.999 to 99.9 percent, whereas cellular phone services and web-based services now typically achieve an availability of only 99 to 90 percent (AMSD Roadmap 2003, p. 31). (A conversion of these levels into annual downtime is sketched below.)
- AMSD: the European Commission's Accompanying Measure on System Dependability
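To make these availability figures concrete, here is a back-of-envelope sketch (plain Python, not from the AMSD Roadmap) converting each level into expected downtime per year:

    # Convert an availability level into expected downtime per year.
    MINUTES_PER_YEAR = 365 * 24 * 60

    for availability in (0.99999, 0.999, 0.99, 0.90):
        downtime_min = (1 - availability) * MINUTES_PER_YEAR
        print(f"{availability:.3%} available -> "
              f"{downtime_min:,.0f} minutes of downtime per year")

Five nines allows roughly five minutes of downtime a year; 90 percent availability allows more than five weeks.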
10. New challenges
- Three key requirements for computers:
- High performance
- Low power
- Dependability
- Nano-ICs are more vulnerable
- to transient (or soft) errors
- to permanent malfunctions due to material aging or wearout mechanisms
- Nano-scale IC reliability
- Counterfeit ICs
- Dependability and security in cloud computing
- Signal integrity
- Dependable software needs evidence.
11. Nano-scale IC reliability
- The SIA's "International Technology Roadmap for Semiconductors" estimates that by 2019 the feature size of process technology will reach 7 nm, but only between 10 and 20 percent of chips will be defect-free.
- Power densities will skyrocket and on-chip temperatures will increase.
- Small-delay defects, adjacent-line coupling, crosstalk, and process-variation-induced unreliability
- Variability-tolerant design
- Appropriate measures must be taken, such as fault tolerance, redundancy, repair, and reconfiguration (a minimal redundancy sketch follows this list).
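As one illustration of the redundancy measures listed above, here is a minimal sketch of triple modular redundancy (TMR) with majority voting; the function names are mine, not from any cited design:

    from collections import Counter

    def tmr_vote(replica_outputs):
        # Majority-vote over three replicated module outputs (TMR).
        # A single faulty replica is masked; the absence of a majority
        # means more than one replica has failed.
        value, votes = Counter(replica_outputs).most_common(1)[0]
        if votes < 2:
            raise RuntimeError("no majority: more than one replica failed")
        return value

    # One replica hit by a transient fault; the voter masks it.
    print(tmr_vote([42, 42, 17]))  # -> 42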
12. Counterfeit Electronic Components
- Incidents involving counterfeit components jeopardize the performance and reliability of electronic systems.
13. Baofeng.com incident in China
- Network outages in Jiangsu, Anhui, Guangxi, Henan, Gansu, and Zhejiang, China, May 19, 2009
- The network failure was triggered by the domain name system (DNS) failure of Baofeng.com, the website of the Chinese media-player provider.
- The failure then caused a surge of DNS server visits and a decrease in the processing performance of the network.
- The servers of DNSPod had been attacked by a malicious virus.
- Was the incident caused by a software fault or an attack? Perhaps both.
14. Bohrbugs and Mandelbugs
- Bohrbugs
- A software bug that consistently manifests under conditions that are well-defined, though possibly unknown.
- Mandelbugs
- A bug whose causes are so complex that its behavior appears chaotic or non-deterministic; for example, it may manifest only after errors have accumulated for some time (a toy contrast of the two bug types follows this list).
- Bohrbugs behaving like Mandelbugs
- Becoming an attack
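The following toy sketch (illustrative only, not from the talk) contrasts the two: the Bohrbug fails on every run with the triggering input, while the Mandelbug is a data race whose manifestation depends on thread timing:

    import threading

    # Bohrbug: fails consistently whenever the triggering condition holds.
    def average(values):
        return sum(values) / len(values)   # crashes on every empty list

    # Mandelbug: a data race; whether updates are lost depends on
    # thread scheduling, so failures appear chaotic and hard to reproduce.
    counter = 0

    def increment_many(n):
        global counter
        for _ in range(n):
            counter += 1                   # read-modify-write is not atomic

    threads = [threading.Thread(target=increment_many, args=(100_000,))
               for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counter)  # may be less than 400000, and not reproducibly so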
15. Dependability in the Cloud
- On April 26, 2008, Amazon's Elastic Compute Cloud (EC2) had an outage
- due to a single customer applying a very large set of unusual firewall rules,
- triggering a performance-degradation bug in Amazon's distributed firewall.
- Availability and privacy are serious challenges for applications hosted on cloud infrastructure.
16. Challenges on cloud infrastructure
- Cloud applications increase risk levels.
- Cloud resources are shared by entities that engage in a wide range of behaviors and employ best practices to varying degrees.
- An environment with a few large cloud infrastructure providers
- increases the risk of common-mode outages affecting a large number of applications, and
- provides highly visible targets for attackers.
- Multiple administrative domains between the application and infrastructure operators reduce end-to-end system visibility and error-propagation information, making problem detection and diagnosis very difficult.
- A cloud provider's economies of scale allow levels of investment in redundancy and dependability that smaller operators may not afford.
17. Old FTC techniques facing new environments
- Checkpointing
- Redundancy
- Software fault-tolerance in middleware
- ECC in mass storage systems
- Fault detection and diagnosis in virtual machines
- Assessment of dependability and security
18. Checkpointing for supercomputers
- Periodic checkpointing → cooperative checkpointing
- At runtime, the application requests a checkpoint.
- The system grants or denies the checkpoint (skipping some of them)
- based on various system-wide heuristics, including disk or network usage and reliability information (a sketch of such a grant/deny policy follows this list).
- In one instance, cooperative checkpointing
- reduced bounded slowdown by a factor of nine,
- improved system utilization, and lost no more work to failures than periodic checkpointing,
- even when event prediction had a 90 percent false-negative rate.
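A minimal sketch of the grant/deny decision, with hypothetical inputs and thresholds (the actual heuristics in the cited work differ):

    def grant_checkpoint(io_load, predicted_failure_prob, work_since_last_ckpt,
                         io_threshold=0.8, risk_threshold=0.05):
        # Decide whether to grant an application's checkpoint request.
        # Skip the checkpoint when the I/O system is saturated and the
        # reliability forecast says a failure is unlikely soon; grant it
        # when enough unsaved work is at risk. All thresholds here are
        # illustrative, not taken from the cited study.
        if predicted_failure_prob >= risk_threshold:
            return True                      # failure likely: save now
        if io_load >= io_threshold and work_since_last_ckpt < 3600:
            return False                     # disks busy, little work at risk
        return True

    # The application requests a checkpoint; the system grants or denies it.
    print(grant_checkpoint(io_load=0.9, predicted_failure_prob=0.01,
                           work_since_last_ckpt=600))   # -> False (skipped)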
19. Checkpointing at the micro-operation level
[Figure: processor-state timeline showing committed states, a violation occurring, and the violation being detected]
- Sliding window based on sensor delay
- Delayed commit: completed results are buffered until verified to be correct
- Noise-speculative
- Noise-verified
- Roll back to a previous noise-verified state when a violation is detected (a toy model follows this list).
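A toy software model of the delayed-commit scheme, under my own naming; real designs implement this with hardware buffers:

    class DelayedCommitCore:
        # Toy model of delayed commit with rollback (names illustrative).
        # Completed results stay noise-speculative in a buffer for a
        # window matching the sensor delay; they commit only once
        # verified, and a detected violation rolls the core back to the
        # last noise-verified state.
        def __init__(self, window):
            self.window = window      # sliding window (sensor-delay cycles)
            self.committed = []       # noise-verified, architecturally visible
            self.buffer = []          # noise-speculative results

        def execute(self, result):
            self.buffer.append(result)
            if len(self.buffer) > self.window:
                # Oldest result has outlived the sensor delay: commit it.
                self.committed.append(self.buffer.pop(0))

        def violation_detected(self):
            # Discard speculative results; resume from the verified state.
            self.buffer.clear()

    core = DelayedCommitCore(window=2)
    for r in ["r1", "r2", "r3"]:
        core.execute(r)
    core.violation_detected()         # noise violation within the window
    print(core.committed)             # -> ['r1']; r2 and r3 are re-executed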
20. Redundancy
- At the application level and at the hardware level
- Byzantine fault tolerance
- Algorithms that are robust to arbitrary types of failures in distributed systems
- They do not require any centralized control that has some guarantee of always working correctly.
- Data integrity
- Redundancy in different places
- RAID (redundant array of independent disks): fault-tolerant storage that uses data redundancy (an XOR-parity sketch follows this list).
- Synchronization is a big challenge.
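As a concrete taste of data redundancy, here is a minimal sketch of the XOR parity used by RAID levels such as RAID 5: any single lost block can be rebuilt from the survivors and the parity block:

    def xor_blocks(*blocks):
        # Byte-wise XOR of equal-length data blocks.
        result = bytearray(len(blocks[0]))
        for block in blocks:
            for i, b in enumerate(block):
                result[i] ^= b
        return bytes(result)

    data = [b"disk0data", b"disk1data", b"disk2data"]
    parity = xor_blocks(*data)            # stored on the parity disk

    # Disk 1 fails: rebuild its block from the survivors and the parity.
    rebuilt = xor_blocks(data[0], data[2], parity)
    assert rebuilt == data[1]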
21. Software fault-tolerance in middleware
- Optimal fault-tolerance strategies for both stateless and stateful Web services (retry and recovery-block sketches follow this list):
- Retry
- Recovery block
- N-version programming
- Network characteristics
- Freedom
- Dynamic
- Multi-tier services
- Debugging performance problems of multi-tier services composed of black boxes
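Minimal sketches of two of these strategies, with hypothetical callables standing in for real Web-service invocations:

    import time

    def with_retry(invoke, attempts=3, delay=1.0):
        # Retry: re-invoke a (typically stateless) service on failure.
        for attempt in range(attempts):
            try:
                return invoke()
            except Exception:
                if attempt == attempts - 1:
                    raise
                time.sleep(delay)

    def recovery_block(variants, acceptance_test):
        # Recovery block: try alternative implementations in order until
        # one produces a result that passes the acceptance test.
        for variant in variants:
            try:
                result = variant()
                if acceptance_test(result):
                    return result
            except Exception:
                continue
        raise RuntimeError("all variants failed the acceptance test")

Retry suits transient network faults; recovery blocks and N-version programming also tolerate design faults, at the cost of developing multiple variants.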
22. Soft errors
- Soft errors involve changes to data.
- Cosmic rays create energetic neutrons and protons.
- The importance of soft errors increases as chip technology advances.
- Chip-level soft errors
- Radioactive atoms in the chip's material decay and release alpha particles into the chip.
- Built-in Soft Error Resilience (BISER) cell
- System-level soft errors
- The data being processed is hit by a noise phenomenon (an error-correction sketch follows this list).
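BISER itself is a circuit-level latch design; as a software-visible analogue of soft-error protection, here is a Hamming(7,4) single-error-correcting code that locates and repairs one flipped bit (an illustration of the general idea, not of BISER):

    def hamming74_encode(d):
        # Encode 4 data bits into a 7-bit Hamming codeword (positions 1..7,
        # parity bits at positions 1, 2, and 4).
        d1, d2, d3, d4 = d
        p1 = d1 ^ d2 ^ d4
        p2 = d1 ^ d3 ^ d4
        p3 = d2 ^ d3 ^ d4
        return [p1, p2, d1, p3, d2, d3, d4]

    def hamming74_correct(c):
        # Locate and flip a single erroneous bit, then return the data bits.
        c = list(c)
        s1 = c[0] ^ c[2] ^ c[4] ^ c[6]    # checks positions 1, 3, 5, 7
        s2 = c[1] ^ c[2] ^ c[5] ^ c[6]    # checks positions 2, 3, 6, 7
        s3 = c[3] ^ c[4] ^ c[5] ^ c[6]    # checks positions 4, 5, 6, 7
        syndrome = s1 + 2 * s2 + 4 * s3   # 0 = no error, else bit position
        if syndrome:
            c[syndrome - 1] ^= 1
        return [c[2], c[4], c[5], c[6]]

    word = hamming74_encode([1, 0, 1, 1])
    word[4] ^= 1                          # a particle strike flips one bit
    assert hamming74_correct(word) == [1, 0, 1, 1]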
23. Transient Faults
- Program replication
- N-version programming
- Time-redundant techniques
- Virtual duplex systems
- The Tandem NonStop Cyclone is a custom system designed to use process replicas for transaction-processing workloads.
- Transient fault tolerance for multi-core architectures
- Redundancy at the process level
- Ensuring correct hardware execution or ensuring correct software execution (a time-redundancy sketch follows this list)
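A minimal sketch of time redundancy, assuming a deterministic computation: run it twice and accept the result only when the two runs agree, since a disagreement signals a transient fault:

    def time_redundant(compute, inputs, max_rounds=3):
        # Run the computation twice and compare outputs; a mismatch
        # indicates a transient fault, so re-execute until two runs agree.
        for _ in range(max_rounds):
            first = compute(inputs)
            second = compute(inputs)
            if first == second:
                return first
        raise RuntimeError("persistent disagreement: suspect a permanent fault")

    print(time_redundant(lambda xs: sum(xs), [1, 2, 3]))  # -> 6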
24. Assessment of dependability and security
- The original definition of dependability is the ability to deliver service that can justifiably be trusted.
- Justification
- Evaluation
- Benchmarking
- Standardization
- A dependability and security gap is often perceived by users as a lack of trustworthiness in computer applications; it is in fact undermining the network and service infrastructures that constitute the very core of the knowledge-based society.
25. Difficulties for assessment
- Assessing dependability in a standard and comparable way, considering all of:
- Component failures
- Software bugs
- Human mistakes
- Interaction mistakes
- Malicious attacks
- The quality of measurements
- Assessing dependability in component-based, dynamic, and adaptive systems and networks
- Integration with the development process
26Denial of service (DoS)
- Effects of DoS attacks are experienced by users
as a severe slowdown, service quality
degradation, or service disruption. - We need accurate, quantitative, and versatile DoS
impact metrics regardless of the underlying
mechanism for service denial, attack dynamics,
legitimate traffic mix, or network topology. - Measuring DoS through selected legitimate traffic
parameters - packet loss,
- traffic throughput or goodput,
- request/response delay,
- transaction duration, and
- allocation of resources.
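A minimal sketch of two such metrics, packet loss and goodput, computed from hypothetical per-flow counters over a measurement interval:

    def dos_impact(sent_pkts, delivered_pkts, useful_bytes, interval_s):
        # Legitimate-traffic metrics over one measurement interval.
        loss_rate = 1 - delivered_pkts / sent_pkts if sent_pkts else 0.0
        goodput_bps = useful_bytes * 8 / interval_s
        return {"packet_loss": loss_rate, "goodput_bps": goodput_bps}

    # Baseline vs. under-attack measurements for one legitimate flow.
    print(dos_impact(sent_pkts=10_000, delivered_pkts=9_950,
                     useful_bytes=12_000_000, interval_s=10))   # baseline
    print(dos_impact(sent_pkts=10_000, delivered_pkts=6_200,
                     useful_bytes=2_500_000, interval_s=10))    # under attack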
27. Conceptual games
28. Concluding remarks
- Dependable computing is a perennial topic for information technology.
- Dependability is as important as high performance and low power.
- New challenges are coming with the advance of IT.
- The gap between academia and industry
- Concentrate on practical problems rather than conceptual games.
29. Thank you for your attention!