Resilient Real-Time Cyber-Physical Systems - Josef De Vaughn Allen

Provided by: uscertGo
Learn more at: http://www.us-cert.gov

Transcript and Presenter's Notes

1
Resilient Real-Time Cyber-Physical Systems
  • Josef De Vaughn Allen, PhD
  • INFOSEC Professional

2
Agenda
  • Purpose
  • The Problem
  • ORNL Value Add
  • Resilient Cyber-Physical System
  • What is a SCADA
  • Current Limitations
  • ORNL's Approach
  • Risks
  • Payoff
  • Team
  • References

3
Purpose
  • ORNL is investing significant IR&D in Resilient
    Infrastructure Systems
  • We want guidance on whether our findings and
    approach are headed in a direction that will add
    value to the United States and abroad
  • Target areas
  • Smart Grid
  • Water
  • Oil & Gas
  • Should there be others?
  • About me
  • Leading the effort at ORNL to gather requirements
    and lay out a plan of attack for Resilient
    Systems
  • Full Faculty Member at Florida State University
    (Computer Science)
  • NSA INFOSEC Professional
  • Primary Systems Installed were TS/SCI Classified
    Systems
  • Intelligence Community/DoD/Five Eyes
  • Large Scale Systems (Eight Years Experience)
  • System Security Architect for several large scale
    systems
  • Served in the Marine Corps in Desert Shield/Desert
    Storm

We want to provide a relevant solution that the
customers (YOU) can use
4
The Problem
  • Munitions-grade malware is going beyond just
    affecting computing and network resources
  • It compromises physical systems (e.g., Stuxnet)
  • We must be able to create cyber-secure interfaces
  • Core Issues
  • Lots of attention being paid to intrusion(s)
    going undetected
  • Attention is NOT being paid to payload
  • Payload was targeting physical devices, not
    computing/network resources
  • Smart, or not, most physical devices are too
    trusting
  • Physical Devices must be cyber-aware
  • Firmware and embedded software for remote sensor
    devices cannot be trusted
  • No known federal mandate exists for
    country-of-origin quality assurance
  • Firmware needs to be certified: affirmed to be
    correct, true, or genuine
  • Computing/network devices and physical devices
    cannot communicate a compromise to each other, so
    a cyber-physical layer needs to be developed
  • Reality
  • We will get attacked and penetrated (i.e., hacked)
  • We must be able to move forward after being
    attacked (i.e., resilience)

How do we respond after getting punched in the
face?
5
ORNL Value Add
Vision Statement: Provide end-to-end solutions for
cyber-secure resilient systems using cutting-edge,
reference-based COTS HW/SW with open-source
standards, while not being cost-prohibitive
6
Resilient Cyber-Physical Systems
Resilient Cyber-Physical Systems refers to the
tight conjoining of, and coordination between,
computational and physical resources, and to how
adaptable the system remains in the midst of
adversity
7
Resilient Cyber-Physical Systems (CPS)
CPS
Dynamic Computing Systems
Dynamic Smart Sensors
8
Supervisory Control And Data Acquisition (SCADA)
system
  • A SCADA control system consists of three elements
  • Master system at a control center (Computing)
  • Communications system (Network, phone lines)
  • Multiple remote monitoring and control devices
    (Sensor, RTUs, protection relays, meters)

Paraphrased from IEEE Tutorial course on
fundamentals of supervisory systems. Technical
Report 91 EH0337-6 PWR
9
Current Limitations
  • Models and Work toward Generic Resilient
    computing/network systems
  • 1999-2003: DoD/IC
  • Organically Assured and Survivable Information
    Systems (OASIS)
  • OASIS Program Manager: Jaynarayan Lala, DARPA/ITO
  • 2003-2006 (Europe)
  • Malicious- and Accidental-Fault Tolerance for
    Internet Applications (MAFTIA)
  • 2009
  • DHS: A Roadmap for Cyber Security Research
  • Trusted systems have not adequately evolved.
  • Can we leverage lessons learned?
  • Yes

No framework and no implementation exist for a
resilient cyber-physical system
10
Current Limitations: Today's Use
  • Married to the Machine Architecture
  • Controllers are not fully optimized
  • Registry
  • Shared memory
  • Shared libraries
  • Not agnostic to the CPU instruction set
  • Static planning and reactive planning
  • Hot swaps
  • OS imaging
  • Hashing
  • Non-adaptive configuration management
  • Who cares?
  • Users and maintainers of mission-critical systems

11
ORNL Approach
  • Inline continuity of operations (ICOOP) versus
    COOP
  • No matter the attack or disruption, there is a
    plan to complete the mission at hand
  • Need in-place fail-over nodes, not remote
    back-up
  • Real-Time Trusted Platform Module/Base Aware
    Sensors/Controllers
  • We cannot build up for resilience; we must build
    it in from within
  • Must be cost realistic
  • Adaptive planning algorithms for redirecting
    existing resources (i.e., grid computing)
  • Leverage mobile phone embedded implementation
    architectures to create cyber-aware
    hardware/software for mission-critical systems
  • Dynamic Virtualization with Program
    Differentiation

Leverage ORNL Distributed Energy Communications
and Control to create a mission-critical ICOOP
framework and implementation model
12
ORNL Approach
  • We will discuss resiliency for the three main
    components of a control system
  • Dynamic Computing Systems (Computing)
  • Dynamic Virtualization for Health Status of
    System
  • Dynamic Program Differentiation
  • Wireless Power System Fingerprinting
  • Communications system (Network, Phone lines)
  • Dynamic White-Listing (Harris Corporation)
  • Multiple remote monitoring and control devices
  • Dynamic Smart (Sensors)
  • Trusted Computing/TPM Aware Sensors
  • Elliptic Curve Cryptography Aware for Real-Time
  • 155-bit ECC uses 11,000 transistors while a
    512-bit RSA implementation uses 50,000.
  • Penetration Defense
  • Command Validation
  • Input Validation
  • Disturbance Rejection
  • Mathematical (Framework)
  • Mixture Model via GPU and/or Secure Cloud
    Computing
  • Model the system to get a snapshot of SCADA/PMU
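The "Command Validation" and "Input Validation" items above can be illustrated with a small Python sketch. It is only one plausible reading of those bullets, not the ORNL design: the function name, the range limits, and the maximum step size are hypothetical values chosen for illustration.

```python
def validate_command(value, last_value, low, high, max_step):
    """Return (ok, reason) for a proposed setpoint command.

    All limits are hypothetical; a real device would take them from
    its engineering configuration.
    """
    # Range check: reject values outside the physical operating band.
    # A NaN fails both comparisons and is rejected here as well.
    if not (low <= value <= high):
        return False, "out of range"
    # Rate check: reject jumps larger than the process can follow.
    if abs(value - last_value) > max_step:
        return False, "step too large"
    return True, "ok"


# Example: a generator setpoint in MW, limited to 0-100 MW and 5 MW per step.
print(validate_command(52.0, 50.0, 0.0, 100.0, 5.0))
print(validate_command(90.0, 50.0, 0.0, 100.0, 5.0))
```

A check like this sits on the device side, so a command that arrives over a compromised channel is still rejected if it is physically implausible.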

13
Risk
  • Tighter Coupling of resources with the mission
  • Complexity of process scheduling may increase
  • Antiquated resources will need to be taken
    offline and retrofitted

14
Payoff
  • Create a dynamic architecture based on adaptive
    planning
  • No single point of failure
  • If a system is compromised, there is a best
    path to finish existing/current mission
  • Based on mission focus
  • Maximization of open standard COTS/SW/HW/OS in a
    directed mission
  • Creating an ontology/taxonomy that leads to novel
    non-linear scheduling algorithms on computing
    nodes
  • Define the critical components for resilience
  • Reduces carbon footprint
  • Vision for Securing Control Systems in the Energy
    Sector
  • "In 10 years, control systems for critical
    applications will be designed, installed,
    operated, and maintained to survive an
    intentional cyber assault with no loss of
    critical function."
  • Roadmap to Secure Control Systems in the Energy
    Sector, DOE/DHS, January 2006

15
Team
  • Oak Ridge National Laboratory
  • Josef D. Allen
  • Aleksandar Dimitrovski
  • Robert Gillen
  • Shaun Gleason
  • Dilip Reddy
  • Isabelle Snyder
  • Bogdan Vacaliuc
  • Phillip Vallance
  • Richard Wallace
  • Florida State University
  • David Whalley
  • Xiuwen Liu
  • Gary Tyson
  • Michael Steurer
  • Karl Schoder
  • GE Research
  • Arthur "Chip" Cotton
  • Harris Corporation
  • Travis Berrier
  • University of Tennessee
  • Yilu Liu

A cross-discipline team is essential for success!
16
References and Related Work
  • References
  • D. Chang, S. Hines, P. West, G. Tyson, and D.
    Whalley, "Program Differentiation," Journal of
    Circuits, Systems, and Computers, accepted
    March 2011.
  • X. Liu, A. Srivastava, and D. L. Wang,
    "Intrinsic generalization analysis of low
    dimensional representations," Neural Networks,
    vol. 16, no. 5/6, pp. 537-545, 2003.
  • S. C. Zhu and X. Liu, "Learning in Gibbsian
    fields: How accurate and how fast can it be?"
    IEEE Transactions on Pattern Analysis and Machine
    Intelligence, vol. 24, no. 7, pp. 1001-1006,
    2002.
  • F. Wang, F. Gong, C. Sargor, K.
    Goseva-Popstojanova, K. S. Trivedi, and F. Jou,
    "SITAR: A Scalable Intrusion-Tolerant
    Architecture for Distributed Services," 2nd
    Annual IEEE Systems, Man, and Cybernetics
    Information Assurance Workshop, West Point, New
    York, June 2001.
  • João Filipe Ferreira, Jorge Lobo, and Jorge Dias,
    "Bayesian real-time perception algorithms on
    GPU," Journal of Real-Time Image Processing, 2010.
  • D. Powell and R. Stroud, "Conceptual model and
    architecture of MAFTIA," MAFTIA Deliverable D21,
    2003.
  • N. F. Neves and P. Verissimo, "Complete
    Specifications of APIs and Protocols for the
    MAFTIA middleware," MAFTIA Deliverable D9,
    2002.
  • M. Dacier (editor), "Design of an
    Intrusion-Tolerant Intrusion Detection System,"
    MAFTIA Deliverable D10, 2002.
  • M. Castro and B. Liskov, "Practical Byzantine
    Fault Tolerance and Proactive Recovery," ACM
    Transactions on Computer Systems, vol. 20, no. 4,
    pp. 398-461, 2002.
  • J. Levy, H. Saidi, and T. Uribe, "Combining
    monitors for runtime system verification,"
    Electronic Notes in Theoretical Computer Science,
    vol. 70, no. 4, 2002.
  • S. Berger, R. Cáceres, K. A. Goldman, R. Perez,
    R. Sailer, and L. van Doorn, "vTPM: Virtualizing
    the Trusted Platform Module," USENIX-SS'06:
    Proceedings of the 15th USENIX Security
    Symposium, USENIX Association, 2006, p. 21.
  • A. Sadeghi, M. Scheibel, C. Stüble, and M. Wolf,
    "Play it once again, Sam: Enforcing Stateful
    Licenses on Open Platforms," Second Workshop on
    Advances in Trusted Computing (WATC '06 Fall),
    2006.
  • Y. Xue, "Some Viewpoints and Experiences on Wide
    Area Measurement Systems and Wide Area Control
    Systems," IEEE journal, 2008.
  • C. Aguayo Gonzalez and J. Reed, "Detecting
    Unauthorized Software Execution in SDR Using
    Power Fingerprinting," MILCOM 2010.
  • T. Messerges, E. Dabbish, and R. Sloan, "Examining
    Smart-Card Security under the Threat of Power
    Analysis Attacks," IEEE Transactions on
    Computers, vol. 51, no. 4, April 2002.
  • Related Work
  • Recently there have been several notable
    efforts toward intrusion-tolerant systems
  • DARPA OASIS (Organically Assured and Survivable
    Information Systems,
    http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=8932)

17
Guidance
18
Guidance
  • We want to make sure that our direction makes
    sense.
  • Please give feedback!
  • Presenter: Josef D. Allen
  • Email: allenjd_at_ornl.gov

19
Thank You
20
Our Solution
21
SCADA for Power System
22
Generators Control Strategies
[Diagram: several generators (G), each with a local
control (speed governor), under one area control
(AGC system)]
Local control (speed governors): responds to
frequency or load changes at the generator
output; fast response (10s)
Area control (AGC system): continuously monitors
frequency and tie-line flows, and changes the output
of the participating generators to bring both
frequency and interchange back to schedule; slow
response (10 minutes)
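As a rough illustration of the area-control idea described above, the following Python sketch computes an area control error (ACE) from the tie-line and frequency deviations and splits one correction step among the participating generators. The ACE form used here (tie-line error minus 10 times the bias times the frequency error, with a negative bias in MW per 0.1 Hz) is the commonly used convention rather than something stated on the slide; the bias value, the gain, the participation factors, and the function names are all assumptions.

```python
def area_control_error(tie_actual_mw, tie_sched_mw, freq_actual_hz,
                       freq_sched_hz=60.0, bias_mw_per_tenth_hz=-50.0):
    """Return ACE in MW; a negative value means the area is under-generating."""
    tie_error = tie_actual_mw - tie_sched_mw
    freq_error = freq_actual_hz - freq_sched_hz
    # The factor 10 converts the bias from MW/0.1 Hz to MW/Hz.
    return tie_error - 10.0 * bias_mw_per_tenth_hz * freq_error


def agc_adjustments(ace_mw, participation, gain=0.1):
    """Split one control step among generators.

    participation maps generator name -> share (shares should sum to 1).
    gain is a hypothetical per-step integral gain.
    """
    total_change_mw = -gain * ace_mw
    return {name: total_change_mw * share
            for name, share in participation.items()}


# Interchange on schedule, frequency 0.02 Hz low: ACE is about -10 MW,
# so the participating units are asked to raise output.
ace = area_control_error(100.0, 100.0, 59.98)
print(round(ace, 3), agc_adjustments(ace, {"G1": 0.6, "G2": 0.4}))
```

The small gain reflects the slow (minutes) time scale of area control compared with the local speed governors.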
23
Large Frequency Deviation Impact
  • Electrical islands: a local power system is cut
    off from outside power sources due to tripping
    of ties.
  • Frequency decays at between 0.5 Hz and 4 Hz per
    second
  • At around 59.3 Hz, under-frequency relays shed
    load to prevent complete shutdown
  • If enough load cannot be shed, generating units
    will trip (automatically or manually), leading
    to complete shutdown
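The numbers above imply a very short decision window. The Python sketch below works it out, assuming the island starts at a nominal 60 Hz and the frequency decays at a constant rate (both simplifying assumptions; the function name is illustrative).

```python
def seconds_until_shedding(decay_hz_per_s, nominal_hz=60.0, shed_hz=59.3):
    """Time for frequency to fall from nominal to the load-shedding threshold,
    assuming a constant decay rate."""
    if decay_hz_per_s <= 0:
        raise ValueError("decay rate must be positive")
    return (nominal_hz - shed_hz) / decay_hz_per_s


for rate in (0.5, 4.0):
    print(f"{rate} Hz/s -> {seconds_until_shedding(rate):.3f} s")
# 0.5 Hz/s -> 1.400 s
# 4.0 Hz/s -> 0.175 s
```

Under these assumptions, any cyber-physical response has between roughly 0.2 and 1.4 seconds before under-frequency relays begin to act.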

24
Synchronous Generator Control Loops: Prime Mover
and Exciter Control
25
Network Protection
26
Isolate Corrupted Devices in Network
  • Monitor for network pre-intrusion from generation
    to substation
  • Change current reactive security to proactive,
    anticipatory security
  • Robust adaptable security without perimeter
    reconfiguration limitations
  • Low cost; power scavenging makes installation easy

Figure legend: 1. Hydroelectric dam; 2. Generator;
3. Step-up transformer; 4. Grid high-voltage
transmission lines; 5. Terminal station;
6. Subtransmission lines; 7. How it is used by the
customer; 8. Distribution substation
27
Protecting Device Communication Management
  • One-way communication path with high-assurance
    firewall/VPN
  • Can stop propagation of malware
  • Containment function for digital traffic with
    dynamic whitelisting

[Diagram labels: Switch, Monitored Comm., Invalid Comm.]
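A minimal Python sketch of the dynamic whitelisting idea on this slide follows. It is not the Harris Corporation implementation, which the slides do not describe; the class, the (source, destination, function) rule shape, and the device names are hypothetical.

```python
class DynamicWhitelist:
    """Permit only explicitly allowed (source, destination, function) flows."""

    def __init__(self):
        self._allowed = set()

    def allow(self, src, dst, function):
        """Add a rule at run time."""
        self._allowed.add((src, dst, function))

    def revoke_device(self, device):
        """Drop every rule that names the device as source or destination."""
        self._allowed = {rule for rule in self._allowed
                         if device not in rule[:2]}

    def permits(self, src, dst, function):
        """Default deny: traffic passes only if a matching rule exists."""
        return (src, dst, function) in self._allowed


wl = DynamicWhitelist()
wl.allow("master", "rtu1", "read")
print(wl.permits("master", "rtu1", "read"))    # True
print(wl.permits("master", "rtu1", "write"))   # False
wl.revoke_device("rtu1")                       # isolate a suspect device
print(wl.permits("master", "rtu1", "read"))    # False
```

Revoking a device at run time is what gives the containment behavior: traffic to and from a suspect node stops without reconfiguring the rest of the network.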
28
Server Architecture Resiliency
29
Resilient Controls Under Duress
  • Resilient systems under duress rely on the
    following
  • The computing devices used for control
    systems run a constrained set of software (i.e.,
    they are not general-purpose machines)
  • The devices are generally available commodity
    hardware
  • The devices accept input data, make a
    decision, and respond appropriately.
  • They are not individually responsible for
    trending, persistence, etc., but may (and likely
    do) send such data to an external system for
    historical or other analyses
  • The computing environment is limited in
    power, space, and available options for
    individual system redundancy
  • The systems can operate on virtualized hardware
    (standard type-1 virtualization approaches)

30
VM Health Management
Dom0 monitors VM1-VMn based on the rules defined
for each system.
[Diagram labels: VM1, VM2, Health Monitor, System1-a,
System2-a, Original System Image Cache, Hypervisor]
31
VM Health Management
The monitors for VM1 trigger a risk condition and
apply the configured response (mitigation)
[Diagram labels: VM1, VM2, Health Monitor, System1-a,
System2-a, Original System Image Cache, Hypervisor]
32
VM Health Management
The original version of System1 is pulled from
the cache, and a uniquely obfuscated variant is
produced (System1-b)
[Diagram labels: VM1, VM2, Health Monitor, System1-b,
System1-a, System2-a, Original System Image Cache,
Hypervisor]
33
VM Health Management
VM3 is brought online and System1-b is deployed.
[Diagram labels: VM1, VM2, VM3, Health Monitor,
System1-a, System2-a, Original System Image Cache,
System1-b, Hypervisor]
34
VM Health Management
Once VM3 is fully online, I/O ports from VM1 are
migrated to VM3
[Diagram labels: VM1, VM2, VM3, Health Monitor,
System1-a, System2-a, Original System Image Cache,
System1-b, Hypervisor]
35
VM Health Management
VM1 is archived for later forensic review and
shut down
[Diagram labels: VM2, VM3, Health Monitor, System2-a,
Original System Image Cache, System1-b, Hypervisor]
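The VM Health Management sequence on the preceding slides (detect, rebuild a fresh variant from the cached original, bring the new VM online, migrate I/O, archive the old VM) can be sketched in Python as below. This is a toy model, not a hypervisor API: the `Host` class and its methods are hypothetical, and hashing the image with a seed merely stands in for the "uniquely obfuscated variant", which the slides do not specify.

```python
import hashlib


def make_variant(original: bytes, seed: int) -> str:
    """Stand-in for obfuscation: derive a distinct variant id per seed."""
    return hashlib.sha256(original + seed.to_bytes(4, "big")).hexdigest()[:12]


class Host:
    def __init__(self, image_cache):
        self.image_cache = dict(image_cache)  # system -> original image bytes
        self.vms = {}        # vm name -> (system, variant id)
        self.io_owner = {}   # system -> vm name currently holding its I/O
        self.archived = []   # (vm, system, variant) kept for forensic review
        self._next_seed = 0

    def deploy(self, vm, system):
        """Build a fresh variant from the cached original and start a VM."""
        self._next_seed += 1
        variant = make_variant(self.image_cache[system], self._next_seed)
        self.vms[vm] = (system, variant)
        self.io_owner.setdefault(system, vm)  # first VM for a system gets I/O
        return variant

    def replace_compromised(self, bad_vm, new_vm):
        """Run the mitigation once the health monitor flags bad_vm."""
        system, old_variant = self.vms[bad_vm]
        new_variant = self.deploy(new_vm, system)  # new VM comes online
        self.io_owner[system] = new_vm             # then I/O is migrated
        self.archived.append((bad_vm, system, old_variant))
        del self.vms[bad_vm]                       # archived, then shut down
        return new_variant


host = Host({"System1": b"image-1", "System2": b"image-2"})
host.deploy("VM1", "System1")
host.deploy("VM2", "System2")
host.replace_compromised("VM1", "VM3")
print(sorted(host.vms), host.io_owner["System1"])  # ['VM2', 'VM3'] VM3
```

The ordering in `replace_compromised` mirrors the slides: the I/O ports move only after the replacement is deployed, so the controlled process is never left without a controller.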
36
Digital Fingerprinting
37
Overcoming Embedded Code
  • Dynamic power consumption in a digital processor
    is caused by transient currents and by the
    charging and discharging of load capacitance that
    occur during bit transitions.
  • Key Comments
  • Transitions depend on specific/unique
  • Instruction sets
  • Memory addresses
  • Inter-instruction transitions
  • Bottom Line
  • Execution of a specific routine yields a unique
    power consumption signature
  • All manufactured hardware is unique
  • Even if it is from the same assembly line!
  • PMU/GridEye can allow us to obtain power
    signatures of desired devices directly or
    wirelessly
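One simple way to compare a measured power trace against a stored signature is a correlation test, sketched below in Python. This is only an illustration of the "unique power consumption signature" idea; it is not the fingerprinting method of the cited power-fingerprinting work, and the traces, the threshold, and the function names are made up.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sample sequences."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat trace carries no shape information
    return cov / math.sqrt(var_a * var_b)


def matches_signature(trace, reference, threshold=0.9):
    """True if the trace has the same shape as the reference signature."""
    return pearson(trace, reference) >= threshold


reference = [1.0, 3.0, 2.0, 5.0, 4.0]      # stored signature (made-up data)
same_code = [2.1, 6.0, 4.1, 10.0, 7.9]     # same shape, different gain
other_code = [5.0, 1.0, 4.0, 2.0, 3.0]     # different instruction sequence
print(matches_signature(same_code, reference))   # True
print(matches_signature(other_code, reference))  # False
```

Correlation ignores gain and offset, which is useful when the same routine is measured through different sensors, but a practical detector would also need alignment and noise handling.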

38
Dynamic Smart Sensors
39
Trusted Secure Control Devices
  • Control system devices will integrate a secure
    element (TPM, smartcard, USIM)
  • Required for new use cases
  • Near field communication (NFC)
  • Contactless applications are executed on a secure
    element
  • Enables mobile payment, ticketing, smart
    posters, DRM
  • Secure transactions
  • Secure browsing
  • Research needed on secure element (SE)
    integration
  • Research needed on how secure channels to the SE
    are established
  • Secure element (TPM) that supports trust
    establishment is required
  • Trust decision based on integrity stored in
    secure element
  • TPM was designed for current (insecure) operating
    system environments
  • Access to TPM over virtualization boundary not
    directly possible
  • Possible solutions
  • Virtualization of TPM (insecure)
  • Only one instance gets access to TPM (inflexible,
    trust statements incomplete)
  • Better approach
  • Provide vT-enhanced TPM(s)
  • Next specification of TCG (TPM.Next)

40
Mathematical Framework
41
System Model
  • We will develop efficient and effective
    statistical models of process behaviors
  • We will use local windows of system call
    profiles, port scanning activities, resource
    accessing local patterns, controller outputs, and
    inferred underlying controller state parameters
    as feature vectors
  • Each group of related processes will be modeled
    as a mixture model to reflect the different
    operating states. As the system is very complex,
    the key is to find efficient and effective
    statistical models that allow accurate, real-time
    inference
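The "local windows ... as feature vectors" step can be sketched as follows in Python, using system-call names as the example event stream. The window length, step, vocabulary, and function name are illustrative assumptions; the slides do not specify how the windows are built.

```python
from collections import Counter


def window_features(events, vocabulary, window=4, step=2):
    """Return one normalized count vector per sliding window of events."""
    vectors = []
    for start in range(0, len(events) - window + 1, step):
        counts = Counter(events[start:start + window])
        # Fraction of the window occupied by each event type in the vocabulary.
        vectors.append([counts[name] / window for name in vocabulary])
    return vectors


calls = ["read", "write", "read", "open", "read", "read", "write", "close"]
vocab = ["open", "read", "write", "close"]
for vec in window_features(calls, vocab):
    print(vec)
# [0.25, 0.5, 0.25, 0.0]
# [0.25, 0.75, 0.0, 0.0]
# [0.0, 0.5, 0.25, 0.25]
```

Each vector is a fixed-length summary of recent behavior, which is the form a statistical model such as a mixture model needs as input.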

42
Mixture Models
  • We use a unified mixture model framework to
    model different processes in the system
  • Here x is a vector of observed variables, θ
    consists of all the parameters, and P(ω_j) are the
    priors for the different types of underlying
    processes
  • For example, for a generator, θ consists of the
    physics-based model parameters (such as the
    frequency of the generated electricity); here the
    estimated probability models will have very small
    variation due to stringent requirements
  • For cyber processes, θ consists of model
    parameters in representations derived from all
    monitored measurements
  • To learn effective representations and
    thereby enable real-time inference, we use
    optimal component analysis learning
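The slide's formula did not survive in this transcript. The standard mixture density, which matches the symbols described above, is p(x | θ) = Σ_j p(x | ω_j, θ_j) P(ω_j); treating that as the intended form is an assumption. The Python sketch below evaluates it for a one-dimensional Gaussian mixture and returns the posterior probability of each underlying process. The two components (a tightly regulated "normal" generator frequency and a broader "disturbed" state) and their parameters are invented for illustration.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_posteriors(x, components):
    """components is a list of (prior, mean, std) tuples.

    Returns (p(x), [P(component j | x) for each j]) by Bayes' rule.
    """
    joint = [prior * gaussian_pdf(x, mean, std)
             for prior, mean, std in components]
    evidence = sum(joint)
    if evidence == 0.0:
        raise ValueError("observation has zero density under every component")
    return evidence, [j / evidence for j in joint]


# Hypothetical states of a generator frequency measurement (Hz).
states = [(0.95, 60.0, 0.02),   # normal: very small variation
          (0.05, 59.7, 0.30)]   # disturbed: broader spread
for reading in (60.01, 59.5):
    _, post = mixture_posteriors(reading, states)
    print(reading, [round(p, 4) for p in post])
```

The narrow "normal" component reflects the remark above that physics-based models have very small variation, so even modest deviations shift the posterior strongly toward the disturbed state.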

43
Suitability of real-time inference for GPU
  • The inference can be decomposed into
    computational components that are highly
    parallelizable
  • As we have a unified framework, the differences
    among different types of processes are in the
    data, and therefore lead naturally to single
    instruction multiple data (SIMD) parallelization
  • The high cost arises also due to the large data
    sets involved.
  • SIMD features of GPUs provide a means of dealing
    with the scalability of highly parallelizable
    algorithms operating on large data structures.
  • GPUs provide massive parallelism and high speed
    gains at low costs.
  • Source: "Bayesian real-time perception
    algorithms on GPU" by João Filipe Ferreira, Jorge
    Lobo, and Jorge Dias, Journal of Real-Time Image
    Processing (2010)

Inspired by the study of biological systems,
several Bayesian inference algorithms for
artificial perception exist
44
Timing
  • Global protection (SCADA) operates on the order
    of seconds
  • (2-4 s)
  • Local protection operates on the order of power
    system cycles
  • (1/f = 1/60 Hz = 16.67 ms)
  • Instantaneous protection (physical) operates on
    the order of milliseconds
  • 2-6 cycles
  • Time-delayed back-up and system-wide protection
    (COOP) operates on the order of hundreds of
    milliseconds
  • 20-30 cycles
  • Any additional level of back-up adds to the
    time-delayed protection
  • Additional 20-30 cycles
  • PMU data: 30 samples/second (typically)
  • Can go to 240 Hz, 8000 Hz, or higher
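The cycle counts above translate to wall-clock time as follows; this short Python sketch assumes a 60 Hz system, and the function name is illustrative.

```python
def cycles_to_ms(cycles, frequency_hz=60.0):
    """Convert a number of power-system cycles to milliseconds."""
    return cycles * 1000.0 / frequency_hz


for label, low, high in (("instantaneous", 2, 6), ("time-delayed", 20, 30)):
    print(f"{label}: {cycles_to_ms(low):.1f}-{cycles_to_ms(high):.1f} ms")
# instantaneous: 33.3-100.0 ms
# time-delayed: 333.3-500.0 ms
```

At a typical PMU rate of 30 samples per second, one sample spans two cycles (about 33.3 ms), so instantaneous protection acts within only one to three PMU samples.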

General reference for times: Electric Power
Engineering Handbook, 2nd ed., L. Grigsby, editor,
CRC Press, 2006.
45
Timing
  • Heisenberg uncertainty principle
  • Cannot both measure the present position and
    determine the future momentum
  • Direction
  • Leverage all information in a unified way to
    provide an actionable decision (Cyber-Physical)
  • Must continue the mission while under duress
  • We can NOT overcome Physics
  • (Nonlinear Dynamic Decision Making)
  • We propose a unified resilient manifold framework
    to model interactions among all the components in
    different states (both steady states and
    transient states)

46
Timing
  • Combine RTU/SCADA with PMU/WAMS
  • Real-Time Cognitive Trajectory Based Data Mining
  • Suggest PMUs use wavelets instead of FFT to allow
    for faster decisions
  • Online Dynamic State Estimation (Tracking)
  • Suggests Quantum Path Planning
  • Leverage work done in DoD Tracking (GMTI)
  • Developing a mathematical and computational
    framework for detecting and classifying weak,
    distributed anomalous behavior in computer
    networks
  • Online optimization for decision making
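One reason wavelets can support faster decisions than an FFT is that wavelet coefficients are localized in time, so a sudden change shows up at a specific sample position after a single linear-time pass. The Python sketch below uses one level of the Haar transform to locate a step in a made-up signal; it illustrates the general property only and is not the PMU processing the slides propose.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    pairs = range(0, len(signal), 2)
    approx = [(signal[i] + signal[i + 1]) / s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) / s for i in pairs]
    return approx, detail


def largest_detail_index(signal):
    """Sample index of the pair with the largest first-level detail."""
    _, detail = haar_step(signal)
    k = max(range(len(detail)), key=lambda i: abs(detail[i]))
    return 2 * k


samples = [1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 3.0, 3.0]  # step between samples 4 and 5
print(largest_detail_index(samples))  # 4
```

A step that falls exactly on a pair boundary produces zero detail at this level and appears only at coarser levels, which is why practical detectors use several levels or overlapping windows.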

47
Comment
  • 2008: Nanjing Automation Research Institute
    (NARI), China
  • Vision obtained from Western Systems Coordinating
    Council
  • NASPInet
  • China is implementing real-time system dynamics
    of the power grid via non-linear estimation of
    real/static state estimations
  • Yusheng Xue: Chief Engineer of NARI since 1993,
    People's Republic of China