Title: Intrusion Tolerant Distributed Systems
Algorithms and Architectures
Software Systems Research Seminar March 21, 2003
- Angelo Corsaro, Venkita Subramonian
- DOC Group, Washington University
Security State of the Art
- Most secure systems nowadays are built by trying to prevent attacks
- Several techniques and tools have been developed to make systems more secure, detect system weaknesses, and protect systems
  - New programming languages
  - Software tools such as code analyzers, system profilers, etc.
  - New hardware/software components
  - etc.
- Yet system security keeps being compromised!
- Today's pervasive interconnectivity introduces more challenges for security
- The lesson learned in securing systems is that this brute-force approach does not work
- Experience has led to the key observation that it isn't practical/feasible to build 100% secure systems
Classical Secure Distributed Systems
- Classical secure distributed systems are based on the assumption that some part of the system is trusted
- The basic and recurrent idea is that of connecting distributed components together so as to form a global secure infrastructure
- This approach requires large trusted parts on all computers on the network
Kerberos
- One of the most widely used and deployed distributed security systems is Kerberos
- It was designed and implemented at MIT as part of Project Athena
- The core assumptions underlying Kerberos's design are the following:
  - Client workstations are totally under the control of the user, i.e., they can't be trusted
  - Remote services can be accessed only via an authentication service
  - Servers are trusted and are physically protected
  - The servers are under the complete control and responsibility of the administrator
  - The master server is replicated on passive slaves, which can replace the server when it fails
Kerberos
1. Request for a TGS ticket
2. Ticket for TGS
3. Request for server ticket
4. Server ticket
5. Request for service
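The five messages above can be traced in a small Python model. This is only a sketch of the message flow under simplifying assumptions: tickets are JSON payloads tagged with HMAC-SHA256 under the recipient's long-term key (integrity only, no encryption), and session keys, authenticators, timestamps, and the authentication server's check of the user's identity are all omitted. The helper names (`seal`, `unseal`, `authentication_server`, and so on) are hypothetical.

```python
import hashlib
import hmac
import json
import secrets

# Toy stand-in for Kerberos encryption: HMAC-tagged JSON (integrity only, NOT confidential).
def seal(key: bytes, payload: dict) -> dict:
    body = json.dumps(payload, sort_keys=True)
    tag = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}

def unseal(key: bytes, sealed: dict) -> dict:
    expected = hmac.new(key, sealed["body"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, sealed["tag"]):
        raise ValueError("ticket rejected: bad tag")
    return json.loads(sealed["body"])

# Long-term keys held by the trusted Kerberos servers and the application server.
TGS_KEY = secrets.token_bytes(32)
SERVER_KEYS = {"fileserver": secrets.token_bytes(32)}

def authentication_server(user: str) -> dict:
    # Steps 1-2: the client requests a TGS ticket and receives a ticket for the TGS.
    return seal(TGS_KEY, {"user": user, "service": "tgs"})

def ticket_granting_server(tgt: dict, service: str) -> dict:
    # Steps 3-4: the client presents the TGS ticket and receives a server ticket.
    claims = unseal(TGS_KEY, tgt)
    return seal(SERVER_KEYS[service], {"user": claims["user"], "service": service})

def application_server(service: str, ticket: dict) -> str:
    # Step 5: the client requests the service, presenting the server ticket.
    claims = unseal(SERVER_KEYS[service], ticket)
    return f"{claims['user']} granted access to {service}"

tgt = authentication_server("alice")
server_ticket = ticket_granting_server(tgt, "fileserver")
print(application_server("fileserver", server_ticket))  # alice granted access to fileserver
```

Even in this toy, the trusted servers hold every long-term key, which is exactly the concentration of trust criticized on the next slide.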
Kerberos's Security Problems
- The security administrator can misuse his privileges to perform unauthorized actions
- Replicas (Kerberos uses passive replication) can also provide information to intruders if not well protected
- If the Kerberos server fails, the last DB changes are lost
- Nothing is done to prevent covert channels
- There is a single point of failure!
New Security Trends
- Eliminating the flaws that make systems insecure is not feasible (especially for legacy systems)
- Currently adopted solutions for distributed systems security have quite a few problems
- How about building systems that can continue critical operations in the face of attacks?
- Can we build systems that, instead of trying to prevent attacks, can tolerate them?
Architectures for Intrusion Tolerance
Intrusion Tolerance: The Idea
- Intrusion tolerant systems are designed so that they can tolerate a bounded number of misuses
- If one or more intruders bypass the protection mechanisms, and the number of misuses they perform is less than a given threshold, the security properties of the system
  - Confidentiality
  - Integrity
  - Availability
- are always ensured!
- The key observation at the basis of intrusion tolerant systems is that an intrusion can be thought of as a Byzantine fault
Types of Intrusion Tolerance
- Confidentiality: read access to a subset of confidential data gives no information about the data
- Integrity: changing a subset of the data does not change the data perceived by legitimate users
- Availability: changing or deleting a subset of the data, or a server, does not produce a denial of service to legitimate users
- For each property P, a threshold T_P is defined
- Reading, modifying, or destroying a part X of the data or server D such that |X| < T_P does not violate P
[Diagram: |X| < T_P versus intrusion]
Intermezzo
Data Intrusion Tolerance
- Data intrusion-tolerance techniques have existed for a long time
- Confidentiality can be ensured by cryptographic tools such as threshold schemes
  - The data is split into shadows, each shadow being stored on one security site
  - A number of shadows called the threshold is sufficient to rebuild the data
  - This scheme also ensures availability and integrity
- To prevent denial of service, the servers are replicated
- Different sites cannot take decisions independently; they must agree by communicating data and local decisions
- This last point requires replication and agreement
[Diagram: Files A, B, and C distributed across Sites X, Y, and Z]
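The slides refer to threshold schemes without fixing one. As an assumption, the sketch below uses Shamir's (T, N) secret sharing over a prime field: the secret is the constant term of a random polynomial of degree T-1, each shadow is one point on it, and any T points recover the constant term by Lagrange interpolation. The function names are hypothetical. Note that plain Shamir sharing on its own does not detect a corrupted shadow, so integrity needs an extra check such as the modification detection mechanism discussed later.

```python
import random

PRIME = 2**127 - 1  # a Mersenne prime; all arithmetic is done modulo this field size

def make_shadows(secret: int, threshold: int, n: int, rng=random.SystemRandom()):
    """Split `secret` into n shadows; any `threshold` of them reconstruct it."""
    if not 0 <= secret < PRIME or not 1 <= threshold <= n:
        raise ValueError("invalid parameters")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(threshold - 1)]

    def poly(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc

    # Shadow i is the point (i, poly(i)); one shadow per security site.
    return [(x, poly(x)) for x in range(1, n + 1)]

def reconstruct(shadows):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shadows):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shadows):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

shadows = make_shadows(123456789, threshold=3, n=5)
print(reconstruct([shadows[0], shadows[2], shadows[4]]))  # 123456789
```

`pow(den, -1, PRIME)` computes a modular inverse and needs Python 3.8 or later.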
Intrusion Tolerant Security Service
Intrusion Tolerant Security Server
- The goal of an intrusion tolerant distributed security server is to provide a trusted service out of a set of potentially untrusted computers
- This way, the intrusion of one or some of the computers won't compromise the security of the global system
- All the sites that are part of the security service, called security sites, have to provide a series of services:
  - Registration
  - Authentication
  - Sensitive Data Management
  - Audit and Recovery Service
Registration Service
- Registration permits a user to be registered by the system for future access to secured services
- This operation must be carried out independently on each security site, to prevent a single site from using the information to impersonate the user
- The operation is done under the control of the security administrator of each site
Authentication Service
- The role of this service is to verify the claimed identity of a subject
- In a distributed system with several authentication servers, each server must independently authenticate the subject
- Notice that the security sites are untrusted, and one site could fake the authentication information
- An agreement protocol is used to make sure that the user is authenticated if a majority of the servers succeed
- Upon authentication, the server sends the user some session information, such as session id, key, etc.
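A minimal sketch of majority-based authentication, assuming each site stores its own independently salted password verifier (mirroring independent registration) and that the agreement protocol can be reduced to counting the sites' answers. The class and function names are hypothetical, and the `compromised` flag simply models a site that fakes a positive answer.

```python
import hashlib
import hmac
import os

class SecuritySite:
    """One security site holding its own independently registered verifier."""

    def __init__(self, name: str, compromised: bool = False):
        self.name = name
        self.compromised = compromised  # a compromised site may fake its answer
        self.verifiers = {}

    def register(self, user: str, password: str) -> None:
        salt = os.urandom(16)  # per-site salt, so stored verifiers differ across sites
        digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
        self.verifiers[user] = (salt, digest)

    def authenticate(self, user: str, password: str) -> bool:
        if self.compromised:
            return True  # e.g., an intruder tries to let anyone in
        if user not in self.verifiers:
            return False
        salt, digest = self.verifiers[user]
        attempt = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
        return hmac.compare_digest(attempt, digest)

def majority_authenticated(sites, user: str, password: str) -> bool:
    """Accept only if a strict majority of sites independently succeed."""
    successes = sum(site.authenticate(user, password) for site in sites)
    return successes > len(sites) // 2

sites = [SecuritySite("A"), SecuritySite("B"), SecuritySite("C", compromised=True)]
for site in sites:
    site.register("alice", "s3cret")
print(majority_authenticated(sites, "alice", "s3cret"))  # True
print(majority_authenticated(sites, "alice", "wrong"))   # False: only site C says yes
```

The lying site cannot authenticate a wrong password on its own, because it is outvoted by the two honest sites.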
Authorization Service
- The role of the authorization service is to check that access to a secured service by a subject is authorized according to the subject's access rights
- Access rights could be implemented in a UNIX-like manner
- The authorization service is made intrusion tolerant by implementing it on the security servers
- The authorization phases are:
  - The client asks the security server for permission to access a secured service
  - The access rights stored on the security sites allow them to determine whether the client has the proper rights
  - The security sites vote to decide whether the access is authorized
  - If the sites agree to permit access, they send a ticket to the client and another to the server
  - Using the ticket, the client can now open a session with the server
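The authorization phases can be sketched with UNIX-style read/write/execute bits stored per (subject, service) on each site, and a strict-majority vote. The ticket format below (plain dictionaries sharing a random session id) is a hypothetical placeholder; real tickets would be cryptographically protected.

```python
import secrets
from dataclasses import dataclass, field

READ, WRITE, EXECUTE = 4, 2, 1  # UNIX-style permission bits

@dataclass
class SecuritySite:
    rights: dict = field(default_factory=dict)  # (subject, service) -> permission bits

    def vote(self, subject: str, service: str, requested: int) -> bool:
        granted = self.rights.get((subject, service), 0)
        return (granted & requested) == requested  # every requested bit must be granted

def authorize(sites, subject: str, service: str, requested: int):
    """Return (client_ticket, server_ticket) if a strict majority votes yes, else None."""
    yes = sum(site.vote(subject, service, requested) for site in sites)
    if yes <= len(sites) // 2:
        return None
    session = secrets.token_hex(16)  # shared session identifier (hypothetical format)
    client_ticket = {"subject": subject, "service": service, "session": session}
    server_ticket = {"subject": subject, "rights": requested, "session": session}
    return client_ticket, server_ticket

rights = {("alice", "printer"): READ | WRITE}
# The third site has had its access rights erased by an intruder.
sites = [SecuritySite(dict(rights)), SecuritySite(dict(rights)), SecuritySite({})]
print(authorize(sites, "alice", "printer", WRITE) is not None)  # True
print(authorize(sites, "alice", "printer", EXECUTE))            # None
```

The parentheses in `(granted & requested) == requested` matter: in Python, `==` binds more tightly than `&`.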
Sensitive Data Management Service
- The role of this service is to store, manage, and retrieve sensitive information on the security servers
- The data management service must enforce the three main security properties:
  - Confidentiality
  - Integrity
  - Availability
- The integrity property is provided by a modification detection mechanism such as cryptographic signatures
- Replication can be used to ensure availability, while threshold techniques could be used for confidentiality and availability
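A modification detection mechanism can be sketched by storing each data item with an authentication tag and rejecting it on read if the tag no longer matches. As a swapped-in technique, this sketch uses an HMAC-SHA256 tag (symmetric, so the verifier needs the key) rather than a public-key signature, which would let anyone verify without the secret. The helper names are hypothetical.

```python
import hashlib
import hmac

TAG_LEN = 32  # size of an HMAC-SHA256 digest in bytes

def protect(key: bytes, data: bytes) -> bytes:
    """Append an HMAC-SHA256 tag so that later modification can be detected."""
    return data + hmac.new(key, data, hashlib.sha256).digest()

def check(key: bytes, blob: bytes) -> bytes:
    """Return the original data, or raise ValueError if the stored blob was modified."""
    data, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    if not hmac.compare_digest(tag, hmac.new(key, data, hashlib.sha256).digest()):
        raise ValueError("modification detected")
    return data

key = bytes(32)  # demo key only
stored = protect(key, b"salary=100")
print(check(key, stored))  # b'salary=100'
```

Detection is all this provides; restoring the correct value still requires another copy or enough intact shadows.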
Sensitive Data Management Service
- If data is replicated on N sites, then
  - With respect to availability, up to N-1 replicas can be lost
  - With respect to confidentiality, one replica is sufficient to observe the data
- If one data item is shared on N security sites using a threshold of T, then
  - With respect to availability, N-T shadows can be lost
  - With respect to confidentiality, T shadows are necessary and sufficient to observe the data
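The trade-off above is simple arithmetic. The hypothetical helpers below just encode it, which makes it easy to see that full replication behaves like a threshold scheme with T = 1: maximal availability, minimal confidentiality.

```python
def replication(n: int) -> dict:
    """Full copies of the data on n sites."""
    return {"max_lost_for_availability": n - 1, "sites_needed_to_observe": 1}

def threshold_sharing(n: int, t: int) -> dict:
    """(t, n) threshold scheme: n shadows, any t of them rebuild the data."""
    if not 1 <= t <= n:
        raise ValueError("need 1 <= t <= n")
    return {"max_lost_for_availability": n - t, "sites_needed_to_observe": t}

print("replication, N=5:", replication(5))
# replication, N=5: {'max_lost_for_availability': 4, 'sites_needed_to_observe': 1}
print("threshold, N=5, T=3:", threshold_sharing(5, 3))
# threshold, N=5, T=3: {'max_lost_for_availability': 2, 'sites_needed_to_observe': 3}
```

Raising T buys confidentiality (more sites must be breached) at the cost of availability (fewer sites may be lost).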
The Audit and Recovery Service
- The role of this service is to audit the security information sent by the services
- There are two kinds of information:
  - Authorized operations
  - Attempted or successful intrusions or misuses
- Notice that it is not the role of this service to determine what constitutes an intrusion or a misuse
- Analysis of the audit is done offline by security administrators
- The recovery service acts as an error recovery mechanism to correct certain modified data
Voting Algorithms for Intrusion Tolerance
Need for Voting Algorithms
[Diagram: voting is needed for both authentication and authorization]
FT Node Architecture
[Diagram: three clusters (Cluster1, Cluster2, Cluster3), each with processors P1, P2, P3 and a bus controller, connected by a local broadcast medium]
Distributed Voting
- Two phases
  - Local computation
    - Compute results locally and broadcast them
  - Majority reconciliation
    - Determine whether a majority exists
    - Initiate fault diagnostics if necessary
- Distributed algorithm for both phases
- The coordinator commits the majority vote
Phase 2 (1/2)
- Distributed algorithm that runs on every voter
- Receive results from all voters
- If my result is the same as all other results
  - we have a unanimous vote
  - commit vote
- Else, if more than 50% of the results are the same
  - we have a majority
  - if I am the coordinator and my result is NOT the same as the majority result
    - select a new coordinator from among the majority processors
  - commit vote
  - if I am the coordinator
    - initiate fault recovery in minority nodes
- (continued)
Phase 2 (2/2)
- Else
  - we do not have a majority
  - start local diagnostics
  - if my status is okay
    - select a new coordinator from among the okay processors
  - repeat the voting process
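The phase-2 rules can be sketched as a pure function that one voter evaluates after receiving all phase-1 results. Assumptions not stated on the slides: results are hashable values compared exactly, every voter sees the same set of results, and a replacement coordinator is the largest id in the majority (borrowing the rule from the coordinator-election slide). `phase2` and the action tuples are hypothetical.

```python
from collections import Counter

def phase2(my_id, results, coordinator):
    """Phase 2 (majority reconciliation) as seen by one voter.

    results: voter id -> result broadcast in phase 1 (including my own).
    Returns (actions, coordinator), where actions is a list of (action, argument).
    """
    value, count = Counter(results.values()).most_common(1)[0]
    n = len(results)
    if count == n:  # unanimous vote
        return [("commit", value)], coordinator
    if count <= n / 2:  # no majority: run local diagnostics, then vote again
        return [("diagnose_and_revote", None)], coordinator
    majority = sorted(v for v, r in results.items() if r == value)
    if results[coordinator] != value:
        # Every voter sees all results, so each can tell that the coordinator is in the
        # minority and pick the same replacement: the largest id in the majority.
        coordinator = max(majority)
    actions = [("commit", value)]
    if my_id == coordinator:
        minority = sorted(v for v in results if v not in majority)
        actions.append(("recover", minority))  # fault recovery in minority nodes
    return actions, coordinator

results = {1: 7, 2: 7, 3: 9}  # voter 3, the current coordinator, disagrees
print(phase2(2, results, coordinator=3))  # ([('commit', 7), ('recover', [3])], 2)
```

Voters 1 and 3 evaluate the same input to the same new coordinator (2), so only voter 2 starts recovery.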
Choosing a New Coordinator
- The new coordinator is chosen from a processor set
- Candidate processor set
  - could be all processors, when there is no majority
  - or the set of processors belonging to the majority
- Check local node status
- If status is okay
  - broadcast status to the other processors
  - wait until the broadcasts from the other processors arrive
  - if my node has the largest node id among the okay processors
    - I declare myself the new coordinator
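The largest-id rule can be sketched from one node's point of view, assuming reliable broadcast so that every okay node hears the status of every other okay node. Under that assumption exactly one okay node declares itself. The function names are hypothetical.

```python
def declares_itself(my_id, candidates, status_ok):
    """True if my_id should declare itself the new coordinator.

    candidates: the candidate processor set (all processors, or the majority).
    status_ok: processor id -> outcome of that processor's local status check.
    """
    if not status_ok[my_id]:
        return False  # a node that fails its own status check does not broadcast
    heard = {p for p in candidates if status_ok[p]}  # status broadcasts received
    return my_id == max(heard)

def elect(candidates, status_ok):
    """Apply the rule on every candidate; at most one node declares itself."""
    winners = [p for p in candidates if declares_itself(p, candidates, status_ok)]
    return winners[0] if winners else None

print(elect([1, 2, 3, 4], {1: True, 2: True, 3: True, 4: False}))  # 3
```

Node 4 has the largest id but fails its status check, so node 3 takes over.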
Committing a Vote
- The coordinator is responsible for committing the majority vote
- If I am the coordinator
  - broadcast the result to the majority
  - wait for acks from all processors in the majority
- Else
  - wait for the result from the coordinator
  - send an ack to the coordinator
Problems with the 2-Phase Protocol
- What if the coordinator fails right before committing the majority vote?
  - The user (client) will receive a bad result
  - The probability is very low
  - Within acceptable risk parameters
- But transient faults could have an adverse effect on security
  - An attacker could control what result a user sees
  - The majority does not matter any more
Security and Transient Faults
- Transient faults could hamper security
  - Illuminating a single transistor in an IC using a laser
  - Serious threat to smartcard technology
  - Attack invented and perfected by Sergei Skorobogatov, Cambridge University
"Sergei's work will trigger a generation change in smartcard technology. The immediate effect of his work is that many attacks on computer systems that were developed as theoretical possibilities by the research communities in the 1990s have suddenly become practical." (EE Times, May 2002)
A Solution
- Algorithm by Castro and Liskov
[Diagram: a client and three voters; messages labeled 1 to 3]
- Pros
  - Commit is done by all voters as opposed to just one coordinator, hence more secure than the 2-phase algorithm
- Cons
  - Does not scale well, since the client has to wait for f+1 replies
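On the client side, the rule is to accept a result once f+1 distinct replicas report it: with at most f faulty replicas, at least one of those f+1 is correct. (Castro and Liskov's PBFT runs 3f+1 replicas to tolerate f faults.) This is a minimal sketch of that acceptance check only, not of the replica protocol; `accept_result` is a hypothetical name.

```python
from collections import defaultdict

def accept_result(replies, f):
    """Return the first result vouched for by f+1 distinct replicas, else None.

    replies: iterable of (replica_id, result) in arrival order.
    """
    supporters = defaultdict(set)
    for replica, result in replies:
        supporters[result].add(replica)  # a set, so repeated replies from one replica count once
        if len(supporters[result]) >= f + 1:
            return result
    return None  # not enough matching replies yet: keep waiting

# A faulty replica answers twice, but its duplicate is not counted.
print(accept_result([(0, "bad"), (0, "bad"), (1, "ok"), (2, "ok")], f=1))  # ok
```

The client's waiting time grows with f, which is the scalability cost noted under Cons.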
Other Algorithms
- More algorithms in the literature:
  - Reiter, M., "The Rampart Toolkit for Building High-Integrity Services," Theory and Practice in Distributed Systems, Lecture Notes in Computer Science 938, pp. 99-110.
  - Malkhi, D., Reiter, M., "Byzantine Quorum Systems," Proceedings of the 29th ACM Symposium on Theory of Computing, May 1997.
  - Kihlstrom, K., et al., "The SecureRing Protocols for Securing Group Communication," Proceedings of the 31st Hawaii International Conference on System Sciences, Vol. 3, pp. 317-326, Jan. 1998.
  - Deswarte, Y., et al., "Intrusion Tolerance in Distributed Computing Systems," Proceedings of the 1991 IEEE Symposium on Research in Security and Privacy, pp. 110-121, May 1991.
Inexact Voting
- Drawbacks of the previous algorithms
  - They assume state machine replication in all voters
  - Two different non-faulty voters will produce the same result
- Some use cases where this assumption does not hold
  - E.g., sensor values
- Inexact voting
  - Values that fall within a range of tolerance are considered equal
  - Equivalence classes
- Algorithms can be modified to handle inexact voting
  - BUT the performance overhead is large, because multiple inexact comparisons are needed to determine the majority
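A minimal sketch of inexact majority voting for numeric readings, assuming that "equal" means "within a fixed tolerance" of a candidate value and that the reported result is the median of the agreeing class; both choices are assumptions, and `inexact_majority` is hypothetical. Because "within tolerance" is not transitive, a class is built around each candidate, which costs O(n^2) comparisons: the overhead mentioned above.

```python
def inexact_majority(values, tolerance):
    """Return a representative of a strict-majority class of agreeing values, else None."""
    n = len(values)
    for candidate in values:
        # Class of values within `tolerance` of this candidate (O(n) comparisons each).
        agreeing = [v for v in values if abs(v - candidate) <= tolerance]
        if len(agreeing) > n / 2:
            agreeing.sort()
            return agreeing[len(agreeing) // 2]  # median of the agreeing class
    return None  # no value is close to a majority of the readings

readings = [20.1, 20.3, 19.9, 35.0, 20.2]  # one faulty sensor reports 35.0
print(inexact_majority(readings, tolerance=0.5))  # 20.2
```

Members of one class may differ from each other by up to twice the tolerance, since they are only compared against the candidate.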
Proposed Algorithm: Assumptions
- Network with
  - Atomic broadcast capability
  - Bounded message delay
  - Fair sharing of the broadcast medium
- No voter will commit an answer until all voters are ready
  - Enforced using application-dependent thresholds
  - Any commits before this threshold are considered invalid
- A majority of the voters must be fault-free for the system to work reliably
- Each voter can vote only once
  - Enforced by the User Interface module
Proposed Algorithm (1/2)
[Diagram: three voters, the Interface Module, and the Client]
1. Commit, if not committed already
2. Compare with committed result
3. Timer expires, send result to client
Proposed Algorithm (2/2)
[Diagram: three voters, the Interface Module, and the Client]
1. Commit, if not committed already
2. Compare with committed result
3. Dissent, if no match
4. Commit new vote
5. Reset timer expiry
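The Interface Module's behavior can be sketched as an event loop over timestamped messages. The slides leave the exact replacement rule open, so this sketch assumes that every commit or dissent counts as one vote for a value, that the committed value is replaced when another value gathers strictly more votes, and that each replacement resets the timer. Times are integers, and `interface_module` is hypothetical.

```python
from collections import Counter

def interface_module(messages, timeout, ready_at=0):
    """Return (value sent to the client, send time), or None if nothing was committed.

    messages: list of (time, voter_id, value), sorted by time.
    """
    voted = set()
    tally = Counter()
    committed, deadline = None, None
    for time, voter, value in messages:
        if deadline is not None and time >= deadline:
            break  # timer expired: the committed result has been sent
        if time < ready_at or voter in voted:
            continue  # early commits are invalid; each voter votes only once
        voted.add(voter)
        tally[value] += 1
        if committed is None:
            committed, deadline = value, time + timeout  # step 1: first commit starts the timer
        elif value != committed and tally[value] > tally[committed]:
            committed, deadline = value, time + timeout  # steps 3-5: dissent wins, timer reset
    if committed is None:
        return None
    return committed, deadline

# A compromised voter commits first and then tries to vote again.
msgs = [(10, "v1", "bad"), (11, "v1", "bad"), (12, "v2", "good"), (13, "v3", "good")]
print(interface_module(msgs, timeout=10))  # ('good', 23)
```

The module keeps only a per-voter flag and a tally, in line with the claim that it needs very little computation.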
Uniqueness of This Algorithm
- Security increased
  - No specific coordinator node, hence reduced vulnerability
  - Even if the first commit to the User Interface module is compromised, it gets invalidated by dissenting voters
  - Denial of service (vote rigging) eliminated, since a vote from an already committed voter is ignored
- Fault-tolerance properties maintained as before
  - The result is still based on the majority
- Concerns about the User Interface module
  - Single point of failure
  - BUT this module is very simple, with very little computation
  - The User Interface module can be isolated from the voter complex
- Less intensive computation on the client
  - It does not have to reconcile all results from the voters
Authentication
- Voters must be authenticated by the User Interface module before it accepts their commits
- This should not increase the complexity of the module
- Strong authentication with minimal interaction between the voters and the interface module is preferred
- Example mechanism
  - Use S/KEY authentication
S/KEY Authentication Scheme
[Diagram: Voter and Interface Module exchanging the value R]
- f is a one-way function
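S/KEY is Lamport's hash-chain one-time password scheme: the verifier stores f^n(seed), and each login reveals the previous link of the chain, which the verifier checks by applying f once. A minimal sketch, assuming SHA-256 as f (the original S/KEY used MD4) and omitting S/KEY's seed/sequence challenge and word encoding; the class names are hypothetical.

```python
import hashlib

def f(x: bytes) -> bytes:
    """One-way function (SHA-256 here; the original S/KEY used MD4)."""
    return hashlib.sha256(x).digest()

def chain(seed: bytes, k: int) -> bytes:
    """Apply f to the seed k times: f^k(seed)."""
    for _ in range(k):
        seed = f(seed)
    return seed

class InterfaceModule:
    """Verifier side: stores only the last accepted value of the hash chain."""

    def __init__(self, initial: bytes):
        self.last = initial  # f^n(seed), registered out of band

    def verify(self, r: bytes) -> bool:
        if f(r) == self.last:
            self.last = r  # the next login must present the preimage of r
            return True
        return False

class Voter:
    """Prover side: knows the secret seed and counts down the chain."""

    def __init__(self, seed: bytes, n: int):
        self.seed, self.n = seed, n  # must re-register before n reaches 0

    def next_response(self) -> bytes:
        self.n -= 1
        return chain(self.seed, self.n)  # f^(n-1)(seed), then f^(n-2)(seed), ...

seed = b"voter-1 secret"
module, voter = InterfaceModule(chain(seed, 100)), Voter(seed, 100)
r = voter.next_response()
print(module.verify(r), module.verify(r))  # True False (replays are rejected)
```

The interface module never stores anything that lets it forge a future response, and each check costs a single hash.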
Distributed Voting in a WAN
- Centralized voting is not appropriate in a WAN setting
  - Multiple hops for a vote to travel from a voter to the coordinator
  - Link failures could partition the network
  - Network congestion in the vicinity of the coordinator
- Inexact voting could be computationally very intensive
  - Sensor data from a vast coverage area
- A single coordinator is a target for malicious attacks
Assumptions
- Reliable transport
- Messages are digitally signed and subject to verification before delivery to the upper layer
  - Unverifiable messages are discarded
- Presence of a public-key infrastructure
  - Every voter knows the public key of every other voter
Secure Voting
[Diagram: three voters exchanging messages for steps 1 to 4]
1. Send a signed vote to the other voters; hash the result and save it
2. Verify the signature and compare with own result
3. Hash the sender's result, sign it, and send the endorsement back
4. Verify the endorsement and compare it with the value saved in step 1
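The four steps can be simulated in a single process. As a stand-in for the public-key infrastructure, this sketch gives each voter an HMAC key in a shared `KEYS` table; real deployments would use public-key signatures, so a verifier would not need the signer's secret. `KEYS`, `sign`, `verify`, and `secure_vote` are hypothetical, and the network is replaced by loops.

```python
import hashlib
import hmac
import secrets

# Stand-in for the PKI: one HMAC key per voter (NOT a public-key signature scheme).
KEYS = {vid: secrets.token_bytes(32) for vid in ("A", "B", "C")}

def sign(vid: str, data: bytes) -> bytes:
    return hmac.new(KEYS[vid], data, hashlib.sha256).digest()

def verify(vid: str, data: bytes, sig: bytes) -> bool:
    return hmac.compare_digest(sign(vid, data), sig)

def secure_vote(results):
    """results: voter id -> result bytes. Returns voter id -> set of endorsing voters."""
    saved, votes = {}, []
    for vid, result in results.items():  # step 1: sign and send; save hash of own result
        saved[vid] = hashlib.sha256(result).digest()
        votes.append((vid, result, sign(vid, result)))
    endorsements = {vid: set() for vid in results}
    for sender, result, sig in votes:
        for receiver, own in results.items():
            if receiver == sender or not verify(sender, result, sig):
                continue  # unverifiable messages are discarded
            if result != own:  # step 2: compare with own result
                continue
            digest = hashlib.sha256(result).digest()  # step 3: hash, sign, send back
            esig = sign(receiver, digest)
            # Step 4: the sender verifies the endorsement and compares it with its saved hash.
            if verify(receiver, digest, esig) and digest == saved[sender]:
                endorsements[sender].add(receiver)
    return endorsements

print(secure_vote({"A": b"42", "B": b"42", "C": b"13"}))
```

For these inputs, A and B endorse each other while C receives no endorsement. One plausible use: a voter endorsed by enough peers to form a majority (counting itself) knows its result is the majority result.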
Performance
- Time complexity
  - Each voter signs its result and broadcasts it: O(1)
  - Each voter waits to receive one signed vote from every other voter: O(n)
  - Each vote comparison: O(1)
  - Each voter receives an endorsement from every other voter: O(n)
  - Overall complexity is O(n)
- Number of messages
  - Each voter sends its vote to every other voter: n(n-1) messages in total
  - Each voter sends an endorsement to every other voter: n(n-1) messages in total
  - Total: O(n^2)
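The message count can be checked with a short derivation, counting the worst case in which every voter endorses every other voter's vote:

```latex
M(n) = \underbrace{n(n-1)}_{\text{signed votes}} + \underbrace{n(n-1)}_{\text{endorsements}} = 2n(n-1) \in O(n^2),
\qquad M(5) = 2 \cdot 5 \cdot 4 = 40.
```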
Concluding Remarks
- The intrusion tolerance mechanisms described provide a much more robust way of enforcing security than traditional techniques
- The intrusion tolerance mechanism based on fragmentation-scattering ensures confidentiality and integrity of data and availability of services
- Efficient and secure voting algorithms are an essential part of intrusion tolerant systems
- More research is needed to make intrusion tolerance a real technology
- Scope for further research overlapping security and fault tolerance
Fault Tolerance vs. Security
Fault-tolerant design | Secure design
Guard against faulty system components or random faults | Guard against malicious outside attacks
Optimistic | Pessimistic
Probabilistic phenomena | Directed, intelligent attack
Redundancy as a solution | Redundancy as an adversity
Redundancy: A Boon or a Bane?
[Chart: effect of redundancy versus degree of redundancy, with curves for security, fault tolerance, and the desired security behavior]
References
- L. Blain, Y. Deswarte, "An Intrusion-Tolerant Security Server for an Open Distributed System"
- Ben Hardekopf, Kevin Kwiat, Shambhu Upadhyaya, "Secure and Fault-Tolerant Voting in Distributed Systems"
- Ben Hardekopf, Kevin Kwiat, "Exploiting the Overlap of Security and Fault-Tolerance"