Title: Building a Secure and Resilient Network Infrastructure
1Building a Secure and Resilient Network Infrastructure
- Dan Massey
- Colorado State University
2Outline
- Changes in the Infrastructure Environment
- Using Internet Worm Attacks To Motivate
- Secure and Resilient BGP Communication
- Path Vector Algorithm Convergence
- Network Fault Identification
- New Challenges in Authentication
- DNS Security
3Original Infrastructure Goals
- The original designs assumed that
- Hardware is unreliable: servers/routers will fail
- Network links are unreliable: connections will fail
- Data transport is unreliable: bit errors will occur
- The goal was to build protocols that
- Provide functionality despite all of the above
- Scale to extremely large size
- Tremendously successful in this respect
- BGP routing protocol: 150K routes, 20K systems
- DNS naming protocol: 1G records in 60M zones
4The Infrastructure Today
- Success and growth to large-scale adds
- Implementation and under-specification errors
- Configuration errors by diverse administrators
- Complex interactions and challenge of scale
- Intentional attacks
- The Internet works today because
- Robust original design masks many problems.
- Clever operational tricks keep the system afloat.
- Ex: AOL BGP TTL Hack (RFC 3682) to protect routers from DDoS
- "For every type of animal there is a most convenient size, and a large change in size inevitably carries with it a change of form."
5Changing the Form of the Internet
- We need to recognize current design successes.
- The Internet generally works today.
- Includes millions of already deployed systems.
- Provides a laboratory for large-scale system problems.
- New challenges require a new approach to design.
- Essential to add resilience and security.
- But this does not imply we must start from scratch.
- New solutions must either be incrementally deployable or must prove the necessity for a fresh start.
6Slammer Worm After 30 Minutes (graph by CAIDA)
7BGP Routing Infrastructure
- The Internet's Global Routing Protocol
- Connects Autonomous Systems (AS)
- Path Vector Routing Protocol
- Announces the path of ASes used to reach a destination
- Routes adapt to
- Link changes and route policies
- Does not adapt to traffic load.
- Recent worm attacks should have no impact on BGP.
Figure: AS 1 originates Prefix P; AS 2 announces Prefix P with Path 2,1 and AS 3 announces Prefix P with Path 3,1.
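The path vector behavior on this slide can be sketched in a few lines. This is a minimal illustration, not BGP itself: the function names are hypothetical, route selection is reduced to "shortest path wins" as a stand-in for real policy, and AS 4 is an invented receiver.

```python
from typing import List, Optional, Tuple

Path = Tuple[int, ...]  # AS path, nearest AS first, origin AS last


def announce(sender_as: int, path: Path) -> Path:
    """Prepend the sender's AS number before passing a route to a neighbor."""
    return (sender_as,) + path


def accept(receiver_as: int, path: Path) -> bool:
    """Path vector loop prevention: reject any path that already contains us."""
    return receiver_as not in path


def best_path(candidates: List[Path]) -> Optional[Path]:
    """Pick the shortest candidate (an assumed stand-in for real BGP policy)."""
    return min(candidates, key=len) if candidates else None


# AS 1 originates Prefix P; AS 2 and AS 3 each learn it and re-announce it.
origin = announce(1, ())          # (1,)
via_as2 = announce(2, origin)     # (2, 1)
via_as3 = announce(3, origin)     # (3, 1)

# A hypothetical AS 4 hears both paths and keeps one; AS 1 rejects both,
# because its own AS number already appears in them.
chosen_by_as4 = best_path([p for p in (via_as2, via_as3) if accept(4, p)])
```

Note that nothing in this exchange depends on traffic load, which is why a worm that only generates traffic should not, in principle, change the routes.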
8BGP Updates During Nimda Worm
9BGP Measurement Artifacts
- BGP peers establish a TCP session and send the full route table (120K routes).
- Updates are sent only if routes change.
- Our results show frequent session resets between ISP routers and the monitoring point.
- Monitoring point sessions cross multiple systems in the Internet.
- Each reset adds 120K updates.
- But very few ISP-ISP session resets.
- Our work in [1] presents rules to remove session reset artifacts.
10What Our Analysis Shows
Chart values: 37.6, 40.2, 8.3, 8.8
A substantial percentage of the BGP messages during the worm attack were not about route changes.
11FRTR Improving Peer Communication
- BGP Updates Are Not (Topology) Event Driven
- Session resets trigger high volume surges
- Govindan shows cascade failures can result.
- Lifetime of Invalid Routes is Unbounded
- Never recover (until reset) if an update is somehow lost.
- Despite TCP, we found cases of lost withdrawals.
- Attacker can poison a route with one update.
- Soft-state (periodic re-announce) is too costly
- FRTR Uses Periodic Bloom Filter Digests
- Digests quickly confirm state after session reset.
- Periodic digests bound lifetime of faults (with high probability).
- Co-author Keyur Patel (Cisco) is exploring Cisco development.
12FRTR Performance
- For each route at receiver, check against the digest.
- Bloom filter results in no false negatives.
- Compare total digests for missing route detection.
- False positive possible with known rate.
- Add salts to reduce the chance of repeated false positives.
- Overhead is a function of digest size and frequency.
- Work with Cisco suggests a 1.3 overhead increase.
- Complete details to appear in [2] (DSN 2004)
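The receiver-side check described above can be sketched with a small Bloom filter. This is an illustration under stated assumptions, not the FRTR design from [2]: the class and function names, the digest size, the number of hash functions, and the string encoding of a route are all hypothetical.

```python
import hashlib
from typing import Iterable, List


class BloomDigest:
    """Fixed-size Bloom filter over route strings (illustrative only)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3,
                 salt: bytes = b"") -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.salt = salt          # changing the salt changes which bits are set
        self.bits = 0             # a Python int used as a bit array

    def _positions(self, route: str) -> List[int]:
        positions = []
        for i in range(self.num_hashes):
            data = self.salt + bytes([i]) + route.encode("utf-8")
            h = hashlib.sha256(data).digest()
            positions.append(int.from_bytes(h[:8], "big") % self.num_bits)
        return positions

    def add(self, route: str) -> None:
        for pos in self._positions(route):
            self.bits |= 1 << pos

    def might_contain(self, route: str) -> bool:
        # False means "definitely absent"; True may be a false positive.
        return all((self.bits >> pos) & 1 for pos in self._positions(route))


def build_digest(routes: Iterable[str], salt: bytes) -> BloomDigest:
    digest = BloomDigest(salt=salt)
    for route in routes:
        digest.add(route)
    return digest


def stale_routes(receiver_routes: Iterable[str],
                 sender_digest: BloomDigest) -> List[str]:
    """Routes the receiver holds that the sender's digest definitely lacks."""
    return [r for r in receiver_routes if not sender_digest.might_contain(r)]


# The sender withdrew the third route, but the withdrawal was lost.
sender = ["18.0.0.0/8 via 52 59", "192.26.92.0/24 via 701"]
receiver = sender + ["10.9.0.0/16 via 7 1"]
digest = build_digest(sender, salt=b"round-1")
suspects = stale_routes(receiver, digest)
```

Every route the sender still holds passes the check, so a valid route is never flagged. A stale route can slip through as a false positive; sending the next periodic digest with a different salt makes it unlikely that the same route slips through again.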
13What Our Analysis Shows (2)
FRTR Eliminates Bursts
Chart values: 37.6, 40.2, 8.3, 8.8
What about the 60% not due to table exchange?
14A Closer Look at the Route Changes
15Improving Path Vector Convergence
- Infocom '02 [4] uses consistency to detect invalid paths.
- Reject path <x1, x2, ..., xn, r1, r2, ..., rm> if r1 is a direct neighbor and r1's path is not <r1, r2, ..., rm>.
- Adjusted to account for policy and for implementation in BGP.
- Infocom '03 (Afek et al.) quickly flushes invalid paths.
- BGP requires updates be separated by a minimum interval.
- Send a withdrawal (to flush the route) if blocked by the interval.
- Our recent work [5] attaches a new attribute, Root Cause Notification (RCN).
- Identifies the failed link and includes a sequence number.
- Allows any route relying on the failed link to be rejected.
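The consistency rule above can be written as a short check. This is a sketch only: the function name and the dictionary layout are assumptions, an empty tuple is used here to mean "the neighbor has withdrawn the route", and the policy adjustments mentioned on the slide are not modeled.

```python
from typing import Dict, Sequence, Tuple

Path = Tuple[int, ...]


def is_consistent(candidate: Sequence[int],
                  neighbor_paths: Dict[int, Path]) -> bool:
    """Check a candidate path against what direct neighbors announced.

    neighbor_paths maps a direct neighbor r1 to the path <r1, r2, ..., rm>
    it most recently announced to us, or to () if it withdrew the route.
    The candidate <x1, ..., xn, r1, r2, ..., rm> is rejected when its
    suffix starting at r1 differs from r1's own latest path.
    """
    candidate = tuple(candidate)
    for i, asn in enumerate(candidate):
        latest = neighbor_paths.get(asn)
        if latest is not None and candidate[i:] != latest:
            return False
    return True


# Direct neighbor AS 5 currently announces <5, 9> for this prefix.
neighbors = {5: (5, 9)}
stale = is_consistent((3, 5, 7, 9), neighbors)   # relies on an old path via 5
fresh = is_consistent((3, 5, 9), neighbors)      # agrees with AS 5
```

The point of the check is that a path learned indirectly through AS 3 can be discarded immediately when it contradicts first-hand information from AS 5, instead of being explored during convergence.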
16Simulation Results
17What Our Analysis Shows (3)
RCN Improves Convergence
FRTR Eliminates Bursts
Chart values: 37.6, 40.2, 8.3, 8.8
Can't eliminate the actual topology dynamics
18Identifying the Source of Faults
- It is believed that worm attacks caused edge instability, but core links remained up.
- Can we prove (or disprove) this claim?
- The Fault Identification Problem
- Gigabytes of BGP data are collected from an ad hoc selection of monitoring points.
- Underlying Internet topology is not known, but data does include path information.
- What can you conclude regarding faults?
- Pursuing Two Parallel Solutions
- Enhance protocol to include fault data (RCN).
- Design tools and algorithms to automate fault identification.
19The Link Rank Analysis Toolset
- LinkRank [6] developed for analyzing BGP data.
- Assigns each AS-AS link a weight based on number of prefixes.
- Records aggregate rank changes over time.
- Figure shows the graph from AS 6539.
- Note all links leaving AS 701 show a route loss.
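The link weighting described above can be approximated as follows. This is a rough sketch and not the LinkRank tool from [6]: the function names are hypothetical, the prefixes are placeholders, and AS numbers 64500 through 64503 are invented for the example (only AS 6539 and AS 701 come from the slide).

```python
from collections import Counter
from typing import Dict, Tuple

Link = Tuple[int, int]
Path = Tuple[int, ...]


def link_weights(routes: Dict[str, Path]) -> Counter:
    """Weight of each AS-AS link = number of prefixes whose path crosses it."""
    weights: Counter = Counter()
    for path in routes.values():
        # set() counts a link once per prefix; a == b skips AS prepending.
        for a, b in set(zip(path, path[1:])):
            if a != b:
                weights[(a, b)] += 1
    return weights


def rank_changes(before: Counter, after: Counter) -> Dict[Link, int]:
    """Per-link weight change between two snapshots (negative = routes lost)."""
    changes = {}
    for link in set(before) | set(after):
        delta = after[link] - before[link]
        if delta != 0:
            changes[link] = delta
    return changes


# Two routing table snapshots as seen from a monitor in AS 6539.
snapshot_1 = {
    "P1": (6539, 701, 64501),
    "P2": (6539, 701, 64502),
    "P3": (6539, 64500, 64503),
}
snapshot_2 = {
    "P1": (6539, 64500, 64501),
    "P2": (6539, 64500, 64502),
    "P3": (6539, 64500, 64503),
}
changes = rank_changes(link_weights(snapshot_1), link_weights(snapshot_2))
```

In this toy data every link touching AS 701 loses weight while the links through AS 64500 gain it, which is the kind of pattern the figure shows for the real event.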
20Combining Multiple Views
- Previous snapshots suggested a failure at AS 701.
- View from other points shows all BGP monitors saw a shift away from AS 701.
- NANOG confirmed a corresponding failure event.
- Successfully applied LinkRank to several Internet events.
21Formalizing the Results
- LinkRank relies heavily on human intuition.
- Investigating algorithms to automate detection.
- The Fault Identification Problem
- Given only path vector routing table snapshots.
- Can you find the minimum set of link changes that explain the snapshots?
- Can you find a representation of all possible changes that explain the snapshots?
- Results for shortest path policies in [7].
- Work in progress on other policies and partial link failures.
22Lessons From The Worm Attacks
- Worm Shows Complexity of BGP Dynamics
- Need to stabilize the peer communication (FRTR).
- Need to improve path vector convergence (RCN).
- Would like to identify real source topology events.
- But we must not forget that
- Ultimate goal of routing is to deliver packets.
- Route updates are only a means toward this goal.
- Worm attack was not directed against routing.
23Infrastructure Faults and Attacks
- BGP and DNS Provide No Authentication
- Faults and attacks can mis-direct traffic.
- One (of many) examples observed from BGP logs.
- Server could have replied with false DNS data.
Figure labels: originates route to 192.26.92/24; ISPs announced new path for 20 minutes to 3 hours; Internet; c.gtld-servers.net (192.26.92.30); BGP monitor.
24"Cryptography is like magic fairy dust, we just sprinkle it on our protocols and it makes everything secure"
- See IEEE Security and Privacy Magazine, Jan 2003
25Secure DNS Query and Response
Figure labels: End-user; Caching DNS Server; Authoritative DNS Servers.
Query: www.darpa.mil
Response: www.darpa.mil = 192.5.18.195, plus an (RSA) signature by darpa.mil.
An attacker cannot forge this answer without the darpa.mil private key.
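The sign-then-verify idea on this slide can be illustrated with textbook RSA using only the standard library. Everything here is an assumption made for illustration: the key is a toy built from two well-known Mersenne primes, there is no padding, and the record encoding is invented. It is not the DNSSEC record format and must not be used as real cryptography.

```python
import hashlib

# Toy RSA key built from two known Mersenne primes (illustration only).
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                                   # public modulus
E = 65537                                   # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))           # private exponent (Python 3.8+)


def record_hash(name: str, address: str) -> int:
    """Hash an address record into an integer smaller than the modulus."""
    data = f"{name} A {address}".encode("ascii")
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(name: str, address: str, d: int = D, n: int = N) -> int:
    """Zone side: sign the record hash with the private exponent."""
    return pow(record_hash(name, address), d, n)


def verify(name: str, address: str, signature: int,
           e: int = E, n: int = N) -> bool:
    """Resolver side: check the signature using only the public key."""
    return pow(signature, e, n) == record_hash(name, address)


# The zone signs its answer; anyone holding the public key can check it.
sig = sign("www.darpa.mil", "192.5.18.195")
genuine = verify("www.darpa.mil", "192.5.18.195", sig)
forged = verify("www.darpa.mil", "10.0.0.1", sig)   # altered address
```

The verifier never needs the private exponent, which is why a caching server or end-user can check an answer that passed through untrusted hands.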
26 There is no magic fairy dust
27What To Take Away
- A new look at the Internet infrastructure
- Scaling up has profound implications beyond bigger numbers/tables.
- Data reveals interesting problems and provides a large-scale systems lab.
- Challenges Remain in Improving the System
- But we can build backwards compatible changes into the infrastructure (ex: FRTR and RCN)
- Need to develop general approaches to resilient design of large-scale systems (Internet, Sensor Nets, etc.)
28Reference Cited
- [1] Observation and Analysis of BGP Behavior under Stress, L. Wang, X. Zhao, D. Pei, R. Bush, D. Massey, A. Mankin, S. F. Wu, and L. Zhang, Proceedings of the SIGCOMM Internet Measurement Workshop, 2002.
- [2] FRTR: A Scalable Mechanism to Restore Routing Table Consistency, L. Wang, D. Massey, K. Patel, and L. Zhang, To appear in IEEE Dependable Systems and Networks (DSN), July 2004.
- [3] Understanding BGP Behavior Through A Study of DoD Prefixes, X. Zhao, M. Lad, D. Pei, L. Wang, D. Massey, S. F. Wu, and L. Zhang, Proceedings of DISCEX III, April 2003.
- [4] Improving BGP Convergence with Consistency Assertions, D. Pei, L. Wang, X. Zhao, D. Massey, L. Zhang, A. Mankin, Proceedings of the IEEE INFOCOM 2002.
- [5] BGP-RCN: Improving BGP Convergence Through Root Cause Notification, D. Pei, M. Azuma, N. Nguyen, J. Chen, D. Massey, and L. Zhang, UCLA Department of Computer Science Technical Report, UCLA CSD TR-030047, October 2003.
- [6] Link-Rank: A Graphical Tool for Capturing BGP Routing Dynamics, M. Lad, D. Massey, and L. Zhang, To appear in Network Operations and Management Symposium (NOMS), April 2004.
- [7] An Algorithmic Approach to Identifying Link Failures, M. Lad, A. Nanavati, D. Massey, and L. Zhang, To appear in 10th Pacific Rim Dependable Computing Symposium (PRDC), March 2004.
- [8] DNS Security Introduction and Requirements, R. Arends, R. Austein, M. Larson, D. Massey, and S. Rose, Work in Progress, IETF DNS EXT Working Group, Feb 2004.
29Acknowledgements
- Funding Sources
- FNIISC Project August 2000 - May 2004
- DARPA Fault Tolerant Networks
- PI USC/ISI (Dan Massey), UCLA (Lixia Zhang), UC Davis (S. Felix Wu)
- Beyond BGP Project October 2002 - September 2005
- NSF Special Projects in Networking
- PI USC/ISI (Dan Massey) and UCLA (Lixia Zhang)
- FMESHD Project July 2000 - December 2003
- DARPA Fault Tolerant Networks
- PI USC/ISI (Dan Massey) subcontract to NAI (Russ Mundy)
- With Thanks to Collaborators and Graduate Students
- Lixia Zhang and Felix Wu
- Lan Wang, Dan Pei, and Mohit Lad (UCLA USC/ISI interns)
- Naheed Vora (USC USC/ISI Intern)
30Revised DNS Key Management
darpa.mil NS records
Can Change mil key without notifying darpa.mil
darpa.mil DS record (hash of pubkey 1)
darpa.mil SIG(DS) by mil private key
mil DNS Server
darpa.mil DNS Server
darpa.mil KEY (pub key 1)
darpa.mil KEY (pub key 2)
darpa.mil SIG(A) by key 1
Can Change key 2 without notifying .mil
www.darpa.mil A record
www.darpa.mil SIG(A) by key 2
31Next Step DNS Security Activities
- Co-editor of the IETF specification [8].
- Last call workshop completed last month.
- Cleaning up minor issues and nits.
- Dept. of Homeland Security DNSSEC Group
- Group of 10 advising DHS on DNS security deployment strategies.
- Need operational policies for end systems.
- Investigating Resilient DNS
- Real security is more than authentication.
- Joint work with Amir/Terzis and Zhang/Wu.
- NSF ITR Proposal just completed (hours ago)
32DNS Key Roll-Over
darpa.mil DS record (hash of pubkey 3)
darpa.mil SIG(DS) by mil private key
darpa.mil DS record (hash of pubkey 1)
darpa.mil SIG(DS) by mil private key
mil DNS Server
darpa.mil DNS Server
darpa.mil KEY (pub key 1)
Objective: Replace KEY 1 with new KEY 3
darpa.mil KEY (pub key 2)
darpa.mil KEY (pub key 3)
darpa.mil SIG(A) by key 1
darpa.mil SIG(A) by key 3
33Multi-Origin AS Routing Announcement
- MOAS exists in current BGP operation
- Some due to operational need, some due to faults
- Blind acceptance of MOAS is dangerous
- An open door for traffic hijacking
34BGP-based Solution Example
AS58
18.0.0.0/8
AS52
AS59
Example configuration
router bgp 59
 neighbor 1.2.3.4 remote-as 52
 neighbor 1.2.3.4 send-community
 neighbor 1.2.3.4 route-map setcommunity out
route-map setcommunity
 match ip address 18.0.0.0/8
 set community 59:MOAS 58:MOAS additive
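One way a receiver might use the community values in this configuration is sketched below. The checking logic is an assumption for illustration, since the slides only show the sender-side configuration: each announcement is taken to carry the set of origin ASes named in its MOAS communities, and a prefix is flagged when an origin is missing from its own list or when two announcements carry different lists. All names are hypothetical, and AS 64500 is an invented false origin.

```python
from typing import Dict, FrozenSet, Iterable, List, NamedTuple, Tuple


class Announcement(NamedTuple):
    prefix: str
    as_path: Tuple[int, ...]     # origin AS is the last element
    moas_list: FrozenSet[int]    # origin ASes named in the MOAS communities


def suspicious(announcements: Iterable[Announcement]) -> List[str]:
    """Return prefixes whose announcements disagree about valid origins."""
    first_seen: Dict[str, FrozenSet[int]] = {}
    flagged: List[str] = []
    for ann in announcements:
        origin = ann.as_path[-1]
        # With no list attached, the origin is treated as the only claimed origin.
        claimed = ann.moas_list or frozenset({origin})
        expected = first_seen.setdefault(ann.prefix, claimed)
        conflict = origin not in claimed or claimed != expected
        if conflict and ann.prefix not in flagged:
            flagged.append(ann.prefix)
    return flagged


# AS 58 and AS 59 both legitimately originate 18.0.0.0/8 and say so.
valid = [
    Announcement("18.0.0.0/8", (52, 59), frozenset({58, 59})),
    Announcement("18.0.0.0/8", (58,), frozenset({58, 59})),
]
# A hypothetical AS 64500 announces the same prefix with no MOAS list.
false_origin = Announcement("18.0.0.0/8", (64500,), frozenset())

clean = suspicious(valid)
alarm = suspicious(valid + [false_origin])
```

A flagged prefix here means only that the announcements conflict; deciding which origin is the false one would need information this sketch does not have.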
35BGP false origin detection
(b) Two Origin ASs
(a) One Origin AS
36BGP Updates During Slammer Worm
37Constructing Fault Graphs
- Monitor observes a shift from red path to blue path.
- (Other monitors reveal node 5)
- Convert to a Fault-Graph
- Combine all topology data.
- Greedy algorithm to select core faults near root.
- Recursive search to find alternates for each core fault.
- Results in lower fault-graph.
- A set of edges is an explanation iff it is a cut in the fault-graph.
- Min explanation = min cut
- Extends to multiple views.
- Used to analyze LinkRank Data
Figure labels: Monitor; nodes 1, 2, 3, 5, 4, 6, 7; Source; Destination; nodes 1, 2, 5, 4, 7; Sink.
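The "min explanation = min cut" statement can be demonstrated on a tiny graph by brute force. This sketch is not the algorithm from [7]: it simply enumerates edge subsets in order of size, which is only practical for very small graphs, and the example fault-graph is invented rather than taken from the figure.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

Edge = Tuple[str, str]


def reachable(edges: Set[Edge], source: str, sink: str) -> bool:
    """Depth-first search over directed edges from source toward sink."""
    stack, seen = [source], {source}
    while stack:
        node = stack.pop()
        if node == sink:
            return True
        for a, b in edges:
            if a == node and b not in seen:
                seen.add(b)
                stack.append(b)
    return False


def minimum_explanations(edges: Set[Edge], source: str,
                         sink: str) -> List[FrozenSet[Edge]]:
    """All smallest edge sets whose removal disconnects source from sink."""
    for size in range(len(edges) + 1):
        cuts = [frozenset(subset)
                for subset in combinations(sorted(edges), size)
                if not reachable(edges - set(subset), source, sink)]
        if cuts:
            return cuts
    return []


# Hypothetical fault-graph: one shared link, then two parallel branches.
fault_graph = {
    ("Source", "1"),
    ("1", "2"), ("2", "Sink"),
    ("1", "4"), ("4", "Sink"),
}
explanations = minimum_explanations(fault_graph, "Source", "Sink")
```

Here the single shared link is the only one-edge cut, so it is the minimum explanation; explaining the same observation with the parallel branches would require two simultaneous failures.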
38Infrastructure Security Enhancements
- BGP and DNS lack authentication.
- Easy to insert false BGP routes or reply with false DNS data.
- S-BGP (BBN) and SoBGP (Cisco) propose adding Public Key Authentication to BGP.
- Verify origin is authorized to announce prefix and verify each link in the AS path.
- Is this path authentication the right goal?
- Requires a heavy-weight PKI structure.
- DNSSEC adds authentication to DNS
- Further along than the BGP approaches
- Provides lessons for BGP authentication.