Title: Networking Update
1. Networking Update
- Terry Gray
- Director, Networks & Distributed Computing
- University of Washington
- UW Medicine IT Steering Committee
- 16 January 2004
- 20 February 2004
2. Outline
- In our last episode
  - Context
  - Expanded Partnership
  - Recent Problems
- Today
  - Systemic Problems and Progress
  - Network Security Chronology
  - Design Issues
3. Context: A Perfect Storm
- Increased dependency on network apps
- Decreased tolerance for outages
- Decades of deferred maintenance...
- Inadequate infrastructure investment
- Some old/unfortunate design decisions
- Some extraordinarily fragile applications
- Fragmented host management
- Increasingly hostile security environment
- Increasing legal/regulatory liability
- Importance of research/clinical leverage
4. Key Elements of the Partnership
- Changed: C&C now responsible for...
  - In-building network implementation and operational support for med ctrs, clinics
  - Med center network design (for real)
- Not changed: C&C still responsible for...
  - Network backbone, routers
  - Regional and Internet connectivity
  - SoM and Health Sciences networking
5. Why the Partnership Makes Sense
- Consistency, interoperability, manageability
- Leverage C&C networking expertise
- Clinical/research hi-performance network needs
- 24x7 Network Operations Center (NOC)
- Advanced network management tools
- Avoid design/build organizational conflicts
- Beyond the network: hope to share distributed system architecture and network computing expertise
6. Recent Problems
- Oct 29: Partial router failure reveals escalation procedure problems
- Oct 30: Security breach triggers connectivity and server problems
- Nov 12: 13-minute power outage triggers extended server outage
- Dec 12: Router upgrade uncovers wiring error, which triggers multicast storm
(None of these were related to the network transition, except perhaps the timing of #4.)
7. System Elements
- Environmentals (Power, A/C, Physical Security)
- Network
- Client Workstations
- Servers
- Applications
- Personnel, Procedures, Policy, and Architecture
Failures at one level can trigger problems at another level, so we need a Total System perspective.
8. Reasonable Questions
- What's up with C&C's alarm system vendor?
- If power was out for only 14 minutes, why was service out for multiple hours?
- What can we say about an app so fragile that a net interruption of a few seconds requires a server reboot?
- What can we say about thin clients built on top of thick (WinXP) operating systems?
- What can we say about a network where one wiring fault can disable most of the net?
9. Systemic Problems and Progress
10. Systemic Network Problems (NB: these pre-date Tom et al.)
- Old infrastructure (e.g. Cat 3 wire)
- Non-supportable technologies (e.g. FDDI)
- Non-supportable (non-geographic) topology
- Expensive shortcuts (e.g. mis-terminated Cat 5)
- Security based on individual IP addresses
- Subnets with clients and critical servers
- Documentation deficiency:
  - Contact database
  - Device location database
  - Critical device registry
11. Systemic General Problems
- Ever-increasing system complexity, dependencies
- Departmental autonomy
- Uncontrolled hosts
- Unreliable power and A/C in equipment rooms
- No net-oriented application procurement standards
- Are HA and DRBR expectations realistic?
- Are backup plans workable?
12. Some Numbers
13. Network Device Growth
Note: Most dips reflect lower summer use; the last one is a measurement anomaly.
14. Network Traffic Growth (linear)
15. Network Traffic Growth (log)
16. Near-term Progress and Plans
- Agreement on standard maintenance window
- Created Top 10 list (creeping to Top 20)
- Static addressing work-around (success!)
- FDDI, VLAN elimination
- Subnet splits/upgrades (1500 computers)
- Equipment upgrades
- Router consolidation, dedicated subnets, separate med center backbone
- Equipment, outlet location database updates
- Initial wireless deployment
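The subnet splits and dedicated subnets listed above amount to carving one large shared subnet into smaller ones and giving critical servers a subnet of their own, away from client workstations. A minimal sketch with Python's standard `ipaddress` module follows; the 10.20.0.0/21 block, the /24 target size, and the `plan_subnets` helper are hypothetical illustrations, not the actual med center addressing plan.

```python
import ipaddress

# Hypothetical parent block: a /21 holds 2046 usable hosts, enough for ~1500 computers.
PARENT = ipaddress.ip_network("10.20.0.0/21")


def plan_subnets(parent, new_prefix=24, server_subnets=1):
    """Split `parent` into equal subnets of size /new_prefix.

    The first `server_subnets` pieces are reserved for critical servers and
    the rest go to client workstations (an assumed allocation policy).
    """
    pieces = list(parent.subnets(new_prefix=new_prefix))
    return {
        "servers": pieces[:server_subnets],
        "clients": pieces[server_subnets:],
    }


if __name__ == "__main__":
    layout = plan_subnets(PARENT)
    for role, nets in layout.items():
        for net in nets:
            # Subtract the network and broadcast addresses to get usable hosts.
            print(f"{role:8} {net}  usable hosts: {net.num_addresses - 2}")
```

Smaller subnets shrink each broadcast domain, and therefore each fault zone, at the cost of more router interfaces and access lists to manage: the Fault Zone size vs. Economy/Simplicity tradeoff.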
17. Design Review and Cost Estimates
- Biggest cost: physical infrastructure (wireplant upgrades)
- NetVersant engaged for cost estimation project
- Cisco engaged for network architecture review
- We recommend a similar reliability/design assessment for servers, apps, and procedures
18. Design Issues
19. Design Tradeoffs
- Networks = Connectivity; Security = Isolation
- Fault Zone size vs. Economy/Simplicity
- Reliability vs. Complexity
- Prevention vs. (Fast) Remediation
- Security vs. Supportability vs. Functionality
Differences in NetSec approaches relate to:
- Balancing priorities (security vs. ops vs. function)
- Local technical and institutional feasibility
20. Tradeoff Examples
- Defense-in-depth conjecture (for N layers):
  - Security: MTTE (exploit) ∝ N²
  - Functionality: MTTI (innovation) ∝ N²
  - Supportability: MTTR (repair) ∝ N²
- Perimeter Protection Paradox (for D devices):
  - Firewall value ∝ D
  - Firewall effectiveness ∝ 1/D
- Border blocking criteria:
  - Threat can't reasonably be addressed at edge
  - Won't harm network (performance, stateless block)
  - Widespread consensus to do it
- Security by IP address
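One way to see the Perimeter Protection Paradox, and the lesson that it only takes one compromise inside to defeat a firewall: if each of D devices behind a perimeter is compromised independently with probability p, the chance that the inside is still clean is (1 - p)^D. The independence assumption, the `p_inside_clean` name, and the sample value of p are all hypothetical. This toy model falls off faster than the 1/D on the slide, so it illustrates the direction of the effect, not the exact relation.

```python
def p_inside_clean(devices: int, p_compromise: float) -> float:
    """Probability that none of `devices` hosts behind one perimeter firewall
    is compromised, assuming each host is compromised independently with
    probability `p_compromise` over the same period (a toy assumption)."""
    if devices < 0:
        raise ValueError("devices must be non-negative")
    if not 0.0 <= p_compromise <= 1.0:
        raise ValueError("p_compromise must be in [0, 1]")
    return (1.0 - p_compromise) ** devices


if __name__ == "__main__":
    p = 0.001  # hypothetical per-host compromise probability
    for d in (10, 100, 1000, 10000):
        # More devices behind the firewall: more value protected, but a
        # higher chance that one insider compromise has already bypassed it.
        print(f"D={d:>6}: P(no insider compromise) = {p_inside_clean(d, p):.4f}")
```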
21. Network Security Credo
- Focus first on the edge (Perimeter Protection Paradox)
- Add defense-in-depth as needed
- Keep it simple (e.g. Network Utility Model)
- But not too simple (e.g. offer some policy choice)
- Avoid:
  - one-size-fits-all policies
  - cost-shifting from guilty to innocent
  - confusing users and techs (broken by design)
22. Preserving the Net Utility Model
- What is it?
- Why important?
- Incompatible with perimeter security?
- Too late to save?
- NUM-preserving perimeter defense:
  - Logical Firewalls
  - Project 172
    - Foiled by static IP addressing
    - Requires all hosts to be reconfigured
23. Lines of Defense
- Network isolation for critical services.
- Host integrity. (Make sure the OS is net-safe.)
- Host perimeter. (Add host firewalling)
- Server sanctuary perimeter.
- Network perimeter defense.
- Real-time attack detection and containment.
24. Network Security Chronology
- 1990 Five anti-interoperable networks
- 1994 Nebula shows network utility model viable
- 1998 Defined border blocking policy
- 2000 Published Network Security Credo
- 2000 Added source address spoof filters
- 2000 Proposed med ctr network zone
- 2000 Proposed server sanctuaries
- 2001 Ban clear-text passwords on CC systems
- 2001 Proposed pervasive host firewalls
- 2001 Developed logical firewall solution
- 2002 Developed Project-172 solution
- 2003 Slammer, Blaster: death of the Internet
- 2003 Developed flex-net architecture
25. Next-Gen Network Architecture
- Parallel networks: more redundancy
- Supportable (geographic) topology
- Med center subnets: separate backbone zone
- Perimeter, sanctuary, and end-point defense
- Higher performance
- High-availability strategies:
  - Workstations spread across independent nets
  - Redundant routers
  - Dual-homed servers
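The high-availability strategies above rest on the standard series/parallel availability arithmetic: components in a chain multiply their availabilities, while redundant components fail only when all of them fail. A minimal sketch; the router and link availability figures are hypothetical, and the `parallel` formula assumes the redundant paths fail independently.

```python
from math import prod


def series(*availabilities: float) -> float:
    """Availability when every component must work (a single chain)."""
    return prod(availabilities)


def parallel(*availabilities: float) -> float:
    """Availability when any one component suffices (redundant paths).
    Assumes the components fail independently."""
    return 1.0 - prod(1.0 - a for a in availabilities)


if __name__ == "__main__":
    router = 0.999  # hypothetical availability of one router
    link = 0.9995   # hypothetical availability of one building link

    single_path = series(router, link)
    dual_homed = parallel(single_path, single_path)
    print(f"single path: {single_path:.6f}")
    print(f"dual-homed:  {dual_homed:.6f}")
```

The independence assumption is the weak point: a shared failure such as one wiring fault or one power outage takes down both "redundant" paths at once, which is why environmentals belong in the System Elements list alongside the network itself.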
26. Success Metrics
- Tom's:
  - Nobody gets hurt
  - Nobody goes to jail
- Terry's:
  - Works fine, lasts a long time
  - Low ROI (Risk Of Interruption)
- Steve's:
  - Four Nines or bust!
27. Success Metrics II
- We all want:
  - High MTTF, performance, and function
  - Low MTTR and support cost
- The art is to balance those conflicting goals
  - We are jugglers and technology actuaries
28. Success Metrics III
- How many nines?
- Problem one: what to measure?
  - How do you reduce the behavior of a complex net to a single number?
  - Difficult for either uptime or utilization metrics
- Problem two: data networks are not like phone or power services
  - Imagine if phones could assume anyone's number
  - Or place a million calls per second!
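Once a single uptime fraction is accepted as the metric (exactly what Problem one questions), "how many nines?" reduces to simple arithmetic. A minimal sketch; the `allowed_downtime_minutes` name is illustrative, and it assumes availability is measured as a fraction of wall-clock time over a 365.25-day year.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year length, leap days included


def allowed_downtime_minutes(nines: int) -> float:
    """Yearly downtime permitted by `nines` nines of availability,
    e.g. nines=4 means 99.99% available, i.e. 0.01% unavailable."""
    unavailability = 10.0 ** -nines
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in range(2, 6):
        availability = 100.0 * (1.0 - 10.0 ** -n)
        print(f"{availability:.3f}% -> {allowed_downtime_minutes(n):8.1f} min/year")
```

By this arithmetic, four nines leaves about 53 minutes of downtime a year and five nines about 5 minutes, so a single power outage lasting more than ten minutes exceeds a five-nines budget on its own, and a multi-hour service outage exceeds a four-nines budget.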
29. Concerns, Future Challenges
- Mitigating impact of closed networking:
  - Needs of the many vs. needs of the few
  - Pressure to make network topology match administrative boundaries
  - Complex access lists
  - False sense of security
  - Increased MTTR
- Next-generation threats that firewalls won't help with
- Security vs. high performance
- Wireless
- Balancing innovation, operations, security
30. Lessons
- Five 9s is hard (unless we only attach phones?)
- Even host firewalls don't guarantee safety
- Perimeter firewalls may increase user confusion and MTTR
- Nebula: existence proof of security in an open network
- Even so, defense-in-depth is a Good Thing
- It only takes one compromise inside to defeat a firewall
- Controlling net devices is hard (hublets, wireless)
- The cost of static IP configuration is very high
- Net reliability and host security are inextricably linked
- Never underestimate non-technical barriers to progress
31. Questions? Comments?