CS514: Intermediate Course in Operating Systems - PowerPoint PPT Presentation


1
CS514 Intermediate Course in Operating Systems
  • Professor Ken BirmanVivek Vishnumurthy TA

2
After the Internet
  • We're living at the "end of history"
  • For government types, that refers to the fall of
    the Berlin Wall and the collapse of the USSR
  • For us, it refers to the .COM boom and bust
  • The Internet had infinite promise until 2000; now
    it is approaching maturity
  • What do we know how to do?
  • What major challenges do we face as we look at
    the "after the Internet" picture?

3
Critical Infrastructure: Rapidly Expanding Web of
Dependency
  • Massive rollout underway
  • Control of restructured power grid
  • New medical information systems link hospitals to
    other providers, reaching right into the home
  • Telephony infrastructure
  • Financial systems: eMoney replaces cash!
  • Disaster response and coordination
  • Future military will be extremely dependent on
    information resources and solutions

4
Tangled Interdependencies
Internet Software, COTS Technology Base
5
Multiple Concerns
  • Infrastructure industries have been dangerously
    naïve about challenges of using Internet and
    computing technologies in critical ways
  • Nationally critical information systems poorly
    protected, fragile, easily disrupted
  • Stems from pervasive use of COTS components
  • Vendors poorly motivated to address the issue
  • Yet academic research is having little impact
  • No sense of excitement or importance
  • Few significant technology transition successes

6
Most serious issue?
  • Loss of public interest and enthusiasm
  • Government shares this view
  • "It's just software; we buy it from Microsoft"
  • Academic researchers often seen as freeloading at
    taxpayers' expense
  • Critical infrastructure components often look
    less critical when considered in isolation
  • Ten thousand networked medical care systems would
    worry us, but not individual instances

7
Concrete Examples of Threats?
  • Power system requires new generation of
    technology for preventing cascaded failures,
    implementing load-following power contracts
  • Industry requires solutions but has no idea how
    to build them. Technical concern masked by
    politics
  • DOE effort is completely inadequate
  • Three branches of military are separately
    developing real-time information support tools.
  • Scale will be orders of magnitude beyond anything
    ever done with Internet technologies
  • Goals recall the FAA's AAS fiasco (lost $6B!)

8
Concrete examples of threats?
  • 2003 East Coast blackout
  • Restructuring of power grid broke it into
    multiple competing producers / consumers
  • But technology to monitor and control the
    restructured grid lagged the need
  • Consequences of this deficiency?
  • Operators were unable to make sense of a slowly
    cascading instability that ultimately engulfed
    the whole East Coast!

9
Vendor Perspective?
  • Little interest in better security
  • "You have zero privacy anyway. Get over it."
    (Scott McNealy, CEO of Sun Microsystems, 1/99)
  • In contrast, Bill Gates has often stated that
    MSFT needs to improve
  • But he doesn't have critical infrastructure in mind
  • And he doesn't point to Internet issues
  • Internet technology is adequate for the most
    commercially lucrative Web functions
  • But inadequate reliability, security for other
    emerging needs, including CIP requirements
  • Issue is that market is the main driver for
    product evolution, and market for critical
    solutions is small

10
Security: Often mistaken for the whole story
  • Even today, most CIP work emphasizes security and
    denial of service attacks
  • But critical applications must also work
  • Correctly
  • When and where required
  • Even when components fail or are overloaded
  • Even when the network size grows or the
    application itself is used on a large scale
  • Even when the network is disrupted by failures

11
Market failure
  • Refers to situations in which a good technology
    is unsuccessful as a product
  • For example, everyone wants reliability
  • Many people like group communication
  • But how much will they pay for it?
  • One metric: as a fraction of their total
    software investment for the same machines
  • Probably not more than 5-10%
  • Revenue stream may be too small to sustain
    healthy markets and product growth

12
Let's get technical
  • A digression to illustrate both the potential for
    progress but also the obstacles we confront!

13
Scalability: Achilles' Heel of a Networked World?
  • 1980s: Client-server architectures
  • 1 server, 10s of simultaneous clients
  • 1990s: Web servers
  • Small server cluster in a data center or farm
  • 1000s of simultaneous clients
  • First decade of 2000?
  • Server geoplex: large farms in a WAN setting
  • 10s of 1000s of simultaneous clients
  • Emergence of peer-to-peer applications: live
    collaboration and sharing of objects
  • Wireless clients could add another factor of 10
    to client load

14
Technologies need to keep pace
  • We want predictable, stable performance,
    reliability, security
  • despite
  • Large numbers of users
  • Large physical extent of network
  • Increasing rates of infrastructure disruption
    (purely because of growing span of network)
  • Wide range of performance profiles
  • Growth in actual volume of work applications are
    being asked to do

15
Scalable Publish Subscribe
  • A popular paradigm; we'll use it to illustrate
    our points
  • Used to link large numbers of information sources
    in commercial or military settings to even larger
    numbers of consumers
  • Track down the right servers
  • Updates in real-time as data changes
  • Happens to be a top military priority, so one
    could imagine the government tackling it

16
Publisher offers new events to a proxy server.
Subjects are partitioned among the server sets.
In this example there are four partitions: blue,
green, yellow, and red. The server set and partition
function can adjust dynamically.
Subscriber must identify the best servers.
Subjects are partitioned among servers; hence one
subscriber may need multiple connections.
log
publish
Server cluster
Like the subscribers, each publisher connects to
the best proxy (or proxies) given its own
location in the network. The one selected must
belong to the partition handling the subject of
the event.
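The partitioning rule the diagram describes can be sketched in a few lines. This is a minimal, hypothetical illustration: the four partition names come from the slide, but the hash-based mapping is an assumed scheme, not the actual system's partition function (which the slide says can adjust dynamically).

```python
import hashlib

# The slide's four partitions; the hash-based assignment below is an
# illustrative assumption, not the real system's partition function.
PARTITIONS = ["blue", "green", "yellow", "red"]

def partition_for(subject: str) -> str:
    """Map a subject deterministically to the server partition that owns it."""
    digest = hashlib.sha1(subject.encode()).digest()
    return PARTITIONS[digest[0] % len(PARTITIONS)]

# Publishers and subscribers run the same function, so both sides
# independently agree on which server set handles a given subject.
print(partition_for("quotes/IBM"))
```

Because the mapping is deterministic, a publisher and a subscriber never need to coordinate directly: agreeing on the function is enough to agree on the partition.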
17
Large-scale applications with similar technical
requirements
  • Restructured Electric Power Grid
  • Large-scale financial applications
  • Disaster response
  • Community medical systems
  • Large-scale online information provision
  • Decentralized stock markets
  • Network monitoring, control

18
Poor Scalability
  • Long rumored for distributed computing
    technologies and tools
  • Famous study by Jim Gray points to scalability
    issues in distributed databases
  • Things that scale well
  • Tend to be stateless or based on soft state
  • Have weak reliability semantics
  • Are loosely coupled

19
Do current technologies scale?
Category | Typical large use | Limits?
Client-server and object-oriented environments | LAN system, perhaps 250 simultaneous clients | Server capacity limits scale
Web-like architectures | Internet, hundreds of clients | No reliability guarantees
Publish-subscribe (group multicast) | About 50 receivers, 500 in hierarchies | Throughput becomes unstable with scale; multicast storms
Many-many DSM | Rarely seen except in small clusters | Update costs grow with cluster size
Shared database | Farm: 50-100; RACS: 100s; RAPS: 10s | Few successes with rapidly changing real-time data
20
Recall the Stock Exchange Problem: Vsync
multicast is too fragile
Most members are healthy.
21
[Figure: With 32 processes, virtually synchronous
Ensemble multicast protocols. Average throughput on
nonperturbed members falls from the ideal (about 250)
toward zero as the perturb rate rises from 0 to 0.9.]
22
The problem got worse as the system scaled up
[Figure: Virtually synchronous Ensemble multicast
protocols at group sizes 32, 64, and 96. Average
throughput on nonperturbed members degrades faster in
larger groups as the perturb rate rises from 0 to 0.9.]
23
Why doesn't anything scale?
  • With weak semantics
  • Faulty behavior may occur more often as system
    size increases (think of the Internet)
  • With strong semantics
  • Encounter a system-wide cost (e.g. membership
    reconfiguration, congestion control)
  • That can be triggered more often as a function of
    scale (more failures, more network events,
    bigger latencies)
  • Gray's O(n²) database degradation reflects very
    similar issues. A new law of nature?
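The "triggered more often as a function of scale" point can be made concrete with a one-line probability calculation. If each node independently misbehaves with some small probability p per interval, the chance that *some* node triggers a system-wide cost grows as 1 - (1-p)^n. The value of p below is an illustrative assumption, not a measured figure.

```python
# Probability that at least one of n independent nodes misbehaves in
# an interval, given per-node probability p. Even a tiny p yields
# frequent system-wide events once n is large -- the slide's argument.
p = 0.001  # illustrative per-node disruption probability

for n in (32, 64, 96, 1024):
    p_any = 1 - (1 - p) ** n
    print(f"n={n:5d}: P(some disruption) = {p_any:.3f}")
```

At n = 32 the system-wide event is rare (about 3% per interval); at n = 1024 it happens most of the time, which is why costs that are tolerable in small groups dominate at scale.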

24
Serious issue for our scalable publish-subscribe
technology
  • What if we build it for the military or some
    other critical use, and it works in the
    laboratory but not in the field?
  • Early evaluation has ruled out most off-the-shelf
    networking technologies
  • They just don't have the necessary scalability!
  • In fact, this happened with the Navy's Cooperative
    Engagement Capability (CEC)
  • They built it, but it melts down under stress!

25
Fight fire with fire!
  • Turn to randomized protocols
  • with probabilistic reliability goals
  • This overcomes the scalability problems just seen
  • Then think about how to present the mechanism to
    the user
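The randomized idea can be sketched as a push-gossip epidemic, the style of dissemination behind protocols like Bimodal Multicast. This is a toy simulation under assumed parameters (fanout, round count, group size); it is not the actual protocol, which also includes an unreliable multicast phase and anti-entropy repair.

```python
import random

def gossip_rounds(n: int, fanout: int, rounds: int, seed: int = 42) -> int:
    """Simulate push gossip: each node holding the message forwards it
    to `fanout` randomly chosen peers per round. Returns how many of
    the n nodes hold the message afterward. Parameters are illustrative."""
    rng = random.Random(seed)
    infected = {0}  # node 0 is the original sender
    for _ in range(rounds):
        for _node in list(infected):
            infected.update(rng.randrange(n) for _ in range(fanout))
    return len(infected)

# With O(log n) rounds, almost every node receives the message with
# high probability -- "probabilistic reliability" rather than the
# all-or-nothing guarantee that made virtual synchrony fragile.
print(gossip_rounds(n=1000, fanout=2, rounds=10))
```

No node waits for acknowledgments from the whole group, so a few slow or perturbed members cannot stall everyone else; that is what sidesteps the throughput collapse shown in the earlier figures.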

26
Tools in our toolkit
  • Traditional deterministic tools
  • Virtual synchrony (only in small groups)
  • Paxos
  • Transactions
  • New-age probabilistically reliable ones
  • Bimodal multicast
  • Astrolabe
  • DHTs
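Of the "new-age" tools listed, DHTs rest on one core trick: consistent hashing, where each key is owned by the first node clockwise from its hash on a ring. The sketch below is a generic toy version under assumed names; real DHTs such as Chord add routing tables and replication on top of this lookup rule.

```python
import bisect
import hashlib

def h(key: str) -> int:
    """Hash a key or node name onto a numeric ring (illustrative scheme)."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")

class Ring:
    """Toy consistent-hash ring: a key is owned by the first node whose
    hash follows the key's hash clockwise, wrapping around at the end."""
    def __init__(self, nodes):
        self.ring = sorted((h(n), n) for n in nodes)

    def lookup(self, key: str) -> str:
        points = [p for p, _ in self.ring]
        i = bisect.bisect_right(points, h(key)) % len(self.ring)
        return self.ring[i][1]

ring = Ring(["serverA", "serverB", "serverC"])
print(ring.lookup("subject/quotes"))
```

The scalability payoff is that adding or removing one node remaps only the keys that node owned, instead of reshuffling everything, which is why the approach tolerates the constant churn of large systems.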

27
Publisher offers new events to a proxy server.
Subjects are partitioned among the server sets.
In this example there are four partitions: blue,
green, yellow, and red. The server set and partition
function can adjust dynamically.
Subscriber must identify the best servers.
Subjects are partitioned among servers; hence one
subscriber may need multiple connections.
We can use Bimodal Multicast here
log
publish
This replication problem looks like an instance
of virtual synchrony
Server cluster
Perhaps this client can use Astrolabe to pick a
server
Like the subscribers, each publisher connects to
the best proxy (or proxies) given its own
location in the network. The one selected must
belong to the partition handling the subject of
the event.
28
Publisher uses Astrolabe to identify the correct
set of receivers
Subscriber must identify the best servers.
log
Bimodal Multicast
Astrolabe manages configuration and connection
parameters, tracks system membership and state.
Server cluster
The combined technologies solve the initial
problem!
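Astrolabe's role in the combined design is to maintain hierarchical aggregates that clients consult instead of polling every host. The toy below illustrates that idea only: the zone layout, the "load" attribute, and the min aggregate are all invented for illustration; real Astrolabe computes such summaries with gossip and SQL-like aggregation queries.

```python
# Toy Astrolabe-style hierarchical aggregation: each zone summarizes
# the hosts beneath it, and summaries roll up the zone hierarchy.
# The zones and the "load" attribute are invented for illustration.
hosts = {
    "usa/ny/host1": {"load": 0.9},
    "usa/ny/host2": {"load": 0.2},
    "usa/ca/host3": {"load": 0.5},
}

def zone_min_load(prefix: str) -> float:
    """Aggregate: minimum load over all hosts under a zone prefix."""
    return min(v["load"] for k, v in hosts.items() if k.startswith(prefix))

# A client picking "the best server" reads the zone summary rather
# than contacting every host -- that is what keeps the scheme scalable.
print(zone_min_load("usa/ny"))  # summary for the ny zone -> 0.2
print(zone_min_load("usa"))     # rolled-up summary one level higher -> 0.2
```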
29
A glimpse inside a data center
front-end applications, web sites, web services
routers
Pub-sub combined with point-to-point communication
technologies like TCP
legacy systems
30
Cornell QuickSilver platform in a datacenter
To send an update, we not only need to find the
cluster, but also to initiate some form of
replication protocol: a multicast, chain update,
1SR transaction, etc. Notice the potentially
huge number of replication groups: the selected
technology must not only be fault-tolerant and
fast, it also needs to scale in the number of
distribution patterns, a dimension as yet
unexplored by the research community and
overlooked in most products!
To send a query, a client needs a way to map to the
appropriate partition of the target service, and
then to locate a suitable representative of the
appropriate cluster.
System administrators will need a way to monitor
the state of all these services. This
hierarchical database is a good match for
Astrolabe, an example of a P2P solution Cornell
has been exploring. They also need a way to
update various control parameters at what may be
tens of thousands of locations. The resulting
scalable reliable multicast problem is also one
Cornell has looked at recently.
Services are hosted at data centers but
accessible system-wide
Data center A
Data center B
The best hope for dealing with legacy components is
to somehow wrap them in a software layer
designed to integrate them with the monitoring
and control infrastructure and bring autonomic
benefits to bear on them where practical. By
intercepting inputs or replicating checkpoints,
we may be able to harden these to some degree.
Query source
Update source
Server pool
31
Good things?
  • We seem to have technologies that can overcome
    Internet limitations using randomized P2P gossip
  • However, Internet routing can defeat our clever
    solutions unless we know network topology
  • These have great scalability and can survive
    under stress
  • And both are backed by formal models as well as
    real code and experimental data
  • Indeed, analysis is robust too!

32
Bad things?
  • These are middleware, and the bottom line is that
    only MSFT can sell middleware!
  • The current commercial slump doesn't help; nobody
    is buying anything
  • Indeed, while everything else advances at
    "Internet speed," the Internet architecture has
    somehow gotten stuck circa 1985!
  • Is this an instance of a market failure?
  • The modern Internet: unsafe at any speed?

33
The Internet policy
  • Assumes almost everything uses TCP
  • TCP is designed to be greedy
  • Ratchet bandwidth up until congestion occurs
  • Routers are designed to drop packets
  • They use RED (Random Early Detection)
  • Throw away packets at random until TCP gets the
    point and slows down
  • Our problem?
  • We're not running TCP, and this policy penalizes
    us, although it works for TCP
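The RED policy the slide describes is simple enough to sketch. This is the classic drop-probability curve: no drops below a minimum threshold, probability rising linearly to a maximum between the thresholds, forced drops above. The threshold and max-probability values here are illustrative defaults, not figures from any particular router.

```python
def red_drop_probability(avg_queue: float,
                         min_th: float = 5.0,
                         max_th: float = 15.0,
                         max_p: float = 0.1) -> float:
    """Classic RED curve: 0 below min_th, linear ramp up to max_p
    between min_th and max_th, and 1.0 (drop everything) above max_th.
    Threshold values here are illustrative, not from a real router."""
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)

# As the average queue builds, TCP senders see occasional losses and
# back off. Traffic that does not react to loss keeps getting dropped
# at random -- the policy mismatch the slide complains about.
for q in (2, 8, 12, 20):
    print(q, red_drop_probability(q))
```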

34
Internet itself: Main weak point
  • Our hardest open problems arise in the Internet
  • Astrolabe and Bimodal Multicast don't do much for
    security
  • They need to know network topology, but the
    Internet conceals this information
  • We could perhaps use these tools to detect and
    react to a DOS attack at the application layer,
    but in fact such an attack can only be stopped in
    the network itself
  • Butler Lampson: "The Internet and the Web are
    successful precisely because they don't need to
    work very well to succeed"

35
The Internet got stuck in 1985
  • Critical Infrastructure Protection is hostage to
    a perception that the Internet is perfect!
  • Must somehow recapture the enthusiasm of the
    field and the commercial sector for evolution and
    change
  • Scalability: building massive systems that work
    really well and yet make full use of COTS
  • Awesome performance, even under stress
  • Better Internet: time for a Supernet?

36
Lagging public interest
  • An extremely serious problem
  • The Internet boomed, then it melted down
  • And we're Internet people
  • Even worse in the CIP area
  • We predicted disaster in 1996, 1999, 2000
  • "Cyberterrorists! The Internet will melt down!"
  • We're the people who keep crying wolf
  • Realistically, we can't fight this perception
  • Argues that CIP success will have to come from
    other pressures, not a direct public clamor!

37
A missing pipeline
  • Long term research
  • Fundamental questions: 10-year time horizon
  • New practical options: 5 years from products
  • Industry stakeholders ready to apply good ideas
    in real settings
  • Companies interested in ideas for new products

Researchers at Cornell
Basic needs
Researchers at SRI
Developers at the Electric Power Research
Institute
Practical needs
COTS solutions
38
Best hope?
  • Government must work with all three communities:
    CIP stakeholders, researchers, vendors
  • A tricky role: consider the MSFT initiative on
    security
  • Will MSFT trigger a wave of commercial products?
  • Or will the 800-lb gorilla just crush the whole
    market?
  • Reexamine the legal basis for "hold harmless"
    clauses that indemnify software vendors against
    damages even if products are defective through
    outright negligence
  • Growing need for military, homeland defense helps
  • But need to balance against the understandable
    inclination to keep such programs "black"

39
Conclusions
  • CIP is hostage to complacency as an undramatic
    threat slowly grows!
  • Nationally critical infrastructure is exposed to
    security and reliability problems, and this
    exposure is growing, yet is largely ignored
  • Research effort has contracted around an overly
    theoretical security community
  • Current trend is a recipe for economic stagnation:
    inadequate technology blocks new markets