NOW and Beyond - PowerPoint PPT Presentation

Provided by: DavidE2

Learn more at: https://people.eecs.berkeley.edu

1
NOW and Beyond

Workshop on Clusters and Computational Grids for
Scientific Computing
David E. Culler
Computer Science Division
Univ. of California, Berkeley
http://now.cs.berkeley.edu/

2
NOW Project Goals

Make a fundamental change in how we design and
construct large-scale systems
market reality:
50%/year performance growth => cannot allow 1-2
year engineering lag
technological opportunity:
single-chip "Killer Switch" => fast, scalable
communication
Highly integrated building-wide system
Explore novel system design concepts in this new
cluster paradigm

3
Berkeley NOW

100 Sun UltraSparcs
200 disks
Myrinet SAN, 160 MB/s
Fast comm.: AM, MPI, ...
Ether/ATM switched external net
Global OS
Self Config

4
Landmarks

Top 500 Linpack Performance List
MPI, NPB performance on par with MPPs
RSA 40-bit Key challenge
World Leading External Sort
Inktomi search engine
NPACI resource site

5
Taking Stock

Surprising successes
virtual networks
implicit co-scheduling
reactive IO
service-based applications
automatic network mapping
Surprising unsuccesses
global system layer
xFS file system
New directions for Millennium
Paranoid construction
Computational Economy
Smart Clients

6
Fast Communication

Fast communication on clusters is obtained
through direct access to the network, as on MPPs
Challenge is to make this general purpose:
system implementation should not dictate how it
can be used

7
Virtual Networks

Endpoint abstracts the notion of being "attached
to the network"
Virtual network is a collection of endpoints that
can name each other.
Many processes on a node can each have many
endpoints, each with its own protection domain.

8
How are they managed?

How do you get direct hardware access for
performance with a large space of logical
resources?
Just like virtual memory:
active portion of large logical space is bound to
physical resources

[Figure: host memory holding Process 1 through Process n, the processor (P), and NIC memory on the network interface]

9
Network Interface Support

NIC has endpoint frames
Services active endpoints
Signals misses to driver
using a system endpoint

[Figure: NIC endpoint frames, Frame 0 through Frame 7, each with transmit and receive queues; endpoint miss]
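
The virtual-memory analogy can be sketched in a few lines of Python. This is only an illustration, not the NOW firmware: the class name `EndpointCache`, the LRU replacement policy, and the frame layout are assumptions. The slides say only that the NIC holds a small number of endpoint frames, services active endpoints, and signals misses to the driver.

```python
from collections import OrderedDict

class EndpointCache:
    """Toy model of a NIC with a few endpoint frames (hypothetical names)."""

    def __init__(self, num_frames=8):
        self.num_frames = num_frames
        self.frames = OrderedDict()  # endpoint id -> frame, kept in LRU order
        self.misses = 0

    def access(self, endpoint_id):
        """Use an endpoint, binding it to a frame first if it is not resident."""
        if endpoint_id in self.frames:
            self.frames.move_to_end(endpoint_id)  # mark as most recently used
            return "hit"
        self.misses += 1  # a real NIC would signal the driver here
        if len(self.frames) >= self.num_frames:
            self.frames.popitem(last=False)  # assumed policy: evict the LRU endpoint
        self.frames[endpoint_id] = {"transmit": [], "receive": []}
        return "miss"

# Two frames, three endpoints: the active set is bound, the rest miss.
nic = EndpointCache(num_frames=2)
results = [nic.access(ep) for ep in ["a", "b", "a", "c", "b"]]
```

With only two frames, endpoint "b" is evicted when "c" arrives and misses again on its next use, just as a page would under memory pressure.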

10
Communication under Load

=> Use of networking resources adapts to demand.

11
Implicit Coscheduling

Problem: parallel programs are designed to run in
parallel => huge slowdowns with local scheduling
gang scheduling is rigid, fault prone, and
complex
Coordinate schedulers implicitly using the
communication in the program
very easy to build, robust to component failures
inherently service on-demand, scalable
Local service component can evolve.

12
Why it works

Infer non-local state from local observations
React to maintain coordination
observation        implication             action
fast response      partner scheduled       spin
delayed response   partner not scheduled   block
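
The observation/action rule above amounts to two-phase waiting: spin for a short while, then block. A minimal Python sketch follows, using threads in place of communicating processes. The function name `wait_for_reply` and the 1 ms spin limit are assumptions chosen for illustration, not values from the NOW implementation.

```python
import threading
import time

def wait_for_reply(reply_event, spin_limit_s=0.001):
    """Spin briefly for a reply; block if the partner seems descheduled."""
    deadline = time.monotonic() + spin_limit_s
    while True:
        if reply_event.is_set():
            return "spun"  # fast response: partner is probably scheduled
        if time.monotonic() >= deadline:
            break
    reply_event.wait()  # delayed response: release the processor
    return "blocked"

# Partner already replied: the waiter keeps the processor.
fast = threading.Event()
fast.set()
outcome_fast = wait_for_reply(fast)

# Partner replies much later than the spin limit: the waiter blocks.
slow = threading.Event()
threading.Timer(0.2, slow.set).start()
outcome_slow = wait_for_reply(slow)
```

No scheduler talks to any other scheduler here; each waiter reacts only to what it observes locally.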

13
Example

Range of granularity and load imbalance
spin wait 10x slowdown

14
I/O Lessons from NOW sort

Complete system on every node: powerful basis for
data intensive computing
complete disk sub-system
independent file systems
MMAP not read, MADVISE
full OS => threads
Remote I/O (with fast comm.) provides same
bandwidth as local I/O.
I/O performance is very temperamental
variations in disk speeds
variations within a disk
variations in processing, interrupts, messaging,
...

15
Reactive I/O

Loosen data semantics
e.g., unordered bag of records
Build flows from producers (e.g., disks) to
consumers (e.g., summation)
Flow data to where it can be consumed

[Figure: adaptive parallel aggregation vs. static parallel aggregation]

16
Performance Scaling

Allows more data to go to faster consumer
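
A small simulation can show why letting data flow to whichever consumer is free helps when consumers differ in speed. It is a sketch under assumed numbers: the consumer rates and record count are made up, and the function names are hypothetical.

```python
import heapq

def static_time(num_records, rates):
    """Equal shares per consumer; the slowest consumer sets the finish time."""
    share = num_records / len(rates)
    return max(share / r for r in rates)

def adaptive_time(num_records, rates):
    """Each record goes to whichever consumer becomes free first."""
    free_at = [(0.0, i) for i in range(len(rates))]  # (time free, consumer index)
    heapq.heapify(free_at)
    counts = [0] * len(rates)
    finish = 0.0
    for _ in range(num_records):
        t, i = heapq.heappop(free_at)
        done = t + 1.0 / rates[i]  # time to consume one record
        counts[i] += 1
        finish = max(finish, done)
        heapq.heappush(free_at, (done, i))
    return finish, counts

rates = [4.0, 1.0]  # records per second; made-up numbers
static_finish = static_time(100, rates)
adaptive_finish, counts = adaptive_time(100, rates)
```

In the static case the slow consumer holds half the records and dominates the run time. In the adaptive case the fast consumer ends up with most of the records.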

17
Service Based Applications

[Figure: Transcend Transcoding Proxy, with labels: service request, front-end service threads, user profile database, manager, physical processor, caches]

Application provides services to clients
Grows/Shrinks according to demand, availability,
and faults

18
On the other hand

Glunix
offered much that was not available elsewhere
interactive use, load balancing, transparency
(partial),
straightforward master-slaves architecture
millions of jobs served, reasonable scalability,
flexible partitioning
crash-prone, inscrutable, unaware,
xFS
very sophisticated: co-operative caching, network
RAID
integrated at vnode layer
never robust enough for real use
Both are hard, outstanding problems

19
Lessons

Strength of clusters comes from
complete, independent components
incremental scalability (up and down)
nodal isolation
Performance heterogeneity and change are
fundamental
Subsystems and applications need to be reactive
and self-tuning
Local intelligence simple, flexible composition

20
Millennium

Campus-wide cluster of clusters
PC based (Solaris/x86 and NT)
Distributed ownership and control
Computational science and internet systems testbed

21
Paranoid Construction

What must work for RSH, dCOM, RMI, read, ...?
A page of C to safely read a line from a socket!
=> carefully controlled set of cluster system
ops
=> non-blocking with timeout and full error
checking
even if it needs a watcher thread
=> optimistic with fail-over of implementation
=> global capability at physical level
=> indirection used for transparency must track
fault envelope, not just provide mapping
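
To give a feel for what "safely read a line from a socket" involves, here is a minimal Python sketch rather than the page of C the slide refers to. The function name `read_line`, the default timeout, and the length limit are assumptions. The point is that every failure mode (slow peer, closed peer, oversized line) is turned into an explicit error instead of an indefinite hang.

```python
import socket
import time

def read_line(sock, timeout_s=2.0, max_len=4096):
    """Read one newline-terminated line or raise; never block indefinitely."""
    deadline = time.monotonic() + timeout_s  # one deadline for the whole line
    buf = bytearray()
    while len(buf) < max_len:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("no complete line before the deadline")
        sock.settimeout(remaining)
        try:
            chunk = sock.recv(1)  # byte at a time: slow but simple
        except socket.timeout:
            raise TimeoutError("no complete line before the deadline") from None
        if not chunk:
            raise ConnectionError("peer closed the connection mid-line")
        if chunk == b"\n":
            return bytes(buf)
        buf += chunk
    raise ValueError("line exceeds max_len")

# Local demonstration over a connected socket pair.
a, b = socket.socketpair()
a.sendall(b"hello\nrest")
line = read_line(b)
a.close()
b.close()
```

Even this short version needs three distinct error paths, which is the slide's argument for a small, carefully controlled set of cluster system operations.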

22
Computational Economy Approach

System has a supply of various resources
Demand for resources revealed in price
distinct from the cost of acquiring the resources
User has unique assessment of value
Client agent negotiates for system resources on
user's behalf
submits requests, receives bids or participates
in auctions
selects resources of highest value at least cost
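
One way to read "highest value at least cost" is that the client agent picks the offer with the largest surplus, meaning the user's own value for the offer minus its asking price. The sketch below assumes that reading. The offer fields, node names, prices, and the user's valuation function are all hypothetical.

```python
def choose_offer(offers, value_of):
    """Return the offer with the largest surplus, or None if none is worthwhile."""
    best = max(offers, key=lambda o: value_of(o) - o["price"])
    if value_of(best) - best["price"] < 0:
        return None  # every offer costs more than it is worth to this user
    return best

# Hypothetical bids received from three nodes.
offers = [
    {"node": "n1", "cpus": 4, "price": 6.0},
    {"node": "n2", "cpus": 8, "price": 15.0},
    {"node": "n3", "cpus": 2, "price": 1.0},
]

# This user's private assessment of value: 2 units per CPU.
chosen = choose_offer(offers, value_of=lambda o: 2.0 * o["cpus"])
```

A different user with a different `value_of` could pick a different node from the same offers, which is how demand ends up reflected in price rather than in a single system-wide policy.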

23
Advantages of the Approach

Decentralized load balancing
according to user's perception of importance, not
the system's
adapts to system and workload changes
Creates incentive to adopt efficient modes of use
maintain resources in usable form
avoid excessive usage when needed by others
exploit under-utilized resources
maximize flexibility (e.g., migratable,
restartable applications)
Establishes user-to-user feedback on resource
usage
basis for exchange rate across resources
Powerful framework for system design
Natural for client to be watchful, proactive, and
wary
Generalizes from resources to services
Rich body of theory ready for application

24
Resource Allocation

[Figure: a stream of (partial, delayed, or incomplete) resource status information and a stream of (incomplete) client requests feed the allocator]

Traditional approach allocates requests to
resources to optimize some system utility
function
e.g., put work on least loaded, most free mem,
short queue, ...
Economic approach views each user as having a
distinct utility function
e.g., two users can exchange resources and both
end up happier!
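
A tiny worked example shows how an exchange can leave both users better off when their utility functions differ. The linear utility form, the weights, and the bundles are made-up assumptions for illustration.

```python
def utility(weights, bundle):
    """Linear utility: how much a user values a bundle of resources."""
    return sum(weights[r] * amount for r, amount in bundle.items())

# Hypothetical users: one mostly values CPU, the other mostly values memory.
cpu_user = {"cpu": 3.0, "mem": 1.0}
mem_user = {"cpu": 1.0, "mem": 3.0}

# A system-wide policy that splits everything evenly.
even = {"cpu": 2, "mem": 2}
before = (utility(cpu_user, even), utility(mem_user, even))

# The users trade one unit of memory for one unit of CPU.
after = (
    utility(cpu_user, {"cpu": 3, "mem": 1}),
    utility(mem_user, {"cpu": 1, "mem": 3}),
)
```

The total resources are unchanged, yet each user's utility rises from 8.0 to 10.0, something a single system utility function would not capture.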

25
Pricing and all that

What's the value of a CPU-minute, a MB-sec, a
GB-day?
Many iterative market schemes
raise price till load drops
Auctions avoid setting a price
Vickrey (second price sealed bid) will cause
resources to go to where they are most valued at
the lowest price
In self-interest to reveal true utility function!
Small problem: auctions are awkward for most real
allocation problems
Big problem: people (and their surrogates) don't
know what value to place on computation and
storage!
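
The Vickrey rule is short enough to write down directly: the highest sealed bid wins, and the winner pays the second-highest bid. The sketch below is a minimal single-item version. The bidder names and amounts are made up.

```python
def vickrey_auction(bids):
    """bids maps bidder -> sealed bid. Returns (winner, price paid)."""
    if len(bids) < 2:
        raise ValueError("need at least two bids")
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]  # winner pays the second-highest bid
    return winner, price

# Everyone bids their true value.
bids = {"alice": 10.0, "bob": 7.0, "carol": 4.0}
winner, price = vickrey_auction(bids)

# Alice shades her bid below Bob's: she loses a resource she valued more.
shaded_winner, shaded_price = vickrey_auction({**bids, "alice": 6.0})
```

Because the price paid does not depend on the winner's own bid, bidding below one's true value can only lose the auction, never lower the price. This is the sense in which revealing the true utility is in each bidder's self-interest.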

26
Smart Clients

Adopt the NT "everything is two-tier, at least"
UI stays on the desktop and interacts with
computation in the cluster of clusters via
distributed objects
Single-system image provided by wrapper
Client can provide complete functionality
resource discovery, load balancing
request remote execution service
Flexible applications will monitor availability
and adapt.
Higher-level services: 3-tier optimization
directory service, membership, parallel startup

27
Everything is a service

Load-balancing
Brokering
Replication
Directories
=> they need to be cost-effective or client will
fall back to self support
if they are cost-effective, competitors might
arise
Useful applications should be packaged as
services
their value may be greater than the cost of
resources consumed

28
Conclusions

We've got the building blocks for very
interesting clustered systems
fast communication, authentication, directories,
distributed object models
Transparency and uniform access are convenient,
but...
It is time to focus on exploiting the new
characteristics of these systems in novel ways.
We need to get real serious about availability.
Agility (wary, reactive, adaptive) is
fundamental.
Gronky F77 MPI and no IO codes will seriously
hold us back
Need to provide a better framework for cluster
applications