Title: The Case for Open Infrastructure Services in Java
1The Case for Open Infrastructure Services in Java
- David Culler
- Computer Science Division
- U.C. Berkeley
- www.cs.berkeley.edu/culler
- Java Grande Dinner Keynote, June 2000
2Appetizer
- Grande-scale computing is dominated by internet services
- Delivered to millions per day on well-engineered clusters over service interfaces
[Figure: clients and servers connected through the Internet]
3Opportunity: infrastructure services
- Prehistoric: DNS, IP route tables, ...
- Historic: crawl, index, search, ...
- Emerging: compose and manipulate data and services
[Figure: clients, servers, and infrastructure services connected through the Internet]
And client diversity has just begun!
4Danger: loss of distributed innovation
- PC generation of individual authoring and distribution
- vs. AT&T, IBM, AOL-scale service engineering
[Figure: clients, servers, and infrastructure services connected through the Internet]
5UCB Ninja Vision
- Open platform architecture for world-scale internet services
- receptive execution environment
- push services into the platform
- scalability and availability built-in
- service composition as a first-class programming concept
- => make it easy to author and publish high-quality services into a well-engineered infrastructure
- ...for example
6Example: Ninja Jukebox 98
- Collaborative community: anyone can add content => mp3.com, Real Jukebox, Napster
- Authentication and authorization were built in
- Jukebox 99: music similarity query engine => mongomusic.com, ...
7Sanctio: universal instant messaging
S. Gribble
[Figure: Sanctio service (cluster). AOL and ICQ clients connect via AOL protocol and ICQ protocol handlers to AOL and ICQ workers, with English-to-Spanish translation stages and a profile DDS]
8Composable, Secure Proxy Architecture for Post-PC devices
S. Ross, J. Hill
[Figure labels: Internet Services; Diverse Clients; Personal Appl; Embedded Untrusted Client; DATEK (Trust Contract); Trusted Client; https]
9Reduce value of the information
DATEK
10Example: eScience Services
Sugar MEMS simulation Service
Nodal Modeling
LAPACK Services
Netsolver
11Outline
- Call for distributed innovation of scalable, composable services
- Wandering Down the Java Garden Path
- Returning to robust building blocks and design patterns
- Postprandial thoughts
12A Structured Architecture Approach
13Guided by the CAP lemma
- Consider
- Consistency
- Availability
- Operation in the presence of network Partitions
- You may have any two of the three,
- but not all three
- Example: replicate for availability
- lose consistency upon update during partition
- or can defer the updates till healed
- or can engineer the system so no partition occurs between replicas
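The replicate-for-availability example can be sketched as a toy in Java. This is an illustration only, not Ninja code: the class `ReplicatedRegister`, its `Policy` enum, and the simulated `partitioned` flag are all hypothetical names invented here. It shows the first two choices from the slide: accept the update and lose consistency, or defer the update until the partition heals.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Toy register replicated on two nodes (hypothetical; for illustration only). */
class ReplicatedRegister {
    enum Policy { STAY_AVAILABLE, DEFER_UPDATES }

    private final Policy policy;
    private final Queue<Integer> deferred = new ArrayDeque<>();
    private int replicaA;
    private int replicaB;
    private boolean partitioned;

    ReplicatedRegister(Policy policy) {
        this.policy = policy;
    }

    /** Simulates the network between the replicas splitting or healing. */
    void setPartitioned(boolean partitioned) {
        this.partitioned = partitioned;
        if (!partitioned) {
            heal();
        }
    }

    /** A write arriving at replica A; returns true if it took effect immediately. */
    boolean write(int value) {
        if (!partitioned) {
            replicaA = value;
            replicaB = value;
            return true;
        }
        if (policy == Policy.STAY_AVAILABLE) {
            replicaA = value;      // accepted, but replica B is now stale
            return true;
        }
        deferred.add(value);       // replicas stay consistent; the update waits
        return false;
    }

    private void heal() {
        Integer pending;
        while ((pending = deferred.poll()) != null) {
            replicaA = pending;    // replay deferred updates in arrival order
        }
        replicaB = replicaA;       // bring replica B up to date
    }

    boolean consistent() { return replicaA == replicaB; }
    int readA() { return replicaA; }
    int readB() { return replicaB; }
}
```

With `STAY_AVAILABLE`, a write during a partition succeeds but `consistent()` turns false until the heal; with `DEFER_UPDATES`, the replicas never disagree but the write is not visible until the heal.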
14The Java Apple
- strong typing
- automatic memory management
- Concurrency built-in: threads and synchronized methods
- finally!
- Elegant remote access built-in: RMI
- service lookup yields service object stub
- transparent access
- Code mobility
- traditionally for pulling down applets on demand
15iSpace Execution Environment
[Figure labels: Untrusted Services; Trusted Services; Loader; Security Mgr; Ninja iSpace RMI; JVM persistent store APIs; iSpace]
16Multispace Cluster Platform
- RMI Redirector Stubs: run-time compiled RMI superstub
- stub selection policy
- fail-over,
- broadcast, multicast, fork, etc.
[Figure labels: client; iSpace]
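A fail-over stub selection policy can be approximated with a standard Java dynamic proxy. This is a sketch under stated assumptions, not the MultiSpace implementation: the talk describes run-time compiled RMI superstubs, whereas this example swaps in `java.lang.reflect.Proxy` and plain local objects as stand-ins for remote replicas. The `Echo` interface and `FailoverStub` class are hypothetical names.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.List;

/** Hypothetical service interface used only for this sketch. */
interface Echo {
    String echo(String msg);
}

/** Builds a client-side stub that tries each replica in turn (fail-over policy). */
class FailoverStub {
    static <T> T create(Class<T> iface, List<? extends T> replicas) {
        InvocationHandler handler = (proxy, method, args) -> {
            Throwable last = null;
            for (T replica : replicas) {
                try {
                    return method.invoke(replica, args);
                } catch (InvocationTargetException e) {
                    last = e.getCause();   // this replica failed; try the next one
                }
            }
            throw (last != null) ? last : new IllegalStateException("no replicas configured");
        };
        Object stub = Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] { iface }, handler);
        return iface.cast(stub);
    }
}
```

The caller holds one `Echo` reference and never sees which replica answered. Other policies from the slide (broadcast, multicast, fork) would replace the loop body rather than the calling code.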
17After the garden: Post-Prototype Reality
- Powerful, attractive, tantalizing possibilities
- see examples ...
- Didn't scale
- service concurrency
- client population
- service diversity
- Wasn't robust
- Lessons
- Thread-per-task considered harmful
- Woes of blocking interfaces
- The Transparency trap
- Versions really matter
18Java RMI: Thread-per-task services
[Figure labels: Client; Service]
- Server thread per client thread
- familiar per-task programming model, including RMI and I/O
- Socket per client JVM (or per thread, per stub!)
19The transparency trap
- Server commits thread regardless of client load
- Client places demand regardless of server concurrency
- resources proportional to blocking composition depth
- ease leads to fine-grain use of remote objects
- RMI callbacks make client a server
- lifetime and scope of remote object unlimited
- inexpressive error model (wait or RemoteException)
- serialization is costly
20Blocking Thread Non-blocking ???
- Java I/O and comm APIs are all blocking!
- need JNI for select!
Keep going to the thread well
21Study a Service test problem
- A: popularity
- L: I/O, network, or service composition depth
- task arrivals: rate A tasks/sec
- threaded server: dispatch( ) or create( )
- latency: L sec
- concurrent tasks in server: T = A x L
- task completions: rate S tasks/sec
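The concurrency relation on this slide is Little's law applied to the server: tasks in the server equal arrival rate times time spent per task. A worked instance follows; the arrival rate and thread count are illustrative assumptions, not measurements from the talk. Only L = 10 ms matches a value the slides use.

```latex
T = A \times L = 2000\ \text{tasks/s} \times 0.010\ \text{s} = 20\ \text{concurrent tasks}

S_{\max} = \frac{T_{\max}}{L} = \frac{100\ \text{threads}}{0.010\ \text{s}} = 10\,000\ \text{tasks/s}
```

The second line reads the same relation in the other direction: when each task holds a thread for L seconds, a cap on threads is a cap on completion rate.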
22Response time vs S (= T/L)
23Threads are a limited Resource
- Fix L = 10 ms; for each T, measure max A = S
- Cluster parallelism just raises the threshold
- CPU-bound tasks saturate early: focus on threads, footprint follows
- Ultra 170 and E450, Solaris 7.2, JDK 1.2.2
24Alternative: queues, events, typed msgs
- server provides bounded resources at request interface
- chooses when to assign resources to request event
- imposes load-conditioning or admission control
- client retains control of its thread
- chooses when to block
- permits negotiation protocol
- key to service composition
- queues absorb load and decouple operations
- provide non-blocking interface
- RMI as syntax sugar
Explicit request queue
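A bounded, non-blocking request interface of this kind can be sketched with a standard `ArrayBlockingQueue`. The `RequestQueue` class and its method names are hypothetical, chosen for this example; requests are plain strings to keep the sketch short.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical request interface: a bounded queue in front of a service. */
class RequestQueue {
    private final BlockingQueue<String> queue;

    RequestQueue(int capacity) {
        queue = new ArrayBlockingQueue<>(capacity);
    }

    /**
     * Non-blocking submit. The client keeps control of its thread:
     * false means the request was refused (admission control) and the
     * client decides whether to retry, back off, or go elsewhere.
     */
    boolean submit(String request) {
        return queue.offer(request);
    }

    /** The server chooses when to take the next request; null if none is waiting. */
    String next() {
        return queue.poll();
    }

    /** Queue length is directly observable, which helps load conditioning. */
    int backlog() {
        return queue.size();
    }
}
```

A refused `submit` is the hook for the negotiation protocol mentioned above: the caller learns about overload immediately instead of blocking inside a remote call.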
25Java Event-based Server
- Fixed number of threads, independent of concurrent tasks in server (A x L)
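A minimal sketch of the idea, assuming a timer event stands in for an I/O completion: each task is split into a start event and a completion event, and the latency L is spent waiting on the timer rather than holding a blocked thread. `EventServer` and its `run` method are hypothetical names; the single-thread scheduler is a standard `java.util.concurrent` executor, not the server measured in the talk.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class EventServer {
    /** Runs the given number of two-stage tasks; returns how many distinct threads did the work. */
    static int run(int tasks, long latencyMillis) {
        ScheduledExecutorService loop = Executors.newSingleThreadScheduledExecutor();
        Set<String> threadsUsed = ConcurrentHashMap.newKeySet();
        CountDownLatch done = new CountDownLatch(tasks);
        for (int i = 0; i < tasks; i++) {
            loop.execute(() -> {
                // Event 1: the task starts, then registers for its completion event.
                threadsUsed.add(Thread.currentThread().getName());
                loop.schedule(() -> {
                    // Event 2: the task completes after latency L without having held a thread.
                    threadsUsed.add(Thread.currentThread().getName());
                    done.countDown();
                }, latencyMillis, TimeUnit.MILLISECONDS);
            });
        }
        try {
            if (!done.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            loop.shutdownNow();
        }
        return threadsUsed.size();
    }
}
```

Calling `EventServer.run(200, 20)` keeps up to 200 tasks in flight at once, yet the returned count of distinct threads stays at one.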
26Event-per-task saturates gracefully
- Better and more robust performance
- Use cluster parallelism to match demand
- Decompose task into multiple events
- circulate or pipeline
- but ...
27Down side of event approach
- Lose the familiar sequential programming (plus synchronization)
- need a handler per stage of the task
- Does not naturally exploit SMP parallelism
- must pipeline multiple event handler blocks
- Blocking interfaces (or faults) cause throughput to follow 1/L in an event block!
28Hybrid, Robust building block
- Compose service as graph of task handlers
- Decouple stages of task within a node
- Replicate across cluster nodes for scale and availability
- Thread parallelism and latency tolerance within task handler block (i.e., A x L < T per node)
29Hybrid Performance
Ultra 1
- Competitive with pure event block
- small overhead due to extra threads
- Upon blocking op, throughput tracks T/L
30Four key task handler design patterns
- Wrap
- Pipeline
- Replicate
- Combine
31Wrap
- Take arbitrary piece of code
- place queue in front
- encapsulate with bounded thread pool T' < T
- => get robust service with non-blocking interface
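The wrap pattern maps closely onto a standard `ThreadPoolExecutor` with a bounded queue. The sketch below is one possible reading, not the Ninja implementation: `Wrapped` is a hypothetical helper, and returning a `CompletableFuture` is an assumption about what a non-blocking interface could look like in Java.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Hypothetical "wrap" helper: a queue and a bounded thread pool around arbitrary code. */
class Wrapped<I, O> {
    private final Function<I, O> code;
    private final ThreadPoolExecutor pool;

    Wrapped(Function<I, O> code, int threads, int queueCapacity) {
        this.code = code;
        // Fixed number of threads; the default handler rejects work when the queue is full.
        this.pool = new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity));
    }

    /** Returns at once; the future completes when a pool thread has run the wrapped code. */
    CompletableFuture<O> submit(I input) {
        try {
            return CompletableFuture.supplyAsync(() -> code.apply(input), pool);
        } catch (RejectedExecutionException e) {
            // Load conditioning: report overload to the caller instead of blocking it.
            CompletableFuture<O> rejected = new CompletableFuture<>();
            rejected.completeExceptionally(e);
            return rejected;
        }
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

The wrapped code may block freely; only the pool's threads are tied up, and callers see either a pending future or an immediate rejection.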
32Wrap (thread-per-task server)
- Get robust hybrid task handler with T/L tolerance
- Preserve conventional task sequencing
- Building block for composed services
33Pipeline
- Decouple stages within task handler across multiple task handlers
- Wrapped blocking call is natural boundary
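A two-stage pipeline can be sketched with one executor per stage, so each stage has its own queue and its own thread budget. `TwoStagePipeline`, the stage names, and the sleeping `blockingStore` stand-in are hypothetical. Note one simplification: `Executors.newFixedThreadPool` uses an unbounded queue, so a production stage would bound it as in the wrap pattern.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class TwoStagePipeline {
    // Stage 1 is CPU-only, so one thread suffices in this sketch.
    private final ExecutorService parseStage = Executors.newFixedThreadPool(1);
    // Stage 2 surrounds a blocking call, so it gets more threads to tolerate latency.
    private final ExecutorService storeStage = Executors.newFixedThreadPool(4);

    /** A request flows through both stages; the blocking call is the stage boundary. */
    CompletableFuture<String> handle(String request) {
        return CompletableFuture
                .supplyAsync(() -> request.trim().toUpperCase(), parseStage)
                .thenApplyAsync(TwoStagePipeline::blockingStore, storeStage);
    }

    /** Stand-in for a blocking operation such as a disk write (hypothetical). */
    private static String blockingStore(String parsed) {
        try {
            Thread.sleep(5);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "stored:" + parsed;
    }

    void shutdown() {
        parseStage.shutdown();
        storeStage.shutdown();
    }
}
```

While the store threads are blocked, the parse thread keeps working on later requests, and the two thread counts can be tuned separately.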
34Why Pipeline?
- Functional parallelism across stages
- when thread blocks in one...
- Functional parallelism across processors
- Functional parallelism across nodes
- Increase locality (cache, VM, TLB, ...) within node
- tend to perform operation (stage) on convoy of tasks
- Limit number of threads devoted to low concurrency operation
- ex: file system can only handle 40-50 concurrent write requests, so this limits useful T
- additional threads can be applied to remainder of stage
35Replicate
- Scale throughput across nodes
- Provide fault isolation boundary
- Mediate thread-pool bottleneck within node
36Combine
- Two task handlers share pool and queue
- Common use is before/after wrapped call
- Avoid wasting threads
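The before/after case can be sketched by running both handlers on the same executor, so they share one thread pool and one queue, while the wrapped call in the middle keeps its own pool. `CombinedHandlers` and the string-tagging handlers are hypothetical, and the pool sizes are arbitrary.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class CombinedHandlers {
    // One pool and one queue serve both the "before" and the "after" handler.
    private final ExecutorService shared = Executors.newFixedThreadPool(2);
    // The wrapped (possibly blocking) call keeps its own pool.
    private final ExecutorService wrappedCall = Executors.newFixedThreadPool(2);

    CompletableFuture<String> handle(String request) {
        return CompletableFuture
                .supplyAsync(() -> "pre(" + request + ")", shared)     // before handler
                .thenApplyAsync(s -> "call(" + s + ")", wrappedCall)   // wrapped call
                .thenApplyAsync(s -> "post(" + s + ")", shared);       // after handler, same pool
    }

    void shutdown() {
        shared.shutdown();
        wrappedCall.shutdown();
    }
}
```

Without combining, the two light handlers would each need a pool of their own, and those threads would mostly sit idle.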
37A Prescription
- Well-conditioned node
- Wrap to introduce load conditioning
- Pipeline to avoid wasting threads at bottlenecks
- Pipeline to enhance locality
- Available Service
- Replicate for Fault Tolerance
- Scaling
- Replicate to meet concurrency demand
- Tuning
- Combine to limit threads per node
- Pipeline for functional specialization
38Ninja vSPACE design
- Each blocking interface is wrapped
- Service described by collection of task handler modules
- Each module implements a set of task types
- includes completion events
- module clones are replicated on demand
- Most task handlers are state free
- Persistent state provided by DDS
- Explicit queues are the fundamental means of introspection
39Example: Hash Table Distributed Data Structure
Clustered Service
Distributed hash table API
Redundant, low-latency, high-throughput network
System Area Network
Single-node durable hash table
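The layering above (one hash table API in front of many single-node tables) can be illustrated with a toy that partitions keys across bricks by hash and writes each key to more than one brick. This is a sketch under heavy assumptions: `ToyDistributedHashTable` is a hypothetical class, in-memory maps stand in for the durable single-node tables, and it omits the network, the replica consistency protocol, and recovery of a failed brick.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ToyDistributedHashTable {
    private final List<Map<String, String>> bricks = new ArrayList<>();
    private final boolean[] alive;
    private final int replicas;

    ToyDistributedHashTable(int brickCount, int replicas) {
        if (brickCount < 1 || replicas < 1 || replicas > brickCount) {
            throw new IllegalArgumentException("need 1 <= replicas <= brickCount");
        }
        for (int i = 0; i < brickCount; i++) {
            bricks.add(new HashMap<>());   // stand-in for a single-node durable hash table
        }
        this.alive = new boolean[brickCount];
        Arrays.fill(alive, true);
        this.replicas = replicas;
    }

    /** The first brick responsible for a key; its replicas are the following bricks. */
    int homeBrick(String key) {
        return Math.floorMod(key.hashCode(), bricks.size());
    }

    void put(String key, String value) {
        for (int i = 0; i < replicas; i++) {
            int b = (homeBrick(key) + i) % bricks.size();
            if (alive[b]) {
                bricks.get(b).put(key, value);
            }
        }
    }

    /** Reads from the first live replica of the key's partition. */
    String get(String key) {
        for (int i = 0; i < replicas; i++) {
            int b = (homeBrick(key) + i) % bricks.size();
            if (alive[b]) {
                return bricks.get(b).get(key);
            }
        }
        throw new IllegalStateException("all replicas of this partition are down");
    }

    /** Simulates a brick failure. */
    void kill(int brick) {
        alive[brick] = false;
    }
}
```

Callers see an ordinary put/get interface; partitioning and replication stay behind the API, which is what lets most task handlers remain state free.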
40DDS Hash Table Brick Design
[Figure: DDS brick layering. Labels: distributed hashtable; RPC skeletons; single-node HT; buffer cache; I/O cores (disk, network); file system / network stack; raw disk; operating system. Shown for an ideal I/O core and a pragmatic I/O core]
41Scalable Throughput
42Robust under load
43Fault and Recovery
[Figure annotations: three nodes; one dies; recover start; recover done; recovered node cold; garbage collection]
44Dessert thoughts
- Performance and efficiency on Java is critical first step, but cannot stay in MPP mode
- Huge Opportunity
- distributed innovation of widely used services (with I/O)
- service composition as new level of programming
- Need to deal with resource containment, load, errors, versions and coupling from the beginning
- events, queues, typed msgs => managed RMI
- Event driven execution (encapsulating threads) is exciting: opens a rich set of questions
- expressiveness, synthesis
- introspection, scheduling, concurrency control
- debugging
45Where to go for more
- http://ninja.cs.berkeley.edu
- A Design Framework for Highly Concurrent Systems, Matt Welsh, Steven Gribble, Eric Brewer, and David Culler.
- Scalable, Distributed Data Structures for Internet Service Construction, Steven Gribble, Eric Brewer, Joseph Hellerstein, and David Culler.
- A Security Architecture for the Post-PC World, S. Ross, J. Hill, M. Chen, D. Culler, A. Joseph, E. Brewer.
- The MultiSpace: an Evolutionary Platform for Infrastructural Services, Steven Gribble, Matt Welsh, Eric Brewer, and David Culler.
46Backup: Mobility not enough
- RMI names classes / interfaces in the registry
- which class do you get?
- Class path management nightmare
- Must maintain source web server
- distinct services may need distinct instances
- service name != class name
- versioning is essential
- use renaming to allow multiple versions within VM
- service publication expresses entire dependence set