Title: Emergence of Extreme Networked Devices
1Emergence of Extreme Networked Devices
- David Culler
- Computer Science Division
- U.C. Berkeley
- www.cs.berkeley.edu/culler
USC, Feb 28, 2001
2The Expanding Computing Spectrum
- Servers
- Workstations
- Personal Computers
3Convergence at the middle
- Common platform
- powerful microproc (choice of 3), DRAM (3), disk (2)
- deep I/O hierarchy, OS layering
- Common system abstraction
- collection of threads sharing a large virtual address space
- GUI orientation
- blocking interfaces
- Concurrency as threads
- Services as local call / remote thread
- RPC, RMI, DCOM, HTTP
- Ample resources easily abstracted
- open loop
- transparent allocation and usage
4Emerging Extremes
- Servers
- Workstations
- Personal Computers
5Convergence at the Extremes
- Concurrency intensive
- data streams and real-time events, not command-response
- Communications-centric
- Limited resources (relative to load)
- Huge variation in load
- population usage and physical stimuli
- robustness
- Hands-off (no UI)
- Dynamic configuration, discovery
- Self-organized and reactive control
- Similar execution model
- event driven,
- components
- Complementary roles
- tiny semi-autonomous devices empowered by infrastructure
- infrastructure services connected to the real world
6Outline
- Emerging Extremes
- Robust Framework for Open Scalable Internet Services
- the garden path: threads to non-blocking I/O and RPC
- structured event-driven alternatives
- controllers within a graph of stages
- Tiny OS for Wireless Embedded Sensor Networks
7Ninja Open Infrastructure Services
[Figure: Clients and Servers connected through The Internet, with Infrastructure Services]
- systematic framework for building robust, composable services
- focus here on execution model
8Variation in Load: slashdot effect
USGS Web Server Traffic: October 16, 1999 Hector Mine Earthquake
http://pasadena.wr.usgs.gov/stans/slashdot.html
9Inherent Variation: Gnutella Router Traffic
Matt Welsh
10Toward Robust Behavior Under Load
- Traditional Capacity Planning
- over-provision by a factor over typical load (increasing, 4 -> 10-15)
- cluster-based replication is, at least, cost-effective
- peaks occur when it matters most
- Content-distribution
- potential replication proportional to use
- Still want graceful degradation when an instance is overloaded
11Threads as THE building block
- Freely compose these two primitives
- But ... threads are a limited resource
12Service test problem
- A: popularity
- L: I/O, network, or service composition depth
task arrivals: rate A tasks/sec
Threaded server
dispatch( ) or create( )
latency: L sec
concurrent tasks in server: T = A x L
task completions: rate S tasks/sec
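The relation between T, A and L on this slide is Little's law applied to the server. A minimal sketch of the arithmetic follows; the function names and the numeric values are illustrative assumptions, not measurements from the talk.

```python
def concurrent_tasks(arrival_rate, latency):
    """T = A x L: tasks in flight inside the server at steady state."""
    return arrival_rate * latency


def max_arrival_rate(thread_limit, latency):
    """Largest A that a thread-per-task server can sustain with thread_limit threads."""
    return thread_limit / latency


# Illustrative values only (assumptions, not data from the slides).
A = 1000.0   # task arrivals per second
L = 0.010    # seconds of latency per task
print(concurrent_tasks(A, L))      # threads needed to keep up with A
print(max_arrival_rate(64, L))     # arrival rate at which 64 threads saturate
```

Read the other way, a fixed thread budget caps the arrival rate the server can absorb, which is the point the following slides measure.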
13Threads are a limited resource
- Fix L = 10 ms; for each T, measure max A = S
- Cluster parallelism just raises the threshold
Ultra 170 and E450, Solaris 7.2, JDK 1.2.2
14Alternative: queues, events, typed msgs
- single-threaded server
- queues absorb load and decouple operations
- svr chooses when to assign resources to request event
- bounded resources at request interface
- impose load-conditioning or admission control
- provide non-blocking interface
- client retains control of its thread
- chooses when to block
- permits negotiation protocol
- key to service composition
Explicit request queue
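The explicit request queue described on this slide can be sketched as follows. This is a minimal illustration built on Python's standard `queue` module; `RequestQueue`, `submit`, `next_request` and `serve_all` are hypothetical names, and rejecting a request when the queue is full is only one possible admission-control policy.

```python
import queue


class RequestQueue:
    """Bounded request queue with a non-blocking client interface."""

    def __init__(self, capacity):
        self._q = queue.Queue(maxsize=capacity)

    def submit(self, request):
        """Client side: return True if accepted, False if rejected.

        Never blocks, so the client retains control of its thread and
        can decide for itself whether to retry, wait, or give up.
        """
        try:
            self._q.put_nowait(request)
            return True
        except queue.Full:
            return False

    def next_request(self):
        """Server side: return the next request, or None if the queue is empty."""
        try:
            return self._q.get_nowait()
        except queue.Empty:
            return None


def serve_all(rq, handler):
    """Single-threaded server loop: handle queued requests one at a time."""
    results = []
    while True:
        request = rq.next_request()
        if request is None:
            return results
        results.append(handler(request))
```

Because the queue is bounded, overload shows up as an explicit rejection at the request interface rather than as an unbounded pile of blocked threads.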
15Event-per-task saturates gracefully
- Better and more robust performance
- Use cluster parallelism to match desired thruput
- Can decompose task into multiple events
- circulate or pipeline
- but ...
16Down-side of monolithic event approach
- Lose familiar programming model
- thread steps through each stage in the task
- need a handler per stage
- Difficult software engineering
- composing and scheduling
- Does not naturally exploit SMP parallelism
- must pipeline multiple event handler blocks
- Whenever the thread blocks, the whole structure stalls
- throughput is bounded by 1/L
17State-of-practice: bounded thread pool
task arrivals rate A tasks / sec
- Only allow K threads to accept connections
- some OSs have fixed hard limit
- Additional requests time-out
- choose K < T at max throughput
- choose K large enough to hide L
Threaded server
task completions rate S tasks / sec
18A third road
- Building block
- bounded internal thread pool
- queue-based interface
- subset of task stages
- request event processing in familiar style
- can chunk request stream for efficiency
- Compose Service as a graph of stages
- modularity
- stages can be replicated across nodes
- Stage control loop manages threads
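A stage as described above (a queue-based interface in front of a bounded internal thread pool, composed into a graph) might look like the sketch below. It is an illustration only: `Stage`, `enqueue`, `drain` and the fixed pool size are assumptions, and it omits the control loop that would resize the pool.

```python
import queue
import threading


class Stage:
    """One stage: a bounded event queue served by a small, fixed thread pool."""

    def __init__(self, name, handler, num_threads=2, capacity=64, downstream=None):
        self.name = name
        self.handler = handler          # ordinary blocking-style code is fine here
        self.downstream = downstream    # next stage in the graph, if any
        self.inbox = queue.Queue(maxsize=capacity)
        for _ in range(num_threads):
            threading.Thread(target=self._run, daemon=True).start()

    def enqueue(self, event):
        """Non-blocking interface: returns False when the stage is saturated."""
        try:
            self.inbox.put_nowait(event)
            return True
        except queue.Full:
            return False

    def _run(self):
        while True:
            event = self.inbox.get()
            try:
                result = self.handler(event)
                if self.downstream is not None:
                    # The result is dropped if the next stage is full; a real
                    # service would apply some load-conditioning policy here.
                    self.downstream.enqueue(result)
            finally:
                self.inbox.task_done()

    def drain(self):
        """Block until every event accepted so far has been handled."""
        self.inbox.join()
```

A two-stage service is then a matter of wiring, for example `sink = Stage("sink", print, num_threads=1)` followed by `upper = Stage("upper", str.upper, downstream=sink)`; draining `upper` and then `sink` waits for the whole pipeline.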
19Well-conditioned Service Architecture
- Abstract System I/F as non-blocking stages
- careful engineering at the system interface
- Describe stages as modular state machines
- Associate thread manager with stages
- Build Service as composition of stages
- can be dynamic
Matt Welsh
20Example: HTTP throughput
SPECweb99 static workload, 4 classes
21Response Time
22Reactive Stage Thread Pool Sizing
Two packet types: ping (fast), query (20 ms delay)
clients
Thread Governor
- observes queue length
- over threshold => add threads
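The governor's rule (watch the queue length, add a thread when it exceeds a threshold) can be sketched as a small controller. The class name, the `observe` method and the upper bound `max_threads` are assumptions for illustration; the slide does not say how the pool is capped or whether it ever shrinks.

```python
class ThreadGovernor:
    """Grows a stage's thread pool when its queue backs up."""

    def __init__(self, threshold, max_threads, initial_threads=1):
        self.threshold = threshold        # queue length that triggers growth
        self.max_threads = max_threads    # assumed cap, to keep the pool bounded
        self.threads = initial_threads

    def observe(self, queue_length):
        """Called periodically with the current queue length.

        Returns the pool size the stage should now be running.
        """
        if queue_length > self.threshold and self.threads < self.max_threads:
            self.threads += 1
        return self.threads
```

With a threshold of 10 and a cap of 3, repeated observations of a long queue raise the pool from 1 to 3 threads one step at a time and then hold it there.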
23Scalable Persistent Data Structures
Clustered Service
Steve Gribble
24Scalable Throughput
25Robust under load
26Outline
- Emerging Extremes
- Robust Framework for Open Scalable Internet Services
- modular generalized state machines
- constrained use of threads
- thread-manager as controller
- Tiny OS for Wireless Embedded Sensor Networks
- characteristics of the other extreme
- current platforms
- events and primitive threads in a graph of components
- exploring open problems
27Emerging Microscopic Devices
- CMOS trend is not just Moore's law
- Micro Electro-Mechanical Systems (MEMS)
- a rich array of sensors is becoming cheap and tiny
- Imagine all sorts of chips that are connected to the physical world and to cyberspace!
28Characteristics of Network Sensors
- Small physical size and low power consumption
- Concurrency-intensive operation
- flow-thru, not wait-command-respond
- Limited Physical Parallelism and Controller Hierarchy
- primitive direct-to-device interface
- Diversity in Design and Usage
- application specific, not general purpose
- huge device variation
- => efficient modularity
- => migration across HW/SW boundary
- Robust Operation
- numerous, unattended, critical
- => narrow interfaces
29Current Example
- 1" x 1.5" motherboard
- ATMEL 4 MHz, 8-bit MCU, 512 bytes RAM, 8K pgm flash
- 900 MHz Radio (RF Monolithics), 10-100 ft. range
- ATMEL network pgming assist
- Radio Signal strength control and sensing
- I2C EPROM (logging)
- Base-station ready
- stackable expansion connector
- all ports, I2C, pwr, clock
- Several sensor boards
- basic protoboard
- tiny weather station (temp, light, hum, press)
- vibrations (acc, temp, ...)
- accelerometers
- magnetometers
30Basic Power Breakdown
- But what does this mean?
- Lithium Battery runs for 35 hours at peak load and years at minimum load!
- three orders of magnitude difference!
- A one byte transmission uses the same energy as approx 11000 cycles of computation.
- Idleness is not enough, sleep!
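The figures on this slide can be turned into a rough lifetime estimate. The sketch below assumes that "three orders of magnitude" means exactly a 1000x ratio between peak and minimum draw, and that a node alternates between only those two levels; both are simplifying assumptions, and the function names are illustrative.

```python
PEAK_LIFETIME_HOURS = 35.0     # from the slide: battery life at peak load
PEAK_TO_MIN_RATIO = 1000.0     # assumption: "three orders of magnitude" taken as 1000x
CYCLES_PER_BYTE_SENT = 11000   # from the slide (approximate)


def lifetime_hours(duty_cycle):
    """Battery life when a fraction duty_cycle of the time is spent at peak load
    and the rest at minimum load."""
    relative_draw = duty_cycle + (1.0 - duty_cycle) / PEAK_TO_MIN_RATIO
    return PEAK_LIFETIME_HOURS / relative_draw


def tx_cost_in_cycles(num_bytes):
    """Energy of sending num_bytes, expressed as equivalent compute cycles."""
    return num_bytes * CYCLES_PER_BYTE_SENT


print(lifetime_hours(1.0))            # always at peak load
print(lifetime_hours(0.0) / 24 / 365) # always at minimum load, in years
print(lifetime_hours(0.01) / 24)      # awake 1% of the time, in days
```

Under these assumptions, always-asleep works out to roughly 35,000 hours (about four years), and even a 1% duty cycle cuts that to a few months, which is why sleeping rather than idling matters.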
31An Operating System for Tiny Devices?
- Traditional approaches
- command processing loop (wait request, act, respond)
- monolithic event processing
- bring full thread/socket POSIX regime to platform
- Alternative
- provide framework for concurrency and modularity
- never poll, never block
- interleaving flows, events, energy management
- allow appropriate abstractions to emerge
32Tiny OS Concepts
- Scheduler + Graph of Components
- constrained two-level scheduling model: threads + events
- Component
- Commands
- Event Handlers
- Frame (storage)
- Tasks (concurrency)
- Constrained Storage Model
- frame per component, shared stack, no heap
- Very lean multithreading
- Efficient Layering
[Figure: Messaging Component with Internal State and an internal thread. Interface labels (Events, Commands): send_msg(addr, type, data), power(mode), init, TX_packet(buf), Power(mode), TX_packet_done (success), init, RX_packet_done (buffer)]
33Application Component Graph
[Figure: component graph. Components: Route map, router, sensor appln, Active Messages, Radio Packet, Serial Packet, Temp, photo, UART, Radio byte, ADC, clocks, RFM. Layer labels: application, packet, byte, bit. A SW/HW boundary is marked.]
Example: ad hoc, multi-hop routing of photo sensor readings
34TOS Execution Model
- commands request action
- ack/nack at every boundary
- call cmd or post task
- events notify occurrence
- HW intrpt at lowest level
- may signal events
- call cmds
- post tasks
- Tasks provide logical concurrency
- preempted by events
- Migration of HW/SW boundary
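The command / event / task split above can be sketched in a few lines. This is a plain Python model for illustration, not TinyOS code: `Scheduler`, `PacketComponent` and their methods are hypothetical names, and preemption of tasks by hardware events is not modeled.

```python
from collections import deque


class Scheduler:
    """FIFO task queue; each task runs to completion before the next starts."""

    def __init__(self):
        self.tasks = deque()

    def post(self, task):
        self.tasks.append(task)
        return True

    def run(self):
        while self.tasks:
            self.tasks.popleft()()


class PacketComponent:
    """Hypothetical component: accepts a send command, later signals a done event."""

    def __init__(self, sched, on_send_done):
        self.sched = sched
        self.on_send_done = on_send_done  # event handler wired in by the layer above
        self.busy = False                 # frame: fixed, per-component state
        self.buf = None

    def send(self, buf):
        """Command: returns ack (True) or nack (False) immediately, never blocks."""
        if self.busy:
            return False
        self.busy, self.buf = True, buf
        self.sched.post(self._transmit)   # defer the real work to a task
        return True

    def _transmit(self):
        """Task: a real component would clock the bits out under interrupt."""
        self.busy = False
        self.on_send_done(True)           # event: notify the layer above
```

Calling `send` twice before the scheduler runs gives an ack followed by a nack, and the completion event only fires once `Scheduler.run` executes the posted task.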
[Figure labels: data processing, application comp, message-event driven, active message, event-driven packet-pump, crc, event-driven byte-pump, encode/decode, event-driven bit-pump]
35Dynamics of Events and Threads
bit event => end of byte => end of packet => end of msg send
thread posted to start send next message
bit event filtered at byte layer
radio takes clock events to detect recv
36Storage Breakdown (C Code)
3450 B code, 226 B data
37Empirical Breakdown of Effort
- can take apart time, power, space,
- 50 cycle thread overhead, 10 cycle event overhead
38Working Across Levels
- Encoding
- DC-balanced SECDED
- Proximity detection
- signal strength or error rates
- Low power listening
- Fair and efficient network access
- Security
- Tiny virtual machines
- Larger challenges
39Low-Power Listening
- Costs about as much to listen as to xmit, even when nothing is received
- Only way to save power is to turn radio off when there is nothing to hear
- Can turn radio on/off in about 1 bit time
- Can detect transmission at cost of 2 bit times
- Small sub-msg recv sampling (10x)
- Application-level synchronization rendezvous to determine when to sample (10x)
[Figure: Xmit and Recv timelines showing sleep, preamble, message]
Jason Hill
40Managing local contention
- Highly correlated traffic, no collision detection
- sensor events and beacons
- Randomize initial listen period, simple backoff
Channel utilization: 70%
Throughput per node is fair
Alec Woo
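The contention scheme named on this slide (a randomized initial listen period followed by simple backoff) could be sketched as below. The window sizes, the attempt limit and the `channel_busy` probe are all assumptions made for illustration; the slide gives no parameters.

```python
import random


def try_send(channel_busy, rng, initial_window=16, backoff_window=32, max_attempts=5):
    """Return the slot at which the node transmits, or None if it gives up.

    channel_busy(slot) -> bool is a hypothetical carrier-sense probe.
    """
    # Randomized initial listen period: correlated senders (for example, nodes
    # that all saw the same sensor event) start at different times.
    slot = rng.randrange(initial_window)
    for _ in range(max_attempts):
        if not channel_busy(slot):
            return slot
        # Simple backoff: wait a further random number of slots and listen again.
        slot += 1 + rng.randrange(backoff_window)
    return None
```

For example, `try_send(lambda slot: False, random.Random(0))` transmits somewhere inside the initial window, while a channel that is always busy makes the node give up after `max_attempts` tries.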
41Managing aggregate contention
- Hidden nodes between each pair of levels
- CSMA is not enough
- RTS/CTS acks too costly (power, BW)
- P(msg-to-base) drops rapidly with hops
- Investment in packet increases with distance
- Local rate control to approx. fairness
- Priority to forwarding, adjust own data rate
- Additive increase, multiplicative decrease
- Listen for retransmission as ack
- ½ of packets get through 4 levels out
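The local rate control listed above (additive increase, multiplicative decrease, with an overheard retransmission serving as the ack) can be sketched as a single update rule. The step sizes and rate limits below are illustrative assumptions, as is the function name.

```python
def adjust_rate(rate, heard_retransmission, increase=0.1, decrease=0.5,
                min_rate=0.1, max_rate=10.0):
    """Update a node's own data rate (packets/sec) after one of its packets.

    heard_retransmission: True if the parent was overheard forwarding the
    packet, which is treated as an implicit ack.
    """
    if heard_retransmission:
        return min(max_rate, rate + increase)   # additive increase
    return max(min_rate, rate * decrease)       # multiplicative decrease
```

Only the node's own originated traffic is throttled this way; forwarded traffic keeps priority, as the slide states.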
42Authentication / Security
- RC5 shared-key crypto in 1.7 kb
- Modified TESLA protocol for confidential authenticated base broadcast
- Easy to compromise a node, but hard to get most of them
43What's in a program?
- HW + collection of components supports a space of applications
- Application-Specific Virtual Machine
- code-density, not portability
- small byte-code interpreter component
- accepts clock and message event capsules
- Hides split-phase operations below interpreter
- Capsules define specific query / logic
- filter criteria
- diffusion primitives
- ...
44Thoughts about robust Algorithms
- Active Dynamic Route Determination
- When a node hears a new route beacon: record parent, retransmit from SELF, ignore additional messages for the epoch
- Radio cell structure very unpredictable
- Builds and maintains good breadth-first forest
- Each node maintains O(1) state
- Fundamental operation is pruning retransmission
- Monotonic variables
- Message signature caches
- Takes energy to retain structure
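The beacon rule above (record the first sender heard as parent, rebroadcast as self, ignore further beacons in the same epoch) can be simulated centrally to see the tree it builds. The sketch assumes reliable, symmetric links and in-order delivery, which real radio cells do not provide; `build_routes` and the `neighbors` map are illustrative names.

```python
from collections import deque


def build_routes(neighbors, base, epoch=1):
    """Simulate one beacon epoch.

    neighbors: dict mapping each node to the nodes within its radio range.
    Returns a dict mapping each reached node to its parent (None for the base).
    """
    parent = {base: None}
    seen_epoch = {base: epoch}
    pending = deque([base])              # nodes that still have to rebroadcast
    while pending:
        sender = pending.popleft()
        for node in neighbors[sender]:
            if seen_epoch.get(node) == epoch:
                continue                 # already heard a beacon this epoch: ignore
            seen_epoch[node] = epoch
            parent[node] = sender        # the only routing state a node keeps
            pending.append(node)         # node retransmits with itself as sender
    return parent
```

With idealized delivery this yields a breadth-first tree rooted at the base station; each node stores only its parent and the last epoch seen, matching the O(1) state noted on the slide.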
45Larger Challenges
- Programming support for systems of generalized state machines
- language, debugging, verification
- Programming the unstructured aggregate
- Resilient Aggregators
- Understanding how an extreme system is behaving and what its envelope is
- adversarial simulation
46Tides of Innovation
[Figure: Log R vs. Time. Labels: Innovation, Integration; Mainframe, Minicomputer, Personal Computer / Workstation / Server, ??; 2/2001]
47Summary
- The extremes of the computing spectrum present tremendous opportunities for innovation
- Systems challenges
- variation in load, unpredictability, hands-off embedded operation
- limited resources, concurrency intensive, power constrained
- self-organizing and adaptive
- More in common with each other than with the average devices
- New kinds of software system structures
- modular event-driven structures
- intrinsic feedback and control
49Historical Perspective
- New eras of computing start when the previous era is so strong it is hard to imagine that things could be different
- mainframe -> mini
- mini -> workstation -> PC
- PC -> ???
- It is often smaller than what came before.
- Most think of the new technology as just a toy
- The new dominant use was almost completely absent.
- it is likely to come from the extremes
50Mean Response Time(A)
- closed system, but limited bandwidth
51Threaded non-blocking disk-read service
52Example Disk-read Stage