Building Network-Centric Systems - Liviu Iftode (PowerPoint PPT Presentation)

Transcript and Presenter's Notes

Title: Building Network-Centric Systems - Liviu Iftode


1
Building Network-Centric Systems - Liviu Iftode
2
Before WWW, people were happy...
[Figure: two hosts, CS.umd.EDU and CS.rutgers.EDU, running E-mail, Telnet, Emacs and NFS, connected over TCP/IP]
  • Mostly local computing
  • Occasional TCP/IP networking with low
    expectations and mostly non-interactive traffic
  • local area networks: file server (NFS)
  • wide area networks (Internet): E-mail, Telnet,
    FTP
  • Networking was not a major concern for the OS

3
One Exception: Cluster Computing
Multicomputers
Clusters of computers
  • Cost-effective solution for high-performance
    distributed computing
  • TCP/IP networking was the headache:
  • large software overheads
  • Software DSM: not a network-centric system :-(

4
The Great WWW Challenge
Web Browsing
http://www.Bank.com
TCP/IP
Bank.com
  • World Wide Web made access over the Internet easy
  • Internet became commercial
  • Dramatic increase in interactive traffic
  • WWW networking creates a network-centric system:
    the Internet server
  • performance: service more network clients
  • availability: be accessible all the time over the
    network
  • security: protect resources against network
    attacks

5
Network-Centric Systems
  • Networking dominates the operating system
  • Mobile Systems:
  • mobility-aware TCP/IP (Mobile IP, I-TCP, etc.),
    disconnected file systems (Coda),
    adaptation-aware applications for mobility
    (Odyssey), etc.
  • Internet Servers:
  • resource allocation (Lazy Receive Processing,
    Resource Containers), OS shortcuts (Scout,
    IO-Lite), etc.
  • Pervasive/Ubiquitous Systems:
  • TinyOS, sensor networks (Directed Diffusion,
    etc.), programmability (One.world, etc.)
  • Storage Networking:
  • network-attached storage (NASD, etc.),
    peer-to-peer systems (OceanStore, etc.), secure
    file systems (SFS, Farsite), etc.

6
Big Picture
  • Research sparked by various OS-Networking
    tensions
  • Shift of focus from Performance to Availability
    and Manageability
  • Networking and Storage I/O Convergence
  • Server-based and serverless systems
  • TCP/IP and non-TCP/IP protocols
  • Local area, wide-area, ad-hoc and
    application/overlay networks
  • Significant interest from industry

7
Outline
  • TCP Servers
  • Migratory-TCP and Service Continuations
  • Cooperative Computing, Smart Messages and Spatial
    Programming
  • Federated File Systems
  • Talk Highlights and Conclusions

8
Problem 1: TCP/IP is too Expensive
Breakdown of the CPU time for Apache
(uniprocessor-based Web server)
9
Traditional Send/Receive Communication
Sender path (App -> OS -> NIC):
  send(a) -> copy(a, send_buf) -> DMA(send_buf, NIC) -> send_buf is transferred over the wire

Receiver path (NIC -> OS -> App):
  interrupt -> DMA(NIC, recv_buf) -> copy(recv_buf, b) -> receive(b)
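
For concreteness, a minimal Python sketch of this copy-based model (the endpoint below is a hypothetical local test address): every send() copies the application buffer into a kernel socket buffer before DMA to the NIC, and every receive() copies kernel data back into an application buffer.

    # Minimal sketch of copy-based send/receive using ordinary sockets.
    # Each call crosses the user/kernel boundary and copies the data, which is
    # the per-byte overhead the traditional path pays on both sides.
    import socket, threading

    HOST, PORT = "127.0.0.1", 5001            # hypothetical local test endpoint
    srv = socket.create_server((HOST, PORT))  # listening socket

    def receiver():
        conn, _ = srv.accept()
        b = bytearray(16 * 1024)              # application receive buffer b
        n = conn.recv_into(b)                 # copy(recv_buf, b) after interrupt + DMA
        print("received", n, "bytes")
        conn.close()

    t = threading.Thread(target=receiver); t.start()
    a = b"x" * (16 * 1024)                    # application send buffer a
    cli = socket.create_connection((HOST, PORT))
    cli.sendall(a)                            # copy(a, send_buf), then DMA(send_buf, NIC)
    cli.close(); t.join(); srv.close()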
10
A Closer Look
11
Multiprocessor Server Performance Does not
Scale
[Chart: throughput (requests/s, 0-700) vs. offered load (connections/s, 300-750) for a uniprocessor and a dual-processor configuration]

Apache Web server 1.3.20 on 1-way and 2-way 300 MHz
Pentium II SMP, with clients repeatedly accessing
a static 16 KB file
12
TCP/IP-Application Co-Habitation
  • TCP/IP steals compute cycles and memory from
    applications
  • TCP/IP executes in kernel mode: mode-switching
    overhead
  • TCP/IP executes asynchronously:
  • interrupt processing overhead
  • internal synchronization on multiprocessor
    servers causes execution serialization
  • Cache pollution
  • Hidden service work:
  • TCP packet retransmission
  • TCP ACK processing
  • ARP request service
  • Extreme cases can compromise server performance:
  • Receive livelocks
  • Denial-of-service (DoS) attacks

13
Two Solutions
  • Replace TCP/IP with a lightweight transport
    protocol
  • Offload some/all of the TCP from host to a
    dedicated computing unit (processor, computer or
    intelligent network interface)
  • Industry: high-performance, expensive solutions
  • Memory-to-Memory Communication: InfiniBand
  • Intelligent network interface: TCP Offloading
    Engine (TOE)
  • Cost-effective and flexible solution: TCP Servers

14
Memory-to-Memory (M-M) Communication

[Figure: the traditional send/receive path goes through Application, TCP/IP, OS and NIC on both sender and receiver; with memory-to-memory (M-M) communication, data moves between memory buffers via Remote DMA, handled by the OS and NICs without the TCP/IP stack]
15
Memory-to-Memory Communication is Non-Intrusive
[Figure: RDMA_Write(a,b) transfers buffer a from the sender directly into buffer b on the receiver through the NICs; b is updated without running any receiver-side code]
Sender: low overhead
Receiver: zero overhead
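
To make the one-sided semantics concrete, a small Python model of an RDMA write into a registered memory region (a conceptual sketch only; real RDMA goes through InfiniBand/verbs hardware, and the class and function names here are invented):

    # Conceptual model of a one-sided RDMA write (illustration, not a real verbs API).
    # The receiver registers a buffer once; later writes land in it without any
    # receiver-side code running per transfer, which is why it is non-intrusive.
    class RegisteredRegion:
        def __init__(self, size):
            self.buf = bytearray(size)        # memory registered/pinned for the NIC

    def rdma_write(src: bytes, dst: RegisteredRegion, offset: int = 0):
        # In hardware this is performed by the NICs; no receiver CPU involvement.
        dst.buf[offset:offset + len(src)] = src

    b = RegisteredRegion(4096)                # receiver registers buffer b once
    a = b"payload from the sender"            # sender's buffer a
    rdma_write(a, b)                          # "a transferred into b"; b is updated
    assert bytes(b.buf[:len(a)]) == a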
16
TCP Server at a Glance
  • A software offloading architecture using existing
    hardware
  • Basic idea: Dedicate one or more computing units
    exclusively to TCP/IP
  • Compared to TOE:
  • tracks technology better: latest processors
  • flexible: adapts to changing load conditions
  • cost-effective: no extra hardware
  • Isolate application computation from network
    processing
  • Eliminate network interrupts and context switches
  • Efficient resource allocation
  • Additional performance gains (zero-copy) with
    extended socket API
  • Related work:
  • Very preliminary offloading solutions: Piglet,
    CSP
  • Socket Direct Protocol, Zero-copy TCP
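
As an illustration of the TCP Server idea of dedicating a computing unit to networking (a minimal sketch with invented names and ports, not the actual TCP Server implementation): one process is pinned to a CPU and performs all socket I/O, while the application process only exchanges requests and responses through a queue standing in for shared memory.

    # Sketch of software offloading: one process dedicated to network I/O and one
    # to application work, communicating through a queue (a stand-in for the
    # shared-memory channel of a real TCP Server).
    import multiprocessing as mp
    import os, socket, time

    def tcp_server_proc(requests, responses, cpu=0):
        # Dedicated network process: all socket I/O, copies and interrupts for
        # this connection are handled here, confined (on Linux) to one CPU.
        try:
            os.sched_setaffinity(0, {cpu})
        except (AttributeError, OSError):
            pass                              # affinity not available on this platform
        srv = socket.create_server(("127.0.0.1", 8081))
        conn, _ = srv.accept()
        requests.put(conn.recv(4096))         # hand the request to the application CPU
        conn.sendall(responses.get())         # send back the reply the application produced
        conn.close(); srv.close()

    def app_proc(requests, responses, cpu=1):
        # Application process: it never touches the network, so it sees no network
        # interrupts, mode switches or cache pollution from TCP/IP processing.
        try:
            os.sched_setaffinity(0, {cpu})
        except (AttributeError, OSError):
            pass
        req = requests.get()
        responses.put(b"HTTP/1.0 200 OK\r\n\r\n" + req)

    if __name__ == "__main__":
        q_req, q_resp = mp.Queue(), mp.Queue()
        workers = [mp.Process(target=tcp_server_proc, args=(q_req, q_resp)),
                   mp.Process(target=app_proc, args=(q_req, q_resp))]
        for w in workers: w.start()
        for _ in range(50):                   # wait until the network process listens
            try:
                c = socket.create_connection(("127.0.0.1", 8081)); break
            except ConnectionRefusedError:
                time.sleep(0.1)
        c.sendall(b"GET /index.html\r\n"); print(c.recv(4096)); c.close()
        for w in workers: w.join()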

17
Two TCP Server Architectures
  • TCP Servers for Multiprocessor Servers

    [Figure: the server application and the TCP Server run on separate CPUs of an SMP and communicate through shared memory; TCP/IP processing is confined to the TCP Server CPUs]

  • TCP Servers for Cluster-based Servers

    [Figure: the server application node and the TCP Server node are connected by an M-M network; the TCP Server node runs TCP/IP toward the clients]
18
Where to Split TCP/IP Processing? (How much to
offload?)

APPLICATION / SYSTEM CALLS  (Application Processors)

SEND: copy_from_application_buffers -> TCP_send ->
IP_send -> packet_scheduler -> setup_DMA -> packet_out

RECEIVE: copy_to_application_buffers <- TCP_receive <-
IP_receive <- software_interrupt_handler <-
interrupt_handler <- packet_in

(the lower layers of both paths run on the TCP Servers; the split point determines how much is offloaded)
19
Evaluation Testbed
  • Multiprocessor Server:
  • 4-way 550 MHz Intel Pentium II system running the
    Apache 1.3.20 web server on Linux 2.4.9
  • NIC: 3Com 996-BT Gigabit Ethernet
  • Used sclients [Banga 97] as the client program

20
Comparative Throughput
Clients issue file requests according to a web
server trace
21
Adaptive TCP Servers
  • Static TCP Server configuration:
  • Too few TCP Servers can lead to network
    processing becoming the bottleneck
  • Too many TCP Servers lead to degraded
    performance for CPU-intensive applications
  • Dynamic TCP Server configuration (sketched below):
  • Monitor the TCP Server queue lengths and system
    load
  • Dynamically add or remove TCP Server processors
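
A minimal sketch of such an adaptation policy (the thresholds, structure and names are assumptions for illustration, not the measured policy from the talk):

    # Illustrative adaptation loop for a dynamic TCP Server configuration.
    from dataclasses import dataclass

    @dataclass
    class ServerStats:
        tcp_queue_len: float      # average length of the TCP Server request queues
        app_cpu_util: float       # utilization of the application processors (0..1)

    def adapt(num_tcp_cpus: int, total_cpus: int, s: ServerStats,
              q_hi: float = 64.0, q_lo: float = 8.0, app_busy: float = 0.9) -> int:
        """Return the new number of CPUs dedicated to TCP Servers."""
        if s.tcp_queue_len > q_hi and num_tcp_cpus < total_cpus - 1:
            return num_tcp_cpus + 1           # network processing is the bottleneck: grow
        if s.tcp_queue_len < q_lo and s.app_cpu_util > app_busy and num_tcp_cpus > 1:
            return num_tcp_cpus - 1           # application is starved for CPU: shrink
        return num_tcp_cpus                   # otherwise keep the current split

    # Example: on a 4-way SMP with 2 TCP Server CPUs and long queues, grow to 3.
    print(adapt(2, 4, ServerStats(tcp_queue_len=120.0, app_cpu_util=0.4)))  # -> 3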

22
Next Target: Storage Networking
  • The Storage Networking dilemma:

TCP or not TCP?
M-M Communication (InfiniBand)
TCP Offloading
iSCSI (SCSI over IP)
DAFS (Direct Access File System)
  • non-TCP/IP solutions require new wiring or
    tunneling over IP-based Ethernet networks
  • TCP/IP solutions require TCP offloading

23
Future Work: TCP Servers + iSCSI

[Figure: the TCP Server CPUs run iSCSI over TCP/IP and connect to SCSI storage; the server application runs on separate CPUs and communicates with the TCP Server through shared memory]
  • Use TCP Servers to connect to SCSI storage using
    the iSCSI protocol over TCP/IP networks

24
Problem 2: TCP/IP is too Rigid
  • Server vs. Service Availability:
  • the client is interested in Service availability
  • Adverse conditions may affect service
    availability:
  • internetwork congestion or failure
  • servers overloaded, failed or under DoS attack
  • TCP has one response:
  • network delays -> packet loss -> retransmission
  • TCP limits the OS solutions for service
    availability:
  • early binding of service to a server
  • the client cannot switch to another server for
    sustained service after the connection is
    established (illustrated below)
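
To see why early binding hurts, consider an ordinary TCP client (standard sockets; the server names below are placeholders): once connect() succeeds, all traffic is tied to that one server address, and recovering from a failed or overloaded server means opening a brand-new connection and redoing the work in progress.

    # Plain TCP client: the connection is bound to one server at connect() time.
    # If the first server fails mid-transfer, the only option is a new connection
    # to the next server and restarting the request; partial state is lost.
    import socket

    def fetch(addresses, request: bytes) -> bytes:
        for addr in addresses:                       # e.g. [("a.example", 80), ("b.example", 80)]
            try:
                with socket.create_connection(addr, timeout=5) as s:
                    s.sendall(request)               # early binding: all traffic goes to addr
                    chunks = []
                    while (data := s.recv(4096)):
                        chunks.append(data)
                    return b"".join(chunks)
            except OSError:
                continue                             # start over from scratch on the next server
        raise RuntimeError("no server available")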

25
Service Availability through Migration
Server 1
Client
Server 2
26
Migratory TCP at a Glance
  • Migratory TCP migrates live connections among
    cooperative servers
  • Migration mechanism is generic (not application
    specific), lightweight (fine-grained migration)
    and low-latency
  • Migration triggered by client or server
  • Servers can be geographically distributed
    (different IP addresses)
  • Requires changes to the server application
  • Totally transparent to the client application
  • Interoperates with existing TCP
  • Migration policies decoupled from migration
    mechanism

27
Basic Idea: Fine-Grained State Migration

[Figure: the Server 1 process holds application state and connection state for connections C1-C6; the state of connection C2 migrates to the Server 2 process, which continues serving the client]
28
Migratory-TCP (Lazy) Protocol

(0) The client connects to Server 1 (connection C)
(1) The client sends a Migration Request to Server 2
(2) Server 2 sends a <State Request> to Server 1
(3) Server 1 answers with a <State Reply> carrying the per-connection state of C
(4) Server 2 sends a Migration Accept and takes over the connection
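
A schematic model of this lazy hand-off (classes and method names are invented for illustration; they are not the M-TCP kernel implementation):

    # Schematic walk-through of the lazy Migratory-TCP hand-off (illustration only).
    class Server:
        def __init__(self, name):
            self.name = name
            self.exported_state = {}          # per-connection state snapshots
            self.connections = {}
        def export_state(self, conn_id, state):
            self.exported_state[conn_id] = state
        def import_state(self, conn_id, state):
            self.connections[conn_id] = state

    class Client:
        def connect(self, server, conn_id):                    # (0) Connect
            self.server, self.conn_id = server, conn_id
        def migrate(self, new_server, old_server):
            conn = self.conn_id
            # (1) Migration Request to the new server
            # (2) new server sends <State Request> to the old server
            # (3) old server answers with <State Reply>: its last exported state
            state = old_server.exported_state[conn]
            new_server.import_state(conn, state)
            # (4) Migration Accept: the client transparently continues on the new server
            self.server = new_server

    s1, s2, c = Server("server1"), Server("server2"), Client()
    c.connect(s1, conn_id="C")
    s1.export_state("C", {"bytes_acked": 4096, "app_marker": 7})   # periodic export
    c.migrate(s2, s1)
    assert c.server is s2 and s2.connections["C"]["bytes_acked"] == 4096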
29
Non-Intrusive Migration
  • Migrate state without involving old-server
    application (only old server OS)
  • Old server exports per-connection state
    periodically
  • Connection state and Application state can go out
    of sync
  • Upon migration, new server imports the last
    exported state of the migrated connection
  • OS uses connection state to synchronize with
    application
  • Non-intrusive migration with M-M communication:
  • uses RDMA read to extract state from the old
    server with zero overhead
  • works even when the old server is overloaded or
    frozen

30
Service Continuation (SC)
Connection state
31
Related Work
  • Process migration: Sprite [Douglis 91], Locus
    [Walker 83], MOSIX [Barak 98], etc.
  • VM migration: [Rosenblum 02], [Nieh 02]
  • Migration in web server clusters: [Snoeren 00],
    [Luo 01]
  • Fault-tolerant TCP: [Alvisi 00]
  • TCP extensions for host mobility: I-TCP [Bakre
    95], Snoop TCP [Balakrishnan 95], end-to-end
    approaches [Snoeren 00], MSOCKS [Maltz 98]
  • SCTP (RFC 2960)

32
Evaluation
  • Implemented SC and M-TCP in the FreeBSD kernel
  • Integrated SC in real Internet servers:
  • web, media streaming, transactional DB
  • Microbenchmark:
  • impact of migration on client-perceived
    throughput for a two-process server using TTCP
  • Real applications:
  • sustain web server throughput under load produced
    by increasing the number of client connections

33
Impact of Migration on Throughput
34
Web Server Throughput
35
Future Research: Use SC to Build Self-Healing
Cluster-based Systems
36
Problem 3: Computer Systems Move Outdoors
Linux Car
Sensors
Linux Camera
Linux Watch
  • Massive numbers of computers will be embedded
    everywhere in the physical world
  • Dynamic ad-hoc networking
  • How to execute user-defined applications over
    these networks?

37
Outdoor Distributed Computing
  • Traditional distributed computing has been indoor
  • Target: performance and/or fault tolerance
  • Stable configuration, robust networking (TCP/IP
    or M-M)
  • Relatively small scale
  • Functionally equivalent nodes
  • Message passing or shared memory programming
  • Outdoor Distributed Computing:
  • Target: collect/disseminate distributed data
    and/or perform collective tasks
  • Volatile nodes and links
  • Node equivalence determined by their physical
    properties (content-based naming)
  • Data migration is not a good fit:
  • expensive to perform end-to-end transfer control
  • too rigid for such a dynamic network

38
Cooperative Computing at a Glance
  • Distributed computing with execution migration
  • Smart Message carries the execution state (and
    possibly the code) in addition to the payload
  • execution state assumed to be small (explicit
    migration)
  • code usually cached (few applications)
  • Nodes cooperate by allowing Smart Messages
  • to execute on them
  • to use their memory to store persistent data
    (tags)
  • Nodes do not provide routing
  • Smart Message executes on each node of its path
  • Application executed on target nodes (nodes of
    interest)
  • Routing executed on each node of the path
    (self-routing)
  • During its lifetime, an application generates at
    least one, possibly multiple, smart messages
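
A toy sketch of the Smart Message model just described (illustrative classes, tags and thresholds; not the actual SM system, which runs inside a Java virtual machine on each node):

    # Toy model of a Smart Message: payload + code + small execution state,
    # migrating explicitly between cooperating nodes (illustration only).
    class Node:
        def __init__(self, name, tags=None):
            self.name = name
            self.tags = dict(tags or {})      # persistent per-node data (tag space)
            self.neighbors = []

    class SmartMessage:
        def __init__(self, payload):
            self.payload = payload
            self.state = {"hops": 0}          # small, explicitly carried execution state

        def run(self, node):
            """Application code executed on every node along the path."""
            self.state["hops"] += 1
            if node.tags.get("temperature", 0) > 40:      # node of interest?
                node.tags["alert"] = self.payload          # leave data in the tag space
                return None                                # done: stop migrating
            return self.route(node)                        # otherwise keep self-routing

        def route(self, node):
            """Self-routing: the message, not the node, picks the next hop."""
            return max(node.neighbors, key=lambda n: n.tags.get("temperature", 0), default=None)

    # Tiny 3-node path: the message migrates until it reaches a hot node.
    a, b, c = Node("a", {"temperature": 20}), Node("b", {"temperature": 30}), Node("c", {"temperature": 45})
    a.neighbors, b.neighbors = [b], [c]
    sm, here = SmartMessage("water here"), a
    while here is not None:
        here = sm.run(here)                   # "migration": execution resumes on the next node
    print(c.tags, sm.state)                   # {'temperature': 45, 'alert': 'water here'} {'hops': 3}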

39
Smart vs. Dumb Messages
Mary's lunch: Appetizer, Entree, Dessert
Data migration
40
Smart Messages
[Figure: a Smart Message migrates across the network, executing on the nodes of interest (marked Hot)]
41
Cooperative Node Architecture
[Figure: cooperative node architecture: Admission Manager, Virtual Machine, Scheduling, Tag Space, OS & I/O; SMs arrive, execute and migrate onward]
  • Admission control for resource security
  • Non-preemptive scheduling with timeout-kill
  • Tags created by SMs (limited lifetime) or I/O
    tags (permanent)
  • global tag name space: hash(SM code) + tag name
  • five protection domains defined using hash(SM
    code), SM source node ID, and SM starting time
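
To illustrate the global tag naming scheme, hash(SM code) + tag name (the hash choice and the TagSpace API below are assumptions for illustration):

    # Illustration of globally unique tag names built from hash(SM code) + tag name.
    import hashlib

    def global_tag_name(sm_code: bytes, tag_name: str) -> str:
        code_hash = hashlib.sha1(sm_code).hexdigest()[:12]   # hash(SM code)
        return f"{code_hash}:{tag_name}"                      # hash(SM code) + tag name

    class TagSpace:
        def __init__(self):
            self._tags = {}                   # per-node persistent data left by SMs / I/O
        def create(self, sm_code: bytes, name: str, value, lifetime_s: float = 60.0):
            self._tags[global_tag_name(sm_code, name)] = (value, lifetime_s)
        def read(self, sm_code: bytes, name: str):
            return self._tags[global_tag_name(sm_code, name)][0]

    code = b"def run(node): ..."              # the (cached) Smart Message code
    ts = TagSpace()
    ts.create(code, "max_temp", 45)           # SM applications with different code
    print(ts.read(code, "max_temp"))          # cannot collide on the same tag name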

42
Related Work
  • Mobile agents (DAgents, Ajanta)
  • Active networks (ANTS, SNAP)
  • Sensor networks (Diffusion, TinyOS, TAG)
  • Pervasive computing (One.world)

43
Prototype Implementation
  • 8 HP iPAQs running Linux
  • 802.11 wireless communication
  • Sun Java K Virtual Machine
  • Geographic (simplified GPSR) and On-Demand (AODV)
    routing

[Figure: testbed topology with a user node, intermediate nodes, and a node of interest]

Completion time:
  Routing algorithm     Code not cached (ms)   Code cached (ms)
  Geographic (GPSR)     415.6                  126.6
  On-demand (AODV)      506.6                  314.7
44
Self-Routing
  • There is no single best routing algorithm outdoors:
  • Depends on application and node property dynamics
  • Application-controlled routing:
  • Possible with Smart Messages (execution state is
    carried in the message)
  • When migration times out, the application is
    upcalled on the current node to decide what to do
    next
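
A sketch of the timeout-upcall pattern behind application-controlled routing (geo_route and aodv_route are placeholders for the real geographic and on-demand protocols):

    # Sketch: when a migration attempt times out, the Smart Message application
    # is upcalled to choose what to do next, e.g. switch routing algorithms.
    def geo_route(node, target_region):    # placeholder: geographic routing toward a region
        return node.get("geo_next")
    def aodv_route(node, target_region):   # placeholder: on-demand routing within the region
        return node.get("aodv_next")

    def self_route(node, target_region, state):
        for _ in range(2):                 # try the current algorithm, then the alternative
            route = geo_route if state["algo"] == "geo" else aodv_route
            next_hop = route(node, target_region)
            if next_hop is not None:
                return next_hop
            # migration timed out: the upcall lets the application switch algorithms
            state["algo"] = "aodv" if state["algo"] == "geo" else "geo"
        return None                        # give up (the application could also abort or wait)

    # Example: geographic routing fails at this node, so the SM falls back to AODV.
    node = {"geo_next": None, "aodv_next": "node-17"}
    print(self_route(node, "LeftHill", {"algo": "geo"}))   # -> node-17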

45
Self-Routing Effectiveness (simulation)
  • geographical routing to reach target regions
  • on-demand routing within region
  • application decides when to switch between the
    two

[Figure: simulated network showing the starting node, the nodes of interest, and other nodes]
46
Next Target: Spatial Programming
  • Smart Messages: too low-level a programming model
  • How to describe distributed computing over
    dynamic outdoor networks of embedded systems,
    with limited knowledge about resource number,
    location, etc.?
  • Spatial Programming (SP) design guidelines:
  • space is a first-order programming concept
  • resources are named by their expected location and
    properties (spatial reference)
  • reference consistency: spatial reference-to-
    resource mappings are consistent throughout the
    program
  • the program must tolerate resource dynamics
  • SP can be implemented using Smart Messages (the
    spatial reference mapping table is carried as
    payload)

47
Spatial Programming Example
Mobile sprinklers with temperature sensors
Right Hill
Left Hill
Hot spot
  • Program the sprinklers to water the hottest spot
    of the Left Hill (sketched below)
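
An illustrative Python rendering of this example (the spatial_ref() helper and the data it resolves against are invented; in SP, spatial references name resources by expected location and properties, and their mappings stay consistent across the program):

    # Illustrative Spatial Programming-style code for "water the hottest spot of
    # the Left Hill". The resolver and its data are stand-ins for the physical
    # network that Smart Messages would actually discover and visit.
    def spatial_ref(space, property_name, index):
        """Resolve {space : property}[index] to a concrete node (stub resolver)."""
        return NETWORK[space][index] if property_name in NETWORK[space][index] else None

    NETWORK = {
        "LeftHill": [
            {"temperature": 31, "sprinkler": True, "id": "s0"},
            {"temperature": 47, "sprinkler": True, "id": "s1"},
            {"temperature": 29, "sprinkler": True, "id": "s2"},
        ],
    }

    i, hottest, hottest_temp = 0, None, float("-inf")
    while True:
        try:
            node = spatial_ref("LeftHill", "sprinkler", i)   # count of sprinklers unknown
        except IndexError:
            break                                            # no more sprinklers on the Left Hill
        if node and node["temperature"] > hottest_temp:      # read the temperature sensor
            hottest, hottest_temp = node, node["temperature"]
        i += 1
    hottest["watering"] = True                # command the sprinkler at the hottest spot
    print(hottest["id"], hottest_temp)        # -> s1 47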

48
Problem 4: Manageable Distributed File
Systems
  • Most distributed file servers use TCP/IP both for
    client-server and intra-server communication
  • Strong file consistency, file locking and load
    balancing difficult to provide
  • File servers require significant human effort to
    manage: add storage, move directories, etc.
  • Cluster-based file servers are cost-effective
  • Scalable performance requires load balancing
  • Load balancing may require file migration
  • File migration limited if file naming is
    location-dependent
  • We need a scalable, location-independent and easy
    to manage cluster-based distributed file system

49
Federated File System at a Glance
[Figure: cluster nodes, each running FedFS on top of its local file system, connected by an M-M interconnect; files A1, A2 and A3 are spread across the different local file systems]
  • Global file name space over a cluster of autonomous
    local file systems interconnected by an M-M network
50
Location Independent Global File Naming
  • Virtual Directory (VD): union of local
    directories
  • volatile, created on demand (dirmerge)
  • contains information about files, including
    location (homes of files)
  • assigned dynamically to nodes (managers)
  • supports location-independent file naming and
    file migration
  • Directory Tables (DT): local caches of VD
    entries (like a TLB)

[Figure: a virtual directory /usr containing file1 and file2, built by merging the local /usr directories of local file system 1 (which holds file1) and local file system 2 (which holds file2)]
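
A minimal sketch of the dirmerge idea behind virtual directories (data structures and names are invented for illustration, not the FedFS code):

    # Sketch of building a virtual directory (VD) as the union of the same-named
    # local directories on each node, remembering each entry's home node.
    def dirmerge(path, local_fs_by_node):
        """Union the local listings of `path`; map each entry to its home node."""
        vd = {}                                   # entry name -> home node (file location)
        for node, fs in local_fs_by_node.items():
            for entry in fs.get(path, []):
                vd.setdefault(entry, node)        # first home wins in this toy version
        return vd

    # Two autonomous local file systems, each with its own /usr directory.
    local_fs_by_node = {
        "node1": {"/usr": ["file1"]},
        "node2": {"/usr": ["file2"]},
    }
    vd_usr = dirmerge("/usr", local_fs_by_node)   # built on demand, cached in a DT
    print(vd_usr)                                 # {'file1': 'node1', 'file2': 'node2'}
    # Location-independent naming: opening /usr/file2 consults the VD (or the DT
    # cache) to find the home node, so a file can later migrate to another node
    # by simply updating this mapping.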
51
Direct Access File System (DAFS)
52
Federated DAFS
Distributed NFS over FedFS
[Figure: applications with NFS clients reach the server cluster over TCP/IP; each server node runs an NFS Server over FedFS over its local file system, and the server nodes are interconnected by an M-M network]
53
Related Work
  • Cluster-based file systems:
  • Frangipani [Thekkath 97], PVFS [Carns 00], GFS,
    Archipelago [Ji 00], Trapeze (Duke)
  • DAFS [NetApp 03], [Magoutis 01, 02, 03]
  • User-level communication in cluster-based network
    servers [Carrera 02]

54
Experimental Platform
  • Eight-node server cluster:
  • 800 MHz PIII, 512 MB SDRAM, 9 GB 10K RPM SCSI disk
  • Client:
  • Dual processor (300 MHz PII), 512 MB SDRAM
  • Linux 2.4
  • Servers and clients equipped with Emulex cLAN
    adapters (M-M network)

55
Workload I
  • Postmark: synthetic benchmark
  • Short-lived small files
  • Mix of metadata-intensive operations
  • Postmark outline:
  • Create a pool of files
  • Perform transactions: READ/WRITE paired with
    CREATE/DELETE
  • Delete the created files
  • Each Postmark client performs 30,000 transactions
  • Clients distribute requests to servers using a
    hash function on pathnames (see the sketch below)
  • Files are physically placed on the node which
    receives the client requests
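
A compact structural sketch of this workload (a simplified stand-in for Postmark with an invented routing helper; only the pathname-hash request distribution and the operation mix are modeled, file I/O is elided):

    # Simplified Postmark-style workload: a pool of small files and transactions
    # pairing READ/WRITE with CREATE/DELETE, with requests spread over servers
    # by hashing the pathname.
    import hashlib, random

    SERVERS = ["server%d" % i for i in range(8)]      # illustrative 8-node cluster

    def server_for(path: str) -> str:
        h = int(hashlib.md5(path.encode()).hexdigest(), 16)
        return SERVERS[h % len(SERVERS)]              # hash function on pathnames

    def run_client(client_id: int, n_txns: int = 30000, pool_size: int = 1000):
        rng = random.Random(client_id)
        pool = ["/fedfs/c%d/f%d" % (client_id, i) for i in range(pool_size)]
        per_server = {s: 0 for s in SERVERS}
        for _ in range(n_txns):                       # each transaction touches two files
            for path in (rng.choice(pool), rng.choice(pool)):
                per_server[server_for(path)] += 1     # READ/WRITE + CREATE/DELETE requests
        return per_server

    # With a decent hash the per-client transactions spread evenly across servers,
    # which is what lets throughput scale with the number of servers in Workload I.
    print(run_client(0, n_txns=1000))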

56
Postmark Throughput
[Chart: Postmark throughput (txns/sec, 0-30000) vs. number of servers (1-8), with one curve per file size: 2 KB, 4 KB, 8 KB and 16 KB]

57
Workload II
  • Postmark performs only READ transactions
  • No create/delete operations
  • Federated DAFS does not control file placement
  • No client request is sent to the file's correct
    location

58
Postmark Read Throughput
[Chart: Postmark read throughput (txns/sec, 0-60000) vs. number of servers (2 and 4), for PostmarkRead and PostmarkRead-NoCache]

59
Next Target: Federated DAFS over the Internet

[Figure: applications with DAFS Clients access, over the Internet via TCP/IP, several DAFS server clusters; each cluster node runs a DAFS Server over FedFS over its local file system, and the nodes within a cluster are interconnected by an M-M network]
60
Outline
  • TCP Servers
  • Migratory-TCP and Service Continuations
  • Cooperative Computing, Smart Messages and Spatial
    Programming
  • Federated File Systems
  • Talk Highlights and Conclusions

61
Talk Highlights
  • Back to Migration:
  • Service Continuation: service availability and
    self-healing clusters
  • Smart Messages: programming dynamic networks of
    embedded systems
  • Exploit Non-Intrusive M-M Communication:
  • TCP offloading
  • State migration
  • Federated file systems
  • Network and Storage I/O Convergence:
  • TCP Servers + iSCSI
  • Federated File Systems + M-M
  • Programmability:
  • Smart Messages and Spatial Programming
  • Extended Server API: Service Continuation, TCP
    Servers, Federated File Systems

62
Conclusions
  • Network-Centric Systems: a very promising,
    border-crossing systems research area
  • Common issues for a large spectrum of systems and
    networks
  • Tremendous potential to impact industry

63
Acknowledgements
  • UMD students: Andrzej Kochut, Chunyuan Liao,
    Tamer Nadeem, Iulian Neamtiu and Jihwang Yeo
  • Rutgers students: Ashok Arumugam, Kalpana
    Banerjee, Aniruddha Bohra, Cristian Borcea,
    Suresh Gopalakrisnan, Deepa Iyer, Porlin Kang,
    Vivek Pathak, Murali Rangarajan, Rabita Sarker,
    Akhilesh Saxena, Steve Smaldone, Kiran
    Srinivasan, Florin Sultan and Gang Xu
  • Post-doc: Chalermek Intanagonwiwat
  • Collaborations at Rutgers: EEL (Ulrich Kremer),
    DARK (Ricardo Bianchini), PANIC (Rich Martin and
    Thu Nguyen)
  • Support: NSF ITR ANI-0121416 and CAREER
    CCR-013366