Title: OptIPuter System Software
Slide 1: OptIPuter System Software
- Andrew A. Chien, Computer Science and Engineering, UCSD
- January 2004
- OptIPuter All-Hands Meeting
Slide 2: System Software/Middleware Progress
- Significant Progress in Key Areas
- A unified Vision of the Application Interface to the OptIPuter Middleware: the Distributed Virtual Computer
- Promise of Simpler Application Models, New Capabilities
- Innovation in protocols to exploit High Speed Optical Networks: SRBUDP, XCP, GTP, SABUL/UDT
- High Performance for a wide range of network structures
- Real-time and Optical Networking
- Foundation for new Applications in this space
- Developing Cross-team Ties and Synergies
- Applications Teams (BIRN: Tomas Molina; SIO?) designate POCs for the Middleware Team (Chien)
- Application Performance Modeling
- Data Access Characterization and Filesystem
- Information about how Middleware will be presented and made available
- What middleware service capabilities are planned
- Optical Signaling and Network Management
- Optical Configuration Services (defined within Y2)
- Infrastructure (they will designate resources for access)
Slide 3: Year 2 Focused Research Questions
- F1: How do we control lambdas, and how do protocols influence their utility?
- F2: How is a LambdaGrid different from a Grid in terms of middleware?
- F3: How can lambdas enhance collaboration?
- F4: How are applications quantitatively helped by LambdaGrids?
- Explain how Y2 activities align
- Review research development/implementation timeline
- Infrastructure: what resources exist today, and later this year?
- Image-based definition of what's on the infrastructure
Slide 4: Draft Year 2 Objectives
- UCSD: Distributed Virtual Computer Interface/Service for Applications (F2)
- DVC resource description
- Tools which allocate resources and configure optical networks
- Interface to selected other OptIPuter capabilities
- UCI: Real-time (F1, F2)
- Integrate the TMO middleware subsystem into OptIPuter middleware (DVC)
- Begin design of the TMO programming framework for the OptIPuter
- UCSD/UIC/SIO: Storage
- Complete OptIPuter storage benchmarks based on BIRN and SIO applications
- Base and scalable versions
- Analytical and simulation evaluation of the benefits of the erasure coding approach
- Identify promising parts of the design space, and how much it can help
- UCI/UCSD: Security (F2)
- Definition of the DVC security models
- Protocols which build from traditional Grid Security Infrastructures
- TAMU: Performance Modeling
- Measuring the performance of visualization applications
- Characterize computation, communication, and storage access
- UIC/USC-ISI/UCSD: High Speed Protocols (F2, F1)
Slide 5: Middleware Integration and Presentation
- Current practice: C and C++ APIs
- E.g. Globus 2.4, MPI, sockets, etc.
- Applications Community Embracing Grid Services
- Optical Provisioning / Lightpath Setup Embracing Grid Services
- Need to use C/C++ and whatever exists to make rapid progress
- Longer term: interfaces based on Grid Services interface description, binding, and calling mechanisms (not necessarily full OGSA implementations)
- Presentation of many LambdaGrid Middleware Services through Grid Services
- DVC Services, Lightpath Services
- Performance Services
- Real-time Configuration and Management Services
- Performance-Critical and Non-Client-Server Activities May Need Other Interfaces
- High Performance Data Transport
- Streaming Communication
- LambdaRAM
- Optical Multicast
Slide 6: Thinking about OptIPuter Testbeds
- What are the goals for the infrastructure?
- Run Application, Visualization, Collaboration, and Data Mining Experiments
- Run System Software / Middleware Experiments
- Demonstrations
- Who needs to specify what goes into the testbed?
- Applications, Visualization
- System Software / Middleware
- Infrastructure Team?
- Who is going to build/configure them? Do we need a lead or primary testbed to drive integration?
- Infrastructure Team?
- System Software/Middleware Team?
- Other projects: > 1 FTE minimum to coordinate, and a commitment from each of the resource participants
- Who is going to use it?
- Everyone
- Leverage the Technical Infrastructure of other Projects (NMI, TeraGrid, etc.)
Slide 7: A Model for Using an Experimental Infrastructure
- OptIPuter-owned resources can be allocated and configured at the lowest level
- Machine image: OS, Middleware, OptIPuter System Software, Application Software
- Model (the cycle is sketched in code below):
- Build experiment image on your systems
- Allocate OptIPuter resources
- Image the systems (Rocks support)
- Run experiment
- Release the resources
- Reimage back into a base configuration (Rocks support)
- Experiments across applications, system software, protocols, etc. involve pairwise integration and development of shared images
- System Software, System Software and Applications, other combinations
- Concerns about hardware differences and the ability to experiment with a single image
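A minimal sketch of that allocate/image/run/release cycle, assuming a hypothetical ExperimentManager wrapper around the testbed's allocation and Rocks-based imaging; none of these calls are the actual Rocks commands:

```python
# Hypothetical wrapper illustrating the allocate -> image -> run ->
# release -> reimage cycle.  Not the actual Rocks tooling.
class ExperimentManager:
    def __init__(self, base_image="optiputer-base"):
        self.base_image = base_image        # shared base configuration

    def allocate(self, n_nodes):
        """Reserve n OptIPuter-owned nodes from the pool (placeholder IDs)."""
        return [f"node-{i}" for i in range(n_nodes)]

    def image(self, nodes, image):
        """Push a full machine image (OS + middleware + app) to each node."""
        for node in nodes:
            print(f"imaging {node} with {image}")

    def release(self, nodes):
        """Return the nodes, reimaging them back to the base configuration."""
        self.image(nodes, self.base_image)

mgr = ExperimentManager()
nodes = mgr.allocate(4)                     # allocate OptIPuter resources
mgr.image(nodes, "my-experiment-image")     # image the systems
print("running experiment on", nodes)       # run the experiment
mgr.release(nodes)                          # release + reimage to base
```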
Slide 8: OptIPuter Software Architecture for Distributed Virtual Computers v1.1
[Layered architecture diagram, top to bottom:]
- OptIPuter Applications; Visualization
- DVC 1, DVC 2, DVC 3
- Higher Level Grid Services; Real-Time Objects; Security Models; Data Services (DWTP)
- Grid and Web Middleware (Globus/OGSA/Web Services/J2EE)
- Layer 5: SABUL, RBUDP, Fast, GTP; Layer 4: XCP
- Node Operating Systems
- λ-configuration, Net Management
- Physical Resources
Slide 9: Distributed Virtual Computers
- Nut Taesombut and Andrew Chien
- University of California, San Diego
Slide 10: DVC Examples
[Diagram: example DVCs spanning SDSC, UCI or UIC, SIO/NCMIR, and UCSD CSE]
- Collaborative Visualization Cluster
- Grid Resources + Photonic Multicast or LambdaRAM (Leigh)
- Virtual Cluster (Hides the Complexity of Grid Resource Flexibility)
- Shared Single Domain (Spans Multiple Sites)
- Private Connections, Simple Network Naming
- Direct Access to Everything (Storage, Displays, etc.)
- Real-Time Virtual Cluster for Distributed Collaborative Visualization
- Grid Resources + Real-Time (TMO)
Slide 11: Distributed Virtual Computer (DVC)
- What is a DVC?
- A simple computing environment for applications (physically secure, local LAN/SAN)
- Derived from a rich Grid of resources and on-demand optical networks
- Formed on demand
- DVC Principles
- Separate environment configuration/management from application programming
- Share DVC configuration across a range of applications with similar security, fault-tolerance, and performance requirements
- Applications are simplified by simple execution environments
- Key DVC abstractions (a usage sketch follows below)
- Single Namespace, Security Domain, Simple Communication Primitives, and Resource Management
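As a rough usage sketch (the DVC API was still under design at this point; every name below is hypothetical), the intent is that an application requests an environment once and then programs against a LAN-like cluster:

```python
# Hypothetical DVC usage -- the real interface is Year 2 work; these
# names illustrate the abstractions (single namespace, one security
# domain, simple communication) rather than an actual API.
from dataclasses import dataclass

@dataclass
class DVCRequest:
    nodes: int             # compute nodes wanted
    lightpath_gbps: float  # dedicated optical bandwidth between sites
    real_time: bool        # ask for TMO-style timing support

def create_dvc(req: DVCRequest) -> dict:
    """Stand-in for the planner/executor: select resources, configure
    the optical network, and hand back a simple environment."""
    return {
        "namespace": "dvc-001",         # single namespace for all members
        "security_domain": "dvc-001",   # one security domain spanning sites
        "members": [f"host{i}.dvc-001" for i in range(req.nodes)],
    }

env = create_dvc(DVCRequest(nodes=8, lightpath_gbps=10.0, real_time=False))
print(env["members"])   # the application sees a LAN-like virtual cluster
```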
Slide 12: DVC Core Services
- 1.0 Design of DVC Configuration Descriptions (an example description is sketched below)
- What resources
- What optical network configuration
- What special properties (high BW, real-time, secure, etc.)
- 1.0 Implementation of DVC Executor and Runtime
- Planner that identifies resources
- Selects from virtual Grid resources
- Negotiates with resource managers and brokers
- Executor and monitor for the DVC
- Acquires and configures resources (including configuring the optical network)
- Monitors for failures and performance
- Adapts and reconfigures
- Configuration and Management Services
- Abstractions for communication, security, and real-time
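An illustrative configuration description of the kind the design item above calls for; the schema is a guess at the planner's input, not a defined format:

```python
# Guessed-at shape of a DVC configuration description covering the
# three questions above: resources, optical network configuration,
# and special properties.  Field names are illustrative only.
dvc_config = {
    "resources": [
        {"site": "UCSD", "type": "compute", "count": 16},
        {"site": "SIO",  "type": "storage", "capacity_tb": 2},
        {"site": "UIC",  "type": "display", "tiles": 15},
    ],
    "optical_network": {
        "lightpaths": [("UCSD", "UIC"), ("UCSD", "SIO")],
        "bandwidth_gbps": 10,
    },
    "properties": {"high_bw": True, "real_time": False, "secure": True},
}
print(dvc_config["optical_network"]["lightpaths"])
```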
Slide 13: Year 2 Work on DVCs
- Design and Implementation of a DVC system prototype
- DVC abstractions and core services
- DVC Application Interfaces (DVC API or Service Interface)
- Resource Allocation Interfaces and Support
- Synergy of DVCs with OptIPuter System Software Technologies
- Encapsulate novel communication capabilities
- LambdaRAM, Optical Multicast
- Real-time DVC
- Understand synergy with TMO real-time middleware
- Optical network provisioning and management
- Future Work
- Encapsulate High-speed Transport Protocols
- RBUDP, SABUL/UDT, XCP, GTP
- Presenting performance guarantees / quality of service
- Integrating Storage and Data Abstractions
- Novel Filesystems and Data Services
Slide 14: OptIPuter Component Technologies
Slide 15: Support for Real-Time Computing in the OptIPuter System Software
- Kane Kim
- Professor, EECS, UCI; Co-Director, Networked Systems Center, UCI
- September 2003
Slide 16: Attractive Characteristics of the OptIPuter
- Lambdas will be abundantly available
- A good number of them can be used in a dedicated mode for a major application
- High-precision control of Network Communication Latency may be possible
- => An exciting prospect for real-time (RT) distributed computing technologists!
Slide 17: Some New-Generation RT Applications Benefiting from Networks with Well-Controlled Latency
- Multi-party, multimedia-based conferencing / collaboration
- Including on-line game playing
- Distributed orchestra?
- Earthquake monitoring and crisis management
- Bioscience applications
- Missile defense
Slide 18: Vision -- RT Tightly Coupled Wide-Area Distributed Computing
- Goals
- High-precision timing of critical actions
- Tight bounds on response times
- Ease of programming
- High-level programming
- Top-down design
- Ease of timing analysis
- Real-Time Object (TMO) network: a dynamically formed meta-computer
Slide 19: Year 1 Results
- Designed a Time-Triggered Message-Triggered Object (TMO) support middleware subsystem model that can be easily implemented on both Windows and Linux platforms
- Developed a global-time-based coordination approach for use in fair and efficient distributed on-line game systems, and
- a feasibility demo based on LANs and the TMO support middleware
- a step toward realizing an expanded demo in the OptIPuter environment
- a paper will be presented at the IDPT 2003 conference, December 2003
[Diagram: components of a TMO (a C++ object) -- TT Method 1 and TT Method 2, each with an AAC (autonomous activation condition) and deadlines, plus Service Method 1 and Service Method 2]
- No threads, no priorities (a toy scheduling sketch follows below)
- High-level programming style
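To make the no-threads/no-priorities style concrete, here is a toy Python rendering of one time-triggered method with a period (the simplest AAC) and a deadline check; this is an illustration of the idea only, not the TMO support middleware (which targets C++ on Windows and Linux):

```python
# Toy illustration of a time-triggered (TT) method: the programmer
# states WHEN it runs (period) and by when it must finish (deadline);
# no threads or priorities appear in the programming model.
import time

def tt_method():
    print("TT method fired")

PERIOD_S, DEADLINE_S = 1.0, 0.2   # the AAC here: "every 1 s, finish in 0.2 s"

start = time.monotonic()
for tick in range(3):                                  # three activations
    release = start + tick * PERIOD_S
    time.sleep(max(0.0, release - time.monotonic()))   # wait for release time
    t0 = time.monotonic()
    tt_method()
    if time.monotonic() - t0 > DEADLINE_S:
        print("deadline miss")              # middleware would flag this
```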
Slide 20: Storage Research Activities
Slide 21: Goals of Storage Research
- Develop New Techniques for a High Performance File System which Exploits Optical Networks
- Meets the Performance Needs of Applications
- Use Benchmarking Tools to Guide Development of the OptIPuter Data Storage System
- Approach:
- Analyze Current Application Usage
- Study SIO and BIRN Data Visualization Programs
- Extrapolate to Large-scale Distributed Deployment
- Design OptIPuter Storage Benchmarks
- What are the Access Patterns?
- What are the Critical Performance Concerns?
- What Does the OptIPuter Environment Enable?
- Design New Techniques for the OptIPuter Storage System
- Novel Coding and Speculation
Slide 22: Storage Progress -- Understanding Collaborative Viz Architecture
Slide 23: Storage Progress (cont.)
- High Performance File System Survey
- Study existing parallel/distributed file systems
- GPFS, Lustre, PVFS, Galley, DASSF, Vesta, Armada, FAB, MPI-IO, Frangipani, Zebra, etc.
- No existing system meets the needs of the OptIPuter environment!
- Approach: Coding for Performance Robustness
- Novel data coding (distribution and replication) for file segments
- Erasure codes enable reconstruction with K of N parts (a toy example follows below)
- Benefits
- Redundancy and parallelism to decrease access time (unloaded) or to reduce the impact of late arrivers for a segment (loaded, failure)
- Challenges
- Cost of coding, parallelism, and reconstruction
- Examples
- Reed-Solomon, Turbo, Viterbi, Tornado, LT, Luigi Rizzo's FEC codes, Online, etc.
- Best performance: 138 MB/s encoding and 154 MB/s decoding (Luigi Rizzo's code)
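A toy illustration of the K-of-N property, assuming a single XOR parity part (so N = K + 1); the codes named above (Reed-Solomon, Tornado, LT, ...) generalize this to arbitrary K and N:

```python
# Toy K-of-N erasure code: K equal-sized data parts plus one XOR
# parity part; any K of the N = K+1 parts reconstruct the segment.
from functools import reduce

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data_parts):
    """K data parts -> N = K+1 parts (last part is the XOR parity)."""
    return data_parts + [reduce(xor, data_parts)]

def reconstruct(parts):
    """Recover the K data parts given any K of the N parts
    (at most one part missing, marked as None)."""
    if None in parts:
        i = parts.index(None)
        parts[i] = reduce(xor, (p for p in parts if p is not None))
    return parts[:-1]   # drop the parity part

parts = encode([b"abcd", b"efgh", b"ijkl"])   # K = 3, N = 4
parts[1] = None                               # one part lost or late
print(reconstruct(parts))                     # [b'abcd', b'efgh', b'ijkl']
```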
Slide 24: Performance Modeling
Slide 25: Prophesy -- Application Performance Modeling
- Performance modeling of applications on the OptIPuter
- Goal: cross-platform comparison (vs. traditional grid and parallel platforms)
- Progress since Sept. 2003:
- Completed work on isocoupling (IPDPS 2004)
- Reuse of coupling values
- Better understanding of the relationship between kernels in an application
- Now focused on visualization applications
Source: Taylor, TAMU
Slide 26: High Speed Protocols
Slide 27: High Performance Transport Problem
- OptIPuter is Bridging the Gap Between High Speed Link Technologies and the Growing Demands of Advanced Applications
- Transport Protocols Are the Weak Link
- TCP Has Well-Documented Problems That Militate Against its Achieving High Speeds (back-of-envelope numbers below)
- Slow Start Probing Algorithm
- Congestion Avoidance Algorithm
- Flow Control Algorithm
- Operating System Considerations
- Friendliness and Fairness Among Multiple Connections
- These Problems Are the Foci of Much Ongoing Work
- OptIPuter is Pursuing Four Complementary Avenues of Investigation:
- RBUDP Addresses Problems of Bulk Data Transfer
- SABUL Addresses Problems of High Speed Reliable Communication
- GTP Addresses Problems of Multiparty Communication
- XCP Addresses Problems of General Purpose, Reliable Communication
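To put rough numbers on why stock TCP struggles here (standard back-of-envelope figures, not data from the slides): the congestion window must cover the bandwidth-delay product, and the Mathis et al. steady-state model caps throughput by the loss rate.

```python
# Why TCP struggles on a fast, long path (back-of-envelope, not slide data).
link_bps, rtt_s = 1e9, 0.100                # 1 Gb/s link, 100 ms RTT

bdp_bytes = link_bps / 8 * rtt_s            # window needed to fill the pipe
print(f"window needed: {bdp_bytes / 1e6:.1f} MB")        # ~12.5 MB

# Mathis et al. model: rate ~= (MSS / RTT) * 1.22 / sqrt(loss probability)
mss_bytes, p = 1460, 1e-7                   # even 1 loss per 10^7 packets...
rate_bps = (mss_bytes * 8 / rtt_s) * 1.22 / p ** 0.5
print(f"model rate: {rate_bps / 1e6:.0f} Mb/s")          # ...caps TCP ~450 Mb/s
```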
Slide 28: OptIPuter Transport Protocol Roles
[Diagram: the four protocols positioned by end-to-end path type (allocated lambda vs. routed, the latter with enhanced or standard routers) and communication pattern (unicast vs. managed group) -- RBUDP: allocated lambda, unicast; GTP: managed group; SABUL: routed path, standard routers; XCP: routed path, enhanced routers]
Slide 29: Year 2 EVL OptIPuter Networking Research -- Chaoyue Xiong, Eric He, Jason Leigh
- Continuing protocol development to create a streaming version of Reliable Blast UDP for Quanta
- Experimentation with Photonic Multicast
Slide 30: Streaming Reliable Blast UDP for the Quanta Networking Toolkit
- Motivation
- Many high speed transport protocols work best for large payloads (e.g. hundreds of megabytes to gigabytes)
- SRBUDP is designed for gigabit streaming applications (like graphics) where payloads are small but potentially unending
- Methodology (a simplified send-loop sketch follows below)
- Early loss recovery to guarantee data integrity and to reduce latency
- Rate control to ensure balance between sender and receiver
- Prediction of the cause of loss (i.e. caused by the network or caused by the receiver) to choose the best strategy to resolve the loss
- Status
- NS2 simulation just completed
- Results are promising but must be validated over a real network
- For details contact Chaoyue Xiong (cxiong_at_evl.uic.edu)
- Next 6-month goal
- Develop a working prototype to test over high speed links with a real streaming graphics application, i.e. TeraVision
- TeraVision is a hardware system for capturing high resolution graphics and streaming it over the network
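A simplified sketch of the rate-controlled send loop such a protocol uses (illustrative only; not the Quanta/SRBUDP implementation, which adds early loss recovery and loss-cause prediction):

```python
# Rate-paced UDP sender sketch: fixed-size datagrams carry sequence
# numbers so the receiver can report gaps for early repair.
import socket, time

def send_at_rate(data: bytes, dest, rate_mbps=800, mtu=1400):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    interval = mtu * 8 / (rate_mbps * 1e6)       # seconds between datagrams
    next_send = time.monotonic()
    for seq, off in enumerate(range(0, len(data), mtu)):
        sock.sendto(seq.to_bytes(4, "big") + data[off:off + mtu], dest)
        next_send += interval                    # pace to the agreed rate
        time.sleep(max(0.0, next_send - time.monotonic()))

# Example (assumes a receiver is listening at this illustrative address):
# send_at_rate(b"x" * 1_000_000, ("10.0.0.2", 9000), rate_mbps=100)
```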
Slide 31: Photonic Multicasting
- Motivation
- For collaborative work involving ultra-high-resolution displays, it is necessary to multicast gigabits of graphics to all collaborating sites
- Methodology (a minimal multicast-socket sketch follows below)
- The new Glimmerglass Reflexion switch provides 1x4 photonic multicasting, i.e. it takes an input signal and optically splits it into 4 (no routers involved)
- Send packets via UDP or multicast to the switch and have multiple receivers receive the same packet (it turns out plain UDP does not work; a multicast protocol must be used, because UDP uses ARP to map to a single MAC address)
- Good theoretical treatments exist in papers, but this had never actually been attempted in practice
- Status
- Demonstrated a working prototype at SC2003, able to stream a single TeraVision graphics stream to multiple local-area endpoints connected to the GG using this capability
- A Photonic Data Controller was developed to control both the GG and Calient switches (including the GG's photonic multicast capability)
- For details contact Eric He (eric_at_evl.uic.edu)
- Next 6-month goal
- Understand the issues involved in achieving wide-area photonic multicasting
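A minimal sketch of the multicast workaround described above: sending to an IP multicast group gives the frames a multicast MAC address, so every endpoint behind the photonic split will accept them. The group address and port are examples only.

```python
# Join an IP multicast group and send one datagram to it; with plain
# unicast UDP, ARP would bind the frames to a single receiver's MAC.
import socket, struct

GROUP, PORT = "239.1.1.1", 5000          # example group address and port

# Receiver: bind and join the group
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
rx.bind(("", PORT))
mreq = struct.pack("4s4s", socket.inet_aton(GROUP), socket.inet_aton("0.0.0.0"))
rx.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

# Sender: a single sendto reaches every endpoint that joined the group
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.sendto(b"frame 0", (GROUP, PORT))
print(rx.recvfrom(2048)[0])              # each joined receiver sees the packet
```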
Slide 32: Group Transport Protocol (GTP)
- Ryan Wu and Andrew A. Chien
- Computer Science and Engineering
- University of California, San Diego
- OptIPuter All Hands Meeting, January 2004
Slide 33: Group Transport Protocol (GTP)
- Objective: Develop High Performance Multipoint Transport Protocols
- In the OptIPuter Network Environment (No/Little Internal Network Contention)
- Why Grouping?
- Intragroup Management of Congestion Over Multiple Flows
- Support Data Fetching from Multiple Senders Concurrently
- Multi-Flow Scheduling Makes a Clean Transition when Flows Join or Leave
- Achieves Fairness Among Flows
[Diagram: (a) shared IP connection -- senders connect with the receiver via shared links and intermediate nodes; (b) dedicated lambda connection -- dedicated capacity between each sender/receiver pair]
Slide 34: Group Transport Protocol (GTP)
- Features of GTP (the scheduling idea is sketched below)
- Receiver-based reliable response-request transmission model
- Fast start, to deliver bandwidth quickly to the application
- Receiver/sender capability estimation and quick rate adaptation to changes
- Multi-flow scheduling at the receiver, achieving max-min fairness among flows
[Diagram: GTP framework (receiver side)]
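The receiver-side scheduling idea can be illustrated with a short max-min allocation sketch (hypothetical code, not GTP's actual scheduler): receiver capacity is divided equally, and capacity a capped sender cannot use is redistributed to the others.

```python
# Max-min fair division of receiver capacity among sender flows.
def max_min_rates(capacity, sender_caps):
    rates = {s: 0.0 for s in sender_caps}
    active = set(sender_caps)
    while active and capacity > 1e-9:
        share = capacity / len(active)
        # senders that cannot use an equal share are capped...
        capped = {s for s in active if sender_caps[s] - rates[s] <= share}
        if not capped:
            for s in active:
                rates[s] += share
            capacity = 0.0
        else:
            for s in capped:                 # ...freeing capacity for the rest
                capacity -= sender_caps[s] - rates[s]
                rates[s] = sender_caps[s]
            active -= capped
    return rates

# 1000 Mb/s receiver; senders able to push 200, 800, and 800 Mb/s:
print(max_min_rates(1000, {"A": 200, "B": 800, "C": 800}))
# -> {'A': 200.0, 'B': 400.0, 'C': 400.0}
```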
Slide 35: Group Transport Protocol (GTP)
- Simulation results
- Smooth bandwidth transfers between senders
- GTP matches other UDP rate-based transport protocols (e.g. RBUDP, SABUL) for single flows
- GTP outperforms them for converging flows (from multiple senders to one receiver): higher throughput and lower loss
[Figure: comparison between TCP and GTP (ideal case, produced by ns-2) -- two senders and one receiver, flow 2 starts at time = 6 s]
Slide 36: Group Transport Protocol (GTP)
- DummyNet experiments show clean transitions
- Smooth bandwidth handoffs
- Very high network utilization
- TeraGrid experiments show promising performance
- See the CCGrid2004 paper for details
Slide 37: XCP/OptIPuter Project Status
- Aaron Falk
- USC Information Sciences Institute
- January 12, 2004
Slide 38: XCP Accomplishments
- Built a simple, high-performance testbed for XCP performance testing
- The challenge has been developing a configuration where an end-system and a PC router could fill a Gigabit Ethernet link with a 100 ms RTT with a single TCP or XCP flow. The current configuration uses dual-PCI-bus machines (for the routers) and 2.8 GHz processors.
- The sending rate for both TCP and XCP maxes out at around 180 Mbps with 100% CPU utilization on the sender. We believe this is caused by the CPU cycles needed to traverse the TCP mbuf chain.
- Generating results
- Early results indicate the start-up behavior of our XCP implementation matches simulation performance (at a macro level), meaning it performs better than TCP
- An abstract has been accepted for presentation at PFLDnet2004 (Protocols for Fast, Long-Distance Networks)
- Implementing XCP (the router's feedback computation is sketched below)
- We have developed BSD kernel code implementing XCP for an XCP sender, receiver, and router. Currently we are working on debugging. (Kernel-level debugging is particularly laborious.)
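For context on what the XCP router computes, the heart of the protocol is the per-control-interval aggregate feedback from Katabi et al. (SIGCOMM 2002); the sketch below is illustrative Python using the paper's stability constants, not the BSD kernel code described above.

```python
# XCP router's aggregate feedback: reward spare capacity, drain the
# standing queue (phi = alpha*d*S - beta*Q, constants per the paper).
ALPHA, BETA = 0.4, 0.226

def aggregate_feedback(capacity_bps, input_bps, queue_bytes, mean_rtt_s):
    spare_bps = capacity_bps - input_bps          # S: spare bandwidth
    return ALPHA * mean_rtt_s * spare_bps - BETA * queue_bytes * 8

# 1 Gb/s link, 900 Mb/s arriving, 100 kB standing queue, 100 ms mean RTT:
phi_bits = aggregate_feedback(1e9, 9e8, 100_000, 0.1)
print(f"feedback: {phi_bits / 1e6:+.2f} Mb this control interval")  # ~ +3.82
```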
Slide 39: High Performance Network and Data Services for Data Mining, Data Integration, and Data Exploration
- Robert L. Grossman, Yunhong Gu, Xinwei Hong, David Hanley
- National Center for Data Mining, University of Illinois at Chicago
Slide 40: Summary of Work, June 2003 - December 2003
Slide 41: SABUL/UDT Protocol Overview
- Uses both Rate Control (RC) and window-based Flow Control (FC); a simplified rate-update sketch follows below
- Constant RC interval to remove RTT bias
- Employs bandwidth estimation
- Selective acknowledgement (ACK)
- Reduced control traffic results in faster recovery
- Uses packet delay as well as packet loss to indicate congestion
- Slow start controlled by FC
- Can be layered over optical paths or used by applications in routed networks
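A simplified sketch of the constant-interval idea (assumed AIMD form; the actual UDT control law also uses bandwidth estimation and delay signals): because the rate is revised once per fixed interval rather than once per ACK, long-RTT flows are not penalized.

```python
# Constant-interval rate control: the rate is updated every SYN
# seconds regardless of RTT, removing the RTT bias of per-ACK schemes.
SYN = 0.01                                  # fixed rate-control interval (s)

def update_rate(rate_pps, congestion, inc_pps=50, dec_factor=0.9):
    """One update per SYN interval: additive increase, multiplicative
    decrease when loss (or rising packet delay) signals congestion."""
    return rate_pps * dec_factor if congestion else rate_pps + inc_pps

rate = 10_000                               # packets per second
for congestion in (False, False, True, False):
    rate = update_rate(rate, congestion)
print(f"final rate: {rate:.0f} pps")
```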
Slide 42: SABUL/UDT is Fast
- Chicago-Amsterdam with 110 ms RTT:
- More than 950 Mb/s on a 1 Gb/s link
- 6.8 Gb/s on a 10 Gb/s link with multiple connections (11/03)
- 5.4 Gb/s on an Itanium system with a 10G NIC (11/03)
- Overall 11.8 Gb/s on routed and 10G lambdas (11/03 at SC03)
Slide 43: SABUL/UDT is the Basis for a Variety of Data Services -- More to Come