Title: Overview Part 2
1 Distributed Systems
- Lecture 2
- Overview (Part 2)
- 23 April 2002
2 Today's Schedule
- Some other international DS people
- German Researchers
- Two Approaches towards DS
- Goals and Challenges of DS
3 DS People
Notion & History
- Jean Bacon (Cambridge, UK)
- Opera group → Systems group in CA, MSSA,
- IMP (interactive presentations support),
- Active Systems: asynchronous middleware is capable of rapid response to events and may be used for many application areas, including
- detection of mobile users or computers,
- response to faults in telecommunications networks,
- detection of illegal entry by surveillance equipment,
- detection of suspicious patterns of use of bank, credit or phone cards.
- The approach is to use middleware IDL (Interface Definition Language) to specify the events that a given service is able to detect and notify.
4 Ranking of US CS Departments
5 Cornell People
- Ken Birman
- Secure, reliable, scalable DS
- ISIS (Toolkit → commercial), Horus, Ensemble, Spinglass
- Emin Gün Sirer
- SPIN, Kimera, MagnetOS, CliqueNet
6 Cornell People
- Fred B. Schneider
- Language-Based Security
- Containment and Integrity for Mobile Code
- Cornell Online Certification Authority (COCA)
- Andrew Myers
- Thor, a distributed OO database
- Jif, an extended version of Java protecting privacy, ...
7 Illinois People
- Roy Campbell
- Active Spaces, 2K (component-based OS)
- Mobile Security, Cherubim, Seraphim,
- ..., many more
- Klara Nahrstedt
- Ad hoc networks, QoS networking,
- ...
8 Active Spaces
9 2K
10 Illinois People
- M. Dennis Mickunas
- Mobile security, security architecture, active spaces, network-centric operating system.
- Daniel Reed
- Smart Environments
- Performance instrumentation and analysis techniques for large-scale parallel systems and resource management policies.
11 Smart Environments
- Intelligent Information Spaces
- Test bed to explore and evaluate intelligent devices and augmented realities
- Proposed work in ubiquitous information spaces spans three basic areas
- interoperable component architectures for device coordination,
- seamless object communication for user quality of service (QoS), and
- adaptive user context and modality management.
12 UCLA People
- Leonard Kleinrock
- Inventor of the Internet, ANDS,
- SSN, Travler, WAMIS, SESAME
- ...
- Gerald F. Popek
- Panda, Ficus, Truffles, Travler
13 Advanced Networking and Distributed Systems (ANDS)
14 UCLA People
- Peter Reiher
- Distributed Operating & File Systems
- Majid Sarrafzadeh
- Embedded System Design
- Low-Power Computing
- Reconfigurable Computing
- VLSI CAD
- e-commerce
15 Yale People
- Arvind Krishnamurthy
- Power-aware file systems for mobiles,
- Probabilistic Packet Scheduling
- Yang Richard Yang
- Network congestion, mobile wireless networks
- Network security
- Edmund Yeh
- Queuing theory, wireless systems,
- Data networks
16 Washington People
- Thomas Anderson
- Detour: Towards a Virtual Internet
- Portolano: Invisible Computing
- Access: Communication & Computation for WAN and Systems Research
- WebOS: OS support for wide-area (WA) applications
- NoW (Network of Workstations)
- Steven Gribble
- Ninja, DDS, TACC, Denali, Piazza
17 WebOS
- WebOS provides basic operating systems services needed to build applications that are geographically distributed, highly available, incrementally scalable, and dynamically reconfiguring. Our initial implementation is split into the following pieces:
- WebFS: a global file system layer
- Active Names: a mechanism for logically moving service functionality
- Secure Remote Execution
- ...
18 Washington People
- Ed Lazowska
- Quantitative system performance,
- Parallel & Distributed Systems
- Hank M. Levy
- SMT (Simultaneous Multithreading),
- Web Analysis, Piazza
- Porcupine, Opal, Etch, ...
- GMS (Global Memory System)
19 Texas Austin
- Lorenzo Alvisi
- Lightweight Fault-Tolerance
- Cache Consistency in WANs
- WAFT: Support for Fault-Tolerance in WA-OO-Systems
- Byzantine Fault-Tolerance for DSS
- Mike Dahlin
- Peer-to-peer study group
- Lab. for Advanced Systems Research (LASR)
- OS Support for a Program-Enabled Web
- C0PE: Consistent 0-Administrator Personal Environment
20 Texas Austin
- Mohamed G. Gouda
- Programming Methodology,
- Concurrent and Distributed Computing,
- Fault-tolerant Computing, Secure Computing, Network Protocols,
- Formal Methods
- Jayadev Misra
- Parallel and distributed computing
- Proving distributed algorithms
21 Wisconsin People
- Andrea C. Arpaci-Dusseau
- Gray Box System, WIND, NoW-Sort,
- Implicit Coscheduling
- Remzi H. Arpaci-Dusseau
- Storage Systems and I/O (WIND)
- Empirical Analysis
- Storage Management
22 Wisconsin People
- Lawrence H. Landweber
- TheoryNet, CSNET,
- Mentor of the Internet,
- First OSI protocol implementation, ...
- Marvin Solomon
- OO-database systems,
- Software development tools,
- Distributed OS,
- Computer networks,
23 European Scene
- DS groups all over the continent
- UK, F, Ne, No, I, S, ...
- Have a look for yourself
24 German DS People
- Sebastian Abeck
- Application Management
- Teleteaching
- Frank Bellosa (Erlangen)
- Power Management
- Components for distributed real-time systems
List is not yet complete
25 German DS People
- Alejandro Buchmann (Darmstadt)
- TRUSTED (Testbed for Reliable, Ubiquitous, Secure, Transactional, Event-driven and Distributed Systems)
- Peter Druschel (Rice, Houston)
- Pastry/PAST: peer-to-peer systems
- ScalaServer: system support for scalable network servers
26 German DS People
- Claudia Eckert (Darmstadt)
- Mobile Computing
- Security in DS
- GSFS (Group-aware cryptographic file system)
- Kurt Geihs (TU Berlin)
- Middleware for future application scenarios
- QoS distributed Systems
27 German DS People
- Hermann Härtig (TU Dresden)
- DROPS, L4Linux, (Verified) Fiasco,
- COMQUAD (COMponents with QUantitative properties and ADaptivity) together with Alexander Schill, Heinrich Hußmann, Klaus Meissner (TU Hohenheim), Klaus Meyer-Wegener, Andreas Pfitzmann
- µSINA (Secure Internet Networking Architecture),
related to our research group
28 German DS People
- Gerd Hegering (TU München)
- LEONET (nature-analogous learning and optimization methods for networked systems)
- Service Oriented Accounting Management
- Hans-Ulrich Heiss (TU Berlin)
- Cluster Computing
- Security
- Resource Management in DS
- DISCOURSE (4 Berlin universities)
29 German DS People
- Gernot Heiser
- OS, Embedded Systems, DS, SASOS (e.g. Mungi),
- IA-64 Linux, L4 ports on MIPS and Alpha, Gelato,
- Contributions to HW developments
- U4600, a 64-bit computer based on the MIPS R4600 processor, used as a research and teaching platform, and
- PLEB, a computer about 10 x 7 x 1.5 cm in size, based on the StrongARM SA-1100 processor and used for a number of projects including remote data capture, robotics and research into ubiquitous computing.
related to our research group
30 German DS People
- Winfried Lamersdorf (Hamburg)
- Distributed Applications
- Open Distributed Software Architectures
- Friedemann Mattern (ETH Zürich)
- Ubiquitous Computing
- Middleware (MICO)
- Security & Privacy
31 German DS People
- Michael Merz (Hamburg)
- OSM
- COSMOS (Common Open Software Market for SMEs)
- Max Mühlhäuser (Darmstadt)
- uBiZ (ubiquitous business, information, and zest)
- uLearn
32 German DS People
- Jürgen Nehmer (Kaiserslautern)
- Ara, GeneSys, B10, Squirrel
- Mosquito, Panda, ...
33 German DS People
- Hasso Plattner (Walldorf)
- R/3
- Arno Puder (AT&T Labs)
- MICO (open-source CORBA implementation)
- COST (open-source testbed)
34 German DS People
- Kurt Rothermel (Stuttgart)
- Mobile Computing
- Distributed Multimedia
- Groupware & workflow
- Communication protocols
- Trust & security
- Alexander Schill (Dresden)
- Standards
- QoS
- Mobile Computing
35 German DS People
- Johann Schlichter (TU München)
- Community Online Services
- Agent-based information management
- Wolfgang Schröder-Preikschat (Magdeburg → Erlangen)
- PURE (Portable universal runtime executive)
- PEACE (Process execution and communication environment)
36 German DS People
- Peter Sturm (Trier)
- WWW-Caching
- Customized Software for large Systems
- Distributed Applications
- Martina Zitterbart
- Multicast Group Communication
- Mobile Communication
- Distributed persistent objects, ...
37 Impetus
- "The first driving force behind the trend towards distributed systems is economics." (A. Tanenbaum)
- Two different starting points for distributed systems
- Distribute
- Connect
38 Distribution Problem
Suppose you have an expensive mainframe with an OS and applications.
1. How to distribute these applications onto cheaper PCs or workstations (WSs)?
2. How to distribute services of the centralized OS amongst the nodes?
39 Connection Problem
Suppose you have n specialized PCs and/or WSs with different OSes, or hosts spread all over the world.
1. How to connect these systems to get an appropriate remote service?
2. How to support this heterogeneity and how to handle platform-dependent formats?
Hope: To end the tyranny of geography.
40 Real Impetus
- PCs have been the driving force to develop DS.
- Typical requirements of interconnected PCs (at that time)
- Data sharing (e.g., distributed file systems, Web)
- Device sharing (expensive peripherals, color printer)
- Flexibility (workloads can be shifted to less loaded machines, e.g. rlogin)
- Communication (email, etc.)
41 Potential Goals
Goals & Challenges
Potential benefits of a distributed system over a centralized mainframe?
42 Potential Goals
Potential benefits of a distributed system over an isolated PC?
43 Major Disadvantages?
44 Problems and Challenges of DS
- Transparency
- Flexibility
- Reliability
- Performance
- Scalability
45 Transparency
- A user wants a single-system-image view when working with a DS, i.e. he/she does not have to be aware of where objects are located, or of when it is best to request a certain service so that it is done in time.
- Transparency forms
- Location: resources can be established where they are needed
- Migration: resources can migrate without name change
- Replication: resources can be replicated as often as needed
- Failure: users are unaware of failures of individual components
- Concurrency: users are unaware of sharing resources with others
- Parallelism: users are unaware of parallel execution of activities
46 More on Transparency in DS
- Access transparency (access to a remote object is similar to access to a local one)
- Location transparency
- location of processes (tasks)
- location of CPUs and other resources
- location of files
- mobile users
47 More on Transparency in DS
- Parallelism transparency (user writes a serial program; compiler and OS handle the rest, i.e. how many threads on which nodes at what time, etc.)
- Scalability transparency (extending the system with new nodes at runtime)
- Fault transparency (losses of nodes not noticeable)
- Performance transparency (no node collapses due to an unbalanced load)
- Concurrency transparency (user should not care about other users)
48 Replication
- Objectives
- Improve availability / fault tolerance
- Improve performance of queries (either throughput or latency, or better both)
- Lower cost
49 Flexibility
- Flexibility is a natural characteristic of any DS, but you have to support it, e.g. it should be easy to add new system components or to update old ones.
- Furthermore, it may be attractive to install different versions of one system component (debuggers, compilers, etc.)
- ⇒ Client-server architectures are commonly used for DS
50 Client-Server Model
[Figure: client and server, each running on top of its own kernel, connected by the interconnection medium]
Remark: Though communication is between client and server, the kernels and communication layers of both nodes are involved.
51 Layout of a Request/Reply Message
struct message {
    nodeid_t source;          /* system supplied */
    nodeid_t dest;            /* receiver identity */
    int      opcode;          /* desired operation */
    int      count;           /* data size */
    int      xyz;             /* specific */
    int      result;          /* for reply */
    char     object[N_Name];  /* name of target object */
    char     data[BUF_SIZE];  /* data to be transferred */
};
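The request/reply layout on this slide can be exercised with a small sketch. The slide leaves `nodeid_t` and the two array sizes open, so the typedef, the constants `N_NAME` and `BUF_SIZE`, the opcodes and the helper names `make_request` and `make_reply` are assumptions made here for illustration only.

```c
#include <string.h>

/* Assumed definitions: the slide does not fix these types or sizes. */
typedef int nodeid_t;
#define N_NAME   32
#define BUF_SIZE 256

enum { OP_READ = 1, OP_WRITE = 2 };   /* hypothetical opcodes */

struct message {
    nodeid_t source;          /* system supplied */
    nodeid_t dest;            /* receiver identity */
    int      opcode;          /* desired operation */
    int      count;           /* data size */
    int      xyz;             /* specific */
    int      result;          /* for reply */
    char     object[N_NAME];  /* name of target object */
    char     data[BUF_SIZE];  /* data to be transferred */
};

/* Fill in a request. Returns 0 on success, -1 if the object name
   or the payload does not fit into the fixed-size fields. */
int make_request(struct message *m, nodeid_t src, nodeid_t dst, int opcode,
                 const char *object, const void *data, int count)
{
    if (count < 0 || count > BUF_SIZE || strlen(object) >= N_NAME)
        return -1;
    memset(m, 0, sizeof *m);
    m->source = src;
    m->dest = dst;
    m->opcode = opcode;
    m->count = count;
    strcpy(m->object, object);
    if (count > 0)
        memcpy(m->data, data, (size_t)count);
    return 0;
}

/* Derive a reply from a request: swap the endpoints, set the result,
   and mark the reply as carrying no payload. */
void make_reply(const struct message *req, struct message *rep, int result)
{
    *rep = *req;
    rep->source = req->dest;
    rep->dest = req->source;
    rep->result = result;
    rep->count = 0;
}
```

One message type serves both directions: the server reuses the request, swaps `source` and `dest`, and reports the outcome in `result`.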
52 Naming Problem
- Before a client can use any service, he/she has to locate that service. How?
- machine.server_process is sufficient to route a request to the target, but ...
- the server ID may have changed due to a restart
- the server may have migrated to another machine
- Global process IDs, but how to allocate unique global process IDs?
- Establishing a central service contradicts scalability
- Use broadcasting to locate the target server, but broadcasting eats up resources,
- not a scalable solution; we'll get sequences of messages of the type
- where → here → request → reply
- Introduce a specific name server, resulting in the following sequence
- lookup(service) → reply(server) → request(server) → reply(result)
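The name-server variant can be sketched as a small in-memory table that maps a service name to the node currently hosting it. This is only an illustration of the lookup step: the names `ns_register` and `ns_lookup`, the table size and the use of plain integers as node IDs are assumptions, and a real name server would be reached over the network rather than by a function call.

```c
#include <string.h>

#define MAX_SERVICES 16
#define NAME_LEN     32

/* One name-server entry: service name -> node currently hosting it. */
struct binding {
    char name[NAME_LEN];
    int  node;
};

static struct binding table[MAX_SERVICES];
static int n_bindings = 0;

/* Register a service, or update its node after a restart or migration.
   Returns 0 on success, -1 if the name is too long or the table is full. */
int ns_register(const char *service, int node)
{
    if (strlen(service) >= NAME_LEN)
        return -1;
    for (int i = 0; i < n_bindings; i++) {
        if (strcmp(table[i].name, service) == 0) {
            table[i].node = node;
            return 0;
        }
    }
    if (n_bindings == MAX_SERVICES)
        return -1;
    strcpy(table[n_bindings].name, service);
    table[n_bindings].node = node;
    n_bindings++;
    return 0;
}

/* lookup(service) -> reply(server): returns the node ID, or -1 if unknown. */
int ns_lookup(const char *service)
{
    for (int i = 0; i < n_bindings; i++)
        if (strcmp(table[i].name, service) == 0)
            return table[i].node;
    return -1;
}
```

Because clients ask by name on each lookup, a server that restarts or migrates only has to re-register. The name server itself is, however, a central component of the kind the scalability slides warn about.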
53 Reliability
DS provide higher availability by replication, but in general more distributed components are needed to perform a complex service.
⇒ Potentially additional points of failure ⇒ the system architect must increase reliability by extra measures!
- Reliability requires
- Consistency
- Security
- Fault Tolerance
54 Performance
It is hard to achieve the utmost performance in a DS because other requirements may conflict with performance, in particular
- Transparency
- Migration
- Reliability
Remark: DS should offer high performance per unit cost
55 Performance Measures
- Bandwidth (throughput) = bits transmitted / time
- (e.g. bandwidth of Ethernet = 10 Mbps ⇒ it takes 0.1 µs to transmit 1 bit)
- Latency (delay) = time needed to transfer a packet from source to target
- (latency(transcontinental network) ≈ 10 ms, i.e. from NY to LA)
- Round-trip time = time needed to transfer & acknowledge a packet
- Latency = Propagation + Transmit + Queue, whereby
- Propagation ≥ Distance / Speed_of_Light
- Transmit = Size / Bandwidth
SoL(vacuum) = 3 × 10^8 m/s, SoL(cable) ≈ 2.3 × 10^8 m/s, SoL(fiber) ≈ 2.0 × 10^8 m/s
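The latency decomposition on this slide (propagation, transmit, queue) can be turned into a few lines of arithmetic. The sketch below uses the signal speeds given on the slide; the function and constant names are chosen here for illustration, and the propagation term is the lower bound Distance / Speed_of_Light.

```c
/* Signal speeds from the slide, in m/s. */
#define SOL_VACUUM 3.0e8
#define SOL_CABLE  2.3e8
#define SOL_FIBER  2.0e8

/* Lower bound on the propagation delay in seconds for a distance in
   metres and a signal speed in m/s. */
double propagation_s(double distance_m, double speed_mps)
{
    return distance_m / speed_mps;
}

/* Time in seconds to put size_bits onto a link of bandwidth_bps. */
double transmit_s(double size_bits, double bandwidth_bps)
{
    return size_bits / bandwidth_bps;
}

/* Latency = Propagation + Transmit + Queue, all in seconds. */
double latency_s(double distance_m, double speed_mps,
                 double size_bits, double bandwidth_bps, double queue_s)
{
    return propagation_s(distance_m, speed_mps)
         + transmit_s(size_bits, bandwidth_bps)
         + queue_s;
}
```

For example, `transmit_s(1.0, 10.0e6)` gives the slide's 0.1 µs for one bit on 10 Mbps Ethernet, and `propagation_s(1.0e6, SOL_FIBER)` gives 5 ms for an assumed 1,000 km of fiber.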
56 Application dominates Performance
Though bandwidth & latency define the performance of a link, their relative importance depends on the application.
Example 1: A client sends a 1-byte message to the server and receives a 1-byte answer. This application is heavily latency bound, supposing no serious computing is involved to give the answer.
Medium 1: Transcontinental channel with a 100 ms RTT
Medium 2: SAN channel with a 1 ms RTT
Whether this channel has a bandwidth of 1 Mbps or 100 Mbps is irrelevant, because the time to transmit one byte is either 8 µs or 0.08 µs.
57 Application dominates Performance
Example 2: A digital library is asked to fetch a 25 MB image. In this case the bandwidth of the channel is very important, whereas it is negligible whether it is medium 1 or medium 2, i.e. a 100 ms RTT or a 1 ms RTT. The time to transmit this image is about 20 s (25 MB = 200 Mbit, i.e. at roughly 10 Mbps) ⇒ it is irrelevant whether it is 20.001 s or 20.1 s.
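Both examples can be checked with a short calculation. The sketch below models one request as one round trip plus the time to push the payload through the link, ignoring queueing. The function names are hypothetical, 25 MB is taken as 25 × 10^6 bytes, and the 10 Mbps link for Example 2 is an assumption that reproduces the "about 20 s" on the slide.

```c
/* Time in seconds to complete one request: one round trip plus the time
   to push payload_bytes through a link of bandwidth_bps (queueing ignored). */
double request_time_s(double rtt_s, double payload_bytes, double bandwidth_bps)
{
    return rtt_s + payload_bytes * 8.0 / bandwidth_bps;
}

/* Fraction of the total request time that is spent on the round trip. */
double latency_share(double rtt_s, double payload_bytes, double bandwidth_bps)
{
    return rtt_s / request_time_s(rtt_s, payload_bytes, bandwidth_bps);
}
```

With a 100 ms RTT, one byte at 1 Mbps spends more than 99.9 % of its time on the round trip (latency bound). The 25 MB image at 10 Mbps spends less than 1 % there (bandwidth bound), giving totals of 20.001 s and 20.1 s for the two media.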
58 Scalability
Performance must not degrade with a growing DS ⇒ consequence for a system architect
- Avoid any form of a centralized solution within a DS, i.e. each central resource might become a bottleneck
- Components (e.g. single server)
- Tables (e.g. directories in DFS)
- Algorithms (e.g. deadlock detection)
59 Four Design Rules for Scalability
- Bottlenecks can be resources or communication with them
- Do not require any node within the DS to hold the complete system state
- Allow nodes to make decisions based on local information
- Design algorithms that survive failures of nodes
- Make no assumptions about a global clock (time)
60 Preview
- Taxonomy of DS
- Motivation by Example
- Conceptual Problems
- Preliminaries of Hardware for DS
- Node Interfaces
- Network Interconnections
- Network Topologies