Title: CS 34701: Large-Scale Networked Systems
1CS 34701 Large-Scale Networked Systems
- Professor Ian Foster
- TA Adriana Iamnitchi
- http://dsl.cs.uchicago.edu/Courses/cs347-2002/
2CS 34701 Course Goals
- Primary
- Gain deep understanding of fundamental issues that affect design of large-scale networked systems
- Map primary contemporary research themes
- Gain experience in network research
- Secondary
- By studying a set of outstanding papers, build knowledge of how to present research
- Learn how to read papers and evaluate ideas
3How the Class Works
- Research papers
- Prior to each class, we all read and evaluate two research papers
- During each class, we discuss those papers
- Project
- One-page project description by 2nd week
- Five-page project summary by 5th week
- 10-20 page final paper by 9th week
- Project presentations 9th and 10th weeks.
4Paper Review and Discussion
- Everyone reads two papers per class and submits an evaluation (see below)
- We discuss (not present) papers in class
- A team of 2-3 leads each discussion
- The leading team submits a discussion plan before class, then submits a master critique and summarizes the discussion at the beginning of the following class
- Look over the schedule between now and Friday, when we will allocate discussants
5Evaluations
- You must submit evaluations of papers
- Email them by 6pm the day before
- Answer a set of standard questions
- State the main contribution of the paper
- Critique the main contribution
- What are the three strongest and/or most interesting ideas in the paper?
- Three most striking weaknesses in the paper?
- Three questions to ask the authors?
- Detail an interesting extension to the work not mentioned in the future work section.
- Optional comments on the paper that you'd like to see discussed in class.
6What I'll Assume You Know
- Basic Internet architecture
- IP, TCP, DNS, HTTP
- Basic principles of distributed computing
- Asynchrony (cannot distinguish between communication failures and latency)
- Partial global state knowledge (cannot know everything correctly)
- Failures happen. In very large systems, even rare failures happen often
- If there are things that don't make sense, ask!
7Large-Scale Networked Systems
- Internet-connected networks with a large number of components, spanning multiple DNS domains (usually WAN)
- Designed to solve specific problems
- Content distribution
- Cycle sharing
- File sharing
- Sensor data fusion
- Distributed data analysis
8Example: Gnutella
- Peer-to-peer file sharing system
- File sharing: the goal is to enable publication of and access to files
- P2P: no central servers; all clients also act as servers and are equivalent (more or less)
- Issues
- Scaling to very large numbers of nodes
- Properties: bootstrapping, reliability, cost, anonymity, security, freeloading
9Gnutella Protocol Overview
- P2P file sharing application on top of an overlay network
- Nodes maintain open TCP connections.
- Messages are broadcast (flooded) or back-propagated.
- Protocol
                Broadcast (flooding)   Back-propagated   Node to node
Membership      PING                   PONG
Query           QUERY                  QUERY HIT
File download                                            GET, PUSH
10Gnutella search mechanism
- Steps
- Node 2 initiates search for file A
[Figure: overlay network graph with nodes 1-7]
11Gnutella search mechanism
- Steps
- Node 2 initiates search for file A
- Sends message to all neighbors
[Figure: overlay network graph with nodes 1-7]
12Gnutella search mechanism
- Steps
- Node 2 initiates search for file A
- Sends message to all neighbors
- Neighbors forward message
[Figure: overlay network graph with nodes 1-7]
13Gnutella search mechanism
- Steps
- Node 2 initiates search for A
- Sends message to all neighbors
- Neighbors forward message
- Nodes that have file A initiate a reply message
[Figure: overlay network graph with nodes 1-7]
14Gnutella search mechanism
- Steps
- Node 2 initiates search for A
- Sends message to all neighbors
- Neighbors forward message
- Nodes that have file A initiate a reply message
- Query reply message is back-propagated
[Figure: overlay network graph with nodes 1-7]
15Gnutella search mechanism
- Steps
- Node 2 initiates search for A
- Sends message to all neighbors
- Neighbors forward message
- Nodes that have file A initiate a reply message
- Query reply message is back-propagated
- Node 2 gets replies
[Figure: overlay network graph with nodes 1-7]
16Gnutella search mechanism
- Steps
- Node 2 initiates search for A
- Sends message to all neighbors
- Neighbors forward message
- Nodes that have file A initiate a reply message
- Query reply message is back-propagated
- Node 2 gets replies
- File download
[Figure: overlay network graph with nodes 1-7; label "download A"]
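The search steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 7-node topology and the choice of which nodes hold file A are hypothetical (the slide figure's edges are not recoverable), and `flood_query` is an invented helper name, not part of any Gnutella implementation. The query is flooded with a TTL, duplicates are dropped, and each hit is back-propagated along the reverse of the path the query took.

```python
from collections import deque

def flood_query(graph, origin, holders, ttl=7):
    """Flood a query from `origin`; return {holder: reply path back to origin}."""
    parent = {origin: None}              # node -> neighbor the query first arrived from
    frontier = deque([(origin, ttl)])
    hits = {}
    while frontier:
        node, remaining = frontier.popleft()
        if node in holders and node != origin:
            # Back-propagate the reply hop by hop along the reverse query path.
            path = [node]
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            hits[node] = path
        if remaining == 0:
            continue                     # TTL exhausted: do not forward further
        for nbr in graph[node]:
            if nbr not in parent:        # duplicate suppression: forward a query once
                parent[nbr] = node
                frontier.append((nbr, remaining - 1))
    return hits

# Hypothetical topology for nodes 1-7 (assumption, not taken from the slides).
overlay = {1: [2, 4], 2: [1, 3, 6], 3: [2, 5], 4: [1, 7],
           5: [3], 6: [2, 7], 7: [4, 6]}
holders_of_a = {5, 7}                    # assumption: nodes 5 and 7 store file A

replies = flood_query(overlay, origin=2, holders=holders_of_a)
print(replies)
```

Once node 2 has the replies, the download itself is a direct node-to-node transfer and does not travel over the overlay path.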
17Tools for network exploration
- Eavesdropper: modified node inserted into the network to log traffic.
- Crawler: connects to all active nodes and uses the membership protocol to discover graph topology.
- Parallel crawling.
- Graph analysis tools: high-volume offline computations.
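A crawler of this kind is essentially a breadth-first traversal of the overlay. The sketch below is a minimal, single-threaded illustration, not the tool used for the measurements. `get_neighbors` is a hypothetical callback standing in for a membership exchange with a live node; here it is backed by an in-memory toy graph.

```python
from collections import deque

def crawl(seed, get_neighbors):
    """Breadth-first crawl from `seed`; returns (nodes, undirected edges)."""
    seen = {seed}
    edges = set()
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for nbr in get_neighbors(node):  # hypothetical: ask `node` who it is connected to
            edges.add(frozenset((node, nbr)))
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return seen, edges

# Toy overlay (assumption) used in place of real network probes.
toy = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
nodes, edges = crawl("a", toy.__getitem__)
avg_degree = 2 * len(edges) / len(nodes)
print(len(nodes), len(edges), avg_degree)
```

A parallel crawler would hand the queue to several workers. The traversal logic stays the same.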
18Network growth
- High user interest
- Users tolerate high latency, low quality results.
- Better resources
- DSL and cable modem nodes grew from 24% to 41% over 6 months.
- Open architecture / open-source environment
- Competing implementations
- Lower-overhead network traffic, improved resource utilization, better structure
- Recently, two-level structure.
19Growth invariants
- Graph connectivity: 3.4 links per node on average.
- Path length distribution: node-to-node distance maintains similar distributions.
- Avg. node-to-node distance grew 25% while the network grew 50 times over 6 months.
- Random graph theory predicts about a 75% increase.
20Is Gnutella a power-law network?
Power-law networks: the number of nodes N with exactly L links is proportional to L^{-k}, i.e. $N \propto L^{-k}$
- Examples
- The Internet
- In/out links to/from HTML pages
- Citation networks
- US power grid
- Social networks
[Figure: November 2000 measurements]
Implication: High tolerance to random node failure but low reliability when facing an intelligent adversary
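One rough way to check the power-law relation on crawl data is to fit a straight line to log N versus log L; the negated slope estimates k. The sketch below uses a plain least-squares fit on synthetic counts generated with k = 2. Both the data and the function name are assumptions for illustration. Least squares on a log-log histogram is a simple but biased estimator, so treat the result as indicative only.

```python
import math

def fit_power_law_exponent(hist):
    """Estimate k in N ~ L^-k from {links L: node count N} by log-log least squares."""
    pts = [(math.log(L), math.log(n)) for L, n in hist.items() if L > 0 and n > 0]
    mean_x = sum(x for x, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return -sxy / sxx                    # slope is -k, so negate it

# Synthetic degree histogram (assumption): N = 1000 * L^-2, rounded to whole nodes.
synthetic = {L: round(1000 * L ** -2.0) for L in range(1, 11)}
k_estimate = fit_power_law_exponent(synthetic)
print(round(k_estimate, 2))
```

On a real snapshot, a log-log plot that bends away from a straight line would suggest the distribution is not a pure power law.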
21Is Gnutella a power-law network?
- Later, larger networks display a bimodal distribution.
- Implications
- High tolerance to random node failures preserved
- Increased reliability when facing an attack.
[Figure: May 2001 measurements]
22Traffic analysis
- About 6-8 kbps per link over any connection.
- Traffic structure changed over time.
23Total generated traffic
- 1 Gbps (or 330 TB/month)!
- Note that this estimate excludes actual file transfers
- Q: Does it matter?
- Compare to 15,000 TB/month estimated in US Internet backbone (Dec. 2000).
- Reasoning
- QUERY and PING messages are flooded. They form more than 90% of generated traffic
- Predominant TTL = 7
- >95% of nodes are less than 7 hops away
- Measured traffic at each link: about 6 to 8 kbps
- Network with 50k nodes and 170k links
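The headline figure can be reproduced as back-of-the-envelope arithmetic. The sketch below assumes the lower end of the measured per-link rate (6 kbps), a 30-day month, and decimal units (1 kbps = 1000 bit/s, 1 TB = 10^12 bytes). These choices are assumptions, not stated on the slide.

```python
LINKS = 170_000                       # links in the measured network (from the slide)
KBPS_PER_LINK = 6                     # assumption: lower end of the 6-8 kbps range
SECONDS_PER_MONTH = 30 * 24 * 3600    # assumption: 30-day month
BACKBONE_TB_PER_MONTH = 15_000        # US backbone estimate quoted on the slide

total_bps = LINKS * KBPS_PER_LINK * 1000
total_gbps = total_bps / 1e9
tb_per_month = total_bps * SECONDS_PER_MONTH / 8 / 1e12
share_of_backbone = tb_per_month / BACKBONE_TB_PER_MONTH

print(f"{total_gbps:.2f} Gbps, {tb_per_month:.0f} TB/month, "
      f"{share_of_backbone:.1%} of backbone estimate")
```

Under these assumptions, overlay control traffic alone comes to roughly 2% of the quoted backbone volume, before any file transfers.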
24Topology mismatch
- The overlay network topology doesn't match the underlying Internet infrastructure topology!
- 40% of all nodes are in the 10 largest Autonomous Systems (AS).
- Only 2-4% of all TCP connections link nodes within the same AS.
- Largely random wiring.
- Entropy experiment gives similar results.
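The intra-AS measurement can be illustrated with a small calculation: count the overlay connections whose endpoints share an AS, and compare that with what purely random wiring would give. The node names, AS labels, and edges below are made up for the example. The baseline uses the approximation that two randomly chosen endpoints fall in the same AS with probability equal to the sum of squared AS shares.

```python
from collections import Counter

def intra_as_fraction(edges, as_of):
    """Fraction of overlay connections whose two endpoints are in the same AS."""
    same = sum(1 for a, b in edges if as_of[a] == as_of[b])
    return same / len(edges)

def random_wiring_baseline(as_of):
    """Approximate same-AS probability if endpoints were picked at random."""
    counts = Counter(as_of.values())
    n = len(as_of)
    return sum((c / n) ** 2 for c in counts.values())

# Hypothetical data (assumption): four nodes spread over three ASes.
as_of = {"n1": "AS1", "n2": "AS1", "n3": "AS2", "n4": "AS3"}
edges = [("n1", "n2"), ("n1", "n3"), ("n2", "n4"), ("n3", "n4")]

measured = intra_as_fraction(edges, as_of)
baseline = random_wiring_baseline(as_of)
print(measured, baseline)
```

A measured fraction at or below the baseline is consistent with wiring that ignores the underlying topology.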
25Course Topics
- Internet Architecture and Design Principles
- Flat Pricing vs. Prioritized Traffic
- Internet Measurements
- Availability in Wide-Area
- Patterns in Real Networks
- Modeling the Internet Topology
- Internet Services: DNS
- Web Caching, Content Distribution Networks
- Overlay Networks
- Peer-to-Peer systems
- Computational Grids
- Security Issues
- Sensor Nets
- Wireless Networks
- XML: SOAP and Web Services
26Course Topics
- Internet Design Principles
- How do I deliver Internet services: end-to-end vs. within the network?
- Flat Pricing vs. Prioritized Traffic
- How do I determine which traffic to pass over the Internet?
- Internet Measurements
- What does the Internet really look like?
27Course Topics
- Availability in Wide-Area
- How reliable is the Internet?
- Patterns in Real Networks
- What does Internet traffic look like?
- Modeling the Internet Topology
- How can I construct realistic models of Internet
structure?
28Course Topics
- Internet Services: DNS
- How well does DNS work?
- Web Caching, Content Distribution Networks
- How do we optimize Web content mgmt?
- Overlay Networks
- Improving routing performance
29Course Topics
- Peer-to-Peer systems
- Gnutella, etc., etc.
- Computational Grids
- Globus, etc.
- Security Issues
- Authorization, etc.
30Course Topics
- Sensor Nets
- How do I structure and program networks of lightweight devices?
- Wireless Networks
- How do I route in ad hoc networks?
- XML: SOAP and Web Services
- What are Web services anyway?
31Projects
- Literature surveys, real implementations, analytical evaluations
- Can be performed individually or in a team of two
- Your project ideas are appreciated (to be discussed before proposal due date)
- Primary goal is to do something interesting and to do it well
32Example Project
- Gnutella network analysis
- Develop a crawler that traverses the network and collects membership and connectivity info
- Analyze structure
- Characterize structure
- See, e.g.
- Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems and Implications for System Design, M. Ripeanu, I. Foster, A. Iamnitchi, in IEEE Internet Computing Journal, vol. 6(1), 2002
33Project Ideas
- http://dsl.cs.uchicago.edu/Courses/cs347-2002/cs347_projects.htm
- Gnutella network measurements
- Topology discovery for 500K nodes
- Structural analysis with 500K nodes
- Study impact of overlay networks
- Etc.
34Project Ideas
- Overlay networks: build unstructured or semistructured self-organizing overlays optimizing different cost functions
- Topology-aware: map onto physical infrastructure
- Usage-aware: map onto usage patterns
- Analysis of Sloan Digital Sky Survey logs to explore access patterns
- What files are accessed how often?
- What community usage patterns emerge?
- How can we exploit these in content distribution networks?
35Project Ideas
- Compare qualitatively and analytically current file-location solutions (CAN, Chord, Gnutella, Napster, etc.) in the context of scientific file-sharing collaborations.
- Evaluate sharing patterns based on real usage traces in a scientific collaboration
- Use these patterns to evaluate benefits/drawbacks and propose better alternatives
- Expand existing simulator to evaluate request forwarding techniques for resource location in grid environments
36For More Information
- Contact me
- Ian Foster, foster_at_cs.uchicago.edu
- Email or set up a meeting
- Contact Anda, our TA
- Adriana Iamnitchi, anda_at_cs.uchicago.edu
- Monitor the class web page
- http://dsl.cs.uchicago.edu/Courses/cs347-2002/
37Next 2 Classes
- Friday
- Discuss
- J. Saltzer, D. Reed, and D. Clark, End-to-end Arguments in System Design. ACM Transactions on Computer Systems, Vol. 2, No. 4, pp. 195-206, 1984.
- D. Clark and M. Blumenthal, Rethinking the design of the Internet: The end to end arguments vs. the brave new world, Workshop on Policy Implications of End-to-End. December 1, 2001.
- Leading group: Ian + 2 volunteers (who?)
- Wednesday
- Leading group: Anda + 1-2 volunteers (who?)