CS 34701: Large-Scale Networked Systems - PowerPoint PPT Presentation

1
CS 34701: Large-Scale Networked Systems
  • Professor: Ian Foster
  • TA: Adriana Iamnitchi
  • http://dsl.cs.uchicago.edu/Courses/cs347-2002/

2
CS 34701: Course Goals
  • Primary
  • Gain deep understanding of fundamental issues
    that affect the design of large-scale networked
    systems
  • Map primary contemporary research themes
  • Gain experience in network research
  • Secondary
  • By studying a set of outstanding papers, build
    knowledge of how to present research
  • Learn how to read papers and evaluate ideas

3
How the Class Works
  • Research papers
  • Prior to each class, we all read and evaluate two
    research papers
  • During each class, we discuss those papers
  • Project
  • One-page project description by 2nd week
  • Five-page project summary by 5th week
  • 10-20 page final paper by 9th week
  • Project presentations in 9th and 10th weeks

4
Paper Review and Discussion
  • Everyone reads two papers per class and submits
    an evaluation (see below)
  • We discuss (not present) papers in class
  • A team of 2-3 leads each discussion
  • The leading team submits a discussion plan before
    class, then submits a master critique and summarizes
    the discussion at the beginning of the following class
  • Look over the schedule between now and Friday, when
    we will allocate discussants

5
Evaluations
  • You must submit evaluations of papers
  • Email them by 6pm the day before
  • Answer a set of standard questions
  • State the main contribution of the paper
  • Critique the main contribution
  • What are the three strongest and/or most
    interesting ideas in the paper?
  • Three most striking weaknesses in the paper?
  • Three questions to ask the authors?
  • Detail an interesting extension to the work not
    mentioned in the future work section.
  • Optional comments on the paper that you'd like to
    see discussed in class.

6
What I'll Assume You Know
  • Basic Internet architecture
  • IP, TCP, DNS, HTTP
  • Basic principles of distributed computing
  • Asynchrony (cannot distinguish between
    communication failures and latency)
  • Partial global state knowledge (cannot know
    everything correctly)
  • Failures happen. In very large systems, even rare
    failures happen often
  • If there are things that don't make sense, ask!

7
Large-Scale Networked Systems
  • Internet-connected networks with a large number
    of components, spanning multiple DNS domains
    (usually WAN)
  • Designed to solve specific problems
  • Content distribution
  • Cycle sharing
  • File sharing
  • Sensor data fusion
  • Distributed data analysis

8
Example: Gnutella
  • Peer-to-peer file sharing system
  • File sharing: the goal is to enable publication of
    and access to files
  • P2P: no central servers; all clients also act as
    servers and are equivalent (more or less)
  • Issues
  • Scaling to very large numbers of nodes
  • Properties: bootstrapping, reliability, cost,
    anonymity, security, freeloading

9
Gnutella Protocol Overview
  • P2P file sharing application on top of an overlay
    network
  • Nodes maintain open TCP connections.
  • Messages are broadcast (flooded) or
    back-propagated.
  • Protocol

                Broadcast (flooding)   Back-propagated   Node to node
Membership      PING                   PONG
Query           QUERY                  QUERY HIT
File download                                            GET, PUSH
10
Gnutella search mechanism
  • Steps
  • Node 2 initiates search for file A

[Figure: overlay network of seven nodes, numbered 1-7]
11
Gnutella search mechanism
  • Steps
  • Node 2 initiates search for file A
  • Sends message to all neighbors

[Figure: overlay network of seven nodes, numbered 1-7]
12
Gnutella search mechanism
  • Steps
  • Node 2 initiates search for file A
  • Sends message to all neighbors
  • Neighbors forward message

[Figure: overlay network of seven nodes, numbered 1-7]
13
Gnutella search mechanism
  • Steps
  • Node 2 initiates search for A
  • Sends message to all neighbors
  • Neighbors forward message
  • Nodes that have file A initiate a reply message

[Figure: overlay network of seven nodes, numbered 1-7]
14
Gnutella search mechanism
  • Steps
  • Node 2 initiates search for A
  • Sends message to all neighbors
  • Neighbors forward message
  • Nodes that have file A initiate a reply message
  • Query reply message is back-propagated

[Figure: overlay network of seven nodes, numbered 1-7]
15
Gnutella search mechanism
  • Steps
  • Node 2 initiates search for A
  • Sends message to all neighbors
  • Neighbors forward message
  • Nodes that have file A initiate a reply message
  • Query reply message is back-propagated
  • Node 2 gets replies

[Figure: overlay network of seven nodes, numbered 1-7]
16
Gnutella search mechanism
  • Steps
  • Node 2 initiates search for A
  • Sends message to all neighbors
  • Neighbors forward message
  • Nodes that have file A initiate a reply message
  • Query reply message is back-propagated
  • Node 2 gets replies
  • File download

[Figure: overlay network of seven nodes, numbered 1-7; label: download A]
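The search steps on the preceding slides can be sketched as a small simulation. This is only an illustration, not the Gnutella wire protocol: the function name `flood_query`, the seven-node wiring, and the choice of nodes 5 and 7 as holders of file A are all assumptions, since the figure's actual links are not recoverable from the transcript. The breadth-first order stands in for "first copy of the query received"; a real asynchronous network need not deliver messages in that order.

```python
from collections import deque

def flood_query(graph, holders, origin, ttl=7):
    """Flood a query from `origin`; return {holder: reply path back to origin}."""
    parent = {origin: None}          # node each peer first heard the query from
    frontier = deque([(origin, ttl)])
    hits = {}
    while frontier:
        node, remaining = frontier.popleft()
        if node in holders and node != origin:
            # Back-propagate: retrace the reverse path hop by hop to the origin.
            path, hop = [], node
            while hop is not None:
                path.append(hop)
                hop = parent[hop]
            hits[node] = path
        if remaining == 0:
            continue                 # TTL exhausted, do not forward further
        for neighbor in graph[node]:
            if neighbor not in parent:   # duplicate copies of the query are dropped
                parent[neighbor] = node
                frontier.append((neighbor, remaining - 1))
    return hits

# Hypothetical wiring for the seven nodes (assumed, not taken from the figure).
graph = {
    1: [2, 4],
    2: [1, 3, 6],
    3: [2, 5],
    4: [1, 7],
    5: [3, 6],
    6: [2, 5],
    7: [4],
}
holders = {5, 7}                     # assumed holders of file A

replies = flood_query(graph, holders, origin=2)
print(replies)
```

With this wiring, node 2 receives one reply path per holder and can then download the file directly from a holder, which is the node-to-node step in the protocol table. Lowering `ttl` shrinks the search horizon, so distant holders are never reached.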
17
Tools for network exploration
  • Eavesdropper - modified node inserted into the
    network to log traffic.
  • Crawler - connects to all active nodes and uses
    the membership protocol to discover graph
    topology.
  • Parallel crawling.
  • Graph analysis tools: high-volume offline
    computations.
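A crawler of the kind described above can be sketched as a breadth-first traversal. This is a sequential sketch under stated assumptions: `get_neighbors` is a hypothetical stand-in for whatever the membership protocol (PING/PONG) reveals about a node's neighbors, and the toy `overlay` dictionary is invented for illustration. The slide mentions parallel crawling; that is not shown here.

```python
from collections import deque

def crawl(seed, get_neighbors):
    """Breadth-first crawl; returns (nodes discovered, undirected links discovered)."""
    seen = {seed}
    queue = deque([seed])
    links = set()
    while queue:
        node = queue.popleft()
        for peer in get_neighbors(node):
            links.add(frozenset((node, peer)))   # store each link once, direction-free
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return seen, links

# Toy overlay used in place of live membership responses (assumed data).
overlay = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
nodes, links = crawl("a", overlay.__getitem__)
print(len(nodes), "nodes,", len(links), "links")
```

The resulting node and link sets are the input that offline graph analysis tools would consume.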

18
Network growth
  • High user interest
  • Users tolerate high latency, low quality results.
  • Better resources
  • DSL and cable modem nodes grew from 24% to 41%
    over 6 months.
  • Open architecture / open-source environment
  • Competing implementations
  • Lower overhead network traffic, improved
    resource utilization, better structure
  • Recently, two-level structure.

19
Growth invariants
  1. Graph connectivity: 3.4 links per node on
    average.
  2. Path length distribution: node-to-node distance
    maintains similar distributions.
  • Avg. node-to-node distance grew 25% while the
    network grew 50 times over 6 months.
  • Random graph theory predicts about a 75% increase.

20
Is Gnutella a power-law network?
Power-law networks: the number of nodes $N$ with
exactly $L$ links is proportional to $L^{-k}$: $N \propto L^{-k}$
  • Examples
  • The Internet
  • In/out links to/from HTML pages
  • Citation networks
  • US power grid
  • Social networks

November 2000
Implication: high tolerance to random node
failure but low reliability when facing an
intelligent adversary
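One simple way to check whether a measured degree histogram follows the power-law relation above is to fit a straight line in log-log space: taking logarithms gives log N = log c - k log L, so the slope is -k. The sketch below assumes a histogram mapping link count L to node count N; the function name and the synthetic data are illustrative, not measurements from the Gnutella crawls. Least squares on log-log data is a quick first check only and can be biased on noisy real data.

```python
import math

def fit_power_law_exponent(degree_counts):
    """Least-squares slope of log N versus log L; returns the estimate of k."""
    xs = [math.log(links) for links in degree_counts]
    ys = [math.log(count) for count in degree_counts.values()]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -slope   # N ~ L^(-k), so k is the negated slope

# Synthetic histogram generated from N = 10000 * L^(-2.3) (not measured data).
synthetic = {links: 10000 * links ** -2.3 for links in range(1, 50)}
estimated_k = fit_power_law_exponent(synthetic)
print(round(estimated_k, 3))
```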
21
Is Gnutella a power-law network?
  • Later, larger networks display a bimodal
    distribution.
  • Implications
  • High tolerance to random node failures preserved
  • Increased reliability when facing an attack.

May 2001
22
Traffic analysis
  • About 6-8 kbps per link over any connection.
  • Traffic structure changed over time.

23
Total generated traffic
  • 1Gbps (or 330TB/month)!
  • Note that this estimate excludes actual file
    transfers
  • Q: Does it matter?
  • Compare to 15,000TB/month estimated in US
    Internet backbone (Dec. 2000).
  • Reasoning
  • QUERY and PING messages are flooded. They form
    more than 90% of generated traffic
  • Predominant TTL = 7
  • >95% of nodes are less than 7 hops away
  • Measured traffic at each link: about 6 to 8 kbps
  • Network with 50k nodes and 170k links
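The headline numbers on this slide can be reproduced with back-of-the-envelope arithmetic. The slide does not spell out the calculation, so the sketch below is one plausible reconstruction: it assumes the low end of the measured per-link rate (6 kbps), a 30-day month, and decimal units (1 TB = 10^12 bytes).

```python
LINKS = 170_000                      # overlay links, from the slide
PER_LINK_BPS = 6_000                 # low end of the measured 6-8 kbps per link
SECONDS_PER_MONTH = 30 * 24 * 3600   # assumption: 30-day month
BACKBONE_TB_PER_MONTH = 15_000       # US backbone estimate quoted on the slide

total_bps = LINKS * PER_LINK_BPS
total_gbps = total_bps / 1e9
tb_per_month = total_bps / 8 * SECONDS_PER_MONTH / 1e12   # bits -> bytes -> TB
share_of_backbone = tb_per_month / BACKBONE_TB_PER_MONTH

print(f"{total_gbps:.2f} Gbps, {tb_per_month:.0f} TB/month, "
      f"{share_of_backbone:.1%} of backbone estimate")
```

Under these assumptions the total comes to roughly 1 Gbps and about 330 TB per month, or a little over 2% of the quoted backbone figure, before counting any file transfers.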

24
Topology mismatch
  • The overlay network topology doesn't match the
    underlying Internet infrastructure topology!
  • 40% of all nodes are in the 10 largest Autonomous
    Systems (AS).
  • Only 2-4% of all TCP connections link nodes
    within the same AS.
  • Largely random wiring.
  • Entropy experiment gives similar results.
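The intra-AS statistic above can be computed directly once each node is mapped to its Autonomous System. The sketch below assumes that mapping already exists as a dictionary (in practice it would come from an IP-to-AS lookup, which is not shown); the node names, AS numbers, and links are toy values invented for illustration.

```python
def intra_as_fraction(links, as_of):
    """Fraction of overlay links whose two endpoints are in the same AS."""
    same = sum(1 for a, b in links if as_of[a] == as_of[b])
    return same / len(links)

# Toy data, invented for illustration only.
as_of = {"n1": 100, "n2": 100, "n3": 200, "n4": 300}
links = [("n1", "n2"), ("n1", "n3"), ("n2", "n4"), ("n3", "n4")]
fraction = intra_as_fraction(links, as_of)
print(f"{fraction:.0%} of links stay within one AS")
```

A low fraction, as reported on the slide, means most overlay hops cross AS boundaries even when many nodes sit in the same few large ASes.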

25
Course Topics
  • Internet Architecture and Design Principles
  • Flat Pricing vs. Prioritized Traffic
  • Internet Measurements
  • Availability in Wide-Area
  • Patterns in Real Networks
  • Modeling the Internet Topology
  • Internet Services: DNS
  • Web Caching, Content Distribution Networks
  • Overlay Networks
  • Peer-to-Peer systems
  • Computational Grids
  • Security Issues
  • Sensor Nets
  • Wireless Networks
  • XML SOAP and Web Services

26
Course Topics
  • Internet Design Principles
  • How do I deliver Internet services end-to-end
    vs. within the network?
  • Flat Pricing vs. Prioritized Traffic
  • How do I determine which traffic to pass over the
    Internet?
  • Internet Measurements
  • What does the Internet really look like?

27
Course Topics
  • Availability in Wide-Area
  • How reliable is the Internet?
  • Patterns in Real Networks
  • What does Internet traffic look like?
  • Modeling the Internet Topology
  • How can I construct realistic models of Internet
    structure?

28
Course Topics
  • Internet Services: DNS
  • How well does DNS work?
  • Web Caching, Content Distribution Networks
  • How do we optimize Web content management?
  • Overlay Networks
  • Improving routing performance

29
Course Topics
  • Peer-to-Peer systems
  • Gnutella, etc., etc.
  • Computational Grids
  • Globus, etc.
  • Security Issues
  • Authorization, etc.

30
Course Topics
  • Sensor Nets
  • How do I structure and program networks of
    lightweight devices?
  • Wireless Networks
  • How do I route in ad hoc networks?
  • XML SOAP and Web Services
  • What are Web services anyway?

31
Projects
  • Literature surveys, real implementations,
    analytical evaluations
  • Can be performed individually or in a team of two
  • Your project ideas appreciated (to be discussed
    before proposal due date)
  • Primary goal is to do something interesting and
    to do it well

32
Example Project
  • Gnutella network analysis
  • Develop a crawler that traverses the network and
    collects membership and connectivity info
  • Analyze structure
  • Characterize structure
  • See, e.g.
  • Mapping the Gnutella Network: Properties of
    Large-Scale Peer-to-Peer Systems and Implications
    for System Design, M. Ripeanu, I. Foster, A.
    Iamnitchi, in IEEE Internet Computing Journal,
    vol. 6(1), 2002

33
Project Ideas
  • http://dsl.cs.uchicago.edu/Courses/cs347-2002/cs347_projects.htm
  • Gnutella network measurements
  • Topology discovery for 500K nodes
  • Structural analysis with 500K nodes
  • Study impact of overlay networks
  • Etc.

34
Project Ideas
  • Overlay networks: build unstructured or
    semistructured self-organizing overlays
    optimizing different cost functions
  • Topology-aware: map onto physical infrastructure
  • Usage-aware: map onto usage patterns
  • Analysis of Sloan Digital Sky Survey logs to
    explore access patterns
  • Which files are accessed, and how often?
  • What community usage patterns emerge?
  • How can we exploit these in content distribution
    networks?

35
Project Ideas
  • Compare qualitatively and analytically current
    file-location solutions (CAN, Chord, Gnutella,
    Napster, etc.) in the context of scientific
    file-sharing collaborations.
  • Evaluate sharing patterns based on real usage
    traces in a scientific collaboration
  • Use these patterns to evaluate benefits/drawbacks
    and propose better alternatives
  • Expand existing simulator to evaluate request
    forwarding techniques for resource location in
    grid environments

36
For More Information
  • Contact me
  • Ian Foster, foster_at_cs.uchicago.edu
  • Email or set up a meeting
  • Contact Anda, our TA
  • Adriana Iamnitchi, anda_at_cs.uchicago.edu
  • Monitor the class web page
  • http://dsl.cs.uchicago.edu/Courses/cs347-2002/

37
Next 2 Classes
  • Friday
  • Discuss
  • J. Saltzer, D. Reed, and D. Clark, End-to-end
    Arguments in System Design. ACM Transactions on
    Computer Systems, Vol. 2, No. 4, pp. 277-288,
    1984.
  • D. Clark and M. Blumenthal, Rethinking the design
    of the Internet: The end to end arguments vs. the
    brave new world, Workshop on Policy Implications
    of End-to-End. December 1, 2001.
  • Leading group: Ian + 2 volunteers (who?)
  • Wednesday
  • Leading group: Anda + 1-2 volunteers (who?)