Title: CS290F, Winter 2005 Large-scale Networked Systems
1CS290F, Winter 2005Large-scale Networked Systems
- Instructor
- Ben Y. Zhao (ravenben at cs.ucsb.edu, 1151
Engineering I) - Office hour Thur, 3-4pm
- Lecture time TuTh, 100-300 pm
- Place 1401 Phelps
- Teaching Assistant
- Ted Huffmire (huffmire at cs.ucsb.edu)
- Office hours TBA
2Overview
- Administrivia
- What this class is, and is not
- Enrollment, etc
- What you have to do for this class
- Grading
- Break
- Course projects
- organization
- some initial ideas
3Administrivia
- Course webpage
- http//www.cs.ucsb.edu/ravenben/classes/290F
- check it periodically for announcements
- Class mailing list
- cs290f at lists.cs.ucsb.edu
- go to http//lists.cs.ucsb.edu/mailman/listinfo/c
s290f - sign up today!
- Deadlines
- Unless otherwise specified, it means 10 minutes
before the lecture - Special circumstances should be brought to my
attention before deadlines
4Class Registration
- Please send me (ravenben_at_cs.ucsb.edu ) an e-mail
with the subject cs290f registration" and the
following - Last and first name
- Student ID
- Your department
- Preferred email address
- URL of your home page
- Please indicate explicitly if I can add you to
the on-line web page that lists each student
enrolled in the class (only your name and URL
will be made public).
5Class Enrollment
- Class size capped at 40
- currently enrollment is full
- for good discussion, we should be at 25
- For those unable to enroll
- email me with your student ID (perm )
- I expect enrollment to open up in first 1-2 weeks
6Goals of this Course
- Understand
- How do peer-to-peer systems work?
- What are the design issues in building large
networked systems? - Where is networking research heading?
- Get familiar with current network research
- Understand solutions in context
- Goals
- Assumptions
7Goals of this Course (contd)
- Appreciate what is good research
- Problem selection
- Solution and research methodology
- Presentation
- Apply what you learned in a class project
8Reasons to Not Take This Class
- Its not...
- a survey class on p2p systems
- a reading seminar
- I might do a p2p reading group / seminar later
- will be focused more on reading papers
- Unreasonably high expectations
- this is my first grad class
- its in my research area
- And
- smaller class will benefit everyone
- also, this class will likely be offered next year
9What do you need to do?
- A research-oriented class project
- Paper reading / reviews
- Lead class discussion on one research paper
- Participate in discussions in class
- this is meant to be an interactive class
10Research Project
- Investigate new ideas and solutions in class
project - Define the problem
- Execute the research
- Work with your partners
- Write up and present your research
- Ideally, best projects will become conference or
workshop papers - e.g., SIGCOMM, MOBICOM, SOSP/OSDI, NSDI
- IPTPS, WCW, HotOS, WMCSA, NOSSDAV
11Research Project Steps
- Ill distribute a list of projects
- You can either choose one of these projects or
come up with your own - Pick your project, partner, and submit a one-page
proposal describing - The problem you are solving
- Your plan of attack with milestones and dates
- Any special resources you may need
- A midterm presentation of your progress (5
minutes) - Poster session
- Submit project papers
12Paper Reviews
- Goal synthesize main ideas and concepts in
papers - Number 2-3 per class
- Length no more than ½ page per paper
- Content
- Main points intended by the author
- Points you particularly liked/disliked
- Other comments (writing, conclusions)
- Submission
- Submit each review via email before class on
lecture day - See class web page for details
13What is a Presentation?
- Presentations are in Powerpoint
- or PDF if you hate MSFT
- Target a 30 minute presentation
- thats about 15 slides
- idea is to engage the class in discussion on your
points - leave the audience with thoughtful questions
- Youll be graded on several things
- did you present the high-level points of the
paper? - did you get a good discussion going
- did you relate the paper to others in the course
(context!)
14Choose Your Paper
- Pick your paper from the lecture schedule online
- the readings and dates will be finalized by
Thursday - Email me to claim the paper
- Ill let you know if its already been taken
- presenters names will be updated online asap
15About Grading
Paper reviews 10
Class presentations 20
Class Project 70
- This is a graduate networking class
- key is what you learn, not the grade
- Evaluation of projects
- 2 components execution and impact
- execution
- proposal, design, implementation, evaluation,
presentation - impact
- how novel is your work, how does it advance the
state of the art? - good execution ? B/A-, impact ? A/A
16Class Topics
- Study existing p2p systems
- Gnutella, Freenet, Tapestry, OceanStore, PAST,
RON, I3, Vivaldi - Study fundamental system issues
- security, fault-tolerance, system measurement and
deployment - Study issues in alternative network environments
- sensor networks
- ad-hoc and mobile networks
17Break
- Take a break, get up, walk around, stretch
- come back in 10 minutes
18Course Projects
- Small project teams
- ideally teams of 3
- 2 (likely too small to tackle significant
project) - 4 (likely too large to coordinate well)
- Choose your team members
- use the class enrollment webpage (Ill put that
up shortly) - use the class mailing list
- Send me a one-page project proposal by Jan 15
- what is the problem youre solving (w/ related
works) - motivation/challenges (why is this important and
new) - plan of attack (w/ milestones and dates)
- any resources you might need
19Background on Structured Overlays
- Each data item and machine (node) in the system
has associated a unique ID in a large ID space - ID space is partitioned among nodes
- DHT Hash table like interface
- put(id, data)
- data get(id)
- data items are stored at the node responsible for
its ID - DOLR directory service interface
- publish (id)
- routeToId (id, message)
- data stored anywhere and located via directory
20Structured Peer-to-Peer Overlays
- Assign random nodeIDs and keys from secure hash
- incrementally route towards destination ID
- each node has small set of outgoing routes, e.g.
prefix routing
ID ABCE
ABC0
To ABCD
AB5F
A930
21Whats in a Protocol?
- Definition of name-proximity
- each hop gets you closer to destination ID
- prefix routing, numerical closeness, hamming
distance - Size of routing table
- amount of state kept by each node as f (N), N
network size - of overlay routing hops
- worst case routing performance (in overlay hops,
not IP) - Network locality
- does choice of neighbor consider network distance
- impact on actual performance of P2P routing
- Application Interface
22Chord
- NodeIDs are numbers on ring
- Closeness defined by numerical proximity
- Finger table
- keep routes for next node 2i away in namespace
- routing table size log2 n
- n total of nodes
- Routing
- iterative hops from source
- at most log2 n hops
Node 0/1024
0
128
896
256
768
640
384
512
23Chord II
- Pros
- simplicity
- Cons
- limited flexibility in routing
- neighbor choices unrelated to network proximity
but can be optimized over time - Application Interface
- distributed hash table (DHash)
24Tapestry / Pastry
- incremental prefix routing
- 1111?0XXX?00XX ?000X?0000
- routing table
- keep nodes matching at least i digits to
destination - table size b logb n
- routing
- recursive routing from source
- at most logb n hops
Node 0/1024
0
128
896
256
768
640
384
512
25Routing in Detail
Example Octal digits, 212 namespace, 2175 ? 0157
5712
0880
3210
4510
7510
26Project Ideas
- Discuss 10 project suggestions
- some related to peer-to-peer protocols and
systems - Legend based on how well-defined projects are
not necessary how difficult they are - Well-defined projects (5)
- Less-defined project (3)
- You need to define projects goals (2)
- You can also work on variants of these, or make
up your own topics - Need to send me a one page proposal by Jan 15
- Ill provide feedback/meetings to iterate on topic
27Project 1 Implement Chimera
- Implement a structured peer-to-peer protocol that
is a hybrid of Tapestry, Pastry, and Bamboo - implement in C/C
- focus on usability and performance
- Key features
- simplicity (leafsets like Pastry)
- stability (p2p exchange of routing tables like
Bamboo) - performance (locality like Tapestry)
- support standard p2p API and easy to build up
on - might support multiple teams
28Project 2 the DIY P2P protocol
- Study the related literature on existing p2p
protocols, and design something different - must have something significantly novel/new
- can be new contribution in performance, security,
theoretical interest, specialized for a useful
application - The challenge
- finding an area yet unexplored in the p2p space
- The result
- well-specified protocols on routing, node
insertion/deletion, failure handling - Warning this is not easy
29Project 3-5 P2P Applications/Systems
- All applications built on common API
- should work w/ existing protocols and new ones
developed in class - Project 3 a lightweight p2p CDN/web-cache
- use a DHT/DOLR to quickly search for desired
object - use request tracking to determine optimal data
placement - implement a client-side http proxy
- The challenge
- maintaining data stability as nodes come and go
- providing fairness / load balancing across nodes
30Project 4 P2P EBay
- Design and implement a scalable and secure
auctioning / e-commerce system on a p2p
infrastructure - support large of transactions/time
- support large of users and items
- be stable and resilient against attacks
- The challenge
- understanding the types of attacks
- selecting a reasonable subset to resist
- maintaining scalability despite security
enhancements
31Project 5 Lightweight Data Synchronization
- Design and implement Quartz
- lightweight p2p data sharing system
- store your most critical files (lt100MB) online
- use simple application-specific handlers to
provide fast data synchronization (a la CVS) - synchronize your HTML bookmarks across machines
- synchronize your papers, homework files,
financial records - end to end encryption
- The challenge
- keeping data highly-persistent on a dynamic p2p
network - optimizing per-node operational overhead
- a simple user interface that people will
actually use
32Project 6 P2P Security
- Security is one of the biggest challenges in p2p
- large population spread across networks and
domains - high probability of node compromise
- need to function in presence of malicious nodes
- Existing work on security discouraging
- Sybil attacks hard to prevent
- bad users can create lots of identities online
and generate collusion attacks - hard to avoid malicious nodes
- youre 1, they are many
33Project 6 a P2P reputation system
- Design and evaluate a highly-adaptive p2p
reputation system - quickly recognize malicious (or compromised)
nodes - use p2p collaboration to share reputation
information - form trusted circles
- use third-party anonymous verification to build
reputations - The challenge
- building reliable reputations that are highly
adaptable - doing so without generating massive network probe
traffic - minimizing impact if system is circumvented
34Project 7 Distributed Workload Logging
- Modelnet and Emulab provide emulation
environments for up to 1000s of virtual nodes - good for reproducible experiments
- bad in a dedicated cluster, unlike real world
conditions - PlanetLab provides real-world platform for
distributed applications (200 nodes across
Inet) - good for deploying / hardening your
application/system - bad for reproducible results
- What we need
- a way to gather interesting (and unexpected)
environmental conditions from distributed
networks like Planetlab, and use it as trace to
drive Modelnet/Emulab
35Project 7 cont.
- The goal
- design, implement, and deploy a distributed
sensor and logging interface - needs to be scalable, lightweight (minimize
impact on environment), and accurate - gather a large data set, and build an
environmental trace
36Project 8 P2P Benchmarking
- Design and test and benchmark for P2P routing
- needs to account for a wide-range set of
application-level behaviors and working
conditions - throughput
- latency
- key lookup versus data read/store
- rate of node churn
- The challenge
- being comprehensive
- being fair to goals of different protocols
37Project 9 Quantifying P2P Performance
- Test a set of protocols against a number of
network topologies - mostly simulations (might need to implement some
protocols) - test using different types of network models,
GT-ITM, BRITE, Waxman, etc - vary protocol parameters to measure impact on
basic performance - The challenge
- making sense of large quantities of test data
- getting a better understanding of inherent
properties of network models and how they impact
overlay networks
38Project 10 P2P Ad-hoc
- Take p2p routing algorithms, and apply to the
ad-hoc or sensor net space - consider new constraints power, processing, no
underlying IP - design new routing protocol
- validate via simulation
- The challenge
- theres fairly little previous work, and its not
clear the idea is feasible
39Support Infrastructure
- CSIL machines
- you should all have accounts
- A new 32 node networking cluster
- just assembled, being installed now
- Dell Xeon servers, 2.4Ghz, 2G RAM
- cluster building in progress
- will have modelNet installed, and support
large-scale network emulation tests (300 nodes)
40Summary of TODOs
- What you need to do
- send me registration email with subject cs290f
registration - sign up online for class mailing
listhttp//lists.cs.ucsb.edu/mailman/listinfo/cs2
90f - pick a paper to present and email me to claim
it(after Thur, first come first serve) - find other team members and pick a project
topic(first draft of proposal due Jan 15)