Title: CIS 6930.008: Internet-Scale Networked Systems
1. CIS 6930.008: Internet-Scale Networked Systems
Adriana Iamnitchi (Anda) anda_at_cse.usf.edu
2. Contact Info
- Email: anda_at_cse.usf.edu
- Office: ENB 334
- Office hours: Wed 2-4 and by appointment (email me)
- Course page: http://www.csee.usf.edu/anda/cis6930.008
3. CIS 6930.008 Course Goals
- Primary
- Gain deep understanding of fundamental issues that affect design of large-scale federated distributed systems
- Map primary contemporary research themes
- Gain experience in distributed systems research
- Secondary
- By studying a set of outstanding papers, build knowledge of how to present research
- Learn how to read papers and evaluate ideas
4. What I'll Assume You Know
- Basic Internet architecture
- IP, TCP, DNS, HTTP
- Basic principles of distributed computing
- Asynchrony (cannot distinguish between communication failures and latency)
- Partial global state knowledge (cannot know everything correctly)
- Failures happen. In very large systems, even rare failures happen often
- If there are things that don't make sense, ask!
5. Examples of Distributed Systems
AT&T web
Gnutella network
The Internet
A Sensor Network
6. Definition (a version)
- A distributed system is a collection of autonomous, programmable, failure-prone entities that are able to communicate through a communication medium that is unreliable.
- Entity: a process on a device (PC, PDA, mote)
- Communication medium: wired or wireless network
- Internet-Scale
- Spanning multiple institutional or network (DNS) domains
- (Much) larger than a cluster
7. This Semester's Theme (a proposal)
- Exploiting
- Emergent Behavior
- in Large-Scale Distributed Systems
8. Filecules and Small Worlds in a Scientific Workload: Characteristics and Significance
9. Grid: Resource-Sharing Environment
- Users
- 1000s from 10s of institutions
- Well-established communities
- Resources
- Computers, data, instruments, storage, applications
- Owned/administered by institutions
- Applications: data- and compute-intensive processing
- Approach: common infrastructure
10. The Problem
- We have now
- Mature grid deployments running in production mode
- We do not have yet
- Quantitative characterization of real workloads
- How many files, how much input data per process, etc.
- And thus, benchmarks, workload models, reproducible results
- Costs
- Local solutions, often replicating work
- Temporary solutions that become permanent
- Far from optimal solutions
- Impossible to compare alternatives on relevant workloads
11. Still, Why Should We Care?
- Impossibility results, high costs: tradeoffs are necessary
- Solution: select tradeoffs based on
- User requirements (of course)
- Usage patterns
- Patterns exist and can be exploited. Examples:
- Zipf distribution for request popularity (web caching): Breslau et al., Infocom '99
- Network topology
[Figure panels: Partial Topology; Random, 30% die; Targeted, 4% die. From Saroiu et al., MMCN 2002]
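Why a Zipf-like popularity pattern can be exploited for caching can be seen in a few lines. This is only an illustrative sketch: the catalog size, the exponent `alpha`, and the function names are assumptions made for the example, not values from the cited paper.

```python
def zipf_probabilities(n_items, alpha=1.0):
    """Request probability of each item, ordered from most to least popular.

    Under a Zipf-like law, the item of rank r is requested with
    probability proportional to 1 / r**alpha (alpha is an assumed parameter).
    """
    weights = [1.0 / (rank ** alpha) for rank in range(1, n_items + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def top_k_hit_ratio(probabilities, k):
    """Fraction of requests served by a cache holding the k most popular items."""
    return sum(probabilities[:k])


# Hypothetical catalog of 1000 items with alpha = 1.
popularity = zipf_probabilities(1000, alpha=1.0)
hit_ratio_10_percent = top_k_hit_ratio(popularity, 100)
print(f"Caching the top 10% of items serves {hit_ratio_10_percent:.0%} of requests")
```

With these assumed parameters, a cache holding one tenth of the items serves roughly two thirds of the requests. Under uniform popularity the same cache would serve only one tenth.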
12. The DØ Experiment
- High-energy physics data grid
- 72 institutions, 18 countries, 500 physicists
- Detector Data
- 1,000,000 channels
- Event rate: 50 Hz
- So far, 1.9 PB of data
- Data Processing
- Signals → physics events
- Events about 250 KB, stored in files of 1 GB
- Every bit of raw data is accessed for processing/filtering
- Past year overall: 0.6 PB
- DØ
- processes PBs/year
- processes 10s of TB/day
- uses 25-50% remote computing
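A back-of-envelope check of the scale implied by the detector numbers above. The sketch assumes decimal units (1 KB = 1000 bytes) and a detector that records continuously at the quoted rate. Both are simplifying assumptions, so the results are rough orders of magnitude only.

```python
# Figures quoted on the slide.
EVENT_RATE_HZ = 50                   # events recorded per second
EVENT_SIZE_BYTES = 250 * 10**3       # about 250 KB per event
FILE_SIZE_BYTES = 10**9              # events are stored in files of about 1 GB

SECONDS_PER_DAY = 24 * 60 * 60

# Raw data rate while the detector is recording.
bytes_per_second = EVENT_RATE_HZ * EVENT_SIZE_BYTES
# Daily volume if recording never stopped (an assumption, see above).
terabytes_per_day = bytes_per_second * SECONDS_PER_DAY / 10**12
# How many events fit in one file, and how long it takes to fill one.
events_per_file = FILE_SIZE_BYTES // EVENT_SIZE_BYTES
seconds_per_file = events_per_file / EVENT_RATE_HZ

print(f"{bytes_per_second / 10**6} MB/s, {terabytes_per_day} TB/day of raw data")
print(f"{events_per_file} events per file, one file every {seconds_per_file} s")
```

Under these assumptions the detector alone produces about 12.5 MB/s, or roughly 1 TB of raw data per day.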
13. Filecules and Small Worlds in Scientific Communities: Characteristics and Significance
- Joint work with
- Matei Ripeanu (UBC) and
- Ian Foster (ANL and UChicago)
14. [Figure labels: Yellow Submarine; Les Bonbons; No 24 in B minor, BWV 869; Wood Is a Pleasant Thing to Think About]
15. The DØ Collaboration
6 months of traces (January-June 2002): 300 users, 2 million requests for 200K files
Small average path length
Small World!
Large clustering coefficient
16. Small-World Graphs
- Small path length, large clustering coefficient
- Typically compared against random graphs
- Think of
- "It's a small world!"
- "Six degrees of separation"
- Milgram's experiments in the 60s
- Guare's play "Six Degrees of Separation"
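The two metrics named above can be computed directly on a small graph. The sketch below is illustrative only: the ring lattice, the two hand-picked shortcut edges, and all function names are assumptions chosen to keep the example deterministic. It uses the average of local clustering coefficients and the mean shortest-path length over reachable pairs.

```python
from collections import deque
from itertools import combinations


def ring_lattice(n, k):
    """Undirected ring of n nodes, each linked to its k nearest neighbors (k even)."""
    graph = {node: set() for node in range(n)}
    for node in range(n):
        for offset in range(1, k // 2 + 1):
            neighbor = (node + offset) % n
            graph[node].add(neighbor)
            graph[neighbor].add(node)
    return graph


def clustering_coefficient(graph):
    """Average, over all nodes, of the fraction of neighbor pairs that are linked."""
    total = 0.0
    for neighbors in graph.values():
        degree = len(neighbors)
        if degree < 2:
            continue  # nodes with fewer than two neighbors contribute 0
        links = sum(1 for a, b in combinations(neighbors, 2) if b in graph[a])
        total += 2.0 * links / (degree * (degree - 1))
    return total / len(graph)


def average_path_length(graph):
    """Mean shortest-path length over all reachable ordered pairs (BFS per node)."""
    total, pairs = 0, 0
    for source in graph:
        distance = {source: 0}
        queue = deque([source])
        while queue:
            current = queue.popleft()
            for neighbor in graph[current]:
                if neighbor not in distance:
                    distance[neighbor] = distance[current] + 1
                    queue.append(neighbor)
        total += sum(distance.values())
        pairs += len(distance) - 1
    return total / pairs


lattice = ring_lattice(20, 4)
with_shortcuts = ring_lattice(20, 4)
for a, b in [(0, 10), (5, 15)]:  # two hypothetical long-range links
    with_shortcuts[a].add(b)
    with_shortcuts[b].add(a)

for name, graph in [("lattice", lattice), ("with shortcuts", with_shortcuts)]:
    print(name, clustering_coefficient(graph), average_path_length(graph))
```

The regular lattice has high clustering but long paths. Adding even a couple of long-range links shortens paths while leaving most of the clustering intact, which is the small-world signature.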
17. Other Small Worlds
D. J. Watts and S. H. Strogatz, Collective dynamics of 'small-world' networks. Nature, 393:440-442, 1998.
R. Albert and A.-L. Barabási, Statistical mechanics of complex networks. Reviews of Modern Physics 74, 47 (2002).
18. Web Data-Sharing Graphs
Data-Sharing Relationships in the Web, Iamnitchi, Ripeanu, and Foster, WWW '03
19. DØ Data-Sharing Graphs
28 days, 1 file
7 days, 1 file
20. KaZaA Data-Sharing Graphs
2 hours, 1 file
28 days, 1 file
1 day, 2 files
4 hours, 2 files
12 hours, 4 files
7 days, 1 file
Small-World File-Sharing Communities, Iamnitchi, Ripeanu, and Foster, Infocom '04
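One way to read the figure labels on the data-sharing graph slides ("28 days, 1 file", "1 day, 2 files") is as the two parameters of a data-sharing graph: a time window and a minimum number of files two users must both have requested within that window to be connected. The sketch below builds such a graph under that reading. The trace format, the parameter names, and the sample requests are hypothetical.

```python
from itertools import combinations


def data_sharing_graph(requests, start, window, min_shared=1):
    """Return the edge set of a data-sharing graph.

    requests: iterable of (user, file_id, timestamp) tuples (assumed trace format).
    Two users are linked if, within [start, start + window), they requested
    at least min_shared files in common.
    """
    files_by_user = {}
    for user, file_id, timestamp in requests:
        if start <= timestamp < start + window:
            files_by_user.setdefault(user, set()).add(file_id)

    edges = set()
    for a, b in combinations(sorted(files_by_user), 2):
        if len(files_by_user[a] & files_by_user[b]) >= min_shared:
            edges.add((a, b))
    return edges


# Hypothetical trace: timestamps in arbitrary time units.
trace = [
    ("u1", "f1", 0),
    ("u2", "f1", 5),
    ("u2", "f2", 6),
    ("u3", "f2", 7),
    ("u1", "f2", 100),
]

short_window = data_sharing_graph(trace, start=0, window=10, min_shared=1)
long_window_strict = data_sharing_graph(trace, start=0, window=200, min_shared=2)
print(sorted(short_window))
print(sorted(long_window_strict))
```

Longer windows and lower sharing thresholds give denser graphs, which is why each panel is labeled with both parameters.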
21. Interest-Aware Data Dissemination
DØ
Web
KaZaA
Interest-Aware Information Dissemination in Small-World Communities, Iamnitchi and Foster, HPDC '05
22. Current Work: Tagging Communities
Tracking User Attention in Collaborative Tagging
Communities, Elizeu Santos-Neto, Matei Ripeanu,
and Adriana Iamnitchi, Workshop on Contextualized
Attention Metadata (CAMA'07), Vancouver, Canada,
June 2007.
23. DØ Workload Characterization
- Joint work with
- Shyamala Doraimani (USF) and Gabriele Garzoglio (FNAL)
24. DØ Traces
- Traces from January 2003 to May 2005
- 234,000 jobs, 561 users, 34 domains, 1.13 million files accessed
- 108 input files per job on average
- Detailed data access information for about half of these jobs (113,062)
25. Contradicts Traditional Models
- File size distribution
- Expected: log-normal. Why not?
- Deployment decisions
- Domain specific
- Data transformation
- File popularity distribution
- Expected: Zipf. Why not? (speculations)
- Scientific data is uniformly interesting
- User community is relatively small
26. Filecules: Intuition
27. Filecules: General Characteristics
Filecules in High-Energy Physics: Characteristics and Impact on Resource Management, Adriana Iamnitchi, Shyamala Doraimani, Gabriele Garzoglio, HPDC '06
28. Filecules: Size
- Filecules of different sizes
- Largest filecule: 17 TB or 51,841 files
- 28 mono-file filecules
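The slides do not spell out how filecules are identified. The sketch below assumes the usual definition, in which files belong to the same filecule when they are requested by exactly the same set of jobs, so every file lands in exactly one filecule. The job and file names are hypothetical.

```python
def identify_filecules(job_inputs):
    """Group files into filecules.

    job_inputs: dict mapping job id -> iterable of input file names.
    Assumed definition: two files are in the same filecule if and only if
    exactly the same set of jobs requested them.
    """
    # For each file, collect the set of jobs that requested it.
    jobs_by_file = {}
    for job_id, files in job_inputs.items():
        for file_name in files:
            jobs_by_file.setdefault(file_name, set()).add(job_id)

    # Files with identical job sets form one filecule.
    groups = {}
    for file_name, jobs in jobs_by_file.items():
        groups.setdefault(frozenset(jobs), set()).add(file_name)

    return sorted(groups.values(), key=lambda group: sorted(group))


# Hypothetical trace of three jobs.
jobs = {
    "job1": ["a", "b", "c"],
    "job2": ["a", "b"],
    "job3": ["d"],
}
filecules = identify_filecules(jobs)
print(filecules)
```

Here `a` and `b` are always used together and form one filecule, while `c` and `d` each form a mono-file filecule.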
29. Consequences for Caching
- Use filecule membership for prefetching
- When a file is missing from the local cache, prefetch the entire filecule
- Use time locality in cache replacement
- Least Recently Used (classic algorithm)
- Implemented
- LRU with files and LRU with filecules
- Greedy Request Value: prefetching + job reordering
- Does not exploit temporal locality
- Prefetching based on cache content
- Our variant of LRU with filecules and job reordering
E. Otoo, et al. Optimal file-bundle caching algorithms for data-grids. In SC '04
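The "LRU with filecules" idea above can be sketched as an LRU cache whose unit of fetch and eviction is a whole filecule. This is a minimal illustration, not the implementation evaluated in the paper: the class name, the byte-based capacity, and the choice to skip caching filecules larger than the cache are all assumptions.

```python
from collections import OrderedDict


class FileculeLRUCache:
    """LRU cache that fetches and evicts whole filecules (illustrative sketch)."""

    def __init__(self, capacity_bytes, filecule_of, file_sizes):
        self.capacity = capacity_bytes
        self.filecule_of = filecule_of  # file name -> tuple of files in its filecule
        self.sizes = file_sizes         # file name -> size in bytes
        self.cache = OrderedDict()      # filecule -> size, least recently used first
        self.used = 0
        self.hits = 0
        self.misses = 0
        self.bytes_fetched = 0          # proxy for transfer cost

    def access(self, file_name):
        """Request one file; return True on a cache hit."""
        group = self.filecule_of[file_name]
        if group in self.cache:
            self.cache.move_to_end(group)  # mark as most recently used
            self.hits += 1
            return True

        # Miss: prefetch the entire filecule, not just the requested file.
        self.misses += 1
        size = sum(self.sizes[f] for f in group)
        self.bytes_fetched += size
        if size > self.capacity:
            return False  # assumed policy: too large to cache at all
        while self.used + size > self.capacity:
            _, evicted_size = self.cache.popitem(last=False)  # evict LRU filecule
            self.used -= evicted_size
        self.cache[group] = size
        self.used += size
        return False


# Hypothetical workload: three filecules, unit-size files, room for three files.
group_a, group_b, group_c = ("a1", "a2"), ("b1",), ("c1",)
membership = {"a1": group_a, "a2": group_a, "b1": group_b, "c1": group_c}
sizes = {name: 1 for name in membership}

cache = FileculeLRUCache(capacity_bytes=3, filecule_of=membership, file_sizes=sizes)
for request in ["a1", "a2", "b1", "c1", "a1"]:
    cache.access(request)
print(cache.hits, cache.misses, cache.bytes_fetched)
```

The request for `a2` is a hit only because the miss on `a1` prefetched its whole filecule. A per-file LRU would have missed on both.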
30. Comparison: Caching Algorithms (1)
31. Comparison: Caching Algorithms (2)
% of cache change is a measure of transfer costs.
32. Summary: Part 1
- Revisited traditional workload models
- Generalized from file systems, the web, etc.
- Some confirmed (temporal locality), some refuted (file size distribution and popularity)
- Compared caching algorithms on DØ data
- Temporal locality is relevant
- Filecules guide prefetching
33. Summary
- Workload characterization based on a HEP grid
- Quantify scale (data processed, number of files)
- Contradict traditional models
- Patterns can guide system design
- Filecules: caching, data replication
- Small-world data sharing: adaptive information dissemination, replica placement
34. Administrivia: Paper Reviewing (1)
- Goals
- Think about what you read
- Get used to writing paper reviews
- Reviews due by noon before class
- Be professional in your writing
- Keep an eye on the writing style
- Clarity
- Beware of traps: learn to use them in writing and detect them in reading
- Detect (and stay away from) trivial claims.
- E.g., 1st sentence in the Introduction:
- "The tremendous/unprecedented/phenomenal growth/scale/ubiquity of the Internet..."
35. Administrivia: Paper Reviewing (2)
- Follow the form provided when relevant.
- State the main contribution of the paper
- Critique the main contribution
- Rate the significance of the paper on a scale of 5 (breakthrough), 4 (significant contribution), 3 (modest contribution), 2 (incremental contribution), 1 (no contribution or negative contribution). Explain your rating in a sentence or two.
- Rate how convincing the methodology is.
- Do the claims and conclusions follow from the experiments?
- Are the assumptions realistic?
- Are the experiments well designed?
- Are there different experiments that would be more convincing?
- Are there other alternatives the authors should have considered?
- (And, of course, is the paper free of methodological errors?)
36. Administrivia: Paper Reviewing (3)
- What is the most important limitation of the approach?
- What are the three strongest and/or most interesting ideas in the paper?
- What are the three most striking weaknesses in the paper?
- Name three questions that you would like to ask the authors.
- Detail an interesting extension to the work not mentioned in the future work section.
- Optional: comments on the paper that you'd like to see discussed in class.
37. Administrivia: Discussion Leading
- Come prepared!
- Prepare discussion outline
- Prepare questions
- What-ifs
- Unclear aspects of the solution proposed
-
- Similar ideas in different contexts
- Initiate short brainstorming sessions
- Leaders do NOT need to submit paper reviews
- Main goals
- Keep discussion flowing
- Keep discussion relevant
- Engage everybody (I'll keep an eye on this, too)
38. Administrivia: Projects
- Combine with your research if relevant to the class
- Get approval from all instructors if you overlap final projects
- Don't sell the same piece of work twice
- You can get more than twice as many results with less than twice as much work
- Aim high!
- Put in one extra month and get a publication out of it
- It is doable (we have proofs)
- Try ideas that you postponed out of fear: it's just a class, not your PhD.
39. Administrivia: Project Deadlines (tentative)
- January 30: 1-page project proposal
- Feb. 26: 3-page literature survey
- Know relevant work in your problem area
- If implementation project, list tools, similar projects
- March 31: 5-page midterm project due
- Have a clear image of what's possible/doable
- Report preliminary results
- Last class: in-class project presentation
- Demo, if appropriate
- May 1
- Final report due
40. Next Classes
- Lectures on basics of distributed systems
- Will start reading papers in about 2 weeks
41. Questions?