Welcome to CIS 455 / 555


Slides: 27
Provided by: zack4


1
Welcome to CIS 455 / 555: Internet and Web Systems
  • Zachary G. Ives
  • University of Pennsylvania
  • CIS 455 / 555: Internet and Web Systems
  • January 13, 2010

2
What this Course Is About
  • How do we build services like Google, Akamai,
    iTunes, Facebook, eBay, ...?
  • What are the principles behind them? (This is NOT
    a course on building Web sites!)
  • How do cloud computing, P2P, and Web services
    relate?
  • The main themes of the course:
  • Distributed systems concepts, with emphasis on
    data, scalability, and interoperability (including
    the cloud)
  • Data representation fundamentals, with emphasis
    on XML
  • Information retrieval concepts, including ranking
    and indexing
  • It's a course that involves building software
    using the principles learned, evaluating it, and
    programming in teams

3
How Does this Relate to Other CIS Courses?
  • CIS 330/550:
  • Data representation and management
  • Relational querying with SQL; XML querying with
    XQuery
  • DBMS-backed web sites
  • 455/555 focuses on data with respect to
    interoperability
  • CIS 350/573: software engineering and mashups
  • CIS 505 focuses on distributed systems and
    algorithms
  • CIS 505 is less project-oriented than CIS 555
  • CIS 555 covers Web services, cloud architectures
    in more detail

4
Some Things We'll Look At
  • What are the principles behind building systems
    that work on the Internet?
  • How do these relate to many of today's hot
    technologies?
  • Web servers, DHTML, Servlets, JSP, ...
  • XML
  • Web services
  • Peer-to-peer
  • Application servers
  • Cloud computing environments
  • Content distribution networks
  • Web search
  • Mash-ups
  • The cloud

5
Staff
  • Instructor: Zack Ives, zives_at_cis
  • Office: 576 Levine North
  • Office hours: Th 3:30-4:30 (and by arrangement)
  • TA: Katie Gibson, gibsonk_at_seas
  • Office hours: TBA
  • Discussion group:
  • cis-455-555-spring10_at_googlegroups.com
  • http://groups.google.com/cis-455-555-spring10

6
Textbooks
  • Distributed Systems: Principles and Paradigms,
    2nd ed., Tanenbaum and van Steen
  • We'll read from the book 50% of the time
  • Frequent supplementary handouts
  • Excerpts from several books
  • Many recent research papers
  • Your first one, which you should read by Wed:
    http://research.microsoft.com/en-us/um/people/blampson/33-Hints/Acrobat.pdf
    (linked off the CIS 555 Schedule Slides page)
  • Send me mail if it's difficult for you to find a
    way of printing the paper yourself

7
Prerequisites, Workload, etc.
  • Necessary skills:
  • Ability to code in Java: there is a substantial
    implementation project
  • Good debugging skills: this will be the biggest
    time sink!
  • The ability to work as a team with classmates
    (towards the end)
  • A willingness to learn how to read API
    documentation
  • Some exposure to threads and concurrent
    programming
  • A willingness to push the envelope
  • Workload:
  • Several programming/debugging-based homework
    assignments
  • A substantial term project with experimental
    evaluation and a report
  • Two midterms
  • Payoff:
  • Lots of practical development and debugging
    experience
  • A good working knowledge of the fundamentals
    behind scalable systems
  • A working academic clone of Google, hosted on
    Amazon EC2!
  • WARNING: this course should be considered 1.5 CU!

8
A Disclaimer
  • This remains a bleeding-edge course!
  • Goal 0: an understanding of scalable distributed
    data-centric systems
  • Goal 1: a look under the covers of today's
    hottest topics, in lectures and in projects
  • Goal 2: a level of comfort in managing large,
    complex software development with others' code
  • Part of this means doing a substantial
    implementation project
  • As in the real world: learning APIs, dealing with
    inadequate tools
  • Most of you will find this a struggle! You'll
    spend many hours debugging!
  • We will be using some immature technology
  • Not everything has been tested and validated
    ahead of time
  • e.g., this will be the first year we are using
    Amazon Elastic Compute Cloud
  • We'll do the best we can to smooth over the bugs
  • We hope it will be a fun course, though
  • And an interesting one!

9
A Bit of Context for the Course

10
What Exactly Is the Web?
  • The Web consists of HTTP servers that publish
    HTML, XML, and a few other content types
  • These are hyperlinked via URLs (a subset of URIs)
  • Plus there are a huge number of web clients
  • The Web is built on a number of Internet
    protocols:
  • DNS, TCP, IP
  • Other Internet services use other protocols:
  • SMTP, IMAP, POP, AIM, FTP, ...
  • Streaming media, music swapping protocols, ...
  • Web services and custom applications may actually
    also use HTTP in ways it wasn't designed for
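
To make the layering concrete, here is a minimal sketch of the text a web client writes onto a TCP connection for a simple HTTP/1.1 GET. The function name build_get_request and the example URL are assumptions made for illustration; nothing is sent over the network.

```python
from urllib.parse import urlsplit


def build_get_request(url: str) -> bytes:
    """Build the bytes a client would send over TCP for a simple HTTP/1.1 GET."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    lines = [
        f"GET {path} HTTP/1.1",   # request line: method, path, protocol version
        f"Host: {parts.netloc}",  # the host name is resolved separately via DNS
        "Connection: close",
        "",                       # a blank line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")
```

Calling build_get_request("http://www.example.com/index.html") yields a request line, a Host header, and a terminating blank line; DNS, TCP, and IP do the rest of the work underneath.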

11
The Internet is Built in Layers
  • Your Application
  • Middleware: Web Services, distrib transactions,
    lightweight streaming, etc.
  • Session: SSH, FTP, HTTP, IM, P2P, ...
  • Transport: TCP (session-based), UDP (sessionless)
  • IP: IPv4, IPv6 unicast, (multicast)
  • Link: WiFi, ZigBee, Ethernet, WiMax

12
What Is an Internet System?
  • Not just a web server or web application
  • An application built over the Internet, whose
    functionality is distributed across more than one
    machine
  • Typically, at least in a client-server or
    server-to-server fashion, but may have many more
    participants
  • Typically, data and/or code must be exchanged in
    distributed fashion for the functioning of the
    application
  • Often, the data must be partitioned, replicated,
    translated, etc. (shards in Google-speak)
  • Often, the code is written in multiple different
    environments, languages, etc.
  • Often, there are concerns about handling
    failures, firewalls, attacks, ...

13
Why Are Internet System Topics Interesting?
  • Understanding what's underneath today's Web
  • How does it work?
  • What are its shortcomings?
  • What are its strengths?
  • Understanding distributed algorithms
  • Using the right approach when designing new
    protocols and web systems
  • Being able to anticipate what's actually possible
    in the future

14
Example: Web Search, a Cloud Service
  • Diagram components: clients, Search Interface
    Servers, Index Servers, Crawlers, Web Pages
  • Diagram labels: HTML forms, results, queries,
    pages, query, keywords, locations
  • Note on the diagram: uses a model of
    document/word similarity to rank matches

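
As a rough illustration of the index servers' job of mapping keywords to locations, here is a minimal inverted-index sketch. Summing raw word counts is a stand-in assumption for the document/word similarity model, and the names build_index and search are hypothetical.

```python
from collections import Counter, defaultdict


def build_index(pages):
    """Map each keyword to {page_id: count}, i.e. an inverted index."""
    index = defaultdict(dict)
    for page_id, text in pages.items():
        for word, count in Counter(text.lower().split()).items():
            index[word][page_id] = count
    return index


def search(index, query):
    """Rank pages by the summed counts of the query words they contain."""
    scores = Counter()
    for word in query.lower().split():
        for page_id, count in index.get(word, {}).items():
            scores[page_id] += count
    return [page_id for page_id, _ in scores.most_common()]
```

A crawler would feed build_index with fetched pages, and the search interface would call search for each client query.
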
15
Example: Social Networking (Facebook / Twitter),
a Cloud Service
  • Diagram components: clients, User Page Servers,
    Recommender
  • Diagram labels: pages, notifications, clicks,
    updates, posts, users, entities, suggestions,
    common properties, usage logs, ...

16
Example: Information Integration
  • Clients send queries and receive results in the
    mediated schema
  • Mediator System: maps all data into a single
    format and virtual schema
  • XML sources: XQuery / XPath over XML, returning
    XML
  • Relational sources: SQL, returning ODBC results
  • HTML sources: HTTP POST, returning HTML

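
A minimal sketch of the mediator idea, assuming two toy sources: relational rows shaped as (title, year) tuples and an XML document of book elements. The source shapes and the function names are invented for illustration; the point is only that each source is translated into one common record format before the query runs.

```python
import xml.etree.ElementTree as ET


def from_relational(rows):
    """Translate (title, year) tuples, as a SQL query might return them."""
    return [{"title": title, "year": int(year)} for title, year in rows]


def from_xml(document):
    """Translate <book title="..."><year>...</year></book> elements."""
    root = ET.fromstring(document)
    return [{"title": book.get("title"), "year": int(book.findtext("year"))}
            for book in root.iter("book")]


def mediated_query(translated_sources, min_year):
    """Answer one query over all sources, expressed in the mediated schema."""
    records = [rec for source in translated_sources for rec in source]
    return sorted((r for r in records if r["year"] >= min_year),
                  key=lambda r: r["title"])
```

The client only ever sees the mediated schema (title, year), regardless of where each record came from.
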
17
Example: SETI@home
  • Breaks computation into many parts and
    distributes them to the clients
  • Diagram components: Problem Partitioning, Data
    Aggregation, clients
  • Diagram labels: new sub-problems, computed
    subresults

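
The partition-and-aggregate pattern can be sketched as follows. Finding the largest sample stands in for the real signal analysis, and the function names are hypothetical; in a real deployment client_compute would run on volunteers' machines.

```python
def partition(samples, chunk_size):
    """Problem partitioning: split the data into independent sub-problems."""
    return [samples[i:i + chunk_size]
            for i in range(0, len(samples), chunk_size)]


def client_compute(chunk):
    """Work done by one client: find the strongest sample in its chunk."""
    return max(chunk)


def aggregate(subresults):
    """Data aggregation: combine the clients' computed subresults."""
    return max(subresults)
```

The scheme works because the sub-problems are independent, so clients never need to coordinate with each other.
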
18
Example: P2P File Sharing
  • Processes name-based requests for data: each node
    can make requests, forward requests, return data
  • Diagram components: clients (each one a node)
  • Diagram labels: request, data

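
A minimal sketch of name-based request forwarding, assuming each node knows a few neighbors and forwards a request it cannot answer itself. The Peer class, the hop limit ttl, and the visited set are illustrative assumptions, not a description of any specific file-sharing protocol.

```python
class Peer:
    def __init__(self, name):
        self.name = name
        self.files = {}      # name -> data held locally
        self.neighbors = []  # other Peer objects this node can reach

    def request(self, filename, ttl=3, seen=None):
        """Return the data if held locally, else forward to neighbors."""
        seen = seen if seen is not None else set()
        seen.add(self.name)
        if filename in self.files:
            return self.files[filename]
        if ttl == 0:
            return None  # hop limit reached, stop forwarding
        for peer in self.neighbors:
            if peer.name not in seen:
                data = peer.request(filename, ttl - 1, seen)
                if data is not None:
                    return data
        return None
```

Every node plays all three roles from the slide: it can make requests, forward them, and return data.
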
19
What are the Hard Problems?
  • Disclaimer: most of the hard problems AREN'T
    solved (or solvable), and there often isn't any
    single BEST solution
  • Much of systems design is about finding the
    right compromise for each specific problem
  • We can divide them into:
  • Scalability
  • Availability / reliability
  • Consistency
  • Interoperability
  • Location and resource discovery

20
Scalability
  • How do we support a large number of clients or
    requests?
  • Distribute work!
  • Challenges:
  • Coordination takes significant overhead in the
    general case
  • Load balancing: avoid having bottlenecks
  • Parts of the solution:
  • Client-server, multi-tier, P2P architectures
  • Restricted programming models, e.g., MapReduce
  • Data partitioning, replication, remote procedure
    calls, ...
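
To show what a restricted programming model buys, here is a single-process sketch of MapReduce-style word counting. In a real system the map and reduce calls would run on many machines; the function names below are assumptions for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in document.lower().split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key."""
    return key, sum(values)


def word_count(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Because each map call sees one document and each reduce call sees one key, the framework is free to distribute both without extra coordination.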

21
Availability/Reliability
  • How do we ensure the system is up when we want
    it to be, and doing the right thing?
  • Replication and redundancy
  • Security measures against attacks
  • Ability to undo/redo
  • Challenges:
  • Keeping things consistent
  • Performance vs. security
  • Acknowledgments
  • Parts of the solution:
  • Data partitioning, replication, ...
  • Logging, transactions, ...
  • Redundant hardware, multiple sites, ...
  • Quorum and consensus algorithms
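
A minimal sketch of a quorum scheme, assuming N replicas, a write quorum W, and a read quorum R with R + W > N so that every read set overlaps the latest write set. The QuorumRegister class is hypothetical, and it assumes a single coordinator that assigns version numbers, which sidesteps the consensus problem entirely.

```python
class QuorumRegister:
    def __init__(self, n, w, r):
        if r + w <= n:
            raise ValueError("need R + W > N so read and write sets overlap")
        self.replicas = [(0, None)] * n  # one (version, value) per replica
        self.w, self.r = w, r
        self.version = 0  # assumption: one coordinator assigns versions

    def write(self, value, up):
        """Write to W reachable replicas; `up` lists reachable indexes."""
        if len(up) < self.w:
            raise RuntimeError("not enough replicas for a write quorum")
        self.version += 1
        for i in up[:self.w]:
            self.replicas[i] = (self.version, value)

    def read(self, up):
        """Read R reachable replicas and return the newest value seen."""
        if len(up) < self.r:
            raise RuntimeError("not enough replicas for a read quorum")
        newest = max((self.replicas[i] for i in up[:self.r]),
                     key=lambda versioned: versioned[0])
        return newest[1]
```

With N = 3 and R = W = 2, the register stays available when any one replica is down, and a read still sees the most recent write.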

22
Consistency / Consensus
  • Replication, distribution, and failures make it
    difficult to keep a unified, consistent view of
    the world. How do we combat this?
  • Locking, concurrency control, and invalidation
    schemes
  • Clock synchronization
  • Challenges:
  • Locking has huge performance overhead
  • Network partitions, disconnected operation
  • Parts of the solution:
  • Optimistic concurrency control, 2-phase locking
  • Distributed clock sync
  • Conflict resolvers
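
Optimistic concurrency control can be sketched with version numbers: read without locking, then commit only if nobody else committed in the meantime. The VersionedStore class is a hypothetical single-threaded illustration; a real store would have to make the validate-and-write step atomic.

```python
class VersionedStore:
    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        """Return (version, value); unknown keys start at version 0."""
        return self._data.get(key, (0, None))

    def commit(self, key, expected_version, new_value):
        """Validate, then write: fail if the key changed since our read."""
        current_version, _ = self.read(key)
        if current_version != expected_version:
            return False  # conflict: caller should re-read and retry
        self._data[key] = (current_version + 1, new_value)
        return True
```

Two clients that both read version 0 can both try to commit, but only the first succeeds; the second gets False and must retry or hand the conflict to a resolver.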

23
Interoperability
  • How do we coordinate the efforts of components
    that have different data formats and/or source
    languages, and are on different machines?
  • Standardization!
  • Challenges:
  • Everything has a different semantics!
  • Parts of the solution:
  • Standard data formats: XML, XML schemas
  • Schema mediation and data translation
  • Remote procedure calls: CORBA, XML-RPC, ...
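
As a small illustration of XML as a standard wire format, the sketch below marshals a procedure call into XML-RPC and back using Python's standard xmlrpc.client module. The wrapper names encode_call and decode_call and the method name "add" are assumptions; no server or network is involved.

```python
import xmlrpc.client


def encode_call(method, *args):
    """Marshal a method name and its arguments into an XML-RPC request body."""
    return xmlrpc.client.dumps(args, methodname=method)


def decode_call(payload):
    """Unmarshal an XML-RPC request body back into (method, params)."""
    params, method = xmlrpc.client.loads(payload)
    return method, params
```

Any language with an XML-RPC library can decode the same payload, which is the interoperability argument for standard formats.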

24
Location and Resource Discovery
  • How do you find what you're looking for?
  • Naming
  • Declarative queries over standard schemas
  • Advertisements
  • Challenges:
  • Naming has implicit semantics
  • What do you do when you don't know what to call
    something?
  • Parts of the solution:
  • Directory systems: DNS, LDAP, etc.
  • Resource discovery and advertising protocols
  • Overlay networks, sharding schemes
  • Standardized schemas
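
One common sharding scheme is consistent hashing, sketched below under the assumption that node names and keys are strings. The HashRing class and the choice of 50 virtual points per node are illustrative, not taken from any particular system.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes, vnodes=50):
        # Each node owns several points on the ring to even out the load.
        self._ring = sorted((_hash(f"{node}#{i}"), node)
                            for node in nodes for i in range(vnodes))
        self._points = [point for point, _ in self._ring]

    def lookup(self, key):
        """Return the node owning the first ring point at or after hash(key)."""
        i = bisect.bisect_left(self._points, _hash(key)) % len(self._ring)
        return self._ring[i][1]
```

The design choice that matters is that adding a node only moves the keys that now fall on the new node's points; every other key keeps its owner.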

25
Our First Focus: Single Machines, aka Servers
  • How do you handle large numbers of concurrent
    users?
  • Processes
  • Threads
  • Events
  • Hybrids (e.g., thread pools)
  • Staged architectures
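
As a small illustration of the thread-pool hybrid, the sketch below hands requests to a fixed set of worker threads using Python's standard concurrent.futures module. The handle_request function is a hypothetical stand-in for real per-request work such as parsing and building a response.

```python
from concurrent.futures import ThreadPoolExecutor


def handle_request(request_id):
    """Stand-in for per-request work done by one worker thread."""
    return f"response to request {request_id}"


def serve(request_ids, workers=4):
    """Process all requests with a bounded pool of reusable threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(handle_request, request_ids))
```

Bounding the pool caps the memory and scheduling cost that one thread per user would incur, while still overlapping the work of many requests.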

26
Next Time (Wed due to MLK Day)
  • We'll look under the covers of an HTTP server
  • Key ideas in building scalable systems
  • Principles of HTTP and web servers
  • Management of concurrent sessions
  • To read by next Wednesday:
  • Lampson and Saltzer paper:
    http://research.microsoft.com/en-us/um/people/blampson/33-Hints/Acrobat.pdf
  • Tanenbaum Ch. 3.1
  • If necessary: review Tanenbaum, Modern OS, Ch.
    2.3, or a similar OS book on interprocess
    communication