Web Engineering Class 10

1
Web Engineering Class 10
  • 2005/12/13
  • Jacek Kopecký
  • jacek.kopecky_at_deri.org

2
Homework Comments
  • Homework 6 scored
  • Sending questions on homework by email
  • Please add question to subject line
  • Formats
  • latex, ogg, svg, mng, vrml, .kmz
  • jar, au, rtf, sxw
  • Already supported formats
  • flash, pdf, png
  • Don't judge support by existence of media type

3
Homework Comments: Underappreciated Formats
  • Should become supported
  • latex
  • ogg (Vorbis, FLAC, container)
  • svg
  • mng
  • vrml
  • .kmz (Google Earth)
  • I don't think these should become (more) supported
  • jar
  • au
  • rtf
  • sxw (StarOffice, OpenOffice.org)

4
Homework Comments: Underappreciated Formats
  • Already supported formats
  • flash
  • pdf
  • png
  • Comparing JPG and PNG (and SVG)
  • LZW patent has expired
  • GIF can be used freely now

5
Overview of Class 10
  • Scalability problems of web applications
  • Growth of the Web
  • User communities
  • Website growth
  • Technologies for scaling up
  • Scaling users, admins to bigger sites
  • Examples
  • Much of this is my own opinion

6
Early Web Growth

7
Later Web Growth

8
Web Growth Interpretation
  • Early exponential, later linear
  • Data on web sites
  • Currently over 70,000,000 sites
  • Estimated 970,000,000 users
  • 14 users per site?
  • Still potential for more users
  • Esp. in Chinese, Spanish, French, Portuguese
  • Each user can frequent many sites
  • I access at least 50 sites daily
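
The "14 users per site?" figure follows directly from the two numbers on this slide. A minimal check, using only those figures (the variable names are just for illustration):

```python
# Figures as quoted on the slide (2005 estimates).
sites = 70_000_000
users = 970_000_000

# Average number of users per site if every user counted for one site only.
users_per_site = users / sites
print(round(users_per_site, 1))
```

The average understates real traffic, since each user can frequent many sites.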

9
Finding Out About Websites
  • Email your friends
  • Usually interesting one-time links
  • Blogs similar to email, growing
  • Word-of-mouth
  • Showing somebody a useful/nice site
  • News sites (incl. community news)
  • Mentioning links
  • Commercial ads (mostly ignored)

10
Initial User Attention
  • New users coming slowly
  • For emailed, blogged links, word-of-mouth
  • Personal recommendation
  • Increasing value, staying potential
  • Sudden big group of new users
  • Reaction to news article, advertisement
  • Lower staying potential

11
User Retention
  • Will the user come back?
  • Initial impression
  • Did the user know what to expect?
  • Obvious site depth, new material
  • Usefulness
  • Also entertainment value
  • Consistent quality
  • User retention → user community growth

12
Community Growth
  • Initially slow, exponential
  • Inter-user interaction: tighter community
  • User material, comments
  • Adds value to the site
  • Even 3rd party side-channels
  • E.g. a forum about a site
  • Increasing noise/content ratio
  • Can cause community withdrawal
  • Must be moderated
  • Interaction first helps, then threatens

13
Site Growth
  • Community growth
  • Exponential traffic increase
  • Esp. with on-site user interaction
  • Old content growth: archives
  • Storage and processing increase
  • Little traffic increase
  • New content volume growth
  • Usually slow
  • Careful that users can handle it

14
Site Evolution
  • Purpose of a site: two views
  • Site owner's view
  • User community's view may be different
  • Failure to adapt causes disappointment
  • Adaptation can, too
  • User majority vs. vocal users
  • Unwillingness to adapt causes competition
  • Optimal user experience: "Wow, a cool new
    feature, might be useful!"

15
Site Infrastructure Growth
  • Must keep up with traffic
  • Must scale up
  • Optimal user experience: "It was OK before, but
    it is better now"
  • Failure to keep up causes disappointment

16
Funding Growth
  • Initial and ongoing funding necessary
  • Exponential growth gets expensive quickly
  • Income often from community
  • Estimating community size peak
  • Not investing too much
  • Debt now, income later
  • Bankruptcy common
  • Even for personal pages
  • Down-sizing unheard of

17
Identify the Performance Problem
  • Remote admin: CPU usage 10%, HDD transfers 300kB/s
    (max 40MB). But everything seems slow
  • 1000 parallel users are OK. But at 1100, responses
    can take a lot of time
  • Machine with 32 CPUs. Benchmark: 1000 transactions
    / minute. Each request is still visibly slow
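
The 32-CPU scenario shows that good throughput does not imply low latency. A rough sketch of the arithmetic, using Little's law (my addition, not from the slides) and assuming all 32 CPUs are busy with exactly one request each:

```python
def avg_time_in_system(concurrent_requests: float, throughput_per_second: float) -> float:
    """Little's law: L = lambda * W, so the average time per request is W = L / lambda."""
    return concurrent_requests / throughput_per_second


cpus = 32                 # assumed: one in-flight request per CPU
throughput = 1000 / 60    # 1000 transactions per minute, expressed per second

# Average time a single request spends being processed, in seconds.
seconds_per_request = avg_time_in_system(cpus, throughput)
print(seconds_per_request)
```

Under these assumptions each request takes almost two seconds, which a user notices, even though the benchmark total looks healthy.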

18
Scaling Up
  • Users complain about slow response
  • Bottlenecks
  • Processor usage: too much processing
  • Memory: server should never have to swap
  • Slow disk: processor has to wait
  • Slow, long network connection

19
Temporary Crises
  • Sudden interest spurred by article
  • Flash crowds, slashdot effect
  • Or orchestrated DDOS attack
  • Or just everybody waking up
  • How often will this happen?
  • Up to once in a while: just let it pass
  • Often: must be dealt with

20
Staying Up During Crisis
  • Immediate survival requires fast solution
  • Limit most expensive functionality
  • Generating up-to-date pages
  • Database search
  • Serving many different pages
  • Serving big (media) files, even images
  • Just a quick and dirty hack
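
One way to limit "generating up-to-date pages" during a crisis is to keep serving the last generated copy while the server is overloaded. A minimal sketch; the `CrisisPage` class, the `overloaded` flag and the `render` callback are hypothetical names, and how overload is detected is left open:

```python
import time
from typing import Callable, Optional


class CrisisPage:
    """Serves a stored snapshot instead of regenerating the page under load."""

    def __init__(self, render: Callable[[], str], max_age_seconds: float):
        self._render = render
        self._max_age = max_age_seconds
        self._snapshot: Optional[str] = None
        self._rendered_at = 0.0

    def get(self, overloaded: bool, now: Optional[float] = None) -> str:
        now = time.monotonic() if now is None else now
        stale = self._snapshot is None or now - self._rendered_at > self._max_age
        # Regenerate only when there is no snapshot yet, or when the snapshot
        # is stale and the server is not overloaded.
        if self._snapshot is None or (stale and not overloaded):
            self._snapshot = self._render()
            self._rendered_at = now
        return self._snapshot
```

Users see slightly outdated pages during the crisis, which is usually better than seeing no pages at all.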

21
Unnecessary Processing
  • Using ASP/SSI only for static includes
  • E.g. same header, footer etc.
  • Handle includes when publishing
  • Running same query too often
  • While data changes infrequently
  • Compute result on data change
  • Optimizing code, queries
  • SQL, programming optimization techniques
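
"Compute result on data change" can be sketched as follows: the expensive query runs when data is written, and reads only return the stored result. The `TopStories` class and its in-memory dictionary are hypothetical stand-ins for a real database table and query:

```python
class TopStories:
    """Keeps a precomputed query result and refreshes it only on writes."""

    def __init__(self):
        self._scores = {}         # story title -> score (stand-in for a table)
        self._top = []            # stored result of the "expensive" query
        self.recomputations = 0   # counts how often the query actually ran

    def _recompute(self):
        self.recomputations += 1
        ranked = sorted(self._scores, key=self._scores.get, reverse=True)
        self._top = ranked[:3]

    def set_score(self, title, score):
        # Data changes are rare, so the query cost is paid here.
        self._scores[title] = score
        self._recompute()

    def top(self):
        # Reads are frequent and cheap: no query is executed.
        return list(self._top)
```

With two writes and a hundred reads, the query runs twice instead of a hundred times.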

22
In-Kernel Server
  • TUX, kHTTPd
  • Only static files
  • Other requests passed to normal server
  • Removing context-switch overhead
  • Only one in-memory copy of file
  • Good servers can do direct HDD → NIC
  • Using hardware to its limits
  • More than twice as fast as normal server SW

23
Distributing the Load
  • When you really need more power
  • Mirroring
  • Load balancing
  • Partitioning the application
  • Content-delivery networks

24
Mirroring
  • Multiple front-end servers
  • Back-end, if necessary, must be logically single
  • Database, original data source
  • Local or geographically distributed
  • Manual balancing
  • "Please select from our mirrors list"
  • Automatic load balancing
  • JavaScript random selection
  • Network infrastructure

25
Local Load Balancing
  • Server farm
  • DNS random selection
  • Users aware of the number of servers
  • TCP balancing proxies
  • One IP address
  • TCP connection routing
  • HTTP-aware routers
  • Aware of cookies, sessions
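
The difference between random selection and session-aware routing can be sketched like this. The backend addresses are made up, and a real HTTP-aware router would read the session ID from a cookie; here it is simply passed in:

```python
import hashlib
import random

# Hypothetical server farm addresses.
BACKENDS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]


def pick_random(backends, rng=random):
    """Random selection: any request may land on any server."""
    return rng.choice(backends)


def pick_for_session(backends, session_id: str):
    """Session-aware selection: the same session always maps to the same server."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]
```

With this simple modulo scheme, adding or removing a server remaps most sessions, so it only suits farms whose size rarely changes.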

26
Partitioning the Application
  • Getting closer to customer
  • Partitioning by region
  • Different products in different regions
  • Optimized machines for various tasks
  • Example
  • Static content geographically distributed
  • Dynamic content on one powerful server

27
Global Load Balancing
  • Distributed mirrors
  • Selecting mirror closest to client
  • Not random
  • Initial HTTP redirect
  • Depending on client's location
  • Client aware of redirect, but simple
  • DNS techniques
  • Result depending on client's location
  • Akamai DNS
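
The initial HTTP redirect approach can be sketched as a lookup from the client's region to a mirror. The mirror hostnames and region codes are hypothetical, and mapping a client IP address to a region is assumed to happen elsewhere:

```python
# Hypothetical mirrors, keyed by a region code.
MIRRORS = {
    "EU": "http://eu.example.org",
    "NA": "http://na.example.org",
}
DEFAULT_MIRROR = "http://www.example.org"


def redirect_for(region: str, path: str):
    """Return an HTTP status and headers sending the client to its nearest mirror."""
    base = MIRRORS.get(region, DEFAULT_MIRROR)
    # 302 is a temporary redirect, so the client asks again next time
    # and can be sent elsewhere if the mirror list changes.
    return 302, {"Location": base + path}
```

The client sees the mirror's address in its browser, which is the visible cost of this otherwise simple technique.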

28
Content Distribution Networks
  • Massive global network presence
  • Mirroring in multiple locations
  • Routing around network outages
  • Outsourcing network infrastructure
  • Fraction of the cost
  • Akamai: a well-known example

29
Bandwidth Distribution
  • Serving big data (software, media)
  • Static mirroring
  • Peer-to-peer technologies
  • Streaming (audio, video webcasts)
  • Multicast

30
Peer-to-peer Mirroring
  • Bittorrent (etc.)
  • Splitting big file into small parts
  • Giving parts to network of nodes
  • They will trade what they have
  • Sharing the cost
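
The splitting step can be sketched as follows. This shows only cutting a file into parts and checking each part against a hash when reassembling; it is not the BitTorrent protocol itself, and the function names are made up for illustration:

```python
import hashlib


def split_into_pieces(data: bytes, piece_size: int):
    """Cut the data into fixed-size parts; the last part may be shorter."""
    if piece_size <= 0:
        raise ValueError("piece_size must be positive")
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def piece_hashes(pieces):
    """One SHA-1 hash per part, published so that peers can verify what they get."""
    return [hashlib.sha1(piece).hexdigest() for piece in pieces]


def reassemble(received: dict, hashes):
    """Rebuild the file from parts that arrived in any order (index -> bytes)."""
    parts = []
    for index, expected in enumerate(hashes):
        piece = received[index]
        if hashlib.sha1(piece).hexdigest() != expected:
            raise ValueError("piece %d failed verification" % index)
        parts.append(piece)
    return b"".join(parts)
```

Because every part is verified on its own, a node can safely accept parts from peers it does not trust.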

31
Multicast
  • IP multicast
  • Single stream splitting at routers
  • Requires special clients
  • Combining with content distribution networks
  • Local servers are the special clients
  • Emulating IP multicast using a graph of servers

32
Scaling Users to Growing Sites
  • When volume of information on site grows
  • Daily volume, also interesting archive volume
  • Better navigation
  • Restructuring the site
  • Archives, search, browse
  • Limiting the volume
  • User rating and moderation of content
  • Providing excerpts of longer articles
  • Information feed (Atom, RSS), even multiple

33
Scaling Admins to Growing Sites
  • Adding editors, other roles, avoiding conflicts
  • Publishing system: queue on news sites
  • Various rights, levels of editors
  • Content Management System
  • Contains all data
  • Presents them in consistent form
  • Simplifies changing navigation
  • Gives admin overview of changes
  • Has ugly URIs? http://example.org/?doc=137

34
Examples
  • Google
  • CNN
  • eBay
  • Geocities
  • Slashdot
  • Mandriva

35
Google
  • Own servers in many places
  • Akamai DNS to route
  • Region-dependent backend data

36
CNN
  • Mostly static content
  • Content-distribution network

37
eBay
  • Static data distributed
  • pics.ebaystatic.com
  • Regional servers
  • Manual: ebay.com, ebay.at
  • Dealing with user community distribution

38
Geocities
  • Local load balancing
  • How it may be done
  • Dynamic portioning by sets of users
  • Not to have too big servers
  • Simple mirroring of the whole thing not effective
  • Multiple mirrored servers for each portion
  • To provide reliability
  • For easier management

39
Slashdot
  • Local load balancing
  • Fall-back to static pages on big load

40
Mandriva Linux
  • Serving Linux CD, DVD images
  • Explicit, manual mirroring
  • Also bittorrent

41
(Re)Sources
  • http://www.zakon.org/robert/internet/timeline/
  • http://news.netcraft.com/archives/web_server_survey.html
  • http://www.internetworldstats.com/stats.htm
  • http://www.clickz.com/stats/sectors/geographics/article.php/5911_151151
  • http://cia.gov/cia/publications/factbook/
  • http://www.akamai.com/en/html/services/overview.html

42
Summary
  • Different kinds of heavy load
  • Different kinds of handling the load
  • Also social and financial scalability
  • Think ahead!

43
Next Class
  • Distributed computing using the Web
  • RESTful applications
  • Web Services
  • SOAP
  • WSDL
  • Other interesting specs

44
Next Classes
  • What would you like to hear?
  • Interesting topics, open discussion?
  • How about HTML?
  • Write me your ideas; we can get to them in January
  • Would you like to make presentations?
  • 4-5 people, 15 minutes each, extra bonus points
  • Submit abstracts to me

45
Homework
  • Name 3 different types of heavy website use
  • Real-life scenarios
  • On real websites
  • With short explanations
  • How it occurs, how to deal with it
  • Bonus point for originality
