Fast Data at Massive Scale - PowerPoint PPT Presentation

Slides: 21
Provided by: rober869

Transcript and Presenter's Notes

2
Fast Data at Massive Scale
  • Lessons Learned at Facebook
  • Bobby Johnson

3
Me
  • Director of Engineering
  • Scaling and Performance
  • Site Security
  • Site Reliability
  • Distributed Systems
  • Development tools
  • Customer Service Tools
  • Took Facebook from 7M users to 120M.

5
Architecture
  • Load Balancer (assigns a web server)
  • Other services: Search, Feed, etc. (ignore for now)
  • Web Server (PHP assembles data)
  • Memcache (fast)
  • Database (slow, persistent)
6
  • 1/2 the time is in PHP
  • 1/4 is in memcache
  • 1/8 is in database

7
One year ago, almost half the time was in memcache
8
Network Incast
  • PHP Client sends many small get requests through the
    switch to 4 memcache servers
9
Network Incast
  • 4 memcache servers send many big data packets through
    the switch to the PHP Client
10
Clustering
  • PHP Client fetches 10 objects from 1 memcache server
  • 1 round trip for 10 objects
11
Clustering
  • PHP Client fetches 5 objects from each of 2 memcache
    servers
  • 2 round trips total
  • 1 round trip per server
  • longest request is 5

12
Clustering
  • PHP Client fetches 3, 4, and 3 objects from 3 memcache
    servers
  • 3 round trips total
  • 1 round trip per server
  • longest request is 4

13
Clustering
  • If objects are small, round trips dominate so you
    want objects clustered
  • If objects are large, transfer time dominates so
    you want objects distributed
  • In a web application you will almost always be
    dealing with small objects
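
The round-trip counts on the three clustering slides can be reproduced with a small model. This is a minimal sketch in Python rather than the deck's PHP. The two cost formulas and the cost numbers in the usage note are illustrative assumptions, not measurements from the talk, and every name here is hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class FetchPlan:
    round_trips: int      # one request per server that holds at least one object
    longest_request: int  # most objects any single server has to return
    total_objects: int


def plan_fetch(objects_per_server: Sequence[int]) -> FetchPlan:
    """Summarize a fetch given how many wanted objects live on each server."""
    busy = [n for n in objects_per_server if n > 0]
    return FetchPlan(len(busy), max(busy, default=0), sum(busy))


def total_work(plan: FetchPlan, round_trip_cost: float, per_object_cost: float) -> float:
    """Assumed throughput cost: every round trip and every object is paid for."""
    return plan.round_trips * round_trip_cost + plan.total_objects * per_object_cost


def parallel_latency(plan: FetchPlan, round_trip_cost: float, per_object_cost: float) -> float:
    """Assumed latency when all requests are issued in parallel:
    one round trip plus the transfer time of the longest request."""
    if plan.round_trips == 0:
        return 0
    return round_trip_cost + plan.longest_request * per_object_cost


clustered = plan_fetch([10])       # slide 10: 1 server, 10 objects
split = plan_fetch([5, 5])         # slide 11: 2 servers, 5 objects each
spread = plan_fetch([3, 4, 3])     # slide 12: 3 servers

for name, plan in [("clustered", clustered), ("split", split), ("spread", spread)]:
    print(name, plan.round_trips, "round trips, longest request", plan.longest_request)
```

With a round trip costing 100 units and a small object costing 1, spreading the objects over three servers only cuts the modeled latency from 110 to 104 while the total work rises from 110 to 310. With large objects (1000 units each) the same spread cuts latency from 10100 to 4100, which is the tradeoff the bullets describe.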

14
Caching
  • Basic tools are parallelism and clustering
  • Clustering is a latency/throughput tradeoff
  • Application code must be aware
  • Networking is a burst problem
  • Dropped packets kill you
  • TCP quick ack
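
One way to read "application code must be aware" is that the application chooses what each key is routed on, so related objects land on the same server and can be fetched in one request. The sketch below is in Python rather than PHP and rests on assumptions: the `user:<id>:<field>` key layout, the server names, and the CRC32-modulo routing are all hypothetical and are not taken from the talk.

```python
import zlib
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def server_for(route_key: str, servers: List[str]) -> str:
    """Pick a server by hashing the routing key (CRC32 is stable across runs)."""
    return servers[zlib.crc32(route_key.encode("utf-8")) % len(servers)]


def cluster_prefix(key: str) -> str:
    """Route 'user:42:name' on 'user:42' so one user's objects share a server."""
    return key.rsplit(":", 1)[0]


def group_keys_by_server(
    keys: Iterable[str],
    servers: List[str],
    route_on: Callable[[str], str] = lambda key: key,
) -> Dict[str, List[str]]:
    """Group keys into one batch per server; each batch is one round trip."""
    batches: Dict[str, List[str]] = defaultdict(list)
    for key in keys:
        batches[server_for(route_on(key), servers)].append(key)
    return dict(batches)


servers = ["mc1", "mc2", "mc3", "mc4"]  # hypothetical server names
keys = ["user:42:name", "user:42:photo", "user:42:status", "user:42:friends"]

unclustered = group_keys_by_server(keys, servers)
clustered = group_keys_by_server(keys, servers, route_on=cluster_prefix)

print("batches when routing on the full key:", len(unclustered))
print("batches when routing on the user prefix:", len(clustered))
```

Routing on the prefix always yields a single batch for one user's keys. Routing on the full key may scatter them across several servers, which means more round trips but shorter individual requests.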

15
PHP CPU
16
Application Improvements
17
know what your libraries do
  $results = get_search_results( $needle );
  foreach ( $results as $result ) {
    if ( is_pending_friend( $result->id ) ) {
      // we'll change the links based on this
      $result->pending = true;
    }
  }

18
know what your libraries do
  function is_pending_friend( $id ) {
    // this is short-lived, so don't cache
    return expensive_db_query( $id );
  }
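
The cost hidden in the loop above is one database query per search result. The slides do not show a fix, so the sketch below illustrates just one possible remedy: ask for all the IDs in a single query. It is written in Python rather than PHP, and `FakeDb` and both helper functions are hypothetical stand-ins that only count queries.

```python
class FakeDb:
    """Hypothetical stand-in for the database; it only counts queries."""

    def __init__(self, pending_ids):
        self.pending_ids = set(pending_ids)
        self.queries = 0

    def is_pending(self, user_id):
        self.queries += 1  # one query per call
        return user_id in self.pending_ids

    def pending_among(self, user_ids):
        self.queries += 1  # one query for the whole batch
        return self.pending_ids & set(user_ids)


def mark_pending_one_by_one(db, results):
    """Mirrors the slide: an innocent-looking call inside the loop."""
    for result in results:
        if db.is_pending(result["id"]):
            result["pending"] = True


def mark_pending_batched(db, results):
    """Same outcome, but the database is asked once."""
    pending = db.pending_among(result["id"] for result in results)
    for result in results:
        if result["id"] in pending:
            result["pending"] = True


slow_db, fast_db = FakeDb({2, 4}), FakeDb({2, 4})
slow_results = [{"id": i} for i in range(1, 6)]
fast_results = [{"id": i} for i in range(1, 6)]

mark_pending_one_by_one(slow_db, slow_results)
mark_pending_batched(fast_db, fast_results)

print("queries, one by one:", slow_db.queries)
print("queries, batched:", fast_db.queries)
```

Both versions mark the same results. Only the number of database round trips differs, and it grows with the result count in the first version.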

19
Databases
  • Tend to be slower than lighter-weight
    alternatives, so avoid using them
  • If you do use them, partition them right from the
    start
  • If a query is _really_ slow, like a few seconds
    or a few minutes, you probably have a bug where
    you're scanning a table
  • The db should have a command to tell you what
    index it's using for a query, and how many rows
    it's examining
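
As an illustration of the last bullet, the sketch below asks a database how it plans to run a query before and after an index exists. It uses Python's built-in `sqlite3` module and SQLite's `EXPLAIN QUERY PLAN` purely as a stand-in, since the slides do not name a database. The table, column, and index names are made up. SQLite's plan reports the access path but not a row count; other databases expose row estimates through their own EXPLAIN output.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's plan for a query; the last column holds the description."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)

sql = "SELECT name FROM users WHERE email = ?"
params = ("user500@example.com",)

# Without an index on email, the only option is to scan the whole table.
plan_before = query_plan(conn, sql, params)

conn.execute("CREATE INDEX idx_users_email ON users (email)")

# With the index in place, the plan should name it.
plan_after = query_plan(conn, sql, params)

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan text varies between SQLite versions, so check for the scan and for the index name rather than matching the full string.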

20
General Lessons
  • Your best tool is parallelism
  • Look at your data
  • Build tools to look at your data
  • Don't make assumptions about what components are
    doing
  • Algorithmic and system improvements are almost
    always better than micro-optimization