1
Java @ Google - JavaZone 2005
Knut Magne Risvik, Google Inc., September 14, 2005
2
Presentation Outline
  • Background: Google's mission and computing platform
  • GFS and MapReduce: the Ebony and Ivory of our infrastructure
  • Java for computing: coupling infrastructure and Java
  • Java in Google products: apps and middle-tiers
  • The Java expertise at Google: we host Java leadership
  • Giving it back: Google contributions to Java
  • Closing notes and Q&A: swag for good questions

3
Google's Mission
To organize the world's information and make it universally accessible and useful
4
Explosive Computational Requirements
  • Every Google service sees continuing growth in computational needs
  • More queries: more users, happier users
  • More data: bigger web, mailbox, blog, etc.
  • Better results: find the right information, and find it faster

[Diagram: a reinforcing cycle linking more data, more queries, and better results]
5
A Simple Challenge For Our Computing Platform
  • Create the world's largest computing infrastructure
  • Make sure we can afford it
  • Need to drive efficiency of the computing
    infrastructure to unprecedented levels

6
Many Interesting Challenges
  • Server design and architecture
  • Power efficiency
  • System software
  • Large scale networking
  • Performance tuning and optimization
  • System management and repair automation

7
Design Philosophy
  • Single-machine performance does not matter
  • The problems we are interested in are too large
    for any single system
  • Can partition large problems, so throughput beats
    peak performance
  • Stuff Breaks
  • If you have one server, it may stay up three
    years (1,000 days)
  • If you have 1,000 servers, expect to lose one a
    day
  • Ultra-reliable hardware makes programmers lazy
  • Even a reliable platform will still fail; software still needs to be fault-tolerant
  • Fault-tolerant software beats fault-tolerant
    hardware

8
Why Use Commodity PCs?
  • Single high-end 8-way Intel server:
  • IBM eServer xSeries 440
  • 8 × 2-GHz Xeons, 64 GB RAM, 8 TB of disk
  • $758,000
  • Commodity machines:
  • Rack of 88 machines
  • 176 × 2-GHz Xeons, 176 GB RAM, 7 TB of disk
  • $278,000
  • 1/3 the price, 22× the CPUs, 3× the RAM, 1× the disk ($758k / $278k ≈ 2.7; 176 / 8 = 22; 176 GB / 64 GB ≈ 3; 7 TB ≈ 8 TB)
  • Sources: racksaver.com and TPC-C performance results, both from late 2002

9
When Ultra-reliable Machines Won't Help
10
Take-home lesson: Murphy was right
11
google.stanford.edu (circa 1997)
12
Lego Disk Case
13
google.com (1999)
14
Google Data Center (circa 2000)
15
google.com (new data center 2001)
16
google.com (3 days later)
17
When Servers Sleep (2004)
18
Google Query Serving Infrastructure
19
Reliable Building Blocks
  • Need to store data reliably
  • Need to run jobs on pools of machines
  • Need to make it easy to apply lots of computational resources to problems
  • In-house solutions:
  • Storage: Google File System (GFS)
  • Job scheduling: Global Work Queue (GWQ)
  • MapReduce: simplified large-scale data processing

20
Google File System - GFS
  • Master manages metadata
  • Data transfers happen directly between
    clients/chunkservers
  • Files broken into chunks (typically 64 MB)
  • Chunks triplicated across three machines for
    safety
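
The chunk scheme above lends itself to a quick worked example. The toy sketch below (all names are ours, not part of any GFS API) shows how a byte offset in a file maps to a chunk index, which is what a client would ask the master to resolve into replica locations, assuming the typical 64 MB chunk size:

import static java.lang.System.out;

// Toy illustration of GFS-style chunk addressing, assuming 64 MB chunks.
// A client would send the chunk index to the master, which replies with
// the locations of the chunk's three replicas; data then flows directly
// between the client and the chunkservers.
public class ChunkMath {
    static final long CHUNK_SIZE = 64L * 1024 * 1024; // 64 MB

    static long chunkIndex(long fileOffset) {
        return fileOffset / CHUNK_SIZE;
    }

    static long offsetWithinChunk(long fileOffset) {
        return fileOffset % CHUNK_SIZE;
    }

    public static void main(String[] args) {
        long offset = 200L * 1024 * 1024; // a read at the 200 MB mark
        // 200 MB / 64 MB = chunk 3; 200 MB - 3 * 64 MB = 8 MB into that chunk
        out.println("chunk " + chunkIndex(offset)
                + ", offset " + offsetWithinChunk(offset) + " bytes");
    }
}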

21
GoogleFile: API access to GFS
  • GoogleFile: public API with two roles
  • Creational class: static methods to obtain an InputStream, OutputStream, or GoogleChannel on top of a Google file
  • File manipulation: a subset of the methods provided by the java.io.File class
  • GoogleInputStream: implements the read method
  • GoogleOutputStream: extends java.io.OutputStream, implements the write method
  • GoogleChannel: a public class that implements the ByteChannel interface and a subset of the methods in the FileChannel class; provides random access
  • GoogleFile.Stats
  • The JNI layer is implemented by the class FileImpl and a set of SWIG-generated JNI wrappers produced during the build process
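
To make the roles above concrete, here is a hypothetical usage sketch. Only the type names on this slide (GoogleFile and its stream/channel classes) come from the talk; the static method name and the file path are invented for illustration, since the real API is not public.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

// Hypothetical client code against the GoogleFile API sketched above.
// The slide says GoogleFile exposes static creational methods returning
// streams and channels; the exact names below are assumptions.
public class GfsReadExample {
    public static void main(String[] args) throws IOException {
        // Assumed creational method returning a GoogleInputStream
        // (which, per the slide, behaves as a java.io.InputStream).
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                GoogleFile.createInputStream("/gfs/cell/some/file")))) {
            String line;
            while ((line = r.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}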

22
GFS Usage at Google
  • 30 clusters
  • Clusters as large as 2,000 chunkservers
  • Petabyte-sized filesystems
  • 2,000 MB/s sustained read/write load
  • All in the presence of HW failures
  • More information:
  • "The Google File System", Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, 19th ACM Symposium on Operating Systems Principles
  • http://labs.google.com/papers/gfs.html

23
MapReduce: Large-Scale Data Processing
  • Many tasks: process lots of data to produce other data
  • Want to use hundreds or thousands of CPUs, and it has to be easy
  • MapReduce provides, for programs following a particular programming model:
  • Automatic parallelization and distribution
  • Fault-tolerance
  • I/O scheduling
  • Status and monitoring

24
Example: Word Frequencies in Web Pages
  • A typical exercise for a new engineer in his or her first week
  • Have files with one document per record
  • Specify a map function that takes a key/value pair: key = document name, value = document text
  • Output of the map function is (potentially many) key/value pairs. In our case, output (word, "1") once per word in the document

("document1", "to be or not to be") → ("to", 1), ("be", 1), ("or", 1), ("not", 1), ("to", 1), ("be", 1)
25
Example continued: word frequencies in web pages
  • The MapReduce library gathers together all pairs with the same key
  • The reduce function combines the values for a key: in our case, compute the sum
  • Output of reduce (usually 0 or 1 value) is paired with the key and saved

("be", 2), ("not", 1), ("or", 1), ("to", 2)
26
Example Pseudo-code
map(String input_key, String input_value):
  // input_key: document name
  // input_value: document contents
  for each word w in input_value:
    EmitIntermediate(w, "1");

reduce(String key, Iterator intermediate_values):
  // key: a word, same for input and output
  // intermediate_values: a list of counts
  int result = 0;
  for each v in intermediate_values:
    result += ParseInt(v);
  Emit(AsString(result));

  • Total: 80 lines of code
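
For comparison with the pseudocode, here is a self-contained, single-process Java sketch of the same word count; all class and method names are ours. A TreeMap stands in for the shuffle-and-sort stage, which in the real system runs distributed across thousands of machines.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-process sketch of the word-count MapReduce above. All names
// are ours; the real library distributes this work across machines.
public class WordCount {

    // map: emit (word, "1") once per word in the document text.
    static void map(String inputKey, String inputValue,
                    Map<String, List<String>> intermediate) {
        for (String w : inputValue.split("\\s+")) {
            intermediate.computeIfAbsent(w, k -> new ArrayList<>()).add("1");
        }
    }

    // reduce: sum the counts gathered for one word.
    static String reduce(String key, List<String> intermediateValues) {
        int result = 0;
        for (String v : intermediateValues) {
            result += Integer.parseInt(v);
        }
        return Integer.toString(result);
    }

    public static void main(String[] args) {
        // The TreeMap plays the role of the shuffle-and-sort stage:
        // it groups values by key and keeps the keys sorted.
        Map<String, List<String>> intermediate = new TreeMap<>();
        map("document1", "to be or not to be", intermediate);
        for (Map.Entry<String, List<String>> e : intermediate.entrySet()) {
            System.out.println(e.getKey() + ", " + reduce(e.getKey(), e.getValue()));
        }
        // Prints: be, 2 / not, 1 / or, 1 / to, 2 (one pair per line)
    }
}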

27
Typical Google Cluster
  • 100s/1000s of 2-CPU x86 machines, 2-4 GB of memory
  • Limited bisection bandwidth
  • Storage: local IDE disks and the Google File System (GFS)
  • GFS running on the same machines provides reliable, replicated storage of input and output data
  • Job scheduling system: jobs made up of tasks; scheduler assigns tasks to machines

28
Execution
[Diagram: map tasks 1-3 read input from GFS (Google File System) and emit intermediate (key, value) pairs such as (k1, v), (k2, v), (k3, v), (k4, v); the shuffle-and-sort stage groups pairs by key, sending (k1, v,v,v) and (k3, v) to reduce task 1, and (k2, v) and (k4, v) to reduce task 2; the reduce tasks write their output back to GFS]
29
Optimizations
  • The shuffle stage is pipelined with mapping
  • Many more tasks than machines, for load balancing
  • Locality: map tasks are scheduled near the data they read
  • Backup copies of map/reduce tasks (avoids stragglers)
  • Compress intermediate data
  • Re-execute tasks on machine failure

30-40
MapReduce status: MR_Indexer-beta6-large-2003_10_28_00_03
[Eleven consecutive screenshots of the MapReduce status page tracking this indexing job as it runs]
41
Results
  • Using 1,800 machines:
  • MR_Grep scanned 1 terabyte in 100 seconds
  • MR_Sort sorted 1 terabyte of 100-byte records in 14 minutes
  • Rewrote Google's production indexing system as a sequence of 7, 10, 14, 17, 21, 24 MapReductions:
  • simpler
  • more robust
  • faster
  • more scalable

42
Usage in March 2005
Number of jobs: 72,229
Average completion time: 934 secs
Machine days used: 358,528 days (~1 millennium)
Input data read: 12,571 TB
Intermediate data: 2,756 TB
Output data written: 941 TB
Average worker machines: 232
Average worker deaths per job: 1.9
Average map tasks per job: 3,097
Average reduce tasks per job: 144
Unique map implementations: 309
Unique reduce implementations: 235
Unique map/reduce combinations: 411
43
Widely applicable at Google
  • Implemented as a C++ library linked into user programs
  • Java JNI interface, similar to the GFS API
  • Can read and write many different data types
  • Example uses (a sketch of the Java binding follows this list):

web access log stats, web link-graph reversal, inverted index construction, statistical machine translation, distributed grep, distributed sort, term-vector per host, document clustering, machine learning, ...
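
The slide confirms only that a JNI interface to the library exists; nothing about its shape is public. As a purely illustrative guess, a Java binding in the spirit of the pseudocode on slide 26 might expose interfaces like these (every name below is an assumption):

// Hypothetical shape of a Java MapReduce binding; the talk confirms
// only that a JNI interface exists, so all names here are guesses.
public interface MapReduce {

    interface Emitter {
        void emit(String key, String value);
    }

    interface Mapper {
        // Called once per input record; may emit many intermediate pairs.
        void map(String key, String value, Emitter out);
    }

    interface Reducer {
        // Called once per intermediate key with all values for that key.
        void reduce(String key, Iterable<String> values, Emitter out);
    }
}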
44
Conclusion
  • MapReduce has proven to be a useful abstraction
  • Greatly simplifies large-scale computations at Google
  • Fun to use: focus on the problem, let the library deal with messy details
  • "MapReduce: Simplified Data Processing on Large Clusters", Jeffrey Dean and Sanjay Ghemawat, OSDI'04: Sixth Symposium on Operating System Design and Implementation
  • (Search Google for MapReduce)

45
Java in Google Applications
46
Java expertise @ Google
  • Joshua Bloch - Collections Framework, Java 5.0 language enhancements, java.math; author of "Effective Java", coauthor of "Java Puzzlers"
  • Neal Gafter - Lead developer of javac, implementor of Java 5.0 language enhancements, coauthor of "Java Puzzlers"
  • Robert Griesemer - Architect and technical lead of the HotSpot JVM
  • Doug Kramer - Javadoc architect, Java platform documentation lead
  • Tim Lindholm - Original member of the Java project, key contributor to the Java programming language, implementor of the classic JVM, coauthor of "The Java Virtual Machine Specification"
  • Michael "madbot" McCloskey - Designer and implementer of java.util.regex
  • Srdjan Mitrovic - Co-implementor of the HotSpot JVM
  • David Stoutamire - Technical lead for Java performance, designer and implementer of parallel garbage collection
  • Frank Yellin - Original member of the Java project, co-implementor of the classic JVM, KVM, and CLDC; coauthor of "The Java Virtual Machine Specification"

47
Giving it back: JCP Expert Groups
  • Executive Committee for J2SE/J2EE
  • JSR 166X: Concurrency Utilities (continuing; see the sketch after this list) - http://www.jcp.org/en/jsr/detail?id=166
  • JSR 199: Java Compiler API - http://www.jcp.org/en/jsr/detail?id=199
  • JSR 220: Enterprise JavaBeans 3.0 - http://www.jcp.org/en/jsr/detail?id=220
  • JSR 250: Common Annotations for the Java Platform - http://www.jcp.org/en/jsr/detail?id=250
  • JSR 260: Javadoc Tag Technology Update - http://www.jcp.org/en/jsr/detail?id=260
  • JSR 269: Pluggable Annotation Processing API - http://www.jcp.org/en/jsr/detail?id=269
  • JSR 270: J2SE 6.0 ("Mustang") Release Contents - http://www.jcp.org/en/jsr/detail?id=270 (Google representative: gafter)
  • JSR 273: Design-Time API for JavaBeans (JBDT) - http://www.jcp.org/en/jsr/detail?id=273
  • JSR 274: The BeanShell Scripting Language - http://www.jcp.org/en/jsr/detail?id=274
  • JSR 277: Java Module System - http://www.jcp.org/en/jsr/detail?id=277
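
As a small taste of what the JSR 166 expert group produced, the snippet below uses the java.util.concurrent executor framework that JSR 166 brought into J2SE 5.0; the arithmetic task is just a placeholder.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Uses the java.util.concurrent utilities standardized by JSR 166.
public class Jsr166Demo {
    public static void main(String[] args) throws Exception {
        // A fixed pool of worker threads, managed by the library.
        ExecutorService pool = Executors.newFixedThreadPool(4);

        // Submit an asynchronous task and block on its result.
        Future<Integer> answer = pool.submit(() -> 6 * 7);
        System.out.println(answer.get()); // prints 42

        pool.shutdown();
    }
}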

48
Closing Notes
  • Google: computing infrastructure
  • Java is becoming a first-class citizen at Google
  • Essential native interfaces are being built
  • API design is extremely important at our scale; the Java expertise is driving general API work
  • Google brings high-scale industrial experience into JCP expert groups

49
Q&A
Knut Magne Risvik, Google Inc., September 14, 2005