Bigtable: A Distributed Storage System for Structured Data - PowerPoint PPT Presentation

Description:

4/17/07. W. Eballar, CS775.

Provided by: winniee6
Learn more at: https://www.cs.odu.edu

Transcript and Presenter's Notes


1
Bigtable: A Distributed Storage System for Structured Data
  • by Fay Chang, Jeffrey Dean, Sanjay Ghemawat,
    Wilson C. Hsieh, Deborah A. Wallach, Mike
    Burrows, Tushar Chandra, Andrew Fikes, and Robert
    E. Gruber
  • Google, Inc.

2
Bigtable Defined
  • Bigtable is a distributed storage system that is
    • Scalable
    • Distributed
    • Flexible
    • High-performance
    • Highly available

3
Bigtable as a database
  • Sparse, distributed, persistent multidimensional sorted map
  • Indexed by row key, column key, and timestamp
    • e.g. languageID
  • Data stored as uninterpreted byte strings
  • Accessed through the Bigtable API
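
The map described on this slide can be pictured with a small in-memory sketch. This is only an illustration of the (row, column, timestamp) -> bytes indexing, not Bigtable's implementation. The class name TinyTable and its methods are hypothetical, and the row and column names in the usage note are sample values.

```python
from bisect import bisect_left


class TinyTable:
    """Toy in-memory stand-in for a (row, column, timestamp) -> bytes map."""

    def __init__(self):
        self._rows = []   # row keys, kept in sorted order
        self._cells = {}  # (row, column) -> list of (timestamp, value), newest first

    def put(self, row, column, timestamp, value):
        if not isinstance(value, bytes):
            raise TypeError("values are uninterpreted byte strings")
        i = bisect_left(self._rows, row)
        if i == len(self._rows) or self._rows[i] != row:
            self._rows.insert(i, row)
        versions = self._cells.setdefault((row, column), [])
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before `timestamp` (newest overall if None)."""
        for ts, value in self._cells.get((row, column), []):
            if timestamp is None or ts <= timestamp:
                return value
        return None

    def scan(self, start_row, end_row):
        """Yield row keys in sorted order with start_row <= key < end_row."""
        i = bisect_left(self._rows, start_row)
        while i < len(self._rows) and self._rows[i] < end_row:
            yield self._rows[i]
            i += 1
```

For example, after put("com.cnn.www", "contents:", 5, b"v5"), a get on the same row and column returns b"v5", and scan walks neighbouring row keys in lexicographic order. Keeping rows sorted is what makes short range scans cheap.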

Figure from paper
4
Bigtable API
  • Clients can
    • read rows and row subsets
    • write/delete values (atomic single-row transactions)
    • run scripts in the servers' address space (Sawzall)
    • create/delete tables and column families
    • change metadata for clusters, tables, and column families
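
A minimal sketch of what "atomic single-row transactions" means in practice: every set and delete aimed at one row is applied as a single step, and no cross-row guarantee is offered. The RowStore class, its method names, and the per-row lock are assumptions made for illustration. This is not the real client API.

```python
import threading
from collections import defaultdict


class RowStore:
    """Toy store in which each mutation of a single row is atomic."""

    def __init__(self):
        self._rows = defaultdict(dict)             # row key -> {column: bytes}
        self._locks = defaultdict(threading.Lock)  # one lock per row (assumed design)
        self._locks_guard = threading.Lock()       # protects the lock table itself

    def _lock_for(self, row):
        with self._locks_guard:
            return self._locks[row]

    def mutate_row(self, row, sets=None, deletes=None):
        """Apply all deletes and sets to one row as a single atomic step."""
        with self._lock_for(row):
            data = self._rows[row]
            for column in deletes or []:
                data.pop(column, None)
            for column, value in (sets or {}).items():
                data[column] = value

    def read_row(self, row, columns=None):
        """Return a snapshot of the row, optionally restricted to some columns."""
        with self._lock_for(row):
            data = dict(self._rows.get(row, {}))
        if columns is None:
            return data
        return {c: v for c, v in data.items() if c in columns}
```

A reader of a row therefore sees either none or all of a given mutation to that row. Nothing here coordinates changes across two different rows.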

5
Google Infrastructure
  • Cluster management
  • Google File System (GFS)
  • Chubby
  • Google SSTable file format

6
Chubby
  • A highly available, persistent distributed lock
    service
  • five replicas, one of which is elected master
  • live as long as a majority of replicas can communicate
  • Tasks
    • ensures that there is at most one active master
    • discovers tablet servers and finalizes tablet server deaths
    • stores Bigtable schema information
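
The majority rule and the "at most one active master" task can be sketched as follows. This is a toy model under stated assumptions: has_quorum, tolerated_failures, and ToyLockService are hypothetical names, and a real lock service also handles sessions, leases, and replication, none of which is modelled here.

```python
def has_quorum(reachable_replicas, total_replicas=5):
    """True when a strict majority of replicas can communicate."""
    return reachable_replicas > total_replicas // 2


def tolerated_failures(total_replicas=5):
    """Largest number of replicas that can fail while a majority remains."""
    return (total_replicas - 1) // 2


class ToyLockService:
    """Single-process stand-in for an exclusive lock service."""

    def __init__(self):
        self._holders = {}  # lock name -> current owner

    def try_acquire(self, lock_name, owner):
        """Grant the lock if it is free or already held by `owner`."""
        return self._holders.setdefault(lock_name, owner) == owner

    def release(self, lock_name, owner):
        if self._holders.get(lock_name) == owner:
            del self._holders[lock_name]
```

With five replicas the service keeps working with three reachable and stops with two. A would-be master that fails try_acquire on the master lock must not act as master.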

7
Bigtable SSTable

8
Bigtable Components
  • Library linked into client
  • Tablet Server
    • manages tablets
    • handles client reads/writes
  • Master Server
    • manages tablet servers
    • tablet assignment

Diagram: master (M) and tablet servers 1, 2, ..., n
9
Bigtable Tablet
Figure from paper

SSTable
10
Tablet Assignment
Diagram: Chubby (entries TS1, TS2, ..., TSn, M), master (M), and tablet servers 1, 2, ..., n
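
A rough sketch of the bookkeeping behind tablet assignment: the master knows which tablet servers are live, treats tablets on dead servers as unassigned, and hands each one to a live server. The function name and the least-loaded placement policy are assumptions chosen for illustration, and the way the master learns liveness from Chubby is not modelled.

```python
def assign_tablets(live_servers, assignments, tablets):
    """Return a new tablet -> server map in which every tablet has a live server.

    Tablets whose server is no longer live are treated as unassigned and
    given to the currently least-loaded live server (an assumed policy).
    """
    if not live_servers:
        raise ValueError("no live tablet servers")
    # Keep existing assignments that still point at a live server.
    result = {t: s for t, s in assignments.items()
              if s in live_servers and t in tablets}
    load = {s: 0 for s in live_servers}
    for server in result.values():
        load[server] += 1
    for tablet in sorted(tablets):
        if tablet not in result:
            # Ties are broken by server name so the outcome is deterministic.
            target = min(sorted(load), key=lambda s: load[s])
            result[tablet] = target
            load[target] += 1
    return result
```

Each tablet ends up on exactly one server, which matches the rule that a tablet is served by one tablet server at a time.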
11
Tablet Serving
Figure from paper

12
Compactions
Figure from paper
  • Minor Compaction
  • Merging Compaction
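
The two compactions can be sketched with a toy tablet that keeps recent writes in a memtable and older data in immutable SSTables. The sketch makes several simplifying assumptions: the class and method names are hypothetical, the commit log and deletion markers are left out, and the merge step folds every SSTable into one, whereas a merging compaction in the paper reads only a few SSTables plus the memtable.

```python
class ToyTablet:
    """Toy tablet: a mutable memtable plus a list of immutable SSTables."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}   # recent writes
        self.sstables = []   # sorted tuples of (key, value) pairs, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.minor_compaction()

    def minor_compaction(self):
        """Freeze the memtable and write it out as a new SSTable."""
        if self.memtable:
            self.sstables.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}

    def merging_compaction(self):
        """Merge the memtable and the SSTables into a single new SSTable."""
        merged = {}
        for table in self.sstables:  # oldest first, so newer values win
            merged.update(table)
        merged.update(self.memtable)
        self.sstables = [tuple(sorted(merged.items()))]
        self.memtable = {}

    def read(self, key):
        """Check the memtable first, then SSTables from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None
```

Minor compactions keep memory use bounded, while merging keeps the number of SSTables a read has to consult from growing without limit.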

13
Refinements
  • Locality groups
  • Compression
  • Caching
  • Bloom filters
  • Commit-log implementation
  • Tablet Recovery
  • Exploiting SSTable immutability
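
Of the refinements listed, the Bloom filter is the easiest to show in a few lines. The idea is that a small bit array can say "definitely not present" for a row/column pair, so a read can skip SSTables that cannot contain the data. The sizes and the SHA-256-based hashing below are arbitrary choices for this sketch and are not taken from the paper.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        """False means the key was never added; True means it may have been."""
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))
```

A reader would call might_contain with a row/column key before touching an SSTable and go to disk only when the answer is True.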

14
Google Earth

15
Google Earth
  • One table to preprocess data
    • handles raw images
    • holds about 70 terabytes of data
    • 1 row per geographic segment
  • Set of tables to serve client data
    • provides image location
    • relatively small (about 500 GB)
    • in-memory column families to reduce latency

16
Lessons
  • Large distributed systems are vulnerable to many types of failure
  • Delay adding new features until it is clear how they will be used
  • Need for system-level monitoring
  • Value of simple design
    • "code and design clarity are of immense help in
      code maintenance and debugging."

17
Bigtable Development Notes
  • As of August 2006
    • 388 non-test Bigtable clusters
    • about 24,500 tablet servers in total
    • 2.5-year effort
    • about 7 person-years of design and implementation
    • about 100K lines of code
    • used by more than 60 Google projects

18
References
  • http://www.usenix.org/events/osdi06/
  • http://www.usenix.org/events/osdi06/tech/chang.html
  • https://www2.blogger.com/comment.g?blogID=36203843&postID=116234368988171763