Title: Advanced data management
1. Advanced Data Management
- Jiaheng Lu
- Department of Computer Science
- Renmin University of China
- www.jiahenglu.net
2. Cloud Computing
4. Distributed Systems
5. Outline: Concepts and Terminology
- What is a distributed system?
- Distributed data objects
- Distributed execution
- Three-tier architectures
- Transaction concepts
6. What's a Distributed System?
- Centralized
- everything in one place
- stand-alone PC or Mainframe
- Distributed
- some parts remote
- distributed users
- distributed execution
- distributed data
7. Transparency in Distributed Systems
- Make the distributed system as easy to use and manage as a centralized system
- Give a single-system image
- Location transparency
- hide the fact that an object is remote
- hide the fact that an object has moved
- hide the fact that an object is partitioned or replicated
- The name doesn't change if an object is replicated, partitioned, or moved
8. Naming: The Basics
- Objects have
- Globally Unique Identifiers (GUIDs)
- location(s) / address(es)
- name(s)
- addresses can change
- objects can have many names
- Names are context dependent
- (Jim @ KGB vs. Jim @ CIA)
- Many naming systems (see the sketch below)
- UNC: \\node\device\dir\dir\dir\object
- Internet: http://node.domain.root/dir/dir/dir/object
- LDAP: ldap://ldap.domain.root/o=org,c=US,cn=dir
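The name / GUID / address separation can be made concrete with a minimal Python sketch; every name, GUID, and address below is invented for illustration:

```python
# Hypothetical sketch of the name / GUID / address separation described
# above: names are context dependent, the GUID is the stable identity,
# and addresses can change without invalidating names.

# name -> GUID, per naming context (e.g. "KGB" and "CIA" directories)
contexts = {
    "KGB": {"Jim": "guid-1234"},
    "CIA": {"Jim": "guid-9999"},   # same name, different object
}

# GUID -> current address(es); may be updated when the object moves
locations = {
    "guid-1234": ["node7:/export/objects/1234"],
    "guid-9999": ["node3:/export/objects/9999"],
}

def resolve(context: str, name: str) -> list[str]:
    """Translate (context, name) to addresses via the GUID."""
    guid = contexts[context][name]
    return locations[guid]

print(resolve("KGB", "Jim"))  # the KGB's Jim, wherever he currently lives
```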
9. Name Servers in Distributed Systems
- Name servers translate name + context to address (+ GUID)
- Name servers are partitioned (subtrees of the name space)
- Name servers replicate the root of the name tree
- Name servers form a hierarchy (see the sketch below)
- Distributed data from hell
- high read traffic
- high reliability & availability
- autonomy
(Diagram: North and South name servers, each replicating the root and holding its own region's names)
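A small sketch of how a lookup might walk this partitioned, root-replicated hierarchy; the server names and the regional split are invented:

```python
# Hypothetical name-server tree: every server replicates the root
# entries, but each regional server holds only its own subtree.

ROOT = {"north": "north-server", "south": "south-server"}

SERVERS = {
    # each server: replicated root + its own partition of the name space
    "north-server": {"root": ROOT, "names": {"alice": "guid-n1"}},
    "south-server": {"root": ROOT, "names": {"bob": "guid-s7"}},
}

def lookup(local_server: str, region: str, name: str) -> str:
    """Resolve region/name starting from any server.

    The replicated root lets any server route the request;
    only the owning server's partition holds the actual entry.
    """
    owner = SERVERS[local_server]["root"][region]   # root is replicated
    return SERVERS[owner]["names"][name]            # partition is not

print(lookup("north-server", "south", "bob"))  # routed via the root replica
```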
10. Autonomy in Distributed Systems
- The owner of a site (or node, or application, or database) wants to control it
- If my part is working, I must be able to access and manage it (reorganize, upgrade, add users, ...)
- Autonomy is
- essential
- difficult to implement
- in conflict with global consistency
- examples: naming, authentication, admin
11. Security: The Basics
- Authentication server: subject + authenticator => (Yes + token) or No
- Security matrix
- who can do what to whom
- an access control list is one column of the matrix
- "who" is an authenticated ID
- In a distributed system, "who", "what", and "whom" are distributed objects (see the sketch below)
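A minimal sketch of the security matrix, with the ACL extracted as one column; all subjects, objects, and rights are made up:

```python
# Hypothetical security matrix: rows are authenticated subjects,
# columns are objects. The column for one object is its ACL.

matrix = {
    # (subject, object) -> set of permitted operations
    ("jim@cia", "payroll-db"): {"read"},
    ("ann@cia", "payroll-db"): {"read", "write"},
    ("jim@cia", "printer-3"):  {"print"},
}

def acl(obj: str) -> dict[str, set[str]]:
    """One column of the matrix: the access-control list for obj."""
    return {subj: ops for (subj, o), ops in matrix.items() if o == obj}

def check(subject: str, op: str, obj: str) -> bool:
    """'who' must already be an authenticated ID when this runs."""
    return op in matrix.get((subject, obj), set())

print(acl("payroll-db"))
print(check("jim@cia", "write", "payroll-db"))  # False
```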
12. Security in Distributed Systems
- Security domain: nodes with a shared security server
- Security domains can have trust relationships
- A trusts B: A believes B when it says "this is Jim@B"
- Security domains form a hierarchy
- Delegation: passing authority to a server; when A asks B to do something (e.g., print a file, read a database), B may need A's authority
- Autonomy requires
- each node is an authenticator
- each node does its own security checks
- The Internet today
- no trust among domains (firewalls, many passwords)
- trust based on digital signatures
13. Clusters: The Ideal Distributed System
- A cluster is a distributed system BUT with a single
- location
- manager
- security policy
- relatively homogeneous
- communication that is
- high bandwidth
- low latency
- low error rate
- Clusters use distributed-system techniques for
- load distribution
- storage
- execution
- growth
- fault tolerance
14. Clusters: Shared What?
- Shared-Memory Multiprocessor
- multiple processors, one memory
- all devices are local
- e.g., DEC, SGI, or Sequent 16x nodes
- Shared-Disk Cluster
- an array of nodes
- all share common disks
- e.g., VAXcluster + Oracle
- Shared-Nothing Cluster
- each device is local to a node
- ownership may change
- e.g., Tandem, SP2, Wolfpack
15. Outline: Concepts and Terminology
- Why distribute?
- Distributed data objects
- Partitioned
- Replicated
- Distributed execution
- Three-tier architectures
- Transaction concepts
16. Partitioned Data: Break a File into Disjoint Groups
- Exploit data access locality
- put data near its consumer
- less network traffic
- better response time
- better availability
- owner controls data (autonomy)
- Spread load
- data or traffic may exceed a single store
(Diagram: an Orders file partitioned across N.A., S.A., Europe, and Asia)
17. How to Partition Data?
- How to partition
- by attribute, or
- randomly, or
- by source, or
- by use
- Problem: to find the data you must have
- a (replicated) directory, or
- an algorithm
- This encourages attribute-based partitioning (see the sketch below)
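The two lookup strategies can be sketched side by side; the four regional stores match the diagram above, and the hash scheme is just one possible algorithm:

```python
import hashlib

STORES = ["N.A.", "S.A.", "Europe", "Asia"]

# Option 1: a (replicated) directory mapping a key attribute to a store.
directory = {"order-1001": "Europe", "order-1002": "Asia"}

def find_by_directory(key: str) -> str:
    return directory[key]

# Option 2: an algorithm -- nothing to maintain or replicate, but
# placement is fixed by the hash rather than by access locality.
def find_by_hash(key: str) -> str:
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return STORES[h % len(STORES)]

# Attribute-based partitioning ("European orders go to Europe")
# keeps data near its consumer; hashing spreads load evenly.
print(find_by_directory("order-1001"), find_by_hash("order-1001"))
```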
26. Replicated Data: Place Fragments at Many Sites
- Pros
- improves availability
- disconnected (mobile) operation
- distributes load
- reads are cheaper
- Cons
- N times more updates
- N times more storage
- Placement strategies
- dynamic: cache on demand
- static: place at specific sites
(Diagram: a catalog fragment replicated at several sites)
27. Updating Replicated Data
- When a replica is updated, how do changes propagate?
- Master copy, many slave copies (SQL Server; sketched below)
- always know the correct value (the master's)
- change propagation can be
- transactional
- as soon as possible
- periodic
- on demand
- Symmetric, update anytime (Access)
- allows mobile (disconnected) updates
- updates propagated ASAP, periodically, or on demand
- non-serializable
- colliding updates must be reconciled
- hard to know the real value
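A minimal sketch of the master-copy scheme: the master is correct immediately, while slaves lag until propagation runs. The class and its methods are illustrative, not any product's API:

```python
class MasterCopy:
    """Master/slave replication: the master always holds the
    correct value; slaves may lag until propagation runs."""

    def __init__(self, value, n_slaves: int):
        self.value = value
        self.slaves = [value] * n_slaves
        self.dirty = False

    def update(self, new_value):
        self.value = new_value      # master is correct immediately
        self.dirty = True           # slaves are now stale

    def propagate(self):
        """Run transactionally, ASAP, periodically, or on demand."""
        self.slaves = [self.value] * len(self.slaves)
        self.dirty = False

r = MasterCopy(value=10, n_slaves=3)
r.update(11)
print(r.value, r.slaves, r.dirty)   # 11 [10, 10, 10] True (stale reads)
r.propagate()
print(r.slaves)                     # [11, 11, 11]
```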
28. Outline: Concepts and Terminology
- Why distribute?
- Distributed data objects
- Partitioned
- Replicated
- Distributed execution
- remote procedure call
- queues
- Three-tier architectures
- Transaction concepts
29. Distributed Execution: Threads and Messages
- A thread is the unit of execution (the software analog of CPU + memory)
- Threads execute at a node
- Threads communicate via
- shared memory (local)
- messages (local and remote)
30. Peer-to-Peer or Client-Server
- Peer-to-peer is symmetric
- either side can send
- Client-server
- client sends requests
- server sends responses
- a simple subset of peer-to-peer
31. Connection-less or Connected
- Connected (sessions)
- open - request/reply - close
- client authenticated once
- messages arrive in order
- can send many replies (e.g., FTP)
- server keeps client context (context sensitive)
- e.g., Winsock and ODBC
- HTTP is adding connections
- Connection-less (see the sketch below)
- each request contains
- client ID
- client context
- the work request
- client authenticated on each message
- only a single response message
- e.g., HTTP, NFS v1
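A sketch of what "the request contains everything" means in the connection-less style; the field names and handler are invented:

```python
from dataclasses import dataclass

@dataclass
class ConnectionlessRequest:
    """Everything the server needs travels in every message,
    because no session holds the client's context."""
    client_id: str        # authenticated on each message
    client_context: dict  # e.g. a pagination token
    work: str             # the actual request

def handle(req: ConnectionlessRequest) -> str:
    # The server can be restarted between requests: nothing is
    # remembered on its side (HTTP- or NFS-v1-style).
    page = req.client_context.get("page", 0)
    return f"{req.work} for {req.client_id}, page {page}"

print(handle(ConnectionlessRequest("jim", {"page": 2}, "list orders")))
```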
32. Remote Procedure Call: The Key to Transparency
- An object may be local or remote
- Methods on the object work wherever it is
- Local invocation (diagram)
33. Remote Procedure Call: The Key to Transparency
(Diagram: remote invocation of y = pObj->f(x): the call is shipped to the object's node, f() executes there, and the returned val is assigned to y; see the sketch below)
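A toy sketch of the proxy idea behind RPC transparency: the call site looks local, and the proxy forwards the invocation. A real system would marshal arguments over a network; here a local dictionary stands in for the remote side:

```python
class RemoteProxy:
    """Local stand-in for a remote object: y = pObj.f(x) looks like a
    local call, but the proxy ships it to wherever the object lives."""

    def __init__(self, transport, object_id: str):
        self.transport = transport      # stands in for the network
        self.object_id = object_id

    def __getattr__(self, method):
        def stub(*args):
            # marshal -> send -> execute remotely -> return value
            return self.transport(self.object_id, method, args)
        return stub

# A fake "server side" so the sketch runs without a network.
REMOTE_OBJECTS = {"obj-1": {"f": lambda x: x * 2}}

def fake_transport(object_id, method, args):
    return REMOTE_OBJECTS[object_id][method](*args)

pObj = RemoteProxy(fake_transport, "obj-1")
y = pObj.f(21)          # same syntax whether obj-1 is local or remote
print(y)                # 42
```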
34. Object Request Broker (ORB)
- Registers servers
- Manages pools of servers
- Connects clients to servers
- Does naming, request-level authorization, ...
- Provides transaction coordination (new feature)
- Old names
- Transaction Processing Monitor
- Web server
- NetWare
35. Outline: Concepts and Terminology
- Why distribute?
- Distributed data objects
- Distributed execution
- remote procedure call
- queues
- Three-tier architectures
- what
- why
- Transaction concepts
36. Client/Server Interactions: All Can Be Done with RPC
- Request-Response: the response may be many messages
- Conversational: the server keeps client context
- Dispatcher: three-tier; a complex operation at the server
- Queued: de-couples client from server; allows disconnected operation
37. Queued Request/Response
- Time-decouples client and server
- Three transactions (see the sketch below)
- Almost real time, ASAP processing
- Communicate at each other's convenience; allows mobile (disconnected) operation
- Disk queues survive client and server failures
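A sketch of the three-transaction pattern, with in-memory queues standing in for disk queues (which, unlike these, would survive failures); the priority field anticipates the dispatcher example on the next slide:

```python
from collections import deque

# Stand-in for disk queues: illustrative only, not failure-proof.
requests = deque()
responses = deque()

def client_submit(work: str, priority: int = 0):
    """Transaction 1: enqueue and go away (maybe disconnect)."""
    requests.append((priority, work))

def server_poll():
    """Transaction 2: the server processes at its own convenience,
    highest priority first (the ambulance-dispatcher pattern)."""
    if requests:
        priority, work = max(requests)
        requests.remove((priority, work))
        responses.append(f"done: {work}")

def client_collect():
    """Transaction 3: the client picks up the answer later."""
    return responses.popleft() if responses else None

client_submit("print report")
client_submit("dispatch ambulance", priority=9)
server_poll()
print(client_collect())   # the high-priority request is served first
```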
38. Why Queued Processing?
- Prioritize requests: an ambulance dispatcher favors high-priority calls
- Manage workflows
- Deferred processing in mobile apps
39. Google Cloud Computing Techniques
40. The Google File System
41. The Google File System (GFS)
- A scalable distributed file system for large, distributed, data-intensive applications
- Multiple GFS clusters are currently deployed
- The largest ones have
- 1000 storage nodes
- 300 terabytes of disk storage
- and are heavily accessed by hundreds of clients on distinct machines
42. Introduction
- Shares many of the same goals as previous distributed file systems: performance, scalability, reliability, etc.
- The GFS design has been driven by four key observations of Google's application workloads and technological environment
43. Intro: Observations 1
- 1. Component failures are the norm
- constant monitoring, error detection, fault tolerance, and automatic recovery are integral to the system
- 2. Huge files (by traditional standards)
- multi-GB files are common
- I/O operations and block sizes must be revisited
44. Intro: Observations 2
- 3. Most files are mutated by appending new data
- this is the focus of performance optimization and atomicity guarantees
- 4. Co-designing the applications and the API benefits the overall system by increasing flexibility
45. The Design
- A GFS cluster consists of a single master and multiple chunkservers, and is accessed by multiple clients
46. The Master
- Maintains all file system metadata
- namespace, access control info, file-to-chunk mappings, chunk (and replica) locations, etc.
- Periodically communicates with chunkservers via HeartBeat messages to give instructions and check state
47. The Master
- Helps make sophisticated chunk placement and replication decisions, using global knowledge
- For reading and writing, a client contacts the master to get chunk locations, then deals directly with chunkservers
- The master is therefore not a bottleneck for reads and writes
48. Chunkservers
- Files are broken into chunks; each chunk has an immutable, globally unique 64-bit chunk handle
- the handle is assigned by the master at chunk creation
- Chunk size is 64 MB
- Each chunk is replicated on 3 (default) chunkservers (see the read-path sketch below)
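The master/chunkserver division of labor can be sketched for a read; the 64 MB chunk size is from the slide, while the file names, handles, and function are invented:

```python
CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB, per the slide

# Master metadata: file -> list of chunk handles, handle -> replicas.
chunk_map = {"/logs/web.log": ["handle-0", "handle-1"]}
locations = {"handle-0": ["cs1", "cs2", "cs3"],
             "handle-1": ["cs2", "cs4", "cs5"]}

def master_lookup(path: str, offset: int):
    """Metadata only: translate (file, offset) to (handle, replicas)."""
    index = offset // CHUNK_SIZE            # which chunk holds the byte
    handle = chunk_map[path][index]
    return handle, locations[handle]

# The client then reads the data directly from a chunkserver, so the
# master never sits on the data path and is not a read bottleneck.
handle, replicas = master_lookup("/logs/web.log", 70 * 1024 * 1024)
print(handle, "->", replicas)   # handle-1 -> ['cs2', 'cs4', 'cs5']
```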
49. Clients
- Linked into apps using the file system API
- Communicate with the master and chunkservers for reading and writing
- master interactions only for metadata
- chunkserver interactions for data
- Only cache metadata information
- data is too large to cache
50. Chunk Locations
- The master does not keep a persistent record of chunk and replica locations
- It polls chunkservers for this at startup, and whenever chunkservers join or leave
- It stays up to date by controlling the placement of new chunks and through HeartBeat messages (when monitoring chunkservers)
51. Operation Log
- A record of all critical metadata changes
- Stored on the master and replicated on other machines
- Defines the order of concurrent operations
- Changes are not visible to clients until they propagate to all chunk replicas
- Also used to recover the file system state (see the sketch below)
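A minimal sketch of an operation log: metadata mutations recorded in one total order and replayed to rebuild state. The operation format is invented:

```python
# Minimal sketch of an operation log: metadata changes are recorded
# in a single total order, and state can be rebuilt by replaying them.

log = []   # in GFS this is persisted locally and replicated remotely

def apply_op(state: dict, op: tuple) -> None:
    kind, path, value = op
    if kind == "create":
        state[path] = value
    elif kind == "delete":
        state.pop(path, None)

def record(op: tuple) -> None:
    log.append(op)   # the append order defines concurrent-op order

def recover() -> dict:
    """Rebuild file-system metadata by replaying the log."""
    state: dict = {}
    for op in log:
        apply_op(state, op)
    return state

record(("create", "/a", "chunks-a"))
record(("create", "/b", "chunks-b"))
record(("delete", "/a", None))
print(recover())    # {'/b': 'chunks-b'}
```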
52. System Interactions: Leases and Mutation Order
- Leases maintain a consistent mutation order across all chunk replicas
- The master grants a lease to one replica, called the primary
- The primary chooses the serial mutation order, and all replicas follow this order
- This minimizes management overhead for the master
53. System Interactions: Leases and Mutation Order (diagram)
54. Atomic Record Append
- The client specifies the data to write; GFS chooses the offset, appends the data to each replica at least once, and returns the chosen offset
- Heavily used by Google's distributed applications
- No need for a distributed lock manager
- GFS chooses the offset, not the client
55. Atomic Record Append: How?
- Follows a similar control flow as other mutations
- The primary tells the secondary replicas to append at the same offset as the primary
- If the append fails at any replica, it is retried by the client
- So replicas of the same chunk may contain different data, including duplicates, in whole or in part, of the same record (see the sketch below)
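The at-least-once behavior and its consequences are easy to see in a deliberately simplified sketch, where each replica is a Python list and a record is one list element:

```python
# Simplified replicas of one chunk: lists of records at offsets.
replicas = {"cs1": [], "cs2": [], "cs3": []}

def record_append(data, fail_at=None):
    """Primary picks the offset; every replica appends at that offset.
    Returns the offset, or raises to trigger a client retry."""
    offset = max(len(r) for r in replicas.values())
    for name, chunk in replicas.items():
        while len(chunk) < offset:
            chunk.append(None)          # pad: replica regions may differ
        chunk.append(data)
        if name == fail_at:
            raise IOError(f"append failed at {name}")
    return offset

try:
    record_append("rec-1", fail_at="cs2")   # cs1 succeeded, cs2 failed
except IOError:
    pass
off = record_append("rec-1")                # client retries: new offset
print(off, replicas["cs1"])  # cs1 now holds 'rec-1' twice (a duplicate)
```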
56. Atomic Record Append: How?
- GFS does not guarantee that all replicas are bitwise identical
- It only guarantees that the data is written at least once in an atomic unit
- The data must be written at the same offset on all chunk replicas for success to be reported
57. Replica Placement
- The placement policy maximizes data reliability and network bandwidth
- Spread replicas not only across machines, but also across racks
- Guards against machine failures, and against racks getting damaged or going offline
- Reads of a chunk exploit the aggregate bandwidth of multiple racks
- Writes have to flow through multiple racks
- a tradeoff made willingly
58. Chunk Creation
- Chunks are created and placed by the master
- placed on chunkservers with below-average disk utilization
- limit the number of recent creations on each chunkserver
- with creations come lots of writes
59. Detecting Stale Replicas
- The master keeps a chunk version number to distinguish up-to-date replicas from stale ones
- The version is increased when granting a lease
- If a replica is not available, its version is not increased
- The master detects stale replicas when chunkservers report their chunks and versions (see the sketch below)
- Stale replicas are removed during garbage collection
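Version-based staleness detection in sketch form; the data structures and heartbeat format are invented:

```python
# Master's view: authoritative version per chunk, bumped at lease grant.
master_version = {"handle-7": 3}

# What each chunkserver reports in a heartbeat: (handle, version).
reports = {"cs1": ("handle-7", 3),
           "cs2": ("handle-7", 2)}   # cs2 missed the last mutation

def stale_replicas() -> list[str]:
    """Any replica whose reported version lags the master's is stale
    and will be removed during garbage collection."""
    stale = []
    for server, (handle, version) in reports.items():
        if version < master_version[handle]:
            stale.append(server)
    return stale

print(stale_replicas())   # ['cs2']
```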
60. Garbage Collection
- When a client deletes a file, the master logs it like other changes and renames the file to a hidden name
- The master removes files hidden for longer than 3 days when scanning the file system namespace
- their metadata is also erased
- During HeartBeat messages, chunkservers send the master a subset of their chunks, and the master tells them which chunks have no metadata
- The chunkserver removes those chunks on its own
61. Fault Tolerance: High Availability
- Fast recovery
- the master and chunkservers can restart in seconds
- Chunk replication
- Master replication
- "shadow" masters provide read-only access when the primary master is down
- mutations are not done until recorded on all master replicas
62. Fault Tolerance: Data Integrity
- Chunkservers use checksums to detect corrupt data
- Since replicas are not bitwise identical, each chunkserver maintains its own checksums
- For reads, the chunkserver verifies the checksum before sending the data
- Checksums are updated during writes (see the sketch below)
63. Q&A. Thanks!