TDD:%20Topics%20in%20Distributed%20Databases

About This Presentation

Title:

TDD:%20Topics%20in%20Distributed%20Databases

Description:

Distributed databases ... Mediator. wrapper. wrapper. wrapper. DB. DB ... Storage: database DB is horizontally fragmented, based on branch-name: NYC, ... – PowerPoint PPT presentation

Number of Views:133

Avg rating:3.0/5.0

Slides: 48

Provided by: homepage7

Category:

more less

Transcript and Presenter's Notes

Title: TDD:%20Topics%20in%20Distributed%20Databases

1
TDD Topics in Distributed Databases

Distributed Databases
Distributed database
Distributed query processing joins and non-join
queries
Updating distributed data

2
Distributed databases

Data is stored in several sites (nodes),
geographically or administratively across
multiple systems
Each site is running an independent DBMS
What do we get?
Increased availability and reliability
Increased parallelism

Data centers

Complications
Catalog management distributed data independence
and distributed transaction atomicity
Query processing and optimization replication
and fragmentation
Increased update costs, concurrency control
locking, deadlock, commit protocol, recovery

3
Architectures
3
4
Homogeneous vs heterogeneous systems

Homogeneous identical DBMS, aware of each other,
cooperate
Heterogeneous different schemas/DBMS
Multidatabase system uniform logical view of the
data -- common schema
difficult, yet common system is typically
gradually developed

5
Architectures

Client-server client (user interface, front
end), server (DBMS)
Client ships query to a server (query shipping)
All query processing at server

client
client
client-server
server
server
server
6
Architectures

Collaborating server query can span several
servers

Middleware
Coordinator queries and transactions across
servers

7
Warehouse architecture
client applications
data warehouse
integrator
monitor/wrapper
monitor/wrapper
monitor/wrapper
XML
RDB
OODB
8
Monitor/wrapper

A monitor/wrapper for each data source
translation translate an information source into
a common integrating model
change detection detect changes to the
underlying data source and propagate the changes
to the integrator
active databases (triggers condition, event,
action)
logged sources inspecting logs
periodic polling, periodic dumps/snapshots
Data cleaning
detect erroneous/incomplete information to ensure
validity
back flushing return cleaned data to the source

9
Integrator

Receive change notifications from the
wrapper/monitors and reflect the changes in the
data warehouse.
Typically a rule-based engine
merging information (data fusion)
handling references
Data cleaning
removing redundancies and inconsistencies
inserting default values
blocking sources

10
When to use data warehouse

Problem potential inconsistencies with the
sources.
Commonly used for relatively static data
when clients require specific, predicable portion
of the available information
when clients require high query performance but
not necessarily the most recent state of the
information
when clients want summarized/aggregated
information such as historical information
Examples
scientific data
historical enterprise data
caching frequently requested information

11
Data warehouse vs. materialized views

materialized view is over an individual
structured database, while a warehouse is over a
collection of heterogeneous, distributed data
sources
materialized view typically has the same form as
in the underlying database, while a warehouse
stores highly integrated and summarized data
materialized view modifications occur within the
same transaction updating its underlying
database, while a warehouse may have to deal with
independent sources
sources simply report changes
sources may not have locking capability
integrator is loosely coupled with the sources

12
Mediated system architecture

Virtual approach data is not stored in the
middle tier

client applications
Mediator
wrapper
wrapper
wrapper
XML
RDB
OODB
13
Lazy vs. eager approaches

Lazy approach (mediated systems)
accept a query, determine the appropriate set of
data sources, generate sub-queries for each data
source
obtain results from the data sources, perform
translation, filtering and composing, and return
the final answer
Eager approach (warehouses)
information from each source that may be of
interest is extracted in advance, translated,
filtered, merged with relevant sources, and
stored in a repository
query is evaluated directly against the
repository, without accessing the original
information sources

14
Data warehouse vs. mediated systems

Efficiency
response time at the warehouse, queries can be
answered efficiently without accessing original
data sources. Advantageous when data sources are
slow, expensive or periodically unavailable, or
when translation, filtering and merging require
significant processing
space warehousing consumes extra storage space
Extensibility warehouse
consistency with the sources warehouse data may
become out of date
applicability
warehouses for high query performance and static
data
mediated systems for information that changes
rapidly

15
Distributed data storage -- replication

Fragments of a relation are replicated at several
sites R is fragmented into R1, R2, R3
Why?
Increase availability/reliability if one site
fails
Increase parallelism faster query evaluation
Increase overhead on updates consistency
Dynamic issues synchronous vs. asynchronous

Primary copy e.g.,
Bank an account at the site in which it was
opened
Airline an flight at the site from which it
originates

R1
R2
R3
R2
Site 2
Site 1
16
Distributed data storage -- fragmentation

A relation R may be fragmented or partitioned
Horizontal
Vertical lossless join
Question how to reconstruct the original R?

fragmentation determined by local ownership
NYC
EDI
17
Transparency, independence

Distributed data transparency (independence)
location (name) transparency
fragmentation
replication transparency (catalog management)
Transaction atomicity across several sites
All changes persist if the transaction commits
None persists if the transaction aborts

Data independency and transaction atomicity are
not supported currently the users have to be
aware of where data is located
18
Distributed query processing joins and non-join
queries
18
19
Distributed query processing and optimization

New challenges
Data transmission costs (network)
parallelism
Choice of replicas lowest transmission cost
Fragmentation to reconstruct the original
relation
Query decomposition query rewriting/unfolding
depending on how data is fragmented/replicated

decomposition
sub-query
sub-query
sub-query
20
Non-join queries

Schema account(acc-num, branch-name, balance)
Query Q select from account where
branch-name EDI
Storage database DB is horizontally fragmented,
based on branch-name NYC, Philly, EDI, denoted
by DB1, , DBn
DB DB1 ? ? DBn
Processing
Rewrite Q into Q(DB1) ? ? Q(DBn)
Q(DBi) is empty if branch-name ltgt EDI
Q(DB1), where DB1 is the EDI branch
Q(DB1) Q(DB1)
Q select from account

21
Simple join processing data shipping

R1 R2 R3
where Ri is at site i, S0 is the site where the
query is issued
Option 1 send copies of R1, R2, R3 to S0 and
compute the joins at S0
Option 2
Send R1 to S2, compute temp1 ? R1 R2 at S2
Send temp1 to S3, compute temp2 ? R3 temp1
at S3
Send temp2 to S0
Decision on strategies
The volume of data shipped
The cost of transmitting a block
Relative speed of processing at each site

22
Semijoin reduce communication costs

R1 R2, where Ri is at site i,
Compute temp1 ? ? (R1 ? R2) R1 at site 1
projection on join attributes only assume R1
smaller
Ship temp1 to site 2, instead of the entire
relation of R1
Compute temp2 ? R2 temp1 at S2
Ship temp2 to site 1
compute result ? R1 temp2 at S1
Effectiveness
If sufficiently small fraction of the relation of
R2 contributes to the join
Additional computation cost may be higher than
the savings in communication costs

23
Bloomjoin reduce communication costs

R1 R2, where Ri is at site i,
Compute a bit vector of size k by hashing ? (R1
? R2) R1
bit set to 1 if some tuple hashes to it
smaller than the projection (constant size)
Ship the vector to site 2, instead of the entire
relation of R1
Hash R2 using the same hashing function
Ship to site 1 only those tuples of R2 that also
hash to 1, temp1
compute result ? R1 temp1 at S1
Effectiveness
Less communication costs bit-vector vs
projection
The size of the reduction by hashing may be
larger than that of projection
Question set difference?

24
exploring parallelism

Consider R1 R2 R3 R4, where Ri is
at site i
temp1 ? R1 R2, by shipping R1 to site 2
temp2 ? R3 R4, by shipping R3 to site 4
result ? temp1 temp2 -- pipelining
Question R1 R2 R3, using
Pipelined parallelism
Semi-join
both

parallel
25
Distributed query optimization

The volume of data shipped
The cost of transmitting a block
Relative speed of processing at each site
Site selection replication
Two-step optimization
At compile time, generate a query plan along
the same lines as centralized DBMS
Every time before the query is executed,
transform the plan and carry out site selection
(determine where the operators are to be
executed) dynamic, just site selection

26
Practice validation of functional dependencies

A functional dependency (FD) defined on schema R
X ? Y
For any instance D of R, D satisfies the FD if
for any pair of tuples t and t, if tX tX,
then tY tY
Violations of the FD in D
t there exists t in D, such that tX
tX, but tY ? tY
Now suppose that D is fragmented and distributed
Develop an algorithm that given fragmented and
distributed D and an FD, computes all the
violations of the FD in D
semijoin
bloomjoin
Questions what can we do if we are given a set
of FDs to validate?

horizontally or vertically
Minimize data shipment
27
Practice relational operators

Consider a relation R that is vertically
partitioned and distributed across n sites
Develop an algorithm to implement
?A R,
?C R
by using
semijoin
bloomjoin
Column-oriented DBMS store tables as sections of
columns of data, rather than rows (tuples) good
for, eg, certain aggregate queries

28
Practice relational operators

Consider relations R1 and R2 that are
horizontally partitioned and distributed across n
sites
Develop an algorithm to implement R1 R1.A
R2.B R2 by using
semijoin
bloomjoin
Question is your algorithm parallel scalable?
That is, the more processors are used, the faster
it is

29
Updating distributed data
29
30
Updating Distributed data

Fragmentation an update may go across several
sites
Local transaction
Global transaction
Replication multiple copies of the same data --
consistency

updates
propagate
31
System structure

Local transaction manager either local
transaction or part of a global transaction
Maintain a log for recovery
Concurrency control for transactions at the site
Transaction coordinator (not in centralized DBMS)
Start the execution of a transaction
Break a transaction into a number of
sub-transactions and distribute them to the
appropriate sites
Coordinate the termination of the transaction
Commit at all sites
Abort at all sites

32
Two-Phase Commit protocol (2PC) Phase 1

Transaction T the transaction coordinator is at C

P1
(1) ltprepare Tgt
log
C
ltready Tgt
P2
(2) ltready Tgt
prepare T
if all responses are ltreadygt
commit T
P3
(2) ltabort Tgt
abort T
ltno Tgt
log
if one of the responses is ltabortgt
log
33
Two-Phase Commit protocol (2PC) Phase 2

Transaction T the transaction coordinator is at
C

P1
(3) ltcommit Tgt
log
C
P2
ltready Tgt
(4) ltack Tgt
ltcommit Tgt
prepare T
commit T
P3
(4) ltack Tgt
complete T
ltready Tgt
log
ltcommit Tgt
Similarly for abort
log
34
Comments on 2PC

Two rounds of communication both initiated at
the coordinator
Voting
Terminating
Any site can decide to abort a transaction
Every message reflects a decision by the sender
The decision is recorded in the log to ensure the
decision survives any failure transaction id
The log at each participating site the id of the
coordinator
The log at the coordinator ids of the
participating sites

35
Concurrency control

Single local manager at a chosen site
Simple implementation and deadlock handling
Bottleneck
Vulnerability if the site fails
Distributed lock manager each site has one to
update data item D at site j
Send request to the lock manager at site j
Request is either granted or delayed
Deadlock handling is hard
Major complication replicated data

36
replication

Synchronous replication all copies must be
updated before the transaction commits
data distribution transparent, consistent
expensive
Asynchronous replication copies are periodically
updated
Allow modifying transaction to commit before all
copies have been changed
users are aware of data distribution, consistency
issues
Cheaper current products follow this approach
Peer-to-peer replication (master copies)
Primary site replication (only the primary is
updateable)

37
Synchronous replication

Majority approach -- voting data item D
replicated at n sites
A lock request is sent to more than one-half of
the sites
D is locked if the majority vote yes, write n/2
1 copies
Each copy maintains a version number
Expensive
2(n/2 1) messages for lock
Read at least n/2 1 copies to make sure it is
current
Deadlock is more complicated if only one copy is
locked

38
Synchronous replication (cont.)

Majority approach -- voting
Biased protocol read-any write-all.
Shared lock (read) simply requests a lock on D
at one site that contains a copy of D
Exclusive lock (write) lock on all sites that
contain a copy
Less overhead on read, expensive on write
Commonly used approach to synchronous replication

39
Synchronous replication -- exercise

A distributed system uses the majority (voting)
approach to update data replicas. Suppose that a
data item D is replicated at 4 different sites
S1, S2, S3, S4. What should be done if a site S
wants to
Write D
Read D

40
Synchronous replication -- answer

A distributed system uses majority the (voting)
approach to update data replicas. Suppose that a
data item D is replicated at 4 different sites
S1, S2, S3, S4. What should be done if a site S
wants to
Write D
The site S sends a lock request to any 3 of S1,
S2, S3, S4
The write operation is conducted if the lock is
granted by the lock manager of all the 3 sites
otherwise it is delayed until the lock can be
granted
Read D
The site S reads copies of D from at least 3
sites
It picks the copy with the highest version number

41
Asynchronous replication

Primary site choose exactly one copy, residing
at a primary site
A lock request is sent to the primary site
Replicas at other sites may not be updated they
are secondary to the primary copy
Simple to implement
Main issues
D becomes inaccessible if the primary site fails
Propagation of changes from the primary site to
others

42
Asynchronous replication (cont.)

Primary site
Peer-to-peer more than one of the copies can be
a master
Change to a master copy must be propagated to
others
Conflicts of changes to two copies have to be
resolved
Best used when conflicts do not arise e.g.,
Each master site owns a distinct fragment
Updating rights owned by one master at a time

43
Distributed deadlock detection

Recall wait-for graph
Nodes transactions
Edges T1 ? T2 if T1 requests a resource being
held by T2
Local wait-for graph
Nodes all transactions local or
holding/requesting data item local to the site
T1 ? T2 if T1 (at site 1) requests a resource
being held by T2 (at site 2)
Global wait-for graph union of local wait-for
graphs
Deadlock if it contains a cycle

T1
T2
T2
T4
T5
T3
T3
global
Site 2
Site 1
44
False cycles

Due to communication delay
Site 1 local wait-for graph has T1 ? T2
T2 releases the resource deletion of the edge
Site 2 T2 requests the resource again addition
of T2 ? T1
Cycle if insert T2 ? T1 arrives before removal of
T1 ? T2
Centralized deadlock detection deadlock
detection coordinator
Constructs/maintains global wait-for graph
Detect cycles
If it finds a cycle, chose a victim to be rolled
back
Distributed deadlock manager? More expensive

T1
T2
T1
T2
T1
T2
global
Site 1
Site 2
45
Summary and review

Homogeneous vs heterogeneous systems
Replication and fragmentation. Pros and cons of
replication. How to reconstruct a fragmented
relation (vertical, horizontal)?
Simple join (data shipping), semijoin, bloomjoin
set-difference? Intersection? Aggregation?
Transaction manager and coordinator.
Responsibilities?
Describe 2PC. Recovery coordinator and
participating sites
Replication
majority, read-any write-all,
primary site, peer-to-peer
Local wait-for graph, global wait-for graph,
deadlock detection, deadlock handling

46
Reading list

MapReduce tutorial
http//hadoop.apache.org/docs/r1.2.1/mapred_tutori
al.html
Take a look at the following
Cassandra, http//en.wikipedia.org/wiki/Apache_Ca
ssandra
Clusterpoint, http//en.wikipedia.org/wiki/Cluste
rpoint
Riak, http//en.wikipedia.org/wiki/Riak

Reading for the next week

Pregel a system for large-scale graph processing
http//kowshik.github.io/JPregel/pregel_paper.pdf
Distributed GraphLab A Framework for Machine
Learning in the Cloud, http//vldb.org/pvldb/vol5/
p716_yuchenglow_vldb2012.pdf
PowerGraph Distributed Graph-Parallel
Computation on Natural Graphs, http//select.cs.cm
u.edu/publications/paperdir/osdi2012-gonzalez-low-
gu-bickson-guestrin.pdf
GraphChi Large-Scale Graph Computation on Just a
PC, http//select.cs.cmu.edu/publications/paperdir
/osdi2012-kyrola-blelloch-guestrin.pdf
Performance Guarantees for Distributed
Reachability Queries, http//vldb.org/pvldb/vol5/p
1304_wenfeifan_vldb2012.pdf
W. Fan, X. Wang, and Y. Wu. Distributed Graph
Simulation Impossibility and Possibility. VLDB
2014. (parallel model)
http//homepages.inf.ed.ac.uk/wenfei/papers/vldb1
4-impossibility.pdf