
1
TDD: Topics in Distributed Databases
  • Distributed databases
  • Distributed query processing: joins and non-join queries
  • Updating distributed data

2
Distributed databases
  • Data is stored at several sites (nodes), spread geographically or administratively across multiple systems
  • Each site runs an independent DBMS
  • What do we get?
  • Increased availability and reliability
  • Increased parallelism

Data centers
  • Complications
  • Catalog management: distributed data independence and distributed transaction atomicity
  • Query processing and optimization: replication and fragmentation
  • Increased update costs, concurrency control: locking, deadlock, commit protocol, recovery

3
Architectures
4
Homogeneous vs heterogeneous systems
  • Homogeneous: identical DBMSs, aware of each other, cooperate
  • Heterogeneous: different schemas/DBMSs
  • Multidatabase system: a uniform logical view of the data -- a common schema
  • difficult, yet common: the system is typically developed gradually

5
Architectures
  • Client-server: client (user interface, front end), server (DBMS)
  • The client ships a query to a server (query shipping)
  • All query processing is done at the server

[Figure: multiple clients connected to multiple servers (client-server architecture)]
6
Architectures
  • Collaborating servers: a query can span several servers
  • Middleware
  • Coordinator: queries and transactions across servers

7
Warehouse architecture
[Figure: client applications query the data warehouse; an integrator collects data from monitor/wrappers over XML, RDB and OODB sources]
8
Monitor/wrapper
  • A monitor/wrapper for each data source
  • translation: translate an information source into a common integrating model
  • change detection: detect changes to the underlying data source and propagate the changes to the integrator
  • active databases (triggers: condition, event, action)
  • logged sources: inspecting logs
  • periodic polling, periodic dumps/snapshots
  • Data cleaning
  • detect erroneous/incomplete information to ensure validity
  • back flushing: return cleaned data to the source

9
Integrator
  • Receive change notifications from the
    wrapper/monitors and reflect the changes in the
    data warehouse.
  • Typically a rule-based engine
  • merging information (data fusion)
  • handling references
  • Data cleaning
  • removing redundancies and inconsistencies
  • inserting default values
  • blocking sources

10
When to use data warehouse
  • Problem: potential inconsistencies with the sources
  • Commonly used for relatively static data
  • when clients require a specific, predictable portion of the available information
  • when clients require high query performance but not necessarily the most recent state of the information
  • when clients want summarized/aggregated information, such as historical information
  • Examples
  • scientific data
  • historical enterprise data
  • caching frequently requested information

11
Data warehouse vs. materialized views
  • materialized view is over an individual
    structured database, while a warehouse is over a
    collection of heterogeneous, distributed data
    sources
  • a materialized view typically has the same form as in the underlying database, while a warehouse stores highly integrated and summarized data
  • materialized view modifications occur within the same transaction that updates the underlying database, while a warehouse may have to deal with independent sources
  • sources simply report changes
  • sources may not have locking capability
  • integrator is loosely coupled with the sources

12
Mediated system architecture
  • Virtual approach: data is not stored in the middle tier

[Figure: client applications query the mediator, which accesses XML, RDB and OODB sources through wrappers]
13
Lazy vs. eager approaches
  • Lazy approach (mediated systems)
  • accept a query, determine the appropriate set of
    data sources, generate sub-queries for each data
    source
  • obtain results from the data sources, perform
    translation, filtering and composing, and return
    the final answer
  • Eager approach (warehouses)
  • information from each source that may be of
    interest is extracted in advance, translated,
    filtered, merged with relevant sources, and
    stored in a repository
  • query is evaluated directly against the
    repository, without accessing the original
    information sources

14
Data warehouse vs. mediated systems
  • Efficiency
  • response time: at the warehouse, queries can be answered efficiently without accessing the original data sources. Advantageous when data sources are slow, expensive or periodically unavailable, or when translation, filtering and merging require significant processing
  • space: warehousing consumes extra storage space
  • Extensibility: warehouse
  • consistency with the sources: warehouse data may become out of date
  • applicability
  • warehouses for high query performance and static data
  • mediated systems for information that changes rapidly

15
Distributed data storage -- replication
  • Fragments of a relation are replicated at several sites: R is fragmented into R1, R2, R3
  • Why?
  • Increased availability/reliability if one site fails
  • Increased parallelism: faster query evaluation
  • Increased overhead on updates: consistency
  • Dynamic issues: synchronous vs. asynchronous replication
  • Primary copy, e.g.,
  • Bank: an account is at the site at which it was opened
  • Airline: a flight is at the site from which it originates

[Figure: fragments R1, R2, R3 distributed over Site 1 and Site 2, with fragment R2 replicated]
16
Distributed data storage -- fragmentation
  • A relation R may be fragmented or partitioned
  • Horizontal
  • Vertical: lossless join
  • Question: how to reconstruct the original R? (see the sketch below)

[Figure: horizontal fragmentation determined by local ownership, e.g., NYC and EDI fragments]
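
A minimal sketch of the reconstruction in Python, assuming tuples are represented as dicts and using purely hypothetical fragment contents: a horizontally fragmented R is rebuilt by a union of its fragments, a vertically fragmented R by a (lossless) join of its fragments on the key.

# Minimal sketch (hypothetical data): reconstructing R from its fragments.

def reconstruct_horizontal(fragments):
    """Horizontal fragmentation: R is the union of its fragments."""
    result = []
    for frag in fragments:
        result.extend(frag)
    return result

def reconstruct_vertical(frag1, frag2, key):
    """Vertical fragmentation: R is the join of its fragments on the key (lossless join)."""
    index = {t[key]: t for t in frag2}                     # hash frag2 on the key
    return [{**t, **index[t[key]]} for t in frag1 if t[key] in index]

# Hypothetical account fragments, split horizontally by branch-name
nyc = [{"acc-num": 1, "branch-name": "NYC", "balance": 500}]
edi = [{"acc-num": 2, "branch-name": "EDI", "balance": 700}]
print(reconstruct_horizontal([nyc, edi]))
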
17
Transparency, independence
  • Distributed data transparency (independence)
  • location (name) transparency
  • fragmentation transparency
  • replication transparency (catalog management)
  • Transaction atomicity across several sites
  • All changes persist if the transaction commits
  • None persists if the transaction aborts

Data independence and transaction atomicity are not supported in current systems: users have to be aware of where data is located
18
Distributed query processing: joins and non-join queries
19
Distributed query processing and optimization
  • New challenges
  • Data transmission costs (network)
  • parallelism
  • Choice of replicas: lowest transmission cost
  • Fragmentation: to reconstruct the original relation
  • Query decomposition: query rewriting/unfolding
  • depending on how data is fragmented/replicated

[Figure: a query is decomposed into sub-queries sent to different sites]
20
Non-join queries
  • Schema: account(acc-num, branch-name, balance)
  • Query Q: select * from account where branch-name = 'EDI'
  • Storage: database DB is horizontally fragmented based on branch-name: NYC, Philly, EDI, ..., denoted by DB1, ..., DBn
  • DB = DB1 ∪ ... ∪ DBn
  • Processing
  • Rewrite Q into Q(DB1) ∪ ... ∪ Q(DBn)
  • Q(DBi) is empty if its branch-name <> EDI
  • Q(DB) = Q(DB1), where DB1 is the EDI branch
  • Q(DB1) = Q'(DB1), where Q': select * from account
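
A minimal sketch of this rewriting, under the assumption that each horizontal fragment is a list of dict tuples (the fragment contents below are made up for illustration): only the EDI fragment needs to be contacted, and the sub-query sent there is simply select * from account.

# Minimal sketch (hypothetical fragments): pushing the selection
# branch-name = 'EDI' down to horizontal fragments of account.

fragments = {
    "NYC":    [{"acc-num": 1, "branch-name": "NYC",    "balance": 500}],
    "Philly": [{"acc-num": 2, "branch-name": "Philly", "balance": 300}],
    "EDI":    [{"acc-num": 3, "branch-name": "EDI",    "balance": 700}],
}

def query(branch):
    # Q(DBi) is empty unless DBi is the fragment for the requested branch,
    # so only that fragment is contacted; there the filter is redundant and
    # the sub-query degenerates to "select * from account".
    answer = []
    for site, tuples in fragments.items():
        if site == branch:
            answer.extend(tuples)
    return answer

print(query("EDI"))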

21
Simple join processing: data shipping
  • R1 ⋈ R2 ⋈ R3
  • where Ri is at site Si, and S0 is the site where the query is issued
  • Option 1: send copies of R1, R2, R3 to S0 and compute the joins at S0
  • Option 2:
  • Send R1 to S2, compute temp1 ← R1 ⋈ R2 at S2
  • Send temp1 to S3, compute temp2 ← R3 ⋈ temp1 at S3
  • Send temp2 to S0
  • Decision on strategies:
  • The volume of data shipped
  • The cost of transmitting a block
  • Relative speed of processing at each site
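
A back-of-the-envelope comparison of the two options; all relation and intermediate-result sizes below are invented for illustration, and in practice the decision also weighs per-block transmission cost and the processing speed of each site.

# Back-of-the-envelope sketch (all sizes in blocks are assumptions):
# data shipped by the two strategies for R1 ⋈ R2 ⋈ R3 issued at S0.

size = {"R1": 1000, "R2": 800, "R3": 1200}    # hypothetical relation sizes
size_temp1 = 300                              # assumed size of R1 ⋈ R2
size_temp2 = 150                              # assumed size of R3 ⋈ temp1

# Option 1: ship R1, R2 and R3 to S0
option1 = size["R1"] + size["R2"] + size["R3"]

# Option 2: ship R1 to S2, temp1 to S3, temp2 to S0
option2 = size["R1"] + size_temp1 + size_temp2

print("option 1 ships", option1, "blocks; option 2 ships", option2, "blocks")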

22
Semijoin: reduce communication costs
  • R1 ⋈ R2, where Ri is at site Si
  • Compute temp1 ← π(R1 ∩ R2)(R1) at site 1
  • projection on the join attributes only; assume R1 is the smaller relation
  • Ship temp1 to site 2, instead of the entire relation R1
  • Compute temp2 ← R2 ⋈ temp1 at S2
  • Ship temp2 to site 1
  • compute result ← R1 ⋈ temp2 at S1
  • Effectiveness
  • If a sufficiently small fraction of R2 contributes to the join
  • Additional computation cost may be higher than the savings in communication costs
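
A minimal sketch of the semijoin steps, with R1 at site 1 and R2 at site 2; the join attribute "A" and the sample data are assumptions made purely for illustration.

# Minimal sketch of the semijoin steps (hypothetical data and join attribute "A").

R1 = [{"A": 1, "x": "a"}, {"A": 2, "x": "b"}]          # at site 1
R2 = [{"A": 2, "y": "p"}, {"A": 3, "y": "q"}]          # at site 2

# Step 1 (site 1): temp1 = projection of R1 on the join attribute(s)
temp1 = {t["A"] for t in R1}

# Step 2: ship temp1 (not all of R1) to site 2
# Step 3 (site 2): temp2 = tuples of R2 that can join with R1
temp2 = [t for t in R2 if t["A"] in temp1]

# Step 4: ship temp2 back to site 1 and compute the final join there
result = [{**t1, **t2} for t1 in R1 for t2 in temp2 if t1["A"] == t2["A"]]
print(result)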

23
Bloomjoin: reduce communication costs
  • R1 ⋈ R2, where Ri is at site Si
  • Compute a bit vector of size k by hashing π(R1 ∩ R2)(R1)
  • a bit is set to 1 if some tuple hashes to it
  • smaller than the projection (constant size)
  • Ship the vector to site 2, instead of the entire relation R1
  • Hash R2 using the same hash function
  • Ship to site 1 only those tuples of R2 that hash to a 1 bit: temp1
  • compute result ← R1 ⋈ temp1 at S1
  • Effectiveness
  • Lower communication costs: a bit vector vs. a projection
  • The reduction of R2 by hashing may be larger than that by projection (false positives)
  • Question: set difference?
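
A minimal sketch of a Bloomjoin with a single hash function; the vector size k, the join attribute "A", the use of Python's built-in hash, and the data are all illustrative assumptions (real systems use dedicated hash functions and often several of them).

# Minimal sketch of a Bloomjoin with one hash function (illustrative only).

K = 64
R1 = [{"A": 1, "x": "a"}, {"A": 2, "x": "b"}]          # at site 1
R2 = [{"A": 2, "y": "p"}, {"A": 3, "y": "q"}]          # at site 2

# Site 1: hash the join-attribute projection of R1 into a k-bit vector
bits = [0] * K
for t in R1:
    bits[hash(t["A"]) % K] = 1

# Ship the (constant-size) bit vector to site 2.
# Site 2: keep only R2 tuples that hash to a set bit (may include false positives).
temp1 = [t for t in R2 if bits[hash(t["A"]) % K] == 1]

# Ship temp1 to site 1 and compute the join there; false positives drop out here.
result = [{**t1, **t2} for t1 in R1 for t2 in temp1 if t1["A"] == t2["A"]]
print(result)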

24
Exploring parallelism
  • Consider R1 ⋈ R2 ⋈ R3 ⋈ R4, where Ri is at site Si
  • temp1 ← R1 ⋈ R2, by shipping R1 to site 2
  • temp2 ← R3 ⋈ R4, by shipping R3 to site 4
  • temp1 and temp2 are computed in parallel
  • result ← temp1 ⋈ temp2 -- pipelining
  • Question: R1 ⋈ R2 ⋈ R3, using
  • Pipelined parallelism
  • Semi-join
  • both

25
Distributed query optimization
  • The volume of data shipped
  • The cost of transmitting a block
  • Relative speed of processing at each site
  • Site selection: replication
  • Two-step optimization
  • At compile time, generate a query plan along the same lines as a centralized DBMS
  • Every time before the query is executed, transform the plan and carry out site selection (determine where the operators are to be executed) -- dynamic, just site selection

26
Practice: validation of functional dependencies
  • A functional dependency (FD) defined on schema R: X → Y
  • For any instance D of R, D satisfies the FD if for any pair of tuples t and t', if t[X] = t'[X], then t[Y] = t'[Y]
  • Violations of the FD in D:
  • { t | there exists t' in D such that t[X] = t'[X] but t[Y] ≠ t'[Y] }
  • Now suppose that D is fragmented and distributed
  • Develop an algorithm that, given fragmented and distributed D and an FD, computes all the violations of the FD in D
  • semijoin
  • bloomjoin
  • Question: what can we do if we are given a set of FDs to validate?

(D may be fragmented horizontally or vertically; minimize data shipment)
27
Practice: relational operators
  • Consider a relation R that is vertically partitioned and distributed across n sites
  • Develop an algorithm to implement
  • π_A(R),
  • σ_C(R)
  • by using
  • semijoin
  • bloomjoin
  • Column-oriented DBMSs store tables as sections of columns of data, rather than rows (tuples); good for, e.g., certain aggregate queries

28
Practice: relational operators
  • Consider relations R1 and R2 that are horizontally partitioned and distributed across n sites
  • Develop an algorithm to implement R1 ⋈_{R1.A = R2.B} R2 by using
  • semijoin
  • bloomjoin
  • Question: is your algorithm parallel scalable? That is, the more processors are used, the faster it is

29
Updating distributed data
30
Updating distributed data
  • Fragmentation: an update may go across several sites
  • Local transaction
  • Global transaction
  • Replication: multiple copies of the same data -- consistency
  • updates to one copy must be propagated to the other copies

31
System structure
  • Local transaction manager: handles either local transactions or parts of a global transaction
  • Maintain a log for recovery
  • Concurrency control for transactions at the site
  • Transaction coordinator (not in centralized DBMS)
  • Start the execution of a transaction
  • Break a transaction into a number of
    sub-transactions and distribute them to the
    appropriate sites
  • Coordinate the termination of the transaction
  • Commit at all sites
  • Abort at all sites

32
Two-Phase Commit protocol (2PC): Phase 1
  • Transaction T; the transaction coordinator is at site C

  • (1) C writes <prepare T> to its log and sends <prepare T> to the participating sites P1, P2, P3
  • (2) A participant that can commit writes <ready T> to its log and replies <ready T>; one that cannot writes <no T> to its log and replies <abort T>
  • If all responses are <ready T>, C decides to commit T; if any response is <abort T>, C decides to abort T
33
Two-Phase Commit protocol (2PC): Phase 2
  • Transaction T; the transaction coordinator is at site C

  • (3) C writes <commit T> to its log and sends <commit T> to the participating sites P1, P2, P3
  • (4) Each participant writes <commit T> to its log, commits T, and replies <ack T>
  • After receiving <ack T> from all participants, C writes <complete T> to its log
  • Similarly for abort
34
Comments on 2PC
  • Two rounds of communication, both initiated by the coordinator
  • Voting
  • Terminating
  • Any site can decide to abort a transaction
  • Every message reflects a decision by the sender
  • The decision is recorded in the log, with the transaction id, to ensure it survives any failure
  • The log at each participating site records the id of the coordinator
  • The log at the coordinator records the ids of the participating sites
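
A minimal sketch of the coordinator side of 2PC, under stated assumptions: Participant and its prepare/commit/abort methods are hypothetical stand-ins for the real messages, and appending to a Python list stands in for a forced write of the log record to stable storage.

# Minimal sketch of the 2PC coordinator; Participant.prepare/commit/abort are
# hypothetical stand-ins for real messages, and "log.append" stands in for a
# forced write of the log record to stable storage.

def two_phase_commit(tid, participants, log):
    # Phase 1: voting
    log.append(("prepare", tid))
    votes = [p.prepare(tid) for p in participants]       # each returns "ready" or "abort"

    if all(v == "ready" for v in votes):
        # Phase 2: commit at all sites
        log.append(("commit", tid))
        for p in participants:
            p.commit(tid)                                # participant replies <ack T>
        log.append(("complete", tid))
        return "committed"

    # Phase 2: abort at all sites (some participant voted no)
    log.append(("abort", tid))
    for p in participants:
        p.abort(tid)
    log.append(("complete", tid))
    return "aborted"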

35
Concurrency control
  • Single lock manager at a chosen site
  • Simple implementation and deadlock handling
  • Bottleneck
  • Vulnerability if the site fails
  • Distributed lock manager: each site has one; to update a data item D at site j
  • Send the request to the lock manager at site j
  • The request is either granted or delayed
  • Deadlock handling is hard
  • Major complication: replicated data

36
Replication
  • Synchronous replication: all copies must be updated before the transaction commits
  • data distribution transparent, consistent
  • expensive
  • Asynchronous replication: copies are periodically updated
  • Allows a modifying transaction to commit before all copies have been changed
  • users are aware of data distribution; consistency issues
  • Cheaper; current products follow this approach
  • Peer-to-peer replication (master copies)
  • Primary site replication (only the primary is
    updateable)

37
Synchronous replication
  • Majority approach -- voting: data item D is replicated at n sites
  • A lock request is sent to more than one-half of the sites
  • D is locked if the majority votes yes; write n/2 + 1 copies
  • Each copy maintains a version number
  • Expensive
  • 2(n/2 + 1) messages for a lock
  • Read at least n/2 + 1 copies to make sure the copy read is current
  • Deadlock handling is more complicated, even if only one item is locked

38
Synchronous replication (cont.)
  • Majority approach -- voting
  • Biased protocol: read-any, write-all
  • Shared lock (read): simply request a lock on D at one site that contains a copy of D
  • Exclusive lock (write): lock at all sites that contain a copy
  • Less overhead on read, expensive on write
  • Commonly used approach to synchronous replication

39
Synchronous replication -- exercise
  • A distributed system uses the majority (voting)
    approach to update data replicas. Suppose that a
    data item D is replicated at 4 different sites
    S1, S2, S3, S4. What should be done if a site S
    wants to
  • Write D
  • Read D

40
Synchronous replication -- answer
  • A distributed system uses the majority (voting) approach to update data replicas. Suppose that a data item D is replicated at 4 different sites S1, S2, S3, S4. What should be done if a site S wants to
  • Write D
  • The site S sends a lock request to any 3 of S1, S2, S3, S4
  • The write operation is conducted if the lock is granted by the lock managers of all 3 sites; otherwise it is delayed until the lock can be granted
  • Read D
  • The site S reads copies of D from at least 3
    sites
  • It picks the copy with the highest version number

41
Asynchronous replication
  • Primary site: choose exactly one copy, residing at a primary site
  • A lock request is sent to the primary site
  • Replicas at other sites may not be updated; they are secondary to the primary copy
  • Simple to implement
  • Main issues
  • D becomes inaccessible if the primary site fails
  • Propagation of changes from the primary site to the others

42
Asynchronous replication (cont.)
  • Primary site
  • Peer-to-peer: more than one of the copies can be a master
  • A change to a master copy must be propagated to the others
  • Conflicting changes to two copies have to be resolved
  • Best used when conflicts do not arise, e.g.,
  • Each master site owns a distinct fragment
  • Updating rights owned by one master at a time

43
Distributed deadlock detection
  • Recall the wait-for graph
  • Nodes: transactions
  • Edges: T1 → T2 if T1 requests a resource being held by T2
  • Local wait-for graph
  • Nodes: all transactions that are local to the site or hold/request a data item local to the site
  • T1 → T2 if T1 (at site 1) requests a resource being held by T2 (at site 2)
  • Global wait-for graph: the union of the local wait-for graphs
  • Deadlock if it contains a cycle (see the sketch after the figure)

[Figure: local wait-for graphs at Site 1 and Site 2 over transactions T1-T5, and their union, the global wait-for graph]
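
A minimal sketch of detecting a deadlock in the global wait-for graph: the local graphs are unioned and checked for a cycle with a depth-first search. The edges below are invented for illustration and do not reproduce the figure.

# Minimal sketch: union the local wait-for graphs and look for a cycle
# with a depth-first search; the edges below are illustrative only.

def has_cycle(edges):
    graph = {}
    for u, v in edges:
        graph.setdefault(u, []).append(v)

    WHITE, GREY, BLACK = 0, 1, 2              # unvisited / on the DFS stack / done
    color = {node: WHITE for edge in edges for node in edge}

    def dfs(u):
        color[u] = GREY
        for v in graph.get(u, []):
            if color[v] == GREY or (color[v] == WHITE and dfs(v)):
                return True                   # back edge found: a cycle exists
        color[u] = BLACK
        return False

    return any(color[u] == WHITE and dfs(u) for u in list(color))

site1 = [("T1", "T2"), ("T2", "T3")]          # local wait-for graph at site 1
site2 = [("T3", "T1")]                        # local wait-for graph at site 2
print(has_cycle(site1 + site2))               # True: the global graph has a cycle
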
44
False cycles
  • Due to communication delays
  • Site 1: the local wait-for graph has T1 → T2
  • T2 releases the resource: deletion of the edge
  • Site 2: T2 requests the resource again: addition of T2 → T1
  • A cycle appears if the insertion of T2 → T1 arrives before the removal of T1 → T2
  • Centralized deadlock detection: a deadlock-detection coordinator
  • Constructs/maintains the global wait-for graph
  • Detects cycles
  • If it finds a cycle, chooses a victim to be rolled back
  • Distributed deadlock manager? More expensive

[Figure: local wait-for graphs at Site 1 and Site 2 whose union, the global graph, contains a false cycle between T1 and T2]
45
Summary and review
  • Homogeneous vs heterogeneous systems
  • Replication and fragmentation. Pros and cons of
    replication. How to reconstruct a fragmented
    relation (vertical, horizontal)?
  • Simple join (data shipping), semijoin, bloomjoin
  • set-difference? Intersection? Aggregation?
  • Transaction manager and coordinator.
    Responsibilities?
  • Describe 2PC. Recovery: coordinator and participating sites
  • Replication
  • majority, read-any write-all,
  • primary site, peer-to-peer
  • Local wait-for graph, global wait-for graph,
    deadlock detection, deadlock handling

46
Reading list
  • MapReduce tutorial
  • http://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html
  • Take a look at the following
  • Cassandra, http://en.wikipedia.org/wiki/Apache_Cassandra
  • Clusterpoint, http://en.wikipedia.org/wiki/Clusterpoint
  • Riak, http://en.wikipedia.org/wiki/Riak

47
  • Reading for the next week
  • Pregel: a system for large-scale graph processing
  • http://kowshik.github.io/JPregel/pregel_paper.pdf
  • Distributed GraphLab: A Framework for Machine Learning in the Cloud, http://vldb.org/pvldb/vol5/p716_yuchenglow_vldb2012.pdf
  • PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs, http://select.cs.cmu.edu/publications/paperdir/osdi2012-gonzalez-low-gu-bickson-guestrin.pdf
  • GraphChi: Large-Scale Graph Computation on Just a PC, http://select.cs.cmu.edu/publications/paperdir/osdi2012-kyrola-blelloch-guestrin.pdf
  • Performance Guarantees for Distributed Reachability Queries, http://vldb.org/pvldb/vol5/p1304_wenfeifan_vldb2012.pdf
  • W. Fan, X. Wang, and Y. Wu. Distributed Graph Simulation: Impossibility and Possibility. VLDB 2014. (parallel model)
  • http://homepages.inf.ed.ac.uk/wenfei/papers/vldb14-impossibility.pdf
Pick two papers and write reviews