Title: Distributed Database research group
1Distributed Database research group
In the name of God
- Instructor Dr. M. Rahgouzar
- Samira Tasharofi
- Reza Basseda
2Outline
- Introduction
- Distributed Data Storage
- Distributed Transaction
- Commit Protocols
- Concurrency Control in Distributed Database
- Availability
- Distributed Query Processing
- Heterogeneous Distributed Databases
- Directory Systems
- Conclusion
- Acknowledgement
3Introduction
- Distributed computing
- Consists of a number of processing elements (not
necessarily homogeneous ) that interconnected by
a computer network and that co-operate in
performing their assigned tasks - Distributed Database
- Database whose relations reside on different
sites - Database some of whose relations are replicated
at different sites - Database whose relations are split between
different sites
4Introduction
- Distributed database management system
- Software system that permits the management of
distributed database system - Advantages
- Local autonomy
- Improved performance (by proper fragmentation)
- Improved reliability/availability (by
replication) - Greater expandability
- Greater shareability
5Introduction (Cont.)
- Disadvantages
- Higher complexity
- Higher software and hardware cost
- Synchronization and co-ordination among the sites
- Higher maintenance overhead in case of
replication - Greater security problem
6Distributed Data Storage
- On Dynamic Fragmentation of Distributed Databases
Using Partial Replication D. Pinto, G. Torres
,2002 - Fragmentation
- Three kinds of fragmentation Vertical
Fragmentation, Horizontal Fragmentation and Mixed
Fragmentation - Solution
- horizontal fragmentation with partial replication
- RBy2(bound) Algorithm
- 1. For each query requested, a slave computer
increments a counter (ctr) for the user that have
made the request. - 2. If ctr reaches bound number (parameter of this
algorithm), then this computer is a candidate to
have a set of records replicated and need to
follow steps 3 and 4, else step 5. - 3. Request the set of records that the user is
asking for and save this information into the
slave database. - 4. Reset the user local counter to cero.
- 5. end
- Local access database before sending the query to
the master computer
7Distributed Data Storage (Cont.)
- The algorithm is based on two techniques
- Slave-Master Search, to provide fast access to
database on user queries - Replication slave-master (two computers) in order
to get availability - Allows database availability even when the
connection between slave and master database is
broken - Proved on local network
- Future Work
- Check how the algorithm works under dial-up
connection
8Distributed Data Storage (Cont.)
- Transparent Data Relocation in Highly Available
Distributed Systems,S. Voulgaris, M.V. Steen, A.
Baggio, and G. Ballintjn, 2002 - Management issue of distributed services
redistribution of non-replicated data among the
servers comprising a distributed service - Redistribute the data without disrupting the
services availability - Solution Base
- Shipping the data records that need to be
relocated to their new hosting server - Updating the servers mapping information to
reflect the new configuration of the distributed
service
9Distributed Data Storage (Cont.)
- Solution For a Single Redistribution
- Initialization
- Distribute new mapping M
- Record Relocation
- Termination
- Replace M with M
- Solution for Overlapping Redistributions
- Per-server Sequential Redistribution
- Using redistribution R2 after R1 completed
- Per-server Mixed but Ordered Redistributions
- The server ships each record as soon as possible,
based on the virtual mapping with first
preference - Direct Shipping to Final Destination
10Distributed Data Storage (Cont.)
- Advantages
- low delays in the servicing of client requests
during a configuration change - Adding no significant processing requirement to
the servers involved, and terminates in a timely
fashion - conceptual simplicity
- sequential concurrent versions
11Transaction Management
- Ensuring Relaxed Atomicity for Flexible
Transaction in Multidatabase Systems, A.Zhang,
M.Nodine, B.Bhargava, O.Bukhres, ACM - Global transaction A set of sub transactions,
where each sub transaction is a transaction
accessing the data items at a single local site - Flexible global transaction Specifying
Definitions of - Execution ordering dependencies between two sub
transaction - Alternating dependencies between two subsets of
sub transactions - Eliminate prepare-to-commit stage
- Sub transaction
- Retriable
- Compensable
- Pivot
- In global transactions at most one sub
transaction can be pivot
12Transaction Management (Cont.)
- Semi-atomicity in Flexible Transaction Weaker
than atomicity in global transaction - All its sub transaction in one lt-rpo commit and
all attempted sub transactions not in the
committed lt-rpo are either aborted or have their
effects undone - No partial effects of its sub transactions remain
permanent in local database - Ensuring semi-atomicity
- Using retry and compensetable techniques
- Flexible transaction Advantages
- Enhances the scope of global transaction
management beyond that offered by the traditional
global transaction model - Blocking that caused by two-phase commit can be
prevented
13Transaction Management (Cont.)
- Global scheduling for flexible Transactions in
Heterogeneous distributed Database Systems, A.
Zhang, M. Nodine, B. Bhargava - Global Serializability
- If the projection of committed local, flexible
and surplus transactions is conflict equivalent
to some serial execution of these transactions - Compensation-interference free
- For any sub transaction tj which is serialized
between a subtransaction ti and its compensating
transaction cti in S, WC(ti) /\ AC(tj) 0
14Transaction Management (Cont.)
- F-serializability
- Prevents the flexible transactions which are
serialized between a flexible transaction and its
compensating sub transactions to affect any data
items that have been updated by flexible
transaction (global serializable and compensation
free) - Avoids unnecessary abort or compensation
- Scheduling Protocols
- Stored Sub transaction Execution Graph (SSEG)
- Avoid cascading abort
15Commit Protocols
- A Two-Phase Commit Protocol for Mobile Wireless
Environment, N. Nouali, A. Doucet, H. Drias, 2005 - IF the traditional 2PC is executed in mobile
environment, disconnections will increase the
number of, may be wrong, abortion decisions of
transaction because if a FH tries to communicate
with it a disconnected MH this will cause a
failure - Disconnections are not exceptions but rather are
part of the normal mode of operation, so they
should not be treated as failures
16Commit Protocols (Cont.)
- The case of mobile client and fixed servers,
Fixed Coordinator - To mitigate the unforeseeable breakdowns, the
client must force-write the identity and location
information of the coordinator (commit-BS) just
before sending the commit-request - Only one force-write is needed to record the
coordinator information during the entire
execution of Atomic commitment Protocol (ACP)
17Commit Protocols (Cont.)
- The case of mobile client and mobile servers
- The participant-agent is responsible of
transmitting the result to the participant at
reconnection time and also of keeping logs and
eventually recovering in the case of failure - When participant registers to a new BS, the
participant MH (or mobile participant) informs
its participant-agent about its new location - Workload is shifted to the fixed part of the
network thus preserving processing power and
communication resources and minimizing traffic
cost over the wireless links
18Commit Protocols (Cont.)
- Reducing the Latency of Non-Blocking Commitment
using Optimism and Replication, R.J.Peris,
M.P.Martinez,G.Alonso, S.Arevalo, 2001 - 2Phase Commit Blocking
- 3Phase Commit
- Sending too much messages
- Low performance
- Flushing log records adds to the overall latency
as messages cannot be sent or responded to before
writing to the log - Delay is reduced by allowing sites to send
messages instead of flushing log records
19Commit Protocols (Cont.)
- Solution
- To use the main memory of a replicated group as
stable memory instead of a mirrored log with
careful writes - The group of the participating transaction
managers (the TM group) - A replicated group providing the commit service
that acts as coordinator (the CS group) - A participant does not wait to flush its log,
instead it uniformly multicasts its vote together
with its log entry
20Commit Protocols (Cont.)
- If the message corresponds to the last vote, and
all were yes votes, the transaction is
optimistically committed, and the fact is
communicated to the TM group - The optimistic commit changes the locks held by
the transaction to opt-mode - Do not allow to the holding transaction to
commit until the transaction that released them,
definitively commits
21Commit Protocols (Cont.)
- The message is optimistically delivered right
away without waiting for the stabilization of the
message (e.g., waiting for the message to be
received by all the members of the group)
22Commit Protocols (Cont.)
- The PROMPT Real-Time Commit Protocol,
J.H.Harista, K.Ramamritham, R.Gupta,IEEE,1999 - Distributed commit processing can have
considerably more effect than distributed data
processing on real-time performance - PROMPT
- New commit protocol in real-time distributed
transactions - Preventing borrowers from continuing to execute
if their associated lenders had not been received
their decisions was addressed by incorporating an
additional bit and message that informed the
master about the borrowing state and the
completion of borrowing by a cohort
23Commit Protocols (Cont.)
- Features
- Controlled optimistic access to uncommitted data
- Reduce data inaccessibility and priority
inversion - Active abort
- Cohorts inform the master as soon as they decide
to abort locally - Silent kill
- Aborts due to deadline misses that occur before
the master has initiated the commit protocol are
implemented silently - Healthy-Lending
- Health factor deadline of transaction
- Using HF to decide whether transaction can lend
its data (best choice)
24Commit Protocols (Cont.)
- One-Phase Real-Time Commit Protocols, P.Saha,
1999 - Comparing one-phase protocols (e.g. EP) and
PROMPT, the best-performing two-phase real-time
commit protocol - For parallel distributed transaction, EP
outperforms PROMPT - For sequential distributed transaction, EP
perform rather poorly - For high workload cases EP performs better
- Future Works
- Addressing the security considerations in
Multi-Level Secure (MLS) distributed RTDBS - Combination of EP and PROMPT
25Concurrency Control in Distributed Databases
- Distributed Concurrency Control Performance A
Study of Algorithms, Distribution, and
Replication - Michael J. Carey Miron Livny
- It express Distributed Concurrency Control
Algorithms and evaluate their performance in some
of conditions. At the start, It describe
Concurrency Control classic algorithms such as
2PL, Wound-Wait, Basic Timestamp ordering and
discussed on the structure of distributed
concurrency control algorithm. Then it suggest a
basic model for DDB and has some experiments with
those algorithm and evaluate the model.
26Concurrency Control in Distributed Databases
(Cont.)
- Concurrency Control in Distributed Database
Systems - PHILIP A. BERNSTEIN NATHAN GOODMAN
- It explain mythological proofs for Distributed
Concurrency Control Algorithms and theoretically
evaluate them. It define serializablity in DDB
and define a formal language to formulate
transactions in DDB. Then it express each of
classic algorithms in his formal language and
prove their correctness and completeness.
27Concurrency Control in Distributed Databases
(Cont.)
- Concurrency Control in Distributed
Object-Oriented Database Systems - Kjetil Nørvag Olav Sandsta and Kjell
Bratbergsengen - The simulation results in this paper is a
comparison of performance and response times for
two concurrency control algorithms, timestamp
ordering and two-phase locking. The simulations
have been run with different number of nodes,
network types, data declustering and workloads.
The results show that for a mix of small and long
transactions, the throughput is significantly
higher for a system with a timestamp ordering
scheduler than for a system with a two-phase
locking scheduler. - Implementing Atomic Actions on Decentralized Data
- DAVID P. REED
- Its a general survey on Concurrency Control in
DDB and describe classic method.
28Concurrency Control in Distributed Databases
(Cont.)
- Dynamic Voting Algorithms for Maintaining the
Consistency of a Replicated Database - SUSHIL JAJODIA DAVID MUTCHLER
- The best known pessimistic algorithm, voting, is
a static algorithm, meaning that all potential
distinguished partitions can be listed in
advance. It presents a dynamic extension of
voting called dynamic voting. This algorithm
permits updates in a partition provided it
contains more than half of the up-to-date copies
of the replicated file. It also presents an
extension of dynamic voting called dynamic voting
with linearly ordered copies (abbreviated as
dynamic-linear). These algorithms are dynamic
because the order in which past distinguished
partitions were created plays a role in the
selection of the next distinguished partition.
29Concurrency Control in Distributed Databases
(Cont.)
- Deadlock Detection in Distributed Databases
- EDGAR KNAPP
- This paper is concerned only with the aspect of
deadlock detection. Recent developments in the
area of distributed deadlock detection algorithms
are surveyed, with a special emphasis on their
relation to distributed DBSs. The paper
introduces a uniform framework for the discussion
of these algorithms. The abstraction achieved
this way allows us to talk about the algorithms
in terms of the underlying theoretical concepts,
instead of just giving a phenomenon-logical
description of the workings of the algorithms.
30Concurrency Control in Distributed Databases
(Cont.)
- MODELS OF A VERY LARGE DISTRIBUTED DATABASE
- Mark Blakey
- The best known pessimistic algorithm, voting, is
a static algorithm, meaning that all potential
distinguished partitions can be listed in
advance. It presents a dynamic extension of
voting called dynamic voting. This algorithm
permits updates in a partition provided it
contains more than half of the up-to-date copies
of the replicated file. It also presents an
extension of dynamic voting called dynamic voting
with linearly ordered copies (abbreviated as
dynamic-linear). These algorithms are dynamic
because the order in which past distinguished
partitions were created plays a role in the
selection of the next distinguished partition.
31Concurrency Control in Distributed Databases
(Cont.)
- Performance Study of a Centralized Concurrency
Control Algorithm for Distributed Database
Systems using SIMULA - K. Viswanathan Iyer L. M. Patnaik
- One objective of this paper is to elaborate the
simulation methodology using SIMULA. Detailed
studies have been carried out on a centralized CC
algorithm and its modified version. The results
compare well with a previously reported study on
these algorithms. Here, additional results
concerning the update intensiveness of
transactions and the degree of conflict are
obtained. The degree of conflict is
quantitatively measured and it is seen to be a
useful performance index. It seems that, It is
going to formulate the effectiveness of
Concurrency Control Algorithm and it focused on
the behavior of a class of performance index.
32Availability
- Maintaining Availability in Partitioned
Replicated Databases - AMR EL ABBADI SAM TOUEG
- It describes a new replica control protocol that
allows the accessing of data in spite of site
failures and network partitioning. It claims that
this protocol provides the database designer with
a large degree of flexibility in deciding the
degree of data availability, as well as the cost
of accessing data. - Providing High Availability Using Lazy
Replication - RIVKA LADIN BARBARA LISKOV SANJAY GHEMAWAT
- This paper describes a new technique that
supports causal order. An operation call is
executed at just one replica updating of other
replicas happens by lazy exchange of gossip
messageshence the name lazy replication. The
replicated service continues to provide service
in spite of node failures and network partitions.
33Distributed Query Processing
- Query Brokers for Distributed And Flexible Query
Evaluation - Tuyet-Trinh yu Christine Collet
- This paper provides an approach for designing
query processor of a DDB by using hierarchical
mediators and using query brokers which translate
a global query in DDB context to local queries - Query Decomposition, Optimization and Processing
in Multidatabase Systems - Cem Evrendilek Asuman Dogac
- This paper suggest an approach to decomposing
queries in a optimized manner. In this way, we
need to dynamically calculate cost of query
processing in every sites for all of the sub
queries and using this factors in calculating
minimum cost.
34Distributed Query Processing (Cont.)
- Dynamically Distributed Query Evaluation
- Trevor Jim Dan Suciu
- This paper provides an approach for evaluation of
queries over the web and a directory system
dynamically. It provides a language for
explaining information requirements over a multi
database system. It uses this language for
defining a DDB and its queries in a formal way.
So it suggest an algorithm for dynamic query
evaluation in DDB and by this logic it proves its
algorithms correctness.
35Distributed Query Processing (Cont.)
- Database Connectivity Using an Agent-Based
Mediator System - Larry M. Stephenes Michael N Huhns
- This paper provides an Agent-Based approach for
managing a DDB. It uses Agents as proactive
components which include KB about system and have
reaction to topology changes to manage query
processing and concurrency control , . It uses
KQML and a specific coordination strategy for
this system.
36Distributed Query Processing (Cont.)
- Optimizing Equijoin Queries In Distributed
Databases Where Relations Are t-lash Partitioned - DENNIS SHASHA TSONG-LI WANG
- It studies the optimization problem that arises
when the query processor must repartition the
relations and intermediate results participating
in a multi join query. Using estimates of the
sizes of intermediate relations, it shows (1)
optimum solutions for closed chain queries (2)
the NP-completeness of the optimization problem
for star, tree, and general graph queries and
(3) effective heuristics for these hard cases.
37Conclusion
- DDB is a mature topic and many model provided for
expressing its approaches such as concurrency
control, availability , - Various models using to prove correctness of its
approaches in concurrency controls , - It is faced with many of DB problems in a new
viewpoint because of its distribution - Legacy system and wrapper design to
38Conclusion (Cont.)
- High availability by increasing replication vs.
performance of transaction management and
concurrency control A trade off - DDB system Concurrency control open problems
- Improving recent approaches to increase
availability with performance - Improving distributed query evaluation over large
distributions
39Conclusion (Cont.)
- Tuple routing over a distribution and data
distribution with high performance and
performance evaluation factors in DDB - Using autonomous components to manage a DDB and
using AI in peer-to-peer DDB -
40Acknowledgement
41References
- Silbershots et al , Database System Concepts
4th edition , McGraw-Hill, 2002