1
Chapter 13: Replica Management in Grids
13.1 Motivation
13.2 Replica Architecture
13.3 Grid Replica Access Protocol (GRAP)
13.4 Handling Multiple Partitioning
13.5 Summary
13.6 Bibliographical Notes
13.7 Exercises
2
Replica Management in Grids
  • Grid databases operate in data-intensive, complex distributed applications
  • Data is replicated among a few sites for quick access
  • Various replica management protocols, e.g. ROWA, primary copy and quorum-based protocols, have been proposed for distributed databases
  • Replica control protocols must manage replicated data properly to ensure the consistency of the replicated copies of the data
  • This chapter deals with write transactions in a replicated Grid environment

D. Taniar, C.H.C. Leung, W. Rahayu, S. Goel, High-Performance Parallel Database Processing and Grid Databases, John Wiley & Sons, 2008
3
13.1 Motivation
  • Example 1
  • Consider a group of researchers gathering data to study global warming
  • The group is geographically distributed
  • Data is collected locally, but to run the experiments each site also needs access to data gathered at the other sites
  • Considering the huge amount of data gathered, the databases are replicated at the participating sites for performance reasons
  • If any site runs an experiment, the result must be propagated to all participants in a synchronous manner. If the results of the global warming studies are not strictly synchronized (i.e. 1SR) between sites, other database sites may read incorrect values and take wrong input for their experiments, thus producing undesirable and unreliable results
  • Example 2
  • Any sort of collaborative computing (e.g. a collaborative simulation or a collaborative optimisation process) needs access to up-to-date data
  • If a distributed optimisation process does not have access to the latest data, it may waste computation through repeated iterations of the optimisation process, or in some cases even produce incorrect results
  • Thus, applications working in a collaborative computing environment need synchronous, strict consistency between distributed database sites. Strict consistency can be achieved by using 1SR

4
13.2 Replica Architecture
  • High Level Replica Management Architecture

5
13.2 Replica Architecture (Contd)
  • Most of the research in replica control has focused on read-only queries
  • For read-only workloads, the lower three layers (data transfer protocol, replica catalogue and replica manager) would be sufficient for replica management
  • These three services alone, however, are not sufficient to maintain the correctness of data in the presence of write transactions
  • Note: the terms synchronization and replica control are used interchangeably

6
13.2 Replica Architecture (Contd)
  • Considering the distributed nature of Grids, quorum-based replica control protocols are the most suitable in this environment
  • Issues with implementing traditional replica control in Grids:

7
13.2 Replica Architecture (Contd)
  • Say database DB1 is replicated at three sites: site 1, site 2 and site 3
  • Say the network is partitioned with site 1 in one partition and sites 2 and 3 in the other (partition P1 in the figure)
  • A transaction Ti that modifies an object in DB1 is submitted at site 3
  • A write quorum of 2 can be obtained from sites 2 and 3
  • The updated data object must also be written at 2 sites to satisfy the write quorum
  • Say both sites initially decide to commit, and hence the global commit decision is made. But due to local constraints, site 2 then decides to abort Ti's sub-transaction. This unwanted scenario could have been detected if a global DBMS were present, but it cannot be detected in a Grid DBMS because of the autonomy of sites

8
13.2 Replica Architecture (Contd)
  • This leaves site 2 with stale data
  • Now, say that partition P1 is repaired and partition P2 occurs, with sites 1 and 2 in one partition and site 3 in the other
  • A transaction Tj arrives at site 1 and reads the same data object
  • A read quorum can be obtained from sites 1 and 2
  • Unfortunately, both replicas hold a stale copy of the data
  • Thus we see that site autonomy can lead to an inconsistent database state (a minimal sketch of this failure is given below)
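  • To make the failure concrete, the following is a minimal sketch, not taken from the book, of a plain quorum read over replicas that carry only a value and a version number and no Grid-level metadata; the site numbers, data layout and function name are illustrative assumptions. Site 2 voted to commit during P1 but aborted its sub-transaction locally, so its copy is stale, yet during P2 a read quorum of sites 1 and 2 still succeeds and returns the stale value

```python
# Hypothetical replica state after Ti: only site 3 holds the new version.
replicas = {
    1: {"value": "old", "version": 0},   # site 1: isolated during P1, never updated
    2: {"value": "old", "version": 0},   # site 2: voted commit, then aborted locally
    3: {"value": "new", "version": 1},   # site 3: originator, committed Ti
}

def naive_quorum_read(available_sites, qr=2):
    """Plain quorum read: pick QR reachable replicas, return the highest version seen."""
    if len(available_sites) < qr:
        raise RuntimeError("read quorum not obtained")
    chosen = available_sites[:qr]
    newest = max((replicas[s] for s in chosen), key=lambda r: r["version"])
    return newest["value"]

# Partition P2: only sites 1 and 2 are reachable when Tj arrives.
print(naive_quorum_read([1, 2]))   # -> "old": stale data despite a valid read quorum
```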

9
13.3 Grid Replica Access Protocol (GRAP)
  • The problem discussed earlier cannot be solved at the individual site level; it needs input from the middleware
  • The metadata service of the middleware is used
  • The metadata service stores information about the physical database sites connected to the Grid
  • It also stores the mapping of logical to physical databases
  • A pointer is added to the metadata service that points to the latest replica of a data item
  • The pointer is of the form timestamp.site_id (TS.SID)
  • The timestamp identifies the latest copy and the site_id points to the site that stores this replica
  • At least one site must have the latest copy
  • The Grid Replica Access Protocol (GRAP) ensures consistency of data in an autonomous and heterogeneous Grid database environment (a sketch of the metadata structure is given below)
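  • One way to picture this middleware state is sketched below; the class and attribute names are assumptions rather than the book's API. Each replica of a data item gets its own TS.SID entry at the metadata service, kept alongside the logical-to-physical mapping

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class TsSid:
    ts: int = 0    # Grid-level timestamp of the last quorum write this replica took part in
    sid: int = 0   # site that actually holds that latest version (itself, or the originator)

@dataclass
class MetadataService:
    # logical data item -> physical database sites holding a replica
    mapping: Dict[str, List[int]] = field(default_factory=dict)
    # (data item, replica site) -> TS.SID entry
    entries: Dict[Tuple[str, int], TsSid] = field(default_factory=dict)

    def entry(self, item: str, site: int) -> TsSid:
        return self.entries.setdefault((item, site), TsSid())

    def max_ts(self, item: str, sites: List[int]) -> int:
        """Highest Grid-level timestamp among the given replica sites."""
        return max(self.entry(item, s).ts for s in sites)
```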

10
13.3 GRAP (Contd)
  • The following points are highlighted:
  • Part of a distributed transaction (at a local site) may decide to abort autonomously
  • TS.SID is updated only if the write quorum could be obtained
  • A local DB site is able to manage timestamps via its interface to the Grid
  • Read Transaction Operation for GRAP
  • GRAP is based on the quorum consensus protocol
  • QR is the read quorum and QW is the write quorum
  • Majority consensus is used, that is, both 2 × QW and QR + QW must be greater than Q, the total number of votes for the replica (a small check of these conditions is sketched below)
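  • A minimal check of the majority-consensus conditions just stated, assuming one vote per replica site so that Q is simply the number of replicas

```python
def valid_quorum_sizes(qr: int, qw: int, q_total: int) -> bool:
    """Majority-consensus requirement: QR + QW > Q and 2 * QW > Q."""
    return (qr + qw > q_total) and (2 * qw > q_total)

assert valid_quorum_sizes(qr=2, qw=2, q_total=3)      # e.g. 3 replicas with QR = QW = 2
assert not valid_quorum_sizes(qr=1, qw=2, q_total=4)  # read and write quorums need not intersect
```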

11
13.3 GRAP (Contd)
  • The following steps are executed for read transactions in GRAP (a minimal sketch of these steps is given after the list):
  • Step 1: If the read quorum QR cannot be collected for the read operation, the transaction must abort
  • Step 2: If QR can be collected, the transaction chooses, from the metadata service, the highest timestamp within the collected quorum
  • Step 3: The site IDs corresponding to that highest timestamp in QR are then found from TS.SID
  • Step 4: Data is read from a local site whose timestamp at the Grid middleware matches the local timestamp of the replica
  • It is possible that none of the SIDs obtained in step (3) has a matching timestamp in step (4); in that case the read cannot be performed immediately, because all sites holding the latest copy of the replica may be down. Step (4) is important because some local replicas may have decided to abort after the global commit decision. Hence, to obtain the latest replica, the timestamp at the metadata service and the timestamp of the local copy must match. This will become clear when the algorithm for GRAP write transactions is discussed
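  • A minimal sketch of read steps 1-4 follows; the data layout is an assumption (per-site TS.SID entries in grid_meta, plus each replica's local timestamp and value), not the book's pseudocode

```python
def grap_read(reachable, grid_meta, local_ts, local_value, qr):
    # Step 1: abort if the read quorum QR cannot be collected.
    quorum = sorted(reachable)[:qr]
    if len(quorum) < qr:
        return None, "abort: read quorum not obtained"
    # Step 2: highest Grid-level timestamp within the collected quorum.
    max_ts = max(grid_meta[s]["ts"] for s in quorum)
    # Step 3: site IDs that the TS.SID entries associate with that timestamp.
    candidates = {grid_meta[s]["sid"] for s in quorum if grid_meta[s]["ts"] == max_ts}
    # Step 4: read only from a site whose local timestamp matches the Grid one,
    # i.e. a site that really committed the latest update.
    for s in candidates:
        if s in reachable and local_ts[s] == max_ts:
            return local_value[s], "ok"
    return None, "wait/abort: no reachable site holds the latest replica"
```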

12
13.3 GRAP (Contd)
  • GRAP algorithm for read transaction

13
13.3 GRAP (Contd)
  • GRAP algorithm for write transactions
  • The algorithm for GRAP write transactions proceeds as follows (see the sketch after step 4):
  • Step 1: A submitted transaction tries to collect the write quorum (QW). If QW cannot be collected, the transaction aborts
  • Step 2: If QW is obtained and the site where the transaction was submitted (the originator) decides to commit, the transaction finds the maximum timestamp for that data item within QW at the metadata service (at the Grid middleware)
  • Step 3: The TS.SID for that replica is then updated in the metadata service to reflect the latest update of the data. TS is set to a new maximum, reflecting the latest replica of the data item, and SID is set to the site ID of the originator. The originator's local timestamp is also updated to match the metadata service's new timestamp

14
13.3 GRAP (Contd)
  • Step 4: The other sites in the quorum (the participants) must also be monitored for their final decisions, since due to autonomy restrictions the commitment of the coordinator does not imply the participants' commitment. The following two cases are possible:
  • Step 4A: If the participant decides to commit, TS.SID is updated in the normal way, i.e. TS is set to the maximum timestamp decided by the metadata service for the originator, and SID is set to the site ID of the corresponding participant. The timestamp is updated in both locations: at the Grid middleware's metadata service and at the local participating site's replica
  • Step 4B: If the participant decides to abort due to a local conflict, the abort must be handled so that the replica is not corrupted. In this case the timestamp (the TS of TS.SID) at the middleware's metadata service is still updated as usual, to reflect that the update of one of the replicas has taken place, but SID is updated to point to the originator site instead of the participant (which decided to abort). The local timestamp of the replica is not updated. This helps future read transactions avoid reading stale data (as discussed in step (4) of the read transaction algorithm, the metadata timestamp and the local replica's timestamp must match)
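  • A minimal sketch of write steps 1-4 follows; the data layout matches the read sketch and the decisions mapping (each site's autonomous commit/abort outcome) is an assumption, not the book's pseudocode

```python
def grap_write(originator, reachable, grid_meta, local_ts, decisions, qw):
    # Step 1: abort if the write quorum QW cannot be collected, or if the
    # originator itself does not commit.
    quorum = sorted(reachable)[:qw]   # reachable is assumed to include the originator
    if len(quorum) < qw or decisions.get(originator) != "commit":
        return "abort"
    # Steps 2-3: advance the Grid-level timestamp past the current maximum in
    # the quorum, point the originator's entry at itself and sync its local ts.
    new_ts = max(grid_meta[s]["ts"] for s in quorum) + 1
    grid_meta[originator] = {"ts": new_ts, "sid": originator}
    local_ts[originator] = new_ts
    # Step 4: handle each participant according to its own final decision.
    for s in quorum:
        if s == originator:
            continue
        if decisions.get(s) == "commit":
            # Step 4A: TS and SID point at the participant; its local ts is synced.
            grid_meta[s] = {"ts": new_ts, "sid": s}
            local_ts[s] = new_ts
        else:
            # Step 4B: TS still advances, but SID points at the originator and
            # the local timestamp is left unchanged, so later reads detect it.
            grid_meta[s] = {"ts": new_ts, "sid": originator}
    return "commit"
```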

15
13.3 GRAP (Contd)
  • Step (4B), with the help of the TS.SID pointer, allows the quorum-based system to operate correctly even if some of the replicas decide to abort
  • The quorum at the Grid level is still valid, because the timestamp at the metadata service has been updated to reflect the successful completion of the transaction at the Grid level
  • Thus the metadata entry of the site that had to abort its local sub-transaction points to the latest replica at the originator of the transaction, not to the participant site itself. This is why a site may have participated in the quorum and yet have no matching timestamp in step (4) of the read transaction. If a participant aborts its sub-transaction, its SID points to the site holding the latest replica, typically the originator. The transaction can complete successfully only if at least the originator commits. If the originator aborts, one participant cannot point to another participant, because the other participants may still abort later due to local conflicts

16
13.3 GRAP (Contd)
  • GRAP algorithm for write transaction

17
13.3 GRAP (Contd)
  • GRAP algorithm for write transaction (Contd)

18
13.3 GRAP (Contd)
  • Revisiting the example
  • The same scenario as in the previous example is used to show how GRAP avoids reading stale data

Replicated database with GRAP protocol
19
13.3 GRAP (Contd)
  • Say the timestamps of all replicas are 0 at the beginning and the site IDs are 1, 2, 3, etc.
  • A transaction Ti arrives at site 3 to write a data item. After the write quorum is obtained (step (1) of the GRAP write transaction), site 3 decides to commit but site 2 decides to abort their respective cohorts
  • Since the quorum was obtained, the timestamp at the Grid level, TS, is increased to reflect the latest replica of the data (step (2) of the GRAP write transaction)
  • Since site 3 has decided to commit, its local timestamp is also increased to match the Grid TS and its SID is set to 3 (its own site ID). This confirms that the cohort of the transaction at site 3 has committed (steps (3) and (4A) of the GRAP write transaction)

20
13.3 GRAP (Contd)
  • Thus the (TS.SID, local timestamp) for site 3 is (1.3, 1). This indicates that the maximum timestamp TS at the Grid middleware and the local timestamp are the same (i.e. 1), and hence the latest replica can be found at that site
  • But as site 2 decided to abort its part of the transaction, its local timestamp is not changed and its SID points to the originator of the transaction, i.e. site 3 (step (4B) of the GRAP write transaction)
  • Now, say P1 is repaired and partitioning P2 occurs
  • Tj arrives during P2. Tj can obtain the read quorum (QR), as sites 1 and 2 are available (step (1) of the GRAP read transaction)
  • The maximum timestamp at the Grid level is 1 and it belongs to site 2 (step (2) of the GRAP read transaction). But site 2 has stale data

21
13.3 GRAP (Contd)
  • GRAP avoids reading the stale data at site 2 and redirects the transaction to obtain a quorum that contains a site with the latest value of the data, as follows (a numeric walk-through is sketched after this list)
  • The pair (TS.SID, local timestamp) for site 2 is (1.3, 0). The maximum timestamp at the Grid middleware is found at site 2, which implies that site 2 participated in the quorum for the latest update of the data item. But since the TS at the middleware does not match the local timestamp, site 2 does not hold the latest replica of the data item
  • The SID at site 2 points to the site that contains the latest replica, i.e. site 3 (step (3) of the GRAP read transaction). Site 3 cannot be reached due to the network partitioning, hence the transaction must either wait or abort (depending on application semantics). Thus GRAP prevents Tj from reading stale data. Under a plain quorum protocol, since Tj had already obtained QR, it would have read the stale value of the replicated data
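  • The numeric walk-through below follows the values stated above; the dict layout and the initial SID of site 1 are assumptions

```python
# State after Ti: committed at site 3 (the originator), aborted locally at
# site 2, never seen at site 1 (it was isolated during P1).
state = {
    1: {"ts": 0, "sid": 1, "local_ts": 0},
    2: {"ts": 1, "sid": 3, "local_ts": 0},
    3: {"ts": 1, "sid": 3, "local_ts": 1},
}
# During P2 the read quorum is {1, 2}. The maximum Grid timestamp there is 1,
# found at site 2, but site 2's local timestamp is 0, so GRAP follows SID = 3.
quorum = [1, 2]
max_ts = max(state[s]["ts"] for s in quorum)
holders = {state[s]["sid"] for s in quorum if state[s]["ts"] == max_ts}
assert max_ts == 1 and holders == {3} and state[2]["local_ts"] != max_ts
# Site 3 lies in the other partition, so Tj must wait or abort instead of
# reading the stale copy at site 2.
```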

22
13.3 GRAP (Contd)
  • Correctness of GRAP
  • The correctness of the GRAP protocol can be established from the following lemmas and theorem:
  • Lemma 13.1: Two write transactions on any replica will be strictly ordered, write-write conflicts are avoided, and the write quorum will always contain the latest copy of the data item
  • Lemma 13.2: Any transaction Ti will always read the latest copy of a replica
  • Theorem 13.1: The Grid Replica Access Protocol (GRAP) produces 1-copy serializable (1SR) schedules

23
13.4 Handling Multiple Partitioning
  • Network partitioning is a phenomenon that prevents communication between two sets of sites in a distributed architecture
  • Considering the global nature of Grids, network failures may occur, leading to network partitioning
  • Network partitioning limits the execution of transactions in replicated databases because a quorum may not be obtainable
  • ROWA and ROWA-A protocols cannot handle network partitioning
  • The primary copy protocol can handle network partitioning only if the primary copy is in the partition
  • Quorum-based protocols handle network partitioning best, but only simple network partitioning (2 partitions)
  • Majority-consensus-based quorums cannot handle multiple network partitioning, because with multiple partitions the basic quorum rules, QR + QW > Q and 2 × QW > Q, cannot be satisfied; for example, if three replica sites split into three singleton partitions, no partition holds a majority of the votes

24
13.4 Handling Multiple Partitioning (Contd)
  • Contingency GRAP
  • To handle multiple partitioning, the concept of a contingency quorum is introduced
  • If network partitioning is detected and a normal quorum cannot be obtained, GRAP collects a contingency quorum (Contingency GRAP)
  • A partition must have at least one up-to-date copy of the data to serve the transaction

25
13.4 Handling Multiple Partitioning (Contd)
  • Read Transaction Operation for Contingency GRAP (a minimal sketch is given after the steps below)
  • Step 1: After the read transaction arrives at a site, it checks for the normal read quorum at the Grid middleware. If the normal read quorum is obtained, normal GRAP operation continues
  • Step 2: If the normal read quorum cannot be obtained, the transaction chooses the highest timestamp from the metadata service of the Grid middleware (similar to step (2) of normal GRAP)
  • Step 3: If the maximum timestamp at the metadata service does not match the timestamp of any local replica of the data item in the partition where the transaction originated, the transaction must either wait or abort. This indicates that the partition does not have the latest version of the data item
  • Step 4: If the timestamp of some local replica in that partition matches the timestamp at the metadata service, it is assured that the latest copy of the replica is in the partition, and that replica is read
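  • A minimal sketch of the Contingency GRAP read steps above; the data layout follows the earlier sketches and normal_read stands for the regular GRAP read routine, both of which are assumptions

```python
def contingency_read(partition_sites, grid_meta, local_ts, local_value, qr, normal_read):
    # Step 1: if a normal read quorum is available, normal GRAP continues.
    if len(partition_sites) >= qr:
        return normal_read(partition_sites, grid_meta, local_ts, local_value, qr)
    # Step 2: highest timestamp recorded at the metadata service for this item.
    max_ts = max(entry["ts"] for entry in grid_meta.values())
    # Steps 3-4: serve the read only if some replica in this partition holds
    # the latest version (its local timestamp matches the middleware's).
    for s in partition_sites:
        if local_ts[s] == max_ts:
            return local_value[s], "ok (contingency)"
    return None, "wait/abort: latest replica not in this partition"
```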

26
13.4 Handling Multiple Partitioning (Contd)
  • Write Transaction Operation for Contingency GRAP
  • Contingency GRAP allows the transaction to write fewer sites than required by the quorum and maintains a log to guarantee the consistency of the data
  • The following steps are followed (a minimal sketch is given after step 6):
  • Step 1: The transaction first tries to collect the normal write quorum; if it is obtained, normal GRAP continues. If a normal quorum cannot be obtained, Contingency GRAP starts
  • Step 2: Contingency GRAP chooses the highest timestamp from the metadata service and checks it against the sites in the partition. If the metadata service's timestamp does not match any timestamp in the partition, the write transaction has to abort or wait until the partition is repaired (an application-dependent decision). This implies that the latest copy of the data is not in the partition, and hence the transaction cannot write the data until the partition is fixed or a quorum is obtained

27
13.4 Handling Multiple Partitioning (Contd)
  • Step 3: If a matching timestamp between the metadata service and some site's local timestamp in the partition is found, the transaction can proceed with the update at the site where the two timestamps match, because it is assured that the latest version of the replica is in the partition. If the timestamps do not match but the SID points to a site in the same partition, the transaction can still be assured of updating the latest copy. The sites that are written/updated during Contingency GRAP are recorded in the log file
  • Step 4: If the originator site decides to commit the transaction, it updates the TS.SID (at the metadata service). TS is increased to a new maximum and SID points to the originator. The local timestamp of the originator site is also increased to match the TS at the Grid middleware
  • Step 5: The other replica sites in the partition (the participants) follow the same procedure if they decide to commit, i.e. SID is set to the respective participant and the local timestamp is set to match the new TS at the middleware. For any site that decides to locally abort the write transaction, SID points to the originator and the local timestamp is not increased
  • Step 6: The number and details of the sites participating in the contingency update are recorded in the log. This is an important step, because the number of sites being updated does not form a quorum. After the partitioning is repaired, the log is used to propagate the update to additional sites so that a quorum is formed. Once the quorum is formed, normal GRAP operation can resume
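  • A minimal sketch of the Contingency GRAP write steps above; the data layout follows the earlier sketches, and the handling of an originator that aborts is an assumption since the text only describes the commit case

```python
def contingency_write(originator, partition_sites, grid_meta, local_ts,
                      decisions, qw, log):
    # Step 1: with a full write quorum available, normal GRAP applies instead.
    if len(partition_sites) >= qw:
        return "use normal GRAP"
    if decisions.get(originator) != "commit":
        return "abort"   # assumption: nothing to propagate if the originator aborts
    # Steps 2-3: the latest copy must be provably inside this partition, either
    # via a matching local timestamp or via an SID pointing into the partition.
    max_ts = max(entry["ts"] for entry in grid_meta.values())
    current = [s for s in partition_sites
               if local_ts[s] == max_ts or grid_meta[s]["sid"] in partition_sites]
    if not current:
        return "wait/abort: latest replica not in this partition"
    # Steps 4-5: update the reachable replicas, advancing TS.SID as in normal
    # GRAP (SID points at the originator for any replica that aborts locally).
    new_ts = max_ts + 1
    updated = sorted(set(current) | {originator})
    for s in updated:
        committed = decisions.get(s) == "commit"
        grid_meta[s] = {"ts": new_ts, "sid": s if committed else originator}
        if committed:
            local_ts[s] = new_ts
    # Step 6: record the contingency participants; after the partition is
    # repaired, this log is used to push the update to enough additional sites
    # to form a proper quorum, after which normal GRAP resumes.
    log.append({"ts": new_ts, "sites": updated})
    return "commit (contingency)"
```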

28
13.4 Handling Multiple Partitioning (Contd)
  • Comparison of Replica Management Protocols

29
13.4 Handling Multiple Partitioning (Contd)
  • In the table, the properties of GRAP look very similar to those of the majority consensus protocol. The main difference between the two is that the majority consensus protocol can lead to an inconsistent database state due to the autonomy of Grid database sites, while GRAP is designed to support autonomous sites. Contingency GRAP can handle multiple network partitioning: while the network is partitioned (into multiple partitions), Contingency GRAP updates fewer sites than required by the quorum and keeps a record of them, and read operations can be performed in every partition that holds the latest replica copy of the data (verified by the middleware)

30
13.4 Handling Multiple Partitioning (Contd)
  • Correctness of Contingency GRAP
  • The correctness of the Contingency GRAP protocol can be established from the following lemmas and theorem:
  • Lemma 13.3: Two write operations are ordered in the presence of multiple partitioning
  • Lemma 13.4: Any transaction will always read the latest copy of the replica
  • Theorem 13.2: Contingency GRAP produces 1SR schedules

31
13.5 Summary
  • A replica synchronization protocol is studied in the presence of write transactions
  • A replica synchronization protocol for an autonomous Grid environment, GRAP, is introduced. Due to the autonomy of sites, participants can revert the global decision because of local conflicts. The GRAP protocol ensures that a transaction reads the latest version of the data item
  • The Contingency GRAP protocol is used to sustain multiple network partitioning. Given the global nature of Grids, it is important to address multiple network partitioning issues

32
Continue to Chapter 14