Distributed Database Design - PowerPoint PPT Presentation

About This Presentation
Title:

Distributed Database Design

Description:

B.Casey P3 Manager. J.Jones P4 Manager. Institut f r ... The query for increasing the budget of the CAD/CAM project by 10%, which would be specified as ... – PowerPoint PPT presentation

Number of Views:165
Avg rating:3.0/5.0
Slides: 33
Provided by: peterb4
Category:

less

Transcript and Presenter's Notes

Title: Distributed Database Design


1
Distributed Database Design Semantic Data
Control
  • Univ.-Prof. Dr. Peter Brezany
  • Institut für Scientific Computing
  • Universität Wien
  • Tel. 4277 39425
  • Sprechstunde Di, 13.00-14.00
  • LV-Portal www.par.univie.ac.at/brezany/teach/gck
    fk/06ss/300658.html

2
Distributed Database Design
3
A General Framework Considered in the Distributed
Database Design Problem
  • The organization of distributed systems can be
    investigated along 3 dimensions
  • Level of Sharing
  • (a) no sharing each application and its
    data execute at one site, and there
  • is no communication with any other
    program or access to any data file
  • at other sites (not very common today)

(b) data sharing all the programs are
replicated at all the sites, but data files
are not (c) data-plus-program sharing
a program at a given site can request a
service from another program at a se-
cond site, which, in turn, may have to
access a data file located at a 3. site.
4
A General Framework Considered in the Distributed
Database Design Problem (2)
2. Access patterns (along this dimension, the
relationship between distributed database
design and query processing is established)
(a) static (they do not change over time) the
design is easier for static
environments unfortunately, it is difficult to
find many real-life applications that
would be classified as static. (b) dynamic
3. Level of knowledge about the access pattern
(AP) behavior (a) no knowledge (a theoretical
possibility)
it is very difficult to design a distr.
database that can effectively cope
with this situation (b) complete knowledge
AP can be re- sonably (almost precicely)
predicted (c) partial information there
are deviations from the
predictions
5
Alternative Design Strategies
  • Top-Down approach
  • Bottom-Up approach
  • In most database designs, the two approaches
    may need to be applied to complement one another.
  • It is a joint function of the database,
    enterprise, and application system administrators
    (or of the administrator performing all three
    roles)

6
Top-Down Design Process
7
Bottom-Up Design Process
  • Top-down design is a suitable approach when a
    database system is being designed from scratch.
  • Commonly, a number of databases already exist,
    and the design task involves integrating them
    into one database. ? the bottom-up approach is
    suited for this type of environment.
  • The starting point is individual local conceptual
    schemas.
  • The process consists of integrating local schemas
    into the global conceptual schemas.
  • This type of environment exists primarily in the
    context of heterogeneous databases.

8
Distribution Design Issues - Fragmentation
  • Why fragment at all?
  • How should we fragment?
  • How much should we fragment?
  • Is there any way to test the correctness of
    decomposition?
  • How should we allocate?
  • What is necessary information for fragmentation
    and allocation?

9
Reasons for Fragmentation
  • The important issue is the appropriate unit of
    distribution a relation is not a suitable unit
    why?
  • Application views are usually subsets of
    relations, therefore the locality of accesses of
    applications is defined not on entire relations
    but on their subsets ? it is only natural to
    consider subsets of relations as distribution
    units.
  • If the applications that have views defined on a
    given relation reside at different sites, two
    alternatives can be followed, with the entire
    relation being the unit of distribution
  • The relation is not replicated ? this results
    into in an unnecessarily high volume of remote
    data accesses.
  • The relation is replicated at all or some of the
    sites where the applications reside ? problems
    in executing updates and may not be desirable if
    storage is limited.
  • The decomposition of a relation into fragments,
    each being treated as a unit, permits a large
    number of transactions to execute concurrently
    and intraquery concurrency is enabled.
  • Disadvantages of fragmentation
  • Applications whose views are defined on more than
    one fragment may suffer performance degradation.
  • Integrity checking (semantic data control)
    Attributes participating in a dependency may be
    decomposed into different fragments which might
    be allocated to different sites.

10
Fragmentation AlternativesHorizontal and
Vertical Fragmentation
Example Database
11
Example of Horizontal Fragmentation
12
Example of Vertical Fragmentation
Remark The fragmentation may, of course, be
nested. If the nestings are of
different types, one gets hybrid fragmentation.
13
Hybrid (Mixed) Fragmentation
Relation A
Fragment H1
Fragment H2V2
Fragment H2V1H1
Fragment H2V1H2
Fragment H2V1H3
14
Correctness Rules of Fragmentation
  • Completness. If a relation R is decomposed into
    fragments R1, R2, ..., Rn, each data item that
    can be found in R can also be found in one or
    more of Ris.
  • Reconstruction. If a relation R is decomposed
    into a set fragments FR R1, R2, ..., Rn, it
    should be possible to define a relational
    operator ? such that
  • R ?Ri, ?Ri ? FR
  • The operator ? will be different for for the
    different forms of fragmentation. The
    reconstructability of the relation from its
    fragments ensures that constraints defined on the
    data in the form of dependencies are preserved.
  • Disjointness. If a relation R is horizontally
    decomposed into fragments R1, R2, ..., Rn and
    data item di is in Rj, it is not in any other
    fragment Rk (k ? j).If relation R is vertically
    decomposed, its primary key attributes are
    typically repeated in all its fragments ?
    disjointness is defined only on the nonprimary
    key attributes of a relation.

15
Allokation
  • Sei F F1, ..., Fn eine Menge von Fragmenten,
    S S1, ..., Sm ein Netzwerk gegeben durch die
    Menge seiner Sites, und Q Q1, ..., Qp die
    Menge der relevanten Anwendungen.
  • Allokationsproblem
  • Was ist die optimale Zuordnung von F zu S bzgl.
    Q ?
  • Optimaltätskriterium
  • Minimalität der Kosten gegeben durch die
    Speicherkosten der Fi an den Sites Sj, der
    Anfragekosten für Fi an Site Sj, der
    Änderungskosten der Fi an allen Sites an den sie
    gespeichert sind, und die Kosten der
    Datenkommunikation.
  • Performanz im Sinne von Antwortzeiten oder
    Systemdurchsatz.

16
Sites
R1
R1,1
S1
R2
R2,1
R
R1,2
R3
R4
Globale Relation
S2
R2,2
Fragmente
Fragmente und ihre Allokation
Allokation an den Sites
R3,3
S3
R4,3
17
Zusätzliche Beispiele
Beispiel 1
horizontale Fragmentierung
18
Beispiel 2
abgeleitete horizontale Fragmentierung
19
Beispiel 3
vertikale Fragmentierung
20
Semantic Data Control
21
Main Objectives
  • View management
  • Security control
  • Semantic integrity control.
  • The above items can be defined by rules that
    the system automatically enforces. The violation
    of some such rule by a user program (a set of
    database operations) generally implies the
    rejection of the effects of that program.
  • The rules are defined by the DB administration.
  • The cost of enforcing semantic data control,
    which is high in terms of resource utilization in
    a centralized DBMS, can be prohibitive in a
    distributed environment.
  • The rules must be stored in a catalog (directory)
    ? efficient management in a distributed
    environment is important

22
View Management
  • In a rel. DB system a view is a virtual relation,
    defined as the result of a query on base
    relations (or real relations), but not
    materialized like a base relation, which is
    stored in the DB.
  • An external schema can be defined as a set of
    views and/or base relations.
  • Besides their use in external schemas, views are
    useful for ensuring data security in a simple way
    ? if users may only access the database through
    views, they cannot see or manipulate the hidden
    data, which are therefore secure.
  • In a distributed DBMS, a view can be derived from
    distributed relations, and the access to a view
    requires the execution of the distributed query
    corresponding to the view definition.
  • An important issue in a distributed DBMS is to
    make view materialization efficient.

23
Views in Centralized DBMSs
  • Example 1
  • The view of system analysts (SYSAN) derived from
    relation EMP (ENO,ENAME,TITLE), can be defined by
    the following SQL query
  • CREATE VIEW SYSAN(ENO, ENAME)
  • AS SELECT ENO, ENAME
  • FROM EMP
  • WHERE TITLE Syst. Anal.
  • The single effect is the storage of the view
    defintion in the catalog. No other information
    needs not be recorded. Therefore, the result to
    the query defining the view (i.e., a relation
    having the attributes ENO and ENAME for the
    system analysts as shown in the figure below) is
    not produced. However, the view SYSAN can be
    manipulated as a base relation.

ENO ENAME E2 M.Smith E5 B. Casey E8 J.Jones
24
Views in Centralized DBMSs (cont.)
  • Example 2
  • The query Find the names of all the system
    analysts with their
  • project number and responsibility(ies)
  • involving the view SYSAN and relation
    ASG(ENO,PNO,RESP,DUR) can be expressed as
  • SELECT ENAME, PNO, RESP
  • FROM SYSAN, ASG
  • WHERE SYSAN.ENO ASG.ENO
  • Mapping a query expresed on views into a query
    expressed on base relations can be done by query
    modification at compile time.
  • The preceding can be modified to
  • SELECT ENAME, PNO, RESP
  • FROM EMP, ASG
  • WHERE EMP.ENO ASG.ENO
  • AND TITLE Syst. Anal.

ENAME PNO RESP M.Smith P1 Analyst M.Smith
P2 Analyst B.Casey P3 Manager J.Jones
P4 Manager
25
Views in Distributed DBMSs
  • The definition of a view is similar as in
    centralized systems distinction, a view may be
    derived from fragmented relations stored at
    different sites.
  • View definitions can be centralized at one site,
    partially duplicate, or fully duplicated.
  • Processing a query expressed on views
    distributed query processor.
  • Views derived from distributed relations may be
    costly to evaluate.
  • Since in a given organization it is likely that
    many users access the same views, some proposals
    have been made to optimize view derivation.
  • An alternative solution is to avoid view
    derivation by maintaining actual versions of the
    views snapshots. A snapshot represents a
    particular state of the database and is therefore
    static, meaning that it does not reflect updates
    to base relations.

26
Data Security
  • Data security includes 2 aspects
  • Data protection. It is required to prevent
    unauthorized users from understanding the
    physical content of data.
  • Authorization control. It must guarantee that
    only authorized users perform operations they are
    allowed to perform on DB.
  • Checking whether a given triple (user, operation,
    object) can be allowed to proceed.
  • The introduction of a user (user name,
    password). The user name uniquely identifies the
    users of that name in the system, while the
    password, known only to the users of that name,
    authenticates the users.
  • Distributed authorization control
  • Additional complexity objects and subjects are
    distributed
  • remote user authentication
  • management of distributed authorization rules
  • handling of views and of user groups (DB
    administration is simplified)

27
Semantic Integrity Control
  • How to guarantee database consistency?
  • A DB state is said to be consistent if the DB
    satisfies a set of constraints, called semantic
    integrity constraints.
  • Mantaining a consistent DB requires various
    mechanisms concurrency control, reliability,
    protection, and semantic integrity control (SIC).
  • SIC ensures DB consistency by rejecting update
    programs which lead to inconsistent DB states, or
    by activating specific actions on the DB state,
    which compensate for the effects of the update
    programs.
  • SI constraints are rules that represent the
    knowledge about the properties of an application.
  • Structural constraints (e.g. unique key
    constraints in the rel. model)
  • Behavioral constraints regulate the application
    behavior.

28
Centralized Semantic Integrity Control
  • The language for expression and manipulating
    integrity assertions. Examples
  • Employee number in relation EMP cannot be null
    ENO NOT NULL in EMP
  • The pair (ENO,PNO) is the unique key in relation
    ASG (ENO,PNO) UNIQUE IN ASG
  • Enforcement mechanism that performs specific
    actions to enforce DB integrity and updates.
    Examples
  • The query for increasing the budget of the
    CAD/CAM project by 10, which would be specified
    as
  • UPDATE PROJ SET BUDGET BUDGET 1.1
  • WHERE PNAME CAD/CAM
  • will be transformed into the following query
    in order to enforce the domain constraint
  • CHECK ON PROJ (BUDGET gt 500000 AND
    BUDGET lt 1000000)
  • UPDATE PROJ SET BUDGET BUDGET 1.1
  • WHERE PNAME CAD/CAM
    AND NEW.BUDGET gt 500000
  • AND NEW.BUDGET lt
    1000000)

29
Distributed Semantic Integrity Control
  • Individual assertions. They refer only to tuples
    to be updated independently of the rest of the
    database. Example CHECK ON PROJ (BUDGET gt
    500000 AND BUDGET lt 1000000). The assertion
    definition is sent to all other sites that
    contain fragments of the relation involved in the
    assertion. The assertion must be compatible with
    the relation data at each site. Compatibility can
    be checked at two levels predicate and data.
    First, predicate compatibility is verified by
    comparing the assertion predicate with the
    fragment predicate. An asertion C is not
    compatible with a fragment predicate p if C is
    true implies that p is false, and is
    compatible with p otherwise. If noncompatibility
    is found at one of the sites, the assertion
    definition is globally rejected because tuples of
    that fragment do not satisfy the integration
    constraints. Second, if predicate compatibility
    has been found, the assertion is tested against
    the instance of the fragment. If it is not
    satisfied by that instance, the assertion is also
    globally rejected. If compatibility is found, the
    assertion is stored at each site.

30
Distributed Individual Assertions Example
Consider relation EMP, horizontally fragmented
across three sites using the predicates p1
0 lt ENO lt E3 p2 E3 lt ENO lt E6 p3
ENO gt E6 and the domain assertion C ENO lt
E4. Assertion is compatible with p1 (if C is
true, p1 is true) and p2 (if C is true, p2 is
not necessarily false), but is not with p3 (if C
is true, then p3 is false). Therefore, assertion
C should be globally rejected because the tuples
at site 3 cannot satisfy C, and thus relation
EMP does not satisfy C.
31
Set-Oriented Assertions
  • They include single-relation multivariable
    constraints such as functional dependency
    (Example 1) and multirelation multi-variable
    constraints such as foreign key constraints
    (Example 2)
  • Example 1 The employee number functionally
    determines the employee name.
  • ENO IN EMP DETERMINES ENAME
  • Example 2 The project number PNO in relation ASG
    is a foreign key matching the primary key PNO of
    relation PROJ. In other words, a project referred
    to in relation ASG must exist in relation PROJ.
  • PNO IN ASG REFERENCES PNO IN PROJ

32
Assertions Involving Aggregates
  • Example The total duration for all employees in
    the CAD/CAM project is less than 100.
  • CHECK ON gASG, jPROJ (SUM(g.DUR WHERE
    g.PNOj.PNO) lt 100
  • IF j.PNAMECAD/CAM)
  • These assertions are among the most costly to
    test because they require the calculations of the
    aggregate functions.
Write a Comment
User Comments (0)
About PowerShow.com