Distributed Query Processing using different Semijoin operations.

1
Distributed Query Processing using different
Semijoin operations.


  • Presented by: Jamal Uddin Ahamed
  • Friday, March 12, 2004

2
Presentation Outline
  • 1. Overview.
  • 2. Semijoin operation.
  • 3. Different semijoin operations.
  • a. 2-way semijoin.
  • b. Hash-semijoin.
  • c. Domain-specific semijoin.
  • d. Composite semijoin.
  • 4. References.
  • 5. Questions and answers.

3
1.1 What is a distributed database system?
  • A distributed database system is characterized by
    the distribution of its system components:
    hardware, control and data. For this research, a
    distributed system is a collection of independent
    computers interconnected via point-to-point
    communication lines.

4
1.2 Node Characteristics
  • Each computer, known as a node in the
    network, has a processing capability and a
    data storage capability, and is capable
    of operating autonomously in the system.
  • Each node contains a version of a
    distributed DBMS.

5
1.3 What is distributed query processing?
  • The retrieval of data from different sites in a
    network is known as distributed query processing.

6
1.4 Phases of distributed query processing with a
semijoin operator.
  • Initial local processing (selections and
    projections are processed at each site).
  • Semijoin processing (a semijoin program is
    derived from the remaining join operations and
    executed to reduce the size of the relations in a
    cost-effective way).
  • Final processing (all relations involved are
    transmitted to the final site and all joins are
    performed there).

7
2.1 Semijoin
  • A semijoin from Ri to Rj on attribute A can be
    denoted as Rj ⋉ Ri. It is used to reduce the
    data transmission cost.
  • Computing steps:
  • Project Ri on attribute A (giving Ri[A]) and ship
    this projection (a semijoin projection) from the
    site of Ri to the site of Rj.
  • Reduce Rj to Rj' by eliminating the tuples whose
    attribute A does not match any value in Ri[A].
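
The two computing steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: relations are modelled as lists of dicts, the two "sites" are plain local variables, and the sample data and function names are hypothetical, not taken from the slides.

```python
def semijoin_projection(ri, attr):
    """First step: project Ri on the join attribute (this is what gets shipped)."""
    return {row[attr] for row in ri}


def semijoin_reduce(rj, attr, shipped_values):
    """Second step: keep only the tuples of Rj whose value appears in the projection."""
    return [row for row in rj if row[attr] in shipped_values]


# Hypothetical sample data.
r_i = [{"A": 1, "x": "p"}, {"A": 2, "x": "q"}, {"A": 2, "x": "r"}]
r_j = [{"A": 2, "y": 10}, {"A": 3, "y": 20}, {"A": 4, "y": 30}]

shipped = semijoin_projection(r_i, "A")           # duplicates vanish: {1, 2}
r_j_reduced = semijoin_reduce(r_j, "A", shipped)
print(r_j_reduced)                                # [{'A': 2, 'y': 10}]
```

Only the two distinct values of A travel over the network, and Rj shrinks from three tuples to one before it is sent on.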

8
2.2 Example
  • Example (semijoin s: R1 -A-> R2)

Benefit B(s) = 6 - 2 = 4
Cost C(s) = 3
Cost-effectiveness D(s) = B(s) - C(s) = 1 > 0
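
The arithmetic on this slide can be checked with a small helper. Reading 6 and 2 as the size of the reduced relation before and after the semijoin is an assumption (the slide's figure did not survive), and the function name is hypothetical.

```python
def cost_effectiveness(size_before, size_after, projection_size):
    """Return (benefit, cost, net gain) of one semijoin, all in the same size unit."""
    benefit = size_before - size_after   # data that no longer has to be shipped
    cost = projection_size               # data shipped to perform the semijoin
    return benefit, cost, benefit - cost


# Numbers from the slide: size drops from 6 to 2, the projection costs 3.
b, c, d = cost_effectiveness(6, 2, 3)
print(b, c, d, d > 0)   # 4 3 1 True
```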
9
3.a.1 Definition of 2-way semijoin.
  • 2-way semijoin: an extended version of the
    semijoin.
  • Definition: a 2-way semijoin (t) of Ri and Rj on
    attribute A can be denoted as
  • Ri <-A-> Rj = {Ri -A-> Rj, Rj -A-> Ri}
  • So t reduces Ri and Rj to Ri' and Rj',
    respectively.

10
3.a.2 Properties of 2-way semijoin.
  • Computing steps:
  • Send Ri[A] from site i to site j.
  • Reduce Rj to Rj' by eliminating the tuples whose
    attribute A does not match any value in Ri[A] and,
    at the same time, partition Ri[A] into Ri[A]m
    (values matching some value in Rj[A]) and
    Ri[A]nm (= Ri[A] - Ri[A]m).
  • Send the smaller of Ri[A]m and Ri[A]nm back to
    site i.
  • Reduce Ri to Ri' using Ri[A]m (or Ri[A]nm).
  • Evaluation:
  • Benefit: B(t) = S(Ri) - S(Ri') + S(Rj) - S(Rj')
  • Cost: C(t) = S(Ri[A]) + min(S(Ri[A]m), S(Ri[A]nm))
  • If the benefit exceeds the cost
    (D(t) = B(t) - C(t) > 0), t is called a
    cost-effective 2-way semijoin.
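
A minimal Python sketch of these steps and of the benefit/cost evaluation follows. It assumes sizes S(·) are measured as tuple or value counts, models relations as lists of dicts, and uses hypothetical sample data; the function name is illustrative only.

```python
def two_way_semijoin(ri, rj, attr):
    """Return (Ri', Rj', benefit, cost) with sizes counted in tuples/values."""
    # Send Ri[A] from site i to site j.
    ri_a = {row[attr] for row in ri}
    # At site j: reduce Rj and split Ri[A] into matching / non-matching values.
    rj_a = {row[attr] for row in rj}
    rj_reduced = [row for row in rj if row[attr] in ri_a]
    matched = ri_a & rj_a
    unmatched = ri_a - matched
    # Send the smaller partition back to site i.
    send_matched = len(matched) <= len(unmatched)
    sent_back = matched if send_matched else unmatched
    # At site i: keep matching tuples (or drop the non-matching ones).
    if send_matched:
        ri_reduced = [row for row in ri if row[attr] in sent_back]
    else:
        ri_reduced = [row for row in ri if row[attr] not in sent_back]
    benefit = (len(ri) - len(ri_reduced)) + (len(rj) - len(rj_reduced))
    cost = len(ri_a) + len(sent_back)
    return ri_reduced, rj_reduced, benefit, cost


# Hypothetical sample data.
r_i = [{"A": 1, "x": "p"}, {"A": 2, "x": "q"}, {"A": 2, "x": "r"}, {"A": 5, "x": "s"}]
r_j = [{"A": 2}, {"A": 3}, {"A": 4}, {"A": 6}, {"A": 7}]

ri_red, rj_red, benefit, cost = two_way_semijoin(r_i, r_j, "A")
print(benefit, cost, benefit - cost > 0)   # 6 4 True
```

Either partition is enough to reduce Ri, because the receiving site already knows Ri[A]; sending the smaller one keeps the return trip cheap.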

11
3.a.3 2-way semijoin example.
12
3.a.4 Semijoin vs. 2-way semijoin.
  • The 2-way semijoin is an extended version of the
    semijoin.
  • It has more reduction power than the semijoin,
    since it reduces both Ri and Rj.
  • Its reduction effects propagate further than
    those of the semijoin.

13
3.b.1 Hash-semijoin operator.
  • Main idea: use a search filter, which represents
    the semijoin projection as a small bit array.
  • Definition:
  • The hash-semijoin of Ri and Rj is denoted Rj ⋉
    Ri. It is computed as follows:
  • The semijoin projection of Ri is represented as a
    bit array.
  • This bit array is shipped to the site of Rj.
  • Finally, the tuples of Rj are screened by the
    search filter.
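
The three steps can be sketched as follows. The hash function here (integer value modulo the array length) and the array size of 8 bits are assumptions chosen for illustration, as are the sample data and function names; the slides do not specify them.

```python
def slot(value, n_bits):
    """Hypothetical hash function: integer join value modulo the array length."""
    return value % n_bits


def build_bit_array(ri, attr, n_bits):
    """Represent the semijoin projection of Ri as a bit array (the search filter)."""
    bits = [0] * n_bits
    for row in ri:
        bits[slot(row[attr], n_bits)] = 1
    return bits


def screen(rj, attr, bits):
    """Keep the tuples of Rj whose hashed join value hits a set bit."""
    return [row for row in rj if bits[slot(row[attr], len(bits))]]


# Hypothetical sample data.
r_i = [{"A": 2}, {"A": 5}, {"A": 11}]
r_j = [{"A": 2, "y": 10}, {"A": 3, "y": 20}, {"A": 4, "y": 30}, {"A": 6, "y": 40}]

bits = build_bit_array(r_i, "A", 8)      # bits 2, 3 and 5 are set
r_j_screened = screen(r_j, "A", bits)
print(r_j_screened)   # [{'A': 2, 'y': 10}, {'A': 3, 'y': 20}]
```

In this sketch the tuple with A = 3 passes the filter only because 11 collides with 3 modulo 8. A filter of this kind never drops a matching tuple, but it can let non-matching ones through; that is the price of shipping 8 bits instead of the projection itself.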

14
3.b.2 Hash-semijoin example.
[Figure: hash-semijoin example on relations R1 and R2]
15
3.b.3 Semijoin vs. hash-semijoin.
  • Advantages:
  • The hash-semijoin is more cost-effective than the
    semijoin.
  • The search filter in the hash-semijoin achieves
    considerable savings in the cost of a semijoin
    operation.
  • Limitations:
  • It only works on an execution tree.
  • Its effectiveness is tightly bound to the hash
    functions used.

16
3.c.1 What is a horizontally partitioned database?
  • A distributed database system is called
    horizontally partitioned (or fragmented) if its
    relations can be split horizontally into several
    disjoint sets of tuples, which are called
    horizontal fragments.

17
3.c.2 Horizontally partitioned database
system (example).

EMP
E-no  E-name   D-no
101   johnson  01
103   jordan   03
105   erving   01
109   jabbar   12
110   sampson  14
141   chang    16

EMP1 (1 ≤ D-no ≤ 10)
E-no  E-name   D-no
101   johnson  01
103   jordan   03
105   erving   01

EMP2 (11 ≤ D-no ≤ 20)
E-no  E-name   D-no
109   jabbar   12
110   sampson  14
141   chang    16

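
As a small illustration, the EMP relation from this slide can be split by D-no range in Python. The tuple layout, the `fragment` helper and the storage of D-no as an integer (01 becomes 1) are assumptions made for the sketch.

```python
# EMP tuples from the slide: (E-no, E-name, D-no), with D-no stored as an int.
emp = [
    (101, "johnson", 1), (103, "jordan", 3), (105, "erving", 1),
    (109, "jabbar", 12), (110, "sampson", 14), (141, "chang", 16),
]


def fragment(relation, low, high, col=2):
    """Horizontal fragment: the tuples whose value in column `col` lies in [low, high]."""
    return [t for t in relation if low <= t[col] <= high]


emp1 = fragment(emp, 1, 10)    # 1 <= D-no <= 10
emp2 = fragment(emp, 11, 20)   # 11 <= D-no <= 20

print([t[0] for t in emp1])   # [101, 103, 105]
print([t[0] for t in emp2])   # [109, 110, 141]
```

The fragments are disjoint, and their union gives back the full EMP relation.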
18
3.c.3 Horizontally partitioned database
system (properties).
  • A fragmented relation Ri can be reconstructed by
    taking the union of all its fragments:
  • Ri = ∪k Rik
  • Join and union can be interchanged for fragmented
    relations (the join distributes over the union):
    a join between two fragmented relations
    R1 and R2 is equivalent to the union of the joins
    between each fragment of R1 and each fragment of
    R2.
  • Mathematically:
  • (∪k R1k) ⋈A=B (∪m R2m) = ∪k,m (R1k ⋈A=B R2m)
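
The identity can be checked on a toy instance. In this sketch relations are sets of Python tuples, the join attribute is assumed to be the first column of each relation, and the fragments are hypothetical data.

```python
def join(r1, r2):
    """Equijoin on the first column of each relation (A = B)."""
    return {t1 + t2 for t1 in r1 for t2 in r2 if t1[0] == t2[0]}


# Hypothetical fragments of R1 and R2.
r1_frags = [{(1, "a"), (2, "b")}, {(3, "c")}]
r2_frags = [{(2, "x")}, {(3, "y"), (4, "z")}]

# Left-hand side: rebuild both relations by union, then join.
lhs = join(set().union(*r1_frags), set().union(*r2_frags))

# Right-hand side: join every pair of fragments, then take the union.
rhs = set()
for f1 in r1_frags:
    for f2 in r2_frags:
        rhs |= join(f1, f2)

print(lhs == rhs)   # True
```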

19
3.c.4 Why can't we use a regular semijoin between
two fragments to reduce the size of
fragments?
  • Consider a join Ri ⋈A=B Rj between two
    fragmented relations Ri and Rj. We want to
    reduce the size of Rik, a fragment of Ri, by a
    semijoin before it is sent to the final
    processing site. We cannot perform the semijoin
  • Rik ⋉A=B Rjm
  • between Rik and a single fragment Rjm of Rj
    without considering the other fragments of Rj,
    because the join operation dictates that no
    tuple of a relation can be eliminated before it
    has been compared with all tuples of the other
    joining relation that may contribute to the join.

20
Example

EMP1 (1 ≤ D-no ≤ 10)
E-no  E-name   D-no
101   johnson  01
103   jordan   03
135   erving   01

sal (101 ≤ E-no ≤ 105)
E-no  Sal   D-no
101   1000  12
102   2000  03
105   3000  11

D-no
01
03
12
14
16

EMP2 (11 ≤ D-no ≤ 20)
E-no  E-name   D-no
109   jabbar   12
110   sampson  14
141   chang    16

sal (105 ≤ E-no ≤ 110)
E-no  Sal   D-no
107   1000  12
107   2000  03
110   3000  11

21
3.c.5 Definition of Domain-Specific Semijoin.
  • The domain-specific semijoin operation, Rik (
    A=B Rjm, where A and B are the joining
    attributes and Rik, Rjm are two fragments of the
    joining relations Ri and Rj respectively, is
    defined as follows:
  • Rik ( A=B Rjm = { r | r ∈ Rik, r.A ∈ Rjm[B] ∪
    (Dom[Rj.B] - Dom[Rjm.B]) }
  • Here Rik is the restricted fragment and Rjm
    is the restricting fragment. We also call Ri
    the restricted relation and Rj the restricting
    relation of the domain-specific semijoin.
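
The definition can be turned into a short Python sketch. The data below are hypothetical (loosely modelled on fragments split by D-no range), the function and variable names are illustrative, and domains are passed in explicitly as ranges of integers.

```python
def domain_specific_semijoin(rik, a, rjm, b, dom_rj_b, dom_rjm_b):
    """Keep r in Rik if r.A is in Rjm[B] or lies outside the domain covered by Rjm."""
    allowed = {row[b] for row in rjm} | (set(dom_rj_b) - set(dom_rjm_b))
    return [row for row in rik if row[a] in allowed]


# Hypothetical restricting fragment EMP1, defined on 1 <= D-no <= 10.
emp1 = [{"E-no": 101, "D-no": 1}, {"E-no": 103, "D-no": 3}]
dom_emp = range(1, 21)     # D-no domain of the whole relation
dom_emp1 = range(1, 11)    # D-no domain covered by fragment EMP1

# Hypothetical restricted fragment.
sal1 = [
    {"E-no": 101, "D-no": 12},   # outside EMP1's domain: may match another fragment
    {"E-no": 102, "D-no": 3},    # matches a tuple of EMP1
    {"E-no": 104, "D-no": 5},    # inside EMP1's domain, no match: safe to drop
]

reduced = domain_specific_semijoin(sal1, "D-no", emp1, "D-no", dom_emp, dom_emp1)
regular = [row for row in sal1 if row["D-no"] in {r["D-no"] for r in emp1}]

print([r["E-no"] for r in reduced])   # [101, 102]
print([r["E-no"] for r in regular])   # [102]
```

A regular semijoin against EMP1 alone would also discard the tuple with D-no 12, even though it could still join with a tuple in another fragment. The domain-specific version only removes tuples that EMP1 is entitled to rule out.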

22
3.d.1 Definition of Composite Semijoin.
  • Composite semijoin: a semijoin in which the
    projection and the transmission involve multiple
    columns (attributes).

23
3.d.2 Example of Composite Semijoin.

Relations R2 and R1:

A1  A2  Non-join Attr
1   aa  -
1   bb  -
2   cc  -
3   cc  -

A1  A2  Non-join Attr
1   cc  -
1   aa  -
2   bb  -
3   bb  -

Result of the composite semijoin on (A1, A2):

A1  A2  Non-join Attr
1   aa  -

No False loop!!
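
The effect can be reproduced with the two four-tuple tables from this slide. Which table is R1 and which is R2 is not recoverable from the transcript, so the sketch simply calls them `first` and `second`; the function names are hypothetical.

```python
# (A1, A2) values of the two four-tuple tables on the slide.
first = [(1, "aa"), (1, "bb"), (2, "cc"), (3, "cc")]
second = [(1, "cc"), (1, "aa"), (2, "bb"), (3, "bb")]


def single_column_semijoin(target, source, col):
    """Ordinary semijoin: ship one column of `source`, filter `target` on it."""
    values = {t[col] for t in source}
    return [t for t in target if t[col] in values]


def composite_semijoin(target, source, cols):
    """Composite semijoin: ship the combination of columns, filter on the combination."""
    keys = {tuple(t[c] for c in cols) for t in source}
    return [t for t in target if tuple(t[c] for c in cols) in keys]


# Two successive single-column semijoins remove nothing here...
stepwise = single_column_semijoin(second, first, 0)
stepwise = single_column_semijoin(stepwise, first, 1)
# ...while one composite semijoin on (A1, A2) leaves a single tuple.
composite = composite_semijoin(second, first, (0, 1))

print(len(stepwise), composite)   # 4 [(1, 'aa')]
```

Every A1 value and every A2 value of one table occurs somewhere in the other, so column-by-column filtering is powerless. Only the pair (1, aa) occurs in both, which is what the composite projection detects.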
24
3.d.3 Semijoin vs. composite semijoin.
  • Using composite semijoins in a query processing
    algorithm is likely to result in a substantial
    reduction in response time (RT).
  • Composite semijoins should not always be used: if
    one results in a greater RT, do not use it.
  • A strategy that considers composite semijoins is
    at least as good as one that does not.

25
References
  1. Using 2-way semijoin in distributed query
    processing. By Hyunchul Kang and Nick
    Roussopoulos.
  2. Improving distributed query processing by
    hash-semijoins. By Judy Tseng and Arbee Chen.
  3. Domain Specific Semijoin: A new operation for
    distributed query processing. By Jason Chen and
    Victor Li.
  4. Composite Semijoin in distributed query
    processing. By William Perrizo and Chun Chen.

26
Comments? Questions?
Thank You!