Thesis presentation - PowerPoint PPT Presentation

Slides: 33
Provided by: Witold6

1
Interoperability of a Scalable Distributed Data
Manager with an Object-relational DBMS
  • Thesis presentation
  • Yakham NDIAYE
  • November 13th, 2001

2
Objective
  • Develop techniques for the interoperability of a
    DBMS with an external SDDS file.
  • Examine the architectural issues that make such
    a coupling as efficient as possible.
  • Validate our technical choices through prototyping
    and experimental performance analysis.
  • Our approach lies at the crossroads of main-memory
    DBMSs, object-relational DBMSs with foreign
    functions, and distributed/parallel DBMSs.

3
Plan
  • Multicomputers
  • SDDSs
  • AMOS-II and DB2 DBMSs
  • Coupling SDDS and AMOS-II
  • Coupling SDDS and DB2
  • Experimental analysis 
  • Conclusion

4
Multicomputers
  • A collection of loosely coupled computers
  • Computers inter-connected by high-speed local
    area networks.
  • Cost/Performance
  • - potentially offers storage and processing
    capabilities rivaling a supercomputer at a
    fraction of the cost.
  • New architectural concepts
  • - offer applications the cumulative CPU and
    storage capabilities of a large number of
    inter-connected computers.

5
SDDS
  • New data structures designed specifically for
    multicomputers
  • Data are structured
  • - records with keys
  • - parallel scans and function shipping
  • Data are on servers
  • - waiting for access
  • Overflowing servers split into new servers
  • - appended to the file without informing the
    clients
  • Queries come from multiple autonomous clients
  • - Access initiators
  • - Not using any centralized directory for access
    computations
  • For more, see http://ceria.dauphine.fr
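
The behavior listed on this slide (servers that split on overflow, clients that are not told about splits, no centralized directory) can be illustrated with a small single-process simulation. This is only a sketch under assumptions: it assumes a range-partitioned file, which the slides do not state, and the names RangeFile, Client, image and corrections are hypothetical. Server-to-server forwarding is replaced by a direct lookup.

```python
import bisect

NEG_INF = float("-inf")


class RangeFile:
    """Toy range-partitioned file; each bucket stands in for one server."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lows = [NEG_INF]   # lower key bound of each bucket, kept sorted
        self.buckets = [{}]     # bucket i holds keys >= lows[i] and < lows[i + 1]

    def insert(self, guessed_low, key, record):
        """Store the record; return the current bounds if the client guessed wrong."""
        i = bisect.bisect_right(self.lows, key) - 1   # stands in for forwarding
        misdirected = self.lows[i] != guessed_low
        self.buckets[i][key] = record
        if len(self.buckets[i]) > self.capacity:
            self._split(i)
        return list(self.lows) if misdirected else None

    def _split(self, i):
        """Move the upper half of an overflowing bucket to a new server."""
        keys = sorted(self.buckets[i])
        upper_keys = keys[len(keys) // 2:]
        upper = {k: self.buckets[i].pop(k) for k in upper_keys}
        self.lows.insert(i + 1, upper_keys[0])
        self.buckets.insert(i + 1, upper)


class Client:
    """Addresses servers from a private, possibly outdated image of the file."""

    def __init__(self, sdds_file):
        self.file = sdds_file
        self.image = [NEG_INF]
        self.corrections = 0

    def insert(self, key, record):
        guess = self.image[bisect.bisect_right(self.image, key) - 1]
        new_image = self.file.insert(guess, key, record)
        if new_image is not None:   # image adjustment sent back by the server
            self.image = new_image
            self.corrections += 1


sdds = RangeFile(capacity=10)
client = Client(sdds)
for ssn in range(100):
    client.insert(ssn, ("name%d" % ssn, "city%d" % (ssn % 50)))
```

The client only learns about new servers when one of its requests lands on the wrong bucket, so the number of corrections stays far below the number of inserts.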

6
AMOS-II DBMS
  • AMOS-II: Active Mediating Object System
  • A main-memory database system.
  • Declarative query language: AMOSQL.
  • External data sources capability.
  • External programs interface with AMOS-II using
  • - Call-level interface (call-in)
  • - Foreign functions (call-out)
  • See the AMOS-II page for more
  • http://www.dis.uu.se/udbl/

7
DB2 Universal Database 
  • IBM object-relational DBMS: DB2 Universal
    Database.
  • Typical representative of a commercial
    object-relational DBMS.
  • Capabilities to handle external data through
    user-defined functions (UDFs).

8
Coupling Strategies
  • AMOS-SDDS Strategy
  • - for a scalable RAM file supporting database
    queries
  • - Use a DBMS for manipulations best handled
    through the query language
  • - Direct fast data access for manipulations not
    supported well, or at all, by a DBMS
  • - Distributed query processing with function
    shipping.
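
Function shipping, as named in the strategy above, means sending the operation to the servers that hold the data instead of pulling the data to the client. The sketch below is illustrative only: threads stand in for server sites, and the ship helper, the sample records and the city filter are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def ship(function, buckets):
    """Run `function` on every bucket in parallel and collect the partial results."""
    with ThreadPoolExecutor(max_workers=len(buckets)) as pool:
        return list(pool.map(function, buckets))   # results keep bucket order


def persons_in_paris(bucket):
    """The shipped function: a selection evaluated next to the data."""
    return [record for record in bucket if record[2] == "Paris"]


# Two buckets of (ssn, name, city) records, one per simulated server.
buckets = [
    [(1, "Ann", "Paris"), (2, "Bob", "Dakar")],
    [(3, "Eve", "Paris")],
]
partials = ship(persons_in_paris, buckets)
result = [record for part in partials for record in part]
```

Only the matching records travel back to the client, which then merges the partial results.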

9
AMOS-SDDS System
AMOS-SDDS scalable parallel query processing
10
Coupling Strategies
  • SD-AMOS Strategy
  • - Uses AMOS-II as the memory manager at each
    SDDS storage site
  • - Scalable generalization of a parallel DBMS
  • - Data partitioning becomes dynamic.

11
SD-AMOS System
SD-AMOS scalable parallel query processing
12
Coupling SDDS DB2
  • DB2-SDDS Strategy
  • - Coupling of a DBMS with an external data
    repository with direct fast data access.
  • - Use of a SDDS file by a DBMS like an external
    data repository.
  • - Offer the user a richer interface than that
    of the SDDS manager, in particular through the
    DBMS query language.

13
Coupling SDDS DB2
DB2-SDDS Overall Architecture
Register a user-defined external table function:
CREATE FUNCTION scan(VARCHAR(20))
RETURNS TABLE (ssn INTEGER, name VARCHAR(20), city VARCHAR(20))
EXTERNAL NAME 'interface!fullscan'
14
Coupling SDDS DB2
Foreign functions to access SDDS records from DB2:
range(cleMin, cleMax) -> list of records where cleMin < key < cleMax
scan(nom_fichier) -> list of all records in the file
Sample queries:
- Parallel scan: all SDDS records.
select * from table( scan(fichier) ) as table_sdds(SSN, NAME, CITY)
- Range query: SDDS records where key is between 1 and 100.
select * from table( range(1, 100) ) as table_sdds(SSN, NAME, CITY) order by Name
15
The Hardware
  • Six 700 MHz Pentium III machines with 256 MB of
    RAM, running Windows 2000
  • On a 100 Mbit/s Ethernet network.
  • One site is used as the client and the other
    five as servers
  • We run several servers on the same machine (up
    to 3 per machine).
  • File scaled from 1 to 15 servers.

16
Benchmark queries
  • Benchmark data
  • - Table: Person (SS, Name, City).
  • - Size: 20,000 to 300,000 tuples of 25 bytes.
  • - 50 cities.
  • - Random distribution.
  • Benchmark query: pairs of persons living in the
    same city
  • - Query 1: the file resides at a single AMOS-II.
  • - Query 2: the file resides at AMOS-SDDS.
  • Join evaluation: two strategies.
  • Measures
  • - Speed-up and scale-up
  • - Processing time of aggregate functions

17
Server Query Processing
  • E-strategy
  • - Data stay external to AMOS, within the SDDS
    bucket
  • - Custom foreign functions perform the query
  • I-strategy
  • - Data are dynamically imported into AMOS-II
  • - Possibly with local index creation
  • - Deleted after the processing
  • - Good for joins
  • - AMOS performs the query
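
To make the join options concrete, here is a rough local sketch of the benchmark join (pairs of persons in the same city) evaluated two ways: with a nested loop and with an index built on the fly, as a server might do after importing its data. The function names and sample records are hypothetical, and the code does not reproduce how AMOS-II or the custom foreign functions actually evaluate the query.

```python
from collections import defaultdict
from itertools import combinations


def pairs_nested_loop(persons):
    """Compare every record with every later record: quadratic in the input size."""
    return [
        (a[0], b[0])
        for i, a in enumerate(persons)
        for b in persons[i + 1:]
        if a[2] == b[2]
    ]


def pairs_with_index(persons):
    """Build a city index first, then pair up only persons who share a city."""
    index = defaultdict(list)
    for ssn, _name, city in persons:
        index[city].append(ssn)
    return [pair for ssns in index.values() for pair in combinations(ssns, 2)]


persons = [
    (1, "Ann", "Paris"),
    (2, "Bob", "Dakar"),
    (3, "Eve", "Paris"),
    (4, "Dan", "Dakar"),
    (5, "Zoe", "Paris"),
]
nested = pairs_nested_loop(persons)
indexed = pairs_with_index(persons)
```

Both functions return the same pairs; the indexed version avoids comparing persons from different cities.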

18
Speed-up  
Elapsed time of Query 2 according to the strategy
for a file of 20,000 records, distributed over 1
to 5 servers.
I-Strategy for Query 2 elapsed time
E-Strategy for Query 2 elapsed time
Elapsed time per tuple of Query 2 according to
the strategy
19
Discussion  
  • The results show a significant advantage of the
    I-strategy over the E-strategy for the evaluation
    of the join query.
  • With 5 servers, the I-strategy is 6 times faster
    with a nested loop, and 9 times faster if an
    index is created.
  • This favorable result led us to study the
    scale-up characteristics of AMOS-SDDS on a file
    that scales up to 300,000 tuples.

20
Scaling the number of servers  
Q1: AMOS-SDDS join. Q2: AMOS-SDDS join with
count.
Time per tuple (extrapolated for AMOS-SDDS)
Elapsed time of join queries to AMOS-SDDS
21
Scaling the number of servers  
Expected time per tuple of join queries to
AMOS-SDDS
  • Results are extrapolated to 1 server per machine.
  • - Basically, the CPU component of the elapsed
    time is divided by 3
  • The extrapolation of the processing time of the
    join query with count shows linear scalability
    of the system.
  • Processing time per tuple remains constant
    (2.94 ms) when the file size and the number of
    servers increase by the same factor.

22
Aggregate Function count
Elapsed time of the aggregate function Count under
AMOS-SDDS
Elapsed time over a 100,000-tuple file on AMOS-SDDS
Elapsed time for AMOS-II: 280 ms
Elapsed time of aggregate function Count
23
Aggregate Function max
Elapsed time of the aggregate function Max under
AMOS-SDDS
Elapsed time over a 100,000-tuple file on AMOS-SDDS
Elapsed time for AMOS-II: 471 ms
Elapsed time of aggregate function Max
24
Discussion  
  • Unlike for the join query, the external strategy
    wins for the evaluation of aggregate functions.
  • For the count function, the improvement is about
    34 times.
  • For the max function, the improvement is about 4
    times.
  • This is due to the import cost and to an SDDS
    property: the current number of records is a
    parameter of a bucket.
  • Linear speed-up: processing time decreases with
    the number of servers.
  • The use of external functions can thus be very
    advantageous for certain kinds of operations.
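
The SDDS property mentioned above suggests why count is so cheap with the external strategy: each bucket already knows how many records it holds, so the client only has to add up one number per server. A minimal sketch, assuming hypothetical Bucket, file_count and file_max names and in-memory lists in place of real servers:

```python
class Bucket:
    """Toy bucket that keeps its current record count as a parameter."""

    def __init__(self, records):
        self.records = list(records)
        self.count = len(self.records)   # maintained by the bucket, not recomputed


def file_count(buckets):
    """Count over the whole file: one stored number per bucket, no scan."""
    return sum(bucket.count for bucket in buckets)


def file_max(buckets, key):
    """Max over the whole file: each bucket scans locally, the client keeps the best."""
    local_maxima = [max(key(r) for r in b.records) for b in buckets if b.records]
    return max(local_maxima)


buckets = [
    Bucket([(1, "Ann", "Paris"), (2, "Bob", "Dakar")]),
    Bucket([(7, "Eve", "Paris")]),
    Bucket([]),
]
total = file_count(buckets)
highest_ssn = file_max(buckets, key=lambda record: record[0])
```

Max still needs a local scan of every bucket, which is consistent with it gaining less than count, but neither aggregate requires importing records into the DBMS.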

25
SD-AMOS performance measurements
Creation time of a 3,000,000-record file. The
bucket size is 750,000 records of 100 bytes.
Global and moving average insertion time of a
record
26
SD-AMOS performance measurements
Elapsed time of range query
Average time per tuple
27
Discussion  
  • The average insertion time of a record,
    including splits, is 0.15 ms.
  • The average access time to a record in a
    distributed file is 0.12 ms.
  • - This is 100 times faster than access to a
    traditional file on disk.
  • Linear scalability: the insertion time and the
    access time per tuple remain constant when the
    file size and the number of servers increase.

28
DB2-SDDS performance measurements
(i) access time to the data in a DB2 table, (ii)
access time to an SDDS file from the DB2 external
functions (DB2-SDDS) and (iii) direct access time
to an SDDS file from an SDDS client.
Elapsed time of range query
Time per tuple
29
Discussion  
  • Access to an SDDS file is much faster than
    access to a DB2 table: 0.02 ms versus 0.07 ms.
  • Access to external data from DB2 (0.08 ms) is
    slower than access to internal data (0.07 ms).
  • - This difference is the coupling cost.
  • An application has
  • - fast direct access to the data
  • - through the DBMS, access by the query language

30
Conclusion  
  • We have coupled an SDDS manager with the
    main-memory DBMS AMOS-II and with DB2 to improve
    the current technologies for high-performance
    databases and for the coupling with external data
    repositories.
  • The experiments we have reported in the thesis
    prove the efficiency of the system.
  • AMOS-SDDS and DB2-SDDS: use of an SDDS file by a
    DBMS, with parallel query processing on the
    server sites.
  • SD-AMOS appears as a scalable generalization of
    a parallel main-memory DBMS where the data
    partitioning becomes automatic.

31
Future Work  
  • Other types of DBMS queries.
  • A scalable distributed query decomposer at the
    client.
  • A challenging task is the design of a scalable
    distributed query optimizer that handles dynamic
    data partitioning.

32
End
Thank You for Your Attention
CERIA Université Paris IX Dauphine
Yakham.Ndiaye_at_dauphine.fr