SDDS-2000: A Prototype System for Scalable Distributed Data Structures on Windows 2000


1
SDDS-2000: A Prototype System for Scalable Distributed Data Structures on Windows 2000
  • Witold Litwin
  • Witold.Litwin@dauphine.fr

2
Plan
  • What are SDDSs ?
  • Where are we in 2002 ?
  • LH* : Scalable Distributed Hash Partitioning
  • RP* : Scalable Distributed Range Partitioning
  • High-Availability: LH*RS & RP*RS
  • DBMS coupling: AMOS-SDDS & SD-AMOS
  • Architecture of SDDS-2000
  • Experimental performance results
  • Conclusion
  • Future work

3
What is an SDDS
  • A new type of data structure
  • Specifically for multicomputers
  • Designed for data intensive files
  • horizontal scalability to very large sizes
  • larger than any single-site file
  • parallel and distributed processing
  • especially in (distributed) RAM
  • Record access time better than for any disk file
  • 100-300 µs usually under Win 2000 (100 Mb/s net, 700 MHz CPU, 100 B - 1 KB records)
  • access by multiple autonomous clients

4
Killer apps
  • Any traditional application using a large hash or
    B-tree or k-d file
  • Access time to a RAM SDDS record is about 100 times faster than to a disk record
  • Network storage servers (SAN and NAS)
  • DBMSs
  • WEB servers
  • Video servers
  • Real-time systems
  • High Perf. Comp.

5
Multicomputers
  • A collection of loosely coupled computers
  • common and/or preexisting hardware
  • share nothing architecture
  • message passing through a high-speed net (≥ 100 Mb/s)
  • Network multicomputers
  • use general purpose nets
  • LANs: Ethernet, Token Ring, Fast Ethernet, SCI, FDDI...
  • WANs: ATM...
  • Switched multicomputers
  • use a bus, or a switch
  • e.g., IBM-SP2, Parsytec

6
Typical Network Multicomputer
[Figure: clients and servers connected by network segments]
7
Why multicomputers ?
  • Potentially unbeatable price-performance ratio
  • Much cheaper and more powerful than
    supercomputers
  • 1500 WSs at HPL with 500 GB of RAM & TBs of disks
  • Potential computing power
  • file size
  • access and processing time
  • throughput
  • For more pros & cons:
  • Bill Gates at Microsoft Scalability Day
  • NOW project (UC Berkeley)
  • Tanenbaum "Distributed Operating Systems",
    Prentice Hall, 1995
  • www.microsoft.com White Papers from the Business Syst. Div.

8
Why SDDSs
  • Multicomputers need data structures and file
    systems
  • Trivial extensions of traditional structures are
    not best
  • hot-spots
  • scalability
  • parallel queries
  • distributed and autonomous clients
  • distributed RAM distance to data

9
Distance to data (Jim Gray)
[Figure: local disk: 10 ms; distant RAM (Ethernet): 100 µs; distant RAM (gigabit net): 1 µs; RAM: 100 ns]
10
Distance to Data (Jim Gray)
[Figure: the same ladder in everyday terms; RAM: 1 µs ≈ 1 min; distant RAM (gigabit net): 10 µs ≈ 10 min; distant RAM (Ethernet): 100 µs ≈ 2 h; local disk: 10 ms ≈ 8 d (the Moon)]
11
Scalability Dimensions (Client view)
[Chart: scale-up; operation time (distant RAM) and operations/s vs. data size, servers and clients; linear (ideal) vs. sub-linear (usual)]
12
Scalability Dimensions (Client view, SDDS specific)
[Chart: scale-up; operation time vs. data size; a single computer or a cluster with a fixed number of servers degrades sub-linearly as the data move from local RAM to disk cache, disk and tapes/juke-box, while a multicomputer SDDS stays linear in distributed RAM]
13
Scalability Dimensions (Client view)
[Chart: speed-up; operations/s vs. number of servers; linear (ideal) vs. sub-linear (usual)]
14
What is an SDDS ?
  • Queries come from multiple autonomous clients
  • Data are on servers
  • Data are structured
  • records with keys / objects with OIDs
  • more semantics than in the Unix flat-file model
  • the abstraction most popular with applications
  • parallel scans & function shipping
  • Overflowing servers split into new servers

15
An SDDS
growth through splits under inserts
[Figure, slides 15-18: animation; clients insert into the file, an overflowing server splits and moves part of its records to a new server, and the file keeps growing over more servers]
19
SDDS Addressing Principles
  • SDDS Clients
  • Are not informed about the splits.
  • Do not access any centralized directory for
    record address computations
  • Each has a more or less adequate private image of the actual file structure
  • Can make addressing errors
  • Sending queries or records to incorrect servers
  • Searching for a record that was moved elsewhere
    by splits
  • Sending a record that should be elsewhere for
    the same reason

20
What is an SDDS ?
SDDS Addressing Principles
  • Servers are able to forward the queries to the
    correct address
  • perhaps in several messages
  • Servers may send Image Adjustment Messages
  • Clients do not make same error twice
  • Servers support parallel scans
  • Sent out by multicast or unicast
  • With deterministic or probabilistic termination
  • See the SDDS talks & papers for more
  • ceria.dauphine.fr/witold.html
  • Or the LH* ACM-TODS paper (Dec. 96)

21
An SDDS: Client Access
[Figure, slides 21-25: animation; a client sends a query using its own file image, an incorrectly addressed server forwards it to the correct one, and the client receives an IAM (Image Adjustment Message)]
26
Known SDDSs
[Figure: taxonomy of data structures; classics vs. SDDSs (1993); hash-based (LH*, DDH, Breitbart & al.), 1-d tree and m-d trees; disk-oriented (SDLSA); security and s-availability variants (LH*m, LH*g, LH*s, LH*SA, LH*RS)]
http://ceria.dauphine.fr/SDDS-bibliographie.html
27
LH* (A classic)
  • Scalable distributed hash partitioning
  • Transparent for the applications
  • Unlike the current static schemes (i.e. DB2)
  • Generalizes the LH addressing schema
  • used in many products
  • SQL-Server, IIS, MsExchange, Frontpage, Netscape
    suites, Berkeley DB Library, LH-Server, Unify...
  • Typical load factor: 70 - 90 %
  • In practice, at most 2 forwarding messages
  • regardless of the size of the file
  • In general, 1 message/insert and 2
    messages/search on the average
  • 4 messages in the worst case
  • Several variants are known
  • LH*LH is the most studied

28
Overview of LH
  • Extensible hash algorithm
  • Widely used, e.g.,
  • Netscape browser (100M copies)
  • LH-Server by AR (700K copies sold)
  • MS Frontpage, Exchange, IIS
  • taught in most DB and DS classes
  • the address space expands
  • to avoid overflows & access performance deterioration
  • the file has buckets with capacity b >> 1
  • Hash by division: hi : c → c mod 2^i N gives the address hi(c) of key c
  • Buckets split through the replacement of hi with hi+1, i = 0, 1, ...
  • On average, b/2 keys move to the new bucket

29
Overview of LH
  • Basically, a split occurs when some bucket m overflows
  • One then splits the bucket pointed to by the split pointer n (see the sketch below)
  • usually m ≠ n
  • n evolves: 0; 0,1; 0,1,...,2; 0,...,3; ...; 0,...,7; ...; 0,...,2^i N; 0,...
  • One consequence => no index
  • a characteristic of other EH schemes

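A minimal Python sketch (not the SDDS-2000 code; the names and the list-of-lists layout are assumptions) of the two mechanisms just described: the hash functions hi and the split of the bucket designated by the pointer n.

```python
N = 1   # initial number of buckets
b = 4   # bucket capacity (b >> 1 in practice)

def h(i, c):
    """h_i(c) = c mod (2^i * N): the LH family of hash functions."""
    return c % (2 ** i * N)

def split(buckets, n, i):
    """Split bucket n: rehash its keys with h_{i+1}; about b/2 of them move
    to the new bucket appended at address n + 2^i * N."""
    old = buckets[n]
    buckets[n] = [c for c in old if h(i + 1, c) == n]
    buckets.append([c for c in old if h(i + 1, c) != n])
    n += 1
    if n == 2 ** i * N:          # the pointer wrapped around: the file level grows
        n, i = 0, i + 1
    return n, i
```

The file-evolution slides that follow can be replayed with this sketch, starting from a single bucket holding the keys 35, 12, 7, 15, 24.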
30
LH File Evolution
N = 1, b = 4, i = 0 ; h0 : c → c mod 2^0
Bucket 0 (h0): 35 12 7 15 24 ; n = 0
31
LH File Evolution
N = 1, b = 4, i = 0 ; h1 : c → c mod 2^1 (bucket 0 overflows and splits)
Bucket 0 (h1): 35 12 7 15 24 ; n = 0
32
LH File Evolution
N = 1, b = 4, i = 1 ; h1 : c → c mod 2^1
Bucket 0 (h1): 12 24
Bucket 1 (h1): 35 7 15 ; n = 0
33
LH File Evolution
N = 1, b = 4, i = 1 ; h1 : c → c mod 2^1 (inserts: 21, 11, 32, 58)
Bucket 0 (h1): 32 58 12 24
Bucket 1 (h1): 21 11 35 7 15 ; n = 0
34
LH File Evolution
N = 1, b = 4, i = 1 ; h2 : c → c mod 2^2 (bucket 0 splits)
Bucket 0 (h2): 32 12 24
Bucket 1 (h1): 21 11 35 7 15
Bucket 2 (h2): 58 ; n = 1
35
LH File Evolution
N = 1, b = 4, i = 1 ; h2 : c → c mod 2^2 (insert 33)
Bucket 0 (h2): 32 12 24
Bucket 1 (h1): 33 21 11 35 7 15
Bucket 2 (h2): 58 ; n = 1
36
LH File Evolution
N = 1, b = 4, i = 1 ; h2 : c → c mod 2^2 (bucket 1 splits)
Bucket 0 (h2): 32 12 24
Bucket 1 (h2): 33 21
Bucket 2 (h2): 58
Bucket 3 (h2): 11 35 7 15
37
LH File Evolution
N = 1, b = 4, i = 2 ; h2 : c → c mod 2^2 (the pointer wraps: n = 0, i = 2)
Bucket 0 (h2): 32 12 24
Bucket 1 (h2): 33 21
Bucket 2 (h2): 58
Bucket 3 (h2): 11 35 7 15
38
LH File Evolution
  • Etc
  • One starts h3 then h4 ...
  • The file can expand as much as needed
  • without too many overflows ever

39
Addressing Algorithm
  • a ← hi(c)
  • if n = 0 then exit
  • else
  •   if a < n then a ← hi+1(c)
  • end (a runnable sketch follows below)

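The same algorithm as a small runnable Python function (a sketch; the parameter names and the default N = 1 are assumptions):

```python
def lh_client_address(c, i, n, N=1):
    """Client-side LH addressing: a <- h_i(c); if a lies before the split
    pointer n, that bucket has already split, so use h_{i+1} instead."""
    a = c % (2 ** i * N)
    if n != 0 and a < n:
        a = c % (2 ** (i + 1) * N)
    return a
```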
40
LH
  • Property of LH:
  • Given j = i or j = i + 1, key c is in bucket m iff hj(c) = m ; j = i or j = i + 1
  • Verify it yourself
  • Ideas for LH*:
  • the LH addressing rule is the global rule for the LH* file
  • every bucket is at a server
  • each bucket keeps its level j in the header
  • Check the LH property when the key comes from a client

41
LH* file structure
[Figure: buckets 0, 1, 2, ..., 7, 8, 9 on the servers, with levels j = 4, 4, 3, ..., 3, 4, 4; the coordinator holds n = 2, i = 3; one client image is n' = 0, i' = 0, the other n' = 3, i' = 2]
42
LH* file structure
[Figure: the same file structure (animation frame)]
43
LH* split
[Figure: the coordinator initiates the split of bucket n = 2 (same file state as above)]
44
LH* split
[Figure: the split of bucket 2 in progress (animation frame)]
45
LH* split
[Figure: after the split of bucket 2, a new bucket 10 exists, buckets 2 and 10 have level j = 4, and the coordinator pointer advances to n = 3 (i = 3); the client images are unchanged]
46
LH* Addressing Schema
  • Client
  • computes the LH address m of c using its image
  • sends c to bucket m
  • Server
  • Server a getting key c (a = m in particular) computes:
  • a' ← hj(c)
  • if a' = a then accept c
  • else a'' ← hj-1(c)
  •   if a'' > a and a'' < a' then a' ← a''
  • send c to bucket a'

47
LH* Addressing Schema
  • Client
  • computes the LH address m of c using its image
  • sends c to bucket m
  • Server
  • Server a getting key c (a = m in particular) computes:
  • a' ← hj(c)
  • if a' = a then accept c
  • else a'' ← hj-1(c)
  •   if a'' > a and a'' < a' then a' ← a''
  • send c to bucket a'
  • See LNS93 for the (long) proof (a code sketch follows below)

Simple ?
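A minimal Python rendering of the server-side rule above (a sketch; the function and parameter names are assumptions). It returns None when the key is accepted locally, or the address to forward to.

```python
def lh_server_check(c, a, j, N=1):
    """Bucket a, of level j, received key c: apply the LH* addressing schema."""
    h = lambda level, key: key % (2 ** level * N)
    a1 = h(j, c)
    if a1 == a:
        return None          # the key belongs to this bucket
    a2 = h(j - 1, c)
    if a < a2 < a1:
        a1 = a2              # hop through the intermediate candidate bucket
    return a1                # forward the key to bucket a1 (at most 2 hops in all)
```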
48
Client Image Adjustment
  • The IAM consists of the address a where the client sent c and of j(a)
  • i' is the presumed i in the client's image
  • n' is the presumed value of pointer n in the client's image
  • initially, i' = n' = 0
  • if j > i' then i' ← j - 1 ; n' ← a + 1
  • if n' ≥ 2^i' then n' ← 0 ; i' ← i' + 1
  • The algorithm guarantees that the client image is within the file [LNS93] (a code sketch follows below)
  • if there are no file contractions (merges)

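The adjustment rule as a Python sketch (the dictionary-based image is an assumption):

```python
def adjust_image(image, a, j):
    """Apply an IAM: a is the bucket the client addressed, j is that bucket's level.
    image holds the client's presumed i' and n' (both 0 initially)."""
    if j > image['i']:
        image['i'] = j - 1
        image['n'] = a + 1
    if image['n'] >= 2 ** image['i']:
        image['n'] = 0
        image['i'] += 1
    return image

# As on the next slides: a client with image (i' = 0, n' = 0) that addressed
# bucket a = 0 of level j = 4 ends up with i' = 3, n' = 1.
adjust_image({'i': 0, 'n': 0}, a=0, j=4)   # -> {'i': 3, 'n': 1}
```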
49
LH* addressing
[Figure: the file now has buckets 0, 1, 2, ..., 7, 8, 9, 10 with levels j = 4, 4, 4, 3, 4, 4, 4; coordinator: n = 3, i = 3; a client with image (n' = 0, i' = 0) issues key 15]
50
LH* addressing
[Figure: using its image (n' = 0, i' = 0), the client sends key 15 to bucket 0]
51
LH* addressing
[Figure: the key is forwarded to its correct bucket; the IAM adjusts the client image to n' = 1, i' = 3]
52
LH* addressing
[Figure: a client with image (n' = 0, i' = 0) now issues key 9]
53
LH* addressing
[Figure: the client sends key 9 to the bucket its image designates (animation frame)]
54
LH* addressing
[Figure: key 9 is forwarded among the servers toward its correct bucket]
55
LH* addressing
[Figure: key 9 reaches its correct bucket; the IAM adjusts the client image to n' = 1, i' = 3]
56
Result
  • The distributed file can grow even to the whole Internet, so that
  • every insert and search is done in at most four messages (IAM included)
  • in general, an insert is done in one message and a search in two messages
  • proof in LNS 93

57
10,000 inserts
[Chart: global cost vs. the client's cost]
58
(No Transcript)
59
(No Transcript)
60
Inserts by two clients
61
Parallel Queries
  • A query Q for all buckets of file F, with independent local executions
  • every bucket should get Q exactly once
  • The basis for function shipping
  • fundamental for high-perf. DBMS applications
  • Send Mode
  • multicast
  • not always possible or convenient
  • unicast
  • the client may not know all the servers
  • the servers have to forward the query
  • how ?

[Figure: the client's image vs. the actual file]
62
LH* Algorithm for Parallel Queries (unicast)
  • The client sends Q to every bucket a in its image
  • The message with Q carries the message level j'
  • initially j' = i' if n' ≤ a < 2^i', else j' = i' + 1
  • Bucket a (of level j) copies Q to all its children using the algorithm:
  • while j' < j do
  •   j' ← j' + 1
  •   forward (Q, j') to bucket a + 2^(j'-1)
  • endwhile
  • Prove it ! (a code sketch follows below)

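The unicast scan as a Python sketch (function names and the send/forward callbacks are assumptions): the client stamps every copy of Q with the level j' its image presumes, and each bucket completes the delivery to the children that the sender's image ignores.

```python
def client_scan(Q, i_img, n_img, send):
    """Send Q once to every bucket of the client's image, with message level j'."""
    for a in range(2 ** i_img + n_img):
        j_prime = i_img if n_img <= a < 2 ** i_img else i_img + 1
        send(a, Q, j_prime)

def bucket_scan(Q, a, j, j_prime, forward):
    """Bucket a of level j: forward Q to every child unknown to the sender."""
    while j_prime < j:
        j_prime += 1
        forward(a + 2 ** (j_prime - 1), Q, j_prime)
```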
63
Termination of Parallel Query (multicast or
unicast)
  • How does client C know that the last reply came ?
  • Deterministic Solution (expensive)
  • Every bucket sends its j, m and the selected records, if any
  • m is its (logical) address
  • The client terminates when it holds every m fulfilling the condition
  • m = 0, 1, ..., 2^i + n - 1, where
  • i = min (j) and n = min (m) over the replies with j = i
  • (a code sketch of this test follows below)

[Diagram: buckets 0 .. n-1 have level i+1, buckets n .. 2^i - 1 have level i, buckets 2^i .. 2^i + n - 1 have level i+1]
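A sketch of the deterministic termination test (Python; the replies dictionary is an assumption): every reply carries the replying bucket's address m and level j.

```python
def scan_is_complete(replies):
    """replies: {m: j} for the replies received so far.  The scan is complete once
    a reply is in hand from every bucket m = 0 .. 2^i + n - 1, where i = min(j)
    and n is the smallest m whose reply reported level i."""
    if not replies:
        return False
    i = min(replies.values())
    n = min(m for m, j in replies.items() if j == i)
    return all(m in replies for m in range(2 ** i + n))
```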
64
Termination of Parallel Query (multicast or
unicast)
  • Probabilistic Termination (may need less messaging)
  • all and only the buckets with selected records reply
  • after each reply, C reinitializes a time-out T
  • C terminates when T expires
  • The practical choice of T is network and query dependent
  • e.g., 5 times the Ethernet average retry time
  • 1-2 msec ?
  • experiments needed
  • Which termination is finally more useful in practice ?
  • an open problem

65
LH* variants
  • With/without load (factor) control
  • With/without the (split) coordinator
  • the former was discussed above
  • the latter is a token-passing schema
  • the bucket holding the token is the next to split
  • if an insert occurs and file overload is guessed
  • several algorithms exist for the decision
  • it uses cascading splits
  • See the talk on LH* at http://ceria.dauphine.fr/

66
RP* schemes
  • Produce scalable distributed 1-d ordered files
  • for range search
  • Each bucket (server) has the unique range of keys it may contain
  • Ranges partition the key space
  • Ranges evolve dynamically through splits
  • transparently for the application (a client-side routing sketch follows below)
  • Use RAM m-ary trees at each server
  • Like B-trees
  • Optimized for the RP* split efficiency

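A minimal sketch of client-side routing over a range-partitioned image (Python; the image layout and server names are assumptions, not the SDDS-2000 RP* code): the image maps the lower bound of each range to a server, and an outdated guess is corrected by server forwarding and an IAM, as in LH*.

```python
import bisect

# Hypothetical client image: (lower bound of the range, server); the ranges partition the key space.
image = [(float('-inf'), 'server0'), (1000, 'server1'), (5000, 'server2')]

def route(key):
    """Pick the server whose range should contain the key, according to the image."""
    lows = [low for low, _ in image]
    return image[bisect.bisect_right(lows, key) - 1][1]

# route(42) -> 'server0'; route(3000) -> 'server1'; route(7000) -> 'server2'
```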
67
Current PDBMS technology (Ex. Non-Stop SQL)
  • Static Range Partitioning
  • Done manually by DBA
  • Requires good skills
  • Not scalable

68
RP schemes
69
High-availability SDDS schemes
  • Data remain available despite
  • any single-server failure & most two-server failures
  • or any failure of up to n servers
  • and some catastrophic failures
  • n scales with the file size
  • to offset the reliability decline which would otherwise occur
  • Three principles for high-availability SDDS schemes are currently known:
  • mirroring (LH*m)
  • striping (LH*s)
  • grouping (LH*g, LH*SA, LH*RS, RP*RS)
  • They realize different performance trade-offs

70
LH*RS Record Groups
  • LH*RS records
  • = LH* data records + parity records
  • Records with the same rank r in the bucket group form a record group
  • Each record group gets n parity records
  • computed using Reed-Solomon erasure correcting codes
  • additions and multiplications in Galois Fields
  • see the Sigmod 2000 paper on the Web site for details
  • r is the common key of these records
  • Each group supports the unavailability of up to n of its members (a simplified sketch follows below)

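An illustrative sketch only: LH*RS computes its n parity records per group with Reed-Solomon coding over a Galois field; below, a single XOR parity (the n = 1 case) stands in for the RS calculus, and records are assumed padded to equal length.

```python
from functools import reduce

def xor_bytes(blocks):
    """Bitwise XOR of equal-length byte strings."""
    return reduce(lambda x, y: bytes(p ^ q for p, q in zip(x, y)), blocks)

def parity_record(group):
    """One parity record for the data records of the same rank r across the bucket group."""
    return xor_bytes(group)

def recover_missing(survivors, parity):
    """With a single unavailable bucket, its record is the XOR of the survivors and the parity."""
    return xor_bytes(survivors + [parity])
```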
71
LH*RS Record Groups
[Figure: a bucket group with its data records and the derived parity records]
72
LHRS Parity Management
  • An insert of a data record with rank r creates or, usually, updates the parity records r
  • An update of a data record with rank r updates the parity records r
  • A split recreates the parity records
  • Data records usually change their rank after the split

73
LH*RS Scalable availability
  • Create 1 parity bucket per group until M = 2^i1 buckets
  • Then, at each split,
  • add a 2nd parity bucket to each existing group
  • create 2 parity buckets for the new groups until 2^i2 buckets
  • etc. (a sketch of this rule follows below)

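A sketch of this scalable-availability rule in Python (the threshold values are illustrative assumptions): the availability level k, i.e. the number of parity buckets per group, steps up each time the file size M passes the next threshold.

```python
THRESHOLDS = [2 ** 4, 2 ** 8, 2 ** 12]    # hypothetical 2**i1 < 2**i2 < 2**i3

def availability_level(M):
    """k = 1 while M <= 2**i1 buckets, k = 2 while M <= 2**i2, and so on."""
    k = 1
    for t in THRESHOLDS:
        if M > t:
            k += 1
    return k
```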
74
LH*RS Scalable availability
[Figures, slides 74-78: animation of the rule above; parity buckets are added to the groups as the file doubles in size]
79
SDDS-2000 global architecture
80
SDDS-2000 global architecture
[Figure: applications on the client machines, the SDDS client and server components, etc., communicating over UDP & TCP]
81
SDDS-2000 Client Architecture (RPc)
  • 2 Modules
  • Send Module
  • Receive Module
  • Multithread Architecture
  • SendRequest
  • ReceiveRequest
  • AnalyzeResponse1..4
  • GetRequest
  • ReturnResponse
  • Synchronization Queues
  • Client Images
  • Flow control

82
SDDS-2000 Server Architecture (RPc)
  • Multithread architecture
  • Synchronization queues
  • Listen Thread for incoming requests
  • SendAck Thread for flow control
  • Work Threads for
  • request processing
  • response sendout
  • request forwarding
  • UDP for shorter messages (< 64 K)
  • TCP/IP for longer data exchanges (a transport-selection sketch follows below)
  • Several buckets of different SDDS files

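A sketch of that transport choice with Python sockets (host, port and the exact cut-off handling are illustrative assumptions, not the SDDS-2000 code):

```python
import socket

UDP_LIMIT = 64 * 1024     # UDP for shorter messages, TCP/IP beyond that

def send_message(host, port, payload: bytes):
    if len(payload) < UDP_LIMIT:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.sendto(payload, (host, port))
    else:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.connect((host, port))
            s.sendall(payload)
```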
83
AMOS-SDDS Architecture
  • For database queries
  • Especially parallel scans
  • Couples SDDS-2000 and Amos II
  • RAM OR-DBMS.
  • AMOSQL declarative query language
  • can be embedded into C and Lisp.
  • call-level interface (callin)
  • external procedures (functions) (callout)
  • See the AMOS-II talks & papers for more
  • http://www.dis.uu.se/udbl/

84
AMOS-SDDS Architecture
  • SDDS is used as the distributed RAM storage
    manager.
  • The RP* scheme provides the scalable distributed range partitioning
  • Like in a B-tree, records are lexicographically ordered according to their keys
  • Range queries are supported efficiently
  • Amos II provides a fast SDDS-based RAM OR-DBMS
  • The callout capability realizes the AMOSQL
    object-relational capability usually called
    external or foreign functions.

85
AMOS-SDDS Architecture
AMOS-SDDS Architecture
86
AMOS-SDDS Architecture
AMOS-SDDS scalable distributed query processing
87
AMOS-SDDS Server Query Processing
  • E-strategy
  • Data stay external to AMOS
  • within the SDDS bucket
  • Custom foreign functions perform the query
  • I-strategy
  • Data are imported on the fly into AMOS-II
  • perhaps with local index creation
  • Good for joins
  • AMOS performs the query
  • Which strategy is preferable ?
  • Good question

88
SD-AMOS
  • Server storage manager is a full scale DBMS
  • AMOS-II in our case since it is a RAM DBMS
  • Could be any DBMS
  • SDDS-2000 provides the scalable distributed
    partitioning schema
  • Server DBMS performs the splits
  • When and how ?
  • The client manages scalable query decomposition & execution
  • Easier said than done
  • The whole system generalizes the PDBMS technology
  • which today offers static partitioning only

89
Scalability Analysis
  • Theoretical
  • To validate an SDDS
  • See the papers
  • To get an idea of system performance
  • Limited validity
  • Experimental
  • More accurate validation of design issues
  • Practical necessity
  • Costs orders of magnitude more time and money

90
Experimental Configuration
  • 6 machines: 700 MHz P3
  • 100 Mb/s Ethernet
  • 150 byte records

91
LH* file creation
[Charts: creation time (ms), with and without splits, vs. number of buckets and vs. number of inserts; bucket size b = 5,000, flow control on]
LH* scalability is confirmed
Performance bound by the client processing speed
Ph.D. Thesis of F. Bennour, 2000
92
LH* Key search
[Chart: search time (ms) vs. file size in records and servers (2, 3, 4, 5 servers); actual client image vs. new client image + IAMs]
Performance bound by the client processing speed
93
LH*RS Experimental Performance (Preliminary results)
[Chart: insert time (ms, moving average) vs. number of records during file creation]
94
LH*RS Experimental Performance (Preliminary results)
[Chart: file creation time (s) vs. number of records]
95
LH*RS Experimental Performance (Preliminary results)
  • Normal key search
  • unaffected by the parity calculus
  • 0.3 ms per key search
  • Degraded key search
  • about 2 ms for the application
  • 1.1 ms (k = 4) for the record recovery
  • 1 ms for the client time-out and the coordinator action
  • Bucket recovery at the spare
  • 0.3 ms per record (k = 4)

96
LH*RS Experimental Performance (Preliminary results)
[Chart: insert time (ms, moving average) vs. number of records during file creation]
97
LH*RS Experimental Performance (Preliminary results)
[Chart: file creation time (s) vs. number of records]
98
LH*RS Experimental Performance (Preliminary results)
  • Normal key search
  • unaffected by the parity calculus
  • 0.3 ms per key search
  • Degraded key search
  • about 2 ms for the application
  • 1.1 ms (k = 4) for the record recovery
  • 1 ms for the client time-out and the coordinator action
  • Bucket recovery at the spare
  • 0.3 ms per record (k = 4)

99
Performance Analysis (RP*)
  • Experimental Environment
  • six Pentium III 700 MHz machines
  • Windows 2000
  • 128 MB RAM, extended later to 256 MB RAM
  • 100 Mb/s Ethernet
  • Messages
  • 180 bytes: 80 for the header, 100 for the record
  • keys are random integers within some interval
  • flow control: a sliding window of 10 messages
  • Index
  • capacity of an internal node: 80 index elements
  • capacity of a leaf: 100 records

100
Performance Analysis
  • File Creation
  • Bucket capacity: 50,000 records
  • 150,000 random inserts by a single client
  • With flow control (FC) or without

[Charts: file creation time and average insert time]
101
Discussion
  • Creation time is almost linearly scalable
  • Flow control is quite expensive
  • Losses without it were negligible
  • Both schemes perform almost equally well
  • RP*C slightly better
  • As one could expect
  • Insert time about 30 times faster than for a disk file
  • Insert time appears bound by the client speed

102
Performance Analysis: File Creation
  • File created by 120,000 random inserts by 2 clients
  • Without flow control

[Charts: comparative file creation time by one or two clients; file creation by two clients, total time and time per insert]
103
Discussion
  • Performance improves
  • Insert times appear bound by a server speed
  • More clients would not improve performance of a
    server

104
Performance Analysis: Split Time
[Chart: split times versus bucket capacity]
105
Discussion
  • About linear scalability as a function of the bucket size
  • Larger buckets are more efficient
  • Splitting is very efficient
  • reaching as little as 40 µs per record

106
Performance Analysis: Key Search
  • A single client sends 100,000 successful random search requests
  • Flow control: the client sends at most 10 requests without a reply

[Chart: search time (ms)]
107
Performance Analysis: Key Search
  • A single client sends 100,000 successful random search requests
  • Flow control: the client sends at most 10 requests without a reply

[Charts: total search time and search time per record]
108
Discussion
  • Single search time about 30 times faster than for
    a disk file
  • 350 µs per search
  • Search throughput more than 65 times faster than that of a disk file
  • 145 µs per search
  • RP*N again appears surprisingly efficient with respect to RP*C for more buckets

109
Performance Analysis
  • Range Query
  • Deterministic termination
  • Parallel scan of the entire file, with all 100,000 records sent to the client

[Charts: range query total time and range query time per record]
110
Discussion
  • Range search appears also very efficient
  • Reaching 10 µs per record delivered
  • More servers should further improve the
    efficiency
  • Curves do not become flat yet

111
Range Query Parallel Execution Strategies
Study of MM. Tsangou (Master Th.) & Prof. Samba (U. Dakar)
[Chart: response time (ms) vs. number of servers; Sc. 3: one server at a time; Sc. 1, 2: all servers together; Sc. 1: a single connection request per server]
112
File Size Limits
  • Bucket capacity 751K records, 196 MB
  • Number of inserts 3M
  • Flow control (FC) is necessary to limit the input
    queue at each server

113
File Size Limits
  • Bucket capacity 751K records, 196 MB
  • Number of inserts 3M
  • GA = Global Average ; MA = Moving Average

114
Related Works
[Table: comparative analysis of related works (one entry marked 'Suspicious')]
115
AMOS-SDDS
  • Benchmark data
  • Table Pers (SS, Name, City)
  • size: 20,000 to 300,000 tuples
  • 50 cities
  • random distribution
  • Benchmark queries
  • Join: SS and Name of the persons in the same city
  • nested loop or local index
  • Count Join: count the couples in the same city
  • to determine the result transfer time to the client
  • Count (*) Pers, Max (SS) from Pers
  • Measures
  • scale-up & speed-up
  • comparison to AMOS-II alone

116
Join: best time per tuple
[Chart: best join times per tuple of 14.4, 2.4 and 1.6; 20,000 tuples in Pers; 3,990,070 tuples produced; AMOS-II alone: 13.5 (nested loop), 2.25 (index lookup)]
117
Join & Count: best time per tuple, I-strategy
[Chart: best times per tuple of 1.8 and 1.0; 20,000 tuples in Pers; 3,990,070 tuples produced; AMOS-II alone: 13.5 (nested loop), 2.25 (index lookup)]
118
Join Speed-up, I-strategy
[Chart: join speed-up; 20,000 tuples in Pers; 3,990,070 tuples produced; AMOS-II alone: 13.5 (nested loop), 2.25 (index lookup)]
119
Count Speed-up
[Chart: count speed-up; the E-strategy wins (341); 100,000 tuples in Pers; AMOS-II alone: 280 ms]
120
Join Scale-up Performance
  • The file scales to 300,000 tuples
  • spreading from 1 to 15 AMOS-SDDS servers
  • transparently for the application !
  • 3 servers per machine
  • The poor man's configuration has only 5 server machines
  • Results are extrapolated to 1 server per machine
  • Basically, the CPU component of the elapsed time is divided by 3

121
Join Elapsed Time Scale-up
AMOS-SDDS I-Strategy with Index Lookup Join
122
Join Elapsed Time Scale-up
123
Join Time per Tuple Scale-up
[Chart: join time per tuple as the file scales; better scalability than any current P-DBMS could provide; the join with Count stays flat !]
124
SD-AMOS File Creation
[Chart: insert time (ms) vs. number of servers & inserts; global average and moving average; b = 4,000]
Flat, unexpectedly fast insert time
125
SD-AMOS Large File Creation
[Chart: insert time (ms) vs. number of servers & inserts; global average and moving average; b = 40,000]
The flat, fast insert time remains
126
SD-AMOS Large File Search (time per record)
[Chart: file of 300K records]
The client's maximum processing speed is reached
127
SD-AMOS Very Large File Creation
[Chart: bucket size 750K records; max file size 3M records; record size 100 B]
128
Conclusion
  • SDDS-2000: a prototype SDDS manager for a Windows multicomputer
  • Several variants of LH* and RP*
  • High availability
  • Scalable distributed database query processing
  • AMOS-SDDS & SD-AMOS

129
Conclusion
  • The experimental performance of the SDDS schemes appears in line with the expectations
  • Record search & insert times in the range of a fraction of a millisecond
  • About 30 to 100 times faster than disk-file access performance
  • About ideal (linear) scalability
  • including the query processing
  • The results prove the overall efficiency of the SDDS-2000 system

130
Current & Future Work
  • SDDS-2000 Implementation
  • High-Availability through RS-Codes
  • CERIA & U. Santa Clara (Prof. Th. Schwarz)
  • Disk-based High-Availability SDDSs
  • CERIA & IBM-Almaden (J. Menon)
  • Parallel Queries
  • U. Dakar (Prof. S. Ndiaye) & CERIA
  • Concurrency & Transactions
  • U. Dakar (Prof. T. Seck)
  • Overall Performance Analysis
  • SD-AMOS & SD-DBMS in general
  • CERIA, U. Uppsala (Prof. T. Risch) & U. Dakar
  • SD-SQL-Server ?
  • The extent depends basically on available funding

131
Credits
  • SDDS-2000 Implementation
  • CERIA Ph.D. Students
  • F. Bennour (now Post-Doc) (SDDS-2000 & LH*)
  • A. Wan Diene, Y. Ndiaye (SDDS-2000 & RP*, AMOS-SDDS, SD-AMOS)
  • R. Moussa (RS-Parity Subsystem)
  • Master Student Theses
  • at CERIA and cooperating Universities
  • See the CERIA Web page: ceria.dauphine.fr
  • Partial support for SDDS-2000 research:
  • HPL, IBM Research, Microsoft Research

132
Problems & Exercises
  • Install SDDS-2000 and experiment with the interactive application. Comment on the experience in a few pages.
  • Create your favorite application using the .h interfaces provided with the package.
  • Comment in a few pages on the LH*RS goal and way of working, on the basis of the Sigmod paper and the WDAS-2002 paper by Rim Moussa.
  • Comment in a few pages on the strategies for scalable distributed hash joins according to the paper by D. Schneider & al. Your own ?
  • Can you propose how to deal with theta-joins ?
  • You should split a table under one SQL Server into two tables on two SQL Servers. You wish to generate RP*-like range partitioning on the key attribute(s). You wish to use standard SQL queries as much as possible.
  • How can you find the record with the median (middle) key ?
  • What are the catalog tables where you can find the table size, the associated check constraints, indexes, triggers and stored procedures to move to the new node ?
  • How will you move the records that should leave to the new node ?
  • Idem for a stored function under Amos
  • Idem for Oracle
  • Idem for DB2

133
END
  • Thank you for your attention

Witold Litwin  litwin@dauphine.fr  wlitwin@cs.berkeley.edu
134
(No Transcript)