Title: SDDS-2000: A Prototype System for Scalable Distributed Data Structures on Windows 2000
1 SDDS-2000 A Prototype System for Scalable
Distributed Data Structures on Windows 2000
- Witold Litwin
- Witold.Litwin@dauphine.fr
2 Plan
- What are SDDSs?
- Where are we in 2002?
- LH*: Scalable Distributed Hash Partitioning
- RP*: Scalable Distributed Range Partitioning
- High-Availability: LH*RS & RP*RS
- DBMS coupling: AMOS-SDDS & SD-AMOS
- Architecture of SDDS-2000
- Experimental performance results
- Conclusion
- Future work
3 What is an SDDS?
- A new type of data structure
- Specifically for multicomputers
- Designed for data intensive files
- horizontal scalability to very large sizes
- larger than any single-site file
- parallel and distributed processing
- especially in (distributed) RAM
- Record access time better than for any disk file
- 100-300 µs usually under Win 2000 (100 Mb/s net, 700 MHz CPU, 100 B - 1 KB records)
- access by multiple autonomous clients
4 Killer apps
- Any traditional application using a large hash, B-tree, or k-d file
- Access time to a RAM SDDS record is ~100 times faster than to a disk record
- Network storage servers (SAN and NAS)
- DBMSs
- WEB servers
- Video servers
- Real-time systems
- High Perf. Comp.
5 Multicomputers
- A collection of loosely coupled computers
- common and/or preexisting hardware
- share nothing architecture
- message passing through a high-speed net (≥ 100 Mb/s)
- Network multicomputers
- use general-purpose nets
- LANs: Ethernet, Token Ring, Fast Ethernet, SCI, FDDI...
- WANs: ATM...
- Switched multicomputers
- use a bus, or a switch
- e.g., IBM-SP2, Parsytec
6 Typical Network Multicomputer
[Diagram: client and server machines on interconnected network segments]
7 Why multicomputers?
- Potentially unbeatable price-performance ratio
- Much cheaper and more powerful than supercomputers
- 1,500 workstations at HPL with 500 GB of RAM and TBs of disks
- Potential computing power
- file size
- access and processing time
- throughput
- For more pros & cons:
- Bill Gates at Microsoft Scalability Day
- NOW project (UC Berkeley)
- Tanenbaum, "Distributed Operating Systems", Prentice Hall, 1995
- www.microsoft.com White Papers from Business Syst. Div.
8 Why SDDSs
- Multicomputers need data structures and file systems
- Trivial extensions of traditional structures are not best
- hot-spots
- scalability
- parallel queries
- distributed and autonomous clients
- distributed RAM & distance to data
9 Distance to data (Jim Gray)
- local disk: 10 ms
- distant RAM (Ethernet): 100 µs
- distant RAM (gigabit net): 1 µs
- RAM: 100 ns
10 Distance to Data (Jim Gray)
Scaled so that a 1 µs RAM access takes 1 minute:
- local disk: 10 ms ≈ 8 days (the Moon, pictured on the slide)
- distant RAM (Ethernet): 100 µs ≈ 2 h
- distant RAM (gigabit net): 10 µs ≈ 10 min
- RAM: 1 µs ≈ 1 min
11 Scalability Dimensions (Client view)
[Chart: scale-up; operations/s vs. data size, servers, and clients; operation time with distant RAM is linear (ideal) or sub-linear (usual)]
12 Scalability Dimensions (Client view, SDDS specific)
[Chart: operation time vs. data size; a multicomputer SDDS scales up linearly, where a single computer degrades from local RAM to disk cache, cache & disk, then tapes/juke-box; a cluster-computer has a fixed number of servers]
13 Scalability Dimensions (Client view)
[Chart: speed-up; operations/s vs. number of servers; linear (ideal) vs. sub-linear (usual)]
14 What is an SDDS?
- Queries come from multiple autonomous clients
- Data are on servers
- Data are structured
- records with keys / objects with OIDs
- more semantics than in the Unix flat-file model
- the abstraction most popular with applications
- parallel scans & function shipping
- Overflowing servers split into new servers
15 An SDDS: growth through splits under inserts
[Diagram sequence (slides 15-18): clients keep inserting; an overflowing server bucket splits, moving records to a new server]
19 SDDS Addressing Principles
- SDDS Clients
- Are not informed about the splits
- Do not access any centralized directory for record address computations
- Each has a more or less adequate private image of the actual file structure
- Can make addressing errors
- Sending queries or records to incorrect servers
- Searching for a record that was moved elsewhere by splits
- Sending a record that should be elsewhere for the same reason
20 What is an SDDS?
SDDS Addressing Principles
- Servers are able to forward the queries to the correct address
- perhaps in several messages
- Servers may send Image Adjustment Messages (IAMs)
- Clients do not make the same error twice
- Servers support parallel scans
- Sent out by multicast or unicast
- With deterministic or probabilistic termination
- See the SDDS talks & papers for more
- ceria.dauphine.fr/witold.html
- Or the LH* ACM TODS paper (Dec. 96)
21 An SDDS: Client Access
[Diagram sequence (slides 21-25): a client addresses a server using its image; a misaddressed server forwards the query to the correct one, which replies and sends the client an IAM]
26 Known SDDSs
[Taxonomy of SDDSs (since 1993) vs. classical data structures:
- Hash: LH*, DDH, Breitbart & al.
- 1-d tree: RP*
- m-d trees
- Disk: SDLSA
- s-availability: LH*SA, LH*RS; mirroring/grouping: LH*m, LH*g
- Security: LH*s]
http://ceria.dauphine.fr/SDDS-bibliographie.html
27 LH* (A classic)
- Scalable distributed hash partitioning
- Transparent for the applications
- Unlike the current static schemes (e.g., DB2)
- Generalizes the LH addressing schema
- used in many products
- SQL-Server, IIS, MsExchange, Frontpage, Netscape suites, Berkeley DB Library, LH-Server, Unify...
- Typical load factor 70 - 90 %
- In practice, at most 2 forwarding messages
- regardless of the size of the file
- In general, 1 message/insert and 2 messages/search on average
- 4 messages in the worst case
- Several variants are known
- LH*LH is the most studied
28 Overview of LH
- Extensible hash algorithm
- Widely used, e.g.,
- Netscape browser (100M copies)
- LH-Server by AR (700K copies sold)
- MS Frontpage, Exchange, IIS
- taught in most DB and DS classes
- The address space expands
- to avoid overflows & access performance deterioration
- The file has buckets with capacity b >> 1
- Hash by division: h_i : c → c mod (2^i · N) provides the address h_i(c) of key c
- Buckets split through the replacement of h_i with h_{i+1}, i = 0, 1, ...
- On average, b/2 keys move to the new bucket (a sketch follows below)
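Below, a minimal sketch of hash by division and the split rule in Python; the names (h, split_bucket) and the stand-alone setting are illustrative, not the SDDS-2000 API:

    N = 1  # number of initial buckets

    def h(i, c):
        # Hash by division: h_i(c) = c mod (2^i * N)
        return c % (2 ** i * N)

    def split_bucket(n, keys, i):
        # Re-hash bucket n with h_{i+1}: on average b/2 keys
        # move to the new bucket at address n + 2^i * N.
        stay = [k for k in keys if h(i + 1, k) == n]
        move = [k for k in keys if h(i + 1, k) != n]
        return stay, move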
29 Overview of LH
- Basically, a split occurs when some bucket m overflows
- One then splits the bucket pointed to by the split pointer n
- usually m ≠ n
- n evolves: 0; 0, 1; 0, 1, ..., 3; 0, ..., 7; ...; 0, ..., 2^i · N; 0, ...
- One consequence => no index
- an index being characteristic of other EH schemes
30 LH File Evolution
N = 1, b = 4, i = 0; h_0: c → c mod 2^0
[Bucket 0: 35, 12, 7, 15, 24 (overflowing); n = 0]
31 LH File Evolution
N = 1, b = 4, i = 0; h_1: c → c mod 2^1
[Bucket 0 is split using h_1; n = 0]
32 LH File Evolution
N = 1, b = 4, i = 1; h_1: c → c mod 2^1
[Bucket 0: 12, 24; bucket 1: 35, 7, 15; n = 0]
33 LH File Evolution
N = 1, b = 4, i = 1; h_1: c → c mod 2^1
[After more inserts - bucket 0: 32, 58, 12, 24; bucket 1: 21, 11, 35, 7, 15 (overflowing); n = 0]
34 LH File Evolution
N = 1, b = 4, i = 1; h_2: c → c mod 2^2
[Bucket 0 is split using h_2 - bucket 0: 32, 12, 24 (h_2); bucket 1: 21, 11, 35, 7, 15 (h_1); bucket 2: 58 (h_2); n = 1]
35 LH File Evolution
N = 1, b = 4, i = 1; h_2: c → c mod 2^2
[Insert of 33 - bucket 1: 33, 21, 11, 35, 7, 15 (overflowing)]
36 LH File Evolution
N = 1, b = 4, i = 1; h_2: c → c mod 2^2
[Bucket 1 is split using h_2 - bucket 0: 32, 12, 24; bucket 1: 33, 21; bucket 2: 58; bucket 3: 11, 35, 7, 15; all at level h_2]
37 LH File Evolution
N = 1, b = 4, i = 2; h_2: c → c mod 2^2
[Same file; the pointer wraps to n = 0 and the level becomes i = 2]
38 LH File Evolution
- Etc.
- One starts h_3, then h_4, ...
- The file can expand as much as needed
- without too many overflows ever (a toy simulation follows below)
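The evolution of slides 30-37 can be replayed with a toy single-process simulation; a sketch assuming b = 4, N = 1, and a split of the pointed bucket n upon any overflow (the trigger policy and names are illustrative):

    b, N = 4, 1
    buckets = [[]]   # bucket 0
    i, n = 0, 0      # file level and split pointer

    def h(level, c):
        return c % (2 ** level * N)

    def insert(c):
        global i, n
        a = h(i, c)
        if a < n:                    # bucket a was already split this round
            a = h(i + 1, c)
        buckets[a].append(c)
        if len(buckets[a]) > b:      # an overflow -> split bucket n
            old = buckets[n]
            buckets.append([k for k in old if h(i + 1, k) != n])
            buckets[n] = [k for k in old if h(i + 1, k) == n]
            n += 1
            if n == 2 ** i * N:      # round over: the address space doubles
                n, i = 0, i + 1

    for key in (35, 12, 7, 15, 24, 32, 58, 21, 11, 33):
        insert(key)
    print(i, n, buckets)
    # -> 2 0 [[12, 24, 32], [21, 33], [58], [35, 7, 15, 11]]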
39 Addressing Algorithm
- a ← h(i, c)
- if n = 0 then exit
- else
- if a < n then a ← h(i+1, c)
- end
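The same algorithm as a runnable function; a sketch assuming the h of the snippets above, with (i, n) taken from the client's image:

    def client_address(c, i, n):
        # Client-side LH addressing (the algorithm above)
        a = h(i, c)
        if n != 0 and a < n:
            a = h(i + 1, c)
        return a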
40 LH*
- Property of LH*
- Given j = i or j = i + 1, key c is in bucket m iff
- h_j(c) = m, j = i or j = i + 1
- Verify yourself (a check is sketched below)
- Ideas for LH*
- the LH addressing rule is the global rule for the LH* file
- every bucket is at a server
- the bucket level j is in the header
- Check the LH property when the key comes from a client
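One way to verify: recompute each bucket's level j from (i, n) and check that h_j(c) = m for every stored key c; a sketch over the toy file built earlier (level is an illustrative helper):

    def level(m, i, n):
        # Buckets m < n and m >= 2^i * N have split this round
        # and are at level i + 1; the others are at level i.
        return i + 1 if (m < n or m >= 2 ** i * N) else i

    for m, bucket in enumerate(buckets):
        j = level(m, i, n)
        assert all(h(j, c) == m for c in bucket)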
41 LH* file structure
[Diagram (slides 41-42): server buckets 0, 1, 2, 7, 8, 9 at levels j = 4, 4, 3, 3, 4, 4; coordinator state n = 2, i = 3; two clients with images (n' = 0, i' = 0) and (n' = 3, i' = 2)]
43 LH* split
[Diagram sequence (slides 43-45): the coordinator makes bucket n = 2 split; bucket 10 is created, both at level j = 4; the pointer moves to n = 3 with i = 3; the client images are unchanged]
46 LH* Addressing Schema
- Client
- computes the LH* address m of c using its image
- sends c to bucket m
- Server
- Server a getting key c, a = m in particular, computes:
- a' ← h_j(c)
- if a' = a then accept c
- else a'' ← h_{j-1}(c)
- if a'' > a and a'' < a' then a' ← a''
- send c to bucket a'
47 LH* Addressing Schema
- Simple? See LNS93 for the (long) proof
- A runnable rendering follows below
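A sketch of the server step; j is the receiving bucket's level and the returned address is where c belongs (at most two forwardings, per the LNS93 proof):

    def server_address(a, j, c):
        # Bucket a at level j receives key c
        a1 = h(j, c)                 # a'
        if a1 == a:
            return a                 # correct bucket: accept c
        a2 = h(j - 1, c)             # a''
        if a < a2 < a1:
            a1 = a2
        return a1                    # forward c to bucket a1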
48 Client Image Adjustment
- The IAM consists of the address a where the client sent c, and of j(a)
- i' is the presumed i in the client's image
- n' is the presumed value of pointer n in the client's image
- initially, i' = n' = 0
- if j > i' then i' ← j - 1, n' ← a + 1
- if n' ≥ 2^i' then n' ← 0, i' ← i' + 1
- The algorithm guarantees that the client image is within the file [LNS93]
- if there are no file contractions (merges)
- A sketch follows below
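A sketch of the adjustment as a function of the IAM's (a, j); representing the image as a pair (i', n') is illustrative:

    def adjust_image(i1, n1, a, j):
        # Client image adjustment on an IAM carrying (a, j);
        # i1, n1 stand for the presumed i', n' (initially 0, 0)
        if j > i1:
            i1, n1 = j - 1, a + 1
        if n1 >= 2 ** i1:
            n1, i1 = 0, i1 + 1
        return i1, n1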
49 LH* addressing
[Diagram sequence (slides 49-51): a client with image (n' = 0, i' = 0) sends key 15 to bucket 0; the servers forward it to its correct bucket; the IAM adjusts the client image to (n' = 1, i' = 3)]
52 LH* addressing
[Diagram sequence (slides 52-55): the client sends key 9 to bucket 0; the query is forwarded until bucket 9 accepts it; the IAM again adjusts the image to (n' = 1, i' = 3)]
56 Result
- The distributed file can grow to even the whole Internet, so that:
- every insert and search is done in at most four messages (IAM included)
- in general, an insert is done in one message and a search in two messages
- proof in [LNS 93]
57 10,000 inserts
[Charts: global cost and client's cost]
60 Inserts by two clients
61 Parallel Queries
- A query Q for all buckets of file F, with independent local executions
- every bucket should get Q exactly once
- The basis for function shipping
- fundamental for high-performance DBMS applications
- Send Mode
- multicast
- not always possible or convenient
- unicast
- the client may not know all the servers
- servers have to forward the query
- how?
[Diagram: the client's image vs. the actual file]
62 LH* Algorithm for Parallel Queries (unicast)
- The client sends Q to every bucket a in its image
- The message with Q carries the message level j'
- initially j' = i' if n' ≤ a < 2^i', else j' = i' + 1
- Bucket a (of level j) copies Q to all its children using the algorithm:
- while j' < j do
- j' ← j' + 1
- forward (Q, j') to bucket a + 2^(j'-1)
- endwhile
- Prove it! (a sketch follows below)
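A sketch of both halves, assuming a send(a, Q, j') callback and each bucket knowing its own level j; the client image covers buckets 0 .. 2^i' + n' - 1:

    def client_scan(Q, i1, n1, send):
        # Send Q once to every bucket a in the client image (i1, n1)
        for a in range(2 ** i1 + n1):
            jp = i1 if n1 <= a < 2 ** i1 else i1 + 1
            send(a, Q, jp)

    def bucket_forward(a, j, Q, jp, send):
        # Bucket a (level j) relays Q to the children only it knows of
        while jp < j:
            jp += 1
            send(a + 2 ** (jp - 1), Q, jp)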
63 Termination of Parallel Query (multicast or unicast)
- How does client C know that the last reply came?
- Deterministic Solution (expensive)
- Every bucket sends its j, m, and the selected records if any
- m is its (logical) address
- The client terminates when it has every m fulfilling the condition:
- m = 0, 1, ..., 2^i + n - 1, where
- i = min(j) and n = min(m) where j = i
- (a check is sketched below)
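The deterministic check, sketched over a dict that maps each bucket m that replied so far to its level j (the representation is illustrative):

    def scan_complete(replies):
        # replies: {m: j} for every bucket that answered so far
        i = min(replies.values())
        n = min(m for m, j in replies.items() if j == i)
        return all(m in replies for m in range(2 ** i + n))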
64 Termination of Parallel Query (multicast or unicast)
- Probabilistic Termination (may need less messaging)
- all and only the buckets with selected records reply
- after each reply, C reinitializes a time-out T
- C terminates when T expires
- The practical choice of T is network and query dependent
- e.g., 5 times the average Ethernet retry time
- 1-2 msec?
- experiments needed
- Which termination is finally more useful in practice?
- an open problem (a sketch of the time-out loop follows below)
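A sketch of the probabilistic rule with a re-armed time-out over a plain queue; the 2 ms default is only the slide's tentative value:

    import queue

    def collect(replies_q, T=0.002):
        # Re-arm the time-out T after each reply; stop when T expires
        out = []
        while True:
            try:
                out.append(replies_q.get(timeout=T))
            except queue.Empty:
                return out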
65 LH* variants
- With/without load (factor) control
- With/without the (split) coordinator
- the former was discussed
- the latter is a token-passing schema
- the bucket with the token is next to split
- if an insert occurs and file overload is guessed
- several algorithms for the decision
- use cascading splits
- See the talk on LH* at http://ceria.dauphine.fr/
66 RP* schemes
- Produce scalable distributed 1-d ordered files
- for range search
- Each bucket (server) has the unique range of keys it may contain
- Ranges partition the key space
- Ranges evolve dynamically through splits
- Transparently for the application
- Use RAM m-ary trees at each server
- Like B-trees
- Optimized for the RP* split efficiency (a partitioning sketch follows below)
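A minimal sketch of the range-partitioning idea only (not the actual RP* algorithms): the key space is cut by a sorted list of separator keys, and a split promotes the middle key of an overflowing bucket to a new separator:

    import bisect

    separators = []   # bucket k holds keys in [separators[k-1], separators[k])

    def bucket_for(key):
        return bisect.bisect_right(separators, key)

    def split(keys):
        # Move the upper half of an overflowing bucket's keys to a new bucket
        keys.sort()
        mid = keys[len(keys) // 2]
        bisect.insort(separators, mid)
        return [k for k in keys if k < mid], [k for k in keys if k >= mid]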
67 Current PDBMS technology (Ex. Non-Stop SQL)
- Static Range Partitioning
- Done manually by the DBA
- Requires good skills
- Not scalable
68 RP* schemes
69 High-availability SDDS schemes
- Data remain available despite
- any single-server failure & most two-server failures
- or up to any n-server failures
- and some catastrophic failures
- n scales with the file size
- To offset the reliability decline that would otherwise occur
- Three principles for high-availability SDDS schemes are currently known:
- mirroring (LH*m)
- striping (LH*s)
- grouping (LH*g, LH*SA, LH*RS, RP*RS)
- They realize different performance trade-offs
70 LH*RS Record Groups
- LH*RS records
- LH* data records & parity records
- Records with the same rank r in the bucket group form a record group
- Each record group gets n parity records
- Computed using Reed-Solomon erasure-correcting codes
- Additions and multiplications in Galois Fields
- See the Sigmod 2000 paper on the Web site for details
- r is the common key of these records
- Each group supports unavailability of up to n of its members (a toy illustration follows below)
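A toy illustration of a record group with a single parity record, where the Reed-Solomon computation degenerates to XOR; real LH*RS uses Galois-field arithmetic to support n > 1 parity records:

    def parity(group):
        # XOR the payloads of the records sharing rank r
        out = bytearray(max(len(p) for p in group))
        for payload in group:
            for k, byte in enumerate(payload):
                out[k] ^= byte
        return bytes(out)

    group = [b"alice", b"bob\x00\x00", b"carol"]
    p = parity(group)
    # Any single unavailable member is recoverable from the others + parity:
    assert parity([group[0], group[2], p]) == group[1]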
71 LH*RS Record Groups
[Diagram: data records and their parity records]
72 LH*RS Parity Management
- An insert of a data record with rank r creates or, usually, updates the parity records r
- An update of a data record with rank r updates the parity records r
- A split recreates the parity records
- Data records usually change their rank after the split
73 LH*RS Scalable availability
- Create 1 parity bucket per group until M = 2^i1 buckets
- Then, at each split,
- add a 2nd parity bucket to each existing group
- create 2 parity buckets for new groups until 2^i2 buckets
- etc. (see the sketch below)
74 LH*RS Scalable availability
[Diagram sequence (slides 74-78): parity buckets added per group as the file scales]
79 SDDS-2000 global architecture
80 SDDS-2000 global architecture
[Diagram: applications at client machines and SDDS servers communicating over UDP & TCP]
81 SDDS-2000 Client Architecture (RP*c)
- 2 Modules
- Send Module
- Receive Module
- Multithread Architecture
- SendRequest
- ReceiveRequest
- AnalyzeResponse1..4
- GetRequest
- ReturnResponse
- Synchronization Queues
- Client Images
- Flow control
82 SDDS-2000 Server Architecture (RP*c)
- Multithread architecture
- Synchronization queues
- Listen Thread for incoming requests
- SendAck Thread for flow control
- Work Threads for
- request processing
- response sendout
- request forwarding
- UDP for shorter messages (< 64 KB)
- TCP/IP for longer data exchanges
- Several buckets of different SDDS files
83 AMOS-SDDS Architecture
- For database queries
- Especially parallel scans
- Couples SDDS-2000 and AMOS-II
- a RAM OR-DBMS
- AMOSQL declarative query language
- can be embedded into C and Lisp
- call-level interface (callin)
- external procedures (functions) (callout)
- See the AMOS-II talks & papers for more
- http://www.dis.uu.se/udbl/
84 AMOS-SDDS Architecture
- SDDS is used as the distributed RAM storage manager
- The RP* scheme provides the scalable distributed range partitioning
- As in a B-tree, records are lexicographically ordered according to their keys
- Range queries are supported efficiently
- AMOS-II provides a fast SDDS-based RAM OR-DBMS
- The callout capability realizes the AMOSQL object-relational capability usually called external or foreign functions
85 AMOS-SDDS Architecture
[Diagram: AMOS-SDDS architecture]
86 AMOS-SDDS Architecture
[Diagram: AMOS-SDDS scalable distributed query processing]
87 AMOS-SDDS Server Query Processing
- E-strategy
- Data stay external to AMOS
- within the SDDS bucket
- Custom foreign functions perform the query
- I-strategy
- Data are imported on the fly into AMOS-II
- Perhaps with local index creation
- Good for joins
- AMOS performs the query
- Which strategy is preferable?
- Good question
88 SD-AMOS
- The server storage manager is a full-scale DBMS
- AMOS-II in our case, since it is a RAM DBMS
- Could be any DBMS
- SDDS-2000 provides the scalable distributed partitioning schema
- The server DBMS performs the splits
- When and how?
- The client manages scalable query decomposition & execution
- Easier said than done
- The whole system generalizes the PDBMS technology
- Static partitioning only
89 Scalability Analysis
- Theoretical
- To validate an SDDS
- See the papers
- To get an idea of system performance
- Limited validity
- Experimental
- More accurate validation of design issues
- A practical necessity
- Costs orders of magnitude more time and money
90 Experimental Configuration
- 6 machines: 700 MHz P3
- 100 Mb/s Ethernet
- 150-byte records
91 LH* file creation
- LH* scalability is confirmed
[Charts: file creation time (ms) vs. number of buckets and vs. number of inserts, with and without splits; bucket size b = 5,000, flow control on]
- Performance bound by the client processing speed
- Ph.D. Thesis of F. Bennour, 2000
92 LH* key search
[Chart: search time (ms) vs. file size in records and servers (2 to 5); actual client image, new client image, and IAMs]
- Performance bound by the client processing speed
93 LH*RS Experimental Performance (Preliminary results)
[Chart: insert time (ms) during the file creation (moving average) vs. number of records]
94 LH*RS Experimental Performance (Preliminary results)
[Chart: file creation time (s) vs. number of records]
95 LH*RS Experimental Performance (Preliminary results)
- Normal key search
- Unaffected by the parity calculus
- 0.3 ms per key search
- Degraded key search
- About 2 ms for the application
- 1.1 ms (k = 4) for the record recovery
- 1 ms for the client time-out and the coordinator action
- Bucket recovery at the spare
- 0.3 ms per record (k = 4)
99 Performance Analysis (RP*)
- Experimental Environment
- Six Pentium III 700 MHz
- Windows 2000
- 128 MB RAM, extended later to 256 MB RAM
- 100 Mb/s Ethernet
- Messages
- 180 bytes: 80 for the header, 100 for the record
- Keys are random integers within some interval
- Flow Control: sliding window of 10 messages
- Index
- Capacity of an internal node: 80 index elements
- Capacity of a leaf: 100 records
100 Performance Analysis
- File Creation
- Bucket capacity: 50,000 records
- 150,000 random inserts by a single client
- With flow control (FC) or without
[Charts: file creation time; average insert time]
101 Discussion
- Creation time is almost linearly scalable
- Flow control is quite expensive
- Losses without it were negligible
- Both schemes perform almost equally well
- RP*c slightly better
- As one could expect
- Insert time about 30 times faster than for a disk file
- Insert time appears bound by the client speed
102 Performance Analysis: File Creation
- File created by 120,000 random inserts by 2 clients
- Without flow control
[Charts: comparative file creation time by one or two clients; file creation by two clients, total time and per insert]
103 Discussion
- Performance improves
- Insert times appear bound by a server's speed
- More clients would not improve the performance of a server
104 Performance Analysis: Split Time
[Chart: split times versus bucket capacity]
105 Discussion
- About linear scalability as a function of bucket size
- Larger buckets are more efficient
- Splitting is very efficient
- Reaching as little as 40 µs per record
106 Performance Analysis: Key Search
- A single client sends 100,000 successful random search requests
- Flow control: the client sends at most 10 requests without reply
[Chart: search time (ms)]
107 Performance Analysis: Key Search
- Same experiment
[Charts: total search time; search time per record]
108 Discussion
- Single search time about 30 times faster than for a disk file
- 350 µs per search
- Search throughput more than 65 times faster than that of a disk file
- 145 µs per search
- RP*N appears again surprisingly efficient with respect to RP*c for more buckets
109 Performance Analysis
- Range Query
- Deterministic termination
- Parallel scan of the entire file, with all 100,000 records sent to the client
[Charts: range query total time; range query time per record]
110 Discussion
- Range search appears also very efficient
- Reaching 10 µs per record delivered
- More servers should further improve the efficiency
- The curves do not become flat yet
111 Range Query: Parallel Execution Strategies
Study by M. Tsangou (Master's thesis) & Prof. Samba (U. Dakar)
- Scenario 3: 1 server at a time
- Scenarios 1, 2: all servers together
- Scenario 1: a single connection request per server
[Chart: response time (ms) vs. number of servers]
112 File Size Limits
- Bucket capacity: 751K records, 196 MB
- Number of inserts: 3M
- Flow control (FC) is necessary to limit the input queue at each server
113 File Size Limits
- Bucket capacity: 751K records, 196 MB
- Number of inserts: 3M
- GA = Global Average; MA = Moving Average
114 Related Works
[Chart: comparative analysis of related works, labeled "suspicious"]
115 AMOS-SDDS
- Benchmark data
- Table Pers (SS, Name, City)
- Size: 20,000 to 300,000 tuples
- 50 cities
- Random distribution
- Benchmark queries
- Join: SS and Name of the persons in the same city
- Nested loop or local index
- Count Join: count the couples in the same city
- To determine the result transfer time to the client
- Count(*) Pers, Max(SS) from Pers
- Measures
- Scale-up & Speed-up
- Comparison to AMOS-II alone
116 Join: best time per tuple
[Chart: best join times per tuple of 14.4, 2.4, and 1.6; 20,000 tuples in Pers; AMOS-II alone: 13.5 (nested loop), 2.25 (index lookup); 3,990,070 tuples produced]
117 Join & Count: best time per tuple, I-strategy
[Chart: best times per tuple of 1.8 and 1.0; 20,000 tuples in Pers; AMOS-II alone: 13.5 (nested loop), 2.25 (index lookup); 3,990,070 tuples produced]
118 Join Speed-up, I-strategy
[Chart: join speed-up; 20,000 tuples in Pers; AMOS-II alone: 13.5 (nested loop), 2.25 (index lookup); 3,990,070 tuples produced]
119 Count Speed-up
[Chart: count speed-up reaching 341; the E-strategy wins; 100,000 tuples in Pers; AMOS-II alone: 280 ms]
120 Join Scale-up Performance
- The file scales to 300,000 tuples
- Spreading from 1 to 15 AMOS-SDDS servers
- Transparently for the application!
- 3 servers per machine
- The poor man's configuration has only 5 server machines
- Results are extrapolated to 1 server per machine
- Basically, the CPU component of the elapsed time is divided by 3
121 Join Elapsed Time Scale-up
[Chart: AMOS-SDDS, I-strategy with index-lookup join]
122 Join Elapsed Time Scale-up
[Chart]
123 Join Time per Tuple Scale-up
- Better scalability than any current P-DBMS could provide
- Join with Count: flat!
124 SD-AMOS File Creation
[Chart: insert time (ms) vs. number of servers & inserts; global avg. and moving avg.]
- Flat, unexpectedly fast insert time
- b = 4,000
125 SD-AMOS Large File Creation
[Chart: insert time (ms) vs. number of servers & inserts; global avg. and moving avg.]
- The flat, fast insert time remains
- b = 40,000
126 SD-AMOS Large File Search (time per record)
- File of 300K records
- The client's maximum processing speed is reached
127 SD-AMOS Very Large File Creation
- Bucket size: 750K records; max file size: 3M records; record size: 100 B
128 Conclusion
- SDDS-2000: a prototype SDDS manager for a Windows multicomputer
- Several variants of LH* and RP*
- High availability
- Scalable distributed database query processing
- AMOS-SDDS & SD-AMOS
129 Conclusion
- Experimental performance of the SDDS schemes appears in line with the expectations
- Record search & insert times in the range of a fraction of a millisecond
- About 30 to 100 times faster than disk file access performance
- About ideal (linear) scalability
- Including the query processing
- The results prove the overall efficiency of the SDDS-2000 system
130 Current & Future Work
- SDDS-2000 Implementation
- High Availability through RS Codes
- CERIA & U. Santa Clara (Prof. Th. Schwarz)
- Disk-based High-Availability SDDSs
- CERIA & IBM Almaden (J. Menon)
- Parallel Queries
- U. Dakar (Prof. S. Ndiaye) & CERIA
- Concurrency & Transactions
- U. Dakar (Prof. T. Seck)
- Overall Performance Analysis
- SD-AMOS & SD-DBMS in general
- CERIA & U. Uppsala (Prof. T. Risch) & U. Dakar
- SD-SQL-Server?
- The extent depends basically on the available funding
131 Credits
- SDDS-2000 Implementation
- CERIA Ph.D. Students
- F. Bennour (now Post-Doc) (SDDS-2000 LH*)
- A. Wan Diene, Y. Ndiaye (SDDS-2000 RP*, AMOS-SDDS, SD-AMOS)
- R. Moussa (RS Parity Subsystem)
- Master's Student Theses
- At CERIA and cooperating universities
- See the CERIA Web page: ceria.dauphine.fr
- Partial support for the SDDS-2000 research
- HPL, IBM Research, Microsoft Research
132 Problems & Exercises
- Install SDDS-2000 and experiment with the interactive application. Comment on the experience in a few pages.
- Create your favorite application using the .h interfaces provided with the package.
- Comment in a few pages on the LH*RS goal and way of working, on the basis of the Sigmod paper and the WDAS-2002 paper by Rim Moussa.
- Comment in a few pages on the strategies for scalable distributed hash joins according to the paper by D. Schneider & al. Your own?
- Can you propose how to deal with theta-joins?
- You should split a table under one SQL Server into two tables on two SQL Servers. You wish to generate RP*-like range partitioning on the key attribute(s), using as much as possible standard SQL queries.
- How can you find the record with the median (middle) key?
- What are the catalog tables where you can find the table size, the associated check constraints, indexes, triggers, and stored procedures to move to the new node?
- How will you move the records that should leave for the new node?
- Idem for a stored function under AMOS
- Idem for Oracle
- Idem for DB2
133 END
- Thank you for your attention
Witold Litwin: litwin@dauphine.fr, wlitwin@cs.berkeley.edu