Title: RCAN A Relational Database over CANlike Network
1RCANA Relational Database over CAN-like Network
- Wenhan Wang
- Jinwoo Kim
- Lan Zhang
- Daskos Antonios
- and possibly a few more
2Background
- Traditional database operation is executed on a
single node - Data size is exploding nowadays
- Peer-to-peer network, why not?
3Limitation of CAN
- CANs hash nature only provides exact query
facility - e.g. get(key)
- CAN cannot handle a range query
- e.g. get(alt20)
- Cannot be achieved with multiple requests
- Broadcasting is unacceptable
4Goal
- Design and implement RCAN (ouR CAN)
- A relational database over modified CAN, which
supports a subset of query language (select,
project and join) with the benefits of a highly
distributed decentralized system.
5System Overview
Network
6System Overview (continue..)
- Place data intelligently on a set of nodes
- Query only on these relevant nodes
- Terminate query
- Send query results back to user node
7Intelligent node deployment
- CAN doesnt know where query data is
- CAN needs relational data placement to process
Query effectively - Apply MAGIC (Multi Attribute GrId deClustring) to
CAN
8Intelligent node deployment CAN
SELECT FROM Items WHERE price lt 50 and price
gt 40
MUST visit 36 node to get all required data
9Intelligent node deployment RCAN
Assign note by specific attribute
A-D E-H I-L M-P Q-T U-Z
Ticker_Symbol
0-10
11-20
21-30
31-40
41-50
Price
51-60
10MAGIC
SELECT FROM Items WHERE price lt 50 and price gt
40
Ticker_Symbol
A-D E-H I-L M-P Q-T U-Z
0-10
11-20
21-30
31-40
41-50
Price
51-60
Just 6 nodes are required to get query data
11MAGIC
SELECT FROM Items WHERE Pricelt50 AND Pricegt40
AND Ticker_Symbol AXP
Ticker_Symbol
A-D E-H I-L M-P Q-T U-Z
0-10
11-20
21-30
31-40
41-50
Price
51-60
Just 1 node is required when query uses two
attributes
12Intelligent node deployment RCAN
Request add(ltCJF,25gt) is forwarded to point
ltCJF,25gt of our coordination system, thus node 13
A-D E-H I-L M-P Q-T U-Z
Ticker_Symbol
0-10
11-20
21-30
31-40
41-50
Price
51-60
13RCAN
- Node deployment by attributes, MAGIC
- Not by hash function value
- Reduce searching cost
- Questions rise about implementation (shown later)
14RCAN SELECT Query - Steps
- SELECT A1, A2 FROM R WHERE A1 lt v1 AND A2 lt v2
- Pre-request (client)
- Where?
- Threads?
- Message?
- Optimization
- Request (peer-to-peer network)
- Target node
- Propagation
- Post-request (client)
- Collection
- Termination
15RCAN SELECT Query (example)
R(A1,A2) A1 text (e.g. full name) A2 number
(e.g. age)
SELECT FROM R WHERE A1 D AND A2 40 ID
No 234947 Level 0 Client IP
SELECT FROM R WHERE A1 D AND A2 40
R
L0 1
16RCAN SELECT Query (example)
R(A1,A2) A1 text (e.g. full name) A2 number
(e.g. age)
R1
SELECT FROM R WHERE A1 D AND A2 40 ID
No 234947 Level 1 Client IP
R1
R1
D
Level0 Messages 2 Data (A-D),(30-40)
L0 0 L1 2
L0 1
17RCAN SELECT Query (example)
R(A1,A2) A1 text (e.g. full name) A2 number
(e.g. age)
R2
R2
R2
SELECT FROM R WHERE A1 D AND A2 40 ID
No 234947 Level 2 Client IP
D
R2
R2
Level1 Messages 1 Data (A-D),(0-30)
L0 0 L1 2
L0 0 L2 1 L1 1
18RCAN SELECT Query (example)
R(A1,A2) A1 text (e.g. full name) A2 number
(e.g. age)
R3
SELECT FROM R WHERE A1 D AND A2 40 ID
No 234947 Level 3 Client IP
R3
L0 0 L2 3 L1 0
19RCAN SELECT Query (example)
R(A1,A2) A1 text (e.g. full name) A2 number
(e.g. age)
R3
R3
SELECT FROM R WHERE A1 D AND A2 40 ID
No 234947 Level 3 Client IP
L0 0 L2 0 L1 0 L3 2
20RCAN SELECT Query Repetitions
Termination Condition All replies received
- If a node receives the same request from two
different nodes? - Knows which queries it has served (ID) no
resend - Replies back with empty message for consistency
- We try to reduce the occurence of such cases to
minimum
21RCAN SELECT Query - Forwarding
- Forwarding rules (that reduce repetition)
- If already served dont
- Forward to all nodes whose zone intersects with
query ranges - If received by neighbor of dimension i forward
to all neighbors of dimensions 1..i except sender
itself - other ideas still to be considered
Reduces greatly but doesnt eliminate
22RCAN SELECT Query - Steps
- SELECT A1, A2 FROM R WHERE A1 lt v1 AND A2 lt v2
- Pre-request (client)
- Where Node (v1,v2)
- Threads Disjunction of conjectures ORs
parallelized - Message Query ID Query body Level
- Optimization ()
- Request (peer-to-peer network)
- Target node Simple CAN functionality
- Propagation Controlled flooding based on zones
- Post-request (client)
- Collection Direct connection for each piece
- Termination Counter of remaining replies
23RCAN SELECT Query - Questions
- Storing of data
- Initial state of RCAN network
- Distribution over values
- Special fields (booleans dates)
- Splitting algorithm infinite case
- Termination conditions efficiency/accuracy
- Storing of level-values
- Loss of message or failure of nodes no reply or
partial? - Number of repeated messages
- Current belief suggests a few nodes receiving the
same request at most twice - Elimination probable?
- Multiple replies or collection of data in one
node? - Security and speed over bandwidth
24RCAN Group Research Progress Milestone
Study Read papers, propose some basic ideas of
relational queries in RCAN system (such as
SELECT, Terminate condition, etc)
Implementation
Evaluation Testing Refine
End of the semester
10/14
10/28
10/29
11/26
10/8
10/28
11/18
11/27
10/15
(1)
(2)
(3)
(4)
(5)
Finish Prototype Design Add JOIN, etc.
Communicate with CAN group and finish the
prototype design.
Integration Integration with other modules
25RCAN