Title: A Scalable Distributed Information Management System (SDIMS)
- P. Yalagandula, M. Dahlin
- cs.utexas.edu
- SIGCOMM 2004
Outline
- Introduction
- Goal Aggregation
- Innovation
- Flexibility
- Scalability
- Robustness
- Implementation
- Evaluation
- Conclusions
Introduction
- Why SDIMS?
- Monitoring, querying, and reacting to changes are core components of applications such as system management, service placement, data sharing and caching.
- An SDIMS in a networked system would provide a distributed operating system backbone and facilitate the development and deployment of new distributed services.
Introduction (cont.)
- Fundamental approach: hierarchical aggregation
- A node accesses detailed views of nearby information and summary views of global information.
- A hierarchical system aggregates information through reduction trees.
Introduction (cont.)
- An SDIMS should have four properties:
- Scalability
- Flexibility
- Administrative isolation
- Robustness
Scalability
- SDIMS should accommodate large numbers of nodes.
- SDIMS should allow applications to install and
monitor large numbers of data attributes.
Flexibility
- SDIMS should accommodate a range of applications and attributes.
- Read-dominated attributes (change rarely)
- e.g., number of CPUs
- Write-dominated attributes (change often)
- e.g., number of processes
- SDIMS should leave the policy decision of tuning replication to applications.
Administrative isolation
- Nodes can be arranged in an organizational or administrative hierarchy.
- Domain-based control:
- Monitor
- Query
Robustness
- SDIMS should adapt to reconfigurations in a timely fashion when nodes fail or disconnect.
- SDIMS should provide mechanisms so that applications can trade off the cost of adaptation against the consistency level of aggregated results when reconfigurations occur.
Related Work
- Astrolabe
- A single logical aggregation tree that mirrors a system's administrative hierarchy.
- A general interface for installing new aggregation functions.
- An unstructured gossip protocol for disseminating information and replicating all aggregated attribute values for a subtree to all nodes in the subtree.
Related Work (cont.)
- Any node can answer queries using local information.
- Not scalable (replication)
- Not flexible (attribute types)
- Solution: P2P (DHT)
Tree
- For each level in the hierarchy, the agent
maintains a record with the list of child zones
(and their attributes), and which child zone
represents its own zone (self).
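The sketch below gives a rough illustration of the per-level record above by building one record per level for an agent. It assumes that zones are named by path strings and that child attributes are plain dictionaries. `LevelRecord`, `records_for`, and the field names are hypothetical and are not taken from Astrolabe's code.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical attribute table (MIB) of one child zone: attribute name -> value.
Mib = Dict[str, float]


@dataclass
class LevelRecord:
    """Record an Astrolabe-style agent keeps for one level of the hierarchy."""
    zone: str                                   # zone this level describes, e.g. "/univ"
    self_child: str                             # which child zone contains this agent
    children: Dict[str, Mib] = field(default_factory=dict)


def records_for(agent_path: List[str]) -> List[LevelRecord]:
    """Build one LevelRecord per level for an agent.

    agent_path lists zone names from the top down to the agent itself,
    e.g. ["univ", "dept", "host7"] for the agent /univ/dept/host7.
    """
    records = []
    for level, child in enumerate(agent_path):
        zone = "/" + "/".join(agent_path[:level])   # "/" for the root zone
        # Only the agent's own child zone is known initially; gossip fills in siblings.
        records.append(LevelRecord(zone=zone, self_child=child, children={child: {}}))
    return records
```

For the agent /univ/dept/host7, this produces records for the zones "/", "/univ", and "/univ/dept". Each record marks the child zone on the agent's own path as "self".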
Gossip protocol
- Periodically, each agent selects some other agent at random and exchanges state information with it.
- If the two agents are in the same zone, the state exchanged relates to the MIBs in that zone.
- If the two agents are in different zones, they exchange state associated with the MIBs of their least common ancestor zone.
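The zone-selection rule in the gossip protocol reduces to computing the least common ancestor of two agents' zone paths. This is a minimal sketch, assuming agents are named by zone paths such as `["univ", "dept", "h1"]`. `shared_zone` and `gossip_round` are hypothetical helpers, and the actual state exchange is not modeled.

```python
import random
from typing import List


def shared_zone(a: List[str], b: List[str]) -> str:
    """Return the least common ancestor zone of two agents.

    Each agent is a path of zone names ending in the agent's own name.
    Agents in the same zone share every component except the last,
    so their least common ancestor is exactly that zone.
    """
    common = []
    # Compare zone components only (drop each agent's own name at the end).
    for x, y in zip(a[:-1], b[:-1]):
        if x != y:
            break
        common.append(x)
    return "/" + "/".join(common)


def gossip_round(me: List[str], agents: List[List[str]], rng: random.Random) -> str:
    """Pick a random peer and return the zone whose MIBs the pair would exchange."""
    peer = rng.choice([p for p in agents if p != me])
    return shared_zone(me, peer)
```

A single rule covers both cases on the slide. Agents in the same zone get that zone back, and agents in different zones get their least common ancestor.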
Related Work (cont.)
- DHT
- SkipNet, CAN, Pastry, Chord, Tapestry
Problem
- How to scalably map different attributes to different aggregation trees in a DHT mesh? → physical network vs. overlay network
- How to provide flexibility in the aggregation to accommodate different application requirements? → flexible API for installing and controlling the system
Problem (cont.)
- How to adapt a DHT mesh to attain the administrative isolation property? → virtual organization
- How to provide robustness without unstructured gossip and total replication? → cache pre-computing or on-demand re-aggregation
Aggregation Abstraction
Aggregation Abstraction (cont.)
- Each physical node in the system is a leaf in the tree.
- An internal non-leaf node, which we call a virtual node, is simulated by one or more physical nodes at the leaves of the subtree for which the virtual node is the root.
Aggregation Abstraction (cont.)
- Each physical node has local data stored as a set of (attributeType, attributeName, value) tuples.
- The system associates an aggregation function $f_{type}$ with each attribute type.
Aggregation Abstraction (cont.)
- Each level-i subtree $T_i$ in the system has an aggregate value $V_{i,type,name}$ for each (attributeType, attributeName) pair.
- The aggregate value for a level-i subtree $T_i$ is the aggregation function for the type, $f_{type}$, computed across the aggregate values of each of $T_i$'s k children:
$V_{i,type,name} = f_{type}(V^{0}_{i-1,type,name}, V^{1}_{i-1,type,name}, \ldots, V^{k-1}_{i-1,type,name})$
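The recursive definition of $V_{i,type,name}$ can be sketched as a tree walk. Leaves return their local tuple value, and every internal (virtual) node applies $f_{type}$ to its children's aggregates. This sketch assumes that every leaf stores the requested key. It also looks up aggregation functions in a hypothetical `AGG` table keyed by attribute type.

```python
from typing import Callable, Dict, List, Tuple, Union

Key = Tuple[str, str]                    # (attributeType, attributeName)
AggFn = Callable[[List[float]], float]

# Hypothetical registry: one aggregation function f_type per attribute type.
AGG: Dict[str, AggFn] = {"sum": sum, "max": max}

# A physical node (leaf) holds its local data; a virtual node is a list of children.
Tree = Union[Dict[Key, float], List["Tree"]]


def aggregate(tree: Tree, key: Key) -> float:
    """Compute V_{i,type,name} as f_type over the aggregates of the k children."""
    attr_type, _ = key
    if isinstance(tree, dict):           # level-0 subtree: a physical node
        return tree[key]
    child_values = [aggregate(child, key) for child in tree]
    return AGG[attr_type](child_values)
```

For example, a two-level tree with leaf values 3, 4, and 5 for a "sum" attribute aggregates to 12 at the root.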
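Aggregation Abstraction (cont.)
- Examples of $f_{type}$:
- $AVG(V_1, \ldots, V_n) = \frac{1}{n}\sum_{i=1}^{n} V_i$ (does not satisfy the property on its own)
- $SUM(V_1, \ldots, V_n) = \sum_{i=1}^{n} V_i$ (satisfies the property)
- Aggregation functions must satisfy the hierarchical computation property:
$f(v_1, \ldots, v_n) = f(f(v_1, \ldots, v_{s_1}), f(v_{s_1+1}, \ldots, v_{s_2}), \ldots, f(v_{s_k+1}, \ldots, v_n))$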
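Here is a quick numeric check of the hierarchical computation property, splitting four values unevenly into two subtrees. SUM gives the same answer either way, while a plain average of averages does not. Carrying (sum, count) pairs up the tree and dividing once at the root restores the property. The pair encoding is a common workaround shown here for illustration; the slide does not specify it.

```python
from typing import List, Tuple


def avg(values: List[float]) -> float:
    return sum(values) / len(values)


def combine_pairs(pairs: List[Tuple[float, int]]) -> Tuple[float, int]:
    """Aggregate (sum, count) tuples; this encoding satisfies the property."""
    return (sum(s for s, _ in pairs), sum(c for _, c in pairs))


values = [1.0, 2.0, 3.0, 10.0]
groups = [values[:3], values[3:]]         # an uneven split into two subtrees

# SUM: aggregating the subtree results equals aggregating all values directly.
sum_ok = sum(sum(g) for g in groups) == sum(values)

# AVG: the average of subtree averages (6.0) differs from the true average (4.0).
avg_ok = avg([avg(g) for g in groups]) == avg(values)

# (sum, count): combine per-subtree pairs, then divide once at the root.
leaf_pairs = [[(v, 1) for v in g] for g in groups]
total, count = combine_pairs([combine_pairs(p) for p in leaf_pairs])
pair_avg = total / count

print(sum_ok, avg_ok, pair_avg)           # True False 4.0
```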
Aggregation Abstraction (cont.)
[Figure: aggregation tree with physical nodes as leaves and virtual nodes as internal nodes]
Innovation
- Flexibility
- Scalability
- Administrative isolation
- Robustness
Flexibility
- Operation API:
- Install
- Update
- Probe
Install Operation
- The Install operation installs an aggregation
function in the system.
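The slide does not give Install's signature, so the following is only a sketch of what a per-node install table might look like. `AggregationRegistry`, `install`, and the `up`/`down` parameters are all hypothetical. The up/down fields only record a propagation policy, in the spirit of the Update-Up-k-Down-j API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Aggregator = Callable[[List[float]], float]


@dataclass
class Installation:
    """Hypothetical record created by an Install call."""
    attr_type: str
    func: Aggregator
    up: Optional[int] = None    # levels to aggregate upward; None means up to the root
    down: int = 0               # levels to propagate aggregates downward


class AggregationRegistry:
    """Sketch of a per-node table of installed aggregation functions."""

    def __init__(self) -> None:
        self._installed: Dict[str, Installation] = {}

    def install(self, attr_type: str, func: Aggregator,
                up: Optional[int] = None, down: int = 0) -> None:
        # Re-installing a type replaces its previous function and policy.
        self._installed[attr_type] = Installation(attr_type, func, up, down)

    def apply(self, attr_type: str, child_values: List[float]) -> float:
        # Raises KeyError if no function was installed for this type.
        return self._installed[attr_type].func(child_values)

    def policy(self, attr_type: str) -> Installation:
        return self._installed[attr_type]
```

Keeping the propagation policy next to the function matches the goal of letting applications, rather than the system, decide how aggressively each attribute is replicated.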
Probe Operation
[Figure: probe operation; annotations mention reconfiguration and caching]
Probe Operation (cont.)
- When node A issues a continuous probe at level l for an attribute, updates for the attribute at any node in the subtree of A's level-l ancestor are aggregated up to level l and propagated down along the path from the ancestor to A.
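The following is a toy simulation of one update cycle of a continuous probe, on a small tree given as a child-to-parent map. It assumes SUM as the aggregation function and places all leaves at level 0. `ancestors` and `continuous_probe` are hypothetical names; messaging, caching, and timing are left out.

```python
from typing import Dict, List, Tuple


def ancestors(parent: Dict[str, str], node: str, levels: int) -> List[str]:
    """Return [node, its parent, grandparent, ...], climbing `levels` hops."""
    path = [node]
    for _ in range(levels):
        path.append(parent[path[-1]])
    return path


def continuous_probe(parent: Dict[str, str], leaf_values: Dict[str, float],
                     a: str, level: int) -> Tuple[float, List[str]]:
    """One update cycle of a continuous probe issued by leaf `a` at `level`.

    Returns the aggregate (SUM, by assumption) over the subtree of a's
    level-`level` ancestor, plus the nodes that receive it, ordered from
    that ancestor down to a. Assumes every leaf sits at depth >= `level`.
    """
    up_path = ancestors(parent, a, level)
    subtree_root = up_path[-1]
    # A leaf is in the subtree if climbing `level` hops from it reaches subtree_root.
    total = sum(v for leaf, v in leaf_values.items()
                if ancestors(parent, leaf, level)[-1] == subtree_root)
    return total, list(reversed(up_path))
```

With leaves a and b under p1, c under p2, and both p1 and p2 under r, a level-1 probe from a sees only a and b. A level-2 probe sees all three leaves, and the result travels r → p1 → a.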
Update and Probe Operations
Update and Probe Operations (cont.)
Update Operation API
- Update-Up-k-Down-j: aggregates updates up to the k-th level and propagates the aggregate value of a node at level l (l ≤ k) downward for j levels.
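One way to read the Update-Up-k-Down-j rule is as a table showing which levels end up holding a copy of each recomputed aggregate. The sketch below assumes that a level-l aggregate is copied to levels l-1 down to l-j, never below 0. It also assumes a leaf can answer a level-l read locally only when l ≤ k and l ≤ j. `replica_levels` and `leaf_reads_locally` are hypothetical helpers.

```python
from typing import Dict, List


def replica_levels(k: int, j: int) -> Dict[int, List[int]]:
    """For Update-Up-k-Down-j, map each level l (1 <= l <= k) whose aggregate
    is recomputed on an update to the lower levels that receive a copy.

    Level 0 holds the raw local value; a level-l aggregate is pushed down
    j levels but never below level 0.
    """
    return {l: list(range(l - 1, max(l - j, 0) - 1, -1)) for l in range(1, k + 1)}


def leaf_reads_locally(l: int, k: int, j: int) -> bool:
    """True if a level-0 node already holds the level-l aggregate."""
    return l == 0 or (l <= k and l <= j)
```

A larger j makes reads cheaper at the cost of more messages per write. A smaller k keeps writes local but pushes work onto probes.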
Operation API
[Figure: Update-Up-k-Down-j on a tree with Level-0 to Level-4, marking k (how far updates are aggregated up), l, and j (how far aggregates are pushed down)]
Dynamic Adaptation
- An SDIMS implementation can dynamically adjust its up/down strategies for an attribute based on its measured read/write frequency.
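A dynamic-adaptation policy could be as simple as comparing measured read and write counts. The thresholds and the three (k, j) choices below are hypothetical and only show the shape of such a policy. The slide does not say which rule SDIMS uses, and `choose_strategy` is an illustrative name.

```python
from typing import Tuple


def choose_strategy(reads: int, writes: int, height: int,
                    ratio_threshold: float = 4.0) -> Tuple[int, int]:
    """Pick (k, j) for Update-Up-k-Down-j from measured read/write counts.

    Hypothetical policy:
    - read-dominated: aggregate to the root and push results all the way down,
      so probes are answered locally;
    - write-dominated: keep updates local, so each write is cheap and
      probes pay the aggregation cost;
    - otherwise: aggregate to the root but do not push results down.
    """
    if writes == 0 or reads / writes >= ratio_threshold:
        return height, height
    if reads == 0 or writes / reads >= ratio_threshold:
        return 0, 0
    return height, 0
```

For example, an attribute like the number of CPUs (read-dominated) would get (height, height). An attribute like the number of processes (write-dominated) would get (0, 0).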
Scalability
- SDIMS defines the aggregation abstraction to mesh with its underlying scalable DHT system.
- SDIMS refines the basic DHT abstraction to form an Autonomous DHT (ADHT) that achieves the administrative isolation properties.
Mapping to DHT
Mapping to DHT (cont.)
- Aggregating an attribute along the aggregation tree corresponds to aggregating along DHTtree_k, where k = hash(attribute type, attribute name).
- Different attributes are therefore aggregated along different trees.
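The attribute-to-tree mapping can be illustrated by hashing the (attribute type, attribute name) pair into a DHT key. This sketch assumes SHA-1 via Python's `hashlib` and a length-prefixed encoding of the pair. `tree_key` is a hypothetical helper, and the hash and encoding SDIMS actually uses may differ.

```python
import hashlib


def tree_key(attr_type: str, attr_name: str, bits: int = 160) -> int:
    """Hypothetical k = hash(attribute type, attribute name) using SHA-1.

    The type is length-prefixed so that, e.g., ("ab", "c") and ("a", "bc")
    produce different inputs. The top `bits` bits of the 160-bit digest are
    kept as the key that names DHTtree_k.
    """
    if not 1 <= bits <= 160:
        raise ValueError("bits must be between 1 and 160")
    data = f"{len(attr_type)}:{attr_type}|{attr_name}".encode("utf-8")
    digest = int.from_bytes(hashlib.sha1(data).digest(), "big")
    return digest >> (160 - bits)
```

Because the key depends on both fields, ("load", "cpu") and ("load", "mem") land on different keys and hence different trees. This spreads aggregation work for many attributes across many roots.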
Administrative isolation
- For security
- Updates and probes are not accessible outside the domain.
- For availability
- Queries for values in a domain are not affected by failures of nodes in other domains.
- For efficiency
- Domain-scoped queries can be simple and efficient.
Administrative isolation (cont.)
- Autonomous DHT
- Path locality: search paths should always be contained in the smallest possible domain.
- Path convergence: search paths for a key from different nodes in a domain should converge at a node in that domain.
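Administrative isolation (cont.)
[Figure: domains univ. and dept.; in a plain DHT, the path from a level-0 host up to level 2 leaves domain dept. for domain univ., so the isolation property is violated]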
Administrative isolation (cont.)
[Figure: with an Autonomous DHT, the level-2 ancestor of the same level-0 host stays inside domain dept.]
Robustness
- ADHT
- Distributed Computing (?)
- Aggregation Management Layer (AML)
- Lazy re-aggregation
- On-demand re-aggregation
- Replication in space
Two-layer architecture: ADHT and AML
- The ADHT layer informs the AML layer about reconfigurations in the network:
- NewParent
- FailedChild
- NewChild
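The sketch below shows how an AML-side object might react to the three ADHT notifications for a single attribute key. `AggregationManager`, its handler names, and the `outbox` list standing in for network sends are all hypothetical. For simplicity it re-aggregates eagerly on every event. The lazy and on-demand re-aggregation modes listed under Robustness are not modeled.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple


class AggregationManager:
    """Hypothetical AML state for one attribute key at one tree node.

    The ADHT layer is assumed to call on_new_parent, on_failed_child and
    on_new_child when the DHT tree for this key changes.
    """

    def __init__(self, func: Callable[[List[float]], float], local: float) -> None:
        self.func = func
        self.local = local                          # this node's own contribution
        self.child_values: Dict[str, float] = {}
        self.pending: Set[str] = set()              # children that have not reported yet
        self.parent: Optional[str] = None
        self.outbox: List[Tuple[str, float]] = []   # (destination, aggregate) messages

    def reduction(self) -> float:
        return self.func([self.local, *self.child_values.values()])

    def _push_up(self) -> None:
        # Re-aggregate and queue the result for the current parent, if any.
        if self.parent is not None:
            self.outbox.append((self.parent, self.reduction()))

    def on_new_parent(self, parent: str) -> None:
        # A new parent holds nothing from this node yet: send the current aggregate.
        self.parent = parent
        self._push_up()

    def on_failed_child(self, child: str) -> None:
        # Drop the failed child's contribution and re-aggregate.
        self.child_values.pop(child, None)
        self.pending.discard(child)
        self._push_up()

    def on_new_child(self, child: str) -> None:
        # Nothing to aggregate until the child's first update arrives.
        self.pending.add(child)

    def on_child_update(self, child: str, value: float) -> None:
        self.pending.discard(child)
        self.child_values[child] = value
        self._push_up()
```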
Implementation
[Figure: implementation architecture, labeled "Different overlay (?)"]
MIB (Management Information Base)
- Child MIBs contain raw aggregate values gathered from children.
- The reduction MIB contains values aggregated locally across this raw information.
- The ancestor MIB contains aggregate values scattered down from ancestors.
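Here is a minimal data layout for the three MIB kinds at one node, assuming attribute keys are (type, name) pairs and values are floats. `NodeMibs` and its methods are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Key = Tuple[str, str]        # (attribute type, attribute name)


@dataclass
class NodeMibs:
    """Hypothetical layout of the three MIB kinds kept at one node."""
    # Child MIBs: child id -> raw aggregate values reported by that child.
    child: Dict[str, Dict[Key, float]] = field(default_factory=dict)
    # Reduction MIB: values aggregated locally across the child MIBs.
    reduction: Dict[Key, float] = field(default_factory=dict)
    # Ancestor MIB: (ancestor level, key) -> value scattered down from above.
    ancestor: Dict[Tuple[int, Key], float] = field(default_factory=dict)

    def store_child(self, child_id: str, key: Key, value: float) -> None:
        self.child.setdefault(child_id, {})[key] = value

    def reduce(self, key: Key, func: Callable[[List[float]], float]) -> float:
        # Aggregate across every child that has reported this key.
        # Note: func must accept an empty list if no child has reported yet.
        values = [mib[key] for mib in self.child.values() if key in mib]
        self.reduction[key] = func(values)
        return self.reduction[key]

    def store_ancestor(self, level: int, key: Key, value: float) -> None:
        self.ancestor[(level, key)] = value
```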
Implementation (cont.)
[Figure: parent and child nodes]
Implementation (cont.)
- Attribute key: used by the aggregation function to retrieve data.
- Key = (attribute type, attribute name)
Implementation (cont.)
- A node acts:
- as a leaf for all attribute keys;
- as a level-1 subtree root for keys whose hash matches the node's ID in the first b prefix bits;
- as a level-i subtree root for keys whose hash matches the node's ID in the first i × b bits;
- as the system's global root for attribute keys whose hash matches the node's ID in more prefix bits than any other node.
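The prefix rule for node roles can be checked with a few bit operations. This sketch assumes fixed-width integer IDs and b bits per level. `shared_prefix_bits`, `subtree_root_level`, and `global_root` are hypothetical helpers. The sketch follows the slide's simplified rule, although in a real DHT several nodes may share an i·b-bit prefix with a key and only one of them actually plays the level-i role.

```python
from typing import List


def shared_prefix_bits(x: int, y: int, width: int) -> int:
    """Number of leading bits that two width-bit IDs have in common."""
    return width - (x ^ y).bit_length()


def subtree_root_level(node_id: int, attr_key: int, b: int, width: int) -> int:
    """Highest level i such that node_id matches attr_key in the first i*b bits.

    Under the slide's rule, the node is a leaf (level 0) for every key and a
    level-1 through level-i subtree root for this particular key.
    """
    return shared_prefix_bits(node_id, attr_key, width) // b


def global_root(attr_key: int, node_ids: List[int], width: int) -> int:
    """Node whose ID matches attr_key in the most prefix bits (ties: smallest ID)."""
    return max(node_ids, key=lambda n: (shared_prefix_bits(n, attr_key, width), -n))
```

With 8-bit IDs, b = 2, and key 0b10111111, node 0b10110000 shares 4 leading bits and is a subtree root up to level 2. Node 0b10010000 shares 2 bits (level 1), and node 0b01000000 shares none (leaf only). The first node is therefore the global root.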
Evaluation
[Figure: node MIB measurements under Up-All, Down 0; annotations mention MIBs and monitored attributes]
Evaluation (cont.)
- The session size is set to 8 (domain size) and the branching factor is set to 16.
[Figure: message size vs. number of nodes]
Evaluation (cont.)
[Figure: average path length to root; Bf = branching factor]
Evaluation (cont.)
[Figure: Bf = branching factor]
Evaluation (cont.)
Evaluation (cont.)
[Figure: experiment with 283 nodes; annotation also mentions 10 nodes]
Evaluation (cont.)
[Figure: re-aggregation after the root is killed at 275 s]
Conclusion
- Scalability with respect to both nodes and attributes, through a new aggregation abstraction that helps leverage DHTs' internal trees for aggregation.
- Flexibility through a simple API that lets applications control the propagation of reads and writes.
Conclusion (cont.)
- Administrative isolation through simple augmentations of current DHT algorithms.
- Robustness to node and network reconfigurations through lazy re-aggregation, on-demand re-aggregation, and tunable spatial replication.