Title: Cyber Entity Directory Service
Goal
- To provide a directory service for rapid location of Cyber Entities within the Bionet architecture.
- Tasks
 - Prove the directory is scalable. 
 - Prove the directory is efficient. 
 - Prove the directory is fault tolerant.
 
Motivation
- In BIONET a CE may only maintain a finite number of relationships.
- These relationships stabilize based upon similarity and usefulness.
- If a CE attempts to locate another CE that is not closely related or directly useful, the search may take a long time.
- What if a CE application assisted the search by organizing a distributed directory?
- The CE directory provides a fast lookup service for other Cyber Entities in the hope of making discovery more efficient.
- Note: Purpose in Bionet: reduces the amount of stabilization time in a network because CEs have an ordered lookup directory. This eliminates the unbounded search.
Motivation (cont.)
- What if the network is highly dynamic and is constantly undergoing stabilization?
 - Suppose a network is designed in such a fashion that CEs are born and die frequently.
 - These CEs may not have enough time to establish meaningful relationships.
 - The network may be slow because too many CEs are searching too much of the network (the relationships developed are not efficient enough).
 - Ex: a PDA / cellphone network with PDAs and cellphones constantly coming on and off the network.
- The directory could allow direct access to necessary application CEs, thus facilitating stability.
Proposed Purpose
- To provide a mechanism for fast lookup of CEs based on a CE's published keyword.
 - CEs will not have the capacity to publish an entire list of keywords. This facilitates modular design. Each CE should have a single purpose. That purpose is described by the keyword.
 - Allowing CEs to use multiple keywords would greatly increase the load on the directory.
- The mechanism is NOT the standard query method. It is meant only to enhance the relationship method.
 - Suppose it takes 30 hops to find something in the directory while it takes 4 hops to find the same CE through your relationships. Clearly the relational query is faster.
 - Suppose it takes 30 hops to find something in the directory but 100 hops to find a CE you are not directly related to (or have no links established to). Clearly it is better to use the directory.
- In the Bionet, CEs must make their own decisions on how to use the directory. The choice of when to use it is up to the CE designer. However, policies should be developed to guide the designer towards the most efficient use.
 - Use the directory to look up CEs not closely related to currently established relationships.
 - Use currently established relationships as a cache mechanism.
 - Use the directory at CE birth to establish relationships.
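
The 30-hop versus 4-hop and 100-hop comparison can be sketched as a small policy function. This is only an illustration and assumes a CE can estimate hop counts before it searches; the names `choose_lookup`, `relationship_hops`, and `directory_hops` are hypothetical and not part of Bionet.

```python
from typing import Optional


def choose_lookup(relationship_hops: Optional[int], directory_hops: int) -> str:
    """Pick the cheaper lookup path (hypothetical policy, not a Bionet API).

    relationship_hops is None when the CE has no relationship path at all,
    for example right after birth.
    """
    if relationship_hops is not None and relationship_hops <= directory_hops:
        return "relationship"   # existing relationships act as a cache
    return "directory"          # unrelated or newborn CE: use the directory


print(choose_lookup(4, 30))     # relationship
print(choose_lookup(100, 30))   # directory
print(choose_lookup(None, 30))  # directory
```

How a CE would obtain the two estimates is left open here, as it is in the slides.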
Approach
- The directory is composed entirely of the CEs that choose to be a part of the directory.
 - The choice is made by the CE designer.
- In order to be a part of the directory, a CE must inherit properties from a special CE class.
- The system has 3 types of CEs.
 - The base level CE, which contains methods and properties to maintain the database. The other 2 CEs inherit from this CE/class.
 - Daemon CEs, which facilitate communication across platforms. (Daemon CEs implement the base level directory CE.)
 - Application level CEs. (Application level CEs implement the base level directory CE.)
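
The three CE types could be laid out roughly as follows. This is a minimal sketch under assumptions: the class names `DirectoryCE`, `DaemonCE`, and `ApplicationCE` and the `can_die` method are invented for illustration, and the real base class would also carry the messaging and balancing logic.

```python
class DirectoryCE:
    """Base level directory CE (hypothetical name): holds the tree links."""

    def __init__(self, keyword):
        self.keyword = keyword   # the single published keyword
        self.parent = None       # the three permanent relationships
        self.left = None
        self.right = None

    def can_die(self):
        return True


class DaemonCE(DirectoryCE):
    """One per platform; the entry point into the tree for local CEs."""

    def can_die(self):
        return False             # the deck states that the Daemon may not die


class ApplicationCE(DirectoryCE):
    """An ordinary application CE whose designer opted into the directory."""


daemon = DaemonCE("platform-a")
app = ApplicationCE("printer")
print(isinstance(daemon, DirectoryCE), isinstance(app, DirectoryCE))  # True True
```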
Approach
- How it works
 - All CEs in the system have special relationships. These relationships are permanent.
  - Left Child
  - Right Child
  - Parent
 - When a platform is started, it connects to other platforms. A bootstrap process establishes a Daemon directory CE on the platform. Only one instance of a Daemon may run on a platform at a time. The Daemon may not die.
 - Daemon CEs work to set up and maintain a distributed tree structure. The concept must be thought of recursively.
Approach
- If a distributed tree exists, then new Daemons / platforms insert themselves into the tree by contacting a Daemon on the platform that the new Daemon migrated from.
- CEs in the directory are ordered by a single published keyword.
 - They insert themselves in binary tree fashion. Since each CE in the directory has a left child, right child, and parent, it can search for the location where the new CE should be placed.
 - When the location is found, the CE is inserted at that spot, and a balancing algorithm progresses up the tree to rebalance it.
- If no tree exists (the first instance of the directory structure), then create a new tree with the current Daemon as the top of the tree.
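
The keyword-ordered insertion can be sketched as an ordinary binary-tree insert with parent links. This is an illustration only: the `Node` class and the `insert` and `inorder` helpers are hypothetical, everything runs in one process rather than across platforms, and the balancing pass that should follow each insert is left out.

```python
class Node:
    def __init__(self, keyword):
        self.keyword = keyword
        self.parent = None
        self.left = None
        self.right = None


def insert(root, node):
    """Insert node by keyword; return (root, hops). Balancing is omitted."""
    if root is None:
        return node, 0               # no tree yet: the new node is the top
    current, hops = root, 0
    while True:
        hops += 1
        side = "left" if node.keyword < current.keyword else "right"
        child = getattr(current, side)
        if child is None:
            setattr(current, side, node)
            node.parent = current
            return root, hops
        current = child


def inorder(node):
    """Keywords in sorted order, to check the ordering invariant."""
    if node is None:
        return []
    return inorder(node.left) + [node.keyword] + inorder(node.right)


root = None
for kw in ["mail", "clock", "print", "auth", "dns"]:
    root, _ = insert(root, Node(kw))
print(inorder(root))  # ['auth', 'clock', 'dns', 'mail', 'print']
```

Without balancing, the hop count can degrade to O(n) for sorted input, which is why the slides call for a balancing algorithm after each insertion.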
Approach
- Once the Daemon has been established on a platform, CEs that implement the base level directory CE may insert themselves into the directory. They do so in the same fashion as the Daemon CEs described above, by contacting the local Daemon.
- When a CE is dying, it sends a message to its children and parent. The message tells the parent and children to coordinate so as to replace the dying node (i.e. the left child moves up). At this point a rebalancing occurs.
- When a CE migrates, it sends a message to its children and parent. The message tells the parent and children the new location that the CE is migrating to.
Approach
- Tree balancing.
 - Tree balancing is needed to handle the following cases:
  - Host failure: When a host goes down, the tree cannot fall apart. Sub trees must realize that there has been a host failure and merge with the main tree. The main tree can be found using the Daemon nodes.
  - Multiple directory locations: If a failure occurs in such a fashion that the Bionet splits and multiple directories reestablish themselves, the directories must be able to merge together efficiently.
  - Insertion / Deletion: When insertion and deletion happen, there needs to be a weight balance algorithm that ensures the tree has O(log(n)) depth.
  - Unexpected CE death: If a CE dies for some reason without completing its death state, the sub trees of the CE must be merged back into the tree (similar to a host failure).
Approach
- Tree balancing should guarantee some type of bound on the time for
 - Insertion
 - Deletion
 - Search
 - Merging
- The bound can be either rigid (worst case) or amortized.
 
Approach
- Tree synchronization and locking
 - Due to the balancing and merging functionality, certain locking procedures must be in place.
  - What if a CE is searching the tree while a local node is rebalancing? The CE may take the wrong path down the tree.
  - What if 2 sub trees try to merge at the same point in the tree?
Approach
- I am currently looking at algorithms for maintaining parallel trees, mergeable trees, and amortized trees. The balancing algorithm that the directory tree will implicitly maintain must be designed by drawing on these fields.
Approach (cont.)
- Distribute the CEs over the network.
 - The CEs are now free to behave like all other CEs. They can migrate over the network as much as they want because each migration requires only 3 updates to the tree (left child, right child, parent). The updates are done in constant time because a direct relationship is maintained.
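
The constant-time migration update can be sketched as follows. The `DirectoryNode` class, its `neighbour_address` table, and the `link` and `migrate` helpers are assumptions made for illustration; in Bionet the updates would be messages sent over the three direct relationships rather than local dictionary writes.

```python
class DirectoryNode:
    def __init__(self, keyword, address):
        self.keyword = keyword
        self.address = address
        self.parent = None
        self.left = None
        self.right = None
        self.neighbour_address = {}   # keyword -> last known address


def link(parent, child, side):
    """Attach child on the given side and exchange addresses."""
    setattr(parent, side, child)
    child.parent = parent
    parent.neighbour_address[child.keyword] = child.address
    child.neighbour_address[parent.keyword] = parent.address


def migrate(node, new_address):
    """Move node and notify at most 3 neighbours; return the update count."""
    node.address = new_address
    updates = 0
    for neighbour in (node.parent, node.left, node.right):
        if neighbour is not None:
            neighbour.neighbour_address[node.keyword] = new_address
            updates += 1
    return updates


top = DirectoryNode("mail", "host-a")
mid = DirectoryNode("clock", "host-b")
leaf = DirectoryNode("auth", "host-c")
link(top, mid, "left")
link(mid, leaf, "left")
print(migrate(mid, "host-d"))  # 2 (parent and left child; there is no right child)
```

The update count never exceeds 3 regardless of the size of the tree, which is the property the slide relies on.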
Approach (cont.)
- Make the application an extension of Bionet that CEs choose to participate in rather than being forced to.
 - If CEs can satisfy all their needs effectively without the directory, then they can exclude themselves from the directory. This leaves fewer CEs in the directory, thus reducing the resources spent on search overall.
Efficiency
- Search, insert, delete time: Sufficient algorithms exist to allow search times to be guaranteed at O(log(n)) to O(sqrt(n)).
 - Balanced trees (weight-balanced BB[α], height-balanced AVL).
 - Red-Black trees / 2-3-4 trees.
 
Efficiency
- Merge time: Using the data structures listed previously, we can guarantee a bound on merging two trees.
 - a is the number of nodes in tree A.
 - b is the number of nodes in tree B.
 - Inserting each value of A into B takes
  - O(a log(a)) + O(a log(b))
  - If a is larger: O(a log(a))
  - If b is larger: O(a log(b))
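
The naive merge bound can be checked numerically. The sketch below assumes a simple cost model that is not from the slides: inserting into a balanced tree that currently holds m nodes costs about log2(m + 1) hops. The function names `naive_merge_cost` and `merge_bound` are hypothetical.

```python
import math


def naive_merge_cost(a, b):
    """Approximate hops for inserting a keys, one by one, into a balanced
    tree that starts with b keys (assumed cost model: log2 of tree size)."""
    return sum(math.log2(b + i + 1) for i in range(a))


def merge_bound(a, b):
    """Upper bound a * log2(a + b), which is O(a log(a)) + O(a log(b))."""
    return a * math.log2(a + b)


for a, b in [(1000, 1000), (10, 100000), (100000, 10)]:
    print(a, b, round(naive_merge_cost(a, b)), round(merge_bound(a, b)))
```

Since log(a + b) is at most log(2 max(a, b)), the bound reduces to O(a log(a)) when a is larger and O(a log(b)) when b is larger, matching the two cases listed.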
 
Efficiency
- This is the naive algorithm, however. Do better algorithms exist? (Probably.) I am looking into this.
- However, we can assume that large merge operations do not occur frequently, since they only occur during host failure or CE failure.
- Given M hosts that distribute CEs equally, the probability of a top level failure is 1 / M for any given host failure. For a top level failure we must merge two trees of size n/2. This takes O(n/2 log(n)) + O(n/2 log(n)) time, resulting in O(n log(n)).
- Furthermore, if a failure occurs, the sub trees exist in a special state that may be more easily repaired than just merging two random binary trees.
Efficiency
Failure occurs at the red relationship.
If a failure occurs at the red relationship, the resulting sub trees have a special property: the left and right sub trees each maintain a somewhat disjoint domain.
Efficiency
[Figure: sub trees A and B, with A < B]
The two trees can be merged together because their domains do not overlap (provided the tree has not changed).
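
One way to exploit non-overlapping domains is a join: take the largest node of A as the new top, hang the rest of A on its left and all of B on its right. The sketch below is an assumption about how such a merge could look, not the algorithm proposed in the slides; the names `Node`, `insert`, `join`, and `inorder` are hypothetical, and the rebalancing that would follow is omitted.

```python
class Node:
    def __init__(self, keyword):
        self.keyword = keyword
        self.parent = self.left = self.right = None


def insert(root, node):
    """Plain (unbalanced) binary-tree insert by keyword; returns the root."""
    if root is None:
        return node
    current = root
    while True:
        side = "left" if node.keyword < current.keyword else "right"
        child = getattr(current, side)
        if child is None:
            setattr(current, side, node)
            node.parent = current
            return root
        current = child


def join(a_root, b_root):
    """Join trees A and B, assuming every keyword in A < every keyword in B."""
    if a_root is None:
        return b_root
    if b_root is None:
        return a_root
    top = a_root
    while top.right is not None:      # walk to the largest keyword in A
        top = top.right
    if top is not a_root:
        top.parent.right = top.left   # splice the maximum out of A
        if top.left is not None:
            top.left.parent = top.parent
        top.left = a_root             # the rest of A hangs to the left
        a_root.parent = top
        top.parent = None
    top.right = b_root                # all of B hangs to the right
    b_root.parent = top
    return top


def inorder(node):
    if node is None:
        return []
    return inorder(node.left) + [node.keyword] + inorder(node.right)


a = b = None
for kw in ["clock", "auth", "dns"]:
    a = insert(a, Node(kw))
for kw in ["print", "mail", "time"]:
    b = insert(b, Node(kw))
merged = join(a, b)
print(inorder(merged))  # ['auth', 'clock', 'dns', 'mail', 'print', 'time']
```

The join only walks the right spine of A, so its cost is proportional to the height of A rather than to the number of nodes, which is consistent with the O(log(n)) merge time the deck hopes for after tree failures.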
Alternate Solutions
- Allow only BIONET relationships and current discovery mechanisms to handle discovery.
 - Does not allow for rapid lookup of unrelated entities.
 - Requires a period of time before the network becomes stable enough to search efficiently.
Alternate Solutions
- Distributed Consistent Hashing (Chord protocol applied to BIONET. http://pdos.ics.mit.edu/chord/)
 - Provides a good upper bound on both the maximum number of links to other agents and the search time: approximately O(log(n)).
 - n agents require n log(n) total relationships in the system.
 - Our system needs 3 * n.
 
Alternate Solutions (cont.)
- Adapt a graph theory approach like the HITS or HyperClass algorithm for web crawling.
 - Must be implemented at the CE level so all CEs are a part of the directory, not just a subset.
 - Similar to Freenet, but instead of a node becoming good at searching an area, a node gets ranked on how well it searches.
J. Kleinberg, S.R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins, "The web as a graph: Measurements, models and methods," Proceedings of the International Conference on Combinatorics and Computing, 1999.
Advantages
- Applications that wish to be diverse may implement a protocol to talk to the CE directory service. This would link highly mobile and diverse agents together, while leaving relatively stagnant agents out of the directory, keeping the size of the directory limited and thus faster.
Advantages
- Provable bounds
 - Search times of O(log(n)).
 - Insertion / Deletion times of O(log(n)).
 - Merge times of O(n log(n)) with unrelated trees (two independent directories merging).
 - Merge times of O(log(n)) with tree failures.
 
Summary
- To provide a directory service for rapid location of Cyber Entities within the Bionet architecture.
- Tasks
 - Prove the directory is scalable. 
 - Prove the directory is efficient. 
 - Prove the directory is fault tolerant. 
 
Summary
- This all boils down to finding good merging algorithms.