Title: A Symmetric and Polyvalent Resource Location System


1
A Symmetric and Polyvalent Resource Location
System
  • Candidate: Chuang Liu
  • Advisor: Ian Foster
  • University of Chicago

2
Growth of the Internet
  • The broad deployment of the Internet and the
    emergence of service-oriented architectures have
    led to a remarkable increase in the number of
    resources to which a user, program, or community
    may have access.

3
Infrastructures of Resource Pools
  • Condor: 1079 pools, 105146 CPUs
  • Globus: 100 sites, 100,000 CPUs
  • PlanetLab: 298 sites, 629 CPUs
  • Gnutella: 1.5 M nodes
4
Applications and Challenges
  • Applications
  • Scientific computing applications
  • Content distribution systems
  • On-demand and utility computing
  • Challenges
  • Applications need to run on one resource (or
    resource collection) with desired individual and
    aggregation properties to achieve good
    performance or efficiency
  • Resources are heterogeneous and dynamic
  • Large number of resources → selection is expensive
  • Resource owners impose policies concerning, e.g.,
    who can use a resource and for what purpose
  • In Internet environments, resources are
    distributed

5
We Hypothesize a Unifying Mechanism: Resource
Location Service
  • We need efficient algorithms for polyvalent
    queries, e.g., queries for
  • Resource sets based on their aggregation
    properties
  • Resource sets based on their network locations
  • In Internet environments, resources are
    distributed
  • Organization of information, distributed query
    evaluation
  • → Scalable Internet resource location service

6
Outline
  • Resource and query description
  • Data Model
  • Syntax
  • Search algorithms
  • A computer location service
  • Summary

7
Requirements
  • Resource description
  • Resource properties
  • Query description
  • Search condition: constraints on resource
    properties
  • Traditionally, resources and queries look
    different
  • MDS, UDDI, etc.

Access policies: constraints on user properties
User properties
We want to treat resources and queries as symmetric
Condor pioneered such an approach (matchmaking),
but its features have many limitations
8
Symmetric Data Model
  • A (query or resource) description
  • Data section
  • attribute/value pairs (resource/query
    properties)
  • Constraint section
  • constraints on properties (access policy/search
    condition)
  • Rank section
  • Symmetric evaluation
  • A query and a (set of) resource(s) match each
    other if all constraints in their descriptions
    are satisfied
  • Focus here on 1-1 and 1-N matches
  • N-N matches are addressed in other work (CCGrid
    2005)
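The symmetric evaluation rule can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: a description is modeled as a hypothetical `Description` record whose constraint section is a list of plain predicates over the other party's data section, and whose rank section is a scoring function. `satisfies`, `match`, and `best_match` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

# Hypothetical in-memory form of a description; not the authors' data structures.
Data = Dict[str, Any]
Constraint = Callable[[Data], bool]  # a predicate over the OTHER party's data


@dataclass
class Description:
    data: Data                                                    # data section
    constraints: List[Constraint] = field(default_factory=list)  # constraint section
    rank: Callable[[Data], float] = lambda other: 0.0            # rank section


def satisfies(a: Description, b: Description) -> bool:
    """True if every constraint in `a` holds on the data section of `b`."""
    return all(c(b.data) for c in a.constraints)


def match(query: Description, resource: Description) -> bool:
    """Symmetric 1-1 match: the resource must satisfy the query's search
    condition AND the query must satisfy the resource's access policy."""
    return satisfies(query, resource) and satisfies(resource, query)


def best_match(query: Description, resources: List[Description]) -> Optional[Description]:
    """1-N matching: keep the matching resources, return the one ranked highest."""
    matching = [r for r in resources if match(query, r)]
    return max(matching, key=lambda r: query.rank(r.data), default=None)


# Usage: a query from organization B that wants Linux and prefers more memory.
query = Description(data={"org": "B"},
                    constraints=[lambda r: r["os"] == "linux"],
                    rank=lambda r: r["memory"])
r1 = Description(data={"os": "linux", "memory": 64},
                 constraints=[lambda u: u["org"] in ("A", "B")])
r2 = Description(data={"os": "linux", "memory": 128},
                 constraints=[lambda u: u["org"] == "A"])  # policy excludes org B

print(best_match(query, [r1, r2]) is r1)  # True: r2 ranks higher but rejects the query
```

The point of the symmetric check is visible in the usage: `r2` would win on rank alone, but its own access policy rules the query out.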

9
Syntax
  • Description uses XML-based syntax
  • Extensibility
  • XML Schema
  • Define the valid attributes and values that can
    appear in the description of a particular type of
    resource/query

<rl:description type="computer">
  <rl:data_section>
    <comp:os>linux</comp:os>
    <rl:if condition='userorg=A'>
      <comp:disksize>100</comp:disksize>
      <comp:bandwidth>100</comp:bandwidth>
    </rl:if>
  </rl:data_section>
  <rl:requirement_section>
    <rl:if condition='userorg=B'>
      <rl:constraint name="access time" errmsg="not accessible">
        rq.useraccesstime between (6:00PM, 6:00AM)
      </rl:constraint>
    </rl:if>
  </rl:requirement_section>
  <rl:rank_section></rl:rank_section>
</rl:description>
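The access-time constraint in this example is worth making concrete, because the window from 6:00 PM to 6:00 AM crosses midnight, so a naive `start <= t <= end` test would reject every time of day. The sketch below assumes the window is inclusive and that the constraint applies only when the requesting user belongs to organization B, mirroring the `rl:if` block; `between` and `access_allowed` are hypothetical helpers, not part of the description language.

```python
from datetime import time

# Assumption: "between (6:00PM, 6:00AM)" is inclusive and wraps past midnight.
WINDOW_START, WINDOW_END = time(18, 0), time(6, 0)


def between(t: time, start: time, end: time) -> bool:
    """Time-of-day window test that also handles windows crossing midnight."""
    if start <= end:                 # ordinary window, e.g. 09:00 to 17:00
        return start <= t <= end
    return t >= start or t <= end    # wrapping window, e.g. 18:00 to 06:00


def access_allowed(user_org: str, access_time: time) -> bool:
    """Mirrors the rl:if block: only users from org B are restricted."""
    if user_org == "B":
        return between(access_time, WINDOW_START, WINDOW_END)
    return True


print(access_allowed("B", time(23, 30)))  # True: inside the overnight window
print(access_allowed("B", time(12, 0)))   # False: midday is outside it
print(access_allowed("A", time(12, 0)))   # True: the constraint does not apply to A
```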
10
New Features (Relative to Previous Approaches)
  • Resources may show different properties or
    different access policies to different users
  • Condition structure
  • if (condition1) attribute = value1
  • if (condition2) attribute = value2
  • Option structure
  • attribute1 = value2, attribute2 = value3
    or
  • attribute1 = value3, attribute2 = value4
  • Queries for resource sets
  • Constraints on aggregation properties of resource
    sets
  • rs1 ISASET computer; sum(rs1.memorysize) >
    100; rank = -count(rs1)

A Constraint Language Approach to Matchmaking.
Liu, C., Foster, I., 14th Intl Workshop on
Research Issues on Data Engineering (RIDE 2004),
Boston, 2004.
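To make the set query concrete, here is a brute-force Python sketch of `sum(rs1.memorysize) > 100` with `rank = -count(rs1)`, i.e. find a smallest set of computers whose memory sizes add up to more than 100. The function name and the sample pool are hypothetical, and exhaustive enumeration is only for illustration: it is exponential in the number of computers.

```python
from itertools import combinations
from typing import Dict, Optional, Tuple


def smallest_set(memory: Dict[str, int], min_total: int) -> Optional[Tuple[str, ...]]:
    """Return a smallest set of computers whose memory sizes sum to more than
    `min_total` (rank = -count), or None if no such set exists.

    Sets are enumerated in order of increasing size, so the first set that
    satisfies the constraint also has the best rank."""
    names = sorted(memory)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            if sum(memory[n] for n in combo) > min_total:
                return combo
    return None


pool = {"c1": 40, "c2": 30, "c3": 70, "c4": 20}  # hypothetical memorysize values
print(smallest_set(pool, 100))  # ('c1', 'c3')
```

Ties between equally small sets are broken by name order here, which is an arbitrary choice; a real rank section could add secondary criteria.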
11
Outline
  • Resource and query description
  • Search algorithms
  • A computer location service
  • Summary and future work

12
Search Algorithms
  • Locating one resource with desired properties
  • MDS, Condor Matchmaker, RGIS, Gnutella, UDDI
  • Relational and other databases
  • Locating resource sets with desired properties
    (polyvalent queries)
  • a) Resource sets with required aggregation
    properties
  • b) Resource sets with required network connections

13
Queries with Aggregation Properties: Extending
Relational Databases
  • A query for a resource set with aggregation
    properties can be represented by a database query
    requiring the simultaneous satisfaction of
    arithmetic constraints on multiple attributes
    (ACMA) from different relations
  • Database query engines solve ACMA queries with
    join operations; unfortunately, current join
    algorithms perform poorly on such queries
  • → Introduce an ACMA join operator and an ACMA
    query evaluation plan

SELECT * FROM T AS A, T AS B, T AS C, T AS D
WHERE A.price + B.price + C.price + D.price < 5
  AND A.cpuSpeed + B.cpuSpeed + C.cpuSpeed + D.cpuSpeed > 100
  AND A.memory + B.memory + C.memory + D.memory > 100
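As a baseline, the following Python sketch evaluates this ACMA query the way a plain nested-loop join would: it enumerates every combination of four tuples and only then checks the three sum constraints. It assumes the predicates are sums, a tiny made-up relation `T` with columns (price, cpuSpeed, memory), and that A, B, C, and D may bind to the same tuple, as in an ordinary SQL self-join. `naive_acma` is a hypothetical name.

```python
from itertools import product
from typing import List, Sequence, Tuple

Row = Tuple[int, int, int]  # hypothetical schema of T: (price, cpuSpeed, memory)

T: List[Row] = [(1, 40, 30), (2, 40, 10), (1, 20, 40), (3, 50, 50), (0, 10, 30)]


def naive_acma(rows: Sequence[Row], k: int = 4, max_price: int = 5,
               min_cpu: int = 100, min_mem: int = 100):
    """Evaluate the k-way self-join by enumerating the full cross product, as a
    nested-loop join without any pruning would, and filter on the three sums."""
    results = []
    examined = 0
    for combo in product(rows, repeat=k):  # A, B, C, D may reuse the same tuple
        examined += 1
        price = sum(r[0] for r in combo)
        cpu = sum(r[1] for r in combo)
        mem = sum(r[2] for r in combo)
        if price < max_price and cpu > min_cpu and mem > min_mem:
            results.append(combo)
    return results, examined


answers, examined = naive_acma(T)
print(examined)  # 625, i.e. 5 ** 4 combinations
print(len(answers))
```

The number of combinations grows as |T| to the power k regardless of how selective the constraints are, which is the cost the ACMA join operator aims to avoid.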
14
Execution Plan of ACMA Join

(Figure: execution plan of the ACMA join for the same four-way self-join query as on slide 13.)
15
Implementation of ACMA Join
  • Selection operators
  • Use a consistency algorithm to initialize the
    selection conditions (range constraints on single
    attributes) in the selection operators
  • Constrained join operator
  • Extends the nested-loop join operator
  • Uses consistency algorithms to predict whether an
    intermediate result can lead to any final query
    results

(Figure: the consistency algorithm takes an ACMA query and produces range constraints on single attributes; given an ACMA query and an intermediate result, it answers yes/no.)
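The two ideas on this slide can be sketched with simple interval bounds. This is an assumption about what a consistency check might look like, not the consistency algorithm from the paper: for each sum, we use the best value any single tuple could still contribute (the smallest price, the largest cpuSpeed and memory). The same bound yields single-attribute range filters for the selection operators and a yes/no pruning test for intermediate results in the constrained join. The relation `T` and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Row = Tuple[int, int, int]  # hypothetical schema of T: (price, cpuSpeed, memory)

T: List[Row] = [(1, 40, 30), (2, 40, 10), (1, 20, 40), (3, 50, 50), (0, 10, 30)]


def constrained_acma(rows: Sequence[Row], k: int = 4, max_price: int = 5,
                     min_cpu: int = 100, min_mem: int = 100):
    """Answer SUM(price) < max_price AND SUM(cpuSpeed) > min_cpu AND
    SUM(memory) > min_mem over a k-way self-join, pruning with interval bounds."""
    if not rows:
        return [], 0
    # Best-case contribution of one still-unknown tuple to each sum.
    lo_price = min(r[0] for r in rows)
    hi_cpu = max(r[1] for r in rows)
    hi_mem = max(r[2] for r in rows)

    def consistent(price: int, cpu: int, mem: int, left: int) -> bool:
        """Could `left` more tuples still complete a valid answer?"""
        return (price + left * lo_price < max_price
                and cpu + left * hi_cpu > min_cpu
                and mem + left * hi_mem > min_mem)

    # Selection operators: keep a tuple only if it could be part of SOME answer
    # when the other k-1 slots take their best-case values.
    candidates = [r for r in rows if consistent(r[0], r[1], r[2], k - 1)]

    results: List[Tuple[Row, ...]] = []
    examined = 0

    def extend(prefix: List[Row], price: int, cpu: int, mem: int) -> None:
        nonlocal examined
        if len(prefix) == k:
            results.append(tuple(prefix))
            return
        left = k - len(prefix) - 1  # slots still open after adding one more tuple
        for r in candidates:
            examined += 1
            p, c, m = price + r[0], cpu + r[1], mem + r[2]
            if consistent(p, c, m, left):  # constrained join: prune dead branches
                prefix.append(r)
                extend(prefix, p, c, m)
                prefix.pop()

    extend([], 0, 0, 0)
    return results, examined


answers, examined = constrained_acma(T)
print(len(answers), examined)
```

Because the bounds are optimistic, pruning never discards a real answer (with `left = 0` the test is exactly the query predicate), so the result matches an unpruned nested-loop join; how much work it saves depends on the data.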
16
Evaluation of Our Method
  • Plan I: traditional plan
  • Plan II: plan with selection operators
  • Plan III: plan with selection operators and
    constrained join operator

17
Performance Experiments: Example Results
  • Plan I reads 10^4 to 10^6 times more tuples
    than the other two plans
  • Plan III performs ten times fewer tuple reads
    than Plan II

Efficient Combinatorial Search in Relational
Databases, Liu, C., Yang, L., Foster, I., 9th
International Database Applications and
Engineering Symposium (IDEAS 2005), Montreal, 2005
18
Outline
  • Resource and query description
  • Search algorithms
  • Resource sets with required aggregation
    properties
  • > Resource sets with required network connections
  • A computer location service
  • Summary

19
Resource Set with Required Network Connection
  • Locate a set of resources with particular network
    connections in the Internet.
  • Q1: Find a set of R resources close to each
    other
  • The network latency between any pair of those
    resources is less than L milliseconds
  • Useful for, e.g., computational applications
  • Q2: Find a set of R resources far from each
    other
  • The network latency between any pair of those
    resources is more than L milliseconds
  • Useful for, e.g., content distribution applications
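Q1 and Q2 are pairwise conditions on a candidate set, so checking a given set is easy even though finding one is hard. A minimal Python sketch, assuming a hypothetical latency table keyed by unordered pairs of resource names; a missing pair raises `KeyError`, which is exactly the partial-data problem listed under the challenges of direct computation.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable

# Hypothetical latency table: milliseconds keyed by an unordered pair of names.
Latency = Dict[FrozenSet[str], float]


def pair_latency(latency: Latency, a: str, b: str) -> float:
    # Raises KeyError when the pair was never measured (partial data).
    return latency[frozenset((a, b))]


def is_close_set(nodes: Iterable[str], latency: Latency, L: float) -> bool:
    """Q1: every pair of resources is less than L ms apart."""
    return all(pair_latency(latency, a, b) < L for a, b in combinations(nodes, 2))


def is_far_set(nodes: Iterable[str], latency: Latency, L: float) -> bool:
    """Q2: every pair of resources is more than L ms apart."""
    return all(pair_latency(latency, a, b) > L for a, b in combinations(nodes, 2))


lat = {frozenset(("x", "y")): 5.0, frozenset(("x", "z")): 80.0,
       frozenset(("y", "z")): 90.0}
print(is_close_set(["x", "y"], lat, 10))       # True
print(is_close_set(["x", "y", "z"], lat, 10))  # False
print(is_far_set(["x", "z"], lat, 50))         # True
```

Verifying one set of R resources costs R(R-1)/2 lookups; the expensive part is searching among the many candidate sets.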

20
Challenges
  • Direct computation
  • E.g., a tree search algorithm
  • Challenges
  • It is an NP-hard problem
  • It may require a large number of measurements
  • Unstable networks and resources may cause
    individual measurements to fail → only partial
    data is available
  • Network latency data is noisy because network
    resources are shared among users

21
Intuition of Our Heuristic Method
  • Clustering
  • We partition resources into clusters based on
    end-to-end network latency
  • A cluster is a set of resources with much smaller
    latency to each other than to other resources
  • Search based on the cluster structure
  • Q1: Search for resources within a cluster
  • Q2: Search for resources from different clusters

22
Outline
  • Resource and query description
  • Search algorithms
  • Resource sets with required aggregation
    properties
  • Resource sets with required network connection
  • Cluster Algorithms
  • Cluster Algorithm I
  • Cluster Algorithm II
  • Search Algorithm
  • A computer location service
  • Summary

23
Cluster Algorithm I: Resource Pools
  • Resource pools such as OSG, PlanetLab, etc.
  • Hundreds of resources
  • Resources are relatively stable
  • Latency measurements between resources exist
  • Available latency measurements are only a subset
    of all possible measurements

(Figure: latency data on PlanetLab, collected by Stribling)
24
Cluster Algorithm I
  • Markov cluster algorithm [van Dongen 2000]
  • If there are many short paths between two
    resources, it is likely that the two resources
    have a small latency and therefore belong to
    the same cluster
  • Details in
  • S. van Dongen, A cluster algorithm for graphs,
    2000
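For readers unfamiliar with the Markov cluster (MCL) algorithm, the following pure-Python sketch shows its two core steps on a toy graph: expansion (squaring the column-stochastic flow matrix) and inflation (raising entries to a power and renormalizing). It follows the generic algorithm from van Dongen's work; how latency measurements are turned into edge weights is not stated on the slide, so the weights here are made up, and the 1e-3 cutoff for reading clusters off the converged matrix is an arbitrary choice.

```python
from typing import List, Set

Matrix = List[List[float]]


def normalize_columns(m: Matrix) -> Matrix:
    """Scale each column to sum to 1 (column-stochastic matrix), in place."""
    n = len(m)
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        if s > 0:
            for i in range(n):
                m[i][j] /= s
    return m


def matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(weights: Matrix, inflation: float = 2.0,
        max_iter: int = 100, tol: float = 1e-9) -> List[Set[int]]:
    """Markov clustering on a symmetric similarity matrix (higher = closer)."""
    n = len(weights)
    m = [[weights[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]                      # add self-loops
    normalize_columns(m)
    for _ in range(max_iter):
        expanded = matmul(m, m)                  # expansion: spread random-walk flow
        inflated = normalize_columns(            # inflation: strengthen strong flows
            [[v ** inflation for v in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each row that still carries flow (an "attractor") defines one cluster.
    clusters: List[Set[int]] = []
    for i in range(n):
        members = {j for j in range(n) if m[i][j] > 1e-3}
        if members and members not in clusters:
            clusters.append(members)
    return clusters


# Two tight triangles {0,1,2} and {3,4,5} joined by one weak link 2-3.
w = [[0.0] * 6 for _ in range(6)]
for a, b, s in [(0, 1, 1), (0, 2, 1), (1, 2, 1),
                (3, 4, 1), (3, 5, 1), (4, 5, 1), (2, 3, 0.01)]:
    w[a][b] = w[b][a] = s
clusters = mcl(w)
print(clusters)
```

Inflation squares the small amount of flow crossing the weak 2-3 link at every iteration, so no resulting cluster mixes the two triangles; this mirrors the slide's intuition that many short paths indicate membership in the same cluster.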

25
Effectiveness of the Cluster Algorithm
  • Compute cluster structures using 10% to 90% of
    the data
  • Quantify the difference between each structure
    and the structure obtained with all data as the
    fraction of changes, D
  • → We conclude that the cluster algorithm remains
    effective when run on an incomplete set of data

Fraction of data (%)  90     80     70     60     50     40     30     20     10
D                     0.06   0.145  0.152  0.161  0.198  0.228  0.336  0.38   0.46
26
Variation of the Cluster Structure
  • Compare each clustering structure with the one
    based on data one, two and four hours ago.
  • 30 of cluster structures change less than 10
    from one hour ago
  • gt60 of cluster structures change between 10 and
    15 from one hour ago
  • Difference does not increase over time

Efficient and Robust Computation of Resource
Clusters in the Internet, Liu, C., Foster, I. 6th
IEEE International Conference on Cluster
Computing (Cluster 2005), Boston, 2005
27
Outline
  • Resource and query description
  • Search algorithms
  • Resource sets with required aggregation
    properties
  • Resource sets with required network connection
  • Cluster Algorithms
  • Cluster Algorithm I
  • Cluster Algorithm II
  • Search Algorithm
  • A computer location service
  • Summary

28
Cluster Algorithm II: Resource Pools
  • Resource pools such as Gnutella, Kazaa, etc.
  • Resources join the resource pool incrementally
  • Very large number of resources
  • Very expensive to measure and store latency
    between all resources
  • Requirements
  • Incrementally modify cluster structure when
    resources leave and join the resource pool
  • Only a modest number of latency measurements
  • Need small storage space

29
Hierarchical Cluster Structure
  • Storage space O(N)

(Figure legend: average, standard deviation)
30
Incremental Cluster Algorithm
  • Number of Measurements Log(N)
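The slide gives only the measurement count. One way a hierarchical structure can achieve a logarithmic number of measurements for each joining resource is greedy descent: at each level the newcomer probes one representative per child cluster and moves into the closest child, so the cost is roughly the branching factor times the depth. The sketch below is a hypothetical illustration of that idea, not the author's incremental algorithm; it does not split clusters or handle departures, and `Cluster`, `join`, and the 1-D positions are made up.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Cluster:
    representative: str                                      # node others measure against
    children: List["Cluster"] = field(default_factory=list)  # empty for leaf clusters
    members: List[str] = field(default_factory=list)         # used only at leaves


def join(root: Cluster, node: str, measure: Callable[[str, str], float]) -> int:
    """Greedy descent: at each level, measure latency to every child's
    representative and move into the closest child. Returns the number of
    measurements, which is (branching factor) x (depth) for a balanced tree."""
    measurements = 0
    current = root
    while current.children:
        best, best_latency = current.children[0], float("inf")
        for child in current.children:
            d = measure(node, child.representative)
            measurements += 1
            if d < best_latency:
                best, best_latency = child, d
        current = best
    current.members.append(node)
    return measurements


# Usage with made-up 1-D "positions" standing in for real latency probes.
pos = {"e0": 0, "e1": 10, "w0": 100, "w1": 110, "new": 12}


def probe(a: str, b: str) -> float:
    return abs(pos[a] - pos[b])


leaf_e0, leaf_e1 = Cluster("e0", members=["e0"]), Cluster("e1", members=["e1"])
leaf_w0, leaf_w1 = Cluster("w0", members=["w0"]), Cluster("w1", members=["w1"])
root = Cluster("e0", children=[Cluster("e0", children=[leaf_e0, leaf_e1]),
                               Cluster("w0", children=[leaf_w0, leaf_w1])])
n = join(root, "new", probe)
print(n, leaf_e1.members)  # 4 ['e1', 'new']
```

Under this assumption, N joins would cost on the order of N log(N) measurements in total, compared with the N^2 needed to measure every pair.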

31
Outline
  • Resource and query description
  • Search algorithms
  • Resource sets with required aggregation
    properties
  • Resource sets with required network connections
  • Cluster Algorithms
  • > Search Algorithm
  • A computer location service
  • Summary

32
Modified Tree Search Algorithm
  • Tree search algorithm
  • Starts with an empty set
  • Repeatedly picks from available resources one
    resource that has required connections with
    current members in the set, and adds it to the
    set
  • Rolls back the addition in previous step if no
    such resource exists
  • Finishes when the set contains all required
    resources
  • Modified tree search algorithm
  • Q1: pick resources from the same cluster
  • Q2: pick resources from different clusters
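The tree search described above is a backtracking search, sketched directly in Python below. The only change for the modified version is a candidate-ordering heuristic (`prefer`) that tries same-cluster resources first for Q1 and resources from not-yet-used clusters first for Q2. That ordering is an assumption about how "pick resources from the same/different clusters" might be realized, and all names and sample data are hypothetical.

```python
from typing import Callable, List, Optional, Sequence

Pred = Callable[[str, str], bool]         # does a pair have the required connection?
Prefer = Callable[[str, List[str]], int]  # lower value = try this candidate earlier


def tree_search(resources: Sequence[str], R: int, ok: Pred,
                prefer: Optional[Prefer] = None) -> Optional[List[str]]:
    """Grow a set one resource at a time; each addition must satisfy `ok` with
    every current member; roll back the last addition when no candidate fits."""
    chosen: List[str] = []

    def extend(remaining: List[str]) -> bool:
        if len(chosen) == R:
            return True
        cands = [r for r in remaining if all(ok(r, m) for m in chosen)]
        if prefer is not None:
            cands.sort(key=lambda r: prefer(r, chosen))  # stable sort
        for i, r in enumerate(cands):
            chosen.append(r)
            if extend(cands[i + 1:]):  # later candidates only: no duplicate subsets
                return True
            chosen.pop()               # roll back
        return False

    return list(chosen) if extend(list(resources)) else None


# Usage with hypothetical positions (latency = distance) and cluster labels.
pos = {"a": 0, "b": 2, "c": 4, "d": 100, "e": 103, "f": 200}
cluster = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 3}


def lat(x: str, y: str) -> int:
    return abs(pos[x] - pos[y])


def same_cluster_first(r: str, chosen: List[str]) -> int:  # Q1 heuristic
    return 0 if not chosen or cluster[r] == cluster[chosen[0]] else 1


def new_cluster_first(r: str, chosen: List[str]) -> int:   # Q2 heuristic
    return sum(cluster[r] == cluster[m] for m in chosen)


order = ["d", "a", "e", "b", "f", "c"]
q1 = tree_search(order, 3, lambda x, y: lat(x, y) < 10, same_cluster_first)
q2 = tree_search(order, 3, lambda x, y: lat(x, y) > 50, new_cluster_first)
print(q1, q2)  # ['a', 'b', 'c'] ['d', 'a', 'f']
```

Passing only the later candidates down each level avoids revisiting the same subset in a different order; the heuristic changes which branch is explored first, not which answers exist.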

33
Evaluation of Performance
  • Cumulative distribution of execution time
  • Our algorithm answers 70% of queries within a few
    milliseconds

Algorithm      Time to answer 70% of queries   Time to answer 90% of queries
Tree search    0.6 s                           26 s
Modified       1.6 ms                          0.4 s
34
Outline
  • Resource and query description
  • Search algorithms
  • Resource sets with required aggregation
    properties
  • Resource sets with required network connections
  • > A computer location service
  • Summary

35
Computer Location Service
  • Build a resource location service for computers
    connected by the Internet
  • Requirements
  • Support polyvalent queries for computer sets
  • Support queries for one computer with
    requirements on multiple properties
  • Support queries based on network locations
  • Support resource access policies
  • Scalable to handle a large number of computers and
    queries

36
Related Work
We need a new service
37
System Structures
  • Centralized structure: short response time, poor
    scalability (e.g., MDS2, Napster, UDDI)
  • Peer-to-peer structure: good scalability, long
    response time, poor support of queries for
    resource sets (e.g., Gnutella 0.4, SWORD)
  • Super-peer structure: medium response time, good
    scalability, good support of queries for resource
    sets (e.g., Gnutella 0.6, Kazaa)
38
Super-peer Structure
  • Partition computers based on the latency
    hierarchy
  • One computer in each group acts as the super-peer
  • Advantages
  • Answer polyvalent queries locally
  • Support queries for computer based on their
    network location
  • Low network traffic
  • Disadvantage: cannot find solutions that span groups

39
Load Balance
  • Update of computer information
  • Each computer reports to the super-peer in its
    group
  • Query processing
  • Each computer knows about K super-peers and sends
    queries to them randomly

40
Fault Tolerance
  • Restart of a super-peer
  • A super-peer periodically sends out a backup list
    to each computer managed by it
  • If a super-peer fails, all related computers
    report to the first computer in the backup list
  • Recovery of data in a super-peer
  • Each computer reports its clusterID to the new
    super-peer, which uses these IDs to reconstruct
    the cluster structure

41
Work Remaining to be Done
  • Measure
  • Query success rates
  • Query response times
  • Average and maximum input/output traffic
  • For
  • Our super-peer structure and algorithm
  • Random super-peer structure and our algorithm
  • Others?
  • Using
  • Workloads TBD
  • Assuming
  • Computer characteristics change randomly

42
Outline
  • Description of resources and queries
  • Matchmaking algorithms
  • An algorithm to locate resource sets with
    required aggregation properties
  • Algorithms for locating resource sets with
    required network connection
  • A matchmaking service
  • Summary

43
My Contributions
  • A matchmaking language to describe resources and
    queries
  • Symmetric mechanism that enables both resource
    owners and requesters to control matching between
    resources and queries
  • Support for polyvalent queries
  • Fast algorithms to solve polyvalent queries that
    search for a resource set with desired
    aggregation properties and network connections
  • Order-of-magnitude(s) faster than other
    approaches
  • Scalable resource location service that supports
    a large set of queries for networked computers
  • Evaluation in progress

44
Publications
  1. Efficient and Robust Computation of Resource
    Clusters in the Internet, Liu, C., Foster, I.,
    6th IEEE International Conference on Cluster
    Computing (Cluster 2005), Boston, 2005
  2. Matchmaking Systems: A Survey, Liu, C., Foster,
    I., unpublished document, 2005
  3. Efficient Combinatorial Search in Relational
    Databases, Liu, C., Yang, L., Foster, I., 9th
    International Database Applications and
    Engineering Symposium (IDEAS 2005), Montreal,
    2005
  4. Online Resource Matching in a Heterogeneous Grid
    Environment, Naik, V., Liu, C., Yang, L., Wagner,
    J., 6th IEEE International Symposium on Cluster
    Computing and the Grid (CCGrid 2005), Cardiff,
    UK, 2005.
  5. DB_CSP: A Framework and Algorithms for Applying
    Constraint Solving within Relational Databases,
    Liu, C., Foster, I., 19th Workshop on
    (Constraint) Logic Programming (WLP 2005), Ulm,
    Germany, 2005.
  6. A Constraint Language Approach to Matchmaking.
    Liu, C., Foster, I., 14th International Workshop
    on Research Issues on Data Engineering (RIDE
    2004), Boston, 2004.
  7. Scheduling in the Grid Application to Grid
    Resource Selection. Dail, H., Sievert, O.,
    Berman, F., Casanova, H., Yarkhan, A., Vadhiyar,
    S., Dongarra, J., Liu, C., Yang, L., Angulo, D.,
    Foster, I., In Grid Resource Management, Kluwer
    Publishing, 2003.
  8. Design and Evaluation of a Resource Selection
    Framework. Liu, C., Yang, L., Foster, I. and
    Angulo, D., 11th IEEE International Symposium on
    High Performance Distributed Computing (HPDC-11),
    Edinburgh, Scotland, 2002.
  9. The Cactus Worm: Experiments with Dynamic
    Resource Discovery and Allocation in Grid
    Environments, Allen, G., Angulo, D., Foster, I.,
    Lanfermann, G., Liu, C., Radke, T., Seidel, E.,
    Shalf, J., International Journal of Supercomputer
    Applications, Winter, 2001, v15(4).

45
  • Questions?
  • Thank you

46
Consistency algorithm
  • ACMA
  • Consistency algorithm

(Figure: components of an ACMA, labeled logic operators such as >, <, etc.; attributes; constants)
47
To Do
  • Refine slide 23.

48
Infrastructures
  • 27 sites, 100 users, 2700 CPUs
  • 100 sites, 25,000 users, 100,000 CPUs, 10 PB of data
  • 298 sites, 629 CPUs
  • Condor: 1079 pools, 105146 CPUs
  • 1.5 M nodes
49
Protocol
50
System Structures
  • Centralized vs. peer-to-peer vs. super-peer
    structure
  • Reasons to choose super-peer structure
  • It is necessary to aggregate computer information
    to process polyvalent queries efficiently
  • Balance between scalability and efficiency
  • Suitable for queries with high selectivity

51
Incremental Cluster Algorithm
  • Number of Measurements N Log(N)

52
Benchmark
  • Three relations A, B, and C with two attributes
    K1000 and K10000
  • Values of K1000 (K10000) are uniformly distributed
    from 1 to 1000 (10000) (Wisconsin benchmark)
  • Values of K1000 (K10000) follow a normal
    distribution with mean 500 and standard
    deviation 250 (mean 5000 and standard deviation
    2500)
  • Query
    SELECT * FROM A, B, C
    WHERE A.K1000 + B.K1000 + C.K1000 > N1
      AND A.K1000 + B.K1000 + C.K1000 < N2
      AND A.K10000 + B.K10000 + C.K10000 > N3

53
New Features
  • Resources may show different properties or
    different access policies to different users
  • Condition structure
  • Option structure
  • Queries for resource set
  • constraints on aggregation properties of resource
    set, such as connected(), etc

A Constraint Language Approach to Matchmaking.
Liu, C., Foster, I., Proceedings of the 14th
International Workshop on Research Issues on Data
Engineering (RIDE 2004), Boston, 2004.
54
Cluster Structure of Resources on Planetlab
  • Cluster structures with different granularity
  • East America, West America, Central America, East
    Asia, South Europe, etc.
  • California, Texas, China, Korea, etc.
  • San Jose (HP, UCB, Stanford), Boston (BU, MIT),
    etc.

G     Number of clusters   Median latency
1.2   39                   17 ms
1.3   87                   4 ms
1.4   107                  1 ms