Semantic Query Caching in Mobile Environments presentation

Title: Semantic Query Caching in Mobile Environments

1
Semantic Query Caching in Mobile Environments

By: Jekkin Shah
Advisor: Dr. Konstantinos Kalpakis

2
Semantic Query Caching in Mobile Environments

Introduction
Motivation
Contribution
Concept of Semantic Caching
Issues involved in semantic caching
System Architecture
Prototype and Experiments
Conclusion and further work

3
Introduction

Separate lines of progress in
Geographic Information Systems (GIS)
Global Positioning System (GPS)
Wireless technology
Handheld devices
Convergence into Mobile Geographic Information Systems (mobile GIS)
Rapid growth of mobile GIS applications in all walks of life
Emphasis on spatial data and its storage, retrieval and manipulation

4
Convergence
[Figure: GIS, GPS, wireless and handheld technologies converging into mobile GIS]
5
Growing List of Applications

Car navigation systems
Emergency services
Real time stock quotes
Field services
Real time tracking and routing of shipments
Environmental surveys
and the list is growing rapidly

7
Motivation

Hungry! Let's find a nearby restaurant.
Query Q1:
FIND restaurants WHERE location nearby

Found 37 matches
8
Example 1 (cont.)

Wait, we also need some gas!
Let's see if we can find a gas station near a McDonald's.
Query Q2:
FIND McDonalds WHERE gas Station nearby

Found 2 matches
9
Shouldn't we speed up the process?

Query Q1 is in the local cache
Query Q1 subsumes query Q2
Why do we need to execute query Q2 from scratch?
We need a technique to determine that Q2 is contained in Q1 and to extract the answer to Q2 from the cached result of Q1
Unfortunately, traditional techniques such as page caching do not help much in this case

[Figure: query regions Q1 and Q2]
10
A new approach: Semantic Caching

Along with query results, store the queries also
in cache
Use these queries (query descriptors) to
determine if and how a new query can be answered
from cache
Check if the required data is present in cache.
Extract the data from cache
Add, remove, merge data by performing
corresponding operation on query descriptors
Manage cache by managing the query descriptors
Think of query descriptors as intelligent pointer
references that implicitly contain some
information about the data they refer to
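
The idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: queries are taken to be rectangular search windows, and the names Rect and SemanticCache, the row layout and the sample rows are hypothetical, not taken from the prototype.

```python
from typing import NamedTuple, Optional


class Rect(NamedTuple):
    """Hypothetical query descriptor: an axis-aligned search window."""
    x1: float
    y1: float
    x2: float
    y2: float

    def contains(self, other: "Rect") -> bool:
        # True when `other` lies completely inside this window.
        return (self.x1 <= other.x1 and self.y1 <= other.y1
                and other.x2 <= self.x2 and other.y2 <= self.y2)

    def covers_point(self, x: float, y: float) -> bool:
        return self.x1 <= x <= self.x2 and self.y1 <= y <= self.y2


class SemanticCache:
    """Stores (descriptor, rows) pairs instead of anonymous pages."""

    def __init__(self) -> None:
        self.entries = []

    def add(self, descriptor: Rect, rows: list) -> None:
        self.entries.append((descriptor, rows))

    def answer(self, query: Rect) -> Optional[list]:
        """Return the rows for `query` if one cached descriptor subsumes it."""
        for descriptor, rows in self.entries:
            if descriptor.contains(query):
                # The descriptor tells us the answer is a subset of `rows`.
                return [r for r in rows if query.covers_point(r["x"], r["y"])]
        return None  # not answerable from a single cached descriptor


cache = SemanticCache()
cache.add(Rect(0, 0, 10, 10), [
    {"name": "McDonalds", "x": 2, "y": 3},
    {"name": "Diner", "x": 8, "y": 9},
])
nearby = cache.answer(Rect(1, 1, 5, 5))
```

Here the second, narrower query is answered by filtering the rows cached for the first one, with no trip to the remote source.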

11
Problems with traditional caching

Pointer references do not contain any implicit
information
Q1 → p1, p2, p3, p4, p5, p6
Q2 → p7, p8, p9, p10, p11, p12
Q3 → all the pages
Space constraints will make it difficult to store
all the pages in cache.

[Figure: cached pages p1 to p12 and the data item data3]
13
Contribution

An architecture for Semantic Caching in mobile
environments
A system prototype as a proof-of-concept with
the following building blocks
Query parser and validator
A Solver for determining query satisfiability
An Executor for processing partial and remainder
queries
A Cache manager for efficiently managing the
cache
A cache replacement algorithm
Techniques for query processing

15
Issues in semantic caching

Although the idea of semantic caching is straightforward (store query descriptors along with their results), the issues involved are much harder.
Simple concept but difficult implementation
Issues
1. We need to decide whether the answer is present in the cache
2. If it is present, do we have sufficient information to extract it?

16
Answering Queries from Cache
Is the result of Q3 present in (Q1 ∪ Q2)?
17
Solving the implication problem

Let T = {Q1, Q2} be a set of query descriptors already in cache
We need to show that Q ⇒ T
We show that ¬(Q ⇒ T) is FALSE
¬(Q ⇒ T)
≡ ¬(¬Q ∨ T)
≡ Q ∧ ¬(T)
≡ Q ∧ ¬(T1 ∨ T2 ∨ T3 ∨ T4)
≡ Q ∧ ¬(T1) ∧ ¬(T2) ∧ ¬(T3) ∧ ¬(T4)
This is the primary technique used in our thesis.
The algorithm is adopted from [LY85].
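
The refutation above can be sketched as follows. This is a brute-force illustration, not the [LY85] algorithm. It assumes that every descriptor is a conjunction of closed range predicates over integer-valued attributes, so that NOT(lo <= x <= hi) becomes (x <= lo - 1) OR (x >= hi + 1). All function names are hypothetical.

```python
import math
from itertools import product

INF = math.inf


def negations(descriptor):
    """Atomic alternatives of NOT(descriptor): attr below lo, or attr above hi."""
    for attr, (lo, hi) in descriptor.items():
        yield (attr, -INF, lo - 1)
        yield (attr, hi + 1, INF)


def satisfiable(query, atoms):
    """Is `query AND atom_1 AND ... AND atom_n` satisfiable?"""
    bounds = dict(query)
    for attr, lo, hi in atoms:
        cur_lo, cur_hi = bounds.get(attr, (-INF, INF))
        bounds[attr] = (max(cur_lo, lo), min(cur_hi, hi))
    # A conjunction of ranges is satisfiable iff no range became empty.
    return all(lo <= hi for lo, hi in bounds.values())


def implied_by_cache(query, cached):
    """True when Q => (T1 or ... or Tn), shown by refuting Q and not T1 ... and not Tn."""
    alternatives = [list(negations(t)) for t in cached]
    # Pick one negated predicate per cached descriptor and test each combination.
    return not any(satisfiable(query, combo) for combo in product(*alternatives))


q = {"x": (2, 4), "y": (2, 4)}
t1 = {"x": (0, 3), "y": (0, 5)}
t2 = {"x": (4, 6), "y": (0, 5)}
covered = implied_by_cache(q, [t1, t2])
not_covered = implied_by_cache(q, [t1])
```

With n cached descriptors of k attributes each, this sketch tests up to (2k)^n combinations, so it is only practical for a small number of candidate descriptors.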

18
Solving the implication problem (Cont.)

Exponential growth in the number of equations to
be solved.
Solution
Clustering based on Signatures
Signature created by taking into account the
predicate attributes present in the query
Restriction on the number of clusters created
Signature used in indexing the query descriptors
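
One possible reading of signature-based clustering is sketched below. It assumes that a signature is simply the sorted tuple of attribute names used in a descriptor's predicates and that only descriptors with exactly the same signature are handed to the solver. Both are simplifying assumptions, and DescriptorIndex is a hypothetical name. The restriction on the number of clusters is not modeled.

```python
from collections import defaultdict


def signature(descriptor):
    """Signature = the attributes the predicate mentions, in a canonical order."""
    return tuple(sorted(descriptor))


class DescriptorIndex:
    """Groups cached descriptors into clusters keyed by their signature."""

    def __init__(self):
        self.clusters = defaultdict(list)

    def add(self, descriptor):
        self.clusters[signature(descriptor)].append(descriptor)

    def candidates(self, query):
        # Only one cluster is passed on to the solver, which keeps the
        # number of descriptors in the implication test small.
        return list(self.clusters.get(signature(query), []))


index = DescriptorIndex()
by_location = {"LocX": (0, 10), "LocY": (0, 10)}
by_zip = {"Zip": (21000, 21300)}
index.add(by_location)
index.add(by_zip)
matches = index.candidates({"LocY": (1, 2), "LocX": (3, 4)})
```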

[Figure: query descriptors clustered by signature, with clusters labeled Attr A, B and Attr X, D]
19
Data Extraction problem

Can we extract Data3?
[Figure: regions Data1, Data2 and Data3]
We fetch attribute C from the remote source and take a Cartesian product with the data already present in the cache
20
Answering Partial Queries

What happens if Q ⇒ T is FALSE?
There may be a non-empty intersection between Q and T
Answer (Q ∧ T) locally (partial match)
Send (Q ∧ ¬T) to the server (remainder query)
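
For rectangular queries, the split into a locally answerable part and a remainder can be sketched as below. The sketch assumes a single cached region T and half-open rectangles [x1, x2) x [y1, y2). Rect and split_query are hypothetical names.

```python
from typing import NamedTuple, Optional


class Rect(NamedTuple):
    x1: float
    y1: float
    x2: float
    y2: float


def intersect(q: Rect, t: Rect) -> Optional[Rect]:
    """Q AND T for two rectangles, or None when they do not overlap."""
    x1, y1 = max(q.x1, t.x1), max(q.y1, t.y1)
    x2, y2 = min(q.x2, t.x2), min(q.y2, t.y2)
    return Rect(x1, y1, x2, y2) if x1 < x2 and y1 < y2 else None


def split_query(q: Rect, t: Rect):
    """Return (partial match, remainder pieces) for query q and cached region t."""
    probe = intersect(q, t)
    if probe is None:
        return None, [q]  # nothing is cached, so the whole query goes to the server
    remainder = []
    if q.x1 < probe.x1:  # strip to the left of the overlap, full height
        remainder.append(Rect(q.x1, q.y1, probe.x1, q.y2))
    if probe.x2 < q.x2:  # strip to the right of the overlap, full height
        remainder.append(Rect(probe.x2, q.y1, q.x2, q.y2))
    if q.y1 < probe.y1:  # strip below the overlap
        remainder.append(Rect(probe.x1, q.y1, probe.x2, probe.y1))
    if probe.y2 < q.y2:  # strip above the overlap
        remainder.append(Rect(probe.x1, probe.y2, probe.x2, q.y2))
    return probe, remainder


local_part, remote_parts = split_query(Rect(0, 0, 10, 10), Rect(5, 5, 20, 20))
```

The remainder pieces do not overlap each other or the partial match, so results from the cache and from the server can be concatenated without removing duplicates.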

[Figure: query Q and cached regions T1 and T2]
22
Semantic Caching Architecture
[Figure: architecture diagram with the components Query parser and Validator, Solver (query implication), Executor, Cache manager, Local Cache and Remote db; input: query, output: results]
23
Cache Structure

Local Cache is implemented as relational database
structures
Query descriptors are stored in one table indexed
by their signatures
Corresponding query results (data) are stored in
another table
An auxiliary table associates each query
descriptor with its corresponding data
Cache manager interacts with query descriptor
table
Manipulation of data is achieved through the
manipulation of query descriptors
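
A three-table layout of this kind could look like the following sketch, written against SQLite through the Python standard library. The table and column names, the text encoding of predicates and the helper functions are all assumptions made for illustration. The slides do not give the actual schema.

```python
import sqlite3


def open_cache():
    """Create an in-memory cache with descriptor, result and link tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE descriptor (
            id        INTEGER PRIMARY KEY,
            signature TEXT NOT NULL,
            predicate TEXT NOT NULL
        );
        CREATE INDEX descriptor_signature ON descriptor (signature);
        CREATE TABLE result (
            id      INTEGER PRIMARY KEY,
            payload TEXT NOT NULL
        );
        CREATE TABLE descriptor_result (
            descriptor_id INTEGER NOT NULL REFERENCES descriptor (id),
            result_id     INTEGER NOT NULL REFERENCES result (id),
            PRIMARY KEY (descriptor_id, result_id)
        );
    """)
    return conn


def add_region(conn, signature, predicate, payloads):
    """Store one query descriptor together with its result rows."""
    cur = conn.execute(
        "INSERT INTO descriptor (signature, predicate) VALUES (?, ?)",
        (signature, predicate))
    descriptor_id = cur.lastrowid
    for payload in payloads:
        result_id = conn.execute(
            "INSERT INTO result (payload) VALUES (?)", (payload,)).lastrowid
        conn.execute("INSERT INTO descriptor_result VALUES (?, ?)",
                     (descriptor_id, result_id))
    return descriptor_id


def rows_for(conn, descriptor_id):
    """Fetch the data reachable from one descriptor through the link table."""
    sql = """SELECT r.payload FROM result r
             JOIN descriptor_result dr ON dr.result_id = r.id
             WHERE dr.descriptor_id = ? ORDER BY r.id"""
    return [row[0] for row in conn.execute(sql, (descriptor_id,))]


conn = open_cache()
region = add_region(conn, "LocX,LocY",
                    "0<=LocX<=10 AND 0<=LocY<=10", ["McDonalds", "Diner"])
```

Because rows are reached only through the link table, dropping a descriptor does not have to drop rows that another descriptor still references.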

24
Cache Operations and Management

Cache Manager
Replacement module
Replacement: determines what needs to be cached and what can be purged
Management module
Addition: the granularity of addition is a semantic region
Deletion: removal of a region, which does not necessarily lead to the removal of data
Merge: to simplify query processing, two or more regions can be merged
Decomposition: a very large region can be decomposed for efficiency reasons

25
Cache Replacement

Theory and Assumptions
What is the performance metric?
Conventional caching schemes optimize one or more of the following parameters with the goal of improving performance
Hit ratio
Response time
Data transmission time
Due to the dynamics of our application domain, none of these parameters truly reflects the performance of our applications

26
Theory and Assumptions (Cont.)

Cache hit rate: how do we define hit rate?
One: at least one data record is obtained from the cache
All: all data records are obtained from the local cache
Mid: 50% of the data records are satisfied from the local cache
Response time
Partially answered queries make it difficult to define the response time accurately
Data transmission time
Depends heavily on actual network parameters such as latency and bandwidth
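
The three hit definitions can be compared on the same query trace, as in the sketch below. It assumes that "Mid" means at least 50% of a query's records come from the cache. The function names and the sample trace are hypothetical.

```python
def is_hit(from_cache, total, policy):
    """Classify one query as hit or miss under the One, All or Mid definition."""
    if total == 0:
        return False  # a query with an empty answer is not counted as a hit here
    if policy == "one":
        return from_cache >= 1
    if policy == "all":
        return from_cache == total
    if policy == "mid":
        return from_cache / total >= 0.5  # assumption: "at least 50%"
    raise ValueError(f"unknown policy: {policy}")


def hit_rate(trace, policy):
    """trace: list of (records served from cache, total records) per query."""
    hits = sum(is_hit(cached, total, policy) for cached, total in trace)
    return hits / len(trace)


trace = [(10, 10), (3, 10), (5, 10), (0, 10)]
rates = {policy: hit_rate(trace, policy) for policy in ("one", "all", "mid")}
```

On this trace the same cache scores 0.75, 0.25 or 0.5 depending on the definition, which is why a single hit ratio is hard to interpret when queries can be partially answered.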

27
Theory and Assumptions (Cont.)

Mobile environments: premium on bandwidth
Our goal: to minimize the cost of servicing the requests that cannot be answered from the local cache
Cost is measured in terms of time
Performance metric is byte hit rate (BHR)
Ratio of the actual amount of data served from the local cache to the amount of data transferred from the remote source
Assumptions
Negligible query execution time
Uniform latency and bandwidth across the network

28
Replacement Algorithm

A Guiding Action Selection (GAS) function assigns a value to each semantic region
GAS value = a (s f b)
s: size of data transferred from the remote source
f: frequency of access of the query
a, b are domain-specific parameters
a: freshness count of each query
b = 1/Sd, where Sd is the distance between the current location of the moving object and the location of the query
Using the GAS function, the value of each semantic region is calculated
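
A minimal sketch of the GAS computation follows. It assumes that the juxtaposition in a (s f b) denotes a product and that Sd is a Euclidean distance. Neither is stated on the slide. The min_distance guard against division by zero and the function name are also additions made for this illustration.

```python
import math


def gas_value(freshness, size_bytes, frequency, query_xy, object_xy,
              min_distance=1e-9):
    """GAS value of one semantic region, read as a * (s * f * b)."""
    # Sd: distance between the moving object and the query location (assumed Euclidean).
    distance = math.dist(query_xy, object_xy)
    # b = 1 / Sd; clamp Sd so a region at the object's position does not divide by zero.
    b = 1.0 / max(distance, min_distance)
    return freshness * (size_bytes * frequency * b)


near = gas_value(2, 1000, 3, query_xy=(0, 0), object_xy=(3, 4))    # Sd = 5
far = gas_value(2, 1000, 3, query_xy=(0, 0), object_xy=(30, 40))   # Sd = 50
```

Under this reading, a region loses value as the moving object travels away from it, and gains value when it is large, fresh or frequently accessed.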

29
Replacement Algorithm (Cont.)

For each query in the cache we have
GAS value (Vi)
Weight (Wi)
We also have a limit on the total size of the cache (W) and on the total number of queries (K) that can be admitted
Problem definition
Given a set of rectangles, each with a weight and a value, choose at most K rectangles that give the maximum total value, provided the total weight does not exceed W
The problem can be formulated as the 0-1 Knapsack problem with an additional cardinality constraint
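
The formulation can be solved exactly for small instances with a textbook dynamic program, sketched below. This is not necessarily the algorithm used in the prototype. It assumes integer weights, and select_regions is a hypothetical name.

```python
def select_regions(values, weights, max_weight, max_count):
    """0-1 knapsack with a cardinality limit.

    Returns (best total value, indices of the chosen regions).
    """
    # best[k][w] = best (value, chosen) using at most k regions and weight at most w
    best = [[(0.0, ())] * (max_weight + 1) for _ in range(max_count + 1)]
    for i, (value, weight) in enumerate(zip(values, weights)):
        # Go downwards in k and w so that region i is used at most once.
        for k in range(max_count, 0, -1):
            for w in range(max_weight, weight - 1, -1):
                prev_value, prev_chosen = best[k - 1][w - weight]
                if prev_value + value > best[k][w][0]:
                    best[k][w] = (prev_value + value, prev_chosen + (i,))
    return best[max_count][max_weight]


values = [10, 10, 10, 25]   # GAS values Vi
weights = [1, 1, 1, 3]      # sizes Wi
two_allowed = select_regions(values, weights, max_weight=3, max_count=2)
three_allowed = select_regions(values, weights, max_weight=3, max_count=3)
```

With K = 2 the single large region wins, while with K = 3 the three small regions together are worth more, so the cardinality limit changes which regions stay in the cache. The table has (K + 1)(W + 1) cells and is updated once per region.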

31
Experiments (Setup)

Requirements
Workload (datasets and queries)
Modeling the behavior of the moving object
Query execution guidelines
Real datasets
Hard to obtain
Complexity in processing due to complex
structures of spatial objects
Synthetic dataset generator
Easily generated
Various parameters can be controlled

32
Workload

Query load selection
Tables
Restaurants: LocX, LocY, Name, ID, tables, City, Zip
Gas Stations: LocX, LocY, Name, ID, Low, Mid, High
Query specifications
Rectangular queries (select and project only)
Number of queries issued per trip: 20-70
Type of queries: location aware, location dependent and non-location related
Frequency of issuance: selected randomly, ranging from 5 ms to 100 ms
Overlap rate: 10-25%
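
A workload with these parameters could be generated along the following lines. The sketch only covers the number of queries per trip and the issue interval. The world size, the window sizes, the table identifiers and the generate_trip name are assumptions, and query types and overlap rate are not modeled.

```python
import random


def generate_trip(rng, world=1000.0):
    """Generate one trip's worth of rectangular queries with issue times."""
    queries = []
    clock_ms = 0.0
    for _ in range(rng.randint(20, 70)):        # 20-70 queries per trip
        clock_ms += rng.uniform(5, 100)         # 5-100 ms between queries
        x, y = rng.uniform(0, world), rng.uniform(0, world)
        width, height = rng.uniform(10, 100), rng.uniform(10, 100)
        queries.append({
            "issued_at_ms": clock_ms,
            "table": rng.choice(["Restaurants", "GasStations"]),
            "window": (x, y, x + width, y + height),
        })
    return queries


trip = generate_trip(random.Random(0))
```

Passing in a seeded random.Random instance makes a trip reproducible, which helps when two replacement schemes are compared on the same workload.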

33
Experiments (Moving Object)

Behavior of Moving Object
Generating Spatio-Temporal Dataset (GSTD) [PT00]
Moves in a 2D space
Static points and regions called infrastructure
emulate real life objects like buildings, rivers,
roads etc.
Trajectories are generated using specific
guidelines
Initial statistical distribution of
infrastructure objects
Source and destination location
Speed of moving object
Direction of motion
Duration of journey

34
Query Execution Guidelines

Controllable parameters
Type of queries
Location dependent, Location aware, Non-location
related
Frequency of query issuance
Selectivity of chosen queries
Query overlap rate
Parameters are chosen in a variety of
combinations
Random
Gaussian distribution
Skewed distribution

35
Results

Cache size vs. hit rate (NEW vs. m-LRU)
The NEW replacement scheme performs roughly on par with the modified LRU (m-LRU) replacement scheme
BHR increases up to 70% as the cache size is progressively increased

36
Results

Hit rates vs. number of queries (NEW scheme)
Increasing the number of queries in the system does not substantially increase the hit rates.
Byte hit rate is nearly equal to the Mid hit rate

38
Conclusion

No assumption made on Spatial Locality of
Reference
Query descriptors act as Intelligent References
Can support Content Based Reasoning
Ability to take advantage of Schema Knowledge
Page / Tuple caching schemes do not scale well in
our GIS domain
Reasons
Unintelligent pointer references
Questionable assumption of Spatial Locality of
Reference
Inability to take advantage of Semantic Overlaps

39
Advantages of Semantic Caching

Benefits of Semantic Caching
Leverages semantic locality found in typical
mobile GIS applications
Adapts dynamically to the patterns of user
queries rather than caching static clusters of
tuples
Minimizes cost of cache lookup due to compact
representation of query descriptors
Capable of providing partial and/or approximate
answers to queries quickly

40
Conclusion (Cont.)

Shortcomings of Semantic Caching
Complicated cache management schemes
Too restrictive: the solver can process only simple types of queries
Captures the semantics of the query and not of the result objects. Hence, it fails to utilize cached objects when the semantics of the queries do not match

41
Conclusion (Cont.)

Future work: lots of things
Make the solver more general to handle different
types of queries
Make the caching scheme flexible enough to
capture the semantics of the query descriptors as
well as the result objects
Simpler cache management
Ability to share cache with peers
