Title: Semantic Query Caching in Mobile Environments
1. Semantic Query Caching in Mobile Environments
- By Jekkin Shah
- Advisor: Dr. Konstantinos Kalpakis
2. Semantic Query Caching in Mobile Environments
- Introduction
- Motivation
- Contribution
- Concept of Semantic Caching
- Issues involved in semantic caching
- System Architecture
- Prototype and Experiments
- Conclusion and further work
3. Introduction
- Separate lines of progress in:
- Geographic Information Systems (GIS)
- Global Positioning System (GPS)
- Wireless technology
- Handheld devices
- Convergence into Mobile Geographic Information Systems (mobile GIS)
- Rapid growth in mobile GIS applications in all walks of life
- Emphasis on spatial data and its storage, retrieval, and manipulation
4. Convergence
[Figure: GIS, GPS, wireless, and handheld technologies converging into mobile GIS]
5. Growing List of Applications
- Car navigation systems
- Emergency services
- Real-time stock quotes
- Field services
- Real-time tracking and routing of shipments
- Environmental surveys
- and the list is growing rapidly
7. Motivation
- Hungry! Let's find a nearby restaurant.
- Query Q1:
- FIND restaurants WHERE location nearby
Found 37 matches
8. Example 1 (cont.)
- Wait, we also need some gas!
- Let's see if we can find a gas station near a McDonald's.
- Query Q2:
- FIND McDonalds WHERE gas Station nearby
Found 2 matches
9. Shouldn't we speed up the process?
- Query Q1 is in the local cache
- Query Q1 subsumes query Q2
- Why do we need to execute query Q2 from scratch?
- We need a technique to determine that Q2 is contained in Q1 and to extract the result of Q2 from the result of Q1
- Unfortunately, traditional techniques like page caching do not help much in this case
[Figure: regions labeled Q1 and Q2]
10. A new approach: Semantic Caching
- Along with query results, also store the queries in the cache
- Use these queries (query descriptors) to determine if and how a new query can be answered from the cache
- Check whether the required data is present in the cache
- Extract the data from the cache
- Add, remove, and merge data by performing the corresponding operation on query descriptors
- Manage the cache by managing the query descriptors
- Think of query descriptors as intelligent pointer references that implicitly contain some information about the data they refer to
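The idea of storing descriptors next to results can be sketched as follows. This is a minimal illustration, not the thesis prototype: it assumes that a query descriptor is just an axis-aligned rectangle over (x, y) locations, and the names `Rect`, `SemanticCache`, `store`, and `lookup` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned region; plays the role of a query descriptor (assumption)."""
    x1: float
    y1: float
    x2: float
    y2: float

    def contains(self, other: "Rect") -> bool:
        # True when `other` lies entirely inside this region.
        return (self.x1 <= other.x1 and self.y1 <= other.y1
                and self.x2 >= other.x2 and self.y2 >= other.y2)

    def covers_point(self, x: float, y: float) -> bool:
        return self.x1 <= x <= self.x2 and self.y1 <= y <= self.y2


class SemanticCache:
    """Keeps each result set together with the descriptor that produced it."""

    def __init__(self):
        self.entries = {}  # descriptor -> list of (x, y, name) rows

    def store(self, descriptor, rows):
        self.entries[descriptor] = list(rows)

    def lookup(self, query):
        """Return the rows for `query` if a cached descriptor subsumes it, else None."""
        for descriptor, rows in self.entries.items():
            if descriptor.contains(query):
                # Extract the answer by filtering the cached rows locally.
                return [r for r in rows if query.covers_point(r[0], r[1])]
        return None


cache = SemanticCache()
q1 = Rect(0, 0, 10, 10)
cache.store(q1, [(1, 1, "Diner"), (8, 9, "Cafe"), (5, 5, "Grill")])

q2 = Rect(4, 4, 9, 9)                    # subsumed by q1
print(cache.lookup(q2))                  # answered from the cache
print(cache.lookup(Rect(5, 5, 20, 20)))  # not subsumed, so None
```

The descriptor is what lets the cache reason about containment without touching the rows; a page-based cache has no comparable information to inspect.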
11. Problems with traditional caching
- Pointer references do not contain any implicit information
- $Q_1 \rightarrow \{p_1, p_2, p_3, p_4, p_5, p_6\}$
- $Q_2 \rightarrow \{p_7, p_8, p_9, p_{10}, p_{11}, p_{12}\}$
- $Q_3 \rightarrow$ all the pages
- Space constraints make it difficult to store all the pages in the cache
[Figure: pages p1 to p6 and p7 to p12, with a region labeled data3]
13. Contribution
- An architecture for Semantic Caching in mobile environments
- A system prototype as a proof of concept, with the following building blocks:
- Query parser and validator
- A Solver for determining query satisfiability
- An Executor for processing partial and remainder queries
- A Cache manager for efficiently managing the cache
- A cache replacement algorithm
- Techniques for query processing
15. Issues in semantic caching
- Although the idea of semantic caching is straightforward (store query descriptors along with their results), the issues involved are much harder
- Simple concept but difficult implementation
- Issues:
- 1. We need to decide whether the answer is present in the cache
- 2. If it is present, do we have sufficient information to extract it?
16. Answering Queries from Cache
Is the result of Q3 present in $(Q_1 \cup Q_2)$?
17. Solving the implication problem
- Let $T = \{Q_1, Q_2\}$ be a set of query descriptors already in cache
- We need to show that $Q \Rightarrow T$
- We show that $\neg(Q \Rightarrow T)$ is FALSE
- $\neg(Q \Rightarrow T)$
- $\equiv \neg(\neg Q \lor T)$
- $\equiv Q \land \neg(T)$
- $\equiv Q \land \neg(T_1 \lor T_2 \lor T_3 \lor T_4)$
- $\equiv Q \land \neg(T_1) \land \neg(T_2) \land \neg(T_3) \land \neg(T_4)$
- This is the primary technique used in our thesis.
- The algorithm is adopted from [LY85].
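To make the refutation idea concrete, the sketch below checks whether Q AND NOT T1 AND NOT T2 ... has any solution in a deliberately simple setting. It assumes each descriptor is a half-open range `[lo, hi)` over a single numeric attribute; this is an illustrative simplification and not the [LY85] algorithm. The function name `implied_by_cache` is hypothetical.

```python
def implied_by_cache(q, cached):
    """Return True if Q implies (T1 OR T2 OR ...).

    q and each cached entry are half-open ranges (lo, hi) on one attribute
    (assumption). The implication holds exactly when
    Q AND NOT T1 AND NOT T2 ... is unsatisfiable, i.e. when no part of q
    survives after every cached range has been cut away.
    """
    uncovered = [q]  # parts of Q that still satisfy NOT Ti for all Ti seen so far
    for t_lo, t_hi in cached:
        remaining = []
        for lo, hi in uncovered:
            if t_hi <= lo or t_lo >= hi:
                # No overlap: this part of Q satisfies NOT Ti unchanged.
                remaining.append((lo, hi))
                continue
            if lo < t_lo:
                remaining.append((lo, t_lo))  # part to the left of Ti
            if t_hi < hi:
                remaining.append((t_hi, hi))  # part to the right of Ti
        uncovered = remaining
        if not uncovered:
            return True  # contradiction reached: nothing satisfies the negation
    return not uncovered


print(implied_by_cache((0, 10), [(0, 6), (5, 12)]))  # cache covers Q
print(implied_by_cache((0, 10), [(0, 4), (6, 10)]))  # gap [4, 6) remains
```

With several attributes and general predicates, the number of cases produced by expanding the negated disjunction grows quickly, which is where a dedicated solver becomes necessary.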
18. Solving the implication problem (cont.)
- Problem: exponential growth in the number of equations to be solved
- Solution:
- Clustering based on signatures
- A signature is created from the predicate attributes present in the query
- The number of clusters created is restricted
- Signatures are used to index the query descriptors
[Figure: example clusters labeled "Attr A, B" and "Attr X, D"]
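A signature index of this kind might look like the sketch below. It assumes a signature is simply the sorted tuple of attribute names used in a query's predicates, and that only descriptors with an identical signature are offered to the solver; both are simplifying assumptions, and the restriction on the number of clusters is not modeled. `signature`, `DescriptorIndex`, `add`, and `candidates` are hypothetical names.

```python
from collections import defaultdict


def signature(predicates):
    """Signature of a query: the sorted attribute names of its predicates.

    `predicates` maps attribute name -> (lo, hi) range (assumed layout).
    """
    return tuple(sorted(predicates))


class DescriptorIndex:
    """Groups cached query descriptors into clusters keyed by signature."""

    def __init__(self):
        self.clusters = defaultdict(list)

    def add(self, qid, predicates):
        self.clusters[signature(predicates)].append((qid, predicates))

    def candidates(self, predicates):
        # Only descriptors in the matching cluster are handed to the solver,
        # which keeps the number of cases it must examine small.
        return [qid for qid, _ in self.clusters.get(signature(predicates), [])]


index = DescriptorIndex()
index.add("Q1", {"LocX": (0, 10), "LocY": (0, 10)})
index.add("Q2", {"Zip": (21000, 21300)})
index.add("Q3", {"LocY": (2, 3), "LocX": (1, 2)})

print(index.candidates({"LocX": (1, 2), "LocY": (1, 2)}))
print(index.candidates({"Name": ("A", "M")}))
```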
19. Data Extraction problem
Can we extract Data3?
[Figure: regions labeled Data1, Data2, and Data3]
We fetch attribute C from the remote source and take a Cartesian product with the data already present in the cache.
20. Answering Partial Queries
- What happens if $Q \Rightarrow T$ is FALSE?
- There may be a non-empty intersection between Q and T
- Answer $(Q \land T)$ locally (partial match)
- Send $(Q \land \neg T)$ to the server (remainder query)
[Figure: regions labeled T1, T2, and Q]
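For rectangular queries, the split into a locally answerable part and a remainder can be sketched as below. Rectangles are assumed to be `(x1, y1, x2, y2)` tuples, and `intersect`, `subtract`, and `split_query` are hypothetical helper names; the prototype's Executor may work differently.

```python
def intersect(q, t):
    """Overlap of two rectangles (x1, y1, x2, y2), or None if they are disjoint."""
    x1, y1 = max(q[0], t[0]), max(q[1], t[1])
    x2, y2 = min(q[2], t[2]), min(q[3], t[3])
    return (x1, y1, x2, y2) if x1 < x2 and y1 < y2 else None


def subtract(q, t):
    """Parts of q not covered by t, returned as disjoint rectangles."""
    overlap = intersect(q, t)
    if overlap is None:
        return [q]
    qx1, qy1, qx2, qy2 = q
    ox1, oy1, ox2, oy2 = overlap
    pieces = []
    if qx1 < ox1:
        pieces.append((qx1, qy1, ox1, qy2))  # full-height strip on the left
    if ox2 < qx2:
        pieces.append((ox2, qy1, qx2, qy2))  # full-height strip on the right
    if qy1 < oy1:
        pieces.append((ox1, qy1, ox2, oy1))  # strip below the overlap
    if oy2 < qy2:
        pieces.append((ox1, oy2, ox2, qy2))  # strip above the overlap
    return pieces


def split_query(q, cached):
    """Return (partial, remainder): Q AND T answered locally, Q AND NOT T sent out.

    Partial pieces may overlap one another when cached regions overlap.
    """
    partial = [r for r in (intersect(q, t) for t in cached) if r is not None]
    remainder = [q]
    for t in cached:
        remainder = [piece for r in remainder for piece in subtract(r, t)]
    return partial, remainder


partial, remainder = split_query((0, 0, 10, 10), [(-5, -5, 5, 15)])
print(partial)    # region answered from the cache
print(remainder)  # region sent to the server as the remainder query
```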
22. Semantic Caching Architecture
[Figure: architecture diagram. Components: Query parser and validator, Solver (query implication), Executor, Cache manager, Local cache, Remote db. Flow labels: query, results]
23. Cache Structure
- The local cache is implemented as relational database structures
- Query descriptors are stored in one table, indexed by their signatures
- The corresponding query results (data) are stored in another table
- An auxiliary table associates each query descriptor with its corresponding data
- The cache manager interacts with the query descriptor table
- Manipulation of data is achieved through the manipulation of query descriptors
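One possible three-table layout is sketched below using Python's built-in `sqlite3` module. All table and column names (`query_descriptor`, `result_data`, `descriptor_data`, and so on) are hypothetical, as is the choice of SQLite; the slides do not name the schema or the database used by the prototype.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One row per cached query descriptor, indexed by its signature.
CREATE TABLE query_descriptor (
    qid       INTEGER PRIMARY KEY,
    signature TEXT NOT NULL,
    predicate TEXT NOT NULL
);
CREATE INDEX idx_descriptor_signature ON query_descriptor(signature);

-- Cached result records (the data).
CREATE TABLE result_data (
    rid   INTEGER PRIMARY KEY,
    loc_x REAL,
    loc_y REAL,
    name  TEXT
);

-- Auxiliary table linking descriptors to the records they cover.
CREATE TABLE descriptor_data (
    qid INTEGER REFERENCES query_descriptor(qid),
    rid INTEGER REFERENCES result_data(rid),
    PRIMARY KEY (qid, rid)
);
""")

conn.execute(
    "INSERT INTO query_descriptor VALUES (?, ?, ?)",
    (1, "LocX,LocY", "0<=LocX<=10 AND 0<=LocY<=10"),
)
conn.executemany(
    "INSERT INTO result_data VALUES (?, ?, ?, ?)",
    [(1, 2.0, 3.0, "Diner"), (2, 7.0, 8.0, "Cafe")],
)
conn.executemany("INSERT INTO descriptor_data VALUES (?, ?)", [(1, 1), (1, 2)])


def rows_for_descriptor(conn, qid):
    """Reach the data only through the descriptor, via the auxiliary table."""
    cur = conn.execute(
        "SELECT d.name FROM result_data d "
        "JOIN descriptor_data a ON a.rid = d.rid "
        "WHERE a.qid = ? ORDER BY d.rid",
        (qid,),
    )
    return [name for (name,) in cur]


print(rows_for_descriptor(conn, 1))
```

Because records are linked to descriptors through a separate table, one record can belong to several descriptors without being stored twice.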
24. Cache Operations and Management
- Cache Manager
- Replacement module
- Replacement: determines what needs to be cached and what can be purged
- Management module
- Addition: the granularity of addition is a semantic region
- Deletion: removal of a region, though not necessarily leading to the removal of data
- Merge: to simplify query processing, two or more regions can be merged
- Decomposition: a very large region can be decomposed for efficiency reasons
25. Cache Replacement
- Theory and Assumptions
- What is the performance metric?
- Conventional caching schemes optimize one or more of the following parameters to improve performance:
- Hit ratio
- Response time
- Data transmission time
- Due to the dynamics of our application domain, none of these parameters truly reflects the performance of our applications
26. Theory and Assumptions (cont.)
- Cache hit rate: how do we define hit rate?
- One: at least one data record is obtained from the cache
- All: all data records are obtained from the local cache
- Mid: 50% of the data records are satisfied from the local cache
- Response time
- Partially answered queries make it difficult to define the response time accurately
- Data transmission time
- Depends heavily on actual network parameters such as latency and bandwidth
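The three hit definitions give different hit rates for the same workload, as the sketch below shows. It assumes each query is summarized as a pair (records served from cache, total records) and reads "Mid" as "at least 50%"; both are assumptions, and `is_hit` and `hit_rate` are hypothetical names.

```python
def is_hit(from_cache, total, definition):
    """Decide whether one query counts as a hit under the given definition."""
    if total == 0:
        return False  # assumption: a query with no records is not counted as a hit
    if definition == "one":
        return from_cache >= 1
    if definition == "all":
        return from_cache == total
    if definition == "mid":
        return from_cache / total >= 0.5  # assumption: "Mid" means at least 50%
    raise ValueError(f"unknown hit definition: {definition}")


def hit_rate(queries, definition):
    """Fraction of queries that count as hits; queries is a list of (from_cache, total)."""
    hits = sum(is_hit(c, t, definition) for c, t in queries)
    return hits / len(queries)


workload = [(0, 10), (3, 10), (5, 10), (10, 10)]
for name in ("one", "mid", "all"):
    print(name, hit_rate(workload, name))
```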
27. Theory and Assumptions (cont.)
- Mobile environments: premium on bandwidth
- Our goal: to minimize the cost of servicing the requests that cannot be answered from the local cache
- Cost is measured in terms of time
- The performance metric is byte hit rate (BHR)
- Ratio of the actual amount of data served from the local cache to the amount of data transferred from the remote source
- Assumptions:
- Negligible query execution time
- Uniform latency and bandwidth across the network
28. Replacement Algorithm
- A Guiding Action Selection (GAS) function assigns a value to each semantic region
- GAS value = a (s f b)
- s: size of data transferred from the remote source
- f: frequency of access of the query
- a, b: domain-specific parameters
- a: freshness count of each query
- b = 1/Sd, where Sd is the distance between the current location of the moving object and the location of the query
- Using the GAS function, the value of each semantic region is calculated
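A GAS-style scoring function could be sketched as below. The operators between s, f, and b are not visible in the formula, so this sketch assumes a plain product, a (s · f · b); it also assumes Euclidean distance for Sd and adds a hypothetical `min_distance` guard against division by zero. Treat it as one reading of the formula, not the definitive one.

```python
import math


def gas_value(size_bytes, frequency, freshness, query_loc, current_loc,
              min_distance=1e-9):
    """Score a semantic region; larger values are more worth keeping.

    Assumed form: a * (s * f * b) with
      s = size_bytes  (data transferred from the remote source)
      f = frequency   (how often the query was accessed)
      a = freshness   (freshness count of the query)
      b = 1 / Sd      (Sd = distance from the moving object to the query location)
    """
    # Euclidean distance is an assumption; min_distance avoids dividing by zero
    # when the object sits exactly on the query location.
    distance = max(math.dist(query_loc, current_loc), min_distance)
    b = 1.0 / distance
    return freshness * (size_bytes * frequency * b)


near = gas_value(1000, 4, 2, query_loc=(0, 0), current_loc=(3, 4))
far = gas_value(1000, 4, 2, query_loc=(0, 0), current_loc=(30, 40))
print(near, far)  # the nearer region scores higher under this reading
```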
29. Replacement Algorithm (cont.)
- For each query in the cache we have:
- GAS value (Vi)
- Weight (Wi)
- We also have a limit on the total size of the cache (W) and on the total number of queries (K) that can be admitted
- Problem definition:
- Given a set of rectangles, each with a weight and a value, choose at most K rectangles that give the maximum total value, provided the total weight does not exceed W
- The problem can be formulated as the 0-1 knapsack problem with an additional cardinality constraint
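The selection problem can be solved exactly on small inputs with a textbook dynamic program, sketched below. This is a standard exact method and not necessarily the replacement algorithm used in the prototype; it assumes integer weights, and `select_regions` is a hypothetical name.

```python
def select_regions(values, weights, capacity, max_items):
    """0-1 knapsack with a cardinality constraint.

    Chooses at most `max_items` regions with total weight <= `capacity`
    that maximize total value. Weights are assumed to be non-negative integers.
    Returns (best_value, chosen_indices).
    """
    # best[k][w]: best value using at most k regions with total weight <= w.
    best = [[0.0] * (capacity + 1) for _ in range(max_items + 1)]
    chosen = [[[] for _ in range(capacity + 1)] for _ in range(max_items + 1)]

    for i, (value, weight) in enumerate(zip(values, weights)):
        # Iterate k downwards so row k-1 still describes the state before
        # region i was considered; this keeps each region used at most once.
        for k in range(max_items, 0, -1):
            for w in range(capacity, weight - 1, -1):
                candidate = best[k - 1][w - weight] + value
                if candidate > best[k][w]:
                    best[k][w] = candidate
                    chosen[k][w] = chosen[k - 1][w - weight] + [i]

    return best[max_items][capacity], chosen[max_items][capacity]


values = [60, 100, 120]   # GAS values Vi
weights = [1, 2, 3]       # weights Wi
print(select_regions(values, weights, capacity=5, max_items=2))
print(select_regions(values, weights, capacity=5, max_items=1))
```

The table has (K + 1) × (W + 1) entries and each region updates it once, so the running time is proportional to n · K · W.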
31. Experiments (Setup)
- Requirements
- Workload (datasets and queries)
- Modeling the behavior of the moving object
- Query execution guidelines
- Real datasets
- Hard to obtain
- Complex to process because of the complex structures of spatial objects
- Synthetic dataset generator
- Easily generated
- Various parameters can be controlled
32. Workload
- Query load selection
- Tables
- Restaurants: LocX, LocY, Name, ID, tables, City, Zip
- Gas Stations: LocX, LocY, Name, ID, Low, Mid, High
- Query specifications
- Rectangular queries (select and project only)
- Number of queries issued per trip: 20-70
- Type of queries: location aware, location dependent, and non-location related
- Frequency of issuance: selected randomly, ranging from 5 ms to 100 ms
- Overlap rate: 10-25%
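A synthetic query stream with these ranges might be generated as in the sketch below. Only the 20-70 queries per trip and the 5-100 ms issue interval come from the slide; the size of the space, the rectangle dimensions, the uniform distributions, and the name `generate_trip_queries` are assumptions, and the overlap rate and query types are not modeled.

```python
import random


def generate_trip_queries(rng, space=1000.0, max_side=100.0):
    """Generate one trip's worth of rectangular queries.

    rng      : random.Random instance, so runs can be reproduced from a seed
    space    : side length of the square 2D space (assumed)
    max_side : largest width or height of a query rectangle (assumed)
    """
    count = rng.randint(20, 70)           # queries issued per trip
    time_ms = 0.0
    queries = []
    for _ in range(count):
        time_ms += rng.uniform(5, 100)    # gap since the previous query, in ms
        width = rng.uniform(1, max_side)
        height = rng.uniform(1, max_side)
        x = rng.uniform(0, space - width)
        y = rng.uniform(0, space - height)
        queries.append({
            "time_ms": time_ms,
            "table": rng.choice(["Restaurants", "GasStations"]),
            "rect": (x, y, x + width, y + height),
        })
    return queries


trip = generate_trip_queries(random.Random(0))
print(len(trip), trip[0])
```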
33. Experiments (Moving Object)
- Behavior of the moving object
- Generating Spatio-Temporal Dataset (GSTD) [PT00]
- Moves in a 2D space
- Static points and regions called infrastructure emulate real-life objects like buildings, rivers, roads, etc.
- Trajectories are generated using specific guidelines:
- Initial statistical distribution of infrastructure objects
- Source and destination location
- Speed of the moving object
- Direction of motion
- Duration of journey
34. Query Execution Guidelines
- Controllable parameters
- Type of queries
- Location dependent, location aware, non-location related
- Frequency of query issuance
- Selectivity of chosen queries
- Query overlap rate
- Parameters are chosen in a variety of combinations:
- Random
- Gaussian distribution
- Skewed distribution
35. Results
- Cache size vs. hit rate (NEW vs. m-LRU)
- The NEW replacement scheme performs roughly on par with the modified LRU (m-LRU) replacement scheme
- BHR increases up to 70% as the cache size is progressively increased
36. Results
- Hit rates vs. number of queries (NEW scheme)
- Increasing the number of queries in the system does not substantially increase the hit rates
- Byte hit rate is nearly equal to the Mid hit rate
38. Conclusion
- No assumption is made about Spatial Locality of Reference
- Query descriptors act as Intelligent References
- Can support Content Based Reasoning
- Ability to take advantage of Schema Knowledge
- Page / Tuple caching schemes do not scale well in our GIS domain
- Reasons:
- Unintelligent pointer references
- Questionable assumption of Spatial Locality of Reference
- Inability to take advantage of Semantic Overlaps
39. Advantages of Semantic Caching
- Benefits of Semantic Caching:
- Leverages the semantic locality found in typical mobile GIS applications
- Adapts dynamically to the patterns of user queries rather than caching static clusters of tuples
- Minimizes the cost of cache lookup through the compact representation of query descriptors
- Capable of providing partial and/or approximate answers to queries quickly
40. Conclusion (cont.)
- Shortcomings of Semantic Caching:
- Complicated cache management schemes
- Too restrictive: the Solver can process only simple types of queries
- Captures the semantics of the query and not of the result objects; hence it fails to use cached objects when the semantics of the queries do not match
41. Conclusion (cont.)
- Future work: lots of things
- Make the Solver more general so that it handles different types of queries
- Make the caching scheme flexible enough to capture the semantics of the query descriptors as well as the result objects
- Simpler cache management
- Ability to share the cache with peers