Using User Access Patterns for Semantic Query Caching - PowerPoint PPT Presentation

1 / 17
About This Presentation
Title:

Using User Access Patterns for Semantic Query Caching

Description:

help to anticipate and pre-fetch incoming queries. help for semantic query caching. ... pre-fetching queries based on user access graphs. integrating query pre ... – PowerPoint PPT presentation

Number of Views:63
Avg rating:3.0/5.0
Slides: 18
Provided by: cseY
Category:

less

Transcript and Presenter's Notes

Title: Using User Access Patterns for Semantic Query Caching


1
Using User Access Patterns for Semantic Query
Caching
  • Qingsong Yao, and Aijun An
  • qingsong,ann_at_cs.yorku.ca
  • Department of Computer Science
  • York University
  • Toronto, Canada
  • 2003.08

2
Topics of Discussion
  • User Access Patterns
  • Semantic Query Caching Problem and Solutions
  • Algorithms
  • Implementation and Experiments
  • Conclusion and Future Work

3
Background
  • All SQL queries submitted by a client or a user
    have specific meaning, and the query execution
    orders follow certain business logics or rules.
  • The applications are written by using certain
    programming tools. The embedded business logics
    ensure that the submitted queries have certain
    formats and follow certain rules.
  • In a database-driven web site, each dynamic web
    page corresponds to a set of queries. The web
    visitors show certain navigation patterns, thus
    the queries show certain orders.
  • User access patterns describe how a group of
    users or a client application accesses the data
    of a database, include
  • a collection of user access events which
    represents the format of queries.
  • a collection of frequent user access graphs
    which describes the query execution orders.
  • User access patterns can be mined from database
    workload or business logics, and they can
  • help to rewrite certain SQL to gain fast
    execution time.
  • help to tune the database system,
  • help to anticipate and pre-fetch incoming
    queries.
  • help for semantic query caching.

4
User Access Patterns
  • User access patterns describe how a group of
    users or a client application accesses the data
    of a database, include
  • a collection of user access events associated
    with the occurrence frequency and the parameter
    distribution which describe the format of
    queries.
  • a collection of frequent user access graphs
    which describes the query execution orders.
  • User access patterns can be mined from database
    workload or business logics, and they can
  • help to rewrite certain SQL to gain fast
    execution time.
  • help to tune the database system, and generate
    aalternative query execution path by adding
    index, materialized view , statistics.
  • help to anticipate and pre-fetch incoming
    queries.
  • help for semantic query caching.
  • help to find possible data distributions or
    relations.

5
User Access Event and User Access Graph
  • A user access event represents a set of similar
    queries. It contains a SQL template and a set of
    parameters.
  • SQL Template each value of the SQL queries is
    replaced by a wildcard character ()
  • Parameters the corresponding values of the
    queries, and can be constants or variables
  • For example, event (select name from customer
    where id,101) will retrieve customer 101 s
    name, and event (select name from customer where
    idd, cid) represent a set of queries that
    retrieve the name of a given customer
  • A user access graph is a directed dependency
    graph which represents the query execution order
  • each node is a user access event or user access
    graph
  • each edge associates with a confidence value
  • contains a set of global variables shared by the
    nodes
  • some nodes associate with actions which change
    the value of global variables
  • Semantic relationship exists between the events
    of a graph

6
An Example of User Access Graph
V31,V32,V34 are determined event.V34 is
parameter-determined by V33. V31 and V32 have the
same query predicate.
7
Problem and Solutions
  • Semantic Query Caching
  • Previous query results are cached at clients or
    mediates.
  • Each cache associates with a formula to describe
    the content.
  • By finding the semantic relationship between the
    cache formula and the query predicate, cached
    query result can help to evaluate incoming
    queries.
  • In order to answer a query, a probe query is
    performed on the cached data, as well as a
    reminder query is performed on the server to
    retrieve the data which is not in the caches.
  • Problem
  • no good cache selection and replacement policies.
  • cache matching time can not be ignored.
  • Solution
  • anticipate future queries according to request
    sequence and user access graphs.
  • rewrite current query to answer future queries.

8
Problem and Solutions (2)
  • Given two consecutive parameterized queries u and
    v, three kinds of solutions are proposed

PR
SEQ
  • Costs
  • Response Time Network Traffic Server
    Processing Time and Costs.
  • Factors
  • The semantic relationship between u and v.
  • The result size of u and v.
  • The possibility that v follows u .

9
Semantic Relationship
rows
u v
columns
6. u and v are irrelevant.
10
Rewriting Algorithm
  • Assumption the query predicates of u and v are
    both the conjunction of basic predicate units
    var1 op var2 cons., where op in ,lt,gt,gt,lt
  • solutions
  • if v is undetermined, solutions
  • else if v is result-determined by u, solutions
    SEQ
  • else if u and v are irrelevant, solutions
    SEQ
  • else Gu weighted_directed_graph(u)
  • Gv weighted_directed_graph(v)
  • comm_diff(Gu,Gv,comm,diff)
  • rel relation(comm,diff)
  • choose solutions according to rel
  • generate rewriting queries
  • select a solution based on the overall costs.

11
Example
  • u1 select a1 from r1 where a21 and a3lt3
  • u2 select a1 from r1 where a21 and a3gt1,

u1gt select a1 from r1 where a2lt1 and 0lta2-1
and a3lt3
comm (a2,0,1),(0,a2,-1) difflt(a3,0,3),null
gt, ltnull,(0,a3,-1)gt, lt(a3,a2,2),nullgt,ltn
ull,(a2,a3,0)gt
Relationship partial-match
Rewriting queries u1 ? u2 select a1,a3 from r1
where a21 u1 select a1,a3 from r1 where a21
and a3 lt 3 u2 select a1 from r1 where a21
and a3gt3
12
Implementation - Architecture
  • SQL-Relay is an event-driven, rule-based database
    gateway
  • Each connected user correspond to a state machine
    which contains a set of states, variables and
    user request sequence.
  • Each incoming query is one kind of user access
    event.
  • Each event associates with a set of pre-defined
    execution rules.
  • SQL-Relay contains a set of standard routine to
    process a given execution rules.
  • Previous query results are cached for answering
    incoming queries.
  • Two different kinds of caches global cache and
    local cache.

13
Implementation-Cache Management
Two kinds of caches
  • A global cache aims to answer multiple request
    from the clients.
  • always available to answer queries.
  • various cache update policies can be used
    (immediate update, periodic update, or
    deferred update) .
  • can not be replaced by other caches, and is
    always available unless it is temporarily
    disabled due to updating.
  • A local cache aims to answer the request from a
    specific client.
  • has shorter life-time.
  • may be replaced by other local caches, and an
    LRU-like replacement policy is used for cache
    replacement.
  • is discarded when the underlying data is update.

14
Experiment Result (1)
  • Mining result for a client/server application
    from one day's database queries log
  • 12 instances of the application, and 9,344 SQL
    queries.
  • 190 user access events.
  • 718 user request sequences belong to 21 frequent
    user access graphs (support gt10).

q2,q3,q4 has horizontal-match relationship, can
be rewrite as select count() as num,
card_id, contract_last,
contract_first from customer where
cust_num1074
15
Experiment Result (2)
  • Client program is implemented by using java
    language, simulates user request sequences based
    on the user access graphs and the following
    parameters
  • nClients the number of concurrent connected
    clients.
  • nPaths the number of user request
    sequences.
  • nRuns the number of runs.
  • errorFreq the frequency of a random event
    occurs.
  • abortFreq the frequency of a sequence abort
    to execution.
  • randomFreq the frequency of a random sequence
    occurs.
  • sleepTimePerEvent average sleep time between
    two queries, default 0.1s.
  • sleepTimePerPath average sleep time between
    two sequences,default 0.5s.
  • sleepTimePerRun average sleep time between
    two runs, default 3min.
  • Database server is MySQL version 4 which
    implements a server-side query cache function,
    and MySQL is configured with a 128Kbytes server
    cache.
  • SQL-Relay is implemented by using java language,
    and has 128Kbytes global cache, 4Kbytes local
    cache per connected client.

16
Experiment Result (3)
  • Client program simulates user request sequences
    based on the user access graphs. Database server
    is MySQL version 4 which is configured with a
    128Kbytes server cache. SQL-Relay is implemented
    by using java language, and has 128Kbytes global
    cache, 4Kbytes local cache per connected client
  • Comparison of cache performance under the
    following conditions
  • executing queries without cache
  • executing queries with 128K server cache
  • pre-fetching queries based on user access graphs
  • integrating query pre-fetching and query
    rewriting rules together
  • Our solution has better performance than
    others.
  • cache hit frequency is higher that server-side
    cache.
  • save network traffic by retrieving less data from
    the server.
  • the rewritten queries have less server I/Os.

17
Experiment Result (4) TPC-W Benchmark
  • Pseudo code for order display web interaction
  • q1 select c_id from customer where
    c_uname_at_c_uname and c_passwd_at_c_passwd
  • q2 select max(o_id) from orders where
    o_c_id_at_c_id
  • q3 select customer., orders., address.
    country. from customer,address,country,orders
  • where o_id_at_o_id and c_id_at_c_id and
    o_bill_addraddr_id and addr_co_idco_id
  • q4 select address., country. from address,
    country
  • where addr_id_at_a_ship_id and addr_co_idco_id
  • q5 select from order_line,item where
    ol_i_idi_id and ol_o_id_at_o_id
  • split join query q3 into two queries
  • q3_1 select from customer where cid_at_c_id
  • q3_2 select orders., address. country. from
    customer,address,country,orders
  • where o_id_at_o_id and o_bill_addraddr_id and
    addr_co_idco_id
  • rewrite q1 to include the answer of q1 and q3_1
  • q1 select from customer where
    c_uname_at_c_uname and c_passwd_at_c_passwd
  • reason c_uname and c_passwd is the key.
  • benefit access table customer only once, and
    avoid join customer with other tables.
  • disadvantage retrieve more data when the
    customer has no orders.
  • question can we rewrite q3_2 to include the
    answer of q3_2 and q4?
  • select orders., a1., a2. c1., c2. from
    address a1, address a2, country c1, country c2,
    orders
  • where o_id_at_o_id and o_bill_addra1.addr_id and
    a1.addr_co_idc1.co_id
  • and o_ship_addra2.addr_id and
    a2.addr_co_idc2.co_id
  • reason o_ship_addr , addr_co_id are foreign
    key.
  • benefit make use of foreign key constraints.
  • disadvantage introduce new joins.

18
Experiment Result (5) TPC-C Benchmark
  • Pseudo code for order status transaction
  • q1 EXEC SQL select count(c_id) INTO namecnt
  • from customer where
    c_lastc_last AND c_d_idd_id AND c_w_idw_id
  • q2EXEC SQL DECLARE c_name CURSOR FOR
  • select c_balance, c_first, c_middle, c_id
  • from customer where c_lastc_last AND
    c_d_idd_id AND c_w_idw_id
  • order by c_first
  • EXEC SQL OPEN c_name
  • q3 if (namecnt2) namecnt // Locate midpoint
    customer
  • for (n0 nltnamecnt/2 n)
  • q4 EXEC SQL FETCH c_name INTO c_balance,
    c_first, c_middle, c_id
  • q5 EXEC SQL CLOSE c_name
  • Solution
  • Execute q2 instead of q1 when q1 is submitted.
  • q1 can be answered by retrieving the number of
    rows returned by the cursor.
  • Reason
  • q2 always follows q1 (i.e., the confidence is
    1.0).
  • q1 and q2 have similar query predicate (except
    the order by clause).
  • q2 contains q1 (i.e., the answer of q1 can be
    answered by that of q2).
  • Benefit
  • Only submit one query to the server.
  • Only access the base relation once.
  • Improve performance when no customer meets the
    condition since the original solution search the
    database twice.
  • Disadvantage
  • Not a general solution, every DB has own function
    to retrieve the number of rows of a cursor.
  • More
  • DB server can take advantage of such
    relationship.

19
Conclusion and Future Work
  • Our solution has the several advantages
  • our caching algorithms are based on the query
    execution orders and the semantic relationship
    between queries, which are better than the
    selection policies based on the global query
    reference statistics.
  • It separates global cache with local cache, and
    which will result in a better cache hit ratio.
  • Our SQL-Relay application is flexible and
    extendable where various caching and rewriting
    rules can be added and tested.
  • Future work
  • Finding user access pattern from database
    workload.
  • Exploring more ways to use user access patterns.

20
The End.
  • Thanks
Write a Comment
User Comments (0)
About PowerShow.com