Bidirectional Expansion for Keyword Search on Graph Databases

1
Bidirectional Expansion for Keyword Search on Graph Databases
Varun Kacholia, Shashank Pandit, Soumen Chakrabarti, S. Sudarshan, Rushi Desai, Hrishikesh Karambelkar
http://www.cse.iitb.ac.in/banks/
2
Keyword Search on Graph Representation of Data

Keyword search on relational, XML, HTML, etc. data: BANKS, Discover, DBXplorer, XRank, etc.
Need to find a (closely) connected set of nodes that together match all given keywords
Focus of our work: search algorithms to find connections between nodes

3
Outline

Data, Query and Response Models
Backward Search Algorithm
Bidirectional Search Algorithm
Experiments
Related Work
Conclusions

4
Graph Data Model

Data modeled as a directed, weighted graph [BANKS, ICDE02]
Can model relational, XML, HTML, etc. data
E.g., DBLP database:
Node = tuple
Edge = foreign key reference
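The tuple-to-graph mapping can be sketched in a few lines. This is a minimal illustration, assuming hypothetical DBLP-like tables `author`, `paper` and `writes(author_id, paper_id)` and unit edge weights; it is not the BANKS loader.

```python
from collections import defaultdict

# Hypothetical DBLP-like tuples; (table, primary key) identifies a node.
authors = {"a1": "Sudarshan", "a2": "Prasan Roy"}
papers = {"p1": "Multi-Query Optimization"}
writes = [("w1", "a1", "p1"), ("w2", "a2", "p1")]  # (id, author FK, paper FK)


def build_graph(authors, papers, writes):
    """One node per tuple; one directed edge per foreign-key reference,
    from the referencing tuple to the referenced tuple. Unit edge weights
    are an assumption of this sketch."""
    nodes = {}
    out_edges = defaultdict(list)  # node -> [(target, weight)]
    for key, name in authors.items():
        nodes[("author", key)] = name
    for key, title in papers.items():
        nodes[("paper", key)] = title
    for wid, aid, pid in writes:
        w = ("writes", wid)
        nodes[w] = wid
        out_edges[w].append((("author", aid), 1.0))
        out_edges[w].append((("paper", pid), 1.0))
    return nodes, out_edges


nodes, out_edges = build_graph(authors, papers, writes)
```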

5
Graph Data Model (2)

E.g., XML data:
<proceedings>
  <paper id="1">
    <title>Databases</title>
  </paper>
  <paper id="2">
    <title>Keyword Search</title>
    <cite ref="1">Databases</cite>
  </paper>
</proceedings>

Figure: the corresponding graph, with nodes proceedings, paper (@id 1), paper (@id 2), two title nodes, and a cite node.
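A minimal sketch of turning the XML example into a graph, assuming (for illustration only) that element nesting gives containment edges and that an element's `ref` attribute points at the element with the matching `id`, the way ID/IDREF links work. The function name `xml_to_graph` is hypothetical.

```python
import xml.etree.ElementTree as ET

XML = """<proceedings>
  <paper id="1"><title>Databases</title></paper>
  <paper id="2">
    <title>Keyword Search</title>
    <cite ref="1">Databases</cite>
  </paper>
</proceedings>"""


def xml_to_graph(text):
    """One node per element (numbered in document order); a "child" edge
    from each element to each sub-element, plus a "ref" edge from any
    element whose ref attribute matches another element's id."""
    root = ET.fromstring(text)
    elements = list(root.iter())                    # root first, document order
    pos = {id(el): i for i, el in enumerate(elements)}  # keyed by object identity
    labels = [el.tag for el in elements]
    by_id = {el.get("id"): pos[id(el)] for el in elements
             if el.get("id") is not None}
    edges = []
    for el in elements:
        for child in el:
            edges.append((pos[id(el)], pos[id(child)], "child"))
        ref = el.get("ref")
        if ref in by_id:                            # cross-link makes it a graph
            edges.append((pos[id(el)], by_id[ref], "ref"))
    return labels, edges


labels, edges = xml_to_graph(XML)
```

The `ref` edge from `cite` back to the first paper is what turns the XML tree into a graph.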
6
Response Model

Response: minimal, rooted tree connecting keyword nodes
Undirected: Discover, DBXplorer
Directed: BANKS

Figure: example answer for the query "Sudarshan Roy": the paper "Multi-Query Optimization" linked through two writes nodes to the author nodes Sudarshan and Prasan Roy.
7
Response Ranking

Edge score $E_A$: smaller tree => higher score
E.g., BANKS: $E_A = 1 / \sum_{e} w(e)$ (sum of edge weights in the tree)
Node score $N_A$: measure of authority of the nodes in the tree
E.g., BANKS: $N_A = \sum$ (leaf and root node authorities)
Overall score $f(E_A, N_A)$
E.g., BANKS: $f(E_A, N_A) = E_A \cdot N_A^{\lambda}$
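The BANKS-style score above can be written directly. This is a minimal sketch: the exponent `lam` default of 0.2 is an arbitrary placeholder, not a value from the talk, and the answer tree is assumed to have at least one edge.

```python
def answer_score(edge_weights, node_prestige, lam=0.2):
    """Score = E_A * N_A ** lam, with E_A = 1 / (sum of edge weights) and
    N_A = sum of the prestige of the root and leaf nodes.
    Assumes at least one edge; lam=0.2 is a placeholder value."""
    e_a = 1.0 / sum(edge_weights)   # smaller (lighter) tree -> larger E_A
    n_a = sum(node_prestige)        # more authoritative nodes -> larger N_A
    return e_a * n_a ** lam


# Same nodes, fewer edges: the smaller tree scores higher.
print(answer_score([1, 1], [1, 1]) > answer_score([1, 1, 1, 1], [1, 1]))  # True
```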

8
Finding Answer Trees

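Backward Expanding Search [BANKS, ICDE02]
Intuition: travel backwards from keyword nodes until you hit a common node

Figure: query "sudarshan roy"; expanding backward from the author nodes Sudarshan and Prasan Roy through their writes nodes reaches the common paper node "Multi-Query Optimization".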
9
Backward Search Algorithm

Algorithm:
Run concurrent single-source shortest path iterators, one from each node matching a keyword
Traverse the graph edges in reverse direction
Output the next nearest node on each get-next() call
Do best-first search across iterators
Output a node if it is in the intersection of the sets of nodes reached from each keyword
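The backward search steps above can be sketched as follows: one Dijkstra iterator per keyword node over reversed edges, best-first choice among iterators, and a node reported once every keyword has reached it. The names `DijkstraIterator` and `backward_search` are hypothetical, node ids are assumed to be mutually comparable (for heap tie-breaking), and the sketch only finds answer roots rather than building and ranking trees.

```python
import heapq
from collections import defaultdict


class DijkstraIterator:
    """Shortest-path iterator from one keyword node over reversed edges.
    get_next() returns the next-nearest unsettled (node, distance)."""

    def __init__(self, source, in_edges):
        self.in_edges = in_edges            # node -> [(predecessor, weight)]
        self.heap = [(0.0, source)]
        self.done = set()

    def peek(self):
        # Drop stale heap entries for nodes that are already settled.
        while self.heap and self.heap[0][1] in self.done:
            heapq.heappop(self.heap)
        return self.heap[0][0] if self.heap else float("inf")

    def get_next(self):
        self.peek()
        dist, node = heapq.heappop(self.heap)
        self.done.add(node)
        for pred, w in self.in_edges.get(node, []):  # walk edges backward
            if pred not in self.done:
                heapq.heappush(self.heap, (dist + w, pred))
        return node, dist


def backward_search(keyword_nodes, in_edges):
    """keyword_nodes: one set of matching nodes per keyword. Yields nodes
    reached from every keyword (candidate answer roots) in discovery order."""
    iters = [(k, DijkstraIterator(s, in_edges))
             for k, nodes in enumerate(keyword_nodes) for s in nodes]
    reached = defaultdict(set)              # node -> keywords that reached it
    emitted = set()
    while True:
        live = [(it.peek(), k, it) for k, it in iters if it.peek() < float("inf")]
        if not live:
            return
        _, k, it = min(live, key=lambda t: t[0])  # best-first across iterators
        node, _ = it.get_next()
        reached[node].add(k)
        if len(reached[node]) == len(keyword_nodes) and node not in emitted:
            emitted.add(node)
            yield node
```

On a hypothetical graph whose edges run paper -> writes -> author:

```python
in_edges = {"a1": [("w1", 1.0)], "a2": [("w2", 1.0)],
            "w1": [("p1", 1.0)], "w2": [("p1", 1.0)]}
print(list(backward_search([{"a1"}, {"a2"}], in_edges)))  # ['p1']
```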

10
Backward Search Limitations

Wasteful exploration of graph due to:
Frequently occurring keywords
Hub nodes in the graph (high in-degree)

Figure: query "Shashank Sudarshan Database" over the schema author / writes / paper (schema legend).
11
Bidirectional Search Motivation
12
Bidir Search Intuition

First-cut solution:
Don't go backward if a keyword matches many nodes
Don't go backward if a node points to a hub
Instead, explore forward from the other keywords

13
Bidir Search Example
Figure: the same query, "Shashank Sudarshan Database", over the author / writes / paper schema (schema legend).
14
Bidir Search Issues

What should the threshold for not expanding be?
Our solution: prioritize expansion of nodes based on spreading activation, to penalize frequent keywords and bushy trees
How to manage exploration in both directions?

15
Bidir Search Spreading Activation

Spreading activation:
Node with the highest activation is explored first
Every node is given an initial activation
Gives low activation to frequently occurring keywords

Figure: the keyword "John" matches five nodes, each receiving an initial activation of 1/5.
16
Bidir Search Spreading Activation

Spreading activation:
Node with the highest activation is explored first
Activation is spread to neighbors (µ = 0.3)
Gives low activation to neighbors of hubs

Figure: a node with activation 1/5 and four neighbors keeps 0.3 × 1/5 and passes 0.7 × 1/5 × 1/4 to each neighbor.
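The two activation rules on these slides can be sketched as below. The split rule (keep a fraction µ, divide 1 − µ equally among neighbors) is read off the slide figure; summing activation that arrives from several neighbors is an assumption of this sketch, and `initial_activation` / `spread` are hypothetical names.

```python
def initial_activation(matching_nodes):
    """Each of the |S| nodes matching a keyword starts with 1/|S|,
    so a frequent keyword gives each of its nodes little activation."""
    return {n: 1.0 / len(matching_nodes) for n in matching_nodes}


def spread(activation, node, neighbors, mu=0.3):
    """Explore `node`: it keeps mu of its activation and splits the other
    (1 - mu) equally among its neighbors, so each neighbor of a high-degree
    hub gets only a small share. Incoming shares are summed (assumption)."""
    a = activation.get(node, 0.0)
    nbrs = neighbors.get(node, [])
    if not nbrs:
        return activation
    activation[node] = mu * a
    share = (1 - mu) * a / len(nbrs)
    for v in nbrs:
        activation[v] = activation.get(v, 0.0) + share
    return activation


act = initial_activation({"j1", "j2", "j3", "j4", "j5"})  # "John" x 5
spread(act, "j1", {"j1": ["x1", "x2", "x3", "x4"]})
next_node = max(act, key=act.get)   # highest activation is explored next
```

With these numbers `j1` keeps 0.3 × 1/5 = 0.06 and each of its four neighbors receives 0.7 × 1/5 × 1/4 = 0.035, matching the figure.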
17
Bidir Search Iterators

How to manage exploration in both directions?
A single backward iterator and a single forward iterator, with suitable data structures
E.g., to keep track of parents of nodes
Details in full paper

Figure: example graph with keyword nodes A and B; each node is labeled (distance from A, distance from B), with ∞ where that keyword has not yet reached the node.
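The per-node bookkeeping in the figure (one distance per keyword, ∞ until reached) can be sketched as a small record. The class and field names are hypothetical; `next_hop` stands in for the parent-tracking structures the slide mentions, whose exact form is left to the full paper.

```python
import math
from dataclasses import dataclass


@dataclass
class NodeState:
    """Bookkeeping for one node in a search over n keywords."""
    dist: list       # dist[i]: best known distance to keyword i (inf = unreached)
    next_hop: list   # next_hop[i]: neighbor on that best path, to rebuild trees

    @classmethod
    def fresh(cls, n):
        return cls([math.inf] * n, [None] * n)

    def relax(self, i, d, via):
        """Record a path of length d to keyword i through `via` if it is
        shorter than the best known one; return True if it improved."""
        if d < self.dist[i]:
            self.dist[i], self.next_hop[i] = d, via
            return True
        return False

    def is_answer_root(self):
        # A node connects all keywords once every distance is finite.
        return all(d < math.inf for d in self.dist)


s = NodeState.fresh(2)      # labeled (inf, inf)
s.relax(0, 2, "u")          # now (2, inf)
s.relax(1, 3, "v")          # now (2, 3): reached from both A and B
```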
18
Bidir Search Algorithm

Algorithm:
Activate the matching nodes and insert them into the backward iterator
while (iterators are not empty):
  Choose an iterator for expansion in a best-first manner
  Explore the node with the highest activation
  Spread activation to its neighbors
  Update path weights (and other data structures)
  Propagate values to ancestors if necessary
  Insert nodes explored in the backward direction into the forward iterator /* for future forward exploration */
  Stop when the top-k results are produced
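The loop above can be sketched in simplified form. This is a hypothetical helper, not the paper's code: activation is a single number per node, the iterator with the higher top activation is expanded, distance improvements are pushed to ancestors through recorded parent edges, and answers are reported in discovery order (not score order). Edge weights are assumed positive and node ids mutually comparable.

```python
import heapq
import math
from collections import defaultdict


def bidirectional_search(keyword_nodes, out_edges, mu=0.3, max_answers=10):
    """keyword_nodes: one set of nodes per keyword; out_edges: node ->
    [(successor, weight)]. Returns up to max_answers answer roots."""
    n = len(keyword_nodes)
    in_edges = defaultdict(list)
    for u, succ in out_edges.items():
        for v, w in succ:
            in_edges[v].append((u, w))
    dist = defaultdict(lambda: [math.inf] * n)  # node -> distance per keyword
    parents = defaultdict(dict)                 # x -> {p: w} for edges p -> x
    act = defaultdict(float)
    q_in, q_out = [], []                        # max-heaps on activation
    seen_in, seen_out, answers = set(), set(), []

    def relax(x, i, d):
        # Lower dist[x][i]; propagate the improvement to ancestors.
        stack = [(x, d)]
        while stack:
            y, dy = stack.pop()
            if dy >= dist[y][i]:
                continue
            dist[y][i] = dy
            if y not in answers and all(v < math.inf for v in dist[y]):
                answers.append(y)
            for p, w in parents[y].items():
                stack.append((p, dy + w))

    for i, nodes in enumerate(keyword_nodes):   # activate matching nodes
        for s in nodes:
            act[s] += 1.0 / len(nodes)
            relax(s, i, 0.0)
    for s in act:
        heapq.heappush(q_in, (-act[s], s))

    def top(q, seen):
        while q and q[0][1] in seen:            # skip already-expanded nodes
            heapq.heappop(q)
        return -q[0][0] if q else -math.inf

    while len(answers) < max_answers:
        a_in, a_out = top(q_in, seen_in), top(q_out, seen_out)
        if a_in == -math.inf and a_out == -math.inf:
            break
        backward = a_in >= a_out                # best-first iterator choice
        q, seen = (q_in, seen_in) if backward else (q_out, seen_out)
        _, v = heapq.heappop(q)
        seen.add(v)
        nbrs = in_edges[v] if backward else out_edges.get(v, [])
        share = (1 - mu) * act[v] / len(nbrs) if nbrs else 0.0
        for u, w in nbrs:
            act[u] += share                     # spread activation
            if backward:                        # edge u -> v
                parents[v][u] = w
                for i in range(n):
                    relax(u, i, dist[v][i] + w)
                heapq.heappush(q_in, (-act[u], u))
            else:                               # edge v -> u
                parents[u][v] = w
                for i in range(n):
                    relax(v, i, dist[u][i] + w)
                heapq.heappush(q_out, (-act[u], u))
        act[v] *= (mu if nbrs else 1.0)
        if backward:                            # for future forward exploration
            heapq.heappush(q_out, (-act[v], v))
    return answers[:max_answers]
```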

19
Bidir Search top-k results

Results need not be generated in order
Naïve solution:
Store results in an intermediate heap
Output the top k results after m·k total results have been generated (m = 10)
Can do better:
Compute an upper bound on the score of the next result
Output answers with a higher score
Similar to the NRA algorithm (Fagin et al., PODS01)
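The "can do better" strategy can be sketched as a buffer that releases answers once no future answer can beat them. A minimal sketch, assuming the search reports each new answer together with a current upper bound on the score of any answer not yet generated; `emit_in_order` and the `(score, answer, bound)` stream format are hypothetical.

```python
import heapq


def emit_in_order(stream, k):
    """stream yields (score, answer, bound), where bound is an upper bound
    on the score of any answer not yet generated. Buffer answers in a
    max-heap and release the best one whenever its score >= bound.
    Equal scores fall back to comparing answers, so they must be comparable."""
    heap, out = [], []
    for score, answer, bound in stream:
        heapq.heappush(heap, (-score, answer))
        while heap and -heap[0][0] >= bound and len(out) < k:
            out.append(heapq.heappop(heap)[1])
        if len(out) == k:
            return out
    # Stream exhausted: nothing better can arrive, so flush in score order.
    while heap and len(out) < k:
        out.append(heapq.heappop(heap)[1])
    return out


stream = [(0.5, "a", 0.9), (0.9, "b", 0.6), (0.7, "c", 0.4)]
print(emit_in_order(stream, 3))  # ['b', 'c', 'a']
```

Compared with the naïve m·k buffer, answers leave as soon as the bound proves them safe, so the first results need not wait for ten times as many answers.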

20
Experiments

Datasets:
DBLP, IMDB: 2 million nodes, 9 million edges
US Patent DB: 4 million nodes, 15 million edges
Workload:
Keywords randomly picked from results of SQL join statements
Search algorithms:
MI-Bkwd: original backward search, with an iterator for every node matching a keyword
SI-Bkwd: backward search with a single backward iterator
Bidirec: bidirectional search
Time taken / nodes explored:
Measured when the 10th answer is generated (or the last answer, if there are fewer than 10 answers)
Origin size:
Number of nodes matched by the keywords in the query

21
Experiments (2)

MI-Bkwd versus SI-Bkwd
SI-Bkwd's gain increases with origin size and number of keywords

22
Experiments (3)

SI-Bkwd versus Bidirec
Bidirec's gain increases with origin size and number of keywords

23
Experiments (4)

Precision/recall experiments
Relevant answers are well-defined: they can be generated through SQL statements
Both MI-Bkwd and Bidirec show similar performance:
Recall: 100%
Precision: 100% at near-full recall
Few irrelevant answers are produced before all relevant answers are generated
Bidirectional runs faster, yet with minimal loss of relevant results!

24
Experiments (5)

Comparison with Sparse [Hristidis et al., VLDB03]
Generates join expressions leading to query results
Uses DB-provided scores for ranking tuples and aggregates them to rank answer trees
For top-k results, automatically determines the required number of join expressions
Sparse-LB:
Manually generate the required join expressions
Sparse needs to do at least this much (and usually a lot more!)
Bidirectional versus Sparse-LB:
Bidirectional outperforms by a factor of 3 (esp. when the number of joins is large)

25
Experiments (6)

SI-Bkwd versus Bidirec by origin size
Bidirec gains more with unbalanced origin sizes

Query groups (origin size of each keyword): A (T,S,S,S), B (M,M,M,M), C (M,L,L,L), D (M,M,L,L), E (T,L,L,L), F (T,S,M,L), G (T,M,L,L), H (T,T,T,L)
26
Discussion

Bidirectional search as dynamic, per-tuple join ordering
Related work in this area: Eddies
Bidirectional search:
Is schema-less
Prioritizes based on activation instead of selectivity
Generates answers in relevance order

27
Related Work

Keyword querying on relational data: Discover (UCSD), DBXplorer (Microsoft)
Use SQL generation, without in-memory data structures
Issues: generating join plans, re-using common sub-expressions, etc.
Keyword querying on XML: XRank (Cornell), Schema-Free XQuery (Michigan)
Tree model is too limited
ObjectRank

28
Conclusions

Graph model:
Convenient common-denominator representation
Schema-free querying leads to graph search
Purely backward strategy is inadequate
Bidirectional search with spreading activation performs much better
Dynamically chooses join order on a per-tuple basis

29
Thank You!
Questions??
30
Future of Keyword Search in DBs

Next generation of intelligent search will require context information
E.g., search email, files, calendar, ...
Information integration will be important
Graph-structured data will be a key component
Is there a killer app?
Deep web?
Display of answers:
Users don't want to see schema details
Can we leverage existing (Web) apps?

31
BANKS Future Work

Applications of BANKS:
Soumen Chakrabarti, Sunita Sarawagi and students:
Exploit BANKS to integrate different sources of data
Extract information, infer soft links
BANKS for personal information management:
SPIN: Search Personal Information Networks
Ongoing/future work on BANKS:
More sysadmin/user control over ranking
One size does not fit all
BANKS provides infrastructure
Characterize bidirectional search better
And find other applications
Security

32
Bidir Search top-k results (2)

Compute an upper bound on the score of the next result
Output answers with a higher score
Computing the bound:
$m_i$ = minimum path length explored backward from keyword $i$
Unseen answer node: $1/(m_1 + m_2 + \cdots + m_n)$
Visited answer node, suppose reached from the first $x$ keywords with distances $d_i$: $1/(d_1 + d_2 + \cdots + d_x + m_{x+1} + m_{x+2} + \cdots + m_n)$
Combine this with the maximum node prestige
We simply use $1/(m_1 + m_2 + \cdots + m_n)$!
Experiments show no significant loss in using this heuristic
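The two edge-score bounds translate directly into code. A minimal sketch with hypothetical function names; it covers only the $1/\sum$ distance part and does not model the combination with maximum node prestige, whose exact form the slide does not give. The sums are assumed to be positive.

```python
def unseen_bound(m):
    """Bound for an answer node not yet reached from any keyword: its
    distance from keyword i is at least m[i], the minimum path length
    still unexplored backward from keyword i."""
    return 1.0 / sum(m)


def visited_bound(d, m):
    """Bound for a node already reached from the first x = len(d) keywords
    at distances d; the other keywords are bounded below by m[x:]."""
    return 1.0 / (sum(d) + sum(m[len(d):]))


print(unseen_bound([1, 2, 3]))              # 1/6
print(visited_bound([0.5, 0.5], [1, 1, 1]))  # 1/(0.5 + 0.5 + 1) = 0.5
```

A visited node can have $d_i < m_i$, so its bound may exceed the unseen-node bound; the slide's heuristic uses only $1/(m_1 + \cdots + m_n)$.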
