The Graph Query Language - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: The Graph Query Language


1
The Graph Query Language
  • David Silberberg
  • The Johns Hopkins University
  • Applied Physics Laboratory
  • July 18, 2006

2
Team Members
  • Wayne Bethea
  • Jim Cavanaugh
  • Clay Fink
  • Paul Frank
  • John Gersh
  • Elisabeth Immer
  • Roger Remington

3
Outline
  • Goals and Example Scenario
  • Related Work and Key Features of GQL
  • Graph Model and Query Language
  • Computational Complexity of Query Execution
  • Future Directions

4
Goals of the Graph Query Language (GQL) Project
  • Introduce a new approach to graph query
    languages for graph analysis
  • Enable graph analysts to perform semantic search
    and iterative analysis over large graphs in a
    scalable fashion
  • Seamlessly integrate graph analysis functions
    into the graph query language
  • Quantify the scalability of this type of
    language
  • Use ontologies to enrich graph querying

5
Example Scenario
  • Farmer Jones' lettuce crop did well this year,
    but few other farmers did well. Why?
  • First, find Farmer Jones.

6
Example Scenario
  • Rabbits usually eat lettuce. Let's find the
    rabbits that ate Farmer Jones' lettuce.

7
Example Scenario
  • Let's look at all the farmers, and their
    locations, whose lettuce was eaten by fewer than
    5 rabbits.

8
Example Scenario
  • What commonalities do the farmers have with each
    other and with the rabbits?

9
Graph Interaction Methods
  • Graph Analysis is a process of both browsing and
    searching elements of the graph
  • Browsing
  • One-step-at-a-time graph navigation
  • One-operation-at-a-time graph algorithms
  • Searching
  • Several-steps-at-a-time graph navigation
  • The steps can include one or more graph
    algorithms
  • GQL is a declarative graph query language for
    searching!

10
Outline
  • Goals and Example Scenario
  • Related Work and Key Features of GQL
  • Graph Model and Query Language
  • Computational Complexity of Query Execution
  • Future Directions

11
Related Work
  • Four categories of graph query languages
  • Knowledge base (subject-predicate-object) query
    languages
  • SPARQL, RQL, RAL, RDF Query Language
  • Graph reasoning query languages
  • OWL-QL, GraphLog, Query and Inference Service for
    RDF
  • Query languages with graph operators
  • GOQL
  • GRAM
  • Graphical user interface query language
  • QGRAPH

12
Key Features of GQL
  • Graph Paradigm
  • Syntax, operators and results use the graph
    paradigm
  • Returns a single graph or a set of graphs (not
    tables or XML files) to support analysis of large
    graphs
  • Facilitates iterative graph querying
  • Semantic Graph Query
  • Schema-based
  • Can be extended to utilize ontology-based
    inference
  • Graph Exploration
  • Wildcard searches
  • Query over patterns

13
Key Features of GQL (continued)
  • Expressivity
  • Composite entities
  • New graph construction of results
  • Universal and existential quantification
  • Analysis support
  • Hypothesis expressions
  • Special graph functions (Shortest Path, Adjacent
    Vertices, etc.)
  • Aggregation functions (count, sum, average, min,
    max)
  • Set aggregation functions (union, intersection,
    difference)

14
Outline
  • Goals and Example Scenario
  • Related Work and Key Features of GQL
  • Graph Model and Query Language
  • Computational Complexity of Query Execution
  • Future Directions

15
Graph Data Models
  • Simple model
  • Vertices usually represent concepts or objects
  • Edges usually represent relationships between
    vertices
  • Properties are attributes of objects or
    relationships
  • Represent highly-connected information such as
  • Social networks
  • Knowledge bases
  • Disciplines that use graphs
  • Link mining analysis
  • Semantic Web
  • Bioinformatics

16
Example Graph Model
  • Graph Schema
  • Data Graph

17
GQL Operators - Overview
  • Basic Syntax
  • SUBGRAPH clause
  • Finds a subgraph in the source graph
  • CONSTRAINT clause
  • Filters the subgraph based on property
    constraints
  • RETURN clause
  • Describes the resulting graph or sets of graphs
    to return
  • Syntax for analysis
  • ASSUME clause
  • Supports hypothesis statements
  • PATTERN clause
  • Defines search patterns

18
Basic GQL Operators
  • Subgraph Template Operators: SUBGRAPH clause
  • Conjunctions and disjunctions of path-segment
    operators
  • Hierarchy operators (for composite vertices)
  • Constraint Operators: CONSTRAINT clause
  • Standard first-order logic
  • Conjunctions, disjunctions and negations as well
    as universal and existential quantification of
    predicates.
  • Projection Operators: RETURN clause
  • Constructs the result graph(s)
  • Path segment operator
  • Hierarchy operator (for composite vertices)
  • Present results as a set of graphs
  • Edge expansion operator
  • Common join operator

19
Simple Query
  • SUBGRAPH Fox Chases Rabbit AND Fox Eats Rabbit
  • CONSTRAINT Chases.Time < Eats.Time
  • RETURN Fox Chases Rabbit AND Fox Eats Rabbit

20
New Result Graph Structure Query
  • SUBGRAPH Fox Eats Rabbit AND Rabbit Eats Lettuce
  • RETURN Fox new(Ingests) Lettuce
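
The effect of `new(Ingests)` can be pictured as building a fresh edge for every two-hop Eats path from a fox to a lettuce. The following is a rough sketch under that reading; the data, the `vtype` lookup table, and the function name `ingests` are assumptions for illustration.

```python
# (edge_type, source, target); illustrative data.
edges = [
    ("Eats", "fox1", "rabbit1"),
    ("Eats", "rabbit1", "lettuce1"),
    ("Eats", "fox2", "rabbit3"),
    ("Eats", "rabbit3", "lettuce2"),
]
vtype = {
    "fox1": "Fox", "fox2": "Fox",
    "rabbit1": "Rabbit", "rabbit3": "Rabbit",
    "lettuce1": "Lettuce", "lettuce2": "Lettuce",
}


def ingests(edges, vtype):
    """Build new Ingests edges Fox -> Lettuce from Fox-Eats-Rabbit-Eats-Lettuce paths."""
    eats = [(s, t) for (et, s, t) in edges if et == "Eats"]
    result = []
    for fox, rabbit in eats:
        if vtype[fox] != "Fox" or vtype[rabbit] != "Rabbit":
            continue
        for eater, lettuce in eats:
            if eater == rabbit and vtype[lettuce] == "Lettuce":
                result.append(("Ingests", fox, lettuce))
    return result


new_edges = ingests(edges, vtype)
```

The intermediate Rabbit vertices are used for matching but do not appear in the result, which is what distinguishes this query from one that simply returns the matched subgraph.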

[Figure: result graph. Vertices: Fox fox1 (name George, age 3), Fox fox2 (name Fred, age 2), Lettuce lettuce1 (name PrizeLettuce), Lettuce lettuce2 (name Icy). New edges: Ingests ingests1, Ingests ingests3.]

21
Aliasing
  • SUBGRAPH Fox ALIAS ChasingFox Chases Rabbit AND
  • Fox ALIAS EatingFox Eats Rabbit
  • CONSTRAINT ChasingFox.name <> EatingFox.name
  • RETURN ChasingFox Chases Rabbit AND
  • EatingFox Eats Rabbit
  • If our graph had an additional edge in which
    George Fox chased Jack Rabbit at 8 a.m., the
    result would look like this:

[Figure: result graph. Vertices: Fox fox1 (name George, age 3), Fox fox2 (name Fred, age 2), Rabbit rabbit3 (name Jack, age 1). Edges: Chases chases3 (time 8am), Eats eats2 (time 9am).]

22
Wildcard Queries
  • SUBGRAPH Fox ALIAS InterestingEdge Rabbit
  • RETURN Fox InterestingEdge Rabbit

[Figure: result graph. Vertices: Fox fox1 (name George, age 3), Fox fox2 (name Fred, age 2), Rabbit rabbit1 (name Peter, age 2), Rabbit rabbit2 (name Bugs, age 4), Rabbit rabbit3 (name Jack, age 1). Edges: Chases chases1 (time 2pm), Eats eats1 (time 3pm), Chases chases2 (time 5pm), Eats eats2 (time 9am).]

23
Composite Vertices
  • Composite vertices
  • Composed of vertices and edges
  • Contained vertices can be composite as well

24
Composite Vertex Queries - continued
  • SUBGRAPH HuntingEvent OccuredAt Place AND
  • HuntingEvent DIRECTLY CONTAINS Rabbit AND
  • Rabbit Eats Lettuce
  • CONSTRAINT Place.name = "Smith Game Park"
  • RETURN Rabbit Eats Lettuce

[Figure: structure of the result graph. Rabbit (name, age) Eats (time) Lettuce (name).]

25
Patterns
  • Pattern Definition
  • Assigns names to interesting graph patterns
  • Can be used in multiple queries
  • PATTERN Predator (Fox new(PreysUpon) Rabbit)
  • SUBGRAPH Fox Chases Rabbit AND
  • Fox Eats Rabbit
  • CONSTRAINT Chases.time < Eats.time
  • RETURN Fox new(PreysUpon) Rabbit

26
Pattern Use
  • Query
  • SUBGRAPH Predator(Fox PreysUpon Rabbit) AND
  • Rabbit Eats Lettuce
  • RETURN Fox new(Ingests) Lettuce
  • Is evaluated as if it were
  • SUBGRAPH Fox Chases Rabbit AND
  • Fox Eats Rabbit AND
  • Rabbit Eats Lettuce
  • CONSTRAINT Chases.time < Eats.time
  • RETURN Fox new(Ingests) Lettuce
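
The rewriting described on this slide behaves like macro expansion: a pattern reference in the SUBGRAPH clause is replaced by the pattern's path segments, and the pattern's constraints are added to the query. The sketch below treats clauses as plain strings; the `PATTERNS` table and the `expand` function are hypothetical names, not part of GQL.

```python
# A pattern maps a name to the path segments and constraints it stands for.
PATTERNS = {
    "Predator": {
        "subgraph": ["Fox Chases Rabbit", "Fox Eats Rabbit"],
        "constraint": ["Chases.time < Eats.time"],
    }
}


def expand(subgraph_terms, constraints, patterns=PATTERNS):
    """Replace pattern references such as 'Predator(...)' by their definitions."""
    out_terms, out_constraints = [], list(constraints)
    for term in subgraph_terms:
        name = term.split("(", 1)[0]  # text before the first "(" is the pattern name
        if name in patterns:
            out_terms.extend(patterns[name]["subgraph"])
            out_constraints.extend(patterns[name]["constraint"])
        else:
            out_terms.append(term)    # ordinary path segment, kept as is
    return out_terms, out_constraints


terms, cons = expand(["Predator(Fox PreysUpon Rabbit)", "Rabbit Eats Lettuce"], [])
```

After expansion, `terms` and `cons` hold the same path segments and constraint as the expanded query shown above.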

27
Hypothesis Expressions
  • Enables queries on hypothetical data
  • SUBGRAPH Fox Chases Rabbit AND
  • Fox Eats Rabbit AND
  • Rabbit Eats Lettuce
  • CONSTRAINT Chases.time < 8am
  • RETURN Fox new(Ingests) Lettuce
  • ASSUME EDGE Chases NEW time = 7am
  • FROM Fox CONSTRAINT name = Fred
  • TO Rabbit CONSTRAINT name = Jack
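
A hypothesis can be modeled as a temporary overlay: the assumed edge is added to a copy of the graph, the query runs against the copy, and the stored graph stays untouched. This is a minimal sketch of that idea, not the GQL mechanism; the dictionary layout and the function name `assume_edge` are assumptions.

```python
import copy

graph = {
    "vertices": {
        "fox2": {"type": "Fox", "name": "Fred"},
        "rabbit3": {"type": "Rabbit", "name": "Jack"},
    },
    "edges": [{"type": "Eats", "src": "fox2", "dst": "rabbit3", "time": 9}],
}


def assume_edge(graph, etype, src_type, src_name, dst_type, dst_name, **props):
    """Return a copy of graph with a hypothetical edge added; graph is unchanged."""
    g = copy.deepcopy(graph)

    def find(vtype, name):
        return [vid for vid, v in g["vertices"].items()
                if v["type"] == vtype and v["name"] == name]

    for s in find(src_type, src_name):
        for d in find(dst_type, dst_name):
            g["edges"].append(
                {"type": etype, "src": s, "dst": d, "hypothetical": True, **props}
            )
    return g


# ASSUME EDGE Chases NEW time = 7am FROM Fox named Fred TO Rabbit named Jack
hypo = assume_edge(graph, "Chases", "Fox", "Fred", "Rabbit", "Jack", time=7)
```

Marking the added edge with a `hypothetical` flag makes it easy to tell assumed facts from stored ones in the result.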

28
Special Graph Operator Queries
  • Shortest Path
  • SUBGRAPH GameWarden Chases Fox AND
  • ShortestPath(Fox, Rabbit) ALIAS SP_alias AND
  • Rabbit Eats Lettuce
  • RETURN GameWarden Chases Fox AND
  • SP_alias AND
  • Rabbit Eats Lettuce
  • Adjacent Vertices
  • SUBGRAPH AdjacentVertices(Rabbit) ALIAS AV_alias
  • CONSTRAINT count_edges(Rabbit) > 10
  • RETURN AV_alias
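
The slides do not say how ShortestPath and AdjacentVertices are computed. As a point of reference, the sketch below uses breadth-first search over an undirected, unweighted adjacency list, which is one common choice; the graph data and function names are illustrative.

```python
from collections import deque

# Undirected adjacency lists; vertex ids are illustrative.
adj = {
    "warden1": ["fox1"],
    "fox1": ["warden1", "rabbit1", "rabbit2"],
    "rabbit1": ["fox1", "lettuce1"],
    "rabbit2": ["fox1"],
    "lettuce1": ["rabbit1"],
}


def adjacent_vertices(adj, v):
    """Vertices one edge away from v."""
    return list(adj.get(v, []))


def shortest_path(adj, start, goal):
    """Breadth-first search; returns the path with the fewest edges, or None."""
    prev = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if v == goal:
            path = []
            while v is not None:  # walk predecessors back to the start
                path.append(v)
                v = prev[v]
            return path[::-1]
        for w in adj.get(v, []):
            if w not in prev:
                prev[w] = v
                queue.append(w)
    return None
```

For example, `shortest_path(adj, "warden1", "lettuce1")` walks warden1, fox1, rabbit1, lettuce1.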

29
Returning a Set of Graphs
  • Can be done with edge expansion or joins in the
    RETURN clause
  • Can be seamlessly integrated with non-graph
    expansion expressions
  • Any query can be returned as a set of graphs if
    desired
  • SUBGRAPH Fox Chases Rabbit
  • RETURN Fox Chases Rabbit

30
Outline
  • Goals and Example Scenario
  • Related Work and Key Features of GQL
  • Graph Model and Query Language
  • Computational Complexity of Query Execution
  • Future Directions

31
Query Optimization
  • Query execution time is the key to success for
    any query language; GQL is no exception
  • Our approach
  • Address query optimization on a per path-segment
    basis
  • Address path-segment ordering
  • Address the management of large amounts of
    intermediate results of a query
  • Our efforts so far
  • Addressed per path-segment optimization
  • Started to address path-segment ordering
  • Have not yet addressed the management of large
    amounts of intermediate results

32
Query Optimization
  • Query plan representations are used to define
    query execution plans
  • Query plan representations are manipulated to
    optimize the query execution time
  • Via laws of graph algebra
  • Via graph statistics to estimate query costs for
    each operation
  • Query optimizer determines
  • The best algorithm to execute each operation
  • The best operation ordering to optimize overall
    query execution time

33
Query Planning and Optimization
  • Query planning process determines the operators
    required to solve a query
  • Query optimization process determines the most
    efficient way to
  • Execute query operators
  • Order the execution of query operators
  • Heuristics have been identified to implement
    query planning and optimization based on
    statistical analysis

34
Graph Statistics
  • Estimating costs requires statistical knowledge
    of the graph
  • We estimate the cost of the path segment operator
  • One of the most common and costly operations
  • Statistics that we initially considered useful
  • Vertex Cardinality: the number of vertices of
    type v is count(v), or just V.
  • Vertex Edge Set Cardinality: the total number of
    edges that emanate from all vertices of type v
    is count(ev), or just EV.
  • Edge Cardinality: the number of edges of type e
    is count(e), or just E.
  • Edge Distribution: the number of different vertex
    type pairs that edges of type e connect, or just
    ED.
  • Selectivity Factor: the fraction of vertices or
    edges that match a property constraint is sel(φ),
    where φ is the property constraint.
  • Uniformity assumption
  • Independence assumption
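
The statistics listed above can be computed directly from a small in-memory graph. The sketch below follows the definitions on this slide; the sample data and the function names are assumptions chosen for the example.

```python
vertices = {
    "fox1": "Fox", "fox2": "Fox",
    "rabbit1": "Rabbit", "rabbit2": "Rabbit",
    "lettuce1": "Lettuce",
}
ages = {"fox1": 3, "fox2": 2, "rabbit1": 2, "rabbit2": 4}
# (edge_type, source, target); illustrative data.
edges = [
    ("Chases", "fox1", "rabbit1"),
    ("Chases", "fox1", "rabbit2"),
    ("Eats", "fox2", "rabbit1"),
    ("Eats", "rabbit1", "lettuce1"),
]


def vertex_cardinality(vtype):           # V
    return sum(1 for t in vertices.values() if t == vtype)


def vertex_edge_set_cardinality(vtype):  # EV: edges leaving vertices of this type
    return sum(1 for (_, s, _) in edges if vertices[s] == vtype)


def edge_cardinality(etype):             # E
    return sum(1 for (t, _, _) in edges if t == etype)


def edge_distribution(etype):            # ED: distinct (source type, target type) pairs
    return len({(vertices[s], vertices[d]) for (t, s, d) in edges if t == etype})


def selectivity(vtype, predicate):       # sel: fraction of vertices matching a constraint
    ids = [v for v, t in vertices.items() if t == vtype]
    return sum(1 for v in ids if predicate(v)) / len(ids)
```

With this data, Eats edges connect two different type pairs (Fox to Rabbit and Rabbit to Lettuce), so ED for Eats is 2, while half of the rabbits satisfy `age < 3`.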

35
Path Segment Vertex Search, No Indices
  • Algorithm
  • Iterate through a set of vertices of type v in
    O(V) time
  • For each vertex, iterate through its edge list to
    find edges of type e in O(EV/V) time
  • Follow the edge to vertex w in constant time
  • Execution time is O(V · (EV/V)) = O(EV)
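
The unindexed vertex search can be sketched as two nested loops. The counter below shows that the work is proportional to the number of edge entries leaving vertices of the chosen type. The data layout (`by_type`, `out_edges`) and the function name are assumptions for illustration.

```python
# Vertices grouped by type, and unsorted outgoing edge lists per vertex.
by_type = {"Fox": ["fox1", "fox2"], "Rabbit": ["rabbit1"]}
out_edges = {
    "fox1": [("Chases", "rabbit1"), ("Eats", "rabbit1")],
    "fox2": [("Eats", "rabbit1")],
}


def path_segment_vertex_scan(by_type, out_edges, vtype, etype):
    """Return matching (v, edge_type, w) triples and the number of edges inspected."""
    results, inspected = [], 0
    for v in by_type.get(vtype, []):          # all vertices of type v
        for et, w in out_edges.get(v, []):    # the whole edge list of each vertex
            inspected += 1
            if et == etype:
                results.append((v, et, w))    # following the edge to w is O(1)
    return results, inspected


results, inspected = path_segment_vertex_scan(by_type, out_edges, "Fox", "Chases")
```

Here `inspected` equals the three edges that leave Fox vertices, regardless of how many of them are Chases edges.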

36
Path Segment Indices on Vertex Edge Set
  • Requires each edge set to be indexed through a
    logarithmic-time search tree (e.g., B tree)
  • Next values are (virtually) collocated with the
    matching value
  • Enables a constant time search for the next
    value(s)
  • Algorithm
  • Iterate through vertices of type v in time O(V)
  • Find matching edge(s) in logarithmic time
    O(log(EV/V))
  • Iterate through the matching edges in time
    O(E/(ED·V))
  • Execution time is O(V · (log(EV/V) + E/(ED·V)))
    = O(V·log(EV/V) + E/ED)
  • If ED ≈ E (i.e., one edge of type e emanates from
    each v), then the algorithm tends to operate in
    time O(V·log(EV/V))
  • If ED ≈ E and EV ≈ V, the algorithm tends to operate
    in time O(V)
  • If ED ≈ E and EV >> V, the algorithm tends to
    operate in time O(V·log(EV))
  • If ED << E, then the algorithm tends to operate
    in time O(E/ED)
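
The indexed variant can be imitated with a sorted list per vertex and a binary search, standing in for the search tree mentioned above. This is a sketch under that substitution; the data and the function name are illustrative.

```python
from bisect import bisect_left

by_type = {"Fox": ["fox1", "fox2"]}
# Each edge list is kept sorted by (edge_type, target), standing in for a search tree.
sorted_out_edges = {
    "fox1": sorted([("Eats", "rabbit1"), ("Chases", "rabbit2"), ("Chases", "rabbit1")]),
    "fox2": [("Eats", "rabbit3")],
}


def path_segment_indexed(by_type, sorted_out_edges, vtype, etype):
    """Find edges of one type per vertex by binary search, then walk the matches."""
    results = []
    for v in by_type.get(vtype, []):             # all vertices of type v
        edge_list = sorted_out_edges.get(v, [])
        # (etype,) sorts before every (etype, target), so this finds the first match.
        i = bisect_left(edge_list, (etype,))     # logarithmic in the edge list length
        while i < len(edge_list) and edge_list[i][0] == etype:
            results.append((v, etype, edge_list[i][1]))  # matches are adjacent
            i += 1
    return results
```

Because matching edges sit next to each other in the sorted list, the walk after the binary search touches only matches, which is the collocation property described on the slide.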

37
Path Segment Edge Indices, Constraint
  • Beneficial when the query includes a constraint
    φv on an indexed property of vertices of type v
  • Vertex edge sets are indexed as well
  • Algorithm
  • Logarithmic-time search through the indexed
    properties for φv in time O(log(V))
  • Iterate through vertices (collocated in the
    index) that satisfy the constraint in time
    O(sel(φv)·V)
  • Perform a logarithmic-time search on the edges
    of each matching vertex in time O(log(EV/V))
  • Iterate through the matching edges in time
    O(E/(ED·V))
  • Execution time is O(log(V) + sel(φv)·V·(log(EV/V)
    + E/(ED·V))) = O(log(V) + sel(φv)·V·log(EV/V)
    + sel(φv)·E/ED)
  • If sel(φv) ≈ 0, the dominant factor is the search
    for vertices, or O(log(V))
  • If the selectivity factor is higher, the
    execution time approaches the times of the
    previous slide
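
To compare the three vertex-based strategies numerically, the cost expressions can be evaluated as plain formulas. The sketch below treats each big-O expression as a unit-less estimate with constant factor 1 and base-2 logarithms, which is an assumption made for the example; `sel_v` stands for the selectivity of the vertex constraint, and the statistics are made up.

```python
from math import log2


def cost_vertex_scan(EV):
    """No indices: O(EV)."""
    return EV


def cost_indexed_edge_sets(V, EV, E, ED):
    """Indexed edge sets: O(V*log(EV/V) + E/ED)."""
    return V * log2(EV / V) + E / ED


def cost_indexed_properties(V, EV, E, ED, sel_v):
    """Indexed property plus indexed edge sets."""
    return log2(V) + sel_v * V * log2(EV / V) + sel_v * E / ED


# Made-up statistics: 1024 vertices with 16 outgoing edges each.
V, EV, E, ED = 1024, 16384, 4096, 4
scan = cost_vertex_scan(EV)
indexed = cost_indexed_edge_sets(V, EV, E, ED)
selective = cost_indexed_properties(V, EV, E, ED, sel_v=0.01)
unselective = cost_indexed_properties(V, EV, E, ED, sel_v=1.0)
```

With a selective constraint the property index gives the smallest estimate; with `sel_v=1.0` the estimate differs from the indexed-edge-set cost only by the initial `log2(V)` lookup.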

38
Path Segment Edge Search, No Indices
  • Algorithm
  • Iterate over edges of type e and select those that
    connect v to w in time O(E)
  • Find the corresponding vertices in constant time
  • Execution time is O(E)

39
Path Segment Edge Search, Constraint
  • Beneficial when the query statement includes a
    constraint φe on an indexed property of edges of
    type e
  • Algorithm
  • Perform a logarithmic-time search through
    properties to find the first matching edge in
    time O(log(E))
  • Perform a linear search through all subsequent
    matching edges in time O(sel(φe)·E)
  • Find both vertices attached to each edge in
    constant time
  • Execution time is O(log(E) + sel(φe)·E)
  • If sel(φe) ≈ 0, the algorithm tends to an
    execution time of O(log(E))
  • Otherwise, the algorithm tends to an execution
    time of O(E)
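
The constrained edge search can be sketched with a sorted list as the property index: one binary search finds the first match, and a linear walk collects the rest. The index layout, the range-style constraint, and the function name are assumptions for this example.

```python
from bisect import bisect_left

# Index over Chases edges, sorted by the indexed property (time as an hour).
# Entries are (time, source, target); values are illustrative.
chases_by_time = sorted([
    (14, "fox1", "rabbit1"),
    (17, "fox1", "rabbit2"),
    (8, "fox1", "rabbit3"),
])


def edges_with_time_between(index, lo, hi):
    """Edges with lo <= time < hi: one binary search, then a linear walk."""
    i = bisect_left(index, (lo,))                # logarithmic search for the first match
    out = []
    while i < len(index) and index[i][0] < hi:   # linear in the number of matches
        out.append(index[i])                     # both end vertices are in the entry
        i += 1
    return out


early = edges_with_time_between(chases_by_time, 0, 9)
late = edges_with_time_between(chases_by_time, 9, 24)
```

When few edges match, the binary search dominates; when most edges match, the walk approaches a full scan of the edges.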

40
Varying Number of Vertices per Vertex Type

41
Varying Number of Edges per Vertex

42
Varying Edge Types with Constraints

43
Path Segment Ordering
  • Assume the following query
  • SUBGRAPH Fox Chases Rabbit AND
  • Rabbit Eats Lettuce
  • CONSTRAINT Rabbit.age < 3
  • RETURN Fox new(Ingests) Lettuce
  • Query processing produces the following query
    execution plan

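[Figure: query execution plan tree. Projection (π) Fox new(Ingests) Lettuce at the root, selection (σ) Rabbit.age < 3 below it, and two path-segment operators over Fox Chases Rabbit and Rabbit Eats Lettuce.]
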
44
Path Segment Execution Order Choice
  • Which is more efficient?

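[Figure: two alternative execution plans for the same query, each with projection (π) Fox new(Ingests) Lettuce, selection (σ) Rabbit.age < 3, and path-segment operators over Fox Chases Rabbit and Rabbit Eats Lettuce.]
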
45
Execution Order Heuristics
  • In simple terms
  • Identify the path segment operation that promises
    to return the least number of results
  • Then identify the next operation that promises to
    return the next least number of results
  • It is actually more complicated than this
  • Need to search an exponential number of orderings
    to find the most efficient ordering
  • Heuristics can make this search tractable

46
Path-Segment Ordering Metric
  • Order the path segment operators to return the
    fewest results
  • Rough heuristic
  • If predicates φv, φe, and φw are applied to V, E
    and W respectively
  • Start with V and use selectivity factors to
    estimate execution time
  • Execution time is
  • V · sel(φv) · (E/(ED·V)) · sel(φe) · (W·ED/E)
    · sel(φw)
  • Or, sel(φv) · sel(φe) · sel(φw) · W
  • Use this formula to determine whether Fox Chases
    Rabbit should precede or follow Rabbit Eats
    Lettuce
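
The estimate and its simplified form can be checked numerically, and then used to rank the two path segments. All statistics below are made up for illustration; `sel_v`, `sel_e`, and `sel_w` are the selectivities of the constraints on the start vertex, the edge, and the end vertex.

```python
from math import isclose


def estimated_results(V, E, ED, W, sel_v, sel_e, sel_w):
    """Long form of the estimate, written term by term."""
    return V * sel_v * (E / (ED * V)) * sel_e * (W * ED / E) * sel_w


def estimated_results_simplified(W, sel_v, sel_e, sel_w):
    """V, E and ED cancel, leaving the selectivities times W."""
    return sel_v * sel_e * sel_w * W


long_form = estimated_results(V=200, E=1000, ED=2, W=500, sel_v=0.5, sel_e=0.1, sel_w=0.2)
short_form = estimated_results_simplified(W=500, sel_v=0.5, sel_e=0.1, sel_w=0.2)

# Ranking the two segments, assuming 500 rabbits, 2000 lettuces, and that
# 30% of rabbits satisfy Rabbit.age < 3 (all made-up numbers).
fox_chases_rabbit = estimated_results_simplified(W=500, sel_v=1.0, sel_e=1.0, sel_w=0.3)
rabbit_eats_lettuce = estimated_results_simplified(W=2000, sel_v=0.3, sel_e=1.0, sel_w=1.0)
first = "Fox Chases Rabbit" if fox_chases_rabbit <= rabbit_eats_lettuce else "Rabbit Eats Lettuce"
```

With these numbers Fox Chases Rabbit has the smaller estimate and would be executed first; different statistics can reverse the choice.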

47
Outline
  • Goals and Example Scenario
  • Related Work and Key Features of GQL
  • Graph Model and Query Language
  • Computational Complexity of Query Execution
  • Future Directions

48
Future Work
  • Create an operational prototype of a Graph Query
    Language system
  • Continue to address query optimization issues
  • Use ontologies to enrich graph queries
  • Address language issues
  • Define the query execution process
  • Inferences
  • Ontology to graph mappings
  • Tie GQL to a graphical interface
  • Enables analysts to express queries through
    graphical means
  • Can leverage several technologies (QGraph,
    Conceptual Graphs, etc.)
  • Augment GQL to include Uncertainty, Geospatial
    and Temporal operators and data structures

49
Backups

50
Costs of Various Path Strategies
  • Search by Vertex Type
  • Plain: O(EV)
  • With indexed Edges: O(V·log(EV/V) + E/ED)
  • If ED ≈ E (i.e., one edge of type e emanates from
    each v), then the algorithm tends to operate in
    time O(V·log(EV/V))
  • If ED ≈ E and EV ≈ V, the algorithm tends to operate
    in time O(V)
  • If ED ≈ E and EV >> V, the algorithm tends to
    operate in time O(V·log(EV))
  • If ED << E, then the algorithm tends to operate
    in time O(E/ED)
  • With indexed Properties and Edges: O(log(V)
    + sel(φv)·V·log(EV/V) + sel(φv)·E/ED)
  • If sel(φv) ≈ 0, the dominant factor is the search
    for vertices, or O(log(V))
  • Otherwise, the execution time approaches the
    times of the previous strategy
  • Search by Edge Type
  • Plain: O(E)
  • Since EVW ≤ EV, the execution time is at least as
    fast as that of the first algorithm
  • With indexed Properties: O(log(E) + sel(φe)·E)
  • If sel(φe) ≈ 0, the algorithm tends to an
    execution time of O(log(E))
  • Otherwise, the algorithm tends to an execution
    time of O(E)