Graph Databases: Efficient Storage and Rapid Retrieval

1
Graph Databases: Efficient Storage and Rapid Retrieval
  • Robert Levinson
  • Machine Intelligence Laboratory
  • University of California
  • Santa Cruz

2
THE CG MARS LANDER
  • High-level architecture
  • [Architecture diagram; box labels as extracted:]
  • English-CG-English Translation
  • English Discourse
  • English Queries
  • English Translator, Source Reference, GUI
  • CG Creator/Translator with Type Hierarchy
  • CG Parser Processor
  • Query Processor Matcher
  • Answer: more specific CGs in DB
  • ADB Processor
  • ADB

3
THE CG MARS LANDER
  • [Data-flow diagram; labels as extracted: document, English, queries, CGs, TH, replies]

4
SUBGRAPH-ISOMORPHISM
  • NP-complete
  • Two main methods:
  • A. Backtracking search
  • B. Refinement, O(n^2) on average
  • (both exploit candidate binding lists, modulo the type hierarchy)
  • Key idea: amortize cost over millions of operations
  • Mega-graph storage
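
A minimal sketch of method A, backtracking over candidate binding lists filtered by a type hierarchy. The hierarchy contents, the node names, and the function names `is_a` and `find_projection` are illustrative assumptions, not taken from the talk; relation labels are compared exactly and the mapping is kept one-to-one.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str, str]  # (relation, source node, target node)

# Hypothetical type hierarchy: child type -> parent type.
PARENT: Dict[str, str] = {"cat": "animal", "dog": "animal", "animal": "entity"}


def is_a(t: Optional[str], ancestor: str) -> bool:
    """True if type t equals ancestor or lies below it in PARENT."""
    while t is not None:
        if t == ancestor:
            return True
        t = PARENT.get(t)
    return False


def find_projection(
    q_types: Dict[str, str],
    q_edges: List[Edge],
    g_types: Dict[str, str],
    g_edges: List[Edge],
) -> Optional[Dict[str, str]]:
    """Return one mapping from query nodes to graph nodes, or None."""
    # Candidate binding lists: graph nodes whose type specializes the query type.
    candidates = {
        q: [g for g, gt in g_types.items() if is_a(gt, qt)]
        for q, qt in q_types.items()
    }
    # Bind the most constrained query nodes first.
    order = sorted(candidates, key=lambda q: len(candidates[q]))
    g_edge_set = set(g_edges)

    def consistent(binding: Dict[str, str]) -> bool:
        # Every query edge with both endpoints bound must exist in the graph.
        return all(
            (r, binding[a], binding[b]) in g_edge_set
            for r, a, b in q_edges
            if a in binding and b in binding
        )

    def extend(i: int, binding: Dict[str, str]) -> Optional[Dict[str, str]]:
        if i == len(order):
            return dict(binding)
        q = order[i]
        for g in candidates[q]:
            if g in binding.values():
                continue  # keep the mapping one-to-one
            binding[q] = g
            if consistent(binding):
                result = extend(i + 1, binding)
                if result is not None:
                    return result
            del binding[q]  # backtrack
        return None

    return extend(0, {})


q_types = {"x": "animal", "y": "animal"}
q_edges = [("chases", "x", "y")]
g_types = {"n1": "dog", "n2": "cat", "n3": "entity"}
g_edges = [("chases", "n1", "n2"), ("sees", "n2", "n3")]

match = find_projection(q_types, q_edges, g_types, g_edges)
no_match = find_projection(q_types, [("sees", "x", "y")], g_types, g_edges)
print(match, no_match)
```

The candidate lists do most of the pruning here: `n3` is never tried because `entity` does not specialize `animal`.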

5
Exploit Symmetry!!
  • Invariance with respect to transformation.
  • Shared information between objects or systems or their representations.
  • AB + AC = A(B + C).

6
Symmetry Synonyms
  • similarity
  • commonality
  • structure
  • mutual information
  • relationship
  • redundancy

7
Total Information = Diversity + Symmetry
  • Diversity corresponds to computer-science complexity: resources required.
  • Diversity can often only be resolved with combinatorial search.

8
Conceptual Graph Processing
  • Concept types: a cat is an animal
  • Relation types or graph types: mother-of is parent-of
  • Transitivity of projection (subgraph isomorphism)
  • Redundant Substructures
  • Redundant Literals
  • Redundant Pointers

9
6 Retrieval Methods
  • Method I: Flat Ordering
  • Method II: Two Levels (Indexes, Graphs)
  • Method III: Full Partial Order Hierarchy
  • Method IV: Multi-Level Hierarchical Retrieval
  • Method V: Remember Node Bindings
  • Method VI: UDS, the Universal Data Structure
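
The slide gives only the method names, so the following is a rough sketch of how a flat scan (in the spirit of Method I) might compare with a search over a generalization hierarchy (in the spirit of Method III). It assumes each graph is a set of relation tuples and uses the subset test as a stand-in for projection; the data and the names `flat_search` and `hierarchy_search` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Graph = FrozenSet[Tuple[str, str, str]]


def more_general(a: Graph, b: Graph) -> bool:
    """Stand-in for projection: a generalizes b if a's tuples are a subset of b's."""
    return a <= b


def flat_search(db: List[Graph], query: Graph) -> Tuple[List[Graph], int]:
    """Flat ordering: compare the query against every stored graph."""
    hits = [g for g in db if more_general(g, query)]
    return hits, len(db)


def hierarchy_search(
    roots: List[Graph], children: Dict[Graph, List[Graph]], query: Graph
) -> Tuple[List[Graph], int]:
    """Partial-order search: descend only below graphs that generalize the query."""
    hits: List[Graph] = []
    seen = set()
    tests = 0
    stack = list(roots)
    while stack:
        g = stack.pop()
        if g in seen:
            continue
        seen.add(g)
        tests += 1
        if more_general(g, query):
            hits.append(g)
            stack.extend(children.get(g, []))
        # Otherwise prune: every descendant contains g's tuples, so it fails too.
    return hits, tests


a: Graph = frozenset({("AGNT", "own", "person")})
b: Graph = frozenset({("ISA", "x", "newspaper")})
ab: Graph = a | b
c: Graph = frozenset({("AGNT", "sell", "person")})
cd: Graph = c | {("OBJ", "sell", "car")}
cde: Graph = cd | {("LOC", "sell", "city")}

db = [a, b, ab, c, cd, cde]
roots = [a, b, c]
children = {a: [ab], b: [ab], c: [cd], cd: [cde]}
query: Graph = ab | {("CHRC", "x", "rag")}

flat_hits, flat_tests = flat_search(db, query)
hier_hits, hier_tests = hierarchy_search(roots, children, query)
print(flat_tests, hier_tests)
```

In this toy database the whole `c` branch is skipped after a single failed comparison, which is the saving the hierarchy is meant to provide; pruning is valid only because the ordering is transitive.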

10
THE CG MARS LANDER
  • Exploit tuple-based linear CGs!
  • (a conceptual graph syntax that supports rapid retrieval and question-answering).

11
@CG000
  • AGNT (government, BE) .
  • @CG001
  • AGNT (Hungarian_American_Enterprise_Fund, invest),
  • OBJ (invest, Dollars 1000000),
  • IN (Dollars 1000000, first_business)
  • .
  • @CG002
  • AGNT (@CG000, manage),
  • OBJ (manage, @CG001) .
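
The tuple-based form above is easy to read mechanically. The sketch below is one possible reader, assuming that each graph begins with an identifier line starting with `@`, that every relation is binary, and that a lone `.` ends a graph; the grammar is inferred from this one slide, and `parse_linear_cgs` is a hypothetical name.

```python
import re
from typing import Dict, List, Tuple

Relation = Tuple[str, str, str]  # (relation name, first argument, second argument)

TEXT = """
@CG000
AGNT (government, BE) .
@CG001
AGNT (Hungarian_American_Enterprise_Fund, invest),
OBJ (invest, Dollars 1000000),
IN (Dollars 1000000, first_business)
.
@CG002
AGNT (@CG000, manage),
OBJ (manage, @CG001) .
"""

HEADER = re.compile(r"^@(\w+)\s*$")
# NAME ( arg1 , arg2 ) where arguments may contain spaces but no commas or parentheses.
RELATION = re.compile(r"(\w+)\s*\(\s*([^,()]+?)\s*,\s*([^,()]+?)\s*\)")


def parse_linear_cgs(text: str) -> Dict[str, List[Relation]]:
    """Map each graph identifier to the list of relation tuples that follow it."""
    graphs: Dict[str, List[Relation]] = {}
    current = None
    for line in text.splitlines():
        header = HEADER.match(line.strip())
        if header:
            current = header.group(1)
            graphs[current] = []
        elif current is not None:
            graphs[current].extend(RELATION.findall(line))
    return graphs


graphs = parse_linear_cgs(TEXT)
for name, relations in graphs.items():
    print(name, relations)
```

Arguments such as `@CG000` are kept as plain strings, so a graph that refers to another graph (as CG002 does) can be resolved later by looking the identifier up in `graphs`.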

12
THE CG MARS LANDER
  • A query
  • / Q2: Does anybody own the rag newspaper New York Post? /
  • Query@bob_202
  • ISA ( New_York_Post , newspaper n34861 ) ,
  • CHRC ( newspaper n34861 , rag n9 ) ,
  • AGNT ( own v9125 , ????? ) ,
  • .

13
THE CG MARS LANDER
  • Answer
  • / A2: Rupert Murdoch once owned the troubled tabloid newspaper New York Post. /
  • @CG1684_3
  • ISA ( New_York_Post , newspaper n34861 ) ,
  • CHRC ( newspaper n34861 , tabloid n27111 ) ,
  • CHRC ( newspaper n34861 , trouble n25320 ) ,
  • AGNT ( own v9125 , Rupert Murdoch ) ,
  • CHRC ( own v9125 , once )
  • .
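
One way to read the query and answer slides together: the stored graph answers the query because each query tuple is matched by a tuple that is equal or more specific, and the `?????` slot is filled from the match. The sketch below illustrates that idea under loud assumptions: the type hierarchy is assumed to say that a tabloid is a kind of rag, the sense identifiers (n34861, v9125, and so on) are omitted, and tuples are matched independently rather than with consistent node bindings. The function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Relation = Tuple[str, str, str]

WILDCARD = "?????"
# Assumption: the type hierarchy records that a tabloid is a kind of rag.
SUPERTYPE: Dict[str, str] = {"tabloid": "rag"}


def specializes(term: Optional[str], general: str) -> bool:
    """True if term equals general or lies below it in SUPERTYPE."""
    while term is not None:
        if term == general:
            return True
        term = SUPERTYPE.get(term)
    return False


def answer(query: List[Relation], graph: List[Relation]) -> Optional[List[str]]:
    """Return the wildcard fillers if every query tuple is matched, else None."""
    fillers: List[str] = []
    for rel, a, b in query:
        for g_rel, g_a, g_b in graph:
            if g_rel != rel:
                continue
            ok_a = a == WILDCARD or specializes(g_a, a)
            ok_b = b == WILDCARD or specializes(g_b, b)
            if ok_a and ok_b:
                if a == WILDCARD:
                    fillers.append(g_a)
                if b == WILDCARD:
                    fillers.append(g_b)
                break
        else:
            return None  # no stored tuple matches this query tuple
    return fillers


query = [
    ("ISA", "New_York_Post", "newspaper"),
    ("CHRC", "newspaper", "rag"),
    ("AGNT", "own", WILDCARD),
]
stored = [
    ("ISA", "New_York_Post", "newspaper"),
    ("CHRC", "newspaper", "tabloid"),
    ("CHRC", "newspaper", "trouble"),
    ("AGNT", "own", "Rupert Murdoch"),
    ("CHRC", "own", "once"),
]

result = answer(query, stored)
miss = answer([("AGNT", "sell", WILDCARD)], stored)
print(result, miss)
```

A fuller matcher would bind nodes consistently across tuples, as a subgraph-isomorphism search does; this version only shows how the type hierarchy lets "rag" in the query meet "tabloid" in the stored graph.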

14
THE CG MARS LANDER
  • Capabilities and timings
  • Inputs:
  • CGs (tens of thousands)
  • pre-processed parts of speech
  • Type hierarchy (150,000 WordNet augmented English words)
  • natural language queries
  • Outputs:
  • CG DB (save/restore)
  • replies to queries
  • specializations and maximal specializations

15
THE CG MARS LANDER
  • Capabilities and timings
  • Benchmark machine: Sun Ultra Enterprise 4000 (four 167 MHz UltraSPARC CPUs with 512 KB external cache, and 256 MB of main memory)
  • Read, process, and store an 18,000-CG input file in 1 hour and 46 minutes.
  • Reloading the above DB takes on the order of seconds.
  • A 150,000-word ontology is processed in 16 seconds.
  • Each query is handled in at most 5.5 seconds.
  • For smaller databases (hundreds of CGs only), the time to handle a single query can be as low as 0.2 seconds.
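
As a back-of-the-envelope reading of these figures (using only the numbers quoted on the slide; the variable names are illustrative), the one-time load cost can be expressed per graph and compared with the worst-case query time:

```python
# Figures quoted on the slide.
cgs_loaded = 18_000
load_seconds = (1 * 60 + 46) * 60  # 1 hour 46 minutes = 6360 s
worst_query_seconds = 5.5

seconds_per_cg = load_seconds / cgs_loaded
queries_per_load = load_seconds / worst_query_seconds

print(f"{seconds_per_cg:.3f} s per CG inserted")
print(f"load time equals about {queries_per_load:.0f} worst-case queries")
```

The insertion work is paid once, while the stored database is reloaded in seconds and reused for every later query.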

16
THE CG MARS LANDER
  • Cost/benefit analysis
  • Assume N CGs and Q queries.
  • Costs are compared for Method I and Method III, each over N insertions and Q queries.
  • [Cost formulas garbled in extraction; the surviving symbols are N, Q, log, 2, and 10.]

17
Cost/benefit table

18
THE CG MARS LANDER
  • 6 UDS DESIGN PRINCIPLES
  • 1. Every primitive data object, label or symbol
    should be stored only once with pointers used to
    denote the actual uses of the object.
  • 2. Every compound object should be stored with
    the minimum information required to represent the
    combination of its parts.
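
Principles 1 and 2 can be illustrated with a small interning table: each label is stored once, and a relation tuple is stored as a triple of integer references. This is only a sketch of the idea, not the UDS itself; `SymbolTable` and `encode` are hypothetical names.

```python
from typing import Dict, List, Tuple


class SymbolTable:
    """Stores each distinct label once and hands out integer references."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}
        self._labels: List[str] = []

    def intern(self, label: str) -> int:
        """Return the reference for label, adding it on first use."""
        if label not in self._ids:
            self._ids[label] = len(self._labels)
            self._labels.append(label)
        return self._ids[label]

    def label(self, ref: int) -> str:
        """Recover the label behind a reference."""
        return self._labels[ref]

    def __len__(self) -> int:
        return len(self._labels)


def encode(table: SymbolTable, tuples: List[Tuple[str, str, str]]) -> List[Tuple[int, ...]]:
    """Replace every label in every tuple by its reference."""
    return [tuple(table.intern(x) for x in t) for t in tuples]


table = SymbolTable()
cg001 = [
    ("AGNT", "Hungarian_American_Enterprise_Fund", "invest"),
    ("OBJ", "invest", "Dollars 1000000"),
    ("IN", "Dollars 1000000", "first_business"),
]
encoded = encode(table, cg001)
print(encoded, len(table))
```

Nine label occurrences collapse to seven stored labels here, and comparing two tuples becomes a comparison of small integers rather than strings.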

19
THE CG MARS LANDER
  • 3. Given no loss of accuracy, objects should be
    processed at the highest level of abstraction
    possible.
  • 4. If one were to implement a conceptual graph
    based on the diagrammatic representation, the
    costs associated with storage and matching would
    be much higher than they need to be.

20
THE CG MARS LANDER
  • 5. The same abstraction mechanism that goes from
    labels to graphs can be taken one step further to
    facilitate the storage and retrieval of nested
    context graphs.
  • 6. A graph is itself the best descriptor of its
    nodes.

21
CONCLUDING THOUGHTS
  • The key to efficient implementation of CGs is the exploitation of symmetry or structure.
  • CG operations can be executed efficiently in real-time applications.
  • At the implementation or machine level, knowledge representation formalisms are often nearly the same.

22
THE CG MARS LANDER
  • References
  • [1] C. Colin and R. Levinson, "Partial order maintenance," Special Interest Group on Information Retrieval Forum, vol. 23, no. 3,4, pp. 34-59, 1988.
  • [2] G. Ellis, R. A. Levinson, and P. Robinson, "Managing complex objects in PEIRCE," Special Issue on Object-Oriented Approaches in Artificial Intelligence and Human-Computer Interaction (IJMMS), vol. 41, pp. 109-148, 1994.
  • [3] R. Hughey, R. Levinson, and J. D. Roberts, eds., Issues in Parallel Hardware for Graph Retrieval, 1993.

23
More references
  • [4] R. Levinson, "A self-organizing retrieval system for graphs," in AAAI-84, pp. 203-206, Morgan Kaufmann, 1984.
  • [5] R. Levinson, "Pattern associativity and the retrieval of semantic networks," Computers and Mathematics with Applications, vol. 23, no. 6-9, pp. 573-600, 1992. Part 2 of Special Issue on Semantic Networks in Artificial Intelligence, Fritz Lehmann, editor. Also reprinted on pages 573-600 of the book Semantic Networks in Artificial Intelligence, Fritz Lehmann, editor, Pergamon Press, 1992.

24
THE CG MARS LANDER
  • References
  • [6] R. Levinson and G. Ellis, "Multilevel hierarchical retrieval," Knowledge-Based Systems, vol. 5, pp. 233-244, September 1992. Special Issue on Conceptual Graphs.
  • [7] R. Levinson and G. Fuchs, "A pattern-weight formulation of search knowledge," Tech. Rep. UCSC-CRL-91-15, University of California Santa Cruz, 2001. Revision to appear in Computational Intelligence.
  • [8] R. A. Levinson, "UDS: A universal data structure," in Proc. 2nd International Conference on Conceptual Structures, (College Park, Maryland USA), pp. 230-250, 1991.