The graph-based data model

1
  • The graph-based data model
  • Storing and manipulating data in distributed graphs
  • (Using RDF and Jena to put the SparQL in your smile, and the
    Twinkle in your eye, and D2R too)
  • Michael Grobe
  • Biomedical Applications Group
  • Research Technologies
  • University Information Technology Services
  • Indiana University

2
  • This presentation in perspective
  • This is actually one of a series of presentations
    on Linked Data Web and graph database
    technologies:
  • Introduction to ontologies
  • This presentation on RDF, Jena, SparQL, etc.
  • OWL and inference over ontologies
  • Using graph technologies in bioinformatics
    research
  • In general, these topics appear simple, but are
    fraught with complications, limitations, and
    qualifications, especially when the casual user
    attempts to compare them with relational data
    approaches to the same or similar problems.
  • In addition, this is a pretty big elephant
    surrounded by a lot of blind men.
  • As a result, this presentation is a survey of
    concrete examples of basic components used to
    manipulate data stored as graphs, or
    appearing to have been stored as graphs. It
    will use the Gene Ontology for some of these
    examples.

3
  • Table of Contents
  • Using graphs to represent data
  • Using RDF to represent graphs
  • Jena: a Java class library for manipulating RDF
  • Using SparQL graph templates to query RDF
  • Using Twinkle to make SparQL queries
  • Using iSPARQL graphical graph templates to query
    RDF
  • Exposing relational data as RDF
  • Thinking of SparQL queries as SQL
  • Table of Non-contents
  • OWL and inference over ontologies
  • Using the Semantic Web in bioinformatics research

4
  • Using graphs to represent data
  • Here are 2 graphs that represent 2 kinds of
    information associated with 4 different persons.
  • Graph 1: Person ages. Graph 2: Favorite friends.

5
  • Using graphs to represent data
  • Here the 2 graphs are combined using named edges
    to represent 2 kinds of information associated
    with the same 4 persons.
  • Graph 3: Person ages (age) and favorite friends
    (fav)
  • Read these links as "Smith has age 21" or "Jones
    has favorite friend Smith" to make them more
    sentence-like. Each arc is like the
    predicate of a sentence, connecting a subject
    with an object. (Note that a subject may have
    zero or more arcs of each type.)
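  • The "read each arc as a sentence" idea can be sketched in a few
    lines of Python. This is only an illustration: the tuple layout and
    the helper name as_sentence are assumptions made here, while the
    ages and favorites are the ones listed for Graph 3 in this deck.

```python
# Graph 3 as a list of (subject, predicate, object) edges.
GRAPH_3 = [
    ("Blake", "fav", "Blake"), ("Blake", "age", 12),
    ("Jones", "fav", "Smith"), ("Jones", "age", 35),
    ("George", "fav", "Smith"), ("George", "age", 21),
    ("Smith", "fav", "Jones"), ("Smith", "age", 21),
]

# How each arc name reads when used as the predicate of a sentence.
PHRASES = {"age": "has age", "fav": "has favorite friend"}


def as_sentence(subject, predicate, obj):
    """Render one edge as a subject-predicate-object sentence."""
    return f"{subject} {PHRASES[predicate]} {obj}"


sentences = [as_sentence(*edge) for edge in GRAPH_3]
```

  • Each entry of sentences corresponds to exactly one arc of the graph.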

6
  • Using graphs to represent data
  • Data is sometimes represented using so-called
    "blank nodes" to help cluster attributes
    together.
  • Graph 4: Blank nodes linking a name, an age,
    and a favorite friend via arcs named "name",
    "age", and "fav".
  • Blank nodes are useful for specifying lists of
    items, but are discouraged within the Semantic
    Web. Use (dereferenceable) URIs (like
    http://www.iu.edu/) whenever possible.

7
  • Using URIs and URLs to represent data
  • Now, if it hadn't already happened, someone could
    come up with the idea to use URLs to point to Web
    documents that describe the exact meaning of
    each edge.
  • For example, some popular magazine could publish
    their definition of "favorite friend" on a page
    like
  • http://CelebrityMagazine.com/fav
  • and other documents could define "BFF",
    "long-time-friend", "family-friend", etc. In
    fact, these definitions could themselves refer to
    other definitions, like some superset of
    relationships such as
  • http://SomeCelebrityMagazine.com/personal_relationships
  • or the personal_relationships file itself could
    include a collection of definitions, including
    "favorite friend", or "fav", that we might refer
    to as
  • http://SomeCelebrityMagazine.com/personal_relationships#fav
  • using the convention for targeting a specific
    location within a URL.
  • Of course, for a lot of applications this would
    all be unnecessary: some URI could just be used
    to indicate an edge type known to the file
    creator.

8
  • Using RDF-XML to serialize graphs
  • Graphs can be "serialized", or represented in a
    textual format. When graphs are serialized, each
    connection is represented by 3 components, a
    so-called RDF "triple". Each triple is composed
    of a "subject", "predicate" and "object", where
    each edge between each pair of entities becomes a
    named predicate.
  • Each subject is represented as
  • - a blank node, such as _:2, or
  • - a URI, like http://fake.host.edu/smith
  • Each object is represented as
  • - a blank node,
  • - a literal value, such as value^^type, where
    type is some URI that defines a data type,
    as in 21^^age, or
  • - a URI
  • Each predicate is represented as
  • - a URI, like http://fake.host.edu/contact-schema#fav,
    or an abbreviated URI like example:age, which
    represents a URI that will
    be expanded by substituting a value for the
    string "example:". If the
9
  • Graph 3 as a set of 12 triples (3 for each
    person)
  • Subject   Predicate      Object
  • -------------------------------------
  • Blake     example:fav    Blake
  • Blake     example:age    "12"
  • Blake     example:name   "Blake"
  • Jones     example:fav    Smith
  • Jones     example:age    "35"
  • Jones     example:name   "Jones"
  • George    example:fav    Smith
  • George    example:age    "21"
  • George    example:name   "George"
  • Smith     example:fav    Jones
  • Smith     example:age    "21"
  • Smith     example:name   "Smith"
10
  • Two ways to represent the Graph 3 triples using
    RDF-XML
  • Properties encoded as XML elements
  • <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:example="http://fake.host.edu/example-schema#">
      <example:Person>
        <example:name>Smith</example:name>
        <example:age>21</example:age>
        <example:fav>Jones</example:fav>
      </example:Person>
    </rdf:RDF>
  • Properties encoded as XML attributes
  • <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:example="http://fake.host.edu/example-schema#">
      <rdf:Description example:name="Smith"
                       example:age="21"
                       example:fav="Jones">
      </rdf:Description>
    </rdf:RDF>

11
  • Representing URIs
  • In work with RDF you will see URIs abbreviated in
    several ways, using namespace, PREFIX and ENTITY
    definitions, depending on the context:
  • xmlns:lib="http://some.host.edu/directory"
  • or
  • PREFIX lib: <http://some.host.edu/directory>
  • or
  • <!ENTITY lib "http://some.host.edu/directory">
  • If the namespace abbreviations in the first example
    above (properties encoded as XML elements) get
    expanded, then Smith is actually being represented as
  • <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <http://fake.host.edu/example-schema#Person>
        <http://fake.host.edu/example-schema#name>
          Smith
        </http://fake.host.edu/example-schema#name>
        <http://fake.host.edu/example-schema#age>
  • 21
12
  • Graph 3 using "resources" to represent each
    person
  • Persons are modeled as resources by replacing
    the strings for each node identifier with URIs
  • Subject                          Predicate      Object
  • -------------------------------------------------------------------------------
  • <http://fake.host.edu/blake>     example:fav    <http://fake.host.edu/blake>
  • <http://fake.host.edu/blake>     example:age    "12"
  • <http://fake.host.edu/blake>     example:name   "Blake"
  • <http://fake.host.edu/jones>     example:fav    <http://fake.host.edu/smith>
  • <http://fake.host.edu/jones>     example:age    "35"
  • <http://fake.host.edu/jones>     example:name   "Jones"
  • <http://fake.host.edu/george>    example:fav    <http://fake.host.edu/smith>
  • <http://fake.host.edu/george>    example:age    "21"
  • <http://fake.host.edu/george>    example:name   "George"

13
  • Representing entries in Graph 3 as resources
  • Format 1
  • <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:example="http://fake.host.edu/example-schema#">
      <example:Person rdf:about="http://fake.host.edu/smith">
        <example:name>Smith</example:name>
        <example:age>21</example:age>
        <example:fav rdf:resource="http://fake.host.edu/jones" />
      </example:Person>
    </rdf:RDF>
  • Format 2
  • <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:example="http://fake.host.edu/example-schema#">
      <rdf:Description rdf:about="http://fake.host.edu/smith"
                       example:name="Smith"
                       example:age="21">
        <example:fav rdf:resource="http://fake.host.edu/jones" />
      </rdf:Description>
    </rdf:RDF>
  • Note that the resource URI references in this
    example are not real documents; they are not
    dereferenceable.
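  • To see how such a serialization maps back to triples, here is a
    rough sketch that reads the Format 1 layout with Python's standard
    XML parser. It is not a general RDF/XML parser: it handles only one
    level of typed nodes with rdf:about, and the helper names uri and
    triples_from are made up for this illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

FORMAT_1 = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:example="http://fake.host.edu/example-schema#">
  <example:Person rdf:about="http://fake.host.edu/smith">
    <example:name>Smith</example:name>
    <example:age>21</example:age>
    <example:fav rdf:resource="http://fake.host.edu/jones"/>
  </example:Person>
</rdf:RDF>"""


def uri(tag):
    """Turn ElementTree's '{namespace}local' form into 'namespacelocal'."""
    return tag[1:].replace("}", "", 1)


def triples_from(rdf_xml):
    """Extract triples from the simple typed-node layout of Format 1 only."""
    triples = []
    for node in ET.fromstring(rdf_xml):
        subject = node.get(f"{{{RDF_NS}}}about")
        # The element name of a typed node states the subject's rdf:type.
        triples.append((subject, RDF_NS + "type", uri(node.tag)))
        for prop in node:
            resource = prop.get(f"{{{RDF_NS}}}resource")
            obj = resource if resource is not None else (prop.text or "").strip()
            triples.append((subject, uri(prop.tag), obj))
    return triples


smith_triples = triples_from(FORMAT_1)
```

  • The sketch yields four triples for Smith: the three properties plus
    an rdf:type triple implied by the example:Person element name.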

14
  • A person record using FOAF (from Obitko)
  • <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:foaf="http://xmlns.com/foaf/0.1/"
             xmlns="http://www.example.org/joe/contact.rdf#">
      <foaf:Person rdf:about="http://www.example.org/joe/contact.rdf#joesmith">
        <foaf:mbox rdf:resource="mailto:joe.smith@example.org"/>
        <foaf:homepage rdf:resource="http://www.example.org/joe/"/>
        <foaf:family_name>Smith</foaf:family_name>
        <foaf:givenname>Joe</foaf:givenname>
      </foaf:Person>
    </rdf:RDF>

15
  • RDF summary and implications for the Semantic Web
  • A graph may be represented as a collection of
    triples.
  • RDF-XML representations of graphs will contain
    URIs that
  • - serve to identify and/or reference syntactic
    elements (they define tag names), and
  • - identify and/or name resources: subjects,
    predicates and/or objects.
  • Such URIs may be imaginary or provide addresses
    of actual, dereferenceable, web documents, in
    possibly remote locations.
  • This can result in a Gigantic Global Graph,
    usually known as the Linked Data Web or the
    Semantic Web, with RDF as one of W3C's Semantic
    Web architectural levels.
  • "If HTML and the Web make all online documents
    look like one huge book, RDF, schema, and
    inference languages will make all the data in the
    world look like one huge database." (TimBL)
  • Editor's note: Here TimBL is using the term
    "schema" to refer to an RDF schema that defines
    RDF triples much more loosely than a relational
    database schema defines a collection of tables in
    a database.

16
  • RDF graphs may be interrogated
  • - by physical inspection (for anyone willing to
    read XML)
  • - by writing programs that read RDF files,
    construct the represented graphs internally,
    and then
  •     - access graph triples in sequential order,
  •     - select triples according to specified content,
          and/or
  •     - apply SparQL queries and access results in
          sequential order
  • - using command-line tools that apply SparQL
    queries
  • - using GUI interfaces accepting SparQL queries
  •     - written in text, or
  •     - represented graphically
  • - via URLs carrying form data, or SOAP requests
    to SparQL endpoints
17
  • How to query an RDF graph using Jena
  • The Java-based Jena package from HP Labs allows
    users to manipulate and query graphs, and
    import/export RDF, etc.
  • You can write a program that uses Jena classes to
  • - retrieve and parse an RDF file containing a
    graph or a collection of graphs,
  • - store it in memory, and then
  • - examine each triple in turn, examine one
    component (say, the subject) of each triple in
    turn, or examine only triples that meet
    specified criteria.
  • For example, one might examine each stored triple
    searching for a specific reference URI, or for a
    specific literal value.
  • One might look for persons of a specific age,
    21^^xsd:age, in the object portion of each
    triple.
  • Jena also provides support for inference using
    rule sets and for querying via SparQL.

18
  • Jena example
  • In Jena, RDF nodes can have type Resource, URI
    Resource, literal, or anonymous (a slight
    extension to standard RDF).
  • A Jena model is created by a factory:
  • Model m = ModelFactory.createDefaultModel();
  • A Jena ontological model is a model along with a
    reasoner (sic):
  • OntModel m = ModelFactory.createOntologyModel();
  • Jena can
  • - read in an RDF serialized graph (from a
    file, URL, etc.)
  • - write a serialized model to a file or STDOUT,
    and
  • - perform standard operations on the model.
    For example, given the populated models m and n,
    Jena can then do
  • Model x = m.add( n ); // Union

19
  • Reading and writing a model in Jena
  • import java.io.InputStream;
  • import com.hp.hpl.jena.rdf.model.*;
  • import com.hp.hpl.jena.util.FileManager;
  • String inputFileName = "Some-GO-entries-diddled.rdf";
  • Model model = ModelFactory.createDefaultModel();
  • InputStream in = FileManager.get().open( inputFileName );
  • if( in == null )
  •     throw new IllegalArgumentException( "File not found.\n" );
  • model.read( in, "" ); // The second argument is the base URI
    for relative URIs (empty here).
  • model.write( System.out, "RDF/XML-ABBREV" ); // or "N-TRIPLE"
    or "RDF/XML"
  • // which will yield a file of N-triple,
    RDF/XML, or RDF/XML-ABBREV records.

20
  • Canonical process to examine each triple in a
    model
  • StmtIterator iterator = model.listStatements(); // Statements composed of triples
  • while( iterator.hasNext() ) {
  •     Statement statement = iterator.nextStatement();
  •     Resource subject = statement.getSubject();
  •     Property predicate = statement.getPredicate();
  •     // Get the object, which in this example may be a Resource or just a string,
  •     // so it is kept in an RDFNode, a supertype of Resource and Literal.
  •     RDFNode object = statement.getObject();
  •     // Now process the object; here it is just printed.
  •     System.out.print( subject.toString() );
  •     System.out.print( " " + predicate.toString() + " " );
  •     if( object instanceof Resource ) {
  •         // it's a resource.
  •         System.out.print( object.toString() );
  •     } else {
  •         // it's a literal (this branch completes the truncated slide; quoting is an assumption).
  •         System.out.print( "\"" + object.toString() + "\"" );
  •     }
  •     System.out.println( " ." );
  • }
21
  • Statement iterators for accessing selected
    components
  • There are several methods for creating iterators
    over a model:
  • - Some simply list the components of each
    triple:
  •     model.listSubjects()
  •     model.listObjects()
  • - Some compare a specific component with a
    specified value, as in
  •     model.listSubjectsWithProperty( Property p, RDFNode o )
    (which will get you a collection of subjects
    possessing property/predicate p and specific value o)
  • - Some compare all components against specific
    values in 2 steps:
  • - define a selector possessing specific values
    s, p and o, where null or (RDFNode) null matches
    anything:
  •     Selector selector = new SimpleSelector( subject, predicate, object );
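  • The selector idea (fix some components, let a null match anything)
    can be imitated outside Jena in a few lines. The following is a
    hypothetical Python sketch of the concept only; it is not the Jena
    API, and the function name select and the sample data are
    assumptions for illustration.

```python
# A handful of Graph 3 triples, using abbreviated predicate names.
TRIPLES = [
    ("Blake", "example:age", "12"),
    ("Jones", "example:age", "35"),
    ("George", "example:age", "21"),
    ("Smith", "example:age", "21"),
    ("Smith", "example:fav", "Jones"),
]


def select(triples, s=None, p=None, o=None):
    """Return the triples matching s, p and o, where None matches anything."""
    wanted = (s, p, o)
    return [
        triple
        for triple in triples
        if all(want is None or want == got for want, got in zip(wanted, triple))
    ]


aged_21 = [s for s, _, _ in select(TRIPLES, p="example:age", o="21")]
```

  • Calling select with no constraints returns every triple, which
    mirrors a selector whose three components are all null.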

22
  • SparQL: a graph-based query language
  • SparQL is a language that lets users query RDF
    graphs ... using graph patterns (written in N3)
    containing variables.
  • The query engine will return an exhaustive list
    of triples that satisfy each query through value
    substitution (aka query by example, QBE).
  • "This process is not always intuitive, and/or SQL
    has perverted the minds of a generation of
    programmers" (J. Random Guy somewhere on the
    Web).
  • SparQL is implemented in Jena through the ARQ
    package, and queries may be made from within Java
    programs (McCarthy, 2005), or via a SparQL client
    distributed with Jena. The process to make a
    query is
  • - build a query in a .rq file, and
  • - execute the query using
  •     sparql --query filename.rq
  • or
  •     sparql.bat --query filename.rq
  • SparQL does not do inference (except when used
    within Jena against an ontological model).

23
  • A SparQL example
  • This SparQL example query simply asks for a list
    of the first 10 triples in the file specified in
    the FROM clause:
  • PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
  • PREFIX example: <http://fake.host.edu/example-schema#>
  • select $s $o
  • from <http://kongo.uits.iupui.edu:8546/rdf-example-1.rdf>
  • where { $s $p $o . }
  • LIMIT 10
  • $s, $p, and $o are variable names that will each
    be assigned a value as the query is satisfied.
    Variable names may also start with ?.
24
  • SparQL: a graph-based query language
  • The basic, partial syntax of a SparQL query is
    based on N3 (/Turtle) and similar to
  • BASE <some URI from which relative FROM and
    PREFIX entries will be offset>
  • PREFIX prefix_abbreviation: <some_URI>
  • SELECT some_variable_list
  • FROM <some_RDF_source>
  • WHERE { some_triple_pattern . ... }
  • Notes
  • - the < and > characters are required
    literals,
  • - the BASE and PREFIX entries are optional, and
    BASE applies to relative
25
  • Querying Graph 3 format 1 using SparQL
  • Here's a reminder of one of the representations
    used to store Graph 3, here stored in a file
    named rdf-example-1.rdf:
  • <?xml version="1.0" encoding="UTF-8"?>
  • <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:example="http://fake.host.edu/example-schema#"
    >
      <example:Person rdf:about="http://fake.host.edu/smith">
        <example:name>Smith</example:name>
        <example:age>21</example:age>
        <example:fav rdf:resource="http://fake.host.edu/jones" />
      </example:Person>
    </rdf:RDF>

26
  • A SparQL query against the first data
    representation
  • C:\Jena-2.5.7\Jena-2.5.7\bat> cat query-example-1.rq
  • PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
  • PREFIX example: <http://fake.host.edu/example-schema#>
  • select *
  • from <http://kongo.uits.iupui.edu:8546/smiht-format-1.rdf>
  • where { $s $p $o . }
  • C:\Jena-2.5.7\Jena-2.5.7\bat> sparql.bat --query query-example-1.rq
  • ------------------------------------------------------------------------------
  • s                              p             o
  • <http://fake.host.edu/smith>   example:fav   <http://fake.host.edu/jones>
  • <http://fake.host.edu/smith>   example:age   "21"

27
  • Querying Graph 3 format 2 using SparQL
  • Here's a reminder of the other representation of
    Graph 3, stored in a file named
    rdf-example-2.rdf:
  • <?xml version="1.0" encoding="UTF-8"?>
  • <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:example="http://fake.host.edu/example-schema#"
    >
      <example:Person rdf:about="http://fake.host.edu/smith"
                      example:name="Smith"
                      example:age="21">
        <example:fav rdf:resource="http://fake.host.edu/jones" />
      </example:Person>
    </rdf:RDF>

28
  • The same SparQL query against the second data
    representation
  • C:\Jena-2.5.7\Jena-2.5.7\bat> cat query-example-2.rq
  • PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
  • PREFIX example: <http://fake.host.edu/example-schema#>
  • select *
  • from <http://kongo.uits.iupui.edu:8546/smith-format-2.rdf>
  • where { $s $p $o . }
  • C:\Jena-2.5.7\Jena-2.5.7\bat> sparql.bat --query query-example-2.rq
  • ------------------------------------------------------------------------------
  • s                              p             o
  • <http://fake.host.edu/smith>   example:fav   <http://fake.host.edu/jones>
  • <http://fake.host.edu/smith>   example:age   "21"

29
  • A distributed SparQL query against 4 separate
    RDF files
  • The next query searches 4 dereferenceable files
    holding live data in the first representation
    format above:
  • C:\Jena-2.5.7\Jena-2.5.7\bat> cat query-example-all.rq
  • PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
  • PREFIX example: <http://fake.host.edu/example-schema#>
  • select *
  • from <http://kongo.uits.iupui.edu:8546/smith>
  • from <http://kongo.uits.iupui.edu:8546/jones>
  • from <http://kongo.uits.iupui.edu:8546/george>
  • from <http://kongo.uits.iupui.edu:8546/blake>
  • where { $s $p $o . }

30
  • Results of the distributed SparQL query
  • C:\Jena-2.5.7\Jena-2.5.7\bat> sparql.bat --query query-example-all.rq
  • -------------------------------------------------------------------------------------------
  • s                                      p              o
  • <http://kongo.uits.iupui.edu/blake>    example:fav    <http://kongo.uits.iupui.edu/blake>
  • <http://kongo.uits.iupui.edu/blake>    example:age    "12"
  • <http://kongo.uits.iupui.edu/blake>    example:name   "Blake"
  • <http://kongo.uits.iupui.edu/blake>    rdf:type       example:Person
  • <http://kongo.uits.iupui.edu/jones>    example:fav    <http://kongo.uits.iupui.edu/smith>
  • <http://kongo.uits.iupui.edu/jones>    example:age    "35"
  • <http://kongo.uits.iupui.edu/jones>    example:name   "Jones"
  • <http://kongo.uits.iupui.edu/jones>    rdf:type       example:Person
  • <http://kongo.uits.iupui.edu/george>   example:fav    <http://kongo.uits.iupui.edu/smith>
  • <http://kongo.uits.iupui.edu/george>   example:age    "21"
  • <http://kongo.uits.iupui.edu/george>   example:name   "George"
  • <http://kongo.uits.iupui.edu/george>   rdf:type       example:Person

31

  • The magic of ontologies
  • There are many definitions of "ontology", but in
    very general terms, an ontology may be thought of
    as a taxonomy of objects (or concepts) based on a
    particular relationship between pairs of those
    objects (or concepts).
  • A common example of a taxonomy is an evolutionary
    tree in which individual species are related on
    the basis of evolutionary descent. That is, one
    species of each pair connected by an edge
    descended from the other. (Actually, it's the
    members of the species who evolve, but . . .)
  • Within such structures no member is considered to
    have descended from more than one immediate
    species.
  • Within an ontology, however, an object or concept
    may have more than one immediate "parent", and no
    circular sub-graphs are allowed, so the resulting
    structure is a Directed Acyclic Graph (DAG).
  • An ontology can be represented by a special RDF
    graph. It is special in that the predicates
    convey transitivity: if A is a descendant of B,
    and B is a descendant of C, then A is a
    descendant of C.
  • IS_A and PART_OF relationships are commonly used
    to build ontologies.
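  • The transitivity rule can be made concrete with a small traversal
    over a DAG. The sketch below is illustrative only: the node names
    A, B, C, D and the IS_A table are made-up data, and the function
    name ancestors is an assumption, not part of any ontology tool.

```python
# Hypothetical is_a edges: child -> list of immediate parents.
# A has two immediate parents, which a tree would not allow but a DAG does.
IS_A = {
    "A": ["B", "D"],
    "B": ["C"],
    "D": ["C"],
}


def ancestors(term, parents):
    """Collect every term reachable by following parent links transitively."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


a_ancestors = ancestors("A", IS_A)
```

  • C is reported as an ancestor of A even though no A-to-C edge is
    stored, which is exactly the inference the transitive reading of
    is_a licenses.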
32

  • Here is a portion of the GO is_a DAG
    (Ashburner, 2004) for molecular function
    (example: chromatin binding is_a DNA binding).
  • (Note that this diagram shows some genes, but
    the Gene Ontology is actually a taxonomy of terms
    that can be used to describe or annotate genes,
    rather than a taxonomy of genes.)
33
  • Here's the first entry (of the 26K) in the GO
    text version (with all three parts intermixed):
  • [Term]
  • id: GO:0000001
  • name: mitochondrion inheritance
  • namespace: biological_process
  • def: "The distribution of mitochondria, including
    the mitochondrial genome, into daughter cells
    after mitosis or meiosis, mediated by
    interactions between mitochondria and the
    cytoskeleton." [GOC:mcc, PMID:10873824,
    PMID:11389764]
  • synonym: "mitochondrial inheritance" EXACT []
  • is_a: GO:0048308 ! organelle inheritance
  • is_a: GO:0048311 ! mitochondrion distribution
  • You can also get the GO as RDF XML, or as a MySQL
    database. A portion of the molecular function
    extract on the previous page is shown in RDF XML
    on the next page.

34
  • <?xml version="1.0" encoding="UTF-8"?>
  • <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:go="http://www.geneontology.org/dtds/go.dtd#">
      <!-- Note: "all" is like root. -->
      <go:term rdf:about="http://www.geneontology.org/go#all">
        <go:accession>all</go:accession>
        <go:name>all</go:name>
        <go:definition>This term is the most general
          term possible</go:definition>
      </go:term>
      <go:term rdf:about="http://www.geneontology.org/go#GO:0003674">
        <go:accession>GO:0003674</go:accession>
        <go:name>molecular_function</go:name>
        <go:synonym>GO:0005554</go:synonym>
        <go:synonym>molecular function</go:synonym>
        <go:definition>Elemental activities, such as
          catalysis or binding, describing the actions of a
          gene product at the molecular level. A given gene
          product may exhibit one or more molecular
          functions.</go:definition>
        <go:is_a rdf:resource="http://www.geneontology.org/go#all" />
      </go:term>
    </rdf:RDF>

35
  • Find parents of GO:0004003 in the example GO
    subset
  • PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
  • PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
  • PREFIX go: <http://www.geneontology.org/dtds/go.dtd#>
  • select *
  • from <http://discern.uits.iu.edu:8421/Some-GO-entries-diddled.rdf>
  • where {
  •     <http://www.geneontology.org/go#GO:0004003> go:is_a $parent .
  • }
  • Result
  • C:\Jena-2.5.7\bat> sparql.bat --query GO-paths-from-4003.rq
  • -----------------------------------------------
  • parent

36
  • Find all 3-element paths up from GO:0004003
  • PREFIX go: <http://www.geneontology.org/dtds/go.dtd#>
  • select *
  • from <http://discern.uits.iu.edu:8421/Some-GO-entries-diddled.rdf>
  • where {
  •     <http://www.geneontology.org/go#GO:0004003> go:is_a $a .
  •     $a go:is_a $b .
  •     $b go:is_a $c .
  • }
  • Note that given a table showing the GO DAG, you
    can get this result within SQL using multiple
    joins, but you can't find N-element paths for
    arbitrary N in either language (unless you use
    inference within SparQL).
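  • One way to see why a fixed number of triple patterns gives a fixed
    path length is to do the joins by hand, one hop per loop iteration.
    The sketch below is a hypothetical Python illustration: the term
    names t0 to t3 and root, the IS_A table, and the function paths_up
    are all invented here and are not taken from the GO data.

```python
# Hypothetical is_a edges: child -> list of immediate parents.
IS_A = {
    "t0": ["t1", "t2"],
    "t1": ["t3"],
    "t2": ["t3"],
    "t3": ["root"],
}


def paths_up(start, is_a, hops):
    """Return every path that climbs exactly `hops` is_a edges from `start`."""
    paths = [[start]]
    for _ in range(hops):
        # One iteration plays the role of one more triple pattern (one join).
        paths = [path + [parent] for path in paths for parent in is_a.get(path[-1], ())]
    return paths


three_hop_paths = paths_up("t0", IS_A, 3)
```

  • Because the hop count is an ordinary loop bound here, the same
    function covers any N, which is the part the query text alone
    cannot express.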

37
  • Find all 3-element paths up from GO:0004003 using
    Twinkle

38
  • Query DBpedia for entries about Goethe
  • (using Virtuoso iSparql text query)
  • Note that the predicate bif:contains is a
    Virtuoso Built-In Function that searches
    back-end text indexes. It might be possible to
    search using a standard SparQL regex FILTER, but
    it would be much slower.

39
  • The same query using the iSparql graphical QBE
    interface
  • Here is the same query in graphical form as
    constructed using the iSparql QBE interface
  • Components can be dragged-and-dropped from the
    menu at the top of the window. The whole
    interactive window is shown on the next page.

40
  • The same query within the whole iSparql QBE window

41
  • Results from the iSparql text and/or QBE queries

42
  • Possible applications for ontologies
  • Suppose uniprot.org provides a list of 89K
    proteins, their mappings to NCBI Gene IDs, and
    their GO annotations (which it does), and perhaps
    a small subset looks like
  • XXX GO:00003682
  • YYY GO:00003682
  • ZZZ GO:00008026
  • AAA GO:00008096
  • And suppose go.org links GO IDs with GO category
    names (which it does),
  • and suppose I have a list of researchers and
    their various areas of interest, like
  • Smith studies gene XXX
  • Jones studies nucleic acid binding
  • etc.
  • Then . . . what kinds of questions can I ask that
    would have been difficult before, like
43
  • Optional clauses in SparQL queries
  • SparQL has more features than presented so far.
  • Here are some clauses permitted following the
    where clause:
  • order by DESC | ASC ( variable_list )
  • limit n: print up to n return values.
  • offset n: skip the first n return values before
    starting output.
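  • The combined effect of these three clauses can be sketched on a
    plain list of result rows. This is a rough Python analogy, not
    SparQL itself; the function name solution_modifiers and the ROWS
    data are assumptions for illustration. The order used (sort, then
    offset, then limit) follows the SPARQL specification.

```python
# Result rows as they might come back from a query over Graph 3.
ROWS = [
    {"s": "Blake", "age": 12},
    {"s": "Jones", "age": 35},
    {"s": "George", "age": 21},
    {"s": "Smith", "age": 21},
]


def solution_modifiers(rows, order_by=None, descending=False, offset=0, limit=None):
    """Apply ORDER BY, then OFFSET, then LIMIT to a list of result rows."""
    if order_by is not None:
        rows = sorted(rows, key=lambda row: row[order_by], reverse=descending)
    end = None if limit is None else offset + limit
    return rows[offset:end]


page = solution_modifiers(ROWS, order_by="age", descending=True, offset=1, limit=2)
names = [row["s"] for row in page]
```

  • With the rows sorted by descending age, offset 1 skips Jones and
    limit 2 keeps the next two rows.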

44
  • Optional clauses in SparQL queries
  • Permitted within where clauses:
  • FILTER restricts variable matches in the
    preceding triple to specified filter patterns, as
    in
  • { $s $p $date FILTER ( $date >
    "2005-01-01T00:00:00Z"^^xsd:dateTime ) }
  • or
  • { $s $p $d FILTER
    ( xsd:dateTime( $d ) <
    xsd:dateTime( "2005-01-01T00:00:00Z" ) ) }
  • or
  • { ?s ?p ?name FILTER regex( ?name,
    "smi", some_flag ) }
  • UNION: where clauses may be constructed as
  • { triple_pattern_1 } UNION
    { triple_pattern_2 }
45
  • A relational view of the Semantic Web (Newman,
    2007)
  • Relaxing certain requirements normally imposed
    upon SQL (specifically type constraints on joined
    fields), there are strong similarities among
    operations applied to relational and graph-based
    models. For example
  • - triple_pattern . triple_pattern
  •     approximates an untyped join, as demonstrated
        on the next slide
  • - filter
  •     approximates an SQL conditional
  • - union
  •     approximates an outer union
  • - optional
  •     approximates a left outer join( R, S ), which
  •     = join( R, S ) unioned with an anti-join( R, S ),
        where an anti-join
  •     = difference with a semi-join, and a semi-join
  •     = join and a projection.
46
  • A relational view of the Semantic Web (Newman,
    2007)
  • Here we look at the triple pattern used to find
    the 3-hop paths towards the GO root node,
  • select $a $b $c where {
  •     <http://www.geneontology.org/go#GO:0004003> go:is_a $a .
  •     $a go:is_a $b .
  •     $b go:is_a $c .
  • }
  • which is roughly equivalent to the following SQL
    query:
  • select
  •     a.parent_id, b.parent_id, c.parent_id
  • from
  •     GO.molecular_function_DAG a
  • join
  •     GO.molecular_function_DAG b
47
  • Publishing relational data as virtual RDF
    stores
  • So far we have accessed RDF presented mostly from
    free-standing files. However, legacy relational
    databases can be published as RDF stores on the
    Semantic Web by using gateways like D2R and
    Virtuoso (commercial).
  • The D2R approach requires 2 steps:
  • - interrogate the database via JDBC using
    generate-mapping to build a configuration
    (mapping) file from the relational table
    definitions, and then
  • - start the D2R server with the mapping file.
  • Notes
  • - Each table row becomes a separate
    resource/graph,
  • - primary keys (if any) become resource
    identifiers, and
  • - rows in linked tables identified by foreign
    keys may be merged into the entity (?).
  • The D2R utility dump-rdf can also convert an
    entire table into RDF form for access in a single
    SparQL query.

48
  • Accessing data via a SparQL Endpoint
  • Since the D2R server makes a SparQL endpoint
    available, one can execute queries via HTTP
    requests like
  • http://kongo.uits.iupui.edu:6700/sparql?query=
  • select ?s ?p ?o where { ?s ?p ?o . }
    limit 10
  • The D2R server also provides a Web form that can
    be used to interrogate its content using SparQL.
    This interface is based on an AJAX component
    called SNORQL, and available at
  • http://kongo.uits.iupui.edu:6700/sparql
  • The D2R server also provides an interface for
    users to browse its backend data. To use it, just
    point a Web browser at
  • http://kongo.uits.iupui.edu:6700
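  • When such a request is built by a program, the query text has to
    be URL-encoded before it is appended to the endpoint address. A
    minimal Python sketch of that step follows; it only constructs the
    URL and sends nothing, and it assumes the endpoint accepts the
    query in a parameter named "query", as the request above suggests.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://kongo.uits.iupui.edu:6700/sparql"
QUERY = "select ?s ?p ?o where { ?s ?p ?o . } limit 10"

# urlencode escapes spaces, question marks and braces in the query text.
request_url = ENDPOINT + "?" + urlencode({"query": QUERY})

# Decoding the query string again recovers the original SparQL text.
decoded = parse_qs(urlsplit(request_url).query)["query"][0]
```

  • The resulting request_url could then be handed to any HTTP client.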

49
  • Portion of a D2R-server mapping file for CLSD
  • @prefix map: <file:/C:/d2r-server-0.4/mapping-clsd2-GO-DGN.n3#> .
  • @prefix d2rq: <http://www.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/0.1#> .
  • map:database a d2rq:Database;
  •     d2rq:jdbcDriver "com.ibm.db2.jcc.DB2Driver";
  •     d2rq:jdbcDSN "jdbc:db2://libra45.uits.iu.edu:50000/clsd2";
  •     d2rq:username "account";
  •     d2rq:password "password";
  •     .
  • # Table DISEASE_GENE_NET.GENES
  • map:DISEASE_GENE_NET_GENES a d2rq:ClassMap;
  •     d2rq:dataStorage map:database;
  •     d2rq:uriPattern "DISEASE_GENE_NET.GENES/@@DISEASE_GENE_NET.GENES.GENE_ID@@";
  •     d2rq:class vocab:DISEASE_GENE_NET_GENES;
  •     .
  • map:DISEASE_GENE_NET_GENES__label a
    d2rq:PropertyBridge;
  •     d2rq:belongsToClassMap map:DISEASE_GENE_NET_GENES

50
  • Triple stores
  • There exist so-called "triple stores" that can
    use backend data storage engines, like MySQL, to
    house RDF data and process queries.
  • For example, Sesame is a triple store that can
    use several different kinds of backends: DBMSs
    (originally PostgreSQL), simple RDF files, and/or
    other, network-accessed triple stores, like
    Sesame itself.
  • Sesame also demonstrates a generic architecture
    for RDF and RDFS storage and query processing,
    and does not require keeping the whole graph in
    memory when processing requests.
  • Jena can also employ back-end database
    management systems.
  • There are also some graph-based data management
    systems, like Neo4j, that can be used to store
    raw graph-structured data. In fact, Neo4j has at
    least one overlay product that uses Neo4j to
    manage RDF.
  • Neo4j may work well for data collections running
    into the billions of nodes, since it does not
    require its whole graph to be memory-contained
    (although it works better with larger memory),
    and is quite fast.

51
  • References
  • Ashburner, M., et al., Gene ontology: a tool for
    the unification of biology, Nature Genetics,
    2000.
  • Berners-Lee, Tim, Linked Data, 2006.
    http://www.w3.org/DesignIssues/LinkedData.html
  • Bizer, Chris, The D2RQ Plattform - Treating
    Non-RDF Databases as Virtual RDF Graphs,
    http://www4.wiwiss.fu-berlin.de/bizer/d2rq/
  • Bizer, Chris, Richard Cyganiak, Tom Heath, How
    to Publish Linked Data on the Web, 2007.
    http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/
  • Cyganiak, Richard, A Relational Algebra for
    SPARQL, HP Labs, 2005.
    http://www.hpl.hp.com/techreports/2005/HPL-2005-170.pdf
  • Davis, Ian, An Introduction to RDF,
    http://research.talis.com/2005/rdf-intro/
  • Dodds, Leigh, Introducing SparQL: Querying the
    Semantic Web, 2005. http://www.xml.com/lpt/a/1628
  • McBride, Brian, An Introduction to RDF and the
    Jena RDF API, 2007.
    http://jena.sourceforge.net/tutorial/RDF_API/index.html