Title: An Introduction to RDF and the Semantic Web
1An Introduction to RDF and the Semantic Web
2Resource Description Framework
- RDF
- Least understood standard to come from the W3C
- May be the most powerful
- May be the most important
- In order that the web achieve its potential
3Resource Description Framework
- Why RDF?
- With HTML and XML we can swap our documents easily
- No meaning is attached to them - they are just data
- RDF addresses the problem of meaning in the data on the web
4What We Need To Know
- When we exchange data we need to know things like
- Who wrote the data
- When was the data written
- When was the data last updated
- These pieces of data are not data per se but data about the data, or meta-data
5XML
- Promised to deliver us from the unstructured data that makes up the Internet
- XML brings structure to the data
- Because HTML combined the appearance of the document with its content, the content was extremely hard to extract
- XML separated content from presentation
6XML
- XML specifically dealt with the data of the content
<music genre="classical">
  <title>Eine Kleine Nacht Muzik</title>
  <composer>Mozart</composer>
  <key>E Flat</key>
  <tempo>2/4</tempo>
</music>
7XML
- We could convey some of the same information with different data
<document type="classical music">
  <name>Eine Kleine Nacht Muzik</name>
  <author>Mozart</author>
</document>
8XML
- What if we wanted to find all pieces of music composed by Mozart?
- We would have to find all documents where the <composer> element had a value of Mozart.
- We would also have to find all documents where the <author> element had a value of Mozart.
9XML
- If another element were used to denote the creator of the music, that term would have to be searched for as well
- To find all compositions written by Mozart without having to identify every element that designates the creator of the music, the same term would have to be used everywhere to identify the creator
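The search problem on these slides can be tried out with a minimal Python sketch using the standard `xml.etree.ElementTree` module. The two documents are the ones shown earlier (shortened); the helper name `created_by` is only an illustration.

```python
import xml.etree.ElementTree as ET

# Two documents describing the same piece with different vocabularies.
DOC_A = """<music genre="classical">
  <title>Eine Kleine Nacht Muzik</title>
  <composer>Mozart</composer>
</music>"""

DOC_B = """<document type="classical music">
  <name>Eine Kleine Nacht Muzik</name>
  <author>Mozart</author>
</document>"""


def created_by(xml_text, person, creator_tags):
    """Return True if any element named in creator_tags holds the given text."""
    root = ET.fromstring(xml_text)
    return any(el.text == person
               for tag in creator_tags
               for el in root.iter(tag))


# Searching only for <composer> misses the second document.
only_composer = [created_by(d, "Mozart", ["composer"]) for d in (DOC_A, DOC_B)]
# Every synonym has to be listed by hand before both documents are found.
all_synonyms = [created_by(d, "Mozart", ["composer", "author"])
                for d in (DOC_A, DOC_B)]
print(only_composer, all_synonyms)
```

Each new synonym for "creator" means another entry in the tag list, which is the maintenance burden the slides describe.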
10XML
- This problem could also be solved by indicating that the term composer in one document means the same as written by in another and created by in a third
- This would be quite an undertaking though, as it involves identifying all words and phrases in all languages having this meaning
11Missing
- The ability to know that two or more terms mean the same thing is what is missing from the Internet
- If we can build this layer into the Internet, it will take the information to a fundamentally different level
12Dublin Core
- 1995
- Conference in Dublin, Ohio
- Discussed issues of semantics
- Agreed to a core set of themes common to all documents
- Set of properties became known as the Dublin Core (DC) initiative
13Dublin Core
- 3 of the core properties
- DC.Title
- DC.Creator
- DC.Subject
- The Dublin Core element set defines 15 core properties in all
14Dublin Core
- The Dublin Core can be applied to XML
<music genre="classical">
  <title>Eine Kleine Nacht Muzik</title>
  <Creator>Mozart</Creator>
  <key>E Flat</key>
  <tempo>2/4</tempo>
</music>
<document type="classical music">
  <name>Eine Kleine Nacht Muzik</name>
  <Creator>Mozart</Creator>
</document>
15Dublin Core
- Even though we have now used the same element to identify the entity responsible for creating the music, we don't know if the meaning of Creator is the same in both of these instances
- The only way to be sure is to use a very precise mechanism to identify the element being used
16Dublin Core
- The Dublin Core can be applied to XML
<music genre="classical">
  <title>Eine Kleine Nacht Muzik</title>
  <dc:Creator xmlns:dc="http://purl.org/dc/elements/1.1/">Mozart</dc:Creator>
  <key>E Flat</key>
  <tempo>2/4</tempo>
</music>
<document type="classical music">
  <name>Eine Kleine Nacht Muzik</name>
  <dc:Creator xmlns:dc="http://purl.org/dc/elements/1.1/">Mozart</dc:Creator>
</document>
- Now we can see that these elements refer to
exactly the same concept
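As a rough illustration of why the namespace helps, the sketch below searches both documents for a single namespace-qualified name instead of a list of synonyms. It uses Python's standard `xml.etree.ElementTree`; the function name `creators` is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"

DOCS = [
    '<music genre="classical"><title>Eine Kleine Nacht Muzik</title>'
    '<dc:Creator xmlns:dc="http://purl.org/dc/elements/1.1/">Mozart</dc:Creator>'
    '</music>',
    '<document type="classical music"><name>Eine Kleine Nacht Muzik</name>'
    '<dc:Creator xmlns:dc="http://purl.org/dc/elements/1.1/">Mozart</dc:Creator>'
    '</document>',
]


def creators(xml_text):
    """Return the text of every Dublin Core Creator element in the document."""
    root = ET.fromstring(xml_text)
    # ElementTree spells a qualified name as "{namespace-uri}local-name".
    return [el.text for el in root.iter(f"{{{DC}}}Creator")]


found = [creators(d) for d in DOCS]
print(found)
```

The prefix `dc` is only shorthand; the match is made on the full namespace URI, so a document using a different prefix for the same URI would still be found.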
17CD Database
- Suppose you keep a small database of CDs on your computer
- There is a table in the database as below
Primary Key  Album Name                        Artist
1            The Ecleftic Two Sides II a Book  Wyclef Jean
2            Eine Kleine Nacht Muzik           Mozart
3            Soultrane                         John Coltrane
4            The Real                          Eminem
18Another CD Database
- There is a second database kept by another person who has a CD collection
- A table in the database is shown below
Key  Title                    Performer
1    Eine Kleine Nacht Muzik  Mozart
2    The Ecleftic             Wyclef Jean
3    Kind of Blue             Miles Davis
19Comparing Databases
- Exchanging Information
- If we wanted to share information there would be a problem since the column names are different
- The same solution we used in the XML can be used in the database - the unique identifier
20CD Database
- The first database, with each column relabeled with a URI
Primary Key  http://purl.org/dc/elements/1.1/Title  http://purl.org/dc/elements/1.1/Creator
1            The Ecleftic Two Sides II a Book       Wyclef Jean
2            Eine Kleine Nacht Muzik                Mozart
3            Soultrane                              John Coltrane
4            The Real                               Eminem
21Another CD Database
- There is a second database kept by another person who has a CD collection
- A table in the database is shown below
Key  http://purl.org/dc/elements/1.1/Title  http://purl.org/dc/elements/1.1/Creator
1    Eine Kleine Nacht Muzik                Wolfgang Amadeus Mozart
2    The Ecleftic                           Wyclef Jean
3    Kind of Blue                           Miles Davis
22URIs
- Uniform Resource Identifiers (URIs) give us a way to ensure that the meaning of a column of data is the same between databases so long as the column is labeled with the same URI
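A small sketch of the relabeling idea, assuming each database is held as a list of Python dictionaries (one per row) and that each owner supplies a mapping from local column names to URIs. The function name `relabel` is invented for this example.

```python
DC_TITLE = "http://purl.org/dc/elements/1.1/Title"
DC_CREATOR = "http://purl.org/dc/elements/1.1/Creator"

# One row from each of the two CD databases, with their local column names.
first_db = [{"Album Name": "Soultrane", "Artist": "John Coltrane"}]
second_db = [{"Title": "Kind of Blue", "Performer": "Miles Davis"}]


def relabel(rows, mapping):
    """Replace each local column name with the URI it is mapped to."""
    return [{mapping[col]: val for col, val in row.items()} for row in rows]


# Once both tables use the same URIs as column labels they can be combined.
merged = (relabel(first_db, {"Album Name": DC_TITLE, "Artist": DC_CREATOR})
          + relabel(second_db, {"Title": DC_TITLE, "Performer": DC_CREATOR}))
creators = sorted(row[DC_CREATOR] for row in merged)
print(creators)
```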
23Other Problems
- Unfortunately when we look at the databases we
notice some other problems
Primary Key  http://purl.org/dc/elements/1.1/Title  http://purl.org/dc/elements/1.1/Creator
1            The Ecleftic Two Sides II a Book       Wyclef Jean
2            Eine Kleine Nacht Muzik                Mozart
3            Soultrane                              John Coltrane
4            The Real                               Eminem

Key  http://purl.org/dc/elements/1.1/Title  http://purl.org/dc/elements/1.1/Creator
1    Eine Kleine Nacht Muzik                Wolfgang Amadeus Mozart
2    The Ecleftic                           Wyclef Jean
3    Kind of Blue                           Miles Davis
24Other Problems
- Problem 1
- Albums which may be the same have different names
- Problem 2
- Different names are used to denote the same
composers
25Taxonomies
- These problems can be solved through the use of a taxonomy
- A taxonomy is a controlled vocabulary of words, usually about a constrained topic
- Unique identifiers are key to developing taxonomies
26Taxonomies
- If we were to devise a controlled classification
list so we could tell which CDs were which genre
then we would avoid problems like having one CD
labeled as classical and another CD labeled as
classic
27Taxonomies
- CD Taxonomy
- Jazz
- Classical
- Soul
- Pop
- Hip Hop
- Folk
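To make the controlled-vocabulary idea concrete, here is a minimal Python sketch that accepts only labels from the CD taxonomy listed above. The function name `check_genre` is an assumption for illustration.

```python
# The controlled vocabulary from the CD taxonomy slide.
GENRES = {"Jazz", "Classical", "Soul", "Pop", "Hip Hop", "Folk"}


def check_genre(label):
    """Return the label if it belongs to the taxonomy, otherwise raise."""
    if label not in GENRES:
        raise ValueError(f"{label!r} is not in the CD taxonomy")
    return label


print(check_genre("Classical"))
try:
    check_genre("Classic")  # the near-miss the slides warn about
except ValueError as err:
    print(err)
```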
28Taxonomies
- We are not limited to taxonomies of music
- We could have type of performance, e.g., play, movie, live performance, etc.
29Moving the Problem
- We really didn't solve the problem we described earlier
- We only moved the problem up a level
- We now have the problem of having more than one taxonomy for the same thing
30Moving the Problem
- Consider
- http://taxonomies.org/Plays/PorgyAndBess
- http://taxonomies.org/Albums/PorgyAndBess
- We do not know whether the PorgyAndBess in the first reference is the same as the PorgyAndBess in the second reference
31We Need An Authority Figure
- Let us imagine that there is some authority that keeps track of all CDs that are released
- This is similar to books and their ISBNs, which are unique
- We will call the fictitious authority MuzicBiz.org
- MuzicBiz.org maintains a central database of CDs that have been released
32Tables Now ...
Key                                http://purl.org/dc/elements/1.1/Title  http://purl.org/dc/elements/1.1/Creator
http://MuzicBiz.org/Album/1011234  Eine Kleine Nacht Muzik                Wolfgang Amadeus Mozart
http://MuzicBiz.org/Album/7655432  The Ecleftic                           Wyclef Jean
http://MuzicBiz.org/Album/8997654  Kind of Blue                           Miles Davis

Key                                http://ebiz.org/Stock  http://ebiz.org/Cost
http://MuzicBiz.org/Album/1011234  5                      16.00
http://MuzicBiz.org/Album/7655432  4                      19.00
http://MuzicBiz.org/Album/8997654  10                     12.00
33Unique Identifiers
- Since we are guaranteed that these identifiers ALWAYS refer to the same CD, any table row having a specific key will ALWAYS refer to the same CD - there is NO reason to doubt this
- Data validity is enforced
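Because both tables share the same keys, their rows can be combined safely. The sketch below assumes each table is a Python dictionary keyed by the MuzicBiz.org identifier; `join_on_key` is a name made up for this example.

```python
DC_TITLE = "http://purl.org/dc/elements/1.1/Title"
DC_CREATOR = "http://purl.org/dc/elements/1.1/Creator"
STOCK = "http://ebiz.org/Stock"
COST = "http://ebiz.org/Cost"

catalog = {
    "http://MuzicBiz.org/Album/1011234": {
        DC_TITLE: "Eine Kleine Nacht Muzik",
        DC_CREATOR: "Wolfgang Amadeus Mozart"},
    "http://MuzicBiz.org/Album/8997654": {
        DC_TITLE: "Kind of Blue",
        DC_CREATOR: "Miles Davis"},
}
inventory = {
    "http://MuzicBiz.org/Album/1011234": {STOCK: 5, COST: "16.00"},
    "http://MuzicBiz.org/Album/8997654": {STOCK: 10, COST: "12.00"},
}


def join_on_key(left, right):
    """Merge the rows of two tables wherever the same identifier appears."""
    return {key: {**left[key], **right[key]}
            for key in left.keys() & right.keys()}


combined = join_on_key(catalog, inventory)
print(combined["http://MuzicBiz.org/Album/8997654"])
```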
34Meta-Data
- Meta-Data
- Data that describes data
- Creator, Type, Date are all kinds of meta-data
- So far the meta-data we have described consists
of two values - an attribute name and an
attribute value
35Meta-Data
- To be precise we need to add one more piece of meta-data to complete any meta-data we might have
- Since it is entirely possible to have the value Mozart as Creator of many things, we need to identify what Mozart is the creator of - the so-called DOCUMENT
36Triples
- The combination of Source, Attribute name, and
Value makes what is called in the RDF-biz a
TRIPLE and that constitutes a fundamental element
in RDF
37Transporting Triples
- We will assume the following -
- Meta-data can be expressed as a set of triples
- Key to sharing meta-data is the URI
- Now given that we accept this representation, the
next challenge is to decide how we will share
this information (transport)
38Sharing Meta-Data and Data
Key                                http://ebiz.org/Stock  http://ebiz.org/Cost
http://MuzicBiz.org/Album/1011234  5                      16.00
http://MuzicBiz.org/Album/7655432  4                      19.00
http://MuzicBiz.org/Album/8997654  10                     12.00
- The database contains the information as organized in the table above
- We need to transform this data into the accepted form, i.e., triples
39Sharing Data and Meta-Data
Document                           Name                   Value
http://MuzicBiz.org/Album/1011234  http://ebiz.org/Stock  5
http://MuzicBiz.org/Album/1011234  http://ebiz.org/Cost   16.00
http://MuzicBiz.org/Album/7655432  http://ebiz.org/Stock  4
http://MuzicBiz.org/Album/7655432  http://ebiz.org/Cost   19.00
http://MuzicBiz.org/Album/8997654  http://ebiz.org/Stock  10
http://MuzicBiz.org/Album/8997654  http://ebiz.org/Cost   12.00
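The step from the keyed table to the Document/Name/Value table can be sketched in a few lines of Python. The table is assumed to be a dictionary keyed by album URI; `to_triples` is an illustrative name.

```python
STOCK = "http://ebiz.org/Stock"
COST = "http://ebiz.org/Cost"

table = {
    "http://MuzicBiz.org/Album/1011234": {STOCK: "5", COST: "16.00"},
    "http://MuzicBiz.org/Album/7655432": {STOCK: "4", COST: "19.00"},
    "http://MuzicBiz.org/Album/8997654": {STOCK: "10", COST: "12.00"},
}


def to_triples(rows):
    """Emit one (document, name, value) triple per table cell."""
    return [(document, name, value)
            for document, columns in rows.items()
            for name, value in columns.items()]


triples = to_triples(table)
for triple in triples:
    print(triple)
```

Three rows with two columns each give six triples, one per cell, matching the table above.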
40Sharing Data and Meta-Data
Document                           Name                   Value
http://MuzicBiz.org/Album/1011234  http://ebiz.org/Stock  5
http://MuzicBiz.org/Album/1011234  http://ebiz.org/Cost   16.00
http://MuzicBiz.org/Album/7655432  http://ebiz.org/Stock  4
http://MuzicBiz.org/Album/7655432  http://ebiz.org/Cost   19.00
http://MuzicBiz.org/Album/8997654  http://ebiz.org/Stock  10
http://MuzicBiz.org/Album/8997654  http://ebiz.org/Cost   12.00
- We have adequately represented the meta-data and it is ready for transport via XML
- But this table only represents the meta-data and does not relate to any data described by it
41Sharing Data and Meta-Data
Document                           Name                   Value
http://MuzicBiz.org/Album/1011234  http://ebiz.org/Stock  5
http://MuzicBiz.org/Album/1011234  http://ebiz.org/Cost   16.00
http://MuzicBiz.org/Album/7655432  http://ebiz.org/Stock  4
http://MuzicBiz.org/Album/7655432  http://ebiz.org/Cost   19.00
http://MuzicBiz.org/Album/8997654  http://ebiz.org/Stock  10
http://MuzicBiz.org/Album/8997654  http://ebiz.org/Cost   12.00
- We need a way to identify the document that the meta-data describes
- For this purpose we add a name/value pair that names the URL of the document
42Sharing Data and Meta-Data
<document type="News Item"
          url="http://www.ePolitix.com/Articles/0000005a4787.htm"
          xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:Title>I will stand says Portillo</dc:Title>
  <dc:Creator>Craig Hoy</dc:Creator>
  <dc:Subject>Tory leadership contest</dc:Subject>
</document>
43RDF Model and Syntax
- RDF Model
- In this case the model we are speaking of is the set of triples
- The definition of RDF is representation independent
- This means that XML is only one way of writing RDF
44RDF Terminology
- In RDF terminology a STATEMENT is used to describe a triple
- This term arises from using a triple to make a statement about a document
45RDF Terminology
- Triples
- Resources and Properties
- In the RDF specification the name part of the name/value pair is regarded as a PROPERTY
- The subject of the meta-data is regarded as a RESOURCE
46RDF Terminology
- Triples
- A triple is the combination of the three parts -
a resource with a property and a value
47RDF Terminology
- A triple can express a relationship between
resources
Resource                            Property                        Value
http://MuzicBiz.org/Albums/7655432  http://MuzicBiz.org/Prop/Track  http://MuzicBiz.org/Tracks/1667653
[Diagram: http://MuzicBiz.org/Albums/7655432 --Track--> http://MuzicBiz.org/Tracks/1667653]
48RDF Terminology
[Diagram: http://MuzicBiz.org/Albums/7655432 --Track--> http://MuzicBiz.org/Tracks/1667653]
- In the terminology for this model, the album is the SUBJECT of our statement and the track is the OBJECT
- The two resources are joined by a PREDICATE
- The predicate specifies the nature of the relationship between the two resources
49RDF Terminology
- Notation
- When writing about RDF it is useful to be able to
show statements or sets of triples for discussion
50Notation
- English
- English is simplest
- Craig Hoy is the author of http://www.ePolitix.com/Articles/0000005a4787.htm
51Notation
- SUBJECT has a PREDICATE of OBJECT
- Example
- http://www.ePolitix.com/Articles/0000005a4787.htm has an author of Craig Hoy
52Notation
[Diagram: .../Articles/000000005a4787.htm --author--> Craig Hoy]
53Notation
http://MuzicBiz.org/Review, http://MuzicBiz.org/Albums/101234, A relaxing album to prune to.
http://MuzicBiz.org/Review, http://MuzicBiz.org/Albums/7655432, Lively! Perfect when mowing the lawn.
http://MuzicBiz.org/Review, http://MuzicBiz.org/Albums/8997654, Very moody. Great when planning your next planting.
54Notation
- Complex sets of data can most compactly be
represented in a graph
[Graph: nodes .../Articles/000000005a4787.htm, .../Authors/Craig%20Hoy, .../companynumber/3935644, and the literal Editor; arcs labeled dc:Creator, dc:Publisher, and xyz:JobTitle]
55RDF Syntax
- So far we've seen how RDF models meta-data
- Now we need to look at how these models are
expressed in XML
56RDF/XML
57How is a statement formed?
- The statement begins with -
- A reference to the resource that the statement is about (SUBJECT)
- This is in the rdf:about attribute of the <rdf:Description> element
58How is a statement formed?
- The statement is located inside the <rdf:Description> element
- Says there is a property of this resource - dc:Creator that has a value of Craig Hoy
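The RDF/XML for these slides did not survive, so the following is a reconstruction from the bullets above rather than the original listing: an `<rdf:Description>` with an `rdf:about` attribute and a single `dc:Creator` child. The Python sketch reads it with the standard `xml.etree.ElementTree` module and recovers the triple; the function name `statements` is an assumption.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.ePolitix.com/Articles/0000005a4787.htm">
    <dc:Creator>Craig Hoy</dc:Creator>
  </rdf:Description>
</rdf:RDF>"""


def statements(rdf_xml):
    """Yield (subject, predicate, object) for each property element."""
    root = ET.fromstring(rdf_xml)
    for desc in root.iter(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")      # the SUBJECT
        for prop in desc:
            # "{namespace}local" becomes the property's full URI.
            predicate = prop.tag[1:].replace("}", "", 1)
            yield (subject, predicate, prop.text)


triples = list(statements(DOC))
print(triples)
```

Adding more child elements to the same `<rdf:Description>` would simply yield more triples with the same subject.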
59Many Namespaces
- When there are many namespaces to be defined in
an RDF document grouping them in one place makes
them stand out
60RDF Elements
- <rdf:Description> Element
- Contains the URI for the resource being described
- The <rdf:Description> element identifies the subject
- A child element defines a predicate/object pair
61<rdf:Description>
- More detail about this element -
- Multiple properties for the same resource
- String literals and resource URIs
- Nesting statements
- rdf:about attribute
62<rdf:Description>
- More detail about this element -
- The rdf:ID attribute
- Anonymous resources
- The rdf:type attribute
63<rdf:Description>
- The <rdf:Description> element is actually a container for as many predicate/object pairs as you want
[Diagram: .../Articles/000000005a4787.htm --dc:Creator--> Craig Hoy; .../Articles/000000005a4787.htm --dc:Publisher--> ePolitix]
64<rdf:Description>
- One or more properties may be specified for the
same resource
65<rdf:Description>
- An alternative syntax
- Attributes take the place of child elements
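As a hedged illustration of this abbreviated form, the sketch below writes the two properties from the earlier diagram (dc:Creator and dc:Publisher) as attributes of `<rdf:Description>` and reads them back as triples. The markup is a reconstruction, not the original slide, and `statements` is an illustrative name.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.ePolitix.com/Articles/0000005a4787.htm"
                   dc:Creator="Craig Hoy"
                   dc:Publisher="ePolitix"/>
</rdf:RDF>"""


def statements(rdf_xml):
    """Read property attributes on rdf:Description as triples."""
    triples = []
    for desc in ET.fromstring(rdf_xml).iter(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        for name, value in desc.attrib.items():
            if name.startswith(f"{{{RDF}}}"):
                continue  # rdf:about and friends are syntax, not properties
            triples.append((subject, name[1:].replace("}", "", 1), value))
    return sorted(triples)


triples = statements(DOC)
print(triples)
```

The attribute form only works for string literal values, since an XML attribute cannot contain nested markup.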
66<rdf:Description>
- So that a resource is not confused with a string literal, there is an RDF attribute, rdf:resource, that marks a property value as a resource URI
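A short sketch of how a reader of RDF/XML might tell the two kinds of value apart: a property element carrying `rdf:resource` points at another resource, while one with text content holds a string literal. The markup is reconstructed from URIs used elsewhere in these slides; the function name `objects` is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.ePolitix.com/Articles/0000005a4787.htm">
    <dc:Title>I will stand says Portillo</dc:Title>
    <dc:Creator rdf:resource="http://www.ePolitix.com/Authors/Craig%20Hoy"/>
  </rdf:Description>
</rdf:RDF>"""


def objects(rdf_xml):
    """Map each property's local name to ("resource", uri) or ("literal", text)."""
    result = {}
    for desc in ET.fromstring(rdf_xml).iter(f"{{{RDF}}}Description"):
        for prop in desc:
            local_name = prop.tag.split("}")[1]
            resource = prop.get(f"{{{RDF}}}resource")
            if resource is not None:
                result[local_name] = ("resource", resource)
            else:
                result[local_name] = ("literal", prop.text)
    return result


values = objects(DOC)
print(values)
```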
67<rdf:Description>
- Suppose we wanted to add some information to the description
[Graph: nodes ../Articles/000000005a4787.htm, ../Authors/Craig%20Hoy, ../companynumber/3935644.htm, and the literal Editor; arcs include dc:Creator and dc:Publisher]
68<rdf:Description>
- One way to code this in RDF is to simply add a
statement that contains the new information
69<rdf:Description>
- RDF allows for the <rdf:Description> element to be nested
70<rdf:Description>
- Both representations are correct and the underlying model is the same in both cases
- Which to use depends on context
- If there are many articles, the nested information would be repeated
- Therefore the first representation would be preferable in this case
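The claim that both forms carry the same model can be checked with a small Python sketch. The two documents below are reconstructions: one adds a separate statement about the author, the other nests it. The `xyz` namespace URI and the helper names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ABOUT = f"{{{RDF}}}about"
RESOURCE = f"{{{RDF}}}resource"
DESCRIPTION = f"{{{RDF}}}Description"

HEADER = ('<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
          'xmlns:dc="http://purl.org/dc/elements/1.1/" '
          'xmlns:xyz="http://example.org/xyz/">')  # xyz URI is hypothetical
ARTICLE = "http://www.ePolitix.com/Articles/0000005a4787.htm"
AUTHOR = "http://www.ePolitix.com/Authors/Craig%20Hoy"

FLAT = f"""{HEADER}
  <rdf:Description rdf:about="{ARTICLE}">
    <dc:Creator rdf:resource="{AUTHOR}"/>
  </rdf:Description>
  <rdf:Description rdf:about="{AUTHOR}">
    <xyz:JobTitle>Editor</xyz:JobTitle>
  </rdf:Description>
</rdf:RDF>"""

NESTED = f"""{HEADER}
  <rdf:Description rdf:about="{ARTICLE}">
    <dc:Creator>
      <rdf:Description rdf:about="{AUTHOR}">
        <xyz:JobTitle>Editor</xyz:JobTitle>
      </rdf:Description>
    </dc:Creator>
  </rdf:Description>
</rdf:RDF>"""


def walk(desc, triples):
    """Collect triples from one description, descending into nested ones."""
    subject = desc.get(ABOUT)
    for prop in desc:
        predicate = prop.tag[1:].replace("}", "", 1)
        nested = prop.find(DESCRIPTION)
        if nested is not None:
            triples.add((subject, predicate, nested.get(ABOUT)))
            walk(nested, triples)
        else:
            triples.add((subject, predicate, prop.get(RESOURCE) or prop.text))


def model(rdf_xml):
    triples = set()
    for desc in ET.fromstring(rdf_xml):
        walk(desc, triples)
    return triples


print(model(FLAT) == model(NESTED))
```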
71<rdf:Description>
- Attributes
- We know about the rdf:about attribute
- The contents of the rdf:about attribute are a URI
72<rdf:Description>
- rdf:ID
- This attribute allows a resource in a document to be named and then referred to with this name
- The rdf:ID attribute and the rdf:about attribute are MUTUALLY EXCLUSIVE - only one or the other can be used
73rdf:ID
74Anonymous Resources
- An option for the <rdf:Description> element would be to NOT specify an rdf:about or rdf:ID attribute
- This would be the way to introduce anonymous resources as part of an RDF description
- The description element would exist for no other reason than to be given properties
75Anonymous Resources
76Anonymous Resources
77Anonymous Resources
- Back to Mozart
- Assume that some authority has given the piece Eine Kleine Nacht Muzik the URL
- http://MuzikBiz.org/233456
- We can also give this piece of music an assigned code from the Dewey Decimal Classification
- 781.68
78Anonymous Resources
- The resulting statement describing this would be
- http://MuzikBiz.org/233456 has a dc:Subject of 781.68
- The RDF is shown following
79Anonymous Resources
80Anonymous Resources
- If we now want to identify the source of this classification we can do so with the rdf:value element (shown following)
81Anonymous Resources
82Anonymous Resources
- When representing an anonymous resource like this one, we know that there is some resource we are representing, we just don't know how to name it
- This is why we introduce an <rdf:Description> tag into the RDF without an rdf:about attribute
- The result is a graph with an anonymous node
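The original RDF for the Dewey Decimal example is missing, so the following sketch rebuilds it from the description: the subject value is an `<rdf:Description>` with no `rdf:about`, holding an `rdf:value` of 781.68 and a second property naming the classification scheme. The `ex:scheme` property and its namespace are hypothetical, as are the `_:b1` style labels this code gives to anonymous nodes.

```python
import itertools
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ABOUT = f"{{{RDF}}}about"
DESCRIPTION = f"{{{RDF}}}Description"

DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:ex="http://example.org/terms/">
  <rdf:Description rdf:about="http://MuzikBiz.org/233456">
    <dc:Subject>
      <rdf:Description>
        <rdf:value>781.68</rdf:value>
        <ex:scheme>Dewey Decimal Classification</ex:scheme>
      </rdf:Description>
    </dc:Subject>
  </rdf:Description>
</rdf:RDF>"""


def walk(desc, triples, labels):
    """Collect triples; a description without rdf:about gets a local label."""
    subject = desc.get(ABOUT) or f"_:b{next(labels)}"
    for prop in desc:
        predicate = prop.tag[1:].replace("}", "", 1)
        nested = prop.find(DESCRIPTION)
        if nested is None:
            triples.append((subject, predicate, prop.text))
        else:
            obj = walk(nested, triples, labels)
            triples.append((subject, predicate, obj))
    return subject


triples = []
labels = itertools.count(1)
for top in ET.fromstring(DOC).findall(DESCRIPTION):
    walk(top, triples, labels)
for triple in triples:
    print(triple)
```

The label `_:b1` is only meaningful inside this one graph; it names the anonymous node so that the three triples can refer to it.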
83Anonymous Resources
84rdf:type Attribute
- Applies to <rdf:Description>
- Powerful
- Links worlds of knowledge representation to object orientation (ooh ... aah)
- Allows us to specify that the resource being referred to is of a particular class
- Allows parsers to understand more about the meta-data
85rdf:type Attribute
- Assume that an organization named the International Press Telecommunications Council (IPTC) is responsible for the XML format used in the ePolitix articles we have been using
86rdf:type Attribute
- IPTC has defined a URI that allows us to indicate that the article being referred to is in their NITF format
- NITF refers to News Industry Text Format
- This format is used widely to transfer news between organizations
87rdf:type Attribute
- The URL for all object types that belong to the NITF group of objects is something like -
- http://www.iptc.org/schema/NITF
88rdf:type Attribute
- This information could be used to enhance the RDF used in the ePolitix XML
89rdf:type Attribute
- Now the rdf:type attribute gives us a very powerful capability akin to one that we would find in object-oriented programming
- Once we know that a particular resource is of a particular type then we can use that information to check its meta-tags to ensure that the correct meta-tags are used
90rdf:type Attribute
- For example if we are referring to a person resource AND we have said that the person has a FORMAT then this is probably incorrect
- (The dc:Format property is used to specify the MIME type of a document)
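A toy version of this kind of type-based check, written in Python. The class URIs, the `EXPECTED` table of properties per class, and the function name `suspicious` are all assumptions for illustration; real RDF tooling would draw such rules from a schema.

```python
DC = "http://purl.org/dc/elements/1.1/"
PERSON = "http://example.org/types/Person"                  # hypothetical class
NEWS_ARTICLE = "http://www.iptc.org/schema/NITFNewsArticle"  # "something like" URI

# Properties we would expect to see on each type of resource (illustrative).
EXPECTED = {
    NEWS_ARTICLE: {DC + "Title", DC + "Creator", DC + "Format"},
    PERSON: {"http://example.org/props/JobTitle"},
}


def suspicious(rdf_type, properties):
    """Return the properties that do not fit a resource of the given type."""
    return sorted(p for p in properties if p not in EXPECTED[rdf_type])


# A person with a dc:Format is flagged; an article with one is not.
print(suspicious(PERSON, [DC + "Format"]))
print(suspicious(NEWS_ARTICLE, [DC + "Format"]))
```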
91rdf:type Attribute
- But we know nothing more at this point about the resource
- http://www.ePolitix.com/Authors/Craig%20Hoy
- By specifying an rdf:type we can give the RDF processor more information
92rdf:type Attribute
93Typed Elements
- An alternative syntax for expressing the same type of information is known as TYPED ELEMENTS
- In this notation the resource that would be used in the rdf:type attribute is turned into a namespace-qualified element
94Typed Elements
- We assumed a namespace for objects created by the IPTC for their NITF standard
- The namespace was
- http://www.iptc.org/schema/NITF
- It is now possible to create object types or references to schemas by specifying a URI as in
- http://www.iptc.org/schema/NITFNewsArticle
95Typed Elements
- By assigning the namespace that was just defined to a namespace prefix, and using the class name as the name of an element, the <rdf:Description> element can be replaced
- <rdf:type rdf:resource="http://www.iptc.org/schema/NITFNewsArticle" />
- <rdf:RDF xmlns:nitf="http://www.iptc.org/schema/NITF">
96Typed Elements
Namespace definitions
97Typed Elements
- This feature is very important to RDF
- Anything which can appear in an <rdf:Description> tag is valid when used as a typed element
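To show what a typed element stands for, the sketch below treats any top-level element other than `<rdf:Description>` as a description plus an rdf:type statement. The markup is a reconstruction using the namespace and class name from the preceding slides; `uri` and `statements` are illustrative names.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ABOUT = f"{{{RDF}}}about"
DESCRIPTION = f"{{{RDF}}}Description"

DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:nitf="http://www.iptc.org/schema/NITF">
  <nitf:NewsArticle rdf:about="http://www.ePolitix.com/Articles/0000005a4787.htm">
    <dc:Creator>Craig Hoy</dc:Creator>
  </nitf:NewsArticle>
</rdf:RDF>"""


def uri(tag):
    """Turn ElementTree's "{namespace}local" into namespace + local."""
    namespace, local = tag[1:].split("}")
    return namespace + local


def statements(rdf_xml):
    triples = []
    for node in ET.fromstring(rdf_xml):
        subject = node.get(ABOUT)
        if node.tag != DESCRIPTION:
            # The element name itself supplies the rdf:type of the resource.
            triples.append((subject, RDF + "type", uri(node.tag)))
        for prop in node:
            triples.append((subject, uri(prop.tag), prop.text))
    return triples


triples = statements(DOC)
for triple in triples:
    print(triple)
```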
98Typed Elements
Observe the change to attributes
99Typed Elements
- Being able to do this allows you to extract data from existing XML documents in the form of triples
100Property Elements
- Property information can be expressed through -
- String literals
- The value for a predicate defined by the name of the element containing the literal
101Property Elements
- We have said a lot about the <rdf:Description> element so far
- Recap
- String literals
- The value for a predicate that is defined by the name of the element containing the literal
102Property Elements
103Property Elements
- Resources
- Express properties of a resource
- The value of the predicate is actually another resource
- Use a URI to specify which resource it is
104Property Elements
105Property Elements
- Yet another way to accomplish this is to nest RDF statements one within another
- This says that the value of the property <dc:Creator> is itself a resource
106Property Elements
- Type information can also be specified in the
content of a property element
107Property Elements
- Taking a type resource and turning it into a
namespace-qualified element name could abbreviate
this
108parseTypeLiteral
- Sometimes it is necessary to tell the parser that it should NOT parse a particular part of the RDF
- The RDF should be stored as is
- Consider the following example
109parseTypeLiteral
- We are writing a mathematical paper entitled Ramifications of (a+b)² to World Peace
- We would like to use MathML to specify the title since it can help us format the various symbols properly
- If we place the MathML inside the <dc:Title> tag we need a way to tell the RDF parser that the MathML is not RDF
110Ramifications of ...
- The contents of this element are not simply a string
- The text must be well-formed XML otherwise the parser will fail
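A minimal sketch of what "store as is" could look like in code. It assumes the qualified attribute form `rdf:parseType="Literal"` and uses simplified, un-namespaced MathML-like markup for the title; the paper URI and the function name `object_of` are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PARSE_TYPE = f"{{{RDF}}}parseType"

DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/papers/world-peace">
    <dc:Title rdf:parseType="Literal">Ramifications of <apply><power/><apply><plus/><ci>a</ci><ci>b</ci></apply><cn>2</cn></apply> to World Peace</dc:Title>
  </rdf:Description>
</rdf:RDF>"""


def object_of(prop):
    """Return the property value, keeping literal XML as markup."""
    if prop.get(PARSE_TYPE) == "Literal":
        # Re-serialize the children instead of reading them as nested RDF.
        return (prop.text or "") + "".join(
            ET.tostring(child, encoding="unicode") for child in prop)
    return prop.text


title_element = ET.fromstring(DOC)[0][0]   # rdf:RDF -> Description -> dc:Title
title = object_of(title_element)
print(title)
```

If the embedded markup were not well-formed, `ET.fromstring` would raise a parse error before `object_of` ever ran, which matches the warning on the slide.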
111parseTypeResource
- There are times when the parser cannot tell the difference between a property value and a resource
- Property values are usually inside an <rdf:Description> element
112parseTypeResource
- If this were all there is to it then all would be
well. Unfortunately RDF allows us to make
statements about the author as follows
113parseTypeResource
- What if all we wanted to do was to provide the email of the author?
- We really don't care about identifying the author
114parseTypeResource
- This still seems too elaborate
- We could simply express this information as
follows
115parseTypeResource
- So if we were to interpret this we would come up with two different interpretations, making its meaning ambiguous
- On the one hand if we evaluated the representation from the inside out we would have an anonymous <dc:Creator> element which has a <v:Email> property
116parseTypeResource
First Interpretation Inside Out
117parseTypeResource
- If you interpret the RDF representation from the outside in you would say you had a resource of a web page that had a <dc:Creator> property and that this <dc:Creator> property refers to an anonymous resource of rdf:type v:Email
118parseTypeResource
119parseType Resource
- This second interpretation of the RDF/XML is the one that we would prefer but the parser cannot distinguish which of these two models it should create
- The problem is we need the <dc:Creator> element to be interpreted as both a property of the web page and also as an anonymous resource so properties can attach to it
120parseType Resource
- RDF/XML does allow us to force the <dc:Creator> element to be interpreted as both
- a predicate
- an anonymous resource
121parseType Resource
- which is exactly the same as specifying the
anonymous resource explicitly
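The effect described here can be sketched in Python: when a property element carries `rdf:parseType="Resource"`, it is read both as a predicate and as an anonymous resource whose children are its properties. The `v` namespace URI, the email address, and the `_:b1` style label are assumptions for this example.

```python
import itertools
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ABOUT = f"{{{RDF}}}about"
PARSE_TYPE = f"{{{RDF}}}parseType"

DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:v="http://example.org/vcard/">
  <rdf:Description rdf:about="http://www.ePolitix.com/Articles/0000005a4787.htm">
    <dc:Creator rdf:parseType="Resource">
      <v:Email>craig.hoy@example.org</v:Email>
    </dc:Creator>
  </rdf:Description>
</rdf:RDF>"""


def uri(tag):
    return tag[1:].replace("}", "", 1)


def statements(rdf_xml):
    labels = itertools.count(1)
    triples = []
    for desc in ET.fromstring(rdf_xml):
        subject = desc.get(ABOUT)
        for prop in desc:
            if prop.get(PARSE_TYPE) == "Resource":
                # The element is a predicate AND stands for an anonymous resource.
                node = f"_:b{next(labels)}"
                triples.append((subject, uri(prop.tag), node))
                for inner in prop:
                    triples.append((node, uri(inner.tag), inner.text))
            else:
                triples.append((subject, uri(prop.tag), prop.text))
    return triples


triples = statements(DOC)
for triple in triples:
    print(triple)
```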
122Containers
- Containers
- list of resources
- collection of resources
- Example
- List of articles that make up a web site
- List of authors who have contributed to an article
123Containers
- RDF
- Three types of containers
- bag
- sequence
- alternative
- can be used anywhere the <rdf:Description>
element can be used
124<rdf:Bag>
- simplest container
- used to contain multiple values for a property
- no significance to the order of the values
125<rdf:Bag>
- Example
- The elements in a bag may also be literals
126<rdf:Seq>
- Whereas a bag does not impose any order on the elements in its list, <rdf:Seq> does require that the list attached to it be in a specific order
127<rdf:Seq>
128<rdf:Alt>
- <rdf:Alt> provides us with a way to select a specific resource from a list of resources
- In other words <rdf:Alt> provides a way of specifying alternative options
- An RDF processor could choose a resource based on some desirable property
129<rdf:Alt>
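The container examples are missing from these slides, so here is a hedged reconstruction: a list of authors held in an `<rdf:Seq>` with `<rdf:li>` members, read with Python's standard library. The second author URI and the helper names are hypothetical. The comparison function only illustrates the difference in ordering rules between a bag and the other containers.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
LI = f"{{{RDF}}}li"
RESOURCE = f"{{{RDF}}}resource"

DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.ePolitix.com/Articles/0000005a4787.htm">
    <dc:Creator>
      <rdf:Seq>
        <rdf:li>Craig Hoy</rdf:li>
        <rdf:li rdf:resource="http://example.org/authors/second"/>
      </rdf:Seq>
    </dc:Creator>
  </rdf:Description>
</rdf:RDF>"""


def read_container(container):
    """Return the container kind (Bag, Seq, or Alt) and its members in order."""
    kind = container.tag.split("}")[1]
    members = [li.get(RESOURCE) or li.text for li in container.findall(LI)]
    return kind, members


def same_contents(kind, first, second):
    """A Bag ignores order; for Seq and Alt the order is significant."""
    if kind == "Bag":
        return sorted(first) == sorted(second)
    return first == second


# rdf:RDF -> rdf:Description -> dc:Creator -> rdf:Seq
container = ET.fromstring(DOC)[0][0][0]
kind, members = read_container(container)
print(kind, members)
```

Members may be literals or resources, as the bag slide notes; in an `<rdf:Alt>` the first member is conventionally the default choice.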