Title: Encoding DC in XHTML, XML and RDF
1Encoding DC in (X)HTML, XML and RDF
Tutorial at ECDL 2004, Bath September 2004
- Andy Powell
- a.powell_at_ukoln.ac.uk
- UKOLN, University of Bath, UK
- http://www.ukoln.ac.uk/
UKOLN is supported by
2Contents
- an abstract model for DC (30 mins)
- encoding DC in XHTML (30 mins)
- encoding DC in XML (30 mins)
- encoding DC in RDF/XML (30 mins)
- practical examples
- OAI Protocol for Metadata Harvesting and RSS (20 mins)
- assigning identifiers (20 mins)
Note: you are going to see lots of angle brackets but no XML schemas!
3Important
- this is a tutorial
- please feel free to ask questions as we go through!
4Important DCMI documents
- DCMI Abstract Model (DRAFT): http://www.ukoln.ac.uk/metadata/dcmi/abstract-model/
- Expressing Dublin Core in HTML/XHTML meta and link elements: http://dublincore.org/documents/dcq-html/
- Guidelines for implementing Dublin Core in XML: http://dublincore.org/documents/dc-xml-guidelines/
- Expressing Simple Dublin Core in RDF/XML: http://dublincore.org/documents/dcmes-xml/
- Expressing Qualified Dublin Core in RDF/XML: http://dublincore.org/documents/dcq-rdf-xml/
- Namespace Policy for the DCMI: http://dublincore.org/documents/dcmi-namespace/
- DCMI Metadata Terms: http://dublincore.org/documents/dcmi-terms/
5Implementing DC
- this tutorial is about the mechanics of implementing DC in HTML, XML and RDF
- it doesn't really consider which implementation strategy is the best!
- ask yourself two questions
- what am I trying to achieve?
- does using HTML, XML or RDF help me achieve it?
- do software and services exist that will support the creation and use of my metadata?
6Abstract models for DC
7Why abstract models?
- the first part of this tutorial isn't going to show any angle brackets!
- why?
- because before we start creating DCMI descriptions we need to understand
- the DCMI view of the world/resources we want to describe (the DCMI resource model)
- the DCMI view of the descriptions we make about that world (the DCMI description model)
8DCMI resource model
- each resource that we want to describe has zero or more properties
- a property is a specific aspect, characteristic, attribute or relation used to describe a resource
- each property has one or more values
- each value is a resource (the physical or conceptual entity that is associated with a property when it is used to describe a resource)
9Err but what is a resource?
- W3C/IETF definition of resource is
- anything that has identity. Familiar examples include an electronic document, an image, a service (e.g., "today's weather report for Los Angeles"), and a collection of other resources. Not all resources are network "retrievable"; e.g., human beings, corporations, and bound books in a library can also be considered resources.
- i.e. a resource is anything
- physical things (books, cars, people)
- digital things (Web pages, digital images)
- conceptual things (colours, points in time)
10Yeah, but no, but
- but this seems to be too wide for the things we can describe with DC!
- can we really describe people using DC?
- do people have titles and subjects?
- no, in general we only use DC to describe a sub-set of all resources
- anything covered by the DCMIType list
- Collection, Dataset, Event, Image (Still or Moving), Interactive Resource, Service, Software, Sound, Text, Physical Object
11DCMI resource model (2)
- each resource may be a member of one or more classes
- each class may be related to one or more other classes by a refines (sub-class) relationship
- the two classes share some semantics such that all resources that are members of the sub-class are also members of the related class
where the resource is the value of a property, the class is referred to as a vocabulary encoding scheme
12DCMI resource model (3)
- each property may be related to exactly one other property by a refines (sub-property) relationship
- the two properties share some semantics such that all valid values of the sub-property are also valid values of the related property
13DCMI description model
- a description is made up of
- one or more statements (about one, and only one, resource) and
- zero or one resource URI (a URI reference that identifies the resource being described)
- each statement is made up of
- a property URI (that identifies a property),
- zero or one value URI (that identifies a value of the property),
- zero or one encoding scheme URI (that identifies the class of the value) and
- zero or more value representations of the value
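The description model above can be sketched as a pair of Python data classes. This is only an illustrative sketch: the class and field names (`Statement`, `Description`, `value_strings`) are assumptions made here rather than names defined by DCMI, and value representations are simplified to plain strings.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Statement:
    """One statement: a property URI plus optional value information."""
    property_uri: str                           # identifies the property
    value_uri: Optional[str] = None             # zero or one value URI
    encoding_scheme_uri: Optional[str] = None   # zero or one encoding scheme URI
    value_strings: List[str] = field(default_factory=list)  # zero or more


@dataclass
class Description:
    """A description: one or more statements about a single resource."""
    statements: List[Statement]
    resource_uri: Optional[str] = None          # zero or one resource URI

    def __post_init__(self) -> None:
        # The model requires at least one statement per description.
        if not self.statements:
            raise ValueError("a description needs at least one statement")


DC = "http://purl.org/dc/elements/1.1/"

description = Description(
    resource_uri="http://example.org/",
    statements=[
        Statement(property_uri=DC + "title", value_strings=["a document"]),
        Statement(property_uri=DC + "creator", value_strings=["andy powell"]),
    ],
)
```

Keeping `resource_uri` optional mirrors the "zero or one" wording of the model, so a description can exist without any link to the resource it describes.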
14DCMI description model (2)
- each property is an attribute of the resource being described
- each property URI may be repeated in multiple statements
- the value representation may take the form of a value string, a rich value or a related description
15DCMI description model (3)
- each value string is a simple, human-readable string that represents the value of the property
- each value string may have an associated encoding scheme URI that identifies a syntax encoding scheme
- each value string may have an associated value string language that is an ISO language tag (e.g. en-GB)
16DCMI description model (4)
- each rich value is some marked-up text, an image, a video, some audio, etc. or some combination thereof that represents the resource that is the value of the property
- each related description is a description of (i.e. some metadata about) the resource that is the value of the property
17The 1:1 principle
- notice that the model indicates that each property used in a description must be an attribute of the resource being described
- this is commonly referred to as the 1:1 principle
- the principle that a DCMI metadata description describes one, and only one, resource
- however...
18Description sets
- real-world metadata applications tend to be based on loosely grouped sets of descriptions (where the described resources are typically related in some way)
- known here as description sets
- for example, a description set might comprise descriptions of both a painting and the artist
19DCMI records
- description sets are instantiated, for the purposes of exchange between software applications, in the form of metadata records
- each record conforms to one of the DCMI encoding guidelines (XHTML meta tags, XML, RDF/XML, etc.)
<dc:title>a document</dc:title>
<dc:creator>andy powell</dc:creator>
20Values (again!)
- a value is the physical or conceptual entity that is associated with a property when it is used to describe a resource
- the value of the DC Creator property is a person, organisation or service - a physical entity
- the value of the DC Date property is a point in time - a conceptual entity
- the value of the DC Coverage property may be a geographic region or country - a physical entity
- the value of the DC Subject property may be a concept - a conceptual entity - or a physical object or person - a physical entity
- each of these entities is a resource
21Simple vs. qualified DC?
- the notions of simple DC and qualified DC are often referred to in DCMI discussions
- a view of what these two terms mean is presented here
- note that not everyone agrees with my definitions!
22Simple DC record
- a simple DC record is a record that
- conforms to the abstract model,
- comprises only a single description,
- uses only the 15 properties in the Dublin Core Metadata Element Set,
- makes no use of value URIs, encoding schemes, rich values or related descriptions.
23A couple of notes
- there is no guaranteed linkage between a simple DC record and the resource being described because the resource URI is optional
- such a linkage may be made by encoding the URI of the resource as the value string of the DC Identifier element, however this is not mandatory (everything in DC is optional)
- while the value string of a property may look like a URI, there is nothing in the simple DC model that indicates this is the case
at their own risk, implementations may choose to guess which value strings are URIs and which are not
24Qualified DC model
- a qualified DC record is a record that
- conforms to the DCMI abstract model,
- contains at least one property taken from the
DCMI Metadata Terms recommendation
25A couple of notes
- it is still the case that there is no guaranteed linkage between a qualified DC record and the resource being described!
- a linkage may be made by encoding the URI of the resource as the value string of the DC Identifier element, however this is not mandatory (everything in DC is optional)
where the value of a property is a URI, we can now indicate that it is a URI by using the URI encoding scheme
26Dumb-down
- the process of translating a qualified DC metadata record into a simple DC metadata record is normally referred to as dumbing-down
- can be separated into two parts: property dumb-down and value dumb-down
- each of these processes can be approached in one of two ways
- informed dumb-down
- uninformed dumb-down
27Dumb and dumberer
- informed dumb-down takes place where the software performing the dumb-down algorithm has knowledge built into it about the property relationships and values being used within a specific DCMI metadata application
- uninformed dumb-down takes place where the software performing the dumb-down algorithm has no prior knowledge about the properties and values being used
28Dumb-down algorithm
[Table: dumb-down rules for elements and values, for uninformed and informed dumb-down; cell contents not present in the source]
- and in all cases
- ignore any related descriptions and rich values,
- ignore any encoding scheme URIs.
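One possible reading of informed property dumb-down, combined with the "in all cases" rules above, is sketched below in Python. The `REFINES` table (which here knows only that dcterms:modified refines dc:date), the dictionary layout of a statement and the function names are all assumptions made for illustration. The value rule shown (fall back to the value URI when there is no value string) is one plausible choice, not necessarily the rule given in the original table.

```python
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"

# The 15 DCMES elements, as full property URIs.
DCMES = {DC + name for name in (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)}

# Hypothetical built-in knowledge: sub-property -> property it refines.
REFINES = {DCTERMS + "modified": DC + "date"}


def dumb_down_property(property_uri):
    """Follow refines links until a DCMES element is reached, else None."""
    seen = set()
    while property_uri not in DCMES:
        if property_uri in seen or property_uri not in REFINES:
            return None
        seen.add(property_uri)
        property_uri = REFINES[property_uri]
    return property_uri


def dumb_down(statements):
    """Reduce statements to (DCMES property URI, value string) pairs."""
    simple = []
    for st in statements:
        prop = dumb_down_property(st["property_uri"])
        if prop is None:
            continue  # nothing known about this property: drop the statement
        # Encoding scheme URIs, rich values and related descriptions are
        # simply never copied across.
        value = st.get("value_string") or st.get("value_uri")
        if value:
            simple.append((prop, value))
    return simple


qualified = [
    {"property_uri": DCTERMS + "modified", "value_string": "2001-07-18",
     "encoding_scheme_uri": DCTERMS + "W3CDTF"},
    {"property_uri": "http://example.org/terms/price", "value_string": "10"},
]
print(dumb_down(qualified))
```

An uninformed implementation would have an empty `REFINES` table, so the dcterms:modified statement would be dropped rather than mapped to dc:date.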
29Encoding DC in XHTML (and HTML!)
30What is being described?
- a DC record embedded in an (X)HTML document describes that document
- if you want to describe something else, don't embed it in the (X)HTML document!
- not everyone would agree with this
31Simple DC elements
- use the name and content attributes of the XHTML <meta> element to encode the DC element (one of the 15 DCMES elements) and its value string. Use the following pattern:
<meta name="DC.element" content="Value string" />
- for example:
<meta name="DC.date" content="2001-07-18" />
- the element names of the 15 DCMES elements always have a lower-case first letter
32Simple DC values
- value strings go in the XHTML <meta> element content attribute
- the string in the content attribute is defined to be CDATA, i.e. a sequence of characters from the document character set which may include character entities
- long value strings may be wrapped across multiple lines as necessary
- will need to escape some characters (&, <, >, etc.)
33Language of the value
- where the language of the value string is indicated, it should be encoded using the xml:lang attribute of the XHTML <meta> element. For example:
<meta name="DC.subject" xml:lang="en" content="seafood" />
<meta name="DC.subject" xml:lang="fr" content="fruits de mer" />
34Repeated elements
- multiple property values should be encoded by repeating the XHTML <meta> element for that property, for example:
<meta name="DC.title" content="First title" />
<meta name="DC.title" content="Second title" />
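The patterns on the last few slides (element name, value string, language and repetition) can be generated with a few lines of Python. This is a minimal sketch: the helper name `dc_meta` and its parameters are assumptions made for illustration, and `html.escape` from the standard library is used to handle the characters that need escaping in the content attribute.

```python
from html import escape


def dc_meta(element, value, lang=None, prefix="DC"):
    """Return one XHTML <meta> element for a DC element and value string."""
    attrs = [f'name="{prefix}.{element}"']
    if lang:
        attrs.append(f'xml:lang="{escape(lang, quote=True)}"')
    # escape() replaces &, < and >, plus quote characters when quote=True.
    attrs.append(f'content="{escape(value, quote=True)}"')
    return "<meta " + " ".join(attrs) + " />"


# Repeated values simply repeat the element.
for title in ("First title", "Second title"):
    print(dc_meta("title", title))

print(dc_meta("subject", "fruits de mer", lang="fr"))
print(dc_meta("title", "Fish & chips"))
# last line printed: <meta name="DC.title" content="Fish &amp; chips" />
```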
35Other DC elements
- use the name and content attributes of the XHTML <meta> element to encode the DC element (e.g. audience) and its value. Use the following pattern:
<meta name="DCTERMS.element" content="Value" />
- for example:
<meta name="DCTERMS.audience" content="software developers" />
- element names may be mixed-case but should always have a lower-case first letter
36Element refinements
- use the name and content attributes of the XHTML <meta> element to encode the element refinement and its value. Use the following pattern:
<meta name="DCTERMS.elementRefinement" content="Value" />
- for example:
<meta name="DCTERMS.modified" content="2001-07-18" />
37Element refinements (2)
- element refinements should use the names specified in DCMI Metadata Terms: http://dublincore.org/documents/dcmi-terms/
- as a general rule, element refinement names may be mixed-case but should always have a lower-case first letter
38Encoding schemes
- encoding schemes are encoded using the scheme attribute of the XHTML <meta> element, using the following pattern:
<meta name="DC.element" scheme="DCTERMS.Scheme" content="Value" />
- for example:
<meta name="DC.date" scheme="DCTERMS.W3CDTF" content="2001-07-18" />
39Encoding schemes (2)
- encoding schemes should use the names specified in DCMI Metadata Terms: http://dublincore.org/documents/dcmi-terms/
- as a general rule, encoding scheme names may be mixed-case but should always start with an upper-case letter. Encoding scheme names are often all upper-case
40Handling namespaces
- the DC. and DCTERMS. prefixes are used to indicate the namespace from which the property is taken
- put the namespace URI in an XHTML <link> element:
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
- while any string is allowable as the prefix, current practice is to use DC. and DCTERMS.
41Value URIs
- where the value of a property is the URI of another resource (e.g. DC.relation) an alternative form of encoding using the XHTML <link> element is preferred. Use the following pattern:
<link rel="propertyName" href="valueURI" />
- for example:
<link rel="DC.relation" href="http://www.example.org/" />
<link rel="DCTERMS.references" href="http://www.example.org/176459.pdf" />
42HTML profile
- to give recipient software applications an indication of the XHTML profile that was used to encode the DCMI metadata, the profile attribute of the XHTML <head> element should be used:
<head profile="http://dublincore.org/documents/dcq-html/">
43The case of names
- note that some of the old DCMI documents recommend(ed) using an uppercase first letter for the names of DCMES elements and element refinements, e.g. using Title rather than title
- this form of DCMES element naming is acceptable but is no longer considered the preferred form
44The case of names (2)
- in general, any software applications that consume DC records embedded into XHTML Web pages should ignore the case of names and should treat both . (full-stop) and : (colon) as valid characters after the prefix
- i.e. all the following forms should be treated as being equivalent:
<meta name="DC.date" content="2001-07-18" />
<meta name="DC.Date" content="2001-07-18" />
<meta name="dc.date" content="2001-07-18" />
<meta name="dc:date" content="2001-07-18" />
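A consumer that follows this advice might look something like the Python sketch below, which uses the standard library `html.parser` module. The class name `DCMetaParser` and its methods are hypothetical. Prefixes are resolved through the schema.PREFIX link elements, and names are lower-cased purely as a comparison key, so mixed-case names such as isPartOf lose their original case here.

```python
from html.parser import HTMLParser


class DCMetaParser(HTMLParser):
    """Collect (namespace URI, element name, content) from <meta> elements."""

    def __init__(self):
        super().__init__()
        self.schemas = {}  # upper-cased prefix -> namespace URI
        self.raw = []      # (prefix, name, content) before prefix resolution

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        rel = a.get("rel") or ""
        if tag == "link" and rel.lower().startswith("schema."):
            # <link rel="schema.DC" href="..."/> declares a prefix.
            self.schemas[rel[len("schema."):].upper()] = a.get("href")
        elif tag == "meta" and a.get("name"):
            # Treat ':' after the prefix the same as '.'.
            name = a["name"].replace(":", ".", 1)
            if "." in name:
                prefix, local = name.split(".", 1)
                self.raw.append((prefix.upper(), local.lower(), a.get("content")))

    def statements(self):
        """Resolve prefixes once the whole document has been read."""
        return [(self.schemas.get(p), n, c) for p, n, c in self.raw]


parser = DCMetaParser()
parser.feed(
    '<head>'
    '<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />'
    '<meta name="DC.date" content="2001-07-18" />'
    '<meta name="DC.Date" content="2001-07-18" />'
    '<meta name="dc.date" content="2001-07-18" />'
    '<meta name="dc:date" content="2001-07-18" />'
    '</head>'
)
print(parser.statements())
```

All four forms resolve to the same namespace URI, name and content, which is the equivalence the slide asks for.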
45Older versions of HTML
- all the examples in this tutorial conform to XHTML 1.0
- most of the recommendations in this tutorial can be applied to older versions of HTML (e.g. HTML 4.01) but
- use lang rather than xml:lang
- older versions of HTML do not require the trailing / before the closing > in the HTML <meta> element
- very old versions of HTML have no <meta> scheme attribute
46Mixing DC and non-DC
- DC metadata can be mixed with non-DC metadata in XHTML <meta> elements
- the following example embeds DC, AGLS and unspecified metadata properties in the same XHTML Web page:
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
<link rel="schema.AGLS" href="http://www.naa.gov.au/recordkeeping/gov_online/agls/1.2" />
<meta name="DC.title" content="Services to Government" />
<meta name="keywords" content="archives, information management, public administration" />
<meta name="AGLS.Function" scheme="AGIFT" content="recordkeeping standards" />
47A couple of examples
- Simple DC: example 1
- Qualified DC: example 2
- ScreenCam of using DC-dot: http://www.ukoln.ac.uk/metadata/dcdot/
48Encoding DC in XML
49Introduction to XML
- this section is based on Guidelines for implementing Dublin Core in XML: http://dublincore.org/documents/dc-xml-guidelines/
- nine recommendations for encoding DC in XML
- Note: these recommendations do not create RDF/XML. This is not intended to imply that plain XML is better than RDF/XML. RDF is covered in the next section!
50Recommendation 1
- implementers should base their XML applications on XML Schemas rather than XML DTDs
- approaches based on XML Schemas are more flexible and are more easily re-used within other XML applications
- in some cases it may be sensible to provide both an XML Schema and a DTD for the application. Where XML Schemas are not used, a DTD should be provided instead
51Recommendation 1 (2)
- the DCMI maintains a list of XML schemas that are in use in projects or products using DCMI metadata
DCMI Metadata expressed in XML Schema Language: http://dublincore.org/schemas/xmls/
52Recommendation 2
- implementers should use URI references (see later) to uniquely identify DC elements, element refinements and encoding schemes
DC namespaces are defined in the DCMI Namespace Recommendation...
53Container elements
- note that it is anticipated that records will be encoded within one or more container XML element(s) of some kind
- this tutorial makes no recommendations for the name of any container element, nor for the namespace that the element should be taken from
- candidate container element names include <dc>, <dublinCore>, <resource>, <record> and <metadata>
54Recommendation 3
- implementers should encode properties as XML elements and values as the content of those elements
- the name of the XML element should be an XML qualified name (QName) of the property:
<dc:title>Dublin Core in XML</dc:title>
- do not use constructs like:
<dc:title value="Dublin Core in XML" />
55Recommendation 4
- the property names for the 15 DC elements should be all lower-case:
<dc:title>Dublin Core in XML</dc:title>
- do not use:
<dc:Title>Dublin Core in XML</dc:Title>
56Recommendation 5
- multiple property values should be encoded by repeating the XML element for that property:
<dc:title>First title</dc:title>
<dc:title>Second title</dc:title>
57Simple DC example
58Recommendation 6
- element refinements should be treated in the same way as other properties
- the name of the XML element should be an XML qualified name (QName):
<dcterms:available>2002-06</dcterms:available>
- do not use any of the following:
<dc:date refinement="available">2002-06</dc:date>
<dc:date type="available">2002-06</dc:date>
<dc:date><dcterms:available>2002-06</dcterms:available></dc:date>
59Recommendation 6 (2)
- element refinements are properties in their own right and are therefore best encoded in a similar way to other DC elements
- in particular, it should be noted that element refinements may have further refinements of their own (e.g. format is refined by extent which might be further refined by duration)
- nesting does not mean refinement
- nesting might be used for other purposes
60Recommendation 7
- encoding schemes should be implemented using the 'xsi:type' attribute of the XML element for the property
- the name of the encoding scheme should be given as the attribute value, and should be in the form of an XML qualified name (QName):
<dc:identifier xsi:type="dcterms:URI">http://www.ukoln.ac.uk/</dc:identifier>
61Recommendation 7 (2)
- it should be noted that there may be existing DC XML applications that use other conventions to support encoding schemes, notably the use of a scheme attribute of the XML element for the property
- therefore, it may be sensible for software applications that consume DC XML to be fairly liberal in what they accept
62Recommendation 8
- elements, element refinements and encoding schemes should use the names specified in DCMI Metadata Terms: http://dublincore.org/documents/dcmi-terms/
- note, the 15 DCMES element names all start with a lowercase letter
63Recommendation 8 (2)
- element and element refinement names may be mixed-case but should always have a lower-case first letter
- encoding scheme names may be mixed-case but should always start with an upper-case letter:
<dcterms:isPartOf xsi:type="dcterms:URI">http://www.bbc.co.uk/</dcterms:isPartOf>
<dcterms:temporal xsi:type="dcterms:Period">name=The Great Depression; start=1929; end=1939;</dcterms:temporal>
64Recommendation 9
- where the language of the value is indicated, it should be encoded using the xml:lang attribute:
<dc:subject xml:lang="en">seafood</dc:subject>
<dc:subject xml:lang="fr">fruits de mer</dc:subject>
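Several of the recommendations can be seen together in a short Python sketch that builds a record with the standard library `xml.etree.ElementTree` module. The container element name `record` is a hypothetical choice, since the guidelines leave the container open. Note that the xsi:type value is just a string, so it only makes sense because the dcterms prefix is also declared through the dcterms elements in the same record.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
XML = "http://www.w3.org/XML/1998/namespace"

# Ask ElementTree to use the conventional prefixes when serialising.
ET.register_namespace("dc", DC)
ET.register_namespace("dcterms", DCTERMS)
ET.register_namespace("xsi", XSI)

# "record" is a hypothetical container element name.
record = ET.Element("record")

# Recommendations 3 to 5: properties are elements, values are element
# content, and repeated values repeat the element.
for title in ("First title", "Second title"):
    ET.SubElement(record, f"{{{DC}}}title").text = title

# Recommendation 6: an element refinement is an element in its own right.
ET.SubElement(record, f"{{{DCTERMS}}}available").text = "2002-06"

# Recommendation 7: the encoding scheme goes in xsi:type as a QName string.
part_of = ET.SubElement(record, f"{{{DCTERMS}}}isPartOf")
part_of.set(f"{{{XSI}}}type", "dcterms:URI")
part_of.text = "http://www.bbc.co.uk/"

# Recommendation 9: the language of the value goes in xml:lang.
subject = ET.SubElement(record, f"{{{DC}}}subject")
subject.set(f"{{{XML}}}lang", "fr")
subject.text = "fruits de mer"

print(ET.tostring(record, encoding="unicode"))
```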
65Some examples
- Qualified DC: example 4
- DC and IMS: example 5
- DC, IMS and ODRL: example 6
HEALTH WARNING: Examples 5 and 6 may seriously damage your interoperability!
66Encoding DC in RDF
67What is RDF?
- Resource Description Framework
- W3C recommendation for metadata
- model and syntax(es)
- XML is most common syntax in use on the Web
- underpins the semantic Web
W3C - Resource Description Framework (RDF): http://www.w3.org/RDF/
68Why use RDF?
- RDF provides shared metadata model
- shared meaning
- metadata can be shared between applications that have little or no knowledge about each other
- e.g. an RDF-based bibliographic application can consume RDF-based geospatial metadata and have 'some' knowledge of what it means
with (X)HTML and XML encodings, software applications must have understanding hard-coded into them
69DC in RDF
- DC abstract models map easily onto the RDF model (because RDF was the basis for them!)
- DC in RDF/XML syntax is an encoding of the RDF model in XML
- simple DC is similar to the non-RDF XML we've seen already
- but with the addition of <rdf:RDF> and <rdf:Description> container elements
- example 7
70RDF basics: the model
- model based on triples
- a resource has a property which has a value
- often represented as an arc-node diagram (or graph)
- resources and properties are identified using URI references
[Diagram: a resource node linked to a value node by a property arc]
71A more concrete example
- The graph below approximately translates into English as
- the resource identified by the URI http://example.org/ has a dc:creator that is represented by the string Andy Powell
[Diagram: the node http://example.org/ linked by the arc http://purl.org/dc/elements/1.1/creator to the string "Andy Powell"]
72Values as resources
- values can be resources too
- means that we can then attach properties to the value as well as to the original resource
- build up quite complex graphs
[Diagram: http://example.org/ linked by dc:creator to a value node, which has my:name "Andy Powell" and my:phoneNumber "01225 383933"]
73Typed and blank nodes
- nodes can be blank (to represent resources that have not been assigned a URI)
- can also indicate the class of a resource using the rdf:type property
[Diagram: http://example.org/ linked by dc:creator to a blank node of rdf:type my:Person, which has my:name "Andy Powell" and my:phoneNumber "01225 383933"]
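The graph with a typed blank node can be written down as a list of triples. The Python sketch below is deliberately library-free: the `Literal` marker class, the `_:b1` blank node label and the namespace URI chosen for the my: terms are all assumptions made for illustration. The `to_ntriples` helper prints the triples in the line-based N-Triples style.

```python
DC = "http://purl.org/dc/elements/1.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
MY = "http://example.org/my/"  # hypothetical namespace for the my: terms


class Literal(str):
    """Marks a value that is a plain string rather than a resource."""


# Each triple is (resource, property, value); "_:b1" labels the blank node.
triples = [
    ("http://example.org/", DC + "creator", "_:b1"),
    ("_:b1", RDF_TYPE, MY + "Person"),
    ("_:b1", MY + "name", Literal("Andy Powell")),
    ("_:b1", MY + "phoneNumber", Literal("01225 383933")),
]


def term(t):
    """Format one term: literal, blank node or URI reference."""
    if isinstance(t, Literal):
        return '"' + t.replace("\\", "\\\\").replace('"', '\\"') + '"'
    if t.startswith("_:"):
        return t
    return "<" + t + ">"


def to_ntriples(graph):
    """Serialise the triples one per line, each terminated by ' .'."""
    return "\n".join(" ".join(term(t) for t in triple) + " ." for triple in graph)


print(to_ntriples(triples))
```

Replacing `_:b1` with a real URI is exactly the "assign a URI to the anonymous value node" step used in the case studies that follow.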
74Qualified DC in RDF
- now ready to look at some more complex examples
- for full details about how to encode DC in RDF see
Expressing Simple Dublin Core in RDF/XML: http://dublincore.org/documents/dcmes-xml/
Expressing Qualified Dublin Core in RDF/XML: http://dublincore.org/documents/dcq-rdf-xml/
75Case study 1: dc:creator
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.[...]"
         xmlns:rdfs="http://www.w3.org/[...]"
         xmlns:dc="http://purl.org/dc/[...]"
         xmlns:my="http://purl.org[...]">
  <rdf:Description>
    <dc:creator>
      <rdf:Description>
        <rdf:value>Andy Powell</rdf:value>
        <my:email>a.powell_at_ukoln.ac.uk</my:email>
      </rdf:Description>
    </dc:creator>
  </rdf:Description>
</rdf:RDF>
Example RDF description using dc:creator
76Case study 1: dc:creator
[Diagram: RDF graph in which dc:creator points to a value node carrying rdfs:label, my:name, my:email and my:affiliation; the value labels are truncated in the source]
... and the RDF model it represents.
77Case study 1: dc:creator
[Diagram: the same graph, with the properties of the value node grouped as relatedMetadata]
But we don't want to embed all this information into every instance metadata record, do we?
78Case study 1: dc:creator
[Diagram: the graph split in two, with the properties of the value node held separately from the description]
Need to separate part of the information out and store it in a single place, in this case in a directory service
79Case study 1: dc:creator
[Diagram: the split graph, with the value node in both halves labelled valueURI]
To do this we need to assign a URI (the valueURI) to the anonymous value node
80Case study 1: dc:creator
[Diagram: the split graph, with the separated document labelled relatedMetadataURI]
The document containing this information is itself an RDF resource (the relatedMetadata) and has a URI
81Case study 1: dc:creator
[Diagram: the split graph, with an rdfs:seeAlso arc from the value node (valueURI) to the relatedMetadataURI]
Use rdfs:seeAlso to form linkage between description and relatedMetadata
82Case study 2: dc:subject
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.[...]"
         xmlns:rdfs="http://www.w3.org/[...]"
         xmlns:dc="http://purl.org/dc/[...]"
         xmlns:dcterms="http://purl.org[...]">
  <rdf:Description>
    <dc:subject>
      <dcterms:MESH>
        <rdf:value>D08.586.682.075.400</rdf:value>
        <rdfs:label>Formate Dehydrogenase</rdfs:label>
      </dcterms:MESH>
    </dc:subject>
  </rdf:Description>
</rdf:RDF>
Example RDF description using dc:subject (taken from Qualified DC in RDF recommendation)
83Case study 2: dc:subject
[Diagram: RDF graph in which dc:subject points to a value node of rdf:type dcterms:MESH, with rdf:value D08.586... and rdfs:label Formate D...; labels truncated in the source]
... and the RDF model it represents.
84Case study 2: dc:subject
[Diagram: the same graph, with the properties of the value node grouped as relatedMetadata]
But we don't want to embed all this information into every instance metadata record, do we?
85Case study 2: dc:subject
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.[...]"
         xmlns:rdfs="http://www.w3.org/[...]"
         xmlns:dc="http://purl.org/dc/[...]"
         xmlns:dcterms="http://purl.org[...]">
  <dcterms:MESH>
    <rdf:value>D08.586.682.075.400</rdf:value>
    <rdfs:label>Formate Dehydrogenase</rdfs:label>
  </dcterms:MESH>
</rdf:RDF>

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.[...]"
         xmlns:rdfs="http://www.w3.org/[...]"
         xmlns:dc="http://purl.org/dc/[...]"
         xmlns:dcterms="http://purl.org[...]">
  <rdf:Description>
    <dc:subject>
      <dcterms:MESH>
        <rdf:value>D08.586.682.075.400</rdf:value>
      </dcterms:MESH>
    </dc:subject>
  </rdf:Description>
</rdf:RDF>
[Diagram: the graph split in two, each half containing a node of rdf:type dcterms:MESH; the rdfs:label is held separately from the description]
Need to separate part of the information out and store it in a single place, in this case with the terminology owner
86Case study 2: dc:subject
[Diagram: the split graph, with the value node in both halves labelled valueURI]
To do this we need to assign a URI (the valueURI) to the anonymous value node
87Case study 2: dc:subject
[Diagram: the split graph, with the separated document labelled relatedMetadataURI]
The document containing this information is itself an RDF resource (the relatedMetadata) and has a URI
88Case study 2: dc:subject
[Diagram: the split graph, with an rdfs:seeAlso arc from the value node (valueURI) to the relatedMetadataURI]
Use rdfs:seeAlso to form linkage between description and relatedMetadata
89Abstract DC model
[Diagram: the split dc:subject graph annotated with abstract model terms: resource, property, value URI, value string (language), encoding scheme and related description]
In terms of abstract DC model we now have: resource, property, value URI, value string (and value string language), encoding scheme, related description
90Practical examples: OAI and RSS
91OAI-PMH
- OAI Protocol for Metadata Harvesting
- simple protocol for sharing metadata records between applications
- currently at version 2.0
- based on HTTP, XML, XML Schema and XML namespaces
- allows a harvester to ask a remote repository for some or all of its metadata records
92OAI-PMH (2)
- simple DC is default (mandatory) record format
- supports any record format provided it can be encoded using XML (e.g. DC, IMS, MARC, ODRL, ...)
Open Archives Initiative: http://www.openarchives.org/
93OAI-PMH example
- record from the American Memory repository at the Library of Congress: http://memory.loc.gov/cgi-bin/oai2_0
- example 8
- ScreenCam of using the repository explorer
- GetRecord for record identifier oai:lcoa1.loc.gov:loc.gmd/g3701p.rr003570
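An OAI-PMH request is an ordinary HTTP request with the verb and its arguments in the query string. The Python sketch below only builds the GetRecord URL for the record mentioned above and does not send it. The helper name `get_record_url` is an assumption, `oai_dc` is the metadata prefix OAI-PMH uses for simple DC, and whether the repository still answers at this address is not something the sketch checks.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "http://memory.loc.gov/cgi-bin/oai2_0"


def get_record_url(base_url, identifier, metadata_prefix="oai_dc"):
    """Build an OAI-PMH GetRecord request URL (no network access)."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return base_url + "?" + query


url = get_record_url(BASE_URL, "oai:lcoa1.loc.gov:loc.gmd/g3701p.rr003570")
print(url)

# Decoding the query string again recovers the original arguments.
print(parse_qs(urlsplit(url).query))
```

`urlencode` percent-encodes the colons and slash in the identifier, which keeps the query string unambiguous.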
94RSS
- RDF Site Summary or Rich Site Summary (or even Really Simple Syndication)
- at least 3 different versions (0.91, 1.0 and 2.0)
- all based on XML but not compatible
- simple format for sharing news feeds on the Web
- RSS channels: list of items
- channels updated by updating XML file
- RSS clients gather XML on regular basis
95RSS 1.0 and DC example
- RSS 1.0 based on RDF
- most flexible and extensible of the RSS family
- not necessarily the most widely deployed
- can include DC in both channel and item descriptions
- example 9
- full documentation at RDF Site Summary 1.0 Modules: Qualified Dublin Core: http://web.resource.org/rss/1.0/modules/dcterms/
96Assigning identifiers to metadata terms
97What's the problem?
- the terms used in DCMI metadata records must be assigned a URI reference before they can be used
- qualified DC application profiles generally use local additions to DCMI terms
- therefore these additional terms must be assigned a URI reference
- a URI reference is a URI with an optional fragment identifier
98DCMI terms URI references
- all DCMI terms have already been assigned URI references
- for example:
http://purl.org/dc/elements/1.1/title
http://purl.org/dc/terms/dateCopyrighted
99Namespace-name issues
- encoding syntaxes split the term URI reference into two parts
- namespace
- name
- the namespace is shortened to a namespace prefix
- for example
- DC.title (XHTML)
- dc:title (XML, RDF/XML)
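The split into namespace and name can be sketched in a few lines of Python. Splitting after the last '#' or '/' is a common convention assumed here, not a rule stated in the slides, and the `PREFIXES` table and function names are illustrative.

```python
# Namespace URI -> conventional prefix (lower-case form).
PREFIXES = {
    "http://purl.org/dc/elements/1.1/": "dc",
    "http://purl.org/dc/terms/": "dcterms",
}


def split_term_uri(uri):
    """Split a term URI reference after its last '#' or '/'."""
    cut = max(uri.rfind("#"), uri.rfind("/")) + 1
    return uri[:cut], uri[cut:]


def xml_qname(uri):
    """Form used in XML and RDF/XML, e.g. dc:title."""
    namespace, name = split_term_uri(uri)
    return f"{PREFIXES[namespace]}:{name}"


def xhtml_name(uri):
    """Form used in XHTML meta elements, e.g. DC.title."""
    namespace, name = split_term_uri(uri)
    return f"{PREFIXES[namespace].upper()}.{name}"


print(xml_qname("http://purl.org/dc/elements/1.1/title"))
print(xhtml_name("http://purl.org/dc/terms/dateCopyrighted"))
```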
100Guidelines
- for groups of related terms, URI references are typically assigned such that they can share a namespace prefix
- all term URI references should resolve to a human and/or machine readable description of the term
- term URI references should use a registered URI scheme
- term URI references should be assigned with the intention of them being as persistent as the Internet
101A note on namespaces
- DCMI namespace: A DCMI namespace is a collection of DCMI terms (a collection of names)
- DCMI term: A DCMI term is a DCMI element, a DCMI qualifier or term from a DCMI-maintained controlled vocabulary
- each DCMI namespace is identified by a URI; each name in the namespace is also a URI
- a mechanism for making DCMI terms unique
102How do I assign URIs?
- no clear recommended best practice in this area yet!
- four strategies for assigning URIs are presented here
- there are other strategies!
103Using project/service URIs
- simple to do
- but danger of lack of persistence
- examples:
http://myservice.org/terms/price
<myservice:price> (XML, RDF/XML)
MYSERVICE.price (XHTML)
http://myproject.org/metadata/vocabs/color#Red
<myproject:Red> (RDF/XML)
104Using PURLs
- PURLs are persistent URLs (under the purl.org domain)
- used by DCMI, RSS and others to provide persistent term URI references
- examples:
http://purl.org/dc/elements/1.1/title
<dc:title> (XML, RDF/XML)
DC.title (XHTML)
http://purl.org/rss/1.0/link
<rss:link> (XML, RDF/XML)
105Using xmlns.com
- domain registered explicitly for use for XML namespaces
- but persistence policy a little unclear
- used for FOAF terms
- example:
http://xmlns.com/foaf/0.1/firstName
<foaf:firstName> (RDF/XML)
106Using info URIs
- info URIs specifically designed for identifying vocabulary terms
- but not a registered scheme yet and there is currently some discussion (i.e. argument!) on various lists about whether they are a good idea
- example: info:ddc/22/eng//004.678
107What have we learned?
- an abstract model for DC
- encoding DC in XHTML
- encoding DC in XML
- encoding DC in RDF/XML
- two practical examples
- OAI Protocol for Metadata Harvesting
- RSS
- how to assign identifiers to new metadata terms
108Questions?