Title: Dublin Core, XML and RDF: Tools for Resource Discovery and Information Exchange
1. Dublin Core, XML and RDF: Tools for Resource Discovery and Information Exchange
T.B. Rajashekar
National Centre for Science Information
Indian Institute of Science
Bangalore 560 012
(raja_at_ncsi.iisc.ernet.in)
Prepared for presentation in the
UNESCO-NISSAT-University of Mysore Workshop On
Creation and Management of Digital Resources,
June 18 - 22, 2001
2. The Problem
- Rapidly growing Internet-based content
  - 21 million websites (Sept 2000)
  - 93 million Internet hosts (Dec 1999)
  - Over 1 billion static web pages (Nov 1999)
- Problem of resource discovery
  - Uncertain quality
  - Poor quality searches, information overload
- Problem of data exchange and interoperability
3. Key Reasons
- Lack of metadata
  - No way to ascertain quality and relevance of documents before accessing them
  - No semantics for effective searching
- Limitations of HTML
  - Mixes content with layout
  - Does not identify structural elements: data extraction and interchange not possible
  - Limited tags: not extensible, not self-defining
4. Dublin Core Metadata Initiative (DCMI)
- International standard for describing electronic information resources
- Consists of 15 elements, each repeatable, none mandatory
- Conceived in 1994
- Has reached standard status (W3C, NISO, ISO)
- Widely used in several projects around the world
- Being refined further
5. What is Metadata?
- Data about other data
- Webspeak for what librarians were doing long before the Internet: surrogates, catalogs
- Commonly refers to descriptive information about Web resources
- A metadata record consists of a set of attributes, or elements, necessary to describe the resource in question
6. What Does Metadata Describe?
- papers, articles
- information pages
- images
- sound
- collections
- user profiles
- spatial data
... in digital and physical manifestations
7. Forms of Metadata
- Descriptive
- Structural
- Administrative
- Rights management
- Security and authentication
- Content rating, etc.
8. The Dublin Core Metadata Element Set
- Title
- Author/Creator
- Subject /Keywords
- Description
- Publisher
- Other Contributor
- Date
- Resource Type
- Format
- Resource Identifier
- Source
- Language
- Relation
- Coverage
- Rights Management
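A record built from this element set can be sketched as a simple mapping from element name to a list of values, which reflects the "each repeatable, none mandatory" rule. The short names below follow the Dublin Core 1.1 element names (creator for Author/Creator, contributor for Other Contributor, type for Resource Type, identifier for Resource Identifier, rights for Rights Management). The function name make_dc_record and the sample values are assumptions made for illustration only.

```python
# The 15 element names of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def make_dc_record(**fields):
    """Build a record as {element: [values]}; every element is optional and repeatable."""
    record = {}
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        # Accept a single string or a list of strings (a repeated element).
        values = [value] if isinstance(value, str) else list(value)
        record[name] = values
    return record


# Hypothetical record; hypothetical helper name.
record = make_dc_record(
    title="A Guide to Growing Roses",
    creator=["Rose Bush"],
    date="2001-01-20",
)
print(record)
```

Keeping every value in a list, even when there is only one, means a consumer never has to distinguish single from repeated elements.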
9. Key Features of DC
- Simplicity of creation and maintenance
  - Small and simple element set
  - Non-specialists can create metadata records
- Enables effective search and retrieval
  - Commonly understood semantics
  - Generic, common element set facilitates cross-domain accessibility (e.g. creator: document, music)
- International scope
  - DC element set in several languages
- Extensibility
  - Linkages with other metadata sets
10. Uses of DC
- Used mainly for describing document-like objects; metadata standards for other domains exist (e.g. e-commerce, education)
- DC record can be embedded in the resource itself (e.g. Meta tag of HTML)
- DC elements may be contained in a record separate from the resource
- Database of DC element records, each describing a separate electronic resource (e.g. subject gateways)
11. DC in HTML
<html><head>
<title>UKOLN Home Page</title>
<meta name="DC.Title" content="UKOLN: UK Office for Library and Information Networking">
<meta name="DC.Subject" content="national centre, network information support, library community, awareness, research, information services, public library networking, bibliographic management, distributed library systems, metadata, resource discovery, conferences, lectures, workshops">
<meta name="DC.Description" content="UKOLN is a national centre for support in network information management in the library and information communities. It provides awareness, research and information services">
<meta name="DC.Creator" content="UKOLN Information Services Group">
</head>
...
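How a harvester might read such embedded metadata can be sketched with Python's standard html.parser module. This is a minimal sketch, assuming the "DC." prefix convention shown in the slide; the class name DCMetaParser and the shortened sample page are illustrative, not part of any DC tool.

```python
from html.parser import HTMLParser


class DCMetaParser(HTMLParser):
    """Collect <meta name="DC.*" content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.elements = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.upper().startswith("DC."):
            # Elements are repeatable, so keep a list of values per element.
            element = name[3:].lower()
            self.elements.setdefault(element, []).append(attributes.get("content") or "")


# Shortened version of the UKOLN page header from the slide.
page = """<html><head>
<title>UKOLN Home Page</title>
<meta name="DC.Title" content="UKOLN: UK Office for Library and Information Networking">
<meta name="DC.Creator" content="UKOLN Information Services Group">
</head><body></body></html>"""

parser = DCMetaParser()
parser.feed(page)
dc_fields = parser.elements
print(dc_fields)
```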
12. DC Projects
- Implemented in over 100 projects in several countries
- Government
- Science, Mathematics, Education, Humanities
- Libraries and DLs (e.g. CORC, the Cooperative Online Resource Catalog of OCLC)
- Intranets
13. DC: Further Developments
- Initial focus was on defining the element set for resource description
- Work now in progress on defining qualifiers for elements (e.g. Creator type) and for values (encoding rules, e.g. ISO standard for Date)
- Vocabularies for rendering content
- Tools for generating, editing and processing DC
- Crosswalks
14. Syntax for DC
- DC does not specify the syntax for representing DC elements
- Implementation dependent
  - HTML: use of Meta tag
  - Preferred format for syntax: XML
- DC also needs a framework and data model for ensuring semantic interoperability across different metadata sets
- RDF provides such a framework
15. XML: Extensible Markup Language
- Specifically designed for the web as a data interchange format
- Simplified subset of Standard Generalized Markup Language (SGML)
- Overcomes limitations of HTML, with several additional advantages
- Formally ratified as a W3C standard in February 1998
- Specification provides a set of grammar and syntax rules for describing the structure of data
16. XML: Extensible Markup Language
- "ASCII of the Web": OS independent; create once and use in different ways
- Quality searching
  - Can search for field-based content, not just any content
  - e.g. computer (name/model), price < 700
- Enables data interchange and sharing between applications
- Limitation of XML: it does not convey the meaning of structural elements; applications have to achieve this themselves, also through common XML vocabularies (e.g. DC)
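The field-based search mentioned above (computers priced below 700) can be sketched with Python's standard xml.etree.ElementTree module. The catalogue, its element names and the prices are all invented for illustration; the point is only that the query addresses the price field rather than any matching text.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue; element names and prices are invented for illustration.
CATALOGUE = """<catalogue>
  <computer><name>Model A</name><price>650</price></computer>
  <computer><name>Model B</name><price>899</price></computer>
  <computer><name>Model C</name><price>499</price></computer>
</catalogue>"""


def computers_cheaper_than(xml_text, limit):
    """Return the names of computers whose <price> field is below limit."""
    root = ET.fromstring(xml_text)
    return [
        computer.findtext("name")
        for computer in root.findall("computer")
        if float(computer.findtext("price")) < limit
    ]


cheap = computers_cheaper_than(CATALOGUE, 700)
print(cheap)
```

A plain text search for "700" would find nothing here, while a search for "650" could just as well match a model number. The markup is what lets the query say which field it means.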
17. SGML, HTML and XML
- XML is a compromise between the non-extensible, limited capabilities of HTML and the full power and complexity of SGML
- Claimed to be better than SGML
  - 80% of capabilities, 20% of complexity
  - Small enough to be supported by browser vendors
  - Explicitly includes hyperlinking
  - Supports style sheets
- Better than HTML
  - Extensible (can create your own tags)
  - Tags identify content, e.g. item and its price
18. XML Document Structure
- XML documents should start with the XML declaration: <?xml version="1.0"?>
- The DTD (Document Type Definition), part of the document type declaration in the prolog, provides the definition for the XML documents: it defines the document hierarchy, elements, tags and syntactic rules for the document structure
- DTD can accompany XML documents or reside outside the documents
- Let's take an example and see the two possibilities, using the IE browser (version 5.0 or above) (adbook.xml)
19. XML Document With External DTD
XML Declaration (with reference to the external DTD):
<?xml version="1.0"?>
<!DOCTYPE addressbook SYSTEM "adbook.dtd">
Body:
<addressbook>
  <person id="B.WALLACE" gender="male">
    <name>
      <family>Wallace</family>
      <given>Bob</given>
    </name>
    <email>bwallace_at_megacorp.com</email>
    <link manager="C.TUTTLE"/>
  </person>
  <person id="C.TUTTLE" gender="female">
    <name><family>Tuttle</family><given>Claire</given></name>
    <email>ctuttle_at_megacorp.com</email>
    <link subordinates="B.WALLACE"/>
  </person>
</addressbook>
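How an application might read this address book can be sketched with Python's standard xml.etree.ElementTree module. The sketch leaves out the DOCTYPE line, since ElementTree does not validate against a DTD; the helper names full_name and manager_of are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Body of adbook.xml; the DOCTYPE line is left out because ElementTree
# does not validate against a DTD.
ADBOOK = """<addressbook>
  <person id="B.WALLACE" gender="male">
    <name><family>Wallace</family><given>Bob</given></name>
    <email>bwallace_at_megacorp.com</email>
    <link manager="C.TUTTLE"/>
  </person>
  <person id="C.TUTTLE" gender="female">
    <name><family>Tuttle</family><given>Claire</given></name>
    <email>ctuttle_at_megacorp.com</email>
    <link subordinates="B.WALLACE"/>
  </person>
</addressbook>"""

root = ET.fromstring(ADBOOK)
# Index persons by their id attribute so that link references can be followed.
people = {person.get("id"): person for person in root.findall("person")}


def full_name(person):
    return f'{person.findtext("name/given")} {person.findtext("name/family")}'


def manager_of(person_id):
    """Follow the manager attribute of <link> to the referenced person, if any."""
    link = people[person_id].find("link")
    manager_id = link.get("manager") if link is not None else None
    return full_name(people[manager_id]) if manager_id else None


print(manager_of("B.WALLACE"))
```

The id and manager attributes act as keys and references within the document, which is the role the DTD assigns to them with the ID and IDREF attribute types.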
20. DTD Stored in adbook.dtd File
<!-- DTD for a simple address book -->
<!ELEMENT addressbook (person)+>
<!ELEMENT person (name,email,link?)>
<!ATTLIST person id ID #REQUIRED>
<!ATTLIST person gender (male|female) #IMPLIED>
<!ELEMENT name (family,given)>
<!ELEMENT family (#PCDATA)>
<!ELEMENT given (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT link EMPTY>
<!ATTLIST link manager IDREF #IMPLIED
               subordinates IDREFS #IMPLIED>
21. XML Document With Accompanying DTD
XML Declaration:
<?xml version="1.0"?>
DTD:
<!DOCTYPE addressbook [
<!ELEMENT addressbook (person)+>
<!ELEMENT person (name,email,link?)>
<!ATTLIST person id ID #REQUIRED>
<!ATTLIST person gender (male|female) #IMPLIED>
<!ELEMENT name (family,given)>
<!ELEMENT family (#PCDATA)>
<!ELEMENT given (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT link EMPTY>
<!ATTLIST link manager IDREF #IMPLIED
               subordinates IDREFS #IMPLIED>
]>
22. XML Document With Accompanying DTD
Body:
<addressbook>
  <person id="B.WALLACE" gender="male">
    <name>
      <family>Wallace</family>
      <given>Bob</given>
    </name>
    <email>bwallace_at_megacorp.com</email>
    <link manager="C.TUTTLE"/>
  </person>
  <person id="C.TUTTLE" gender="female">
    <name><family>Tuttle</family><given>Claire</given></name>
    <email>ctuttle_at_megacorp.com</email>
    <link subordinates="B.WALLACE"/>
  </person>
</addressbook>
23. Do We Need a DTD?
- DTD is not mandatory for XML documents
  - Let's try this out (use adbook.xml)
- Why do we need a DTD then?
  - DTD provides the definition for a specific class of documents (e.g. reports, theses): document hierarchy, elements, tags and syntactic rules for the document structure
  - Essential for information exchange between applications and services
  - Forms the basis for validating the correctness of XML documents
- Let's see another example: IOP DTD and XML files.
25. Inside an XML Document (File)
- An XML document file contains one and only one document root element
- Root element contains all the other elements of the document
- Document content is marked up using user-defined elements (tags) and element attributes
- Document content may also contain entity references (replacement text, external references, etc.)
- Document content may also contain comments and processing instructions
26. Inside XML DTD
- DTD defines the document hierarchy in terms of the elements, the elements themselves and their syntactic structure, and entities and the rules for their usage
- This is used for validating XML documents
Example:
<!--DTD for Books-->
<!ENTITY cright "&#169;">
<!ELEMENT books (book)+>
<!ELEMENT book (title, isbn, authors, description?, price)>
<!ELEMENT title (#PCDATA)>
<!ATTLIST title lang (english|french) #REQUIRED>
<!ELEMENT isbn (#PCDATA)>
<!ELEMENT authors (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ATTLIST price curr (Rs|Dollar) #IMPLIED>
27. Well Formed and Valid XML Document
- Well formed XML documents must be syntactically correct. What does this mean?
  - There is only one root element and it contains the document's content
  - Start-tags match end-tags (except for empty-element tags)
  - Nested elements never overlap
  - Attributes are unique and values are in quotes
  - The only entity references permitted without a declaration are &amp; for &, &lt; for <, &gt; for >, &apos; for ', and &quot; for "
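These rules can be tried out with any XML parser, since a conforming parser must reject text that is not well formed. The sketch below uses Python's standard xml.etree.ElementTree module; the function name is_well_formed and the sample strings are assumptions made for illustration. It checks well-formedness only, not validity against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False on any syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# One sample per rule listed above.
samples = {
    "matched tags": "<a><b>text</b></a>",
    "overlapping elements": "<a><b>text</a></b>",
    "two root elements": "<a/><b/>",
    "unquoted attribute": "<a id=1/>",
    "duplicate attribute": '<a id="1" id="2"/>',
    "predefined entity": "<a>AT&amp;T</a>",
    "undeclared entity": "<a>&nbsp;</a>",
}
results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {ok}")
```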
28. Well Formed and Valid XML Document
- Valid XML documents
  - Valid XML documents are well-formed XML documents which include an XML declaration and a document type declaration
  - Valid documents must also adhere to the DTD
Let's look at an example using a sample Medline XML document
29. Rendering of XML
- How can we view XML documents on the Web?
- XML is about data, not presentation; presentation has to be handled separately
- This can be handled using CSS (Cascading Style Sheets) and XSLT (Extensible Stylesheet Language Transformations)
- CSS and XSLT can be used for rendering XML documents on the Web
- IE 5.0 supports viewing XML document trees and HTML rendering using CSS
- Let's see a CSS example (syllabus2)
30. Creating and Processing XML Documents
- XML documents can be created using any text editor
- XML editors and browsers are also available (e.g. SoftQuad's XMetaL, XMLSPY)
- Complete publishing solutions are also available (e.g. Cocoon in Java)
- Computers (applications) can exchange XML data and process it using the DTD for extraction and validation
- API specifications are also available for simplifying XML document processing (e.g. W3C's DOM specification)
- Most RDBMS packages have begun to support import/export of SQL database content in XML format
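As an illustration of processing through a DOM-style API, the sketch below uses xml.dom.minidom, the minimal DOM implementation in Python's standard library. The one-person address book is a hypothetical cut-down version of adbook.xml.

```python
from xml.dom.minidom import parseString

# Hypothetical one-person address book, cut down from adbook.xml.
document = parseString(
    '<addressbook><person id="B.WALLACE" gender="male">'
    "<name><family>Wallace</family><given>Bob</given></name>"
    "</person></addressbook>"
)

# getElementsByTagName, getAttribute and firstChild are W3C DOM operations.
person = document.getElementsByTagName("person")[0]
family = person.getElementsByTagName("family")[0].firstChild.data
person_id = person.getAttribute("id")
print(person_id, family)
```

Because the DOM is a language-neutral specification, the same method names are available in other DOM implementations, which is what makes it useful as a common processing interface.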
31. XML Applications and Vocabularies
- XML has found rapid use in a large number of domains
- Domain-specific vocabularies (DTDs and processing tools, techniques, practices) have been developed
  - Mathematics: MathML
  - Chemistry: CML
  - Instruments: Instrument Markup Language (IML)
  - Bioinformatics: Bioinformatic Sequence Markup Language (BSML)
  - E-Books: specs developed by the Open eBook Forum
  - Medicine: Health Level 7
  - Business/e-commerce: eXML, BizTalk
  - Mobile communications: WML
32. XML and Bibliographic Information
- XML appears tailor-made for documentary information (bibliographic or full text) since document content is already structured; XSLT can be used for viewing, transforming and presenting the content in different ways
- Several bibliographic databases are now available in XML (e.g. Medline, MARC)
- Journal publishers are also embracing XML (e.g. IOP)
- Digital publishing: books and other full-text documents; mark up once and view differently
  - e.g. the "Tobacco War" book on the eScholarship website of the California Digital Library
- Interoperability among digital libraries: OAI initiative in using MARC XML for interoperability among DLs
33. RDF: Resource Description Framework
- W3C Recommendation, Feb 1999
- Data Model
  - Designed to impose structural constraints on syntax to support consistent encoding, exchange and processing of metadata
- Schema
  - Enables resource description communities to define (and share) vocabularies (museum, library, e-commerce)
34. RDF
- Grew out of work on PICS (Platform for Internet Content Selection), used for content rating
- Enables effective creation, exchange and use of metadata on the World Wide Web
- Unlike DC, RDF does not specify semantics
- Defines a coherent structure (for the expression of semantics) and recommends a powerful transport syntax in the form of XML
- RDF, DC and XML thus complement each other
35. RDF Basics
- Basic premise: an identifiable, addressable 'resource' may be described by means of a selection of 'properties' (size, name, maker, etc.), each of which has an associated 'value' (23 x 45 cm, Moonlight over Columbus, Fred Bloggs, etc.)
- XML provides the default syntax for encoding RDF descriptions
36. RDF Basics
- RDF descriptions are often expressed in diagrammatic form for clarity
Figure: A simple RDF assertion
37. RDF Basics
- A 'resource', http://doc, has a 'property', author, describing some aspect of it. The value of the 'property' is Joe Smith. In other words, Joe Smith is the author of http://doc.
- Resources are always represented as ovals. Properties are always represented by arrows which point from the subject of any statement to the object of the statement (Joe Smith is the author of the resource; the resource is not the author of Joe Smith).
38. RDF Basics
- Presented in XML, the diagram above would be coded as follows:
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="http://doc">
    <author>Joe Smith</author>
  </rdf:Description>
</rdf:RDF>
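The resource, property and value of this assertion can be recovered from the XML form with a few lines of Python. This is a minimal sketch using the standard xml.etree.ElementTree module, not a general RDF parser: it assumes the simple form used here (rdf:Description elements whose child elements hold literal values), and the function name triples is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = f"""<rdf:RDF xmlns:rdf="{RDF_NS}">
  <rdf:Description rdf:about="http://doc">
    <author>Joe Smith</author>
  </rdf:Description>
</rdf:RDF>"""


def triples(rdf_text):
    """Yield (resource, property, value) for each literal property element."""
    root = ET.fromstring(rdf_text)
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        resource = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            yield resource, prop.tag, (prop.text or "").strip()


statements = list(triples(RDF_XML))
print(statements)
```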
39. XML Namespaces
- There are several resource description communities like DC
- How do we enable these communities to share and extend the metadata vocabularies and facilitate interoperability?
- XML and RDF offer the interoperable foundation whereby a single description may comprise elements drawn from any number of accessible recording practices.
40. XML Namespaces
- It therefore becomes possible, for example, for all communities to share the notion of 'authorship' and use the same element for describing this notion, whilst one community then makes use of its own subject classification system and another embeds complex date-related information.
- This interoperable diversity is achieved through XML Namespaces
41. XML Namespaces
- An XML Namespace may be seen as providing human- and computer-parseable context for any resource description element.
- The term Creator, for example, means a variety of things to different communities.
- If it were clear, though, that this was the Dublin Core element, Creator, then the semantics of the element and its contents become somewhat clearer.
42. XML Namespaces
- To uniquely identify elements drawn from different namespaces, they are prefixed with a namespace abbreviation (e.g. dc)
- Further, RDF descriptions begin with a declaration of the namespaces to be used, the location at which they may be resolved by appropriate software, and the abbreviated form by which they may be recognised
- Example:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.0/">
lt?xml version"1.0"?gt ltrdfRDF
xmlnsrdf"http//www.w3.org/1999/02/22-rdf-syntax
-ns" xmlnsdc"http//purl.org/dc/element
s/1.0/"gt ltrdfDescription rdfabout"http//d
oc"gt ltdccreatorgt Joe Smith
lt/dccreatorgt lt/rdfDescriptiongt
lt/rdfRDFgt
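What the namespace declaration buys a processing application can be seen by parsing this record. In the sketch below, which uses Python's standard xml.etree.ElementTree module, the parser replaces the dc prefix with the full namespace URI, so the element is identified by the namespace it belongs to rather than by the abbreviation chosen in a particular document.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.0/"

RECORD = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:dc="{DC_NS}">
  <rdf:Description rdf:about="http://doc">
    <dc:creator>Joe Smith</dc:creator>
  </rdf:Description>
</rdf:RDF>"""

root = ET.fromstring(RECORD)
# The prefix-to-URI mapping passed here is local to this lookup; any
# prefixes would do as long as they map to the same namespace URIs.
creator = root.find("rdf:Description/dc:creator", {"rdf": RDF_NS, "dc": DC_NS})

# The tag carries the full namespace URI, not the dc: abbreviation.
print(creator.tag)
print(creator.text)
```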
44. A Complete Example
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.0/">
  <rdf:Description rdf:about="http://www.ukoln.ac.uk/metadata/resources/dc/datamodel/WD-dc-rdf/">
    <dc:title>Guidance on expressing the Dublin Core within the Resource Description Framework (RDF)</dc:title>
    <dc:creator>Eric Miller</dc:creator>
    <dc:creator>Paul Miller</dc:creator>
    <dc:creator>Dan Brickley</dc:creator>
    <dc:subject>Dublin Core; Resource Description Framework; RDF; eXtensible Markup Language; XML</dc:subject>
    <dc:publisher>Dublin Core Metadata Initiative</dc:publisher>
    <dc:contributor>Dublin Core Data Model Working Group</dc:contributor>
    <dc:date>1999-07-01</dc:date>
    <dc:format>text/html</dc:format>
    <dc:language>en</dc:language>
  </rdf:Description>
</rdf:RDF>
45. Another Example
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://media.example.com/audio/guide.ra">
    <dc:creator>Rose Bush</dc:creator>
    <dc:title>A Guide to Growing Roses</dc:title>
    <dc:description>Describes process for planting and nurturing different kinds of rose bushes.</dc:description>
    <dc:date>2001-01-20</dc:date>
  </rdf:Description>
</rdf:RDF>
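Records of this shape can also be generated by a program. The sketch below builds one with Python's standard xml.etree.ElementTree module, using values from the rose guide example. The function name dc_record_to_rdf is an assumption made for illustration, and the sketch covers only literal-valued DC elements on a single resource.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Ask the serializer to use the conventional prefixes instead of ns0, ns1.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def dc_record_to_rdf(about, fields):
    """Serialize (element, value) pairs as one rdf:Description in RDF/XML."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": about}
    )
    for element, value in fields:
        ET.SubElement(description, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


rdf_xml = dc_record_to_rdf(
    "http://media.example.com/audio/guide.ra",
    [
        ("creator", "Rose Bush"),
        ("title", "A Guide to Growing Roses"),
        ("date", "2001-01-20"),
    ],
)
print(rdf_xml)
```

Passing the fields as a list of pairs rather than a dictionary keeps repeated elements, such as several creators, straightforward to express.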
46. Related Resources
- Dublin Core
  - dublincore.org
  - purl.oclc.org/metadata/
- XML
  - www.oasis-open.org/cover/ (Robin Cover's SGML/XML page)
  - www.ucc.ie/xml/ (XML FAQ and links to other topics related to XML)
  - XML4Lib: a discussion forum related to XML use in libraries (sunsite.berkeley.edu/XML4Lib/)
- RDF
  - www.w3.org/RDF/
47. Conclusion
- DC, RDF and XML together provide a framework for meaningful resource discovery and data exchange on the Internet
- DC provides the means for describing resources for discovery in an interdisciplinary context
- XML/RDF provides the structure for unambiguous expression of this Dublin Core information