Dublin Core, XML and RDF: Tools for Resource Discovery and Information Exchange - PowerPoint PPT Presentation

1
Dublin Core, XML and RDF: Tools for Resource
Discovery and Information Exchange
T.B. Rajashekar
National Centre for Science Information
Indian Institute of Science
Bangalore 560 012
(raja@ncsi.iisc.ernet.in)
Prepared for presentation in the
UNESCO-NISSAT-University of Mysore Workshop on
Creation and Management of Digital Resources,
June 18-22, 2001
2
The Problem
  • Rapidly growing Internet-based content
      • 21 million websites (Sept. 2000)
      • 93 million Internet hosts (Dec. 1999)
      • Over 1 billion static web pages (Nov. 1999)
  • Problem of resource discovery
      • Uncertain quality
      • Poor-quality searches and information overload
  • Problem of data exchange and interoperability

3
Key Reasons
  • Lack of metadata
      • No way to ascertain the quality and relevance of
        documents before accessing them
      • No semantics for effective searching
  • Limitations of HTML
      • Mixes content with layout
      • Does not identify structural elements, so data
        extraction and interchange are not possible
      • Limited tag set: not extensible, not self-defining

4
Dublin Core Metadata Initiative (DCMI)
  • International standard for describing electronic
    information resources
  • Consists of 15 elements, each repeatable, none
    mandatory
  • Conceived in 1994
  • Has reached standard status: W3C, NISO, ISO
  • Widely used in several projects around the world
  • Being refined further

5
What is Metadata?
  • Data about other data
  • Web-speak for what librarians were doing long
    before the Internet: surrogates, catalogs
  • Commonly refers to descriptive information about
    Web resources
  • A metadata record consists of a set of
    attributes, or elements, necessary to describe
    the resource in question

6
What Does Metadata Describe?
  • papers, articles
  • information pages
  • images
  • sound
  • collections
  • user profiles
  • spatial data

...Digital and physical manifestations
7
Forms of Metadata
  • Descriptive
  • Structural
  • Administrative
  • Rights management
  • Security and authentication
  • Content rating, etc.

8
The Dublin Core Metadata Element Set
  • Title
  • Author/Creator
  • Subject/Keywords
  • Description
  • Publisher
  • Other Contributor
  • Date
  • Resource Type
  • Format
  • Resource Identifier
  • Source
  • Language
  • Relation
  • Coverage
  • Rights Management
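In encodings, each element on this slide is referred to by a short name such as title, creator, type or rights. The Python sketch below pairs the slide's labels with those names. The DC_ELEMENTS table and the html_meta_name() helper are illustrative only and are not part of any DC tool.

```python
# Illustrative mapping from the slide's labels to the DC element names
# used in encodings such as <meta name="DC.Title"> or <dc:title>.
DC_ELEMENTS = {
    "Title": "title",
    "Author/Creator": "creator",
    "Subject/Keywords": "subject",
    "Description": "description",
    "Publisher": "publisher",
    "Other Contributor": "contributor",
    "Date": "date",
    "Resource Type": "type",
    "Format": "format",
    "Resource Identifier": "identifier",
    "Source": "source",
    "Language": "language",
    "Relation": "relation",
    "Coverage": "coverage",
    "Rights Management": "rights",
}


def html_meta_name(label: str) -> str:
    """Return the HTML meta-tag name for a slide label, e.g. 'DC.Creator'."""
    return "DC." + DC_ELEMENTS[label].capitalize()


print(len(DC_ELEMENTS))                  # 15
print(html_meta_name("Author/Creator"))  # DC.Creator
```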

9
Key Features of DC
  • Simplicity of creation and maintenance
      • Small and simple element set
      • Non-specialists can create metadata records
  • Enables effective search and retrieval
      • Commonly understood semantics
      • A generic, common element set facilitates
        cross-domain accessibility (e.g. Creator applies
        equally to the author of a document and the
        composer of a piece of music)
  • International scope
      • DC element set available in several languages
  • Extensibility
      • Linkages with other metadata sets

10
Uses of DC
  • Used mainly for describing document-like objects;
    metadata standards exist for other domains
    (e.g. e-commerce, education)
  • A DC record can be embedded in the resource itself
    (e.g. in the HTML <meta> tag)
  • DC elements may be contained in a record separate
    from the resource
  • A database of DC records, each describing a
    separate electronic resource (e.g. subject
    gateways)

11
DC in HTML
    <html>
    <head>
    <title>UKOLN Home Page</title>
    <meta name="DC.Title" content="UKOLN: UK Office
      for Library and Information Networking">
    <meta name="DC.Subject" content="national centre,
      network information support, library community,
      awareness, research, information services, public
      library networking, bibliographic management,
      distributed library systems, metadata, resource
      discovery, conferences, lectures, workshops">
    <meta name="DC.Description" content="UKOLN is a
      national centre for support in network
      information management in the library and
      information communities. It provides awareness,
      research and information services">
    <meta name="DC.Creator" content="UKOLN
      Information Services Group">
    </head>
    ...
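The following is a minimal sketch of how software might harvest this kind of embedded metadata, using Python's standard html.parser module. The DCMetaExtractor class is hypothetical, and the page is a shortened copy of the example above. Values are kept in lists because DC elements are repeatable.

```python
from html.parser import HTMLParser


class DCMetaExtractor(HTMLParser):
    """Collect <meta name="DC.xxx" content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.record = {}  # DC element name -> list of values

    def handle_starttag(self, tag, attrs):
        if tag != "meta":  # tag names arrive lower-cased
            return
        a = dict(attrs)
        name = a.get("name") or ""
        if name.lower().startswith("dc.") and a.get("content") is not None:
            element = name[3:].lower()  # "DC.Title" -> "title"
            self.record.setdefault(element, []).append(a["content"])


PAGE = """<html><head>
<title>UKOLN Home Page</title>
<meta name="DC.Title" content="UKOLN: UK Office for Library and Information Networking">
<meta name="DC.Creator" content="UKOLN Information Services Group">
</head><body>...</body></html>"""

parser = DCMetaExtractor()
parser.feed(PAGE)
print(parser.record["creator"])  # ['UKOLN Information Services Group']
```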

12
DC Projects
  • Implemented in over 100 projects in several
    countries, in areas such as:
      • Government
      • Science, Mathematics, Education, Humanities
      • Libraries and DLs (e.g. CORC, the Cooperative
        Online Resource Catalogue of OCLC)
      • Intranets

13
DC Further Developments
  • Initial focus was on defining the element set for
    resource description
  • Work is now in progress on defining qualifiers for
    elements (e.g. type of Creator) and for values
    (encoding schemes, e.g. the ISO 8601 standard for
    dates)
  • Vocabularies for rendering content
  • Tools for generating, editing and processing DC
  • Crosswalks to other metadata formats

14
Syntax for DC
  • DC does not specify the syntax for representing
    DC elements
      • Implementation-dependent
      • HTML: use of the <meta> tag
      • Preferred syntax: XML
  • DC also needs a framework and data model for
    ensuring semantic interoperability across
    different metadata sets
      • RDF provides such a framework

15
XML: Extensible Markup Language
  • Specifically designed for the web as a data
    interchange format
  • Simplified subset of Standard Generalized Markup
    Language (SGML)
  • Overcomes limitations of HTML with several
    additional advantages
  • Formally ratified as a W3C standard in February
    1998
  • Specification provides a set of grammar and
    syntax rules for describing the structure of data

16
XML: Extensible Markup Language
  • The "ASCII of the Web": OS-independent; create
    once and use in different ways
  • Quality searching
      • Can search for field-based content, not just any
        content
      • e.g. find computers (name/model) with price < 700
  • Enables data interchange and sharing between
    applications
  • Limitation of XML: it does not convey the meaning
    of structural elements; applications have to
    achieve this through common XML vocabularies
    (e.g. DC)
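To make field-based searching concrete, here is a small Python sketch using xml.etree.ElementTree. The catalogue document, its element names and computers_under() are invented for illustration. The point is that the query targets the <computer> and <price> fields, so a cheap printer is never returned by a "computer" search.

```python
import xml.etree.ElementTree as ET

# Hypothetical product catalogue; element names are illustrative.
CATALOG = """<catalog>
  <computer><name>Alpha</name><model>A1</model><price>650</price></computer>
  <computer><name>Beta</name><model>B2</model><price>899</price></computer>
  <printer><name>Gamma</name><price>120</price></printer>
</catalog>"""


def computers_under(xml_text, limit):
    """Return (name, model) for every <computer> whose <price> is below limit."""
    root = ET.fromstring(xml_text)
    hits = []
    for computer in root.findall("computer"):
        if float(computer.findtext("price")) < limit:
            hits.append((computer.findtext("name"), computer.findtext("model")))
    return hits


print(computers_under(CATALOG, 700))  # [('Alpha', 'A1')]
```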

17
SGML, HTML and XML
  • XML is a compromise between the non-extensible,
    limited capabilities of HTML and the full power
    and complexity of SGML
  • Claimed to be better than SGML
      • 80% of the capabilities, 20% of the complexity
      • Small enough to be supported by browser vendors
      • Explicitly includes hyperlinking
      • Supports style sheets
  • Better than HTML
      • Extensible (you can create your own tags)
      • Tags identify content, e.g. an item and its price

18
XML Document Structure
  • XML documents should begin with the XML declaration
    <?xml version="1.0"?>
  • The DTD (Document Type Definition), given in the
    document type declaration in the prolog, provides
    the definition for the XML documents: it defines
    the document hierarchy, elements, tags and
    syntactic rules for the document structure
  • The DTD can accompany the XML document (internal
    subset) or reside in a separate file (external
    subset)
  • Let's take an example (adbook.xml) and look at both
    possibilities, using the IE browser (version 5.0
    or above)

19
XML Document With External DTD
    <?xml version="1.0"?>
    <!DOCTYPE addressbook SYSTEM "adbook.dtd">
    <addressbook>
      <person id="B.WALLACE" gender="male">
        <name>
          <family>Wallace</family>
          <given>Bob</given>
        </name>
        <email>bwallace@megacorp.com</email>
        <link manager="C.TUTTLE"/>
      </person>
      <person id="C.TUTTLE" gender="female">
        <name><family>Tuttle</family><given>Claire</given></name>
        <email>ctuttle@megacorp.com</email>
        <link subordinates="B.WALLACE"/>
      </person>
    </addressbook>
XML Declaration
Body
20
DTD Stored in adbook.dtd File
    <!-- DTD for a simple address book -->
    <!ELEMENT addressbook (person*)>
    <!ELEMENT person (name,email,link?)>
    <!ATTLIST person id ID #REQUIRED>
    <!ATTLIST person gender (male|female) #IMPLIED>
    <!ELEMENT name (family,given)>
    <!ELEMENT family (#PCDATA)>
    <!ELEMENT given (#PCDATA)>
    <!ELEMENT email (#PCDATA)>
    <!ELEMENT link EMPTY>
    <!ATTLIST link manager IDREF #IMPLIED
                   subordinates IDREFS #IMPLIED>
21
XML Document With Accompanying DTD
    <?xml version="1.0"?>
    <!DOCTYPE addressbook [
    <!ELEMENT addressbook (person*)>
    <!ELEMENT person (name,email,link?)>
    <!ATTLIST person id ID #REQUIRED>
    <!ATTLIST person gender (male|female) #IMPLIED>
    <!ELEMENT name (family,given)>
    <!ELEMENT family (#PCDATA)>
    <!ELEMENT given (#PCDATA)>
    <!ELEMENT email (#PCDATA)>
    <!ELEMENT link EMPTY>
    <!ATTLIST link manager IDREF #IMPLIED
                   subordinates IDREFS #IMPLIED>
    ]>
XML Declaration
DTD
22
XML Document With Accompanying DTD
    <addressbook>
      <person id="B.WALLACE" gender="male">
        <name>
          <family>Wallace</family>
          <given>Bob</given>
        </name>
        <email>bwallace@megacorp.com</email>
        <link manager="C.TUTTLE"/>
      </person>
      <person id="C.TUTTLE" gender="female">
        <name><family>Tuttle</family><given>Claire</given></name>
        <email>ctuttle@megacorp.com</email>
        <link subordinates="B.WALLACE"/>
      </person>
    </addressbook>
Body
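The Python code below sketches what the link element's attributes are for. It parses the address book with xml.etree.ElementTree and resolves each manager reference to a person's name. ElementTree does not read the DTD, so the code treats id and manager as ID/IDREF by convention. The reporting_lines() helper is hypothetical, and the DOCTYPE line is left out.

```python
import xml.etree.ElementTree as ET

ADBOOK = """<addressbook>
  <person id="B.WALLACE" gender="male">
    <name><family>Wallace</family><given>Bob</given></name>
    <email>bwallace@megacorp.com</email>
    <link manager="C.TUTTLE"/>
  </person>
  <person id="C.TUTTLE" gender="female">
    <name><family>Tuttle</family><given>Claire</given></name>
    <email>ctuttle@megacorp.com</email>
    <link subordinates="B.WALLACE"/>
  </person>
</addressbook>"""


def full_name(person):
    return person.findtext("name/given") + " " + person.findtext("name/family")


def reporting_lines(xml_text):
    """Map each person's full name to their manager's full name (or None)."""
    root = ET.fromstring(xml_text)
    by_id = {p.get("id"): p for p in root.findall("person")}  # ID -> element
    result = {}
    for person in by_id.values():
        link = person.find("link")
        manager_id = link.get("manager") if link is not None else None
        result[full_name(person)] = full_name(by_id[manager_id]) if manager_id else None
    return result


print(reporting_lines(ADBOOK))
# {'Bob Wallace': 'Claire Tuttle', 'Claire Tuttle': None}
```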
23
Do We Need a DTD?
  • A DTD is not mandatory for XML documents
  • Let's try this out (use adbook.xml)
  • Why do we need a DTD, then?
      • A DTD provides the definition for a specific
        class of documents (e.g. reports, theses): the
        document hierarchy, elements, tags and syntactic
        rules for the document structure
      • Essential for information exchange between
        applications and services
      • Forms the basis for validating the correctness of
        XML documents
  • Let's see another example: the IOP DTD and XML files.

24
Do We Need a DTD?
25
Inside an XML Document (File)
  • An XML document file contains one and only one
    document root element
  • Root element contains one or more documents
  • Document content is marked up using user-defined
    elements (tags) and element attributes
  • Document content may also contain entity
    references (replacement text, external
    references, etc.).
  • Document content may also contain comments and
    processing instructions
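The node kinds listed above can be inspected with Python's xml.dom.minidom, the DOM implementation in the standard library. The sample document and the summarize() function are illustrative assumptions. Note that the parser replaces the entity reference and the character reference with the characters they stand for.

```python
from xml.dom import minidom

DOC = """<?xml version="1.0"?>
<books>
  <!-- a comment: kept in the DOM, ignored by most applications -->
  <?render style="compact"?>
  <book>
    <title>DC &amp; RDF &#169; 2001</title>
  </book>
</books>"""


def summarize(xml_text):
    """Return the root name, the kinds of non-whitespace nodes directly
    under the root, and the text of the first <title>."""
    dom = minidom.parseString(xml_text)
    root = dom.documentElement  # the single document root element
    kinds = []
    for node in root.childNodes:
        if node.nodeType == node.COMMENT_NODE:
            kinds.append("comment")
        elif node.nodeType == node.PROCESSING_INSTRUCTION_NODE:
            kinds.append("pi:" + node.target)
        elif node.nodeType == node.ELEMENT_NODE:
            kinds.append("element:" + node.tagName)
        # whitespace-only text nodes between tags are ignored here
    title = root.getElementsByTagName("title")[0]
    text = "".join(n.data for n in title.childNodes if n.nodeType == n.TEXT_NODE)
    return root.tagName, kinds, text


print(summarize(DOC))
# ('books', ['comment', 'pi:render', 'element:book'], 'DC & RDF © 2001')
```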

26
Inside XML DTD
  • A DTD defines the document hierarchy in terms of
    the elements and their syntactic structure,
    together with entities and the rules for their
    usage
  • It is used for validating XML documents. Example:
    <!--DTD for Books-->
    <!ENTITY cright "&#169;">
    <!ELEMENT books (book*)>
    <!ELEMENT book (title, isbn, authors, description?, price)>
    <!ELEMENT title (#PCDATA)>
    <!ATTLIST title lang (english|french) #REQUIRED>
    <!ELEMENT isbn (#PCDATA)>
    <!ELEMENT authors (#PCDATA)>
    <!ELEMENT description (#PCDATA)>
    <!ELEMENT price (#PCDATA)>
    <!ATTLIST price curr (Rs|Dollar) #IMPLIED>
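A non-validating parser still reads the internal DTD subset for entity declarations, but it does not enforce the content models. The Python sketch below uses xml.etree.ElementTree, which does not validate, to parse a cut-down document against the books DTD above. The &cright; reference is expanded, yet the missing isbn and authors elements go unreported. The sample book is hypothetical.

```python
import xml.etree.ElementTree as ET

# The book below lacks the <isbn> and <authors> elements the DTD requires,
# so it is NOT valid; a non-validating parser accepts it anyway.
DOC = """<?xml version="1.0"?>
<!DOCTYPE books [
<!ENTITY cright "&#169;">
<!ELEMENT books (book*)>
<!ELEMENT book (title, isbn, authors, description?, price)>
<!ELEMENT title (#PCDATA)>
<!ATTLIST title lang (english|french) #REQUIRED>
<!ELEMENT isbn (#PCDATA)>
<!ELEMENT authors (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ATTLIST price curr (Rs|Dollar) #IMPLIED>
]>
<books>
  <book>
    <title lang="english">XML Basics &cright; 2001</title>
    <price curr="Rs">250</price>
  </book>
</books>"""

root = ET.fromstring(DOC)
title = root.find("book/title")
print(title.text)                     # entity expanded: XML Basics © 2001
print(root.find("book/isbn") is None)  # True: missing element not reported
```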

27
Well Formed and Valid XML Document
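  • Well-formed XML documents must be syntactically
    correct. What does this mean?
      • There is exactly one root element, and it contains
        all of the document's content
      • Every start-tag has a matching end-tag (except for
        empty-element tags such as <link/>)
      • Nested elements never overlap
      • Attribute names are unique within an element, and
        attribute values are in quotes
      • Without a DTD declaring further entities, the only
        entity references allowed are the predefined
        ones: &amp; for &, &lt; for <, &gt; for >,
        &apos; for ', and &quot; for " (character
        references such as &#169; are also allowed)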
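Any conforming XML parser rejects a document that breaks one of these rules. A quick way to try each rule is Python's xml.etree.ElementTree, which raises ParseError on input that is not well-formed. The is_well_formed() function and the sample strings are illustrative.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
        return True, None
    except ET.ParseError as err:
        return False, str(err)


samples = {
    "ok":            "<a><b>x &amp; y</b></a>",
    "two roots":     "<a/><b/>",
    "overlap":       "<a><b></a></b>",
    "unquoted attr": "<a id=1/>",
    "dup attr":      '<a id="1" id="2"/>',
    "bare &":        "<a>x & y</a>",
}
for label, text in samples.items():
    print(label, is_well_formed(text)[0])  # only "ok" prints True
```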

28
Well Formed and Valid XML Document
  • Valid XML documents
      • Valid XML documents are well-formed XML documents
        that contain a document type declaration
        (normally preceded by an XML declaration)
      • Valid documents must also conform to the DTD

Let's look at an example using a sample Medline
XML document
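Python's standard library has no validating XML parser. The sketch below therefore hand-codes a few of the checks a validating parser would derive from adbook.dtd:

- the (name,email,link?) content model of person;
- the #REQUIRED id attribute;
- that every IDREF points to an existing ID.

The check_addressbook() function is hypothetical and deliberately incomplete. Third-party libraries such as lxml provide real DTD validation.

```python
import re
import xml.etree.ElementTree as ET

# (name,email,link?) from adbook.dtd, rewritten as a regular expression
# over the space-separated tag names of a <person>'s children.
PERSON_MODEL = re.compile(r"name email( link)?")


def check_addressbook(xml_text):
    """Return validity problems found by a few hand-written checks.

    Hypothetical helper covering only three rules of adbook.dtd; a
    validating parser applies every declaration in the DTD.
    """
    root = ET.fromstring(xml_text)
    persons = root.findall("person")
    ids = {p.get("id") for p in persons if p.get("id") is not None}
    problems = []
    for p in persons:
        pid = p.get("id")
        if pid is None:
            problems.append("person without the required id attribute")
        children = " ".join(child.tag for child in p)
        if not PERSON_MODEL.fullmatch(children):
            problems.append(f"{pid}: children '{children}' do not match (name,email,link?)")
        link = p.find("link")
        if link is not None:
            # manager is a single IDREF; subordinates is a space-separated IDREFS
            refs = (link.get("manager", "") + " " + link.get("subordinates", "")).split()
            for ref in refs:
                if ref not in ids:
                    problems.append(f"{pid}: IDREF '{ref}' matches no person id")
    return problems


GOOD = """<addressbook>
  <person id="B.WALLACE"><name><family>Wallace</family><given>Bob</given></name>
    <email>bwallace@megacorp.com</email><link manager="C.TUTTLE"/></person>
  <person id="C.TUTTLE"><name><family>Tuttle</family><given>Claire</given></name>
    <email>ctuttle@megacorp.com</email></person>
</addressbook>"""

BAD = """<addressbook>
  <person id="B.WALLACE"><name><family>Wallace</family><given>Bob</given></name>
    <link manager="X.NOBODY"/></person>
</addressbook>"""

print(check_addressbook(GOOD))  # []
print(check_addressbook(BAD))   # two problems: missing email, dangling IDREF
```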
29
Rendering of XML
  • How can we view XML documents on the Web?
  • XML is about data, not presentation; presentation
    has to be handled separately
  • This can be handled using CSS (Cascading Style
    Sheets) and XSLT (Extensible Stylesheet Language
    Transformations)
  • CSS and XSLT can be used for rendering XML
    documents on the Web
  • IE 5.0 supports viewing XML document trees and
    HTML rendering using CSS
  • Let's see a CSS example (syllabus2)
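Python's standard library has no XSLT processor. This sketch therefore swaps in plain Python (xml.etree.ElementTree) to show the idea that a stylesheet expresses declaratively: the XML holds only data, and a separate step produces the presentation. The syllabus document, its element names and to_html() are hypothetical, and the slide's syllabus2 example is not reproduced.

```python
import xml.etree.ElementTree as ET
from html import escape

SYLLABUS = """<syllabus>
  <unit><title>Metadata basics</title><hours>3</hours></unit>
  <unit><title>XML &amp; DTDs</title><hours>4</hours></unit>
</syllabus>"""


def to_html(xml_text):
    """Render each <unit> as an HTML list item; changing only this
    function changes the presentation, not the XML source."""
    root = ET.fromstring(xml_text)
    items = [
        f"<li>{escape(u.findtext('title'))} ({u.findtext('hours')} h)</li>"
        for u in root.findall("unit")
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


print(to_html(SYLLABUS))
```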

30
Creating and Processing XML Documents
  • XML documents can be created using any text
    editor
  • XML editors and browsers are also available (e.g.
    SoftQuad's XMetaL, XMLSPY)
  • Complete publishing solutions are also available
    (e.g. Apache Cocoon, written in Java)
  • Applications can exchange XML data and process it,
    using the DTD for extraction and validation
  • API specifications are also available for
    simplifying XML document processing (e.g. the
    W3C's DOM specification)
  • Most RDBMS packages have begun to support import/
    export of SQL database content in XML format
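The Python code below sketches the export direction. It dumps a table from an in-memory SQLite database (standard sqlite3 module) as XML. The table, its columns and table_to_xml() are hypothetical. Real RDBMS products use their own export formats and element names.

```python
import sqlite3
import xml.etree.ElementTree as ET


def table_to_xml(conn, table):
    """Export every row of `table` as <row> elements, one child per column."""
    # The table name is interpolated directly, so it must be trusted input.
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]
    root = ET.Element(table)
    for row in cur:
        row_el = ET.SubElement(root, "row")
        for col, value in zip(columns, row):
            ET.SubElement(row_el, col).text = str(value)
    return ET.tostring(root, encoding="unicode")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER, title TEXT, price REAL)")
conn.execute("INSERT INTO books VALUES (1, 'Sample title', 450.0)")
print(table_to_xml(conn, "books"))
# <books><row><id>1</id><title>Sample title</title><price>450.0</price></row></books>
```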

31
XML Applications and Vocabularies
  • XML has found rapid use in a large number of
    domains
  • Domain-specific vocabularies (DTDs and processing
    tools, techniques, practices) have been developed
      • Mathematics: MathML
      • Chemistry: CML
      • Instruments: Instrument Markup Language (IML)
      • Bioinformatics: Bioinformatic Sequence Markup
        Language (BSML)
      • E-books: specifications developed by the Open
        eBook Forum
      • Medicine: Health Level 7 (HL7)
      • Business/e-commerce: eXML, BizTalk
      • Mobile communications: WML

32
XML and Bibliographic Information
  • XML appears tailor-made for documentary
    information (bibliographic or full text), since
    document content is already structured; XSLT can
    be used for viewing, transforming and presenting
    the content in different ways
  • Several bibliographic databases are now available
    in XML (e.g. Medline, MARC)
  • Journal publishers are also embracing XML (e.g.
    IOP)
  • Digital publishing: books and other full-text
    documents can be marked up once and viewed in
    different ways
      • e.g. the book Tobacco War on the eScholarship
        website of the California Digital Library
  • Interoperability among digital libraries: the Open
    Archives Initiative (OAI) and its use of MARC XML
    for interoperability among DLs

33
RDF: Resource Description Framework
  • W3C Recommendation, Feb. 1999
  • Data model
      • Designed to impose structural constraints on
        syntax to support consistent encoding, exchange
        and processing of metadata
  • Schema
      • Enables resource description communities
        (museum, library, e-commerce) to define and
        share vocabularies

34
RDF
  • Grew out of work on PICS (Platform for Internet
    Content Selection) used for content rating
  • Empowers effective creation, exchange and use of
    metadata on the World Wide Web
  • Unlike DC, RDF does not specify semantics
  • Defines a coherent structure (for the expression
    of semantics) and recommends a powerful transport
    syntax in the form of XML
  • RDF, DC, XML thus complement each other

35
RDF Basics
  • The basic premise is that an identifiable,
    addressable 'resource' may be described by means
    of a selection of 'properties' (size, name, maker,
    etc.), each of which has an associated 'value'
    (23 x 45 cm, Moonlight over Columbus, Fred
    Bloggs, etc.)
  • XML provides the default syntax for encoding RDF
    descriptions

36
RDF Basics
  • RDF descriptions are often expressed in
    diagrammatic form for clarity

A simple RDF assertion
37
RDF Basics
  • A 'resource', http://doc, has a 'property',
    author, describing some aspect of it. The value
    of the 'property' is Joe Smith. In other words,
    Joe Smith is the author of http://doc.
  • Resources are always represented as ovals.
    Properties are always represented by arrows, which
    point from the subject of a statement to its
    object (Joe Smith is the author of the resource;
    the resource is not the author of Joe Smith).

38
RDF Basics
  • Presented in XML, the diagram above would be
    coded as follows:
    <?xml version="1.0"?>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <rdf:Description rdf:about="http://doc">
        <author>Joe Smith</author>
      </rdf:Description>
    </rdf:RDF>
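The Python sketch below reads this RDF/XML back into a (subject, property, value) triple, which is the same statement the diagram draws. It uses xml.etree.ElementTree and handles only rdf:Description elements with literal property values. The description_triples() function is a hypothetical helper. The author element here carries no namespace; the following slides show how namespaces such as DC's make property names unambiguous.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

DOC = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="http://doc">
    <author>Joe Smith</author>
  </rdf:Description>
</rdf:RDF>"""


def description_triples(xml_text):
    """Return (subject, property, value) for each property element
    inside each rdf:Description; literal values only."""
    root = ET.fromstring(xml_text)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")  # rdf:about, namespace-expanded
        for prop in desc:
            triples.append((subject, prop.tag, (prop.text or "").strip()))
    return triples


print(description_triples(DOC))  # [('http://doc', 'author', 'Joe Smith')]
```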

39
XML Namespaces
  • There are several resource description
    communities like DC
  • How do we enable these communities to share and
    extend the metadata vocabularies and facilitate
    interoperability?
  • XML and RDF offer the interoperable foundation
    whereby a single description may comprise
    elements drawn from any number of accessible
    recording practices.

40
XML Namespaces
  • It therefore becomes possible, for example, for
    all communities to share the notion of
    'authorship' and use the same element for
    describing this notion, whilst one community then
    makes use of its own subject classification
    system and another embeds complex date-related
    information.
  • This interoperable diversity is achieved through
    XML Namespaces

41
XML Namespaces
  • An XML Namespace may be seen as providing human
    and computer-parseable context for any resource
    description element.
  • The term, Creator, for example, means a variety
    of things to different communities.
  • If it is clear, though, that this is the Dublin
    Core element Creator, then the semantics of the
    element and its contents become clearer.

42
XML Namespaces
  • To uniquely identify elements drawn from
    different namespaces, they are prefixed with a
    namespace abbreviation (e.g. dc)
  • Further, RDF descriptions begin with a
    declaration of the namespaces to be used, the
    location at which they may be resolved by
    appropriate software, and the abbreviated form by
    which they may be recognised
  • Example

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.0/">
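In XML the prefix is only an abbreviation. An element is identified by its namespace URI plus its local name. The Python sketch below uses xml.etree.ElementTree to show that two records binding the DC namespace to different prefixes yield the same expanded name. The prefixes dc and dcel are arbitrary choices for the example.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.0/"

# The same DC namespace URI, bound to two different prefixes.
A = f'<r xmlns:dc="{DC}"><dc:creator>Joe Smith</dc:creator></r>'
B = f'<r xmlns:dcel="{DC}"><dcel:creator>Joe Smith</dcel:creator></r>'

tag_a = ET.fromstring(A)[0].tag
tag_b = ET.fromstring(B)[0].tag
print(tag_a)           # {http://purl.org/dc/elements/1.0/}creator
print(tag_a == tag_b)  # True

# Queries map a prefix of the program's own choosing to the namespace URI.
ns = {"dc": DC}
print(ET.fromstring(B).find("dc:creator", ns).text)  # Joe Smith
```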
43
A Simple DC Record in RDF/XML
    <?xml version="1.0"?>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.0/">
      <rdf:Description rdf:about="http://doc">
        <dc:creator>Joe Smith</dc:creator>
      </rdf:Description>
    </rdf:RDF>
44
A Complete Example
    <?xml version="1.0"?>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.0/">
      <rdf:Description rdf:about="http://www.ukoln.ac.uk/metadata/resources/dc/datamodel/WD-dc-rdf/">
        <dc:title>Guidance on expressing the Dublin Core within the Resource
          Description Framework (RDF)</dc:title>
        <dc:creator>Eric Miller</dc:creator>
        <dc:creator>Paul Miller</dc:creator>
        <dc:creator>Dan Brickley</dc:creator>
        <dc:subject>Dublin Core; Resource Description Framework; RDF;
          eXtensible Markup Language; XML</dc:subject>
        <dc:publisher>Dublin Core Metadata Initiative</dc:publisher>
        <dc:contributor>Dublin Core Data Model Working Group</dc:contributor>
        <dc:date>1999-07-01</dc:date>
        <dc:format>text/html</dc:format>
        <dc:language>en</dc:language>
      </rdf:Description>
    </rdf:RDF>
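The record above repeats dc:creator three times, which DC permits. A consumer therefore needs to collect each element's values into a list. The Python sketch below does this with xml.etree.ElementTree for a shortened copy of the record. The dc_record() function is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.0/"

DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:dc="{DC}">
  <rdf:Description rdf:about="http://www.ukoln.ac.uk/metadata/resources/dc/datamodel/WD-dc-rdf/">
    <dc:creator>Eric Miller</dc:creator>
    <dc:creator>Paul Miller</dc:creator>
    <dc:creator>Dan Brickley</dc:creator>
    <dc:date>1999-07-01</dc:date>
  </rdf:Description>
</rdf:RDF>"""


def dc_record(xml_text):
    """Map each DC element name to the list of its values (elements repeat)."""
    desc = ET.fromstring(xml_text).find(f"{{{RDF}}}Description")
    record = {}
    for prop in desc:
        if prop.tag.startswith("{" + DC + "}"):
            name = prop.tag[len(DC) + 2:]  # strip "{namespace-uri}"
            record.setdefault(name, []).append(prop.text.strip())
    return record


print(dc_record(DOC))
# {'creator': ['Eric Miller', 'Paul Miller', 'Dan Brickley'], 'date': ['1999-07-01']}
```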
45
Another Example
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/">
      <rdf:Description rdf:about="http://media.example.com/audio/guide.ra">
        <dc:creator>Rose Bush</dc:creator>
        <dc:title>A Guide to Growing Roses</dc:title>
        <dc:description>Describes process for planting and nurturing different
          kinds of rose bushes.</dc:description>
        <dc:date>2001-01-20</dc:date>
      </rdf:Description>
    </rdf:RDF>
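This record binds dc to .../dc/elements/1.1/, while the earlier examples use .../dc/elements/1.0/. To an XML processor these are different namespaces. Software that looks for only one of them will therefore miss elements from the other. The short Python sketch below shows this with xml.etree.ElementTree; the creators() function is hypothetical.

```python
import xml.etree.ElementTree as ET

DC10 = "http://purl.org/dc/elements/1.0/"
DC11 = "http://purl.org/dc/elements/1.1/"

RECORD = f"""<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="{DC11}">
  <rdf:Description rdf:about="http://media.example.com/audio/guide.ra">
    <dc:creator>Rose Bush</dc:creator>
  </rdf:Description>
</rdf:RDF>"""


def creators(xml_text, dc_uri):
    """Find creator elements in the namespace dc_uri, whatever their prefix."""
    root = ET.fromstring(xml_text)
    return [e.text for e in root.iter(f"{{{dc_uri}}}creator")]


print(creators(RECORD, DC11))  # ['Rose Bush']
print(creators(RECORD, DC10))  # []: a 1.0-only consumer sees nothing
```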
46
Related Resources
  • Dublin Core
      • dublincore.org
      • purl.oclc.org/metadata/
  • XML
      • www.oasis-open.org/cover/ (Robin Cover's SGML/XML
        page)
      • www.ucc.ie/xml/ (XML FAQ and links to other
        XML-related topics)
      • XML4Lib: a discussion forum on XML use in
        libraries (sunsite.berkeley.edu/XML4Lib/)
  • RDF
      • www.w3.org/RDF/

47
Conclusion
  • DC, RDF and XML together provide a framework for
    meaningful resource discovery and data exchange
    on the Internet
  • DC provides the means of describing resources for
    discovery in an interdisciplinary context
  • XML/RDF provides the structure for unambiguous
    expression of this Dublin Core information