Metadata - PowerPoint PPT Presentation

About This Presentation
Title:

Metadata

Slides: 44
Provided by: andyp74

Transcript and Presenter's Notes

Title: Metadata


1
Metadata
  • Andy Powell
  • Technical Development and Research
  • UKOLN
  • University of Bath
  • http://www.ukoln.ac.uk/
  • a.powell@ukoln.ac.uk

2
Metadata
  • What is metadata?
  • an introduction
  • The Dublin Core
  • metadata for the Web
  • Metadata management
  • Models for dealing with Web-site metadata
  • UKOLN metadata projects
  • overviews (and problems)

3
What is metadata?
  • by definition
  • ..data about data..
  • ..data which provides information
  • about a resource..
  • by example
  • title, author, subject classification, shelf mark
  • digital format, terms and conditions, location
    (URL)

4
What is metadata? (2)
  • by usage
  • Resource discovery
  • Searching, location
  • Authentication
  • Quality/rating
  • Semantic interoperability
  • Resource management
  • User interface
  • Grouping resources for printing
  • 3-D visualisations

5
Range of formats
  • (Diagram: formats placed on a scale from simple to rich, and from
    robot generated to hand crafted)
  • Alta Vista, NetFirst, Lycos
  • Dublin Core, IAFA, SOIF
  • MARC, TEI headers, CIMI

6
Where is metadata?
  • Embedded within resource
  • HTML <META> tags
  • Linked to resource
  • Remote database
  • distributed
  • union (centralised)

7
Who creates metadata?
  • Publisher side
  • author
  • webmaster
  • institution
  • Service side
  • search service
  • third party creators

  • (Diagram labels: robot generated, hand crafted)

8
Dublin Core
  • 15 element core metadata set
  • Primarily intended to aid resource discovery on
    the Web
  • Main usage currently embedded into HTML META tags
  • All elements optional and repeatable
  • Status?
  • Agreed syntax for embedding in HTML
  • Still discussion about the use of some of the
    elements

http://www.ukoln.ac.uk/metadata/resources/dc.html

9
Dublin Core History
  • 4 DC meetings
  • Dublin, Warwick, Dublin, Canberra
  • (DC-5 - Helsinki coming soon)
  • Mailing list discussions
  • meta2@lut.ac.uk
  • W3C interest
  • RDF (PICS-NG), MCF
  • Various projects
  • Still no significant interest from the big
    search engines :-(

10
DC Elements - 1
  • Title
  • Subject
  • intended to promote use of controlled
    vocabularies but in practice likely to be used
    for uncontrolled list of keywords
  • Description
  • abstract
  • Creator
  • Publisher

11
DC Elements - 2
  • Contributor
  • Date
  • the date the resource was made available in its
    present form. Agreed default format uses subset
    of ISO 8601, e.g. 1997-09-15
  • Type
  • category of resource - document, image, sound,
    home page, novel, poem, etc. Still much
    discussion about the content of this element
  • Format
  • MIME type
  • Identifier
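
As a rough illustration of the Date and Format elements above, the Python sketch below fills both in for a file. It assumes the ISO 8601 subset means a plain YYYY-MM-DD date and that the MIME type can be guessed from the file extension. The function name `date_and_format` is hypothetical and not part of any Dublin Core tool.

```python
import mimetypes
from datetime import date


def date_and_format(filename, available):
    """Return (DC.date, DC.format) values for a resource.

    `available` is the date the resource was made available in its
    present form; `filename` is only used to guess the MIME type.
    """
    mime_type, _encoding = mimetypes.guess_type(filename)
    # Fall back to a generic type when the extension is unknown.
    return available.isoformat(), mime_type or "application/octet-stream"


print(date_and_format("intro.html", date(1997, 9, 15)))
```

The date string round-trips through `date.fromisoformat`, which is one practical reason to stay with the agreed default format.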

12
DC Elements - 3
  • Source
  • Language
  • language of the resource - NOT the metadata
  • Relation
  • no guidelines for usage currently
  • Coverage
  • separate working party looking at usage
  • Rights
  • rights management seen as too complex for DC.
    This will give a URL to some external information

13
Simple Example
  • <HTML><HEAD>
  • <TITLE>UKOLN Home Page</TITLE>
  • <META NAME="DC.title" CONTENT="UKOLN: UK Office
    for Library and Information Networking">
  • <META NAME="DC.subject" CONTENT="national centre,
    network information support, library community,
    awareness, research, information services, public
    library networking, bibliographic management,
    distributed library systems, metadata, resource
    discovery, conferences, lectures, workshops">
  • <META NAME="DC.description" CONTENT="UKOLN is a
    national centre for support in network
    information management in the library and
    information communities. It provides awareness,
    research and information services">
  • <META NAME="DC.creator" CONTENT="Stark, Isobel">
  • </HEAD>
  • ...
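
The embedding shown on this slide can be produced mechanically. The Python sketch below is one possible way to do it, assuming the metadata is held as a list of (element, value) pairs; the function name `dc_meta_tags` and the sample values are illustrative and do not describe how UKOLN generated its tags.

```python
from html import escape


def dc_meta_tags(pairs):
    """Render (element, value) pairs as Dublin Core META tags, one per line.

    Elements are repeatable, so the input is a list of pairs rather
    than a dictionary.
    """
    lines = []
    for element, value in pairs:
        name = escape("DC." + element, quote=True)
        # Escape quotes and ampersands so the value cannot break the tag.
        content = escape(value, quote=True)
        lines.append(f'<META NAME="{name}" CONTENT="{content}">')
    return "\n".join(lines)


record = [
    ("creator", "Stark, Isobel"),
    ("subject", "metadata"),
    ("subject", "resource discovery"),
]
print(dc_meta_tags(record))
```

Escaping the values matters in practice because descriptions often contain quotation marks.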

14
Element qualifiers
  • Need to refine meaning in some cases
  • TYPE
  • Refines meaning of element - sub-divides element
    namespace
  • SCHEME
  • Element value taken from external schema, e.g.
    LCSH for DC.subject, Z39.53 for DC.language
  • LANGUAGE
  • Language of element value (not of the resource
    being described!)

15
Examples - TYPE
  • Original DC.creator tag
  • <META NAME="DC.creator" CONTENT="Stark, Isobel">
  • Non-personal author
  • <META NAME="DC.creator.corporate" CONTENT="UKOLN
    Information Services Group">
  • Author's email address
  • <META NAME="DC.creator.email"
    CONTENT="isg@ukoln.ac.uk">

16
Examples - SCHEME
  • Library of Congress Subject Heading
  • <META NAME="DC.subject" CONTENT="(SCHEME=LCSH)
    Library information networks -- Great Britain">
  • <META NAME="DC.subject" CONTENT="(SCHEME=LCSH)
    Information technology -- higher education">
  • or
  • <META NAME="DC.subject" SCHEME="LCSH"
    CONTENT="Library information networks -- Great
    Britain">
  • <META NAME="DC.subject" SCHEME="LCSH"
    CONTENT="Information technology -- higher
    education">
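
A harvester has to cope with both qualifier styles. The Python sketch below reads embedded Dublin Core with the standard `html.parser` module, treating anything after the element name as a TYPE qualifier and accepting the scheme either as a SCHEME attribute or as a "(SCHEME=...)" prefix inside CONTENT. It assumes the prefix is written exactly as "(SCHEME=LCSH)"; the class name `DublinCoreParser` and the record layout are hypothetical.

```python
from html.parser import HTMLParser


class DublinCoreParser(HTMLParser):
    """Collect DC.* META tags as dictionaries in self.records."""

    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = attrs.get("name") or ""
        if not name.lower().startswith("dc."):
            return
        parts = name.split(".")
        element = parts[1].lower()
        qualifier = ".".join(parts[2:]) or None  # e.g. "email"
        content = attrs.get("content") or ""
        scheme = attrs.get("scheme")
        prefix = "(SCHEME="
        # Older style: the scheme is carried inside the value itself.
        if scheme is None and content.upper().startswith(prefix):
            head, _, rest = content.partition(")")
            scheme = head[len(prefix):]
            content = rest.strip()
        self.records.append(
            {"element": element, "type": qualifier,
             "scheme": scheme, "value": content}
        )


parser = DublinCoreParser()
parser.feed('<META NAME="DC.subject" SCHEME="LCSH" '
            'CONTENT="Library information networks -- Great Britain">')
print(parser.records)
```

Normalising both styles into one record shape means later code, such as an indexer, does not need to know which syntax the page author chose.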

17
Metadata Management
  • Practical issues of using Dublin Core for
    Internet resource description...
  • UKOLN metadata system
  • Requirements
  • 3 models for metadata management
  • Implementation at UKOLN

18
UKOLN metadata system requirements
  • Easy to use
  • Work with a variety of methods of creating HTML
  • Simple migration to future metadata formats
  • Separate metadata from resource

19
Managing Dublin Core (1): HTML authoring tool
Embed by hand using HTML or text editor
  • Pros
  • Simple
  • May be useful for training and familiarisation
  • Cons
  • May not be possible with all editors
  • Maintenance problems
  • Easy to make errors

20
DC-dot
  • A Web-based tool for creating Dublin Core <meta>
    tags
  • Automatic generation of some tags based on
    content of the resource
  • Forms based editing of tags
  • Cut-and-paste output into HTML
  • Conversion to other formats
  • SOIF, ROADS/WHOIS++, USMARC, GILS...

http://www.ukoln.ac.uk/metadata/dcdot/

21
Managing Dublin Core (2): Web-site management tool
Use Web-site management tool, for example
NetObjects Fusion
  • Pros
  • Use of Web-site management tools likely to
    increase
  • Object-oriented database approach
  • Cons
  • Proprietary formats
  • Early days - too early to evaluate use for
    metadata yet?

22
Managing Dublin Core (3): On-the-fly generation
Hold Dublin Core separately and embed on-the-fly
using server-side include (SSI)
  • Pros
  • Separates metadata from resource
  • Future migration fairly simple
  • Cons
  • Performance
  • Lack of integration with HTML tools
  • Server specific

23
UKOLN metadata system (1)
  • Embed on-the-fly
  • Apache SSI script
  • Store metadata using SOIF records
  • Use MS-Access as tool to create the records
  • Associate metadata with resource by co-locating
    them in the Web server filestore

24
UKOLN metadata system (2)
  • intro.html (HTML editor)
  • Apache syntax for calling server-side script:
    <!--#exec cmd="getmeta" -->
  • <html> <head> <title></title> <!--#exec cmd="getmeta" -->
    </head> ...
  • intro.html.soif (MS-Access Database)
  • @FILE { http://www.ukoln.ac. ... keywords{13}: xxx, yyy, zzz
    description{14}: blah blah b author{13}: Stark, Isobel ...
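
To make the on-the-fly step concrete, here is a minimal Python sketch of what a script in the role of getmeta might do: parse a SOIF record and print META tags. It assumes the usual SOIF layout of a "@TYPE { URL" header followed by "name{size}: value" lines, where size is the byte length of the value. The field mapping and function names are hypothetical, and the real UKOLN script is not shown in the slides.

```python
import re
from html import escape

HEADER = re.compile(rb"@(\S+)\s*\{\s*(\S+)[ \t]*\r?\n")
ATTR = re.compile(rb"([^\s{}]+)\{(\d+)\}:[ \t]")

# Hypothetical mapping from SOIF attribute names to META names.
FIELD_MAP = {
    "keywords": "DC.subject",
    "description": "DC.description",
    "author": "DC.creator",
}


def parse_soif(data):
    """Parse one SOIF record (bytes) into (type, url, [(name, value)])."""
    header = HEADER.match(data)
    if header is None:
        raise ValueError("not a SOIF record")
    pos = header.end()
    attrs = []
    while True:
        match = ATTR.match(data, pos)
        if match is None:
            break  # closing brace or end of data
        size = int(match.group(2))
        start = match.end()
        # The declared size says exactly how many bytes the value has,
        # so values may contain newlines or braces.
        value = data[start:start + size]
        attrs.append((match.group(1).decode(), value.decode("utf-8")))
        pos = start + size
        while data[pos:pos + 1] in (b"\r", b"\n"):
            pos += 1
    return header.group(1).decode(), header.group(2).decode(), attrs


def soif_to_meta(attrs):
    """Render the mapped attributes as META tags, one per line."""
    lines = []
    for name, value in attrs:
        meta_name = FIELD_MAP.get(name.lower())
        if meta_name:
            content = escape(value, quote=True)
            lines.append(f'<META NAME="{meta_name}" CONTENT="{content}">')
    return "\n".join(lines)


sample = (b"@FILE { http://example.org/intro.html\n"
          b"keywords{13}:\txxx, yyy, zzz\n"
          b"author{13}:\tStark, Isobel\n}\n")
print(soif_to_meta(parse_soif(sample)[2]))
```

Because the tags are derived at request time, moving to a different embedded format later would only mean changing the rendering function, which is the migration benefit the slides point to.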
25
UKOLN metadata system (3)
  • MS-Access front end...
  • Filename browser
  • Text boxes
  • Name choosers
  • UKOLN specific metadata

26
UKOLN metadata system (4)
  • (Diagram with numbered steps 1 to 6 linking the Web robot, the
    UKOLN Web server, the SSI script, intro.html and intro.html.soif)
  • intro.html: <html> <head> <title></title>
    <!--#exec cmd="getmeta" --> </head> ...
  • intro.html.soif: @FILE { http://www.ukoln.ac. ...
    keywords{13}: xxx, yyy, zzz description{14}: blah blah
    b author{13}: Stark, Isobel ...

27
Issues
  • Performance
  • Interaction with Web caches
  • Dublin Core vs Alta Vista style metadata
  • <META NAME="Description" CONTENT="blah, blah">
  • <META NAME="Keywords" CONTENT="xxx, yyy, zzz">
  • Granularity
  • Which pages should have metadata?

28
What's the point...
  • of embedding DC <meta> tags?
  • Alta Vista isn't going to look for them
  • But, worth doing...
  • within individual projects
  • within specific communities (e.g. eLib)
  • Improve local search facilities
  • e.g. load SOIF records into a Netscape Catalogue
    Server
  • Web-site management benefits

29
UKOLN Metadata projects
  • ROADS
  • Software for Subject Service
  • DESIRE
  • European Web indexing
  • NewsAgent
  • Current awareness service for Library and
    Information Staff
  • BIBLINK
  • Information flow from publishers to National
    Bibliographic Agencies

30
ROADS
  • Resource Organisation and Discovery in
    Subject-based Services
  • Web based tools for Subject Services
  • SOSIG, ADAM, OMNI,
  • Manage and search Internet resource descriptions
  • ROADS templates (based on IAFA templates)
  • WHOIS++

http://www.ukoln.ac.uk/roads/

31
ROADS - WHOIS++ (1)
  • Simple client-server search and retrieve protocol
  • Developed originally for white pages
    applications
  • Offer search facilities across several Subject
    Services
  • Distribute a Subject Service across several
    physical servers
  • Query routing - centroids and CIP

32
ROADS - WHOIS++ (2)
  • Centroid generated by ADAM contains: "you'll find
    the string 'mona' in the title attribute of at
    least one record in the ADAM database."

  • (Diagram with numbered steps 1 to 6 linking a Web browser, a
    CGI-based WHOIS++ client, and the SOSIG, OMNI and ADAM servers)
  • CIP sharing of centroids
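
The centroid idea can be sketched in a few lines of Python: for each attribute, keep the set of distinct words that occur in any record, then use that summary to decide whether a query is worth forwarding to a server. This is a simplified illustration, assuming words are split on non-word characters and compared case-insensitively; the function names are hypothetical and the real centroid and CIP formats are not reproduced here.

```python
import re
from collections import defaultdict


def build_centroid(records):
    """Summarise records as {attribute: set of distinct lowercase words}."""
    centroid = defaultdict(set)
    for record in records:
        for attribute, value in record.items():
            words = re.findall(r"\w+", value.lower())
            centroid[attribute.lower()].update(words)
    return centroid


def may_match(centroid, attribute, word):
    """True if at least one record has `word` in `attribute`.

    A True answer only means the server is worth querying; it does not
    say which record matches, or how many.
    """
    return word.lower() in centroid.get(attribute.lower(), set())


adam = build_centroid([
    {"title": "Mona Lisa", "description": "Portrait"},
    {"title": "The Scream"},
])
print(may_match(adam, "title", "mona"))
```

A client holding centroids from several subject services can skip any server whose centroid rules out the search term, which is the query routing the previous slide refers to.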
33
DESIRE
  • European Web cataloguing
  • Subject Services
  • EuroSOSIG (Bristol), EELS (Lund), Arts
    (Koninklijke Bibliotheek)
  • Manually created ROADS templates
  • European Web Index
  • based on Nordic Web Index (NWI)
  • Robot generated, all resources
  • Multiple servers linked with Z39.50
  • GILS

http://www.nic.surfnet.nl/surfnet/projects/desire/desire.html

34
DESIRE - current work (1)
  • Internationalisation of ROADS
  • Use of robots to
  • aid manual cataloguing of resources
  • build indexes based on list of URLs in a ROADS
    database
  • Robot will use embedded Dublin Core if available

35
DESIRE - current work (2)
  • Re-design of EWI robot - including
  • support for Dublin Core
  • EWI records GILS-II compatible
  • Allow users to search across subject services and
    the EWI using Z39.50
  • by converting ROADS records into GILS records
  • by building a WHOIS++ to Z39.50 gateway

http://roads.ukoln.ac.uk/cgi-bin/egwcgi/egwirtcl/targets.egw

36
NewsAgent
  • Current awareness service for LIS...
  • Distributed database
  • servers at LITC, FD, UKOLN - Z39.50
  • metadata (and some full-text)
  • based on DALI
  • Mixture of content streams
  • Variety of access methods
  • Web, e-mail and Z39.50 clients
  • user-configurable profiles

http://www.ukoln.ac.uk/metadata/NewsAgent/

37
NewsAgent - Content
  • Journals
  • Program, VINE, Journal of Librarianship and
    Information Science
  • News and briefing material
  • LA, IIS, UKOLN (Ariadne), BL, LITC
  • Web pages
  • E-mail lists and USENET news

38
NewsAgent - Harvesting
  • Web crawler
  • looking for embedded Dublin Core
  • Limiting the harvest
  • simple heuristics
  • use of Dublin Core Relation element
  • E-mail parser

http://www.ukoln.ac.uk/metadata/NewsAgent/dcusage.html

39
BIBLINK
  • Information flow between publishers
  • traditional
  • new - CD-ROM or Web (new to publishing)
  • and National Bibliographic Agencies
  • British Library, UK
  • Biblioteca Nacional, Madrid, Spain
  • Bibliothèque Nationale de France, Paris
  • Koninklijke Bibliotheek, Den Haag, Netherlands
  • Nasjonalbiblioteket, Rana, Norway
  • Universitat Oberta de Catalunya, Barcelona, Spain

http://www.ukoln.ac.uk/metadata/BIBLINK/

40
BIBLINK - research
  • Scope
  • Electronic publications suitable for inclusion in
    National Bibliographies
  • Metadata
  • Dublin Core (with extensions!), SGML DTD
  • Identifiers
  • ISBN, ISSN, SICI, DOI, URN
  • Transmission
  • Simple e-mail or Web crawler
  • Authentication
  • MD5 hash assigned to each resource
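
The authentication step can be illustrated with Python's standard `hashlib` module. The sketch below computes an MD5 hash of a resource's bytes and checks a received copy against it. How BIBLINK actually stored or transmitted the hash is not stated in the slides, so the function names and the workflow here are assumptions.

```python
import hashlib


def md5_checksum(data):
    """Return the MD5 hash of `data` (bytes) as a hexadecimal string."""
    return hashlib.md5(data).hexdigest()


def is_unchanged(data, expected_checksum):
    """True if `data` still hashes to the checksum held in the metadata."""
    return md5_checksum(data) == expected_checksum.strip().lower()


resource = b"hello world"
checksum = md5_checksum(resource)
print(checksum)
print(is_unchanged(resource, checksum))
print(is_unchanged(resource + b"!", checksum))
```

A matching hash shows that the received copy is byte-for-byte the one the publisher described; it says nothing on its own about who created it.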

41
BIBLINK - data set
  • Minimum data set
  • Author, Title, Publisher, Place of Publication,
    Price, Extent (size), Keywords, Description,
    Edition/Version, Date of Publication, System
    Requirements, Format, Language, Terms and
    Conditions, Frequency, Identifier, Contributor,
    Checksum
  • Similar to DC but some don't fit
  • <META NAME="BIBLINK.placePublication"
    CONTENT="Bath, UK">
  • <META NAME="BIBLINK.frequency"
    CONTENT="monthly">
  • Issues over conversion to MARC


42
BIBLINK - demonstrator
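  • (Diagram: record flow between publishers and NBAs/National
    Libraries)
  • Publishers
  • Cataloguing in Publication (CIP) level records (diagram labels:
    Dublin Core, E-mail)
  • NBAs/National Libraries
  • Enhanced records optionally returned to publishers (diagram
    label: Dublin Core)
  • Conversion on to local MARC format using USEMARCON (diagram
    labels: UNIMARC, ??MARC)
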
43
Conclusions
  • Think about metadata as a process
  • Dublin Core syntax now stable enough to use
  • Use within projects initially
  • Choose metadata management model appropriate to
    your site
  • Consider long term maintenance and transition to
    other formats