Parallel Session on Metadata

About This Presentation

Slides: 28

Provided by: denn116

Transcript and Presenter's Notes

1
Parallel Session on Metadata

The Value of Metadata and How to Realise It
Date: 18th June 2002
Facilitator: Dennis Nicholson
Centre for Digital Library Research

2
Notes and Slides
3
Theme: Examine, Discuss

The value of using metadata as an aid to
reliable retrieval, both within individual Web
sites and across distributed sites
What the barriers to effective use of metadata
are and how they can be overcome
Who should be responsible for creating and
maintaining metadata: resource creators,
web-masters, librarians?

4
Theme: Examine, Discuss

Whether embedding and harvesting or a central
database is the best approach
Plus (if time allows):
A step beyond: the value of Content Management
Systems
Focus: General
My background...

5
Responsibility to...

Stimulate:
Thought, Discussion, Debate
Draw out the important points
Impart ability to apply what we've discovered
Ensure participation
So

6
Individual needs and circumstances?
7
Effective Retrieval

What is it?
Balance of precision and recall best suited to a
given problem
High precision and low recall usually preferred
but in some cases (e.g. patents) there may be an
advantage in lowering precision to boost recall
Level of precision and recall should be under the
user's control, not a side effect of poor metadata
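
A minimal sketch of how the two measures on this slide are usually calculated, assuming we already know which documents are relevant to a query. The function name and the document identifiers are invented for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    retrieved: identifiers the search returned
    relevant:  identifiers that actually answer the user's need
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Precision: what share of the returned items is relevant.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: what share of the relevant items was returned.
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents returned, 3 of them relevant,
# out of 6 relevant documents held on the site.
retrieved = ["doc1", "doc2", "doc3", "doc9"]
relevant = ["doc1", "doc2", "doc3", "doc4", "doc5", "doc6"]
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.50
```

Broadening the query would normally return more of the six relevant documents (higher recall) at the cost of more irrelevant ones (lower precision), which is the trade-off the slide describes.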

8
Effective Retrieval

Why does it matter?
It costs the University and the public purse to
create the material: a waste if the people it is
aimed at can't find it
Strategic/PR considerations: if they can't find
your courses, expertise registers, or digital
images for sale if and when you want or need them
to, they won't use you or talk or write about you

9
Effective Retrieval

When does it matter?
Only if it is stuff you want found
The bigger they come, the sooner they fail
The more stuff you have, the more campuses, or
organisations in a collaboration, the harder it is
to ensure effective retrieval
Especially with no or poor metadata

10
What is metadata?

Metadata is data about data
Consists of things like:
Author, Title, Subject, Description, Level,
Language, Viewer
Appropriate to function
The route to effective retrieval
Maybe...

11
What can go wrong?

Limited penetration (i.e. only some available
documents covered)
Misleading results for users
Different metadata record formats
Can the software cope? Is there a cross-walk?
Incompatible core field sets
Cross-walk not possible
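
As a rough illustration of what a cross-walk does, the sketch below renames the fields of one record format to Dublin Core element names. The local field names and the sample record are hypothetical; only the Dublin Core element names (creator, title, subject, description) are real.

```python
# Hypothetical crosswalk from a local record format to Dublin Core
# element names. The local field names are invented for illustration.
LOCAL_TO_DC = {
    "author": "creator",
    "name": "title",
    "keywords": "subject",
    "summary": "description",
}


def crosswalk(record, mapping):
    """Translate a record's field names using the mapping.

    Returns (translated, unmapped). Fields with no counterpart in the
    target format are reported rather than silently dropped.
    """
    translated, unmapped = {}, []
    for field, value in record.items():
        target = mapping.get(field)
        if target is None:
            unmapped.append(field)
        else:
            translated[target] = value
    return translated, sorted(unmapped)


local_record = {
    "author": "Darwin, Charles",
    "name": "On the Origin of Species",
    "keywords": "Evolution",
    "shelf_code": "QH365",
}
dc_record, unmapped = crosswalk(local_record, LOCAL_TO_DC)
print(dc_record)
print("No crosswalk for:", unmapped)
```

When the core field sets are incompatible, most fields end up in the unmapped list, which is the "cross-walk not possible" case.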

12
What can go wrong?

Different field sub-sets used (Both use DC but
different field set)
Full service limited to common fields
Different fields used for same data element (I
put subject headings in subject field and free
form keywords in the keyword field but you put
subject headings in the keyword field)
Misleading results
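
A small sketch of why differing field sub-sets limit a cross-site service: only the fields every site fills in can be searched everywhere. The two field sets are invented examples, not taken from any real site.

```python
# Hypothetical field sub-sets used by two sites that both use Dublin Core.
site_a_fields = {"title", "creator", "subject", "description", "date"}
site_b_fields = {"title", "creator", "subject", "language"}


def searchable_everywhere(*field_sets):
    """Fields that every site populates (needs at least one field set)."""
    return set.intersection(*field_sets)


common = searchable_everywhere(site_a_fields, site_b_fields)
print(sorted(common))  # ['creator', 'subject', 'title']
```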

13
What can go wrong?

Different or no standards applied in creating
data element content (e.g. Darwin, C. or Charles
Darwin)
Reduced retrieval, varied results
Different or no subject schemes and/or category
lists (Educational levels, LCSH v. UNESCO v. made
up)
Reduced retrieval, varied results
Insufficient granularity (If everything physical
is physics)
Poor precision, high recall
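
One way to picture the "Darwin, C. or Charles Darwin" problem is a lookup against an agreed list of name forms. The sketch below assumes a tiny, hypothetical authority list; a real one would be maintained centrally and be far larger.

```python
import re

# Hypothetical authority list: each variant form points to one agreed heading.
AUTHORITY = {
    "darwin c": "Darwin, Charles",
    "charles darwin": "Darwin, Charles",
    "darwin charles": "Darwin, Charles",
}


def key(name):
    """Lower-case the name and strip punctuation so variants compare equal."""
    return " ".join(re.sub(r"[^\w\s]", " ", name.lower()).split())


def preferred_form(name):
    """Return the agreed heading, or the name unchanged if it is not listed."""
    return AUTHORITY.get(key(name), name)


for variant in ["Darwin, C.", "Charles Darwin", "DARWIN, Charles"]:
    print(variant, "->", preferred_form(variant))
```

Without such a step, a search on one form of the name misses records entered under the other, which is the reduced retrieval noted above.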

14
What can go wrong?

Varied or no methods of central co-ordination (2
sites or campuses)
Can cause some of the other problems listed above
and below
Different sites index different fields (One has
subjects, keywords in one index, another in
separate indices)
Misleading for users

15
What can go wrong?

Missing indices (Nothing on the subject in the
index or no subject index? (2 sites))
Misleading retrieval
Humans can cope but machines can't (A machine
finds it harder to spot different usages of the
same word or alternative words for the same
thing than a human does)
Semantic web won't work

16
Safeguards against

Limited penetration
Policy? Training? DC Dot? Human monitor?
Different formats
Discover need, agree policy, set standards,
ensure software can cope with formats
Incompatible core field sets
Identify formats (DC, IMS, MARC?) then agree core
set of fields (e.g. 15 in DC base)

17
Safeguards against

Different field sub-sets used
Agree, monitor, one core set
Different fields used for same data element
Templates and examples, Central co-ordination,
Guidelines, Training

18
Safeguards against

Different or no standards applied in creating
data element content
Template with examples
Different or no subject schemes and/or category
lists
Agree single schemes or lists, have drop down
lists, upgrade centrally
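
A drop-down list amounts to checking each value against an agreed vocabulary. The sketch below shows that check for records that arrive by another route; the field names and allowed values are hypothetical.

```python
# Hypothetical agreed lists; a real site would maintain these centrally.
ALLOWED = {
    "level": {"Undergraduate", "Postgraduate", "Research"},
    "language": {"en", "fr", "gd"},
}


def check_record(record, allowed=ALLOWED):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for field, vocabulary in allowed.items():
        value = record.get(field)
        if value is None:
            problems.append(f"{field}: missing")
        elif value not in vocabulary:
            problems.append(f"{field}: '{value}' is not in the agreed list")
    return problems


print(check_record({"level": "Undergraduate", "language": "en"}))
print(check_record({"level": "UG"}))
```

Keeping the lists in one place is what makes the "upgrade centrally" step possible: changing the vocabulary changes the check for every contributor.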

19
Safeguards against

Insufficient granularity
Agree usable level, training, examples
Varied or no methods of central co-ordination
(2 sites or campuses)
Make sure it doesn't happen!
Different sites index different fields
Agree approach, implement and monitor standards

20
Safeguards against

Missing indices
Agree not to do this, and warn users if you can't
agree
Humans can cope but machines can't (semantic web)
Use standard schemes, ontologies in standard ways
and map between different ones in a way that your
software can process

21
Where to keep it?

Pros and Cons of
Embedding and harvesting
Metadata creation more likely? Harder to
co-ordinate, easier to resource? More often out
of date? Harder to ensure standardised metadata?
A central database
Easier to co-ordinate, more expensive to
resource? Easier to maintain standards? How to
ensure new stuff notified?
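
For the embedding-and-harvesting option, a harvester might look roughly like the sketch below. It assumes pages carry Dublin Core in HTML meta tags named "DC.title", "DC.creator" and so on, and uses only Python's standard html.parser module. The sample page is invented.

```python
from html.parser import HTMLParser


class DCMetaHarvester(HTMLParser):
    """Collect <meta name="DC.*" content="..."> pairs from one HTML page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        content = attributes.get("content")
        if name.lower().startswith("dc.") and content is not None:
            # Repeated elements (e.g. several subjects) are kept as a list.
            self.fields.setdefault(name, []).append(content)


def harvest(html_text):
    """Return the Dublin Core fields embedded in one page."""
    harvester = DCMetaHarvester()
    harvester.feed(html_text)
    return harvester.fields


# Hypothetical page with embedded metadata.
PAGE = """<html><head>
<meta name="DC.title" content="Example Course Page">
<meta name="DC.creator" content="Example, Author">
<meta name="DC.subject" content="Metadata">
<meta name="DC.subject" content="Information retrieval">
<meta name="keywords" content="not Dublin Core, so ignored">
</head><body></body></html>"""

print(harvest(PAGE))
```

A central database could be filled by running such a harvester over every page, which is one form of the mixed approach: authors embed, the centre collects and checks.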

22
Where to keep it?

Pros and Cons of
A mix of the two?
Worst of both worlds? Or best? How to ensure the
latter? Optimise author input of embedded
metadata but allow central upgrades by metadata
experts? Is this feasible? Is it cost-effective?
Depends on other factors?
A question of designing to be fit for purpose?

23
Whose Responsibility?

Candidates: their pros and cons
Resource creators?
Au fait with the resource; labour saving
Web-masters?
Au fait with the technical landscape
Librarians?
Au fait with knowledge and metadata domains
Public Relations?
Au fait with the needs of the University
Anybody else?
All of the above? Co-ordinated by?

24
Other Related Issues

A CMS would ensure:
Currency, Accuracy, Legality, Authority of
content retrieved by metadata
Not to mention:
Uniform look and feel control; easy total
redesign and global changes; all content tracked;
joint authorship across departments, units,
different institutions; easy repurposing
All who have some responsibility can be involved
in a controlled way?

25
Facilities

It would provide:
Content authoring; collaborative authoring,
editing and workflow; preventing unauthorised
editing or creation; scheduling publication;
tracking changes; personalising; repurposing;
metadata creation; knowledge management through
semantic control

26
Closing Discussion

Who has/plans to have a CMS?
What does it/will it cost?
Are they
Essential? Optional? Impractical? A threat to
academic freedom?
Do they help solve the metadata problem?

27
Useful URLs

Metadata
http://content.lib.washington.edu/METADATA/ (Why
should we care?)
http://www.ukoln.ac.uk/metadata/dcdot/
http://www.ukoln.ac.uk/web-focus/metadata/seminar-materials/exercises/dc-dot/dc-dot.doc
http://www.ukoln.ac.uk/metadata/dcassist/
Content Management Systems
http://www.ukoln.ac.uk/nof/support/help/papers/cms.htm (what are they?)
http://www.ariadne.ac.uk/issue30/techwatch/ (Who
needs them?)
http://www.cultivate-int.org/issue5/cms/ (CMSs
available)