Quality: Traditional and Possible Mechanisms

1
Quality: Traditional and Possible Mechanisms
  • CS 502 200200312
  • Carl Lagoze, Cornell University

2
The Problem
  • Build a large-scale digital library from web
    resources but maintain quality
  • National Science Digital Library
  • Traditional library technique
  • Acquisitions librarians
  • Trusted Sources
  • Professional societies, publishers
  • Patron request
  • Problems in NSDL environment
  • Unfocused audience
  • Scale
  • Variability of resources
  • COPPA (http://www.cdt.org/legislation/105th/privacy/coppa.html)
  • What is quality?

3
General observations of quality
  • Is there such a thing as a shared notion of
    quality?
  • Few good studies
  • Studies with popular culture
  • Amento, Terveen, and Hill (ACM TOIS): study of web sites
    on "Buffy the Vampire Slayer", "Simpsons",
    "Smashing Pumpkins", "Tori Amos"
  • Studies show that expert agreement on quality is
    around 75%
  • Expert agreement differs across different
    categories of information
  • Often a confusion between relevance and quality,
    which are different
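
One simple way an agreement figure like the one above could be computed is pairwise percent agreement: for every pair of experts, count the items on which they gave the same judgment. The sketch below assumes binary good/not-good judgments. The expert names, site names, and judgments are invented for illustration, and the cited studies may have measured agreement differently.

```python
from itertools import combinations


def pairwise_agreement(ratings):
    """Fraction of (expert pair, item) comparisons where both experts agree.

    ratings: dict mapping expert -> dict mapping item -> bool
             (True means "judged high quality"). Hypothetical layout.
    """
    agree = 0
    total = 0
    for a, b in combinations(sorted(ratings), 2):
        # Only compare items that both experts actually rated.
        shared = ratings[a].keys() & ratings[b].keys()
        for item in shared:
            total += 1
            agree += ratings[a][item] == ratings[b][item]
    return agree / total if total else 0.0


# Invented judgments from three hypothetical experts on four sites.
sample_ratings = {
    "expert_a": {"site1": True, "site2": False, "site3": True, "site4": True},
    "expert_b": {"site1": True, "site2": True, "site3": True, "site4": False},
    "expert_c": {"site1": True, "site2": False, "site3": True, "site4": False},
}

if __name__ == "__main__":
    print(f"agreement: {pairwise_agreement(sample_ratings):.2f}")
```

Percent agreement does not correct for agreement that would occur by chance, so it is best read as a rough indicator only.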

4
What is quality on the Web?
  • Factors
  • Site Layout
  • Site Organization
  • Uniqueness of information
  • Reputation of publisher

5
(No Transcript)
6
Current Quality Strategy 1: The Reader Looks for Clues
Internal clues can inform an experienced reader
All that glisters is not gold. And vice versa.
7
(No Transcript)
8
Considerations
Publisher, ACM, is a well-known scientific society that follows standard procedures for peer review.
Editor-in-chief is a well-known professor in a strong department (http://www.acm.org/jacm/Editors.html).
Papers in theoretical computer science can be reviewed from their content.
Gold
9
(No Transcript)
10
Considerations
Looks the same as the Journal of the ACM, but ...
Procedures for selecting and reviewing conference papers are loosely controlled.
Papers in applications research are difficult to evaluate by superficial reading.
Not gold
11
(No Transcript)
12
Considerations
The appearance looks like a draft.
Nothing technical from 1981 is current.
Who is DARPA anyway?
Yet ...
This is the official definition of IP: http://www.ietf.org/rfc/rfc0791.txt?number=791
Gold
13
(No Transcript)
14
Considerations
The appearance looks like a joke.
URL looks suspicious (strange spelling).
What's with the graphic?
Yet ...
This is the working literature of physics research: http://arxiv.org
Gold
15
Current Quality Strategy 2: The Publisher as Creator
Materials are written by authors or selected by
curators who are employed by the publisher.
Quality is tied to the reputation of the
publisher.
16
(No Transcript)
17
(No Transcript)
18
Current Quality Strategy 3: External Readers Chosen by the Publisher
Publishers ask external experts to review
materials
19
(No Transcript)
20
(No Transcript)
21
Observations about Peer Review
At its best, it is superb. At its worst, it validates junk.
Some topics can be reviewed from a paper, e.g., mathematics.
Some topics cannot be reviewed from a paper, e.g., computer systems.
"Whatever you do, write a paper. Some journal will publish it."
Advice to young faculty member, University of Sussex, 1972.
22
Current Quality Strategy 4: Independent Reviews
Reviewers, hopefully independent of the author
and publisher, describe their opinion of the
item. Value of the review to the user depends on
(a) the reputation of where the review is
published and (b) how well it is done.
23
(No Transcript)
24
(No Transcript)
25
(No Transcript)
26
(No Transcript)
27
Citation Analysis
  • Understanding citation patterns among scholarly
    journals
  • Quality metric on journals (not on individual
    articles or scholars)
  • Cost/benefit analysis: what basic journals
    should a library have in its holdings?
  • Eugene Garfield: father of citation analysis
  • Science Citation Index
  • Origins: circa 1950s
  • Hand analysis of printed journals showing
    patterns of citations into and out from journals
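
The hand tally of citations into and out from journals described above can be sketched in a few lines. This is only an illustration of the counting step, not Garfield's actual procedure. The journal names and citation pairs are invented, and the input format (one pair per citation) is an assumption.

```python
from collections import Counter


def journal_citation_flows(citation_pairs):
    """Count citations into and out from each journal.

    citation_pairs: iterable of (citing_journal, cited_journal) tuples,
                    one tuple per citation. Hypothetical input format.
    Returns (incoming, outgoing) as two Counters keyed by journal name.
    """
    incoming = Counter()
    outgoing = Counter()
    for citing, cited in citation_pairs:
        outgoing[citing] += 1
        incoming[cited] += 1
    return incoming, outgoing


# Invented data: each pair is one citation found in a printed issue.
sample_pairs = [
    ("Journal A", "Journal B"),
    ("Journal A", "Journal B"),
    ("Journal C", "Journal B"),
    ("Journal B", "Journal A"),
    ("Journal C", "Journal A"),
]

if __name__ == "__main__":
    incoming, outgoing = journal_citation_flows(sample_pairs)
    # Journals with the most incoming citations come first.
    for journal, count in incoming.most_common():
        print(journal, "cited", count, "times; cites others", outgoing[journal], "times")
```

A library weighing which journals to hold could start from the incoming counts, which is the journal-level (not article-level) view the slide describes.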

28
Concepts: References and Citations
[Diagram of three documents: Doc1, Doc2, Doc3]
Doc1 references: (Doc2, Doc3)
Doc1 citations: (Doc2)
29
Concepts: References and Citations
  • # of references of a document is finite, stable,
    and easy to determine/compute
  • # of citations of a document is dynamic,
    impossible to compute (infinite)
  • Generally, references are at the work, or
    manifestation level, NOT at the item level
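
The asymmetry between references and citations can be seen in a small sketch. Each document carries its own fixed reference list, while its citations can only be derived by inverting the reference lists of every document known so far, so the answer changes whenever a new document appears. The Doc1 to Doc3 relationships follow one reading of the diagram labels in this deck; Doc4 is invented.

```python
from collections import defaultdict


def citations_from_references(references):
    """Invert reference lists to find who cites each document.

    references: dict mapping document -> list of documents it references.
    Returns a dict mapping document -> set of documents that cite it.
    """
    cited_by = defaultdict(set)
    for doc, refs in references.items():
        for ref in refs:
            cited_by[ref].add(doc)
    return dict(cited_by)


# Doc1 references Doc2 and Doc3; Doc2 in turn references Doc1.
known_references = {
    "Doc1": ["Doc2", "Doc3"],
    "Doc2": ["Doc1"],
    "Doc3": [],
}

if __name__ == "__main__":
    print(citations_from_references(known_references).get("Doc1"))
    # A newly published document (hypothetical) changes Doc1's citations,
    # but Doc1's own reference list stays exactly as it was.
    known_references["Doc4"] = ["Doc1"]
    print(citations_from_references(known_references).get("Doc1"))
```

The citation set is only ever complete relative to the collection that was scanned, which is the practical sense in which it cannot be computed once and for all.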

30
Results of citation analysis
Acks: Garfield, Science, 1972
31
Citation analysis in the digital age
  • Automatic citation linking among papers in arXiv
  • Citebase (Open Citations Project)
  • http://citebase.eprints.org/cgi-bin/search?submit=1&author=Hawking%2C%20S%20W%20
  • Scientometrics - Automation of methods reveals
    lots of data
  • Longevity of interest in paper
  • Journal and ePrint citation patterns
  • Automatic citation analysis as a reviewing tool?

32
Are papers downloaded then cited or cited then
downloaded? (2)
  • If all these time differences are plotted the
    above graph is produced.

Acks: S. Harnad
33
Citation Latencies
  • The raw data show that the latency of the
    citation peak has been decreasing over the period
    of the archive

Acks: S. Harnad
34
Author Impact Quartiles
  • High impact authors update more than medium or
    low
  • High and medium impact authors deposit more
    papers than low

Acks: S. Harnad
35
Citation Quality
  • Papers generally cite papers of like impact

Acks: S. Harnad
36
Citation Spread
  • A small number of papers receive a very large
    number of citations

Acks: S. Harnad
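
A skewed citation spread of this kind can be quantified as the share of all citations received by the most-cited fraction of papers. The sketch below is a minimal illustration. The citation counts are invented and are not taken from the data behind the slide.

```python
def top_share(citation_counts, top_fraction=0.1):
    """Share of all citations that go to the most-cited papers.

    citation_counts: list of citation counts, one per paper.
    top_fraction:    fraction of papers treated as "the top" (at least one paper).
    """
    counts = sorted(citation_counts, reverse=True)
    total = sum(counts)
    if not counts or total == 0:
        return 0.0
    top_n = max(1, int(len(counts) * top_fraction))
    return sum(counts[:top_n]) / total


# Invented counts for ten papers: one paper dominates.
sample_counts = [120, 40, 9, 6, 5, 4, 3, 2, 1, 0]

if __name__ == "__main__":
    share = top_share(sample_counts, top_fraction=0.1)
    print(f"top 10% of papers receive {share:.0%} of citations")
```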
37
How Paper Impact Affects Usage
  • Higher impact papers have a longer download life
    expectancy.

Acks: S. Harnad
38
What is the correlation between citations and
downloads?
  • There is a significant positive correlation
    between citations and downloads for high impact
    papers.

Acks: S. Harnad
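
For readers who want to check a relationship like this on their own data, a Pearson correlation coefficient is one common measure. The sketch below implements it directly. The choice of Pearson's r and the sample numbers are assumptions made for illustration; the slide does not say which statistic was used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Invented per-paper figures for a hypothetical high-impact group.
sample_citations = [12, 30, 45, 60, 85]
sample_downloads = [150, 310, 400, 640, 800]

if __name__ == "__main__":
    r = pearson(sample_citations, sample_downloads)
    print(f"r = {r:.3f}")
```

A value near +1 indicates that papers with more citations also tend to have more downloads. Judging significance additionally requires a test that takes the sample size into account.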
39
Automatic Reviewing Techniques
  • Traditional Collaborative Filtering
  • Estimate what score a reviewer might give to an
    item that he/she has not scored yet
  • Frequently used by recommender systems
  • Use of user profiles
  • Collaborative quality filtering
  • http://www.cs.berkeley.edu/tracyr/project/
  • Attempts to automatically determine which
    reviewers are "good" in an open reviewing system,
    in order to provide the same (or better) benefits
    as peer review
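
The traditional collaborative filtering step, estimating the score a reviewer might give to an item not yet scored, can be sketched as a similarity-weighted average of other reviewers' scores. This is a deliberately simplified neighbourhood scheme, not the method of any particular recommender system. The similarity function, reviewer names, and scores are all assumptions.

```python
def similarity(scores_a, scores_b):
    """Similarity in (0, 1] based on mean absolute difference on co-rated items."""
    shared = scores_a.keys() & scores_b.keys()
    if not shared:
        return 0.0
    mean_abs_diff = sum(abs(scores_a[i] - scores_b[i]) for i in shared) / len(shared)
    return 1.0 / (1.0 + mean_abs_diff)


def predict(scores, reviewer, item):
    """Predict reviewer's score for item from similar reviewers who scored it.

    scores: dict mapping reviewer -> dict mapping item -> numeric score.
    Returns None when no other reviewer with any overlap has scored the item.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_scores in scores.items():
        if other == reviewer or item not in other_scores:
            continue
        weight = similarity(scores[reviewer], other_scores)
        weighted_sum += weight * other_scores[item]
        weight_total += weight
    return weighted_sum / weight_total if weight_total else None


# Invented profiles: ann agrees with bob and disagrees with cy.
sample_scores = {
    "ann": {"p1": 5, "p2": 1},
    "bob": {"p1": 5, "p2": 1, "p3": 4},
    "cy": {"p1": 1, "p2": 5, "p3": 2},
}

if __name__ == "__main__":
    print(predict(sample_scores, "ann", "p3"))
```

Because ann's past scores match bob's, the prediction for p3 is pulled toward bob's score rather than cy's.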

40
Collaborative Quality Filtering Algorithm
  • Assume true value of an item is the asymptotic
    average of review scores
  • Good reviewers are those who consistently predict
    this average
  • Normalize according to # of reviews of an item,
    # of reviews by reviewer, review latency
  • Adjust by expertise
  • Use similarity of term vectors of items reviewed
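
The first two steps above can be sketched as follows: take the average of the review scores as a stand-in for an item's true value, then rate each reviewer by how closely their scores track that average. This is a rough sketch under stated assumptions, not the project's published algorithm. It uses the current average instead of an asymptotic one, and it omits the normalization and expertise adjustments. All names and scores are invented.

```python
from collections import defaultdict


def item_means(reviews):
    """Average score per item. reviews: list of (reviewer, item, score)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for _reviewer, item, score in reviews:
        sums[item] += score
        counts[item] += 1
    return {item: sums[item] / counts[item] for item in sums}


def reviewer_quality(reviews):
    """Score each reviewer in (0, 1]; higher means closer to the item averages."""
    means = item_means(reviews)
    errors = defaultdict(list)
    for reviewer, item, score in reviews:
        errors[reviewer].append(abs(score - means[item]))
    return {
        reviewer: 1.0 / (1.0 + sum(errs) / len(errs))
        for reviewer, errs in errors.items()
    }


# Invented reviews: r1 and r2 agree with each other, r3 is the outlier.
sample_reviews = [
    ("r1", "a", 4), ("r2", "a", 4), ("r3", "a", 1),
    ("r1", "b", 2), ("r2", "b", 2), ("r3", "b", 5),
]

if __name__ == "__main__":
    for reviewer, quality in sorted(reviewer_quality(sample_reviews).items()):
        print(reviewer, round(quality, 3))
```

Note that each reviewer's own score contributes to the average they are compared against, which flatters reviewers of rarely reviewed items. That is one reason normalizing by the number of reviews per item matters.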

41
(No Transcript)
42
Annotation Systems
  • Worked successfully in many cases in Web
    environment
  • Amazon
  • Most successful when combined with reputation
    systems
  • eBay
  • Problems with existing systems
  • Natural language
  • Closed/private systems
  • Non-extensible

43
Annotea: Open Web Infrastructure for Shared Web Annotations
  • http://www.w3.org/2001/Annotea/
  • Annotations as class of metadata
  • External to the document and stored on an
    annotation server
  • Primitive RDF class: annotation
  • Sub-classed in various ways: Advice, Change,
    Example, Explanation, Question, See Also
  • Ratings can be formally expressed and machine
    readable
  • Storage of annotations in an RDF database on
    annotation servers that can be queried.
  • Information in multiple annotation servers can be
    merged
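
The idea that annotations live outside the document as RDF statements, and that several servers' holdings can be merged, can be illustrated with plain subject-predicate-object triples. The sketch below uses Python sets in place of a real RDF store. The property names, identifiers, and URLs are simplified placeholders chosen for illustration, not the actual Annotea vocabulary or protocol.

```python
def make_annotation(ann_id, kind, target, creator, body):
    """Describe one annotation as a set of (subject, predicate, object) triples.

    Predicate names here are illustrative placeholders, not Annotea's schema.
    """
    return {
        (ann_id, "type", kind),
        (ann_id, "annotates", target),
        (ann_id, "creator", creator),
        (ann_id, "body", body),
    }


def merge(*stores):
    """Merge triple sets from several annotation servers (set union)."""
    merged = set()
    for store in stores:
        merged |= store
    return merged


def annotations_for(store, target):
    """Query: ids of all annotations attached to the given document."""
    return sorted(s for s, p, o in store if p == "annotates" and o == target)


# Two hypothetical servers, each holding one annotation on the same page.
page = "http://example.org/lesson.html"
server_one = make_annotation("ann:1", "Question", page, "alice", "Is this still current?")
server_two = make_annotation("ann:2", "Advice", page, "bob", "Best suited to grades 6-8.")

if __name__ == "__main__":
    combined = merge(server_one, server_two)
    print(annotations_for(combined, page))
```

Because the merged store is a set of statements, the same annotation arriving from two servers collapses to a single copy, which is what makes merging straightforward in this model.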

44
Annotea System Architecture
45
Annotea RDF data model
46
Annotations in the NSDL