Title: Quality: Traditional and Possible Mechanisms
1 Quality: Traditional and Possible Mechanisms
- CS 502 200200312
- Carl Lagoze, Cornell University
2 The Problem
- Build a large-scale digital library from Web resources but maintain quality
  - National Science Digital Library (NSDL)
- Traditional library technique
  - Acquisitions librarians
  - Trusted sources
    - Professional societies, publishers
  - Patron request
- Problems in the NSDL environment
  - Unfocused audience
  - Scale
  - Variability of resources
  - COPPA (http://www.cdt.org/legislation/105th/privacy/coppa.html)
- What is quality?
3 General observations of quality
- Is there such a thing as a shared notion of quality?
  - Few good studies
  - Studies with popular culture
    - Amento, Terveen, Hill: ACM TOIS study of Web sites on Buffy the Vampire Slayer, Simpsons, Smashing Pumpkins, Tori Amos
- Studies show that expert agreement on quality is around 75%
- Expert agreement varies across categories of information
- Relevance and quality are often confused, but they are different
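One simple way to put a number like "75% agreement" on expert judgments is pairwise percent agreement: over every pair of experts and every item, count how often the two gave the same rating. This is a minimal sketch assuming binary good/not-good ratings and made-up data; the studies mentioned above may have used a different agreement measure (for example, a chance-corrected statistic such as Cohen's kappa).

```python
from itertools import combinations

def pairwise_agreement(ratings: dict[str, list[int]]) -> float:
    """Fraction of (expert pair, item) comparisons on which both experts agree.

    `ratings` maps a hypothetical expert name to that expert's ratings,
    one per item, with every list in the same item order.
    """
    agree = total = 0
    for a, b in combinations(ratings.values(), 2):
        for x, y in zip(a, b):
            agree += (x == y)
            total += 1
    return agree / total

# Illustrative data only: 1 = "good quality", 0 = "not good", for 4 items.
experts = {
    "expert_a": [1, 1, 0, 1],
    "expert_b": [1, 0, 0, 1],
    "expert_c": [1, 1, 0, 0],
}
print(pairwise_agreement(experts))  # 8 of 12 comparisons agree
```

Percent agreement is easy to read but does not correct for agreement expected by chance, which matters when one rating dominates.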
4 What is quality on the Web?
- Factors
- Site Layout
- Site Organization
- Uniqueness of information
- Reputation of publisher
6 Current Quality Strategy 1: The Reader Looks for Clues
Internal clues can inform an experienced reader
All that glisters is not gold. And vice versa.
8 Considerations
The publisher, ACM, is a well-known scientific society that follows standard procedures for peer review. The editor-in-chief is a well-known professor in a strong department (http://www.acm.org/jacm/Editors.html). Papers in theoretical computer science can be reviewed from their content.
Gold
10 Considerations
Looks the same as the Journal of the ACM, but ... procedures for selecting and reviewing conference papers are loosely controlled. Papers in applications research are difficult to evaluate by superficial reading.
Not gold
12 Considerations
It looks like a draft. Nothing technical from 1981 is current. Who is DARPA anyway? Yet ... this is the official definition of IP (http://www.ietf.org/rfc/rfc0791.txt?number=791).
Gold
14 Considerations
It looks like a joke. The URL looks suspicious (strange spelling). What's with the graphic? Yet ... this is the working literature of physics research (http://arxiv.org).
Gold
15 Current Quality Strategy 2: The Publisher as Creator
Materials are written by authors or selected by
curators who are employed by the publisher.
Quality is tied to the reputation of the
publisher.
18 Current Quality Strategy 3: External Readers Chosen by the Publisher
Publishers ask external experts to review
materials
21 Observations about Peer Review
At its best, it is superb. At its worst, it validates junk. Some topics can be reviewed from a paper, e.g., mathematics. Some topics cannot be reviewed from a paper, e.g., computer systems.
"Whatever you do, write a paper. Some journal will publish it." (Advice to a young faculty member, University of Sussex, 1972.)
22 Current Quality Strategy 4: Independent Reviews
Reviewers, hopefully independent of the author
and publisher, describe their opinion of the
item. Value of the review to the user depends on
(a) the reputation of where the review is
published and (b) how well it is done.
27 Citation Analysis
- Understanding citation patterns among scholarly journals
- Quality metric on journals (not on individual articles or scholars)
- Cost/benefit analysis: which basic journals should a library have in its holdings?
- Eugene Garfield: father of citation analysis
  - Science Citation Index
- Origins circa 1950s
  - Hand analysis of printed journals showing patterns of citations into and out of journals
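Garfield's journal-level view can be sketched by aggregating article-level citation links into journal-to-journal counts, then totaling the citations flowing into and out of each journal. This is an illustrative sketch only: the article identifiers, journal names, and links are hypothetical, and no particular index's counting rules (citation windows, self-citation handling) are modeled.

```python
from collections import Counter

# Hypothetical data: the journal in which each article appeared.
journal_of = {"a1": "J. Phys", "a2": "J. Phys", "a3": "J. Chem", "a4": "J. Bio"}

# Hypothetical article-level citation links: (citing article, cited article).
links = [("a1", "a3"), ("a2", "a3"), ("a3", "a1"), ("a4", "a3"), ("a4", "a1")]

# Journal-to-journal flows: how often articles in one journal cite another journal.
flows = Counter((journal_of[citing], journal_of[cited]) for citing, cited in links)

# Citations flowing into and out from each journal.
cited_in, cites_out = Counter(), Counter()
for (citing_j, cited_j), n in flows.items():
    cites_out[citing_j] += n
    cited_in[cited_j] += n

print(cited_in.most_common())  # [('J. Chem', 3), ('J. Phys', 2)]
print(cites_out.most_common())
```

Because the counts are rolled up to journals, the result says nothing about any individual article, which matches the slide's point that this is a journal-level metric.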
28 Concepts: References and Citations
[Figure: Doc1 references Doc2 and Doc3; Doc1 is cited by Doc2]
29 Concepts: References and Citations
- The number of references in a document is finite, stable, and easy to determine/compute
- The number of citations to a document is dynamic and impossible to compute completely (potentially unbounded)
- Generally, references are at the work or manifestation level, NOT at the item level
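The asymmetry between references and citations can be sketched in a few lines: a document's references are read off the document itself, while its citations come from inverting the reference lists of whatever collection we happen to hold, so they change as the collection grows. Doc1 to Doc3 follow the example on slide 28; Doc4 is a hypothetical later addition.

```python
from collections import defaultdict

def invert(references: dict[str, set[str]]) -> dict[str, set[str]]:
    """Build citation lists (who cites me) from reference lists (whom I cite)."""
    cited_by = defaultdict(set)
    for citing, refs in references.items():
        for cited in refs:
            cited_by[cited].add(citing)
    return cited_by

# References are fixed once a document is written.
references = {
    "Doc1": {"Doc2", "Doc3"},
    "Doc2": {"Doc1"},
    "Doc3": set(),
}
print(sorted(invert(references)["Doc1"]))  # ['Doc2']

# Adding a hypothetical new document never changes Doc1's references,
# but it does change Doc1's citations.
references["Doc4"] = {"Doc1"}
print(sorted(invert(references)["Doc1"]))  # ['Doc2', 'Doc4']
```

Any citation count computed this way is only as complete as the collection being inverted.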
30 Results of citation analysis
Acks: Garfield, Science, 1972
31 Citation analysis in the digital age
- Automatic citation linking among papers in arXiv
  - Citebase (Open Citations Project)
  - http://citebase.eprints.org/cgi-bin/search?submit=1&author=Hawking%2C%20S%20W%20
- Scientometrics: automation of methods reveals lots of data
  - Longevity of interest in a paper
  - Journal and ePrint citation patterns
- Automatic citation analysis as a reviewing tool?
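Citebase's own linking method is not described here. As a hedged illustration, one simple ingredient of automatic citation linking is spotting arXiv identifiers inside free-text reference strings and turning them into links between papers. The regular expression below is an assumption that covers old-style identifiers such as hep-th/9901001; it is not a complete grammar for arXiv IDs, and the reference strings are placeholders.

```python
import re

# Old-style arXiv identifiers (an assumed, simplified pattern): archive name,
# optional subject class, a slash, then YYMM plus a 3-digit number.
ARXIV_ID = re.compile(r"\b([a-z]+(?:-[a-z]+)*(?:\.[A-Z]{2})?/\d{7})\b")

def link_references(citing_id: str, reference_strings: list[str]) -> list[tuple[str, str]]:
    """Return (citing paper, cited paper) links for references that name an arXiv ID."""
    links = []
    for ref in reference_strings:
        for cited_id in ARXIV_ID.findall(ref):
            links.append((citing_id, cited_id))
    return links

# Placeholder reference strings; only the first and last contain arXiv IDs.
refs = [
    "A. Author, Some title, hep-th/9901001",
    "B. Author, unpublished notes (1999)",
    "C. Author, Another title, math.AG/0101001",
]
print(link_references("hep-th/0201001", refs))
```

References that cite only a journal volume and page would need a separate matching step against journal metadata.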
32 Are papers downloaded then cited, or cited then downloaded? (2)
- Plotting all of these time differences produces the graph above.
Acks: S. Harnad
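The slide does not say exactly how the time differences were measured. As a hedged sketch, assume each paper has a date when its downloads peaked and a date when its citations peaked; the sign of the difference then says whether the paper was downloaded first or cited first, and a histogram of all the differences would give a plot like the one described. Paper IDs and dates are hypothetical.

```python
from datetime import date

# Hypothetical per-paper dates: (download peak, citation peak).
papers = {
    "p1": (date(2000, 3, 1), date(2000, 9, 1)),
    "p2": (date(2001, 1, 15), date(2000, 12, 1)),
    "p3": (date(1999, 6, 1), date(2000, 2, 1)),
}

def lag_days(download_peak: date, citation_peak: date) -> int:
    """Positive: downloads peaked first, then citations. Negative: citations first."""
    return (citation_peak - download_peak).days

lags = {pid: lag_days(d, c) for pid, (d, c) in papers.items()}
downloaded_first = sum(1 for lag in lags.values() if lag > 0)
cited_first = sum(1 for lag in lags.values() if lag < 0)
print(lags)
print(downloaded_first, cited_first)  # 2 1
```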
33 Citation Latencies
- The raw data show that the latency of the citation peak has been reducing over the period of the archive
Acks: S. Harnad
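Citation-peak latency can be measured per paper as the number of months from deposit to the month with the most citations, then averaged by deposit year to see whether it shrinks over time. This is an assumed way of computing it, using hypothetical monthly counts; the original analysis may have binned or smoothed the data differently.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical data: (deposit year, citations in each month after deposit),
# where index 0 is the month of deposit.
papers = [
    (1995, [0, 1, 2, 3, 5, 4, 2, 1]),
    (1995, [0, 0, 1, 2, 2, 6, 3, 1]),
    (2000, [1, 4, 3, 2, 1, 0, 0, 0]),
    (2000, [0, 3, 5, 2, 1, 1, 0, 0]),
]

def peak_latency(monthly_citations: list[int]) -> int:
    """Months from deposit to the month with the most citations (first peak on ties)."""
    return max(range(len(monthly_citations)), key=monthly_citations.__getitem__)

by_year = defaultdict(list)
for year, monthly in papers:
    by_year[year].append(peak_latency(monthly))

for year in sorted(by_year):
    print(year, mean(by_year[year]))  # 1995 4.5, then 2000 1.5
```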
34 Author Impact Quartiles
- High-impact authors update more often than medium- or low-impact authors
- High- and medium-impact authors deposit more papers than low-impact authors
Acks: S. Harnad
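Grouping authors into impact quartiles can be sketched by ranking them by citations received, cutting the ranking into four parts of (nearly) equal size, and comparing deposit counts per group. Author names, counts, and the use of total citations as the impact measure are assumptions for illustration.

```python
from statistics import mean

# Hypothetical authors: (total citations received, papers deposited).
authors = {
    "a": (120, 14), "b": (95, 11), "c": (40, 9), "d": (35, 6),
    "e": (12, 4), "f": (10, 5), "g": (3, 2), "h": (1, 1),
}

def impact_quartiles(authors: dict[str, tuple[int, int]]) -> list[list[str]]:
    """Split authors into four rank-ordered groups by citations, highest impact first."""
    ranked = sorted(authors, key=lambda name: authors[name][0], reverse=True)
    groups: list[list[str]] = [[] for _ in range(4)]
    for rank, name in enumerate(ranked):
        groups[rank * 4 // len(ranked)].append(name)
    return groups

for q, group in enumerate(impact_quartiles(authors), start=1):
    print(f"Q{q}", group, mean(authors[name][1] for name in group))
```

Using `rank * 4 // len(ranked)` always yields four groups (given at least four authors), with sizes differing by at most one.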
35 Citation Quality
- Papers generally cite papers of similar impact
Acks: S. Harnad
36 Citation Spread
- A small number of papers receive a very large number of citations
Acks: S. Harnad
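A quick way to quantify how concentrated citations are is the share of all citations that goes to the most-cited 10% of papers. The counts below are hypothetical and chosen only to show a skewed distribution.

```python
def top_share(citation_counts: list[int], fraction: float = 0.1) -> float:
    """Share of all citations received by the most-cited `fraction` of papers."""
    ranked = sorted(citation_counts, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)

# Hypothetical citation counts for 20 papers: a few cited heavily, most rarely.
counts = [250, 120, 15, 9, 7, 5, 4, 4, 3, 3, 2, 2, 1, 1, 1, 0, 0, 0, 0, 0]
print(top_share(counts))  # 2 of 20 papers receive about 87% of the citations
```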
37 How Paper Impact Affects Usage
- Higher-impact papers have a longer download life expectancy.
Acks: S. Harnad
38 What is the correlation between citations and downloads?
- There is a significant positive correlation between citations and downloads for high-impact papers.
Acks: S. Harnad
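The slide does not say which correlation coefficient was used. Assuming Pearson's r, the computation looks like this; the download and citation totals are hypothetical, and the significance test behind "significant" is not shown.

```python
from math import sqrt

def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical high-impact papers: total downloads and total citations.
downloads = [900, 1500, 2200, 3100, 4000]
citations = [40, 55, 90, 120, 160]
print(round(pearson(downloads, citations), 3))
```

On Python 3.10 and later, `statistics.correlation` computes the same coefficient.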
39 Automatic Reviewing Techniques
- Traditional collaborative filtering
  - Estimate what score a reviewer might give to an item that he/she has not scored yet
  - Frequently used by recommender systems
  - Use of user profiles
- Collaborative quality filtering
  - http://www.cs.berkeley.edu/~tracyr/project/
  - Attempts to automatically determine which reviewers are "good" in an open reviewing system, in order to provide the same (or better) benefits as peer review
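The "estimate a missing score" step of traditional collaborative filtering is often done with a neighborhood method. The sketch below uses one common textbook variant (Pearson similarity between reviewers, then a mean-centered weighted average), not necessarily the method any particular recommender system uses. Reviewer names, items, and scores are hypothetical.

```python
from math import sqrt

# Hypothetical ratings: reviewer -> {item: score on a 1-5 scale}.
ratings = {
    "ann":  {"i1": 5, "i2": 3, "i3": 4},
    "bob":  {"i1": 4, "i2": 2, "i3": 4, "i4": 5},
    "cara": {"i1": 1, "i2": 5, "i4": 2},
}

def mean_score(user: str) -> float:
    scores = list(ratings[user].values())
    return sum(scores) / len(scores)

def similarity(u: str, v: str) -> float:
    """Pearson correlation over items both reviewers scored (0.0 if undefined)."""
    common = ratings[u].keys() & ratings[v].keys()
    if len(common) < 2:
        return 0.0
    mu = sum(ratings[u][i] for i in common) / len(common)
    mv = sum(ratings[v][i] for i in common) / len(common)
    num = sum((ratings[u][i] - mu) * (ratings[v][i] - mv) for i in common)
    den = sqrt(sum((ratings[u][i] - mu) ** 2 for i in common)) * sqrt(
        sum((ratings[v][i] - mv) ** 2 for i in common))
    return num / den if den else 0.0

def predict(user: str, item: str) -> float:
    """Estimate `user`'s score for an unscored item from other reviewers' deviations."""
    num = den = 0.0
    for other in ratings:
        if other == user or item not in ratings[other]:
            continue
        w = similarity(user, other)
        num += w * (ratings[other][item] - mean_score(other))
        den += abs(w)
    return mean_score(user) + (num / den if den else 0.0)

print(round(predict("ann", "i4"), 2))  # 4.94
```

Here cara disagrees with ann on every shared item, so her low score for i4 counts as evidence that ann would rate it highly.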
40 Collaborative Quality Filtering Algorithm
- Assume the true value of an item is the asymptotic average of its review scores
- Good reviewers are those who consistently predict this average
- Normalize according to the number of reviews of an item, the number of reviews by the reviewer, and review latency
- Adjust by expertise
  - Use similarity of term vectors of items reviewed
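The first two steps of the algorithm above can be sketched as follows: take each item's current average score as a stand-in for its asymptotic "true value", score reviewers by how far they typically deviate from it, and re-weight item values accordingly. The 1 / (1 + mean error) weighting is an assumption of this sketch, and the normalizations for review counts, latency, and expertise are left out; this is not the published collaborative quality filtering implementation. Reviewers, items, and scores are hypothetical.

```python
from collections import defaultdict

# Hypothetical open-review data: (reviewer, item, score on a 1-5 scale).
reviews = [
    ("r1", "x", 4), ("r2", "x", 4), ("r3", "x", 1),
    ("r1", "y", 2), ("r2", "y", 3), ("r3", "y", 5),
    ("r1", "z", 5), ("r2", "z", 4),
]

def item_averages(reviews):
    """Stand-in for each item's 'true value': the plain average of its scores."""
    scores = defaultdict(list)
    for _, item, score in reviews:
        scores[item].append(score)
    return {item: sum(s) / len(s) for item, s in scores.items()}

def reviewer_weights(reviews, averages):
    """Assumed weighting: 1 / (1 + mean absolute deviation from item averages)."""
    errors = defaultdict(list)
    for reviewer, item, score in reviews:
        errors[reviewer].append(abs(score - averages[item]))
    return {r: 1 / (1 + sum(e) / len(e)) for r, e in errors.items()}

def weighted_values(reviews, weights):
    """Re-estimate item values, trusting reviewers who track the consensus more."""
    num, den = defaultdict(float), defaultdict(float)
    for reviewer, item, score in reviews:
        num[item] += weights[reviewer] * score
        den[item] += weights[reviewer]
    return {item: num[item] / den[item] for item in num}

avg = item_averages(reviews)
weights = reviewer_weights(reviews, avg)
print(weights)
print(weighted_values(reviews, weights))
```

Reviewer r3 deviates most from the consensus, so the outlying low score for item x counts for less in the re-weighted value; repeating the two steps until the values stop changing is one natural extension.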
42 Annotation Systems
- Have worked successfully in many cases in the Web environment
  - Amazon
- Most successful when combined with reputation systems
  - eBay
- Problems with existing systems
  - Natural language
  - Closed/private systems
  - Non-extensible
43 Annotea: Open Web Infrastructure for Shared Web Annotations
- http://www.w3.org/2001/Annotea/
- Annotations as a class of metadata
  - External to the document and stored on an annotation server
- Primitive RDF class: Annotation
  - Sub-classed in various ways: Advice, Change, Example, Explanation, Question, See Also
- Ratings can be formally expressed and machine-readable
- Annotations are stored in RDF databases on annotation servers that can be queried
- Information in multiple annotation servers can be merged
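To make "annotations as RDF" and "merging annotation servers" concrete, here is a minimal sketch using plain Python triples rather than an RDF library. The namespace URIs are those of the W3C Annotea schemas as we understand them (verify against the published schemas); the page, annotation, and body URIs and the creator names are hypothetical.

```python
# Namespace URIs: Annotea annotation and annotation-type schemas, plus RDF and Dublin Core.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
A = "http://www.w3.org/2000/10/annotation-ns#"
AT = "http://www.w3.org/2000/10/annotationType#"
DC = "http://purl.org/dc/elements/1.1/"

Triple = tuple[str, str, str]

def annotation(uri: str, target: str, kind: str, creator: str, body: str) -> set[Triple]:
    """RDF triples for one annotation of `target`, typed with an Annotea subclass."""
    return {
        (uri, RDF + "type", A + "Annotation"),
        (uri, RDF + "type", AT + kind),  # e.g. Advice, Question, SeeAlso
        (uri, A + "annotates", target),
        (uri, DC + "creator", creator),
        (uri, A + "body", body),
    }

def annotations_of(graph: set[Triple], target: str) -> list[str]:
    """Query: URIs of every annotation in `graph` that annotates `target`."""
    return sorted(s for s, p, o in graph if p == A + "annotates" and o == target)

# Hypothetical resource and annotation-server URIs.
page = "http://www.example.org/lessons/ip.html"
server_one = annotation("http://annotations-one.example.org/a1", page, "Advice",
                        "Reviewer A", "http://annotations-one.example.org/a1/body")
server_two = annotation("http://annotations-two.example.org/a7", page, "Question",
                        "Reviewer B", "http://annotations-two.example.org/a7/body")

# With no blank nodes, merging the two servers' data is a set union of triples.
merged = server_one | server_two
print(annotations_of(merged, page))
```

If the servers used blank nodes, a real merge would have to rename them apart before taking the union.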
44 Annotea System Architecture
45 Annotea RDF data model
46 Annotations in the NSDL