Through the Bytes Darkly,

Transcript and Presenter's Notes

Title: Through the Bytes Darkly,


1
Through the Bytes Darkly,
Management Information and the Digital Library
Information Technology Interest Group, ACRL New England Chapter, May 17, 2002
Joe Zucca, Assessment, Planning and Publications Librarian, University of Pennsylvania Library
2
Four Sections of This Presentation
1. Environmental Audit: Key Factors That Influence Our Ability to Measure Digital Information Use
2. From Low Resolution to High Resolution Data: Mining the Server Logs
3. The Data Farm Experiment: Tools That Serve Access Can Also Serve Measurement
4. Why the Data Are Important
3
Measuring Electronic Use at Penn: Environmental Influences
1. Organization and Culture
Strategic Focus: Base planning, goal setting/assessment on empirical evidence. From 1996, an element of Penn's Strategic Plan.
Operational Imperatives:
1) Make evaluation and measurement a component of each program and project.
2) Construct relays that feed data to people who need quantitative information to strategize and manage.
Experimental Attitude: Leverage the data you have; usually they're good enough to validate organizational experience and knowledge.
4
Measuring Electronic Use at Penn: Environmental Influences
2. Proliferation of Electronic Resources
Article indexes, e-journals and other full-text resources
5
Measuring Electronic Use at Penn: Environmental Influences
2.1. Growth of Expenditures for Electronic Resources
Annual Growth of Expenditures for Electronic Information, Based on 1991
E-Resources as a percent of acquisitions budget
Year       1991   1993   1996   1999   2000   2001
Percent     3.7    3.2    5.5   13.2   13.9   15.7
6
Measuring Electronic Use at Penn: Environmental Influences
3. Technology's Hostility to Measurement
  • Volatile metrics ("The new system doesn't count that way!")
  • Ever-changing data elements (sets are out, searches are in)
  • No common metrics (log-ins, sessions, searches, browses, page hits)
  • No measurement standards ("What's a search?", "What's a Web session?")
  • Nonexistent or inaccessible data (the vendor problem)
  • Approximate, hard-to-obtain statistics (lots of data, no information)
  • Fleeting benchmarks

7
From Low Resolution to High Resolution Data
Mining the Server Logs for Descriptive Statistics
dial-123-130.dial.indiana.edu - - [04/Feb/2001:00:18:02 -0500] "GET /special/photos/theater/504.html HTTP/1.0" 200 3247 "http://www.library.upenn.edu/special/photos/theater/503.html" "Mozilla/4.7C-CCK-MCD C-UDP; EBM-APPLE (Macintosh; I; PPC)"
dialin1085.upenn.edu - - [04/Feb/2001:00:18:04 -0500] "GET /facilities/count_use.html?resource=China%20Economic%20Review&method=ejs&url=http://www.sciencedirect.com/science/journal/1043951X HTTP/1.0" 200 2027 "http://www.library.upenn.edu/webbin5/resources/ejspublic5.cgi?homepage=http://www.library.upenn.edu/lippincott/&community=Business" "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt; SPIKE 5)"
203.197.226.240 - - [04/Feb/2001:00:18:07 -0500] "GET /etext/sasia/aiis/architecture/khajuraho/010a.jpg HTTP/1.0" 200 89117 "http://www.library.upenn.edu/etext/sasia/aiis/architecture/khajuraho/010.html" "Mozilla/4.7 [en] (Win95; I)"
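Entries like these can be split into fields before any counting is done. The following is a minimal sketch, assuming the server writes Apache-style Combined Log Format and that the brackets, colons, and quotation marks lost in this transcript are restored. The names LOG_PATTERN and parse_line are illustrative and do not come from the presentation.

```python
import re
from datetime import datetime

# Combined Log Format (assumed):
# host ident user [time] "request" status bytes "referer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields


# Sample entry reconstructed from the slide; its punctuation is an assumption.
SAMPLE = (
    '203.197.226.240 - - [04/Feb/2001:00:18:07 -0500] '
    '"GET /etext/sasia/aiis/architecture/khajuraho/010a.jpg HTTP/1.0" 200 89117 '
    '"http://www.library.upenn.edu/etext/sasia/aiis/architecture/khajuraho/010.html" '
    '"Mozilla/4.7 [en] (Win95; I)"'
)

record = parse_line(SAMPLE)
```

Lines that do not fit the pattern return None, so a caller can count and inspect rejects rather than silently dropping them.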
8
Low Resolution
Inputs
Records in locally-managed databases (including the OPAC): 26,332,138
Number of journal article indexes and full-text files (e.g. Academic Index): 267
Number of e-journals (from publishers such as Elsevier and free sources): 6,608
Number of digital books (locally created, aggregated and licensed): 110,000
Number of locally digitized and accessible images (e.g. fine art slides, ms facsimiles): 82,356
Number of records in the OPAC: 2,879,696
Number of pages, forms and directories constituting the library web site: 32,000
9
Low Resolution
The Load on Our Machines
Web Pages Served 1995-2001 from www.library.upenn.edu (3-month moving average)
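A 3-month moving average smooths month-to-month swings in pages served. The sketch below assumes a trailing window (each point averages the current month and the two before it); the slide does not say whether its chart is trailing or centered. The function name and the monthly counts are hypothetical.

```python
def moving_average(values, window=3):
    """Trailing moving average; the first window-1 positions have no value."""
    if window < 1:
        raise ValueError("window must be at least 1")
    averages = []
    for i in range(len(values)):
        if i + 1 < window:
            # Not enough history yet to fill the window.
            averages.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            averages.append(sum(chunk) / window)
    return averages


# Hypothetical monthly page counts, not taken from the slide.
monthly_pages = [900, 1200, 1500, 1800, 1200]
smoothed = moving_average(monthly_pages)
```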
10
Low Resolution
Changing Machine Demand
[Chart: Pages Served by the Main Library Web Server and the OPAC Server, 1996-2001, with 2002 projected. Series labels: OPAC, Web, BlackBoard. Vertical axis: 0 to 25,000,000.]
11
Low Resolution
Search Activity Over Time
Annual Searches in Licensed Databases (e.g., MEDLINE), FY97-01
[Chart; vertical axis: searches]
12
Correlation Matrix of Use Metrics Available for Ovid Files
Pearson r for Sessions, Connect Time, Sets, Documents Viewed
99 cases
                   Sessions   Time    Sets    Docs. Viewed
Sessions           1.00
Time               .980       1.00
Sets               .905       .971    1.00
Documents Viewed   .844       .932    .983    1.00
13
Correlation Matrix of Use Metrics Available for SilverPlatter Files
Pearson r for Sessions, Connect Time, Searches, Documents Viewed
94 cases
                   Sessions   Time    Searches   Abs. Viewed
Sessions           1.00
Time               .975       1.00
Searches           .899       .901    1.00
Abstracts Viewed   .840       .870    .855       1.00
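The correlation matrices on these two slides report Pearson r between pairs of vendor-supplied use metrics. The sketch below shows how such a matrix could be computed from per-database figures. The function names and the sample numbers are hypothetical; the underlying Ovid and SilverPlatter data are not in the presentation.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / math.sqrt(spread_x * spread_y)


def correlation_matrix(metrics):
    """Map each (metric_a, metric_b) pair to its Pearson r."""
    names = list(metrics)
    return {(a, b): pearson_r(metrics[a], metrics[b]) for a in names for b in names}


# Hypothetical per-database figures; each position is one database ("case").
metrics = {
    "sessions": [120, 340, 560, 90, 410],
    "searches": [300, 800, 1500, 200, 990],
}
matrix = correlation_matrix(metrics)
```

High values of r across metrics, as in the tables above, suggest that any one of them can stand in for the others when a vendor reports only some.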
14
High Resolution Data User Input Good Program
Liaison and Knowledge Support Resource
Management, and Inform Basic Questions, e.g.
  • Are we choosing the right information sources
    for our audiences?
  • optimizing the delivery of electronic
    information?
  • making access as easy and seamless as possible?
  • spending our dollars wisely?
  • able to detect and respond to change in the
    patterns of resource use?

15
Using the Architecture of the Web to Increase
Data Resolution
www.library.upenn.edu/facilities/count_use.html
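The log entries in this presentation show requests to count_use.html that carry resource, method, and url values in the query string, so each click on a licensed resource leaves a labeled record in the web server log. The sketch below pulls those three values out of a request target. It assumes the stripped punctuation was the usual `=`, `&`, and `%20`, and that everything after `url=` is the destination address. The function name is hypothetical.

```python
from urllib.parse import parse_qs


def extract_use_event(request_target):
    """Return resource/method/url from a count_use.html request, else None."""
    path, _, query = request_target.partition("?")
    if not path.endswith("/count_use.html"):
        return None
    # The destination URL may contain its own '&' characters, so split it
    # off first and parse only the parameters in front of it.
    head, found, target = query.partition("url=")
    params = parse_qs(head)
    return {
        "resource": params.get("resource", [""])[0],
        "method": params.get("method", [""])[0],
        "url": target if found else "",
    }


# Request target reconstructed from the slide; punctuation is an assumption.
event = extract_use_event(
    "/facilities/count_use.html?resource=China%20Economic%20Review"
    "&method=ejs&url=http://www.sciencedirect.com/science/journal/1043951X"
)
```

Splitting on the first `url=` is a simplification: it would misfire if a resource name itself contained that string.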
16
Beginning with a stream of unprocessed log data...
dial-123-130.dial.indiana.edu - - [04/Feb/2001:00:17:38 -0500] "GET /special/photos/theater/505.html HTTP/1.0" 200 3086 "http://www.library.upenn.edu/special/photos/theater/504.html" "Mozilla/4.7C-CCK-MCD C-UDP; EBM-APPLE (Macintosh; I; PPC)"
recrawler1.bos2.fastsearch.net - - [04/Feb/2001:00:18:21 -0500] "GET /etext/sasia/skt-mss/1549/15a.html HTTP/1.0" 200 2736 "-" "FAST-WebCrawler/2.2-pre27 (crawler@fast.no; http://www.fast.no/faq/faqfastwebsearch/faqfastwebcrawler.html)"
130.91.196.245.in-addr.arpa - - [04/Feb/2001:00:17:40 -0500] "GET /facilities/count_use.html?resource=ABI/Inform%20%20Ovid&method=Ovid&url=http://www.abi-ovid.library.upenn.edu/ovidweb/ovidweb.cgi?T=JS&PAGE=main&MODE=ovid&D=infoz HTTP/1.1" 200 2039 "http://www.library.upenn.edu/webbin5/resources/databases.cgi?business" "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 4.0)"
203.197.226.240 - - [04/Feb/2001:00:17:41 -0500] "GET /etext/sasia/aiis/architecture/khajuraho/010.html HTTP/1.0" 200 4427 "http://www.library.upenn.edu/etext/sasia/aiis/architecture/khajuraho/" "Mozilla/4.7 [en] (Win95; I)"
203.197.226.240 - - [04/Feb/2001:00:17:44 -0500] "GET /images/banner.gif HTTP/1.0" 404 2814 "http://www.library.upenn.edu/etext/sasia/aiis/architecture/khajuraho/010.html" "Mozilla/4.7 [en] (Win95; I)"
pub237.lib.upenn.edu - - [04/Feb/2001:00:17:48 -0500] "GET / HTTP/1.0" 200 8070 "-" "WebTrends Alert"
dial-123-130.dial.indiana.edu - - [04/Feb/2001:00:18:02 -0500] "GET /special/photos/theater/504.html HTTP/1.0" 200 3247 "http://www.library.upenn.edu/special/photos/theater/503.html" "Mozilla/4.7C-CCK-MCD C-UDP; EBM-APPLE (Macintosh; I; PPC)"
dialin1085.upenn.edu - - [04/Feb/2001:00:18:04 -0500] "GET /facilities/count_use.html?resource=China%20Economic%20Review&method=ejs&url=http://www.sciencedirect.com/science/journal/1043951X HTTP/1.0" 200 2027 "http://www.library.upenn.edu/webbin5/resources/ejspublic5.cgi?homepage=http://www.library.upenn.edu/lippincott/&community=Business" "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt; SPIKE 5)"
203.197.226.240 - - [04/Feb/2001:00:18:07 -0500] "GET /etext/sasia/aiis/architecture/khajuraho/010a.jpg HTTP/1.0" 200 89117 "http://www.library.upenn.edu/etext/sasia/aiis/architecture/khajuraho/010.html" "Mozilla/4.7 [en] (Win95; I)"
17
and information culled from databases that
generate our Web pages...
Æ http//www.uqtr.uquebec.ca/AE/index.htmlWorld
History of ArtF-TNo07-16-1999
111110-25-2000 1130 ABA Bank
Compliance http//proquest.umi.com/pqdlink?Ver1
Exp07-01-2003REQ3PUB14954Cert0CEccdp7
aMS6kuCDmdhPNL2bQ2tTOLTrDEHAz2bYmHN172RUqZPCJ2Sv
ATX2bFGA7htIYkVlFVWSyawE0NvKlpBZ2bO2f2bLEWBnch
nwLT92b2fdGGHSlx0PO3dxUQd3g2S9QP2FghKaQ2ncl5EdDK
Bum2vykhvxsyRQutjuMGKfxAKHOA4-PennABI/InformB
usiness,FinanceF-TPI No03-13-2001
000103-14-2001 1131mw ABA
Journal http//proquest.umi.com/pqdlink?Ver1Exp
07-012003REQ3PUB27585CertPfySiFXf1
0i6kuCDmdhPNL2bQ2tTOLTrDEHAz2bYmHN172RUqZPCJ2SvA
TX2bFGA7ht1pGvDP2bFxrGwE0NvKlpBZ2bO2f2bLEWBnc
hnwLT92b2fdGGHSlx0PO3dxUQd3g2S9QP2FghKaQ2ncl5EdD
KBum2vykhvxsyRQutjuAyIsegc4Y7Y-PennABI/Inform
FinanceF-TPINo03-13-2001 0001mw ABI/Inform
http//www.umi.com/pqdautoPennBiomedical
Research,Management,Business,Clinical
Medicine,Clinical Medicine,Nursing, Econo mics,
Health Care Policy Management
F-TSDbNo07-16-1999 111102-09-2001 1214
18
to extracting, parsing, storing, and mining for
significant content.
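The storing and mining steps can be illustrated with a small relational table. This is a sketch only, assuming events have already been extracted and parsed into (host, timestamp, resource, method) tuples. The table name, column names, and the third sample row are hypothetical; the presentation does not describe the actual storage layer.

```python
import sqlite3


def store_events(conn, events):
    """Create the table if needed and append parsed use events to it."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS use_events "
        "(host TEXT, logged_at TEXT, resource TEXT, method TEXT)"
    )
    conn.executemany("INSERT INTO use_events VALUES (?, ?, ?, ?)", events)
    conn.commit()


def logins_by_resource(conn):
    """Count stored events per resource, most used first."""
    cursor = conn.execute(
        "SELECT resource, COUNT(*) AS logins FROM use_events "
        "GROUP BY resource ORDER BY logins DESC, resource"
    )
    return cursor.fetchall()


# Already-parsed events; the last row is invented for illustration.
events = [
    ("dialin1085.upenn.edu", "2001-02-04T00:18:04-05:00", "China Economic Review", "ejs"),
    ("130.91.196.245", "2001-02-04T00:17:40-05:00", "ABI/Inform", "Ovid"),
    ("host3.example.edu", "2001-02-04T09:30:00-05:00", "ABI/Inform", "Ovid"),
]

conn = sqlite3.connect(":memory:")
store_events(conn, events)
top_resources = logins_by_resource(conn)
```

Once events sit in a table, the breakdowns on the following slides (by resource, community, location, and time of day) become variations on the same GROUP BY query.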
19
Use of Licensed Resources
What Databases Do Our Clients Use at What Cost?
15 Most Frequently Used Index/Abstract/Full-text Databases in FY 2001
Database                     Log-ins   Pct of Total   Cost Per Login ($)
MEDLINE                      205,150   22.9           0.10
LEXIS/NEXIS                   63,817    7.1           0.42
Academic Index                52,407    5.9           0.58
Dow Jones                     39,828    4.5           0.68
ISI Citation Indexes          39,753    4.4           2.75
ABI/Inform                    36,190    4.0           1.09
PsycINFO                      27,636    3.1           0.89
Investext                     17,695    2.0           0.68
Business & Industry           16,797    1.9           0.55
CINAHL/Nursing                16,232    1.8           0.36
PubMed                        15,610    1.7           -
MLA International             13,359    1.5           0.41
Multex                        12,196    1.4           0.10
ERIC                          10,852    1.2           0.54
EconLit                        8,940    1.0           0.80
Hoover's Online                8,905    1.0           0.22
Inter Bibliog Soc Science      8,152    0.9           0.38
Sociological Abstracts         7,703    0.9           1.58
S&P Industry Surveys           7,346    0.8           0.63
D&B Million Database           6,376    0.7           1.74
All others                   894,416  100.0
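The two derived columns in this table are simple ratios. The sketch below reproduces them, assuming that 894,416 is the overall log-in total (205,150 / 894,416 is about 22.9 percent, which matches the MEDLINE row) and that costs are annual figures in dollars. The function names and the annual cost used in the example are hypothetical; the slide gives only the resulting cost per log-in.

```python
def pct_of_total(logins, total_logins):
    """Share of all log-ins, as a percentage rounded to one decimal place."""
    return round(100 * logins / total_logins, 1)


def cost_per_login(annual_cost, logins):
    """Annual licence cost divided by log-ins, rounded to cents."""
    if logins <= 0:
        raise ValueError("logins must be positive")
    return round(annual_cost / logins, 2)


TOTAL_LOGINS = 894_416  # assumed to be the grand total for FY 2001

medline_share = pct_of_total(205_150, TOTAL_LOGINS)
lexis_share = pct_of_total(63_817, TOTAL_LOGINS)

# Hypothetical annual cost, chosen only to illustrate the arithmetic.
example_cost = cost_per_login(20_500, 205_150)
```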

20
Use of Licensed Resources
What Are the High Use E-Journals, Data for FY2001
Title   Log-ins   Pct of Total Log-ins   Log-ins On Campus   Log-ins Off Campus
Science 4,232 1.5 3,114 1,057
Nature 4,081 1.4 2,880 1,173
Journal of Biological Chemistry 2,408 0.8 1,883 519
Journal of the American Chemical Society 2,405 0.8 2,153 247
New England Journal of Medicine 1,994 0.7 1,359 620
Angewandte Chemie (international edition) 1,836 0.6 1,665 167
Journal of Organic Chemistry 1,660 0.6 1,504 150
Proceedings of the National Academy of Sciences 1,608 0.6 1,246 360
Tetrahedron Letters 1,361 0.5 1,218 143
Organic Letters 1,308 0.5 1,208 99
Proceedings of the National Academy of Sciences, U.S. 1,285 0.5 1,017 266
Journal of Molecular Biology 1,060 0.4 850 210
JAMA: The Journal of the American Medical Association 1,023 0.4 650 352
Journal of Chemical Physics 992 0.3 819 172
Journal of Finance 887 0.3 423 378
Lancet 867 0.3 637 227
American Journal of Sociology 860 0.3 384 373
Medicine 849 0.3 580 263
Applied Physics Letters 834 0.3 751 83
Physical Review B 826 0.3 727 98
21
Use of Licensed Resources
How Much Bang Do We Get on the Dollar For E-Journals?
E-Journal Subscription Costs Per Log-In, FY2002 (July-April)
Publisher                Log-ins   Pct of Total   Cost Per Login
ScienceDirect            139,727   27.1           0.63
ECO                       70,730   13.7           0.09
JSTOR                     48,668    9.4           0.35
Wiley                     38,255    7.4           0.09
ACS                       31,865    6.2           0.12
Ideal                     30,568    5.9           5.51
Blackwell/Munksgaard      28,940    5.6           0.27
Journals@Ovid             26,982    5.2           n/a
Oxford                    14,819    2.9           0.20
SpringerLINK              13,507    2.6           n/a
ABI/Inform                12,785    2.5           3.08
Project Muse              11,438    2.2           1.22
AIP                        7,873    1.5           5.01
Cambridge                  7,835    1.5           n/a
Annual Reviews             7,215    1.4           0.08
IEEE                       7,132    1.4           6.73
RSC                        5,661    1.1           n/a
Others (11 publishers)    11,451    2.2
Total                    515,451  100
22
Use of Licensed Resources
How Does Use Scatter Across Databases
Use Measured in Log-ins for FY 2001
23
Use of Licensed Resources
How Does Database Use Distribute By Communities?
Database Use by Penn's Schools & Centers
[Chart: Per Capita Use of Databases by Penn's Schools and Centers, FY 2001. Vertical axis: Log-ins Per Capita (0 to 55). Horizontal axis: School and Center Domains (LAW, VET, ASC, MED, NUR, SAS, GSE, SSW, SEAS, GSFA, WHRT, ADM, DENTAL). Inset label: School Pct of Log-ins.]
Does not include resources licensed by the Law Library for Law school affiliates
24
Use of Licensed Resources
Database & E-Journal Log-ins by Subject (based on log samples from FY2001)
Rows: Network Domain. Columns: Subject focus (percent of log-ins).
Network Domain       Human.   Life Science   Social Science   Business   Physical Science   Total
Administration       21.1     36.5           13.9              7.0       21.6               100.0
Wharton               2.9     74.3            3.2             19.2        0.5               100.0
Annenberg            15.2     32.1           42.3              8.9        1.5               100.0
Medical               2.3     86.0            1.9              1.0        8.8               100.0
Dental                1.8     87.7            8.9              0.2        1.4               100.0
Veterinary            1.7     96.0            0.6              0.4        1.3               100.0
Dialin                8.5     63.2            9.9             15.4        2.9               100.0
Education            24.6     13.1           61.5              0.8        0.0               100.0
Fine Arts            29.0     18.5           45.7              5.6        1.2               100.0
Law                  13.0     26.6           20.9             37.0        2.4               100.0
Library              21.3     54.8            9.1              8.5        6.3               100.0
Nursing              15.9     73.1            7.8              3.2        0.0               100.0
Student Residences   18.9     57.0           12.6              9.0        2.5               100.0
Arts and Sciences     8.2     26.3            5.7              9.9       49.9               100.0
Engineering           1.5     29.5            2.3              1.2       65.6               100.0
Social Work          20.6     29.1           41.6              6.1        2.7               100.0
Unresolved           18.9     44.7           17.8             10.0        8.6               100.0
Total                14.7     50.7           11.9              8.6       14.1               100.0
25
Use of Licensed Resources
Where Do Our Clients Access Information?
Database Log-ins by Domain, FY2001
Campus Residences: 10%
Off-Campus: 15%
In-Library: 25%
On-Campus Depts: 50%
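A breakdown like this depends on mapping each requesting host to a location category. The sketch below is one way that mapping might look. Every rule in it is an assumption for illustration: the presentation does not give the actual rules, and the `.resnet.upenn.edu` suffix in particular is hypothetical. Only the four category names come from the slide.

```python
from collections import Counter


def classify_host(host):
    """Assign a host name to one of the four location categories (assumed rules)."""
    host = host.lower().rstrip(".")
    if not host.endswith(".upenn.edu"):
        return "Off-Campus"
    if host.startswith("dialin"):
        # Assumption: dial-in pool addresses are counted as off campus.
        return "Off-Campus"
    if host.endswith(".lib.upenn.edu"):
        return "In-Library"
    if host.endswith(".resnet.upenn.edu"):  # hypothetical residence-hall suffix
        return "Campus Residences"
    return "On-Campus Depts"


def domain_shares(hosts):
    """Percentage of log-ins per category, rounded to one decimal place."""
    counts = Counter(classify_host(h) for h in hosts)
    total = sum(counts.values())
    return {domain: round(100 * n / total, 1) for domain, n in counts.items()}


# The first three host names appear in the log excerpts; the last is invented.
hosts = [
    "pub237.lib.upenn.edu",
    "dialin1085.upenn.edu",
    "dial-123-130.dial.indiana.edu",
    "pc12.chem.upenn.edu",
]
shares = domain_shares(hosts)
```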
26
Use of Licensed Resources
Where Do Communities of Clients Work?
Database Log-ins from Off Campus as a Percent of
Total Log-ins, FY2001
Pct. of Log-ins
School or Center
27
Use of Licensed Resources
When Are They Working?
Database Use by Time of Day, FY2001
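Use by time of day follows directly from the timestamp on each logged request. The sketch below buckets log timestamps by hour, assuming they are in the Combined Log Format style `04/Feb/2001:00:18:04 -0500` and that the hour is taken in the server's recorded offset. The function name is hypothetical, and the third timestamp is invented for illustration.

```python
from collections import Counter
from datetime import datetime

TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def logins_by_hour(timestamps):
    """Return a 24-element list of log-in counts, indexed by hour of day."""
    counts = Counter(
        datetime.strptime(stamp, TIME_FORMAT).hour for stamp in timestamps
    )
    return [counts.get(hour, 0) for hour in range(24)]


timestamps = [
    "04/Feb/2001:00:17:40 -0500",
    "04/Feb/2001:00:18:04 -0500",
    "04/Feb/2001:14:05:00 -0500",  # invented example
]
hourly = logins_by_hour(timestamps)
```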
28
Use of Licensed Resources
How Does Audience Composition Change Through the
Day?
Database Use by hour, FY2001
29
The Data Farm Experiment: Tools That Serve Information Access Can Also Serve Measurement

30
Schematic of the Data Farm As of May 2002
31
Why Are the Data Important?
"If you don't know where you're going, you'll probably end up somewhere else." - Casey Stengel
To Demonstrate Accountability
Is the library spending the Schools' money effectively? (Pressures of Penn's responsibility center budget environment)
To Understand and Describe the Transfer of Technology
Is the academic information universe a digital universe (as some at Penn believe)?
Is the digital universe more cost efficient than the paper one (as some at Penn believe)?
To Guide the Improvement of Existing and the Development of New Services
To Ensure the Successful Fulfillment of Our Mission
32
Through the Bytes Darkly,
Management Information and the Digital Library
Joe Zucca
University of Pennsylvania Library
zucca@pobox.upenn.edu