Through the Bytes Darkly,
Management Information and the Digital Library
Information Technology Interest Group, ACRL New England Chapter, May 17, 2002
Joe Zucca, Assessment, Planning and Publications Librarian, University of Pennsylvania Library
Four Sections of This Presentation
1. Environmental Audit: Key Factors That Influence Our Ability to Measure Digital Information Use
2. From Low Resolution to High Resolution: Data Mining the Server Logs
3. The Data Farm Experiment: Tools That Serve Access Can Also Serve Measurement
4. Why the Data Are Important
Measuring Electronic Use at Penn: Environmental Audit
1. Organization and Culture
Strategic Focus: Base planning, goal setting, and assessment on empirical evidence. Since 1996, an element of Penn's Strategic Plan.
Operational Imperatives:
1) Make evaluation and measurement a component of each program and project.
2) Construct relays that feed data to the people who need quantitative information to strategize and manage.
Experimental Attitude: Leverage the data you have; usually they're good enough to validate organizational experience and knowledge.
2. Proliferation of Electronic Resources
Article indexes, e-journals and other full-text
2.1. Growth of Expenditures for Electronic Information
Annual Growth of Expenditures for Electronic Information, Based on 1991
E-resources as a percent of the acquisitions budget:

Year    Pct. of acquisitions budget
1991    3.7
1993    3.2
1996    5.5
1999    13.2
2000    13.9
2001    15.7
3. Technology's Hostility to Measurement
- Volatile metrics ("The new system doesn't count that way!")
- Ever-changing data elements (sets are out, searches are in)
- No common metrics (log-ins, sessions, searches, browses, page hits)
- No measurement standards (What's a search? What's a Web session?)
- Nonexistent or inaccessible data (the vendor problem)
- Approximate, hard-to-obtain statistics (lots of data, no information)
- Fleeting benchmarks
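The question "What's a Web session?" has no single answer, and the choice changes the count. A minimal sketch, assuming sessions are defined by an inactivity timeout (a common convention, not a standard, and not necessarily what any vendor used): the same six hits from one client count as two sessions under a 30-minute timeout but four under a 10-minute one. The hit times are illustrative.

```python
from datetime import datetime, timedelta


def count_sessions(timestamps, timeout_minutes=30):
    """Count sessions in one client's hits.

    A new session starts with the first hit and whenever the gap since the
    previous hit exceeds the timeout. The 30-minute default is an assumption.
    """
    limit = timedelta(minutes=timeout_minutes)
    sessions = 0
    previous = None
    for ts in sorted(timestamps):
        if previous is None or ts - previous > limit:
            sessions += 1
        previous = ts
    return sessions


# Illustrative hit times (minutes after midnight) for a single client.
start = datetime(2001, 2, 4)
hits = [start + timedelta(minutes=m) for m in (0, 5, 25, 50, 58, 100)]
print(count_sessions(hits, timeout_minutes=30))  # 2
print(count_sessions(hits, timeout_minutes=10))  # 4
```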
From Low Resolution to High Resolution: Data Mining the Server Logs for Descriptive …
dial-123-130.dial.indiana.edu - - [04/Feb/2001:00:18:02 -0500] "GET /special/photos/theater/504.html HTTP/1.0" 200 3247 "http://www.library.upenn.edu/special/photos/theater/503.html" "Mozilla/4.7 C-CCK-MCD C-UDP EBM-APPLE (Macintosh; I; PPC)"
dialin1085.upenn.edu - - [04/Feb/2001:00:18:04 -0500] "GET /facilities/count_use.html?resource=China%20Economic%20Review&method=ejs&url=http://www.sciencedirect.com/science/journal/1043951X HTTP/1.0" 200 2027 "http://www.library.upenn.edu/webbin5/resources/ejspublic5.cgi?homepage=http://www.library.upenn.edu/lippincott/community Business" "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt; SPIKE 5)"
203.197.226.240 - - [04/Feb/2001:00:18:07 -0500] "GET /etext/sasia/aiis/architecture/khajuraho/010a.jpg HTTP/1.0" 200 89117 "…aiis/architecture/khajuraho/010.html" "Mozilla/4.7 [en] (Win95; I)"
Low Resolution
Records in locally managed databases (including the OPAC): 26,332,138
Number of journal article indexes and full-text files (e.g., Academic Index): 267
Number of e-journals (from publishers such as Elsevier and free sources): 6,608
Number of digital books (locally created, aggregated, and licensed): 110,000
Number of locally digitized and accessible images (e.g., fine art slides, manuscript facsimiles): 82,356
Number of records in the OPAC: 2,879,696
Number of pages, forms, and directories constituting the library Web site: 32,000
The Load on Our Machines
[Figure: Web pages served from www.library.upenn.edu, 1995-2001, 3-month moving average]
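The chart smooths monthly page counts with a 3-month moving average. A sketch of a trailing moving average, assuming each point averages a month with the two months before it; a centered window is an equally common choice, and the slide doesn't say which was used. The monthly figures are made up.

```python
def moving_average(values, window=3):
    """Trailing moving average: each output averages a point and the window-1 before it.

    The first window-1 points have no full window and are omitted.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


monthly_pages = [100, 130, 160, 190, 220]  # hypothetical monthly page counts
print(moving_average(monthly_pages))  # [130.0, 160.0, 190.0]
```

Dropping the first two months (rather than averaging a partial window) keeps every plotted point on the same footing.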
Changing Machine Demand
[Figure: pages served by the main library Web server and by the OPAC server]
Search Activity Over Time
[Figure: annual searches in licensed databases (e.g., …)]
Correlation Matrix of Use Metrics Available for Ovid Files
Pearson r for Sessions, Connect Time, Sets, and Documents Viewed (99 cases)

                  Sessions  Time   Sets   Docs. Viewed
Sessions          1.00
Time              .980      1.00
Sets              .905      .971   1.00
Documents Viewed  .844      .932   .983   1.00
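Each cell is Pearson's r computed across the database files (99 cases for Ovid, 94 for SilverPlatter). A pure-Python sketch of the calculation and of the lower-triangular layout these slides use; the two usage columns are hypothetical, not Penn data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, n >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when a metric does not vary")
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Lower-triangular matrix of r values, keyed by (row metric, column metric)."""
    names = list(columns)
    return {
        (row, col): pearson_r(columns[row], columns[col])
        for i, row in enumerate(names)
        for col in names[: i + 1]
    }


# Hypothetical per-file counts; each list position is one database file (one case).
columns = {
    "sessions": [120, 340, 95, 410],
    "docs_viewed": [300, 720, 260, 1010],
}
for (row, col), r in correlation_matrix(columns).items():
    print(f"{row:>12} / {col:<12} r = {r:.3f}")
```

An r near 1 means two metrics rise and fall together across files; it measures linear association, not whether the metrics count the same activity.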
Correlation Matrix of Use Metrics Available for SilverPlatter Files
Pearson r for Sessions, Connect Time, Searches, and Documents Viewed (94 cases)

                  Sessions  Time   Searches  Abs. Viewed
Sessions          1.00
Time              .975      1.00
Searches          .899      .901   1.00
Abstracts Viewed  .840      .870   .855      1.00
High-Resolution Data and User Input Support Good Program Liaison, Knowledge Support, and Resource Management, and Inform Basic Questions. Are we:
- choosing the right information sources for our audiences?
- optimizing the delivery of electronic information?
- making access as easy and seamless as possible?
- spending our dollars wisely?
- able to detect and respond to change in the patterns of resource use?
Using the Architecture of the Web to Increase Data Resolution
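Raising resolution starts with the raw log line. The server-log entries on these slides follow the Apache "combined" layout: client host, identity, user, timestamp, request, status code, bytes sent, referrer, and user agent. A minimal Python sketch of splitting one line into fields, assuming that layout; the regular expression is our own, lines it cannot match come back as None so they can be counted separately, and the sample is one of the log entries shown in these slides, lightly cleaned.

```python
import re
from datetime import datetime

# Combined log layout: host ident user [time] "request" status bytes "referer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Timestamps look like 04/Feb/2001:00:18:02 -0500.
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


sample = (
    'dial-123-130.dial.indiana.edu - - [04/Feb/2001:00:18:02 -0500] '
    '"GET /special/photos/theater/504.html HTTP/1.0" 200 3247 '
    '"http://www.library.upenn.edu/special/photos/theater/503.html" '
    '"Mozilla/4.7 C-CCK-MCD C-UDP EBM-APPLE (Macintosh; I; PPC)"'
)
rec = parse_line(sample)
print(rec["host"], rec["path"], rec["status"], rec["size"])
```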
Beginning with a stream of unprocessed log data...
dial-123-130.dial.indiana.edu - - [04/Feb/2001:00:17:38 -0500] "GET /special/photos/theater/505.html HTTP/1.0" 200 3086 "http://www.library.upenn.edu/special/photos/theater/504.html" "Mozilla/4.7 C-CCK-MCD C-UDP EBM-APPLE (Macintosh; I; PPC)"
recrawler1.bos2.fastsearch.net - - [04/Feb/2001:00:18:21 -0500] "GET /etext/sasia/skt-mss/1549/15a.html HTTP/1.0" 200 2736 "-" "FAST-WebCrawler/2.2-pre27 (crawler@fast.no http://www.fast.no/faq/faqfastwebsearch/faqfastwebcrawler.html)"
…17:40 -0500] "GET /facilities/count_use.html?resource=ABI/Inform%20%20Ovid&method=Ovid&url=http://www.abi-ovid.library.upenn.edu/ovidweb/ovidweb.cgi?T=JS&PAGE=main&MODE=ovid&D=infoz HTTP/1.1" 200 2039 "http://www.library.upenn.edu/webbin5/resources/databases.cgi?business" "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 4.0)"
… - - [04/Feb/2001:00:17:41 -0500] "GET /etext/sasia/aiis/architecture/khajuraho/010.html HTTP/1.0" 200 4427 "http://www.library.upenn.edu/etext/sasia/aiis/architecture/khajuraho/" "Mozilla/4.7 [en] (Win95; I)"
203.197.226.240 - - [04/Feb/2001:00:17:44 -0500] "GET /images/banner.gif HTTP/1.0" 404 2814 "http://www.library.upenn.edu/etext/sasia/aiis/architecture/khajuraho/010.html" "Mozilla/4.7 [en] (Win95; I)"
pub237.lib.upenn.edu - - [04/Feb/2001:00:17:48 -0500] "GET / HTTP/1.0" 200 8070 "-" "WebTrends Alert"
dial-123-130.dial.indiana.edu - - [04/Feb/2001:00:18:02 -0500] "GET /special/photos/theater/504.html HTTP/1.0" 200 3247 "http://www.library.upenn.edu/special/photos/theater/503.html" "Mozilla/4.7 C-CCK-MCD C-UDP EBM-APPLE (Macintosh; I; PPC)"
dialin1085.upenn.edu - - [04/Feb/2001:00:18:04 -0500] "GET /facilities/count_use.html?resource=China%20Economic%20Review&method=ejs&url=http://www.sciencedirect.com/science/journal/1043951X HTTP/1.0" 200 2027 "http://www.library.upenn.edu/webbin5/resources/ejspublic5.cgi?homepage=http://www.library.upenn.edu/lippincott/community Business" "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt; SPIKE 5)"
203.197.226.240 - - [04/Feb/2001:00:18:07 -0500] "GET /etext/sasia/aiis/architecture/khajuraho/010a.jpg HTTP/1.0" 200 89117 "…aiis/architecture/khajuraho/010.html" "Mozilla/4.7 [en] (Win95; I)"
...and information culled from databases that generate our Web pages...
Æ  http://www.uqtr.uquebec.ca/AE/index.html  World History of Art  F-T  No  07-16-1999 1111  10-25-2000 1130
ABA Bank Compliance  http://proquest.umi.com/pqdlink?Ver=1…  Business, Finance  F-T  PI  No  03-13-2001 0001  03-14-2001 1131  mw
ABA Journal  http://proquest.umi.com/pqdlink?Ver=1&Exp…  Finance  F-T  PI  No  03-13-2001 0001  mw
ABI/Inform  …  Medicine, Clinical Medicine, Nursing, Economics, Health Care Policy Management  F-T  SDb  No  07-16-1999 1111  02-09-2001 1214
...to extracting, parsing, storing, and mining for significant content.
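What "mining for significant content" involves can be sketched in a few lines. This sketch assumes each log line has already been split into fields, reads the `resource` parameter carried by the count_use.html requests in the sample logs, and drops robots, monitoring tools, and failed requests. The robot hint list is hypothetical and far from complete; this is an illustration, not the Data Farm's actual code.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Hypothetical, deliberately naive markers for robots and monitoring tools.
ROBOT_HINTS = ("crawler", "spider", "robot", "webtrends")


def resource_from_request(path):
    """Return the 'resource' parameter of a count_use.html request, else None."""
    parts = urlsplit(path)
    if not parts.path.endswith("/count_use.html"):
        return None
    values = parse_qs(parts.query).get("resource")  # %20 is decoded to a space
    return values[0] if values else None


def tally_resources(records):
    """Count successful, non-robot log-ins per licensed resource.

    Each record is a dict with at least 'path', 'status', and 'agent' keys.
    """
    counts = Counter()
    for rec in records:
        agent = rec["agent"].lower()
        if rec["status"] != 200 or any(hint in agent for hint in ROBOT_HINTS):
            continue
        name = resource_from_request(rec["path"])
        if name:
            counts[name] += 1
    return counts


records = [
    {"path": "/facilities/count_use.html?resource=China%20Economic%20Review&method=ejs",
     "status": 200, "agent": "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)"},
    {"path": "/etext/sasia/skt-mss/1549/15a.html",
     "status": 200, "agent": "FAST-WebCrawler/2.2-pre27"},
    {"path": "/images/banner.gif", "status": 404, "agent": "Mozilla/4.7 [en] (Win95; I)"},
]
print(tally_resources(records))  # Counter({'China Economic Review': 1})
```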
Use of Licensed Resources
What Databases Do Our Clients Use, and at What Cost?
20 Most Frequently Used Index/Abstract/Full-Text Databases in FY 2001

Database                      Log-ins    Pct. of Total   Cost per Log-in ($)
MEDLINE                       205,150    22.9            0.10
LEXIS/NEXIS                   63,817     7.1             0.42
Academic Index                52,407     5.9             0.58
Dow Jones                     39,828     4.5             0.68
ISI Citation Indexes          39,753     4.4             2.75
ABI/Inform                    36,190     4.0             1.09
PsycINFO                      27,636     3.1             0.89
Investext                     17,695     2.0             0.68
Business & Industry           16,797     1.9             0.55
CINAHL/Nursing                16,232     1.8             0.36
PubMed                        15,610     1.7             -
MLA International             13,359     1.5             0.41
Multex                        12,196     1.4             0.10
ERIC                          10,852     1.2             0.54
EconLit                       8,940      1.0             0.80
Hoover's Online               8,905      1.0             0.22
Inter. Bibliog. Soc. Science  8,152      0.9             0.38
Sociological Abstracts        7,703      0.9             1.58
S&P Industry Surveys          7,346      0.8             0.63
D&B Million Database          6,376      0.7             1.74
Total, all databases          894,416    100.0
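The two derived columns are simple ratios: share of total = log-ins ÷ all log-ins × 100, and cost per log-in = annual cost ÷ log-ins. A sketch: the MEDLINE and LEXIS/NEXIS log-in counts and the 894,416 total come from the table, while the annual cost in the last call is hypothetical, because the slide reports only the resulting ratios.

```python
def usage_summary(logins, total_logins, annual_cost=None):
    """Return (percent of all log-ins, cost per log-in) for one resource.

    annual_cost is in dollars; pass None when no cost is attributed, in
    which case cost per log-in is also None.
    """
    if logins <= 0 or total_logins <= 0:
        raise ValueError("log-in counts must be positive")
    pct = round(100.0 * logins / total_logins, 1)
    cost_per_login = None if annual_cost is None else round(annual_cost / logins, 2)
    return pct, cost_per_login


TOTAL_FY2001 = 894_416
print(usage_summary(205_150, TOTAL_FY2001))          # (22.9, None)
print(usage_summary(63_817, TOTAL_FY2001))           # (7.1, None)
print(usage_summary(10_000, TOTAL_FY2001, 5_000.0))  # hypothetical cost
```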
What Are the High-Use E-Journals? Data for FY2001

Journal                                                Log-ins  Pct. of Total  On Campus  Off Campus
Science                                                4,232    1.5            3,114      1,057
Nature                                                 4,081    1.4            2,880      1,173
Journal of Biological Chemistry                        2,408    0.8            1,883      519
Journal of the American Chemical Society               2,405    0.8            2,153      247
New England Journal of Medicine                        1,994    0.7            1,359      620
Angewandte Chemie (international edition)              1,836    0.6            1,665      167
Journal of Organic Chemistry                           1,660    0.6            1,504      150
Proceedings of the National Academy of Sciences        1,608    0.6            1,246      360
Tetrahedron Letters                                    1,361    0.5            1,218      143
Organic Letters                                        1,308    0.5            1,208      99
Proceedings of the National Academy of Sciences, U.S.  1,285    0.5            1,017      266
Journal of Molecular Biology                           1,060    0.4            850        210
JAMA: The Journal of the American Medical Association  1,023    0.4            650        352
Journal of Chemical Physics                            992      0.3            819        172
Journal of Finance                                     887      0.3            423        378
Lancet                                                 867      0.3            637        227
American Journal of Sociology                          860      0.3            384        373
Medicine                                               849      0.3            580        263
Applied Physics Letters                                834      0.3            751        83
Physical Review B                                      826      0.3            727        98
How Much Bang Do We Get for the Dollar?
E-Journal Subscription Costs per Log-In, FY2002

Publisher                Log-ins    Pct. of Total   Cost per Log-in ($)
ScienceDirect            139,727    27.1            0.63
ECO                      70,730     13.7            0.09
JSTOR                    48,668     9.4             0.35
Wiley                    38,255     7.4             0.09
ACS                      31,865     6.2             0.12
Ideal                    30,568     5.9             5.51
Blackwell/Munksgaard     28,940     5.6             0.27
Journals@Ovid            26,982     5.2             n/a
Oxford                   14,819     2.9             0.20
SpringerLINK             13,507     2.6             n/a
ABI/Inform               12,785     2.5             3.08
Project Muse             11,438     2.2             1.22
AIP                      7,873      1.5             5.01
Cambridge                7,835      1.5             n/a
Annual Reviews           7,215      1.4             0.08
IEEE                     7,132      1.4             6.73
RSC                      5,661      1.1             n/a
Others (11 publishers)   11,451     2.2
Total                    515,451    100.0
How Does Use Scatter Across Databases?
[Figure: use measured in log-ins, FY 2001]
How Does Database Use Distribute by Communities?
[Figure: database use by Penn's schools and centers, percent of log-ins by school]
[Figure: per capita use of databases by Penn's schools and centers, FY 2001; log-ins per capita by school and center domain]
Does not include resources licensed by the Law Library for Law School affiliates.
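Per capita use divides each school's log-ins by its population. The slide does not say which headcount was used (students, faculty, staff, or all three), so both the populations and the log-in counts below are hypothetical; schools with no known headcount are skipped rather than divided by zero.

```python
def per_capita(logins_by_school, population_by_school):
    """Log-ins per person for each school with a known, nonzero headcount."""
    return {
        school: round(logins / population_by_school[school], 1)
        for school, logins in logins_by_school.items()
        if population_by_school.get(school)
    }


logins = {"School A": 45_000, "School B": 12_000, "School C": 900}  # hypothetical
population = {"School A": 9_000, "School B": 4_800}                 # hypothetical; C unknown
print(per_capita(logins, population))  # {'School A': 5.0, 'School B': 2.5}
```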
Database & E-Journal Log-ins by Subject (based on log samples from FY2001)
Subject focus, percent of log-ins, by network domain

Network Domain      Humanities  Life Sci.  Social Sci.  Business  Physical Sci.  Total
Administration      21.1        36.5       13.9         7.0       21.6           100.0
Wharton             2.9         74.3       3.2          19.2      0.5            100.0
Annenberg           15.2        32.1       42.3         8.9       1.5            100.0
Medical             2.3         86.0       1.9          1.0       8.8            100.0
Dental              1.8         87.7       8.9          0.2       1.4            100.0
Veterinary          1.7         96.0       0.6          0.4       1.3            100.0
Dial-in             8.5         63.2       9.9          15.4      2.9            100.0
Education           24.6        13.1       61.5         0.8       0.0            100.0
Fine Arts           29.0        18.5       45.7         5.6       1.2            100.0
Law                 13.0        26.6       20.9         37.0      2.4            100.0
Library             21.3        54.8       9.1          8.5       6.3            100.0
Nursing             15.9        73.1       7.8          3.2       0.0            100.0
Student Residences  18.9        57.0       12.6         9.0       2.5            100.0
Arts and Sciences   8.2         26.3       5.7          9.9       49.9           100.0
Engineering         1.5         29.5       2.3          1.2       65.6           100.0
Social Work         20.6        29.1       41.6         6.1       2.7            100.0
Unresolved          18.9        44.7       17.8         10.0      8.6            100.0
Total               14.7        50.7       11.9         8.6       14.1           100.0
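Each row of a table like this is a set of row percentages: within one network domain, the share of sampled log-ins falling in each subject. A sketch, assuming each sampled log-in has already been tagged with a domain (from the client host) and a subject (for example, from subject fields in the resource records that generate the library's Web pages); the sample pairs are hypothetical.

```python
from collections import Counter, defaultdict


def subject_shares(logins):
    """Row percentages of log-ins by subject within each network domain.

    `logins` is an iterable of (domain, subject) pairs, one per sampled log-in.
    """
    by_domain = defaultdict(Counter)
    for domain, subject in logins:
        by_domain[domain][subject] += 1
    return {
        domain: {
            subject: round(100.0 * n / sum(counts.values()), 1)
            for subject, n in counts.items()
        }
        for domain, counts in by_domain.items()
    }


sample = [("Domain A", "Business")] * 3 + [("Domain A", "Social Science")]
print(subject_shares(sample))  # {'Domain A': {'Business': 75.0, 'Social Science': 25.0}}
```

Because each cell is rounded on its own, a row can total 99.9 or 100.1 rather than exactly 100.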
Where Do Our Clients Access Information?
Database Log-ins by Domain, FY2001

Domain              Pct. of Log-ins
Campus Residences   10
Off-Campus          15
In-Library          25
On-Campus Depts     50
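Sorting log-ins into these four domains comes down to rules on the client's host name or IP address. A sketch with entirely hypothetical rules: the slides give no subnets or naming conventions, the residence network below is the reserved documentation range 192.0.2.0/24 standing in for real ones, and counting dial-in hosts and other numeric addresses as off campus is our assumption (the subject table keeps dial-in and unresolved addresses as separate rows).

```python
import ipaddress

# Hypothetical rules: the slides do not give Penn's actual subnets or host names.
RESIDENCE_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]  # placeholder range
LIBRARY_SUFFIX = ".lib.upenn.edu"
CAMPUS_SUFFIX = ".upenn.edu"
DIALIN_PREFIX = "dialin"  # assumption: the dial-in pool counts as off campus


def classify_host(host):
    """Assign a client host name or IP address to one of four access domains."""
    host = host.lower()
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        address = None  # not a numeric address; fall through to name rules
    if address is not None:
        # Numeric addresses: only the listed residence networks count as on campus.
        if any(address in net for net in RESIDENCE_NETWORKS):
            return "Campus Residences"
        return "Off-Campus"
    if host.endswith(LIBRARY_SUFFIX):
        return "In-Library"
    if host.endswith(CAMPUS_SUFFIX) and not host.startswith(DIALIN_PREFIX):
        return "On-Campus Depts"
    return "Off-Campus"


for h in ("pub237.lib.upenn.edu", "dialin1085.upenn.edu",
          "dial-123-130.dial.indiana.edu", "192.0.2.15"):
    print(h, "->", classify_host(h))
```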
Where Do Communities of Clients Work?
[Figure: database log-ins from off campus as a percent of total log-ins, by school or center, FY2001]
When Are They Working?
[Figure: database use by time of day, FY2001]
How Does Audience Composition Change Through the Day?
[Figure: database use by hour, FY2001]
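Both time-of-day charts bucket log-ins by the hour in the server timestamp. A sketch, assuming log-ins arrive as (timestamp, domain) pairs with timestamps in the logs' own format; hours are read in the offset the server recorded (-0500 in the samples). The first two timestamps come from the sample logs, while the third event and all the domain labels are illustrative.

```python
from collections import Counter, defaultdict
from datetime import datetime


def parse_stamp(text):
    """Parse a server-log timestamp such as '04/Feb/2001:00:18:02 -0500'."""
    return datetime.strptime(text, "%d/%b/%Y:%H:%M:%S %z")


def logins_by_hour(timestamps):
    """Return 24 counts, one per hour of the day (0-23)."""
    counts = Counter(ts.hour for ts in timestamps)
    return [counts[hour] for hour in range(24)]


def composition_by_hour(events):
    """Map each hour to a Counter of client domains from (timestamp, domain) pairs."""
    table = defaultdict(Counter)
    for ts, domain in events:
        table[ts.hour][domain] += 1
    return table


events = [
    (parse_stamp("04/Feb/2001:00:17:38 -0500"), "Off-Campus"),
    (parse_stamp("04/Feb/2001:00:18:04 -0500"), "Off-Campus"),
    (parse_stamp("04/Feb/2001:14:05:00 -0500"), "In-Library"),  # illustrative
]
hourly = logins_by_hour(ts for ts, _ in events)
print(hourly[0], hourly[14])  # 2 1
print(dict(composition_by_hour(events)[0]))  # {'Off-Campus': 2}
```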
The Data Farm Experiment: Tools That Serve Information Access Can Also Serve Measurement
[Figure: schematic of the Data Farm as of May 2002]
Why Are the Data Important?
"If you don't know where you're going, you'll probably end up somewhere else." - Casey Stengel
- To Demonstrate Accountability: Is the library spending the schools' money effectively? (Pressures of Penn's responsibility center budget environment)
- To Understand and Describe the Transfer of Technology: Is the academic information universe a digital universe (as some at Penn believe)? Is the digital universe more cost-efficient than the paper one (as some at Penn believe)?
- To Guide the Improvement of Existing Services and the Development of New Services
- To Ensure the Successful Fulfillment of Our Mission