Balancing Evidence-Based Librarianship and Protecting Patron Privacy through the Bibliomining Process

1
Balancing Evidence-Based Librarianship and
Protecting Patron Privacy through the
Bibliomining Process
Protecting Patron Privacy
Evidence-Based Librarianship
Bibliomining Process
  • Scott Nicholson
  • Assistant Professor
  • Syracuse University School of Information Studies

2
Overview
  • Evidence-Based Librarianship
  • Information Seeking in Context
  • Threats to Patron Privacy
  • USA PATRIOT Act
  • Bibliomining Process
  • Data Warehousing

3
Evidence-Based Librarianship
4
Evidence-Based Librarianship Idea
  • Basic Idea
  • Use data-based evidence to make decisions

5
Don't we already do that?
  • Do we?
  • Many times, the decision is made on beliefs
    about user needs based on tacit knowledge
  • Then evaluated afterwards
  • EBL focuses on using data-based evidence first
    to make the decision
  • Then evaluates as well

6
Evidence-Based Librarianship conceptualization
  • EBL is based upon Evidence-Based Medicine
  • EBM concept: Combine study results (evidence) to
    make a decision.
  • EBL translation: Use library research projects
    (best available evidence) and combine results to
    better understand a phenomenon

7
EBL Levels of Evidence
  • Reviews of rigorous studies
  • Reviews of less rigorous studies
  • Randomized controlled trials
  • Controlled comparison studies
  • Cohort studies
  • Descriptive surveys
  • Case studies
  • Decision analysis
  • Qualitative research (focus groups, etc.)

From Eldredge, J. (2000). Evidence-based
librarianship: An overview. Bulletin of the
Medical Library Association, 88(4), 289-302, Table
2.
8
EBL Conceptualization
Patrons
Patron representations
Generalization from study
Generalizations across studies -> Evidence
Problems? Bias in elicitation, different
elicitation methods, "not in my library"
9
Problems with Traditional EBL
  • Small base of library research that is similar
    enough to be used together
  • EBM has a much larger base of controlled
    randomized research
  • Different elicitation methods -> different types
    of generalizations
  • Argument: "Those studies don't apply to _my_
    patrons."

10
Thinking about Context
  • Information Seeking in Context framework
  • Importance of capturing context and not just
    seeking behavior
  • Dervin, Kuhlthau, Taylor

11
Contexts for Library Decision-Making
  • Taylor's Information Use Environments (IUE)
  • Assumption: People within a group are more
    similar than people outside the group (in some
    way)
  • Similar tasks, settings, constraints
  • Similar value of what is useful

12
Groups of Users
  • Professions
  • Entrepreneurs
  • Special interest groups
  • Special socioeconomic groups
  • Institution-specific groupings
  • Department/Major
  • Educational Level
  • Groups of users = Communities

13
Model combining ISiC and IR Evaluation
Järvelin, K., & Ingwersen, P. (2004). "Information
seeking research needs extension toward tasks and
technology." Information Research, 10(1), paper
212.
14
One User, Several Communities
Profession
Special Interest Group
Socioeconomic Group
But what about Privacy?
15
Protecting User Privacy
16
Privacy Issues
  • Librarians want to provide a safe space for users
  • US government threats to library data
  • Library Awareness Program (70s-80s, tracking
    reading patterns)
  • Patchwork of state laws covering data
  • USA PATRIOT and USA PATRIOT II

17
Problem with PATRIOT
  • Agents can request data on one person
  • They aren't picky; they can get much more data
  • But, of course, they won't use it
  • Result: Anything the library keeps could be
    taken.

18
First Response: Destroy!
  • Just don't keep it!
  • Libraries were discussed in the news as deleting
    records to protect patrons
  • Why? We're not using it
  • Permanent solution
  • Causes more problems

19
Ways of protecting user privacy
  • Ignorance: data audit
  • Backups?
  • Deletion
  • Potential problems
  • Encoding
  • Being selective in archival
  • Data Warehouse

20
The Data Warehouse
21
The Data Warehouse
  • Data Warehouse: One place for data
  • Collected from Different Systems
  • Cleaned and Joined
  • Outside of Functional System
  • Place for Reporting and Analysis
  • But also support for normal library operations
  • Creates awareness of library data
  • Key Concept: Operational vs. Archival data

22
Data Produced through Library Services
  • Data about Materials
  • Data about Users
  • Data about Services

23
Work
24
Representation of Work
25
Bibliographic surrogate
  • Information taken from the work
  • Title, Author, Abstract (?), Publisher
  • Information created to describe the work
    (metadata)
  • Subject headings, Classification, Type, Keywords
  • Information about access to the work
  • Call Number, General Location, Form(s)

26
Bibliographic Surrogate
27
(No Transcript)
28
User
29
Data about Users
  • Personal information (remove)
  • Demographic Surrogate
  • Information Use Environment
  • Collected during application or enrollment
    process
  • External from other sources
  • Matching Zip code to demographic database
  • Match Proxy ID to student database or company
    database
  • Assumptions
  • IP address -> physical location (on- or
    off-campus)

30
Protecting the Privacy of Users
  • Demographic Surrogate
  • Contexts without Identification
  • HIPAA
  • Upcoming in JASIST
  • 18 Items in 4 groups
  • Direct identifiers and identifiers that connect
    into other databases
  • Address and location information
  • Dates related to an individual
  • Contact information.

31
Methods for dealing with Personally Identifiable
Information
  • Use codes and IDs for matching, then discard
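The "use for matching, then discard" idea can be sketched as a one-time join during loading: the patron ID pulls context (the demographic surrogate) from a second system and is then left behind. This is a minimal sketch; the record layouts, the field names `department` and `level`, and the "Unknown" fallback are hypothetical choices, not a real library system's schema.

```python
# Hypothetical extracts from two operational systems. Field names and
# values are illustrative, not taken from any real library system.
circulation = [
    {"patron_id": "P100", "call_number": "QA76.9 .D343", "checkout": "2004-09-12"},
    {"patron_id": "P200", "call_number": "PS3545 .I345", "checkout": "2004-09-13"},
    {"patron_id": "P999", "call_number": "R727.3 .B5", "checkout": "2004-09-14"},
]
student_db = {
    "P100": {"name": "A. Student", "department": "Computer Science", "level": "Graduate"},
    "P200": {"name": "B. Student", "department": "English", "level": "Undergraduate"},
}

# Context worth keeping (the demographic surrogate); names and IDs are not kept.
SURROGATE_FIELDS = ("department", "level")


def to_warehouse(records, directory):
    """Use the patron ID only to look up context, then leave it behind."""
    rows = []
    for rec in records:
        person = directory.get(rec["patron_id"], {})
        # Copy everything except the identifier.
        row = {key: value for key, value in rec.items() if key != "patron_id"}
        # Attach only the surrogate fields; unmatched IDs become "Unknown".
        for field in SURROGATE_FIELDS:
            row[field] = person.get(field, "Unknown")
        rows.append(row)
    return rows


warehouse_rows = to_warehouse(circulation, student_db)
for row in warehouse_rows:
    print(row)
```

Because the join happens in memory while loading, neither the ID nor the name ever reaches the warehouse rows.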

32
Coding and not discarding
  • Use when some component of ID is important
  • Example: IP addresses
  • Useful to know when it was the same address
  • Extract important info
  • Recode into new variable
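Recoding an IP address might look like the following sketch: keep the component that matters (on- or off-campus) and an opaque code that shows when the same address recurs, then drop the address itself. The campus range below is a hypothetical placeholder taken from a reserved documentation block, and the `IP1`, `IP2` labels are an illustrative convention.

```python
import ipaddress

# Hypothetical campus address block (a reserved documentation range).
CAMPUS_NET = ipaddress.ip_network("198.51.100.0/24")


def recode_ips(ip_list):
    """Replace each IP with an on/off-campus flag and an opaque batch code."""
    codes = {}  # temporary lookup table; discarded when the function returns
    recoded = []
    for ip in ip_list:
        addr = ipaddress.ip_address(ip)
        if addr not in codes:
            codes[addr] = f"IP{len(codes) + 1}"
        recoded.append({
            "location": "on-campus" if addr in CAMPUS_NET else "off-campus",
            "ip_code": codes[addr],  # same address -> same code within a batch
        })
    return recoded


print(recode_ips(["198.51.100.7", "203.0.113.5", "198.51.100.7"]))
```

Since the lookup table is thrown away after each run, codes are only comparable within one batch; that limits linkage across time, which is the privacy trade-off being made here.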

33
Methods for dealing with Personally Identifiable
Information
  • Use for matching and discard

34
Dealing with categories
Make sure that combinations of categories don't
identify an individual.
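One common way to check this is a k-anonymity-style count: group records by their combination of categories and flag any combination shared by fewer than k records. A minimal sketch follows; the threshold `k=5` and the sample rows are hypothetical, and choosing k is a local policy decision.

```python
from collections import Counter


def rare_combinations(rows, fields, k=5):
    """Return category combinations shared by fewer than k records."""
    counts = Counter(tuple(row[f] for f in fields) for row in rows)
    return {combo: n for combo, n in counts.items() if n < k}


# Hypothetical demographic surrogates: five similar graduate students and
# one record whose combination of categories is unique.
rows = [{"level": "Graduate", "department": "Computer Science"}] * 5 + [
    {"level": "Faculty", "department": "Naval Science"}
]
print(rare_combinations(rows, ["level", "department"], k=5))
```

A flagged combination can then be generalized (for example, department to college) or suppressed before it enters the warehouse.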
35
Demographic Surrogate
36
Enter the Library: Connecting Users to Information
  • Different methods of connection (based upon
    material)
  • Searching
  • Circulation
  • In-House Use
  • Reference
  • Interlibrary Loan

37
Baseline for Library Services
  • Time (length of time when appropriate)
  • Date
  • Location
  • Method
  • Physical
  • Digital
  • Staff involved
  • Concurrent with other resources/services

38
Differences in Services
  • Searching
  • Path of search, success of search
  • Reference
  • Content of transaction, Path of referral
  • ILL
  • Cost, speed, pattern of use
  • All have same baseline, but different additional
    fields
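"Same baseline, different additional fields" maps naturally onto a base record type that each service extends. The sketch below uses Python dataclasses; every class and field name is a hypothetical illustration of the slides' lists, not a published schema.

```python
from dataclasses import asdict, dataclass, fields
from datetime import date, time
from typing import Optional, Tuple


@dataclass
class ServiceEvent:
    """Baseline fields shared by every library service (hypothetical schema)."""
    event_date: date
    event_time: time
    duration_minutes: Optional[int]    # length of time, when appropriate
    location: str
    method: str                        # "physical" or "digital"
    staff_involved: Optional[str]
    concurrent_services: Tuple[str, ...]


@dataclass
class SearchEvent(ServiceEvent):
    search_path: Tuple[str, ...]
    successful: bool


@dataclass
class ReferenceEvent(ServiceEvent):
    transaction_content: str
    referral_path: Tuple[str, ...]


@dataclass
class ILLEvent(ServiceEvent):
    cost: float
    days_to_fill: int                  # speed of delivery


BASELINE = [f.name for f in fields(ServiceEvent)]

ill = ILLEvent(date(2004, 10, 1), time(14, 30), None, "Main Library",
               "digital", "ILL office", (), cost=12.50, days_to_fill=6)
print(asdict(ill))
```

Because every service shares the baseline, reports on time, location, or staff can run across all services, while service-specific questions use the extra fields.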

39
Library Services (Searching, Circulation/Use,
Reference, Outreach/Training, ILL/Request): Time,
Date, Location, Format, Staff Involved, Cost,
Concurrent use
Demographic Surrogate
40
Data-Based Differences: Practitioners and
Researchers
  • Problems in cooperation
  • Different purposes to research
  • Practitioners
  • Specific to own library operations
  • Generalizability not as important
  • Researchers
  • Look across multiple library operations
  • Different applications of research
  • Data Warehouse can serve both needs

41
Library Operations: Library personnel, Selection/Acquisition,
Cataloging, Staffing
Library Services (Searching, Circulation/Use,
Reference, Outreach/Training, ILL/Request): Time,
Date, Location, Format, Staff Involved,
Concurrent use
Demographic Surrogate
42
Library Operations: Library personnel, Selection/Acquisition,
Cataloging, Staffing
Library Services (Searching, Circulation/Use,
Reference, Outreach/Training, ILL/Request): Time,
Date, Location, Format, Staff Involved,
Concurrent use
Bibliometric Data: Social networks, Citations/Links,
Disciplines, Affiliations
Demographic Surrogate
43
Library Operations: Library personnel, Selection/Acquisition,
Cataloging, Staffing
Bibliometric Data: Social networks, Citations/Links,
Disciplines, Affiliations
Library Services (Searching, Circulation/Use,
Reference, Outreach/Training, ILL/Request): Time,
Date, Location, Format, Staff Involved,
Concurrent use
E-Resource Activity: From vendors (currently
aggregates; individual items desired), Proxy
server
Demographic Surrogate
44
Dealing with Textual data
  • Digital Reference transactions
  • Easy to deal with the metadata
  • Hard to deal with the text
  • Manual cleaning of PII
  • Similar problem with deidentification of medical
    records
  • Natural Language Processing research

45
Finding the Connections
  • Data mining is about patterns
  • Patterns can come from links between works
  • Connecting the data sources allows for more links
    between works
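Links from user selection can be counted much like co-citation: two works are linked each time they are selected together. The sketch below assumes selections have already been grouped into anonymous sessions (no patron identity); the session lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_selection_links(sessions):
    """Count how often each pair of works is selected in the same session."""
    links = Counter()
    for works in sessions:
        # sorted(set(...)) makes each pair order-independent and counted once per session
        for a, b in combinations(sorted(set(works)), 2):
            links[(a, b)] += 1
    return links


# Hypothetical sessions, each keyed only by an anonymous session, not a patron.
sessions = [
    ["Work A", "Work B", "Work C"],
    ["Work A", "Work B"],
    ["Work C", "Work B"],
]
links = co_selection_links(sessions)
print(links.most_common())
```

Joining more data sources (circulation, ILL, e-resource logs) adds more sessions and therefore more links between the same works.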

46
Links used for Bibliometrics
Patterns from Creation and Publication
[Diagram: works linked to one another through citations, authors, collections (journal, package, vendor, etc.), and subjects]
47
Links used for Data Mining
Patterns from user selection
[Diagram: works and authors linked through the demographic surrogates (ANY context) of the users who selected them]
48
Anonymization
[Diagram: works and authors linked only to demographic surrogates (ANY context)]
49
Data sources for Bibliomining
[Diagram: Bibliometrics (citations, authors, collections, subjects) and Data Mining (demographic surrogates) combine into Bibliomining, all linked through works]
50
Library Data Warehouse
Cleaned, Archived, Anonymized data kept separate
from the operational systems
Data Warehouse
51
Traditional method of applying concepts
Data Warehouse: Bibliographic Surrogate, Library
Service, Demographic Surrogate
Bibliometrics: Add Citations, Author Affiliation,
Connections
Library Operations: Add Publishers/Vendors,
Plans, Staff
Practitioner Data Mart (Internal Validity)
Researcher Data Mart (External Validity)
52
Current method of applying concepts
Data Warehouse
Practitioner Data Mart: Reports, Models, Tools
Researcher Data Mart: Reports, Models, Tools
Practitioner results: Improve local services, occasionally
external
Researcher results: Generalized scholarship, sometimes local
53
Open Efforts Project
Data Warehouse
Standards and Schema
Practitioner Data Mart: Reports, Models, Tools
Researcher Data Mart: Reports, Models, Tools
Infrastructure
Results: Librarians benefit directly from
researchers; researchers get data and gain a
better understanding of librarian needs
Practitioner results: Improve local services, occasionally
external
Researcher results: Generalized scholarship, sometimes local
54
You've collected the data
  • Now what?

55
Dealing with Data
  • Traditional Methods of Analysis
  • Online Analytical Processing
  • Visualization
  • Data Mining
  • Rolling it all into the Bibliomining Process

56
Traditional Analysis
  • Aggregates and Averages
  • Individual Reports
  • Who has large late fees?
  • What books haven't circulated in years?
  • What databases get the heaviest usage?
  • Stable, dependable reports
  • Useful for comparing over time
  • Great as a baseline

57
Penn Library Data Farm
  • Web-based front end for ad-hoc reporting
  • More data going in from additional sources
  • Project is moving from exploratory to central in
    reporting
  • Library staff producing more quantitative reports
  • Trickle-down and grass-roots effects

58
(No Transcript)
59
Humanities
Archeology
Patterns? Bibliometric laws predict that there
are.
60
The Problem with Aggregates
  • Topics are set ahead of time
  • Time-consuming to ask new questions
  • Aggregates and averages mask differences in
    groups
  • Example:
  • On average, "our service is rated very good."
  • The reality:
  • 10 people rate the service Excellent (5 out of 5)
  • 5 people rate the service Good (3 out of 5)
  • 5 people rate the service Poor (1 out of 5)
  • Average: 3.5
  • Much more useful to understand the breakdown
  • More importantly, what variables correspond with
    each group?
  • Time of day? Demographics? Other services used?
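The arithmetic behind the example is (10 x 5 + 5 x 3 + 5 x 1) / 20 = 70 / 20 = 3.5. A short sketch that computes both the single average and the breakdown it hides, using only the ratings from the slide:

```python
from collections import Counter
from statistics import mean

# The ratings from the example above: 10 x 5, 5 x 3, 5 x 1.
ratings = [5] * 10 + [3] * 5 + [1] * 5
labels = {5: "Excellent", 3: "Good", 1: "Poor"}

average = mean(ratings)
breakdown = Counter(labels[r] for r in ratings)

print(f"Average: {average}")
for label, count in breakdown.most_common():
    print(f"{label}: {count} of {len(ratings)}")
```

The average alone suggests a single "good" experience; the breakdown shows a quarter of respondents rating the service Poor, which is the group worth explaining with other variables.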

61
On-Line Analytical Processing (OLAP)
  • OLAP: Reports on demand
  • Excel Pivot Tables
  • Interactive headings and entries
  • Designed to be used by managers
  • Allow for easy ad-hoc reporting
  • Demo: Normative Data Project
  • Sirsi/Dynix
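The core of a pivot table is counting records for each combination of a row heading and a column heading, with the headings chosen at report time. A minimal pure-Python sketch; the `pivot` function and the sample records are hypothetical and stand in for what an OLAP tool or Excel Pivot Table does interactively.

```python
from collections import defaultdict


def pivot(records, row_dim, col_dim):
    """Count records for every (row, column) pair, like a simple pivot table."""
    table = defaultdict(lambda: defaultdict(int))
    for rec in records:
        table[rec[row_dim]][rec[col_dim]] += 1
    return {row: dict(cols) for row, cols in table.items()}


# Hypothetical circulation records with anonymized context only.
records = [
    {"patron_type": "Undergraduate", "lc_class": "P", "branch": "Main"},
    {"patron_type": "Undergraduate", "lc_class": "Q", "branch": "Science"},
    {"patron_type": "Graduate", "lc_class": "Q", "branch": "Science"},
    {"patron_type": "Faculty", "lc_class": "P", "branch": "Main"},
    {"patron_type": "Graduate", "lc_class": "Q", "branch": "Main"},
]

# Swapping the headings gives a new report without writing a new query.
print(pivot(records, "patron_type", "lc_class"))
print(pivot(records, "branch", "patron_type"))
```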

62
Normative Data Project - Circulation
63
(No Transcript)
64
(No Transcript)
65
OLAP Concept
66
Data mining
  • History of data mining
  • Extraction of patterns from large data sets
  • Requires the same metadata about each record in
    the dataset
  • Uses statistical, visual, and AI techniques
  • And, more importantly, people.
  • Different types of data mining
  • Directed vs. Undirected
  • Descriptive vs. Predictive

67
Data Mining Goal
  • Locate patterns that are
  • Novel
  • Meaningful
  • Actionable
  • Many patterns will be
  • Trivial (if freshman, then undergrad)
  • Not meaningful (Odd birth month -> Late)
  • Not actionable (anti-redlining laws)
  • To sort requires a domain expert

68
Data Mining Concept
OLAP
DM
69
Case Study - Library
  • Penn Library Data Farm
  • 10,000 circulation records from Fall 2004
  • Items that had been returned
  • Patron classification and circulation information

70
(No Transcript)
71
  • Data Cleaning
  • Dropped non-LC call numbers
  • Took only the first letter of the LC class
  • Started with the first 2 letters
  • Too many classes
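The cleaning step can be sketched with a rough pattern: LC call numbers begin with one to three class letters followed by a number, so anything that does not match (for example, a Dewey number) is dropped, and the class is truncated to the first letter. The regular expression and the `lc_class` helper are hypothetical heuristics, not the procedure used in the case study.

```python
import re

# Rough heuristic: 1-3 capital letters, optional space, then a digit.
LC_PATTERN = re.compile(r"^([A-Z]{1,3})\s?\d")


def lc_class(call_number, letters=1):
    """Return the first `letters` letters of the LC class, or None if not LC."""
    match = LC_PATTERN.match(call_number.strip().upper())
    if match is None:
        return None  # non-LC (e.g., Dewey): drop from the analysis
    return match.group(1)[:letters]


call_numbers = ["QA76.9 .D343", "PS3545 .I345", "025.04 N53", "R727.3 .B5"]
classes = [lc_class(c) for c in call_numbers]
kept = [c for c in classes if c is not None]
print(kept)
```

Passing `letters=2` reproduces the finer two-letter grouping that the slide found too detailed. Local schemes that start with letters and a number could slip through this pattern, so a real cleaning pass would need a check against the library's own call-number conventions.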

72
(No Transcript)
73
(No Transcript)
74
More about Prediction
  • Assign a value or category
  • Requires training data from the past
  • Result may be rules or formulas
  • Important: Correlation is not causation.
  • People with sunstroke are sunburned.
  • Does sunstroke cause sunburn?
  • Requires domain expert to confirm

75
Considering Patterns
Question asked: What are the best predictors of
circulation length? Clementine says: "If renewal
count is 0 or 1, the average circulation is 15
days. If renewal count is higher than 1, the
average circulation is 60 days." What about this
rule?
76
Considering Patterns
  • 1. If item_type in "audio", "bound jrnl", "music",
    "non-circ", "reference", "reserve",
    "special", "video"
  • Ave: 2.812
  • 2. If item_type in "bestseller", "book/seria",
    "twoweek"
  • Ave: 28.044
  • then if LCName in "MEDICINE", "MUSIC",
    "NAVAL SCIENCE", "SCIENCE"
    -> 14.645
  • else -> 33.888
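Once a tree like this is accepted by a domain expert, it can be applied to new items as plain code. The sketch below re-expresses the transcribed rules; it is not Clementine's export format, and returning `None` for item types the slide does not list is an assumption, since the transcript may show only part of the tree.

```python
# Item types and LC names exactly as transcribed from the rule set above.
SHORT_TYPES = {"audio", "bound jrnl", "music", "non-circ", "reference",
               "reserve", "special", "video"}
LONG_TYPES = {"bestseller", "book/seria", "twoweek"}
SHORT_LC = {"MEDICINE", "MUSIC", "NAVAL SCIENCE", "SCIENCE"}


def predicted_circulation(item_type, lc_name):
    """Return the rule's average circulation length for an item."""
    if item_type in SHORT_TYPES:
        return 2.812
    if item_type in LONG_TYPES:
        # Rule 2 (overall average 28.044) splits on LC class name.
        return 14.645 if lc_name in SHORT_LC else 33.888
    return None  # ASSUMPTION: item types not covered by the transcribed rules


print(predicted_circulation("twoweek", "SCIENCE"))
print(predicted_circulation("bestseller", "HISTORY"))
```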

77
Case of Prediction
  • Virginia Tech: Paul Metz and John Cosgriff
  • Gathered a data source with both bibliometric and
    library use data to determine which journals to
    keep.
  • Received one or more individual or departmental
    votes
  • Were profiled on CARL Reveal by five or more
    individuals
  • Were borrowed twenty or more times on ILL
  • Contained ten or more publications by Virginia
    Tech authors
  • Were cited fifty or more times by Virginia Tech
    authors
  • Were reshelved fifty or more times
  • Think about the data for each criterion. Where
    did it come from?
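Each criterion draws on a different data source, and combining them is a small rule engine over one record per journal. The sketch below uses the thresholds listed on the slide; the field names are hypothetical, and treating a journal as "kept" when it meets at least one criterion is an assumption, since the slide does not say how the criteria were combined.

```python
from dataclasses import dataclass


@dataclass
class JournalEvidence:
    """One journal's counts, each drawn from a different data source."""
    title: str
    votes: int                 # individual or departmental votes
    carl_reveal_profiles: int  # individuals profiling the title on CARL Reveal
    ill_borrowings: int        # interlibrary loan borrowings
    vt_publications: int       # publications by Virginia Tech authors
    vt_citations: int          # citations by Virginia Tech authors
    reshelvings: int           # in-house use (reshelving counts)


# Thresholds as listed on the slide.
CRITERIA = {
    "votes": lambda j: j.votes >= 1,
    "CARL Reveal profiles": lambda j: j.carl_reveal_profiles >= 5,
    "ILL borrowings": lambda j: j.ill_borrowings >= 20,
    "VT publications": lambda j: j.vt_publications >= 10,
    "VT citations": lambda j: j.vt_citations >= 50,
    "reshelvings": lambda j: j.reshelvings >= 50,
}


def criteria_met(journal):
    return [name for name, test in CRITERIA.items() if test(journal)]


def keep(journal):
    # ASSUMPTION: a journal is kept if it meets at least one criterion.
    return bool(criteria_met(journal))


j = JournalEvidence("Journal X", votes=0, carl_reveal_profiles=2,
                    ill_borrowings=25, vt_publications=0,
                    vt_citations=12, reshelvings=40)
print(criteria_met(j), keep(j))
```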

78
Prediction Uses
  • Use data from the past to predict staffing needs
    for the future
  • Use past examples to determine when to intervene
    with late material
  • Predict what a user needs based upon
  • Searching behavior
  • Works examined
  • Predict when a digital reference question could
    be handled by a match to a database

79
The Bibliomining Process
  • Data collection
  • Data cleaning
  • Asking questions
  • Analysis (data mining, bibliometrics)
  • Presenting patterns
  • Asking new questions

80
EBL Conceptualization: Multiple Library
Research Project
Patrons
Data warehouse of patron representations
  • Warehouse holds evidence for decision-making
  • Many analysis options
  • Maintain library identity
  • Libraries can benefit from consortial research

81
Importance of standard-creation projects
(COUNTER, DREW)
  • Standards for library records make
    multi-consortia data warehouses a possibility.
  • Allows for generalizable evidence for EBL
  • Supports the creation and testing of theories
    (patterns over multiple settings)
  • Entices researchers, who can create models and
    tools

82
Power of Bibliomining in Consortia
  • Library consortia using the same systems
  • Different systems need bridge programs
  • Data warehouse of information from multiple
    libraries
  • Consortia can make much stronger decisions
  • Takes competitive advantage away from
    e-publishers
  • Researchers create generalizable results
  • Services can apply research to own library

83
Conclusions
Bibliomining is the combination of data
warehousing, data mining, bibliometrics,
statistics, and reporting tools used to extract
patterns of behavior-based artifacts from library
systems.
  • Bibliomining = Data Mining + Bibliometrics

84
Goals of Bibliomining
  • Improved decision-making through better
    understanding of
  • Patterns in Resource Creation
  • Patron Behavior
  • Library Staff Behavior
  • Behavior of outside organizations
  • Can provide justification for
  • Library management policies and decisions
  • Acquisitions and ILL source selection
  • Collection development decisions
  • Use of library services (funding bodies)

85
Concerns with Bibliomining
  • May get no novel, actionable patterns
  • Threatening to domain experts
  • Must keep them involved
  • Time-consuming startup
  • Ensure they have input regarding patterns
  • Model may go out of date with changing
    circumstances
  • Monitoring procedures to detect when modeling
    variables change considerably

86
But...
  • Bibliomining doesn't provide the whole story
  • What people didn't do
  • Who didn't visit the library
  • What was useful
  • It provides a strong baseline
  • Gives you models of communities

87
People to Involve
  • Institutional Review Board (IRB)
  • Legal counsel
  • Ensures you are following state laws for library
    data
  • Library administration / Board
  • Patrons
  • If there are policies, follow them
  • If there are not, create them

88
For More Information
  • Bibliomining.com
  • Discussion list
  • Bibliography
  • OCLC WebJunction
  • Learning Center
  • Introduction to Bibliomining
  • Free!

89
Striking a Balance
  • A well-designed data warehouse strikes the
    balance between
  • Protecting Privacy
  • and
  • Maintaining a Data-Based History
  • For Evidence-Based Librarianship

90
Remember: If we delete our data-based
history, then none of this is possible.
91
Thank you for your attention!