Title: Data Preservation Imperatives: The Role of the US National Science Foundation
1. Data Preservation Imperatives: The Role of the US National Science Foundation
- Lucy Nowell, Ph.D.
- Office of Cyberinfrastructure
- Conference on Permanent Access to the Records of Science, Brussels, Belgium
- 15 November 2007
2. Outline
- NSF Office of Cyberinfrastructure
- Motivation for Data Preservation
- Role of Universities and Academic Libraries
- Characteristics of the Digital Age
- NSF OCI Data Strategic Vision and Goals
4. NSF Act of 1950
- To promote the progress of science
- Encourage and develop a national policy for the promotion of basic research and education in the mathematical, physical, medical, biological, engineering, and other sciences
- Initiate and support basic scientific research in the sciences
5. U.S. President
- Office of Science and Technology Policy (Science Advisor)
- Office of Management and Budget
- Other boards, councils, etc.
- Major Departments: Health and Human Services, Commerce, Agriculture, Interior, Homeland Security, Defense, Energy
- Independent Agencies: National Aeronautics and Space Administration, Environmental Protection Agency, Nuclear Regulatory Commission, Smithsonian Institution, other agencies
6. Research Directorates
- Biological Sciences
- Computer & Information Science & Engineering
- Education & Human Resources
- Engineering
- Geosciences
- Mathematical & Physical Sciences
- Social, Behavioral & Economic Sciences
7. New Modes of Investigation
- The conduct of science and engineering is changing and evolving. This is due, in large part, to the expansion of networked cyberinfrastructure.
NSF Strategic Plan 2006-2011
8. Office of Cyberinfrastructure (OCI)
- Dan Atkins, Office Director
- José Muñoz, Deputy Office Director
- Judy Hayden, Mary Daley, Irene Lombardo, Deborah White
- Terry Langendoen, Lucy Nowell, Diana Rhoten, Kevin Thompson, Steve Meacham, Abani Patra
- Learning & Workforce; Virtual Organizations; Software/Middleware; High Performance Computing; Data
9. Cyberinfrastructure
- Cyberinfrastructure is the organized aggregate of technologies that enable us to access and integrate today's information technology resources (data and storage, computation, communication, visualization, networking, scientific instruments, expertise) to facilitate science and engineering goals.
Fran Berman, Director, San Diego Supercomputer Center (SDSC)
10. CI Vision: 4 Interrelated Perspectives
- Collaboratories, Observatories & Virtual Organizations
- Learning & Workforce Development
11. The Fragility of Memory in a Digital Age
In 1964, the first electronic mail message was
sent from either MIT, the Carnegie Institute, or
Cambridge University. The message does not
survive, however, and so there is no documentary
record to determine which group sent the
pathbreaking message.
Report of the Task Force on Archiving of Digital Information, Commission on Preservation and Access and the Research Libraries Group
12. NASA plans new search for missing moon tapes
- Aug. 15, 2006, 5:13 PM
- Seth Borenstein, Associated Press
- WASHINGTON - NASA said today it was launching an official search for more than 13,000 original tapes of the historic Apollo moon missions.
13.
| Study | Resource type | Resource half-life |
|---|---|---|
| Koehler (1999 and 2002) | Random Web pages | 2.0 years |
| Nelson and Allen (2002) | Digital library objects | 24.5 years |
| Harter and Kim (1996) | Scholarly article citations | 1.5 years |
| Rumsey (2002) | Legal citations | 1.4 years |
| Markwell and Brooks (2002) | Biological science education resources | 4.6 years |
| Spinellis (2003) | Computer science citations | 4.0 years |

Source: Koehler, W. (2004). Information Research, 9(2), 174.
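A half-life is the time after which half of a set of resources is expected to have disappeared. The minimal Python sketch below shows what the half-lives in the table imply over a decade. It assumes losses follow simple exponential decay, which is the usual reading of a half-life figure but is an assumption here, not a claim about how each cited study modeled its data. The function names `surviving_fraction` and `years_until` are hypothetical, chosen for illustration.

```python
import math

# Resource half-lives in years, copied from the table of studies above.
HALF_LIVES = {
    "Random Web pages": 2.0,
    "Digital library objects": 24.5,
    "Scholarly article citations": 1.5,
    "Legal citations": 1.4,
    "Biological science education resources": 4.6,
    "Computer science citations": 4.0,
}


def surviving_fraction(years: float, half_life: float) -> float:
    """Fraction of resources still reachable after `years`, assuming exponential decay."""
    return 0.5 ** (years / half_life)


def years_until(fraction: float, half_life: float) -> float:
    """Years until only `fraction` of the resources remain, under the same assumption."""
    return half_life * math.log2(1.0 / fraction)


if __name__ == "__main__":
    for resource, h in HALF_LIVES.items():
        print(f"{resource}: {surviving_fraction(10, h):.1%} still reachable after 10 years")
```

Under this assumption, a random Web page with a 2.0-year half-life has a 0.5^5 = 1/32 (about 3%) chance of still being reachable after 10 years.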
14. Replication of Results: A Cornerstone of Science
- ...the results of one scientist's experiment are not considered reliable until another scientist has replicated them. The reproducibility of results plays several different, crucial roles in science, but in many circumstances, considerations of time and money often make reproducibility impractical.
The Key Role of Replication in Science, Nancy S. Hall, The Chronicle of Higher Education, 10 November 2000
15. Replication of Results
- First and foremost, scientists attempt to reproduce someone else's experiment if they doubt that the results are accurate, or if the results contradict a view that is widely accepted in the field.
- An experiment is so reproducible that replicating it becomes a test of the student: if the student cannot replicate the experiment, it is the student who is at fault.
- As a training exercise, a new person in a group might be asked to repeat experiments that others have already performed, both to familiarize the newcomer with the work of the group and to give the older members a sense of the newcomer's expertise.
The Key Role of Replication in Science, Nancy S. Hall, The Chronicle of Higher Education, 10 November 2000
16. Replication of Data Collection Not Always Feasible
- Medical experiments carried out over years or decades, involving hundreds or even thousands of human subjects.
- Events that are singular and beyond the experimenter's control, like comets, earthquakes, and volcanic eruptions.
The Key Role of Replication in Science, Nancy S. Hall, The Chronicle of Higher Education, 10 November 2000
17. A Global Response
- Ensuring research data are easily accessible, so that they can be used as often and as widely as possible, is a matter of sound stewardship of public resources.
Organisation for Economic Co-operation and Development (OECD), Promoting Access to Public Research Data for Scientific, Economic, and Social Development
18. A Challenge for Society
- If we are effectively to preserve for future generations the ... corpus of information in digital form that represents our cultural record, we need to commit ourselves technically, legally, economically, and organizationally to the full dimensions of the task.
Report of the Task Force on Archiving of Digital Information, 1996, Commission on Preservation and Access and the Research Libraries Group
19. The Universities
- Ever since their inception, universities have been occupied with the fundamental elements of what we now call 'knowledge management', i.e. the creation, collection, preservation and dissemination of knowledge.
André Oosterlinck, Knowledge Management in Post-Secondary Education: Universities
20.
- The distinctive mission of the University is to
serve society as a center of higher learning,
providing long-term societal benefits through
transmitting advanced knowledge, discovering new
knowledge, and functioning as an active working
repository of organized knowledge.
Mission Statement of the University of California
21. The Academic Libraries
- It is to the research library community that
others will look for the preservation of
digital assets, as they have looked to us in the
past for reliable, long-term access to the
traditional resources and products of research
and scholarship.
Association of Research Libraries (ARL) Strategic
Plan 2005-2009
22.
- Information is the currency of the digital age
and information integration is the means for
mobilizing that currency for discovery,
innovation, learning, and progress.
27. Before the Digital Age: A World Constrained to 4 Dimensions
28. 5th Dimension: CI
29. Opening a 5th dimension through cyberinfrastructure is the revolutionary force of the digital age.
30. Characteristics of a 5D World (in priority order)
- Time and place are no longer barriers to participation and interaction
- Access is open to specialists and non-specialists alike
- Information is the primary driver for progress
- The realm of the possible is expanded through new capabilities, resources, and mechanisms
31. Individuals, groups, organizations, and nations that don't embrace the 5th dimension will fall behind in the digital age.
32. The World Is Flat - Thomas Friedman
The flat world is expanding - Anonymous OCI program director
- More room for innovation
- New spaces for learning and discovery
- Expanded opportunities for collaboration and interaction
- Greater capabilities for research and education
33. NSF Draft Strategic Plan for Data, Data Analysis, and Visualization
Chapter 3
http://www.nsf.gov/pubs/2007/nsf0728/index.jsp
34. Vision
- Science and engineering digital data are routinely deposited in a well-documented form, are regularly and easily consulted and analyzed by specialists and non-specialists alike, are openly accessible while suitably protected, and are reliably preserved.
NSF Cyberinfrastructure Vision for 21st Century Discovery, Chapter 3
35. Goals
- To catalyze the development of a system of science and engineering data collections that is open, extensible, and evolvable.
- To support development of a new generation of tools and services facilitating data acquisition, mining, integration, analysis, and visualization.
36. Principles
- Data generated with NSF funding will be accessible and reliably preserved
- Research/education opportunities determine investment priorities
- Broad community engagement is necessary in reviewing and prioritizing data activities
37. Principles (cont'd)
- Data is only useful if it can be found, understood, and analyzed
- Legitimate privacy, confidentiality, and intellectual property rights must be protected
- International, interagency, and public-private partnerships are essential
38. Digital Data Preservation and Access Framework
39. DataNet
- A robust and resilient national and global digital data framework for preservation of and access to the resources and products of the digital age
- Provide reliable digital preservation, access, integration, and analysis capabilities for science and/or engineering over a decades-long timeline (sustainability)
- Continuously anticipate and adapt to changes in technologies and in user needs and expectations
- Engage at the frontiers of science and engineering research and education, with research and development to drive the leading edge forward
- Serve as component elements of an interoperable data preservation and access network spanning national and international boundaries (shared governance and standards)
- Create new types of organizations that fully integrate all of these capabilities
40. DataNet Partners
- Combine expertise in library and archival sciences; computer, computational, and information sciences; cyberinfrastructure; and domain sciences and engineering
- Develop models for economic and technological sustainability over multiple decades
- Engage at the frontiers of science and engineering research and education
- Work cooperatively and in coordination to create a functional data network with revolutionary new capabilities for information access, use, and integration without regard to conventional barriers such as data type and format, discipline or subject area, and time and place/institution
41. DataNet Partner Responsibilities
- Provide for the full data management life cycle:
  - Data deposition/acquisition/ingest
  - Data curation and metadata management
  - Data protection, including privacy
  - Data discovery, access, use, and dissemination
  - Data interoperability, standards, and integration
  - Data evaluation, analysis, and visualization
- Engage in research central to DataNet responsibilities
- Education and training
- Community user input and assessment
- International engagement: collaborate and coordinate closely with preservation and access organizations to catalyze formation of a global data network
- Foreign collaborators are expected to secure support from their own national sources.
42. Summary: Strategic Plan
- Promote a change in culture
- Catalyze development of a national digital data framework
- Support new generations of tools, services, and capabilities
43. NSFNet Traffic, September 1991
44. The World Wide DataNet at T = T0
Figure legend: Data point-of-presence
45. The World Wide DataNet at T = TN
46. The Whole Is Greater Than the Sum of Its Parts
- Climate Change
- Pandemic
- Drought and Starvation
- Sustainable Energy
- Aging Populations
- Human Behavior under Stress
- Etc.
47. Thank you!