Title: Toward a Canadian National Collaborative Data Infrastructure
Toward a Canadian National Collaborative Data Infrastructure
- Name
- Contributions by: Lynn Copeland, Kathleen Shearer, Chuck Humphrey, Mike Ridley
- date
Introduction: CNCDI
- Collaborative initiative
- Enable Canada to be a research innovation leader
- Form a locus for entrepreneurial innovation
- Seeking CFI funding
Scope
- Focus on research data
- factual records
- primary sources for research
- validate research findings
- Will
- Be widely available across sectors
- Facilitate cross-fertilization of ideas, leading to new solutions, products and understanding
- Ensure all necessary privacy rules are enforced
Data infrastructure
- Flexible, reliable
- Secure, privacy-respecting, open
- Local, global
- Affordable, high performance
- Ensure protection of privacy
Why is data stewardship important? Needs addressed
- Enables replication and verification of research results.
- Avoids duplication of research.
- Increases the visibility and impact of research.
- Encourages collaboration.
- Accelerates scientific progress.
Anticipated result
- Research data
- widely available to academy, industry, citizenry
- Facilitate cross-fertilization of ideas and domains
- Produce novel solutions
- Support new product development
- Promote greater understanding of complex problems
Proposed Canadian data stewardship principles
- All research data are recognized to be valuable
assets for the Canadian and global research
community. As assets, the proper treatment of
research data includes full lifecycle management,
asset assessment, risk management and
preservation.
- Emerging information technologies are to be
monitored, evaluated and applied to improve
methods of producing, providing open access to
and preserving research data.
- Sustainable solutions to data stewardship are to
be achieved through institutional commitments and
collaboration with communities of practice.
- Data stewardship skills and norms, including
roles and responsibilities, are to be established
from within research communities of practice
through policy support, training and curriculum
development.
- (Research Data Strategy Working Group)
Context: international
- We are on the verge of a great new leap in
scientific capability, fuelled by data.
- (HLEG on Scientific Data, EU)
- Scientific data are not homogenous in any
manner. The disciplines generating data have
widely varying practices with respect to the
reporting of experimental, observational and
calculation conditions and the resulting
metadata. Archiving practices, in terms of direct
deposition into community databases, inclusion in
peer-reviewed papers, etc. differ greatly. Yet
because almost all data are generated and managed
electronically, the dream exists of making
everything available.
- John Rumble, Jr.
Context: Canada
- Science and Technology strategy
- $10B on R&D, with associated data
- Granting councils' data stewardship policies (SSHRC, selective CIHR)
- Previous consultations
- NDAC (SSHRC, NAC)
- NCASRD (NRC, CIHR, NSERC, CFI)
- CDIS (LAC)
- Open data initiatives: municipal (Edmonton, Ottawa, Vancouver) and federal
- A large, complex issue!
Long overdue
Libraries' unique contributions
- the digital equivalent of libraries
- preserve and provide access to other types of content
- strong links with the disciplinary communities
- organized collections required
- In partnership with researchers and technology experts
Libraries' contributions
- provide metadata management, access and support for data sets
- support and host similar managed content (Institutional Repositories, Digital Content)
- play an advocacy role
Data stewardship examples
- Bioinformatics: e-Fungi, ATLAS, GIMS, Columbia, BioMart, BioWarehouse
- International social science: CESSDA, ICPSR, GESIS
- Canada:
- Data Liberation Initiative
- RDC
- TREC
- ltodesigt
- Islandora
- ABACUS
- IPY ltodesigt, UA data sharing (iRODS) initiative (with CANARIE)
- TAPoR
- VENUS, Neptune
- CANARIE Community Cloud
- Many university data centres
- has significant library leadership/involvement
Campus library leadership/involvement
- Most CARL libraries provide metadata management, access and support for data sets such as Statistics Canada data, ICPSR social science data and other local sets (police records, etc.)
- Most CARL libraries play an advocacy role in continuing the Data Liberation Initiative, creating municipal, provincial and federal Open Access policies, etc.
- Most CARL libraries support and host similar managed content (Institutional Repositories, Digital Content)
Related Initiatives: CARL Data Management Working Group
- Survey of data initiatives across Canada
- Data Management Awareness Toolkit, Research Data: Unseen Opportunities
- Addressing the Research Data Gap: A Review of Novel Services for Libraries document
- Research Data Management Seminars
- Plan to encourage Library and Information Schools to introduce a research data stewardship stream
Related Initiatives: Research Data Strategy Working Group
- A collaborative effort to address the challenges and issues surrounding access to and preservation of data arising from Canadian research
- Task Group 1: Policies, Funding, and Rewards and Recognition
- Task Group 2: Infrastructure and Services
- Produced Principles of Data Stewardship and a Gap Analysis
- Organizing a Data Summit in September 2011 to raise awareness of the issue with high-level policy makers
CNCDI activities, fall 2010 to spring 2011
- Touching base
- CFI, SSHRC, CIHR, NSERC
- CISTI (RDSWG), CNC-CODATA, CRKN
- CANARIE, CUCCIO, Compute Canada
- Steering Committee: vision
- Data Model WG: DM, costs, plan
- Researcher consultation: March 10/11, 2011
- Proposal (ultimately)
Vision
- The Canadian National Collaborative Data Infrastructure (CNCDI) project will build a national infrastructure to support the innovative re-use of data created through publicly funded research.
- The project will build on and enhance the existing patchwork of data management services and infrastructures in Canada to create a comprehensive, integrated network of data repositories capable of supporting Canadian research across all disciplines far into the future.
Vision (2)
- The data infrastructure will exist within an ecosystem comprising several layers:
1. A national collaborative network of digital data repositories with trusted status and institutional permanence (ingest and access services)
2. Preservation storage repositories (long-term management)
3. Tools and applications for data re-use and analysis
4. Skills, training, and support services
Canadian data landscape
Preservation function | Individual-centric | Domain-centric | Institutional-centric
Long-term preservation | | Domain archives | Institutional repositories
Short- to mid-term preservation | | Data centres | Staging repositories
No preservation responsibilities | Website, FTP site | Research web portals | Data libraries
Questions: topics
1. Should/could there be more collaboration among IT services, the library, library IT and data centres?
2. Is there, or should/could there be, project, enterprise or funding support for managing digital research data?
3. Will this help a research data management enterprise solution?
4. What should be the relationship between this research data management project and campus HPC?
5. What are possible governance models?
6. Who should CARL collaborate with on a campus/national level?
7. How can the CIO and campus research infrastructure help advance this project?
Questions/comments?