Title: Dr Liz Lyon, Associate Director Outreach
1Digital Curation Centre
a centre of support for data curation and
preservation
UK Digital Curation Centre: An Introduction
- Dr Liz Lyon, Associate Director Outreach
Grand Challenge Meeting, Bath, June 2005
2Repositories and digital curation
- Data preservation: static. For later use?
- Data curation: dynamic. In use now (and the future)?
Curation: maintaining and adding value to a trusted body of digital information for current and future use
3Assuring permanent access to the records of science and the humanities?
- Long term access to primary data
- Increasing data volumes from eScience and Grid-enabled / cyberinfrastructure applications
- Changing research paradigm: data-driven science, big science
- Observational data, simulations, large-scale experimentation
- Multi-media resources, statistical data, surveys, geo-spatial data
5Facilitate post-processing and knowledge extraction
- Enable the acquisition of newly-derived information and knowledge
- Run complex algorithms over primary datasets
- Mining (data, text, structures)
- Modelling (economic, climate, mathematical, biological)
- Analysis (statistical, lexical, pattern matching, gene)
- Presentation (visualisation, rendering)
7Provide additional functionality beyond digital
preservation processes
- Annotations
- Gene and protein sequences
- e-Lab books (Smart Tea Project in chemistry)
8The scholarly knowledge cycle: linking research data to publications
eBank UK Project: http://www.ukoln.ac.uk/projects/ebank-uk/
Diagram labels:
- Data creation / capture / gathering: laboratory experiments, Grids, fieldwork, surveys, media
- Research and e-Science workflows
- Data analysis, transformation, mining, modelling
- Data curation: databases and databanks
- Repositories: institutional, e-prints, subject, data, learning objects
- Deposit / self-archiving
- Peer-reviewed publications: journals, conference proceedings
- Aggregator services: national, commercial
- Presentation services: subject, media-specific, data, commercial portals
- Harvesting metadata
- Searching, harvesting, embedding
- Resource discovery, linking, embedding
- Validation
- Publication
- Linking
- Emerging policy on open access to data
9DCC people (some of them)
- Management and Co-ordination
  - Director: Chris Rusbridge (University of Edinburgh)
- Community Support and Outreach
  - Led by Dr Liz Lyon (UKOLN, University of Bath)
- Service Definition and Delivery
  - Led by Professor Seamus Ross (HATII and ERPANET, University of Glasgow)
- Development
  - Led by Dr David Giaretta (Astronomical Software Services, CCLRC)
- Research
  - Led by Professor Peter Buneman (Informatics, University of Edinburgh)
10(Some of) the challenges we face
- Standards: Interoperability issues (technical, soluble)
- Scale: Volume and diversity of datasets
- Culture: Bringing communities together
  - Library/information science/archives: document tradition
  - Domain research (chemists, astronomers, biologists)
  - Computer science (databases)
  - Commercial suppliers (storage technology)
- Process and Skills: Highly-distributed organisation
  - Use collaborative tools, combined skills
- Engagement: Existing work and key players
11User requirements analysis: some sound bytes
- R&D issues: Annotation services, Ontology development, Automating metadata creation, Tools and toolkits, Data Format Description Language, Identifiers, Registries, Economic and cost-benefits studies
- Advisory services: Ask-a-Curator, FAQs, reports, briefings, awareness-raising materials, best practice guidance, Storage media, Like Erpanet, advise Government, Research Councils, funding bodies
- Professional development: Short courses, conferences, seminars, workshops, secondments to DCC and to working repository services
- Outreach: Leadership for the future, case studies, sharing solutions, collaboration with other partners, international peers, industry links
- Taxonomy of Users
12Outline Taxonomy of digital curation users by role
1. Data Creators
2. Data Curators
3. Data Re-users
4. Policy makers
  - funding bodies
  - other leaders
Other diagram labels: Data Preservers, Data publishers
13Outline Taxonomy by significant function of organisational entity
2. Service provision
3. Learning and teaching
4. Funders
5. Policy / strategy makers
Other diagram labels: Commercial, Designated communities
14Advisory services
- Responses to queries, from legal to technical guidance: HELPDESK_at_dcc.ac.uk
- FAQs constructed
- Informing workshops and information services
- Monthly site visits (National Institute of Environmental eScience)
15Professional development workshops
- 2005 Programme
  - Persistent identifiers: June, Glasgow
  - Institutional repositories: July, University of Cambridge, with DSpace
  - Cost models: July, British Library, London, with the Digital Preservation Coalition
  - Preservation of medical databases: October, Gulbenkian Institute, Lisbon, with ERPANET and the Wellcome Trust
16Standards Watch
- Covering existing and emerging standards
- Working with community and standards bodies (e.g. ISO)
- Organising associates groups around new standards developments
- Initiating standardisation definitions where gaps identified
- Currently re-purposing Diffuse database of standards materials
17Digital Curation Manual
- A world class resource
- Constructed from topic-specific chapters
  - written by international experts
  - editorial board comprising leading researchers and practitioners
- 45 initial topics including
  - Appraisal and Selection; Costs; Freedom of Information; Interoperability; the OAIS Reference Model; Preservation Strategies; and Open Source
- Less in-depth insight offered by DCC Briefing Papers, aimed at needs of senior managers
18OAIS Reference Model: Functional Model
19Audit and Certification (1)
- How can people know who to entrust with their information?
- There is a demand for a certification process for
  - Repositories and components, e.g. archive storage
  - Software
- Certification standards (ISO 9000 and ISO 17799) do not do the job
- OCLC/RLG Trusted Digital Repositories: Attributes and Responsibilities
  - high level model for design, delivery and maintenance of digital repositories
20Audit and Certification (2)
- International expert group led by RLG and NARA is drafting a Certification standard
- DCC is participating, aiming for international consensus
- Draft goes to Technical Editor end of June
- DCC testbeds to support development of audit and certification standards
- Commitment to
  - offer guidance on self-audit and self-certification
  - carry out independent audits
  - issue certificates to qualifying repositories
21Tools and Technologies
- Accumulate and Maintain Registry and online Repository of relevant tools
- Repository Implementations
- Packaging Tools
- Rendering Software
- Format Converters
- Device Drivers
22Representation Registry development
Development info: see http://dev.dcc.ac.uk for details of Wiki and email list, open to all
- Simple PHP prototype
- Scoping study
  - Formats, standards, tools
- More robust prototype in development
  - Based on ebXML and JAXR
  - Potentially distributed, cooperative maintenance model
- Representation information: describe CCLRC (science) data using EAST
- Links to PRONOM, GDFR and other pilots
- Aim to handover to services
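To make the registry idea more concrete, the following is a minimal sketch in Python of an in-memory registry that maps a format identifier to representation information (a description, related standards, and tools that can handle the format). Everything in it is an assumption made for illustration: the class names, fields and method names are hypothetical and are not taken from the DCC prototypes, ebXML, JAXR, PRONOM or GDFR.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RepresentationInfo:
    """Hypothetical record describing how to interpret one data format."""
    format_id: str
    description: str
    standards: List[str] = field(default_factory=list)
    tools: List[str] = field(default_factory=list)


class RepresentationRegistry:
    """Illustrative in-memory registry keyed by format identifier."""

    def __init__(self) -> None:
        self._entries: Dict[str, RepresentationInfo] = {}

    def register(self, info: RepresentationInfo) -> None:
        # Refuse silent overwrites so an existing entry is never lost.
        if info.format_id in self._entries:
            raise ValueError(f"format already registered: {info.format_id}")
        self._entries[info.format_id] = info

    def lookup(self, format_id: str) -> Optional[RepresentationInfo]:
        # Returns None when the format is unknown to the registry.
        return self._entries.get(format_id)

    def formats_for_tool(self, tool: str) -> List[str]:
        # Sorted identifiers of every format that lists the given tool.
        return sorted(
            fid for fid, info in self._entries.items() if tool in info.tools
        )
```

A distributed, cooperatively maintained registry would need persistence, identifiers agreed between partners and some form of review, none of which this sketch attempts.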
23Research agenda (1)
- Publishing and integrating scientific databases
- Archiving past states of volatile databases
- Database provenance and annotation
- Organisational dynamics of trusted repositories
- Automating metadata extraction
- Cost-benefit analysis of data curation
- Rights and responsibilities
24The database picture
Curated data: classified, cleaned, annotated, integrated, cross-linked
Source data
25Curated databases: some issues
- Integrating, publishing and citing data so that someone else can use it.
- Annotating existing data and moving annotations to other databases
- Provenance: where did this data come from?
- Archiving: how do you preserve something that is constantly changing?
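One common way to approach the archiving and provenance questions above is to never overwrite a record, and instead append each new state together with a note on where it came from. The Python sketch below illustrates that idea only; it is not the DCC research approach. The names `Version`, `VersionedStore`, `put` and `get_as_of` are hypothetical, and timestamps are plain integers supplied by the caller.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class Version:
    """One archived state of a record, with a note on its origin."""
    timestamp: int   # logical time supplied by the caller
    value: Any
    source: str      # free-text provenance: who or what supplied the value


class VersionedStore:
    """Append-only store: updates add versions instead of overwriting."""

    def __init__(self) -> None:
        self._history: Dict[str, List[Version]] = {}

    def put(self, key: str, value: Any, timestamp: int, source: str) -> None:
        versions = self._history.setdefault(key, [])
        # Keep each history in time order so lookups can use binary search.
        if versions and timestamp <= versions[-1].timestamp:
            raise ValueError("timestamps must strictly increase per key")
        versions.append(Version(timestamp, value, source))

    def get_as_of(self, key: str, timestamp: int) -> Optional[Version]:
        """Return the version that was current at the given time, if any."""
        versions = self._history.get(key, [])
        times = [v.timestamp for v in versions]
        index = bisect_right(times, timestamp)
        return versions[index - 1] if index else None
```

Because old versions are kept, a citation can refer to a record "as of" a given time, and the `source` field answers the provenance question for each state. The cost is storage that grows with every change, which is part of why archiving volatile databases is listed as a research issue.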
26Research agenda (2)
- Publishing and integrating scientific databases
- Archiving past states of volatile databases
- Database provenance and annotation
- Organisational dynamics of trusted repositories
- Automating metadata extraction
- Cost-benefit analysis of data curation
- Rights and responsibilities
  - Public domain, public interest, public funding: paper by Waelde and McGinley
27www.dcc.ac.uk
28- www.ijdc.net
- Launch planned July
- Peer-review Editorial Board
- Editor (research): Peter Buneman
- Production editor: Philip Hunter
- Papers for submission are very welcome!
291st DCC International Conference
- Location: Bath, UK
- 29-30 September 2005
- Keynote speakers
  - Clifford Lynch, CNI
  - Graham Cameron, European Bio-informatics Institute
- DCC Research update
- Social highlights
30Associates Network
- Goals: Develop understanding, share best practice, advance research, promote recognition, develop consensus
- Membership: International groups, national bodies, industry partners, funders, research groups, HEIs, FEIs, individuals
- Benefits: Early access to R&D outputs, advisory services, training, input to definition and design, community participation
- Discussion Forum
www.dcc.ac.uk
Please join us!