Title: MIAME/Env
1MIAME/Env
- Towards a minimum environmental meta-data
specification developed for functional genomics
- Norman Morrison
- University of Manchester and NEBC, Oxford.
2Outline
- Where did MIAME/Env come from?
- Why was yet another data standard needed?
- How did we actually go about developing it?
- Future considerations
- Functional Genomics Data Standards
- Env as a stand-alone specification
3Remit
- To contribute to the systematic support of
environmental transcriptomic data management.
4Diversity of projects
- Stickleback and flounder treated with a range of
pollutants.
- Arabidopsis (including specific ecotypes and
cross strains) at particular stages of
development subjected to various environmental
conditions.
- Earthworms (sentinel species) subjected to
various environmental toxins.
- Plants mounting defence to caterpillar attack as
a result of chemical signals mediated by soil
micro-organisms.
5Can we use something that already exists?
Potentially
- MIAME is the Minimum Information About a
Microarray Experiment.
- The result of an MGED (www.mged.org) driven effort
to codify the description of a microarray
experiment.
- MIAME aims to define the core that is common to
most experiments.
- It tries to specify the collection of information
that would be needed to allow somebody to
completely reproduce an experiment that was
performed elsewhere.
6Data Safari
77 Step Development Process
- Identification of the need for a standard
(MIAME/Tox and MIAME/Nut)
- The formation of a working group (community)
- Selection of case studies
- Development of the specification checklist,
specification attributes, definitions, and
allowed terms
  - Knowledge acquisition (discussion of requirements
    within the community, case studies)
  - Selection and definition of a set of attributes
  - Development of allowed terms / controlled
    vocabularies
- Development of a suitable implementation
- Development of a repository to store, view, and
distribute annotations
- Final annotation of meta-data to compliant format
and submission to the repository
8Circadian rhythmicity in tidal worms
9Algal blooms affected by viruses
11What is my scope?
- Minimal criteria
- Context dependencies
  - Some attributes that are minimally sufficient for
    describing a particular strain of Mouse will not
    apply to the description of a particular strain
    of Bacteria, and vice versa.
- Not known / not-applicable
- Derived meta-data versus Primary meta-data
  - Some types of Primary meta-data can be derived,
    but will they ever be as accurate?
  - What are the overheads in producing derived
    meta-data?
  - It would be nice to only have to do it once.
  - Will the derived data change with better methods?
  - Record the method.
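
The distinctions on this slide can be sketched as a small record type. This is only an illustration, not part of the MIAME/Env specification: the names `Missing`, `MetaDatum`, `derived` and `method` are hypothetical, and the values in the usage lines are made up.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Missing(Enum):
    """Why a value is absent (hypothetical labels)."""
    NOT_KNOWN = "not known"            # the attribute applies, but was not recorded
    NOT_APPLICABLE = "not applicable"  # the attribute has no meaning in this context


@dataclass(frozen=True)
class MetaDatum:
    name: str
    value: Optional[str] = None
    missing: Optional[Missing] = None
    derived: bool = False              # False means primary (recorded directly)
    method: Optional[str] = None       # how a derived value was produced

    def __post_init__(self):
        # Exactly one of value / missing must be given.
        if (self.value is None) == (self.missing is None):
            raise ValueError(f"{self.name}: give either a value or a missing reason")
        # "Record the method": a derived value must say how it was derived.
        if self.derived and not self.method:
            raise ValueError(f"{self.name}: derived meta-data needs a method")


# Illustrative usage (made-up values).
day_length = MetaDatum("day length", "15.5 h", derived=True,
                       method="computed from latitude and sampling date")
sex = MetaDatum("sex", missing=Missing.NOT_APPLICABLE)  # e.g. for a bacterial strain
```

Keeping the method next to each derived value means the value can be recomputed if better methods appear later.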
12Investigation categories
- Field Trials
  - wild organism/biosource
  - natural environment
- Conditioned field trials
  - wild organism/biosource
  - natural environment then conditioned in the lab
  - animal husbandry conditions (preconditioning)
  - treatments (conditioning)
- Lab experiments
  - lab reared or obtained from a standard provider
  - animal husbandry conditions (preconditioning)
  - treatments (conditioning)
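
One way to read the three categories above is as a lookup from investigation category to the description blocks an annotation should contain. The sketch below assumes that reading. The key names and the `missing_blocks` helper are hypothetical and are not taken from the specification.

```python
# Description blocks each investigation category calls for, transcribed from
# the slide. All identifiers here are hypothetical.
REQUIRED_BLOCKS = {
    "field_trial": ["biosource", "natural_environment"],
    "conditioned_field_trial": [
        "biosource", "natural_environment", "preconditioning", "conditioning",
    ],
    # For lab experiments "biosource" covers lab-reared or provider-supplied organisms.
    "lab_experiment": ["biosource", "preconditioning", "conditioning"],
}


def missing_blocks(category, annotation):
    """Return the required blocks that the annotation leaves empty or omits."""
    return [block for block in REQUIRED_BLOCKS[category] if not annotation.get(block)]


# Illustrative usage: a field trial annotated with the organism only.
print(missing_blocks("field_trial", {"biosource": "wild stickleback"}))
# ['natural_environment']
```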
13Environmentally important concepts
- Individuals, Populations and Communities.
- Geographic Parameters
  - Topography
- Phenotypic Characteristics
  - Behavioural
  - Physiological
  - Anatomical
- Environmental Parameters
  - Climate?
  - Photoperiodicity
  - Lunar Phase?
- Experimental Phase
  - Discrete, Relative and Absolute time
    considerations.
16Considerations
- Prescriptive and Specific vs. Flexible and
Generic.
- Too prescriptive and specific -> page after page
of information.
- Too flexible and generic -> Thing.
17Meta-data quality
- Accuracy
- Completeness
- Currency
- Portability
- Credibility
- Important to be able to reference external
sources rather than duplicate them
  - Functional annotation that is not updated
  - Gene names can change or obtain synonyms, without
    this being reflected in the data
  - Chip files can be out of date even on the
    manufacturer's web-site
18Generic Attribute Construct
- Entity or Thing
  - A concept that represents an entity that exists,
    potentially described in another ontology.
- Property or Modifier (Measured)
  - A characteristic of the entity that is measured,
    for example, size, weight, loudness, gestation
    period.
- Value
  - The value - not necessarily quantitative.
- Unit
  - Unit where appropriate.
- Assay
  - The assay used to measure the property of the
    entity.
19Phenotypic Characteristic
- Free text
  - Calipers were employed to measure the length of
    the dorsal fin. The fin was measured to be 1.2 cm.
- Can also be applied to relative characteristics,
e.g. dissolved oxygen content in mg/l
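
As a rough sketch, the generic attribute construct (Entity, Property, Value, Unit, Assay) can be written as a record and filled in from the free-text dorsal fin example. The class and field names below are assumptions made for illustration. They are not the specification's own schema or maxdLoad2's.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Attribute:
    entity: str                  # the thing described, ideally an ontology term
    prop: str                    # the measured characteristic of the entity
    value: str                   # kept as text: values need not be quantitative
    unit: Optional[str] = None   # only where appropriate
    assay: Optional[str] = None  # how the property was measured

    def describe(self) -> str:
        """Render the structured attribute back into a short sentence."""
        text = f"{self.entity} {self.prop}: {self.value}"
        if self.unit:
            text += f" {self.unit}"
        if self.assay:
            text += f" (measured with {self.assay})"
        return text


# The free-text example from the slide, restated as a structured attribute.
fin = Attribute("dorsal fin", "length", "1.2", unit="cm", assay="calipers")
print(fin.describe())  # dorsal fin length: 1.2 cm (measured with calipers)
```

Storing the five parts separately lets the same statement be searched and compared across experiments, which a free-text sentence does not.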
21maxdLoad2
23Circadian rhythmicity in tidal worms
24Algal blooms affected by viruses
25The 8th Step
- Submission of compliant meta-data to a public
repository via a common exchange format.
26Functional Genomics Standards
- Object Model
  - FuGE (Functional Genomics Experiment - Object
    Model)
  - http://fuge.sourceforge.net
- Ontology
  - FuGO (Functional Genomics Ontology)
  - http://mged.sourceforge.net/ontologies/
27Introduction to FuGE
- A model for developing data standards for
functional genomics
- General classes for protocols, investigation
structure, data structure
- Also models equipment, software, contacts etc.
- Can be extended for use in a particular domain
- Uses ontologies extensively, such as MGED
Ontology (or next version FuGO)
28Status of FuGE
- Milestone 1 release Sept 2005
  - UML (Object Model)
  - XML Schema
- Milestone release being tested by MGED
(Microarray Gene Expression Data) and PSI
(Proteomics Standards Initiative)
- Will form the basis for the next version of
MAGE-ML and protein separation standards
- Also has been presented to metabolomics community
29Introduction to FuGO
- An ontology for describing information about a
functional genomics experiment
- To include a top level structure of general
concepts, for example Investigation, Assay, Study.
- Can be extended for use in a particular domain
30Status of FuGO
- Historically, FuGO was once MO (MGED Ontology).
- Top level structure to be ratified at MGED8,
Sept 11-13, 2005.
- Existing classes in MO (transcriptomics) will
continue to be reorganised as a template for
other domains to follow suit (proteomics,
metabolomics, genomics).
31RSBI - Reporting Structure for Biological
Investigations
32RSBI - People
- RSBI Coordinator
  - Susanna Sansone
  - EBI
- Environmental Genomics WG
  - Norman Morrison
  - NEBC, NERC Post-Genomics Proteomics programme,
    EBI
- Nutrigenomics WG
  - Philippe Rocca-Serra
  - EBI, European Organization NuGO
- Toxicogenomics WG
  - Jennifer Fostel
  - NIEHS-NCT, NCTR-FDA, HESI Genomics Committee,
    EBI
- Contributors/collaborators
  - Alex Garcia (EBI, PhD student at Uni of
    Queensland, Australia)
  - Chris Taylor (EBI, PSI)
34Future considerations
- Plug and play domain specific data standards.
- Q. I want to describe an investigation looking at
the environmental impact of certain toxins on a
sentinel species using proteomics. What data
standard(s) should I be using?
- Similarly
- Q. I want to describe the environmental context
of a genome sequenced from an environmental
isolate. Are there existing standards or parts of
standards I should be using?
35meta-data overlap
Functional Genomics Investigation
36Thanks
- Development of the Env specification for
environmental biology and its application to
transcriptomics as MIAME/Env
- Norman Morrison, A. Joseph Wood, David Hancock,
Sonia Shah, Luke Hakes, Bela Tiwari, Peter Kille,
Andrew Cossins, Matthew Hegarty, Michael J.
Allen, William H. Wilson, Peter Olive, Kim Last,
Cas Kramer, Thierry Bailhache, Jonathan Reeves,
Denise Pallett, Justin Warne, Karim Nashar, Helen
Parkinson, Susanna-Assunta Sansone, Philippe
Rocca-Serra, Robert Stevens, Jason Snape, Dawn
Field, Andy Brass
- NERC Environmental Genomics and Post Genomics and
Proteomics Programmes.
- http://envgen.nox.ac.uk