Title: ToxicogenOMICS: Omics standards, resources and data at EBI
Susanna-Assunta Sansone, Nutr/Toxicogenomics Projects Coordinator
- EMBL-EBI
- The European Bioinformatics Institute
- Cambridge, UK
- 5th Virtual Conference on Genomics and Bioinformatics (2005)
2. Talk Structure
- Setting the scene
- EMBL-EBI
- - Activities and tox-relevant collaborations
- Data communication standards
- Standardization initiatives
- - A review
- Standards-compliant implementation at EBI
- ArrayExpress infrastructure
- - Array-based toxicogenomics resources and data
3. European Molecular Biology Laboratory
- Status
- Non-profit, academic
- International network
- Research institutes
- Services (EBI)
- Funds
- EU member states and Israel
- EU, UK and NIH grants
- Collaborations
[Figure: EMBL sites at Heidelberg, Grenoble, Hamburg, Monterotondo and Cambridge UK (EBI); EBI activities: Service, Research, Training, Industry]
4. Services at EBI
5. Industry Programme at EBI
- Support large companies and SMEs
- Including pharma, biotech and software vendors
- Provide training and education
- Contribute to open data communication standards
- Defining submission and publication requirements
- Optimizing database/software interoperability
- Ensuring data quality and integrity
6. Toxicogenomics - Collaborations
- Standards-compatible infrastructure, data exchange
- - NCTR-FDA Center for Toxicoinformatics
- - NIEHS National Center for Toxicogenomics (NCT)
- - Netherlands Toxicogenomics Center (NTC)
- - UK Natural Environment Research Council (NERC) Data Center
- Data deposition, software development
- - ILSI-HESI Toxicogenomics Committee
- - European Nutrigenomics Organization (NuGO)
- Quality metrics, confidence measures
- - UK Measurement for Biotechnology (MfB) tox-project
7. The European Scenario
- The 7th Amendment to the Cosmetics Directive
- REACH Chemical Policy
- 3 Rs (refine, reduce and replace animals)
9. Toxicogenomics - Potential
- Improving methods to assess toxicity
- Gaining insight into the molecular mechanisms
- Reducing the length of long-term toxicology studies
- Limiting the number of animals used
- Holding promise for
- Drug/biologics discovery and development
- - Target selection, risk assessment and quality control
- Chemical/drug-induced disease processes
- - Evaluation and prediction
- Medical practice
- - Diagnosis, therapeutic decisions and monitoring
- Regulatory science
- - Supporting and facilitating the decision-making process
10. Toxicogenomics - Challenges
- Technical validation
- Reproducibility, specificity, sensitivity and accuracy
- QA, QC and SOPs
- International gold standards
- -> Data comparability
- Scientific validation
- Biological/toxicological relevance of the findings
- -> Signatures, biomarkers and mechanisms of action
- Data communication
- Data management
- Data review
- -> Expertise acquisition
11. Communicating Omics Data
- Unlock the value in the data
- Large in volume (both data and metadata)
- Heterogeneous in data types
- Incompatible in formats
- Standards are required
- Content -> Minimal descriptors
- - Report the same core essentials
- Semantic -> Controlled vocabularies or ontology
- - Use the same word and mean the same thing
- Storage -> Database models
- - Optimize data queries and mining
- Exchange -> Common formats
- - Enable interoperability and data integration
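The content and semantic layers above can be illustrated with a small validation sketch. It assumes a hypothetical five-field minimal-descriptor checklist and a hypothetical controlled vocabulary for one field; neither is taken from MIAME or the MGED Ontology. The point is only that "report the same core essentials" and "use the same word and mean the same thing" are two distinct, mechanically checkable rules.

```python
from typing import Dict, List

# Hypothetical minimal descriptors (content layer); NOT the MIAME checklist.
REQUIRED_FIELDS = ["organism", "tissue", "compound", "dose", "time_point"]

# Hypothetical controlled vocabulary (semantic layer) for the "organism" field.
ORGANISM_TERMS = {"Rattus norvegicus", "Mus musculus", "Homo sapiens"}


def check_record(record: Dict[str, str]) -> List[str]:
    """Return the problems found in one sample record (empty list if none)."""
    problems = []
    # Content check: every minimal descriptor must be present and non-empty.
    for name in REQUIRED_FIELDS:
        if not record.get(name, "").strip():
            problems.append(f"missing descriptor: {name}")
    # Semantic check: free-text synonyms such as "rat" are rejected so that
    # every record uses the same term for the same thing.
    organism = record.get("organism", "")
    if organism and organism not in ORGANISM_TERMS:
        problems.append(f"organism not in controlled vocabulary: {organism!r}")
    return problems


if __name__ == "__main__":
    sample = {"organism": "rat", "tissue": "liver", "compound": "hexachlorobenzene",
              "dose": "", "time_point": "24 h"}
    for problem in check_record(sample):
        print(problem)
    # missing descriptor: dose
    # organism not in controlled vocabulary: 'rat'
```

Keeping the two checks separate mirrors the slide's layering: a record can be complete but use non-standard terms, or use standard terms but omit a required descriptor.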
13. Standardization Initiatives
- Open efforts
- Community vetted
- Vendor neutral
- Multidisciplinary
- Data management
- Different driving forces
- Data submission to regulatory bodies
- Data exchange/deposition to databases
- Categories
- Regulatory-driven discussion (broader understanding)
- World-wide organizations (agreed recommendations)
- Measurement and method validation focus
- Omics technology communities
14. Standardization Initiatives
- SEND Consortium (Standard for Exchange of Nonclinical Data)
- v1.6 model, flat file format to report animal toxicity studies
- SEND CDISC Study Data Tabulation Model (SDTM)
- -> Electronic toxicity data submission to FDA
- Pharmacogenomics Standards Group (CDISC, HL7 and I3C)
- Clinical pharmacology, clinical genomics and pre-clinical/non-clinical genomics
- Minimal descriptors, data content and format
- -> Requirements for pharmacogenomics data submission to FDA
- DSSTox (US EPA) (Distributed Structure-Searchable Toxicity)
- Database network project
- Standard Chemical Fields and Structure Data Format
- -> Standard, structure-annotated chemical toxicity data
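To make the DSSTox "Structure Data Format" item concrete, here is a minimal reader for the data items of an SD file, written against the general SD layout: a connection table ending in "M  END", then "> <NAME>" headers each followed by value lines and a blank line, with "$$$$" closing each record. The field names CHEMICAL_NAME and STUDY_TYPE are hypothetical and are not the DSSTox standard chemical field names; a real pipeline would more likely use a cheminformatics toolkit.

```python
import re
from typing import Dict, List, Optional

# An SD data-item header looks like "> <FIELD_NAME>" (extra text may follow).
HEADER = re.compile(r"^>.*<([^>]+)>")


def parse_sd_fields(sdf_text: str) -> List[Dict[str, str]]:
    """Return the data items of each SD record as a dict (structure block skipped)."""
    records: List[Dict[str, str]] = []
    for block in sdf_text.split("$$$$"):  # "$$$$" terminates each record
        if not block.strip():
            continue
        fields: Dict[str, str] = {}
        current: Optional[str] = None
        values: List[str] = []
        for line in block.splitlines():
            match = HEADER.match(line)
            if match:
                current, values = match.group(1), []
            elif current is not None:
                if line.strip():
                    values.append(line.strip())
                else:  # a blank line closes the current data item
                    fields[current] = "\n".join(values)
                    current = None
        if current is not None:  # last item not followed by a blank line
            fields[current] = "\n".join(values)
        records.append(fields)
    return records


# Toy record with an empty connection table; field names are hypothetical.
TOY = """toy record
  hypothetical

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <CHEMICAL_NAME>
hexachlorobenzene

> <STUDY_TYPE>
hepatotoxicity

$$$$
"""

if __name__ == "__main__":
    print(parse_sd_fields(TOY))
    # [{'CHEMICAL_NAME': 'hexachlorobenzene', 'STUDY_TYPE': 'hepatotoxicity'}]
```

Agreeing on field names is exactly what a "standard chemical fields" convention adds on top of the file format: the parser above works for any SD file, but only shared names make records from different sources comparable.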
15. Standardization Initiatives
- NAS (National Academy of Sciences)
- Committee on Emerging Issues and Data on Environmental Contaminants
- Expert workshops and recommendations
- -> Application of genomics in risk assessment
- ECVAM (European Centre for the Validation of Alternative Methods), ICCVAM / NICEATM
- Expert workshops and recommendations
- Toxicogenomics Task Force
- -> Validation of array-based toxicogenomics test methods
- OECD / IPCS (Organisation for Economic Co-operation and Development / International Programme on Chemical Safety)
- Internationally agreed instruments, decisions and recommendations
- Expert workshops and recommendations
- -> Toxicogenomics methods in chemical assessment
16. Transcriptomics Domain: Microarray Gene Expression Data (MGED), since 1999
17. MGED Society
18. MIAME - Reporting Structure
19. MAGE-ML - Exchange Format
20. MGED Ontology - Semantics
21. RSBI - Outreach
22. Proteomics Domain: Human Proteome Organization (HUPO) Proteomics Standards Initiative (PSI), since 2001
23. PSI Group
24. PSI MI - Exchange Format
25. PSI MS - Exchange Format
26. PSI GPS - General Proteomics Standards
- Standards for integrated representation of methods and data from proteomics experiments
- Minimum Information About a Proteomics Experiment (MIAPE)
- Technology-specific modules associated with a parent document
- Extensive review process (ongoing)
- XML formats for data exchange and submission
- Ontology for unambiguously worded metadata/data files
- Includes terms for the mzData format
27. Metabolomics Domain: Metabolomics Society, founded a few months ago!
28. The SMRS Group - Reporting
29. The Metabolomics Society - Journal
30. Removing political barriers
31. An MGED/PSI-like stamp of authority
32. Synergy - Obstacles?
- ILSI-HESI Toxicogenomics Committee
- Multi-stakeholder consortium (Industry, Regulators, Academia, Government)
- Data Exchange and Evaluation sub-Committee
- -> Promote synergy: coordination and mediation (??!!)
33. Functional Genomics Context
- Standardization activities in omics technologies
- Reporting structures, object models and CVs/ontology
- Pieces of the functional genomics puzzle
- Standards should stand alone
- Standards should also function together
- - Build them in a modular way
- - Maximize interactions
- Benefits
- Facilitate integration of omics data
- - Experimentalists, data miners, reviewers
- Optimize development of tools (time and costs)
- - Manufacturers and vendors covering multiple technologies
34. Functional Genomics Context
- Community-specific efforts (e.g. toxicology, nutrition, environment)
- Biology
- - Generic features -> design of investigations, sample descriptors
- Technology
- - Significantly affects the structure and content of each standard
35. Functional Genomics Standards
- Functional Genomics Experiment (FuGE) object model (OM), XML and API
- Generic classes to fit any type of experiment
- - Does NOT replace, but underpins, different formats
- Open effort, driven by MGED and PSI developers
- - Use cases supplied by the MGED RSBI WGs (Tox, Nutr, Env)
- Functional Genomics Ontology (FuGO) in OWL
- Core descriptors to support computational analysis of datasets
- - Common to any type of functional genomics experiment
- Open effort, driven by MGED and PSI developers
- Engaging with metabolomics groups
- - Use cases supplied by the MGED RSBI WGs (Tox, Nutr, Env)
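The idea of generic classes that underpin, rather than replace, technology-specific formats can be sketched with a few Python dataclasses. Every class name here (Material, Protocol, DataItem, HybridizationData, MassSpecData, Experiment) is hypothetical and chosen for illustration; this is not the FuGE object model, only a toy showing how technology modules can extend a shared core so that one investigation can hold mixed data types.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Material:
    """Generic description of a biological source or sample."""
    name: str
    organism: str


@dataclass
class Protocol:
    """Generic description of a procedure applied to a material."""
    name: str
    parameters: Dict[str, str] = field(default_factory=dict)


@dataclass
class DataItem:
    """Generic result; technology-specific modules subclass this."""
    material: Material
    protocol: Protocol


@dataclass
class HybridizationData(DataItem):
    """Hypothetical transcriptomics module: adds array-specific attributes."""
    array_design: str = ""


@dataclass
class MassSpecData(DataItem):
    """Hypothetical proteomics module: adds instrument-specific attributes."""
    instrument: str = ""


@dataclass
class Experiment:
    """Generic container: one investigation can mix technologies."""
    title: str
    data: List[DataItem] = field(default_factory=list)

    def technologies(self) -> List[str]:
        """Names of the technology modules used, sorted alphabetically."""
        return sorted({type(item).__name__ for item in self.data})


if __name__ == "__main__":
    liver = Material("liver sample 1", "Rattus norvegicus")
    exp = Experiment("toy multi-omics study")
    exp.data.append(HybridizationData(liver, Protocol("hybridization"), array_design="RG_U34A"))
    exp.data.append(MassSpecData(liver, Protocol("LC-MS")))
    print(exp.technologies())  # ['HybridizationData', 'MassSpecData']
```

Because the sample and protocol descriptions live in the shared base classes, the same liver sample is described once and reused by both technology modules, which is the integration benefit the slides attribute to a common core.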
37. ArrayExpress Tox-Infrastructure
38. ArrayExpress - Content
39. ArrayExpress - Main Tox Datasets
- ILSI-HESI, EHP Mar 2004
- - Genotoxicity: E-TOXM-1-9, 408 hybs
- - Nephrotoxicity: E-TOXM-10-12, 218 hybs
- - Hepatotoxicity: E-TOXM-13-14, 247 hybs
- Syngenta CTL, EHP Nov 2004
- - Estrogens: E-AFMX-12-13, 61 hybs (Affy MG_U74Av2)
- Procter & Gamble, Tox Science, 2004
- - Estrogen receptor (ER) agonists: E-TABM-12, 118 hybs (Affy RG_U34A / RAE230A)
- Dutch National Institute for Public Health, EHP May 2004
- - Hexachlorobenzene: E-TOXM-15, 96 hybs (Affy RG_U34A)
- Bayer, Tox Science, 2004
- - Hepatotoxicants: E-TOXM-16, 137 hybs (Affy RG_U34A)
- Foreseen from our direct collaborators
- - Netherlands Toxicogenomics Center, NuGO (nutr), NERC (ecotox)
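The dataset list uses a compact accession notation. A small sketch can expand it and total the hybridization counts, assuming (this is an interpretation, not stated on the slide) that "E-TOXM-1-9" is shorthand for the nine accessions E-TOXM-1 through E-TOXM-9. The hybridization counts are copied from the list; expand_accessions is a hypothetical helper.

```python
import re
from typing import List, Tuple

# Assumption: "PREFIX-a-b" means accessions PREFIX-a ... PREFIX-b (inclusive).
ACCESSION_RANGE = re.compile(r"^(E-[A-Z]+)-(\d+)(?:-(\d+))?$")


def expand_accessions(shorthand: str) -> List[str]:
    """Expand 'E-TOXM-10-12' into ['E-TOXM-10', 'E-TOXM-11', 'E-TOXM-12']."""
    match = ACCESSION_RANGE.match(shorthand)
    if match is None:
        raise ValueError(f"unrecognised accession: {shorthand}")
    prefix, first = match.group(1), int(match.group(2))
    last = int(match.group(3)) if match.group(3) else first
    return [f"{prefix}-{n}" for n in range(first, last + 1)]


# (accession shorthand, hybridizations) as listed on the slide.
DATASETS: List[Tuple[str, int]] = [
    ("E-TOXM-1-9", 408), ("E-TOXM-10-12", 218), ("E-TOXM-13-14", 247),
    ("E-AFMX-12-13", 61), ("E-TABM-12", 118), ("E-TOXM-15", 96), ("E-TOXM-16", 137),
]

if __name__ == "__main__":
    experiments = [acc for short, _ in DATASETS for acc in expand_accessions(short)]
    hybs = sum(count for _, count in DATASETS)
    print(len(experiments), "experiments,", hybs, "hybridizations")
    # 19 experiments, 1285 hybridizations
```

Under that reading, the ILSI-HESI studies alone account for 408 + 218 + 247 = 873 of the 1285 listed hybridizations.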
40. Querying Data - 3 Interfaces
41. Querying Data (1) - Repository
[Figure: three query interfaces at www.ebi.ac.uk/arrayexpress]
- (1) Query the repository (www)
- - Retrieve elements of the submission(s): experiment descriptions, data, array descriptions, protocols (HTML)
- (2) Query the warehouse: simple (www)
- (3) Query the warehouse: complex (www)
- Also shown: Tox-MIAMExpress (MySQL), Expression Profiler
- Other databases at EBI: Genomics -> EnsEMBL, GO; Proteins -> UniProt; Chemicals -> ChEBI
42. Querying Data (1) - Repository
43. Querying Data (1) - Repository
- Current functionalities
- - Data Selection
- - Data Transformation
- - Missing Value Imputation
- - Hierarchical Clustering, K-groups Clustering
- - Clustering Comparison
- - Signature Algorithm
- - Sequence Homology
- - SPEXS Promoter Discovery
- - Visual Pattern Matching
- - Ordination (COA, PCA)
- - Between Group Analysis
- - Three-way Similarity Analysis
- - GO Annotation
- - Statistical methods (in R)
- -> T-test, Bonferroni and Hochberg corrections
- Analytical Methods (under development)
- - Text-based data mining
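The multiple-testing corrections listed above can be sketched in a few lines. The slide says Expression Profiler implements them in R; this is an independent Python sketch that takes p-values already computed (for example one per gene from a t-test) and adjusts them. "Hochberg" is interpreted here as Hochberg's step-up procedure, following the same rule as R's p.adjust(method = "hochberg"); if the Benjamini-Hochberg false discovery rate procedure was meant instead, the multiplier changes from the rank k to n/k.

```python
from typing import List


def bonferroni(pvalues: List[float]) -> List[float]:
    """Bonferroni: multiply each p-value by the number of tests, capped at 1."""
    n = len(pvalues)
    return [min(1.0, p * n) for p in pvalues]


def hochberg(pvalues: List[float]) -> List[float]:
    """Hochberg step-up adjusted p-values (family-wise error rate control)."""
    n = len(pvalues)
    # Visit p-values from largest to smallest; the k-th largest is multiplied by k.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0  # cumulative minimum, which also caps values at 1
    for rank, i in enumerate(order, start=1):
        running_min = min(running_min, rank * pvalues[i])
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    pvals = [0.01, 0.04, 0.03, 0.005]  # toy p-values, e.g. from four per-gene t-tests
    print([round(p, 3) for p in bonferroni(pvals)])  # [0.04, 0.16, 0.12, 0.02]
    print([round(p, 3) for p in hochberg(pvals)])    # [0.03, 0.04, 0.04, 0.02]
```

Hochberg's procedure is never more conservative than Bonferroni, which matters for microarray studies where thousands of genes are tested at once.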
44. Querying Data (2) - Warehouse
- Perform gene-centric queries
- - Gene descriptors
- - Protein IDs
45. Querying Data (2) - Warehouse
46. Querying Data (3) - Warehouse
- Combined queries: submission elements vs expression values vs tox endpoint data
47. Querying Data (3) - Warehouse
Prototype interface, not final!
48. Querying Data (3) - Warehouse
Example query: "Show me gene expression values for Bayer samples with plasma glucose > 10 mmol/L and total bilirubin > 4 µmol/L"
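The combined warehouse query on this slide (expression values filtered by submitter and by two clinical chemistry endpoints) can be sketched in SQL. This is a minimal illustration under stated assumptions: the three-table schema is hypothetical and is not the Tox-MIAMExpress or warehouse schema, Python's built-in sqlite3 in-memory database stands in for MySQL, and all inserted rows are invented toy values, not data from the Bayer study.

```python
import sqlite3

# Hypothetical, much-simplified schema; NOT the actual warehouse schema.
SCHEMA = """
CREATE TABLE sample (id INTEGER PRIMARY KEY, submitter TEXT);
CREATE TABLE clinical_chemistry (sample_id INTEGER, analyte TEXT, value REAL, unit TEXT);
CREATE TABLE expression (sample_id INTEGER, gene TEXT, value REAL);
"""

# Expression values for one submitter's samples whose glucose AND total
# bilirubin both exceed the given thresholds.
QUERY = """
SELECT e.sample_id, e.gene, e.value
FROM expression AS e
JOIN sample AS s ON s.id = e.sample_id
WHERE s.submitter = ?
  AND e.sample_id IN (SELECT sample_id FROM clinical_chemistry
                      WHERE analyte = 'plasma glucose' AND unit = 'mmol/L' AND value > ?)
  AND e.sample_id IN (SELECT sample_id FROM clinical_chemistry
                      WHERE analyte = 'total bilirubin' AND unit = 'umol/L' AND value > ?)
ORDER BY e.sample_id, e.gene
"""


def expression_for_endpoints(conn, submitter, glucose_min, bilirubin_min):
    """Run the combined query with bound parameters (no string formatting)."""
    return conn.execute(QUERY, (submitter, glucose_min, bilirubin_min)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Invented toy values for illustration only.
    conn.executemany("INSERT INTO sample VALUES (?, ?)", [(1, "Bayer"), (2, "Bayer"), (3, "Other")])
    conn.executemany("INSERT INTO clinical_chemistry VALUES (?, ?, ?, ?)", [
        (1, "plasma glucose", 12.0, "mmol/L"), (1, "total bilirubin", 5.0, "umol/L"),
        (2, "plasma glucose", 6.0, "mmol/L"), (2, "total bilirubin", 6.0, "umol/L"),
        (3, "plasma glucose", 15.0, "mmol/L"), (3, "total bilirubin", 9.0, "umol/L"),
    ])
    conn.executemany("INSERT INTO expression VALUES (?, ?, ?)", [
        (1, "geneA", 2.1), (1, "geneB", 0.4), (2, "geneA", 1.0), (3, "geneA", 3.3),
    ])
    print(expression_for_endpoints(conn, "Bayer", 10, 4))
    # [(1, 'geneA', 2.1), (1, 'geneB', 0.4)]
```

Storing each endpoint as an (analyte, value, unit) row keeps the schema generic across tox endpoints, at the cost of one sub-query per condition; the unit is part of the filter so that values recorded in different units are not compared by mistake.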
49. Acknowledgements and Resources
- Toxicogenomics/Nutrigenomics Project, in particular
- - Philippe Rocca-Serra
- - Sergio Contrino
- ArrayExpress Group (led by Alvis Brazma), in particular
- - Ugis Sarkans
- - Misha Kapushesky
- Funding
- EU projects
- BBSRC
- ILSI-HESI
- NIEHS-NCT
www.ebi.ac.uk/microarray/Projects/tox-nutri