Standards and gene expression data - PowerPoint PPT Presentation




1
Standards and gene expression data: from data
archiving to extracting biological knowledge
Helen Parkinson, PhD, Production
Coordinator, European Bioinformatics Institute
2
Talk content
3
Data sharing
4
Standards Landscape
Nature Reviews Genetics, Vol. 7, pp. 593-605 (August 2006)
5
MIAME: Minimum Information About a Microarray
Experiment
6
So, has MIAME been successful?
7
ArrayExpress?
8
ArrayExpress history
2001: 4
2002: 12 experiments
2003: 100 experiments
2004: TIGR export, 420 experiments
2005: Re-funded, SMD export, 1200 experiments
2006: New UI, 1600 experiments
2007: GEO Affymetrix data import phase 1, >2607 experiments
2008: 6631 experiments
9
ArrayExpress Overview
10
(No Transcript)
11
(No Transcript)
12
(No Transcript)
13
Getting to a summary-level data Atlas
14
Our Use Cases
  • Query support (e.g., a query for 'cancer' also
    returns 'leukemia')
  • Over-representation analysis in groups of samples
    (analogous to the use of GO terms in
    over-representation analysis in groups of genes)
  • Ontology visualisation, e.g., presenting the user
    with an ontology tree of what is in the
    database
  • Data integration by ontology terms, e.g., we
    assume that 'kidney' means roughly the same thing
    in independent studies, so we can count how many
    kidney samples the database holds
  • Intelligent template generation for different
    experiment types in submission or data
    presentation
  • Summary-level data
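The query-support, over-representation and data-integration use cases can be sketched together. This is a minimal illustration, not the ArrayExpress implementation: the is_a hierarchy, the terms and the sample annotations are hypothetical toy data, and each term has a single parent for simplicity. A query term is expanded to all of its descendants (so 'cancer' also finds 'leukemia' samples, and counting the hits gives samples per term), and over-representation of a term in a group of samples is scored with a one-sided hypergeometric test, the test commonly used for GO term enrichment in groups of genes.

```python
from math import comb

# Hypothetical toy is_a hierarchy: child term -> parent term (single parent only).
IS_A = {
    "leukemia": "cancer",
    "breast carcinoma": "cancer",
    "acute myeloid leukemia": "leukemia",
    "cancer": "disease",
    "diabetes": "disease",
}

# Hypothetical sample annotations: sample ID -> disease term.
SAMPLES = {
    "S1": "acute myeloid leukemia",
    "S2": "breast carcinoma",
    "S3": "diabetes",
    "S4": "leukemia",
}

def descendants(term: str) -> set[str]:
    """Return the term plus every term that is transitively a subclass of it."""
    result = {term}
    changed = True
    while changed:
        changed = False
        for child, parent in IS_A.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result

def expand_query(samples: dict[str, str], term: str) -> list[str]:
    """Return IDs of samples annotated with the term or any of its descendants."""
    terms = descendants(term)
    return sorted(s for s, t in samples.items() if t in terms)

def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N samples, K carrying the term, n in the group)."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)

print(expand_query(SAMPLES, "cancer"))         # ['S1', 'S2', 'S4']
print(len(expand_query(SAMPLES, "leukemia")))  # 2
# 3 of the 4 samples in a group carry a term that annotates 5 of 40 samples overall.
print(hypergeom_sf(3, 40, 5, 4))  # 355/91390, about 0.0039
```

Walking the is_a links transitively is what lets independently annotated samples be pooled under a shared term; a real ontology would also need multiple parents and relations other than is_a.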

15
Oh, the complexity!
Publication
External links
Normalisation
16
Application Ontology Status Quo
  • Text mining at data acquisition
  • Tuned for queries, structured for use in the
    ArrayExpress GUI
  • Multi-species aspect
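Text mining at data acquisition can, in its simplest form, be dictionary matching of free-text sample descriptions against ontology labels and synonyms. The sketch below assumes exactly that; the dictionary entries and the `TERM:` IDs are hypothetical placeholders, and dedicated tools such as Whatizit (listed among the tools at the end of this talk) are considerably more sophisticated.

```python
import re

# Hypothetical dictionary: lower-case label or synonym -> ontology term ID.
# The IDs are placeholders, not real ontology accessions.
DICTIONARY = {
    "kidney": "TERM:0001",
    "renal": "TERM:0001",  # synonym mapped to the same term
    "liver": "TERM:0002",
    "hepatocyte": "TERM:0003",
    "acute myeloid leukemia": "TERM:0004",
}

# Longest phrases first, so multi-word labels win over any shorter overlapping entry.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in sorted(DICTIONARY, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)

def annotate(text: str) -> list[tuple[str, str]]:
    """Return (matched text, term ID) pairs in order of appearance."""
    return [(m.group(0), DICTIONARY[m.group(0).lower()]) for m in _PATTERN.finditer(text)]

print(annotate("Renal cortex biopsy; donor with Acute Myeloid Leukemia"))
# [('Renal', 'TERM:0001'), ('Acute Myeloid Leukemia', 'TERM:0004')]
```

Word boundaries keep "kidney" from matching inside unrelated words, at the cost of also missing plurals such as "kidneys"; that trade-off is typical of an "80% solution".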

06.04.2014
17
Semantic Roadmap
  • Position of the ArrayExpress Experimental Factor
    Ontology in the bigger picture
  • The key is orthogonal coverage, reuse of existing
    resources and shared frameworks

Chemical Entities of Biological Interest (ChEBI)
Relation Ontology
Cell Type Ontology
Various Species Anatomy Ontologies
Anatomy Reference Ontology
Disease Ontology
AE Ontology
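One way to read "orthogonal coverage, reuse of existing resources" is that every term in the application ontology comes from exactly one known source, identified by its ID prefix, and no two sources define the same label. The sketch below checks those two properties; the prefixes, term IDs and labels are hypothetical placeholders, and the rules are an assumption made for illustration rather than the Experimental Factor Ontology's actual policy.

```python
from collections import defaultdict

# Hypothetical prefix -> source mapping (placeholder prefixes, not real ones).
SOURCES = {
    "CHEM": "chemical entities",
    "CELL": "cell types",
    "DIS": "diseases",
    "ANAT": "anatomy",
    "LOCAL": "terms defined in the application ontology itself",
}

def check_coverage(terms: dict[str, str]) -> list[str]:
    """Check term ID -> label pairs; return a list of problems (empty if none)."""
    problems = []
    ids_by_label = defaultdict(list)
    for term_id, label in terms.items():
        prefix = term_id.split(":", 1)[0]
        if prefix not in SOURCES:
            problems.append(f"{term_id}: unknown source prefix {prefix!r}")
        ids_by_label[label.lower()].append(term_id)
    # The same label coming from two IDs means coverage overlaps.
    for label, ids in sorted(ids_by_label.items()):
        if len(ids) > 1:
            problems.append(f"label {label!r} defined by {', '.join(sorted(ids))}")
    return problems

TERMS = {
    "CHEM:1": "glucose",
    "CELL:7": "hepatocyte",
    "ANAT:3": "liver",
    "LOCAL:9": "liver",  # duplicates ANAT:3 instead of reusing it
    "XYZ:2": "kidney",   # prefix not mapped to any source
}
for problem in check_coverage(TERMS):
    print(problem)
```

A check like this flags locally minted terms that should have been imported from a shared resource instead.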
18
What lies beneath?
19
Where does the data come from?
20
What is curation?
21
2007 Affymetrix Data landscape
22
Data exchange, or the failure to federate
  • We need all the data in-house to re-process it
  • We do not have a data exchange agreement with GEO
  • SOFT vs. MAGE-ML/MAGE-TAB
  • No ontology usage
  • Some free-text annotation, little process
    annotation
  • Mass data acquisition
  • 80% solution (or less)
  • Employing text mining
  • Data reprocessing
  • Cost-effective, eliminates user support
  • Using spreadsheets (not XML)
  • We could almost eliminate the database if we could
    index the files
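Importing GEO records into a spreadsheet-based pipeline means converting SOFT into a tab-delimited layout. The sketch below assumes only a simplified subset of SOFT: a `^SAMPLE = ...` line starts a sample record, `!Attribute = value` lines add fields to it, data tables between `..._table_begin` and `..._table_end` lines are skipped, and other entity types are ignored. The output is a generic one-row-per-sample table, not a valid MAGE-TAB SDRF, and the sample accessions in the example are placeholders.

```python
import csv
import io

def soft_samples_to_tsv(soft_text: str) -> str:
    """Convert SAMPLE entities in simplified SOFT text to a tab-delimited table."""
    samples = []
    current = None  # dict for the sample being read, or None outside a SAMPLE entity
    in_table = False
    for raw in soft_text.splitlines():
        line = raw.strip()
        if line.startswith("!") and " = " not in line:
            # Table delimiters such as !sample_table_begin / !sample_table_end.
            lowered = line.lower()
            if lowered.endswith("_table_begin"):
                in_table = True
            elif lowered.endswith("_table_end"):
                in_table = False
            continue
        if in_table or " = " not in line:
            continue  # skip data-table rows and lines that are not "key = value"
        key, value = line.split(" = ", 1)
        if key.startswith("^"):
            # A new entity starts; only SAMPLE entities are kept.
            current = {"accession": value} if key[1:].upper() == "SAMPLE" else None
            if current is not None:
                samples.append(current)
        elif key.startswith("!") and current is not None:
            field = key[1:]
            # Repeated attributes (e.g. characteristics) are joined with "; ".
            current[field] = f"{current[field]}; {value}" if field in current else value
    columns = ["accession"] + sorted({k for s in samples for k in s} - {"accession"})
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, delimiter="\t",
                            restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(samples)
    return out.getvalue()

EXAMPLE = """^SAMPLE = SAMPLE_A
!Sample_title = liver replicate 1
!Sample_characteristics_ch1 = tissue: liver
!Sample_characteristics_ch1 = sex: female
!sample_table_begin
ID_REF\tVALUE
probe1\t7.2
!sample_table_end
^SAMPLE = SAMPLE_B
!Sample_title = kidney replicate 1
"""
print(soft_samples_to_tsv(EXAMPLE))
```

Missing attributes become empty cells, which keeps the table rectangular and easy to check in a spreadsheet.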

23
2008 Data Landscape
24
Flexible Data Access Models
  • GUIs: biologists
  • Hyperlinks
  • FTP: bioinformaticians
  • Web services: workflows
  • XML data dumps
  • Spreadsheets
  • Direct SQL access (not for ArrayExpress)
  • Schema and code if you want it
  • Geek for a week
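Offering several access models largely means serialising the same records more than once. As a minimal sketch (the record fields and XML element names are hypothetical, not the ArrayExpress XML schema), the same experiment records can be written as a tab-delimited file for spreadsheets and as an XML dump, using only the Python standard library.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical experiment records; field names are illustrative only.
RECORDS = [
    {"accession": "EXP-1", "organism": "Homo sapiens", "samples": "12"},
    {"accession": "EXP-2", "organism": "Mus musculus", "samples": "6"},
]

def to_tsv(records: list[dict[str, str]]) -> str:
    """Serialise records as a tab-delimited table with a header row."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(records[0]),
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()

def to_xml(records: list[dict[str, str]]) -> str:
    """Serialise records as <experiments><experiment><field>value</field>...</experiment></experiments>."""
    root = ET.Element("experiments")
    for record in records:
        exp = ET.SubElement(root, "experiment")
        for field, value in record.items():
            ET.SubElement(exp, field).text = value
    return ET.tostring(root, encoding="unicode")

def from_xml(xml_text: str) -> list[dict[str, str]]:
    """Parse the XML dump back into records, showing both views carry the same data."""
    root = ET.fromstring(xml_text)
    return [{child.tag: child.text or "" for child in exp} for exp in root]

print(to_tsv(RECORDS))
print(to_xml(RECORDS))
```

Generating every view from one in-memory representation keeps the formats consistent; a round trip through `from_xml` returns the original records.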

25
Lessons learned
  • A complex architecture means a lot of software
    engineering
  • Biologists like Excel; bioinformaticians like
    tab-delimited files
  • Spreadsheets scale and are easy to check, but
    harder to parse
  • Generic systems will be future-proof
  • Legacy format converters are needed
  • You don't need to keep everything
  • Text-based queries are the most common
  • Text mining is very useful
  • Scaling problems are hard to fix
  • Bleeding-edge technologies should be used
    sparingly
  • Federation doesn't really work for the goals we
    have
  • Archiving alone does not add value
  • Training is important and expensive
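Text-based queries being the most common fits the earlier remark that indexing the files could almost replace the database: a plain inverted index over the text of tab-delimited files already answers keyword queries. The sketch below is a minimal in-memory version; the file names and contents are hypothetical, tokenisation is a simple lower-case alphanumeric split, and multi-word queries use AND semantics.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> set[str]:
    """Lower-case alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def build_index(files: dict[str, str]) -> dict[str, set[str]]:
    """Map each token to the names of the files whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in files.items():
        for token in tokenize(text):
            index[token].add(name)
    return index

def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return the files containing every query token, sorted by name."""
    tokens = tokenize(query)
    if not tokens:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted(hits)

# Hypothetical tab-delimited sample files, keyed by file name.
FILES = {
    "exp1.txt": "Source\tOrganism\tOrganismPart\ns1\tHomo sapiens\tkidney\n",
    "exp2.txt": "Source\tOrganism\tOrganismPart\ns1\tMus musculus\tliver\n",
    "exp3.txt": "Source\tOrganism\tDiseaseState\ns1\tHomo sapiens\tleukemia\n",
}
index = build_index(FILES)
print(search(index, "homo sapiens"))  # ['exp1.txt', 'exp3.txt']
print(search(index, "kidney"))        # ['exp1.txt']
```

Combined with ontology-based query expansion, the same index could also answer 'cancer' with files that only mention 'leukemia'.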

26
Useful tools for life sciences data management
  • Excel
  • Whatizit text mining software from EBI
  • Our spreadsheet builder, checkers and format
    parsers: tab2mage.sf.net
  • OBO Foundry ontologies, especially OBI, CTO and
    the Disease Ontology
  • Taverna for building workflows
  • BASE: open-source microarray data management
    tool
  • BioMart: data warehouse for biological data,
    www.biomart.org

27
Acknowledgements
  • ArrayExpress Production Team
  • Tomasz Adamusiak, Tony Burdett, Anna Farne, Ele
    Holloway, James Malone, Margus Lukk, Helen
    Parkinson, Tim Rayner, Eleanor Williams, Holly
    Zheng
  • Ugis Sarkans, ArrayExpress Development Team Leader
  • Misha Kapushesky, Main Atlas Developer
  • Gabriella Rustici, Training Officer
  • Alvis Brazma, Group Leader
  • UniProt and Ensembl teams
  • Funders: EC (FELICS, EMERALD, DIAMONDS, GEN2PHEN,
    MUGEN), NIH-NHGRI, EMBL
  • The submitters and microarray collaborators
  • GEO, especially Tanya Barrett