ArrayExpress: A public database for microarray-based gene expression data (http://www.ebi.ac.uk/microarray/)

1
ArrayExpress: A public database for microarray-based gene expression data
http://www.ebi.ac.uk/microarray/
  • European Bioinformatics Institute
  • EMBL-EBI
  • Alvis Brazma, Helen Parkinson, Ugis Sarkans,
    Mohammadreza Shojatalab, Jaak Vilo team

MGED IV, Boston, February 2002
2
ArrayExpress
Opened to the public on Tuesday, February 12th, 2002
  • Standards: MIAME-compliant
  • Data model: MAGE-OM
  • Data input: MAGE-ML, web
  • Data output: HTML, MAGE-ML, tab-delimited, link to Expression
    Profiler
  • Data curation: team of curators
  • Data sets: yeast, human

3
General overview
[Diagram: ArrayExpress, MIAMExpress, and Expression Profiler, connected via MAGE-ML and the Internet (www)]
4
ArrayExpress component architecture
  • www
  • Application server: Java servlets, MAGE-OM
  • ArrayExpress main database: SQL, derived from MAGE-OM
  • Data warehouse: gene-centred queries
  • Submission / curation
  • Images: file server
  • MAGE-ML
5
ArrayExpress - features
  • MIAME-compliant, MAGE-ML, MAGE-OM
  • Can deal with
      • raw quantitation data
      • processed data
      • data transformations
  • Independent of
      • experimental platforms
      • image analysis methods
      • data normalization methods

6
ArrayExpress details
  • Database schema derived from MAGE-OM
  • Standard SQL; we use Oracle
  • Data loader for MAGE-ML (generated)
  • Web interface (first release 12 February 2002)
  • Queries by experiment, array, sample
  • Browsing
  • Object model-based query mechanism, automatic
    mapping to SQL
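
The last bullet, an object model-based query that is mapped to SQL automatically, can be illustrated with a minimal Python sketch. This is not the ArrayExpress implementation: the class `ObjectQuery`, the table names in `TABLES`, and the attribute `species` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from object-model classes to SQL tables.
TABLES = {"Experiment": "experiment", "Array": "array_design", "Sample": "biosample"}


@dataclass
class ObjectQuery:
    """A query phrased against an object-model class instead of a table."""
    class_name: str
    filters: dict = field(default_factory=dict)


def to_sql(query):
    """Translate an ObjectQuery into a parameterised SQL string plus parameters."""
    table = TABLES[query.class_name]
    clauses, params = [], []
    for attribute, value in sorted(query.filters.items()):
        # Only plain identifiers are accepted as column names.
        if not attribute.isidentifier():
            raise ValueError(f"invalid attribute name: {attribute!r}")
        clauses.append(f"{attribute} = ?")
        params.append(value)
    sql = f"SELECT * FROM {table}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


sql, params = to_sql(ObjectQuery("Experiment", {"species": "yeast"}))
print(sql, params)
```

The caller only names a class and attribute values; the table name and the WHERE clause are derived from the mapping, which is the general idea behind generating SQL from an object model.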

7
Simplified ArrayExpress model
8
MIAMExpress
  • Data annotation and submission tool
  • MIAME-based web interface
  • Experiment, Array, Protocol submissions
  • Uses CV/ontology wherever possible
  • Creates MAGE-ML files for loading into
    ArrayExpress
  • Based on MySQL, Perl, CGI, Apache

10
MIAMExpress submission procedure
11
MIAMExpress design and future
  • Species and domain specific pages and ontologies,
    ontology development
  • Life-span of data submissions is long
  • Curation control, submissions tracking
  • Interaction with ArrayExpress
  • Full MAGE-OM, data updating
  • Usability, flexibility, scalability, platform
    independence
  • User needs, free in-house installation

12
ArrayExpress curation effort
  • User support and help documentation
  • Submission support for MIAMExpress
  • Support on ontologies and CVs
  • Minimize free text, removal of synonyms
  • MIAME encouragement
  • Help on MAGE-ML
  • Goal: to provide high-quality, well-annotated
    data to allow automated data analysis
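
As a rough illustration of "minimize free text, removal of synonyms", the sketch below maps free-text species names onto one preferred controlled-vocabulary term. The synonym table `SPECIES_CV` and the function `normalize_species` are hypothetical examples, not part of ArrayExpress or MIAMExpress.

```python
# Hypothetical synonym table: free-text variants mapped to one preferred term.
SPECIES_CV = {
    "saccharomyces cerevisiae": "Saccharomyces cerevisiae",
    "s. cerevisiae": "Saccharomyces cerevisiae",
    "baker's yeast": "Saccharomyces cerevisiae",
    "budding yeast": "Saccharomyces cerevisiae",
    "homo sapiens": "Homo sapiens",
    "human": "Homo sapiens",
}


def normalize_species(free_text):
    """Return the preferred term, or None if a curator needs to look at it."""
    # Lower-case and collapse whitespace so trivial variants compare equal.
    key = " ".join(free_text.lower().split())
    return SPECIES_CV.get(key)


print(normalize_species("  Baker's   Yeast "))
print(normalize_species("Human"))
print(normalize_species("some unlisted organism"))
```

Values that come back as `None` are the ones that would still need manual curation; everything else is stored under a single term, which is what makes automated analysis across submissions feasible.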

13
Accession numbers
  • E-MEXP-234: Experiment 234, submitted via MIAMExpress
  • E-SANG-25: Experiment 25, from the Sanger
    Institute
  • A-AFFY-1034: Array description 1034, from
    Affymetrix
  • P-LABL-5: Protocol 5, for labeling
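
The accession numbers above appear to follow one pattern: a type letter, a four-letter code, and a number, joined by hyphens. The following Python sketch parses that pattern. The layout is inferred only from the four examples on this slide, so the regular expression and the function name `parse_accession` are assumptions rather than an official specification.

```python
import re

# Type letters and their meanings as listed on the slide.
OBJECT_TYPES = {"E": "Experiment", "A": "Array description", "P": "Protocol"}

# Assumed layout: one type letter, a four-letter code, a number.
ACCESSION_RE = re.compile(r"^([EAP])-([A-Z]{4})-(\d+)$")


def parse_accession(accession):
    """Split an accession number into (object type, code, number)."""
    match = ACCESSION_RE.match(accession)
    if match is None:
        raise ValueError(f"not a recognised accession number: {accession!r}")
    type_letter, code, number = match.groups()
    return OBJECT_TYPES[type_letter], code, int(number)


for acc in ["E-MEXP-234", "E-SANG-25", "A-AFFY-1034", "P-LABL-5"]:
    print(acc, parse_accession(acc))
```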

14
Data in ArrayExpress
Now
Work underway
  • Human data (ironchip) from EMBL
  • Yeast data from EMBL
  • S. pombe data from the Sanger Institute
  • TIGR array descriptions
  • Affymetrix chip designs
  • Direct pipeline from Sanger (Rob Andrews)
  • HGMP mouse
  • EMBL mosquito
  • (Add your name here!)

15
Data browsing and queries
17
Experiment info
18
Sample info
19
General overview
[Diagram: ArrayExpress, MIAMExpress, and Expression Profiler, connected via MAGE-ML and the Internet (www)]
20
Expression Profiler: EPCLUST
[Screenshot labels: FOLDER, DATA, SELECT, ANALYZE, A CLUSTER, URLMAP]
21
101 sequences relative to ORF start
YGR128C 100
>YAL036C chromo1 coord(76154-75048(C)) start-600 end2 seq(76152-76754)
TGTTCTTTCTTCTT
CTGCTTCTCCTTTTCCTTTTTTTCCTTCTCCTTTTCCTTCTTGGACTTTA
GTATAGGCTTACCATCCTTCTTCTCTTCAATAACCTTCTTTTCTTGCTTC
TTCTTCGATTGCTTCAAAGTAGACATGAAGTCGCCTTCAATGGCCTCAGC
ACCTTCAGCACTTGCACTTGCTTCTCTGGAAGTGTCATCTGCACCTGCGC
TGCTTTCTGGATTTGGAGTTGGCGTGGCACTGATTTCTTCGTTCTGGGCG
GCGTCTTCTTCGAATTCCTCATCCCAGTAGTTCTGTTGGTTCTTTTTACT
CTTTTTCGCCATCTTTCACTTATCTGATGTTCCTGATTGCCCTTCTTATC
CCCTCAAAGTTCACCTTTGCCACTTATTCTAGTGCAAGATCTCTTGCTTT
CAATGGGCTTAAAGCTTGAAAAATTTTTTCACATCACAAGCGACGAGGGC
CCGTTTTTTTCATCGATGAGCTATAAGAGTTTTCCACTTTTAAGATGGGA
TATTACGGTGTGATGAGGGCGCAATGATAGGAAGTGTTTGAAGCTAGATG
CAGTAGGTGCAAGCGTAGAGTTGTTGATTGAGCAAA_ATG_
>YAL025C chromo1 coord(101147-100230(C)) start-600 end2 seq(101145-101747)
CTTAGAAGATAAAGTAGTGAATT
ACAATAAATTCGATACGAACGTTCAAATAGTCAAGAATTTCATTCAAAGG
GTTCAATGGTCCAAGTTTTACACTTTCAAAGTTAACCACGAATTGCTGAG
TAAGTGTGTTTATATTAGCACATTAACACAAGAAGAGATTAATGAACTAT
CCACATGAGGTATTGTGCCACTTTCCTCCAGTTCCCAAATTCCTCTTGTA
AAAAACTTTGCATATAAAATATACAGATGGAGCATATATAGATGGAGCAT
ACATACATGTTTTTTTTTTTTTAAAAACATGGACTCGAACAGAATAAAAG
AATTTATAATGATAGATAATGCATACTTCAATAAGAGAGAATACTTGTTT
TTAAATGAGAATTGCTTTCATTAGCTCATTATGTTCAGATTATCAAAATG
CAGTAGGGTAATAAACCTTTTTTTTTTTTTTTTTTTTTTTTGAAAAATTT
TCCGATGAGCTTTTGAAAAAAAATGAAAAAGTGATTGGTATAGAGGCAGA
TATTGCATTGCTTAGTTCTTTCTTTTGACAGTGTTCTCTTCAGTACATAA
CTACAACGGTTAGAATACAACGAGGAT_ATG_
...
>YBR084W chromo2 coord(411012-413936) start-600 end2 seq(410412-411014)
CCATGTATCCAAGACCTGCTGAAGATGCTT
ACAATGCCAATTATATTCAAGGTCTGCCCCAGTACCAAACATCTTATTTT
TCGCAGCTGTTATTATCATCACCCCAGCATTACGAACATTCTCCACATCA
AAGGAACTTTACGCCATCCAACCAATCGCATGGGAACTTTTATTAAATGT
CTACATACATACATACATCTCGTACATAAATACGCATACGTATCTTCGTA
GTAAGAACCGTCACAGATATGATTGAGCACGGTACAATTATGTATTAGTC
AAACATTACCAGTTCTCGAACAAAACCAAAGCTACTCCTGCAACACTCTT
CTATCGCACATGTATGGTTCTTATTGTTTCCCGAGTTCTTTTTTACTGAC
GCGCCAGAACGAGTAAGAAAGTTCTCTAGCGCCATGCTGAAATTTTTTTC
ACTTCAACGGACAGCGATTTTTTTTCTTTTTCCTCCGAAATAATGTTGCA
GCGGTTCTCGATGCCTCAAGAATTGCAGAAGTAAACCAGCCAATACACAT
CAAAAAACAACTTTCATTACTGTGATTCTCTCAGTCTGTTCATTTGTCAG
ATATTTAAGGCTAAAAGGAA_ATG_
GATGAG.T     152/70  2453/508  R 7.52345  BP 1.02391e-33
G.GATGAG.T   139/49  2193/222  R 13.244   BP 2.49026e-33
AAAATTTT     163/77  2833/911  R 4.95687  BP 5.02807e-32
TGAAAA.TTT   145/53  2333/350  R 8.85687  BP 1.69905e-31
TG.AAA.TTT   153/61  2538/570  R 6.45662  BP 3.24836e-31
TG.AAA.TTTT  140/43  2254/260  R 10.3214  BP 3.84624e-30
TGAAA..TTT   154/65  2608/645  R 5.82106  BP 1.0887e-29
...
GATGAG.T TGAAA..TTT
22
[Figure: occurrences of the patterns GATGAG.T and TGAAA..TTT in the upstream sequence (600 bp); panels labelled "1 mismatch" and "W/30"]
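
The patterns on these slides use "." as a wildcard position, and this slide allows one mismatch. A minimal Python sketch of such matching is given below. It is a plain sliding-window scan written for illustration, not the algorithm used by SPEXS or PATMATCH. The function name `find_pattern` is hypothetical. The test fragment is a short piece of the YAL036C upstream sequence shown earlier, and the second fragment is the same piece with one base changed by hand.

```python
def find_pattern(pattern, sequence, max_mismatches=0):
    """Return the start positions where pattern occurs in sequence.

    A '.' in the pattern matches any base. Up to max_mismatches of the
    remaining positions may differ from the sequence.
    """
    hits = []
    for start in range(len(sequence) - len(pattern) + 1):
        window = sequence[start:start + len(pattern)]
        mismatches = sum(1 for p, s in zip(pattern, window) if p != "." and p != s)
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits


fragment = "CATCGATGAGCTATAAGAG"   # taken from the YAL036C upstream sequence
altered = "CATCGATGACCTATAAGAG"    # same fragment with one base changed by hand

print(find_pattern("GATGAG.T", fragment))                   # exact match
print(find_pattern("GATGAG.T", altered))                    # no exact match
print(find_pattern("GATGAG.T", altered, max_mismatches=1))  # found with 1 mismatch
```

Allowing a mismatch recovers sites that differ from the pattern at a single fixed position, at the cost of more chance hits, which is why the mismatch count is kept as an explicit parameter.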
24
Components of Expression Profiler
http://ep.ebi.ac.uk/
  • External data, tools: pathways, function, etc.
  • EP:PPI: protein-protein interactions
  • EP:GO: GeneOntology
  • Expression data
  • EPCLUST: expression data
  • GENOMES: sequence, function, annotation
  • URLMAP: provide links
  • SEQLOGO
  • SPEXS: discover patterns
  • PATMATCH: visualise patterns
25
Acknowledgments: the team (3)
November 1999
MGED 1 in Hinxton, EBI
Alvis Brazma, Alan Robinson, Jaak Vilo
26
Acknowledgments: the team (5)
August 2000
Alvis Brazma, Alan Robinson
Database
Ugis Sarkans
Expression Profiler
Research, students
Jaak Vilo
Thomas Schlitt
27
Acknowledgments: the team (9)
June 2001
Alvis Brazma
Database
Curation
MIAMExpress
Ugis Sarkans
Helen Parkinson
Mohammadreza Shojatalab
Expression Profiler
Research, students
Jaak Vilo
Thomas Schlitt
Patrick Kemmeren
Katja Kivinen
Johan Rung
28
Acknowledgments: the team (19)
February 2002
Alvis Brazma
Database
Curation
MIAMExpress
Ugis Sarkans
Helen Parkinson
Mohammadreza Shojatalab
Susanna Sansone
Ahmet Oezcimen
Gonzalo Garcia
Philippe Rocca-Serra
Niran Abeygunawardena
Ele Holloway
Expression Profiler
Research, students
Jaak Vilo
Thomas Schlitt
Lev Soinov
Patrick Kemmeren
Katja Kivinen
Anastasia Samsonova
Misha Kapushesky
Johan Rung
Koichi Tazaki