Title: European Molecular Biology Laboratory, European Bioinformatics Institute
1 http://www.embl.de/ from 1974
http://www.ebi.ac.uk/ from 1996
2 The EBI Mission
- To provide bioinformatics facilities for the scientific community
- To become a flagship laboratory for research in bioinformatics
- To provide bioinformatics training
- To help disseminate standards and technologies
3 Role of Bioinformatics
- To Support Experimental Biology
- To Collect and Archive Data
- To provide Framework and Integration
- To give Easy Access to Data
- To make New Discoveries through Data Analysis
- To predict through modelling
- To facilitate application and exploitation of
academic research in Medicine, Agriculture,
Health and Environment
4 Dramatic Changes in Biology over the last 5 years
- Data explosion: new types of data
- Move towards high-throughput biology
- Move towards systems biology
- Much larger community, often naïve users
- Growth of applied biology: molecular medicine, agriculture, food, environmental sciences
5 [Figure labels: Genomes; Literature; Expression profiling; Metabolic data; Proteome data; Biochemistry; Bioinformatics; Comparative genomics; Mutant/RNAi data; Hypotheses and in silico models]
6 Molecules to Cells to Organisms
[Figure labels: Protein; E. coli genome; Genomes]
7 Systems Biology
[Figure labels: Methyl; CheB; ATP; CheA; CheW; ADP; Pi; CheY; Flim C; Output]
8 Molecular Basis of Disease
- p53 tumour suppressor core domain: cancers of many types
- Cu-Zn superoxide dismutase: autosomal dominant amyotrophic lateral sclerosis
9 From Structure to Functional Annotation
10
- Linking to domain data: eFamily
- Sequence mapping: SIFTS
- MSDchem: ligand data
- PQS: biological assemblies
- Electron density visualisation: AstexViewer
- MSDPro, MSDlite
- SSM: fold matching
- Surface matching
- MSDsite: active sites
11 From Structure To Biochemical Function
- Gene → Protein → 3D Structure → Function
- Given a protein structure:
  - Where is the functional site?
  - What is the multimeric state of the protein?
  - Which ligands bind to the protein?
  - What is the biochemical function?
12 High throughput
- A new sequence every 4 seconds
- 600 000 web requests a day
- 100 000 users
- 5-10 core databases
- 20 000 000 cross-references
- About 160 other databases
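To put the throughput figures above on a common scale, a quick back-of-the-envelope conversion can help. The sketch below uses only the two rates quoted on the slide (one new sequence every 4 seconds, 600 000 web requests a day); the variable names are illustrative and not taken from the talk.

```python
# Back-of-the-envelope conversion of the quoted rates.
SECONDS_PER_DAY = 24 * 60 * 60  # 86 400 seconds in a day

seconds_per_sequence = 4        # "a new sequence every 4 seconds"
web_requests_per_day = 600_000  # "600 000 web requests a day"

# How many new sequences arrive in one day at that rate.
sequences_per_day = SECONDS_PER_DAY // seconds_per_sequence

# Average web request rate, assuming requests were spread evenly over the day.
requests_per_second = web_requests_per_day / SECONDS_PER_DAY

print(f"New sequences per day: {sequences_per_day}")
print(f"Average web requests per second: {requests_per_second:.1f}")
```

Real traffic is not evenly spread over the day, so the per-second figure is only an average.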
13 Data Growth
14 Web requests per day (excluding Ensembl)
15 ftp
Year   Million files   Terabytes
2001    4.5            11914
2002    5.6            11809
2003   13.5            43860
2004   17.3            60508
2005   26.3            85396
16 Web Servers
Year   Requests     Millions
2002   118631650    118
2003   255399724    255
2004   354235704    354
2005   482076196    482
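The web server request totals above are easier to compare as year-on-year growth factors. A minimal sketch, using only the four yearly totals from the slide; the `yearly_growth` helper is a hypothetical name introduced for illustration.

```python
# Yearly web server request totals, as given on the slide.
requests_by_year = {
    2002: 118_631_650,
    2003: 255_399_724,
    2004: 354_235_704,
    2005: 482_076_196,
}

def yearly_growth(series):
    """Return {year: value / previous_year_value} for consecutive years."""
    years = sorted(series)
    return {
        year: series[year] / series[prev]
        for prev, year in zip(years, years[1:])
    }

growth = yearly_growth(requests_by_year)
for year, factor in growth.items():
    print(f"{year}: x{factor:.2f} relative to {year - 1}")

# Overall growth factor across the whole period covered by the table.
overall = requests_by_year[2005] / requests_by_year[2002]
print(f"2002 to 2005 overall: x{overall:.2f}")
```

The same helper would work on the ftp or distinct-hosts figures by swapping in a different dictionary.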
17 Distinct hosts served
Year   Number     Users (millions)
2002   1586883    1.5
2003   2784974    2.7
2004   3656109    3.6
2005   3919564    3.9
18 % dynamic pages, by domain (2005)
 1. .uk (United Kingdom)               21.14
 2. .com (Commercial)                  17.16
 3. unknown domain                     13.37
 4. unresolved numerical addresses     11.05
 5. .edu (USA Higher Education)         5.29
 6. .net (Networks)                     5.27
 7. .fr (France)                        4.76
 8. .it (Italy)                         4.68
 9. .de (Germany)                       2.81
10. .nl (Netherlands)                   2.00
19 The Services of the EBI
- Nucleotide sequences
- Genes
- Transcription information
- Protein sequences
- Protein families
- Macromolecular structures
- Molecular interactions
- Pathways
- Metabolic information
- Scientific Literature
20 Structure of EBI Services
21 Structure of EBI Services
[Figure labels: Database Integration and External Services (Lopez); Apweiler, Stoesser; Stoehr, Zhu; Henrick; Brazma; Birney]
22 Structure of EBI Research
23 Structure of EBI Research
- Text Mining
- Computational Genomics
- Structural Proteomics
- Phylogeny and Evolution
- Neuroinformatics
24 EBI Databases
32
- GKB: pathways
- EMBL-Bank: DNA sequences
- EnsEMBL: human genome gene annotation
- SWISS-PROT and TrEMBL: protein sequences
- ArrayExpress: microarray expression data
- EMSD: macromolecular structure data
- IntAct: protein interactions
33 Integration
34 Integrative science demands integrative resources
- EBI databases have a backbone of integrative links
- 20 000 000 cross-references support trans-database navigation
- Is this good enough?
  - sparse and coarse-grained
  - not straightforward to use
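Trans-database navigation through cross-references can be pictured as walking a graph whose nodes are database entries and whose edges are the links between them. The sketch below is a toy illustration under stated assumptions: the accession strings, the `XREFS` table and the `reachable` helper are all hypothetical and do not reflect the EBI's actual data model or any real API.

```python
from collections import deque

# Hypothetical cross-reference table: (database, accession) -> linked entries.
# The accessions are made up for illustration only.
XREFS = {
    ("EMBL-Bank", "SEQ0001"): [("SWISS-PROT", "PROT0001")],
    ("SWISS-PROT", "PROT0001"): [("EMSD", "STRUCT0001"), ("IntAct", "INT0001")],
    ("EMSD", "STRUCT0001"): [],
    ("IntAct", "INT0001"): [],
}

def reachable(start, xrefs):
    """Breadth-first walk over cross-references.

    Returns a dict mapping every entry reachable from `start`
    (including `start` itself) to the number of links followed.
    """
    hops = {start: 0}
    queue = deque([start])
    while queue:
        entry = queue.popleft()
        for linked in xrefs.get(entry, []):
            if linked not in hops:
                hops[linked] = hops[entry] + 1
                queue.append(linked)
    return hops

result = reachable(("EMBL-Bank", "SEQ0001"), XREFS)
for (database, accession), n in result.items():
    print(f"{database}:{accession} reached after {n} link(s)")
```

In a sparse link table of this kind, any entry without an incoming cross-reference is simply never reached, which is one way to read the "sparse and coarse-grained" concern on the slide.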
35 Integrative science demands integrative resources
- Major efforts involved in integration
- InterPro: database of protein families, domains and functional sites
- Integr8: data integration project co-ordinated by the EBI, to provide an integrated layer for the exploitation of genomic and proteomic data
- GRID technologies
36 European Patent Office
- Support the inclusion of sequence data in the public databases
- Development of tools to capture sequence data
- Run their searches at the EBI
- (similar arrangements in USA and Japan ensure exchange)
- Analogous systems being developed for structure information
37 Industry Support
38 Industry Support
- Current successful industry programme for pharma
- Quarterly meetings
- R&D Training - workshops
- Industry Forum
- Funded by subscriptions
- New SME programme under development
39 New Data
40 http://www.ebi.ac.uk/2can/
41 The Magic Search Box