Title: The Swiss Institute of Bioinformatics
1The Swiss Institute of Bioinformatics
2SIB activities
- The SIB brings Swiss experts in bioinformatics together and provides high quality services to the national and international scientific community.
- The SIB is a privileged partner of Swiss universities.
- Members of the SIB include research groups in Geneva, Lausanne, Basel and Zurich.
- The SIB participates in Masters degrees of partner universities and organises a doctoral school in Bioinformatics.
3The SIB in Switzerland
4Group Leaders
- 12 members
- Ron Appel, Proteome Informatics, UniGE
- Amos Bairoch, Swiss-Prot, UniGE
- Bastien Chopard, Computer simulations, UniGE
- Philipp Bucher, Computational Cancer Genomics, ISREC
- Mauro Delorenzi, Bioinformatics Core Facility, ISREC
- Félix Naef, Computational Systems Biology, ISREC
- C. Victor Jongeneel, Vital-IT and Transcriptome Analysis, LICR
- Olivier Michielin, Molecular Modeling, UniL and LICR
- Michael Primig, Genome Bioinformatics, UniBas
- Torsten Schwede, Protein Structure Bioinformatics, UniBas
- Erik van Nimwegen, Genome Systems Biology, UniBas
- Mihaela Zavolan, RNA Regulatory Networks, UniBas
- Gaston Gonnet, EPFZ
- Joerg Stelling, EPFZ
- Evgeny Zdobnov, UniGE
- Bernard Moret, EPFL
- Marc Robinson-Rechavi, UniL
5Size of SIB groups
6SIB collaborators
7SIB revenues
8Swiss repartition (2006)
9SIB activities
- The SIB has three missions: research and development, education and service.
- Research and development activities related to the databases and software developed within the Institute.
- Masters degrees of partner universities and the Swiss doctoral school in Bioinformatics.
- Databases of international standing (Swiss-Prot, Prosite, EPD, Swiss-2Dpage, Human Chromosome 21, TrEST, TrGen, AGBD, Hits, Swiss Model Repository, GermOnline).
- Software and services that can be accessed from the SIB web servers (Melanie, T-COFFEE, PFTOOLS, ESTScan, Dotlet, SEView, Snp_detect, Mmsearch, Swiss-Model, DeepView/Swiss-PdbViewer, MIMAS).
- Services to the Swiss biomedical research community within the framework of EMBnet and NCCR.
- Together with the Universities of Lausanne, Geneva and Basel, the Swiss Federal Institutes of Technology of Lausanne (EPFL) and Zurich (EPFZ), and three private partners, Hewlett-Packard Inc., Intel Corp. and Oracle, the SIB contributed to the creation of a high-performance informatics platform (Vital-IT) exclusively dedicated to life sciences.
10Scientific Council
- Seven members
- Peer Bork, European Molecular Biology Laboratory, Germany.
- Michael Dunn, Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Ireland.
- Takashi Gojobori, National Institute of Genetics, Japan.
- Manolo Gouy, C.N.R.S., Université Claude Bernard-Lyon 1, France.
- Wilhelm Gruissem, Institute of Plant Sciences, ETH Zentrum, Zürich.
- Thomas Lengauer, Chairman, Max-Planck-Institut für Informatik, Germany.
- Christine Orengo, Dept. of Biochemistry and Molecular Biology, University College London, UK.
11The Computational Biology Challenge
- "In principle, the string of genetic bits holds long-sought secrets of human development, physiology and medicine. In practice, our ability to transform such information into understanding remains woefully inadequate."
- International Human Genome Sequencing Consortium, Initial sequencing and analysis of the human genome, Nature 409, 860-921 (2001). Emphasis added.
12Computational Biology Today
- Genome analysis: from raw sequence data to fully assembled and annotated genomes
- Proteome analysis: from mass spectra of complex protein mixtures to full identification of their components and analysis of their structure
- Expression profiling: microarrays, SAGE, MPSS, ESTs
- Comparative genomics: phylogeny, polymorphisms, fingerprinting
- Modelling of macromolecular systems: deducing properties from atomic interactions
- Modelling of complex systems: protein interactions, pathways, regulatory networks, whole organ models (Systems Biology)
13Computational Biology needs HPC!
- Problems of scale
- Genomes with millions to billions of nucleotides
- Profiling experiments with tens of thousands of data points measured on hundreds or thousands of samples
- Thousands of protein mass spectra representing gigabytes of data per experiment
- Problems of complexity
- Combinatorial: >3×10^4 interacting gene products can create more functions than there are atoms in the Universe
- Structural: >10^5 dynamically interacting atoms make up the smallest of molecular machines
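The combinatorial claim can be checked with a few lines of arithmetic. This is only a sketch under two assumptions that are not on the slide: that a "function" corresponds to any subset of gene products acting together (so n products give 2^n combinations), and that the number of atoms in the observable Universe is of the order of 10^80, a commonly quoted estimate.

```python
import math

# Assumption: every subset of gene products counts as one potential "function".
N_GENE_PRODUCTS = 30_000        # ">3 x 10^4" from the slide
ATOMS_IN_UNIVERSE = 10**80      # order-of-magnitude estimate (assumption)

# Number of decimal orders of magnitude of 2**N_GENE_PRODUCTS
log10_combinations = N_GENE_PRODUCTS * math.log10(2)

# Smallest n for which 2**n exceeds the atom count:
# for a non-power-of-two integer x, 2**x.bit_length() is the first power of two above x.
min_products = ATOMS_IN_UNIVERSE.bit_length()

print(2**N_GENE_PRODUCTS > ATOMS_IN_UNIVERSE)
print(f"2^{N_GENE_PRODUCTS} is about 10^{log10_combinations:.0f}")
print(f"{min_products} products already give more subsets than atoms")
```

Under these assumptions a few hundred interacting products are already enough, so the slide's figure of 3×10^4 exceeds the bound by a very wide margin.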
14Life Science ICT Needs
Problem                               Network    Storage     Computing speed
Genome Assembly                       100 Mbps   300 TB      > 10 TFlops
Protein Structure Prediction          500 Mbps   1s PB       > 100 TFlops
Classical Molecular Dynamics          2 Gbps     10s PB      100 TFlops per DNA-protein interaction
First Principle Molecular Dynamics    10 Gbps    100s PB     1 PFlops
Simulation of Biological Networks     ???        1000s PB    > 1 PFlops
15The Vital-IT Center
- Joint venture between academic and industrial partners
- Universities of Lausanne, Geneva and Basel, Swiss Federal Inst. of Technology Lausanne, Ludwig Institute for Cancer Research
- Hewlett-Packard, Intel Corp. and Oracle
- Managed by the Swiss Institute of Bioinformatics
- An HPC center exclusively dedicated to life sciences
- Software development and optimization
- HPC resources for biology and medicine
- Consulting for the life science and health industries
16Scope of Vital-IT
- R&D projects
- Porting of existing code to Itanium
- Optimisation of code for Itanium architecture
- Adaptation of software to cluster environment
- Ad hoc software development for technology platforms
- Infrastructure projects
- Compute engine behind Web interfaces
- Database engine for genomic/proteomic data
- Computational resource for bioinformatics research projects
- Providing resources to SwissBioGrid, SystemsX
- Transnational Resource for EU Countries
17Vital-IT in SwissBioGRID
- SwissBioGRID collaboration
- Large-scale computational applications in bioinformatics, biosimulation, chemoinformatics and bio-medical sciences by utilizing distributed high-performance computing, high speed networks, massive data collections and archives
- CSCS manages GRID infrastructure
- Vital-IT has primary responsibility for providing bioinformatics Web services, validation and optimization
18Vital-IT in SystemsX
- ETHZ, Uni ZH, UniBS (and others to come)
- CHF 10 million funding for 2006-07
- Scientific Nodes
- Center of Biosystems
- Competence Center for Systems Physiology
- Center for Model Organism Proteomics
- Institute for Molecular Systems Biology
- Glue Projects (planned)
- Center for Information Sciences and Databases
- Center for Molecular Analysis and Bioinformatics
- Center for Cellular Nano Analytics
- Vital-IT will collaborate to provide core
computing resources for SystemsX
19Thank you
- THANK YOU
- http://www.isb-sib.ch
20ExPASy server
- Expert Protein Analysis System
- http://www.expasy.org
- Access Statistics January 31, 2006
- Total number of connections since August 1993
- 743605459
- June 2006 (connections)
- 22190251 (approx. 9/sec)
- Mirror sites
- USA, Canada, Australia, China, Brazil
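The "approx. 9/sec" figure follows directly from the monthly total. A small check, assuming the count covers the 30 days of June and that the load is spread evenly over the month (a simplification; real traffic is not uniform):

```python
SECONDS_PER_DAY = 24 * 60 * 60

connections_june_2006 = 22_190_251   # monthly total quoted on the slide
days_in_june = 30

# Average rate, assuming uniform load over the month
rate = connections_june_2006 / (days_in_june * SECONDS_PER_DAY)
print(f"{rate:.2f} connections/sec")
```

The average comes out between 8 and 9 connections per second, consistent with the rounded value on the slide.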
21access to ExPASy
22ExPASy connections / country
23ExPASy connections / country / inhab.
26The two components of bioinformatics
- macromolecular data banks
- Sequence data banks of DNA (EMBL/GenBank) or proteins (Swiss-Prot), genomes (FlyBase), 3D-structures (PDB), references (Medline), etc.
- software tools
- analysis of intrinsic properties of sequences
- comparison of sequences
- analysis and storage of gene expression data
- analysis and storage of proteomics data
- visualization and modeling of 3D-structures
27Genome analysis (Philipp Bucher)
- Signal search analysis (SSA)
- a method to discover and characterize sequence motifs that occur at a constrained distance from a physiological site, for instance a transcription initiation site.
- The Eukaryotic Promoter Database (EPD)
- a database of experimentally characterized eukaryotic promoters (transcription initiation sites).
- CleanEx: a database of heterogeneous gene expression data, based on a consistent gene nomenclature.
- Provides access to public gene expression data via unique gene names.
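The idea behind looking for motifs at a constrained distance from a site can be illustrated with a toy positional count. This is not the SSA implementation; it is a minimal sketch in which the function name `motif_offsets`, the sequences and the site index are all made up for illustration. It tallies where an exact motif starts relative to a reference position (here standing in for a transcription initiation site) across a set of sequences.

```python
from collections import Counter

def motif_offsets(sequences, site_positions, motif):
    """Count motif start positions relative to each sequence's reference site."""
    counts = Counter()
    for seq, site in zip(sequences, site_positions):
        start = seq.find(motif)
        while start != -1:
            counts[start - site] += 1          # negative offset = upstream of the site
            start = seq.find(motif, start + 1)
    return counts

# Toy, invented sequences; index 12 plays the role of the initiation site.
seqs = [
    "GCGTATAAAGGCCATTCG",
    "ACCTATAAATGGCAGTCA",
    "TTATAAACGGCGCAGGAT",
]
sites = [12, 12, 12]

counts = motif_offsets(seqs, sites, "TATAAA")
for offset, n in sorted(counts.items()):
    print(offset, n)
```

A motif that is functionally tied to the site shows up as a peak at one offset (or a narrow range of offsets), whereas an unrelated motif would be spread evenly. A real analysis would use weight matrices and a statistical test rather than exact string matches.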
28Genome Analysis (Erik van Nimwegen, Biozentrum, U. Basel)
- Genome-wide predictions of regulons in bacterial genomes, using comparative genomics.
- Identification and prediction of putative transcription factor binding sites on a genome-wide scale, using significantly conserved fragments between promoter regions of orthologous genes in related bacterial species.
- Scaling laws in functional gene content
- Comparison of the number of genes in different functional classes across genomes, ranging from the simplest bacteria to human.
- The number of genes in a given functional class is related to the total number of genes in the genome for a large number of high-level functional classes.
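A scaling law of this kind is usually tested by fitting a straight line on log-log axes. The sketch below assumes a power-law form, n_class = c · n_total^alpha, which the slide does not state explicitly, and uses synthetic data generated with an arbitrary exponent of 2 purely to show that the fit recovers it. The function name `fit_power_law` and all numbers are illustrative.

```python
import math

def fit_power_law(total_genes, class_genes):
    """Least-squares fit of log(n_class) = log(c) + alpha * log(n_total)."""
    xs = [math.log(t) for t in total_genes]
    ys = [math.log(n) for n in class_genes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    alpha = sxy / sxx                       # slope on log-log axes = scaling exponent
    c = math.exp(mean_y - alpha * mean_x)   # intercept gives the prefactor
    return alpha, c

# Synthetic "genomes": class size generated as 1e-4 * total**2 (illustrative only)
totals = [500, 2000, 5000, 20000]
in_class = [1e-4 * t**2 for t in totals]

alpha, c = fit_power_law(totals, in_class)
print(f"exponent = {alpha:.3f}, prefactor = {c:.2e}")
```

With real annotation counts the points scatter around the line, and the fitted exponent is what distinguishes one functional class from another.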
29Regulation of gene expression (Mihaela Zavolan, Biozentrum, U. Basel)
- development of computational methods for genome-wide annotation of transcription factor binding sites in mammalian genomes
- analysis of the functionality of alternative splice forms
- analyzing mouse, human and rat transcriptomes
- annotation of small RNA sequences obtained through large-scale cloning
- discovery of novel regulatory RNAs
- characterization of the downstream targets of miRNAs
30The Universal Protein Resource (UniProtKB)
- The past 2 decades have seen the creation of Swiss-Prot and TrEMBL, operated by researchers from the Swiss Institute of Bioinformatics (SIB) and the European Bioinformatics Institute (EBI),
- as well as the Protein Information Resource, operated by the National Biomedical Research Foundation (NBRF).
- These groups are combining the strengths of each of their databases into a central public resource: the Universal Protein Resource, or UniProtKB.
31Central Dogma of Molecular Biology: high-throughput data production
[Diagram: DNA (Genome; genotype), read by DNA sequencing. Transcription and alternative splicing lead to RNA (Transcriptome), measured with microarrays. Translation and post-translational modifications lead to Protein (Proteome; phenotype), measured by mass spectrometry. Final labels: Structure, Function.]
32Genome studies
- Signal search analysis (SSA) (P. Bucher)
- Eukaryotic Promoter Database (EPD) (P. Bucher)
- CleanEx: a database of heterogeneous gene expression data, based on a consistent gene nomenclature. (P. Bucher)
- Genome-wide predictions of regulons in bacterial genomes, using comparative genomics. (E. van Nimwegen)
- Scaling laws in functional gene content
- Comparison of the number of genes in different functional classes across genomes, ranging from the simplest bacteria to human. (E. van Nimwegen)
33Regulation of gene expression (M. Zavolan)
- development of computational methods for genome-wide annotation of transcription factor binding sites in mammalian genomes
- analysis of the functionality of alternative splice forms
- analyzing mouse, human and rat transcriptomes
- annotation of small RNA sequences obtained through large-scale cloning
- discovery of novel regulatory RNAs
- characterization of the downstream targets of miRNAs
34Gene expression
- Storage and analysis of microarray data (M. Delorenzi)
- Discrimination and gene selection methods for cancer diagnosis (M. Delorenzi)
- Recognition and prediction of genetic aberrations in gene expression data based on a hidden Markov model (M. Delorenzi)
- Development of knowledgebases and microarray data management/analysis solutions. (M. Primig)
- Expression profiling of gametogenesis in yeast and mammals
- Identification of candidate genes for the regulation of fertility in mammals by large-scale expression profiling
- Development of a novel cross-species and subject-oriented approach to genome annotation and microarray data management.
- Microarray Data Management and Analysis System (MIMAS)
35Computational Systems Biology (F. Naef)
- Multi-dimensional functional data, i.e. from expression arrays, open the door to a systems-level understanding of biological complexity.
- Theoretical and computational methodologies for studying functional properties and design principles of genetic networks, relevant to cancer biology.
36Protein Identification using Mass Spectrometry
[Workflow diagram]
- Separation by 1-DE, 2-DE or LC; protein taken from gel, PVDF or LC fraction
- Tryptic digestion and peptide extraction (example peptides: TYGGAAR, EHICLLGK, PSTTGVEMFR, GANK)
- Mass spectrometry, peptide mass fingerprints: PMF identification (unmodified and modified peptides)
- MS fragmentation; mass spectrometry, peptide MS fragments: MS/MS identification
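Peptide mass fingerprinting compares measured peptide masses with masses computed from candidate protein sequences. The sketch below shows that computational side only, under stated assumptions: trypsin is modelled by the common simplified rule (cleave after K or R unless the next residue is P, no missed cleavages), masses are monoisotopic and unmodified, and the input protein is a hypothetical concatenation of three of the example peptides from the slide. The function names are illustrative, not taken from any SIB tool.

```python
# Monoisotopic residue masses in Da (standard values)
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini

def tryptic_digest(protein):
    """Split after K or R, except when the next residue is P (simplified rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])    # C-terminal peptide without K/R
    return peptides

def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified, neutral peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER

# Hypothetical protein built from example peptides shown on the slide
protein = "TYGGAARGANKEHICLLGK"
for pep in tryptic_digest(protein):
    print(pep, round(peptide_mass(pep), 3))
```

An identification tool would compare such a theoretical mass list against the measured peaks within a mass tolerance and score each candidate protein by the number of matches. Modified peptides require adding the mass shift of each modification.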
37Protein 3D-structure prediction by homology
- Homology modeling
- Comparative protein modeling
- Knowledge-based modeling
- Using experimental 3D-structures of related family members (templates), calculate a model for a new sequence (target): Swiss-Model
38Free energy calculations
- Cytotoxic T Lymphocyte (CTL) activity against tumor cells
[Figure: X-ray structure of the T cell receptor (TCR) bound to a peptide-MHC complex; labelled components: TCR, peptide, MHC]