Title: Primig lab
1Bioinformatics I -- Databases
Primig lab
michael.primig_at_unibas.ch
http://www.bioz.unibas.ch/primig
Thomas Aust, Roopa Basavaraj (visiting scientist), Michel Bellis (visiting scientist), Guenda Berthold, Philippe Demougin, Leandro Hermida, Reinhold Koch, Ulrich Schlecht, Christa Wiederkehr, Roland Zuest
3Bioinformatics I -- Databases
Schwede lab
Torsten.schwede_at_unibas.ch
http://www.bioz.unibas.ch/schwede
Jozef Aerts, Juergen Kopp, Flavio Monigatti, Franziska Roeder, Rainer Poehlmann
4Bioinformatics I -- Databases
- What is a database?
- How do you make one?
- Biological Databases
- Knowledgebases
- Novel ideas
More info at http://www.biozentrum.unibas.ch/personal/primig/ (follow the >>>teaching<<< link).
5Bioinformatics I -- Databases
What is a database?
A database is a structured collection of data
Data INPUT >>> Information OUTPUT
6Bioinformatics I -- Databases
What is a relational database?
A relational database is a set of tables
containing data belonging to defined categories
Data INPUT >>> Information OUTPUT
7Bioinformatics I -- Databases
How do you make one?
A relational database management system (RDBMS)
lets you construct, update, and administrate a
relational database. An RDBMS takes Structured
Query Language (SQL) statements entered by a user
and creates, updates, or provides access to the
database.
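As a minimal sketch of what "creates, updates, or provides access" means in practice, the following Python snippet sends SQL statements to SQLite, which stands in here for a server-based RDBMS such as MySQL or PostgreSQL. The table name `gene` and its columns are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database; SQLite stands in for a server-based RDBMS.
conn = sqlite3.connect(":memory:")

# CREATE: define a table (table and column names are hypothetical).
conn.execute(
    "CREATE TABLE gene (name TEXT PRIMARY KEY, organism TEXT, note TEXT)"
)

# INSERT: add rows, using placeholders instead of string formatting.
conn.executemany(
    "INSERT INTO gene (name, organism, note) VALUES (?, ?, ?)",
    [("SPO11", "S. cerevisiae", None), ("DMC1", "S. cerevisiae", None)],
)

# UPDATE: change an existing row.
conn.execute(
    "UPDATE gene SET note = ? WHERE name = ?",
    ("meiotic recombination", "SPO11"),
)

# SELECT: read the data back.
rows = conn.execute("SELECT name, note FROM gene ORDER BY name").fetchall()
print(rows)
```

The same four statement types (CREATE, INSERT, UPDATE, SELECT) work against other relational systems, although data types and connection details differ between products.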
8Bioinformatics I -- Databases
RDBMS
Open Source: MySQL, PostgreSQL
Commercial: IBM DB2, Oracle
9Bioinformatics I -- Databases
Accessing relational databases
You also need a Graphical User Interface (GUI).
PHP (recursive acronym for "PHP Hypertext
Preprocessor") is a widely-used Open Source
general-purpose scripting language that is
especially suited for Web development and can be
embedded into HTML.
Perl is derived mostly from the C programming
language. Perl's process, file, and text
manipulation facilities make it particularly
well-suited for tasks involving e.g. database
access, graphical programming, and world wide web
programming.
10Bioinformatics I -- Databases
How do you make one?
- Database Model
- Analyse aims (submission/curation system)
- Define entities/tables (user, submission)
- Define attributes (name, phone, email)
- Define relationships between entities (user makes submission)
- Draw diagram
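The modelling steps above can be sketched as a two-table schema. This is an illustration under stated assumptions, not the actual GermOnline schema: the entities (user, submission) and attributes (name, phone, email) come from the slide, while the key columns, `title`, `status`, and the sample rows are hypothetical. SQLite is used so the example runs without a server; note that `user` is a reserved word in some other RDBMSs and may need a different table name there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Entities become tables, attributes become columns, and the relationship
# "user makes submission" becomes a foreign key from submission to user.
conn.executescript("""
CREATE TABLE user (
    user_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    phone   TEXT,
    email   TEXT NOT NULL UNIQUE
);
CREATE TABLE submission (
    submission_id INTEGER PRIMARY KEY,
    user_id       INTEGER NOT NULL REFERENCES user(user_id),
    title         TEXT NOT NULL,
    status        TEXT NOT NULL DEFAULT 'New'
);
""")

# Hypothetical sample data: one user who makes two submissions.
conn.execute(
    "INSERT INTO user (user_id, name, phone, email) VALUES (?, ?, ?, ?)",
    (1, "Example Author", None, "author@example.org"),
)
conn.executemany(
    "INSERT INTO submission (user_id, title) VALUES (?, ?)",
    [(1, "First submission"), (1, "Second submission")],
)

# Follow the relationship with a join.
rows = conn.execute("""
    SELECT u.name, s.title, s.status
    FROM user AS u
    JOIN submission AS s ON s.user_id = u.user_id
    ORDER BY s.submission_id
""").fetchall()
print(rows)

# A submission that points at a non-existent user is rejected.
try:
    conn.execute("INSERT INTO submission (user_id, title) VALUES (99, 'Orphan')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)
```

Keeping users and submissions in separate tables means a user's contact details are stored once, however many submissions that user makes.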
11Bioinformatics I -- Databases
[Figure: workflow diagram of the submission/curation system. States: New, Accepted, Rejected, Deleted. Actions: Assign Submission, Curate Submission, Revise, Assign Revision, Curate Revision, Delete Revision, Delete Publication. Actors: Author, Curator, GeO. Credit: Christa Wiederkehr]
13Bioinformatics I -- Databases
Christa Wiederkehr
14Bioinformatics I -- Databases
Biological Databases: DNA
DNA Sequence Data
- EBI: http://www.ebi.ac.uk/
- NCBI: http://www.ncbi.nlm.nih.gov/
- DDBJ: http://www.ddbj.nig.ac.jp/
15Bioinformatics I -- Databases
Global data synchronization
16Bioinformatics I -- Databases
EBI EMBL Release 72 contains 18,324,246
sequence entries comprising 23,090,186,146
nucleotides
17Bioinformatics I -- Databases
Biological Databases: DNA
DNA Sequence Data: submission at http://www3.ebi.ac.uk/Services/webin/Sbm.cgi
18Bioinformatics I -- Databases
Biological Databases: proteins
Protein Structure Data: Protein Data Bank (PDB) at http://www.rcsb.org/pdb/
Search 17107 Peptide, Protein and Virus Structures
19Bioinformatics I -- Databases
Biological Databases: proteins
Protein Structure Data: submission at http://deposit.pdb.org/adit/
20Bioinformatics I -- Databases
Biological Databases: compounds
Small Molecules
- Klotho DB: Biochemical Compounds Declarative Database at http://www.biocheminfo.org/klotho/
- LIGAND DB at http://www.genome.ad.jp/kegg/catalog/compounds.html
21Bioinformatics I -- Databases
Biological Databases: RNA
- Expression data - RNA
- Microarray data repositories
- Gene Expression Omnibus, GEO (NCBI) at http://www.ncbi.nlm.nih.gov/geo/
- ArrayExpress (EBI) at http://www.ebi.ac.uk/arrayexpress/
- MIAME: Minimum Information About a Microarray Experiment
22Bioinformatics I -- Databases
23Bioinformatics I -- Databases
Biological Databases: RNA
- Expression data - RNA
- Expression data visualization
- Stanford Expression Connection at http://genome-www4.Stanford.EDU/cgi-bin/SGD/expression/expressionConnection.pl
- GermOnline at http://germonline.org
- RIKEN mouse at http://read.gsc.riken.go.jp/
24Bioinformatics I -- Databases
25Bioinformatics I -- Databases
Biological Databases: RNA
- Expression data - RNA
- Yeast Cell Cycle at http://genome-www.stanford.edu/cellcycle
- Human Cell Cycle at http://genome-www.stanford.edu/Human-CellCyle/Hela
- Human and mouse tissue profiling at http://expression.gnf.org
26Bioinformatics I -- Databases
27Bioinformatics I -- Databases
Biological Databases: proteins
- Post-translational data: protein-protein interaction in Yeast
- Biochemical studies
- Cellzome
- BIND
- MDS Proteomics
- Two-hybrid studies
- Curagen's PathCalling
28Bioinformatics I -- Databases
Biological Databases: proteins
- Post-translational data: protein-protein interaction in Yeast
- Biochemical studies
- Cellzome at http://yeast.cellzome.com
- BIND at http://bind.mshri.on.ca
- MDS Proteomics at http://www.mdsp.com
- Two-hybrid studies
- Curagen's PathCalling at http://portal.curagen.com
Access the data through http://germonline.bioz.unibas.ch and click on S. cerevisiae. Search for any gene, e.g. SPO11, and go to the Protein/Proteome Information section of the Locus Report page.
29Bioinformatics I -- Databases
30Bioinformatics I -- Databases
Biological Databases: literature
PubMed contains the abstracts of peer-reviewed publications in the field of biomedical research: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi
Scientific journals are often available online (sometimes even for free)! http://www.ub.unibas.ch/vlib/vbbiol.htm
31Bioinformatics I -- Databases
Knowledgebases: a common language
The GeneOntology project: http://www.geneontology.org
The objective of GO is to provide controlled vocabularies for the description of gene products. These terms are to be used as attributes of gene products by collaborating databases, facilitating uniform queries across them. The three organizing principles of GO are molecular function, biological process and cellular component. A gene product has one or more molecular functions and is used in one or more biological processes; it may be, or may be associated with, one or more cellular components.
32Bioinformatics I -- Databases
Knowledgebases: a common language
The GeneOntology Evidence Codes: http://www.geneontology.org/doc/GO.evidence.html
- IC: inferred by curator (no evidence but reasonable)
- IDA: inferred from direct assay (enzyme, EMSA)
- IEA: inferred from electronic annotation (BLAST hit)
- IEP: inferred from expression pattern (RNA, Protein)
- IGI: inferred from genetic interaction (suppressors, synthetic lethals, complementation)
- IMP: inferred from mutant phenotype (deletion, insertion)
- IPI: inferred from physical interaction (co-IP, 2-hybrid)
- ISS: inferred from sequence or structural similarity (homolog)
- NAS: non-traceable author statement (quote cannot be found)
- ND: no biological data available
- TAS: traceable author statement
- NR: not recorded
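One way to see why evidence codes matter is to attach them to annotations and filter on them. The sketch below is hypothetical throughout: the `Annotation` class, the gene names `GENE_A` and `GENE_B`, and the term strings are placeholders, not real GO data, and the choice of which codes count as "experimental" is an assumption made for this example. Only the code table itself is taken from the list above.

```python
from dataclasses import dataclass

# Evidence codes as listed on the slide.
EVIDENCE_CODES = {
    "IC": "inferred by curator",
    "IDA": "inferred from direct assay",
    "IEA": "inferred from electronic annotation",
    "IEP": "inferred from expression pattern",
    "IGI": "inferred from genetic interaction",
    "IMP": "inferred from mutant phenotype",
    "IPI": "inferred from physical interaction",
    "ISS": "inferred from sequence or structural similarity",
    "NAS": "non-traceable author statement",
    "ND": "no biological data available",
    "TAS": "traceable author statement",
    "NR": "not recorded",
}

# Assumption for this sketch: codes that rest on a laboratory experiment.
EXPERIMENTAL = {"IDA", "IEP", "IGI", "IMP", "IPI"}


@dataclass(frozen=True)
class Annotation:
    """A hypothetical record linking a gene product to a GO term."""
    gene: str
    aspect: str    # molecular function, biological process or cellular component
    term: str
    evidence: str  # one of the keys of EVIDENCE_CODES

    def __post_init__(self):
        if self.evidence not in EVIDENCE_CODES:
            raise ValueError(f"unknown evidence code: {self.evidence}")


def experimental_only(annotations):
    """Keep only annotations backed by an experimental evidence code."""
    return [a for a in annotations if a.evidence in EXPERIMENTAL]


# Placeholder annotations, not real data.
annotations = [
    Annotation("GENE_A", "biological process", "example process", "IMP"),
    Annotation("GENE_A", "cellular component", "example component", "IEA"),
    Annotation("GENE_B", "molecular function", "example function", "IDA"),
]

supported = experimental_only(annotations)
for a in supported:
    print(a.gene, a.term, EVIDENCE_CODES[a.evidence])
```

A query like this lets a user separate annotations backed by bench experiments from those produced by automated pipelines such as BLAST-based transfer (IEA).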
33Bioinformatics I -- Databases
Biological Databases: GO-based species-specific dbs
- Annotation covers knowledge from Genetics, Molecular Biology and Functional genomics
- SGD for S. cerevisiae: http://genome-www.stanford.edu/Saccharomyces/
- TAIR for A. thaliana: http://www.arabidopsis.org/
- Wormbase for C. elegans: http://www.wormbase.org
- Flybase for D. melanogaster: http://flybase.bio.indiana.edu/
- Mouse Genome Database for M. musculus: http://www.informatics.jax.org
34Bioinformatics I -- Databases
Knowledgebases: Swissprot >>> Uniprot
Release 40.31 of 25-Oct-2002 of SWISS-PROT
contains 116776 sequence entries, comprising
42881496 amino acids abstracted from 100002
references.
35Bioinformatics I -- Databases
Knowledgebases: Swissprot >>> Uniprot
KEY FEATURES
- Minimal redundancy: data from different sources are merged; if conflicts exist between various sequencing reports, they are indicated in the feature table of the corresponding entry.
- Annotation
- Function(s) of the protein
- Post-translational modification(s). For example carbohydrates, phosphorylation, acetylation, GPI-anchor, etc.
- Domains and sites. For example calcium binding regions, ATP-binding sites, zinc fingers, homeobox, kringle, etc.
- Secondary structure
- Quaternary structure. For example homodimer, heterotrimer, etc.
- Similarities to other proteins
- Disease(s) associated with deficiencies in the protein
- Sequence conflicts, variants, etc.
- Integration
- Swissprot currently links to about 60 external databases (list at http://www.expasy.org/cgi-bin/lists?dbxref.txt)
36Bioinformatics I -- Databases
Knowledgebases: Swissprot >>> Uniprot
In SWISS-PROT, information is given in the comment lines (CC), in the feature table (FT) and in the keyword lines (KW). Most comments are classified by topics; this approach permits the easy retrieval of specific categories of data from the database.

ID   SP11_YEAST  STANDARD;  PRT;  398 AA.
AC   P23179;
CC   -!- FUNCTION: REQUIRED FOR MEIOTIC RECOMBINATION. MEDIATES DNA
CC       CLEAVAGE THAT FORMS THE DOUBLE-STRAND BREAKS (DSB) THAT INITIATE
CC       MEIOTIC RECOMBINATION.
CC   -!- SUBCELLULAR LOCATION: Nuclear.
CC   -!- DEVELOPMENTAL STAGE: MEIOSIS-SPECIFIC.
CC   -!- SIMILARITY: BELONGS TO THE TOP6A FAMILY.
FT   ACT_SITE    135  135  DNA CLEAVAGE (PROBABLE).
FT   MUTAGEN     135  135  Y->F: LOSS OF ACTIVITY.
KW   Hydrolase; DNA-binding; Sporulation; Meiosis; Nuclear protein.
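To illustrate why topic-classified comment lines make retrieval easy, here is a minimal Python sketch that reads the SPO11 excerpt and groups it by line code. It is a simplified illustration, not a full SWISS-PROT parser: it assumes a two-letter line code followed by three spaces, `-!-` as the start of each comment topic, and `;` between keywords. That punctuation and column layout follow the usual flat-file conventions and are an assumption about how the excerpt was originally laid out.

```python
from collections import defaultdict

# Excerpt of the SPO11 entry; spacing and punctuation are assumed (see lead-in).
ENTRY = """\
ID   SP11_YEAST  STANDARD;  PRT;  398 AA.
AC   P23179;
CC   -!- FUNCTION: REQUIRED FOR MEIOTIC RECOMBINATION. MEDIATES DNA
CC       CLEAVAGE THAT FORMS THE DOUBLE-STRAND BREAKS (DSB) THAT INITIATE
CC       MEIOTIC RECOMBINATION.
CC   -!- SUBCELLULAR LOCATION: Nuclear.
CC   -!- DEVELOPMENTAL STAGE: MEIOSIS-SPECIFIC.
CC   -!- SIMILARITY: BELONGS TO THE TOP6A FAMILY.
FT   ACT_SITE    135  135  DNA CLEAVAGE (PROBABLE).
FT   MUTAGEN     135  135  Y->F: LOSS OF ACTIVITY.
KW   Hydrolase; DNA-binding; Sporulation; Meiosis; Nuclear protein.
"""


def parse_entry(text):
    """Group the entry's lines by their two-letter line code."""
    by_code = defaultdict(list)
    for line in text.splitlines():
        if line.strip():
            by_code[line[:2]].append(line[5:].rstrip())

    # Comment blocks start with "-!-" and may continue over several CC lines.
    comments = {}
    for block in " ".join(by_code["CC"]).split("-!-"):
        block = " ".join(block.split())  # collapse runs of whitespace
        if block:
            topic, _, body = block.partition(":")
            comments[topic.strip()] = body.strip()

    # Keywords are separated by ";" and the list ends with ".".
    keyword_text = " ".join(by_code["KW"]).rstrip(".")
    keywords = [k.strip() for k in keyword_text.split(";") if k.strip()]

    return {
        "entry_name": by_code["ID"][0].split()[0],
        "accession": by_code["AC"][0].rstrip(";"),
        "comments": comments,
        "features": by_code["FT"],
        "keywords": keywords,
    }


entry = parse_entry(ENTRY)
print(entry["entry_name"], entry["accession"])
print(entry["comments"]["SUBCELLULAR LOCATION"])
print(entry["keywords"])
```

Because each comment carries a topic label, a single dictionary lookup such as `entry["comments"]["FUNCTION"]` retrieves one category of information without scanning the whole record.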
37Bioinformatics I -- Databases
Novel ideas
A database that contains large-scale automatic structure predictions: the SWISS-MODEL repository. Models from the SWISS-MODEL server and non-curated external sources will be available.
38Bioinformatics I -- Databases
Novel ideas
The SWISS-MODEL server at http://www.expasy.org/swissmod/ is an automated modelling system that serves all scientists as a tool to study the putative 3D structure of a protein using Comparative Modelling.
39Bioinformatics I -- Databases
Novel ideas
The GermOnline server at http://germonline.bioz.unibas.ch (http://germonline.org) is a platform for online submission/curation that enables scientists who work in the field of meiosis and gametogenesis to create, update and curate a knowledgebase that uses controlled vocabulary (GO) and free text to describe the roles of genes in sexual reproduction.
40Bioinformatics I -- Databases
Major DB info
- EBI: http://www.ebi.ac.uk/Databases
- Nucl. Acid Res. 2002: http://nar.oupjournals.org/content/vol30/issue1/
- GermOnline: http://germonline.unibas.ch
- Primig lab: http://www.bioz.unibas.ch/personal/primig/ (follow the teaching link, check out literature info, download ppt presentation dbs)
- Life Sciences Training Facility: http://www.bioz.unibas.ch/corelab (you will find more links on bioinformatics)
41Bioinformatics I -- Projects
We would like to collaborate with you on our ongoing GermOnline project.

You will be asked to use online sources (species-specific and general knowledgebases, PubMed) to collect information about the genomes of S. pombe, A. thaliana, C. elegans, D. melanogaster, M. musculus and H. sapiens. This information should be presented in a concise paragraph like the one written by Peter Philippsen for the genome of S. cerevisiae (click on S. cerevisiae and follow the more link in the Genome Information section). You should include two complete references.

Furthermore, we ask you to search for knowledge about a list of conserved genes important for meiosis and gametogenesis. You are asked to identify the homologs and orthologues and provide curated information about the yeast genes DMC1, MLH3, MRE11, MSH4, MSH5 and SPO11. Your search should include literature, knowledgebases and protein structures. More info at http://www.biozentrum.unibas.ch/personal/primig/teaching/bioinfo_I_literature.html

The information you provide will be integrated into GermOnline by Ulrich Schlecht. You will be credited for your contribution. The results you produce will be recorded and (if everything works out) they count for the exam. We look forward to getting your feedback.