Title: Introduction to Bioinformatics 236523
1Introduction to Bioinformatics 236523
- Lecturer: Dr. Yael Mandel-Gutfreund
- Teaching Assistants
- Shula Shazman
- Sivan Bercovici
Course web site: http://webcourse.cs.technion.ac.il/236523
2What is Bioinformatics?
3Course Objectives
- To introduce the bioinformatics discipline
- To make the students familiar with the major biological questions which can be addressed by bioinformatics tools
- To introduce the major tools used for sequence and structure analysis and explain in general how they work (limitations, etc.)
4Course Structure and Requirements
- Class structure
- 2-hour lecture
- 1-hour tutorial
- Homework
- Homework assignments will be given every second week
- The homework will be done in pairs
- All 5 of the 5 homework assignments must be submitted
- Final project
- A final project will be conducted and submitted in pairs
5Grading
- 20% homework assignments
- 80% final project
6Literature list
- Gibas, C., Jambeck, P. Developing Bioinformatics Computer Skills. O'Reilly, 2001.
- Lesk, A. M. Introduction to Bioinformatics. Oxford University Press, 2002.
- Mount, D. W. Bioinformatics: Sequence and Genome Analysis. 2nd ed., Cold Spring Harbor Laboratory Press, 2004.
Advanced Reading
- Jones, N. C., Pevzner, P. A. An Introduction to Bioinformatics Algorithms. MIT Press, 2004.
7What is Bioinformatics?
8What is Bioinformatics?
The field of science in which biology, computer
science, and information technology merge to form
a single discipline. Ultimate goal: to enable
the discovery of new biological insights as well
as to create a global perspective from which
unifying principles in biology can be discerned.
9Central Paradigm in Molecular Biology
Gene (DNA) -> mRNA -> Protein
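The flow from gene to mRNA to protein can be illustrated with a minimal Python sketch. It assumes the input is the coding strand of the DNA, reads it from the first base, and uses the standard genetic code; the function names `transcribe` and `translate` are illustrative, not taken from any course tool. The example input is the first 30 bases of the HBB coding region shown on the "Healthy Individual" slide.

```python
from itertools import product

# Standard genetic code, listed in the conventional U, C, A, G codon order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna):
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Translate an mRNA from its first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# First 30 bases of the HBB coding region (from the "Healthy Individual" slide).
gene = "ATGGTGCATCTGACTCCTGAGGAGAAGTCT"
mrna = transcribe(gene)
print(mrna)             # AUGGUGCAUCUGACUCCUGAGGAGAAGUCU
print(translate(mrna))  # MVHLTPEEKS
```

The output matches the first ten residues of the beta globin protein on the same slide. In a real cell, transcription copies the template strand and eukaryotic transcripts are spliced before translation; the sketch skips both steps.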
10From a purely lab-based science to an information
science
Bioinformatics = Bio + Informatics
11From DNA to Genome
Timeline (axis 1955 to 1985)
- First protein sequence
- Watson and Crick DNA model
- First protein structure
12Timeline (axis 1990 to 2000)
- First bacterial genome (Haemophilus influenzae), 1995
- Yeast genome
- First human genome draft, 2000
13Complete Genomes
             2009   2008   2007
Total        1117    706    456
Eukaryotes    119     78     43
Bacteria      929    578    383
Archaea        69     50     29
14The post-genomics era
1117 genomes. What's next?
Annotation
Comparative genomics
Structural genomics
Functional genomics
Goal: to understand the living cell
15Annotation
CCTGACAAATTCGACGTGCGGCATTGCATGCAGACGTGCATG CGTGCAA
ATAATCAATGTGGACTTTTCTGCGATTATGGAAGAA CTTTGTTACGCGT
TTTTGTCATGGCTTTGGTCCCGCTTTGTTC AGAATGCTTTTAATAAGCG
GGGTTACCGGTTTGGTTAGCGAGA AGAGCCAGTAAAAGACGCAGTGACG
GAGATGTCTGATG CAA TAT GGA CAA TTG GTT TCT TCT CTG
AAT ...... .............. TGAAAAACGTA
16Annotation
- Identify the genes within a given sequence of DNA
- Identify the sites which regulate the gene
- Predict the function
17How do we identify a gene in a genome?
A gene is characterized by several features
(promoter, ORF); some are easier and some are harder
to detect
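One of the easier features to detect is the open reading frame (ORF). The sketch below is a simplified illustration, not a real gene finder: it scans only the forward strand, treats every ATG as a possible start, and reports the span up to the first in-frame stop codon. The function name `find_orfs`, the `min_codons` threshold, and the toy sequence are all hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end) index pairs of forward-strand ORFs.

    An ORF here runs from an ATG to the first in-frame stop codon.
    start is 0-based; end is exclusive and includes the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Count the codons before the stop, including the ATG.
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


toy = "CCATGAAATTTGGGTAACC"  # hypothetical sequence, not from the slides
for start, end in find_orfs(toy):
    print(start, end, toy[start:end])  # 2 17 ATGAAATTTGGGTAA
```

A practical gene finder would also scan the reverse strand and look for promoters and other signals, and in higher organisms it has to handle introns, which is a large part of why gene hunting there is so much harder.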
18CCTGACAAATTCGACGTGCGGCATTGCATGCAGACGTGCATG CGTGCAA
ATAATCAATGTGGACTTTTCTGCGATTATGGAAGAA CTTTGTTACGCGT
TTTTGTCATGGCTTTGGTCCCGCTTTGTTC AGAATGCTTTTAATAAGCG
GGGTTACCGGTTTGGTTAGCGAGA AGAGCCAGTAAAAGACGCAGTGACG
GAGATGTCTGATG CAA TAT GGA CAA TTG GTT TCT TCT CTG
AAT ................................. ............
.. TGAAAAACGTA
19Using Bioinformatics approaches for Gene hunting
Relatively easy in simple organisms (e.g. bacteria)
VERY HARD for higher organisms (e.g. humans)
20Comparative genomics
21Perhaps not surprising!!!
How human are chimps?
Comparison between the full drafts of the human
and chimp genomes revealed that they differ by
only 1.23%
22So where are we different?
Human  ATAGCGGGGGGATGCGGGCCCTATACCC
Chimp  ATAGGGG--GGATGCGGGCCCTATACCC
Mouse  ATAGCG---GGATGCGGCGC-TATACCA
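The differences in this three-way alignment can be located column by column. The sketch below takes the three rows from the slide (with the spaces inside the gap runs removed, so that each row has 28 columns) and reports mismatch and gap columns for a pair of rows. The helper names `compare_aligned` and `percent_identity` are illustrative, and identity is computed over ungapped columns only, which is one of several possible conventions.

```python
HUMAN = "ATAGCGGGGGGATGCGGGCCCTATACCC"
CHIMP = "ATAGGGG--GGATGCGGGCCCTATACCC"
MOUSE = "ATAGCG---GGATGCGGCGC-TATACCA"


def compare_aligned(a, b):
    """Return (mismatch_columns, gap_columns), both 1-based, for two aligned rows."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    mismatches, gaps = [], []
    for column, (x, y) in enumerate(zip(a, b), start=1):
        if "-" in (x, y):
            gaps.append(column)
        elif x != y:
            mismatches.append(column)
    return mismatches, gaps


def percent_identity(a, b):
    """Percentage of identical bases among the columns without a gap."""
    mismatches, gaps = compare_aligned(a, b)
    ungapped = len(a) - len(gaps)
    return 100.0 * (ungapped - len(mismatches)) / ungapped


print(compare_aligned(HUMAN, CHIMP))  # ([5], [8, 9])
print(compare_aligned(HUMAN, MOUSE))  # ([18, 19, 28], [7, 8, 9, 21])
print(percent_identity(HUMAN, MOUSE))  # 87.5
```

On this short fragment the human and chimp rows differ by one substitution and one two-base gap, while the mouse row differs from human at more positions, in line with the greater evolutionary distance.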
23And where are we similar?
VERY SIMILAR: conserved between many organisms
VERY DIFFERENT
24Functional genomics
25TO BE IS NOT ENOUGH
At any time point a gene can be functional or not
26From the gene expression pattern we can learn
- What does the gene do?
- When is it needed?
- What other genes or proteins interact with it?
- ...
- What's wrong?
27Structural Genomics
28The protein three-dimensional structure can tell
much more than the sequence alone
29Resources and Databases
- The different types of data are collected in databases
- Sequence databases
- Structural databases
- Databases of experimental results
- All databases are connected
30Sequence databases
- Gene database
- Genome database
- Disease related mutation database
- ...
31Genome Browsers
- Easy walk through the genome
32Genome Browsers
- UCSC Genome Browser: http://genome.ucsc.edu/
- Ensembl Genome Browser: http://www.ensembl.org
- WormBase: http://www.wormbase.org/
- AceDB: http://www.acedb.org/
- Comprehensive Microbial Resource: http://www.tigr.org/tigr-scripts/CMR2/CMRHomePage.spl
- FlyBase: http://flybase.bio.indiana.edu/
33Mutation database
- SNPs (single nucleotide polymorphisms)
- A single-base difference at a single position between two individuals of the same species
- Play an important role in differentiation and disease
34Sickle Cell Anemia
- Caused by a single A-to-T substitution, which makes the
inserted amino acid valine instead of glutamic acid in
hemoglobin
Image source: http://www.cc.nih.gov/ccc/ccnews/nov99/
35Healthy Individual
>gi|28302128|ref|NM_000518.4| Homo sapiens hemoglobin, beta (HBB), mRNA
ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGA
GGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGC
AGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATG
CTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGC
TCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGAT
CCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCA
CCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCA
CTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACT
GGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAACATTTATTTTCATTGC
>gi|4504349|ref|NP_000509.1| beta globin [Homo sapiens]
MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG
AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVAN
ALAHKYH
36Diseased Individual
>gi|28302128|ref|NM_000518.4| Homo sapiens hemoglobin, beta (HBB), mRNA
ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGT
GGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGC
AGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATG
CTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGC
TCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGAT
CCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCA
CCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCA
CTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACT
GGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAACATTTATTTTCATTGC
>gi|4504349|ref|NP_000509.1| beta globin [Homo sapiens]
MVHLTPVEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG
AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVAN
ALAHKYH
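A small sketch of how such a point mutation could be located by comparing the healthy and diseased sequences position by position. The two protein strings are the first 50 residues of the beta globin sequences on the two slides above; the helper name `point_differences` is illustrative. The same helper is then applied at the codon level: in the standard genetic code GAG encodes glutamic acid (E) and GTG encodes valine (V).

```python
# First 50 residues of beta globin from the healthy and diseased slides.
HEALTHY = "MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLS"
SICKLE = "MVHLTPVEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLS"


def point_differences(a, b):
    """Return (position, symbol_in_a, symbol_in_b) for each differing position.

    Positions are 1-based. The sequences must have equal length, so this
    detects substitutions only, not insertions or deletions.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return [
        (position, x, y)
        for position, (x, y) in enumerate(zip(a, b), start=1)
        if x != y
    ]


# Protein level: one residue differs.
print(point_differences(HEALTHY, SICKLE))  # [(7, 'E', 'V')]

# Codon level: a single A-to-T change turns GAG (E) into GTG (V).
print(point_differences("GAG", "GTG"))  # [(2, 'A', 'T')]
```

Counting the initial methionine, the substitution falls at residue 7, and a single changed base in the middle of the codon is enough to produce it.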
37Structure Databases
- 3-dimensional structures of proteins, nucleic acids, molecular complexes, etc.
- 3-D data is available due to techniques such as NMR and X-ray crystallography
39Databases of Experimental Results
- Data such as experimental microarray images: gene expression data
- Proteomic data: protein expression data
- Metabolic pathways, protein-protein interaction data, regulatory networks
- Etc.
40PubMed
Literature Databases
http://www.ncbi.nlm.nih.gov/PubMed
Service of the National Library of Medicine
41Putting it all Together
- Each database contains specific information
- Like other biological systems, these databases
are also interrelated
42PROTEIN: PIR, SWISS-PROT
DISEASE: LocusLink, OMIM, OMIA
ASSEMBLED GENOMES: GoldenPath, WormBase, TIGR
MOTIFS: BLOCKS, Pfam, Prosite
GENOMIC DATA: GenBank, DDBJ, EMBL
ESTs: dbEST, UniGene
GENES: RefSeq, AllGenes, GDB
SNPs: dbSNP
GENE EXPRESSION: Stanford MGDB, NetAffx, ArrayExpress
PATHWAY: KEGG, COG
STRUCTURE: PDB, MMDB, SCOP
LITERATURE: PubMed