Introduction to Bioinformatics 236523 - PowerPoint PPT Presentation

Description:

Introduction to Bioinformatics 236523. Lecturer: Dr. Yael Mandel-Gutfreund. Teaching Assistants: Shula Shazman, Sivan Bercovici. Course web site: http://webcourse.cs ...



1
Introduction to Bioinformatics 236523
  • Lecturer: Dr. Yael Mandel-Gutfreund
  • Teaching Assistants:
  • Shula Shazman
  • Sivan Bercovici

Course web site: http://webcourse.cs.technion.ac.il/236523
2
What is Bioinformatics?
3
Course Objectives
  • To introduce the bioinformatics discipline
  • To make the students familiar with the major
    biological questions which can be addressed by
    bioinformatics tools
  • To introduce the major tools used for sequence
    and structure analysis and explain in general
    how they work (limitations, etc.)

4
Course Structure and Requirements
  • Class structure
  • 2 hours lecture
  • 1 hour tutorial
  • Homework
  • Homework assignments will be given every second
    week
  • The homework will be done in pairs
  • 5/5 homework assignments will be submitted
  • Final project
  • A final project will be conducted and submitted
    in pairs

5
Grading
  • 20% homework assignments
  • 80% final project

6
Literature list
  • Gibas, C., Jambeck, P. Developing Bioinformatics
    Computer Skills. O'Reilly, 2001.
  • Lesk, A.M. Introduction to Bioinformatics.
    Oxford University Press, 2002.
  • Mount, D.W. Bioinformatics: Sequence and Genome
    Analysis. 2nd ed., Cold Spring Harbor Laboratory
    Press, 2004.

Advanced Reading
  • Jones, N.C., Pevzner, P.A. An Introduction to
    Bioinformatics Algorithms. MIT Press, 2004.
7
What is Bioinformatics?
8
What is Bioinformatics?
The field of science in which biology, computer
science, and information technology merge to form
a single discipline.
Ultimate goal: to enable the discovery of new
biological insights, as well as to create a global
perspective from which unifying principles in
biology can be discerned.
9
Central Paradigm in Molecular Biology
Gene (DNA) → mRNA → Protein
10
From a purely lab-based science to an information
science
Bioinformatics = Bio + Informatics
11
From DNA to Genome
[Timeline, 1955 to 1985, in 5-year steps]
  • First protein sequence
  • Watson and Crick DNA model
  • First protein structure
12
[Timeline, 1990 to 2000, in 5-year steps]
  • First bacterial genome: Haemophilus influenzae
  • Yeast genome
  • First human genome draft
13
Complete Genomes
             2009   2008   2007
Total        1117    706    456
Eukaryotes    119     78     43
Bacteria      929    578    383
Archaea        69     50     29
14
The post-genomics era
1117 genomes. What's next?
  • Annotation
  • Comparative genomics
  • Structural genomics
  • Functional genomics
Goal: to understand the living cell
15
Annotation
CCTGACAAATTCGACGTGCGGCATTGCATGCAGACGTGCATG
CGTGCAAATAATCAATGTGGACTTTTCTGCGATTATGGAAGAA
CTTTGTTACGCGTTTTTGTCATGGCTTTGGTCCCGCTTTGTTC
AGAATGCTTTTAATAAGCGGGGTTACCGGTTTGGTTAGCGAGA
AGAGCCAGTAAAAGACGCAGTGACGGAGATGTCTGATG
CAA TAT GGA CAA TTG GTT TCT TCT CTG AAT ...... TGAAAAACGTA
16
Annotation
  • Identify the genes within a given sequence of
    DNA
  • Identify the sites which regulate the gene
  • Predict the function
17
How do we identify a gene in a genome?
A gene is characterized by several features
(promoter, ORF); some are easier and some are
harder to detect.
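One of the features named above, the ORF (open reading frame), can be located with a simple scan. The following is a minimal sketch rather than a real gene finder: it assumes the forward strand only, treats ATG as the only start codon, and runs on a made-up toy sequence. The names `find_orfs` and `min_codons` are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end, frame) for each ATG...stop stretch on the forward strand.

    start/end are 0-based, end-exclusive indices into `dna`;
    the stop codon is included in the reported span.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the sequence codon by codon in this reading frame.
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


toy = "CCATGAAATTTGGGTAACC"  # hypothetical sequence, not from the slides
for start, end, frame in find_orfs(toy):
    print(frame, start, end, toy[start:end])
```

A scan like this finds candidate ORFs only. Promoters and other regulatory features, and the introns of higher organisms, need more elaborate methods.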
18
CCTGACAAATTCGACGTGCGGCATTGCATGCAGACGTGCATG
CGTGCAAATAATCAATGTGGACTTTTCTGCGATTATGGAAGAA
CTTTGTTACGCGTTTTTGTCATGGCTTTGGTCCCGCTTTGTTC
AGAATGCTTTTAATAAGCGGGGTTACCGGTTTGGTTAGCGAGA
AGAGCCAGTAAAAGACGCAGTGACGGAGATGTCTGATG
CAA TAT GGA CAA TTG GTT TCT TCT CTG AAT ...... TGAAAAACGTA
19
Using bioinformatics approaches for gene hunting
Relatively easy in simple organisms (e.g. bacteria)
VERY HARD for higher organisms (e.g. humans)
20
Comparative genomics
21
How human are chimps?
Comparison between the full drafts of the human
and chimp genomes revealed that they differ by
only 1.23%.
Perhaps not surprising!!!
22
So where are we different?
Human  ATAGCGGGGGGATGCGGGCCCTATACCC
Chimp  ATAGGGG--GGATGCGGGCCCTATACCC
Mouse  ATAGCG---GGATGCGGCGC-TATACCA
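The comparison on this slide can be made concrete by counting the alignment columns in which two species differ. This is a minimal sketch using the three aligned sequences above with the spacing inside the gaps closed up. It assumes the rows are already aligned to equal length, and the helper name `count_differences` is illustrative.

```python
from itertools import combinations

# Aligned sequences from the slide; "-" marks a gap.
alignment = {
    "Human": "ATAGCGGGGGGATGCGGGCCCTATACCC",
    "Chimp": "ATAGGGG--GGATGCGGGCCCTATACCC",
    "Mouse": "ATAGCG---GGATGCGGCGC-TATACCA",
}


def count_differences(a, b):
    """Count alignment columns where the two rows differ (a gap counts as a difference)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


for (name1, seq1), (name2, seq2) in combinations(alignment.items(), 2):
    diffs = count_differences(seq1, seq2)
    print(f"{name1} vs {name2}: {diffs} of {len(seq1)} columns differ")
```

On this short fragment the human and chimp rows differ in fewer columns than the human and mouse rows, which is the pattern the slide illustrates. A fragment this small says nothing about genome-wide divergence.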
23
  • And where are we similar?

VERY SIMILAR: conserved between many organisms
VERY DIFFERENT
24
Functional genomics
25
TO BE IS NOT ENOUGH
At any time point a gene can be functional or not
26
From the gene expression pattern we can learn:
  • What does the gene do?
  • When is it needed?
  • What other genes or proteins interact with it?
  • ...
  • What's wrong?
27
Structural Genomics
28
The protein's three-dimensional structure can tell
much more than the sequence alone
29
Resources and Databases
  • The different types of data are collected in
    databases
  • Sequence databases
  • Structural databases
  • Databases of Experimental Results
  • All databases are connected

30
Sequence databases
  • Gene database
  • Genome database
  • Disease related mutation database
  • ...

31
Genome Browsers
  • Easy walk through the genome

32
Genome Browsers
  • UCSC Genome Browser: http://genome.ucsc.edu/
  • Ensembl Genome Browser: http://www.ensembl.org
  • WormBase: http://www.wormbase.org/
  • AceDB: http://www.acedb.org/
  • Comprehensive Microbial Resource:
    http://www.tigr.org/tigr-scripts/CMR2/CMRHomePage.spl
  • FlyBase: http://flybase.bio.indiana.edu/

33
Mutation database
  • A single-base difference at a single position
    between two individuals of the same species
  • Such differences play an important role in
    differentiation and disease

34
Sickle Cell Anemia
  • Due to a single base swap of an A for a T, causing
    the inserted amino acid to be valine instead of
    glutamic acid in hemoglobin
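How one base change alters the protein can be sketched by translating the start of the beta-globin coding sequence with and without the A to T swap. This is an illustrative sketch only: the 27 bases are the first nine codons of the HBB coding region, the codon table is the standard genetic code, and the names `translate`, `healthy` and `sickle` are assumptions made for this example.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a coding DNA sequence codon by codon, stopping at a stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


healthy = "ATGGTGCATCTGACTCCTGAGGAGAAG"          # first nine codons of HBB
sickle = healthy[:19] + "T" + healthy[20:]      # A -> T in the seventh codon (GAG -> GTG)

print(translate(healthy))
print(translate(sickle))
```

The seventh codon changes from GAG (glutamic acid, E) to GTG (valine, V), while every other codon is untouched.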

Image source: http://www.cc.nih.gov/ccc/ccnews/nov99/
35
Healthy Individual
>gi|28302128|ref|NM_000518.4| Homo sapiens hemoglobin, beta (HBB), mRNA
ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGA
GGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGC
AGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATG
CTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGC
TCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGAT
CCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCA
CCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCA
CTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACT
GGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAACATTTATTTTCATTGC
>gi|4504349|ref|NP_000509.1| beta globin [Homo sapiens]
MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG
AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVAN
ALAHKYH
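The two records above are in FASTA format: a header line beginning with ">" followed by one or more lines of sequence. A minimal reader for such records might look like the sketch below. The function name `parse_fasta` and the toy records are hypothetical and serve only to show the format.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping header -> sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; drop the leading ">".
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence lines belong to the most recent header.
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


# Toy input, not real database entries.
example = """>toy_mRNA hypothetical record
ACATTTGCTT
CTGACACAAC
>toy_protein hypothetical record
MVHLTPEEK
"""

records = parse_fasta(example)
for header, sequence in records.items():
    print(header, len(sequence))
```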

36
Diseased Individual
>gi|28302128|ref|NM_000518.4| Homo sapiens hemoglobin, beta (HBB), mRNA
ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGT
GGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGC
AGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATG
CTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGC
TCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGAT
CCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCA
CCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCA
CTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACT
GGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAACATTTATTTTCATTGC
>gi|4504349|ref|NP_000509.1| beta globin [Homo sapiens]
MVHLTPVEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG
AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVAN
ALAHKYH
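The difference between the healthy and the diseased protein is hard to spot by eye, but a position-by-position comparison finds it at once. This is a minimal sketch using the first ten residues of the two beta-globin sequences shown on these slides. The function name `substitutions` is illustrative, and the comparison assumes equal-length sequences with no insertions or deletions.

```python
def substitutions(reference, variant):
    """List (position, reference_residue, variant_residue) for each mismatch.

    Positions are 1-based; both sequences must have the same length.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    return [
        (pos, ref, var)
        for pos, (ref, var) in enumerate(zip(reference, variant), start=1)
        if ref != var
    ]


healthy_protein = "MVHLTPEEKS"   # first ten residues, healthy individual
diseased_protein = "MVHLTPVEKS"  # first ten residues, diseased individual

for pos, ref, var in substitutions(healthy_protein, diseased_protein):
    print(f"position {pos}: {ref} -> {var}")
```

The only change is E (glutamic acid) to V (valine) at position 7, counting the initial methionine as position 1.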

37
Structure Databases
  • 3-dimensional structures of proteins, nucleic
    acids, molecular complexes, etc.
  • 3D data are available thanks to techniques such
    as NMR and X-ray crystallography

39
Databases of Experimental Results
  • Experimental microarray images: gene expression
    data
  • Proteomic data: protein expression data
  • Metabolic pathways, protein-protein interaction
    data, regulatory networks
  • etc.

40
Literature Databases
PubMed
http://www.ncbi.nlm.nih.gov/PubMed
Service of the National Library of Medicine
41
Putting it all Together
  • Each Database contains specific information
  • Like other biological systems, these databases
    are also interrelated

42
PROTEIN: PIR, SWISS-PROT
DISEASE: LocusLink, OMIM, OMIA
ASSEMBLED GENOMES: GoldenPath, WormBase, TIGR
MOTIFS: BLOCKS, Pfam, Prosite
GENOMIC DATA: GenBank, DDBJ, EMBL
ESTs: dbEST, UniGene
GENES: RefSeq, AllGenes, GDB
SNPs: dbSNP
GENE EXPRESSION: Stanford MGDB, NetAffx, ArrayExpress
PATHWAY: KEGG, COG
STRUCTURE: PDB, MMDB, SCOP
LITERATURE: PubMed