Introduction to Bioinformatics - PowerPoint PPT Presentation

1
Introduction to Bioinformatics
  • Lecturer: Dr. Yael Mandel-Gutfreund
  • Teaching Assistants:
  • Oleg Rokhlenko
  • Ydo Wexler

http://webcourse.cs.technion.ac.il/236523
2
What is Bioinformatics?
3
Course Objectives
  • To introduce the bioinformatics discipline
  • To familiarize students with the major
    biological questions that can be addressed by
    bioinformatics tools
  • To introduce the major tools used for sequence
    and structure analysis and explain in general
    how they work (including their limitations)

4
Course Structure and Requirements
  • Class Structure
  • Each class (except the first one) will be divided
    into two parts:
  • Lecture (in lecture room)
  • A Training Lab (in computer lab)
  • For the Training Lab the class will be divided into
    two groups.
  • Each group will meet every second week,
    starting from the second week.
  • The work in the Training Labs will be in pairs.
  • Lab assignments will be submitted at the end of
    each lab.
  • Preparing yourself for the lab: a tutorial,
    including home exercises and their answers,
    will be posted on the web a week before the lab.
  • A final home exam

5
Grading
  • 30% lab assignments
  • 70% final exam

6
Literature list
  • Gibas, C., Jambeck, P. Developing Bioinformatics
    Computer Skills. O'Reilly, 2001.
  • Lesk, A. M. Introduction to Bioinformatics.
    Oxford University Press, 2002.
  • Mount, D.W. Bioinformatics: Sequence and Genome
    Analysis. 2nd ed., Cold Spring Harbor Laboratory
    Press, 2004.

Advanced Reading
  • Jones, N.C., Pevzner, P.A. An Introduction to
    Bioinformatics Algorithms. MIT Press, 2004.
7
Course syllabus
8
What is Bioinformatics?
9
What is Bioinformatics?
The field of science in which biology, computer
science, and information technology merge to form
a single discipline.
Ultimate goal: to enable the discovery of new
biological insights and to create a global
perspective from which unifying principles in
biology can be discerned.
10
From a purely lab-based science to an information
science
Bioinformatics = Bio + Informatics
11
Central Paradigm in Molecular Biology
Gene (DNA) → mRNA → Protein
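The flow from gene to mRNA to protein can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the input is treated as the coding strand, the codon table is deliberately cut down to the codons that occur in the example, and the stop codon appended to the sequence is added for demonstration. The sequence itself is the first ten codons of the human beta globin mRNA shown later in this presentation.

```python
# Reduced codon table: only the codons needed for this example
# (an assumption for brevity; the full genetic code has 64 codons).
CODON_TABLE = {
    "AUG": "M", "GUG": "V", "CAU": "H", "CUG": "L", "ACU": "T",
    "CCU": "P", "GAG": "E", "AAG": "K", "UCU": "S", "UAA": "*",
}


def transcribe(dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate codon by codon until a stop codon or the end of the sequence."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# First ten codons of the HBB coding sequence, plus an illustrative stop codon
gene = "ATGGTGCATCTGACTCCTGAGGAGAAGTCT" + "TAA"
mrna = transcribe(gene)
protein = translate(mrna)
print(mrna)     # AUGGUGCAUCUGACUCCUGAGGAGAAGUCUUAA
print(protein)  # MVHLTPEEKS
```

The result matches the first ten residues of the beta globin protein listed on the "Healthy Individual" slide.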
12
Genome
  • Chromosomal DNA of an organism
  • Coding and non-coding DNA
  • Genome size and number of genes do not
    necessarily determine organism complexity

13
Transcriptome
  • Complete collection of all possible mRNAs
    (including splice variants) of an organism.
  • Regions of an organism's genome that get
    transcribed into messenger RNA.
  • Transcriptome can be extended to include all
    transcribed elements, including non-coding RNAs
    used for structural and regulatory purposes.

14
Proteome
  • The complete collection of proteins that can be
    produced by an organism.
  • Can be studied either as a static entity (the sum
    of all possible proteins) or as a dynamic entity
    (all proteins found at a specific time point)

15
From DNA to Genome
[Timeline figure, 1955 to 1985: first protein
sequence; Watson and Crick DNA model; first protein
structure]
16
[Timeline figure, 1990 to 2000: first bacterial
genome, Haemophilus influenzae (1995); yeast genome;
first human genome draft]
17
The Human Genome Project
  • Initiated in 1986; completed in 2003
  • Project goals were to:
  • identify all the genes in human DNA,
  • determine the sequences of the 3 billion chemical
    base pairs that make up human DNA,
  • store this information in databases,
  • improve tools for data analysis and develop new
    tools
  • address the ethical, legal, and social issues
    that may arise from the project.

18
Human Genome Project
[Timeline figure, 1985 to 2000s, events in order:]
US Department of Energy announces project
International Human Genome Organization founded
Low-resolution linkage map published
Celera Genomics founded
First working drafts published
Project successfully completed
19
The Human Genome Project
  • Initiated in 1986; completed in 2003
  • How did we do?
  • identify all the genes in human DNA ? ?
  • determine the sequences of the 3 billion chemical
    base pairs that make up human DNA ? ? ?
  • store this information in databases ? ? ?
  • improve tools for data analysis and develop new
    tools ? ? ?
  • address the ethical, legal, and social issues
    that may arise from the project ?

20
What makes us human?
21
How human are chimps?
Perhaps not surprising! A comparison between the
full drafts of the human and chimp genomes revealed
that they differ by only 1.23%.
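A figure such as this is a percentage of differing positions between aligned sequences. The sketch below shows the basic arithmetic on two short toy sequences that are invented for illustration; it assumes the sequences are already aligned and of equal length, and it ignores insertions and deletions, which a real genome comparison has to handle.

```python
def percent_difference(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return 100.0 * mismatches / len(seq_a)


# Toy sequences, invented for illustration only (not real genomic data)
human_like = "ACGTACGTACGTACGTACGT"
chimp_like = "ACGTACGTACCTACGTACGT"
print(percent_difference(human_like, chimp_like))  # 5.0
```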
22
Complete Genomes
  • 1994: 0
  • 1995: 1
  • 2004: 234
  • 2005: 303 (eukaryotes: 24, bacteria: 240,
    archaea: 39)

23
The post-genomics era
What's Next?
Annotation
Comparative genomics
Structural genomics
Functional genomics
Goal: to understand the functional networks of a
living cell
24
Open reading frames
Functional sites
Annotation
Structure, function
25
CCTGACAAATTCGACGTGCGGCATTGCATGCAGACGTGCATG CGTGCAA
ATAATCAATGTGGACTTTTCTGCGATTATGGAAGAA CTTTGTTACGCGT
TTTTGTCATGGCTTTGGTCCCGCTTTGTTC AGAATGCTTTTAATAAGCG
GGGTTACCGGTTTGGTTAGCGAGA AGAGCCAGTAAAAGACGCAGTGACG
GAGATGTCTGATG CAA TAT GGA CAA TTG GTT TCT TCT CTG
AAT ...... .............. TGAAAAACGTA
26
CCTGACAAATTCGACGTGCGGCATTGCATGCAGACGTGCATG CGTGCAA
ATAATCAATGTGGACTTTTCTGCGATTATGGAAGAA CTTTGTTACGCGT
TTTTGTCATGGCTTTGGTCCCGCTTTGTTC AGAATGCTTTTAATAAGCG
GGGTTACCGGTTTGGTTAGCGAGA AGAGCCAGTAAAAGACGCAGTGACG
GAGATGTCTGATG CAA TAT GGA CAA TTG GTT TCT TCT CTG
AAT ................................. ............
.. TGAAAAACGTA
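Locating an open reading frame such as the one highlighted above, from an ATG start codon to an in-frame stop codon, can be sketched as follows. This is a minimal illustration, not a gene finder: it scans only the forward strand, the `min_codons` threshold is an arbitrary assumption, and the test sequence is a shortened toy construct that borrows the first codons after the ATG from the slide.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3):
    """Return (start, end, sequence) for ORFs on the forward strand.

    An ORF runs from an ATG to the first in-frame stop codon (inclusive).
    `start` is 0-based and `end` is exclusive. ORFs with fewer than
    `min_codons` codons before the stop codon are skipped.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None  # look for the next ORF in this frame
    return orfs


# Toy sequence: two flanking bases, then ATG ... TGA, then two more bases
toy = "CCATGCAATATGGACAATTGTGAAA"
print(find_orfs(toy))  # [(2, 23, 'ATGCAATATGGACAATTGTGA')]
```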
27
Comparative genomics
28
Chimps and Us
29
Comparative genomics
30
  • Researchers have learned a great deal about the
    function of human genes by examining their
    counterparts in simpler model organisms such as
    the mouse.

Conservation of IGFALS (insulin-like growth factor
binding protein, acid-labile subunit) between human
and mouse.
31
Functional genomics
32
Understanding the function of genes and other
parts of the genome
33
Functional genomics
34
A network of interactions can be built for all
proteins in an organism.
Example: a large network of 8184 interactions among
4140 S. cerevisiae proteins
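One simple way to represent such a network in code is an undirected adjacency map, from which the number of partners per protein (its degree) follows directly. The sketch below uses hypothetical placeholder protein names; only the totals of 8184 interactions and 4140 proteins come from the slide.

```python
from collections import defaultdict


def build_network(interactions):
    """Build an undirected adjacency map from (protein_a, protein_b) pairs."""
    network = defaultdict(set)
    for a, b in interactions:
        network[a].add(b)
        network[b].add(a)
    return network


def average_degree(n_proteins: int, n_interactions: int) -> float:
    """Each interaction adds one to the degree of two proteins."""
    return 2 * n_interactions / n_proteins


# Hypothetical toy interactions; the protein names are placeholders
toy = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4")]
net = build_network(toy)
hub = max(net, key=lambda p: len(net[p]))
print(hub, len(net[hub]))                    # P3 3
print(round(average_degree(4140, 8184), 2))  # 3.95
```

With the numbers on the slide, each yeast protein in the network has on average about four interaction partners.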
35
Structural genomics
Assign structure to all proteins encoded in a
genome
36
Protein Structure
37
Resources and Databases
  • The different types of data are collected in
    databases
  • Sequence databases
  • Structural databases
  • Databases of Experimental Results
  • All databases are connected

38
Database Types
Sequence databases
  • General: GenBank, EMBL, PIR, Swiss-Prot
  • Special: TF binding sites, promoters
  • Genomes
Structure databases
  • General: PDB
  • Special: specific protein families, folds
Databases of experimental results
  • Co-expressed genes, protein-protein
    interactions, etc.
39
Sequence databases
  • Gene database
  • Genome database
  • SNPs database
  • Disease related mutation database

40
What can we learn about a Gene
41
mRNA, full length, EST
42
EST
  • Expressed Sequence Tags
  • Partial copies of mRNA found within a particular
    cell
  • Can be used to identify genic regions, splicing
    patterns of genes, etc.

43
Different transcripts can be related to the same
gene!
44
Gene database
  • Gives information on gene functionality
  • Alternative splicing of genes
  • Alternative patterns of exons included to create
    the gene product
  • EST

45
Genome Databases
  • Data organized by species
  • Clones assembled into contiguous pieces (contigs)
    or whole chromosomes
  • Information on non-coding regions
  • Relativity

46
Genome Browsers
  • Annotation adds value to sequence
  • Easy walk through the genome
  • Comparative genomics

47
Genome Browsers
  • Ensembl Genome Browser http://www.ensembl.org
  • UCSC Genome Browser http://genome.ucsc.edu/
  • WormBase http://www.wormbase.org/
  • AceDB http://www.acedb.org/
  • Comprehensive Microbial Resource
    http://www.tigr.org/tigr-scripts/CMR2/CMRHomePage.spl
  • FlyBase http://flybase.bio.indiana.edu/

48
beta globin
49
(No Transcript)
50
RefSeq
  • Set of mRNA sequences curated at NCBI
  • Many experimentally validated
  • Some partially validated via ESTs
  • Some computationally predicted

51
(No Transcript)
52
(No Transcript)
53
(No Transcript)
54
(No Transcript)
55
(No Transcript)
56
SNP database
  • Single Nucleotide Polymorphisms (SNPs)
  • A single base difference at a single position
    between two individuals of the same species
  • Play an important role in differentiation and
    disease

57
Sickle Cell Anemia
  • Due to a single nucleotide substitution (A to T),
    which causes valine to be incorporated instead of
    glutamic acid in hemoglobin

Image source: http://www.cc.nih.gov/ccc/ccnews/nov99/
58
Healthy Individual
  • >gi|28302128|ref|NM_000518.4| Homo sapiens
    hemoglobin, beta (HBB), mRNA
  • ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACC
    ATGGTGCATCTGACTCCTGA
  • GGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAG
    TTGGTGGTGAGGCCCTGGGC
  • AGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGG
    GGATCTGTCCACTCCTGATG
  • CTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGT
    GCCTTTAGTGATGGCCTGGC
  • TCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACT
    GTGACAAGCTGCACGTGGAT
  • CCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCA
    TCACTTTGGCAAAGAATTCA
  • CCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAAT
    GCCCTGGCCCACAAGTATCA
  • CTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCC
    CTAAGTCCAACTACTAAACT
  • GGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAA
    CATTTATTTTCATTGC
  • >gi|4504349|ref|NP_000509.1| beta globin [Homo
    sapiens]
  • MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLS
    TPDAVMGNPKVKAHGKKVLG
  • AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFG
    KEFTPPVQAAYQKVVAGVAN
  • ALAHKYH

59
Diseased Individual
  • >gi|28302128|ref|NM_000518.4| Homo sapiens
    hemoglobin, beta (HBB), mRNA
  • ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACC
    ATGGTGCATCTGACTCCTGT
  • GGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAG
    TTGGTGGTGAGGCCCTGGGC
  • AGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGG
    GGATCTGTCCACTCCTGATG
  • CTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGT
    GCCTTTAGTGATGGCCTGGC
  • TCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACT
    GTGACAAGCTGCACGTGGAT
  • CCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCA
    TCACTTTGGCAAAGAATTCA
  • CCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAAT
    GCCCTGGCCCACAAGTATCA
  • CTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCC
    CTAAGTCCAACTACTAAACT
  • GGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAA
    CATTTATTTTCATTGC
  • >gi|4504349|ref|NP_000509.1| beta globin [Homo
    sapiens]
  • MVHLTPVEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLS
    TPDAVMGNPKVKAHGKKVLG
  • AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFG
    KEFTPPVQAAYQKVVAGVAN
  • ALAHKYH
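Finding the single difference between two such sequences by eye is tedious, so a position-by-position comparison is a natural first script. The sketch below uses only the first 30 coding nucleotides of HBB, starting at the ATG, and the well-known sickle cell substitution (A to T at coding position 20, changing codon GAG to GTG). It assumes the two sequences are already aligned and of equal length.

```python
def find_snps(reference: str, sample: str):
    """Return (position, ref_base, sample_base) per mismatch; positions are 1-based."""
    if len(reference) != len(sample):
        raise ValueError("sequences must have the same length")
    return [
        (i + 1, r, s)
        for i, (r, s) in enumerate(zip(reference, sample))
        if r != s
    ]


# First 30 coding nucleotides of HBB, counted from the ATG start codon
healthy = "ATGGTGCATCTGACTCCTGAGGAGAAGTCT"
sickle = "ATGGTGCATCTGACTCCTGTGGAGAAGTCT"

for pos, ref, alt in find_snps(healthy, sickle):
    codon_start = (pos - 1) // 3 * 3  # 0-based start of the affected codon
    print(pos, ref, alt,
          healthy[codon_start:codon_start + 3], "->",
          sickle[codon_start:codon_start + 3])
# 20 A T GAG -> GTG
```

GAG encodes glutamic acid and GTG encodes valine, which is the change visible in the two protein sequences (MVHLTPEEKS versus MVHLTPVEKS).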

60
Disease Databases
  • Genes are involved in disease
  • Many diseases are well studied
  • Description of diseases and what is known about
    them is stored

OMIM - Online Mendelian Inheritance in Man
61
(No Transcript)
62
Structure Databases
  • 3-dimensional structures of proteins, nucleic
    acids, molecular complexes, etc.
  • 3D data is available thanks to techniques such as
    NMR and X-ray crystallography

63
(No Transcript)
64
(No Transcript)
65
Databases of Experimental Results
  • Data such as experimental microarray images and
    expression data
  • Clustering information
  • Metabolic pathways, protein-protein interaction
    data

66
PubMed
Literature Databases
http://www.ncbi.nlm.nih.gov/PubMed
Service of the National Library of Medicine
  • MEDLINE publication database
  • Over 17,000 journals
  • 15 million citations since 1950

67
Putting it All Together
  • Each Database contains specific information
  • Like biological systems, these databases are
    interrelated

68
PROTEIN: PIR, SWISS-PROT
DISEASE: LocusLink, OMIM, OMIA
ASSEMBLED GENOMES: GoldenPath, WormBase, TIGR
MOTIFS: BLOCKS, Pfam, Prosite
GENOMIC DATA: GenBank, DDBJ, EMBL
ESTs: dbEST, UniGene
GENES: RefSeq, AllGenes, GDB
SNPs: dbSNP
GENE EXPRESSION: Stanford MGDB, NetAffx, ArrayExpress
PATHWAY: KEGG, COG
STRUCTURE: PDB, MMDB, SCOP
LITERATURE: PubMed
69
Entrez NCBI Engine
  • Entrez is the integrated, text-based search and
    retrieval system used at NCBI for the major
    databases, including PubMed, Nucleotide and
    Protein Sequences, Protein Structures, Complete
    Genomes, Taxonomy, and others.

http://www.ncbi.nlm.nih.gov/gquery/gquery.fcgi?itool=toolbar
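Entrez can also be queried from a program. The sketch below builds request URLs for the NCBI E-utilities, the programmatic interface to Entrez; this interface is not mentioned in the slides and is brought in here as an assumption about how one might script the same searches. The helper names are hypothetical, and the code only constructs URLs without sending any request.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """URL for an ESearch query: find record IDs in `db` that match `term`."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?" + urlencode(params)


def efetch_fasta_url(db: str, record_id: str) -> str:
    """URL for an EFetch query: retrieve one record as plain-text FASTA."""
    params = {"db": db, "id": record_id, "rettype": "fasta", "retmode": "text"}
    return f"{EUTILS_BASE}/efetch.fcgi?" + urlencode(params)


print(esearch_url("pubmed", "sickle cell anemia"))
print(efetch_fasta_url("nucleotide", "NM_000518"))
```

The resulting URLs can be opened in a browser or fetched with any HTTP client; NCBI asks automated users to keep request rates low.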
70
Entrez NCBI Engine
71
  • General Bioinformatic Webpages
  • USA National Center for Biotechnology
    Information www.ncbi.nlm.nih.gov
  • European Bioinformatics Institute www.ebi.ac.uk
  • ExPASy Molecular Biology Server www.expasy.org
  • Israeli National Node inn.org.il

http://www.agr.kuleuven.ac.be/vakken/i287/bioinformatica.htm