Title: Computational Methods In Molecular Biology CS-67693, Spring 2005
1Computational Methods In Molecular Biology
CS-67693, Spring 2005
- School of Computer Science & Engineering
- Hebrew University, Jerusalem
2Class 1 Introduction
3Introduction
- What is Comp. Bio.? Why is it great?
- What are the aims and basic concepts of this course
- High-level biological review: give basic bio background and motivation for tasks handled in the course
- Administration
4The Cell
5Example: Tissues in Stomach
6DNA Components
- Four nucleotide types
- Adenine
- Guanine
- Cytosine
- Thymine
- Hydrogen bonds
- A-T
- C-G
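The pairing rules above (A-T, C-G) are enough to derive one strand of the double helix from the other. A minimal Python sketch, assuming uppercase strings over A, C, G, T; the names `PAIR`, `complement` and `reverse_complement` are illustrative and do not come from the slides:

```python
# Watson-Crick pairing from the slide: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(PAIR[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Return the opposite strand, read in its own 5'->3' direction."""
    return complement(strand)[::-1]


print(complement("ATGGAC"))          # TACCTG
print(reverse_complement("ATGGAC"))  # GTCCAT
```

Because the two strands run in opposite directions, the reverse complement is usually the form needed when comparing a sequence against the other strand.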
7The Double Helix
Source: Alberts et al.
8DNA Organization
Source: Alberts et al.
9Genome Sizes
- E. coli (bacteria): 4.6 x 10^6 bases
- Yeast (simple fungi): 15 x 10^6 bases
- Smallest human chromosome: 50 x 10^6 bases
- Entire human genome: 3 x 10^9 bases
10Related Computational Tasks
- Need a way to reconstruct DNA sequence from fragments: a major contribution of comp. bio.!
- Related: sequence comparison, sequence alignment
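To make the reconstruction task concrete, here is a rough sketch of a greedy overlap-merge heuristic in Python. It assumes error-free fragments that all come from the same strand; the function names `overlap` and `greedy_assemble` are hypothetical. Real assemblers must also cope with sequencing errors, repeats and reverse-strand reads, and the greedy rule is not guaranteed to find the shortest reconstruction.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: list[str]) -> str:
    """Repeatedly merge the two fragments with the largest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    # Drop fragments that are fully contained in another fragment.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        # Join the best pair, writing the shared overlap only once.
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags[0] if frags else ""


reads = ["ATGGACAAA", "ACAAATTAGT", "TTAGTCGTG"]
print(greedy_assemble(reads))  # ATGGACAAATTAGTCGTG
```

The suffix-prefix overlap computed here is itself a small sequence comparison, which is why the slide lists comparison and alignment as related tasks.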
11DNA Duplication
Source: Mathews & van Holde
12Genes
- The DNA strings include
- Coding regions (genes)
- E. coli has 4,000 genes
- Yeast has 6,000 genes
- C. elegans has 13,000 genes
- Humans have 32,000 genes
- Control regions
- These typically are adjacent to the genes
- They determine when a gene should be expressed
- Junk DNA (unknown function)
13The Tree of Life
Source: Alberts et al.
14Evolution
- Related organisms have similar DNA
- Similarity in sequences of proteins
- Similarity in organization of genes along the chromosomes
- Evolution plays a major role in biology
- Many mechanisms are shared across a wide range of organisms (e.g. orthologs)
- During the course of evolution existing components are adapted for new functions (e.g. paralogs)
15Evolution
- Evolution of new organisms is driven by
- Diversity
- Different individuals carry different variants of the same basic blueprint
- Mutations
- The DNA sequence can be changed due to single base changes, deletion/insertion of DNA segments, etc.
- Selection bias
16Related Computational Tasks
- Phylogeny: not just theory!
- Rebuild the tree of life
- Infer relations between genes/pathways etc. across species
- Learn models for changes and development
- Major benefit: exploit the information we do have/observe to infer about the systems on which we have very little knowledge and observations.
17How Do Genes Code for Proteins?
DNA
18Transcription
- Coding sequences can be transcribed to RNA
- RNA nucleotides
- Similar to DNA, slightly different backbone
- Uracil (U) instead of Thymine (T)
Source: Mathews & van Holde
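At the sequence level, transcription can be sketched in one line of Python: the mRNA carries the same sequence as the coding (sense) strand of the gene, with U in place of T. The function name `transcribe` is illustrative, and the sketch ignores the backbone chemistry and the fact that the polymerase actually reads the complementary template strand.

```python
def transcribe(coding_strand: str) -> str:
    """Return the mRNA matching a DNA coding strand: same bases, U instead of T."""
    return coding_strand.upper().replace("T", "U")


print(transcribe("ATGGACAAATTAG"))  # AUGGACAAAUUAG
```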
19RNA Editing
20Translation
21Translation
- Translation is mediated by the ribosome
- Ribosome is a complex of protein and rRNA molecules
- The ribosome attaches to the mRNA at a translation initiation site
- Then the ribosome moves along the mRNA sequence and in the process constructs a poly-peptide
- When the ribosome encounters a stop signal, it releases the mRNA. The constructed poly-peptide is released, and folds into a protein.
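The steps above (attach at an initiation site, read the mRNA codon by codon, stop at a stop signal) can be mimicked in a few lines of Python. This is a simplified sketch: it assumes translation starts at the first AUG it finds, which glosses over how real initiation sites are recognized, and it uses the standard genetic code written as a 64-letter string with codons ordered UUU, UUC, UUA, UUG, UCU, ... and `*` marking stop codons. The names `CODON_TABLE` and `translate` are illustrative.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code; one letter per codon in the order UUU, UUC, UUA, UUG, UCU, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first in-frame stop codon."""
    start = mrna.find("AUG")  # simplifying assumption: first AUG is the initiation site
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop signal: the poly-peptide is released
            break
        peptide.append(amino_acid)
    return "".join(peptide)


print(translate("AUGGACAAAUUAGUCGUGUAA"))  # MDKLVV
```

The result is a string of one-letter amino-acid codes; folding that chain into a working protein is a separate problem.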
22Translation
Source: Alberts et al.
23Translation
Source: Alberts et al.
24Translation
Source: Alberts et al.
25Translation
Source: Alberts et al.
26Translation
Source: Alberts et al.
27Genetic Code
28The Central Dogma
29Eukaryotic Transcription Regulation
[Figure: promoter and gene diagram. Labels: TF, Basal TFs, RNA polymerase II, Promoter, Gene, Transcription start site, mRNA, 5' and 3' ends]
- Classical Model
- Composition of promoter region determines rate of transcription initiation
- Combinations of TFs control the transcription of gene sets under specific conditions
30From Data to Model
>YKL112W Chr 11
ATGGACAAATTAGTCGTGAATTATTATGAATACA
AGCACCCTATAATTAATAAAGACCTGGCCATTGGAGCCCATGGAGGCAAA
AAATTTCCCACCTTGGGTGCTTGGTATGATGTAATTAATGAGTACGAATT
TCAGACGCGTTGCCCTATTATTTTAAAGAATTCGCATAGGAACAAACATT
TTACATTTGCCTGTCATTTGAAAAACTGTCCATTTAAAGTCTTGCTAAGC
TATGCTGGCAATGCTGCATCCTCAGAAACCTCATCTCCTTCTGCAAATAA
TAATACCAACCCTCCGGGTACTCCTGATCATATTCATCATCATAGCAACA
ACATGAACAACGAGGACAATGATAATAACAATGGCAGTAATAATAAGGTT
AGCAATGACAGTAAACTTGACTTCGTTACTGATGATCTTGAATACCATCT
GGCGAACACTCATCCGGACGACACCAATGACAAAGTGGAGTCGAGAAGCA
ATGAGGTGAATGGGAACAATGACGATGATGCTGATGCCAACAACATTTTT
AAACAGCAAGGTGTTACTATCAAGAACGACACTGAAGATGATTCGATAAA
TAAGGCCTCTAT
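Sequence records like the one above are commonly distributed in FASTA format, where a header line starting with `>` is followed by one or more lines of sequence. The sketch below is a minimal reader plus one very simple summary statistic; the helper names `parse_fasta` and `gc_fraction` are hypothetical, and the example uses only the first two sequence lines of the record.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Map each FASTA header (without the leading '>') to its joined sequence."""
    records: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


example = """>YKL112W Chr 11
ATGGACAAATTAGTCGTGAATTATTATGAATACA
AGCACCCTATAATTAATAAAGACCTGGCCATTGGAGCCCATGGAGGCAAA
"""
sequence = parse_fasta(example)["YKL112W Chr 11"]
print(len(sequence), round(gc_fraction(sequence), 2))
```

Base composition is about the simplest model one can fit to such data; the models later in the course are richer but start from the same raw strings.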
31Many Related Computational Tasks
- Information is in the code book?
- How is alternative splicing determined, and where?
- Build models for regulation of genes at different levels of complexity
- Relate genotype and phenotype: What are the expression patterns of some disease? How do they relate to sequence? What model can explain the observations? Can we predict phenomena based on our models?
32Who came first?
- Chicken or egg?
- Egg
- DNA or Protein?
- RNA
- Thomas Cech & Sidney Altman ('80s!)
- RNA as an independent molecule
- Probably closer to the ancient source
33RNA roles
- Messenger RNA (mRNA)
- Encodes protein sequences
- Transfer RNA (tRNA)
- Adaptor between mRNA molecules and amino-acids (protein building blocks)
- Ribosomal RNA (rRNA)
- Part of the ribosome, a machine for translating mRNA to proteins
- ...
34Transfer RNA
- Anticodon
- matches a codon (triplet of mRNA nucleotides)
- Attachment site
- matches a specific amino-acid
35Related Computational Tasks
- RNA secondary structure prediction
- based on CFG and CM
- RNA coding area prediction
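The slide names grammar-based methods (CFG and CM) for secondary structure prediction. The sketch below is not one of those. It is the simpler Nussinov base-pair maximization recursion, shown only to give a feel for the dynamic programming involved. It assumes an uppercase RNA string, allows G-U wobble pairs alongside A-U and G-C, and requires at least `min_loop` unpaired bases inside any hairpin; the function names and the default of 3 are choices made for this sketch.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(a: str, b: str) -> bool:
    """True if the two bases can form a Watson-Crick or G-U wobble pair."""
    return (a, b) in PAIRS


def nussinov(rna: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs in `rna`."""
    n = len(rna)
    if n == 0:
        return 0
    # best[i][j] = max number of pairs within rna[i..j]; short spans stay 0.
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: base j is left unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with base k
                if can_pair(rna[k], rna[j]):
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]


print(nussinov("GGGAAAUCC"))  # 3
```

Counting pairs is a crude score. Grammar-based methods replace the count with probabilities, but they fill in a table over subsequences in a similar way.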
36RNA Editing
Source: Mathews & van Holde
37Translation
38How do Proteins Perform their Roles?
- Proteins interact in various ways
- Change conformations, conformation -> function
- Major Issues
- Their active/functional areas which interact
- Their 3D structure
39Protein Structure
- Proteins are poly-peptides of 70-3000 amino-acids
- The structure of a protein is (mostly) determined by the
sequence of amino-acids that make up the protein
40Protein Structure
41Related Computational Tasks
- Protein 2D, 3D structure prediction
- Identify sequence motifs/domains in proteins
- Sequence similarity vs. functional similarity
42Course Goals
- Review current tasks posed by modern molecular biology
- Review and experiment with some of the tools/solutions currently found (e.g. BLAST, clustalw)
- Gain some tools to handle such problems
- Dynamic programming
- Probabilistic graphical models
- MM, HMM, CM, Trees
- Representation, what principles justify them, Learning, Inference
- Statistical tools: how to measure our confidence in our results?
43Course Goals
- Computational tools in molecular biology
- We will cover computational tasks that are posed by modern molecular biology
- We will discuss the biological motivation and setup for these tasks
- We will understand the kinds of solutions that exist and what principles justify them
44Course's Main Point
45Course's Main Point
- Learn to do
- Define the problem -> Find comp. solution
- Four Aspects
- Biological
- What is the task?
- Algorithmic
- How to perform the task at hand efficiently?
- Learning
- How to adapt parameters of the task from examples?
- Statistics
- How to differentiate true phenomena from artifacts?
46Example: Sequence Comparison
- Biological
- Evolution preserves sequences, thus similar genes might have similar function
- Algorithmic
- Consider all ways to align one sequence against another
- Learning
- How do we define similar sequences? Use examples to define similarity
- Statistics
- When we compare to 10^6 sequences, what is a random match and what is a true one?
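The algorithmic aspect, considering all ways to align one sequence against another, is the classic setting for dynamic programming. Below is a minimal sketch of global alignment scoring in the Needleman-Wunsch style. The scores `match=1`, `mismatch=-1` and `gap=-2` are arbitrary assumptions made for this example; choosing them well from examples is exactly the learning aspect listed above.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Best score over all global alignments of `a` and `b` (linear gap cost)."""
    # prev[j] = best score for aligning the processed prefix of `a` with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # a[:i] aligned against the empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            gap_in_b = prev[j] + gap      # a[i-1] aligned to a gap
            gap_in_a = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr.append(max(diagonal, gap_in_b, gap_in_a))
        prev = curr
    return prev[-1]


print(global_alignment_score("ACGT", "ACGT"))  # 4
print(global_alignment_score("ACGT", "ACT"))   # 1
```

The table has one cell per pair of prefixes, so the work grows with the product of the two lengths even though the number of possible alignments grows exponentially. On the statistics side, if a single comparison produces a chance hit with probability p, then 10^6 comparisons are expected to produce about 10^6 x p chance hits, which is why a score has to be judged against the size of the database.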
47Topics I
- Dealing with DNA/Protein sequences
- Genome projects and how sequences are found
- Finding similar sequences
- Models of sequences: Hidden Markov Models
- Transcription regulation
- Protein Families
- Gene finding
48Topics II
- Gene Expression
- Genome-wide expression patterns
- Data organization: clustering
- Reconstructing transcription regulation
- Recognizing and classifying cancers
49Topics III
- Models of genetic change
- Long term evolutionary changes among species
- Reconstructing evolutionary trees from current day sequences
- Short term genetic variations in a population
- Finding genes by linkage and association
50Topics IV
- Protein World
- How proteins fold: secondary and tertiary structure
- How to predict protein folds from sequence data alone
- How to analyze protein changes from raw experimental measurements (MassSpec)
- 2D gels
51Class Structure
- 2 weekly meetings
- Mondays 16-18 (Levin 8), Wednesdays 10-12 (Kaplan)
- Grade
- Homework assignments: 50% of the final grade. There will be up to seven homework assignments. These assignments will include theoretical problems, using bioinformatics tools and programming.
- Final home assignment: 20% of the final grade.
- Final test: 30% of the grade.
- Class participation: a 5% bonus grade for students who actively participate in discussions during classes
- Possible oral presentation of any exercise to define grade!
52Exercises & Handouts
- Check regularly
- http://www.cs.huji.ac.il/cbio