Computational Methods In Molecular Biology CS-67693, Spring 2005

cbio course, spring 2005, Hebrew University.

Provided by: NirFri
1
Computational Methods In Molecular Biology
CS-67693, Spring 2005
  • School of Computer Science & Engineering
  • Hebrew University, Jerusalem

2
Class 1: Introduction
3
Introduction
  • What is Comp. Bio.? Why is it great?
  • What are the aims and basic concepts of this
    course?
  • High-level biological review: basic bio
    background and motivation for the tasks handled in
    the course
  • Administration

4
The Cell
5
Example: Tissues in Stomach
6
DNA Components
  • Four nucleotide types
      • Adenine
      • Guanine
      • Cytosine
      • Thymine
  • Hydrogen bonds
      • A-T
      • C-G
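
The A-T and C-G pairing rules above mean that one strand determines the other. A minimal Python sketch of that idea follows; the function names are illustrative and not from the course material, and the input is assumed to contain only the letters A, C, G and T.

```python
# Base-pairing rules from the slide: A pairs with T, C pairs with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(dna: str) -> str:
    """Return the base-by-base complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)


def reverse_complement(dna: str) -> str:
    """Return the opposite strand, read in its own 5'-to-3' direction."""
    return complement(dna)[::-1]


if __name__ == "__main__":
    print(complement("ATGC"))
    print(reverse_complement("ATGC"))
```

The reverse is taken because the two strands of the double helix run in opposite directions.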

7
The Double Helix
Source: Alberts et al.
8
DNA Organization
Source: Alberts et al.
9
Genome Sizes
  • E. coli (bacteria): 4.6 x 10^6 bases
  • Yeast (simple fungus): 15 x 10^6 bases
  • Smallest human chromosome: 50 x 10^6 bases
  • Entire human genome: 3 x 10^9 bases

10
Related Computational Tasks
  • Need a way to reconstruct a DNA sequence from
    fragments: a major contribution of comp. bio.!
  • Related: sequence comparison, sequence alignment
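
To make the fragment-reconstruction task concrete, here is a toy Python sketch that repeatedly merges the two fragments with the longest suffix/prefix overlap. It is a greedy heuristic chosen for illustration, not necessarily the method taught in the course, and it assumes error-free fragments from a single strand. The names `overlap` and `greedy_assemble` are hypothetical.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments):
    """Greedily merge the best-overlapping pair until one string remains."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates
    # Drop fragments that are fully contained in another fragment.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            # No overlaps left: fall back to plain concatenation.
            return "".join(frags)
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags)
                 if idx not in (best_i, best_j)] + [merged]
    return frags[0] if frags else ""


if __name__ == "__main__":
    print(greedy_assemble(["AAATTAG", "ATGGAC", "GACAAA"]))
```

Real assemblers must also cope with sequencing errors, repeats and reads from both strands, which this sketch ignores.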

11
DNA Duplication
Source: Mathews & van Holde
12
Genes
  • The DNA strings include
  • Coding regions (genes)
      • E. coli has 4,000 genes
      • Yeast has 6,000 genes
      • C. elegans has 13,000 genes
      • Humans have 32,000 genes
  • Control regions
      • These typically are adjacent to the genes
      • They determine when a gene should be expressed
  • Junk DNA (unknown function)

13
The Tree of Life
Source: Alberts et al.
14
Evolution
  • Related organisms have similar DNA
      • Similarity in sequences of proteins
      • Similarity in organization of genes along the
        chromosomes
  • Evolution plays a major role in biology
      • Many mechanisms are shared across a wide range of
        organisms (e.g. orthologs)
      • During the course of evolution existing
        components are adapted for new functions (e.g.
        paralogs)

15
Evolution
  • Evolution of new organisms is driven by
  • Diversity
      • Different individuals carry different variants of
        the same basic blueprint
  • Mutations
      • The DNA sequence can be changed by single
        base changes, deletion/insertion of DNA segments,
        etc.
  • Selection bias

16
Related Computational Tasks
  • Phylogeny: not just theory!
  • Rebuild the tree of life
  • Infer relations between genes/pathways etc.
    across species
  • Learn models for changes and development
  • Major benefit: exploit the information we do
    have/observe to make inferences about systems on
    which we have very little knowledge and few
    observations.

17
How Do Genes Code for Proteins?
DNA
18
Transcription
  • Coding sequences can be transcribed to RNA
  • RNA nucleotides
      • Similar to DNA, slightly different backbone
      • Uracil (U) instead of Thymine (T)
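
At the level of strings, the transcription step above amounts to copying the coding strand with U in place of T. A minimal sketch, assuming the input is the coding (sense) strand written 5' to 3'; the function name `transcribe` is illustrative.

```python
def transcribe(coding_strand: str) -> str:
    """Return the RNA sequence for a DNA coding strand (T becomes U)."""
    return coding_strand.upper().replace("T", "U")


if __name__ == "__main__":
    print(transcribe("ATGGACAAATTAG"))
```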

Source: Mathews & van Holde
19
RNA Editing
20
Translation
21
Translation
  • Translation is mediated by the ribosome
  • The ribosome is a complex of protein & rRNA molecules
  • The ribosome attaches to the mRNA at a
    translation initiation site
  • The ribosome then moves along the mRNA sequence and
    in the process constructs a polypeptide
  • When the ribosome encounters a stop signal, it
    releases the mRNA. The constructed polypeptide is
    released and folds into a protein.

22
Translation
Source: Alberts et al.
23
Translation
Source: Alberts et al.
24
Translation
Source: Alberts et al.
25
Translation
Source: Alberts et al.
26
Translation
Source: Alberts et al.
27
Genetic Code
28
The Central Dogma
29
Eukaryotic Transcription Regulation
[Figure: labels TF, TFs, RNA polymerase II, Basal, Promoter, Gene, mRNA, Transcription start site, 5' and 3' ends]
  • Classical Model
  • Composition of promoter region determines rate of
    transcription initiation
  • Combinations of TFs control the transcription of
    gene sets under specific conditions

30
From Data to Model
>YKL112W Chr 11
ATGGACAAATTAGTCGTGAATTATTATGAATACA
AGCACCCTATAATTAATAAAGACCTGGCCATTGGAGCCCATGGAGGCAAA
AAATTTCCCACCTTGGGTGCTTGGTATGATGTAATTAATGAGTACGAATT
TCAGACGCGTTGCCCTATTATTTTAAAGAATTCGCATAGGAACAAACATT
TTACATTTGCCTGTCATTTGAAAAACTGTCCATTTAAAGTCTTGCTAAGC
TATGCTGGCAATGCTGCATCCTCAGAAACCTCATCTCCTTCTGCAAATAA
TAATACCAACCCTCCGGGTACTCCTGATCATATTCATCATCATAGCAACA
ACATGAACAACGAGGACAATGATAATAACAATGGCAGTAATAATAAGGTT
AGCAATGACAGTAAACTTGACTTCGTTACTGATGATCTTGAATACCATCT
GGCGAACACTCATCCGGACGACACCAATGACAAAGTGGAGTCGAGAAGCA
ATGAGGTGAATGGGAACAATGACGATGATGCTGATGCCAACAACATTTTT
AAACAGCAAGGTGTTACTATCAAGAACGACACTGAAGATGATTCGATAAA
TAAGGCCTCTAT
31
Many Related Computational Tasks
  • Information is in the code book?
  • How and where is alternative splicing determined?
  • Build models for regulation of genes at different
    levels of complexity
  • Relate genotype and phenotype: what are the
    expression patterns of some disease? How do they
    relate to sequence? What model can explain the
    observations? Can we predict phenomena based on
    our models?

32
Who came first?
  • Chicken or egg?
      • Egg
  • DNA or Protein?
      • RNA
  • Thomas Cech & Sidney Altman ('80s!)
      • RNA as an independent molecule
      • Probably closer to the ancient source

33
RNA roles
  • Messenger RNA (mRNA)
      • Encodes protein sequences
  • Transfer RNA (tRNA)
      • Adaptor between mRNA molecules and amino-acids
        (protein building blocks)
  • Ribosomal RNA (rRNA)
      • Part of the ribosome, a machine for translating
        mRNA to proteins
  • ...

34
Transfer RNA
  • Anticodon
      • matches a codon (triplet of mRNA nucleotides)
  • Attachment site
      • matches a specific amino-acid

35
Related Computational Tasks
  • RNA secondary structure prediction
      • based on context-free grammars (CFG) and
        covariance models (CM)
  • RNA coding area prediction
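
As a rough illustration of secondary structure prediction, the sketch below uses Nussinov-style base-pair maximization, a simpler dynamic program than the CFG/CM methods named on the slide. It is swapped in here only because it fits in a few lines. The allowed pairs (including G-U wobble), the minimum loop length of 3 and the function name are assumptions for this example.

```python
# Allowed pairs: Watson-Crick plus G-U wobble (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"),
         ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(rna: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs (Nussinov-style recursion)."""
    n = len(rna)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for the subsequence rna[i..j]
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j stays unpaired
            # case 2: j pairs with some k, leaving >= min_loop bases between
            for k in range(i, j - min_loop):
                if (rna[k], rna[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    print(max_base_pairs("GGGAAAUCC"))
```

Counting pairs is a crude stand-in for structure scoring; grammar-based models attach probabilities to the same kind of nested decomposition.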

36
RNA Editing
Source: Mathews & van Holde
37
Translation
38
How do Proteins Perform their Roles?
  • Proteins interact in various ways
  • Change conformations; conformation → function
  • Major Issues
      • Their active/functional areas which interact
      • Their 3D structure

39
Protein Structure
  • Proteins are polypeptides of 70-3000 amino-acids
  • Their structure is (mostly) determined by the
    sequence of amino-acids that make up the protein

40
Protein Structure
41
Related Computational Tasks
  • Protein 2D, 3D structure prediction
  • Identify sequence motifs/domains in proteins
  • Sequence similarity vs. functional similarity

42
Course Goals
  • Review current tasks posed by modern molecular
    biology
  • Review and experiment with some of the
    tools/solutions currently available (e.g. BLAST,
    ClustalW)
  • Gain some tools to handle such problems
      • Dynamic programming
      • Probabilistic graphical models
          • MM, HMM, CM, trees
          • Representation, what principles justify them,
            learning, inference
      • Statistical tools: how to measure our confidence
        in our results?

43
Course Goals
  • Computational tools in molecular biology
  • We will cover computational tasks that are posed
    by modern molecular biology
  • We will discuss the biological motivation and
    setup for these tasks
  • We will understand the kinds of solutions that
    exist and the principles that justify them

44
Course's Main Point
45
Course's Main Point
  • Learn to do
      • Define the problem → find comp. solution
  • Four Aspects
      • Biological
          • What is the task?
      • Algorithmic
          • How to perform the task at hand efficiently?
      • Learning
          • How to adapt parameters of the task from examples
      • Statistics
          • How to differentiate true phenomena from artifacts

46
Example: Sequence Comparison
  • Biological
      • Evolution preserves sequences, thus similar genes
        might have similar function
  • Algorithmic
      • Consider all ways to align one sequence against
        another
  • Learning
      • How do we define similar sequences? Use
        examples to define similarity
  • Statistics
      • When we compare to 10^6 sequences, what is a
        random match and what is a true one?
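
The algorithmic aspect above, considering all alignments of one sequence against another, is commonly handled with dynamic programming rather than explicit enumeration. Below is a minimal global-alignment scoring sketch in the Needleman-Wunsch style. The scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions for illustration; in practice they would be learned or taken from substitution matrices.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -1) -> int:
    """Best global alignment score of a against b (linear gap penalty)."""
    # prev[j] = best score aligning the processed prefix of a with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # a[:i] aligned against the empty string
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("GATTACA", "GATTACA"))
    print(global_alignment_score("ACGT", "AGT"))
```

The table has (len(a)+1) x (len(b)+1) cells, so the search over exponentially many alignments costs only quadratic time; only two rows are kept here because just the score is returned.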

47
Topics I
  • Dealing with DNA/Protein sequences
  • Genome projects and how sequences are found
  • Finding similar sequences
  • Models of sequences: Hidden Markov Models
  • Transcription regulation
  • Protein Families
  • Gene finding

48
Topics II
  • Gene Expression
  • Genome-wide expression patterns
  • Data organization: clustering
  • Reconstructing transcription regulation
  • Recognizing and classifying cancers

49
Topics III
  • Models of genetic change
  • Long term evolutionary changes among species
  • Reconstructing evolutionary trees from current
    day sequences
  • Short term genetic variations in a population
  • Finding genes by linkage and association

50
Topics IV
  • Protein World
  • How proteins fold: secondary & tertiary
    structure
  • How to predict protein folds from sequence data
    alone
  • How to analyze protein changes from raw
    experimental measurements (MassSpec)
  • 2D gels

51
Class Structure
  • 2 weekly meetings
      • Mondays 16-18 (Levin 8), Wednesdays 10-12
        (Kaplan)
  • Grade
      • Homework assignments: 50% of the final grade.
        There will be up to seven homework assignments.
        These assignments will include theoretical
        problems, using bioinformatics tools and
        programming.
      • Final home assignment: 20% of the final grade.
      • Final test: 30% of the grade.
      • Class participation: a 5% bonus for
        students who actively participate in discussions
        during classes
      • Possible oral presentation of any exercise to
        determine the grade!

52
Exercises & Handouts
  • Check regularly
  • http://www.cs.huji.ac.il/cbio