Outline - PowerPoint PPT Presentation

About This Presentation
Title:

Outline

Slides: 26
Provided by: danb195

Transcript and Presenter's Notes


1
Introduction
2
Outline
  • Topics for this class
  • Course logistics
  • A very little bit of background
  • Course topic overview
  • 482/682 will be noticeably different from
    previous years.
  • The instructor has changed, and the two
    instructors specialize in different areas of
    bioinformatics.

3
COURSE LOGISTICS
4
Course staff
  • Instructor: Bin Ma (DC 3345, binma_at_uwaterloo.ca,
    http://www.cs.uwaterloo.ca/binma)
  • TA: Xi Han
  • Course webpage: monod.uwaterloo.ca/cs482
  • Prerequisites: For undergraduates, the two most
    important prereqs are CS 341 and STAT 231.

5
Marking
  • For undergrads:
  • 4 assignments (40%)
  • In-class midterm (20%)
  • Final exam or final project (40%)
  • The midterm will happen on 27 October.
  • For grad students:
  • 4 assignments (40%)
  • 1 final project, done by yourself (60%):
  • A proposal (due Nov. 1)
  • A final report.
  • A presentation in class.
  • Undergraduates can do projects too; the project
    is then worth 40%.

6
Textbooks, notes
  • Textbook: R. Durbin, S. Eddy, A. Krogh, G.
    Mitchison, Biological Sequence Analysis:
    Probabilistic Models of Proteins and Nucleic
    Acids, Cambridge University Press, 1999, ISBN
    0521629713.
  • This is a classic book in this area.
  • Another book that is useful, although not
    required, is
  • Dan Gusfield, Algorithms on Strings, Trees and
    Sequences: Computer Science and Computational
    Biology, Cambridge University Press, 1997, ISBN
    0521585198.
  • Many other books are either too specialized or
    of low quality.
  • Much of the material is not covered by any
    textbook.
  • Notes
  • Notes serve as an outline of the material
    lectured. They cannot replace the lectures.
  • Notes will appear on the web soon after they are
    presented in class, with corrections (!)

7
BRIEF REVIEW OF BIOLOGY
8
A brief review of biology
  • Modern molecular biology studies a few types of
    biologically important molecules: DNA, RNA,
    protein, lipid, glycan.
  • Bioinformatics has mostly studied DNA, then RNA
    and protein, and less so lipid and glycan.
  • The first three have sequences as their primary
    structures.

9
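DNA
[Figure: double-stranded DNA; the two strands run in opposite directions, 5' to 3' and 3' to 5'.]
The G-C base pair is stronger than the A-T base pair.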
10
DNA
  • Three reasons for DNA's popularity in
    bioinformatics:
  • The most important information-carrying molecule:
    it passes information to children and is
    responsible for many genetic diseases.
  • The simplest to model in a computer
  • DNA is modeled as a string over {A,C,G,T}
  • In bioinformatics, "sequence" is used more often
    than "string". Why?
  • Data is the cheapest to obtain
  • It is predicted that a human's complete genome
    (3 Gbp) can be sequenced for less than 1000
    dollars in a day in the near future.
  • Bioinformatics played a key role.
  • Google donated an X-prize (http://www.xprize.org/).
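
To make the "string over {A,C,G,T}" model concrete, here is a minimal Python sketch. It is not from the slides; the helper names (`is_dna`, `reverse_complement`, `gc_content`) are illustrative assumptions. It treats a DNA strand as an ordinary string and relies only on the A-T and G-C base pairing mentioned earlier.

```python
# Illustrative sketch: a DNA strand modeled as a Python string over {A, C, G, T}.
# All function names here are hypothetical, chosen for this example only.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def is_dna(seq: str) -> bool:
    """Return True if every character is one of A, C, G, T."""
    return all(base in COMPLEMENT for base in seq)


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read in its own 5' to 3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def gc_content(seq: str) -> float:
    """Return the fraction of bases that are G or C (0.0 for an empty string)."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    strand = "AACG"
    print(is_dna(strand), reverse_complement(strand), gc_content(strand))
```

Because the model is just a string, ordinary string algorithms apply directly; the only biology in the sketch is the complement table.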

11
RNA
  • RNA was less studied before but is now becoming
    more and more important.
  • The structure is important to RNA's function.
    It is not a simple string anymore.

12
Protein
Primary structure is a sequence. 20 frequent
amino acids. Fold into a complex 3D structure.
13
Protein
  • Protein is the most important molecule for the
    life of an organism:
  • Structural components
  • Proteins participate in almost all chemical
    reactions in cells as enzymes (catalysts). They
    allow the organism to react to the environment
    through sophisticated signaling pathways.
  • Directly responsible for most diseases (genetic
    or not) and the main drug target for diseases
    including Alzheimer's and cancer.
  • Protein has become extremely popular in
    bioinformatics
  • Post-genome era
  • Genomics vs. proteomics
  • Hard to study, partially because structure is
    significant to the function
  • And until recently, the data was more expensive
    to get.

14
An example
HER2 is a proto-oncogene found on chromosome 17.
It encodes a protein that functions as a cell
membrane receptor.
Normal epithelial cells express low levels of the
HER2 receptor on the cell surface, while some
types of breast cancer cells overexpress this
gene. This signals the tumor cells to
proliferate (grow).
15
An example
16
Read more by yourself
  • If you do not have much biology background, read
    the following articles (and other related
    articles) on Wikipedia:
  • Protein, DNA, RNA, gene, genome, genetic code,
    tRNA.
  • We will briefly review the necessary biology
    knowledge when needed.

17
COURSE TOPICS
  • Keywords: algorithm, sequence, phylogeny, protein
    sequencing

18
Keyword 1: algorithm
  • This is a bioinformatics course focusing on
    biological sequence analysis algorithms.
  • How bioinformatics is used in biology:
  • Sample → data → software → discovery
  • Bioinformatics research cycle:
  • biological problem → math model → algorithm →
    software → biology
  • Normally the data is so large, or the model so
    complex, that an efficient algorithm is needed.
  • Polynomial time is no longer good enough.
  • Sometimes even linear time is not good enough.
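
A back-of-the-envelope calculation suggests why polynomial time alone is not enough at genome scale. The sketch below is not from the slides. It takes the 3 Gbp genome length mentioned earlier and assumes a hypothetical machine that performs 10^9 simple operations per second; both the rate and the names are illustrative assumptions.

```python
import math

# Illustrative sketch: rough running times at genome scale.
GENOME_LENGTH = 3_000_000_000   # about 3 Gbp, the human genome size quoted in the slides
OPS_PER_SECOND = 1e9            # assumption: a hypothetical 10^9 operations per second
SECONDS_PER_YEAR = 365 * 24 * 3600


def seconds_needed(operations: float) -> float:
    """Convert an operation count into seconds on the assumed machine."""
    return operations / OPS_PER_SECOND


if __name__ == "__main__":
    n = GENOME_LENGTH
    costs = {
        "n": n,
        "n log2 n": n * math.log2(n),
        "n^2": n ** 2,
    }
    for name, operations in costs.items():
        seconds = seconds_needed(operations)
        print(f"{name:>9}: {seconds:.3g} s ({seconds / SECONDS_PER_YEAR:.3g} years)")
```

Under these assumptions a linear scan takes seconds, while a quadratic algorithm over a whole genome would take centuries, which is the gap the slide is pointing at.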

19
An example role of bioinformatics
protein sample → (mass spectrometry) → data → (bioinformatics) → protein information
  • Interesting protein information includes:
  • Protein identity
  • Protein quantity
  • PTMs (post-translational modifications) on proteins
  • These are useful for disease study and drug
    development.

20
Keyword 2: sequence
  • Fundamental information storage method in living
    cells: DNA sequences.
  • Central dogma of molecular biology: DNA → RNA →
    protein
  • Hence, to understand an organism, it helps to
    start out by understanding DNA sequences.
  • We can treat DNA sequences as strings.
  • ACCGATTGAGCCGTACC
  • So we're going to spend most of the course
    learning about algorithms for strings and
    sequences.
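
The central dogma can be sketched as two string transformations. The Python below is a simplified illustration, not the course's code. It assumes the input is the coding strand, starts translating at position 0, stops at the first stop codon, and ignores real-world details such as start-codon search and splicing. The function names are hypothetical; the codon table is the standard genetic code written in the usual compact TCAG order.

```python
from itertools import product

# Illustrative sketch of DNA -> RNA -> protein under simplifying assumptions.
# Standard genetic code, codons enumerated in T, C, A, G order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """DNA -> mRNA, assuming the input is the coding strand (T becomes U)."""
    return coding_strand.replace("T", "U")


def translate(mrna: str) -> str:
    """mRNA -> protein, reading codons from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")  # the table is keyed by DNA letters
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    dna = "ATGGCCTAA"
    mrna = transcribe(dna)
    print(mrna, translate(mrna))
```

Both steps are plain string operations, which is why string algorithms carry over so directly to sequence analysis.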

21
Keyword 3: phylogeny
  • Darwin's theory of evolution told us that all
    species share the same ancestor.
  • Knowing only the currently living species,
    especially their DNA sequences, reconstruct the
    evolutionary tree.
  • Without digging up fossils.

22
Keyword 4: protein sequencing
  • We will also talk about protein sequencing.
  • Proteins are the construction material and the
    controls of a living organism. They determine
    the phenotype (as opposed to the genotype).
  • Consider genes as source code and proteins as
    running programs (processes).
  • We will study how to read the sequence
    information of a protein from a biological
    sample.
  • Very interesting algorithms.

They have the same genome!
23
Bioinformatics General Topics
  • This is not a general course in bioinformatics,
    which has become a very broad area
  • Genome sequencing.
  • Sequence comparison
  • Gene prediction and annotation
  • Gene expression and biomarker
  • Motif finding
  • Regulatory network
  • Protein structure comparison and prediction (CS
    483/683)
  • Protein-protein interaction
  • Protein identification and quantification with
    mass spectrometry
  • RNA structure, RNA gene prediction, RNAi.
  • Glycans and lipids
  • Genetic variations: SNPs, alternative splicing,
    and diseases.
  • Phylogeny
  • Genome evolution
  • Medical/cell image processing, molecular
    simulation (bioinformatics? health
    informatics?)
  • DNA computing (a different area from
    bioinformatics)

24
Specific topics
  • Pairwise alignment: Which parts of two sequences
    are surprisingly similar to each other, if
    they've been evolving away from each other?
  • Phylogenetic reconstruction: How do I build
    evolutionary trees? How do I know they're the
    right ones?
  • Multiple alignment: Like #1, only with multiple
    sequences. How to make this useful in the
    context of evolution?
  • Gene finding: Which part of a DNA sequence is
    actually part of the process of producing
    proteins?
  • Protein sequencing: How to identify the protein
    sequence from biological samples (wet lab →
    data)?
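
As a preview of the pairwise alignment question, the sketch below computes a local alignment score with the classic Smith-Waterman dynamic program. This is one standard formulation, not necessarily the one used in the course. The scoring values (match +1, mismatch -1, linear gap -1) and the function name are illustrative assumptions, and deciding whether a score is "surprising" needs a statistical model that is not shown here.

```python
# Illustrative sketch: Smith-Waterman local alignment score with a linear gap penalty.
# The scoring parameters are assumptions chosen for the example, not course values.

def local_alignment_score(a: str, b: str,
                          match: int = 1,
                          mismatch: int = -1,
                          gap: int = -1) -> int:
    """Return the best score over all pairs of substrings of a and b."""
    prev = [0] * (len(b) + 1)   # scores for the previous row of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets an alignment start anywhere (the "local" part).
            curr[j] = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(local_alignment_score("TTACGTTT", "GGACGGG"))
```

The table has one cell per pair of positions, so time grows with the product of the two lengths; only two rows are kept in memory because the sketch returns the score and not the alignment itself.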

25
Summary
  • We talked about
  • course logistics
  • basic biology (Wikipedia is a good resource)
  • course topics
  • Next time: sequence alignment