BME280/CSE277/CSE377: Bioinformatics, Spring 2006

Provided by: ion80
Learn more at: https://www.cs.gsu.edu

Transcript and Presenter's Notes

1
BME280/CSE277/CSE377 Bioinformatics, Spring 2006
2
Administrivia
  • Lecture time: TTh 12:30-1:45pm
  • Lecture place: Engineering II, Room 322
  • Instructor: Ion Mandoiu
  • Office: ITEB 261
  • Tel: 6-3784
  • E-mail: ion_at_engr.uconn.edu
  • Office hours: MW 1-2pm

3
Textbooks
  • Neil C. Jones and Pavel A. Pevzner, An
    Introduction to Bioinformatics Algorithms, MIT
    Press, 2004. Textbook website: http://bioalgorithms.info/
    (REQUIRED)
  • D. Gusfield, Algorithms on Strings, Trees, and
    Sequences, Cambridge University Press, 1997
    (OPTIONAL)

4
Grading
  • 30% homework assignments
  • Bi-weekly
  • 30% programming projects
  • Individual, 3-4 projects
  • 40% final project
  • Individual or teams of 2
  • Written report and short presentation
  • Possible topics:
  • Algorithm implementation and empirical study
  • In-depth survey of a topic not covered in class
  • Progress on open research problems
  • Propose your own!

5
What is Bioinformatics?
  • Bioinformatics is generally defined as the
    analysis, prediction, and modeling of biological
    data with the help of computers

6
Why Bioinformatics?
  • DNA sequencing technologies have created massive
    amounts of information that can only be
    efficiently analyzed with computers
  • Hundreds of species sequenced
  • Human, rat, chimp, chicken, ...
  • As the information becomes ever larger and more
    complex, more computational tools are needed to
    sort through the data.
  • Biology is becoming an information science!
  • Slowly, we are learning how cells work through
    comparative genomics -- not unlike comparative
    linguistics

7
Bioinformatics Tools
  • Bioinformatics problems involve multiple aspects
  • Example: Sequence Comparison
  • Biology: How are genes evolving? How is gene
    function related to gene sequence?
  • Learning/AI: How do we define "similar"? Can
    we learn from examples?
  • Algorithms: How can we efficiently find all
    similar sequences?
  • Statistics: How do we distinguish a random match
    from a true one?

8
Course Description
  • Course emphasis:
  • Modeling computational problems arising in
    biology as graph-theoretic, statistical, or
    mathematical optimization problems
  • Design, analysis, and implementation of efficient
    algorithms
  • Algorithmic techniques to be covered:
  • Exhaustive search
  • Integer programming
  • Greedy algorithms
  • Dynamic programming
  • Divide-and-conquer
  • Graph algorithms
  • Combinatorial pattern matching
  • Clustering
  • Hidden Markov models
  • Randomized algorithms

9
Course Description
  • Biological applications:
  • Restriction mapping
  • DNA sequencing
  • Motif finding
  • Pairwise sequence alignment
  • Gene prediction
  • Evolutionary trees
  • Genome rearrangements

10
Complete and return the survey!
11
Basic Molecular Biology
12
The Cell
Source: D. Geiger
All cells contain the same DNA, yet there are
many types of cells!
13
Mendel and his Genes
  • Genes -- physical and functional traits passed on
    from one generation to the next
  • Discovered by Gregor Mendel in the 1860s while he
    was experimenting with the pea plant. He asked
    the question:

Do traits come from a blend of both parents'
traits or from only one parent?
14
The Pea Plant Experiments
  • Mendel discovered that genes were passed on to
    offspring by both parents in two forms: dominant
    and recessive.
  • The dominant form would be the phenotypic
    characteristic of the offspring

15
DNA: The Code of Life
  • The structure and the four genomic letters code
    for all living organisms
  • Adenine, Guanine, Thymine, and Cytosine, which
    pair A-T and C-G on complementary strands.
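
The A-T / C-G pairing rule can be illustrated with a short Python sketch. The function names are hypothetical, and the input is assumed to be an uppercase string over A, C, G, T read in the usual 5'-to-3' direction.

```python
# Watson-Crick pairing as stated on the slide: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(PAIR[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, read in its own 5'-to-3' direction."""
    return complement(strand)[::-1]


print(complement("ATACGGA"))          # TATGCCT
print(reverse_complement("ATACGGA"))  # TCCGTAT
```

The reverse complement is the form normally used in practice, because the two strands of the double helix run in opposite directions.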

16
DNA Components
Source: D. Geiger
17
The Human Genome
Source: D. Geiger
18
DNA Organization
Source: D. Geiger
19
Genome Sizes
  • E. coli (bacteria): 4.7 Mb (megabases)
  • Yeast (simple fungi): 15 Mb
  • Nematode (C. elegans): 100 Mb
  • Mouse: 2 Gb (gigabases)
  • Human: 3 Gb
  • Wheat: 16.5 Gb
  • Lily: 32-48 Gb

20
Genes
  • DNA strings contain:
  • Coding regions (genes)
  • Control regions
  • Junk DNA (unknown function)
  • Estimated number of genes:
  • E. coli (bacteria): 4,000
  • Yeast (simple fungi): 6,000
  • Nematode (C. elegans): 13,000
  • Human: 32,000 (?)

21
Central Dogma
  • Cells express different subsets of genes under
    different environments

Gene → (Transcription) → mRNA → (Translation) → Protein
22
Gene Transcription
Source: D. Geiger
  • RNA is similar to DNA, but has:
  • A slightly different backbone
  • Uracil (U) instead of Thymine (T)
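
At the level of letters, the U-for-T substitution can be sketched in Python as follows. This is a simplification: it assumes the input is the coding (sense) strand of the gene, so the mRNA has the same letters with T replaced by U. The function name is hypothetical.

```python
def transcribe(coding_strand: str) -> str:
    """Return the mRNA letters for a DNA coding strand (T becomes U)."""
    return coding_strand.upper().replace("T", "U")


print(transcribe("ATGGCCTAA"))  # AUGGCCUAA
```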

23
RNA Roles
Source: D. Geiger
24
Translation
  • Catalyzed by the ribosome
  • Using two different sites, the ribosome
    continually binds tRNA, joins the amino acids
    together and moves to the next location along the
    mRNA
  • 10 codons/second, but multiple translations can
    occur simultaneously

http://wong.scripps.edu/PIX/ribosome.jpg
25
Genetic Code
  • Human cells produce approx. 100,000 proteins
  • Proteins are poly-peptides consisting of
    70-3,000 amino acids
  • There are 20 different amino acids; every 3
    nucleotides in a gene encode 1 amino acid (or
    the STOP signal)
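
The triplet rule can be made concrete with a minimal Python sketch. It uses the standard genetic code written in its usual compact form (codons enumerated over T, C, A, G, with "*" marking STOP). The function name and the choice to start at position 0 and stop at the first STOP codon are assumptions made for illustration.

```python
from itertools import product

# Standard genetic code: 64 codons over T, C, A, G in that order; "*" marks STOP.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding DNA string codon by codon until the first STOP."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # STOP signal: translation ends here
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCTAA"))  # MA
```

The table has 64 entries but only 20 distinct amino acids plus STOP, so several codons encode the same amino acid.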

Source: D. Geiger
26
Protein Folding
  • Proteins are not linear structures, though they
    are built that way
  • The amino acids have very different chemical
    properties; they interact with each other after
    the protein is built
  • This causes the protein to start folding and
    adopt its functional structure
  • Proteins may fold in reaction to some ions, and
    several separate chains of peptides may join
    together through their hydrophobic and
    hydrophilic amino acids to form a polymer

27
Protein Folding (contd)
  • The structure that a protein adopts is vital to
    its chemistry
  • Its structure determines which of its amino acids
    are exposed to carry out the protein's function
  • Its structure also determines what substrates it
    can react with

28
Protein Structure
Source: D. Geiger
29
Basic Molecular Biotechnology: How is information
accessed at the molecular level?
30
Operations on DNA/RNA
  • Amplification (making many copies)
  • Cutting into shorter fragments
  • Reading fragment lengths
  • Reading DNA sequence
  • Probing presence of specific fragments

31
Why we need so many copies
  • Biologists needed to find a way to read DNA
    codes.
  • How do you read base pairs that are angstroms in
    size?
  • It is not possible to look at it directly due to
    DNA's small size.
  • Need to use chemical techniques to detect what
    you are looking for.
  • To read something so small, you need a lot of it,
    so that you can actually detect the chemistry.
  • Need a way to make many copies of the base pairs,
    and a method for reading the pairs.

32
Polymerase Chain Reaction
  • Problem: Modern instrumentation cannot easily
    detect single molecules of DNA, making
    amplification a prerequisite for further analysis
  • Solution: PCR doubles the number of DNA fragments
    at every iteration

1 → 2 → 4 → 8
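
The doubling can be written as copies = initial x 2^n after n cycles. A small Python sketch of this idealized count follows. It assumes every fragment is copied in every cycle, which real reactions only approximate, and the function name is hypothetical.

```python
def pcr_copies(initial_copies: int, cycles: int) -> int:
    """Idealized PCR yield: the number of fragments doubles every cycle."""
    return initial_copies * 2 ** cycles


for n in range(4):
    print(n, pcr_copies(1, n))  # 1, 2, 4, 8 copies after 0, 1, 2, 3 cycles

print(pcr_copies(1, 30))  # 1073741824
```

Under this assumption, about 30 cycles turn a single molecule into roughly a billion copies.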
33
Denaturation
Raise temperature to 94°C to separate the duplex
form of DNA into single strands
34
Design primers
  • To perform PCR, a 10-20bp sequence on either side
    of the sequence to be amplified must be known
    because DNA pol requires a primer to synthesize a
    new strand of DNA
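
To see why the flanking sequences must be known, here is a minimal "in silico PCR" sketch in Python. It assumes exact primer matches on a single template string, with the reverse primer given 5'-to-3' on the opposite strand. The function names and the toy 7-letter primers are hypothetical (real primers are longer, as noted above).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Return the complementary strand read 5'-to-3'."""
    return strand.translate(COMPLEMENT)[::-1]


def amplicon(template: str, forward_primer: str, reverse_primer: str):
    """Return the region bounded by the two primers, or None if either is absent."""
    start = template.find(forward_primer)
    if start == -1:
        return None
    # The reverse primer binds the top strand where its reverse complement occurs.
    site = reverse_complement(reverse_primer)
    end = template.find(site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(site)]


template = "TTTACGTACGGATCCGATTACAGGCTTAAACCC"
print(amplicon(template, "ACGTACG", "TTTAAGC"))
```

Only the two primer sequences are needed as input; everything between them is copied without being known in advance.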

35
Annealing
  • Anneal primers at 50-65°C

37
Extension
  • Extend primers: raise temp to 72°C, allowing Taq
    pol to attach at each priming site and extend a
    new DNA strand

39
Repeat
  • Repeat the Denature, Anneal, Extension steps at
    their respective temperatures

40
Polymerase Chain Reaction
41
Restriction Enzymes
  • Discovered in the early 1970s
  • Used as a defense mechanism by bacteria to break
    down the DNA of attacking viruses.
  • They cut the DNA into small fragments.
  • Can also be used to cut the DNA of organisms.
  • This breaks the DNA sequence into more
    manageable, bite-size pieces.
  • It is then possible using standard purification
    techniques to single out certain fragments and
    duplicate them to macroscopic quantities.

42
Molecular Scissors
Molecular Cell Biology, 4th edition, fig. 9-10
43
Discovering Restriction Enzymes
  • HindII: first restriction enzyme, discovered by
    Hamilton Smith in 1970
  • From the bacterium Haemophilus influenzae
  • Discovered accidentally while studying how the
    bacterium Haemophilus influenzae takes up DNA
    from the phage virus P22
  • Recognizes and cuts DNA at the sequences:
  • GTGCAC
  • GTTAAC
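
Locating recognition sites in a sequence is a plain substring search. The Python sketch below uses the two sequences exactly as listed on this slide. The function name and the test sequence are hypothetical, and the code only reports where a site occurs, since the slide does not say where inside the site the cut is made.

```python
# Recognition sequences as listed on the slide.
SITES = ("GTGCAC", "GTTAAC")


def find_sites(dna: str, sites=SITES):
    """Return sorted (position, site) pairs for every occurrence, 0-based."""
    hits = []
    for site in sites:
        position = dna.find(site)
        while position != -1:
            hits.append((position, site))
            position = dna.find(site, position + 1)  # allow overlapping hits
    return sorted(hits)


print(find_sites("AAGTTAACCCGTGCACTT"))
# [(2, 'GTTAAC'), (10, 'GTGCAC')]
```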

44
Recognition Sites of Restriction Enzymes
45
Separating DNA by Size
  • Gel electrophoresis is a process for separating
    DNA by size
  • Can separate DNA fragments that differ in length
    by only 1 nucleotide, for fragments up to 500
    nucleotides long

46
Gel Electrophoresis
  • DNA fragments are injected into a gel positioned
    in an electric field
  • DNA is negatively charged near neutral pH
  • The ribose phosphate backbone of each nucleotide
    is acidic, so DNA has an overall negative charge
  • DNA molecules move towards the positive electrode

47
Gel Electrophoresis (contd)
  • DNA fragments of different lengths are separated
    according to size
  • Smaller molecules move through the gel matrix
    more readily than larger molecules
  • The gel matrix restricts random diffusion so
    molecules of different lengths separate into bands

48
Detecting DNA: Autoradiography
  • One way to visualize separated DNA bands on a gel
    is autoradiography
  • The DNA is radioactively labeled
  • The gel is laid against a sheet of photographic
    film in the dark, exposing the film at the
    positions where the DNA is present.

49
Detecting DNA: Fluorescence
  • Another way to visualize DNA bands in gel is
    fluorescence
  • The gel is incubated with a solution containing
    the fluorescent dye ethidium
  • Ethidium binds to the DNA
  • The DNA lights up when the gel is exposed to
    ultraviolet light.

50
Gel Electrophoresis Example
Direction of DNA movement
Smaller fragments travel farther
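
A purely qualitative Python sketch of the band pattern follows. Fragments of equal length are assumed to land in the same band, and bands are listed from the one that travelled farthest (shortest fragments) to the one nearest the starting well. The function name is hypothetical and no migration distances are modeled.

```python
from collections import Counter


def gel_bands(fragment_lengths):
    """Return (length, count) per band, farthest-travelled band first."""
    counts = Counter(fragment_lengths)
    return [(length, counts[length]) for length in sorted(counts)]


print(gel_bands([500, 120, 120, 300]))
# [(120, 2), (300, 1), (500, 1)]
```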
51
Sequencing
  • Biologists can reliably find the sequence of
    A/C/T/G for short strings (a few hundred
    nucleotides)
  • Chain termination:
  • Single strand template
  • Complementary strand synthesis blocked with small
    probability at particular nucleotides
  • Lengths of fragments read for each class of
    strings

ATACGGA ATACGG ATACG ATAC ATA AT A
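
The fragment list above suggests how the sequence is read: each fragment length is observed in the class of the nucleotide at which synthesis stopped, and sorting the lengths spells out the sequence. The Python sketch below is an idealization that assumes exactly one fragment per length and an error-free readout. Function names are hypothetical.

```python
def termination_lengths(strand: str):
    """Map each nucleotide class to the lengths of the fragments ending in it."""
    lengths = {base: [] for base in "ACGT"}
    for length, base in enumerate(strand, start=1):
        lengths[base].append(length)
    return lengths


def read_sequence(lengths) -> str:
    """Recover the sequence by ordering all fragment lengths."""
    base_at_length = {n: base for base, ns in lengths.items() for n in ns}
    return "".join(base_at_length[n] for n in sorted(base_at_length))


observed = termination_lengths("ATACGGA")
print(observed)                 # {'A': [1, 3, 7], 'C': [4], 'G': [5, 6], 'T': [2]}
print(read_sequence(observed))  # ATACGGA
```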
54
Sequencing
55
DNA Hybridization
  • Single-stranded DNA will naturally bind to
    complementary strands
  • Hybridization is used to locate genes, regulate
    gene expression, and determine the degree of
    similarity between DNA from different sources
  • Hybridization is also referred to as annealing or
    renaturation

56
Microarray Technologies
  • Oligonucleotide arrays
  • Short (20-60bp) synthetic DNA strands
  • Arrays of cDNAs
  • Obtained by reverse transcription from Expressed
    Sequence Tags (ESTs)

57
DNA Array Hybridization Experiment
Images courtesy of Affymetrix.
58
Two-Color Technique
  • Sample labeled RED
  • Control labeled GREEN
  • YELLOW: probes hybridize to both sample and
    control
  • BLACK: probes hybridize to neither
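
The color reading can be sketched in Python as a simple two-channel classification. The intensities are assumed to be normalized to the range 0 to 1, and the threshold of 0.5 is a hypothetical value chosen only for illustration; real analyses work with intensity ratios rather than a hard cutoff.

```python
def spot_color(red: float, green: float, threshold: float = 0.5) -> str:
    """Classify a spot from its red (sample) and green (control) intensities."""
    in_sample = red >= threshold
    in_control = green >= threshold
    if in_sample and in_control:
        return "YELLOW"  # probe hybridized to both sample and control
    if in_sample:
        return "RED"     # sample only
    if in_control:
        return "GREEN"   # control only
    return "BLACK"       # neither


print(spot_color(0.9, 0.8))  # YELLOW
print(spot_color(0.9, 0.1))  # RED
```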

59
Sequencing by Hybridization
  • Exploits parallel hybridization capabilities
    offered by DNA arrays
  • ALL probes of a certain length k (k = 8 to 10) are
    synthesized on the array
  • Target DNA hybridizes at locations which store
    probes complementary to its k-substrings
  • Sequencing by Hybridization (SBH) Problem:
    Reconstruct target DNA given its k-length
    substrings (spectrum)
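
One common way to approach the SBH problem is to treat each k-substring as an edge from its first k-1 letters to its last k-1 letters and look for a path that uses every edge once (an Eulerian path). The Python sketch below follows that idea under simplifying assumptions: the spectrum is error-free, it is kept as a list so repeated k-substrings are counted, and such a path exists. Function names are hypothetical.

```python
from collections import defaultdict


def spectrum(target: str, k: int):
    """All k-length substrings of target, sorted (repeats kept)."""
    return sorted(target[i:i + k] for i in range(len(target) - k + 1))


def reconstruct(kmers):
    """Return one string whose k-substrings are exactly the given k-mers."""
    graph = defaultdict(list)
    in_degree = defaultdict(int)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
        in_degree[kmer[1:]] += 1
    # Start where out-degree exceeds in-degree; otherwise any node works.
    start = next(
        (node for node in list(graph) if len(graph[node]) - in_degree[node] == 1),
        kmers[0][:-1],
    )
    # Iterative Hierholzer traversal: follow unused edges, backtrack when stuck.
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    return path[0] + "".join(node[-1] for node in path[1:])


print(reconstruct(spectrum("ATACGGA", 3)))  # ATACGGA
```

When the target contains repeated (k-1)-substrings, several different strings can share the same spectrum, so the result is guaranteed to be consistent with the spectrum but not necessarily identical to the original target.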

60
Operations on Proteins
  • Cloning in expression vectors
  • 2-dimensional gel electrophoresis: separates
    proteins by molecular weight and pH gradient
  • Antibody techniques (immunoprecipitation,
    antibody arrays, ...)
  • Mass spectrometry (e.g., MALDI-TOF)

61
Active research problems
  • Genome projects have already given draft genome
    sequence for hundreds of species, but lots of
    questions remain to be answered
  • Create a complete parts list: gene sequences
    (including intron/exon structure), transcription
    factors, ...
  • Understand function of each part, e.g., protein
    structure, protein/DNA and protein/protein
    interactions
  • Understand mechanisms, e.g., pathways
  • Understand how everything fits together: systems
    biology