BCB 444/544 - Introduction to Bioinformatics (PowerPoint presentation)

1
BCB 444/544 - Introduction to Bioinformatics
Lecture 9: Gene Structure Prediction & Protein Function Prediction (9_Sept11)
2
Assignments, Reading & Exercises
Homework 2
  • Mon Sept 11
  • CH Chp 2.1, pp. 34-59; DQs & MMs
  • Re: Predicting Protein Function
  • Read Friedberg I, Harder T, Godzik A. (2006)
    JAFA: a protein function annotation meta-server.
    Nucleic Acids Res. 34 (Web Server issue): W379-81.
    PMID 16845030
  • http://nar.oxfordjournals.org/cgi/content/full/34/suppl_2/W379
  • Visit http://jafa.burnham.org/
  • Wed Sept 13 & Fri Sept 15
  • CH Chp 2.2, pp. 59-83
  • Also, DQs & MMs
  • Homework 2 - Due Mon, Sept 18

3
Genome Sequence Acquisition & Analysis
  • 2.1 How are Genomes Sequenced?
  • What Is Genomics?
  • How Are Whole Genomes Sequenced?
  • How Are Organisms Picked for Genome Sequencing?
  • Math Minute 2.1: What Can You Learn from a Dot
    Plot?
  • Math Minute 2.2: How Do You Find Motifs?
  • Can We Predict Protein Functions from DNA
    Sequence?
  • Math Minute 2.3: What Are "Positives," and What
    Do They Have to Do with E-values?

Chp 2 - Campbell & Heyer Companion Website
4
Genome Sequence Acquisition & Analysis
  • 2.1 How are Genomes Sequenced? - cont.
  • What Shapes Are the Proteins?
  • Does Structure Reveal Function?
  • Why Do the Databases Contain So Many Partial
    Sequences?
  • Which Sequencing Method Worked Better?
  • Annotated Genomes Online
  • How Many Proteins Can One Gene Make?
  • Can the Genome Alter Gene Expression Without
    Changing the DNA Sequence?
  • What Is the Fifth Base in DNA? Methyl-Cytosine
  • Imprinting, Methylation, and Cancer
  • SUMMARY 2.1

Chp 2 - Campbell & Heyer Companion Website
5
Isn't there something puzzling about the information
provided so far?
1st, Back to Chp1 QUESTIONS?
Hmmm. What is the size of an "average" human chromosome?
Genome: $3 \times 10^9$ bp, divided among 23 (pairs of) chromosomes
"Average" human chromosome: ? $\times 10^6$ bp (Mb)
Answer: $3 \times 10^9 / 23 \approx 130 \times 10^6$ bp, i.e., about 130 Mb
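The slide's arithmetic can be checked with a short Python sketch. The inputs are the rounded values above (haploid genome of about 3 x 10^9 bp, 23 chromosomes), so the result is only an order-of-magnitude estimate; real human chromosomes vary widely in size.

```python
GENOME_BP = 3e9      # haploid human genome, ~3 x 10^9 bp (rounded, from the slide)
N_CHROMOSOMES = 23   # haploid chromosome number (23 pairs in a diploid cell)

def average_chromosome_mb(genome_bp, n_chromosomes):
    """Mean chromosome length in megabases (1 Mb = 10^6 bp)."""
    return genome_bp / n_chromosomes / 1e6

avg_mb = average_chromosome_mb(GENOME_BP, N_CHROMOSOMES)
print(f"Average human chromosome: ~{avg_mb:.0f} Mb")   # ~130 Mb
```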
6
Chp1 - Questions?
  • Why do the authors focus on these point mutations
    when the patients' chromosomes have either big
    deletions or translocations near/in the
    dystrophin gene?
  • We were told that certain cases of DMD are
    associated with these point mutations, which
    result in changes in protein sequence:
  • 54 L→R (Leu to Arg): drastic phenotype, DMD
    (Duchenne MD)
  • 168 A→D (Ala to Asp): less severe, BMD (Becker
    MD)
  • But we are told that the dystrophin gene is
    huge!
  • Gene close to 1,000,000 bp long! (1 Megabase,
    or 1 Mb)
  • mRNA 14,000 nt long
  • Protein: dimer (or tetramer, in crystal
    structure, PDB 1DXX)
  • How many kDa (kilodaltons)?
  • Did "the doctor" really sequence the entire gene
    from each of these patients?

7
UPDATES re: Chp1 Questions? (1)
  • But we are told that the dystrophin gene is
    huge!
  • Gene close to 1,000,000 bp long! (1 Megabase,
    or 1 Mb)
  • Actually bigger! > 2.5 Mb
  • Spans 79 exons!
  • Dystrophin gene is the largest human gene!
  • Accounts for 0.1% of the human genome!
  • mRNA 14,000 nt long
  • Actually 14.6 kb! (with 11 kb "coding"
    region)
  • Protein? > 3,500 amino acids (aa's)
  • Protein molecular weight? 427 kilodaltons
    (kDa)
  • Complex! Several promoters & alternative
    splicing result in different proteins (in size &
    sequence) in different tissues
  • Wikipedia (Dystrophin): "Dystrophin has the longest gene
    known to date, measuring 2.4 megabases. Its
    gene's locus is Xp21 and has 79 exons spanning
    2.5 Mb, produces an mRNA of 14.6 kb and a protein
    of over 3500 amino acid residues. The gene is so
    large it accounts for 0.1% of the human genome!"
    (Don't worry - I checked the data at ENTREZ Gene &
    OMIM)

8
UPDATES re: Chp1 Questions? (2)
  • Did "the doctor" really sequence the entire gene
    from each of these patients?
  • Probably not! But direct sequencing of the entire
    coding region has been reported this year
  • Go to ENTREZ/PubMed & type PMID 16331671
  • Also, several labs now do sequence the entire
    coding region, but many provide other diagnostic
    tests instead
  • Go to OMIM, type DMD, then click on the Gene Tests
    link
  • From here there is lots of info - e.g.,
    click on Testing
  • Now, go back & click on Reviews - wow, terms are
    defined, too!
  • Now, click on Educational Materials at top - a
    great glossary here!
  • Note that Gene Reviews is a great, dynamic,
    online-only journal - try looking up another
    disease you find interesting

9
Questions?
2nd, Back to Friday's QUESTIONS?
  • What are substitution matrices?
  • 2 major types: PAM & BLOSUM
  • Re: Assigned Reading for Chp 2 & Lab 3
  • Math Minute 2.3 - pp. 46-47
  • Note MISTAKE on p. 47
  • Incorrect version:
    "BLOSUM45 for finding more closely related
    sequences; BLOSUM80 for finding
    more divergent proteins"
  • Correct version:
    "BLOSUM45 for finding more divergent sequences;
    BLOSUM80 for finding more closely related
    proteins"

10
Substitution Matrices: PAM vs BLOSUM
  • PAM: Point Accepted Mutation - relies on an
    "evolutionary model" based on observed
    differences in closely related proteins
  • Model includes a rate for each type of sequence
    change
  • Suffix number (n) reflects the amount of "time"
    passed: the expected change after n accepted point
    mutations per 100 amino acids (PAM1: about 1% of
    residues changed)
  • PAM1 - for less divergent sequences (shorter
    time)
  • PAM250 - for more divergent sequences (longer
    time)
  • BLOSUM: BLOck SUbstitution Matrix - based on
    aa substitutions observed in conserved blocks of
    aligned, evolutionarily divergent proteins
  • Doesn't rely on a specific evolutionary model
  • Suffix number (n) reflects expected similarity:
    sequences sharing at least n% identity were
    clustered together in the MSA blocks from which
    the matrix was generated
  • BLOSUM45 - for more divergent sequences
  • BLOSUM80 - for less divergent sequences

See Substitution Matrix (Wikipedia)
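To make the contrast concrete, here is a toy Python sketch using a hypothetical 2-letter alphabet and made-up frequencies (not real PAM or BLOSUM data). The PAM idea is extrapolation: a PAM1 mutation-probability matrix is multiplied by itself n times to model PAMn. The BLOSUM idea is a log-odds score computed directly from pair frequencies observed in aligned blocks, scaled to half-bit units and rounded.

```python
import math

def mat_mult(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def pam_n(pam1, n):
    """Extrapolate PAMn from PAM1 by repeated multiplication (PAM1 ** n)."""
    size = len(pam1)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mult(result, pam1)
    return result

def blosum_style_score(q_ab, p_a, p_b):
    """Log-odds score in half-bit units, rounded to an integer.

    q_ab: observed frequency of the ordered residue pair (a, b) in aligned blocks
    p_a, p_b: background frequencies of residues a and b
    """
    return round(2 * math.log2(q_ab / (p_a * p_b)))

# Hypothetical PAM1-like matrix for a 2-letter alphabet {A, B}:
# row = original residue, column = residue after 1 PAM unit (1% change).
PAM1_TOY = [[0.99, 0.01],
            [0.01, 0.99]]

pam250 = pam_n(PAM1_TOY, 250)
print(f"P(A unchanged after PAM250) = {pam250[0][0]:.3f}")

# Hypothetical block statistics: p_A = p_B = 0.5, q_AA = 0.4, q_AB = 0.1
print("score(A, A) =", blosum_style_score(0.4, 0.5, 0.5))
print("score(A, B) =", blosum_style_score(0.1, 0.5, 0.5))
```

With 1% change per PAM unit, the toy PAM250 matrix predicts only about a 50% chance that a position is unchanged, i.e., close to random for a 2-letter alphabet, which is why high-n PAM matrices suit more divergent sequences.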
11
Gene Prediction & Protein Function Prediction
  • What is a gene? A segment of DNA, some of which is
    "structural," i.e., transcribed to give a
    functional RNA product, and some of which is
    "regulatory"
  • Genes can encode:
  • mRNA (i.e., for protein)
  • other types of RNA (tRNA, rRNA, miRNA, etc.)
  • Genes differ in eukaryotes vs prokaryotes (&
    archaea) - in both structure & regulation

12
Eukaryotes vs Prokaryotes: Cells
  • Typical human & bacterial cells drawn to scale
  • Eukaryotic cells are characterized by
    membrane-bound compartments, importantly, a
    nucleus, which is absent in prokaryotes

Brown Fig 2.1
BIOS Scientific Publishers Ltd, 1999
13
Eukaryotes vs Prokaryotes: Genes & Genomes
  • Genes & genomes in eukaryotes vs prokaryotes
  • Have different structures and regulatory signals
  • Eukaryotic genomes
  • Are packaged in chromatin & sequestered in a
    nucleus
  • Are larger and have multiple chromosomes
  • Contain mostly non-protein-coding DNA (98-99%)

14
Eukaryotes vs Prokaryotes: Genes & Genomes
  • Eukaryotic genes
  • Are larger and more complex than in prokaryotes
  • Contain introns that are spliced out to
    generate mature mRNAs
  • Often undergo alternative splicing, giving rise
    to multiple RNAs
  • Are transcribed by 3 different RNA polymerases
    (instead of 1, as in prokaryotes)
  • In biology, statements such as these include an
    implicit "usually" or "often"

15
Eukaryotes vs Prokaryotes: Genes & Regulation
  • Primary level of control?
  • Prokaryotes: transcription
  • Eukaryotes: transcription is also important, but
  • Expression is regulated at multiple levels,
  • e.g., RNA processing, transport, stability,
  • protein processing, post-translational
    modification, localization, stability
  • Recent discoveries: small RNAs (miRNA, siRNA)
    play very important regulatory roles in
    eukaryotes, often at post-transcriptional levels

16
Eukaryotic Gene Structure
  • Genes are fragmented, containing
    non-protein-coding introns between the functional
    exons
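To illustrate what splicing does to a fragmented gene, here is a minimal Python sketch. The gene sequence and exon coordinates are made up, and the check uses the common GT...AG rule for intron boundaries (most, though not all, introns follow it).

```python
def splice(gene, exons):
    """Join exon segments; exons are 0-based, half-open (start, end) pairs."""
    return "".join(gene[start:end] for start, end in exons)

def introns_follow_gt_ag(gene, exons):
    """Check that every intron between consecutive exons starts with GT and ends with AG."""
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        intron = gene[prev_end:next_start]
        if not (intron.startswith("GT") and intron.endswith("AG")):
            return False
    return True

# Hypothetical gene: exon1 + intron1 + exon2 + intron2 + exon3 (written in DNA letters)
gene = "ATGGCC" + "GTAAGTTTTCAG" + "GATTTG" + "GTGAGCCCAG" + "TAA"
exons = [(0, 6), (18, 24), (34, 37)]

mature = splice(gene, exons)
print(mature)                              # ATGGCCGATTTGTAA
print(introns_follow_gt_ag(gene, exons))   # True
```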

17
Synthesis & Processing of Eukaryotic mRNA
[Figure: gene in DNA]
18
cDNAs & ESTs
  • cDNA libraries are important for determining gene
    structure & studying the regulation of gene
    expression
  • Isolate RNA (always from a specific
    organism, region, and time point)
  • Convert RNA to complementary DNA
    (with reverse transcriptase)
  • Clone into cDNA vector
  • Sequence the cDNA inserts
  • Short, single-pass sequence reads of cDNAs are
    called ESTs, or Expressed Sequence Tags
  • ESTs are strong evidence for genes
  • Full-length cDNAs can be difficult to obtain
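The "convert RNA to complementary DNA" step is base pairing plus strand reversal. A minimal Python sketch (hypothetical mRNA; it ignores primers, the poly(A) tail and the RNA-removal/second-strand chemistry):

```python
# Base pairing used by reverse transcriptase: RNA template base -> DNA base added
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}

def first_strand_cdna(mrna):
    """First-strand cDNA, written 5'->3', for an mRNA given 5'->3' (antiparallel copy)."""
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna.upper()))

def second_strand(cdna):
    """Complementary DNA strand (5'->3'); matches the mRNA sequence with T in place of U."""
    return "".join(DNA_TO_DNA[base] for base in reversed(cdna.upper()))

mrna = "AUGGCCAAA"          # hypothetical mRNA fragment
cdna = first_strand_cdna(mrna)
print(cdna)                 # TTTGGCCAT
print(second_strand(cdna))  # ATGGCCAAA
```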

19
UniGene: unique genes via ESTs
  • Find UniGene at NCBI:
  • www.ncbi.nlm.nih.gov/UniGene
  • UniGene clusters contain many ESTs
  • UniGene data come from many cDNA libraries.
  • When you look up a gene in UniGene, you can
    obtain information re: the level & tissue
    distribution of expression
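Expression information of this kind comes from counting ESTs: if a gene's ESTs are frequent among all ESTs sequenced from a tissue, that suggests relatively high expression there. A minimal Python sketch of that normalization (ESTs per million); the counts and library sizes are hypothetical, and UniGene's actual pipeline is more involved.

```python
from collections import Counter

def ests_per_million(gene_est_counts, library_est_totals):
    """Normalize a gene's EST counts per tissue by the total ESTs sequenced from that tissue."""
    return {tissue: 1e6 * gene_est_counts.get(tissue, 0) / total
            for tissue, total in library_est_totals.items()}

# Hypothetical: the source tissue of each EST in one UniGene-like cluster
cluster_est_tissues = ["muscle"] * 30 + ["heart"] * 12 + ["brain"] * 3
gene_counts = Counter(cluster_est_tissues)

# Hypothetical total numbers of ESTs sequenced from each tissue's cDNA libraries
library_sizes = {"muscle": 150_000, "heart": 120_000, "brain": 600_000, "liver": 200_000}

profile = ests_per_million(gene_counts, library_sizes)
print(profile)   # {'muscle': 200.0, 'heart': 100.0, 'brain': 5.0, 'liver': 0.0}
```

Normalizing matters: without it, tissues with larger libraries would look more highly expressed simply because more ESTs were sequenced from them.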

20
Gene Regulation
  • Eukaryotes vs prokaryotes
  • Prokaryotic operons & promoters
  • Eukaryotic promoters & enhancers
  • Eukaryotic transcription factors
  • Promoters & enhancers
  • What does an RNA polymerase "see"?

21
Prokaryotic Genes & Operons
  • Genes with related functions are often clustered
    in operons (e.g., lac operon)
  • Operons are transcriptionally regulated as a
    single unit - one promoter controls expression of
    several proteins
  • mRNAs produced are polycistronic - one mRNA
    encodes several proteins, i.e., there are
    multiple ORFs, each with AUG (START) & STOP
    codons
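Because a polycistronic mRNA carries several ORFs, finding them is a scan for AUG followed by an in-frame STOP. A minimal Python sketch (hypothetical mRNA; it scans only the given strand, reports the first AUG of each ORF, and ignores ribosome-binding sites, which real bacterial gene finders also use):

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def find_orfs(mrna, min_codons=2):
    """Return sorted (start, end) spans (0-based, half-open, end includes the STOP codon)
    of AUG...STOP open reading frames in all three frames of the given strand."""
    mrna = mrna.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(mrna):
            if mrna[i:i + 3] == "AUG":
                # look for the first in-frame STOP codon downstream of this AUG
                for j in range(i + 3, len(mrna) - 2, 3):
                    if mrna[j:j + 3] in STOP_CODONS:
                        if (j + 3 - i) // 3 >= min_codons:
                            orfs.append((i, j + 3))
                        i = j  # continue scanning after this STOP
                        break
                else:
                    break  # no in-frame STOP: nothing more to find in this frame
            i += 3
    return sorted(orfs)

# Hypothetical polycistronic mRNA: spacer + ORF1 + spacer + ORF2 + trailer
mrna = "GG" + "AUGAAAUAA" + "CC" + "AUGCCCGGGUGA" + "U"
print(find_orfs(mrna))   # [(2, 11), (13, 25)]
```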

22
Prokaryotic promoters
  • RNA polymerase complex recognizes promoter
    sequences located very close to, & on the 5' side
    (upstream) of, the initiation site
  • RNA polymerase complex binds directly to these,
    with no requirement for transcription factors
  • Prokaryotic promoter sequences are highly
    conserved:
  • -10 region
  • -35 region
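The -10 and -35 regions can be searched for directly. This Python sketch uses the well-known E. coli sigma70 consensus sequences (TTGACA at -35, TATAAT at -10, separated by a spacer of roughly 15-19 bp, typically 17); the mismatch threshold and the test sequence are arbitrary choices for illustration, and real promoters rarely match both consensus boxes exactly.

```python
MINUS35 = "TTGACA"   # E. coli sigma70 -35 consensus
MINUS10 = "TATAAT"   # E. coli sigma70 -10 consensus (Pribnow box)

def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def scan_promoter(seq, max_mm=1, spacers=range(15, 20)):
    """Find (pos_-35, pos_-10, total_mismatches) pairs of boxes, each with <= max_mm
    mismatches, separated by one of the allowed spacer lengths; best hits first."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 5):
        mm35 = mismatches(seq[i:i + 6], MINUS35)
        if mm35 > max_mm:
            continue
        for spacer in spacers:
            j = i + 6 + spacer
            if j + 6 > len(seq):
                break
            mm10 = mismatches(seq[j:j + 6], MINUS10)
            if mm10 <= max_mm:
                hits.append((i, j, mm35 + mm10))
    return sorted(hits, key=lambda hit: hit[2])

# Hypothetical sequence with a perfect -35 box, a 17 bp spacer and a perfect -10 box
seq = "CC" + "TTGACA" + "A" * 17 + "TATAAT" + "GG"
print(scan_promoter(seq))   # [(2, 25, 0)]
```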

23
Promoter for prokaryotic RNA polymerase (e.g.,
in the bacterium E. coli)
Brown Fig 9.17
BIOS Scientific Publishers Ltd, 1999
24
Eukaryotic genes
  • Genes with related functions are not usually
    clustered, but share common regulatory regions
    (promoters, enhancers, etc.)
  • Chromatin structure must be right for
    transcription to occur

25
Eukaryotic genes have large & complex regulatory
regions
  • Cis-acting regulatory elements include: promoters,
    enhancers, silencers
  • Trans-acting regulatory factors include: transcription
    factors (TFs), chromatin remodeling complexes, small RNAs

Brown Fig 9.17
BIOS Scientific Publishers Ltd, 1999
26
Eukaryotic Promoters & Enhancers
  • Both promoters & enhancers are binding sites for
    transcription factors
  • Promoters: located relatively close to the
    initiation site
  • (but can be located within the gene,
    rather than upstream!)
  • Enhancers: also required for regulated
    transcription
  • (control expression in specific cell types,
    developmental stages, in response to the
    environment, etc.)
  • RNA polymerase complexes do not specifically
    recognize promoter sequences directly
  • Transcription factors bind first and serve as
    landmarks for recognition by RNA polymerase
    complexes

27
Activators vs Repressors
Regions far from the promoter can act as
"enhancers" or "repressors" of transcription by
serving as binding sites for activator or
repressor proteins (TFs)
[Figure: promoter, gene, enhancer & repressor sites; distance label 100 - 50,000 bp]
  • Activator proteins (TFs) bind to enhancers &
    interact with RNAP to stimulate transcription
  • A repressor prevents binding of the activator
  • Repressors block the action of activators
28
Eukaryotic regulatory regions are complex (they often
contain many different TF binding site motifs!)
Fig 9.13 Mount 2004
29
Eukaryotic genes are transcribed by 3 different
RNA polymerases (regulatory regions & TFs
differ, too)
[Figure: products of the 3 RNA polymerases: rRNA (Pol I); mRNA (Pol II); tRNA, 5S RNA (Pol III)]
Brown Fig 9.18
BIOS Scientific Publishers Ltd, 1999
30
Eukaryotic transcription factors
  • Transcription factors (TFs) are DNA binding
    proteins that also interact with the RNA
    polymerase complex to activate or repress
    transcription
  • TFs contain characteristic DNA binding motifs
  • (these motifs are strings of amino acids
    in TF protein sequences)
  • TFs recognize specific short DNA sequence motifs
    called transcription factor binding sites
  • (these motifs are strings of nucleotides in DNA
    sequences)
  • Several databases for these, e.g., TRANSFAC &
    JASPAR
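TF binding sites are commonly modeled as position weight matrices (PWMs), the kind of representation stored in databases such as JASPAR. This Python sketch builds a log-odds PWM from a made-up count matrix (a pseudocount of 1 and a uniform 0.25 background are assumptions) and reports the best-scoring window in a sequence.

```python
import math

BASES = "ACGT"

def pwm_from_counts(counts, pseudocount=1.0, background=0.25):
    """Convert a count matrix {base: [count at each position]} to log2-odds scores."""
    length = len(counts["A"])
    pwm = []
    for pos in range(length):
        total = sum(counts[b][pos] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((counts[b][pos] + pseudocount) / total / background)
                    for b in BASES})
    return pwm

def score_site(pwm, site):
    """Sum the per-position log-odds scores for one candidate site."""
    return sum(column[base] for column, base in zip(pwm, site))

def best_site(pwm, seq):
    """Return (score, start) of the highest-scoring window in seq."""
    width = len(pwm)
    return max((score_site(pwm, seq[i:i + width]), i)
               for i in range(len(seq) - width + 1))

# Hypothetical count matrix from 10 aligned binding sites (consensus TGCA)
counts = {"A": [0, 0, 1, 9],
          "C": [0, 1, 9, 0],
          "G": [1, 9, 0, 0],
          "T": [9, 0, 0, 1]}
pwm = pwm_from_counts(counts)
score, start = best_site(pwm, "AAAAATGCAAAAA")
print(start, round(score, 2))
```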

31
Zinc finger transcription factors
  • Common in eukaryotic proteins
  • 1% of mammalian genes encode zinc-finger
    proteins
  • In C. elegans, there are 500!
  • Can be used as highly specific DNA binding
    modules
  • Potentially valuable tools for directed genome
    modification (esp. in plants) & human gene therapy

Brown Fig 9.12
BIOS Scientific Publishers Ltd, 1999
32
Gene Prediction
  • Overview of steps & strategies
  • What sequence signals can be used?
  • What other types of information can be used?
  • Algorithms
  • HMMs, Bayesian models, neural nets
  • Gene prediction software
  • 3 major types
  • many, many programs!
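To show the idea behind HMM-based gene finders, here is a minimal Viterbi decoder for a toy 2-state HMM in Python. The states (L = AT-rich, H = GC-rich) and all probabilities are hypothetical; real gene finders use many more states (exons, introns, splice sites, etc.) and richer emission models.

```python
import math

STATES = ("L", "H")   # hypothetical: L = AT-rich segment, H = GC-rich segment
START = {"L": 0.5, "H": 0.5}
TRANS = {"L": {"L": 0.9, "H": 0.1},
         "H": {"L": 0.1, "H": 0.9}}
EMIT = {"L": {"A": 0.4, "T": 0.4, "G": 0.1, "C": 0.1},
        "H": {"A": 0.1, "T": 0.1, "G": 0.4, "C": 0.4}}

def viterbi(seq):
    """Return the most probable state path for seq (computed in log space)."""
    # best[s] = log-probability of the best path that ends in state s at the current base
    best = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backpointers = []
    for base in seq[1:]:
        pointers, new_best = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: best[p] + math.log(TRANS[p][s]))
            new_best[s] = best[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][base])
            pointers[s] = prev
        backpointers.append(pointers)
        best = new_best
    # trace back from the best final state
    state = max(STATES, key=lambda s: best[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))

seq = "ATATATAT" + "GCGCGCGC" + "ATATATAT"   # hypothetical sequence
print(viterbi(seq))   # LLLLLLLLHHHHHHHHLLLLLLLL
```

Working in log space avoids numerical underflow when many small probabilities are multiplied along long sequences.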

33
Overview of gene prediction strategies
What sequence signals can be used?
  • Transcription: TF binding sites, promoter,
    initiation site, terminator, GC islands, etc.
  • Processing signals: splice donor/acceptors,
    polyA signal
  • Translation: start (AUG = Met) & stop (UGA, UAA,
    UAG) codons, ORFs, codon usage
What other types of information can be used?
  • Homology (sequence comparison, BLAST)
  • cDNAs & ESTs (experimental data, pairwise alignment)
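Codon usage is one of the simplest coding-region signals to compute: count the codons in a candidate ORF and compare them with the organism's known preferences. A minimal Python sketch (the ORF is hypothetical, and the comparison against a reference usage table is left out):

```python
from collections import Counter

# The six leucine codons (standard genetic code), used to show a synonymous-codon preference
LEU_CODONS = {"TTA", "TTG", "CTT", "CTC", "CTA", "CTG"}

def codon_usage(cds):
    """Count codons in an in-frame coding sequence (DNA or RNA letters)."""
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return Counter(cds[i:i + 3] for i in range(0, len(cds), 3))

def synonymous_fractions(usage, synonymous_codons):
    """Fraction of each synonymous codon among all uses of that amino acid."""
    total = sum(usage[c] for c in synonymous_codons)
    if total == 0:
        return {}
    return {c: usage[c] / total for c in synonymous_codons if usage[c]}

usage = codon_usage("AUGCUGCUGCUUUAA")   # hypothetical ORF: Met-Leu-Leu-Leu-STOP
print(usage)
print(synonymous_fractions(usage, LEU_CODONS))
```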