Jeanette P. Schmidt - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Jeanette P. Schmidt


1
Orthologs, paralogs and splice variants -- can
we recognize experimental artifacts
algorithmically?
  • Jeanette P. Schmidt
  • Stanford University

2
Computational Biology
  • Interplay between experiments and computations

3
Sometimes experimental data (especially if
obtained in high throughput) can be dirty
Sometimes -- one can detect bad experimental data
Normally try to correct the model
4
Overview
  • Context: identify and characterize (protein-coding)
    genes and their function across organisms
  • Types of experimental data
  • Addressing bad data
  • Definitions
  • Orthologs
  • Paralogs
  • Splice variants
  • Why they pose computational challenges

5
Data
  • Fully sequenced genomes from several organisms
  • ESTs (expressed sequence tags) obtained through
    high-throughput methods

An Expressed Sequence Tag is a portion of an
entire gene that can be used to help identify
unknown genes and to map their positions within
a genome.
6
The Problem
  • Identify the important (protein-coding) genes
    and their function across organisms
  • Define paralogs, pseudogenes and orthologs, and why
    they make computation difficult
  • Identify pseudogenes -- they look computationally like
    genes but are not functional -- they do not make
    proteins
  • How can computation guide experiments?

7
ESTs and gene discovery
An Expressed Sequence Tag is a portion of an
entire gene that can be used to help identify
unknown genes and to map their positions within a
genome.
8
Where do the ESTs come from?
Extended exon or artifact?
Splice variant or artifact?
Read through or artifact?
9
Characteristics
No stop codon in an intron of 90 nucleotides is
quite common: (1 - 3/64)^30 ≈ 0.24
  • Read through
      • Stop codon?
      • Length of read through? (statistical estimates)
      • The shorter, the more likely to get a read through
        w/o stop codon
      • How many times observed (different mRNA pools -
        different tissues)
      • Present in other species (ortholog)
  • Extended exon
      • Donor/acceptor site
      • Present in other species (ortholog)
  • Splice variant?
      • Stop codon?
      • Present in other species (ortholog)
      • Observed in certain tissues (more than once)
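
The stop-codon estimate on this slide can be reproduced with a short calculation: 90 nucleotides are 30 codons, and 3 of the 64 codons are stop codons. The sketch below assumes independent, uniformly distributed codons, which real sequence only approximates; the function name is illustrative and not from the talk.

```python
def prob_no_stop(length_nt: int, stop_codons: int = 3, total_codons: int = 64) -> float:
    """Probability that an in-frame stretch of length_nt nucleotides has no stop codon.

    Assumes independent, uniformly distributed codons (an idealization).
    """
    n_codons = length_nt // 3  # only whole codons are counted
    return (1 - stop_codons / total_codons) ** n_codons


# Shorter read-throughs are more likely to lack a stop codon by chance.
for length in (30, 90, 300):
    print(length, round(prob_no_stop(length), 3))
```

Under this model the probability falls geometrically with length, which is why a short read through without a stop codon is weak evidence on its own.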

10
Should we bother with ESTs?
  • "A gene atlas of the mouse and human
    protein-encoding transcriptomes"
    (PNAS, 2004) - (Hogenesch)
  • "We find that although no single line of evidence
    is universally predictive of expression,
    EST evidence has the most predictive value"

11
Determining accuracy of splice variants by EST
  • Most common approach
  • Look at tissue specificity of exon in transcript

12
When looking at tissue specific splicing, why
look at the splicing event rather than the exon?
[Figure: two panels, Exon Mapping and Splice Mapping, placing EST A (Heart), EST B (Liver), EST C (Brain) and EST D (Brain) on transcripts FL1 and FL2]
Mapping ESTs to exons doesn't really distinguish
the two variants
Looking at the ESTs which share the same splice
site shows brain-specific splicing
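
The contrast between the two mappings can be sketched on toy data. Everything below is hypothetical: the exon lists are assumed for illustration (EST D is assumed to skip the middle exon) and are not read off the original figure.

```python
from collections import defaultdict

# Hypothetical toy data: (EST name, tissue, ordered exons the EST covers).
ESTS = [
    ("EST A", "Heart", [1, 2, 3]),
    ("EST B", "Liver", [1, 2, 3]),
    ("EST C", "Brain", [1, 2, 3]),
    ("EST D", "Brain", [1, 3]),  # assumed to skip exon 2
]


def tissues_by_exon(ests):
    """Exon mapping: tissues of every EST that covers a given exon."""
    tissues = defaultdict(set)
    for _name, tissue, exons in ests:
        for exon in exons:
            tissues[exon].add(tissue)
    return dict(tissues)


def tissues_by_junction(ests):
    """Splice mapping: tissues per splice junction (pair of consecutive exons in an EST)."""
    tissues = defaultdict(set)
    for _name, tissue, exons in ests:
        for junction in zip(exons, exons[1:]):
            tissues[junction].add(tissue)
    return dict(tissues)


exon_view = tissues_by_exon(ESTS)
junction_view = tissues_by_junction(ESTS)
print(exon_view)      # every exon is seen in all three tissues
print(junction_view)  # the exon 1 -> exon 3 junction is seen only in Brain
```

In this toy set, counting by exon hides the skipping event because exons 1 and 3 are shared by all ESTs, while counting by junction isolates it.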
13
Tissue specific splicing
  • By changing the metric slightly we get a significant
    increase.
  • Number of exons: k
  • Number of potential splice junctions: O(k^2)
  • Number of splice junctions in practice: O(k)
  • Note that even with O(k) splice junctions ->
    potential number of splice variants is 2^k.
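
The counts above can be checked for small k. This is a minimal sketch under simple assumptions: a potential junction is any pair of exons i < j, and a potential variant is any subset of exons (an upper bound that ignores biological constraints). The function names are illustrative.

```python
from itertools import combinations


def potential_junctions(k: int) -> int:
    """Exon pairs (i < j) that could be spliced together: k(k-1)/2, i.e. O(k^2)."""
    return sum(1 for _ in combinations(range(k), 2))


def potential_variants(k: int) -> int:
    """Upper bound on variants if each exon is independently kept or skipped: 2^k."""
    return 2 ** k


for k in (3, 10, 20):
    print(k, potential_junctions(k), potential_variants(k))
```

The junction count grows quadratically while the variant count grows exponentially, so junctions are the more tractable unit to tabulate against tissues.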

14
When looking at tissue specific splicing, why
look at the splicing event rather than the exon?
[Figure: two panels, Exon Mapping and Splice Mapping, placing EST A (Heart), EST B (Liver), EST C (Brain), EST D (Brain) and EST E (Brain) on transcripts FL1, FL2 and FL3]
15
(No Transcript)
16
Comparison with other species
  • Gene duplication

If A2 is a functional gene, A1 and A2 are
paralogs -- A2 is the result of a duplication
of A1
If A2 is not functional -> A2 is a pseudogene
17
Events do not always happen in the order we
prescribe
18
Events do not always happen in convenient order
19
Events do not always happen in convenient order
Speciation 1: A1 and B1 are orthologs
Gene duplication: B1 and B2 are paralogs
[Figure: gene tree relating A1; B1 and B2; and C1, C2 and C3]
Now what?
20
Ortholog identification
  • Combination of methods used to identify
    orthologs
  • Homology: use reciprocal best hit (needs the complete
    set of genes to identify the best hit)
      • Breaks in the presence of paralogs -> more than one
        best hit
  • Syntenic confirmation
      • Provides additional solid evidence for an ortholog
        relationship but requires a sequenced genome
      • Cannot distinguish between paralogs -> they are
        adjacent
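
The reciprocal best hit rule, and how it breaks with paralogs, can be sketched as follows. The similarity scores are hypothetical (higher means more similar), and the tie-breaking behavior is a property of this sketch, not of any particular tool.

```python
def best_hit(targets):
    """Return the single highest-scoring target; ties go to the first one listed."""
    return max(targets, key=targets.get)


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return (a, b) pairs where b is a's best hit and a is b's best hit."""
    pairs = []
    for a, targets in a_vs_b.items():
        if not targets:
            continue
        b = best_hit(targets)
        back = b_vs_a.get(b)
        if back and best_hit(back) == a:
            pairs.append((a, b))
    return pairs


# Hypothetical scores: B1 and B2 are paralogs, equally similar to A1.
A_VS_B = {"A1": {"B1": 95.0, "B2": 95.0}}
B_VS_A = {"B1": {"A1": 95.0}, "B2": {"A1": 95.0}}

rbh = reciprocal_best_hits(A_VS_B, B_VS_A)
print(rbh)  # only one of the two equally good pairs is reported
```

Because a strict best hit keeps exactly one target, B2 is dropped even though it scores as well as B1, which is the failure mode the slide describes.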

21
Rat and Mouse Synteny

22
What's the solution?
  • Change the metric
  • Allow multiple orthologs per gene
  • Extend the notion of best reciprocal hit by including
    the e-neighborhood.
  • 2 options ->
      • allow an ortholog only if the e-neighborhood has size 1
      • include the entire neighborhood
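
One way to read the two options is sketched below. The slide does not define the e-neighborhood, so this sketch assumes it means all hits whose score is within a tolerance eps of the best score; that definition, the scores and the function names are all assumptions for illustration.

```python
def neighborhood(targets, eps):
    """Targets scoring within eps of the best score (assumed definition)."""
    if not targets:
        return set()
    top = max(targets.values())
    return {t for t, s in targets.items() if top - s <= eps}


def orthologs(a_vs_b, b_vs_a, eps, unique_only):
    """Reciprocal hits using neighborhoods instead of a single best hit.

    unique_only=True  -> option 1: keep a pair only if both neighborhoods have size 1.
    unique_only=False -> option 2: keep every reciprocal pair in the neighborhoods.
    """
    pairs = []
    for a, targets in a_vs_b.items():
        hood_a = neighborhood(targets, eps)
        for b in sorted(hood_a):
            hood_b = neighborhood(b_vs_a.get(b, {}), eps)
            if a not in hood_b:
                continue
            if unique_only and (len(hood_a) != 1 or len(hood_b) != 1):
                continue
            pairs.append((a, b))
    return pairs


# Hypothetical scores: A1 has two near-equal hits (paralogs B1, B2); A2 has one.
A_VS_B = {"A1": {"B1": 95.0, "B2": 93.0}, "A2": {"B3": 80.0}}
B_VS_A = {"B1": {"A1": 95.0}, "B2": {"A1": 93.0}, "B3": {"A2": 80.0}}

strict = orthologs(A_VS_B, B_VS_A, eps=5.0, unique_only=True)
inclusive = orthologs(A_VS_B, B_VS_A, eps=5.0, unique_only=False)
print(strict)
print(inclusive)
```

Option 1 trades coverage for cleaner one-to-one calls, while option 2 keeps the ambiguous paralog pairs visible so they can be examined rather than silently dropped.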

23
Summary
  • Cleaner ortholog identification
  • Better picture of when 2 orthologs might be
    present in other species
  • Important for pre-clinical experiments (off
    targets)

24
Example: Incyte hand-edited lipid kinase
  • Protein with strong similarity to
    phosphatidylinositol-4-phosphate 5-kinase type
    III (mouse Pip5k3), member of the
    phosphatidylinositol-4-phosphate 5-kinase and
    TCP-1 or cpn60 families, contains a domain of
    unknown function and a FYVE zinc finger
  • Protein of unknown function, has strong
    similarity to a region of
    phosphatidylinositol-4-phosphate 5-kinase type
    III (mouse Pip5k3), which is a lipid kinase that
    binds to phosphatidylinositol 3-phosphate and may
    act in endosomal trafficking
  • Protein containing a domain of unknown function,
    has strong similarity to a region of
    phosphatidylinositol-4-phosphate 5-kinase type
    III (mouse Pip5k3), which is a lipid kinase that
    binds to phosphatidylinositol 3-phosphate
25
FL lipid kinase
26
Acknowledgements
  • Kristian Stevens
  • Mirjana Marjanovic
  • Jim Wingrove
  • Ursula Vitt

27
(No Transcript)