Pseudogenes (contd): Evolution and motif finding

Provided by: Saurab5
1
Pseudogenes (contd) Evolution and motif finding
  • Lecture 11/9

2
Quiz Answers
3
Pseudogenes: Coin & Durbin
  • Novel methods for separating pseudogenes from
    functional genes
  • Unprocessed pseudogenes: result of gene duplication
    followed by loss of function of one copy
  • Processed pseudogenes: result of reverse
    transcription of processed mRNA; they lack introns

4
Pseudogene loss
  • If pseudogenes die (are lost) very quickly, we expect
    few pseudogenes in a genome
  • Prokaryotes have few pseudogenes
  • Eukaryotes have many pseudogenes
  • Roughly 20,000 pseudogenes in the human genome

5
Pseudogene detection
  • Detect truncations in genes
  • Ratio of synonymous to non-synonymous
    substitution rates
  • Approach in this paper: pattern of substitution in
    conserved protein domains, with profile HMMs used to
    model the protein domains

6
Program PSILC
  • Given: an alignment A, an unrooted tree T, and a
    profile HMM D representing a protein domain aligned to A
  • Output: for each leaf node n, a score
    representing our belief that the node is a
    pseudogene
  • Assume that the rest of the tree evolves as the
    protein domain would

7
Two scores
  • Score 1: final branch to node n evolved as neutral
    (non-coding) DNA vs. as a protein domain
  • Score 2: final branch to node n evolved as protein-coding
    vs. as a protein domain
  • Each score is a log odds ratio
  • If a node is a pseudogene, it does not have the
    protein domain constraint, so both scores should
    be higher than usual

8
Terminology
  • A: alignment
  • T: tree
  • Xn: row n, i.e., the sequence at node n
  • Xi: column i, i.e., the ith position of all sequences
  • Fig 4 (whiteboard)

9
Terminology
  • Probability that evolution on branch b in the
    tree is due to:
  • neutral DNA: Pnuc(b)
  • protein-coding: Pprot(b)
  • protein domain encoding: Pdom(b)

10
Terminology
  • Cnuc = {Pnuc(bn), Pdom(T\bn)}: neutral DNA on
    bn, otherwise domain encoding
  • Cprot = {Pprot(bn), Pdom(T\bn)}: protein-coding
    on bn, otherwise domain encoding
  • Cdom = Pdom(T) = {Pdom(bn), Pdom(T\bn)}: domain
    encoding on all of T

11
Scores
  • PSILCnuc/dom(n)
  • PSILCprot/dom(n)
  • Each is computed in a manner similar to
    Felsenstein's algorithm
  • Fig 5 (whiteboard)
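One plausible way to write the two scores, assuming each is the log odds of the alignment likelihood under the alternative constraint against the all-domain constraint. The slides only say "log odds ratio", so the exact form below is an assumption, using A, T, Cnuc, Cprot and Cdom as defined in the terminology slides:

```latex
\mathrm{PSILC}_{\mathrm{nuc/dom}}(n) = \log \frac{P(A \mid T, C_{\mathrm{nuc}})}{P(A \mid T, C_{\mathrm{dom}})},
\qquad
\mathrm{PSILC}_{\mathrm{prot/dom}}(n) = \log \frac{P(A \mid T, C_{\mathrm{prot}})}{P(A \mid T, C_{\mathrm{dom}})}
```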

12
Likelihood calculation
  • Compute the probability distribution at parent node pn
    given the entire tree T except node n (assuming
    domain-encoding evolution)
  • Compute the probability of parent pn mutating to leaf
    n under the chosen evolutionary constraint Ck

13
First step: rest of the tree
  • Reroot the tree at parent pn and remove the branch to
    node n. The new tree is T\bn.
  • Fig 6 (whiteboard)
  • Product of two terms:
  • Probability of the leaves of tree T\bn given the root,
    computed with Felsenstein's algorithm
  • Prior probability of the root of T\bn, taken from
    the equilibrium distribution
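A minimal sketch of the two-term product for one alignment column: Felsenstein's pruning gives the probability of the leaves given each root state, and an equilibrium prior weights the root. The Jukes-Cantor model, the nested-list tree encoding and all function names are stand-ins chosen for illustration; they are not from the slides or from PSILC.

```python
import math

BASES = "ACGT"

def jc_transition(t):
    """Jukes-Cantor transition probabilities for branch length t (stand-in model)."""
    same = 0.25 + 0.75 * math.exp(-4.0 * t / 3.0)
    diff = 0.25 - 0.25 * math.exp(-4.0 * t / 3.0)
    return {a: {b: (same if a == b else diff) for b in BASES} for a in BASES}

def conditional_likelihoods(node):
    """Return L[a] = P(leaves below node | node has base a).

    A leaf is a one-character string; an internal node is a list of
    (child, branch_length) pairs. This encoding is hypothetical.
    """
    if isinstance(node, str):
        return {a: 1.0 if a == node else 0.0 for a in BASES}
    result = {a: 1.0 for a in BASES}
    for child, t in node:
        child_l = conditional_likelihoods(child)
        p = jc_transition(t)
        for a in BASES:
            # Sum over the child's possible states, weighted by the branch model.
            result[a] *= sum(p[a][b] * child_l[b] for b in BASES)
    return result

def column_likelihood(root, prior):
    """Prior over the root state times the pruning likelihood, summed over root states."""
    cond = conditional_likelihoods(root)
    return sum(prior[a] * cond[a] for a in BASES)

equilibrium = {a: 0.25 for a in BASES}
tree = [("A", 0.1), ([("A", 0.2), ("G", 0.2)], 0.1)]
print(column_likelihood(tree, equilibrium))
```

Summing the column likelihood over every possible assignment of leaf bases should give 1, which is a quick sanity check on the recursion.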

14
Second step (and part of first): the branch
mutation model
  • $P(x_{\mathrm{child},i} \mid x_{\mathrm{parent},i}, b_{\mathrm{child}}, P_k(b_{\mathrm{child}}))$
  • Phylogenetic models available:
  • neutral DNA evolution (Pnuc): HKY model
  • protein-coding evolution (Pprot): WAG model
  • domain-encoding evolution (Pdom): profile HMM
    match state emission probabilities
  • These give us the rate matrix Q
  • $P_k(t) = \exp(Q r t)$
  • Free rate parameter r
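A small sketch of turning a rate matrix into branch transition probabilities via exp(Qrt), using an HKY-style matrix for the neutral DNA case. The base frequencies, the kappa value and the hand-rolled matrix exponential (Taylor series with scaling and squaring) are illustrative assumptions, not values or code from the paper.

```python
BASES = "ACGT"
PURINES = {"A", "G"}

def hky_rate_matrix(pi, kappa):
    """HKY rate matrix: rate i->j is kappa*pi[j] for transitions, pi[j] for transversions."""
    n = len(BASES)
    q = [[0.0] * n for _ in range(n)]
    for i, a in enumerate(BASES):
        for j, b in enumerate(BASES):
            if i != j:
                is_transition = (a in PURINES) == (b in PURINES)
                q[i][j] = (kappa if is_transition else 1.0) * pi[j]
        q[i][i] = -sum(q[i])  # diagonal makes each row sum to zero
    return q

def matmul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def matrix_exp(m, squarings=10, terms=12):
    """exp(m) by a truncated Taylor series on m / 2**squarings, then repeated squaring."""
    n = len(m)
    scaled = [[v / 2 ** squarings for v in row] for row in m]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in matmul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result

def transition_matrix(q, r, t):
    """P(t) = exp(Q * r * t), with r the free rate parameter."""
    return matrix_exp([[v * r * t for v in row] for row in q])

pi = [0.3, 0.2, 0.2, 0.3]            # assumed frequencies for A, C, G, T
q = hky_rate_matrix(pi, kappa=2.0)   # assumed transition/transversion ratio
for row in transition_matrix(q, r=1.0, t=0.1):
    print([round(v, 4) for v in row])
```

Each row of the result is a probability distribution over the child base, and for long branches every row approaches the equilibrium frequencies.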

15
Tests
  • On human, mouse, rat data
  • PSILCprot/dom outperforms all other methods, including
    PSILCnuc/dom

16
PhyloGibbs
  • Motif finding combined with model of evolution
  • Neutral evolution plus selection operating on
    binding sites
  • Gibbs sampling approach

17
Basics: placing motif windows
ttttCGTGAT-GCGTCGtttttttttt
gagaCGTGATcGCGTCGagaatatata
cccc-------------CCAAGATCAGAccc
aaata------------CCAACATCAGAaaa
Multiple alignment from DIALIGN. Vertically
aligned caps are evolutionarily related bases.
3 legal windows, 1 illegal window. All possible
legal windows are identified in preprocessing.
18
Motif windows
  • A legal window does not have gaps
  • A legal window can span one species or multiple
    species
  • If spanning multiple species, the sites in the
    window are assumed to be evolutionarily related
    binding sites
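A rough sketch of the preprocessing idea: scan an alignment and keep the window start positions where none of the chosen rows contains a gap. This simplifies the real rules (it ignores the upper/lower-case distinction between aligned and unaligned bases), and the function name and toy alignment are made up for illustration.

```python
def legal_windows(rows, width):
    """Start columns where every given row is gap-free across `width` columns.

    Pass a single row for a one-species window, or several rows for a
    window spanning multiple species.
    """
    length = len(rows[0])
    if any(len(r) != length for r in rows):
        raise ValueError("alignment rows must have equal length")
    starts = []
    for start in range(length - width + 1):
        if all("-" not in row[start:start + width] for row in rows):
            starts.append(start)
    return starts

# Toy two-species alignment with one gap in the first row.
alignment = ["ttCGTGAT-GCGtt", "gaCGTGATcGCGag"]
print(legal_windows(alignment, 6))       # windows spanning both species
print(legal_windows(alignment[1:], 6))   # windows in the second species only
```

In the toy alignment, any two-species window that covers the gapped column is rejected, while the ungapped second row alone admits every start position.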

19
Sampling probability
  • For a single-species motif window, sampling
    probability is
  • For a window spanning multiple species, sampling
    probability is

20
Evolutionary model
  • is given by evolutionary model

21
Evolutionary model
  • Evolving binding site must bind the same protein
  • $\Pr(s_1, s_2 \mid W) = \sum_a \Pr(a \mid W) \prod_i \Pr(s_i \mid a, W, t)$
  • Can be generalized to more than two species
    (recursively)
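A hedged sketch of the sum over ancestors, applied column by column to a PWM. The descendant model used here (keep the ancestral base with probability q, otherwise redraw from the PWM column) is an assumption made for illustration, with q standing in for the dependence on divergence time t; the names are hypothetical.

```python
BASES = "ACGT"

def descendant_prob(s, a, w_col, q):
    """Assumed model: keep ancestral base a with prob. q, else redraw from the PWM column."""
    return q * (1.0 if s == a else 0.0) + (1.0 - q) * w_col[s]

def site_prob(sites, pwm, q):
    """Pr(s_1..s_k | W): per column, sum over ancestor a of Pr(a|W) * prod_i Pr(s_i|a,W)."""
    total = 1.0
    for col, w_col in enumerate(pwm):
        col_sum = 0.0
        for a in BASES:
            term = w_col[a]              # Pr(a | W) for this column
            for s in sites:
                term *= descendant_prob(s[col], a, w_col, q)
            col_sum += term
        total *= col_sum
    return total

pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
print(site_prob(["AG", "AG"], pwm, q=0.5))
print(site_prob(["AG", "CG"], pwm, q=0.5))
```

With q = 0 the two sites are independent draws from the PWM, and with q = 1 they must be identical copies of the ancestor; intermediate q interpolates between the two. Passing more than two sites in `sites` covers a star-shaped tree, though not the recursive generalization mentioned on the slide.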

22
Configurations
  • Given a set of sequences, a configuration is a
    set of (legally placed) motif windows
  • AND a color assigned to each window
  • Multiple windows can be assigned the
    same color
  • Different colors represent different
    transcription factors
  • Goal is to find multiple types of motifs
    together, by using different colors
  • Color 0 is background (random) sequence

23
Sampling
  • The algorithm samples configurations from their
    posterior probability distribution
  • P(S_c) is the probability that all windows of
    color c were sampled from the same PWM, though
    that PWM is unknown
  • Integrate over all PWMs
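One standard way to integrate out an unknown PWM is to place a Dirichlet prior on each column, which gives a closed form in gamma functions. The sketch below assumes a symmetric Dirichlet prior with a hypothetical `pseudocount` parameter and single-species sites; whether the original method uses exactly this prior is not stated in the slides.

```python
import math

BASES = "ACGT"

def log_marginal_same_pwm(sites, pseudocount=1.0):
    """log P(sites | one unknown PWM), integrating each column over a Dirichlet prior."""
    width = len(sites[0])
    n = len(sites)
    alpha_sum = pseudocount * len(BASES)
    log_p = 0.0
    for col in range(width):
        counts = {b: 0 for b in BASES}
        for s in sites:
            counts[s[col]] += 1
        # Dirichlet-multinomial marginal for the ordered bases in this column.
        log_p += math.lgamma(alpha_sum) - math.lgamma(alpha_sum + n)
        for b in BASES:
            log_p += math.lgamma(pseudocount + counts[b]) - math.lgamma(pseudocount)
    return log_p

print(math.exp(log_marginal_same_pwm(["A", "A"])))
print(math.exp(log_marginal_same_pwm(["A", "C"])))
```

Sites that agree with each other score higher than sites that disagree, which is what lets the sampler group similar windows under one color without ever fixing a PWM.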

24
Sampling
  • We have to sample configuration C from the
    posterior distribution P(C|S)
  • Markov chain Monte Carlo (MCMC)
  • Take the current configuration C and move
    probabilistically to a new configuration C' such
    that detailed balance holds
  • $P(C \mid S)\,P(C \to C') = P(C' \mid S)\,P(C' \to C)$
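To make detailed balance concrete, here is a sketch that builds Metropolis-Hastings transition probabilities on a tiny state space and checks the balance condition numerically. The three states stand in for hypothetical configurations, and the target and proposal values are arbitrary illustrative numbers.

```python
def metropolis_transition(target, proposal):
    """Metropolis-Hastings transition matrix for a small discrete state space.

    target[i] plays the role of P(C|S); proposal[i][j] is the probability
    of proposing state j from state i.
    """
    n = len(target)
    trans = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and proposal[i][j] > 0.0:
                ratio = (target[j] * proposal[j][i]) / (target[i] * proposal[i][j])
                trans[i][j] = proposal[i][j] * min(1.0, ratio)
        trans[i][i] = 1.0 - sum(trans[i])  # rejected proposals stay put
    return trans

target = [0.5, 0.3, 0.2]
proposal = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
trans = metropolis_transition(target, proposal)

for i in range(3):
    for j in range(3):
        flow_ij = target[i] * trans[i][j]
        flow_ji = target[j] * trans[j][i]
        print(i, j, round(flow_ij, 4), round(flow_ji, 4))
```

The two printed flows match for every pair of states, which is the detailed balance condition; it guarantees that the target distribution is stationary under the chain.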

25
Move set
  • What kinds of moves are allowed?
  • The move set needs to be ergodic, i.e., every
    possible configuration C must be reachable by
    repeated moves
  • Moves are: shift one or more colored windows,
    add or remove a colored window, or recolor a
    randomly chosen window from the current configuration
  • Window shift move: resample the position of a
    window,

26
Anneal and track
  • After many moves, each configuration will be
    sampled as per P(C|S)
  • What to report?
  • Simulated annealing to find the maximum likelihood
    configuration C*
  • sample from P(C|S)^β
  • slowly increase β over time
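A toy sketch of annealing: Metropolis sampling from the posterior raised to a power that grows over time, so the chain gradually concentrates on the highest-scoring state. The integer state space, the quadratic log-posterior, the linear schedule and all names are assumptions for illustration only, and the proposal is assumed symmetric.

```python
import math
import random

def anneal(log_post, propose, start, betas, rng):
    """Metropolis steps targeting P^beta, with beta following the given schedule."""
    current = best = start
    for beta in betas:
        candidate = propose(current, rng)
        log_ratio = beta * (log_post(candidate) - log_post(current))
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            current = candidate
        if log_post(current) > log_post(best):
            best = current  # remember the best state visited so far
    return best

def toy_log_post(x):
    """Unimodal toy log-posterior over the integers 0..20, peaked at 13."""
    return -((x - 13) ** 2) / 4.0

def toy_propose(x, rng):
    """Step left or right; stay put if the step would leave the range (symmetric)."""
    y = x + rng.choice((-1, 1))
    return y if 0 <= y <= 20 else x

steps = 2000
betas = [0.1 + 9.9 * i / (steps - 1) for i in range(steps)]  # slowly increasing
best_state = anneal(toy_log_post, toy_propose, 0, betas, random.Random(0))
print(best_state)
```

Early on, with a small exponent, downhill moves are accepted often and the chain explores; as the exponent grows, downhill moves become rare and the chain settles near the mode.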

27
Tracking
  • Fix the reference configuration C*
  • Continue sampling from P(C|S), and for each
    window w in C, track the probability p(w,c) that
    window w belongs to the same motif (color) as the
    windows in color c of reference configuration C*
  • Whiteboard explanation