Identifying conserved segments in rearranged and divergent genomes

Transcript and Presenter's Notes

Title: Identifying conserved segments in rearranged and divergent genomes


1
Identifying conserved segments in rearranged and
divergent genomes
  • Bob Mau, Aaron Darling, Nicole T. Perna
  • Presented by Aaron Darling

2
Comparing genomic architectures
  • Genome sequence and architecture comparison can
    lead to insight about organismal
    - Evolutionary forces
    - Gene functions
    - Phenotypes
  • Rearrangement, gene gain, loss, and duplication
    obfuscate homology

3
Structure of the bacterial chromosome
Breakpoints of inversions occur at an equal distance
from the origin, which maintains replichore
balance (Tillier and Collins 2000, Ajana et al.
2002). We call such rearrangements symmetric
inversions.
Replication proceeds simultaneously on each
replichore.
A replichore size difference > 20% is selected
against (Guijo et al. 2001).
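The replichore-balance idea above can be sketched in a few lines. This is only an illustration under stated assumptions: positions are integer coordinates on a circular chromosome, the size difference is expressed as a fraction of total chromosome length (the slide does not say what the percentage is relative to), and the function names and the tolerance `tol` are hypothetical.

```python
def circular_offset(pos, origin, genome_len):
    """Signed offset of pos from the origin, in the range (-L/2, L/2]."""
    d = (pos - origin) % genome_len
    return d - genome_len if d > genome_len / 2 else d


def is_symmetric_inversion(bp1, bp2, origin, genome_len, tol):
    """True if the breakpoints flank the origin at (nearly) equal distances.

    tol is an assumed tolerance in base pairs, not a value from the slides.
    """
    o1 = circular_offset(bp1, origin, genome_len)
    o2 = circular_offset(bp2, origin, genome_len)
    on_opposite_sides = o1 * o2 < 0
    return on_opposite_sides and abs(abs(o1) - abs(o2)) <= tol


def replichore_imbalance(genome_len, origin, terminus):
    """Replichore size difference as a fraction of chromosome length (assumed convention)."""
    arm = (terminus - origin) % genome_len
    return abs(arm - (genome_len - arm)) / genome_len


print(is_symmetric_inversion(100, 900, origin=0, genome_len=1000, tol=10))
print(replichore_imbalance(1000, origin=0, terminus=700))
```

An inversion whose breakpoints sit at equal distances on either side of the origin leaves both origin-to-terminus arms the same length, which is why it preserves the balance.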
4
A dot plot: each dot is a pairwise (or n-way)
local alignment
5
Blue: same strand. Red: opposite strand.
Goal: identify local homologous (orthologous)
segments
6
Tools for segmental homology detection
  • GRIMM-Synteny (Pevzner et al. 2003, Bourque et
    al. 2004)
    - cluster markers within a fixed distance
  • FISH (Vision et al. 2003)
    - find statistically over-represented
      clusters of markers within a fixed distance
  • LineUp (Hampson et al. 2003)
    - find collinear runs of markers among
      pairs of genomes, allowing degeneracy
  • Some alignment tools
    - Shuffle-LAGAN (Brudno et al. 2003)
    - Mauve (Darling et al. 2004)
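To make "cluster markers within a fixed distance" concrete, here is a minimal single-linkage sketch. It is not the GRIMM-Synteny or FISH implementation; the marker representation (a pair of positions, one per genome), the function name, and the `max_gap` parameter are assumptions made for illustration.

```python
def cluster_markers(markers, max_gap):
    """Group markers (pos_in_A, pos_in_B) that chain together within max_gap in both genomes."""
    parent = list(range(len(markers)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, (a1, b1) in enumerate(markers):
        for j in range(i + 1, len(markers)):
            a2, b2 = markers[j]
            if abs(a1 - a2) <= max_gap and abs(b1 - b2) <= max_gap:
                parent[find(i)] = find(j)  # link the two clusters

    clusters = {}
    for i, marker in enumerate(markers):
        clusters.setdefault(find(i), []).append(marker)
    return sorted(clusters.values())


toy = [(10, 110), (20, 120), (30, 130), (500, 40), (510, 50), (900, 900)]
for cluster in cluster_markers(toy, max_gap=15):
    print(cluster)
```

With a purely distance-based rule like this one, the outcome depends entirely on the choice of `max_gap`: two short runs separated by a long unrelated insertion end up in different clusters.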

9
Small segments separated by lineage-specific
regions may not be detected by methods based
strictly on distance.
Key idea: use a combination of conserved marker
order (collinearity) and alignment score
10
Finding conserved regions: a pseudo-Gibbs
sampler method
  • Given: a set ℳ of M monotypic markers
  • Do: assign a posterior probability that any
    marker m ∈ ℳ is part of a conserved region
  • Use MCMC methodology to sample the frequency of
    each marker's inclusion in high-scoring
    configurations
  • Use that frequency as an estimate of the posterior
    probability

11
Finding conserved regions: a pseudo-Gibbs
sampler method
  • Define a configuration X as a vector of M
    binary random variables
    e.g. X = (X_1, X_2, ..., X_M)
  • A configuration value x_j maps marker m_j to either
    signal (1) or noise (0)
    e.g. x = (0,1,0,0,1,1,...,1,0)
  • There are 2^M possible configurations
  • Run a Markov chain of length N over configuration
    space: (X^(1), X^(2), ..., X^(N))
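A tiny sketch of the configuration space, assuming a configuration is stored as a tuple of 0/1 labels (the function name is hypothetical). Exhaustive enumeration is only practical for very small M, which is the motivation for sampling with a Markov chain instead.

```python
from itertools import product


def all_configurations(m):
    """Enumerate every signal/noise labelling of m markers (feasible only for small m)."""
    return list(product((0, 1), repeat=m))


configs = all_configurations(3)
print(len(configs))   # 2**3 = 8 configurations
print(2 ** 1000 > 10 ** 300)  # a genome-scale marker set is far beyond enumeration
```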

12
Sample possible marker configurations
  • Start with a random initial configuration, then
  • Select a marker and sample whether it should be a 0
    or 1 based on the current configuration

w_v is the score of marker v; x_v is its
configuration value (0 or 1)
13
Transform LCB score to probability
  • The scale parameter c is used in tandem with the
    sigmoid to map a marker's score to a probability
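The transcript does not preserve the slide's formula, so the following is one plausible reading rather than the authors' exact transform: a logistic sigmoid applied to the score divided by the scale parameter c. The function name is hypothetical.

```python
import math


def score_to_probability(score, c):
    """Logistic sigmoid of score / c (assumed form; the slide's equation is not in the transcript)."""
    z = score / c
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow for very negative scores
    return e / (1.0 + e)


for c in (75, 30, 20):
    print(c, round(score_to_probability(40.0, c), 3))
```

Under this assumed form, a score of zero always maps to 0.5, and a smaller c makes the mapping steeper, so the same score is pushed closer to 0 or 1.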

14
Sample a new value for x_j
  • Set x_j to 1 with probability given by the
    marker's score transformation
  • First allow the chain a burn-in period, then
    continue for many iterations
  • The frequency, or posterior probability, of m_j
    is the fraction of post-burn-in iterations in which x_j = 1
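The sampling loop can be sketched end to end as follows. This is a toy under loudly stated assumptions, not the authors' method: the scoring function `marker_score` is entirely hypothetical (it rewards a marker when its nearest signal-labelled neighbours in genome A order have nearby ranks in genome B), the logistic transform is an assumed form, and markers are visited in systematic sweeps rather than by whatever selection rule the original uses. The parameters `window`, `reward`, `penalty`, `burn_in`, and `n_iter` are illustrative only.

```python
import math
import random


def sigmoid(score, c):
    """Assumed logistic transform of score / c."""
    z = score / c
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def marker_score(j, x, rank_b, window=2, reward=30.0, penalty=15.0):
    """Hypothetical score: reward each nearest signal neighbour that is collinear with marker j."""
    score = -penalty
    for step in (-1, 1):
        k = j + step
        while 0 <= k < len(x) and x[k] == 0:  # skip markers currently labelled noise
            k += step
        if 0 <= k < len(x) and abs(rank_b[k] - rank_b[j]) <= window:
            score += reward
    return score


def pseudo_gibbs(rank_b, c=20.0, burn_in=200, n_iter=2000, seed=0):
    """Return, per marker, the fraction of post-burn-in sweeps in which it was labelled signal."""
    rng = random.Random(seed)
    m = len(rank_b)
    x = [rng.randint(0, 1) for _ in range(m)]  # random initial configuration
    counts = [0] * m
    for sweep in range(burn_in + n_iter):
        for j in range(m):
            p = sigmoid(marker_score(j, x, rank_b), c)
            x[j] = 1 if rng.random() < p else 0
        if sweep >= burn_in:
            for j in range(m):
                counts[j] += x[j]
    return [count / n_iter for count in counts]


# Marker 5 (rank 17 in genome B) interrupts an otherwise collinear run.
rank_b = [0, 1, 2, 3, 4, 17, 5, 6, 7, 8, 9]
print([round(p, 2) for p in pseudo_gibbs(rank_b)])
```

Because each marker's conditional probability comes from an ad hoc score rather than from a full joint distribution, the chain is a "pseudo" Gibbs sampler, and the resulting frequencies are estimates only in that loose sense.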

15
Our method assigns each marker a posterior probability (p.p.)
  • A threshold on the p.p. separates signal from noise
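A sketch of how thresholded posterior probabilities might be turned into segments. The grouping rule is an assumption made for illustration: markers at or above the threshold are kept, and a new segment starts whenever consecutive kept markers differ by more than `window` ranks in the second genome. Function and parameter names are hypothetical.

```python
def segments_from_posteriors(pp, rank_b, threshold, window=2):
    """Split signal markers (pp >= threshold) into runs that are collinear in genome B."""
    signal = [j for j, p in enumerate(pp) if p >= threshold]
    segments = []
    for j in signal:
        if segments and abs(rank_b[j] - rank_b[segments[-1][-1]]) <= window:
            segments[-1].append(j)   # extends the current collinear run
        else:
            segments.append([j])     # starts a new segment
    return segments


pp = [0.9, 0.8, 0.2, 0.85, 0.9, 0.7, 0.1, 0.95]
rank_b = [0, 1, 9, 2, 3, 20, 4, 21]
segs = segments_from_posteriors(pp, rank_b, threshold=0.5)
print(segs)
print([len(s) for s in segs])  # markers per segment
```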

16
Our method assigns each marker a posterior probability (p.p.)
  • Using a threshold of 0.5, the X pattern appears

18
Application to 4 divergent Streptococcus
  • Markers are reciprocal best blastp hits of ORFs
    among
    - S. agalactiae
    - S. pyogenes
    - S. pneumoniae
    - S. mutans

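Reciprocal best hits can be sketched as follows, assuming the best blastp hit for every ORF has already been parsed into dictionaries keyed by ordered genome pair. The data layout, function names, and the all-pairs requirement for n-way markers are assumptions; the slides only say the hits are reciprocal "among" the four species.

```python
def reciprocal_best_hits(best_ab, best_ba):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    return {(a, b) for a, b in best_ab.items() if best_ba.get(b) == a}


def n_way_markers(best, genomes):
    """ORF tuples (one per genome) that are reciprocal best hits for every genome pair."""
    ref = genomes[0]
    markers = []
    for orf in best[(ref, genomes[1])]:
        group = {ref: orf}
        for g in genomes[1:]:
            group[g] = best[(ref, g)].get(orf)
        complete = all(group[g] is not None for g in genomes)
        if complete and all(
            best[(g, h)].get(group[g]) == group[h]
            for g in genomes for h in genomes if g != h
        ):
            markers.append(tuple(group[g] for g in genomes))
    return markers


# Toy best-hit tables for three hypothetical genomes A, B, C.
best = {
    ("A", "B"): {"a1": "b1", "a2": "b2"}, ("B", "A"): {"b1": "a1", "b2": "a2"},
    ("A", "C"): {"a1": "c1", "a2": "c2"}, ("C", "A"): {"c1": "a1", "c2": "a9"},
    ("B", "C"): {"b1": "c1", "b2": "c2"}, ("C", "B"): {"c1": "b1", "c2": "b2"},
}
print(reciprocal_best_hits(best[("A", "B")], best[("B", "A")]))
print(n_way_markers(best, ["A", "B", "C"]))
```

Requiring reciprocity in every pair is the strictest reading; it discards any ORF family with a paralog that wins even one comparison, which keeps the markers monotypic at the cost of fewer markers.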
19
What is the distribution of segment sizes in
Streptococci?
  • As resolution increases, large segments are
    broken up by smaller segments

Resolution   c    Threshold   Total segments
Low          75   0.45        26
Medium       30   0.45        32
High-1       20   0.50        57
High-2       20   0.30        72

[Figure: segment sizes (markers per segment)]
22
What was the ancestral genome organization?
  • Try building an inversion phylogeny by applying
    GRIMM and MGR to the 57 high-resolution segments
  • Failed: the suggested rearrangements do not
    maintain replichore balance
  • Try using the 26 larger, low-resolution segments
  • Surprise! A success

23
Transforming S. agalactiae into S. pyogenes
24
Conclusions
  • The pseudo-Gibbs sampler method detects
    collinear segments at a variety of scales
  • It would be nice to have an inversion phylogeny
    inference tool that accounts for replichore
    balance!
  • Large segments in Streptococci appear to
    rearrange by symmetric inversions
  • Small segments? An open problem.

25
Future directions
  • Can a biologically relevant full joint
    probability distribution be expressed over
    configurations?
    - If so, then a true Gibbs sampler could be
      employed
  • Problems
    - Some rearrangements occur with different
      frequencies (e.g. symmetric inversions about the
      terminus vs. IS-mediated translocation)
    - Distinguishing rearrangement by horizontal
      transfer (H.T.), gene duplication and subsequent
      loss, symmetric inversion, etc.

26
Acknowledgements
  • Bob Mau did most of this work
  • My Ph.D. advisers
    - Nicole T. Perna and Mark Craven
  • Others who have contributed insight
    - Jeremy Glasner, Fred Blattner, Eric Cabot
  • GEL@UW-Madison
  • Grant money
    - NIH Grant GM62994-02
    - NLM Training Grant 5T15M007359-03 to A.E.D.