Picking Alignments from Steiner Trees - PowerPoint PPT Presentation

Provided by: hann90
1
Picking Alignments from (Steiner) Trees
Lior Pachter
Fumei Lam
Marina Alexandersson
2
Alignment
ATCG--G
A-CGTCA
biologically meaningful
Steiner Networks
Pair Hidden Markov Models
fast alignments based on HMM structure
3
Some basic definitions. Let $G$ be a graph and $S \subseteq V(G)$. A k-spanner
for $S$ is a subgraph $G' \subseteq G$ such that for any $u, v \in S$ the length
of the shortest path between $u$ and $v$ in $G'$ is at most $k$ times the
distance between $u$ and $v$ in $G$, i.e. $d_{G'}(u, v) \le k \, d_G(u, v)$.
Let $V(G) = \mathbb{R}^2$ and let $E(G)$ be the horizontal and vertical line
segments. A Manhattan network is a 1-spanner for a set $S$ of points in
$\mathbb{R}^2$. Vertices in the Manhattan network that are not in $S$ are
called Steiner points.
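The spanner condition $d_{G'}(u, v) \le k \, d_G(u, v)$ can be checked directly by running Dijkstra's algorithm from every point of $S$ in both graphs. What follows is a minimal sketch. It assumes graphs are given as adjacency dicts with nonnegative edge weights and that vertex labels are mutually comparable. The helper names are illustrative only.

```python
import heapq


def undirected(edges):
    """Build an adjacency dict from (u, v, weight) triples."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    return adj


def dijkstra(adj, src):
    """Shortest-path distances from src (nonnegative weights)."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def is_k_spanner(G, H, S, k):
    """True if d_H(u, v) <= k * d_G(u, v) for all u, v in S.

    H is assumed to be a subgraph of G; this is not checked here.
    """
    for u in S:
        dG = dijkstra(G, u)
        dH = dijkstra(H, u)
        for v in S:
            if v != u and dH.get(v, float("inf")) > k * dG.get(v, float("inf")) + 1e-9:
                return False
    return True
```

For example, dropping one edge of a unit triangle leaves a 2-spanner but not a 1-spanner.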
4
Example
$S$ = red points
5
Gudmundsson-Levcopoulos-Narasimhan 2001: find the shortest Manhattan network
connecting the points.
4-approximation in $O(n^3)$ and 8-approximation in $O(n \log n)$
6
Gudmundsson-Levcopoulos-Narasimhan 2001, proof outline
1. It suffices to work on the Hanan grid.
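The Hanan grid of a point set is the grid formed by the horizontal and vertical lines through every point. Here is a small sketch of its construction. It assumes points are (x, y) tuples, and the function name is hypothetical.

```python
def hanan_grid(points):
    """Return (vertices, edges) of the Hanan grid of a planar point set.

    Vertices are all (x, y) with x and y taken from the input coordinates;
    edges join consecutive grid vertices and carry their length.
    """
    xs = sorted({x for x, _ in points})
    ys = sorted({y for _, y in points})
    vertices = [(x, y) for x in xs for y in ys]
    edges = []
    for x in xs:  # vertical segments between consecutive y-coordinates
        for y0, y1 in zip(ys, ys[1:]):
            edges.append(((x, y0), (x, y1), y1 - y0))
    for y in ys:  # horizontal segments between consecutive x-coordinates
        for x0, x1 in zip(xs, xs[1:]):
            edges.append(((x0, y), (x1, y), x1 - x0))
    return vertices, edges
```

Three points in general position give a 3 x 3 grid with 9 vertices and 12 edges.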
7
Gudmundsson-Levcopoulos-Narasimhan 2001, proof outline
2. Construct local slides (for all four orientations).
$A(v) = \{u : v \text{ is the topmost node below and to the left of } u\}$
(Figure: a slide at node v.)
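The definition of $A(v)$ here is read from a damaged slide. The sketch below computes it under that reading and rests on three assumptions:

- "Below and to the left" means strictly smaller x and y.
- Nodes are the input points.
- Ties in y are broken by the larger x.

It is a brute-force $O(n^2)$ illustration with hypothetical names, not the construction from the paper.

```python
from collections import defaultdict


def topmost_below_left(u, points):
    """Topmost point strictly below and to the left of u, or None.

    Assumption: ties in y are broken by taking the rightmost point.
    """
    candidates = [p for p in points if p[0] < u[0] and p[1] < u[1]]
    if not candidates:
        return None
    return max(candidates, key=lambda p: (p[1], p[0]))


def slide_sets(points):
    """Assumed reading: A(v) = {u : v is the topmost point below-left of u}."""
    A = defaultdict(list)
    for u in points:
        v = topmost_below_left(u, points)
        if v is not None:
            A[v].append(u)
    return dict(A)
```

Only one of the four orientations is shown. The other three follow by reflecting the coordinates.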
8
Gudmundsson-Levcopoulos-Narasimhan 2001, proof outline
3. Solve each slide.
The minimum slide arborescence problem
Lingas-Pinter-Rivest-Shamir 1982: $O(n^3)$ optimal solution using dynamic programming
9
Gudmundsson-Levcopoulos-Narasimhan 2001, proof outline
4. Proof of correctness
(Figure: points a, b, u, v.)
10
What is an alignment?
ATCG--GACATTACC-AC
AC-GTCA-GATTA-CAAC
11
Pair HMMs
Simple sequence-alignment PHMM
M: (mis)match, X: insert seq1, Y: insert seq2
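Each alignment column corresponds to one hidden state of this simple PHMM. The sketch below maps two gapped alignment rows to their state path. It assumes '-' is the gap character; the function name is hypothetical.

```python
def state_path(row1, row2, gap="-"):
    """Hidden-state path of a pairwise alignment under the simple PHMM.

    M = (mis)match column, X = seq1 character against a gap (insert seq1),
    Y = gap against a seq2 character (insert seq2).
    """
    if len(row1) != len(row2):
        raise ValueError("alignment rows must have equal length")
    states = []
    for a, b in zip(row1, row2):
        if a != gap and b != gap:
            states.append("M")
        elif a != gap:
            states.append("X")
        elif b != gap:
            states.append("Y")
        else:
            raise ValueError("column with two gaps")
    return "".join(states)
```

Applied to the alignment ATCG--G / A-CGTCA from the start of the talk, it yields MXMMYYM.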
12
Pair HMMs
Transition probabilities
Hidden sequence: M M X Y M Y M
Output probabilities
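The joint probability of a hidden path and the emitted pair of sequences is a product: one transition probability and one output probability per alignment column. The sketch below computes it in log space. It rests on these assumptions:

- The parameter dicts (start, trans, emit_m, emit_x, emit_y) are hypothetical.
- No end state is modelled.
- '-' marks a gap.

```python
import math


def alignment_log_prob(row1, row2, start, trans, emit_m, emit_x, emit_y, gap="-"):
    """Log of P(hidden path, emitted columns) for a pairwise alignment.

    start[s]: probability of starting in state s; trans[s][t]: s -> t;
    emit_m[(a, b)], emit_x[a], emit_y[b]: output probabilities of M, X, Y.
    """
    logp = 0.0
    prev = None
    for a, b in zip(row1, row2):
        if a != gap and b != gap:
            state, out = "M", emit_m[(a, b)]
        elif a != gap:
            state, out = "X", emit_x[a]
        else:  # assumes the column is not gap-gap
            state, out = "Y", emit_y[b]
        step = start[state] if prev is None else trans[prev][state]
        logp += math.log(step) + math.log(out)
        prev = state
    return logp
```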
13
Using the Pair HMM
In practice, we have the observed sequences
ATCGG and ACGTCA
for which we wish to infer the underlying hidden
states.
One solution: among all possible sequences of
hidden states, determine the most likely one (Viterbi
algorithm).
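The sketch below implements the Viterbi algorithm for the three-state PHMM, in log space with traceback. It rests on these assumptions:

- The parameter dicts (start, trans, emit_m, emit_x, emit_y) are hypothetical.
- There is no explicit end state.

This is the textbook dynamic program over the full $(n+1) \times (m+1)$ grid, not a network-restricted variant.

```python
import math

NEG_INF = float("-inf")


def safe_log(p):
    """log(p) with log(0) = -inf, so forbidden moves are never chosen."""
    return math.log(p) if p > 0 else NEG_INF


def pair_viterbi(s1, s2, start, trans, emit_m, emit_x, emit_y):
    """Most likely state path of a pair HMM; returns (log prob, row1, row2).

    M emits (s1[i], s2[j]); X emits s1[i] against a gap; Y emits s2[j].
    """
    n, m = len(s1), len(s2)
    if n == 0 and m == 0:
        return 0.0, "", ""
    states = ("M", "X", "Y")
    move = {"M": (1, 1), "X": (1, 0), "Y": (0, 1)}  # advance in (s1, s2)
    V = {s: [[NEG_INF] * (m + 1) for _ in range(n + 1)] for s in states}
    back = {s: [[None] * (m + 1) for _ in range(n + 1)] for s in states}

    for i in range(n + 1):
        for j in range(m + 1):
            for s in states:
                di, dj = move[s]
                pi, pj = i - di, j - dj
                if pi < 0 or pj < 0:
                    continue  # state s cannot end at cell (i, j)
                if s == "M":
                    e = safe_log(emit_m[(s1[i - 1], s2[j - 1])])
                elif s == "X":
                    e = safe_log(emit_x[s1[i - 1]])
                else:
                    e = safe_log(emit_y[s2[j - 1]])
                if pi == 0 and pj == 0:
                    best, prev = safe_log(start[s]), None  # first column
                else:
                    best, prev = max(
                        ((V[t][pi][pj] + safe_log(trans[t][s]), t) for t in states),
                        key=lambda c: c[0],
                    )
                V[s][i][j] = best + e
                back[s][i][j] = prev

    # Pick the best final state, then follow back-pointers.
    s = max(states, key=lambda t: V[t][n][m])
    best_logp = V[s][n][m]
    row1, row2 = [], []
    i, j = n, m
    while s is not None:
        row1.append(s1[i - 1] if s in ("M", "X") else "-")
        row2.append(s2[j - 1] if s in ("M", "Y") else "-")
        prev = back[s][i][j]
        di, dj = move[s]
        i, j, s = i - di, j - dj, prev
    return best_logp, "".join(reversed(row1)), "".join(reversed(row2))
```

Both the time and the memory are $O(nm)$ for sequences of lengths $n$ and $m$.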
14
Viterbi in PHMM
Needleman-Wunsch
Match prob $p_m$, mismatch prob $p_r$, gap prob $p_g$
Match score $\log(p_m)$, mismatch score $\log(p_r)$, gap score $\log(p_g)$
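With these scores, Needleman-Wunsch fills the same kind of max-plus table as Viterbi, but without state-dependent transition probabilities. The minimal sketch below returns the optimal global alignment score, using a linear gap score. pm, pr and pg are the slide's probabilities. The values used in the checks are made up.

```python
import math


def needleman_wunsch(s1, s2, pm, pr, pg):
    """Optimal global alignment score with match score log(pm),
    mismatch score log(pr) and a linear gap score log(pg) per gap column."""
    match, mismatch, gap = math.log(pm), math.log(pr), math.log(pg)
    n, m = len(s1), len(s2)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap  # s1 prefix against gaps only
    for j in range(1, m + 1):
        F[0][j] = j * gap  # s2 prefix against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = F[i - 1][j - 1] + (match if s1[i - 1] == s2[j - 1] else mismatch)
            F[i][j] = max(diag, F[i - 1][j] + gap, F[i][j - 1] + gap)
    return F[n][m]
```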
15
We want to take into account that the sequences are
genomic sequences.
Example: a pair of syntenic genomic regions
16
PHMM
(Figure: states X and Y.)
17
  • A property of single-sequence states is that all paths in the Viterbi
    graph between two vertices have the same weight.

18
Strategy for Alignment
(Figure labels: G, A, T, G; GATTACATTGATCAGACAGGTGAAGA.)
19
The CD4 region
(Figure: human vs. mouse; both axes run from 0 to 50000.)
20
5' splice site: GGTGAG
3' splice site: CAG
Stop codon: TAG/TGA/TAA
Branchpoint: CTGAC
Translation initiation: ATG
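The transcript lists these signals but does not say how they are used. The sketch below only finds exact occurrences of the listed strings in a sequence, with 0-based positions and overlapping matches allowed. Real signal detection would use probabilistic models. SIGNALS and find_signals are hypothetical names.

```python
# Consensus strings as listed on the slide.
SIGNALS = {
    "5' splice site": ["GGTGAG"],
    "3' splice site": ["CAG"],
    "stop codon": ["TAG", "TGA", "TAA"],
    "branchpoint": ["CTGAC"],
    "translation initiation": ["ATG"],
}


def find_signals(seq, signals=SIGNALS):
    """Return sorted (position, signal name, motif) for every exact hit."""
    seq = seq.upper()
    hits = []
    for name, motifs in signals.items():
        for motif in motifs:
            pos = seq.find(motif)
            while pos != -1:
                hits.append((pos, name, motif))
                pos = seq.find(motif, pos + 1)  # allow overlapping hits
    return sorted(hits)
```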
21
Suggests a new Steiner problem: find the shortest
1-spanner connecting reds to blues.
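In a 1-spanner, each connected red-blue pair must be joined by a path whose length equals their $L_1$ distance. In the rectilinear setting, that happens exactly when the path is monotone in both x and y. The sketch below checks this for a single path given by its corner points. It assumes every segment is horizontal or vertical; the helper names are hypothetical.

```python
def path_length(path):
    """Length of a rectilinear path given as a list of (x, y) corners."""
    total = 0
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        if x0 != x1 and y0 != y1:
            raise ValueError("segments must be horizontal or vertical")
        total += abs(x1 - x0) + abs(y1 - y0)
    return total


def is_manhattan_shortest(path):
    """True if the path is as short as the L1 distance of its endpoints,
    i.e. it never backtracks in x or in y."""
    (xa, ya), (xb, yb) = path[0], path[-1]
    return path_length(path) == abs(xb - xa) + abs(yb - ya)
```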
22
Generalizes the Manhattan network problem (all points red and blue).
Generalizes the Rectilinear Steiner Arborescence problem.
23
History of the Rectilinear Steiner Arborescence
Problem
1985, Trubin - polynomial time algorithm
1992, Rao-Sadayappan-Hwang-Shor - error in Trubin
2000, Shi and Su - NP-complete!
24
Results for unlabeled problem
  • An $O(n^3)$ 2-approximation algorithm (implemented)
  • An $O(n \log n)$ 4-approximation algorithm
  • Testing on the CD4 region in human/mouse
  • Implementation (SLIM)
  • http://bio.math.berkeley.edu/slim/
  • SLIM for SLAM (in progress)
  • http://bio.math.berkeley.edu/slam/

25
(No Transcript)
26
The Viterbi graph for a more complicated
alignment PHMM
27
Comparison and Analysis of Performance
  • Our method has two main steps (L = length of the sequences,
    n = number of HSPs):
  • Step 1, building the network: $O(n^3)$ or $O(n \log n)$
  • Step 2, running the Viterbi algorithm for the HMM on the network:
    $O(nL)$ worst case
  • Banding algorithms are $O(L^2)$ worst case for step 2.
  • Chaining algorithms are $O(n^2)$ in the case where gap penalties can
    depend on the sequences.
  • These strategies do not generalize well to more sophisticated HMMs.

28
Summary
Software
SLIM (network build): http://bio.math.berkeley.edu/slim/
SLAM (alignment): http://bio.math.berkeley.edu/slam/
Thanks: Nick Bray and Simon Cawley