Comparative Genome Maps - PowerPoint PPT Presentation

About This Presentation

Title:

Comparative Genome Maps

Description:

Chromosome. Gene family 'key to understanding the human genome' ... Synteny: loci on the same chromosome. Colinearity: syntenic regions with conserved gene order ... – PowerPoint PPT presentation

Slides: 43

Provided by: vicgol

Learn more at: https://home.cs.colorado.edu

Transcript and Presenter's Notes

Title: Comparative Genome Maps

1
Comparative Genome Maps

CSCI 7000-005 Computational Genomics
Debra Goldberg
debg_at_hms.harvard.edu

2
What is a comparative map?
3
Why construct comparative maps?

Identify and isolate genes
Crops: drought resistance, yield, nutrition, ...
Human: disease genes, drug response, ...
Infer ancestral relationships
Discover principles of evolution
Chromosome
Gene family
'key to understanding the human genome'

4
Why automate?

Time consuming, laborious
Needs to be redone frequently
Codify a common set of principles
Nadeau and Sankoff warn of arbitrary nature of
comparative map construction

5
Definitions

Marker: identifiable chromosomal locus
Homology: genes with a common ancestor
Homeology: chromosomal regions derived from a common ancestral linkage group
Synteny: loci on the same chromosome
Colinearity: syntenic regions with conserved gene order

6
Input/Output

Input
genetic maps of 2 species
marker/gene correspondences (homologs)
Output
a comparative map
homeologies identified

7
Map construction
Go from this
to this
Maize 1 (target), Rice (base). Wilson et al., Genetics 1999
8
Chromosome labeling
Maize 1 (target), Rice (base). Wilson et al., Genetics 1999
9
A natural model?
Maize 1 (target), Rice (base). Wilson et al., Genetics 1999
10
Scoring
(Figure: scoring example, labels 10L and 3L)
11
Assumptions

Accept published marker order
All linkage groups of base are unique
Simplistic homeology criteria
At least one homeologous region

12
A natural model?
13
A natural model?
14
A natural model?
15
A natural model?
16
Dynamic programming

$l_i$: location of homolog to marker $i$
$S_{i,a}$: penalty (score) for an optimal labeling of the submap from marker $i$ to the end, when labeling begins with label $a$

(Figure: markers 1 ... i ... n, with label a at marker i)
17
Recurrence relation

$S_{n,a} = m\,\delta(a, l_n)$
$S_{i,a} = m\,\delta(a, l_i) + \min_{b \in L} \left( S_{i+1,b} + s\,\delta(a,b) \right)$
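
To make the linear-model recurrence concrete, here is a minimal Python sketch. It assumes that δ(x, y) is 0 when x = y and 1 otherwise, and it reads m as the mismatched marker penalty and s as the segment boundary penalty. The names `label_linear`, `homologs`, and `labels`, and the penalty values m=1 and s=2, are hypothetical choices for illustration and are not taken from the slides.

```python
def label_linear(homologs, labels, m=1, s=2):
    """Return (penalty, labeling) minimizing the linear-model score.

    homologs[i] is l_i, the base chromosome carrying the homolog of marker i.
    labels is the list L of candidate labels (base chromosomes).
    m and s are assumed penalty values; homologs must be non-empty.
    """
    def delta(x, y):
        return 0 if x == y else 1

    n = len(homologs)
    # S[i][a]: best penalty for markers i..n-1 when marker i is labeled a.
    S = [{} for _ in range(n)]
    # nxt[i][a]: label given to marker i+1 in that optimal labeling.
    nxt = [{} for _ in range(n)]

    # Base case: the last marker only pays its own mismatch penalty.
    for a in labels:
        S[n - 1][a] = m * delta(a, homologs[n - 1])

    # Fill the table from the end of the map toward the start.
    for i in range(n - 2, -1, -1):
        for a in labels:
            b = min(labels, key=lambda c: S[i + 1][c] + s * delta(a, c))
            S[i][a] = m * delta(a, homologs[i]) + S[i + 1][b] + s * delta(a, b)
            nxt[i][a] = b

    # Trace back one optimal labeling from the best starting label.
    a = min(labels, key=lambda c: S[0][c])
    best = S[0][a]
    path = [a]
    for i in range(n - 1):
        a = nxt[i][a]
        path.append(a)
    return best, path


print(label_linear(["3", "3", "10", "3", "3"], ["3", "10"]))
# -> (1, ['3', '3', '3', '3', '3'])
```

With these assumed penalties, a single stray homolog is cheaper to absorb as a mismatch (cost m) than to bracket with two segment boundaries (cost 2s), so the whole map keeps one label.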
18
Problem with linear model

19
The stack model
(Figure: stack diagram with segment labels d, f, e, c, c, b, b, b, a)

Segment at the top of the stack can be:
pushed (remembered), then later popped
replaced
Push and replace cost s; pop is free.

20
Scoring
21
Dynamic programming

$S_{i,j,a}$: score for an optimal labeling of the submap from marker $i$ to marker $j$ when labeling begins with label $a$, i.e., marker $i$ is labeled $a$

(Figure: markers 1 ... i ... j ... n, with label a at marker i)
22
Recurrence relation

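$S_{i,i,a} = m\,\delta(a, l_i)$
$S_{i,j,a} = \min \left\{ m\,\delta(a, l_i) + \min_{b \in L} \left( S_{i+1,j,b} + s\,\delta(a,b) \right),\ \min_{i<k<j} \left( S_{i,k,a} + S_{k+1,j,a} \right) \right\}$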
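
The stack-model recurrence can be sketched in Python as a memoized interval recursion. As before, this assumes δ(x, y) is 0 when x = y and 1 otherwise, with m the mismatched marker penalty and s the segment boundary penalty. The function name `stack_score`, its parameters, and the penalty values are hypothetical and chosen only for illustration. The sketch returns the optimal score only, not the labeling.

```python
from functools import lru_cache


def stack_score(homologs, labels, m=1, s=2):
    """Optimal stack-model score for a non-empty list of homolog locations.

    homologs[i] is l_i; labels is the collection L of candidate labels.
    """
    def delta(x, y):
        return 0 if x == y else 1

    n = len(homologs)

    @lru_cache(maxsize=None)
    def S(i, j, a):
        # Penalty for labeling marker i with a.
        first = m * delta(a, homologs[i])
        if i == j:
            return first
        # Option 1: label marker i, then continue with any label b,
        # paying s if the label changes (push or replace).
        extend = first + min(S(i + 1, j, b) + s * delta(a, b) for b in labels)
        # Option 2: split at k so that both parts begin with label a;
        # returning to a after marker k is a free pop.
        splits = [S(i, k, a) + S(k + 1, j, a) for k in range(i + 1, j)]
        return min([extend] + splits)

    return min(S(0, n - 1, a) for a in labels)


print(stack_score(["3", "3", "10", "10", "10", "3", "3"], ("3", "10")))
# -> 2
```

In this example the run of three markers with homologs on 10 is pushed once (cost s = 2) and popped for free. Under the same assumed penalties, a strictly linear labeling of this input would cost 3, since the cheapest option there is to accept three mismatches.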
23
Results: infers evolutionary events
Maize 1 (target), Rice (base). Wilson et al.
24
Problem: incomplete input

Gene order not always fully resolved.
Co-located genes can be ordered to give most
parsimonious labeling.

25
The reordering algorithm

Uses a compression scheme
Within a megalocus, group genes by location of
related gene.
Order these groups
First, last groups interact with nearby genes
Any ordering of internal groups is equally
parsimonious
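
The grouping step can be illustrated with a short Python sketch. It assumes that a megalocus is a run of markers sharing the same map position, which is an interpretation rather than something the slides state. The input layout and the name `compress_megaloci` are hypothetical.

```python
from itertools import groupby


def compress_megaloci(markers):
    """Group co-located markers by the location of their related gene.

    markers: list of (name, position, homolog_location), sorted by position.
    Returns a list of (position, {homolog_location: [marker names]}).
    """
    compressed = []
    # Markers with equal positions are treated as one megalocus (assumption).
    for position, block in groupby(markers, key=lambda t: t[1]):
        groups = {}
        for name, _, homolog_location in block:
            groups.setdefault(homolog_location, []).append(name)
        compressed.append((position, groups))
    return compressed


markers = [
    ("a", 1.0, "3"),
    ("b", 2.5, "10"),
    ("c", 2.5, "3"),
    ("d", 2.5, "10"),
    ("e", 4.0, "10"),
]
print(compress_megaloci(markers))
# -> [(1.0, {'3': ['a']}), (2.5, {'10': ['b', 'd'], '3': ['c']}), (4.0, {'10': ['e']})]
```

Each megalocus is reduced to one group per homolog location, so only the choice of first and last group needs to be weighed against the neighboring markers.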

26
The reordering algorithm
27
The reordering algorithm
28
Definitions

$\delta$ extended to distance to a set $A$ of labels:
$\delta(a, A) = 0$ if $a \in A$, $1$ otherwise
$S$: the set of indices of supernode start elements
For simplicity, call supernode $i \in S$
29
Definitions

For $i \in S$:
$n_i$: number of markers in $i$
$n_i(a)$: number of markers in $i$ with a homolog on $a$
$l_i$: set of labels matching markers in $i$
$l_i = \{ a \in L : n_i(a) \geq 1 \}$
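
These per-supernode quantities are straightforward to compute. The following Python sketch assumes a supernode is given as the list of homolog locations of its markers. The names `supernode_stats` and `delta_set` are hypothetical.

```python
from collections import Counter


def supernode_stats(homolog_locations):
    """Return (n_i, n_i(a) as a Counter, l_i) for one supernode i."""
    counts = Counter(homolog_locations)   # n_i(a) for each label a
    n_i = len(homolog_locations)          # number of markers in the supernode
    l_i = set(counts)                     # labels a with n_i(a) >= 1
    return n_i, counts, l_i


def delta_set(a, A):
    """Distance from label a to a set A of labels: 0 if a is in A, else 1."""
    return 0 if a in A else 1


n_i, counts, l_i = supernode_stats(["3", "10", "3"])
print(n_i, counts["3"], counts["10"], sorted(l_i))
# -> 3 2 1 ['10', '3']
print(delta_set("3", l_i), delta_set("5", l_i))
# -> 0 1
```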

30
Definitions

$p_i(c)$ gives the mismatched marker and segment boundary penalties for label $c$

31
Definitions

$p(i,a,b)$ gives the total mismatched marker and segment boundary penalties attributed to hidden markers

$$p(i,a,b) = \begin{cases}
\sum_{c \notin \{a,b\}} p_i(c) + m\,\epsilon_i(a,b) & \text{for } i \in S,\ a \neq b \\
\sum_{c \neq a} m\,n_i(c) + m\,\epsilon_i(a,b) & \text{for } i \in S,\ a = b \\
0 & \text{otherwise}
\end{cases}$$
32
Definitions

For $i \in S$:
$\eta_i(a,b)$: number of labels in $\{a,b\}$ without a matching marker in $i$
$\eta_i(a,b) = \delta(a, l_i) + \delta(b, l_i)$
$\eta_i(a,b) \in \{0,1,2\}$

33
Definitions

$\epsilon_i(a,b)$ corrects for mismatched marker penalties assigned twice for the same marker, once in the recurrence and once in $p(i,a,b)$
For example:
$\epsilon_i(a,b) = 0$ if $\eta_i(a,b) = 0$ (if $a$, $b$ are both represented in the supernode)
$\epsilon_i(a,a) = -2$ if $\eta_i(a,a) > 0$ (if $a$ is not represented in the supernode)

34
Recurrence relation