Linear Reduction for Haplotype Inference

About This Presentation

Title:

Linear Reduction for Haplotype Inference

Provided by: Gan999

Learn more at: https://www.cs.gsu.edu

Tags: gamete | haplotype | inference | linear | reduction

Transcript and Presenter's Notes

Title: Linear Reduction for Haplotype Inference

1
Linear Reduction for Haplotype Inference

WABI 2004
2
Outline

3
Human Genome and SNP

4
Haplotype and Disease Association

Deafness inheritance → moral problems
SNPs contribute to risk factors of complex diseases
Having a certain SNP increases the chance of having diabetes 10 times,
but the association is too fragile for doctors: 3 × 10^-6 → 30 × 10^-6
Combinations of SNPs (haplotypes) are responsible for diseases
International HapMap project: http://www.hapmap.org
SNP maps are constructed across the human genome with a density of about one SNP per thousand nucleotides
HapMap tries to identify 1 million tag SNPs providing almost as much mapping information as the entire 10 million SNPs
Unfortunately, not as much is known about SNP combinations

5
Haplotypes and Genotypes

Diploid organisms: two different copies of each chromosome, recombined copies of the parents' chromosomes
Too expensive to examine the two versions of a chromosome separately
Much cheaper to obtain genotype (mixed) data rather than haplotype (separated) data
Haplotype: description of a single copy (0 = wild type, 1 = minor allele)
Genotype: description of the two copies mixed (0 = 00, 1 = 11, 2 = 01)

6
Haplotype Inference Problem

Haplotype Inference (HI) Problem
Given n genotype vectors (with entries 0, 1 or 2),
find n pairs of haplotype vectors, one pair of haplotypes per genotype, explaining the genotypes
For an individual genotype with h heterozygous sites there are 2^(h-1) possible haplotype pairs explaining this genotype
This is hopeless without a genetic model
Parsimonious models → minimize the number of haplotypes
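
The 2^(h-1) count can be checked with a small enumeration. This is only an illustrative sketch, not code from the presentation: it assumes the 0/1/2 genotype coding of the previous slide (2 = heterozygous), and the function name explaining_pairs is made up for this example.

```python
from itertools import product

def explaining_pairs(genotype):
    """List all unordered haplotype pairs that explain a 0/1/2 genotype.

    Assumed coding: 0 = both copies 0, 1 = both copies 1, 2 = heterozygous.
    """
    het = [i for i, g in enumerate(genotype) if g == 2]
    # Homozygous sites are copied as-is; heterozygous sites are filled in below.
    base = [0 if g == 2 else g for g in genotype]
    if not het:
        return [(tuple(base), tuple(base))]
    pairs = []
    # Fix the first heterozygous site to 0 in the first haplotype so that
    # each unordered pair is produced exactly once.
    for bits in product((0, 1), repeat=len(het) - 1):
        h1, h2 = list(base), list(base)
        for site, b in zip(het, (0,) + bits):
            h1[site] = b
            h2[site] = 1 - b
        pairs.append((tuple(h1), tuple(h2)))
    return pairs

pairs = explaining_pairs((2, 0, 2, 1, 2))  # h = 3 heterozygous sites
print(len(pairs))  # 4 = 2^(3-1)
```

The count doubles with every extra heterozygous site, which is why a genetic model is needed to choose among the candidates.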

7
Computational Haplotype Inference Problem

8
Reducing the Set of SNPs

Often many columns corresponding to SNP sites are analogous: one column can be obtained from another by swapping 0s and 1s
One of such columns can be dropped, the same as for two equal columns
What would be the generalization?
If one site is dependent on (or can be reconstructed from) k other sites, then drop this dependent site: it does not carry any useful additional information
General reduction method:
Encoding: reduce the number of sites by removing dependent sites
Infer site-reduced haplotypes for the site-reduced genotypes using a known haplotype inference method
Decoding: reconstruct dependent SNPs from sites of the reduced haplotypes
Main requirement: the reduction method should be fast
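
The simplest case on this slide, dropping equal or analogous columns, can be sketched as follows. This is an assumed illustration rather than the authors' implementation: genotypes are taken in 0/1/2 coding, swapping 0s and 1s is assumed to leave heterozygous entries (2) unchanged, and the name drop_analogous_columns is hypothetical.

```python
def drop_analogous_columns(G):
    """Return indices of the columns kept after removing duplicates.

    G is a list of genotype rows in 0/1/2 coding. Two columns count as
    duplicates if they are equal or if one is obtained from the other by
    swapping 0s and 1s (entries equal to 2 stay as they are).
    """
    seen = set()
    kept = []
    for j in range(len(G[0])):
        col = tuple(row[j] for row in G)
        flipped = tuple(g if g == 2 else 1 - g for g in col)
        key = min(col, flipped)  # same key for a column and its swapped version
        if key not in seen:
            seen.add(key)
            kept.append(j)
    return kept

G = [
    [0, 1, 0, 2],
    [1, 0, 1, 2],
    [2, 2, 2, 0],
]
print(drop_analogous_columns(G))  # [0, 3]
```

Here column 1 is column 0 with 0s and 1s swapped and column 2 equals column 0, so only columns 0 and 3 remain.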

9
Linear Dependence of SNPs

Consider linear dependence
To make analogous sites linearly dependent, change notation: 0/1 → -1/1
Also for genotypes: 0/1/2 → -1/1/0, so a genotype is the half-sum of (and hence linearly dependent on) its explaining haplotypes
Keep only linearly independent SNPs (tag SNPs): all other SNPs can be reconstructed using linear combinations
Equivalent factorization problem: find a representation
G = I_X H
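
The encoding idea on this slide can be sketched with NumPy. This is a minimal sketch under stated assumptions, not the algorithm from the presentation: it works on genotype columns only, picks independent columns greedily from left to right using a floating-point rank test, and the names recode_genotypes, independent_columns and encode are hypothetical.

```python
import numpy as np

def recode_genotypes(G012):
    """Map the 0/1/2 genotype coding to -1/1/0 (heterozygous becomes 0)."""
    lookup = np.array([-1.0, 1.0, 0.0])
    return lookup[np.asarray(G012)]

def independent_columns(G):
    """Greedily keep each column that is independent of those kept so far."""
    chosen = []
    for j in range(G.shape[1]):
        trial = chosen + [j]
        if np.linalg.matrix_rank(G[:, trial]) == len(trial):
            chosen.append(j)
    return chosen

def encode(G):
    """Return the kept column indices and coefficients C with G[:, kept] @ C = G."""
    kept = independent_columns(G)
    coeffs, *_ = np.linalg.lstsq(G[:, kept], G, rcond=None)
    return kept, coeffs

# Two genotypes over three sites; after recoding they read (1, 0, 1) and (0, -1, -1).
G = recode_genotypes([[1, 2, 1], [2, 0, 0]])
kept, coeffs = encode(G)
print(kept)                                   # indices of the kept sites
print(np.allclose(G[:, kept] @ coeffs, G))    # dropped site is a linear combination
```

In this example the third site is the sum of the first two, so only two sites need to be passed to the haplotype inference tool. Note that the sketch reconstructs dropped genotype columns only; reconstructing the dropped sites of the haplotypes needs extra care.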

10
Factorization Problem

11
Linear Encoding Algorithm
12
Linear Decoding Algorithm
13
Graph-Based Decoding

Extend the haplotype graph Xr obtained from the HI algorithm to Xm for all m sites
Very often the graphs Xr and Xm are isomorphic, but not always
Consider an example:
g1 = (1, 0, 1) and g2 = (0, -1, -1)
reduced set: (1, 0) and (0, -1)
The corresponding reduced haplotype graph has 3 vertices, while Xm has 4 vertices
The simple way is to split the vertices if we find an error
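
The vertex counts in this example can be reproduced with a short script. It is an illustrative sketch only: it assumes haplotypes are the vertices and genotypes the edges of the haplotype graph, uses the -1/1/0 coding (0 = heterozygous), and relies on the fact that a genotype with a single heterozygous site has exactly one explaining pair. The function names are hypothetical.

```python
def resolve_single_het(genotype):
    """Return the unique haplotype pair of a genotype with at most one 0 entry."""
    het = [i for i, g in enumerate(genotype) if g == 0]
    if len(het) > 1:
        raise ValueError("more than one heterozygous site: the pair is not unique")
    h1, h2 = list(genotype), list(genotype)
    for i in het:
        h1[i], h2[i] = -1, 1
    return tuple(h1), tuple(h2)

def haplotype_graph(genotypes):
    """Build the vertex set (haplotypes) and edge list (one edge per genotype)."""
    vertices, edges = set(), []
    for g in genotypes:
        a, b = resolve_single_het(g)
        vertices.update((a, b))
        edges.append((a, b))
    return vertices, edges

full = [(1, 0, 1), (0, -1, -1)]
reduced = [g[:2] for g in full]  # keep the first two sites only

v_r, _ = haplotype_graph(reduced)
v_m, _ = haplotype_graph(full)
print(len(v_r), len(v_m))  # 3 4
```

On the reduced sites both genotypes share the haplotype (1, -1); on all three sites that shared vertex corresponds to two different haplotypes, (1, -1, 1) and (1, -1, -1), which is the vertex that has to be split.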

14
Handling Imperfect Phylogeny

The genotype data may show signs of inconsistency with the perfect phylogeny model, i.e. violations of the 4-gamete rule
We could choose h independent columns without such violations
The algorithm works in a greedy manner
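
The slide does not spell out the greedy algorithm, so the following is only one possible sketch under stated assumptions. It uses the 0/1/2 genotype coding, treats a pair of sites as violating the 4-gamete rule when the genotypes force all four gametes 00, 01, 10 and 11 to occur, and greedily keeps each site that is compatible with all sites kept so far. The independence check from the linear reduction is left out for brevity, and all function names are hypothetical.

```python
def forced_gametes(a, b):
    """Gametes that must occur at two sites given genotype entries a and b."""
    if a != 2 and b != 2:
        return {(a, b)}
    if a != 2:                      # only the second site is heterozygous
        return {(a, 0), (a, 1)}
    if b != 2:                      # only the first site is heterozygous
        return {(0, b), (1, b)}
    return set()                    # both heterozygous: phase unknown, nothing forced

def violates_four_gamete(col_i, col_j):
    """True if the two genotype columns force all four gametes."""
    seen = set()
    for a, b in zip(col_i, col_j):
        seen |= forced_gametes(a, b)
    return len(seen) == 4

def greedy_compatible_sites(G):
    """Scan sites left to right, keeping those that conflict with no kept site."""
    columns = [[row[j] for row in G] for j in range(len(G[0]))]
    chosen = []
    for j, col in enumerate(columns):
        if all(not violates_four_gamete(columns[i], col) for i in chosen):
            chosen.append(j)
    return chosen

G = [
    [0, 0, 0],
    [0, 1, 2],
    [1, 0, 0],
    [1, 1, 1],
]
print(greedy_compatible_sites(G))  # [0]
```

In this toy input sites 1 and 2 each conflict with site 0, so a left-to-right scan keeps only site 0. A different scan order would keep sites 1 and 2 instead, which shows that the greedy choice depends on the order in which sites are visited.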

15
Experimental Results

In Table 1, our results show that the runtime advantage of Linearly Reduced DPPH grows fast with test case size and reaches a factor of 60 for the largest instances.
In all test cases, if DPPH finds a unique solution, so does LR DPPH, and the solution is identical.
In Tables 2 and 3, the running time is drastically reduced compared to the original PHASE, while the measured quality does not get worse.
In Tables 4 and 5, we see the same advantage from using Linearly Reduced HAPLOTYPER instead of the original HAPLOTYPER.
The last two datasets are real data from Drosophila haplotypes and a human chromosome.

16
Experimental Results
17
Experimental Results
18
Conclusions and Future work

Our method significantly speeds up popular haplotype inference tools such as DPPH, HAPLOTYPER and PHASE in all cases without compromising quality.
We even reach a 50 times speedup over DPPH.
Future work includes implementing the algorithm for handling imperfect phylogeny.
We are going to investigate an application of the suggested linear reduction to finding a small number of representative sites sufficient to distinguish all haplotypes

