Detection of Natural Selection in the Human Genome by Hidden Markov Model


Title: Detection of Natural Selection in the Human Genome by Hidden Markov Model


1
Detection of Natural Selection in the Human Genome by Hidden Markov Model
Capstone Project
  • Presenter: Sang-Gook Han
  • Advisor: Prof. Matthew Hahn

2
Contents
  • Objective
  • Background
  • Problems and Motivation
  • Algorithm
  • Procedure
  • Results
  • Acknowledgement

3
Objective
Mutation
A mutation is not neutral if it affects a function.
Natural selection can be divided broadly into two types:
  • Positive selection favors a new mutation (allele)
  • Negative selection disfavors a new mutation
Speciation
Mutation causes DNA polymorphism
4
Objective
Single Nucleotide Polymorphism (SNP)
C(5)/G(7) at position 14
T(10)/G(2) at position 31
AAACTCATAGTCCGATTTCCCCGGGAACCCTA
AAACTCATAGTCCGATTTCCCCGGGAACCCTA
AAACTCATAGTCCGATTTCCCCGGGAACCCTA
AAACTCATAGTCCCATTTCCCCGGGAACCCTA
AAACTCATAGTCCCATTTCCCCGGGAACCCTA
AAACTCATAGTCCCATTTCCCCGGGAACCCTA
AAACTCATAGTCCGATTTCCCCGGGAACCCTA
AAACTCATAGTCCCATTTCCCCGGGAACCCGA
AAACTCATAGTCCCATTTCCCCGGGAACCCTA
AAACTCATAGTCCGATTTCCCCGGGAACCCTA
AAACTCATAGTCCGATTTCCCCGGGAACCCGA
AAACTCATAGTCCGATTTCCCCGGGAACCCTA
Polymorphism in a population provides a characteristic of the genome
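The allele counts on this slide can be checked with a short script. This is a minimal sketch, not the presenter's code: the function name `polymorphic_sites` is an illustrative choice, and the twelve sequences are copied from the slide.

```python
from collections import Counter

# The 12 sampled sequences shown on the slide, one per chromosome.
GT = "AAACTCATAGTCCGATTTCCCCGGGAACCCTA"
CT = "AAACTCATAGTCCCATTTCCCCGGGAACCCTA"
CG = "AAACTCATAGTCCCATTTCCCCGGGAACCCGA"
GG = "AAACTCATAGTCCGATTTCCCCGGGAACCCGA"
sequences = [GT, GT, GT, CT, CT, CT, GT, CG, CT, GT, GG, GT]

def polymorphic_sites(seqs):
    """Return {1-based position: Counter of alleles} for columns with >1 allele."""
    sites = {}
    for index, column in enumerate(zip(*seqs)):
        counts = Counter(column)
        if len(counts) > 1:
            sites[index + 1] = counts
    return sites

snps = polymorphic_sites(sequences)
for position, counts in snps.items():
    print(position, dict(counts))
```

Two columns vary across the sample, and their counts match the C(5)/G(7) and T(10)/G(2) labels.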
5
Objective
From polymorphism:
  • Understanding the types of selection can aid in
    understanding human evolution and the human genome
  • Infer evolutionary processes in different regions
    of the genome, e.g. genes evolving adaptively in
    humans
6
Background: Derived Allele Frequency
  • A SNP has two alleles.

[Figure: ancestral allele C; C/G SNP in the sample; derived allele G (four Gs)]
Infer natural selection from derived allele
frequency
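As a small illustration of the term, the derived allele frequency is the share of sampled chromosomes that carry the non-ancestral allele. The sketch below assumes the ancestral allele is already known. The function name and the sample of ten chromosomes are hypothetical, since the slide gives only the count of four Gs.

```python
def derived_allele_frequency(sample_alleles, ancestral):
    """Fraction of sampled chromosomes carrying a non-ancestral (derived) allele."""
    if not sample_alleles:
        raise ValueError("sample must not be empty")
    derived = sum(1 for allele in sample_alleles if allele != ancestral)
    return derived / len(sample_alleles)

# Hypothetical sample: ten chromosomes, four of which carry the derived G.
sample = list("CCCCCCGGGG")
daf = derived_allele_frequency(sample, ancestral="C")
print(daf)
```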
7
Background: Allele-Frequency Spectrum
  • A single frequency is not enough to infer the
    type of natural selection.
  • So we need the allele-frequency spectrum
    (frequencies of many alleles).

        Site 1  Site 2  Site 3  Site 4  Site 5  Site 6
IND 1   0       0       0       1       0       0
IND 2   0       0       1       1       0       0
IND 3   1       1       0       0       0       0
IND 4   0       0       0       0       0       1
IND 5   0       0       0       0       1       0
Negative Selection
1 = derived allele, 0 = ancestral allele
  • 5 sites (sites 1, 2, 3, 5, 6) have only one derived allele
  • 1 site (site 4) has two derived alleles
  • 0 sites have three derived alleles
  • 0 sites have four derived alleles
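The tally above can be reproduced from the 0/1 table directly. This is a minimal sketch; the function name `site_frequency_spectrum` is an illustrative choice and the matrix is the one on the slide.

```python
from collections import Counter

# Rows are individuals (IND 1..5), columns are sites 1..6.
# 1 = derived allele, 0 = ancestral allele.
genotypes = [
    [0, 0, 0, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1, 0],
]

def site_frequency_spectrum(matrix):
    """Entry k-1 is the number of sites with exactly k derived alleles, k = 1..n-1."""
    n = len(matrix)
    derived_per_site = [sum(column) for column in zip(*matrix)]
    tally = Counter(derived_per_site)
    return [tally.get(k, 0) for k in range(1, n)]

spectrum = site_frequency_spectrum(genotypes)
print(spectrum)  # [5, 1, 0, 0]
```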
8
Background: Allele-Frequency Spectrum
[Figure: allele-frequency spectra. Y axis: number of sites; X axis: number of derived alleles]
  • Negative: low genetic variation
  • Neutral: high genetic variation
  • Positive: low genetic variation
The distribution of alleles enables us to infer
natural selection.
9
Background: the PRF (Poisson Random Field) model
  • Evolutionary theory gives predictions of allele
    frequencies when they are
  • Neutral (no selection)
  • Positively selected
  • Negatively selected

γ = 0: Neutral; γ > 0: Positive; γ < 0: Negative
γ: natural selection intensity
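The slides do not give the formula behind these predictions. As a hedged illustration, the sketch below assumes a commonly cited form of the Poisson Random Field expectation, in which the expected number of sites with i derived alleles in a sample of n is θ · C(n,i) · ∫ x^(i-1) (1-x)^(n-i-1) · (1 - e^(-2γ(1-x))) / (1 - e^(-2γ)) dx over 0 < x < 1. The function name, the midpoint-rule integration, and the scaling of θ are assumptions made here, so only the shape of the spectrum should be read from it.

```python
import math

def expected_sfs(n, gamma, theta=1.0, steps=2000):
    """Expected site counts for i = 1..n-1 derived alleles under the assumed PRF form."""
    def weight(x):
        # Selection factor; it tends to (1 - x) as gamma approaches 0 (neutral case).
        if abs(gamma) < 1e-8:
            return 1.0 - x
        return math.expm1(-2.0 * gamma * (1.0 - x)) / math.expm1(-2.0 * gamma)

    h = 1.0 / steps
    spectrum = []
    for i in range(1, n):
        total = 0.0
        for k in range(steps):
            x = (k + 0.5) * h  # midpoint rule, never evaluates x = 0 or x = 1
            total += x ** (i - 1) * (1.0 - x) ** (n - i - 1) * weight(x)
        spectrum.append(theta * math.comb(n, i) * total * h)
    return spectrum

neutral = expected_sfs(10, 0.0)
negative = expected_sfs(10, -5.0)
positive = expected_sfs(10, 5.0)

def singleton_share(spectrum):
    return spectrum[0] / sum(spectrum)

print(singleton_share(negative), singleton_share(neutral), singleton_share(positive))
```

Under this form the neutral case reduces to θ/i, negative γ shifts sites toward rare derived alleles, and positive γ shifts them toward common ones. Very large |γ| would overflow `math.expm1` in this simple version.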
10
Problems of the PRF model
  • Even though γ (the selection intensity) is
    estimated, the value is not exact: it has variance.
  • Thus, we cannot rely on the estimate alone to
    determine positive or negative selection.
  • Conduct a likelihood ratio test (LRT) against the
    null hypothesis of no selection.

11
Motivation
  • Polymorphism in a genome can be described as a
    Markov random field.
  • So, neighboring regions tend to be predicted as
    the same natural selection type.
  • A Hidden Markov Model (HMM) can improve detection
    of natural selection because of the similarity
    among neighboring regions.
  • The transition probabilities and emission function
    can alleviate the effect of variance on determining
    the selection type.
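To illustrate how neighboring regions can stabilize a noisy estimate, here is a toy sketch. It is not the presenter's model: it uses Viterbi decoding with fixed parameters rather than Baum-Welch training, three states instead of five, and Gaussian emissions. Every number in it (means, standard deviation, stay probability, the six window estimates) is hypothetical.

```python
import math

STATES = ["NEG", "NEU", "POS"]
MEANS = {"NEG": -3.0, "NEU": 0.0, "POS": 3.0}  # hypothetical emission means
SD = 2.0                                        # hypothetical emission spread
STAY = 0.9                                      # hypothetical self-transition probability

def log_emission(state, x):
    z = (x - MEANS[state]) / SD
    return -0.5 * z * z - math.log(SD * math.sqrt(2.0 * math.pi))

def log_transition(prev, cur):
    if prev == cur:
        return math.log(STAY)
    return math.log((1.0 - STAY) / (len(STATES) - 1))

def viterbi(observations):
    """Most likely state path, computed in log space with a uniform start."""
    score = {s: -math.log(len(STATES)) + log_emission(s, observations[0]) for s in STATES}
    backpointers = []
    for x in observations[1:]:
        new_score, pointer = {}, {}
        for cur in STATES:
            best_prev = max(STATES, key=lambda p: score[p] + log_transition(p, cur))
            new_score[cur] = score[best_prev] + log_transition(best_prev, cur) + log_emission(cur, x)
            pointer[cur] = best_prev
        backpointers.append(pointer)
        score = new_score
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    return path[::-1]

estimates = [3.1, 2.8, 3.4, -0.2, 3.0, 2.9]  # hypothetical per-window estimates
naive = [max(STATES, key=lambda s: log_emission(s, x)) for x in estimates]
smoothed = viterbi(estimates)
print(naive)
print(smoothed)
```

Classified window by window, the fourth estimate looks neutral. With the transition penalty, the decoded path keeps it in the same state as its neighbors.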

12
Algorithm
  • Hidden Markov Model

State  Name               γ      Criterion
SP     Strongly Positive  γ > 0  confidence interval > 0
WP     Weakly Positive    γ > 0  confidence interval includes + and -
NE     Neutral                   LRT
WN     Weakly Negative    γ < 0  confidence interval includes + and -
SN     Strongly Negative  γ < 0  confidence interval < 0
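The five states can be read as a simple decision rule. The sketch below is one possible reading, not the presenter's implementation. In particular, the slide only says "LRT" for the neutral state, so treating a non-significant likelihood ratio test as neutral is an assumption, and the function name and argument names are hypothetical.

```python
def classify_selection(estimate, ci_low, ci_high, lrt_significant):
    """Map a selection-intensity estimate and its confidence interval to a state."""
    # Assumption: a window is neutral when the LRT does not reject "no selection".
    if not lrt_significant:
        return "NE"
    if estimate > 0:
        # Strongly positive only if the whole interval lies above zero.
        return "SP" if ci_low > 0 else "WP"
    if estimate < 0:
        # Strongly negative only if the whole interval lies below zero.
        return "SN" if ci_high < 0 else "WN"
    return "NE"

print(classify_selection(4.0, 1.5, 6.5, True))    # SP
print(classify_selection(1.2, -0.3, 2.7, True))   # WP
print(classify_selection(-0.8, -2.0, 0.4, True))  # WN
```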
13
Procedure
Allele frequencies from unrelated parents: 60
Europeans (CEU), 60 Africans (YRI), 45 Chinese
(CHB)
[Flowchart boxes: Natural selection; Derived allele; Trained HMM; γ estimation; Allele-frequency spectrum]
14
Procedure: HMM
γ estimation on a sliding window from the PRF
model
γ: 0.2 0.4 1 1.1 1.2 3 4 5 10 11 11 9 8 5 3 3 2 1 0.5 0.3 0.2 0.1 0 -0.1
NE, γ 1.1
Build emission function, P(γ | S), for each state
and for each chromosome
  • Training
  • Input: emission function and estimated γ values
    from the PRF
  • Baum-Welch algorithm

15
Results: HMM
[Figures: state transition probabilities; emission function]
16
Results: HMM
  • A snapshot of a region on chromosome 13 of
    CEU with annotation of natural selection before
    and after PRF-HMM

Population       changes/total   undetermined/changes
European (CEU)   3.80            5.94
African (YRI)    4.11            1.74
Chinese (CHB)    7.31            5.69
17
Results: Distribution of Natural Selection
A) Whole chromosome
B) Coding region
Ratio of B to A
  • Negative (SN and WN): B > A
  • Positive (SP and WP): B < A
Coding regions are under negative selection.
18
Results: Distribution of Natural Selection
  • One SNP can be shared by more than one protein.

1: not shared
3: shared by at least 3 proteins
6: shared by at least 6 proteins
19
Results: Interesting genes in Europeans
CSN3: Kappa-casein
ZDHHC12: Palmitoyltransferase
MCL1: myeloid leukemia cell differentiation protein
20
Results: Interesting genes in Africans
TP53BP1: Tumor suppressor p53-binding protein
GFER, MPV17, GNMT: Liver disease related genes
21
Results: Interesting genes in Chinese
SERF2: Gastric cancer-related protein
ZMYND10: Lung cancer-related protein
22
Conclusion
  • The PRF-HMM improves on the PRF model and overcomes
    a numerical problem.
  • Differences in SP and SN among populations might
    be related to different environments.
  • Negative: cSNP > chromosome. Positive: cSNP <
    chromosome. This implies that the human genome is
    under negative selection.
  • The more proteins share a SNP, the larger the
    proportions of SN and WN and the smaller the
    proportions of SP and WP.
  • Positive selection on a disease-related gene in a
    specific population with low genetic diversity
    might appear as a weakness of that population.

23
Acknowledgement
Thanks to Prof. Matthew Hahn, Prof. Haixu Tang, and
the people in Hahn's lab for advice on this project.
Thank you